Anthony Caputo

The Rise of the System Processing Unit

Blog Post created by Anthony Caputo on Jan 30, 2016

I recently revisited the silicon world of microprocessing and its evolution since I wrote my book Build Your Own Server (McGraw-Hill 2003). I was fascinated by how computing is shifting from a system architecture built on discrete integrated circuits and components, with the central processing unit (CPU) at its core, to a more consolidated, embedded and integrated system processing unit (SPU) that improves overall computing performance rather than just processing power. This evolution appears not only to have changed computing, to the benefit of software development, but also to have derailed Moore’s Law.

In 1975, at the IEEE International Electron Devices Meeting, Gordon Moore stated that, based on increasing die sizes, the reduction in defect densities and the simultaneous evolution of finer and smaller dimensions, “circuit density-doubling would occur every 24 months.” Although Douglas Engelbart had a similar theory in 1960, and David House of Intel factored in the increasing performance of transistors as well as transistor counts, the idea that “micro-processing performance would double every two years” has come to be known as “Moore’s Law.”


Moore’s Law is an important consideration in any traditional system architecture that uses a central processing unit (CPU) surrounded by multiple system components. Encryption and cryptography, compression and decompression, and any data transmission all use processing power. The more complex the intelligence, the more GFLOPS of processing power are required. GFLOPS is a measure of computer speed; one gigaflop is a billion floating-point operations per second.

The table below identifies the increase in CPU speed every two years, beginning in 1998, using simple desktop processors. Column A is the year. Column B shows the theoretical leap in processing speed from two years prior, while Column C shows an example of a real-world CPU released that year. Column D is the multiple of the previous processor speed. For example, in the year 2000, processing speed more than doubled, from about 233 MHz to 500 MHz. Column E is the processing speed of an identified make and model of CPU (listed in Column J). Column F presents the Front Side Bus (FSB) speed (more on this later) and, for later processors, the Direct Media Interface (DMI) speed, from the point where the FSB was absorbed into the processor for better performance between the CPU, memory and the video graphics hardware accelerator (compare F1-6 with F8-10).

Columns G through I show the evolution of the internal CPU cache, which keeps frequently used data and instructions inside the CPU rather than reaching out to RAM, for even faster recall. Column K shows the number of transistors within the CPU.
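The theoretical column can be approximated with a few lines of Python. This is only a rough sketch, assuming the 233 MHz starting point for 1998 mentioned above and a clean doubling every two years. The function name theoretical_speed_mhz is hypothetical, and the printed values are computed from that assumption, not copied from the table.

```python
def theoretical_speed_mhz(year, base_year=1998, base_mhz=233.0, period_years=2):
    """Speed predicted by doubling every `period_years` (assumed model)."""
    doublings = (year - base_year) / period_years
    return base_mhz * 2 ** doublings


# Theoretical speed for each two-year step, starting from the assumed base
for year in range(1998, 2008, 2):
    print(year, round(theoretical_speed_mhz(year)), "MHz")

# Multiple of the previous speed for the real-world example from the year 2000
real_world_multiple = 500 / 233
print(round(real_world_multiple, 2))
```

Comparing the real-world multiple with the theoretical factor of 2 is the same comparison the table makes between its theoretical and real-world columns.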


table 1.jpg

To explain the significance of these changes, I believe one must understand how personal computers work. A personal computer was a system of various components that each served a specific function. This was primarily because of the size limitations of integrated circuits (180 nm in 2000, down to 10 nm today).


I’ll try to make this as painless as possible.


Personal computers have many different types of memory, used for different purposes, but all working together to make the interaction seamless and fast. The system’s first access to memory happens even before the operating system boots up. The computer’s BIOS (basic input/output system) is firmware held in a ROM or flash chip, and its settings are stored in CMOS (complementary metal-oxide semiconductor) memory, powered by a lithium battery. This is how a personal computer becomes self-aware: a small memory chip ensures that your basic configuration information (date, time, etc.) stays the same the next time you turn on the power. Because the battery keeps its contents alive, CMOS memory is sometimes referred to as non-volatile RAM (random access memory). True nonvolatile memory, which needs no power to retain data, includes all forms of read-only memory (ROM) such as programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory. The BIOS holds just enough code and data to recognize itself and its components, then loads the operating system into RAM when you boot up the computer.


Although RAM is a microchip itself, nothing is permanently written to that chip. It’s a temporary workspace where all software is first loaded after you boot up the computer or double-click an icon. This makes it easily accessible by the CPU. RAM is very important for your computer’s performance simply because reading and writing to RAM is much faster than reading and writing to a standard hard drive. RAM access time is measured in nanoseconds (ns), or billionths of a second, while access time for a standard hard drive is measured in milliseconds (ms), or thousandths of a second. A newer solid-state drive (SSD), also called a solid-state disk or flash disk, does not contain a drive motor or spinning platters. It’s an integrated circuit assembly of memory but, unlike your system memory, it stores data persistently. Storage access drops from milliseconds to microseconds, and there are no moving parts to act as a point of failure.
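To put those units side by side, here is a small sketch of the arithmetic. The two access times (10 ns for RAM, 10 ms for a spinning drive) are round numbers I am assuming purely for illustration, not measurements, and the helper name times_slower is hypothetical.

```python
NS_PER_MS = 1_000_000  # one millisecond is a million nanoseconds

ram_access_ns = 10               # assumed value, illustrative only
disk_access_ns = 10 * NS_PER_MS  # assumed 10 ms access, illustrative only


def times_slower(slow_ns, fast_ns):
    """How many times longer the slow access takes than the fast one."""
    return slow_ns / fast_ns


print(times_slower(disk_access_ns, ram_access_ns))
```

Whatever the exact figures for a given machine, the gap between the two units is six orders of magnitude.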


Side note: If you have a standard spinning hard drive that fails to boot up, stick it in the freezer for 30-60 minutes. Works every time.

Unlike data on the hard drive, the contents of RAM only exist while the power is on. When the computer shuts down, all the data held in RAM disappears. This is why you lose your work if you haven’t saved it; saving writes it to the hard drive to be accessed later. When you boot up your computer, all your software (including your operating system) is loaded once again into RAM from your hard drive.

Most personal computers allow you to add RAM modules, but only up to the limit in capacity and speed that the motherboard and processor support. If there’s not enough RAM, the system will begin writing to disk in the form of a pagefile. The pagefile is an allocated space on the hard disk, used as an extension of RAM. If you open a file that requires more space than is readily available, the system will begin to write any idle data to the hard drive. This process itself can eat up RAM and time, depending on the operating system, processor, and how fast your hard drive can write. The advantage of a pagefile is that it keeps the data in one large file, making it faster to access later than recollecting all the data from its original locations, which could be scattered across the disk.
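The idea of pushing idle data out to a pagefile can be sketched as a toy model. This is an illustration under stated assumptions, not how any particular operating system does it: I am assuming a simple least-recently-used policy, and the class TinyMemory, its slot count and the entry names are all hypothetical.

```python
from collections import OrderedDict


class TinyMemory:
    """Toy model: a fixed number of RAM slots backed by a pagefile."""

    def __init__(self, ram_slots):
        self.ram_slots = ram_slots
        self.ram = OrderedDict()  # most recently used entries sit at the end
        self.pagefile = {}        # stands in for the reserved space on disk

    def access(self, name, data=None):
        if name in self.ram:
            self.ram.move_to_end(name)  # mark as recently used
        else:
            # Page in from disk if it was swapped out, otherwise load new data
            value = self.pagefile.pop(name, data)
            if len(self.ram) >= self.ram_slots:
                # RAM is full: write the idlest entry out to the pagefile
                idle_name, idle_value = self.ram.popitem(last=False)
                self.pagefile[idle_name] = idle_value
            self.ram[name] = value
        return self.ram[name]


memory = TinyMemory(ram_slots=2)
memory.access("browser", "tabs")
memory.access("editor", "draft")
memory.access("game", "level 1")  # RAM is full, so "browser" is paged out
print(list(memory.ram), list(memory.pagefile))
```

Accessing "browser" again would pull it back from the pagefile and push the next idlest entry out, which is the extra reading and writing that makes a machine feel slow when it runs short of RAM.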


As computers evolved to churn through more graphics and video (far more data than simple ASCII text), along with analytical processing of those pixel-intensive streams and their metadata, a separate hardware graphics accelerator, or graphics processing unit (GPU), was added. The GPU had its own exclusive video RAM, similar to your system RAM except that while the processor writes data into video RAM, the video controller can simultaneously read from it to refresh the display. Many GPUs, whether embedded on the motherboard or installed as a separate expansion card, used to share the system memory, essentially stealing a percentage of it to improve video performance. Today, a GPU can come with its own memory whether it is embedded in the motherboard (system board) or even in the CPU itself, a combination called an Accelerated Processing Unit (APU).


Now that you understand a bit about how a computer works, let’s take a look at the diagram below, which shows the logical system architecture of a computer motherboard circa 2002, from my book Build Your Own Server. The CPU communicated with what used to be called the Northbridge, another microchip that controlled communications between the CPU, the memory and the GPU. The link between the CPU and the Northbridge was called the Front Side Bus (FSB). If you wanted a high-performance machine for video games, graphics or digital video processing and production, the speed of the FSB, which was fixed by the motherboard, was usually the bottleneck. You had to find the right mix and match of components to build an efficient high-performance machine.


System Architecture 1.jpg

Personal Computer System Architecture, circa 2002

As depicted in the diagram above, back in 2002 computer motherboards had a Dual Independent Bus (DIB). A motherboard bus is a circuit arrangement that attaches devices to a shared line, so all signals pass every device on it. The signals are unique to each device, so each device responds only to its own signals.




ATX Motherboard 1.jpg
Personal Computer motherboard/system board, circa 2002

The FSB was the data path and physical interface between the processor and the Northbridge, which linked the processor to RAM and the video graphics accelerator or GPU. The Northbridge in turn connected to the Southbridge, which handled communications with the parallel printer port, serial ports, PCI, IDE and/or SATA, and USB ports. So, when grabbing data from RAM in 2002, you were working in nanoseconds at 800 MB/s, versus 33 MB/s for the IDE hard drive when writing to the pagefile.
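Those two transfer rates are easier to appreciate with a quick calculation. A minimal sketch, assuming a 100 MB payload that I picked only for illustration; the function name transfer_seconds is hypothetical, and the calculation ignores seek time and other overhead.

```python
def transfer_seconds(size_mb, rate_mb_per_s):
    """Time to move `size_mb` at a sustained rate, ignoring any overhead."""
    return size_mb / rate_mb_per_s


size_mb = 100  # assumed payload size, illustrative only
ram_s = transfer_seconds(size_mb, 800)  # RAM rate quoted above
ide_s = transfer_seconds(size_mb, 33)   # IDE hard drive rate quoted above

print(f"RAM: {ram_s:.3f} s, IDE: {ide_s:.3f} s, ratio: {ide_s / ram_s:.1f}x")
```

The ratio does not depend on the payload size, so any data that spills from RAM into the pagefile pays roughly the same penalty on raw transfer rate alone.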


Intel’s Sandy Bridge generation of Core processors and AMD’s Fusion processors (both released in 2011) integrated the functions of the Northbridge into the CPU itself, which made the FSB and the Northbridge redundant. They were replaced by Intel’s QuickPath Interconnect (QPI) or AMD’s HyperTransport, and the former Southbridge, now called the Platform Controller Hub (PCH), connects directly to the CPU via the Direct Media Interface (DMI).

sandy bridge 1.jpg

Personal Computing System Architecture, circa 2011

The modern CPU is more of an SPU (system processing unit), absorbing what were once separate bottlenecks to provide smaller, faster computers and computing devices.

Beginning with the Pentium Pro, the level 2 (L2) cache has been packaged with the processor. If the data resides in L1 or L2 cache, the CPU doesn’t even need to look in RAM, saving even more time in operation. An embedded level 3 (L3) cache and a 128 MB L4 cache were added with Intel’s multi-core processors, and Ivy Bridge (2011-2013) gave the GPU its own dedicated slice of L3 cache in lieu of sharing the entire 8 MB.
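The way a lookup falls through the cache levels before reaching RAM can be sketched in a few lines. This is a simplified model, not a description of any real processor: the latencies are assumed round numbers, RAM is assumed to always hold the data, and the names LEVELS and lookup are hypothetical.

```python
# Assumed, illustrative latencies in nanoseconds; not measurements of any CPU
LEVELS = [("L1", 1), ("L2", 4), ("RAM", 100)]


def lookup(address, contents):
    """Return (level that held the address, total time spent searching)."""
    elapsed_ns = 0
    for level, latency_ns in LEVELS:
        elapsed_ns += latency_ns
        # RAM is the last stop in this model and always has the data
        if level == "RAM" or address in contents.get(level, set()):
            return level, elapsed_ns


contents = {"L1": {0x10}, "L2": {0x10, 0x20}}
print(lookup(0x10, contents))  # found in L1
print(lookup(0x20, contents))  # missed L1, found in L2
print(lookup(0x30, contents))  # missed both caches, falls through to RAM
```

The closer to the CPU the data is found, the less time is spent, which is why moving more and larger caches onto the processor pays off.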


ITX Motherboard 1.jpg

New Mini ITX Motherboard/System Board

Below I’ve added some images to show what all this geek-speak means visually. There have long been discussions that Moore’s Law will eventually collapse, although the predicted date was sometimes pushed back another decade. Some academics believed the end was still 600 years in the future, while others believe that once we create a transistor the size of an atom, there’s nowhere else to go.

In 2003, Intel predicted the collapse would come between 2013 and 2018, which seems accurate based on this little exercise, although the numbers would be different if I added high-performance server-side processors like the 18-core Haswell Xeon E5, with 5,560,000,000 transistors, or even the Xbox One, with 5,000,000,000 transistors. However, I’m not sure it’s a fair assessment, because I believe the rules have changed. I don’t see the same central processing unit that managed a sophisticated system of integrated circuits and components 15 years ago. I see the SPU, designed to expedite the rapid development and deployment of the Internet of Things: smaller, faster, smarter computing devices, not personal computers.


Computing continues to evolve. It will be interesting to watch as we now slip the new “personal computer” into our pocket and/or purse.

PC 2000.jpg

Personal computing, circa 2000




Personal Computing, circa 2016 (can't get more personal than something you carry in your pocket)


Tomb Raider II.jpg

Video game graphics, circa 1997 (Tomb Raider II)


Tomb Raider 2016.jpg

Video game graphics, circa 2015 (Rise of the Tomb Raider) Tomb Raider is Copyrighted 2016 Square Enix


CCTV 2005.png

Video surveillance CCTV resolution 30fps (352 x 240 pixels CIF), circa 2005


CCTV 2016.jpg

Video surveillance CCTV resolution 30fps (2048 x 1536 pixels 3MP), circa 2016
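
For a sense of scale, the two surveillance resolutions above can be compared by raw pixel throughput. A quick sketch using only the frame sizes and 30fps rate from the captions; the function name pixels_per_second is hypothetical, and compression is ignored.

```python
def pixels_per_second(width, height, fps=30):
    """Raw pixels produced per second at the given frame size and rate."""
    return width * height * fps


cif = pixels_per_second(352, 240)          # CIF, circa 2005
three_mp = pixels_per_second(2048, 1536)   # 3MP, circa 2016

print(cif, three_mp, round(three_mp / cif, 1))
```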