
I recently revisited the silicon world of microprocessors and how it has evolved since I wrote my book Build Your Own Server (McGraw-Hill, 2003). I was fascinated by how computing has shifted away from a component-based system architecture built from integrated circuits, with the central processing unit (CPU) at its core. It is moving toward a more consolidated, embedded and integrated system processing unit (SPU) that improves overall computing performance rather than just processing power. This evolution appears to have changed computing, to the benefit of software development, and also seems to have derailed Moore’s Law.

In 1975, at the IEEE International Electron Devices Meeting, Gordon Moore based a prediction on increasing die sizes, falling defect densities and the simultaneous move to finer and smaller dimensions. He stated that “circuit density-doubling would occur every 24 months.” Douglas Engelbart had a similar theory in 1960, and David House of Intel factored in both the increasing performance of transistors and rising transistor counts. Even so, the prediction that “micro-processing performance would double every two years” has come to be known as “Moore’s Law.”
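
Stated as a formula, a count N₀ in year t₀ grows to N(t) = N₀ · 2^((t − t₀)/2) under a two-year doubling period. The short Python sketch below projects that curve. The two-year period is the one quoted above. The Intel 4004 starting point (about 2,300 transistors in 1971) is an illustrative choice and not part of the original argument.

```python
def moores_law(n0: float, t0: int, t: int, doubling_years: float = 2.0) -> float:
    """Project a transistor count forward, assuming it doubles every `doubling_years`."""
    return n0 * 2 ** ((t - t0) / doubling_years)

# Illustrative starting point (assumption): Intel 4004, ~2,300 transistors in 1971.
for year in (1971, 1981, 1991, 2001, 2011):
    print(year, f"{moores_law(2300, 1971, year):,.0f}")
```

Twenty years of two-year doublings amounts to ten doublings, a factor of 2^10 = 1,024.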

 

Moore’s Law is an important consideration in any traditional system architecture built around a central processing unit (CPU) surrounded by multiple system components. Encryption and other cryptographic work, compression and decompression, and data transmission all consume processing power. The more complex the workload, the more processing power it requires. That power is often expressed in GFLOPS, a measure of computer speed; one gigaflop is a billion floating-point operations per second.
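
A processor's theoretical peak is commonly estimated as cores × clock speed in GHz × floating-point operations per cycle per core. The sketch below assumes a hypothetical 4-core, 3.5 GHz processor. Each core is assumed to complete 16 double-precision operations per cycle, as it would with two 256-bit fused multiply-add units. Sustained real-world throughput is usually well below this peak.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: every core issues its maximum FLOPs on every clock cycle."""
    return cores * clock_ghz * flops_per_cycle

# Hypothetical desktop CPU: 4 cores, 3.5 GHz, 16 double-precision FLOPs/cycle/core.
print(peak_gflops(4, 3.5, 16))  # 224.0
```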


The table below tracks the increase in CPU speed every two years, beginning in 1998, using simple desktop processors. Column A is the year. Column B shows the theoretical leap in processing speed from two years prior, and Column C shows an example of a real-world CPU released that year. Column D is the multiple of the previous processor's speed. For example, in 2000 processing speed more than doubled, from about 233 MHz to 500 MHz. Column E is the processing speed of the specific make and model of CPU listed in Column J. Column F presents the Front Side Bus (FSB) speed (more on this later). In the later rows it presents the Direct Media Interface (DMI) instead, used once the processor had absorbed the FSB to improve performance between the CPU, memory and the video graphics hardware accelerator (see F1-6 versus F8-10).
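
Column D is simply the ratio of one processor's clock speed to the speed of the processor from two years earlier. Comparing that multiple with 2.0 is a quick way to check each row against Moore's Law. The sketch below uses the 233 MHz and 500 MHz figures from the example above. The function name is hypothetical.

```python
def speed_multiple(previous_mhz: float, current_mhz: float) -> float:
    """Column D: how many times faster the current CPU clock is than the previous one."""
    return current_mhz / previous_mhz

multiple = speed_multiple(233, 500)
print(f"{multiple:.2f}x", "meets" if multiple >= 2 else "misses", "a two-year doubling")
```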

Columns G through I show the evolution of the CPU's internal cache. The cache stores frequently used data and instructions inside the CPU for even faster recall, so the CPU doesn't have to reach out to RAM. Column K shows the number of transistors in the CPU.

 

table 1.jpg


I believe that to explain the significance of these changes, one must first understand how personal computers work. A personal computer used to be a system of separate components, each serving a specific function. This was largely because of the size limitations of integrated circuits: the manufacturing process was 180 nm in 2000 and is down to 10 nm today.

 

I’ll try to make this as painless as possible.

 

Personal computers have many different types of memory, each used for a different purpose, but they all work together to make the system faster and seamless. The system first accesses memory before the operating system even boots. The computer's BIOS (basic input/output system) firmware is stored in a read-only memory (ROM) or flash chip. Its configuration settings are kept in a small CMOS (complementary metal-oxide semiconductor) memory chip powered by a lithium coin-cell battery. This is how a personal computer becomes “self-aware.” That small chip ensures that your basic configuration information (date, time, etc.) stays the same the next time you turn on the power. Battery-backed CMOS is often called nonvolatile RAM (NVRAM). Nonvolatile memory in general retains its contents without power and also includes all forms of read-only memory (ROM), such as programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and flash memory. The BIOS holds just enough code to recognize the system and its components. It then loads the operating system into RAM when you boot up the computer.

 

Although RAM is itself a set of microchips, it doesn't store data permanently on them or on any other hardware. It is the working space where software is loaded after you boot up the computer or double-click an icon, which makes that software quickly accessible to the CPU. RAM is very important to your computer's performance simply because reading from and writing to RAM is much faster than using a standard hard drive. RAM access time is measured in nanoseconds (ns), or billionths of a second. A standard hard drive's access time is measured in milliseconds (ms), or thousandths of a second. A newer solid-state drive (SSD), also called a solid-state disk or flash disk, has no drive motor or spinning disk. It's an integrated circuit assembly of memory, but unlike your system memory, it stores data persistently. Its access times are measured in microseconds rather than milliseconds, and it has no moving parts to fail.
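
The sketch below converts every access time to nanoseconds so the units can be compared directly. The access times are assumed round numbers for illustration, not measurements: about 100 ns for RAM, 100 µs for an SSD and 10 ms for a spinning hard drive.

```python
NS_PER_US = 1_000        # 1 microsecond = 1,000 nanoseconds
NS_PER_MS = 1_000_000    # 1 millisecond = 1,000,000 nanoseconds

# Assumed, order-of-magnitude access times, expressed in nanoseconds.
access_ns = {
    "RAM": 100,
    "SSD": 100 * NS_PER_US,
    "HDD": 10 * NS_PER_MS,
}

for device, ns in access_ns.items():
    print(f"{device}: {ns:>12,} ns  ({ns // access_ns['RAM']:,}x RAM)")
```

With these assumptions, an SSD is about a thousand times slower than RAM, and a spinning drive is about a hundred times slower again.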

 

Side note: If you have a standard spinning hard drive that fails to boot up, try sticking it in the freezer for 30-60 minutes. It's a last resort rather than a guaranteed fix, but it can buy you enough time to copy your data off.


Unlike the hard drive, RAM only holds data while the power is on. When the computer shuts down, everything held in RAM disappears, which is why you lose any work you haven't saved. Saving writes the data to the hard drive so it can be accessed later. When you boot up your computer, all your software, including your operating system, is loaded into RAM again from the hard drive.


Most personal computers allow you to add RAM modules, up to the limit that the motherboard and processor support. If there's not enough RAM, the system begins writing to disk in the form of a pagefile. The pagefile is space allocated on the hard disk and used as an extension of RAM. If you open a file that requires more memory than is readily available, the system moves idle data out of RAM and into the pagefile. This process itself can consume memory and time, depending on the operating system, the processor, and how fast your hard drive can write. The advantage of a pagefile is that it keeps the data in one large file. That makes the data faster to access later than recollecting it from its original locations, which could be scattered across the disk.
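
Paging can be pictured in simplified form: RAM holds a fixed number of pages. When a new page is needed and RAM is full, the least recently used page is written out to the pagefile. Real operating systems use more elaborate replacement policies. The least-recently-used (LRU) rule and the hypothetical `SimpleRAM` class below are assumptions chosen to keep the sketch short.

```python
from collections import OrderedDict

class SimpleRAM:
    """Toy model: a fixed number of RAM page slots backed by a pagefile on disk."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.ram = OrderedDict()  # page name -> data, least recently used first
        self.pagefile = {}        # pages that were pushed out to disk
        self.page_outs = 0        # writes to the pagefile
        self.page_ins = 0         # reads back from the pagefile

    def access(self, page: str) -> bytes:
        if page in self.ram:                 # fast path: already in RAM
            self.ram.move_to_end(page)
            return self.ram[page]
        if page in self.pagefile:            # slow path: read it back from disk
            data = self.pagefile.pop(page)
            self.page_ins += 1
        else:                                # first touch: create the page
            data = page.encode()
        if len(self.ram) >= self.capacity:   # RAM full: push the idle page to disk
            idle_page, idle_data = self.ram.popitem(last=False)
            self.pagefile[idle_page] = idle_data
            self.page_outs += 1
        self.ram[page] = data
        return data

ram = SimpleRAM(capacity=2)
for p in ["browser", "email", "photo-editor", "browser"]:
    ram.access(p)
print(ram.page_outs, ram.page_ins, list(ram.ram))
```

Returning to the browser after opening a third program forces one read back from disk and a second write to disk. That extra traffic is where the slowdown comes from.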

 

Computers evolved to churn through more graphics and video, which is far more data than simple ASCII text, along with analytical processing of those pixel-intensive streams and their metadata. To handle this, a separate hardware graphics accelerator, or graphics processing unit (GPU), was added. The GPU has its own video RAM. Video RAM is similar to your system RAM, except that while the processor writes data into it, the video controller can simultaneously read from it to refresh the display. Many GPUs, whether embedded on the motherboard or on a separate expansion card, used to share system memory, essentially taking a percentage of it to improve video performance. Today, a GPU can come with its own share of memory to improve video performance. This applies whether the GPU is embedded in the motherboard (system board) or even in the CPU itself. AMD calls a CPU with an integrated GPU an Accelerated Processing Unit (APU).
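
A little arithmetic shows why a GPU benefits from its own fast memory: the display controller must re-read the whole frame buffer on every screen refresh. The sketch assumes a 1920 × 1080 display, 4 bytes (32 bits) per pixel and a 60 Hz refresh rate. These are illustrative values, and the result covers only scanning the image out to the screen, not drawing it.

```python
def framebuffer_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Memory needed to hold one full screen image."""
    return width * height * bytes_per_pixel

frame = framebuffer_bytes(1920, 1080, 4)
refresh_hz = 60
print(f"One frame: {frame:,} bytes")                      # 8,294,400 bytes
print(f"Scan-out: {frame * refresh_hz / 1e6:,.1f} MB/s")  # 497.7 MB/s
```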

 

Now that you understand a bit about how a computer works, let’s look at the diagram below, from my book Build Your Own Server. It shows the logical system architecture of a computer motherboard circa 2002. The CPU communicated with what was called the Northbridge, another chip that controlled communications between the CPU, memory and GPU. The link between the CPU and the Northbridge was called the Front Side Bus (FSB). If you wanted a high-performance machine for video games, graphics, or digital video processing and production, the FSB was usually the bottleneck, and its speed was fixed by the motherboard. You had to find the right mix of components to build an efficient high-performance machine.

 

System Architecture 1.jpg

Personal Computer System Architecture, circa 2002


As the diagram above shows, motherboards in 2002 had a Dual Independent Bus (DIB). A motherboard bus is a circuit arrangement that attaches devices to a shared line, so every signal passes by each of them. Each signal is intended for a particular device, and each device responds only to its own signals.
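
In code terms, a shared bus behaves like a broadcast. Every device sees every transaction, and address decoding decides which device responds. The device names and address ranges below are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    start: int  # first address this device decodes
    end: int    # last address this device decodes

    def responds_to(self, address: int) -> bool:
        return self.start <= address <= self.end

class Bus:
    """Every attached device sees each transaction; only the one whose range matches answers."""

    def __init__(self, devices: list[Device]) -> None:
        self.devices = devices

    def read(self, address: int) -> str:
        for device in self.devices:          # the signal reaches every device on the line...
            if device.responds_to(address):  # ...but only the addressed one acts on it
                return device.name
        raise ValueError(f"no device at {address:#x}")

bus = Bus([Device("RAM", 0x0000, 0x7FFF), Device("Video", 0x8000, 0xBFFF)])
print(bus.read(0x1234), bus.read(0x9000))  # RAM Video
```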

 

 

 

ATX Motherboard 1.jpg
Personal Computer motherboard/system board, circa 2002


The FSB was the data path and physical interface between the processor and the Northbridge. The Northbridge in turn connected to RAM and the video graphics accelerator (GPU), and on to the Southbridge. The Southbridge handled communications with the parallel printer port, serial ports, PCI, IDE and/or SATA, and USB ports. So, when reading data from RAM in 2002, you were working in nanoseconds at about 800 MB/s. By comparison, an IDE hard drive managed about 33 MB/s when writing to the pagefile.
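
A figure like 800 MB/s comes from bus arithmetic: peak bandwidth is clock rate × transfers per clock × bus width in bytes. The sketch assumes a 64-bit (8-byte) bus at 100 MHz with one transfer per clock, which is one combination that yields the 800 MB/s above. The quad-pumped 133 MHz case is an added illustration of how later Intel FSBs raised throughput.

```python
def bus_bandwidth_mb_s(clock_mhz: float, transfers_per_clock: int, width_bits: int) -> float:
    """Peak bus bandwidth in MB/s (1 MB = 1,000,000 bytes)."""
    return clock_mhz * transfers_per_clock * (width_bits / 8)

fsb_2002 = bus_bandwidth_mb_s(100, 1, 64)     # 800.0 MB/s
quad_pumped = bus_bandwidth_mb_s(133, 4, 64)  # 4256.0 MB/s (a nominal "533 MHz" FSB)
ide_hdd = 33.0                                # MB/s, the IDE hard drive figure from the text
print(fsb_2002, quad_pumped, round(fsb_2002 / ide_hdd, 1))
```

Even at the slower setting, the bus to memory was roughly 24 times faster than the IDE path to the pagefile.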

 

Intel's Sandy Bridge Core processors and AMD's Fusion processors, both released in 2011, moved the Northbridge functions, including the memory controller and graphics, onto the CPU itself and eliminated the FSB. Point-to-point links such as Intel’s QuickPath Interconnect (QPI) and AMD’s HyperTransport had already begun replacing the FSB. The Southbridge's remaining duties moved into the Platform Controller Hub (PCH), which on Intel platforms connects directly to the CPU via the Direct Media Interface (DMI).


sandy bridge 1.jpg


Personal Computing System Architecture, circa 2011


The modern CPU is more of an SPU (system processing unit). It integrates what used to be the bottlenecks, delivering smaller, faster computers and computing devices.

Beginning with the Pentium Pro, the level-2 (L2) cache was packaged with the processor. If the data resides in the L1 or L2 cache, the CPU doesn’t even need to look in RAM, which saves even more time. Intel later added an embedded level-3 (L3) cache with its multi-core processors, followed by a 128 MB L4 cache. Ivy Bridge (2011-2013) gave the GPU its own dedicated slice of L3 cache instead of having it share the entire 8 MB.
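
The payoff from each cache level can be estimated with the average memory access time (AMAT). For each level, AMAT adds the hit time to the miss rate multiplied by the cost of going one level further out. The latencies (in CPU cycles) and miss rates below are assumed for illustration and are not taken from any specific processor.

```python
def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time.

    levels: (hit_time, miss_rate) for each cache, innermost (L1) first.
    memory_latency: cost of going all the way out to RAM.
    """
    cost = memory_latency
    for hit_time, miss_rate in reversed(levels):  # fold from RAM inward to L1
        cost = hit_time + miss_rate * cost
    return cost

# Assumed figures in CPU cycles: L1 4, L2 12, L3 40, RAM 200.
no_cache = amat([], 200)
with_caches = amat([(4, 0.10), (12, 0.40), (40, 0.50)], 200)
print(no_cache, round(with_caches, 2))
```

Under these assumptions, the cache hierarchy cuts the average access from 200 cycles to under 11.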

 

ITX Motherboard 1.jpg

New Mini ITX Motherboard/System Board


Below I’ve added some images to show visually what all this geek-speak means. There have long been discussions that Moore’s Law will eventually collapse. The predicted date has repeatedly been pushed back another decade. Some academics believe the end is still 600 years away, and others believe that once we create a transistor the size of an atom, there’s nowhere else to go.

In 2003, Intel predicted the collapse would come between 2013 and 2018, which seems accurate based on this little exercise. The numbers would be different, though, if I added high-performance server processors like the 18-core Haswell-EP Xeon E5, with 5,560,000,000 transistors, or even the Xbox One’s processor, with 5,000,000,000 transistors. However, I’m not sure it’s a fair assessment, because I believe the rules have changed. I no longer see the Central Processing Unit that managed a sophisticated system of integrated circuits and components 15 years ago. I see the SPU, designed to expedite the rapid development and deployment of the Internet of Things: smaller, faster, smarter computing devices, not personal computers.
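
One rough way to check the "collapse" claim is to work out the doubling period from two data points: T = Δt · ln 2 / ln(N₂/N₁). The sketch pairs the 5.56-billion-transistor 18-core Xeon cited above, a 2014 part, with the Intel 4004 of 1971, which had about 2,300 transistors. Both are added reference points and are not figures from the table.

```python
import math

def doubling_period(year1: int, count1: float, year2: int, count2: float) -> float:
    """Years per doubling implied by growth from count1 (year1) to count2 (year2)."""
    return (year2 - year1) * math.log(2) / math.log(count2 / count1)

# Intel 4004 (1971, ~2,300 transistors) -> 18-core Xeon E5 v3 (2014, ~5.56 billion)
print(round(doubling_period(1971, 2_300, 2014, 5_560_000_000), 2))  # 2.03
```

The long-run average comes out at almost exactly two years. A single endpoint-to-endpoint figure like this says nothing about whether the pace slowed in the most recent years, which is the question the collapse predictions are about.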

 

Computing continues to evolve. It will be interesting to watch as we now slip the new “personal computer” into our pocket and/or purse.




PC 2000.jpg

Personal computing, circa 2000

 

iPhone.jpg

 

Personal Computing, circa 2016 (can't get more personal than something you carry in your pocket)

 

Tomb Raider II.jpg

Video game graphics, circa 1997 (Tomb Raider II)

 

Tomb Raider 2016.jpg

Video game graphics, circa 2015 (Rise of the Tomb Raider). Tomb Raider is copyright 2016 Square Enix.

 

CCTV 2005.png

Video surveillance CCTV resolution 30fps (352 x 240 pixels CIF), circa 2005

 

CCTV 2016.jpg

Video surveillance CCTV resolution 30fps (2048 x 1536 pixels 3MP), circa 2016

A few devices on the market provide network failover and load balancing across multiple Wide Area Network (WAN) connections. The idea is that if you lose one WAN connection, the router seamlessly fails over to a second and/or third WAN connection to transmit and receive crucial data. It can also load balance traffic across all the connections. When those connections use 4G LTE, this can relieve the effects of heavy public usage and network saturation.
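
Failover as described above amounts to picking the highest-priority WAN link that is currently healthy. The sketch below is a hypothetical model and not the Cradlepoint or Pepwave configuration interface. The `healthy` flag is an assumption that stands in for whatever health check a real router runs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class WanLink:
    name: str
    priority: int  # lower number = preferred
    healthy: bool  # stand-in for the router's own link health check

def active_link(links: list[WanLink]) -> Optional[WanLink]:
    """Return the highest-priority healthy link, or None if every WAN is down."""
    for link in sorted(links, key=lambda w: w.priority):
        if link.healthy:
            return link
    return None

links = [WanLink("Uverse", 1, True), WanLink("ATT LTE", 2, True), WanLink("Sprint", 3, True)]
print(active_link(links).name)  # Uverse
links[0].healthy = False        # simulate unplugging the wired connection
print(active_link(links).name)  # ATT LTE
```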

load balancing road.jpg

I've researched and configured several of these devices at two locations, focusing primarily on the Cradlepoint MBR1400 and IBR600, along with the Pepwave MAX 700. To configure these devices properly, you need at least two WAN connections.

 

My first location was an isolated cottage where, with some perseverance, I got AT&T to install a Uverse Internet connection for $40 per month. The infrastructure limited bandwidth to 12Mbps download (excellent for Netflix) and only 1Mbps upload (poor for uploading surveillance video). Getting a wired connection may not seem like a monumental task nowadays, but this cottage was on a small, secluded island in the middle of a river. It was well worth the effort, because wired broadband connections are sold as infrastructure. Cellular carriers, by contrast, sell by bandwidth or data usage. The data usage costs of uploading (or downloading) security video can be astronomical. Even unlimited 4G LTE data plans do not cover uploading a continuous, bandwidth-intensive stream of security video. Experience has shown that because it's a shared public network, bandwidth will be throttled at peak times, reducing quality and performance.
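
The arithmetic behind "astronomical" is simple: a continuous stream's monthly volume is its bit rate × the seconds in a month ÷ 8 bits per byte. The sketch assumes a 30-day month and decimal units (1 GB = 10^9 bytes). The bit rates are examples: 1 Mbps matches the Uverse upload, and 5 Mbps matches the trimmed camera load discussed later in this post.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s in an assumed 30-day month

def monthly_gb(stream_mbps: float) -> float:
    """Data volume (decimal GB) of a stream running nonstop for a month."""
    bits = stream_mbps * 1_000_000 * SECONDS_PER_MONTH
    return bits / 8 / 1_000_000_000

for mbps in (1, 5):
    print(f"{mbps} Mbps nonstop: {monthly_gb(mbps):,.0f} GB/month")
```

Even a modest 1 Mbps stream adds up to roughly a third of a terabyte each month.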

 

Because the cottage was secluded and used infrequently, I had free rein to temporarily install anything on any pole or structure, with no city ordinances, codes, permits or privacy issues. I installed four cameras to test this failover and load balancing setup. Power was a challenge as usual, but nothing an upgraded circuit breaker and some trenching couldn't handle.

 

I decided to install various cameras for comparison: my analog Pelco Spectra IV PTZ workhorse, connected to a digital video encoder; a Hikvision 1080p PTZ; a Hikvision Darkfighter 1080p PTZ; and a fixed 3MP Longse IP camera with IR. For optimal quality, these four cameras together need at least 10Mbps of upload bandwidth through a WAN connection to be viewed remotely. I trimmed the frame rates substantially and got that down to 5Mbps. So, how do I get a 5Mbps elephant through an ATT Uverse 1Mbps fire hose?

 

Over the LAN, the Milestone XProtect server received the full 15fps at maximum resolution from every camera without any issues. The Pelco used MPEG-4, and the Hikvision and Longse cameras used H.264. The Pelco Spectra IV stream was set to the maximum analog resolution, 4CIF, and performed as expected: sharp, smooth, and exceptionally good in low light. Next to the Hikvision 1080p HD PTZ stream, however, its image quality was found wanting. The Hikvision Darkfighter image was crisper, with far more detail across the area of coverage, including color correction, leaves, grass and even wood grain. It also had the best low-light performance without IR I've seen to date. However, the Pelco's pan-tilt-zoom outshone the clunky Hikvision's. This probably has something to do with the TCP commands used for PTZ, versus the UDP used for video streaming. It may also reflect the processing power needed to stream 1080p rather than only 4CIF.

 

The 3MP Longse camera was powered by a PoE injector; all the equipment is in climate-controlled L-com enclosures. I have another of these cameras, also powered by a PoE injector, at the other location. Its image quality is impressive, although its maximum frame rate is 15fps. Of course, that location has Comcast/Xfinity internet with 20Mbps of upload bandwidth.

 

The math shows that the connectivity issues are neither switch- nor power-related. They come from the limited ATT Uverse 1Mbps upload speed, which barely delivered a single frame per second from the cameras and left the PTZ controls erratic. A 3MP camera at full resolution, using H.264 at 30% compression, needs about 900Kb for a single frame. A 1Mbps upload pipe therefore leaves little room for anything beyond one frame per second from the 3MP camera alone.
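
The frame-rate ceiling is the upload bandwidth divided by the bits per frame. The sketch below uses the figures above: about 900 Kb per 3MP frame and a 1 Mbps upload, treated here as 1,000 Kbps. It shows why the link tops out at roughly one frame per second, and what the same camera could do on the 20 Mbps upload at the other location. It is a simplification that treats every frame as a full 900 Kb frame and ignores protocol overhead and the other cameras sharing the pipe.

```python
def max_fps(upload_kbps: float, kb_per_frame: float) -> float:
    """Frames per second a link can carry if it carries nothing else."""
    return upload_kbps / kb_per_frame

print(round(max_fps(1_000, 900), 2))   # 1.11 fps on the 1 Mbps Uverse upload
print(round(max_fps(20_000, 900), 1))  # 22.2 fps on a 20 Mbps upload
```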

 

Obviously, this seemed like a perfect location for some load balancing and failover testing using a 4G LTE network or two. I started with the Cradlepoint AER2100 Multi-WAN Router, which uses swappable USB modems and has exceptional horsepower. However, its 8" x 10" x 2" size wouldn't fit in the enclosure. The smaller COR IBR600 was the right size, but its integrated modem limited flexibility. The Cradlepoint MBR1400 offered a smaller form factor and swappable USB modem ports. However, unlike the AER2100 or IBR600, the MBR1400 isn't a hardened device with extended temperature specifications, so I had to upgrade the enclosure to handle extended temperatures.

 

CP.jpg

Cradlepoint MBR1400


I would have liked to use the Pepwave MAX700, but I’ve grown too fond of it as a platform for testing devices in my home lab. The Pepwave MAX700 includes two separate wired WAN inputs and up to four USB modem inputs.

 

pepwave.jpg

Pepwave MAX700

Below is the configuration. The ATT Uverse connection is the primary WAN, with an ATT 4G LTE connection and a Sprint 3G/4G connection (limited bandwidth in the region) for failover and load balancing. The ATT 4G LTE and Sprint 3G/4G connections are configured as “on-demand,” so they carry traffic only when needed, which limits data usage.
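
"On-demand" can be pictured as bringing backup links up only when the links already in use can't carry the load. The sketch below uses an assumed, simplified policy (greedy, in priority order) and is not Cradlepoint's actual algorithm. The 1 Mbps Uverse upload comes from earlier in this post. The LTE and Sprint capacities are hypothetical.

```python
def links_needed(demand_mbps: float, links: list[tuple[str, float]]) -> list[str]:
    """Activate links in priority order until their combined capacity covers demand.

    links: (name, upload capacity in Mbps), highest priority first.
    The primary link is always active.
    """
    active, capacity = [], 0.0
    for name, mbps in links:
        if active and capacity >= demand_mbps:
            break                      # enough capacity; leave the rest idle
        active.append(name)
        capacity += mbps
    return active

wan = [("Uverse", 1.0), ("ATT LTE", 4.0), ("Sprint", 1.5)]  # LTE/Sprint capacities assumed
print(links_needed(0.5, wan))  # ['Uverse']
print(links_needed(5.0, wan))  # ['Uverse', 'ATT LTE']
print(links_needed(6.0, wan))  # ['Uverse', 'ATT LTE', 'Sprint']
```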

 

Cradlepoint configured with three load balancing and failover WAN connections

 

WAN bandwidth demand jumps when I stream from my Milestone video management server to view the cameras remotely, especially when viewing all four cameras at once. Now, thanks to load balancing, I get crisp imagery and smoother control of the PTZ cameras.
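
One common way to spread new connections across links in proportion to their capacity is smooth weighted round-robin, the scheme nginx uses for upstream servers. It is swapped in here for illustration. The routers in this post may use a different method, and the weights below are hypothetical. In this simplified model, each session stays on one link, so a single stream is not split across connections.

```python
def smooth_weighted_round_robin(weights: dict[str, int], n: int) -> list[str]:
    """Assign n new sessions across links in proportion to their weights.

    Each round, every link gains its weight, the current leader is picked,
    and the leader pays back the total weight so the others can catch up.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks

# Hypothetical weights roughly proportional to each link's upload capacity.
sessions = smooth_weighted_round_robin({"Uverse": 1, "ATT LTE": 4, "Sprint": 1}, 6)
print(sessions)  # ATT LTE four times, Uverse and Sprint once each
```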

Data usage example chart


Incidentally, when I tested the failover feature, by unplugging the ATT Uverse wired connection, it took me a while to realize that it worked. It was that seamless.

 

Remote live view of the cameras using the Milestone XProtect Web Client