
When you look at the painting above, you see richness, depth and life, even within a still two-dimensional image. That's because an artist can see what ordinary people fail to see and is able to represent that elusive vision in a visual medium using just paint and canvas. These artistic nuances lift the limits of vision to a higher level of perception. The artist learns this deeper level of visual intelligence from an appreciation of the natural beauty and design around them: a fascination, fueled by an emotional desire to really see, to understand the universe around them, and to share that unique vision in a simplified two-dimensional design using a richer palette of vibrant color, geometric shapes, tones and perceived depth. An artist can see beyond pixel color, because they have multiple senses and sensory fusion.


Although human beings only begin to learn about the visual world through the power of sight after birth, we begin to understand it even before birth through multiple senses such as sound, touch and smell, and through embedded genetic code that has given us the ability to identify more shades of green than any other color, so we may spot a predator hidden in the jungle and survive. We already have metadata to draw from in life. Infants may be limited to frowning, smiling and crying to communicate, but they are sponges for knowledge, using our unique multi-dimensional channels for learning. An infant knows their mother’s heartbeat, taste and voice; recognizes the deeper baritone voice of their father, his larger, stronger hands, and even his smell. Their connection to the world around them begins small and grows as they grow. They eventually understand words from sounds, not from the written word. That comes later, along with the magic of mathematics, when we jump-start the left side of our brain and formulaic language, opening up yet another channel of learning.

We eventually understand that the image, person, video, voice, sound, touch, heartbeat, smell and the word “Momma” all mean the same thing. Our intelligence and learning continue to draw from all our senses throughout our lives. We eventually learn whether shadowy figures in hoodies are a threat or just a couple of kids who forgot their coats, by how they’re standing, talking, moving and whispering, and even by how they smell. Is it a sixth sense that provides us with this threat assessment, or a lifetime of multi-sensory metadata stored within our brains? How do we know when someone is looking at us from across a room full of people, and not at the person behind us?

Intelligence, as we understand it, is developed from a correlation of multiple senses, an intuitive connection to the universe around us, and life experience. We learn by sight, sound, smell, touch, taste and maybe even an intuitive, spiritual sixth sense of our surroundings. This happens while we are awake, and even in our dreams.

Neural Network

An artist draws from that inner space to push themselves to create. They don’t see a shadow merely as a grayed, two-dimensional patch where less light is reflected, but also see the colors and life within that shadow. An artist can see life, even in darkness.

What happens to intelligence when the only sense available is sight, and not sight as we understand it, but the sight of a newborn infant opening their eyes for the very first time?

An infant may be overwhelmed and confused while absorbing data from a three-dimensional world in awe and wonder. If we delete the human elements and feelings, only data absorption remains. An infant with all five senses may recognize their mother only by her heartbeat, voice or smell. An artificial infant (or A.I.) with only the ability of sight, without any self-knowledge (metadata), and without desires and emotions fueling a drive to learn, to understand, to survive and to think (just left-brain formulaic language of ones and zeros), would simply continue absorbing data, collecting metadata from the digital pixels of moving objects across multiple frames per second. This miracle of sight is also limited: with only one eye, it cannot observe in three dimensions, and it only compiles data for a flat, two-dimensional representation of the analog world.

Without nine months of sounds and warmth, and without any pre-programmed genetic code, our A.I. is at a disadvantage once it’s powered up for the first time (birth). A more powerful, larger image processor (retina) provides better sight, even after dark. Still, at this point, all our A.I. can do is capture pixel color data and observe motion: pixel blobs moving around its field of view like atoms forming molecules. It needs to be trained to learn what these random bits and bytes are, beyond color and motion, even within a limited, single-frame field of existence.

Artificial intelligence (AI) is defined as the capability of a machine to imitate intelligent human behavior. We’ve already identified that human intelligence is multi-sensory, with hereditary pre-programmed metadata, so is it possible for a one-eyed artificial infant that cannot hear, let alone listen; cannot speak, let alone ask why; and cannot even smell or taste, to develop true intelligence?

Most of the video analytics software applications I’ve recently tested include the ability to teach perspective, recreating the three-dimensional world from a single, one-eyed, two-dimensional field of view. The field of view is fixed, and metadata is created to identify that the ten-pixel object moving at twenty kilometers an hour 100 meters away is the same type of object as the 200-pixel object moving at twenty kilometers an hour two meters away (distance must also be taught). That object can then be categorized, automatically or manually, as a “vehicle,” so the artificial infant (machine) can answer the question “What is a vehicle?” A vehicle is categorized only by its estimated size, shape and speed (calculated from how far it moves between frames).
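The size-and-speed reasoning above can be sketched in a few lines. This is a minimal illustration, not how any particular product works: it assumes an idealized pinhole camera with a known focal length in pixels (`focal_px`), a known distance to the object, and pixel counts measured as the object’s length along its direction of travel. The function names and classification thresholds are hypothetical.

```python
"""Size-and-speed classification under a hand-taught perspective (sketch)."""


def real_length_m(pixel_length: float, distance_m: float, focal_px: float) -> float:
    # Pinhole model: pixel_length = focal_px * real_length / distance,
    # so the same object looks smaller the farther away it is.
    return pixel_length * distance_m / focal_px


def speed_kmh(displacement_m: float, frames_elapsed: int, fps: float) -> float:
    # Speed = distance moved / time spanned by the frames, converted m/s -> km/h.
    seconds = frames_elapsed / fps
    return displacement_m / seconds * 3.6


def classify(length_m: float, speed: float) -> str:
    # Hypothetical rules: long and moving -> vehicle; short and slow -> person.
    if length_m >= 3.0 and speed >= 5.0:
        return "vehicle"
    if 0.3 <= length_m <= 1.2 and speed < 15.0:
        return "person"
    return "unknown"


if __name__ == "__main__":
    focal_px = 1000.0
    far = real_length_m(50, 100.0, focal_px)    # 50 px seen at 100 m
    near = real_length_m(1000, 5.0, focal_px)   # 1000 px seen at 5 m
    v = speed_kmh(displacement_m=5.56, frames_elapsed=30, fps=30.0)
    print(far, near, round(v, 1), classify(far, v))  # 5.0 5.0 20.0 vehicle
```

Both observations resolve to the same 5 m object, which is the point of teaching perspective: once distance is known, pixel size stops being misleading.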


Traffic Cam


In time, much like a newborn baby, it will continue to collect data and become smarter. If it’s a multi-dimensional analytical algorithm, it may capture additional metadata beyond just “vehicle,” such as color, shape, size and even vehicle type, and store that information for the future: mounds and mounds of categorical metadata that can be used for forensic searching, real-time alerting and, more importantly, building knowledge.

That’s how the typical artificial infant, with machine learning (and machine teaching) software, can identify a vehicle. Unfortunately, that newfound knowledge becomes irrelevant the moment the artificial infant changes its field of view by turning its head (panning, tilting and zooming). Everything it’s been taught and has learned is now out of context, because its knowledge is limited to understanding moving pixels, not objects.


Shape of a Car


A young child knows what an automobile looks like by identifying the simple geometric shapes that make it up. We also learn very early that automobiles belong on the street and we (people) should stay on the sidewalk. The street is where we see all the larger parked and moving vehicles. They should never come onto the sidewalk, where all the people walk. This is something a human being learns before preschool.


We also know a person is more than just a mass of moving pixels.


Stick Figures


People have a head, shoulders, arms and legs; even a child knows that. As human beings, we understand the concept of objects and shapes, even while panning, tilting and zooming, from the moment we first discover how to turn our heads, reinforced by a lifetime of viewing the world through television.

Our A.I. must also go to art school, learning to draw its environment and to classify objects as something more than moving blobs against a fixed background that is largely ignored. Identifying simple shapes, patterns, perspective and a horizon line can give our A.I. additional input for additional intelligence. Train the machines to draw, to create, to wonder in awe at the dynamism of the analog world around them through deep learning neural networks and sensor fusion.

A deep learning neural net can provide the ability to draw “digital stick figures” through edge detection, introducing the shape of people, with a head, arms and legs, at any angle or size.


Edge Detection 1


Edge Detection 2
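To make “edge detection” concrete, here is a sketch using the classic Sobel operator on a tiny grayscale image stored as nested lists. Note the swap: Sobel is a fixed, hand-designed filter, whereas a deep learning network learns its own edge-like filters from training data, so this only illustrates what an edge map is. The `threshold` value is an arbitrary assumption.

```python
import math

# Sobel kernels: horizontal and vertical brightness gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


def edge_map(img, threshold=100.0):
    # 1 where the brightness changes sharply (an edge), 0 elsewhere.
    return [[1 if v >= threshold else 0 for v in row] for row in sobel_magnitude(img)]


if __name__ == "__main__":
    # Dark background (0) with a bright vertical bar (255) in columns 3-4.
    img = [[255 if x in (3, 4) else 0 for x in range(8)] for _ in range(5)]
    for row in edge_map(img):
        print("".join("#" if v else "." for v in row))
```

The bar’s outline appears as marked columns on either side of each boundary, the raw material a network combines into shapes such as heads, arms and legs.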

Identifying the “street” can teach it “traffic,” and it can store that knowledge when the field of view changes. The same goes for a sidewalk, where people walk, run and jog. Using predictive analytics, it can extend the simple tracing of the field of view and start drawing and tagging the environment beyond what is initially visible.

Every time the A.I. is panned, tilted or zoomed, it can validate its predictive analysis, verify it, extend its reach to learn the world around it, and store it for a 360-degree view of its present environment, no matter where it may be “looking.” Just like a human being.

This can be accomplished by modifying an existing algorithm: training the deep learning neural net to understand the expected environment using stationary markers, and even the shadows cast by the sun, to determine cardinal directions. Since most off-the-shelf video analytics solutions ignore the "background" and focus mostly on moving pixels, this would require a deep analysis of the background to understand the environment, and then identifying objects at whatever focal length is required to see them, building a thorough understanding of the 360-degree world around them.
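One way to let a pan-tilt-zoom camera keep its “drawing” after it turns its head is to store what it learns by absolute bearing rather than by pixel position. The sketch below assumes the camera reports its pan angle in degrees clockwise from north and behaves like an ideal pinhole lens; zoom is modeled only by changing the horizontal field of view (`hfov_deg`), tilt is ignored, and the class and function names are hypothetical.

```python
import math
from collections import defaultdict


def pixel_to_bearing(x_px, image_width, pan_deg, hfov_deg):
    """Absolute compass bearing of an image column, given the camera's pan angle."""
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = (image_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    offset_deg = math.degrees(math.atan((x_px - image_width / 2) / focal_px))
    return (pan_deg + offset_deg) % 360


class BearingMap:
    """Labels remembered by bearing bin, so they survive a change of view."""

    def __init__(self, bin_deg=10):
        self.bin_deg = bin_deg
        self.labels = defaultdict(set)

    def _bin(self, bearing_deg):
        return int(bearing_deg // self.bin_deg) * self.bin_deg

    def tag(self, bearing_deg, label):
        self.labels[self._bin(bearing_deg)].add(label)

    def lookup(self, bearing_deg):
        return set(self.labels.get(self._bin(bearing_deg), set()))


if __name__ == "__main__":
    memory = BearingMap()
    # Camera panned to 90 degrees sees a street near the right edge of the frame.
    memory.tag(pixel_to_bearing(1800, 1920, pan_deg=90, hfov_deg=60), "street")
    # After panning elsewhere, the A.I. still knows what lies at ~115 degrees.
    print(memory.lookup(115.0))  # {'street'}
```

Ten-degree bins are a coarse, arbitrary choice; a finer grid, or one that also tracks tilt, would follow the same pattern.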
Our A.I. also needs to start asking questions.

Where am I? What is this? What is this structure? Why is that truck double-parked?

This information may be difficult to obtain using the limited power of one-eyed sight through pixel color, so, much like a human being, our A.I. would need other input channels of data absorption. Asking the A.I. across the street, which holds its own metadata from a completely different vantage point, lets the two views be stitched together to better understand the objects and environment they share. Our A.I. now has two eyes for better depth perception. Although it is still reading two-dimensional imagery, now from two points of view, it is opening additional channels of data absorption and beginning multi-sensory fusion. Stitching together data from varying angles and sources could provide super-human abilities.
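A minimal sketch of how a second “eye” across the street adds depth: if each camera knows its own position and reports the bearing at which it sees the same object, the object sits where the two sight lines cross. This assumes both cameras share one flat local coordinate frame in meters (x east, y north) and report compass bearings in degrees; the positions and function names are hypothetical, and real systems must also handle measurement noise.

```python
import math


def _direction(bearing_deg):
    # Compass bearing (0 = north, clockwise) -> unit vector (x east, y north).
    rad = math.radians(bearing_deg)
    return math.sin(rad), math.cos(rad)


def _cross(a, b):
    return a[0] * b[1] - a[1] * b[0]


def triangulate(p1, bearing1_deg, p2, bearing2_deg):
    """Point where the two cameras' sight lines meet, or None if they never do."""
    d1, d2 = _direction(bearing1_deg), _direction(bearing2_deg)
    denom = _cross(d1, d2)
    if abs(denom) < 1e-12:
        return None  # parallel sight lines
    delta = (p2[0] - p1[0], p2[1] - p1[1])
    # Solve p1 + t*d1 == p2 + s*d2 for the distances t and s along each ray.
    t = _cross(delta, d2) / denom
    s = _cross(delta, d1) / denom
    if t < 0 or s < 0:
        return None  # the lines cross behind at least one camera
    return p1[0] + t * d1[0], p1[1] + t * d1[1]


if __name__ == "__main__":
    # Camera A at the origin looks northeast; camera B, 20 m east, looks northwest.
    print(triangulate((0.0, 0.0), 45.0, (20.0, 0.0), 315.0))  # about (10.0, 10.0)
```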

Fusing data from fire and burglar alarms, 911 emergency calls and gunshot detection will give our A.I. the miracle of hearing, without ears. It can smell the ozone or heavy pollen in the air through weather data, without a nose.

A 3D LiDAR Sensor can truly recreate the three-dimensional world using pulses of infrared light, giving our A.I. the ability to “feel” real world shapes and actual distance. A digital method of recognizing a physical object by touch.
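The “touch” comes from timing light. For a pulsed time-of-flight measurement, distance is the speed of light times the round-trip time, divided by two, because the pulse travels out and back. This is the textbook relation, not a description of any specific sensor’s internals; many ToF cameras instead measure the phase shift of modulated light, which rests on the same light-travel-time idea. The function names are illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_distance_m(round_trip_s: float) -> float:
    # The pulse travels out to the surface and back, so halve the path length.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2


def depth_map(round_trip_times):
    # One timing measurement per sensor pixel -> one distance per pixel.
    return [[tof_distance_m(t) for t in row] for row in round_trip_times]


if __name__ == "__main__":
    print(round(tof_distance_m(20e-9), 3))  # 20 ns round trip ~ 2.998 m
```

A round trip of only 20 nanoseconds corresponds to about three meters, which is why these sensors need very precise timing.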
Once an object leaves our A.I.’s field of view, it can follow without feet, by connecting with neighboring A.I. sources.
The question “Where am I?” could be answered by learning how to read, using Optical Character Recognition (OCR). If OCR can read the letters off a billboard behind home plate on a live broadcast, it can certainly read street signs, the sign on the bank across the street, or the FedEx logo on a double-parked truck.

If the street sign reads “8th Street NE,” and the business signs help identify the surrounding environment, our A.I. is at the intersection of 8th Street and H Street NE. How do I know this? I asked the teacher, or in this case, an integrated geospatial mapping application. Integrating outside sources and sensors would provide the additional inputs our A.I. needs to really build artificial intelligence, just as human beings do: when we do not know, we ask.
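The “ask the teacher” step can be sketched as matching text already read by OCR against a gazetteer of intersections. The OCR itself is assumed to have happened elsewhere; `INTERSECTIONS` is a hypothetical one-entry stand-in for a real geospatial service, and the abbreviation table and function names are illustrative only.

```python
import re

# Hypothetical stand-in for a geospatial service: street pairs -> intersection.
INTERSECTIONS = {
    frozenset({"8TH ST NE", "H ST NE"}): "8th St NE & H St NE",
}

# Common sign abbreviations, so "8th Street NE" and "8TH ST NE" match.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "NORTHEAST": "NE"}


def normalize(sign_text: str) -> str:
    words = re.sub(r"[^A-Z0-9 ]", " ", sign_text.upper()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def locate(ocr_texts):
    """Return the intersection whose street names all appear in the OCR text."""
    seen = {normalize(t) for t in ocr_texts}
    for streets, name in INTERSECTIONS.items():
        if streets <= seen:
            return name
    return None  # not enough clues: time to ask another source


if __name__ == "__main__":
    print(locate(["8th Street NE", "H St NE", "FedEx"]))
```

Text that matches nothing, like the FedEx logo, is simply carried along; in a fuller system it could become another clue.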
Deep learning and sensor fusion grace our A.I. with the ability to hear without ears, smell without a nose, touch without hands, read without first learning the alphabet, and even ask questions like “Why?”

Why am I here?

The answer to this question for our artificial infant is far simpler than it is when a human child eventually learns the word “Why?” At some point in the development of intelligence, sight, sound, smell, feel and touch are not enough; we need to ask why.

I recently read about how artificial intelligence is being developed to play games. In one article, the A.I. was given the means to slow its opponent down by shooting a virtual laser beam that momentarily discombobulated them. The A.I. became more aggressive when it was losing and resorted to repeatedly blasting its opponent with laser fire. If we teach AI to be competitive, even in something as seemingly innocent as a game, should we also program in the idea that it’s not whether you win or lose, but how you play the game that matters? Really? In any competitive game, no one likes to lose. The goal is always to win. So really, it’s just a war game. Do we really want to teach the machines to win at war, at all costs? Has anyone ever seen the movies WarGames or The Terminator?

We should teach artificial intelligence not to be a chess master (a game of war) but instead to be an artist: to learn the visual complexity of the analog world around it and observe the dynamism of life that human beings overlook and take for granted, all made up of colors, shapes, lights, patterns, shadows and natural wonder. The miraculous, simple, natural beauty of the analog world.

It’s my belief that there cannot be true artificial intelligence until the digital world fuses with the analog world, and that requires the miracle of sight.

Teach the machines to create as an artist, and they will see what we cannot see.

My deep dive into the introduction of a core suite of Hitachi Video Analytics (HVA) last year, as presented in my article Digital Video Analytics - Test Results, opened the door to a flood of inquiries and use cases. Fortunately, my hundreds of testing exercises gave me insight into how the analytical engine in each module could be calibrated for specific use cases, and into whether something within our expanding portfolio could meet the requirements, including the introduction of the Hitachi ToF 3D Sensor, which improves HVA by delivering a digital video stream that is completely unaffected by lighting, shadows and color variances.

In a continued effort to expand our video intelligence capabilities and flexibility, I tested several video analytics products, including licensed third-party commercial off-the-shelf (COTS) and our own proprietary solutions, for a number of specific use cases, varying camera distance, angle, focal length and lighting. This exercise uncovered valuable insights into the differences between analytical engines and the limitations in how algorithms read pixel color, contrast and frames per second. It confirmed that one size does not fit all, whether in analytical capabilities, flexibility or solution architecture.

A popular question was “Can HVA Object Detector send an alert if it recognizes X?”, with “X” being a classified object (e.g. gun, knife, box). HVA Object Detector detects objects based on calibration variables, which include identifying a static object by the number of overall blocks or connected blocks (pixel size) within the restricted zones, using machine learning. HVA Object Detector learns the objects within a zone without needing to classify them, and continues to learn through variances in illumination states and motion. That’s all it does, and it does it very well, so it can cover a wide variety of use cases, but not all of them. Object recognition and identification require deep learning, even when using the Hitachi 3D LiDAR sensor (more on this later).
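The idea of flagging a static object by counting connected blocks inside a zone can be sketched as below. This is a generic background-differencing approach written for illustration, not HVA’s actual engine: it assumes each frame has already been reduced to a grid of per-block mean brightness values, and the class name, thresholds and slow background learning rate (`alpha`) are all hypothetical.

```python
def largest_group(cells):
    """Size of the largest 4-connected group in a set of (row, col) cells."""
    seen, best = set(), 0
    for start in cells:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            r, c = stack.pop()
            size += 1
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in cells and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


class StaticObjectDetector:
    def __init__(self, background, zone, diff_threshold=30.0,
                 persist_frames=5, min_blocks=2, alpha=0.05):
        self.bg = [list(map(float, row)) for row in background]
        self.zone = set(zone)                 # (row, col) blocks to watch
        self.diff_threshold = diff_threshold
        self.persist_frames = persist_frames  # frames a change must last
        self.min_blocks = min_blocks          # minimum connected object size
        self.alpha = alpha                    # background learning rate
        self.age = [[0] * len(row) for row in background]

    def update(self, frame):
        """Feed one frame of block means; return True if a static object is present."""
        static = set()
        for r, row in enumerate(frame):
            for c, value in enumerate(row):
                if abs(value - self.bg[r][c]) > self.diff_threshold:
                    self.age[r][c] += 1       # block differs from background
                else:
                    self.age[r][c] = 0
                    # Slowly absorb gradual lighting changes into the background.
                    self.bg[r][c] += self.alpha * (value - self.bg[r][c])
                if self.age[r][c] >= self.persist_frames and (r, c) in self.zone:
                    static.add((r, c))
        # Anything that keeps moving resets its blocks and never becomes static.
        return largest_group(static) >= self.min_blocks
```

A bag left on a bench makes the same blocks differ for many frames in a row, while a passer-by changes each block only briefly, so only the bag triggers.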

Below is an example of HVA Object Detector recognizing a black backpack on a black bench. I created another example that successfully sent an alert when a white garbage bag was added to the trashcan peeking out from behind the station structure.


Watch Object Left Behind

The following examples, which fell well outside the recommended system requirements (experiments), uncovered more insight into HVA Object Detector’s effectiveness and capabilities. I used HVA Object Detector to turn a standard CIF-resolution camera stream monitoring the pantograph of a train car into a source of real-time alerts when the pantograph is damaged. Although there is constant motion within the video, and the specifications require a fixed field of view, I limited the analysis zone to an outline of the pantograph itself, ignoring the motion in the rest of the scene.


Watch Pantograph

System requirements for HVA Object Detector include a static, fixed field-of-view camera with a minimum resolution of CIF (352x240) and a minimum of one frame per second. The examples above and below do not meet all of those requirements, but with some persistence, I was able to get positive results.


Watch Ship Alert

The unique configuration variables also make it possible to use HVA Object Detector for loitering alerts (HVA Loitering Detector is scheduled to be released soon).


Watch Loitering

This can also work for object theft, as the demo video below shows. Given enough frames to learn the entire scene and its objects, HVA Object Detector will send an alert if those objects are removed or disappear.


Watch Gallery Burglary

Another early use case question was “Can HVA Traffic Analyzer provide alerts for restricted or double parking?” HVA Traffic Analyzer monitors traffic and classifies moving pixel objects as a car, truck, bus or bike.


Watch Traffic Analyzer

While HVA Traffic Analyzer may not be engineered to identify double-parked vehicles, HVA Object Detector can be calibrated to ignore moving objects, and even to ignore static objects for a predefined time before sending an alert. That timer can take into account both the time that counts as double parking and the time vehicles spend waiting at a traffic signal. I needed to try it. The only footage I could dig up was old CIF-resolution video from 2008 of a person moving their vehicle from a restricted parking area. Even with the poor, standard resolution, and people walking across the restricted zone, I could calibrate the demo video for an accurate alert. This opens the opportunity to turn decade-old traffic cameras into IoT sensors by scheduling pan-tilt-zoom presets within the video management software alongside the automated enable/disable function of HVA.
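The timing logic behind a double-parking alert can be sketched as a dwell timer: an object only triggers once it has stayed in the restricted zone longer than a limit chosen to exceed a full traffic-signal cycle. This is a simplified stand-in, since HVA Object Detector works on blocks rather than tracked IDs; here the object IDs are assumed to come from an upstream tracker, and the 120-second limit is a hypothetical value for a 90-second signal cycle.

```python
class DwellTimer:
    """Alert when a tracked object stays in a zone longer than limit_s seconds."""

    def __init__(self, limit_s: float):
        self.limit_s = limit_s
        self.first_seen = {}   # object id -> time it entered the zone
        self.alerted = set()

    def update(self, now_s: float, ids_in_zone):
        # Forget anything that has left the zone (e.g. the light turned green).
        for obj in list(self.first_seen):
            if obj not in ids_in_zone:
                del self.first_seen[obj]
                self.alerted.discard(obj)
        new_alerts = []
        for obj in sorted(ids_in_zone):
            start = self.first_seen.setdefault(obj, now_s)
            if now_s - start >= self.limit_s and obj not in self.alerted:
                self.alerted.add(obj)
                new_alerts.append(obj)
        return new_alerts


if __name__ == "__main__":
    timer = DwellTimer(limit_s=120)
    # A car waits at the light for a while and leaves; a truck never moves.
    for t, ids in [(0, {"car", "truck"}), (60, {"car", "truck"}),
                   (90, {"truck"}), (120, {"truck"})]:
        print(t, timer.update(t, ids))
```

Only the truck is reported, once, at the 120-second mark; the car waiting at the signal is forgotten as soon as it drives off.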


Watch Parking Video

Although HVA Object Detector may not provide actual object identification or classification, it’s flexible enough, and smart enough, for many unique use cases.

I also tested five different people counting video analytics products in four different "people counting" use case scenarios. The goal was to review configuration and calibration capabilities, algorithm responsiveness and performance. It was also to prove my theory that the term “people counting” is too vague when it comes to video analytics. There are too many scenarios, environmental factors, camera angles and specific use cases to believe that one size fits all. This is just people counting: not people tracking, people identification, or detecting people falling down or running. Just a people-sized pixel object crossing a virtual line or zone in either direction, with tailgating alert capabilities.

One of the four people counting use case scenarios presented here is the typical top-down use case, which provides the best separation of individual people for maximum performance. However, I have done live demonstrations on live CCTV cameras, and even a demo clip at night, with a typical 45-degree angle of coverage, with some success (depending on the field of view). These experiments used a top-down view from the same Axis camera, calibrated in each solution by object size. A top-down field of view does not lend itself to recreating three-dimensional space, so counting is typically done by intelligently identifying people-sized objects. All the solutions required some unique calibration for each location, with performance depending on how well the algorithm could filter out luggage, shadows and reflections. The results of this simple test showed that, given the calibration effort, all five people counting products could provide 100% accuracy. Please note that no one was holding an infant during the making of these films.
A top-down field of view from a single camera source makes consistent 100% accuracy a challenge for any people counting analytics, not only because of parents carrying children, but also because of the variables at each area of coverage and multiple illumination states. A single people counting solution may work well in five different locations, but what about 50, or 500, dramatically different locations? The reflection of light off the floor, the shadow on the door, different illumination states, and windows letting in sunlight at different times of day and in different seasons all affect pixel colors and contrast, and require continuous machine learning or recalibration. Although this was a simple example, it wasn't necessarily an easy calibration.
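The core of a line-crossing people counter, counting in each direction and flagging tailgating, can be sketched as follows. It assumes an upstream tracker supplies a stable `track_id` and a centroid for each person in each frame; the class, the 2-second tailgating window, and the convention that crossing toward the line’s positive side counts as “in” are illustrative assumptions, not any tested product’s logic. For brevity it treats the counting line as infinite; a production counter would also check that the crossing point falls between the line’s endpoints.

```python
def side(point, a, b):
    """Positive on one side of the line a->b, negative on the other, 0 on it."""
    return (b[0] - a[0]) * (point[1] - a[1]) - (b[1] - a[1]) * (point[0] - a[0])


class LineCounter:
    def __init__(self, a, b, tailgate_s=2.0):
        self.a, self.b = a, b
        self.tailgate_s = tailgate_s
        self.last_side = {}      # track id -> side it was last seen on
        self.count_in = 0
        self.count_out = 0
        self.tailgates = 0
        self._last_in_t = None

    def update(self, track_id, centroid, t):
        s = side(centroid, self.a, self.b)
        if s == 0:
            return  # exactly on the line: wait for a decisive frame
        prev = self.last_side.get(track_id)
        if prev is not None and (prev > 0) != (s > 0):
            if s > 0:
                self.count_in += 1
                # Two "in" crossings too close together look like tailgating.
                if self._last_in_t is not None and t - self._last_in_t <= self.tailgate_s:
                    self.tailgates += 1
                self._last_in_t = t
            else:
                self.count_out += 1
        self.last_side[track_id] = s


if __name__ == "__main__":
    counter = LineCounter(a=(0, 100), b=(200, 100))
    for tid, pos, t in [(1, (50, 80), 0.0), (1, (50, 120), 1.0),
                        (2, (60, 80), 1.5), (2, (60, 120), 2.0),
                        (3, (90, 130), 3.0), (3, (90, 70), 4.0)]:
        counter.update(tid, pos, t)
    print(counter.count_in, counter.count_out, counter.tailgates)  # 2 1 1
```

Everything hard about people counting lives upstream of this code: deciding reliably which pixels are one person is exactly what lighting, shadows and reflections disturb.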

Watch Top-Down People Counters

The only way to increase accuracy and greatly reduce calibration time in the more challenging environments, under any lighting, is to use a 3D LiDAR (ToF) sensor rather than an RGB CCTV camera. The Hitachi 3D LiDAR sensor is integrated into the Hitachi Video Analytics suite and provides the "perfect" image for analysis. The Hitachi 3D LiDAR sensor, with its SDK, can do much more than simple people counting, tailgating, loitering or intrusion detection (more on sensor fusion in the future), but the key point here is that its video stream is unaffected by lighting, shadows and reflections because it generates its own light source: high-resolution pulses of infrared light that are measured as they bounce off objects. So even when I turned my office lab into a disco by flipping the lights on and off, there was no change to the video image. It continued to provide 100% accuracy out of the box. Although it would not be able to identify an infant cradled in their mother's arms (that would need deep learning), it would continue to count people effectively regardless of color or lighting changes in the environment, and it took only a couple of minutes to set up and calibrate.
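Why depth sensing shrugs off lighting is visible in a sketch: counting from a top-down depth map only compares each pixel’s distance with the floor, so color and brightness never enter the calculation. The mounting height, the 1.0 m minimum height and the minimum blob size below are assumed values, and this is not the algorithm inside the Hitachi sensor or HVA People Counter 3D. It also inherits a known weakness of simple blob counting: two people touching shoulders would merge into one blob.

```python
def count_heads(depth_m, floor_depth_m, min_height_m=1.0, min_pixels=2):
    """Count blobs rising at least min_height_m above the floor in a top-down depth map."""
    rows, cols = len(depth_m), len(depth_m[0])
    # Height above the floor = floor distance - measured distance to the surface.
    tall = {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if floor_depth_m - depth_m[r][c] >= min_height_m
    }
    seen, people = set(), 0
    for start in tall:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # flood-fill one connected blob
            r, c = stack.pop()
            size += 1
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in tall and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if size >= min_pixels:
            people += 1
    return people


if __name__ == "__main__":
    # Sensor 3.0 m above the floor; two people and a 0.6 m suitcase.
    depth = [
        [3.0, 1.2, 1.2, 3.0, 3.0, 3.0],
        [3.0, 1.3, 1.2, 3.0, 2.4, 2.4],
        [3.0, 3.0, 3.0, 3.0, 2.4, 3.0],
        [1.4, 1.3, 3.0, 3.0, 3.0, 3.0],
    ]
    print(count_heads(depth, floor_depth_m=3.0))  # 2
```

The suitcase is ignored purely by its height, which is the kind of filtering an RGB camera has to infer from color and contrast.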


3D LiDAR Sensor with HVA People Counter

Now, the integration of the Hitachi 3D LiDAR sensor into HVA provides a solution for those problematic locations with wildly changing lighting, contrast and shadows, extending its usability to even more challenging environments.

Another of the specific use case tests used a typical field of view. Below are two of the five COTS video analytics products that were able to provide a somewhat accurate count of the people crossing the virtual line. The video sample is a purchased stock video clip and required a considerable amount of effort, including cropping out the giant LCD billboards above the platform, testing multiple locations for the virtual counting line, configuring the simulation of three-dimensional space, adding a pre-recorded video loop for machine learning, and calibrating people size, contrast and so on. Tilting the camera down another 10 degrees or so would have made it much easier. The field of view is crucial for the analytical engines to identify people on each side of the virtual counting line, and a steeper angle would have provided more separation between groups, but alas, this was a stock video clip.

Watch Famous Station People Counting

If we move from "machine teaching," using a finite number of variables, to deep learning, which trains on thousands or millions of sample images and analyzes the video stream at multiple layers, we get a better understanding of its objects through grayscale edge detection and a deeper analysis of color, using the RGB dimensions for image segmentation and layered classes. With that, you can achieve a multi-point people counter and tracker, even within a crowded scene.
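Once a deep learning detector outputs a bounding box for every person in each frame, a tracker links those boxes over time. The sketch below uses greedy intersection-over-union (IoU) matching, a simple, well-known baseline and not necessarily what the demo uses. It assumes boxes arrive as `(x1, y1, x2, y2)` tuples from a detector that is not shown, the `min_iou` value is an assumption, and for brevity unmatched tracks are dropped immediately instead of being kept alive through brief occlusions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou
        self.tracks = {}      # track id -> last known box
        self._next_id = 1

    def update(self, detections):
        """Match this frame's boxes to existing tracks; return {track_id: box}."""
        # Score every (track, detection) pair, best overlaps first.
        pairs = sorted(
            ((iou(box, det), tid, di)
             for tid, box in self.tracks.items()
             for di, det in enumerate(detections)),
            reverse=True,
        )
        used_tracks, assigned = set(), {}
        for score, tid, di in pairs:
            if score < self.min_iou:
                break
            if tid in used_tracks or di in assigned:
                continue
            used_tracks.add(tid)
            assigned[di] = tid
        # Unmatched detections start new tracks; unmatched tracks are dropped.
        new_tracks = {}
        for di, det in enumerate(detections):
            tid = assigned.get(di)
            if tid is None:
                tid = self._next_id
                self._next_id += 1
            new_tracks[tid] = det
        self.tracks = new_tracks
        return dict(new_tracks)
```

Stable IDs are what turn per-frame detections into per-person counts, for example by feeding each track’s centroid into a line-crossing counter.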

Watch Deep Learning Famous Station People Tracking

The samples above show how deep learning goes beyond typical pixel analysis and can meet some of the more challenging requirements.


Before we discuss digital video analytics, I need to explain, as painlessly as possible, why the following examples inspired me to write this post. I’ve been working with digital imagery and video since the 1990s, and I’ve come to understand that the image presented on your screen is made up of digital pixels. In the digital world of absolute mathematical equations, pixels are not measured in dots of Cyan, Magenta, Yellow and Black, as in the offset printing process, but in bits and bytes. A digital pixel represents visual color. A black-and-white (grayscale) image uses 8 bits (1 byte) per pixel, and a color image uses 24 bits per pixel (1 byte each for Red, Green and Blue). So each pixel holds one of 256 shades of gray (for black and white), or a combination of 256 shades of Red, 256 shades of Green and 256 shades of Blue, for 16,777,216 possible colors in a color image. If you’re wondering what happened to the Black in the transition from CMYK in print to the RGB of pixels: a screen emits light rather than reflecting it, so black is simply the absence of light, with all three channels at zero, while all three at full intensity produce white. The richness of the blacks is also defined by brightness and contrast in the digital world.
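The pixel arithmetic is easy to check in code. The sketch below packs three 8-bit channels into one 24-bit value (the common 0xRRGGBB layout), confirms that 256 × 256 × 256 = 16,777,216 combinations exist, and shows that black is all three channels at zero while white is all three at 255. The helper names are illustrative.

```python
BITS_PER_CHANNEL = 8
SHADES_PER_CHANNEL = 2 ** BITS_PER_CHANNEL   # 256 levels per channel
TOTAL_COLORS = SHADES_PER_CHANNEL ** 3       # 256 x 256 x 256


def pack_rgb(r, g, b):
    """Pack three 8-bit channels into one 24-bit value laid out as 0xRRGGBB."""
    for v in (r, g, b):
        if not 0 <= v <= 255:
            raise ValueError("each channel must be in 0..255")
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Split a 24-bit 0xRRGGBB value back into (r, g, b)."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


if __name__ == "__main__":
    print(f"{TOTAL_COLORS:,}")                # 16,777,216
    print(hex(pack_rgb(0, 0, 0)))             # 0x0: black, no light emitted
    print(hex(pack_rgb(255, 255, 255)))       # 0xffffff: white, all channels full
```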

This is why your 1080p television looks so much sharper and more colorful than that old CRT television: the digital image has more pixels to capture more detail and color variation. However, more pixels don’t make a smarter camera, only a better-quality image.

Now that you understand how the IP camera’s image processor captures visual images of the analog world, the next step is motion. Digital motion is achieved the same traditional way Thomas Edison achieved motion back in 1901: with frames per second. A rapid succession of snapshots of the field of view captures the color changes at a given rate per second, providing the illusion of movement on screen.

The real magic of digital video is in the compression and decompression (codec) algorithms. These codecs analyze motion across multiple frames and dissect them into blocks, categorizing them into special frames and data for transmission. This is a necessity for transmitting digital video, because transmitting full 1080p frames (MJPEG) requires about 31 Mbps of bandwidth (yes, thirty-one megabits per second), versus the H.264 codec, which can transmit the same quality imagery using only 2.5 Mbps. Further details on codecs aren’t necessary for this post, except to explain that codecs do not care what is moving within the digital image when they encapsulate that movement within their macroblocks. Their only function is to shrink the video stream for transmission and take up less storage space when recording.
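A little arithmetic shows why compression is unavoidable. The sketch computes the uncompressed bitrate of 1080p video, assuming 30 frames per second and 24-bit color with no chroma subsampling, then compares the 31 Mbps and 2.5 Mbps figures quoted above and converts each to storage for one camera recording for a full day (decimal gigabytes). The frame rate behind those two quoted figures is not stated, so treat the comparison as approximate.

```python
def bitrate_mbps(width, height, bits_per_pixel, fps):
    """Uncompressed video bitrate in megabits per second."""
    return width * height * bits_per_pixel * fps / 1e6


def storage_gb_per_day(mbps):
    """Storage for one camera recording continuously for 24 hours."""
    return mbps * 1e6 * 86_400 / 8 / 1e9   # bits -> bytes -> gigabytes


if __name__ == "__main__":
    raw = bitrate_mbps(1920, 1080, 24, 30)
    print(f"raw 1080p30: {raw:.0f} Mbps")                    # 1493 Mbps
    print(f"MJPEG vs H.264: {31 / 2.5:.1f}x smaller")        # 12.4x
    print(f"MJPEG: {storage_gb_per_day(31):.1f} GB/day")     # 334.8
    print(f"H.264: {storage_gb_per_day(2.5):.1f} GB/day")    # 27.0
```

Even MJPEG is already a large reduction from roughly 1.5 Gbps of raw pixels; H.264 cuts that again by about twelve times by exploiting what stays the same between frames.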

Digital pixels identify color. Multiple frames create the illusion of motion. Codecs just shrink it for transmission and storage. The fact of the matter is, IP cameras are not very smart. They do not know what they are “seeing.” They do not know what is moving; they just capture, replicate and transmit. They don’t know the difference between blowing snow and a person walking across the scene. This is why video analytics systems have failed in the past: the software only cared about pixels, which limited its ability to understand what was actually being “seen.”
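The blowing-snow problem is easy to reproduce with the most naive form of motion detection, counting how many pixels changed between two frames. The sketch below is illustrative, with arbitrary thresholds and function names: scattered snowflakes and a walking person both change enough pixels to raise the alarm, because the method has no idea what changed.

```python
def changed_pixels(prev, curr, threshold=25):
    """Count pixels whose brightness changed by more than threshold."""
    return sum(
        1
        for row_p, row_c in zip(prev, curr)
        for p, c in zip(row_p, row_c)
        if abs(p - c) > threshold
    )


def naive_motion_alarm(prev, curr, min_changed=4):
    # No notion of *what* changed: any large-enough change raises the alarm.
    return changed_pixels(prev, curr) >= min_changed


if __name__ == "__main__":
    empty = [[20] * 6 for _ in range(6)]
    snow = [row[:] for row in empty]
    for r, c in [(0, 0), (1, 4), (3, 2), (4, 5), (5, 1)]:
        snow[r][c] = 240                      # scattered flakes
    person = [row[:] for row in empty]
    for r in range(2, 5):
        person[r][3] = person[r][4] = 180     # one upright blob
    print(naive_motion_alarm(empty, snow), naive_motion_alarm(empty, person))
```

Raising `min_changed` to silence the snow would eventually silence the person too, which mirrors the sensitivity-slider frustration of simple built-in camera analytics.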

Traditionally, analytical software is limited to the data received from these IP cameras, so it analyzes pixels (color) and motion (FPS) and, once calibrated, begins to understand the difference between something that’s 10 pixels and 50 pixels in size, calculates the time between frames, and determines that the 10 pixels may be a person walking and the 50 pixels a car speeding, if it’s calibrated as such. The moment the lighting changes (which changes the color), or that person opens a giant umbrella, or that car slows down, it needs to be able to categorize shapes in order to remember that, “wait, that’s still a car.”

So, when I was assigned the task of testing and creating demonstration samples for the Hitachi Video Analytics Suite, I was quite apprehensive about accepting the project. I envisioned hours of frustration ahead of me, because IP cameras and software are not that smart. I wanted the killer app (analytics) to be that smart. I envisioned repurposing the tens of thousands of underutilized security IP cameras into Smart City sensors.

HVA not only surprised me, it impressed me. One of the first examples I created is below. When I realized HVA Object Detector could be calibrated to ignore moving objects, I remembered a use case from a decade ago that involved sending a real-time alert if there was a stalled vehicle or a person at a railroad crossing. I recalled that it can take a freight train over a mile to stop, and that delays can cost millions of dollars a day, let alone the liability. HVA Object Detector ignored all movement, including the cars crossing the tracks, and sent an alert when the person fell on the tracks.

Watch Video

HVA Intrusion Detector includes a built-in filter for weather conditions. I inadvertently performed a comparison between the analytics built into a camera and HVA by tapping into the video stream from a backyard camera that I had configured with its built-in analytics. The only way to calibrate and configure the built-in analytics was to adjust its sensitivity. Although all the false positives from animals made me realize what a jungle the neighborhood was (squirrels, cats, raccoons, possums), I eventually disabled the built-in analytics because I was sick of getting email alerts with snapshots of rain and snow. After a while, reducing its sensitivity further left it alerting on nothing but the huge afternoon shadows that cause dramatic changes in pixel color. Almost in passing, I noticed that I hadn’t received any false positives from HVA Intrusion Detector, which was ingesting another RTSP stream from the same camera. That’s when I decided to create the example below: a simple area protection configuration, recorded during a snowfall. HVA ignores the snow and the squirrel running around, and only alerts me when the person walks into the frame.


Watch Video

HVA knows what snow is. The intelligence behind the snow, rain, haze and fog filter built into HVA Intrusion Detector is also available in the HVA Video Enhancer module. Impressed, I decided to give it an even bigger challenge. How about a Chicago-style snowstorm? Analyze this! On the left is the actual footage: crazy, windblown snow creating whiteout conditions. By the end of the clip there’s so much snow that it tricks the camera back into color mode, thinking it’s daylight. On the right is the sample video processed through HVA Video Enhancer, which can now be ingested into other video analytics modules for better accuracy and performance.
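HVA’s weather filter is proprietary and isn’t described here, so as a stand-in, here is a textbook way to suppress falling snow from a fixed camera: a per-pixel temporal median. A flake occupies any given pixel only briefly, so the median over a short window of frames recovers the background. This sketch assumes a static camera and grayscale frames stored as nested lists; its trade-off is that anything passing through a pixel quickly, including a fast-moving person, is filtered out too.

```python
from statistics import median


def temporal_median(frames):
    """Per-pixel median across a short stack of equally sized grayscale frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]


if __name__ == "__main__":
    # Five 3x3 frames of a gray scene (100) with bright flakes (255) scattered
    # across different pixels in different frames.
    frames = [[[100] * 3 for _ in range(3)] for _ in range(5)]
    frames[0][0][0] = frames[1][1][1] = frames[2][2][2] = frames[3][0][2] = 255
    frames[4][0][0] = 255  # same pixel hit twice: still a minority of 5 frames
    print(temporal_median(frames))  # every pixel back to 100
```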


Watch Video

HVA really does know what snow is. The HVA Intrusion Detector sample clip below is configured for perimeter intrusion: a person must walk from the green zone into the red zone to be recognized as an intruder. Even though I drew the zones the same size on screen, HVA recreates a three-dimensional space from the two-dimensional image, so it understands perspective: it recognizes that the figure attempting to enter the facility is 1.8 meters tall, and flags an intruder at each door.
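The green-then-red rule can be sketched as a small state machine over tracked positions: a track is “armed” once it is seen in the green zone and raises one alert if it later appears in the red zone. This assumes an upstream tracker supplies a track ID and a ground-contact point (for example, the bottom-center of each person’s bounding box) in image coordinates; the zone polygons and class name are hypothetical, and the sketch leaves out the perspective and height estimation HVA adds on top.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is point inside the polygon (list of (x, y) vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class PerimeterRule:
    """Alert only when a track is seen in the green zone and later in the red zone."""

    def __init__(self, green, red):
        self.green, self.red = green, red
        self.armed = set()     # tracks that have been in the green zone
        self.alerted = set()

    def update(self, track_id, foot_point):
        if point_in_polygon(foot_point, self.green):
            self.armed.add(track_id)
        elif (point_in_polygon(foot_point, self.red)
              and track_id in self.armed
              and track_id not in self.alerted):
            self.alerted.add(track_id)
            return True
        return False
```

Requiring the green-to-red sequence is what keeps someone who simply appears inside the red zone, such as an employee stepping out of the door, from being treated as an intruder.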


Watch Video

A unique and very effective module is HVA Privacy Protector, which protects the privacy of individuals while still allowing video monitoring for safety and security. I configured the HVA Privacy Protector example below with a couple of layers. First, I wanted the ATM to always be pixelated, to protect PINs, along with the vehicles on the street, to protect license plates. Although HVA Privacy Protector is engineered for static, fixed camera views, notice how the persons of interest stay fully pixelated even when standing still? This stream is then available as input to other systems and/or analytics, such as Intrusion Detector or Object Detector, while still protecting the privacy of individuals. The secured archived footage can only be seen by authorized personnel with the correct security clearance. You can even add a second layer of security using a smart card and a transaction authentication number (TAN).
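Pixelation itself is simple to sketch: each tile inside a protected region is replaced by its average value, so fine detail such as a PIN pad or license plate characters can no longer be read. This grayscale, nested-list version is illustrative only; it does not show how HVA Privacy Protector finds people or keeps the original footage available to authorized viewers, and the `block` tile size is an assumed parameter.

```python
def pixelate(img, box, block=4):
    """Mosaic the region box = (x1, y1, x2, y2) of a grayscale image (end-exclusive)."""
    x1, y1, x2, y2 = box
    out = [row[:] for row in img]   # leave the original frame untouched
    for by in range(y1, y2, block):
        for bx in range(x1, x2, block):
            ys = range(by, min(by + block, y2))
            xs = range(bx, min(bx + block, x2))
            tile = [img[y][x] for y in ys for x in xs]
            avg = sum(tile) // len(tile)
            for y in ys:
                for x in xs:
                    out[y][x] = avg
    return out


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    for row in pixelate(img, (0, 0, 4, 4), block=2):
        print(row)
```

Because the function returns a new frame, the untouched original could, in principle, be encrypted and archived for authorized viewers while only the mosaic is streamed.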

Watch Video
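Pixelation itself is simple to illustrate. The sketch below is not HVA Privacy Protector’s implementation; it assumes a grayscale frame stored as a list of rows and a fixed, hypothetical rectangle (such as the area around an ATM keypad), and replaces each block of pixels inside that rectangle with the block’s average value.

```python
def pixelate_region(image, top, left, height, width, block=8):
    """Return a copy of a grayscale image (a list of rows of ints) in which
    the given rectangle is pixelated by averaging block x block tiles."""
    out = [row[:] for row in image]          # leave the original frame intact
    bottom = min(top + height, len(image))
    right = min(left + width, len(image[0]))
    for by in range(top, bottom, block):
        for bx in range(left, right, block):
            ys = range(by, min(by + block, bottom))
            xs = range(bx, min(bx + block, right))
            tile = [image[y][x] for y in ys for x in xs]
            mean = sum(tile) // len(tile)    # integer average of the tile
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


frame = [
    [0, 10, 100, 100],
    [20, 30, 100, 100],
    [5, 5, 5, 5],
    [5, 5, 5, 5],
]
# Hypothetical mask: the top-left 2x2 corner of the frame.
for row in pixelate_region(frame, top=0, left=0, height=2, width=2, block=2):
    print(row)
```

For a fixed mask like the ATM, the rectangle never changes; masking people and vehicles would additionally require a detector to supply a moving rectangle for each frame, which this sketch does not attempt.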


I created over a hundred test samples for all the HVA modules (listed at the end). HVA is impressive because each module has its own analytical engine, engineered to do that specific function. It’s not a single pixel analyzer and movement calculator that was built upon to do more than its core capability. HVA also recreates three-dimensional space from a two-dimensional video image and then adds the fourth dimension (time) for improved performance. You can also calibrate the length of its 3D learning phase, and calibrate each scene for multiple illumination states (day, night, afternoon), which further improves its performance and accuracy. It really does add more intelligence to cameras, and I’ve tried it on many different types, from a generic low-end bullet camera, to the popular Axis cameras (including the panoramic), to a top-of-the-line thermal camera.

I could go on with other samples, but you get the idea. I was apprehensive at first, but I’m excited to have been a part of this new technology release, and to see my dream of an analytics killer app for the Smart City finally become a reality. The Hitachi Video Analytics Suite:

  • Activity Visualizer
  • Camera Health Monitor
  • Face Collector
  • Intrusion Detector
  • License Plate Recognizer
  • Object Detector
  • Parking Space Analyzer
  • People Counter
  • People Counter 3D
  • Privacy Protector
  • Queue Detector
  • Traffic Analyzer
  • Vehicle Counter
  • Video Enhancer

This was the cover feature in the May/June issue of Asia Pacific Security Magazine




Welcome to the Future

By Anthony C Caputo


A “boundary” can be defined as a line that marks the limits of an area; a dividing line, real or imaginary, separating a subject or sphere of activity. The keyword here is “imaginary.” Laws and rules (both man-made and those of physics) were, and are, designed to create those imaginary lines, much like the sides of good versus evil, but humanity doesn’t follow laws and rules in black-and-white, especially now that there are 256 shades of pixel gray. Human beings are too self-indulgent, too preoccupied with their own needs and desires, and many are twisted by a bombardment of media overload. A friend would give us a copy of a song or movie, first on tape, then on CD/DVD, then instantly by email. These new generations copy movies, games and books and pass them along, creating gray areas to serve our selfish purposes. The digital invasion just made it all easier, leaving a disruptive path along the way.


As organic machines, human beings are fed higher education’s left-brained logic and mathematical view of the world, neglecting the right brain’s thirst for creativity, music, love, compassion and beauty, the very things that make us unique not only in the world of animals, but among anything of our own creation. Sure, we can create pretty Smartphones that empower our creativity, play our favorite music and connect us to our loved ones, but is the technology empathetic, compassionate and protective? Can the Smartphone itself protect you and your digital assets and information? Of course not, because it really isn’t that smart; it still needs your fingerprint, pass code and/or online ID.


Our continued thirst, first for survival (after all, the Internet was designed to survive a nuclear attack), then for individualistic empowerment, marketing, entertainment and monetary gain to feed the capitalistic juggernaut, inadvertently created a virtual universe of digital data, further diminishing our own importance in the physical world. This invasion of the digital universe destroyed all imaginary and physical boundaries, creating a level playing field for everyone and everything: from the convicted murderer researching legal loopholes, to the innocent schoolgirl desperate for a copy of her favorite boy band’s album, to the Uber driver and the monolithic Yellow Cab Company. Unfortunately, to keep up with the ever-growing, rapid dissemination and metamorphosis of data through a myriad of new, intuitive person-to-machine interface devices, we create even more data, much faster and better.


If God truly created Man in his own image, then is God also an organic machine, or a version of our own image that resides in an alternative universe that moves so fast there is no yesterday or tomorrow and everything happens at once? Alternatively, like God with Adam and Eve, did we also see our own creations molded into something extraordinary, pure and righteous, only to watch them, much like ourselves, run amok throughout human society and evolve into the monstrosities we now see every day on the news?


These new digital generations have blurred the physical boundaries, escaping into a virtual existence while driving in the real world and causing 25% of all automobile accidents. The National Highway Traffic Safety Administration (NHTSA) has even determined that texting while driving is equivalent to drinking four beers before getting behind the wheel, calling it “another potentially lethal distraction.” On the other hand, Apple’s new campus was designed without door thresholds so that engineers had less chance of being distracted from their work as they walked.


I’ve seen (as I’m sure you have) how this digital invasion has affected all aspects of human existence. Several months ago, working alongside a small crew helping renovate my “fixer-upper,” were a man and a woman who were glued to their Smartphones. They were texting, which is not unusual among today’s youth, but it became quite frenzied. They were warned, which only stopped them momentarily, until the woman began to sob (not because we fired them). Unbeknownst to everyone around them, they were “a couple” and were having a lovers’ quarrel, while working together, through text.


It’s 2017, and I’ve realized that the digital invasion has succeeded, and humanity, as it once was, has fallen. The physical world is (or was) different. There used to be clear boundaries. There used to be barriers erected for structure and cohesion, for productivity and serenity: limitations created by physics that made information and knowledge easy to absorb. In my youth, a single business letter, with a physical buffer of time for its creation, mailing and arrival, alongside the patience for its forthcoming reply, was accepted as the reality. It was slow, deliberate and concise, and even though at the time I was frustrated that it was such a slow process, there was no alternative. Even then, there were signs of how the digital universe was disrupting our own individual creativity, as it was only several decades before word processing software that we created wonderful letters using calligraphy.


Little did I know that I would mutate within the next twenty-five years, developing hyper speed superpowers, just to be able to mentally, physically and emotionally receive, respond to and send up to a hundred emails a day.


The building blocks of society, the boundaries set up by commerce, religion, physics, rules and laws, have broken down. We blindly moved our existence into the digital universe, still believing in our imaginary boundaries, even though there was no one there to serve and protect.


There is no such thing as 100% digital security, only real-time intrusion detection, and 100% high availability and fault tolerance. There are even cyber guards who identify an intruder and direct them to a “safe house” within this digital universe for further interrogation and deciphering. We still need to keep feeding the delusion of imaginary lines, because without our virtual deadbolts, chains, guards and alarms, we are naked and vulnerable in this new world. Cryptography only tries to keep up with processor speeds, our own treacherous machines that can calculate trillions of computations per second, surpassing the human mind completely. Most of us didn’t even know our front door was wide open until it was too late, and so millions fall victim to identity theft, electronic robbery, privacy invasion and life-threatening cyber crimes.
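The race between cryptography and processor speed can be made concrete with a little arithmetic. The sketch below computes the worst-case time to try every possible key by brute force; the rate of one trillion guesses per second is an assumption chosen to match the “trillions of computations per second” above, not a benchmark of any real machine.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaust_seconds(key_bits: int, guesses_per_second: float) -> float:
    """Worst-case seconds to try all 2**key_bits keys at a fixed rate."""
    return 2 ** key_bits / guesses_per_second


RATE = 1e12  # assumed: one trillion guesses per second
for bits in (56, 128):
    seconds = exhaust_seconds(bits, RATE)
    print(f"{bits}-bit key: ~{seconds / 3600:.3g} hours "
          f"(~{seconds / SECONDS_PER_YEAR:.3g} years)")
```

At that assumed rate, a 56-bit key (the size used by the old DES cipher) falls in about 20 hours, while a 128-bit key would take on the order of 10^19 years. Each added key bit doubles the work, so key length can outpace gains in processor speed.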


We never needed to be concerned about invaders from the other side of the world when we were just a small dot on a physical map. Now, we’re in the same bits-and-bytes neighborhood, mere milliseconds away. The Second Amendment of the United States Constitution cannot protect us, because this alternative universe does not follow the same rules as our physical world. Its attacks may come in the blink of an eye: silent, intrusive, destructive.


If you cannot protect what you own, you don't own anything.


In 2000, I collaborated on a book on networked media with an executive from Hollywood. He had the foresight to envision this new world and was correct on every speculation but one: the ferocious speed and depth of the change and disruption that lay ahead for humanity.


We live in an age where businesses have a Facebook page; where you can learn anything from the palm of your hand; where you can deposit a physical check into your bank account without setting foot in the bank; where Smartphone video clips of cats get tens of millions of views; where hostages send silent texts and videos of their captors; and where police officers would rather use their Smartphone at a crime scene than their police radio.


A friend of mine in law enforcement once told me that if he left for work in the morning and forgot his gun – not a problem, but if he left for work in the morning without his Smartphone, he’d have to turn around and go back to get it. We’ve all done that, haven’t we? It’s not about forgetting our phone. There’s a phone we can use in the office, isn’t there?


It’s all about The Data.


In 1990, before the Internet was a glimmer in anyone’s eye, Roger Fidler coined the phrase “Mediamorphosis,” which refers to the transformation of communication and media spearheaded by perceived needs, social and technological innovations.  About the same time, after reviewing centuries of research data, Stanford professor and Futurist Paul Saffo suggested it takes 30 years for a new idea to seep into the culture.


Well, it’s 2017, and if you do not have a Digital Strategy (e.g. Data Governance, Data Management, Data Mobility, Data Protection and Data Analytics) for digital transformation (and I’m not talking about your company, I’m talking about you), you’re almost three decades behind.


And if you succumb completely to the digital universe and measure by the latest calculation of real-world years versus the Internet calendar, then you’re about 120 years behind.


Welcome to the Future. There really is no escape. We truly have all been assimilated.

Anthony Caputo

Head in The Cloud

Posted by Anthony Caputo, Jan 31, 2016

Whether we like it or not, or understand it or not, it’s become clear to me that the next step in our technology evolutionary path is assimilation into the “Cloud.” It’s not a new idea, but as more and more everyday devices connect to the Internet (the Internet of Things), we need steadfast 24/7/365 cyber security resources to protect our data, along with the accessibility and accountability that many organizations lack. That is the most attractive aspect of “the Cloud.”


For me, it started back in 2000, when I wrote my book Build Your Own Server for McGraw-Hill. During that writing process I decided to do an experiment, which I detailed in the book. I built a Windows 2000 Server, a Red Hat 7.1 server and a Mac OS 8.6 server, and I plugged them all into my cable modem. This was before Service Pack 1 for Windows 2000 Server, which plugged a hole in the operating system (something about including the Anonymous login in the Everyone group), and so, after about 90 minutes, the Windows 2000 Server started beeping and whistling, and then automatically shut down. At first, I didn’t know what had happened, because I didn’t anticipate a cyber-intrusion within 90 minutes. I wasn’t able to boot the system up again. I reinstalled the operating system and unplugged it from the Internet.

The Red Hat 7.1 server hosted one of my websites at the time, using dynamic DNS. It lasted seven days before it crashed. The Mac OS 8.6 server lasted about two years hosting my websites, but then its SMTP server was hijacked for email spam. During those two years, I worked diligently on locking down my Windows 2000 Server, which I wanted to use as a web server, application server and FTP server. I introduced a small business router to block unused ports, configured local and remote access security policies, tried to stay up to date on security patches, updates and anti-virus updates, and created more complex passwords, but it finally became clear to me that managing and maintaining a live server was a full-time job. Invariably, someway, somehow, some ankle biter or computer farm found the latest vulnerability and hijacked the machine for personal use, planted some flag of defiance, or simply crashed it, because they could.
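Blocking unused ports only helps if you know which ones are actually open. As a minimal sketch (not the tool used at the time, which isn’t named), the Python below tries a TCP connection to each port in a list and reports the ones that accept; the port list is an example, and the host should be a server you own.

```python
import socket


def open_tcp_ports(host: str, ports, timeout: float = 0.5):
    """Return the ports on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an error
            if s.connect_ex((host, port)) == 0:
                found.append(port)
    return found


if __name__ == "__main__":
    # Example list: FTP, SSH, SMTP, HTTP, HTTPS
    print(open_tcp_ports("127.0.0.1", [21, 22, 25, 80, 443]))
```

Anything listed that you don’t intend to serve is a candidate for the router’s block list. A successful connection only proves that something is listening; it says nothing about whether that service is patched.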

I decided, since I already had a full-time job, it was time to find a hosting service. I needed more than just web pages from templates. I needed a secured, managed operating system that could provide me with an easy-to-use control panel for various web-based applications and statistical information. I still use that service to this day, and it provides me with the hardware, bandwidth, operating system and applications I need to host as many websites and web-based applications as required, using as many domain names as I want; I just have to pay a monthly or annual fee. I can honestly say, after over a decade, the service has been well worth it. I’ve delegated the task of fighting the forever war in cyberspace to their cyber security agents: human, hardware and software. They fight the good fight.

In 2010, Elsevier, the publisher of my book Digital Video Surveillance and Security, suggested I create a Blog to promote it. This was a challenge, as most of my projects are centered around sophisticated public safety and homeland security technologies, and, well, that’s not the kind of subject that should be posted freely on the Internet for everyone to absorb and share. However, it’s the 21st century, and it’s now also the author’s responsibility to market his works, in hopes that at least one out of ten people will actually pay for a copy rather than download it for free. Yes, even writers have succumbed to the “Power of the Network.”

I seem to recall that one option on my hosted services was WordPress, a popular blogging application that could be installed onto my website at no additional charge. I only knew about WordPress because my son had previously started a music blog on a website I hosted for him, and it had some impressive templates and features. And so, I delved into blogging, with vague fluff pieces and technical observations.

I never anticipated my book becoming as successful as it has. The first clue was the 25+ reviews on Amazon for the first edition; my Build Your Own Server book, and even a competing book, each received fewer than a handful. The second clue was the 35,000 hits on the Blog every month, but the final clue was when my publisher asked me to write a second edition.

Meanwhile, I started experiencing issues with WordPress. I tried to stay on top of installing security updates, and added plug-ins that were supposed to protect the application, but those plug-ins also needed updates and security patches. I remember missing one update by two days, and it was too late. Even though I had disabled commenting, there were dozens of spam comments waiting to be approved for posting, and the blog was down.

I had since shut off all interaction, so no one was able to subscribe, comment or review. It was just an information portal with no interaction, because I didn’t have the time, nor could I stop the invasion. I read lengthy documentation on how to protect it and lock it down, but that became more of an effort than trying to find a muse to post brilliant insight for readers. It seemed to get hacked every two weeks.

I decided to go deep into the code files and attempt a different approach. I researched the specific files that were most often targeted and changed them to read-only. Of course, this stopped me from even logging in unless I reverted their permissions, but I had had enough. It got to the point that the more effort I put into protection, the more creative and aggressive the attacks became, until finally I just moved what was left of the Blog to, who, like my hosting services company, also has full-time cyber security agents. Unfortunately, at that point, I lost all interest. Nothing like troubleshooting technology to scare a muse away.
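Locking files down by permission, as described above, looks roughly like this on a POSIX system (Linux or macOS). The Python sketch below clears the write bits on a file and restores owner write access when an update has to be applied; which files were protected is not specified above, so any path passed in is your own choice. A process running as root, or one allowed to change permissions itself, is not stopped by this.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path: str) -> None:
    """Clear the owner, group and other write bits on a file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def make_writable(path: str) -> None:
    """Restore owner write permission, e.g. before installing an update."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)


def is_writable_by_owner(path: str) -> bool:
    """True if the owner write bit is set."""
    return bool(os.stat(path).st_mode & stat.S_IWUSR)
```

This is the trade-off described above: every legitimate update means calling `make_writable`, updating, and then locking the file down again.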

Cloud computing is defined as the practice of using the Internet to store, manage and process data, rather than local hardware and software. It separates the hardware from the operating system, and the operating system from the applications. Obviously, it’s not a new idea, as I continue to succumb to its allure for my own peace of mind. The migration to more and more web-based applications, the continued exponential growth of processing power, and the growing “Internet of Things” continue to elevate the complexity and sheer girth of 24/7/365 maintenance and support. I’m inclined to agree with Lev Grossman of TIME magazine, who, in his commentary about the Sony Pictures hack, states that corporate data breaches happen all the time (whether we know it or not), and the more complex networks get, the harder they will be to defend, “to the point where there’s no such thing as an impenetrable system.” Even if you can achieve 99% uptime, that’s still over seven hours of downtime a month, or about three and a half days per year.
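Those downtime figures follow from simple arithmetic: allowed downtime = (1 - availability) × length of the period. The sketch below assumes a 30-day month and a 365-day year.

```python
def downtime_hours(availability_pct: float, days: float) -> float:
    """Hours of downtime permitted over `days` at a given availability."""
    return (1 - availability_pct / 100) * days * 24


for pct in (99.0, 99.9, 99.99):
    per_month = downtime_hours(pct, 30)
    per_year_days = downtime_hours(pct, 365) / 24
    print(f"{pct}%: {per_month:.2f} hours/month, {per_year_days:.2f} days/year")
```

At 99%, that works out to 7.2 hours per 30-day month and 3.65 days per year; each additional “nine” cuts the allowance tenfold.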

The Internet of Things comprises the connected objects or devices, the communications network through which they connect, and the computing platforms that ingest the data flowing between all things. Forecasts call for about 26 billion devices, with a global economic value of $1.9 trillion and $9 trillion in annual sales by 2020 (source: IDC). This kind of growth requires more than an I.T. department for the escalating cyberwar, or even a team of cyber security agents (human, hardware and software) that automates processes for defense. When going to war, everyone needs an army: an army of cyber security agents that includes global (24/7/365) human resources who are foxhole thinkers, visionaries and programmers, and their analytical software counterparts, which can work at computer speed, process suspected threats and breaches in nanoseconds, and automate the defense strategy immediately.

A full security analysis is essential when evaluating whether to port mission-critical applications to the Cloud Computing Stack of Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS). I’m not talking about “OoooOooo, it’s the Internet, and it’s not secure.” That’s irrelevant, because everything is now connected to the Internet; that’s where the risk is high. It’s the Cloud infrastructure within that reduces that risk. We need the Cloud. It’s reasonable to be cautious about deploying mission-critical applications into the Cloud, but do you really believe you or your organization has the human, hardware and analytical software resources (like the impressive Hitachi Live Insight for IT Operations) for the ever-growing cyberwar?

Over a decade ago, I thought my insignificant servers were immune to the advances of cyber pirates. That was not the case. A few years ago, I also believed that my sad Blog application was insignificant on the cyberwar battlefield. That was also not the case. I’m sure Sony Pictures believed they were immune or protected, but the fact of the matter is, we’ve been daydreaming of a ubiquitous, interconnected broadband network for decades, and now we have it.

Be careful what you wish for.

I recently revisited the silicon world of microprocessing and its evolution since I wrote my book Build Your Own Server (McGraw-Hill 2003), and was fascinated by how computing is shifting from a system architecture of integrated circuits and discrete components, with the central processing unit (CPU) at its core, to a more consolidated, embedded and integrated system processing unit (SPU) that improves overall computing performance rather than just processing power. It’s an evolution that appears not only to have changed computing, benefiting software development, but also to have derailed Moore’s Law.

In 1975, at the IEEE International Electron Devices Meeting, Gordon Moore stated that based on increasing die sizes, reductions in defect densities and the simultaneous evolution toward finer and smaller dimensions, “circuit density-doubling would occur every 24 months.” Although Douglas Engelbart had a similar theory in 1960, and David House of Intel factored in the increasing performance of transistors as well as transistor counts, the prediction that “micro-processing performance would double every two years” has come to be known as “Moore’s Law.”
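The doubling rule is easy to express as a formula: a count N0 in year y0 grows to N0 × 2^((y - y0) / 2) by year y if it doubles every two years. The sketch below applies it, with the doubling period as a parameter, since House’s performance variant is often quoted at 18 months instead.

```python
def projected_count(base_count: float, base_year: int, year: int,
                    doubling_years: float = 2.0) -> float:
    """Project a quantity that doubles every `doubling_years` years."""
    return base_count * 2 ** ((year - base_year) / doubling_years)


# Intel 4004 (1971): about 2,300 transistors.
for year in (1971, 1991, 2015):
    print(year, f"{projected_count(2300, 1971, year):,.0f}")
```

A strict two-year doubling from the Intel 4004’s roughly 2,300 transistors projects about 9.6 billion by 2015, a useful yardstick for the real-world transistor counts in the table below.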


Moore’s Law is an important consideration in any traditional system architecture that uses a central processing unit (CPU) surrounded by multiple system components. Encryption and cryptography, compression and decompression, and all data transmissions use processing power. The more complex the intelligence, the more GFLOPS of processing power are required. GFLOPS is a measure of computer speed; one GFLOPS (a gigaflop) is a billion floating-point operations per second.
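A processor’s theoretical peak GFLOPS is commonly estimated as cores × clock speed (GHz) × floating-point operations per core per clock cycle. The sketch below applies that formula; the example chip (4 cores at 3.0 GHz with 16 double-precision operations per cycle, roughly what two 256-bit fused multiply-add units can deliver) is hypothetical, and real workloads reach only a fraction of the peak.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: cores x GHz x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


# Hypothetical quad-core CPU at 3.0 GHz with 16 double-precision FLOPs/cycle.
print(peak_gflops(cores=4, clock_ghz=3.0, flops_per_cycle=16), "GFLOPS")
```

The formula makes it clear that clock speed is only one lever: more cores and wider vector units raise the peak just as effectively.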

The table below shows the increase in CPU speed every two years, beginning in 1998, using simple desktop processors. Column A is the year. Column B shows the theoretical leap in processing speed from two years prior, while Column C shows an example of a real-world CPU released that year. Column D is the multiple of the previous processor’s speed; for example, in the year 2000, processing speed more than doubled, from about 233 MHz to 500 MHz. Column E is the processing speed of an identified make and model of CPU (listed in Column J). Column F presents the Front Side Bus (FSB) speed (more on this later) and then the Direct Media Interface (DMI) speed, after the processor absorbed the FSB for better performance between the CPU, memory and the video graphics hardware accelerator (see F1-6 versus F8-10).

Columns G through I show the evolution of the internal CPU cache, used to keep frequently used data and instructions inside the CPU, rather than reaching out to RAM, for even faster recall. Column K shows the number of transistors within the CPU.


table 1.jpg

To explain the significance of these changes, I believe one must understand how personal computers work. A personal computer was a system of various components that served specific functions, primarily because of the size limitations of integrated circuits (180 nm in 2000, down to 10 nm today).


I’ll try to make this as painless as possible.


Personal computers have many different types of memory, used for different purposes, but all working together to make the interaction faster and seamless. The system’s first access to memory happens even before the operating system boots up. The computer’s BIOS (basic input/output system) firmware is stored in ROM or flash memory, while its settings are kept in a small CMOS (complementary metal-oxide semiconductor) memory chip powered by a lithium coin-cell battery. This is how a personal computer becomes self-aware: the battery-backed chip ensures that your basic configuration information (date, time, etc.) stays the same the next time you turn on the power. Nonvolatile memory keeps its contents without power and includes all forms of read-only memory (ROM), such as programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) and flash memory. Battery-backed CMOS RAM is often called non-volatile RAM (NVRAM), because the battery preserves its contents while the computer is off. The BIOS holds just enough data to recognize itself and its components, then loads the operating system into RAM when you boot up the computer.


Although RAM is itself a microchip, nothing is permanently written to that chip or any other hardware. It’s a temporary workspace where all software is first loaded after you boot up the computer or double-click an icon, which makes it easily accessible by the CPU. RAM is very important for your computer’s performance, simply because reading and writing to RAM is much faster than to a standard hard drive. RAM access time is measured in nanoseconds (ns), or billionths of a second, while the access time of a standard hard drive is measured in milliseconds (ms), or thousandths of a second. A newer solid-state drive (SSD), also called a solid-state disk or flash disk, does not contain a drive motor or spinning disk. It’s an integrated circuit assembly of memory, but unlike your system memory, it stores data persistently. Its access time drops to microseconds (millionths of a second), and it has no moving parts to act as a point of failure.
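The unit prefixes matter more than they look: a nanosecond is 10^-9 seconds, a microsecond 10^-6 and a millisecond 10^-3. The sketch below compares rough, order-of-magnitude access times; the specific values (100 ns for RAM, 100 µs for an SSD, 10 ms for a spinning hard drive) are assumed typical figures, not measurements of any particular part.

```python
# Assumed, order-of-magnitude access latencies in seconds.
LATENCY_S = {
    "RAM": 100e-9,         # ~100 nanoseconds
    "SSD": 100e-6,         # ~100 microseconds
    "Hard drive": 10e-3,   # ~10 milliseconds (seek plus rotation)
}


def slowdown_vs_ram(device: str) -> float:
    """How many RAM accesses fit into one access of `device`."""
    return LATENCY_S[device] / LATENCY_S["RAM"]


for name in LATENCY_S:
    print(f"{name}: {slowdown_vs_ram(name):,.0f}x the RAM access time")
```

Under these assumptions, one hard drive access costs as much time as about 100,000 RAM accesses, and one SSD access about 1,000.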


Side note: If you have a standard spinning hard drive that fails to boot up, stick it in the freezer for 30-60 minutes. Works every time.

Unlike the hard drive, RAM only holds data while the power is on. When the computer shuts down, all the data held in RAM disappears. This is why you lose your work if you haven’t saved it: saving writes it to the hard drive so it can be accessed later. When you boot up your computer, all your software (including your operating system) is loaded once again into RAM from your hard drive.

Most personal computers allow you to add RAM modules, but only up to a limit set by the motherboard and processor. If there’s not enough RAM, the system will begin writing to disk in the form of a pagefile. The pagefile is an allocated space on the hard disk used as an extension of RAM. If you open a file that requires more space than is readily available, the system will begin to write idle data to the hard drive. This process itself can eat up RAM and time, depending on the operating system, the processor and how fast your hard drive can write. The advantage of a pagefile is that it keeps the data in one large file, making it faster to access later than trying to recollect all the data from their original locations, which could be scattered all over the disk.
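Which “idle data” gets pushed out to the pagefile is decided by a page replacement policy. Real operating systems use their own approximations, so the sketch below is only a simplified model: a least-recently-used (LRU) policy that counts how often a requested page is not in RAM (a page fault) and must be read from disk. The reference string and frame counts in the example are made up.

```python
from collections import OrderedDict


def count_page_faults(references, frames: int) -> int:
    """Simulate LRU page replacement and count page faults.

    references: page numbers in the order they are accessed.
    frames: how many pages fit in physical RAM at once.
    """
    ram = OrderedDict()  # pages in RAM, least recently used first
    faults = 0
    for page in references:
        if page in ram:
            ram.move_to_end(page)      # hit: now the most recently used
            continue
        faults += 1                    # miss: page must come from disk
        if len(ram) >= frames:
            ram.popitem(last=False)    # evict the least recently used page
        ram[page] = None
    return faults


refs = [1, 2, 3, 1, 4, 1, 2]
print(count_page_faults(refs, frames=3))  # 5
print(count_page_faults(refs, frames=4))  # 4
```

Adding one more frame of RAM cuts the faults from 5 to 4 in this example; with real workloads, more RAM likewise means fewer trips to the much slower disk.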


As computers evolved to churn through more graphics and video (far more data processing than simple ASCII text), along with analytical processing of those pixel-intensive streams and their metadata, a separate hardware graphics accelerator, or graphics processing unit (GPU), was added with its own exclusive video RAM. Video RAM is similar to system RAM, except that while the processor writes data into video RAM, the video controller can simultaneously read from it to refresh the display. Many GPUs, whether embedded on the motherboard or on a separate expansion card, used to share system memory, essentially stealing a percentage of it to improve video performance. Today, a GPU can be embedded on the motherboard (system board) or even inside the CPU itself, a combination AMD calls an Accelerated Processing Unit (APU), and still be given its own reserved share of memory to improve video performance.
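Why video RAM benefits from being read and written at the same time becomes clearer with numbers: just repainting the screen means reading the entire frame buffer on every refresh. The sketch below computes frame buffer size and refresh read bandwidth; the 1920 × 1080 resolution, 32-bit color and 60 Hz refresh rate are example values, not figures from the text.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Memory needed to hold one full frame."""
    return width * height * bits_per_pixel // 8


def refresh_mb_per_s(width: int, height: int, bits_per_pixel: int,
                     refresh_hz: int) -> float:
    """MB/s the display controller reads just to redraw the screen."""
    return framebuffer_bytes(width, height, bits_per_pixel) * refresh_hz / 1e6


print(framebuffer_bytes(1920, 1080, 32))                   # 8294400 bytes
print(f"{refresh_mb_per_s(1920, 1080, 32, 60):.1f} MB/s")  # 497.7 MB/s
```

Nearly 500 MB/s goes to screen refresh alone in this example, before the processor writes a single new pixel.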


Now that you understand a bit about how a computer works, let’s take a look at the diagram below, which shows the logical system architecture of a computer motherboard circa 2002, from my book Build Your Own Server. The CPU communicated over the Front Side Bus (FSB) with what was called the Northbridge, another microchip that controlled communications between the CPU, the memory and the GPU. If you wanted a high-performance machine for video games, graphics, or digital video processing and production, the speed of the FSB, fixed by the motherboard, was usually the bottleneck. You had to find the right mix and match of components to build an efficient high-performance machine.



Personal Computer System Architecture, circa 2002

As depicted in the diagram above, back in 2002, computer motherboards had a Dual Independent Bus (DIB). The bus of a motherboard is a shared circuit arrangement that attaches devices to a common line, so every signal passes by each of them. Each signal is addressed to a particular device, so the devices respond only to their own signals.
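The shared-bus idea, where every device sees every signal but responds only to its own, can be modeled in a few lines. The sketch below is a deliberately simplified software model, not a description of real bus hardware (which adds timing, arbitration and control lines); the device names and addresses are hypothetical.

```python
class Device:
    def __init__(self, name: str, address: int) -> None:
        self.name = name
        self.address = address
        self.received = []

    def on_signal(self, address: int, data: str) -> None:
        # Every device sees every signal, but only acts on its own address.
        if address == self.address:
            self.received.append(data)


class Bus:
    def __init__(self) -> None:
        self.devices = []

    def attach(self, device: Device) -> None:
        self.devices.append(device)

    def send(self, address: int, data: str) -> None:
        # The signal travels down the shared line past every attached device.
        for device in self.devices:
            device.on_signal(address, data)


bus = Bus()
printer = Device("printer", address=0x10)
modem = Device("modem", address=0x20)
bus.attach(printer)
bus.attach(modem)
bus.send(0x10, "print page")
print(printer.received, modem.received)  # ['print page'] []
```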




Personal Computer motherboard/system board, circa 2002

The FSB is the data path and physical interface between the processor and the Northbridge, which linked the processor to the video graphics accelerator (GPU) and RAM, and connected through to the Southbridge, which handled communications with the parallel printer port, serial ports, PCI, IDE and/or SATA, and USB ports. So, when grabbing data from RAM in 2002, you were working in nanoseconds at 800 MB/s, versus 33 MB/s for the IDE hard drive when writing to the pagefile.
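Those two peak rates translate directly into waiting time. The sketch divides a data size by a transfer rate; the 800 MB/s and 33 MB/s figures come from the paragraph above, while the 256 MB payload is an assumed example, and seek time and other overheads are ignored.

```python
def transfer_seconds(megabytes: float, mb_per_second: float) -> float:
    """Time to move a payload at a sustained rate (no latency or overhead)."""
    return megabytes / mb_per_second


PAYLOAD_MB = 256  # assumed example size
for label, rate in (("RAM, 800 MB/s", 800), ("IDE drive, 33 MB/s", 33)):
    print(f"{label}: {transfer_seconds(PAYLOAD_MB, rate):.2f} s")
```

That is about 0.32 seconds versus 7.76 seconds, roughly 24 times longer for the same data.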


Intel’s Sandy Bridge generation of Core processors and AMD’s Fusion processors (both released in 2011) integrated the FSB and Northbridge into the CPU. In their place came Intel’s QuickPath Interconnect (QPI) or AMD’s HyperTransport, along with the Platform Controller Hub (PCH) architecture: the former Southbridge, now connected directly to the CPU via the Direct Media Interface (DMI).


Personal Computing System Architecture, circa 2011

The modern CPU is more of an SPU (system processing unit), integrating components that were once bottlenecks to provide smaller, faster-performing computers and computing devices.

Beginning with the Pentium Pro, the level-2 (L2) cache was packaged with the processor. If the data resides in the L1 or L2 cache, the CPU doesn’t even need to look in RAM, saving even more time. An embedded level-3 (L3) cache, and later a 128 MB L4 cache, were added with Intel’s multi-core processors, and Ivy Bridge (2011-2013) gave the GPU its own dedicated slice of L3 cache in lieu of sharing the entire 8 MB.
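The benefit of checking L1 and L2 before RAM is often summarized with the textbook average memory access time (AMAT) formula: AMAT = hit time + miss rate × miss penalty, applied level by level. The sketch below computes it for a cache hierarchy; the latencies and miss rates in the example are assumed round numbers, not figures for any specific Intel processor.

```python
def amat_ns(levels, memory_ns: float) -> float:
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_time_ns, miss_rate) pairs, from L1 outward.
    memory_ns: access time of main memory (RAM).
    """
    time = memory_ns
    # Work inward from RAM: each level costs its own hit time plus its
    # miss rate times the cost of the next level out.
    for hit_ns, miss_rate in reversed(levels):
        time = hit_ns + miss_rate * time
    return time


# Assumed: L1 hit 1 ns with 5% misses, L2 hit 4 ns with 20% misses, RAM 100 ns.
print(amat_ns([(1.0, 0.05), (4.0, 0.20)], memory_ns=100.0))
```

With these assumed numbers, the average access takes about 2.2 ns instead of the 100 ns of going to RAM every time.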



New Mini ITX Motherboard/System Board

Below I’ve added some images to give you a graphical example of what all this geek-speak means visually. Predictions of Moore’s Law’s eventual collapse have sometimes been pushed back another decade; some academics believe the end is still 600 years in the future, while others believe that once we create a transistor the size of an atom, there’s nowhere else to go.

In 2003, Intel predicted the collapse would come between 2013 and 2018, which, based on this little exercise, seems accurate, although the numbers would be different if I added high-performance server processors, like the 18-core Haswell Xeon E5 with 5,560,000,000 transistors, or even the Xbox One’s processor with 5,000,000,000 transistors. However, I’m not sure it’s a fair assessment, because I believe the rules have changed. I don’t see the same central processing unit that managed a sophisticated system of integrated circuits and components 15 years ago. I see the SPU, designed to expedite the rapid development and deployment of the Internet of Things: smaller, faster, smarter computing devices, not personal computers.


Computing continues to evolve. It will be interesting to watch as we now slip the new “personal computer” into our pocket and/or purse.


Personal computing, circa 2000




Personal Computing, circa 2016 (can't get more personal than something you carry in your pocket)



Video game graphics, circa 1997 (Tomb Raider II)



Video game graphics, circa 2015 (Rise of the Tomb Raider). Tomb Raider is copyright 2016 Square Enix.



Video surveillance CCTV resolution 30fps (352 x 240 pixels CIF), circa 2005



Video surveillance CCTV resolution 30fps (2048 x 1536 pixels 3MP), circa 2016

There are a few devices on the market that can provide network failover and load balancing across multiple Wide Area Network (WAN) connections. The theory is that if you lose one of multiple WAN connections, the router seamlessly fails over to a secondary and/or third WAN connection to transmit and receive crucial data. It can also load balance bandwidth across all the connections, which, if using 4G LTE, can provide relief from heavy public usage and saturation.
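The failover and load balancing behavior described above can be sketched as weighted round-robin over whichever links are healthy. This is not how Cradlepoint or Pepwave firmware works internally (their algorithms and health checks are their own); it’s a generic illustration in which the link names, weights and `up` flag are hypothetical. Multi-WAN routers typically assign whole connections, rather than individual packets, to a link so that established sessions don’t break, so each slot below stands for a new connection.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Wan:
    name: str
    weight: int      # relative share of new connections, e.g. by bandwidth
    up: bool = True  # a real router would set this from a health check


def build_schedule(wans):
    """Weighted round-robin over the WAN links that are currently up."""
    slots = [w.name for w in wans if w.up for _ in range(w.weight)]
    if not slots:
        raise RuntimeError("all WAN links are down")
    return cycle(slots)


links = [Wan("cable", 3), Wan("lte-a", 1), Wan("lte-b", 1)]
schedule = build_schedule(links)
print([next(schedule) for _ in range(5)])

links[0].up = False                # cable fails: rebuild without it
schedule = build_schedule(links)
print([next(schedule) for _ in range(4)])
```

With all three links up, the cable connection takes three of every five new connections; once it fails, the two LTE links alternate.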


I've researched and configured a few devices, primarily focusing my efforts on the Cradlepoint MBR1400 and the IBR600, along with the Pepwave MAX700, across two locations. To configure these devices properly, you need at least two WAN connections.


My first location was an isolated cottage where, with some perseverance, I got AT&T to install a Uverse Internet connection for $40 per month. The infrastructure limited bandwidth to up to 12Mbps download (excellent for Netflix) and only 1Mbps upload (poor for surveillance video). It may not seem like a monumental task to get wired nowadays, but this isolated cottage was on a small, secluded island in the middle of a river. It was well worth the effort, because wired broadband connections are sold by infrastructure rather than by bandwidth or data usage, as cellular service is. The data costs for uploading (or downloading) security video over cellular can be astronomical. Even 4G LTE providers’ unlimited data plans do not cover uploading a continuous, bandwidth-intensive stream of synchronous security video. Experience has shown that at peak capacity, because it's a shared public network, bandwidth will be throttled, reducing quality and performance.


This cottage provided free rein to temporarily install anything on any pole or structure, free of city ordinances, codes, permits and privacy issues, as it was secluded and used infrequently. I installed four cameras for this failover and load balancing test case. Power was a challenge as usual, but nothing an upgraded circuit breaker and some trenching couldn't handle.


I decided to install various cameras for comparison: my analog Pelco Spectra IV PTZ workhorse, connected to a digital video encoder; a Hikvision 1080p PTZ; a Hikvision Darkfighter 1080p PTZ; and a fixed 3MP Longse IP camera with IR. Together, at optimal quality, these four cameras require a minimum of 10Mbps of upload bandwidth through a WAN connection for remote viewing. I had to trim the frame rates substantially to get that down to 5Mbps. So now, how do I get a 5Mbps elephant through an ATT Uverse 1Mbps fire hose?


Over the LAN, the Milestone XProtect server received the full 15fps at maximum resolution from every camera without any issues: the Pelco using MPEG-4, and the Hikvision and LongSe cameras using H.264. The video stream on the Pelco Spectra IV was set to the maximum analog 4CIF resolution and performed as expected: sharp, smooth and exceptionally good in low light. Next to the Hikvision 1080p HD PTZ stream, however, its image quality was found wanting. The Hikvision Darkfighter image was crisper, with far more detail in the area-of-coverage (a 1080p frame carries roughly six times the pixels of 4CIF), including better color correction and visible detail in leaves, grass and even wood grain, along with the best low-light performance without IR I've seen to date. However, the pan-tilt-zoom on the Pelco outshone the clunky Hikvision. This probably has something to do with the TCP commands for PTZ (versus UDP streaming for video) and the processing power needed to stream only 4CIF versus 1080p.


The 3MP LongSe camera was powered by a PoE injector (all the equipment is in climate-controlled LCOM enclosures). I have another one of these cameras, also powered by a PoE injector, at the other location, where its image quality is impressive, although its maximum frame rate is 15fps. Of course, that location has Comcast/Xfinity internet with 20Mbps of upload bandwidth.


The math shows that the connectivity issues were not switch- or power-related, but caused by the limited ATT Uverse 1Mbps upload speed, which barely delivered a single frame per second from the cameras and left the PTZ controls erratic. A 3MP camera at full resolution, using H.264 at 30% compression, needs about 900Kb for a single frame, so even one frame per second consumes about 900Kbps. A 1Mbps upload pipe doesn't leave much room for anything other than a single frame from the 3MP camera.
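The bandwidth arithmetic behind the 1Mbps bottleneck can be sketched as follows. It uses the figures from the text (a 1Mbps uplink, roughly 900Kb per full-resolution 3MP H.264 frame, and the 5Mbps the four trimmed camera streams still needed) and ignores protocol overhead and PTZ control traffic, so real-world numbers would be somewhat worse.

```python
UPLINK_KBPS = 1_000          # ATT Uverse upload, ~1Mbps
FRAME_KB = 900               # approx. size of one full-resolution 3MP H.264 frame, in kilobits
CAMERAS_NEEDED_KBPS = 5_000  # all four cameras after trimming frame rates


def max_fps(uplink_kbps, frame_kb):
    """Frames per second that fit through the uplink for a single stream."""
    return uplink_kbps / frame_kb


def oversubscription(demand_kbps, uplink_kbps):
    """How many times the demand exceeds the available uplink."""
    return demand_kbps / uplink_kbps


fps = max_fps(UPLINK_KBPS, FRAME_KB)
over = oversubscription(CAMERAS_NEEDED_KBPS, UPLINK_KBPS)
headroom = UPLINK_KBPS - FRAME_KB  # Kbps left for everything else at 1 fps

print(f"3MP camera alone: {fps:.2f} fps")
print(f"Four cameras need {over:.0f}x the available upload")
print(f"Headroom at 1 fps: {headroom} Kbps")
```

About one frame per second, with roughly 100Kbps to spare for the other three cameras and the PTZ commands, is consistent with the single frame per second and erratic controls described above.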


Obviously, this seemed like a perfect location for some load balancing and failover testing using a 4G LTE network or two. I started with the Cradlepoint AER2100 Multi-WAN Router, which uses swappable USB modems and has exceptional horsepower, but its 8" x 10" x 2" size wouldn't fit into the enclosure. The smaller COR IBR600 was the right size, but it included an integrated modem, limiting flexibility. The Cradlepoint MBR1400 provided a smaller form factor and swappable USB modem ports. However, unlike the AER2100 or IBR600, the MBR1400 wasn't a hardened device with extended temperature specifications, so I had to upgrade the enclosure for extended temperatures.



Cradlepoint MBR1400

I would’ve liked to use the Pepwave MAX700, but I’ve grown too fond of it in my home lab for testing platforms and devices. The MAX700 includes two separate wired WAN inputs and up to four USB modem inputs.



Pepwave MAX700

Below is the configuration, using ATT Uverse as the primary WAN connection, with an ATT 4G LTE connection and a Sprint 3G/4G connection (limited bandwidth in the region) for failover and load balancing. The ATT 4G LTE and Sprint 3G/4G connections are configured as “on-demand,” which limits data usage to only when needed.
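The “on-demand” behavior can be modeled as a simple policy: the wired primary always carries traffic when it is up, and the cellular links are brought up, in failover order, only when the primary is down or the requested upload exceeds what the active links can carry. This is a hypothetical sketch of the idea, not Cradlepoint's configuration syntax or algorithm; the per-link capacities are illustrative assumptions.

```python
def links_to_activate(demand_kbps, primary_up, primary_kbps, on_demand):
    """Pick which links to use for a given upload demand.

    on_demand is a list of (name, capacity_kbps) tuples in failover order.
    The primary is used whenever it is up; on-demand links are added one
    at a time until the combined capacity covers the demand (at least one
    link is always activated if the primary is down).
    """
    active, capacity = [], 0
    if primary_up:
        active.append("primary")
        capacity += primary_kbps
    for name, kbps in on_demand:
        if capacity >= demand_kbps and active:
            break  # current links are enough; leave the rest idle
        active.append(name)
        capacity += kbps
    return active, capacity


# Hypothetical upload capacities for the two cellular links
ON_DEMAND = [("att-lte", 4_000), ("sprint-3g4g", 1_500)]

print(links_to_activate(500, True, 1_000, ON_DEMAND))    # light use: wired only
print(links_to_activate(5_000, True, 1_000, ON_DEMAND))  # all four cameras: add ATT LTE
print(links_to_activate(5_000, False, 1_000, ON_DEMAND)) # wired unplugged: both cellular links
```

Keeping the cellular links idle until demand or an outage calls for them is what holds cellular data usage (and cost) down while the wired connection does most of the work.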


Cradlepoint configured with three load balancing and failover WAN connections


The WAN bandwidth demand jumps when I stream from my Milestone video management server to view the cameras remotely, especially when viewing all four cameras at once. Now, thanks to load balancing, I’m able to get crisp imagery with smoother control of the PTZ cameras.

Data usage example chart

Incidentally, when I tested the failover feature by unplugging the ATT Uverse wired connection, it took me a while to realize that it had worked. It was that seamless.


Remote Live View of cameras using Milestone XProtect Web Client

Anthony Caputo


Posted by Anthony Caputo Employee Oct 31, 2015

It was a decade ago that I started diving into digital video surveillance products and technologies. I've been into photography since I was a teenager, when I purchased my first single lens reflex (SLR) camera. It was a Canon FTB with an f/1.4 lens. It wasn't new, as they were expensive, but it was new to me. It was an all-metal, mechanical workhorse whose only electronics was a light meter. I worked at a neighborhood camera store as a teenager, where they promoted me to Assistant Manager for a while before I went off to college. It was my first introduction to the world of optics, photography and cameras.


Cameras were very different back in the 1970s. They were heavy metal and used film for taking pictures. There was no instant gratification as with digital photography today. As a photographer, because of all the variables in lighting, movement, f-stop, aperture and shutter speed, you shot many rolls of film hoping that a handful would turn out just right. Although I did freelance photography while going through college (even had a darkroom to develop pictures), my fascination was the cameras. These technical marvels had the power to capture a moment in time. We had a bin at the camera store of broken, discarded cameras and I would dissect them. Eventually, I was able to successfully reassemble them, too.


One day, a woman came in and wanted to buy the new electronic Canon AE-1. She had an older Yashica SLR she wanted to trade in, but she said it stopped working. I examined it. It was in perfect condition, so she probably didn't use it much. Many people bought fancy SLRs but rarely used them as they were too heavy and complex for family photos.


The shutter release button didn’t work and the film advance lever was stuck. If it had been a Canon, a Nikon or even a Minolta, it might have been worth sending in for repair, but a Yashica? I saw its future in the discarded camera bin. She mentioned she didn’t like it and wanted something easier to use, so she bought the Canon AE-1 and was happy with the $20 I gave her in trade for her broken Yashica.


After the holiday rush, I came across that Yashica in the old camera bin and decided to dissect it. I was determined to fix this one, as it was otherwise in perfect condition. I laid out a white piece of paper and some two-sided tape (to hold the tiny screws), and when I got down to the mechanical gears that advanced the film cartridge, I discovered a tiny loose screw wedged into a gear. I removed it, reassembled the camera, and it was as good as new.

You’re probably wondering what all this has to do with Pelco, the one-time king of industrial-strength, commercial-grade video surveillance pan-tilt-zoom (PTZ) cameras. I recently had the privilege of visiting the factory in California, where I watched them fabricate and mold metal, solder printed circuit boards and assemble cameras for shipment. They may have been dethroned by Axis and Sony, but in 2006, when I worked on the homeland security camera deployment for the City of Chicago, Pelco was the standard. Their $3000 analog Spectra IV PTZ camera is still a marvel of optical and mechanical technology.


One ended up in the “junk bin” at the office one day back then. I inquired about it and they informed me it was broken and not covered under warranty. It appeared one of the installers wired the power wrong and fried it.


“I’ll take it,” I said, and IBM let me have it. It isn’t every day that you get to dissect a $3000 Pelco Spectra IV.


When I disassembled it, I admired the industrial design and quality. I found my way to the printed circuit board on the back box, where I noticed a blown fuse. I replaced the fuse, reassembled the camera, and it booted up as good as new.




Over the years, this camera has survived being used for imagery and testing for my Digital Video Surveillance and Security book, a divorce, three moves, nine Chicago winters with a few snowstorms, 100+°F August days, and even a flood. The Pelco Spectra IV itself was safe 18 feet up a pole, but its power supply and video encoder fried while underwater.


This past weekend, we closed our cottage, where the camera has resided for the past couple of years. I moved it from that pole to the back of the garage. While I was single-handedly installing it, the camera module popped out of its back box and housing and fell 15 feet onto the leaves and rocks below. There was no dome to catch it. During my factory visit, Pelco gave me a brand-new clear dome, but I didn’t attach it because I did not want to scratch it during installation. Rookie move. (They really need to get away from the snap-in tab design and move to screws like everyone else.)


My heart sank as I watched the camera fall in slow motion, hit the ground and bounce like a basketball. I tried the camera and, of course, it didn’t work. I don’t know of many electronic devices that can survive a 15-foot fall. It was an hour or two later, after the initial shock subsided, that I decided to dissect it and see if I could do something. After all, this was the mighty Pelco Spectra IV. A piece of plastic had broken off around the lens. Not a good sign.


When I got to the internal PTZ mechanism, it seemed fine. All the gears and belts were still linked together and working smoothly. That’s when I noticed the sensor printed circuit board had popped out of its bracket and was no longer connected to the back box mount. With some finesse, I was able to lock it back in place. I reassembled the module and plugged it back into the back box inside the outdoor enclosure (and quickly added the dome).


I powered the camera up and it came back to life. I smiled. Made in America, and even after a 15-foot drop, it takes a licking and keeps on ticking.


Let’s hope the same holds true for Pelco as a company.

Traditional security command centers included multiple closed-circuit television (CCTV) monitors that provided centralized surveillance, giving security operators the ability to expand their limited physical area-of-coverage. Instead of patrolling a select location, a security operator could now split their attention across multiple locations, panning back and forth between different areas-of-interest from the comfort of a chair. They could apply their keen eye on a wider scale, but at the cost of a more fragmented approach. Active surveillance works better when there's little activity; a crowded venue, or a crowded CCTV monitor, creates a natural blindness, because we capture only snapshots in time of the activity. We are only human. Analytical technologies have made it possible to become superhuman.


The linear experience of television made it simple: all we needed to do was sit back and watch. As an entertainment medium, television delivered information, but there was no interaction. Television changed society by informing people about what was going on around them and around the world. Just as society became smarter with the information provided by television, security personnel became smarter with CCTV.

Monitoring and security professionals are trained to be alert, observant and smart, because monitoring security cameras is not linear; it was the beginning of our new interactive media world. When security professionals observed suspicious activity, or had it reported to them by radio, they needed to evaluate and respond immediately, based on their experience, their training, and internal security policies and procedures for risk assessment, criticality and effectiveness.

However, there has always been an inherent inefficiency in ‘human eye’ surveillance of multiple monitors presenting linear video, because of human nature. We are easily distracted and are not wired to absorb every single detail of a linear video presented to us; studies show that the effectiveness of an operator monitoring two security cameras drops by 95% after about 22 minutes. This is due to a number of issues, from environmental ergonomics to behavioral factors such as being bombarded by information overload.

Many of the command and control centers I've seen (from homeland security to schools to major corporations and transportation hubs) include multiple video walls; multiple workstations with multiple monitors, all running a plethora of applications, from video management system software to incident reporting, 911 Emergency, intrusion detection, fire alarms, elevator alarms, access control, perimeter intrusion, and panic/duress alarms; and multiple televisions playing 24-hour news (still informing us).

The world has changed dramatically since the invention of CCTV, which is nothing more than a linear collection of silent moving images. Security operators cannot absorb the necessary information fast enough, because everything moves faster and there are too many monitors and not enough trained eyeballs to decipher their meaning. Monitoring operators typically work four- to eight-hour shifts, and with hundreds of cameras, an active approach to surveillance becomes cumbersome and costly, and is still plagued by the basic fact that human beings have a difficult time monitoring live video feeds for extended periods.

Dr. Richard Mayer, a psychology professor at the University of California, has done extensive research on cognition, instruction, and technology in multimedia learning, and proposed a 'cognitive theory of multimedia learning.' He wanted to replace the behavioral perspective on multimedia instruction (what classrooms have been like for most of the last century) with a more cognitive and constructivist approach. The behavioral perspective sees students as passively absorbing new knowledge through practice activities and memorization, while a cognitive and constructivist approach is more like real-life experience, or 'interactive.'

When an animation about how a bicycle tire pump works was presented concurrently with systematic narration, students significantly outperformed those who just read a textbook. Additionally, with spatial contiguity (printed text placed near, or integrated with, related pictures), students showed significantly better recall and faster problem solving than those who just read a textbook. When reading a book, or watching silent video imagery, you're using a single channel of data consumption. When seeing imagery, listening to narration, and reading words together, as in a graphical user interface, you've opened up three channels of data consumption. Theoretically, you can absorb the information three times faster, or, in the case of a security operator, make a decision three times faster.

Designing a centralized, interactive, intelligent command and control center integrates all security assets, available application technologies, alerts and alarms, and then presents them to the security operator faster, through multiple communication channels to provide them with the information they need to make the right decision at the right time. The convergence of media, communications and data analytical technologies creates a multimedia command and control center using analytical intelligence to provide security operators with a brain center formulated to immediately teach them about active and previous incidents, their level of severity, and give them the information they need to sit up and take rapid notice, and be superhuman.

For example, while the security operator in charge of a few select monitors is watching the latest crisis on CNN, or reading a text, a backpack is left behind at a train station. Analytical software continuously analyzes every frame of every video stream being presented and/or recorded. The threshold is two minutes from the moment the owner and the backpack separate. If the owner doesn't return within two minutes, the software generates an alert, which initiates a series of events that trigger an alarm. That alarm includes a range of multimedia elements, including changing the lighting in the command and control center to red, signifying that an alarm has been triggered. An audio alert is activated because the analytical software has identified the backpack owner as having left the station on a train. On the far-left video wall, alarm tiles of video footage are presented: one from two minutes ago, one with live footage, another from a PTZ camera zoomed in on the backpack in question, and yet another following the backpack's owner onto the last train leaving the station. An interactive map on the video wall shows all other security cameras, the GPS coordinates of all mobile security vehicles and officers in the general area, and the trains leaving and entering the station.

Meanwhile, facial recognition has identified the backpack owner by cross-referencing the face capture with worldwide databases, and has presented his identity not only on the video wall, but to each mobile unit in the vicinity (smartphone, tablet, laptop). Thanks to the multimedia alarm, the security operator has taken notice, spoken briefly with his supervisor, and clicked the alarm function on his workstation to send a message to rail dispatch to stop the train the backpack owner is on, along with all trains scheduled to enter the station. After quickly reviewing all the data before him, the supervisor approves an evacuation of the station via the two-way audio paging system built into the cameras. The bomb squad enters the station shortly thereafter. This superhuman system can also provide automated audit trails, trend analysis, effective risk management in an ever-changing environment, statistical data, incident reporting and budgetary risk assessments.
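The two-minute abandoned-object rule in this scenario boils down to a timer that starts when an owner and a bag separate and is cancelled if the owner returns. The sketch below assumes an upstream video-analytics engine already detects and tracks people and objects and reports separation and reunion events; the class and method names are hypothetical and do not represent any particular product's API, including Hitachi Visualization Suite.

```python
from dataclasses import dataclass, field

THRESHOLD_S = 120  # two minutes, as in the scenario


@dataclass
class AbandonedObjectRule:
    """Hypothetical rule: alarm when an object stays separated from its owner too long."""
    threshold_s: float = THRESHOLD_S
    separated_at: dict = field(default_factory=dict)  # object_id -> time of separation
    alarms: list = field(default_factory=list)        # (object_id, alarm_time) pairs

    def on_separated(self, object_id, t):
        """Owner walked away from the object: start its timer (if not already running)."""
        self.separated_at.setdefault(object_id, t)

    def on_reunited(self, object_id):
        """Owner came back: cancel the timer."""
        self.separated_at.pop(object_id, None)

    def tick(self, now):
        """Called periodically; raise an alarm for objects left alone past the threshold."""
        for object_id, t0 in list(self.separated_at.items()):
            if now - t0 >= self.threshold_s:
                self.alarms.append((object_id, now))
                del self.separated_at[object_id]  # alarm once per separation
        return self.alarms


rule = AbandonedObjectRule()
rule.on_separated("backpack-1", t=0)
rule.on_separated("suitcase-7", t=10)
rule.on_reunited("suitcase-7")  # owner returned within the window
rule.tick(now=60)               # nothing yet: only 60 seconds have passed
print(rule.tick(now=125))       # backpack-1 has now been alone for over two minutes
```

In the scenario above, an entry in the alarm list is the point where the downstream multimedia actions would begin: red lighting, the audio alert, the video tiles and the interactive map.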

By making today's technology work for you, instead of working around the technology before you, you can make important decisions within minutes, saving lives, property and time, because in an ever-faster-moving world of video clips and Twitter, time becomes even more crucial.

This is what Hitachi Visualization Suite is all about.