Anthony Caputo

Hitachi Video Analytics Test Results

Blog Post created by Anthony Caputo Employee on Nov 24, 2018


Before we discuss digital video analytics I need to explain, as painless as possible, why the following examples have inspired me to write this post. You see, I’ve been working with digital imagery and video since the 1990s and I’ve come to understand that the image presented on your screen is made up of digital pixels. In the digital world of absolute mathematical equations, pixels are not measured in dots of Cyan, Magenta, Yellow and Black, like the offset printing process, but rather in bits and bytes. A digital pixel represents visual color. There are 8-bits (1 byte) in a black and white image and 24-bits for a color image (1 byte each for Red, Green and Blue). So, each pixel contains 256 shades of gray (for black and white) or 256 shades of Red and 256 shades of Green and 256 shades of Blue, or 16,777,215 colors for a color image. If you’re wondering what happened to the Black in the transition from CMYK in print to the RGB of pixels, mix Red, Green and Blue paint together, and see what you get – black. The richness of the blacks are also defined by brightness and contrast in the digital world.

This is why your 1080p television looks so much sharper and more colorful than that old CRT television, because the digital image has more pixels to pick up more detail and color variables. However, more pixel depth doesn’t make a smarter camera, only a better quality image.

Now that you understand how the IP camera image processor captures visual images in the analog world, the next step is motion. Digital motion pictures is achieved the same traditional way Thomas Edison achieved motion back in 1901, with frames per second. The rapid succession of multiple snapshots of the field of view captures the color changes at a rate per second providing the illusion of movement on screen.

The real magic of digital video is the compression and decompression (Codec) algorithms. These codecs analyze motion within the multiple frames and dissects them into blocks, categorizing them into special frames and data for transmission. This is a necessity for the transmission of digital video because transmitting full 1080p frames per second (MJPEG) requires about 31 Mbps bandwidth (yes, thirty-one megabits per second), versus the H.264 codec, which can transmit the same quality imagery using only 2.5 Mbps. Further details on Codecs isn’t necessary for this post, but only to explain that Codecs do not care what is moving within the digital image to encapsulate that movement within its macroblocks. It’s only function is to shrink the video stream for transmission and populate less storage space when recording.

Digital pixels identify color. Multiple frames creates the illusion of motion. Codecs just shrink it for transmission and storage. The fact of the matter is, IP cameras are not very smart. They do not know what they are “seeing.” They do not know what is moving; they just capture, replicate and transmit. They don’t know the difference between blowing snow and a person walking across the scene. This is why video analytics systems have failed in the past, because software only cares about the pixels so you’re limited in trying to understand what is actually being “seen.”

Traditionally, analytical software is limited to the data received from these IP cameras, and so they analyze pixels (color), motion (FPS) and once calibrated, begin to understand a difference between something that’s 10 pixels and 50 pixels in size, calculate the time between frames and determine that the 10 pixels maybe a person walking and the 50 pixels is a car speeding, if its calibrated as such. The moment the lighting changes (which changes the color), or that person opens a giant umbrella, or that car slows down, it needs to be able to categorize shapes in order to remember that, “wait, that’s still a car.”

So you see, when I was assigned the task of testing and creating demonstration samples for Hitachi Video Analytics Suite, I was quite apprehensive in accepting the project. I envision hours of frustration ahead of me because IP cameras and software are not that smart. I wanted the killer app (analytics) to be that smart. I envisioned re-purposing the tens of thousands underutilized security IP cameras into Smart City sensors.

HVA not only surprised me, it impressed me. One of the first examples I created is below. When I realized HVA Object Detector could be calibrated to ignore moving objects, I remembered a use case from a decade ago that involved sending a real-time alert if there was a stalled vehicle or person at a railroad crossing. I recalled it took a freight train over a mile to stop and cost millions of dollars a day for delays, let alone the liability. HVA Object Detector ignored all movement, including any cars crossing the tracks and sent an alert when the person fell on the tracks

Watch Video

HVA Intrusion Detector includes a built-in filter for weather conditions. I inadvertently performed a test comparison between the analytics built into a camera and HVA by tapping into a video stream from a backyard camera which I had configured with its built-in analytics. The only method of calibration and configuration for the built-in analytics was adjusting its sensitivity. Although all the false positives from animals made me realize what a jungle the neighborhood was (squirrels, cats, raccoons, possums), I eventually disabled the built-in analytics, as I was sick of getting email alerts with snapshots of rain and snow. After a while, the continued reducing of its sensitivity doesn’t alert you to anything but the huge afternoon shadows that cause dramatic changes in pixel color. Absentmindedly, I did notice that I didn’t receive any false positives from the HVA Intrusion Detector, ingesting another RTSP stream from the same camera. That’s when I decided to create the example below. Simple area protection configuration, taken during snow fall. HVA ignores the snow, and the squirrel running around, and only alerts me when the person walks into the frame.


Watch Video

HVA knows what snow is. The intelligence behind the snow, rain, haze and fog filter that’s built into HVA Intrusion Detector is also available in the HVA Video Enhancer module. Impressed, I decided to give it an even bigger challenge. How about a Chicago-style snowstorm? Analyze This! To the left is the actual footage, crazy windblown snow creating white out conditions. It gets to the point at the end of the clip that there’s so much snow, it tricks the camera back to color mode, thinking it was daylight. The clip to the right is the sample video processed through HVA Video Enhancer, which now can be ingested into other video analytic modules for better accuracy and performance.


Watch Video

HVA really does know what snow is. The HVA Intrusion Detector sample clip below is configured for Perimeter Intrusion. A person must walk from the green zone into the red zone in order to be recognized as an intruder. Even though I configured the zones to be the same size, HVA’s ability to recreate a three-dimensional space from the two-dimensional image, it understands perspective so it recognizes that the figure attempting to enter the facility is 1.8 meters tall, and an intruder at each door.


Watch Video

A unique and very effective module is the HVA Privacy Protector, which enables the ability to protect the privacy of individuals and still allow for video monitoring for safety and security. I configured the HVA Privacy Protector example below with a couple layers. First, I wanted the ATM to always be pixelated, to protect PIN numbers, and the vehicles on the street, to protect license plates. Although HVA Privacy Protector is engineered for static fixed camera views, noticed how the persons-of-interest are still fully pixelated even when standing still? This stream is now available for input into other systems and/or analytics, such as Intrusion Detector or Object Detector while still protecting the privacy of individuals. The secured archived footage can only be seen by authorized personnel with the correct security clearance. You can even add a second layer of security using a Smart Card and transaction authentication number (TAN) for protection.

Watch Video


I created over a hundred test samples for all the HVA modules (listed at the end). HVA is impressive because each module has its own analytical engine, engineered to do that specific function. It’s not one pixel analyzer, and movement calculator that was built upon to do something more than its core capability. HVA also recreates three-dimensional space from a two dimensional video image and then adds the 4th dimension (time) for improved performance. You can also calibrate length of its 3D learning phase and each scene with multiple illumination states – day, night, afternoon, which also improves its performance and accuracy. It really does add more intelligence to cameras and I've tried it on many different types from a generic low-end bullet camera to the popular Axis cameras (including the panoramic), to the top of the line Thermal camera.I could go on with other samples, but you get the idea. I was apprehensive at first, but I’m excited to have been a part of this new technology release, and the thought that my dream of the analytics killer app for Smart City has finally become a reality. The Hitachi Video Analytics Suite:

  • Activity Visualizer
  • Camera Health Monitor
  • Face Collector
  • Intrusion Detector
  • License Plate Recognizer
  • Object Detector
  • Parking Space Analyzer
  • People Counter
  • People Counter 3D
  • Privacy Protector
  • Queue Detector
  • Traffic Analyzer
  • Vehicle Counter
  • Video Enhancer