
[Painting]

When you look at the painting above, you see richness, depth and life, even within a still two-dimensional image. That's because an artist can see what ordinary people fail to see and is able to represent that elusive vision in a visual medium using just paint and canvas. These artistic nuances take the limitation of vision to a higher level of perception. The artist learns this deeper level of visual intelligence from an appreciation of the natural beauty and design around them: a fascination, fueled by an emotional desire to really see, to understand the universe around them, and to share that unique vision in a simplified two-dimensional design using a richer palette of vibrant color, geometric shapes, tones and perceived depth. An artist can see beyond mere pixel color, because they have multiple senses and sensory fusion.

Although human beings only begin to learn about the visual world through the power of sight after birth, we begin understanding even before birth, through multiple senses such as sound, touch and smell, and through embedded genetic code that, for example, gives us the ability to identify more shades of green than any other color, so we can spot a predator hidden in the jungle. We already have metadata to draw from. Infants may be limited to frowning, smiling and crying to communicate, but they are sponges for knowledge, using their unique multi-dimensional channels for learning. An infant knows their mother's heartbeat, taste and voice; recognizes the deeper baritone voice of their father, his larger, stronger hands and even his smell. Their connection to the world around them begins small and grows as they grow. They eventually understand words from sounds, not from the written word; that comes later, along with the magic of mathematics, when we jump-start the left side of the brain and its formulaic language, opening up yet another channel of learning.

We come to understand that the image, person, video, voice, sound, touch, heartbeat, smell and the word “Momma” all mean the same thing. Our continued intelligence and learning draw from all our senses, all our lives. We eventually learn whether shadowy figures in hoodies are a threat or just a couple of kids who forgot their coats, by how they’re standing, talking, moving, whispering, and even how they smell. Is it a sixth sense that provides this threat assessment, or a lifetime of multi-sensory metadata stored within our brains? How do we know when someone is looking at us from across a room full of people, and not at the person behind us?

Intelligence, as we understand it, is developed from a correlation of multiple senses, an intuitive connection to the universe around us, and life experience. We learn by sight, sound, smell, touch, taste and maybe even an intuitive, spiritual sixth sense of our surroundings. This happens while we are awake, and even in our dreams.

 

An artist draws from that inner space to push themselves to create. They don’t see a shadow as merely a grayed, two-dimensional blocking of light; they also see the colors and life within that shadow. An artist can see life, even in darkness.

What happens to intelligence when the only sense available is sight, and not even sight as we understand it, but the sight of a newborn infant opening their eyes for the very first time?

An infant may be overwhelmed and confused while absorbing data from a three-dimensional world in awe and wonder. If we remove the human elements and feelings, there would only be data absorption. An infant with all five senses may recognize their mother only by her heartbeat, voice or smell. An artificial infant (our A.I.) with only the ability of sight, without any self-knowledge (metadata), and without desires and emotions fueling a drive to learn, to understand, to survive, to think (just the left-brain formulaic language of ones and zeros), would simply continue its data absorption, collecting metadata from the digital pixels of moving objects across multiple frames per second. This miracle of sight is also limited: with only one eye, it cannot observe in three dimensions; it only compiles data for a flat, two-dimensional representation of the analog world.

Without nine months of sounds, warmth or any pre-programmed genetic code, once it's powered up for the first time (its birth), our A.I. is at a disadvantage. A larger, more powerful image processor (its retina) may provide better sight, even after darkness falls. Still, at this point, all our A.I. can do is capture pixel color data and observe motion: pixel blobs moving around its field of view like atoms forming molecules. It needs to be trained, to learn what these random bits and bytes are beyond color and motion, even within its limited, single-frame field of existence.

Artificial intelligence (AI) is defined as the capability of a machine to imitate intelligent human behavior. We’ve already identified that human intelligence is multi-sensory with hereditary pre-programmed metadata, so is it possible for a one-eyed artificial infant who cannot listen, let alone hear; speak, let alone ask why; or even smell and taste to develop true intelligence?

Almost all of the video analytics software applications I’ve recently tested include the ability to teach perspective, to recreate the three-dimensional world from a single one-eyed, two-dimensional field of view. The field of view is fixed, and metadata is created to identify that a ten-pixel object moving at twenty kilometers an hour, 100 meters away, is the same type of object as a 200-pixel object moving at twenty kilometers an hour, two meters away (distance must also be taught). That object can then be either automatically or manually categorized as a “vehicle,” so the artificial infant (machine) can answer the question “What is a vehicle?” A vehicle is categorized only by estimated size, shape and speed (calculated from motion between frames).
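As a rough illustration of that idea (not any specific product's method), the sketch below assumes a hypothetical, manually taught calibration table of pixels-per-meter at a few known distances, and uses it to convert a blob's pixel size and speed into real-world units before applying a crude size/speed rule:

```python
# Hypothetical sketch: classify a moving pixel blob as a "vehicle" once its
# pixel measurements are converted to real-world units via a taught
# perspective calibration. All numbers are made up for illustration.

# Calibration: at a given distance from the camera (meters), how many
# pixels span one meter in the image. Taught manually per field of view.
PIXELS_PER_METER_AT = {2: 100.0, 50: 20.0, 100: 5.0}

def pixels_per_meter(distance_m: float) -> float:
    """Linear interpolation between the taught calibration points."""
    points = sorted(PIXELS_PER_METER_AT.items())
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if d0 <= distance_m <= d1:
            t = (distance_m - d0) / (d1 - d0)
            return p0 + t * (p1 - p0)
    return points[-1][1]  # beyond the farthest point, reuse its value

def classify(blob_width_px: float, speed_px_per_s: float, distance_m: float) -> str:
    ppm = pixels_per_meter(distance_m)
    width_m = blob_width_px / ppm               # estimated real-world width
    speed_kmh = (speed_px_per_s / ppm) * 3.6    # m/s -> km/h
    if width_m > 1.5 and speed_kmh > 5:         # crude, made-up thresholds
        return "vehicle"
    return "unknown"

# A 10-pixel blob 100 m away and a 200-pixel blob 2 m away both resolve to
# roughly the same real-world size moving at roughly 20 km/h.
print(classify(10, 27.8, 100))    # vehicle
print(classify(200, 556.0, 2))    # vehicle
```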

In time, much like a newborn baby, it will continue to collect data and become smarter, or, if it's a multi-dimensional analytical algorithm, it may capture additional metadata beyond just “vehicle,” such as color, shape, size, even vehicle type, and store that information for the future: mounds and mounds of categorical metadata that can be used for forensic searching, real-time alerting and, more importantly, building knowledge.

That’s how the typical artificial infant, with machine learning (and machine teaching) software, can identify a vehicle. That new-found knowledge, unfortunately, becomes irrelevant the moment the artificial infant changes its field of view by turning its head (panning, tilting and zooming). Now everything it’s been taught and has learned is out of context, because its knowledge is limited to understanding moving pixels, not objects.

A young child knows what an automobile looks like by identifying the simple geometric shapes that make up an automobile. We also learn very early that automobiles are on the street and we (people) should stay on the sidewalk. The street is where we see all the larger parked and moving vehicles. They should never come on the sidewalk, where all the people walk. This is something a human being learns before pre-school.

A person is more than just a mass of moving pixels.

People have a head, shoulders, arms and legs; even a child knows that. We, as human beings, understand the concept of objects and shapes, even when panning, tilting and zooming, from the moment we first discover how to turn our heads, reinforced by a lifetime of viewing the world through television.

Our A.I. must also go to art school, to draw the environment and classify objects by something more than moving blobs against a fixed background that is otherwise ignored. Identifying simple shapes, patterns, perspective and a horizon line can give our A.I. additional input and additional intelligence. Train the machines to draw, to create, to wonder in awe at the dynamism of the analog world around them, through deep learning neural networks and sensor fusion.

A deep learning neural network can provide the ability to draw “digital stick figures” through edge detection, introducing the shape of a person, with a head, arms and legs, at any angle or size.
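A minimal sketch of the edge-detection piece, assuming OpenCV and a hypothetical local image file (the stick-figure step would require a trained model on top of this):

```python
# Minimal edge-detection sketch (assumes OpenCV and a local "frame.jpg").
# Canny edges give the outlines a deep learning model could learn shapes
# from; this is only the first step toward "digital stick figures".
import cv2

frame = cv2.imread("frame.jpg")                    # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # drop color, keep structure
blurred = cv2.GaussianBlur(gray, (5, 5), 0)        # suppress pixel noise
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

cv2.imwrite("edges.jpg", edges)                    # white outlines on black
```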

Identifying the “street” can teach it “traffic,” and that knowledge can be stored and retained when the field of view changes. The same goes for a sidewalk, where people walk, run and jog. Using predictive analytics, it can extend the simple tracing of the field of view and start drawing and tagging the environment beyond what is initially visible.

Every time the A.I. is panned, tilted or zoomed, it can validate its predictive analysis, verify and extend its reach to learn the world around it, and store a 360-degree view of its present environment, no matter where it may be “looking.” Just like a human being.

This can be accomplished by modifying an existing algorithm: training the deep learning neural network to understand the expected environment using stationary markers, and even the shadows cast by the sun to understand cardinal directions. Since most off-the-shelf video analytics solutions ignore the "background" and focus mostly on moving pixels, this would require a deep analysis of the background to understand the environment, and then identifying objects at whatever focal length is required, building a thorough understanding of the 360-degree world around them.

Our A.I. also needs to start asking questions.

Where am I? What is this? What is this structure? Why is that truck double-parked?

This information may be difficult to achieve using the limited power of one-eyed sight and pixel color, so, much like a human being, our A.I. would need other input channels of data absorption. The A.I. across the street, with its own metadata from a completely different vantage point, can be stitched in to help better understand the objects and the environment they share. Our A.I. now has two eyes for better depth perception; although it is still reading two-dimensional imagery from two points of view, it is opening additional channels of data absorption and beginning multi-sensory fusion. Stitching together data from varying angles and sources could provide super-human abilities.

Fusing data from fire alarms, burglar alarms, 911 emergency calls and gunshot detection will give our A.I. the miracle of hearing, without ears. It can smell the ozone or heavy pollen in the air through weather data, without a nose.

A 3D LiDAR Sensor can truly recreate the three-dimensional world using pulses of infrared light, giving our A.I. the ability to “feel” real world shapes and actual distance. A digital method of recognizing a physical object by touch.

Once an object is out of our A.I.'s field of view, it can follow without feet, by connecting with neighboring A.I. sources.

The question “Where am I?” could be answered by learning how to read, using Optical Character Recognition (OCR). If OCR can read the letters off a billboard behind home plate on a live broadcast, it can certainly read street signs, the sign on the bank across the street, or the FedEx logo on a double-parked truck.
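As a hedged sketch (using the open-source Tesseract engine via pytesseract, not any particular product's OCR), reading the text off a cropped sign image might look like this:

```python
# Sketch: read the text off a cropped street-sign image with Tesseract OCR.
# Assumes pytesseract and Pillow are installed and Tesseract is on the PATH;
# "street_sign.jpg" is a hypothetical crop of the sign in the field of view.
from PIL import Image
import pytesseract

sign = Image.open("street_sign.jpg")
text = pytesseract.image_to_string(sign)
print(text.strip())   # e.g. "8th Street NE" if the crop is clean enough
```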

If the street sign reads “8th Street NE,” and reading the business signs also helps identify the surrounding environment, our A.I. knows it is at the intersection of 8th and H Street. How do I know this? I asked the teacher, or in this case, an integrated geospatial mapping application. Integration with outside sources and sensors would provide the additional inputs our A.I. needs to really build intelligence, much like a human being: when we do not know, we ask.

Deep learning and sensor fusion grace our A.I. with the ability to hear without ears, smell without a nose, touch without hands, read without first learning the alphabet, and even ask questions like “Why?”

Why am I here?

The answer to this question for our artificial infant is far simpler than when a human child eventually learns the word “why.” At some point in the development of intelligence, sight, sound, smell, taste and touch are not enough; we need to ask why.

I recently read about how artificial intelligence is being developed to play games. In one article, the A.I. was given the means to slow its opponent down by shooting a virtual laser beam that momentarily discombobulates them. The A.I. became more aggressive when it was losing and resorted to repeatedly blasting the opponent with laser fire. If we teach AI to be competitive, even in something that appears as innocent as a game, maybe we should also program in the idea that it's not whether you win or lose, but how you play the game that matters. Really? In any competitive game, no one likes to lose. The goal is always to win. So really, it's just a war game. Do we really want to teach the machines to win at war, at all costs? Has anyone ever seen the movies WarGames or The Terminator?

We should teach artificial intelligence not to be a chess master (chess being a game of war), but instead to be an artist: to learn the visual complexity of the analog world and observe the dynamism of life that human beings overlook and take for granted, all made up of colors, shapes, light, patterns, shadows and natural wonder. The miraculous, simple natural beauty of the analog world.

It’s my belief that there cannot be true artificial intelligence until the digital world fuses with the analog world, and that requires the miracle of sight.

Teach the machines to create as an artist, and they will see what we cannot see.

 

My deep dive into the introduction of a core suite of Hitachi Video Analytics (HVA) last year, as presented in my article Digital Video Analytics - Test Results, opened the door to a flood of inquiries and use cases. Fortunately, hundreds of testing exercises gave me insight into how the analytical engine for each module could be calibrated for specific use cases, and into whether something within our expanding portfolio could meet a given requirement, including the introduction of the Hitachi ToF 3D sensor, which improves HVA by delivering a video stream that is completely unaffected by lighting, shadows and color variances.

In a continued effort to expand our video intelligence capabilities and flexibility, I tested several video analytics products, including licensed and third-party commercial-off-the-shelf (COTS) solutions as well as our own proprietary solutions, against a number of specific use cases, using different variables in camera distance, angle, focal length and lighting. This exercise uncovered valuable insights into the differences between analytical engines and the limitations in how algorithms read pixel color, contrast and frames per second. It confirmed that one size does not fit all, in terms of analytical capabilities, flexibility and solution architecture.

A popular question was “Can HVA Object Detector send an alert if it recognizes X?”, with “X” being a classified object (e.g. gun, knife, box, etc.). HVA Object Detector identifies objects based on calibration variables, which include a static object identified by the number of overall or connected blocks (pixel size) within the restricted zones, using machine learning. HVA Object Detector learns the objects within a zone, without the need to classify them, and continues to learn through variances in illumination states and motion. That’s all it does, and it does it very well, so it can cover a wide variety of use cases, but not all of them. Object recognition and identification require deep learning, even when using the Hitachi 3D LiDAR sensor (more on this later).
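HVA's internals are not public, but a generic way to flag a static object appearing in a zone is to compare a long-memory background model against a short-memory one: an item left behind stays foreground in the long-memory model while fading into the short-memory background. A rough OpenCV sketch, with made-up parameters and a hypothetical test clip:

```python
# Generic "object left behind" sketch (not HVA's algorithm): compare a
# long-history and a short-history background model. A newly abandoned,
# now-static object is foreground for the long model but has already been
# absorbed into the short model's background.
import cv2

long_bg  = cv2.createBackgroundSubtractorMOG2(history=3000, detectShadows=False)
short_bg = cv2.createBackgroundSubtractorMOG2(history=100,  detectShadows=False)

cap = cv2.VideoCapture("station.mp4")              # hypothetical test clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_long  = long_bg.apply(frame)
    fg_short = short_bg.apply(frame)
    # Static candidates: foreground long-term, background short-term.
    static = cv2.bitwise_and(fg_long, cv2.bitwise_not(fg_short))
    if cv2.countNonZero(static) > 500:             # made-up pixel threshold
        print("possible object left behind")
cap.release()
```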

Below is an example of how HVA Object Detector recognizes a black backpack on a black bench. I created another example that effectively sent an alert when a white garbage bag was added to the trashcan peeking out from behind the station structure.

 

Watch Object Left Behind

The following examples, which were completely outside the recommended system requirements (experiments, really), uncovered more insight into HVA Object Detector's effectiveness and capabilities. I used HVA Object Detector on a standard CIF-resolution camera stream monitoring the pantograph of a train car, to send real-time alerts when the pantograph is damaged. Although there is constant motion within the video, and the specifications require a fixed field of view, I limited the analysis zone to an outline of the pantograph itself, ignoring the rest of the motion in the scene.

 

Watch Pantograph

System requirements for HVA Object Detector include a statically mounted camera with a fixed field of view, a minimum of CIF 352x240 resolution, and one frame per second. The examples above and below do not meet all of those requirements, but with some persistence I was able to get positive results.

 

Watch Ship Alert

The unique configuration variables also provide the ability to use HVA Object Detector for loitering alerts (HVA Loitering Detector is scheduled to be released soon).

 

Watch Loitering

This can also work for object theft, as the demo video below shows. Given enough frames for HVA Object Detector to learn the entire scene and its objects, it will send an alert if those objects are removed or disappear.

 

Watch Gallery Burglary

Another early use case question was, “Can HVA Traffic Analyzer provide alerts for restricted or double parking?” HVA Traffic Analyzer monitors traffic and classifies moving vehicle pixels as a car, truck, bus or bike.

 

Watch Traffic Analyzer

While HVA Traffic Analyzer may not be engineered to identify double-parked vehicles, HVA Object Detector can be calibrated to ignore moving objects, and even static objects, for a predefined time before sending an alert. This could take both the time allowed for double parking and vehicles waiting at a traffic signal into consideration. I needed to try it. I was only able to dig up this old CIF-resolution footage from 2008 of a person moving their vehicle from a restricted parking area. Even with the poor, standard resolution and people walking across the restricted zone, I could calibrate the demo video for an accurate alert. This opens the opportunity to turn decade-old traffic cameras into IoT sensors by scheduling pan-tilt-zoom presets within the video management software together with the automated enable/disable function of HVA.

 

Watch Parking Video

Although the HVA Object Detector may not provide actual object identification or classification, it's flexible enough, and smart enough, for many unique use cases.

I also tested five different people counting video analytics products in four different "people counting" use case scenarios. This was to review configuration and calibration capabilities, algorithm responsiveness, and performance. It was also to prove my theory that the term “people counting” is too vague when it comes to video analytics. There are too many scenarios, environmental factors, camera angles and specific use cases to believe that one size fits all. And this is just people counting, not people tracking, people identification, or detecting people falling down or running: just a people-sized pixel object crossing a virtual line or zone in either direction, with tailgating alert capabilities.

One of the four people counting use case scenarios presented here is the typical top-down view, which provides the best separation of individual people for maximum performance. (I have also done live demonstrations on live CCTV cameras, and even a demo clip at night, with a typical 45-degree angle of coverage, with some success depending on the field of view.) These experiments used a top-down view calibrated in each solution by object size, from the same Axis camera. The top-down field of view negates any recreation of three-dimensional space, so counting is typically done by intelligent people-size identification. All the solutions required some unique calibration for each location, with performance based on how well the algorithm can filter luggage, shadows and reflections. The test results of this simple example proved that, given the calibration effort, all five people counting products could provide 100% accuracy. Please note that no one was holding an infant child during the making of these films.

A top-down field of view from a single camera source makes consistent 100% accuracy a challenge for any people counting analytics, not only because of parents carrying children but also because of the variables at each area of coverage and multiple illumination states. A single people counting solution may work well in five different locations, but how about 50, or 500 dramatically different locations? The reflection of light off the floor, the shadow on the door, different illumination states, windows with sunlight at different times of the day and during different seasons, all affect pixel color and contrast and require continuous machine learning or re-calibration. Although this was a simple example, it wasn't necessarily an easy calibration.
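Setting the product-specific calibration aside, the core line-crossing count is conceptually simple. A minimal sketch, assuming some upstream detector and tracker already yield one (x, y) centroid per tracked person per frame:

```python
# Sketch of virtual-line people counting, assuming an upstream detector/
# tracker already provides one (x, y) centroid per tracked person per frame.
# The counting line is horizontal at y = LINE_Y; crossing direction is
# inferred from which side of the line the previous centroid was on.
LINE_Y = 240                      # hypothetical virtual line (pixel row)

counts = {"in": 0, "out": 0}
last_y = {}                       # track_id -> previous centroid y

def update(track_id: int, x: int, y: int) -> None:
    prev = last_y.get(track_id)
    if prev is not None:
        if prev < LINE_Y <= y:    # moved downward across the line
            counts["in"] += 1
        elif prev >= LINE_Y > y:  # moved upward across the line
            counts["out"] += 1
    last_y[track_id] = y

# Example: one person walks down across the line, another walks up.
for tid, y in [(1, 200), (1, 235), (1, 250), (2, 300), (2, 245), (2, 230)]:
    update(tid, 160, y)
print(counts)                     # {'in': 1, 'out': 1}
```

The hard part in practice is everything upstream of this loop: producing clean, stable centroids despite shadows, reflections, luggage and changing light.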







Watch Top-Down People Counters

The only method of increasing accuracy and greatly reducing calibration time for the more challenging environments, under any lighting, is to use a 3D LiDAR (time-of-flight) sensor rather than an RGB CCTV camera. The Hitachi 3D LiDAR sensor is integrated into the Hitachi Video Analytics suite and provides the "perfect" image for analysis. Although the Hitachi 3D LiDAR sensor, with its SDK, can do much more than simple people counting, tailgating, loitering or intrusion detection (more on sensor fusion in the future), its video stream is unaffected by lighting, shadows and reflections because it generates its own light source: high-resolution pulses of infrared light that are measured as they bounce off objects. So even when I turned my office lab into a disco by flipping my lights on and off, there was no change to the video image. It continued to provide 100% accuracy, out of the box. Although it would not be able to identify an infant child cradled in its mother's arms (that would need deep learning), it continues to count people effectively no matter what color or lighting changes occur in the environment, and it took only a couple of minutes to set up and calibrate.

 

3D LiDAR Sensor with HVA People Counter

Now, the integration of the Hitachi 3D LiDAR sensor into HVA provides a solution for those problematic locations with wildly changing lighting, contrast and shadows, extending its usability to even more challenging environments.

Another of the people counting use case tests used a typical field of view. Below are two of the five COTS video analytics products that were able to provide a somewhat accurate count of the people crossing the virtual line. The video sample is a purchased stock video clip, and it required a considerable amount of effort: cropping out the giant LCD billboards above the platform, testing multiple locations for the virtual counting line, configuring the simulation of three-dimensional space, adding a pre-recorded video loop for machine learning, calibrating people size and contrast, and so on. Moving the camera angle down another 10 degrees or so would have made this much easier, since the field of view is crucial for the analytical engines to identify people on each side of the virtual counting line and to provide more separation between groups. But alas, this was a stock video clip.


Watch Famous Station People Counting

If we move from "machine teaching," using a finite number of variables, to deep learning, using thousands or millions of sample images for training and analyzing the video stream at multiple layers for a better understanding of its objects (grayscale edge detection, a deeper analysis of color using the RGB dimensions for image segmentation, and layered classes), you can achieve a multi-point people counter and tracker, even within a crowded scene.
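As a runnable stand-in (using OpenCV's classical HOG person detector rather than the deep learning segmentation described above, and a hypothetical clip name), the detect-then-count loop could look roughly like this:

```python
# Stand-in sketch using OpenCV's built-in HOG + linear SVM person detector
# (a classical method, not the deep learning segmentation described above),
# just to show the detect-then-count loop on a crowded clip.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

cap = cv2.VideoCapture("station_platform.mp4")     # hypothetical stock clip
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (960, 540))          # smaller frame = faster detection
    boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
    print(f"people detected in frame: {len(boxes)}")
cap.release()
```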



Watch Deep Learning Famous Station People Tracking

The samples above are examples of how deep learning goes beyond typical pixel analysis and can meet some of the more challenging requirements.

https://www.linkedin.com/pulse/digital-video-analytics-test-results-anthony-caputo

 

Before we discuss digital video analytics, I need to explain, as painlessly as possible, why the following examples inspired me to write this post. You see, I’ve been working with digital imagery and video since the 1990s, and I’ve come to understand that the image presented on your screen is made up of digital pixels. In the digital world of absolute mathematical equations, pixels are not measured in dots of Cyan, Magenta, Yellow and Black, like the offset printing process, but rather in bits and bytes. A digital pixel represents visual color. There are 8 bits (1 byte) per pixel in a black-and-white image and 24 bits per pixel in a color image (1 byte each for Red, Green and Blue). So each pixel can represent 256 shades of gray (for black and white), or 256 shades of Red times 256 shades of Green times 256 shades of Blue, which is 16,777,216 colors for a color image. If you’re wondering what happened to the Black in the transition from CMYK in print to the RGB of pixels, mix Red, Green and Blue paint together and see what you get: black. In the digital world, the richness of the blacks is also defined by brightness and contrast.
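The arithmetic above is easy to check:

```python
# Quick check of the bit-depth arithmetic above.
shades_per_channel = 2 ** 8                      # 8 bits per channel -> 256 shades
grayscale_values   = shades_per_channel          # 256 shades of gray
color_values       = shades_per_channel ** 3     # R x G x B combinations

print(grayscale_values)   # 256
print(color_values)       # 16,777,216 possible colors per 24-bit pixel
```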

This is why your 1080p television looks so much sharper and more colorful than that old CRT television, because the digital image has more pixels to pick up more detail and color variables. However, more pixel depth doesn’t make a smarter camera, only a better quality image.

Now that you understand how the IP camera's image processor captures visual images from the analog world, the next step is motion. Digital motion pictures are achieved the same traditional way Thomas Edison achieved motion pictures over a century ago: with frames per second. The rapid succession of multiple snapshots of the field of view captures the color changes at a rate per second, providing the illusion of movement on screen.

The real magic of digital video is the compression and decompression (codec) algorithms. These codecs analyze motion across the multiple frames, dissect them into blocks, and categorize them into special frames and data for transmission. This is a necessity for the transmission of digital video, because transmitting full 1080p frames per second as MJPEG requires about 31 Mbps of bandwidth (yes, thirty-one megabits per second), versus the H.264 codec, which can transmit the same quality imagery using only about 2.5 Mbps. Further detail on codecs isn’t necessary for this post, except to explain that codecs do not care what is moving within the digital image when they encapsulate that movement within their macroblocks. Their only function is to shrink the video stream for transmission and to take up less storage space when recording.
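The scale of the problem is easy to see with back-of-the-envelope numbers; the compression ratios below are rough illustrative assumptions, not codec specifications:

```python
# Back-of-the-envelope bitrate sketch. The compression ratios are rough
# illustrative assumptions, not codec specifications.
width, height, fps, bits_per_pixel = 1920, 1080, 30, 24

raw_bps = width * height * fps * bits_per_pixel           # uncompressed
print(f"raw 1080p30: {raw_bps / 1e6:.0f} Mbps")           # ~1493 Mbps

mjpeg_bps = raw_bps / 48    # assumed ~48:1 intra-frame-only compression
h264_bps  = raw_bps / 600   # assumed ~600:1 with inter-frame prediction
print(f"MJPEG-ish:  {mjpeg_bps / 1e6:.0f} Mbps")          # ~31 Mbps
print(f"H.264-ish:  {h264_bps / 1e6:.1f} Mbps")           # ~2.5 Mbps
```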

Digital pixels identify color. Multiple frames create the illusion of motion. Codecs just shrink it all for transmission and storage. The fact of the matter is, IP cameras are not very smart. They do not know what they are “seeing.” They do not know what is moving; they just capture, replicate and transmit. They don’t know the difference between blowing snow and a person walking across the scene. This is why video analytics systems have failed in the past: the software only cares about the pixels, so it is limited in trying to understand what is actually being “seen.”

Traditionally, analytical software is limited to the data received from these IP cameras, so it analyzes pixels (color) and motion (frames per second) and, once calibrated, begins to understand the difference between something that’s 10 pixels and something that’s 50 pixels in size, calculates the time between frames, and determines that the 10 pixels may be a person walking and the 50 pixels a speeding car, if it’s calibrated as such. The moment the lighting changes (which changes the color), or that person opens a giant umbrella, or that car slows down, the software needs to be able to categorize shapes in order to remember that, “wait, that’s still a car.”
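In its simplest form, that traditional pipeline is just background subtraction plus blob size. A minimal sketch, with made-up area thresholds and a hypothetical clip name:

```python
# Minimal sketch of the traditional pipeline: subtract the background,
# find moving blobs, and guess at the object purely from blob area.
# Area thresholds are illustrative and would need per-camera calibration.
# (findContours unpacking below assumes OpenCV 4.x.)
import cv2

bg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
cap = cv2.VideoCapture("street.mp4")               # hypothetical footage

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        area = cv2.contourArea(c)
        if area > 5000:
            label = "vehicle-sized blob"
        elif area > 800:
            label = "person-sized blob"
        else:
            continue                                # noise, rain, squirrels
        print(label, area)
cap.release()
```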

So you see, when I was assigned the task of testing and creating demonstration samples for the Hitachi Video Analytics suite, I was quite apprehensive about accepting the project. I envisioned hours of frustration ahead of me, because IP cameras and software are not that smart. I wanted the killer app (analytics) to be that smart. I envisioned re-purposing the tens of thousands of underutilized security IP cameras into Smart City sensors.

HVA not only surprised me, it impressed me. One of the first examples I created is below. When I realized HVA Object Detector could be calibrated to ignore moving objects, I remembered a use case from a decade ago that involved sending a real-time alert if there was a stalled vehicle or person at a railroad crossing. I recalled that it took a freight train over a mile to stop, and that delays cost millions of dollars a day, let alone the liability. HVA Object Detector ignored all movement, including the cars crossing the tracks, and sent an alert when the person fell on the tracks.


Watch Video

HVA Intrusion Detector includes a built-in filter for weather conditions. I inadvertently performed a comparison test between the analytics built into a camera and HVA, by tapping into a video stream from a backyard camera that I had configured with its built-in analytics. The only method of calibration and configuration for the built-in analytics was adjusting its sensitivity. Although all the false positives from animals made me realize what a jungle the neighborhood was (squirrels, cats, raccoons, possums), I eventually disabled the built-in analytics, as I was sick of getting email alerts with snapshots of rain and snow. After a while, continually reducing the sensitivity means it doesn't alert you to anything but the huge afternoon shadows that cause dramatic changes in pixel color. Along the way, I noticed that I didn't receive any false positives from HVA Intrusion Detector, which was ingesting another RTSP stream from the same camera. That's when I decided to create the example below: a simple area protection configuration, taken during snowfall. HVA ignores the snow, and the squirrel running around, and only alerts me when the person walks into the frame.

 


Watch Video

HVA knows what snow is. The intelligence behind the snow, rain, haze and fog filter that's built into HVA Intrusion Detector is also available in the HVA Video Enhancer module. Impressed, I decided to give it an even bigger challenge. How about a Chicago-style snowstorm? Analyze this! To the left is the actual footage: crazy windblown snow creating white-out conditions. By the end of the clip there's so much snow that it tricks the camera back into color mode, thinking it is daylight. The clip to the right is the same video processed through HVA Video Enhancer, which can now be ingested into other video analytics modules for better accuracy and performance.

 

Watch Video

HVA really does know what snow is. The HVA Intrusion Detector sample clip below is configured for perimeter intrusion: a person must walk from the green zone into the red zone in order to be recognized as an intruder. Even though I configured the zones to be the same size, HVA's ability to recreate a three-dimensional space from the two-dimensional image means it understands perspective, so it recognizes that the figure attempting to enter the facility is 1.8 meters tall, and an intruder, at each door.

 

Watch Video

A unique and very effective module is HVA Privacy Protector, which makes it possible to protect the privacy of individuals while still allowing video monitoring for safety and security. I configured the HVA Privacy Protector example below with a couple of layers. First, I wanted the ATM to always be pixelated, to protect PINs, along with the vehicles on the street, to protect license plates. Although HVA Privacy Protector is engineered for static, fixed camera views, notice how the persons-of-interest are still fully pixelated even when standing still. This stream is now available as input into other systems and/or analytics, such as Intrusion Detector or Object Detector, while still protecting the privacy of individuals. The secured archived footage can only be seen by authorized personnel with the correct security clearance. You can even add a second layer of security using a smart card and transaction authentication number (TAN) for protection.

Watch Video

 

I created over a hundred test samples for all the HVA modules (listed at the end). HVA is impressive because each module has its own analytical engine, engineered for that specific function. It's not one pixel analyzer and movement calculator that was built upon to do something more than its core capability. HVA also recreates three-dimensional space from the two-dimensional video image and then adds the fourth dimension (time) for improved performance. You can also calibrate the length of its 3D learning phase and give each scene multiple illumination states (day, night, afternoon), which also improves its performance and accuracy. It really does add more intelligence to cameras, and I've tried it on many different camera types, from a generic low-end bullet camera to the popular Axis cameras (including the panoramic models) to a top-of-the-line thermal camera.

I could go on with other samples, but you get the idea. I was apprehensive at first, but I'm excited to have been a part of this new technology release, and excited at the thought that my dream of the analytics killer app for the Smart City has finally become a reality. The Hitachi Video Analytics suite:

  • Activity Visualizer
  • Camera Health Monitor
  • Face Collector
  • Intrusion Detector
  • License Plate Recognizer
  • Object Detector
  • Parking Space Analyzer
  • People Counter
  • People Counter 3D
  • Privacy Protector
  • Queue Detector
  • Traffic Analyzer
  • Vehicle Counter
  • Video Enhancer