Computer vision techniques to analyse video content
Classification, localization, detection, segmentation, tracking and identification are frequently used terms when talking about smart cameras and intelligent applications. They are all computer vision and deep learning techniques to analyze video content. For many people the terminology is confusing. This article breaks down the concepts and sheds light on these techniques. It explains the differences between them and helps you choose the proper recognition technique to offer your client.

Let’s start with image classification by looking at the above image. The object in the image is easily recognized as a car, so the image can be classified as belonging to the class car. Assigning a class to an image is what image classification is all about. Class assignment can be done with one class or with multiple classes. Single-class classification comes in handy for determining whether there is a car or not. Multi-class image classification can be used to predict that there is a car, a truck and a bicycle in one image, or to reveal the brand, make and model of a car.
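The difference between assigning one class and assigning several classes can be sketched in a few lines of Python. This is a minimal illustration, not a real network: the labels, the raw scores and the 0.5 threshold are made-up assumptions. It shows that picking one class means choosing the highest probability, while assigning several classes means scoring each class independently.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    """Map one raw score to an independent probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

def classify_single(labels, scores):
    """Pick exactly one class: the one with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]

def classify_multi(labels, scores, threshold=0.5):
    """Return every class whose independent probability reaches the threshold."""
    return [label for label, s in zip(labels, scores) if sigmoid(s) >= threshold]

# Hypothetical raw outputs of a network for one image (assumed values).
LABELS = ["car", "truck", "bicycle"]
raw_scores = [2.0, 1.0, -3.0]

print(classify_single(LABELS, raw_scores))  # one label with its probability
print(classify_multi(LABELS, raw_scores))   # every label above the threshold
```

With these assumed scores the first call settles on car alone, while the second reports both car and truck because each class is judged on its own.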

Localization and detection

Classification predicts what is in the image, but the location of the object in the image is still unknown. Image localization identifies the location of the main, most visible object in the given image.
Object detection finds all the objects and their boundaries. It predicts the location along with the class of each object and draws a rectangle around each one.
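To make the distinction concrete, the sketch below treats the output of a network as a list of candidate boxes. The `Detection` type, the candidate values and the 0.5 confidence threshold are hypothetical and only serve the illustration. Localization keeps the single most confident box, detection keeps every box above the threshold, and intersection over union (IoU) is a common way to measure how well two rectangles overlap.

```python
from typing import NamedTuple, Tuple

class Detection(NamedTuple):
    label: str
    score: float                    # confidence between 0 and 1
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels

def localize(detections):
    """Localization: keep only the single most confident object."""
    return max(detections, key=lambda d: d.score)

def detect(detections, min_score=0.5):
    """Detection: keep every object whose confidence reaches the threshold."""
    return [d for d in detections if d.score >= min_score]

def iou(a, b):
    """Intersection over union of two boxes (1.0 = identical, 0.0 = no overlap)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical candidates for one image (assumed values).
CANDIDATES = [
    Detection("car", 0.92, (40, 60, 300, 220)),
    Detection("bicycle", 0.75, (320, 100, 400, 210)),
    Detection("truck", 0.30, (10, 10, 60, 40)),
]

print(localize(CANDIDATES).label)             # the main object only
print([d.label for d in detect(CANDIDATES)])  # all confident objects
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # overlap of two example boxes
```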

Classification network recognises and classifies objects

The next technique in line is segmentation. Image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as image objects). To be more specific, image segmentation assigns a label to every pixel in an image such that pixels with the same label share certain characteristics. Object detection models detect each object in the image and draw a bounding box around it. But a bounding box does not tell anything about the shape of the object, because it is always a rectangle. Image segmentation creates a pixel-wise mask for each object in the image. It provides a far more refined understanding of the objects themselves.
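The gap between a bounding box and a pixel-wise mask can be shown with a tiny label map. The sketch below is a hand-made example, assuming a hypothetical 6 by 8 pixel image in which 0 means background and 1 means car. It derives the bounding box from the mask and compares how many pixels each one covers.

```python
# Hypothetical label map: one label per pixel (0 = background, 1 = car).
MASK = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

def pixel_count(mask, label):
    """Number of pixels that carry the given label."""
    return sum(row.count(label) for row in mask)

def bounding_box(mask, label):
    """Smallest rectangle (x_min, y_min, x_max, y_max), inclusive, around the label."""
    rows = [y for y, row in enumerate(mask) if label in row]
    cols = [x for row in mask for x, value in enumerate(row) if value == label]
    if not rows:
        return None  # the label does not occur in this image
    return (min(cols), min(rows), max(cols), max(rows))

def box_area(box):
    """Pixels covered by an inclusive bounding box."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)

box = bounding_box(MASK, 1)
print(box, box_area(box), pixel_count(MASK, 1))
```

In this toy example the box covers 24 pixels while the mask covers only 18, so a quarter of the box is background that the mask correctly leaves out.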

Segmentation: What is in the image and what is its shape at pixel level
Detection and segmentation

So far, objects and people have been detected and classified. Image identification takes recognition a step further by identifying a person. Not only will an identification network detect a person, it will also be able to tell who this person is. This technology is strictly bound to privacy regulations and requires the permission of the person being identified. Image identification only works when a network is trained for a specific person. It cannot identify random people.
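The article does not describe how an identification network works internally. One common approach, assumed here purely for illustration, is to turn each face into a vector of numbers (an embedding) and compare it with the vectors of enrolled people. The names, the three-number vectors and the 0.8 threshold below are all hypothetical. The sketch shows why only known people can be identified: anyone who is not enrolled simply has no close match.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors: close to 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(embedding, enrolled, threshold=0.8):
    """Return the enrolled name that matches best, or None for an unknown person."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical gallery of people who gave permission to be enrolled.
ENROLLED = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.5],
}

print(identify([0.88, 0.15, 0.18], ENROLLED))  # close to an enrolled vector
print(identify([0.2, 0.2, 0.95], ENROLLED))    # no enrolled vector is close
```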

Tracking follows objects through different frames

The last technique described is tracking. Object tracking involves tracking objects as they move across several video frames coming from one or more video sources. Often those objects are people or cars, but in theory it can be any object. Objects are detected, given an ID and re-identified in the next frame. Object tracking has many practical applications including surveillance, medical imaging, traffic flow analysis, self-driving cars, people counting and human-computer interaction.
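The detect, assign an ID and re-identify loop can be sketched with a very simple tracker. This is a deliberately naive illustration, not how a production tracker works: it matches each new box to the nearest known centre point, and the `CentroidTracker` class, the 50 pixel distance limit and the example boxes are assumptions. Real systems typically add motion prediction and appearance features, and tolerate objects that disappear for a few frames.

```python
import math

class CentroidTracker:
    """Naive tracker: a box keeps its ID if its centre stays close between frames."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # largest allowed jump in pixels
        self.next_id = 1
        self.tracks = {}  # track ID -> last known centre (x, y)

    def update(self, boxes):
        """Match this frame's boxes to known tracks and return {track_id: box}."""
        assigned = {}
        unmatched = dict(self.tracks)  # tracks not yet claimed in this frame
        for box in boxes:
            cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
            best_id, best_dist = None, self.max_distance
            for track_id, (tx, ty) in unmatched.items():
                dist = math.hypot(cx - tx, cy - ty)
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                # No known object nearby: treat this box as a new object.
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = box
            self.tracks[best_id] = (cx, cy)
        # Simplification: forget tracks that were not seen in this frame.
        for track_id in unmatched:
            del self.tracks[track_id]
        return assigned

tracker = CentroidTracker()
frames = [
    [(10, 10, 50, 50), (200, 200, 260, 260)],  # two objects appear
    [(205, 202, 265, 262), (18, 12, 58, 52)],  # both move slightly
    [(26, 14, 66, 54)],                        # only one remains visible
]
for boxes in frames:
    print(tracker.update(boxes))
```

Although the second frame lists the boxes in a different order, each object keeps the ID it received in the first frame, which is the essence of tracking.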

Each technique has its own characteristics, and the requirements of a project determine which one is the right choice.

Classification: What is in the image
Localization and detection: What is in the image and where
Identification: Who is in the image
Segmentation: What is in the image and what is its shape at pixel level
Tracking: Follow an object through different frames

For more information about the possibilities AVUTEC has to offer, please contact our sales team at [email protected].