Image tagging

Image tagging is the process of automatically assigning text tags to images. Each image can have multiple tags. Use cases include searching product photos in an e-commerce search engine, finding images quickly in a database, and supporting the diagnostic process in medical image examinations. Automatic tagging reduces manual workload, speeds up classification, and makes it possible to recommend alternative content.
To build a system that can tag images, you need a training dataset of annotated images. Developing such a dataset can be a laborious task. Depending on the complexity of the task, the training dataset needs to be large and adapted to the specific context. For example, a system that should recognize French food items needs to be trained on foods that are common in France. Very complex tasks need more data. A carefully curated dataset improves the final result. Deep learning methods such as convolutional neural networks can be employed to produce a system that tags images automatically. Interested? I’d be happy to talk with you and show a demo.
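Because each image can carry several tags, a tagging model typically produces one independent score per tag instead of a single winning class. The sketch below shows only that last step, under assumptions: the tag names and raw scores are made up for illustration, and `assign_tags` is a hypothetical helper, not part of any particular library.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw model score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def assign_tags(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every tag whose probability reaches the threshold.

    Each tag is judged on its own, so an image can receive zero,
    one, or many tags. Tags are ordered from most to least confident.
    """
    probabilities = {tag: sigmoid(score) for tag, score in scores.items()}
    kept = [tag for tag, p in probabilities.items() if p >= threshold]
    return sorted(kept, key=lambda tag: probabilities[tag], reverse=True)


# Hypothetical raw scores that a trained network might output for one photo.
scores = {"croissant": 2.3, "baguette": 0.4, "cheese": -1.7, "wine": -0.2}

tags = assign_tags(scores)
print(tags)  # ['croissant', 'baguette']
```

Lowering the threshold yields more tags per image at the cost of more false positives; a suitable value depends on the dataset and the use case.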

Object detection

The goal of an object detection task is to indicate which objects are in an image and where they are located. Each location is marked by a bounding box labeled with the most probable object class. This task can be automated in a system that is presented with photos or videos.
Use cases include detecting faces in a crowd, or detecting the different objects in a traffic scene, such as cars, bikes, and pedestrians.
Object detection algorithms can categorize objects in static images or in video captured by a camera, for example to separate garbage in a recycling center or to track the speed of a moving object such as the ball in a tennis game. Deep learning methods such as YOLO or R-CNN can be used in these cases.
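To make the idea of labeled bounding boxes concrete, here is a minimal sketch of how detections could be represented and cleaned up. Detectors often report several overlapping boxes for the same object, and a common post-processing step called non-maximum suppression keeps only the most confident one. The `Detection` class, the coordinates, and the scores are assumptions for illustration, not the output of a real model.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Detection:
    label: str    # probable object class
    score: float  # confidence between 0 and 1
    box: Box


def iou(a: Box, b: Box) -> float:
    """Intersection over union: overlap area divided by combined area."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections: list[Detection],
                        iou_threshold: float = 0.5) -> list[Detection]:
    """Keep the highest-scoring box among same-label boxes that overlap."""
    kept: list[Detection] = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        overlaps_kept = any(
            det.label == k.label and iou(det.box, k.box) >= iou_threshold
            for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept


# Hypothetical detections for a traffic scene: the two "car" boxes
# describe the same vehicle and overlap heavily.
detections = [
    Detection("car", 0.92, (10, 20, 110, 80)),
    Detection("car", 0.75, (15, 25, 115, 85)),
    Detection("pedestrian", 0.88, (200, 30, 230, 110)),
]

result = non_max_suppression(detections)
print([(d.label, d.score) for d in result])
# [('car', 0.92), ('pedestrian', 0.88)]
```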
Interested? I’d be happy to talk with you and show an example.

Photo shows object detection using the YOLO algorithm.

Video illustrates detection and segmentation of moving objects using R-CNN.

Text detection and recognition

Text detection and recognition, also called optical character recognition (OCR), deals with converting photos or scanned documents that contain text into a format that a computer can read.
Detecting text in natural scenes is important in many applications, such as reading number plates, identifying financial documents, or scanning passports, to name a few.
Detecting text on a clear, single-colored background is relatively simple. Detecting the same text in a natural scene is much harder, particularly under difficult conditions such as weather (sunlight), poor photo quality (blurring, low-resolution pictures), or special object properties (text can be wrapped around an object or displayed at different angles). An algorithm that detects text needs to account for these conditions to perform reliably. Several image preprocessing steps and algorithms help to get the text into a digital format.
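As one example of such a preprocessing step, the sketch below binarizes a grayscale image with Otsu's method, which picks the brightness threshold that best separates dark and bright pixels. This is only an illustration under assumptions: the tiny 4x4 image is made up, dark text on a light background is assumed, and a real pipeline would usually rely on an image library and combine several steps.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    histogram = [0] * 256
    for p in pixels:
        histogram[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for t in range(256):
        weight_dark += histogram[t]
        sum_dark += t * histogram[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        variance = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if variance > best_variance:
            best_variance, best_t = variance, t
    return best_t


# Hypothetical grayscale image: a dark 2x2 "character" on a light background.
image = [
    [200, 210, 205, 198],
    [202,  40,  35, 207],
    [199,  38,  42, 203],
    [204, 206, 201, 200],
]

threshold = otsu_threshold([p for row in image for p in row])
# 1 marks a text pixel (dark), 0 marks background (bright).
mask = [[1 if p <= threshold else 0 for p in row] for row in image]

print(threshold)  # 42
for row in mask:
    print(row)
```

A single global threshold like this works best with even lighting; scenes with shadows or strong sunlight generally need locally adaptive methods.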

Text detection puts bounding boxes around each text element.

Images were taken from Unsplash and credits go to the following persons:

Bernard Hermant, Rohit Tandon, Lukas Souza