Technical Application Note 0005

Parallelization with lprcarrida

Revision: 1.0
Date: 2018-07-16
Contact: support@carrida-technologies.com
Copyright: 2017-2018 Carrida Technologies GmbH, Ettlingen, Germany
Author: Carrida Technologies

1   Introduction

The lprcarrida library offers two main strategies for making optimal use of the available CPUs:

  1. Intra-image parallelization;
  2. Inter-image parallelization.

With the first strategy, the processing of each individual image is parallelized. This is useful when the ANPR process has to be completed as quickly as possible for one particular image. The second strategy processes each image in a separate thread. It is best suited for cases where the average ANPR processing time over a sequence of images has to be minimized, e.g. when working with live or video streams. It is also possible to combine both methods, i.e. to use intra- and inter-image parallelization simultaneously.

The parallelization strategy has to be defined in the configuration file before the engine is initialized; it cannot be changed on the fly.

Note

A standard Carrida license allows the use of only one Carrida instance, which means that inter-image parallelization is switched off.

2   Intra-image parallelization

The intra-image parallelization can be configured in an ini file as follows:

    [ANPR_ENGINE]
    ...
    NumberCores = 2
    ...

Here, NumberCores sets the number of CPUs to be utilized. A value of 0 means that the engine uses all available hardware CPUs, but no more than 4.
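The NumberCores rule above can be sketched in Python as follows. This is only an illustration, assuming the ini file uses syntax that the standard configparser module accepts; resolve_number_cores and MAX_INTRA_CORES are hypothetical names, not part of the lprcarrida API. Explicit values are passed through unchanged because this note does not say whether the engine caps them.

```python
import configparser
import os

# Hypothetical constant: the documented upper limit when NumberCores = 0.
MAX_INTRA_CORES = 4

def resolve_number_cores(ini_text: str) -> int:
    """Hypothetical helper mirroring the documented NumberCores rule."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    requested = parser.getint("ANPR_ENGINE", "NumberCores")
    if requested == 0:
        # 0 = all available hardware CPUs, but no more than 4.
        hardware = os.cpu_count() or 1
        return min(hardware, MAX_INTRA_CORES)
    # Explicit values are returned as given (capping is not documented here).
    return requested

example = """
[ANPR_ENGINE]
NumberCores = 2
"""
print(resolve_number_cores(example))  # prints 2
```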

The following diagram depicts the intra-image parallelization logic.

Figure: Intra-image parallelization diagram

The ANPR process is thus parallelized within the processing of every single image. The following two tables show how effective this approach is.

Runtimes in milliseconds per image on an Intel Core i7-6700K CPU @ 4.00 GHz

    Mode         1 core  2 cores  3 cores  4 cores
    fast           13.1      9.5      9.1      9.1
    standard       16.1     11.1      9.9      9.9
    high           25.1     18.8     16.6     16.4

Runtimes in milliseconds per image on a VC Nano camera

    Mode         1 core  2 cores
    fast           82.2     68.5
    standard      126.0     91.5
    high          198.5    147.2

As can be seen, the processing time does not decrease linearly with the number of cores. Moreover, utilizing more than 4 CPUs brings no further speed-up. This is due to the architecture of the internal processing pipeline. To get the most out of the available hardware resources, the inter-image parallelization method, described in the next section, has to be used.
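To quantify the non-linear scaling, the speed-up t1/tn and the parallel efficiency (speed-up divided by the number of cores n) can be computed from the i7-6700K table. The Python sketch below uses only the published numbers; speedup_and_efficiency is a hypothetical helper introduced for illustration.

```python
# Runtimes (ms per image) copied from the i7-6700K intra-image table above;
# list index 0 = 1 core, index 3 = 4 cores.
INTRA_I7 = {
    "fast":     [13.1, 9.5, 9.1, 9.1],
    "standard": [16.1, 11.1, 9.9, 9.9],
    "high":     [25.1, 18.8, 16.6, 16.4],
}

def speedup_and_efficiency(runtimes):
    """Return (speed-up, efficiency) per core count, rounded to 2 decimals.

    speed-up = t1 / tn, efficiency = speed-up / n
    """
    t1 = runtimes[0]
    result = []
    for n, tn in enumerate(runtimes, start=1):
        speedup = t1 / tn
        result.append((round(speedup, 2), round(speedup / n, 2)))
    return result

for mode, times in INTRA_I7.items():
    print(mode, speedup_and_efficiency(times))
```

For fast mode, this yields a speed-up of about 1.44 on four cores, i.e. a parallel efficiency of roughly 36%.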

3   Inter-image parallelization

In this approach, each image is processed in a separate thread; the number of threads is configurable. The input images form a queue. Once a thread finishes processing its image, it takes the next one from the queue. At the end of the processing pipeline, the ANPR results for the individual images are collected and sorted before they are sent to the output.
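The queue-and-workers scheme described above can be sketched with Python's standard threading and queue modules. This is a generic model of the principle, not the lprcarrida implementation: process_image is a hypothetical placeholder for the per-image ANPR call, and num_threads and max_queued are hypothetical stand-ins for the thread count and the queue length. Each image is tagged with a sequence number so that results, which may finish out of order, can be sorted before output.

```python
import queue
import threading

def process_image(image):
    # Hypothetical placeholder for the per-image ANPR call.
    return f"result-for-{image}"

def run_inter_image(images, num_threads=2, max_queued=2):
    jobs = queue.Queue(maxsize=max_queued)  # bounded input queue
    results = {}                            # sequence number -> result
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()  # take the next image as soon as this thread is free
            if item is None:   # sentinel: no more images
                return
            seq, image = item
            result = process_image(image)
            with lock:
                results[seq] = result

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for seq, image in enumerate(images):
        jobs.put((seq, image))  # blocks while the queue is full
    for _ in threads:
        jobs.put(None)          # one sentinel per worker
    for t in threads:
        t.join()
    # Collect the results and sort them by input order before output.
    return [results[seq] for seq in sorted(results)]

print(run_inter_image(["img0", "img1", "img2"]))
# ['result-for-img0', 'result-for-img1', 'result-for-img2']
```

For simplicity, the sketch sorts all results once at the end; a live-stream pipeline would more likely release each result as soon as all earlier images have been processed.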

Two parameters have to be configured in the ini file:

  • NumberOfEngines: the number of worker threads, each of which processes one image from the queue at a time. If set to 0, the number of hardware cores is used.
  • EngineQueueLength: the maximum number of images that can wait in the queue to be processed. Increasing this parameter increases memory usage, but allows the thread resources to be used more flexibly. Once the queue is full, either the next incoming image is rejected or the engine blocks until one of the processing threads becomes available (see the BlockingMode parameter on the CARRIDA parameters page).
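The two possible behaviours on a full queue can be modelled with a bounded queue.Queue. In the sketch below, the queue size plays the role of EngineQueueLength and a boolean blocking flag stands in for the BlockingMode parameter; submit is a hypothetical helper, and the actual BlockingMode values are documented on the CARRIDA parameters page, not here.

```python
import queue

def submit(jobs: queue.Queue, image, blocking: bool) -> bool:
    """Hypothetical submit step: True if the image was queued, False if rejected."""
    if blocking:
        jobs.put(image)         # waits until a worker frees a slot
        return True
    try:
        jobs.put_nowait(image)  # fails immediately if the queue is full
        return True
    except queue.Full:
        return False

# Queue length 2 (like EngineQueueLength = 2); no worker drains the queue
# in this example, so the third image is rejected in non-blocking mode.
jobs = queue.Queue(maxsize=2)
print([submit(jobs, f"frame{i}", blocking=False) for i in range(3)])
# [True, True, False]
```

Calling submit with blocking=True on a full queue that no worker drains would wait forever, which is why the example uses the non-blocking path.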

Both NumberOfEngines and EngineQueueLength must be specified within the [ANPR_STREAM] section.

    [ANPR_STREAM]
    ...
    NumberOfEngines = 2
    EngineQueueLength = 2

In this example, ANPR processing runs in two parallel threads.
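For illustration, the two [ANPR_STREAM] values could be read and resolved as follows, again assuming standard ini syntax that Python's configparser accepts. resolve_stream_settings is a hypothetical helper; it maps NumberOfEngines = 0 to the number of hardware cores, as described above, and treats a missing value as an error because this note does not state any defaults.

```python
import configparser
import os

def resolve_stream_settings(ini_text: str) -> tuple[int, int]:
    """Hypothetical helper: return (engines, queue_length) from [ANPR_STREAM]."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # No fallbacks: a missing option raises configparser.NoOptionError.
    engines = parser.getint("ANPR_STREAM", "NumberOfEngines")
    if engines == 0:
        engines = os.cpu_count() or 1  # 0 = use the number of hardware cores
    queue_length = parser.getint("ANPR_STREAM", "EngineQueueLength")
    return engines, queue_length

example = """
[ANPR_STREAM]
NumberOfEngines = 2
EngineQueueLength = 2
"""
print(resolve_stream_settings(example))  # prints (2, 2)
```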

The following diagram depicts the inter-image parallelization principle.

Figure: Inter-image parallelization diagram

Finally, the following two tables show the effectiveness of the inter-image parallelization method.

Runtimes in milliseconds per image on an Intel Core i7-6700K CPU @ 4.00 GHz

    Mode         1 engine  2 engines  3 engines  4 engines  6 engines
    fast             13.1        7.8        6.4        4.9        4.0
    standard         16.1        9.4        7.2        6.5        5.4
    high             25.1       14.5       10.9        9.3        7.1

Runtimes in milliseconds per image on a VC Nano camera

    Mode         1 engine  2 engines
    fast             82.2       54.0
    standard        126.0       74.8
    high            198.5      122.0

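As can be seen, this method makes much more effective use of the hardware resources: unlike with intra-image parallelization, the runtime per image keeps decreasing as threads are added, down to 4.0 ms in fast mode with 6 engines on the i7-6700K.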
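A convenient way to compare the two methods is to convert the runtimes into throughput, 1000 / (ms per image) images per second. The short Python calculation below does this for fast mode on the i7-6700K, using only values from the tables; it assumes the per-image runtimes are averages over a sequence of images, so the throughput figures are approximate. images_per_second is a hypothetical helper.

```python
# Fast-mode runtimes (ms per image) on the i7-6700K, copied from the tables above.
intra_fast = {1: 13.1, 2: 9.5, 3: 9.1, 4: 9.1}          # key: NumberCores
inter_fast = {1: 13.1, 2: 7.8, 3: 6.4, 4: 4.9, 6: 4.0}  # key: NumberOfEngines

def images_per_second(ms_per_image: float) -> float:
    # Throughput for a given average processing time per image.
    return 1000.0 / ms_per_image

for n in (1, 2, 3, 4):
    intra = images_per_second(intra_fast[n])
    inter = images_per_second(inter_fast[n])
    print(f"{n} thread(s): intra {intra:.1f} img/s, inter {inter:.1f} img/s")
```

With four threads, this gives roughly 110 images per second for intra-image and about 204 images per second for inter-image parallelization.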

4   Setting processor affinities

On some systems, processing can be accelerated by setting processor affinities explicitly. It is possible to assign the CPUs to be used by each of the threads described in the Inter-image parallelization section. See the description of the ProcessorsToUse parameter on the CARRIDA parameters page for more details.
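At the operating-system level, processor affinity means restricting a thread to a chosen set of CPUs. The Linux-only Python sketch below illustrates this general mechanism with os.sched_setaffinity, pinning each of up to two worker threads to its own CPU. It is not how lprcarrida implements ProcessorsToUse, whose syntax is described on the CARRIDA parameters page; pin_current_thread and worker are hypothetical names, and the code does nothing on platforms without sched_setaffinity.

```python
import os
import threading

def pin_current_thread(cpu: int) -> None:
    """Restrict the calling thread to one CPU (on Linux, pid 0 = calling thread)."""
    os.sched_setaffinity(0, {cpu})

def worker(cpu: int, seen: dict) -> None:
    pin_current_thread(cpu)
    seen[cpu] = os.sched_getaffinity(0)  # record the affinity this thread now has
    # ... per-image processing would run here ...

if hasattr(os, "sched_setaffinity"):
    allowed = sorted(os.sched_getaffinity(0))  # CPUs this process may run on
    seen = {}
    threads = [threading.Thread(target=worker, args=(cpu, seen)) for cpu in allowed[:2]]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(seen)
```

Pinning only affects the worker threads; the main thread keeps the affinity mask it started with.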