The frame called Current Frame (CF) is the output of this substitution operation. At this point, a comparison between the CF and a reference frame (RF) generates a foreground frame (FF) that emphasizes the pixels belonging to the human shape. The RF is very similar to the CF, but it contains only still objects, without any human subject, as it is captured in the initial phase, when the sensor starts acquiring depth frames and no people are in the scene. Equation (1) defines the value of the pixels in the FF:

FF(x,y) = { CF(x,y) + gapCoeff   if |CF(x,y) - RF(x,y)| > ThPerson
          { CF(x,y)              otherwise                            (1)

where x is the column index and y is the row index of the pixel in the frame; the ThPerson threshold is set to 50 mm, and it allows the identification of depth gaps that reveal new objects, or human subjects, in the scene.
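The thresholding of Equation (1) can be sketched with numpy as follows. This is a minimal illustration, not the authors' implementation; the parameter names and the uint16 depth format are assumptions.

```python
import numpy as np

def foreground_frame(cf, rf, th_person=50, gap_coeff=6000):
    """Equation (1): pixels whose depth differs from the reference
    frame RF by more than th_person (mm) are shifted up by gap_coeff
    (depth level slicing); all other pixels keep their CF value."""
    cf = cf.astype(np.int32)  # avoid uint16 overflow/underflow
    rf = rf.astype(np.int32)
    mask = np.abs(cf - rf) > th_person
    return np.where(mask, cf + gap_coeff, cf)
```

Casting to a signed type before subtracting matters in practice: depth sensors typically deliver unsigned 16-bit frames, and `cf - rf` would wrap around for negative differences.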
The pixels that satisfy the first condition in Equation (1) are increased by the gapCoeff quantity: this addition can be defined as depth level slicing, similar to the intensity level slicing process in [16]. The latter method is used to enhance the relative visual perception of RGB images, while, in this context, it improves the object discrimination step. A Sobel edge detection stage helps separate the objects in the scene, especially when they overlap. The extracted object boundaries are then set to the floor depth level (MaxHeight) in the FFSobel frame, according to Equation (2):

FFSobel(x,y) = { MaxHeight   if Sobel(CF(x,y)) > ThSobel
               { FF(x,y)     otherwise                     (2)

The Sobel output is compared to a threshold that sets the level of detail of the edges. This threshold, named ThSobel, is empirically set to 2,000. Based on both Equations (1) and (2), setting the parameter gapCoeff equal to 6,000 mm allows ThSobel to remain fixed, and ensures the correct discrimination of the human shape, even when its depth values are very similar to those of nearby objects.

The last operation consists in the creation of a so-called 40 × 40 super-pixel frame (FFs): each super-pixel corresponds to a 6 × 8 block of pixels in FFSobel. The i-th super-pixel takes the value 1 if all the pixels in the block differ from MaxHeight; otherwise it takes the value 0. This process improves the separation between the objects in the scene, and also decreases the processing time, because the total number of pixels passed to the following steps is reduced, as shown in Figure 2b.

3.2. Distinguish Object Procedure

This section describes the discrimination algorithm that splits all the objects present in the depth scene. The frame resolution required by the procedure is not fixed, so the algorithm can work with different depth frame sources.
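The edge suppression of Equation (2) in Section 3.1 can be sketched as below. The Sobel operator is written out explicitly with the standard 3 × 3 kernels so the example stays self-contained; the function names and the |Gx| + |Gy| magnitude approximation are assumptions, not the paper's exact formulation.

```python
import numpy as np

def sobel_magnitude(img):
    """Approximate Sobel gradient magnitude |Gx| + |Gy| of a depth
    frame, computed on interior pixels only (borders stay 0)."""
    img = img.astype(np.int64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # horizontal kernel
    ky = kx.T                                            # vertical kernel
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = img[y - 1:y + 2, x - 1:x + 2]
            out[y, x] = abs((win * kx).sum()) + abs((win * ky).sum())
    return out

def edge_suppress(ff, cf, max_height, th_sobel=2000):
    """Equation (2): pixels lying on strong depth edges of CF are set
    to the floor level MaxHeight; all others keep their FF value."""
    return np.where(sobel_magnitude(cf) > th_sobel, max_height, ff)
```

Pushing the detected edge pixels down to the floor level cuts thin "bridges" of similar depth between adjacent objects, which is what makes the later per-object separation possible.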
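The super-pixel reduction described at the end of Section 3.1 amounts to a blockwise all-different-from-MaxHeight test. A minimal numpy sketch, assuming a 240 × 320 input frame so that 6 × 8 blocks yield the 40 × 40 output; the function name and default block size are illustrative:

```python
import numpy as np

def superpixel_frame(ff_sobel, max_height, block=(6, 8)):
    """Collapse FFSobel into a binary super-pixel frame: a block maps
    to 1 only if none of its pixels equals the floor level MaxHeight,
    otherwise to 0. Frame dimensions must be multiples of the block."""
    h, w = ff_sobel.shape
    bh, bw = block
    # Reshape into (rows of blocks, bh, cols of blocks, bw) and test
    # each whole block at once.
    blocks = ff_sobel.reshape(h // bh, bh, w // bw, bw)
    occupied = (blocks != max_height).all(axis=(1, 3))
    return occupied.astype(np.uint8)
```

Requiring *all* pixels of a block to differ from MaxHeight (rather than any) is what widens the gaps carved by the Sobel step, since any block touched by a floored edge pixel collapses to 0.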