This 3D imaging technique was introduced by [1] in the early 1980s and detailed in [2]. It consists of acquiring a pair of images of the same scene with two cameras that view it from different angles. The two cameras are separated by a distance called the “base”. Then, based on the pinhole camera model and epipolar geometry [3], the depth is determined from the disparity (the difference between the positions of an object viewed from multiple angles). Measuring this disparity is the main difficulty of the technique, and it depends on the choice of the base between the cameras and on their tilt angles. Indeed, the larger the base, the more accurate the measurement, but the more occlusions there will be (a point of the scene seen by one camera is not necessarily seen by the other).
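As a rough illustration of why a larger base improves accuracy, consider the simplified case of two identical, parallel (rectified) pinhole cameras. This configuration and the symbols used here are assumptions made for illustration, not the notation of [1]-[3]: f is the focal length in pixels, B the base, d the disparity in pixels, Z the depth, and \delta d the error on the measured disparity.

```latex
\begin{align}
Z &= \frac{f\,B}{d}, \\
\left|\delta Z\right| &\approx \left|\frac{\partial Z}{\partial d}\right| \delta d
  = \frac{f\,B}{d^{2}}\,\delta d
  = \frac{Z^{2}}{f\,B}\,\delta d .
\end{align}
```

Under these assumptions, for a fixed disparity error the depth error at a given depth Z is inversely proportional to B, so doubling the base halves it. Tilted cameras require the full epipolar geometry and do not reduce to this form.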
These occlusion problems prevent us from obtaining good results with the kind of scene in which this phenomenon is frequent (crops). A 3D reconstruction technique that is free from occlusion problems is therefore necessary. 3D reconstruction techniques can be grouped into three large families: geometric approaches, photometric approaches and those based on the physical properties of the acquisition system. Geometric approaches are based on knowledge of the scene structure and of the internal and external parameters of the cameras used. Stereovision belongs to this family. In photometric approaches, the principle is to evaluate pixel intensities to obtain 3D information, as in the method known as Shape from Shading [4].
Finally, whereas many of the previous techniques are based on the pinhole model, the third approach uses a real optical system. The main difference is that, instead of considering a perfect projection of all points of the scene onto the image plane, only some of these points are projected correctly. This phenomenon comes from a limited depth of field, which will be explained later. The Shape from Focus technique (SFF) [5], or Depth from Focus, is based on this depth of field. We use this technique to solve our problem of 3D acquisition of a scene with strong occlusions. It is a passive and monocular technique that provides a depth map of a scene from a stack of 2D images. This stack is obtained by varying the camera/object distance (dco) according to a defined step; at each step, an image is acquired, so that the entire scene is scanned.
A focus measure is calculated for each pixel of each image over a local window, and the position in the stack of the image where this measure is maximal is determined. This image position links each pixel to a spatial position, which yields the depth map. The main drawbacks of this method are the need for a textured scene, because the focus measure is based on the high-frequency content of the scene, and the large number of images that must be acquired.
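The principle described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation of [5]. The focus measure chosen here (squared discrete Laplacian summed over a local window) is one common option among many. The function names `box_sum`, `focus_measure` and `shape_from_focus`, the window size and the `dco_values` argument are hypothetical.

```python
import numpy as np


def box_sum(a, window):
    """Sum `a` over a square neighbourhood of radius window // 2 (edge-padded)."""
    r = window // 2
    h, w = a.shape
    padded = np.pad(a, r, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate every shifted copy of the image that falls inside the window.
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out


def focus_measure(image, window=5):
    """Per-pixel focus measure: squared 4-neighbour Laplacian summed locally.

    Assumption: high-frequency content is approximated by the Laplacian;
    other focus operators could be substituted here.
    """
    img = np.asarray(image, dtype=np.float64)
    p = np.pad(img, 1, mode="edge")
    lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * img
    return box_sum(lap ** 2, window)


def shape_from_focus(stack, dco_values, window=5):
    """Return a depth map from a focal stack.

    stack      : sequence of n grayscale images of identical shape (h, w)
    dco_values : the n camera/object distances at which they were acquired
    """
    measures = np.stack([focus_measure(im, window) for im in stack])
    # Index of the image in which each pixel is sharpest.
    best = np.argmax(measures, axis=0)
    # Map that index to the corresponding camera/object distance.
    return np.asarray(dco_values, dtype=np.float64)[best]
```

Calling `shape_from_focus(stack, dco_values)` on a stack of n images returns an array with the shape of one image, holding for each pixel the distance at which it appeared sharpest. In textureless regions the measure is nearly flat across the stack, so the maximum is unreliable, which is the first drawback mentioned above. The depth resolution is bounded by the acquisition step, which is why many images are needed.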