During my postgraduate stage, my research topic is binocular stereo vision and its applications. This report discusses the following aspects.
1. Topic content
2. Background of the topic, significance of the research, and goals to be achieved
3. Technical difficulties
4. General plan
Binocular vision overview
Obtaining distance information about a 3D spatial scene is the most basic task in computer vision research. There are many methods and technologies for obtaining distance information. Stereo vision is the most important distance perception technology among passive ranging methods in computer vision. It directly simulates the way human vision processes scenes and can flexibly measure the stereo information of a scene under a variety of conditions, an effect that other computer vision methods cannot replace. Its research is important both from the perspective of visual psychology and for engineering applications.
Stereo vision is a method of acquiring three-dimensional geometric information about an object from multiple images (generally two). In a computer vision system, two digital images of the surrounding scene can be obtained simultaneously from different angles, and the three-dimensional shape and position of the scene can then be reconstructed by the computer from these images.
The prerequisite for three-dimensional reconstruction is to establish correspondences between the two images.
Three-dimensional reconstruction with stereo vision refers to recovering the three-dimensional geometry of an object from two or more images.
In summary, binocular vision is a kind of stereo vision: its purpose is to obtain corresponding points through image processing, target identification, and target matching, and finally to restore the three-dimensional geometry of the object through the resulting computational relationships.
A complete stereo vision system can usually be divided into image acquisition, camera calibration, feature extraction, stereo matching, and interpolation.
Image acquisition
USB cameras are used in this project, so image acquisition must be addressed.
There are two main ways to obtain a video stream from a USB camera: one is VFW (Video for Windows), the other is DirectShow. Under the Windows desktop operating system, both methods can obtain the video stream of a USB camera and capture each frame as an image, in preparation for subsequent image processing. Here I use DirectShow.
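For illustration, below is a minimal capture sketch using OpenCV's VideoCapture with its DirectShow backend, instead of programming DirectShow COM interfaces directly; the OpenCV dependency and the device indices 0 and 1 are assumptions, not the project's actual code:

```cpp
#include <opencv2/opencv.hpp>

int main()
{
    // Open both USB cameras through the DirectShow backend (Windows).
    cv::VideoCapture camL(0, cv::CAP_DSHOW), camR(1, cv::CAP_DSHOW);
    if (!camL.isOpened() || !camR.isOpened()) return 1;

    cv::Mat frameL, frameR;
    for (;;) {
        // Grab both frames first, then retrieve, to keep the two images
        // as close in time as possible (software-level synchronization).
        camL.grab(); camR.grab();
        camL.retrieve(frameL); camR.retrieve(frameR);
        cv::imshow("left", frameL); cv::imshow("right", frameR);
        if (cv::waitKey(1) == 27) break;   // Esc to quit
    }
    return 0;
}
```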
If real-time processing is required, that is, a binocular vision application in a moving environment, there is the problem of synchronizing the two cameras: the image data used in the final processing should be acquired at the same moment.
These two parts of the work are now basically completed, but synchronized image acquisition from the two cameras has not yet been verified experimentally. However, it should not be difficult.
In addition, since binocular vision is required, the positional relationship between the two cameras must be considered. Analysis shows that the longer the baseline, the smaller the relative error of the recovered spatial position; but the baseline cannot be too long, otherwise parts of the object may occlude each other so that the two cameras cannot observe the same point simultaneously. The swing (convergence) angle between the two cameras must also be considered, so as to reduce the visual dead zone as much as possible. Some papers have discussed the relationship between swing angle and visible region. For this part, I only have a general concept and have not made specific calculations.
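For a rectified, parallel-axis configuration (a simplification of the actual rig), the standard depth-from-disparity relation makes the baseline trade-off above concrete:

$$Z = \frac{fB}{d}, \qquad |\Delta Z| \approx \frac{Z^2}{fB}\,|\Delta d|,$$

where $Z$ is the depth of the point, $f$ the focal length in pixels, $B$ the baseline length, $d$ the disparity, and $\Delta d$ the matching error: for a given $\Delta d$, increasing $B$ (or $f$) reduces the depth error, consistent with the analysis above.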
Camera calibration
A three-dimensional computer vision system should be able to start from the image information acquired by the camera, calculate the position, shape, and other properties of objects in the three-dimensional environment, and thereby identify them. The brightness of each point on the image reflects the intensity of light reflected from a point on the surface of the space object, and its position on the image is related to the geometric position of the corresponding point on the object surface. The relationship between these positions is determined by the camera's imaging geometry model, whose parameters are called camera parameters. These parameters must be determined by experiment and calculation; this process is called camera calibration.

The camera model is a simplification of the optical imaging geometry. The simplest model is the linear model, or pinhole model. However, when higher accuracy is required, or when the linear model cannot accurately describe the camera's imaging geometry, linear or nonlinear distortion compensation must be introduced, and the corrected model used for three-dimensional reconstruction to obtain higher precision.

Camera calibration methods fall into traditional calibration methods and camera self-calibration methods. For general industrial applications, the traditional calibration method is widely used because it is simple, conceptually clear, and easy to understand. Camera self-calibration does not require a calibration reference object; it calculates the camera model parameters directly from constraint relationships among the parameters across an image sequence, making real-time online calibration of camera parameters possible. The research focus of the traditional method is to find camera models that are more accurate and better match human visual characteristics, to establish general calibration methods that are practical and simple, and to determine nonlinear distortion correction models and their parameters. The research focus of self-calibration is how to solve its sensitivity to noise, how to make the algorithms simpler, and how to avoid complex nonlinear search problems.
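For reference, a commonly used nonlinear correction of the linear model mentioned above (not necessarily the model that will be adopted here) is radial distortion:

$$x_d = x\,(1 + k_1 r^2 + k_2 r^4), \qquad y_d = y\,(1 + k_1 r^2 + k_2 r^4), \qquad r^2 = x^2 + y^2,$$

where $(x, y)$ are the ideal image coordinates given by the pinhole model, $(x_d, y_d)$ the actually observed (distorted) coordinates, and $k_1$, $k_2$ the radial distortion coefficients to be calibrated.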
At present, only the traditional camera calibration method without distortion compensation is used to calibrate the USB cameras.
Imaging transformation involves transformations between different coordinate systems. Considering that the final result of image acquisition is a digital image in the computer, the coordinate systems involved in imaging a 3D spatial scene are mainly: the world coordinate system, the camera coordinate system, the image plane coordinate system, and the computer image (pixel) coordinate system.
Camera calibration determines the internal parameters and external parameters of the camera; these parameters can be combined into a single matrix called the projection matrix, as shown in the formula below:

$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/dx & 0 & u_0 & 0 \\ 0 & f/dy & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & t \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

where $z_c$ is the coordinate of the space point along the z-axis of the camera coordinate system, which can be eliminated in the later calculation; $(u_0, v_0)$ is the intersection of the optical axis and the image plane, that is, the origin of the image plane coordinate system expressed in the computer image coordinate system, in pixels. This point generally lies at the center of the image, but deviates somewhat because of camera manufacturing. $dx$ and $dy$ are the physical dimensions of one pixel in the x-axis and y-axis directions, respectively. Since $f$, $dx$, $dy$, $u_0$, and $v_0$ are related only to the internal structure of the camera, they are called the camera's internal parameters; $R$ and $t$ are fully determined by the pose of the camera relative to the world coordinate system and are called the camera's external parameters. Determining the internal and external parameters of a camera is called camera calibration.
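As a small sanity check of the model, projecting a world point through a given M is a one-liner in matrix form; a minimal sketch (assuming OpenCV, which is not part of the original toolchain):

```cpp
#include <opencv2/core.hpp>

// Given the 3x4 projection matrix M, map a world point to pixel
// coordinates: (u, v) = (x/w, y/w) where (x, y, w)^T = M * (X, Y, Z, 1)^T.
cv::Point2d project(const cv::Mat& M, const cv::Point3d& P)
{
    cv::Mat Pw = (cv::Mat_<double>(4, 1) << P.x, P.y, P.z, 1.0);
    cv::Mat p = M * Pw;                       // homogeneous image point
    double w = p.at<double>(2);
    return { p.at<double>(0) / w, p.at<double>(1) / w };
}
```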
Camera calibration generally requires a special calibration reference object placed in front of the camera; the camera acquires an image of the object, and the internal and external parameters are computed from it. The position of each feature point on the calibration reference should be accurately measured in the world coordinate system beforehand. By processing the image of the calibration object, the feature points are identified and put into correspondence with their points in the world coordinate system; then, using the formula above, the projection matrix containing the internal and external parameters can be obtained by the least squares method. From the projection matrix, the internal and external parameters can be recovered, although this decomposition introduces errors. However, in binocular vision the projection matrix itself is sufficient, and it does not have to be decomposed into internal and external parameters. In this topic, a chessboard is used as the calibration reference. With a traditional calibration method, the basic steps are very clear, but the difficulty is that the accuracy of feature point extraction from the calibration image must be high; ultimately the image coordinates should reach sub-pixel accuracy. This is a matter of image processing.
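As a rough illustration of the least-squares step, here is a minimal sketch, assuming OpenCV for the linear algebra (the actual program calls MATLAB for this step); fixing m34 = 1 is a common normalization in the linear calibration setup:

```cpp
#include <opencv2/core.hpp>
#include <algorithm>
#include <vector>

// Estimate the 3x4 projection matrix M from n >= 6 world/image point
// pairs by linear least squares, fixing m34 = 1.
cv::Mat estimateProjectionMatrix(const std::vector<cv::Point3d>& world,
                                 const std::vector<cv::Point2d>& image)
{
    const int n = static_cast<int>(world.size());
    cv::Mat A(2 * n, 11, CV_64F, cv::Scalar(0));
    cv::Mat b(2 * n, 1, CV_64F);
    for (int i = 0; i < n; ++i) {
        const double X = world[i].x, Y = world[i].y, Z = world[i].z;
        const double u = image[i].x, v = image[i].y;
        double* r1 = A.ptr<double>(2 * i);      // equation for u
        double* r2 = A.ptr<double>(2 * i + 1);  // equation for v
        r1[0] = X; r1[1] = Y; r1[2] = Z; r1[3] = 1;
        r1[8] = -u * X; r1[9] = -u * Y; r1[10] = -u * Z;
        r2[4] = X; r2[5] = Y; r2[6] = Z; r2[7] = 1;
        r2[8] = -v * X; r2[9] = -v * Y; r2[10] = -v * Z;
        b.at<double>(2 * i)     = u;
        b.at<double>(2 * i + 1) = v;
    }
    cv::Mat m;                          // the 11 unknown entries of M
    cv::solve(A, b, m, cv::DECOMP_SVD); // least-squares solution
    cv::Mat M(3, 4, CV_64F);
    std::copy(m.ptr<double>(), m.ptr<double>() + 11, M.ptr<double>());
    M.at<double>(2, 3) = 1.0;           // the fixed m34
    return M;
}
```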
A camera calibration program has been written according to the traditional calibration method without considering distortion. The key part is the image processing.
The general steps are as follows:
1. For convenience, only the portion of the image actually used for calibration is retained. The resulting image should in principle be black and white, but because of lighting, the image still contains colors other than black and white.
2. Edge detection is performed directly on the color image. If a threshold were given first and the image binarized, choosing that first threshold would be difficult and too subjective. Therefore, only the simplest method of color image edge detection is used, following a color edge detection operator given in a master's thesis; the final result is a grayscale edge image.
3. The threshold for grayscale image segmentation is obtained automatically by the Otsu method, and the image is binarized with this threshold (steps 3 to 6 are sketched in code after this list).
4. The binary image is refined with morphological processing, because the detected edges have a certain width.
5. The Hough transform is performed to find the straight lines in the image and separate them into horizontal and vertical lines, and the lines are drawn on both the original image and the processed image.
6. The intersections of the horizontal and vertical lines are computed; for this purpose the horizontal and vertical lines are sorted. The main work here is solving systems of equations; to compute the matrix inverse conveniently, VC is used together with MATLAB. The coordinates of the resulting image points are also displayed on the original image and the transformed image.
7. The world-coordinate positions corresponding to the image points are entered, MATLAB is called again, and the M matrix is obtained by the least squares method.
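A minimal sketch of steps 3 to 6 follows, assuming OpenCV; the operators and parameters are illustrative (the actual program uses, for example, the color edge detector from the master's thesis, and true edge thinning would need a dedicated thinning algorithm):

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

int main()
{
    // Hypothetical input: the grayscale edge image produced by step 2.
    cv::Mat edges = cv::imread("chessboard_edges.png", cv::IMREAD_GRAYSCALE);

    // Step 3: Otsu automatically selects the binarization threshold.
    cv::Mat bin;
    cv::threshold(edges, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);

    // Step 4: morphological cleanup; closing fills small gaps in the
    // (wide) detected edges. The real refinement may differ.
    cv::morphologyEx(bin, bin, cv::MORPH_CLOSE,
                     cv::getStructuringElement(cv::MORPH_RECT, {3, 3}));

    // Step 5: standard Hough transform; split lines into near-horizontal
    // and near-vertical by their angle theta.
    std::vector<cv::Vec2f> lines, horiz, vert;
    cv::HoughLines(bin, lines, 1, CV_PI / 180, 120);
    for (const auto& l : lines)
        (std::abs(l[1] - CV_PI / 2) < CV_PI / 4 ? horiz : vert).push_back(l);

    // Step 6: intersect each horizontal line with each vertical line by
    // solving the 2x2 system x*cos(theta) + y*sin(theta) = rho for both.
    std::vector<cv::Point2f> corners;
    for (const auto& h : horiz)
        for (const auto& v : vert) {
            cv::Matx22f A(std::cos(h[1]), std::sin(h[1]),
                          std::cos(v[1]), std::sin(v[1]));
            cv::Vec2f rho(h[0], v[0]);
            cv::Vec2f p = A.solve(rho, cv::DECOMP_LU);
            corners.emplace_back(p[0], p[1]);
        }
    return 0;
}
```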
The current problem is that, because of the image processing results, the points found in the image during the experiment are not very accurate; although the deviation is only 1 to 2 mm, it eventually produces a non-negligible error. Initially, two causes are considered. First, the methods used in the image processing may not be the best (not that any step's algorithm is unusable, but I do not yet know how to tune these algorithms for the best results). Second, the captured images were not very good, and during the experiment the world coordinates were not measured accurately. Solving these issues is the next step.
Stereo matching
Stereo matching is a key technique in binocular vision, and also a difficult one.
Stereo matching establishes the correspondence between pixels of the same spatial scene point in images taken from different viewpoints. Unlike ordinary template matching, stereo matching is conducted between two or more views that differ in viewpoint, suffer geometric and grayscale distortion and noise interference, and have no standard template. When a three-dimensional scene is projected onto two-dimensional images, the images are affected by many factors, such as illumination conditions, scene geometry and physical properties, noise and distortion, and camera characteristics; accurately matching images containing so many unfavorable factors is quite difficult. At present, research on stereo vision can be summarized in two directions: 1. From the perspective of understanding the fusion mechanism of human vision, trying to establish a general computational model of human binocular vision. 2. Starting from practical applications and requirements, building practical stereo vision systems for specialized fields and specific objects, reducing the difficulty of the vision problem by emphasizing scene and task constraints, thereby increasing the practicality of the system.
The design of any stereo matching method must solve the following three problems. 1. Primitive selection: select appropriate image features as matching primitives, such as points, straight lines, regions, phase, and the like. 2. Matching criteria: express certain inherent properties of the physical world as rules that matching must follow, so that the matching results truly reflect the original appearance of the scene. 3. Algorithm structure: design a stable algorithm, with appropriate mathematical methods, that correctly matches the selected primitives.
First, choose appropriate matching primitives. Matching primitives are the image features used for matching. Due to the influence of perspective differences and noise interference, it is difficult to match every pixel in the image; therefore, features that are insensitive to viewpoint changes and can be reliably identified should be selected as matching primitives. Commonly used matching primitives are mainly: point-like features, line-like features, and region features.
The matching criteria express, in terms of the selected primitives, rules that must be followed in order to improve the system's ability to resolve ambiguous matches and its computational efficiency. The uniqueness, compatibility, and continuity constraints proposed by Marr are considered the most general and most basic physical constraints for controlling matching. On the basis of these three basic constraints, more specific matching criteria can be derived from prior knowledge of the scene's characteristics and the application requirements. Commonly used criteria are: the uniqueness constraint, continuity constraint, compatibility constraint, epipolar constraint, and ordering constraint.
The algorithm structure that implements the matching is closely related to the choice of matching primitives and criteria, and should generally take both effectiveness and computational cost into account. Stereo matching is essentially an optimal search problem under the matching criteria, and many optimization techniques in mathematics can be applied to it, such as dynamic programming, relaxation methods, and genetic algorithms.
Depending on the matching primitives and strategy, current stereo matching algorithms can basically be divided into two categories: area-based matching and feature-based matching. Commonly used matching features include edge points, corner points, and other grayscale discontinuities, as well as edge line segments.
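As a minimal sketch of area-based matching (assuming OpenCV and rectified images, so that the epipolar line is the same image row; the window size and disparity range are illustrative):

```cpp
#include <opencv2/opencv.hpp>

// For a pixel (x, y) in the left image, search along the same row of the
// right image for the disparity minimizing the SAD over a square window.
// Caller must keep the windows inside both images.
int bestDisparity(const cv::Mat& left, const cv::Mat& right,
                  int x, int y, int win = 5, int maxDisp = 64)
{
    int best = 0;
    double bestCost = 1e18;
    cv::Rect patchL(x - win, y - win, 2 * win + 1, 2 * win + 1);
    for (int d = 0; d <= maxDisp && x - d - win >= 0; ++d) {
        cv::Rect patchR(x - d - win, y - win, 2 * win + 1, 2 * win + 1);
        double cost = cv::norm(left(patchL), right(patchR), cv::NORM_L1); // SAD
        if (cost < bestCost) { bestCost = cost; best = d; }
    }
    return best;
}
```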
Here, if binocular vision is only used to locate a small ball, it may be possible to use the position of the ball's center in the two images directly as the matching primitive, which would be simple. However, this requires the center to be located rather accurately, and accurately finding the ball's center is a difficult point in the image processing. The actual effect has not yet been verified experimentally; this step is still at the stage of testing the image processing.
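One plausible way to locate the ball's center (an assumption, not the verified approach of this project) is a Hough circle transform on the grayscale image; the parameters below are illustrative and would need tuning:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Find circle candidates; each element is (center_x, center_y, radius).
std::vector<cv::Vec3f> findBall(const cv::Mat& gray)
{
    cv::Mat blurred;
    cv::GaussianBlur(gray, blurred, cv::Size(9, 9), 2);  // suppress noise
    std::vector<cv::Vec3f> circles;
    cv::HoughCircles(blurred, circles, cv::HOUGH_GRADIENT,
                     1,                // accumulator resolution ratio
                     blurred.rows / 4, // min distance between centers
                     100, 30,          // Canny high / accumulator thresholds
                     5, 0);            // min/max radius (0 = unbounded)
    return circles;                    // strongest candidate first
}
```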
Distance measurement of spatial points
When the calibration of the two cameras is finished, the projection matrices of the two cameras are obtained, and the positions of the required spatial point in the two images are obtained by image processing. At this point, it is straightforward to calculate the coordinates of this spatial point in the world coordinate system through the formula.
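The linear least-squares computation implied by the formula can be sketched as follows (assuming OpenCV; in the current toolchain this solve would go through MATLAB):

```cpp
#include <opencv2/core.hpp>

// Recover the world coordinates of a point from its pixel positions p1
// and p2 in the two views, given the calibrated 3x4 projection matrices
// M1 and M2. Each view contributes two linear equations of the form
//   (u*m3 - m1).X = 0 and (v*m3 - m2).X = 0,
// where mi is the i-th row of M; with X = (X, Y, Z, 1) this gives an
// over-determined 4x3 system solved by least squares.
cv::Point3d triangulate(const cv::Mat& M1, const cv::Mat& M2,
                        cv::Point2d p1, cv::Point2d p2)
{
    cv::Mat A(4, 3, CV_64F), b(4, 1, CV_64F);
    const cv::Mat* Ms[2] = { &M1, &M2 };
    const cv::Point2d ps[2] = { p1, p2 };
    for (int c = 0; c < 2; ++c) {
        const cv::Mat& M = *Ms[c];
        double uv[2] = { ps[c].x, ps[c].y };
        for (int r = 0; r < 2; ++r) {          // one row per coordinate
            for (int j = 0; j < 3; ++j)
                A.at<double>(2 * c + r, j) =
                    uv[r] * M.at<double>(2, j) - M.at<double>(r, j);
            b.at<double>(2 * c + r) =
                M.at<double>(r, 3) - uv[r] * M.at<double>(2, 3);
        }
    }
    cv::Mat X;
    cv::solve(A, b, X, cv::DECOMP_SVD);        // least-squares solution
    return { X.at<double>(0), X.at<double>(1), X.at<double>(2) };
}
```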
Since it is not possible to accurately match every point in the actual scene, interpolation is needed if a three-dimensional description of the entire scene is required.
Next steps and expected work
During the graduate period, I hope to build a software platform for the various binocular vision algorithms studied, completing a basic program framework that also achieves basic binocular vision functions.
Therefore, at the beginning, the focus of the work will be on building such a software platform. This will include traditional camera calibration and simple visual ranging of a simple object.
As mentioned above, there is no universally applicable algorithm for all image characteristics, so prior knowledge must be added when matching; only then will there be a good effect, and the more prior knowledge, the better the result. Therefore, stereo matching means different image processing for different applications. First, I will start with ranging a small ball.
The software platform finally built should show the effect of image processing on real-time captured video on the screen, that is, display the results of stereo matching and the measured distance of the object. This platform is intended to be implemented with VC and DirectShow. The processing of captured video images can be implemented with a Filter. Therefore, once this platform is implemented, different stereo matching for different applications can be achieved, and different algorithms studied, simply by writing different Filters.
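As a hypothetical sketch of that plug-in idea (not the actual DirectShow Filter API, which implements COM interfaces), the platform-facing shape might be:

```cpp
#include <opencv2/core.hpp>
#include <memory>

// Hypothetical shape of the plug-in design: the platform owns the
// capture pipeline and hands each synchronized frame pair to whatever
// processor (analogous to a DirectShow Filter) is currently installed.
struct StereoProcessor {
    virtual ~StereoProcessor() = default;
    // Returns an image annotated with matching/ranging results to display.
    virtual cv::Mat process(const cv::Mat& left, const cv::Mat& right) = 0;
};

struct BallRangeProcessor : StereoProcessor {
    cv::Mat process(const cv::Mat& left, const cv::Mat& right) override {
        // ... locate the ball's center in both views, triangulate,
        // and draw the measured distance on a copy of 'left' ...
        return left.clone();
    }
};

// Swapping algorithms is then one line:
// std::unique_ptr<StereoProcessor> proc = std::make_unique<BallRangeProcessor>();
```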
To complete this platform, the next step is to continue the camera calibration work: improve the accuracy of the calibration, analyze whether the calibration fails or its error is too large because of the image processing, and improve that processing; the ultimate goal is to calibrate the camera well with a traditional calibration method. In addition, extracting the center of the small ball is also an image processing problem.
I also need to learn DirectShow technology in more depth. Although capture of the video stream has been realized with DirectShow, this is the simplest part; to write a Filter, there is still a lot to learn that I do not yet fully understand.
In addition, since this is an embedded topic group, it is necessary to understand the WinCE operating system in order to design the program better; that is, the binocular vision system needs to be applied under WinCE. Although the principle is the same, many problems will arise in the specific application. I hope first to verify the accuracy of the system on the ordinary Windows platform and then port it. This part of the work is planned to be completed in mid-March.
If time permits, I will further study camera calibration algorithms that consider distortion, self-calibration methods, and how to apply binocular vision to other scenes.