Robot Vision Introduction
Composition of machine vision systems
A machine vision system uses a computer to reproduce human visual capabilities, that is, to recognize the objective three-dimensional world from images. In the human visual system, the sensing element is the retina, which is essentially a two-dimensional sampling surface: the visible portion of a three-dimensional object is projected onto the retina, and people build a three-dimensional understanding of the object from the two-dimensional images projected there. "Three-dimensional understanding" here means understanding the object's shape, size, distance from the observation point, texture, motion characteristics (direction and speed), and the like.
The input device of a machine vision system may be a camera, a drum scanner, or the like, all of which take a three-dimensional scene as their source; what enters the computer is a two-dimensional projection of that three-dimensional scene. If the mapping from the three-dimensional objective world to its two-dimensional projection is regarded as a forward transformation, then the task of the machine vision system is the inverse transformation: reconstructing the three-dimensional objective world from its two-dimensional projection images.
A machine vision system consists of three main parts: image acquisition, image processing and analysis, and image output or display.
Nearly 80% of industrial vision systems are used for inspection, including checking product quality and monitoring production data during manufacturing; product classification and sorting are often integrated into the inspection function as well. The following describes a single-camera vision system for a production line, illustrating the composition and function of such a system.
The vision system inspects products on the production line, determines whether each product meets quality requirements, and sends a corresponding signal to the host machine based on the result. The image acquisition devices include the light source, camera, and so on; the image processing devices include the corresponding software and hardware; the output devices are the systems connected to the manufacturing process, including process controllers, alarm devices, and the like. The data are transferred to a computer for analysis and product control; if an unqualified product is found, an alarm is raised and the product is rejected from the line. The results produced by the machine vision system serve as a quality information source for the CAQ system and can also be integrated with other subsystems of CIMS.
Image acquisition
Image acquisition converts the visual appearance and intrinsic features of the measured object into a stream of data that the computer can process. It consists of three parts:
* lighting
* image focusing and formation
* image determination and formation of the camera output signal
1. Lighting
Lighting is an important factor affecting the input of a machine vision system, because it directly determines the quality of the input data and accounts for at least 30% of an application's effectiveness. Since there is no universal machine vision lighting device, an appropriate lighting arrangement must be selected for each specific application in order to achieve the best results.
In the past, many industrial machine vision systems used visible light as the light source, mainly because visible light is easily obtained, inexpensive, and convenient to work with. Common visible sources include incandescent lamps, fluorescent lamps, mercury lamps, and sodium lamps. However, a major disadvantage of these sources is that their light output cannot remain stable. Taking the fluorescent lamp as an example, its light output drops by 15% during the first 100 hours of use and continues to decrease as usage time increases. Therefore, how to keep the light output stable within acceptable limits is a problem that must be solved in practice.
On the other hand, ambient light changes the total light energy these sources deliver to the object, so the output image data contains noise; a protective shield is generally added to reduce the effect of ambient light.
Because of the problems above, some industrial applications today use X-rays, ultrasonic waves, and other non-visible sources instead. However, invisible light is less convenient to work with and more expensive, so in most practical applications visible light is still used as the light source.
Lighting systems can be divided into back lighting, forward lighting, structured-light lighting, and strobe lighting. In back lighting, the measured object is placed between the light source and the camera; its advantage is that it yields high-contrast images. In forward lighting, the light source and the camera are on the same side of the measured object, which makes installation easy. In structured-light lighting, a light pattern is projected onto the measured object, and the object's three-dimensional information is demodulated from the distortion of the pattern. In strobe lighting, high-frequency light pulses illuminate the object, and the camera exposure must be synchronized with the light source.
2. Image focusing and formation
The image of the measured object is focused onto a sensitive element through a lens, just as in a film camera. The difference is that a camera uses film, whereas a machine vision system uses a sensor to capture the image; the sensor converts the visual image into an electrical signal that is convenient for the computer to process.
The camera in a machine vision system should be selected according to the actual application, and the lens parameters are an important indicator. Lens parameters fall into four parts: magnification, focal length, depth of field, and lens mount.
3. Image determination and formation of the camera output signal
The camera is essentially a photoelectric conversion device: the image formed by the lens on the sensor is transformed into an electrical signal that the computer can process. The camera can be based on a vacuum tube or on a solid-state sensing element.
The electron-tube camera appeared earlier and was already applied in commercial television in the 1930s. It uses a vacuum tube containing a photosensitive element to convert the received image into an analog voltage signal; a camera with an RS-170 output can be connected directly to a commercial television display. The solid-state camera developed from the charge-coupled device (CCD), invented at Bell Telephone Laboratories in the United States in the late 1960s. It consists of a linear or rectangular array of photosensitive diodes, one per pixel, and converts the optical image into an electrical signal by reading out the voltage pulses of the diodes in a fixed order. The output voltage pulse sequence can be fed directly into an RS-170 monitor or into the computer's memory for numerical processing. The CCD is now the most common machine vision sensor.
Image processing technology
In a machine vision system, the processing of visual information depends mainly on image processing methods, including image enhancement, data coding, smoothing, edge sharpening, segmentation, feature extraction, image recognition, and so on. After such processing, the quality of the output image improves, which both enhances the visual appearance of the image and facilitates its subsequent analysis, processing, and recognition.
1. Image enhancement
Image enhancement is used to adjust the contrast of an image, highlight important details, and improve its visual quality. Grayscale histogram modification is the technique most commonly used for image enhancement.
The grayscale histogram of an image is a statistical chart showing the distribution of its gray levels; it is closely related to the image's contrast.
Typically, a two-dimensional digital image in the computer is represented as a matrix whose elements are the gray values at the corresponding coordinates: discretized integers, usually taking the values 0, 1, ..., 255. This is mainly because one byte in the computer represents the range 0 to 255; moreover, the human eye can only distinguish about 32 gray levels, so one byte per gray value is sufficient.
However, the histogram only counts the probability with which each gray level occurs; it says nothing about the two-dimensional coordinates of the pixels in the image. Therefore, different images may have the same histogram. From the shape of the grayscale histogram, the sharpness of the image and its black-and-white contrast can be judged.
If the histogram of an image is unsatisfactory, it can be modified by histogram equalization: the pixel gray levels of an image with a known gray-level probability distribution are put through a mapping transform so that the result is a new image with an approximately uniform gray-level probability distribution, which makes the image clearer.
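Histogram equalization can be sketched in a few lines of pure Python. This is a minimal illustration (images as lists of rows, 256 gray levels); a real system would use an image-processing library.

```python
def equalize(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows.

    Maps each gray level g to round((levels - 1) * CDF(g)), so the
    output gray levels approximate a uniform distribution.
    """
    # 1. Gray-level histogram: count of each value 0..levels-1.
    hist = [0] * levels
    npix = 0
    for row in image:
        for g in row:
            hist[g] += 1
            npix += 1
    # 2. Cumulative distribution function of the gray levels.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total / npix)
    # 3. Map every pixel through the scaled CDF.
    return [[round((levels - 1) * cdf[g]) for g in row] for row in image]

# A low-contrast 2x4 image: all values crowded into 100..103.
img = [[100, 100, 101, 101],
       [102, 102, 103, 103]]
print(equalize(img))  # gray levels spread across the full 0..255 range
```

After equalization the four crowded gray levels are spread out over the whole range, which is exactly the contrast-stretching effect described above.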
2. Image smoothing
Image smoothing is the reduction of noise in an image: it compensates for image distortion introduced by the imaging equipment and the environment during actual imaging, so that useful information can be extracted. It is well known that in the processes of formation, transmission, reception, and processing, a real image inevitably suffers external and internal interference, such as sensitivity variations in photoelectric conversion, quantization noise, transmission errors, human factors, and so on, all of which degrade the image. Therefore, removing noise and restoring the original image is an important part of image processing.

Linear filters, developed earlier in the twentieth century, have long played an important role in image filtering thanks to their solid theoretical foundation, convenient mathematical treatment, and easy implementation with the FFT or in hardware; Wiener filter theory and Kalman filter theory are representative examples. However, linear filters suffer from high computational complexity and are inconvenient for real-time processing. Although they smooth Gaussian noise well, they perform poorly against impulse interference and other forms of noise, and they blur signal edges. For this reason, in 1971 the noted statistician Tukey proposed a nonlinear filter, the median filter, which takes the median gray level of a local neighborhood as the output. Combined with statistical theory and iterative methods, it recovers the image from noise well while preserving the contour boundaries of the image without blurring. In recent years, nonlinear filtering theory has been widely applied in machine vision, medical imaging, speech processing, and other fields, and research on it continues to deepen.
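The median filter is easy to sketch. The 3x3 window size and the choice to copy border pixels unchanged are illustrative simplifications:

```python
def median_filter_3x3(image):
    """3x3 median filter, a nonlinear smoother: each interior output
    pixel is the median gray level of its 3x3 neighborhood.
    Border pixels are copied unchanged in this minimal sketch."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(image[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # median of the 9 sorted values
    return out

# A flat gray patch corrupted by one impulse ("salt") pixel.
img = [[10, 10, 10],
       [10, 255, 10],
       [10, 10, 10]]
print(median_filter_3x3(img))  # the impulse is removed entirely
```

Note the behavior claimed in the text: the impulse disappears completely (a linear mean filter would only spread it out), and a step edge would pass through unblurred because the median of a neighborhood on either side of the edge is one of the two original levels.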
3. Data encoding and transmission of images
The amount of data in a digital image is quite large: a 512 x 512-pixel 8-bit image occupies 256 KB. If 25 frames are transmitted per second, the required channel rate is about 52.4 Mbit/s. A high channel rate means high investment, which also hinders widespread adoption. Therefore, compressing the image data before transmission is very important. Data compression is accomplished mainly through encoding and transform compression of the image data.
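The figures in this paragraph follow from simple arithmetic (assuming 8 bits per pixel, which matches the byte-per-pixel representation discussed earlier):

```python
# Data-rate arithmetic: one 8-bit 512x512 frame, streamed at 25 fps.
width, height, bits_per_pixel, fps = 512, 512, 8, 25

frame_bytes = width * height * bits_per_pixel // 8
print(frame_bytes // 1024, "KB per frame")      # 256 KB per frame

bit_rate = width * height * bits_per_pixel * fps
print(bit_rate / 1e6, "Mbit/s uncompressed")    # about 52.4 Mbit/s
```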
Image data encoding generally uses predictive coding, in which the spatial and temporal variation of the image data is expressed by a prediction formula. If the values of the pixels preceding a given pixel are known, that pixel's value can be predicted from the formula. With predictive coding, generally only the starting values and the prediction errors of the image data need to be transmitted, and 8 bits/pixel can be compressed to 2 bits/pixel. The transform compression method divides the whole image into small data blocks (commonly 8 x 8 or 16 x 16), then classifies, transforms, and quantizes these blocks, thereby forming an adaptive transform compression system. This method can compress the data of an image by a factor of several tens before transmission and restore it at the receiving end.
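The predictive-coding idea can be sketched with the simplest possible predictor, "predict the previous pixel." Real coders use more elaborate prediction formulas and entropy-code the errors; this sketch only shows why the scheme works: neighboring pixels are similar, so the errors are small numbers that need far fewer bits than raw 8-bit values.

```python
def predictive_encode(pixels):
    """Differential predictive coding: transmit the first value, then
    only the prediction error (difference from the previous pixel)."""
    errors = [pixels[0]]
    for prev, cur in zip(pixels, pixels[1:]):
        errors.append(cur - prev)
    return errors

def predictive_decode(errors):
    """Exact inverse: accumulate the errors back into pixel values."""
    pixels = [errors[0]]
    for e in errors[1:]:
        pixels.append(pixels[-1] + e)
    return pixels

row = [100, 102, 103, 103, 101, 100]      # one scanline of gray values
enc = predictive_encode(row)
print(enc)                                # [100, 2, 1, 0, -2, -1]
assert predictive_decode(enc) == row      # lossless round trip
```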
4. Edge sharpening
Edge sharpening mainly enhances the contour edges and details in an image, forming complete object boundaries so that objects can be extracted from the image or regions belonging to the same object surface can be detected. It is a basic problem in early-vision theory and algorithms, and also one of the important factors determining the success or failure of mid-level and high-level vision.
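One classical sharpening scheme, not named in the text but standard in image processing, subtracts a discrete Laplacian from the image: the Laplacian responds strongly at gray-level transitions, so subtracting it exaggerates edges. The 4-neighbor kernel and border handling below are illustrative choices.

```python
def laplacian_sharpen(image, weight=1):
    """Sharpen by out = in - weight * Laplacian(in), using the
    4-neighbor discrete Laplacian. Border pixels are left unchanged
    and results are clamped to the 0..255 gray range."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (image[y-1][x] + image[y+1][x] +
                   image[y][x-1] + image[y][x+1] - 4 * image[y][x])
            out[y][x] = max(0, min(255, image[y][x] - weight * lap))
    return out

# A soft vertical edge between a dark (50) and a bright (200) region.
img = [[50, 50, 125, 200, 200]] * 3
print(laplacian_sharpen(img))  # contrast across the edge increases
```

On the middle row the pixel just inside the dark region is pushed darker and the one just inside the bright region is pushed brighter, which is the overshoot that makes the edge look crisper.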
5. Image segmentation
Image segmentation divides an image into several parts, each corresponding to a certain object surface; after segmentation, the gray level or texture within each part satisfies some uniformity measure. In essence, segmentation is a classification of pixels, based on their gray value, color, spectral characteristics, spatial characteristics, or texture characteristics. Image segmentation is one of the basic techniques of image processing and is applied, for example, in chromosome classification, scene-understanding systems, and machine vision.
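The pixel-classification view can be illustrated with the simplest case, a fixed gray-level threshold. The threshold value 128 here is arbitrary; in practice it would be chosen from the gray-level histogram.

```python
def threshold_segment(image, t):
    """Gray-level threshold segmentation: label each pixel as
    object (1) or background (0) by comparing it to threshold t."""
    return [[1 if g > t else 0 for g in row] for row in image]

# A bright object (values near 200) on a dark background (near 30).
img = [[30, 32, 31, 29],
       [28, 200, 210, 30],
       [31, 205, 198, 33],
       [30, 29, 32, 31]]
print(threshold_segment(img, 128))  # binary mask marking the object
```

Because each pixel is classified independently of its neighbors, a single noisy pixel flips its own label, which is exactly the noise sensitivity of threshold segmentation discussed below.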
There are two main approaches to image segmentation. One is gray-level threshold segmentation in the measurement space, which determines pixel clusters in the image's spatial domain from the gray-level histogram. However, it uses only the gray-level characteristics of the image and ignores other useful information, so its results are very sensitive to noise. The other is region growing in the spatial domain, which connects pixels with similar properties, in the sense of gray level, texture, gradient, and so on, into regions; it segments well, but its disadvantages are complexity and slow processing speed. Other methods exist as well: edge tracking focuses on maintaining edge properties, tracking edges, and forming closed contours that delimit the targets; pyramid image data structures and relaxation labeling methods also exploit the distribution relationships among pixels to assign edge pixels reasonably. Knowledge-based segmentation uses prior knowledge and statistical characteristics of the scene: it first performs an initial segmentation, extracts region features, then uses domain knowledge to derive interpretations of the regions, and finally merges the regions according to those interpretations.
6. Image recognition
The recognition process can actually be regarded as a labeling process: a recognition algorithm identifies the individual objects that have been segmented in the scene and assigns a specific label to each of them. This is a task every machine vision system must ultimately complete.
Image recognition problems can be divided into three types. In the first type, each pixel in the image expresses particular information about an object; for example, a pixel in a remote-sensing image represents the reflectance of a certain spectral band at a certain position on the ground, and the type of ground object can be determined from it. In the second type, the object to be identified is tangible and the two-dimensional image information is sufficient to identify it, as in character recognition and the recognition of some three-dimensional objects with stable visible surfaces. However, such problems are not as easy to represent with a feature vector as those of the first type; during recognition, the object must first be correctly separated from the image background, and then a property graph of the object in the image must be built and matched against the property graphs in a model library. In the third type, the object to be measured is given a three-dimensional representation derived from the two-dimensional image through intermediate representations such as the primal sketch and the 2.5-D sketch. Here, how to extract the implicit three-dimensional information is a hot research topic.
Current image recognition methods fall mainly into decision-theoretic and structural approaches. The decision-theoretic approach is based on decision functions, which are used to classify pattern vectors; it relies on quantitative descriptions of the pattern (such as statistical textures). The core of the structural approach is to decompose an object into patterns or pattern primitives; different object structures yield different primitive strings, so by encoding the boundary of the measured object with a given set of primitives, a string is obtained, and the object's class is then determined from that string. This is a method that relies on symbolic descriptions of the relationships within the measured object.
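The decision-theoretic approach can be illustrated with one of its simplest instances, a minimum-distance (nearest-mean) classifier. The class names and feature values below are hypothetical, chosen only to fit the inspection scenario of this article.

```python
def nearest_mean_classify(x, class_means):
    """Minimum-distance classifier: assign feature vector x to the
    class whose mean vector is closest in Euclidean distance.
    This is the simplest decision function over pattern vectors."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(class_means, key=lambda c: dist2(x, class_means[c]))

# Hypothetical 2-D features: (mean gray level, edge density).
means = {"defect": (40.0, 0.8), "good": (180.0, 0.1)}
print(nearest_mean_classify((50.0, 0.7), means))   # classified "defect"
print(nearest_mean_classify((170.0, 0.2), means))  # classified "good"
```

The decision function here is implicit: the boundary between two classes is the perpendicular bisector of the segment joining their means, a linear decision surface in feature space.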
Related Links:
Computer Vision Page At CMU
Computer Vision and Image Understanding (a journal; many articles available as PDF)
Technical Committee on Computer and Robot Vision
Related items:
IMPROV - Image Processing for Robot Vision
Note: Original text from http://www.robotdiy.com/Article.php?sid=138