These days I have started working on a camera driver, and I ran into a tricky problem: the image from the camera and the LCD screen have the same resolution, but the width and height are swapped. The camera supports QVGA, so the image it produces is 320 × 240. The LCD is also QVGA, but its size is 240 × 320. The camera data therefore has to be rotated 90 degrees before it can be put on the LCD. Rotate 90 degrees? Won't the picture on the screen then be turned the wrong way? That part is easy to deal with: when the lens is installed, you mount it so that the rotated data lines up with the direction of the view. However, the data coming off the camera bus is always 320 pixels wide, and this camera has no hardware transform that would deliver 240-pixel-wide data instead, so the conversion has to be done in software!
There are many ways to rotate an image in software, but whichever you pick, it has to be efficient to be of any practical use. This rotation is meant for the camera preview, which runs at 15 frames per second, so the data traffic is fairly heavy and each frame has a budget of only about 66 milliseconds. If the algorithm is slow, the preview alone will eat a lot of CPU time, and that is bound to hurt other, equally important work the CPU has to do. Moreover, the whole project targets ARM, and on an embedded platform every bit of CPU time has to be saved. My friend had already tried two approaches. The first was a direct pixel copy, for example copying the pixel at coordinates (A, B) to (B, A) in a double loop until the whole screen has been rotated. But one pass took more than 100 milliseconds. Think about it: we need 15 frames per second, and at that speed the system can only show a few frames while the CPU is already at 100%! Since we were developing on an Intel platform, we naturally thought of rotating with Intel's IPP functions. The result was not satisfactory either. Intel does not provide a standalone function for rotating a large image; the function it offers is a composite one that also includes resize (adjusting the image size), and it is not optimized separately for this rotation, so rotating one image again took about 100 milliseconds. Too slow, too slow! Should we write the function in assembly instead? That felt like trouble to maintain and inconvenient for future porting. So let's see whether the problem can be solved in pure C.
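For reference, the direct pixel copy might look roughly like the sketch below. This is not my friend's actual code. The function name `rotate90_naive` is made up for illustration, the image is assumed to be a single 8-bit plane stored row by row, and the rotation is assumed to be clockwise. Note that a plain (A, B) to (B, A) swap is a transpose; a true 90-degree rotation also reverses one axis, which is what the index below does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline: rotate one 8-bit plane 90 degrees clockwise,
 * one pixel per read and one pixel per write.
 * src is src_w x src_h (row-major); dst is src_h x src_w (row-major). */
void rotate90_naive(const uint8_t *src, uint8_t *dst,
                    size_t src_w, size_t src_h)
{
    for (size_t y = 0; y < src_h; y++) {
        for (size_t x = 0; x < src_w; x++) {
            /* Source pixel (x, y) lands in destination row x,
             * destination column (src_h - 1 - y). */
            dst[x * src_h + (src_h - 1 - y)] = src[y * src_w + x];
        }
    }
}
```

For a QVGA plane this would be called as `rotate90_naive(src, dst, 320, 240)`. Every pixel costs one byte-sized read and one byte-sized write, and the writes jump a whole destination row apart each time.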
By the way, the image data we handle is YCbCr, and each channel is processed separately, so each pixel occupies 1 byte. On a 32-bit processor, reading and writing 4 bytes at a time is probably the most efficient. Considering that memory access on ARM is a bottleneck for system performance (as it is on other systems), we improved the earlier algorithm. We cannot change the IPP functions, so the only thing we could improve was our own homegrown method. Before, I wrote 1 pixel, that is one byte, at a time. This time we work out four adjacent pixels and then write them in one go. The test result was encouraging: about 80 milliseconds. 80 milliseconds! Much faster, but still not enough to meet our requirement.
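The "collect four pixels, write once" idea could be sketched like this. It is an illustration under assumptions rather than the code we actually shipped: the name `rotate90_write4` is made up, the rotation is assumed clockwise, the CPU is assumed little-endian, and the source height is assumed to be a multiple of 4. `memcpy` is used for the 4-byte store so the sketch does not depend on alignment; compilers commonly turn it into a single word store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: rotate one 8-bit plane 90 degrees clockwise,
 * still reading one byte at a time but writing four bytes at once.
 * Assumes a little-endian CPU and src_h divisible by 4.
 * src is src_w x src_h (row-major); dst is src_h x src_w (row-major). */
void rotate90_write4(const uint8_t *src, uint8_t *dst,
                     size_t src_w, size_t src_h)
{
    for (size_t x = 0; x < src_w; x++) {        /* one dst row per src column */
        uint8_t *out = dst + x * src_h;
        for (size_t c = 0; c < src_h; c += 4) { /* four dst pixels per step */
            /* Destination column c comes from source row (src_h - 1 - c),
             * and the next three columns from the rows above it. */
            size_t y = src_h - 1 - c;
            uint32_t word =
                  (uint32_t)src[y * src_w + x]
                | ((uint32_t)src[(y - 1) * src_w + x] << 8)
                | ((uint32_t)src[(y - 2) * src_w + x] << 16)
                | ((uint32_t)src[(y - 3) * src_w + x] << 24);
            memcpy(out + c, &word, sizeof word); /* one 32-bit write */
        }
    }
}
```

The number of writes drops to a quarter, and they now run sequentially along each destination row, but the reads are still single bytes scattered across four source rows.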
Still, my confidence got a big boost, and the pursuit of performance had me brew a cup of coffee, sit back down at the desk, and start thinking again. If we can write 4 bytes at a time, we can also read 4 bytes at a time! I reworked the code. Because the read and write directions differ (one runs along pixel rows, the other along pixel columns), reads and writes have to be handled in both directions at once, and the code gets somewhat more complicated. The image has to be processed in small 4 × 4 rectangular blocks, using a lot of local variables and shift operations. It looks a bit messy. But the result was astonishing: processing one frame takes only 26 milliseconds! That is very fast. Not only can we handle enough frames per second, the CPU also has plenty of time left for other work. It is not much slower than a plain memcpy. Cool! Why did reducing the number of memory reads improve performance so much more than reducing the number of memory writes? Maybe reading 4 bytes at a time suits the pipeline better! Haha.
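The 4 × 4 block version could be sketched roughly as follows. Again this is a reconstruction of the idea, not the original code: `rotate90_block4x4`, `load32`, and `store32` are made-up names, the rotation is assumed clockwise, the CPU is assumed little-endian, and both width and height are assumed to be multiples of 4 (true for 320 × 240). Four 32-bit reads bring in a 4 × 4 block, shifts and masks regroup the bytes in local variables, and four 32-bit writes put the rotated block into the destination.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Alignment-safe 32-bit load/store helpers (hypothetical names). */
static uint32_t load32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void store32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Hypothetical sketch: rotate one 8-bit plane 90 degrees clockwise in
 * 4 x 4 blocks, with four 32-bit reads and four 32-bit writes per block.
 * Assumes a little-endian CPU and src_w, src_h divisible by 4.
 * src is src_w x src_h (row-major); dst is src_h x src_w (row-major). */
void rotate90_block4x4(const uint8_t *src, uint8_t *dst,
                       size_t src_w, size_t src_h)
{
    for (size_t y = 0; y < src_h; y += 4) {
        for (size_t x = 0; x < src_w; x += 4) {
            /* Read four rows of the block, four pixels each. */
            const uint8_t *in = src + y * src_w + x;
            uint32_t r0 = load32(in);
            uint32_t r1 = load32(in + src_w);
            uint32_t r2 = load32(in + 2 * src_w);
            uint32_t r3 = load32(in + 3 * src_w);

            /* Source column x + k becomes destination row x + k; the
             * block's bottom row (r3) supplies the leftmost byte. */
            uint8_t *out = dst + x * src_h + (src_h - 4 - y);
            for (unsigned k = 0; k < 4; k++) {
                unsigned s = 8 * k;
                uint32_t w = ((r3 >> s) & 0xFFu)
                           | (((r2 >> s) & 0xFFu) << 8)
                           | (((r1 >> s) & 0xFFu) << 16)
                           | (((r0 >> s) & 0xFFu) << 24);
                store32(out + k * src_h, w);
            }
        }
    }
}
```

Per 16 pixels this does 4 reads and 4 writes instead of 16 of each; all the byte shuffling happens in local variables that the compiler can keep in registers.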
Of course, if the camera hardware supported 90-degree image rotation, none of the above would be necessary. I still hope there is a "hardware way" to solve this problem, so that the CPU can be freed up completely. Heh. In any case, whatever algorithm you adopt, be sure to choose the one that best fits the specific conditions. As the saying goes: do it yourself, and you will be well fed and clothed.