Matrix and floating-point calculation test report

xiaoxiao2021-03-06  52

1) Use D3DXMatrixMultiply to perform the matrix multiplication.
2) Use an optimized quaternion algorithm to simulate the matrix multiplication (producing the same result as 1).
3) Calculate the matrix multiplication the most naive way (64 multiplications, 48 additions, and several assignments).
4) Calculate the matrix multiplication in SSE assembly (fewer than 64 multiplications, 48 additions, and several assignments).
5) Calculate the matrix multiplication with the xmmintrin.h intrinsics.
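To make the difference between methods 3 and 5 concrete, here is a minimal sketch of both. It is not the code that was benchmarked: the `Mat4` type and the function names `mul_naive` and `mul_sse` are hypothetical, the matrix is assumed to be row-major, and an x86 or x86-64 compiler that ships `<xmmintrin.h>` is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Hypothetical matrix type for this sketch: row-major, 16-byte aligned.
struct alignas(16) Mat4 {
    float f[4][4];
};

// Method 3: the straightforward version.
// 16 output elements x 4 products = 64 multiplications,
// 16 output elements x 3 sums     = 48 additions.
Mat4 mul_naive(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            r.f[i][j] = a.f[i][0] * b.f[0][j]
                      + a.f[i][1] * b.f[1][j]
                      + a.f[i][2] * b.f[2][j]
                      + a.f[i][3] * b.f[3][j];
        }
    }
    return r;
}

// Method 5: SSE intrinsics.
// Row i of the result is a[i][0]*row0(b) + a[i][1]*row1(b) + ...,
// so each result row costs 4 packed multiplies and 3 packed adds:
// 16 packed multiplies and 12 packed adds in total, plus 16 broadcasts.
Mat4 mul_sse(const Mat4& a, const Mat4& b) {
    // _mm_load_ps and _mm_store_ps require 16-byte aligned addresses.
    assert(reinterpret_cast<std::uintptr_t>(&b) % 16 == 0);

    const __m128 b0 = _mm_load_ps(b.f[0]);
    const __m128 b1 = _mm_load_ps(b.f[1]);
    const __m128 b2 = _mm_load_ps(b.f[2]);
    const __m128 b3 = _mm_load_ps(b.f[3]);

    Mat4 r;
    for (int i = 0; i < 4; ++i) {
        __m128 row = _mm_mul_ps(_mm_set1_ps(a.f[i][0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.f[i][1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.f[i][2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(a.f[i][3]), b3));
        _mm_store_ps(r.f[i], row);
    }
    return r;
}
```

Calling `mul_naive(a, b)` and `mul_sse(a, b)` on the same two `Mat4` values should give the same product. The sketch says nothing about which one is faster on a given compiler and CPU; that still has to be measured.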

The results showed that:
1) is the fastest. My guess is that not only the instructions but also the algorithm has been optimized.
2) takes 4 to 5 times as long as 1. Compared with 3, this shows that optimizing the algorithm does pay off.
3) takes 10 times as long as 1.
4) is almost the same as 3. I nearly fainted. Why is that? I clearly wrote it in assembly, so how can it not be fast?!
5) is almost the same as 4.

Lessons:
1) A float array such as float f[4] cannot be declared plainly: because of alignment problems, it cannot be used by the SSE assembly code. It should be declared as __declspec(align(16)) float f[4]; or through a union: union SSE4 { __m128 m; float f[4]; };
2) A matrix is declared as: union SSE16 { __m128 m[4]; float f[4][4]; };
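The declarations above can be checked with a small sketch. The unions `SSE4` and `SSE16` follow the text; `AlignedVec4`, `is_aligned16`, `load_checked` and `add4` are hypothetical helpers added for illustration. Because `__declspec(align(16))` is specific to Microsoft's compiler, the sketch uses standard C++11 `alignas(16)` as a stand-in. Reading `f` after writing `m` relies on union type punning, which the C++ standard does not guarantee, although GCC documents it as allowed and other mainstream compilers accept it in practice.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // __m128 and SSE intrinsics (x86 / x86-64 only)

// Unions as declared in the text: the __m128 member forces
// 16-byte alignment on the whole union, including the float view.
union SSE4  { __m128 m;    float f[4];    };
union SSE16 { __m128 m[4]; float f[4][4]; };

static_assert(alignof(SSE4)  == 16, "SSE4 should be 16-byte aligned");
static_assert(alignof(SSE16) == 16, "SSE16 should be 16-byte aligned");
static_assert(sizeof(SSE16)  == 64, "SSE16 should hold 16 floats");

// Portable stand-in for: __declspec(align(16)) float f[4];
struct AlignedVec4 {
    alignas(16) float f[4];
};

// True when the address is a multiple of 16 bytes.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Aligned load with an explicit check: _mm_load_ps may fault on an
// address that is not 16-byte aligned, which is the failure mode a
// plain float[4] can run into.
__m128 load_checked(const float* p) {
    assert(is_aligned16(p) && "pointer must be 16-byte aligned");
    return _mm_load_ps(p);
}

// Adds two 4-float vectors with a single packed addition.
SSE4 add4(const SSE4& a, const SSE4& b) {
    SSE4 r;
    r.m = _mm_add_ps(a.m, b.m);
    return r;
}
```

With these declarations, a local `SSE4`, `SSE16` or `AlignedVec4` can be passed to aligned loads without further care, whereas a plain `float f[4]` on the stack is only guaranteed 4-byte alignment.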

When reposting, please credit the original URL: https://www.9cbs.com/read-93603.html
