Large storage
RSA rely on a large number of operations, currently the mainstream RSA algorithm is built on a large number of large numbers of 512 to 1024. Most compilers can only support the 64-bit integer operation, that is, the integer we used in the calculation must be less than or equal to 64 bits, namely: 0xfffffffffffffffffff, that is, 1844674073709551615, which is far from the need for RSA, so It is necessary to establish a large number of calculative libraries to solve this problem.
The easiest way is to process the large number as an array, that is, the large number of numbers with 0-9 10 numbers, and then simulate people's manual "vertical calculation" process to write its addition and subtraction function. But this is very efficient, because there is more than 1024-bit large numbers, there are more than 10 yuan. For any of the operations, it is necessary to do multiple cycles on the array spaces of hundreds of elements, and there are many Additional spatial storage calculation of the return sign and intermediate results. In addition, for some special operations, it is clear that the calculation process will greatly simplify the calculation process. This large number representation is evolved to binary system. Therefore, in some instances, it simply uses the binary array method to record large. Number, this efficiency is lower.
An effective improvement method is to represent a large number as an N-en-array. For the current 32-bit system, N can take a 32th part of the value of 2, that is, 0x10000000, if a binary is 1024 bits of large numbers It turns into 0x10000000, it becomes 32 bits, while each of the value range is not a binary 0-1 or ten-based 0-9, but 0-0xffffffffffffffffffffff, we just use an unsigned long integer To represent this value. Therefore, the large number of 1024-bit is a unsigned long array with 32 elements, and the cyclic scale required for the unsigned long array is 32 times. And the 0x100000000 enrollment and binary, for the computer, almost one thing, the conversion is very easy.
For example, in large numbers 18446744073709551615, equal to FFFFFFFFFFFFF, is equivalent to decimal 99: There are two bits, each bit is fffffff. 18446744073709551616 is equal to 00000001 0000000000000000, it is equivalent to decimal 100: There are three, the first is 1, the other two is 0, so, wait. In practical applications, the order in which the "digital" array is low in the previous high level, so that large numbers can be easily represented by mathematical expressions: a = sum [i = 0 to n] (A [I] * 0x100000000 ** i) (where SUM is summary, A [i] is used to record the i-group of the array of A, ** means a passenger).
Any integer operation can eventually break down the operation between numbers and numbers, and its "digital" up to 0x100000000, its digital and digital operations, the result will inevitably beyond the word length of the current 32 system. In VC , there is a __int64 type to handle 64-bit integers, so don't worry about this problem, and if there is no 64-bit shaping in other compilation systems, you need to store large numbers. For example, Word Type (16-bit) can be used to represent 0x10000, but the efficiency is more efficient to use 32-bit DWORD types, but the 0x100000000 credit is changed to 0x40000000, so that two numbers are the largest The result is 0x3fffffff * 0x3fffffff, less than 0xfffffffff, but does not simply use the high low position to split the calculation result into two "numbers". addition
Set: a = sum [i = 0 to P] (a [i] * 0x100000000 ** i) b = sum [i = 0 to q] (b [i] * 0x100000000 ** i), p> = q c = SUM [i = 0 to n] (c [i] * 0x100000000 ** i) = a b
Obviously: C [i] is not simply equal to A [i] b [i], because if C [i]> 0xffffffFFFFFF is required, it is possible to generate the input when C [I-1] is calculated, so calculates C. [i] will add the last carry value.
If you use carry [i] to record each time the carry is: C [i] = a [i] b [i] carry [i-1] -carry [i] * 0x10000000000 where carry [-1] = 0 If a [i] b [i] carry [i-1]> 0xfffffff, carry [i] = 1; vice versa Carry [i] = 0 If carry [p] = 0, n = p; N = P 1
Subtract
Set: a = sum [i = 0 to P] (a [i] * 0x100000000 ** i) b = sum [i = 0 to q] (b [i] * 0x100000000 ** i), p> = q c = SUM [i = 0 to n] (c [i] * 0x100000000 ** i) = ab
Obviously: C [i] is not simply equal to A [i] -b [i], because if A [i]
If you use Carry [i] to record each borrow,: c [i] = a [i] carry [i] * 0x100000000-B [i] -carry [i-1] where carry [-1] = 0 If a [i]> b [i], Carry [i] = 0; vice versa, Carry [i] = 1 If c [p] = 0, n = p-1; villain, n = P
multiplication
Set: a = sum [i = 0 to P] (a [i] * 0x100000000 ** i) b = sum [i = 0 to q] (b [i] * 0x100000000 ** i), p> = q c = SUM [i = 0 to n] (C [i] * 0x100000000 ** i) = a * b
Obviously: c = sum [i = 0 to q] (A * B [i] * 0x100000000 ** i) (A * B [i] * 100000000 ** i) = sum [j = 0 to P] (a [j] * b [i] * 0x100000000 ** (i j)) so c = sum [i = 0 to q] (sum [j = 0 to p] (a [j] * b [i] * 0x100000000 ** (i j)))) therefore: C [i] = sum [j = 0 to q] (a [ij] * b [j]) carry [i-1] -carry [i] * 0x100000000 where Carry [-1] = 0 carry [i] = (sum [j = 0 to q] (a [ij] * b [j]) carry [i-1]) / 0x100000000 n = P Q-1, If Carry [n]> 0, n = n 1, c [n] = carry
division
Set: a = sum [i = 0 to P] (a [i] * 0x100000000 ** i) b = sum [i = 0 to q] (b [i] * 0x100000000 ** i), p> = q c = SUM [i = 0 to n] (C [i] * 0x100000000 ** i) = a / b
Since B is unable to "trial", we can only convert to b [q] to get an approximation of A [P] to get an approval, so we cannot directly calculate C. However, we can approach C step by step.
Obviously: (a [p] / b [q] -1) * 0x100000000 ** (P-Q) Let: x = 0 Repeat: a = a-x * b, x = x (a [p] / b [q] -1) * 0x100000000 ** (P-Q) until A
Then: x = c Note: Since the large number can be understood as 0x100000000, it is equivalent to the left to move the EE in the array of A, which is not necessary to calculate the elements of each element in the array of A. Right movement Mold Set: a = sum [i = 0 to P] (a [i] * 0x100000000 ** i) b = sum [i = 0 to q] (b [i] * 0x100000000 ** i), p> = q c = SUM [i = 0 to n] (C [i] * 0x100000000 ** i) = a% B The mode of demo is consistent with the process of the business, but it is simple to be more simple because you don't need to record. Repeat: a = A- (a [p] / b [q] -1) * 0x100000000 ** (P-Q) * B until A
There is: a = c Binary one equation In the RSA algorithm, it is often necessary to obtain B, such that (a * b)% m = 1 is made. That is, it is equivalent to solving B, N is unknown binary one, unclear equation A * b-m * n = 1, minimum integer solution. For the minimum integer solution of the united equation, AX-by = 1, the ancient and modern China and foreign countries have conducted a detailed study, and the West has a famous Ou Sudee algorithm. The European ride algorithm is a recursive algorithm, it is easier to understand: For example: 11x-49Y = 1, ask X (a) 11 x - 49 y = 1 49% 11 = 5 -> (b) 11 x - 5 y = 1 11% 5 = 1 -> (c) x - 5 Y = 1 Let Y = 0 generation (c) Get x = 1 order x = 1 generation (b) Get Y = 2 Let Y = 2 Generation (a) Get x = 9 Simplelation can use the recursive algorithm to find any AX- By = 1 (A, B Multi-Mutual) Solution, in fact, by analyzing the conversion of recursive algorithm into non-intensive algorithm, becoming a large derivation. Power mode operation The power mold is the RSA core algorithm, which most directly determines the performance of the RSA algorithm, and for the fast power mode operation, many Western modern mathemologists have proposed a lot of solutions. Usually, the power mode is simultaneously operated as a model operation. For example, D = C ** 15% N, due to: a * b% n = (a% N) * (b% N)% N So: C1 = C * C% N = C ** 2% N C2 = C1 * C% N = C ** 3% N C3 = C2 * C2% n = C ** 6% N C4 = C3 * C% N = c ** 7% n C5 = C4 * C4% n = c ** 14% n C6 = C5 * C% n = c ** 15% N That is: the power mode calculation for E = 15 can be decomposed into 6 multi-mode operations The above method can be found to calculate the following algorithm to calculate D = C ** E% N: D = 1 while e> = 0 if e q = D * D * C% N: D * D * C% N: E = E-1 IF E is an even number D = D * D% N E = E / 2 RETURN D Then analyze, to know when D needs to take C, do not need to repeat the E-enhancement or second operation, only need to verify that E's binary is 0 or 1, and from left to right validation And from right to left verify, but it is easier to verify from left to right: If e = sum [i = 0 to n] (E [i] * 2 ** i), 0 <= e [i] <= 1 d = 1 for i = n to 0 d = D * D% N if E [i] = 1 d = D * C% N return d The rest of the problem is to multiply the model, for a * b% n, if A, b is a large number of 1024 bits, first calculate A * B, then% n, will generate the intermediate result of 2048 bits, if not With dynamic memory allocation technology, you must double the number of spaces in the large number of definitions, which causes a lot of waste because there is no double space in most cases, and dynamic memory allocation technology It will make large number storage to lose continuity and make cycle operation in the operation process very cumbersome. So the primary principle of calculation is to avoid calculating A * B. Due to: a * b = a * (SUM [i = 0 to n] (B [i] * 0x100000000 ** i)),: a * b% n = (sum [i = 0 to n] ((A * B [i]) * 0x1000000 ** i)))% N You can use a cycle: c = 0; for i = 0 to n c = c a * b [i] * 0x100000000% N Return C In fact, there is a Montgomery algorithm to complete multiple cycles of multiplied models, but its principle involves more knowledge of numbers, and it is more troublesome, which is more troublesome. Although the speed is improved, the test will not exceed one quantitude Therefore, it is not considered. Vulnese test The analysis of the Mumerian Test Method using Gemadomati, the fastest algorithm is the Robin Miller test algorithm, the process is as follows: (1) Calculate odd number M so that n = (2 ** r) * m 1 (2) Select the random number a If n is tested once, N is not probability of 25%. If n is tested by T, then n is not the probability of 1/4 ** t. When T is 5, the probability of N is not a probability of 1/128, n is probably greater than 99.99%. In practical applications, 300-500 small pins can be tested to improve the probability of passing the laminmler test, thereby increasing test speed. When the number of randoms is generated, the selected random number is best allowed to save R = 0, and the test of step (3) can be omitted to further improve the test speed. input Output The large number of input output is done by a string, in fact, it is easy to implement, for example, in accordance with the decimal format, then: Enter: x = 0 for i = 0 to n x = x * 10 x = x (int) (STR [n] -48) Return X Output: str = while (x> 0) STR = (char) (x% 10-48) Str Return STR