1. The problem
You are not fully awake one morning when the phone wakes you: a middle school classmate wants to ask you an Excel question. As a so-called Excel expert, you are often pestered like this. The problem is roughly as follows: there is a big Excel file in which some rows are duplicated, that is, two rows are exactly the same, while other rows are not duplicated, and the task is to pick out all the non-duplicated (or all the duplicated) rows. You do not quite follow at first. You think it over: use VLOOKUP to find them, then sort. It should work, but you need to try it before telling him how to do it, so you tell him to call you again in 20 minutes.
2. Problem solving ideas
You open Excel and enter some test data, roughly like this:
Among them, "Zhang San" and "Li Si" each appear twice and the rest appear only once; you need to separate the two groups. First you enter 1 in column B and fill it down, then in column C you enter "VLOOKUP(A1, $A$1:$B$7, 2, FALSE)" [1], which returns 1 if the value is found and no value if it is not. Every cell in column C comes out as 1, because each row always finds itself; the lookup range has to exclude the current row.
You then try several other functions, MATCH and INDEX, without success, so you start writing an IF function. You enter it in row 4 and leave the parameters and reference ranges to be dealt with later; perhaps Excel is smart enough to fill in the reference ranges you need.
You entered the following IF function:
IF(OR(VLOOKUP(A4, A1:B3, 2, FALSE), VLOOKUP(A4, A5:B7, 2, FALSE)), 1, 0)
It's complicated; Excel ought to open a small window and let you write this logic as code. The IF function can be nested 7 levels deep, and you really do not know what Microsoft's engineers were thinking [2]. Muttering, you press Enter. The result is "#N/A", meaning "value not available". You know that when VLOOKUP does not find the required value it returns the error value #N/A, and once an expression contains that error, the result is #N/A no matter what else is calculated.
From the Tools menu you use "Error Checking" and "Show Calculation Steps", which confirm your guess: the #N/A returned by the second VLOOKUP is passed all the way through to the end.
At this point your classmate calls back. You tell him you are going to write a small program; you have decided to solve the problem with plain, simple VBA.
3. VBA program
You open the VBA editor, insert a module, and without much thought write the following code:
Sub selectiondouble()
    Dim i As Long, j As Long
    For i = 1 To 7 Step 1
        For j = 1 To 7 Step 1
            ' Do not compare a row with itself
            If i <> j Then
                If Range("a" & i).Value = Range("a" & j).Value Then
                    Range("e" & i).Value = 1
                End If
            End If
        Next j
    Next i
End Sub
You click Run and it works nicely: duplicated rows are marked 1, non-duplicated rows are left empty, and then you sort. Quite satisfied, you even add a comment. You dial your classmate's number, then he calls you back, and you give him the program and tell him what to change. God knows what language they taught at his school; in any case it was not BASIC, so you have to explain what Dim means. After some struggle he finally gets the code typed in over the phone. As a telecom employee he can chat on the phone 24 hours a day; pity about the mobile phone bill, you sigh. Time to wash your face.
4. Efficiency
Face washed and teeth brushed, you make a cup of coffee and return to the computer. You were expecting the good news that the job was done; instead you hear that it has hung. After freezing for 0.1 seconds, you figure the program is either still running or stuck in an infinite loop. You ask him how much data there is and learn that there are more than 9,000 records. OK, you think.
You check the code: there is no infinite loop. Maybe he made a mistake typing it in. You change the loops to run from 1 to 1000, pick up the cup, take a sip of coffee, lean back and wait for the result. A few minutes later it still has not finished. Feeling that something is odd, you hit Ctrl+Break to pause the program and hover the mouse over the variable i: it is still only 24. Damn. You realize that it is the Range function that is too slow. Never mind; you call your classmate and tell him it will probably take a few hours to finish. You drink some more coffee and tell yourself that it still beats screening by hand, after all.
Yet this is fewer than 10,000 records, and a built-in Excel function such as VLOOKUP handles that much without trouble.
4.1. Using arrays
Arrays are faster than the Range function. You change the program to define two arrays: first read all the data into the first array, then work only on the arrays. For each duplicated row, the corresponding element of the second array is set to 1, and once the calculation is complete the results are written back to the sheet from the second array. The program code is as follows:
Sub selectiondouble2()
    Dim i As Long, j As Long
    Dim max As Long
    Dim a() As String, b() As Long
    max = 10000
    ReDim a(max)
    ReDim b(max)
    ' Read all the data into the first array
    For i = 1 To max Step 1
        a(i) = Range("a" & i).Value
    Next i
    For i = 1 To max Step 1
        For j = 1 To max Step 1
            ' Do not compare a row with itself
            If i <> j Then
                If a(i) = a(j) Then
                    b(i) = 1
                End If
            End If
        Next j
    Next i
    ' Write the results back from the second array
    For i = 1 To max Step 1
        Range("f" & i).Value = b(i)
    Next i
End Sub
You run it: for 10,000 records it takes less than 5 minutes. You are very satisfied; the efficiency has improved by several orders of magnitude. You also did not forget to introduce a max variable, so that the code is easy to change when it is reused.
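A common variation on the array idea, which is not what the program above does, is to move the data between the sheet and memory in a single assignment each way instead of cell by cell. The sketch below is only an illustration under assumptions: the data sits in A1:A10000 of the active sheet, the flags go to column F, no cell holds an error value, and the name selectiondouble2b is made up here.

```vba
Sub selectiondouble2b()
    Const MAX_ROWS As Long = 10000   ' assumed number of records
    Dim data As Variant, flags() As Variant
    Dim i As Long, j As Long

    ' One read: data becomes a 2-D Variant array addressed as data(row, 1)
    data = Range("A1:A" & MAX_ROWS).Value
    ReDim flags(1 To MAX_ROWS, 1 To 1)

    For i = 1 To MAX_ROWS
        For j = 1 To MAX_ROWS
            ' Do not compare a row with itself
            If i <> j Then
                If data(i, 1) = data(j, 1) Then
                    flags(i, 1) = 1
                    Exit For   ' one match is enough to flag row i
                End If
            End If
        Next j
    Next i

    ' One write: the whole flag column goes back at once;
    ' unflagged elements are Empty, so those cells stay blank
    Range("F1:F" & MAX_ROWS).Value = flags
End Sub
```

The comparison is still n * n; the sketch only removes the per-cell traffic with the worksheet at both ends.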
4.2. Using the built-in function
You remember the VLOOKUP function again; it really does haunt you. Why is VLOOKUP so fast? Because it is compiled, of course, not written in VBA [3]. An idea strikes you: why not use this function? In VBA you can call Excel's built-in functions as Application.FunctionName. The changed code is as follows:
Sub selectiondouble3()
    Dim i As Long, j As Long, a, b
    For i = 2 To 9999 Step 1
        ' Look for the value of row i in the rows above it and in the rows below it
        a = Application.VLookup(Range("A" & i), Range("A1:B" & (i - 1)), 2, False)
        b = Application.VLookup(Range("A" & i), Range("A" & (i + 1) & ":B10000"), 2, False)
        ' Both lookups failed, so row i has no duplicate
        If IsError(a) And IsError(b) Then
            Range("g" & i).Value = 0
        End If
    Next i
End Sub
The code is short, but a bit convoluted and annoying. The loop runs from 2 to 9999 because the first and last rows have to be handled manually, to keep the lookup range of the VLOOKUP function from going out of bounds. The IsError function checks the return values: if both lookups return an error, the row is unique, with no duplicate, and is flagged 0. The execution speed is about the same as the previous version; at least you do not feel any difference.
4.3. Continuing to hack
Still, you are not satisfied: with arrays, a large data set puts pressure on memory, and with the VLOOKUP function the code feels ugly [4]. For some reason you think of sorting: if you are going to search, you should sort first. So you sort the data in Excel. The problem with the earlier versions is the double loop, with complexity n * n. Once the data is sorted, you only need to check whether the current value is the same as the next one. If it is, mark both the current and the next position and add 2 to the loop variable, skipping the next row; if not, add 1 to the loop variable and continue comparing. The code is as follows:
Sub selectiondouble4()
    Dim i As Long, max As Long
    max = 10000
    i = 1
    Do
        ' The data is sorted, so a duplicate can only be the next row
        If Range("a" & i).Value = Range("a" & (i + 1)).Value Then
            Range("i" & i).Value = 1
            Range("i" & (i + 1)).Value = 1
            i = i + 2
        Else
            i = i + 1
        End If
    Loop While i < max
End Sub
This program has a complexity of only n, so it is of course the fastest of all the programs you wrote today, and it also uses the least memory. You feel very satisfied and give a sly grin.
5. Summary
You open your log and start writing down today's solutions. You think: hmm, if you had only tried to replace the Range function, the speed would not have improved fundamentally. The speed-up has two sources. First, sorting is the key: fast lookup and search are all based on ordered content, binary search for example. It is also why databases build indexes, which speed up lookups a great deal; the principle is the same. Second, there is no backtracking during the search; you skip straight ahead. This resembles a string matching algorithm, the KMP algorithm [5] it seems; the idea is the same. Hmm, and if identical content appears not twice but many times, you can use a loop to follow the run, and mark runs of different sizes with different numbers. You suddenly feel confident, and seem to have forgotten the fact that you have been unemployed for half a year.
[1] VLOOKUP finds the specified value in the first column of a table or array and returns the value from the specified column in the same row. When the comparison values are in the first column of the data table, use VLOOKUP rather than HLOOKUP. See Excel Help for details of its usage.
[2] As a programmer, you have always thought that nested IF functions are a waste of time; who can make sense of IF functions nested 7 deep? But a formula is simple to express, and you do not want to tell him that you would rather write a program to solve the problem.
[3] Heaven knows what Microsoft wrote this code in: maybe C, maybe C++, definitely not BASIC, and not C# either; when it was written, C# had not been born yet.
[4] Maybe you just did not write it well.
[5] Although you were not formally trained in the field, you did study data structures and algorithms.
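The generalization mentioned in the summary, where a value may repeat more than twice, could look roughly like the sketch below. It is an illustration, not one of the programs written that morning. It assumes column A is already sorted and holds 10,000 records, it writes the size of each run to column J (a column chosen here only for the example), and the name selectiondouble5 is made up.

```vba
Sub selectiondouble5()
    Dim i As Long, j As Long, k As Long, max As Long
    max = 10000          ' assumed number of sorted records in column A
    i = 1
    Do While i <= max
        ' Advance j to the last row of the run of equal values starting at i
        j = i
        Do While j < max
            If Range("a" & (j + 1)).Value <> Range("a" & i).Value Then Exit Do
            j = j + 1
        Loop
        ' A run longer than one row is a group of duplicates:
        ' mark every row in it with the size of the run
        If j > i Then
            For k = i To j
                Range("j" & k).Value = j - i + 1
            Next k
        End If
        i = j + 1        ' continue after the run, never going back
    Loop
End Sub
```

Each row is visited a constant number of times, so the pass stays linear in n; rows that appear only once are left blank, as in the earlier versions.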