To analyze a piece of source code, an effective approach is:
1. Read the description of the source code, such as the README. In this example the author's README is very detailed; after reading it, you can often find the matching explanation in the README while reading the program, which simplifies the work of reading the source.
2. If the source code has a documentation directory, usually doc or docs, read it carefully before reading the source, because these documents explain the program well.
3. Start from the Makefile and analyze the hierarchy of the source code: find which file is the main program and which files are function packages. This is a great help in grasping the program structure quickly.
4. Start from the main function and read down step by step. Simple functions whose purpose you can guess may be skipped. But pay close attention to the global variables used in the program (if it is a C program); you can copy the descriptions of the key data structures into a text editor so that you can look them up at any time.
5. When analyzing a function package (for C programs), note which functions are global and which are internal, and pay attention to the extern keyword. Do the same for variables. Analyze the internal functions first and the external functions afterwards, because an internal function is necessarily called from an external function.
6. Do not underestimate the importance of data structures. In a C program every function operates on some data, and because the data is not well encapsulated it may appear anywhere in the program and be modified by any function. So pay attention to the definition and meaning of this data, and also to which functions operate on it and what changes they make.
7. While reading the program, it is best to put the source code under version control. You can then make experimental modifications when needed, because changing the code yourself is a better way to understand a program than just reading it. After modifying and running the program, you can check out the original code from CVS and compare it with your changes (the diff command). This shows you some strengths and weaknesses of the source code and lets you practice your own programming skills.
8. While reading the program, make use of small tools that speed up the work, such as the search and pattern-matching features of vi, tags, and grep and find, the two most powerful text-search tools.
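To make points 5 and 6 concrete, here is a minimal sketch of what a small C "function package" might look like. Everything in it (the `stats` global, `is_blank`, `count_record`) is a hypothetical name chosen for illustration and is not taken from Webalizer.

```c
#include <ctype.h>
#include <stddef.h>

/* Global data (hypothetical): any function in the program may touch it.
 * Other source files would see it through `extern struct counters stats;`. */
struct counters {
    unsigned long total;   /* records seen      */
    unsigned long bad;     /* records rejected  */
};
struct counters stats;

/* Internal function: `static` makes it visible only inside this file,
 * so it can only be reached through the external functions below. */
static int is_blank(const char *s)
{
    for (; *s != '\0'; s++)
        if (!isspace((unsigned char)*s))
            return 0;
    return 1;
}

/* External function: callable from other files. Returns 1 for a usable
 * record and 0 for a bad one, updating the global counters either way. */
int count_record(const char *line)
{
    stats.total++;
    if (line == NULL || is_blank(line)) {
        stats.bad++;
        return 0;
    }
    return 1;
}
```

When reading such a file, `is_blank` can be understood first and in isolation, while `stats` is the piece of data worth noting down, since every function that changes it matters.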
A program that runs from the command line under UNIX/Linux usually follows a common pattern, which you can use as a reference when reading it.
1. At the beginning of the program, the command line is usually parsed. Based on the command-line arguments, some variables, arrays, or structures are assigned values, and the later parts of the program behave differently depending on these variables.
2. After the command line has been parsed, the data is prepared; this usually means zeroing counters and clearing structures.
3. There are some conditional-compilation options in the middle of the program; where they are set can be found in the Makefile.
4. Note how the program handles logging and what it does when the debug option is turned on; both are a great help when debugging.
5. Pay attention to how data is handled by multiple threads. (This does not arise in this example.)
Here is an actual example:

    total_rec++;
    if (strlen(buffer) == (BUFSIZE-1))
    {
       if (verbose)
       {
          fprintf(stderr, "%s", msg_big_rec);
          if (debug_mode) fprintf(stderr, ":\n%s", buffer);
          else fprintf(stderr, "\n");
       }
       total_bad++;                        /* bump bad record counter      */

       /* get the rest of the record */
       while ( (gz_log) ? (gzgets(gzlog_fp, buffer, BUFSIZE) != Z_NULL) :
               (fgets(buffer, BUFSIZE, log_fname ? log_fp : stdin) != NULL))
       {
          if (strlen(buffer) ...
    ...
    /* got a record... */
    strcpy(tmp_buf, buffer);               /* save buffer in case of error */
    if (parse_record(buffer))              /* parse the record             */

The record is copied into a backup buffer and then passed to parse_record() for processing. We can guess that this is one of the program's main processing steps, the one that analyzes the log data. The function is defined in parser.c:

    /*********************************************/
    /* PARSE_RECORD - uhhh, you know...          */
    /*********************************************/
    int parse_record(char *buffer)
    {
       /* clear out structure */
       memset(&log_rec, 0, sizeof(struct log_struct));
    /*
       log_rec.hostname[0]=0;
       log_rec.datetime[0]=0;
       log_rec.url[0]=0;
       log_rec.resp_code=0;
       log_rec.xfer_size=0;
       log_rec.refer[0]=0;
       log_rec.agent[0]=0;
       log_rec.srchstr[0]=0;
       log_rec.ident[0]=0;
    */
    #ifdef USE_DNS
       memset(&log_rec.addr, 0, sizeof(struct in_addr));
    #endif

       /* call appropriate handler */
       switch (log_type)
       {
          default:
          case LOG_CLF:   return parse_record_web(buffer);   break; /* clf   */
          case LOG_FTP:   return parse_record_ftp(buffer);   break; /* ftp   */
          case LOG_SQUID: return parse_record_squid(buffer); break; /* squid */
       }
    }

You can see that log_rec is a global variable, and that the function calls one of three different parsing functions depending on the type of log file. The definition of this variable is in webalizer.h. The structure definition shows that it holds all the information a log file may contain (refer to the format descriptions of the CLF, FTP, and Squid log files).
    /* log record structure */
    struct log_struct {
       char   hostname[MAXHOST];           /* hostname             */
       char   datetime[29];                /* raw timestamp        */
       char   url[MAXURL];                 /* raw request field    */
       int    resp_code;                   /* response code        */
       u_long xfer_size;                   /* xfer size in bytes   */
    #ifdef USE_DNS
       struct in_addr addr;                /* IP address structure */
    #endif  /* USE_DNS */
       char   refer[MAXREF];               /* referrer             */
       char   agent[MAXAGENT];             /* user agent (browser) */
       char   srchstr[MAXSRCH];            /* search string        */
       char   ident[MAXIDENT];             /* ident string (user)  */
    };

    extern struct log_struct log_rec;

Let's first look at an internal function used in parser.c, and then take parse_record_web as an example of how the parsing functions work. parse_record_ftp and parse_record_squid are left to the reader as an exercise.

    /*********************************************/
    /* FMT_LOGREC - terminate log fields w/zeros */
    /*********************************************/
    void fmt_logrec(char *buffer)
    {
       char *cp = buffer;
       int  q = 0, b = 0, p = 0;

       while (*cp != '\0')
       {
          /* break record up, terminate fields with '\0' */
          switch (*cp)
          {
             case ' ': if (b || q || p) break; *cp = '\0'; break;
             case '"': q ^= 1; break;
             case '[': if (q) break; b++; break;
             case ']': if (q) break; if (b > 0) b--; break;
             case '(': if (q) break; p++; break;
             case ')': if (q) break; if (p > 0) p--; break;
          }
          cp++;
       }
    }

Judging from the header file parser.h, this is an internal function. It replaces the space characters in a line with '\0' (the string terminator), while leaving alone any spaces inside double quotes, square brackets, or parentheses, so that a field containing spaces is not split by mistake.
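To see the effect of this kind of splitting on a real line, the following self-contained sketch applies the same quote, bracket, and parenthesis rules and also collects pointers to the resulting fields. The function name `split_log_fields` and its interface are hypothetical; Webalizer's own fmt_logrec only writes the terminators and leaves the walking of fields to its callers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place by overwriting every
 * top-level space with '\0'. Spaces inside "...", [...] or (...) are
 * kept. Pointers to the fields are stored in `fields` (at most
 * `max_fields` of them) and the number stored is returned.
 * Consecutive spaces produce empty fields. */
size_t split_log_fields(char *line, char **fields, size_t max_fields)
{
    int in_quote = 0, brackets = 0, parens = 0;
    size_t count = 0;
    char *start = line;
    char *cp;

    for (cp = line; *cp != '\0'; cp++) {
        switch (*cp) {
        case '"': in_quote ^= 1; break;
        case '[': if (!in_quote) brackets++; break;
        case ']': if (!in_quote && brackets > 0) brackets--; break;
        case '(': if (!in_quote) parens++; break;
        case ')': if (!in_quote && parens > 0) parens--; break;
        case ' ':
            if (in_quote || brackets || parens) break;
            *cp = '\0';                      /* terminate current field */
            if (count < max_fields) fields[count++] = start;
            start = cp + 1;                  /* next field starts here  */
            break;
        }
    }
    /* the last field has no trailing space */
    if (*start != '\0' && count < max_fields) fields[count++] = start;
    return count;
}
```

Applied to a line such as `host - - [01/Jan/2000:00:00:00 +0000] "GET / HTTP/1.0" 200 123`, the timestamp in square brackets and the quoted request each stay in one piece even though they contain spaces.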
(Refer to the format of the web log file to understand fmt_logrec more clearly.)

    int parse_record_web(char *buffer)
    {
       int size;
       char *cp1, *cp2, *cpx, *eob, *eos;

       size = strlen(buffer);              /* get length of buffer        */
       eob = buffer+size;                  /* calculate end of buffer     */
       fmt_logrec(buffer);                 /* separate fields with \0's   */

       /* hostname */
       cp1 = cpx = buffer; cp2 = log_rec.hostname;
       eos = (cp1+MAXHOST)-1;
       if (eos >= eob) eos = eob-1;

       while ((*cp1 != '\0') && (cp1 != eos)) *cp2++ = *cp1++;
       *cp2 = '\0';
       if (*cp1 != '\0')
       {
          if (verbose)
          {
             fprintf(stderr, "%s", msg_big_host);
             if (debug_mode) fprintf(stderr, ": %s\n", cpx);
             else fprintf(stderr, "\n");
          }
          while (*cp1 != '\0') cp1++;
       }
       if (cp1 ...
       ...
       if (eos >= eob) eos = eob-1;

       while ((*cp1 != '\0') && (cp1 != eos)) *cp2++ = *cp1++;
       *cp2 = '\0';
       if (*cp1 != '\0')
       {
          if (verbose)
          {
             fprintf(stderr, "%s", msg_big_date);
             if (debug_mode) fprintf(stderr, ": %s\n", cpx);
             else fprintf(stderr, "\n");
          }
          while (*cp1 != '\0') cp1++;
       }
       if (cp1 ...
       ...
       eos = (cp1+MAXREF-1);
       if (eos >= eob) eos = eob-1;

       while ((*cp1 != '\0') && (*cp1 != '\n') && (cp1 != eos)) *cp2++ = *cp1++;
       *cp2 = '\0';
       if (*cp1 != '\0')
       {
          if (verbose)
          {
             fprintf(stderr, "%s", msg_big_ref);
             if (debug_mode) fprintf(stderr, ": %s\n", cpx);
             else fprintf(stderr, "\n");
          }
          while (*cp1 != '\0') cp1++;
       }
       if (cp1 ...
       ...

    /* convert month name to lowercase */
    for (i=4; i<7; i++)
       log_rec.datetime[i] = tolower(log_rec.datetime[i]);

    /* get year/month/day/hour/min/sec values */
    for (i=0; i<12; i++)
    {
       if (strncmp(log_month[i], &log_rec.datetime[4], 3) == 0)
          { rec_month = i+1; break; }
    }

    rec_year = atoi(&log_rec.datetime[8]);     /* get year number (int)   */
    rec_day  = atoi(&log_rec.datetime[1]);     /* get day number          */
    rec_hour = atoi(&log_rec.datetime[13]);    /* get hour number         */
    rec_min  = atoi(&log_rec.datetime[16]);    /* get minute number       */
    rec_sec  = atoi(&log_rec.datetime[19]);    /* get second number       */
    ...
After parse_record has parsed the record, the date is analyzed: the month name and the other date fields in log_rec are converted into numeric values that the program can work with.

    if ((i>=12) || (rec_min>59) || (rec_sec>59) || (rec_year<1990))
    {
       total_bad++;                        /* if a bad date, bump counter */
       if (verbose)
       {
          fprintf(stderr, "%s: %s [%lu]",
                  msg_bad_date, log_rec.datetime, total_rec);
          ...

If the date or time is invalid, the total_bad counter is incremented by 1 and an error message is printed to standard error.

    good_rec = 1;

    /* get current records timestamp (seconds since epoch) */
    req_tstamp = cur_tstamp;
    rec_tstamp = ((jdate(rec_day, rec_month, rec_year)-epoch)*86400) +
                 (rec_hour*3600) + (rec_min*60) + rec_sec;

    /* Do we need to check for duplicate records? (incremental mode) */
    if (check_dup)
    {
       /* check if less than/equal to last record processed */
       if (rec_tstamp <= cur_tstamp)
       {
          /* if it is, assume we have already processed and ignore it */
          total_ignore++;
          continue;
       }
       else
       {
          /* if it is not.. disable any more checks this run */
          check_dup = 0;
          /* now check if it's a new month */
          if (cur_month != rec_month)
          {
             clear_month();
             cur_sec    = rec_sec;         /* set current counters     */
             cur_min    = rec_min;
             cur_hour   = rec_hour;
             cur_day    = rec_day;
             cur_month  = rec_month;
             cur_year   = rec_year;
             cur_tstamp = rec_tstamp;
             f_day = l_day = rec_day;      /* reset first and last day */
          }
       }
    }

    /* check for out of sequence records */
    if (rec_tstamp/3600 ...

If the date and time are valid, the record is a good one: good_rec is set to 1, the timestamp is computed, and the record is checked for duplication. The function jdate() used here was already encountered at the beginning of the main program. It was not studied closely then, and it is again left to the reader as an exercise.
(Tip: this function generates a number from a date. The number is unique to that date, so it can be used to check for repeated times. It is a general-purpose function that you can reuse in other programs.)

    /*********************************************/
    /* DO SOME PRE-PROCESS FORMATTING            */
    /*********************************************/

    /* fix URL field */
    cp1 = cp2 = log_rec.url;
    /* handle null '-' case here... */
    if (*++cp1 == '-') { *cp2++ = '-'; *cp2 = '\0'; }
    else
    {
       /* strip actual URL out of request */
       while ((*cp1 != ' ') && (*cp1 != '\0')) cp1++;
       if (*cp1 != '\0')
       {
          /* scan to begin of actual URL field */
          while ((*cp1 == ' ') && (*cp1 != '\0')) cp1++;
          /* remove duplicate / if needed */
          if ((*cp1 == '/') && (*(cp1+1) == '/')) cp1++;
          while ((*cp1 != ' ') && (*cp1 != '"') && (*cp1 != '\0'))
             *cp2++ = *cp1++;
          *cp2 = '\0';
       }
    }

    /* un-escape URL */
    unescape(log_rec.url);

    /* check for service (ie: http://) and lowercase if found */
    if ((cp2 = strstr(log_rec.url, "://")) != NULL)
    {
       cp1 = log_rec.url;
       while (cp1 != cp2)
       {
          if ((*cp1 >= 'A') && (*cp1 <= 'Z')) *cp1 += 'a'-'A';
          cp1++;
       }
    }

    /* strip query portion of cgi scripts */
    cp1 = log_rec.url;
    while (*cp1 != '\0')
       if (!isurlchar(*cp1)) { *cp1 = '\0'; break; }
       else cp1++;
    if (log_rec.url[0] == '\0')
       { log_rec.url[0] = '/'; log_rec.url[1] = '\0'; }

    /* strip off index.html (or any aliases) */
    lptr = index_alias;
    while (lptr != NULL)
    {
       if ((cp1 = strstr(log_rec.url, lptr->string)) != NULL)
       {
          if ((cp1 == log_rec.url) || (*(cp1-1) == '/'))
          {
             *cp1 = '\0';
             if (log_rec.url[0] == '\0')
                { log_rec.url[0] = '/'; log_rec.url[1] = '\0'; }
             break;
          }
       }
       lptr = lptr->next;
    }

    /* unescape referrer */
    unescape(log_rec.refer);
    ......

This section does character processing on the URL string, and it is very long. I personally think this code should be moved into a function to make the program more modular. As it stands, the body of the main program is too long, which makes it less cohesive, less portable, and poorly structured.
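As one possible illustration of that suggestion, here is a minimal sketch of how two of the steps above might be pulled out of the main loop into small functions. The names `lowercase_scheme`, `strip_query`, and `normalize_url` are hypothetical and not part of Webalizer, and `strip_query` is a simplification: it cuts the URL at the first `?` instead of using the program's own isurlchar() test.

```c
#include <ctype.h>
#include <string.h>

/* Lowercase the service part (e.g. "HTTP" in "HTTP://...") in place. */
static void lowercase_scheme(char *url)
{
    char *end = strstr(url, "://");
    char *cp;

    if (end == NULL)
        return;                       /* no service prefix present */
    for (cp = url; cp != end; cp++)
        *cp = (char)tolower((unsigned char)*cp);
}

/* Cut off the query portion; an empty result becomes "/".
 * Assumes the buffer behind `url` holds at least two bytes. */
static void strip_query(char *url)
{
    char *q = strchr(url, '?');

    if (q != NULL)
        *q = '\0';
    if (url[0] == '\0') {
        url[0] = '/';
        url[1] = '\0';
    }
}

/* One call in the main loop would then replace the inline code. */
void normalize_url(char *url)
{
    lowercase_scheme(url);
    strip_query(url);
}
```

With helpers like these, the main loop would contain a single call per record, and each step could be read and tested on its own.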
Skipping over this tedious code, we move on to the next part: post-processing.

    if (gz_log) gzclose(gzlog_fp);
    else if (log_fname) fclose(log_fp);

    if (good_rec)                              /* were any good records?   */
    {
       tm_site[cur_day-1] = dt_site;           /* if yes, clean up a bit   */
       tm_visit[cur_day-1] = tot_visit(sd_htab);
       t_visit = tot_visit(sm_htab);
       if (ht_hit > mh_hit) mh_hit = ht_hit;

       if (total_rec > (total_ignore+total_bad))  /* did we process any?   */
       {
          if (incremental)
          {
             if (save_state())                 /* incremental stuff        */
             {
                /* Error: Unable to save current run data */
                if (verbose) fprintf(stderr, "%s", msg_data_err);
                unlink(state_fname);
             }
          }
          month_update_exit(rec_tstamp);       /* calculate exit pages     */
          write_month_html();                  /* write monthly HTML file  */
          write_main_index();                  /* write main HTML file     */
          put_history();                       /* write history            */
       }

       end_time = times(&mytms);               /* display timing totals?   */
       if (time_me ...)
       {
          printf("%lu %s", total_rec, msg_records);
          if (total_ignore)
          {
             printf("(%lu %s", total_ignore, msg_ignored);
             if (total_bad) printf(", %lu %s)", total_bad, msg_bad);
             else printf(")");
          }
          else if (total_bad) printf("(%lu %s)", total_bad, msg_bad);

          /* get processing time (end-start) */
          temp_time = (float)(end_time-start_time)/CLK_TCK;
          printf("%s %.2f %s", msg_in, temp_time, msg_seconds);

          /* calculate records per second */
          if (temp_time)
             i = ((int)((float)total_rec/temp_time));
          else i = 0;

          if ((i > 0) && (i <= total_rec)) printf(", %d/sec\n", i);
          else printf("\n");
       }

This section does some post-processing. I will skip the rest of the program in this article and leave it to interested readers to analyze, for two reasons:
1. The earlier part of this program is well structured, while the later part is somewhat chaotic. The code is fairly efficient but not very reusable, and space does not allow me to explain it piece by piece.
2. In the analysis so far we have already made some predictions and guesses about the later code and touched on parts of it, so readers can analyze it themselves using the principles described above, as practice.

Finally, I close this article with the source-analysis methods it has described.

Conclusion: Of course, this article does not cover every method and technique for reading source code. It uses no auxiliary tools (other than a simple text editor) and does not address how to read object-oriented programs. I would like to leave these for the future, and I welcome discussion of these topics.