Chapter 1 Preface
Chapter 2 Introduction
Chapter 3 Reading Input Files
Chapter 4 Printing
Chapter 5 Patterns
Chapter 6 Expressions as Action Statements
Chapter 7 Control Statements in Actions
Chapter 8 Built-in Functions
Chapter 9 User-defined Functions
Chapter 10 Examples
Chapter 11 Conclusion

=======================================

Chapter 1 Preface

AWK is a programming language with powerful facilities for processing data. Tasks such as modifying, comparing, and extracting text in files can be done in AWK with very short programs. Doing the same work in a language such as C or Pascal would be inconvenient and time-consuming, and the resulting programs would be much larger.

AWK can split its input according to a format defined by the user, and it can print output in a format defined by the user.

The name AWK is formed from the first letters of its original designers' last names: Alfred V. Aho, Peter J. Weinberger, Brian W. Kernighan. AWK was originally completed in 1977. A new version of AWK was released in 1985, with many more features than the old version.

Gawk is the awk produced by GNU. Gawk was first completed in 1986 and has been continually improved and updated since then. Gawk contains all the features of awk.

The explanations of gawk that follow use the two input files below.
File 'bbs-list':

    aardvark 555-5553 1200/300 B
    alpo-net 555-3412 2400/1200/300 A
    barfly 555-7685 1200/300 A
    bites 555-1675 2400/1200/300 A
    camelot 555-0542 300 C
    core 555-2912 1200/300 C
    fooey 555-1234 2400/1200/300 B
    foot 555-6699 1200/300 B
    macfoo 555-6480 1200/300 A
    sdace 555-3430 2400/1200/300 A
    sabafoo 555-2127 1200/300 C

File 'shipped':

    Jan 13 25 15 115
    Feb 15 32 24 226
    Mar 15 24 34 228
    Apr 31 52 63 420
    May 16 34 29 208
    Jun 31 42 75 492
    Jul 24 34 67 436
    Aug 15 34 47 316
    Sep 13 55 37 277
    Oct 29 54 68 525
    Nov 20 87 82 577
    Dec 17 35 61 401
    Jan 21 36 64 620
    Feb 26 58 80 652
    Mar 24 75 70 495
    Apr 21 70 74 514

Chapter 2 Introduction

Gawk's main job is to search each line of a file for specified patterns. When a line matches a specified pattern, gawk performs the specified actions on that line.
Gawk processes every line of the input in this way until the end of the input is reached.

A gawk program consists of a series of patterns and actions. An action is written inside braces {} and follows its pattern. A whole gawk program looks like this:

    pattern {action}
    pattern {action}

In a gawk program either the pattern or the action may be omitted, but not both. If the pattern is omitted, the action is executed for every line of the input. If the action is omitted, the default action prints every input line that matches the pattern.

2.1 How to run a gawk program

There are basically two ways to run a gawk program.

□ If the gawk program is short, it can be written directly on the command line, as shown below:

    gawk 'program' input-file1 input-file2 ...

  where program consists of patterns and actions.

□ If the gawk program is longer, it is more convenient to store it in a file, here named program-file. Gawk is then run as follows:

    gawk -f program-file input-file1 input-file2 ...

  If the gawk program is stored in more than one file, gawk is run as follows:

    gawk -f program-file1 -f program-file2 ... input-file1 input-file2 ...

2.2 A simple example

Because the gawk program in this example is short, it is written directly on the command line.

    gawk '/foo/ {print $0}' bbs-list

The actual gawk program is /foo/ {print $0}. The pattern is /foo/: it tests whether each line of the input contains the substring 'foo', and if it does, the action is executed. The action is print $0, which prints the current line. bbs-list is the input file.
After running the above command, the following is printed:

    fooey 555-1234 2400/1200/300 B
    foot 555-6699 1200/300 B
    macfoo 555-6480 1200/300 A
    sabafoo 555-2127 1200/300 C

2.3 A more complex example

    gawk '$1 == "Feb" {sum = $2 + $3} END {print sum}' shipped

This example compares the first field of each record of the input file 'shipped' with "Feb". If they are equal, the values of the second and third fields of that record are added together and the result is stored in the variable sum. This is repeated for every line of the input file until all lines have been processed. Because sum is assigned with '=' each time, it ends up holding the value computed from the last matching record, which in 'shipped' is 'Feb 26 58 80 652'.

END {print sum} means that after all of the input has been processed, the action print sum is executed once, displaying the value of sum.
The following is the result of the execution:

    84

Chapter 3 Reading Input Files

Gawk can read its input from standard input or from specified files. The unit of input is called a "record", and gawk processes its input one record at a time. By default each record is one line, and each record is split into fields.

3.1 How input is split into records

The gawk language splits its input into records. Records are separated from one another by the record separator. The default record separator is the newline character, so by default each line of text is one record.

The record separator is changed by changing the built-in variable RS. RS is a string whose default value is "\n". Only the first character of RS is significant; it is used as the record separator, and the other characters of RS are ignored.

The built-in variable FNR stores the number of records read so far from the current input file. The built-in variable NR stores the number of records read so far from all input files.

3.2 Fields

Gawk automatically splits each record into fields. Much like the words in a line, fields are by default considered to be separated by whitespace. In gawk, whitespace means one or more blanks or tabs.

In a gawk program, '$1' refers to the first field, '$2' to the second field, and so on. For example, assume the input line is:

    This seems like a pretty nice example.

The first field, $1, is 'This'; the second field, $2, is 'seems'; and so on. One point deserves particular attention: the seventh field, $7, is 'example.' and not 'example'.

No matter how many fields there are, $NF can be used to refer to the last field of a record. In the example above, $NF is the same as $7, that is, 'example.'.

NF is a built-in variable that holds the number of fields in the current record.
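As a small sketch of the field variables described above, the command below feeds the sample sentence to awk and prints the field count, the first field, and the last field. It assumes a POSIX-compatible `awk` on the PATH (gawk can be substituted); the shell variable name `fields_out` is only illustrative.

```shell
# Feed one record to awk and report NF, $1, and $NF.
# Assumption: `awk` is any POSIX awk; replace it with `gawk` if preferred.
fields_out=$(printf 'This seems like a pretty nice example.\n' |
  awk '{ print NF; print $1; print $NF }')
echo "$fields_out"
```

The trailing period stays attached to the last field because fields are split only on whitespace.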
$0 looks like a reference to a "zeroth" field, but it is a special case: it represents the whole record.

Here is a more complicated example:

    gawk '$1 ~ /foo/ {print $0}' bbs-list

The result is as follows:

    fooey 555-1234 2400/1200/300 B
    foot 555-6699 1200/300 B
    macfoo 555-6480 1200/300 A
    sabafoo 555-2127 1200/300 C

This example checks the first field of each record of the input file 'bbs-list'. If that field contains the substring 'foo', the record is printed.

3.3 How records are split into fields

Gawk splits a record into fields according to the field separator. The field separator is represented by the built-in variable FS.
For example, if the field separator is 'oo', the following line:

    moo goo gai pan

is split into three fields: 'm', ' g', and ' gai pan'.

In a gawk program, '=' can be used to change the value of FS. For example:

    gawk 'BEGIN {FS = ","}; {print $2}'

If the input line is:

    John Q. Smith, 29 Oak St., Walamazoo, MI 42139

then gawk prints the string ' 29 Oak St.'. The action after BEGIN is executed once, before the first record is read.

Chapter 4 Printing

In gawk programs, one of the most common things an action does is print. Simple printing is done with the print statement. Printing with a more complex format is done with the printf statement.

4.1 The print statement

The print statement is used for simple, standard output formats. The statement has the following form:

    print item1, item2, ...

On output, the items are separated by a single blank, and a newline is written at the end.

If 'print' is followed by nothing, it has the same effect as 'print $0': it prints the current record. To print a blank line, use 'print ""'. To print a fixed piece of text, enclose the text in double quotes, as in 'print "Hello there"'.

Here is an example that prints the first two fields of each input record:

    gawk '{print $1, $2}' shipped

The result is as follows:

    Jan 13
    Feb 15
    Mar 15
    Apr 31
    May 16
    Jun 31
    Jul 24
    Aug 15
    Sep 13
    Oct 29
    Nov 20
    Dec 17
    Jan 21
    Feb 26
    Mar 24
    Apr 21

4.2 Output separators

As mentioned earlier, if a print statement contains several items separated by commas, the items are printed separated by a single blank. Any string can be used as the output field separator; it is set through the built-in variable OFS. The initial value of OFS is " ", that is, a single blank.

The whole output of one print statement is called an output record. After printing the output record, print writes one more string, called the output record separator. The built-in variable ORS holds this string. The initial value of ORS is "\n", that is, a newline.
The following example prints the first and second fields of each record. The two fields are separated by a semicolon ';', and a blank line is added after each line of output.
    gawk 'BEGIN {OFS = ";"; ORS = "\n\n"} {print $1, $2}' bbs-list

The result is as follows:

    aardvark;555-5553

    alpo-net;555-3412

    barfly;555-7685

    bites;555-1675

    camelot;555-0542

    core;555-2912

    fooey;555-1234

    foot;555-6699

    macfoo;555-6480

    sdace;555-3430

    sabafoo;555-2127

4.3 The printf statement

The printf statement makes it easier to control the output format precisely. With printf you can specify the width used to print each item, and you can choose among several formats for numbers.

The printf statement has the following form:

    printf format, item1, item2, ...

The difference between print and printf is the format argument: printf takes one more argument than print, the format string. The format is the same as the format used by the ANSI C printf function.

printf does not automatically write a newline. The built-in variables OFS and ORS have no effect on the printf statement.

A format specification begins with the character '%', followed by a format-control letter. The format-control letters are as follows:

'c' Prints a number as an ASCII character. For example, 'printf "%c", 65' prints the character 'A'.
'd' Prints a decimal integer.
'i' Prints a decimal integer.
'e' Prints a number in scientific notation. For example, 'printf "%4.3e", 1950' prints '1.950e+03'.
'f' Prints a number in floating-point form.
'g' Prints a number in either scientific notation or floating-point form. If the absolute value of the number is greater than or equal to 0.0001, it is printed in floating-point form; otherwise it is printed in scientific notation.
'o' Prints an unsigned octal integer.
's' Prints a string.
'x' Prints an unsigned hexadecimal integer. The values 10 to 15 are represented by 'a' to 'f'.
'X' Prints an unsigned hexadecimal integer. The values 10 to 15 are represented by 'A' to 'F'.
'%' This is not really a format-control letter; '%%' prints '%'.

A modifier can be added between the '%' and the format-control letter to control the format further. The possible modifiers are as follows:

'-' Used before the width; it indicates that the output is left-justified within the width.
If '-' does not appear, the output is right-justified within the specified width. For example:

    printf "%-4s", "foo"

prints 'foo '.

'width' This number specifies the width of the field in which the item is printed. For example:

    printf "%4s", "foo"

prints ' foo'. The value of width is a minimum width, not a maximum width. If an item needs more room than width, it is not affected by width. For example:

    printf "%4s", "foobar"

prints 'foobar'.

'.prec' This number specifies the precision used when printing. For a number, it specifies the number of digits printed to the right of the decimal point. For a string, it specifies the maximum number of characters of the string that are printed.
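The modifiers above can be combined in one format string. The following is a minimal sketch, assuming a POSIX-compatible `awk` on the PATH; the shell variable `printf_out` and the '|' markers (added only to make the padding visible) are illustrative choices, not part of the original text.

```shell
# %-10s : left-justify "foo" in a field at least 10 characters wide
# %6.2f : right-justify a number in 6 columns with 2 digits after the point
# %4s   : width is only a minimum, so "foobar" is printed in full
printf_out=$(awk 'BEGIN { printf "%-10s|%6.2f|%4s|\n", "foo", 3.14159, "foobar" }')
echo "$printf_out"
```

Note that the newline has to be written explicitly as "\n", since printf does not add one.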
Chapter 5 Patterns

In a gawk program, when a pattern matches the current input record, its corresponding action is executed.

5.1 Types of pattern

Here is a summary of the types of pattern in gawk:

/regular expression/
    A regular expression used as a pattern. It matches whenever the text of the input record matches the regular expression.

expression
    A single expression. It matches when its value is nonzero (for a number) or non-empty (for a string).

pat1, pat2
    A pair of patterns separated by a comma, specifying a range of records.

BEGIN
END
    These are special patterns. Gawk executes the actions corresponding to BEGIN and END when it starts execution and when it finishes, respectively.

null
    The empty pattern. It is considered to match every input record.

5.2 Regular expressions as patterns

A regular expression, abbreviated regexp, is a way of describing strings. A regular expression enclosed in slashes ('/') can be used as a pattern.

If the input record contains text that matches the regexp, the record is considered to match. For example, the pattern /foo/ matches any input record that contains 'foo'.

The following example prints the second field of each input record that contains 'foo'.

    gawk '/foo/ {print $2}' bbs-list

The result is as follows:

    555-1234
    555-6699
    555-6480
    555-2127

A regexp can also be used in comparison expressions.

exp ~ /regexp/
    True if exp matches regexp.

exp !~ /regexp/
    True if exp does not match regexp.

5.3 Comparison expressions as patterns

Comparison patterns test the relationship between two numbers or strings, such as greater than, equal to, or less than. The comparison patterns are listed below:

x < y     true if x is less than y
x <= y    true if x is less than or equal to y
x > y     true if x is greater than y
x >= y    true if x is greater than or equal to y
x == y    true if x is equal to y
x != y    true if x is not equal to y
x ~ y     true if x matches the regexp described by y
x !~ y    true if x does not match the regexp described by y
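5.4 Boolean patterns

A boolean pattern combines other patterns using the boolean operators "or" ('||'), "and" ('&&'), and "not" ('!'). For example:

    gawk '/2400/ && /foo/' bbs-list
    gawk '/2400/ || /foo/' bbs-list
    gawk '! /foo/' bbs-list

Chapter 6 Expressions as Action Statements

Expressions are the basic building blocks of actions in gawk programs.

6.1 Arithmetic operators

The arithmetic operators in gawk are as follows:

x + y     addition
x - y     subtraction
-x        negation
+x        unary plus; it has no effect on the value
x * y     multiplication
x / y     division
x % y     remainder. For example, 5 % 3 = 2.
x ^ y
x ** y    x raised to the power y. For example, 2 ^ 3 = 8.

6.2 Comparison expressions and boolean expressions

Comparison expressions compare strings or numbers. The operators are the same as in the C language. They are listed below:

x < y     true if x is less than y
x <= y    true if x is less than or equal to y
x > y     true if x is greater than y
x >= y    true if x is greater than or equal to y
x == y    true if x is equal to y
x != y    true if x is not equal to y
x ~ y     true if x matches the regexp described by y
x !~ y    true if x does not match the regexp described by y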
The following example uses a while loop to print the first three fields of each input record.

    gawk '{ i = 1
            while (i <= 3) {
              print $i
              i++
            }
          }'

7.3 The do-while statement

    do
      body
    while (condition)

The do loop executes body once and then repeats body as long as condition is true. Even if condition is false to begin with, body is executed once. The following example prints each input record ten times.

    gawk '{ i = 1
            do {
              print $0
              i++
            } while (i <= 10)
          }'

7.4 The for statement

    for (initialization; condition; increment)
      body

This statement begins by executing initialization. Then, as long as condition is true, it repeatedly executes body and then increment. The following example prints the first three fields of each input record.

    gawk '{ for (i = 1; i <= 3; i++)
              print $i
          }'

7.5 The break statement

The break statement exits the innermost for, while, or do-while loop that contains it. The following example finds the smallest divisor of any integer, and also determines whether the number is prime.

    gawk '# find smallest divisor of num
          { num = $1
            for (div = 2; div*div <= num; div++)
              if (num % div == 0)
                break
            if (num % div == 0)
              printf "Smallest divisor of %d is %d\n", num, div
            else
              printf "%d is prime\n", num
          }'

7.6 The continue statement

The continue statement is used inside for, while, and do-while loops. It skips the rest of the loop body, so that the next iteration of the loop begins immediately. The following example prints all the numbers from 0 to 20, except that 5 is not printed.

    gawk 'BEGIN {
            for (x = 0; x <= 20; x++) {
              if (x == 5)
                continue
              printf ("%d ", x)
            }
            print ""
          }'

7.7 The next, next file, and exit statements

The next statement forces gawk to stop processing the current record immediately and go on to the next record.

The next file statement is similar to next, but it forces gawk to stop processing the current data file (it is spelled nextfile in current gawk releases).

The exit statement makes the gawk program stop executing and exit. However, if an END rule is present, its actions are executed first.
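To see continue and break side by side, here is a minimal sketch in the same spirit as the examples of this chapter. It assumes a POSIX-compatible `awk` on the PATH; the variables `loop_out` and `sep` are illustrative names introduced only for this example.

```shell
# Print the numbers 0..9, skipping 5 with continue and stopping at 8 with break.
loop_out=$(awk 'BEGIN {
  sep = ""
  for (x = 0; x < 10; x++) {
    if (x == 5) continue      # skip the rest of the body for this value
    if (x == 8) break         # leave the loop entirely
    printf "%s%d", sep, x     # sep avoids a trailing blank
    sep = " "
  }
  print ""
}')
echo "$loop_out"
```

The continue only affects the iteration for 5, while the break ends the loop, so 8 and 9 are never printed.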
Chapter 8 Built-in Functions

Built-in functions are functions that are built into gawk. They can be called anywhere in a gawk program.

8.1 Numeric built-in functions

int(x)
    Returns the integer part of x, truncating toward 0. For example, int(3.9) is 3 and int(-3.9) is -3.

sqrt(x)
    Returns the positive square root of x. For example, sqrt(4) = 2.

exp(x)
    Returns e raised to the power x. For example, exp(2) computes e*e.

log(x)
    Returns the natural logarithm of x.
sin(x)
    Returns the sine of x, with x in radians.

cos(x)
    Returns the cosine of x, with x in radians.

atan2(y, x)
    Returns the arctangent of y/x. The result is in radians.

rand()
    Returns a random number. The random numbers are uniformly distributed between 0 and 1. The value can be 0 but is never 1. Each time gawk is run, rand starts generating numbers from the same starting point, or seed.

srand(x)
    Sets the starting point, or seed, for generating random numbers. If the same seed value is set a second time, the same sequence of random numbers is produced again. If the argument x is omitted, as in srand(), the current date and time are used as the seed. This is the way to get random numbers that are truly unpredictable. The return value of srand is the previous seed.

8.2 String built-in functions

index(in, find)
    Searches the string in for the first occurrence of the string find, and returns the position in in where that occurrence begins. If find is not found in in, the return value is 0. For example:

        print index("peanut", "an")

    prints 3.

length(string)
    Returns the number of characters in string. For example, length("abcde") is 5.

match(string, regexp)
    Searches string for the longest, leftmost substring matched by regexp. The return value is the position in string where that substring begins, that is, its index. The match function sets the built-in variable RSTART to this index, and sets the built-in variable RLENGTH to the number of characters in the matched substring. If there is no match, RSTART is set to 0 and RLENGTH to -1.

sprintf(format, expression1, ...)
    Works like printf, except that instead of printing the result, sprintf returns it as a string. For example:

        sprintf("pi = %.2f (approx.)", 22/7)

    returns the string "pi = 3.14 (approx.)".

sub(regexp, replacement, target)
    Searches the string target for the leftmost, longest substring matched by regexp, and replaces it with the string replacement. For example:

        str = "water, water, everywhere"
        sub(/at/, "ith", str)

    The string str becomes "wither, water, everywhere".

gsub(regexp, replacement, target)
    gsub is similar to sub, but it replaces every substring of target matched by regexp with the string replacement. For example:

        str = "water, water, everywhere"
        gsub(/at/, "ith", str)

    The string str becomes "wither, wither, everywhere".

substr(string, start, length)
    Returns a substring of string. The substring is length characters long and begins at position start.
    For example:

        substr("washington", 5, 3)

    returns "ing". If length is omitted, the returned substring runs from position start to the end of the string. For example:

        substr("washington", 5)

    returns "ington".

tolower(string)
    Changes the uppercase letters of string to lowercase. For example:

        tolower("MiXeD cAsE 123")

    returns "mixed case 123".

toupper(string)
    Changes the lowercase letters of string to uppercase. For example:

        toupper("MiXeD cAsE 123")

    returns "MIXED CASE 123".

8.3 Input/output built-in functions

close(filename)
    Closes the input or output file filename.

system(command)
    Lets the user execute an operating system command; control returns to the gawk program after the command finishes. For example:

        BEGIN {system("ls")}

Chapter 9 User-defined Functions

Complicated gawk programs can often be simplified by using functions that you define yourself. A user-defined function is called in the same way as a built-in function.

9.1 Function definition format

Function definitions can be placed anywhere in a gawk program. The definition of a user-defined function has the following form:

    function name (parameter-list) {
      body-of-function
    }

name is the name of the function being defined. A valid function name may consist of letters, digits, and underscores, but it may not begin with a digit.

parameter-list is a list of all of the function's arguments, separated by commas.

body-of-function consists of gawk statements. It is the most important part of the definition, because it determines what the function actually does.

9.2 Example of a function definition

The following example adds the square of the first field of each record to the square of the second field.

    { print "sum =", SquareSum($1, $2) }

    function SquareSum(x, y) {
      sum = x*x + y*y
      return sum
    }

Chapter 10 Examples

This chapter lists some examples of gawk programs.
    gawk '{ if (NF > max) max = NF } END { print max }'

This program prints the maximum number of fields found on any input line.

    gawk 'length($0) > 80'

This program prints every line that is longer than 80 characters. Only a pattern is given here; the action is the default one, print.

    gawk 'NF > 0'

This program prints every line that has at least one field. It is a simple way to delete all blank lines from a file.

    gawk '{ if (NF > 0) print }'

This program also prints every line that has at least one field, and is likewise a simple way to delete all blank lines from a file.

    gawk 'BEGIN { for (i = 1; i <= 7; i++) print int(101 * rand()) }'

This program prints 7 random integers between 0 and 100.
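One more example in the same spirit combines a user-defined function with the built-in string functions toupper, substr, and length. It is only a sketch: `capitalize` is a hypothetical helper written for this illustration, not a gawk built-in, the two input lines are taken from 'bbs-list', and a POSIX-compatible `awk` on the PATH is assumed.

```shell
# For each record containing "foo", print the first field with its first
# letter capitalized, followed by the length of that field.
# capitalize() is a hypothetical helper defined here, not part of gawk.
cap_out=$(printf 'fooey 555-1234\nmacfoo 555-6480\n' | awk '
function capitalize(s) {
  # uppercase the first character, then append the rest unchanged
  return toupper(substr(s, 1, 1)) substr(s, 2)
}
/foo/ { print capitalize($1), length($1) }')
echo "$cap_out"
```

Placing two expressions next to each other, as in the return statement, concatenates them into one string.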