What is awk?
You may be familiar with UNIX and still be a stranger to awk. That is not surprising: awk is far less well known than its excellent capabilities deserve. What is awk? Unlike most other UNIX commands, its name gives no hint of what it does: it is neither an English word nor an abbreviation of related words. In fact, awk is formed from the initials of three people: (Alfred) Aho, (Peter) Weinberger and (Brian) Kernighan. These three created awk, an excellent pattern scanning and processing tool.
What does awk do? Like sed and grep, awk is a pattern scanning and processing tool, but it is far more powerful than either. awk can do all the work that grep and sed can do, and it also offers pattern matching, flow control, arithmetic operators, control statements, and even built-in variables and functions. It has almost all the fine features a complete language should have. In fact, awk does have its own language, the awk programming language, which its three creators formally defined as a "pattern scanning and processing language".
Why use awk?
Even so, you may still ask: why should I use awk?
The first reason to use awk is that text-based pattern scanning and processing is work we do all the time. Some of what awk does resembles database work, but unlike a database it operates on text files. These files have no special storage format, so ordinary people can edit, read, understand and process them. Database files, by contrast, often have special storage formats that can only be processed with database tools. Since we run into this kind of database-like processing so often, we should find a simple and easy way to do it. UNIX has many tools for this, such as sed, grep, sort and find, and awk is a very good one among them.
The second reason to use awk is that awk is a simple tool, at least relative to its power. UNIX certainly has many excellent tools; its native development tools, the C language and its successor C++, are excellent. Compared with them, however, awk accomplishes the same tasks more conveniently and simply. First, awk offers solutions for a range of needs, from a one-line awk command for a simple problem to the full awk programming language for a complex one, so you never have to solve a simple problem with a complicated method. For example, you can solve a simple problem with a single command line, which C cannot do: even the simplest C program must be written and compiled. Second, awk is interpreted, so awk programs need no compilation step, and this also lets awk work well with shell scripts. Finally, awk itself is simpler than C. awk borrows many good features from C, and familiarity with C helps a great deal in learning awk, but awk does not require you to know C, a powerful development tool that takes a great deal of time to master.
The third reason to use awk is that awk is easy to obtain. Unlike C and C++, awk consists of a single file (/bin/awk), and almost every version of UNIX ships with a version of awk, so you never have to worry about how to get it. That is not the case for C: although C is UNIX's native development tool, it is distributed separately. In other words, you must pay separately for a C development tool for your version of UNIX (pirated copies aside), obtain it and install it before you can use it. For these reasons, together with awk's power, we can say that if you have text pattern scanning work to do, awk should be your first choice. A general rule to follow: if a job is difficult with ordinary shell tools or a shell script, try awk; if awk cannot solve the problem, use C; if C still fails, move on to C++.
How to invoke awk
As mentioned earlier, awk offers different solutions for different needs:
First, the awk command line. You can use awk like any ordinary UNIX command, and you can also use the awk programming language on the command line. Although awk supports multi-line input, typing a long command line and getting it right is painful, so this method is generally used only for simple problems. Of course, you can also put an awk command line, or even an awk program script, inside a shell script.
Second, calling an awk program with the -f option. awk lets you write an awk program into a text file and then invoke and run that program from the awk command line. The details are covered in the awk syntax section below.
Third, calling an awk program through the command interpreter. Using the interpreter feature that UNIX supports, we can write an awk program into a text file and add this as its first line:
#!/bin/awk -f
Then give the text file execute permission. After that, you can invoke and run the awk program from the command line:
$ awk_script_file file_to_process
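As a rough sketch of the first two invocation styles, the commands below run the same one-line program once from the command line and once from a file loaded with -f. The file names sample.txt and first.awk, and the sample data, are made up for this illustration.

```shell
# Hypothetical sample data and script, created in a scratch directory.
dir=$(mktemp -d)
printf 'alpha 1\nbeta 2\n' > "$dir/sample.txt"

# Style 1: the program text is given directly on the command line.
inline_out=$(awk '{ print $1 }' "$dir/sample.txt")

# Style 2: the same program is stored in a file and loaded with -f.
printf '{ print $1 }\n' > "$dir/first.awk"
file_out=$(awk -f "$dir/first.awk" "$dir/sample.txt")

printf '%s\n' "$inline_out" "$file_out"
rm -r "$dir"
```

Both styles should print the same first column; the third style only adds the #! line and execute permission on top of the -f form.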
awk syntax:
Like other UNIX commands, awk has its own syntax:
awk [-F re] [parameter ...] ['prog'] [-f progfile] [in_file ...]
Parameter description:
-F re: lets awk change its field separator to re.
parameter: helps assign values to different variables.
'prog': the awk program text. It must be enclosed in single quotes (' and ') to keep the shell from interpreting it. The standard form of the program text is:
'pattern {action}'
The pattern can be any egrep regular expression, written in the form /re/ together with some pattern matching techniques. As in sed, you can also use "," to separate two patterns and select a range. For the details of matching, see the appendix; if it is still unclear, find a UNIX book and study grep and sed (I learned pattern matching myself while learning ed). The action is always enclosed in braces. It consists of a series of awk statements separated by ";". awk interprets them and carries them out on the records that match the given pattern. As in the shell, "#" starts a comment: everything from "#" to the end of the line is ignored when the program is interpreted. You may omit either the pattern or the action, but not both. When the pattern is omitted, no matching is done and the action is carried out on every line (record). When the action is omitted, the default action is taken: the record is displayed on standard output.
-f progfile: lets awk call and run the program file named by progfile. progfile is a text file that must conform to awk syntax.
in_file: the awk input file. awk allows several input files to be processed. Note that awk does not modify its input files. If no input file is given, awk reads standard input and writes its results to standard output. awk supports input and output redirection.
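The two omission rules can be tried on a few lines of made-up data. This is only a sketch: the fruit names and numbers are invented, and the data is piped in on standard input rather than read from a file.

```shell
data='apple 3
pear 12
plum 7'

# Pattern only: the default action prints every record that matches.
pattern_only=$(printf '%s\n' "$data" | awk '$2 > 5')

# Action only: with no pattern, the action runs for every record.
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }   # text after # is a comment')

printf '%s\n' "$pattern_only" "$action_only"
```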
awk records, fields and built-in variables:
As mentioned earlier, the work awk does resembles database processing. One similarity is that awk supports records and fields, and field processing is something grep and sed cannot do; this is one of the reasons awk outperforms both. In awk, each line of a text file is treated as a record by default, and a part of the line is a field of that record. To operate on the different fields, awk borrows the shell's notation and refers to the fields of a line (record) in order as $1, $2, $3 and so on. As a special case, $0 stands for the entire line. Fields are separated by characters called separators. The default separator is whitespace. The separator can be changed on the command line with -F re. In fact, awk stores the separator in a built-in variable, FS. awk has several such built-in variables, for example the record separator RS and the current record number NR; the table in the appendix lists them all. Built-in variables can be read or modified in an awk program. For example, you can use NR to select a range of records in a pattern, or modify the record separator RS so that some special character, rather than newline, separates records.
Example: from lines 7 through 15 of the text file myfile, whose fields are separated by the character %, display the first, third and seventh fields:
$ awk -F% 'NR==7,NR==15 {print $1, $3, $7}' myfile
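The built-in variables FS and RS can also be set inside the program, usually in a BEGIN block. The following is a small sketch on invented input, not taken from the original article.

```shell
# FS set in BEGIN instead of using -F on the command line.
# (printf needs %% to emit a literal % character.)
fs_out=$(printf 'a%%b%%c\nd%%e%%f\n' | awk 'BEGIN { FS = "%" } NR == 2 { print $1, $3 }')

# RS changed so that ";" rather than newline separates records.
rs_out=$(printf 'one;two;three' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

printf '%s\n' "$fs_out" "$rs_out"
```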
awk's built-in functions
One reason awk is an excellent programming language is that it absorbs many strengths of other excellent languages (such as C). One of these strengths is built-in functions: awk defines and supports a series of them, which make its features more complete and powerful. For example, awk has a set of built-in string functions (they look like C's string functions, although their usage differs somewhat), and these make awk's string handling more powerful. The built-in functions listed in the appendix of this article are those of a typical awk. Some may differ from your awk version, so before using them it is best to consult the online help on your system. As an example of a built-in function, we introduce awk's printf here, which gives awk output formatting equal to C's. In fact, awk borrows many of its forms from C. If you are familiar with C, you will remember its printf function, whose powerful formatted output has brought us so much convenience. Happily, we meet it again in awk. printf in awk is almost identical to C's; if you know C, you can use awk's printf just as you would in C. So we give only one example here. If you are not familiar with it, please consult an introductory C book.
Example: display the line number and the third field of each line in myfile:
$ awk '{printf "%03d%s\n", NR, $3}' myfile
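For readers who have not met printf before, here is a brief sketch of three common conversions: zero padding, left alignment and fixed decimals. The values are arbitrary and the | characters are there only to make the field widths visible.

```shell
# %03d pads an integer with zeros to width 3,
# %-6s left-aligns a string in 6 columns,
# %6.2f right-aligns a number in 6 columns with 2 decimals.
fmt_out=$(awk 'BEGIN { printf "%03d|%-6s|%6.2f\n", 7, "ab", 3.14159 }')
printf '%s\n' "$fmt_out"
```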
Using awk on the command line
By rights we should now go on to awk programming, but before that we will review what has been covered with some examples. These examples are all run from the command line, which shows how easy awk is to use there. We do this partly to prepare the ground for what follows and partly to introduce ways of solving simple problems: there is no need to solve a simple problem with a complicated method when awk provides a simpler one.
Example: display all lines of the text file mydoc that match (contain) the string sun.
$ awk '/sun/ {print}' mydoc
Since printing the entire record (the whole line) is awk's default action, the action can be omitted:
$ awk '/sun/' mydoc
Example: the following is a more complex match:
$ awk '/[Ss]un/,/[Mm]oon/ {print}' myfile
It displays on standard output every line from the first line matching Sun or sun through the first line matching Moon or moon.
Example: the following shows the use of a built-in variable and the built-in function length():
$ awk 'length($0) > 80 {print NR}' myfile
This command displays the line numbers of all lines in myfile that are longer than 80 characters. Here $0 stands for the whole record (line), and the built-in variable NR is used without the '$' marker.
Example: as a more practical example, suppose you want to run a security check on the users of a UNIX system. The method is to examine the passwd file under /etc and check the passwd field (the second field): if it is empty rather than "*", the user has not set a password, and we display the user name (the first field). We can use the following command:
# awk -F: '$2 == "" {printf("%s no password!\n", $1)}' /etc/passwd
In this example the field separator of the passwd file is ":", so -F: must be used to change the default field separator. The example also uses the built-in function printf.
awk variables
Like other programming languages, awk allows variables in its programs. Providing variables is a basic requirement of a programming language; a programming language without variables has yet to be seen.
awk provides two kinds of variables. One kind is the built-in variables, which have already been described. The point to stress is that built-in variables are referenced in an awk program without the $ marker (recall the use of NR above). The other kind is user-defined variables: awk lets you define and use your own variables in awk program statements. Such a variable cannot have the same name as a built-in variable or any other awk reserved word. User-defined variables are likewise referenced by their bare name; the $ marker is reserved for fields, so $n means "field number n", not "the variable n". Unlike in C, variables in awk need not be declared or initialized; awk decides a variable's data type from the form and context of its first appearance. When the type cannot be determined, awk treats the variable as a string by default. Here is a trick: if you want your awk program to know the exact type of a variable you use, give it an initial value in the program. We will use this trick in later examples.
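The behavior of uninitialized and preset variables can be checked with a few one-liners. This is a sketch with invented variable names (count, total, unset_var); note that none of them is written with a $.

```shell
# count is never initialized: awk treats it as 0 in a numeric context.
uninit=$(printf 'x\ny\nz\n' | awk '{ count = count + 1 } END { print count }')

# Giving total an initial value in BEGIN makes its numeric type explicit.
total=$(printf '2\n3\n5\n' | awk 'BEGIN { total = 0 } { total += $1 } END { print total }')

# A variable that was never assigned prints as an empty string.
empty=$(awk 'BEGIN { print "[" unset_var "]" }')

printf '%s\n' "$uninit" "$total" "$empty"
```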
Arithmetic and tests:
As befits a programming language, awk supports a variety of operations, essentially the same ones C provides, such as +, -, *, / and %. awk also supports C-style operators such as ++, --, += and -=, which is a great convenience for users who know C when writing awk programs. As an extension of its arithmetic, awk also provides a series of built-in numeric functions (such as log, sqrt, cos and sin) and some functions that operate on strings (such as length and substr). These functions greatly improve awk's computing ability.
As part of conditional branching, relational tests are a feature of every programming language, and awk is no exception. awk allows a variety of tests, such as the common == (equal), != (not equal), > (greater than), < (less than), >= (greater than or equal) and <= (less than or equal). For pattern matching it also provides the tests ~ (matches) and !~ (does not match).
As an extension of the tests, awk also supports the logical operators ! (not), && (and), || (or) and parentheses () for combining several tests, which greatly strengthens awk. The appendix of this article lists the operations, tests and operators that awk allows.
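A short sketch of these operators in use follows. The three sample records (a user name, a numeric id and a shell path) are invented and merely imitate the shape of passwd-style data.

```shell
data='root 0 /bin/sh
alice 1001 /bin/bash
bob 1002 /sbin/nologin'

# ~ tests a field against a regular expression; && requires both tests to hold.
match_out=$(printf '%s\n' "$data" | awk '$3 ~ /sh$/ && $2 >= 1000 { print $1 }')

# !~ is the negated match; || succeeds when either test holds.
other_out=$(printf '%s\n' "$data" | awk '$3 !~ /sh$/ || $2 == 0 { print $1 }')

# Arithmetic: % is the remainder, ^ is exponentiation.
arith_out=$(awk 'BEGIN { print 17 % 5, 2 ^ 10 }')

printf '%s\n' "$match_out" "$other_out" "$arith_out"
```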
awk flow control
Flow control statements are an indispensable part of any programming language, and every good language has statements for it. awk provides a complete set of flow control statements similar to those of C, which is a great convenience.
1, BEGIN and END:
awk has two special patterns, BEGIN and END, both of which can be used as the pattern (see the awk syntax above). BEGIN and END exist to give the program an initial state and to do clean-up work after the program finishes. Any action listed after BEGIN (inside {}) is executed before awk starts scanning its input, and the action listed after END is executed after all input has been scanned. BEGIN is therefore usually used to display variables and preset (initialize) variables, and END to output the final result.
Example: total the sales amounts in the sales file xs (assuming the sales amount is recorded in the third field):
$ awk \
> 'BEGIN {FS = ":"; print "Sales amount statistics"; total = 0}
> {print $3; total = total + $3}
> END {printf "Total sales amount: %.2f\n", total}' xs
(Note: > is the secondary prompt supplied by the shell. To break a line between the awk command and its program in the shell, end the line with a backslash.)
Here BEGIN presets the built-in variable FS (the field separator) and the user-defined variable total, and prints a heading line before scanning begins. END prints the grand total after scanning is complete.
2, flow control statements
awk provides complete flow control statements whose usage is similar to C's. They are described one by one below.
2.1, if ... else statement:
Format:
if (expression)
statement 1
else
statement 2
Statement 1 may consist of several statements. To make them easier for awk to parse and for you to read, enclose multiple statements in {}. awk's branch structure can be nested, in this form:
if (expression 1) {
    if (expression 2)
        statement 1
    else
        statement 2
    statement 3
} else {
    if (expression 3)
        statement 4
    else
        statement 5
}
statement 6
Of course, you will probably never need a branch structure this complicated in practice; it is given here only to show the form.
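The branch structure might look like this in a real program. This is only a sketch; the scores and the grade labels are invented for illustration.

```shell
grades=$(printf '95\n72\n40\n' | awk '{
    if ($1 >= 90)
        grade = "A"
    else if ($1 >= 60)
        grade = "pass"
    else {
        # several statements in one branch must be wrapped in braces
        grade = "fail"
        failed++
    }
    print $1, grade
}
END { print "failed:", failed }')

printf '%s\n' "$grades"
```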
2.2, while statement
The format is:
while (expression)
statement
2.3, do-while statement
The format is:
do
{
statement
} while (condition)
2.4, for statement
The format is:
for (initial expression; termination condition; step expression)
{statement}
Inside awk's while, do-while and for statements, break and continue may be used to control the flow, and a statement such as exit may be used to quit. break interrupts the loop currently executing and jumps to the first statement after the loop. continue jumps from the current position to the start of the next iteration of the loop. exit behaves in one of two ways. When the exit statement is not inside END, an exit in any action behaves as if the end of the file had been reached: all pattern and action processing stops, and the actions of the END pattern are executed. An exit that appears inside END terminates the program.
Example: for
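The loop statements and the break, continue and exit rules described above can be sketched as follows. The input data and variable names are invented, and this is an illustration rather than the example the original author had in mind.

```shell
# for: add up all the fields of each record.
sums=$(printf '1 2 3\n10 20\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++)
        sum += $i
    print sum
}')

# while with break and continue: collect the odd numbers up to 7.
odds=$(awk 'BEGIN {
    i = 0
    while (1) {
        i++
        if (i > 7) break              # leave the loop entirely
        if (i % 2 == 0) continue      # skip even numbers
        out = out sep i
        sep = " "
    }
    print out
}')

# exit outside END stops reading input, but the END action still runs.
exit_out=$(printf 'a\nb\nc\n' | awk 'NR == 2 { exit } { print } END { print "done" }')

printf '%s\n' "$sums" "$odds" "$exit_out"
```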
User-defined functions in awk
Defining and calling your own functions is a feature of almost every high-level language, and awk is no exception. The original awk, however, did not provide functions; they are available only in nawk and newer awk versions.
Using a function involves two parts: defining the function and calling it. The function definition includes the code to be executed (the function itself) and the temporary variables (parameters) passed to the function from the main program.
An awk function is defined as follows:
function function_name(parameter list) {
function body
}
gawk allows function to be abbreviated to func, but other versions of awk do not. The function name must be a valid identifier. The parameter list may be empty (the pair of parentheses after the function name is still required) or may contain one or more parameters. As in C, awk passes (scalar) parameters by value; arrays are the exception and are passed by reference. Calling a function in awk is simple and works much as in C, but awk is more flexible than C: it does not check the validity of the arguments. In other words, you can call a function with more or fewer arguments than its definition expects. According to the original description, extra arguments are ignored by awk (some newer implementations, gawk among them, reject such a call instead), and missing parameters are set by default to 0 or the empty string, depending on how the parameter is used.
A function can return in two ways: implicitly or explicitly. When awk reaches the end of the function it returns to the caller automatically; this is an implicit return. To leave the function before its end, use the return statement, in the form: return return_value.
Example: the following example demonstrates the use of functions. It defines a function named print_header, which takes two parameters, filename and pagenum. The filename parameter receives the name of the file currently in use, and the pagenum parameter receives the number of the current page. The function prints (displays) the name of the current file and the number of the current page, and then returns the number of the next page.
$ nawk \
> 'BEGIN {pageno = 1; file = ARGV[1]   # FILENAME is not yet set inside BEGIN
> pageno = print_header(file, pageno); # call function print_header
> printf("The current page number is: %d\n", pageno);
> }
> # define function print_header
> function print_header(filename, pagenum) {
> printf("%s %d\n", filename, pagenum); pagenum++; return pagenum
> }' myfile
Executing this program displays the following:
myfile 1
The current page number is: 2
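Two of the calling rules, missing arguments and pass-by-value for scalars, can be sketched like this. The functions greet and bump are hypothetical names invented for this illustration.

```shell
fn_out=$(awk '
function greet(name, greeting) {
    # greeting was not supplied by the first caller, so it starts out empty
    if (greeting == "")
        greeting = "hello"
    return greeting ", " name
}
function bump(n) {
    n++            # changes only the local copy of the argument
    return n
}
BEGIN {
    print greet("awk")
    print greet("awk", "goodbye")
    x = 5
    y = bump(x)
    print x, y     # x is unchanged by the call
}')

printf '%s\n' "$fn_out"
```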
Advanced awk input and output
1. Read the next record:
awk's next statement causes awk to read the next record, perform pattern matching on it, and then carry out the corresponding actions. It is usually used inside the action code of a matching pattern. next causes any remaining patterns to be skipped for the current record.
2. Simply read a record
awk's getline statement simply reads one record. It is especially useful when a data record is spread over what look like two physical records. It performs the usual field splitting (setting the field variables, $0, FNR, NF and NR). It returns 1 on success and 0 on failure (end of file reached); a read error returns -1. If you simply need to read through a file, you can write code like the following:
Example: using getline
{while (getline == 1)
{
#process the inputted fields
}
}
You can also have getline store the record in a variable, instead of splitting it into fields, by writing getline variable. In this form $0 and NF are left unchanged, while FNR and NR are incremented.
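A small sketch of next on invented input: the first rule discards comment lines, so the second rule never sees them.

```shell
next_out=$(printf '# header\nalpha\n# note\nbeta\n' | awk '
/^#/ { next }              # skip comment lines; later rules are not tried
     { print NR ": " $0 }')

printf '%s\n' "$next_out"
```

NR still counts the skipped lines, which is why the surviving records are numbered 2 and 4.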
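The "one logical record spread over two physical lines" case might be handled like this. It is a sketch on invented data, using the getline variable form so that the key line stays in $0.

```shell
# Each logical record is a key line followed by a value line.
pairs=$(printf 'color\nred\nsize\nlarge\n' | awk '{
    key = $0
    if ((getline value) > 0)      # read the following line into value
        print key "=" value
}')

printf '%s\n' "$pairs"
```

Comparing the result with > 0 rather than testing it for truth avoids an endless loop if getline ever returns -1.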
You can also read data from a given file using the form getline < "filename". Another way to use getline is to accept input from a UNIX command.
Example: accepting input from a UNIX command
{while (("who -u" | getline) > 0)
{
#process each line from the who command
}
}
Of course, you can also use the following form:
"command" | getline variable
3. Close a file:
awk allows a program to close an input or output file, using awk's close statement:
close("filename")
filename can be a file opened by getline (it can also be stdin, a variable containing the file name, or the exact command used with getline), or an output file (it can also be stdout, a variable containing the file name, or the exact command used with a pipe).
4. Output to a file:
awk allows results to be written to a file in the following ways:
printf("hello world!\n") > "datafile"
or
printf("hello world!\n") >> "datafile"
5. Output to a command
awk allows results to be sent to a command in the following way:
printf("hello world!\n") | "sort -t','"
Mixed programming with awk and shell scripts
Because awk can be used as a shell command, it fits well into shell batch programs, which makes mixed programming with awk and the shell possible. The key to mixed programming is the dialogue between awk and the shell script, in other words the exchange of information between them: awk obtains the information it needs from the shell script (usually the values of variables), shell command lines are executed from within awk, the shell script sends the results of commands to awk for processing, and the shell script reads the results of awk's execution.
1. awk reads a shell script variable
In awk we can read a variable of the shell script by writing it in the form "'$variable_name'".
Example: in the example below, awk reads the shell script variable Name, which holds the name of the author of the text file myfile, and prints it. (The extra double quotes around $Name keep the shell from splitting a value that contains a space.)
$ cat writename
:
# @(#)
#
. . .
Name="Zhang San"
nawk 'BEGIN {name = "'"$Name"'"; \
printf("\t%s\twriter %s\n", FILENAME, name);} \
{...} END {...}' myfile
. . .
2. Send the result of a shell command to awk for processing
As one way of passing information, we can send the result of a shell command to awk through a pipe (|).
Example: awk processes the result of a shell command
$ who -u | awk '{printf("%s is executing %s\n", $2, $1)}'
For each logged-in terminal, this command prints the terminal name (field 2 of the who -u output) followed by field 1.
3. The shell script reads the result of awk's execution
To let a shell script read the result of an awk command, we can use some special techniques. For example, we can store awk's result in a shell script variable with the form variable_name=`awk statement`. Of course, awk's result can also be passed through a pipe to the shell script for processing.
Example: as one of its message-passing mechanisms, UNIX provides the command wall (meaning "write to all"), which sends a message to every user (terminal) currently at work. We can imitate this program with a shell batch program, wall.shell (in fact, wall in older versions was itself a shell batch program):
$ cat wall.shell
:
# @(#) wall.shell: send a message to every logged-in terminal
#
cat > /tmp/$$
# the user types in the message text
who -u | awk '{print $2}' | while read tty
do
cat /tmp/$$ > /dev/$tty
done
In this program awk receives the output of the who -u command, which lists all logged-in terminals. The second field is the device name of the logged-in terminal, so the awk command extracts that name, and the while read tty loop reads these names into the shell script variable tty, which serves as the destination of the message.
4. Execute a shell command line in awk: the built-in function system()
system() is a built-in function that belongs to neither the string nor the numeric category. Its job is to process the string passed to it as an argument, and it does so by executing that string as a command line. This lets users run commands or scripts flexibly whenever their awk programs need to.
Example: the following program uses the built-in function system to print a report file prepared by the user, stored in a file named myreport.txt. For simplicity, only its END part is listed:
. . .
END {close("myreport.txt"); system("lp myreport.txt");}
In this example, we first use the close statement to close the file myreport.txt, and then use the built-in function system to send myreport.txt to the printer.
Having written this far, it is time to say goodbye to my friends. To be honest, all of this is still only introductory knowledge of awk. Computing is a science that never stands still, and awk is no exception. All this article can do is lay a small stretch of road on the long way ahead of you; the rest of the road you must walk yourself. Honestly, if this article brings even a little convenience to your way forward, I shall be content!
If you have any questions about this article, please e-mail chizlong@yeah.net or leave a message at the home page http://chizling.yeah.net.
Appendix:
1. awk regular expression metacharacters
\                  Escape sequence
^                  Matches at the beginning of the string
$                  Matches at the end of the string
.                  Matches any single character
[ABC]              Matches any one of the characters inside []
[A-Ca-c]           Matches any character in the ranges A-C and a-c (in alphabetical order)
[^ABC]             Matches any character other than those inside []
Desk|Chair         Matches either Desk or Chair
[ABC][DEF]         Concatenation: matches any one of A, B and C followed by any one of D, E and F
*                  Matches zero or more occurrences of any one of A, B or C
+                  Matches one or more occurrences of any one of A, B or C
?                  Matches the empty string or any one of A, B or C
(Blue|Black)berry  Grouping of regular expressions; matches Blueberry or Blackberry
2. awk arithmetic operators
Operator   Use
------------------
x^y        x to the power y
x**y       Same as above
x%y        Remainder of x/y (modulo)
x+y        x plus y
x-y        x minus y
x*y        x multiplied by y
x/y        x divided by y
-y         Negative y (changes the sign of y); also called unary minus
++y        Add 1 to y, then use y (prefix increment)
y++        Use the value of y, then add 1 (postfix increment)
--y        Subtract 1 from y, then use y (prefix decrement)
y--        Use the value of y, then subtract 1 (postfix decrement)
x=y        Assign the value of y to x
x+=y       Assign the value of x+y to x
x-=y       Assign the value of x-y to x
x*=y       Assign the value of x*y to x
x/=y       Assign the value of x/y to x
x%=y       Assign the value of x%y to x
x^=y       Assign the value of x^y to x
x**=y      Assign the value of x**y to x
3. Tests allowed by awk:
Operator   Meaning
x==y       x equals y
x!=y       x does not equal y
x>y        x is greater than y
x>=y       x is greater than or equal to y
x<y        x is less than y
x<=y       x is less than or equal to y
x~re       x matches the regular expression re
x!~re      x does not match the regular expression re
4. awk operators (in order of precedence, lowest first)
= += -= *= /= %=
||
&&
> >= < <= == != ~ !~
xy (string concatenation: "x" "y" becomes "xy")
+ -
* / %
++ --
5. awk built-in variables (predefined variables)
Note: the V column shows the first tool to support the variable: A = awk, N = nawk, P = POSIX awk, G = gawk
V  Variable     Meaning                                                  Default
----------------------------------------------------------
N  ARGC         Number of command-line arguments
G  ARGIND       Index in ARGV of the file currently being processed
N  ARGV         Array of command-line arguments
G  CONVFMT      Number conversion format                                 %.6g
P  ENVIRON      UNIX environment variables
N  ERRNO        UNIX system error message
G  FIELDWIDTHS  Whitespace-separated string of input field widths
A  FILENAME     Name of the current input file
P  FNR          Current record number
A  FS           Input field separator                                    space
G  IGNORECASE   Controls case sensitivity                                0 (case sensitive)
A  NF           Number of fields in the current record
A  NR           Number of records read so far
A  OFMT         Output format for numbers                                %.6g
A  OFS          Output field separator                                   space
A  ORS          Output record separator                                  newline
A  RS           Input record separator                                   newline
N  RSTART       Start of the string matched by the match function
N  RLENGTH      Length of the string matched by the match function
N  SUBSEP       Subscript separator                                      \034
6. awk built-in functions
V  Function                       Use or return value
------------------------------------------------
N  gsub(reg, string, target)      Each time the regular expression reg matches, substitutes string in target
N  index(search, string)          Returns the position of the search string in string
A  length(string)                 Returns the number of characters in string
N  match(string, reg)             Returns the position in string where the regular expression reg matches
N  printf(format, variable)       Formatted output: prints variable according to format
N  split(string, store, delim)    Splits string into elements of the array store, using delim as the delimiter
N  sprintf(format, variable)      Returns a string of data formatted according to format; variable is the data to be placed in the string
G  strftime(format, timestamp)    Returns a date or time string based on format; timestamp is the time returned by the systime() function
N  sub(reg, string, target)       The first time the regular expression reg matches, substitutes string in the target string
A  substr(string, position, len)  Returns the substring of len characters starting at position
P  tolower(string)                Returns string with its characters converted to lowercase
P  toupper(string)                Returns string with its characters converted to uppercase
A  atan2(y, x)                    Arctangent of y/x (in radians)
N  cos(x)                         Cosine of x (x in radians)
A  exp(x)                         e to the power x
A  int(x)                         Integer part of x
A  log(x)                         Natural logarithm of x
N  rand()                         Random number between 0 and 1
N  sin(x)                         Sine of x (x in radians)
A  sqrt(x)                        Square root of x
A  srand(x)                       Initializes the random number generator; if x is omitted, systime() is used
G  systime()                      Returns the time since January 1, 1970 (in seconds)
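Several of the string functions in the table can be tried together in one BEGIN block. The sample string is arbitrary; this is a sketch, and the exact set of functions available depends on your awk version, as noted earlier.

```shell
str_out=$(awk 'BEGIN {
    s = "pattern scanning"
    print length(s)                   # number of characters
    print substr(s, 1, 7)             # first seven characters
    print index(s, "scan")            # position where "scan" starts
    print toupper(substr(s, 9))       # from position 9 to the end, in uppercase
    n = split("a:b:c", parts, ":")    # fills parts[1], parts[2], parts[3]
    print n, parts[3]
    t = s
    gsub(/n/, "N", t)                 # replace every n in t
    print t
    if (match(s, /s[a-z]+/))          # sets RSTART and RLENGTH
        print RSTART, RLENGTH
}')

printf '%s\n' "$str_out"
```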