Awk by example, Part 1


Common threads: Awk by example, Part 1: An introduction to the great language with the strange name

Daniel Robbins, President and CEO, Gentoo Technologies, Inc. December 2000

Contents:

In defense of awk
Your first awk
Multiple fields
External scripts
The BEGIN and END blocks
Regular expressions and blocks
Expressions and blocks
Conditional statements
Numeric variables!
Stringy variables
Lots of operators
Field separators
Number of fields
Record number
Resources
About the author

Awk is a very nice language with a very strange name. In this first article of a three-part series, Daniel Robbins will quickly get your awk programming skills up to speed. As the series progresses, more advanced topics will be covered, culminating with an advanced real-world awk demo application.

In defense of awk

In this series of articles, I'm going to turn you into a proficient awk coder. I'll admit, awk doesn't have a very pretty or particularly fashionable name, and the GNU version of awk, called gawk, sounds downright weird. Those unfamiliar with the language may hear "awk" and think of a mess of code so backwards and antiquated that it's capable of driving even the most knowledgeable UNIX guru to the brink of insanity (causing him to repeatedly yelp "kill -9!" as he runs for the coffee machine).

Sure, awk doesn't have a great name. But it is a great language. Awk is geared toward text processing and report generation, yet it also has many well-designed features that allow for serious programming. And, unlike some languages, awk's syntax is familiar. It borrows some of the best parts of languages like C, Python, and bash (although, technically, awk was created before both Python and bash). Awk is one of those languages that, once learned, becomes a key part of your strategic coding arsenal.

Your first awk

Let's go ahead and start playing around with awk to see how it works. At the command line, enter the following command:

$ awk '{ print }' /etc/passwd

You should see the contents of your /etc/passwd file appear before your eyes. Now, for an explanation of what awk did. When we called awk, we specified /etc/passwd as our input file. When awk ran, it evaluated the print command for each line in /etc/passwd, in order. All output is sent to stdout, and we get a result identical to catting /etc/passwd.

Now, for an explanation of the { print } code block. In awk, curly braces are used to group blocks of code together, similar to C. Inside our block of code, we have a single print command. In awk, when a print command appears by itself, the full contents of the current line are printed.

Here is another awk example that does exactly the same thing:

$ awk '{ print $0 }' /etc/passwd

In awk, the $0 variable represents the entire current line, so print and print $0 do exactly the same thing.

If you'd like, you can create an awk program that will output data totally unrelated to the input data. Here's an example:

$ awk '{ print "" }' /etc/passwd

Whenever you pass the "" string to the print command, it prints a blank line. If you test this script, you'll find that awk outputs one blank line for every line in your /etc/passwd file. Again, this is because awk executes your script for every line in the input file. Here's another example:

$ awk '{ print "hiya" }' /etc/passwd

Running this script will fill your screen with hiyas. :)

Multiple fields

Awk is really good at handling text that has been broken into multiple logical fields, and it allows you to reference each individual field from inside your awk script. The following script will print out a list of all user accounts on your system:

$ awk -F":" '{ print $1 }' /etc/passwd

In the above example, when we called awk, we used the -F option to specify ":" as the field separator. When awk processes the print $1 command, it prints the first field that appears on each line of the input file. Here's another example:

$ awk -F":" '{ print $1 $3 }' /etc/passwd

The following is an excerpt of the script output:

halt7
operator11
root0
shutdown6
sync5
bin1
....etc.

As you can see, awk prints out the first and third fields of the /etc/passwd file, which happen to be the username and uid fields respectively. Now, while the script did work, it's not perfect: there are no spaces between the two output fields! If you're used to programming in bash or Python, you may have expected the print $1 $3 command to insert a space between the two fields. However, when two strings appear next to each other in an awk program, awk concatenates them without adding an intermediate space. The following command will insert a space between both fields:

$ awk -F":" '{ print $1 " " $3 }' /etc/passwd

When you call print this way, it concatenates $1, " ", and $3, creating readable output. Of course, we can also insert some text labels if needed:

$ awk -F":" '{ print "username: " $1 "\t\tuid:" $3 }' /etc/passwd

This will produce the following output:

username: halt          uid:7
username: operator      uid:11
username: root          uid:0
username: shutdown      uid:6
username: sync          uid:5
username: bin           uid:1
....etc.
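To see the concatenation behavior without depending on what happens to be in your own /etc/passwd, here is a small sketch that feeds awk two made-up passwd-style lines. The sample data and the shell variable names are assumptions for illustration only.

```shell
# Hypothetical passwd-style sample lines, so the result does not depend on your system.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Adjacent values are concatenated with nothing in between...
joined=$(printf '%s\n' "$sample" | awk -F":" '{ print $1 $3 }')

# ...so a literal " " string is needed to separate them.
spaced=$(printf '%s\n' "$sample" | awk -F":" '{ print $1 " " $3 }')

printf '%s\n' "$joined" "$spaced"
```

The first command prints root0 and bin1, while the second prints root 0 and bin 1.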

External scripts

Passing your scripts to awk as a command line argument can be very handy for small one-liners, but when it comes to complex, multi-line programs, you'll definitely want to compose your script in an external file. Awk can then be told to source this script file by passing it the -f option:

$ awk -f myscript.awk myfile.in

Putting your scripts in their own text files also allows you to take advantage of additional awk features. For example, this multi-line script does the same thing as one of our earlier one-liners, printing out the first field of each line in /etc/passwd:

BEGIN {
    FS=":"
}
{ print $1 }

The difference between these two methods has to do with how we set the field separator. In this script, the field separator is specified within the code itself (by setting the FS variable), while our previous example set FS by passing the -F":" option to awk on the command line. It's generally best to set the field separator inside the script itself, simply because it means you have one less command line argument to remember to type. We'll cover the FS variable in more detail later in this article.

The BEGIN and END blocks

Normally, awk executes each block of your script's code once for each input line. However, there are many programming situations where you may need to execute initialization code before awk begins processing the text from the input file. For such situations, awk allows you to define a BEGIN block. We used a BEGIN block in the previous example. Because the BEGIN block is evaluated before awk starts processing the input file, it's an excellent place to initialize the FS (field separator) variable, print a heading, or initialize other global variables that you'll reference later in the program.

Awk also provides another special block, called the END block. Awk executes this block after all lines in the input file have been processed. Typically, the END block is used to perform final calculations or print summaries that should appear at the end of the output stream.
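As a rough sketch of how the three kinds of block fit together, the script below prints a heading from BEGIN, handles each line in the main block, and prints a count from END. The sample lines and the counter name n are made up for this illustration.

```shell
# Hypothetical passwd-style input.
sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync'

report=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = ":"; print "Accounts:" }   # runs once, before any input is read
      { print $1; n = n + 1 }           # runs once per input line
END   { print n " accounts listed" }    # runs once, after the last line
')

printf '%s\n' "$report"
```

With this input, the heading comes first, then root, bin, and sync, and finally the line "3 accounts listed".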

Regular expressions and blocks

Awk allows the use of regular expressions to selectively execute an individual block of code, depending on whether or not the regular expression matches the current line. Here's an example script that outputs only those lines that contain the character sequence foo:

/foo/ { print }

Of course, you can use more complicated regular expressions. Here's a script that will print only lines that contain a floating point number:

/[0-9]+\.[0-9]*/ { print }
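If you want to check what the floating point pattern /[0-9]+\.[0-9]*/ actually accepts, a quick experiment along these lines may help. The four input lines are invented for the test.

```shell
# Only lines containing digits followed by a literal dot should pass the filter.
matches=$(printf '%s\n' 'pi is 3.14' 'no digits here' 'version 2' 'total: 10.' |
  awk '/[0-9]+\.[0-9]*/ { print }')

printf '%s\n' "$matches"
```

Note that "total: 10." matches as well, because the pattern allows zero digits after the dot, while "version 2" is skipped because it has no dot at all.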

Expressions and blocks

There are many other ways to selectively execute a block of code. We can place any kind of boolean expression before a block of code to control when that block is executed. Awk will execute a block of code only if the preceding boolean expression evaluates to true. The following example script will output the third field of all lines that have a first field equal to fred. If the first field of the current line is not equal to fred, awk will continue processing the file and will not execute the print statement for the current line:

$1 == "fred" { print $3 }

Awk offers a full selection of comparison operators, including the usual "==", "<", ">", "<=", ">=", and "!=". In addition, awk provides the "~" and "!~" operators, which mean "matches" and "does not match". They're used by specifying a variable on the left side of the operator and a regular expression on the right side. Here's an example that will print only the third field on the line if the fifth field on the same line contains the character sequence root:

$5 ~ /root/ { print $3 }
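The difference between an exact comparison and a regular expression match is easy to see side by side. This sketch uses three hypothetical passwd-style lines; the account names are assumptions, not entries you should expect on a real system.

```shell
sample='fred:x:1000:1000:Fred Smith:/home/fred:/bin/bash
root:x:0:0:root:/root:/bin/bash
toor:x:0:0:backup root account:/root:/bin/sh'

# "==" requires the whole field to equal the string.
exact=$(printf '%s\n' "$sample" | awk -F":" '$1 == "fred" { print $3 }')

# "~" only requires the pattern to occur somewhere inside the field.
fuzzy=$(printf '%s\n' "$sample" | awk -F":" '$5 ~ /root/ { print $1 }')

printf '%s\n' "$exact" "$fuzzy"
```

The exact comparison selects only fred's line and prints 1000, while the match selects both root and toor, since "root" appears somewhere in the fifth field of each.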

Conditional statements

Awk also offers very nice C-like if statements. If you'd like, you could rewrite the previous script using an if statement:

{
    if ( $5 ~ /root/ ) {
        print $3
    }
}

Both scripts function identically. In the first example, the boolean expression is placed outside the block, while in the second example, the block is executed for every input line, and we selectively perform the print command by using an if statement. Both methods are available, and you can choose the one that best meshes with the other parts of your script.

Here's a more complicated example of an awk if statement. As you can see, even with complex, nested conditionals, if statements look identical to their C counterparts:

{
    if ( $1 == "foo" ) {
        if ( $2 == "foo" ) {
            print "uno"
        } else {
            print "one"
        }
    } else if ( $1 == "bar" ) {
        print "two"
    } else {
        print "three"
    }
}

Using if statements, we can also transform this code:

! /matchme/ { print $1 $3 $4 }

to this:

{
    if ( $0 !~ /matchme/ ) {
        print $1 $3 $4
    }
}

Both scripts will output only those lines that don't contain the matchme character sequence. Again, you can choose the method that works best for your code. They both do the same thing.
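One way to convince yourself that the pattern form and the if form behave the same is to run both over the same input and compare. The three input lines here are invented for the comparison.

```shell
sample='a matchme b c
keep 1 2 3
also keep 4 5'

# Negated pattern in front of the block.
by_pattern=$(printf '%s\n' "$sample" | awk '! /matchme/ { print $1 $3 $4 }')

# Same test, moved inside the block as an if statement.
by_if=$(printf '%s\n' "$sample" | awk '{ if ( $0 !~ /matchme/ ) { print $1 $3 $4 } }')

printf '%s\n' "$by_pattern"
[ "$by_pattern" = "$by_if" ] && echo "both forms agree"
```

Both forms skip the first line and print keep23 and also45 for the other two.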

Awk also allows the use of the boolean operators "||" (logical or) and "&&" (logical and) to create more complex boolean expressions:

( $1 == "foo" ) && ( $2 == "bar" ) { print }

This example will print only those lines where field one equals foo and field two equals bar.
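Here is a small sketch contrasting the two boolean operators on the same made-up input, where the third column is just a line label added for illustration.

```shell
sample='foo bar 1
foo baz 2
qux bar 3'

# "&&" needs both comparisons to be true.
both=$(printf '%s\n' "$sample" | awk '( $1 == "foo" ) && ( $2 == "bar" ) { print }')

# "||" needs only one of them to be true.
either=$(printf '%s\n' "$sample" | awk '( $1 == "foo" ) || ( $2 == "bar" ) { print $3 }')

printf '%s\n' "$both" "$either"
```

Only the first line satisfies the "and" version, while all three lines satisfy the "or" version.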

Numeric variables!

So far, we've either printed strings, the entire line, or specific fields. However, awk also allows us to perform both integer and floating point math. Using mathematical expressions, it's very easy to write a script that counts the number of blank lines in a file. Here's one that does just that:

BEGIN { x=0 }
/^$/ { x=x+1 }
END { print "I found " x " blank lines. :)" }

In the BEGIN block, we initialize our integer variable x to zero. Then, each time awk encounters a blank line, it executes the x=x+1 statement, incrementing x. After all the lines have been processed, the END block executes, and awk prints a final summary specifying the number of blank lines it found.
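To try the blank-line counter without creating a file, the input can be supplied with printf. The six-line input below (three of them empty) is an assumption chosen for the demonstration, and the message is shortened slightly.

```shell
# Input lines: "one", "", "two", "", "", "three"
count=$(printf 'one\n\ntwo\n\n\nthree\n' | awk '
BEGIN { x = 0 }
/^$/  { x = x + 1 }                        # ^$ matches a line with nothing on it
END   { print "I found " x " blank lines." }
')

printf '%s\n' "$count"
```

This should report three blank lines. A line containing only spaces would not be counted, since /^$/ matches only completely empty lines.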

Stringy variables

One of the neat things about awk variables is that they are "simple and stringy." I consider awk variables "stringy" because all awk variables are stored internally as strings. At the same time, awk variables are "simple" because you can perform mathematical operations on a variable, and as long as it contains a valid numeric string, awk automatically takes care of the string-to-number conversion steps. To see what I mean, check out this example:

x="1.01"
# We just set x to contain the *string* "1.01"
x=x+1
# We just added one to a *string*
print x
# Incidentally, these are comments :)

Awk will output:

2.01

Interesting! Although we assigned the string value 1.01 to the variable x, we were still able to add one to it. We wouldn't be able to do this in bash or Python. First of all, bash doesn't support floating point arithmetic. And, while bash has "stringy" variables, they aren't "simple"; to perform any mathematical operation, bash requires that we enclose our math in an ugly $(( )) construct. If we were using Python, we would have to explicitly convert our 1.01 string to a floating point value before performing any arithmetic on it. While this isn't difficult, it's still an additional step. With awk, it's all automatic, and that makes our code nice and clean. If we wanted to square the first field of each input line and add one to it, we would use this script:

{ print ($1^2)+1 }

If you do a little experimenting, you'll find that if a particular variable doesn't contain a valid number, awk will treat that variable as a numerical zero when it evaluates your mathematical expression.
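The "little experiment" can be as short as the sketch below, which runs entirely inside a BEGIN block so that no input file is needed. The variable names x and y are arbitrary.

```shell
result=$(awk 'BEGIN {
    x = "1.01";  print x + 1    # numeric string, converted automatically
    y = "hello"; print y + 1    # not a valid number, so it counts as 0
}')

printf '%s\n' "$result"
```

The first print gives 2.01, and the second gives 1, because "hello" is treated as zero in the addition.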

Lots of operators

Another nice thing about awk is its full complement of mathematical operators. In addition to standard addition, subtraction, multiplication, and division, awk allows us to use the previously demonstrated exponent operator "^", the modulo (remainder) operator "%", and a bunch of other handy assignment operators borrowed from C.

These include pre- and post-increment/decrement (i++, --foo) and add/subtract/multiply/divide assignment operators (a+=3, b*=2, c/=2.2, d-=6.2). But that's not all: we also get handy modulo/exponent assignment operators (a^=2, b%=4).
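A few of these operators can be exercised in a BEGIN block, again without any input. The starting values are picked arbitrarily for this sketch.

```shell
ops=$(awk 'BEGIN {
    i = 5;  i++;     print i    # increment: 6
    a = 4;  a += 3;  print a    # add-assign: 7
    b = 3;  b *= 2;  print b    # multiply-assign: 6
    c = 9;  c ^= 2;  print c    # exponent-assign: 81
    d = 10; d %= 4;  print d    # modulo-assign: 2
    print 7 % 3, 2 ^ 10         # remainder and exponent: 1 1024
}')

printf '%s\n' "$ops"
```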

Field separators

Awk has its own complement of special variables. Some of them allow you to fine-tune how awk functions, while others can be read to glean valuable information about the input. We've already touched on one of these special variables, FS. As mentioned earlier, this variable allows you to set the character sequence that awk expects to find between fields. When we were using /etc/passwd as input, FS was set to ":". While this did the trick, FS allows us even more flexibility.

The FS value is not limited to a single character; it can also be set to a regular expression, specifying a character pattern of any length. If you're processing fields separated by one or more tabs, you'll want to set FS like so:

FS="\t+"

Above, we use the special "+" regular expression character, which means "one or more of the previous character".

If your fields are separated by whitespace (one or more spaces or tabs), you may be tempted to set FS to the following regular expression:

FS="[[:space:]]+"

While this assignment will do the trick, it's not necessary. Why? Because by default, FS is set to a single space character, which awk interprets to mean "one or more spaces or tabs." In this particular example, the default FS setting was exactly what you wanted in the first place!

Complex regular expressions are no problem. Even if your fields are separated by the word "foo," followed by three digits, the following regular expression will allow your data to be parsed properly:

FS="foo[0-9][0-9][0-9]"
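Both kinds of regular expression separator can be tried on a single line of input each. The input strings below are made up purely to exercise the two FS settings.

```shell
# Runs of one or more tabs act as a single separator.
tabs=$(printf 'a\t\t\tb\tc\n' | awk 'BEGIN { FS = "\t+" } { print $2 "-" $3 }')

# The word "foo" followed by three digits acts as the separator.
words=$(printf 'alphafoo123betafoo456gamma\n' |
  awk 'BEGIN { FS = "foo[0-9][0-9][0-9]" } { print $1, $2, $3 }')

printf '%s\n' "$tabs" "$words"
```

The first command prints b-c, and the second prints alpha beta gamma, since the comma in print inserts a space between the values.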

Number of fields

The next two variables we're going to cover are not normally intended to be written to, but are instead read to gain useful information about the input. The first is the NF variable, also called the "number of fields" variable. Awk will automatically set this variable to the number of fields in the current record. You can use the NF variable to display only certain input lines:

NF == 3 { print "this particular record has three fields: " $0 }

Of course, you can also use the NF variable in conditional statements, as follows:

{
    if ( NF > 2 ) {
        print $1 " " $2 ":" $3
    }
}
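To watch NF change from record to record, you can print it for a few lines of differing length. The three sample lines are hypothetical and use the default whitespace field separator.

```shell
sample='one
a b c
x y z w'

# NF is recomputed for every record.
counts=$(printf '%s\n' "$sample" | awk '{ print NF }')

# Used as a pattern, NF selects only the records with exactly three fields.
three=$(printf '%s\n' "$sample" | awk 'NF == 3 { print "three fields: " $0 }')

printf '%s\n' "$counts" "$three"
```

The field counts come out as 1, 3, and 4, and only the middle line passes the NF == 3 test.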

Record number

The record number (NR) is another handy variable. It will always contain the number of the current record (awk counts the first record as record number 1). Up until now, we've been dealing with input files that contain one record per line. For these situations, NR will also tell you the current line number. However, when we start to process multi-line records later in the series, this will no longer be the case, so be careful! NR can be used like the NF variable to print only certain lines of the input:

(NR < 10) || (NR > 100) { print "We are on record number 1-9 or 101+" }

Another example:

{
    # skip header
    if ( NR > 10 ) {
        print "ok, now for the real information!"
    }
}
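A shorter variant of the header-skipping idea, assuming a hypothetical input whose header is only one line long, also shows NR acting as a line number:

```shell
# Skip the first record and number the rest.
numbered=$(printf '%s\n' 'header' 'alpha' 'beta' | awk 'NR > 1 { print NR ": " $0 }')

printf '%s\n' "$numbered"
```

The header line is record 1 and is skipped, so the output starts at "2: alpha".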

Awk provides additional variables that can be used for a variety of purposes. We'll cover more of these variables in later articles.

We've come to the end of our initial exploration of awk. As the series continues, I'll demonstrate more advanced awk functionality, and we'll end the series with a real-world awk application. In the meantime, if you're eager to learn more, check out the resources listed below.

Resources

If you'd like a good old-fashioned book, O'Reilly's sed & awk, 2nd Edition is a wonderful choice.
Be sure to check out the comp.lang.awk FAQ. It also contains lots of additional awk links.
Patrick Hartigan's awk tutorial is packed with handy awk scripts.
Thompson's TAWK Compiler compiles awk scripts into fast binary executables. Versions are available for Windows, OS/2, DOS, and UNIX.
The GNU Awk User's Guide is available for online reference.

About the author

Daniel Robbins lives in Albuquerque, New Mexico. He is the founder and CEO of Gentoo Technologies, Inc., and the creator of Gentoo Linux, an advanced Linux for the PC, and of the Portage system, a next-generation ports system for Linux. He has also served as a contributing author for the Macmillan books Caldera OpenLinux Unleashed, SuSE Linux Unleashed, and Samba Unleashed. Daniel has been involved with computers in some fashion since the second grade, when he was first exposed to the Logo programming language as well as a potentially dangerous dose of Pac-Man. This probably explains why he has since served as a lead graphic artist at Sony Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife, Mary, and his new baby daughter, Hadassah. You can contact Daniel at drobbins@gentoo.org.
