Awk by example, Part 3


Awk by example, Part 3: String functions and ... checkbooks? Daniel Robbins, President and CEO, Gentoo Technologies, Inc., 2001. Reprinted from: IBM developerWorks China website

Contents:

Formatted output, String functions, String replacement, Special string forms, Financial fun, The code, Financial functions, The main block, Generating the report, Upgrades, References, About the author

In this conclusion to the awk series, Daniel introduces you to awk's important string functions and then shows you how to write a complete checkbook-balancing program from scratch. Along the way, you will learn how to write your own functions and use awk's multidimensional arrays. By the end of this article, you will have even more awk experience, allowing you to create more powerful scripts.

Formatted output

While awk's print statement does the job most of the time, sometimes more is needed. For those times, awk offers two good old friends called printf() and sprintf(). Yes, like so many other parts of awk, these functions are equivalent to their C counterparts. printf() prints a formatted string to stdout, while sprintf() returns a formatted string that can be assigned to a variable. If you are not familiar with printf() and sprintf(), an introductory C text will quickly get you up to speed on these two essential printing functions. On a Linux system, you can type "man 3 printf" to view the printf() man page. Here is some sample awk sprintf() and printf() code. As you can see, it looks almost identical to C.

    x = 1
    b = "foo"
    printf("%s got a %d on the last test\n", "Jim", 83)
    myout = sprintf("%s-%d", b, x)
    print myout

This code will print:

    Jim got a 83 on the last test
    foo-1
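The same two functions also handle column alignment, which the checkbook report later in this article relies on ("%10.2f"). The following is a minimal sketch run from the shell; the label "food" and the numbers are made up for illustration, and the shell variable "out" exists only to capture the result.

```sh
# Capture the output of a small awk program in a shell variable.
out=$(awk 'BEGIN {
    # sprintf() returns the formatted string instead of printing it
    label = sprintf("%-8s|", "food")      # left-justified, padded to 8 characters
    # %10.2f: right-aligned, 10 characters wide, 2 decimal places
    printf("%s%10.2f\n", label, 30.25)
    # %05d: integer, zero-padded to 5 digits
    printf("%05d\n", 42)
}')
printf '%s\n' "$out"
```

The first line shows the label padded on the right and the amount padded on the left, which is how fixed-width report columns are usually built in awk.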

String functions

Awk has a plethora of string functions, and that is a good thing. In awk, you really need string functions, since you cannot treat a string as an array of characters as you can in other languages such as C, C++ and Python. For example, if you execute the following code:

    mystring = "How are you doing today?"
    print mystring[3]

you will receive an error that looks something like this:

    awk: string.gawk:59: fatal: attempt to use scalar as array

Oh, well. While not as convenient as Python's sequence types, awk's string functions still get the job done. Let's take a look at them. First, we have the basic length() function, which returns the length of a string. Here is how to use it:

    print length(mystring)

This code will print the value:

    24

OK, let's keep going. The next string function is called index(), and it returns the position at which a substring occurs in another string, or 0 if the substring is not found. Using mystring, we can call it this way:

    print index(mystring, "you")

Awk will print:

    9

We move on to two more easy functions, tolower() and toupper(). As you might guess, these functions return the string with all characters converted to lowercase or uppercase respectively. Note that tolower() and toupper() return a new string and do not modify the original. This code:

    print tolower(mystring)
    print toupper(mystring)
    print mystring

will produce this output:

    how are you doing today?
    HOW ARE YOU DOING TODAY?
    How are you doing today?

So far so good, but how exactly do we select a substring, or even a single character, from a string? That is where substr() comes in. Here is how to call substr():

    mysub = substr(mystring, startpos, maxlen)

mystring should be either a string variable or a literal string from which you would like to extract a substring. startpos should be set to the starting character position, and maxlen should contain the maximum length of the string you would like to extract. Notice that I said maximum length; if mystring has fewer than startpos+maxlen-1 characters, the result will be shorter than maxlen. substr() does not modify the original string, but returns the substring instead. Here is an example:

    print substr(mystring, 9, 3)

Awk will print:

    you

If you regularly program in a language that uses array subscripts to access parts of a string (and who doesn't?), make a mental note that substr() is your awk substitute. You will need it to extract single characters and substrings; because awk is a string-based language, you will use it often.

Now we move on to some meatier functions, the first of which is match(). match() is a lot like index(), except that instead of searching for a substring, it searches for a regular expression. The match() function returns the starting position of the match, or 0 if no match is found. In addition, match() sets two variables called RSTART and RLENGTH. RSTART contains the return value (the position of the first match), and RLENGTH specifies its span in characters (or -1 if no match was found). Using RSTART, RLENGTH, substr() and a small loop, you can easily iterate through every match in your string. Here is an example match() call:

    print match(mystring, /you/), RSTART, RLENGTH

Awk will print:

    9 9 3

String replacement

Now we are going to look at a couple of string substitution functions, sub() and gsub(). These differ slightly from the functions we have looked at so far in that they actually modify the original string. Here is a template that shows how to call sub():

    sub(regexp, replstring, mystring)

When you call sub(), it finds the first sequence of characters in mystring that matches regexp and replaces that sequence with replstring. sub() and gsub() take identical arguments; the only difference is that sub() replaces the first regexp match (if any), while gsub() performs a global replacement, swapping out all matches in the string. Here is an example sub() and gsub() call:

    sub(/o/, "O", mystring)
    print mystring
    mystring = "How are you doing today?"
    gsub(/o/, "O", mystring)
    print mystring

We had to reset mystring to its original value because the first sub() call modified mystring directly. When executed, this code will cause awk to output:

    HOw are you doing today?
    HOw are yOu dOing tOday?

Of course, more complex regular expressions are possible. I will leave it up to you to test out some complicated regexps.

We wrap up our string function coverage by introducing a function called split(). split()'s job is to "chop up" a string and place the various parts into an integer-indexed array. Here is an example split() call:

    numelements = split("Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec", mymonths, ",")

When calling split(), the first argument contains the literal string or string variable to be chopped. In the second argument, you specify the name of the array that split() will fill with the pieces. In the third argument, you specify the separator that will be used to chop the string up. When split() returns, it returns the number of string elements that were split. split() assigns each piece to an array index starting with one, so the following code:

    print mymonths[1], mymonths[numelements]

will print:

    Jan Dec

Special string forms

A quick note: when calling length(), sub() or gsub(), you can drop the last argument and awk will apply the function call to $0 (the entire current line). To print the length of each line in a file, use this awk script:

    {
        print length()
    }

Financial fun

A few weeks ago, I decided to write my own checkbook balancing program in awk. I decided that I would like a simple tab-delimited text file into which I can enter my most recent deposits and withdrawals. The idea was to hand this data to an awk script that would automatically add up all the amounts and tell me my balance. Here is how I decided to record all my transactions in my "ASCII checkbook":

    23 Aug 2000    food    -    -    Y    Jimmy's Buffet    30.25

Every field in this file is separated by one or more tabs. After the date (field 1, $1), there are two fields called "expense category" and "income category". When I enter an expense like the one on the line above, I put a four-letter nickname in the expense field and a "-" (blank entry) in the income field. This signifies that this particular item is a "food expense" :) Here is what a deposit looks like:

    23 Aug 2000    -    inco    -    Y    Boss Man    2001.00

In this case, I put a "-" (blank) in the expense category and "inco" in the income category. "inco" is my nickname for generic (paycheck-style) income. Using category nicknames allows me to generate a breakdown of my income and expenses by category. As for the rest of the record, all the other fields are fairly self-explanatory. The "cleared?" field ("Y" or "N") records whether the transaction has been posted to my account; beyond that, there is a transaction description and a positive dollar amount.

The algorithm used to compute the current balance is not too hard. Awk simply needs to read in each line, one by one. If an expense category is listed but there is no income category (it is "-"), then the item is a debit. If an income category is listed but there is no expense category (it is "-"), then the item is a credit. And if both an expense and an income category are listed, then the amount is a "category transfer"; that is, the dollar amount is subtracted from the expense category and added to the income category. Again, all these categories are virtual, but they are very useful for tracking income and expenditures, as well as for budgeting.
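The classification rule described above can be sketched on its own before looking at the full program. In this sketch, which is an illustration and not the article's code, the third record (a "food" to "misc" transfer) and the shell variables "records" and "out" are made up; the field layout and the tab separator follow the description above.

```sh
# Three sample checkbook records, fields separated by tabs.
records=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    "23 Aug 2000" "food" "-" "-" "Y" "Jimmy's Buffet" "30.25" \
    "23 Aug 2000" "-" "inco" "-" "Y" "Boss Man" "2001.00" \
    "24 Aug 2000" "food" "misc" "-" "N" "Budget shuffle" "50.00")

out=$(printf '%s\n' "$records" | awk '
BEGIN { FS = "\t+" }                       # one or more tabs between fields
{
    # $2 is the expense category, $3 the income category, "-" means blank
    if ($2 != "-" && $3 == "-")      kind = "debit"
    else if ($2 == "-" && $3 != "-") kind = "credit"
    else if ($2 != "-" && $3 != "-") kind = "transfer"
    else                             kind = "invalid"
    print kind, $7, $6                     # $7 is the amount, $6 the description
}')
printf '%s\n' "$out"
```

Because the separator is a tab, the spaces inside the date and the description do not split those fields.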

The code

Now it is time to look at the code. We will start with the first lines, the BEGIN block and a function definition:

Balance, Part 1

    #!/usr/bin/env awk -f
    BEGIN {
        FS = "\t+"
        months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
    }

    function monthdigit(mymonth) {
        return (index(months, mymonth) + 3) / 4
    }

Adding the first "#!..." line to any awk script allows it to be executed directly from the shell, provided that you run "chmod +x myscript" first. (On systems where env does not accept the extra "-f" argument on the "#!" line, point the line at the awk binary itself, for example "#!/usr/bin/awk -f" if that is where awk lives.) The remaining lines define our BEGIN block, which gets executed before awk starts processing our checkbook file. We set FS (the field separator) to "\t+", which tells awk that the fields are separated by one or more tabs. In addition, we define a string called months, which is used by the monthdigit() function that follows.

The last three lines show how to define your own awk function. The format is simple: type "function", then the function name, and then the parameters, separated by commas, inside parentheses. After this, a "{ }" code block contains the code that you would like the function to execute. All functions can access global variables (like our months variable). In addition, awk provides a "return" statement that allows the function to return a value, and it operates like the "return" found in C and other languages. This particular function converts a month name in 3-letter string format into its numeric equivalent. For example, this:

    print monthdigit("Mar")

will print:

    3

Now let's move on to some more functions.
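The arithmetic inside monthdigit() is easier to follow with the intermediate index() values printed. The sketch below reuses the function and the months string from Part 1 of the script; the test names, including the deliberately invalid "Foo", are chosen here for illustration.

```sh
out=$(awk '
function monthdigit(mymonth) {
    return (index(months, mymonth) + 3) / 4
}
BEGIN {
    months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
    # Each name plus its trailing space occupies 4 characters, so month k
    # starts at position 4*(k-1)+1, and (position + 3) / 4 gives k.
    n = split("Jan Mar Dec", names, " ")
    for (i = 1; i <= n; i++)
        print names[i], index(months, names[i]), monthdigit(names[i])
    # A name that is not in the list gives index 0, so the result is 0.75, not a month
    print "Foo", index(months, "Foo"), monthdigit("Foo")
}')
printf '%s\n' "$out"
```

The last line shows that the function does no validation: anything that is not a whole number between 1 and 12 signals a date the lookup did not recognize.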

Financial functions

Here are three more functions that perform our bookkeeping for us. The main code block, which we will see soon, processes each line of the checkbook file sequentially and calls one of these functions so that the appropriate transaction is recorded in an awk array. There are three basic kinds of transactions: credit (doincome), debit (doexpense) and transfer (dotransfer). You will notice that all three functions accept one argument, called mybalance. mybalance is a placeholder for a two-dimensional array, which we pass in as an argument. Up until now, we have not dealt with two-dimensional arrays; however, as you can see below, the syntax is quite simple. Just separate each dimension with a comma.

We record information into "mybalance" as follows. The first dimension of the array ranges from 0 to 12 and specifies the month, with 0 standing for the entire year. The second dimension is a four-letter category, like "food" or "inco"; this is the actual category we are dealing with. So, to find the entire year's balance for the food category, you would look in mybalance[0,"food"]. To find June's income, you would look in mybalance[6,"inco"].

Balance, Part 2

    function doincome(mybalance) {
        mybalance[curmonth, $3] += amount
        mybalance[0, $3] += amount
    }

    function doexpense(mybalance) {
        mybalance[curmonth, $2] -= amount
        mybalance[0, $2] -= amount
    }

    function dotransfer(mybalance) {
        mybalance[0, $2] -= amount
        mybalance[curmonth, $2] -= amount
        mybalance[0, $3] += amount
        mybalance[curmonth, $3] += amount
    }

When doincome() or any of the other functions is called, we record the transaction in two places: mybalance[0,category] and mybalance[curmonth,category], which represent the category balance for the entire year and for the current month, respectively. This allows us to easily generate either an annual or a monthly breakdown of income and expenditures later on.

If you look at these functions, you will notice that the array referenced by mybalance is passed in by reference. In addition, we also refer to several global variables: curmonth, which holds the numeric value of the month of the current record; $2 (the expense category); $3 (the income category); and amount ($7, the dollar amount). When doincome() and friends are called, all these variables have already been set correctly for the current record.
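Two properties do most of the work in these functions: a comma-separated subscript is stored as a single string key joined by the built-in SUBSEP variable, and arrays are passed to functions by reference. The following is a minimal sketch of both; the helper name credit(), the array name bal and the amounts are hypothetical and not part of the article's script.

```sh
out=$(awk '
# Hypothetical helper: adds amount to both the monthly and the yearly slot
function credit(arr, month, cat, amount) {
    arr[month, cat] += amount
    arr[0, cat] += amount
}
BEGIN {
    credit(bal, 6, "inco", 2001.00)
    credit(bal, 7, "inco", 500.00)
    # The function modified the caller array, since arrays are passed by reference
    print bal[6, "inco"], bal[0, "inco"]
    # A two-dimensional subscript is stored as one string joined by SUBSEP
    if ((6 SUBSEP "inco") in bal) print "key found"
    # The (i, j) in arr form tests membership without creating the element
    if ((8, "inco") in bal) print "August entry exists"
    else print "no August entry"
}')
printf '%s\n' "$out"
```

Testing with "in" rather than reading bal[8,"inco"] matters in practice, because merely referencing a missing element creates it as an empty entry.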

The main block

Here is the main block, which contains the code that parses each line of input data. Remember, because we have set FS correctly, we can refer to the first field as $1, the second field as $2, and so on. When doincome() and friends are called, the functions can access the current values of curmonth, $2, $3 and amount from inside the function. Take a look at the code first; my explanation follows it.

Balance, Part 3

    {
        curmonth = monthdigit(substr($1, 4, 3))
        amount = $7

        # record all the categories encountered
        if ($2 != "-")
            globcat[$2] = "yes"
        if ($3 != "-")
            globcat[$3] = "yes"

        # tally up the transaction properly
        if ($2 == "-") {
            if ($3 == "-") {
                print "Error: inc and exp fields are both blank!"
                exit 1
            } else {
                # this is income
                doincome(balance)
                if ($5 == "Y")
                    doincome(balance2)
            }
        } else if ($3 == "-") {
            # this is an expense
            doexpense(balance)
            if ($5 == "Y")
                doexpense(balance2)
        } else {
            # this is a transfer
            dotransfer(balance)
            if ($5 == "Y")
                dotransfer(balance2)
        }
    }

In the main block, the first two lines set curmonth to an integer between 1 and 12 and set amount to field 7 (to make the code easier to understand). Then come four interesting lines that write values into an array called globcat. globcat, the global categories array, is used to record all the categories encountered in the file: "inco", "misc", "food", "util", and so on. For example, if $2 == "inco", we set globcat["inco"] to "yes". Later on, we can iterate through our list of categories with a simple "for (x in globcat)" loop.

In the next twenty or so lines, we analyze fields $2 and $3 and record the transaction appropriately. If $2 == "-" and $3 != "-", we have some income, so we call doincome(). If the situation is reversed, we call doexpense(); and if both $2 and $3 contain categories, we call dotransfer(). Each time, we pass the "balance" array to these functions so that the appropriate data is recorded there.

You will also notice several lines that say "if ($5 == "Y"), record that same transaction in balance2". What exactly are we doing here? You will recall that $5 contains either a "Y" or an "N" and records whether the transaction has been posted to the account. Because we record the transaction in balance2 only if it has been posted, balance2 contains the actual account balance, while "balance" contains all transactions, whether they have been posted or not. You can use balance2 to verify your data entry (since it should match your current bank account balance), and use "balance" to make sure that you do not overdraw your account (since it takes into account any checks you have written that have not yet been cashed).

Generating the report

After the main block has processed each input record, we have a fairly comprehensive record of debits and credits broken down by category and by month. Now all we need to do is define an END block that generates a report, in this case a modest one:

Balance, Part 4

    END {
        bal = 0
        bal2 = 0
        for (x in globcat) {
            bal = bal + balance[0, x]
            bal2 = bal2 + balance2[0, x]
        }
        printf("Your available funds: %10.2f\n", bal)
        printf("Your account balance: %10.2f\n", bal2)
    }

This report prints a summary that looks something like this:

    Your available funds:    1174.22
    Your account balance:    2399.33

In our END block, we used the "for (x in globcat)" construct to iterate through every category, tallying up a master balance based on all the transactions recorded. We actually tally up two balances: one for available funds, and another for the account balance. To execute the program and process your own financial data that you have entered into a file called "mycheckbook.txt", put all the above code into a text file called "balance", run "chmod +x balance", and then type "./balance mycheckbook.txt". The balance script will then add up all your transactions and print out a two-line balance summary.

Upgrades

I use a more advanced version of this program to manage my personal and business finances. My version (which I could not include here due to space limitations) prints out a monthly breakdown of income and expenses, including annual totals, net income and a bunch of other stuff. It even outputs the data in HTML format, so that I can view it in a Web browser :) If you find this program useful, I encourage you to add these features to this script. You will not need to configure it to record any additional information; all the information you need is already in balance and balance2. Just upgrade the END block and you are done!

I hope you have enjoyed this series. For more information on awk, check out the references listed below.
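As one possible starting point for the END block upgrade suggested above, here is a minimal sketch of a per-category annual breakdown. It is not the author's advanced version. To keep it self-contained, globcat and balance are filled by hand in a BEGIN block with made-up categories and amounts; in the real script the loop would live in END, where both arrays are already populated. Piping through the external sort command is an assumption made here only because "for (x in globcat)" visits categories in no guaranteed order.

```sh
out=$(awk 'BEGIN {
    # Stand-in data for what the main block would have recorded
    globcat["food"] = "yes"; balance[0, "food"] = -30.25
    globcat["inco"] = "yes"; balance[0, "inco"] = 2001.00
    globcat["util"] = "yes"; balance[0, "util"] = -120.50

    net = 0
    for (x in globcat) {
        # Index 0 in the first dimension holds the whole-year total
        printf("%-6s %10.2f\n", x, balance[0, x]) | "sort"
        net += balance[0, x]
    }
    close("sort")      # let sort finish printing before the summary line
    printf("%-6s %10.2f\n", "net", net)
}')
printf '%s\n' "$out"
```

Replacing the 0 with a month number from 1 to 12 in the same loop would give the monthly breakdown mentioned in the text.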

References

Read Daniel's earlier articles in the awk series on developerWorks: Awk by example, Part 1 and Part 2.
If you would like a good old-fashioned book, O'Reilly's sed & awk, 2nd Edition is an excellent choice.
Be sure to check out the comp.lang.awk FAQ. It also contains many additional awk links.
Patrick Hartigan's awk tutorial also includes practical awk scripts.
Thompson's TAWK Compiler compiles awk scripts into fast binary executables. Versions are available for Windows, OS/2, DOS and UNIX.
The GNU Awk User's Guide is available for online reference.

About the author

Daniel Robbins lives in Albuquerque, New Mexico. He is the founder and CEO of Gentoo Technologies, Inc., and the creator of Gentoo Linux (an advanced Linux for the PC) and the Portage system (a next-generation ports system for Linux). He has also served as a contributing author for the Macmillan books Caldera OpenLinux Unleashed, SuSE Linux Unleashed and Samba Unleashed. Daniel has been involved with computers since the second grade, when he was first exposed to the Logo programming language and became hooked on Pac-Man. This probably explains why he has since served as a Lead Graphic Artist at Sony Electronic Publishing/Psygnosis. Daniel enjoys spending time with his wife, Mary, and his newborn daughter, Hadassah. You can contact Daniel at DRobbins@gentoo.org.

When reprinting, please cite the original address: https://www.9cbs.com/read-92768.html
