Sed Examples

xiaoxiao2021-03-06  51

Common Threads: Sed by Example, Part 1

In this article, Daniel Robbins shows you how to use the very powerful (but often forgotten) UNIX stream editor, sed. Sed is an ideal tool for batch-editing files or for creating shell scripts that modify existing files in powerful ways.

Pick an editor

In the UNIX world, we have a lot of options when it comes to editing files. Think of it: vi, emacs, and jed come to mind, as well as many others. We all have a favorite editor (along with our favorite keybindings) that we have come to know and love. With a trusty editor, we can tackle any number of UNIX-related administration or programming tasks with ease.

While interactive editors are great, they do have limitations. Their interactive nature can be a strength, but it can also be a weakness. Consider a situation where you need to perform similar changes on a group of files. You could fire up your favorite editor and perform a series of mundane, repetitive, and time-consuming edits by hand. But there is a better way.

Enter sed

It would be nice if we could automate the process of editing files, so that we could edit files in "batch" mode, or even write scripts that perform sophisticated changes to existing files. Fortunately, for these situations there is a better way, and that better way is called sed.

sed is a lightweight stream editor that is included with nearly all UNIX flavors, including Linux. sed has a lot of nice features. First of all, it is very small, typically many times smaller than your favorite scripting language. Second, because sed is a stream editor, it can edit data it receives from standard input, such as from a pipeline. So there is no need to store the data to be edited in a file on disk. Because data can just as easily be piped to sed, it is easy to use sed as one stage of a long pipeline in a powerful shell script. Try doing that with your favorite editor.
GNU sed

Fortunately for us Linux users, one of the nicest versions of sed out there happens to be GNU sed, whose current version is 3.02. Every Linux distribution has GNU sed, or at least should. GNU sed is popular not only because its source code is freely distributable, but also because it has a lot of handy, time-saving extensions. In addition, GNU sed does not suffer from many of the limitations of earlier and proprietary versions of sed, such as a limited line length; GNU sed handles lines of any length with ease.

The newest GNU sed

While researching this article, I noticed that several online sed enthusiasts referred to GNU sed 3.02a. Strangely, I could not find sed 3.02a on ftp.gnu.org (see Resources for these links), so I had to look for it elsewhere. I found it at alpha.gnu.org, in /pub/sed. I happily downloaded it, compiled it, and installed it, only to find a few minutes later that the most recent version of sed was 3.02.80, whose sources sit right next to the 3.02a sources on alpha.gnu.org. After getting GNU sed 3.02.80 installed, I was finally ready to go.

alpha.gnu.org

alpha.gnu.org (see Resources) is the home of new and experimental GNU source code. However, you will also find a lot of nice, stable source code there. For some reason, many GNU developers either forget to move stable sources over to ftp.gnu.org, or have unusually long "beta" periods (two years!).

For example, sed 3.02a is two years old, and even 3.02.80 is a year old, but neither was available on ftp.gnu.org when this article was written in August 2000.

The right sed

In this series, we will be using GNU sed 3.02.80. Some (but very few) of the most advanced examples in the follow-up articles will not work with GNU sed 3.02 or 3.02a. If you are using a non-GNU sed, your results may vary. Why not take some time to install GNU sed 3.02.80 now? Then you will not only be ready for the rest of the series, but will also be using arguably the best sed available.

Sed examples

sed works by performing any number of user-specified editing operations ("commands") on the input data. sed is line-based, so the commands are performed on each line in order. sed then writes its results to standard output (stdout); it does not modify any input files.

Let's look at some examples. The first few are a bit odd, because I am using them to illustrate how sed works rather than to perform any useful task. However, if you are new to sed, it is very important that you understand them. Here is our first example:

    $ sed -e 'd' /etc/services

If you type this command, you will get no output at all. So what happened? In this example, we called sed with one editing command, 'd'. sed opened the /etc/services file, read a line into its pattern buffer, performed our editing command ("delete line"), and then printed the pattern buffer (which was empty). It then repeated these steps for each following line. This produced no output, because the 'd' command deleted every single line in the pattern buffer.

There are a few things to notice in this example. First, /etc/services was not modified at all. Again, this is because sed only reads from the file you specify on the command line and uses it as input; it does not try to modify the file. The second thing to notice is that sed is line-oriented. The 'd' command did not simply tell sed to delete all incoming data in one fell swoop.
Instead, sed read each line of /etc/services, one by one, into its internal buffer, called the pattern buffer (or pattern space). Once a line was read into the pattern buffer, sed performed the 'd' command and then printed the contents of the pattern buffer (nothing, in this example). Later, I will show you how to use address ranges to control which lines a command is applied to; when no address is given, a command is applied to all lines.

The third thing to notice is the use of single quotes to surround the 'd' command. It is a good idea to get into the habit of enclosing your sed commands in single quotes, so that shell expansion is disabled.

Another sed example

Here is an example of how to use sed to remove the first line of the /etc/services file from the output stream:

    $ sed -e '1d' /etc/services | more

As you can see, this command is very similar to our first 'd' command, except that it is preceded by a '1'. If you guessed that the '1' refers to line number one, you are right. In the first example we used 'd' by itself; this time the 'd' command is preceded by an optional numerical address. By using addresses, you can tell sed to perform edits only on a particular line or lines.

Address ranges

Now, let's look at how to specify an address range.

In this example, sed will delete lines 1 through 10 of the output:

    $ sed -e '1,10d' /etc/services | more

When two addresses are separated by a comma, sed applies the command that follows to the range that starts with the first address and ends with the second. In this example, the 'd' command was applied to lines 1 through 10, inclusive. All other lines were left untouched.

Addresses with regular expressions

Now it is time for a more useful example. Suppose you want to view the contents of your /etc/services file, but you are not interested in any of the comments it contains. As you know, you can place a comment in /etc/services by starting the line with the '#' character. To skip the comments, we would like sed to delete lines that start with '#'. Here is how to do it:

    $ sed -e '/^#/d' /etc/services | more

Try this example and see what happens. You will notice that sed performs the desired task successfully. Now, let's work out what happened.

To understand the '/^#/d' command, we first need to dissect it. First, set aside the 'd'; it is the same delete-line command we used earlier. The new part is '/^#/', which is a new kind of address: a regular expression address. Regular expression addresses are always surrounded by slashes. They specify a pattern, and the command that immediately follows a regular expression address is applied only to lines that match that pattern.

So '/^#/' is a regular expression. But what does it do? This is a good time for a regular expression refresher.

Regular expression refresher

We can use regular expressions to express patterns that may be found in text. If you have ever used the '*' character on the shell command line, you have used something that is similar, but not identical, to regular expressions.
Here are the special characters that you can use in regular expressions:

    Character   Description
    ^           Matches the beginning of the line
    $           Matches the end of the line
    .           Matches any single character
    *           Matches zero or more occurrences of the previous character
    [ ]         Matches any one of the characters inside the [ ]

Probably the best way to get a feel for regular expressions is to look at a few examples. All of these examples will be accepted by sed as valid addresses to appear on the left side of a command. Here are a few:

    Regular expression   Description
    /./                  Will match any line that contains at least one character
    /../                 Will match any line that contains at least two characters
    /^#/                 Will match any line that begins with a '#'
    /^$/                 Will match all blank lines
    /}$/                 Will match any line that ends with '}' (no trailing spaces)
    /} *$/               Will match any line ending with '}' followed by zero or more spaces
    /[abc]/              Will match any line that contains a lowercase 'a', 'b', or 'c'
    /^[abc]/             Will match any line that begins with an 'a', 'b', or 'c'

I encourage you to try several of these examples. Take some time to get familiar with regular expressions, and then try a few of your own. You can use a regexp this way:

    $ sed -e '/regexp/d' /path/to/my/test/file | more

This will cause sed to delete any matching lines. However, it may be easier to get familiar with regular expressions by telling sed to print the lines that match and discard the ones that do not, rather than the other way around. This can be done with the following command:

    $ sed -n -e '/regexp/p' /path/to/my/test/file | more

Note the new '-n' option, which tells sed not to print the pattern space unless explicitly told to do so.
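The addressing forms covered so far (no address, a line number, a line range, and a regexp) can be tried safely on a throwaway file instead of /etc/services. The following is a minimal sketch; the temporary file and its contents are made up for illustration, and the variable names are arbitrary.

```shell
# Hypothetical sample file; the contents are illustrative only.
sample=$(mktemp)
printf '%s\n' '# a comment' 'ftp 21/tcp' '' 'ssh 22/tcp' \
  '# another comment' 'smtp 25/tcp' > "$sample"

# No address: 'd' applies to every line, so nothing is printed.
none=$(sed -e 'd' "$sample")

# Line range address: delete lines 1 through 3, keep the rest.
range=$(sed -e '1,3d' "$sample")

# Regexp address: delete every line that begins with '#'.
nocomments=$(sed -e '/^#/d' "$sample")

# The inverse view: -n turns off automatic printing, 'p' prints the matches.
onlycomments=$(sed -n -e '/^#/p' "$sample")

printf '%s\n' "--- without comments ---" "$nocomments"
printf '%s\n' "--- comments only ---" "$onlycomments"

# The input file itself is never modified: it still has six lines.
lines_after=$(wc -l < "$sample" | tr -d ' ')
rm -f "$sample"
```

Comparing the output of the 'd' form with the '-n' plus 'p' form on the same file is a quick way to check what a regexp address actually matches.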

You will also notice that we replaced the 'd' command with the 'p' command, which, as you might guess, explicitly tells sed to print the pattern space. This way, only the matching lines are printed.

More on addresses

Up to now, we have looked at line addresses, line range addresses, and regexp addresses. But there are even more possibilities. We can specify two regular expressions separated by a comma, and sed will match all lines starting from the first line that matches the first regular expression, up to and including the line that matches the second regular expression. For example, the following command prints a block of text that begins with a line containing "BEGIN" and ends with a line containing "END":

    $ sed -n -e '/BEGIN/,/END/p' /my/test/file | more

If "BEGIN" is not found, no data is printed. If "BEGIN" is found but no "END" appears on any line after it, all subsequent lines are printed. This happens because of sed's stream-oriented nature: it does not know whether or not an "END" will appear.

C source example

If you want to print only the main() function in a C source file, you could type:

    $ sed -n -e '/main[[:space:]]*(/,/^}/p' sourcefile.c | more

This command has two regular expressions, '/main[[:space:]]*(/' and '/^}/', and one command, 'p'. The first regular expression matches the string "main" followed by any number of spaces or tabs, followed by an open parenthesis. This should match the start of an average ANSI C main() declaration.

In this particular regular expression we meet the '[[:space:]]' character class. This is simply a special keyword that tells sed to match whitespace such as a tab or a space. If you prefer, instead of typing '[[:space:]]' you could type '[', then a literal space, then Control-V, then a literal tab, and then ']'; the Control-V tells bash that you want to insert a "real" tab rather than perform command expansion. It is clearer, especially in scripts, to use the '[[:space:]]' character class.

Now on to the second regexp. '/^}/' matches a '}' character that appears at the beginning of a line. If your code is formatted nicely, this will match the closing brace of your main() function. If it is not, it will not match correctly; this is one of the tricky things about pattern matching.

Because we are in '-n' quiet mode, the 'p' command does what it always does: it explicitly tells sed to print the line. Try running this command on a C source file; it should output the entire main() { } block, including the initial "main()" and the closing '}'.

Next time

Now that we have touched on the basics, we will pick up the pace in the next two articles. If you are in the mood for some meatier sed material, be patient; it is coming! In the meantime, you may want to check out the sed and regular expression material listed in Resources.

Sed by Example, Part 2

sed is a very powerful and compact text stream editor. In this article, Daniel Robbins shows you how to use sed to perform string substitution, create larger sed scripts, and use sed's append, insert, and change line commands.

sed is a very useful (but often forgotten) UNIX stream editor. It is ideal for batch-editing files or for creating shell scripts that modify existing files in powerful ways. This article builds on my previous article introducing sed.

Substitution!

Let's look at one of sed's most useful commands, the substitution command. With it, we can replace a particular string or matched regular expression with another string. Here is an example of the most basic use of this command:

    $ sed -e 's/foo/bar/' myfile.txt

The above command writes the contents of myfile.txt to standard output, with the first occurrence of 'foo' (if any) on each line replaced with the string 'bar'. Note that I said the first occurrence on each line, although this is usually not what you want. Normally, when you perform a string replacement, you want to perform it globally. That is, you want to replace all occurrences on every line, as follows:

    $ sed -e 's/foo/bar/g' myfile.txt

The additional 'g' option after the last slash tells sed to perform a global replace.

There are a few other things you should know about the 's///' substitution command. First, it is a command, and a command only; there are no addresses specified in any of the examples above. This means that 's///' can also be used with addresses to control which lines it is applied to, as follows:

    $ sed -e '1,10s/enchantment/entrapment/g' myfile2.txt

This example replaces all occurrences of the phrase 'enchantment' with the phrase 'entrapment', but only on lines 1 through 10, inclusive.

    $ sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt

This example replaces 'hills' with 'mountains', but only in blocks of text that begin with a blank line and end with a line beginning with the three characters 'END', inclusive.

Another nice thing about the 's///' command is that there are many options for the '/' separator.
If you are performing a string substitution and the regular expression or replacement string contains a lot of slashes, you can change the separator by specifying a different character after the 's'. For example, this replaces all occurrences of /usr/local with /usr:

    $ sed -e 's:/usr/local:/usr:g' mylist.txt

In this example, we are using the colon as the separator. If you ever need to specify the separator character itself in the regular expression, put a backslash before it.

Regular expression pitfalls

Up to now, we have only performed simple string substitution. While this is handy, we can also match a regular expression. For example, the following sed command matches a phrase beginning with '<' and ending with '>', containing any number of characters in between. The phrase is deleted (replaced with an empty string):

    $ sed -e 's/<.*>//g' myfile.html

However, because of a quirk of regular expressions, it will not work well. The reason? When sed tries to match the regular expression on a line, it looks for the longest match on the line.
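Both points above, the alternate separator and the longest-match behaviour, are easy to see on a one-line input. This is a minimal sketch; the input strings are invented for illustration.

```shell
# Alternate separator: with ':' after the 's', the slashes in the
# paths need no escaping.
paths=$(printf '%s\n' '/usr/local/bin /usr/local/lib' | sed -e 's:/usr/local:/usr:g')
printf '%s\n' "$paths"

# Longest match: '<.*>' runs from the first '<' to the LAST '>' on the
# line, so everything between the first and last tag is removed too.
greedy=$(printf '%s\n' '<b>This</b> is what <b>I</b> meant.' | sed -e 's/<.*>//g')
printf '[%s]\n' "$greedy"
```

The brackets in the last printf are only there to make the leading space in the result visible.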

This was not an issue in my previous sed article, because we only used the 'd' and 'p' commands, which delete or print the entire line anyway. But with the 's///' command it makes a big difference, because the entire portion that the regular expression matches is replaced with the target string or, in this case, deleted. This means that the above example will turn the line:

    <b>This</b> is what <b>I</b> meant.

into:

    meant.

rather than what we wanted:

    This is what I meant.

Fortunately, there is an easy way to fix this. Instead of typing in a regular expression that says "a '<' character followed by any number of characters and ending with a '>' character", we just need to type in a regexp that says "a '<' character followed by any number of non-'>' characters and ending with a '>' character". This has the effect of matching the shortest possible match, instead of the longest. The new command looks like this:

    $ sed -e 's/<[^>]*>//g' myfile.html

In the above example, '[^>]' specifies a "non-'>'" character, and the '*' after it completes the expression to mean "zero or more non-'>' characters". Test this command on a few HTML files, pipe the output to "more", and review the results.

More character matching

The '[ ]' regular expression syntax has some additional options. To specify a range of characters, you can use a '-', as long as the '-' is not in the first or last position, as follows:

    '[a-x]*'

This matches zero or more characters, all of which must be 'a', 'b', 'c' ... 'v', 'w', 'x'. In addition, you can use the '[:space:]' character class to match whitespace. Here is a fairly complete list of the available character classes:

    Character class   Description
    [:alnum:]         Alphanumeric [a-z A-Z 0-9]
    [:alpha:]         Alphabetic [a-z A-Z]
    [:blank:]         Spaces or tabs
    [:cntrl:]         Any control characters
    [:digit:]         Numeric digits [0-9]
    [:graph:]         Any visible characters (no whitespace)
    [:lower:]         Lowercase [a-z]
    [:print:]         Non-control characters
    [:punct:]         Punctuation characters
    [:space:]         Whitespace
    [:upper:]         Uppercase [A-Z]
    [:xdigit:]        Hex digits [0-9 a-f A-F]

It is advantageous to use character classes whenever possible, because they adapt better to non-English locales (including accented characters where necessary).

Advanced substitution

We have seen how to perform simple and even reasonably complex direct substitutions, but sed can do even more. We can actually refer to part or all of the matched regular expression, and use those parts to construct the replacement string. As an example, suppose you are replying to a message.

The following example adds the phrase "ralph said: " to the beginning of each line:

    $ sed -e 's/.*/ralph said: &/' origmsg.txt

The output looks like this:

    ralph said: Hiya Jim,
    ralph said:
    ralph said: I sure like this sed stuff!
    ralph said:

In this example, we use the '&' character in the replacement string, which tells sed to insert the entire matched regular expression. So whatever was matched by '.*' (the largest group of zero or more characters on the line, that is, the entire line) can be inserted anywhere in the replacement string, even multiple times. This is great, but sed is even more powerful.

Those wonderful backslashed parentheses

Even better than '&', the 's///' command allows us to define regions in the regular expression, and then refer to these specific regions in the replacement string. As an example, suppose we have a file that contains the following text:

    foo bar oni
    eeny meeny miny
    larry curly moe
    jimmy the weasel

Now suppose we want to write a sed script that replaces "eeny meeny miny" with "Victor eeny-meeny Von miny", and so on. To do this, we first write a regular expression that matches three strings separated by spaces:

    '.* .* .*'

Next, we define regions by inserting backslashed parentheses around each region of interest:

    '\(.*\) \(.*\) \(.*\)'

This regular expression works the same as the first one, except that it defines three logical regions that we can refer to in the replacement string. Here is the final script:

    $ sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/' myfile.txt

As you can see, we refer to each parenthesis-delimited region by typing '\x', where x is the number of the region, starting at one. The output is as follows:

    Victor foo-bar Von oni
    Victor eeny-meeny Von miny
    Victor larry-curly Von moe
    Victor jimmy-the Von weasel

As you become more familiar with sed, you will be able to perform fairly powerful text processing with a minimum of effort.
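Because the backslashes in these commands are easy to lose when copying, here is a minimal runnable sketch of both techniques. The four sample lines are taken from the text above; piping them in with printf rather than reading myfile.txt is just a convenience for this sketch.

```shell
# The four sample lines from the text, kept in a shell variable.
sample=$(printf '%s\n' 'foo bar oni' 'eeny meeny miny' \
  'larry curly moe' 'jimmy the weasel')

# '&' in the replacement stands for the whole text matched by the regexp.
quoted=$(printf '%s\n' "$sample" | sed -e 's/.*/ralph said: &/')

# \( \) mark three regions; \1, \2 and \3 refer back to them.
renamed=$(printf '%s\n' "$sample" |
  sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/')

printf '%s\n' "$quoted" "$renamed"
```

Each sample line contains exactly two spaces, so the three regions can only split one way; on lines with more words, the greedy '.*' in the first region would absorb the extra words.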
You might be tempted to tackle this problem with your favorite scripting language; could you have fit such a solution on a single line?

Combining commands

As we begin creating more complex sed scripts, we need the ability to enter more than one command. There are several ways to do this. First, we can use semicolons between the commands. For example, the following series of commands uses the '=' command, which tells sed to print the line number, as well as the 'p' command, which explicitly tells sed to print the line (since we are in '-n' mode):

    $ sed -n -e '=;p' myfile.txt

Whenever two or more commands are specified, each command is applied in order to every line of the file. In the above example, the '=' command is applied to line 1 first, and then the 'p' command is applied. Then sed proceeds to line 2 and repeats the process. While the semicolon is handy, there are cases where it will not work.

Another alternative is to use two -e options to specify two separate commands:

    $ sed -n -e '=' -e 'p' myfile.txt

However, when we get to the more complex append and insert commands, even multiple '-e' options will not help us. For complex multi-line scripts, the best approach is to put your commands in a separate file. Then, reference this script file with the -f option:

    $ sed -n -f mycommands.sed myfile.txt

This method, although arguably less convenient, will always work.

Multiple commands for one address

Sometimes you may want to specify multiple commands that apply to a single address. This comes in especially handy when you are performing lots of 's///' substitutions to transform words or syntax in the source file. To apply multiple commands to one address, enter your sed commands in a file, and use the '{ }' characters to group the commands, as follows:

    1,20{
        s/[Ll]inux/GNU\/Linux/g
        s/samba/Samba/g
        s/posix/POSIX/g
    }

The above example applies three substitution commands to lines 1 through 20, inclusive. You can also use regular expression addresses, or a combination of the two:

    1,/^END/{
        s/[Ll]inux/GNU\/Linux/g
        s/samba/Samba/g
        s/posix/POSIX/g
        p
    }

This example applies all the commands between '{ }' to the lines starting at line 1 and ending with a line that begins with the letters "END" (or the end of the file, if "END" is not found in the source file).

Append, insert, and change line

Now that we are writing sed scripts in separate files, we can take advantage of the append, insert, and change line commands. These commands insert a line after the current line, insert a line before the current line, or replace the current line in the pattern space. They can also be used to insert multiple lines into the output.
The insert line command is used as follows:

    i\
    This line will be inserted before each line

If you do not specify an address for this command, it is applied to each line and produces output like this:

    This line will be inserted before each line
    line 1 here
    This line will be inserted before each line
    line 2 here
    This line will be inserted before each line
    line 3 here
    This line will be inserted before each line
    line 4 here

If you want to insert multiple lines before the current line, you can add additional lines by appending a backslash to the previous line, like so:

    i\
    insert this line\
    and this one\
    and this one\
    and, uh, this one too.

The append command works similarly, but inserts a line or lines after the current line in the pattern space. It is used as follows:

    a\
    insert this line after each line.  Thanks! :)

The "change line" command, on the other hand, actually replaces the current line in the pattern space, and is used as follows:

    c\
    You're history, original line! Muhahaha!

Because the append, insert, and change line commands need to be entered on multiple lines, you will want to type them into text sed scripts and tell sed to execute them with the '-f' option. Passing these commands to sed by other methods causes problems.
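To see a script file, a grouped address, and the insert and append commands working together, the following minimal sketch builds everything in a temporary directory. The file names input.txt and mycommands.sed, the sample text, and the '1' and '$' addresses on the insert and append commands are choices made for this illustration.

```shell
# Hypothetical file names; both files live in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'linux and samba' 'posix rules' 'END' 'linux again' \
  > "$workdir/input.txt"

# The sed script: three grouped substitutions for one address range,
# an insert before line 1, and an append after the last line ('$').
cat > "$workdir/mycommands.sed" <<'EOF'
1,/^END/{
s/[Ll]inux/GNU\/Linux/g
s/samba/Samba/g
s/posix/POSIX/g
}
1i\
--- start of file ---
$a\
--- end of file ---
EOF

result=$(sed -f "$workdir/mycommands.sed" "$workdir/input.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The fourth input line falls outside the 1,/^END/ range, so its lowercase "linux" is left alone, which makes the effect of the address easy to spot in the output.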

Next time

In my next and final article of this sed series, I will demonstrate lots of excellent real-world examples of using sed for many different kinds of tasks. I will not only show you what the scripts do, but also why they do what they do. Once you are done, you will have plenty of ideas about how to use sed in your own projects. See you then!

Sed by Example, Part 3

In this conclusion of the sed series, Daniel Robbins gives you a taste of the true power of sed. After introducing a handful of essential sed scripts, he demonstrates some advanced sed scripting by converting a Quicken .QIF file into a readable text format. The conversion script is not only practical, it also serves as a great example of sed scripting power.

Muscular sed

In my second sed article, I offered examples that demonstrated how sed works, but very few of them actually did anything particularly useful. In this final sed article, it is time to change that pattern and put sed to practical use. I will show you several examples that not only demonstrate the power of sed, but also do some really neat (and handy) things. For example, in the second half of the article, I will show you how I designed a sed script that converts a .QIF file from Intuit's Quicken financial program into a nicely formatted text file. Before doing that, we will look at some less complicated but very useful sed scripts.

Text translation

Our first practical script converts UNIX-style text to DOS/Windows format. As you probably know, DOS/Windows-based text files have a CR (carriage return) and LF (line feed) at the end of each line, while UNIX text has only a line feed. There may be times when you need to move some UNIX text to a Windows system, and this script performs the necessary format conversion for you.

    $ sed -e 's/$/\r/' myunix.txt > mydos.txt

In this script, the '$' regular expression matches the end of the line, and the '\r' tells sed to insert a carriage return right before it. With a carriage return inserted before the line feed, each line now ends in CR/LF.
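One way to check that the conversion really adds a carriage return to each line is to count bytes before and after, and then strip the carriage returns again. The sketch below is an illustration under stated assumptions rather than the article's script: it builds a literal CR with printf and splices it into the sed command, so it does not depend on how a given sed treats '\r' in the replacement, and the two-line sample text is invented.

```shell
# A literal carriage return character, produced by printf.
cr=$(printf '\r')

# Hypothetical two-line UNIX text.
unix_text=$(printf '%s\n' 'first line' 'second line')

# UNIX -> DOS: add a CR at the end of every line.
dos_text=$(printf '%s\n' "$unix_text" | sed -e "s/\$/$cr/")

# Strip the CR again to confirm the round trip restores the original.
round_trip=$(printf '%s\n' "$dos_text" | sed -e "s/$cr\$//")

# The DOS form should be exactly one byte per line longer.
unix_bytes=$(printf '%s\n' "$unix_text" | wc -c | tr -d ' ')
dos_bytes=$(printf '%s\n' "$dos_text" | wc -c | tr -d ' ')
printf 'unix: %s bytes, dos: %s bytes\n' "$unix_bytes" "$dos_bytes"
```

The double quotes around the sed commands are deliberate here: they let the shell substitute $cr, while the backslash keeps the regexp's own '$' anchor from being expanded.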
Note that the '\r' is replaced with a CR only when using GNU sed 3.02.80 or later. If you have not installed GNU sed 3.02.80 yet, see my first sed article for instructions on how to do this.

I cannot tell you how many times I have downloaded some example script or C code, only to find that it is in DOS/Windows format. While many programs do not mind DOS/Windows-format CR/LF text files, several programs definitely do; the most notable is bash, which chokes as soon as it encounters a carriage return. The following sed invocation converts DOS/Windows-format text to trusty UNIX format:

    $ sed -e 's/.$//' mydos.txt > myunix.txt

The way this script works is simple: the substitution regular expression matches the last character on the line, which happens to be a carriage return. We replace it with nothing, so it is removed from the output entirely. If you use this script and notice that the last character of every line of the output has been deleted, you have specified a text file that is already in UNIX format. There is no need to convert it!

Reversing lines

Here is another handy little script. Like the "tac" command that is included with most Linux distributions, this script reverses the order of the lines in a file.

The name "tac" may be a bit misleading, because "tac" does not reverse the position of the characters on a line (left and right), but rather the position of the lines in the file (top and bottom). Running "tac" on the following file:

    foo
    bar
    oni

produces the following output:

    oni
    bar
    foo

We can do the same thing with the following sed script:

    $ sed -e '1!G;h;$!d' forward.txt > backward.txt

You will find this sed script useful if you are logged in to a FreeBSD system, which does not happen to have a "tac" command. While handy, it is also a good idea to know why this script does what it does. Let's dissect it.

Reversal explained

First, this script contains three separate sed commands, separated by semicolons: '1!G', 'h', and '$!d'. Now it is time to get a good understanding of the addresses used for the first and third commands. If the first command were '1G', the 'G' command would be applied only to the first line. However, there is an additional '!' character; this '!' negates the address, meaning that the 'G' command applies to all lines except the first. The '$!d' command is similar. If the command were '$d', it would apply the 'd' command only to the last line in the file (the '$' address is a simple way of specifying the last line). However, with the '!', '$!d' applies the 'd' command to all lines except the last.

Now all we need to understand is what the commands themselves do. When we run the line-reversal script on the text file above, the first command that gets executed is 'h'. This command tells sed to copy the contents of the pattern space (the buffer that holds the line currently being worked on) to the hold space (a temporary buffer). Then the 'd' command is executed, which deletes "foo" from the pattern space so that it is not printed after all the commands have been executed for this line.

Now, line two.
After "bar" is read into the pattern space, the 'G' command is executed, which appends the contents of the hold space ("foo") to the pattern space ("bar"), so that the pattern space now contains "bar\nfoo". The 'h' command puts this back into the hold space for safekeeping, and 'd' deletes the line from the pattern space so that it is not printed.

For the last "oni" line, the same steps are repeated, except that the contents of the pattern space are not deleted (because of the '$!' before the 'd'), and the contents of the pattern space (all three lines) are printed to standard output.

Now it is time to do some powerful data conversion with sed.

sed QIF magic

For the last few weeks, I have been thinking about buying a copy of Quicken to balance my bank accounts. Quicken is a very nice financial program, and it would certainly do the job successfully. But after thinking it over, I decided that I could easily write some software to balance my checkbook myself.

After all, I reasoned, I am a software developer!

I developed a nice little checkbook balancing program (using awk) that calculates my balance by parsing a text file containing all of my transactions. After a bit of tweaking, I improved it so that I could keep track of different credit and debit categories, just as Quicken can. But there was one more feature I wanted to add. I had recently switched my accounts to a bank with an online Web account interface. One day, I noticed that my bank's Web site allowed me to download my account information in Quicken .QIF format. In very little time, I decided it would be really neat if I could convert this information into text format.

A tale of two formats

Before we look at the QIF format, here is what my checkbook.txt format looks like:

    28 Aug 2000     food    -       -       Y       Supermarket     30.94
    25 Aug 2000     watr    -       103     Y       Check 103       52.86

In my file, all fields are separated by one or more tabs, with one transaction per line. After the date, the next field lists the type of expense (or "-" if this is an income item). The third field lists the type of income (or "-" if this is an expense item). Then there is a check number field (again, "-" if empty), a transaction cleared field ("Y" or "N"), a comment, and a dollar amount.

Now we are ready to look at the QIF format. When I viewed my downloaded QIF file in a text viewer, this is what I saw:

    !Type:Bank
    D08/28/2000
    T-8.15
    N
    PCHECKCARD SUPERMARKET
    ^
    D08/28/2000
    T-8.25
    N
    PCHECKCARD PUNJAB RESTAURANT
    ^
    D08/28/2000
    T-17.17
    N
    PCHECKCARD SUPERMARKET

After scanning the file, it was not hard to figure out the format. Ignoring the first line, the rest of the file is laid out as follows:

    D<date>
    T<transaction amount>
    N<check number>
    P<description>
    ^ (this is the field separator)

Starting the process

When you are tackling a significant sed project like this one, do not get discouraged; sed allows you to massage the data gradually into its final form. As you progress, you can keep refining your sed script until its output appears exactly as intended.
You do not need to get it exactly right on the first try.

To start, I first created a file called "qiftrans.sed" and started massaging the data:

    1d
    /^^/d
    s/[[:cntrl:]]//g

The first '1d' command deletes the first line, and the second command removes those pesky '^' lines from the output. The last line removes any control characters that may exist in the file. Since I am dealing with a foreign file format, I want to eliminate the risk of encountering any control characters along the way. So far, so good.
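This first stage can be tried on a small stand-in for the downloaded file. The sketch below is an illustration: the two-transaction sample.qif is modeled on the excerpt shown earlier in the text, and both files are created in a temporary directory.

```shell
workdir=$(mktemp -d)

# Hypothetical QIF sample with two transactions.
printf '%s\n' '!Type:Bank' \
  'D08/28/2000' 'T-8.15' 'N' 'PCHECKCARD SUPERMARKET' '^' \
  'D08/28/2000' 'T-8.25' 'N' 'PCHECKCARD PUNJAB RESTAURANT' '^' \
  > "$workdir/sample.qif"

# The three-line first version of the script from the text.
cat > "$workdir/qiftrans.sed" <<'EOF'
1d
/^^/d
s/[[:cntrl:]]//g
EOF

stage1=$(sed -f "$workdir/qiftrans.sed" "$workdir/sample.qif")
printf '%s\n' "$stage1"
rm -rf "$workdir"
```

In '/^^/' the first '^' is the start-of-line anchor and the second is a literal caret, so only the separator lines are deleted; what remains is four lines per transaction (D, T, N, and P).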

Now it is time to add some processing power to this basic script:

    1d
    /^^/d
    s/[[:cntrl:]]//g
    /^D/ {
        s/^D\(.*\)/\1\tOUTY\tINNY\t/
        s/^01/Jan/
        s/^02/Feb/
        s/^03/Mar/
        s/^04/Apr/
        s/^05/May/
        s/^06/Jun/
        s/^07/Jul/
        s/^08/Aug/
        s/^09/Sep/
        s/^10/Oct/
        s/^11/Nov/
        s/^12/Dec/
        s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    }

First, I add a '/^D/' address so that sed only begins processing when it meets the first character of the QIF date field, 'D'. As soon as sed reads such a line into its pattern space, all the commands in the curly braces are executed in order.

The first command in the curly braces transforms a line like this:

    D08/28/2000

into this:

    08/28/2000      OUTY    INNY

Of course, this format is not perfect yet, but that is fine. We will gradually refine the contents of the pattern space as we go. The next 12 lines have the net effect of converting the month number to a three-letter name, and the last line removes the two slashes from the date and puts the day in front of the month. We end up with this line:

    28 Aug 2000     OUTY    INNY

The OUTY and INNY fields are placeholders and will be replaced later. I cannot fill them in yet: if the dollar amount is negative, OUTY and INNY should become "misc" and "-", but if the dollar amount is positive, they should become "-" and "inco". Since the dollar amount has not been read yet, I need to use placeholders for the time being.

Refinement

Now it is time for some further refinement:

    1d
    /^^/d
    s/[[:cntrl:]]//g
    /^D/ {
        s/^D\(.*\)/\1\tOUTY\tINNY\t/
        s/^01/Jan/
        s/^02/Feb/
        s/^03/Mar/
        s/^04/Apr/
        s/^05/May/
        s/^06/Jun/
        s/^07/Jul/
        s/^08/Aug/
        s/^09/Sep/
        s/^10/Oct/
        s/^11/Nov/
        s/^12/Dec/
        s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
        N
        N
        N
        s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
        s/NUMNUM/-/
        s/NUM\([0-9]*\)NUM/\1/
        s/\([0-9]\),/\1/
    }

The seven new lines are a bit complicated, so we will cover them in detail.
First, there are three 'N' commands in a row. The 'N' command tells sed to read the next line of input and append it to the current pattern space. The three 'N' commands append the next three lines to the pattern space buffer, so our line now looks like this:

    28 Aug 2000     OUTY    INNY    \nT-8.15\nN\nPCHECKCARD SUPERMARKET

Sed's pattern space has become ugly: we need to remove the extra newlines and do some additional formatting. To do this, we use the substitution command. The pattern we want to match is:

    '\nT.*\nN.*\nP.*'

This matches a newline, followed by a 'T', followed by zero or more characters, followed by a newline, followed by an 'N', followed by any number of characters and a newline, followed by a 'P', followed by any number of characters. Phew! This regular expression matches the entire contents of the three lines we just appended to the pattern space. But we want to reformat this region, not replace it entirely. The dollar amount, the check number (if any) and the description need to reappear in the replacement string.
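
The state of the pattern space described above can be reproduced with a small experiment. The sketch below assumes GNU sed (for '\t' in the s command) and runs only the date handling and the three 'N' commands on one sample record; the last two substitutions are not part of the article's script and exist only to make tabs and newlines visible as <TAB> and <NL>. The variable names record and joined are invented for the example, and only the month needed by the sample is included.

```shell
#!/bin/sh
# One QIF record from the article, without the header and '^' lines.
record='D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET'

# Date handling plus three N commands, followed by two
# illustration-only substitutions that expose tabs and newlines.
joined=$(printf '%s\n' "$record" | sed -e '/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^08/Aug/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    N
    N
    N
    s/\t/<TAB>/g
    s/\n/<NL>/g
}')

printf '%s\n' "$joined"
```

The output should be a single line in which the T, N and P lines follow the date, each preceded by an <NL> marker, which is exactly the region the next substitution has to match.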

To make those three pieces reappear in the replacement, we surround the "interesting parts" with backslashed parentheses so that we can refer to them in the replacement string (using '\1', '\2' and '\3' to tell sed where to insert them). Here is the final command:

    s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/

This command transforms our line into:

    28 Aug 2000  OUTY  INNY  NUMNUM    Y    CHECKCARD SUPERMARKET  AMT-8.15AMT

While this line is getting better, a few things look, at first glance, er, interesting. The first is that silly "NUMNUM" string: what purpose does it serve? The next few lines of the sed script answer that. They replace "NUMNUM" with "-", while "NUM<number>NUM" is replaced with <number>. As you can see, surrounding the check number with a silly tag lets us conveniently insert a "-" when the field is empty.

Finishing touches

The last line removes a comma that follows a digit. This converts dollar amounts like "3,231.00" to "3231.00", which is the format I use. Now it is time to look at the final script:

The final "QIF to text" script

    1d
    /^^/d
    s/[[:cntrl:]]//g
    /^D/ {
        s/^D\(.*\)/\1\tOUTY\tINNY\t/
        s/^01/Jan/
        s/^02/Feb/
        s/^03/Mar/
        s/^04/Apr/
        s/^05/May/
        s/^06/Jun/
        s/^07/Jul/
        s/^08/Aug/
        s/^09/Sep/
        s/^10/Oct/
        s/^11/Nov/
        s/^12/Dec/
        s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
        N
        N
        N
        s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
        s/NUMNUM/-/
        s/NUM\([0-9]*\)NUM/\1/
        s/\([0-9]\),/\1/
        /AMT-[0-9]*.[0-9]*AMT/b fixnegs
        s/AMT\(.*\)AMT/\1/
        s/OUTY/-/
        s/INNY/inco/
        b done
    :fixnegs
        s/AMT-\(.*\)AMT/\1/
        s/OUTY/misc/
        s/INNY/-/
    :done
    }

The additional lines use substitution and some branching to beautify the output. Let's look at this line first:

    /AMT-[0-9]*.[0-9]*AMT/b fixnegs

This line contains a branch command of the form "/regexp/b label". If the pattern space matches the regular expression, sed branches to the fixnegs label. You should be able to spot the label easily; it appears as ":fixnegs" in the code.
If the regular expression does not match, processing continues as normal with the next command.

Now that you understand how the command itself works, let's look at the branches. The branch regular expression matches the string 'AMT', followed by a '-', any number of digits, a '.', any number of digits, and 'AMT'. (Strictly speaking, the unescaped '.' matches any single character, which includes the decimal point.) As I am sure you have guessed, this regular expression deals specifically with negative dollar amounts. Earlier, we surrounded the dollar amount with 'AMT' strings so that it would be easy to find later. Because the regular expression only matches dollar amounts that begin with a '-', the branch is taken only when we happen to be handling a debit.
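
To watch both paths through the branch, the final script can be run on one debit and one deposit. The following is a sketch under some assumptions: GNU sed, the script held in a shell variable and passed with -e instead of being read from qiftrans.sed, and a second record (a 1,250.00 deposit with check number 104, described as PAYCHECK) that is invented for the illustration and does not come from the article's data.

```shell
#!/bin/sh
# The final "QIF to text" script from the article.
script='1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^01/Jan/
    s/^02/Feb/
    s/^03/Mar/
    s/^04/Apr/
    s/^05/May/
    s/^06/Jun/
    s/^07/Jul/
    s/^08/Aug/
    s/^09/Sep/
    s/^10/Oct/
    s/^11/Nov/
    s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    N
    N
    N
    s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
    s/NUMNUM/-/
    s/NUM\([0-9]*\)NUM/\1/
    s/\([0-9]\),/\1/
    /AMT-[0-9]*.[0-9]*AMT/b fixnegs
    s/AMT\(.*\)AMT/\1/
    s/OUTY/-/
    s/INNY/inco/
    b done
    :fixnegs
    s/AMT-\(.*\)AMT/\1/
    s/OUTY/misc/
    s/INNY/-/
    :done
}'

# First record: a debit from the article.
# Second record: a hypothetical deposit, made up for this example.
qif='!Type:Bank
D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET
^
D08/29/2000
T1,250.00
N104
PPAYCHECK
^'

converted=$(printf '%s\n' "$qif" | sed -e "$script")
printf '%s\n' "$converted"
```

The debit should take the fixnegs path and come out with "misc" in the expense column and the minus sign stripped from the amount, while the deposit should fall through to the commands before "b done" and come out with "inco" in the income column, the check number 104 in place, and the comma removed from the amount.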

