Using Lex and Yacc to Parse an Address Book (1)
Foreword
Compiler-construction textbooks offer few good examples of how to use Lex and Yacc. Many tutorials merely mention the two tools in passing when they cover lexical analysis and syntax analysis, and many domestic university textbooks do not mention Lex and Yacc at all. In fact, Lex and Yacc were not developed only for building compilers. This section explains how to use Lex and Yacc through a simple problem: extracting information from a file of records.
Extracting the information
A few days ago, a friend asked me how to extract people's names and phone numbers from an address book by means of lexical and syntax analysis. I restated the problem roughly as follows:
I have a notebook whose contents are the call records generated by a telephone, saved as plain text in the file record.txt. The information in it is laid out in the following manner.
--------- 2004.1.10 ----------
Name: Jeclee
Tel: 05513606124
--------- 2004.1.11 ----------
Name: Wangan
Tel: 075528979205
...
Now I am going to build a database system, and I need to enter the names and phone numbers of the people who have called my phone. So I need to extract the useful information from the record file that the phone generates. Of course, there are many ways to solve this, but in this section we will explore how to use Lex and Yacc, two very convenient tools for constructing lexical and syntax analyzers, to extract that information.
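Before turning to Lex and Yacc, it may help to see what "extracting the information" should produce. The following is only a rough sketch in Python, not the Lex and Yacc solution this series builds: the helper name extract_records and the RECORD pattern are my own assumptions, and the pattern assumes each record is exactly a header line, a Name line, and a Tel line, as in the sample above.

```python
import re

# The sample records shown above, used as test input.
SAMPLE = """\
--------- 2004.1.10 ----------
Name: Jeclee
Tel: 05513606124
--------- 2004.1.11 ----------
Name: Wangan
Tel: 075528979205
"""

# Hypothetical pattern: header line, then a Name line, then a Tel line.
RECORD = re.compile(
    r"^-+[ \t]*[0-9.]+[ \t]*-+[ \t]*\r?\n"   # header: dashes, date, dashes
    r"Name:[ \t]*(?P<name>\w+)[ \t]*\r?\n"    # name line
    r"Tel:[ \t]*(?P<tel>[0-9]+)",             # telephone line
    re.MULTILINE,
)

def extract_records(text):
    """Return a list of (name, tel) pairs found in text."""
    return [(m.group("name"), m.group("tel")) for m in RECORD.finditer(text)]
```

Calling extract_records(SAMPLE) yields one (name, tel) pair per record. A single regular expression like this works only because the format is rigid; the Lex and Yacc approach separates the token patterns from the record structure.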
Finding the Lex and Yacc tools
You may feel that bringing compiler principles to bear on this problem is too much trouble, but with Lex and Yacc the complex processing becomes simple. Lex and Yacc were originally two tools under UNIX. If you use the Windows operating system, you generally have to find them online. I am using flex.exe and bison.exe from Cygwin; bison.exe is the GNU counterpart of Yacc, and flex.exe is the counterpart of Lex. Cygwin is a tool that provides a UNIX-like environment on the Windows platform. You can download Cygwin yourself.
The lexical analyzer input file
I have discussed regular expressions in an earlier article in this series; for a detailed explanation, please refer to a compiler principles textbook.
Here I give some basic regular definitions. They appear in almost every lexical analyzer input file.
Digit [0-9]
Number {Digit}+
Letter [a-zA-Z_]
Identifier ({Letter}|_)({Number}|{Letter}|_)*
Newline [\n]|[\r][\n]
Whitespace [ \t]+
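To see how these definitions build on one another, here is a small sketch in Python that expands the {Name} references the way Lex does (each reference is replaced by the parenthesized definition) and tries the results on a few strings. It rests on assumptions: I read Number as one or more digits, Identifier as ending in a repeated group, and the escapes as \n, \r, and \t, because the listing appears to have lost its repetition operators and backslashes. The helper names expand and matches are hypothetical.

```python
import re

# Lex-style definitions in order; later ones may refer to earlier ones as {Name}.
# The + and * operators and the backslash escapes are assumed readings.
DEFINITIONS = [
    ("Digit", r"[0-9]"),
    ("Number", r"{Digit}+"),
    ("Letter", r"[a-zA-Z_]"),
    ("Identifier", r"({Letter}|_)({Number}|{Letter}|_)*"),
    ("Newline", r"[\n]|[\r][\n]"),
    ("Whitespace", r"[ \t]+"),
]

def expand(definitions):
    """Replace each {Name} with its parenthesized pattern, as Lex does."""
    expanded = {}
    for name, pattern in definitions:
        for known, body in expanded.items():
            pattern = pattern.replace("{" + known + "}", "(?:" + body + ")")
        expanded[name] = pattern
    return expanded

PATTERNS = expand(DEFINITIONS)

def matches(name, text):
    """True if the whole of text matches the named definition."""
    return re.fullmatch(PATTERNS[name], text) is not None
```

For example, matches("Number", "05513606124") holds for a telephone number, while matches("Number", "2004.1") does not, because the period is not a digit.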
The telephone record file also contains a header line such as --------- 2004.1.10 ---------- that we have not yet considered. The regular expression for this header is very simple: it is a combination of "-" characters, numbers, and periods, so it is easy to write.
Begin [-]+({Number}|[.])+[-]+
Here [-] stands for the "-" character, Number has already been defined and matches any integer, and [.] stands for the period ".". So the parenthesized group built from Number and [.] matches the date in the record header. We do not need the date itself, so there is no need to extract it separately; it can simply be absorbed into this one regular expression.
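The header pattern can be tried out in the same way. This is a sketch under assumptions: BEGIN_STRICT is my reading of the Begin definition with repetition operators (runs of "-", then numbers and periods, then runs of "-"), and BEGIN_SPACED is a hypothetical variant that also allows spaces around the date, since the sample header is printed with spaces and I cannot tell whether the real file contains them.

```python
import re

NUMBER = r"[0-9]+"

# Assumed reading of Begin: dashes, then numbers and periods, then dashes.
BEGIN_STRICT = r"[-]+(?:" + NUMBER + r"|[.])+[-]+"

# Hypothetical variant that tolerates spaces around the date.
BEGIN_SPACED = r"[-]+[ ]*(?:" + NUMBER + r"|[.])+[ ]*[-]+"

def is_header(line, pattern=BEGIN_SPACED):
    """True if the whole line is a record header under the given pattern."""
    return re.fullmatch(pattern, line) is not None
```

With the spaced variant, is_header("--------- 2004.1.10 ----------") holds, while the strict variant accepts only the form without spaces. Neither accepts a Name or Tel line, which is what lets the header serve as a record separator.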
OK, put these regular expressions into a file named Record.l.
The entire telephone record follows a fixed format, so writing the grammar input file is relatively simple. With the regular expressions for lexical analysis written, we are halfway there.
2003-1-13
Author: Tang Liang tangl_99
QQ: 8664220
MSN: TANGL_99@hotmail.com
Email: tangl_99@sohu.com
Chengdu, Sichuan University, Computer College