Use Lex to analyze Java source programs
Compilers generally strip meaningless symbols and comments from the source program during lexical analysis. This step is not complicated and can be implemented directly in C, but the lexical analyzer generator Lex makes it more convenient.
Lex is a lexical analyzer generator developed at Bell Laboratories in the United States; it generates C code. Its basic principle is to scan text with regular expressions and to attach an action to each pattern. The actions are written in C, Lex's host language.
A regular expression may have an associated action, and the action may include returning a token. When Lex receives a file or text as input, it tries to match the text against the regular expressions. It reads the input one character at a time until a pattern matches. When a pattern matches, Lex performs the associated action (which may include returning a token). If no regular expression matches, the default rule applies: the unmatched character is copied to the output unchanged.
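To make the match-then-act cycle concrete, here is a minimal hand-written sketch in C of what a generated scanner does in spirit. It is not Lex output: the names `next_token`, `match_number`, `match_ident` and the token codes are hypothetical, and the two toy patterns stand in for real regular expressions. The sketch prefers the longest match and lets the earlier rule win ties, which is the convention Lex is known for; consuming a single unmatched character as `TOK_OTHER` is a choice made for this sketch only.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum token { TOK_EOF = 0, TOK_NUMBER, TOK_IDENT, TOK_OTHER };

/* Each matcher returns how many leading characters of s it accepts. */
static size_t match_number(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char) s[n]))
        n++;
    return n;
}

static size_t match_ident(const char *s)
{
    size_t n = 0;
    if (isalpha((unsigned char) s[0]) || s[0] == '_') {
        n = 1;
        while (isalnum((unsigned char) s[n]) || s[n] == '_')
            n++;
    }
    return n;
}

/* A rule pairs a pattern with the token its action returns. */
struct rule {
    size_t (*match)(const char *);
    enum token tok;
};

static const struct rule rules[] = {
    { match_number, TOK_NUMBER },
    { match_ident,  TOK_IDENT  },
};

/* Scans one token at src[*pos], advances *pos and stores its length in *len. */
enum token next_token(const char *src, size_t *pos, size_t *len)
{
    size_t best = 0, i;
    enum token tok = TOK_OTHER;

    if (src[*pos] == '\0') {
        *len = 0;
        return TOK_EOF;
    }
    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(src + *pos);
        if (n > best) {          /* strict '>' keeps the earlier rule on ties */
            best = n;
            tok = rules[i].tok;
        }
    }
    if (best == 0)
        best = 1;                /* no rule matched: consume one character */
    *len = best;
    *pos += best;
    return tok;
}
```

Calling `next_token` repeatedly on the text `ab12 7` yields an identifier of length 4, one unmatched space, a number, and then `TOK_EOF`.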
Lex and C are tightly coupled. A .l file (Lex source files have the .l extension) is passed through the lex utility, which generates a C output file. That file is then compiled into an executable lexical analyzer.
This program analyzes Java source programs and mainly implements the following three features:
(1) Removing comments. Java source programs have three kinds of comments: 1. single-line comments, which start with // and run to the end of the line; 2. multi-line comments, which start with /* and end with */ and can span several lines; 3. Java documentation comments, which are also multi-line comments but can be written into the program's documentation by the Java documentation generation tool; they start with /** and end with */.
(2) Estimating the workload from the number of lines.
(3) Counting the classes in the program and checking whether there are two public classes; if there are, an error is reported: a Java file cannot have two public classes.
Removing single-line comments. Because a single-line comment runs to the end of the line, the rule first matches // and then discards every character from the match to the end of the line. The implementation is as follows:
"//" {
    int c;
    /* discard everything up to the end of the line */
    while ((c = input()) != '\n' &&
           c != EOF)
    {
        ;
    }
    code = add(code, '\n');
}
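The same idea can be tried outside Lex. The following is a minimal sketch in plain C, not the article's program: the function name `strip_line_comments` is hypothetical, it works on an in-memory string instead of the Lex input stream, and it knowingly ignores string and character literals, so a `//` inside a literal is treated as a comment too.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copies src to dst, dropping every "//" comment up to (but not
 * including) the newline that ends it.
 * dst must have room for strlen(src) + 1 bytes.
 * Simplification: "//" inside string or character literals is not
 * recognised and is removed as well.
 */
void strip_line_comments(const char *src, char *dst)
{
    size_t i = 0, j = 0;

    while (src[i] != '\0') {
        if (strncmp(src + i, "//", 2) == 0) {
            /* skip to the end of the line or the end of the input */
            while (src[i] != '\0' && src[i] != '\n')
                i++;
        } else {
            dst[j++] = src[i++];
        }
    }
    dst[j] = '\0';
}
```

The newline is kept in the output, which has the same effect as the Lex action above: that action consumes the newline and then appends one to `code`.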
Removing multi-line comments. There are two kinds of multi-line comment: the ordinary multi-line comment and the Java documentation comment. Both end with */; an ordinary multi-line comment starts with /*, and a Java documentation comment starts with /**. The rule matches /* and then searches for */. To tell the two kinds apart, look at the character right after /*: if it is not *, this is an ordinary multi-line comment; if it is *, look at the next character: if that is /, the comment is still an ordinary (empty) one, otherwise it is a Java documentation comment. The Lex rule is implemented as follows:
"/*" {
    int c, ct = 0;
    char *javadoc = "/* there is a java doc comment */";
    for (;;)
    {
        while ((c = input()) != '*' &&
               c != EOF)
        {
            ct++;
        }
        if (c == '*')
        {
            c = input();
            if (c == '/') {
                ct = 0;
                break; /* found the end */
            }
            else
            {
                /* a '*' right after the opening marker, not followed by '/' */
                if (ct == 0)
                    code = strcat(code, javadoc); /* code must have room for the note */
                ct++;          /* add the note only once */
                if (c != EOF)
                    unput(c);  /* c may be the '*' of the closing marker */
            }
        }
        if (c == EOF)
        {
            printf("EOF in comment");
            break;
        }
    }
}
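The decision described above (ordinary comment, empty comment, or documentation comment) can be sketched on an in-memory string as follows. This is an illustration under stated assumptions, not the article's rule: the name `strip_block_comments` and the macro `JAVADOC_NOTE` are hypothetical, the closing marker is located with `strstr` instead of reading character by character, and string literals are not handled.

```c
#include <stddef.h>
#include <string.h>

#define JAVADOC_NOTE "/* there is a java doc comment */"

/*
 * Copies src to dst without block comments. A comment whose opening
 * marker is followed by a further '*' that is not part of the closing
 * marker counts as a documentation comment and is replaced by
 * JAVADOC_NOTE; every other block comment is dropped.
 * dst must be large enough for the result, note text included.
 * Returns 0 on success, -1 if a comment is still open at the end.
 */
int strip_block_comments(const char *src, char *dst)
{
    size_t i = 0, j = 0;

    while (src[i] != '\0') {
        if (strncmp(src + i, "/*", 2) == 0) {
            /* src[i + 3] is only read when src[i + 2] is '*', so it exists */
            int is_javadoc = src[i + 2] == '*' && src[i + 3] != '/';
            const char *end = strstr(src + i + 2, "*/");

            if (end == NULL) {
                dst[j] = '\0';
                return -1;              /* end of input inside a comment */
            }
            if (is_javadoc) {
                strcpy(dst + j, JAVADOC_NOTE);
                j += strlen(JAVADOC_NOTE);
            }
            i = (size_t) (end - src) + 2;   /* continue after the closing marker */
        } else {
            dst[j++] = src[i++];
        }
    }
    dst[j] = '\0';
    return 0;
}
```

Deciding the comment kind from the two characters after the opening marker avoids the counter used in the Lex action, at the cost of needing the whole input in memory.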
my.l contains this Lex program. Feed it a Java source program that contains comments, then type the end-of-input character. In the output, all comments have been removed, and a note is inserted wherever there was a Java documentation comment: /* there is a java doc comment */.
On closer study, the implementation above still depends too heavily on C and does not really exploit Lex's powerful pattern matching. Single-line comments, ordinary multi-line comments, and Java documentation comments can each be matched by one of the following patterns:
\/\/.*\n
\/\*[^\*\/]*\*\/
\/\*\*[^\*\/]*\*\/
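One way to get a feel for what these comment patterns accept is to try equivalent expressions with the POSIX `regcomp`/`regexec` functions. The sketch below assumes a POSIX system with `<regex.h>`; the macro names and the helper `whole_match` are hypothetical, and the expressions are POSIX extended regular expressions written to mirror the Lex patterns, not the Lex patterns themselves (for instance, `[^\n]*` replaces `.*` because `.` also matches a newline in POSIX regular expressions by default).

```c
#include <regex.h>
#include <stddef.h>

/* POSIX ERE counterparts of the three patterns, anchored to the whole text. */
#define LINE_COMMENT  "^//[^\n]*\n$"
#define BLOCK_COMMENT "^/\\*[^*/]*\\*/$"
#define DOC_COMMENT   "^/\\*\\*[^*/]*\\*/$"

/* Returns 1 if text as a whole matches pattern, 0 otherwise. */
int whole_match(const char *pattern, const char *text)
{
    regex_t re;
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;                       /* pattern did not compile */
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

Because the character class excludes both `*` and `/`, a comment such as `/* a * b */` is matched by neither block pattern, so these patterns only cover comments whose body contains neither character.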
The program can also recognize class definitions. The patterns are as follows:
public[ \n\t]+class[ \n\t]+[a-zA-Z][_a-zA-Z0-9]*[ \n\t]*\{[^\}]*\}
(public|protected|private)[ \n\t]+class[ \n\t]+[a-zA-Z][_a-zA-Z0-9]*[ \n\t]*\{[^\}]*\}
[ \n\t]*class[ \n\t]+[a-zA-Z][_a-zA-Z0-9]*[ \n\t]*\{[^\}]*\}
The improved Lex program is as follows:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *code = NULL;   /* the source text with comments removed */
int codelines = 0;   /* number of ';' seen, used as the line count */
int classnum = 0;    /* number of class definitions found */
int pubclass = 0;    /* number of public classes found */
char *classes[4] = {"", "", "", ""};
/* Append the string s to code, growing the buffer as needed */
char *append(char *code, const char *s)
{
    size_t len = (code == NULL) ? 0 : strlen(code);
    char *temp = (char *) realloc(code, len + strlen(s) + 1);
    if (temp == NULL)
        exit(1);
    strcpy(temp + len, s);
    return temp;
}
/* Add a char c to the string code */
char *add(char *code, char c)
{
    char temp[2];
    temp[0] = c;
    temp[1] = '\0';
    return append(code, temp);
}
%}
%%
\/\*[^\*\/]*\*\/ code = add(code, ' ');
\/\*\*[^\*\/]*\*\/ code = append(code, "/* there is a java doc comment */\n");
\/\/.*\n code = add(code, '\n');
public[ \n\t]+class[ \n\t]+[a-zA-Z][_a-zA-Z0-9]*[ \n\t]*\{[^\}]*\} {
    if (classnum < 4) {   /* classes[] holds at most 4 entries */
        classes[classnum] = (char *) malloc(yyleng + 1);
        strcpy(classes[classnum], yytext);
    }
    classnum++;
    code = append(code, yytext);
    pubclass++;
}
(public|protected|private)[ \n\t]+class[ \n\t]+[a-zA-Z][_a-zA-Z0-9]*[ \n\t]*\{[^\}]*\} {
    if (classnum < 4) {
        classes[classnum] = (char *) malloc(yyleng + 1);
        strcpy(classes[classnum], yytext);
    }
    classnum++;
    code = append(code, yytext);
}
[ \n\t]*class[ \n\t]+[a-zA-Z][_a-zA-Z0-9]*[ \n\t]*\{[^\}]*\} {
    if (classnum < 4) {
        classes[classnum] = (char *) malloc(yyleng + 1);
        strcpy(classes[classnum], yytext);
    }
    classnum++;
    code = append(code, yytext);
}
\n code = add(code, '\n');
. {
    if (yytext[0] == ';')
        codelines++;
    code = add(code, yytext[0]);
}
%%
int yywrap()
{
    int i = 0;
    printf("\nBelow is the code without comments:\n\n");
    printf("%s", code ? code : "");
    printf("\n\nConclusion:\nThis code has %d lines\n", codelines);
    printf("This code includes %d classes\n", classnum);
    printf("classes:\n");
    for (i = 0; i < classnum && i < 4; i++)
    {
        printf("%s", classes[i]);
        printf("\n");
    }
    if (pubclass > 1)
        printf("\nThere is an error: a Java file cannot have two public classes\n");
    code = (char *) malloc(1);
    code[0] = '\0';
    return 1; /* no more input */
}
int main()
{
    yylex();
    system("pause");
    return 0;
}
This program has two further functions: 1. It measures the size of the program as a number of lines, counted by the number of semicolons; the line count is displayed at the end of the output. 2. It checks whether the file contains more than one public class and reports an error if it does.
Note: my.l is the Lex program before the improvement, the improved program is saved in my1.l, and java.txt contains a Java source program for testing. Run lexyy.exe, paste the contents of java.txt into the program, type the end-of-input character Ctrl+Z, and press Enter to see the result; alternatively, feed java.txt to lexyy.exe at the DOS prompt, for example by redirecting it to standard input.
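The two checks mentioned above, counting semicolons as a stand-in for lines of code and counting public classes, can also be sketched in plain C without Lex. The function names `count_statements` and `count_public_classes` and the helper `skip_space` are hypothetical. The class check only looks for the word "public" followed by whitespace, "class", and more whitespace; it does not verify what precedes "public" and does not skip comments or string literals.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Counts ';' characters, used here as the measure of code lines. */
int count_statements(const char *src)
{
    int n = 0;

    for (; *src != '\0'; src++)
        if (*src == ';')
            n++;
    return n;
}

/* Length of the run of whitespace characters at the start of s. */
static size_t skip_space(const char *s)
{
    size_t n = 0;

    while (isspace((unsigned char) s[n]))
        n++;
    return n;
}

/* Counts occurrences of "public", whitespace, "class", whitespace. */
int count_public_classes(const char *src)
{
    int n = 0;
    const char *p = src;

    while ((p = strstr(p, "public")) != NULL) {
        const char *q = p + 6;          /* just past "public" */
        size_t gap = skip_space(q);

        if (gap > 0 && strncmp(q + gap, "class", 5) == 0
            && isspace((unsigned char) q[gap + 5]))
            n++;
        p += 6;
    }
    return n;
}
```

A result greater than 1 from `count_public_classes` corresponds to the error case the program reports.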