Foreword: After the last "written a simple C word scan program", the compilation principle experimental class requires a regral expression to turn NFA.
Diagram, this is nothing difficult to algorithm, introduces the Thompson's conversion algorithm in the "Compilation Principle and Practice" book, the book is THMPSON
The algorithm has made detailed text description. If you want to know more about the algorithm, the algorithm is deeper, the mechanical industry publishing house, "Compilation Principles and Practice" translation.
text:
I. Analysis:
Require a regular expression, convert it to NFA, and output the result. There are many ways to transform, using Thompson knot
The conversion method of the structure is the most simple. Thompson Structure: It uses Epsilon-conversion to "paste the regular expressive machine fragment" "
To constitute the entire expression corresponding to the entire expression. It is shown in accordance with the structure defined by the regular expression: an NFA is displayed for each basic regular expression.
Each regular expression is then connected by the operation of the connector. There are four types of regular expressions: connection, selection, closing package
, Positive closure (this program does not support this operation).
Second, the system design:
Overall design: In order to improve the portability of the program, divide it into interface, NFA map, NFA_NODE (NFA diagram point) class, Converte
R (Converter) class and a transition structure.
Program Design: Select C Builder6.0 (BCB6) as a development tool. Converted operation Similar to ordinary four arithmetic, Union operators
To "|", the operator of the Closure is "*", and the Connect connection operation in the expression is expressed as two characters connected EG: AB,
Then need to introduce a symbol to distinguish whether it is a connection operation, introduced "&" as a connection symbol, that is, the upper alteration is A & B, for this purpose, easy to program
. The priority of the three operations from high to low is: *, &, |, and the operation method is completely in accordance with the Thompson structure, so there is this
No squeezing. Refer to ordinary four operations, in the conversion operation of the regular expression, two stacks are introduced, one is an operator stack, the other is
The operation count, and the number of arithmeters is part of the local NFA map. Stack design application template, reduce unnecessary duplicate code, improve design
Abstract degree.
interface design:
http://blog.9cbs.net/images/blog_9cbs_net/zjt621/111594/o_nfainterface.jpg
Third, some of the program source code: (In order to save the switching method, the main algorithm's annotation is used in English, given the poor English,
Mainly written to yourself, if you don't understand your comments, please forgive me. ) / * The following is the event trigger section of the interface * /
/ * Do preprocessing the input string, insert "&" and "=" * / Ansistring __fastcall tform1 :: pretreatstr (ANSISTRING INREGEX) {INT i = 1; char ch, prech; inregex = INREGEX '='; Prech = INREGEX [i ]; ch = INREGEX [I];
While (INREGEX [I]! = '=') {IF ((Isalpha (Prech) && Isalpha (CH) || (isalpha (prech) && ch == '(') ||
(prech == ')' && isalpha (ch)) || (prech == '*' && isalpha (ch) || (prech == '*' && ch == ') {INREGEX.INSERT (' & ', i);} prech = INREGEX [i ]; ch = INREGEX [i];
Return INREGEX;
/ * Traversing NFA Figure * / void __fastcall tform1 :: display (nfa grap) {memo1-> lines-> add ("Convert to NFA results:"); grap.ncurrent = grap.nstart; // set current = Start And Begin Scan Memo1-> Lines-> Add (INTTOSTR (Grap.ncurrent-> StateID) "Start Status"); While (Grap.ncurrent-> nNext! = null) {if (grap.ncurrent-> thead -> TNEXT! = Null) // while have transmit {grap.ncurrent-> tcurrent = grap.ncurrent-> thead-> tnext; while (grap.ncurrent-> tcurrent-> tnext! = Null) {IF (Grap. Ncurrent-> tcurrent-> INCEPT! = '@') {
Memo1-> Lines-> Add (INTTOSTR (Grap.ncurrent-> StateID) "--->" INTSTR (Grap.ncurrent-> tcurrent->
State) "Receive:" Grap.ncurrent-> Tcurrent-> INCEPT);} else {
Memo1-> Lines-> Add (INTTOSTR (Grap.ncurrent-> StateID) "--->" INTSTR (Grap.ncurrent-> tcurrent->
State) "Receive: Epsilon");} Grap.ncurrent-> tcurrent = grap.ncurrent-> tcurrent-> tnext;} if (grap.ncurrent-> tcurrent-> INCEPT! = '@') {MEMO1- > Lines-> add (intep.ncurrent-> stateid) "->" INTOSTR (grap.ncurrent-> tcurrent->
State) "Receive:" Grap.ncurrent-> Tcurrent-> INCEPT);} else {
Memo1-> Lines-> Add (INTTOSTR (Grap.ncurrent-> StateID) "--->" INTSTR (Grap.ncurrent-> tcurrent->
State) "Receive: Epsilon");} Grap.ncurrent-> Tcurrent = grap.ncurrent-> tcurrent-> tnext;} else {memo1-> lines-> add (INTTOSTR (Grap.ncurrent-> StateID) "Termination Status");} grap.ncurrent = grap.ncurrent-> nnext;} // the last nfa_node if (grap.ncurrent-> thead-> tnext! = Null) {grap.ncurrent-> tcurrent = grap.ncurrent -> thead-> tnext; while (grap.ncurrent-> tcurrent-> tnext! = Null) {if (grap.ncurrent-> tcurrent-> INCEPT! = '@') {
Memo1-> Lines-> Add (INTTOSTR (Grap.ncurrent-> StateID) "--->" INTSTR (Grap.ncurrent-> tcurrent->
State) "Receive:" Grap.ncurrent-> Tcurrent-> INCEPT);} else {
Memo1-> Lines-> Add (INTTOSTR (Grap.ncurrent-> StateID) "--->" INTSTR (Grap.ncurrent-> tcurrent->
State) "Receive: Epsilon");} Grap.ncurrent-> tcurrent = grap.ncurrent-> tcurrent-> TNext;} if (grap.ncurrent-> tcurrent-> inspt! = '@') {
Memo1-> Lines-> Add (INTTOSTR (Grap.ncurrent-> StateID) "--->" INTSTR (Grap.ncurrent-> tcurrent->
State) "Receive:" Grap.ncurrent-> Tcurrent-> INCEPT);} else {
Memo1-> Lines-> Add (INTTOSTR (Grap.ncurrent-> StateID) "--->" INTSTR (Grap.ncurrent-> tcurrent->
State) "Receive: Epsilon");} Grap.ncurrent-> Tcurrent = grap.ncurrent-> tcurrent-> tnext;} else {memo1-> lines-> add (INTTOSTR (Grap.ncurrent-> StateID) "Termination Status");} Memo1-> Lines-> Add ("");}}
/ * The following part is the algorithm part * /
/ **************************************** * * File: Unit2.h ( Nfa.h && nfa_node.h) ** // ** The struct of nfa node and graphic ** // ** Author: zhanjiantao (Compower) ** // ************* ******************************************** / # ifndef unit2h # deflude
// NFA node, for making NFA Graphicclass NFA_Node {public: Transition * Thead; Transition * Tcurrent; Transition * Ttail; int stateID; // the state number of the NFA node NFA_Node * Nnext; // pointee of next NFA node public: NFA_Node (int stateID); // constructor // add a transition from this node to anothers void AddTransition (int NextState, char incept);}; // NFA Graphicclass NFA {public: NFA_Node * Nstart; NFA_Node * Ncurrent; NFA_Node * Nend Public: nfa (); // default, char inptor; // parameter constructor}; # Endif
/ ****************************************** ** FILE: Unit2.cpp (nfa.cpp && nfa_node.cpp) ** // ** The struct of nfa node and graphic ** // ** author: zhanjiantao (Compower) ** // ********* ************************************************ HDRSTOP # include "unit2.h" // --- -------------------------------------------------- ---------------------- # Pragma package (smart_init)
Nfa_node :: nfa_node (int stateid) {// new the head node of transition List for the nfa node.//notice :the head node doesn't Use, //just the head of transition list. This-> stateid = stateID THEAD = New Transition; THEAD-> NewTransition (0, '#'); Ttail = THEAD; NNEXT = NULL;}
void NFA_Node :: AddTransition (int NextState, char incept) {// do tail insert Tcurrent = new Transition; Tcurrent-> newTransition (NextState, incept); Ttail-> Tnext = Tcurrent; Ttail = Tcurrent;}
NFA :: NFA () {}
NFA :: NFA (int stateID, char incept) {// new two NFA node to make NFA Graphic // for basic RegularExp Ncurrent = new NFA_Node (stateID ); Ncurrent-> AddTransition (stateID, incept); Nstart = Ncurrent; Ncurrent = New nfa_node (state); nStart-> nnext = ncurrent; nend = ncurrent;}
/*********************************//**File:Unit3.cpp (STACK.CPP) * * // ** a cpp file of stack template ** // ** author: zhanjianTao (Compower) ** // **************************** *********** / # ifndef Unit3H # define unit3H // ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------- Template
Void Push (Type Item) {if (TOP == MaxSize-1) {return;} else {top ; vec [TOP] = Item;}};
TYPE POP (VOID) {TYPE TEMP; IF (Top! = - 1) {TEMP = VEC [TOP]; TOP -; RETURN TEMP;}}
Type gettop (void) {if (Top! = - 1) {Return Vec [TOP];}}
Bool ISempty (Void) {IF (TOP == - 1) Return True; Else Return False;};
Public: int top; int mapize; type * vec;}; # ENDIF
/*********************************//**File:Unit4.h (t) * * * * // ** Convert regularExp to nfa ** // ** author: zhanjianTao (Compower) ** // ************************************* ********* / # ifndef Unit4H # Define Unit4H # include "unit2.h" #include "unit3.cpp" #include
Private: NFA Connect (NFA G1, NFA G2); // Connect, Just Like The Regexp: Ab NFA Union (NFA G1, NFA G2); // Union, Just Like The Regexp: A | B NFA Closure (NFA G1) ; // Closure, Just Like the regexp: a * public: int s_id; // the state id};
#ENDIF
/*********************************//**File:Unit4.cpp (t) * * * // ** Convert regularExp to nfa ** // ** author: zhanjianTao (Compower) ** // ************************************* ********* / # pragma hdrstop # include "unit4.h" // ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ----------------------------------------------- # Pragma package (smart_init)
NFA Converter :: Tonfa (Char * Regex) {Stack
IF (ch! = '(' && ch! = ')' && ch! = '&' && ch! = '|' && ch! = '*') {NFA G3 (S_ID, CH); // Push Basic NFA graphic partg.push (g3); i ; s_id = 2;} else {switch (ch) {case '(': operaater.push (ch); i ; brek; case ')': op = operation.pop WHILE (OP! = '(') {Switch (op) {case '|': g1 = partg.pop (); g2 = partg.pop (); g2 = union (g1, g2); partg.push (G2); Break; Case '&': G1 = partg.pop (); g2 = partg.pop (); g2 = connect (g1, g2); partg.push (g2); break; case '*': G2 = partg.pop (); G2 = Closure (G2); Partg.push (G2); Break;} OP = OPERATER.POP ();} i ; break; case '|': if (! Operaater.isempty ()) {OP = Operaater.gettop (); // Test The Operator While (OP! = '(') {OP = Operaater.pop ();
Switch (OP) {case '|': g1 = partg.pop (); g2 = partg.pop (); g2 = union (g1, g2); partg.push (g2); break; case '&': g1 = Partg.POP (); g2 = partg.pop (); g2 = connection (g1, g2); partg.push (g2); Break; case '*': g2 = partg.pop (); g2 = closure G2); Partg.push (G2); Break;} if (! Operaater.isempty ()) {op = operat.gettop ();} else {Break;}}}}} OperaTer.push (CH); i ; Break; case '&': if (! operaater.isempty ()) {OP = Operaater.gettop (); if (OP == '|| op ==' * ') {OP = Operaater.pop () Switch (op) {case '&': g1 = partg.pop (); g2 = partg.pop (); g2 = connect (g1, g2); partg.push (g2); break; cas '*'
: G2 = partg.pop (); g2 = closure (g2); partg.push (g2); Break;}}} Operater.push (ch); i ; break; case '*': if (! Operaater.isempty) ()) {OP = OPERATER.GETTOP (); if (op == '*') {OP = Operaater.pop (); g2 = partg.pop (); g2 = closure (g2); partg.push (G2 );}} Operater.push (CH); i ; breaf;} // end of switch (ch)} // end of else} // end} (regex [i]! = '=') IF (Operaater .Isempty ()) // when the regexp is a Basic regexp {g2 = partg.pop ();} else {while (! Operaater.isempty ()) {OP = OperaTr.Pop (); switch (op) {casse ' | ': G1 = partg.pop (); g2 = partg.pop (); g2 = union (g1, g2); partg.push (g2); Break; Case' & ': G1 = partg.pop (); G2 = partg.pop (); g2 = connection (g1, g2); partg.push (g2); break; case '*': g2 = partg.pop (); g2 = closure (g2); partg.push ( G2); Break;}} // end of while (! Operaater.isempty ()} Return G2;} //
NFA Converter :: Connect (NFA G1, NFA G2) {// add transition in g2's end and INCEPT IS EPSION // and INCEPT G2'S End and g1's start g2.nend-> addtransition (g1.nstart-> stateid, '@ '); G2.nend-> nNext = g1.NStart; g2.nend = g1.nend; return g2;}
NFA Converter :: Union (NFA G1, NFA G2) {// New a state, and add transition: // this state to g2 's start and g1's start // instruction chars are Both Epsilon.//connect this State with g2's start, // and make this state to be g2's start. g2.ncurrent = new nfa_node (s_id ); g2.ncurrent-> addtransition (g2.nstart-> stateid, '@'); g2.ncurrent-> addtransition (g1.nstart) -> StateID, '@'); g2.ncurrent-> nNext = g2.nstart; g2.nstart = g2.ncurrent;
// new a star, and g2's end and g1's end, instruction // to this State, Incept Chars Are Both epsilon. g2.ncurrent = new nfa_node (s_id ); g2.nend-> addtransition (g2.ncurrent-> stateid, '@'); G1.nend-> addtransition (g2.ncurrent-> stateid, '@');
// Connect G1's End With this State, Make this State // TO BE G1'S End, AND THEN, CONNECT G2'S End with // G1 'START, LAST Step: Make G1's End To BE G2'S End G1.NEND-> NNEXT = G2. NCURRENT; g1.nend = g2.ncurrent; g2.nend-> nnext = g1.nstart; g2.nend = g1.nend;
Return G2;}
NFA Converter :: Closure (NFA G2) {// add transition in g2's end and inspt char is epsilon g2.nend-> addtransition (g2.nstart-> stateid, '@');
// new a star, add transition in this state = new nfa_node (s_id ); g2.ncurrent-> addtransition (g2.nstart-> stateid, '@') ; G2.ncurrent-> nnext = g2.nstart; g2.nstart = g2.ncurrent; // new a state, add transition in g2's end; // the INCEPT CHAR I EPSILON, AND Make this // state to be g2's end G2.ncurrent = new nfa_node (s_id ); g2.nend-> addtransition (g2.ncurrent-> stateid, '@'); g2.nend-> nNext = g2.ncurrent; g2.nend = g2.ncurrent;
// add transition in g2's start g2.nstart-> addtransition (g2.nend-> stateid, '@');
Return G2;}
Fourth, summary:
It is not a beautiful picture of the NFA map, so the function is removed, and it is not too intuitive text expression to output the result. (hope
Everyone gives some opinions, how to draw this picture is beautiful) has not completed the NFA to DFA.