C3 compiler
2.1 Structure
Once the language has been settled, the rest is straightforward. The next step is to implement the C3 compiler by hand; after that, we will use our automatically generated analyzers to handle other languages so that we can compare the two approaches.
Since we are building an internal representation, we need to design a set of classes to represent these things. The first one is, without doubt, Rule.
Class Rule
    Public Name As String
    Public RuleText As String
End Class
I don't think anything could make use of such a class. The result of "compiling" a rule into it is not compiled at all: it is still just the rule in text form. We need a real representation of a grammar rule! After a long search, I chose the roadmap method (in later study I found that this is the same as an approach dating from 1976).
The figure on the left is a basic element. On its own, however, this is not good enough, for the same reason as the Rule class above: it is too simple. We need something practical that carries useful information for the later compiler stages; after all, this is only a service component. We want to change it so that it stays close to C3 while moving toward a form the computer can work with. After a large number of trials and continual improvement, I arrived at the following representation scheme.
Representation of the basic roadmap elements (figures not reproduced): normal element; optional element; * closure element; + (positive) closure element.
Note: illegal exits are not drawn.
A mark in the lower right corner of an element indicates that it is optional: it may or may not match.
The match exit of a closure points back to the closure itself. As long as the input keeps matching, the closure keeps matching; only when the input no longer matches does control pass through the mismatch exit to the next element. The positive closure is simply a copy of the element placed in front of the closure, and that copy is required to match, which agrees with its mathematical meaning.
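The exit conventions described above can be sketched in a few lines. The sketch is written in Python rather than the article's VB.NET, and every name in it (`Item`, `optional`, `star`, `plus`, `ACCEPT`, `scan`) is hypothetical and chosen only for illustration; it is not the author's implementation. An exit left as `None` stands for an illegal exit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(eq=False)
class Item:
    """One roadmap element; an exit left as None is an illegal exit."""
    name: str
    test: Callable[[str], bool]          # does this element match a character?
    match: Optional["Item"] = None       # exit taken after a successful match
    mismatch: Optional["Item"] = None    # exit taken when the match fails


# Hypothetical sentinel marking the end of a rule.
ACCEPT = Item("<accept>", lambda ch: False)


def optional(name, test, nxt):
    """Optional element: both exits lead to the next element."""
    return Item(name, test, match=nxt, mismatch=nxt)


def star(name, test, nxt):
    """* closure: the match exit loops back, the mismatch exit moves on."""
    item = Item(name, test, mismatch=nxt)
    item.match = item
    return item


def plus(name, test, nxt):
    """+ closure: a mandatory copy of the element placed before a * closure."""
    return Item(name, test, match=star(name, test, nxt))


def scan(start, text):
    """Walk the roadmap; a mismatch exit is followed without consuming input."""
    item, pos = start, 0
    while item is not None and item is not ACCEPT:
        if pos < len(text) and item.test(text[pos]):
            item, pos = item.match, pos + 1
        else:
            item = item.mismatch
    return item is ACCEPT and pos == len(text)


# Illustrative rule (not taken from the article): digit* "." digit+
fraction = plus("digit", str.isdigit, ACCEPT)
dot = Item(".", lambda ch: ch == ".", match=fraction)
number = star("digit", str.isdigit, dot)

print(scan(number, "12.5"), scan(number, ".5"), scan(number, "12."))
# prints: True True False
```

The last input is rejected because the mandatory copy in front of the closure has no legal mismatch exit, which is exactly what distinguishes the + closure from the * closure.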
Below are a few rules expressed as roadmaps:
Integer => Digit
String => "" "" ("" ""! "" "" "" "" "" "
Float => Digit* "." Digit
RuleMap and RuleMapItem correspond to the roadmap and the roadmap element, respectively. RuleMap is therefore a collection class and, following .NET conventions, needs to implement the IEnumerable interface. RuleMapItem needs an optional flag, a match exit, a mismatch exit, and a name.
Class RuleMap
    Implements IEnumerable
    Public Function GetEnumerator() As IEnumerator
    Public Name As String
    Public Count As Integer
    Public Item(Index As Integer) As RuleMapItem
    Public Function Add(Name As String) As RuleMapItem
End Class
Class RuleMapItem
    Public Name As String
    Public Optional As Boolean
    Public Match As RuleMapItem
    Public Dismatch As RuleMapItem
End Class
The definition of grammar rules is now basically complete; next comes the container for rules: the section. Semantically a section is also a rule, one that stands for all of its members, that is, it matches whenever any of its internal rules matches. Section therefore inherits from Rule. As a container, it should also implement the IEnumerable interface.
Class Section
    Inherits Rule
    Implements IEnumerable
    Public Function GetEnumerator() As IEnumerator
    Public Count As Integer
    Public Sub Add(r As Rule)
    Public Sub Remove(Index As Integer)
    Public Item(Index As Integer) As Rule
End Class
Since the only difference between a Class and a Section is where it appears, an IsClass property is added to Section.
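The idea that a section is both a rule and a container of rules can be illustrated with a minimal sketch. It is written in Python rather than the article's VB.NET, and the `matches` method, which simply compares against the rule text as a literal, is a placeholder assumed here for illustration; the article does not say how an individual rule is matched.

```python
class Rule:
    def __init__(self, name, rule_text=""):
        self.name = name
        self.rule_text = rule_text

    def matches(self, text):
        # Placeholder assumption: treat the rule text as a literal string.
        return text == self.rule_text


class Section(Rule):
    """A rule that is also a container: it matches if any member matches."""

    def __init__(self, name, is_class=False):
        super().__init__(name)
        self.is_class = is_class      # mirrors the IsClass property
        self._rules = []

    def __iter__(self):               # plays the role of IEnumerable
        return iter(self._rules)

    def __len__(self):                # plays the role of Count
        return len(self._rules)

    def __getitem__(self, index):     # plays the role of Item(Index)
        return self._rules[index]

    def add(self, rule):
        self._rules.append(rule)

    def remove(self, index):
        del self._rules[index]

    def matches(self, text):
        return any(rule.matches(text) for rule in self._rules)


keywords = Section("Keywords")
keywords.add(Rule("If", "if"))
keywords.add(Rule("Then", "then"))
print(keywords.matches("then"), keywords.matches("else"))
# prints: True False
```

Because Section inherits from Rule, a section can itself be added to another section, and the outer section then matches anything its nested members match.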
2.2 Roadmap combination principles
The most important internal question about a roadmap is this: during compilation, which exits are legal and which are illegal? For example: (A | (B [C])) D
Once A has been analyzed, where does its mismatch exit go? Because A and B are alternatives, the B path is considered only when A does not match, so the mismatch exit of A points to B.
What about C? Since it is concatenated with B, it should be attached to a legal exit of B, namely the match exit. Then why doesn't the match exit of A point to C? Because A and C are not in the same expression! That is not quite right: A and B are not in the same expression either. True, but the expressions containing A and B belong to the same expression. In fact the optional C is also an expression, one that happens to contain a single element.
The most complex and troublesome part is D. It is concatenated with the preceding expression, so it should be attached to that expression's legal exits. Which exits are legal? According to C3, the mismatch exit of an optional element is also a legal exit. Should both exits of C point to D? Consider what optional means: whether or not the element matches, the next element must then be matched. In other words, both the match exit and the mismatch exit must point to the following element, that is, D.
Are we done? What if the input is a string starting with A? According to the rule, AD is acceptable. But if we scan with the roadmap built so far, only A is accepted and D is not. This is obviously wrong, so the match exit of A should also point to D.
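The wiring derived above for (A | (B [C])) D can be written out as a small table and checked against a few inputs. This is a Python sketch under stated assumptions: the `exits` table, the `ACCEPT` marker, and the `accepts` function are hypothetical names, each element matches the single character of its own name, and `None` stands for an illegal exit.

```python
ACCEPT = "<accept>"   # hypothetical marker for "the rule is complete"

# element: (match exit, mismatch exit); None is an illegal exit
exits = {
    "A": ("D", "B"),      # A matched: go to D; otherwise try the alternative B
    "B": ("C", None),     # B matched: go to the optional C; otherwise fail
    "C": ("D", "D"),      # optional: both exits lead to D
    "D": (ACCEPT, None),
}


def accepts(text, start="A"):
    state, pos = start, 0
    while state is not None and state != ACCEPT:
        on_match, on_mismatch = exits[state]
        if pos < len(text) and text[pos] == state:
            state, pos = on_match, pos + 1   # consume one character
        else:
            state = on_mismatch              # follow the exit without consuming
    return state == ACCEPT and pos == len(text)


for sample in ("AD", "BD", "BCD", "CD", "A"):
    print(sample, accepts(sample))
# prints:
# AD True
# BD True
# BCD True
# CD False
# A False
```

If the match exit of A were left unset, "AD" would be rejected, which is the error the paragraph above points out.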
However, the problem is not as simple as the discussion above suggests: how do we find all three of these legal exits? Clearly we need to find every legal exit and point each of them at the next element. This raises two questions. First, since the roadmap is a mesh (graph) structure, how do we traverse it? This is described in algorithm textbooks and is not detailed here. Second, how do we decide whether an exit is legal? A match exit is legal as long as it does not yet point to any element. The real difficulty is the mismatch exit, of which C above is a typical example; the solution is that only the mismatch exit of an optional element is legal.
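A minimal Python sketch of this step follows. It uses a plain depth-first traversal with a visited set, which is one standard way to walk a graph and not necessarily the one the author used; the names `Item`, `legal_exits`, and `connect` are hypothetical. The legality test follows the rule stated above: an unset match exit is legal, and an unset mismatch exit is legal only on an optional element.

```python
class Item:
    def __init__(self, name, optional=False):
        self.name = name
        self.optional = optional
        self.match = None      # None means "does not point to any element yet"
        self.mismatch = None


def legal_exits(start):
    """Collect (item, exit name) pairs for every legal, still unset exit."""
    found, seen, stack = [], set(), [start]
    while stack:
        item = stack.pop()
        if id(item) in seen:   # the roadmap is a graph, so guard against cycles
            continue
        seen.add(id(item))
        if item.match is None:
            found.append((item, "match"))
        else:
            stack.append(item.match)
        if item.mismatch is None:
            if item.optional:
                found.append((item, "mismatch"))
        else:
            stack.append(item.mismatch)
    return found


def connect(start, target):
    """Point every legal exit reachable from start at target."""
    for item, exit_name in legal_exits(start):
        setattr(item, exit_name, target)


# (A | (B [C])) D before D has been attached
a, b, c, d = Item("A"), Item("B"), Item("C", optional=True), Item("D")
a.mismatch = b
b.match = c

print(sorted((item.name, name) for item, name in legal_exits(a)))
# prints: [('A', 'match'), ('C', 'match'), ('C', 'mismatch')]
connect(a, d)
```

The exits are collected first and assigned afterwards, so the traversal never wanders into the element being attached.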
The framework design is now basically complete; what remains is to consider implementation issues, software engineering issues, and logical issues. First, implementation: since the classes, the interfaces, the key points, and the difficult algorithms have all been designed, what is left is filling in the code, which is relatively simple. Next, software engineering: robustness, maintainability, and extensibility. Extensibility deserves the most attention, because this experimental project is not built on mature technology, and we never know when we will discover something we forgot to add. Finally, the logic should be kept very clear.
During implementation I did not pay much attention to anything other than extensibility. In the actual implementation, I integrated the compilation code into these classes themselves. This was not a good idea: a dedicated class is still needed to handle compilation problems, and having the compilation code scattered across classes makes things more troublesome. A completely separate C3 compiler may be (relatively) more complex, but it is much more maintainable. The dedicated class mentioned above is the Grammar class, which came later. It is entirely responsible for compiling C3 and for serializing and deserializing the compiled result. For a more detailed description, see the Matching Rules documentation.
While building the analyzer we needed error messages, so this capability was added to the architecture above, with corresponding classes that generate the error information automatically. However, since the information can be generated automatically here, it can certainly also be generated automatically by the analyzer at run time, so this code is not actually of much use. To guard against future needs it has been kept since version 1.0, but from version 1.2 onward it is no longer used.
After completing the work above, I wrote C-, a simplified version of C, and made corresponding changes for .NET; see Program Test Report 1. Of course, that report was only produced after the scanner was completed; before that, testing only checked whether the functions worked and whether serialization and deserialization were correct. This shows how important test cases are: without them we cannot even trust our own code. Without tests, how do you know that it is right?