Removed first slash in [[ref:libraries/...]] wiki links.

git-svn-id: https://svn.eiffel.com/eiffel-org/trunk@1612 abb3cda0-5349-4a8f-a601-0c33ac3a8c38
This commit is contained in:
eiffel-org
2016-09-21 13:24:17 +00:00
parent 62de6c73e0
commit 952044ed1c
47 changed files with 278 additions and 278 deletions

View File

@@ -9,7 +9,7 @@ The process of recognizing the successive tokens of a text is called lexical ana
Besides recognizing the tokens, it is usually necessary to recognize the deeper syntactic structure of the text. This process is called '''parsing''' or '''syntax analysis''' and is studied in the next chapter.
Figure 1 shows the inheritance structure of the classes discussed in this chapter. Class [[ref:/libraries/parse/reference/l_interface_chart|L_INTERFACE]] has also been included although we will only study it in the [[EiffelParse Tutorial]]; it belongs to the Parse library, where it takes care of the interface between parsing and lexical analysis.
Figure 1 shows the inheritance structure of the classes discussed in this chapter. Class [[ref:libraries/parse/reference/l_interface_chart|L_INTERFACE]] has also been included although we will only study it in the [[EiffelParse Tutorial]]; it belongs to the Parse library, where it takes care of the interface between parsing and lexical analysis.
[[Image:figure1]]
@@ -34,37 +34,37 @@ The classes of the EiffelLex library make it possible to define lexical grammars
===Overview of the classes===
For the user of the EiffelLex library, the classes of most direct interest are [[ref:/libraries/lex/reference/token_chart|TOKEN]] , [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] , [[ref:/libraries/lex/reference/metalex_chart|METALEX]] and [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] .
For the user of the EiffelLex library, the classes of most direct interest are [[ref:libraries/lex/reference/token_chart|TOKEN]] , [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , [[ref:libraries/lex/reference/metalex_chart|METALEX]] and [[ref:libraries/lex/reference/scanning_chart|SCANNING]] .
An instance of [[ref:/libraries/lex/reference/token_chart|TOKEN]] describes a token read from an input file being analyzed, with such properties as the token type, the corresponding string and the position in the text (line, column) where it was found.
An instance of [[ref:libraries/lex/reference/token_chart|TOKEN]] describes a token read from an input file being analyzed, with such properties as the token type, the corresponding string and the position in the text (line, column) where it was found.
An instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] is a lexical analyzer for a certain lexical grammar. Given a reference to such an instance, say analyzer, you may analyze an input text through calls to the features of class [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] , for example:
An instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] is a lexical analyzer for a certain lexical grammar. Given a reference to such an instance, say analyzer, you may analyze an input text through calls to the features of class [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , for example:
<code>
analyzer.get_token
</code>
Class [[ref:/libraries/lex/reference/metalex_chart|METALEX]] defines facilities for building such lexical analyzers. In particular, it provides features for reading the grammar from a file and building the corresponding analyzer. Classes that need to build and use lexical analyzers may be written as descendants of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] to benefit from its general-purpose facilities.
Class [[ref:libraries/lex/reference/metalex_chart|METALEX]] defines facilities for building such lexical analyzers. In particular, it provides features for reading the grammar from a file and building the corresponding analyzer. Classes that need to build and use lexical analyzers may be written as descendants of [[ref:libraries/lex/reference/metalex_chart|METALEX]] to benefit from its general-purpose facilities.
Class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] is one such descendant of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] . It contains all the facilities needed to build an ordinary lexical analyzer and apply it to the analysis of input texts. Because these facilities are simpler to use and are in most cases sufficient, SCANNING will be discussed first; the finer-grain facilities of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] are described towards the end of this chapter.
Class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] is one such descendant of [[ref:libraries/lex/reference/metalex_chart|METALEX]] . It contains all the facilities needed to build an ordinary lexical analyzer and apply it to the analysis of input texts. Because these facilities are simpler to use and are in most cases sufficient, SCANNING will be discussed first; the finer-grain facilities of [[ref:libraries/lex/reference/metalex_chart|METALEX]] are described towards the end of this chapter.
These classes internally rely on others, some of which may be useful for more advanced applications. [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , one of the supporting classes, will be introduced after [[ref:/libraries/lex/reference/metalex_chart|METALEX]] .
These classes internally rely on others, some of which may be useful for more advanced applications. [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , one of the supporting classes, will be introduced after [[ref:libraries/lex/reference/metalex_chart|METALEX]] .
===Library example===
The EiffelStudio delivery includes (in the examples/library/lex subdirectory) a simple example using the EiffelLex library classes. The example applies EiffelLex library facilities to the analysis of a language which is none other than Eiffel itself.
The root class of that example, <eiffel>EIFFEL_SCAN</eiffel>, is only a few lines long; it relies on the general mechanism provided by [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] (see below). The actual lexical grammar is given by a lexical grammar file (a concept explained below): the file of name eiffel_regular in the same directory.
The root class of that example, <eiffel>EIFFEL_SCAN</eiffel>, is only a few lines long; it relies on the general mechanism provided by [[ref:libraries/lex/reference/scanning_chart|SCANNING]] (see below). The actual lexical grammar is given by a lexical grammar file (a concept explained below): the file of name eiffel_regular in the same directory.
===Dealing with finite automata===
Lexical analysis relies on the theory of finite automata. The most advanced of the classes discussed in this chapter, [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , relies on classes describing various forms of automata:
* [[ref:/libraries/lex/reference/dfa_chart|DFA]] : deterministic finite automata.
* [[ref:/libraries/lex/reference/pdfa_chart|PDFA]] : partially deterministic finite automata.
* [[ref:/libraries/lex/reference/ndfa_chart|NDFA]] : non-deterministic finite automata.
* [[ref:/libraries/lex/reference/automaton_chart|AUTOMATON]] , the most general: finite automata.
* [[ref:/libraries/lex/reference/fixed_automaton_chart|FIXED_AUTOMATON]] , [[ref:/libraries/lex/reference/linked_automaton_chart|LINKED_AUTOMATON]] .
Lexical analysis relies on the theory of finite automata. The most advanced of the classes discussed in this chapter, [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , relies on classes describing various forms of automata:
* [[ref:libraries/lex/reference/dfa_chart|DFA]] : deterministic finite automata.
* [[ref:libraries/lex/reference/pdfa_chart|PDFA]] : partially deterministic finite automata.
* [[ref:libraries/lex/reference/ndfa_chart|NDFA]] : non-deterministic finite automata.
* [[ref:libraries/lex/reference/automaton_chart|AUTOMATON]] , the most general: finite automata.
* [[ref:libraries/lex/reference/fixed_automaton_chart|FIXED_AUTOMATON]] , [[ref:libraries/lex/reference/linked_automaton_chart|LINKED_AUTOMATON]] .
These classes may also be useful for systems that need to manipulate finite automata for applications other than lexical analysis. The interface of [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , which includes the features from [[ref:/libraries/lex/reference/automaton_chart|AUTOMATON]] , [[ref:/libraries/lex/reference/ndfa_chart|NDFA]] and [[ref:/libraries/lex/reference/pdfa_chart|PDFA]] , will provide the essential information.
These classes may also be useful for systems that need to manipulate finite automata for applications other than lexical analysis. The interface of [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , which includes the features from [[ref:libraries/lex/reference/automaton_chart|AUTOMATON]] , [[ref:libraries/lex/reference/ndfa_chart|NDFA]] and [[ref:libraries/lex/reference/pdfa_chart|PDFA]] , will provide the essential information.
==TOKENS==
@@ -78,24 +78,24 @@ A lexical analyzer built through any of the techniques described in the rest of
==BUILDING AND USING LEXICAL ANALYZERS==
The general method for performing lexical analysis is the following.
# Create an instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] , giving a lexical analyzer for the desired grammar.
# Create an instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , giving a lexical analyzer for the desired grammar.
# Store the analyzer into a file.
# Retrieve the analyzer from the file.
# Use the analyzer to analyze the tokens of one or more input texts by calling the various features of class [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] on this object.
# Use the analyzer to analyze the tokens of one or more input texts by calling the various features of class [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] on this object.
Steps 2 and 3 are obviously unnecessary if this process is applied as a single sequence. But in almost all practical cases you will want to use the same grammar to analyze many different input texts. Then steps 1 and 2 will be performed once and for all as soon as the lexical grammar is known, yielding an instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] that step 2 stores into a file; then in every case that requires analyzing a text you will simply retrieve the analyzer and apply it, performing steps 3 and 4 only.
Steps 2 and 3 are obviously unnecessary if this process is applied as a single sequence. But in almost all practical cases you will want to use the same grammar to analyze many different input texts. Then steps 1 and 2 will be performed once and for all as soon as the lexical grammar is known, yielding an instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] that step 2 stores into a file; then in every case that requires analyzing a text you will simply retrieve the analyzer and apply it, performing steps 3 and 4 only.
The simplest way to store and retrieve the instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] and all related objects is to use the facilities of class [[ref:/libraries/base/reference/storable_chart|STORABLE]] : procedure store and one of the retrieval procedures. To facilitate this process, [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] inherits from [[ref:/libraries/base/reference/storable_chart|STORABLE]] .
The simplest way to store and retrieve the instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] and all related objects is to use the facilities of class [[ref:libraries/base/reference/storable_chart|STORABLE]] : procedure store and one of the retrieval procedures. To facilitate this process, [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] inherits from [[ref:libraries/base/reference/storable_chart|STORABLE]] .
The next sections explain how to perform these various steps. In the most common case, the best technique is to inherit from class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , which provides a framework for retrieving an analyzer file if it exists, creating it from a grammar description otherwise, and proceed with the lexical analysis of one or more input texts.
The next sections explain how to perform these various steps. In the most common case, the best technique is to inherit from class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , which provides a framework for retrieving an analyzer file if it exists, creating it from a grammar description otherwise, and proceed with the lexical analysis of one or more input texts.
==LEXICAL GRAMMAR FILES AND CLASS SCANNING==
Class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] may be used as an ancestor by classes that need to perform lexical analysis. When using [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] you will need a '''lexical grammar file''' that contains the description of the lexical grammar. Since it is easy to edit and adapt a file without modifying the software proper, this technique provides flexibility and supports the incremental development and testing of lexical analyzers.
Class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] may be used as an ancestor by classes that need to perform lexical analysis. When using [[ref:libraries/lex/reference/scanning_chart|SCANNING]] you will need a '''lexical grammar file''' that contains the description of the lexical grammar. Since it is easy to edit and adapt a file without modifying the software proper, this technique provides flexibility and supports the incremental development and testing of lexical analyzers.
===The build procedure===
To obtain a lexical analyzer in a descendant of class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , use the procedure
To obtain a lexical analyzer in a descendant of class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , use the procedure
<code>
build (store_file_name, grammar_file_name: STRING)</code>
@@ -146,7 +146,7 @@ This will read in and process successive input tokens. Procedure <eiffel>analyze
The initial action <eiffel>begin_analysis</eiffel>, which by default prints a header, and the terminal action <eiffel>end_analysis</eiffel>, which by default does nothing, may also be redefined.
To build lexical analyzers which provide a higher degree of flexibility, use [[ref:/libraries/lex/reference/metalex_chart|METALEX]] or [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , as described in the last part of this chapter.
To build lexical analyzers which provide a higher degree of flexibility, use [[ref:libraries/lex/reference/metalex_chart|METALEX]] or [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , as described in the last part of this chapter.
==ANALYZING INPUT==
@@ -154,9 +154,9 @@ Let us look more precisely at how we can use a lexical analyzer to analyze an in
===Class LEXICAL===
Procedure <eiffel>analyze</eiffel> takes care of the most common needs of lexical analysis. But if you need more advanced lexical analysis facilities you will need an instance of class [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] (a direct instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] itself or of one of its proper descendants). If you are using class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] as described above, you will have access to such an instance through the attribute <eiffel>analyzer</eiffel>.
Procedure <eiffel>analyze</eiffel> takes care of the most common needs of lexical analysis. But if you need more advanced lexical analysis facilities you will need an instance of class [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] (a direct instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] itself or of one of its proper descendants). If you are using class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] as described above, you will have access to such an instance through the attribute <eiffel>analyzer</eiffel>.
This discussion will indeed assume that you have an entity attached to an instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] . The name of that entity is assumed to be <eiffel>analyzer</eiffel>, although it does not need to be the attribute from [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] . You can apply to that <eiffel>analyzer</eiffel> the various exported features features of class [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] , explained below. All the calls described below should use <eiffel>analyzer</eiffel> as their target, as in
This discussion will indeed assume that you have an entity attached to an instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] . The name of that entity is assumed to be <eiffel>analyzer</eiffel>, although it does not need to be the attribute from [[ref:libraries/lex/reference/scanning_chart|SCANNING]] . You can apply to that <eiffel>analyzer</eiffel> the various exported features features of class [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , explained below. All the calls described below should use <eiffel>analyzer</eiffel> as their target, as in
<code>
analyzer.set_file ("my_file_name")
</code>
@@ -168,12 +168,12 @@ To create a new analyzer, use
create analyzer.make_new
</code>
You may also retrieve an analyzer from a previous session. [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] is a descendant from [[ref:/libraries/base/reference/storable_chart|STORABLE]] , so you can use feature retrieved for that purpose. In a descendant of [[ref:/libraries/base/reference/storable_chart|STORABLE]] , simply write
You may also retrieve an analyzer from a previous session. [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] is a descendant from [[ref:libraries/base/reference/storable_chart|STORABLE]] , so you can use feature retrieved for that purpose. In a descendant of [[ref:libraries/base/reference/storable_chart|STORABLE]] , simply write
<code>
analyzer ?= retrieved
</code>
If you do not want to make the class a descendant of [[ref:/libraries/base/reference/storable_chart|STORABLE]] , use the creation procedure <eiffel>make</eiffel> of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , not to be confused with <eiffel>make_new</eiffel> above:
If you do not want to make the class a descendant of [[ref:libraries/base/reference/storable_chart|STORABLE]] , use the creation procedure <eiffel>make</eiffel> of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , not to be confused with <eiffel>make_new</eiffel> above:
<code>
create analyzer.make
analyzer ?= analyzer.retrieved
@@ -183,12 +183,12 @@ If you do not want to make the class a descendant of [[ref:/libraries/base/refer
To analyze a text, call <eiffel>set_file</eiffel> or <eiffel>set_string</eiffel> to specify the document to be parsed. With the first call, the analysis will be applied to a file; with the second, to a string.
{{note|if you use procedure <eiffel>analyze</eiffel> of [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , you do not need any such call, since <eiffel>analyze</eiffel> calls <eiffel>set_file</eiffel> on the file name passed as argument. }}
{{note|if you use procedure <eiffel>analyze</eiffel> of [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , you do not need any such call, since <eiffel>analyze</eiffel> calls <eiffel>set_file</eiffel> on the file name passed as argument. }}
===Obtaining the tokens===
The basic procedure for analyzing successive tokens in the text is <eiffel>get_token</eiffel>, which reads in one token and sets up various attributes of the analyzer to record properties of that token:
* <eiffel>last_token</eiffel>, a function of type [[ref:/libraries/lex/reference/token_chart|TOKEN]] , which provides all necessary information on the last token read.
* <eiffel>last_token</eiffel>, a function of type [[ref:libraries/lex/reference/token_chart|TOKEN]] , which provides all necessary information on the last token read.
* <eiffel>token_line_number</eiffel> and<eiffel> token_column_number</eiffel>, to know where the token is in the text. These queries return results of type <eiffel>INTEGER</eiffel>.
* <eiffel>token_type</eiffel>, giving the regular expression type, identified by its integer number (which is the value <eiffel>No_token</eiffel> if no correct token was recognized).
* <eiffel>other_possible_tokens</eiffel>, an array giving all the other possible token types of the last token. (If <eiffel>token_type</eiffel> is <eiffel>No_token</eiffel> the array is empty.)
@@ -218,13 +218,13 @@ Here is the most common way of using the preceding facilities:
end_analysis
</code>
This scheme is used by procedure <eiffel>analyze</eiffel> of class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , so that in standard cases you may simply inherit from that class and redefine procedures <eiffel>begin_analysis</eiffel>, <eiffel>do_a_token</eiffel>, and <eiffel>end_analysis</eiffel>. If you are not inheriting from [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , these names simply denote procedures that you must provide.
This scheme is used by procedure <eiffel>analyze</eiffel> of class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , so that in standard cases you may simply inherit from that class and redefine procedures <eiffel>begin_analysis</eiffel>, <eiffel>do_a_token</eiffel>, and <eiffel>end_analysis</eiffel>. If you are not inheriting from [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , these names simply denote procedures that you must provide.
==REGULAR EXPRESSIONS==
The EiffelLex library supports a powerful set of construction mechanisms for describing the various types of tokens present in common languages such as programming languages, specification languages or just text formats. These mechanisms are called '''regular expressions'''; any regular expression describes a set of possible tokens, called the '''specimens''' of the regular expression.
Let us now study the format of regular expressions. This format is used in particular for the lexical grammar files needed by class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] and (as seen below) by procedure <eiffel>read_grammar</eiffel> of class [[ref:/libraries/lex/reference/metalex_chart|METALEX]] . The ''eiffel_regular'' grammar file in the examples directory provides an extensive example.
Let us now study the format of regular expressions. This format is used in particular for the lexical grammar files needed by class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] and (as seen below) by procedure <eiffel>read_grammar</eiffel> of class [[ref:libraries/lex/reference/metalex_chart|METALEX]] . The ''eiffel_regular'' grammar file in the examples directory provides an extensive example.
Each regular expression denotes a set of tokens. For example, the first regular expression seen above, <br/>
@@ -401,7 +401,7 @@ BOOLEAN
{{caution|Every keyword in the keyword section must be a specimen of one of the token types defined for the grammar, and that token type must be the last one defined in the lexical grammar file, just before the '''Keywords''' line. So in Eiffel where the keywords have the same lexical structure as identifiers, the last line before the keywords must be the definition of the token type ''Identifier'', as shown above. }}
{{note|The rule that all keywords must be specimens of one token type is a matter of convenience and simplicity, and only applies if you are using SCANNING and lexical grammar files. There is no such restriction if you rely directly on the more general facilities provided by [[ref:/libraries/lex/reference/metalex_chart|METALEX]] or [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] . Then different keywords may be specimens of different regular expressions; you will have to specify the token type of every keyword, as explained later in this chapter. }}
{{note|The rule that all keywords must be specimens of one token type is a matter of convenience and simplicity, and only applies if you are using SCANNING and lexical grammar files. There is no such restriction if you rely directly on the more general facilities provided by [[ref:libraries/lex/reference/metalex_chart|METALEX]] or [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] . Then different keywords may be specimens of different regular expressions; you will have to specify the token type of every keyword, as explained later in this chapter. }}
===Case sensitivity===
@@ -433,19 +433,19 @@ The inverse call, corresponding to the default rule, is
keywords_ignore_case
</code>
Either of these calls must be executed before you define any keywords; if you are using [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , this means before calling procedure build. Once set, the keyword case-sensitivity policy cannot be changed.
Either of these calls must be executed before you define any keywords; if you are using [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , this means before calling procedure build. Once set, the keyword case-sensitivity policy cannot be changed.
==USING METALEX TO BUILD A LEXICAL ANALYZER==
(You may skip the rest of this chapter if you only need simple lexical facilities.)
Class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , as studied above, relies on a class [[ref:/libraries/lex/reference/metalex_chart|METALEX]] . In some cases, you may prefer to use the features of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] directly. Since [[ref:libraries/lex/reference/scanning_chart|SCANNING]] inherits from [[ref:/libraries/lex/reference/metalex_chart|METALEX]] , anything you do with [[ref:/libraries/lex/reference/metalex_chart|METALEX]] can in fact be done with [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , but you may wish to stay with just [[ref:/libraries/lex/reference/metalex_chart|METALEX]] if you do not need the additional features of [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] .
Class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , as studied above, relies on a class [[ref:libraries/lex/reference/metalex_chart|METALEX]] . In some cases, you may prefer to use the features of [[ref:libraries/lex/reference/metalex_chart|METALEX]] directly. Since [[ref:libraries/lex/reference/scanning_chart|SCANNING]] inherits from [[ref:libraries/lex/reference/metalex_chart|METALEX]] , anything you do with [[ref:libraries/lex/reference/metalex_chart|METALEX]] can in fact be done with [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , but you may wish to stay with just [[ref:libraries/lex/reference/metalex_chart|METALEX]] if you do not need the additional features of [[ref:libraries/lex/reference/scanning_chart|SCANNING]] .
===Steps in using METALEX===
[[ref:/libraries/lex/reference/metalex_chart|METALEX]] has an attribute analyzer which will be attached to a lexical analyzer. This class provides tools for building a lexical analyzer incrementally through explicit feature calls; you can still use a lexical grammar file, but do not have to.
[[ref:libraries/lex/reference/metalex_chart|METALEX]] has an attribute analyzer which will be attached to a lexical analyzer. This class provides tools for building a lexical analyzer incrementally through explicit feature calls; you can still use a lexical grammar file, but do not have to.
The following extract from a typical descendant of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] illustrates the process of building a lexical analyzer in this way:
The following extract from a typical descendant of [[ref:libraries/lex/reference/metalex_chart|METALEX]] illustrates the process of building a lexical analyzer in this way:
<code>
Upper_identifier, Lower_identifier, Decimal_constant, Octal_constant, Word: INTEGER is unique
@@ -467,13 +467,13 @@ The following extract from a typical descendant of [[ref:/libraries/lex/referenc
make_analyzer
</code>
This example follows the general scheme of building a lexical analyzer with the features of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] , in a class that will normally be a descendant of [[ref:libraries/lex/reference/metalex_chart|METALEX]] :
This example follows the general scheme of building a lexical analyzer with the features of [[ref:libraries/lex/reference/metalex_chart|METALEX]] , in a class that will normally be a descendant of [[ref:libraries/lex/reference/metalex_chart|METALEX]] :
# Set options, such as case sensitivity.
# Record regular expressions.
# Record keywords (this may be interleaved with step 2.)
# "Freeze" the analyzer by a call to <eiffel>make_analyzer</eiffel>.
To perform steps 2 to 4 in a single shot and generate a lexical analyzer from a lexical grammar file, as with [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , you may use the procedure
To perform steps 2 to 4 in a single shot and generate a lexical analyzer from a lexical grammar file, as with [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , you may use the procedure
<code>
read_grammar (grammar_file_name: STRING)
</code>
@@ -492,7 +492,7 @@ Procedure <eiffel>dollar_w</eiffel> corresponds to the '''$W''' syntax for regul
put_nameless_expression ( "$W" ,Word )
</code>
Procedure <eiffel>declare_keyword</eiffel> records a keyword. The first argument is a string containing the keyword; the second argument is the regular expression of which the keyword must be a specimen. The example shows that here - in contrast with the rule enforced by [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] - not all keywords need be specimens of the same regular expression.
Procedure <eiffel>declare_keyword</eiffel> records a keyword. The first argument is a string containing the keyword; the second argument is the regular expression of which the keyword must be a specimen. The example shows that here - in contrast with the rule enforced by [[ref:libraries/lex/reference/scanning_chart|SCANNING]] - not all keywords need be specimens of the same regular expression.
The calls seen so far record a number of regular expressions and keywords, but do not give us a lexical analyzer yet. To obtain a usable lexical analyzer, you must call
<code>
@@ -503,17 +503,17 @@ After that call, you may not record any new regular expression or keyword. The a
{{note|for readers knowledgeable in the theory of lexical analysis: one of the most important effects of the call to <eiffel>make_analyzer</eiffel> is to transform the non-deterministic finite automaton resulting from calls such as the ones above into a deterministic finite automaton. }}
Remember that if you use procedure <eiffel>read_grammar</eiffel>, you need not worry about <eiffel>make_analyzer</eiffel>, as the former procedure calls the latter.
Another important feature of class [[ref:/libraries/lex/reference/metalex_chart|METALEX]] is procedure <eiffel>store_analyzer</eiffel>, which stores the analyzer into a file whose name is passed as argument, for use by later lexical analysis sessions. To retrieve the analyzer, simply use procedure <eiffel>retrieve_analyzer</eiffel>, again with a file name as argument.
Another important feature of class [[ref:libraries/lex/reference/metalex_chart|METALEX]] is procedure <eiffel>store_analyzer</eiffel>, which stores the analyzer into a file whose name is passed as argument, for use by later lexical analysis sessions. To retrieve the analyzer, simply use procedure <eiffel>retrieve_analyzer</eiffel>, again with a file name as argument.
==BUILDING A LEXICAL ANALYZER WITH LEX_BUILDER==
To have access to the most general set of lexical analysis mechanisms, you may use class [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , which gives you an even finer grain of control than [[ref:/libraries/lex/reference/metalex_chart|METALEX]] . This is not necessary in simple applications.
To have access to the most general set of lexical analysis mechanisms, you may use class [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , which gives you an even finer grain of control than [[ref:libraries/lex/reference/metalex_chart|METALEX]] . This is not necessary in simple applications.
===Building a lexical analyzer===
[[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] enables you to build a lexical analyzer by describing successive token types and keywords. This is normally done in a descendant of [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] . For each token type, you call a procedure that builds an object, or '''tool''', representing the associated regular expression.
[[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] enables you to build a lexical analyzer by describing successive token types and keywords. This is normally done in a descendant of [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] . For each token type, you call a procedure that builds an object, or '''tool''', representing the associated regular expression.
For the complete list of available procedures, refer to the flat-short form of the class; there is one procedure for every category of regular expression studied earlier in this chapter. Two typical examples of calls are:
<code>
@@ -527,7 +527,7 @@ For the complete list of available procedures, refer to the flat-short form of t
Every such procedure call also assigns an integer index to the tool it creates; this number is available through the attribute <eiffel>last_created_tool</eiffel>. You will need to record it into an integer entity, for example <eiffel>Identifier</eiffel> or <eiffel>Letter</eiffel>.
===An example===
The following extract from a typical descendant of [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] illustrates how to create a tool representing the identifiers of an Eiffel-like language.
The following extract from a typical descendant of [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] illustrates how to create a tool representing the identifiers of an Eiffel-like language.
<code>
Identifier, Letter, Digit, Underlined, Suffix, Suffix_list: INTEGER

View File

@@ -66,14 +66,14 @@ As noted at the beginning of this chapter, it is possible to build a single synt
==LIBRARY CLASSES==
The EiffelParse library contains a small number of classes which cover common document processing applications. The classes, whose inheritance structure was shown at the beginning of this chapter, are:
* [[ref:/libraries/parse/reference/construct_chart|CONSTRUCT]] , describing the general notion of syntactical construct.
* [[ref:/libraries/parse/reference/aggregate_chart|AGGREGATE]] , describing constructs of the "aggregate" form.
* [[ref:/libraries/parse/reference/choice_chart|CHOICE]] , describing constructs of the "choice" form.
* [[ref:/libraries/parse/reference/repetition_chart|REPETITION]] , describing constructs of the "repetition" form.
* [[ref:/libraries/parse/reference/terminal_chart|TERMINAL]] , describing "terminal" constructs with no further structure.
* [[ref:/libraries/parse/reference/keyword_chart|KEYWORD]] , describing how to handle keywords.
* [[ref:/libraries/parse/reference/l_interface_chart|L_INTERFACE]] , providing a simple interface with the lexical analysis process and the Lex library.
* [[ref:/libraries/parse/reference/input_chart|INPUT]] , describing how to handle the input document.
* [[ref:libraries/parse/reference/construct_chart|CONSTRUCT]] , describing the general notion of syntactical construct.
* [[ref:libraries/parse/reference/aggregate_chart|AGGREGATE]] , describing constructs of the "aggregate" form.
* [[ref:libraries/parse/reference/choice_chart|CHOICE]] , describing constructs of the "choice" form.
* [[ref:libraries/parse/reference/repetition_chart|REPETITION]] , describing constructs of the "repetition" form.
* [[ref:libraries/parse/reference/terminal_chart|TERMINAL]] , describing "terminal" constructs with no further structure.
* [[ref:libraries/parse/reference/keyword_chart|KEYWORD]] , describing how to handle keywords.
* [[ref:libraries/parse/reference/l_interface_chart|L_INTERFACE]] , providing a simple interface with the lexical analysis process and the Lex library.
* [[ref:libraries/parse/reference/input_chart|INPUT]] , describing how to handle the input document.
==EXAMPLES==
The EiffelStudio delivery includes (in the examples/library/parse subdirectory) a simple example using the EiffelParse Library classes. The example is a processor for "documents" which describe computations involving polynomials with variables. The corresponding processor is a system which obtains polynomial specifications and variable values from a user, and computes the corresponding polynomials.
@@ -100,7 +100,7 @@ Although some notations for syntax descriptions such as BNF allow more than one
* A terminal construct has no defining production. This means that it must be defined outside of the syntactical grammar. Terminals indeed come from the '''lexical grammar'''. Every terminal construct corresponds to a token type (regular expression or keyword) of the lexical grammar, for which the parsing duty will be delegated to lexical mechanisms, assumed in the rest of this chapter to be provided by the Lex library although others may be substituted if appropriate.
All specimens of terminal constructs are instances of class [[ref:/libraries/parse/reference/terminal_chart|TERMINAL]] . A special case is that of keyword constructs, which have a single specimen corresponding to a keyword of the language. For example, <code>if</code> is a keyword of Eiffel. Keywords are described by class [[ref:/libraries/parse/reference/keyword_chart|KEYWORD]] , an heir of [[ref:/libraries/parse/reference/terminal_chart|TERMINAL]] .
All specimens of terminal constructs are instances of class [[ref:libraries/parse/reference/terminal_chart|TERMINAL]] . A special case is that of keyword constructs, which have a single specimen corresponding to a keyword of the language. For example, <code>if</code> is a keyword of Eiffel. Keywords are described by class [[ref:libraries/parse/reference/keyword_chart|KEYWORD]] , an heir of [[ref:libraries/parse/reference/terminal_chart|TERMINAL]] .
The rest of this section concentrates on the parsing-specific part: non-terminal constructs and productions. Terminals will be studied in the discussion of how to interface parsing with lexical analysis.
@@ -158,9 +158,9 @@ The EiffelParse library supports a parsing mechanism based on the concepts of ob
===Class CONSTRUCT===
The deferred class [[ref:/libraries/parse/reference/construct_chart|CONSTRUCT]] describes the general notion of construct; instances of this class and its descendants are specimens of the constructs of a grammar.
The deferred class [[ref:libraries/parse/reference/construct_chart|CONSTRUCT]] describes the general notion of construct; instances of this class and its descendants are specimens of the constructs of a grammar.
Deferred though it may be, [[ref:/libraries/parse/reference/construct_chart|CONSTRUCT]] defines some useful general patterns; for example, its procedure process appears as: <br/>
Deferred though it may be, [[ref:libraries/parse/reference/construct_chart|CONSTRUCT]] defines some useful general patterns; for example, its procedure process appears as: <br/>
<code>
parse
if parsed then
@@ -170,11 +170,11 @@ Deferred though it may be, [[ref:/libraries/parse/reference/construct_chart|CONS
<br/>
where procedures <eiffel>parse</eiffel> and <eiffel>semantics</eiffel> are expressed in terms of some more specific procedures, which are deferred. This defines a general scheme while leaving the details to descendants of the class.
Such descendants, given in the library, are classes [[ref:/libraries/parse/reference/aggregate_chart|AGGREGATE]] , [[ref:/libraries/parse/reference/choice_chart|CHOICE]] , [[ref:/libraries/parse/reference/repetition_chart|REPETITION]] and [[ref:/libraries/parse/reference/terminal_chart|TERMINAL]] . They describe the corresponding types of construct, with features providing the operations for parsing their specimens and applying the associated semantic actions.
Such descendants, given in the library, are classes [[ref:libraries/parse/reference/aggregate_chart|AGGREGATE]] , [[ref:libraries/parse/reference/choice_chart|CHOICE]] , [[ref:libraries/parse/reference/repetition_chart|REPETITION]] and [[ref:libraries/parse/reference/terminal_chart|TERMINAL]] . They describe the corresponding types of construct, with features providing the operations for parsing their specimens and applying the associated semantic actions.
===Building a processor===
To build a processor for a given grammar, you write a class, called a '''construct class''', for every construct of the grammar, terminal or non-terminal. The class should inherit from [[ref:/libraries/parse/reference/aggregate_chart|AGGREGATE]] , [[ref:/libraries/parse/reference/choice_chart|CHOICE]] , [[ref:/libraries/parse/reference/repetition_chart|REPETITION]] or [[ref:/libraries/parse/reference/terminal_chart|TERMINAL]] depending on the nature of the construct. It describes the production for the construct and any associated semantic actions.
To build a processor for a given grammar, you write a class, called a '''construct class''', for every construct of the grammar, terminal or non-terminal. The class should inherit from [[ref:libraries/parse/reference/aggregate_chart|AGGREGATE]] , [[ref:libraries/parse/reference/choice_chart|CHOICE]] , [[ref:libraries/parse/reference/repetition_chart|REPETITION]] or [[ref:libraries/parse/reference/terminal_chart|TERMINAL]] depending on the nature of the construct. It describes the production for the construct and any associated semantic actions.
To complete the processor, you must choose a "top construct" for that particular processor, and write a root class. In accordance with the object-oriented method, which implies that "roots" and "tops" should be chosen last, these steps are explained at the end of this chapter.
@@ -186,9 +186,9 @@ The effect of processing a document with a processor built from a combination of
The syntax tree is said to be abstract because it only includes important structural information and does not retain the concrete information such as keywords and separators. Such concrete information, sometimes called "syntactic sugar", serves only external purposes but is of no use for semantic processing.
The combination of Eiffel techniques and libraries yields a very simple approach to building and processing abstract syntax trees. Class [[ref:/libraries/parse/reference/construct_chart|CONSTRUCT]] is a descendant of the Data Structure Library class [[ref:/libraries/base/reference/two_way_tree_chart|TWO_WAY_TREE]] , describing a versatile implementation of trees; so, as a consequence, are [[ref:/libraries/parse/reference/construct_chart|CONSTRUCT's]] own descendants. The effect of parsing any specimen of a construct is therefore to create an instance of the corresponding construct class. This instance is (among other things) a tree node, and is automatically inserted at its right place in the abstract syntax tree.
The combination of Eiffel techniques and libraries yields a very simple approach to building and processing abstract syntax trees. Class [[ref:libraries/parse/reference/construct_chart|CONSTRUCT]] is a descendant of the Data Structure Library class [[ref:libraries/base/reference/two_way_tree_chart|TWO_WAY_TREE]] , describing a versatile implementation of trees; so, as a consequence, are [[ref:libraries/parse/reference/construct_chart|CONSTRUCT's]] own descendants. The effect of parsing any specimen of a construct is therefore to create an instance of the corresponding construct class. This instance is (among other things) a tree node, and is automatically inserted at its right place in the abstract syntax tree.
As noted in the discussion of trees, class [[ref:/libraries/base/reference/two_way_tree_chart|TWO_WAY_TREE]] makes no formal distinction between the notions of tree and tree node. So you may identify the abstract syntax tree with the object (instance of [[ref:/libraries/parse/reference/construct_chart|CONSTRUCT]] ) representing the topmost construct specimen in the structure of the document being analyzed.
As noted in the discussion of trees, class [[ref:libraries/base/reference/two_way_tree_chart|TWO_WAY_TREE]] makes no formal distinction between the notions of tree and tree node. So you may identify the abstract syntax tree with the object (instance of [[ref:libraries/parse/reference/construct_chart|CONSTRUCT]] ) representing the topmost construct specimen in the structure of the document being analyzed.
===The production function===
@@ -200,9 +200,9 @@ A construct class describes the syntax of a given construct through a function c
end
</code>
Function production remains deferred in classes [[ref:/libraries/parse/reference/aggregate_chart|AGGREGATE]] , [[ref:/libraries/parse/reference/choice_chart|CHOICE]] and [[ref:/libraries/parse/reference/repetition_chart|REPETITION]] . Every effective construct class that you write must provide an effecting of that function. It is important for the efficiency of the parsing process that every effective version of production be a <eiffel>once</eiffel> function. Several examples of such effectings are given below.
Function production remains deferred in classes [[ref:libraries/parse/reference/aggregate_chart|AGGREGATE]] , [[ref:libraries/parse/reference/choice_chart|CHOICE]] and [[ref:libraries/parse/reference/repetition_chart|REPETITION]] . Every effective construct class that you write must provide an effecting of that function. It is important for the efficiency of the parsing process that every effective version of production be a <eiffel>once</eiffel> function. Several examples of such effectings are given below.
Classes [[ref:/libraries/parse/reference/aggregate_chart|AGGREGATE]] , [[ref:/libraries/parse/reference/choice_chart|CHOICE]] , [[ref:/libraries/parse/reference/repetition_chart|REPETITION]] and [[ref:/libraries/parse/reference/terminal_chart|TERMINAL]] also have a deferred function <eiffel>construct_name</eiffel> of type STRING, useful for tracing and debugging. This function should be effected in every construct class to return the string name of the construct, such as "INSTRUCTION" or "CLASS" for construct classes in a grammar of Eiffel. For efficiency reasons, the <eiffel>construct_name</eiffel> function should also be a <eiffel>once</eiffel> function. The form of such a function will always be the same, as illustrated by the following example which may appear in the construct class <eiffel>INSTRUCTION</eiffel> in a processor for Eiffel:
Classes [[ref:libraries/parse/reference/aggregate_chart|AGGREGATE]] , [[ref:libraries/parse/reference/choice_chart|CHOICE]] , [[ref:libraries/parse/reference/repetition_chart|REPETITION]] and [[ref:libraries/parse/reference/terminal_chart|TERMINAL]] also have a deferred function <eiffel>construct_name</eiffel> of type STRING, useful for tracing and debugging. This function should be effected in every construct class to return the string name of the construct, such as "INSTRUCTION" or "CLASS" for construct classes in a grammar of Eiffel. For efficiency reasons, the <eiffel>construct_name</eiffel> function should also be a <eiffel>once</eiffel> function. The form of such a function will always be the same, as illustrated by the following example which may appear in the construct class <eiffel>INSTRUCTION</eiffel> in a processor for Eiffel:
<code>
construct_name: STRING
-- Symbolic name of the construct
@@ -217,7 +217,7 @@ The examples of the next few sections, which explain how to write construct clas
Having studied the EiffelParse library principles, let us see how to write grammar productions for various kinds of construct. The main task is to write the production function for each construct class.
The production function for a descendant of [[ref:/libraries/parse/reference/aggregate_chart|AGGREGATE]] will describe how to build a specimen of the corresponding function from a sequence of specimens of each of the constituent constructs. Writing this function from the corresponding production is straightforward.
The production function for a descendant of [[ref:libraries/parse/reference/aggregate_chart|AGGREGATE]] will describe how to build a specimen of the corresponding function from a sequence of specimens of each of the constituent constructs. Writing this function from the corresponding production is straightforward.
As an example, consider the production function of class LINE for the Polynomial example language. The corresponding production is <br/>
<code>
@@ -321,7 +321,7 @@ Then for each alternative component represented by a local entity component (in
===Repetitions===
The <eiffel>production</eiffel> function for a descendant of [[ref:/libraries/parse/reference/repetition_chart|REPETITION]] will describe how to build a specimen of the corresponding function as a sequence or zero or more (or, depending on the grammar, one or more) specimens of the base construct. The class must also effect a feature <eiffel>separator</eiffel> of type <eiffel>STRING</eiffel>, usually as a constant attribute. (This feature is introduced as deferred in class [[ref:/libraries/parse/reference/repetition_chart|REPETITION]] .)
The <eiffel>production</eiffel> function for a descendant of [[ref:libraries/parse/reference/repetition_chart|REPETITION]] will describe how to build a specimen of the corresponding function as a sequence or zero or more (or, depending on the grammar, one or more) specimens of the base construct. The class must also effect a feature <eiffel>separator</eiffel> of type <eiffel>STRING</eiffel>, usually as a constant attribute. (This feature is introduced as deferred in class [[ref:libraries/parse/reference/repetition_chart|REPETITION]] .)
As an example, consider the construct Variables in the Polynomial example language. The right-hand side of the corresponding production is <br/>
<code>
@@ -363,7 +363,7 @@ Lexical interface classes usually follow a common pattern. To take advantage of
L_INTERFACE is a simple deferred class, with a deferred procedure <eiffel>obtain_analyzer</eiffel>. It is an heir of METALEX.
===Obtaining a lexical analyzer===
An effective descendant of [[ref:/libraries/parse/reference/l_interface_chart|L_INTERFACE]] must define procedure <eiffel>obtain_analyzer</eiffel> so that it records into the lexical analyzer the regular expressions and keywords of the language at hand. In writing <eiffel>obtain_analyzer</eiffel> you may use any one of three different techniques, each of which may be the most convenient depending on the precise context, to obtain the required lexical analyzer:
An effective descendant of [[ref:libraries/parse/reference/l_interface_chart|L_INTERFACE]] must define procedure <eiffel>obtain_analyzer</eiffel> so that it records into the lexical analyzer the regular expressions and keywords of the language at hand. In writing <eiffel>obtain_analyzer</eiffel> you may use any one of three different techniques, each of which may be the most convenient depending on the precise context, to obtain the required lexical analyzer:
* You may build the lexical analyzer by defining its regular expressions one by one, using the procedures described in the presentation of METALEX, in particular <eiffel>put_expression</eiffel> and <eiffel>put_keyword</eiffel>.
* You may use use procedure <eiffel>retrieve_analyzer</eiffel> from METALEX to retrieve an analyzer which a previous session saved into a file.
* Finally, you may write a lexical grammar file (or reuse an existing one) and process it on the spot by using procedure <eiffel>read_grammar</eiffel> from METALEX.
@@ -670,7 +670,7 @@ The call build <code> ( </code>document takes care of steps [[#step_e1|1]] and
<br/>
where set_input_file, from class <eiffel>INPUT</eiffel>, has a self-explanatory effect.
Finally, step [[#step_e4|4]] (processing the document) is simply a call to procedure process, obtained from [[ref:/libraries/parse/reference/construct_chart|CONSTRUCT]] . Recall that this procedure simply executes <br/>
Finally, step [[#step_e4|4]] (processing the document) is simply a call to procedure process, obtained from [[ref:libraries/parse/reference/construct_chart|CONSTRUCT]] . Recall that this procedure simply executes <br/>
<code>
parse
if parsed then