Removed first slash in [[ref:libraries/...]] wiki links.

git-svn-id: https://svn.eiffel.com/eiffel-org/trunk@1612 abb3cda0-5349-4a8f-a601-0c33ac3a8c38
This commit is contained in:
eiffel-org
2016-09-21 13:24:17 +00:00
parent 62de6c73e0
commit 952044ed1c
47 changed files with 278 additions and 278 deletions

View File

@@ -9,7 +9,7 @@ The process of recognizing the successive tokens of a text is called lexical ana
Besides recognizing the tokens, it is usually necessary to recognize the deeper syntactic structure of the text. This process is called '''parsing''' or '''syntax analysis''' and is studied in the next chapter.
Figure 1 shows the inheritance structure of the classes discussed in this chapter. Class [[ref:/libraries/parse/reference/l_interface_chart|L_INTERFACE]] has also been included although we will only study it in the [[EiffelParse Tutorial]]; it belongs to the Parse library, where it takes care of the interface between parsing and lexical analysis.
Figure 1 shows the inheritance structure of the classes discussed in this chapter. Class [[ref:libraries/parse/reference/l_interface_chart|L_INTERFACE]] has also been included although we will only study it in the [[EiffelParse Tutorial]]; it belongs to the Parse library, where it takes care of the interface between parsing and lexical analysis.
[[Image:figure1]]
@@ -34,37 +34,37 @@ The classes of the EiffelLex library make it possible to define lexical grammars
===Overview of the classes===
For the user of the EiffelLex library, the classes of most direct interest are [[ref:/libraries/lex/reference/token_chart|TOKEN]] , [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] , [[ref:/libraries/lex/reference/metalex_chart|METALEX]] and [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] .
For the user of the EiffelLex library, the classes of most direct interest are [[ref:libraries/lex/reference/token_chart|TOKEN]] , [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , [[ref:libraries/lex/reference/metalex_chart|METALEX]] and [[ref:libraries/lex/reference/scanning_chart|SCANNING]] .
An instance of [[ref:/libraries/lex/reference/token_chart|TOKEN]] describes a token read from an input file being analyzed, with such properties as the token type, the corresponding string and the position in the text (line, column) where it was found.
An instance of [[ref:libraries/lex/reference/token_chart|TOKEN]] describes a token read from an input file being analyzed, with such properties as the token type, the corresponding string and the position in the text (line, column) where it was found.
An instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] is a lexical analyzer for a certain lexical grammar. Given a reference to such an instance, say analyzer, you may analyze an input text through calls to the features of class [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] , for example:
An instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] is a lexical analyzer for a certain lexical grammar. Given a reference to such an instance, say analyzer, you may analyze an input text through calls to the features of class [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , for example:
<code>
analyzer.get_token
</code>
Class [[ref:/libraries/lex/reference/metalex_chart|METALEX]] defines facilities for building such lexical analyzers. In particular, it provides features for reading the grammar from a file and building the corresponding analyzer. Classes that need to build and use lexical analyzers may be written as descendants of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] to benefit from its general-purpose facilities.
Class [[ref:libraries/lex/reference/metalex_chart|METALEX]] defines facilities for building such lexical analyzers. In particular, it provides features for reading the grammar from a file and building the corresponding analyzer. Classes that need to build and use lexical analyzers may be written as descendants of [[ref:libraries/lex/reference/metalex_chart|METALEX]] to benefit from its general-purpose facilities.
Class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] is one such descendant of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] . It contains all the facilities needed to build an ordinary lexical analyzer and apply it to the analysis of input texts. Because these facilities are simpler to use and are in most cases sufficient, SCANNING will be discussed first; the finer-grain facilities of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] are described towards the end of this chapter.
Class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] is one such descendant of [[ref:libraries/lex/reference/metalex_chart|METALEX]] . It contains all the facilities needed to build an ordinary lexical analyzer and apply it to the analysis of input texts. Because these facilities are simpler to use and are in most cases sufficient, SCANNING will be discussed first; the finer-grain facilities of [[ref:libraries/lex/reference/metalex_chart|METALEX]] are described towards the end of this chapter.
These classes internally rely on others, some of which may be useful for more advanced applications. [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , one of the supporting classes, will be introduced after [[ref:/libraries/lex/reference/metalex_chart|METALEX]] .
These classes internally rely on others, some of which may be useful for more advanced applications. [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , one of the supporting classes, will be introduced after [[ref:libraries/lex/reference/metalex_chart|METALEX]] .
===Library example===
The EiffelStudio delivery includes (in the examples/library/lex subdirectory) a simple example using the EiffelLex library classes. The example applies EiffelLex library facilities to the analysis of a language which is none other than Eiffel itself.
The root class of that example, <eiffel>EIFFEL_SCAN</eiffel>, is only a few lines long; it relies on the general mechanism provided by [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] (see below). The actual lexical grammar is given by a lexical grammar file (a concept explained below): the file of name eiffel_regular in the same directory.
The root class of that example, <eiffel>EIFFEL_SCAN</eiffel>, is only a few lines long; it relies on the general mechanism provided by [[ref:libraries/lex/reference/scanning_chart|SCANNING]] (see below). The actual lexical grammar is given by a lexical grammar file (a concept explained below): the file of name eiffel_regular in the same directory.
===Dealing with finite automata===
Lexical analysis relies on the theory of finite automata. The most advanced of the classes discussed in this chapter, [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , relies on classes describing various forms of automata:
* [[ref:/libraries/lex/reference/dfa_chart|DFA]] : deterministic finite automata.
* [[ref:/libraries/lex/reference/pdfa_chart|PDFA]] : partially deterministic finite automata.
* [[ref:/libraries/lex/reference/ndfa_chart|NDFA]] : non-deterministic finite automata.
* [[ref:/libraries/lex/reference/automaton_chart|AUTOMATON]] , the most general: finite automata.
* [[ref:/libraries/lex/reference/fixed_automaton_chart|FIXED_AUTOMATON]] , [[ref:/libraries/lex/reference/linked_automaton_chart|LINKED_AUTOMATON]] .
Lexical analysis relies on the theory of finite automata. The most advanced of the classes discussed in this chapter, [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , relies on classes describing various forms of automata:
* [[ref:libraries/lex/reference/dfa_chart|DFA]] : deterministic finite automata.
* [[ref:libraries/lex/reference/pdfa_chart|PDFA]] : partially deterministic finite automata.
* [[ref:libraries/lex/reference/ndfa_chart|NDFA]] : non-deterministic finite automata.
* [[ref:libraries/lex/reference/automaton_chart|AUTOMATON]] , the most general: finite automata.
* [[ref:libraries/lex/reference/fixed_automaton_chart|FIXED_AUTOMATON]] , [[ref:libraries/lex/reference/linked_automaton_chart|LINKED_AUTOMATON]] .
These classes may also be useful for systems that need to manipulate finite automata for applications other than lexical analysis. The interface of [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , which includes the features from [[ref:/libraries/lex/reference/automaton_chart|AUTOMATON]] , [[ref:/libraries/lex/reference/ndfa_chart|NDFA]] and [[ref:/libraries/lex/reference/pdfa_chart|PDFA]] , will provide the essential information.
These classes may also be useful for systems that need to manipulate finite automata for applications other than lexical analysis. The interface of [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , which includes the features from [[ref:libraries/lex/reference/automaton_chart|AUTOMATON]] , [[ref:libraries/lex/reference/ndfa_chart|NDFA]] and [[ref:libraries/lex/reference/pdfa_chart|PDFA]] , will provide the essential information.
==TOKENS==
@@ -78,24 +78,24 @@ A lexical analyzer built through any of the techniques described in the rest of
==BUILDING AND USING LEXICAL ANALYZERS==
The general method for performing lexical analysis is the following.
# Create an instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] , giving a lexical analyzer for the desired grammar.
# Create an instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , giving a lexical analyzer for the desired grammar.
# Store the analyzer into a file.
# Retrieve the analyzer from the file.
# Use the analyzer to analyze the tokens of one or more input texts by calling the various features of class [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] on this object.
# Use the analyzer to analyze the tokens of one or more input texts by calling the various features of class [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] on this object.
Steps 2 and 3 are obviously unnecessary if this process is applied as a single sequence. But in almost all practical cases you will want to use the same grammar to analyze many different input texts. Then steps 1 and 2 will be performed once and for all as soon as the lexical grammar is known, yielding an instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] that step 2 stores into a file; then in every case that requires analyzing a text you will simply retrieve the analyzer and apply it, performing steps 3 and 4 only.
Steps 2 and 3 are obviously unnecessary if this process is applied as a single sequence. But in almost all practical cases you will want to use the same grammar to analyze many different input texts. Then steps 1 and 2 will be performed once and for all as soon as the lexical grammar is known, yielding an instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] that step 2 stores into a file; then in every case that requires analyzing a text you will simply retrieve the analyzer and apply it, performing steps 3 and 4 only.
The simplest way to store and retrieve the instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] and all related objects is to use the facilities of class [[ref:/libraries/base/reference/storable_chart|STORABLE]] : procedure store and one of the retrieval procedures. To facilitate this process, [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] inherits from [[ref:/libraries/base/reference/storable_chart|STORABLE]] .
The simplest way to store and retrieve the instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] and all related objects is to use the facilities of class [[ref:libraries/base/reference/storable_chart|STORABLE]] : procedure store and one of the retrieval procedures. To facilitate this process, [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] inherits from [[ref:libraries/base/reference/storable_chart|STORABLE]] .
The next sections explain how to perform these various steps. In the most common case, the best technique is to inherit from class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , which provides a framework for retrieving an analyzer file if it exists, creating it from a grammar description otherwise, and proceed with the lexical analysis of one or more input texts.
The next sections explain how to perform these various steps. In the most common case, the best technique is to inherit from class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , which provides a framework for retrieving an analyzer file if it exists, creating it from a grammar description otherwise, and proceed with the lexical analysis of one or more input texts.
==LEXICAL GRAMMAR FILES AND CLASS SCANNING==
Class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] may be used as an ancestor by classes that need to perform lexical analysis. When using [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] you will need a '''lexical grammar file''' that contains the description of the lexical grammar. Since it is easy to edit and adapt a file without modifying the software proper, this technique provides flexibility and supports the incremental development and testing of lexical analyzers.
Class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] may be used as an ancestor by classes that need to perform lexical analysis. When using [[ref:libraries/lex/reference/scanning_chart|SCANNING]] you will need a '''lexical grammar file''' that contains the description of the lexical grammar. Since it is easy to edit and adapt a file without modifying the software proper, this technique provides flexibility and supports the incremental development and testing of lexical analyzers.
===The build procedure===
To obtain a lexical analyzer in a descendant of class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , use the procedure
To obtain a lexical analyzer in a descendant of class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , use the procedure
<code>
build (store_file_name, grammar_file_name: STRING)</code>
@@ -146,7 +146,7 @@ This will read in and process successive input tokens. Procedure <eiffel>analyze
The initial action <eiffel>begin_analysis</eiffel>, which by default prints a header, and the terminal action <eiffel>end_analysis</eiffel>, which by default does nothing, may also be redefined.
To build lexical analyzers which provide a higher degree of flexibility, use [[ref:/libraries/lex/reference/metalex_chart|METALEX]] or [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , as described in the last part of this chapter.
To build lexical analyzers which provide a higher degree of flexibility, use [[ref:libraries/lex/reference/metalex_chart|METALEX]] or [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , as described in the last part of this chapter.
==ANALYZING INPUT==
@@ -154,9 +154,9 @@ Let us look more precisely at how we can use a lexical analyzer to analyze an in
===Class LEXICAL===
Procedure <eiffel>analyze</eiffel> takes care of the most common needs of lexical analysis. But if you need more advanced lexical analysis facilities you will need an instance of class [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] (a direct instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] itself or of one of its proper descendants). If you are using class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] as described above, you will have access to such an instance through the attribute <eiffel>analyzer</eiffel>.
Procedure <eiffel>analyze</eiffel> takes care of the most common needs of lexical analysis. But if you need more advanced lexical analysis facilities you will need an instance of class [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] (a direct instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] itself or of one of its proper descendants). If you are using class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] as described above, you will have access to such an instance through the attribute <eiffel>analyzer</eiffel>.
This discussion will indeed assume that you have an entity attached to an instance of [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] . The name of that entity is assumed to be <eiffel>analyzer</eiffel>, although it does not need to be the attribute from [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] . You can apply to that <eiffel>analyzer</eiffel> the various exported features features of class [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] , explained below. All the calls described below should use <eiffel>analyzer</eiffel> as their target, as in
This discussion will indeed assume that you have an entity attached to an instance of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] . The name of that entity is assumed to be <eiffel>analyzer</eiffel>, although it does not need to be the attribute from [[ref:libraries/lex/reference/scanning_chart|SCANNING]] . You can apply to that <eiffel>analyzer</eiffel> the various exported features features of class [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , explained below. All the calls described below should use <eiffel>analyzer</eiffel> as their target, as in
<code>
analyzer.set_file ("my_file_name")
</code>
@@ -168,12 +168,12 @@ To create a new analyzer, use
create analyzer.make_new
</code>
You may also retrieve an analyzer from a previous session. [[ref:/libraries/lex/reference/lexical_chart|LEXICAL]] is a descendant from [[ref:/libraries/base/reference/storable_chart|STORABLE]] , so you can use feature retrieved for that purpose. In a descendant of [[ref:/libraries/base/reference/storable_chart|STORABLE]] , simply write
You may also retrieve an analyzer from a previous session. [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] is a descendant from [[ref:libraries/base/reference/storable_chart|STORABLE]] , so you can use feature retrieved for that purpose. In a descendant of [[ref:libraries/base/reference/storable_chart|STORABLE]] , simply write
<code>
analyzer ?= retrieved
</code>
If you do not want to make the class a descendant of [[ref:/libraries/base/reference/storable_chart|STORABLE]] , use the creation procedure <eiffel>make</eiffel> of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , not to be confused with <eiffel>make_new</eiffel> above:
If you do not want to make the class a descendant of [[ref:libraries/base/reference/storable_chart|STORABLE]] , use the creation procedure <eiffel>make</eiffel> of [[ref:libraries/lex/reference/lexical_chart|LEXICAL]] , not to be confused with <eiffel>make_new</eiffel> above:
<code>
create analyzer.make
analyzer ?= analyzer.retrieved
@@ -183,12 +183,12 @@ If you do not want to make the class a descendant of [[ref:/libraries/base/refer
To analyze a text, call <eiffel>set_file</eiffel> or <eiffel>set_string</eiffel> to specify the document to be parsed. With the first call, the analysis will be applied to a file; with the second, to a string.
{{note|if you use procedure <eiffel>analyze</eiffel> of [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , you do not need any such call, since <eiffel>analyze</eiffel> calls <eiffel>set_file</eiffel> on the file name passed as argument. }}
{{note|if you use procedure <eiffel>analyze</eiffel> of [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , you do not need any such call, since <eiffel>analyze</eiffel> calls <eiffel>set_file</eiffel> on the file name passed as argument. }}
===Obtaining the tokens===
The basic procedure for analyzing successive tokens in the text is <eiffel>get_token</eiffel>, which reads in one token and sets up various attributes of the analyzer to record properties of that token:
* <eiffel>last_token</eiffel>, a function of type [[ref:/libraries/lex/reference/token_chart|TOKEN]] , which provides all necessary information on the last token read.
* <eiffel>last_token</eiffel>, a function of type [[ref:libraries/lex/reference/token_chart|TOKEN]] , which provides all necessary information on the last token read.
* <eiffel>token_line_number</eiffel> and<eiffel> token_column_number</eiffel>, to know where the token is in the text. These queries return results of type <eiffel>INTEGER</eiffel>.
* <eiffel>token_type</eiffel>, giving the regular expression type, identified by its integer number (which is the value <eiffel>No_token</eiffel> if no correct token was recognized).
* <eiffel>other_possible_tokens</eiffel>, an array giving all the other possible token types of the last token. (If <eiffel>token_type</eiffel> is <eiffel>No_token</eiffel> the array is empty.)
@@ -218,13 +218,13 @@ Here is the most common way of using the preceding facilities:
end_analysis
</code>
This scheme is used by procedure <eiffel>analyze</eiffel> of class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , so that in standard cases you may simply inherit from that class and redefine procedures <eiffel>begin_analysis</eiffel>, <eiffel>do_a_token</eiffel>, and <eiffel>end_analysis</eiffel>. If you are not inheriting from [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , these names simply denote procedures that you must provide.
This scheme is used by procedure <eiffel>analyze</eiffel> of class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , so that in standard cases you may simply inherit from that class and redefine procedures <eiffel>begin_analysis</eiffel>, <eiffel>do_a_token</eiffel>, and <eiffel>end_analysis</eiffel>. If you are not inheriting from [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , these names simply denote procedures that you must provide.
==REGULAR EXPRESSIONS==
The EiffelLex library supports a powerful set of construction mechanisms for describing the various types of tokens present in common languages such as programming languages, specification languages or just text formats. These mechanisms are called '''regular expressions'''; any regular expression describes a set of possible tokens, called the '''specimens''' of the regular expression.
Let us now study the format of regular expressions. This format is used in particular for the lexical grammar files needed by class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] and (as seen below) by procedure <eiffel>read_grammar</eiffel> of class [[ref:/libraries/lex/reference/metalex_chart|METALEX]] . The ''eiffel_regular'' grammar file in the examples directory provides an extensive example.
Let us now study the format of regular expressions. This format is used in particular for the lexical grammar files needed by class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] and (as seen below) by procedure <eiffel>read_grammar</eiffel> of class [[ref:libraries/lex/reference/metalex_chart|METALEX]] . The ''eiffel_regular'' grammar file in the examples directory provides an extensive example.
Each regular expression denotes a set of tokens. For example, the first regular expression seen above, <br/>
@@ -401,7 +401,7 @@ BOOLEAN
{{caution|Every keyword in the keyword section must be a specimen of one of the token types defined for the grammar, and that token type must be the last one defined in the lexical grammar file, just before the '''Keywords''' line. So in Eiffel where the keywords have the same lexical structure as identifiers, the last line before the keywords must be the definition of the token type ''Identifier'', as shown above. }}
{{note|The rule that all keywords must be specimens of one token type is a matter of convenience and simplicity, and only applies if you are using SCANNING and lexical grammar files. There is no such restriction if you rely directly on the more general facilities provided by [[ref:/libraries/lex/reference/metalex_chart|METALEX]] or [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] . Then different keywords may be specimens of different regular expressions; you will have to specify the token type of every keyword, as explained later in this chapter. }}
{{note|The rule that all keywords must be specimens of one token type is a matter of convenience and simplicity, and only applies if you are using SCANNING and lexical grammar files. There is no such restriction if you rely directly on the more general facilities provided by [[ref:libraries/lex/reference/metalex_chart|METALEX]] or [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] . Then different keywords may be specimens of different regular expressions; you will have to specify the token type of every keyword, as explained later in this chapter. }}
===Case sensitivity===
@@ -433,19 +433,19 @@ The inverse call, corresponding to the default rule, is
keywords_ignore_case
</code>
Either of these calls must be executed before you define any keywords; if you are using [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , this means before calling procedure build. Once set, the keyword case-sensitivity policy cannot be changed.
Either of these calls must be executed before you define any keywords; if you are using [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , this means before calling procedure build. Once set, the keyword case-sensitivity policy cannot be changed.
==USING METALEX TO BUILD A LEXICAL ANALYZER==
(You may skip the rest of this chapter if you only need simple lexical facilities.)
Class [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , as studied above, relies on a class [[ref:/libraries/lex/reference/metalex_chart|METALEX]] . In some cases, you may prefer to use the features of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] directly. Since [[ref:libraries/lex/reference/scanning_chart|SCANNING]] inherits from [[ref:/libraries/lex/reference/metalex_chart|METALEX]] , anything you do with [[ref:/libraries/lex/reference/metalex_chart|METALEX]] can in fact be done with [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , but you may wish to stay with just [[ref:/libraries/lex/reference/metalex_chart|METALEX]] if you do not need the additional features of [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] .
Class [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , as studied above, relies on a class [[ref:libraries/lex/reference/metalex_chart|METALEX]] . In some cases, you may prefer to use the features of [[ref:libraries/lex/reference/metalex_chart|METALEX]] directly. Since [[ref:libraries/lex/reference/scanning_chart|SCANNING]] inherits from [[ref:libraries/lex/reference/metalex_chart|METALEX]] , anything you do with [[ref:libraries/lex/reference/metalex_chart|METALEX]] can in fact be done with [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , but you may wish to stay with just [[ref:libraries/lex/reference/metalex_chart|METALEX]] if you do not need the additional features of [[ref:libraries/lex/reference/scanning_chart|SCANNING]] .
===Steps in using METALEX===
[[ref:/libraries/lex/reference/metalex_chart|METALEX]] has an attribute analyzer which will be attached to a lexical analyzer. This class provides tools for building a lexical analyzer incrementally through explicit feature calls; you can still use a lexical grammar file, but do not have to.
[[ref:libraries/lex/reference/metalex_chart|METALEX]] has an attribute analyzer which will be attached to a lexical analyzer. This class provides tools for building a lexical analyzer incrementally through explicit feature calls; you can still use a lexical grammar file, but do not have to.
The following extract from a typical descendant of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] illustrates the process of building a lexical analyzer in this way:
The following extract from a typical descendant of [[ref:libraries/lex/reference/metalex_chart|METALEX]] illustrates the process of building a lexical analyzer in this way:
<code>
Upper_identifier, Lower_identifier, Decimal_constant, Octal_constant, Word: INTEGER is unique
@@ -467,13 +467,13 @@ The following extract from a typical descendant of [[ref:/libraries/lex/referenc
make_analyzer
</code>
This example follows the general scheme of building a lexical analyzer with the features of [[ref:/libraries/lex/reference/metalex_chart|METALEX]] , in a class that will normally be a descendant of [[ref:libraries/lex/reference/metalex_chart|METALEX]] :
This example follows the general scheme of building a lexical analyzer with the features of [[ref:libraries/lex/reference/metalex_chart|METALEX]] , in a class that will normally be a descendant of [[ref:libraries/lex/reference/metalex_chart|METALEX]] :
# Set options, such as case sensitivity.
# Record regular expressions.
# Record keywords (this may be interleaved with step 2.)
# "Freeze" the analyzer by a call to <eiffel>make_analyzer</eiffel>.
To perform steps 2 to 4 in a single shot and generate a lexical analyzer from a lexical grammar file, as with [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] , you may use the procedure
To perform steps 2 to 4 in a single shot and generate a lexical analyzer from a lexical grammar file, as with [[ref:libraries/lex/reference/scanning_chart|SCANNING]] , you may use the procedure
<code>
read_grammar (grammar_file_name: STRING)
</code>
@@ -492,7 +492,7 @@ Procedure <eiffel>dollar_w</eiffel> corresponds to the '''$W''' syntax for regul
put_nameless_expression ( "$W" ,Word )
</code>
Procedure <eiffel>declare_keyword</eiffel> records a keyword. The first argument is a string containing the keyword; the second argument is the regular expression of which the keyword must be a specimen. The example shows that here - in contrast with the rule enforced by [[ref:/libraries/lex/reference/scanning_chart|SCANNING]] - not all keywords need be specimens of the same regular expression.
Procedure <eiffel>declare_keyword</eiffel> records a keyword. The first argument is a string containing the keyword; the second argument is the regular expression of which the keyword must be a specimen. The example shows that here - in contrast with the rule enforced by [[ref:libraries/lex/reference/scanning_chart|SCANNING]] - not all keywords need be specimens of the same regular expression.
The calls seen so far record a number of regular expressions and keywords, but do not give us a lexical analyzer yet. To obtain a usable lexical analyzer, you must call
<code>
@@ -503,17 +503,17 @@ After that call, you may not record any new regular expression or keyword. The a
{{note|for readers knowledgeable in the theory of lexical analysis: one of the most important effects of the call to <eiffel>make_analyzer</eiffel> is to transform the non-deterministic finite automaton resulting from calls such as the ones above into a deterministic finite automaton. }}
Remember that if you use procedure <eiffel>read_grammar</eiffel>, you need not worry about <eiffel>make_analyzer</eiffel>, as the former procedure calls the latter.
Another important feature of class [[ref:/libraries/lex/reference/metalex_chart|METALEX]] is procedure <eiffel>store_analyzer</eiffel>, which stores the analyzer into a file whose name is passed as argument, for use by later lexical analysis sessions. To retrieve the analyzer, simply use procedure <eiffel>retrieve_analyzer</eiffel>, again with a file name as argument.
Another important feature of class [[ref:libraries/lex/reference/metalex_chart|METALEX]] is procedure <eiffel>store_analyzer</eiffel>, which stores the analyzer into a file whose name is passed as argument, for use by later lexical analysis sessions. To retrieve the analyzer, simply use procedure <eiffel>retrieve_analyzer</eiffel>, again with a file name as argument.
==BUILDING A LEXICAL ANALYZER WITH LEX_BUILDER==
To have access to the most general set of lexical analysis mechanisms, you may use class [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , which gives you an even finer grain of control than [[ref:/libraries/lex/reference/metalex_chart|METALEX]] . This is not necessary in simple applications.
To have access to the most general set of lexical analysis mechanisms, you may use class [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] , which gives you an even finer grain of control than [[ref:libraries/lex/reference/metalex_chart|METALEX]] . This is not necessary in simple applications.
===Building a lexical analyzer===
[[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] enables you to build a lexical analyzer by describing successive token types and keywords. This is normally done in a descendant of [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] . For each token type, you call a procedure that builds an object, or '''tool''', representing the associated regular expression.
[[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] enables you to build a lexical analyzer by describing successive token types and keywords. This is normally done in a descendant of [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] . For each token type, you call a procedure that builds an object, or '''tool''', representing the associated regular expression.
For the complete list of available procedures, refer to the flat-short form of the class; there is one procedure for every category of regular expression studied earlier in this chapter. Two typical examples of calls are:
<code>
@@ -527,7 +527,7 @@ For the complete list of available procedures, refer to the flat-short form of t
Every such procedure call also assigns an integer index to the tool it creates; this number is available through the attribute <eiffel>last_created_tool</eiffel>. You will need to record it into an integer entity, for example <eiffel>Identifier</eiffel> or <eiffel>Letter</eiffel>.
===An example===
The following extract from a typical descendant of [[ref:/libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] illustrates how to create a tool representing the identifiers of an Eiffel-like language.
The following extract from a typical descendant of [[ref:libraries/lex/reference/lex_builder_chart|LEX_BUILDER]] illustrates how to create a tool representing the identifiers of an Eiffel-like language.
<code>
Identifier, Letter, Digit, Underlined, Suffix, Suffix_list: INTEGER