[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:lexer_quickstart1 Quickstart 1 - A word counter using __lex__]

__lex__ is very modular, which follows the general building principle of the
__spirit__ libraries. You never pay for features you don't use. It is nicely
integrated with the other parts of __spirit__ but can nevertheless be used
separately to build stand-alone lexical analyzers.
The first quick start example describes a stand-alone application: counting
characters, words, and lines in a file, very similar to what the well-known
Unix command `wc` does (for the full example code see here:
[@../../example/lex/word_count_functor.cpp word_count_functor.cpp]).

[import ../example/lex/word_count_functor.cpp]


[heading Prerequisites]

The only `#include` required specifically for /Spirit.Lex/ follows. It is a
wrapper for all definitions necessary to use /Spirit.Lex/ in a stand-alone
fashion on top of the __lexertl__ library. Additionally, we `#include` two
Boost headers to define `boost::bind()` and `boost::ref()`.

[wcf_includes]
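
For readers without the example file at hand, the include section boils down to
roughly the following sketch (the imported snippet above is the authoritative
version):

    // Spirit.Lex wrapper for stand-alone use on top of the lexertl library,
    // plus Boost.Bind and Boost.Ref (sketch)
    #include <boost/spirit/include/lex_lexertl.hpp>
    #include <boost/bind.hpp>
    #include <boost/ref.hpp>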

To make all the code below more readable we introduce the following namespaces.

[wcf_namespaces]
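
A typical alias used for this purpose looks like the following (a sketch, not
necessarily the exact snippet):

    // shorten the otherwise rather long namespace name
    namespace lex = boost::spirit::lex;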


[heading Defining Tokens]

The most important step in creating a lexer using __lex__ is to define the
tokens to be recognized in the input sequence. This is normally done by
defining the regular expressions describing the matching character sequences,
and optionally their corresponding token ids. Additionally, the defined tokens
need to be associated with an instance of a lexer object as provided by the
library. The following code snippet shows how this can be done using __lex__.

[wcf_token_definition]
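
To give an idea of the shape such a token definition takes, here is a minimal
sketch; the identifiers and the exact regular expressions are illustrative and
may differ in detail from the imported snippet:

    // token identifiers for the three kinds of matched entities
    enum token_ids
    {
        ID_WORD = lex::min_token_id + 1,
        ID_EOL,
        ID_CHAR
    };

    // the lexer definition: each regular expression and its token id is
    // added to the lexer instance accessible through this->self
    template <typename Lexer>
    struct word_count_tokens : lex::lexer<Lexer>
    {
        word_count_tokens()
        {
            this->self.add
                ("[^ \t\n]+", ID_WORD)   // a word: anything but space, tab, newline
                ("\n", ID_EOL)           // a newline character
                (".", ID_CHAR)           // any other single character
            ;
        }
    };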


[heading Doing the Useful Work]

We will use a setup where the __lex__ library invokes a given function each
time one of the generated tokens is recognized. For this reason we need to
implement a functor taking at least the generated token as an argument and
returning a boolean value that allows stopping the tokenization process. The
default token type used in this example carries a token value of type
__boost_iterator_range__`<BaseIterator>`, pointing to the matched range in the
underlying input sequence.

[wcf_functor]
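
The shape of such a functor is sketched below; the member names and counter
handling are illustrative and may differ in detail from the imported snippet:

    // functor invoked for every recognized token; counts characters, words
    // and lines depending on the token id
    struct counter
    {
        typedef bool result_type;   // required by boost::bind()

        template <typename Token>
        bool operator()(Token const& t, std::size_t& c, std::size_t& w,
            std::size_t& l) const
        {
            switch (t.id()) {
            case ID_WORD:           // a word: count it and its characters
                ++w; c += t.value().size();
                break;
            case ID_EOL:            // a newline: one more line and character
                ++l; ++c;
                break;
            case ID_CHAR:           // any other character
                ++c;
                break;
            }
            return true;            // return false to stop tokenization early
        }
    };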

All that is left is to write some boilerplate code tying together the pieces
described so far. To keep this example simple we call the `lex::tokenize()`
function implemented in __lex__ (for a more detailed description of this
function see here: __fixme__), even though we could have written a loop
iterating over the lexer iterators [`first`, `last`) instead.


[heading Pulling Everything Together]

[wcf_main]
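
In outline, the driver reads the input into a string, instantiates the token
definition, and hands everything to `lex::tokenize()`. A condensed sketch
follows (building on the pieces shown above); the helper `read_from_file()` is
a placeholder assumed only for this illustration:

    int main(int argc, char* argv[])
    {
        std::size_t c = 0, w = 0, l = 0;    // character, word and line counters

        // read the input file into memory (read_from_file() stands in for
        // whatever file reading code is actually used)
        std::string str(read_from_file(1 == argc ? "word_count.input" : argv[1]));

        // the token definition, using the default lexertl-based lexer
        word_count_tokens<lex::lexertl::lexer<> > word_count;

        // tokenize the input; the bound functor is called for every matched
        // token, boost::ref() lets it update the counters in place
        char const* first = str.c_str();
        char const* last = &first[str.size()];
        bool r = lex::tokenize(first, last, word_count,
            boost::bind(counter(), _1, boost::ref(c), boost::ref(w), boost::ref(l)));

        if (r)
            std::cout << "lines: " << l << ", words: " << w
                      << ", characters: " << c << "\n";
        else
            std::cout << "Lexical analysis failed\n";
        return 0;
    }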


[heading Comparing __lex__ with __flex__]

This example was deliberately chosen to be as similar as possible to the
equivalent __flex__ program (see below), which isn't too different from what
has to be written when using __lex__.

[note Interestingly enough, performance comparisons of lexical analyzers
      written using __lex__ with equivalent programs generated by
      __flex__ show that both have comparable execution speeds!
      Generally, thanks to the highly optimized __lexertl__ library and
      due to its carefully designed integration with __spirit__, the
      abstraction penalty to be paid for using __lex__ is negligible.
]

The remaining examples in this tutorial will use more sophisticated features
of __lex__, mainly to allow further simplification of the code to be written,
while maintaining the similarity with corresponding features of __flex__.
__lex__ has been designed to be as similar to __flex__ as possible. That
is why this documentation provides the corresponding __flex__ code for the
__lex__ examples shown almost everywhere. Consequently, here is the __flex__
code corresponding to the example shown above.

[wcf_flex_version]

[endsect]