[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section Preface]

[:['["Examples of designs that meet most of the criteria for
"goodness" (easy to understand, flexible, efficient) are a
recursive-descent parser, which is traditional procedural
code. Another example is the STL, which is a generic library of
containers and algorithms depending crucially on both traditional
procedural code and on parametric polymorphism.]] [*--Bjarne
Stroustrup]]

[heading History]

[heading /80s/]

In the mid-80s, Joel wrote his first calculator in Pascal. It was an
unforgettable coding experience: he was amazed at how a mutually
recursive set of functions can model a grammar specification. In time,
the skills he acquired from that academic experience became very
practical as he was tasked with parsing work. For instance, whenever he
needed to perform any form of binary or text I/O, he tried to approach
each task somewhat formally by writing a grammar using Pascal-like
syntax diagrams and then a corresponding recursive-descent parser. This
process worked very well.

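To illustrate how a mutually recursive set of functions can mirror a
grammar, here is a minimal sketch of a recursive-descent calculator. It is
written in C++ rather than Pascal, and the class name, member names, and the
small grammar in the comment are assumptions made for this illustration; it
is not Joel's original program.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Illustrative grammar (EBNF), one member function per production:
//   expr   ::= term   { ('+' | '-') term }
//   term   ::= factor { ('*' | '/') factor }
//   factor ::= number | '(' expr ')'
class Calculator {
public:
    explicit Calculator(const std::string& text) : in_(text), pos_(0) {}

    // Parse and evaluate the whole input; throws on a syntax error.
    double evaluate() {
        double value = expr();
        skip_spaces();
        if (pos_ != in_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    double expr() {
        double value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    double term() {
        double value = factor();
        for (;;) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    double factor() {
        if (accept('(')) {
            double value = expr();  // recursion back into expr
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        return number();
    }

    // Terminal: an unsigned decimal integer.
    double number() {
        skip_spaces();
        std::size_t start = pos_;
        while (pos_ < in_.size() &&
               std::isdigit(static_cast<unsigned char>(in_[pos_])))
            ++pos_;
        if (pos_ == start) throw std::runtime_error("expected a number");
        return std::stod(in_.substr(start, pos_ - start));
    }

    // Consume character c if it is next (after optional whitespace).
    bool accept(char c) {
        skip_spaces();
        if (pos_ < in_.size() && in_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    void skip_spaces() {
        while (pos_ < in_.size() &&
               std::isspace(static_cast<unsigned char>(in_[pos_])))
            ++pos_;
    }

    std::string in_;
    std::size_t pos_;
};
```

With this sketch, `Calculator("(1 + 2) * 3").evaluate()` walks `expr`,
`term`, and `factor` in the same order in which the grammar nests them.
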
[heading /90s/]

The arrival of the Internet and the World Wide Web magnified the need
for parsing a thousand-fold. At one point Joel had to write an HTML
parser for a Web browser project. Using the W3C formal specifications,
he easily wrote a recursive-descent HTML parser. With the influence of
the Internet, RFC specifications were abundant. SGML, HTML, XML, email
addresses and even those seemingly trivial URLs were all formally
specified using small EBNF-style grammar specifications. Joel had more
parsing to do, and he wished for a tool similar to larger parser
generators such as YACC and ANTLR, where a parser is built automatically
from a grammar specification.

This ideal tool would be able to parse anything from email addresses and
command lines, to XML and scripting languages. Scalability was a primary
goal. The tool would be able to do this without incurring a heavy
development load, which was not possible with the above-mentioned parser
generators. The result was Spirit.

Spirit was a personal project that was conceived when Joel was involved
in R&D in Japan. Inspired by the GoF's composite and interpreter
patterns, he realized that he could model a recursive-descent parser with
hierarchical-object composition of primitives (terminals) and composites
(productions). The original version was implemented with run-time
polymorphic classes. A parser was generated at run time by feeding in
production rule strings such as:

    "prod ::= {'A' | 'B'} 'C';"

A compile function compiled the parser, dynamically creating a hierarchy
of objects and linking semantic actions on the fly. A very early text
can be found here: __early_spirit__.

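The object hierarchy described above can be sketched roughly as follows.
This is only an illustration of the composite pattern with run-time
polymorphic classes: all class and function names (`Parser`, `Char`,
`Alternative`, `Sequence`, `Repeat`, `make_prod`, `matches`) are
hypothetical, the hierarchy is assembled by hand instead of by a compile
function, semantic actions are omitted, and none of it is taken from the
original Spirit sources.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Common interface: every node, primitive or composite, is a Parser.
struct Parser {
    virtual ~Parser() = default;
    // Try to match at pos; on success advance pos and return true,
    // on failure leave pos unchanged and return false.
    virtual bool parse(const std::string& in, std::size_t& pos) const = 0;
};
using ParserPtr = std::shared_ptr<const Parser>;

// Primitive (terminal): matches one specific character.
struct Char : Parser {
    explicit Char(char c) : c_(c) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        if (pos < in.size() && in[pos] == c_) { ++pos; return true; }
        return false;
    }
    char c_;
};

// Composite: a | b
struct Alternative : Parser {
    Alternative(ParserPtr a, ParserPtr b) : a_(std::move(a)), b_(std::move(b)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        return a_->parse(in, pos) || b_->parse(in, pos);
    }
    ParserPtr a_, b_;
};

// Composite: a b (restores pos if the second part fails)
struct Sequence : Parser {
    Sequence(ParserPtr a, ParserPtr b) : a_(std::move(a)), b_(std::move(b)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        std::size_t saved = pos;
        if (a_->parse(in, pos) && b_->parse(in, pos)) return true;
        pos = saved;
        return false;
    }
    ParserPtr a_, b_;
};

// Composite: { a } (zero or more repetitions, so it always succeeds)
struct Repeat : Parser {
    explicit Repeat(ParserPtr subject) : subject_(std::move(subject)) {}
    bool parse(const std::string& in, std::size_t& pos) const override {
        for (;;) {
            std::size_t before = pos;
            // Stop on failure, or on an empty match to avoid looping forever.
            if (!subject_->parse(in, pos) || pos == before) return true;
        }
    }
    ParserPtr subject_;
};

// Hand-built object hierarchy for: prod ::= {'A' | 'B'} 'C';
ParserPtr make_prod() {
    auto a_or_b = std::make_shared<Alternative>(std::make_shared<Char>('A'),
                                                std::make_shared<Char>('B'));
    return std::make_shared<Sequence>(std::make_shared<Repeat>(a_or_b),
                                      std::make_shared<Char>('C'));
}

// True if the parser consumes the entire input.
bool matches(const Parser& p, const std::string& in) {
    std::size_t pos = 0;
    return p.parse(in, pos) && pos == in.size();
}
```

Because every node is reached through a virtual call, such a hierarchy can
be built and rearranged at run time, at the cost of an indirect call per
node.
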
[heading /2001 to 2006/]

Versions 1.0 to 1.8 were a complete rewrite of the original Spirit parser
using expression templates and static polymorphism, inspired by the
works of Todd Veldhuizen (__exprtemplates__, C++ Report, June
1995). Initially, the static-Spirit version was meant only to replace
the core of the original dynamic-Spirit. Dynamic-Spirit needed a parser
to implement itself anyway. The original employed a hand-coded
recursive-descent parser to parse the input grammar specification
strings. It was at this time that Hartmut Kaiser joined the Spirit
development.

After its initial "open-source" debut in May 2001, static-Spirit became
a success. Around November 2001, the Spirit website had an activity
percentile of 98%, making it the number one parser tool at SourceForge
at the time. Not bad for a niche project like a parser library. The
"static" portion of Spirit was forgotten and static-Spirit simply became
Spirit. The library soon evolved to acquire more dynamic features.

Spirit was formally accepted into __boost__ in October 2002. Boost is a
peer-reviewed, open collaborative development effort around a collection
of free Open Source C++ libraries covering a wide range of domains. The
Boost Libraries have become widely known as an industry standard for
design and implementation quality, robustness, and reusability.

[heading /2007/]

Over the years, especially after Spirit was accepted into Boost, Spirit
has served its purpose quite admirably. [*/Classic-Spirit/] (versions
prior to 2.0) focused on transduction parsing, where the input string is
merely translated to an output string. Many parsers fall into the
transduction type. When the time came to add attributes to the parser
library, it was done in a rather ad-hoc manner, with the goal of being 100%
backward compatible with Classic-Spirit. As a result, some parsers have
attributes and some don't.

Spirit V2 is another major rewrite. Spirit V2 grammars are fully
attributed (see __attr_grammar__), which means that all parser components
have attributes. To do this efficiently and elegantly, we had to use a
couple of infrastructure libraries. Some did not exist, some were quite
new when Spirit debuted, and some needed work. __mpl__ is an important
infrastructure library, yet it is not sufficient to implement Spirit V2.
Another library had to be written: __fusion__. Fusion sits between MPL
and STL -- between compile time and runtime -- mapping types to values.
Fusion is a direct descendant of both MPL and __boost_tuples__. Fusion
is now a full-fledged __boost__ library. __phoenix__ also had to be
beefed up to support Spirit V2. The result is __phoenix__. Last
but not least, Spirit V2 uses an __exprtemplates__ library called
__boost_proto__.

Even though it has evolved and matured to become a multi-module library,
Spirit is still used for micro-parsing tasks as well as scripting
languages. As with C++, you only pay for the features that you need. The power
of Spirit comes from its modularity and extensibility. Instead of giving
you a sledgehammer, it gives you the right ingredients to easily create
a sledgehammer.

[heading New Ideas: Spirit V2]

Just before the development of Spirit V2 began, Hartmut came across the
__string_template__ library, which is part of the ANTLR parser
framework. [footnote Quote from http://www.stringtemplate.org/: It is a
Java template engine (with ports for C# and Python) for generating
source code, web pages, emails, or any other formatted text output.]
The concepts presented in that library led Hartmut to
the next step in the evolution of Spirit. Parsing and generation are
tightly connected to a formal notation, or a grammar. The grammar
describes both input and output, and therefore, a parser library should
have a grammar-driven output. This duality is expressed in Spirit by the
parser library __qi__ and the generator library __karma__ using the same
component infrastructure.

The idea of creating a lexer library well integrated with the Spirit
parsers is not new. This has been discussed almost since Classic-Spirit
(pre-V2) initially debuted. Several attempts to integrate existing lexer
libraries and frameworks with Spirit have been made and served as a
proof of concept and usability (for example see __wave__: The Boost
C/C++ Preprocessor Library, and __slex__: a fully dynamic C++ lexer
implemented with Spirit). Based on these experiences we added __lex__, a
fully integrated lexer library, to the mix. It allows the user to take
advantage of the power of regular expressions for token matching,
which removes pressure from the parser components and simplifies parser
grammars. Again, Spirit's modular structure allowed us to reuse the same
underlying component library as for the parser and generator libraries.

[heading How to use this manual]

Each major section (there are 3: __sec_qi__, __sec_karma__, and
__sec_lex__) is roughly divided into 3 parts:

# Tutorials: A step-by-step guide with heavily annotated code. These
  are meant to get the user acquainted with the library as quickly as
  possible. The objective is to build the confidence of the user in
  using the library through abundant examples and detailed
  instructions. Examples speak volumes and we have volumes of
  examples!

# Abstracts: A high-level summary of key topics. The objective is to
  give the user a high-level view of the library, the key concepts,
  background and theories.

# Reference: Detailed formal technical reference. We start with a quick
  reference -- an easy-to-use table that maps into the reference proper.
  The reference proper starts with C++ concepts followed by
  models of the concepts.

Some icons are used to mark certain topics indicative of their relevance.
These icons precede some text to indicate:

[table Icons

    [[Icon]             [Name]          [Meaning]]

    [[__note__]         [Note]          [Generally useful information (an aside that
                                        doesn't fit in the flow of the text)]]

    [[__tip__]          [Tip]           [Suggestion on how to do something
                                        (especially something that is not obvious)]]

    [[__important__]    [Important]     [Important note on something to take
                                        particular notice of]]

    [[__caution__]      [Caution]       [Take special care with this - it may
                                        not be what you expect and may cause bad
                                        results]]

    [[__danger__]       [Danger]        [This is likely to cause serious
                                        trouble if ignored]]
]

This documentation is automatically generated by the Boost QuickBook
documentation tool. QuickBook can be found in the __boost_tools__.

[heading Support]

Please direct all questions to Spirit's mailing list. You can subscribe
to the __spirit_list__. The mailing list has a searchable archive. A
search link to this archive is provided on __spirit__'s home page. You
may also read and post messages to the mailing list through
__spirit_general__ (thanks to __gmane__). The newsgroup mirrors the
mailing list. Here is a link to the archives: __mlist_archive__.

[endsect] [/ Preface]