After thinking it over and looking at the test failures, the original change
doesn't make sense: pretty-printing is not supposed to round-trip.
This reverts commit 16d96efd63.
Joachim Faulhaber pointed out the warnings. It turns out the second one
exposes a real bug: if the JSON input ends with a high UTF-16 surrogate, the
code would dereference the end iterator while trying to read the low surrogate.
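
The kind of check involved can be sketched as follows. This is a minimal illustration, not the parser's actual code: the name decode_utf16 is hypothetical, and it assumes the input is a range of 16-bit UTF-16 code units.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the parser's real code): decodes one code point
// from a range of UTF-16 code units and advances `cur` past it.
template <typename Iterator>
std::uint32_t decode_utf16(Iterator& cur, Iterator end) {
    if (cur == end)
        throw std::runtime_error("unexpected end of input");
    std::uint32_t high = static_cast<std::uint16_t>(*cur);
    ++cur;
    if (high >= 0xDC00 && high <= 0xDFFF)
        throw std::runtime_error("unpaired low surrogate");
    if (high < 0xD800 || high > 0xDBFF)
        return high;  // not a surrogate: the unit is the code point
    // This is the check that matters here: without it, input ending in a
    // high surrogate would make the next line dereference the end iterator.
    if (cur == end)
        throw std::runtime_error("input ends with a high surrogate");
    std::uint32_t low = static_cast<std::uint16_t>(*cur);
    if (low < 0xDC00 || low > 0xDFFF)
        throw std::runtime_error("expected a low surrogate");
    ++cur;
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}
```

With this shape, a truncated pair is reported as a parse error instead of reading past the end of the input.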
The build files in this directory are neither maintained nor tested. They
are already outdated (VS 2008 projects) and will only get more so. I have
no intention of maintaining them.
The new file contains many small unit tests for the new JSON parser and is far more extensive than the old test file.
The old file is kept too, though, because it contains the writing and round-trip tests.
TL;DR: The new parser fixes long-standing bugs and has full Unicode support, but it removes non-standard extensions
of the old parser, which could break existing code:
- String concatenation: the old parser concatenated adjacent string literals like C does.
- Comments: the old parser supported C and C++-style comments. JSON doesn't allow comments.
The JSON writer hasn't been changed; it still has all the Unicode-related problems.
The old JSON parser had quite a few problems:
- Slow to compile.
- Based on the obsolete Spirit.Classic.
- Inherited a multithreading bug from Spirit.Classic (see bug #5520).
- Poor to no support for Unicode.
- Weird departures from standard JSON.
- Tightly bound to string-based property trees.
The new parser has the following features:
- Hand-written recursive descent parser - few template instantiations, fast to compile.
- Parses through a pair of iterators, with support for input iterators, so it can parse directly from istreambuf_iterators
and doesn't need to load the entire file into memory first.
- Push-based stream model.
- Full support for Unicode. Assumes that char is UTF-8. If wchar_t is 16 bits, assumes UTF-16, with support for surrogate pairs.
- Pluggable encoding support. The public interface doesn't expose this yet. Currently, narrow input streams are assumed to use
UTF-8 both internally and externally, and wide streams are assumed to use UTF-16 or UTF-32, depending on the bit width of wchar_t.
Malformed encodings are not accepted.
The pluggable support allows inserting other external encodings, or making narrow streams parse into wide internal trees, etc.
- Replaceable event handlers. These are also not exposed by the public interface. They allow parsing into non-string
property trees and preserving the type information of the JSON.
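
To illustrate the combination of input iterators, the push-based model and replaceable handlers listed above, here is a minimal sketch. It is not the real parser: mini_parser, recording_callbacks and parse_json_subset are hypothetical names, and the grammar is cut down to arrays and the literals true, false and null.

```cpp
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical handler type: records each pushed event as one character.
struct recording_callbacks {
    std::string log;
    void on_begin_array() { log += "["; }
    void on_end_array() { log += "]"; }
    void on_boolean(bool b) { log += b ? "T" : "F"; }
    void on_null() { log += "N"; }
};

// Hand-written recursive descent over a single-pass iterator pair. Only
// *cur_, ++cur_ and comparison with end_ are used, so input iterators work.
template <typename Iterator, typename Callbacks>
class mini_parser {
public:
    mini_parser(Iterator first, Iterator last, Callbacks& cb)
        : cur_(first), end_(last), cb_(cb) {}

    void parse_value() {
        skip_ws();
        if (cur_ == end_) fail("unexpected end of input");
        switch (*cur_) {
        case '[': parse_array(); break;
        case 't': expect("true"); cb_.on_boolean(true); break;
        case 'f': expect("false"); cb_.on_boolean(false); break;
        case 'n': expect("null"); cb_.on_null(); break;
        default: fail("unexpected character");
        }
    }

    void finish() {
        skip_ws();
        if (cur_ != end_) fail("garbage after data");
    }

private:
    void parse_array() {
        ++cur_;  // consume '['
        cb_.on_begin_array();
        skip_ws();
        if (cur_ != end_ && *cur_ == ']') { ++cur_; cb_.on_end_array(); return; }
        for (;;) {
            parse_value();
            skip_ws();
            if (cur_ == end_) fail("unterminated array");
            if (*cur_ == ']') { ++cur_; break; }
            if (*cur_ != ',') fail("expected ',' or ']'");
            ++cur_;  // consume ','
        }
        cb_.on_end_array();
    }

    void expect(const char* word) {
        for (; *word; ++word, ++cur_)
            if (cur_ == end_ || *cur_ != *word) fail("invalid literal");
    }

    void skip_ws() {
        while (cur_ != end_ &&
               (*cur_ == ' ' || *cur_ == '\t' || *cur_ == '\n' || *cur_ == '\r'))
            ++cur_;
    }

    [[noreturn]] static void fail(const char* msg) { throw std::runtime_error(msg); }

    Iterator cur_, end_;
    Callbacks& cb_;
};

// Parses one value and rejects trailing non-whitespace input.
template <typename Iterator, typename Callbacks>
void parse_json_subset(Iterator first, Iterator last, Callbacks& cb) {
    mini_parser<Iterator, Callbacks> p(first, last, cb);
    p.parse_value();
    p.finish();
}
```

Calling parse_json_subset with std::istreambuf_iterator<char>(stream) and a default-constructed end iterator consumes the stream one character at a time, and swapping recording_callbacks for another handler type changes what gets built without touching the parsing code.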
We relied on internal details of the container serialization, and those details changed, so re-implement.
Of course, the new code relies on different internal details, but I don't want to change the format.