This is to prevent time and output size explosion in case of an input
pattern like the one generated by this:
$ python -c 'N=1000; print("[x]: " + "x" * N + "\n[x]" * N)'
We roughly allow link reference definitions to blow up the input size
of the document 16 times, or up to 1 MB, whichever is smaller. When the
threshold is reached, any following reference definitions are sent to
the output unresolved, as plain text.
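For illustration, the budget check boils down to something like the
following sketch (plain C; the names here are hypothetical, not MD4C's
actual internals):

#include <stddef.h>

/* Hypothetical sketch: link reference definitions may blow up the
 * document roughly 16 times, or up to 1 MB, whichever is smaller.
 * Once the budget is exhausted, further definitions are emitted
 * unresolved, as plain text. */
static int ref_def_expansion_allowed(size_t doc_size,
                                     size_t expanded_so_far,
                                     size_t extra)
{
    size_t limit = doc_size * 16;

    if (limit > 1024 * 1024)
        limit = 1024 * 1024;

    return expanded_so_far + extra <= limit;
}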
Fixes #238.
Fixes #223 properly (one corner case had gone unnoticed/hidden due to
the test suite's normalization feature).
Fixes #230 (strictly speaking a duplicate of that corner case).
The problem can be reproduced with the input pattern generated by this
one-liner:
$ python3 -c 'N=1000; print("x|" * N + "\n" + "-|" * N + "\n" + "x\n" * N)'
Here the amount of HTML output grows with N^2 (each of the N
single-cell rows is padded to the table's N columns in the output).
To mitigate false positives:
* We accept $ and $$ as a potential opener only if it is not preceded
by an alphanumeric character (see the sketch after this list).
* Similarly, a closer cannot be followed by an alphanumeric character.
* We now also match a closer with the last preceding potential opener,
not the first one. (And to avoid nesting, any previous openers are
ignored.)
* Also revert an unintended change in 3fc207affa which allowed nested
resolved marks to be kept inside the math span.
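A minimal sketch of the first two rules, with hypothetical helper names
(not MD4C's actual code):

#include <ctype.h>
#include <stddef.h>

/* A '$' or '$$' mark may act as a potential opener only if it is not
 * preceded by an alphanumeric character. */
static int dollar_can_open(const char* text, size_t beg)
{
    return beg == 0  ||  !isalnum((unsigned char) text[beg-1]);
}

/* Symmetrically, it may act as a closer only if it is not followed by
 * an alphanumeric character. */
static int dollar_can_close(const char* text, size_t size, size_t end)
{
    return end >= size  ||  !isalnum((unsigned char) text[end]);
}

The third rule then pairs an accepted closer with the nearest preceding
potential opener, which also keeps math spans from nesting.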
The function incorrectly used the header from the head, leading to a
wrong result (incompatible with e.g. GFM) but, even worse, to a bad
internal state which md_rollback() is then potentially unable to
recover from.
Fixes #222.
* We now have a dedicated run over the inline marks for them.
* We check more thoroughly whether it really looks like a URL or
e-mail address; the old implementation recognized even heavily broken
ones. (A rough sketch follows after this list.)
* This allows us to be much more careful in order not to cross already
resolved marks.
* Share substantial parts of the code between all three types of
permissive autolinks (URL, WWW, e-mail).
* Merge their tests into one file, spec-permissive-autolinks.txt.
* Add one pathological case which triggered quadratic behavior in the
old implementation.
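As an illustration of the stricter checking, a plausibility test for
the e-mail case might look roughly like the sketch below. The names and
the exact character sets are simplifications/assumptions, not MD4C's
actual code, which also covers the URL and WWW variants:

#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Given the position of a '@' mark, check whether it plausibly sits
 * inside an e-mail address: a non-empty local part before it and a
 * domain containing at least one dot after it.  Heavily broken inputs
 * are rejected instead of becoming autolinks. */
static int looks_like_email(const char* text, size_t size, size_t at_pos,
                            size_t* p_beg, size_t* p_end)
{
    size_t beg = at_pos;
    size_t end = at_pos + 1;
    int dot_in_domain = 0;

    /* Local part: alphanumerics plus a few punctuation characters. */
    while (beg > 0  &&  (isalnum((unsigned char) text[beg-1])
                          ||  strchr("._%+-", text[beg-1]) != NULL))
        beg--;
    if (beg == at_pos)
        return 0;

    /* Domain: alphanumerics, '-' and '.', with at least one dot that
     * is followed by an alphanumeric character. */
    while (end < size  &&  (isalnum((unsigned char) text[end])
                             ||  text[end] == '-'  ||  text[end] == '.')) {
        if (text[end] == '.'  &&  end + 1 < size
                &&  isalnum((unsigned char) text[end+1]))
            dot_in_domain = 1;
        end++;
    }
    if (!dot_in_domain)
        return 0;

    *p_beg = beg;
    *p_end = end;
    return 1;
}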
(We also removed the test suite's support for calling directly into
the library. It was inherited from cmark, as the test suite was
originally taken from there, but it was actually never updated to work
with MD4C.)
* We incorrectly applied the infamous rule of three only to
asterisk-encoded emphasis; it has to be applied to underscore as well.
(See the sketch after this list.)
* We incorrectly applied the rule of three only if the opener and/or
closer was inside a word. It also has to be applied if the mark is
both preceded and followed by punctuation.
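For reference, the rule itself can be sketched as follows (hypothetical
function name; the rule is CommonMark's, applied identically to '*' and
'_' runs):

/* An opener and a closer delimiter run cannot match if one of them can
 * both open and close emphasis and the sum of their lengths is a
 * multiple of three, unless both lengths are themselves multiples of
 * three. */
static int rule_of_three_forbids(int opener_len, int closer_len,
                                 int opener_can_close, int closer_can_open)
{
    if (!opener_can_close  &&  !closer_can_open)
        return 0;                       /* the rule does not apply */
    if ((opener_len + closer_len) % 3 != 0)
        return 0;
    if (opener_len % 3 == 0  &&  closer_len % 3 == 0)
        return 0;                       /* the explicit exception */
    return 1;                           /* the pairing is forbidden */
}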
Fixes #217.
We skip the link reference definition lookup if we know that the
bracket pair contains nested brackets. That makes the label invalid,
therefore we know there is no link reference definition to be found
anyway.
In case of heavily nested bracket pairs, the lookup could lead to
quadratic parsing times.
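A sketch of the early exit (hypothetical names; the real code works on
MD4C's chain of marks rather than on a raw string):

#include <stddef.h>

/* A link label which itself contains an unescaped bracket can never
 * match a link reference definition, so the potentially expensive
 * lookup can be skipped right away. */
static int label_has_nested_bracket(const char* label, size_t size)
{
    size_t i;

    for (i = 0; i < size; i++) {
        if (label[i] == '\\') {         /* skip the escaped character */
            i++;
            continue;
        }
        if (label[i] == '['  ||  label[i] == ']')
            return 1;
    }
    return 0;
}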
Fixes #172.
The old version likely could stop prematurely in a corner case where
there was a Unicode character at the end of either string which maps
to multiple fold-info codepoints.
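One way to make the comparison robust is to drive it purely from the
fold-expanded streams, roughly as in the sketch below. fold_char() is a
hypothetical stand-in for the real fold-info lookup, and the cursor
fields are assumed to start zero-initialized:

#include <stddef.h>

/* Hypothetical: expands one codepoint into its case-folded sequence
 * (1 to 3 codepoints) and returns the count. */
static int fold_char(unsigned codepoint, unsigned folded[3]);

typedef struct {
    const unsigned* str;        /* input codepoints */
    size_t size;
    size_t off;
    unsigned folded[3];         /* pending fold-expanded codepoints */
    int n_folded;
    int i_folded;
} FOLD_CURSOR;

/* Returns the next case-folded codepoint, or 0 only when the string,
 * including any folded codepoints pending at its very end, is
 * exhausted. */
static unsigned next_folded(FOLD_CURSOR* c)
{
    if (c->i_folded >= c->n_folded) {
        if (c->off >= c->size)
            return 0;
        c->n_folded = fold_char(c->str[c->off++], c->folded);
        c->i_folded = 0;
    }
    return c->folded[c->i_folded++];
}

static int fold_cmp(FOLD_CURSOR* a, FOLD_CURSOR* b)
{
    while (1) {
        unsigned ca = next_folded(a);
        unsigned cb = next_folded(b);

        if (ca == 0  &&  cb == 0)
            return 0;
        if (ca != cb)
            return (ca < cb) ? -1 : +1;
    }
}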
Fixes #142.