with the input pattern of the form generated by this one-liner:
$ python3 -c 'N=1000; print("x|" * N + "\n" + "-|" * N + "\n" + "x\n" * N)'
Here the amount of HTML output grows with N^2.
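
As a rough illustration of why the output is quadratic, the one-liner can
be unrolled into a small script. This is only a sketch: the helper names
are hypothetical, and the cell count assumes the renderer pads every short
body row to the width of the header, as GFM tables do.

```python
def make_input(n):
    """Build the same table as the one-liner (without print's newline)."""
    header = "x|" * n + "\n"      # header row with n columns
    delimiter = "-|" * n + "\n"   # delimiter row with n columns
    body = "x\n" * n              # n body rows, one cell each
    return header + delimiter + body


def estimated_cells(n):
    """Assumed cell count: n header cells plus n rows padded to n cells."""
    return n + n * n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, len(make_input(n)), estimated_cells(n))
```

The input length grows linearly (6N + 2 characters) while the estimated
number of rendered cells grows with N^2.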
* Rename MD_MARKCHAIN to MD_MARKSTACK to make its semantics much
clearer.
* Simplify its implementation (singly linked list instead of a
doubly linked one).
* Where it was reused (misused?) for other, unrelated purposes with
different semantics, this is now done explicitly (i.e. we got rid of
TABLECELLBOUNDARIES).
* PTR_CHAIN still uses the stack (we don't care about order there), but
it got separated from the array of ordinary opener stacks at least.
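
A minimal sketch of what a singly linked, index-based mark stack could
look like. It is written in Python for brevity and is not the actual
MD4C code: the class name, the "next" field and the use of -1 as the
empty marker are all assumptions made for illustration.

```python
class MarkStack:
    """LIFO stack threaded through a shared list of marks (hypothetical)."""

    def __init__(self):
        self.top = -1  # -1 means the stack is empty


def push(marks, stack, index):
    """Push the mark at `index`; only a single 'next' link is needed."""
    marks[index]["next"] = stack.top
    stack.top = index


def pop(marks, stack):
    """Pop and return the index of the top mark, or -1 if empty."""
    index = stack.top
    if index >= 0:
        stack.top = marks[index]["next"]
    return index
```

Because marks are only ever added and removed at the top, no "prev" link
is required, which is what makes the singly linked form sufficient.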
We assume the provided opener_index and closer_index do not cross
boundaries of already resolved ranges. Previously the function tried
to deal with such situations, but this code should not be needed; it
was very complex and, most importantly, broken anyway.
To mitigate false positives:
* We accept $ and $$ as a potential opener only if it's not preceded
by an alphanumeric character.
* Similarly, a closer cannot be followed by an alphanumeric character.
* We now also match a closer with the last preceding potential opener,
not the first one. (And to avoid nesting, any previous openers are
ignored.)
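
The rules above can be sketched roughly as follows. This is a simplified
Python illustration, not the MD4C implementation: the function name is
hypothetical, and it ignores everything else the real parser has to
consider (code spans, escapes, already resolved marks).

```python
def find_math_spans(text):
    """Return the contents of $...$ and $$...$$ spans found in `text`."""
    spans = []
    last_opener = {}  # run length (1 or 2) -> offset of last potential opener
    i, n = 0, len(text)
    while i < n:
        if text[i] != "$":
            i += 1
            continue
        j = i
        while j < n and text[j] == "$":
            j += 1
        run = j - i
        if run <= 2:
            # Opener must not be preceded by an alphanumeric character.
            can_open = i == 0 or not text[i - 1].isalnum()
            # Closer must not be followed by an alphanumeric character.
            can_close = j == n or not text[j].isalnum()
            if can_close and run in last_opener:
                start = last_opener[run]
                spans.append(text[start + run:i])
                last_opener.clear()  # drop earlier openers: no nesting
            elif can_open:
                last_opener[run] = i  # a later opener replaces an earlier one
        i = j
    return spans
```

For example, in "price $5 and $x+y$ here" the closer pairs with the second
`$`, so the currency amount is left alone and only "x+y" is reported.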
* Also revert an unintended change in 3fc207affa
which allowed keeping nested resolved marks in it.
For standard e-mail autolinks <user@host> we internally transformed '<'
into '@' (permissive e-mail autolink) to unify handling of the missing
"mailto:" prefix that has to be added to the destination attribute.
This is no longer the case; we now handle that specially.
This is actually what bit us in
https://oss-fuzz.com/testcase-detail/4815193402048512.
Even though this isn't the root cause of the issue, this change makes
the code safer and easier to understand.
The function incorrectly used header from the head, leading to a wrong
result (incompatible with e.g. GFM) but, even worse, to a bad internal
state that md_rollback() is then potentially unable to recover from.
Fixes #222.
* We now have a dedicated run over the inline marks for them.
* We check more thoroughly whether it really looks like a URL or e-mail
address. The old implementation recognized even heavily broken ones.
* This allows us to be much more careful in order not to cross already
resolved marks.
* Share substantial parts of the code between all three types of the
permissive autolinks (URL, WWW, e-mail).
* Merge their tests into one file, spec-permissive-autolinks.txt.
* Add one pathological case which triggered quadratic behavior in the
old implementation.
(We also removed support for direct calls into the library. It was inherited
from cmark as the testsuite was originally taken from there, but it
actually was never updated to work with MD4C.)
they fall into range of previously analyzed mark. That can happen if the
previous mark has been expanded. That typically happens for permissive
autolinks.
This fixes one case of pathological input leading to quadratic behavior.
* We incorrectly applied the infamous rule of three only to
asterisk-encoded emphasis; it has to be applied to underscores as
well.
* We incorrectly applied the rule of three only if the opener
and/or closer was inside a word. It also has to be applied if the
mark is both preceded and followed by punctuation.
Fixes #217.
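
For reference, the rule of three as worded in the CommonMark
specification can be sketched like this. The function and parameter names
are hypothetical, and this is a Python illustration of the spec's rule,
not the MD4C code.

```python
def rule_of_three_blocks(opener_len, closer_len,
                         opener_can_close, closer_can_open):
    """Return True if the rule of three forbids pairing the two runs.

    The rule only matters when one of the delimiter runs can both open
    and close emphasis. In that case the pair is rejected when the sum
    of the run lengths is a multiple of 3, unless both lengths are
    themselves multiples of 3.
    """
    if not (opener_can_close or closer_can_open):
        return False
    if (opener_len + closer_len) % 3 != 0:
        return False
    return not (opener_len % 3 == 0 and closer_len % 3 == 0)
```

The check depends only on the run lengths and on whether a run can both
open and close, so it applies equally to `*` and `_` delimiters.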