This prevents an explosion of processing time and output size for input
patterns such as the one generated by this:
$ python -c 'N=1000; print("[x]: " + "x" * N + "\n[x]" * N)'
We roughly allow link reference definitions to blow up the input size of
the document 16 times, or up to 1 MB, whichever is smaller. Once the
threshold is reached, subsequent reference definitions are left
unresolved and sent to the output as plain text.
Fixes #238.
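A minimal sketch of the budget described above, assuming "1 MB" means 1024 * 1024 bytes and that the parser tracks how many output bytes reference expansions have already produced. The names refdef_output_budget() and refdef_may_expand() and both macros are hypothetical, not md4c's actual code.

```c
#include <stddef.h>

/* Hypothetical constants mirroring the limits described above: a factor
 * of 16 over the input size, capped at 1 MB (assumed 1024 * 1024). */
#define REFDEF_BLOWUP_FACTOR  16
#define REFDEF_ABSOLUTE_CAP   ((size_t) 1024 * 1024)

/* Returns the total output budget (in bytes) that resolved reference
 * definitions may contribute for a document of doc_size bytes. */
static size_t refdef_output_budget(size_t doc_size)
{
    /* Avoid overflow in doc_size * 16: anything this large hits the cap. */
    if (doc_size > REFDEF_ABSOLUTE_CAP / REFDEF_BLOWUP_FACTOR)
        return REFDEF_ABSOLUTE_CAP;
    return doc_size * REFDEF_BLOWUP_FACTOR;
}

/* Decides whether one more expansion of expansion_size bytes fits into
 * the budget, given how much has already been spent. Once this returns
 * 0, the caller would emit the text unresolved instead. */
static int refdef_may_expand(size_t doc_size, size_t already_spent,
                             size_t expansion_size)
{
    size_t budget = refdef_output_budget(doc_size);
    if (already_spent > budget)
        return 0;
    return expansion_size <= budget - already_spent;
}
```

Checking the remaining budget (budget - already_spent) rather than summing already_spent + expansion_size keeps the comparison free of unsigned overflow.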
* Consistently use the type MD_SIZE for line indices.
* Remove pointer arithmetic on lines and replace it with line index
arithmetic.
This resolves some warnings in MSVC builds.
See PR #232.
Co-authored-by: Martin Mitas <mity@morous.org>
Co-authored-by: Shawn Rutledge <s@ecloud.org>
Fixes #223 properly (one corner case went unnoticed, hidden by the test
suite's normalization feature).
Fixes #230 (strictly speaking, a duplicate of that corner case).
We were returning NULL previously, but that would lead to a crash
anyway: all call sites expect to get their respective stack, and
anything else would mean we are internally broken.
with input in the form generated by this one-liner:
$ python3 -c 'N=1000; print("x|" * N + "\n" + "-|" * N + "\n" + "x\n" * N)'
Here the amount of HTML output grows with N^2.
* Rename MD_MARKCHAIN to MD_MARKSTACK to indicate its semantics more
clearly.
* Simplify its implementation (a singly-linked list instead of a
doubly-linked one).
* Where it was reused (misused?) for other, unrelated purposes with
different semantics, this is now done explicitly (i.e. we got rid of
TABLECELLBOUNDARIES).
* PTR_CHAIN still uses the stack (we don't care about order there), but
at least it has been separated from the array of ordinary opener stacks.
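A singly-linked stack threaded through the marks array can be sketched as follows, assuming each mark stores the index of the mark below it and the stack remembers only its top. The types Mark and MarkStack and the markstack_* functions are hypothetical simplifications, not the real MD_MARKSTACK.

```c
#include <stddef.h>

/* Hypothetical, simplified mark: only the fields the stack needs. */
typedef struct {
    char ch;      /* the mark character, e.g. '*' or '_' */
    int  next;    /* index of the mark below this one on its stack, or -1 */
} Mark;

/* Only the top is remembered; the links live in the marks themselves. */
typedef struct {
    int top;      /* index of the topmost mark, or -1 when empty */
} MarkStack;

static void markstack_init(MarkStack* stack)
{
    stack->top = -1;
}

static void markstack_push(MarkStack* stack, Mark* marks, int mark_index)
{
    marks[mark_index].next = stack->top;
    stack->top = mark_index;
}

/* Pops and returns the topmost mark index, or -1 if the stack is empty. */
static int markstack_pop(MarkStack* stack, const Mark* marks)
{
    int index = stack->top;
    if (index >= 0)
        stack->top = marks[index].next;
    return index;
}
```

Because a stack only ever needs access to its top, a single "next" link per mark suffices; the backward link of a doubly-linked list is never used.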
We assume the provided opener_index and closer_index do not cross
boundaries of already resolved ranges. Previously the function tried to
deal with such situations, but this code should not be needed; it was very
complex and, most importantly, broken anyway.
To mitigate false positives:
* We accept $ and $$ as a potential opener only if it is not preceded
by an alphanumeric character.
* Similarly, a closer cannot be followed by an alphanumeric character.
* We now also match a closer with the last preceding potential opener, not
the first one. (And to avoid nesting, any previous openers are
ignored.)
* Also revert an unintended change in 3fc207affa
which allowed keeping nested resolved marks in it.
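The three matching rules above can be illustrated with a small scanner, assuming plain ASCII input and a single line of text. The functions dollar_may_open(), dollar_may_close() and find_math_span() are hypothetical and ignore escapes, code spans and everything else the real parser handles.

```c
#include <ctype.h>
#include <stddef.h>

/* Nonzero if text[pos] is an ASCII alphanumeric character; positions
 * outside the text count as non-alphanumeric. */
static int is_alnum_at(const char* text, size_t size, size_t pos)
{
    return pos < size && isalnum((unsigned char) text[pos]);
}

/* A run starting at beg may open only if not preceded by an alnum char. */
static int dollar_may_open(const char* text, size_t size, size_t beg)
{
    return beg == 0 || !is_alnum_at(text, size, beg - 1);
}

/* A run ending before end may close only if not followed by an alnum char. */
static int dollar_may_close(const char* text, size_t size, size_t end)
{
    return !is_alnum_at(text, size, end);
}

/* Finds the first math span. Only the last potential opener is kept, so
 * a closer matches the nearest preceding opener and earlier openers are
 * ignored. Returns 1 and sets *open_pos, *close_pos on success. */
static int find_math_span(const char* text, size_t size,
                          size_t* open_pos, size_t* close_pos)
{
    size_t i = 0;
    int have_opener = 0;
    size_t opener = 0, opener_len = 0;

    while (i < size) {
        size_t len;
        if (text[i] != '$') { i++; continue; }
        len = (i + 1 < size && text[i + 1] == '$') ? 2 : 1;
        /* A closer must have the same length ($ vs. $$) as the opener. */
        if (have_opener && len == opener_len &&
            dollar_may_close(text, size, i + len)) {
            *open_pos = opener;
            *close_pos = i;
            return 1;
        }
        if (dollar_may_open(text, size, i)) {
            have_opener = 1;
            opener = i;
            opener_len = len;
        }
        i += len;
    }
    return 0;
}
```

For "$a $b$", the '$' before "b" cannot close (it is followed by 'b') but replaces the first '$' as the pending opener, so the span found is "$b$".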
For standard e-mail autolinks <user@host>, we internally transformed '<'
into '@' (as in a permissive e-mail autolink) to unify the handling of the
"mailto:" prefix that has to be added to the destination attribute.
This is no longer the case; we now handle it specially.
That transformation is actually what bit us in
https://oss-fuzz.com/testcase-detail/4815193402048512.
Even though this isn't the root cause of the issue, this change makes
the code safer and easier to understand.
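Handling the prefix explicitly per autolink kind, instead of disguising one kind as another, might look like the sketch below. The AutolinkKind enum and autolink_destination() are hypothetical illustrations, not md4c's API; the "http://" prefix for WWW autolinks follows the GFM extended autolink rules.

```c
#include <stdio.h>

/* Hypothetical classification of autolinks, for illustration only. */
typedef enum {
    AUTOLINK_STANDARD_URL,     /* <https://example.com> */
    AUTOLINK_STANDARD_EMAIL,   /* <user@host> */
    AUTOLINK_PERMISSIVE_URL,   /* https://example.com */
    AUTOLINK_PERMISSIVE_WWW,   /* www.example.com */
    AUTOLINK_PERMISSIVE_EMAIL  /* user@host */
} AutolinkKind;

/* Writes the href destination for an autolink into buf. Each kind
 * states explicitly which scheme prefix, if any, it needs. Returns the
 * length snprintf() reports (the full length, even if truncated). */
static int autolink_destination(AutolinkKind kind, const char* text,
                                char* buf, size_t buf_size)
{
    const char* prefix = "";

    switch (kind) {
        case AUTOLINK_STANDARD_EMAIL:
        case AUTOLINK_PERMISSIVE_EMAIL:
            prefix = "mailto:";
            break;
        case AUTOLINK_PERMISSIVE_WWW:
            prefix = "http://";
            break;
        default:
            break;   /* URLs already carry their scheme. */
    }
    return snprintf(buf, buf_size, "%s%s", prefix, text);
}
```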
The function incorrectly used the header from the head, leading to a wrong
result (incompatible with e.g. GFM) but, even worse, also to a bad internal
state which md_rollback() is then potentially unable to resolve.
Fixes #222.
* We now have a dedicated run over the inline marks for them.
* We check more thoroughly whether it really looks like a URL or an e-mail
address. The old implementation recognized even heavily broken ones.
* This allows us to be much more careful not to cross already
resolved marks.
* Share substantial parts of the code between all three types of
permissive autolinks (URL, WWW, e-mail).
* Merge their tests into one file, spec-permissive-autolinks.txt.
* Add one pathological case which triggered quadratic behavior in the
old implementation.
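A stricter e-mail check in this spirit can be sketched as below, loosely following the GFM rules for extended e-mail autolinks: a local part of alphanumerics or ".-_+", an '@', then alphanumeric/'-'/'_' labels separated by '.', with at least one '.' and no trailing '-' or '_'. The function looks_like_email() is hypothetical and validates a whole, already delimited candidate rather than scanning running text.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns nonzero if the whole string text[0..size) looks like an
 * e-mail address under the simplified rules described above. */
static int looks_like_email(const char* text, size_t size)
{
    size_t i = 0;
    size_t dots = 0;
    size_t label_len = 0;

    /* Local part: at least one allowed character before '@'. */
    while (i < size && (isalnum((unsigned char) text[i]) || text[i] == '.' ||
                        text[i] == '-' || text[i] == '_' || text[i] == '+'))
        i++;
    if (i == 0 || i >= size || text[i] != '@')
        return 0;
    i++;

    /* Domain: non-empty labels separated by single dots. */
    for (; i < size; i++) {
        char ch = text[i];
        if (ch == '.') {
            if (label_len == 0)
                return 0;      /* empty label, e.g. "a@.b" or "a@b..c" */
            dots++;
            label_len = 0;
        } else if (isalnum((unsigned char) ch) || ch == '-' || ch == '_') {
            label_len++;
        } else {
            return 0;
        }
    }

    /* At least one dot, a non-empty last label, no trailing '-' or '_'. */
    if (dots == 0 || label_len == 0)
        return 0;
    return text[size - 1] != '-' && text[size - 1] != '_';
}
```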
they fall into the range of a previously analyzed mark. That can happen if
the previous mark has been expanded, which typically happens for
permissive autolinks.
This fixes one case of pathological input leading to quadratic behavior.
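The skipping described above can be sketched with a toy analysis that expands a mark to the end of its word, standing in for permissive autolink detection. RangeMark, expand_to_word_end() and analyze_marks() are hypothetical; the point is only that marks inside an already expanded range are not analyzed again.

```c
#include <stddef.h>

/* Hypothetical mark: a position plus the end of the range its analysis
 * covered. */
typedef struct {
    size_t pos;
    size_t end;
    int    skipped;   /* set when the mark lies inside an expanded mark */
} RangeMark;

/* Toy analysis: expand the mark to the end of its run of non-spaces. */
static size_t expand_to_word_end(const char* text, size_t size, size_t pos)
{
    while (pos < size && text[pos] != ' ')
        pos++;
    return pos;
}

/* Walks marks in order of position, skipping any mark that begins before
 * the end of the most recently analyzed (possibly expanded) mark.
 * Returns the number of marks actually analyzed. */
static size_t analyze_marks(const char* text, size_t size,
                            RangeMark* marks, size_t n)
{
    size_t i, analyzed = 0, covered_until = 0;

    for (i = 0; i < n; i++) {
        if (marks[i].pos < covered_until) {
            marks[i].skipped = 1;
            marks[i].end = marks[i].pos + 1;
            continue;
        }
        marks[i].skipped = 0;
        marks[i].end = expand_to_word_end(text, size, marks[i].pos);
        covered_until = marks[i].end;
        analyzed++;
    }
    return analyzed;
}
```

Without the covered_until check, a word containing k marks would be scanned k times, each time to its end, which is the quadratic pattern the commit removes.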
* We incorrectly applied the infamous rule of three only to
asterisk-encoded emphasis; it has to be applied to underscore-encoded
emphasis as well.
* We incorrectly applied the rule of three only if the opener
and/or closer was inside a word. It also has to be applied if the
mark is both preceded and followed by punctuation.
Fixes #217.
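The rule of three, as stated in the CommonMark spec (emphasis rules 9 and 10), can be sketched as a pairing check that is independent of the delimiter character and relies only on whether a run can both open and close, however that was determined. DelimRun and may_pair() are hypothetical, not md4c's internals.

```c
#include <stddef.h>

/* Hypothetical description of an emphasis delimiter run. */
typedef struct {
    char   ch;         /* '*' or '_' */
    size_t run_len;    /* length of the whole delimiter run */
    int    can_open;
    int    can_close;
} DelimRun;

/* Returns nonzero if opener and closer may be paired. If either run can
 * both open and close, the sum of the run lengths must not be a multiple
 * of 3 unless both lengths are. Applies to '*' and '_' alike. */
static int may_pair(const DelimRun* opener, const DelimRun* closer)
{
    int either_both_ways;

    if (opener->ch != closer->ch || !opener->can_open || !closer->can_close)
        return 0;

    either_both_ways = (opener->can_open && opener->can_close) ||
                       (closer->can_open && closer->can_close);
    if (either_both_ways &&
        (opener->run_len + closer->run_len) % 3 == 0 &&
        !(opener->run_len % 3 == 0 && closer->run_len % 3 == 0))
        return 0;

    return 1;
}
```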