Commit Graph

127 Commits

Author SHA1 Message Date
Crypto City
659d6d2db0 build PIC
2024-10-30 15:00:11 +00:00
Crypto City
10cf0e78de add a flag to allow any (most) characters in a url
2024-10-27 09:28:24 +00:00
Martin Mitáš
481fbfbdf7
Check for hard breaks more carefully to avoid false positives...
... caused by trailing tab characters.

Fixes #250.
2024-02-25 20:51:06 +01:00
Martin Mitáš
64f36805b0
Fix handling tab when removing trailing whitespace.
Especially in connection with ATX headers.
2024-02-25 16:24:50 +01:00
Martin Mitáš
3848bfb6cc Make strikethrough spans follow the same flanking rules...
... as other emphasis spans.

Fixes #242.
2024-02-21 09:09:31 +01:00
Martin Mitáš
329954690e
Few assorted typo and wording fixes. 2024-02-13 15:46:13 +01:00
Martin Mitáš
aa53f82c29 Introduce an overall limit on link ref. def. instantiations.
This is to prevent time and output size explosion for the input
pattern generated by this:

    $ python -c 'N=1000; print("[x]: " + "x" * N + "\n[x]" * N)'

We roughly allow link reference definitions to blow up the input size
of the document 16 times, or up to 1 MB, whichever is smaller. Once the
threshold is reached, subsequent reference definitions are sent to the
output unresolved, as plain text.

Fixes #238.
2024-02-07 14:45:09 +01:00
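The limit described in aa53f82c29 can be sketched roughly as follows. This is an illustrative model only, not the MD4C implementation: the names expansion_budget and RefDefBudget are hypothetical, 1 MB is assumed to mean 1024 * 1024 bytes, and the "sticky" behaviour once the budget runs out is an assumption based on the wording of the commit message.

```python
def expansion_budget(input_size: int, factor: int = 16,
                     hard_cap: int = 1024 * 1024) -> int:
    """Bytes that reference definitions may add: 16x the input or 1 MB,
    whichever is smaller (1 MB assumed to be 1024 * 1024 bytes)."""
    return min(factor * input_size, hard_cap)


class RefDefBudget:
    """Hypothetical bookkeeping for reference definition instantiations."""

    def __init__(self, input_size: int) -> None:
        self.remaining = expansion_budget(input_size)
        self.exhausted = False

    def try_instantiate(self, def_size: int) -> bool:
        """Return True if the link may be resolved, False if it should be
        emitted unresolved as plain text."""
        if self.exhausted or def_size > self.remaining:
            # Assumption: once the threshold is hit, every later
            # instantiation is refused as well.
            self.exhausted = True
            return False
        self.remaining -= def_size
        return True
```

With the one-liner from the commit message (N=1000), the input is about 5 kB, so the budget would be about 80 kB and only the first 80 or so of the 1000 "[x]" references would be resolved under this model.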
Martin Mitáš
30945d80f8
md_is_link_label: Fix warning about potentially uninitialized variable...
... when built with gcc 13.2.0 in release build.
2024-02-01 22:16:22 +01:00
Martin Mitáš
f37a89f5d7
md_is_inline_link_spec: Use md_lookup_line() instead of walking.
Fixes #236.
2024-02-01 22:14:36 +01:00
Martin Mitas
a44a1cf89c Update tags for HTML block starting condition.
Specifically, "<source>" has been removed, "<search>" added.
2024-01-28 09:00:08 +01:00
Martin Mitas
4aea320a9e md_is_html_comment: Reflect updated spec.txt.
* Accept "<!-->" and "<!--->" as valid HTML comments.
* HTML comment now can contain "--"
2024-01-28 09:00:08 +01:00
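The updated comment rule from 4aea320a9e can be illustrated with a small scanner. This is a sketch of the rule as stated in the commit message and the current CommonMark spec, not of md_is_html_comment() itself; the function name html_comment_end is hypothetical.

```python
def html_comment_end(text: str, start: int = 0) -> int:
    """Return the index just past the HTML comment beginning at `start`,
    or -1 if no comment starts there."""
    if not text.startswith("<!--", start):
        return -1
    # The two degenerate forms are now valid comments on their own.
    for short in ("<!-->", "<!--->"):
        if text.startswith(short, start):
            return start + len(short)
    # Otherwise the comment runs to the first "-->"; a bare "--" in the
    # body no longer invalidates it.
    close = text.find("-->", start + 4)
    return -1 if close < 0 else close + 3
```

For example, html_comment_end("<!-- a -- b -->") covers the whole string, while an unterminated "<!-- a" yields -1.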
Martin Mitas
ef4dcd41df Updated spec.txt expands what's recognized as Unicode punctuation.
Namely all P and S general categories are now treated as punctuation.
2024-01-28 09:00:08 +01:00
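The rule from ef4dcd41df (every character in general category P or S counts as punctuation) can be expressed with Python's standard unicodedata module. This only illustrates the classification; MD4C uses its own character tables, and the function name is hypothetical.

```python
import unicodedata


def is_unicode_punctuation(ch: str) -> bool:
    """True if `ch` is in a P* (punctuation) or S* (symbol) general category."""
    return unicodedata.category(ch)[0] in ("P", "S")
```

Under this rule characters such as "$", "+" and "€" (all in S categories) are treated as punctuation for flanking purposes, in addition to the P categories.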
Martin Mitáš
5bd6224147 Fix warning about a shadowed variable (with -Wshadow).
Fixes #234.
2024-01-28 08:26:39 +01:00
Martin Mitas
90f8d9646f Put all compiler options in one place and unify them for all targets.
(And fix a newly triggered warning in md2html/md2html.c.)
2024-01-28 08:26:39 +01:00
Shawn Rutledge
3e8048db2b Improve/unify approach to line indexing.
* Consistently use type MD_SIZE for line indices.
* Remove pointer arithmetic on lines and replace it with line index
  arithmetic.

This resolves some warnings in MSVC builds.
See PR #232.

Co-authored-by: Martin Mitas <mity@morous.org>
Co-authored-by: Shawn Rutledge <s@ecloud.org>
2024-01-26 21:41:38 +01:00
Martin Mitas
5178c585af Fix uninitialized variable.
This was a regression introduced in commit
aeddaf587f.
2024-01-25 23:53:58 +01:00
Martin Mitas
aeddaf587f Simplify and fix handling of newline in code span.
Fixes #223 properly (one corner case went unnoticed, hidden by the test
suite's normalization feature).

Fixes #230 (strictly speaking, a duplicate of that corner case).
2024-01-25 22:24:17 +01:00
Martin Mitas
f46000c7fc Use UTF-8 in copyright notes. 2024-01-24 09:49:59 +01:00
Martin Mitas
2cb4f23f37 md_collect_marks: Improve pre-test for '.'. 2024-01-22 09:14:58 +01:00
Martin Mitas
23e7929bf4 md_analyze_permissive_autolink: Check left boundary asap. 2024-01-22 09:10:25 +01:00
Martin Mitas
fcd3ca13e3 Fix source indentation. 2024-01-21 15:20:49 +01:00
Martin Mitas
83e093fbfc md_opener_stack: Mark the default branch of switch as unreachable.
We were returning NULL previously, but that would lead to a crash
anyway: all call sites expect to get their respective stack, and
anything else would mean we are internally broken.
2024-01-21 12:10:46 +01:00
Martin Mitas
0672f27c0c md_process_table_row: Remove unneeded freeing of ptr_stack.
This is already handled universally in
md_process_normal_block_contents() which is called from
md_process_table_row() via md_process_table_cell().
2024-01-21 12:10:46 +01:00
Martin Mitas
faf39849db md_is_html_cdata: Remove unneeded max_end shrinking.
md_scan_for_html_closer() handles that internally.
2024-01-21 12:10:46 +01:00
Martin Mitas
65957f5369 Limit number of table columns to prevent explosion of output...
with an input pattern like the one generated by this one-liner:

    $ python3 -c 'N=1000; print("x|" * N + "\n" + "-|" * N + "\n" + "x\n" * N)'

Here the amount of HTML output grows with N^2.
2024-01-19 22:42:56 +01:00
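The quadratic growth mentioned in 65957f5369 can be checked with simple arithmetic. The sketch below assumes the GFM table behaviour in which every body row is padded with empty cells up to the header's column count; the function names are hypothetical and nothing here calls into MD4C.

```python
def pathological_table(n: int) -> str:
    """Same input as the one-liner in the commit message."""
    return "x|" * n + "\n" + "-|" * n + "\n" + "x\n" * n


def cell_count(n: int) -> int:
    """Cells a renderer emits without a column limit: one header row plus
    n body rows, each padded to n columns (assumed GFM behaviour)."""
    return n * (n + 1)


n = 1000
input_bytes = len(pathological_table(n))   # linear in n
cells = cell_count(n)                      # quadratic in n
```

The input grows as 6N + 2 bytes while the number of emitted cells grows as N(N + 1), which is why capping the column count bounds the output.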
Martin Mitas
70b247cf7d md_analyze_permissive_autolink: Accept path ending with '/'.
Fixes #226.
2024-01-19 13:59:45 +01:00
Martin Mitas
bbb43fe098 Rename PUSH_MARK() to ADD_MARK().
This is to prevent confusion with opener stack operations.
2024-01-18 17:30:44 +01:00
Martin Mitáš
246e105dfb
Refactor mark chains. (#224)
* Rename MD_MARKCHAIN to MD_MARKSTACK to indicate its semantics much
  more clearly.
* Simplify its implementation (single-linked list instead of
  double-linked one).
* Where it was reused (misused?) for other, unrelated stuff, with other
  semantics, it's now done explicitly. (i.e. got rid of
  TABLECELLBOUNDARIES).
* PTR_CHAIN still uses the stack (we don't care about order there), but
  it got separated from the array of ordinary opener stacks at least.
2024-01-18 17:22:54 +01:00
Martin Mitas
601ff05326 Fix handling new line at beginning/end of a code span.
Fixes #223.
2024-01-18 16:49:37 +01:00
Martin Mitas
c076698ab5 md_collect_marks: Get rid of helper vars line_beg, line_end. 2024-01-18 16:10:46 +01:00
Martin Mitas
087288312f md_rollback: Update outdated comment. 2024-01-18 13:39:48 +01:00
Martin Mitas
d40458b5b5 md_rollback: Simplify the function.
We assume the provided opener_index and closer_index do not cross
boundaries of already resolved ranges. Previously the function tried to
deal with such situations, but that code should not be needed; it was
very complex and, most importantly, broken anyway.
2024-01-18 12:39:36 +01:00
Martin Mitas
a08f6a05f1 Improve/fix latex math extension.
To mitigate false positives:

* We accept $ and $$ as a potential opener only if it's not preceded
  by an alnum char.

* Similarly, a closer cannot be followed by an alnum char.

* We now also match a closer with the last preceding potential opener,
  not the first one. (And to avoid nesting, any previous openers are
  ignored.)

* Also revert an unintended change in 3fc207affa
  which allowed keeping nested resolved marks in it.
2024-01-18 12:29:31 +01:00
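The opener/closer rules listed in a08f6a05f1 can be sketched as a small standalone scanner. This is a simplified model working on raw text, not MD4C's mark-based md_analyze_dollar(); all function names are hypothetical, and escaping, code spans and the restriction to runs of one or two dollars are deliberately left out.

```python
import re


def can_open_math(text: str, start: int) -> bool:
    """A '$' run may open only if it is not preceded by an alnum char."""
    return start == 0 or not text[start - 1].isalnum()


def can_close_math(text: str, end: int) -> bool:
    """A '$' run may close only if it is not followed by an alnum char."""
    return end == len(text) or not text[end].isalnum()


def find_math_spans(text: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of the content of each math span."""
    spans = []
    opener = None  # (start, end) of the most recent potential opener
    for m in re.finditer(r"\$+", text):
        start, end = m.span()
        if (opener is not None
                and opener[1] - opener[0] == end - start
                and can_close_math(text, end)):
            spans.append((opener[1], start))
            opener = None
        elif can_open_math(text, start):
            # A later opener replaces any earlier one, so a closer always
            # pairs with the last preceding potential opener.
            opener = (start, end)
    return spans
```

With these rules "costs 5$ and 10$" yields no math span, and in "$x $y$" only "y" is matched because the second "$" replaces the first as the pending opener.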
Martin Mitas
3fc207affa Handle e-mail autolinks in a safer way.
For standard e-mail autolinks <user@host> we internally transformed '<'
into '@' (permissive e-mail autolink) to unify handling of the missing
"mailto:" prefix needed in the destination attribute.

This is no longer the case; we now handle that specially.

It is actually what has bitten us in
https://oss-fuzz.com/testcase-detail/4815193402048512.

Even though this isn't the root cause of the issue, this change makes
the code safer and easier to understand.
2024-01-18 10:56:12 +01:00
Martin Mitas
4728cd981d md_analyze_tilde: Pop from chain tail like other emphasis.
The function incorrectly used the opener from the head, leading to a
wrong result (incompatible with e.g. GFM) and, even worse, to a bad
internal state that md_rollback() is then potentially unable to resolve.

Fixes #222.
2024-01-17 16:04:14 +01:00
Martin Mitas
006611b9ab md_analyze_dollar: Call md_rollback() only when resolving.
Fixes #221.
2024-01-17 15:04:14 +01:00
Martin Mitáš
d955c495ee
Rework permissive autolinks. (#220)
 * We now have a dedicated run over the inline marks for them.

 * We check more thoroughly whether it really looks like a URL or e-mail
   address. The old implementation recognized even heavily broken ones.

 * This allows us to be much more careful in order not to cross already
   resolved marks.

 * Share substantial parts of the code between all three types of the
   permissive autolinks (URL, WWW, e-mail).

 * Merge their tests into one file, spec-permissive-autolinks.txt.

 * Add one pathological case which triggered quadratic behavior in the
   old implementation.
2024-01-17 02:48:57 +01:00
Martin Mitas
0ac9f35d06 md_analyze_marks: Skip analyzing marks if...
they fall into the range of a previously analyzed mark. That can happen
if the previous mark has been expanded, which typically happens for
permissive auto-links.

This fixes one case of pathological input leading to quadratic behavior.
2024-01-16 11:36:13 +01:00
Martin Mitas
b6777d7812 Wiki-links extension: Search for '|' only outside resolved ranges. 2024-01-16 01:30:59 +01:00
Martin Mitas
afeece2981 Fix line indentation calculation when interrupting list...
due to the "list item cannot begin with two blank lines" rule.
2024-01-15 23:03:21 +01:00
Martin Mitas
7882942708 Fix some emphasis parsing issues.
 * We incorrectly applied the infamous rule of three only to
   asterisk-encoded emphasis, it has to be applied to underscore as
   well.

 * We incorrectly applied the rule of three only if the opener
   and/or closer was inside a word. It has also to be applied if the
   mark is both preceded and followed by punctuation.

Fixes #217.
2024-01-13 03:11:29 +01:00
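For reference, the "rule of three" that 7882942708 refers to can be written as a predicate over two delimiter runs. The sketch follows the wording of the CommonMark spec, not MD4C's internal mark flags; DelimRun and may_pair are hypothetical names, and the can_open/can_close flags are assumed to be computed elsewhere from the flanking rules.

```python
from dataclasses import dataclass


@dataclass
class DelimRun:
    char: str        # '*' or '_'
    length: int      # number of delimiter characters in the run
    can_open: bool   # result of the flanking rules for opening
    can_close: bool  # result of the flanking rules for closing


def may_pair(opener: DelimRun, closer: DelimRun) -> bool:
    """Check whether two delimiter runs may form emphasis together."""
    if opener.char != closer.char:
        return False
    if not opener.can_open or not closer.can_close:
        return False
    # Rule of three: if either run can both open and close, the sum of
    # the lengths must not be a multiple of 3 unless both lengths are.
    both_ways = ((opener.can_open and opener.can_close)
                 or (closer.can_open and closer.can_close))
    if both_ways and (opener.length + closer.length) % 3 == 0:
        return opener.length % 3 == 0 and closer.length % 3 == 0
    return True
```

The check depends only on the can_open/can_close flags, not on the delimiter character, which matches the fix: it applies to '_' as well as '*', and to a run surrounded by punctuation just as to one inside a word.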
Martin Mitas
5592352fdb HTML declaration doesn't require whitespace before the closer.
Fixes #216.
2024-01-13 00:30:08 +01:00
Martin Mitas
7497ea92b3 Allow tabs after setext header underline.
Fixes #215.
2024-01-13 00:17:08 +01:00
Martin Mitas
2750d9fa3b Add tags <h2>...<h6> as triggers for HTML block type 6.
Fixes #214.
2024-01-13 00:05:38 +01:00
Martin Mitas
4a64fee2ee Bump copyright years. 2024-01-11 13:12:55 +01:00
Martin Mitas
5204c30d40 md_is_html_block_end_condition: Fix return value. 2024-01-11 12:41:40 +01:00
Martin Mitas
f32a861efa md_end_current_block: Fix EOL handling. 2024-01-11 12:21:02 +01:00
Martin Mitas
76abc636ad md_is_html_block_end_condition: Fix EOF handling. 2024-01-11 12:09:55 +01:00
Martin Mitas
4a7246de40 md_is_inline_link_spec: Fix EOL checking. 2024-01-11 11:56:19 +01:00
Martin Mitas
e25ea3d182 Update list of named entities. 2024-01-11 03:34:24 +01:00