Commit Graph

143 Commits

Author SHA1 Message Date
Martin Mitas
3254b7cb00 md_process_table_block_contents: Suppress empty TBODY block generation.
When the table has no body rows, do not call the callback with
MD_BLOCK_TBODY events.

Fixes #138.
2020-11-13 12:02:39 +01:00
Martin Mitas
4585088ad7 md_analyze_permissive_url_autolink: Better GFM compatibility.
The autolinks now allow unmatched parentheses; only the trailing
closing parentheses are handled specially to deal with the case where the
whole autolink is inside outer parentheses.

Somehow our tests were broken and avoided the cases with unmatched
parentheses inside the auto-link. That's now fixed and in sync
with GFM specs too.

Fixes #135.
2020-11-13 10:22:34 +01:00
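The trailing-parenthesis handling described in the commit above can be sketched roughly as follows. This is only an illustration of the rule as worded in the GFM specification (drop trailing ')' characters that outnumber the '(' characters in the autolink); the function name `trim_trailing_parens` is hypothetical and this is not MD4C's actual C code.

```python
def trim_trailing_parens(url: str) -> str:
    """Drop trailing ')' characters that have no matching '(' in the URL.

    Hypothetical helper: parentheses in the middle of the URL are left
    alone, only unmatched closers at the very end are cut off.
    """
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url


# A balanced pair survives; extra trailing closers are trimmed.
print(trim_trailing_parens("www.google.com/search?q=Markup+(business)"))
print(trim_trailing_parens("www.google.com/search?q=Markup+(business)))"))
# An unmatched ')' in the middle is allowed because the URL does not end with it.
print(trim_trailing_parens("www.google.com/search?q=(business))+ok"))
```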
Martin Mitas
002f76c975 md_resolve_links: Skip [...] used as a reference link/image label.
Fixes #131.
2020-10-18 09:43:06 +02:00
Martin Mitas
c501c891b9 Fix spelling of "than" in many occurrences.
I often misspell it as "then". I make this mistake way too
often when typing fast.
2020-07-30 10:13:05 +02:00
Martin Mitas
c595c2ed00 md_process_verbatim_block_contents: Fix off by 1 error.
This caused wrong indentation to be output inside fenced code blocks for
lines indented with more than 16 spaces.

Fixes #124.
2020-07-30 08:38:19 +02:00
Martin Mitas
0c4d7f3d85 test/normalize.py: Use html.escape instead of cgi.escape.
Fixes #123.
2020-07-28 07:20:51 +02:00
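For context on the commit above: `cgi.escape` was deprecated and later removed from the Python standard library (it is gone as of Python 3.8), and `html.escape` is the replacement. A minimal sketch of the substitution follows; the wrapper name `escape_text` is hypothetical and is not taken from test/normalize.py.

```python
import html


def escape_text(text: str) -> str:
    """Escape &, < and > for HTML output.

    quote=False mirrors the old cgi.escape default of leaving quote
    characters untouched; html.escape escapes them by default.
    """
    return html.escape(text, quote=False)


print(escape_text('<a href="x">&</a>'))
# With the default quote=True, double and single quotes are escaped too.
print(html.escape('"quoted"'))
```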
Martin Mitas
d0e3ed79bf md2html: Skip UTF-8 BOM, if present in the input. 2020-03-12 23:08:29 +01:00
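Skipping a UTF-8 byte order mark amounts to dropping the three bytes EF BB BF when they start the input. A small sketch, assuming the whole input is already in memory; `skip_utf8_bom` is a hypothetical name and md2html itself is written in C.

```python
import codecs


def skip_utf8_bom(data: bytes) -> bytes:
    """Return the input without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


print(skip_utf8_bom(b"\xef\xbb\xbf# Title"))
print(skip_utf8_bom(b"# Title"))
```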
Martin Mitas
9e6ab76c24 Minor fuzz-input cleanup.
Move some permissive links incorrectly placed in commonmark.md into
gfm.md.
2020-02-17 12:41:50 +01:00
Martin Mitas
cc9a9d28ca test/fuzz-test: Add some initial input for fuzz testing. 2020-02-16 15:29:54 +01:00
Martin Mitas
5d7c35973e md_analyze_emph: Detect correctly opener chain when resolving the range.
Fixes #107.
2020-02-16 13:51:05 +01:00
Martin Mitas
b4c30cd6e6 Improve wiki-link parsing.
* Get rid of MD_LINE::total_indent.

 * Remove some special complicated branching for nested images: Instead
   we md_rollback() the wiki-link destination span to kill _any_
   marks resolved so far, including the images.

 * Remove any length limit from label. Only destination length is
   limited, regardless of whether '|' is present or not.

 * Move the special handling of `[[foo|]]` from md_process_inlines()
   into md_resolve_links(). We simply expand the closer mark to consume
   the `|`.

 * Do not modify the opener and closer marks until we really know it
   is indeed a wiki-link.
2020-02-13 02:50:15 +01:00
Martin Mitas
403043bba3 md_mark_chain_append: Set next of the tail mark to -1.
Fixes #104.
2020-01-16 16:27:37 +01:00
Martin Mitáš
e6661f23dc Implement an underline extension. (#103)
Closes #101.
2020-01-10 19:27:10 +01:00
Martin Mitas
82d7d087cc Rework/improve recognition of strike-through spans.
Closes #102.
2020-01-10 16:11:21 +01:00
Martin Mitas
561f52e05f md_is_autolink_email: Fix an off-by-one error.
Fixes #100.
2020-01-05 18:33:46 +01:00
Martin Mitas
46f25f0b47 md_analyze_emph: Call md_resolve_range() with proper chain.
Erroneously, we called md_resolve_range() with the mark chain derived
from the closer mark. In the case that the opener and closer marks
differ in length (and we have split one or the other), we passed in an
incorrect chain, which may lead to strange behavior in subsequent
analysis.

Fixes #98.
2019-11-12 21:48:26 +01:00
niblo
e336e6404f Add support for Wiki links (#92)
With a new flag MD_FLAG_WIKILINKS, recognize wiki-style links
such as [[foo]] and [[foo|bar]].

Update also the HTML renderer accordingly, to output a custom
HTML tag <x-wikilink> when seeing it.
2019-11-04 15:20:59 +01:00
Martin Mitáš
ef85cfc278 Simplify parsing of tables (#97)
We do so by removing the function md_is_table_row().

md_is_table_row() did some crazy inline parsing to detect whether the
line contains at least one pipe which is not inside a code span or other
high-priority inline element.

This was very complicated under the hood and it was actually breaking
the clean design which separates the block analysis pass from the inline
analysis of each block's contents.

We now just use the table underline to determine that the block is a table
and to determine its properties, e.g. the column count.

This means a paragraph now cannot interrupt a table. This is a change in
behavior but likely an acceptable one, as it actually brings the behavior
closer to the behavior of tables in cmark-gfm in this regard.

Last but not least, md_is_table_row() seems to prevent adoption of other
useful features; for more about that, see the discussion in PR #92.
2019-11-04 15:05:07 +01:00
Martin Mitas
993c7b9b88 Render LaTeX math into HTML as a tag <x-equation>...
... instead of <equation>. This is to highlight that it is not a
standard HTML tag.
2019-11-03 23:32:46 +01:00
Martin Mitas
e97d0250bb Link label comparison fixes.
* md_link_label_cmp: To match the labels, the loop has to reach ends of
   the labels for both of them.

 * md_link_label_cmp_load_fold_info: Collapse consecutive whitespace
   into a single ' ' for the label comparison purposes.

Fixes #96.
2019-11-03 13:57:00 +01:00
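The two fixes above correspond to the CommonMark rule for matching link labels: case-fold, strip surrounding whitespace, collapse internal whitespace runs to one space, then require the normalized labels to be equal over their whole length. A rough sketch of that rule follows; the helper names are hypothetical, and Python's `str.casefold()` stands in for MD4C's own Unicode folding tables.

```python
def normalize_label(label: str) -> str:
    """Collapse whitespace runs to a single space and case-fold the label."""
    # str.split() with no argument splits on any whitespace run and also
    # drops leading and trailing whitespace.
    return " ".join(label.split()).casefold()


def labels_match(a: str, b: str) -> bool:
    """Labels match only if they are equal over their whole length."""
    return normalize_label(a) == normalize_label(b)


print(labels_match("Foo \t  Bar", "foo bar"))   # whitespace collapsed
print(labels_match("foo", "foobar"))            # a mere prefix must not match
```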
Martin Mitas
0354e1ab5a md_is_container_mark: Ordered list mark requires at least one digit.
Fixes #95.
2019-10-04 22:35:54 +02:00
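CommonMark defines an ordered list marker as one to nine digits followed by '.' or ')'. The check fixed above can be illustrated with a small regular expression; `is_ordered_list_mark` is a hypothetical helper, not the C function md_is_container_mark.

```python
import re

# One to nine ASCII digits followed by '.' or ')'.
ORDERED_MARK = re.compile(r"[0-9]{1,9}[.)]")


def is_ordered_list_mark(mark: str) -> bool:
    """Return True if `mark` is a complete ordered list marker."""
    return ORDERED_MARK.fullmatch(mark) is not None


print(is_ordered_list_mark("1."))   # digits present
print(is_ordered_list_mark("."))    # no digit: not a list mark
```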
Martin Mitas
9760636977 Fix the last test case in latex-math.txt. 2019-07-07 11:19:21 +02:00
Martin Mitas
099ce69b04 Add missing file into git. 2019-07-07 11:15:44 +02:00
Martin Mitas
2e965941ed Add/improve docs for the LaTeX math spans. 2019-07-07 10:59:20 +02:00
Tilman Roeder
8bac86aa43 Added support for LaTeX math (#87)
Addresses #86.
2019-07-07 10:46:10 +02:00
Martin Mitas
ce8b5d9440 md_analyze_line: Blockquote with blank line can interrupt a paragraph.
Fixes #83.
2019-05-27 22:16:35 +02:00
Martin Mitas
5138616445 md_link_label_cmp: Fix handling non-trivial folding info.
Fixes #78.
2019-05-19 11:46:26 +02:00
Martin Mitas
4f6a9e546f Update Unicode support to 12.1.
* scripts/build_*_map.py: Implement helper Python scripts used to
   generate some Unicode search maps and data for helper Unicode
   functions used in MD4C.

   This should simplify updating to future Unicode versions.

 * md_get_unicode_fold_info: Use data generated by the scripts.

 * md_is_unicode_whitespace__: Ditto.

 * md_is_unicode_punct__: Ditto.
2019-05-19 11:00:40 +02:00
Martin Mitas
aca5c27f1f test/spec.txt: Update from upstream head. 2019-05-16 22:48:08 +02:00
Martin Mitas
64a1bc37f5 test/coverage.txt: Sort the regression test cases by the issue number. 2019-05-15 23:25:05 +02:00
Martin Mitas
919a0cc9e0 test/*.txt: Fix some formatting. 2019-05-08 07:38:33 +02:00
Martin Mitas
1757ff55c6 test/spec_tests.py: Make ready for spec.txt from cmark-gfm project.
This allows easier checking of our GFM dialect compatibility.
2019-05-07 23:10:46 +02:00
Martin Mitas
83047d3eb1 md_analyze_permissive_url_autolink: Improve.
* Fix domain recognition so that it has to have at least two
   dot-delimited components.

 * Fix handling of parentheses so that they have to form balanced
   pairs; i.e. the first ')' without a preceding opener ends the
   path.

Fixes #76.
2019-05-07 22:24:29 +02:00
Martin Mitas
609dfb0b1e md_analyze_line: Treat blank lines inside a HTML block more carefully...
... with respect to the parent list containers.

Fixes #10 (but now really).
2019-05-05 15:56:51 +02:00
Martin Mitas
952791318f When undoing complete block from ctx->block_bytesp[], reset ctx->current_block properly.
Fixes #74.
2019-04-30 00:32:36 +02:00
Martin Mitas
d4d1091511 Improve parsing of inline raw HTML.
* Isolate some common code for scanning HTML closer into a new function
   so most HTML scanner functions reuse the same code.

 * Improve the scanning for the closer so that on failure we remember
   the range where no closer is present. So any later scanning attempts
   may fail early.

   Fixes #73.
2019-04-29 19:03:16 +02:00
Martin Mitáš
d7920b9c25 Merge pull request #67 from mity/spec-0.29
This merges all changes for CommonMark specification 0.28 -> 0.29 transition.
2019-04-08 19:35:06 +02:00
Martin Mitas
5b78f295c6 test/spec.txt: Update from upstream head. 2019-04-08 11:00:27 +02:00
Martin Mitas
2a7b97ed46 test/spec.txt: Update from upstream head. 2019-04-05 08:18:54 +02:00
Martin Mitas
b858698784 md_collect_mark: Add missing 'continue' to '~' branch.
Fixes #69.
2019-04-03 08:28:27 +02:00
Martin Mitas
855a1bfccf test/spec.txt: Update from upstream head. 2019-03-27 02:04:24 +02:00
Martin Mitas
94c86fe292 Revert "Fix problematic link destinations with angle brackets."
The updated specification now explicitly requests the behavior we
implemented before fixing #24.

This reverts commit 2e0a74ba99.
Also remove associated regression test as it is no longer valid.
2019-03-26 14:45:23 +02:00
Martin Mitas
0959975a8c md_analyze_emph: Follow specs changes to the "rule of three". 2019-03-26 14:01:02 +02:00
Martin Mitas
98968e22ed Update spec.txt from upstream head.
(I previously used an updated revision of it by mistake.)
2019-03-26 13:33:05 +02:00
Martin Mitas
1edd0c9cf5 test/spec.txt: Update to current upstream HEAD. 2019-03-26 11:49:25 +02:00
Martin Mitas
2dd96ab4ac Fix O(n^2) in handling the "rule of three".
We had to break the list of potential '*' openers into multiple ones so
we do not have to walk it when looking for matching length due to the
"rule of three" for intraword delimiter runs.

Fixes #63.
2019-03-12 10:27:36 +02:00
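For readers unfamiliar with it, the "rule of three" as worded in CommonMark 0.29 can be written as a small predicate: when either delimiter can both open and close emphasis, the pair is rejected if the sum of the two run lengths is a multiple of 3, unless both lengths are themselves multiples of 3. The sketch below only illustrates that rule; the function and parameter names are hypothetical, and it does not show how MD4C splits its opener lists.

```python
def rule_of_three_rejects(opener_len: int, closer_len: int,
                          opener_can_close: bool, closer_can_open: bool) -> bool:
    """Return True if the rule of three forbids pairing these delimiter runs."""
    # The rule only applies when one of the runs can both open and close.
    if not (opener_can_close or closer_can_open):
        return False
    if (opener_len + closer_len) % 3 != 0:
        return False
    # Exception from the 0.29 wording: both lengths being multiples of 3 is fine.
    return not (opener_len % 3 == 0 and closer_len % 3 == 0)


print(rule_of_three_rejects(1, 2, True, False))   # sum is 3: rejected
print(rule_of_three_rejects(3, 3, True, True))    # both multiples of 3: allowed
```

Because the outcome depends only on the run lengths modulo 3, keeping separate opener lists per remainder is one plausible way to avoid walking a single long list.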
Martin Mitas
b21086522e md_analyze_line: Fix O(n^2) in thematic break handling.
Fixes #66.
2019-03-11 21:13:15 +02:00
Martin Mitas
37104fc281 md_is_code_span: Fix crash at EOF.
Fixes #65.
2019-03-11 20:26:58 +02:00
Martin Mitas
966b8e39b5 md_is_link_title: Stop on ')' in ()-style title.
Fixes #60.
2019-03-11 19:56:46 +02:00
Martin Mitas
fc27108e71 test/pathological_tests.py: Output test durations. 2019-03-11 19:55:08 +02:00
Martin Mitas
53f65852be test/spec.txt: Little update.
Somehow we had a slightly different version of spec.txt than the one
from CommonMark repo tag 0.28. But we still pass its whole compliance
test suite.
2019-03-11 19:03:34 +02:00
Martin Mitas
685b714453 Move codespan detection from md_analyze_backtick() into...
md_is_code_span(), called from md_collect_marks().

We have to do this at the same time as detecting raw inline HTML to
follow CommonMark priority requirements.

Also it is done very differently now:

When scanning for the closer mark, we remember (the latest) position of
potential closers for all other lengths as well.

This means that:

(1) If we find it, we have reduced the task because all subsequent scans
shall begin after the closer.

(2) If we do not find it, then we have had to reach the end of the block and
hence we then know (for every allowed marker length) the position of the last
such backtick sequence.

(3) That guarantees that any subsequent call will either succeed
in its scan (and reduce the task even further), or that we shall be able
to detect instantly there is no suitable closer.

I.e. every call either reduces the task by O(n) scan (1); or collects
all the data in O(n) because (2) happens at most once; or fails in O(1)
(3).

This guarantees O(n) complexity of the function.

Fixes #59.
2019-03-11 13:02:17 +01:00
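The scanning strategy described in the commit above can be sketched as follows. This is a simplified illustration, assuming calls are made with increasing offsets within a single block; the class and attribute names are hypothetical and the real implementation is C code inside md_is_code_span().

```python
class CodeSpanScanner:
    """Find closing backtick runs while remembering runs of other lengths."""

    def __init__(self, text: str):
        self.text = text
        self.last_run_start = {}     # run length -> start of the latest such run seen
        self.scanned_to_end = False  # True once a failed scan has reached the end

    def find_closer(self, opener_end: int, length: int):
        """Return the start offset of the closing run, or None if there is none."""
        # Case (3): a previous full scan tells us instantly that no closer exists.
        if self.scanned_to_end and self.last_run_start.get(length, -1) < opener_end:
            return None
        i, n = opener_end, len(self.text)
        while i < n:
            if self.text[i] != "`":
                i += 1
                continue
            start = i
            while i < n and self.text[i] == "`":
                i += 1
            run_len = i - start
            if run_len == length:
                return start  # case (1): later scans begin after this closer
            # Remember the latest run of every other length.
            prev = self.last_run_start.get(run_len, -1)
            self.last_run_start[run_len] = max(prev, start)
        self.scanned_to_end = True  # case (2): positions for all lengths are known
        return None


scanner = CodeSpanScanner("a `b` c ``d` e")
print(scanner.find_closer(3, 1))    # closer of the first span
print(scanner.find_closer(10, 2))   # no closing `` exists: full scan
print(scanner.find_closer(12, 1))   # answered from the remembered positions
```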
Martin Mitas
0cb61205b1 Move raw inline HTML detection from md_analyze_lt_qt() into md_collect_marks().
Fixes #58:

For resolving raw inline HTML the function tried the closer with all
potential openers, because raw HTML can have '<' inside of an attribute.

However this caused O(n^2) for input like "<><><><><><><>...".

We solve this by handling raw HTML at an earlier stage, directly in
md_collect_marks(), where we can scan linearly forward.

Fixes #61:

As a side effect, this also fixes the issue that MD_FLAG_NOHTMLSPANS
also disabled recognition of CommonMark autolinks.
2019-03-11 13:02:17 +01:00
Martin Mitáš
8e01a769ea Implement task lists. (#50)
Fixes #30.
2019-02-10 22:58:42 +01:00
Martin Mitas
d32aa2e076 Fix conflict in parsing permissive autolinks and ordinary links.
The issue is caused by the fact that we do not know the exact position
of a permissive auto-link at the time of md_collect_marks() because there
is no syntax to mark its end in the first place.

As a result, the closer mark in ctx->marks[] can eventually end up
somewhat out of order.

As a consequence, if some other mark range (e.g. ordinary link)
shadows the auto-link, the closer mark may be left outside the shadowed
range and survive till the phase when we generate the output.

We fix this by using an extra mark flag to remember that we really did
output the opener mark, and output the closer only in that case.

Fixes #53.
2019-02-09 10:40:52 +01:00
Martin Mitas
67401e7019 md_analyze_inlines: Resolve table cell boundaries before links.
This brings some corner cases closer to cmark-gfm.

Also fixes #51.
2019-02-06 04:31:25 +01:00
Martin Mitas
8fc692badc md_rollback: Do not touch TABLECELLBOUNDARIES chain.
This chain is not normal opener/closer inline mark chain.

Fixes #42.
2018-06-11 18:18:56 +02:00
Martin Mitas
e6e2ea4c5a md_analyze_line: Fix mixing list and table parsing.
If table header underline is not nested the same way as the preceding
line (i.e. the wannabe table header line), then it cannot form a table.

Fixes #41.
2018-06-11 11:43:47 +02:00
Martin Mitas
4ef024fbb7 md_process_inlines: Fix link/image closers spanning over multiple lines.
Fixes #40.
2018-05-29 23:30:02 +02:00
Martin Mitas
7deaccf65d md_is_link_label: Fix if the link label contains just backslash escapes.
The function did not remember the label start line index, leading to bad
consequences.

Fixes #39.
2018-05-29 18:38:51 +02:00
Martin Mitas
bf022cb656 Fix md_split_simple_pairing_mark().
When splitting a mark into two, make sure each of them gets the right
share of dummies in case we have to split once more.

Fixes #36.
2018-05-28 21:16:29 +02:00
Martin Mitas
e7b84d65a4 pathological_tests.py: Fix test compatibility with Windows. 2018-05-28 21:09:32 +02:00
Martin Mitas
81e2a5cac2 pathological_tests.py: Test deeply nested lists. 2018-04-12 17:04:12 +02:00
Martin Mitas
0d1a41a4d2 md_build_attr_append_substr: Fix +1 allocation error.
Fixes #33.
2018-03-28 08:21:21 +02:00
Martin Mitas
19b24bdd11 Simplify the pathological test "many references". 2017-08-16 18:16:49 +02:00
Martin Mitas
07cec7dcd6 Add regression test for #24. 2017-08-16 16:34:50 +02:00
Martin Mitas
ee3bee1a5d Upgrade to CommonMark specification 0.28. 2017-08-02 00:38:54 +02:00
Martin Mitas
938460d564 Improve/unify output of test scripts. 2017-07-25 03:25:42 +02:00
Martin Mitas
c52a50a3db pathological_tests.py: Add test for reference definition lookup. 2017-07-25 03:25:42 +02:00
Martin Mitas
c51fb31058 md_analyze_marks: Walk only required range of the marks.
This change ensures that, when recursing into the analysis of link contents,
only the marks between the link opener and closer are iterated in
md_analyze_marks().

Fixes #22
2017-07-24 23:33:25 +02:00
Martin Mitas
a27aefded9 pathological_tests.py: Allow short option -p as a synonym of --program. 2017-07-24 20:17:50 +02:00
Martin Mitas
f4f7b2230c pathological_tests.py: Allow Windowish line ends. 2017-07-24 20:15:09 +02:00
Martin Mitas
26f14899ed Add pathological_tests.py from cmark. 2017-07-24 20:12:13 +02:00
Martin Mitas
ad4f28bb85 md_analyze_simple_pairing_mark: Fix the "rule of three".
If the first emphasis opener is refused due to the rule of three, a previous
opener is examined. However the variable opener_orig_size_module3 was not
(re)set accordingly.

Fixes #21.
2017-07-24 20:09:23 +02:00
Martin Mitas
cfbce75910 Rework ref. def. dictionary.
It now uses FNV1a and we now sort/bsearch only the contents of a single bucket.
Additionally we fix #20 by disabling the invalid ref. definitions during
hashtable build.
2017-07-18 18:49:52 +02:00
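The general idea of hashing labels with FNV-1a and then sorting and binary-searching within one bucket can be sketched like this. The 32-bit FNV-1a constants are the standard published ones; the bucket count, the helper names and the use of plain strings as entries are assumptions made for illustration and do not reflect MD4C's actual data structures.

```python
import bisect

FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def build_buckets(labels, bucket_count=8):
    """Distribute labels into hash buckets and sort each bucket on its own."""
    buckets = [[] for _ in range(bucket_count)]
    for label in labels:
        buckets[fnv1a_32(label.encode("utf-8")) % bucket_count].append(label)
    for bucket in buckets:
        bucket.sort()
    return buckets


def contains(buckets, label) -> bool:
    """Binary-search only the single bucket the label hashes into."""
    bucket = buckets[fnv1a_32(label.encode("utf-8")) % len(buckets)]
    i = bisect.bisect_left(bucket, label)
    return i < len(bucket) and bucket[i] == label


table = build_buckets(["foo", "bar", "baz", "quux"])
print(contains(table, "baz"), contains(table, "missing"))
```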
Martin Mitas
f2821cbd8e md_analyze_permissive_email_autolink: Make it compatible with CMark-gfm. 2017-07-14 17:10:45 +02:00
Martin Mitas
1bc7f3a84e render_url_escaped: Fix escaping of ampersand.
This affected generating the href attribute of links or the src attribute of
images.
2017-07-14 02:24:21 +02:00
Martin Mitas
f3f9404e53 Improve URL autolinks extension.
It is now much more compatible with Cmark-gfm.

With the flag MD_FLAG_PERMISSIVEWWWAUTOLINKS, we now also support the
WWW autolinks (when the http: scheme is omitted).
2017-07-14 02:06:23 +02:00
Martin Mitas
25a156ee1b Implement strikethrough extension. 2017-07-12 23:30:14 +02:00
Martin Mitas
8999e1844a Fix "rule of three" for emphasis resolution (issue #14). 2017-01-04 15:20:46 +01:00
Martin Mitas
c63909df8e When splitting emphasis opener mark, we have to retain 'dummy' marks available for more splitting in the future (issue #15). 2017-01-04 15:06:14 +01:00
Martin Mitas
5271238426 When parsing tables, pipes inside a link/image/code span cannot make cell boundary (issue #7). 2016-12-27 22:52:06 +01:00
Martin Mitas
f9b4cb8f6e md_process_inlines: Fix when an expanded mark shadows some nested marks (issue #11). 2016-12-15 16:47:41 +01:00
Martin Mitas
c235a02ee8 test/coverage.txt: Add some tests for higher code coverage. 2016-12-15 13:18:48 +01:00
Martin Mitas
a725fee3f6 md_enter_child_containers: Fix crash (issue #10).
Calling md_push_container_bytes() may result in ending a current block
which may result in removing some contents from ctx->block_bytes when
removing some lines with link reference definitions.

This in effect means we have to end the block explicitly before storing
the offset into the ctx->block_bytes.
2016-12-14 16:51:24 +01:00
Martin Mitas
ba29d0075e md_is_link_reference_definition: Fix handling of multiline label (issue #9). 2016-12-12 23:31:59 +01:00
Martin Mitas
09ae86095f Handle images more like links.
Remove MD_SPAN_IMG_DETAIL::alt. Instead, the contents of the image are
propagated to the renderer via MD_RENDERER::text() callback.

 * This fixes handling of entities inside the image text (issue #4).
 * It simplifies parsing and, more importantly, it better distinguishes
   what is the responsibility of the parser and of the renderer respectively.
 * This allows more flexibility on the renderer's side. Renderers which do
   not really support images can just output the image content as any
   other text.

The cost is that a renderer into HTML (if it wants to render image contents
into the attribute ALT of the IMG tag) has to handle images with more
care. Typically such a renderer has to track whether it is inside an image,
and if so, then render span enter/leave as an empty string.
2016-12-07 23:56:47 +01:00
Martin Mitas
23312d6d65 md_is_html_tag: Fix parsing unquoted attribute value (issue #2). 2016-12-05 11:13:43 +01:00
Martin Mitas
b40d595044 Fix file permissions of python scripts. 2016-12-04 17:01:00 +01:00
Martin Mitas
be7fcc16ff Implement tables.
Note it is implemented as an extension. To enable it, the flag MD_FLAG_TABLES
must be explicitly specified.
2016-11-21 13:39:45 +01:00
Martin Mitas
809e611b3c Migrate to CommonMark specification 0.27. 2016-11-20 00:57:32 +01:00
Martin Mitas
ef5f230ffa Implement permissive autolinks extensions.
With MD_FLAG_PERMISSIVEURLAUTOLINKS, we treat not overly complicated URLs
as autolinks even without '<' and '>'.

With MD_FLAG_PERMISSIVEEMAILAUTOLINKS, we treat not overly complicated
e-mail addresses as autolinks even without '<', '>' and without the
'mailto:' scheme.

Also expanded md2html utility and tests to cover these.
2016-10-14 19:56:05 +02:00
Martin Mitas
1cfc6a5f42 Incorporate the specification testsuite from CommonMark. 2016-10-11 01:10:11 +02:00