Autolinks now allow unmatched parentheses. Only trailing closing
parentheses are handled specially, to deal with the situation where the
whole autolink is inside outer parentheses.
Somehow our tests were broken and avoided the cases with unmatched
parentheses inside the auto-link. That is now fixed and in sync with
the GFM specification too.
Fixes #135.
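The trailing-parenthesis handling described above can be sketched as
follows. This is only an illustration of the rule for extended autolinks
as worded in the GFM specification, not the code used in MD4C; the
function name trim_unmatched_trailing_parens() is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not MD4C's actual code): given a candidate
 * autolink text[0..len), return the length which remains after dropping
 * trailing ')' characters which have no matching '(' in the candidate. */
static size_t
trim_unmatched_trailing_parens(const char* text, size_t len)
{
    size_t openers = 0;
    size_t closers = 0;
    size_t i;

    for(i = 0; i < len; i++) {
        if(text[i] == '(')
            openers++;
        else if(text[i] == ')')
            closers++;
    }

    /* Only the trailing closers are special. Parentheses in the middle
     * of the autolink are left alone even when they are unbalanced. */
    while(len > 0  &&  text[len-1] == ')'  &&  closers > openers) {
        len--;
        closers--;
    }

    return len;
}
```

With this sketch, "www.example.com)" loses its final ')' while
"www.google.com/search?q=(business))+ok" is kept whole because it does
not end with a parenthesis.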
* Get rid of MD_LINE::total_indent.
* Remove some special complicated branching for nested images: Instead
we md_rollback() the wiki-link destination span to kill _any_
marks resolved so far, including the images.
* Remove any length limit from label. Only destination length is
limited, regardless of whether '|' is present or not.
* Move the special handling of `[[foo|]]` from md_process_inlines()
into md_resolve_links(). We simply expand the closer mark to consume
the `|`.
* Do not modify the opener and closer marks until we really know it
is indeed a wiki-link.
Erroneously, we called md_resolve_range() with a mark chain derived
from the closer mark. When the opener and closer marks differ in length
(and we have split one or the other), we passed in an incorrect chain,
which could lead to strange behavior in subsequent analysis.
Fixes #98.
With a new flag MD_FLAG_WIKILINKS, recognize wiki-style links
such as [[foo]] and [[foo|bar]].
Also update the HTML renderer accordingly, to output a custom
HTML tag <x-wikilink> for them.
We do so by removing the function md_is_table_row().
md_is_table_row() did some crazy inline parsing to detect whether the
line contains at least one pipe which is not inside a code span or other
high-priority inline element.
This was very complicated under the hood and it was actually breaking
the clean design which separates the block analysis from the inline
analysis of each block's contents.
We now just use the table underline to determine that the block is a
table, and to determine its properties, e.g. the column count.
This means a paragraph now cannot interrupt a table. This is a change in
behavior, but likely an acceptable one as it actually brings the behavior
closer to that of tables in cmark-gfm in this regard.
Last but not least, the old approach seemed to prevent adoption of other
useful features; for more about that, see the discussion in PR #92.
* md_link_label_cmp: For the labels to match, the loop has to reach the
end of both labels.
* md_link_label_cmp_load_fold_info: Collapse consecutive whitespace
into a single ' ' for label comparison purposes.
Fixes #96.
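A minimal sketch of the two points above, assuming ASCII-only labels
(the real comparison uses Unicode case folding, which is omitted here).
The names next_folded() and labels_match() are hypothetical and do not
exist in MD4C.

```c
#include <ctype.h>
#include <stddef.h>

/* Return the next character of the label, lower-cased, with any run of
 * whitespace collapsed into a single ' '. Returns -1 at the label end. */
static int
next_folded(const char* s, size_t len, size_t* p_off)
{
    size_t off = *p_off;
    int ch;

    if(off >= len)
        return -1;

    if(isspace((unsigned char) s[off])) {
        while(off < len  &&  isspace((unsigned char) s[off]))
            off++;
        ch = ' ';
    } else {
        ch = tolower((unsigned char) s[off]);
        off++;
    }

    *p_off = off;
    return ch;
}

/* Return non-zero if the two labels match. */
static int
labels_match(const char* a, size_t a_len, const char* b, size_t b_len)
{
    size_t a_off = 0;
    size_t b_off = 0;

    for(;;) {
        int a_ch = next_folded(a, a_len, &a_off);
        int b_ch = next_folded(b, b_len, &b_off);

        /* This also covers the case when only one label has ended. */
        if(a_ch != b_ch)
            return 0;
        /* The labels match only if both ends are reached together. */
        if(a_ch < 0)
            return 1;
    }
}
```

The check for a_ch < 0 comes after the inequality test, so a label
which is merely a prefix of the other one never matches.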
* scripts/build_*_map.py: Implement helper Python scripts used to
generate some Unicode search maps and data for helper Unicode
functions used in MD4C.
This should simplify updating to future Unicode versions.
* md_get_unicode_fold_info: Use data generated by the scripts.
* md_is_unicode_whitespace__: Ditto.
* md_is_unicode_punct__: Ditto.
* Fix domain recognition so that the domain has to have at least two
dot-delimited components.
* Fix handling of parentheses so that they have to form balanced
pairs; i.e. the first ')' without a preceding opener ends the
path.
Fixes #76.
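Both rules can be illustrated with two small scanners. This is a sketch
under simplifying assumptions: the set of characters accepted in a
domain component is reduced to ASCII alphanumerics, '-' and '_', and the
helper names are hypothetical, not taken from MD4C.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the dot-delimited components at the start of s. Scanning stops
 * at the first empty component or at the first non-domain character. */
static size_t
count_domain_components(const char* s, size_t len)
{
    size_t n = 0;
    size_t i = 0;

    while(i < len) {
        size_t start = i;

        while(i < len  &&  (isalnum((unsigned char) s[i])  ||
                            s[i] == '-'  ||  s[i] == '_'))
            i++;
        if(i == start)
            break;              /* Empty component. */
        n++;

        if(i < len  &&  s[i] == '.')
            i++;
        else
            break;
    }

    return n;
}

/* Return the offset where the path ends: at whitespace, at the end of
 * input, or at the first ')' which has no preceding unmatched '('. */
static size_t
scan_path_end(const char* s, size_t len)
{
    size_t depth = 0;
    size_t i;

    for(i = 0; i < len; i++) {
        if(s[i] == ' '  ||  s[i] == '\t'  ||  s[i] == '\n')
            break;
        if(s[i] == '(') {
            depth++;
        } else if(s[i] == ')') {
            if(depth == 0)
                break;
            depth--;
        }
    }

    return i;
}
```

A candidate would be accepted as a domain only when
count_domain_components() returns at least 2, so "localhost" alone does
not qualify while "example.com" does.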
* Isolate some common code for scanning HTML closer into a new function
so most HTML scanner functions reuse the same code.
* Improve the scanning for the closer so that on failure we remember
the range where no closer is present, so any later scanning attempt
can fail early.
Fixes #73.
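The fail-early idea can be sketched like this. The CLOSER_MEMO type and
find_closer() are hypothetical names for illustration, one memo is
assumed per kind of closer (e.g. one for "-->"), and the n_scans counter
exists only to make the effect observable.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t no_closer_from;  /* No closer starts at or after this offset. */
    unsigned n_scans;       /* Number of real scans (for demonstration). */
} CLOSER_MEMO;

/* (size_t) -1 means "nothing is known yet". */
#define CLOSER_MEMO_INIT    { (size_t) -1, 0 }

/* Look for closer in text[from..len). On success, store the offset just
 * behind the closer into *p_end and return 1. Otherwise return 0. */
static int
find_closer(const char* text, size_t len, size_t from,
            const char* closer, CLOSER_MEMO* memo, size_t* p_end)
{
    size_t closer_len = strlen(closer);
    size_t off;

    /* An earlier failed scan already covered this range. */
    if(from >= memo->no_closer_from)
        return 0;

    memo->n_scans++;
    for(off = from; off + closer_len <= len; off++) {
        if(memcmp(text + off, closer, closer_len) == 0) {
            *p_end = off + closer_len;
            return 1;
        }
    }

    /* Remember the failure so later attempts can fail in O(1). */
    memo->no_closer_from = from;
    return 0;
}
```

For an input with many unterminated "<!--" openers, only the first
attempt walks to the end of the text; all the later ones return
immediately.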
The updated specification now explicitly requests the behavior we
implemented before fixing #24.
This reverts commit 2e0a74ba99.
Also remove associated regression test as it is no longer valid.
We had to break the list of potential '*' openers into multiple ones so
we do not have to walk it when looking for matching length due to the
"rule of three" for intraword delimiter runs.
Fixes #63.
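For reference, the "rule of three" itself can be written as a small
predicate. The sketch follows the wording of the current CommonMark
specification (which may differ in detail from the version current at
the time of this change); rule_of_three_allows() is a hypothetical name.

```c
#include <stddef.h>

/* Return non-zero if an opener run and a closer run of the given
 * lengths may be matched together. The rule only applies when one of
 * the two runs can both open and close emphasis (the intraword case). */
static int
rule_of_three_allows(size_t opener_len, size_t closer_len,
                     int either_can_open_and_close)
{
    if(!either_can_open_and_close)
        return 1;

    /* The sum of the lengths must not be a multiple of three... */
    if((opener_len + closer_len) % 3 != 0)
        return 1;

    /* ...unless both lengths are multiples of three. */
    return (opener_len % 3 == 0  &&  closer_len % 3 == 0);
}
```

Since the predicate depends only on the lengths modulo 3, keeping the
openers in separate lists keyed by that remainder allows whole lists of
incompatible openers to be skipped instead of walked.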
md_is_code_span(), called from md_collect_marks().
We have to do this at the same time as detecting raw inline HTML to
follow CommonMark priority requirements.
Also it is done very differently now:
When scanning for the closer mark, we remember (the latest) position of
potential closers for all other lengths as well.
This means that:
(1) If we find it, we have reduced the task because all subsequent scans
shall begin after the closer.
(2) If we do not find it, then we have reached the end of the block and
hence we then know (for every allowed marker length) the position of the
last such backtick sequence.
(3) That guarantees that any subsequent call will either succeed
in its scan (and reduce the task even further), or that we shall be able
to detect instantly that there is no suitable closer.
I.e. every call either reduces the task by an O(n) scan (1); or collects
all the data in O(n), which happens at most once (2); or fails in O(1)
(3).
This guarantees O(n) complexity of the function.
Fixes #59.
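The three cases above can be sketched as follows. This is a simplified
illustration, not the actual md_is_code_span(): the names BACKTICK_MEMO
and find_code_span_closer() and the limit MAX_TICKS are assumptions, and
the callers are assumed to ask with increasing offsets.

```c
#include <stddef.h>

#define MAX_TICKS   32      /* Assumed limit of the marker length. */

typedef struct {
    /* For every run length, the end offset of the last such backtick
     * run seen so far; 0 means no run of that length has been seen. */
    size_t last_run_end[MAX_TICKS + 1];
    int reached_end;        /* Set once a scan hits the block end. */
} BACKTICK_MEMO;

/* Look for a closer (a run of exactly ticks backticks) behind the
 * opener which ends at opener_end. The memo must be zero-initialized. */
static int
find_code_span_closer(const char* text, size_t len, size_t opener_end,
                      size_t ticks, BACKTICK_MEMO* memo,
                      size_t* p_closer_beg)
{
    size_t off = opener_end;

    if(ticks == 0  ||  ticks > MAX_TICKS)
        return 0;

    /* Case (3): the whole block has been seen and the last run of this
     * length begins before opener_end, so there is no closer. */
    if(memo->reached_end  &&  memo->last_run_end[ticks] < opener_end + ticks)
        return 0;

    while(off < len) {
        size_t run_beg;
        size_t run_len;

        while(off < len  &&  text[off] != '`')
            off++;
        if(off >= len)
            break;

        run_beg = off;
        while(off < len  &&  text[off] == '`')
            off++;
        run_len = off - run_beg;

        /* Remember potential closers of all the other lengths too. */
        if(run_len <= MAX_TICKS  &&  off > memo->last_run_end[run_len])
            memo->last_run_end[run_len] = off;

        /* Case (1): found; later scans shall begin behind it. */
        if(run_len == ticks) {
            *p_closer_beg = run_beg;
            return 1;
        }
    }

    /* Case (2): we know the last position for every length now. */
    memo->reached_end = 1;
    return 0;
}
```

For the input "``a `b` c", the failed search for a two-backtick closer
walks to the end once; the later search for a single backtick succeeds,
and a further search behind it fails without scanning.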
Fixes #58:
For resolving raw inline HTML, the function tried the closer with all
potential openers, because raw HTML can have '<' inside of an attribute.
However this caused O(n^2) behavior for input like "<><><><><><><>...".
We solved it by handling raw HTML in an earlier stage, directly in
md_collect_marks(), where we can scan linearly forward.
Fixes #61:
As a side effect, this also fixes the issue that MD_FLAG_NOHTMLSPANS
disabled also recognition of CommonMark autolinks.
The issue is caused by the fact that we do not know the exact position
of a permissive auto-link at the time of md_collect_marks() because there
is no syntax to mark its end in the first place.
As a result, the closer mark in ctx->marks[] can end up somewhat
out of order.
As a consequence, if some other mark range (e.g. ordinary link)
shadows the auto-link, the closer mark may be left outside the shadowed
range and survive till the phase when we generate the output.
We fix it by using an extra mark flag to remember that we really did
output the opener mark, and output the closer only in such a case.
Fixes #53.
If table header underline is not nested the same way as the preceding
line (i.e. the wannabe table header line), then it cannot form a table.
Fixes #41.
This change means that when recursing into the analysis of link contents,
only the marks between the link opener and closer are iterated in
md_analyze_marks().
Fixes #22.
If the first emphasis opener is refused due to the rule of three, a
previous opener is examined. However the variable
opener_orig_size_module3 was not (re)set accordingly.
Fixes #21.
It now uses FNV1a and we now sort/bsearch only the contents of a single
bucket.
Additionally we fix #20 by disabling the invalid ref. definitions during
the hashtable build.
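For readers unfamiliar with FNV1a, a minimal sketch of its 32-bit
variant follows. The message above does not say which width is used, so
32 bits is an assumption, and bucket_index() is a hypothetical helper
showing how a label could be mapped to a bucket.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: XOR each byte into the hash, then multiply the hash
 * by the FNV prime. */
static uint32_t
fnv1a_32(const void* data, size_t n)
{
    const unsigned char* p = (const unsigned char*) data;
    uint32_t hash = 2166136261u;    /* FNV offset basis. */
    size_t i;

    for(i = 0; i < n; i++) {
        hash ^= p[i];
        hash *= 16777619u;          /* FNV prime. */
    }

    return hash;
}

/* Hypothetical helper: choose the bucket for a (normalized) label. */
static size_t
bucket_index(const char* label, size_t label_len, size_t n_buckets)
{
    return fnv1a_32(label, label_len) % n_buckets;
}
```

Only the entries which land in the same bucket then have to be sorted
and searched, instead of the whole set of definitions.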
It is now much more compatible with cmark-gfm.
With the flag MD_FLAG_PERMISSIVEWWWAUTOLINKS, we now also support the
WWW autolinks (when the http: scheme is omitted).
Calling md_push_container_bytes() may result in ending a current block
which may result in removing some contents from ctx->block_bytes when
removing some lines with link reference definitions.
This in effect means we have to end the block explicitly before storing
the offset into the ctx->block_bytes.
Remove MD_SPAN_IMG_DETAIL::alt. Instead, the contents of the image are
propagated to the renderer via the MD_RENDERER::text() callback.
* This fixes handling of entities inside the image text (issue #4).
* It simplifies parsing and, more importantly, it better distinguishes
what is the responsibility of the parser and of the renderer respectively.
* This allows more flexibility on the renderer's side. Renderers which do
not really support images can just output the image content as any
other text.
The cost is that a renderer into HTML (if it wants to render the image
contents into the ALT attribute of the IMG tag) has to handle images with
more care. Typically such a renderer has to track whether it is inside an
image, and if so, then render span enter/leave as an empty string.
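The tracking described above might look like the sketch below. It does
not use the real MD4C API: the RENDERER and SPAN_TYPE types and the three
callbacks are simplified, hypothetical stand-ins, and escaping of the
attribute value is omitted for brevity.

```c
#include <stddef.h>
#include <string.h>

typedef enum { SPAN_EM, SPAN_IMG } SPAN_TYPE;   /* Hypothetical. */

typedef struct {
    char out[256];
    size_t out_len;
    int image_nesting_level;    /* > 0 while inside an image. */
} RENDERER;

/* Append s to the output buffer; truncate silently if it is full. */
static void
emit(RENDERER* r, const char* s)
{
    size_t n = strlen(s);

    if(n > sizeof(r->out) - 1 - r->out_len)
        n = sizeof(r->out) - 1 - r->out_len;
    memcpy(r->out + r->out_len, s, n);
    r->out_len += n;
    r->out[r->out_len] = '\0';
}

static void
enter_span(RENDERER* r, SPAN_TYPE type, const char* src)
{
    if(r->image_nesting_level > 0) {
        /* Inside an image: render any span enter as an empty string. */
        if(type == SPAN_IMG)
            r->image_nesting_level++;
        return;
    }

    if(type == SPAN_IMG) {
        emit(r, "<img src=\"");
        emit(r, src);
        emit(r, "\" alt=\"");
        r->image_nesting_level = 1;
    } else {
        emit(r, "<em>");
    }
}

static void
leave_span(RENDERER* r, SPAN_TYPE type)
{
    if(r->image_nesting_level > 0) {
        /* Only leaving the outermost image closes the IMG tag. */
        if(type == SPAN_IMG  &&  --r->image_nesting_level == 0)
            emit(r, "\">");
        return;
    }

    emit(r, "</em>");
}

/* The image contents arrive as ordinary text and end up in ALT. */
static void
render_text(RENDERER* r, const char* text)
{
    emit(r, text);
}
```

Feeding it an image with the source "a.png" whose contents are "foo "
followed by an emphasized "bar" produces a single IMG tag with the ALT
text "foo bar" and no EM tags inside the attribute.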
With MD_FLAG_PERMISSIVEURLAUTOLINKS, we treat not overly complicated URLs
as autolinks even without '<' and '>'.
With MD_FLAG_PERMISSIVEEMAILAUTOLINKS, we treat not overly complicated
e-mail addresses as autolinks even without '<', '>' and without the
'mailto:' scheme.
Also expand the md2html utility and tests to cover these.