mediawiki reader: improve strong/emph conformance #10766
Draft · +193 −10
cf. #10761 and #3044.
I made some progress with this today without completely blowing up the existing strong and emph parsers, but weird edge cases remain. For example, consider `''foo''''bar''`. Pandoc today will give you `Emph [ Str "foo" , Str "bar" ]`, which has an obvious appeal. My work in progress gives `Emph [ Str "foo''" ] , Str "bar''"`, which is odder but defensible given other requirements for emphasized quote marks. The actual correct answer, according to MediaWiki, is `Emph [ Str "foo'" , Strong [ Str "bar" ] ]`, i.e. *foo'**bar***, which is basically a koan.

Parsoid has a lot of code just for processing quotes, presumably aiming to maintain bug-for-bug compatibility with whatever MediaWiki's first parser did. So the meaning of a run of single quotes depends on what comes after it on the line, in a more context-sensitive way than I expected.
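To make that context sensitivity concrete, here is a rough sketch of the quote handling in MediaWiki's legacy parser (`Parser::doQuotes` in PHP), transliterated to Haskell. This is an assumption-laden reading, not Parsoid's code and not pandoc's, and it may diverge from MediaWiki in edge cases; the helper names (`splitQuotes`, `normalize`, `rebalance`, `emit`, `doQuotes`) are hypothetical. It emits `<i>`/`<b>` tags rather than pandoc inlines. It works line by line in three passes: runs of 4, or of more than 5, apostrophes donate their extra apostrophes to the preceding text; if the counts of bold and italic runs are both odd, one `'''` is demoted to a literal `'` plus `''`; then a small state machine emits tags.

```haskell
import Data.Maybe (listToMaybe)

-- Sketch (assumed to follow MediaWiki's legacy doQuotes): a line becomes
-- (text before run, run length) pairs plus the text after the last run.
-- Only runs of two or more apostrophes count as markup.
splitQuotes :: String -> ([(String, Int)], String)
splitQuotes = go ""
  where
    go acc [] = ([], reverse acc)
    go acc s@('\'' : _) =
      let (qs, rest) = span (== '\'') s
          n = length qs
       in if n >= 2
            then let (pairs, final) = go "" rest in ((reverse acc, n) : pairs, final)
            else go ('\'' : acc) rest -- a lone apostrophe is plain text
    go acc (c : rest) = go (c : acc) rest

-- 4 apostrophes: one literal ' plus bold. More than 5: extras are literal.
normalize :: (String, Int) -> (String, Int)
normalize (t, 4) = (t ++ "'", 3)
normalize (t, n) | n > 5 = (t ++ replicate (n - 5) '\'', 5)
normalize p = p

-- What precedes a ''' run, used to guess which one was really ' + ''.
data Prev = Space | SingleLetter | MultiLetter deriving (Eq)

classify :: String -> Prev
classify t = case reverse t of
  (' ' : _) -> Space
  (_ : ' ' : _) -> SingleLetter
  _ -> MultiLetter

-- If bold and italic counts are both odd, demote one ''' to ' plus ''.
-- Preference: after a single-letter word, then a longer word, then a space.
rebalance :: [(String, Int)] -> [(String, Int)]
rebalance ps
  | odd bold && odd italics = maybe ps demote choice
  | otherwise = ps
  where
    italics = length [() | (_, n) <- ps, n == 2 || n == 5]
    bold = length [() | (_, n) <- ps, n == 3 || n == 5]
    bolds = [(i, t) | (i, (t, 3)) <- zip [0 :: Int ..] ps]
    pick k = [i | (i, t) <- bolds, classify t == k]
    choice = listToMaybe (pick SingleLetter ++ pick MultiLetter ++ pick Space)
    demote i = [if j == i then (t ++ "'", 2) else p | (j, p@(t, _)) <- zip [0 ..] ps]

-- Open tags so far; SBoth buffers text after an opening ''''' until we
-- learn which of <b>/<i> should be outermost.
data St = S0 | SI | SB | SBI | SIB | SBoth String

quote :: Int -> (String, St) -> (String, St)
quote 2 (out, st) = case st of
  SI -> (out ++ "</i>", S0)
  SBI -> (out ++ "</i>", SB)
  SIB -> (out ++ "</b></i><b>", SB)
  SBoth buf -> (out ++ "<b><i>" ++ buf ++ "</i>", SB)
  SB -> (out ++ "<i>", SBI)
  S0 -> (out ++ "<i>", SI)
quote 3 (out, st) = case st of
  SB -> (out ++ "</b>", S0)
  SBI -> (out ++ "</i></b><i>", SI)
  SIB -> (out ++ "</b>", SI)
  SBoth buf -> (out ++ "<i><b>" ++ buf ++ "</b>", SI)
  SI -> (out ++ "<b>", SIB)
  S0 -> (out ++ "<b>", SB)
quote _ (out, st) = case st of -- after normalize, only 5 remains
  SB -> (out ++ "</b><i>", SI)
  SI -> (out ++ "</i><b>", SB)
  SBI -> (out ++ "</i></b>", S0)
  SIB -> (out ++ "</b></i>", S0)
  SBoth buf -> (out ++ "<i><b>" ++ buf ++ "</b></i>", S0)
  S0 -> (out, SBoth "")

emit :: ([(String, Int)], String) -> String
emit (pairs, final) = close (foldl step ("", S0) pairs)
  where
    addText t (out, SBoth buf) = (out, SBoth (buf ++ t))
    addText t (out, st) = (out ++ t, st)
    step acc (t, n) = quote n (addText t acc)
    -- Close whatever is still open at end of line.
    close acc = case addText final acc of
      (out, SB) -> out ++ "</b>"
      (out, SIB) -> out ++ "</b></i>"
      (out, SI) -> out ++ "</i>"
      (out, SBI) -> out ++ "</i></b>"
      (out, SBoth buf) | not (null buf) -> out ++ "<b><i>" ++ buf ++ "</i></b>"
      (out, _) -> out

doQuotes :: String -> String
doQuotes line = case splitQuotes line of
  ([], _) -> line
  (pairs, final) -> emit (rebalance (map normalize pairs), final)
```

With this sketch, `doQuotes "''foo''''bar''"` evaluates to `<i>foo'<b>bar</b></i><b></b>`: the four-apostrophe run turns into a literal `'` plus bold, which is where `Emph [ Str "foo'" , Strong [ Str "bar" ] ]` comes from, and the trailing empty `<b></b>` contributes no text. Likewise `doQuotes "l'''avion''"` gives `l'<i>avion</i>`, because the odd/odd rebalancing demotes the `'''`.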
Would it be better to merge code that makes us more conformant with MediaWiki in some cases and "wrong in a different way" in others, or to keep working until we reach perfection?