-
-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve handling of clobbering #31
Comments
Sounds like you wrote a long post explaining how your solution is actually rather complex and brittle, and how the current solution isn’t that bad? As you mentioned, “without extra context we can't know […]”. As for the difference between The unique clobber prefix per post, so as not to conflict, that is interesting too. I guess GH doesn’t normally run into it because headings are not slugged in comments. But they did have this problem with footnotes. Perhaps you can share how you run into this. What kind of markup are your users writing? |
It's for a personal project that's not yet deployed anywhere to be seen, and I'm currently the sole user. I generate HTML from markdown. I want to fix the GFM footnotes, and it took me long enough to figure out what was breaking them. Checking the code, I thought I might put in the extra effort to improve the sanitize plugin more than just for my use-case, and the extra effort for the extra improvements still seems reasonable for me to do.
It doesn't cover all possible cases because it's not possible ("without extra context"), but I don't think it's complex and brittle? But maybe I did not manage to properly convey what I meant, I'll try to be more concise and lay down my proposition differently. The most basic change that would fix GFM footnotes and internal links, while following the assumptions that were made for the current code (I should maybe have given that as one of the alternatives too?) would be something like this: diff --git lib/index.js lib/index.js
index 7fa8b25..1535a94 100644
--- lib/index.js
+++ lib/index.js
@@ -647,11 +647,17 @@ function propertyValuePrimitive(state, definition, key, value) {
if (!ok) return
}
- return state.schema.clobber &&
- state.schema.clobberPrefix &&
- state.schema.clobber.includes(key)
- ? state.schema.clobberPrefix + value
- : value
+ if (!state.schema.clobberPrefix || !state.schema.clobber?.includes(key)) {
+ return value
+ }
+
+ if (key == 'href') {
+ return value.startsWith('#')
+ ? '#' + state.schema.clobberPrefix + value.substring(1)
+ : value
+ }
+
+ return state.schema.clobberPrefix + value
} We can still keep the pretty links by default by not adding And the following improvement areas:
Does that seem more reasonable to you? |
Hey again! Sorry, rather busy.
Then why are you clobbering and sanitizing? You can turn things off? 🤔
How do you use them and what is broken? Can you make and MVP of what doesn’t work? I thought they worked by default?
this is what worries me, as the defaults should be good without extra context.
I mean that the technique is much more involved that a singular rule, as it is now?
I mean that it doesn’t work on all
Same. But for me the proposed alternative doesn’t seem acceptable. It would get too complex for me. You’d need an allowlist of URL attributes (https://github.com/rehypejs/rehype-minify/tree/main/packages/html-url-attributes) and extensibility to that list, then you need to know the exact URL a document is shown on; however that could be several URLs. You’d need to understand many URLs (
This gets near the difference between linking in a document (say, footnotes scoped to a single comment, there you don’t want them to “escape”), and linking outside it (say there are comments under a readme, those could link to the headings in the readme). None of these tools have the knowldege of whether they are processing a single comment. Or a whole document. I feel like you’d need that to answer this. |
Sorry I wasn't very clear, the GFM footnotes plugin on its own works fine, but if you sanitize afterwards then "the footnotes appear broken". This has already been reported in #30 with an example, but closed as a duplicate of #29 (also closed). The problem I have is the same. I am not satisfied with the workaround mentioned in the README, and propose that the sanitize plugin not break footnotes in the first place unless the user wants it that way (e.g. to have good looking URLs). Given this premise, my proposal has a solution that I think is better than the one in #29.
My proposal follows the observation that the current defaults are already not that good (broken GFM footnotes and internal links in general), and could be significantly improved even without extra context (not broken GFM footnotes, as well as internal links with a varying degree of support depending on which part of the proposal is implemented).
It depends how complete you want the solution to be. I think the "most basic change" in my previous answer is simple enough to cover the obvious case, and will stop breaking the footnotes. Allowing user-configurable rewrites doesn't change the concept that much either: instead of just two hardcoded rules (single ID or list of IDs), we add another possibility with user-configured rules.
You already don't cover all cases for simple ID rewriting in the default schema, for example the It covers by default at minimum a common case (naked internal links, as generated by GFM footnotes), and in the next answer below I explain how it can cover all urls through configuration, in the same way the user can already add support for more attributes that the default doesn't handle through configuration (like the
That's where making it configurable through the schema comes in handy: we don't need to make this complex on our side, in particular we don't need lists, and we can let the user handle it themself. From the user's perspective, the configuration could look something like this: clobber: [
'ariaDescribedBy',
'ariaLabelledBy',
'id',
'name',
{
'attribute': 'href',
'rewrite': (value, state) => value.startsWith('#') ? '#' + state.schema.clobberPrefix + value.substring(1) : value
}
] If the user wants to take care of cases we don't cover, they can do it by specifying a different function, with the knowledge of their own context. I haven't looked too much into this yet, but I think it shouldn't be very complex to implement handling this in the lib. As in the example, we can keep the simple attribute names for backwards compatibility. We can even keep the current default (no handling of
Correct, and I don't propose to solve the whole thing ourselves, but rather to enable the user to do it with our support. This part of my proposal requires far more work, and will likely require some changes to the structure of the code. Let's leave it out for now, so we can focus on the other parts of the proposal. |
30 points to this specific comment in 29, did you see that? #29 (comment)?
Which workaround? I do not think what GH does is a workaround. I think it’s quite good.
Please think about what this project is for and what it does. |
It's not extensible enough for me, and the changes I propose to implement would make it extensible in a way that could help me solve my use-case without forcing everyone to do it my way.
With all due respect, I don't think you really get what clobbering is about and how the sanitizer mitigates that issue, especially since rewriting the aria attributes as configured in the default schema has nothing to do with clobbering and all to do with fixing the document after sanitization, which was exactly the area I wanted to improve.
Yeah I get it now, this project is for github. I'll close this issue and implement my own sanitizer just for clobbering, disabling the one from this plugin. I'm still willing to put in the effort and do it as part of this plugin, if you change your mind let me know. If you want to just continue the discussion to see if you'd be willing to accept some variation of my proposal (and even if in the end it doesn't amount to anything) I'm also willing to explore that with you, as long as you're willing to entertain the notion that github is not the only valid use-case. Cheers. |
Initial checklist
Problem
The way DOM clobbering prevention is implemented, this plugin leaves sanitized HTML broken: it correctly rewrites the attributes that can cause clobbering (
id
andname
, configured by default) and by default some attributes that reference those IDs (ariaDescribedBy
andariaLabelledBy
).Notably missing is the
href
attribute, which can be used for internal links using a fragment which value matches theid
orname
attribute of an element. This breaks GFM footnotes (cf. #30, #29) but also internal links such as those targetting headings slugified byrehype-slug
.The current solution has been deemed acceptable because 1) it keeps the internal links "pretty" for the users; and 2) github likes it like that.
I think however that it is not very good, because it requires awareness of the issue otherwise the plugin just appears broken, and by default breaks the document and requires a workaround to be applied later. And the suggested workaround (the github script) also has limitations, like it wouldn't support multiple generated documents being included with different
clobberPrefix
to avoid collisions.I would much prefer if the sanitizer wouldn't break my documents by default where there is no reason to. Pretty links are nice, but it should be a conscious decision to enable them as it requires extra effort to un-break the document afterwards.
Solution
I propose to improve the rewriting of attributes that reference another element via their ID as follows:
While visiting nodes:
id
andname
attributes as currently;id
andname
attributes to a list of targets;ariaDescribedBy
,ariaLabelledBy
but alsohref
), add it to a list of references.Now that the whole tree has been traversed, we know all the IDs in the sanitized document, and can start rewriting the references. Doing it after the traversal instead of during allows to be a bit smarter with the rewriting, and allows (maybe as an option) to keep references untouched when they don't target an ID within the document (if the document is later included into another one, this would allow referencing elements in the enclosing document, e.g.
#top
for an internal link, or some common aria labels or descriptions).For the
href
attribute, I propose to only rewrite them if they contain just the fragment. So for examplehref="foobar.htm#section-xyz"
would not be rewritten, because without extra context we can't know whetherfoobar.htm
is the current page, buthref="#section-xyz"
would be rewritten tohref="#user-content-section-xyz"
. This would at least fix the broken GFM footnotes, and generally internal links, which I think is a common case when converting from markdown.This would require to special-case
href
because the assumption on the format used currently for the two aria attributes doesn't hold. Instead of that, I also propose to slightly change the format of the configuration to allow passing a function to handle the rewriting, which would let users add support for more exotic attributes, such as customdata-*
attributes (not that anybody requested such a thing yet) or simply let the user handle the difficult things like the URL fragment not being on its own.Alternatives
hast-util-sanitize
lib, as mentioned by the reporter of Don't apply the clobber prefix when it's already present #29: Yes, it works, but we're basically all on our own. Since sanitization is security-relevant, I think it would be better to do it oncerightbetter. It would also be quite brittle, and could break with any update.hast-util-sanitize
was used and all IDs have been rewritten). Also an extra walk of the tree. And the plugin's name would be somewhat stupid:hast-util-sanitize-clobbering-fix
.hast-util-sanitize
redundant, while probably ending up lifting a big chunk of the current code, would require configuring another plugin to make it work (i.e. disabling theclobber
handling inhast-util-sanitize
), extra tree walk, annoying plugin name.I hope you are at least open to the idea that my proposal can be better than the alternatives. On my part, I am also open to suggestions if my proposition doesn't quite fit the bill. I can implement it, but I can't promise to implement it fast =)
The text was updated successfully, but these errors were encountered: