Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor or remove MDX on demand #2606

Open
4 tasks done
remcohaszing opened this issue Apr 3, 2025 · 4 comments
Open
4 tasks done

Refactor or remove MDX on demand #2606

remcohaszing opened this issue Apr 3, 2025 · 4 comments
Labels
🤞 phase/open Post is being triaged manually

Comments

@remcohaszing
Copy link
Member

Initial checklist

Problem

I have been thinking about MDX on demand lately, and I see some quirks/problems

  1. The mdx-js/mdx package contains code for compiling code. But it also contains runtime code for evaluating the generated code. This seems like an odd place. For example, this adds runtime types for the compiler.
  2. It adds an extra option to the MDX compiler.
  3. The transform of a module to a function body runs after third party recma plugins. This adds complexity for recma plugins, which now have to deal with the possibility of a top-level return statement.
  4. The transform has bugs. For example:
    export * as foo from 'foo'
    export * from 'bar'
    transforms into:
    const _exportAll1 = await import(_resolveDynamicMdxSpecifier('foo'))
    const _exportAll2 = await import(_resolveDynamicMdxSpecifier('bar'))
    // …
    return {
      ..._exportAll1,
      ..._exportAll2
    }
    This should be:
    const _exportAll1 = await import(_resolveDynamicMdxSpecifier('foo'))
    const _exportAll2 = await import(_resolveDynamicMdxSpecifier('bar'))
    // …
    return {
      foo: _exportAll1,
      ..._exportAll2
    }
  5. Imports are resolved, but the JSX runtime is injected.
  6. Eval is evil. We do warn about the dangers of evaluating code, plenty I think. But still, I think we can discourage it further by moving this out of core.

Current solutions

The user can use run / evaluate. Alternatively they can use recma-module-to-function.

Proposed solutions

I have some solutions that we can implement partially or gradually.

  1. Base the current implementation on recma-module-to-function. I spotted some quirks for edge cases in recma-module-to-function which I’m still resolving, but those exist in the current MDX on demand implementation as well. Several of the now generated code for MDX on demand can be solved using a custom import implementation. This can be non-breaking. A naive untested implementation:
    function createImport(baseUrl, runtime) {
      _import.meta = {
        url: String(baseUrl),
        resolve
      }
    
      return _import
    
      function _import(specifier, options) {
        if (specifier === 'mdx:/jsx-runtime') {
          return runtime
        }
    
        return import(resolve(specifier), options)
      }
    
      function resolve(specifier) {
        if (typeof specifier !== "string") {
          return specifier
        }
    
        try {
          new URL(specifier)
          return specifier
        } catch {}
    
        if (
          specifier.startsWith("/") || 
          specifier.startsWith("./") ||
          specifier.startsWith("../")
        ) {
          return new URL(specifier, baseUrl).href
        }
    
        return specifier
      }
    }
  2. Move evaluate / run into a new package @mdx-js/on-demand. This means the core no longer needs the option outputFormat or any runtime code.
  3. Completely remove evaluate / run. We can still document an example like https://github.com/remcohaszing/recma-module-to-function#examples and keep the terminology MDX on demand to describe this pattern. Some (subjective) benefits:
    • Safer: This explicitly makes the user author the evaluation code using the JavaScript using the AsyncFunction constructor. I could even make it compatible with the Function constructor. This will trigger the ESLint rule no-new-func for many users, which I believe to be a good thing.
    • Greater flexibility: Import logic is very flexible and explicit. There’s no need to explain the concept of a base URL.
    • Simpler: There’s no special handling of JSX. It’s just an import now.
    • Teaching: I think this makes it more explicit that MDX gets compiled to regular JavaScript.
    • More compatibility: This approach works with anything that compiles to JavaScript. The same concept could be used to transform JavaScript, JSX, or TypeScript for example.

Option 3 is definitely a breaking change and would be semver major. Option 2 could be done earlier via a deprecated re-export, but IMO that’s not worth the effort until we do a semver major.

@github-actions github-actions bot added 👋 phase/new Post is being triaged automatically 🤞 phase/open Post is being triaged manually and removed 👋 phase/new Post is being triaged automatically labels Apr 3, 2025
@wooorm
Copy link
Member

wooorm commented Apr 4, 2025

This seems like an odd place

Why? With export specifiers, people can put different things in packages?
With conditions, we can make things for different environments.

this adds runtime types for the compiler.

What? Which types?

It adds an extra option to the MDX compiler.

MDX has several options, it also has many uses, I don’t think it is a goal to have few options.
Some things can exist in core.

There used to be an @mdx-js/runtime because the code was entirely different (it pulled in Babel).
Right now, packages are used because they pull in a different tool like React or Vue or esbuild or rollup.
I don’t think

The transform of a module to a function body runs after third party recma plugins. This adds complexity for recma plugins, which now have to deal with the possibility of a top-level return statement.

Which plugins deal with that? How?

The transform has bugs. For example:

Bugs can be fixed.

Imports are resolved, but the JSX runtime is injected.

Feature?

Eval is evil. We do warn about the dangers of evaluating code, plenty I think. But still, I think we can discourage it further by moving this out of core.

How do you mean discourage, if people still use it, just further down the line?
How are you proposing to actually drop eval?

but those exist in the current MDX on demand implementation as well

Which bugs?

Several of the now generated code for MDX on demand can be solved using a custom import implementation

What is broken?

Move evaluate / run into a new package @mdx-js/on-demand. This means the core no longer needs the option outputFormat or any runtime code.

Is that possible?

Greater flexibility: Import logic is very flexible and explicit. There’s no need to explain the concept of a base URL.

I believe that your proposal still needs a base URL, just in a plugin instead of in core?

Simpler: There’s no special handling of JSX. It’s just an import now.

Can you please specify which special handling you mean? I don’t understand how JSX is handled differently in function-body.

Teaching: I think this makes it more explicit that MDX gets compiled to regular JavaScript.

I... How? How does what you propose improve teaching?

More compatibility: This approach works with anything that compiles to JavaScript. The same concept could be used to transform JavaScript, JSX, or TypeScript for example.

I do not follow you. Which compat, with what? What is impossible now, what would be possible by removing code from core?


Did you know that it is possible to load modules with blobs. We’d still need to deal with the imports, but:

const example = `import {jsx as _jsx} from "react/jsx-runtime";
function _createMdxContent(props) {
  const _components = {
    h1: "h1",
    ...props.components
  };
  return _jsx(_components.h1, {
    children: "Hello, world!"
  });
}
export default function MDXContent(props = {}) {
  const {wrapper: MDXLayout} = props.components || ({});
  return MDXLayout ? _jsx(MDXLayout, {
    ...props,
    children: _jsx(_createMdxContent, {
      ...props
    })
  }) : _createMdxContent(props);
}`

const blob = new Blob([example], { type: 'text/javascript' })
const url = URL.createObjectURL(blob)
import(url).then(console.log, console.error).then(clean)
function clean() { URL.revokeObjectURL(url) }
TypeError: Module name, 'react/jsx-runtime' does not resolve to a valid URL.

Using https://esm.sh/react/jsx-runtime:

Module
default: function(props = {})
Symbol(Symbol.toStringTag): "Module"

@remcohaszing
Copy link
Member Author

This seems like an odd place

Why? With export specifiers, people can put different things in packages? With conditions, we can make things for different environments.

These are supposed to run in different environments. Compiling typically happens at build time. In case of MDX on demand this happens at runtime on the server, which is also fine. Evaluating typically happens on the frontend.

IMO this also goes against the unified philosophy of keeping packages small and focused. MDX on demand can be external.

this adds runtime types for the compiler.

What? Which types?

Dependencies on @types/mdx and hast-util-to-jsx-runtime.

It adds an extra option to the MDX compiler.

MDX has several options, it also has many uses, I don’t think it is a goal to have few options. Some things can exist in core.

I agree. However, I don’t think we need this option in core either.

There used to be an @mdx-js/runtime because the code was entirely different (it pulled in Babel). Right now, packages are used because they pull in a different tool like React or Vue or esbuild or rollup. I don’t think

The transform of a module to a function body runs after third party recma plugins. This adds complexity for recma plugins, which now have to deal with the possibility of a top-level return statement.

Which plugins deal with that? How?

The following plugin injects a statement that usually works, but becomes unreachable with the function-body output format.

import { compile } from '@mdx-js/mdx'

function plugin() {
  return (ast) => {
    ast.body.push({
      type: 'ExpressionStatement',
      expression: { type: 'Identifier', name: 'unreachable' }
    })
  }
}

const file = await compile('', {
  outputFormat: 'function-body',
  recmaPlugins: [plugin]
})

console.log(String(file))

output:

"use strict";
const {Fragment: _Fragment, jsx: _jsx} = arguments[0];
function _createMdxContent(props) {
  return _jsx(_Fragment, {});
}
function MDXContent(props = {}) {
  const {wrapper: MDXLayout} = props.components || ({});
  return MDXLayout ? _jsx(MDXLayout, {
    ...props,
    children: _jsx(_createMdxContent, {
      ...props
    })
  }) : _createMdxContent(props);
}
return {
  default: MDXContent
};
unreachable;

The transform has bugs. For example:

Bugs can be fixed.

They can be. I’m proposing to externalize this, so they won’t have to be.

Imports are resolved, but the JSX runtime is injected.

Feature?

Feature? Yes. Good abstraction? IMO not really. Definitely opiniated.

Eval is evil. We do warn about the dangers of evaluating code, plenty I think. But still, I think we can discourage it further by moving this out of core.

How do you mean discourage, if people still use it, just further down the line? How are you proposing to actually drop eval?

I’m proposing to make the user run the actual eval using JavaScript builtins. Don’t let MDX handle this.

but those exist in the current MDX on demand implementation as well

Which bugs?

These re-exports are all problematic.

export const __proto__ = {}
export { a as 'b' }
export { a as 'b' } from 'c'
export * as a from 'b'
export { __proto__ } from 'a'

Several of the now generated code for MDX on demand can be solved using a custom import implementation

What is broken?

The broken re-exports I mentioned above, but also the code can be replaced with estree-util-module-to-function.

Move evaluate / run into a new package @mdx-js/on-demand. This means the core no longer needs the option outputFormat or any runtime code.

Is that possible?

Greater flexibility: Import logic is very flexible and explicit. There’s no need to explain the concept of a base URL.

I believe that your proposal still needs a base URL, just in a plugin instead of in core?

My proposed solution for a backwards compatibility needs a base URL. My proposal is to eventually remove evaluate and run completely.

Simpler: There’s no special handling of JSX. It’s just an import now.

Can you please specify which special handling you mean? I don’t understand how JSX is handled differently in function-body.

JSX (automatic runtime) gets compiled to 2 things:

  1. An import of the JSX runtime
  2. Converting JSX syntax to function calls
import { useState } from 'react'

<div />

becomes:

import { useState } from 'react'
import { jsx as _jsx } from 'react/jsx-runtime'

_jsx('div', {})

With function-body this is no longer true. Imports are preserved, but processed with a base URL. The exteption is the react/jsx-runtime import. This is transformed into a function argument. So it becomes something along the lines of:

import { useState } from 'react'
const { jsx as _jsx } arguments[0]

_jsx('div', {})

Teaching: I think this makes it more explicit that MDX gets compiled to regular JavaScript.

I... How? How does what you propose improve teaching?

More compatibility: This approach works with anything that compiles to JavaScript. The same concept could be used to transform JavaScript, JSX, or TypeScript for example.

I do not follow you. Which compat, with what? What is impossible now, what would be possible by removing code from core?

Removing things never makes anything possible that were previously impossible. I’m saying I believe there are better solutions. Instead of having a special compiler option and MDX runtime code, I would like to:

  • Let the user compile MDX to a JavaScript module.
  • Explain the user they can use the Function / AsyncFunction constructor to evaluate JavaScript.
  • Tell the user they can use recma-module-to-function to have MDX (or ESTree in general) emit code that can be evaluated using the AsyncFunction constructor.
  • Explain to them they can resolve imports.

From this recma-module-to-function test:

const importName = '_import'
const file = await compile('Hello {props.name}!', {
  recmaPlugins: [[recmaModuleToFunction, { importName }]]
})

const fn = new AsyncFunction(importName, String(file))
const { default: MDXContent } = (await fn(async (name) => {
  if (name === 'react/jsx-runtime') {
    return runtime
  }

  throw new Error(`Cannot find module ${name}`)
}))

const result = renderToStaticMarkup(<MDXContent name="React" />)

All this example does with MDX, is compile it to JavaScript. It uses a recma plugin to turn the output into a function body. JSX doesn’t get special treatment. It’s just an import. The user can resolve any imports however they want. The examples in the readme show various example strategies. One example is basically a base URL that is compatible with bundlers. Another example shows how to use this with a CDN as a base URL.


Did you know that it is possible to load modules with blobs.

This didn’t occur to me. It’s an interesting find. This is less flexible than recma-module-to-function, but sufficient for use cases that only require a base URL. I think this adds to my point that there are multiple possible strategies. I imagine another alternative MDX on demand strategy can be, based on your solution:

const file = await compile('Hello {props.name}!', {
  recmaPlugins: [[recmaEsmBaseUrl, { baseUrl: 'https://esm.sh' }]]
})

const blob = new Blob([String(file)], { type: 'text/javascript' })
const url = URL.createObjectURL(blob)
import(url)

So again, this only needs a plugin and documentation. It’s simple enough for users to implement without the need for an option in MDX.

@wooorm
Copy link
Member

wooorm commented Apr 10, 2025

These are supposed to run in different environments.

That’s possible, that’s why run exists. But you can also run it in one place, with evaluate. The MDX website is build like that.

Evaluating typically happens on the frontend.

Common things are easy but uncommon things are possible. Perhaps it is not common to use Node loaders but it’s supposed to be possible to run MDX everywhere.

IMO this also goes against the unified philosophy of keeping packages small and focused.

MDX is small and focused.
You are taking it to extremes: not every option needs to be a different package.
The lower you go, the smaller things are. But MDX is the highest.
Similarly, react-markdown, has some common options.

recma-jsx-build exists and is used here with an option: I think that follows that philosophy enough. I do not think we need to remove @mdx-js/mdx and let people write unified() pipelines manually.

The following plugin injects a statement that usually works, but becomes unreachable with the function-body output format.

It can be solved as an issue?

This issue seems to exist because recma is a thing now and you are writing recma plugins that are unrelated to MDX. This could also be marked as wontfix from MDX because it is not something plugin authors should do?

They can be. I’m proposing to externalize this, so they won’t have to be.

Bugs need to be fixed, here or there.

I’m proposing to make the user run the actual eval using JavaScript builtins. Don’t let MDX handle this.

If it was that easy, that could happen. But it was not, so evaluate and run were made. MDX is also there to make things easy. Though, if Blob/URL is viable we could switch to that.

These re-exports are all problematic.

Exports as literals are new, not supported by astring yet, GH-2536.
Not sure about __proto__ but bugs can be fixed, wherever they are.

The broken re-exports I mentioned above, but also the code can be replaced with estree-util-module-to-function.

I think I mentioned this a while ago already, I think when you were working on that utility. That we could use it here, if it supported the things that are needed. Not sure but I was under the impression your work was not compatible with the code here.
Whether to use your module is different to whether some features should be externalized.

JSX (automatic runtime) gets compiled to 2 things:

  1. An import of the JSX runtime
  2. Converting JSX syntax to function calls

There are more things involved with JSX, such as injecting comments. Also, a bunch of classic options. And elementAttributeNameCase, stylePropertyNameCase, tableCellAlignToStyle.
So, I dunno, I could also make a list of 1 item (JSX) and a list of 5+ items.

Removing things never makes anything possible that were previously impossible.

Before evaluate and run were added, people were doing things brokenly, full of bugs.
While things could technically be possible, people did it badly, in a way that export/import were not supported, Babel was pulled into a client.

Your point here can also be inverted: nothing is impossible, so removing things does nothing.

The user can resolve any imports however they want

It is interesting, though, I am not sure such power needs to be the default. The current setup works with import maps.

It seems like a different concept/plugin: a) turn a module into a function body, b) dependency injection alternative to import maps.

What does your plugin do with import.meta.url and import.meta.resolve? These were important to me.

This is less flexible than recma-module-to-function

Why is this extra flexibility in recma-module-to-function?
Why is that not recma-dependency-injection?

recmaPlugins: [[recmaEsmBaseUrl, { baseUrl: 'https://esm.sh' }]]

Importantly, baseUrl should almost always be import.meta.url. Because you are almost always running JavaScript in the place you are running it. With a CDN it would be ./example/index.js -> https://esm.sh/example/index.js and import.meta.resolve('./index.css') -> https://esm.sh/index.css.
Without an explicit given URL, the import.meta.url would be https://server.com/assets/bundle/build.min.js or so. With it, it would be https://example.com/path/to/page.html or file:///Users/tilde/Projects/app/server.js.

@remcohaszing
Copy link
Member Author

[…]

That’s possible, that’s why run exists. But you can also run it in one place, with evaluate. The MDX website is build like that.

[…]

Common things are easy but uncommon things are possible. Perhaps it is not common to use Node loaders but it’s supposed to be possible to run MDX everywhere.

[…]

MDX is small and focused. You are taking it to extremes: not every option needs to be a different package. The lower you go, the smaller things are. But MDX is the highest. Similarly, react-markdown, has some common options.

recma-jsx-build exists and is used here with an option: I think that follows that philosophy enough. I do not think we need to remove @mdx-js/mdx and let people write unified() pipelines manually.

I’m not arguing about every option. I’m arguing mixing runtime logic into the core package seems out of place. This happens to be some exports and an option.

The following plugin injects a statement that usually works, but becomes unreachable with the function-body output format.

It can be solved as an issue?

Yes, absolutely. This is merely an example of a not very obvious problem that arises because of the function-body option.

This issue seems to exist because recma is a thing now and you are writing recma plugins that are unrelated to MDX. This could also be marked as wontfix from MDX because it is not something plugin authors should do?

recma plugins can add exports or other transformations. These may or may not be related to MDX. Some are more stable than others. But I think adding exports is a relatively simple one that typically shouldn’t cause problems.

I’m proposing to make the user run the actual eval using JavaScript builtins. Don’t let MDX handle this.

If it was that easy, that could happen. But it was not, so evaluate and run were made.

I just don’t see it. I really don’t think the latter is simpler:

const file = await compile('Hello {props.name}!', {
  recmaPlugins: [recmaModuleToFunction]
})

const fn = new AsyncFunction(String(file))
const { default: MDXContent } = await fn()

vs

import * as runtime from 'https://esm.sh/react/jsx-runtime'

const file = await compile('Hello {props.name}!', {
  outputFormat: 'function-body'
})

const fn = await run(String(file), {...runtime, baseUrl: import.meta.url})

These re-exports are all problematic.

Exports as literals are new, not supported by astring yet, GH-2536. Not sure about __proto__ but bugs can be fixed, wherever they are.

I stumbled upon this recently, but I didn’t realize they were new.

The broken re-exports I mentioned above, but also the code can be replaced with estree-util-module-to-function.

I think I mentioned this a while ago already, I think when you were working on that utility. That we could use it here, if it supported the things that are needed. Not sure but I was under the impression your work was not compatible with the code here.

It wasn’t made for MDX specifically, but it works with any ESTree, including the one that MDX produces. It just doesn’t treat JSX runtime imports different from other imports.

Whether to use your module is different to whether some features should be externalized.

Yes. I proposed 3 solutions. One is to use recma-module-to-function. Another to externalize MDX on demand functionality or even replace it with documentation only.

[…]

There are more things involved with JSX, such as injecting comments. Also, a bunch of classic options. And elementAttributeNameCase, stylePropertyNameCase, tableCellAlignToStyle. So, I dunno, I could also make a list of 1 item (JSX) and a list of 5+ items.

This is taken out of context. I’m not downplaying the complexities of compiling MDX or JSX. I’m highlighting differences between the two output formats.

Before evaluate and run were added, people were doing things brokenly, full of bugs. While things could technically be possible, people did it badly, in a way that export/import were not supported, Babel was pulled into a client.

Why was that? I wasn’t involved as much back then. I think because back then the MDX output still contained JSX and Babel was considered the go-to tool to compile JSX? The situation is much better now, even without evaluate or run. MDX compiles to regular JavaScript and doesn’t need post-processing.

What does your plugin do with import.meta.url and import.meta.resolve? These were important to me.

They are transformed into regular property accessors of customImport. How this works is shown in solution 1 of the OP. If you’re interested to see the compiled code, have a look at the convert-meta-properties test fixture.

The user can resolve any imports however they want

It is interesting, though, I am not sure such power needs to be the default. The current setup works with import maps.

It seems like a different concept/plugin: a) turn a module into a function body, b) dependency injection alternative to import maps.

This is less flexible than recma-module-to-function

Why is this extra flexibility in recma-module-to-function?

At the time I needed to resolve modules. Some modules were authored by the user in a virtual file system. Relative imports needed to resolve to local files. Bare import specifiers needed to resolve to packages on https://esm.sh. I think it wasn’t even used with MDX back then. It just happens to be a good fit.

Why is that not recma-dependency-injection?

estree-util-module-to-function is 3 years old. I think recma-dependency-injection is fine. I still think recma-module-to-function is also fine. Let’s not focus on these details.

recmaPlugins: [[recmaEsmBaseUrl, { baseUrl: 'https://esm.sh' }]]

Importantly, baseUrl should almost always be import.meta.url. Because you are almost always running JavaScript in the place you are running it. With a CDN it would be ./example/index.js -> https://esm.sh/example/index.js and import.meta.resolve('./index.css') -> https://esm.sh/index.css. Without an explicit given URL, the import.meta.url would be https://server.com/assets/bundle/build.min.js or so. With it, it would be https://example.com/path/to/page.html or file:///Users/tilde/Projects/app/server.js.

I don’t think baseUrl should almost always be import.meta.url, precisely for the reasons you’re giving. The value of import.meta.url is ambiguous. It can be used reliably with Node.js modules, or native modules in the browser. But many users use bundlers. Bundlers may or may not interpret import.meta.url under different circumstances. The value may vary between development and production.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤞 phase/open Post is being triaged manually
Development

No branches or pull requests

2 participants