Skip to content
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.

import(cjs) with query strings has odd behavior #417

Open
mysticatea opened this issue Oct 31, 2019 · 25 comments
Open

import(cjs) with query strings has odd behavior #417

mysticatea opened this issue Oct 31, 2019 · 25 comments
Labels

Comments

@mysticatea
Copy link

Hello, module team. May I get help with the way to load packages without cache? I have seen a bit of odd behavior in import(cjs).


Copied from nodejs/node#29812

  • Version: 12.11.1, 13.0.1
  • Platform: Windows 10
  • Subsystem: ES modules

From eslint/eslint#12333

Repro https://github.com/mysticatea/import-cjs-issue

Description

I tried to import packages without import cache. From the document, it looks like I should use query strings.

    await import("x-esm") //→ Found as expected
    await import("./node_modules/x-esm/index.js?q=0") //→ Found and `x-esm` re-ran as expected
    await import("./node_modules/x-esm/index.js?q=1") //→ Found and `x-esm` re-ran as expected

x-esm is an ES module package. It worked as expected.

    const cjs = await import("x-cjs") //→ Found as expected
    const cjs0 = await import("./node_modules/x-cjs/index.js?q=0") //→ Found but `x-cjs` didn't re-run
    const cjs1 = await import("./node_modules/x-cjs/index.js?q=1") //→ Found but `x-cjs` didn't re-run
    console.log(cjs === cjs0, cjs === cjs1, cjs0 === cjs1) //→ all are false but `x-cjs` didn't run three times

x-cjs is a CJS package. The result was odd. The console.log() in x-cjs package ran only one time, but the returned values are different for each query string.

I found the entry of x-cjs in require.cache. However, the cache entry is odd as well. It's different from require("x-cjs"), the entry doesn't have parent property and the module.children of test.js is still empty.

Anyway, I tried to remove the cache entry.

    console.log("---- remove 'require.cache' ----")
    delete require.cache[require.resolve("x-cjs")]
    await import("x-cjs") //→ Found but `x-cjs` didn't re-run
    await import("./node_modules/x-cjs/index.js?q=0") //→ Found but `x-cjs` didn't re-run
    await import("./node_modules/x-cjs/index.js?q=1") //→ Found but `x-cjs` didn't re-run
    await import("./node_modules/x-cjs/index.js?q=2") //→ Found and `x-cjs` re-ran as expected
    await import("./node_modules/x-cjs/index.js?q=3") //→ Found but `x-cjs` didn't re-run

Cryptic. I guess this behavior is:

  • import(cjs) has cache apart from require.cache.
  • The import(cjs) cache is created from require.cache.
  • It runs CJS package only if require.cache entry was not found.
  • The import(cjs) cache is not removed even if require.cache entry deleted.

Therefore, I have to do the following steps if I want to import packages without cache.

  1. Find the main file of the package because I cannot put query strings to the package name.
    const url = "file:" + require.resolve(packageName) + uniqueQueryString;
  2. Import it.
    const ret = await import(url);
  3. Delete require.cache entry.
    delete require.cache[require.resolve(packageName)];

Questions

  1. Is it intentional behavior that import(cjs) creates incomplete require.cache entries?
  2. If yes, is it intentional behavior that import(cjs) with query strings returns different objects for the same CJS package?

I'm guessing that import(cjs) should not create any require.cache entries, and import(cjs) with query strings re-runs CJS packages as same as ES packages.

@jkrems
Copy link
Contributor

jkrems commented Oct 31, 2019

Thanks for running those experiments and letting us know about your findings!

x-cjs is a CJS package. The result was odd. The console.log() in x-cjs package ran only one time, but the returned values are different for each query string.

Can you try to compare .default on all of these? That value should be identical and I assume it is what counts for users.

I guess this behavior is: [...]

Yes, import() has its own cache (the "module map" in ES module speak) that is based on URLs instead of file paths. The algorithm is roughly:

// Load module for a URL not yet in the module map
function loadModule(url, referrerUrl) {
  if (isCommonJS(url)) {
    // - This is where that fresh namespace object comes from (top of my post).
    // - The toFileSystemPath is what drops the query string.
    // - This code doesn't care if the referrerUrl is a CJS file, exists in the require.cache, etc. 
    const require = createRequire(toFileSystemPath(referrerUrl);
    return { default: require(toFileSystemPath(url)) };
  }
  // continue loading standard JS modules
}

This also why technically the require.cache entries aren't incomplete. There's two completely independent module systems at work. require.cache only knows about the require system. And from that perspective, there isn't any parent. And there's no require-system children either when the module used import(). require.cache generally has no knowledge of what goes on in the ES module system and that includes import() inside of CommonJS files.

@jkrems
Copy link
Contributor

jkrems commented Oct 31, 2019

I'm guessing that import(cjs) should not create any require.cache entries, and import(cjs) with query strings re-runs CJS packages as same as ES packages.

The ES module system (import) doesn't support CommonJS. The only thing it knows to do is to forward to the require module system. So "one file path is one CJS module instance" and "each CJS module appears in require.cache" is things that it can't really influence. What we could do is to make require.cache invalidation a thing when we can tell that the URL doesn't match the URL originally associated with a CJS module. But I'm not sure there would be wide support for such a feature.

@devsnek devsnek added the cjs label Oct 31, 2019
@devsnek
Copy link
Member

devsnek commented Oct 31, 2019

everything here seems to be working as intended. its just weird and complex because two separate module systems with separate caches are being welded together.

@ljharb
Copy link
Member

ljharb commented Oct 31, 2019

In the example above, I’d expect q=3 to be reran - a different query string is a different URL, and that one hadnt already been cached. Everything else looks right to me.

@devsnek
Copy link
Member

devsnek commented Oct 31, 2019

@ljharb its a new esm module, but it hits the existing require cache.

@ljharb
Copy link
Member

ljharb commented Oct 31, 2019

Right, but in the OP, q=3 was never evaluated in the require cache.

@devsnek
Copy link
Member

devsnek commented Oct 31, 2019

when ?q=2 runs, it puts a new entry into require.cache. ?q=3 hits that cache entry.

@ljharb
Copy link
Member

ljharb commented Oct 31, 2019

How? Query strings aren’t/shouldn’t be treated specially; a !== URL is a different URL.

@jkrems
Copy link
Contributor

jkrems commented Oct 31, 2019

How? Query strings aren’t/shouldn’t be treated specially; a !== URL is a different URL.

Query strings don't exist in file system paths and the require system only sees the file system path which is identical. So it returns the same, existing module.

@ljharb
Copy link
Member

ljharb commented Oct 31, 2019

Then if q=1 and q=2 were already in the cache - and didn’t rerun - why wasn’t q=3 already in the same cache?

@ljharb
Copy link
Member

ljharb commented Oct 31, 2019

iow I’d expect either “all query strings are normalized out”, in which case, that module should only ever run once - or, “query strings are preserved”, in which case a different query string should always rerun the first time.

@devsnek
Copy link
Member

devsnek commented Oct 31, 2019

@ljharb query strings are always preserved in esm, and always normalized out in cjs. In this case, each ?q=x is a new esm module, but that new esm module may wrap an existing cjs module.

@ljharb
Copy link
Member

ljharb commented Oct 31, 2019

So if q=0 and q=1 are distinct ESM modules (ran twice in the above example; cached the second time), then i would expect q=2 to be one and also q=3 when using import. But the above example shows 2 re-evaluating and 3 not, which seems inconsistent to me.

@devsnek
Copy link
Member

devsnek commented Oct 31, 2019

q=0, q=1, q=2, q=3 are all distinct es modules. "re-evaluating" in this case refers to whether the cjs module they wrap reruns.

@ljharb
Copy link
Member

ljharb commented Oct 31, 2019

Deleting the require cache, thus, should either have no effect, or should have an effect the first time. In the above example, it seems to have an effect the third time but not the first two or the fourth.

@jkrems
Copy link
Contributor

jkrems commented Oct 31, 2019

The full example is:

const cjs = await import("x-cjs") //→ Found as expected
const cjs0 = await import("./node_modules/x-cjs/index.js?q=0") //→ Found but `x-cjs` didn't re-run
const cjs1 = await import("./node_modules/x-cjs/index.js?q=1") //→ Found but `x-cjs` didn't re-run

// At this point:
// * require.cache[x-cjs/index.js] is present
// * moduleMap { 'x-cjs/index.js', 'x-cjs/index.js?q=0', 'x-cjs/index.js?q=1' } are present

console.log("---- remove 'require.cache' ----")
delete require.cache[require.resolve("x-cjs")]

// At this point:
// * require.cache[x-cjs/index.js] is *not* present
// * moduleMap { 'x-cjs/index.js', 'x-cjs/index.js?q=0', 'x-cjs/index.js?q=1' } are still present and point to previous CommonJS execution

await import("x-cjs") //→ Found but `x-cjs` didn't re-run
await import("./node_modules/x-cjs/index.js?q=0") //→ Found but `x-cjs` didn't re-run
await import("./node_modules/x-cjs/index.js?q=1") //→ Found but `x-cjs` didn't re-run

// up to this point, we've been hitting the module map and got the existing instances

// but now, we start missing the module map and run into the previously emptied require cache

await import("./node_modules/x-cjs/index.js?q=2") //→ Found and `x-cjs` re-ran as expected
await import("./node_modules/x-cjs/index.js?q=3") //→ Found but `x-cjs` didn't re-run

// At this point:
// * require.cache[x-cjs/index.js] is present but points to the new execution
// * moduleMap { 'x-cjs/index.js', 'x-cjs/index.js?q=0', 'x-cjs/index.js?q=1' } are still present and point to first CommonJS execution
// * moduleMap { 'x-cjs/index.js?q=2', 'x-cjs/index.js?q=3' } are present and point to second CommonJS execution

@ljharb
Copy link
Member

ljharb commented Oct 31, 2019

ah, thank you. I understand how it’s working, but it seems like it’s not working how it should.

Why would deleting the require cache have any effect at all on ESM modules? The two caches should be unrelated (if not kept in sync).

@devsnek
Copy link
Member

devsnek commented Oct 31, 2019

simplified example: https://engine262.js.org/#gist=31db83a9a28067300fccedc80b9e7725

@jkrems
Copy link
Contributor

jkrems commented Oct 31, 2019

Why would deleting the require cache have any effect at all on ESM modules? The two caches should be unrelated (if not kept in sync).

It only has an effect on subsequent misses in the ESM cache. Deleting the require cache had no effect on the existing ESM modules or the module map. It's only when ESM has a cache miss and asks for a "new" URL from require that it gets affected by the disappearing require.cache entries.

@Jamesernator
Copy link

Jamesernator commented Nov 1, 2019

Stepping aside for a second, even with query strings only the entry point will actually be reevaluated (because the other specifiers are still unchanged).

You probably want to use something that creates a fresh import of the module graph e.g.:

import vm from 'vm';

async function defaultLinker(specifier: string, module: vm.Module) {
  // the default (or from current loader) node linking algorithm
  // hope that gets exposed at some point

  return new vm.SourceTextModule(...);
}

async function importFresh(file: URL) {
  const entry = new vm.SourceTextModule(fs.readFileSync(file, 'utf8'), {
    identifier: file.href,
  });

  await entry.link(defaultLinker);
  await entry.evaluate();

  return entry.namespace;
}

This means you can evaluate the whole graph, also thanks to the way vm.Module means you can even cache certain specifiers for speed if you want (e.g. cache ones in node_modules).

@mysticatea
Copy link
Author

Thank you for elaborating!

I learned that this is correct behavior, so likely I need the three steps for fresh importing:

// Find the main file of the package because I cannot put query strings to the package name.
const url = "file:" + require.resolve(packageName) + uniqueQueryString;
// Import it.
const ret = await import(url);
// Delete require.cache entry.
delete require.cache[require.resolve(packageName)];

@Jamesernator Thank you for your suggestion. The SourceTextModule class looks a good alternative. However, the class looks to not support a mix of modules and scripts (the linker accepts only modules). Is there a way to mix of those?

@devsnek
Copy link
Member

devsnek commented Nov 3, 2019

you have to create wrappers for anything that isn't a module. the vm.SyntheticModule class can help you with that.

@SMotaal
Copy link

SMotaal commented Nov 4, 2019

@mysticatea BTW, if it helps, I've used a ?dev approach on my ESM-based static site, which I am happy to give more input on if that helps.

Just try those with the network panel open in the console:

It boils down to the right dynamic import, and until top-level await lands, it is better practice imho to think of deps as clear partitions that are awaited. The gap that CJS would need to close to be portable (ie not bundled).

Sorry no CJS input here — I'm having seasonal allergies :)

@SMotaal
Copy link

SMotaal commented Nov 4, 2019

@devsnek re vm.‹module stuff› what is the un/flagging roadmap there? and where do these discussions happen?

I'm quite interested in seeing how they play out with electron et al.

@devsnek
Copy link
Member

devsnek commented Nov 4, 2019

@SMotaal sometime after esm is stable.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants