Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] intern types using Salsa #12061

Merged
merged 4 commits into from
Jul 5, 2024
Merged

[red-knot] intern types using Salsa #12061

merged 4 commits into from
Jul 5, 2024

Conversation

carljm
Copy link
Contributor

@carljm carljm commented Jun 27, 2024

Intern types using Salsa interning instead of in the TypeInference result.

This eliminates the need for TypingContext, and also paves the way for finer-grained type inference queries.

@MichaReiser MichaReiser added the red-knot Multi-file analysis & type inference label Jun 28, 2024
&self.module_type
}

pub(super) fn class_ty(&self, id: FileClassTypeId) -> &ClassType {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning a refence would work if we use an actual Arena that allocates a new vector when it runs out of space instead of resizing the existing one (which would invalidate previously returned Ids).

Or more generally. The way to return references here is by lifting the lifetime of types outside of type store by having the actual data owned further up.

@MichaReiser
Copy link
Member

I do like splitting out the type internet from the type inference result. I think it makes sense to have them separate and also allows more effective reuse.

The part that's unclear to me and that you already mentioned on your write up is how to collect unused types. How can we prevent that this is an ever-growing arena?

We could try to track type references by using an AtomicUsize (not an Arc because that would heap allocate each type!) It would give us very fine-grained reclamation of types, at least if we use a slab like arena that supports reusing IDs. An other possibility could be to rely on Salsa by using one interner per file, package or some other granularity, and the interner would be retrieved by calling a salsa-query. This won't give us type level collection, but salsa would evict the entire type store if e.g. the file gets deleted (or all files in a package).

@carljm
Copy link
Contributor Author

carljm commented Jul 5, 2024

This now uses Salsa interning, which eliminates a lot of our own interning and ID boilerplate, and actually works!

Copy link

codspeed-hq bot commented Jul 5, 2024

CodSpeed Performance Report

Merging #12061 will degrade performances by 4.03%

Comparing cjm/fine (c93aeeb) with main (7b50061)

Summary

❌ 1 (👁 1) regressions
✅ 32 untouched benchmarks

Benchmarks breakdown

Benchmark main cjm/fine Change
👁 red_knot_check_file[without_parse] 292 µs 304.3 µs -4.03%

@carljm
Copy link
Contributor Author

carljm commented Jul 5, 2024

Some things in this PR that might require further attention:

  • I think you mentioned you don't want to expose Salsa Db directly to lint rules, but I'm not sure how this will work with the need for lint rules to call methods on types. Previously you exposed typing context, which is just a wrapped db and is passed to all methods on types. Now methods on types take a db, so if we want to avoid exposing db to lint rules, they won't be able to call methods on types.
  • Interning UnionType and IntersectionType in Salsa requires that all their fields be hashable, which means I can't use IndexSet. For now I just switched to Vec, but I need to do a bit more careful analysis of what properties we need here (do we want ordering? do we need fast contains checks? etc) in order to settle on the right data structure.

Copy link
Contributor

github-actions bot commented Jul 5, 2024

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

python-trio/trio (error)

Failed to clone python-trio/trio: warning: Could not find remote branch master to clone.
fatal: Remote branch master not found in upstream origin

Linter (preview)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

python-trio/trio (error)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

Failed to clone python-trio/trio: warning: Could not find remote branch master to clone.
fatal: Remote branch master not found in upstream origin

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

python-trio/trio (error)

Failed to clone python-trio/trio: warning: Could not find remote branch master to clone.
fatal: Remote branch master not found in upstream origin

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

python-trio/trio (error)

ruff format --preview

Failed to clone python-trio/trio: warning: Could not find remote branch master to clone.
fatal: Remote branch master not found in upstream origin

@carljm
Copy link
Contributor Author

carljm commented Jul 5, 2024

The message says I'm supposed to acknowledge the regression on CodSpeed, but when I login to CodSpeed with GitHub, it says I'm not allowed to acknowledge the regression...

@carljm carljm marked this pull request as ready for review July 5, 2024 06:58
@carljm carljm requested a review from AlexWaygood as a code owner July 5, 2024 06:58
@carljm carljm requested a review from MichaReiser July 5, 2024 06:58
@carljm carljm changed the title [red-knot] intern types separately from type inference result [red-knot] intern types using Salsa Jul 5, 2024
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit surprised by the regression but I think it makes sense. Previously, accessing a field on ClassType was reading a reference but it is now a lookup in the salsa ingredient table. Niko's work should help reduce that cost but certainly something to be aware of. The good news, we can still build a more fancy type interner if it turns out that this is the bottleneck

@MichaReiser
Copy link
Member

MichaReiser commented Jul 5, 2024

You now have the permissions to acknowledge regressions. Use it responsibly :P

@carljm
Copy link
Contributor Author

carljm commented Jul 5, 2024

After offline discussion with @MichaReiser, going to switch to OrderSet for unions and intersections (from https://crates.io/crates/ordermap/0.5.0 ), which is a newtype wrapper around IndexSet that also adds order-respecting Eq and Hash. If we care about respecting order in unions (for UX reasons only, not because it's meaningful in the type system), which I think we do, and we care about fast contains checks (I think we also do), then this seems like the best option. It means we will intern "duplicate" unions that differ only in order separately, but this is pretty much unavoidable if we really want to respect order for UX. We will have to see how much of a cost this is in practice, and if it's worth paying for the UX.

@carljm
Copy link
Contributor Author

carljm commented Jul 5, 2024

The updated perf results are a bit better; not sure if this is just noise or if not interning ModuleType actually helped. I don't think Vec vs OrderSet should help, since we aren't currently doing any contains checks on unions or intersections.

@carljm carljm merged commit 0e44235 into main Jul 5, 2024
20 checks passed
@carljm carljm deleted the cjm/fine branch July 5, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants