-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arec performance improvements #154
Arec performance improvements #154
Conversation
230a745
to
ffdb558
Compare
Using the new user = toArec ( #name =: "Peter"
:& #age =: 4
:& #active =: True
:& RNil
) The difference is that it doesn't rely on inlining to get rid of the intermediate data structure, which might speed up compilation and work out better in -O0 situations, though I haven't tested it. I'm not sure if it's worth the additional code. |
I've been thinking about this, and I'm leaning towards just keeping the old |
I would agree, but I figured out that the same interface can be used for more things, e.g.
But I would probably get away from the "cute" operators. |
ffdb558
to
e396e6a
Compare
I have removed They still live in I have also added test cases for |
bae68f2
to
b8f30bb
Compare
* Move from Array to SmallArray# to avoid intermediate list during construction * Use class instead of recursive function in toARec for improved inlining
* Saves a 1 word per field * Improves accessor performance significantly both for Rec as well as ARec
b8f30bb
to
5aeff45
Compare
@@ -23,9 +24,26 @@ | |||
-- element to the head of a list, but element access is linear time; | |||
-- array access time is uniform, but extending the array is more | |||
-- slower. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This blank line needs to start with --
or haddock
can not process the file.
The comments you added are invaluable; I really appreciate that thoroughness! The The change in this PR that I'm most wary of is changing ETA: Compiling that executable also uses a huge amount of RAM. |
I can reproduce the compilation failure, and reverting 5e38074 fixes it. How vexing, I would never have guess that a newtypes might lead to that problem. |
The problem might be some over-zealous inlining in Frames, though. I ran
in the frames directory and it solved the problem |
In any case, the problem seems to be that turning the data into a newtype simplifies strictness analysis, giving the inliner more opportunities to fire, and boy does it ever fire. I manually copied the TH-splice back into the source file, and commented out a bunch of column definitions, and the result is that reducing the number of column definitions drastically improves compilation times (almost exponentially). Btw, that's part of the reason that I am interested in ARecs, since they don't rely on unrolling and inlining so heavily. |
That makes a lot of sense! I need to check that thE INLINABLE change doesn’t affect performance much. Is there a way of determining steps we could take to ease the tradeoffs between compile- and run-times? I never worked out a way of profiling compile times, so I’m always flying a bit blind trying to help that. But I figured that — to some extent — inlining happens when something is used, so hopefully we’re not spending time optimizing something the user isn’t even using. |
OK, I bisected the It doesn't seem to make a huge difference for run time:
after
That's well within run-to-run variance |
This is really interesting. I believe the thinking was that we wanted the whole chain of loading functions to be specialized to the right type. But perhaps it’s mostly okay if various layers are generic. Many, perhaps most, uses of frames are some kind of fold. We want the inner part of the fold to in-line away into vector accesses of particular columns so there is no manifest For now, we should probably just remove that |
Just to be a little more precise, I put the benchdemo
It seems that the |
Interestingly, comparing the core output from the
so it doesn't seem like the |
Okay, so one thing is to take out those Is it worth making the |
Back of the envelope calculation: moving from data to newtype saved ~9ns for 1 row, 15 fields [1] That's below the run-to-run variance, so no wonder we don't see anything. [1]: #150 (comment) |
For
You've explained how the As I understand it, the main performance win with the Compare to the So, end of the day, I'm happy to go ahead with this PR if you feel like it's ready as there are a lot of very positive changes. I think there's just that haddock issue lingering, but I can fix that in a followup commit if you'd prefer. |
These were increasing compile times without improving run times. They also caused the GHC simplifier to blow up with an upcoming change to Vinyl to [change](VinylRecords/Vinyl#154) the `ElField` type from a GADT to a `newtype`.
I've looked at It does however suggest that removing the right inline pragma(s) could make a significant difference in compilation times. Regarding |
I've gone ahead and pulled your commits in. I spent some more time poking at this, but didn't have any new ideas. Given that, I'd rather we move forward than let this great contribution wither on the vine. Thank you for the code and the discussions! |
SmallArray#
instead ofArray
to avoid intermediate lists when constructingARec
s&:
,arnil
andarec
to efficiently constructARec
s without going throughRec
sElField
into a newtype to improve accessor performance (both forRec
andARec
)KnownSymbol
dictionary that wasn't used anywhereSee #150 (comment) for benchmark comparisons
Example usage of the ARec pseudo-constructors:
Related to #150 and #153