-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enables GATs iteration in the Element API #442
Conversation
Codecov Report
@@ Coverage Diff @@
## main #442 +/- ##
==========================================
- Coverage 83.64% 83.54% -0.10%
==========================================
Files 84 85 +1
Lines 16376 16400 +24
==========================================
+ Hits 13697 13701 +4
- Misses 2679 2699 +20
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
I can't make any sense of the code coverage results; I don't think I trust |
/// If a field name appears more than once, this method makes the arbitrary decision to return | ||
/// the value associated with the last appearance. If your application uses structs that repeat | ||
/// field names, you are encouraged to use [get_all] instead. | ||
fn get_last<A: AsSymbolRef>(&self, field_name: A) -> Option<&Element> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why this is get_last
instead of get_first
(just get_any
). My only concern is that get_last
forces the implementation to read and map the entire struct. That may be unproblematic now, but I don't see a good reason to choose last over first or any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my assumption is that we would want to build the map (and the indexing to it) lazily. That doesn't seem to be happening now (and I don't think it needs to now). I confess that reviewing rust code in github is still relatively new to me, so please let me know what I'm missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern is that get_last forces the implementation to read and map the entire struct. ... I guess my assumption is that we would want to build the map (and the indexing to it) lazily.
This is a good instinct, and a LazyElement
is something we're planning to implement in the future. As you mention, this implementation (Element
/Struct
) is fully materialized in memory, so the choice of first or last is truly arbitrary.
This particular method (get_last
) lives on a private type (Fields
) that isn't exposed to the end user. It's invoked by a public method called get
that's defined on the IonStruct
trait. If we were going to change the contract, that's the method we'd want to change.
At present, get
promises to return the last field it found in the serialized struct. From an Ion data model perspective, this isn't really correct; struct fields don't have an order. The most correct thing to do would be to only offer a get_all()
method that returned an iterator whose iteration order was not guaranteed. However, nearly all struct
use cases don't use repeated fields and users expect to be able to access the one-and-only value for a given field trivially, so get
is available as a convenience. If it turns out someone wants the first appearance of a given field, they can call self.get_all(name).first()
and (now that GATs are enabled) that will be basically the same cost as get
.
let symbol = SymbolRef::with_unknown_text(); | ||
// ...and use the unknown text symbol to look up matching field values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to reason about what the behavior of this would be. What fields would match the unknown text symbol
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fields whose symbol ID intentionally maps to unknown text. This is the case for $0
as well as any symbol whose entry in the symbol table was non-text (null
, 5
, false
, etc).
/// If a field name appears more than once, this method makes the arbitrary decision to return | ||
/// the value associated with the last appearance. If your application uses structs that repeat | ||
/// field names, you are encouraged to use [get_all] instead. | ||
fn get_last<'iter, A: AsSymbolRef>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that you've made this explicitly get_last
instead of get
with the documented disclaimer that it gets the last one. Nice touch.
fn get<T: AsSymbolRef>(&self, field_name: T) -> Option<&Self::Element> { | ||
match field_name.as_symbol_ref().text() { | ||
None => { | ||
// Build a cheap, stack-allocated `SymbolRef` that represents unknown text | ||
let symbol = SymbolRef::with_unknown_text(); | ||
// Use the unknown text symbol to look up matching field values | ||
self.fields.get(&symbol)?.last() | ||
} | ||
Some(text) => { | ||
// Otherwise, look it up by text | ||
self.fields.get(text)?.last() | ||
} | ||
} | ||
self.fields.get_last(field_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
This PR replaces all uses of
Box<dyn Iterator<Item=_> + 'a>
in theElement
andElementRef
APIs with generic associated types. This means that heap allocations are no longer required to iterate over annotations, sequence elements, or struct fields.Because this change uses generic associated types, it also configures the CI tests to use the
beta
channel ofcargo
. GATs will be available in the next release of Rust (v1.65, out in November); once that happens, we can and should switch back tostable
. (#443)This change also removes the recently added dependency on the
linkhash
crate. That dependency was originally pulled in so structs could remember the order in which their fields were inserted for later iteration. However, it only worked if the struct did not contain duplicate field names. This PR changes the field storage forStruct
andStructRef
to store their fields in aVec
and also maintain aHashMap<Symbol, IndexVec>
that remembers all of the indexes at which values for a given field can be found.Further, it moves
Sequence
elements andStruct
fields into anRc
to makeclone()
operations cheaper. A future PR will modify iterators over collection types to clone theRc
and thus remove one of their lifetime constraints, making it much easier to write a recursive iterator over an element tree without constant cloning.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.