Serialize to and deserialize from Apache Arrow format #1
Comments
Cool, I'd be very happy to take contributions in this area! I'll be happy to discuss this further with you, answer any questions about the current implementation and/or review PRs. |
Thanks! Check this out, have a play, and think about how it relates to QFrame: https://github.com/urban-wombat I plan to work up more stuff with gotables in the urban-wombat repos, but I'm totally out of time right now. Anyway, I am very curious how it can mate with QFrame, as immutability is really important. |
I took some time to check out gotables and FlatBuffers and how they relate to Arrow. As you mention, Arrow uses FlatBuffers for the metadata, which seems nice. Wouldn't it make sense to adopt the Arrow schema from the start and use that as the "native" serialization schema for QFrame? While browsing the Arrow data layout docs, it seemed to me that much of the data could be used with zero copying when "deserializing", given the current internal data formats in QFrame columns. Where that is currently not the case, adjustments to the internal format may be possible to allow it. |
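To make the zero-copy point concrete, here is a minimal sketch of the buffer layout Arrow uses for a fixed-width column: a validity bitmap (one bit per value, least-significant bit first, 1 meaning "not null") next to a contiguous little-endian values buffer. The names `Int64Column` and `EncodeInt64` are hypothetical, made up for illustration; they are not part of QFrame or of the Arrow Go library, and the sketch ignores the padding and alignment that the Arrow spec recommends.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Int64Column is a hypothetical, simplified stand-in for an Arrow
// primitive array. It is not a QFrame or Arrow library type.
type Int64Column struct {
	Len      int
	Validity []byte // 1 bit per value, LSB first; a set bit means "valid"
	Values   []byte // Len * 8 bytes, little-endian int64
}

// EncodeInt64 packs values and a null mask into the two buffers.
// vals and isNull are assumed to have the same length.
func EncodeInt64(vals []int64, isNull []bool) Int64Column {
	col := Int64Column{
		Len:      len(vals),
		Validity: make([]byte, (len(vals)+7)/8),
		Values:   make([]byte, 8*len(vals)),
	}
	for i, v := range vals {
		if !isNull[i] {
			col.Validity[i/8] |= 1 << (uint(i) % 8)
		}
		binary.LittleEndian.PutUint64(col.Values[8*i:], uint64(v))
	}
	return col
}

// IsValid reports whether slot i holds a non-null value.
func (c Int64Column) IsValid(i int) bool {
	return c.Validity[i/8]&(1<<(uint(i)%8)) != 0
}

// Value decodes slot i straight from the values buffer, so no
// intermediate []int64 has to be built when "deserializing".
func (c Int64Column) Value(i int) int64 {
	return int64(binary.LittleEndian.Uint64(c.Values[8*i:]))
}

func main() {
	col := EncodeInt64([]int64{10, 0, -3}, []bool{false, true, false})
	for i := 0; i < col.Len; i++ {
		if col.IsValid(i) {
			fmt.Println(i, col.Value(i))
		} else {
			fmt.Println(i, "null")
		}
	}
}
```

Because the two buffers are plain byte slices, they could be written to disk or a socket as-is and read back in place, which is the property the comment above is after.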
Agree that the Arrow schema makes sense. I feel out of my depth about Arrow here. Also, the InfluxDB startup donated the Go code, by the way. It's up in the air whether it will be maintained; it has not been touched in ages. |
Yes, I also noticed the work on Arrow from Influx when it was first released and was very excited. I've also noticed that not much has happened since then. I hope they will pick it up again! |
Ok so let's wait and see first if that repo gets some traction.
You can leave this issue open if you like or close it.
On Tue, 19 Jun 2018, 23:08 Tobias Gustafsson wrote:
Yes, I also noticed the work on Arrow from Influx when it was first
released and was very excited. I've also noticed that not much has happened
since then. I hope they will pick it up again!
|
I think I'll start experimenting with the Arrow format for fast serialization and deserialization of QFrames, without waiting for the official repo, to see how far the current internal representation is from the Arrow format. I already need an efficient binary format for that, so why not choose Arrow? If that repo starts moving again, it may make sense to align the internal representation with Arrow entirely, since that would give access to some AVX2-optimized aggregations, etc. that they seem to be developing. I'll change the title of this ticket a bit to narrow the focus to serialization and deserialization for now, though. |
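As a rough illustration of what such a binary format looks like for variable-length data, the sketch below lays out a string column the way Arrow's variable-size binary layout does: one contiguous data buffer plus n+1 little-endian int32 offsets into it. `StringColumn` and `EncodeStrings` are hypothetical names invented for this example; they are not taken from QFrame or the Arrow Go package, and null handling and buffer padding are left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// StringColumn is a hypothetical, simplified string column:
// value i occupies Data[offset(i):offset(i+1)].
type StringColumn struct {
	Offsets []byte // (n+1) little-endian int32 offsets into Data
	Data    []byte // all string bytes, back to back
}

// EncodeStrings concatenates the strings and records where each one ends.
// The first offset is left at its zero value.
func EncodeStrings(vals []string) StringColumn {
	col := StringColumn{Offsets: make([]byte, 4*(len(vals)+1))}
	for i, s := range vals {
		col.Data = append(col.Data, s...)
		binary.LittleEndian.PutUint32(col.Offsets[4*(i+1):], uint32(len(col.Data)))
	}
	return col
}

// Len returns the number of strings in the column.
func (c StringColumn) Len() int { return len(c.Offsets)/4 - 1 }

// Bytes returns value i as a sub-slice of Data, without copying.
func (c StringColumn) Bytes(i int) []byte {
	start := binary.LittleEndian.Uint32(c.Offsets[4*i:])
	end := binary.LittleEndian.Uint32(c.Offsets[4*(i+1):])
	return c.Data[start:end]
}

func main() {
	col := EncodeStrings([]string{"go", "", "arrow"})
	for i := 0; i < col.Len(); i++ {
		fmt.Printf("%d %q\n", i, col.Bytes(i))
	}
}
```

How close this is to QFrame's actual internal string representation is exactly the open question in the comment above; the sketch only shows the target layout.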
Sorry about the one-month delay. Using the Arrow format sounds like a good approach. https://github.com/apache/arrow/tree/master/go/arrow Nope, hmm, it seems that sbinet is the maintainer of the Go Arrow code? Might want to chat with him. He works at CERN, I think? |
I've started to work on providing support for List arrays: feel free to have a look at that and comment/improve :) (PS: I work for IN2P3/CNRS, kind of the French equivalent of NSF/DOE, and I do work for some experiments based at CERN, but I am not a CERN employee per se.) |
and now the PR for Struct arrays: |
Cool @sbinet, great to see the arrow initiative for Go moving again! |
Wow guys, this is great. Many thanks, I will play around with this. |
I am using Arrow, and it uses FlatBuffers internally, which are very fast.
I would be interested in extending QFrame to work with FlatBuffers.
There is also a special schemaless variant of FlatBuffers called FlexBuffers, which does not enforce a schema. I expect this is what you want to use for QFrame.