[D] D programming language Implementation of Arrow #44515
Comments
Thanks for your suggestion. |
I will try to do it. I have also recently been contributing D support to Apache OpenDAL. |
Great! It seems that https://github.com/ananis25/darrow generates bindings from C header files, but we don't need to do that because Apache Arrow C GLib supports GObject Introspection. GObject Introspection provides API-related metadata, which we can use instead of parsing C header files. There appears to be a D tool for GObject Introspection: https://github.com/gtkd-developers/gir-to-d Can we use it instead of parsing C header files? |
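To make the "API-related metadata" point concrete, here is a minimal sketch of what a GIR file exposes. The XML fragment is a hand-written, heavily trimmed stand-in (an assumption, not copied from a real `Arrow-1.0.gir`), and `Method` and `listMethods` are hypothetical helper names. A real generator such as gir-to-d uses a proper parser and reads far more than method names; the regex here only illustrates that each method carries both its introspected name and its C identifier.

```d
import std.regex : matchAll, regex;
import std.stdio : writeln;

// Hypothetical, trimmed GIR fragment; a real Arrow-1.0.gir is far larger.
enum sampleGir = `
<namespace name="Arrow" version="1.0">
  <class name="Array" c:type="GArrowArray" parent="GObject.Object">
    <method name="get_length" c:identifier="garrow_array_get_length"/>
    <method name="get_value_type" c:identifier="garrow_array_get_value_type"/>
  </class>
</namespace>`;

/// One method entry: introspected name plus the C function behind it.
struct Method
{
    string name;
    string cIdentifier;
}

/// Collects every <method> element found in a GIR document.
Method[] listMethods(string gir)
{
    Method[] result;
    auto pattern = regex(`<method name="([^"]+)" c:identifier="([^"]+)"`);
    foreach (m; matchAll(gir, pattern))
        result ~= Method(m[1], m[2]); // capture 1 = name, capture 2 = C symbol
    return result;
}

void main()
{
    foreach (method; listMethods(sampleGir))
        writeln(method.name, " -> ", method.cIdentifier);
}
```

Because the binding generator starts from this metadata rather than from C headers, it does not have to guess ownership, naming, or class hierarchy from raw declarations.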
Works!
# $PWD = arrow/d
$ girtod -g ../c_glib.build/arrow-glib/ -i Arrow-1.0.gir -o source
$ girtod -g ../c_glib.build/arrow-glib/ -i ../arrow-dataset-glib/ArrowDataset-1.0.gir -o source
$ girtod -g ../c_glib.build/arrow-glib/ -i ../arrow-flight-glib/ArrowFlight-1.0.gir -o source
tree-ls output:
$ tree .
.
├── README.md
├── dub.sdl
└── source
├── arrow
│ ├── AggregateNodeOptions.d
│ ├── Aggregation.d
│ ├── Array.d
│ ├── ArrayBuilder.d
│ ├── ArrayDatum.d
│ ├── ArraySortOptions.d
│ ├── AzureFileSystem.d
│ ├── BaseBinaryScalar.d
│ ├── BaseListScalar.d
│ ├── BinaryArray.d
│ ├── BinaryArrayBuilder.d
│ ├── BinaryDataType.d
│ ├── BinaryDictionaryArrayBuilder.d
│ ├── BinaryScalar.d
│ ├── BooleanArray.d
│ ├── BooleanArrayBuilder.d
│ ├── BooleanDataType.d
│ ├── BooleanScalar.d
│ ├── Buffer.d
│ ├── BufferInputStream.d
│ ├── BufferOutputStream.d
│ ├── CSVReadOptions.d
│ ├── CSVReader.d
│ ├── CallExpression.d
│ ├── CastOptions.d
│ ├── ChunkedArray.d
│ ├── ChunkedArrayDatum.d
│ ├── Codec.d
│ ├── CompressedInputStream.d
│ ├── CompressedOutputStream.d
│ ├── CountOptions.d
│ ├── DataType.d
│ ├── Date32Array.d
│ ├── Date32ArrayBuilder.d
│ ├── Date32DataType.d
│ ├── Date32Scalar.d
│ ├── Date64Array.d
│ ├── Date64ArrayBuilder.d
│ ├── Date64DataType.d
│ ├── Date64Scalar.d
│ ├── Datum.d
│ ├── DayMillisecond.d
│ ├── DayTimeIntervalArray.d
│ ├── DayTimeIntervalArrayBuilder.d
│ ├── DayTimeIntervalDataType.d
│ ├── DayTimeIntervalScalar.d
│ ├── Decimal128.d
│ ├── Decimal128Array.d
│ ├── Decimal128ArrayBuilder.d
│ ├── Decimal128DataType.d
│ ├── Decimal128Scalar.d
│ ├── Decimal256.d
│ ├── Decimal256Array.d
│ ├── Decimal256ArrayBuilder.d
│ ├── Decimal256DataType.d
│ ├── Decimal256Scalar.d
│ ├── DecimalDataType.d
│ ├── DenseUnionArray.d
│ ├── DenseUnionArrayBuilder.d
│ ├── DenseUnionDataType.d
│ ├── DenseUnionScalar.d
│ ├── DictionaryArray.d
│ ├── DictionaryDataType.d
│ ├── DoubleArray.d
│ ├── DoubleArrayBuilder.d
│ ├── DoubleDataType.d
│ ├── DoubleScalar.d
│ ├── EqualOptions.d
│ ├── ExecuteContext.d
│ ├── ExecuteNode.d
│ ├── ExecuteNodeOptions.d
│ ├── ExecutePlan.d
│ ├── Expression.d
│ ├── ExtensionArray.d
│ ├── ExtensionDataType.d
│ ├── ExtensionDataTypeRegistry.d
│ ├── ExtensionScalar.d
│ ├── FeatherFileReader.d
│ ├── FeatherWriteProperties.d
│ ├── Field.d
│ ├── FieldExpression.d
│ ├── FileIF.d
│ ├── FileInfo.d
│ ├── FileInputStream.d
│ ├── FileOutputStream.d
│ ├── FileSelector.d
│ ├── FileSystem.d
│ ├── FileT.d
│ ├── FilterNodeOptions.d
│ ├── FilterOptions.d
│ ├── FixedSizeBinaryArray.d
│ ├── FixedSizeBinaryArrayBuilder.d
│ ├── FixedSizeBinaryDataType.d
│ ├── FixedSizeBinaryScalar.d
│ ├── FixedWidthDataType.d
│ ├── FloatArray.d
│ ├── FloatArrayBuilder.d
│ ├── FloatDataType.d
│ ├── FloatScalar.d
│ ├── FloatingPointDataType.d
│ ├── Function.d
│ ├── FunctionDoc.d
│ ├── FunctionOptions.d
│ ├── GCSFileSystem.d
│ ├── GIOInputStream.d
│ ├── GIOOutputStream.d
│ ├── HDFSFileSystem.d
│ ├── HalfFloatArray.d
│ ├── HalfFloatArrayBuilder.d
│ ├── HalfFloatDataType.d
│ ├── HalfFloatScalar.d
│ ├── HashJoinNodeOptions.d
│ ├── ISO8601TimestampParser.d
│ ├── IndexOptions.d
│ ├── InputStream.d
│ ├── Int16Array.d
│ ├── Int16ArrayBuilder.d
│ ├── Int16DataType.d
│ ├── Int16Scalar.d
│ ├── Int32Array.d
│ ├── Int32ArrayBuilder.d
│ ├── Int32DataType.d
│ ├── Int32Scalar.d
│ ├── Int64Array.d
│ ├── Int64ArrayBuilder.d
│ ├── Int64DataType.d
│ ├── Int64Scalar.d
│ ├── Int8Array.d
│ ├── Int8ArrayBuilder.d
│ ├── Int8DataType.d
│ ├── Int8Scalar.d
│ ├── IntArrayBuilder.d
│ ├── IntegerDataType.d
│ ├── IntervalDataType.d
│ ├── JSONReadOptions.d
│ ├── JSONReader.d
│ ├── LargeBinaryArray.d
│ ├── LargeBinaryArrayBuilder.d
│ ├── LargeBinaryDataType.d
│ ├── LargeBinaryScalar.d
│ ├── LargeListArray.d
│ ├── LargeListArrayBuilder.d
│ ├── LargeListDataType.d
│ ├── LargeListScalar.d
│ ├── LargeStringArray.d
│ ├── LargeStringArrayBuilder.d
│ ├── LargeStringDataType.d
│ ├── LargeStringScalar.d
│ ├── ListArray.d
│ ├── ListArrayBuilder.d
│ ├── ListDataType.d
│ ├── ListScalar.d
│ ├── LiteralExpression.d
│ ├── LocalFileSystem.d
│ ├── LocalFileSystemOptions.d
│ ├── MapArray.d
│ ├── MapArrayBuilder.d
│ ├── MapDataType.d
│ ├── MapScalar.d
│ ├── MatchSubstringOptions.d
│ ├── MemoryMappedInputStream.d
│ ├── MemoryPool.d
│ ├── MockFileSystem.d
│ ├── MonthDayNano.d
│ ├── MonthDayNanoIntervalArray.d
│ ├── MonthDayNanoIntervalArrayBuilder.d
│ ├── MonthDayNanoIntervalDataType.d
│ ├── MonthDayNanoIntervalScalar.d
│ ├── MonthIntervalArray.d
│ ├── MonthIntervalArrayBuilder.d
│ ├── MonthIntervalDataType.d
│ ├── MonthIntervalScalar.d
│ ├── MutableBuffer.d
│ ├── NullArray.d
│ ├── NullArrayBuilder.d
│ ├── NullDataType.d
│ ├── NullScalar.d
│ ├── NumericArray.d
│ ├── NumericDataType.d
│ ├── ORCFileReader.d
│ ├── OutputStream.d
│ ├── PrimitiveArray.d
│ ├── ProjectNodeOptions.d
│ ├── QuantileOptions.d
│ ├── RankOptions.d
│ ├── ReadOptions.d
│ ├── ReadableIF.d
│ ├── ReadableT.d
│ ├── RecordBatch.d
│ ├── RecordBatchBuilder.d
│ ├── RecordBatchDatum.d
│ ├── RecordBatchFileReader.d
│ ├── RecordBatchFileWriter.d
│ ├── RecordBatchIterator.d
│ ├── RecordBatchReader.d
│ ├── RecordBatchStreamReader.d
│ ├── RecordBatchStreamWriter.d
│ ├── RecordBatchWriter.d
│ ├── ResizableBuffer.d
│ ├── RoundOptions.d
│ ├── RoundToMultipleOptions.d
│ ├── RunEndEncodeOptions.d
│ ├── RunEndEncodedArray.d
│ ├── RunEndEncodedDataType.d
│ ├── S3FileSystem.d
│ ├── S3GlobalOptions.d
│ ├── Scalar.d
│ ├── ScalarAggregateOptions.d
│ ├── ScalarDatum.d
│ ├── Schema.d
│ ├── SeekableInputStream.d
│ ├── SetLookupOptions.d
│ ├── SinkNodeOptions.d
│ ├── SlowFileSystem.d
│ ├── SortKey.d
│ ├── SortOptions.d
│ ├── SourceNodeOptions.d
│ ├── SparseUnionArray.d
│ ├── SparseUnionArrayBuilder.d
│ ├── SparseUnionDataType.d
│ ├── SparseUnionScalar.d
│ ├── SplitPatternOptions.d
│ ├── StreamDecoder.d
│ ├── StreamListener.d
│ ├── StrftimeOptions.d
│ ├── StringArray.d
│ ├── StringArrayBuilder.d
│ ├── StringDataType.d
│ ├── StringDictionaryArrayBuilder.d
│ ├── StringScalar.d
│ ├── StrptimeOptions.d
│ ├── StrptimeTimestampParser.d
│ ├── StructArray.d
│ ├── StructArrayBuilder.d
│ ├── StructDataType.d
│ ├── StructFieldOptions.d
│ ├── StructScalar.d
│ ├── SubTreeFileSystem.d
│ ├── Table.d
│ ├── TableBatchReader.d
│ ├── TableConcatenateOptions.d
│ ├── TableDatum.d
│ ├── TakeOptions.d
│ ├── TemporalDataType.d
│ ├── Tensor.d
│ ├── Time32Array.d
│ ├── Time32ArrayBuilder.d
│ ├── Time32DataType.d
│ ├── Time32Scalar.d
│ ├── Time64Array.d
│ ├── Time64ArrayBuilder.d
│ ├── Time64DataType.d
│ ├── Time64Scalar.d
│ ├── TimeDataType.d
│ ├── TimestampArray.d
│ ├── TimestampArrayBuilder.d
│ ├── TimestampDataType.d
│ ├── TimestampParser.d
│ ├── TimestampScalar.d
│ ├── UInt16Array.d
│ ├── UInt16ArrayBuilder.d
│ ├── UInt16DataType.d
│ ├── UInt16Scalar.d
│ ├── UInt32Array.d
│ ├── UInt32ArrayBuilder.d
│ ├── UInt32DataType.d
│ ├── UInt32Scalar.d
│ ├── UInt64Array.d
│ ├── UInt64ArrayBuilder.d
│ ├── UInt64DataType.d
│ ├── UInt64Scalar.d
│ ├── UInt8Array.d
│ ├── UInt8ArrayBuilder.d
│ ├── UInt8DataType.d
│ ├── UInt8Scalar.d
│ ├── UIntArrayBuilder.d
│ ├── UTF8NormalizeOptions.d
│ ├── UnionArray.d
│ ├── UnionArrayBuilder.d
│ ├── UnionDataType.d
│ ├── UnionScalar.d
│ ├── VarianceOptions.d
│ ├── WritableFileIF.d
│ ├── WritableFileT.d
│ ├── WritableIF.d
│ ├── WritableT.d
│ ├── WriteOptions.d
│ └── c
│ ├── functions.d
│ └── types.d
├── arrowdataset
│ ├── CSVFileFormat.d
│ ├── Dataset.d
│ ├── DatasetFactory.d
│ ├── DirectoryPartitioning.d
│ ├── FileFormat.d
│ ├── FileSystemDataset.d
│ ├── FileSystemDatasetFactory.d
│ ├── FileSystemDatasetWriteOptions.d
│ ├── FileWriteOptions.d
│ ├── FileWriter.d
│ ├── FinishOptions.d
│ ├── Fragment.d
│ ├── HivePartitioning.d
│ ├── HivePartitioningOptions.d
│ ├── IPCFileFormat.d
│ ├── InMemoryFragment.d
│ ├── KeyValuePartitioning.d
│ ├── KeyValuePartitioningOptions.d
│ ├── ParquetFileFormat.d
│ ├── Partitioning.d
│ ├── PartitioningFactoryOptions.d
│ ├── Scanner.d
│ ├── ScannerBuilder.d
│ └── c
│ ├── functions.d
│ └── types.d
└── arrowflight
├── CallOptions.d
├── Client.d
├── ClientOptions.d
├── CommandDescriptor.d
├── Criteria.d
├── DataStream.d
├── Descriptor.d
├── DoPutResult.d
├── Endpoint.d
├── Info.d
├── Location.d
├── MessageReader.d
├── MetadataReader.d
├── MetadataWriter.d
├── PathDescriptor.d
├── RecordBatchReader.d
├── RecordBatchStream.d
├── RecordBatchWriter.d
├── ServableIF.d
├── ServableT.d
├── Server.d
├── ServerAuthHandler.d
├── ServerAuthReader.d
├── ServerAuthSender.d
├── ServerCallContext.d
├── ServerCustomAuthHandler.d
├── ServerOptions.d
├── StreamChunk.d
├── StreamReader.d
├── StreamWriter.d
├── Ticket.d
└── c
├── functions.d
└── types.d
8 directories, 349 files |
It's possible. However, the generated code needs manual fixes, as the errors below show:
# $PWD = arrow
$ dub test --root=d -f
Generating test runner configuration 'arrow-d-test-unittest' for 'unittest' (library).
Pre-gen Running commands for glibd
Existing package girtod found locally
0 packages fetched, 1 already present, 0 failed
Building package girtod in /home/kassane/.dub/packages/girtod/0.23.2/girtod/
Pre-gen Running commands for girtod
Starting Performing "debug" build using /usr/bin/ldc2 for x86_64.
Building girtod 0.23.2: building configuration [application]
Linking girtod
Running ../../../girtod/0.23.2/girtod/girtod -i src -o generated --use-runtime-linker
copying file [src/gtkd] to [generated/gtkd]
Starting Performing "unittest" build using /usr/bin/ldc2 for x86_64.
Building glibd 2.4.3+commit.2.g1546823: building configuration [library]
Building arrow-d ~master: building configuration [arrow-d-test-unittest]
source/arrow/GIOOutputStream.d(12,8): Error: `OutputStream` matches conflicting symbols:
public class GIOOutputStream : OutputStream
^
source/arrow/OutputStream.d(18,8): class `arrow.OutputStream.OutputStream`
public class OutputStream : ObjectG, FileIF, WritableIF
^
../../.dub/packages/glibd/1546823185334c4727d378baf890fa13d9fa4cbd/glibd/generated/gio/OutputStream.d(58,8): class `gio.OutputStream.OutputStream`
public class OutputStream : ObjectG
^
source/arrow/GIOOutputStream.d(55,9): Error: `OutputStream` matches conflicting symbols:
public this(OutputStream gioOutputStream)
^
source/arrow/OutputStream.d(18,8): class `arrow.OutputStream.OutputStream`
public class OutputStream : ObjectG, FileIF, WritableIF
^
../../.dub/packages/glibd/1546823185334c4727d378baf890fa13d9fa4cbd/glibd/generated/gio/OutputStream.d(58,8): class `gio.OutputStream.OutputStream`
public class OutputStream : ObjectG
^
source/arrow/GIOOutputStream.d(76,22): Error: `OutputStream` matches conflicting symbols:
public OutputStream getRaw()
^
source/arrow/OutputStream.d(18,8): class `arrow.OutputStream.OutputStream`
public class OutputStream : ObjectG, FileIF, WritableIF
^
../../.dub/packages/glibd/1546823185334c4727d378baf890fa13d9fa4cbd/glibd/generated/gio/OutputStream.d(58,8): class `gio.OutputStream.OutputStream`
public class OutputStream : ObjectG
^
source/arrow/LargeListArray.d(138,27): Error: function `DataType arrow.LargeListArray.LargeListArray.getValueType()` does not override any function, did you mean to override `arrow.c.types.GArrowType arrow.Array.Array.getValueType()`?
public override DataType getValueType()
^
source/arrow/ListArray.d(134,27): Error: function `DataType arrow.ListArray.ListArray.getValueType()` does not override any function, did you mean to override `arrow.c.types.GArrowType arrow.Array.Array.getValueType()`?
public override DataType getValueType()
^ |
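The first three errors in the log arise because two imported modules export a class with the same name (`arrow.OutputStream.OutputStream` and `gio.OutputStream.OutputStream`). As a minimal sketch of how D can keep such names apart, the example below uses renamed imports. To stay self-contained it uses the standard modules `std.stdio` and `std.file`, which both export `write`, as stand-ins; it is not taken from the generated bindings. The helper `roundTrip` and the file name are hypothetical. A generated file could analogously use something like `import gio.OutputStream : GioOutputStream = OutputStream;`, but that line is an assumption and not what girtod currently emits.

```d
// Renamed imports: each module is reachable only through its alias,
// so two symbols that share a name can no longer collide.
import io = std.stdio; // exports write(), which prints to stdout
import fs = std.file;  // also exports write(), which writes a file

/// Writes `text` to a temporary file and reads it back.
string roundTrip(string text)
{
    import std.path : buildPath;

    auto path = buildPath(fs.tempDir, "renamed_import_demo.txt");
    fs.write(path, text);         // unambiguous: std.file.write
    scope (exit) fs.remove(path); // clean up when the function returns
    return fs.readText(path);
}

void main()
{
    io.writeln("read back: ", roundTrip("hello"));
}
```

With plain `import std.stdio;` and `import std.file;`, an unqualified `write(path, text)` matches a function in both modules and is rejected as ambiguous, which is the same class of problem as the `OutputStream` clash.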
There is also another older implementation: https://github.com/rostyboost/darrow |
For a C library, D's ImportC solves this (e.g. the opendal_c header). However, the C++ API does require manual intervention, because the existing C++-to-D binding generators, such as cppconv, target specific use cases. Edit: I don't plan to support nanoarrow at the moment. |
After some minor fixes, the biggest remaining issues are linked to multiple inheritance in the auto-generated bindings and to conflicting members across inherited modules.
Commit tested: kassane@86c9062
Build: 🆗
# Arrow-libs: $PWD/build/release
# Arrow-glibs: $PWD/c_glib.build/arrow-glib, $PWD/c_glib.build/arrow-flight-glib, $PWD/c_glib.build/arrow-dataset-glib
$ LD_LIBRARY_PATH=$PWD/c_glib.build/arrow-glib:$PWD/build/release:$PWD/c_glib.build/arrow-flight-glib:$PWD/c_glib.build/arrow-dataset-glib dub test -f --root=d/
Generating test runner configuration 'arrow-d-test-unittest' for 'unittest' (library).
Pre-gen Running commands for glibd
Existing package girtod found locally
0 packages fetched, 1 already present, 0 failed
Building package girtod in /home/kassane/.dub/packages/girtod/0.23.2/girtod/
Pre-gen Running commands for girtod
Starting Performing "debug" build using /home/kassane/zig/ldc2-master/bin/ldc2 for x86_64.
Building girtod 0.23.2: building configuration [application]
Linking girtod
Running ../../../girtod/0.23.2/girtod/girtod -i src -o generated --use-runtime-linker
copying file [src/gtkd] to [generated/gtkd]
Starting Performing "unittest" build using /home/kassane/zig/ldc2-master/bin/ldc2 for x86_64.
Building glibd 2.4.3+commit.2.g1546823: building configuration [library]
Building arrow-d ~master: building configuration [arrow-d-test-unittest]
Linking arrow-d-test-unittest
Running arrow-d-test-unittest
/home/kassane/arrow/d/arrow-d-test-unittest(+0x23f1b7) [0x65499791d1b7]
/usr/lib/libc.so.6(+0x3d1d0) [0x7a751a9151d0]
/home/kassane/arrow/c_glib.build/arrow-glib/libarrow-glib.so.1800(_Z20garrow_array_get_rawP12_GArrowArray+0xa) [0x7a751acb466a]
/home/kassane/arrow/c_glib.build/arrow-glib/libarrow-glib.so.1800(garrow_array_get_length+0x1e) [0x7a751acb522e]
/home/kassane/arrow/d/arrow-d-test-unittest(+0xf3db0) [0x6549977d1db0]
/home/kassane/arrow/d/arrow-d-test-unittest(+0xf3aeb) [0x6549977d1aeb]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x23f1f8) [0x65499791d1f8]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x24c417) [0x65499792a417]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x24c949) [0x65499792a949]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x24c3bc) [0x65499792a3bc]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x24393f) [0x65499792193f]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x23f0a4) [0x65499791d0a4]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x246a1b) [0x654997924a1b]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x246947) [0x654997924947]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x24679d) [0x65499792479d]
/home/kassane/arrow/d/arrow-d-test-unittest(+0x14aec2) [0x654997828ec2]
/usr/lib/libc.so.6(+0x25e08) [0x7a751a8fde08]
/usr/lib/libc.so.6(__libc_start_main+0x8c) [0x7a751a8fdecc]
/home/kassane/arrow/d/arrow-d-test-unittest(+0xf3995) [0x6549977d1995]
Error Program exited with code -11
$ c++filt _Z20garrow_array_get_rawP12_GArrowArray
garrow_array_get_raw(_GArrowArray*) |
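One member conflict of the kind described in this comment is visible in the earlier build log: `ListArray.getValueType()` returns `DataType`, while the inherited `Array.getValueType()` returns `GArrowType`, so the `override` is rejected. The sketch below reproduces that shape with stand-in classes and shows one possible way around it, giving the derived accessor a distinct name. The class bodies and the name `getValueDataType` are assumptions for illustration; the thread does not say how the tested commit resolved it.

```d
import std.stdio : writeln;

// Stand-ins for the generated types; names mirror the log, bodies are hypothetical.
enum GArrowType { list }
class DataType {}

class Array
{
    // Base binding: returns the type *id* as an enum value.
    GArrowType getValueType() { return GArrowType.list; }
}

class ListArray : Array
{
    // `override DataType getValueType()` does not compile here: the parameter
    // lists are identical, but DataType is not covariant with GArrowType.
    // Possible workaround: expose the element type under a different name.
    DataType getValueDataType() { return new DataType; }
}

void main()
{
    auto list = new ListArray;
    assert(list.getValueType() == GArrowType.list); // inherited method still works
    assert(list.getValueDataType() !is null);       // renamed accessor is separate
    writeln("ok");
}
```

Since the preference stated in the thread is to avoid editing generated files, a rename like this would more naturally live in the generator's configuration or upstream in the generator itself.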
Great! Let's work on this step by step. How about supporting only arrow-glib as the first step? We can add support for other modules such as arrow-flight-glib later. Could you open a PR for it? I'll try it too. In general, we want to avoid changing auto-generated files. Instead, we want to improve the upstream generator, like you did in gtkd-developers/gir-to-d#45. FYI: We also use this approach for the C# Parquet bindings: #41886 |
Kymorphia/gid provides initial support using the official Arrow GIR files. |
Describe the enhancement requested
As with Swift and other languages with small communities, I would like to suggest adding a D language (v2) implementation to Arrow upstream (or as a library in a separate repository).
Outdated reference library: https://github.com/ananis25/darrow
Auto-generated: https://github.com/rostyboost/darrow
Component(s)
Integration