GH-41816: [C++] Add Minimal Meson Build of libarrow #45441

Merged: 10 commits, Feb 19, 2025

Contributor

WillAyd commented Feb 6, 2025

Rationale for this change

The Meson build system may be more user-friendly for some developers, and may make it easier to run tools such as valgrind, coverage, or ASAN/UBSAN. There is also prior art for using Meson in the nanoarrow and arrow-adbc projects.

What changes are included in this PR?

This PR implements a Meson configuration that can build a minimal libarrow.

Are these changes tested?

A nightly CI job is also added to detect regressions.

Are there any user-facing changes?

No

@WillAyd WillAyd marked this pull request as draft February 6, 2025 00:45
Contributor Author

WillAyd commented Feb 6, 2025

@kou this is a simplified attempt at adding Meson to address your comment #41816 (comment)

This is currently slower than the CMake configuration by a good deal (pending investigation) and the configuration file is not 100% complete, but this should give us an idea of what a Meson configuration may look like.

To use it, developers can run the following from the cpp directory:

meson setup builddir -Dtests=true -Dcompute=true
meson compile -C builddir
meson test -C builddir

For ASAN/UBSAN, users could simply run:

meson setup builddir -Dtests=true -Dcompute=true -Db_sanitize=address,undefined

Or, if the project is already set up, run:

meson configure -C builddir -Db_sanitize=address,undefined

Coverage can be enabled with:

meson configure -C builddir -Db_coverage=true

and tests can be run under valgrind with:

meson test -C builddir --wrap='valgrind --track-origins=yes --leak-check=full' --print-errorlogs

Member

kou commented Feb 6, 2025

Thanks.

Can we start from a more simplified version? For example, we don't need libarrow_testing, tests and so on in the first step.

We want to add a nightly CI for this to detect regression.

We want to update the version information automatically in the release process. For example:

pushd "${ARROW_DIR}/c_glib"
sed -i.bak -E -e \
"s/^version = '.+'/version = '${version}'/" \
meson.build
rm -f meson.build.bak
git add meson.build

(I can do it in this branch later.)
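
A minimal sketch of how that substitution behaves, run against a throwaway file rather than the real c_glib/meson.build. The scratch directory, the sample file contents, and the version strings below are assumptions made for illustration only.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory and sample file; not the real c_glib/meson.build.
workdir="$(mktemp -d)"
cat > "${workdir}/meson.build" <<'EOF'
project('example', 'c')
version = '19.0.0-SNAPSHOT'
EOF

# Illustrative release version.
version='19.0.0'

# Same substitution as in the release snippet above: rewrite the top-level
# "version = '...'" assignment in place, then drop the backup file.
sed -i.bak -E -e \
  "s/^version = '.+'/version = '${version}'/" \
  "${workdir}/meson.build"
rm -f "${workdir}/meson.build.bak"

updated_line="$(grep '^version = ' "${workdir}/meson.build")"
echo "${updated_line}"
```

Note that the pattern is anchored at the start of the line, so it only matches an unindented `version = '...'` assignment. A `version: '...'` keyword argument inside a `project()` call would need a different expression.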

Anyway, I didn't know that meson format exists! I want to use it in c_glib/ too.

Contributor Author

WillAyd commented Feb 6, 2025

Sure, I can strip it down further. So do you think just something that builds libarrow is the right starting point?

Member

kou commented Feb 7, 2025

Yes. Only minimal libarrow (e.g. no compute, no filesystem and so on) is the right starting point.

@WillAyd force-pushed the meson-builds branch 2 times, most recently from c492403 to 61f4bbb, on February 11, 2025 21:03
Contributor Author

WillAyd left a comment

Thanks @kou. I hope this is more in line with your expectations.

As far as nightlies go, can you point me to an existing nightly CI setup in the repo? I was able to grep for some R nightlies, but I'm not sure whether there is existing infrastructure for C++ nightly jobs where this would be better placed.

objects: objlibs,
include_directories: [include_dir],
install: true,
# compute/expression.cc may have undefined IPC symbols in non-IPC builds
Contributor Author

Wasn't sure if this was intentional or not in the existing code base

Member

Ah, we should disable features that depend on IPC in cpp/src/arrow/compute/expression.cc like GH-45171.

Could you open an issue for this?

Contributor Author

No problem - see #45512

override_options: {'b_lundef': 'false'},
)

# Meson does not allow you to glob for headers to install. See also
Contributor Author

Meson is pretty strict about wildcards for the reasons outlined in their FAQ. Idiomatically, Meson would want you to put the files you want to install in a separate directory and call install_subdir, but that would go beyond the scope of this initial PR I think

arrow_so_version = (ver_major.to_int() * 100 + ver_minor.to_int()).to_string()
arrow_full_so_version = '@0@.@1@.@2@'.format(arrow_so_version, ver_patch, 0)

# TODO: The Meson generated .pc file does not include the Apache license
Contributor Author

I did not research this in too much detail yet; figured I'd check if it was a big deal before investing time into a resolution

Member

We don't need it.
arrow.pc.in has the license header because it's in this repository. Files that don't exist in this repository don't need it.
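
As a worked example of the `arrow_so_version` arithmetic quoted as context for this thread, here is a small shell sketch. How the real meson.build derives `ver_major`, `ver_minor`, and `ver_patch` is not shown in the thread, so the string splitting below is an assumption; only the `major * 100 + minor` formula and the `@0@.@1@.@2@` layout come from the quoted lines.

```shell
#!/bin/sh
set -eu

# Version string taken from the project() call quoted later in this review.
version='19.0.0-SNAPSHOT'

# Assumed splitting: drop any "-SNAPSHOT" style suffix, then split on dots.
base="${version%%-*}"
ver_major="${base%%.*}"
rest="${base#*.}"
ver_minor="${rest%%.*}"
ver_patch="${rest#*.}"

# Same formula as the quoted Meson expression: major * 100 + minor.
arrow_so_version=$((ver_major * 100 + ver_minor))
# Same layout as '@0@.@1@.@2@'.format(arrow_so_version, ver_patch, 0).
arrow_full_so_version="${arrow_so_version}.${ver_patch}.0"

echo "so version: ${arrow_so_version}"
echo "full so version: ${arrow_full_so_version}"
```

With the 19.0.0-SNAPSHOT input this yields an so version of 1900 and a full so version of 1900.0.0.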

@github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label on Feb 11, 2025
Member

kou left a comment

Can I push some commits to this branch for nightly CI and auto version update?

cpp/meson.build Outdated
'cpp',
'c',
version: '19.0.0-SNAPSHOT',
license: 'Apache 2.0',
Member

Suggested change
- license: 'Apache 2.0',
+ license: 'Apache-2.0',

cpp/meson.build Outdated

git_id = get_option('git_id')
if git_id == ''
git_id = run_command('git', 'log', '-n1', '--format=%H', check: true).stdout().strip()
Member

Can we ignore git log errors? This will fail when we build from a source archive.

Suggested change
- git_id = run_command('git', 'log', '-n1', '--format=%H', check: true).stdout().strip()
+ git_id = run_command('git', 'log', '-n1', '--format=%H').stdout().strip()
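
To illustrate the failure mode behind this suggestion, the sketch below runs the same `git log` invocation in a directory with no repository, which is roughly what an unpacked source archive looks like. It is a shell analogue, not the Meson code itself: with `check: true`, Meson's `run_command()` aborts configuration on a non-zero exit, whereas the suggestion tolerates the failure and ends up with an empty string. The scratch directory and the use of `GIT_CEILING_DIRECTORIES` are assumptions made to keep the example self-contained.

```shell
#!/bin/sh
set -eu

# Hypothetical empty directory standing in for an unpacked source archive.
srcdir="$(mktemp -d)"

# GIT_CEILING_DIRECTORIES keeps git from discovering a repository in a
# parent directory. "|| true" tolerates the non-zero exit status that
# git returns when no repository is found (or when git is not installed).
git_id="$(cd "${srcdir}" && \
  GIT_CEILING_DIRECTORIES="${srcdir%/*}" \
  git log -n1 --format=%H 2>/dev/null || true)"

echo "git_id='${git_id}'"
```

The build then proceeds with an empty `git_id` instead of stopping at configure time.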

cpp/meson.build Outdated

git_description = get_option('git_description')
if git_description == ''
git_description = run_command('git', 'describe', '--tags', check: true).stdout().strip()
Member

Suggested change
- git_description = run_command('git', 'describe', '--tags', check: true).stdout().strip()
+ git_description = run_command('git', 'describe', '--tags').stdout().strip()


# Meson does not natively support object libraries
# https://github.com/mesonbuild/meson/issues/13843
objlib_sources = {
Member

We don't need to emulate objlib in Meson. It's just for faster builds.
We can just use library().

Contributor Author

I have a pretty poor understanding of how object libraries actually work, but I don't think we can just convert these to using the library call unless we want the linker to allow undefined symbols in these libraries as well (?)

Trying to do what I think you suggest generates a huge amount of errors like:

[7/96] Linking target src/arrow/libarrow_io.so
FAILED: src/arrow/libarrow_io.so 
c++  -o src/arrow/libarrow_io.so src/arrow/libarrow_io.so.p/io_buffered.cc.o src/arrow/libarrow_io.so.p/io_caching.cc.o src/arrow/libarrow_io.so.p/io_compressed.cc.o src/arrow/libarrow_io.so.p/io_file.cc.o src/arrow/libarrow_io.so.p/io_hdfs.cc.o src/arrow/libarrow_io.so.p/io_hdfs_internal.cc.o src/arrow/libarrow_io.so.p/io_interfaces.cc.o src/arrow/libarrow_io.so.p/io_memory.cc.o src/arrow/libarrow_io.so.p/io_slow.cc.o src/arrow/libarrow_io.so.p/io_stdio.cc.o src/arrow/libarrow_io.so.p/io_transform.cc.o -Wl,--as-needed -Wl,--no-undefined -Wl,-O1 -shared -fPIC -Wl,--start-group -Wl,-soname,libarrow_io.so -Wl,--end-group
/usr/bin/ld: src/arrow/libarrow_io.so.p/io_buffered.cc.o: in function `arrow::io::BufferedInputStream::SetBufferSize(long)':
buffered.cc:(.text+0x13a9): undefined reference to `arrow::Buffer::CheckCPU() const'
/usr/bin/ld: buffered.cc:(.text+0x13b1): undefined reference to `arrow::Buffer::CheckMutable() const'
/usr/bin/ld: buffered.cc:(.text+0x1407): undefined reference to `arrow::util::detail::StringStreamWrapper::StringStreamWrapper()'
/usr/bin/ld: buffered.cc:(.text+0x149e): undefined reference to `arrow::util::detail::StringStreamWrapper::str[abi:cxx11]()'
/usr/bin/ld: buffered.cc:(.text+0x14a6): undefined reference to `arrow::util::detail::StringStreamWrapper::~StringStreamWrapper()'
/usr/bin/ld: buffered.cc:(.text+0x14b6): undefined reference to `arrow::Status::Status(arrow::StatusCode, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'

Member

Ah, we can remove objlib_sources and static_library() for them entirely.
We can just add sources in objlib_sources to arrow_srcs and use it in one library() (that already exists in this PR).

Member

(We don't need the objlib feature at all with Meson.)

Contributor Author

Makes sense. I've still kept the top level dict for parity with CMake, and since some of the sources vary depending on compilation options (like arrow compute). Let me know if this is more in line with what you are thinking

configuration: conf_data,
format: 'cmake@',
install: true,
install_dir: 'arrow',
Member

Suggested change
- install_dir: 'arrow',
+ install_dir: 'arrow/util',

Comment on lines 43 to 80
foreach cmakedefine : [
'ARROW_COMPUTE',
'ARROW_CSV',
'ARROW_CUDA',
'ARROW_DATASET',
'ARROW_FILESYSTEM',
'ARROW_FLIGHT',
'ARROW_FLIGHT_SQL',
'ARROW_IPC',
'ARROW_JEMALLOC',
'ARROW_JEMALLOC_VENDORED',
'ARROW_JSON',
'ARROW_MIMALLOC',
'ARROW_ORC',
'ARROW_PARQUET',
'ARROW_SUBSTRAIT',
'ARROW_AZURE',
'ARROW_ENABLE_THREADING',
'ARROW_GCS',
'ARROW_HDFS',
'ARROW_S3',
'ARROW_USE_GLOG',
'ARROW_USE_NATIVE_INT128',
'ARROW_WITH_BROTLI',
'ARROW_WITH_BZ2',
'ARROW_WITH_LZ4',
'ARROW_WITH_MUSL',
'ARROW_WITH_OPENTELEMETRY',
'ARROW_WITH_RE2',
'ARROW_WITH_SNAPPY',
'ARROW_WITH_UCX',
'ARROW_WITH_UTF8PROC',
'ARROW_WITH_ZLIB',
'ARROW_WITH_ZSTD',
'PARQUET_REQUIRE_ENCRYPTION',
]
conf_data.set(cmakedefine, false)
endforeach
Member

It seems that we don't need to use foreach here:

conf_data.set('ARROW_COMPUTE', false)
conf_data.set('ARROW_CSV', false)
...

Comment on lines 101 to 102
install: true,
install_dir: 'arrow',
Member

We don't need to install _internal.h:

Suggested change
- install: true,
- install_dir: 'arrow',

@github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label on Feb 12, 2025
Contributor Author

WillAyd commented Feb 12, 2025

Sure no problem

@github-actions bot added the "awaiting change review" and "awaiting changes" labels and removed the "awaiting changes" and "awaiting change review" labels on Feb 12, 2025
@WillAyd force-pushed the meson-builds branch 2 times, most recently from b7a25a4 to 78d5fdb, on February 13, 2025 15:40
@WillAyd WillAyd marked this pull request as ready for review February 13, 2025 20:26
Member

kou commented Feb 17, 2025

@github-actions crossbow submit test-conda-cpp-meson

Revision: 564df5f

Submitted crossbow builds: ursacomputing/crossbow @ actions-527e5649ae

Task Status
test-conda-cpp-meson GitHub Actions

Member

kou left a comment

I've rebased on main, added support for updating the version in the release process, and added a nightly CI job for Meson.

Could you update the PR description?

Comment on lines 21 to 22
# Meson does not natively support object libraries
# https://github.com/mesonbuild/meson/issues/13843
Member

Can we remove this comment?
We don't need this comment now, right?

'device.h',
'extension_type.h',
'memory_pool.h',
'memory_pool_test.h',
Member

Suggested change
- 'memory_pool_test.h',

'extension_type.h',
'memory_pool.h',
'memory_pool_test.h',
'pch.h',
Member

Suggested change
- 'pch.h',

Comment on lines 277 to 279
'stl_allocator.h',
'stl.h',
'stl_iterator.h',
Member

Could you sort them?

@github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label on Feb 17, 2025
Comment on lines +94 to +96
meson test \
--print-errorlogs \
"$@"
Member

We aren't building any tests now, are we? On the current CI:

+ meson test --print-errorlogs
No tests defined.

Contributor Author

For now we aren't actually building any tests - just creating an MRE for Meson with libarrow

Member

Right. We'll add tests later.

@github-actions bot removed the "awaiting changes" label on Feb 18, 2025
@WillAyd changed the title from "GH-41816: [C++] Meson Build System Support" to "GH-41816: [C++] Add Minimal Meson Build of libarrow to CI" on Feb 18, 2025
@github-actions bot added the "awaiting change review" label on Feb 18, 2025
Contributor Author

WillAyd commented Feb 18, 2025

I've rebased on main, added support for updating the version in the release process, and added a nightly CI job for Meson.

Could you update the PR description?

I've addressed your feedback. Not sure I am doing what you are asking with respect to the PR description but I changed it to be more exact. Let me know if there is something else I should be doing.

@github-actions bot added the "awaiting changes" label and removed the "awaiting change review" label on Feb 18, 2025
Member

kou commented Feb 18, 2025

Not sure I am doing what you are asking with respect to the PR description but I changed it to be more exact.

We use the PR description as the commit message when we merge this, so we want the PR description to reflect the latest changes in the PR.

BTW, it seems that you haven't changed the PR description yet. (You might have forgotten to save your change.)

Member

kou commented Feb 18, 2025

@github-actions crossbow submit test-conda-cpp-meson

Revision: 428c2f3

Submitted crossbow builds: ursacomputing/crossbow @ actions-3c4e1b441c

Task Status
test-conda-cpp-meson GitHub Actions

Contributor Author

WillAyd commented Feb 18, 2025

Do you mean the commit message? I can squash and change that if so. On GitHub I've already changed the PR title.

Member

kou commented Feb 18, 2025

I mean the merge commit's message, not the commit messages of the commits in this PR.

You don't need to squash commits in this PR, because we use the "Squash and merge" feature provided by GitHub. We use the following commit message:

${PR_TITLE}

${PR_DESCRIPTION}

For example, ec3d283's commit message uses #44375's title and description. It doesn't use the commit messages of the commits in #44375.
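
The template above can be sketched in shell as follows. The title is this PR's final title and the description is a single sentence taken from the PR description; the variable names and the way the message is assembled are assumptions for illustration, not the project's actual merge tooling.

```shell
#!/bin/sh
set -eu

# Title and one description sentence taken from this PR; a real
# description would usually span several paragraphs.
pr_title='GH-41816: [C++] Add Minimal Meson Build of libarrow'
pr_description='This PR implements a Meson configuration that can build a minimal libarrow.'

# Subject line, blank line, then the body, matching the
# ${PR_TITLE} / ${PR_DESCRIPTION} template.
commit_message="$(printf '%s\n\n%s\n' "${pr_title}" "${pr_description}")"
printf '%s\n' "${commit_message}"

# The first line is what git treats as the commit subject.
subject="$(printf '%s\n' "${commit_message}" | head -n 1)"
```

Because the individual commit messages never enter this template, keeping the PR title and description current matters more than tidying the branch history.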

On GitHub I've already changed the PR title

Could you also update the PR description?

Contributor Author

WillAyd commented Feb 18, 2025

Ah got it - I've updated the OP

Member

kou left a comment

+1

@kou changed the title from "GH-41816: [C++] Add Minimal Meson Build of libarrow to CI" to "GH-41816: [C++] Add Minimal Meson Build of libarrow" on Feb 19, 2025
@kou kou merged commit c7a9100 into apache:main Feb 19, 2025
44 of 45 checks passed
@kou removed the "awaiting changes" label on Feb 19, 2025
@github-actions bot added the "awaiting merge" label on Feb 19, 2025
@WillAyd WillAyd deleted the meson-builds branch February 19, 2025 05:06
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit c7a9100.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.
