From ce11e561d37db3cdbc8c55e000ca46256f504dc1 Mon Sep 17 00:00:00 2001
From: Kevin Gurney
Date: Fri, 29 Mar 2024 16:57:39 -0400
Subject: [PATCH] GH-38659: [CI][MATLAB][Packaging] Add MATLAB `packaging` task to crossbow `tasks.yml` (#38660)

### Rationale for this change

Per the following mailing list discussion:

https://lists.apache.org/thread/0xyow40h7b1bptsppb0rxd4g9r1xpmh6

to integrate the MATLAB interface code with the existing Arrow release tooling, we first need to add a task to crossbow's [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55).

This packaging task will automatically create an [MLTBX file](https://www.mathworks.com/help/matlab/creating-help.html?s_tid=CRUX_lftnav) (the MATLAB equivalent of a Python binary wheel or Ruby gem) that can be installed in MATLAB via a "one-click" workflow. This will enable MATLAB users to install the interface without building it from source.

### Licensing

For more information about the licensing of the MLTBX file contents, please refer to the mailing list discussion and ASF Legal ticket linked below:

1. https://lists.apache.org/thread/zlpnncgvo6l4cvkxfxn7zt4q7qhptotw
2. https://issues.apache.org/jira/browse/LEGAL-665

### What changes are included in this PR?

1. Added a `matlab` task to the [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55) in `dev/tasks/tasks.yml`.
2. Added a new GitHub Actions workflow, `dev/tasks/matlab/github.yml`, which builds the MATLAB interface code on all platforms (Windows, macOS, and Ubuntu 20.04) and packages the generated build artifacts into a single MLTBX file using [`matlab.addons.toolbox.packageToolbox`](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html).
3. Changed the GitHub-hosted runner for the MATLAB CI check (i.e. `.github/workflows/matlab.yml`) from `ubuntu-latest` to `ubuntu-20.04`.

   The rationale for this change is that we primarily develop and qualify against Debian 11 locally, but the CI check has been building against `ubuntu-latest` (i.e. `ubuntu-22.04`). There are two issues with using `ubuntu-22.04`. First, the version of `GLIBC` shipped with `ubuntu-22.04` is not fully compatible with the version of `GLIBC` shipped with Debian 11, which results in a runtime linker error when qualifying the packaged MATLAB interface code locally on Debian 11. Second, the system version of `GLIBCXX` on `ubuntu-22.04` is not fully compatible with the version of `GLIBCXX` bundled with MATLAB R2023a (this is a relatively common issue; for example, see: https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found). Previously, we worked around this second issue in GitHub Actions by setting `LD_PRELOAD` to the system `libstdc++.so.6` before starting MATLAB to run the unit tests.

   By contrast, the version of `GLIBCXX` shipped with `ubuntu-20.04` **is** binary compatible with the version bundled with MATLAB R2023a. Therefore, we believe it is better to use `ubuntu-20.04` in the MATLAB CI checks for the time being, until we can qualify the MATLAB interface against `ubuntu-22.04`.

### Are these changes tested?

Yes.

1. Successfully submitted a crossbow `packaging` job for the MATLAB interface by commenting `@ github-actions crossbow submit matlab`. Example of a successful packaging job: https://github.com/ursacomputing/crossbow/actions/runs/6893506432/job/18753227453.
2. Manually installed the resulting MLTBX file on macOS, Windows, Debian 11, and Ubuntu 20.04, then ran all tests under `matlab/test` using `runtests . IncludeSubFolders 1`.

### Are there any user-facing changes?

No.

### Notes

1. While qualifying, we discovered that [MATLAB's programmatic packaging interface](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html) does not properly include symbolic link files in the packaged MLTBX file. We've reported this bug to the relevant MathWorks development team. As a temporary workaround, we added a workflow step that uses `patchelf` (Linux) and `install_name_tool` (macOS) to change the name of the Arrow C++ library that `libarrowproxy.so`/`libarrowproxy.dylib` depends on from the symbolic link `libarrow.so.1500`/`libarrow.1500.dylib` to the regular file `libarrow.so.1500.0.0`/`libarrow.1500.0.0.dylib`, respectively. Once this bug is resolved, we will remove this step from the workflow.

### Future Directions

1. Add tooling to upload release candidate (RC) MLTBX files to apache/arrow's GitHub Releases area and mark them as "Prerelease". In other words, modify https://github.com/apache/arrow/blob/main/dev/release/05-binary-upload.sh.
2. Add a post-release script to upload release MLTBX files to apache/arrow's GitHub Releases area (similar to how https://github.com/apache/arrow/blob/main/dev/release/post-09-python.sh works).
3. Enable nightly builds for the MATLAB interface.
4. Document how to qualify a MATLAB Arrow interface release.
5. Enable building and testing the MATLAB Arrow interface on multiple Ubuntu versions simultaneously (e.g. 20.04 *and* 22.04).
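The symlink workaround described under Notes boils down to two names: the versioned symlink that `libarrowproxy` was linked against, and the regular file that symlink points to. The sketch below is a hypothetical illustration, not the workflow's actual code (which uses `find ... -type l` plus a shell glob). It assumes GNU `readlink -f` is available (older macOS releases lack `-f`), and the helper names `real_lib_name` and `replace_needed_cmd` are made up for this example. It only prints the `patchelf --replace-needed` command instead of running it, so it is safe to try without `patchelf` installed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: derive the names used by the patchelf workaround
# from a versioned shared-library symlink. Not the workflow's exact code.
set -euo pipefail

# Print the file name of the regular file a library symlink resolves to,
# e.g. libarrow.so.1500 -> libarrow.so.1500.0.0.
# Assumes GNU coreutils `readlink -f`.
real_lib_name() {
  basename "$(readlink -f "$1")"
}

# Print (but do not run) the patchelf command that would rewrite the
# DT_NEEDED entry of a consumer library from the symlink name to the
# real file name. No files are modified.
replace_needed_cmd() {
  local symlink="$1" consumer="$2"
  printf 'patchelf --replace-needed %s %s %s\n' \
    "$(basename "$symlink")" "$(real_lib_name "$symlink")" "$consumer"
}
```

For example, run from a directory where `libarrow.so.1500` is a symlink to `libarrow.so.1500.0.0`, `replace_needed_cmd libarrow.so.1500 libarrowproxy.so` prints `patchelf --replace-needed libarrow.so.1500 libarrow.so.1500.0.0 libarrowproxy.so`. After patching, `readelf -d libarrowproxy.so` should list the fully versioned name among its `NEEDED` entries; on macOS, `otool -L libarrowproxy.dylib` shows the equivalent install names.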
* Closes: #38659
* GitHub Issue: #38659

Lead-authored-by: Sarah Gilmore
Co-authored-by: Kevin Gurney
Signed-off-by: Kevin Gurney
---
 .github/workflows/matlab.yml | 28 +++--
 dev/tasks/matlab/github.yml | 162 ++++++++++++++++++++++++++
 dev/tasks/tasks.yml | 9 ++
 matlab/CMakeLists.txt | 17 ---
 matlab/tools/packageMatlabInterface.m | 84 +++++++++++++
 5 files changed, 273 insertions(+), 27 deletions(-)
 create mode 100644 dev/tasks/matlab/github.yml
 create mode 100644 matlab/tools/packageMatlabInterface.m

diff --git a/.github/workflows/matlab.yml b/.github/workflows/matlab.yml index eceeb551a0653..dfc734e043371 100644 --- a/.github/workflows/matlab.yml +++ b/.github/workflows/matlab.yml @@ -42,7 +42,23 @@ jobs: ubuntu: name: AMD64 Ubuntu 20.04 MATLAB - runs-on: ubuntu-latest + # Explicitly pin the Ubuntu version to 20.04 for the time being because: + # + # 1. The version of GLIBCXX shipped with Ubuntu 22.04 is not binary compatible + # with the GLIBCXX bundled with MATLAB R2023a. This is a relatively common + # issue. + # + # For example, see: + # + # https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found + # + # 2. The version of GLIBCXX shipped with Ubuntu 22.04 is not binary compatible with + # the version of GLIBCXX shipped with Debian 11. Several of the Arrow community + # members who work on the MATLAB bindings use Debian 11 locally for qualification. + # Using Ubuntu 20.04 eases development workflows for these community members. + # + # In the future, we can investigate adding support for building against more Linux (e.g. `ubuntu-22.04`) and MATLAB versions (e.g. R2023b).
+ runs-on: ubuntu-20.04 if: ${{ !contains(github.event.pull_request.title, 'WIP') }} steps: - name: Check out repository @@ -74,14 +90,6 @@ jobs: run: ci/scripts/matlab_build.sh $(pwd) - name: Run MATLAB Tests env: - # libarrow.so requires a more recent version of libstdc++.so - # than is bundled with MATLAB under /sys/os/glnxa64. - # Therefore, if a MEX function that depends on libarrow.so - # is executed within the MATLAB address space, runtime linking - # errors will occur. To work around this issue, we can explicitly - # force MATLAB to use the system libstdc++.so via LD_PRELOAD. - LD_PRELOAD: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 - # Add the installation directory to the MATLAB Search Path by # setting the MATLABPATH environment variable. MATLABPATH: matlab/install/arrow_matlab @@ -89,7 +97,7 @@ jobs: with: select-by-folder: matlab/test macos: - name: AMD64 macOS 11 MATLAB + name: AMD64 macOS 12 MATLAB runs-on: macos-latest if: ${{ !contains(github.event.pull_request.title, 'WIP') }} steps: diff --git a/dev/tasks/matlab/github.yml b/dev/tasks/matlab/github.yml new file mode 100644 index 0000000000000..1cd3949efbcf8 --- /dev/null +++ b/dev/tasks/matlab/github.yml @@ -0,0 +1,162 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. 
See the License for the +# specific language governing permissions and limitations +# under the License. + +{% import 'macros.jinja' as macros with context %} + +{{ macros.github_header() }} + +jobs: + + ubuntu: + name: AMD64 Ubuntu 20.04 MATLAB + runs-on: ubuntu-20.04 + steps: + {{ macros.github_checkout_arrow()|indent }} + - name: Install ninja-build + run: sudo apt-get update && sudo apt-get install ninja-build + - name: Install MATLAB + uses: matlab-actions/setup-matlab@v1 + with: + release: R2023a + - name: Build MATLAB Interface + env: + {{ macros.github_set_sccache_envvars()|indent(8) }} + run: arrow/ci/scripts/matlab_build.sh $(pwd)/arrow + - name: Change shared library dependency name + # MATLAB's programmatic packaging interface does not properly + # include symbolic link files in the package MLTBX - this is a + # bug. As a temporary workaround, change the expected name of the + # Arrow C++ library which libarrowproxy.so depends on. For example, + # change libarrow.so.1500 to libarrow.so.1500.0.0. + run: | + pushd arrow/matlab/install/arrow_matlab/+libmexclass/+proxy/ + SYMLINK_ARROW_LIB="$(find . 
-name 'libarrow.so.*' -type l | xargs basename)" + REGULAR_ARROW_LIB="$(echo libarrow.so.*.*)" + echo "SYMLINK_ARROW_LIB = ${SYMLINK_ARROW_LIB}" + echo "REGULAR_ARROW_LIB = ${REGULAR_ARROW_LIB}" + patchelf --replace-needed $SYMLINK_ARROW_LIB $REGULAR_ARROW_LIB libarrowproxy.so + popd + - name: Compress into single artifact + run: tar -cvzf matlab-arrow-ubuntu.tar.gz arrow/matlab/install/arrow_matlab + - name: Upload artifacts + uses: actions/upload-artifact@v4 + with: + name: matlab-arrow-ubuntu.tar.gz + path: matlab-arrow-ubuntu.tar.gz + + macos: + name: AMD64 macOS 12 MATLAB + runs-on: macos-latest + steps: + {{ macros.github_checkout_arrow()|indent }} + - name: Install ninja-build + run: brew install ninja + - name: Install MATLAB + uses: matlab-actions/setup-matlab@v1 + with: + release: R2023a + - name: Build MATLAB Interface + env: + {{ macros.github_set_sccache_envvars()|indent(8) }} + run: arrow/ci/scripts/matlab_build.sh $(pwd)/arrow + - name: Change shared library dependency name + # MATLAB's programmatic packaging interface does not properly + # include symbolic link files in the package MLTBX - this is a + # bug. As a temporary workaround, change the expected name of the + # Arrow C++ library which libarrowproxy.dylib depends on. + # For example, change libarrow.1500.dylib to libarrow.1500.0.0.dylib. + run: | + pushd arrow/matlab/install/arrow_matlab/+libmexclass/+proxy + SYMLINK_ARROW_LIB="$(find . 
-name 'libarrow.*.dylib' -type l | xargs basename)" + REGULAR_ARROW_LIB="$(echo libarrow.*.*.dylib)" + echo "SYMLINK_ARROW_LIB = ${SYMLINK_ARROW_LIB}" + echo "REGULAR_ARROW_LIB = ${REGULAR_ARROW_LIB}" + install_name_tool -change @rpath/$SYMLINK_ARROW_LIB @rpath/$REGULAR_ARROW_LIB libarrowproxy.dylib + popd + - name: Compress into single artifact + run: tar -cvzf matlab-arrow-macos.tar.gz arrow/matlab/install/arrow_matlab + - name: Upload artifacts + uses: actions/upload-artifact@v4 + with: + name: matlab-arrow-macos.tar.gz + path: matlab-arrow-macos.tar.gz + + windows: + name: AMD64 Windows 2022 MATLAB + runs-on: windows-2022 + steps: + {{ macros.github_checkout_arrow()|indent }} + - name: Install MATLAB + uses: matlab-actions/setup-matlab@v1 + with: + release: R2023a + - name: Install sccache + shell: bash + run: arrow/ci/scripts/install_sccache.sh pc-windows-msvc $(pwd)/sccache + - name: Build MATLAB Interface + shell: cmd + env: + {{ macros.github_set_sccache_envvars()|indent(8) }} + run: | + call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64 + bash -c "arrow/ci/scripts/matlab_build.sh $(pwd)/arrow" + - name: Compress into single artifact + shell: bash + run: tar -cvzf matlab-arrow-windows.tar.gz arrow/matlab/install/arrow_matlab + - name: Upload artifacts + uses: actions/upload-artifact@v4 + with: + name: matlab-arrow-windows.tar.gz + path: matlab-arrow-windows.tar.gz + + package-mltbx: + name: Package MATLAB Toolbox (MLTBX) Files + runs-on: ubuntu-latest + needs: + - ubuntu + - macos + - windows + steps: + {{ macros.github_checkout_arrow(fetch_depth=0)|indent }} + - name: Download Artifacts + uses: actions/download-artifact@v4 + with: + path: artifacts-downloaded + - name: Decompress Artifacts + run: | + mv artifacts-downloaded/*/*.tar.gz . 
+ tar -xzvf matlab-arrow-ubuntu.tar.gz + tar -xzvf matlab-arrow-macos.tar.gz + tar -xzvf matlab-arrow-windows.tar.gz + - name: Copy LICENSE.txt and NOTICE.txt for packaging + run: | + cp arrow/LICENSE.txt arrow/matlab/install/arrow_matlab/LICENSE.txt + cp arrow/NOTICE.txt arrow/matlab/install/arrow_matlab/NOTICE.txt + - name: Install MATLAB + uses: matlab-actions/setup-matlab@v1 + with: + release: R2023a + - name: Run commands + env: + MATLABPATH: arrow/matlab/tools + ARROW_MATLAB_TOOLBOX_FOLDER: arrow/matlab/install/arrow_matlab + ARROW_MATLAB_TOOLBOX_OUTPUT_FOLDER: artifacts/matlab-dist + ARROW_MATLAB_TOOLBOX_VERSION: {{ arrow.no_rc_version }} + uses: matlab-actions/run-command@v1 + with: + command: packageMatlabInterface + {{ macros.github_upload_releases(["artifacts/matlab-dist/*.mltbx"])|indent }} diff --git a/dev/tasks/tasks.yml b/dev/tasks/tasks.yml index 2abfbc15174df..5e1ef8d13b988 100644 --- a/dev/tasks/tasks.yml +++ b/dev/tasks/tasks.yml @@ -59,6 +59,7 @@ groups: - conan-* - debian-* - java-jars + - matlab - nuget - python-sdist - r-binary-packages @@ -665,6 +666,14 @@ tasks: params: formula: apache-arrow.rb + ############################## MATLAB Packages ################################ + + matlab: + ci: github + template: matlab/github.yml + artifacts: + - matlab-arrow-{no_rc_version}.mltbx + ############################## Arrow JAR's ################################## java-jars: diff --git a/matlab/CMakeLists.txt b/matlab/CMakeLists.txt index 206ecb318b3cc..b85f782d2d37a 100644 --- a/matlab/CMakeLists.txt +++ b/matlab/CMakeLists.txt @@ -201,9 +201,6 @@ get_filename_component(ARROW_SHARED_LIB_DIR ${ARROW_SHARED_LIB} DIRECTORY) get_filename_component(ARROW_SHARED_LIB_FILENAME ${ARROW_SHARED_LIB} NAME_WE) if(NOT Arrow_FOUND) - # If Arrow_FOUND is false, Arrow is built by the arrow_shared target and needs - # to be copied to CMAKE_PACKAGED_INSTALL_DIR. - if(APPLE) # Install libarrow.dylib (symlink) and the real files it points to. 
# on macOS, we need to match these files: libarrow.dylib @@ -226,20 +223,6 @@ if(NOT Arrow_FOUND) set(SHARED_LIBRARY_VERSION_REGEX ${ARROW_SHARED_LIB_FILENAME}${CMAKE_SHARED_LIBRARY_SUFFIX}) endif() - - # The subfolders cmake and pkgconfig are excluded as they will be empty. - # Note: The following CMake Issue suggests enabling an option to exclude all - # folders that would be empty after installation: - # https://gitlab.kitware.com/cmake/cmake/-/issues/17122 - - set(CMAKE_PACKAGED_INSTALL_DIR "${CMAKE_INSTALL_DIR}/+arrow") - - install(DIRECTORY "${ARROW_SHARED_LIB_DIR}/" - DESTINATION ${CMAKE_PACKAGED_INSTALL_DIR} - FILES_MATCHING - REGEX ${SHARED_LIBRARY_VERSION_REGEX} - PATTERN "cmake" EXCLUDE - PATTERN "pkgconfig" EXCLUDE) endif() # MATLAB_ADD_INSTALL_DIR_TO_STARTUP_FILE toggles whether an addpath command to add the install diff --git a/matlab/tools/packageMatlabInterface.m b/matlab/tools/packageMatlabInterface.m new file mode 100644 index 0000000000000..55b4d4241a569 --- /dev/null +++ b/matlab/tools/packageMatlabInterface.m @@ -0,0 +1,84 @@ +% Licensed to the Apache Software Foundation (ASF) under one +% or more contributor license agreements. See the NOTICE file +% distributed with this work for additional information +% regarding copyright ownership. The ASF licenses this file +% to you under the Apache License, Version 2.0 (the +% "License"); you may not use this file except in compliance +% with the License. You may obtain a copy of the License at +% +% http://www.apache.org/licenses/LICENSE-2.0 +% +% Unless required by applicable law or agreed to in writing, +% software distributed under the License is distributed on an +% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +% KIND, either express or implied. See the License for the +% specific language governing permissions and limitations +% under the License. 
+ +toolboxFolder = string(getenv("ARROW_MATLAB_TOOLBOX_FOLDER")); +outputFolder = string(getenv("ARROW_MATLAB_TOOLBOX_OUTPUT_FOLDER")); +toolboxVersionRaw = string(getenv("ARROW_MATLAB_TOOLBOX_VERSION")); + +appendLicenseText(fullfile(toolboxFolder, "LICENSE.txt")); +appendNoticeText(fullfile(toolboxFolder, "NOTICE.txt")); + +% Output folder must exist. +mkdir(outputFolder); + +disp("Toolbox Folder: " + toolboxFolder); +disp("Output Folder: " + outputFolder); +disp("Toolbox Version Raw: " + toolboxVersionRaw); + + +% Note: This string processing heuristic may not be robust to future +% changes in the Arrow versioning scheme. +dotIdx = strfind(toolboxVersionRaw, "."); +numDots = numel(dotIdx); +if numDots >= 3 + toolboxVersion = extractBefore(toolboxVersionRaw, dotIdx(3)); +else + toolboxVersion = toolboxVersionRaw; +end + +disp("Toolbox Version:" + toolboxVersion); + +identifier = "ad1d0fe6-22d1-4969-9e6f-0ab5d0f12ce3"; +opts = matlab.addons.toolbox.ToolboxOptions(toolboxFolder, identifier); +opts.ToolboxName = "MATLAB Arrow Interface"; +opts.ToolboxVersion = toolboxVersion; +opts.AuthorName = "The Apache Software Foundation"; +opts.AuthorEmail = "dev@arrow.apache.org"; + +% Set the SupportedPlatforms +opts.SupportedPlatforms.Win64 = true; +opts.SupportedPlatforms.Maci64 = true; +opts.SupportedPlatforms.Glnxa64 = true; +opts.SupportedPlatforms.MatlabOnline = true; + +% Interface is only qualified against R2023a at the moment +opts.MinimumMatlabRelease = "R2023a"; +opts.MaximumMatlabRelease = "R2023a"; + +opts.OutputFile = fullfile(outputFolder, compose("matlab-arrow-%s.mltbx", toolboxVersionRaw)); +disp("Output File: " + opts.OutputFile); +matlab.addons.toolbox.packageToolbox(opts); + +function appendLicenseText(filename) + licenseText = [ ... 
+ newline + "--------------------------------------------------------------------------------" + newline + "3rdparty dependency mathworks/libmexclass is redistributed as a dynamically" + "linked shared library in certain binary distributions, like the MATLAB" + "distribution." + newline + "Copyright: 2022-2024 The MathWorks, Inc. All rights reserved." + "Homepage: https://github.com/mathworks/libmexclass" + "License: 3-clause BSD" ]; + writelines(licenseText, filename, WriteMode="append"); +end + +function appendNoticeText(filename) + noticeText = [ ... + newline + "---------------------------------------------------------------------------------" + newline + "This product includes software from The MathWorks, Inc. (Apache 2.0)" + " * Copyright (C) 2024 The MathWorks, Inc."]; + writelines(noticeText, filename, WriteMode="append"); +end \ No newline at end of file