apacheGH-38659: [CI][MATLAB][Packaging] Add MATLAB packaging task to crossbow `tasks.yml` (apache#38660)

### Rationale for this change

Per the following mailing list discussion:

https://lists.apache.org/thread/0xyow40h7b1bptsppb0rxd4g9r1xpmh6

to integrate the MATLAB interface code with the existing Arrow release tooling, we first need to add a task to the [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55) in crossbow. This packaging task will automatically create an [MLTBX file](https://www.mathworks.com/help/matlab/creating-help.html?s_tid=CRUX_lftnav) (the MATLAB equivalent of a Python binary wheel or Ruby gem) that can be installed via a "one-click" workflow in MATLAB. This will enable MATLAB users to install the interface without needing to build from source.

### Licensing

For more information about licensing of the MLTBX file contents, please refer to the mailing list discussion and ASF Legal ticket linked below:

1. https://lists.apache.org/thread/zlpnncgvo6l4cvkxfxn7zt4q7qhptotw
2. https://issues.apache.org/jira/browse/LEGAL-665

### What changes are included in this PR?

1. Added a `matlab` task to the [`packaging` group](https://github.com/apache/arrow/blob/1fd11d33cb56fd7eff4dce05edaba1c9d8a1dccd/dev/tasks/tasks.yml#L55) in `dev/tasks/tasks.yml`.
2. Added a new GitHub Actions workflow called `dev/tasks/matlab/github.yml` which builds the MATLAB interface code on all platforms (Windows, macOS, and Ubuntu 20.04) and packages the generated build artifacts into a single MLTBX file using [`matlab.addons.toolbox.packageToolbox`](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html).
3. Changed the GitHub-hosted runner for the MATLAB CI check (i.e. `.github/workflows/matlab.yml`) from `ubuntu-latest` to `ubuntu-20.04`. We primarily develop and qualify against Debian 11 locally, but the CI check has been building against `ubuntu-latest` (i.e. `ubuntu-22.04`). There are two issues with using `ubuntu-22.04`. First, the version of `GLIBC` shipped with `ubuntu-22.04` is not fully compatible with the version of `GLIBC` shipped with Debian 11, which results in a runtime linker error when qualifying the packaged MATLAB interface code locally on Debian 11. Second, the system version of `GLIBCXX` on `ubuntu-22.04` is not fully compatible with the version of `GLIBCXX` bundled with MATLAB R2023a (this is a relatively common issue; e.g. see: https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found). Previously, we worked around this issue in GitHub Actions by setting `LD_PRELOAD` before starting MATLAB to run the unit tests. By contrast, the version of `GLIBCXX` shipped with `ubuntu-20.04` **is** binary compatible with the version bundled with MATLAB R2023a. Therefore, we believe it is better to use `ubuntu-20.04` in the MATLAB CI checks until we can qualify the MATLAB interface against `ubuntu-22.04`.
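
One way to see which `GLIBCXX` symbol versions a given `libstdc++` provides is to scan the library for version strings. The following is a minimal sketch and not part of this PR: the helper name `max_glibcxx` is hypothetical, and it assumes GNU `grep` and `sort` are available.

```shell
# Hypothetical helper (not part of the PR): print the highest
# GLIBCXX_x.y.z version string found on standard input.
max_glibcxx() {
  # -a treats binary input as text; -o prints only the matching part.
  # sort -V orders version numbers numerically, so 3.4.30 sorts after 3.4.9.
  grep -ao 'GLIBCXX_[0-9][0-9.]*' | sort -uV | tail -n 1
}

# Example using literal input in place of a real library file:
printf 'GLIBCXX_3.4.9\nGLIBCXX_3.4.30\nGLIBCXX_3.4.28\n' | max_glibcxx
```

On a Linux machine, the same helper could be pointed at the system library (for example `max_glibcxx < /usr/lib/x86_64-linux-gnu/libstdc++.so.6`) and at the copy bundled with MATLAB under `<matlabroot>/sys/os/glnxa64`, and the two results compared.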

### Are these changes tested?

Yes.

1. Successfully submitted a crossbow `packaging` job for the MATLAB interface by commenting `@ github-actions crossbow submit matlab`. Example of a successful packaging job: https://github.com/ursacomputing/crossbow/actions/runs/6893506432/job/18753227453.
2. Manually installed the resulting MLTBX file on macOS, Windows, Debian 11, and Ubuntu 20.04. Ran all tests under `matlab/test` using `runtests . IncludeSubFolders 1`.

### Are there any user-facing changes?

No.

### Notes
 
1. While qualifying, we discovered that [MATLAB's programmatic packaging interface](https://www.mathworks.com/help/matlab/ref/matlab.addons.toolbox.packagetoolbox.html) does not properly include symbolic link files in the packaged MLTBX file. We've reported this bug to the relevant MathWorks development team. As a temporary workaround, we included a step to change the expected name of the Arrow C++ libraries (using `patchelf`/`install_name_tool`) which `libarrowproxy.so`/`libarrowproxy.dylib` depends on to `libarrow.so.1500.0.0`/`libarrow.1500.0.0.dylib` instead of `libarrow.so.1500`/`libarrow.1500.dylib`, respectively. Once this bug is resolved, we will remove this step from the workflow.
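
The name lookup behind this workaround can be tried in isolation. The sketch below is illustrative only: it builds a scratch directory with an empty stand-in file and a symlink (the file names mirror the example above but are otherwise assumptions), then resolves the two names the same way the workflow step does. It does not run `patchelf` itself.

```shell
# Build a scratch directory that mimics the installed layout.
# The files are empty stand-ins, not real Arrow libraries.
workdir="$(mktemp -d)"
touch "$workdir/libarrow.so.1500.0.0"
ln -s libarrow.so.1500.0.0 "$workdir/libarrow.so.1500"

# The symlink name is what the proxy library depends on by default ...
SYMLINK_ARROW_LIB="$(find "$workdir" -name 'libarrow.so.*' -type l | xargs basename)"
# ... and the fully versioned regular file is what it should depend on instead.
REGULAR_ARROW_LIB="$(cd "$workdir" && echo libarrow.so.*.*)"

echo "SYMLINK_ARROW_LIB = ${SYMLINK_ARROW_LIB}"
echo "REGULAR_ARROW_LIB = ${REGULAR_ARROW_LIB}"

# In the real workflow the dependency entry is then rewritten with:
#   patchelf --replace-needed "$SYMLINK_ARROW_LIB" "$REGULAR_ARROW_LIB" libarrowproxy.so
```

Because the rewritten dependency points at a regular file rather than a symlink, the packaged MLTBX no longer needs the symlink to be present at load time.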

### Future Directions
 
1. Add tooling to upload release candidate (RC) MLTBX files to apache/arrow's GitHub Releases area and mark them as "Prerelease". In other words, modify https://github.com/apache/arrow/blob/main/dev/release/05-binary-upload.sh.
2. Add a post-release script to upload release MLTBX files to apache/arrow's GitHub Releases area (similar to how https://github.com/apache/arrow/blob/main/dev/release/post-09-python.sh works).
3. Enable nightly builds for the MATLAB interface.
4. Document how to qualify a MATLAB Arrow interface release.
5. Enable building and testing the MATLAB Arrow interface on multiple Ubuntu distributions simultaneously (e.g. 20.04 *and* 22.04).

* Closes: apache#38659 
* GitHub Issue: apache#38659

Lead-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney authored Mar 29, 2024
1 parent d32e4b0 commit ce11e56
Showing 5 changed files with 273 additions and 27 deletions.
28 changes: 18 additions & 10 deletions .github/workflows/matlab.yml
@@ -42,7 +42,23 @@ jobs:

ubuntu:
name: AMD64 Ubuntu 20.04 MATLAB
runs-on: ubuntu-latest
# Explicitly pin the Ubuntu version to 20.04 for the time being because:
#
# 1. The version of GLIBCXX shipped with Ubuntu 22.04 is not binary compatible
# with the GLIBCXX bundled with MATLAB R2023a. This is a relatively common
# issue.
#
# For example, see:
#
# https://www.mathworks.com/matlabcentral/answers/1907290-how-to-manually-select-the-libstdc-library-to-use-to-resolve-a-version-glibcxx_-not-found
#
# 2. The version of GLIBCXX shipped with Ubuntu 22.04 is not binary compatible with
# the version of GLIBCXX shipped with Debian 11. Several of the Arrow community
# members who work on the MATLAB bindings use Debian 11 locally for qualification.
# Using Ubuntu 20.04 eases development workflows for these community members.
#
# In the future, we can investigate adding support for building against more Linux (e.g. `ubuntu-22.04`) and MATLAB versions (e.g. R2023b).
runs-on: ubuntu-20.04
if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
steps:
- name: Check out repository
@@ -74,22 +90,14 @@ jobs:
run: ci/scripts/matlab_build.sh $(pwd)
- name: Run MATLAB Tests
env:
# libarrow.so requires a more recent version of libstdc++.so
# than is bundled with MATLAB under <matlabroot>/sys/os/glnxa64.
# Therefore, if a MEX function that depends on libarrow.so
# is executed within the MATLAB address space, runtime linking
# errors will occur. To work around this issue, we can explicitly
# force MATLAB to use the system libstdc++.so via LD_PRELOAD.
LD_PRELOAD: /usr/lib/x86_64-linux-gnu/libstdc++.so.6

# Add the installation directory to the MATLAB Search Path by
# setting the MATLABPATH environment variable.
MATLABPATH: matlab/install/arrow_matlab
uses: matlab-actions/run-tests@v2
with:
select-by-folder: matlab/test
macos:
name: AMD64 macOS 11 MATLAB
name: AMD64 macOS 12 MATLAB
runs-on: macos-latest
if: ${{ !contains(github.event.pull_request.title, 'WIP') }}
steps:
162 changes: 162 additions & 0 deletions dev/tasks/matlab/github.yml
@@ -0,0 +1,162 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

{% import 'macros.jinja' as macros with context %}

{{ macros.github_header() }}

jobs:

ubuntu:
name: AMD64 Ubuntu 20.04 MATLAB
runs-on: ubuntu-20.04
steps:
{{ macros.github_checkout_arrow()|indent }}
- name: Install ninja-build
run: sudo apt-get update && sudo apt-get install ninja-build
- name: Install MATLAB
uses: matlab-actions/setup-matlab@v1
with:
release: R2023a
- name: Build MATLAB Interface
env:
{{ macros.github_set_sccache_envvars()|indent(8) }}
run: arrow/ci/scripts/matlab_build.sh $(pwd)/arrow
- name: Change shared library dependency name
# MATLAB's programmatic packaging interface does not properly
# include symbolic link files in the package MLTBX - this is a
# bug. As a temporary workaround, change the expected name of the
# Arrow C++ library which libarrowproxy.so depends on. For example,
# change libarrow.so.1500 to libarrow.so.1500.0.0.
run: |
pushd arrow/matlab/install/arrow_matlab/+libmexclass/+proxy/
SYMLINK_ARROW_LIB="$(find . -name 'libarrow.so.*' -type l | xargs basename)"
REGULAR_ARROW_LIB="$(echo libarrow.so.*.*)"
echo "SYMLINK_ARROW_LIB = ${SYMLINK_ARROW_LIB}"
echo "REGULAR_ARROW_LIB = ${REGULAR_ARROW_LIB}"
patchelf --replace-needed $SYMLINK_ARROW_LIB $REGULAR_ARROW_LIB libarrowproxy.so
popd
- name: Compress into single artifact
run: tar -cvzf matlab-arrow-ubuntu.tar.gz arrow/matlab/install/arrow_matlab
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: matlab-arrow-ubuntu.tar.gz
path: matlab-arrow-ubuntu.tar.gz

macos:
name: AMD64 macOS 12 MATLAB
runs-on: macos-latest
steps:
{{ macros.github_checkout_arrow()|indent }}
- name: Install ninja-build
run: brew install ninja
- name: Install MATLAB
uses: matlab-actions/setup-matlab@v1
with:
release: R2023a
- name: Build MATLAB Interface
env:
{{ macros.github_set_sccache_envvars()|indent(8) }}
run: arrow/ci/scripts/matlab_build.sh $(pwd)/arrow
- name: Change shared library dependency name
# MATLAB's programmatic packaging interface does not properly
# include symbolic link files in the package MLTBX - this is a
# bug. As a temporary workaround, change the expected name of the
# Arrow C++ library which libarrowproxy.dylib depends on.
# For example, change libarrow.1500.dylib to libarrow.1500.0.0.dylib.
run: |
pushd arrow/matlab/install/arrow_matlab/+libmexclass/+proxy
SYMLINK_ARROW_LIB="$(find . -name 'libarrow.*.dylib' -type l | xargs basename)"
REGULAR_ARROW_LIB="$(echo libarrow.*.*.dylib)"
echo "SYMLINK_ARROW_LIB = ${SYMLINK_ARROW_LIB}"
echo "REGULAR_ARROW_LIB = ${REGULAR_ARROW_LIB}"
install_name_tool -change @rpath/$SYMLINK_ARROW_LIB @rpath/$REGULAR_ARROW_LIB libarrowproxy.dylib
popd
- name: Compress into single artifact
run: tar -cvzf matlab-arrow-macos.tar.gz arrow/matlab/install/arrow_matlab
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: matlab-arrow-macos.tar.gz
path: matlab-arrow-macos.tar.gz

windows:
name: AMD64 Windows 2022 MATLAB
runs-on: windows-2022
steps:
{{ macros.github_checkout_arrow()|indent }}
- name: Install MATLAB
uses: matlab-actions/setup-matlab@v1
with:
release: R2023a
- name: Install sccache
shell: bash
run: arrow/ci/scripts/install_sccache.sh pc-windows-msvc $(pwd)/sccache
- name: Build MATLAB Interface
shell: cmd
env:
{{ macros.github_set_sccache_envvars()|indent(8) }}
run: |
call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
bash -c "arrow/ci/scripts/matlab_build.sh $(pwd)/arrow"
- name: Compress into single artifact
shell: bash
run: tar -cvzf matlab-arrow-windows.tar.gz arrow/matlab/install/arrow_matlab
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: matlab-arrow-windows.tar.gz
path: matlab-arrow-windows.tar.gz

package-mltbx:
name: Package MATLAB Toolbox (MLTBX) Files
runs-on: ubuntu-latest
needs:
- ubuntu
- macos
- windows
steps:
{{ macros.github_checkout_arrow(fetch_depth=0)|indent }}
- name: Download Artifacts
uses: actions/download-artifact@v4
with:
path: artifacts-downloaded
- name: Decompress Artifacts
run: |
mv artifacts-downloaded/*/*.tar.gz .
tar -xzvf matlab-arrow-ubuntu.tar.gz
tar -xzvf matlab-arrow-macos.tar.gz
tar -xzvf matlab-arrow-windows.tar.gz
- name: Copy LICENSE.txt and NOTICE.txt for packaging
run: |
cp arrow/LICENSE.txt arrow/matlab/install/arrow_matlab/LICENSE.txt
cp arrow/NOTICE.txt arrow/matlab/install/arrow_matlab/NOTICE.txt
- name: Install MATLAB
uses: matlab-actions/setup-matlab@v1
with:
release: R2023a
- name: Run commands
env:
MATLABPATH: arrow/matlab/tools
ARROW_MATLAB_TOOLBOX_FOLDER: arrow/matlab/install/arrow_matlab
ARROW_MATLAB_TOOLBOX_OUTPUT_FOLDER: artifacts/matlab-dist
ARROW_MATLAB_TOOLBOX_VERSION: {{ arrow.no_rc_version }}
uses: matlab-actions/run-command@v1
with:
command: packageMatlabInterface
{{ macros.github_upload_releases(["artifacts/matlab-dist/*.mltbx"])|indent }}
9 changes: 9 additions & 0 deletions dev/tasks/tasks.yml
@@ -59,6 +59,7 @@ groups:
- conan-*
- debian-*
- java-jars
- matlab
- nuget
- python-sdist
- r-binary-packages
@@ -665,6 +666,14 @@ tasks:
params:
formula: apache-arrow.rb

############################## MATLAB Packages ################################

matlab:
ci: github
template: matlab/github.yml
artifacts:
- matlab-arrow-{no_rc_version}.mltbx

############################## Arrow JAR's ##################################

java-jars:
17 changes: 0 additions & 17 deletions matlab/CMakeLists.txt
@@ -201,9 +201,6 @@ get_filename_component(ARROW_SHARED_LIB_DIR ${ARROW_SHARED_LIB} DIRECTORY)
get_filename_component(ARROW_SHARED_LIB_FILENAME ${ARROW_SHARED_LIB} NAME_WE)

if(NOT Arrow_FOUND)
# If Arrow_FOUND is false, Arrow is built by the arrow_shared target and needs
# to be copied to CMAKE_PACKAGED_INSTALL_DIR.

if(APPLE)
# Install libarrow.dylib (symlink) and the real files it points to.
# on macOS, we need to match these files: libarrow.dylib
Expand All @@ -226,20 +223,6 @@ if(NOT Arrow_FOUND)
set(SHARED_LIBRARY_VERSION_REGEX
${ARROW_SHARED_LIB_FILENAME}${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()

# The subfolders cmake and pkgconfig are excluded as they will be empty.
# Note: The following CMake Issue suggests enabling an option to exclude all
# folders that would be empty after installation:
# https://gitlab.kitware.com/cmake/cmake/-/issues/17122

set(CMAKE_PACKAGED_INSTALL_DIR "${CMAKE_INSTALL_DIR}/+arrow")

install(DIRECTORY "${ARROW_SHARED_LIB_DIR}/"
DESTINATION ${CMAKE_PACKAGED_INSTALL_DIR}
FILES_MATCHING
REGEX ${SHARED_LIBRARY_VERSION_REGEX}
PATTERN "cmake" EXCLUDE
PATTERN "pkgconfig" EXCLUDE)
endif()

# MATLAB_ADD_INSTALL_DIR_TO_STARTUP_FILE toggles whether an addpath command to add the install
84 changes: 84 additions & 0 deletions matlab/tools/packageMatlabInterface.m
@@ -0,0 +1,84 @@
% Licensed to the Apache Software Foundation (ASF) under one
% or more contributor license agreements. See the NOTICE file
% distributed with this work for additional information
% regarding copyright ownership. The ASF licenses this file
% to you under the Apache License, Version 2.0 (the
% "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing,
% software distributed under the License is distributed on an
% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
% KIND, either express or implied. See the License for the
% specific language governing permissions and limitations
% under the License.

toolboxFolder = string(getenv("ARROW_MATLAB_TOOLBOX_FOLDER"));
outputFolder = string(getenv("ARROW_MATLAB_TOOLBOX_OUTPUT_FOLDER"));
toolboxVersionRaw = string(getenv("ARROW_MATLAB_TOOLBOX_VERSION"));

appendLicenseText(fullfile(toolboxFolder, "LICENSE.txt"));
appendNoticeText(fullfile(toolboxFolder, "NOTICE.txt"));

% Output folder must exist.
mkdir(outputFolder);

disp("Toolbox Folder: " + toolboxFolder);
disp("Output Folder: " + outputFolder);
disp("Toolbox Version Raw: " + toolboxVersionRaw);


% Note: This string processing heuristic may not be robust to future
% changes in the Arrow versioning scheme.
dotIdx = strfind(toolboxVersionRaw, ".");
numDots = numel(dotIdx);
if numDots >= 3
toolboxVersion = extractBefore(toolboxVersionRaw, dotIdx(3));
else
toolboxVersion = toolboxVersionRaw;
end

disp("Toolbox Version:" + toolboxVersion);

identifier = "ad1d0fe6-22d1-4969-9e6f-0ab5d0f12ce3";
opts = matlab.addons.toolbox.ToolboxOptions(toolboxFolder, identifier);
opts.ToolboxName = "MATLAB Arrow Interface";
opts.ToolboxVersion = toolboxVersion;
opts.AuthorName = "The Apache Software Foundation";
opts.AuthorEmail = "dev@arrow.apache.org";

% Set the SupportedPlatforms
opts.SupportedPlatforms.Win64 = true;
opts.SupportedPlatforms.Maci64 = true;
opts.SupportedPlatforms.Glnxa64 = true;
opts.SupportedPlatforms.MatlabOnline = true;

% Interface is only qualified against R2023a at the moment
opts.MinimumMatlabRelease = "R2023a";
opts.MaximumMatlabRelease = "R2023a";

opts.OutputFile = fullfile(outputFolder, compose("matlab-arrow-%s.mltbx", toolboxVersionRaw));
disp("Output File: " + opts.OutputFile);
matlab.addons.toolbox.packageToolbox(opts);

function appendLicenseText(filename)
licenseText = [ ...
newline + "--------------------------------------------------------------------------------" + newline
"3rdparty dependency mathworks/libmexclass is redistributed as a dynamically"
"linked shared library in certain binary distributions, like the MATLAB"
"distribution." + newline
"Copyright: 2022-2024 The MathWorks, Inc. All rights reserved."
"Homepage: https://github.com/mathworks/libmexclass"
"License: 3-clause BSD" ];
writelines(licenseText, filename, WriteMode="append");
end

function appendNoticeText(filename)
noticeText = [ ...
newline + "---------------------------------------------------------------------------------" + newline
"This product includes software from The MathWorks, Inc. (Apache 2.0)"
" * Copyright (C) 2024 The MathWorks, Inc."];
writelines(noticeText, filename, WriteMode="append");
end
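
The version-trimming heuristic in `packageMatlabInterface.m` keeps everything before the third dot of the raw version string and leaves shorter versions unchanged. As a rough shell rendering of the same idea (not part of this PR; the function name `trim_version` and the sample version strings are hypothetical):

```shell
# Hypothetical shell equivalent of the MATLAB heuristic: keep at most
# the first three dot-separated fields of a version string.
trim_version() {
  printf '%s\n' "$1" | cut -d. -f1-3
}

# Sample inputs are made up for illustration.
trim_version "15.0.0.dev123"
trim_version "15.0.0"
```

Note that the MATLAB script uses the trimmed value only for `opts.ToolboxVersion`; the output file name is still composed from the raw version string.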
