Skip to content

Commit

Permalink
Revert to passing matrices to/from C++ via uintptr_t's. (#35)
Browse files Browse the repository at this point in the history
This was initially motivated by the failure of pybind11's auto-conversion for
std::shared_ptr in downstream packages when using a prebuilt wheel of mattress.
More generally, this change avoids any reliance on pybind11 in downstream
packages; as long as they can take a uintptr_t, they can use mattress. 

We also introduce a BoundMatrix class inside a mattress.h header that can be
used by downstreams to get the casting correct. GC protection is now fully
handled in C++ via BoundMatrix::original, which reduces the risk of bugs when
memory management was previously split across C++ and Python.
  • Loading branch information
LTLA authored Dec 13, 2024
1 parent 42f01db commit 98ea078
Show file tree
Hide file tree
Showing 19 changed files with 552 additions and 382 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
# Changelog

## Version 0.3.1

- Cast to/from `uintptr_t` so that downstream packages aren't forced to rely on **pybind11** converters.
- Added a `mattress.h` to ensure developers use the correct types during casting.
- Shift all responsibility for GC protection to C++ via the new `mattress::BoundMatrix` class.

## Version 0.3.0

- Switch to **pybind11** for the Python/C++ interface, with CMake for the build system.
Expand Down
1 change: 1 addition & 0 deletions MANIFEST.in
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
recursive-include src/mattress/include *
20 changes: 12 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,22 +32,26 @@ pip install mattress
The aim is to allow package C++ code to accept [all types of matrix representations](#supported-matrices) without requiring re-compilation of the associated code.
To achive this:

1. Add `assorthead.includes()` to the compiler's include path.
1. Add `mattress.includes()` and `assorthead.includes()` to the compiler's include path.
This can be done through `include_dirs=` of the `Extension()` definition in `setup.py`
or by adding a `target_include_directories()` in CMake, depending on the build system.
2. Call `mattress.initialize()` on a Python matrix object to wrap it in a **tatami**-compatible C++ representation.
This returns an `InitializedMatrix` with a `ptr` property that contains a pointer to the C++ matrix.
3. Pass `ptr` to [**pybind11**](https://pybind11.readthedocs.io)-wrapped C++ code as a [shared pointer to a `tatami::Matrix`](lib/src/def.h),
3. Pass `ptr` to C++ code as a `uintptr_t` referencing a `tatami::Matrix`,
which can be interrogated as described in the [**tatami** documentation](https://github.com/tatami-inc/tatami).

So, for example, the C++ code in our downstream package might look like the code below:

```cpp
int do_something(const std::shared_ptr<tatami::Matrix<double, uint32_t> >& mat) {
#include "mattress.h"

int do_something(uintptr_t ptr) {
const auto& mat_ptr = mattress::cast(ptr)->ptr;
// Do something with the tatami interface.
return 1;
}

// Assuming we're using pybind11, but any framework that can accept a uintptr_t is fine.
PYBIND11_MODULE(lib_downstream, m) {
m.def("do_something", &do_something);
}
Expand All @@ -64,7 +68,7 @@ def do_something(x):
return lib.do_something(tmat.ptr)
```

See [`lib/src/def.h`](lib/src/def.h) for the exact definitions of the interface types used by **mattress**.
Check out [the included header](src/mattress/include/mattress.h) for more definitions.

## Supported matrices

Expand Down Expand Up @@ -165,10 +169,10 @@ init2 = initialize(wrapped)
This is more efficient as it re-uses the `InitializedMatrix` already generated from `x`.
It is also more convenient as we don't have to carry around `x` to generate `init2`.

## Extending `initialize()`
## Extending to custom matrices

Developers can extend **mattress** to custom matrix classes by registering new methods with the `initialize()` generic.
This should return a `InitializedMatrix` object containing a shared pointer to a `tatami::Matrix<double, uint32_t>` instance (see [`lib/src/def.h`](lib/src/def.h) for types).
This should return a `InitializedMatrix` object containing a `uintptr_t` cast from a pointer to a `tatami::Matrix` (see [the included header](src/mattress/include/mattress.h)).
Once this is done, all calls to `initialize()` will be able to handle matrices of the newly registered types.

```python
Expand All @@ -178,9 +182,9 @@ import mattress
@mattress.initialize.register
def _initialize_my_custom_matrix(x: MyCustomMatrix):
data = x.some_internal_data
return mattress.InitializedMatrix(lib.initialize_custom(data), objects=[data])
return mattress.InitializedMatrix(lib.initialize_custom(data))
```

If the initialized `tatami::Matrix` contains references to Python-managed data, e.g., in NumPy arrays,
we must ensure that the data is not garbage-collected during the lifetime of the `tatami::Matrix`.
This is achieved by storing a reference to the data in the `objects=` argument of the `InitializedMatrix`.
This is achieved by storing a reference to the data in the `original` member of the `mattress::BoundMatrix`.
1 change: 1 addition & 0 deletions lib/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ pybind11_add_module(mattress
)

target_include_directories(mattress PRIVATE "${ASSORTHEAD_INCLUDE_DIR}")
target_include_directories(mattress PRIVATE "../src/mattress/include")

set_property(TARGET mattress PROPERTY CXX_STANDARD 17)

Expand Down
298 changes: 167 additions & 131 deletions lib/src/common.cpp

Large diffs are not rendered by default.

46 changes: 38 additions & 8 deletions lib/src/compressed_sparse_matrix.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#include "def.h"
#include "mattress.h"
#include "utils.h"

#include "pybind11/pybind11.h"
Expand All @@ -9,7 +9,14 @@
#include <cstdint>

template<typename Data_, typename Index_>
MatrixPointer initialize_compressed_sparse_matrix_raw(MatrixIndex nr, MatrixValue nc, const pybind11::array& data, const pybind11::array& index, const pybind11::array& indptr, bool byrow) {
uintptr_t initialize_compressed_sparse_matrix_raw(
mattress::MatrixIndex nr,
mattress::MatrixValue nc,
const pybind11::array& data,
const pybind11::array& index,
const pybind11::array& indptr,
bool byrow)
{
size_t expected = (byrow ? nr : nc);
if (indptr.size() != expected + 1) {
throw std::runtime_error("unexpected length for the 'indptr' array");
Expand All @@ -27,12 +34,28 @@ MatrixPointer initialize_compressed_sparse_matrix_raw(MatrixIndex nr, MatrixValu
}
tatami::ArrayView<Index_> iview(check_contiguous_numpy_array<Index_>(index), nz);

typedef tatami::CompressedSparseMatrix<MatrixValue, MatrixIndex, decltype(dview), decltype(iview), decltype(pview)> Spmat;
return MatrixPointer(new Spmat(nr, nc, std::move(dview), std::move(iview), std::move(pview), byrow));
auto tmp = std::make_unique<mattress::BoundMatrix>();
typedef tatami::CompressedSparseMatrix<mattress::MatrixValue, mattress::MatrixIndex, decltype(dview), decltype(iview), decltype(pview)> Spmat;
tmp->ptr.reset(new Spmat(nr, nc, std::move(dview), std::move(iview), std::move(pview), byrow));

pybind11::tuple objects(3);
objects[0] = data;
objects[1] = index;
objects[2] = indptr;
tmp->original = std::move(objects);

return mattress::cast(tmp.release());
}

template<typename Data_>
MatrixPointer initialize_compressed_sparse_matrix_itype(MatrixIndex nr, MatrixValue nc, const pybind11::array& data, const pybind11::array& index, const pybind11::array& indptr, bool byrow) {
uintptr_t initialize_compressed_sparse_matrix_itype(
mattress::MatrixIndex nr,
mattress::MatrixValue nc,
const pybind11::array& data,
const pybind11::array& index,
const pybind11::array& indptr,
bool byrow)
{
auto dtype = index.dtype();

if (dtype.is(pybind11::dtype::of<int64_t>())) {
Expand All @@ -54,10 +77,17 @@ MatrixPointer initialize_compressed_sparse_matrix_itype(MatrixIndex nr, MatrixVa
}

throw std::runtime_error("unrecognized index type '" + std::string(dtype.kind(), 1) + std::to_string(dtype.itemsize()) + "' for compressed sparse matrix initialization");
return MatrixPointer();
return 0;
}

MatrixPointer initialize_compressed_sparse_matrix(MatrixIndex nr, MatrixValue nc, const pybind11::array& data, const pybind11::array& index, const pybind11::array& indptr, bool byrow) {
uintptr_t initialize_compressed_sparse_matrix(
mattress::MatrixIndex nr,
mattress::MatrixValue nc,
const pybind11::array& data,
const pybind11::array& index,
const pybind11::array& indptr,
bool byrow)
{
auto dtype = data.dtype();

if (dtype.is(pybind11::dtype::of<double>())) {
Expand All @@ -83,7 +113,7 @@ MatrixPointer initialize_compressed_sparse_matrix(MatrixIndex nr, MatrixValue nc
}

throw std::runtime_error("unrecognized data type '" + std::string(dtype.kind(), 1) + std::to_string(dtype.itemsize()) + "' for compressed sparse matrix initialization");
return MatrixPointer();
return 0;
}

void init_compressed_sparse_matrix(pybind11::module& m) {
Expand Down
12 changes: 0 additions & 12 deletions lib/src/def.h

This file was deleted.

53 changes: 34 additions & 19 deletions lib/src/delayed_binary_isometric_operation.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#include "def.h"
#include "mattress.h"

#include "pybind11/pybind11.h"
#include "pybind11/numpy.h"
Expand All @@ -8,45 +8,60 @@
#include <string>
#include <cstdint>

MatrixPointer initialize_delayed_binary_isometric_operation(MatrixPointer left, MatrixPointer right, const std::string& op) {
template<typename Operation_>
uintptr_t convert(uintptr_t left, uintptr_t right, Operation_ op) {
auto lbound = mattress::cast(left);
auto rbound = mattress::cast(right);

auto tmp = std::make_unique<mattress::BoundMatrix>();
tmp->ptr = tatami::make_DelayedBinaryIsometricOperation(lbound->ptr, rbound->ptr, std::move(op));

pybind11::tuple original(2);
original[0] = lbound->original;
original[1] = rbound->original;
tmp->original = std::move(original);
return mattress::cast(tmp.release());
}

uintptr_t initialize_delayed_binary_isometric_operation(uintptr_t left, uintptr_t right, const std::string& op) {
if (op == "add") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricAdd()));
return convert(left, right, tatami::make_DelayedBinaryIsometricAdd());
} else if (op == "subtract") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricSubtract()));
return convert(left, right, tatami::make_DelayedBinaryIsometricSubtract());
} else if (op == "multiply") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricMultiply()));
return convert(left, right, tatami::make_DelayedBinaryIsometricMultiply());
} else if (op == "divide") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricDivide()));
return convert(left, right, tatami::make_DelayedBinaryIsometricDivide());
} else if (op == "remainder") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricModulo()));
return convert(left, right, tatami::make_DelayedBinaryIsometricModulo());
} else if (op == "floor_divide") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricIntegerDivide()));
return convert(left, right, tatami::make_DelayedBinaryIsometricIntegerDivide());
} else if (op == "power") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricPower()));
return convert(left, right, tatami::make_DelayedBinaryIsometricPower());

} else if (op == "equal") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricEqual()));
return convert(left, right, tatami::make_DelayedBinaryIsometricEqual());
} else if (op == "not_equal") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricNotEqual()));
return convert(left, right, tatami::make_DelayedBinaryIsometricNotEqual());
} else if (op == "greater") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricGreaterThan()));
return convert(left, right, tatami::make_DelayedBinaryIsometricGreaterThan());
} else if (op == "greater_equal") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricGreaterThanOrEqual()));
return convert(left, right, tatami::make_DelayedBinaryIsometricGreaterThanOrEqual());
} else if (op == "less") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricLessThan()));
return convert(left, right, tatami::make_DelayedBinaryIsometricLessThan());
} else if (op == "less_equal") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricLessThanOrEqual()));
return convert(left, right, tatami::make_DelayedBinaryIsometricLessThanOrEqual());

} else if (op == "logical_and") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricBooleanAnd()));
return convert(left, right, tatami::make_DelayedBinaryIsometricBooleanAnd());
} else if (op == "logical_or") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricBooleanOr()));
return convert(left, right, tatami::make_DelayedBinaryIsometricBooleanOr());
} else if (op == "logical_xor") {
return (tatami::make_DelayedBinaryIsometricOperation(std::move(left), std::move(right), tatami::make_DelayedBinaryIsometricBooleanXor()));
return convert(left, right, tatami::make_DelayedBinaryIsometricBooleanXor());
}

throw std::runtime_error("unknown binary isometric operation '" + op + "'");
return MatrixPointer();
return 0;
}

void init_delayed_binary_isometric_operation(pybind11::module& m) {
Expand Down
18 changes: 13 additions & 5 deletions lib/src/delayed_bind.cpp
Original file line number Diff line number Diff line change
@@ -1,18 +1,26 @@
#include "def.h"
#include "mattress.h"

#include "tatami/tatami.hpp"

#include "pybind11/pybind11.h"

#include <vector>

MatrixPointer initialize_delayed_bind(pybind11::list inputs, int along) {
std::vector<MatrixPointer> combined;
uintptr_t initialize_delayed_bind(const pybind11::list& inputs, int along) {
std::vector<std::shared_ptr<tatami::Matrix<mattress::MatrixValue, mattress::MatrixIndex> > > combined;
combined.reserve(inputs.size());
pybind11::tuple originals(inputs.size());

for (size_t i = 0, n = inputs.size(); i < n; ++i) {
combined.push_back(inputs[i].cast<MatrixPointer>());
auto bound = mattress::cast(inputs[i].cast<uintptr_t>());
combined.push_back(bound->ptr);
originals[i] = bound->original;
}
return tatami::make_DelayedBind(std::move(combined), along == 0);

auto tmp = std::make_unique<mattress::BoundMatrix>();
tmp->ptr = tatami::make_DelayedBind(std::move(combined), along == 0);
tmp->original = std::move(originals);
return mattress::cast(tmp.release());
}

void init_delayed_bind(pybind11::module& m) {
Expand Down
18 changes: 14 additions & 4 deletions lib/src/delayed_subset.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
#include "def.h"
#include "mattress.h"
#include "utils.h"

#include "pybind11/pybind11.h"
Expand All @@ -7,9 +7,19 @@
#include <string>
#include <cstdint>

MatrixPointer initialize_delayed_subset(MatrixPointer mat, const pybind11::array& subset, bool byrow) {
auto sptr = check_numpy_array<MatrixIndex>(subset);
return tatami::make_DelayedSubset(std::move(mat), tatami::ArrayView<MatrixIndex>(sptr, subset.size()), byrow);
uintptr_t initialize_delayed_subset(uintptr_t ptr, const pybind11::array& subset, bool byrow) {
auto bound = mattress::cast(ptr);
auto sptr = check_numpy_array<mattress::MatrixIndex>(subset);

auto tmp = std::make_unique<mattress::BoundMatrix>();
tmp->ptr = tatami::make_DelayedSubset(bound->ptr, tatami::ArrayView<mattress::MatrixIndex>(sptr, subset.size()), byrow);

pybind11::tuple original(2);
original[0] = bound->original;
original[1] = subset;
tmp->original = std::move(original);

return mattress::cast(tmp.release());
}

void init_delayed_subset(pybind11::module& m) {
Expand Down
10 changes: 7 additions & 3 deletions lib/src/delayed_transpose.cpp
Original file line number Diff line number Diff line change
@@ -1,13 +1,17 @@
#include "def.h"
#include "mattress.h"

#include "pybind11/pybind11.h"
#include "pybind11/numpy.h"

#include <string>
#include <cstdint>

MatrixPointer initialize_delayed_transpose(MatrixPointer mat) {
return tatami::make_DelayedTranspose(std::move(mat));
uintptr_t initialize_delayed_transpose(uintptr_t ptr) {
auto bound = mattress::cast(ptr);
auto tmp = std::make_unique<mattress::BoundMatrix>();
tmp->ptr = tatami::make_DelayedTranspose(bound->ptr);
tmp->original = bound->original;
return mattress::cast(tmp.release());
}

void init_delayed_transpose(pybind11::module& m) {
Expand Down
Loading

0 comments on commit 98ea078

Please sign in to comment.