-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing sklearn set_output API for BaseEstimator s #1093
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, except for one detail: @VincentRouvreau do we need to disable the test if pandas
is not installed? (we have things like if(SKLEARN_FOUND)
before the add_test
s in python/CMakeLists.txt
)
On windows, gudhi and cgal normally use mpir instead of gmp (mpir should install files gmp.h, libgmp.*, etc for compatibility). If that doesn't work, it is worth investigating.
The policy so far has been that CGAL is a dependency for a core part of Gudhi (alpha-complex). And once it is there, we might as well use it, users don't care much where it comes from, but they might (ok, maybe not here) care if it becomes 10x slower, or less reliable. It could make sense to try and simplify things for people who work on pure Python parts of Gudhi, by reusing a pre-built package (so they wouldn't even need a C++ compiler), but I don't know how hard that would be. |
Yes this test should not be run if pandas is not installed (as it is an optinal dependency).
def test_set_output():
try:
import pandas
NB_PC = 5
point_clouds = [points.sphere(n_samples=random.randint(100, 150), ambient_dim=2) for _ in range(NB_PC)]
rips = RipsPersistence(homology_dimensions=[0, 2], n_jobs=-2)
diags_pandas = rips.set_output(transform="pandas").fit_transform(point_clouds)
assert 'H0' == diags_pandas.columns[0]
assert 'H2' == diags_pandas.columns[1]
assert len(diags_pandas.index) == NB_PC
except ImportError:
pass
|
First option looks simple and therefore very good to me! |
So maybe I was wrong in thinking I needed to install GMP then. The thing is, with cgal-cpp installed via conda, on running the cmake command it fails with:
Not sure what to do then... |
Maybe |
Hmm, we'll need to investigate that 😞 |
@VincentRouvreau IIRC you adapted FindGMP.cmake from CGAL? The one in CGAL seems to have had a few patches since then, maybe we need them as well... There could also be bad interactions where |
Yep those two are there, mpir.h and gmp.h, and bin/ has gmp.dll, mpir.dll, and lib/ has gmp.lib and mpir.lib. |
and mpir is installed, |
Strange, on windows, I created a fresh miniforge environment, installed cgal-cpp, cmake and git in this env, cloned the gudhi-devel repository, went to a |
Oh good find indeed! I get that find_path considered the following locations:
... but none of these locations is |
I used a cmake I downloaded and installed, let's try with one from conda. |
... and that did it for me! |
Simple example showing how to implement scikit-learn's set_output API and an example test.
This outputs diagrams nicely:
Let me add that although this PR is very simple, I am a bit sore that it took so long for me to write for the following reasons:
conda install cgal-cpp
, conda will not also install GMP (conda claims specs are incompatible) so gudhi will not be happy on compilation because it cannot find GMP, if installed throughvcpkg
the install will be stuck onBuilding x64-windows-dbg
(see vcpkg getting stuck at installing package ("Building x64-windows-dbg") microsoft/vcpkg#31181 for detailed symptoms resembling mine), and installing GMP any other way seems nearly impossible on windows.So we could probably get rid of this dependency if I understood that correctly...