docs tweaks #24

Merged · 8 commits · Jan 29, 2025
3 changes: 2 additions & 1 deletion Project.toml
@@ -36,7 +36,8 @@ julia = "1.9"
[extras]
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJTestInterface = "72560011-54dd-4dc2-94f3-c5de45b75ecd"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
-test = ["DelimitedFiles", "MLJBase", "Test"]
+test = ["DelimitedFiles", "MLJBase", "MLJTestInterface", "Test"]
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -1,3 +1,4 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
Maxnet = "81f79f80-22f2-4e41-ab86-00c11cf0f26f"
6 changes: 3 additions & 3 deletions docs/src/usage/quickstart.md
@@ -3,10 +3,10 @@ CurrentModule = Maxnet
```

## Installation
-Maxnet.jl is not yet registered - install by running
+Install the latest version of Maxnet.jl by running
```julia
]
-add https://github.com/tiemvanderdeure/Maxnet.jl
+add Maxnet
```

## Basic usage
@@ -31,7 +31,7 @@ There are numerous settings that can be tweaked to change the model fit. These a
### Model settings
The two most important settings to change when running Maxnet are the feature classes selected and the regularization factor.

-By default, the feature classes selected depend on the number of presence points, see [Maxnet.default_features](@ref). To set them manually, specify the `features` keyword using either a `Vector` of `AbstractFeatureClass`, or a `string`, where `l` represents `LinearFeature` and `CategoricalFeature`, `q` represents `QuadraticFeature`, `p` represents `ProductFeature`, `t` represents `ThresholdFeature` and `h` represents `HingeFeature`.
+By default, the feature classes selected depend on the number of presence points, see [default_features](@ref). To set them manually, specify the `features` keyword using either a `Vector` of `AbstractFeatureClass`, or a `string`, where `l` represents `LinearFeature` and `CategoricalFeature`, `q` represents `QuadraticFeature`, `p` represents `ProductFeature`, `t` represents `ThresholdFeature` and `h` represents `HingeFeature`.

For example:
```julia
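The letter codes documented above map to feature classes; a minimal pure-Julia sketch of that mapping (a hypothetical helper for illustration, not Maxnet's actual parser):

```julia
# Hypothetical mapping from letter codes to feature-class names, as documented:
# l => linear + categorical, q => quadratic, p => product, t => threshold, h => hinge
const FEATURE_CODES = Dict(
    'l' => ["LinearFeature", "CategoricalFeature"],
    'q' => ["QuadraticFeature"],
    'p' => ["ProductFeature"],
    't' => ["ThresholdFeature"],
    'h' => ["HingeFeature"],
)

# Concatenate the classes for each letter in the specification string
feature_classes(s::AbstractString) = reduce(vcat, (FEATURE_CODES[c] for c in s))

feature_classes("lq")  # ["LinearFeature", "CategoricalFeature", "QuadraticFeature"]
```

In Maxnet itself the same string is passed directly, e.g. `maxnet(p_a, env; features = "lq")`.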
14 changes: 11 additions & 3 deletions src/maxnet_function.jl
@@ -16,9 +16,11 @@
- `features`: Either a `Vector` of `AbstractFeatureClass` to be used in the model,
or a `String` where "l" = linear and categorical, "q" = quadratic, "p" = product, "t" = threshold, "h" = hinge (e.g. "lqh"); or
By default, the features used are based on the number of presences. See [`default_features`](@ref)
-- `regularization_multiplier`: A constant to adjust regularization, where a higher `regularization_multiplier` results in a higher penalization for features
-- `regularization_function`: A function to compute a regularization for each feature. A default `regularization_function` is built in.
-- `addsamplestobackground`: A boolean, where `true` adds the background samples to the predictors. Defaults to `true`.
+- `regularization_multiplier`: A constant to adjust regularization, where a higher `regularization_multiplier` results in a higher
+penalization for features and therefore less overfitting.
+- `regularization_function`: A function to compute a regularization for each feature. A default `regularization_function` is built in
+and should be used in most cases.
+- `addsamplestobackground`: Whether to add presence values to the background. Defaults to `true`.
- `n_knots`: The number of knots used for Threshold and Hinge features. Defaults to 50. Ignored if there are neither Threshold nor Hinge features.
- `weight_factor`: A `Float64` value to adjust the weight of the background samples. Defaults to 100.0.
- `kw...`: Further arguments to be passed to `GLMNet.glmnet`
@@ -32,6 +34,7 @@ using Maxnet
p_a, env = Maxnet.bradypus();
bradypus_model = maxnet(p_a, env; features = "lq")

# Output
Fit Maxnet model
Features classes: Maxnet.AbstractFeatureClass[LinearFeature(), CategoricalFeature(), QuadraticFeature()]
Entropy: 6.114650341746531
@@ -49,6 +52,11 @@ function maxnet(
n_knots::Int = 50,
kw...)

if allequal(presences)
pa = first(presences) ? "presences" : "absences"
throw(ArgumentError("All data points are $pa. Maxnet will only work with at least some presences and some absences."))
end

_maxnet(
presences,
predictors,
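The guard added in this hunk can be sketched as a standalone function (hypothetical name `check_presences`; Maxnet performs this check inline at the top of `maxnet`):

```julia
# Standalone sketch of the all-presences/all-absences guard added above.
function check_presences(presences::AbstractVector{Bool})
    if allequal(presences)  # Base.allequal requires Julia >= 1.8
        pa = first(presences) ? "presences" : "absences"
        throw(ArgumentError("All data points are $pa. Maxnet will only work with at least some presences and some absences."))
    end
    return nothing
end

check_presences([true, false, true])  # mixed labels: passes silently
```

This fails fast with an informative `ArgumentError` instead of letting a degenerate input reach the GLMNet fit.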
94 changes: 70 additions & 24 deletions src/mlj_interface.jl
@@ -24,30 +24,6 @@ function MaxnetBinaryClassifier(
)
end

"""
MaxnetBinaryClassifier

A model type for fitting a maxnet model using `MLJ`.

Use `MaxnetBinaryClassifier()` to create an instance with default parameters, or use keyword arguments to specify parameters.

The keywords `link`, and `clamp` are passed to [`Maxnet.predict`](@ref), while all other keywords are passed to [`maxnet`](@ref).
See the documentation of these functions for the meaning of these parameters and their defaults.

# Example
```jldoctest
using Maxnet, MLJBase
p_a, env = Maxnet.bradypus()

mach = machine(MaxnetBinaryClassifier(features = "lqp"), env, categorical(p_a))
fit!(mach)
yhat = MLJBase.predict(mach, env)
# output
```

"""
MaxnetBinaryClassifier

MMI.metadata_pkg(
MaxnetBinaryClassifier;
name = "Maxnet",
@@ -67,6 +43,76 @@ MMI.metadata_model(
reports_feature_importances=false
)

"""
$(MMI.doc_header(MaxnetBinaryClassifier))

# Training data

In MLJ or MLJBase, bind an instance `model` to data with

mach = machine(model, X, y)

where

- `X`: any table of input features (e.g. a `DataFrame`) whose columns
each have one of the following element scitypes: `Continuous` or `<:Multiclass`. Check
column scitypes with `schema(X)`.

- `y`: the target, which can be any `AbstractVector` whose element
scitype is `<:Binary`. The first class should refer to background values,
and the second class to presence values.

Review discussion on the choice of `<:Binary` for `y`:

Contributor: Binary is an alias for Finite{2}. However, if the order of the classes matters, as it appears to do, shouldn't this be the more restrictive OrderedFactor{2}?

Owner (author): I don't think so. I think it's very similar to a GLM in the sense that one class is assumed to be the 'positive' class (here: presences and background points) but conceptually they aren't really ordered.

Contributor: okay

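The ordered/unordered distinction raised in the review exchange above shows up at the level of `CategoricalArrays`, which back both scitypes: `Multiclass{2}` corresponds to an unordered two-level categorical vector, `OrderedFactor{2}` to one constructed with `ordered = true`. A sketch, assuming `CategoricalArrays` is available:

```julia
using CategoricalArrays

# Both vectors have a two-level (Binary, i.e. Finite{2}) target under MLJ's
# scientific types; only the `ordered` flag separates Multiclass{2} from
# OrderedFactor{2}.
y_unordered = categorical([false, true, true])                  # -> Multiclass{2}
y_ordered   = categorical([false, true, true]; ordered = true)  # -> OrderedFactor{2}

isordered(y_unordered), isordered(y_ordered)  # (false, true)
```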
# Hyper-parameters

- `features`: Specifies which feature classes to use in the model, e.g. "lqh" for linear, quadratic and hinge features.
See also [Maxnet.maxnet](@ref)
- `regularization_multiplier = 1.0`: Adjusts how tightly the model fits. Increasing this will reduce overfitting.
- `regularization_function`: A function to compute the regularization of each feature class. Defaults to `Maxnet.default_regularization`
- `addsamplestobackground = true`: Controls whether to add presence values to the background.
- `n_knots = 50`: The number of knots used for Threshold and Hinge features. A higher number gives more flexibility for these features.
- `weight_factor = 100.0`: A `Float64` value to adjust the weight of the background samples.
- `link = Maxnet.CloglogLink()`: The link function to use when predicting. See `Maxnet.predict`
- `clamp = false`: Clamp values passed to `MLJBase.predict` to the range the model was trained on.

# Operations

- `predict(mach, Xnew)`: return predictions of the target given
features `Xnew` having the same scitype as `X` above. Predictions are
probabilistic and can be interpreted as the probability of presence.

# Fitted Parameters

The fields of `fitted_params(mach)` are:

- `fitresult`: A `Tuple` where the first entry is the `Maxnet.MaxnetModel` returned by the Maxnet algorithm
and the second entry is the classes of `y`

# Report

The fields of `report(mach)` are:

- `selected_variables`: A `Vector` of `Symbols` of the variables that were selected.
- `selected_features`: A `Vector` of `Maxnet.ModelMatrixColumn` with the features that were selected.
- `complexity`: the number of selected features in the model.


# Example

```@example
using MLJBase, Maxnet
p_a, env = Maxnet.bradypus()
y = coerce(p_a, Binary)
X = coerce(env, Count => Continuous)

mach = machine(MaxnetBinaryClassifier(features = "lqp"), X, y)
fit!(mach)
yhat = MLJBase.predict(mach, X)

```

"""
MaxnetBinaryClassifier

function MMI.fit(m::MaxnetBinaryClassifier, verbosity::Int, X, y)
# convert categorical to boolean
y_boolean = Bool.(MMI.int(y) .- 1)
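The conversion `Bool.(MMI.int(y) .- 1)` above maps the two-level categorical target to booleans via 1-based integer level codes. The same arithmetic can be sketched with `CategoricalArrays.levelcode`, which behaves like `MMI.int` here (assumes `CategoricalArrays` is available):

```julia
using CategoricalArrays

y = categorical([false, true, true])   # two levels: false => code 1, true => code 2
codes = levelcode.(y)                  # [1, 2, 2]
y_boolean = Bool.(codes .- 1)          # [false, true, true]
```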
15 changes: 13 additions & 2 deletions test/runtests.jl
@@ -1,5 +1,7 @@
-using Maxnet, Test, Statistics, CategoricalArrays
+using Maxnet, Statistics, CategoricalArrays, MLJTestInterface
+using Test

# read in Bradypus data
p_a, env = Maxnet.bradypus()
# Make the levels in ecoreg string to make sure that that works
env = merge(env, (; ecoreg = recode(env.ecoreg, (l => string(l) for l in levels(env.ecoreg))...)))
@@ -82,9 +84,18 @@ end
m = maxnet(p_a, env; features = "lq", addsamplestobackground = false)
@test m_w.entropy > m.entropy
end
-m = maxnet(p_a, env; features = "lq", addsamplestobackground = false)

@testset "MLJ" begin
data = MLJTestInterface.make_binary()
failures, summary = MLJTestInterface.test(
[MaxnetBinaryClassifier],
data...;
mod=@__MODULE__,
verbosity=0, # bump to debug
throw=false, # set to true to debug
)
@test isempty(failures)

using MLJBase
mn = Maxnet.MaxnetBinaryClassifier
