Develop #9

Merged
merged 5 commits into from
Jan 23, 2025
5 changes: 5 additions & 0 deletions HISTORY.md
@@ -1,3 +1,8 @@
### Version 0.4.7
Minor updates to the Dataset classes that simplify them, so that
end users can create custom datasets that wrap e.g. CSV files
or databases when setting up their training data.

### Version 0.4.6
Added the Conv1dTwoLayer kernel. Removed the experimental simplex
rffs feature, which is of uncertain usefulness. Fixed a bug occurring
6 changes: 5 additions & 1 deletion README.md
@@ -18,7 +18,11 @@ which only provide kernels for fixed-vector data (tabular data),
xGPR provides powerful convolution kernels for variable-length time series,
sequences and graphs.

### What's new in v0.4.5
### What's new in v0.4.7
You can now build custom Datasets (similar to the DataLoader in PyTorch),
so that with a few minor tweaks you can use any kind of data (an SQLite db,
HDF5 files, etc.) as input when training.

Starting with version 0.4.5, xGPR is available as a precompiled binary / wheel
for 64-bit Linux and as a source distribution for other platforms, so that
in most cases installation is as simple as:
14 changes: 13 additions & 1 deletion docs/basic_tutorial.rst
@@ -4,7 +4,14 @@ xGPR quickstart
Build your training set
-------------------------

Start by building a Dataset, which is similar to a DataLoader in PyTorch:::
Start by building a Dataset, which is similar to a DataLoader in PyTorch. If
your data is organized in one of a couple of fairly common ways, you can
use a built-in xGPR function to build this dataset. If your data is in
some other form (e.g. a FASTA file, an SQLite db or an HDF5 file) and you
don't want to make a copy of it, you can instead
subclass xGPR's ``DatasetBaseclass`` and build a
custom ``Dataset`` object. We'll look at the more common
situations first:::

from xGPR import build_regression_dataset

@@ -56,6 +63,11 @@ When you create the dataset, xGPR will do some checks to make sure that
what you fed it makes sense. If the dataset is very large, these may take a
second.

Finally, let's say your data is a FASTA file, a CSV file, an HDF5 file or some
other format. You can create your own Dataset that loads the data in minibatches
during training and does any preprocessing you want on each minibatch.
To see how to do this, check out :doc:`notebooks/custom_dataset_example`.
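
For orientation, here is a minimal sketch of what such a subclass might look like,
assuming your targets sit in the first column of a CSV file. The import location,
the constructor arguments and the ``get_chunked_data`` method name are illustrative
assumptions only; the linked notebook shows the interface that ``DatasetBaseclass``
actually requires you to implement:::

    # Sketch only: the import path and required methods below are assumptions,
    # not the confirmed xGPR API -- see notebooks/custom_dataset_example.
    from xGPR import DatasetBaseclass

    import pandas as pd


    class CSVRegressionDataset(DatasetBaseclass):
        """Hypothetical minibatch loader that wraps a CSV file without copying it."""

        def __init__(self, csv_path, chunk_size=2000):
            # The real baseclass constructor may expect additional arguments.
            super().__init__()
            self.csv_path = csv_path
            self.chunk_size = chunk_size

        def get_chunked_data(self):
            """Yield (x, y) minibatches, preprocessing each chunk as needed."""
            for chunk in pd.read_csv(self.csv_path, chunksize=self.chunk_size):
                yvals = chunk.iloc[:, 0].to_numpy()
                xvals = chunk.iloc[:, 1:].to_numpy()
                yield xvals, yvals

A Dataset built this way can then be used when training, in place of one produced
by the built-in constructors such as ``build_regression_dataset``.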

Fit your model and make predictions
-------------------------------------

45,731 changes: 45,731 additions & 0 deletions docs/notebooks/CASP.csv

Large diffs are not rendered by default.
