
Simplified the Dataset classes to facilitate constructing custom datasets. Added an example to the docs to illustrate how to construct custom datasets.
jlparkI committed Jan 22, 2025
1 parent 02c3bd1 commit 8c3b327
Showing 14 changed files with 46,413 additions and 29 deletions.
5 changes: 5 additions & 0 deletions HISTORY.md
@@ -1,3 +1,8 @@
### Version 0.4.7
Simplified the Dataset classes so that end users can create
custom datasets that wrap e.g. csv files or databases when
setting up their training data.

### Version 0.4.6
Added the Conv1dTwoLayer kernel. Removed the experimental simplex
rffs feature, which is of uncertain usefulness. Fixed a bug occurring
14 changes: 13 additions & 1 deletion docs/basic_tutorial.rst
@@ -4,7 +4,14 @@ xGPR quickstart
Build your training set
-------------------------

Start by building a Dataset, which is similar to a DataLoader in PyTorch. If
your data is organized in one of a few fairly common ways, you can
use a built-in xGPR function to build this dataset. If your data is in
some other form (e.g. a fasta file, an SQLite db or an HDF5 file) and you
don't want to make a copy of it, you can instead
subclass xGPR's ``DatasetBaseclass`` and build a
custom ``Dataset`` object. We'll look at the more common
situations first::

from xGPR import build_regression_dataset

@@ -56,6 +63,11 @@ When you create the dataset, xGPR will do some checks to make sure that
what you fed it makes sense. If the dataset is very large, these may take a
second.
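The kind of consistency checks described above can be sketched generically. This is an illustration only, not xGPR's actual validation code; the function name and the specific checks are assumptions for the sake of the example:

```python
import math

def check_regression_data(x, y):
    """Illustrative sanity checks a dataset builder might run before
    training: x is a list of feature rows, y a list of target values."""
    if len(x) != len(y):
        raise ValueError("Features and targets must have the same number of rows.")
    row_len = len(x[0])
    for row in x:
        if len(row) != row_len:
            raise ValueError("All feature rows must have the same length.")
        if any(not math.isfinite(v) for v in row):
            raise ValueError("NaN / inf feature values are not allowed.")
    if any(not math.isfinite(v) for v in y):
        raise ValueError("NaN / inf target values are not allowed.")
    return len(x)

x = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
y = [1.0, 0.0, 1.0]
print(check_regression_data(x, y))  # -> 3
```

Checks like these run once per file or array, which is why they can take a moment on a very large dataset.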

Finally, let's say your data is a fasta file, a csv file, an HDF5 file or some
other format. You can create your own Dataset that loads the data in minibatches
during training and does any preprocessing you want to do on each minibatch.
To see how to do this, check out :doc:`notebooks/custom_dataset_example`.
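As a rough sketch of the idea, a minibatch loader over a csv file might look like the following. The class and method names here are hypothetical illustrations, not xGPR's actual ``DatasetBaseclass`` interface:

```python
import csv

class CsvMinibatchLoader:
    """Illustrative (hypothetical) minibatch loader: yields
    (features, targets) chunks from a csv file whose last column
    is the regression target."""

    def __init__(self, csv_path, chunk_size=2):
        self.csv_path = csv_path
        self.chunk_size = chunk_size

    def __iter__(self):
        with open(self.csv_path, newline="") as fh:
            reader = csv.reader(fh)
            next(reader)  # skip the header row
            xchunk, ychunk = [], []
            for row in reader:
                # Any per-minibatch preprocessing would go here.
                xchunk.append([float(v) for v in row[:-1]])
                ychunk.append(float(row[-1]))
                if len(xchunk) == self.chunk_size:
                    yield xchunk, ychunk
                    xchunk, ychunk = [], []
            if xchunk:  # emit the final, possibly smaller, chunk
                yield xchunk, ychunk
```

Because the loader only holds one chunk in memory at a time, the underlying file is never copied, which is the main point of wrapping your data this way.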

Fit your model and make predictions
-------------------------------------

45,731 changes: 45,731 additions & 0 deletions docs/notebooks/CASP.csv

