Expand dataset import connectors (#866)
* Add new data connectors

* Update documentation/docs/guide/datasets/import-dataset.md

Co-authored-by: Pascal Pfeiffer <1069138+pascal-pfeiffer@users.noreply.github.com>

oshi98 and pascal-pfeiffer authored Sep 24, 2024
1 parent a505b1a commit 600fc2a
Showing 2 changed files with 52 additions and 1 deletion.
5 changes: 4 additions & 1 deletion documentation/docs/guide/datasets/data-connectors-format.md
@@ -7,7 +7,10 @@ H2O LLM Studio supports the following data connectors to access or upload extern
- **Upload**: Upload a local dataset from your machine.
- **Local**: Specify the file location of the dataset on your machine.
- **AWS S3 (Amazon AWS S3)**: Connect to an Amazon AWS S3 data bucket.
- **Kaggle**: Connect to a Kaggle dataset.
- **Azure Datalake**: Connect to a dataset in Azure Datalake.
- **H2O Drive**: Upload a dataset from H2O Drive.
- **Kaggle**: Connect to a dataset hosted on Kaggle.
- **Hugging Face**: Connect to a dataset on Hugging Face.

## Data format

48 changes: 48 additions & 0 deletions documentation/docs/guide/datasets/import-dataset.md
@@ -90,6 +90,30 @@ Follow the relevant steps below to import a dataset to H2O LLM Studio.
</li>
</ol>
</TabItem>
<TabItem value="azure datalake" label="Azure Datalake">
<ol>
<li>
Enter values for the following fields:
<ul>
<li>
<b>Datalake connection string: </b><br></br>
Enter your Azure connection string to connect to Datalake storage.
</li>
<li>
<b>Datalake container name: </b><br></br>
Enter the name of the Azure Data Lake container where your dataset is stored, including the relative path to the file within the container.
</li>
<li>
<b>File name: </b><br></br>
Specify the exact name of the file you want to import.
</li>
</ul>
</li>
<li>
Click <b>Continue</b>.
</li>
</ol>
</TabItem>
<TabItem value="h2o-drive" label="H2O-Drive">
<ol>
<li>
@@ -126,6 +150,30 @@ Follow the relevant steps below to import a dataset to H2O LLM Studio.
</li>
</ol>
</TabItem>
<TabItem value="hugging face" label="Hugging Face">
<ol>
<li>
Enter values for the following fields:
<ul>
<li>
<b>Hugging Face dataset: </b><br></br>
Enter the name of the Hugging Face dataset.
</li>
<li>
<b>Split: </b><br></br>
Enter the specific data split you want to import (e.g., "train", "test").
</li>
<li>
<b>Hugging Face API token (optional): </b><br></br>
Enter your Hugging Face API token to authenticate access to private datasets or datasets with gated access.
</li>
</ul>
</li>
<li>
Click <b>Continue</b>.
</li>
</ol>
</TabItem>
</Tabs>
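
The three Azure Datalake fields above can be sanity-checked before they are entered. The sketch below is only an illustration and is not taken from H2O LLM Studio: the helper names (`parse_connection_string`, `split_container_field`, `resolve_blob_path`) and the sample values are hypothetical. It assumes the connection string uses the usual Azure Storage `Key=Value;Key=Value` layout and that the container field has the form `container/relative/path`, as the field description suggests.

```python
def parse_connection_string(conn_str):
    """Split an Azure-style 'Key=Value;Key=Value' string into a dict."""
    parts = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def split_container_field(field):
    """Split 'container/relative/path' into (container, relative_path)."""
    container, _, relative_path = field.strip("/").partition("/")
    return container, relative_path


def resolve_blob_path(container_field, file_name):
    """Combine the container field and file name into (container, path_in_container)."""
    container, relative_path = split_container_field(container_field)
    path = f"{relative_path}/{file_name}" if relative_path else file_name
    return container, path


# Hypothetical placeholder values, not real credentials.
example_conn = (
    "DefaultEndpointsProtocol=https;AccountName=exampleaccount;"
    "AccountKey=ZmFrZS1rZXk=;EndpointSuffix=core.windows.net"
)
fields = parse_connection_string(example_conn)
container, path = resolve_blob_path("my-container/datasets/chat", "train.csv")
```

Splitting on the first `=` only is a deliberate choice here, since a base64 account key can itself end with `=` padding.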
:::
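
The Hugging Face fields above (dataset name, split, optional API token) correspond closely to the arguments of `load_dataset` in the Hugging Face `datasets` library. The following is a rough sketch of that correspondence, not the connector's actual implementation: `build_load_kwargs` and `load_split` are hypothetical helper names, and the `token` keyword assumes a recent `datasets` release (older releases used `use_auth_token`).

```python
def build_load_kwargs(dataset, split, token=None):
    """Turn the three form fields into keyword arguments for load_dataset."""
    dataset, split = dataset.strip(), split.strip()
    if not dataset or not split:
        raise ValueError("Both the dataset name and the split are required")
    kwargs = {"path": dataset, "split": split}
    if token:
        # Only needed for private or gated datasets.
        kwargs["token"] = token
    return kwargs


def load_split(dataset, split, token=None):
    """Download one split; requires the third-party `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so the helper above works without it

    return load_dataset(**build_load_kwargs(dataset, split, token))


# "username/dataset-name" and the token are hypothetical placeholders.
public_kwargs = build_load_kwargs(" username/dataset-name ", "train")
gated_kwargs = build_load_kwargs("username/dataset-name", "test", token="hf_xxx")
```

Leaving the token out of the keyword arguments when it is empty keeps anonymous access to public datasets working unchanged.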
