Add initial project stubs, CI, tests and first example #1

Merged
merged 7 commits into from
Oct 8, 2024
70 changes: 70 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,70 @@
name: CI tests
on:
  pull_request:

  push:
    branches:
      - main

env:
  FORCE_COLOR: true

concurrency:
  group: ${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  pre-commit:
    name: Run linters and other pre-commit hooks
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Set up uv
        uses: astral-sh/setup-uv@v3
        with:
          version: "0.4.x"
          enable-cache: true

      - name: Install dependencies
        run: |
          uv sync --all-extras

      - name: Run pre-commit
        run: |
          uv run pre-commit run --all-files --show-diff-on-failure

  pytest:
    name: Run Python unit tests
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          # tests need an unshallowed version of the repository to check the version
          fetch-depth: 0

      - name: Set up Python 3.10
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Set up uv
        uses: astral-sh/setup-uv@v3
        with:
          version: "0.4.x"
          enable-cache: true

      - name: Install locked versions of dependencies
        run: |
          uv sync --all-extras

      - name: Run all tests
        run: |
          uv run pytest -rs -vvv
36 changes: 36 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,36 @@
default_language_version:
  python: python3.10

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v5.0.0
    hooks:
      - id: trailing-whitespace
        exclude: (pydatalab/example_data/)|(.*.snap)
        args: [--markdown-linebreak-ext=md]
      - id: check-yaml
        args: [--unsafe]
      - id: check-json
      - id: end-of-file-fixer
        exclude: ^(pydatalab/example_data/|pydatalab/schemas)
      - id: check-added-large-files
        args: [--maxkb=1024]
      - id: check-symlinks
      - id: mixed-line-ending

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: "v0.6.9"
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/asottile/pyupgrade
    rev: v3.17.0
    hooks:
      - id: pyupgrade

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2
    hooks:
      - id: mypy
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.10
41 changes: 41 additions & 0 deletions README.md
@@ -4,3 +4,44 @@
# yeLLowhaMmer

</div>

A continuation of the [LLM hackathon](https://www.eventbrite.com/e/llm-hackathon-for-applications-in-materials-and-chemistry-tickets-868303598437) project, [*yeLLowhaMMer: A Multi-modal Tool-calling Agent for Accelerated Research Data Management*](https://github.com/bocarsly-group/llm-hackathon-2024).

This repository will explore using the Jupyter AI plugin to provide an agentic interface to [*datalab*](https://github.com/datalab-org/datalab), with the idea of having this as an additional UI that *datalab* users can use to interact with their data, either deployed for their instance, or run locally.

## Initial development tasks

- [ ] Reproduce the hackathon project with the Jupyter AI plugin, probably by extending the `%%ai` cell magic to something like `%%yellowhammer` that includes the yellowhammer system prompt and guides the user through registering a *datalab* API key and any LLM provider API keys.
- [ ] Use yellowhammer to generate API examples for the underlying [*datalab-python-api*](https://github.com/datalab-org/datalab-python-api) package.
- [ ] Consider deploying this as a JupyterHub that *datalab* instances can link to directly.
- [ ] Integrate the results much more closely into *datalab* itself, i.e., attaching notebooks to the relevant samples and recording the provenance of AI-generated data.
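
The first task above could be approached roughly as follows. This is only a sketch under stated assumptions, not the project's implementation: `SYSTEM_PROMPT`, `build_prompt` and the idea of prepending the system prompt to the cell text are hypothetical, and the `%%ai` magic must already be loaded for the forwarding call to work. It relies on IPython's `register_magic_function` and `run_cell_magic` shell methods.

```python
# Hypothetical placeholder; the real yellowhammer system prompt is not shown here.
SYSTEM_PROMPT = "You are yellowhammer, an assistant for the datalab data management tool."


def build_prompt(cell: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Prepend the system prompt to the user's cell text."""
    return f"{system_prompt.strip()}\n\n{cell.strip()}"


def load_ipython_extension(ipython) -> None:
    """Entry point called by `%load_ext`; registers a `%%yellowhammer` cell magic."""

    def yellowhammer(line: str, cell: str):
        # `line` carries the model spec (e.g. "anthropic-chat:claude-3-5-sonnet-20240620")
        # and is passed through unchanged to the existing `%%ai` magic.
        return ipython.run_cell_magic("ai", line, build_prompt(cell))

    ipython.register_magic_function(
        yellowhammer, magic_kind="cell", magic_name="yellowhammer"
    )
```

Delegating to `%%ai` keeps provider handling inside Jupyter AI. The guided registration of API keys mentioned in the task is not covered by this sketch.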

## Installation

This repository uses [`uv`](https://docs.astral.sh/uv/) for the entire packaging workflow.
Once you have installed `uv` following their documentation, you can install this repository by cloning and running `uv sync` in the root directory (optionally with `--dev` if you plan to develop it further).

```shell
git clone git@github.com:datalab-org/yellowhammer
cd yellowhammer
uv sync --dev
```

### Launching example notebooks

You can launch the example notebook locally with `uv` too:

```shell
uv run jupyter lab examples/
```

Review comment (Member): This works for me, but `uv run jupyter notebook examples/` doesn't load the environment correctly. Any idea why?

Reply (Member Author): `notebook` isn't installed by default, so perhaps it's pulling from somewhere else? I can add it to the lockfile.

The examples will require you to bring your own *datalab* API key (for your instance of choice) and API keys for any underlying LLM providers (OpenAI, Anthropic, etc.).

These can be set in your shell profile, or simply in your shell before launching Jupyter, using:
```bash
export OPENAI_API_KEY=sk-proj...
export ANTHROPIC_API_KEY=sk-ant...
```

`yellowhammer` by default will come preloaded with the relevant OpenAI and Anthropic packages.
You can see how to configure other providers in the [Jupyter AI plugin documentation](https://jupyter-ai.readthedocs.io/en/latest/users/index.html#model-providers).
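
Before launching Jupyter, it can help to check which of these keys are actually set. The snippet below is a small sketch, not part of the package: `missing_keys` is a hypothetical helper, and `DATALAB_API_KEY` is an assumed name for the variable holding the *datalab* key.

```python
import os
from collections.abc import Mapping

# OPENAI_API_KEY and ANTHROPIC_API_KEY appear above; DATALAB_API_KEY is an
# assumption about how the datalab key is supplied.
DEFAULT_KEYS = ("DATALAB_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ, required=DEFAULT_KEYS) -> list[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_keys()` in a notebook cell lists the unset variables. Pass a shorter `required` tuple if you only use one LLM provider.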
131 changes: 131 additions & 0 deletions examples/001-getting-started.ipynb
@@ -0,0 +1,131 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "929301f6-5982-4662-b68f-9e72269cace7",
"metadata": {},
"outputs": [],
"source": [
"%reload_ext jupyter_ai "
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "f7f8325c-18b8-4943-b7e8-14a839060958",
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"Here's the information about Datalab in markdown format:\n",
"\n",
"# Datalab\n",
"\n",
"Datalab is a tool developed by Google Cloud Platform that provides an interactive environment for data exploration, analysis, and machine learning. Key features include:\n",
"\n",
"## Features\n",
"\n",
"- **Jupyter Notebooks**: Uses Jupyter notebooks for interactive coding and visualization\n",
"- **Cloud Integration**: Seamlessly integrates with Google Cloud services\n",
"- **Big Data Support**: Designed to work with large datasets using BigQuery and other Google Cloud data services\n",
"- **Pre-installed Libraries**: Comes with popular data science libraries like pandas, numpy, and scikit-learn\n",
"- **Collaborative**: Allows for easy sharing and collaboration on data projects\n",
"\n",
"## Use Cases\n",
"\n",
"- Data exploration and visualization\n",
"- Machine learning model development\n",
"- Big data analysis\n",
"- Prototyping data pipelines\n",
"\n",
"## Advantages\n",
"\n",
"- Easy setup and configuration\n",
"- Cost-effective (pay only for resources used)\n",
"- Scalable to handle large datasets\n",
"- Integrates well with other Google Cloud services\n",
"\n",
"Datalab is particularly useful for data scientists and analysts who work with Google Cloud Platform and need a powerful, cloud-based environment for their data projects."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 5,
"metadata": {
"text/markdown": {
"jupyter_ai": {
"model_id": "claude-3-5-sonnet-20240620",
"provider_id": "anthropic-chat"
}
}
},
"output_type": "execute_result"
}
],
"source": [
"%%ai anthropic-chat:claude-3-5-sonnet-20240620\n",
"Do you know what datalab is?"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "12aaa0b1-6ccd-4645-b578-d1fde5cd2614",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"UsageError: Line magic function `%%yellowhammer` not found.\n"
]
}
],
"source": [
"\"\"\" (mock up ) \"\"\"\n",
"%%yellowhammer\n",
"Do you know what datalab is?"
]
},
{
"cell_type": "markdown",
"id": "b943d5eb-a83a-4ed5-b531-e69a44d6d701",
"metadata": {},
"source": [
"Yes, *datalab* is ACTUALLY an open source research data management tool for chemistry and materials science, and here is the functionality I know about..."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8d93f30b-3dde-457a-8aa9-9d08cc19f3ad",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
38 changes: 38 additions & 0 deletions pyproject.toml
@@ -0,0 +1,38 @@
[project]
name = "yellowhammer"
readme = "README.md"
license = { text = "MIT" }
requires-python = ">=3.10"
dynamic = ["version"]
dependencies = [
    "datalab-api>=0.2.4",
    "jupyter-ai>=2.24.1",
    "jupyterlab>=4.2.5",
    "langchain-anthropic>=0.1.23",
    "langchain-openai>=0.1.25",
    "notebook>=7.2.2",
    "yellowhammer",
]

[project.urls]
homepage = "https://github.com/datalab-org/yellowhammer"
repository = "https://github.com/datalab-org/yellowhammer"
documentation = "https://github.com/datalab-org/yellowhammer"

[build-system]
requires = ["setuptools >= 70", "setuptools_scm ~= 8.1", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]
fallback_version = "0.1.0"

[tool.ruff]
line-length = 100
target-version = "py310"

[tool.mypy]
ignore_missing_imports = true
follow_imports = "skip"

[tool.uv]
dev-dependencies = ["ipykernel>=6.29.5", "pre-commit>=4.0.0", "pytest>=8.3.3"]
8 changes: 8 additions & 0 deletions src/yellowhammer/__init__.py
@@ -0,0 +1,8 @@
from importlib.metadata import PackageNotFoundError, version

try:
    __version__ = version("yellowhammer")
except PackageNotFoundError:
    __version__ = "develop"

__all__ = ("__version__",)
5 changes: 5 additions & 0 deletions src/yellowhammmer/__init__.py
@@ -0,0 +1,5 @@
from importlib import metadata

__version__ = metadata.version("yellowhammer")

__all__ = ("__version__",)
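
Note on the two `__init__.py` files: the `src/yellowhammmer/` variant (three m's in the path) calls `metadata.version` without a guard, so importing it when the `yellowhammer` distribution is not installed raises `PackageNotFoundError`. The guarded pattern from `src/yellowhammer/__init__.py` can be sketched as a reusable helper. `resolve_version` is a hypothetical name and is not part of this PR.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "develop") -> str:
    """Return the installed version of `dist_name`, or `fallback` if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Raised when no installed distribution matches, e.g. running from a
        # source checkout that was never installed.
        return fallback
```

With this helper, `__version__ = resolve_version("yellowhammer")` would behave like the guarded file above.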
4 changes: 4 additions & 0 deletions tests/test_import.py
@@ -0,0 +1,4 @@
def test_version():
    from yellowhammer import __version__

    assert __version__