Moved sharktank runner to ossci cluster #990

Draft: wants to merge 35 commits into main

Commits (35)
3e53034 added print debugging (Feb 13, 2025)
428686f shortened tests for faster iterations (Feb 14, 2025)
e4c9501 attempted fix (Feb 14, 2025)
1cde838 fixed device issue (Feb 14, 2025)
fd02a97 removed docker cleanup step (Feb 15, 2025)
ab6b527 moved big test back to old runner (Feb 15, 2025)
0fa650d added ci-sharktank (Feb 15, 2025)
c3bd7d2 add hf token (saienduri, Feb 15, 2025)
a5a143e removed sharktank workflow because I dont have a HF token (Feb 17, 2025)
2d93939 added back large test (Feb 17, 2025)
d98ca17 reverted llama bench for merge (Feb 17, 2025)
9127d59 updated hf token (Feb 17, 2025)
c9c761e reverted shark-tank (Feb 18, 2025)
01bb4c1 addressed comments (Feb 18, 2025)
922fc87 tried to fix path in sharktank (Feb 19, 2025)
fa0ba0e tried to fix path in sharktank (Feb 19, 2025)
806dd2f moved quark artifacts to writable mount (Feb 20, 2025)
a1b1282 added permissions (Feb 20, 2025)
0db2d98 tried to fix path in sharktank (Feb 20, 2025)
bae7745 seeing if tests pass while removing delete line (Feb 21, 2025)
e031685 attempted to fix prefills issue (Feb 21, 2025)
7aaa179 attempted to fix prefills issue (Feb 21, 2025)
a32102b attempted to fix prefills issue (Feb 21, 2025)
c45a397 tried to fix path in sharktank (Feb 21, 2025)
89d3aef tried to fix path in sharktank (Feb 21, 2025)
18af41d tried to fix path in sharktank (Feb 21, 2025)
e1102ca tried to fix path in sharktank (Feb 21, 2025)
475309f tried to fix path in sharktank (Feb 21, 2025)
fd7b7bd tried to fix path in sharktank (Feb 21, 2025)
976b2ba tried to fix path in sharktank (Feb 21, 2025)
531d86f tried to fix path in sharktank (Feb 21, 2025)
2e70e71 tried to fix path in sharktank (Feb 21, 2025)
b6a53d7 tried to fix path in sharktank (Feb 21, 2025)
9beaea8 cleaned up pr (Feb 21, 2025)
d403e53 cleaned up pr (Feb 21, 2025)
9 changes: 5 additions & 4 deletions .github/workflows/ci-sharktank.yml
@@ -93,15 +93,16 @@ jobs:
strategy:
matrix:
python-version: [3.11]
-runs-on: [llama-mi300x-3]
+runs-on: [linux-mi300-1gpu-ossci]
fail-fast: false
runs-on: ${{matrix.runs-on}}
defaults:
run:
shell: bash
env:
VENV_DIR: ${{ github.workspace }}/.venv
-HF_HOME: "/data/huggingface"
+HF_HOME: "/shark-cache/data/huggingface"
+HF_TOKEN: ${{ secrets.HF_FLUX_TOKEN }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
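
Because this hunk moves `HF_HOME` to a new mount and introduces `HF_TOKEN`, a preflight check run before the tests could surface a missing directory or an empty token early (an unset secret expands to an empty string in GitHub Actions). The following is only a sketch: `check_hf_env` is a hypothetical helper, not part of this PR or of sharktank.

```python
import os
from pathlib import Path


def check_hf_env(env=None):
    """Return a list of human-readable problems with the Hugging Face settings.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []

    hf_home = env.get("HF_HOME")
    if not hf_home:
        problems.append("HF_HOME is not set")
    else:
        path = Path(hf_home)
        if not path.is_dir():
            problems.append(f"HF_HOME does not exist: {path}")
        elif not os.access(path, os.W_OK):
            problems.append(f"HF_HOME is not writable: {path}")

    # An empty string counts as missing, which is what an unset secret yields.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is empty or unset")

    return problems


if __name__ == "__main__":
    for problem in check_hf_env():
        print(f"warning: {problem}")
```

Returning a list rather than raising lets a CI step decide whether a given problem should fail the job or only be logged.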

@@ -149,7 +150,7 @@ jobs:
sharktank/tests/models/vae/vae_test.py \
sharktank/tests/models/llama/quark_parity_test.py \
--durations=0 \
---timeout=800
+--timeout=10000
# TODO: add back
# --with-t5-data \
# when #888 is resolved
@@ -193,7 +194,7 @@ jobs:
run: |
pytest -v sharktank/ -m punet_quick \
--durations=0 \
---timeout=600
+--timeout=900

# Depends on other jobs to provide an aggregate job status.
# TODO(#584): move test_with_data and test_integration to a pkgci integration test workflow?
8 changes: 4 additions & 4 deletions sharktank/conftest.py
@@ -191,25 +191,25 @@ def pytest_addoption(parser):
parser.addoption(
"--google-t5-v1-1-small-f32-model-path",
type=Path,
-default="/data/t5/small/google__t5-v1_1-small_f32.gguf",
+default="/shark-dev/data/t5/small/google__t5-v1_1-small_f32.gguf",
help="Google T5 v1.1 small float32 model path",
)
parser.addoption(
"--google-t5-v1-1-small-bf16-model-path",
type=Path,
-default="/data/t5/small/google__t5-v1_1-small_bf16.gguf",
+default="/shark-dev/data/t5/small/google__t5-v1_1-small_bf16.gguf",
help="Google T5 v1.1 small bfloat16 model path",
)
parser.addoption(
"--google-t5-v1-1-xxl-f32-model-path",
type=Path,
-default="/data/t5/xxl/google__t5-v1_1-xxl_f32.gguf",
+default="/shark-dev/data/t5/xxl/google__t5-v1_1-xxl_f32.gguf",
help="Google T5 v1.1 XXL float32 model path",
)
parser.addoption(
"--google-t5-v1-1-xxl-bf16-model-path",
type=Path,
-default="/data/t5/xxl/google__t5-v1_1-xxl_bf16.gguf",
+default="/shark-dev/data/t5/xxl/google__t5-v1_1-xxl_bf16.gguf",
help="Google T5 v1.1 XXL bfloat16 model path",
)
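
All four defaults in this hunk change only their root directory. One possible way to avoid editing every path when the runner changes is to derive the defaults from a single root. This is a sketch under assumptions: the `SHARKTANK_DATA_ROOT` variable and the `data_path` helper are hypothetical and do not exist in this PR; only the `/shark-dev/data` default comes from the diff.

```python
import os
from pathlib import Path

# Hypothetical environment variable name; not defined anywhere in this PR.
DATA_ROOT_ENV = "SHARKTANK_DATA_ROOT"
# Matches the new root used by the defaults in this diff.
DEFAULT_DATA_ROOT = "/shark-dev/data"


def data_path(relative, env=None):
    """Join `relative` onto the data root, honoring an optional override."""
    env = os.environ if env is None else env
    # Fall back to the default when the variable is unset or empty.
    root = Path(env.get(DATA_ROOT_ENV) or DEFAULT_DATA_ROOT)
    return root / relative
```

An option could then use `default=data_path("t5/small/google__t5-v1_1-small_f32.gguf")`, and a runner with a different mount would set the variable instead of patching `conftest.py`.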

8 changes: 6 additions & 2 deletions sharktank/tests/models/llama/quark_parity_test.py
@@ -18,7 +18,7 @@
class QuarkParityTest(unittest.TestCase):
def setUp(self):
super().setUp()
-self.path_prefix = Path("/shark-dev/quark_test")
+self.path_prefix = Path("/shark-cache/quark_test")

@with_quark_data
def test_compare_against_quark(self):
@@ -54,7 +54,7 @@ def test_compare_against_quark(self):
"sharktank.examples.paged_llm_v1",
"The capitol of Texas is",
f"--irpa-file={self.path_prefix}/fp8_bf16_weight.irpa",
-f"--tokenizer-config-json=/data/llama3.1/8b/tokenizer.json",
+f"--tokenizer-config-json=/shark-dev/data/llama3.1/weights/8b/tokenizer.json",
"--fake-quant",
"--attention-kernel=torch",
"--activation-dtype=bfloat16",
@@ -69,6 +69,10 @@ def test_compare_against_quark(self):
command, shell=True, capture_output=True, cwd=sharktank_dir
)

+f_ = open("/shark-cache/quark_test/test0.txt", "w+")
+f_.write(str(proc))
+f_.close()
+
ours = dict()
with safe_open(our_path, "pytorch") as st:
for key in st.keys():
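
The added debug lines in this hunk write `str(proc)` to a fixed file with a manual `open`/`close`. If the dump is meant to stay, a context manager closes the file even when the write raises, and splitting out the streams keeps the log readable. A minimal sketch, assuming `proc` is the `subprocess.CompletedProcess` returned by `subprocess.run(..., capture_output=True)`; `dump_completed_process` is a hypothetical helper, not part of this PR.

```python
from pathlib import Path


def dump_completed_process(proc, log_path):
    """Write a CompletedProcess's return code, stdout, and stderr to log_path."""
    log_path = Path(log_path)
    # Create the parent directory so a fresh runner mount does not fail the write.
    log_path.parent.mkdir(parents=True, exist_ok=True)

    def as_text(stream):
        # capture_output=True yields bytes unless text=True was also passed.
        if stream is None:
            return ""
        if isinstance(stream, bytes):
            return stream.decode(errors="replace")
        return stream

    # The with-block closes the file even if a write raises.
    with log_path.open("w", encoding="utf-8") as f:
        f.write(f"returncode: {proc.returncode}\n")
        f.write("--- stdout ---\n")
        f.write(as_text(proc.stdout))
        f.write("\n--- stderr ---\n")
        f.write(as_text(proc.stderr))
```

In the test this would be called as `dump_completed_process(proc, self.path_prefix / "test0.txt")`, which also avoids repeating the hard-coded directory.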