#18737: Add blackhole nightly and demo test workflows #18741

Open
wants to merge 7 commits into
base: main
Changes from 3 commits
75 changes: 75 additions & 0 deletions .github/workflows/blackhole-demo-tests-impl.yaml
@@ -0,0 +1,75 @@
name: "[internal] Blackhole Demo tests impl"

on:
  workflow_call:

jobs:
  single-card-demo-tests:
    strategy:
      fail-fast: false
      matrix:
        test-group: [
          {
            name: "BH_functionality",
            arch: blackhole,
            runs-on: ["cloud-virtual-machine", "BH", "in-service"],
            cmd:
          },
          {
            name: "BH_performance",
            arch: blackhole,
            runs-on: ["BH", "bare-metal", "in-service"],
            cmd:
          }
        ]
    name: ${{ matrix.test-group.name }}
    env:
      ARCH_NAME: ${{ matrix.test-group.arch }}
      LOGURU_LEVEL: INFO
      LD_LIBRARY_PATH: ${{ github.workspace }}/build/lib
    runs-on: ${{ matrix.test-group.runs-on }}
    steps:
      - uses: tenstorrent/tt-metal/.github/actions/checkout-with-submodule-lfs@main
      - name: Enable Performance mode
        if: ${{ matrix.test-group.name == 'BH_performance' }}
        run: |
          sudo cpupower frequency-set -g performance
Contributor Author

Michael: not sure if possible

Contributor Author

I see `cpupower` is already available on the bare-metal machines, so this shouldn't require new deps:

ubuntu@yyzo-bh-gh03:~$ which cpupower
/usr/bin/cpupower
ubuntu@yyzo-bh-gh03:~$
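
One way to make the step tolerant of runners that lack `cpupower` is to probe for it before calling it. A minimal sketch, assuming a warning annotation is acceptable in place of a job failure; the fallback branch and its message are illustrative and not part of the PR:

```yaml
- name: Enable Performance mode
  if: ${{ matrix.test-group.name == 'BH_performance' }}
  run: |
    # Probe for the binary first so a missing tool does not fail the job.
    if command -v cpupower >/dev/null 2>&1; then
      sudo cpupower frequency-set -g performance
    else
      # Hypothetical fallback: surface a warning annotation and continue.
      echo "::warning::cpupower not found; leaving CPU governor unchanged"
    fi
```

Whether a silent fallback is desirable for a perf job is a judgment call: failing loudly may be preferable if benchmark numbers are only meaningful under the `performance` governor.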

Contributor

If these are Blackhole-only workflows, do we need the `if` statement?
@skhorasganiTT just FYI, these 'BMs' are consumer-grade MSI gaming machines, so perf may not be what you expect.

Contributor Author

The existing demo test flow distinguishes between single-card perf and functionality demo tests, so I've preserved that split for the BH equivalent. If we're only running demo perf, we can drop the `if` statement (and the BH_functionality test group).

      - name: Set up dynamic env vars for build
        run: |
          echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
      - uses: ./.github/actions/prepare-metal-run
      - uses: ./.github/actions/install-python-deps
      - name: Run demo regression tests
        timeout-minutes: 70
        run: |
          source ${{ github.workspace }}/python_env/bin/activate
          cd $TT_METAL_HOME
          export PYTHONPATH=$TT_METAL_HOME
          source ${{ github.workspace }}/tests/scripts/single_card/run_single_card_demo_tests.sh
          ${{ matrix.test-group.cmd }}
      - name: Save environment data
        if: ${{ matrix.test-group.name == 'BH_performance' && !cancelled() }}
        env:
          PYTHONPATH: ${{ github.workspace }}
        run: |
          source ${{ github.workspace }}/python_env/bin/activate
          python3 .github/scripts/data_analysis/create_benchmark_with_environment_json.py
      - name: Upload benchmark data
        if: ${{ matrix.test-group.name == 'BH_performance' && !cancelled() }}
        uses: ./.github/actions/upload-data-via-sftp
        with:
          ssh-private-key: ${{ secrets.SFTP_BENCHMARK_WRITER_KEY }}
          sftp-batchfile: .github/actions/upload-data-via-sftp/benchmark_data_batchfile.txt
          username: ${{ secrets.SFTP_BENCHMARK_WRITER_USERNAME }}
          hostname: ${{ secrets.SFTP_BENCHMARK_WRITER_HOSTNAME }}
      - uses: ./.github/actions/upload-artifact-with-job-uuid
        timeout-minutes: 10
        if: ${{ !cancelled() }}
        with:
          path: |
            generated/test_reports/
          prefix: "test_reports_"
      - name: Disable Performance mode
        if: ${{ matrix.test-group.name == 'BH_performance' }}
        run: |
          sudo cpupower frequency-set -g ondemand
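
As written, the `Disable Performance mode` step carries no status check in its `if`, so GitHub Actions skips it when an earlier step fails, and the runner would stay on the `performance` governor. A minimal sketch of one way to restore the governor regardless of outcome, assuming `ondemand` (taken from the diff, not verified against the hardware) is the governor these machines should return to:

```yaml
- name: Disable Performance mode
  # always() makes the step run even after a failed or cancelled earlier step.
  if: ${{ always() && matrix.test-group.name == 'BH_performance' }}
  run: |
    sudo cpupower frequency-set -g ondemand
```

This mirrors the `!cancelled()` guards already used on the upload steps, but uses `always()` so the cleanup also runs on cancellation.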
17 changes: 17 additions & 0 deletions .github/workflows/blackhole-demo-tests.yaml
@@ -0,0 +1,17 @@
name: "(Blackhole) Demo tests"

on:
  workflow_dispatch:
  # workflow_call:
  # schedule:
  #   - cron: "0 */6 * * 1,2,3,4,5"
  #   - cron: "0 */4 * * 0,6"

jobs:
  build-artifact:
    uses: ./.github/workflows/build-artifact.yaml
    secrets: inherit
  single-card-demo-tests:
    needs: build-artifact
    secrets: inherit
    uses: ./.github/workflows/blackhole-demo-tests-impl.yaml
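
The commented-out `schedule` trigger in this file is easier to review with the cron fields spelled out. A sketch of what enabling it might look like, assuming the two expressions from the diff are kept unchanged; GitHub Actions evaluates `schedule` cron expressions in UTC:

```yaml
on:
  workflow_dispatch:
  schedule:
    # Minute 0 of every 6th hour (00:00, 06:00, 12:00, 18:00 UTC), Monday to Friday.
    - cron: "0 */6 * * 1,2,3,4,5"
    # Minute 0 of every 4th hour (00:00, 04:00, ..., 20:00 UTC), Sunday and Saturday.
    - cron: "0 */4 * * 0,6"
```

Scheduled workflows run only from the default branch, so this trigger would have no effect until the file is merged.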
48 changes: 48 additions & 0 deletions .github/workflows/blackhole-nightly-tests-impl.yaml
@@ -0,0 +1,48 @@
name: "[internal] Blackhole nightly tests impl"

on:
  workflow_call:

jobs:
  nightly-bh-models:
    strategy:
      # Do not fail-fast because we need to ensure all tests go to completion
      # so we try not to get hanging machines
      fail-fast: false
      matrix:
        card: [BH]
        model: [whisper]
    name: Nightly ${{ matrix.card }} ${{ matrix.model }}
    env:
      ARCH_NAME: blackhole
      LOGURU_LEVEL: INFO
      LD_LIBRARY_PATH: ${{ github.workspace }}/build/lib
    runs-on: ["cloud-virtual-machine", "in-service", "${{ matrix.card }}"]
    steps:
      - uses: tenstorrent/tt-metal/.github/actions/checkout-with-submodule-lfs@main
      - name: Set up dynamic env vars for build
        run: |
          echo "TT_METAL_HOME=$(pwd)" >> $GITHUB_ENV
      - uses: ./.github/actions/prepare-metal-run
      - uses: ./.github/actions/install-python-deps
      - name: Run frequent reg tests scripts
        timeout-minutes: 30
        # Llama3 has a single pytest for multiple llama models, hence it requires calling it multiple times.
        # Due to host OOM issues in CI vm, we currently only run llama-1B in the model matrix.
        run: |
          source ${{ github.workspace }}/python_env/bin/activate
          cd $TT_METAL_HOME
          export PYTHONPATH=$TT_METAL_HOME
          if [[ "${{ matrix.model }}" == *"llama3"* ]]; then
            pytest -n auto tests/nightly/single_card/llama3 -k ${{ matrix.model }}
          fi
          if [[ "${{ matrix.model }}" != *"llama3"* ]]; then
            pytest -n auto tests/nightly/single_card/${{ matrix.model }}
          fi
      - uses: ./.github/actions/upload-artifact-with-job-uuid
        timeout-minutes: 10
        if: ${{ !cancelled() }}
        with:
          path: |
            generated/test_reports/
          prefix: "test_reports_"
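
The two complementary `if` blocks in the test step can be folded into a single `if`/`else`. A sketch with the same behavior, assuming the matrix value is passed through a step-level environment variable (`MODEL` is a name chosen here for illustration, not from the PR) so that it is not interpolated directly into the script:

```yaml
- name: Run frequent reg tests scripts
  timeout-minutes: 30
  env:
    # Hypothetical variable name; keeps the matrix value out of the script text.
    MODEL: ${{ matrix.model }}
  run: |
    source "${GITHUB_WORKSPACE}/python_env/bin/activate"
    cd "$TT_METAL_HOME"
    export PYTHONPATH="$TT_METAL_HOME"
    if [[ "$MODEL" == *"llama3"* ]]; then
      # Llama3 shares one pytest directory; select the model with -k.
      pytest -n auto tests/nightly/single_card/llama3 -k "$MODEL"
    else
      pytest -n auto "tests/nightly/single_card/$MODEL"
    fi
```

With the current matrix (`model: [whisper]`) only the `else` branch is exercised; the `llama3` branch matters only if a Llama3 entry is added to the matrix.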
16 changes: 16 additions & 0 deletions .github/workflows/blackhole-nightly-tests.yaml
@@ -0,0 +1,16 @@
name: "(Blackhole) Blackhole nightly tests"

on:
  workflow_dispatch:
  # workflow_call:
  # schedule:
  #   - cron: "0 */6 * * *"

jobs:
  build-artifact:
    uses: ./.github/workflows/build-artifact.yaml
    secrets: inherit
  fd-nightly:
    needs: build-artifact
    uses: ./.github/workflows/blackhole-nightly-tests-impl.yaml
    secrets: inherit