Commit

Add retry mechanism for pushing eval results (#252)
The Hub throws 403 errors if there are too many concurrent pushes to the same repo, so we need a retry mechanism when that happens.
lewtun authored Feb 9, 2025
1 parent 90c1bfe commit 9be2e9a
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions slurm/evaluate.slurm
@@ -66,7 +66,15 @@ OUTPUT_FILEPATHS=$(find $OUTPUT_DIR/results/ -type f \( -name "*.json" \))
 for filepath in $OUTPUT_FILEPATHS; do
     echo "Uploading $filepath to Hugging Face Hub..."
     filename=$(basename -- "$filepath")
-    huggingface-cli upload --repo-type space --private $LM_EVAL_REPO_ID $filepath $OUTPUT_DIR/$filename
+    for attempt in {1..20}; do
+        if huggingface-cli upload --repo-type space --private $LM_EVAL_REPO_ID $filepath $OUTPUT_DIR/$filename; then
+            echo "Upload succeeded for $filepath"
+            break
+        else
+            echo "Upload failed for $filepath. Attempt $attempt of 20. Retrying in 5 seconds..."
+            sleep 5
+        fi
+    done
 done

 echo "Uploading details to Hugging Face Hub..."
@@ -78,4 +86,4 @@ python src/open_r1/utils/upload_details.py --data_files $DETAILS_FILEPATHS --hub
 echo "Cleaning up ..."
 rm -rf $OUTPUT_DIR

-echo "Done!"
+echo "Done!"
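
The retry loop added in this diff could also be factored into a small reusable function. The sketch below is not part of the commit: the name retry and its two leading arguments (maximum attempts, delay in seconds) are hypothetical. Unlike the inline loop, which falls through silently once all 20 attempts have failed, this version returns a non-zero status so the caller can decide what to do.

```shell
# Hypothetical helper, not part of the commit.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Example (mirrors the loop in the diff):
#   retry 20 5 huggingface-cli upload --repo-type space --private \
#       "$LM_EVAL_REPO_ID" "$filepath" "$OUTPUT_DIR/$filename"
retry() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Run the command exactly as given; stop on the first success.
        if "$@"; then
            return 0
        fi
        echo "Attempt $attempt of $max_attempts failed: $*" >&2
        # No need to wait after the final attempt.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    # Every attempt failed; report that to the caller.
    return 1
}
```

With a helper like this, the upload step could append something like || echo "Giving up on $filepath" >&2 to make a permanent failure visible in the job log.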
