Commit b289316
chore(datasets): improve logging and retry logic for MPUs (#387)
This adds some basic retry logic for multipart upload parts so we don't
choke immediately if a single part fails for any reason. It also adds
some decidedly jank logging to update the end user on the progress of
large uploads. Unfortunately this can create some really noisy console
logs, but the way we have Halo implemented makes doing a cleaner job of
things just annoying enough not to be worthwhile. The important thing
is that users should now get relevant, timely information about how
large uploads are progressing, or an informative error message if the
upload fails to complete for any reason.
cwetherill-ps authored Apr 18, 2022
1 parent 2769721 commit b289316
Showing 1 changed file with 32 additions and 4 deletions.
36 changes: 32 additions & 4 deletions gradient/commands/datasets.py
@@ -640,12 +640,40 @@ def _put(self, path, url, content_type, dataset_version_id=None, key=None):
                 )[0]['url']

                 chunk = f.read(part_minsize)
-                part_res = session.put(
-                    presigned_url,
-                    data=chunk,
-                    timeout=5)
+                for attempt in range(0, 5):
+                    part_res = session.put(
+                        presigned_url,
+                        data=chunk,
+                        timeout=5)
+                    if part_res.status_code == 200:
+                        break
+
+                if part_res.status_code != 200:
+                    # Why do we silence exceptions that get
+                    # explicitly raised? Mystery for the ages, but
+                    # there you have it I guess...
+                    print(f'\nUnable to complete upload of {path}')
+                    raise ApplicationError(
+                        f'Unable to complete upload of {path}')
                 etag = part_res.headers['ETag'].replace('"', '')
                 parts.append({'ETag': etag, 'PartNumber': part})
+                # This is a pretty jank way to go about multipart
+                # upload status updates, but we structure the Halo
+                # spinner to report on the number of completed
+                # tasks dispatched to the workers in the pool.
+                # Since it's more of a PITA to properly distribute
+                # this MPU among all workers than I really want to
+                # deal with, that means we can't easily plug into
+                # Halo for these updates. But we can print to
+                # console! Which again, jank and noisy, but arguably
+                # better than a task sitting forever, never either
+                # completing or emitting an error message.
+                if len(parts) % 7 == 0:  # About every 100MB
+                    print(
+                        f'\nUploaded {len(parts) * part_minsize / 10e5}MB '
+                        f'of {int(size / 10e5)}MB for '
+                        f'{path}'
+                    )

                 r = api_client.post(
                     url=mpu_url,
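For reference, here is a minimal standalone sketch of the retry-and-report pattern the diff implements, assuming a requests.Session and a presigned part URL. The helper names upload_part and report_progress are illustrative only and are not part of the gradient codebase, and RuntimeError stands in for gradient's ApplicationError:

import requests


def upload_part(session, presigned_url, chunk, max_attempts=5, timeout=5):
    """Retry a single multipart-upload part before giving up.

    Mirrors the committed change: re-issue the PUT up to max_attempts
    times, and surface an error if no attempt returns HTTP 200.
    """
    for attempt in range(max_attempts):
        part_res = session.put(presigned_url, data=chunk, timeout=timeout)
        if part_res.status_code == 200:
            # S3-style endpoints return the part's ETag in a response header.
            return part_res.headers['ETag'].replace('"', '')
    raise RuntimeError(
        f'Unable to complete upload of part after {max_attempts} attempts')


def report_progress(parts_done, part_size, total_size, every=7):
    """Print a rough progress line every `every` completed parts."""
    if parts_done % every == 0:
        print(f'Uploaded {parts_done * part_size / 1e6:.0f}MB '
              f'of {total_size / 1e6:.0f}MB')

A caller would read the file in part_minsize chunks, collect the {'ETag': ..., 'PartNumber': ...} entries returned by upload_part for the final complete-upload request, and call report_progress after each part. Adding a short sleep between retry attempts would be a natural extension, but it is not part of this commit.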
