
Load testing the speech‐to‐text workflow

Andrew Berger edited this page Dec 18, 2024 · 2 revisions

Test collection

This argo-qa collection contains all the speech-to-text samples intended for testing. If you create new items outside of integration tests, please add them to that collection.

Items to use for batch testing

Within that collection, 22 items are tagged specifically so they can be rerun as a batch. These items have been through Whisper multiple times. Sometimes Whisper has failed while the batch was running, and sometimes the batch has made it all the way through. Processing the full batch takes a few hours.

Running text extraction as a bulk action

(General documentation on how to run bulk actions is on Consul.)

The high-level outline for running text extraction as a bulk action is:

  1. "Open" the batch of items
  2. Run text extraction
  3. See if anything breaks

The easiest way to run the bulk action is:

  1. Click on the link to the tagged items above
  2. From within the search results window, click on "Bulk actions". This is the link just above the results list. (Do not click on the "Bulk actions" link in the Argo header area - that link will not save your search results for use later.)
  3. From the bulk actions page, choose "New bulk action"
  4. Choose "Open new object versions" (under "Modify objects via form")
  5. Click on the button labeled "Populate with previous search" - this should automatically enter the list of tagged druids from your search results list. You can also paste in the list of druids yourself.
  6. Run the bulk action
  7. Wait until all the items have been opened, but don't leave the bulk actions page yet. Refresh the page to see the progress of the bulk action. You may see an error if an item was already open; you can ignore it. The important thing is to make sure all items are opened before running text extraction.
  8. Once all the items are opened, choose "New bulk action" again.
  9. This time choose "Start text extraction"
  10. Click on "Populate with previous search" - if you haven't run any other searches, this should auto-fill the same druid list again
  11. Run the bulk action
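
If you paste the druid list yourself instead of using "Populate with previous search", it can help to clean the list first. The following is a minimal sketch, not part of Argo: the function name `normalize_druids` is hypothetical, and the druid shape it checks (an optional `druid:` prefix followed by two letters, three digits, two letters, four digits) is an assumption you should adjust if your identifiers differ.

```python
import re

# Assumed druid shape: optional "druid:" prefix, then 2 letters, 3 digits,
# 2 letters, 4 digits. This pattern is an assumption, not taken from Argo.
DRUID_RE = re.compile(r"^(?:druid:)?([a-z]{2}\d{3}[a-z]{2}\d{4})$")


def normalize_druids(pasted: str) -> tuple[list[str], list[str]]:
    """Return (unique druids in first-seen order, tokens that did not match)."""
    seen = set()
    clean = []
    rejected = []
    # Split on any mix of whitespace and commas so pasted columns or CSV both work.
    for token in re.split(r"[\s,]+", pasted.strip()):
        if not token:
            continue
        match = DRUID_RE.match(token.lower())
        if match is None:
            rejected.append(token)
            continue
        druid = "druid:" + match.group(1)
        if druid not in seen:
            seen.add(druid)
            clean.append(druid)
    return clean, rejected
```

Because the tagged batch contains 22 items, checking that the cleaned list has 22 entries and that the rejected list is empty is a quick sanity check before pasting the same list into both bulk actions.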

Once this second bulk action runs, it starts the speech-to-text workflow on all the items in the druid list. You can monitor the progress in Argo by checking the item statuses. Each item will change to "Accessioned" when it's done processing.
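
Since the batch takes a few hours, the monitoring step could be scripted rather than checked by hand. The sketch below is only an illustration under stated assumptions: `get_status` is a hypothetical callable that you would supply to look up one item's status, and the polling interval and six-hour timeout are arbitrary defaults. It does not call any real Argo API, and it only checks for the target status, so an item whose workflow has failed will simply be reported as unfinished.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    druids: Iterable[str],
    get_status: Callable[[str], str],  # hypothetical lookup supplied by the caller
    target: str = "Accessioned",
    interval_s: float = 60.0,          # assumed polling interval
    timeout_s: float = 6 * 3600,       # assumed upper bound for "a few hours"
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> list[str]:
    """Poll until every druid reports `target`; return the druids that never did."""
    pending = set(druids)
    deadline = clock() + timeout_s
    while pending:
        # Keep only the items that have not reached the target status yet.
        pending = {d for d in pending if get_status(d) != target}
        if not pending or clock() >= deadline:
            break
        sleep(interval_s)
    return sorted(pending)
```

An empty return value means every item reached "Accessioned" before the timeout. Any druids returned are the ones to inspect by hand in Argo.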