Merge branch 'main' into 4782-standardize-fields-shown-in-the-cases-listed-in-cited-by-authorities-and-related-cases
quevon24 authored Mar 11, 2025
2 parents 4bcaf9c + eccc7b2 commit 11173fb
Showing 2 changed files with 20 additions and 12 deletions.
2 changes: 1 addition & 1 deletion cl/api/templates/recap-api-docs-vlatest.html
@@ -229,7 +229,7 @@ <h4 id="dockets">Purchasing Dockets</h4>
<h2 id="recap-upload">RECAP Upload API <small><code>{% url "processingqueue-list" version=version %}</code></small></h2>
<p>This API is used by the RECAP extension and a handful of special partners to upload PACER content to the RECAP Archive. This API is not available to the public. If you have a collection of PACER data you wish to donate to the RECAP Archive so it is permanently available to the public, please <a href="{% url "contact" %}">get in touch</a>.
</p>
-<p>We describe the process for completing these uploads below, and you can see examples of them in <a href="https://github.com/freelawproject/courtlistener/blob/main/cl/recap/tests.py">CourtListener's automated test suite</a>. Uploads to these endpoints should be done using HTTP <code>POST</code> requests and multipart form data.
+<p>We describe the process for completing these uploads below, and you can see examples of them in <a href="https://github.com/freelawproject/courtlistener/blob/main/cl/recap/tests/tests.py">CourtListener's automated test suite</a>. Uploads to these endpoints should be done using HTTP <code>POST</code> requests and multipart form data.
</p>
<p>When you make an upload, you create a <code>Processing Queue</code> object in the CourtListener system. This object will be returned in the HTTP response to your upload, so you will know its ID. This object will contain the fields you uploaded, and the following fields will be populated as the item is processed:
</p>
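As a side note on the upload flow described in the docs above, here is a minimal sketch of one such multipart POST using Python's requests library. The endpoint path, token, and field names are illustrative assumptions, not confirmed by this page; the real route is the processingqueue-list URL shown in the heading, and each upload type documents its own required fields.

import requests

API_TOKEN = "your-api-token"  # assumption: token auth, as used elsewhere in the API
URL = "https://www.courtlistener.com/api/rest/v4/recap/"  # assumed endpoint path

with open("gov.uscourts.dcd.178502.docket.html", "rb") as f:
    response = requests.post(
        URL,
        headers={"Authorization": f"Token {API_TOKEN}"},
        # Regular form fields travel alongside the file as multipart form data.
        data={
            "court": "dcd",             # illustrative PACER court ID
            "upload_type": 1,           # assumption: an integer code per upload type
            "pacer_case_id": "178502",  # illustrative case ID
        },
        files={"filepath_local": f},    # assumed name of the file field
    )

# Per the docs above, the response is the new Processing Queue object, so its
# ID (and the fields populated during processing) can be read from the JSON body.
pq = response.json()
print(pq["id"])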
30 changes: 19 additions & 11 deletions cl/corpus_importer/signals.py
@@ -52,12 +52,19 @@ def update_latest_case_id_and_schedule_iquery_sweep(docket: Docket) -> None:
tasks_to_schedule = (
incoming_pacer_case_id - iquery_pacer_case_id_current
)
+        logger.info(
+            "Found %s tasks to schedule for pacer case IDs ranging from %s to %s.",
+            tasks_to_schedule,
+            iquery_pacer_case_id_current,
+            incoming_pacer_case_id,
+        )
if tasks_to_schedule > 10_800:
-            # Considering a Celery countdown of 1 second and a visibility_timeout
-            # of 6 hours, the maximum countdown time should be set to 21,600 to
-            # avoid a celery runaway. It's safer to abort if more than 10,800
-            # tasks are attempted to be scheduled. This could indicate an issue
-            # with retrieving the highest_known_pacer_case_id or a loss of the
+            # Considering a Celery countdown of 1 second applied via
+            # throttle_task and a visibility_timeout of 6 hours, the maximum
+            # countdown time should be set to 21,600 to avoid a celery runaway.
+            # It's safer to abort if more than 10,800 tasks are attempted to be
+            # scheduled. This could indicate an issue with retrieving the
+            # highest_known_pacer_case_id or a loss of the
# iquery_pacer_case_id_current for the court in Redis.
logger.error(
"Tried to schedule more than 10,800 iquery pages to scrape for "
@@ -66,20 +73,21 @@ def update_latest_case_id_and_schedule_iquery_sweep(docket: Docket) -> None:
)
release_redis_lock(r, update_lock_key, lock_value)
return None
-        task_scheduled_countdown = 0
+        task_to_schedule_count = 0
while iquery_pacer_case_id_current + 1 < incoming_pacer_case_id:
iquery_pacer_case_id_current += 1
-            task_scheduled_countdown += 1
-            # Schedule the next task with a 1-second countdown increment
+            task_to_schedule_count += 1
+            # Schedule the next task.
make_docket_by_iquery_sweep.apply_async(
args=(court_id, iquery_pacer_case_id_current),
kwargs={"skip_iquery_sweep": True},
-                countdown=task_scheduled_countdown,
queue=settings.CELERY_IQUERY_QUEUE,
)
            logger.info(
-                f"Enqueued iquery docket case ID: {iquery_pacer_case_id_current} "
-                f"for court {court_id} with countdown {task_scheduled_countdown}"
+                "Enqueued %s iquery docket with case ID: %s for court %s",
+                task_to_schedule_count,
+                iquery_pacer_case_id_current,
+                court_id,
            )

# Update the iquery_pacer_case_id_current in Redis
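The 10,800 cap that both versions of the comment reach follows from simple arithmetic; here is a small sketch of that reasoning, assuming the 6-hour visibility_timeout and the roughly 1-second-per-task spacing the comment describes:

# Back-of-the-envelope check of the cap in the comment above. Assumptions:
# visibility_timeout is 6 hours, and throttle_task spaces tasks about 1
# second apart, so the last task's effective delay grows by ~1 second per
# scheduled task.
VISIBILITY_TIMEOUT = 6 * 60 * 60  # 21,600 seconds
SECONDS_PER_TASK = 1              # spacing applied via throttle_task

# If a task's delay exceeds the visibility timeout, the broker can redeliver
# it before it ever runs, duplicating work (the "celery runaway" above).
# Aborting at half the timeout leaves a 2x safety margin.
safe_task_cap = VISIBILITY_TIMEOUT // (2 * SECONDS_PER_TASK)
assert safe_task_cap == 10_800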
