fix(worker): per-commit concurrency limit for upload_finisher tasks#755
thomasrockhu-codecov wants to merge 4 commits into main
Add a Redis INCR-based concurrency gate: only MAX_CONCURRENT_FINISHERS_PER_COMMIT (3) tasks can actively work on a given commit at any time. Excess tasks exit in milliseconds, freeing their worker slots immediately. The counter is decremented in a finally block with a 660s TTL safety net. Made-with: Cursor
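The gate described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FakeRedis`, `run_finisher`, and the key name `upload_finisher_active:<commit>` are hypothetical stand-ins, and refreshing the TTL on every increment is an assumption. Only the limit (3), the 660s TTL, and the decrement in a `finally` block come from the message above.

```python
import threading

MAX_CONCURRENT_FINISHERS_PER_COMMIT = 3  # value stated in the commit message
COUNTER_TTL_SECONDS = 660  # safety net if a worker dies before decrementing


class FakeRedis:
    """In-memory stand-in exposing only incr/decr/expire (hypothetical helper)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()
        self.ttls = {}

    def incr(self, key):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

    def decr(self, key):
        with self._lock:
            self._data[key] = self._data.get(key, 0) - 1
            return self._data[key]

    def expire(self, key, seconds):
        # Only records the TTL; a real Redis would delete the key on expiry.
        self.ttls[key] = seconds
        return True

    def value(self, key):
        return self._data.get(key, 0)


def run_finisher(redis, commit_id, work):
    """Run work() only if fewer than the limit are already active for the commit."""
    key = f"upload_finisher_active:{commit_id}"  # hypothetical key name
    active = redis.incr(key)  # atomic in Redis, so concurrent tasks see distinct counts
    redis.expire(key, COUNTER_TTL_SECONDS)  # assumption: TTL refreshed on every entry
    try:
        if active > MAX_CONCURRENT_FINISHERS_PER_COMMIT:
            return None  # over the limit: exit right away and free the worker slot
        return work()
    finally:
        redis.decr(key)  # runs on success, early exit, and exceptions alike
```

Because the increment happens before the check, a task over the limit briefly counts itself and then releases its slot in the `finally` block, so the counter returns to its previous value either way.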
- Fix mock_redis.incr returning MagicMock instead of int (TypeError on comparison) by setting incr.return_value=1 in the conftest fixture
- Replace silent return with self.retry() when the concurrency limit is reached, so uploads are retried rather than dropped
- Update test to expect a Retry exception instead of a return value
Made-with: Cursor
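The first fix can be reproduced in isolation. The snippet below is a small sketch, assuming the gate compares the result of `incr` against an integer limit; `over_limit` and `LIMIT` are hypothetical names, not taken from the PR.

```python
from unittest.mock import MagicMock

LIMIT = 3  # hypothetical limit used only for this illustration


def over_limit(redis, key):
    # Mirrors the kind of comparison a concurrency gate would make.
    return redis.incr(key) > LIMIT


mock_redis = MagicMock()

# By default, incr() returns another MagicMock, and MagicMock's ordering
# methods return NotImplemented, so comparing it with an int raises TypeError.
try:
    over_limit(mock_redis, "counter")
    comparison_failed = False
except TypeError:
    comparison_failed = True

# Giving incr() an integer return value makes the comparison well-defined.
mock_redis.incr.return_value = 1
under_limit_result = over_limit(mock_redis, "counter")
```

Setting the default in the shared fixture means every test that touches the gate gets an integer count unless it overrides `incr.return_value` itself.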
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #755   +/-   ##
=======================================
  Coverage   92.26%   92.26%
=======================================
  Files        1304     1304
  Lines       47925    47942   +17
  Branches     1628     1628
=======================================
+ Hits        44218    44236   +18
+ Misses       3398     3397    -1
  Partials      309      309
Align the per-commit finisher cap with the intended limit and make the retry path explicitly raise so counter bookkeeping cannot fall through. Made-with: Cursor
inc_counter(UPLOAD_FINISHER_CONCURRENCY_LIMITED_COUNTER)
raise self.retry(countdown=FINISHER_BASE_RETRY_COUNTDOWN_SECONDS)
Bug: The concurrency-gating retry logic in the upload finisher can exhaust the task's retry budget in high-concurrency scenarios, causing tasks to be dropped and leading to data loss.
Severity: HIGH
Suggested Fix
Modify the self.retry() call used for concurrency gating so that it does not consume the task's limited retry budget. One option is to allow more (or unlimited) retries specifically for this waiting period. Note that in Celery, passing max_retries=None to self.retry() means "use the task default", so unlimited retries require the task's own max_retries attribute to be None. Another option is a separate rate-limiting mechanism that does not use task retries.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: apps/worker/tasks/upload_finisher.py#L285-L286
Potential issue: In scenarios with a large number of concurrent finishers for a single
commit, the concurrency-limiting mechanism uses the task's standard retry logic. The
retry call at line 286 consumes one of the five available retry attempts. Tasks that are
repeatedly blocked by the concurrency gate will exhaust their retry budget after
approximately 50 seconds (5 attempts with a 10-second countdown). Once the retries are
exhausted, Celery will drop the task, causing it to fail permanently. This results in
silent data loss, as the corresponding coverage reports are never processed and merged.
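The 50-second figure follows from simple arithmetic. The sketch below restates it, assuming a constant countdown and ignoring queueing and execution time; the function name is hypothetical and the two numbers are the ones cited in the comment above.

```python
MAX_RETRIES = 5  # retry attempts cited in the review comment
COUNTDOWN_SECONDS = 10  # countdown cited in the review comment


def seconds_until_budget_exhausted(max_retries: int, countdown_seconds: int) -> int:
    """Total wait before a task that is gated on every attempt runs out of retries."""
    return max_retries * countdown_seconds


budget = seconds_until_budget_exhausted(MAX_RETRIES, COUNTDOWN_SECONDS)
print(budget)  # 50
```

If a commit's finishers keep the gate saturated for longer than this window, a task that loses every attempt would run out of retries.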
Cursor Bugbot has reviewed your changes and found 1 potential issue.
# Only this many finisher tasks are allowed to actively work on a single commit
# at the same time. Excess tasks exit immediately, freeing their worker slot.
# A small number (>1) allows for failover if the active finisher crashes.
MAX_CONCURRENT_FINISHERS_PER_COMMIT = 3
Concurrency limit set to 3 instead of intended 10
Medium Severity
MAX_CONCURRENT_FINISHERS_PER_COMMIT is set to 3, but the PR description explicitly documents 10 as the intended value ("Limit of 10: allows reasonable parallelism while capping the stampede") and the reviewer also suggested 10. A limit of 3 is unnecessarily aggressive and will cause far more tasks to retry than intended, reducing throughput and increasing latency for commits with moderate parallelism.


Summary
Add a Redis-based per-commit concurrency gate to upload_finisher tasks. Only MAX_CONCURRENT_FINISHERS_PER_COMMIT (10) finisher tasks can actively work on a given commit at any time. Excess tasks schedule a retry with a short countdown instead of blocking on the lock.

Why
Large CI matrices fire one finisher task per upload via chord callbacks. All these finishers fight for the same per-commit Redis lock (UPLOAD_PROCESSING). With blocking_timeout=30s, each excess task wastes a worker slot for 30 seconds before failing, retrying with backoff, and going back to the queue. This creates a cascading worker starvation effect.

How it works
finally

Impact estimate
For a commit with N concurrent finishers:
Test plan
- test_retries_when_concurrency_limit_reached: verifies tasks over the limit raise Retry and decrement the counter
- test_proceeds_when_under_concurrency_limit: verifies tasks under the limit proceed normally and decrement on exit
- test_counter_decremented_on_exception: verifies the counter is decremented even when the task throws an exception

Note
Medium Risk
Changes upload_finisher task execution flow to use a Redis counter + retry to throttle per-commit concurrency, which can affect task throughput and retry behavior under load. Also alters which uploads are considered final/covered by the finisher by always merging reconstructed processing results and treating merged as a terminal state.

Overview
Adds a Redis-based per-commit concurrency limit to UploadFinisherTask.run_impl so only MAX_CONCURRENT_FINISHERS_PER_COMMIT tasks actively process a commit at once; excess tasks decrement the counter, increment a new metric (upload_finisher_concurrency_limited), and retry quickly instead of blocking worker slots.

Refactors finisher execution by moving the main logic into _run_impl_inner and always reconstructing/merging processing_results from ProcessingState to ensure the finisher considers all uploads for the commit. The idempotency check now also treats merged uploads as already-final.

Updates test infrastructure (mock_redis.incr default) and adds unit tests covering concurrency-limit retrying, normal operation under the limit, and guaranteed counter decrement on exceptions.

Written by Cursor Bugbot for commit b14d01b.
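The retry-based flow summarized in the Overview can be sketched as below. This is an illustration under stated assumptions, not the PR's code: `Retry` is a local stand-in for the exception Celery raises on retry, `run_impl` takes its collaborators as plain arguments, `metrics` is a list standing in for the metric counter, and the key name is hypothetical. The limit of 10 and the metric name come from the description above.

```python
from unittest.mock import MagicMock

MAX_CONCURRENT_FINISHERS_PER_COMMIT = 10  # value from the PR description


class Retry(Exception):
    """Local stand-in for the exception raised when a task is retried."""


def run_impl(redis, commit_id, inner, metrics):
    """Gate inner() behind a per-commit counter; over-limit tasks retry."""
    key = f"upload_finisher_active:{commit_id}"  # hypothetical key name
    active = redis.incr(key)
    if active > MAX_CONCURRENT_FINISHERS_PER_COMMIT:
        # Release the slot we just took, record the event, then raise so
        # control cannot fall through into the guarded section below.
        redis.decr(key)
        metrics.append("upload_finisher_concurrency_limited")
        raise Retry()
    try:
        return inner()
    finally:
        redis.decr(key)  # exactly one decrement per admitted task


# Usage with a mocked Redis client, in the spirit of the test plan.
redis = MagicMock()
redis.incr.return_value = 11  # pretend ten finishers are already active
metrics = []
try:
    run_impl(redis, "abc123", lambda: "merged", metrics)
    was_retried = False
except Retry:
    was_retried = True
```

Raising before entering the `try` block keeps the over-limit path and the normal path from both decrementing, so each call releases the counter exactly once.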