vllm-project / vllm Public

Pull requests: vllm-project/vllm

2,216 Open 21,231 Closed

Pull requests list

[XPU] Fix spec-decode UTs under tests/v1/spec_decode intel-gpu

Related to Intel GPU

speculative-decoding v1

#38491 opened Mar 30, 2026 by yma11

Revert "[Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement" (#38139) v1

#38490 opened Mar 30, 2026 by zhewenl • Draft

[Misc] Always use forward_mulmat for Conv3d on newer versions of torch.

#38487 opened Mar 29, 2026 by ywang96

5 tasks

[Build] Add SM121 (DGX Spark / GB10) to published build targets ci/build nvidia

#38484 opened Mar 29, 2026 by JCorners68

3 tasks

fix(v1): Handle max_model_len overflow gracefully instead of crashing v1

#38483 opened Mar 29, 2026 by machov

(security) Fix SSRF in batch runner download_bytes_from_url documentation

Improvements or additions to documentation

frontend

#38482 opened Mar 29, 2026 by jperezdealgaba

Fix potential infinite loop in SonnetDataset.sample when using short input-len performance

Performance-related issues

#38481 opened Mar 29, 2026 by frankie-ys

1 of 5 tasks

[Attention Backend] TurboQuant: 2-bit KV cache compression with 4x capacity nvidia v1

#38479 opened Mar 29, 2026 by vibhavagarwal5

[Bug fix][Quantization] Fix dummy weight loading bug

Something isn't working

needs-rebase

#38478 opened Mar 29, 2026 by Josephasafg

3 of 5 tasks

Feat/usage policy tests frontend

#38477 opened Mar 29, 2026 by Csrayz

[WIP] Add TRITON_MLA_SPARSE backend for SM80 sparse MLA support documentation

Improvements or additions to documentation

nvidia rocm

Related to AMD ROCm

#38476 opened Mar 29, 2026 by haosdent • Draft

fix(p2p_nccl): free KV recv_store entries immediately to prevent OOM (#38472) kv-connector v1

#38475 opened Mar 29, 2026 by saifmb0

fix: Add apply_with_spec_decode() method to LogitBiasLogitsProcessor v1

#38469 opened Mar 29, 2026 by ranger2571

5 tasks

Add platform manual_seed_all API intel-gpu

Related to Intel GPU

nvidia performance

Performance-related issues

rocm

Related to AMD ROCm

speculative-decoding v1

#38468 opened Mar 29, 2026 by yma11

[Feature] Add apply_with_spec_decode() to LogitBiasLogitsProcessor v1

#38467 opened Mar 29, 2026 by NJX-njx

[Bugfix] Add CPU profiler summary equivalent to CUDA summary bug

Something isn't working

cpu

Related to CPU backends

nvidia

#38466 opened Mar 29, 2026 by NJX-njx

[Bugfix] Fix limit_mm_per_prompt being ignored for encoder cache profiling bug

Something isn't working

multi-modality

Related to multi-modality (#4194)

#38465 opened Mar 29, 2026 by NJX-njx

[Logging] Improve DCP error message to suggest VLLM_ATTENTION_BACKEND v1

#38464 opened Mar 29, 2026 by WJYuuuu

3 of 5 tasks

[Quantization] Consolidate experts_int8 with fp8 online quantization needs-rebase

#38463 opened Mar 29, 2026 by Josephasafg • Draft

3 of 5 tasks

[Logging] Add JIT compilation progress log for FlashInfer nvidia v1

#38462 opened Mar 29, 2026 by WJYuuuu

3 of 5 tasks

Fixed issues multi-modality

Related to multi-modality (#4194)

#38461 opened Mar 29, 2026 by rpathade • Draft

[Perf] Batch KV cache swap copies via cuMemcpyBatchAsync ready

ONLY add when PR is ready to merge/full CI is needed

#38460 opened Mar 29, 2026 by Etelis

[Docs] Add vLLM CI overview documentation for contributors documentation

Improvements or additions to documentation

#38458 opened Mar 29, 2026 by khluu

3 tasks

[ROCm] [DOC] Update the Documentation to include ROCm Nightly Wheel support documentation

Improvements or additions to documentation

rocm

Related to AMD ROCm

#38457 opened Mar 29, 2026 by tjtanaa

5 tasks

[CI] Fix online FP8 quantization materializing tensors on CPU bug

Something isn't working

needs-rebase

#38456 opened Mar 29, 2026 by haosdent
