
ValueError: The model's max seq len (163840) is larger than the maximum number of tokens that can be stored in KV cache (13360). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine. #83

Open
ArtificialZeng opened this issue Aug 15, 2024 · 3 comments

Comments

@ArtificialZeng

Traceback (most recent call last):
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/cli/deploy.py", line 5, in <module>
[rank0]: deploy_main()
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/utils/run_utils.py", line 32, in x_main
[rank0]: result = llm_x(args, **kwargs)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/llm/deploy.py", line 773, in llm_deploy
[rank0]: llm_engine, template = prepare_vllm_engine_template(args, use_async=True)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/llm/utils/vllm_utils.py", line 542, in prepare_vllm_engine_template
[rank0]: llm_engine = get_vllm_engine(
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/llm/utils/vllm_utils.py", line 116, in get_vllm_engine
[rank0]: llm_engine = llm_engine_cls.from_engine_args(engine_args)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 471, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 381, in __init__
[rank0]: self.engine = self._init_engine(*args, **kwargs)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 552, in _init_engine
[rank0]: return engine_class(*args, **kwargs)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in __init__
[rank0]: self._initialize_kv_caches()
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 375, in _initialize_kv_caches
[rank0]: self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 62, in initialize_cache
[rank0]: self._run_workers("initialize_cache",
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 192, in _run_workers
[rank0]: driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/worker/worker.py", line 214, in initialize_cache
[rank0]: raise_if_cache_size_invalid(num_gpu_blocks,
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/worker/worker.py", line 374, in raise_if_cache_size_invalid
[rank0]: raise ValueError(
[rank0]: ValueError: The model's max seq len (163840) is larger than the maximum number of tokens that can be stored in KV cache (13360). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
ERROR 08-15 17:01:04 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3015877 died, exit code: -15
ERROR 08-15 17:01:04 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3015878 died, exit code: -15
ERROR 08-15 17:01:04 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3015880 died, exit code: -15
INFO 08-15 17:01:04 multiproc_worker_utils.py:123] Killing local vLLM worker processes
/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 7 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

@EdWangLoDaSc

I have the same question.

@Jianwei-Lv

I have the same question.

@alexzender

alexzender commented Sep 24, 2024

Hi @ArtificialZeng @EdWangLoDaSc - this is a vLLM issue, not something specific to this project. You need to pass --max-model-len with a value no larger than the KV cache capacity (e.g. the 13360 tokens reported in the title of this ticket), or give the engine more GPU memory for the cache.

This worked for me -
python -m vllm.entrypoints.openai.api_server --trust-remote-code --model deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct --port 9000 --host 0.0.0.0 --max-model-len 80000
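If you are constructing the engine from Python instead of the OpenAI-compatible server, the same two knobs named in the error (max_model_len and gpu_memory_utilization) can be passed when the engine is built. Here is a minimal sketch with vLLM's offline LLM class; the concrete values and the tensor_parallel_size are placeholders for this thread's setup, not a verified configuration:

from vllm import LLM

# Sketch only: cap the context length so it fits in the KV cache,
# and/or give vLLM a larger share of GPU memory for the cache.
llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    trust_remote_code=True,
    max_model_len=8192,           # must be <= the KV cache capacity reported in the error
    gpu_memory_utilization=0.95,  # default is 0.9; raising it grows the KV cache
    tensor_parallel_size=4,       # placeholder: match the number of GPUs you shard across
)

The OpenAI server exposes the same options as CLI flags (--max-model-len and, if I remember correctly, --gpu-memory-utilization), so either knob should work there as well.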

Please see vllm-project/vllm#2418 for more details
