ValueError: The model's max seq len (163840) is larger than the maximum number of tokens that can be stored in KV cache (13360). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
#83 · Open
ArtificialZeng opened this issue on Aug 15, 2024 · 3 comments
Traceback (most recent call last):
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/cli/deploy.py", line 5, in
[rank0]: deploy_main()
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/utils/run_utils.py", line 32, in x_main
[rank0]: result = llm_x(args, **kwargs)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/llm/deploy.py", line 773, in llm_deploy
[rank0]: llm_engine, template = prepare_vllm_engine_template(args, use_async=True)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/llm/utils/vllm_utils.py", line 542, in prepare_vllm_engine_template
[rank0]: llm_engine = get_vllm_engine(
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/swift/llm/utils/vllm_utils.py", line 116, in get_vllm_engine
[rank0]: llm_engine = llm_engine_cls.from_engine_args(engine_args)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 471, in from_engine_args
[rank0]: engine = cls(
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 381, in init
[rank0]: self.engine = self._init_engine(*args, **kwargs)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 552, in _init_engine
[rank0]: return engine_class(*args, **kwargs)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 263, in init
[rank0]: self._initialize_kv_caches()
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 375, in _initialize_kv_caches
[rank0]: self.model_executor.initialize_cache(num_gpu_blocks, num_cpu_blocks)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 62, in initialize_cache
[rank0]: self._run_workers("initialize_cache",
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 192, in _run_workers
[rank0]: driver_worker_output = driver_worker_method(*args, **kwargs)
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/worker/worker.py", line 214, in initialize_cache
[rank0]: raise_if_cache_size_invalid(num_gpu_blocks,
[rank0]: File "/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/site-packages/vllm/worker/worker.py", line 374, in raise_if_cache_size_invalid
[rank0]: raise ValueError(
[rank0]: ValueError: The model's max seq len (163840) is larger than the maximum number of tokens that can be stored in KV cache (13360). Try increasing gpu_memory_utilization or decreasing max_model_len when initializing the engine.
ERROR 08-15 17:01:04 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3015877 died, exit code: -15
ERROR 08-15 17:01:04 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3015878 died, exit code: -15
ERROR 08-15 17:01:04 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 3015880 died, exit code: -15
INFO 08-15 17:01:04 multiproc_worker_utils.py:123] Killing local vLLM worker processes
/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 7 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
/home/apus/mambaforge/envs/vllm_deepseekv2/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
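For context on why the engine aborts: after loading the model weights, vLLM preallocates the KV cache from whatever remains of the gpu_memory_utilization fraction of GPU memory, then verifies that at least one sequence of max_model_len tokens would fit in that cache. Here the cache holds only 13360 tokens, while DeepSeek-V2's config declares a 163840-token context, so startup fails. Below is a minimal sketch of that check, mirroring raise_if_cache_size_invalid from the traceback above; the block count is illustrative, derived from 13360 tokens at vLLM's default block size of 16.

```python
# Illustrative reproduction of vLLM's startup check
# (raise_if_cache_size_invalid in vllm/worker/worker.py).
BLOCK_SIZE = 16         # vLLM's default KV-cache block size, in tokens
num_gpu_blocks = 835    # hypothetical: 835 blocks * 16 tokens = 13360 tokens
max_model_len = 163840  # DeepSeek-V2's configured max sequence length

max_cache_tokens = num_gpu_blocks * BLOCK_SIZE
if max_model_len > max_cache_tokens:
    raise ValueError(
        f"The model's max seq len ({max_model_len}) is larger than the "
        f"maximum number of tokens that can be stored in KV cache "
        f"({max_cache_tokens}). Try increasing gpu_memory_utilization "
        "or decreasing max_model_len when initializing the engine.")
```

Either knob resolves it: lower max_model_len below the reported cache capacity, or raise gpu_memory_utilization so more blocks fit.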
Hi @ArtificialZeng @EdWangLoDaSc, this is a vLLM issue. You need to pass --max-model-len with a value below your KV cache capacity (the 13360 tokens reported in the title of this ticket).
This worked for me:

```
python -m vllm.entrypoints.openai.api_server --trust-remote-code --model deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct --port 9000 --host 0.0.0.0 --max-model-len 80000
```
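For anyone hitting this through the vLLM Python API rather than the CLI, the same two knobs from the error message are exposed on the constructor. A minimal sketch, assuming the weights themselves fit in GPU memory; the 80000 value mirrors the command above, and any value below your reported KV cache capacity works:

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
    trust_remote_code=True,
    # Cap the context length instead of inheriting the
    # 163840-token default from the model config...
    max_model_len=80000,
    # ...and/or leave more GPU memory for the KV cache (default is 0.9).
    gpu_memory_utilization=0.95,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```

If you are deploying through swift deploy as in the traceback above, it should expose an equivalent max_model_len argument that is forwarded to vLLM; check swift deploy --help on your version for the exact flag name.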