Commit
* 'main' of https://github.com/xorbitsai/inference:
  FEAT: support qwen2.5-coder-instruct and qwen2.5 sglang (xorbitsai#2332)
  DOC: update models for doc and readme (xorbitsai#2330)
  BUG: fix stable diffusion from dify tool (xorbitsai#2336)
  BUG: support old register llm format (xorbitsai#2335)
  FEAT: Support Qwen 2.5 (xorbitsai#2325)
  BUG: Fix CosyVoice missing output (xorbitsai#2320)
  BUG: [UI] Fix registration page bug. (xorbitsai#2315)
  BUG: modify vllm image version (xorbitsai#2312)
  BUG: modify vllm image version (xorbitsai#2311)
  FEAT: qwen2 audio (xorbitsai#2271)
  BUG: fix sampler_name for img2img (xorbitsai#2301)
  FEAT: Support yi-coder-chat (xorbitsai#2302)
  FEAT: support flux.1 image2image and inpainting (xorbitsai#2296)
  FEAT: support sdapi/img2img (xorbitsai#2293)
  ENH: Support fish speech 1.4 (xorbitsai#2295)
  FEAT: Update Qwen2-VL-Model to support flash_attention_2 implementation (xorbitsai#2289)
  FEAT: support deepseek-v2 and 2.5 (xorbitsai#2292)

# Conflicts:
#	xinference/model/audio/cosyvoice.py
Vanocore committed Sep 22, 2024
2 parents 1c5e6f2 + 5de46e9 commit e3df693
Showing 84 changed files with 4,671 additions and 1,864 deletions.
1 change: 1 addition & 0 deletions .github/workflows/python.yaml
@@ -171,6 +171,7 @@ jobs:
${{ env.SELF_HOST_PYTHON }} -m pip install -U "loguru"
${{ env.SELF_HOST_PYTHON }} -m pip install -U "natsort"
${{ env.SELF_HOST_PYTHON }} -m pip install -U "loralib"
${{ env.SELF_HOST_PYTHON }} -m pip install -U "ormsgpack"
${{ env.SELF_HOST_PYTHON }} -m pip uninstall -y opencc
${{ env.SELF_HOST_PYTHON }} -m pip uninstall -y "faster_whisper"
${{ env.SELF_HOST_PYTHON }} -m pytest --timeout=1500 \
8 changes: 4 additions & 4 deletions README.md
@@ -34,14 +34,14 @@ potential of cutting-edge AI models.
- Support speech recognition model: [#929](https://github.com/xorbitsai/inference/pull/929)
- Metrics support: [#906](https://github.com/xorbitsai/inference/pull/906)
### New Models
- Built-in support for [Qwen 2.5 Series](https://qwenlm.github.io/blog/qwen2.5/): [#2325](https://github.com/xorbitsai/inference/pull/2325)
- Built-in support for [Fish Speech V1.4](https://huggingface.co/fishaudio/fish-speech-1.4): [#2295](https://github.com/xorbitsai/inference/pull/2295)
- Built-in support for [DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5): [#2292](https://github.com/xorbitsai/inference/pull/2292)
- Built-in support for [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio): [#2271](https://github.com/xorbitsai/inference/pull/2271)
- Built-in support for [Qwen2-vl-instruct](https://github.com/QwenLM/Qwen2-VL): [#2205](https://github.com/xorbitsai/inference/pull/2205)
- Built-in support for [MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B): [#2263](https://github.com/xorbitsai/inference/pull/2263)
- Built-in support for [CogVideoX](https://github.com/THUDM/CogVideo): [#2049](https://github.com/xorbitsai/inference/pull/2049)
- Built-in support for [flux.1-schnell & flux.1-dev](https://www.basedlabs.ai/tools/flux1): [#2007](https://github.com/xorbitsai/inference/pull/2007)
- Built-in support for [MiniCPM-V 2.6](https://github.com/OpenBMB/MiniCPM-V): [#2031](https://github.com/xorbitsai/inference/pull/2031)
- Built-in support for [Kolors](https://huggingface.co/Kwai-Kolors/Kolors): [#2028](https://github.com/xorbitsai/inference/pull/2028)
- Built-in support for [SenseVoice](https://github.com/FunAudioLLM/SenseVoice): [#2008](https://github.com/xorbitsai/inference/pull/2008)
- Built-in support for [Mistral Large 2](https://mistral.ai/news/mistral-large-2407/): [#1944](https://github.com/xorbitsai/inference/pull/1944)
### Integrations
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): an LLMOps platform that enables developers (and even non-developers) to quickly build useful applications based on large language models, ensuring they are visual, operable, and improvable.
- [FastGPT](https://github.com/labring/FastGPT): a knowledge-based platform built on the LLM, offers out-of-the-box data processing and model invocation capabilities, allows for workflow orchestration through Flow visualization.
8 changes: 4 additions & 4 deletions README_zh_CN.md
@@ -31,14 +31,14 @@ Xorbits Inference(Xinference)是一个性能强大且功能全面的分布
- 支持语音识别模型: [#929](https://github.com/xorbitsai/inference/pull/929)
- 增加 Metrics 统计信息: [#906](https://github.com/xorbitsai/inference/pull/906)
### 新模型
- 内置 [Qwen 2.5 Series](https://qwenlm.github.io/blog/qwen2.5/): [#2325](https://github.com/xorbitsai/inference/pull/2325)
- 内置 [Fish Speech V1.4](https://huggingface.co/fishaudio/fish-speech-1.4): [#2295](https://github.com/xorbitsai/inference/pull/2295)
- 内置 [DeepSeek-V2.5](https://huggingface.co/deepseek-ai/DeepSeek-V2.5): [#2292](https://github.com/xorbitsai/inference/pull/2292)
- 内置 [Qwen2-Audio](https://github.com/QwenLM/Qwen2-Audio): [#2271](https://github.com/xorbitsai/inference/pull/2271)
- 内置 [Qwen2-vl-instruct](https://github.com/QwenLM/Qwen2-VL): [#2205](https://github.com/xorbitsai/inference/pull/2205)
- 内置 [MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B): [#2263](https://github.com/xorbitsai/inference/pull/2263)
- 内置 [CogVideoX](https://github.com/THUDM/CogVideo): [#2049](https://github.com/xorbitsai/inference/pull/2049)
- 内置 [flux.1-schnell & flux.1-dev](https://www.basedlabs.ai/tools/flux1): [#2007](https://github.com/xorbitsai/inference/pull/2007)
- 内置 [MiniCPM-V 2.6](https://github.com/OpenBMB/MiniCPM-V): [#2031](https://github.com/xorbitsai/inference/pull/2031)
- 内置 [Kolors](https://huggingface.co/Kwai-Kolors/Kolors): [#2028](https://github.com/xorbitsai/inference/pull/2028)
- 内置 [SenseVoice](https://github.com/FunAudioLLM/SenseVoice): [#2008](https://github.com/xorbitsai/inference/pull/2008)
- 内置 [Mistral Large 2](https://mistral.ai/news/mistral-large-2407/): [#1944](https://github.com/xorbitsai/inference/pull/1944)
### 集成
- [FastGPT](https://doc.fastai.site/docs/development/custom-models/xinference/):一个基于 LLM 大模型的开源 AI 知识库构建平台。提供了开箱即用的数据处理、模型调用、RAG 检索、可视化 AI 工作流编排等能力,帮助您轻松实现复杂的问答场景。
- [Dify](https://docs.dify.ai/advanced/model-configuration/xinference): 一个涵盖了大型语言模型开发、部署、维护和优化的 LLMOps 平台。
4 changes: 3 additions & 1 deletion doc/source/getting_started/installation.rst
@@ -44,7 +44,8 @@ Currently, supported models include:
- ``codestral-v0.1``
- ``Yi``, ``Yi-1.5``, ``Yi-chat``, ``Yi-1.5-chat``, ``Yi-1.5-chat-16k``
- ``code-llama``, ``code-llama-python``, ``code-llama-instruct``
- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``
- ``deepseek``, ``deepseek-coder``, ``deepseek-chat``, ``deepseek-coder-instruct``, ``deepseek-v2-chat``, ``deepseek-v2-chat-0628``, ``deepseek-v2.5``
- ``yi-coder``, ``yi-coder-chat``
- ``codeqwen1.5``, ``codeqwen1.5-chat``
- ``baichuan-2-chat``
- ``internlm2-chat``
@@ -56,6 +57,7 @@ Currently, supported models include:
- ``codegeex4``
- ``qwen1.5-chat``, ``qwen1.5-moe-chat``
- ``qwen2-instruct``, ``qwen2-moe-instruct``
- ``qwen2.5-instruct``
- ``gemma-it``, ``gemma-2-it``
- ``orion-chat``, ``orion-chat-rag``
- ``c4ai-command-r-v01``
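Once a server is running, any of the supported models above can also be launched programmatically. The sketch below is a hedged illustration: the ``RESTfulClient`` class and the ``launch_model`` keyword names mirror Xinference's Python client and CLI flags, but may differ across versions, so the network call is left commented out.

```python
# Hedged sketch: launch one of the supported models through Xinference's
# Python client. The client class and keyword names (RESTfulClient,
# launch_model, model_engine, ...) are assumptions based on Xinference's
# RESTful API; verify them against your installed version.

def build_launch_kwargs(model_name, engine, size_in_billions, quantization):
    """Collect launch parameters mirroring the CLI flags
    (--model-engine, --model-name, --size-in-billions, --quantization)."""
    return {
        "model_name": model_name,
        "model_engine": engine,
        "model_size_in_billions": size_in_billions,
        "quantization": quantization,
    }

kwargs = build_launch_kwargs("qwen2.5-instruct", "vllm", 7, "none")

# Requires a running Xinference server (e.g. started via `xinference-local`):
# from xinference.client import RESTfulClient
# client = RESTfulClient("http://127.0.0.1:9997")
# model_uid = client.launch_model(**kwargs)
```

The helper keeps the Python call in lockstep with the ``xinference launch`` command line, so the same four values work in either interface.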
19 changes: 0 additions & 19 deletions doc/source/models/builtin/audio/fishspeech-1.2-sft.rst

This file was deleted.

19 changes: 19 additions & 0 deletions doc/source/models/builtin/audio/fishspeech-1.4.rst
@@ -0,0 +1,19 @@
.. _models_builtin_fishspeech-1.4:

==============
FishSpeech-1.4
==============

- **Model Name:** FishSpeech-1.4
- **Model Family:** FishAudio
- **Abilities:** text-to-audio
- **Multilingual:** True

Specifications
^^^^^^^^^^^^^^

- **Model ID:** fishaudio/fish-speech-1.4

Execute the following command to launch the model::

xinference launch --model-name FishSpeech-1.4 --model-type audio
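After launching, the model can be driven from Python. This is a hedged sketch: the ``speech()`` method name on the audio handle is an assumption based on Xinference's other text-to-audio models, so the server-dependent calls are commented out and only the request payload is built.

```python
# Hedged sketch: call the launched FishSpeech-1.4 model from Python.
# The speech() method name on the audio model handle is an assumption
# based on Xinference's other audio models; check your client version.

def build_speech_request(model_uid, text):
    """Payload in the shape of an OpenAI-style /v1/audio/speech request."""
    return {"model": model_uid, "input": text}

payload = build_speech_request("FishSpeech-1.4", "Hello from Xinference")

# Requires a running server with the model launched as shown above:
# from xinference.client import RESTfulClient
# client = RESTfulClient("http://127.0.0.1:9997")
# model = client.get_model("FishSpeech-1.4")
# audio_bytes = model.speech(payload["input"])
# open("out.wav", "wb").write(audio_bytes)
```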
2 changes: 1 addition & 1 deletion doc/source/models/builtin/audio/index.rst
@@ -25,7 +25,7 @@ The following is a list of built-in audio models in Xinference:

cosyvoice-300m-sft

fishspeech-1.2-sft
fishspeech-1.4

sensevoicesmall

2 changes: 1 addition & 1 deletion doc/source/models/builtin/image/flux.1-dev.rst
@@ -6,7 +6,7 @@ FLUX.1-dev

- **Model Name:** FLUX.1-dev
- **Model Family:** stable_diffusion
- **Abilities:** text2image
- **Abilities:** text2image, image2image, inpainting
- **Available ControlNet:** None

Specifications
2 changes: 1 addition & 1 deletion doc/source/models/builtin/image/flux.1-schnell.rst
@@ -6,7 +6,7 @@ FLUX.1-schnell

- **Model Name:** FLUX.1-schnell
- **Model Family:** stable_diffusion
- **Abilities:** text2image
- **Abilities:** text2image, image2image, inpainting
- **Available ControlNet:** None

Specifications
31 changes: 31 additions & 0 deletions doc/source/models/builtin/llm/deepseek-v2-chat-0628.rst
@@ -0,0 +1,31 @@
.. _models_llm_deepseek-v2-chat-0628:

========================================
deepseek-v2-chat-0628
========================================

- **Context Length:** 128000
- **Model Name:** deepseek-v2-chat-0628
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** DeepSeek-V2-Chat-0628 is an improved version of DeepSeek-V2-Chat.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 236 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 236
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: vLLM, Transformers, SGLang (vLLM and SGLang are only available when quantization is ``none``)
- **Model ID:** deepseek-ai/DeepSeek-V2-Chat-0628
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat-0628>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Chat-0628>`__

Execute the following command to launch the model. Replace ``${engine}`` with your
chosen engine and ``${quantization}`` with your chosen quantization method from the
options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-v2-chat-0628 --size-in-billions 236 --model-format pytorch --quantization ${quantization}

47 changes: 47 additions & 0 deletions doc/source/models/builtin/llm/deepseek-v2-chat.rst
@@ -0,0 +1,47 @@
.. _models_llm_deepseek-v2-chat:

========================================
deepseek-v2-chat
========================================

- **Context Length:** 128000
- **Model Name:** deepseek-v2-chat
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 16 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 16
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: vLLM, Transformers, SGLang (vLLM and SGLang are only available when quantization is ``none``)
- **Model ID:** deepseek-ai/DeepSeek-V2-Lite-Chat
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite-Chat>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite-Chat>`__

Execute the following command to launch the model. Replace ``${engine}`` with your
chosen engine and ``${quantization}`` with your chosen quantization method from the
options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-v2-chat --size-in-billions 16 --model-format pytorch --quantization ${quantization}


Model Spec 2 (pytorch, 236 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 236
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: vLLM, Transformers, SGLang (vLLM and SGLang are only available when quantization is ``none``)
- **Model ID:** deepseek-ai/DeepSeek-V2-Chat
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-V2-Chat>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Chat>`__

Execute the following command to launch the model. Replace ``${engine}`` with your
chosen engine and ``${quantization}`` with your chosen quantization method from the
options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-v2-chat --size-in-billions 236 --model-format pytorch --quantization ${quantization}

31 changes: 31 additions & 0 deletions doc/source/models/builtin/llm/deepseek-v2.5.rst
@@ -0,0 +1,31 @@
.. _models_llm_deepseek-v2.5:

========================================
deepseek-v2.5
========================================

- **Context Length:** 128000
- **Model Name:** deepseek-v2.5
- **Languages:** en, zh
- **Abilities:** chat
- **Description:** DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 236 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 236
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: vLLM, Transformers, SGLang (vLLM and SGLang are only available when quantization is ``none``)
- **Model ID:** deepseek-ai/DeepSeek-V2.5
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-V2.5>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-V2.5>`__

Execute the following command to launch the model. Replace ``${engine}`` with your
chosen engine and ``${quantization}`` with your chosen quantization method from the
options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-v2.5 --size-in-billions 236 --model-format pytorch --quantization ${quantization}
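Once launched, a chat model such as deepseek-v2.5 can be queried through Xinference's OpenAI-compatible endpoint. This is a hedged sketch: the base URL, port, and the use of the ``openai`` client are assumptions about the deployment, so the network call is commented out and only the request body is assembled.

```python
# Hedged sketch: chat with the launched deepseek-v2.5 model through
# Xinference's OpenAI-compatible endpoint. The base URL and the use of
# the openai client are assumptions; adapt them to your deployment.

def build_chat_request(model_uid, user_message):
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model_uid,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0.7,
    }

req = build_chat_request("deepseek-v2.5", "Write a Python hello world.")

# Requires a running server with the model launched as shown above:
# from openai import OpenAI
# client = OpenAI(base_url="http://127.0.0.1:9997/v1", api_key="not-needed")
# resp = client.chat.completions.create(**req)
# print(resp.choices[0].message.content)
```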

47 changes: 47 additions & 0 deletions doc/source/models/builtin/llm/deepseek-v2.rst
@@ -0,0 +1,47 @@
.. _models_llm_deepseek-v2:

========================================
deepseek-v2
========================================

- **Context Length:** 128000
- **Model Name:** deepseek-v2
- **Languages:** en, zh
- **Abilities:** generate
- **Description:** DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference.

Specifications
^^^^^^^^^^^^^^


Model Spec 1 (pytorch, 16 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 16
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: Transformers
- **Model ID:** deepseek-ai/DeepSeek-V2-Lite
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-V2-Lite>`__

Execute the following command to launch the model. Replace ``${engine}`` with your
chosen engine and ``${quantization}`` with your chosen quantization method from the
options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-v2 --size-in-billions 16 --model-format pytorch --quantization ${quantization}


Model Spec 2 (pytorch, 236 Billion)
++++++++++++++++++++++++++++++++++++++++

- **Model Format:** pytorch
- **Model Size (in billions):** 236
- **Quantizations:** 4-bit, 8-bit, none
- **Engines**: Transformers
- **Model ID:** deepseek-ai/DeepSeek-V2
- **Model Hubs**: `Hugging Face <https://huggingface.co/deepseek-ai/DeepSeek-V2>`__, `ModelScope <https://modelscope.cn/models/deepseek-ai/DeepSeek-V2>`__

Execute the following command to launch the model. Replace ``${engine}`` with your
chosen engine and ``${quantization}`` with your chosen quantization method from the
options listed above::

xinference launch --model-engine ${engine} --model-name deepseek-v2 --size-in-billions 236 --model-format pytorch --quantization ${quantization}
