flash_attn

System Info / 系統信息

Running Xinference with Docker? / 是否使用 Docker 运行 Xinference?

Version info / 版本信息
xinference 0.15.0

The command used to start Xinference / 用以启动 xinference 的命令
nohup xinference-local --host 0.0.0.0 --port 9997 > ./xinfer.log 2>&1 &

Reproduction / 复现过程
For example, with Qwen2-VL: after the model is launched, flash_attention_2 is not enabled.

Current temporary workaround: go into the xinference pip package directory, then either

# Edit the model's load arguments
vim model/llm/transformers/qwen2_vl.py
# Change line 63 to:
self.model_path, torch_dtype="bfloat16", device_map=device, attn_implementation="flash_attention_2", trust_remote_code=True

or

sed -i '63s|self.model_path, device_map=device, trust_remote_code=True|self.model_path, torch_dtype="bfloat16", device_map=device, attn_implementation="flash_attention_2", trust_remote_code=True|' model/llm/transformers/qwen2_vl.py
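To confirm the edit took effect, one sanity check outside xinference is to load the model directly with transformers using the same arguments and inspect which attention backend was selected; the local model path below is a placeholder.

```python
# Sanity check outside xinference: load Qwen2-VL with the same arguments the
# patched line 63 passes, then print the attention implementation that
# transformers actually selected.
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "/path/to/Qwen2-VL-7B-Instruct",       # placeholder model path
    torch_dtype="bfloat16",
    device_map="cuda",
    attn_implementation="flash_attention_2",
    trust_remote_code=True,
)
# _attn_implementation is a private config attribute; expect "flash_attention_2"
print(model.config._attn_implementation)
```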
Expected behavior / 期待表现
Detect whether the flash_attn package is present in the environment; if it is, enable torch_dtype="bfloat16", attn_implementation="flash_attention_2" when loading the model.
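A minimal sketch of that detection logic (illustrative only, not xinference's actual code); the helper name and the place it would be called from are assumptions:

```python
# Illustrative sketch: choose model-loading kwargs based on whether flash_attn
# is importable. Not xinference's actual implementation.
import importlib.util


def build_load_kwargs(device: str) -> dict:
    kwargs = {"device_map": device, "trust_remote_code": True}
    if importlib.util.find_spec("flash_attn") is not None:
        # flash_attn is installed: opt into the FlashAttention-2 kernels
        kwargs["torch_dtype"] = "bfloat16"
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


# Hypothetical usage inside the model's load():
#   model = Qwen2VLForConditionalGeneration.from_pretrained(
#       self.model_path, **build_load_kwargs(device))
```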
Interested in submitting a PR to support this?
Possibly interested, but I am not sure where to start. I have not yet figured out what trust_remote_code is passing here, or whether torch_dtype="bfloat16", attn_implementation="flash_attention_2" can be passed through as well.
That is fine. We can start by just adding support for these two parameters; once there is a PR we can look at compatibility and related issues.
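A rough sketch of what such a PR could look like, assuming the model class has access to user-supplied load kwargs (the attribute names self._kwargs, self._device, and self.model_path are illustrative, not necessarily xinference's real internals):

```python
# Illustrative only: inside the Qwen2-VL model class, forward the two parameters
# to from_pretrained when the caller supplies them. Attribute names are
# assumptions, not xinference's actual API.
from transformers import Qwen2VLForConditionalGeneration


class Qwen2VLModelSketch:
    def load(self):
        kwargs = {"device_map": self._device, "trust_remote_code": True}
        for key in ("torch_dtype", "attn_implementation"):
            if key in self._kwargs:
                # pass through only the parameters the user explicitly set
                kwargs[key] = self._kwargs[key]
        self._model = Qwen2VLForConditionalGeneration.from_pretrained(
            self.model_path, **kwargs
        )
```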