
Improve Dockerize support #2849

Merged: 2 commits merged into hiyouga:main from S3Studio:DockerizeSupport on Mar 15, 2024
Conversation

S3Studio (Contributor)

Here are some improvements based on the discussions in PR #2743.
While addressing a compatibility issue, I made some slight alterations to the existing code. Feel free to discuss them with me.

What does this PR do?

Improves on PR #2743.

Before submitting

- Modified the installation method for the extra Python libraries.
- Utilized the host machine's shared memory to improve training performance (see the example `docker run` invocation after this list).

Note that the flash-attn library is installed in this image and the Qwen model will use it automatically. However, if the host machine's GPU is not compatible with the library, an exception will be raised during training:

`FlashAttention only supports Ampere GPUs or newer.`

So if the `--flash_attn` flag is not set, an additional patch to the Qwen model's config is necessary to change the default value of `use_flash_attn` from "auto" to `False` (a sketch of such a patch follows below).
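For reference, a minimal sketch of how the host's shared memory can be exposed to a container; the image tag `llamafactory:latest` is only a placeholder, and both flags shown are standard Docker options:

```bash
# Raise the container's /dev/shm limit (Docker's 64 MB default is too small
# for PyTorch DataLoader workers, which exchange tensors through shm).
docker run --gpus all --shm-size 16G -it llamafactory:latest

# Alternatively, reuse the host's IPC namespace, which shares /dev/shm outright.
docker run --gpus all --ipc host -it llamafactory:latest
```

And a minimal sketch of the config patch described above, assuming the stock Hugging Face Qwen checkpoint, whose remote-code config exposes the `use_flash_attn` field (the model name here is only an example):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Override use_flash_attn before the model is instantiated, so a
# pre-Ampere GPU never goes down the flash-attn code path.
config = AutoConfig.from_pretrained("Qwen/Qwen-7B", trust_remote_code=True)
config.use_flash_attn = False  # the remote-code default is "auto"

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B", config=config, trust_remote_code=True
)
```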
hiyouga self-requested a review on Mar 15, 2024, 04:24
hiyouga (Owner) left a comment:


Please see the comment

Dockerfile (review thread resolved)
hiyouga added the `pending` (This problem is yet to be addressed) label on Mar 15, 2024
hiyouga merged commit 113cc04 into hiyouga:main on Mar 15, 2024
1 check passed
hiyouga added the `solved` (This problem has been already solved) label and removed the `pending` label on Mar 15, 2024
S3Studio deleted the DockerizeSupport branch on Mar 16, 2024
hiyouga (Owner) commented Mar 28, 2024

Hello @S3Studio, how do we specify the device ID in dockerized training, e.g. CUDA_VISIBLE_DEVICES=0? I think this should be documented in readme.md, since we check the device count before training:

```python
if not from_preview and get_device_count() > 1:
    return ALERTS["err_device_count"][lang]
```
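For context, a sketch of how a device ID is commonly passed into a container; the image tag is a placeholder, while `-e` and `--gpus` are standard Docker flags:

```bash
# Export CUDA_VISIBLE_DEVICES into the container so only GPU 0 is used.
docker run --gpus all -e CUDA_VISIBLE_DEVICES=0 -it llamafactory:latest

# Or let Docker itself expose only the chosen device to the container.
docker run --gpus '"device=0"' -it llamafactory:latest
```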

hiyouga (Owner) commented Mar 28, 2024

You can also review this commit: c1fe6ce

S3Studio (Contributor, Author) commented Apr 2, 2024

> You can also review this commit: c1fe6ce

I believe this commit solves the issue.

hiyouga (Owner) commented Apr 2, 2024

thanks!
