Rouge-score results are surprisingly low #3764
Comments
Hi @Jiminator, as only 1000 samples are used and the batch size is 8, setting "badam_switch_interval=50" will only update about 8 blocks (1000 × 3 epochs / (8 × 50) = 7.5), while Llama 3-8B has 32 blocks when using the layer-wise partition. You should try increasing the number of training epochs or reducing the switch interval (though we suggest keeping it larger than 20) so that every block gets trained. Usually, setting "badam_switch_mode" to "ascending" or "random" yields faster convergence at the beginning. As fine-tuning Llama 3-8B requires no more than 24GB of memory and the V100 has 32GB, you can alternatively make each trainable block larger, e.g. containing 2 or more layers instead of a single layer. You can follow the instructions in https://github.com/Ledzy/BAdam?tab=readme-ov-file#partition-by-module to set the block partition (this requires modifying the _create_badam_optimizer function a bit), which will yield faster convergence, as the larger number of trainable parameters offers a larger parameter search space. Alternatively, you can try "--badam_mode ratio" with a "badam_update_ratio" that fits within your memory limit.
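The block-count arithmetic above can be checked with a short simulation. This is a simplified sketch, not BAdam's own scheduler: it assumes one optimizer step per batch (no gradient accumulation), rounds the step count up per epoch, and assumes that "random" mode shuffles the block order once per pass over all blocks. The function name block_schedule is hypothetical.

```python
import math
import random


def block_schedule(num_samples, epochs, batch_size, switch_interval,
                   num_layers, layers_per_block=1, mode="ascending", seed=0):
    """Return the block index trained in each switch period, in order.

    Simplifying assumptions (not taken from the BAdam source): one optimizer
    step per batch, a block switch every `switch_interval` steps, and blocks
    made of `layers_per_block` consecutive decoder layers.
    """
    if mode not in ("ascending", "descending", "random"):
        raise ValueError(f"unsupported mode: {mode!r}")
    num_blocks = math.ceil(num_layers / layers_per_block)
    total_steps = math.ceil(num_samples / batch_size) * epochs
    num_periods = math.ceil(total_steps / switch_interval)

    rng = random.Random(seed)
    pending = []   # blocks not yet visited in the current pass
    schedule = []
    for _ in range(num_periods):
        if not pending:  # start a new pass over all blocks
            pending = list(range(num_blocks))
            if mode == "descending":
                pending.reverse()
            elif mode == "random":
                rng.shuffle(pending)
        schedule.append(pending.pop(0))
    return schedule


# Setting from this thread: 1000 samples, 3 epochs, batch size 8, interval 50.
one_layer = block_schedule(1000, 3, 8, 50, num_layers=32)
print(len(set(one_layer)), "of 32 blocks trained")    # 8 of 32 blocks trained

# Two layers per block: still 8 periods, but they now cover 16 of 32 layers.
two_layers = block_schedule(1000, 3, 8, 50, num_layers=32, layers_per_block=2)
print(len(set(two_layers)), "of 16 blocks trained")   # 8 of 16 blocks trained

# 13 epochs give 1625 steps, i.e. 33 periods: enough to visit all 32 blocks.
more_epochs = block_schedule(1000, 13, 8, 50, num_layers=32)
print(len(set(more_epochs)), "of 32 blocks trained")  # 32 of 32 blocks trained
```

Under these assumptions, grouping two layers per block doubles the share of the network touched in the same number of switches, which is the trade-off the suggestion above relies on.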
You should use
@Jiminator Exactly. Be sure to use the same template in training and inference.
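One way to guard against a template mismatch is to compare the "template" setting of the fine-tuning and evaluation configs before running prediction. This is a minimal sketch, assuming both YAML files have already been parsed into dicts (for example with PyYAML); the helper name check_template_consistency and the config values shown are hypothetical.

```python
def check_template_consistency(train_cfg, eval_cfg, key="template"):
    """Return the shared chat template, or raise if the configs disagree.

    `train_cfg` and `eval_cfg` are the parsed fine-tuning and evaluation
    configs, represented here as plain dicts.
    """
    train_tpl = train_cfg.get(key)
    eval_tpl = eval_cfg.get(key)
    if train_tpl is None or eval_tpl is None:
        raise KeyError(f"both configs must set '{key}'")
    if train_tpl != eval_tpl:
        raise ValueError(
            f"template mismatch: trained with {train_tpl!r}, "
            f"evaluating with {eval_tpl!r}"
        )
    return train_tpl


# Hypothetical configs: evaluation accidentally uses a different template.
train_cfg = {"template": "llama3"}
eval_cfg = {"template": "default"}
try:
    check_template_consistency(train_cfg, eval_cfg)
except ValueError as err:
    print(err)  # template mismatch: trained with 'llama3', evaluating with 'default'
```

Failing loudly here is cheaper than discovering the mismatch through unexpectedly low ROUGE/BLEU scores after a full prediction run.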
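I am using qwen to fine-tune a Qwen1.5 model, and the ROUGE on the validation set is similarly very low, topping out in the teens. But once training is finished and I run inference separately, ROUGE is above 80... I really can't understand why. Something major must have changed somewhere.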
Reminder
Reproduction
Here is my fine-tuning YAML:
Here is the YAML file I used to calculate the ROUGE/BLEU scores:
Expected behavior
After fine-tuning Llama 3 8B with BAdam, I expected the ROUGE and BLEU scores to improve significantly, as they did in my previous experiments with LoRA, LoRA+, and QLoRA. However, the output of my prediction script showed that BAdam yielded only a very slight improvement.
Llama-3-8B (base):
Llama-3-8B (BAdam):
System Info
transformers version: 4.40.0
Others
The only changes I made to the original BAdam example are the dataset I fine-tune on, the location of the "val_size" variable in the YAML file, and "pure_bf16". Since I am using a V100, my machine does not support bf16, and when I try to just use "fp16: true", I get an error: "ValueError: Attempting to unscale FP16 gradients."
I tried fixing this error by switching to peft 0.6.0, but then llmtuner doesn't work ("ImportError: peft>=0.10.0 is required for a normal functioning of this module, but found peft==0.6.0.").
Setting "bf16: false" allows the fine-tuning script to run, but I worry it's not doing the fine-tuning correctly. Any help or advice would be greatly appreciated!