Liger kernel breaks fine-tuning #5542
Comments
I encounter the same issue when using DPO to fine-tune qwen2-vl.
I also encounter the same issue. It seems to be caused by `enable_liger_kernel: true`, which I set to reduce the memory footprint (see the sketch after these comments for why the fused loss saves memory but drops the logits).
Fixed.
How do I fix it?
```
* 'main' of github.com:hurongliang/LLaMA-Factory: (61 commits)
  update wechat
  fix hiyouga#5542
  add patch processor func
  lint
  Update constants.py
  Update template.py
  fix chat template Exaone3.0
  Update README_zh.md
  Update README.md
  update docs
  Support model Exaone3.0
  add Exaone3.0 template
  Update common.py
  Update README_zh.md
  Update README.md
  Update README.md
  Update constants.py
  Update test_mm_plugin.py
  fix template
  fix template
  fix constants
  ...
```
Not yet; the latest code (downloaded at 12.25) still has the same issue.
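For background on the memory question raised above: Liger's main saving for language-model losses comes from fusing the lm_head projection with the cross-entropy loss, so the full [tokens, vocab_size] logits tensor is never materialized. The sketch below is plain PyTorch with made-up shapes, not Liger's actual Triton kernel; it only illustrates the chunking idea and why an output produced this way can carry a loss while its `logits` field is `None`.

```python
# Conceptual sketch only: plain PyTorch with invented shapes, not Liger's Triton kernel.
import torch
import torch.nn.functional as F

hidden = torch.randn(8, 64)              # [tokens, hidden_size] from the last decoder layer
lm_head = torch.randn(32000, 64)         # [vocab_size, hidden_size] projection weight
labels = torch.randint(0, 32000, (8,))   # target token ids

# Unfused path: the full [tokens, vocab_size] logits tensor is materialized.
logits = hidden @ lm_head.t()
loss_unfused = F.cross_entropy(logits, labels)

# Fused/chunked idea: accumulate the loss per chunk so the full logits never exist at once.
loss_sum = torch.zeros(())
for h_chunk, y_chunk in zip(hidden.split(2), labels.split(2)):
    loss_sum = loss_sum + F.cross_entropy(h_chunk @ lm_head.t(), y_chunk, reduction="sum")
loss_chunked = loss_sum / labels.numel()

print(torch.allclose(loss_unfused, loss_chunked))  # True: same loss, but no full logits to return
```

DPO, however, needs the per-token logits to compute log-probabilities for the chosen and rejected responses, which appears to be why the fused loss path and the DPO trainer collide here.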
Reminder
System Info
LLaMA Factory, version 0.9.1.dev0
liger_kernel 0.3.0
transformers 4.45.0.dev0
Reproduction
```
llamafactory-cli train ./examples/train_lora/qwen2vl_loraplus_dpo_2b_20_09.yaml
```

Contents of `qwen2vl_loraplus_dpo_2b_20_09.yaml`:

```yaml
### model
model_name_or_path: Qwen/Qwen2-VL-2B-Instruct

### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.3
pref_loss: sigmoid

### dataset
dataset: obrazy_rlhf_v__proba
buffer_size: 1
preprocessing_batch_size: 1
streaming: true
val_size: 260
#accelerator_config:
dispatch_batches: false
template: qwen2_vl
cutoff_len: 2748
#max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 1

### output
output_dir: saves/qwen2_vl-2b_loraplus/25v1_beta0_5_orig
logging_steps: 500
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_checkpointing: true
gradient_accumulation_steps: 1
learning_rate: 5.0e-6
num_train_epochs: 3.0
flash_attn: auto
lr_scheduler_type: cosine
max_grad_norm: 1.0
loraplus_lr_ratio: 16.0
enable_liger_kernel: true
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
max_steps: 2200

### eval
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 200
```
Expected behavior
Unfortunately, running the training with the Liger kernel enabled fails with the following error:
[rank0]: AttributeError: 'NoneType' object has no attribute 'to'
My environment:
liger_kernel 0.3.0
llamafactory 0.9.1.dev0
transformers 4.45.0.dev0
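To make the failure concrete before the full log, here is a minimal stand-alone sketch; the `FakeOutput` stub is hypothetical, not LLaMA-Factory or Liger code, and only mimics what a fused-loss forward pass appears to return (a loss but `logits=None`), which is exactly what the cast in `concatenated_forward` trips over.

```python
# Hypothetical stub reproducing the crash pattern; not LLaMA-Factory's or Liger's actual classes.
from dataclasses import dataclass
from typing import Optional
import torch

@dataclass
class FakeOutput:
    loss: Optional[torch.Tensor] = None
    logits: Optional[torch.Tensor] = None

# What a fused-loss forward pass effectively hands back: a loss, but no logits.
out = FakeOutput(loss=torch.tensor(0.7), logits=None)

# The failing call from src/llamafactory/train/dpo/trainer.py (line 182 in the trace below):
try:
    all_logits = out.logits.to(torch.float32)
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute 'to'

# A defensive check would surface the root cause instead of a bare AttributeError:
if out.logits is None:
    print("No logits returned; for DPO, set enable_liger_kernel: false or use a build with the #5542 fix.")
```

The full training log follows.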
09/25/2024 12:07:58 - INFO - llamafactory.model.model_utils.liger_kernel - Liger kernel has been applied to the model.
09/25/2024 12:07:58 - INFO - llamafactory.model.model_utils.liger_kernel - Liger kernel has been applied to the model.
[INFO|modeling_utils.py:3702] 2024-09-25 12:07:58,644 >> loading weights file model.safetensors from cache at /home/python/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/aca78372505e6cb469c4fa6a35c60265b00ff5a4/model.safetensors.index.json
[INFO|modeling_utils.py:1621] 2024-09-25 12:07:58,653 >> Instantiating Qwen2VLForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1097] 2024-09-25 12:07:58,654 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"eos_token_id": 151645
}
[WARNING|logging.py:328] 2024-09-25 12:07:58,688 >> Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00, 5.88s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████| 2/2 [00:11<00:00, 5.88s/it]
[INFO|modeling_utils.py:4544] 2024-09-25 12:08:10,541 >> All model checkpoint weights were used when initializing Qwen2VLForConditionalGeneration.
[INFO|modeling_utils.py:4552] 2024-09-25 12:08:10,541 >> All the weights of Qwen2VLForConditionalGeneration were initialized from the model checkpoint at Qwen/Qwen2-VL-2B-Instruct.
If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2VLForConditionalGeneration for predictions without further training.
[INFO|configuration_utils.py:1052] 2024-09-25 12:08:10,685 >> loading configuration file generation_config.json from cache at /home/python/.cache/huggingface/hub/models--Qwen--Qwen2-VL-2B-Instruct/snapshots/aca78372505e6cb469c4fa6a35c60265b00ff5a4/generation_config.json
[INFO|configuration_utils.py:1097] 2024-09-25 12:08:10,685 >> Generate config GenerationConfig {
"bos_token_id": 151643,
"do_sample": true,
"eos_token_id": [
151645,
151643
],
"pad_token_id": 151643,
"temperature": 0.01,
"top_k": 1,
"top_p": 0.001
}
09/25/2024 12:08:10 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
09/25/2024 12:08:10 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
09/25/2024 12:08:10 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
09/25/2024 12:08:10 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
09/25/2024 12:08:10 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
09/25/2024 12:08:10 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
09/25/2024 12:08:10 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
09/25/2024 12:08:10 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
09/25/2024 12:08:10 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,down_proj,q_proj,k_proj,gate_proj,up_proj,v_proj
09/25/2024 12:08:10 - INFO - llamafactory.model.model_utils.misc - Found linear modules: q_proj,v_proj,o_proj,gate_proj,down_proj,k_proj,up_proj
09/25/2024 12:08:11 - INFO - llamafactory.model.loader - trainable params: 9,232,384 || all params: 2,218,217,984 || trainable%: 0.4162
09/25/2024 12:08:11 - INFO - llamafactory.model.loader - trainable params: 9,232,384 || all params: 2,218,217,984 || trainable%: 0.4162
max_steps is given, it will override any value given in num_train_epochs
[WARNING|trainer.py:617] 2024-09-25 12:08:11,039 >> max_steps is given, it will override any value given in num_train_epochs
[INFO|trainer.py:667] 2024-09-25 12:08:11,039 >> Using auto half precision backend
09/25/2024 12:08:11 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
09/25/2024 12:08:11 - INFO - llamafactory.train.trainer_utils - Using LoRA+ optimizer with loraplus lr ratio 16.00.
[INFO|trainer.py:2212] 2024-09-25 12:08:13,575 >> ***** Running training *****
[INFO|trainer.py:2213] 2024-09-25 12:08:13,575 >> Num examples = 4,400
[INFO|trainer.py:2214] 2024-09-25 12:08:13,575 >> Num Epochs = 9,223,372,036,854,775,807
[INFO|trainer.py:2215] 2024-09-25 12:08:13,575 >> Instantaneous batch size per device = 1
[INFO|trainer.py:2218] 2024-09-25 12:08:13,575 >> Total train batch size (w. parallel, distributed & accumulation) = 2
[INFO|trainer.py:2219] 2024-09-25 12:08:13,575 >> Gradient Accumulation steps = 1
[INFO|trainer.py:2220] 2024-09-25 12:08:13,575 >> Total optimization steps = 2,200
[INFO|trainer.py:2221] 2024-09-25 12:08:13,578 >> Number of trainable parameters = 9,232,384
0%| | 0/2200 [00:00<?, ?it/s][rank0]: Traceback (most recent call last):
[rank0]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in
[rank0]: launch()
[rank0]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank0]: run_exp()
[rank0]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/train/tuner.py", line 56, in run_exp
[rank0]: run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
[rank0]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/train/dpo/workflow.py", line 81, in run_dpo
[rank0]: train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/python/factory/env/lib/python3.11/site-packages/transformers/trainer.py", line 2021, in train
[rank0]: return inner_training_loop(
[rank0]: ^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/python/factory/env/lib/python3.11/site-packages/transformers/trainer.py", line 2357, in _inner_training_loop
[rank0]: tr_loss_step = self.training_step(model, inputs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/python/factory/env/lib/python3.11/site-packages/transformers/trainer.py", line 3454, in training_step
[rank0]: loss = self.compute_loss(model, inputs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/python/factory/env/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py", line 1408, in compute_loss
[rank0]: loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/train/dpo/trainer.py", line 232, in get_batch_loss_metrics
[rank0]: ) = self.concatenated_forward(model, batch)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/train/dpo/trainer.py", line 182, in concatenated_forward
[rank0]: all_logits: "torch.Tensor" = model(**batch, return_dict=True, use_cache=False).logits.to(torch.float32)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AttributeError: 'NoneType' object has no attribute 'to'
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/launcher.py", line 23, in
[rank1]: launch()
[rank1]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/launcher.py", line 19, in launch
[rank1]: run_exp()
[rank1]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/train/tuner.py", line 56, in run_exp
[rank1]: run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
[rank1]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/train/dpo/workflow.py", line 81, in run_dpo
[rank1]: train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/python/factory/env/lib/python3.11/site-packages/transformers/trainer.py", line 2021, in train
[rank1]: return inner_training_loop(
[rank1]: ^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/python/factory/env/lib/python3.11/site-packages/transformers/trainer.py", line 2357, in _inner_training_loop
[rank1]: tr_loss_step = self.training_step(model, inputs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/python/factory/env/lib/python3.11/site-packages/transformers/trainer.py", line 3454, in training_step
[rank1]: loss = self.compute_loss(model, inputs)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/python/factory/env/lib/python3.11/site-packages/trl/trainer/dpo_trainer.py", line 1408, in compute_loss
[rank1]: loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/train/dpo/trainer.py", line 232, in get_batch_loss_metrics
[rank1]: ) = self.concatenated_forward(model, batch)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/home/python/factory/LLaMA-Factory/src/llamafactory/train/dpo/trainer.py", line 182, in concatenated_forward
[rank1]: all_logits: "torch.Tensor" = model(**batch, return_dict=True, use_cache=False).logits.to(torch.float32)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: AttributeError: 'NoneType' object has no attribute 'to'
0%| | 0/2200 [00:13<?, ?it/s]
E0925 12:08:30.915000 140353497219136 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 0 (pid: 3061541) of binary: /home/python/factory/env/bin/python3
Traceback (most recent call last):
File "/home/python/factory/env/bin/torchrun", line 8, in
sys.exit(main())
^^^^^^
File "/home/python/factory/env/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 348, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/python/factory/env/lib/python3.11/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/home/python/factory/env/lib/python3.11/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/home/python/factory/env/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 133, in call
return launch_agent(self._config, self._entrypoint, list(args))
Others
No response