Mixtral is a MoE model, so its MLP layers go through layers/moe.py, where the function signature is forward(self, hidden_states, finished=None, workspace=None, ...) -- finished is an additional parameter specific to MOE that does not exist in MLP/GatedMLP/FusedGatedMLP, so the workspace should be passed as a keyword argument, workspace=all_reduce_workspace, at this line.
I will fix this internally and include it in the next main branch release. Please apply this local change in the meantime, thanks!
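A minimal sketch of that local change, assuming the call site looks like the MLP call in the decoder layer of the model definition (the attribute name `self.mlp` and the exact file/line are illustrative and may differ in your 0.7.1 checkout):

```python
# Hypothetical decoder-layer call site (exact file and line may differ).
# Before: the workspace is passed positionally. Against the MOE signature
# forward(self, hidden_states, finished=None, workspace=None, ...) it binds to
# `finished`, so `workspace` stays None and dereferencing it later raises
# AttributeError: 'NoneType' object has no attribute 'trt_tensor'.
hidden_states = self.mlp(hidden_states, all_reduce_workspace)

# After: pass the workspace by keyword so it reaches the right parameter for
# both the MLP/GatedMLP/FusedGatedMLP and MOE forward signatures.
hidden_states = self.mlp(hidden_states, workspace=all_reduce_workspace)
```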
Closing for now. If you still have issues after this fix, please feel free to re-open!
Env:
TRT-LLM 0.7.1
Host: p4d.24xlarge EC2 instance (A100)
Model: Mixtral-8x7B
Build args: TP=8, use_custom_all_reduce
Error log:
Fails with:
AttributeError: 'NoneType' object has no attribute 'trt_tensor'