
kv-int8 output wrong result #133

Closed
sleepwalker2017 opened this issue Oct 26, 2023 · 6 comments
Labels
triaged Issue has been triaged by maintainers

sleepwalker2017 commented Oct 26, 2023

Using fp16, the result is:

Input: "<s>Resolving the Israel @-@ Palestine Conflict : University of British Columbia , Vancouver , Jan 21 , 2009 ."
Output: "

The conflict between Israel and Palestine has been a longstanding issue that has been a source of tension and violence in the Middle East for decades. The situation has been further complicated by the involvement of other countries and international organizations, making it a complex and multifaceted issue.

One of the main challenges in resolving the conflict is the deep-seated animosity and mistrust between the two sides. Both Israelis"

Using int8, the result is:

Input: "<s>Resolving the Israel @-@ Palestine Conflict : University of British Columbia , Vancouver , Jan 21 , 2009 ."
Output: "
I g ishaz,
IQ on the
IQ about this ish hashts to
I,1 on the
I,1 Blog; weighed" data
Reg inter-
Reg hobser,11111111 and the
In
In
In
In
In
In
The sne (and#8:
The sne gives two websites"

Here is my convert script:

python build.py --model_dir /data/vicuna-13b/vicuna-13b-v1.5/ \
                --dtype float16 \
                --use_gpt_attention_plugin float16 \
                --use_gemm_plugin float16 \
                --output_dir ./tmp/llama/13B-kv-int8/trt_engines/fp16/2-gpu/ \
                --enable_context_fmha_fp32_acc \
                --world_size 2 \
                --tp_size 2 \
                --max_batch_size 32 \
                --int8_kv_cache

How I run it:

mpirun  -n 2 --allow-run-as-root python3 run.py --max_output_len=96 \
               --tokenizer_dir /data/models/llama-7b-hf \
               --engine_dir=./tmp/llama/13B-kv-int8/trt_engines/fp16/2-gpu/

byshiue commented Oct 26, 2023

How do you get the scales for the kv cache? int8 is very sensitive to them.
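That sensitivity can be illustrated with a toy example. This is not TensorRT-LLM's actual implementation, just a minimal sketch of symmetric per-tensor int8 quantization applied to a fake KV tensor, comparing a scale calibrated from the data's observed range against a naive default scale:

```python
import numpy as np

def quantize_int8(x, scale):
    """Symmetric int8 quantization: round(x / scale), clamped to [-127, 127]."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    """Map int8 values back to float by multiplying with the scale."""
    return q.astype(np.float32) * scale

# Fake "KV cache" activations with a typical small spread of magnitudes.
rng = np.random.default_rng(0)
kv = rng.normal(0.0, 0.5, size=4096).astype(np.float32)

# A calibrated scale matches the observed dynamic range of the data...
good_scale = np.abs(kv).max() / 127.0
# ...while a naive scale of 1.0 collapses most values to -1, 0, or +1.
bad_scale = 1.0

err_good = np.abs(dequantize(quantize_int8(kv, good_scale), good_scale) - kv).mean()
err_bad = np.abs(dequantize(quantize_int8(kv, bad_scale), bad_scale) - kv).mean()
print(err_good < err_bad)  # prints: True
```

With a badly chosen scale, almost the entire int8 range goes unused and the dequantized attention keys/values bear little resemblance to the originals, which is consistent with the meaningless output above.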

byshiue self-assigned this Oct 26, 2023
byshiue added the triaged (Issue has been triaged by maintainers) label Oct 26, 2023
sleepwalker2017 (Author) commented:

It seems I didn't do anything extra to get the scales.
Are there any documents about this?


byshiue commented Oct 26, 2023

juney-nvidia (Collaborator) commented:

@sleepwalker2017 Hi, is there anything new after following @byshiue's suggestion? Or can we close this issue?


sleepwalker2017 commented Oct 29, 2023 via email

jdemouth-nvidia (Collaborator) commented:

Thanks. I'm closing the issue.
