System Info

Hi,

Different versions of the InternVL2 model use different LLM architectures. For example, according to the model card, the 2B model uses InternLM2 while the 4B model uses Phi3 as its LLM. multimodal_model_runner() preprocesses and formats the prompt the same way for all versions of InternVL2 here. It appears to use the InternLM prompt format, while the 4B model, for example, will have Phi3 as its language engine.

If we want to use versions of InternVL that use an LLM architecture other than InternLM, should we modify the prompt formatting to match the format that specific LLM expects? For example, when using the 4B model, should we format the prompt as Phi3 expects? Or will we be okay using multimodal_model_runner() for all versions of InternVL without changing the prompt formatting? I am asking because I am observing very low accuracy for InternVL, and it varies considerably between runs.
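To illustrate what I mean by per-LLM prompt formatting, here is a rough sketch of the two chat templates as I understand them. The template strings and the format_prompt() helper are only my illustration, not what multimodal_model_runner() actually builds internally:

```python
# Sketch only: assumed chat templates for the two LLM backends.
# These strings reflect my understanding of the InternLM2 and Phi3 chat
# formats and may not match what multimodal_model_runner() produces.

# ChatML-style template used by InternLM2-chat (e.g. the 2B InternVL2 variant)
INTERNLM2_TEMPLATE = (
    "<|im_start|>system\n{system}<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Template used by Phi3-mini-instruct (e.g. the 4B InternVL2 variant)
PHI3_TEMPLATE = (
    "<|system|>\n{system}<|end|>\n"
    "<|user|>\n{prompt}<|end|>\n"
    "<|assistant|>\n"
)

def format_prompt(llm_arch: str, system: str, prompt: str) -> str:
    """Pick the chat template based on the underlying LLM architecture."""
    template = INTERNLM2_TEMPLATE if llm_arch == "internlm2" else PHI3_TEMPLATE
    return template.format(system=system, prompt=prompt)
```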
Also, to use multimodal_model_runner() for InternVL2.5, should we make any specific changes?

Who can help?

@sunnyqgg

Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Follow the instructions here to generate the engine files for InternVL2-2B or InternVL2-4B, then use multimodal_model_runner() to run evaluation on the model with your dataset. Measure accuracy, latency, and throughput if you want.
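For reference, my evaluation loop is roughly the sketch below. generate_answer() is a placeholder for whichever backend produces the answer (the PyTorch model, or the TRT engines via multimodal_model_runner()), and the sample fields are assumed to follow the A-OKVQA multiple-choice annotations:

```python
from typing import Callable, Iterable

def evaluate_mc_accuracy(
    samples: Iterable[dict],
    generate_answer: Callable[[str, str], str],  # (image_path, prompt) -> answer text
) -> float:
    """Exact-match multiple-choice accuracy on A-OKVQA-style samples.

    Each sample is assumed to provide "image", "question", "choices",
    and "correct_choice_idx" (the A-OKVQA annotation fields).
    """
    correct, total = 0, 0
    for s in samples:
        # Build a simple multiple-choice prompt from the annotation.
        prompt = (
            f"{s['question']} Choices: {', '.join(s['choices'])}. "
            "Answer with exactly one of the choices."
        )
        pred = generate_answer(s["image"], prompt).strip().lower()
        gold = s["choices"][s["correct_choice_idx"]].strip().lower()
        correct += int(pred == gold)
        total += 1
    return correct / max(total, 1)
```

I run this once with the PyTorch model and once with the TRT engines, then compare the two accuracy numbers.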
Expected behavior
Similar evaluation accuracy for the different versions of InternVL when comparing the PyTorch model to the TRT model.
actual behavior
Right now the TRT evaluation accuracy is much lower: around 60% for the PyTorch model and below 20% for the TRT model.
additional notes
I am using InternVL2 for VQA and the dataset I'm using is A-OKVQA.