
[ModelRunner] Fix stop and bad words list contiguous for offsets #1815

Closed

Conversation

Marks101 (Contributor)

In our regression tests of the ModelRunner we noticed that in the current main branch (Jun 18, 2024) the stop_words_list feature does not work properly for batch_size > 1. The issue seems to be that the token arrays are not laid out contiguously in memory because of the transpose done in this line:

return np.array([flat_ids, offsets], dtype="int32").transpose((1, 0, 2))

Since the transpose only changes strides without copying the data, the offsets computed into the array become invalid.
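The non-contiguity can be reproduced in isolation. The sketch below uses hypothetical stop-word data (the flat_ids and offsets values are made up for illustration) and shows that the transposed array is not C-contiguous, along with one possible fix via np.ascontiguousarray:

```python
import numpy as np

# Hypothetical example data: flattened stop-word token ids and their
# end offsets for a batch of two requests (padded with -1).
flat_ids = [[1, 2, 3, -1], [4, 5, 6, 7]]
offsets = [[3, -1, -1, -1], [2, 4, -1, -1]]

# The problematic line: transpose only swaps strides, it does not copy,
# so the result is no longer C-contiguous in memory.
arr = np.array([flat_ids, offsets], dtype="int32").transpose((1, 0, 2))
print(arr.flags["C_CONTIGUOUS"])  # False

# One possible fix: force a contiguous copy before handing the buffer
# to the runtime.
fixed = np.ascontiguousarray(arr)
print(fixed.flags["C_CONTIGUOUS"])  # True
```

A consumer that walks the raw buffer assuming row-major layout would read garbage from `arr` but correct values from `fixed`.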

In examples/run.py this feature was deactivated for a long time, but it seems that contiguity was originally enforced here:

# stop_words_list = torch.Tensor(stop_words_list).to(torch.int32).to("cuda").contiguous()

Thanks for taking a look at this

@byshiue byshiue self-assigned this Jun 23, 2024
@byshiue byshiue added the triaged Issue has been triaged by maintainers label Jun 23, 2024
@MartinMarciniszyn (Collaborator)

@Funatiq , could you please merge this into the main branch?

@nv-guomingz (Collaborator)

> @Funatiq , could you please merge this into the main branch?

@byshiue already merged this PR into internal code base this morning.

@kaiyux kaiyux mentioned this pull request Jul 4, 2024
@nv-guomingz (Collaborator)

@Marks101 thanks for your contribution to TRT-LLM, this MR has been merged into upstream now.

Labels: Merged, triaged (Issue has been triaged by maintainers)
5 participants