
Use streaming when creating log symbols file. #2858

Merged 2 commits on Sep 27, 2023


Contributor

@dbartol dbartol commented Sep 25, 2023

Internal users were seeing frequent crashes with an error about trying to create a string that was too long to fit in memory. This happened when we parsed the human-readable log to generate the symbols that map predicates to their human-readable RA: we simply read the entire human-readable log into memory at once, and these logs can be extremely large for complex queries.

The fix was to just use a streaming reader, which required slightly rearranging the code we use to parse the lines coming out of the reader.
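For readers unfamiliar with the pattern, here is a minimal sketch of line-at-a-time reading in Node.js. It is not the code from this PR: `forEachLine` and `forEachLineInFile` are hypothetical names, and it assumes the built-in `node:readline` module is an acceptable way to split the stream. The point is only that each line is handed to a callback as it arrives, so the whole log never has to exist as one string.

```typescript
import { createReadStream } from "node:fs";
import { createInterface } from "node:readline";
import { Readable } from "node:stream";

// Hypothetical helper (not from the PR): passes each line of `input` to
// `onLine` as it arrives and resolves to the number of lines seen.
async function forEachLine(
  input: Readable,
  onLine: (line: string) => void,
): Promise<number> {
  // crlfDelay: Infinity makes readline treat "\r\n" as a single line break.
  const lines = createInterface({ input, crlfDelay: Infinity });
  let count = 0;
  for await (const line of lines) {
    onLine(line);
    count++;
  }
  return count;
}

// Hypothetical convenience wrapper for a file on disk.
function forEachLineInFile(
  path: string,
  onLine: (line: string) => void,
): Promise<number> {
  return forEachLine(createReadStream(path, { encoding: "utf8" }), onLine);
}
```

A parser built this way has to keep any state that spans multiple lines in its own variables, which is presumably why the line-parsing code needed rearranging.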

I also moved some existing code for splitting a stream at line break boundaries into its own source file, so we could consume it from the new code.
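As an illustration of what splitting a stream at line break boundaries involves, the sketch below buffers incoming chunks and emits only complete lines. It is an assumption-laden example, not the extension's actual `SplitBuffer`: the class name `LineSplitter` and its `push`/`end` methods are hypothetical, and it recognizes only `\n`, `\r`, and `\r\n` as line endings.

```typescript
// Hypothetical helper (not the extension's SplitBuffer): accumulates chunks
// and returns complete lines, keeping any unfinished tail for the next chunk.
class LineSplitter {
  private pending = "";

  // Feed one chunk; returns every line that this chunk completes.
  push(chunk: string): string[] {
    this.pending += chunk;
    const lines: string[] = [];
    let start = 0;
    for (let i = 0; i < this.pending.length; i++) {
      const ch = this.pending[i];
      if (ch === "\n") {
        lines.push(this.pending.slice(start, i));
        start = i + 1;
      } else if (ch === "\r") {
        // A "\r" at the very end may be the first half of "\r\n" split
        // across two chunks, so wait for more input before deciding.
        if (i + 1 === this.pending.length) break;
        lines.push(this.pending.slice(start, i));
        if (this.pending[i + 1] === "\n") i++; // consume the "\n" of "\r\n"
        start = i + 1;
      }
    }
    this.pending = this.pending.slice(start);
    return lines;
  }

  // Call once at end of input to flush a final line with no terminator.
  end(): string[] {
    const rest = this.pending;
    this.pending = "";
    if (rest.endsWith("\r")) return [rest.slice(0, -1)];
    return rest.length > 0 ? [rest] : [];
  }
}
```

The case worth a unit test is a line ending that straddles two chunks, for example `"one\r"` followed by `"\ntwo"`, which should produce the two lines `one` and `two` rather than an extra empty line. For simplicity this sketch rescans the buffered tail on every `push`, which is fine for short lines but not tuned for very long ones.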

@dbartol dbartol requested a review from a team as a code owner September 25, 2023 18:28
@dbartol dbartol marked this pull request as draft September 25, 2023 18:28
@dbartol dbartol added the "Complexity: Low" label (a good task for newcomers to learn, or experienced team members to complete quickly) Sep 25, 2023
Contributor

@aeisenberg aeisenberg left a comment


Looks reasonable to me. Do we have any tests for SplitBuffer?

* https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/startsWith
* which is CC0/public domain
*
* See https://github.com/github/vscode-codeql/issues/802 for more context as to why we need it.
Contributor


Looks like the upstream issues have been fixed. Maybe we can remove this workaround. (Not needed for this PR.)

Contributor


Is there anything in this file that has changed when you extracted it?

Contributor Author


No, other than the casing of LINE_ENDINGS.

Contributor Author


Oh, and the bug I discovered when you made me write a unit test:)

@dbartol dbartol marked this pull request as ready for review September 26, 2023 21:56
@dbartol dbartol requested a review from angelapwen September 26, 2023 21:56
@dbartol dbartol merged commit f1533dd into main Sep 27, 2023
@dbartol dbartol deleted the dbartol/long-strings branch September 27, 2023 13:55