Use streaming when creating log symbols file. #2858
Looks reasonable to me. Do we have any tests for `SplitBuffer`?
* https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/startsWith
* which is CC0/public domain
*
* See https://github.com/github/vscode-codeql/issues/802 for more context as to why we need it.
Looks like the upstream issues have been fixed. Maybe we can remove this workaround. (Not needed for this PR.)
Is there anything in this file that has changed when you extracted it?
No, other than the casing of `LINE_ENDINGS`.
Oh, and the bug I discovered when you made me write a unit test :)
Internal users were seeing frequent crashes with an error about trying to create a string that was too long to fit in memory. This happened when we parsed the human-readable log to generate the symbols that map predicates to their human-readable RA. We were simply reading the entire human-readable log into memory at once, and these logs can be extremely large for complex queries.
The fix was to just use a streaming reader, which required slightly rearranging the code we use to parse the lines coming out of the reader.
I also moved some existing code for splitting a stream at line break boundaries into its own source file, so we could consume it from the new code.
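For readers who want a picture of what splitting a stream at line-break boundaries involves, here is a minimal sketch. It is not the extension's actual `SplitBuffer`; the class name `ChunkLineSplitter` and its `push`/`end` methods are hypothetical, and it assumes only `\n` and `\r\n` count as line endings (a lone `\r` is not treated as one). The idea is to hold back the trailing partial line of each chunk until the next chunk completes it, so memory use is bounded by the chunk size plus the longest line rather than the whole file.

```typescript
// Hypothetical helper for illustration; NOT the actual SplitBuffer from the PR.
// Assumption: only "\n" and "\r\n" are treated as line endings.
class ChunkLineSplitter {
  // Text received so far that has not yet been terminated by a newline.
  private pending = "";

  /** Add a chunk and return every line that the chunk completes. */
  push(chunk: string): string[] {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last part has no terminating newline yet, so keep it for later.
    this.pending = parts.pop() ?? "";
    return parts.map(stripCarriageReturn);
  }

  /** Signal end of input and return the final unterminated line, if any. */
  end(): string[] {
    const rest = this.pending;
    this.pending = "";
    return rest.length > 0 ? [stripCarriageReturn(rest)] : [];
  }
}

// Drop a trailing "\r" so that "\r\n" endings behave the same as "\n".
function stripCarriageReturn(line: string): string {
  return line.charAt(line.length - 1) === "\r"
    ? line.slice(0, line.length - 1)
    : line;
}

// Usage: chunks arrive in arbitrary pieces, as they would from a stream.
const demoSplitter = new ChunkLineSplitter();
const demoLines: string[] = [];
for (const chunk of ["first li", "ne\r", "\nsecond line\nthi", "rd"]) {
  for (const line of demoSplitter.push(chunk)) {
    demoLines.push(line);
  }
}
for (const line of demoSplitter.end()) {
  demoLines.push(line);
}
console.log(demoLines);
```

In a real reader the chunks would come from a stream instead of an array, and each line would be handed to the parser as soon as it is produced rather than collected in a list.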