Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code search: use code search api #2476

Merged
merged 6 commits into from
Jun 6, 2023
Merged

Conversation

norascheuch
Copy link
Contributor

@norascheuch norascheuch commented Jun 2, 2023

Instead of using repo search we want to use the code search api for this feature. Since the search api returns an error if two language parameters are added we made the entry of a language through the UI optional.

Items:

  • uses correct api
  • implements throttling
  • makes 'no language' available

Checklist

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • [Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

@norascheuch norascheuch marked this pull request as ready for review June 2, 2023 13:00
@norascheuch norascheuch requested a review from a team as a code owner June 2, 2023 13:00
Comment on lines 25 to 46
const MyOctokit = Octokit.plugin(throttling);
const auth = await credentials.getAccessToken();

const octokit = new MyOctokit({
auth,
throttle: {
onRateLimit: (retryAfter: number, options: any): boolean => {
void showAndLogWarningMessage(
`Request quota exhausted for request ${options.method} ${options.url}. Retrying after ${retryAfter} seconds!`,
);

return true;
},
onSecondaryRateLimit: (_retryAfter: number, options: any): void => {
void showAndLogWarningMessage(
`SecondaryRateLimit detected for request ${options.method} ${options.url}`,
);
},
},
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curiously I never hit the secondary Rate Limit during testing. However, the RateLimit was hit. Even though throttling was used the maximum entries in one list I ever got with one search was 901.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though throttling was used the maximum entries in one list I ever got with one search was 901.

Is this because of throttling, or because of duplicate repos in the search results? If you didn't hit the secondary rate limit then it means you should have results from all 10 pages, but it's just the total number of unique repositories was 901.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be this certain search query. I tested with some other queries and had different results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was told the search api returns different results when sending the same request multiple times, this doesn't seem to be true for me right now..

@norascheuch norascheuch force-pushed the nora/use-code-search-api branch from 013a691 to 92c86bc Compare June 2, 2023 13:04
Copy link
Contributor

@robertbrignull robertbrignull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts on wording of things, but I've tried this out and it worked for me.

I don't know too much about this throttling plugin or how the github rate limit behaves, so I can't comment too much on that part right now. If you'd like a deeper review of that part just let me know.

I also found the progress reporting a bit confusing, but it seemed to work well enough in practice when I ran some searches.

const auth = await credentials.getAccessToken();

const octokit = new MyOctokit({
auth,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To copy more closely what we do at

return new Octokit.Octokit({
auth: accessToken,
retry,
});
, this should ideally include the retry plugin too. Or is that intentionally missed out? The idea of this plugin is that'll automatically retry requests in cases of a 500 error.

Suggested change
auth,
auth,
retry,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the retry plugin here, as the throttle plugin is taking over retrying?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe they deal with different concerns. The retry plugin will retry in response to 5xx errors. The throttle plugin will retry in response to 429 (Too many requests) errors and understands GitHub's specific messages.

But seems that in the latest version of provideOctokitWithThrottling the retry plugin is included, so that all looks good to me.

},
onSecondaryRateLimit: (_retryAfter: number, options: any): void => {
void showAndLogWarningMessage(
`SecondaryRateLimit detected for request ${options.method} ${options.url}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`SecondaryRateLimit detected for request ${options.method} ${options.url}`,
`Request quota exceeded for request ${options.method} ${options.url}`.,

Just a suggestion to make this error fit with the other one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to distinguish between the two, so that users know what they are hitting specifically.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree users should be able to distinguish and understand them. I was meaning more that we could make the language style more consistent between the two error messages. For example:

  • In one we say "quota" and in the other we say "limit". Is there a technical different between these I'm not aware of, and will the user know what they mean?
  • In one we use full sentences, and in the other we use CamelCase.

However, this is something that can easily be changed afterwards, if time is tight. So it's ok to defer this to a later PR if needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this from the documentation and it copies also a script we have somewhere else in the repo. However you're right with the CamelCase, that's just weird. I fixed the CC and adjusted the wording!

throttle: {
onRateLimit: (retryAfter: number, options: any): boolean => {
void showAndLogWarningMessage(
`Request quota exhausted for request ${options.method} ${options.url}. Retrying after ${retryAfter} seconds!`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2023-06-02 at 14 33 51

I'm not sure why it happened but I got this error once. It does look odd to say "after 0 seconds". What does this mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first time it retried after 42 seconds for me, after that 0. I don't know why it retries after 0 seconds, this behaviour comes from the plugin. I also found retries after 10, 20 or 7 seconds in the logs, but we would have to check the code of the plugin to find out more.

Comment on lines 25 to 46
const MyOctokit = Octokit.plugin(throttling);
const auth = await credentials.getAccessToken();

const octokit = new MyOctokit({
auth,
throttle: {
onRateLimit: (retryAfter: number, options: any): boolean => {
void showAndLogWarningMessage(
`Request quota exhausted for request ${options.method} ${options.url}. Retrying after ${retryAfter} seconds!`,
);

return true;
},
onSecondaryRateLimit: (_retryAfter: number, options: any): void => {
void showAndLogWarningMessage(
`SecondaryRateLimit detected for request ${options.method} ${options.url}`,
);
},
},
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though throttling was used the maximum entries in one list I ever got with one search was 901.

Is this because of throttling, or because of duplicate repos in the search results? If you didn't hit the secondary rate limit then it means you should have results from all 10 pages, but it's just the total number of unique repositories was 901.

@norascheuch
Copy link
Contributor Author

@robertbrignull Reviewing this code I noticed that the gh-api-client file where the getCodeSearchRepositories method is written in is in the variant-analysis folder. It seems to me that code search belongs more to databases and database lists than variant-analysis per-se. I think it makes more sense to move both getCodeSearchRepositories and getOctokitForSearch to it's own file inside the extensions/ql-vscode/src/databases/-folder.

@charisk
Copy link
Contributor

charisk commented Jun 5, 2023

@robertbrignull Reviewing this code I noticed that the gh-api-client file where the getCodeSearchRepositories method is written in is in the variant-analysis folder. It seems to me that code search belongs more to databases and database lists than variant-analysis per-se. I think it makes more sense to move both getCodeSearchRepositories and getOctokitForSearch to it's own file inside the extensions/ql-vscode/src/databases/-folder.

I think this refactoring makes sense - having the API client related code into one file inside /databases seems the right thing to do in terms of code structure and it helps with testing.

Whether that refactoring is done in this PR or a separate one is something that I'll leave to you and @robertbrignull to decide though 😃

@norascheuch norascheuch force-pushed the nora/use-code-search-api branch from bbc6595 to 2ecec54 Compare June 5, 2023 15:42
Copy link
Contributor

@robertbrignull robertbrignull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, especially given that this is now behind a feature flag.

I did notice a couple of tiny bits we can improve regarding the cancellation token, but we can discuss those outside of this PR. I'll raise issues or discuss them with you.

Copy link
Contributor

@robertbrignull robertbrignull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, especially given that this is now behind a feature flag.

I did notice a couple of tiny bits we can improve regarding the cancellation token, but we can discuss those outside of this PR. I'll raise issues or discuss them with you.

@norascheuch norascheuch force-pushed the nora/use-code-search-api branch from 2ecec54 to 945594d Compare June 5, 2023 15:55
@norascheuch norascheuch merged commit a7a24fc into main Jun 6, 2023
@norascheuch norascheuch deleted the nora/use-code-search-api branch June 6, 2023 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants