Code search: use code search api #2476

norascheuch · 2023-06-02T12:29:40Z

Instead of using the repository search, we want to use the code search API for this feature. Since the search API returns an error if two language parameters are added, we made entering a language through the UI optional.

Items:

Uses the correct API
Implements throttling
Makes 'no language' available
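To illustrate why the language is now optional, here is a minimal sketch, not the code in this PR. It assumes the UI's language choice is sent as a language: qualifier appended to the user's query. The helper buildCodeSearchQuery is hypothetical. When "no language" is selected, the query is sent exactly as typed, so a user who already wrote their own language: qualifier does not end up with two.

```typescript
// Hypothetical helper: combines the user's free-text query with an optional
// language chosen in the UI. The language qualifier is only appended when one
// was selected, so the UI never adds a second one on top of the user's own.
function buildCodeSearchQuery(userQuery: string, language?: string): string {
  const trimmed = userQuery.trim();
  if (language === undefined || language === "") {
    // "No language" selected: send the query exactly as the user typed it.
    return trimmed;
  }
  return `${trimmed} language:${language}`;
}

// Usage:
// buildCodeSearchQuery("import requests", "python")
// buildCodeSearchQuery("import requests language:python")
```

Both usage calls produce "import requests language:python"; only the first one relies on the UI to add the qualifier.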

Checklist

CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
Issues have been created for any UI or other user-facing changes made by this pull request.
[Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

norascheuch · 2023-06-02T13:03:05Z

extensions/ql-vscode/src/variant-analysis/gh-api/gh-api-client.ts

+  const MyOctokit = Octokit.plugin(throttling);
+  const auth = await credentials.getAccessToken();
+
+  const octokit = new MyOctokit({
+    auth,
+    throttle: {
+      onRateLimit: (retryAfter: number, options: any): boolean => {
+        void showAndLogWarningMessage(
+          `Request quota exhausted for request ${options.method} ${options.url}. Retrying after ${retryAfter} seconds!`,
+        );
+
+        return true;
+      },
+      onSecondaryRateLimit: (_retryAfter: number, options: any): void => {
+        void showAndLogWarningMessage(
+          `SecondaryRateLimit detected for request ${options.method} ${options.url}`,
+        );
+      },
+    },
+  });
+


Curiously, I never hit the secondary rate limit during testing. However, the primary rate limit was hit. Even though throttling was used, the maximum number of entries I ever got in one list from a single search was 901.

Even though throttling was used the maximum entries in one list I ever got with one search was 901.

Is this because of throttling, or because of duplicate repos in the search results? If you didn't hit the secondary rate limit, then you should have received results from all 10 pages, and it's just that the total number of unique repositories was 901.

It seems to be specific to this search query. I tested some other queries and got different results.

I was told the search API returns different results when the same request is sent multiple times, but that doesn't seem to be the case for me right now.
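The arithmetic behind the "10 pages" argument, as a sketch: GitHub's REST search endpoints return at most 1,000 results per query, i.e. 10 pages of at most 100 items. Several hits can come from the same repository, so the number of unique repositories can be lower than the number of hits. The CodeSearchItem type models only the repository.full_name field of a code search hit, and uniqueRepositories is a hypothetical helper, not the function in this PR.

```typescript
// Minimal shape of a code search hit; only the field we need is modelled.
// (Assumption: mirrors items[].repository.full_name in the REST response.)
interface CodeSearchItem {
  repository: { full_name: string };
}

// The search API returns at most 1,000 results: 10 pages of 100 items each.
const MAX_PAGES = 10;
const PER_PAGE = 100;
const MAX_RESULTS = MAX_PAGES * PER_PAGE;

// Collapses hits to unique repository names, keeping first-seen order.
function uniqueRepositories(items: CodeSearchItem[]): string[] {
  const seen = new Set<string>();
  for (const item of items.slice(0, MAX_RESULTS)) {
    seen.add(item.repository.full_name);
  }
  return Array.from(seen);
}
```

With 1,000 hits spread over fewer than 1,000 repositories, a result such as 901 unique entries is what this would produce even if every page was fetched successfully.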

robertbrignull

Some thoughts on wording of things, but I've tried this out and it worked for me.

I don't know too much about this throttling plugin or how the GitHub rate limit behaves, so I can't comment much on that part right now. If you'd like a deeper review of that part, just let me know.

I also found the progress reporting a bit confusing, but it seemed to work well enough in practice when I ran some searches.

robertbrignull · 2023-06-02T13:40:20Z

extensions/ql-vscode/src/variant-analysis/gh-api/gh-api-client.ts

+  const auth = await credentials.getAccessToken();
+
+  const octokit = new MyOctokit({
+    auth,


To copy more closely what we do in vscode-codeql/extensions/ql-vscode/src/common/vscode/authentication.ts (lines 35 to 38 in 134c440):

return new Octokit.Octokit({
  auth: accessToken,
  retry,
});

This should ideally include the retry plugin too. Or is that intentionally left out? The idea of this plugin is that it'll automatically retry requests in cases of a 500 error.

Suggested change

-    auth,
+    auth,
+    retry,

I don't think we need the retry plugin here, since the throttle plugin takes care of retrying?

I believe they deal with different concerns. The retry plugin will retry in response to 5xx errors. The throttle plugin will retry in response to 429 (Too many requests) errors and understands GitHub's specific messages.

But it seems that the latest version of provideOctokitWithThrottling includes the retry plugin, so that all looks good to me.
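A simplified sketch of the split of concerns described above, based only on the description in this thread rather than on the plugins' source: 5xx errors fall to the retry plugin, while 429 responses fall to the throttling plugin. The retryOwner function is hypothetical, and GitHub can also signal a rate limit with a 403 plus rate-limit headers, which this sketch ignores.

```typescript
// Which plugin is expected to retry a failed request, following the split
// described in this thread. Simplified: real rate-limit detection also
// inspects headers such as retry-after and x-ratelimit-remaining.
type RetryOwner = "retry" | "throttling" | "none";

function retryOwner(status: number): RetryOwner {
  if (status === 429) {
    return "throttling"; // too many requests: wait, then retry
  }
  if (status >= 500 && status <= 599) {
    return "retry"; // transient server error
  }
  return "none"; // other errors are not retried by either concern
}
```

Because the two concerns don't overlap, registering both plugins on one client (for example Octokit.plugin(retry, throttling), using the retry export of @octokit/plugin-retry and the throttling export of @octokit/plugin-throttling) is consistent with what provideOctokitWithThrottling is said to do.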

robertbrignull · 2023-06-02T13:43:42Z

extensions/ql-vscode/src/variant-analysis/gh-api/gh-api-client.ts

+      },
+      onSecondaryRateLimit: (_retryAfter: number, options: any): void => {
+        void showAndLogWarningMessage(
+          `SecondaryRateLimit detected for request ${options.method} ${options.url}`,


Suggested change

-          `SecondaryRateLimit detected for request ${options.method} ${options.url}`,
+          `Request quota exceeded for request ${options.method} ${options.url}.`,

Just a suggestion to make this error fit with the other one.

I think it makes sense to distinguish between the two, so that users know what they are hitting specifically.

I agree users should be able to distinguish and understand them. What I meant was that we could make the language style more consistent between the two error messages. For example:

In one we say "quota" and in the other we say "limit". Is there a technical difference between these that I'm not aware of, and will the user know what they mean?

In one we use full sentences, and in the other we use CamelCase.

However, this is something that can easily be changed afterwards, if time is tight. So it's ok to defer this to a later PR if needed.

I took this from the documentation, and it also mirrors a script we have elsewhere in the repo. However, you're right about the CamelCase; that's just weird. I fixed the CamelCase and adjusted the wording!
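One way to keep the two warnings consistent is to build both from a single formatter. This is a hypothetical sketch: rateLimitWarning and RequestInfo are illustrative names, and the wording reuses the "exhausted" and "exceeded" phrases from the diff and suggestion above rather than the final merged text.

```typescript
// Hypothetical formatter so both rate-limit warnings share one style:
// full sentences, the same "request quota" vocabulary, no CamelCase.
interface RequestInfo {
  method: string;
  url: string;
}

function rateLimitWarning(
  kind: "primary" | "secondary",
  request: RequestInfo,
  retryAfterSeconds?: number,
): string {
  const base =
    kind === "primary"
      ? `Request quota exhausted for request ${request.method} ${request.url}.`
      : `Request quota exceeded for request ${request.method} ${request.url}.`;
  // Only mention the delay when the caller knows it.
  return retryAfterSeconds === undefined
    ? base
    : `${base} Retrying after ${retryAfterSeconds} seconds.`;
}
```

The onRateLimit and onSecondaryRateLimit callbacks would then only pass options.method, options.url and, where available, retryAfter.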

robertbrignull · 2023-06-02T13:45:34Z

extensions/ql-vscode/src/variant-analysis/gh-api/gh-api-client.ts

+    throttle: {
+      onRateLimit: (retryAfter: number, options: any): boolean => {
+        void showAndLogWarningMessage(
+          `Request quota exhausted for request ${options.method} ${options.url}. Retrying after ${retryAfter} seconds!`,


I'm not sure why it happened, but I got this error once. It does look odd to say "after 0 seconds". What does this mean?

The first time, it retried after 42 seconds for me; after that, it retried after 0 seconds. I don't know why it retries after 0 seconds; this behaviour comes from the plugin. I also found retries after 10, 20 or 7 seconds in the logs, but we would have to check the code of the plugin to find out more.
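One plausible explanation for the "0 seconds" retries, assuming the throttling plugin derives the primary-rate-limit delay from the x-ratelimit-reset response header (UTC epoch seconds), which we haven't verified in the plugin's code: the delay is the time left until the reset, clamped at zero. A reset time that has already passed when the retry is scheduled would then give 0. The function secondsUntilReset is a hypothetical illustration.

```typescript
// Assumption: delay = seconds until the x-ratelimit-reset time, rounded up
// and clamped so it is never negative. A reset in the past therefore yields 0.
function secondsUntilReset(
  resetEpochSeconds: number,
  nowMs: number = Date.now(),
): number {
  const resetMs = resetEpochSeconds * 1000;
  return Math.max(Math.ceil((resetMs - nowMs) / 1000), 0);
}
```

For a reset at epoch second 1000, calling this at 958000 ms gives 42, while calling it at 1005000 ms gives 0.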

robertbrignull · 2023-06-02T13:46:41Z

extensions/ql-vscode/src/variant-analysis/gh-api/gh-api-client.ts

Even though throttling was used the maximum entries in one list I ever got with one search was 901.

Is this because of throttling, or because of duplicate repos in the search results? If you didn't hit the secondary rate limit then it means you should have results from all 10 pages, but it's just the total number of unique repositories was 901.

norascheuch · 2023-06-05T09:05:50Z

@robertbrignull While reviewing this code, I noticed that gh-api-client, the file containing the getCodeSearchRepositories method, is in the variant-analysis folder. It seems to me that code search belongs more to databases and database lists than to variant analysis per se. I think it makes more sense to move both getCodeSearchRepositories and getOctokitForSearch to their own file inside the extensions/ql-vscode/src/databases/ folder.

charisk · 2023-06-05T09:18:44Z


I think this refactoring makes sense: having the API client code in one file inside /databases seems like the right thing to do in terms of code structure, and it helps with testing.

Whether that refactoring is done in this PR or a separate one is something that I'll leave to you and @robertbrignull to decide though 😃
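To show how a separate file can help with testing, here is a sketch that assumes the search logic depends on a small client interface instead of a concrete Octokit instance. CodeSearchClient, collectRepositories and fakeClient are hypothetical names, not the actual getCodeSearchRepositories or getOctokitForSearch signatures. The pagination follows the 1,000-result cap (10 pages of 100) discussed above.

```typescript
// Hypothetical seam: the search logic depends on this small interface
// instead of a concrete Octokit instance, so tests can pass a fake.
interface CodeSearchPage {
  items: { repository: { full_name: string } }[];
}

interface CodeSearchClient {
  searchCode(query: string, page: number, perPage: number): Promise<CodeSearchPage>;
}

// Walks up to 10 pages of 100 results, stopping early on a short page,
// and returns the unique repository names in first-seen order.
async function collectRepositories(client: CodeSearchClient, query: string): Promise<string[]> {
  const perPage = 100;
  const repos = new Set<string>();
  for (let page = 1; page <= 10; page++) {
    const result = await client.searchCode(query, page, perPage);
    for (const item of result.items) {
      repos.add(item.repository.full_name);
    }
    if (result.items.length < perPage) {
      break; // last page reached
    }
  }
  return Array.from(repos);
}

// A fake client for tests: serves a fixed list of repository names in pages.
function fakeClient(names: string[]): CodeSearchClient {
  return {
    async searchCode(_query, page, perPage) {
      const slice = names.slice((page - 1) * perPage, page * perPage);
      return { items: slice.map((full_name) => ({ repository: { full_name } })) };
    },
  };
}
```

A unit test can then exercise paging and deduplication without network access or rate limits, for example collectRepositories(fakeClient(["a/x", "b/y", "a/x"]), "query").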

robertbrignull

LGTM, especially given that this is now behind a feature flag.

I did notice a couple of tiny bits we can improve regarding the cancellation token, but we can discuss those outside of this PR. I'll raise issues or discuss them with you.


norascheuch marked this pull request as ready for review June 2, 2023 13:00

norascheuch requested a review from a team as a code owner June 2, 2023 13:00

norascheuch commented Jun 2, 2023


norascheuch force-pushed the nora/use-code-search-api branch from 013a691 to 92c86bc June 2, 2023 13:04

robertbrignull reviewed Jun 2, 2023


robertbrignull mentioned this pull request Jun 5, 2023:

Protect codeSearch feature for now #2478 (merged)

norascheuch requested a review from robertbrignull June 5, 2023 13:36

norascheuch added 5 commits June 5, 2023 15:42

Add no-language option (6da1f93)
Implement throttling (5467c50)
Use showAndLogWarning to surface rateLimit (fef2880)
Move octokit initialization to db panel to send log messages to the user (b470061)
Move code search api call to its own file (876b92a)

norascheuch force-pushed the nora/use-code-search-api branch from bbc6595 to 2ecec54 June 5, 2023 15:42

robertbrignull approved these changes Jun 5, 2023


Adjust error response wording (945594d)

norascheuch force-pushed the nora/use-code-search-api branch from 2ecec54 to 945594d June 5, 2023 15:55

norascheuch merged commit a7a24fc into main Jun 6, 2023

norascheuch deleted the nora/use-code-search-api branch June 6, 2023 08:29
