Add a language label next to databases in the UI #697

Merged on Dec 4, 2020 (3 commits)

@aeisenberg aeisenberg commented Dec 3, 2020

This change will only work on databases created by cli >= 2.4.1. In that
version, a new primaryLanguage field was added to the codeql-database.yml
file. We use this property as the language.

This change also includes a refactoring of the logic around extracting
database information heuristically based on file location. As much
as possible, it is extracted to the helpers module. Also, the
initial quick query text is generated based on the language (if known);
otherwise it falls back to the old style of generation.

Fixes #321

Checklist

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • [n/a] @github/docs-content-dsp has been cc'd in all issues for UI or other user-facing changes made by this pull request.

Contributor

@adityasharad adityasharad left a comment

Refactoring looks good. I suggest getting the language from a CLI command instead of the YAML.

'semmlecode.javascript.dbscheme': 'javascript',
'semmlecode.cpp.dbscheme': 'cpp',
'semmlecode.dbscheme': 'java',
'semmlecode.python.dbscheme': 'pyhton',
Contributor

Suggested change
'semmlecode.python.dbscheme': 'pyhton',
'semmlecode.python.dbscheme': 'python',

return !!path.basename(dbPath).startsWith('db-');
}

export async function getPrimaryLanguage(root: string) {
Contributor

Instead of parsing the config file, make a call to codeql resolve database --format json and get the first value in the languages field.
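As a rough illustration of this suggestion, the sketch below extracts the first entry of the languages field from the JSON text printed by codeql resolve database --format json. Only the languages field name comes from this thread; the function name, the interface, and the decision to return undefined on malformed output are assumptions. Running the CLI itself is left out.

```typescript
// Hypothetical shape of the JSON output. Only `languages` is taken from the
// discussion; any other fields are treated as unknown.
interface ResolvedDatabaseInfo {
  languages?: unknown;
  [key: string]: unknown;
}

// Returns the first reported language, or undefined when the output has no
// usable `languages` list (for example, an older CLI) or is not valid JSON.
function primaryLanguageFromResolveOutput(stdout: string): string | undefined {
  try {
    const info = JSON.parse(stdout) as ResolvedDatabaseInfo | null;
    const languages = info?.languages;
    if (Array.isArray(languages) && typeof languages[0] === 'string') {
      return languages[0];
    }
    return undefined;
  } catch (e) {
    // Malformed output is treated the same as "language unknown".
    return undefined;
  }
}
```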

Contributor

Agreed. Especially because the open-sourced VSCode extension is our showcase for the Right way to do things.

Contributor

That will also mean that the extension will not need its own dbSchemeToLanguage lookup table, since the codeql resolve database now already tries a dbscheme-based heuristic for language detection.

Contributor Author

The reason I chose not to do that is that currently, there is no cli available on openDatabase. Also, making a filesystem check will be faster than starting up a jvm to get this information.

I could certainly stuff a reference to the cli into the constructor. But I'd prefer not to do that for the reasons above unless the approach I'm taking would produce incorrect results.

Contributor

I would lean towards considering "people who write their own IDE integrations for CodeQL think this is the right thing to do, and proceed to bake half-understood internal knowledge of database internals into their own work" to be an incorrect result. :-)

On the other hand, I don't have the insight to appreciate the possible problems in dealing with "there is no cli available on openDatabase". Why is that?

Contributor Author

The cli is a singleton that gets passed around when needed. The DatabaseManager does not hold a reference to the cli. I would need to change that in order to call a cli command here. It's not a big change, but it is an architectural one.

Also, I still need dbSchemeToLanguage as a fallback in case the database was not created by a modern CLI. I will add a comment around using that so it's known that this is not recommended.

So, the question is, should I be getting the database label from the codeql-database.yml or from database resolve? I'm not sure what your comment was on this or about the dbSchemeToLanguage.
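For reference, the architectural change described in this comment (giving the database manager its own reference to the CLI) might look roughly like the sketch below. The interface and class names are hypothetical stand-ins, not the extension's real types; only resolveDatabase and the idea of a version check come from the PR.

```typescript
// Hypothetical stand-in for the parts of the CLI wrapper this sketch needs.
interface LanguageResolver {
  supportsLanguageName(): Promise<boolean>;
  resolveDatabase(dbPath: string): Promise<{ languages?: string[] }>;
}

class DatabaseManagerSketch {
  // The CLI reference is injected once through the constructor instead of
  // being passed to each call that needs it.
  constructor(private readonly cli: LanguageResolver) {}

  async getPrimaryLanguage(dbPath: string): Promise<string | undefined> {
    // Older CLIs cannot report a language, so skip the call entirely.
    if (!(await this.cli.supportsLanguageName())) {
      return undefined;
    }
    const info = await this.cli.resolveDatabase(dbPath);
    return info.languages?.[0];
  }
}
```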

@aeisenberg aeisenberg force-pushed the aeisenberg/lang-label branch 2 times, most recently from 22afde9 to 7d86f1b Compare December 3, 2020 05:16
Contributor

hmakholm commented Dec 3, 2020

Also, I still need dbSchemeToLanguage as a fallback in case the database was not created by a modern CLI. I will add a comment around using that so it's known that this is not recommended.

If the CLI is new enough to return languages from resolve database at all, it will already do its own dbscheme-based heuristics when it's asked about old databases.

So, the question is, should I be getting the database label from the codeql-database.yml or from database resolve? I'm not sure what your comment was on this or about the dbSchemeToLanguage.

I think the extension should rely exclusively on codeql resolve database. I would really want to avoid exposing hacky heuristics in the open-sourced extension. That's why I suggested adding a feature to resolve database in the first place.

Contributor Author

aeisenberg commented Dec 3, 2020

Two questions:

  1. What happens if the CLI is not new enough to return a languages field? That's one scenario I am trying to handle. I guess in this case we could just avoid returning anything and this would be an encouragement to migrate to a newer version. But we still need to use dbscheme heuristics for generating the import statements of quick queries.
  2. Are you saying that even if there is no primaryLanguage field in the codeql-database.yml, the database resolve command will still return the languages list?

EDIT: Looked at the code again and the answer to (2) seems to be "yes".

Contributor

hmakholm commented Dec 3, 2020

Hmmyes, an existing quick-query feature of course needs to keep working.

I was thinking more in terms of the new feature with showing language tags for databases in the UI. Here I think it is reasonable to expect that you need both a new extension and a CLI that supports the feature.

This change will only work on databases created by cli >= 2.4.1. In that
version, a new `primaryLanguage` field was added to the `codeql-database.yml`
file. We use this property as the language.

This change also includes a refactoring of the logic around extracting
database information heuristically based on file location. As much
as possible, it is extracted to the `helpers` module. Also, the
initial quick query text is generated based on the language (if known);
otherwise it falls back to the old style of generation.

This commit moves to using `codeql resolve database` instead of inspecting
the `codeql-database.yml` file.

When the extension starts, and if the cli supports it, the extension will
attempt to get the name for any databases that don't yet have a name.
Once the cli has looked up a name, it is cached so we don't need to
rediscover it.
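
The caching behaviour described in this commit message can be sketched as follows, assuming "name" refers to the language name. The class and its lookup callback are hypothetical. The one deliberate choice is that an undefined result is not stored, so a database whose language could not be determined (for example, because the CLI is too old) is looked up again on the next call.

```typescript
// Caches the language per database path. Unknown results are not cached,
// so they are retried later, e.g. after the CLI has been upgraded.
class LanguageCache {
  private readonly known = new Map<string, string>();

  constructor(
    // Hypothetical lookup, e.g. a wrapper around the CLI's resolve command.
    private readonly lookup: (dbPath: string) => Promise<string | undefined>
  ) {}

  async get(dbPath: string): Promise<string | undefined> {
    const cached = this.known.get(dbPath);
    if (cached !== undefined) {
      return cached;
    }
    const language = await this.lookup(dbPath);
    if (language !== undefined) {
      this.known.set(dbPath, language);
    }
    return language;
  }
}
```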
@aeisenberg aeisenberg force-pushed the aeisenberg/lang-label branch from c75044f to dfbcf81 Compare December 3, 2020 23:56
@aeisenberg
Contributor Author

I think this addresses all the concerns. Look at the second commit for the changes. I rebased to remove merge conflicts.

Contributor

@hmakholm hmakholm left a comment

I don't feel qualified to review the TS part, but I'm happy with the general approach to finding the language now.

Comment on lines 469 to 471
* CLI versions before the langauge name was introduced to dbInfo. Implementations
* that do not require backwards compatibility should call
* `cli.CodeQLCliServer.resolveDatabase` and use the first entry in the
Contributor

Implementations sounds like an "off" word to use here. Applications? Features?

Contributor

Features sounds good.

Contributor

@adityasharad adityasharad left a comment

A few minor questions.

@@ -123,6 +124,11 @@ export class CodeQLCliServer implements Disposable {
*/
private static CLI_VERSION_WITH_DECOMPILE_KIND_DIL = new SemVer('2.3.0');

/**
* CLI version where --kind=DIL was introduced
Contributor

Copy-paste comment.

return (await this.getVersion()).compare(CodeQLCliServer.CLI_VERSION_WITH_DECOMPILE_KIND_DIL) >= 0;
}

public async supportsLangaugeName() {
Contributor

typo
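
A version gate like the one under review can be sketched without dependencies as shown below. The PR itself uses SemVer objects; the hand-rolled comparison here is a simplification that ignores pre-release tags. The 2.4.1 threshold is taken from the PR description, and the constant and function names are hypothetical.

```typescript
// Compares two dotted numeric versions such as "2.4.1".
// Returns a negative number, zero, or a positive number.
// Pre-release tags and build metadata are not handled in this sketch.
function compareVersions(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    // Missing components count as zero, so "2.4" equals "2.4.0".
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) {
      return diff;
    }
  }
  return 0;
}

// Assumed threshold, based on the "cli >= 2.4.1" note in the PR description.
const CLI_VERSION_WITH_LANGUAGE = '2.4.1';

function supportsLanguageName(cliVersion: string): boolean {
  return compareVersions(cliVersion, CLI_VERSION_WITH_LANGUAGE) >= 0;
}
```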

extensions/ql-vscode/src/cli.ts

private async getPrimaryLanguage(dbPath: string) {
if (!(await this.cli.supportsLangaugeName())) {
// return undefined so that we continually recalculate until the cli version is bumped
Contributor

What do you mean by recalculate?

Contributor Author

I'll explain better.

* Returns the initial contents for an empty query, based on the language of the selected
* databse.
*
* First try to get the contents text based on language. if that fails, try to get based on
Contributor

Suggested change
* First try to get the contents text based on language. if that fails, try to get based on
* First try to use the given language name. If that doesn't exist, try to infer it based on
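
The fallback order described in this doc comment could be sketched as below: use the given language if there is one, otherwise infer it from the dbscheme file name, otherwise emit a language-neutral query. The lookup table holds only the entries quoted earlier in this thread (with the python typo corrected). The exact query text and the helper names are assumptions for illustration.

```typescript
// Fallback table for databases created before the CLI reported a language.
// Only the entries quoted in this review thread are listed here.
const dbSchemeToLanguage: Record<string, string> = {
  'semmlecode.javascript.dbscheme': 'javascript',
  'semmlecode.cpp.dbscheme': 'cpp',
  'semmlecode.dbscheme': 'java',
  'semmlecode.python.dbscheme': 'python',
};

// Last path segment, accepting both '/' and '\' separators.
function baseName(filePath: string): string {
  return filePath.split(/[\\/]/).pop() ?? '';
}

function getInitialQueryContents(language: string | undefined, dbschemePath: string): string {
  // Prefer the known language; otherwise fall back to the dbscheme heuristic.
  const resolved = language || dbSchemeToLanguage[baseName(dbschemePath)];
  // Assumed template: import the language library when known.
  return resolved ? `import ${resolved}\n\nselect ""` : 'select ""';
}
```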

extensions/ql-vscode/src/databases.ts
@aeisenberg aeisenberg merged commit 06a1fd9 into github:main Dec 4, 2020
Successfully merging this pull request may close these issues.

Add a marker of the target language of databases in the tree view
3 participants