Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add telemetry for commands #611

Merged
merged 4 commits into from
Jan 29, 2021
Merged

Conversation

aeisenberg
Copy link
Contributor

@aeisenberg aeisenberg commented Oct 6, 2020

This commit adds telemetry capturing for command execution. The data captured is only the command id.

We also capture errors thrown by any command execution.

This adds telemetry to an Azure application insights instance. Talk to me if you want access to it.

We still need a way to notify users that telemetry is being collected and a way to disable it.

Also:

  • Extract commandRunner to its own file
  • Add an option to log all telemetry commands to the console
  • Add a popup (text to be determined) that requires users to accept
    telemetry before using.
  • Convert telemetry to opt-in instead of opt-out
  • Add a "canary" setting that users can use to opt-in to unreleased
    features.

Checklist

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • @github/docs-content-dsp has been cc'd in all issues for UI or other user-facing changes made by this pull request.

@aeisenberg aeisenberg marked this pull request as draft October 6, 2020 18:25
@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch 6 times, most recently from 17ad2a3 to 365b35b Compare October 14, 2020 15:24
@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch 5 times, most recently from 1fc7aaf to dc528bc Compare October 15, 2020 21:29
@aeisenberg
Copy link
Contributor Author

Ping @github/docs-content-dsp!

We are going to need to create some documentation about what we are and what we are not collecting for telemetry. @sj is working on some of the wording. See our internal issue for more information on this.

@hubot
Copy link

hubot commented Oct 15, 2020

:octocat:📚 Thanks for the docs ping! 🛎️ This was added to our docs first-responder project board. A team member will be along shortly to review this for docs impact, but you can also open a docs issue to request docs updates.

@rachmari
Copy link

@aeisenberg docs first responder here. 👋 I've opened a tracking issue for add the documentation page that you mentioned above: https://github.com/github/docs-content/issues/3125

@aeisenberg aeisenberg added the Complexity: High Requires careful design or review. label Oct 23, 2020
@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch 2 times, most recently from 01c789d to a62f6ef Compare October 26, 2020 18:05
@aeisenberg aeisenberg marked this pull request as ready for review October 26, 2020 18:06
@aeisenberg aeisenberg requested review from sj and adityasharad October 26, 2020 18:06
@aeisenberg
Copy link
Contributor Author

@sj could you take a look at the TELEMETRY.md file?

And at some point, I can walk you through what the UX is going to be like.

@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch 7 times, most recently from 79c6f81 to 497e492 Compare October 30, 2020 20:39
@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch 2 times, most recently from 531e8d8 to 29d8817 Compare November 26, 2020 18:20
@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch 2 times, most recently from a8a28a4 to 89a809f Compare December 14, 2020 19:56
@aeisenberg
Copy link
Contributor Author

@sj We are hoping to release this shortly. Could you take a look at the TELEMETRY.md file as well as the changes to README.md and make sure they are sufficient from your perspective?

@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch 3 times, most recently from 426600c to e2379bb Compare January 25, 2021 21:59
Copy link
Contributor

@sj sj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks 👍 with a few suggestions

TELEMETRY.md Outdated
@@ -0,0 +1,43 @@
# Telemetry in the CodeQL extension for VS Code

If you specifically opt-in to permit GitHub to do so, GitHub will collect usage data and metrics for the purposes of helping the core developers to improve the CodeQL extension for VS Code. This data will not be shared with any parties outside of GitHub and will be retained for a maximum of 30 days.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we incorporate the fact that this is anonymous data? E.g.:

  • "... collect anonymized usage data ..."
  • "This anonymized data will ..."?

Or is there a legal reason why we can't?

TELEMETRY.md Outdated

## Why do you collect data?

GitHub collects usage data and metrics to help us improve CodeQL for VS Code.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if possible to include "anonymized" in here that'd be great 🙂

TELEMETRY.md Outdated Show resolved Hide resolved
TELEMETRY.md Outdated

GitHub collects anonymized information related to the usage of the extensions. The data collected are:

- Which commands are run.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth clarifying "commands" here, just to avoid folks thinking that we're collecting which queries they're running. Any security researcher thinking we snoop on their queries will immediately stop using CodeQL.

My suggestion:
"Which CodeQL-related VS Code commands are run" (which links to: https://code.visualstudio.com/docs/getstarted/tips-and-tricks#_command-palette)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

TELEMETRY.md Outdated Show resolved Hide resolved
TELEMETRY.md Outdated

## How long will data be retained?

IP address and GUIDs will be retained for a maximum of 30 days. Aggregated data of command identifiers and run times will be retained for a maximum of 180 days.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not consistent with the text above:

This data will not be shared with any parties outside of GitHub and will be retained for a maximum of 30 days.

TELEMETRY.md Outdated Show resolved Hide resolved
TELEMETRY.md Outdated

## How do I disable telemetry reporting?

You can disable telemetry reporting by setting `codeQL.telemetry.enableTelemetry` to `false` in [your settings](https://code.visualstudio.com/docs/getstarted/settings#_settings-editor).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is worth repeating again that telemetry reporting is disabled until and unless the user explicitly agrees with it.


## Data and Telemetry

If you specifically opt-in to permit GitHub to do so, GitHub will collect usage data and metrics for the purposes of helping the core developers to improve the CodeQL extension for VS Code. This data will not be shared with any parties outside of GitHub and will be retained for a maximum of 30 days. Please see [telemetry](TELEMENTRY.md) for more information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and will be retained for a maximum of 30 days

There's a bit of an inconsistency here (see my other comment).


## Data and Telemetry

If you specifically opt-in to permit GitHub to do so, GitHub will collect usage data and metrics for the purposes of helping the core developers to improve the CodeQL extension for VS Code. This data will not be shared with any parties outside of GitHub and will be retained for a maximum of 30 days. Please see [telemetry](TELEMENTRY.md) for more information.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per my other comments, it'd be great if we could work in the word "anonymized" a few times 🙂

@aeisenberg
Copy link
Contributor Author

Thanks for your comments, @sj. I think I addressed them all. The wording is more complex now, but also more precise sine I need to distinguish between the anonymous and non-anonymized data.

@sj
Copy link
Contributor

sj commented Jan 26, 2021

TELEMETRY.md and README.md look great, but I'm wondering whether one of your commits added a whole bunch of other unrelated changes to this PR. I'll leave that up to @adityasharad.

@aeisenberg
Copy link
Contributor Author

Thanks @sj! Yes, this PR includes all of the telemetry code for the extension.

@aeisenberg
Copy link
Contributor Author

Also, the failing Release workflow is expected since it requires secrets specified in the main repository (not my fork).

Copy link
Contributor

@adityasharad adityasharad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks fine, minor suggestions on text for consistency, and a question about the release workflow changes.

.github/TELEMETRY.md Outdated Show resolved Hide resolved
.github/TELEMETRY.md Outdated Show resolved Hide resolved
.github/TELEMETRY.md Outdated Show resolved Hide resolved
.github/TELEMETRY.md Outdated Show resolved Hide resolved
.github/TELEMETRY.md Outdated Show resolved Hide resolved
TELEMETRY.md Outdated Show resolved Hide resolved
extensions/ql-vscode/README.md Outdated Show resolved Hide resolved
.github/workflows/release.yml Outdated Show resolved Hide resolved
extensions/ql-vscode/src/telemetry.ts Outdated Show resolved Hide resolved
TELEMETRY.md Outdated Show resolved Hide resolved
@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch 3 times, most recently from a2f5be7 to 1130cbd Compare January 26, 2021 20:22
extensions/ql-vscode/CHANGELOG.md Show resolved Hide resolved
extensions/ql-vscode/TELEMETRY.md Show resolved Hide resolved
extensions/ql-vscode/README.md Show resolved Hide resolved
extensions/ql-vscode/TELEMETRY.md Outdated Show resolved Hide resolved
extensions/ql-vscode/TELEMETRY.md Show resolved Hide resolved
@sj
Copy link
Contributor

sj commented Jan 27, 2021

@aeisenberg : I think some changes that you made previously have now disappeared again. I've added some more comments in places where I think this might have happened (or I might have just completely misremembered). Thanks!

This commit adds telemetry capturing for command execution. The data
captured explicitly captured and sent to application insights is only
the command id, execution time, and command completion status. We also
capture errors thrown by any command execution, but these are not sent
to application insights.

Telemetry capturing is opt-in. No data will be sent to application
insights unless the user explicitly allows it.

There are two new config settings added. The first controls whether or
not telemetry should be sent. This setting AND the global telemetry setting
must be enabled in order for telemetry to be sent.

The second setting controls whether or not telemetry event data should
be logged to the extension console. The hope here is that users can
inspect exactly what data is sent to the server and can have confidence
that nothing concerning is being leaked.

Note that the global setting for disabling telemetry collection is
handled inside the  `vscode-extension-telemetry` package implicitly, so
this extension doesn't touch that setting explicitly.

The `codeql.canary` setting is being used to add an additional flag to
telemetry events. This flag will help us determine if a user in internal
or not.

The application insights key is injected at build time through a
repository secret.

This commit also includes a new `TELEMETRY.md` file that explains what
is being captured, and why.
When someone disables and then re-enables the global telemetry setting,
the telemetry recorder needs to be recreated in order to allow it to
respond to events again.

Also, write the telemetry log item in the same telemetry processor as
is used to remove unused fields. This ensures there is no race condition
on the order of telemetry processors being run. We always log after
fields are removed.
@aeisenberg aeisenberg force-pushed the aeisenberg/telemetry branch from 125baf6 to 31718e4 Compare January 29, 2021 19:42
Create them for appropriately named tags and workflow dispatch as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Complexity: High Requires careful design or review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants