What specific data exactly will be sent to Copilot? #59630
Replies: 12 comments 7 replies
-
Parth Thakkar has reverse engineered the plugin in VSCode and published a blog post, Copilot Internals, which explains what data is sent to Copilot. However, I would prefer official documentation of exactly what data is sent to Copilot (or is specified to be sent), which can be used in discussions.
-
Also came here for this.
-
Interesting that GitHub is still silent on this. I'm paying for this service and I'd like to know as well.
-
Somewhat related question: Does Microsoft/GitHub include individual's private repos as training data for Copilot? #135400
-
Look at the Q&A on this page: https://resources.github.com/copilot-trust-center/
-
I would very much like to hear this as well. For example, what happens if a user opens a file somewhere on disk using VS Code, because that happens to be the default app for opening, say, an XML file -- and that file happens to contain secrets? Can this secret be sent to the servers as context without the user knowing? Note, I'm specifically not talking about storing secrets in repos! I just want to understand the implications of having an active GitHub Copilot extension analysing all files that you happen to open. I'm aware of content exclusions, but to me it seems like the whole thing works the wrong way around. Shouldn't everything be denied by default, and file content be allowed as context only for explicitly defined file patterns?
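To make the deny-by-default idea concrete, here is a minimal sketch of what such a check could look like. It is purely illustrative and is not how Copilot or its content exclusion feature works; the function name `may_use_as_context` and the patterns in `ALLOWED_PATTERNS` are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical allowlist: only files matching one of these glob patterns
# would be eligible as context. Anything else is denied.
ALLOWED_PATTERNS = ["src/*.py", "docs/*.md"]


def may_use_as_context(path, allowed_patterns=ALLOWED_PATTERNS):
    """Deny by default: True only if `path` matches an explicitly allowed pattern."""
    # Normalize Windows-style separators so the same patterns work everywhere.
    normalized = path.replace("\\", "/")
    return any(fnmatchcase(normalized, pattern) for pattern in allowed_patterns)
```

With this policy, a file such as `/home/user/secrets.xml` that merely happens to be opened in the editor is rejected, because nothing explicitly allows it. An empty allowlist rejects everything.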
-
This contains some information: https://stackoverflow.com/questions/76075204/github-copilot-and-privacy-does-github-copilot-save-locally-developed-code
There's also this: https://resources.github.com/learn/pathways/copilot/essentials/how-github-copilot-handles-data/
-
I went looking for an answer to whether Copilot uses private repo information during training, and ended up here because the other threads were closed. Since multiple discussions on this topic have been closed with no response, I think we can infer that the answer is probably YES: they do seem to include users' private repositories in training. It should be pretty easy for GitHub to say no to the question, yet they seem to willfully ignore the discussion completely.
-
It is definitely using/sending more data than expected. I ran into the following situation today:
However: the proposed changes in
-
I asked Copilot:
-
Select Topic Area
Question
Body
To understand the range of possible suggestions generated by Copilot, I would like a detailed technical description of exactly which data is sent to Copilot. The features list on https://github.com/features/copilot gives only a very vague description of the data being sent.
However, it doesn't specify exactly what is sent.
The Privacy Statement on https://docs.github.com/en/site-policy/privacy-policies/github-copilot-for-business-privacy-statement again states only vaguely that "Code Snippets" are sent to Copilot.
Again, there is no exact definition of the "Code Snippets" that are sent to Copilot.
The official documentation for "Enabling or disabling duplication detection" on https://docs.github.com/en/copilot/configuring-github-copilot/configuring-github-copilot-settings-on-githubcom#enabling-or-disabling-duplication-detection says that "about 150 characters" around the current location are checked.
But this might apply only to this specific case of finding duplicate code in public GitHub repositories.
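For a sense of scale, the following sketch shows how little text "about 150 characters" around a cursor position covers. It is an illustration only: the function `window_around_cursor` is hypothetical, and centring the window on the cursor is an assumption, since the documentation does not say how the surrounding code is selected.

```python
def window_around_cursor(text, cursor, size=150):
    """Return up to `size` characters of `text` around the offset `cursor`.

    Assumption: the window is centred on the cursor and clipped at the
    start and end of the text. The real selection rule is not documented.
    """
    half = size // 2
    start = max(0, cursor - half)
    end = min(len(text), cursor + half)
    return text[start:end]
```

A window of this size spans only a few lines of typical source code, so it says little about what is sent for generating suggestions.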
So the question is: What exact data is sent to Copilot to generate suggestions?
It looks like neither the official documentation on https://docs.github.com/en/copilot nor the feature list on https://github.com/features/copilot explains exactly what data is sent to Copilot. Usage experience suggests that some content from other open tabs is sent, but I'm not sure about that. Or it may have some other way to "remember" what code was previously seen or used.