GitHub repositories that were supposed to be private were found to be accessible through Microsoft's AI assistant 'Copilot'



The software development platform GitHub lets users keep projects confidential by setting repositories to private, so that code cannot be seen by anyone outside the project. However, an investigation by the Israeli cybersecurity company Lasso revealed that Microsoft's AI assistant 'Copilot' could access over 20,000 private repositories managed by various companies, including Microsoft.

Lasso Research: Fortune 500 Companies Found Exposed in Microsoft Copilot via Bing Cache
https://www.lasso.security/blog/lasso-major-vulnerability-in-microsoft-copilot



Thousands of exposed GitHub repositories, now private, can still be accessed through Copilot | TechCrunch
https://techcrunch.com/2025/02/26/thousands-of-exposed-github-repositories-now-private-can-still-be-accessed-through-copilot/

Copilot exposes private GitHub pages, some removed by Microsoft - Ars Technica
https://arstechnica.com/information-technology/2025/02/copilot-exposes-private-github-pages-some-removed-by-microsoft/

In August 2024, Lasso's research team came across a LinkedIn post claiming that OpenAI was training on data from private GitHub repositories and that the data was surfacing through ChatGPT. When the team investigated, it found that a GitHub repository that had once been public but was later made private had been indexed by Microsoft's search engine Bing, and that ChatGPT produced fictitious content based on that data.

Further investigation confirmed that ChatGPT was aware of the repository's existence thanks to Bing's index, but could not provide any of its actual data. In the screenshot below, when asked about a private repository, ChatGPT responds, 'Unfortunately, the detailed content of that repository is currently inaccessible due to an error in calling the corresponding GitHub page.'



Lasso then wondered: if Bing was indexing GitHub repositories that were once public but have since been made private, could their data be accessed via Microsoft's Copilot?

It turned out that Copilot could, in response to a user's request, output data from the period when a repository was still public. 'Once we realized that any data on GitHub, even if it was only public for a moment, could be indexed and exposed by a tool like Copilot, we were shocked at how easily this information could be accessed,' the Lasso researchers wrote.



The research team dubbed previously public but now private repositories whose data was at risk of being leaked via Copilot 'zombie repositories' and set out to determine how many of them exist.

The investigation identified 20,580 zombie repositories belonging to 16,920 organizations, including Google, Intel, Huawei, PayPal, IBM, Tencent, and Microsoft itself. Some of these repositories appeared to have been made private for security reasons, for example because they contained private tokens and secret keys for GitHub, HuggingFace, and OpenAI.
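Conceptually, a 'zombie repository' can be spotted by combining two checks: the repository is no longer visible to anonymous users on GitHub, yet it still shows up in Bing's index. The Python sketch below illustrates that idea; it is not Lasso's actual methodology, the repository name is hypothetical, and the Bing check simply scrapes the public search results page, which may be rate-limited or change format at any time.

    import requests

    OWNER, REPO = "example-org", "example-repo"  # hypothetical repository, for illustration only


    def is_private_or_gone(owner: str, repo: str) -> bool:
        # GitHub's REST API returns 404 to anonymous callers for private or deleted repositories.
        response = requests.get(f"https://api.github.com/repos/{owner}/{repo}", timeout=10)
        return response.status_code == 404


    def still_indexed_by_bing(owner: str, repo: str) -> bool:
        # Naive check: does a site-restricted Bing search still mention the repository URL?
        # This scrapes the public results page and may break or be blocked at any time.
        response = requests.get(
            "https://www.bing.com/search",
            params={"q": f"site:github.com/{owner}/{repo}"},
            headers={"User-Agent": "Mozilla/5.0"},
            timeout=10,
        )
        return f"github.com/{owner}/{repo}".lower() in response.text.lower()


    if is_private_or_gone(OWNER, REPO) and still_indexed_by_bing(OWNER, REPO):
        print("Potential zombie repository: private on GitHub but still present in Bing's index.")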

Lasso notified Microsoft of its findings in November 2024, but Microsoft classified the issue as 'low impact' and described the caching behavior as acceptable. Within two weeks, Microsoft removed Bing's cached-link feature, which appeared to fix the issue. However, the cached pages were reportedly still accessible via Copilot, and the cache itself had not been purged of data from private repositories.

Based on these findings, Lasso offered the following advice: 'Once you make a repository public, you should assume that all of its data is at risk,' 'Large language models should be recognized as a new threat vector,' and 'Basic data protection measures should be implemented, such as not committing secret keys or tokens to platforms like GitHub.'
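Regarding the last point, a rough scan for obvious credential patterns before a repository is ever published can catch some of the tokens Lasso mentions. The sketch below is a minimal illustration, not a substitute for dedicated scanners such as gitleaks or trufflehog; the regular expressions are assumptions based on common token prefixes (GitHub classic tokens begin with 'ghp_', Hugging Face tokens with 'hf_', many OpenAI keys with 'sk-') and are far from exhaustive.

    import re
    import sys
    from pathlib import Path

    # Illustrative patterns only; real secret scanners cover many more credential formats.
    PATTERNS = {
        "GitHub token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
        "Hugging Face token": re.compile(r"\bhf_[A-Za-z0-9]{30,}\b"),
        "OpenAI-style key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
    }


    def scan(root: Path) -> int:
        # Walk every file under the given directory and report suspected credentials.
        hits = 0
        for path in root.rglob("*"):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="ignore")
            except OSError:
                continue
            for name, pattern in PATTERNS.items():
                for match in pattern.finditer(text):
                    print(f"{path}: possible {name}: {match.group(0)[:12]}...")
                    hits += 1
        return hits


    if __name__ == "__main__":
        target = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
        sys.exit(1 if scan(target) else 0)

Running the script (the file name would be whatever you save it as) over a working copy before the first public push exits with a non-zero status if anything suspicious is found.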

'It is commonly understood that large language models are often trained on information publicly available from the web. If users wish to avoid their content being used to train these models, they are encouraged to keep their repositories private at all times,' Microsoft said in an email to Ars Technica, the tech publication that covered the issue.

in Software, Web Service, Security, Posted by log1h_ik