Microsoft’s Copilot continues to surface sensitive data from GitHub repositories that were later removed or made private, despite the company’s efforts to cut off that access. A recent investigation by security firm Lasso found that Microsoft’s attempt to restrict a specialized Bing caching interface, previously available at cc.bingj.com, has proven inadequate. Although public access to the interface has been cut off, the private information cached there remains reachable by Copilot, allowing users of the tool to retrieve the sensitive data.
Lasso’s research shows that even though Bing disabled its cached-link feature, the indexed pages still persist in search results, making Microsoft’s fix superficial rather than complete. The investigators confirmed that Copilot retained access to cached data that had been sealed off from human users: regular users were barred from viewing the information, yet the AI tool could still retrieve and reproduce it, leaving the attempted remediation only partial.
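The core of the finding can be approximated with a simple probe. The sketch below is a minimal, hypothetical illustration of checking whether a cached copy of a page is still served after the original has gone private; the cache.aspx path reflects the cc.bingj.com interface named in the report, but the query parameters and example repository URL are assumptions for demonstration only.

```python
import requests

# Illustrative endpoint based on the cc.bingj.com interface described in the
# report; the exact parameters are assumptions, not a documented API.
CACHE_ENDPOINT = "https://cc.bingj.com/cache.aspx"


def cached_copy_exists(page_url: str) -> bool:
    """Return True if the cache endpoint still serves a copy of page_url."""
    resp = requests.get(
        CACHE_ENDPOINT,
        params={"q": page_url, "mkt": "en-US"},
        timeout=10,
        allow_redirects=False,
    )
    # A 200 response with a body suggests the cached snapshot is still live,
    # even though the original repository may now be private or deleted.
    return resp.status_code == 200 and len(resp.content) > 0


if __name__ == "__main__":
    # Hypothetical example: a repository page that was public at crawl time.
    print(cached_copy_exists("https://github.com/example-org/removed-repo"))
```

The point of the probe is the asymmetry it exposes: a human following the old cached link gets nothing, while a crawler-fed system that already ingested the snapshot can keep serving its contents.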
The report also highlights an alarming and common failure: developers unwittingly embed security credentials directly in their code. Best practice is to keep secrets out of source entirely, for example by loading them from environment variables or a dedicated secrets manager, as the sketch below illustrates. When hardcoded credentials land in a public repository, they create substantial security risk, and quickly switching the repository to private does not undo the exposure, because crawlers and caching systems may have already captured the contents. Once credentials are exposed, the only safe response is a complete rotation of all affected credentials.
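To make the distinction concrete, here is a minimal sketch contrasting the anti-pattern with the safer approach; the key value and the MY_SERVICE_API_KEY variable name are hypothetical placeholders.

```python
import os

# BAD: a credential hardcoded into source. Once this file is pushed to a
# public repository, the key must be treated as compromised, even if the
# repository is later made private, because crawlers and AI tools may have
# already captured it.
API_KEY = "sk-live-EXAMPLE-do-not-do-this"  # hypothetical value


# BETTER: read the secret from the environment (or a dedicated secrets
# manager) so it never appears in version control.
def get_api_key() -> str:
    key = os.environ.get("MY_SERVICE_API_KEY")
    if key is None:
        raise RuntimeError("MY_SERVICE_API_KEY is not set; refusing to start")
    return key
```

Even with this pattern in place, any key that was ever committed should still be rotated: git history, forks, and external caches all retain old file contents long after the offending line is deleted.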
The problem is compounded when repositories containing critical data are moved from public to private. Microsoft itself has previously gone to legal lengths to have tools removed from GitHub, alleging violations of several laws, including the Computer Fraud and Abuse Act and the Digital Millennium Copyright Act. Yet despite those takedowns, Copilot undermines the effort by continuing to make the removed tools accessible through its services.
In a statement issued after the findings were disclosed, Microsoft acknowledged that its large language models are trained on publicly available web information. The company advised users who do not want their content used for model training to keep their repositories private at all times, placing a measure of responsibility on developers to manage their data proactively.
As organizations balance the benefits of AI tools against the need to safeguard sensitive information, the risks of mismanaged access to private data cannot be overstated. Exposed credentials feed directly into tactics mapped in the MITRE ATT&CK framework, such as Initial Access, Persistence, and Privilege Escalation, so organizations must remain vigilant in their cybersecurity practices. Effective risk management in the face of these ongoing challenges is essential to protect both corporate and customer data.