Simon Willison on Nostr: There's a story doing the rounds at the moment about private GitHub repos showing up ...
There's a story doing the rounds at the moment about private GitHub repos showing up in AI training data
If your repo has ever been public there's a chance it was archived by https://www.softwareheritage.org/ and ended up in The Stack training data: https://huggingface.co/spaces/bigcode/in-the-stack
If it's never been public that obviously should NOT have happened
You can use ClickHouse and the GitHub Archive to try and see which of your repos may have been public in the past using this tool I just built: https://observablehq.com/@simonw/github-public-repo-history
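For anyone who'd rather script that check than use the notebook, here's a rough sketch of the same idea. GitHub Archive only records public events, so a repo showing up there suggests it was public at the time of the event. Everything specific below is an assumption on my part and not taken from the tool itself: the ClickHouse public playground endpoint, the `github_events` table with `repo_name` and `created_at` columns, and the helper names `build_query` and `fetch_public_history`.

```python
import re
import urllib.request

# Assumption: ClickHouse's public playground and its read-only "play" user.
PLAYGROUND_URL = "https://play.clickhouse.com/?user=play"


def build_query(owner: str) -> str:
    """Build SQL listing every repo under `owner` seen in public GitHub events."""
    # GitHub account names are alphanumeric plus hyphens; reject anything else
    # so the value can be placed inside a SQL string literal safely.
    if not re.fullmatch(r"[A-Za-z0-9-]+", owner):
        raise ValueError(f"unexpected characters in owner name: {owner!r}")
    # Assumption: table `github_events` with `repo_name` ("owner/repo") and
    # `created_at` columns, as loaded from GitHub Archive.
    return (
        "SELECT repo_name, min(created_at) AS first_seen, "
        "max(created_at) AS last_seen "
        "FROM github_events "
        f"WHERE startsWith(repo_name, '{owner}/') "
        "GROUP BY repo_name ORDER BY first_seen "
        "FORMAT TSV"
    )


def fetch_public_history(owner: str) -> list[tuple[str, ...]]:
    """POST the query over HTTP and return (repo, first_seen, last_seen) rows."""
    request = urllib.request.Request(
        PLAYGROUND_URL, data=build_query(owner).encode("utf-8")
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read().decode("utf-8")
    return [tuple(line.split("\t")) for line in body.splitlines() if line]


if __name__ == "__main__":
    # Hypothetical usage: replace "simonw" with your own account or org name.
    for row in fetch_public_history("simonw"):
        print(*row, sep="\t")
```

Any repo in the output that you believed was always private is worth a closer look. A repo missing from the output is not proof it was never public, only that no public events were recorded for it.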