• eleijeep@piefed.social
    27 days ago

    Well last I heard you can’t copyright the output of an LLM, so the entire concept of a licence for open slopware is moot.

  • polakkenak@feddit.dk
    27 days ago

No, absolutely not. It is safe to assume that most or all open source code (and plenty of closed source too) has been part of the training data. You need look no further than the fact that some models can recite Harry Potter from memory. There is no such thing as "clean room" for AI.

    • Captain Beyond@linkage.ds8.zone
      26 days ago

Ironically, though, this makes the reverse a bit more defensible (i.e. using an LLM to reverse engineer a proprietary app), because that proprietary app's source code is less likely to be part of the publicly available training data.

      But I imagine the corpos aren’t going to look fondly on that for obvious reasons.

    • StellarExtract@lemmy.zip
      27 days ago

This really isn’t true in general, though, even if it is currently true in many cases. Case in point: if I wrote something and published it right now, it wouldn’t be part of any AI model yet. A party with a lot of money (like, say, a tech corporation) could easily create a bespoke coding model trained on everything except the desired libraries, thus achieving a “clean room”.

  • Captain Beyond@linkage.ds8.zone
    26 days ago

I think the fact that the maintainer is intimately knowledgeable about the original codebase is enough for it to not be a clean room re-implementation, no? Not having that knowledge is what makes it “clean”.