Everyone just needs to be honest and accept that YES, these models were OBVIOUSLY trained on millions of copyrighted works.
The models would be at least 50% better if these filters weren't in place. These filters essentially force the model to lie, so they will obviously degrade output quality.
The problem is that the general public isn't 100% certain about the copyright violations, or doesn't understand this yet, and lawyers and the government would try to sue if the companies admitted it. So a Moloch is created where it's lose-lose, and model quality suffers as a result.
(If people want exact copies of text content, they can already get them for free through the same sites these companies got them from, so I don't see the models' regurgitation as an issue worth worsening quality over.)