Recently, there has been a lot of talk about adding an entry to ‘robots.txt’ to deny Machine Learning crawlers the use of our personal website contents for training AIs.
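Concretely, the opt-out being discussed is just a couple of lines in robots.txt. For example, to ask OpenAI’s GPTBot crawler to stay away from the whole site (each AI crawler has its own user-agent name, and the list keeps changing):

```
User-agent: GPTBot
Disallow: /
```

Note that this is a voluntary convention: a crawler honors it only if it chooses to.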
I am fine with that. I am always in favor of solutions that allow individuals to express their will. Choosing who can use the things we put out into the world, and owning them, is the perfect expression of such power.
However, I will not do it myself. I will always make my content available to everybody, human and non-human alike.
As you may know, I am in favor of greatly reducing the scope of copyright—a system currently used to abuse people and enforce the status quo. No problem has ever been solved by adding stricter and more convoluted copyright laws, with two exceptions: 1) reducing social gain to benefit corporations and individuals, and 2) providing a paycheck to copyright lawyers.
That’s why I am concerned when I see people cheering for more copyright laws just because they supposedly hurt companies we do not like. Something that is wrong remains wrong even when it hurts my “enemies.” I will not change allegiance depending on my convenience. It is easy to support a principle when we directly gain from it, but much harder when we do not.
Besides, it will not hurt them. Not in the long term, anyway.
What will happen is that copyright will do what copyright does: create an investment moat that will allow only rich companies to do AI work. Rich companies and billionaires will cut training deals left and right, while independent and open-source AI researchers will end up starved of data and struck down by bad-faith copyright infringement claims (just look at YouTube). Moreover, criminalizing scraping will prevent investigative and research efforts from accessing data for non-AI-related purposes.
The result will be to give the means of digital production of the future to the super rich.
So, no, I will not join in raising walls to free information. I believe in free and accessible information for all, regardless of how it will be used.