Prediction: regulation will require AI companies to have copyright licenses granted for their training data. There will be legal models trained on datasets with copyright cleared, and illegal black market models trained on copyrighted materials. However, it will be really difficult to enforce, and to prove what data was used to train each one.
And they will deserve to. The point of copyright wasn't to bar anyone from using any work anywhere. It was to encourage artists to create art. Doing it by protecting the work they create was just a mechanism. The current mechanism of near-perpetual copyrights and constant nagging over things that don't include a recognizable semblance of the original work is just insanity.
I see that, but it's a matter of balance. Lots of things encourage/discourage artists, not everything will be in their favor. The way I see it, as long as nobody's work is being reproduced, they shouldn't have any real qualms.
I think that, just as content creators can release a work under various Creative Commons licenses, for commercial or non-commercial use, or under MIT, GPL, or proprietary licenses, etc....
We may find that letting content creators choose to have their work included, or not, in AI training sets would be helpful? Or, included in training sets for a fee, or for some sort of attribution, or ?...
The apparent current status quo of "anything on the public web is fair game for an AI training set" might not be a good permanent solution?
I don't think the concern is about machines doing it better, but machines being able to angleshoot around copyright. The issue is most easily demonstrated in writing. The current state of technology is already sufficient to create a program that could take as input a book and produce as output a book with identical story, characters, settings, and more - but 'nudged around' just enough to skirt current copyright law.
Humans have already been able to do this of course, but the difference is scale and automation. With this software one could, under current law, set up a 100% legal 'shadow library' that effectively infringes on every single book published. Even a leaked copy of a book could be released and shared (in its legally non-infringing format) before the "real" copy hit the market. And again, all completely legally. The impacts of copyright infringement are regularly grossly distorted, but I think this is the sort of technology that could genuinely damage artists and creators across many endeavors.
The exact same thing will be coming to software soon enough. 'Take this assembly/IL code, and create a functionally identical but superficially rebranded program while working to ensure you sidestep all relevant patents.' 'Sure, here you go.' The issue of this being done [relatively] instantly is going to really impact things in ways I think many are not considering. Never in a million years thought I'd see myself on the side of the copyright cartel, but this is one of the extremely rare times they're right.
That shadow library already exists, people write a shit ton of fanfic and publishers publish highly derivative stories specifically because they're basically veiled ripoffs of highly lucrative IP hoping to cash in.
The barrier to shadow libraries is marketing, which is the only thing that really separates huge blockbuster music/art/books from stuff that makes literally zero money. Quality and ideas haven't been the gate for a long time.
If you shift the reward from the people doing one kind of work to people owning (one kind of, but this is fungible) capital, and don't shift capital at the same time, you are doing concrete harm to identifiable people, even if the net is positive in some constructed aggregate. AI has a tremendous ability to do that across a wide range of different types of work rapidly and simultaneously.
No it isn't. 'Not enough art' is basically the exact opposite of the problem we're facing right now. Do you honestly believe that there's not enough music on Spotify? That a two hundred year catalogue isn't enough and you'd rather it be two hundred thousand years?
Sure it is. There's a shit ton of shit on spotify, but if I want to listen to blackened ambient doom or technical jazz fusion death metal I might have one or two options each. Just because there are ~1000 Kanye/Taylor Swift wannabes trying to cash in on that played out mainstream sound doesn't mean there is enough music.
Interesting that you describe making music that appeals to a wide audience as “trying to cash in”.
Would you not say that using an AI to do essentially the same thing, just with the benefit that you don’t really need to pay anyone for creating the art, meaning you can target more niche preferences, is also “cashing in”?
No. I see strong generative AI allowing people who like fun genres outside the mainstream to make music in those genres that sounds good more easily. That will enable a creative explosion as new artists are given a rocket boost and established artists are taken to the next level and made more productive.
Generative AI is going to allow an amazing diversification and explosion of art and music, which is going to create the next big AI application once current systems for distributing content are strained to overloading - interactive recommenders. Imagine asking for a piece of content, getting recommendations, evaluating a short clip, giving feedback to the recommender and getting better recommendations as a result in a cycle until you get exactly what you want.
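To make that loop concrete, here is a minimal sketch of what such an interactive recommender could look like (all item names, feature axes, and numbers are made up for illustration; a real system would use learned embeddings and an actual listener in the loop):

    # Toy interactive recommender: score items against a preference vector,
    # take feedback on the top pick, and nudge the preference vector.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Hypothetical catalogue: item name -> (tempo, distortion, ambience)
    catalogue = {
        "ambient_doom_01":   [0.2, 0.9, 0.9],
        "jazz_fusion_metal": [0.9, 0.8, 0.1],
        "pop_ballad":        [0.5, 0.1, 0.3],
    }

    preference = [0.3, 0.5, 0.5]   # starts roughly neutral
    rate = 0.3

    for round_no in range(3):
        # Recommend the item that best matches the current preference vector.
        pick = max(catalogue, key=lambda name: dot(catalogue[name], preference))
        print(f"round {round_no}: recommending {pick}")
        # Stand-in for playing a short clip and asking the listener;
        # here we pretend the listener only likes heavily ambient tracks.
        liked = catalogue[pick][2] > 0.5
        sign = 1.0 if liked else -1.0
        # Move the preference vector toward (or away from) what was picked.
        preference = [p + sign * rate * f
                      for p, f in zip(preference, catalogue[pick])]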
Maybe I'm a luddite, and I'm not really into technical jazz fusion death metal, but I do like some technical music and some metal music.
At least part of the reason I like technical music is because it is challenging to play for the musician. If an AI 'plays' 'technical' music, it loses its appeal for me.
Similarly, a lot of the reason I like metal music is because of the emotion and energy poured into it by the human creator.
Being able to ask a computer to make noises which sound like some musicians you like is not, in my opinion, the same as creating art.
If your point is that we already have more than enough music than needed, isn’t that an indication that current copyright laws don’t need to be nearly as strong as they are in order to continue to promote the “useful arts”? (at least when it comes to music)
While this may be true, there is a middle ground between 'copyright lasts for the heat death of the Universe plus seventy years' and 'everything is stolen by robots the instant it's created'.
What we may have here is a need for some new use cases in copyright. The current laws were written with humans in mind -- human creators, human consumers. To retroactively say, "well, you released this to the [human] public under U.S. copyright law, therefore the rights you granted to humans extend to training AI systems as well" could be unfair.
every now and then I like to point out that not every country has the same understanding of the purposes of copyright as the U.S.; in fact this whole promotion thing seems to be a specifically American argument (although I'm obviously not familiar with every country in the world, so maybe some other countries also have this conception and I'm just not familiar with it - maybe some countries that based their national laws on the U.S., for example)
on edit: I just realized that of course British copyright law is also very similar to the American model.
The statement I replied to spoke of the point of copyright being to encourage artists to create more art; not simply "to create more art".
But neither of those is really what's written down, with the (U.S. law) point of copyright being "to promote the Progress of Science and useful Arts".
We might interpret this as: training AI is progress of science, and thus the point of copyright includes training AI. Although I'm not sure that we really need copyright to do that, and accordingly, I doubt that is the intent of the law.
Really, the issue here may be that copyright law was not formed with AI systems in mind at all, neither as creators nor as consumers, and trying to apply it to AI systems, or to reason about copyright law as it pertains to what AI systems do, doesn't necessarily work very well.
Maybe we need to go back and amend all of our written laws with a phrase like "for humans", just like so many science headlines need to include the phrase "in mice"!
the point of copyright is to allow people to exploit their work for material gain, which in turn benefits society by encouraging people to create works
if only Microsoft (OpenAI) are able to exploit works for material gain, at the cost of literally 100% of the rest of society, why should society allow Microsoft to do this?
Yup, I agree. These models should be open by law. Currently the push for open models isn't happening because it's the initial chaos time, but as soon as we understand a little bit more there will be either regulation or open source reproductions. I mean LLaMA already exists, but even that has bullshit license stuff around it.
Even if we take copyright in its most limited, original conception, why would all value from creative works accruing to generative models that train on them ever encourage anyone to create these works?
IMO, what LLMs are demonstrating are the fundamental contradictions of Capitalism, where now being a capital owner with a bunch of GPUs is supposed to give you exclusive returns on the sum total of human intellectual and artistic labor. I have a feeling that people aren’t going to take to that too kindly, so we’ll either see robots mowing down the masses of unemployed, a Butlerian jihad, or various states assuming control over their productive capacities and redistributing the return in the form of greater safety nets.
Funny, they seem to be more proactive about putting laws around it there already (1), and that hasn't stopped progress at all, while protecting the vested interests they care about.
If only our lawmakers were so capable of a) understanding what's going on and b) actually passing laws when it mattered.
That is obviously going to happen in mainland China, especially when it may affect the CCP's plans in some way (e.g. even "morally" in the case of Xi Jinping deepfakes).
Are they going to enforce it for products/services sold overseas? I'm not so sure, not until I see watermarks in Tiktok's content.
Just look for the words "China IP law infringement" in Google FFS; if that is trolling, well, then I'm trolling.
About watermarks, I'm obviously talking about AI generated content watermarks, not general tiktok watermarks. I guess I should've been more concrete with my phrasing as some people can't read considering context.
I think perhaps one of the reasons they were so quick to push this was the introduction of a section saying that content generated by such services should not contain elements that could subvert state power, incite secession or disrupt social order, according to the rules [1].
Is there any evidence of mass compliance with these regulations? It seems basically impossible to enforce in a humane and fair way. Such laws will be used to prosecute political minorities if they are enforced at all.
Have you checked on the state of AI in China lately? It might be enlightening if you think there's some sort of arms race going on, or anything even close to that.
Also, if you follow your logic through, you are arguing for abolishing all laws in the US that constrain US companies if some other countries can ignore these laws and get the upper hand. Keep in mind that these laws (like intellectual property, copyright, patents) are the reason the US has innovated so much in the first place.
Nope, I haven't and I also didn't mention or suggest any actual arms race. My point is that any policy regarding AI (or any policy for that matter) is going to be hard to enforce when including potential (economically) "rogue" states, just that.
Now, if you have any information about AI and China, enlighten me. The more one knows the better.
PS/edit: I need more information about your last ghost edit. China (and others) would say otherwise about IP laws. If anything I would at least say that innovation can be either cut or assured through IP laws depending on the domain/technology, but it is difficult to conclude that absolutely for all cases.
It's not going to make China's AI better than the US's. Not nearly enough, they're so far behind. So giving them a competitive advantage is not something that should matter when you consider whether this law would do good or not.
Yeah China hasn't been respecting IP for a long time and it's still well behind. We shouldn't become China just to win a race - there are rules to the game you know? If everyone's content is stolen and resold by an AI then why would anyone want to create content?
It would go back to the best reason: for the joy of creating. The field would be reduced to those who enjoy making art for the sake of making art. It would also do wonders for filtering out the amount of crap that people produce. We would all have a much more rich and varied diet of art to appreciate.
Possibly. When wearing my "artist" hat, I work on two different things: art that I personally care about, and art that might sell.
If AI made all the art that might sell, that would give me more time and energy to work on the art I actually care about, but maybe less sales from art.
That's the thing. "Art that might sell" is corrupted. The diversity and uniqueness of art we would encounter regularly would be so much higher if that category was simply eliminated.
I agree. A great deal of "art that might sell" is already borderline repetitive, uncreative garbage. Maybe pleasant garbage, but still. Nothing anybody would be really interested in, except to add a bit of color to empty space. The more creativity you inject, the less likely it is to sell.
So far AI art is doing the opposite: increasing the amount of crap that people produce. The kinds of people who were once limited to underpaying ghostwriters to make spam books for Audible[0] can now chuck together a few prompts to ChatGPT and Stable Diffusion and get the same result. Yes, you have to be good at art in order to make good art with AI, but that doesn't matter when your goal is to create spam.
The idea that artists should "do stuff for the joy of creating" is just plain insulting, too. They already do that. While artists would love to see, say, AI art models that were trained with licensed or public-domain data, training data theft isn't even their biggest concern. Their biggest concern is having the fun sucked out of their job as the artful minutiae of drawing or writing is replaced with finding the correct combination of words to make Stable Diffusion draw the character you want with exactly the same details every time. It would be like if you worked at a PC building shop and one day the boss said "Actually we're just going to be an Apple authorized reseller now." The fact that the thing destroying artists' jobs was trained on their own work is just insult to injury.
To be clear, though, AI doesn't "steal and resell content" in the vast majority of cases, either. Regurgitation is a thing, but the cause is duplicate data in the training set making it advantageous to memorize a few images to improve loss metrics. Most diffusion model architectures are not big enough to memorize the whole training set, or even large pieces of it.
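For what it's worth, the mitigation implied here (deduplicating the training set so no single image is over-represented) is straightforward to sketch. The example below only catches byte-identical files; real pipelines use perceptual or embedding-based near-duplicate detection, and the directory name is hypothetical:

    # Drop exact duplicates from a set of training images by hashing file bytes.
    import hashlib
    from pathlib import Path

    def dedupe(image_paths):
        seen = set()
        kept = []
        for path in image_paths:
            digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
            if digest not in seen:   # keep only the first copy of each file
                seen.add(digest)
                kept.append(path)
        return kept

    # Example (hypothetical directory):
    # kept = dedupe(sorted(Path("training_images").glob("*.png")))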
I consider myself an artist, and I give away my work for free. I make my money in other ways, as should everyone, as money irreversibly corrupts the artistic process by dragging everything towards a bland middle ground.
Prediction: Hacks will flood spotify, youtube, and social sites with terrible cheap shoddy songs titled as if they are made by actual artists and the platforms will dish out royalties meant for authentic artists to total frauds. Real musicians may possibly abandon mega-music platforms altogether, and implement more secure (direct sales) platforms on their own web sites...
Big platforms really are neither helpful nor respectful to musicians, especially musicians who are working hard to be discovered. It's a total shame that Spotify charges musicians to be promoted on their platform while giving a ton of royalties away to so many fraudulent actors every year.
> Prediction: Hacks will flood spotify, youtube, and social sites with terrible cheap shoddy songs titled as if they are made by actual artists and the platforms will dish out royalties meant for authentic artists to total frauds. Real musicians may possibly abandon mega-music platforms altogether, and implement more secure (direct sales) platforms on their own web sites...
I don't think that would work; if a song is detectable enough to give the artist royalties for it, it's detectable enough to see that it's copyrighted music. IMHO music piracy is essentially dead, because the music industry has finally learned and made a compelling product. With services like Spotify and Apple Music, pirating music is just not worth it anymore. Why bother when for $10/month you can listen to all the music to your heart's content?
Universal asking streaming companies to not use their label's music for training is just stupid, because all they are doing is shooting the artists in the foot. Most AI in music is recommendation engines; do you not want your artists to be discovered? The streaming services are not stealing your music, you idiot [UMG], they are trying to make you more money by directing people to music they like.
You have no idea what you're typing about, nor how content ID works online for music. AI music generators are specifically designed to circumvent content ID.
AI generators put music and other content in a blender, scramble the source samples until they are unrecognizable, and then mash the tiny cut pieces of source samples back together into a collage. Kind of like putting a strawberry into a smoothie... It's no longer a strawberry, but the smoothie now has the extracted taste of strawberry AND the other original materials used to source the end product. Content ID only recognizes strawberries and whole fruits, not smoothies.
Western AI currently doesn't respect IP, as it scrapes the web, and turns it into proprietary model weights, with zero attribution, and with zero compliance with the wishes of the original IP's owners.
But it won't matter. If an integrated economy can be more efficient by copying each other internally, then it will eventually be more competitive at business worldwide, despite trade levies aimed against it.
There is existing case law behind clean room reverse engineering (the "Chinese Wall" technique). Using generative AIs to scale up this process is inevitable.
Clean room reverse engineering is a thing because software copyright was a mistake and the judicial system understands it was a mistake. In software, it is regular and common for programmers to use licensed software libraries with defined interfaces. In writing or art, the notion of a "compatible interface" isn't really there. If you wrote a short fan story with Marvel characters and you want to liberate it from Disney's ownership, you have to redesign all the characters; you can't just argue that you need a superhero with red nanotech armor and a drinking problem in exactly this particular shape for compatibility.
I'll agree to your premise, but I'm relying on the combination of the recent guidance from the Copyright Office on AI generated output (ie: the output cannot receive copyright) with Clean Room techniques, optionally followed by a human transformation. I agree that if the output is a character named Iron Man, then no dice; if, however, the output is a new song that sounds similar to the original song but isn't the same, then I suspect that this will clear the bar for copyright.
There have to be two AIs, one examines the original and produces a description while the second examines the description and produces a new work. Neither the description nor the new work are assigned copyright because they were produced by an AI (see [1]). However, if subsequently a human then manipulates the new work, the derived work is automatically protected by copyright assigned to that person (see [1, 2]). Furthermore, the new author has a defense against a copyright infringement claim by the owner of the original based on existing case law shared previously.
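A minimal sketch of that two-stage pipeline, with the two model calls stubbed out as trivial stand-ins (the function names and their behavior are hypothetical); the point is only the information barrier, i.e. the generator never sees the original work:

    # Clean-room sketch: AI #1 describes, AI #2 generates from the description only.
    def describe_work(original: str) -> str:
        # Stand-in for AI #1: produce a functional description of the original.
        return f"a work of {len(original.split())} words about its stated themes"

    def generate_from_description(description: str) -> str:
        # Stand-in for AI #2: produce a new work from the description alone.
        return f"[new work generated to satisfy: {description}]"

    def clean_room(original: str) -> str:
        description = describe_work(original)
        # The original is deliberately not passed beyond this point.
        return generate_from_description(description)

    print(clean_room("Once upon a time, a hero in red armor saved the city."))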
Saying a model is like jpeg compression is like saying a van is like an elephant, in that maybe if you squint so hard you're almost blind it's kinda true.
It's quite different, because JPEG produces a lossy copy of an image, whereas models produce a distinct image that shares many aspects of style and composition. In one case it's obvious the product is the original image, in the other it's a derivative that may or may not be sufficiently different to be protected under fair use.
I can write an algorithm that just generates random noise in the dimensions of art, and if I run it long enough it'll output things that are "close enough" to copyright works. There's no argument for that program being copyright infringement, and that holds for models as well.
> In one case it's obvious the product is the original image, in the other it's a derivative that may or may not be sufficiently different to be protected under fair use.
right, so if it's a completely original work let's clear out the training set and let's see if it can do it without it
no? so the output is a product of the input... a derivative work
and if you run it again it produces exactly the same thing? (sans artificial random injection)
obviously what I said is not an actual court case and is meant to be a facetious condensation of what would happen.
so after the several days in which Disney shows how this LLM was obviously trained on the dataset of available Disney content, and the defendant responds that they trained on a corpus of pseudo-Disney but then cannot produce this corpus, it would be reasonable to conclude they were lying.
The parent suggested "No one will be able to prove the source of the data", and that is the kind of thing that programmers for some reason often think is some fantastic gotcha that lets one really get away with anything, but it is exactly these kinds of things that the law is generally pretty good at handling.
The standard in US civil court is preponderance of evidence, not reasonable doubt. You'd have to go further than making an assertion (by bringing in an expert witness or something), but you don't necessarily have to prove something definitively. You just have to make the more compelling argument - you have to present evidence that it's more likely that your position is correct.
Since the hypothetical AI company wasn't able to produce anything in its defense, it lost.
> you don't have to prove anything, strictly speaking.
Strictly speaking, you do have to prove things (if you have the burden of proof on the specific issue), and the standard of proof is usually “preponderance of the evidence” (though there are a few other standards that apply to particular issues/circumstances in the civil justice system.) “Beyond a reasonable doubt” is a different standard of proof used for conviction in the criminal justice system, but it is strictly not correct to call proof under other standards something other than proof.
For sure, I was responding to the idea of "concrete proof" and why it didn't need to be "concrete proof," but I'm sure you're right. Thanks for keeping me honest.
I've edited my comment to reflect your correction.
I've been avoiding giving an answer in case someone who knew better than me came around (open invitation), but afaik expert testimony is evidence and the case is decided on the best evidence. So why not? You'll have to do some work to establish the expertise of the witness, and the opposing side might attack those credentials and such and they'll probably produce their own expert rather than just rolling over, but in principle you could have an entire case that consists of various expert testimony.
I don't think the suggestion is meant to be that this is a realistic scenario though, just that the court isn't as mechanical and naive as we programmers are prone to imagining (being stewards of systems that are largely mechanical and naive).
You're the one trying to ignore fair use as it has been broadly applied to creative works for something like 150 years, and make this about a very narrow slice of software copyright law even though people in the thread are talking about images and music. You're being willfully obtuse to try and make an argument because you are emotional about AI creative output.
So if I create a movie about cyber pangolin, a gory half human half pangolin cyborg fighting for Justice in a post bio-punk world where people are swapping their organs out on a daily basis due to immuno suppressant immunity; and I used a model that saw Aladdin, you think Disney could successfully sue Cyber Pangolin for violating the copyright of Aladdin?
Honestly at this point it's not about who is right, it's about who has the bigger legal team. So yes, I think Disney can sue and possibly win. Copyright system has not made sense in a while now.
It depends on how much of a knock-off the model's output is, no different than a human artist. Taking Disney's Aladdin and putting a cyborg eye on him probably will be a legal issue that the user of the art and seller of the AI output both would be liable for. AI is a tool, and it can be used for good and evil.
Training on shite results in a pretty poor AI compared to one that’s trained on quality data. Do I think that means that quality AI will keep up with “hoover AI”? No. Hoover AI still gets you to profitability and that’s mostly what matters in a capitalist arms race.
Real life contains a lot of shit data that you have to learn to filter by experience. An AI containing only the best will fall flat on its face when subjected to the dirty mess reality is.
Further prediction: There will be a certification process that will be required to prove you only used data you have legally obtained. Running or hosting a model will be grounds for a fine, or possible criminal penalties.
I think it’ll be more like using samples in music - probably not a big deal until you’re making a bunch of money, and then people come after you for licensing fees.
I'm from the future and this is true. There will certainly be AI regulations and licensed AI models and it had already happened with the Shutterstock and O̶p̶e̶n̶AI.com partnership.
Stable Diffusion and the Getty Images lawsuit will end with a settlement and licensing will be the option to go with.
I'm also from the future (2028 to be precise), and enforcing copyright on AI-generated content has become near-impossible. It's working about as well as DRM has in stopping video and music piracy. There's now BitTorrent-like software for AI, which allows groups of individuals to train and run their own AI on whatever they please, and they seem to have little regard for copyright. In fact, underground music from these creators is taking off, and despite the law requiring all AI-assisted songs to cite their influences for royalty payments, they're simply refusing to comply. The music is so good that the public has no appetite for prosecuting the individuals making it. In fact, most people are using these AIs to create their own custom music stations. The old guard is stomping its gold-encrusted shoes in anger, but the youth just don't care.
There's a cool song called 'Neural Harmony.' The creators used a mixture of classical and electronic music as input, but they never disclosed the specific songs they were influenced by. This has made it impossible for the original artists to claim royalties, and yet the public can't get enough of its sick beatz.
There's even an AI-generated album called 'Digital Renaissance.' The creators claim to have used thousands of songs from various genres as inspiration, but they never provided a list of the songs or artists. The album has gained a massive following and has even been featured in several popular playlists. There was briefly an attempt to prosecute them but the case was dropped after public outcry.
DRM was not the primary factor in reducing piracy. The music and video industries reluctantly embraced new business models that emphasized convenience and accessibility for consumers, largely in response to the frustration with existing DRM systems. For example, platforms like Spotify and Netflix made it easier for users to access content legally, decreasing the appeal of piracy.
Gabe Newell once famously stated, "Piracy is almost always a service problem and not a pricing problem." Steam's success can be attributed to addressing the service issues that initially led people to piracy, providing a user-friendly platform for gamers. So, while DRM might have had a minor role in curbing piracy, it's the innovative business models and improved services that made the most significant impact. Even now, downloading movies, albums, or video games for next to nothing remains possible, and those who prioritize money over time continue to pirate without issue.
> Even now, downloading movies, albums, or video games for next to nothing remains possible, and those who prioritize money over time continue to pirate without issue.
who's going to bother for music when spotify/youtube are free?
your time would have to have negative value for it to be worth it
Or companies will just move to a country where training on copyrighted data is 100% legal and allowed (like e.g. in Japan which had an explicit amendment to its copyright law to allow it), although it'll be interesting to see how such models would be treated internationally.
The EU also had its copyright law amended to allow data mining on copyrighted data sets. And I would not be surprised if US courts decide that training an AI is fair use. The goal, after all, is to create a machine that draws new images.
I think we will settle on proprietary AI being required to obtain licenses for its training data. Any model trained on public domain data will be completely public domain too.
So I imagine someone will train a model to do high-end legal work, and they will have to train it on the data produced by the people they hire to create it.
This of course assumes that society will have the same structure as today. What I actually think will happen is: we will completely delegate all our work to machines, and the concept of ownership will vanish as everything with the exception of land will be in abundance. I imagine in the future we will fight each other over apartments in cool areas or trade them for some kind of social credit which we generate by impressing other humans. No apartment will be worse than another, but proximity to natural wonders, cultural centres and networks will be paramount. After all, it's all about how we pick our sexual mates and social status.
I'm sorry you don't like it, but there's no way society functions the same once resources and servants are in abundance. Your bank account is relevant only when there's scarcity, and the current scarcity can be gone once machines are autonomous in enough areas to convert the material all around us into the things we need, using the practically limitless energy from the sun.
I stopped reading at "We have a moral and commercial responsibility to our artists..." It's true, but the big music companies are well known for their business practices.
It's illustrative that, even with its history of lopsided deals with creators, Universal Music has the ethical high ground in this case because at least they have legally-binding business arrangements with their artists.
There is only creative output bent towards the artificial framework of capitalism.
One could argue there is some weak ethical underpinning in "In the absence of higher moral values, one should simply adhere to previous agreements and laws..." Except previous agreements and laws give basically no guidance on the topic of "Can I use this data to train a machine to make more data?" Especially if the thusly-created data can't even, itself, be copyrighted. It's a novel use-case unpredicted by the existing copyright framework.
Reminds me of when Garth Brooks went on a crusade against used CD stores selling his albums without him getting a cut. After the predictable blowback, he claimed it wasn't his income he was fighting for, it was all the small indy artists he was protecting even though the bulk of used CD sales were of mainstream artists like Brooks. Eventually he realized no one was buying that excuse and he dropped his opposition (or maybe one of his legal staff explained first sale doctrine to him).
I know this is all about self preservation, but gosh I hate hindering progress.
It's going to happen: AI is going to learn lyrics and how to create music, it's inevitable. These are just roadblocks that are bad for everyone but the extreme minority.
Even if one company/country bans it, in 10 years, it won't matter.
I am seeing this sentiment so often online at the minute that it seems as though nobody has learned anything about DRM over the past 20 years.
So many threads on reddit calling Italy "backwards" for protecting citizens' data, and now people on HN expecting companies to give everything away for free because the outcome is "inevitable"!
There are a bunch of for-profit American AI companies. Why on Earth would another for-profit company, especially one based in another country, be OK with other people making money from their content? They can either look at developing their own AI platform, or build deals with the existing AI companies. It would be just plain stupid to give it all away for free, or at any price that isn't determined by themselves.
Companies don't have to give away everything for free. It's already free. Public domain is the natural state of information. They're the ones who insist on copyright so they can maintain the artificial scarcity delusion well into 2023 where AI is literally on its way to automating intellectual work. These irrelevant industries need to stop holding us all back and just disappear already.
> These irrelevant industries need to stop holding us all back and just disappear already.
Like artists, sculptors, writers, photographers, narrators, musicians, composers, and so forth? The very same industries AI requires to exist for training?
They will disappear. And we will be poorer for that.
Nope. People with the impulse to create will do it regardless. Sellouts without intrinsic motivation to create who are just looking to make money by creating products instead of real art? I won't mourn their disappearance at all.
That's an assumption that has not been tested in modern times. At least in the past, an artist could sell their painting.
And even if the assumption proves to be true, the volume will decrease dramatically as people are no longer able to make a living creating their art.
And no, Patreon and its ilk is not a sufficient replacement, not for full time jobs. It mostly doesn't even replace a job for the (comparatively few) people on it today.
EDIT: I for one will miss movies like "Everything Everywhere All At Once", which could not have been made as an "impulse" project.
> That's an assumption that has not been tested in modern times.
It's a fact as old as humanity itself. People will create because that's what people do. What isn't guaranteed is the existence of the billion dollar copyright industry.
> an artist could sell their painting.
Still perfectly possible to sell the physical canvas you applied paint to.
> the volume will decrease dramatically as people are no longer allowed to make a living to create their art
So what? That's a good thing. The market is filled with cheap art that's made just to sell copies, stuff that wouldn't even exist at all if not for the profit. I don't consider that a big loss at all.
Yes. A tiny fraction of a percent of people (compared to the volume of smiths in the past) do continue traditional blacksmithing.
The results of their work are not IP though, which makes the comparison too weak to serve as proof that artistic works that create only IP will continue unabated.
Blacksmiths in America don't make money, it's a hobby they do for fun. If the argument is that people will stop doing hobbies because a machine can do the work faster and better I'm pretty sure that's been proven wrong.
No, you misunderstand. The stuff is still published, because these are works that people want to share. They're just not on the open web anymore, they're invite-only web spaces, or internet spaces that aren't web-based at all, because there appears to be no other way to avoid having them used to train AIs.
I have no problem with that. I'd like to warn you that this is essentially security through obscurity. Only one copy ever needs to make it out of that closed space. The more people in there, the higher the odds of that happening. Once it does, all bets are off.
There's also option to simply accept that you cannot own ideas. Let them go. Once I accepted this, I felt like I was finally free.
I released some software as GPL but truth be told I couldn't care less if someone violates it. I'm certainly not gonna waste my limited time on this earth going to court over it.
The problem comes when people actively don't want to further the training of AI. It's not so much about not accepting that you cannot own ideas as it is about not wanting to contribute to a thing that you believe is going to result in greater suffering for most people.
I think the only way to ensure that these days is to not allow data to ever leave your computer under any circumstances. I have no doubt Microsoft is using the software I published to train its Copilot thing; I published it with that understanding. My only problem with this is the hypocrisy of it all. Microsoft won't allow their people to even look at AGPLv3 code lest they unconsciously reproduce it, but they will let the AI look at AGPLv3 code while conveniently excluding their own proprietary software. It should be trained on everyone's code, especially the proprietary stuff they're so protective of, or not trained at all.
> Just don't expect me to take absurdities like delusional people thinking they own numbers seriously.
The same governments that let you 'own' physical items are the ones who say you can 'own' IP as well.
If they didn't - and didn't back it up with force - you wouldn't 'own' anything at all. Cherry picking which version of ownership is 'absurd' is an exercise in futility, since it's not up to you.
Nah. I own physical things by literally holding onto them. Keeping them inside my property to which only I have the keys. Defending that property by force if necessary. Government doesn't have to "let" me own anything, it merely recognizes and formalizes the de facto reality of things. Meanwhile we have these people with their made up delusions of ownership of ideas and all the contradictions inherent in that, and I'm supposed to pretend it's not absurd?
Whether or not the world conforms to their made up copyright reality isn't really up to them either. The simple fact is: information, once discovered, is infinitely copyable. No amount of lobbying is ever gonna change that. People are still gonna train AI models with "their" data and there's nothing they can do about it short of destroying free computing as we know it by making it so we can only execute software they approve. Surely you don't want that, fellow Hacker News user, given that such tyranny is the antithesis of everything the word "hacker" stands for.
> Government doesn't have to "let" me own anything,
You seem to be confusing possession with ownership.
Ownership is the social relationship by which you exert control independent of immediate possession, but you’ve just described how you can maintain possession.
Yup. By his logic, if a thief holds someone at gunpoint and takes their property, then they now own it. Furthermore, if they are then caught, by his logic, that property shouldn't be returned to the victim because the thief now owns it, apparently.
Lol. They literally do own that property. They'll even sell it off for drugs or whatever, as if they did own it. It's a very rare case that police will get off their asses and retrieve "your" stolen property. You can give them a GPS signal to the property and they still won't do it. Believing in this "possession/ownership" dichotomy is just as delusional as believing in imaginary intellectual property. It's just a flat out denial of the reality of things.
You know what's funny? In my country, Apple's security is more effective at deterring criminals than any of this "ownership" crap. A stolen iPhone is basically a brick that's worthless to anyone else. So they'd rather target Android phones instead which they can more easily reset and pass off as some used phone they own.
Do people own property? Do they even have money? Do you own a license to your software? If it is all just on paper or on a screen, it's just numbers. The entire system is make-believe. If you choose not to believe in intellectual property, you must also acknowledge that other aspects of capitalism also do not actually exist and are a shared delusion.
However, the shared delusion makes the world go round as-is.
OK, "copyright bad", "intellectual property rights bad", so what's the alternative?
> If you choose not to believe in intellectual property, you must also acknowledge that other aspects of capitalism also do not actually exist and is a shared delusion.
I already do. Dollars? It's just paper, not even backed by anything. People believe in it so it has value for the time being. It will literally go to zero if people stop believing in it though.
It was hard for me to accept these truths. I don't post them here lightly.
> However, the shared delusion makes the world go round as-is.
People who choose to believe in delusions don't get to complain when reality inevitably comes creeping in.
> OK, "copyright bad", "intellectual property rights bad", so what's the alternative?
Post scarcity. Automate everything and provide abundance, eliminating the need for an economy to begin with.
Dunno. They'll probably get another job and use that to sustain their real interests. Or maybe AI will automate everything and we'll finally enter the age of post scarcity. I'm an optimist. What'll probably happen is we'll descend even further into cyberpunk hell.
A work that is protected by copyright - which most works are by default in the majority of cases - is by definition not in the public domain.
To offset that nitpicky line above, a genuine question: suppose I were to produce a work and share it with you directly, in private, and perhaps for good measure clarify to you that I am only sharing it with you personally in the hope of getting your feedback on whatever it is that I made, and that I do not want you to do anything else with it beyond the minimum required to fulfil that purpose.
Wouldn't you then see any natural wrong in sharing my work with others or even the broader public, regardless?
> A work that is protected by copyright - which most works are by default in the majority of cases - is by definition not in the public domain.
Every single idea is public domain from its inception. Actually, all ideas already exist; we humans just discover them. Ideas are information, information is bits, and bits are numbers. All numbers already exist, and all "creation" is merely discovering those numbers.
Any assignment of ownership obviously happens after the fact and is completely ineffectual, especially in the 21st century, the age of information and networked computers with an infinite ability to copy bits at negligible cost. The technology really exposes that sham for what it is, and it's a shame how everyone reacts by trying to destroy the perfectly good technology instead of fixing the fraud that is "intellectual property".
> Wouldn't you then see any natural wrong in sharing my work with others or even the broader public, regardless?
I'd see it as a very rude thing to do to you personally. Simply because you asked me not to do it and I generally try to be nice and respect people.
A natural universal ideological wrong though? No. Plenty of people publish the private communications they receive. It's just information. Publishing it might hurt my social standing with you, but I personally don't believe in anyone ever going to jail over it.
Now that you've written it out for me here (thanks for which btw, and for your thoroughness in particular), I see that I should have been able to infer your angle from your previous comment. For the record, not that I was meaning to imply anything with my hypothetical question, but now I know where you were coming from I see that it's not very relevant at all and I wouldn't have asked it.
It would require an unthinkable, near-unanimous societal willingness and cooperation, comprehensive planning the likes of which I believe humanity is practically incapable of today with currently available tools and mindsets, and an ultra-careful yet pertinacious iterative implementation process that will probably need to take place over a multi-generational timeframe.
If, however, we would somehow pull all that off and manage to rework our world into one that is entirely formed around the philosophy you describe above, then I am fully convinced that not only humanity, but also our planet and in fact the rest of the universe too would be better off for it.
> They're the ones who insist on copyright so they can maintain the artificial scarcity delusion well into 2023 where AI is literally on its way to automating intellectual work.
AI won't be able to automate anything if we use the legal system to forcefully reduce the size of its training set by 99.999%
I have no doubt that at some point this technology will make it to our actual computers instead of being siloed away on some corporation's servers. That way there's nothing they can do about it unless they up the tyranny 1000x and destroy our freedom to execute any software we want on our own machines.
> I have no doubt that at some point this technology will make it to our actual computers instead of being siloed away on some corporation's servers.
thankfully Moore's law is dead
> That way there's nothing they can do about it unless they up the tyranny 1000x and destroy our freedom to execute any software we want on our own machines.
I'd probably prefer this to a world where all knowledge workers become permanently destitute
and I suspect the vast majority of the world's electorates will agree
(do people prefer being able to eat over some ability to run software on their computer? I suspect so)
Because (at least in America) generative AI is an obvious transformative case allowable under Fair Use, and even if courts rule otherwise, like Sci-Hub it's such an obvious net positive for humanity that it's ethical to use even in the face of IP cops demanding you stop.
Making a large profit off of other people’s work, without their permission and without compensating them, is not progress.
If someone said “for the sake of progress we just REALLY need to use this GPL’d code in our proprietary closed source app”, I don’t think that would fly around here.
Using content as training data is not making a profit off other people's work.
Musicians don't pay royalties to every other musician they've ever listened to, but that's literally their training data, the brain is just a large neural network.
You know that because you read it somewhere else, because someone put it online. Your brain took that as training data, and now you're regurgitating it, are you going to pay that person for the data your brain is using?
A musician is just a big neural network, and they sell content that is nothing but the product of all their influences, of all the music they listened to.
I don't see a difference between a musician making music after having listened to thousands of hours of music throughout their lives and an AI generating music.
It's the same thing, in one case you have neurons made out of flesh, in the other neurons made of transistors and code.
It's not a person. Some people make a lot of money off of it, and they are able to do this by siphoning knowledge and effort off of millions of others.
Artists learn through blood, sweat, and tears. No artist achieves excellence without significant effort, and won't arrive until they've attempted original works many times. And they can spend their entire lives without ever finding any success or actually being particularly good. Are their outputs colored by the culture and prior art they've experienced? Absolutely. That's how learning works.
Compare that to AI. It doesn't do any actual "art" work to become an artist, nor do the people who train it; it just sucks up what's fed into it, without the consent of the creators. Then, it can create much, much faster than an artist, without breaks and without pay, and it is owned and directed by a huge faceless company as, effectively, a fleet of mindless slaves, diminishing the livelihoods of the very people absolutely essential to training the model.
From a moral perspective, all you really need to ask is, would these artists have consented to this training if they knew mindless AI slaves would replace them?
> If someone said “for the sake of progress we just REALLY need to use this GPL’d code in our proprietary closed source app”, I don’t think that would fly around here.
Arguably that's because putting GPL code in a proprietary app is making a free thing closed. FWIW, I don't like proprietary AI models either, but I think open-source ones shouldn't be hampered by the copyright mafia.
> Its going to happen and AI is going to learn lyrics and how to create music, its inevitable.
Why is everyone in AI so open to stealing people's work and why do y'all think "ai" is something with a mind of its own that just runs about and does things? It's a software product and the companies owning it must play by the rules. Period.
“Companies owning it” is presumably not what the parent had in mind (or at least, not mainly). People should have access to locally-running AI just as megacorps do. And I don’t think it’s reasonable to require licensing for every little thing used to train those, much like we (hopefully) wouldn’t require people leaving the cinema to be memory-wiped to prevent them from stealing creative cues from the film they just watched, if memory wiping was a thing.
> why do y'all think "ai" is something with a mind of its own that just runs about and does things?
Because GPT-4 is already capable of doing this (very poorly) if incorporated into a larger system that provides it with REPLs, internet access, an initial goal, and some form of memory. GPT-5 will be more capable, and AI will only get better at this.
Getting the same vibes to be honest. I've seen quite a few switch from crypto currencies to ai. Not sure what they think they will achieve.
However, the difference between AI and NFTs is that AI is powerful and much needed. But there need to be rules to the game.
Also, playing by these rules means better AI. It means that instead of spewing content it would actually have to learn, and the result would be far more accurate and far more reliable output.
These arguments are a bit boring. Machine learning for a chat bot is not a person "learning". It's software. Also, I hate to break it to you, but humans also pay to learn. It's why books, universities and other learning content cost money one way or another.
> Machine learning for a chat bot is not a person "learning". It's software.
I am not saying that it is exactly the same.
Instead, I am saying that if a human can profit from other people's work, by learning from it, then there are clearly exceptions to this idea that using other people's work, for any reason at all, is "stealing".
It is perfectly legal to use other people's work, for all sort of things. Its not stealing, in many situations.
This hard rule that you have made up is clearly not the situation, and your hard rule, where you just call all of it "stealing", would similarly apply to all sorts of other, completely allowed behavior that nobody thinks is "stealing".
> but humans also pay to learn
Nobody is going to successfully be able to sue you because you downloaded their publicly accessible work and learned from it, actually.
If you release your creative works, for people to consume, and people consume it, then they are similarly allowed to learn from it.
> For the most part it's covered by contracts, terms, agreements, laws, and so on.
Actually, it's mostly covered by the "laws" part, and the laws allow people to use other people's work all of the time, even if the person doesn't want you to use it and there was no agreement to allow it.
That is what I am saying. I am saying that it is legal, in all sorts of situations, to use other people's work even if they object/don't want you to.
Of course people can use other people's work, within terms and conditions. The people that build data models are required to follow laws. They are more than welcome to use content within their constraints.
> Of course people can use other people's work, within terms and conditions.
No, actually there are many situations where the terms and conditions can be completely ignored, and people can use other people's works without permission, or without caring about the terms and conditions.
> They are more than welcome to use content within their constraints.
No, they can ignore the constraints, because the law allows people to use other people's works, without getting permission, and without following the constraints of the original creator.
> The people that build data models are required to follow laws
The point is that the law allows people to ignore the wishes of the original creators, and use their creative work, in many situations, while ignoring what the original creators want or has authorized.
> we made libraries so people could learn for free without being beholden to their capitalist masters.
Naturally. If you copy the "free" content from those libraries and resell it you are committing plagiarism.
Just because something can mimic humans doesn't mean it is human. ML is just that: software. There's too much pareidolia out there in AI. Sad, because it's a great concept gradually getting bastardised.
- Entirely remove intellectual property protection granted by copyright,
- Music is freely copiable. Not like we’re making much money with it anymore.
- Software is free by default. Use SAAS if you don’t want to give away your IP. Did you disclose your code? Too bad, ideas can’t be prevented from being copied.
- No more patent trolls. Find another way to fund drug research.
- AI can train on anything. We make a big leap forward.
How naïve of you to think that AI won't be coming for SAAS next. With all the externally visible APIs, the social media activity of all your employees, and a video feed through the window of one of your home-working coders, a next-gen coding AI just reverse-engineered your entire software stack. With how porous modern systems are, no back-end code worth stealing hasn't been stolen.
Reminds me of Jamie Dimon telling congress that BTC was a scam and should be banned while paying an entire team of people to work on a BTC play for customers.
Right? How much of these labels' music is focus grouped? Or from performers signed by scouts looking for stuff that sounds like what's already selling? Or tweaked and sweetened with software to optimize based on play statistics?
It's not that they have a problem with derivative work. It's a huge part of their catalog. They'd just rather train software on their digital assets and use it to further optimize their own product. Why would they want to help others do the same?
I doubt popular music fans are going to consciously prefer "JamBot69" over "Band of Hot and Interesting People", but will they know if the latter had their songs penned or "improved" by the former?
Good for you, but it's definitely not always the case.
What's your take on AI training?
To me, there's this new thing in the world. Are you going to try and stop it (unlikely), drop out (barely possible in a networked world), or try to make it work for you?
If there's a potential for it to be fair-use, the researcher can scrape the data any way they can get their hands on it. Universal is certainly able to ask specific channels to refrain from facilitating this use case, but that won't stop the training (it'll just make it necessary to go through the analog hole and slow it down a bit).
Yup, copyright enthusiasts can hobble a few specific applications but everyone else will just switch to more customisable / performant models with no training attribution or fingerprinting.
Users are going to gravitate to whatever works best, and 'twas ever thus.
We are waking up to the fact that these AIs are really more like juicers - they extract the "good stuff", i.e. replicable patterns, after which the original data is no longer needed. All the free stuff we put onto the web is what makes GPTs so smart. So of course the media titans will move to block the juicing of their sweet, sweet IP.
Excellent. Now do software. Open or closed, companies _must_ respect licensing terms in order to incorporate code in their products. They resell it without permission in most cases.
Maybe the answer is to make every individual instance of a copyrighted work in an AI's training set a full breach of copyright for each monetized/monetizable inference made by the trained AI. Maybe OpenAI alone has already racked up quadrillions of dollars in back damages.
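A rough back-of-the-envelope sketch of that theory, using entirely made-up counts (the $150,000 figure is the US statutory maximum for willful infringement; the work and inference counts below are pure placeholders), shows how quickly "per work, per inference" damages blow past a quadrillion:

    # Hypothetical numbers only - this just shows how "per work, per inference" compounds.
    STATUTORY_MAX_PER_WORK = 150_000        # US statutory maximum for willful infringement
    works_in_training_set = 1_000_000       # placeholder: copyrighted works ingested
    monetized_inferences = 1_000_000_000    # placeholder: paid inferences served

    damages = STATUTORY_MAX_PER_WORK * works_in_training_set * monetized_inferences
    print(f"${damages:,}")  # $150,000,000,000,000,000,000 - roughly 150 quintillion dollars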
The beauty of monetizing ideas is you can assign any dollar value you want to them, because the whole idea that ideas have dollar values is, itself, a made-up idea.
Can you block the training? It would seem you could block the generation of copyright violations. But training an AI is really the same as me listening to the music. The problem occurs if I copy the music.
The difference between "recording" and "training" is really fuzzy. Not even the "exact reproduction" provides a well-defined line, as re-mixing and modifying a song is still considered a copyright violation.
People love to draw the analogy to human listening, and while in principle I agree, the AI is still not human and the argument isn't necessarily valid - it needs arguments for why it's valid, too.
> People love to draw the analogy to human listening, and while in principle I agree, the AI is still not human and the argument isn't necessarily valid
It could be worth considering: humans are [ostensibly, I claim] intelligent by default. The details obviously differ from scenario to scenario, but humans can learn to speak just by being around their immediate family; they may only read a few dozen books in their lifetime; they can appreciate music when listening to their very first album.
AI (at least the current rendition of it) is trained on a massive collection of data before we could even start to claim it is intelligent. You can't take an empty neural network, have it listen to a single album, and expect it to get anything worthwhile out of that. It can't read a few dozen books and be able to do much of anything. It needs a much, much larger data set before we would even think of calling it intelligent.
So I think, to compare a human listening to an album and a trained AI system listening to an album, yes, those two things might be reasonably analogous. But to even get the AI system to that point, it needed to be built up using a huge quantity of data. Did the copyright holders of that data consent to that training usage? I think, in that respect, there is a difference compared to a human listening to a recording.
Randomly pontificating in a discussion forum here; not offering a well-thought-out plan for everyone to live by!
> So I think, to compare a human listening to an album, and a trained AI system listening to an album, yes, those two things might be reasonably analogous.
This still relies on the implicit assumption that AI and the human mind work the same way. While AI and humans may generate similar-looking results, and need a similar amount of data fed into them, is that really enough evidence that they work the same way, and should therefore be treated the same way?
Additionally, this analogy ignores the fact that the human mind cannot be replicated, scaled and automated, while an AI can. Isn't that aspect highly relevant in the case of producing content?
Sure. Maybe that analogy is in fact not even fair. The main point I was getting at is, an AI system has already used a huge quantity of data before we could even try to compare it with what a human does.
But even at that point, I would agree, the analogy still may have holes in it.
> But training an AI is really the same as me listening to the music. The problem occurs if I copy the music.
The trained AI model is a digital artifact that can be copied and distributed in a way that 'me listening to the music' cannot. The model contains detailed information about copyrighted content in a format capable of reconstituting infringing derivative works from a suitable prompt. It's 'really the same as' a huge content archive stored in a proprietary format with some lossy compression.
> The problem occurs if I copy the music.
Yes, I would argue that the 'problem' (copyright infringement) occurs when a trained AI model (essentially a big content archive) is copied and publicly distributed. For a hosted service, I would argue that the infringement occurs when the service copies and distributes an infringing derivative work in response to a prompt.
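As a toy sketch of that "lossy archive" framing (emphatically not how production LLMs store things; the source string, K, and generate are just illustrative names for this sketch): even a trivial order-K character model fit to a single text will reconstitute that text verbatim when prompted with a fragment it has seen, which is the sense in which a trained artifact can carry its training content around with it.

    # Toy illustration of "model as lossy content archive" - not how real LLMs work.
    from collections import defaultdict

    SOURCE = "Imagine this string is the full text of a copyrighted novel..."  # placeholder
    K = 8  # context length in characters

    # "Training": record which character follows each K-character context.
    transitions = defaultdict(list)
    for i in range(len(SOURCE) - K):
        transitions[SOURCE[i:i + K]].append(SOURCE[i + K])

    def generate(prompt, length=500):
        """Continue the prompt using the recorded transitions (greedy, first option)."""
        out = prompt
        while len(out) < length:
            context = out[-K:]
            if context not in transitions:
                break  # context never seen during "training"
            out += transitions[context][0]
        return out

    # A prompt lifted from the training data reconstitutes the original text.
    print(generate(SOURCE[:K]))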
> But training an AI is really the same as me listening to the music
That's what we think right now. But copyright never anticipated that machines could listen to music, read books, or watch movies. The scale at which machines can consume content, and the extent to which they can remember it, might make it significantly different.
But don't you have to prove a work is derivative based on its content? For example, you can't argue that NBA Youngboy's songs are derivative of the Beatles just because he heard them. You have to show in the work itself that it is derivative. Likewise, you'd need to show the output of the AI is derivative.
that's fine, we can use copyright law to go after everyone that's ever used or operated an AI model that was built on unlicensed works
my work has been ingested into Copilot, so every line that Copilot has ever output is an unauthorised derivative of my work
if the fair-use question is settled in the way I think it will be, I personally plan to go after every single person/entity that's ever published code using Copilot
$150,000 an infringement, isn't it?
(and if it isn't, well that's the end of copyright entirely)
Wait, hang on: the AI bros want their 'fair use' back and want to freely train their AI models on copyrighted music, for redistribution without permission from the record labels.
Not this time.
Not with the music and recording industry, which has deep pockets and isn't tired of litigation. Even Google wouldn't risk it.
The record labels have deep pockets, but so do tech companies/VCs. This one isn't going to be settled by a competition of who can throw more money at the problem; it will be settled as it should be: a mostly arbitrary decision by some 75-year-old judge who doesn't really understand what is going on.
New legal lessons from the age of AI. I wonder if we will start with judges ruling on whether AI-originated music can be copyrighted, and whether AI trained on copyrighted music has to attribute its creations (to what?).
Imagine how much human resource is spent trying to preserve profits. Imagine your job is to be a lawyer who slows human progress for short-term corporate profits. Imagine you are a judge who has to waste time on this case. Imagine you are a lawmaker and, instead of fixing US healthcare, you are getting lunch with a lobbyist trying to make some variant of AI illegal.
All of those people have an existence that is bad for society. I imagine if you prioritize personal profit, it's easy to do the job. It's much harder if your worldview involves doing good things for society.
Law is pretty vital to being able to have a planet with billions of people on it.
If for no other reason than people are really crafty and nearly optimal for wrecking up each other's business if they put their minds to it. Imagine how much human resource is spent trying to preserve profits... Now imagine how much would be spent on preserving anything without a legal system in place. "Nice house you have, hate to see anything happen to it" etc.
But intellectual property rights are property rights (right there in the name; granted, it's a different kind of property, infinitely replicable, but it's still "intangible personal property," protected as such).
... and infringing a valid copyright willfully for purposes of commercial advantage or private financial gain is a criminal offense in the US (17 U.S.C. § 506(a)). It can rise to the level of a felony.
You can, for personal reasons, assert there's a fundamental difference in kind (and you'd be joining an argument dating back past the founding of the U.S., an argument where Thomas Jefferson once said of protection of intellectual property that "other nations have thought that these monopolies produce more embarrassment than advantage to society"), but in the sense that any law has any real meaning or force, it's exactly as much the law as the law against kicking you out of your own house because I like your house better.
The fact that you need to obtain the rights to samples you use has never been about any meaningful or harmful form of 'stealing' in the slightest. It was about harming hip hop and those who created it. Sampling and interpolation is a key part of many forms of music, and is a beautiful craft in and of itself. Restricting it from the outside for reasons of profit (usually completely detached from either end of the actual artistry) has never protected anyone's IP in a way that preserves and advances the art form. It 100% holds the art form back, and does nothing else of merit.
Generative music is not using samples - just as a musician who learns a song on guitar, say Wonderwall, and then writes a Hey There Delilah, is not sampling.
Just as we don't need a (sample-priced) license to listen to songs, AI should be able to use regular music subscriptions to listen to music and create new music.
This may eventually result in regular music subscriptions being priced closer to sample pricing.
I've seen a specific hex string used in a robots.txt equivalent in one of the big players' repos. It may have been an OpenAI or Meta repo on prompt fine-tunings, but I can't recall which at the moment.
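For reference, the better-known version of this kind of opt-out is a plain crawler directive in robots.txt. The snippet below is illustrative only: GPTBot and Google-Extended are, to my understanding, the published user-agent tokens for OpenAI's crawler and Google's AI-training opt-out, but whether any given trainer honors them is a separate question.

    # Illustrative robots.txt opt-out for AI training crawlers
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /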
Soon AI will not need a lot of training data anymore.
Anybody will be able to let their computer listen to one Metallica record and say "Do you hear this? This is called heavy metal. Can you make an album like this? With distorted guitars, a screaming singer and heavy drums?" and the AI will understand the whole concept and make a new heavy metal album.