• 12 Posts
  • 308 Comments
Joined 3 years ago
Cake day: June 23rd, 2023

  • When they can argue it’s for “transformative use” or whatever the magic words are? That’s technically fair use in US law.

    Well, considering they transformed it all into about 250GB of weights, that would qualify. That’s at least thousands of times smaller than the books they downloaded, so you can’t really claim “they downloaded the books and put them into the model unaltered”.

    It’s not like you can ask one of the models for page 156 of the second Harry Potter book, unless it’s cheating by being attached to a search engine to find the result. No compression technique can shrink something to a thousandth of its size without substantial loss. You can, however, ask it to summarize what happened in the second Harry Potter book, including what its actual title is, without it trying to look anything up.

    The AI bros might have a serious point within the law, and that should scare actual artists. It should also scare studios like Disney that hold a fuck ton of “intellectual property”.

    Actual artists have been fucked over by copyright since its invention. Copyright, patents, and intellectual property rights were created under the false pretense that they “protect the little person”, but these are lies told by the rich and powerful to keep themselves rich and powerful. Time and time again, we have seen how broken the patent system is, how it is impossible not to step on musical copyright, how Mark Twain, Sonny Bono, and Disney have pushed copyright terms toward forever, and how the megacorporations have way more money than everybody else to defend those copyrights and patents. These people are not your friends, and their legal protections are not for you.

    If the rich end up dismantling their own IP shield that has existed to enrich themselves for centuries in the name of AI progress, I’m going to call that a win.

  • Didn’t someone at Google write a memo that was like “we’re kinda fucked b/c you can re-create this stuff with enough resources” like 2 years ago?

    Basically, yes. They were specifically decrying how much open-sourcing they and their American competitors were doing, because capitalism, of course. Around that time, we had examples like StabilityAI’s Stable Diffusion and Meta’s LLaMA as open-source models. Soon after, everybody else started closing their models, even though the research kept coming out in the open. StabilityAI kept their models open, mostly because they had no choice, but the attitude shifted towards profitability.

    So China picked up the open-source mantle, and the open/closed lines are now being drawn strictly along national borders, with an American vs. China slant. That framing is mostly a diversion from the real battle.


  • Whoever wrote this article didn’t even bother to do the most basic of research.

    DeepSeek fully admitted they started with ChatGPT outputs to train their model. And then they released it as an open-source model, so that everybody else can “steal” their work. On the image/video front, the general public has built every possible variation on top of every model you can think of. On top of that, any model that has ever been released with full weights has been spun into whatever variation or VRAM size you want.

    The ugly truth that the American companies want to hide is that they are spending trillions of dollars on an oligopoly that they can’t keep long-term. They hope they can just keep spending more money to add billions more parameters to their models and stay technologically ahead of the open-source models trailing behind them. But they already ran into diminishing returns over a year ago, and the global compute sector physically cannot keep up with demand for another cycle of even more diminishing returns.

    The other factor is that realistic miniaturization of models is already here. Some of the smaller models aren’t as effective as the 250GB models used by cloud-based services, but you can still do a lot with a 16GB or 24GB video card running models that fit in that much memory. Optimization and LLM quantization are getting better every year. When the AI bubble bursts, it’s going to force a cascading shift into a new era of local models. Everybody is sick to fucking death of renting and subscribing to everything. We pirates already do this on the media front, and soon running LLMs locally is going to become way more popular.

    The question isn’t “Can people steal the tech?” It’s “How long until people notice that it’s already happening?”
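    To put the 16GB/24GB point in rough numbers: the memory needed just for a model’s weights is about the parameter count times the bits per weight, divided by 8. The sketch below assumes a hypothetical 14-billion-parameter model and ignores the KV cache, activations, and runtime overhead, so real-world requirements are somewhat higher.

```python
def weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical 14-billion-parameter model (an assumption; real models vary).
params = 14e9
for bits in (16, 8, 4):
    size = weight_size_gb(params, bits)
    verdict = "fits" if size <= 16 else "does not fit"
    print(f"{bits:>2}-bit: {size:5.1f} GB of weights ({verdict} in 16 GB, before overhead)")
```

    Under these assumptions, the 16-bit version needs about 28 GB of weights, while 8-bit and 4-bit quantization bring that down to about 14 GB and 7 GB, which is why quantization is what makes models of that size practical on consumer cards.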

  • Now major news publishers are actively blocking the Internet Archive—one of the most important cultural preservation projects on the internet—because they’re worried AI companies might use it as a sneaky “backdoor” to access their content.

    This is a total lie. This has nothing to do with AI. They’ve hated archive sites because forums like this one hate their paywalls, and we’d rather actually read and discuss their articles than get shut out every time.

    NYT is one of the worst offenders, and as a company it has turned for the worse in the last 5-10 years, maybe even worse than the Amazon Post. None of the old media companies really understand how to adapt to the Internet age, so they are slowly dying. It’s like they are perpetually in an economic bubble that hasn’t figured out how to pop itself. There’s so much damn news, with outlets copying each other’s stories and regurgitating them a hundred times, that we’re forced to rely on aggregators, and YouTubers hawk shit like Ground News just to help us process it all.

  • For a company named “Open” AI, their reluctance to just open the weights to this model and wash their hands of it seems bizarre to me.

    It’s not bizarre when you understand the history. When StabilityAI released their Stable Diffusion model as open source and kickstarted the whole text-to-image craze, there was a bit of a reckoning. At the time, Meta’s LLaMA was also out there in the open. Then an internal Google memo leaked that basically said “oh shit, open source is going to kick our ass”. Since then, they have been closing everything up, as the rest of the companies realized that giving away their models for free isn’t profitable.

    Meanwhile, the Chinese have realized that their strategy has to be different to compete. So almost every major model they’ve released has been open-source: DeepSeek, Qwen, GLM, Moonshot AI’s Kimi, WAN Video, Hunyuan Image, Higgs Audio. Black Forest Labs in Germany, with their FLUX image model, is the only other major non-Chinese company that has adopted this strategy to stay relevant. And the models are actually good, going toe-to-toe with the American closed-source models.

    The US companies have walked into their own self-fulfilling prophecy in record time. Open source is actively kicking their ass. Yet they will spend trillions trying to make profitable models and rape the global economy in the process, while the Chinese wait patiently for the AI bubble grenade to explode in their faces so they can stand on top of their corpses. All in the course of 5 years.

    Linux would be so lucky to reach OS market share dominance on such an accelerated timeline, rather than the 30+ years it’s actually going to take. This is a self-fail speedrun.