

yep. you could of course swap weights in and out, but that would slow things down to a crawl. So they get lots of vram (edit: for example, an H100 has 80gb of vram)
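a rough back-of-the-envelope sketch of why swapping weights over the bus is a non-starter. All the numbers here are assumptions for illustration: a 175B-parameter model at 2 bytes per weight, ~32 GB/s for PCIe 4.0 x16, and ~3 TB/s (rounded) for H100-class HBM:

```python
# back-of-the-envelope: why swapping weights in and out kills throughput.
# assumed numbers: 175B params at 2 bytes each (fp16),
# PCIe 4.0 x16 ~ 32 GB/s, H100-class HBM ~ 3000 GB/s (rounded)
model_bytes = 175e9 * 2          # ~350 GB of weights
pcie_gbps = 32                   # GB/s over the PCIe bus
hbm_gbps = 3000                  # GB/s when weights already sit in VRAM

swap_seconds = model_bytes / 1e9 / pcie_gbps   # streaming weights in from host
local_seconds = model_bytes / 1e9 / hbm_gbps   # reading them from on-card memory

print(f"over PCIe: {swap_seconds:.1f} s per full pass over the weights")
print(f"from VRAM: {local_seconds:.2f} s per full pass over the weights")
```

with these assumed figures the bus is roughly two orders of magnitude slower, which is the "crawl" in question.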
that’s why they need huge datacenters and thousands of GPUs. And, pretty soon, dedicated power plants. It is insane just how wasteful this all is.
i wasn’t born yet. I don’t even think half of me was in my dad’s balls yet
imagine that to type one letter, you need to manually read all unicode code points several thousand times. When you’re done, you select one letter to type.
Then you start rereading all unicode code points again for thousands of times again, for the next letter.
That’s how llms work. When they say 175 billion parameters, it means at least that many calculations per token it generates
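the arithmetic behind that claim, as a quick sketch. The common rule of thumb (an assumption here, not something from the thread) is roughly 2 floating-point operations per parameter per generated token, one multiply and one add:

```python
# rough arithmetic for "every parameter gets touched for every token".
# assumption: ~2 FLOPs per parameter per generated token (multiply + add)
params = 175e9
flops_per_token = 2 * params          # ~350 GFLOP for a single token
tokens = 500                          # a medium-length reply
total = flops_per_token * tokens

print(f"{flops_per_token / 1e9:.0f} GFLOP per token")
print(f"{total / 1e12:.0f} TFLOP for a {tokens}-token reply")
```

so even a short reply is hundreds of trillions of operations, which is why "rereading all the code points" is not a bad mental model.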
funny how everyone who wants to write a new browser (except the ladybird guys) always skimp on writing the actual browser part
in yes/no type questions, a 50% success rate is the absolute worst one can do. Any worse and you’re just giving the inverse of the correct answer more than half the time
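a toy demo of that point. Here a made-up predictor that agrees with the truth only 30% of the time becomes a 70%-correct predictor the moment you invert everything it says (the 30% figure and the random data are assumptions for the demo):

```python
import random

# toy demo: a yes/no predictor that is wrong 70% of the time becomes
# a 70%-correct predictor once you flip its answers
random.seed(0)
truth = [random.choice([True, False]) for _ in range(10000)]
# "bad" agrees with the truth only ~30% of the time
bad = [t if random.random() < 0.3 else not t for t in truth]

acc = sum(b == t for b, t in zip(bad, truth)) / len(truth)
flipped_acc = sum((not b) == t for b, t in zip(bad, truth)) / len(truth)

print(f"raw accuracy:     {acc:.2f}")      # ~0.30
print(f"flipped accuracy: {flipped_acc:.2f}")  # ~0.70
```

flipped accuracy is exactly 1 minus the raw accuracy, so anything below 50% is an above-50% predictor wearing a disguise.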
they are improving at an exponential rate. It’s just that the exponent is less than one.
got a pc with a good deal. First thing I did was electrically cut off all unnecessary leds
if you’re concerned about how much you need to move your hand, then you’ll probably love (neo)vim
that’s why you get a little robot friend to clean it for you
theoretically, they wouldn’t, and yes, that is how it works. The math says so.
opposite or not, they are both tasks that the fixed-matrix-multiplications can utterly fail at. It’s not a regulation thing. It’s a math thing: this cannot possibly work.
If you could get the checker to be correct all of the time, then you could just do that on the model it’s “checking”, because it is literally the same thing, with the same failure modes, and the same lack of any real authority in anything it spits out
so? It was never advertised as intelligent and capable of solving any task other than that one.
Meanwhile, slop generators claim to be capable of doing a lot of things, reasoning included.
One claims to be good at chess. The other claims to be good at everything.
the driver itself is kilobytes in size. Megabytes is huge for such a simple thing
how does that stop the checker model from “hallucinating” a “yep, this is fine” when it should have said “nah, this is wrong”?
the first one was confident. But wrong. The second one could be just as confident and just as wrong.
what makes the checker models any more accurate?
you made me snort coffee out of my nose. I hope you’re proud of yourself
most code from the before times, from the long-long-ago, actually didn’t need a browser, and could fit on a floppy disk!
these types of laws usually come from the most technically illiterate people ever