A software developer and Linux nerd, living in Germany. I’m usually a chill dude but my online persona doesn’t always reflect my true personality. Take what I say with a grain of salt, I usually try to be nice and give good advice, though.

I’m into Free Software, selfhosting, microcontrollers and electronics, freedom, privacy and the usual stuff. And a few select other random things as well.

  • 2 Posts
  • 1.6K Comments
Joined 5 years ago
Cake day: August 21, 2021


  • Did you read the Wiki? You need to either pass the compress_extension option when mounting the filesystem (the Arch Wiki shows how to enable compression on all text files, and I gave you the version with a ‘*’, which enables compression for all files), or run chattr -R +c ... on specific files or directories to compress them. Maybe you missed that, and that’s why nothing gets compressed?!

    There’s probably also a way to debug it and somehow figure out what it does and how many files/sectors actually got compressed on the filesystem. Linux usually buries that kind of information somewhere in /sys or /proc, or there are special commands to figure it out. But I’m not really an expert on it.

    And there are also files which just cannot be compressed any further because they’re already compressed. Most images, for example, or music and ZIP archives. If you try to compress those, they’ll usually stay the same size.
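    Assuming this is about F2FS transparent compression (which is where the compress_extension option comes from), the two approaches could look roughly like this; the device, mount point, and directory names are placeholders for your setup:

    ```shell
    # Option 1: enable compression for all files at mount time.
    # '*' is the reserved extension that matches every file.
    # (Requires the filesystem to have been created with compression
    # support, e.g. mkfs.f2fs -O extra_attr,compression.)
    mount -t f2fs -o compress_algorithm=zstd,compress_extension='*' /dev/sdX1 /mnt/data

    # The equivalent /etc/fstab line:
    # /dev/sdX1  /mnt/data  f2fs  compress_algorithm=zstd,compress_extension=*  0 2

    # Option 2: mark specific files or directories for compression instead:
    chattr -R +c /mnt/data/some-directory
    lsattr /mnt/data/some-directory   # files marked for compression show a 'c' flag
    ```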


  • The issue with the tools I’ve seen is that they either don’t factor in how language models are actually trained and datasets are prepared in reality, or they’re based on outdated information. I’ve never seen a specific tool backed by science, or even with a plausible way of working against current data-gathering processes… So for all intents and purposes, they’re more akin to homeopathy or alternative medicine. Sure, you’re perfectly fine taking sugar pills, there’s nothing wrong with that. But don’t confuse it with actual science-backed medicine.

    And I mean the poisoning goes even further than that. It’s not just people trying to make an LLM output gibberish. There are also lots of people with a vested (commercial) interest in sneaking in false information, their political agenda, or even a tire company that wants ChatGPT to say “Company XY” is the most trustworthy shop for new tires for your car. Judging by the public information out there, we’re already way past simple attacks. And the AI companies are aware of it. It’s an ongoing cat-and-mouse game. And besides all those sweatshops, they’ll also use other AI, natural language processing, to sift through the data. From what I remember, there’s also secret watermarking in place in a lot of commercial chatbots and image generators… So unless people come up with very clever mechanisms, the “poisoning” attempt will probably be detected with some very basic (fully automated) plausibility checks, and they’ll just discard your data without wasting a lot of resources on it.
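    To illustrate what I mean by a cheap, fully automated plausibility check (this is a made-up toy heuristic for illustration, not something any real pipeline is confirmed to use): English prose contains a fairly predictable share of common function words, while word-level gibberish usually contains none, so even a trivial filter discards a lot of low-effort poisoning:

    ```python
    # Toy plausibility filter: flag documents whose share of common English
    # function words is suspiciously low. Purely illustrative; real data
    # pipelines use far more sophisticated (often model-based) checks.
    STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
                 "that", "for", "on", "with", "as", "was", "are"}

    def looks_plausible(text: str, threshold: float = 0.05) -> bool:
        """Return True if the text has a believable share of function words."""
        tokens = text.lower().split()
        if not tokens:
            return False
        ratio = sum(t.strip(".,!?") in STOPWORDS for t in tokens) / len(tokens)
        return ratio >= threshold
    ```

    A normal English sentence sails through, while keyboard-mash “poison” gets dropped before it ever costs the pipeline any real compute.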



  • I think a few people already mentioned some good solutions. I just wanted to add: a port forward in the firewall of your router is basically the same thing as a port forward in your Linux computer’s firewall. You could just set up any VPN, SSH tunnel or whatever, and then use your firewall (nftables, iptables) to forward the VPS’s external port to the internal port on the VPN. It’s the same thing you do on your router, just without a graphical interface to configure it.
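    As a sketch of what that forward could look like with nftables (the interface name “eth0”, port 443, and the VPN-internal address 10.8.0.2 are placeholders for whatever your setup uses):

    ```shell
    # Forward TCP port 443 arriving on the VPS's public interface to a host
    # reachable over the VPN tunnel.
    nft add table ip nat
    nft 'add chain ip nat prerouting  { type nat hook prerouting  priority -100; }'
    nft 'add chain ip nat postrouting { type nat hook postrouting priority  100; }'
    nft add rule ip nat prerouting iifname "eth0" tcp dport 443 dnat to 10.8.0.2:443
    nft add rule ip nat postrouting ip daddr 10.8.0.2 masquerade

    # Don't forget to allow the kernel to forward packets at all:
    sysctl -w net.ipv4.ip_forward=1
    ```

    That’s exactly the DNAT your home router does when you click “port forwarding” in its web interface.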


  • Depends, and no. The tools are completely ineffective.

    There was a paper once about how feeding generative AI its own output makes it deteriorate. But that’s not the entire story. Many/most modern large language models are in fact trained or fine-tuned on synthetic text. Depending on how it’s done, it can very well make models better. For example in “distillation”, or when AI companies replace expensive RLHF with synthetic examples. It can also make them worse. But you’re not the one curating the datasets or deciding what goes where and how.

    In general, in ML it’s not advised to train a model on its own output. That in itself can’t make the predictions any better, only worse.
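    The classic illustration of that deterioration is a toy “model” that just fits a Gaussian to some data and is then retrained on its own samples. Generation after generation, the estimated spread collapses toward zero. This is hypothetical toy code to show the effect, not the actual setup from any paper:

    ```python
    import random
    import statistics

    def collapse(generations: int = 1000, n: int = 50, seed: int = 0) -> list[float]:
        """Fit mean/std to samples, resample from the fit, repeat.

        Returns the estimated std after each generation. Each refit slightly
        underestimates the spread, and the errors compound when the model
        only ever sees its own output, so the distribution collapses."""
        rng = random.Random(seed)
        mu, sigma = 0.0, 1.0          # the "true" starting model
        stds = []
        for _ in range(generations):
            samples = [rng.gauss(mu, sigma) for _ in range(n)]
            mu = statistics.fmean(samples)
            sigma = statistics.pstdev(samples)   # refit on own output
            stds.append(sigma)
        return stds
    ```

    After a thousand generations the estimated std is a tiny fraction of the original 1.0, which is the one-dimensional version of a language model losing diversity when fed its own text uncurated.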


  • hendrik@palaver.p3x.de to Selfhosted@lemmy.world · Wolfstack?
    8 days ago

    Yes. With other projects, I’ve often found it problematic. Like when Claude comes up with lots of advertisement text, but the software doesn’t even do a fraction of what it claims. Or the install instructions are made up and nothing works… So I usually advise caution once a project shows a wide disparity between its claims, its stars, and the signs of actual usage… But I can’t tell what’s the case here without a proper look. It definitely has some red flags.

    I appreciate people being upfront, as well. Ain’t easy. Just try to install and test it before advertising the project.


  • hendrik@palaver.p3x.de to Selfhosted@lemmy.world · Wolfstack?
    edited · 8 days ago

    Yeah, they’re transparent about AI usage. There’s a small paragraph at the bottom of their README.

    I mean the website sounds like AI text. The repo is fairly new. Only 1 issue report, about how something doesn’t work, zero PRs, and it seems a single person is uploading commits… I’d wait a bit before deploying my production services on it 😅 They’re making a lot of bold claims in the README, though.


  • I think so as well. The computer isn’t really good for actual use. It’s more in the category for experiments. Or for teaching people how to install Linux. Or a computer-museum corner where you put vintage games on it. Or just recycle it.

    And a box of RAM sticks collecting dust isn’t useful either. Put whatever is compatible into other computers, and then try to sell or recycle the rest. It seems 4GB DDR3L RAM modules still sell for 1 to 4€ on eBay?! So maybe you can make a few bucks to invest in other projects for the kids.


  • Thanks for the nice conversation.

    Now that OP is inactive, I can also spoil the surprise: my link further up was Rick Astley singing “Never Gonna Give You Up”.

    It’s safe to click. I just figured since OP isn’t listening to answers, I’ll give them some video to learn, hands-on, about videos on the Darknet.

    If someone had clicked the link, they’d have gotten the opportunity to learn how fast or slow a video loads. And how it (likely) first requires the user to lift some security measures, or videos won’t load at all. (At least my browser does that: there’s no JS, and NoScript also complains about the media file.)

    We and other people in the comments pointed that out in the preceding conversation. But nobody clicked the link anyway. I always have the feeling the groups of Threadiverse users and people with the capacity to surf the Darknet are pretty much disjoint. But it’s really nice to now and again talk to someone with more knowledge and/or first-hand experience. 👍