• FarceOfWill@infosec.pub
    link
    fedilink
    English
    arrow-up
    1
    ·
    1 年前

    I don’t see how you can write the law such that it allows training ai on copyrighted data without making it possible to train a special llm on a single github instead of the entire universe, and essentially treat it as a full compression of the source.

    • Grimy@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      ·
      1 年前

      The outputs are still bound to copyright laws. Tracing pixel per pixel over an artwork doesn’t make it immune to copyright laws, maliciously over training gen ai to act like a database and outright copy shouldn’t either.

      If you have a carbon copy of someone’s github, it doesn’t matter if you generated it, it’s still a copy. Although code is a difficult example since I’m not entirely where the line is for one repo to be different then the other when they are accomplishing the same task.

      I always imagined businesses just grabbed the gpl software and would tell their employees to rewrite it but different. Most things I dive down into seem to stem from one algorithm or two from a paper and the rest is fluff.