• Avicenna@lemmy.world
    7 points · 1 year ago

    I suppose both PlantNet and deepfake models have convolutional networks as part of their architectures, though.

    • 31337@sh.itjust.works
      2 points · 1 year ago

      Likely transformers now. I think SD3 uses a transformer (MMDiT) as its diffusion backbone, with CLIP and T5 transformers for text encoding, and ViTs are currently among the best model architectures for image classification.
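      Here's a rough sketch of where convolutions and ViTs meet. It uses pure Python, toy sizes and a single-channel image, and all names are hypothetical. A ViT's first step cuts the image into non-overlapping patches and linearly projects each flattened patch into a token. The code checks that this is the same computation as a convolution whose kernel size and stride both equal the patch size, which is how many ViT implementations write it. The class token, positional embeddings and the transformer layers themselves are left out.

```python
import random

def patchify(img, p):
    """Split an H x W single-channel image (list of lists) into flattened p x p patches, row-major."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    return [
        [img[i + di][j + dj] for di in range(p) for dj in range(p)]
        for i in range(0, h, p)
        for j in range(0, w, p)
    ]

def linear_embed(patches, weights):
    """ViT-style patch embedding: multiply each flattened patch by a d x (p*p) matrix (no bias here)."""
    return [[sum(wt * x for wt, x in zip(row, patch)) for row in weights] for patch in patches]

def strided_conv(img, kernels, p):
    """Deep-learning 'convolution' (cross-correlation) with d kernels of size p x p and stride p.
    Returns one d-dimensional vector per output position, in row-major order."""
    h, w = len(img), len(img[0])
    return [
        [sum(k[di][dj] * img[i + di][j + dj] for di in range(p) for dj in range(p)) for k in kernels]
        for i in range(0, h - p + 1, p)
        for j in range(0, w - p + 1, p)
    ]

random.seed(0)
P, D = 4, 3  # hypothetical patch size and embedding width
image = [[random.randint(0, 9) for _ in range(8)] for _ in range(8)]  # toy 8 x 8 "image"
weights = [[random.randint(-2, 2) for _ in range(P * P)] for _ in range(D)]
# Reshape each projection row into a P x P kernel so both paths use identical parameters.
kernels = [[row[r * P:(r + 1) * P] for r in range(P)] for row in weights]

tokens = linear_embed(patchify(image, P), weights)
conv_out = strided_conv(image, kernels, P)

print(len(tokens), "tokens of width", len(tokens[0]))  # 4 tokens of width 3
print("same as strided conv:", tokens == conv_out)      # True
```

      So the "conv part" of a ViT can be as small as one patchifying layer. Everything after it is attention over those tokens, whereas a CNN keeps stacking small overlapping kernels.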