These captchas are getting ridiculous

m_‮f@discuss.online · 2 months ago

Vigge93@lemmy.world · 2 months ago

That’s when you get into more of the nuance with tokenization. It’s not a simple lookup table, and the AI does not have access to the original definitions of the tokens. Also, tokens do not map 1:1 onto words, and a word might be broken into several tokens. For example “There’s” might be broken into “There” + “'s”, and “strawberry” might be broken into “straw” + “berry”.

The reason we often simplify this to "token = word" is that it does hold for most common words.
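
If it helps, here's a rough sketch of the idea in Python. To be clear, this is a toy: the vocabulary is made up, and real tokenizers (usually BPE or similar) learn their vocabulary from data rather than doing the plain greedy longest-match I'm using here. `TOY_VOCAB`, `toy_tokenize` and `to_ids` are names I invented for the example. It just shows how a common word can stay whole while other words get split, and that what the model gets in the end is a list of numbers, not letters.

```python
# Toy vocabulary, invented for illustration; not taken from any real tokenizer.
TOY_VOCAB = {"There", "'s", "straw", "berry", "the", "cat"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text by greedy longest match against the vocabulary.

    Characters that match nothing become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


def to_ids(tokens, vocab=TOY_VOCAB):
    """Map tokens to integer IDs (position in the sorted vocabulary, -1 if unknown)."""
    ids = {tok: n for n, tok in enumerate(sorted(vocab))}
    return [ids.get(tok, -1) for tok in tokens]


if __name__ == "__main__":
    for word in ["cat", "There's", "strawberry"]:
        pieces = toy_tokenize(word)
        # The model only ever works with the ID list on the right.
        print(word, pieces, to_ids(pieces))
```

With this toy vocabulary, "cat" stays as one token while "strawberry" comes out as "straw" + "berry", so counting the r's means reasoning about two opaque IDs rather than ten letters.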