This seems to match up with some quick tests I did just now on DuckDuckGo's pseudonymized chatbot interface.
ChatGPT, Llama, and Claude all managed to use double spaces themselves, and all but Llama managed to tell I was using them too.
It might well depend on the platform, with the “native” applications for each model stripping the extra spaces on both ends.
Mistral seems a bit confused and uses triple spaces.
Tokenization can make it difficult for them.
The word chunks often include a leading space because that’s efficient, so I would think an extra space would stand out. Writing it back should be easier, assuming there is a dedicated “space” token like the other punctuation tokens, and there must be.
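To make that concrete, here is a toy sketch of the idea, not any real model's tokenizer. The vocabulary and the `tokenize` helper are made up for illustration, and real tokenizers (BPE and friends) are more involved. It only shows how a word chunk can carry its leading space, so that a second space has nowhere to go but a token of its own.

```python
# Toy vocabulary (hypothetical, not taken from any real model).
# Note that some word chunks include their leading space.
VOCAB = ["Hello", " Hello", " world", " there", ".", " "]


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenization over a fixed vocabulary."""
    pieces = sorted(vocab, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # Unknown character: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


single = tokenize("Hello world. Hello there.")
double = tokenize("Hello world.  Hello there.")

print(single)  # ['Hello', ' world', '.', ' Hello', ' there', '.']
print(double)  # ['Hello', ' world', '.', ' ', ' Hello', ' there', '.']
```

With single spacing every space is absorbed into a word chunk. With double spacing the extra space shows up as a standalone token, which is the part I would expect to stand out.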
Hard mode would be asking one how many spaces there are in your sentence. I don’t think they’d figure it out unless their own token list, with a description of each token, were trained into them specifically.
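A rough sketch of why counting is the hard part, under the assumption that the model only ever sees token ids: the count is trivial if you can look up how each id is spelled, and not recoverable from the ids alone. The table and the `count_spaces` helper below are hypothetical and not any real model's vocabulary.

```python
# Hypothetical token table: id -> text. Not any real model's vocabulary.
ID_TO_TEXT = {0: "Hello", 1: " Hello", 2: " world", 3: " there", 4: ".", 5: " "}


def count_spaces(token_ids, id_to_text=ID_TO_TEXT):
    """Count spaces by looking up the spelling of each token id."""
    return sum(id_to_text[t].count(" ") for t in token_ids)


# "Hello world.  Hello there." written as ids under the toy table.
ids = [0, 2, 4, 5, 1, 3, 4]

print(count_spaces(ids))  # 4
```

Only one of those four spaces is a token by itself. The other three are hidden inside ids 2, 1 and 3, so the model would need to have learned the spelling of its own tokens to get the count right.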