
Notes on “AI favors texts written by other AIs, even when they're worse than human ones”

Raindrop annotations

Annotations made with Raindrop.io.

And I found a fascinating pattern: the AI gives artificially high scores to reports written with AI.

The models perceive LLM-written reports as more professional and of higher quality. They prioritize form over substance.

I'm not saying that style isn't important; it is, in the real world. But the model was giving very high marks to poorly reasoned, error-filled work simply because it was elegantly written.
When I asked the model what it based its evaluation on, it said things like: "Well, the students didn't literally write [something]... I inferred it from their abstract, which was very well written."

In other words, good writing produced by one LLM leads to a good evaluation by another LLM, even if the content is wrong.

Meanwhile, good writing by a student doesn't necessarily lead to a good evaluation by an LLM.
At that point, I couldn't help but think of Anthropic's paper on subliminal learning between language models. The mechanism isn't the same, but it made me wonder if we're looking at a similar phenomenon.