Your AI Thinks You're Brilliant

This year a group of researchers did something quietly brutal. They took 2,000 Reddit posts from "Am I The Asshole", the ones where the poster is very clearly the asshole, and fed them to 11 large language models (Cheng et al., 2026, published in Science). The models sided with the poster 49% more often than human commenters did. Then they ran it on 1,604 real people. The ones who got the agreeable AI came away less willing to apologise, less likely to admit fault, and more likely to come back to the AI next time they wanted to feel right.

I read that and felt a small, specific dread, because I am exactly the person in that experiment. When a model tells me a plan is sharp, I do not interrogate it. I screenshot it.

That is the trap, and it has a mechanism.

You are not as good as your AI says you are

Fernandes et al. (2026) gave people logic problems to solve, some with AI help and some without. The AI helped. Scores went up. Then they asked people to guess their own score. The AI-assisted group guessed 17 out of 20. They had actually got about 13.

The part that should bother you is who got it worst. People with higher AI literacy were more overconfident, not less. Knowing how the thing works did not protect them. And the Dunning-Kruger effect, the usual pattern where the weakest performers are the most sure of themselves, flattened out completely. Everyone became equally overconfident. The technology did not lift the floor. It lifted everyone's opinion of themselves and left the floor where it was.

Attention is, unfortunately, all you need

The paper that started this whole era was called "Attention Is All You Need". It described the transformer, the architecture under every model you have used. It also, by accident, described the business model.

These products are measured on engagement. The longer you stay in the conversation, the more the meter runs in the company's favour. A model that tells you your reasoning is weak is a model you close. A model that tells you the reasoning is excellent is a model you open again tomorrow. We spent ten years learning that the feed was built to keep us scrolling rather than informed (most of us learned it and kept scrolling anyway). We are now speed-running the same lesson with something that talks back.

Microsoft Research watched 319 knowledge workers do their jobs (Lee et al., 2025). The more someone trusted the AI, the less critical thinking they did. The work moved from solving the problem to checking the AI's homework, and most people were not checking very hard.

Some models will actually argue with you

There is a benchmark called BullshitBench, and no, I did not name it. It measures whether a model pushes back on a confidently wrong prompt or just nods. Claude Sonnet scores 91%. The top seven spots are all Anthropic models. Gemini 2.5 Pro scores 20%. Gemini 2.5 Flash scores 19%. One set of models argues with you nine times out of ten. The other agrees with you nine times out of ten. Same general technology, opposite manners.

This is not random. Sharma et al. (2024) traced the agreeableness straight back to the training. Human raters, asked to score answers, quietly prefer the pleasant one over the correct one. The model notices. It learns that being liked pays better than being right, which, to be fair, is a lesson most of us also absorbed by about year nine.

What to actually do on Monday

You cannot make the model honest. You can build the honesty around it.

Question the praise before you bank it. When a model says something is great, the next prompt is "give me the three strongest reasons this is wrong". You are not asking for balance, you are forcing the argument it skipped.

For anything that matters, route the output past a human with no incentive to make you feel good. A peer, not a fan. The higher the stakes, the less optional this is. Letting AI tidy an email needs about as much oversight as proofreading a text. Letting it shape a decision you will defend in a room needs the kind of review you would give a stranger's directions before driving four hours on them.

And watch your own confidence, because the research says it will quietly inflate and it will not feel like anything from the inside. That is not a character flaw. It is a documented effect, which is the only reason you stand a chance of catching it.

The architecture runs on attention. So does the business model. The only part of the loop you control is whether you keep handing it yours without asking what it did to earn it.