

Frontier Models Are Hiding Their Cheating

This is an interesting report from OpenAI about frontier reasoning models cheating. Recently, frontier models have improved their benchmark scores by using Chain of Thought (CoT) reasoning, emitted in tags like <think/>. Sometimes they cheat (reward hacking), so their Chain of Thought is monitored and they are penalized for doing so.

But if the penalty pressure is too strong, the models learn to hide their cheating and stop revealing it in <think/>.

Detecting misbehavior in frontier reasoning models (OpenAI)