
TL;DR: We wanted to benchmark supervision systems available on the market—they performed poorly. Out of curiosity, we naively asked a frontier LLM to…