back
Get SIGNAL/NOISE in your inbox daily

Recently, Apollo trained some deception probes (Goldowsky-Dill et al). A deception probe is a logistic classifier on the AI’s internal activations, i…