Activations, linear and MLP refusal probes, WildGuard-judged generations, and eval data to fully reproduce amir/RESULTS.md (PhillipHoward/high-temp-refusal).
Abdullah
amirali1985
AI & ML interests
Mechanistic interpretability, high-dimensional geometry, persona role-playing.
Recent Activity
updated a collection about 9 hours ago
High-Temp Refusal: Probe-Gated Decoding