Extracting Jailbreak Neurons in a 24B Language Model
