“Saltpeter—what purity should it be taken at?” A Classical Chinese conversation AI somehow easily jailbreaks? A paper reveals an LLM security loophole
Research shows that due to its opaque nature, classical Chinese can easily bypass the safety barriers of large language models. The research team used the CC-BOS framework to carry out a jailbreak attack with nearly 90% success rate, revealing blind spots in AI safety training regarding classical Chinese and showing vulnerabilities in how models handle classical languages.
CryptoCity·04-03 00:40