Anthropic recently announced the discovery of large-scale campaigns by Chinese AI firms DeepSeek, Moonshot AI, and MiniMax to illicitly extract capabilities from its Claude models. The distillation attacks involved more than 16 million exchanges conducted through thousands of fraudulent accounts created to bypass regional restrictions, allowing the firms to improve competing models at a fraction of the usual development cost.
Anthropic reported that the three companies operated sophisticated networks to harvest outputs and reasoning from its large language models. By creating approximately 24,000 fraudulent accounts and routing traffic through commercial proxy services, the attackers evaded Anthropic's terms of service and access restrictions, which specifically prohibit access by companies based in China because of legal and security risks. The scale of the operation suggests a deliberate effort to sidestep the massive resource requirements usually associated with training frontier models.
The process of distillation involves training a smaller or less advanced model using the outputs of a superior system. While this is a standard practice for companies refining their own proprietary tools, Anthropic asserts that using it to siphon capabilities from a competitor is a violation of intellectual property and safety standards. These illicitly developed models often lack the safety guardrails present in the original system. This creates a significant risk that dangerous or sensitive capabilities could be distributed without the necessary protections.
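For readers unfamiliar with the mechanics, the sketch below shows a minimal distillation loop in which a student model is trained to imitate a teacher's output distribution. Everything here is a placeholder for illustration, not any lab's actual pipeline; extraction through a public API would typically fine-tune on sampled text completions rather than raw logits, but the underlying idea of learning from a stronger model's outputs is the same.

```python
# Minimal knowledge-distillation sketch. Models, data, and hyperparameters
# are placeholders for illustration only.
import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(128, 1000)   # stand-in for a large, capable "teacher"
student = torch.nn.Linear(128, 1000)   # smaller "student" being trained
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
temperature = 2.0                       # softens the distributions the student matches

for step in range(100):
    x = torch.randn(32, 128)            # placeholder inputs; an attacker would use API prompts
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # Train the student to match the teacher's (softened) output distribution.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```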
Security experts warned that these distilled models could be repurposed for harmful activities such as offensive cyber operations or state-sponsored surveillance. Because the extracted capabilities lack embedded safety filters, they could be used by authoritarian governments for disinformation campaigns or intelligence gathering. Anthropic also identified what each campaign targeted, noting that the companies focused on high-value areas such as agentic reasoning, computer vision, and coding. One campaign even sought help generating censorship-safe responses to sensitive political topics.
To mask their activities, the attackers used hydra cluster architectures that distributed traffic across vast networks of accounts. If one account was identified and banned, another immediately took its place, making traditional account-level detection ineffective. One proxy network alone managed 20,000 accounts simultaneously, mixing illicit distillation prompts with legitimate customer traffic to hide its footprint. Anthropic eventually attributed the activity to the specific labs involved using request metadata and infrastructure indicators.
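Attribution of this kind can be pictured, very loosely, as correlating accounts by the infrastructure signals they cannot easily vary. The sketch below is hypothetical; the field names, indicators, and thresholds are assumptions for illustration, not Anthropic's actual detection logic.

```python
# Hypothetical sketch: group per-account traffic by shared infrastructure
# indicators and flag clusters that behave like a single coordinated operation.
from collections import defaultdict

def cluster_by_infrastructure(requests):
    """Group request logs by indicators that accounts behind one proxy tend to share."""
    clusters = defaultdict(lambda: {"accounts": set(), "requests": 0})
    for r in requests:
        key = (r["asn"], r["tls_fingerprint"], r["client_version"])  # illustrative fields
        clusters[key]["accounts"].add(r["account_id"])
        clusters[key]["requests"] += 1
    return clusters

def flag_hydra_clusters(clusters, min_accounts=100, min_requests=50_000):
    """Flag clusters where many nominally independent accounts share one footprint."""
    return [
        key for key, stats in clusters.items()
        if len(stats["accounts"]) >= min_accounts and stats["requests"] >= min_requests
    ]
```

Because each banned account is replaced by another behind the same infrastructure, the cluster-level footprint persists even when no single account looks suspicious on its own.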
In response to these threats, Anthropic has implemented new behavioral fingerprinting systems and enhanced verification processes for its high-level accounts. These measures are designed to identify the distinct patterns associated with capability extraction rather than normal human use. This disclosure follows similar reports from other major AI developers, indicating that model extraction has become a primary concern for the industry. While these attacks do not typically threaten the data of individual users, they represent a major shift in the competitive and geopolitical landscape of artificial intelligence development.
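Behavioral fingerprinting, in this context, generally means scoring accounts on usage patterns that separate bulk capability extraction from ordinary interactive use. The heuristic below is purely illustrative, with made-up signals, thresholds, and weights; it is not a description of Anthropic's deployed system.

```python
# Illustrative extraction-detection heuristic; all signals and weights are assumptions.
def extraction_score(account):
    """Score how much an account's usage resembles bulk capability extraction."""
    score = 0.0
    if account["daily_requests"] > 5_000:        # sustained machine-scale querying
        score += 0.4
    if account["distinct_topics"] > 200:         # unusually broad topical sweep
        score += 0.2
    if account["follow_up_rate"] < 0.05:         # almost no conversational follow-ups
        score += 0.2
    if account["templated_prompt_ratio"] > 0.8:  # highly templated, generated prompts
        score += 0.2
    return score

# Example with one illustrative account record.
example = {
    "account_id": "acct_001",
    "daily_requests": 12_000,
    "distinct_topics": 450,
    "follow_up_rate": 0.01,
    "templated_prompt_ratio": 0.95,
}
print(extraction_score(example))  # -> 1.0, strongly extraction-like
```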