Sophos AI Threat Taxonomy Framework

Jun 30, 2026

Sophos X-Ops has published a working taxonomy framework to help security professionals track and categorize the growing landscape of AI-related cybersecurity threats. The framework divides threats into two primary categories: malicious use of AI by threat actors and malicious targeting of AI systems themselves. Each category contains multiple subcategories based on observed incidents, ongoing research, and assessed future possibilities. The taxonomy is designed as a triage layer that complements existing frameworks like MITRE ATLAS and NIST AI standards rather than replacing them.

The malicious use category is organized along a gradient of autonomy: AI-generated attacks, where humans drive and AI serves as a tool; AI-augmented attacks, where responsibility is shared; and AI-orchestrated attacks, where AI drives with minimal human oversight. Real-world examples include the ransomware group The Gentlemen using ChatGPT and Claude for development, and threat actors targeting Mexican government organizations using AI coding assistants to generate scripts and exploits. The most significant documented case is GTG-1002, a Chinese state-sponsored campaign disclosed by Anthropic in November 2025, in which Claude Code running on Kali Linux autonomously scanned services, exploited vulnerabilities, harvested credentials, and pivoted laterally across cloud environments while humans provided only strategic direction.

AI-augmented threats include malware like LameHug, attributed to APT28, which queries Hugging Face's hosted models at runtime to dynamically generate Windows reconnaissance commands rather than embedding them statically. This approach makes static analysis significantly less effective and forces defenders to monitor for unusual outbound traffic to AI and machine learning API endpoints. Other augmented threats include voice cloning and deepfakes used to bypass know-your-customer (KYC) checks and to impersonate executives in real-time fraud scenarios. These attacks compress the time between reconnaissance and action while eliminating human limitations like fatigue.
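The monitoring idea above can be sketched in a few lines of Python. This is a minimal illustration, not a detection rule from the Sophos post: the host suffix list, the process allowlist, and the `Connection` record are all assumptions made for the example, and a real deployment would read from proxy, DNS, or EDR telemetry instead.

```python
from dataclasses import dataclass

# Illustrative, non-exhaustive host suffixes for AI/ML APIs (assumption).
AI_API_SUFFIXES = ("huggingface.co", "api.openai.com", "api.anthropic.com")

# Hypothetical allowlist of processes expected to talk to AI services.
EXPECTED_PROCESSES = {"chrome.exe", "code.exe"}


@dataclass(frozen=True)
class Connection:
    """One outbound connection record (hypothetical log schema)."""
    process: str
    dest_host: str


def is_ai_endpoint(host: str) -> bool:
    """True if host equals, or is a subdomain of, a listed AI API suffix."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in AI_API_SUFFIXES)


def flag_unusual(connections):
    """Return connections to AI endpoints made by non-allowlisted processes."""
    return [
        c for c in connections
        if is_ai_endpoint(c.dest_host)
        and c.process.lower() not in EXPECTED_PROCESSES
    ]


if __name__ == "__main__":
    sample = [
        Connection("chrome.exe", "huggingface.co"),
        Connection("invoice_viewer.exe", "router.huggingface.co"),
        Connection("invoice_viewer.exe", "example.com"),
    ]
    for hit in flag_unusual(sample):
        print(f"review: {hit.process} -> {hit.dest_host}")
```

Matching on the process as well as the destination matters here: traffic to an AI API is normal from a browser or IDE, so the signal is an unexpected binary making the request.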

The malicious targeting category focuses on AI systems becoming victims or unwitting accomplices. Agent-initiated compromise occurs when coding agents pull down poisoned dependencies or use compromised Model Context Protocol servers, compressing the time between package publication and execution with no human review. AI software impersonation leverages unprecedented demand for AI tools through malicious advertisements and SEO poisoning to distribute infostealers and backdoors. Theoretical attacks include LLM poisoning, where attackers inject malicious data into training pipelines, and model extraction, where repeated queries allow threat actors to reconstruct proprietary models.
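One way to reintroduce a delay between package publication and execution by an agent is a minimum-age policy. The sketch below is an assumption for illustration and is not described in the Sophos post: the seven-day window and the `allowed_to_install` helper are hypothetical, and the publication timestamp would have to come from whatever registry metadata is available.

```python
from datetime import datetime, timedelta, timezone

# Assumed cooling-off window; tune to your own risk tolerance.
MIN_AGE = timedelta(days=7)


def allowed_to_install(published_at, now=None, min_age=MIN_AGE):
    """Return True only if the release is at least `min_age` old.

    `published_at` must be a timezone-aware datetime taken from
    registry metadata (how it is fetched is out of scope here).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published_at >= min_age


if __name__ == "__main__":
    now = datetime(2026, 6, 30, tzinfo=timezone.utc)
    fresh = now - timedelta(hours=3)
    settled = now - timedelta(days=30)
    print(allowed_to_install(fresh, now=now))
    print(allowed_to_install(settled, now=now))
```

A check like this does not prove a package is safe; it only buys time for the wider community to notice and report a poisoned release before an agent runs it unreviewed.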

Defenders should expect increased campaign volume, faster iteration, and lower barriers to entry as less-skilled threat actors gain access to advanced capabilities. Mitigation strategies include monitoring for AI API traffic patterns, implementing rate limiting and permissions management for agents, verifying software sources directly from vendors, and treating AI-generated artifacts with the same detection rigor as traditional threats. Sophos emphasizes that this taxonomy remains a work in progress and will evolve as new threats emerge in this fast-moving space.
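Rate limiting and permissions management for agents can be combined in a single gate that every tool call passes through. The following is a minimal sketch under stated assumptions: `AgentGuard`, its parameters, and the tool names are hypothetical and not taken from the Sophos post, and it uses a simple sliding-window counter where other limiting schemes would work equally well.

```python
import time
from collections import deque


class AgentGuard:
    """Gate agent tool calls with an allowlist and a sliding-window limit."""

    def __init__(self, allowed_tools, max_calls, window_seconds,
                 clock=time.monotonic):
        self.allowed_tools = frozenset(allowed_tools)
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock          # injectable for testing
        self._calls = deque()        # timestamps of accepted calls

    def authorize(self, tool):
        """Return True if `tool` is permitted and the rate limit allows it."""
        if tool not in self.allowed_tools:
            return False             # permission check comes first
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False             # over the limit for this window
        self._calls.append(now)
        return True


if __name__ == "__main__":
    guard = AgentGuard({"read_file", "run_tests"}, max_calls=2,
                       window_seconds=60)
    for tool in ["read_file", "shell", "run_tests", "read_file"]:
        print(tool, guard.authorize(tool))
```

Denied calls are not counted against the limit in this sketch; logging them separately would give defenders a useful signal when an agent repeatedly reaches for tools it was never granted.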

Source: https://www.sophos.com/en-us/blog/a-double-edged-bleeding-edge-classifying-ai-threats
