Hunting Infostealers with AI in Large-Scale SOCs

Jun 16, 2026

Sophos Principal Data Scientist François Labrèche presented research at NorthSec Conference 2026 showing how a multi-layered detection pipeline can surface genuine threats in massive SOC datasets. The research targets alert fatigue: analysts face overwhelming volumes of security alerts, most of which turn out to be irrelevant or benign. Using two weeks of telemetry from Taegis XDR, which processes over 800 billion events daily, the study analyzed 11.8 trillion events in total to demonstrate practical threat hunting at enterprise scale.

The detection pipeline operates in four stages, each reducing alert volume further. The first stage applies detectors ranging from simple indicator matches to sophisticated ML models, including a Long Short-Term Memory (LSTM) network for identifying domain generation algorithms and logistic regression models for detecting malicious command-line activity. This initial filtering reduced the dataset to 2.6 billion alerts, about 0.02% of the original events. The second stage deduplicates and correlates alerts, grouping related ones such as those raised by denial-of-service attacks or scanning activity, which reduced the count to 251.4 million alerts.
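To make the deduplication and correlation stage more concrete, here is a minimal Python sketch. It is not the Sophos implementation, whose grouping logic the talk summary does not describe. The `Alert` fields, the `correlate` function, the grouping key (detector plus source), and the five-minute window are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    detector: str     # hypothetical detector name, e.g. "port_scan"
    source: str       # hypothetical originating host or IP
    timestamp: float  # seconds since an arbitrary epoch


def correlate(alerts, window_seconds=300.0):
    """Group alerts that share (detector, source) and arrive within
    window_seconds of the previous alert in the same group."""
    by_key = defaultdict(list)
    for alert in alerts:
        by_key[(alert.detector, alert.source)].append(alert)

    groups = []
    for key_alerts in by_key.values():
        key_alerts.sort(key=lambda a: a.timestamp)
        current = [key_alerts[0]]
        for alert in key_alerts[1:]:
            if alert.timestamp - current[-1].timestamp <= window_seconds:
                current.append(alert)   # same burst of activity
            else:
                groups.append(current)  # gap too large: start a new group
                current = [alert]
        groups.append(current)
    return groups


# Illustrative data only: a scanning burst, a later scan, and one DGA alert.
alerts = [
    Alert("port_scan", "10.0.0.5", 0.0),
    Alert("port_scan", "10.0.0.5", 30.0),
    Alert("port_scan", "10.0.0.5", 60.0),
    Alert("port_scan", "10.0.0.5", 4000.0),
    Alert("dga_domain", "10.0.0.9", 10.0),
]
groups = correlate(alerts)
group_sizes = sorted(len(g) for g in groups)
print(f"{len(alerts)} alerts collapsed into {len(groups)} groups: {group_sizes}")
```

In this toy run the three scan alerts that arrive within seconds of each other become one group, so an analyst would review a single item instead of three. A production system would likely correlate on richer keys than this sketch assumes.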

The third stage applies context-based suppression, filtering alerts based on customer-specific circumstances such as authorized vulnerability scanning or known false positives from threat intelligence feeds. This suppression removed about 16% of the remaining alerts, leaving roughly 211 million. The final prioritization stage employs a gradient-boosted trees classifier trained on 1.8 million historical alerts, using both static features such as MITRE ATT&CK techniques and dynamic features that track investigation rates for similar alerts. The model automatically closes alerts with a low probability of being real threats and elevates high-risk ones, ultimately reducing the two-week dataset to 81,573 high and critical alerts.
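The reduction figures reported for the four stages can be checked with a few lines of arithmetic. The Python sketch below uses only the counts quoted in this summary. The `funnel` helper and its field names are illustrative, not part of the research.

```python
# Counts as quoted in this summary of the talk (two weeks of telemetry).
TOTAL_EVENTS = 11.8e12
STAGES = [
    ("detectors", 2.6e9),
    ("deduplication and correlation", 251.4e6),
    ("context-based suppression", 211e6),
    ("prioritization (high/critical)", 81_573),
]


def funnel(total, stages):
    """For each stage, compute what share of all events remains and
    what fraction of its input the stage removed."""
    rows = []
    previous = total
    for name, remaining in stages:
        rows.append({
            "stage": name,
            "remaining": remaining,
            "share_of_events": remaining / total,
            "removed_by_stage": 1 - remaining / previous,
        })
        previous = remaining
    return rows


rows = funnel(TOTAL_EVENTS, STAGES)
for row in rows:
    print(f"{row['stage']:<32} {row['remaining']:>18,.0f}  "
          f"{row['share_of_events']:.7%} of events, "
          f"{row['removed_by_stage']:.2%} removed at this stage")
```

The output reproduces the two percentages quoted above: the detector stage leaves about 0.02% of the original events, and suppression removes about 16% of what correlation passes on. It also shows that deduplication and correlation remove roughly 90% of the detector-stage alerts.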

The system successfully identified an infostealer attack affecting one customer through two high-severity password theft alerts. When analysts examined surrounding medium and lower-severity alerts for context, they discovered anomalous program behavior followed by malware detections and behavioral indicators of credential theft. The pattern repeated twice on the same user's machine, confirming compromise. Sophos contacted the customer and initiated incident response procedures to contain the attack.
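The triage step described here, pivoting from a high-severity alert to the lower-severity alerts around it, can be sketched as a simple time-window query. This is an illustration under stated assumptions, not the analysts' actual tooling. The `Alert` fields, the `SEVERITY_RANK` scale, the `context_for` helper, the one-hour window, and the sample data are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Alert:
    host: str
    severity: str
    title: str
    timestamp: float  # seconds since an arbitrary epoch


def context_for(pivot, alerts, window_seconds=3600.0):
    """Return medium-and-lower alerts on the pivot's host that fall within
    window_seconds of the pivot alert, ordered by time."""
    nearby = (
        a for a in alerts
        if a.host == pivot.host
        and SEVERITY_RANK[a.severity] < SEVERITY_RANK["high"]
        and abs(a.timestamp - pivot.timestamp) <= window_seconds
    )
    return sorted(nearby, key=lambda a: a.timestamp)


# Illustrative data only, loosely mirroring the sequence described above.
alerts = [
    Alert("host-a", "low", "anomalous program behavior", 100.0),
    Alert("host-a", "medium", "malware detection", 400.0),
    Alert("host-a", "high", "password theft", 500.0),
    Alert("host-b", "medium", "malware detection", 450.0),
    Alert("host-a", "low", "unrelated older alert", -10000.0),
]
pivot = alerts[2]
context = context_for(pivot, alerts)
titles = [a.title for a in context]
for a in context:
    print(f"{a.timestamp:>8.0f}s  {a.severity:<6}  {a.title}")
```

Here the query keeps the two lower-severity alerts on the same host that precede the password theft alert, and drops both the alert on another host and the one outside the window. Ordering the results by time is what lets an analyst read the sequence as a story rather than a list.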

Organizations operating large-scale SOCs should implement similar multi-stage filtering rather than relying on single-layer detection. The research demonstrates that combining rule-based detection, machine learning models, and context-aware filtering can reduce alert volumes to manageable levels while preserving the ability to detect threats. Planned future work focuses on applying prioritization models to grouped incidents rather than individual alerts, which could improve detection accuracy by analyzing alerts in aggregate.

Source: https://www.sophos.com/en-us/blog/a-needle-in-a-stack-of-needles-hunting-infostealers-with-ai
