Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

Related

Microsoft Confirms Active Exploitation of Windows Shell CVE-2026-32202

What happened Microsoft has revised its advisory for CVE-2026-32202, a...

Microsoft Releases Emergency Patches for Critical ASP.NET Core Flaw

What happened Microsoft has released an out-of-band security update to...

Microsoft Commits $10 Billion to Expand AI and Cybersecurity Infrastructure in Japan

What happened Microsoft announced a $10 billion investment to expand...

Microsoft and Salesforce Patch AI Agent Flaws That Could Leak Sensitive Data

What happened Microsoft and Salesforce have patched recently disclosed AI...

Share

What happened

Microsoft has developed a lightweight scanner designed to detect backdoors in open-weight large language models (LLMs), according to researchers Blake Bullwinkel and Giorgio Severi. The AI Security team said the tool uses three observable behavioral signals that can be used to flag models that have been tampered with during training, where hidden “backdoor” triggers can remain dormant until specific inputs are encountered. Bullwinkel and Severi explained that the scanner analyzes how trigger-like inputs affect internal model behavior, allowing detection without prior knowledge of the backdoor mechanism. Model poisoning involves covert modifications that cause an LLM to behave normally in most contexts but to change outputs under narrowly defined conditions. Microsoft aims to improve trust in open-weight models by enabling defenders to identify potentially backdoored models at scale, even when they appear benign under typical use.

Who is affected

Developers, enterprises, and organizations that use or deploy open-weight large language models are affected because backdoored models may embed hidden behaviors that compromise model integrity and output trustworthiness. 

Why CISOs should care

The emergence of tooling to detect AI model backdoors reflects a growing category of supply-chain risk and integrity threats in machine learning, where compromised models could lead to unexpected behavior, data leakage, or automated exploitation if left unchecked. 

3 practical actions

  • Assess LLM sourcing practices. Identify and document third-party or open-weight models used in production and evaluate their provenance. 
  • Integrate model integrity scanning. Apply tools like the new backdoor scanner to validate AI models before deployment. 
  • Monitor behavioral anomalies. Track LLM outputs for trigger-linked deviations that could indicate hidden backdoor activation. 
IMG 0514 2
+ posts

John Kevin Hao is a news and feature writer covering cybersecurity, technology, and business targeted for professional audiences.