Patch Diffing + LLMs: ghidriff Featured in New Research and OBTS v8 Talk

October 10, 2025 · 3 min read

Cyber Security Research & Training

"It’s exciting to see open-source tools like ghidriff shaping the research frontier. This new paper validates what many of us have been building toward: diffing as the perfect context for LLMs, and agentic pipelines that turn binary changes into actionable security insight." – CSL

A new research paper has just been published on arXiv:
Binary Diff Summarization using Large Language Models

The paper highlights how ghidriff, our open-source Ghidra-based diffing tool, can serve as the foundation for next-generation vulnerability analysis pipelines. By combining binary diffing with LLM summarization, the authors demonstrate a powerful framework for software supply chain security.

The authors (NYU Tandon + Narf Industries) propose a framework that integrates ghidriff into a larger analysis pipeline:

Binary Diffing with ghidriff → isolates added, deleted, and modified functions between binary versions.
Diff Callgraph Construction → builds a dependency-aware graph of only the changed functions.
LLM Summarization → generates natural language explanations of what changed and why it matters.
Functional Sensitivity Score (FSS) → a novel scoring system to triage risky functions, inspired by CVSS categories (confidentiality, integrity, availability, etc.).
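
To make the middle of that pipeline concrete, here is a minimal sketch of the callgraph and triage steps. It is not the paper's implementation and it does not call ghidriff: the edge list, the set of changed functions, the scores, and the helper names `diff_callgraph` and `triage` are all hypothetical, and the scores merely stand in for whatever an FSS-style scorer would produce.

```python
from collections import defaultdict


def diff_callgraph(call_edges, changed):
    """Keep only the call edges where both caller and callee changed."""
    graph = defaultdict(set)
    for caller, callee in call_edges:
        if caller in changed and callee in changed:
            graph[caller].add(callee)
    return dict(graph)


def triage(scores, threshold):
    """Return (function, score) pairs at or above threshold, highest first."""
    flagged = [(name, score) for name, score in scores.items() if score >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical inputs: in a real pipeline these would come from the diffing
# step and from the scoring step, not from hand-written literals.
call_edges = [
    ("main", "parse_args"),
    ("main", "compress"),
    ("compress", "write_out"),
    ("write_out", "send_beacon"),
]
changed = {"compress", "write_out", "send_beacon"}
scores = {"compress": 2.0, "write_out": 5.5, "send_beacon": 9.0}

graph = diff_callgraph(call_edges, changed)
ranked = triage(scores, threshold=5.0)

print(graph)
print(ranked)
```

Restricting the graph to changed functions keeps the context handed to the model small, and the threshold is an arbitrary illustrative cutoff rather than a value from the paper.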

Benchmark & Results

6 open-source projects (gzip, openssl, tar, sqlite, microhttpd, paho-mqtt)
104 versions → 392 binary diffs → 46,023 functions
Injected 3 malware families (ransomware, RAT, botnet) to simulate supply chain compromises.
Achieved 0.98 precision and 0.64 recall in malware detection.
Case study: correctly detected the XZ utils supply chain backdoor with high FSS separation.
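
For readers who want to put the two detection numbers in context, the sketch below shows the standard definitions of precision and recall, plus the F1 score that follows from the reported values. The confusion counts are toy numbers chosen for illustration, not data from the paper, and the F1 figure is derived here rather than reported by the authors.

```python
def precision(tp, fp):
    """Share of flagged items that were truly malicious."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of truly malicious items that were flagged."""
    return tp / (tp + fn)


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Toy counts (hypothetical): 8 true positives, 2 false positives, 2 misses.
toy_precision = precision(tp=8, fp=2)
toy_recall = recall(tp=8, fn=2)

# F1 implied by the precision and recall quoted in the post.
reported_f1 = f1(0.98, 0.64)

print(toy_precision, toy_recall)
print(round(reported_f1, 2))
```

High precision with lower recall means that flagged functions are almost always real hits, while some injected code still goes unflagged.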

This validates a key idea: diffs provide the perfect context for LLMs. Instead of overwhelming a model with an entire binary, we can focus it on just the changes, which makes the analysis more explainable, efficient, and actionable.
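
As a rough illustration of "just the changes" as model context, the sketch below turns old and new versions of a changed function into a unified diff and assembles a prompt from it. It uses only Python's standard `difflib`. The decompiled snippets, the prompt wording, and the helper names `diff_context` and `build_prompt` are hypothetical and are not taken from ghidriff or the paper.

```python
import difflib


def diff_context(name, old_code, new_code):
    """Render one function's change as a unified diff string."""
    diff = difflib.unified_diff(
        old_code.splitlines(),
        new_code.splitlines(),
        fromfile=f"{name} (old)",
        tofile=f"{name} (new)",
        lineterm="",
    )
    return "\n".join(diff)


def build_prompt(changes):
    """Build a prompt containing only the changed functions' diffs."""
    sections = [diff_context(name, old, new) for name, (old, new) in changes.items()]
    header = "Summarize the security impact of these changes:"
    return header + "\n\n" + "\n\n".join(sections)


# Hypothetical decompiler output for a single modified function.
old = "int check(char *p) {\n    return strlen(p) < 64;\n}"
new = "int check(char *p) {\n    if (p == NULL) return 0;\n    return strlen(p) < 64;\n}"

prompt = build_prompt({"check": (old, new)})
print(prompt)
```

The prompt grows with the size of the change rather than the size of the binary, which is the property the paragraph above relies on.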

OBTS v8 Talk: Taking It Further

Next week at Objective by the Sea v8, CSL will be presenting:
“Reverse Engineering Apple Security Updates”, where we will demonstrate patch diffing in action.

This talk builds directly on the same principles, but applies them to Apple’s opaque IPSW updates and CVE advisories. We’ll demonstrate how agentic patch diffing pipelines can:

⚡ Automate patch triage across Apple security updates
⚡ Compress analysis from days → minutes
⚡ Surface binary truth through a hybrid of deterministic tools + reasoning agents

Why This Matters

Supply chain security: Binary diffing + LLMs can detect injected malware before it propagates downstream.
Vulnerability research: Structured, explainable outputs reduce analyst workload.
Open-source validation: Tools like ghidriff are becoming central to both academic research and real-world triage.
Patch diffing in practice: At OBTS, we’ll show how these ideas scale to Apple’s monthly security updates, uncovering vulnerabilities that advisories leave vague.

The field is moving fast, and it’s rewarding to see community-driven tools shaping the conversation at both the research and practitioner levels.

Join Us at OBTS v8

If you want to see how this all comes together in practice, join us at OBTS v8.
🔗 Talk details

#ReverseEngineering #AppleSecurity #PatchDiffing #BinaryDiffing #LLMs #OBTS #ghidriff
