Publications

Explore our work.

Benchmarks, technical reports, and in-depth research on reliable AI deployment in financial services.

Benchmark · June 2026

PIBench: Prompt Injection Resistance in Agentic Underwriting

Koen Roelofs, Jakob Schmitt, Maximilian Eber, PhD

The first benchmark of prompt-injection resistance for agentic underwriting. Measures defense success across 16 frontier models, three providers, and five attack vectors — with and without untrusted-content tagging.

Benchmark · April 2026

KYBench: Evaluating AI Agents for Adverse Media Research

David Ahn, Maximilian Eber, PhD, Sahith Jagarlamudi

The first public benchmark of AI-driven adverse media investigation. Evaluates detection accuracy, evidence quality, reliability across agent runs, and cost efficiency for eight frontier models and 31 configurations.

Benchmark · March 2026

FinSpread-Bench: Evaluating Agentic AI for Financial Spreading

Nico Klees, Maximilian Eber, PhD

The first public benchmark for agentic financial document processing. Evaluates extraction accuracy, cross-document reasoning, calculation correctness, and structured output quality across seven frontier models. Built on anonymized production data from financial institutions.

Paper · March 2026

AI in AML: A guide to governance and implementation

Dustin Eaton, Maximilian Eber, PhD

Why AML teams must now apply model risk management (MRM) standards to AI systems. Published in ACAMS Today, the paper explores how regulators are extending MRM frameworks to AI deployed in compliance functions, and what institutions need to do to prepare.