Andreas Antonsen, Founder, STNDRDS AB

Forensic Focus sits down with Andreas Antonsen, founder of Swedish forensic-software company STNDRDS AB, to discuss S-EVA — a forensic text-analysis pipeline built for European law-enforcement use cases. We spoke about the deliberate architectural choices behind S-EVA, why source-cited lexicons still matter in an LLM-dominated landscape, and the realities of building domain-specific forensic NLP as a small vendor.

Andreas, thank you for speaking with us. Can you start by telling our readers a bit about your background and how you came to work in digital forensics?

Thanks for having me. I come to forensic NLP from corporate governance, regulatory compliance, and process design rather than traditional digital forensics — and that perspective shapes why STNDRDS exists in the form it does. STNDRDS AB is a small Swedish specialist company; we’ve spent the last couple of years building forensic text-analysis tooling for European law-enforcement use cases — deliberately narrow in scope, deep in the areas we cover.

A lot of digital forensics deals with device artefacts — filesystems, carving, timestamps — and that side of the field is mature. The text layer, especially multilingual chat data, has historically been handled with broad keyword lists or, more recently, with large language models. Neither handles the evidentiary constraints particularly well, and the gap shows up most visibly where a governance background is trained to look: the audit chain, the disclosure obligation, the defensibility of a finding under cross-examination. That gap is where we’ve been building.

Tell us about S-EVA. What does the tool do, and who is it built for?

S-EVA is a text and chat analysis pipeline. It runs after device extraction — typically on chat exports, message archives, or similar text-heavy evidence — and surfaces signals across specialist categories such as drugs, weapons, grooming, CSAM investigative markers, coercive control, threats, stalking, money laundering, and fraud patterns.

It’s designed for European law-enforcement and harm-reduction contexts. The use case it’s built around is the investigator triaging large volumes of chat evidence, where a human review of every message is not feasible and a system needs to surface the material that actually warrants closer attention. A typical seized device might carry tens of thousands of messages; S-EVA’s job is to mark the conversation segments that warrant a close read — and to do so with explicit reasoning the investigator can follow back to the underlying signals.

Architecturally, S-EVA runs entirely offline. No cloud services, no remote inference, no data leaves the investigator’s machine or their agency’s infrastructure. That’s a regulatory and evidentiary constraint in the European LE contexts S-EVA is built for, not a performance preference.

You’ve talked about building on “source-cited lexicons” rather than going down a pure LLM path. Why that choice?

None of this is anti-LLM — we use a local embedding model for semantic confirmation. The standard arguments for source-citation in evidence-grade settings — evidentiary provenance, EU AI Act transparency obligations, the operational reality that European casework often can’t route through cloud services — are well-rehearsed; the engineering belongs in a longer technical write-up, not an interview.

The angle worth dwelling on here is multilingual depth. Multilingual LLMs work best in high-resource languages, and European investigative caseloads increasingly include under-served ones, such as Kurdish, Somali, Tigrinya, Farsi, and various Arabic dialects, where models have limited grip on the slang and regional drift investigators actually need to surface. Curated native-language lexicons with institutional sources still outperform multilingual models in those contexts.

The bodies we lean on day-to-day, such as BRÅ, Europol IOCTA, DCSA, Jellinek, Trimbos, IWF, Kripos, and analogous national agencies, have already done the slow lexicographic work. The operational payoff at the desk is that a finding can be explained by pointing at a specific lexicon entry and its institutional source, in the investigator’s own language, without having to defend a model score.

S-EVA covers a large number of languages. How do you approach lexicon-building for languages where you don’t have native fluency?

Honestly — slowly, and with explicit limits. We currently maintain lexicons in 13 languages, with regex-based pattern coverage extending the system to around 18 overall. Depth varies by language, and we’re transparent about that in the lexicon headers themselves.

The well-covered languages are the ones where strong institutional glossaries exist: Swedish, English, German, Dutch, Italian, Bulgarian, Arabic. DCSA *Universo droga*, for instance, anchors the Italian drug lexicon; Jellinek and Trimbos anchor the Dutch one. For those languages, we can typically cite a specific institutional source for each slang term that has verifiable provenance; for the remaining terms we omit the source field rather than backfill it with unverifiable references. That gap-flagging is part of the discipline.

The languages where we’re more honest about limits — Kurdish, Somali, Tigrinya, Farsi — are where public institutional glossaries are thin or non-existent. There we work with what academic linguistic research exists, flag crowd sources explicitly, keep confidence scores lower, and rely more on investigator review. We don’t invent terms to pad the lexicons. A system that claims “supports 60+ languages” without differentiating depth is misleading in practice.
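To make these provenance rules concrete, here is a minimal Python sketch of how a lexicon entry might record them. Everything in it is hypothetical: the `LexiconEntry` class, the three source tiers, and the per-tier confidence ceilings are illustrative assumptions, not the schema used by S-EVA or by the open-source engine. It encodes two of the rules described above: an entry without a verifiable source keeps an empty source field instead of a backfilled one, and crowd-sourced terms are flagged as such and held to a lower confidence ceiling.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical provenance tiers and confidence ceilings (illustrative values only).
CONFIDENCE_CAP = {"institutional": 1.0, "academic": 0.8, "crowd": 0.5}


@dataclass(frozen=True)
class LexiconEntry:
    term: str
    language: str                 # ISO 639-1 code, e.g. "nl"
    category: str                 # e.g. "drugs"
    source_type: str              # "institutional", "academic" or "crowd"
    source: Optional[str] = None  # stays None when no verifiable source exists
    confidence: float = 0.5

    def __post_init__(self) -> None:
        if self.source_type not in CONFIDENCE_CAP:
            raise ValueError(f"unknown source_type: {self.source_type!r}")
        # Claiming institutional provenance without a citation would be backfilling.
        if self.source_type == "institutional" and self.source is None:
            raise ValueError(f"{self.term!r}: institutional entry needs a source")
        # Weaker provenance tiers are held to a lower confidence ceiling.
        cap = CONFIDENCE_CAP[self.source_type]
        if not 0.0 <= self.confidence <= cap:
            raise ValueError(
                f"{self.term!r}: confidence must be in [0, {cap}] for {self.source_type}"
            )


# Placeholder terms, not real lexicon content.
cited = LexiconEntry("term-a", "nl", "drugs", "institutional", source="Jellinek", confidence=0.9)
crowd = LexiconEntry("term-b", "so", "drugs", "crowd", confidence=0.4)
```

Validating at construction time means a malformed entry fails when the lexicon loads, rather than surfacing later as an uncited finding.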

Let’s talk about one of your specialist detectors. You have a grooming-phase model — DESENSITIZE, DEMAND, EXPLOIT. Why those three phases specifically?

The grooming literature has a range of phase models — O’Connell’s early work, Elliott & Beech, Winters & Jeglic among others — and most have four to six phases. We deliberately only operationalise three.

The reason is that the earlier phases in most stage models — contact, access, trust-building — are textually indistinguishable from ordinary rapport-building in writing. A mentor messaging a student, a relative checking in, an older friend being warm — they all look the same as someone in the early stages of grooming, linguistically. A system that fires on those phases produces enormous false-positive rates in ordinary conversation data.

DESENSITIZE, DEMAND, and EXPLOIT are the phases where the textual signal is distinguishable. Sexualised framings directed at a minor, image requests with secrecy enforcement, transactional coercion with evidence-suppression language — those are real signals with meaningfully different language from ordinary conversation. So the scope decision is intentional: the model is a triage layer for phases we can detect with reasonable precision, not a complete grooming-detection claim.

It’s also important that the system is positioned as investigator triage, not as a verdict. Even within the phases we detect, human review is the point of the output. The system surfaces trajectories for an investigator to review — it doesn’t declare relationships criminal on its own authority.

Coercive control is another area you’ve built a detector for, and it’s rarely handled well in digital forensic tools. Why is it hard?

Coercive control is hard because the evidence is cumulative rather than punctual. A single message saying “I’m the only one who really understands you” is indistinguishable from genuine affection. The same line repeated across weeks — combined with isolation from friends, memory-questioning of shared events, and gradual financial control — is a pattern a reader would recognise as controlling. No individual line carries the signal.
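One way to express “cumulative rather than punctual” in code is to aggregate per-message signals over time and raise a flag only when they recur. The Python sketch below is a hypothetical illustration, not S-EVA’s scoring (which is not part of the open-source release): it groups matched categories by ISO calendar week and returns True only when signals appear in at least `min_weeks` distinct weeks and span at least `min_categories` categories. The function name and both default thresholds are arbitrary assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, Tuple

# (message timestamp, coercive-control category matched in that message)
Signal = Tuple[datetime, str]


def cumulative_pattern(
    signals: Iterable[Signal], min_weeks: int = 3, min_categories: int = 2
) -> bool:
    """Flag only recurring, multi-category signals.

    With the default thresholds a single message can never trip the flag.
    The thresholds are illustrative assumptions, not calibrated values.
    """
    categories_by_week = defaultdict(set)
    for ts, category in signals:
        iso_year, iso_week, _ = ts.isocalendar()
        categories_by_week[(iso_year, iso_week)].add(category)
    all_categories = set().union(*categories_by_week.values())
    return len(categories_by_week) >= min_weeks and len(all_categories) >= min_categories
```

Grouping by week rather than counting raw messages keeps a single burst of heated messages from looking like a sustained pattern.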

Keyword lists don’t work because the vocabulary is ordinary. LLM-only approaches can produce plausible concept matches but often can’t explain which specific messages drove the flag — that’s the provenance problem again.

Our approach is pattern-plus-context detection across five categories: gaslighting, isolation, love-bombing, blame-shifting, and economic control. Each currently has native-language pattern libraries for Swedish and English. A detection fires only when both a behavioral pattern *and* a relationship-context indicator (first/second-person pronouns, relational framing) are present in the same message. That compound gate substantially reduces false positives on therapy conversations, case notes, and descriptive third-person text.
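The compound gate can be sketched in a few lines of Python. The patterns below are toy English examples invented for illustration, `gated_categories` is a hypothetical name, and the context check uses only first/second-person pronouns (the relational-framing cue is left out); none of it reflects S-EVA’s actual pattern libraries. A behaviour pattern only counts when the same message also contains a context indicator, so a third-person sentence describing the same behaviour is suppressed.

```python
import re
from typing import List

# Toy English patterns for two of the five categories (illustrative only).
BEHAVIOUR_PATTERNS = {
    "isolation": re.compile(
        r"\b(stop|quit) (seeing|talking to) (your|those) friends\b", re.IGNORECASE
    ),
    "gaslighting": re.compile(
        r"\bthat never happened\b|\byou('re| are) imagining (things|it)\b", re.IGNORECASE
    ),
}

# Relationship-context indicator: first/second-person pronouns only, in this sketch.
CONTEXT_PATTERN = re.compile(r"\b(i|me|my|you|your|we|us)\b", re.IGNORECASE)


def gated_categories(message: str) -> List[str]:
    """Return categories whose pattern matches AND whose message has relational context."""
    if not CONTEXT_PATTERN.search(message):
        return []  # no first/second-person framing: the gate stays closed
    return [name for name, pattern in BEHAVIOUR_PATTERNS.items() if pattern.search(message)]
```

For example, “You need to stop seeing those friends.” yields `["isolation"]`, while “He told her to stop seeing those friends.” yields an empty list because it contains no first- or second-person pronoun.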

Even so, the approach misfires in mental-health support contexts, roleplay communities, and ordinary relationship conflict. We’re honest about that. The operational assumption is that an investigator reviews findings before acting on them.

Building a small forensic-software company from Sweden — what has been the hardest part?

The hardest part is the distribution mismatch between what we can build and what we can sell on our own. Three people can build surprisingly capable domain-specific tooling. Three people cannot globally distribute and support that tooling to law-enforcement agencies across multiple countries.

That shapes where S-EVA is headed. We’re not trying to be a standalone enterprise SaaS — that model doesn’t fit a specialist team building deep capability in a narrow domain. The realistic path is the one specialist forensic vendors have taken before us: complementing the major platforms where depth-over-breadth matters, rather than competing head-to-head. The work is making the depth real first; distribution follows from that.

You’ve released part of S-EVA as open source. What’s in the release and why did you decide to open-source it?

We’ve released the lexicon-matching engine and six demonstration lexicons under the Apache 2.0 license, with the repository at github.com/stndrdsab/seva-lexicon-engine. The engine is the core data-driven detection layer — lexicon loading and validation, context-aware matching, the ‘LexiconHit’ output schema with per-term provenance preserved through to output.
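The idea of per-term provenance surviving through to output can be illustrated with a small Python sketch. The `Entry` and `Hit` classes and the `match` function are hypothetical stand-ins: they are not the repository’s ‘LexiconHit’ schema or API, and the engine’s context-aware matching is not modelled. The point is only that each hit copies its source field from the lexicon entry that produced it, so a reviewer can trace any finding back to its citation, or see that it has none.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    term: str
    category: str
    source: Optional[str]  # None when no verifiable source exists


@dataclass(frozen=True)
class Hit:
    term: str
    category: str
    source: Optional[str]  # copied from the entry, never re-derived
    message_id: str
    start: int             # character offsets into the message text
    end: int


def match(message_id: str, text: str, entries: List[Entry]) -> List[Hit]:
    """Case-insensitive whole-term matching that carries provenance into every hit."""
    hits = []
    for entry in entries:
        # Lookarounds instead of \b so terms starting or ending in punctuation still work.
        pattern = re.compile(r"(?<!\w)" + re.escape(entry.term) + r"(?!\w)", re.IGNORECASE)
        for m in pattern.finditer(text):
            hits.append(
                Hit(entry.term, entry.category, entry.source, message_id, m.start(), m.end())
            )
    return sorted(hits, key=lambda h: h.start)
```

Because the hit holds the source itself rather than a lookup key, the provenance stays attached even if the output is exported away from the lexicon files.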

What’s *not* in the release: the behavioral phase detectors for grooming and coercive control, the multi-signal scoring, the CDR analysis, the full pipeline. Those remain in the commercial stack.

Open-sourcing the engine is partly credibility — a reader who wants to understand whether the approach actually works needs to be able to run it — and partly because source-citation discipline only has value if it’s portable. A curated lexicon tradition is better than none. If other teams adopt the pattern, that’s a good outcome for the field even if it doesn’t directly benefit us commercially.

Some recent Forensic Focus interviews have featured AI-first forensic vendors who take a substantially different architectural view — more model-driven, more willing to rely on LLM outputs as primary signals. How do you see that debate?

I don’t think it’s really a single debate — it’s two different products solving two different problems, often discussed as if they were the same. Exploratory AI analysis — broad triage across unstructured corpora, anomaly detection — is one product. Evidence-grade detection — findings that go into a prosecution brief and have to survive cross-examination — is a different product. Both are legitimate.

What I find harder is how the conversation is structured in the market. Marketing incentives push toward conflating the two — the AI story is more compelling, the evidence-grade story more constraining — so procurement officers end up comparing tools that are doing different jobs. As a small vendor with no marketing budget for ambiguity, we’re forced to be specific about what S-EVA is and isn’t built for.

S-EVA is built for the evidence-grade end. The two approaches aren’t substitutes, and I’d rather we all be honest about which category we’re in than recycle the same architecture argument back and forth.

Where do you see forensic text analysis heading over the next few years?

Two practical things that aren’t getting enough attention.

First, the under-served-language gap. Chat data increasingly arrives in Kurdish, Somali, Tigrinya, Farsi, and various Arabic dialects, languages that lack the institutional glossary tradition that exists for Dutch or Italian. Multilingual LLMs help but don’t solve it: translation-first pipelines flatten the regional slang that often matters most in investigations. Whoever invests in those lexicons (public agencies, NGOs, academic groups, vendors) is doing infrastructure work the field will rely on for a decade.

Second, cross-jurisdictional disclosure. As more forensic tools embed AI components, the question of what disclosure obligations follow — to defence counsel, to courts, to data-protection regulators — is going to be tested in real cases. Tools producing structured, source-cited outputs will have an easier time of that than tools where disclosure means “we ran your client’s messages through a model and here’s the score.” The pressure isn’t theoretical; it’s coming.

What’s next for STNDRDS and S-EVA?

In the near term: deepening the lexicon portfolio for the migrant-origin languages where institutional sources are thinner. Expanding the coercive-control detector beyond Swedish and English. Hardening the offline semantic-confirmation layer. A lot of quiet, incremental work.

On the business side, we’d rather be indispensable in a narrow area than mediocre across a wide one.

The broader bet, if I’m honest about it: the field is moving toward increasingly opaque AI systems. We think forensic NLP needs to move in the opposite direction — narrower claims, clearer provenance, and outputs investigators can actually explain in court.

Thank you, Andreas!

Thanks — and I appreciate Forensic Focus giving space to small-vendor perspectives. There’s an awful lot of forensic tool-building happening outside the enterprise vendors, and it’s genuinely useful for the community that you cover it.

Andreas Antonsen is the founder of STNDRDS AB, a Swedish forensic-software company. S-EVA is available commercially; a minimal open-source reference implementation of the lexicon engine is at github.com/stndrdsab/seva-lexicon-engine under Apache 2.0.