Method

Two-stage pipeline. No autonomous LLM research.

We strictly separate data collection from evaluation. Stage 1 structures the collected data; Stage 2 evaluates only that data. It's boring, and that's why it works.

STAGE 00

Data sources

Deterministic crawlers pull HTML, schema markup, backlink data and LLM citation outputs (Perplexity API, OpenAI with web_search, Anthropic with web_search). No LLM selection, no possibility of hallucination.

n8n · Cheerio · Perplexity API · OpenAI Search · Anthropic Search
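A minimal sketch of what "deterministic" means in this stage: schema markup is extracted from fetched HTML by plain parsing, with no model in the loop. The helper `extractJsonLd` and the sample HTML are illustrative, not the production crawler (which uses Cheerio).

```typescript
// Extract JSON-LD schema blocks from raw HTML deterministically.
// extractJsonLd is a hypothetical stand-in for the crawler step.
type JsonLdBlock = Record<string, unknown>;

function extractJsonLd(html: string): JsonLdBlock[] {
  const blocks: JsonLdBlock[] = [];
  const re = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    try {
      blocks.push(JSON.parse(m[1]) as JsonLdBlock);
    } catch {
      // Malformed JSON-LD is skipped here; the real pipeline records it as a finding.
    }
  }
  return blocks;
}

const html = `<html><head>
  <script type="application/ld+json">{"@type":"Organization","name":"Example"}</script>
</head></html>`;
const blocks = extractJsonLd(html);
console.log(blocks[0]["@type"]); // "Organization"
```

The same input HTML always yields the same blocks, which is the property the later stages depend on.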

STAGE 01

Stage 1: Data Engineer

An LLM normalises raw data into a strict, JSON-schema-validated format. Temperature=0. No web research. No external tool calls. It receives data and returns structured data. That's it.

Anthropic Claude · OpenAI ChatGPT · Schema-constrained JSON · Frontier models
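The Stage-1 contract can be sketched as follows: anything the model returns is validated against a strict schema before Stage 2 ever sees it, and a violation aborts the run. The interface and field names below are illustrative, not the production schema.

```typescript
// Hypothetical Stage-1 output record; the real schema is richer.
interface NormalisedPage {
  url: string;
  title: string;
  hasSchemaMarkup: boolean;
  citedBy: string[]; // which LLM engines cited this URL
}

// Reject anything that does not match the schema exactly.
function validateNormalised(raw: unknown): NormalisedPage {
  const o = raw as Record<string, unknown>;
  if (
    typeof o?.url !== "string" ||
    typeof o?.title !== "string" ||
    typeof o?.hasSchemaMarkup !== "boolean" ||
    !Array.isArray(o?.citedBy)
  ) {
    throw new Error("Stage 1 output violates schema; pipeline aborts here");
  }
  return o as unknown as NormalisedPage;
}

const ok = validateNormalised({
  url: "https://example.com",
  title: "Example",
  hasSchemaMarkup: true,
  citedBy: ["perplexity"],
});
```

In production the same effect comes from the providers' schema-constrained JSON output modes; the hand-written check shows the failure behaviour the pipeline relies on.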

STAGE 02

Stage 2: Senior Consultant

A second LLM evaluates the structured data from Stage 1, prioritising by impact and effort. Output: a prioritised action list, a rationale per action, and estimated implementation time. Again: Temperature=0, no research of its own.

Anthropic Claude · Version-controlled system prompt · Reproducible outputs
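The prioritisation rule can be sketched as ranking actions by impact relative to effort. The scoring scales and example actions are illustrative assumptions, not the consultant prompt's actual rubric.

```typescript
// Hypothetical action record with illustrative scales.
interface Action {
  title: string;
  impact: number; // 1 (low) .. 5 (high)
  effortHours: number;
}

// Rank by impact per hour of effort, highest first.
function prioritise(actions: Action[]): Action[] {
  return [...actions].sort(
    (a, b) => b.impact / b.effortHours - a.impact / a.effortHours
  );
}

const ranked = prioritise([
  { title: "Rewrite meta descriptions", impact: 2, effortHours: 4 },
  { title: "Add Organization schema", impact: 4, effortHours: 2 },
  { title: "Restructure docs hub", impact: 5, effortHours: 40 },
]);
console.log(ranked[0].title); // "Add Organization schema"
```

Quick wins (high impact, low effort) surface first; large projects land at the bottom of the Sprint-1 list even when their absolute impact is highest.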

STAGE 03

Output

30+ page audit report (PDF, Markdown, Notion export available). Three top actions for Sprint 1. Schema snippets as PR-ready templates. JSON data export for internal tools. Versioned throughout.

@react-pdf/renderer · MDX · GitHub PR template
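A sketch of what a PR-ready schema snippet in the report might look like: a small generator producing drop-in JSON-LD. The function and the Organization fields are placeholders, not client data or the report's exact template.

```typescript
// Hypothetical generator for a PR-ready Organization JSON-LD snippet.
function organizationJsonLd(name: string, url: string): string {
  return JSON.stringify(
    {
      "@context": "https://schema.org",
      "@type": "Organization",
      name,
      url,
    },
    null,
    2 // pretty-printed so the snippet pastes cleanly into a PR
  );
}

const snippet = organizationJsonLd("Acme GmbH", "https://acme.example");
console.log(snippet.includes('"@type": "Organization"')); // true
```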

FAQ

Common questions about the pipeline.

Why no autonomous LLM web research?

Autonomous web research by an LLM means: the LLM decides itself which sources to consult. Pragmatic for many applications — dangerous for an audit. Hallucinations, poor source selection and unverifiable results make outputs unusable. We strictly separate: a deterministic data fetcher pulls defined sources, the LLM evaluates only that structured data.

Which LLMs do you use in each stage?

Stage 1 (Data Engineer) and Stage 2 (Senior Consultant) both run on current frontier models from Anthropic (Claude) and OpenAI (ChatGPT). Both model families support schema-constrained JSON outputs — essential for reproducible results. Model choice is configurable per client.

How do you handle data protection?

Data flowing into the pipeline are aggregated crawl data and LLM citation outputs — no personal data. Where personal data becomes relevant on pilot/retainer projects (e.g. author profiles), we operate under DPA per Art. 28 GDPR, with documented TOMs and EU region compute (Cloudflare EU, OpenAI EU, Anthropic EU where available).

How reproducible is the pipeline?

Both LLM stages run with Temperature=0 and a fixed seed where available. Outputs are JSON-schema validated. Identical inputs produce identical outputs — important for audit repetitions and comparability over time. Competitor audits running at Temperature=0.7 cannot reproduce their own results.
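The reproducibility check itself is simple: run a stage twice on identical input and require byte-identical serialised output. Here `runStage` is a pure-function stand-in for the Temperature=0 LLM call, an assumption for illustration only.

```typescript
// Stand-in for a deterministic pipeline stage (in reality an LLM call
// at Temperature=0 with a fixed seed where available).
function runStage(input: { urls: string[] }): { count: number; urls: string[] } {
  const urls = [...input.urls].sort(); // canonical ordering for stable output
  return { count: urls.length, urls };
}

// Canonical serialisation: identical outputs become identical strings.
function canonical(o: unknown): string {
  return JSON.stringify(o);
}

const input = { urls: ["https://b.example", "https://a.example"] };
const runA = canonical(runStage(input));
const runB = canonical(runStage(input));
console.log(runA === runB); // true
```

The string comparison is the whole test: if two runs on the same input ever differ, determinism is broken somewhere in the stage.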

Convinced by the method? Talk to us.

Mini-Audit: 30 minutes in which we show live how the pipeline runs on your domain.

Book a slot