About

Audit the workflow. Not the model.

Every AI tool ships assuming you already know how to use it. Most practitioners don't — not because they lack the talent, but because no tool ships with the framework. LLM-DX is that framework.

The problem

Most AI frustration isn't the AI.

Sessions degrade as context fills — often earlier than practitioners expect. Briefings get re-explained five times in a single conversation. Prompts ask for outputs the model has no way to produce because the inputs were never structured. The model gets the blame. The workflow gets a pass.

The practitioners who get the most out of these tools aren't using a better model. They're running a tighter loop.

AI fluency research is still developing. Anthropic's 2026 Education Report measured fluency behaviours at population scale using the 4D AI Fluency Framework (Dakan & Feller). By contrast, llm-dx operates at the level of the individual practitioner: diagnosis, correction, tracking.

Sources: Anthropic (2026). Education Report: AI Fluency. anthropic.com · Dakan, R. & Feller, J. (2025). The 4D AI Fluency Framework, in collaboration with Anthropic.

The switch

Diagnosis is the entry point. Practice is the destination.

The "dx" in llm-dx is clinical shorthand for diagnosis. It's where you start — not where you stop. The score tells you where the workflow breaks down. The corrections give you the next move. The history shows whether you're actually improving or just retaking the test.

Become the practitioner your AI assumes you already are.

Why this exists

No tool ships with the framework you need to use it well.

Claude, Gemini, the rest — they all ship assuming you already know how to brief, scope, structure, and discern. Most people don't. There's no shame in that; there's just no curriculum. This is the curriculum. Read the methodology →

Five principles

What this framework holds to.

  1. Diagnose before you upgrade.
     A new model won't fix a workflow problem. Measure the workflow first.
  2. Score the practice, not the output.
     Output quality is a downstream effect. The leverage is upstream, in setup, context, and discipline.
  3. Improvement compounds where you measure it.
     Most practitioners track nothing. The ones who improve fastest track the same seven things repeatedly.
  4. Tokens are a quality signal.
     Wasted tokens are the receipt for a workflow gap. Efficiency isn't frugality; it's evidence of structure.
  5. The framework is the product. The model is the substrate.
     Claude is the substrate today. The dimensions hold for the next model and the one after that.

The visual

What you're looking at in the background.

The motion behind every page is a generative system called Emergent Calibration. Particles are born into noise and gradually align into structured flow. They age through four colour states — dark indigo, bright indigo, green, and gold — that map directly onto the four assessment tiers: Foundational, Developing, Proficient, Optimised.

It's the practitioner journey rendered live. Chaos to structure. Undiagnosed to deliberate.

Writing

Research, analysis, and things worth saying.

The blog is where the thinking happens in public. Practitioner workflow, frontier technology, renewable energy, macroeconomics: topics where being direct and doing the research matter more than having the approved take.

If you find something useful, it will usually lead back to a correction or a question worth running through the assessment. Read the blog →

Who built this

Practitioner, not vendor.

Gavin Fitzpatrick
Cybersecurity & GRC practitioner · Dublin

I work in security, the kind of environment where structured inputs, traceable evidence, and disciplined process are the difference between a controlled outcome and an avoidable incident. That discipline is the lens I brought to AI.

Two years of running LLMs in production, across chat assistants, agentic coding tools, orchestration platforms, and self-hosted models, taught me that most AI frustration is a workflow problem the practitioner is calling a model problem. Those two years cost evenings, weekends, and a real token budget, and they showed me that the same dimensions hold across every tool I tried. LLM-DX is a workflow framework, not a model framework, because the model is not the leverage point. The leverage point is the loop the practitioner runs around it.

The same discipline shapes how I build. state-machine-governance is a reference architecture for building security tooling with agentic AI, where the rules of the system are enforced as state machines rather than described in documents. The human designs the architecture and the invariants. The agent generates the code against them. It is published as an open specification, not source code, by design. The specification is the asset. The generated code is disposable.
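
To make "rules enforced as state machines" concrete, here is a minimal, hypothetical sketch in Python. It is not taken from the state-machine-governance specification. The finding lifecycle, the state names, and the `Finding`, `IllegalTransition` and `ALLOWED_TRANSITIONS` names are all illustrative assumptions. What it shows is the general idea: the allowed transitions live in a table that the code checks at runtime. Any caller that tries to skip a step therefore fails loudly instead of quietly breaking a written policy.

```python
from __future__ import annotations

from enum import Enum, auto


class FindingState(Enum):
    """Hypothetical lifecycle states for a security finding (illustrative only)."""

    OPEN = auto()
    TRIAGED = auto()
    REMEDIATED = auto()
    VERIFIED = auto()
    CLOSED = auto()


# The rule lives in a table the code checks, not in a policy document:
# each state lists the only states it may move to next.
ALLOWED_TRANSITIONS: dict[FindingState, frozenset[FindingState]] = {
    FindingState.OPEN: frozenset({FindingState.TRIAGED}),
    FindingState.TRIAGED: frozenset({FindingState.REMEDIATED}),
    # Remediation that fails review may return to triage.
    FindingState.REMEDIATED: frozenset({FindingState.VERIFIED, FindingState.TRIAGED}),
    FindingState.VERIFIED: frozenset({FindingState.CLOSED}),
    FindingState.CLOSED: frozenset(),  # terminal state: no exits
}


class IllegalTransition(Exception):
    """Raised when code attempts a move the transition table forbids."""


class Finding:
    """A finding whose state is meant to change only through transition()."""

    def __init__(self, finding_id: str) -> None:
        self.finding_id = finding_id
        self.state = FindingState.OPEN
        self.history: list[FindingState] = [self.state]

    def transition(self, target: FindingState) -> None:
        # Enforce the invariant at runtime, regardless of who (or what) wrote the caller.
        if target not in ALLOWED_TRANSITIONS[self.state]:
            raise IllegalTransition(
                f"{self.finding_id}: {self.state.name} -> {target.name} is not allowed"
            )
        self.state = target
        self.history.append(target)


if __name__ == "__main__":
    finding = Finding("F-001")
    finding.transition(FindingState.TRIAGED)
    try:
        # Attempt to skip remediation and verification.
        finding.transition(FindingState.CLOSED)
    except IllegalTransition as err:
        print(err)  # F-001: TRIAGED -> CLOSED is not allowed
```

In this sketch, the transition table plays the role that the text gives to the specification. The surrounding code could be regenerated freely, and any generated path that tries to jump from triage straight to closed would still be rejected at runtime.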

And it shapes what I write. The trust model holding open-source infrastructure together was never properly engineered, and the AI capability shift turned a slow structural problem into an operational one. The gap is no longer the tooling. It is the accountability, and it is being built by the largest beneficiaries on their own timeline.

Start here

You can't fix what you haven't measured.