Governance Is a State Machine. We've Been Treating It Like a Spreadsheet.

How I built custom GRC tooling with agentic AI, the methodology behind it, and why this changes the build-vs-buy equation for every organisation.

Gavin Fitzpatrick · March 2026

GRC tools don't fail because of bad UX. They fail because they encode the wrong mental model.

Governance is a dynamic, interconnected state machine. A control failure doesn't sit in isolation. It alters the exposure profile of every risk linked to it. A missed SLA shouldn't just log a timestamp. It should trigger a formal escalation path and freeze the residual risk score until remediation is confirmed.
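To make the state-machine framing concrete, here is a minimal in-memory sketch of those two behaviours. Everything in it is an assumption for illustration: the class and function names, the fixed score penalty, and the reading of "freeze" as "the score cannot be lowered until remediation is confirmed".

```python
from dataclasses import dataclass, field


@dataclass
class Risk:
    name: str
    residual_score: int
    escalated: bool = False
    frozen: bool = False  # while frozen, the residual score cannot be lowered


@dataclass
class Control:
    name: str
    linked_risks: list = field(default_factory=list)


def on_control_failure(control, penalty=2):
    """Propagate a failed control test to every linked risk (penalty is illustrative)."""
    for risk in control.linked_risks:
        risk.residual_score += penalty


def on_sla_missed(risk):
    """A missed SLA escalates the risk and freezes its residual score."""
    risk.escalated = True
    risk.frozen = True


def lower_residual_score(risk, new_score):
    """Re-scoring downwards is refused until remediation is confirmed."""
    if risk.frozen:
        raise PermissionError(f"{risk.name}: residual score is frozen pending remediation")
    risk.residual_score = min(risk.residual_score, new_score)


def confirm_remediation(risk):
    """Lift the freeze once remediation has been confirmed."""
    risk.frozen = False
```

The point of the sketch is that one event changes the state of other records, rather than producing an isolated log entry.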

Legacy tools treat these as separate events managed by separate people in separate Jira tickets. That gap between how governance actually works and how our tools model it is where compliance theatre lives.

I've spent years building teams to manually action these state changes across Slack, Jira, and Excel. I've watched GRC platforms become a "single source of truth" that only gets updated at audit time because nobody wants to touch them between cycles.

The answer has always been custom internal tooling. At Meta, I worked alongside software engineering teams to build exactly these tools. The problem was never the vision. It was getting engineering time. Competing demands, shifting priorities, and a perpetual backlog meant governance tooling was always deprioritised against revenue-generating work.

Agentic AI has changed that equation. These tools are your on-demand engineering team. Provide the specification and architecture, and they will build, debug, deploy, and iterate. The constraint is no longer engineering capacity. It's whether you have the architectural clarity to direct them.

So I stopped writing policy documents and started writing system invariants.

The distinction: A policy document describes what should happen. A system invariant enforces what can happen. The difference is architectural.

In practice: a risk record cannot transition from treatment to residual scoring unless the linked control has a verified remediation date, the treatment owner has confirmed closure, and no findings above the agreed severity threshold remain open. The state gate rejects the transition at the database layer. The write simply doesn't happen.
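A minimal sketch of such a gate, using SQLite from Python's standard library as a stand-in database. The table names, column names, phase labels, and the severity threshold of 3 are all assumptions made for illustration, not the platform's actual schema.

```python
import sqlite3

GATE_SCHEMA = """
CREATE TABLE controls (
    id INTEGER PRIMARY KEY,
    remediation_verified_on TEXT              -- NULL until remediation is verified
);
CREATE TABLE risks (
    id INTEGER PRIMARY KEY,
    control_id INTEGER NOT NULL REFERENCES controls(id),
    phase TEXT NOT NULL DEFAULT 'treatment',
    owner_confirmed_closure INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE findings (
    id INTEGER PRIMARY KEY,
    risk_id INTEGER NOT NULL REFERENCES risks(id),
    severity INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'open'
);
-- The gate: any attempt to move a risk from treatment to residual scoring
-- is aborted inside the database unless all three preconditions hold.
CREATE TRIGGER gate_treatment_to_residual
BEFORE UPDATE OF phase ON risks
WHEN OLD.phase = 'treatment' AND NEW.phase = 'residual_scoring'
BEGIN
    SELECT RAISE(ABORT, 'gate: control remediation not verified')
    WHERE (SELECT remediation_verified_on FROM controls
           WHERE id = NEW.control_id) IS NULL;
    SELECT RAISE(ABORT, 'gate: treatment owner has not confirmed closure')
    WHERE NEW.owner_confirmed_closure = 0;
    SELECT RAISE(ABORT, 'gate: open findings above severity threshold')
    WHERE EXISTS (SELECT 1 FROM findings
                  WHERE risk_id = NEW.id AND status = 'open' AND severity > 3);
END;
"""


def make_gate_db():
    """Create an in-memory database with the gate installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(GATE_SCHEMA)
    return conn


def try_advance(conn, risk_id):
    """Attempt the transition; return False if the database rejects the write."""
    try:
        with conn:
            conn.execute(
                "UPDATE risks SET phase = 'residual_scoring' WHERE id = ?",
                (risk_id,),
            )
        return True
    except sqlite3.DatabaseError:
        return False
```

Because the check lives in a trigger, it applies to every client that writes to the table, not only to the application that happens to remember to call a validation function.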

[Diagram: A multi-phase risk lifecycle with hard-coded phase gates, treatment decision enforcement, and cross-entity invariant propagation.]

I built a multi-module risk platform around this principle:

Multi-phase lifecycle where every transition passes through a hard-coded gate
Mandatory precondition checks before scoring begins
Architecture review validation before treatments advance
Treatment decision enforcement: Accept requires a time-bound expiry, Mitigate requires linked controls, Transfer and Avoid require documented rationale
Residual validation gate requiring confirmed mitigations, evidence, effectiveness, governance approval, and drift tracking before any score updates
Cross-entity propagation: a control test failure automatically impacts linked risk scores, a policy change cascades to mapped controls
All enforced at the database layer, not the application layer
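The treatment decision rules in the list above can be sketched as a single validation function. This version is written in application-level Python purely for readability; the field names, the lowercase decision labels, and the choice to require a future expiry date are assumptions, and a database-layer implementation would express the same rules as constraints or triggers.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class TreatmentDecision:
    kind: str                                   # "accept", "mitigate", "transfer" or "avoid"
    expiry: Optional[date] = None               # required for "accept"
    linked_controls: list = field(default_factory=list)  # required for "mitigate"
    rationale: str = ""                         # required for "transfer" and "avoid"


def validate_treatment(decision, today):
    """Return a list of rule violations; an empty list means the decision may proceed."""
    errors = []
    if decision.kind == "accept":
        if decision.expiry is None or decision.expiry <= today:
            errors.append("accept requires a time-bound expiry in the future")
    elif decision.kind == "mitigate":
        if not decision.linked_controls:
            errors.append("mitigate requires at least one linked control")
    elif decision.kind in ("transfer", "avoid"):
        if not decision.rationale.strip():
            errors.append(f"{decision.kind} requires a documented rationale")
    else:
        errors.append(f"unknown treatment kind: {decision.kind}")
    return errors
```

Returning every violation instead of raising on the first one makes it easier to show a treatment owner exactly what is missing.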

This is the data model doing governance work that policy documents have always promised but never delivered.

I Built 3 Prototype Tools in 2 Weeks. Here's the Methodology.

In the last two weeks I've built three prototype tools. One with Claude Code, two with Lovable. More importantly, every one of them works. Not because I wrote production code. I didn't write a single line. Because I wrote precise specifications.

My journey started in Claude, where I took a granularly designed governance framework and codified it into a machine-readable rule set:

Object states and rules governing state transitions
Relationships between entities
Variables, thresholds, and scoring formulae
Hard invariants defining what the system must never allow
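As an illustration of what "machine-readable" can mean here, the four elements above fit into a plain data structure plus a few functions that read it. The state names, the threshold, the invariant wording, and the likelihood-times-impact formula are hypothetical examples, not the actual rule set.

```python
RULES = {
    # Object states and the transitions allowed between them
    "states": ["identified", "scoring", "treatment", "residual_scoring", "closed"],
    "transitions": {
        "identified": ["scoring"],
        "scoring": ["treatment"],
        "treatment": ["residual_scoring"],
        "residual_scoring": ["treatment", "closed"],
    },
    # Variables and thresholds
    "thresholds": {"max_open_finding_severity": 3},
    # Hard invariants, stated in plain language for the agent and the reviewer
    "invariants": [
        "accept decisions must carry an expiry date",
        "residual scoring requires a verified control remediation date",
    ],
}


def is_allowed(rules, current, target):
    """True if the rule set permits moving from `current` to `target`."""
    return target in rules["transitions"].get(current, [])


def undeclared_states(rules):
    """States used in transitions but missing from the declared state list."""
    declared = set(rules["states"])
    used = set(rules["transitions"])
    for targets in rules["transitions"].values():
        used.update(targets)
    return used - declared


def inherent_score(likelihood, impact):
    """Illustrative scoring formula: likelihood multiplied by impact."""
    return likelihood * impact
```

A consistency check like `undeclared_states` is cheap to run before any prompt is written, and it catches contradictions in the rule set before they become contradictions in generated code.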

That codified rule set became the foundation. Before any code was generated, every module had the same artefacts: data model, architecture document, modules breakdown with acceptance criteria, tech stack decision record, security document, README, and a persistent AI briefing file that re-grounds the agent at the start of every session.

[Diagram: The specification-driven agentic development workflow: codified rules to specification artefacts to prompt-build-validate cycle.]

The prompt cycle

The build follows a strict prompt cycle. Each prompt is self-contained with schema definitions, business logic, and an explicit Definition of Done. The AI builds against the prompt. Every acceptance criterion gets checked. Pass: pin a stable version, move to the next prompt. Fail: revert to the last pin, fix the specification, rebuild.
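The control flow of that cycle can be sketched in a few lines. In the real workflow the build is done by the agent and the specification fix by a person; here both are passed in as callables so the loop itself is runnable. `Prompt`, `run_prompt_cycle`, and the three `fake_*` stand-ins are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Prompt:
    name: str
    spec: str
    criteria: tuple  # acceptance criteria forming the Definition of Done


def run_prompt_cycle(prompts, build, check, fix_spec, max_rebuilds=3):
    """Build each prompt in order; pin on pass, fix the spec and rebuild on fail."""
    pins = []  # (prompt name, artefact) for every stable version pinned so far
    for prompt in prompts:
        for _ in range(max_rebuilds + 1):
            artefact = build(prompt)
            failed = [c for c in prompt.criteria if not check(artefact, c)]
            if not failed:
                pins.append((prompt.name, artefact))  # pass: pin and move on
                break
            # Fail: the failed artefact is discarded (the last pin is untouched)
            # and the specification, not the generated code, is what gets fixed.
            prompt = fix_spec(prompt, failed)
        else:
            raise RuntimeError(f"{prompt.name}: still failing after {max_rebuilds} rebuilds")
    return pins


# Stand-ins so the loop can be exercised without an agent.
def fake_build(prompt):
    return prompt.spec.lower()


def fake_check(artefact, criterion):
    return criterion in artefact


def fake_fix(prompt, failed):
    return replace(prompt, spec=prompt.spec + " " + " ".join(failed))
```

Note that nothing in the loop edits an artefact directly. The only thing that changes between attempts is the specification.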

Core principle: When the AI breaks, the instinct to open source files and debug syntax is a trap. It costs hours and produces brittle fixes. Fix the specification. Rebuild from it. That's the core skill of agentic engineering.

Early in the build, the AI scaffolded a scoring module before the control-effectiveness weighting was locked in the schema. Clean code against the wrong data model. I didn't debug it. I updated the spec to make control-effectiveness a required foreign key with a NOT NULL constraint, then rebuilt. Fifteen minutes. Rebuilding the wrong architecture later would have been days.
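The kind of schema change described above can be sketched with SQLite from Python's standard library as a stand-in database. The table and column names, and the 0 to 1 weighting range, are assumptions for illustration.

```python
import sqlite3


def make_scoring_db():
    """In-memory schema where a score cannot exist without a control-effectiveness row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
        CREATE TABLE control_effectiveness (
            id INTEGER PRIMARY KEY,
            weighting REAL NOT NULL CHECK (weighting BETWEEN 0 AND 1)
        );
        CREATE TABLE risk_scores (
            id INTEGER PRIMARY KEY,
            inherent_score INTEGER NOT NULL,
            -- Required foreign key: NULL and dangling references are both rejected
            control_effectiveness_id INTEGER NOT NULL
                REFERENCES control_effectiveness(id)
        );
    """)
    return conn


def can_insert_score(conn, inherent_score, effectiveness_id):
    """Try to insert a score; return False if the schema rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO risk_scores (inherent_score, control_effectiveness_id) "
                "VALUES (?, ?)",
                (inherent_score, effectiveness_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With the constraint in the schema, a scoring module generated against it has no way to store a score that ignores control effectiveness.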

Context window degradation

This discipline matters because of a real constraint the methodology doesn't eliminate: context window degradation on large codebases. The specification documents serve a second purpose. They re-ground the agent when context drifts, without restarting from scratch.

Eleven self-contained prompts. Zero manual code. A multi-module GRC platform with phase-gated risk lifecycle, control library, treatment workflows, policy governance, third-party risk management, and executive dashboards.

But I hit my first architecture roadblock. The application was local on my device with no way for others to test or review.

Three Architectural Paths. Two I'm Actively Building.

[Diagram: From localhost to production: agentic SaaS deployment, self-hosted AWS, and specification-driven procurement.]

Path 1: Agentic SaaS Deployment (Actively Building)

I took the codified specification and prompt package from the local build and redeployed onto Lovable. The process was straightforward but not cheap. Every prompt processed, every database table created, every row updated, every code optimisation costs credits. The consumption model adds up fast on a multi-module platform.

But the return is real. A shareable URL, not a localhost port. Out-of-the-box auth, database hosting, and deployment infrastructure. These platforms offer security tooling to help keep your code and data secure, but you're now operating under the shared responsibility model. You own the code and the data. You're paying them to host, scale, and secure the platform underneath.

Risk decision: Handing hosting and platform security to a third party is a choice you should make consciously, not one you stumble into because the deploy button was easy to click.

Path 2: Self-Hosted on AWS (Proposed Build)

This is the proposed path for full control. The same specification drives the build, but the deployment target is your own infrastructure. Container orchestration via ECS or EKS. PostgreSQL on RDS with the same schema and Row Level Security policies. Edge functions rewritten as Lambda or Fargate tasks behind API Gateway. Auth via Cognito or an external IdP. Infrastructure as Code via CDK or Terraform so the entire environment is reproducible.

The trade-off is clear. You gain data sovereignty, network-level isolation, and full governance over the CI/CD pipeline. You lose the speed. Every piece of infrastructure that the SaaS platform gave you out of the box now needs to be provisioned, secured, monitored, and maintained. AppSec scanning for AI-generated code becomes your responsibility.

Pro-tips from the build

Claude Code

Use a Project. Capture all architecture docs as project knowledge.
Keep your CLAUDE.md file maintained and current. This is your single best lever for reducing unnecessary code changes.
Precise persistent context = fewer times the agent touches code it shouldn't.

Lovable / Agentic SaaS

Build your own optimised prompt agent. Train it on the platform's docs.
Validate prompts locally before pushing. Every prompt costs credits.
Regularly export the data model and architecture. Re-import into your prompt agent to maintain context.

Path 3: Specification-Driven Procurement (Unlikely Today)

In theory, you hand enterprise GRC vendors your codified rules engine and working prototype, and they prove they can map your exact state machine onto their platform. In practice, most GRC vendors are not going to customise their build to meet your specific governance logic. Not in the short term. The economics don't support it, and the product roadmaps aren't designed for it.

This path will become viable as the market shifts, but it's not where the action is right now.

The SaaS Disruption Nobody's Talking About

Here's what I think most GRC vendors haven't fully reckoned with. Companies with the right internal skills can now build custom governance tooling that maps precisely to their frameworks, their state machines, their invariants. Not a generic risk register that sort of fits. An exact implementation of how they actually govern.

I expect this to become the norm within the next few years for organisations with mature security and architecture functions. Paths 1 and 2 will be the default for teams that have the specification capability.

That creates pressure from two directions:

SaaS vendors need to reinvent their offerings to compete with this wave of custom agentic applications. Configurable workflows and "custom fields" won't cut it when your customer can build an entire platform from a specification in two weeks.
Customers need the right people to build and deploy these tools. The bottleneck isn't the AI. It's having practitioners who can write a precise specification, design a data model, and make architectural decisions about state management, access control, and deployment.

Both sides have work to do. The vendors who adapt fastest will survive. The organisations that invest in specification-capable practitioners will have a significant competitive advantage.

The same skills apply: The instincts that make a senior GRC practitioner effective (clear scope, defined boundaries, modular thinking, precise language) are exactly what make agentic development work. Most developers are learning to think like architects. If you already do, you have a significant head start.

What's Next

Vulnerability management. We have an abundance of scanning tools. The challenge has always been what happens after the scan: integrating findings into your internal ticketing system, tracking remediation against SLAs, managing exceptions, and reporting on exposure trends over time.

It's another domain where the tooling gap between "scan results exist" and "governance actually happens" is wide open. Another state machine problem. Another area ripe for exactly this approach.

Is your organisation building the internal capability to specify and deploy custom tooling, or are you still waiting for a vendor to solve it for you?

Architectural patterns described in this article were developed in a personal R&D environment for educational purposes. They do not represent the systems, roadmap, or official stance of any current or former employer.