NotarAI
Intent captured. Drift reconciled.
NotarAI is a continuous intent reconciliation tool that keeps your specs, code, and documentation in sync as all three evolve. It uses LLMs as a bidirectional reconciliation engine: not just to generate code from specs, but to detect drift, surface conflicts, and propose updates across your entire artifact chain.
What is NotarAI?
- Spec-anchored – Structured YAML specs capture intent as the canonical source of truth, validated by JSON Schema.
- Bidirectional – Detects drift in any direction (code, spec, or docs) and proposes aligned updates.
- Propose and approve – Never auto-syncs. All changes are proposed for human review.
- Composable – Specs reference each other via $ref for hierarchical and cross-cutting composition.
Installation
Quick Install (Linux / macOS)
curl -fsSL https://raw.githubusercontent.com/davidroeca/NotarAI/main/scripts/install.sh | sh
This detects your OS and architecture, downloads the appropriate binary from
GitHub Releases, and installs it to ~/.local/bin. If that directory is not
in your PATH, the script will print a one-line export command to add it.
From crates.io
If you have Rust installed:
cargo install notarai
Manual Download
Download the binary for your platform from the latest release:
| Platform | Binary |
|---|---|
| Linux x86_64 (glibc) | notarai-x86_64-linux-gnu |
| Linux x86_64 (musl) | notarai-x86_64-linux-musl |
| Linux aarch64 (glibc) | notarai-aarch64-linux-gnu |
| Linux aarch64 (musl) | notarai-aarch64-linux-musl |
| macOS x86_64 | notarai-x86_64-macos |
| macOS aarch64 (Apple Silicon) | notarai-aarch64-macos |
| Windows x86_64 | notarai-x86_64-windows.exe |
Make the binary executable and move it to a directory in your PATH:
chmod +x notarai-*
mkdir -p ~/.local/bin
mv notarai-* ~/.local/bin/notarai
If ~/.local/bin is not already in your PATH, add this to your shell profile
(~/.bashrc, ~/.zshrc, etc.):
export PATH="$HOME/.local/bin:$PATH"
From Source
git clone https://github.com/davidroeca/NotarAI
cd NotarAI
cargo build --release -p notarai
# Binary is at target/release/notarai
Updating
If NotarAI is already installed, check for and install updates with:
notarai update
This detects how NotarAI was installed and acts accordingly — downloading a new binary for GitHub Release installs, or printing the appropriate cargo install command for Cargo installs. Use notarai update --check to check without installing.
NotarAI also prints a passive update hint on notarai validate and notarai init when a newer version is available (checked at most once every 24 hours).
Requirements
- No runtime dependencies – NotarAI is a single static binary
- Claude Code for reconciliation features (optional for validation-only usage)
Quick Start
Initialize your project
Run notarai init in your project root:
notarai init
This does several things:
- Copies `notarai.spec.json` to `.notarai/notarai.spec.json` so the schema is available for validation.
- Writes `.notarai/README.md` with workflow instructions.
- Writes `.notarai/reconcile-prompt.md` (reconciliation prompt template).
- Writes `.notarai/bootstrap-prompt.md` (bootstrap prompt template).
- Appends `.notarai/.cache/` to `.gitignore` so the hash cache DB is never committed.
- Writes `.mcp.json` registering `notarai mcp` as a local MCP server, so MCP-accelerated reconciliation works out of the box.
- Writes or section-merges `AGENTS.md` with a `## NotarAI` section describing the workflow.
- For the Claude adapter: adds a PostToolUse hook to `.claude/settings.json` so spec files are automatically validated when Claude Code writes or edits them; copies reconcile and bootstrap skills to `.claude/skills/`; creates or section-merges `CLAUDE.md` as an `@AGENTS.md` pointer.
Running init again is safe: it always refreshes skills, templates, and the schema copy, and replaces the ## NotarAI section in AGENTS.md (and adapter pointer files) with the current content.
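The section-merge behavior can be sketched in a few lines of Python. This is a simplified stand-in, not NotarAI's actual merge logic: it replaces everything from the `## NotarAI` heading up to the next `## ` heading (or end of file), and appends the section if it is missing:

```python
def merge_notarai_section(doc: str, new_section: str) -> str:
    """Replace the '## NotarAI' section (up to the next '## ' heading or EOF)
    with `new_section`, or append it if the section does not exist yet."""
    lines = doc.splitlines()
    start = next((i for i, l in enumerate(lines) if l.strip() == "## NotarAI"), None)
    if start is None:
        sep = "\n" if doc and not doc.endswith("\n") else ""
        return doc + sep + new_section
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    merged = lines[:start] + new_section.splitlines() + lines[end:]
    return "\n".join(merged)
```

Because the old section body is discarded wholesale, rerunning the merge is idempotent: the rest of the file is untouched.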
Create your first spec
Specs live in a .notarai/ directory at the root of your repository:
project/
.notarai/
system.spec.yaml
auth.spec.yaml
billing.spec.yaml
_shared/
security.spec.yaml
src/
docs/
Here’s a minimal spec:
# .notarai/auth.spec.yaml
schema_version: '0.8'
intent: |
Users can sign up, log in, and reset passwords.
Sessions expire after 30 min of inactivity.
behaviors:
- name: 'signup'
given: 'valid email + password (>= 12 chars)'
then: 'account created, welcome email sent'
- name: 'login'
given: 'valid credentials'
then: 'JWT issued, session created'
artifacts:
code:
- path: 'src/auth/**'
role: 'primary implementation'
docs:
- path: 'docs/auth.md'
Validate specs
# Validate all spec files in .notarai/
notarai validate
# Validate a specific file
notarai validate .notarai/auth.spec.yaml
# Validate a directory
notarai validate .notarai/subsystems/
Output is PASS <file> or FAIL <file> with an indented error list. Exit code is 0 if all files pass, 1 if any fail.
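The PASS/FAIL contract can be pictured with a minimal Python sketch. This is illustrative only: the real validator checks the full bundled JSON Schema, not just the two fields assumed here:

```python
REQUIRED_FIELDS = ("schema_version", "intent")  # minimal subset for illustration

def validate_spec(name: str, spec: dict) -> tuple[str, list[str]]:
    """Return a 'PASS <file>'/'FAIL <file>' line plus an error list for one parsed spec."""
    errors = [f"missing required field: {f}" for f in REQUIRED_FIELDS if f not in spec]
    status = "PASS" if not errors else "FAIL"
    return f"{status} {name}", errors

def exit_code(results: list[tuple[str, list[str]]]) -> int:
    """0 if every file passed, 1 if any failed -- matching the CLI contract."""
    return 0 if all(not errs for _, errs in results) else 1
```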
Update NotarAI
Check for and install updates:
notarai update
NotarAI will also print a hint when a newer version is available during validate or init.
Bump schema version
When you upgrade to a new version of NotarAI, update all spec files with:
notarai schema-bump
This overwrites .notarai/notarai.spec.json with the bundled schema and updates the schema_version field in every .notarai/*.spec.yaml file.
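The version rewrite can be sketched as a single text substitution. This Python sketch is not the real implementation (which edits files under `.notarai/` in place and also refreshes the bundled schema copy); it only shows the field-level change:

```python
import re

def bump_schema_version(yaml_text: str, new_version: str) -> str:
    """Rewrite the top-level schema_version value in one spec's YAML text."""
    return re.sub(
        r"^schema_version:\s*['\"]?[\d.]+['\"]?",
        f"schema_version: '{new_version}'",
        yaml_text,
        count=1,
        flags=re.MULTILINE,
    )
```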
Bootstrap from an existing codebase
Use the /notarai-bootstrap skill in Claude Code to generate specs from your existing code via a structured developer interview.
Check for drift
Run notarai check to detect structural drift without an LLM:
# See what's drifted
notarai check
# Strict mode for CI (any finding = exit code 1)
notarai check --strict
This reports coverage gaps, orphaned globs, changed files since last reconciliation, overlapping coverage, circular $ref chains, and incomplete behaviors. See the CLI reference for details.
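One of these findings, overlapping coverage, is easy to sketch: a file is flagged when more than one spec's artifact globs claim it. The Python below uses plain `fnmatch` globbing as a stand-in; NotarAI's actual glob semantics may differ:

```python
from fnmatch import fnmatch

def overlapping_coverage(specs: dict[str, list[str]], files: list[str]) -> dict[str, list[str]]:
    """Map each file governed by more than one spec to the specs that claim it."""
    overlaps = {}
    for f in files:
        owners = [name for name, globs in specs.items() if any(fnmatch(f, g) for g in globs)]
        if len(owners) > 1:
            overlaps[f] = owners
    return overlaps
```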
For automated PR checks, add the GitHub Action to your CI workflow.
Reconcile with an LLM
Use the /notarai-reconcile skill in Claude Code to perform a full semantic reconciliation: detect drift, propose spec/code/doc updates, and walk through each finding interactively.
Next steps
- Existing codebase? See the Brownfield Adoption Guide for a step-by-step walkthrough of adding NotarAI to a project that already has code.
- Not sure how much spec detail you need? Progressive Adoption describes three maturity levels so you can start light and add depth where it matters.
Spec Format Reference
Specs are YAML files validated against a JSON Schema (notarai.spec.json). The format uses progressive disclosure: a small set of required fields for minimum viability, with optional fields for precision as needed.
Required fields
schema_version
Pins the JSON Schema version. Current version: "0.8". Versions "0.7", "0.6", and "0.5" are also accepted for backward compatibility.
schema_version: '0.8'
intent
Natural language description of what the system or feature should do.
intent: |
Users can sign up, log in, and reset passwords.
Sessions expire after 30 min of inactivity.
behaviors
Structured Given/Then entries describing expected behavior. Each behavior has a name, a given condition, and a then outcome. Required for full tier specs; optional for registered and derived tier specs.
behaviors:
- name: 'signup'
given: 'valid email + password (>= 12 chars)'
then: 'account created, welcome email sent'
- name: 'session_timeout'
given: '30 min inactivity'
then: 'session invalidated'
Behaviors may also include optional interaction and state_transition sub-fields:
behaviors:
- name: 'submit_form'
given: 'user submits a valid form'
then: 'data saved, confirmation shown'
interaction:
trigger: user_action # user_action | timer | system_event | data_change | schedule | external_signal | threshold | manual | lifecycle
sequence:
- validate fields
- post to API
- show confirmation
state_transition:
from: editing
to: confirmed
Behaviors may also declare the tests that verify them via tested_by
(introduced in schema 0.8). notarai check uses this to surface
test-alignment drift:
behaviors:
- name: 'signup'
given: 'valid email and password'
then: 'account created, welcome email sent'
tested_by:
- path: 'tests/auth/signup_test.rs'
assertion: 'signup_creates_account'
| Check | Severity | Trigger |
|---|---|---|
| T001 | Warning | A tier-1 behavior has no tested_by entry. |
| T002 | Error | A tested_by.path does not exist on disk. |
artifacts
Glob patterns mapping the spec to the files it governs. The schema accepts any string as a category key. Convention categories:
| Category | When to use |
|---|---|
| code | Source code |
| docs | Documentation |
| tests | Test files |
| slides | Presentation files |
| data | Data files, CSVs |
| configs | Configuration, IaC |
| notebooks | Jupyter/R notebooks |
| assets | Media, images, fonts |
| templates | Reusable templates |
| schemas | Data schemas, API specs |
artifacts:
code:
- path: 'src/auth/**'
role: 'primary implementation'
docs:
- path: 'docs/auth.md'
tests:
- path: 'tests/auth/**'
Each artifact ref may include an optional integer tier override (1-4) for files that belong to a different tier than the spec itself:
artifacts:
code:
- path: 'dist/bundle.js'
tier: 4 # derived output — tracked for staleness, not authored directly
Optional fields
constraints
Rules the system must follow.
constraints:
- 'rate limit: 5 login attempts per minute per IP'
- 'passwords must be >= 12 characters'
invariants
Conditions that must never be violated.
invariants:
- 'no plaintext passwords stored anywhere'
- 'all API responses include request-id header'
decisions
Architectural decision log with date, choice, and rationale.
decisions:
- date: '2025-01-15'
choice: 'JWT over session cookies'
rationale: 'Stateless auth simplifies horizontal scaling'
open_questions
Unresolved design questions.
open_questions:
- 'Should we support OAuth providers beyond Google?'
- "What's the session timeout for mobile clients?"
dependencies
References to other specs this one interacts with.
dependencies:
- $ref: 'billing.spec.yaml'
relationship: 'auth gates billing endpoints'
notes
Freeform hints for the LLM about implicit relationships.
notes: |
The auth module shares a rate limiter with the API gateway.
Session storage is Redis in production, in-memory in dev.
output
Describes what the spec ultimately produces. Useful for non-software artifacts like presentations or reports.
output:
type: presentation # app | presentation | interactive-doc | game | dashboard | report | library | service | document | course | api | infrastructure | dataset | design-system | campaign | template
format: pptx
runtime: static-file # browser | native | static-file | embedded | server
entry_point: dist/deck.pptx
content
Describes the output’s logical structure in content terms (slides, scenes, sections) rather than file terms.
content:
structure: graph # ordered | hierarchical | graph | free-form
sections:
- id: level_1
type: scene
intent: 'Tutorial level introducing movement mechanics'
duration: { value: 5, unit: minutes }
connections:
- to: level_2
label: completion
- to: game_over
label: player_death
depends_on:
- id: intro_cutscene
relationship: 'must complete before this section unlocks'
evidence:
- type: data
source: playtests/run_3.csv
claim: '85% of players complete within 5 minutes'
states
Top-level state machine definition for interactive artifacts.
states:
initial: idle
definitions:
- id: idle
transitions:
- to: running
on: start
guard: 'all required fields are populated'
action: 'initialize timer, log start event'
- id: running
transitions:
- to: idle
on: stop
design
Visual and design specifications for brand-governed artifacts.
design:
theme:
palette: ['#1a1a2e', '#16213e']
typography:
heading: Inter
body: Roboto
modes:
light: { palette: ['#ffffff', '#f0f0f0'] }
dark: { palette: ['#1a1a2e', '#16213e'] }
layout:
type: paginated # slide-deck | scrolling | spatial | grid | free-form | paginated | canvas | timeline | tabbed
dimensions: letter
print:
margins: { top: '1in', right: '1in', bottom: '1in', left: '1in' }
headers: true
footers: true
page_numbers: true
responsive:
breakpoints:
- name: mobile
max_width: 768
layout_override: scrolling
- name: desktop
min_width: 769
audience
Context about who the output is for.
audience:
role: 'Series B investors'
assumed_knowledge: 'Familiar with SaaS metrics, not technical infrastructure'
tone: formal-but-engaging
locale: en-US
accessibility:
- high-contrast
- screen-reader-friendly
variants
Multiple versions of the same artifact with selective field overrides.
variants:
- id: investor-deck
description: 'Condensed version for investor meetings'
overrides:
audience.role: 'Series B investors'
- id: engineering-deep-dive
description: 'Full technical version for the eng team'
Variants are declarative metadata by default. Set variants_resolved: true at the spec top level to opt in to programmatic override resolution (scalar replacement, array replacement with + prefix for append, deep merge for objects, null to clear).
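A Python sketch of these resolution rules follows. The `+`-prefix convention is ambiguous from the description above; this sketch assumes a literal "+" marker as the first array element signals append, which is one plausible reading, not confirmed NotarAI behavior:

```python
def apply_override(base, override):
    """Resolve a variant override against a base value: scalars replace,
    arrays replace unless they start with a '+' marker (append), dicts
    deep-merge, and None clears the field."""
    if override is None:
        return None  # null clears the field
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)
        for k, v in override.items():
            merged[k] = apply_override(base.get(k), v) if k in base else v
        return merged
    if isinstance(base, list) and isinstance(override, list):
        if override and override[0] == "+":
            return base + override[1:]  # assumed append convention
        return override  # array replacement
    return override  # scalar replacement
```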
pipeline
Describes the build or generation process for the output artifact.
pipeline:
env:
NODE_ENV: production
steps:
- name: compile
tool: tsc
input: 'src/**/*.ts'
output: dist/
condition: "output.format == 'web'"
- name: export_pdf
command: 'pandoc input.md -o output.pdf'
condition: "output.format == 'pdf'"
on_failure: skip
depends_on: [compile]
env:
PANDOC_DATA_DIR: ./templates
preview:
command: npx serve dist/
url: 'http://localhost:3000'
feedback
Connects output performance metrics back to the spec for reconciliation triggers.
feedback:
metrics:
- name: avg_completion_rate
source: analytics/completion.csv
threshold: '>= 0.7'
- name: build_time
threshold: '< 5s'
triggers:
- condition:
metric: avg_completion_rate
operator: below_threshold
duration: { value: 3, unit: days }
action: reconcile
priority: high
Note: reconciliation_trigger (free-form string) is deprecated in favor of triggers but still accepted.
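Trigger evaluation can be sketched like this. The sketch only handles the `'>= N'` threshold form from the example above (not units such as `5s`), and the duration window is omitted; the real trigger grammar is not specified here:

```python
def should_reconcile(metric_value: float, threshold: str, operator: str = "below_threshold") -> bool:
    """Evaluate a feedback trigger condition against a measured metric value."""
    assert threshold.startswith(">= "), "sketch only supports '>= N' thresholds"
    bound = float(threshold[3:])
    meets = metric_value >= bound
    # 'below_threshold' fires when the metric fails to meet its threshold
    return not meets if operator == "below_threshold" else meets
```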
compliance
Maps invariants and constraints to regulatory or standards frameworks. The reconciliation engine verifies that framework-required invariants still exist in the spec.
compliance:
frameworks:
- name: SOC2
controls:
- id: CC6.1
satisfied_by:
invariants: ['no plaintext passwords stored anywhere']
constraints: ['rate limit: 5 login attempts per minute per IP']
- name: WCAG
level: AA
satisfied_by:
invariants: ['all interactive elements have visible focus indicators']
audit_trail: true
Coverage tiers
Every file in the repo falls into one of four tiers:
- Tier 1 (Full) — Business logic, APIs, user-facing features. Full behavioral specification required.
- Tier 2 (Registered) — Utilities, config, sidecars. Intent and artifact mapping only; `behaviors` not required.
- Tier 3 (Excluded) — Explicitly out of scope. Declared via `exclude` globs on the system spec.
- Tier 4 (Derived) — Generated outputs tracked for staleness but not authored directly (e.g., build artifacts, compiled bundles). Use `tier: derived` on the spec or `tier: 4` on individual artifact refs.
Files not covered by any tier are flagged as “unspecced” — a lint warning, not a blocker.
Set the spec-level tier with the tier field:
tier: registered # full (default) | registered | derived
Composition
Specs compose via $ref (borrowed from JSON Schema/OpenAPI):
- `subsystems` — hierarchical references (system → services)
- `applies` — cross-cutting specs (e.g., security, logging) that apply to all subsystems
A top-level system.spec.yaml serves as the manifest, referencing subsystem specs and declaring exclusion patterns for Tier 3 files.
Cross-cutting specs
A spec that expresses concerns spanning multiple subsystems (style, security, logging, compliance) should set cross_cutting: true:
schema_version: '0.8'
cross_cutting: true
intent: >
American English spelling across all code and documentation.
behaviors:
- name: american_english
given: 'british spelling appears in a governed file'
then: 'reconciliation flags it as drift'
invariants:
- 'All documentation uses American English spellings throughout'
Cross-cutting specs:
- Omit `artifacts` — they govern no files directly. Their invariants and behaviors layer onto the specs that include them via `applies`.
- Cannot be top-level — they must not declare `subsystems` or `exclude`.
- Must be referenced via `applies`, not `subsystems` — L011 flags misplacement.
This avoids glob overlap with subsystem specs (two specs governing the same file would raise an OverlappingCoverage finding) while still letting the spec layer its invariants across the whole system.
Reconciliation
How reconciliation works
The reconciliation engine detects three scenarios:
1. Someone edits code
The engine detects that code has drifted from the spec and proposes spec and doc updates.
2. Someone edits spec
The engine propagates the spec change to code and documentation.
3. Conflict
Code says one thing, the spec says another. The engine surfaces the disagreement and the user decides which is correct.
The system is always propose-and-approve, never auto-sync. Both users and LLMs can edit everything; the spec is the tiebreaker.
Using reconciliation
After running notarai init, use the /notarai-reconcile slash command in Claude Code to trigger a reconciliation pass.
The skill is a thin orchestrator that delegates context assembly to the notarai export-context CLI command:
- Determines a baseline (from `.notarai/reconciliation_state.json` if available, or asks for a base branch).
- Runs `notarai export-context --all --base-branch <baseline> --format markdown` to gather per-spec reconciliation blocks containing spec content and changed-file lists.
- For small changesets (10 or fewer changed files), analyzes all specs inline. For larger changesets, spawns one parallel sub-agent per spec.
- Reads changed files, runs `git diff` per file, and evaluates each behavior, constraint, and invariant against the changes.
- Notes `applies` cross-cutting specs and `dependencies` refs for ripple-effect analysis.
- Produces a structured report (DRIFT / VIOLATED / UNSPECCED / STALE REF findings).
- Walks through findings interactively, proposing exact changes for approval.
- Calls `mark_reconciled` (via MCP or CLI) to update the hash cache, then snapshots reconciliation state.
The MCP server is used for mark_reconciled and snapshot_state when available, with CLI fallbacks (notarai state snapshot) when it is not.
For non-Claude agents, run notarai export-context directly and paste the output into your agent’s prompt. See the CLI reference for details.
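The kind of per-spec bundle this produces can be sketched in Python. The layout below is invented for illustration; the real `--format markdown` output is not specified here and will differ:

```python
def reconciliation_block(spec_name: str, spec_text: str, changed_files: list[str]) -> str:
    """Assemble one per-spec context block for pasting into an agent prompt."""
    header = [f"## Spec: {spec_name}", "", spec_text.rstrip(), "", "Changed files since baseline:"]
    body = [f"- {f}" for f in changed_files] or ["- (none)"]
    return "\n".join(header + body)
```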
Automatic validation
After notarai init, spec files are validated automatically whenever Claude Code writes or edits a file in .notarai/. Invalid specs block the tool use with errors on stderr. Non-spec files are ignored silently.
Specs vs Claude Rules
NotarAI specs and Claude rules (CLAUDE.md / .claude/rules/) both express
project conventions, but they serve different purposes and trigger at different
times. This guide explains when to use each – and when to use both.
Decision framework
| Use a spec when… | Use a Claude rule when… |
|---|---|
| The concern describes what artifacts must look like | The concern describes how Claude should work |
| You want reconciliation to detect drift retroactively | You want to prevent violations proactively |
| The rule maps to files you can diff against | The rule is about process, workflow, or tool usage |
| Cross-cutting specs (`applies`) can propagate it | The convention only matters during generation |
Use a spec
Specs are the right home for artifact-facing rules – invariants, constraints, and behaviors that describe what code, docs, or configs should look like. The reconciliation engine diffs artifacts against these rules and proposes fixes when they drift.
Examples from this project:
- “American English throughout” – `style.spec.yaml` catches existing files that use British spellings
- “The engine must never silently auto-modify code” – an invariant in `system.spec.yaml` that reconciliation checks against code changes
- “CLI validates spec files against bundled JSON Schema” – a behavior in `cli.spec.yaml` tied to source files
Cross-cutting specs (referenced via applies in the system spec) propagate
invariants and constraints across all subsystems without duplication.
Use a Claude rule
Claude rules are the right home for workflow-facing instructions – how Claude should run commands, what tools to prefer, what process to follow. These have no artifact to reconcile against; they shape how Claude works, not what the output looks like.
Examples:
- “Tests use `cargo test`” – tells Claude which command to run
- “When bumping schema version, update these five files” – a checklist for a multi-step process
- “Unit tests are inline `#[cfg(test)]` modules” – convention for where to put new tests
These belong in .claude/rules/ files (or CLAUDE.md) because there is no
meaningful way to diff project files against them.
Use both
Some conventions benefit from both proactive prevention and retroactive detection. Style rules are the classic example:
- Claude rule prevents new violations: Claude follows the rule as it generates code, so new files are correct from the start.
- Spec catches existing drift: reconciliation scans all governed files and flags violations that predate the rule or were introduced by humans.
This is intentional duplication, not redundancy. The two mechanisms cover different failure modes.
Examples from this project:
| Convention | Claude rule | Spec |
|---|---|---|
| American English | .claude/rules/style.md | style.spec.yaml |
| QWERTY-typable characters | .claude/rules/style.md | style.spec.yaml |
Anti-patterns
Don’t put process instructions in specs. A spec behavior like “given a schema version bump, then update these five files” has no artifact to diff against. It belongs in a Claude rule or checklist.
Don’t put formal behavioral specs in Claude rules. A rule like “the CLI must validate spec files against the bundled schema” is a testable behavior. If it lives only in CLAUDE.md, reconciliation can’t detect when code drifts away from it.
Don’t duplicate without purpose. If a convention only needs proactive prevention (e.g., “run prettier on generated code”), a Claude rule is sufficient. If it only needs retroactive detection (e.g., “no circular $ref chains”), a spec invariant is sufficient. Use both only when both failure modes are real.
Non-Software Examples
NotarAI works for any artifact with intent, not just code. These examples show how the schema applies to presentations, legal documents, and research reports.
Presentation spec
A conference talk governed for audience alignment and slide drift.
schema_version: '0.7'
domain: presentation
intent: >
A 30-minute conference talk introducing NotarAI to developers unfamiliar with
spec-driven workflows. Attendees should leave understanding the three-body drift
problem and how to run notarai init on their own project.
behaviors:
- name: opening_hook
given: 'speaker takes the stage'
then: 'the intro slide presents a relatable drift scenario in under 90 seconds'
- name: demo_live
given: 'the demo section'
then: 'speaker runs notarai init and reconcile live on a sample repo; audience sees a real drift report'
audience:
role: 'mid-to-senior developers at a software conference'
assumed_knowledge: 'Familiar with git, CI/CD, and code review workflows; may not know NotarAI'
tone: formal-but-engaging
locale: en-US
output:
type: presentation
format: pptx
runtime: static-file
entry_point: dist/talk.pptx
content:
structure: ordered
sections:
- id: intro
type: slide
intent: 'Hook: show a real drift incident and its cost'
duration: { value: 3, unit: minutes }
- id: problem
type: slide
intent: 'Explain the three-body drift problem (spec, code, docs)'
duration: { value: 5, unit: minutes }
- id: demo
type: interactive
intent: 'Live notarai init + reconcile demo on a sample repo'
duration: { value: 10, unit: minutes }
content_ref: demo/sample-repo/
- id: takeaways
type: slide
intent: 'Three action items the audience can do today'
duration: { value: 2, unit: minutes }
design:
theme:
palette: ['#0f172a', '#6366f1', '#ffffff']
typography:
heading: Inter
body: Inter
layout:
type: slide-deck
dimensions: '16:9'
artifacts:
slides:
- path: 'slides/**/*.md'
role: 'slide source content'
assets:
- path: 'assets/**'
role: 'images and diagrams'
What this demonstrates: output.type: presentation, content.sections with duration, audience, and design. The reconciliation engine uses duration to detect if the talk now runs over time, and intent per section to detect off-message slides.
Legal contract spec
A service agreement governed for compliance and clause integrity.
schema_version: '0.7'
domain: legal
intent: >
A standard SaaS service agreement for enterprise customers. Governs payment
terms, liability limits, data processing obligations, and termination rights.
The spec tracks which clauses satisfy which regulatory requirements so that
removing or weakening a clause triggers a compliance drift alert.
behaviors:
- name: data_processing
given: 'customer data is processed by the service'
then: 'the DPA clause defines processing purposes, data categories, and sub-processor obligations per GDPR Article 28'
- name: liability_cap
given: 'a dispute arises'
then: 'liability is capped at 12 months of fees paid, except for gross negligence or data breach'
constraints:
- 'All clause changes must be reviewed by legal counsel before execution'
- 'Governing law must match the entity jurisdiction for each signed copy'
invariants:
- 'The DPA clause must never be removed from the agreement'
- 'Liability cap language must reference the specific cap amount'
compliance:
frameworks:
- name: GDPR
controls:
- id: Art28
satisfied_by:
invariants:
['The DPA clause must never be removed from the agreement']
- name: SOC2
controls:
- id: CC9.2
satisfied_by:
constraints:
[
'All clause changes must be reviewed by legal counsel before execution',
]
audit_trail: true
output:
type: document
format: pdf
content:
structure: ordered
sections:
- id: definitions
type: clause
intent: 'Define all capitalized terms used in the agreement'
- id: services
type: clause
intent: 'Describe the scope and delivery of services'
- id: payment
type: clause
intent: 'Payment terms, invoicing cycle, and late payment penalties'
- id: dpa
type: clause
intent: 'Data Processing Agreement per GDPR Article 28'
depends_on:
- id: definitions
relationship: 'References defined terms for data categories and processing'
- id: liability
type: clause
intent: 'Limit liability to 12 months fees; carve out gross negligence and data breach'
- id: termination
type: clause
intent: 'Termination for convenience (30 days notice) and for cause (material breach)'
design:
layout:
type: paginated
dimensions: letter
print:
margins: { top: '1in', right: '1in', bottom: '1in', left: '1in' }
headers: true
footers: true
page_numbers: true
artifacts:
docs:
- path: 'contracts/service-agreement.md'
role: 'master agreement source'
configs:
- path: 'contracts/variables.yaml'
role: 'per-customer variable substitutions (entity name, jurisdiction, fees)'
What this demonstrates: domain: legal, compliance.frameworks with control mappings, content.sections with type: clause and depends_on, design.print for paginated layout. The compliance block creates an explicit link between the GDPR requirement and the DPA clause – if someone removes the DPA clause, the reconciliation engine flags it as a high-priority drift event.
Research report spec
An evidence-backed technical report governed for citation integrity.
schema_version: '0.7'
domain: research
intent: >
A technical report evaluating three approaches to LLM-assisted code review:
prompt-only, RAG-augmented, and spec-anchored. Reports accuracy, latency, and
reviewer acceptance metrics from a 90-day study across 12 repositories.
behaviors:
- name: methodology_reproducible
given: 'a reader follows the methodology section'
then: 'they can reproduce the experimental setup using the linked code and dataset'
- name: results_traceable
given: 'a claim appears in the results section'
then: 'it is linked to a specific row or aggregate in the dataset'
constraints:
- 'All quantitative claims must cite a specific data source in evidence'
- 'Comparison tables must include confidence intervals'
- 'Methodology must describe exclusion criteria for repositories'
output:
type: document
format: pdf
content:
structure: ordered
sections:
- id: abstract
type: section
intent: 'Summarize the study question, methods, and key finding in 150 words'
duration: { value: 2, unit: minutes }
- id: methodology
type: section
intent: 'Describe the 90-day study design, repository selection criteria, and evaluation metrics'
content_ref: sections/methodology.md
evidence:
- type: reference
ref: 'Chen et al. 2023 -- LLM code review benchmarks'
claim: 'Our accuracy metric aligns with the Chen et al. framework'
relationship: 'supports methodology choice'
- id: results
type: section
intent: 'Present accuracy, latency, and acceptance metrics per approach with confidence intervals'
content_ref: sections/results.md
evidence:
- type: data
source: data/results_final.csv
claim: 'Spec-anchored approach achieves 94% accuracy vs 81% for prompt-only'
relationship: 'primary quantitative result'
- type: data
source: data/latency.csv
claim: 'Median review latency under 4 seconds for all approaches'
- id: discussion
type: section
intent: 'Interpret results, discuss limitations, and suggest future work'
depends_on:
- id: results
relationship: 'Interpretation requires results to be finalized'
- id: conclusion
type: section
intent: 'State the recommendation: spec-anchored review for accuracy-critical workflows'
feedback:
metrics:
- name: peer_review_score
threshold: '>= 3.5 / 5'
- name: reproduction_success_rate
threshold: '>= 0.8'
triggers:
- condition:
metric: peer_review_score
operator: below_threshold
action: reconcile
priority: high
artifacts:
docs:
- path: 'sections/**/*.md'
role: 'report section source content'
data:
- path: 'data/**/*.csv'
role: 'experimental results datasets'
configs:
- path: 'analysis/**/*.py'
role: 'analysis scripts that produce data/ outputs'
What this demonstrates: domain: research, content.sections with evidence entries linking claims to data sources, depends_on between sections, feedback.triggers for structured review thresholds, and duration for time-budgeted writing. When data/results_final.csv changes, the reconciliation engine flags the results section’s claim for review because it is linked via evidence.
Brownfield Adoption Guide
This guide walks through adopting NotarAI on an existing codebase. You do not need to spec everything at once. Start with the most critical modules and expand coverage incrementally.
Prerequisites
- A Git repository with existing code
- NotarAI installed (Installation)
Step 1: Initialize NotarAI
notarai init
This sets up the .notarai/ directory, validation hooks, slash commands, and MCP server configuration. See the Quick Start for details on what init creates.
Step 2: Create a system spec with broad exclusions
Start with a system spec that explicitly excludes files you do not want to track:
# .notarai/system.spec.yaml
schema_version: '0.8'
intent: >
Top-level system spec for the project. Defines subsystem
composition and excludes vendor, generated, and config files.
artifacts:
configs:
- path: 'package.json'
role: 'package manifest'
exclude:
- 'vendor/**'
- 'node_modules/**'
- 'dist/**'
- 'build/**'
- '.github/**'
- '*.lock'
- '*.config.*'
The exclude patterns use glob syntax. Files matching these patterns will not be flagged as “unspecced” by notarai check.
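The resulting coverage-gap report can be sketched as set subtraction over globs. Plain `fnmatch` matching here is a stand-in for NotarAI's real pattern matching, whose semantics may differ:

```python
from fnmatch import fnmatch

def unspecced_files(all_files: list[str], governed_globs: list[str], exclude_globs: list[str]) -> list[str]:
    """Files covered by no spec and not excluded: what `notarai check`
    would report as coverage gaps."""
    def matches(path, globs):
        return any(fnmatch(path, g) for g in globs)
    return [f for f in all_files if not matches(f, governed_globs) and not matches(f, exclude_globs)]
```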
Step 3: Bootstrap specs for critical modules
Use the bootstrap interview to create specs for your 2-3 most important modules:
# In Claude Code:
/notarai-bootstrap
# Or for any agent:
notarai export-context --bootstrap | pbcopy
The bootstrap flow interviews you about the module’s purpose, behaviors, constraints, and invariants, then drafts a spec. You review and approve before anything is written.
Focus on modules where drift would cause the most damage: authentication, billing, core business logic.
Step 4: Run your first check
notarai check
This reports:
- Coverage gaps: Files not governed by any spec (expected to be large initially)
- Orphaned globs: Spec artifact patterns matching no files (should be zero for freshly created specs)
Do not try to eliminate all coverage gaps immediately. A large brownfield codebase will have many unspecced files, and that is fine. The goal is progressive coverage.
Step 5: Add specs as modules are touched
When you modify a module, that is the natural time to add a spec for it. The incremental approach:
- Before making changes, create a spec for the module (or use /notarai-bootstrap to interview about it)
- Make your code changes
- Run /notarai-reconcile to verify the spec still aligns
- Commit the spec alongside the code changes
Over time, your most-changed modules will naturally accumulate spec coverage.
Step 6: Set up CI drift detection
Add the NotarAI GitHub Action to your PR workflow:
# .github/workflows/notarai.yml
name: NotarAI Check
on:
pull_request:
branches: [main]
permissions:
contents: read
pull-requests: write
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: davidroeca/NotarAI/crates/notarai-action@v0.7.0
This runs notarai check on every PR and posts a comment summarizing findings. See the GitHub Action reference for configuration options.
Common pitfalls
Over-speccing too early. Do not try to write full behavioral specs for every module on day one. Start with tier: registered specs that just map intent to artifacts. Add behaviors later for critical paths. See Progressive Adoption.
Ignoring coverage gaps. Coverage gaps are informational warnings, not errors. Do not suppress them globally; instead, add exclude patterns for directories you genuinely do not want to track (vendor, generated code).
Speccing generated code. Files generated by build tools, compilers, or code generators should be excluded (Tier 3) or tracked as derived (Tier 4). Do not write behavioral specs for generated output.
Realistic timeline
For a medium-sized project (50-100 source files, 5-10 logical modules):
- Day 1: notarai init, system spec with exclusions, 2-3 module specs via bootstrap
- Week 1-2: Add specs for modules you are actively working on
- Month 1: 40-60% coverage of source files; CI drift checks running on PRs
- Ongoing: Coverage grows naturally as modules are touched
There is no pressure to reach 100% coverage. Many teams find that 60-80% coverage of source files (with the remainder excluded or registered) provides the right balance of safety and maintenance cost.
Progressive Adoption
You do not need full behavioral specs to get value from NotarAI. This guide describes three maturity levels. Most teams should aim for Level 2 on critical modules and Level 1 everywhere else.
Level 1: Intent and artifacts only
The simplest useful spec. Maps files to a purpose without describing behaviors.
schema_version: '0.8'
intent: >
HTTP client library with retry logic, connection pooling,
and timeout configuration.
tier: registered
artifacts:
code:
- path: 'src/http/**/*.rs'
role: 'HTTP client implementation'
tests:
- path: 'tests/http_*.rs'
role: 'HTTP client integration tests'
docs:
- path: 'docs/http.md'
role: 'HTTP client usage guide'
What you get at Level 1:
- notarai check detects coverage gaps and orphaned globs
- Reconciliation surfaces which files changed and which spec governs them
- You have a searchable index of what each module owns
When Level 1 is enough: Utilities, configuration, internal tools, and any module where the intent is self-evident from the code.
Level 2: Add behaviors for critical paths
Add given/then behavior descriptions for the paths that matter most: error handling, security boundaries, data validation, and user-facing features.
schema_version: '0.8'
intent: >
HTTP client library with retry logic, connection pooling,
and timeout configuration.
behaviors:
- name: retry_on_transient_failure
given: 'a request fails with a 502, 503, or 429 status'
then: 'retries up to 3 times with exponential backoff (1s, 2s, 4s)'
- name: timeout_enforcement
given: 'a request exceeds the configured timeout'
then: 'aborts the request and returns a timeout error'
- name: connection_pool_reuse
given: 'multiple requests to the same host within the keep-alive window'
then: 'reuses the existing TCP connection'
constraints:
- 'All HTTP errors must be wrapped in a typed error enum'
- 'Connection pool size must be configurable at initialization'
artifacts:
code:
- path: 'src/http/**/*.rs'
role: 'HTTP client implementation'
tests:
- path: 'tests/http_*.rs'
role: 'HTTP client integration tests'
What you get at Level 2:
- Everything from Level 1
- Reconciliation can detect when code behavior drifts from spec (e.g., retry logic changed but spec still says “3 retries”)
- notarai lint flags incomplete behaviors (missing given or then)
- New team members can read the spec to understand intended behavior without reading the full implementation
When to use Level 2: Business logic, APIs, authentication, data pipelines, and anything where behavioral correctness matters.
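The retry_on_transient_failure behavior above can be sketched in a few lines. This is an illustration of the specified behavior, not NotarAI's or any HTTP library's implementation; send is a hypothetical callable that performs one request and returns a status code:

```python
import time

TRANSIENT = {429, 502, 503}
BACKOFF = [1.0, 2.0, 4.0]  # exponential backoff: 1s, 2s, 4s; hard cap of 3 retries

def request_with_retry(send, sleep=time.sleep):
    """Call send() once, then retry transient failures per the spec's behavior."""
    status = send()
    for delay in BACKOFF:
        if status not in TRANSIENT:
            return status
        sleep(delay)      # back off before the next attempt
        status = send()
    return status         # last status after exhausting all retries
```

Injecting sleep makes the timing testable: two transient failures followed by a 200 should sleep for 1s and 2s, then return 200.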
Level 3: Full SDD coverage
Add constraints, invariants, decisions, and cross-cutting concerns. This is the full power of the spec format.
schema_version: '0.8'
intent: >
HTTP client library with retry logic, connection pooling,
and timeout configuration.
behaviors:
- name: retry_on_transient_failure
given: 'a request fails with a 502, 503, or 429 status'
then: 'retries up to 3 times with exponential backoff (1s, 2s, 4s)'
- name: timeout_enforcement
given: 'a request exceeds the configured timeout'
then: 'aborts the request and returns a timeout error'
constraints:
- 'All HTTP errors must be wrapped in a typed error enum'
- 'Connection pool size must be configurable at initialization'
- 'No unbounded retries: retry count must have a hard cap'
invariants:
- 'A timed-out request must never silently succeed'
- 'Connection pool must never leak file descriptors'
decisions:
- date: '2026-03-15'
choice: 'Use ureq (blocking) instead of reqwest (async)'
rationale: >
Project is synchronous throughout. Adding tokio for HTTP
alone adds 200KB to the binary and complicates error handling.
artifacts:
code:
- path: 'src/http/**/*.rs'
role: 'HTTP client implementation'
tests:
- path: 'tests/http_*.rs'
role: 'HTTP client integration tests'
What you get at Level 3:
- Everything from Level 2
- Invariants serve as hard constraints that reconciliation checks against
- Decisions capture the “why” behind architectural choices, preventing them from being unknowingly reversed
- notarai lint checks decision freshness and rationale completeness
When to use Level 3: Core system components, security-critical modules, and modules where architectural decisions are frequently revisited or questioned.
Guidance
Start at Level 1 for everything, then promote. When you find yourself explaining a module’s behavior to a teammate or an LLM, that is a signal to promote it to Level 2.
Level 3 is not always better. Over-specified specs create maintenance burden. A utility module with three functions does not need invariants and decisions. Match the spec depth to the module’s complexity and risk.
You do not need Level 3 to get value from NotarAI. Most teams find that Level 2 on 5-10 critical modules plus Level 1 everywhere else provides the right balance.
See Brownfield Adoption for a step-by-step guide to getting started.
Severity Tiers
NotarAI classifies every check and lint finding into one of three severity tiers. Tiers help you prioritize: fix critical issues first, review drift during development, and handle housekeeping when convenient.
Tier definitions
| Tier | Name | Meaning | Example |
|---|---|---|---|
| 1 | Critical | Broken references or structural violations | Orphaned glob (spec references deleted code), circular $ref cycle, schema version mismatch (L009), missing $ref target (L004) |
| 2 | Drift | Code changed in ways that may not align with spec | Files changed since last reconciliation, Tier 1 spec with no behaviors (L001), duplicate behavior names (L010) |
| 3 | Housekeeping | Documentation, style, or organizational misalignment | Coverage gaps, overlapping coverage, incomplete behaviors, stale decisions (L006), open questions (L007), broad globs (L008) |
Tier assignment
Each check type and lint rule is assigned a fixed tier:
Critical: CircularRef, OrphanedGlob, L004, L009
Drift: ChangedSinceReconciliation, L001, L010
Housekeeping: CoverageGap, OverlappingCoverage, BehaviorIncomplete, L002, L003, L005, L006, L007, L008
Output
Human output groups findings by tier (critical first):
--- Critical (2 findings) ---
Orphaned Globs (1 findings)
error : src/deleted/**/*.rs
in .notarai/cli.spec.yaml
Circular $ref Cycles (1 findings)
error : .notarai/a.spec.yaml
Circular $ref chain: a -> b -> a
--- Drift (1 findings) ---
Changed Since Last Reconciliation (1 findings)
warning: src/lib.rs
--- Housekeeping (3 findings) ---
Coverage Gaps (3 findings)
warning: README.md
warning: LICENSE
warning: CONTRIBUTING.md
6 issues found (2 errors, 4 warnings).
JSON output includes a tier field on each finding:
{
"findings": [
{
"type": "orphaned_glob",
"severity": "error",
"tier": "critical",
"spec_path": ".notarai/cli.spec.yaml",
"file_path": null,
"glob_pattern": "src/deleted/**/*.rs",
"message": "Artifact glob matches no files: src/deleted/**/*.rs"
}
],
"summary": { "errors": 1, "warnings": 0 }
}
Configuring CI thresholds
Create .notarai/check.yaml to control which tiers cause CI failures:
# Fail the check if any critical or drift finding is present.
fail_on: drift
# Only show critical and drift findings (suppress housekeeping).
warn_on: drift
fail_on: The minimum tier that causes a non-zero exit code. Accepts critical, drift, or housekeeping. When set, any finding at the configured tier or more severe causes exit code 1. Default behavior (when omitted): fail only on error-severity findings.
warn_on: The minimum tier to include in output. Tiers below this threshold are suppressed entirely. Default: housekeeping (show everything).
The --strict flag overrides fail_on and causes any finding at all to produce exit code 1.
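The threshold semantics can be modeled as follows (an illustrative sketch, assuming the tier ordering critical < drift < housekeeping and findings represented as dicts with tier and severity fields):

```python
# Lower rank means more severe.
TIER_RANK = {"critical": 1, "drift": 2, "housekeeping": 3}

def visible(findings, warn_on="housekeeping"):
    """Keep findings at or above the warn_on tier; suppress the rest."""
    return [f for f in findings if TIER_RANK[f["tier"]] <= TIER_RANK[warn_on]]

def exit_code(findings, fail_on=None, strict=False):
    """1 if any finding meets the failure threshold, else 0."""
    if strict and findings:
        return 1  # --strict: any finding at all fails
    if fail_on is not None:
        return int(any(TIER_RANK[f["tier"]] <= TIER_RANK[fail_on] for f in findings))
    # Default when fail_on is omitted: fail only on error-severity findings.
    return int(any(f["severity"] == "error" for f in findings))
```

For example, two warning-severity findings (one drift, one housekeeping) pass by default, fail with fail_on: drift, and fail with --strict.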
LLM reconciliation tiers
The /notarai-reconcile skill and notarai export-context prompt also use the three-tier classification. When the LLM evaluates semantic drift, it classifies each finding as:
- Critical: Code contradicts a spec constraint, invariant, or behavior.
- Drift: Code has changed in ways not reflected in the spec, or new code is not covered by any behavior.
- Housekeeping: Documentation references outdated information, or style/naming diverges from spec conventions.
Drift Scoring
notarai score computes a numeric drift score for each spec so you can
prioritize reconciliation work. Scores are deterministic, produced
without LLM calls, and always informational (exit code 0).
Score range and status
Each spec gets a score in [0.0, 1.0]:
| Range | Status | Meaning |
|---|---|---|
| 0.0 - 0.3 | healthy | Spec is aligned with code and docs. |
| 0.3 - 0.6 | review | Some drift signals; review soon. |
| 0.6 - 1.0 | overdue | High drift; reconcile now. |
The overall score is the mean across all specs.
Signals
Six weighted signals contribute to each spec’s score. All weights are
configurable via .notarai/scoring.yaml; defaults are shown below.
| Signal | Default weight | What it measures |
|---|---|---|
| files_changed | 0.30 | Governed files changed since last reconciliation. |
| days_since_reconciliation | 0.20 | Days since the reconciliation state was updated. |
| unresolved_decisions | 0.15 | Proposed (not yet accepted) decisions for the spec. |
| orphaned_globs | 0.15 | Artifact glob patterns that match zero files. |
| open_questions | 0.10 | open_questions entries in the spec YAML. |
| unspecced_files | 0.10 | Tracked files in governed directories not covered. |
Each signal is normalized to [0.0, 1.0] before being multiplied by
its weight. The final score is clamped to 1.0.
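A minimal sketch of the scoring arithmetic, assuming each raw signal has already been normalized to [0.0, 1.0] (the normalization itself is internal to notarai score):

```python
DEFAULT_WEIGHTS = {
    "files_changed": 0.30,
    "days_since_reconciliation": 0.20,
    "unresolved_decisions": 0.15,
    "orphaned_globs": 0.15,
    "open_questions": 0.10,
    "unspecced_files": 0.10,
}

def drift_score(signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of normalized signals, clamped to 1.0."""
    raw = sum(weights[name] * max(0.0, min(1.0, value))
              for name, value in signals.items())
    return round(min(raw, 1.0), 2)

def status(score):
    """Map a score to its status band: < 0.3 healthy, < 0.6 review, else overdue."""
    if score < 0.3:
        return "healthy"
    if score < 0.6:
        return "review"
    return "overdue"
```

A spec with all governed files changed (1.0), moderately stale state (0.5), and a few unspecced files (0.2) scores 0.30 + 0.10 + 0.02 = 0.42, landing in the review band.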
Usage
notarai score
notarai score --format json
notarai score --spec .notarai/cli.spec.yaml
Human output
Spec Score Status
-----------------------------------------------------------------
.notarai/cli.spec.yaml 0.42 review
.notarai/system.spec.yaml 0.18 healthy
Overall: 0.30 (review)
JSON output
{
"specs": [
{
"spec_path": ".notarai/cli.spec.yaml",
"score": 0.42,
"status": "review"
},
{
"spec_path": ".notarai/system.spec.yaml",
"score": 0.18,
"status": "healthy"
}
],
"overall": { "score": 0.3, "status": "review" }
}
Configuration
Create .notarai/scoring.yaml to override default weights:
files_changed: 0.4
days_since_reconciliation: 0.1
unresolved_decisions: 0.2
orphaned_globs: 0.1
open_questions: 0.1
unspecced_files: 0.1
Weights do not need to sum to 1.0, but keeping them normalized makes the resulting score easier to interpret.
MCP integration
The initialize response from notarai mcp includes:
- driftScore – overall score (0.0-1.0)
- driftStatus – healthy / review / overdue
- mostDrifted – spec path with the highest score
This lets agents quickly orient without running a separate command.
Troubleshooting Reconciliation
Common issues encountered during reconciliation and how to resolve them.
“Reconciliation flags too many false positives”
Symptom: Reconciliation reports drift on files that have not meaningfully changed, or flags changes that are intentional.
Causes and fixes:
- Overly broad artifact globs. A glob like src/**/*.rs may include utility files, generated code, or test helpers that change frequently without affecting the spec's intent. Narrow the globs to match only the files the spec actually governs.
- Missing exclude patterns. Generated files, lock files, and build artifacts should be excluded at the system spec level. Add patterns to the exclude array in your system spec.
- Spec covers too much. If a single spec governs 50+ files, it will flag drift on every PR that touches any of them. Split large specs into focused subsystems.
“Reconciliation misses obvious drift”
Symptom: Code has clearly diverged from the spec, but reconciliation reports no findings.
Causes and fixes:
- Stale cache. The BLAKE3 cache may show files as unchanged if they were marked reconciled after a previous session. Try running with bypass_cache: true on the MCP get_spec_diff tool, or clear the cache: notarai cache clear
- Behaviors do not cover the changed area. Reconciliation evaluates drift against the behaviors listed in the spec. If the changed code is not described by any behavior, the LLM has nothing to compare against. Add behaviors for the critical paths you want monitored.
- Wrong base branch. The reconciliation prompt compares against a base branch (default: main). If you are working on a long-lived feature branch, the diff may not include the changes you expect. Specify the correct base: notarai export-context --all --base-branch develop
“Context window is too large”
Symptom: The reconciliation prompt exceeds the LLM’s context window, or the LLM produces shallow analysis because the context is too dense.
Causes and fixes:
- Use exclude_patterns in get_spec_diff. The MCP tool accepts an exclude_patterns array of glob strings that suppress noisy files from the diff output. Common candidates: lock files, snapshot files, auto-generated code.
- Split large specs into subsystems. A spec governing 30+ files produces a large diff. Break it into focused subsystems with $ref links. Reconciliation processes each spec independently, keeping context proportional to each subsystem's changes.
- Let the cache work. Files that have not changed since the last reconciliation are automatically excluded from the diff. Run notarai cache status to verify the cache is populated. If it is empty, the first reconciliation will include everything; subsequent runs will be incremental.
“Spec and code intentionally diverged”
Symptom: You know the code has changed in ways that do not match the spec, and you want to acknowledge this without updating the spec immediately.
Fix: Add a decisions entry to the spec explaining the divergence:
decisions:
- date: '2026-04-10'
choice: 'Temporarily diverge from retry spec during migration'
rationale: >
The old retry logic is being replaced incrementally. The spec
describes the target state. Code will converge over the next
two sprints.
Then mark the affected files as reconciled so they do not trigger repeated warnings:
# Via MCP (in a Claude Code session):
# The mark_reconciled tool updates the cache for specified files.
# Via CLI:
notarai state snapshot
“notarai check reports errors but reconciliation says everything is fine”
Symptom: notarai check finds issues (orphaned globs, coverage gaps) but the LLM-based reconciliation does not mention them.
Explanation: notarai check runs deterministic, structural checks. LLM-based reconciliation evaluates semantic alignment. They are complementary:
- check catches: orphaned globs, coverage gaps, overlapping specs, circular refs, stale cache entries
- reconciliation catches: behavioral drift, outdated constraints, misaligned documentation
Run both. Use notarai check in CI for fast, deterministic feedback. Use /notarai-reconcile during development for deeper semantic analysis.
“Bootstrap interview produces a spec that does not validate”
Symptom: The spec generated by /notarai-bootstrap fails notarai validate.
Fix: This usually means the LLM produced a field or value not in the JSON Schema. Common issues:
- Unknown artifact category. The schema allows code, docs, tests, configs, and several others. If the LLM invented a category like scripts, rename it to a supported one or use a custom key.
- Missing required fields. Every spec needs schema_version, intent, and artifacts. If the LLM omitted one, add it.
- Wrong schema_version. The LLM may have used an older version string. Run notarai schema-bump to update all specs to the current version.
CLI Commands
NotarAI is distributed as a single static binary with no runtime dependencies. All commands use the notarai prefix.
notarai validate
Validate spec files against the JSON Schema.
# Validate all specs in .notarai/ (default)
notarai validate
# Validate a specific file
notarai validate .notarai/auth.spec.yaml
# Validate a directory
notarai validate .notarai/subsystems/
Arguments:
| Argument | Required | Description |
|---|---|---|
| path | No | File or directory to validate. Defaults to .notarai/ |
Behavior:
- Single file: validates against the schema, prints PASS or FAIL with indented errors.
- Directory: recursively finds all .spec.yaml files and validates each.
- No specs found: exits 0 with a warning on stderr.
- Stale schema warning: if .notarai/notarai.spec.json exists but its $id differs from the bundled schema, prints a warning suggesting notarai init to update.
Exit codes: 0 all files pass, 1 any file fails.
notarai check
Deterministic, LLM-free drift detection. Reports coverage gaps, orphaned globs, changed files, overlapping coverage, circular $ref chains, and incomplete behaviors.
# Human-readable output (default)
notarai check
# JSON output
notarai check --format json
# Custom base branch
notarai check --base-branch develop
# Strict mode: promote all warnings to errors (useful for CI)
notarai check --strict
Arguments:
| Flag | Required | Default | Description |
|---|---|---|---|
| --format | No | human | Output format: human or json |
| --base-branch | No | main | Base branch for changed-file detection |
| --strict | No | false | Promote all warnings to errors (zero-tolerance CI) |
Checks performed:
| Check | Severity | Tier | Description |
|---|---|---|---|
| Orphaned globs | Error | Critical | Artifact glob patterns matching zero files |
| Circular $ref chains | Error | Critical | Cycles in subsystems, applies, or dependencies references |
| Changed since reconciliation | Warning | Drift | Governed files changed since last cache update |
| Coverage gaps | Warning | Housekeeping | Tracked files not governed by any spec (minus excludes) |
| Overlapping coverage | Warning | Housekeeping | Files governed by two or more specs |
| Behavior completeness | Warning | Housekeeping | Behaviors missing a given or then field |
| T001 Test coverage missing | Warning | Housekeeping | Tier-1 behavior without a tested_by entry |
| T002 Test path missing | Error | Critical | tested_by.path does not exist on disk |
Lint rules (L001-L011) are also run and merged into check output. See Lint Rules.
Severity tiers: Each finding is classified as Critical, Drift, or Housekeeping. Human output groups findings by tier. JSON output includes a tier field. See Severity Tiers for details.
Configuration: Create .notarai/check.yaml to control CI thresholds:
fail_on: drift # Fail on critical or drift findings
warn_on: drift # Suppress housekeeping from output
With --strict, all warning-severity findings are promoted to errors and any finding causes exit code 1.
The check command never modifies files or the cache database.
Exit codes: 0 no error-severity findings (or no findings at or above fail_on tier), 1 errors found (including warnings promoted under --strict), 2 not initialized (.notarai/ missing).
notarai lint
Lint spec files for quality issues beyond JSON Schema conformance. A superset of notarai validate that checks semantic quality.
# Human-readable output (default)
notarai lint
# JSON output
notarai lint --format json
| Flag | Default | Description |
|---|---|---|
| --format | human | Output format: human or json |
Runs 11 deterministic rules (L001-L011) covering missing behaviors, broken $ref targets, stale decisions, schema mismatches, and more. Rules can be configured via .notarai/lint.yaml. Lint results are also integrated into notarai check.
See Lint Rules for the full rule reference.
Exit codes: 0 no error-severity findings, 1 errors found, 2 not initialized.
notarai decisions
Manage decision proposals from reconciliation. Proposals are stored in .notarai/decision-log.json and can be accepted (appended to the spec’s decisions array) or rejected (marked in the log with an optional reason).
notarai decisions list
# List all decisions
notarai decisions list
# Filter by status
notarai decisions list --status proposed
| Flag | Default | Description |
|---|---|---|
| --status | (all) | Filter: proposed, accepted, or rejected |
notarai decisions accept
notarai decisions accept .notarai/auth.spec.yaml 0
Accepts the proposal at the given index: removes it from the log, appends { date, choice, rationale } to the spec’s YAML decisions array, and validates the spec afterward.
notarai decisions reject
notarai decisions reject .notarai/auth.spec.yaml 0 --reason "Not relevant"
Marks the proposal as rejected in the log. Does not modify the spec. The optional --reason flag records why the decision was rejected.
Exit codes: 0 success, 1 error, 2 not initialized.
notarai score
Compute drift scores for each spec. Deterministic, no LLM calls.
Exit code is always 0 (informational). See the
Drift Scoring guide for signal details
and configuration.
notarai score
notarai score --format json
notarai score --spec .notarai/cli.spec.yaml
| Flag | Default | Description |
|---|---|---|
| --format | human | Output format: human or json. |
| --spec | (all) | Score a single spec by path. |
Scores are in [0.0, 1.0] with thresholds: < 0.3 healthy,
< 0.6 review, otherwise overdue.
notarai init
Set up NotarAI in a project. Running init again is safe: it always refreshes skills and the schema copy.
# Interactive prompt (defaults to claude)
notarai init
# Explicit agent selection
notarai init --agents claude
notarai init --agents opencode
notarai init --agents claude,gemini
# All known adapters
notarai init --agents all
# Agent-agnostic artifacts only (no adapter-specific setup)
notarai init --agents none
# Deprecated alias (claude -> claude, generic -> opencode)
notarai init --agent claude
notarai init --agent generic
Arguments:
| Flag | Required | Description |
|---|---|---|
| --agents | No | Comma-separated list of agents: claude, gemini, codex, opencode, plus meta-tokens all and none. Prompts interactively if omitted and stdin is a TTY; auto-detects if stdin is not a TTY |
| --agent | No | Deprecated alias for --agents. claude maps to --agents claude; generic maps to --agents opencode |
Shared setup (all modes):
- Copies notarai.spec.json to .notarai/notarai.spec.json (always refreshed).
- Writes .notarai/README.md with workflow instructions (always overwritten).
- Writes .notarai/reconcile-prompt.md (reconciliation prompt template).
- Writes .notarai/bootstrap-prompt.md (bootstrap prompt template).
- Appends .notarai/.cache/ to .gitignore.
- Writes .mcp.json registering notarai mcp as a local MCP server.
- Writes or section-merges AGENTS.md so user content outside the ## NotarAI section is preserved.
Per-adapter setup (for each selected adapter):
- If the adapter declares a pointer file (CLAUDE.md, GEMINI.md), creates it as a single-line @AGENTS.md stub when absent, leaves it unchanged when it already contains @AGENTS.md, or section-merges a ## NotarAI block when it has other content.
- If the adapter declares a skills directory, always overwrites SKILL.md for notarai-reconcile and notarai-bootstrap (Claude-flavor for the claude adapter, generic-flavor for all others).
- If the adapter declares a hook installer, installs it (only claude installs a PostToolUse hook in .claude/settings.json).
Exit codes: 0 success, 1 error (unparseable JSON, unknown agent, symlink pointer file, non-directory skills path).
notarai export-context
Export reconciliation context for any LLM agent. Outputs spec content, changed files, and diffs in a format suitable for feeding into a reconciliation prompt.
# Single spec, markdown output (default)
notarai export-context --spec .notarai/auth.spec.yaml
# All affected specs, JSON output
notarai export-context --all --format json
# Custom base branch
notarai export-context --spec .notarai/api.spec.yaml --base-branch develop
Arguments:
| Flag | Required | Default | Description |
|---|---|---|---|
| --spec | One of the two | | Path to a single spec file |
| --all | One of the two | | Export context for all affected specs |
| --base-branch | No | main | Base branch for diff |
| --format | No | markdown | Output format: markdown or json |
Exactly one of --spec or --all is required.
Markdown output fills the bundled reconcile-prompt.md template with spec content, changed file list, and diff. Multiple specs are separated by ---.
JSON output includes spec_path, spec_name, spec_content, changed_files, diff, binary_changes, and file_categories. A single spec produces an object; --all with multiple specs produces an array.
Exit codes: 0 success, 1 error (bad arguments, missing spec, git failure), 2 not initialized (.notarai/ missing).
notarai schema-bump
Update the schema version across all specs in the project.
notarai schema-bump
Detects the schema version in .notarai/notarai.spec.json (if it exists) and compares it to the bundled schema. If they differ:
- Overwrites .notarai/notarai.spec.json with the bundled schema.
- Updates the schema_version field in every .notarai/*.spec.yaml file.
- Validates all updated specs and reports any failures.
If versions already match, prints “Already at current schema version” and exits 0.
Exit codes: 0 success or already current, 1 validation error after update.
notarai hook validate
PostToolUse hook handler. Validates spec files when Claude Code writes or edits them.
# Called automatically by Claude Code, not typically invoked manually
notarai hook validate
Reads PostToolUse JSON from stdin. If the file path matches .notarai/**/*.spec.yaml, reads the file from disk and validates it. Invalid specs block the tool use with errors on stderr.
Behavior:
| Stdin | Result |
|---|---|
| Spec file path (.notarai/**/*.spec.yaml) | Validates; exits 1 with errors if invalid |
| Non-spec file path | Exits 0 silently |
| Invalid JSON or missing file | Exits 0 silently (graceful degradation) |
Exit codes: 0 valid or non-spec file, 1 invalid spec.
notarai cache
BLAKE3 + SQLite hash cache for tracking file changes between reconciliation runs. The cache database lives at .notarai/.cache/notarai.db.
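The change-detection idea behind the cache can be sketched as follows. NotarAI uses BLAKE3; this illustration substitutes SHA-256 from the Python standard library and an in-memory table for the on-disk database:

```python
import hashlib
import sqlite3

def open_cache(db_path):
    """Open (or create) a minimal path -> digest table."""
    db = sqlite3.connect(db_path)
    db.execute("CREATE TABLE IF NOT EXISTS hashes (path TEXT PRIMARY KEY, digest TEXT)")
    return db

def file_changed(db, path, content):
    """True if content differs from the cached digest; updates the cache."""
    digest = hashlib.sha256(content).hexdigest()
    row = db.execute("SELECT digest FROM hashes WHERE path = ?", (path,)).fetchone()
    db.execute("INSERT OR REPLACE INTO hashes VALUES (?, ?)", (path, digest))
    return row is None or row[0] != digest
```

The first sight of a file counts as changed; a second look at identical content does not; edited content flips it back to changed.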
notarai cache status
Show cache status: database path, entry count, and newest entry timestamp.
notarai cache status
Creates an empty database if none exists.
Exit codes: 0 success, 1 error.
notarai cache clear
Delete the cache database.
notarai cache clear
Prints Cache cleared, or Cache not initialized if the database file does not exist (in which case the command is a no-op).
Exit codes: 0 success, 1 error.
notarai state
Manage the persistent reconciliation state file (.notarai/reconciliation_state.json). The state file records the last reconciliation timestamp, git hash, branch, and BLAKE3 fingerprints for all governed files and specs. It can be committed to the repo to give collaborators a baseline.
notarai state show
Display the current reconciliation state.
notarai state show
Prints the timestamp, git hash, branch, and counts of tracked files and specs. Prints No reconciliation state found. if no state file exists.
Exit codes: 0 success, 1 error.
notarai state reset
Delete the reconciliation state file, forcing the next reconciliation to treat everything as changed.
notarai state reset
Prints Reconciliation state reset. or No reconciliation state to reset. (if the file didn’t exist).
Exit codes: 0 success, 1 error.
notarai state snapshot
Build a new state snapshot from the current SQLite cache and save it to .notarai/reconciliation_state.json.
notarai state snapshot
Reads all entries from the cache, partitions them into file fingerprints and spec fingerprints, captures the current git HEAD and branch, and writes the result. This is the CLI equivalent of the snapshot_state MCP tool.
Exit codes: 0 success, 1 error.
notarai update
Check for and install updates.
# Check if an update is available
notarai update --check
# Update to the latest version
notarai update
Arguments:
| Flag | Required | Description |
|---|---|---|
| --check | No | Only check, don’t install |
Behavior:
The command queries the GitHub API for the latest release, compares its version against the current binary, and prints the result. Without --check, it also attempts to install the update:
| Install method | Detection | Action |
|---|---|---|
| GitHub Release | Binary is not in .cargo/bin or target/ | Downloads and replaces the binary in place |
| cargo install | Binary path contains .cargo/bin | Prints cargo install notarai |
| Dev build | Debug build or path contains target/ | Prints cargo install --path crates/notarai |
Passive update hints:
notarai validate and notarai init automatically check for updates in the background using a global cache with a 24-hour TTL and a 5-second network timeout. If a newer version is available, a one-line hint is printed to stderr. All errors are silently swallowed — the hint never interferes with normal output.
Exit codes: 0 success or up to date, 1 error or update failure.
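The passive-hint gating described above amounts to a simple TTL check. A sketch, assuming last_checked is a Unix timestamp read from the global cache:

```python
import time

TTL_SECONDS = 24 * 60 * 60  # re-check GitHub at most once per day

def should_check_for_update(last_checked, now=None):
    """Skip the network call while the cached result is within the TTL."""
    now = time.time() if now is None else now
    return last_checked is None or now - last_checked >= TTL_SECONDS
```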
notarai mcp
Start a synchronous JSON-RPC 2.0 MCP server over stdio. Typically configured automatically by notarai init rather than invoked manually.
notarai mcp
The server reads JSON-RPC messages line-by-line from stdin and writes responses to stdout. It exits cleanly on stdin EOF.
Protocol: JSON-RPC 2.0 over stdio (synchronous, no async runtime).
Setup: notarai init writes .mcp.json to the project root, which Claude Code reads to auto-start the server:
{
"mcpServers": {
"notarai": {
"type": "stdio",
"command": "notarai",
"args": ["mcp"]
}
}
}
See the MCP Server reference for the full tool API, parameters, and return shapes.
Exit codes: 0 on stdin EOF.
Lint Rules
notarai lint checks spec quality beyond JSON Schema conformance. Each rule has a stable ID, a default severity, and a description. Rules can be configured per-project via .notarai/lint.yaml.
Rules
| Rule | Default Severity | Description |
|---|---|---|
| L001 | error | Tier 1 (full) spec has zero behaviors. A full spec should describe at least one behavior. |
| L002 | warning | Behavior missing given field. The trigger condition is unspecified. |
| L003 | warning | Behavior missing then field. The expected outcome is unspecified. |
| L004 | error | $ref target file does not exist on disk. A subsystem, applies, or dependency reference points to a missing file. |
| L005 | warning | Circular $ref dependency detected. Spec reference chains form a cycle. |
| L006 | warning | Decision older than 90 days with no rationale. Stale decisions without context lose value over time. |
| L007 | info | Spec has open_questions. Consider resolving before reconciliation. |
| L008 | warning | Artifact glob is **/* (overly broad). Likely matches unintended files. |
| L009 | error | schema_version does not match the bundled schema. Run notarai schema-bump to update. |
| L010 | warning | Duplicate behavior names within a spec. Each behavior should have a unique name. |
| L011 | error | Cross-cutting spec (cross_cutting: true) referenced from another spec’s subsystems. Move it to applies. |
Usage
# Human-readable output (default)
notarai lint
# JSON output for CI
notarai lint --format json
Exit codes:
- 0: No error-severity findings.
- 1: At least one error-severity finding.
- 2: .notarai/ directory not found.
Configuration
Create .notarai/lint.yaml to customize rule behavior:
rules:
L006:
severity: info # downgrade from warning
decision_age_days: 180 # custom threshold (default: 90)
L007:
enabled: false # disable rule entirely
L008:
severity: error # promote to error
Each rule supports:
- enabled (bool): Set to false to disable the rule. Default: true.
- severity (string): Override the default severity. Values: error, warning, info.
- Rule-specific parameters (e.g., decision_age_days for L006).
Integration with notarai check
Lint rules run automatically as part of notarai check. Findings from L002, L003, and L005 are deduplicated against their check equivalents (BehaviorIncomplete and CircularRef). All other lint errors count as check errors and affect the exit code.
JSON Output Format
{
"findings": [
{
"rule_id": "L001",
"severity": "error",
"spec_path": ".notarai/auth.spec.yaml",
"message": "Tier 1 (full) spec has zero behaviors: .notarai/auth.spec.yaml"
}
],
"summary": {
"errors": 1,
"warnings": 0,
"infos": 0
}
}
Rule IDs are stable across versions and will never be renumbered.
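The JSON report above is easy to consume in CI. Here is a minimal, hypothetical gate script; the field names come from the format above, while the optional warning budget is a local extension, not a notarai flag:

```python
import json

def gate(report_text, max_warnings=None):
    """Map a notarai lint --format json report to a CI exit code:
    errors always fail; the warning budget is a local extension."""
    summary = json.loads(report_text)["summary"]
    if summary["errors"] > 0:
        return 1
    if max_warnings is not None and summary["warnings"] > max_warnings:
        return 1
    return 0

sample = """{
  "findings": [
    {"rule_id": "L001", "severity": "error",
     "spec_path": ".notarai/auth.spec.yaml",
     "message": "Tier 1 (full) spec has zero behaviors"}
  ],
  "summary": {"errors": 1, "warnings": 0, "infos": 0}
}"""
print(gate(sample))  # 1: at least one error-severity finding
```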
MCP Server
NotarAI includes a built-in Model Context Protocol (MCP) server that serves pre-filtered diffs and change data to the reconciliation engine. This keeps context usage proportional to what actually changed rather than the full repository.
Setup
notarai init writes an .mcp.json file to the project root that registers the MCP server:
{
"mcpServers": {
"notarai": {
"type": "stdio",
"command": "notarai",
"args": ["mcp"]
}
}
}
Claude Code reads this file and starts the server automatically. No manual configuration needed.
Protocol
- Transport: stdio (stdin/stdout)
- Format: JSON-RPC 2.0, one message per line
- Execution: synchronous (no async runtime)
- Protocol version: 2024-11-05
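The one-message-per-line framing means a client simply writes newline-terminated JSON-RPC objects to the server's stdin. A minimal, hypothetical client-side framing helper (tools/call is the standard MCP method name; the tool name and argument match the Tools section of this reference):

```python
import json

def frame_request(request_id, method, params=None):
    """Serialize a JSON-RPC 2.0 message as one newline-terminated line,
    matching the server's line-delimited stdio transport."""
    msg = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"

line = frame_request(1, "tools/call", {
    "name": "list_affected_specs",
    "arguments": {"base_branch": "main"},
})
# Exactly one message per line: no embedded newlines before the terminator.
assert line.endswith("\n") and "\n" not in line[:-1]
print(json.loads(line)["method"])  # tools/call
```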
Initialize response
The initialize response includes standard MCP fields (protocolVersion, capabilities, serverInfo, tools). When the local schema (.notarai/notarai.spec.json) is out of date relative to the bundled schema, the response includes an additional schemaNote field:
{
"schemaNote": "Schema is out of date (local: .../0.5/..., bundled: .../0.6/...). Run `notarai init` to update."
}
This surfaces schema staleness to Claude at session start without requiring a separate check.
When the project’s NotarAI configs are behind the running CLI version (detected via the version in .notarai/README.md), the response includes an additional projectNote field:
{
"projectNote": "hint: project was initialized with notarai v0.3.1. Run `notarai init` to update project configs to v0.3.2."
}
This surfaces project config staleness to Claude at session start so reconciliation uses up-to-date slash commands and schema.
The response also includes a drift score snapshot so agents can prioritize reconciliation work without calling a separate tool:
{
"driftScore": 0.42,
"driftStatus": "review",
"mostDrifted": ".notarai/cli.spec.yaml"
}
See the Drift Scoring guide for signal details.
Tools
list_affected_specs
Identify which specs govern files that changed on the current branch relative to a base branch.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
base_branch | string | Yes | Branch to diff against (e.g., "main") |
Returns:
{
"changed_files": ["src/auth.rs", "src/main.rs"],
"affected_specs": [
{
"spec_path": ".notarai/cli.spec.yaml",
"behaviors": [],
"constraints": [],
"invariants": []
}
]
}
Each affected spec includes its behaviors, constraints, and invariants so the reconciliation engine has the context to evaluate drift without additional file reads.
get_spec_diff
Get the git diff filtered to files governed by a specific spec. Uses the hash cache to skip files that haven’t changed since the last reconciliation.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
spec_path | string | Yes | Relative path to the spec file |
base_branch | string | Yes | Branch to diff against |
exclude_patterns | string[] | No | Glob patterns to exclude via git :(exclude) pathspecs (e.g., ["Cargo.lock", "*.lock"]) |
bypass_cache | boolean | No | If true, diff all governed files regardless of cache state. Defaults to false |
Returns:
{
"diff": "unified diff of non-spec governed files...",
"files": ["src/auth.rs"],
"skipped": ["src/utils.rs"],
"excluded": ["Cargo.lock"],
"spec_changes": [
{
"path": ".notarai/cli.spec.yaml",
"content": "full file content..."
}
],
"system_spec": {
"path": ".notarai/system.spec.yaml",
"content": "full file content..."
},
"binary_changes": ["assets/logo.png", "slides/deck.pptx"],
"file_categories": {
"src/auth.rs": "code",
"docs/auth.md": "docs",
"assets/logo.png": "assets"
},
"spec_invalidated": ["src/utils.rs"]
}
| Field | Description |
|---|---|
diff | Unified diff output for non-spec, non-binary artifact files only |
files | Non-spec files included in the diff (includes binary files by path, but their content is in binary_changes) |
skipped | Non-spec files whose BLAKE3 hash matched the cache (already reconciled) |
excluded | Patterns passed via exclude_patterns |
spec_changes | Array of {path, content} for each governed .notarai/**/*.spec.yaml file that changed |
system_spec | The system spec (the spec with a subsystems key) – included whenever spec_changes is non-empty; null otherwise |
binary_changes | File paths of binary files (images, PPTX, PDF, etc.) whose content cannot be usefully diffed |
file_categories | Object mapping each changed file path to its artifact category from the spec (e.g., "code", "docs", "assets") |
spec_invalidated | Cached artifact paths whose governing spec has changed since last reconciliation, indicating they need review even though the artifacts themselves have not changed on disk. Empty when bypass_cache is true |
Why full content for spec files?
Spec files express intent, not implementation. The reconciliation engine needs the complete spec to evaluate drift – diff hunks showing only changed lines lack the context to determine whether behavior is still satisfied. Returning full content also avoids the ambiguity of partial context when the spec is the source of truth.
Spec deduplication: If the system spec itself changed, it appears in spec_changes with full content and system_spec contains only {path} (a reference) to avoid duplicating the content.
Cache behavior:
- Files whose on-disk BLAKE3 hash matches the cached hash are listed in skipped (for artifact files) or omitted from spec_changes (for spec files).
- A cold or absent cache causes all governed files to be included. This is a safe fallback that ensures nothing is missed.
- bypass_cache: true forces a full diff without destroying the cache (useful for re-checking everything).
get_changed_artifacts
Get artifact files governed by a spec that have changed since the last cache update. Useful for identifying which docs or other artifacts need review during reconciliation.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
spec_path | string | Yes | Relative path to the spec file |
artifact_type | string | No | Filter by artifact type (e.g., "docs", "code", "configs") |
Returns:
{
"changed_artifacts": ["docs/auth.md", "docs/api-reference.md"],
"spec_invalidated": ["docs/overview.md"]
}
| Field | Description |
|---|---|
changed_artifacts | Files whose on-disk content differs from the cached hash |
spec_invalidated | Cached artifact paths whose governing spec has changed since last reconciliation, indicating they need review even though the artifacts themselves have not changed on disk |
If no artifact_type is specified, all artifact types are checked. When artifact_type is set, both changed_artifacts and spec_invalidated are filtered to that type.
mark_reconciled
Update the hash cache after reconciliation is complete. Call this at the end of a reconciliation pass so that subsequent runs skip files that haven’t changed.
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
files | string[] | Yes | Relative file paths to cache |
Returns:
{
"updated": 5
}
Files are hashed with BLAKE3 and stored with their relative paths as cache keys. Non-existent files are silently skipped.
clear_cache
Delete the reconciliation cache database, forcing the next get_spec_diff call to diff all governed files.
Parameters: None.
Returns:
{
"cleared": true
}
Returns true if the database was deleted, false if it didn’t exist.
snapshot_state
Persist the current reconciliation cache as a state snapshot at .notarai/reconciliation_state.json. Call this at the end of a successful reconciliation pass.
Parameters: None.
Returns:
{
"state_path": ".notarai/reconciliation_state.json",
"files": 42,
"specs": 5,
"git_hash": "a1b2c3d..."
}
| Field | Description |
|---|---|
state_path | Absolute path where the state file was written |
files | Number of non-spec file fingerprints stored |
specs | Number of spec fingerprints stored |
git_hash | git HEAD at snapshot time (empty string if not in a repo) |
The state file is pretty-printed JSON and safe to commit. It gives collaborators a baseline so subsequent get_spec_diff calls can skip files that haven’t changed since the last reconciliation. Use notarai state show / notarai state reset to inspect or clear state from the CLI.
Cache semantics
The cache is a SQLite database at .notarai/.cache/notarai.db with a single table:
file_cache(path TEXT PRIMARY KEY, blake3_hash TEXT, updated_at INTEGER)
Key details:
- Hash algorithm: BLAKE3 – fast cryptographic hash.
- Path format: MCP tools use relative paths as cache keys. Seed the MCP cache via mark_reconciled, not notarai cache update.
- Cold cache: When the cache is empty or absent, get_spec_diff diffs all governed files. This is the safe default.
- Cache location: .notarai/.cache/ is gitignored by notarai init so the cache is never committed.
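The skip logic implied by this table can be sketched in a few lines. This is an illustrative sketch of the documented schema, not notarai's code, and hashlib.sha256 stands in for BLAKE3, which is not in the Python stdlib:

```python
import hashlib
import sqlite3

# In-memory stand-in for .notarai/.cache/notarai.db with the documented table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_cache("
    "path TEXT PRIMARY KEY, blake3_hash TEXT, updated_at INTEGER)"
)

def mark_reconciled(path, content, now):
    """Upsert the content hash for a relative path, as the MCP
    mark_reconciled tool does after a reconciliation pass."""
    digest = hashlib.sha256(content).hexdigest()  # BLAKE3 in the real tool
    conn.execute(
        "INSERT OR REPLACE INTO file_cache VALUES (?, ?, ?)",
        (path, digest, now),
    )

def is_unchanged(path, content):
    """True when the on-disk hash matches the cache (file is skipped)."""
    row = conn.execute(
        "SELECT blake3_hash FROM file_cache WHERE path = ?", (path,)
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(content).hexdigest()

mark_reconciled("src/auth.rs", b"fn login() {}", 1)
print(is_unchanged("src/auth.rs", b"fn login() {}"))   # True
print(is_unchanged("src/auth.rs", b"fn login() { }"))  # False
```

A missing row behaves like a cold cache: the file is treated as changed, which matches the safe default above.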
Error codes
| Code | Meaning |
|---|---|
-32700 | Parse error (malformed JSON) |
-32601 | Method not found |
-32602 | Invalid params (missing required parameter) |
-32603 | Internal error (git failure, file I/O, cache unavailable) |
GitHub Action
NotarAI provides a composite GitHub Action that runs notarai check on pull requests and posts a summary comment. No Rust toolchain is required on the runner.
Setup
Add a workflow file to your repository:
# .github/workflows/notarai.yml
name: NotarAI Check
on:
pull_request:
branches: [main]
permissions:
contents: read
pull-requests: write
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: davidroeca/NotarAI/crates/notarai-action@v0.7.0
The action downloads the notarai binary from GitHub Releases, runs the check, and posts (or updates) a PR comment with the results.
Inputs
| Input | Default | Description |
|---|---|---|
version | latest | NotarAI version to install |
base-branch | main | Branch to diff against for changed-file detection |
strict | false | Promote all warnings to errors (fail on any drift) |
comment | true | Post a PR comment with findings |
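For reference, all four inputs can be set together in one step. The values here are illustrative:

```yaml
- uses: davidroeca/NotarAI/crates/notarai-action@v0.7.0
  with:
    version: latest        # or a pinned release version
    base-branch: develop   # diff against develop instead of main
    strict: 'false'
    comment: 'true'
```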
What it does
- Detects platform: determines runner OS and architecture.
- Downloads binary: fetches the matching notarai release binary from GitHub Releases.
- Runs check: executes notarai check --format json --base-branch <base-branch> (with --strict if enabled).
- Posts comment: renders findings into a Markdown comment grouped by type, with collapsible details.
- Sets exit code: fails the step if any error-severity findings are present.
PR comment
The comment includes a summary line and collapsible sections for each finding type:
## NotarAI Drift Check
113 finding(s) | 0 error(s) | 113 warning(s)
> Orphaned globs (1)
> - Artifact glob matches no files: src/legacy/** (in legacy.spec.yaml)
> Coverage gaps (3)
> - File not governed by any spec: src/new_module.rs
> - ...
Re-runs update the existing comment in place rather than posting duplicates. The comment is identified by a <!-- notarai-action --> HTML marker.
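The update-in-place behavior can be sketched as a marker-based upsert. This is a hypothetical helper, not the action's actual code:

```python
MARKER = "<!-- notarai-action -->"

def upsert_comment(existing_bodies, new_report):
    """Return (action, index, marked_body): update the existing marked
    comment in place, or create a new one. The HTML marker keeps
    re-runs from stacking duplicate comments."""
    body = f"{MARKER}\n{new_report}"
    for i, existing in enumerate(existing_bodies):
        if MARKER in existing:
            return ("update", i, body)
    return ("create", None, body)

comments = ["LGTM!", MARKER + "\n## NotarAI Drift Check\nold results"]
action, index, body = upsert_comment(comments, "## NotarAI Drift Check\nnew results")
print(action, index)  # update 1
```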
Strict mode
Use strict: 'true' to fail the check on any finding, not just errors:
- uses: davidroeca/NotarAI/crates/notarai-action@v0.7.0
with:
strict: 'true'
This is useful for repositories that want zero-tolerance drift detection in CI.
Uninitialized repositories
If the repository does not have a .notarai/ directory, the action posts a comment noting that NotarAI is not initialized and exits successfully (does not fail the workflow).
Requirements
- Runs on ubuntu-latest (Linux x86_64 or aarch64).
- No Rust toolchain required on the runner.
- Requires pull-requests: write permission for posting comments.
- Uses GITHUB_TOKEN (automatic) for downloading releases and posting comments.
Motivation
The problem
With LLMs generating both code and documentation from natural language prompts, there’s no authoritative representation of intent that persists across changes. Code and docs drift out of sync – and unlike the pre-LLM era where code was the single source of truth, now either artifact can be the one that’s “right.” This is the three-body problem: intent, code, and docs can all diverge.
The idea
Introduce a NotarAI spec – a structured YAML document governed by a JSON Schema – that captures user intent as the canonical source of truth. An LLM acts as the reconciliation engine, keeping code and documentation in sync with the spec (and vice versa).
Coverage model
Four tiers ensure every file in the repo is accounted for without over-specifying:
- Tier 1 (Full Spec): Business logic, APIs, user-facing features – full behaviors and constraints
- Tier 2 (Registered): Utility libs, sidecars, config – just intent + artifact mapping, no behaviors
- Tier 3 (Excluded): Generated code, vendor deps, editor configs – explicitly out of scope
- Tier 4 (Derived): Generated outputs tracked for staleness but not authored directly (build artifacts, compiled bundles)
Anything not covered by any tier is flagged as “unspecced” – a lint warning, not a blocker.
Bootstrap
For existing codebases: ingest code + docs + commit history, then the LLM interviews the developer about goals and undocumented rules, drafts a spec with required fields only, and the user reviews and enriches. The spec accrues precision over time.
Inspirations
See the Inspirations page.
Design Diagrams
All diagrams from the design process, illustrating the NotarAI name and .notarai/ directory convention.
1. The Problem: Pre-LLM vs Current LLM Era
1a. Pre-LLM: Code Is the Spec
flowchart LR
Dev["Developer<br/>(intent in head)"]
Code["Source Code<br/>authoritative spec"]
Docs["Docs<br/>second-class, often stale"]
Dev -->|writes| Code
Code -.->|describes| Docs
1b. Current LLM Era: The Three-Body Problem
flowchart TD
Intent["User Intent<br/>natural language prompt"]
LLM["LLM"]
Code["Source Code"]
Docs["Documentation"]
Intent --> LLM
Intent -.->|"edits directly"| Code
Intent -.->|"edits directly"| Docs
LLM -->|generates| Code
LLM -->|generates| Docs
Code <-..->|"drift / desync"| Docs
2. NotarAI: Spec State File as Single Source of Truth
flowchart TD
Intent["User Intent<br/>natural language"]
Spec["NotarAI Spec<br/>structured intent representation<br/>canonical source of truth"]
LLM["LLM (sync engine)"]
Code["Source Code"]
Docs["Documentation"]
Intent -->|updates| Spec
Spec -->|reads| LLM
LLM -->|derives| Code
LLM -->|derives| Docs
Code -.->|reconcile back| Spec
Docs -.->|reconcile back| Spec
Code <-.->|"always in sync via spec"| Docs
3. Spec File Anatomy
3a. Required Core
# .notarai/auth.spec.yaml
schema_version: '0.6'
intent: |
Users can sign up, log in, and
reset passwords. Sessions expire
after 30 min of inactivity.
behaviors:
- name: 'signup'
given: 'valid email + password'
then: 'account created, welcome email sent'
- name: 'session_timeout'
given: '30 min inactivity'
then: 'session invalidated'
artifacts:
code:
- path: 'src/auth/**'
docs:
- path: 'docs/auth.md'
3b. Optional Extensions
# Power users add precision as needed
constraints:
- 'passwords >= 12 chars'
- 'rate limit: 5 login attempts / min'
invariants:
- 'no plaintext passwords in DB'
- 'all endpoints require HTTPS'
decisions:
- date: '2025-03-12'
choice: 'JWT over session cookies'
rationale: 'stateless scaling'
open_questions:
- 'Should we support OAuth2 providers?'
- 'MFA timeline?'
Design note: The behaviors field uses Given/Then language (BDD-adjacent) but stays in natural language – not formal Gherkin. Structured enough to diff and validate, informal enough that non-engineers can author it.
4. Reconciliation Lifecycle
4a. Scenario A: Human Edits Code
flowchart LR
A1["Human edits code<br/>adds OAuth endpoint"]
A2["LLM detects drift<br/>code != spec behaviors"]
A3["LLM proposes spec update<br/>+ add behavior: oauth_login<br/>+ update docs/auth.md"]
A4["Human approves<br/>or adjusts and approves"]
A1 -->|trigger| A2
A2 -->|reconcile| A3
A3 -->|resolve| A4
4b. Scenario B: Human Edits Spec
flowchart LR
B1["Human edits spec<br/>changes session to 60 min"]
B2["LLM updates code to match"]
B3["LLM updates docs to match"]
B4["Human reviews<br/>code + docs diff<br/>as a single PR"]
B1 -->|direct| B2
B1 -->|direct| B3
B2 --> B4
B3 --> B4
4c. Scenario C: Conflict Detected
flowchart LR
C1["Conflict detected<br/>code says X, spec says Y<br/>docs say Z"]
C2["LLM presents options<br/>spec says X, but code<br/>does Y -- which is right?"]
C3["Human decides intent<br/>LLM propagates decision<br/>across spec + code + docs"]
C4["All three aligned<br/>conflict resolved"]
C1 -->|detect| C2
C2 -->|reconcile| C3
C3 -->|resolve| C4
5. Post-Push Reconciliation in Practice
flowchart LR
S1["Dev + LLM<br/>write code freely<br/>no spec friction"]
S2["git push<br/>or open PR"]
S3["CI hook: LLM reviews<br/>diff vs affected specs<br/>proposes spec updates<br/>proposes doc updates"]
S4["Adds to PR<br/>spec diff + docs diff<br/>alongside code diff"]
S5["Single review<br/>code + spec + docs<br/>all land together or not"]
S1 --> S2 --> S3 --> S4 --> S5
The artifacts field in the spec tells the CI hook which specs are affected by which file paths – so it only reconciles what changed.
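That path-to-spec mapping can be sketched as glob matching over the artifact declarations. The spec paths and globs below are illustrative, and fnmatch is a simplification of real ** glob semantics:

```python
from fnmatch import fnmatch

# Hypothetical artifact globs per spec, as declared in each spec's artifacts field.
SPEC_GLOBS = {
    ".notarai/auth.spec.yaml": ["src/auth/**", "docs/auth.md"],
    ".notarai/billing.spec.yaml": ["src/billing/**"],
}

def affected_specs(changed_files):
    """Return the specs whose artifact globs match any changed file."""
    hits = set()
    for path in changed_files:
        for spec, globs in SPEC_GLOBS.items():
            # Collapse ** to * since fnmatch has no recursive-glob syntax.
            if any(fnmatch(path, g.replace("**", "*")) for g in globs):
                hits.add(spec)
    return sorted(hits)

print(affected_specs(["src/auth/login.rs", "README.md"]))
# ['.notarai/auth.spec.yaml']
```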
6. Spec Composition – The Import Model
6a. Directory Structure
project/
+-- .notarai/
| +-- system.spec.yaml # top-level system spec
| +-- auth.spec.yaml # auth service (Tier 1)
| +-- billing.spec.yaml # billing service (Tier 1)
| +-- api.spec.yaml # API layer (Tier 1)
| +-- utils.spec.yaml # shared utilities (Tier 2)
| +-- redis-cache.spec.yaml # sidecar process (Tier 2)
| +-- _shared/
| +-- security.spec.yaml # cross-cutting
| +-- logging.spec.yaml # cross-cutting
+-- src/
+-- docs/
6b. Composition Relationships
flowchart TD
System["system.spec.yaml<br/>top-level intent + invariants"]
Auth[".notarai/auth.spec.yaml"]
Billing[".notarai/billing.spec.yaml"]
API[".notarai/api.spec.yaml"]
Security["_shared/security.spec.yaml<br/>applies to: all subsystems"]
Logging["_shared/logging.spec.yaml<br/>applies to: all subsystems"]
System -->|"$ref"| Auth
System -->|"$ref"| Billing
System -->|"$ref"| API
Security -.->|applies| Auth
Security -.->|applies| Billing
Security -.->|applies| API
Logging -.->|applies| Auth
Logging -.->|applies| Billing
Logging -.->|applies| API
When the LLM checks auth.spec.yaml, it also loads security.spec.yaml and validates that auth code satisfies both specs’ invariants. Cross-cutting concerns are defined once and enforced everywhere.
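A system spec wiring this composition together might look like the following sketch. The field layout here is assumed from the $ref and applies conventions described in this documentation, not copied from the actual schema:

```yaml
# .notarai/system.spec.yaml – illustrative sketch
schema_version: '0.6'
intent: |
  Top-level system intent and invariants.
subsystems:
  - $ref: './auth.spec.yaml'
  - $ref: './billing.spec.yaml'
  - $ref: './api.spec.yaml'
applies:
  - $ref: './_shared/security.spec.yaml'
  - $ref: './_shared/logging.spec.yaml'
```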
7. Coverage Model – Three Tiers
flowchart LR
subgraph T1["Tier 1: Full Spec"]
T1a["Business logic services"]
T1b["API endpoints"]
T1c["Data models / schemas"]
T1d["Anything user-facing"]
end
subgraph T2["Tier 2: Registered"]
T2a["Utility libraries"]
T2b["Shared helpers / constants"]
T2c["Config files"]
T2d["Sidecar processes"]
end
subgraph T3["Tier 3: Excluded"]
T3a["Generated code / build output"]
T3b["Vendored dependencies"]
T3c["IDE / editor configs"]
T3d["node_modules, .git, etc."]
end
Coverage equation: Tier 1 + Tier 2 + Tier 3 = entire repo
Anything not covered = unspecced (a lint warning, not a block).
8. Bootstrap Flow for Existing Codebases
flowchart LR
S1["1. Ingest<br/>code + docs +<br/>commit history +<br/>README / ADRs"]
S2["2. LLM interviews<br/>What's the goal?<br/>Any undocumented rules?"]
S3["3. Draft spec<br/>required fields only<br/>intent + behaviors +<br/>artifact mappings"]
S4["4. Human review<br/>correct, enrich,<br/>add constraints /<br/>open questions"]
S5["5. Activate<br/>sync engine<br/>watches for drift<br/>from this point on"]
S1 --> S2 --> S3 --> S4 --> S5
Bootstrap starts minimal and accrues precision over time – the spec is a living document.
Comparison to SDD
Spec-driven development (SDD) has emerged as a major pattern for AI-assisted coding, but the term covers several distinct approaches. Birgitta Böckeler’s taxonomy identifies three levels:
- Spec-first: Write a spec, generate code, discard or ignore the spec afterward.
- Spec-anchored: Keep the spec around for ongoing maintenance, but how it stays current is left vague.
- Spec-as-source: The spec replaces code as the primary artifact. People never touch code directly.
Most SDD tools (Kiro, Spec Kit, OpenSpec) are spec-first in practice: they help you go from intent to plan to tasks to code, but once the code exists, the spec quietly goes stale. Superpowers takes the spec-first workflow further with a structured seven-stage methodology and subagent-driven execution, but its plans are task-scoped artifacts. Tessl is exploring spec-as-source, where code is generated from specs and marked “DO NOT EDIT”, but this sacrifices the flexibility of direct code editing.
NotarAI occupies the gap that Böckeler’s taxonomy identifies but no current tool fills: spec-anchored with automated maintenance. The spec persists for the lifetime of the feature, and an LLM reconciliation engine actively keeps it aligned with code and docs as all three evolve.
SDD tools solve the cold-start problem. NotarAI solves the entropy problem.
SDD tools help you write specs. NotarAI helps you keep them true.
- A developer adds a feature – NotarAI detects the spec doesn’t account for it and proposes an update
- A team lead updates the spec – NotarAI propagates the change to code and docs
- Code contradicts a spec constraint – NotarAI flags the conflict and asks the user to decide
The spec isn’t just a blueprint. It’s a witness – a living contract the LLM continuously verifies against reality.
Landscape comparison
| Tool | SDD Level | Direction | Spec Lifespan | Brownfield Support |
|---|---|---|---|---|
| Kiro | Spec-first | Spec -> code | Change request | Limited |
| Spec Kit | Spec-first (aspires to anchored) | Spec -> code | Branch / change request | Limited |
| Tessl | Spec-as-source | Spec -> code (human edits spec only) | Feature lifetime | Reverse-engineering CLI |
| OpenSpec | Spec-first | Spec -> code | Change request | Limited |
| Superpowers | Spec-first (workflow methodology) | Spec -> plan -> subagent execution | Task / branch | Git worktree isolation |
| Semcheck | Compliance checking | Spec -> code (one-way check) | Ongoing | Yes |
| NotarAI | Spec-anchored + active reconciliation | Spec <-> code <-> docs | Feature lifetime | Bootstrap flow with LLM interview |
For a practical, feature-by-feature comparison of NotarAI against specific tools (Spec Kit, OpenSpec, Intent, Kiro), see How NotarAI Compares.
How NotarAI Compares
This page provides a fair, practical comparison of NotarAI against the major spec-driven development tools available today. For the conceptual positioning of NotarAI within SDD taxonomy, see Comparison to SDD.
At a Glance
| Dimension | NotarAI | Spec Kit | OpenSpec | Intent | Kiro |
|---|---|---|---|---|---|
| Focus | Continuous reconciliation | Generative workflow | Proposal-based | Living specs | IDE-integrated |
| Agent support | Claude Code + any via export | 14+ agents | 20+ agents | Proprietary | Claude only |
| Spec format | Structured YAML + JSON Schema | Markdown | Markdown | Proprietary | EARS notation |
| CI integration | notarai check + GitHub Action | Manual | Manual | Built-in | Built-in |
| Deterministic checks | Yes (LLM-free) | No | No | Partial | Partial |
| Cost | Free / OSS | Free / OSS | Free / OSS | $60-200/mo | Free tier + paid |
| Brownfield | /notarai-bootstrap interview | Manual | Delta markers | Context Engine | Limited |
Where NotarAI fits
Most SDD tools solve the cold-start problem: turning intent into code. NotarAI solves the entropy problem: keeping specs, code, and docs aligned as all three evolve independently after the initial generation.
This makes NotarAI complementary to, not competitive with, tools like Spec Kit and OpenSpec. You can use Spec Kit to generate your initial codebase, then install NotarAI to watch for drift as the project evolves.
Key differentiators
Post-generation drift detection. NotarAI keeps watching after the initial generation is done. When code changes but the spec stays the same (or the reverse), NotarAI surfaces the conflict and proposes updates.
Deterministic CI checks. notarai check runs without network access, API keys, or LLM calls. It completes in under 2 seconds and produces structured JSON for CI consumption. No other SDD tool offers a fully deterministic, headless drift analysis.
Structured specs with schema validation. Specs are machine-readable YAML validated against a JSON Schema. This enables deterministic tooling (lint, check, scoring) that Markdown-based specs cannot support.
Agent-agnostic reconciliation. notarai export-context produces self-contained prompts that any LLM can process. The MCP server speaks a standard protocol. You are not locked into a single agent ecosystem.
Propose-and-approve only. NotarAI never auto-modifies code or specs. Every change is surfaced for human review. The spec is the tiebreaker when code and spec disagree, but the human decides what to do about it.
When to choose NotarAI
NotarAI is a good fit when:
- You already have a codebase and want to add spec coverage incrementally
- You want CI to catch spec drift automatically, without LLM calls
- You care about structured, machine-readable specs rather than freeform Markdown
- You want to use multiple LLM agents without lock-in
NotarAI may not be the right choice when:
- You need a tool that generates entire codebases from specs (use Spec Kit, OpenSpec, or Kiro instead, then add NotarAI for ongoing maintenance)
- Your team prefers freeform Markdown specs without schema constraints
- You need a fully proprietary, managed solution with built-in billing
Using NotarAI alongside other tools
NotarAI’s .notarai/ spec format captures the same intent, behaviors, and constraints that other SDD tools produce. A typical combined workflow:
- Use Spec Kit or OpenSpec to bootstrap your initial codebase from specs
- Run notarai init and notarai-bootstrap to create .notarai/ specs from the existing code
- Use notarai check in CI and /notarai-reconcile during development
- Continue using your preferred generation tool for new features; NotarAI watches everything
See Brownfield Adoption for a step-by-step guide.
Inspirations
NotarAI draws from several established traditions:
- Cucumber / Gherkin: The Given/Then behavior format in NotarAI specs comes from BDD’s structured scenario language, but kept in natural language rather than formal Gherkin syntax to lower the authoring barrier.
- Terraform and Infrastructure-as-Code: The reconciliation model (declare desired state, detect drift from actual state, propose a plan to converge) is borrowed from IaC tools like Terraform, Pulumi, and CloudFormation. NotarAI’s spec is a state file for intent, not infrastructure.
- JSON Schema / OpenAPI: The $ref composition model and the use of a JSON Schema to govern spec validity come directly from these standards.
- Design by Contract (Eiffel): The distinction between constraints (what the system enforces) and invariants (what must never be violated) echoes Eiffel’s preconditions, postconditions, and class invariants.
- Architecture Decision Records: The decisions field in the spec is a lightweight ADR log, capturing the why alongside the what.
Contributing
Thanks for your interest in contributing. The instructions below should stay up to date, since this tool exists to help manage exactly that. If any step looks stale or misaligned with current practice in the repo, updating this document would be a high-value first or second contribution.
NotarAI manages its own spec drift, so please get acquainted with the tool and make sure your contributions stay in sync.
Development Setup
Install Rust (stable toolchain). Install pre-commit for pre-commit hooks.
Temporarily (until biome supports markdown), install prettier.
Setup clippy and rustfmt via:
rustup component add rustfmt clippy
Then setup the repo:
git clone https://github.com/davidroeca/NotarAI.git
cd NotarAI
cargo build
cargo install biome
cargo install --path crates/notarai
pre-commit install
The cargo install step installs the notarai binary to ~/.cargo/bin so the
Claude Code hook (notarai hook validate) resolves correctly. Re-run it whenever
you want the installed binary to reflect your latest local changes.
Making Changes
- Create a branch from main
- Make your changes
- Run cargo build to verify compilation
- Run cargo test to run the test suite
- Run cargo fmt --check to verify formatting
- Run cargo clippy -- -D warnings to check for lint issues
- Use the /notarai-reconcile Claude Code command to check for spec drift
- Add a changeset if your PR should trigger a release (see below)
- Open a pull request
Changesets
This project uses sampo for versioning and changelogs. If your PR introduces user-visible changes (new features, bug fixes, breaking changes), add a changeset:
sampo add
This creates a Markdown file in .sampo/changesets/ describing the change and
the bump level (patch, minor, or major). Commit this file with your PR.
When changesets are merged to main, a release PR is automatically created.
Merging the release PR publishes the new version.
Code Style
- Rust 2024 edition
- cargo fmt for Rust formatting
- cargo clippy for Rust lints
- biome format --check for non-Rust file formatting (JSON, JS/TS, CSS, etc.)
- prettier --check for Markdown formatting (temporary until biome#3718 is resolved)
- Functional style preferred over excessive use of structs with methods
- Core library lives in crates/notarai/src/core/ (not src/lib/ due to Rust’s reserved module name)
Project Structure
See CLAUDE.md in the repository root for a detailed layout and architectural
constraints.
Good First Contributions
These changes would help drive broader adoption but are not yet a priority:
- Support other coding agents (e.g. Codex, Aider, Cline, OpenHands, Goose, opencode)
- Find/create new issues and reference them here
License
By contributing, you agree that your contributions will be licensed under the Apache License 2.0.