
Project Timeline — Forma3D.Connect

An AI-built production system, guided by a human

This document traces how Forma3D.Connect evolved from first commit to a multi-service production platform — entirely through AI-human collaboration. It highlights the major milestones, architectural shifts, and the surprisingly impactful role that human intuition played alongside AI velocity.


At a Glance

uml diagram


The Numbers

| Metric | Value |
| --- | --- |
| Calendar days (Jan 9 – Mar 2) | 53 |
| Active days | 50 (only 3 days with zero activity) |
| Total AI chat sessions | 460+ |
| Average sessions per active day | ~9.2 |
| Peak day: Feb 17 (microservices deploy) | 19 sessions |
| Human interventions acknowledged by AI | 90+ |
| Pipeline failures pasted by human | 50+ (~1/day) |
| Pipeline failures caused by AI code | ~63% |
| CHANGELOG versions | 17 (0.0.0 → 20260302) |
| Human estimated duration (Phases 0–7) | 26.5 weeks |
| Human estimated duration (Phases 0–13) | 48.5 weeks |
| Actual duration (Phases 0–7) | 10 days |
| Actual duration (Phases 0–13) | 53 days |
| Cost (AI usage, Jan 2026) | €655 |
| Cost (human team estimate) | €120,000 – €160,000 (Phases 0–7) |

Human vs AI: Implementation Plan Timeline Comparison

The implementation plan estimated 27.5 work-weeks for a team of 3–4 mid-level full-stack developers to complete Phases 0–8. The AI + human pair completed the same scope in 18 calendar days — including weekends.

The two Gantt charts below use matching phase labels so the scale difference speaks for itself.

Human team estimate (Phases 0–8)

Sequential work-weeks for a 3–4 person team. Sub-phases (1b–1d, 5b–5k) are grouped under their parent phase.

uml diagram

AI + human actual (Phases 0–8)

One human operator + AI. Weekends included — the AI doesn't take days off.

uml diagram

Phase-by-phase breakdown

| Phase | Scope | Human Estimate | AI Actual | Ratio |
| --- | --- | --- | --- | --- |
| Phase 0 | Foundation, Nx, Prisma, CI | 2 weeks | 1 day (Jan 9) | 10x |
| Phase 1 + 1b–1d | Shopify, Observability, Staging, Tests | 5.5 weeks | 4 days (Jan 9–12) | 10x |
| Phase 2 | SimplyPrint integration | 3 weeks | 1 day (Jan 13) | 15x |
| Phase 3 | Fulfillment automation | 2 weeks | 1 day (Jan 14) | 10x |
| Phase 4 | Dashboard MVP | 3 weeks | 1 day (Jan 14) | 15x |
| Phase 5 | Shipping (Sendcloud) | 2 weeks | 1 day (Jan 16) | 10x |
| Tech Debt 5b–5k | 9 items (domain boundaries, tests, schemas) | 5.5 weeks | 4 hours (Jan 17) | 97x |
| Phase 6 | Hardening, runbooks, load testing | 2 weeks | 1 day (Jan 18) | 10x |
| Phase 7 | PWA, push notifications | 1 week | 1 hour (Jan 19) | 56x |
| Phase 8 | RBAC, session auth, user management | 1 week | 5 days (Jan 21–25) | 1.4x |
| Total | | 27.5 weeks | 18 days | ~11x |

Phase 8 (RBAC) stands out as the closest to the human estimate. Security-critical features — password hashing, session management, role enforcement, audit logging — benefit less from AI velocity because the complexity is in getting the design right, not in writing code fast.

Summary

uml diagram


Detailed Timeline

Week 1 — Foundation and Core Integrations (Jan 9–14)

uml diagram

Key human intervention — Jan 9: The human spotted that the database schema assumed a 1:1 relationship between Shopify products and print files. The AI acknowledged: "Excellent observation! You're right — the current schema doesn't account for assemblies." This led to the assembly parts model that persists today.

Key human intervention — Jan 10: The human caught the AI overwriting the .env.example file (destroying SimplyPrint config) when adding Sentry variables. Also pushed for 100% Sentry sampling since traffic would be low initially.
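One way to prevent this class of overwrite is to append only the keys that are missing instead of regenerating the file. The helper below is a hypothetical sketch, not the project's actual code:

```shell
# Hypothetical guard against the Jan 10 failure mode: add a new variable to
# .env.example only if it is not already there, instead of rewriting the file.
add_env_example_var() {
  local key="$1" placeholder="$2" file="${3:-.env.example}"
  touch "$file"
  # Anchored match so FOO does not collide with PREFIX_FOO.
  grep -q "^${key}=" "$file" || printf '%s=%s\n' "$key" "$placeholder" >> "$file"
}

# Example: adding Sentry variables must not destroy the SimplyPrint entries.
# add_env_example_var SENTRY_DSN "https://public@sentry.example/1"
```

Run once per new variable; existing entries are never touched, so the SimplyPrint config would have survived.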

Key human intervention — Jan 11: The human spotted that the deployment script would restart containers before running database migrations — a classic production incident waiting to happen. The AI called it "exactly the kind of subtle ordering bug that can cause production incidents!"
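The corrected ordering can be sketched as a small deploy script. The service name, the Prisma command, and the dry-run wrapper are illustrative assumptions, not the project's actual script:

```shell
# DRY_RUN=1 (default here) prints the commands instead of executing them,
# so the ordering can be inspected without Docker installed.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

deploy() {
  run docker compose pull                                    # 1. fetch new images (no side effects yet)
  run docker compose run --rm api npx prisma migrate deploy  # 2. migrate FIRST, in a one-off container
  run docker compose up -d                                   # 3. restart services against the migrated schema
}

deploy
```

The key property is step 2 before step 3: the old containers keep serving traffic while a throwaway container applies migrations, so a restart never races an unmigrated schema.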


Week 2 — Shipping, Hardening, PWA (Jan 16–21)

uml diagram

Key human intervention — Jan 17: The human asked "Isn't that quick fix a violation against ADR-032: Domain Boundary Separation?" — catching the AI taking a shortcut that would violate the project's own architecture decisions. The fix was reimplemented properly.

Key human intervention — Jan 18: The human pointed out that relying on existing orders in the database for load tests was brittle — the staging database might be empty. Also caught a version mismatch between the API (0.0.1) and frontend (0.4.0).

Key human intervention — Jan 19: The human identified that the PWA manifest's theme_color is static and can't change when the user toggles dark mode. Also asked about Docker log rotation after a disk-full incident, leading to the json-file logging driver configuration.
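The json-file rotation that came out of the disk-full incident can be sketched as a daemon config. The 10m/3-file limits are assumed values, and the file is written to a temp path here rather than `/etc/docker/daemon.json`:

```shell
# Sketch of Docker log rotation via the json-file driver; on a real host this
# content belongs in /etc/docker/daemon.json (followed by a daemon restart).
target="${TMPDIR:-/tmp}/daemon.json"
cat > "$target" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
echo "wrote $target"
```

The same limits can alternatively be set per service in docker-compose via the `logging:` key, which is useful when only a few chatty containers need tighter caps.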


Week 3 — Auth, Users, Reconciliation (Jan 22–28)

uml diagram

Key human intervention — Jan 23: The human raised two issues about pagination: the API wasn't consistently returning pagination metadata, and not all components handled it. This became a project-wide API contract rule: "The API should always return pagination info (page, pageSize, and total)."
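A contract like this is easy to enforce mechanically in acceptance tests. The helper below is a hypothetical sketch of such a check, not the project's code:

```shell
# Hypothetical acceptance-test helper: fail unless a list response carries
# the project's pagination contract fields (page, pageSize, total).
has_pagination_metadata() {
  local body="$1" key
  for key in '"page"' '"pageSize"' '"total"'; do
    case "$body" in
      *"$key"*) ;;      # field present, keep checking
      *) return 1 ;;    # contract violated
    esac
  done
  return 0
}

# Usage against a live endpoint (illustrative):
# has_pagination_metadata "$(curl -s "$API/orders?page=1")" || echo "contract violated"
```

Running this against every list endpoint in CI turns the project-wide rule into a regression gate instead of a convention.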

Key human intervention — Jan 24: The human noticed that documentation URLs, pgAdmin URLs, and other links were hardcoded in the frontend. These would break in production. Environment-based URL configuration was implemented.


Week 4 — OAuth and Research (Jan 29 – Feb 3)

uml diagram

Key human intervention — Feb 1: The human asked whether the registry cleanup script would accidentally remove the :cache tag needed for Docker inline caching. "Good catch" — the script was updated to preserve cache tags.

Key human intervention — Feb 1: The human noticed that Docker build times actually got slower after optimization attempts. "You're right — 5m 44s is actually slower than before. The cache isn't working as expected." The cache strategy was reworked.


Week 5 — Advanced Features (Feb 4–12)

uml diagram

Key human intervention — Feb 8: The human identified three missing items in a single review: typechecking hadn't been run, tests weren't written, and Shopify theme docs weren't updated. "Good catches" — all three were addressed.

Key human intervention — Feb 9: The human corrected the AI multiple times about SimplyPrint's queue behavior. The AI was guessing API response field names instead of reading documentation. The human insisted: read the docs first. The AI responded: "You're absolutely right, and I owe you an honest answer."

Key human intervention — Feb 10: The human asked "What is the purpose of the isActive field on ProductMapping? I never asked for it." — prompting a code archaeology investigation that led to removing an unrequested feature.


Week 6 — The Big Split (Feb 13–18)

uml diagram

Key human intervention — Feb 15: The human caught a critical deployment bug: the .env file was being overwritten with all image tags on every pipeline run. If a service wasn't rebuilt (due to Nx affected), its tag would be overwritten with the wrong value, potentially bringing down unaffected services.
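A guarded per-service update avoids this class of overwrite: only the tag of a service that was actually rebuilt gets rewritten. The variable-naming convention and helper below are hypothetical:

```shell
# Hypothetical repair for the Feb 15 bug: rewrite ONLY the rebuilt service's
# tag line in .env, leaving every other service's tag untouched.
set_image_tag() {
  local service="$1" tag="$2" file="${3:-.env}"
  # e.g. order-service -> ORDER_SERVICE_IMAGE_TAG (naming is illustrative)
  local key; key="$(printf '%s' "$service" | tr 'a-z-' 'A-Z_')_IMAGE_TAG"
  if grep -q "^${key}=" "$file"; then
    # Portable in-place edit via temp file (sed -i differs across platforms).
    sed "s|^${key}=.*|${key}=${tag}|" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  else
    echo "${key}=${tag}" >> "$file"
  fi
}
```

Calling this once per service in the Nx-affected set, and never regenerating the whole file, means unaffected services keep their last-known-good tags.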

Key human intervention — Feb 15: The human noticed cosign was pre-installed on self-hosted agents but the pipeline was redundantly trying to re-download it every run, failing because it was owned by root.

Key human intervention — Feb 16: The human correctly identified that docker builder prune was running on the staging server (inside an SSH heredoc), not on the build agent — the AI had confused which machine was being affected, which was wiping Docker cache after every deployment.

Key human intervention — Feb 17: The human raised two critical issues about database containers in CI: (1) even with trust auth, two agents can't bind to the same port, and (2) 80+ orphaned Docker volumes proved Azure DevOps wasn't cleaning up on cancellation. Both were addressed with dynamic port allocation and cleanup cron jobs.
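The dynamic-port half of that fix can be sketched as a probe for a TCP port nobody is listening on, run before starting the per-pipeline Postgres container. The port range and the docker invocation are illustrative:

```shell
# Hypothetical version of the dynamic-port fix for concurrent CI agents.
find_free_port() {
  local port
  for port in $(seq 55000 55100); do
    # /dev/tcp connect succeeds only if something is listening (bash-ism),
    # so a failed connect means the port is free to bind.
    if ! (exec 3<>"/dev/tcp/127.0.0.1/${port}") 2>/dev/null; then
      echo "${port}"
      return 0
    fi
  done
  return 1
}

PG_PORT="$(find_free_port)"
echo "using port ${PG_PORT}"
# docker run -d -p "${PG_PORT}:5432" ... postgres:16   (illustrative)
```

The orphaned-volume half is complementary: a periodic `docker volume prune` on the agents, since cancelled runs never reach their own cleanup steps.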


Week 7 — Documentation, Integration UI, and Observability (Feb 18–22)

This week focused on EventCatalog architecture documentation, event traceability fields (eventId, source), and database-backed UI configuration for SimplyPrint and Sendcloud connections with AES-256-GCM encrypted credential storage. The ClickHouse + Grafana centralized logging pipeline was deployed — replacing Sentry Logs with a self-hosted stack (OTel Collector → ClickHouse → Grafana). pgAdmin was moved to an on-demand container managed via the DevTools UI. The Nx affected pipeline was hardened with a last-successful-deploy git tag (ADR-059).


Week 8 — Preview Infrastructure and Security (Feb 23 – Mar 2)

uml diagram

Key achievements — Week 8:

  • Preview cache revolution: Extracted STL preview generation into @forma3d/gridflock-core as a reusable library. Created offline scripts for pre-populating preview caches with CPU-aware parallelism. Implemented plate-level caching (268 files, 60 MB) replacing the legacy per-dimension cache (16,471 files, 35 GB) — a 99.8% storage reduction with 100-300x faster preview assembly.
  • Security baseline: Configured Aikido for continuous security scanning. Initial findings identified Prisma operator injection risks, Express security header gaps, and dependency CVEs — all tracked for remediation.
  • Storefront fix: Grid configurator dimension rounding changed from Math.round to Math.floor to prevent dimensions like 74.41 cm from rounding up to 74.5 cm.
  • SaaS research: Comprehensive SaaS launch readiness research covering onboarding, pricing, billing, GDPR, multi-tenancy considerations, and Stripe integration.

The Architectural Evolution

uml diagram


Session Category Breakdown

uml diagram

Observation: Bug fixing is the second-largest category (19.9%), reflecting the reality that AI speed doesn't eliminate bugs — it just creates and fixes them faster. The high documentation percentage (16.9%) is unusual for a fast-moving project and reflects the explicit quality mandate in the project rules.


Human-AI Cooperation Analysis

The Human Contribution — By the Numbers

uml diagram

AI Acknowledgment Patterns

| AI Response Pattern | Count | Typical Context |
| --- | --- | --- |
| "Good catch" | 20 | Human found a bug, logic error, or inconsistency |
| "You're right" / "You're correct" | 19 | Human corrected a factual mistake or wrong assumption |
| "Good question" / "Great question" | 32 | Human asked something that revealed a gap in the solution |
| "You're absolutely right" | 10 | Human identified a significant issue the AI missed entirely |
| "Good point" / "Great point" | 3 | Human made a strategic or practical recommendation |
| "Excellent observation" / "Excellent point" | 5 | Human's insight changed the approach |

The Most Critical Human Interventions

These interventions directly prevented production incidents or architectural rot:

| Date | Intervention | Severity | AI Response |
| --- | --- | --- | --- |
| Jan 9 | Assembly model gap (1:1 → 1:many) | Architecture | "Excellent observation!" |
| Jan 11 | Migration-before-restart ordering | Critical | "Exactly the kind of subtle ordering bug that can cause production incidents!" |
| Jan 17 | ADR-032 violation in a quick fix | Architecture | Reimplemented properly |
| Feb 1 | Registry cleanup deleting cache tags | Deployment | Script updated to preserve cache |
| Feb 9 | Insisting AI read API docs instead of guessing | Process | "I owe you an honest answer" |
| Feb 15 | .env overwrite destroying unaffected services | Critical | Deploy flow redesigned |
| Feb 16 | Identifying wrong machine for docker builder prune | Deployment | Cache strategy corrected |
| Feb 17 | Port collision + orphaned volumes in CI | CI/CD | Dynamic ports + cleanup cron |

Patterns in Human-AI Cooperation

uml diagram

Human Intervention Frequency Over Time

uml diagram

Key insight: Human interventions spike during two specific periods:

  1. Week 1 (foundation) — setting patterns and catching early design flaws before they propagate.
  2. Week 6 (microservices migration) — the most complex architectural change with new infrastructure, multiple deployment targets, and CI pipeline rewrites.

The quieter weeks (3–5) represent periods where patterns were established, the AI was operating within known boundaries, and fewer novel decisions needed human judgment.


Pipeline Interventions — The Human as Build Monitor

Beyond architectural and design interventions, the human played a constant role as the project's build monitor — pasting pipeline output into the chat 43 times when something broke. In 95% of cases, the human pasted raw Azure DevOps pipeline logs (full ##[section] blocks with timestamps, exit codes, and stack traces). Only twice was the problem described verbally.

Pipeline Failure Categories

uml diagram

Root Causes — Who Broke It?

uml diagram

The headline number: nearly 63% of all pipeline failures were caused by AI-generated code changes. The AI writes code, pushes it, and the human discovers it broke the pipeline by pasting the output back. This is the dominant feedback loop in the project.

Pipeline Failures Over Time

uml diagram

Recurring Failure Patterns

Several failure classes kept resurfacing — they weren't one-off issues but structural weaknesses:

| Pattern | Occurrences | Root Problem |
| --- | --- | --- |
| Playwright/test result publishing | 4 times | Azure DevOps task configuration never fully stable |
| Coverage threshold not met | 4 times | AI ships new code without enough tests, every time |
| Cosign/Sigstore attestation | 3 times | Device flow auth fundamentally incompatible with CI |
| Docker build failures | 6 times (5 in Feb) | Each new microservice introduced a new Dockerfile to get wrong |
| Acceptance tests breaking | 8 times | Most fragile stage — any application change can break them |

The "It Used to Work" Pattern

At least 3 times, the human expressed explicit frustration:

  • "What has changed? It used to work perfectly." (deployment verification, Jan 26)
  • "Still failing. What has changed? It used to work all the time." (registry cleanup, Feb 1)
  • "The tests still seem not to publish correctly." (test results, Feb 16)

These all trace back to AI changes breaking previously-working pipeline functionality — a side effect of the AI modifying pipeline YAML or build scripts without fully understanding the downstream effects.

The Self-Hosted Agent Effect

When the project moved to self-hosted build agents around Feb 15, an entirely new class of failures appeared:

| Failure | Why It Didn't Happen on Hosted Agents |
| --- | --- |
| syft: command not found | Hosted agents have most tools pre-installed |
| git fetch: non-fast-forward | Hosted agents start with a clean workspace every time |
| PostgreSQL port collisions | Hosted agents don't share ports between pipeline runs |
| 80+ orphaned Docker volumes | Hosted agents are ephemeral — no accumulation |

What This Reveals About AI-First Development

uml diagram

The key metric: with 43 pipeline failures across 41 active days, the project averaged roughly 1 pipeline failure per day that required human intervention. This is the operational cost of AI-first development — the AI writes code fast but doesn't run the pipeline locally, so the human becomes the feedback loop between the AI and the CI system.


What the Data Reveals

The Human Is Not Just a Rubber Stamp

With 74 acknowledged interventions across 296 sessions, the human contributed meaningfully in roughly 25% of all sessions. These weren't cosmetic suggestions — they prevented production incidents, caught architectural violations, and injected domain knowledge the AI didn't have.

The AI's Blind Spots Are Predictable

The AI consistently struggles with:

  1. Cross-boundary effects — changing one file without checking if it affects deployments, docs, or other services
  2. Real-world behavior vs. documented behavior — APIs that behave differently than their docs suggest
  3. Infrastructure edge cases — concurrent agents, cancelled pipelines, disk-full scenarios
  4. Optimizing for speed over safety — taking shortcuts that violate the project's own architecture decisions
  5. Visual/spatial reasoning — the logo SVG required 10+ rounds of human correction
  6. Pipeline awareness — the AI caused 62.8% of all pipeline failures by pushing code without local verification; coverage thresholds were violated 4 times by shipping new code without tests

The Collaboration Gets More Efficient

Early weeks required more interventions per session. As the project established patterns, rules (like the .cursorrules file), and ADRs, the AI needed less correction. The microservices migration (Week 6) temporarily disrupted this — introducing new infrastructure always resets the learning curve.

The Human's Superpower: Asking "What About...?"

The single most impactful human behavior was asking about scenarios the AI didn't consider:

  • "What about when you cancel the pipeline?" → discovered orphaned containers
  • "What about concurrent build agents?" → discovered port collisions
  • "What about an empty staging database?" → discovered brittle load tests
  • "What about the other documents that mention endpoints?" → discovered stale docs
  • "What about the cache tag?" → prevented cache strategy from being destroyed

These "what about" questions account for roughly a third of all human interventions and nearly all of the most critical ones.


Cost and Velocity Summary

uml diagram


Conclusion

Forma3D.Connect was built by AI — but it was shaped by a human. The AI provided velocity, breadth of knowledge, and tireless consistency. The human provided judgment, domain expertise, and the crucial ability to ask "but what about...?"

Neither could have built this system alone. The AI without the human would have shipped a fragile system full of subtle deployment bugs, architectural shortcuts, and hardcoded assumptions. The human without the AI would still be in Week 3 of the original 26.5-week plan.

The data shows that human oversight is not overhead — it's a force multiplier. The 74 interventions didn't slow the project down; they prevented the kind of problems that derail projects for days or weeks. The cost of those interventions (a few minutes of human attention each) was trivial compared to the production incidents they prevented.

This is what AI-human collaboration looks like when it works: AI velocity, human wisdom, and a shared commitment to getting it right.


Addendum: The ClickHouse Death Spiral (March 3, 2026)

One day after the timeline was "complete," the AI proved its value again.

The Question

The human noticed ClickHouse consuming 83% CPU on the staging Dozzle dashboard and asked a simple question: "Is it normal that ClickHouse is taking more than 80% CPU while the system is doing nothing?"

What the AI Found

Within minutes of SSH-ing into the server, the AI uncovered a merge death spiral — a self-reinforcing feedback loop that was silently burning both CPUs and would have become a production crisis:

  1. System tables had grown unchecked. ClickHouse's own internal logging tables had accumulated 138 million rows in asynchronous_metric_log, 3 million rows / 788 MiB in text_log, and 82 active parts in metric_log — all without any TTL or retention policy.

  2. Background merges were failing. ClickHouse needed to merge these bloated tables, but every merge attempt hit MEMORY_LIMIT_EXCEEDED (Code 241). The server had 3.82 GiB of RAM shared across 16 containers, but ClickHouse was configured to claim 80% of the host's RAM (3.06 GiB) — memory it could never actually get.

  3. Failed merges created more data. Each OOM failure logged a massive stack trace to system.text_log, which created more parts, which triggered more merge attempts, which failed, which generated more errors. Classic death spiral.

  4. 713 threads spinning on retry loops. 48 MergeTree background threads were all caught in this fail-log-retry cycle, pegging both CPUs at 163% with zero useful work.

  5. The OTel filter was defined but never wired. A filter/drop-debug processor existed in the OpenTelemetry collector config but wasn't included in the pipeline — debug-level logs were being ingested unnecessarily.

The Fix

The AI applied a layered fix in under 30 minutes:

| Layer | Change | Impact |
| --- | --- | --- |
| Immediate | Truncated all bloated system tables | CPU dropped from 163% to 3.6% instantly |
| Docker Compose | Added mem_limit: 1536m, cpus: 1.0 to ClickHouse | Container can no longer starve other services |
| ClickHouse config | Added 3-7 day TTL to all system tables | Tables auto-purge, preventing unbounded growth |
| ClickHouse config | Set text_log level to warning | Broke the error-logging feedback loop |
| ClickHouse config | Reduced background_schedule_pool_size from 512 to 32 | Fewer idle threads on a 2-CPU machine |
| OTel collector | Activated the filter/drop-debug processor in pipeline | Stopped unnecessary debug log ingestion |
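The immediate-relief step can be sketched as generating the truncate statements for the tables named in the incident; the script emits SQL rather than executing it, so the operator can review before piping it into the server:

```shell
# Sketch of the emergency step: generate TRUNCATE statements for the bloated
# ClickHouse system tables (table list taken from the incident above).
emergency_truncate_sql() {
  local t
  for t in asynchronous_metric_log metric_log text_log; do
    echo "TRUNCATE TABLE system.${t};"
  done
}

emergency_truncate_sql
# Pipe into the server when ready (not executed here):
#   emergency_truncate_sql | clickhouse-client --multiquery
```

Truncation drops the parts outright, which is why it gave instant relief where merges could not: no merge, and therefore no memory spike, is involved.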

Before and After

| Metric | Before | After |
| --- | --- | --- |
| CPU | 162.84% | 3.57% |
| Memory | 937.9 MiB (no limit) | 290.8 MiB / 1.5 GiB |
| Threads | 713 | 212 |
| Error rate | ~5 errors/sec | 0 |
| System table size | ~1.3 GiB | Truncated + TTL |

Why This Matters

This incident perfectly illustrates the value of AI-human collaboration in operations:

  • The human noticed. A glance at a dashboard, a gut feeling that 83% CPU during idle was wrong. No alert fired. No user complained. The human's pattern-matching caught it.
  • The AI diagnosed. Within minutes, it traced the root cause through five layers of infrastructure — Docker stats, ClickHouse system tables, background thread metrics, error logs, and config files. A human SRE would need significant ClickHouse expertise to identify a merge death spiral this quickly.
  • The AI fixed it safely. It applied the fix incrementally (truncate first for immediate relief, then config changes, then restart), caught and corrected a startup failure caused by ClickHouse's pool size constraints, and verified the fix was stable before declaring success.

Left unchecked, this spiral would have continued degrading the staging environment and — once the same ClickHouse configuration was promoted to production — would have caused a production outage. The total time from "is this normal?" to "CPU at 3.57% and stable" was under 30 minutes.

This is the "what about...?" pattern in action — except this time, it happened after the project was declared complete.


Addendum: The ClickHouse Death Spiral Returns (March 18, 2026)

Two weeks later, the same spiral — proving that fixing symptoms without fixing architecture is borrowing time.

The Recurrence

The human noticed ClickHouse at 51% CPU on the Dozzle dashboard and asked the AI to check. SSH diagnostics revealed the same pattern: 69% CPU, 417 MiB / 1.5 GiB memory, 1,524 OOM errors in 10 minutes, merge tasks stuck in a retry loop. The system.metric_log table's merge task had reached level 748, and system.text_log had re-bloated to 113 MiB.

Every fix from March 3 was in place: memory limits, TTLs, thread pool tuning, log level filtering, debug log dropping. All correctly configured. All ineffective.

Why the Previous Fix Failed

The March 3 fix treated the correct symptoms but missed the architectural root cause:

| What was configured | Why it didn't work |
| --- | --- |
| 3-day TTLs on system tables | TTL cleanup runs during merges. When merges OOM, TTLs can't execute either — a circular dependency. |
| toYYYYMM monthly partitions | With 3-day TTL inside a monthly partition, expired rows must be rewritten out of the part (a merge-like operation). On a 1.5 GiB container, rewriting a month of metric_log exceeds memory. |
| error_log | Never configured at all — no TTL, no retention. Accumulated 825K rows of OOM errors, fueling the spiral. |
| metric_log collect every 10s | Created ~1 part per 10 seconds → 155K parts over 18 days. Merge tree depth reached level 748 before a single merge exceeded memory. |
| flush_interval: 7.5s | Too frequent — each flush creates a new part. More parts = more merge pressure. |

The fundamental issue: TTLs and monthly partitions are architecturally incompatible on a memory-constrained system. TTL cleanup on monthly partitions requires the same merge operations that caused the OOM in the first place.

The Actual Fix

| Layer | Change | Why it works |
| --- | --- | --- |
| Partition scheme | toYYYYMM → toYYYYMMDD (daily) on all system tables | Expired data = entire day-part → dropped with zero I/O and zero memory. No merge-rewrite needed. |
| Missing table | Added error_log config with 3-day TTL | Closes the gap that let OOM errors accumulate without retention. |
| Missing table | Added query_metric_log config with 3-day TTL | Another table that had no retention policy. |
| Collection rate | metric_log collect interval 10s → 60s | 6× fewer parts created, dramatically lower merge pressure. |
| Flush rate | All flush intervals 7.5s → 30–60s | More rows batched per part, far fewer parts to merge. |
| Disabled tables | trace_log, processors_profile_log, asynchronous_insert_log removed | Zero merge/storage overhead for tables nobody reads on staging. |
| Table migration | Dropped old monthly-partitioned tables, ClickHouse recreated with daily scheme | Config only applies at table creation; existing tables needed manual recreation. |
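A config drop-in along these lines captures the daily-partition-plus-TTL scheme. The element names follow ClickHouse's system-log table settings and should be verified against your server version; the file is written to a temp path here, whereas on the server it would live under `/etc/clickhouse-server/config.d/`:

```shell
# Sketch of the system-log override config described in the table above
# (values mirror the fix; element names are assumptions to verify).
target="${TMPDIR:-/tmp}/system-logs.xml"
cat > "$target" <<'EOF'
<clickhouse>
    <metric_log>
        <!-- Daily parts: expired data is dropped whole, no merge-rewrite. -->
        <partition_by>toYYYYMMDD(event_date)</partition_by>
        <ttl>event_date + INTERVAL 3 DAY DELETE</ttl>
        <collect_interval_milliseconds>60000</collect_interval_milliseconds>
        <flush_interval_milliseconds>30000</flush_interval_milliseconds>
    </metric_log>
    <error_log>
        <!-- Previously missing entirely -- no retention at all. -->
        <partition_by>toYYYYMMDD(event_date)</partition_by>
        <ttl>event_date + INTERVAL 3 DAY DELETE</ttl>
    </error_log>
    <!-- Tables not needed on staging (trace_log etc.) are removed via the
         config's override mechanism rather than listed here. -->
</clickhouse>
EOF
echo "wrote $target"
```

Note the table-migration caveat from the fix applies here too: these settings only take effect for newly created tables, so existing monthly-partitioned tables must be dropped first.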

Before and After

| Metric | Before | After |
| --- | --- | --- |
| CPU | 69% | 3.2% |
| Memory | 417 MiB / 1.5 GiB | 110 MiB / 1.5 GiB |
| System table partitions | Monthly (merge-rewrite for TTL) | Daily (drop-part for TTL) |
| Error rate | ~150 errors/min | 0 |
| metric_log part creation | 1 part / 10s | 1 part / 60s |

The Lesson

Configuration correctness ≠ operational correctness. Every TTL, every memory limit, every thread pool setting from the March 3 fix was syntactically correct and semantically appropriate. But the interaction between monthly partitions and short TTLs created a situation where the TTLs could never execute — they required the very merge operations that were failing. The fix looked right in the config file but was architecturally impossible at runtime.

This is a class of bug that's invisible to code review and config audits. The only way to catch it is to understand how ClickHouse implements TTL cleanup (via merges) and reason about whether the merge operations themselves are feasible within the resource constraints. It's a second-order failure mode: not "does this setting exist?" but "can the mechanism that enforces this setting actually run?"


Addendum: The Cosign 401 Mystery (March 6–7, 2026)

When the AI's analytical approach hit a wall, the human's memory of past struggles broke through.

The Problem

After introducing self-hosted DigitalOcean build agents alongside the default Microsoft-hosted agents, all staging promotion attestations started failing with a 401 Unauthorized error. The cosign tool could sign images and record entries in the Sigstore transparency log, but couldn't push the attestation back to the DigitalOcean Container Registry:

    Error: signing registry.digitalocean.com/forma-3d/forma3d-connect-order-service@sha256:...
    GET https://api.digitalocean.com/v2/registry/auth?scope=repository:forma-3d/forma3d-connect-order-service:push,pull
    → unexpected status code 401 Unauthorized

The confusing part: image builds, pushes, cosign signing, and SBOM attestations all succeeded in the earlier build stage — on the same agent pool, with the same token. Only the staging promotion attestations (which ran later in the pipeline) failed.

The AI's Attempts

The AI approached the problem analytically, trying three successive hypotheses:

| Attempt | Hypothesis | Fix Applied | Result |
| --- | --- | --- | --- |
| 1 | Custom DOCKER_CONFIG path was wrong | Removed custom config, used default ~/.docker/config.json | Still 401 |
| 2 | Credential helper on self-hosted agents intercepting docker login | Reset Docker config to {} before login, verified inline auth | Still 401 |
| 3 | Missing set -e hiding a silent login failure | Added set -e to all login blocks | Still 401 |

Each hypothesis was reasonable. Each was wrong. The AI was trapped in a pattern of analyzing the current pipeline configuration without considering DigitalOcean-specific authentication behavior.

The Human's Breakthrough

The human remembered: "I feel like we have had this problem a long time ago." They asked the AI to search the Specstory conversation history in .specstory/history/ — the archive of all previous AI-human sessions.

The AI found it immediately: a conversation from January 14, 2026 (container-image-promotion-issue) where the exact same 401 Unauthorized error had been encountered and solved. The fix was simple but non-obvious:

docker login with a raw DigitalOcean API token produces credentials that Docker CLI can use but cosign cannot. DigitalOcean's registry auth endpoint rejects the raw token format when cosign presents it. The solution is doctl registry login, which generates a properly scoped registry credential that both Docker and cosign understand.

The Fix

Two commands replaced the broken docker login pattern across all attestation jobs:

    doctl auth init --access-token $DOCR_TOKEN
    doctl registry login

This matched the pattern already used successfully in the pipeline's CleanupRegistry job — which also runs cosign operations against the DigitalOcean registry.

Why This Matters

This incident reveals a fundamental limitation and a fundamental strength of AI-human collaboration:

  • The AI's limitation: It can analyze what's in front of it with extraordinary depth. It traced credential flows through Azure DevOps variable expansion, Docker config files, and cosign's go-containerregistry library. But it couldn't recall that this exact problem had been solved before — it had no memory across sessions.

  • The human's strength: The human had a feeling. Not a precise recollection, but an intuition born from having lived through the January debugging session. That vague memory — "didn't we fix something like this before?" — was worth more than three rounds of systematic analysis.

  • The archive as shared memory: The .specstory/history/ folder — containing 460+ conversation logs — acted as a bridge between the human's fuzzy recall and the AI's precise execution. The human knew something was there; the AI could find and apply it in seconds.

This is the inverse of the ClickHouse incident. There, the AI diagnosed a novel problem the human couldn't have solved alone. Here, the human's experiential memory broke through where the AI's analytical approach kept missing. The conversation archive turned an individual's vague memory into an actionable solution — a form of institutional knowledge that neither human nor AI could have leveraged alone.


Addendum: Intelligent Ralph Wiggum Loops (January 9 – March 8, 2026)

When the human becomes the build monitor and the AI becomes the code monkey in a feedback loop.

What Is a Ralph Wiggum Loop?

A Ralph Wiggum loop is the practice of repeatedly feeding the same prompt to an AI until the task is fully complete. In its original form, it's a dumb bash loop: while :; do cat PROMPT.md | claude ; done. The AI runs, the pipeline checks, the loop repeats until everything passes.

In Forma3D.Connect, we ran an intelligent variant: a human monitors the pipeline, identifies what still fails, and feeds only the relevant failure logs back to the AI. The AI proposes a fix, the human pushes it, the pipeline runs, and the cycle repeats until the pipeline goes green — or the human gives up and changes strategy.

uml diagram

The Numbers

| Metric | Value |
| --- | --- |
| Total intelligent Ralph Wiggum loops identified | 21 |
| Cross-session recurring patterns | 3 |
| Intentional verification loops | 1 |
| Total iterations across all loops | ~85 |
| Loops ending in clean success | 13 (62%) |
| Loops abandoned or worked around | 4 (19%) |
| Loops partially resolved or evolved | 4 (19%) |
| Longest loop by iterations | 10+ (Cosign 401, Mar 6–8) |
| Longest loop by duration | 44 hours (Cosign 401, Mar 6–8) |
| Shortest successful loop | 25 minutes (Cosign flag mismatch, Feb 26) |
| Cases where the human broke the loop | 5 |
| Average iterations per loop | ~4 |

All 21 Loops — Chronological Catalog

uml diagram

RW #1 — Azure Pipeline Type Check Cascade (Jan 9)

| Field | Value |
| --- | --- |
| Iterations | 4 |
| Duration | ~2 hours |
| Outcome | Success |

TypeScript's React 19 JSX namespace change triggered the first cascade: fix TS2503 → nx affected can't find main branch → unit tests exit with "No tests found" → deprecated Azure DevOps task versions. Four distinct errors, each revealed only after the previous was fixed.

RW #2 — Publish Test Results + Badge Fix (Jan 9)

| Field | Value |
| --- | --- |
| Iterations | 2 |
| Duration | ~30 min |
| Outcome | Success |

No JUnit XML files being generated → wrong Azure DevOps badge URL. Two sequential pipeline issues fixed in the project's first day.

RW #3 — Docker Push Authorization Saga (Jan 11)

| Field | Value |
| --- | --- |
| Iterations | 5 |
| Duration | ~3 hours |
| Outcome | Success |

The most dramatic early loop. Docker login fails → AI fixes env mapping → push still fails → human spots wrong registry URL → push succeeds but hits repo limit → human creates new registry → AI rewrites all image naming. The human's intervention at iteration 3 (spotting the wrong registry) was the breakthrough the AI couldn't reach analytically.

RW #4 — CI Pipeline Triggering + Deployment (Jan 12)

Field Value
Iterations 3–4
Duration ~2 hours
Outcome Success

Pipeline stops auto-triggering → fix trigger path exclusions → deployment verification fails → debug deployment steps. Mixed loop that evolved from one problem to another.

RW #5 — Staging Deployment Health Verification (Jan 12)

Field Value
Iterations 4+
Duration ~4 hours
Outcome Resolved (with SSH escalation)

Health endpoint returns 404 → AI investigates Traefik routing → AI given SSH access to debug server directly → traces through Docker logs, .env files, Prisma migrations, network config. The loop escalated from "paste pipeline log" to "give the AI SSH access and let it dig."

RW #6 — Acceptance Test Post-Deploy (Jan 12)

Field Value
Iterations 2
Duration ~1 hour
Outcome Partial (continued in other sessions)

Acceptance tests fail after first successful deploy → fix Prisma binary targets for Alpine → still failing → continued debugging in subsequent sessions.

RW #7 — Playwright Report Publishing (Jan 14, cross-session)

Field Value
Iterations 3 (across 2 sessions)
Duration ~4 hours
Outcome Success

Azure DevOps PublishHtmlReport@1 can't find attachment → AI masks with continueOnError → human opens new session with same error → AI adds file-existence checks → user says "Again..." → AI realizes reportDir must point to a file, not a directory. Third time's the charm.

RW #8 — Acceptance Tests: Shipping Integration (Jan 16)

Field Value
Iterations 4
Duration ~3 hours
Outcome Abandoned

Four shipping tests fail with wrong HTTP status codes → AI fixes error handling → still failing → AI tries again → still failing → human abandons the loop and disables shipping in staging (SHIPPING_ENABLED=false) since there are no SendCloud API keys configured anyway. A pragmatic exit.

RW #9 — Order Metadata Schema Type Check (Jan 17)

Field Value
Iterations 2
Duration ~1 hour
Outcome Success

Zod z.record() needs 2 args in v4 → fix type signature → pipeline advances to unit tests → fix coverage thresholds.

RW #10 — Rate Limiting / Orders API Timeout (Jan 18, cross-session)

Field Value
Iterations 2+ (across 2 sessions)
Duration ~2 hours
Outcome Unresolved

Acceptance tests hit 429 rate limiting → AI adds @SkipThrottle() to the controllers that lacked it → human opens new session: "Are you sure the fix was deployed?" → same 429s plus new 503 errors emerge. The loop stalled on uncertainty about whether the fix had actually been deployed.

RW #11 — Acceptance Test Run Failure (Jan 21–22)

Field Value
Iterations 3
Duration ~9 hours
Outcome Likely resolved

Acceptance tests fail in CI → AI fixes test locators → unit tests fail → AI fixes → acceptance tests fail again → AI fixes more test logic. Ended with the human asking the AI to document the local test procedure, suggesting eventual success.

RW #12 — Real-Time Updates Acceptance Test Failure "The Marathon" (Jan 24–25)

Field Value
Iterations 8+
Duration ~20 hours
Outcome Partially resolved

The most intense loop in the project. Missing env vars → AI fixes → still failing → AI fixes more config → human says "I am tired of going back and forth with CI" → retries after local test → "Still 8 failing" → AI fixes locators → user screams "FIX IT" → AI refactors tests → deployment verification fails (version mismatch) → AI discovers its own nginx proxy broke the version check. Tests went from 45 failing to 1, but the deployment verification remained broken. The 37,897-line conversation file is the longest in the entire project history.

RW #13 — User Creation Authentication Error "The Deployment Saga" (Jan 26)

Field Value
Iterations 6
Duration ~3 hours
Outcome Success

Can't create users → Prisma role seeding missing → CI acceptance tests fail → deployment check fails → Docker container warnings → 1 test still fails → all resolved.

RW #14 — Deployment Verification "Version Mismatch Whack-a-Mole" (Jan 26)

Field Value
Iterations 4
Duration ~2 hours
Outcome Success

Version mismatch on staging → AI investigates imageTag propagation → "I reran, still fails" → AI SSHes into server → "From 1 failing back to 45 failing?" — a regression caused by the fix itself. Eventually resolved after the AI identified a Docker image build issue.

RW #15 — isActive Feature Removal Cascade (Feb 10)

Field Value
Iterations 5 (2 initial + 3 whack-a-mole)
Duration ~13 hours
Outcome Success

Removing an unrequested isActive field triggered a cascade across three pipeline stages: lint (unused import) → Docker build (seed.ts still references isActive) → BDD generation (step definition text mismatch). User frustration marker: "Failed again. Make sure to run acceptance tests locally before I commit..." The AI learned — in the final iteration, it ran bddgen locally before declaring the fix.

RW #16 — Gridflock Crash Loop (Feb 16)

Field Value
Iterations 2
Duration ~1 hour
Outcome Cut off (continues in RW #17)

Gridflock service crash-looping due to missing NestJS module imports → AI fixes ServiceClientModule → still crash-looping → different missing dependency (SlicerClient). Each fix resolved one missing dependency but revealed another.

RW #17 — The Grand Marathon (Feb 16)

Field Value
Iterations 8
Duration ~12 hours
Outcome Partial success

The day-long debugging marathon after a deployment broke everything: 44+ acceptance test failures spanning auth forwarding, WebSocket config, gridflock crashes, proxy 404s, PostgreSQL auth, and pipeline configuration. Progressive failure reduction: 44+ → ~20 → ~5 → 3 → 2 → then infrastructure issues (PostgreSQL auth on build agents, pg_isready: command not found) derailed progress. The AI had to completely rewrite a PostgreSQL wait script from bash to Node.js mid-loop. 27 conversation turns over 12 hours.
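The bash-to-Node rewrite of the PostgreSQL wait script isn't shown in the source; a minimal sketch of what such a script might look like, with a generic retry helper and a TCP probe (function names, defaults, and the retry budget are assumptions):

```typescript
// Hypothetical sketch of a Node-based "wait for PostgreSQL" helper,
// replacing a bash pg_isready loop. Names and defaults are assumptions;
// the project's actual script is not shown in the source.
import * as net from "node:net";

async function waitFor(
  probe: () => Promise<boolean>,
  { retries = 30, delayMs = 1000 } = {}
): Promise<boolean> {
  for (let attempt = 0; attempt < retries; attempt++) {
    if (await probe()) return true; // service is up
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // give up once the retry budget is spent
}

// A TCP probe: resolves true once something accepts connections on the port.
function tcpProbe(host: string, port: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port }, () => {
      socket.end();
      resolve(true);
    });
    socket.on("error", () => resolve(false));
    socket.setTimeout(1000, () => {
      socket.destroy();
      resolve(false);
    });
  });
}
```

Something like `waitFor(() => tcpProbe("localhost", 5432))` sidesteps the `pg_isready: command not found` class of failure entirely, since it depends only on the Node runtime already present on the agent.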

RW #18 — Sendcloud Test Loop (Feb 16)

Field Value
Iterations 2
Duration ~1 hour
Outcome Abandoned (workaround)

Late evening after the day-long marathon. Shipping/sendcloud tests still failing → AI proposes fix → still failing → human breaks the loop: "Ok, I will merge my latest changes and force a full build on main." Pragmatism over persistence.

RW #19 — Docs Docker Image Build (Feb 17)

Field Value
Iterations 2
Duration ~2 hours
Outcome Success

CHANGELOG.md not accessible inside container → AI fixes .dockerignore + COPY → same error persists → AI realizes the real root cause: pymdownx.snippets' restrict_base_path blocks .. in snippet paths. Textbook depth-of-analysis loop: first fix was necessary but insufficient.

RW #20 — PostgreSQL Auth on Self-Hosted Agents (Feb 17)

Field Value
Iterations 2 (then evolved into architecture discussion)
Duration ~3 hours
Outcome Evolved

PostgreSQL password auth failing → AI rewrites with trust auth and dynamic ports → same failure → AI SSHes into agent, finds host-level PostgreSQL interfering. Rather than continuing to iterate, the conversation pivoted to forward-looking infrastructure design (Testcontainers, MS-hosted agents, cleanup strategies). The intelligent version of breaking the loop: recognizing when iteration won't help and changing approach.

RW #21 — Type Check Cascade (Feb 23)

Field Value
Iterations 3
Duration ~45 min
Outcome Success

The most efficient loop. TypeScript error (unused import) → fix → unit tests fail (missing tenantId) → fix → all 14 projects build successfully. Progressive pipeline stage unlocking: typecheck → tests → build → green.

RW #22 — Cosign Flag Mismatch (Feb 26)

Field Value
Iterations 2
Duration ~25 min
Outcome Success

SBOM too large for Rekor transparency log → AI adds --no-tlog-upload → unknown flag error → AI corrects to --tlog-upload=false (the flag name differs between cosign versions). The shortest successful loop.

RW #23 — The Grand Cosign 401 (Mar 6–8)

Field Value
Iterations 10+
Duration 44 hours
Outcome Success

The longest and most dramatic loop in the entire project. cosign attest returns 401 Unauthorized pushing to DigitalOcean Container Registry. The AI proposed 7+ wrong hypotheses over 10 iterations:

# AI Hypothesis Result
1 Token issue / missing set -e Still 401
2 Custom DOCKER_CONFIG interference Still 401
3 Stale env vars Still 401
4 Dirty config.json Still 401
5 Wrong login method (docker login vs doctl) Still 401
6 Pipeline config drift from known-good state Still 401
7 Self-hosted agent state corruption Still 401 (even on MS-hosted agent!)

The human broke the loop twice:

  • At iteration 5: asked the AI to search .specstory/history/ for past solutions (the AI's conversation archive as institutional memory)
  • At iteration 10: identified the actual root cause — the registry was locked during garbage collection, so attestation needed to run after cleanup, not in parallel

User frustration escalation: "Still failing" → "Hmmm fails again" → "Still not working??????" → "It worked! Finally! GOOD JOB."


Cross-Session Clusters

Some loops didn't occur in isolation but formed clusters — waves of related failures spanning multiple conversations.

uml diagram

Cluster A (Feb 15–17) is the most intense: a single deployment event triggered a cascade of 7 interconnected sessions spanning 28 hours and 5 Ralph Wiggum loops. The overall failure arc continued across session boundaries even when individual sessions reached local conclusions. By the end, the conversation had pivoted from reactive bug-fixing to proactive infrastructure redesign.

Cluster B (Jan 22 → Feb 3) reveals a structural weakness: the coverage threshold was lowered from 78% to 72% after the first failure, but coverage actually dropped further (from 76.75% to 71.18%) because new code kept being shipped without sufficient branch coverage.

Cluster C (Jan 26 → Feb 1) shows the container registry cleanup script breaking in a new way each time it was "fixed." The human's frustrated "Still failing. What has changed? It used to work all the time." captures the pattern perfectly.

Cluster D (Feb 26 + Mar 6–8) connects the shortest and longest loops in the project — both involving cosign + DigitalOcean, but with entirely different root causes.


Loop Outcomes Analysis

uml diagram

How the Human Broke the Loop

In 5 instances, spanning 4 of the loops, the human actively broke the cycle rather than letting the AI iterate to a solution:

Loop How the Human Broke It Why
#8 (Shipping tests) Disabled shipping in staging No SendCloud API keys configured — the tests couldn't pass regardless
#18 (Sendcloud tests) Merged and force-built on main Late evening after 12-hour marathon — pragmatism over persistence
#20 (PostgreSQL auth) Pivoted to architecture discussion Recognized the root cause was systemic (host-level PG) — iteration wouldn't help
#23 (Cosign 401), attempt 1 Directed AI to search conversation history The human's vague memory ("didn't we fix this before?") was more valuable than the AI's analysis
#23 (Cosign 401), attempt 2 Identified actual root cause (GC locking registry) 7 AI hypotheses failed — the human's infrastructure intuition broke through

Pattern: The human breaks the loop when the problem is environmental, systemic, or outside the AI's analytical reach. The AI excels at code-level fixes but struggles with infrastructure timing, external service behavior, and problems it has solved before but can't remember.

Loops Per Week

uml diagram

Observation: Loops cluster around two events: initial setup (Week 1, 6 loops) and the microservices migration (Week 6, 5 loops). The quiet period (Weeks 4–5) coincides with feature development on established patterns — exactly when you'd expect the loop frequency to drop. The lone Week 9 outlier (the Grand Cosign 401) is an infrastructure timing problem that no amount of code-level iteration could solve.

What the Loops Reveal About AI-First Development

The Ralph Wiggum loop is the dominant operational pattern in this project. With 21 loops comprising ~85 iterations across 53 days, the project averaged roughly 1.6 pipeline ping-pong iterations per day.

Three meta-patterns emerge:

  1. Cascade loops (RW #1, #15, #17, #21): Fix one error, reveal the next. Each pipeline stage acts as a gate — typecheck → lint → unit tests → Docker build → deploy → acceptance tests. Fixing a failure at one gate just means hitting the next gate. These loops are the most productive: each iteration makes genuine progress.

  2. Same-problem loops (RW #3, #7, #8, #10, #23): The same error keeps coming back despite fixes. These indicate the AI is treating symptoms rather than root causes. The Grand Cosign 401 (RW #23) is the extreme example: 7 wrong hypotheses over 44 hours because the real problem (registry locked during garbage collection) was outside the AI's analytical frame.

  3. Regression loops (RW #14, #17): The fix makes things worse. RW #14 went from 1 failing test to 45 after an AI fix. These are the most demoralizing — and the most dangerous without human oversight.

The intelligent variant matters. In a "dumb" Ralph Wiggum loop, the entire prompt is fed back every time and the AI has no guidance about what changed. In this project, the human acted as a filter — pasting only the relevant failure, providing context about what was already tried, and critically, knowing when to break the loop entirely. Five of the 21 loops were resolved not by the AI iterating to a solution, but by the human changing strategy.

This is the case for the "intelligent" in intelligent Ralph Wiggum loops: the human's judgment about when to stop iterating is as valuable as the AI's ability to keep iterating.


Addendum: The Stock Management Feature — Anatomy of a Multi-Session Implementation (February 15 – March 9, 2026)

When a single feature request spawned 7 AI sessions, revealed 15 gaps, and demonstrated that the human's most valuable intervention is asking "but does this actually work?"

The Request

On February 15, the human described a hybrid fulfillment model: "In spare time or in the weekend it could be beneficial to already print some of the best selling products when there are not many orders." The system should maintain a minimum stock per product, pre-print during quiet periods, and consume from stock when orders arrive to speed up fulfillment.

This seemingly straightforward feature would become the most revealing case study of AI-human collaboration gaps in the entire project.


The Seven Sessions

uml diagram


Session 1 — The Prompt (Stock management prompt)

February 15, 2026

The human asked the AI to design the feature. The AI created a comprehensive 9-phase implementation prompt in docs/_internal/prompts/done/prompt-inventory-stock-management.md covering inventory tracking, pre-production scheduling, stock-aware fulfillment, and an audit trail.

Three latent gaps were introduced in the design phase:

# Gap Why It Went Unnoticed
1 Inventory tracked at AssemblyPart level instead of ProductMapping Seemed logical from a parts perspective, but ignored the GridFlock problem: printing one part of a grid doesn't make a complete product
2 maximumStock field designed into the schema but never used in the deficit formula The prompt included the field in the schema but the deficit calculation only referenced minimumStock
3 Named "pre-production" A confusing term in a 3D printing context where "production" and "printing" are synonymous

The human filed the prompt as TODO. No corrections were made. The gaps were invisible because the prompt looked comprehensive — the devil was in the formulae.


Session 2 — The Implementation (Implement stock management)

March 8–9, 2026

The human said: "Implement prompt docs/_internal/prompts/done/prompt-inventory-stock-management.md"

The AI built the entire feature across 9 phases: Prisma schema changes, domain contracts, inventory module (service, repository, controller), stock replenishment cron, stock-aware orchestration, print job completion handler, permissions, and gateway proxy.

Six new gaps were introduced during implementation:

uml diagram

The AI also broke 11 existing test suites (1,499 tests total) by not adding the new purpose and stockBatchId fields to mock PrintJob objects across the codebase. The human discovered this when asking "Do we need more unit tests?" — the AI then found and fixed all 141 broken test files.
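A common guard against this failure mode is a shared mock factory, so a new field is added in one place rather than in every suite that hand-rolls its own mock. A hypothetical sketch — only the purpose and stockBatchId field names come from the source; the rest of the PrintJob shape is assumed:

```typescript
// Assumed PrintJob shape for illustration; only purpose and stockBatchId
// are named in the source.
interface PrintJob {
  id: string;
  status: string;
  purpose: "ORDER" | "STOCK_REPLENISHMENT";
  stockBatchId: string | null;
}

// Central factory: when the schema grows a field, the default lives here,
// and the existing suites keep compiling instead of breaking en masse.
function makePrintJob(overrides: Partial<PrintJob> = {}): PrintJob {
  return {
    id: "job-1",
    status: "QUEUED",
    purpose: "ORDER",
    stockBatchId: null,
    ...overrides,
  };
}
```

A suite that needs a replenishment job writes `makePrintJob({ purpose: "STOCK_REPLENISHMENT" })` and inherits sensible defaults for everything else.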

Key human interventions during implementation:

  • Pasted 5 separate CI failure logs requiring iterative fixes
  • Asked the AI to write acceptance tests after staging deployment
  • Pointed out RBAC permissions weren't seeded — "I do not have access" after deployment
  • Insisted stock management config should be per-tenant (SystemConfig), not environment variables — the AI had used global env vars, which wouldn't work for a multi-tenant SaaS

Session 3 — The Architecture Questions (Flowchart and architecture)

March 9, 2026

The human reviewed the roadmap flowchart and caught three design flaws:

  1. Flowchart loop bug: After queuing one part for replenishment, the diagram looped back to "Parts needing replenishment?" instead of the top. The human correctly noted: "In the mean time pre-production could have been disabled, new production orders and jobs could have arrived."

  2. Inventory level question: The human asked whether inventory should be tracked at ProductMapping or AssemblyPart level, noting that for GridFlock grids, "printing one part alone makes no sense." The AI agreed ProductMapping was correct.

  3. Toggle redundancy: The human asked: "What is the purpose of preProductionEnabled if minimumStock > 0 already implies replenishment intent?" The AI agreed it was redundant.

All three were prompt design flaws that had survived since February 15 — 22 days undetected.


Session 4 — The maximumStock Discovery (Prompt update & maximumStock)

March 9, 2026

This session produced the most significant human discovery. The human asked the AI to update the prompt for the current microservices architecture and rename "pre-production" to "stock replenishment."

During the review, the human asked: "The UI says the maximum stock is optional? What does this mean? If maximum stock is 0 will the system just keep printing?"

The AI investigated and discovered: maximumStock existed in the schema, DTOs, API contracts, and the UI — but had zero functional role. The deficit formula was minimumStock - currentStock. The field was pure decoration. The system always replenished to minimumStock only, regardless of maximumStock.
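In code terms, the gap might have looked like this — a hedged reconstruction from the description above, since the actual formula isn't quoted; the post-fix behavior is inferred from the later test data (maximumStock: 50, currentStock: 20 → deficit: 30):

```typescript
// Hypothetical reconstruction. Only the names minimumStock, maximumStock,
// and currentStock come from the source.
function deficitBefore(minimumStock: number, currentStock: number): number {
  // maximumStock existed in schema, DTOs, and UI but never entered the formula:
  return Math.max(0, minimumStock - currentStock);
}

function deficitAfter(
  minimumStock: number,
  currentStock: number,
  maximumStock?: number
): number {
  // After the fix: replenish up to maximumStock when set, else minimumStock.
  const replenishmentTarget = maximumStock ?? minimumStock;
  return Math.max(0, replenishmentTarget - currentStock);
}
```

The before/after pair makes the "pure decoration" diagnosis concrete: with minimumStock 10, currentStock 20, maximumStock 50, the old formula yields 0 and the new one yields 30.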

uml diagram

Critical human correction: The AI initially only updated the prompt document. The human pushed back: "Changes only made to the prompt? No. Implement the maximum stock stuff now, add or update unit tests and acceptance tests, update docs." The AI then discovered the system was already fully implemented (not a TODO) and made the code changes.


Session 5 — The Dead Pipeline (Replenishment event wiring)

March 9, 2026

The human deployed to staging and tested the feature. The UI showed: deficit = 1, replenishment enabled, correct settings — but 0 pending batches and no print jobs in the SimplyPrint queue.

The human shared screenshots and asked: "Shouldn't the print jobs have been created? The SimplyPrint queue is empty."

The AI investigated and found the most critical bug in the entire feature: Nobody subscribed to STOCK_REPLENISHMENT_SCHEDULED events. The StockReplenishmentService created StockBatch and PrintJob records in the database and published events — but the EventSubscriberService only listened for ORDER_CREATED and ORDER_CANCELLED. The entire replenishment pipeline was a dead end.

uml diagram

The fix: Added STOCK_REPLENISHMENT_SCHEDULED subscription to EventSubscriberService — lookup print job, validate file ID, call SimplyPrint's addToQueue(), update print job with queue item ID.
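The wiring gap and its fix reduce to a few lines. The sketch below uses a stand-in event bus, not the project's real infrastructure; only the event name and the SimplyPrint addToQueue() step come from the source:

```typescript
// Minimal stand-in event bus illustrating the missing subscription.
type ReplenishmentEvent = { printJobId: string; fileId: string };
type Handler = (payload: ReplenishmentEvent) => void;

class EventBus {
  private handlers = new Map<string, Handler[]>();
  subscribe(event: string, handler: Handler): void {
    this.handlers.set(event, [...(this.handlers.get(event) ?? []), handler]);
  }
  publish(event: string, payload: ReplenishmentEvent): void {
    for (const handler of this.handlers.get(event) ?? []) handler(payload);
  }
}

const bus = new EventBus();
const queued: string[] = [];

// Before the fix, only ORDER_CREATED / ORDER_CANCELLED were subscribed,
// so replenishment events were published into the void. The fix is one line
// of wiring:
bus.subscribe("STOCK_REPLENISHMENT_SCHEDULED", ({ printJobId }) => {
  queued.push(printJobId); // stand-in for SimplyPrint's addToQueue()
});

bus.publish("STOCK_REPLENISHMENT_SCHEDULED", { printJobId: "job-1", fileId: "f-1" });
```

Every component on either side of that `subscribe` call passed its unit tests; the bug lived entirely in the absence of the call itself.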

This bug could only have been found by testing on staging. The unit tests all passed because each component worked in isolation. The gap was in the wiring between components — exactly the kind of integration bug that unit tests can't catch.


Session 6 — Test Failures & Acceptance Test Pollution (CI fixes & stock cleanup)

March 9, 2026

The human pasted CI pipeline output showing two test failures caused by the maximumStock changes:

  1. inventory.controller.spec.ts: Mock data missing the new replenishmentTarget property (TS2345)
  2. inventory.service.spec.ts: The "Full Stock Widget" test data had maximumStock: 50 with currentStock: 20, giving an unexpected deficit: 30 — the test expected 1 product needing replenishment but got 2

After fixing the tests, the human noticed the staging inventory showed stock of 16 on the real "Colored Benchy" product. The human asked: "How did the system arrive at stock 16?" and "Shouldn't I be able to manually edit the current stock?"

Two more gaps discovered:

  1. Acceptance tests using real product mappings. The step "there is a product mapping with stock management" grabbed the first existing stock-managed product — the human's real "Colored Benchy." Each test run added net +8 units (+5 +3 +2 -1 -1). Two CI runs = 16 phantom units.

  2. No manual stock adjustment UI. The backend had full adjustStock and scrapStock API endpoints, the frontend had useAdjustStock() and useScrapStock() hooks — but no page in the app actually used them. The feature was 90% built but invisible to the user.
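A common remedy for the first gap — sketched here with hypothetical names, since the source doesn't show how the fixture was ultimately repaired — is to have each scenario create its own uniquely named mapping instead of grabbing whatever already exists:

```typescript
// Hypothetical fixture helper: every test run gets its own product mapping,
// so stock mutations never touch real tenant data like "Colored Benchy".
interface ProductMapping {
  id: string;
  name: string;
  minimumStock: number;
}

function makeTestMapping(
  create: (mapping: Omit<ProductMapping, "id">) => ProductMapping
): ProductMapping {
  // Unique suffix keeps parallel CI runs from colliding with each other.
  const suffix = `${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
  return create({ name: `acceptance-test-${suffix}`, minimumStock: 1 });
}
```

Paired with cleanup in an after-hook, this turns "+8 phantom units per run" into zero net effect on real inventory.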


Session 7 — The Missing UI & The Silent Mutation (current session)

March 9, 2026

The human asked: "Make it so that I can do stock adjustments on the product mapping from the UI."

The AI built a StockAdjustmentModal component with three modes (Add Stock / Remove Stock / Scrap Stock), quantity input, required reason for audit trail, and a live preview of the resulting stock level. Integrated in two places: the product mapping edit page and the inventory stock levels page.

The human tested the modal on staging and reported: "Clicking the 'Add Stock' button does not close the modal window and does not update the numbers in the modal window. This is confusing because the stock IS being adjusted but you do not see it." Even after closing the modal manually (via X or Cancel), the "Current stock" label on the product mapping page showed the stale value — a full page refresh was required.

Root cause — another wiring gap between backend and frontend:

uml diagram

The fix was three-layered:

  1. Backend: Changed @HttpCode(HttpStatus.OK) to @HttpCode(HttpStatus.NO_CONTENT) on both adjustStock and scrapStock controller methods — semantically correct for operations that return no data
  2. Frontend request() function: Added a safety net to read the response as text first and only parse JSON if content exists — prevents any future 200-with-empty-body from silently breaking mutations
  3. Modal onError handler: Added explicit error toast so mutation failures are never silent again
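The safety net in fix #2 boils down to one rule: never hand an empty body to JSON.parse. A minimal sketch — parseBody is a hypothetical name; the project's actual request() wrapper isn't shown:

```typescript
// Parse a response body defensively: a 204, or any status with an empty
// body, yields undefined instead of a JSON.parse crash that would silently
// prevent onSuccess from ever firing.
function parseBody<T>(status: number, rawText: string): T | undefined {
  if (status === 204 || rawText.length === 0) return undefined;
  return JSON.parse(rawText) as T;
}
```

With this in the request path, a future endpoint that returns 200 with an empty body degrades to `undefined` rather than breaking every mutation wired through it.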

This gap is a cascade from gap #10 (no stock adjust UI). The AI built the modal with correct onSuccess wiring (close modal, show toast, invalidate queries) — but didn't test the full HTTP round-trip where the backend returned an unexpected response format. The same "each piece works in isolation" pattern as the dead event pipeline (gap #6).

Gap #14 — Acceptance tests expected 200, got 204. The fix for gap #13 changed the HTTP status code from 200 to 204. The CI pipeline immediately caught two failing acceptance tests:

Expected: 200
Received: 204

The feature file scenarios "Stock can be adjusted positively" and "Stock can be scrapped" both asserted Then the response status should be 200. A textbook cascade: fixing the backend response format broke the acceptance test assertions that hardcoded the old status code. Fixed by updating both scenarios to expect 204.

Gap #15 — Inventory nav item not gated by feature flag. The human disabled the stockManagement feature flag on the Feature Flags settings page and observed that:

  • The Stock Management tile in Settings correctly disappeared
  • The Stock Management section on Product Mapping edit correctly disappeared
  • But the Inventory menu item in the sidebar (and mobile nav) remained visible

Navigating to /inventory showed the error: "Failed to load stock levels: Stock management is not enabled for this tenant." The nav items were defined as a static array — the feature flag was never consulted.

This is an implementation gap from the original stock management build (Session 2). The AI correctly gated the Settings tile and the Product Mapping section behind features?.stockManagement, but forgot to apply the same gate to the navigation. A partial feature flag implementation — the feature was hidden from two of three entry points, but the primary entry point (the nav menu) was left wide open.

The fix: Added a featureFlag property to nav items and filtered the navigation arrays in both Sidebar and MobileNav components based on useFeatureFlags(). The Inventory item is now only visible when stockManagement is enabled.
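The nav-gating fix can be sketched in a few lines — item names and the flags shape are assumptions based on the description above:

```typescript
// Nav items carry an optional featureFlag; the rendered list is filtered
// against the tenant's flags, so Sidebar and MobileNav share one gate.
interface NavItem {
  label: string;
  href: string;
  featureFlag?: string;
}

function visibleNavItems(
  items: NavItem[],
  flags: Record<string, boolean>
): NavItem[] {
  return items.filter((item) => !item.featureFlag || flags[item.featureFlag]);
}
```

Centralizing the filter is what prevents the original failure mode: a cross-cutting concern applied in two of three places because each entry point had its own hand-written visibility logic.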

The human also noted a UX issue: the "Edit" button for stock management settings was positioned at the same level as "Adjust Stock" and "Disable," but it controls the parameters below (Min Stock, Max Stock, Priority, Batch Size). The button was moved down to sit alongside the settings grid, with a "Settings" sub-label — grouping the control with the content it modifies.


The Gap Lifecycle

uml diagram

Discovery Attribution

uml diagram

The human found 67% of all gaps. The CI pipeline caught type errors and assertion failures (27%). The AI self-discovered only one bug (7%) — the tenant-wide pendingBatches count — and only because it was actively refactoring the deficit calculation for the maximumStock fix.


The Pattern: Prompt vs. Implementation vs. Wiring

The 15 gaps cluster into three categories that reveal where AI struggles most:

Category Gaps Example Root Cause
Design gaps (prompt) 5 maximumStock in schema but not in formula AI creates comprehensive-looking documents where individual details contradict each other
Wiring gaps (implementation) 7 No event subscriber for replenishment; nav item not gated by feature flag AI implements each component correctly in isolation but misses the connections between them — including applying the same gate to all entry points
Cascade gaps (fixes) 3 Mock data incomplete after adding field; modal mutation silently failing; acceptance tests expecting old HTTP status Fixing one gap exposes another — the "whack-a-mole" pattern

The most dangerous category is wiring gaps. Each component passes its unit tests. The cron creates batches. The event bus publishes. The subscriber service runs. But nobody wired them together. This is the software equivalent of building a beautiful bridge where each section is structurally sound — but the sections don't connect.


The Human's Questions — In Order

The entire gap discovery process was driven by ten observations from the human:

  1. "Shouldn't the flowchart loop back to the top?" → Flowchart logic error (22 days old)
  2. "Should we track at ProductMapping or AssemblyPart level?" → Wrong abstraction level (22 days old)
  3. "What is the purpose of preProductionEnabled if minimumStock > 0 already implies it?" → Redundant toggle (22 days old)
  4. "The UI says maximum stock is optional? What does this mean?" → maximumStock non-functional (22 days old)
  5. "Shouldn't the print jobs have been created? The SimplyPrint queue is empty." → Dead event pipeline (1 day old)
  6. "How did the system arrive at stock 16?" → Acceptance tests polluting real data (1 day old)
  7. "Shouldn't I be able to manually edit the current stock?" → Missing UI (1 day old)
  8. "Clicking Add Stock does not close the modal. The stock IS being adjusted but you do not see it." → Silent mutation failure (minutes old)
  9. "I feel like 'Edit' should be one level down on the same level as the parameters because it is for editing those parameters." → UX hierarchy mismatch (minutes old)
  10. "When I set the Stock Management feature flag to false, the Inventory menu item still exists." → Partial feature flag implementation (minutes old)

Every question was a variant of the project's recurring pattern: "What about...?" — the human's superpower identified earlier in this timeline. The human didn't need to read the code. They just used the system and noticed when reality didn't match expectations. Question #8 is particularly telling: the human reported the exact symptom — "the stock IS being adjusted but you do not see it" — which immediately pointed the AI toward a response-handling issue rather than a backend bug. Question #10 reveals the pattern at its most systematic: the human toggled a feature flag off and methodically checked every place in the UI where the feature should disappear — finding the one the AI missed.


Conclusion

The stock management feature was implemented across 7 sessions spanning 22 days. It produced 15 gaps — 5 in the prompt, 7 during implementation, and 3 as cascades from fixes. The human discovered 10 of them (67%), the CI pipeline caught 4 (27%), and the AI self-discovered 1 (7%).

The most important lesson: a feature can look fully implemented — schema, API, UI, tests, documentation — and still be fundamentally broken if the wiring between components is missing. The replenishment pipeline created database records but never sent print jobs to SimplyPrint. The maximumStock field appeared everywhere in the UI but did nothing. The acceptance tests ran green while silently corrupting production data. The stock adjustment modal had perfect onSuccess logic — close the modal, show a toast, invalidate queries — but a mismatch between the backend's empty HTTP 200 and the frontend's JSON parser meant the success callback never fired. The feature flag correctly hid the feature from two of three UI entry points — but left the primary navigation link visible.

The cascade chain is particularly instructive: gap #10 (no UI) → gap #13 (modal doesn't close due to HTTP 200/empty body) → gap #14 (acceptance tests fail because they expected 200, now get 204). Each fix peeled back a layer and exposed the next. Meanwhile, gap #15 (nav item not gated) demonstrates a different AI failure mode: incomplete application of a cross-cutting concern. The AI applied the feature flag to the Settings page and the Product Mapping page but missed the navigation — the most visible entry point. The human found it in seconds by simply toggling the flag and looking at the sidebar.

The AI excelled at building each piece. The human excelled at asking whether the pieces actually worked together. Neither could have shipped this feature alone — but for very different reasons than the earlier sections of this timeline describe. Here, the AI's failure mode wasn't speed-induced sloppiness. It was the gap between "implemented" and "functional" — the subtle difference between code that exists and code that works.


Addendum: The SonarQube Saga — From 769 Issues to Zero in 43 Hours (March 12–14, 2026)

When a phone call about a demo turned into a full code quality overhaul — and AI proved that "65 hours of developer work" is a negotiable concept.

The Catalyst

On the evening of March 12, the human had a phone call with Steven Robijns about an upcoming demo of the Forma3D.Connect platform. Steven asked a simple but pointed question: "How good is the code quality that AI generates?"

The human didn't have a data-driven answer. The codebase had been built entirely by AI over 9 weeks, with human guidance — but no independent quality assessment had ever been performed. That evening, the human sat down and asked the AI three things:

  1. Research: Create a research report on integrating SonarQube into the project
  2. Prompt: Design a prompt for implementing SonarCloud integration
  3. Execute: Implement the prompt

What followed was a 43-hour sprint that would transform the codebase from 769 issues to zero — and provide the data-driven answer Steven's question demanded.


The Timeline

uml diagram

March 12, Evening — The Phone Call and Research (17:00–21:00Z)

Time Event Session
~17:00Z Phone call with Steven Robijns about upcoming demo
17:18Z Human asks AI to research SonarQube CE vs SonarCloud Research report created
17:18Z Human asks AI to create integration prompt prompt-sonarcloud-integration.md
17:21Z AI researches SonarQube CE limitations, TypeScript support, branch analysis Research document with PlantUML
~18:00Z Human registers project in SonarCloud, provides credentials
~18:30Z AI implements SonarCloud integration in Azure Pipeline SonarCloudPrepare@4, SonarCloudAnalyze@4, SonarCloudPublish@4
20:37Z First scan results arrive: 769 issues AI begins fixing S2933 (readonly modifiers)

The first scan was sobering:

Metric Value
Total issues 769
Code smells 748
Vulnerabilities 12
Bugs 9
Security hotspots 6
Code duplication 19.5%
SonarCloud estimated fix effort 3,890 minutes (~65 hours)

The AI immediately produced a triage report (sonarcloud-issue-triage-20260312.md) categorizing all 769 issues by severity and action:

Severity Count Real Problems False Positive Won't Fix
BLOCKER 13 1 12 0
CRITICAL 39 36 0 3
MAJOR 132 132 0 0
MINOR 585 490 15 80
Total 769 659 27 83

March 13, Morning — The Blitz (07:00–10:00Z)

The human directed the AI to fix issues in waves. Twelve parallel AI sessions ran between 07:00 and 08:00Z alone:

Time Rule Description Issues Fixed
07:20Z Remove all // NOSONAR comments (not supported in TS) 14
07:34Z S3863 Merge duplicate imports 80
07:34Z S7773 parseInt() → Number.parseInt() ~15
07:34Z S7781 .replace(/g) → .replaceAll() 16
07:34Z S7748 Remove unnecessary decimal points 23
07:39Z S7735 Flip negated conditions 40
07:39Z S6582 Prefer optional chaining 10
07:47Z S7764 window → globalThis 36
07:47Z S4325 Remove redundant type assertions 14
07:47Z S7778 indexOf → includes 9
07:47Z S7757 Consistent type imports 2
07:47Z S7776 startsWith/endsWith 2

By 10:00Z, issues had dropped from 769 to 244 — a 68% reduction in 3 hours.

March 13, Afternoon — Moderate Fixes and Coverage (13:00–15:00Z)

The AI produced a second triage report and executed phases 1, 2, and 5:

Time Rule Description Issues Fixed
13:58Z Phase 1 Quick auto-fixes (replaceAll, ??=, .at()) 22
14:02Z S6759 React props should be Readonly<> (3 sessions) 49
14:04Z S4624 Extract nested template literals 10
14:04Z S6819 div role="region" → <section> 6
14:04Z S4323 Type aliases 2
14:04Z S6571 Redundant unknown in unions 2
14:18Z S4325 Type assertion review 4
14:18Z S6853 Form label association (htmlFor/id) (2 sessions) 8
14:23Z S6551 String coercion (String() wrapping) 47
14:44Z S107 Too many constructor parameters 2

March 13, Evening — The Coverage Problem (20:00–22:00Z)

The human checked SonarCloud and was confused: "I thought we upped coverage and downed duplication. Still seems off?"

Metric Azure DevOps SonarCloud Why Different
Coverage 73% 57.9% SonarCloud counts uninstrumented files as 0% covered
Duplication 10.1% 6,662 duplicated lines across 141 blocks

Root cause: SonarCloud's sonar.sources included files excluded from test instrumentation. The AI aligned sonar.coverage.exclusions with the Jest/Vitest exclusion patterns.
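A sketch of what that alignment looks like, assuming a standard sonar-project.properties file; the concrete glob patterns here are illustrative, not the project's actual configuration:

```
# Coverage exclusions must mirror the Jest/Vitest instrumentation excludes;
# any file listed under sonar.sources that no coverage report mentions
# counts as 0% covered in SonarCloud's aggregate.
sonar.sources=apps,libs
sonar.coverage.exclusions=**/*.spec.ts,**/*.config.ts,**/migrations/**
```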

The AI also fixed 18 more duplicate import issues across the service layer (S3863).

By end of March 13: 61 issues remaining, duplication at 10.1%.

March 14, Morning — The Final Push (07:00–13:00Z)

uml diagram

Time Action Result
07:38Z Human asks why main still shows "Failed" AI explains PR vs. main quality gate difference
08:19Z Fix shallow clone warning in pipeline Git fetch depth configured
08:40Z CodeCharta integration (3D city map from SonarCloud) Visualization pipeline job added
10:00Z Human directs: "Fix or won't-fix the 47 issues" AI resolves all 47 via API + code fixes
10:00Z Duplication reduction target: 9.8% → ~3% Refactored duplicated service code
12:06Z Coverage improvement sprint begins 12 new test files created
12:07Z S107 constructor parameter fix SendcloudBaseService refactored
12:09Z Pipeline enforcement: quality gate must pass sonar.qualitygate.wait=true added
12:14Z Final test files for service-common Controllers and services covered

By 13:00Z on March 14: 0 issues, quality gate passing, pipeline enforcing.
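The enforcement wiring can be sketched as an Azure Pipelines fragment. The task names match the SonarCloud extension versions used on March 12; the service connection, organization, and project key are placeholders:

```
- task: SonarCloudPrepare@4
  inputs:
    SonarCloud: 'sonarcloud-connection'   # placeholder service connection name
    organization: 'forma3d'               # placeholder
    scannerMode: 'CLI'
    extraProperties: |
      sonar.projectKey=forma3d-connect    # placeholder
      sonar.qualitygate.wait=true         # block the pipeline until the gate passes
- task: SonarCloudAnalyze@4
- task: SonarCloudPublish@4
  inputs:
    pollingTimeoutSec: '300'
```

With sonar.qualitygate.wait=true, the analyze step polls SonarCloud for the gate verdict and fails the build on a red gate instead of merely reporting it.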


The Numbers

uml diagram

Metric First Scan (Mar 12) Final (Mar 14) Change
Total issues 769 0 -769 (-100%)
Code smells 748 0 -748
Vulnerabilities 12 0 -12
Bugs 9 0 -9
Code duplication 19.5% ~3% -16.5pp
Duplicated lines 13,366 ~2,000 -85%
Coverage (SonarCloud) 57.9% 70%+ +12pp
Quality gate FAILED PASSED

AI Sessions Breakdown

Category Sessions Purpose
Research & setup 3 SonarQube CE research, SonarCloud integration prompt, pipeline setup
Triage reports 2 Categorize all issues by severity and action
Batch rule fixes 20 Fix specific SonarCloud rules across codebase
Coverage & duplication 4 Align coverage metrics, reduce duplication
Pipeline & quality gate 3 Enforce quality gate, shallow clone, CodeCharta
Test coverage sprint 6 New test files to raise coverage from 59% to 70%+
Total ~38

Cost Analysis

Human Cost

Activity Estimated Time Notes
Phone call with Steven Robijns 30 min The catalyst
Research direction & initial setup 30 min "I want SonarCloud, evaluate CE vs Cloud"
Registering project in SonarCloud 15 min Manual step in SonarCloud UI
Directing AI fix sessions 2 hours Pasting rule IDs, reviewing progress
Reviewing SonarCloud dashboard 1 hour Checking metrics between fix rounds
Asking clarifying questions 30 min "Why does main still fail?", coverage discrepancy
Pipeline verification 30 min Confirming quality gate enforcement
Total human time ~5 hours Spread across 2 evenings + 1 morning

AI Cost

Resource Estimate
AI sessions ~38
Estimated AI compute cost ~€40–60
Files modified 200+
Lines changed 3,000+

Total Project Cost

Item Cost
Human time (~5 hrs × €75/hr) ~€375
AI compute ~€50
SonarCloud (free for open-source tier) €0
Total ~€425

SonarCloud's Estimate vs. Reality

This is where the numbers become striking. SonarCloud estimates fix effort per issue based on industry averages for a human developer working manually.

uml diagram

Metric SonarCloud Estimate (Human) Actual (AI + Human) Ratio
Work effort 65 hours ~8 hours (5 human + 3 AI) 8x faster
Calendar time ~8 working days ~2 calendar days 4x faster
Cost ~€4,875 (at €75/hr) ~€425 11x cheaper
Issues resolved per hour ~12/hr ~96/hr (elapsed) 8x throughput

Important caveat: SonarCloud's effort estimates assume a single human developer reading code, understanding context, making changes, running tests, and reviewing. The AI skips the "reading and understanding" phase — it can grep the entire codebase in seconds and apply mechanical transformations across hundreds of files simultaneously. For mechanical fixes (replace .replace(/g) with .replaceAll(), add Readonly<> to React props), the AI's advantage is extreme. For complex refactoring (reducing duplication, restructuring services), the advantage narrows but remains significant.


What SonarCloud Revealed About AI-Generated Code

The 769 issues tell a story about how AI writes code:

Pattern Issues What It Reveals
Missing readonly modifiers (S2933) 8 AI doesn't default to immutability
Duplicate imports (S3863) 80+ AI adds imports incrementally without consolidating
Legacy patterns (S7781, S7773, S7778) 60+ AI uses older JS idioms (replace(/g), parseInt(), indexOf)
Negated conditions (S7735) 40 AI writes if (!x) { a } else { b } instead of if (x) { b } else { a }
React props not Readonly<> (S6759) 49 AI doesn't wrap React props in Readonly<> by default
Template literal nesting (S4624) 10 AI creates complex nested interpolations
String coercion (S6551) 47 AI interpolates unknown values without String()
Too many parameters (S107) 2 AI mirrors DI framework patterns without questioning parameter count

The meta-pattern: AI writes functional code — it works, it passes tests, it handles edge cases. But it doesn't write idiomatic code. It uses patterns from its training data rather than modern best practices. SonarCloud catches exactly these kinds of style and maintainability gaps.
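Two of the patterns above can be shown side by side; SlicerClient and BadgeProps are hypothetical names, not taken from the codebase:

```typescript
// S2933: a field assigned only in the constructor should be readonly.
class SlicerClient {
  private readonly baseUrl: string; // without `readonly`, SonarCloud raises S2933

  constructor(baseUrl: string) {
    this.baseUrl = baseUrl;
  }

  jobUrl(id: string): string {
    return `${this.baseUrl}/jobs/${encodeURIComponent(id)}`;
  }
}

// S6759-style fix: wrapping a props object type in Readonly<> turns accidental
// mutation into a compile error instead of a silent runtime bug.
type BadgeProps = Readonly<{ label: string }>;
const badge: BadgeProps = { label: "demo" };
// badge.label = "other"; // would not compile: label is read-only
```

Neither change alters behavior; both encode intent the compiler can then enforce, which is exactly the kind of discipline the AI applies instantly once asked.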

The irony: AI generated the code quality issues. AI also fixed them all. The human's role was to introduce the quality gate (SonarCloud) and direct the AI to fix what it found. The same AI that wrote parseInt() instead of Number.parseInt() could instantly fix all 15 occurrences when told to — it just didn't know it should until SonarCloud flagged it.


The Pipeline Integration

The SonarCloud integration became a permanent part of the development process:

uml diagram

Quality gate conditions (Sonar way for AI Code):

Condition Threshold Scope
New issues 0 New code only
New coverage ≥ 80% New code only
New duplication ≤ 3% New code only
Reliability rating A Overall
Security rating A Overall

Why This Matters for the Demo

Steven Robijns asked: "How good is the code quality that AI generates?"

The data now provides a nuanced answer:

  1. AI-generated code starts with quality gaps. 769 issues in a 53,000-line codebase is ~14.5 issues per 1,000 lines. This is within industry norms but reveals that AI doesn't naturally write to SonarCloud standards.

  2. AI can fix its own quality gaps at extreme speed. What SonarCloud estimated would take a human developer 65 hours was completed in 2 calendar days with ~5 hours of human oversight.

  3. The quality is now enforced. With sonar.qualitygate.wait=true in the pipeline, every future commit must pass the quality gate. AI can no longer introduce issues without immediately being asked to fix them.

  4. The real answer: AI-generated code quality is exactly as good as the quality gates you enforce. Without SonarCloud, the 769 issues would have accumulated silently. With SonarCloud, they were eliminated in 43 hours, and the enforced gate stops regressions from landing unnoticed. The quality of AI code is a function of the guardrails, not the AI itself.


The Steven Robijns Pattern

This incident reveals a pattern seen throughout the project: external pressure drives quality improvements that internal development wouldn't prioritize.

The codebase had existed for 9 weeks without a code quality scanner. The AI never suggested adding one. The human never prioritized it. It took a phone call about a demo — an external event with social stakes — to trigger the integration.

Once triggered, the actual work was trivial for the AI. The bottleneck was never "can AI fix code quality issues?" — it was "does anyone ask it to?" This is the human's role distilled to its essence: not writing code, not even reviewing code, but deciding what questions to ask about the code.

Steven Robijns didn't write a single line of code. He asked a single question. That question led to 769 fixes, a permanent quality gate, and a data-driven answer ready for the demo. The highest-leverage intervention in this entire saga was a phone call.



Addendum: The Grype CVE Saga (March 17, 2026)

The supply chain security pipeline

uml diagram

Background

On March 17, 2026 at 09:45, the AI introduced container vulnerability scanning into the CI/CD pipeline. Using Grype (by Anchore), every Docker image now gets its SBOM scanned for known CVEs before deployment. The pipeline was configured to fail on High severity vulnerabilities that have available fixes (--fail-on high --only-fixed).

The very first pipeline run with Grype enabled immediately failed. Every single service image had CVEs. What followed was a 4-hour investigation and remediation session between the human and AI.
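Using the flags quoted above, the scan step can be sketched as a pipeline fragment. The syft and grype CLI invocations are real; the image name, registry, and job wiring are illustrative:

```
- script: |
    # Generate an SBOM for the freshly built image, then scan it for CVEs.
    # The build fails on High-severity findings that already have a fix released.
    syft my-registry/gateway:$(Build.BuildId) -o syft-json > gateway.sbom.json
    grype sbom:./gateway.sbom.json --fail-on high --only-fixed
  displayName: 'SBOM + Grype scan (Gateway)'
```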

What Grype uncovered

Across the 9 container images (Gateway, Order Service, Print Service, Shipping Service, GridFlock Service, Web, Docs, EventCatalog, Slicer), Grype discovered vulnerabilities in three distinct layers:

uml diagram

Layer 1: npm transitive dependencies (all NestJS services)

Package Vulnerable Version Fixed Version Severity
cross-spawn 7.0.3 7.0.5 High
minimatch 5.1.6, 9.0.5 5.1.9, 9.0.9 High
glob 10.4.2 10.5.0 High
tar 6.2.1 7.x High (6 CVEs)
file-type 21.2.0 21.3.3 Medium
lodash 4.17.21 4.17.23 Medium
serialize-javascript 6.0.2 7.0.4 High
ajv 8.17.1 8.18.0 Medium
bn.js 4.12.2 5.2.3 Medium
qs 6.14.1 6.15.0 Low

Layer 2: System packages from Docker base images

Source Packages Affected Cause
npm bundled in node:20-alpine tar@6.2.1, glob@10.4.2, cross-spawn@7.0.3 System npm not needed at runtime
Alpine docker-cli package Go stdlib, containerd, docker/cli, otel/sdk Go binaries with old dependencies
Alpine zlib zlib 1.3.1-r2 Outdated Alpine package

Layer 3: The Slicer (linuxserver/bambustudio:01.08.03)

The Slicer was in a category of its own: 38,731 SBOM components and 800+ CVEs with fixes available, including Critical findings with active exploitation (CVE-2024-9680 in Firefox ESR with 30.8% EPSS). The base image ships an entire Debian 12 desktop environment (Firefox, GStreamer, Qt5, CUPS, ffmpeg, GhostScript) with packages that hadn't been patched in over a year.

How they were solved

npm transitive dependencies — Added 9 pnpm overrides to package.json:

"pnpm": {
  "overrides": {
    "ajv@>=8": ">=8.18.0",
    "bn.js": ">=4.12.3",
    "cross-spawn": ">=7.0.5",
    "file-type": ">=21.3.2",
    "lodash": ">=4.17.23",
    "minimatch@<6": "5.1.9",
    "minimatch@>=9 <10": "9.0.9",
    "qs": ">=6.14.2",
    "serialize-javascript": ">=7.0.3"
  }
}

A key insight: the tar@6 → 7 override was deliberately not applied. The AI investigated and found tar@7 is ESM-only with a completely different API — it would break prisma-uml in development. The tar@6.2.1 in production images came from the bundled npm, not from project dependencies.

System packages in Docker base images — Modified 5 Dockerfiles (Gateway, Order Service, Print Service, Shipping Service, GridFlock Service) to add two lines to each production stage:

RUN apk add --no-cache openssl ... && \
    apk upgrade --no-cache && \
    rm -rf /usr/local/lib/node_modules/npm /usr/local/bin/npm /usr/local/bin/npx

  • apk upgrade picks up patched Alpine packages (zlib, docker-cli Go binaries)
  • Removing npm strips the bundled tar/glob/cross-spawn that the runtime doesn't need

The Slicer — After analyzing the 800+ CVEs, the conclusion was that they are unfixable without an upstream base image update. BambuStudio v2 had compatibility issues, and the container runs internally without internet exposure. The grype scan was commented out with a detailed rationale, and a TODO.md entry was created to revisit after BambuStudio v2 research.

The tar investigation

The AI's handling of the tar package is worth highlighting. When first asked "can this be fixed?", a naive approach would have been to add "tar": ">=7.5.11" to overrides. Instead, the AI traced two separate sources:

uml diagram

This kind of dependency forensics — cross-referencing lockfile versions against container scan versions, tracing transitive dependency trees, evaluating major version compatibility — is exactly the work that makes CVE remediation time-consuming for humans.

Time analysis

uml diagram

Activity Clock time Notes
Grype introduction to pipeline ~09:45 AI added SBOM generation + Grype scan to all 9 service jobs
First failed pipeline run ~12:00 Gateway scan revealed 30 CVEs
Human reviews logs, pastes to AI ~12:00–13:30 Human provided scan output for each service, asked "same issues?"
AI investigation + all fixes applied ~13:30 pnpm overrides, 5 Dockerfiles, Slicer exclusion, TODO.md
Smoke test builds pass ~13:30 Gateway + Order Service + GridFlock builds verified
Total wall clock ~4 hours From introduction to all fixes committed

AI effort: The AI performed dependency tree analysis (pnpm why for 10+ packages), lockfile forensics, Dockerfile inspection across 9 services, version compatibility research, applied 15+ file edits, ran build verification, and wrote the TODO.md entry. Estimated equivalent: ~15 minutes of compute time across all interactions.

Human effort: The human pasted 6 pipeline log outputs, asked clarifying questions ("same issues?", "will this fix the pipeline?"), and made one strategic decision (exclude Slicer grype with rationale). Estimated: ~30 minutes of active work, mostly reading and deciding.

Estimated time without AI: A senior developer performing the same work manually would need to:

  • Understand each CVE and its severity (~1 hour reading advisories)
  • Trace each vulnerable package through the dependency tree (~2 hours with npm ls / pnpm why)
  • Research which overrides are safe vs breaking (tar@7 ESM, minimatch cross-major) (~2 hours)
  • Apply and test pnpm overrides (~1 hour)
  • Investigate Docker base image CVE sources vs project CVEs (~2 hours)
  • Modify 5 Dockerfiles and verify builds (~1 hour)
  • Analyze the Slicer's 800+ CVEs and determine unfixability (~2 hours)
  • Write documentation and TODO entries (~1 hour)

Estimated total without AI: 12–16 hours (2 full working days), assuming the developer has prior experience with container security scanning, pnpm overrides, and Alpine/Debian package management.

uml diagram

The pattern

This incident follows the same pattern seen throughout the project: the human provides context and makes strategic decisions; the AI provides velocity and thoroughness.

The human's highest-leverage contributions were:

  1. Recognizing the Slicer was different — rather than asking the AI to "fix everything," the human asked "will this actually fix the pipeline?" which led to the honest answer that Go module CVEs in the base image are unfixable
  2. Making the exclude decision — weighing the security risk (internal-only container) against the engineering cost (BambuStudio v2 compatibility research) and choosing to defer with documentation
  3. Asking about each service — by methodically going through Gateway → Print → Shipping → Order → GridFlock → Slicer, the human ensured nothing was missed and that service-specific CVEs (bn.js in Order, ajv/serialize-javascript in GridFlock) were caught

The AI's key contribution was turning each question into immediate action — no context switching, no documentation lookup, no "I'll look into it tomorrow." Each service's scan output was analyzed in seconds and cross-referenced against the fixes already applied.


Generated from CHANGELOG.md and 500+ chat sessions in .specstory/history/
Forma3D.Connect — January 9 – March 17, 2026