Architecture Decision Records (ADR)

Project: Forma3D.Connect
Version: 3.0
Last Updated: January 21, 2026

This document captures the significant architectural decisions made during the development of Forma3D.Connect.


Table of Contents

  1. ADR-001: Monorepo with Nx
  2. ADR-002: NestJS for Backend Framework
  3. ADR-003: React 19 with Vite for Frontend
  4. ADR-004: PostgreSQL with Prisma ORM
  5. ADR-005: TypeScript Strict Mode
  6. ADR-006: Azure DevOps for CI/CD
  7. ADR-007: Layered Architecture with Repository Pattern
  8. ADR-008: Event-Driven Internal Communication
  9. ADR-009: OpenAPI/Swagger for API Documentation
  10. ADR-010: HMAC Verification for Webhooks
  11. ADR-011: Idempotent Webhook Processing
  12. ADR-012: Assembly Parts Model for Product Mapping
  13. ADR-013: Shared Domain Library
  14. ADR-014: SimplyPrint as Unified Print Farm Controller
  15. ADR-015: Aikido Security Platform
  16. ADR-016: Sentry Observability with OpenTelemetry
  17. ADR-017: Docker + Traefik Deployment Strategy
  18. ADR-018: Nx Affected Conditional Deployment Strategy
  19. ADR-019: SimplyPrint Webhook Verification
  20. ADR-020: Hybrid Status Monitoring (Polling + Webhooks)
  21. ADR-021: Retry Queue with Exponential Backoff
  22. ADR-022: Event-Driven Fulfillment Architecture
  23. ADR-023: Email Notification Strategy
  24. ADR-024: API Key Authentication for Admin Endpoints
  25. ADR-025: Cosign Image Signing for Supply Chain Security
  26. ADR-026: CycloneDX SBOM Attestations
  27. ADR-027: TanStack Query for Server State Management
  28. ADR-028: Socket.IO for Real-Time Dashboard Updates
  29. ADR-029: API Key Authentication for Dashboard
  30. ADR-030: Sendcloud for Shipping Integration
  31. ADR-031: Automated Container Registry Cleanup
  32. ADR-032: Domain Boundary Separation with Interface Contracts
  33. ADR-033: Database-Backed Webhook Idempotency
  34. ADR-034: Docker Log Rotation & Resource Cleanup
  35. ADR-035: Progressive Web App (PWA) for Cross-Platform Access
  36. ADR-036: localStorage Fallback for PWA Install Detection
  37. ADR-037: Keep a Changelog for Release Documentation
  38. ADR-038: Zensical for Publishing Project Documentation

ADR-001: Monorepo with Nx

Attribute Value
ID ADR-001
Status Accepted
Date 2026-01-09
Context Need to manage multiple applications (API, Web, Desktop, Mobile) and shared libraries in a single repository

Decision

Use Nx (v19.x) as the monorepo management tool with pnpm as the package manager.

Rationale

  • Unified tooling: Single command to build, test, lint all projects
  • Dependency graph: Nx understands project dependencies and can run only affected tests
  • Caching: Local and remote caching speeds up CI/CD pipelines
  • Code sharing: Shared libraries (@forma3d/domain, @forma3d/utils, etc.) are first-class citizens
  • Plugin ecosystem: Built-in support for NestJS, React, and other frameworks

Consequences

  • ✅ Fast CI through affected commands and caching
  • ✅ Consistent tooling across all projects
  • ✅ Easy code sharing via path aliases
  • ⚠️ Learning curve for developers unfamiliar with Nx
  • ⚠️ Initial setup complexity

Alternatives Considered

Alternative Reason for Rejection
Turborepo Less mature NestJS support
Lerna Maintenance handed over to the Nx team; Nx offers a superset of its features
Separate repositories Too much overhead for shared code

ADR-002: NestJS for Backend Framework

Attribute Value
ID ADR-002
Status Accepted
Date 2026-01-09
Context Need a robust, scalable backend framework for the integration API

Decision

Use NestJS (v10.x) as the backend framework.

Rationale

  • Enterprise-grade: Built-in support for dependency injection, modules, guards, interceptors
  • TypeScript-first: Native TypeScript support with decorators
  • Modular architecture: Easy to organize code by feature
  • Excellent documentation: Well-documented with active community
  • Testing support: Built-in testing utilities with Jest
  • OpenAPI support: First-class Swagger/OpenAPI integration via @nestjs/swagger

Consequences

  • ✅ Clean, maintainable code structure
  • ✅ Easy to add new features as modules
  • ✅ Built-in validation with class-validator
  • ✅ Excellent integration with Prisma
  • ⚠️ Verbose compared to Express.js
  • ⚠️ Decorator-heavy syntax

Alternatives Considered

Alternative Reason for Rejection
Express.js Too low-level, lacks structure
Fastify Less ecosystem support
Hono Too new, fewer enterprise features

ADR-003: React 19 with Vite for Frontend

Attribute Value
ID ADR-003
Status Accepted
Date 2026-01-09
Context Need a modern frontend framework for the admin dashboard

Decision

Use React 19 with Vite as the bundler and Tailwind CSS for styling.

Rationale

  • React 19: Latest version with improved performance and new features
  • Vite: Extremely fast development server and build times
  • Tailwind CSS: Utility-first CSS for rapid UI development
  • TanStack Query: Excellent server state management
  • React Router: Standard routing solution

Consequences

  • ✅ Fast development experience with HMR
  • ✅ Modern React features (Server Components ready)
  • ✅ Consistent styling with Tailwind
  • ⚠️ Tailwind learning curve for traditional CSS developers

Alternatives Considered

Alternative Reason for Rejection
Next.js Overkill for admin dashboard, SSR not needed
Angular Less flexibility, steeper learning curve
Vue.js Team expertise in React

ADR-004: PostgreSQL with Prisma ORM

Attribute Value
ID ADR-004
Status Accepted
Date 2026-01-09
Context Need a reliable database with type-safe access

Decision

Use PostgreSQL 16 as the database with Prisma 5 as the ORM.

Rationale

  • PostgreSQL: Robust, ACID-compliant, excellent JSON support
  • Prisma: Type-safe database access, auto-generated client
  • Schema-first: Prisma schema as single source of truth
  • Migrations: Built-in migration system
  • Studio: Visual database browser for development

Consequences

  • ✅ Full type safety from database to API
  • ✅ Easy schema changes with migrations
  • ✅ No raw SQL in application code
  • ⚠️ Prisma Client must be regenerated after schema changes
  • ⚠️ Some complex queries require raw SQL

Schema Design Decisions

  • UUIDs for primary keys (portability, no sequence conflicts)
  • JSON columns for flexible data (shipping address, print profiles)
  • Decimal type for monetary values (precision)
  • Timestamps with timezone (audit trail)
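
The Decimal choice guards against binary floating-point drift in monetary values. A standalone TypeScript illustration (not project code) of why Float is unsuitable for money:

```typescript
// Binary floats cannot represent most decimal fractions exactly, which is
// why monetary columns use Prisma's Decimal type (PostgreSQL numeric)
// rather than Float.
const floatTotal = 0.1 + 0.2;
console.log(floatTotal === 0.3); // false — floats drift

// A common safe pattern in application code: integer cents.
const cents = (euros: number): number => Math.round(euros * 100);
const total = cents(0.1) + cents(0.2); // 30 — exact
console.log(total === cents(0.3)); // true
```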

ADR-005: TypeScript Strict Mode

Attribute Value
ID ADR-005
Status Accepted
Date 2026-01-09
Context Need to ensure code quality and catch errors early

Decision

Enable TypeScript strict mode with additional strict checks:

{
  "strict": true,
  "noImplicitAny": true,
  "strictNullChecks": true,
  "noUnusedLocals": true,
  "noUnusedParameters": true
}

Rationale

  • Early error detection: Catch type errors at compile time
  • Self-documenting code: Types serve as documentation
  • Refactoring safety: IDE can safely refactor with full type information
  • No any type: Prevents type escape hatches

Consequences

  • ✅ Higher code quality
  • ✅ Better IDE support and autocomplete
  • ✅ Safer refactoring
  • ⚠️ More verbose code with explicit types
  • ⚠️ Stricter null checking requires careful handling
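
A small illustration (hypothetical names) of the class of bug strictNullChecks surfaces at compile time:

```typescript
interface Order {
  trackingNumber?: string; // optional: not every order has shipped yet
}

function trackingUpper(order: Order): string {
  // With strictNullChecks, calling order.trackingNumber.toUpperCase()
  // directly is a compile error: "Object is possibly 'undefined'".
  // The compiler forces an explicit fallback:
  return order.trackingNumber?.toUpperCase() ?? 'NOT SHIPPED';
}

console.log(trackingUpper({})); // "NOT SHIPPED"
console.log(trackingUpper({ trackingNumber: 'abc123' })); // "ABC123"
```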

ADR-006: Azure DevOps for CI/CD with Digital Ocean Hosting

Attribute Value
ID ADR-006
Status Accepted
Date 2026-01-09
Context Need a CI/CD pipeline for automated testing and deployment

Decision

Use Azure DevOps Pipelines for CI/CD and Digital Ocean for hosting.

Rationale

  • Azure DevOps: Existing team expertise with YAML pipelines
  • Digital Ocean: Cost-effective, simple infrastructure for small-scale deployment
  • Separation of concerns: CI/CD tooling separate from hosting
  • Docker-based: Consistent container deployment across environments
  • Managed Database: Digital Ocean managed PostgreSQL for reliability

Infrastructure

Component Service Purpose
CI/CD Azure DevOps Pipelines Build, test, deploy automation
Container Registry Digital Ocean Registry Docker image storage
Staging Digital Ocean Droplet Staging environment
Production Digital Ocean Droplet Production environment
Database Digital Ocean Managed PostgreSQL Data persistence

Pipeline Stages

  1. Validate: Lint and type check
  2. Test: Unit tests with coverage, E2E tests
  3. Build: Build all affected projects, push to DO Registry
  4. Deploy Staging: Auto-deploy on develop branch
  5. Deploy Production: Manual approval for main branch

Consequences

  • ✅ Automated quality gates
  • ✅ Fast feedback on PRs
  • ✅ Consistent deployments
  • ✅ Cost-effective hosting
  • ⚠️ Need to manage Docker deployments on Droplets

ADR-007: Layered Architecture with Repository Pattern

Attribute Value
ID ADR-007
Status Accepted
Date 2026-01-09
Context Need a clean separation of concerns in the backend

Decision

Implement a layered architecture with the Repository Pattern:

(UML diagram omitted from this export)

Layer Responsibilities

Layer Responsibility Example
Controller HTTP handling, validation, routing OrdersController
Service Business logic, orchestration OrdersService
Repository Data access, Prisma queries OrdersRepository
DTO Data transfer, validation CreateOrderDto

Rationale

  • Testability: Each layer can be tested in isolation
  • Single responsibility: Clear separation of concerns
  • Flexibility: Easy to swap implementations (e.g., different databases)
  • Maintainability: Changes in one layer don't affect others

Consequences

  • ✅ Clean, maintainable code
  • ✅ Easy to unit test with mocks
  • ✅ Prisma isolated to repository layer
  • ⚠️ More files per feature
  • ⚠️ Some boilerplate code
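
A minimal sketch of the three layers in plain TypeScript (NestJS decorators such as @Injectable/@Controller and the Prisma client omitted; all names hypothetical):

```typescript
// Repository: the only layer that would touch Prisma. An in-memory Map
// stands in for the database here.
class OrdersRepository {
  private readonly orders = new Map<string, { id: string; status: string }>();

  create(id: string) {
    const order = { id, status: 'PENDING' };
    this.orders.set(id, order);
    return order;
  }

  findById(id: string) {
    return this.orders.get(id) ?? null;
  }
}

// Service: business logic only — no HTTP, no persistence details.
class OrdersService {
  constructor(private readonly repo: OrdersRepository) {}

  createOrder(id: string) {
    const existing = this.repo.findById(id);
    if (existing) return existing; // idempotent create
    return this.repo.create(id);
  }
}

// Controller: HTTP concerns only, delegates to the service.
class OrdersController {
  constructor(private readonly service: OrdersService) {}

  post(body: { id: string }) {
    return this.service.createOrder(body.id);
  }
}

const controller = new OrdersController(new OrdersService(new OrdersRepository()));
console.log(controller.post({ id: 'ord-1' }).status); // "PENDING"
```

Because each layer depends only on the one below it, the service can be unit-tested with a mocked repository and the repository swapped without touching the controller.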

ADR-008: Event-Driven Internal Communication

Attribute Value
ID ADR-008
Status ✅ Implemented
Date 2026-01-09 (Updated: 2026-01-13)
Context Need to decouple components and trigger actions on state changes

Decision

Use NestJS EventEmitter for internal event-driven communication.

Events Defined

Event Trigger Listeners
order.created New order from Shopify webhook OrchestrationService
order.status_changed Order status update EventLogService
order.cancelled Order cancellation PrintJobsService (cancels pending jobs)
order.ready-for-fulfillment All print jobs completed Future: FulfillmentService (Phase 3)
order.fulfilled Order shipped EventLogService
order.failed Order processing failed EventLogService
simplyprint.job-status-changed SimplyPrint webhook/poll update PrintJobsService
printjob.created Print job created in SimplyPrint EventLogService
printjob.status_changed Print job status update OrchestrationService
printjob.completed Print job finished successfully OrchestrationService (checks order completion)
printjob.failed Print job failed OrchestrationService (checks order failure)
printjob.cancelled Print job cancelled EventLogService
printjob.retry_requested Print job retry initiated EventLogService

Event Flow Diagram

Shopify Webhook → OrdersService → order.created
                                       ↓
                              OrchestrationService
                                       ↓
                              PrintJobsService → printjob.created
                                       ↓
                              SimplyPrint API

SimplyPrint Webhook → SimplyPrintService → simplyprint.job-status-changed
                                                  ↓
                                          PrintJobsService → printjob.status_changed
                                                  ↓                    ↓
                                          printjob.completed    printjob.failed
                                                  ↓                    ↓
                                          OrchestrationService ←───────┘
                                                  ↓
                                          order.ready-for-fulfillment (Phase 3)

Rationale

  • Decoupling: Services don't directly depend on each other
  • Extensibility: Easy to add new listeners
  • Async processing: Events can be processed asynchronously
  • Audit trail: Events naturally support logging
  • Orchestration: Clean separation between job creation and completion tracking
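
The app wires this with @nestjs/event-emitter (@OnEvent decorators); the same pattern, sketched with Node's built-in EventEmitter and hypothetical payloads:

```typescript
import { EventEmitter } from 'node:events';

const events = new EventEmitter();
const log: string[] = [];

// EventLogService role: audit-trail listener (per the events table).
events.on('order.created', (orderId: string) => {
  log.push(`order.created:${orderId}`);
});

// OrchestrationService role: reacts to the same event independently,
// then emits a follow-up event of its own.
events.on('order.created', (orderId: string) => {
  events.emit('printjob.created', `${orderId}-job-1`);
});

events.on('printjob.created', (jobId: string) => {
  log.push(`printjob.created:${jobId}`);
});

// The emitter never knows who is listening — that is the decoupling.
events.emit('order.created', 'ord-42');
console.log(log); // ['order.created:ord-42', 'printjob.created:ord-42-job-1']
```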

Consequences

  • ✅ Loose coupling between modules
  • ✅ Easy to add new functionality
  • ✅ Clear event flow
  • ✅ Enables reactive order completion tracking
  • ⚠️ Harder to trace execution flow (mitigated by EventLogService)
  • ⚠️ Eventual consistency considerations

ADR-009: OpenAPI/Swagger for API Documentation

Attribute Value
ID ADR-009
Status Accepted
Date 2026-01-10
Context Need interactive API documentation for developers

Decision

Use @nestjs/swagger for OpenAPI 3.0 documentation with Swagger UI.

Implementation

  • Swagger UI: Available at /api/docs
  • OpenAPI JSON: Available at /api/docs-json
  • Environment restriction: Only enabled in non-production
  • Decorator-based: All endpoints documented via decorators

Decorators Used

Decorator Purpose
@ApiTags Group endpoints by feature
@ApiOperation Describe endpoint purpose
@ApiResponse Document response schemas
@ApiProperty Document DTO properties
@ApiParam Document path parameters
@ApiQuery Document query parameters

Consequences

  • ✅ Interactive API testing
  • ✅ Auto-generated documentation
  • ✅ Type-safe documentation
  • ⚠️ Must keep decorators in sync with code

ADR-010: HMAC Verification for Webhooks

Attribute Value
ID ADR-010
Status Accepted
Date 2026-01-09
Context Need to verify webhook requests are genuinely from Shopify

Decision

Implement HMAC-SHA256 signature verification for all Shopify webhooks.

Implementation

// ShopifyWebhookGuard
const hash = crypto.createHmac('sha256', webhookSecret).update(rawBody, 'utf8').digest('base64');
const hashBuffer = Buffer.from(hash);
const headerBuffer = Buffer.from(hmacHeader);

// timingSafeEqual throws if the buffers differ in length, so compare lengths first
return hashBuffer.length === headerBuffer.length && crypto.timingSafeEqual(hashBuffer, headerBuffer);

Rationale

  • Security: Prevents forged webhook requests
  • Shopify standard: Required by Shopify webhook specification
  • Timing-safe comparison: Prevents timing attacks
  • Raw body access: NestJS configured to preserve raw body

Consequences

  • ✅ Secure webhook endpoint
  • ✅ Compliant with Shopify requirements
  • ⚠️ Requires raw body access (special NestJS configuration)

ADR-011: Idempotent Webhook Processing

Attribute Value
ID ADR-011
Status Accepted
Date 2026-01-09
Context Shopify may send duplicate webhooks; need to handle gracefully

Decision

Implement idempotent webhook processing using:

  1. Webhook ID tracking (in-memory Set)
  2. Database unique constraints (shopifyOrderId)

Implementation

// ShopifyService
private readonly processedWebhooks = new Set<string>();

if (this.processedWebhooks.has(webhookId)) {
  return; // Skip duplicate
}
this.processedWebhooks.add(webhookId);

// OrdersService
const existing = await this.ordersRepository.findByShopifyOrderId(id);
if (existing) {
  return existing; // Return existing, don't create duplicate
}

Consequences

  • ✅ No duplicate orders created
  • ✅ Safe to retry failed webhooks
  • ⚠️ In-memory Set resets on restart (database constraint is primary guard)

ADR-012: Assembly Parts Model for Product Mapping

Attribute Value
ID ADR-012
Status Accepted
Date 2026-01-09
Context A single Shopify product may require multiple 3D printed parts

Decision

Implement ProductMapping → AssemblyPart one-to-many relationship.

Data Model

(UML diagram omitted from this export)

Fields

  • ProductMapping: SKU, shopifyProductId, defaultPrintProfile
  • AssemblyPart: partName, partNumber, simplyPrintFileId, quantityPerProduct

Rationale

  • Flexibility: Support both single-part and multi-part products
  • Quantity support: quantityPerProduct for parts needed multiple times (e.g., 4 wheels)
  • Print profiles: Override default profile per part if needed
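
How the one-to-many model expands into print quantities can be sketched with hypothetical types mirroring the fields above:

```typescript
interface AssemblyPart {
  partName: string;
  simplyPrintFileId: string;
  quantityPerProduct: number;
}

interface ProductMapping {
  sku: string;
  parts: AssemblyPart[];
}

// One order line of `quantity` products needs
// quantity × quantityPerProduct prints of each part.
function expandLineItem(mapping: ProductMapping, quantity: number) {
  return mapping.parts.map((part) => ({
    fileId: part.simplyPrintFileId,
    copies: quantity * part.quantityPerProduct,
  }));
}

const toyCar: ProductMapping = {
  sku: 'TOY-CAR',
  parts: [
    { partName: 'body', simplyPrintFileId: 'f-body', quantityPerProduct: 1 },
    { partName: 'wheel', simplyPrintFileId: 'f-wheel', quantityPerProduct: 4 },
  ],
};

console.log(expandLineItem(toyCar, 2));
// [ { fileId: 'f-body', copies: 2 }, { fileId: 'f-wheel', copies: 8 } ]
```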

Consequences

  • ✅ Supports complex assemblies
  • ✅ Clear part ordering via partNumber
  • ✅ Flexible print settings per part
  • ⚠️ More complex order processing logic

ADR-013: Shared Domain Library

Attribute Value
ID ADR-013
Status Accepted
Date 2026-01-09
Context Need to share types between frontend, backend, and external integrations

Decision

Create a shared @forma3d/domain library containing:

  • Entity types
  • Enums
  • Shopify types
  • Common interfaces

Structure

libs/domain/src/
├── entities/
│   ├── order.ts
│   ├── line-item.ts
│   ├── print-job.ts
│   └── product-mapping.ts
├── enums/
│   ├── order-status.ts
│   ├── line-item-status.ts
│   └── print-job-status.ts
├── shopify/
│   ├── shopify-order.entity.ts
│   └── shopify-product.entity.ts
└── index.ts

Rationale

  • Single source of truth: Types defined once, used everywhere
  • Type safety: Frontend and backend share exact same types
  • Nx integration: Clean imports via path aliases
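
As an illustration of what the library exports (a sketch — actual enum members may differ), both apps import the identical definition via the path alias:

```typescript
// Sketch of libs/domain/src/enums/order-status.ts; in app code this
// would be imported as: import { OrderStatus } from '@forma3d/domain';
enum OrderStatus {
  PENDING = 'PENDING',
  PRINTING = 'PRINTING',
  READY_FOR_FULFILLMENT = 'READY_FOR_FULFILLMENT',
  FULFILLED = 'FULFILLED',
}

// Shared helpers keep status logic identical on frontend and backend.
function isTerminal(status: OrderStatus): boolean {
  return status === OrderStatus.FULFILLED;
}

console.log(isTerminal(OrderStatus.PRINTING)); // false
console.log(isTerminal(OrderStatus.FULFILLED)); // true
```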

Consequences

  • ✅ Consistent types across codebase
  • ✅ No type drift between frontend/backend
  • ✅ Easy to update types in one place
  • ⚠️ Must rebuild library on changes

ADR-014: SimplyPrint as Unified Print Farm Controller

Attribute Value
ID ADR-014
Status ✅ Implemented (Phase 2)
Date 2026-01-10 (Updated: 2026-01-13)
Context Need to control multiple 3D printer brands (Prusa, Bambu Lab) from one API

Decision

Use SimplyPrint as the unified print farm management solution with an edge device connecting to all printers via LAN.

Architecture

(UML diagram omitted from this export)

Rationale

  • Unified API: Single integration point for all printer brands
  • LAN mode: Direct communication with printers, no cloud dependency for print control
  • Edge device: Handles printer communication, buffering, and monitoring
  • Multi-brand support: Prusa and Bambu Lab printers managed together
  • No Bambu Cloud dependency: Avoids Bambu Lab Cloud API limitations

Printer Support

Brand Models Connection
Prusa MK3S+, XL, Mini LAN via SimplyPrint edge device
Bambu Lab X1 Carbon, P1S LAN via SimplyPrint edge device

Implementation Details (Phase 2)

API Client (apps/api/src/simplyprint/simplyprint-api.client.ts):

  • HTTP Basic Authentication with Company ID and API Key
  • Typed methods for files, jobs, printers, and queue operations
  • Automatic connection verification on startup
  • Sentry integration for 5xx error tracking

Webhook Controller (apps/api/src/simplyprint/simplyprint-webhook.controller.ts):

  • Endpoint: POST /webhooks/simplyprint
  • X-SP-Token verification via guard
  • Event-driven status updates

Print Jobs Service (apps/api/src/print-jobs/print-jobs.service.ts):

  • Creates print jobs in SimplyPrint when orders arrive
  • Updates local status based on SimplyPrint events
  • Supports cancel and retry operations

API Endpoints Used:

Endpoint Method Purpose
/{companyId}/files/List GET List available print files
/{companyId}/printers/Get GET Get printer statuses
/{companyId}/printers/actions/CreateJob POST Create new print job
/{companyId}/printers/actions/Cancel POST Cancel active job
/{companyId}/queue/GetItems GET Get queue items
/{companyId}/queue/AddItem POST Add item to queue
/{companyId}/queue/RemoveItem POST Remove from queue
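
The Basic auth scheme the client uses can be sketched as follows (helper name and the URL in the comment are illustrative, not the actual client code in simplyprint-api.client.ts):

```typescript
// HTTP Basic auth: base64("<companyId>:<apiKey>") in the Authorization header.
function buildAuthHeader(companyId: string, apiKey: string): string {
  const credentials = Buffer.from(`${companyId}:${apiKey}`).toString('base64');
  return `Basic ${credentials}`;
}

// A typed call would then look like (not executed here):
// fetch(`${baseUrl}/${companyId}/printers/Get`, {
//   headers: { Authorization: buildAuthHeader(companyId, apiKey) },
// });

console.log(buildAuthHeader('1234', 'secret')); // "Basic MTIzNDpzZWNyZXQ="
```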

Consequences

  • ✅ Single API for all printers
  • ✅ No dependency on Bambu Lab Cloud
  • ✅ Local network resilience
  • ✅ Real-time printer status via edge device
  • ✅ Typed API client with full error handling
  • ✅ Webhook and polling support for status updates
  • ⚠️ Requires edge device on print farm network
  • ⚠️ SimplyPrint subscription required

ADR-015: Aikido Security Platform for Continuous Security Monitoring

Attribute Value
ID ADR-015
Status Accepted
Date 2026-01-10
Context Need continuous security monitoring, vulnerability scanning, and SBOM generation

Decision

Use Aikido Security Platform as the centralized security monitoring and compliance solution integrated into the CI/CD pipeline.

Security Checks Implemented

Check Status Description
Open Source Dependency Monitoring Active Monitors 3rd party dependencies for vulnerabilities
Exposed Secrets Monitoring Compliant Detects accidentally exposed secrets in source code
License Management Compliant Validates dependency licenses for legal compliance
SAST Compliant Static Application Security Testing
IaC Testing Compliant Infrastructure as Code security analysis
Malware Detection Compliant Detects malware in dependencies
Mobile Issues Compliant Mobile manifest file monitoring
SBOM Generation Active Software Bill of Materials for supply chain security

Rationale

  • Comprehensive coverage: Single platform covers multiple security domains
  • CI/CD integration: Automated scanning on every code change
  • SBOM generation: Critical for supply chain security and compliance
  • License compliance: Automated license validation prevents legal issues
  • Developer-friendly: Clear dashboards and actionable remediation guidance
  • Proactive detection: Continuous monitoring catches issues before production

Future Enhancements

  • Code Quality Analysis: Will be enabled in a subsequent phase to complement security scanning

Consequences

  • ✅ Continuous security visibility across the codebase
  • ✅ Automated vulnerability detection in dependencies
  • ✅ SBOM generation for supply chain transparency
  • ✅ License compliance validation
  • ✅ Secrets exposure prevention
  • ⚠️ Requires Aikido platform subscription
  • ⚠️ May flag false positives requiring triage

Alternatives Considered

Alternative Reason for Rejection
Snyk More expensive, less comprehensive for our needs
GitHub Advanced Security Limited to GitHub, not as comprehensive
Manual audits Not scalable, too slow for continuous delivery
Dependabot only Only covers dependency vulnerabilities, not comprehensive

ADR-016: Sentry Observability with OpenTelemetry

Attribute Value
ID ADR-016
Status ✅ Implemented
Date 2026-01-10
Context Need comprehensive observability: error tracking, performance monitoring, distributed tracing

Decision

Use Sentry as the observability platform with an OpenTelemetry-first architecture for vendor neutrality.

Architecture

(UML diagram omitted from this export)

Implementation Details

Backend (NestJS):

  • @sentry/nestjs for error tracking and performance
  • @sentry/profiling-node for profiling
  • nestjs-pino for structured JSON logging
  • OpenTelemetry auto-instrumentation for Prisma queries
  • Global exception filter with Sentry capture
  • Logging interceptor with correlation IDs

Frontend (React):

  • @sentry/react for error tracking
  • Custom ErrorBoundary component with Sentry integration
  • Browser tracing for page navigation
  • User-friendly error fallback UI

Sampling Configuration (Free Tier Compatible)

Environment Traces Profiles Errors
Development 100% 100% 100%
Production 10% 10% 100%

Rationale

  • Sentry: Industry-leading error tracking with excellent stack trace support
  • OpenTelemetry: Vendor-neutral instrumentation standard, future-proof
  • Structured Logging: JSON logs enable log aggregation and searching
  • Correlation IDs: End-to-end request tracing across frontend and backend
  • Free Tier: Sufficient for small-scale production (10K errors/month)

Data Privacy

Sensitive data is automatically scrubbed:

  • Authorization headers
  • Cookies
  • API tokens
  • Passwords
  • Shopify access tokens
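
Scrubbing of this kind typically runs in a beforeSend hook. A minimal sketch (header list drawn from the bullets above; not the actual Sentry config):

```typescript
const SENSITIVE_HEADERS = ['authorization', 'cookie', 'x-shopify-access-token'];

// beforeSend-style scrubber: replaces sensitive request headers before
// the event leaves the process.
function scrubEvent<T extends { request?: { headers?: Record<string, string> } }>(
  event: T,
): T {
  const headers = event.request?.headers;
  if (headers) {
    for (const name of Object.keys(headers)) {
      if (SENSITIVE_HEADERS.includes(name.toLowerCase())) {
        headers[name] = '[Filtered]';
      }
    }
  }
  return event;
}

const scrubbed = scrubEvent({
  request: { headers: { Authorization: 'Bearer abc', Accept: 'application/json' } },
});
console.log(scrubbed.request.headers);
// { Authorization: '[Filtered]', Accept: 'application/json' }
```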

Implementation Details (Phase 1b)

Backend (apps/api):

  • instrument.ts - Sentry initialization with profiling (imported first in main.ts)
  • ObservabilityModule - Global module with Pino logger and Sentry integration
  • SentryExceptionFilter - Captures all exceptions with request context
  • LoggingInterceptor - Request/response logging with correlation IDs
  • ObservabilityController - Test endpoints for verifying observability (non-prod only)
  • Prisma service enhanced with Sentry breadcrumbs for query tracing

Frontend (apps/web):

  • sentry.ts - Sentry initialization with browser tracing and session replay
  • ErrorBoundary.tsx - React error boundary with Sentry integration

Shared Library (libs/observability):

  • sentry.config.ts - Shared Sentry configuration with 100% sampling
  • otel.config.ts - OpenTelemetry configuration
  • constants.ts - Trace/request ID header constants

Sampling Decision:

  • 100% sampling for all environments (traces and profiles)
  • Rationale: Full visibility needed during early development
  • Can be reduced when traffic increases and limits are reached

Consequences

  • ✅ Comprehensive error visibility with stack traces and context
  • ✅ Performance monitoring for API endpoints and database queries
  • ✅ Distributed tracing across frontend and backend
  • ✅ Structured logs with correlation IDs for debugging
  • ✅ Vendor-neutral instrumentation via OpenTelemetry
  • ✅ Test endpoints for verifying observability in development
  • ⚠️ Requires Sentry account (free tier available)
  • ⚠️ Must initialize Sentry before other imports in main.ts
  • ⚠️ 100% sampling may hit free tier limits with high traffic

Alternatives Considered

Alternative Reason for Rejection
Datadog Expensive for small-scale, overkill for current needs
New Relic Expensive, complex pricing model
Grafana + Loki Requires self-hosting, more operational overhead
ELK Stack Complex to set up and maintain, expensive at scale
Console.log only No centralized visibility, hard to debug production issues

ADR-017: Docker + Traefik Deployment Strategy

Attribute Value
ID ADR-017
Status ⏳ In Progress
Date 2026-01-10
Context Need a deployment strategy for staging/production on DigitalOcean with TLS and zero-downtime

Decision

Use Docker Compose with Traefik reverse proxy for deploying to DigitalOcean Droplets.

Architecture

(UML diagram omitted from this export)

Deployment Components

Component Technology Purpose
Reverse Proxy Traefik v3 TLS termination, routing, load balancing
TLS Certificates Let's Encrypt Automatic certificate issuance/renewal
Container Orchestration Docker Compose Service definition and networking
Image Registry DigitalOcean Registry Private Docker image storage
Database DO Managed PostgreSQL Persistent data storage with TLS

Traefik Configuration

Feature Implementation
Entry Points HTTP (:80) with redirect to HTTPS (:443)
Certificate Resolver Let's Encrypt with HTTP challenge
Service Discovery Docker labels on containers
Health Checks HTTP health endpoints (/health, /health/live, /health/ready)
Logging JSON format for log aggregation

Staging URLs

Service URL
API https://staging-connect-api.forma3d.be
Web https://staging-connect.forma3d.be

Pipeline Integration

Stage Trigger Action
Package develop branch Build Docker images, push to DO Registry
Deploy Staging develop branch SSH + docker compose up
Deploy Production main branch Manual approval + SSH deploy

Image Tagging Strategy

Tag Format Example Purpose
Pipeline Instance 20260110143709 Immutable deployment reference
Latest latest Convenience for development

Database Migration Strategy

Prisma migrations run before container deployment:

# Executed in pipeline before docker compose up
docker compose run --rm api npx prisma migrate deploy

Rationale

  • Traefik: Automatic TLS, Docker-native, label-based configuration
  • Docker Compose: Simple, declarative, easy to understand
  • SSH deployment: Direct control, no additional orchestration overhead
  • Managed PostgreSQL: Reliability, automated backups, TLS built-in
  • Let's Encrypt: Free, automated TLS certificates

Zero-Downtime Deployment

# Pull new images
docker compose pull

# Run migrations (idempotent)
docker compose run --rm api npx prisma migrate deploy

# Start new containers (Compose handles replacement)
docker compose up -d --remove-orphans

# Clean up old images
docker image prune -f

Consequences

  • ✅ Automatic TLS certificate management
  • ✅ Simple deployment via SSH + Docker Compose
  • ✅ Zero-downtime container replacement
  • ✅ Docker labels for routing configuration
  • ✅ Consistent image tagging with pipeline ID
  • ⚠️ Single droplet = single point of failure (acceptable for staging)
  • ⚠️ Requires manual SSH key management in Azure DevOps

Alternatives Considered

Alternative Reason for Rejection
Kubernetes Overkill for current scale, operational complexity
Docker Swarm Less ecosystem support, not needed for single-node
Nginx Manual certificate management, less dynamic
Caddy Less mature Docker integration than Traefik
DigitalOcean App Platform Less control, higher cost

ADR-018: Nx Affected Conditional Deployment Strategy

Attribute Value
ID ADR-018
Status ✅ Implemented
Date 2026-01-11
Context Need to avoid unnecessary Docker builds and deployments when only part of the codebase changes

Decision

Use Nx affected to detect which applications have changed and conditionally run package/deploy stages only for affected apps.

Architecture

(UML diagram omitted from this export)

Pipeline Parameters

Parameter Type Default Purpose
ForceFullVersioningAndDeployment boolean true Bypass affected detection, deploy all apps
breakingMigration boolean false Stop API before migrations

How Affected Detection Works

The pipeline runs pnpm nx show projects --affected --type=app to identify which applications have changed compared to the base branch (origin/main).

Scenarios:

Change Location API Affected Web Affected Reason
apps/api/** ✅ ❌ Only API code changed
apps/web/** ❌ ✅ Only Web code changed
libs/domain/** ✅ ✅ Shared library affects both apps
libs/api-client/** ❌ ✅ API client only used by Web
prisma/** ✅ ❌ Database schema affects API
docs/**, *.md ❌ ❌ Docs are published as a separate static site (Zensical)

Migration Safety

The deployment follows a specific order to ensure database safety:

  1. Pull new images (uses latest code with new Prisma schema)
  2. Stop API (only if breakingMigration=true)
  3. Run migrations (using new image via docker compose run --rm)
  4. Start API (after migrations complete)

Migration Types:

Migration Type Safe During Old API? Recommended Action
Add nullable column ✅ Safe Normal deployment
Add column with default ✅ Safe Normal deployment
Add new table ✅ Safe Normal deployment
Drop column ❌ Dangerous Use breakingMigration=true
Rename column ❌ Dangerous Use breakingMigration=true
Add non-nullable column ❌ Dangerous Use breakingMigration=true

Rationale

  • Efficiency: Avoid building/pushing Docker images when code hasn't changed
  • Cost reduction: Fewer container registry pushes, less storage used
  • Faster deployments: Only affected services are restarted
  • Cleaner versioning: New version tags only when actual code changes
  • Nx integration: Leverages existing monorepo tooling for dependency detection

Consequences

  • ✅ Significantly faster CI/CD for partial changes
  • ✅ Reduced container registry costs
  • ✅ Cleaner deployment history (versions reflect actual changes)
  • ✅ Safe migration order (migrations before restart)
  • ✅ Support for breaking migrations with explicit parameter
  • ✅ Override available via ForceFullVersioningAndDeployment parameter
  • ⚠️ First pipeline run on new branch may show all apps affected
  • ⚠️ Shared library changes trigger both app deployments (by design)
  • ⚠️ breakingMigration requires manual assessment of migration type

Alternatives Considered

Alternative Reason for Rejection
Always build both apps Wasteful, slow, unnecessary version proliferation
Manual selection of apps Error-prone, requires human decision each time
Git diff on Dockerfiles only Misses shared library changes
Separate pipelines per app Loses monorepo benefits, harder to maintain

ADR-019: SimplyPrint Webhook Verification

Attribute Value
ID ADR-019
Status ✅ Implemented
Date 2026-01-13
Context Need to verify webhook requests are genuinely from SimplyPrint

Decision

Implement X-SP-Token header verification with timing-safe comparison for all SimplyPrint webhooks.

Implementation

// SimplyPrintWebhookGuard
import {
  CanActivate,
  ExecutionContext,
  Injectable,
  Logger,
  UnauthorizedException,
} from '@nestjs/common';
import * as crypto from 'crypto';

@Injectable()
export class SimplyPrintWebhookGuard implements CanActivate {
  private readonly logger = new Logger(SimplyPrintWebhookGuard.name);
  // Loaded from SIMPLYPRINT_WEBHOOK_SECRET (shown via process.env for brevity)
  private readonly webhookSecret = process.env.SIMPLYPRINT_WEBHOOK_SECRET;

  canActivate(context: ExecutionContext): boolean {
    const request = context.switchToHttp().getRequest();
    const token = request.headers['x-sp-token'];

    if (!this.webhookSecret) {
      this.logger.warn('SimplyPrint webhook secret not configured, skipping verification');
      return true;
    }

    if (!token) {
      throw new UnauthorizedException('Missing X-SP-Token header');
    }

    // Timing-safe comparison to prevent timing attacks;
    // timingSafeEqual requires equal-length buffers, so check length first
    const tokenBuffer = Buffer.from(token);
    const secretBuffer = Buffer.from(this.webhookSecret);

    if (
      tokenBuffer.length !== secretBuffer.length ||
      !crypto.timingSafeEqual(tokenBuffer, secretBuffer)
    ) {
      throw new UnauthorizedException('Invalid SimplyPrint webhook signature');
    }

    return true;
  }
}

Rationale

  • Security: Prevents forged webhook requests
  • SimplyPrint standard: Uses the X-SP-Token header as per SimplyPrint documentation
  • Timing-safe comparison: Prevents timing attacks on secret comparison
  • Graceful degradation: Verification is skipped in development when the secret is not configured

Webhook Endpoint

Endpoint Method Purpose
/webhooks/simplyprint POST Receive SimplyPrint events

Supported Events

Event Action
job.started Update job status to PRINTING
job.done Update job status to COMPLETED
job.failed Update job status to FAILED
job.cancelled Update job status to CANCELLED
job.paused Keep as PRINTING (temporary state)
job.resumed Keep as PRINTING
printer.* Ignored (no job status change)
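The table above maps directly to a small pure function. A sketch of this mapping follows; the service's actual mapWebhookEventToStatus (referenced in ADR-020's webhook handler) may differ in details:

```typescript
// Sketch of the event-to-status mapping from the table above.
// PrintJobStatus values are assumed from context; paused/resumed keep PRINTING.
type PrintJobStatus = 'PRINTING' | 'COMPLETED' | 'FAILED' | 'CANCELLED';

function mapWebhookEventToStatus(event: string): PrintJobStatus | null {
  switch (event) {
    case 'job.started':
    case 'job.paused':   // temporary state, job remains PRINTING
    case 'job.resumed':
      return 'PRINTING';
    case 'job.done':
      return 'COMPLETED';
    case 'job.failed':
      return 'FAILED';
    case 'job.cancelled':
      return 'CANCELLED';
    default:
      return null; // printer.* and unknown events: no job status change
  }
}

console.log(mapWebhookEventToStatus('job.done'));       // COMPLETED
console.log(mapWebhookEventToStatus('printer.online')); // null
```

Returning null for unmapped events lets the handler simply ignore printer.* notifications, as the table specifies.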

Consequences

  • ✅ Secure webhook endpoint
  • ✅ Protection against timing attacks
  • ✅ Clear event-to-status mapping
  • ✅ Development-friendly (optional verification)
  • ⚠️ Requires SIMPLYPRINT_WEBHOOK_SECRET environment variable

ADR-020: Hybrid Status Monitoring (Polling + Webhooks)

Attribute Value
ID ADR-020
Status ✅ Implemented
Date 2026-01-13
Context Need reliable print job status updates even if webhooks fail or are delayed

Decision

Implement a hybrid approach using both SimplyPrint webhooks (primary) and periodic polling (fallback) for job status monitoring.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                     Status Update Sources                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│   SimplyPrint Cloud                                              │
│         │                                                        │
│         ├─── Webhooks (Primary, Real-time) ───┐                  │
│         │    • Immediate notification         │                  │
│         │    • Event: job.started/done/failed │                  │
│         │                                     ▼                  │
│         │                            SimplyPrintService          │
│         │                                     │                  │
│         └─── Polling (Fallback, 30s) ────────►│                  │
│              • @Cron every 30 seconds         │                  │
│              • Checks queue and printers      │                  │
│              • Catches missed webhooks        ▼                  │
│                                    simplyprint.job-status-changed│
│                                               │                  │
│                                               ▼                  │
│                                       PrintJobsService           │
│                                               │                  │
│                                               ▼                  │
│                                        Database Update           │
└─────────────────────────────────────────────────────────────────┘

Implementation

Webhook Handler (Primary):

async handleWebhook(payload: SimplyPrintWebhookPayload): Promise<void> {
  const jobData = payload.data.job;
  if (!jobData) return;

  const newStatus = this.mapWebhookEventToStatus(payload.event);
  if (!newStatus) return;

  this.eventEmitter.emit(SIMPLYPRINT_EVENTS.JOB_STATUS_CHANGED, {
    simplyPrintJobId: jobData.uid,
    newStatus,
    printerId: payload.data.printer?.id,
    printerName: payload.data.printer?.name,
    timestamp: new Date(payload.timestamp * 1000),
  });
}

Polling Fallback:

@Cron(CronExpression.EVERY_30_SECONDS)
async pollJobStatuses(): Promise<void> {
  if (!this.pollingEnabled || this.isPolling) return;

  this.isPolling = true;
  try {
    const printers = await this.simplyPrintClient.getPrinters();

    for (const printer of printers) {
      if (printer.currentJobId && printer.status === 'printing') {
        this.eventEmitter.emit(SIMPLYPRINT_EVENTS.JOB_STATUS_CHANGED, {
          simplyPrintJobId: printer.currentJobId,
          newStatus: PrintJobStatus.PRINTING,
          printerId: printer.id,
          printerName: printer.name,
          timestamp: new Date(),
        });
      }
    }
  } finally {
    this.isPolling = false;
  }
}

Configuration

Environment Variable Default Description
SIMPLYPRINT_POLLING_ENABLED true Enable/disable polling fallback
SIMPLYPRINT_POLLING_INTERVAL_MS 30000 Polling interval in milliseconds

Rationale

  • Reliability: Webhooks can fail due to network issues, SimplyPrint outages, or configuration problems
  • Real-time updates: Webhooks provide immediate notification when status changes
  • Consistency: Polling catches any status changes that webhooks might miss
  • Idempotency: Status updates check current status before updating, preventing duplicate updates
  • Configurable: Polling can be disabled in environments where webhooks are reliable

Status Deduplication

The system handles duplicate status updates gracefully:

async updateJobStatus(simplyPrintJobId: string, newStatus: PrintJobStatus): Promise<PrintJob> {
  const printJob = await this.findBySimplyPrintJobId(simplyPrintJobId);

  // Skip if status unchanged (idempotent)
  if (printJob.status === newStatus) {
    return printJob;
  }

  // Update and emit events
  // ...
}

Consequences

  • ✅ High reliability for status updates
  • ✅ Real-time updates via webhooks
  • ✅ Catches missed webhooks via polling
  • ✅ Configurable polling interval
  • ✅ Idempotent status updates
  • ⚠️ Polling adds API calls every 30 seconds (minimal overhead)
  • ⚠️ Potential for slight delay if only relying on polling

Alternatives Considered

Alternative Reason for Rejection
Webhooks only Single point of failure, missed events cause stale status
Polling only Higher latency, unnecessary API calls when webhooks work
WebSocket connection SimplyPrint doesn't offer WebSocket API
Manual refresh button Poor UX, requires operator intervention

ADR-021: Retry Queue with Exponential Backoff

Attribute Value
ID ADR-021
Status ✅ Implemented
Date 2026-01-14
Context Need to handle transient failures in external API calls (Shopify, SimplyPrint) gracefully

Decision

Implement a database-backed retry queue with exponential backoff and jitter for all retryable operations.

Configuration

Setting Value Description
Max Retries 5 Maximum retry attempts
Initial Delay 1 second First retry delay
Max Delay 1 hour Maximum retry delay
Backoff Multiplier 2 Exponential growth factor
Jitter ±10% Randomization to prevent thundering herd
Cleanup 7 days Old completed jobs deleted

Implementation

calculateDelay(attempt: number): number {
  // Exponential growth: initialDelayMs * backoffMultiplier^(attempt - 1)
  let delay = initialDelayMs * Math.pow(backoffMultiplier, attempt - 1);
  // Cap at the configured maximum delay (1 hour)
  delay = Math.min(delay, maxDelayMs);
  // Apply ±10% jitter to spread out concurrent retries
  const jitter = delay * 0.1 * (Math.random() * 2 - 1);
  return Math.round(delay + jitter);
}
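With the defaults above (1 s initial delay, multiplier 2, 1 h cap, ±10% jitter), the nominal schedule across the five attempts works out to roughly 1 s, 2 s, 4 s, 8 s, 16 s. A self-contained sketch (the constant names are illustrative, not the service's actual identifiers):

```typescript
// Standalone sketch of the ADR-021 backoff calculation with its default settings.
const INITIAL_DELAY_MS = 1_000;     // 1 second
const MAX_DELAY_MS = 3_600_000;     // 1 hour
const BACKOFF_MULTIPLIER = 2;
const JITTER_FRACTION = 0.1;        // ±10%

function calculateDelay(attempt: number, random: () => number = Math.random): number {
  // Exponential growth capped at the maximum delay
  let delay = INITIAL_DELAY_MS * Math.pow(BACKOFF_MULTIPLIER, attempt - 1);
  delay = Math.min(delay, MAX_DELAY_MS);
  // Randomize within ±10% to avoid thundering herd
  const jitter = delay * JITTER_FRACTION * (random() * 2 - 1);
  return Math.round(delay + jitter);
}

// Nominal schedule with jitter neutralized (random() === 0.5 makes jitter 0):
const schedule = [1, 2, 3, 4, 5].map((a) => calculateDelay(a, () => 0.5));
console.log(schedule); // [1000, 2000, 4000, 8000, 16000]
```

Injecting the random source makes the schedule deterministic in tests while keeping jitter in production.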

Supported Job Types

Job Type Description
FULFILLMENT Shopify fulfillment creation
PRINT_JOB_CREATION SimplyPrint job creation
CANCELLATION Job cancellation operations
NOTIFICATION Email notification sending

Consequences

  • ✅ Automatic recovery from transient failures
  • ✅ Prevents thundering herd with jitter
  • ✅ Persistent queue survives application restarts
  • ✅ Failed jobs trigger operator alerts
  • ⚠️ Adds database table for queue persistence

ADR-022: Event-Driven Fulfillment Architecture

Attribute Value
ID ADR-022
Status ✅ Implemented
Date 2026-01-14
Context Need to automatically create Shopify fulfillments when all print jobs complete

Decision

Use NestJS Event Emitter to trigger fulfillment creation when the orchestration service determines all print jobs for an order are complete.

Event Flow

PrintJob.COMPLETED → OrchestrationService checks all jobs
                   → If all complete: emit order.ready-for-fulfillment
                   → FulfillmentService listens and creates Shopify fulfillment
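The "checks all jobs" step can be sketched as a pure predicate. This is a hypothetical helper; the OrchestrationService's real logic may account for additional states:

```typescript
// Hypothetical sketch of the completeness check performed before emitting
// order.ready-for-fulfillment. Names and statuses are illustrative.
type JobStatus = 'QUEUED' | 'PRINTING' | 'COMPLETED' | 'FAILED' | 'CANCELLED';

interface PrintJobLike {
  status: JobStatus;
}

function isOrderReadyForFulfillment(jobs: PrintJobLike[]): boolean {
  // An order with no jobs is not ready; every job must have completed successfully.
  return jobs.length > 0 && jobs.every((job) => job.status === 'COMPLETED');
}

console.log(isOrderReadyForFulfillment([{ status: 'COMPLETED' }, { status: 'COMPLETED' }])); // true
console.log(isOrderReadyForFulfillment([{ status: 'COMPLETED' }, { status: 'PRINTING' }]));  // false
```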

Key Events

Event Producer Consumer
order.ready-for-fulfillment OrchestrationService FulfillmentService
fulfillment.created FulfillmentService NotificationsService
fulfillment.failed FulfillmentService NotificationsService
order.cancelled OrdersService CancellationService

Consequences

  • ✅ Loose coupling between order management and fulfillment
  • ✅ Easy to add additional listeners (logging, analytics)
  • ✅ Failure in fulfillment doesn't block order completion
  • ⚠️ Event ordering not guaranteed (acceptable for this use case)

ADR-023: Email Notification Strategy

Attribute Value
ID ADR-023
Status ✅ Implemented
Date 2026-01-14
Context Need to alert operators when automated processes fail and require attention

Decision

Implement email notifications via SMTP using Nodemailer with Handlebars templates for operator alerts.

Notification Triggers

Trigger Severity Description
Fulfillment failed (final) ERROR Fulfillment failed after max retries
Print job failed (final) ERROR Print job failed after max retries
Cancellation needs review WARNING Order cancelled with in-progress prints
Retry exhausted ERROR Any retry job exceeded max attempts

Configuration

SMTP_HOST=smtp.example.com
SMTP_PORT=587
SMTP_USER=notifications@forma3d.be
SMTP_PASS=***
SMTP_FROM=noreply@forma3d.be
OPERATOR_EMAIL=operator@forma3d.be
NOTIFICATIONS_ENABLED=true

Consequences

  • ✅ Operators notified of issues requiring attention
  • ✅ Email templates are maintainable and customizable
  • ✅ Graceful degradation if email unavailable
  • ✅ Can be disabled in development
  • ⚠️ Requires SMTP configuration for each environment

ADR-024: API Key Authentication for Admin Endpoints

Attribute Value
ID ADR-024
Status ✅ Implemented
Date 2026-01-14
Context Admin endpoints (fulfillment, cancellation) need protection from unauthorized access

Decision

Implement API key authentication using a custom NestJS guard for all admin endpoints that modify order state.

Implementation

// ApiKeyGuard
@Injectable()
export class ApiKeyGuard implements CanActivate {
  canActivate(context: ExecutionContext): boolean {
    if (!this.isEnabled) return true; // Development mode

    const request = context.switchToHttp().getRequest();
    const providedKey = request.headers['x-api-key'];

    if (!providedKey) {
      throw new UnauthorizedException('API key required');
    }

    // Timing-safe comparison to prevent timing attacks.
    // timingSafeEqual throws if buffer lengths differ, so reject unequal lengths first.
    const providedBuffer = Buffer.from(providedKey);
    const expectedBuffer = Buffer.from(this.apiKey);

    if (
      providedBuffer.length !== expectedBuffer.length ||
      !crypto.timingSafeEqual(providedBuffer, expectedBuffer)
    ) {
      throw new UnauthorizedException('Invalid API key');
    }

    return true;
  }
}

Protected Endpoints

Endpoint Method Purpose
/api/v1/fulfillments/order/:orderId POST Create fulfillment
/api/v1/fulfillments/order/:orderId/force POST Force fulfill order
/api/v1/fulfillments/order/:orderId/status GET Get fulfillment status
/api/v1/cancellations/order/:orderId POST Cancel order
/api/v1/cancellations/print-job/:jobId POST Cancel single print job

Authentication Methods Summary

Endpoint Type Method Header Verification
Shopify Webhooks HMAC-SHA256 Signature X-Shopify-Hmac-Sha256 Timing-safe comparison
SimplyPrint Webhooks Token Verification X-SP-Token Timing-safe comparison
Admin Endpoints API Key X-API-Key Timing-safe comparison
Public Endpoints None - -

Configuration

# Generate secure API key
openssl rand -hex 32

# Environment variable
INTERNAL_API_KEY="your-secure-api-key"
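If openssl is not available, Node's built-in crypto module produces an equivalent key (a convenience sketch, not part of the codebase):

```typescript
import { randomBytes } from 'node:crypto';

// Equivalent of `openssl rand -hex 32`: 32 random bytes, hex-encoded to 64 characters.
const apiKey = randomBytes(32).toString('hex');
console.log(apiKey.length); // 64
```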

Security Considerations

  1. Timing-safe comparison: Prevents timing attacks on key validation
  2. Generic error messages: Returns "API key required" or "Invalid API key" to prevent information leakage
  3. Audit logging: Access attempts are logged for security monitoring
  4. Development mode: If INTERNAL_API_KEY is not set, endpoints are accessible (development only)

Rationale

  • IDOR Prevention: Addresses Insecure Direct Object Reference (IDOR) vulnerabilities flagged by security scanners
  • Defense in Depth: Additional layer of protection for sensitive operations
  • Simple Implementation: API keys are stateless and easy to rotate
  • Swagger Integration: API key documented in OpenAPI spec for easy testing

Consequences

  • ✅ Protection against unauthorized access to admin functions
  • ✅ IDOR vulnerability mitigated
  • ✅ Timing-safe implementation prevents timing attacks
  • ✅ Development-friendly (optional in dev mode)
  • ✅ Documented in Swagger UI
  • ⚠️ Requires secure key management in production
  • ⚠️ Key must be rotated if compromised

Alternatives Considered

Alternative Reason for Rejection
OAuth 2.0 / JWT Overkill for internal B2B system with no user accounts
IP Whitelisting Too inflexible, requires network configuration
mTLS Complex certificate management for simple use case
No authentication Unacceptable security risk (IDOR vulnerability)

ADR-025: Cosign Image Signing for Supply Chain Security

Attribute Value
ID ADR-025
Status ✅ Implemented
Date 2026-01-14
Context Need to cryptographically sign container images and create attestations for promotion tracking

Decision

Implement key-based container image signing using cosign from the Sigstore project, with attestations to track image promotions through environments.

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                        Azure DevOps Pipeline                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Build & Package          Acceptance Test           Production           │
│  ┌─────────────┐          ┌─────────────┐          ┌─────────────┐      │
│  │ Build Docker│          │ Deploy to   │          │ Verify      │      │
│  │ Images      │          │ Staging     │          │ Staging     │      │
│  └──────┬──────┘          └──────┬──────┘          │ Attestation │      │
│         │                        │                 └──────┬──────┘      │
│         ▼                        ▼                        │             │
│  ┌─────────────┐          ┌─────────────┐                 ▼             │
│  │ Sign with   │          │ Run Tests   │          ┌─────────────┐      │
│  │ cosign.key  │          └──────┬──────┘          │ Deploy to   │      │
│  └──────┬──────┘                 │                 │ Production  │      │
│         │                        ▼                 └──────┬──────┘      │
│         │                 ┌─────────────┐                 │             │
│         │                 │ Create      │                 ▼             │
│         │                 │ Staging     │          ┌─────────────┐      │
│         │                 │ Attestation │          │ Create Prod │      │
│         │                 └─────────────┘          │ Attestation │      │
│         │                                          └─────────────┘      │
└─────────┼──────────────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────┐     ┌─────────────────────┐
│   DigitalOcean Container Registry   │     │     Repository      │
│   ─────────────────────────────────│     │  ─────────────────  │
│   • Image:tag                       │     │  • cosign.pub       │
│   • Image.sig (signature)           │◄────│    (public key)     │
│   • Image.att (attestation)         │     │                     │
└─────────────────────────────────────┘     └─────────────────────┘

Implementation

Key-Based Signing (chosen approach):

# Azure DevOps Pipeline
- task: DownloadSecureFile@1
  name: cosignKey
  inputs:
    secureFile: 'cosign.key'

- script: |
    cosign sign \
      --key $(cosignKey.secureFilePath) \
      --annotations "build.number=$(imageTag)" \
      $(dockerRegistry)/$(imageName)@$(digest)
  env:
    COSIGN_PASSWORD: $(COSIGN_PASSWORD)

Attestation for Promotion Tracking:

{
  "_type": "https://forma3d.com/attestations/promotion/v1",
  "environment": "staging",
  "promotedAt": "2026-01-14T16:00:00+00:00",
  "build": {
    "number": "20260114160000",
    "pipeline": "forma-3d-connect",
    "commit": "abc123..."
  },
  "verification": {
    "healthCheckPassed": true,
    "acceptanceTestsPassed": true
  }
}
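A predicate like the one above could be assembled as follows. This is a hypothetical sketch: field names mirror the example document, but the pipeline's actual builder may differ, and Date.toISOString() emits a Z suffix rather than the +00:00 shown above:

```typescript
// Hypothetical builder for the promotion-attestation predicate shown above.
interface PromotionPredicate {
  _type: string;
  environment: 'staging' | 'production';
  promotedAt: string;
  build: { number: string; pipeline: string; commit: string };
  verification: { healthCheckPassed: boolean; acceptanceTestsPassed: boolean };
}

function buildPromotionPredicate(
  environment: 'staging' | 'production',
  build: PromotionPredicate['build'],
  verification: PromotionPredicate['verification'],
  now: Date = new Date(),
): PromotionPredicate {
  return {
    _type: 'https://forma3d.com/attestations/promotion/v1',
    environment,
    promotedAt: now.toISOString(), // UTC timestamp of the promotion
    build,
    verification,
  };
}

const predicate = buildPromotionPredicate(
  'staging',
  { number: '20260114160000', pipeline: 'forma-3d-connect', commit: 'abc123' },
  { healthCheckPassed: true, acceptanceTestsPassed: true },
);
console.log(JSON.stringify(predicate, null, 2));
```

The JSON output would be written to a file and passed to `cosign attest --predicate`.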

Key Management

File Location Purpose
cosign.key Azure DevOps Secure Files Sign images (private)
COSIGN_PASSWORD Azure DevOps Variable Group Decrypt private key
cosign.pub Repository root (/cosign.pub) Verify signatures (public)

Signing Workflow

Stage Action Artifact Created
Build & Package Sign image after push Image signature (.sig)
Staging Deploy Create staging attestation Staging attestation (.att)
Production Deploy Verify staging attestation, then sign Production attestation

Rationale

  • Supply chain security: Cryptographic proof that images were built by the CI/CD pipeline
  • Promotion tracking: Attestations provide audit trail without modifying image tags
  • Tamper detection: Modifications to signed images are detectable
  • Key-based over keyless: Keyless (OIDC) requires workload identity federation, which adds complexity; key-based is simpler and fully functional in Azure DevOps

Why Key-Based Instead of Keyless

Sigstore's "keyless" signing uses OIDC tokens from identity providers (GitHub Actions, Google Cloud, etc.). While elegant, it has challenges in Azure DevOps:

Approach Pros Cons
Keyless (OIDC) No key management, identity-based Requires Azure Workload Identity Federation, falls back to device flow in CI (fails)
Key-Based Works immediately in any CI Requires secure key storage and rotation

We chose key-based because:

  1. Azure DevOps doesn't have native OIDC integration with Sigstore
  2. Device flow authentication cannot work in non-interactive CI
  3. Key-based signing is well-supported and reliable

Security Considerations

  1. Private key protection: Stored in Azure DevOps Secure Files (encrypted at rest)
  2. Password protection: Private key is encrypted, password in secret variable
  3. Timing-safe verification: Public key verification uses constant-time comparison
  4. Key rotation: Documented procedure for rotating keys periodically (see Cosign Setup Guide)

Pipeline Parameters

Parameter Type Default Description
enableSigning boolean true Enable/disable image signing and attestations

Verification Commands

# Verify image signature
cosign verify --key cosign.pub \
  registry.digitalocean.com/forma-3d/forma3d-connect-api:20260114160000

# View attestations attached to image
cosign tree registry.digitalocean.com/forma-3d/forma3d-connect-api:20260114160000

# Verify and decode attestation
cosign verify-attestation --key cosign.pub --type custom \
  registry.digitalocean.com/forma-3d/forma3d-connect-api@sha256:... \
  | jq '.payload | @base64d | fromjson | .predicate'

Local Tooling

A script is provided to view image promotion status:

# List all images with their promotion status
./scripts/list-image-promotions.sh

# Output shows signed status and promotion level
  TAG                PROMOTION    SIGNED   UPDATED
  20260114160000     STAGING              2026-01-14
  20260114120000     none                 2026-01-14

Consequences

  • ✅ Cryptographic proof of image provenance
  • ✅ Tamper detection for container images
  • ✅ Audit trail for environment promotions
  • ✅ Works reliably in Azure DevOps without OIDC setup
  • ✅ Can verify images locally with public key
  • ⚠️ Requires secure key management
  • ⚠️ Keys must be rotated periodically (recommended: 6-12 months)
  • ⚠️ Pipeline requires secure files and variables to be configured

Alternatives Considered

Alternative Reason for Rejection
No signing No supply chain security, no tamper detection
Keyless signing (OIDC) Falls back to device flow in Azure DevOps, requires manual auth
Docker Content Trust (DCT) Less flexible, no custom attestations, vendor lock-in
Image tags for promotion Tags can be overwritten, no cryptographic verification
External attestation store Additional infrastructure, attestations separate from images

ADR-026: CycloneDX SBOM Attestations

Attribute Value
ID ADR-026
Status ✅ Implemented
Date 2026-01-16
Context Need to generate and attach Software Bill of Materials (SBOM) to container images for supply chain transparency

Decision

Generate CycloneDX SBOMs using Syft and attach them as signed attestations using cosign.

Architecture

Each container image in the registry will have multiple attestations stored as separate OCI artifacts:

Container Image (e.g., forma3d-connect-api:20260116120000)
├── Image signature (.sig) ─────────────── cosign sign
├── SBOM attestation (.att) ────────────── cosign attest --type cyclonedx
├── Staging promotion attestation (.att) ─ cosign attest --type custom
└── Production promotion attestation (.att) cosign attest --type custom

Why CycloneDX over SPDX

Criteria CycloneDX SPDX
Primary Focus Security & DevSecOps License compliance
VEX Support Native Separate spec
Tool Ecosystem Excellent (Grype, Syft) Good
Format Complexity Simpler More complex
OWASP Alignment Yes (OWASP project) No

CycloneDX was chosen because:

  1. Better integration with vulnerability scanners (Grype, Trivy)
  2. Native support for VEX (Vulnerability Exploitability eXchange)
  3. Simpler format for debugging
  4. Aligns with OWASP security practices
  5. Growing adoption in DevSecOps pipelines

Implementation

Pipeline Step (after image signing):

- script: |
    set -e

    # Install Syft
    curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

    # Generate CycloneDX SBOM
    syft $(dockerRegistry)/$(imageName)@$(digest) \
      --output cyclonedx-json=sbom.cdx.json

    # Attach as signed attestation
    cosign attest \
      --yes \
      --key $(cosignKey.secureFilePath) \
      --predicate sbom.cdx.json \
      --type cyclonedx \
      $(dockerRegistry)/$(imageName)@$(digest)
  displayName: 'Generate and Attach SBOM'
  env:
    COSIGN_PASSWORD: $(COSIGN_PASSWORD)

Attestation Types in Registry

After deployment, each image has multiple separate attestations:

Attestation Type Purpose Created By
Signature Proves image was built by CI/CD cosign sign
CycloneDX SBOM Lists all components/packages cosign attest --type cyclonedx
Staging Proves image passed staging cosign attest --type custom
Production Proves image deployed to prod cosign attest --type custom

Verification Commands

# View all attestations attached to an image
cosign tree registry.digitalocean.com/forma-3d/forma3d-connect-api:latest

# Verify and extract SBOM
cosign verify-attestation \
  --key cosign.pub \
  --type cyclonedx \
  registry.digitalocean.com/forma-3d/forma3d-connect-api@sha256:... \
  | jq -r '.payload' | base64 -d | jq '.predicate'

# Count components in SBOM
cosign verify-attestation --key cosign.pub --type cyclonedx \
  registry.digitalocean.com/forma-3d/forma3d-connect-api@sha256:... \
  | jq -r '.payload' | base64 -d | jq '.predicate.components | length'

Scanning for Vulnerabilities

With the SBOM attached, you can scan for vulnerabilities without pulling the full image:

# Extract SBOM and scan with Grype
cosign verify-attestation --key cosign.pub --type cyclonedx \
  registry.digitalocean.com/forma-3d/forma3d-connect-api@sha256:... \
  | jq -r '.payload' | base64 -d | jq '.predicate' > sbom.cdx.json

grype sbom:sbom.cdx.json

Rationale

  • Supply chain transparency: SBOM provides complete visibility into image contents
  • Vulnerability management: Enables scanning without pulling full images
  • Compliance: Meets requirements for software transparency (US Executive Order 14028)
  • Signed attestation: SBOM itself is cryptographically signed, preventing tampering
  • Tool-agnostic: CycloneDX is an open standard supported by many tools

Consequences

  • ✅ Complete visibility into image dependencies
  • ✅ Enables vulnerability scanning from SBOM
  • ✅ Signed attestation prevents SBOM tampering
  • ✅ Supports compliance requirements
  • ✅ Works with existing cosign infrastructure
  • ⚠️ Adds ~10-15 seconds to pipeline per image
  • ⚠️ SBOM attestation adds ~2KB manifest to registry

Alternatives Considered

Alternative Reason for Rejection
SPDX format More focused on licensing, less security tooling
Syft native format Not an industry standard, limited tool support
Docker Buildx --sbom Requires buildx, less control over format
No SBOM Missing supply chain transparency
SBOM in image labels Not cryptographically signed, can be tampered with

Tools Used

Tool License Purpose
Syft Apache 2.0 Generate CycloneDX SBOM
Cosign Apache 2.0 Sign and attach as attestation
Grype Apache 2.0 Vulnerability scanning (optional)

ADR-027: TanStack Query for Server State Management

Attribute Value
ID ADR-027
Status Accepted
Date 2026-01-14
Context Need to manage server state in the React dashboard with caching, refetching, and loading states

Decision

Use TanStack Query (v5.x, formerly React Query) for server state management in the dashboard.

Rationale

  • Automatic caching: Query results are cached and deduplicated automatically
  • Background refetching: Data stays fresh with configurable stale times and refetch intervals
  • Loading/error states: Built-in loading, error, and success states reduce boilerplate
  • Optimistic updates: Supports optimistic updates for better UX on mutations
  • DevTools: React Query DevTools for debugging cache state
  • TypeScript support: Excellent TypeScript integration with inferred types

Implementation

// Query client configuration (apps/web/src/lib/query-client.ts)
const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 30 * 1000, // 30 seconds
      gcTime: 5 * 60 * 1000, // 5 minutes cache
      retry: 1,
      refetchOnWindowFocus: false,
    },
  },
});

// Example hook (apps/web/src/hooks/use-orders.ts)
export function useOrders(query: OrdersQuery = {}) {
  return useQuery({
    queryKey: ['orders', query],
    queryFn: () => apiClient.orders.list(query),
  });
}

Consequences

  • ✅ Eliminates manual loading/error state management
  • ✅ Automatic cache invalidation on mutations
  • ✅ Integrates well with Socket.IO for real-time updates
  • ✅ Reduces API calls through intelligent caching
  • ⚠️ Requires understanding of query keys for proper cache invalidation
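The query-key caveat above is easiest to see with a minimal model of prefix-based invalidation. In TanStack Query, invalidateQueries({ queryKey: ['orders'] }) matches every cached query whose key starts with that prefix; the sketch below illustrates the concept only and is not TanStack Query's implementation:

```typescript
// Minimal model of hierarchical query keys and prefix invalidation.
type QueryKey = readonly unknown[];

interface CacheEntry { key: QueryKey; data: unknown; stale: boolean }

const cache: CacheEntry[] = [];

const same = (a: unknown, b: unknown) => JSON.stringify(a) === JSON.stringify(b);

// A cached key matches when the filter key is a leading prefix of it.
const matches = (key: QueryKey, prefix: QueryKey) =>
  prefix.every((part, i) => same(key[i], part));

function setQueryData(key: QueryKey, data: unknown): void {
  cache.push({ key, data, stale: false });
}

function invalidateQueries(prefix: QueryKey): void {
  for (const entry of cache) {
    if (matches(entry.key, prefix)) entry.stale = true;
  }
}

setQueryData(['orders', { page: 1 }], []);
setQueryData(['orders', { page: 2 }], []);
setQueryData(['printers'], []);

invalidateQueries(['orders']); // marks both orders queries stale, leaves printers fresh
console.log(cache.map((e) => e.stale)); // [true, true, false]
```

This is why hooks like useOrders key their queries as ['orders', query]: one invalidation of the ['orders'] prefix refreshes every page and filter variant at once.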

Alternatives Considered

Alternative Reason for Rejection
Redux Too much boilerplate for server state
SWR Fewer features than TanStack Query
Apollo Client GraphQL-focused, overkill for REST API
Manual fetch Requires implementing caching/loading states manually

ADR-028: Socket.IO for Real-Time Dashboard Updates

Attribute Value
ID ADR-028
Status Accepted
Date 2026-01-14
Context Dashboard needs real-time updates when orders and print jobs change status

Decision

Use Socket.IO for real-time WebSocket communication between backend and dashboard.

Architecture

Backend Events          WebSocket Gateway         React Dashboard
     │                        │                        │
     │  order.created         │                        │
     ├───────────────────────►│                        │
     │                        │  order:created         │
     │                        ├───────────────────────►│
     │                        │                        │ invalidateQueries()
     │                        │                        │ toast.success()

Implementation

Backend (NestJS WebSocket Gateway):

// apps/api/src/gateway/events.gateway.ts
@WebSocketGateway({ namespace: '/events' })
export class EventsGateway {
  @WebSocketServer()
  server!: Server;

  @OnEvent(ORDER_EVENTS.CREATED)
  handleOrderCreated(event: OrderEventPayload): void {
    this.server.emit('order:created', { ... });
  }
}

Frontend (React Context):

// apps/web/src/contexts/socket-context.tsx
socketInstance.on('order:created', (data) => {
  toast.success(`New order: #${data.orderNumber}`);
  queryClient.invalidateQueries({ queryKey: ['orders'] });
});

Rationale

  • Already installed: Socket.IO server was already in dependencies for Phase 3
  • Bidirectional: Supports future features like notifications and chat
  • Automatic reconnection: Handles network interruptions gracefully
  • Namespace support: Can separate different event channels
  • Browser compatibility: Works across all modern browsers

Consequences

  • ✅ Real-time updates without polling
  • ✅ Toast notifications on important events
  • ✅ Automatic TanStack Query cache invalidation
  • ✅ Connection status visible in UI
  • ⚠️ Requires WebSocket support in infrastructure

Alternatives Considered

Alternative Reason for Rejection
Polling Higher latency, more server load
Server-Sent Events One-way only (server to client)
Raw WebSockets Fewer features than Socket.IO (rooms, reconnection)
Pusher/Ably External dependency, cost

ADR-029: API Key Authentication for Dashboard

Attribute Value
ID ADR-029
Status Accepted
Date 2026-01-14
Context Dashboard needs authentication to protect admin operations

Decision

Use API key authentication stored in browser localStorage for dashboard authentication.

Implementation

// apps/web/src/contexts/auth-context.tsx
const AUTH_STORAGE_KEY = 'forma3d_api_key';

export function AuthProvider({ children }: { children: ReactNode }) {
  const [apiKey, setApiKey] = useState<string | null>(() => {
    return localStorage.getItem(AUTH_STORAGE_KEY);
  });

  const login = (key: string) => {
    localStorage.setItem(AUTH_STORAGE_KEY, key);
    setApiKey(key);
  };
  // ...
}

// Protected routes redirect to /login if not authenticated
function ProtectedRoute({ children }: { children: ReactNode }) {
  const { isAuthenticated } = useAuth();
  if (!isAuthenticated) return <Navigate to="/login" replace />;
  return <>{children}</>;
}

Rationale

  • Simplicity: No session management, token refresh, or OAuth complexity
  • Consistent with API: Uses same API key authentication as backend (ADR-024)
  • Offline-capable: Works without server validation on page load
  • Single operator: System is used by single operator, not public users

Security Considerations

  • API key stored in localStorage (acceptable for internal admin tool)
  • Key sent via X-API-Key header for mutations
  • HTTPS required in production
  • Key should be rotated periodically

Consequences

  • ✅ Simple implementation and user experience
  • ✅ Consistent with existing API key guard on backend
  • ✅ No additional authentication infrastructure needed
  • ⚠️ API key visible in localStorage (acceptable for admin tool)
  • ⚠️ No role-based access control (single admin role)

Alternatives Considered

Alternative Reason for Rejection
OAuth/OIDC Overkill for single-operator system
JWT tokens Adds complexity without benefit for this use case
Session cookies Requires server-side session management
No auth Admin operations must be protected

ADR-030: Sendcloud for Shipping Integration

Attribute Value
ID ADR-030
Status Accepted
Date 2026-01-16
Context Need to generate shipping labels and sync tracking information to Shopify

Decision

Use Sendcloud API (custom integration) rather than the native Sendcloud-Shopify app for shipping label generation and tracking.

Rationale

Why Sendcloud as a Platform

  • Multi-carrier support: Single API for PostNL, DPD, DHL, UPS, and 80+ other carriers
  • European focus: Strong presence in Belgium/Netherlands matching Forma3D's primary market
  • Simple API: REST API with Basic Auth, parcel creation returns label PDF immediately
  • Automatic tracking: Tracking numbers and URLs provided on parcel creation
  • Webhook support: Status updates available via webhooks (for future enhancement)
  • Competitive pricing: Pay-per-label pricing suitable for small business volumes
  • Label formats: Supports A4, A6, and thermal printer formats

Why Custom API Integration vs Native Shopify-Sendcloud App

Sendcloud offers a native Shopify integration that automatically syncs orders. However, we chose a custom API integration for the following reasons:

Aspect Native Sendcloud-Shopify App Our Custom API Integration
Trigger Manual — operator must create label in Sendcloud dashboard Automatic — triggered when all print jobs complete
Print awareness None — doesn't know about 3D printing workflow Full — waits for SimplyPrint jobs to finish
Unified dashboard Split across Shopify + Sendcloud panels Single dashboard — orders, prints, shipments in one place
Audit trail Separate logs in each system Integrated event log with full traceability
Custom workflow Generic e-commerce flow Custom print-to-ship automation
Tracking sync timing After manual label creation Immediate — included in Shopify fulfillment

Key insight: The native integration doesn't know when 3D printing is complete. An operator would need to:

  1. Monitor SimplyPrint for job completion
  2. Switch to the Sendcloud dashboard
  3. Find the order and create a label
  4. Wait for tracking to sync back to Shopify

Our custom integration automates this entire workflow:

Print Jobs Complete → Auto-Generate Label → Auto-Fulfill with Tracking → Customer Notified

This reduces manual intervention from ~5 minutes per order to zero, which is critical for scaling order volumes.

Implementation

apps/api/src/
├── sendcloud/
│   ├── sendcloud-api.client.ts    # HTTP client with Basic Auth
│   ├── sendcloud.service.ts       # Business logic, event listener
│   ├── sendcloud.controller.ts    # REST endpoints
│   └── sendcloud.module.ts
├── shipments/
│   ├── shipments.repository.ts    # Prisma queries for Shipment
│   ├── shipments.controller.ts    # REST endpoints
│   └── shipments.module.ts

libs/api-client/src/
└── sendcloud/
    └── sendcloud.types.ts         # Typed DTOs for Sendcloud API

Event Flow

  1. All print jobs complete → OrchestrationService emits order.ready-for-fulfillment
  2. SendcloudService listens → creates parcel via Sendcloud API
  3. Sendcloud returns label URL + tracking number
  4. Shipment record stored in database
  5. SendcloudService emits shipment.created event
  6. FulfillmentService listens → creates Shopify fulfillment with tracking info
  7. Customer receives email notification with tracking link
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌─────────────┐
│ SimplyPrint │───▶│ Orchestration│───▶│  Sendcloud  │───▶│ Fulfillment │
│  (prints)   │    │   Service    │    │   Service   │    │   Service   │
└─────────────┘    └──────────────┘    └─────────────┘    └─────────────┘
                         │                    │                  │
                         │ order.ready-       │ shipment.        │ Shopify
                         │ for-fulfillment    │ created          │ Fulfillment
                         ▼                    ▼                  ▼
                   [All jobs done]      [Label + tracking]  [Customer notified]
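The steps above can be sketched with a plain Node EventEmitter (the real services use NestJS's event bus; all names and payloads here are illustrative):

```typescript
import { EventEmitter } from 'node:events';

const bus = new EventEmitter();
const fulfillments: string[] = [];

// SendcloudService (step 2): on completed print jobs, "create" a parcel.
bus.on('order.ready-for-fulfillment', (orderId: string) => {
  const shipment = { orderId, trackingNumber: 'SC-0001' }; // stand-in for the Sendcloud API call
  bus.emit('shipment.created', shipment); // step 5
});

// FulfillmentService (step 6): attach tracking to the Shopify fulfillment.
bus.on('shipment.created', (s: { orderId: string; trackingNumber: string }) => {
  fulfillments.push(`${s.orderId}:${s.trackingNumber}`);
});

// OrchestrationService (step 1) kicks the chain off.
bus.emit('order.ready-for-fulfillment', 'ord_1');
```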

Consequences

  • ✅ Single integration for multiple carriers
  • ✅ Automatic label PDF generation
  • ✅ Tracking information synced to Shopify fulfillments
  • ✅ Dashboard displays shipment status and label download
  • ⚠️ Dependent on Sendcloud uptime and API availability
  • ⚠️ Limited to carriers supported by Sendcloud
  • ⚠️ Requires Sendcloud account and sender address configuration

Environment Variables

SENDCLOUD_PUBLIC_KEY=xxx
SENDCLOUD_SECRET_KEY=xxx
SENDCLOUD_API_URL=https://panel.sendcloud.sc/api/v2
DEFAULT_SHIPPING_METHOD_ID=8
DEFAULT_SENDER_ADDRESS_ID=12345
SHIPPING_ENABLED=true

Alternatives Considered

Alternative Reason for Rejection
Native Sendcloud-Shopify app Requires manual label creation; no print workflow awareness
Direct carrier APIs Too many integrations to maintain, each with different APIs
ShipStation US-focused, less European carrier support
EasyPost Less European carrier coverage than Sendcloud
Manual labels Does not meet automation requirements; ~5 min overhead per order

ADR-031: Automated Container Registry Cleanup

Attribute Value
ID ADR-031
Status Accepted
Date 2026-01-16
Context Container registries accumulate old images over time, increasing storage costs and clutter

Decision

Implement automated container registry cleanup that runs after each successful staging deployment and attestation. The cleanup uses attestation-based policies to determine which images to keep or delete.

Rationale

The Problem

Without automated cleanup, the DigitalOcean Container Registry accumulates images indefinitely:

  • Each CI build creates new images with timestamped tags (e.g., 20260116120000)
  • Signature and attestation artifacts add ~2KB per image
  • Storage costs grow linearly with deployment frequency
  • Old images provide no value after newer versions are verified in production

Attestation-Based Cleanup Policy

The cleanup leverages the cosign attestation system (ADR-025) to make intelligent retention decisions:

Image Status Action Rationale
PRODUCTION attestation Keep May be needed for rollback
Currently deployed Keep Active in production/staging
Recent (last 5) Keep Recent builds for debugging
STAGING-only attestation Delete Superseded by newer staging builds
No attestation Delete Never passed acceptance tests

This policy ensures:

  1. Rollback capability: Production-attested images are always available
  2. Debugging support: Recent images preserved for investigation
  3. Automatic garbage collection: Old staging/unsigned images removed
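The retention policy reduces to a small decision function. This sketch uses assumed shapes (attestation labels, a recent-tags list), not the actual cleanup-script logic:

```typescript
type Attestation = 'PRODUCTION' | 'STAGING' | null;

interface ImageInfo {
  tag: string;          // timestamped tag, e.g. "20260116120000"
  attestation: Attestation;
  deployed: boolean;    // reported by the /health endpoints
}

// Apply the keep/delete policy from the table above.
function shouldKeep(image: ImageInfo, recentTags: string[]): boolean {
  if (image.attestation === 'PRODUCTION') return true; // rollback safety
  if (image.deployed) return true;                     // currently running
  if (recentTags.includes(image.tag)) return true;     // last N builds for debugging
  return false; // STAGING-only or unattested: delete
}
```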

Integration with Health Endpoints

The cleanup script queries the /health endpoints to determine which images are currently deployed:

# API health endpoint returns current build number
curl https://staging-connect-api.forma3d.be/health
# Response: { "build": { "number": "20260116120000" }, ... }

This prevents accidental deletion of running containers.
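The deployed-version lookup amounts to parsing that response; a minimal sketch (the function name is illustrative):

```typescript
// Extract the deployed build number from a /health response body
// (shape taken from the example response above).
interface HealthResponse {
  build: { number: string };
}

function deployedBuild(json: string): string {
  const health = JSON.parse(json) as HealthResponse;
  return health.build.number;
}
```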

Implementation

scripts/
└── cleanup-registry.sh    # Cleanup script with attestation checking

azure-pipelines.yml
└── AcceptanceTest stage
    └── CleanupRegistry job    # Runs after AttestStagingPromotion

Cleanup Script

The scripts/cleanup-registry.sh script:

  1. Authenticates to DigitalOcean Container Registry via doctl
  2. Queries health endpoints to find currently deployed image tags
  3. Lists all images in the registry for each repository
  4. Checks attestations using cosign verify-attestation with the public key
  5. Applies retention policy based on attestation status
  6. Deletes eligible images via doctl registry repository delete-manifest
  7. Triggers garbage collection to reclaim storage space

Pipeline Integration

The cleanup runs as the final job in the AcceptanceTest stage:

- job: CleanupRegistry
  displayName: 'Cleanup Container Registry'
  dependsOn: AttestStagingPromotion
  condition: or(succeeded(), eq(variables.deploymentHappened, 'False'))

Cleanup Flow

┌─────────────────────────────────────────────────────────────────────┐
│                        AcceptanceTest Stage                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  VerifyDeployment ──▶ RunAcceptanceTests ──▶ AttestStagingPromotion │
│                                                     │                │
│                                                     ▼                │
│                                            CleanupRegistry           │
│                                                     │                │
│                                                     ▼                │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ 1. Query /health endpoints for deployed versions             │   │
│  │ 2. List all images in registry                               │   │
│  │ 3. For each image:                                           │   │
│  │    - Check if PRODUCTION attested → KEEP                     │   │
│  │    - Check if currently deployed → KEEP                      │   │
│  │    - Check if in top 5 recent → KEEP                         │   │
│  │    - Check if STAGING-only attested → DELETE                 │   │
│  │    - Check if no attestation → DELETE                        │   │
│  │ 4. Trigger garbage collection                                │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Usage

Local Testing (Dry Run)

# Preview what would be deleted
./scripts/cleanup-registry.sh \
  --key cosign.pub \
  --api-url https://staging-connect-api.forma3d.be \
  --web-url https://staging-connect.forma3d.be \
  --dry-run \
  --verbose

Manual Cleanup

# Perform actual cleanup
./scripts/cleanup-registry.sh \
  --key cosign.pub \
  --api-url https://staging-connect-api.forma3d.be \
  --web-url https://staging-connect.forma3d.be \
  --verbose

Script Options

Option Description
-k, --key FILE Public key for attestation verification (required)
--api-url URL API health endpoint URL (required)
--web-url URL Web health endpoint URL (required)
--keep-recent N Keep N most recent images (default: 5)
--dry-run Preview deletions without executing
-v, --verbose Show detailed output

Consequences

  • ✅ Automatic storage management reduces costs
  • ✅ Attestation-based policy ensures production rollback capability
  • ✅ Health endpoint check prevents deletion of running containers
  • ✅ Dry-run mode enables safe testing
  • ✅ Garbage collection reclaims space after deletion
  • ⚠️ Requires health endpoints to return build information
  • ⚠️ Dependent on cosign/doctl availability in pipeline

Alternatives Considered

Alternative Reason for Rejection
Time-based retention (e.g., 30 days) Doesn't account for promotion status; may delete production-ready images
Tag-based retention (e.g., keep latest) latest tag is mutable; doesn't guarantee correct image
Manual cleanup Error-prone, inconsistent, doesn't scale
Registry auto-purge policies DigitalOcean doesn't support attestation-aware policies

ADR-032: Domain Boundary Separation with Interface Contracts

Attribute Value
ID ADR-032
Title Domain Boundary Separation with Interface Contracts
Status Implemented
Context Prepare the modular monolith for potential future microservices extraction by establishing clean domain boundaries
Date 2026-01-17

Context

As the application grows, we need to ensure domain boundaries are well-defined to:

  1. Enable future microservices extraction without major refactoring
  2. Reduce coupling between modules
  3. Enable independent testing of domain logic
  4. Provide distributed tracing capabilities

Decision

We implement domain boundary separation with the following patterns:

1. Domain Contracts Library (libs/domain-contracts)

Create a dedicated library containing:

  • Interface definitions (IOrdersService, IPrintJobsService, etc.)
  • DTOs for cross-domain communication (OrderDto, PrintJobDto, etc.)
  • Symbol injection tokens (ORDERS_SERVICE, PRINT_JOBS_SERVICE, etc.)

2. Correlation ID Infrastructure

Add correlation ID propagation for distributed tracing:

  • CorrelationMiddleware extracts/generates x-correlation-id headers
  • CorrelationService uses AsyncLocalStorage for context propagation
  • All domain events include correlationId, timestamp, and source fields

3. Repository Encapsulation

Repositories are internal implementation details:

  • Modules stop exporting repositories
  • Only interface tokens are exported for cross-domain communication
  • Services implement domain interfaces

4. Event-Based Base Interfaces

Define base event interfaces that all domain events extend:

interface BaseEvent {
  correlationId: string;
  timestamp: Date;
  source: string;
}
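A minimal sketch of how the correlation infrastructure (pattern 2) and the base event (pattern 4) fit together, using Node's AsyncLocalStorage; the function and event names are illustrative, not the actual service API:

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomUUID } from 'node:crypto';

interface BaseEvent {
  correlationId: string;
  timestamp: Date;
  source: string;
}

// CorrelationService idea: one AsyncLocalStorage holds the request's ID.
const correlationStore = new AsyncLocalStorage<string>();

// CorrelationMiddleware idea: reuse an incoming x-correlation-id, else mint one.
function runWithCorrelation<T>(fn: () => T, incomingId?: string): T {
  return correlationStore.run(incomingId ?? randomUUID(), fn);
}

// Any domain event emitted inside the request picks up the same ID.
function makeEvent(source: string): BaseEvent {
  return {
    correlationId: correlationStore.getStore() ?? 'unknown',
    timestamp: new Date(),
    source,
  };
}
```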

Implementation

Component Path Description
Domain Contracts libs/domain-contracts/ Interface definitions and DTOs
Correlation Service apps/api/src/common/correlation/ Request context propagation
Base Events libs/domain/src/events/ Base event interfaces

Interface Tokens Pattern

// In domain-contracts library
export const ORDERS_SERVICE = Symbol('IOrdersService');

export interface IOrdersService {
  findById(id: string): Promise<OrderDto | null>;
  updateStatus(id: string, status: OrderStatus): Promise<OrderDto>;
  // ... other methods
}

// In module
@Module({
  providers: [
    OrdersService,
    { provide: ORDERS_SERVICE, useExisting: OrdersService },
  ],
  exports: [ORDERS_SERVICE], // No longer exports repository
})
export class OrdersModule {}

// In consumer service
@Injectable()
export class FulfillmentService {
  constructor(
    @Inject(ORDERS_SERVICE)
    private readonly ordersService: IOrdersService,
  ) {}
}

Consequences

Positive:

  • Clear domain boundaries enable future microservices extraction
  • Reduced coupling between modules
  • Better testability with interface-based mocking
  • Distributed tracing via correlation IDs
  • Repository details are now private implementation

Negative:

  • Slight increase in boilerplate (interface definitions, DTOs)
  • Need to maintain DTO mapping logic
  • Some forwardRef() usages remain for circular retry patterns

Related Decisions

  • ADR-007: Layered Architecture with Repository Pattern
  • ADR-008: Event-Driven Internal Communication
  • ADR-013: Shared Domain Library

ADR-033: Database-Backed Webhook Idempotency

Attribute Value
ID ADR-033
Title Database-Backed Webhook Idempotency
Status Implemented
Context In-memory webhook idempotency cache doesn't work in multi-instance deployments
Date 2026-01-17

Context

The original implementation used an in-memory Set<string> for webhook idempotency tracking:

private readonly processedWebhooks = new Set<string>();

This approach had critical problems:

  1. Horizontal Scaling Failure: In a multi-instance deployment, each API instance has its own cache. Webhooks may be processed multiple times across instances.
  2. Memory Leak: The Set grows unbounded as webhooks are processed, causing memory pressure in long-running instances.
  3. Restart Data Loss: All idempotency data is lost on application restart, allowing duplicate processing during restarts.

Decision

Use a PostgreSQL table (ProcessedWebhook) for webhook idempotency instead of Redis or in-memory caching.

Rationale

  • No additional infrastructure: Uses existing PostgreSQL database
  • Transactional safety: Database unique constraint ensures race-condition-safe idempotency
  • Simple cleanup: Scheduled job removes expired records hourly
  • Debugging support: Records include metadata (webhook type, order ID, timestamps)
  • Horizontal scaling: Works correctly across multiple API instances

Implementation

// Atomic check-and-mark using unique constraint
// (Prisma is imported from '@prisma/client'; P2002 = unique-constraint violation)
async isProcessedOrMark(webhookId: string, type: string): Promise<boolean> {
  const expiresAt = new Date(Date.now() + 24 * 60 * 60 * 1000); // retention window (e.g. 24h)
  try {
    await this.prisma.processedWebhook.create({
      data: { webhookId, webhookType: type, expiresAt },
    });
    return false; // First time processing
  } catch (error) {
    if (error instanceof Prisma.PrismaClientKnownRequestError && error.code === 'P2002') {
      return true; // Already processed
    }
    throw error;
  }
}

Database Schema

model ProcessedWebhook {
  id          String   @id @default(uuid())
  webhookId   String   @unique  // The Shopify webhook ID
  webhookType String            // e.g., "orders/create"
  processedAt DateTime @default(now())
  expiresAt   DateTime          // When this record can be cleaned up
  orderId     String?           // Associated order for debugging

  @@index([expiresAt])          // For cleanup job queries
  @@index([processedAt])        // For monitoring
}
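The hourly cleanup that the expiresAt index supports can be sketched as a pure pass over rows (the real job would issue a single Prisma deleteMany; the names here are illustrative):

```typescript
interface ProcessedWebhookRow {
  webhookId: string;
  expiresAt: Date;
}

// Split rows into those the cleanup job would delete vs. retain.
function sweepExpired(rows: ProcessedWebhookRow[], now: Date) {
  const expired = rows.filter((r) => r.expiresAt.getTime() < now.getTime());
  const kept = rows.filter((r) => r.expiresAt.getTime() >= now.getTime());
  return { expired, kept };
}
```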

Alternatives Considered

Alternative Pros Cons Decision
Redis TTL support, fast Additional infrastructure Rejected
Distributed Lock Works with DB Complex, race conditions Rejected
Database Table Simple, no new infra Needs cleanup job Selected

Consequences

Positive:

  • ✅ Works correctly in multi-instance deployments
  • ✅ Survives application restarts
  • ✅ No memory leaks
  • ✅ Auditable (can query processed webhooks)
  • ✅ Race-condition safe via unique constraint

Negative:

  • ⚠️ Slightly higher latency than in-memory (< 10ms)
  • ⚠️ Requires cleanup job (runs hourly)

Related Decisions

  • ADR-007: Layered Architecture with Repository Pattern
  • ADR-021: Retry Queue with Exponential Backoff

ADR-034: Docker Infrastructure Hardening (Log Rotation & Resource Cleanup)

Attribute Value
ID ADR-034
Status Accepted
Date 2026-01-19
Context Prevent disk exhaustion from Docker logs and images

Context

During staging operations, the server disk filled to 100% due to:

  1. Unbounded Docker logs: The default json-file log driver has no size limits, causing container logs to grow indefinitely
  2. Accumulated old images: Each deployment pulls new images, but old versions remain on disk
  3. Health check failures: When disk was full, Docker couldn't execute health checks, causing containers to be marked unhealthy and Traefik to stop routing traffic

Decision

Implement automated infrastructure hardening in the deployment pipeline:

  1. Docker Log Rotation: Configure daemon-level log rotation with size limits
  2. Aggressive Resource Cleanup: Remove unused images, volumes, and networks after each deployment
  3. Separate Image Tags: Use independent version tags for API and Web to support partial deployments

Implementation

1. Docker Log Rotation Configuration

The pipeline automatically creates /etc/docker/daemon.json if missing:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

This limits each container to:

  • Maximum 10MB per log file
  • Maximum 3 rotated files
  • Total: 30MB per container (90MB for all 3 containers)

2. Deployment Cleanup Steps

After container restart, the pipeline runs:

# Remove dangling images
docker image prune -f

# Remove unused images older than 24h
docker image prune -a -f --filter "until=24h"

# Clean up unused volumes and networks
docker volume prune -f
docker network prune -f

3. Separate Image Tags

docker-compose.yml now uses independent tags:

api:
  image: ${REGISTRY_URL}/forma3d-connect-api:${API_IMAGE_TAG:-latest}

web:
  image: ${REGISTRY_URL}/forma3d-connect-web:${WEB_IMAGE_TAG:-latest}

This allows:

  • Deploying only the API without changing the Web version
  • Deploying only the Web app without changing the API version
  • Independent rollbacks for each service

Consequences

Positive:

  • ✅ Prevents disk exhaustion from unbounded log growth
  • ✅ Reduces disk usage by cleaning old images after deployment
  • ✅ Supports independent versioning for API and Web
  • ✅ Self-healing: Pipeline automatically configures log rotation if missing
  • ✅ No manual intervention required

Negative:

  • ⚠️ Docker daemon restart required if log rotation config is missing (brief container interruption)
  • ⚠️ Log history limited to ~30MB per container (may need external log aggregation for production)

Configuration Summary

Setting Value Rationale
max-size 10m Balance between history and disk usage
max-file 3 Keeps ~30MB per container
Image cleanup filter 24h Keeps recent images for quick rollback

Related Decisions

  • ADR-017: Docker + Traefik Deployment Strategy
  • ADR-031: Automated Container Registry Cleanup

ADR-035: Progressive Web App (PWA) for Cross-Platform Access

Attribute Value
ID ADR-035
Status Accepted
Date 2026-01-19
Context Need to provide mobile and desktop access for operators monitoring print jobs and managing orders while away from desk

Decision

Adopt Progressive Web App (PWA) technology for the existing React web application, replacing the planned Tauri (desktop) and Capacitor (mobile) native shell applications.

The web application will be enhanced with:

  1. Web App Manifest for installability
  2. Service Worker for offline caching and push notifications
  3. Web Push API for real-time alerts on print job status

Rationale

PWA Suitability for Admin Dashboards

Research conducted in January 2026 confirms PWA is an ideal fit for Forma3D.Connect:

  • Application type: Admin dashboards and SaaS tools are PWA's primary use case
  • Feature requirements: Order management, real-time updates, and push notifications are fully supported
  • Device features: No deep hardware integration (Bluetooth, NFC, sensors) required

iOS/Safari PWA Support (2026)

Apple has significantly improved PWA support:

Feature iOS Version Status
Web Push Notifications iOS 16.4+ ✅ Supported (Home Screen install required)
Badging API iOS 16.4+ ✅ Supported
Declarative Web Push iOS 18.4+ ✅ Improved reliability
Standalone Display Mode iOS 16.4+ ✅ Supported

Cost-Benefit Analysis

Aspect Tauri + Capacitor PWA
Initial development 40-80 hours 8-16 hours
CI/CD pipelines Additional complexity None
Code signing Required (Apple, Windows) None
App store submissions Required None
Update cycle Days (app store review) Instant
Maintenance Ongoing Minimal

Estimated savings: 80-150 hours initial + ongoing maintenance reduction

Tauri/Capacitor Provided No Real Advantage

Both planned native apps were WebView wrappers:

  • Container(desktop, "Tauri, Rust", "Native desktop shell wrapping the web application")
  • Container(mobile, "Capacitor", "Mobile shell for on-the-go monitoring")

PWA provides the same experience (installable, app-like, offline capable) without:

  • Separate build pipelines
  • Platform-specific debugging
  • App store management
  • Code signing certificates

Implementation

Phase 1: PWA Foundation

  1. Add vite-plugin-pwa to the web application
  2. Create manifest.json with app metadata and icons
  3. Configure service worker for asset caching
  4. Enable HTTPS (already implemented)

Example manifest.json:

{
  "name": "Forma3D.Connect",
  "short_name": "Forma3D",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#ffffff",
  "theme_color": "#0066cc"
}
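Wiring this manifest through vite-plugin-pwa might look like the following vite.config.ts sketch (the options shown are assumptions; consult the plugin's documentation):

```typescript
// vite.config.ts — illustrative Phase 1 setup; options are assumptions.
import { defineConfig } from 'vite';
import { VitePWA } from 'vite-plugin-pwa';

export default defineConfig({
  plugins: [
    VitePWA({
      registerType: 'autoUpdate', // service worker updates without prompting
      manifest: {
        name: 'Forma3D.Connect',
        short_name: 'Forma3D',
        start_url: '/',
        display: 'standalone',
        background_color: '#ffffff',
        theme_color: '#0066cc',
      },
    }),
  ],
});
```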

Phase 2: Push Notifications

  1. Implement Web Push API in frontend
  2. Add VAPID key configuration to API
  3. Create notification service (integrate with existing email notifications)
  4. User permission flow in dashboard settings

Phase 3: Enhanced Offline Support

  1. IndexedDB for offline data caching
  2. Background sync for queued actions
  3. Optimistic UI updates

Consequences

Positive:

  • ✅ Significant reduction in development and maintenance effort
  • ✅ Single codebase, single deployment target
  • ✅ Instant updates for all users (no app store delays)
  • ✅ No platform-specific bugs or WebView inconsistencies
  • ✅ No code signing or app store management
  • ✅ Works on any device with a modern browser

Negative:

  • ⚠️ iOS requires Home Screen install for full PWA features
  • ⚠️ No notification sounds on iOS PWA (visual only)
  • ⚠️ Limited system tray integration on desktop

Removed from Project:

  • apps/desktop (Tauri) - removed from roadmap
  • apps/mobile (Capacitor) - removed from roadmap

Updated Architecture

The C4 Container diagram has been updated to reflect the PWA-only architecture:

Before:
├── Web Application (React 19)
├── Desktop App (Tauri) [future]
├── Mobile App (Capacitor) [future]
└── API Server (NestJS)

After:
├── Progressive Web App (React 19 + PWA)
└── API Server (NestJS)

Alternatives Considered

Alternative Reason for Rejection
Keep Tauri + Capacitor plan Unnecessary complexity; WebView wrappers provide no advantage over PWA
React Native for mobile Requires separate codebase; overkill for admin dashboard
Electron for desktop Large bundle size; same WebView approach as Tauri but less efficient
Flutter Requires separate codebase; not justified for simple dashboard

ADR-036: localStorage Fallback for PWA Install Detection

Attribute Value
ID ADR-036
Status Accepted
Date 2026-01-20
Context Need to detect if PWA is installed when user views site in browser, to show appropriate messaging and avoid duplicate install prompts

Decision

Use a dual detection strategy combining the getInstalledRelatedApps() API with localStorage persistence as a fallback for PWA installation detection.

Rationale

The Problem

When a user installs a PWA and later visits the same site in a regular browser:

  • The browser doesn't know the PWA is installed
  • The site shows "Install App" even though it's already installed
  • This creates a confusing user experience

API Limitations

The navigator.getInstalledRelatedApps() API can detect installed PWAs, but has limitations:

Platform Chrome Version Support
Android 80+ ✅ Full support
Windows 85+ ✅ Supported
macOS 140+ ✅ Same-scope only
iOS/Safari - ❌ Not supported

Even where supported, the API can be unreliable due to:

  • Scope restrictions (must be same origin/scope)
  • Timing issues during page load
  • Browser implementation quirks

Dual Detection Strategy

  1. Primary: getInstalledRelatedApps() API
     • Query the browser for installed related apps
     • Works when supported and correctly configured

  2. Fallback: localStorage persistence
     • Store pwa-installed: true when:
       • User installs via the appinstalled event
       • App is opened in standalone mode
       • API successfully detects installation
     • Check localStorage on page load

Implementation

// Detection flow
useEffect(() => {
  // 1. Check standalone mode (running inside PWA)
  const isStandalone = window.matchMedia('(display-mode: standalone)').matches;
  if (isStandalone) {
    setIsInstalled(true);
    localStorage.setItem('pwa-installed', 'true');
    return;
  }

  // 2. Check localStorage fallback
  if (localStorage.getItem('pwa-installed') === 'true') {
    setIsInstalled(true);
  }

  // 3. Try getInstalledRelatedApps API
  if (navigator.getInstalledRelatedApps) {
    navigator.getInstalledRelatedApps().then(apps => {
      if (apps.some(app => app.platform === 'webapp')) {
        setIsInstalled(true);
        localStorage.setItem('pwa-installed', 'true');
      }
    });
  }
}, []);

// Persist on install
window.addEventListener('appinstalled', () => {
  localStorage.setItem('pwa-installed', 'true');
});

Consequences

Positive:

  • ✅ Works across all browsers and platforms
  • ✅ Provides consistent UX when switching between PWA and browser
  • ✅ No false "Install App" prompts when already installed
  • ✅ Gracefully degrades when API not supported

Negative:

  • ⚠️ localStorage can become stale if user uninstalls PWA externally
  • ⚠️ No automatic cleanup mechanism for uninstalled apps
  • ⚠️ Per-browser storage (installing in Chrome won't reflect in Firefox)

Trade-off Accepted:

The risk of showing "Installed" for an uninstalled app is acceptable because:

  • Users rarely uninstall and then want to reinstall immediately
  • Clearing site data will reset the state
  • Better UX than constantly prompting to install an already-installed app

Alternatives Considered

Alternative Reason for Rejection
API only Too unreliable; doesn't work on Safari/iOS
localStorage only Misses installations from other sessions
Server-side tracking Requires authentication; overcomplicated
Cookie-based Cleared more frequently than localStorage

ADR-037: Keep a Changelog for Release Documentation

Attribute Value
ID ADR-037
Status Accepted
Date 2026-01-20
Context Need a standardized way to document changes between releases for developers, operators, and stakeholders

Decision

Adopt the Keep a Changelog format for documenting all notable changes to the project, combined with Semantic Versioning for version numbers.

Rationale

Why Keep a Changelog?

  1. Human-readable: Written for humans, not machines - focuses on what matters to users
  2. Standardized format: Well-known convention reduces cognitive load
  3. Categorized changes: Clear sections (Added, Changed, Deprecated, Removed, Fixed, Security)
  4. Release-oriented: Groups changes by version, making it easy to see what's in each release
  5. Unreleased section: Accumulates changes before a release, making release notes easy

Why Semantic Versioning?

  • MAJOR.MINOR.PATCH format communicates impact:
    • MAJOR: Breaking changes
    • MINOR: New features (backward compatible)
    • PATCH: Bug fixes (backward compatible)
  • Industry standard, well understood by developers
  • Enables automated tooling and dependency management
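A tiny illustration of how each component is bumped (a sketch for clarity, not project tooling):

```typescript
type Bump = 'major' | 'minor' | 'patch';

// Increment a MAJOR.MINOR.PATCH version string per SemVer rules.
function bump(version: string, kind: Bump): string {
  const [major, minor, patch] = version.split('.').map(Number);
  switch (kind) {
    case 'major': return `${major + 1}.0.0`;               // breaking change
    case 'minor': return `${major}.${minor + 1}.0`;        // backward-compatible feature
    case 'patch': return `${major}.${minor}.${patch + 1}`; // backward-compatible fix
  }
}
```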

Benefits for AI-Generated Codebase

This project is primarily AI-generated, making structured documentation critical:

  1. Context for AI: Changelog provides history context for future AI sessions
  2. Audit trail: Documents what was added/changed in each phase
  3. Stakeholder communication: Non-technical stakeholders can understand progress
  4. Debugging aid: When issues arise, changelog helps identify when changes were introduced

Implementation

File location: CHANGELOG.md in repository root

Format:

# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [0.7.0] - 2026-01-19
### Added
- Feature description

### Changed
- Change description

### Fixed
- Bug fix description

### Security
- Security fix description

Change categories (use only those that apply):

  • Added: New features
  • Changed: Changes to existing functionality
  • Deprecated: Features marked for removal
  • Removed: Features removed
  • Fixed: Bug fixes
  • Security: Vulnerability fixes

Guidelines

  1. Update with every PR: Add changelog entry as part of the PR
  2. Write for humans: Describe the user impact, not implementation details
  3. Link to issues/PRs: Reference related issues where helpful
  4. Keep Unreleased current: Move entries to versioned section on release
  5. One entry per change: Don't combine unrelated changes

Consequences

Positive:

  • ✅ Clear release history for all stakeholders
  • ✅ Standardized format reduces documentation overhead
  • ✅ Supports both manual reading and automated parsing
  • ✅ Integrates well with CI/CD release workflows
  • ✅ Provides context for AI-assisted development sessions

Negative:

  • ⚠️ Requires discipline to update with each change
  • ⚠️ Can become verbose if too granular

Alternatives Considered

Alternative Reason for Rejection
Git commit history only Too granular; hard to see high-level changes
GitHub Releases only Tied to GitHub; not in repository
Auto-generated from commits Requires strict commit conventions; often too noisy
Wiki-based changelog Separate from code; easy to forget to update

ADR-038: Zensical for Publishing Project Documentation

Attribute Value
ID ADR-038
Status Accepted
Date 2026-01-21
Context Need a maintainable, deployable documentation website built from the repository docs/

Decision

Publish the repository documentation in docs/ as a static website built with Zensical.

The docs site is:

  • Built from docs/ with configuration in zensical.toml
  • Rendered with PlantUML pre-rendering (SVG/PNG) for existing diagrams
  • Packaged as a container image forma3d-connect-docs and published to the existing container registry
  • Deployed to staging behind Traefik at https://staging-connect-docs.forma3d.be
  • Managed by the existing Azure DevOps pipeline using docsAffected detection

Rationale

  • Single source of truth: docs live next to the code they describe (docs/)
  • Static output: simple, fast, cacheable; no backend runtime required
  • Pipeline parity: follows the same build/sign/SBOM/deploy controls as api and web
  • Diagram support: preserves existing PlantUML investment via deterministic CI rendering

Implementation

  • Config: zensical.toml (sets site name, logo, PlantUML markdown extension)
  • Container build: deployment/docs/Dockerfile (builds site + serves via Nginx)
  • Staging service: deployment/staging/docker-compose.yml (docs service + Traefik labels)
  • CI/CD: azure-pipelines.yml
    • Detect changes to docs/** or zensical.toml via docsAffected
    • Build/push/sign/SBOM the forma3d-connect-docs image
    • Deploy conditionally to staging

Consequences

Positive:

  • ✅ Documentation changes can be delivered independently of API/Web
  • ✅ Consistent hosting model (Traefik + container) across services
  • ✅ PlantUML diagrams render in the published docs site

Negative:

  • ⚠️ Docs builds can be slower due to diagram rendering (mitigated by caching)
  • ⚠️ Local preview requires Zensical + Java/Graphviz (documented in developer workflow)

Alternatives Considered

| Alternative | Reason for Rejection |
| --- | --- |
| Host Markdown in repo UI | Not a branded, searchable documentation site |
| MkDocs Material | Zensical provides a modern, batteries-included path with similar ecosystem compatibility |
| Convert all diagrams to Mermaid | High migration effort; risk of losing diagram fidelity |

Document History

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2026-01-10 | AI Assistant | Initial ADR document with 13 decisions |
| 1.1 | 2026-01-10 | AI Assistant | Updated ADR-006 for Digital Ocean hosting, added ADR-014 for SimplyPrint |
| 1.2 | 2026-01-10 | AI Assistant | Added ADR-015 for Aikido Security Platform |
| 1.3 | 2026-01-10 | AI Assistant | Added ADR-016 for Sentry Observability with OpenTelemetry |
| 1.4 | 2026-01-10 | AI Assistant | Marked ADR-016 as implemented, added implementation details |
| 1.5 | 2026-01-10 | AI Assistant | Added ADR-017 for Docker + Traefik Deployment Strategy |
| 1.6 | 2026-01-11 | AI Assistant | Added ADR-018 for Nx Affected Conditional Deployment Strategy |
| 1.7 | 2026-01-13 | AI Assistant | Phase 2 updates: Updated ADR-008 with implemented events, added ADR-019 (SimplyPrint Webhook Verification), ADR-020 (Hybrid Status Monitoring) |
| 1.8 | 2026-01-14 | AI Assistant | Phase 3 updates: Added ADR-021 (Retry Queue), ADR-022 (Event-Driven Fulfillment), ADR-023 (Email Notifications) |
| 1.9 | 2026-01-14 | AI Assistant | Security update: Added ADR-024 (API Key Authentication for Admin Endpoints) |
| 2.0 | 2026-01-14 | AI Assistant | Supply chain security: Added ADR-025 (Cosign Image Signing) |
| 2.1 | 2026-01-14 | AI Assistant | Phase 4 updates: Added ADR-027 (TanStack Query), ADR-028 (Socket.IO Real-Time), ADR-029 (Dashboard Authentication) |
| 2.2 | 2026-01-16 | AI Assistant | SBOM attestations: Added ADR-026 (CycloneDX SBOM Attestations with Syft) |
| 2.3 | 2026-01-16 | AI Assistant | Phase 5 updates: Added ADR-030 (Sendcloud for Shipping Integration) |
| 2.4 | 2026-01-16 | AI Assistant | Registry cleanup: Added ADR-031 (Automated Container Registry Cleanup) |
| 2.5 | 2026-01-17 | AI Assistant | Domain boundary separation: Added ADR-032 (Domain Boundary Separation with Interface Contracts) |
| 2.6 | 2026-01-17 | AI Assistant | Critical tech debt resolution: Added ADR-033 (Database-Backed Webhook Idempotency) |
| 2.7 | 2026-01-19 | AI Assistant | Infrastructure hardening: Added ADR-034 (Docker Log Rotation & Resource Cleanup) |
| 2.8 | 2026-01-19 | AI Assistant | Cross-platform strategy: Added ADR-035 (PWA replaces Tauri/Capacitor native apps) |
| 2.9 | 2026-01-20 | AI Assistant | PWA detection: Added ADR-036 (localStorage Fallback for PWA Install Detection) |
| 3.0 | 2026-01-20 | AI Assistant | Documentation: Added ADR-037 (Keep a Changelog for Release Documentation) |
| 3.1 | 2026-01-21 | AI Assistant | Documentation: Added ADR-038 (Zensical for publishing project documentation from docs/) |

References