
Patterns Evaluation — Forma3D.Connect

Date: 2026-02-20
Scope: Full codebase analysis — apps, libs, infrastructure, deployment
Version: 1.0



1. Executive Summary

Forma3D.Connect is a full-stack Nx monorepo built with React 19, NestJS, Prisma, and PostgreSQL. The system follows a microservice architecture with five backend services, a React PWA frontend, and nine shared libraries. This document evaluates the codebase against four industry-standard pattern catalogs:

| Catalog | Patterns Identified | Coverage |
|---------|---------------------|----------|
| Microservices Patterns | 22 | Strong |
| GoF Design Patterns | 14 | Good |
| 12-Factor App | 12/12 | Excellent |
| Enterprise Application Patterns | 15 | Good |

2. System Overview

uml diagram


3. Microservices Patterns

Reference catalog: microservices.io/patterns

3.1 Architectural Style

Microservice Architecture

Where: The entire system is decomposed into independently deployable services.

Why: The system integrates with three external platforms (Shopify, SimplyPrint, SendCloud) that have fundamentally different concerns. Microservices allow each integration to evolve, scale, and fail independently.

| Service | Domain | Port | Key Responsibility |
|---------|--------|------|--------------------|
| apps/gateway | Infrastructure | 3000 | Authentication, routing, WebSocket proxy |
| apps/order-service | Order Management | 3001 | Orders, orchestration, Shopify |
| apps/print-service | Manufacturing | 3002 | Print jobs, SimplyPrint |
| apps/shipping-service | Logistics | 3003 | Shipments, SendCloud |
| apps/gridflock-service | Product Generation | 3004 | Parametric STL, slicing pipeline |

Evidence:
- Each service has its own Dockerfile, main.ts, NestJS AppModule, and independent startup
- Services communicate via BullMQ event queues and internal HTTP APIs
- ADR-051 documents the decomposition decision


3.2 Service Boundaries & Decomposition

Decompose by Business Capability

Where: Service boundaries align with business capabilities — ordering, printing, shipping, product generation.

Why: Each service maps to a distinct business function with its own lifecycle, external integration, and domain expertise.

uml diagram

Decompose by Subdomain

Where: libs/domain/ contains shared domain entities; libs/domain-contracts/ defines bounded context interfaces.

Why: Domain-Driven Design subdomains are reflected in the service structure — Order (core), Printing (supporting), Shipping (supporting), GridFlock (supporting), Platform (generic).


3.3 Service Collaboration

Database per Service — Partial

Where: All services share a single PostgreSQL database via Prisma, but each service only accesses tables within its bounded context.

Why: At current scale, a shared database simplifies operations. Logical isolation is enforced through:
- Repository patterns that scope queries to domain-specific tables
- Tenant isolation via tenantId on all entities
- Domain contracts that prevent cross-boundary data access

Trade-off: Not a strict database-per-service — this is a pragmatic choice documented for future extraction when scale demands it.

Saga — Orchestration-based

Where: apps/order-service/src/orchestration/orchestration.service.ts

Why: The order fulfillment workflow spans multiple services and cannot use a distributed transaction. The orchestration saga coordinates the flow:

uml diagram

Evidence:
- handleOrderCreated() — initiates the saga
- handlePrintJobCompleted() / handlePrintJobFailed() — progression/compensating steps
- markOrderReadyForFulfillment() — saga completion gate
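
The saga's event-driven progression can be sketched in plain TypeScript. This is a hypothetical, in-memory simplification: the real OrchestrationService reacts to BullMQ events and persists state via Prisma, and the field names below are illustrative.

```typescript
// Minimal sketch of an orchestration-based saga (hypothetical names).
type SagaState = "PROCESSING" | "READY_FOR_FULFILLMENT" | "FAILED";

interface SagaOrder {
  id: string;
  state: SagaState;
  pendingPrintJobs: number;
}

class OrchestrationSaga {
  private orders = new Map<string, SagaOrder>();

  // Saga entry point: an order was created, so track its print jobs.
  handleOrderCreated(orderId: string, printJobCount: number): void {
    this.orders.set(orderId, { id: orderId, state: "PROCESSING", pendingPrintJobs: printJobCount });
  }

  // Progression step: one print job finished; when all are done,
  // the order passes the completion gate.
  handlePrintJobCompleted(orderId: string): void {
    const order = this.orders.get(orderId);
    if (!order || order.state !== "PROCESSING") return;
    order.pendingPrintJobs -= 1;
    if (order.pendingPrintJobs === 0) this.markOrderReadyForFulfillment(order);
  }

  // Compensating step: a print job failed, so the saga aborts the order.
  handlePrintJobFailed(orderId: string): void {
    const order = this.orders.get(orderId);
    if (order) order.state = "FAILED";
  }

  private markOrderReadyForFulfillment(order: SagaOrder): void {
    order.state = "READY_FOR_FULFILLMENT";
    // The real service would publish ORDER_READY_FOR_FULFILLMENT here.
  }

  stateOf(orderId: string): SagaState | undefined {
    return this.orders.get(orderId)?.state;
  }
}
```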

Domain Event

Where:
- apps/order-service/src/events/event-publisher.service.ts — publishes domain events
- apps/order-service/src/events/event-subscriber.service.ts — subscribes to cross-service events
- libs/service-common/src/lib/events/bullmq-event-bus.ts — transport layer

Why: Domain events decouple services. When an order is created, the order service doesn't need to know how printing or shipping will handle it.

Events published:

| Event | Publisher | Subscribers |
|-------|-----------|-------------|
| ORDER_CREATED | Order Service | Print Service, GridFlock Service |
| ORDER_CANCELLED | Order Service | Print Service, Shipping Service |
| ORDER_READY_FOR_FULFILLMENT | Order Service | Shipping Service |
| PRINT_JOB_COMPLETED | Print Service | Order Service |
| PRINT_JOB_FAILED | Print Service | Order Service |
| SHIPMENT_CREATED | Shipping Service | Order Service |
| GRIDFLOCK_MAPPING_READY | GridFlock Service | Order Service |

API Composition

Where: apps/gateway/ aggregates health checks from all downstream services into a single /health endpoint.

Why: External monitoring systems (Uptime Kuma) need a single health endpoint rather than polling each service.


3.4 Transactional Messaging

Transactional Outbox — Partial

Where: apps/order-service/src/events/event-publisher.service.ts bridges local EventEmitter2 events to BullMQ queues. The EventLog table records all published events.

Why: Ensures that domain events are reliably published even if the message broker is temporarily unavailable. The EventLog serves as an audit trail and can be used for event replay.

How it works:
1. Service performs a database operation
2. Local EventEmitter2 fires a domain event
3. EventPublisherService catches the event via the @OnEvent decorator
4. The event is logged to the EventLog table
5. The event is published to a BullMQ queue
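
The five steps above can be sketched as follows. The classes below are in-memory stand-ins (hypothetical) for the EventLog table and the BullMQ queue; the point is only the ordering guarantee, namely that the event is logged before the publish attempt, so a broker outage never loses the record.

```typescript
// Simplified transactional-outbox flow: log first, then publish.
interface DomainEvent { type: string; payload: unknown; }

class EventLog {
  readonly entries: DomainEvent[] = [];
  append(event: DomainEvent): void { this.entries.push(event); }
}

class UnreliableQueue {
  readonly published: DomainEvent[] = [];
  constructor(private available: boolean) {}
  publish(event: DomainEvent): void {
    if (!this.available) throw new Error("broker unavailable");
    this.published.push(event);
  }
}

class EventPublisherService {
  constructor(private log: EventLog, private queue: UnreliableQueue) {}

  // Step 4: append to the log; step 5: publish. A failed publish leaves
  // the logged entry behind, available for later replay.
  onDomainEvent(event: DomainEvent): boolean {
    this.log.append(event);
    try {
      this.queue.publish(event);
      return true;
    } catch {
      return false; // entry stays in the log for replay
    }
  }
}
```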


3.5 External API

API Gateway

Where: apps/gateway/

Why: Provides a single entry point for the frontend, handling cross-cutting concerns before routing to downstream services.

uml diagram

Evidence:
- apps/gateway/src/routing/route-config.ts — path-based routing with longest-prefix match
- apps/gateway/src/proxy/proxy.middleware.ts — uses http-proxy-middleware for HTTP proxying
- User context headers (X-User-Id, X-Tenant-Id, X-User-Email, X-User-Roles, X-User-Permissions) injected into downstream requests
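
Longest-prefix matching can be illustrated with a small sketch. The route table and helper below are hypothetical, not the contents of route-config.ts: among all routes whose prefix matches the request path, the one with the longest prefix wins.

```typescript
// Longest-prefix path routing (illustrative route table).
interface Route { prefix: string; target: string; }

const routes: Route[] = [
  { prefix: "/api/v1/orders", target: "http://order-service:3001" },
  { prefix: "/api/v1/orders/exports", target: "http://order-service:3001/exports" },
  { prefix: "/api/v1/print-jobs", target: "http://print-service:3002" },
];

// Pick the route whose prefix matches the most characters of the path.
function matchRoute(path: string, table: Route[]): Route | undefined {
  return table
    .filter((r) => path === r.prefix || path.startsWith(r.prefix + "/"))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0];
}
```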

Backend for Frontend (BFF) — Implicit

Where: The gateway serves as a de facto BFF for the React PWA, tailoring the API surface to frontend needs (session auth, WebSocket proxy, aggregated health).


3.6 Communication Styles

Remote Procedure Invocation

Where: libs/service-common/src/lib/service-client/base-service-client.ts — typed HTTP clients for synchronous service-to-service calls.

Why: Some operations require synchronous responses (e.g., creating a print job and receiving the job ID).

Clients:
- OrderServiceClient
- PrintServiceClient
- ShippingServiceClient
- GridflockServiceClient
- SlicerClient

All inject x-internal-api-key header for authentication.

Messaging

Where: libs/service-common/src/lib/events/bullmq-event-bus.ts — BullMQ-backed async event bus.

Why: Decouples services for fire-and-forget operations (e.g., notifying shipping that an order is ready). Provides retry semantics, at-least-once delivery (paired with idempotent consumers for effectively-once processing), and dead-letter handling.

Configuration:
- 3 retry attempts with exponential backoff
- removeOnComplete: 1000, removeOnFail: 5000
- Dedicated queues per event type

Idempotent Consumer

Where:
- apps/order-service/src/shopify/webhook-idempotency.repository.ts — ProcessedWebhook table with a unique constraint on webhookId
- apps/print-service/src/print-jobs/print-jobs.service.ts — checks for existing jobs before creation

Why: Webhooks from Shopify, SimplyPrint, and SendCloud may be delivered multiple times. Idempotency prevents duplicate processing.
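
The idempotency check reduces to: record the webhook ID on first sight, skip anything already seen. In the sketch below a Set stands in for the ProcessedWebhook table's unique constraint (hypothetical; the real repository would INSERT and catch the unique-violation error).

```typescript
// Idempotent webhook consumer: first delivery is processed, repeats skipped.
class WebhookProcessor {
  private seen = new Set<string>();

  handle(webhookId: string, process: () => void): "processed" | "duplicate" {
    if (this.seen.has(webhookId)) return "duplicate";
    this.seen.add(webhookId); // in production: INSERT, catch unique-violation
    process();
    return "processed";
  }
}
```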


3.7 Reliability

Circuit Breaker — Partial (Retry Queue)

Where: apps/order-service/src/retry-queue/retry-queue.service.ts and apps/shipping-service/src/retry-queue/

Why: External API calls (Shopify, SimplyPrint, SendCloud) can fail transiently. The retry queue provides resilience with exponential backoff and jitter.

uml diagram

Job types: FULFILLMENT, PRINT_JOB_CREATION, CANCELLATION, NOTIFICATION, SHIPMENT
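
The backoff-with-jitter schedule can be expressed as a small pure function. The base delay and cap below are assumed values for illustration, not taken from the retry-queue code; "full jitter" draws the actual delay uniformly between zero and the exponential bound.

```typescript
// Exponential backoff with full jitter (assumed parameters).
function backoffDelayMs(
  attempt: number,                       // 1-based retry attempt
  baseMs = 1_000,
  capMs = 60_000,
  random: () => number = Math.random,    // injectable for testing
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** (attempt - 1));
  return Math.floor(random() * exponential); // full jitter: 0..exponential
}
```

Injecting the random source keeps the function deterministic under test while production code uses Math.random.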


3.8 Security

Access Token

Where:
- Gateway uses session-based authentication (Redis-backed express-session)
- Internal service-to-service calls use an API key (x-internal-api-key) with timing-safe comparison
- User context is propagated via headers (X-User-Id, X-Tenant-Id, etc.)

Guards:

| Guard | Location | Purpose |
|-------|----------|---------|
| SessionGuard | Gateway | Validates user session |
| PermissionsGuard | Gateway | Checks RBAC permissions |
| InternalAuthGuard | All services | Validates internal API key |
| UserContextMiddleware | All services | Extracts user context from headers |
| ShopifyWebhookGuard | Order Service | HMAC-SHA256 verification |
| SimplyPrintWebhookGuard | Print Service | Token verification |
| SendcloudWebhookGuard | Shipping Service | HMAC-SHA256 verification |
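
The timing-safe API key comparison can be sketched with Node's crypto primitives. This is not the actual InternalAuthGuard (which is a NestJS CanActivate implementation); it only shows the comparison technique: hashing both sides to equal-length buffers means timingSafeEqual never throws on length mismatch, and the comparison time is independent of how many characters match.

```typescript
import { timingSafeEqual, createHash } from "node:crypto";

// Timing-safe check of the x-internal-api-key header against the secret.
function isValidInternalKey(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```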


3.9 Observability

Log Aggregation

Where: libs/observability/src/lib/ — Pino structured JSON logging with sensitive field redaction, request ID correlation.

Distributed Tracing

Where: libs/observability/src/lib/otel.config.ts — OpenTelemetry with OTLP exporter. Correlation IDs (X-Trace-Id, X-Request-Id) propagated through all services via Sentry integration.

Exception Tracking

Where: libs/observability/src/lib/sentry.config.ts — Sentry integration with environment-based sampling (10% prod, 100% non-prod). Only 5xx errors captured. Domain error codes tagged.

Health Check API

Where: Every service exposes /health via @nestjs/terminus. The gateway aggregates all service health into a single endpoint.

Checks: Redis PING, Prisma database connectivity, downstream service reachability.

Audit Logging

Where: apps/order-service/src/audit/audit.service.ts — dedicated AuditLog table recording security events (auth.login.success, auth.login.failure, auth.logout) with actor details, IP address, user agent.


3.10 Deployment

Service Instance per Container

Where: Each service has its own Dockerfile using multi-stage builds (Node 20 Alpine). The deployment/staging/docker-compose.yml orchestrates all containers.

Evidence:
- 5 backend service Dockerfiles + 1 web Dockerfile + 1 slicer Dockerfile
- Docker Compose defines health checks, restart policies, volume mounts
- Traefik provides automatic service discovery via Docker labels

uml diagram


3.11 Cross-Cutting Concerns

Microservice Chassis

Where: libs/service-common/ provides a shared chassis with reusable NestJS modules.

Modules provided:

| Module | Purpose |
|--------|---------|
| BullMqEventBus | Async event publishing/subscribing |
| BaseServiceClient | Typed HTTP clients with API key auth |
| UserContextMiddleware | User identity extraction from headers |
| InternalAuthGuard | Service-to-service authentication |
| ServiceHealthIndicator | Redis/DB health checks |
| CorrelationMiddleware | Request correlation IDs |

Externalized Configuration

Where: .env.example documents all environment variables. Each service validates config at startup via class-validator validation classes (e.g., apps/order-service/src/config/env.validation.ts).

Why: Configuration varies between development, staging, and production environments. Externalization enables the same container image to run in any environment.
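
Fail-fast validation at startup can be sketched as below. The variable names mirror ones documented elsewhere in this report (DATABASE_URL, REDIS_URL), but the function is a plain-TypeScript stand-in, not the class-validator classes the services actually use.

```typescript
// Fail-fast environment validation at startup (sketch).
interface ServiceEnv {
  DATABASE_URL: string;
  REDIS_URL: string;
  PORT: number;
}

function validateEnv(raw: Record<string, string | undefined>): ServiceEnv {
  // Required variables: refuse to boot without them.
  const missing = ["DATABASE_URL", "REDIS_URL"].filter((k) => !raw[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required env vars: ${missing.join(", ")}`);
  }
  // Optional variable with a default, still type-checked.
  const port = Number(raw.PORT ?? 3000);
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${raw.PORT}`);
  }
  return { DATABASE_URL: raw.DATABASE_URL!, REDIS_URL: raw.REDIS_URL!, PORT: port };
}
```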


3.12 Testing

Consumer-Driven Contract Test — Structural

Where: libs/domain-contracts/ defines TypeScript interfaces and DTOs that serve as contracts between services.

Contracts defined: - IOrdersService — Order service contract - IPrintJobsService — Print service contract - IShipmentsService — Shipment service contract - IFulfillmentService — Fulfillment service contract

Why: While not runtime contract tests (e.g., Pact), the shared TypeScript types enforce compile-time compatibility between service consumers and providers.

Service Component Test

Where: apps/acceptance-tests/ — Playwright-BDD acceptance tests running against staging services. Per-service Jest test suites with mock dependencies.


4. Design Patterns (GoF)

Reference catalog: refactoring.guru/design-patterns/catalog

4.1 Creational Patterns

Factory Method

Where:
- libs/testing/src/fixtures/ — factory functions createMockOrder(), createMockPrintJob(), and createMockShipment(), with variant factories (createProcessingOrder(), createCompletedOrder(), createFailedPrintJob())
- libs/domain/src/errors/ — typed error factories: OrderErrors.notFound(), PrintJobErrors.invalidTransition(), IntegrationErrors.apiTimeout()

Why: Centralizes object creation with sensible defaults while allowing overrides for specific test scenarios. Error factories ensure consistent error codes and HTTP statuses.
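
The error-factory half of this pattern can be sketched as follows. The class shape and messages are assumptions; only the factory names and error codes come from elsewhere in this report.

```typescript
// Typed error factories (hypothetical shapes): every error carries a
// stable code and HTTP status, so callers never construct errors ad hoc.
class DomainError extends Error {
  constructor(
    readonly code: string,
    message: string,
    readonly httpStatus: number,
  ) {
    super(message);
  }
}

const OrderErrors = {
  notFound: (id: string) =>
    new DomainError("ORDER_NOT_FOUND", `Order ${id} not found`, 404),
  invalidTransition: (from: string, to: string) =>
    new DomainError("INVALID_STATUS_TRANSITION", `Cannot move order from ${from} to ${to}`, 422),
};
```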

Builder

Where: Query construction in repositories — e.g., OrdersRepository builds Prisma queries with optional filters (status, date range, search term) via conditional where clause composition.

Why: Complex queries with many optional parameters benefit from step-by-step construction rather than a single constructor with many arguments.
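
Conditional where-clause composition can be sketched in a Prisma-like style. The filter and column names below are illustrative, not the actual OrdersRepository fields: each optional filter contributes a clause only when supplied.

```typescript
// Step-by-step where-clause construction from optional filters.
interface OrderFilters {
  status?: string;
  search?: string;
  createdAfter?: Date;
}

function buildOrderWhere(filters: OrderFilters): Record<string, unknown> {
  return {
    ...(filters.status ? { status: filters.status } : {}),
    ...(filters.search
      ? { orderNumber: { contains: filters.search, mode: "insensitive" } }
      : {}),
    ...(filters.createdAfter ? { createdAt: { gte: filters.createdAfter } } : {}),
  };
}
```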

Singleton

Where: NestJS dependency injection provides singleton scope by default:
- PrismaService — single database connection pool per service
- ConfigService — single config instance
- BullMqEventBus — single event bus instance
- QueryClient (frontend) — single TanStack Query client

Why: Resource-expensive objects (database connections, Redis connections, event bus) must be shared across the application.


4.2 Structural Patterns

Facade

Where:
- API Gateway — the entire gateway is a facade over the microservice topology
- apps/web/src/lib/api-client.ts — the frontend API client provides a simplified interface (apiClient.orders.list(), apiClient.printJobs.retry()) hiding HTTP complexity
- apps/order-service/src/orchestration/orchestration.service.ts — facade over the multi-service fulfillment workflow

Why: Complex subsystems are hidden behind simple interfaces. The frontend doesn't need to know which backend service handles a request.

uml diagram

Proxy

Where:
- apps/gateway/src/proxy/proxy.middleware.ts — HTTP reverse proxy that intercepts requests, adds authentication context, and forwards to downstream services
- apps/web/vite.config.ts — development proxy for API and Socket.IO requests

Why: The proxy pattern adds cross-cutting behavior (auth, logging, rate limiting) without modifying the downstream services.

Adapter

Where:
- apps/print-service/src/simplyprint/simplyprint-api.client.ts — adapts the SimplyPrint REST API to the internal domain model
- apps/shipping-service/src/sendcloud/sendcloud.service.ts — adapts the SendCloud API to the internal shipment model
- apps/order-service/src/shopify/ — adapts Shopify webhook payloads to internal Order entities
- libs/service-common/src/lib/events/bullmq-event-bus.ts — adapts the IEventBus interface to the BullMQ implementation

Why: External APIs have different data formats, authentication mechanisms, and error conventions than the internal domain. Adapters translate between the two.

Decorator

Where: NestJS decorators are used extensively:
- @RequirePermissions('orders.read') — adds RBAC checks to endpoints
- @SkipThrottle() / @WebhookThrottle() — modifies rate-limiting behavior
- @OnEvent('ORDER_CREATED') — adds event handling to methods
- Custom parameter decorators for extracting user context and tenant ID

Why: Decorators add behavior to classes and methods without modifying their implementation — a core principle of NestJS.

Composite

Where: NestJS module system — AppModule composes feature modules (OrdersModule, PrintJobsModule, etc.), each of which composes providers, controllers, and sub-modules into a unified tree.

Why: The module tree allows features to be organized hierarchically while still being composable.


4.3 Behavioral Patterns

Observer

Where:
- EventEmitter2 — local event system within each service (e.g., @OnEvent('order.created'))
- BullMQ event bus — cross-service pub/sub
- Socket.IO — real-time frontend notifications (order:created, printjob:completed)
- TanStack Query cache invalidation — Socket.IO events trigger queryClient.invalidateQueries()

Why: Observer decouples event producers from consumers, allowing the orchestration service to react to events from any service without tight coupling.

uml diagram

Strategy

Where:
- Print job matching in apps/print-service/ — multi-step fallback strategy for matching SimplyPrint webhooks to internal jobs (by UID → by numeric ID → by API query → by queue-item ID)
- Shipping method selection in apps/shipping-service/ — maps carrier + country + delivery type to shipping method IDs
- PWA caching strategies in apps/web/src/sw.ts — StaleWhileRevalidate for API data, NetworkOnly for health, Precache for assets

Why: Different algorithms are needed for different contexts (webhook matching, shipping methods, cache strategies). Strategy allows swapping algorithms without changing the client code.
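
The fallback-matching flavor of Strategy can be sketched as an ordered chain of interchangeable matcher functions. The payload shape and matcher internals below are hypothetical; only the "try by UID, then by numeric ID" ordering mirrors the documented behavior.

```typescript
// Strategy chain: try each matcher in priority order until one resolves
// a webhook payload to an internal job id.
interface WebhookPayload { uid?: string; numericId?: number; }
type Matcher = (p: WebhookPayload, jobs: Map<string, WebhookPayload>) => string | undefined;

const byUid: Matcher = (p, jobs) => {
  if (p.uid === undefined) return undefined;
  for (const [id, j] of jobs) if (j.uid === p.uid) return id;
  return undefined;
};

const byNumericId: Matcher = (p, jobs) => {
  if (p.numericId === undefined) return undefined;
  for (const [id, j] of jobs) if (j.numericId === p.numericId) return id;
  return undefined;
};

function matchJob(
  p: WebhookPayload,
  jobs: Map<string, WebhookPayload>,
  strategies: Matcher[] = [byUid, byNumericId],
): string | undefined {
  for (const s of strategies) {
    const hit = s(p, jobs);
    if (hit !== undefined) return hit;
  }
  return undefined;
}
```

Swapping or reordering the strategies array changes the matching policy without touching the caller, which is the point of the pattern.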

Template Method

Where:
- libs/service-common/src/lib/service-client/base-service-client.ts — defines the HTTP request template (URL construction, header injection, error handling) while subclasses define specific endpoints
- libs/domain/src/errors/base.error.ts — the DomainError base class provides the error response template; subclasses (OrderErrors, PrintJobErrors) specialize the error codes

Why: Shared behavior (HTTP call mechanics, error formatting) is defined once in the base class, while subclasses only override the variable parts.

State

Where: State machines for domain entities:
- Order: PENDING → PROCESSING → PARTIALLY_COMPLETED → COMPLETED / FAILED / CANCELLED
- PrintJob: QUEUED → ASSIGNED → PRINTING → COMPLETED / FAILED / CANCELLED
- Shipment: lifecycle managed by SendCloud webhook status updates
- GridflockJob: PENDING → PROCESSING → COMPLETED / FAILED

Why: Each entity has well-defined states with valid transitions. Invalid transitions are rejected with typed domain errors.
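
A transition table makes the state machine explicit. The statuses below come from the Order lifecycle listed above, but the exact set of allowed edges is an assumption for illustration; the real entities encode their own rules.

```typescript
// Order state machine sketch: allowed transitions as a lookup table.
type OrderStatus =
  | "PENDING" | "PROCESSING" | "PARTIALLY_COMPLETED"
  | "COMPLETED" | "FAILED" | "CANCELLED";

const transitions: Record<OrderStatus, OrderStatus[]> = {
  PENDING: ["PROCESSING", "CANCELLED"],
  PROCESSING: ["PARTIALLY_COMPLETED", "COMPLETED", "FAILED", "CANCELLED"],
  PARTIALLY_COMPLETED: ["COMPLETED", "FAILED", "CANCELLED"],
  COMPLETED: [],   // terminal
  FAILED: [],      // terminal
  CANCELLED: [],   // terminal
};

function transition(from: OrderStatus, to: OrderStatus): OrderStatus {
  if (!transitions[from].includes(to)) {
    // The real code throws a typed domain error here.
    throw new Error(`INVALID_STATUS_TRANSITION: ${from} -> ${to}`);
  }
  return to;
}
```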

uml diagram

uml diagram

Mediator

Where: apps/order-service/src/orchestration/orchestration.service.ts — acts as a mediator between Order, Print, Shipping, and GridFlock services, coordinating multi-service workflows without direct service-to-service knowledge.

Why: Without the mediator, each service would need to know about every other service it interacts with, creating an N-to-N coupling problem.

Chain of Responsibility

Where: NestJS middleware and guard pipeline in the gateway:
1. ThrottlerGuard (rate limiting)
2. SessionGuard (authentication)
3. PermissionsGuard (authorization)
4. CorrelationMiddleware (tracing)
5. ProxyMiddleware (routing)

Each handler in the chain can short-circuit the request (e.g., return 401, 403, 429).

Why: Each cross-cutting concern is handled by a focused component, and the chain can be reordered or extended without modifying individual handlers.

Command

Where: BullMQ jobs encapsulate operations as serializable commands:
- generate-baseplate / generate-plate-set — GridFlock generation commands
- FULFILLMENT / CANCELLATION / NOTIFICATION — retry queue commands
- Event payloads contain all data needed to execute the operation

Why: Commands can be queued, retried, and processed asynchronously. Failed commands persist for manual intervention.


5. Twelve-Factor App Compliance

Reference: 12factor.net

uml diagram

Factor-by-Factor Assessment

| # | Factor | Compliance | Evidence |
|---|--------|------------|----------|
| I | Codebase | Full | Single Git repo (Nx monorepo) with multiple deploys (dev, staging, production). Each service is independently deployable from the same codebase. |
| II | Dependencies | Full | pnpm with lockfile (pnpm-lock.yaml). package.json declares all dependencies. Docker images use multi-stage builds with explicit Node 20 Alpine base. No system-level dependencies assumed. |
| III | Config | Full | All configuration via environment variables (.env.example documents 40+ variables). Per-service validation at startup via class-validator. No config in code. Feature flags (SIMPLYPRINT_POLLING_ENABLED, SHIPPING_ENABLED) externalized. |
| IV | Backing Services | Full | PostgreSQL, Redis, Sentry, Shopify API, SimplyPrint API, SendCloud API — all treated as attached resources configured via URL environment variables (DATABASE_URL, REDIS_URL, SENTRY_DSN). |
| V | Build, Release, Run | Full | Azure DevOps pipeline: build (Docker multi-stage) → push (DigitalOcean Container Registry) → deploy (deployment/staging/deploy.sh with image tag versioning). Strict separation of stages. |
| VI | Processes | Full | Services are stateless. Session state stored in Redis (not in-process). File storage in GridFlock uses shared volumes (not local disk). No sticky sessions required. |
| VII | Port Binding | Full | Each service self-binds to a port: Gateway (3000), Order (3001), Print (3002), Shipping (3003), GridFlock (3004), Slicer (3010). Web app served via Nginx on port 80. |
| VIII | Concurrency | Mostly | Scale-out via containers (one process per container). BullMQ workers can scale horizontally. However, current deployment is single-instance per service (no horizontal pod autoscaling yet). |
| IX | Disposability | Full | Fast startup (NestJS boots in seconds). Graceful shutdown via NestJS lifecycle hooks. BullMQ workers drain gracefully. Docker health checks enable zero-downtime deploys. |
| X | Dev/Prod Parity | Full | Docker Compose for local dev mirrors staging topology. Same Dockerfiles, same environment variable structure. PostgreSQL and Redis used in all environments (no SQLite for dev). |
| XI | Logs | Full | Pino structured JSON logging to stdout. No log file management in-process. Dozzle aggregates container logs. Sentry captures exceptions. Logs include request IDs and correlation data. |
| XII | Admin Processes | Full | prisma migrate deploy for database migrations. prisma/seed.ts and prisma/seed-rbac.ts for data seeding. Deploy script (deploy.sh) supports --migrate-only mode. All admin processes run as one-off commands. |

6. Enterprise Application Architecture Patterns

Reference catalog: martinfowler.com/eaaCatalog

6.1 Domain Logic Patterns

Service Layer

Where: Every backend service implements a service layer:
- apps/order-service/src/orders/orders.service.ts
- apps/print-service/src/print-jobs/print-jobs.service.ts
- apps/shipping-service/src/shipments/shipments.service.ts
- apps/gridflock-service/src/gridflock/gridflock.service.ts

Why: The service layer defines the application boundary, coordinates business logic, and orchestrates repository calls and event emission. Controllers are thin (HTTP-only), repositories are data-only.

uml diagram

Domain Model

Where: libs/domain/src/entities/ — domain entities (Order, PrintJob, LineItem, ProductMapping) encapsulate both data and behavior (status transitions, validation).

Why: Rich domain entities prevent business logic from leaking into services or controllers. Status transition rules are co-located with the entity definition.


6.2 Data Source Architectural Patterns

Data Mapper

Where: Prisma ORM acts as a data mapper between TypeScript domain objects and PostgreSQL tables. Each service has repository classes that map between Prisma models and domain entities.

Why: Domain entities remain independent of database schema details. Prisma handles the mapping.

Repository

Where:
- apps/order-service/src/orders/orders.repository.ts
- apps/order-service/src/shopify/webhook-idempotency.repository.ts
- apps/print-service/src/print-jobs/print-jobs.repository.ts
- apps/shipping-service/src/shipments/shipments.repository.ts
- apps/gridflock-service/src/gridflock/gridflock.repository.ts

Why: Repositories provide a collection-like interface for accessing domain objects, encapsulating Prisma queries and tenant isolation logic. Services never call Prisma directly.

Example interface:

interface IOrdersRepository {
  create(data: CreateOrderData): Promise<Order>;
  findById(id: string): Promise<Order | null>;
  findByShopifyOrderId(shopifyOrderId: string): Promise<Order | null>;
  findAll(filters: OrderFilters): Promise<PaginatedResult<Order>>;
  update(id: string, data: UpdateOrderData): Promise<Order>;
}

6.3 Object-Relational Patterns

Identity Field

Where: All Prisma models use id String @id @default(cuid()) as their identity field, maintaining a one-to-one mapping between in-memory objects and database rows.

Foreign Key Mapping

Where: Prisma relations map associations to foreign keys:
- Order → LineItem (one-to-many via orderId)
- ProductMapping → AssemblyPart (one-to-many via productMappingId)
- Order → Shipment (one-to-one via orderId)
- Tenant → all entities (via tenantId)

Single Table Inheritance — Via Enums

Where: RetryQueue table uses RetryJobType enum (FULFILLMENT, PRINT_JOB_CREATION, CANCELLATION, NOTIFICATION, SHIPMENT) to distinguish job types in a single table, with a payload JSON column for type-specific data.

Why: Retry jobs share the same lifecycle (pending → processing → completed/failed) but carry different payloads. A single table simplifies the retry processor.

Embedded Value

Where: Prisma JSON columns store complex value objects:
- Order.shippingAddress — JSON field storing address components
- ProductMapping.metadata — JSON field for arbitrary product metadata
- RetryQueue.payload — JSON field for operation-specific data
- GridflockJob.parameters — JSON field for generation parameters

Why: Avoids excessive table decomposition for value objects that are always read/written as a unit.


6.4 Web Presentation Patterns

Model View Controller

Where:
- Backend: NestJS controllers (View), services (Controller logic), domain entities (Model)
- Frontend: React components (View), custom hooks (Controller logic), TanStack Query cache (Model)

Why: Separation of concerns between data, business logic, and presentation.

Front Controller

Where: The API Gateway acts as a front controller, handling all incoming requests and routing them to the appropriate service. Traefik provides an additional front controller layer for TLS termination and routing.

Application Controller

Where: React Router in apps/web/src/router.tsx — centralized route definitions with lazy loading, route guards (ProtectedRoute, PermissionGatedRoute), and layout composition.


6.5 Distribution Patterns

Remote Facade

Where: Each NestJS controller exposes a coarse-grained REST API that aggregates multiple fine-grained domain operations:
- POST /api/v1/orders creates an order with line items, triggers orchestration, and returns a complete response
- POST /api/v1/mappings creates a mapping with assembly parts in a single call

Why: Reduces network round-trips between frontend and backend by batching operations into coarse-grained API calls.

Data Transfer Object

Where:
- libs/domain-contracts/src/api/ — API request/response DTOs
- libs/domain-contracts/src/lib/types.ts — shared DTO types
- NestJS ValidationPipe + class-validator decorators for runtime validation
- Swagger @ApiProperty decorators for documentation

DTOs defined:
- CreateOrderDto, OrderResponseDto, OrderQueryDto
- CreateProductMappingDto, ProductMappingResponseDto
- InternalCreatePrintJobDto, InternalCreateShipmentDto
- BullMQ event payloads: OrderCreatedEvent, PrintJobCompletedEvent

Why: DTOs decouple the internal domain model from the external API contract, allowing each to evolve independently.


6.6 Base Patterns

Gateway

Where:
- libs/service-common/src/lib/service-client/base-service-client.ts — gateway to downstream microservices
- apps/print-service/src/simplyprint/simplyprint-api.client.ts — gateway to the SimplyPrint API
- apps/shipping-service/src/sendcloud/sendcloud.service.ts — gateway to the SendCloud API
- apps/order-service/src/shopify/ — gateway to the Shopify API

Why: Encapsulates access to external systems behind a well-defined interface, isolating the rest of the codebase from third-party API details.

Separated Interface

Where: libs/domain-contracts/src/ defines interfaces (IOrdersService, IPrintJobsService, IShipmentsService) in a separate library from their implementations. NestJS injection tokens (ORDERS_SERVICE, PRINT_JOBS_SERVICE) enable runtime binding.

Why: Services depend on interfaces, not implementations. This supports testability (mock implementations) and future architectural changes.

uml diagram

Value Object

Where:
- libs/domain/src/schemas/shipping-address.schema.ts — shipping address as a value object (validated via Zod)
- libs/domain/src/schemas/print-profile.schema.ts — print profile parameters
- libs/domain/src/schemas/event-metadata.schema.ts — event metadata

Why: Value objects are immutable, compared by value (not identity), and validated on construction. They prevent invalid data from entering the domain.
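
Those three properties (immutable, compared by value, validated on construction) can be shown in a small sketch. The field names are assumed for illustration, and plain checks stand in for the real Zod schema.

```typescript
// Shipping-address value object: validated on construction, frozen,
// compared by value rather than identity.
class ShippingAddress {
  constructor(
    readonly street: string,
    readonly city: string,
    readonly countryCode: string,
  ) {
    if (!street || !city) throw new Error("street and city are required");
    if (!/^[A-Z]{2}$/.test(countryCode)) {
      throw new Error("countryCode must be ISO 3166-1 alpha-2");
    }
    Object.freeze(this); // immutability after validation
  }

  equals(other: ShippingAddress): boolean {
    return (
      this.street === other.street &&
      this.city === other.city &&
      this.countryCode === other.countryCode
    );
  }
}
```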

Special Case (Null Object)

Where: libs/domain/src/errors/ — typed domain errors act as special cases for failure scenarios:
- OrderErrors.notFound(id) — returns a 404 with error code ORDER_NOT_FOUND
- PrintJobErrors.invalidTransition(from, to) — returns a 422 with error code INVALID_STATUS_TRANSITION
- CommonErrors.unauthorized() — returns a 401 with error code UNAUTHORIZED

Why: Instead of throwing generic exceptions or returning null, typed errors carry context (code, message, HTTP status) that enable precise error handling upstream.

Registry

Where: NestJS IoC container acts as a service registry. Injection tokens (ORDERS_SERVICE, PRINT_JOBS_SERVICE) provide well-known lookup keys for service implementations.

Why: Services are located through the registry rather than direct instantiation, enabling dependency injection and testability.

Layer Supertype

Where:
- libs/domain/src/errors/base.error.ts — DomainError is the layer supertype for all domain errors
- libs/service-common/src/lib/service-client/base-service-client.ts — BaseServiceClient is the layer supertype for all HTTP service clients

Why: Common behavior (error formatting, HTTP request mechanics) is defined once in the supertype and inherited by all subtypes in that layer.


7. Additional Architectural Principles

7.1 Domain-Driven Design

uml diagram

DDD Concepts Applied:

| Concept | Implementation | Location |
|---------|----------------|----------|
| Bounded Context | Each service owns its domain logic | apps/*-service/ |
| Aggregate | Order → LineItem hierarchy | libs/domain/src/entities/ |
| Entity | Order, PrintJob, Shipment (identity-based) | libs/domain/src/entities/ |
| Value Object | ShippingAddress, PrintProfile (value-based) | libs/domain/src/schemas/ |
| Domain Event | ORDER_CREATED, PRINT_JOB_COMPLETED | libs/service-common/src/lib/events/ |
| Domain Service | OrchestrationService, FulfillmentService | apps/order-service/src/orchestration/ |
| Anti-Corruption Layer | Shopify/SimplyPrint/SendCloud adapters | apps/*-service/src/*/ |
| Shared Kernel | libs/domain, libs/domain-contracts | libs/ |
| Ubiquitous Language | Consistent naming across services | All code |

7.2 Event-Driven Architecture

uml diagram

Characteristics:
- Event sourcing-like audit trail — the EventLog table records all business events with metadata, enabling reconstruction of system state
- Eventual consistency — services converge via async BullMQ events
- At-least-once processing — BullMQ retries failed jobs, so consumers are written to be idempotent (see Idempotent Consumer above)
- Event replay potential — the EventLog can be used to replay events for debugging or recovery


7.3 Progressive Web App Architecture

Where: apps/web/

| PWA Feature | Implementation | Location |
|-------------|----------------|----------|
| Service Worker | Workbox with precaching + runtime caching | apps/web/src/sw.ts |
| Offline Support | IndexedDB for pending actions, online sync | apps/web/src/lib/indexed-db.ts |
| Push Notifications | Web Push API with VAPID keys | apps/web/src/hooks/use-push-notifications.ts |
| Install Prompt | beforeinstallprompt event handling | apps/web/src/pwa/install-prompt.tsx |
| Background Sync | Pending action queue with online status detection | apps/web/src/hooks/use-online-status.ts |
| Pull to Refresh | Touch gesture handling for mobile | apps/web/src/pwa/pull-to-refresh.tsx |
| Update Flow | Service worker update detection and prompt | apps/web/src/pwa/sw-update-prompt.tsx |

7.4 Monorepo Architecture

uml diagram

Nx Features Used:
- Module boundaries — ESLint rules enforce import constraints
- Affected commands — only rebuild/test changed projects
- Caching — build and test outputs are cached
- Task orchestration — depends on: ^build ensures correct build order
- Code generation — Nx generators for new libraries and components


8. Pattern Coverage Matrix

Microservices Patterns

Pattern Status Notes
Microservice Architecture Implemented 5 services + gateway
Decompose by Business Capability Implemented Order, Print, Shipping, GridFlock, Platform
Decompose by Subdomain Implemented Core, supporting, generic subdomains
API Gateway Implemented NestJS gateway with proxy, auth, rate limiting
BFF Implicit Gateway tailored for PWA
Database per Service Partial Shared DB with logical isolation
Saga (Orchestration) Implemented Order fulfillment orchestrator
Domain Event Implemented 11 event types via BullMQ
API Composition Implemented Health aggregation
Transactional Outbox Partial EventLog + EventPublisher
Remote Procedure Invocation (RPI) Implemented Typed HTTP service clients
Messaging Implemented BullMQ event bus
Idempotent Consumer Implemented Webhook deduplication
Circuit Breaker Partial Retry queue with backoff
Access Token Implemented Session + API key + user context headers
Log Aggregation Implemented Pino JSON → Dozzle
Distributed Tracing Implemented OpenTelemetry + Sentry
Exception Tracking Implemented Sentry
Health Check API Implemented @nestjs/terminus
Audit Logging Implemented AuditLog table
Service Instance per Container Implemented Docker per service
Microservice Chassis Implemented libs/service-common
Externalized Configuration Implemented .env + validation

GoF Design Patterns

Pattern Status Primary Location
Factory Method Implemented Test fixtures, error factories
Builder Implemented Repository query construction
Singleton Implemented NestJS DI, QueryClient
Facade Implemented Gateway, API client, orchestration
Proxy Implemented Gateway proxy middleware
Adapter Implemented External API clients
Decorator Implemented NestJS decorators
Composite Implemented NestJS module tree
Observer Implemented EventEmitter2, BullMQ, Socket.IO
Strategy Implemented Job matching, caching strategies
Template Method Implemented BaseServiceClient, DomainError
State Implemented Order/PrintJob state machines
Mediator Implemented OrchestrationService
Chain of Responsibility Implemented NestJS middleware/guard pipeline
Command Implemented BullMQ jobs

Enterprise Application Patterns

Pattern Status Primary Location
Service Layer Implemented All service classes
Domain Model Implemented libs/domain entities
Data Mapper Implemented Prisma ORM
Repository Implemented All repository classes
Identity Field Implemented CUID primary keys
Foreign Key Mapping Implemented Prisma relations
Embedded Value Implemented JSON columns
Single Table Inheritance Implemented RetryQueue with enums
MVC Implemented NestJS + React
Front Controller Implemented API Gateway
Application Controller Implemented React Router
Remote Facade Implemented Coarse-grained REST APIs
Data Transfer Object Implemented DTOs in domain-contracts
Gateway Implemented External API clients
Separated Interface Implemented domain-contracts lib
Value Object Implemented Zod schemas
Special Case Implemented Typed domain errors
Registry Implemented NestJS IoC container
Layer Supertype Implemented DomainError, BaseServiceClient

9. Recommendations

9.1 Patterns to Strengthen

Database per Service

Current state All services share a single PostgreSQL instance via Prisma. Repositories scope queries to their bounded context tables, and tenant isolation is enforced via tenantId.
Recommendation Extract to separate databases when horizontal scaling or independent deployment cadences are needed.

How to implement:

  1. Phase 1 — Schema ownership boundaries. Each service already accesses a scoped set of tables. Formalize this by creating per-service Prisma schemas (prisma/order.prisma, prisma/print.prisma, etc.) that only expose the tables owned by that service. Nx can generate the clients independently.
  2. Phase 2 — Separate connection strings. Introduce per-service DATABASE_URL environment variables (ORDER_DATABASE_URL, PRINT_DATABASE_URL). Initially they can all point to the same PostgreSQL instance — this validates that services do not join across boundaries.
  3. Phase 3 — Physical separation. Create separate DigitalOcean Managed PostgreSQL databases. Migrate data using pg_dump/pg_restore per schema. Update Docker Compose and deployment scripts to provision separate connection secrets.
  4. Cross-boundary reads that currently exist (e.g., Shipping Service reading order data) must be replaced with synchronous HTTP calls via OrderServiceClient or denormalized into the local database via domain events.
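Phase 2 can be sketched as a Docker Compose fragment; service names, credentials, and variable names below are illustrative, and both URLs deliberately point at the same PostgreSQL instance at first:

```yaml
# Phase 2 sketch: per-service connection strings that initially target
# the same instance, proving services don't join across boundaries.
services:
  order-service:
    environment:
      - ORDER_DATABASE_URL=postgresql://forma:${DB_PASSWORD}@postgres:5432/forma
  print-service:
    environment:
      - PRINT_DATABASE_URL=postgresql://forma:${DB_PASSWORD}@postgres:5432/forma
```

Once every service runs cleanly against its own URL, Phase 3 is a matter of pointing each variable at a different managed database.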

uml diagram


Circuit Breaker

Current state External API resilience relies on the RetryQueueService (exponential backoff with jitter, max 5 retries). API clients for Shopify, SimplyPrint, and SendCloud have no circuit breaker — a prolonged outage will keep queuing retries and consuming resources.
Recommendation Add a circuit breaker in front of each external API client using the opossum library.

How to implement:

  1. Install opossumpnpm add opossum && pnpm add -D @types/opossum.
  2. Create a shared circuit breaker factory in libs/service-common/src/lib/circuit-breaker/:
import CircuitBreaker from 'opossum';
import { Logger } from '@nestjs/common';

// Hypothetical shared logger instance; swap in the chassis's Pino logger if preferred.
const logger = new Logger('CircuitBreaker');

export interface CircuitBreakerOptions {
  timeout: number;
  errorThresholdPercentage: number;
  resetTimeout: number;
  volumeThreshold: number;
}

export function createCircuitBreaker<T>(
  action: (...args: unknown[]) => Promise<T>,
  options: CircuitBreakerOptions
): CircuitBreaker<unknown[], T> {
  const breaker = new CircuitBreaker(action, {
    timeout: options.timeout,
    errorThresholdPercentage: options.errorThresholdPercentage,
    resetTimeout: options.resetTimeout,
    volumeThreshold: options.volumeThreshold,
  });

  breaker.on('open', () => logger.warn('Circuit breaker opened'));
  breaker.on('halfOpen', () => logger.info('Circuit breaker half-open'));
  breaker.on('close', () => logger.info('Circuit breaker closed'));

  return breaker;
}
  3. Wrap external API clients. In simplyprint-api.client.ts, sendcloud-api.client.ts, and shopify-api.client.ts, wrap the core HTTP methods with the circuit breaker:
this.breaker = createCircuitBreaker(
  (url, config) => this.axios.get(url, config),
  {
    timeout: 30_000,
    errorThresholdPercentage: 50,
    resetTimeout: 60_000,
    volumeThreshold: 5,
  }
);
  4. Emit circuit breaker state changes to the EventLog and Sentry so operators are notified when an integration is degraded.
  5. Expose breaker state in health checks (/health) — report DEGRADED when a circuit is open.
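To make the open/half-open/closed lifecycle concrete without pulling in opossum, here is a dependency-free sketch of the state machine the library implements (simplified; class and threshold names are illustrative, not opossum's API):

```typescript
// Simplified circuit breaker: opens after `maxFailures` consecutive
// failures, half-opens after `resetMs`, closes again on success.
type State = 'closed' | 'open' | 'halfOpen';

class SimpleBreaker {
  private state: State = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures: number, private resetMs: number, private now = () => Date.now()) {}

  getState(): State {
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetMs) {
      this.state = 'halfOpen'; // allow one trial call through
    }
    return this.state;
  }

  async fire<T>(action: () => Promise<T>): Promise<T> {
    if (this.getState() === 'open') throw new Error('circuit open: fast fail');
    try {
      const result = await action();
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (err) {
      if (++this.failures >= this.maxFailures || this.state === 'halfOpen') {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Two failures trip the breaker; the third call fails fast without
// ever reaching the (hypothetical) upstream API.
async function demo(): Promise<string> {
  const breaker = new SimpleBreaker(2, 60_000);
  const failing = () => Promise.reject(new Error('503 from upstream'));
  await breaker.fire(failing).catch(() => undefined);
  await breaker.fire(failing).catch(() => undefined);
  return breaker.fire(failing).catch((e: Error) => e.message);
}
demo().then((msg) => console.log(msg)); // circuit open: fast fail
```

The fast-fail on the third call is exactly what the retry queue alone cannot provide: it stops issuing doomed requests instead of merely spacing them out.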

Transactional Outbox

Current state EventPublisherService listens to local EventEmitter2 events and publishes to BullMQ. The EventLog table records events for audit purposes. However, the database write and the BullMQ publish are not atomic — if the service crashes between the DB commit and the BullMQ publish, the event is lost.
Recommendation Formalize a transactional outbox table with a polling publisher or change-data-capture relay.

How to implement:

  1. Add an OutboxEvent table to the Prisma schema:
model OutboxEvent {
  id          String   @id @default(uuid())
  eventType   String
  payload     Json
  published   Boolean  @default(false)
  createdAt   DateTime @default(now())
  publishedAt DateTime?

  @@index([published, createdAt])
}
  2. Write events in the same transaction as the domain operation. In service methods, use prisma.$transaction():
await this.prisma.$transaction(async (tx) => {
  const order = await tx.order.update({ ... });
  await tx.outboxEvent.create({
    data: {
      eventType: 'ORDER_CREATED',
      payload: orderCreatedPayload,
    },
  });
  return order;
});
  3. Create an OutboxPublisher service in libs/service-common/ that polls the outbox table on a short interval (e.g., every 2 seconds via @Cron), publishes unpublished events to BullMQ, and marks them as published. This replaces the current EventEmitter2 → BullMQ bridge.

  4. Cleanup — add a cron job to delete published outbox events older than 7 days.

  5. Future enhancement — replace polling with Postgres logical replication / change-data-capture (e.g., Debezium) for lower latency, aligning with the Transaction Log Tailing pattern.
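The polling publisher's core loop can be sketched without Prisma or BullMQ; the in-memory array below stands in for the OutboxEvent table and the callback for queue.add (all names hypothetical):

```typescript
// In-memory stand-ins for the OutboxEvent table and the BullMQ publish.
type OutboxRow = { id: string; eventType: string; payload: unknown; published: boolean };

class OutboxPublisher {
  constructor(
    private readonly table: OutboxRow[],
    private readonly publish: (row: OutboxRow) => void
  ) {}

  // One polling tick: publish every unpublished row, then mark it published.
  // Ordering matters: mark *after* a successful publish, so a crash between
  // the two steps re-delivers (at-least-once) instead of dropping the event.
  tick(): number {
    const pending = this.table.filter((r) => !r.published);
    for (const row of pending) {
      this.publish(row);
      row.published = true;
    }
    return pending.length;
  }
}

const table: OutboxRow[] = [
  { id: '1', eventType: 'ORDER_CREATED', payload: {}, published: false },
  { id: '2', eventType: 'ORDER_CREATED', payload: {}, published: true },
];
const sent: string[] = [];
const publisher = new OutboxPublisher(table, (r) => sent.push(r.id));
console.log(publisher.tick()); // 1 — only the unpublished row is sent
console.log(publisher.tick()); // 0 — the second tick finds nothing pending
```

Note the mark-after-publish ordering makes delivery at-least-once, which is why the Idempotent Consumer pattern on the receiving side remains necessary.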


CQRS

Current state Read and write operations share the same service layer and repository. DTOs implicitly separate the read model (response DTOs with computed fields) from the write model (create/update DTOs). The AnalyticsService already uses raw Prisma queries optimized for read aggregations separate from the CRUD operations.
Recommendation Evaluate explicit CQRS when read/write performance requirements diverge — particularly for the dashboard, analytics, and order list views.

How to implement:

  1. Start with the analytics domain which already has separate query paths. Extract AnalyticsRepository queries into a dedicated read model that can use materialized views or database replicas.
  2. Introduce read-only repositories — create OrderReadRepository alongside OrdersRepository. The read repository uses optimized queries (select only needed columns, use joins for list views, leverage PostgreSQL indexes) while the write repository focuses on aggregate consistency.
  3. Route read traffic to a replica. Add a DATABASE_REPLICA_URL environment variable. Inject a second PrismaService instance configured with the replica URL. Read repositories use the replica; write repositories use the primary.
  4. For the frontend, TanStack Query already separates reads (useQuery) from writes (useMutation). The staleTime: 30s configuration means the UI tolerates eventual consistency — CQRS won't require frontend changes.
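The read/write split in step 2 can be expressed as two narrow interfaces over the same data (a sketch; OrderReadRepository is the hypothetical name from the step above, and the field names are illustrative):

```typescript
// Write side: full aggregate, consistency-focused.
interface Order { id: string; status: string; lineItems: string[] }
interface OrdersRepository {
  save(order: Order): void;
}

// Read side: a projection of only the columns the list view needs.
interface OrderListRow { id: string; status: string; itemCount: number }
interface OrderReadRepository {
  list(): OrderListRow[];
}

// In-memory demo of both sides over one store (a replica in production).
const store = new Map<string, Order>();
const writes: OrdersRepository = { save: (o) => void store.set(o.id, o) };
const reads: OrderReadRepository = {
  list: () =>
    [...store.values()].map((o) => ({ id: o.id, status: o.status, itemCount: o.lineItems.length })),
};

writes.save({ id: 'ord-1', status: 'QUEUED', lineItems: ['sku-a', 'sku-b'] });
console.log(reads.list()); // [ { id: 'ord-1', status: 'QUEUED', itemCount: 2 } ]
```

Because the read side returns a distinct row type, swapping its implementation to a replica or materialized view later requires no changes to callers.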

uml diagram


Consumer-Driven Contracts

Current state libs/domain-contracts/ defines TypeScript interfaces (IOrdersService, IPrintJobsService, etc.) and DTOs that provide compile-time contract enforcement. Services implement these interfaces and bind them via NestJS injection tokens (ORDERS_SERVICE, PRINT_JOBS_SERVICE).
Recommendation Add runtime contract testing with Pact for deployment confidence, especially as services are deployed independently.

How to implement:

  1. Install Pactpnpm add -D @pact-foundation/pact.
  2. Consumer-side tests. In services that call other services (e.g., Order Service calling Print Service via PrintServiceClient), write Pact consumer tests that define the expected API interactions:
import { PactV4 } from '@pact-foundation/pact';

describe('PrintServiceClient Pact', () => {
  const provider = new PactV4({
    consumer: 'order-service',
    provider: 'print-service',
  });

  it('creates a print job', async () => {
    await provider
      .addInteraction()
      .given('a valid product mapping exists')
      .uponReceiving('a request to create a print job')
      .withRequest('POST', '/api/v1/internal/print-jobs', ...)
      .willRespondWith(201, ...)
      .executeTest(async (mockServer) => {
        const client = new PrintServiceClient(mockServer.url);
        const result = await client.createPrintJob(dto);
        expect(result.status).toBe('QUEUED');
      });
  });
});
  3. Provider-side verification. In Print Service, verify that the real API satisfies the pact generated by the consumer test. This can run in the CI pipeline after both services are built.
  4. Pact Broker. Deploy a Pact Broker (or use PactFlow) to store and share pacts between consumer and provider pipelines. Add a can-i-deploy check to the deployment script to prevent incompatible deployments.
  5. Scope. Start with the internal HTTP APIs (/api/v1/internal/*) since those are the service-to-service contracts. Webhook contracts (Shopify, SimplyPrint, SendCloud) are owned by third parties and should use consumer-side contract tests instead.

9.2 Patterns to Consider Adopting

Event Sourcing

Current state EventLog records business events with type, severity, metadata, and timestamps. Orders and print jobs are stored as current-state snapshots in their respective tables.
Rationale Full event sourcing would enable state reconstruction, temporal queries ("what was the order state at 3pm?"), and simplified debugging of production issues.

How to implement:

  1. Start with the Order aggregate — the most complex entity with the richest state machine. Create an OrderEvent table:
model OrderEvent {
  id          String   @id @default(uuid())
  orderId     String
  eventType   String
  payload     Json
  version     Int
  createdAt   DateTime @default(now())
  createdBy   String?

  @@unique([orderId, version])
  @@index([orderId])
}
  2. Append-only writes. Modify OrdersService to append events rather than mutate state. Each state transition produces an event (OrderCreated, PrintJobAssigned, StatusChanged, OrderFulfilled).
  3. Projection for current state. Build a projection that replays events to materialize the current Order row. This can be a synchronous fold on read or an async projection updated by event handlers — the existing EventEmitter2 infrastructure supports this.
  4. Keep the existing snapshot table as a read-optimized projection. The orders table becomes a materialized view updated by event handlers. This avoids rewriting all read queries.
  5. Temporal queries. With the event log, add an endpoint GET /api/v1/orders/:id/history that replays events up to a given timestamp to reconstruct past state.
  6. Scope carefully. Event sourcing adds complexity. Only apply it to aggregates that genuinely need audit trails and temporal queries (Order, PrintJob). Simpler entities (ProductMapping, Printer) should remain CRUD.
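The fold-on-read projection and the temporal query from the steps above are both a reduce over the event stream; a minimal sketch (event and field names hypothetical):

```typescript
// Replaying an order's events to materialize current state, optionally
// stopping at a timestamp to reconstruct past state ("what was the
// order state at 3pm?").
type OrderEvent =
  | { type: 'OrderCreated'; at: number; orderId: string }
  | { type: 'StatusChanged'; at: number; status: string };

interface OrderState { orderId: string; status: string }

function project(events: OrderEvent[], upTo = Infinity): OrderState | null {
  return events
    .filter((e) => e.at <= upTo)
    .reduce<OrderState | null>((state, e) => {
      switch (e.type) {
        case 'OrderCreated':
          return { orderId: e.orderId, status: 'CREATED' };
        case 'StatusChanged':
          return state && { ...state, status: e.status };
      }
    }, null);
}

const events: OrderEvent[] = [
  { type: 'OrderCreated', at: 100, orderId: 'ord-1' },
  { type: 'StatusChanged', at: 200, status: 'PRINTING' },
  { type: 'StatusChanged', at: 300, status: 'FULFILLED' },
];
console.log(project(events)?.status); // FULFILLED
console.log(project(events, 250)?.status); // PRINTING — the state "as of" t=250
```

The @@unique([orderId, version]) constraint in the schema above is what guarantees this replay is deterministic: concurrent writers cannot append two events at the same version.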

Service Mesh

Current state Services communicate via direct HTTP calls (BaseServiceClient → http://order-service:3001) and BullMQ events. Inter-service authentication uses a shared INTERNAL_API_KEY header with timing-safe comparison. No mTLS, no service-level retries beyond the retry queue, no traffic shaping.
Rationale As the service count grows, a mesh offloads cross-cutting communication concerns (mTLS, retries, circuit breaking, observability) from application code into infrastructure.

How to implement:

  1. Prerequisite — Kubernetes migration. Service meshes (Istio, Linkerd) require a container orchestrator. The current Docker Compose deployment on a single DigitalOcean droplet would need to migrate to a managed Kubernetes cluster (DigitalOcean Kubernetes / DOKS).
  2. Start with Linkerd (lighter than Istio). Install Linkerd on the cluster and inject sidecar proxies into each service deployment. Linkerd automatically adds mTLS between services, removing the need for INTERNAL_API_KEY validation.
  3. Traffic policies. Define retry budgets and timeouts in Linkerd ServiceProfile resources instead of application-level retry logic. This lets the mesh handle transient failures for synchronous HTTP calls, simplifying BaseServiceClient.
  4. Observability. Linkerd's built-in metrics (request rate, success rate, latency percentiles) complement the existing Sentry/OpenTelemetry setup. Grafana dashboards can consume Linkerd metrics via Prometheus.
  5. Incremental adoption. Inject sidecars one service at a time. The gateway can be the last to migrate since it already handles TLS termination via Traefik.
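For reference, the retry budgets and timeouts in step 3 live in a ServiceProfile resource; a hedged sketch of what one might look like for the Order Service (route names and budget values are illustrative, not a tested manifest):

```yaml
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: order-service.default.svc.cluster.local
spec:
  routes:
    - name: GET /api/v1/orders
      condition:
        method: GET
        pathRegex: /api/v1/orders
      isRetryable: true # only idempotent reads should be mesh-retried
  retryBudget:
    retryRatio: 0.2 # retries may add at most 20% extra load
    minRetriesPerSecond: 10
    ttl: 10s
```

A retry budget caps retry amplification fleet-wide, which is the main advantage over per-client retry counters in application code.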

When to adopt: This is a significant infrastructure investment. Defer until the system needs horizontal scaling, independent deployment of services, or zero-trust networking between services.


Strangler Fig

Current state The Order Service currently contains 30 modules spanning order management, Shopify integration, orchestration, fulfillment, cancellation, analytics, notifications, retry queue, user management, audit logging, and multi-tenancy. It is the largest and most complex service.
Rationale Extracting cohesive subsets into independent services would reduce the blast radius of changes and allow independent scaling.

How to implement:

  1. Identify extraction candidates. The Order Service modules group into natural bounded contexts:

uml diagram

  2. Phase 1 — Extract the Integration Hub. The Shopify webhook handler, OAuth flow, and integration connection modules (SimplyPrintConnection, SendcloudConnection) are self-contained. Create an integration-service that owns Shopify webhook ingestion and publishes ORDER_CREATED events. The gateway routes /api/v1/shopify/* to the new service.
  3. Phase 2 — Extract Platform Services. Auth, Users, Tenancy, and Audit have no order-domain logic. Extract into a platform-service that the gateway delegates authentication to. Other services call it for user/tenant resolution instead of reading the shared database.
  4. Strangler routing. Update apps/gateway/src/routing/route-config.ts to route extracted paths to the new services. The gateway's path-based routing table makes this a configuration change:
{ path: '/api/v1/shopify', target: 'http://integration-service:3005' },
{ path: '/api/v1/admin/users', target: 'http://platform-service:3006' },
  5. Data migration. Move owned tables (e.g., ShopifyShop, User, Role, Permission, AuditLog) to the new service's database using the Database per Service migration path described above.
  6. Keep interfaces stable. The existing IOrdersService and domain-contracts interfaces remain unchanged. The Order Service still publishes and consumes the same BullMQ events — only the source of ORDER_CREATED shifts to the Integration Service.

Sidecar

Current state Observability (Sentry, OpenTelemetry, Pino logging) and security (API key validation, webhook HMAC verification) are implemented as NestJS modules within each service via libs/service-common and libs/observability. Each service container bundles all concerns.
Rationale Sidecar containers could offload log collection, metrics export, and TLS termination from application code, simplifying the service chassis.

How to implement (Docker Compose — no Kubernetes required):

  1. Log collection sidecar. Replace direct Pino stdout → Dozzle with a Fluent Bit sidecar per service that tails container logs, enriches them with service metadata, and forwards to the OpenTelemetry Collector (or ClickHouse directly). In docker-compose.yml:
order-service-logs:
  image: fluent/fluent-bit:latest
  volumes:
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
    # Hypothetical tail/forward config; the fluent/fluent-bit image reads this path by default
    - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf:ro
  depends_on:
    - order-service
  environment:
    - SERVICE_NAME=order-service
  2. Metrics sidecar. Add a Prometheus exporter sidecar that scrapes the NestJS /metrics endpoint (exposed via prom-client) and makes metrics available for Grafana. This separates metrics collection from the application runtime.
  3. TLS sidecar (for service-to-service). Instead of implementing mTLS in application code, deploy an Envoy proxy sidecar alongside each service. Envoy handles TLS termination for incoming requests and TLS origination for outgoing calls. Services communicate over plaintext localhost; Envoy handles encryption on the wire. This is effectively a lightweight service mesh without Kubernetes.
  4. Incremental adoption. Start with the log collection sidecar (lowest risk, highest value). Metrics and TLS sidecars can follow as the operational complexity justifies them.
  5. Keep libs/service-common as fallback. The sidecar approach complements the chassis — services retain in-process error handling and domain logging while sidecars handle infrastructure-level concerns.

9.3 Prioritization & Implementation Order

Dependency Map

The nine recommendations are not independent — several have prerequisites or unlock others. The diagram below shows the dependency graph.

uml diagram

Dependency Analysis

Recommendation Depends On Unlocks
Circuit Breaker Nothing Service Mesh (supersedes it)
Transactional Outbox Nothing Database per Service, Event Sourcing
Consumer-Driven Contracts Nothing Database per Service, Strangler Fig
Sidecar (Log Collection) Nothing Service Mesh (stepping stone)
Database per Service Transactional Outbox, Consumer-Driven Contracts Strangler Fig, CQRS
Strangler Fig Consumer-Driven Contracts, Database per Service (Phase 1+)
CQRS Database per Service (Phase 2+) Event Sourcing (complementary)
Event Sourcing Transactional Outbox, CQRS (recommended)
Service Mesh Kubernetes migration, Sidecar (experience)

Key dependency chains:

  1. Reliability chain: Circuit Breaker → (standalone, immediate value)
  2. Data integrity chain: Transactional Outbox → Database per Service → CQRS → Event Sourcing
  3. Decomposition chain: Consumer-Driven Contracts → Database per Service → Strangler Fig
  4. Infrastructure chain: Sidecar → Service Mesh

MoSCoW Prioritization

Must Have

These address active production risks and should be implemented regardless of scaling plans.

# Recommendation Rationale Effort Risk if Deferred
1 Circuit Breaker A prolonged Shopify, SimplyPrint, or SendCloud outage will exhaust retry queue resources and degrade the entire Order Service. The current retry queue only limits retries — it doesn't prevent the initial flood of failing calls. opossum can be integrated in 1-2 days with the shared factory approach. S High — cascading failure during third-party outage
2 Transactional Outbox The current EventEmitter2 → BullMQ bridge is not atomic with the database write. A crash between commit and publish silently drops events, causing orders to stall without any print jobs being created. The fix is a Prisma migration + one service refactor. M High — silent event loss causes stuck orders
Should Have

These improve development velocity and operational maturity but are not urgent production risks.

# Recommendation Rationale Effort Risk if Deferred
3 Consumer-Driven Contracts As services are deployed independently (potentially by different developers or at different cadences), compile-time TypeScript checks alone cannot guarantee runtime compatibility. Pact tests catch contract drift before deployment. This is a prerequisite for safely pursuing Database per Service and Strangler Fig. M Medium — contract drift during independent deploys
4 Sidecar (Log Collection) The existing Dozzle/Pino setup works but doesn't support structured log routing, enrichment, or long-term retention. A Fluent Bit sidecar is low-risk and provides operational visibility improvements. It also builds familiarity with the sidecar pattern before committing to a full service mesh. S Low — operational inconvenience, not a system risk
Could Have

These are valuable architectural improvements that become necessary at higher scale or complexity but are premature at current size.

# Recommendation Rationale Effort Trigger to Implement
5 Database per Service The shared database is adequate at current scale (5 services, single deployment). Physical separation becomes necessary when services need independent scaling, different database engines, or separate deployment lifecycles. Phase 1 (schema ownership) can be done early as a low-risk boundary validation exercise. L Service needs independent scaling or different DB engine
6 Strangler Fig The Order Service's 30 modules are manageable while a small team owns the full codebase. Extraction becomes worthwhile when team boundaries form or the service's deployment frequency creates a bottleneck. XL Multiple teams, or Order Service deploy cadence becomes a bottleneck
7 CQRS Read/write performance is adequate with the shared database. Explicit CQRS adds architectural complexity. Only justified when dashboard/analytics queries start impacting write-path performance, or when read traffic volume requires a replica. L Read queries impacting write latency, or need for read replicas
Won't Have (for now)

These are architecturally sound but premature given the current deployment model, team size, and scale.

# Recommendation Rationale Effort Trigger to Reconsider
8 Event Sourcing Significant complexity increase (projections, snapshots, eventual consistency everywhere). The existing EventLog + state snapshots provide sufficient auditability. Only justified if regulatory requirements demand full state reconstruction, or if the Transactional Outbox + CQRS foundation is already in place. XL Regulatory audit requirements, or Outbox + CQRS already implemented
9 Service Mesh Requires a Kubernetes migration — a major infrastructure investment. The current Docker Compose on a single droplet with Traefik handles the traffic. A mesh is justified when there are 10+ services, zero-trust networking requirements, or multi-node deployments. XXL 10+ services, multi-node deployment, zero-trust requirement

uml diagram

Phase Timeframe Recommendations Rationale
1 Now Circuit Breaker Immediate production resilience. Independent, no prerequisites. Small effort (1-2 days).
2 Now + 1 week Transactional Outbox Fixes a data integrity gap. Independent, no prerequisites. Medium effort (3-5 days).
3 After Phase 2 Consumer-Driven Contracts Establishes runtime contract safety. Prerequisite for phases 5-6. Medium effort (1-2 weeks).
4 Anytime Sidecar (Log Collection) Opportunistic improvement. No dependencies on other phases. Small effort (1-2 days).
5 When scaling triggers Database per Service (Phase 1-2) Requires Outbox (Phase 2) and Contracts (Phase 3) in place. Start with schema ownership validation.
6 When team/complexity triggers Strangler Fig Requires Contracts (Phase 3) and DB boundaries (Phase 5). Begin with Integration Hub extraction.
7 When read performance triggers CQRS Requires DB per Service Phase 2+ (Phase 5). Start with analytics read model.
8 When regulatory/audit triggers Event Sourcing Requires Outbox (Phase 2) and ideally CQRS (Phase 7) as foundation.
9 When infra complexity triggers Service Mesh Requires Kubernetes migration. Defer until 10+ services or multi-node deployment.

Key insight: Phases 1-4 (Must + Should) can all be implemented within the current architecture without disruptive changes. They improve production resilience, data integrity, and operational visibility while laying the foundation for Phases 5-9 if and when those become necessary. Phases 5-9 are triggered by specific scaling or organizational thresholds, not by a calendar.

Effort Legend

Size Estimate Scope
S 1-2 days Single library or module change
M 3-5 days Cross-module change, one Prisma migration
L 1-3 weeks Multi-service change, infrastructure adjustments
XL 3-6 weeks Architectural change, data migration, new service
XXL 2-3 months Infrastructure platform change (e.g., Kubernetes)

References