Patterns Evaluation — Forma3D.Connect¶
- Date: 2026-02-20
- Scope: Full codebase analysis — apps, libs, infrastructure, deployment
- Version: 1.0
Table of Contents¶
- 1. Executive Summary
- 2. System Overview
- 3. Microservices Patterns
- 3.1 Architectural Style
- 3.2 Service Boundaries & Decomposition
- 3.3 Service Collaboration
- 3.4 Transactional Messaging
- 3.5 External API
- 3.6 Communication Styles
- 3.7 Reliability
- 3.8 Security
- 3.9 Observability
- 3.10 Deployment
- 3.11 Cross-Cutting Concerns
- 3.12 Testing
- 4. Design Patterns (GoF)
- 4.1 Creational Patterns
- 4.2 Structural Patterns
- 4.3 Behavioral Patterns
- 5. Twelve-Factor App Compliance
- 6. Enterprise Application Architecture Patterns
- 6.1 Domain Logic Patterns
- 6.2 Data Source Architectural Patterns
- 6.3 Object-Relational Patterns
- 6.4 Web Presentation Patterns
- 6.5 Distribution Patterns
- 6.6 Base Patterns
- 7. Additional Architectural Principles
- 7.1 Domain-Driven Design
- 7.2 Event-Driven Architecture
- 7.3 Progressive Web App Architecture
- 7.4 Monorepo Architecture
- 8. Pattern Coverage Matrix
- 9. Recommendations
1. Executive Summary¶
Forma3D.Connect is a full-stack Nx monorepo built with React 19, NestJS, Prisma, and PostgreSQL. The system follows a microservice architecture with five backend services, a React PWA frontend, and nine shared libraries. This document evaluates the codebase against four industry-standard pattern catalogs:
| Catalog | Patterns Identified | Coverage |
|---|---|---|
| Microservices Patterns | 22 | Strong |
| GoF Design Patterns | 14 | Good |
| 12-Factor App | 12/12 | Excellent |
| Enterprise Application Patterns | 15 | Good |
2. System Overview¶
3. Microservices Patterns¶
Reference catalog: microservices.io/patterns
3.1 Architectural Style¶
Microservice Architecture¶
Where: The entire system is decomposed into independently deployable services.
Why: The system integrates with three external platforms (Shopify, SimplyPrint, SendCloud) that have fundamentally different concerns. Microservices allow each integration to evolve, scale, and fail independently.
| Service | Domain | Port | Key Responsibility |
|---|---|---|---|
| apps/gateway | Infrastructure | 3000 | Authentication, routing, WebSocket proxy |
| apps/order-service | Order Management | 3001 | Orders, orchestration, Shopify |
| apps/print-service | Manufacturing | 3002 | Print jobs, SimplyPrint |
| apps/shipping-service | Logistics | 3003 | Shipments, SendCloud |
| apps/gridflock-service | Product Generation | 3004 | Parametric STL, slicing pipeline |
Evidence:
- Each service has its own Dockerfile, main.ts, NestJS AppModule, and independent startup
- Services communicate via BullMQ event queues and internal HTTP APIs
- ADR-051 documents the decomposition decision
3.2 Service Boundaries & Decomposition¶
Decompose by Business Capability¶
Where: Service boundaries align with business capabilities — ordering, printing, shipping, product generation.
Why: Each service maps to a distinct business function with its own lifecycle, external integration, and domain expertise.
Decompose by Subdomain¶
Where: libs/domain/ contains shared domain entities; libs/domain-contracts/ defines bounded context interfaces.
Why: Domain-Driven Design subdomains are reflected in the service structure — Order (core), Printing (supporting), Shipping (supporting), GridFlock (supporting), Platform (generic).
3.3 Service Collaboration¶
Database per Service — Partial¶
Where: All services share a single PostgreSQL database via Prisma, but each service only accesses tables within its bounded context.
Why: At current scale, a shared database simplifies operations. Logical isolation is enforced through:
- Repository patterns that scope queries to domain-specific tables
- Tenant isolation via tenantId on all entities
- Domain contracts that prevent cross-boundary data access
Trade-off: Not a strict database-per-service — this is a pragmatic choice documented for future extraction when scale demands it.
Saga — Orchestration-based¶
Where: apps/order-service/src/orchestration/orchestration.service.ts
Why: The order fulfillment workflow spans multiple services and cannot use a distributed transaction. The orchestration saga coordinates the flow:
Evidence:
- handleOrderCreated() — initiates the saga
- handlePrintJobCompleted() / handlePrintJobFailed() — compensating/progression steps
- markOrderReadyForFulfillment() — saga completion gate
Domain Event¶
Where:
- apps/order-service/src/events/event-publisher.service.ts — publishes domain events
- apps/order-service/src/events/event-subscriber.service.ts — subscribes to cross-service events
- libs/service-common/src/lib/events/bullmq-event-bus.ts — transport layer
Why: Domain events decouple services. When an order is created, the order service doesn't need to know how printing or shipping will handle it.
Events published:
| Event | Publisher | Subscribers |
|-------|-----------|-------------|
| ORDER_CREATED | Order Service | Print Service, GridFlock Service |
| ORDER_CANCELLED | Order Service | Print Service, Shipping Service |
| ORDER_READY_FOR_FULFILLMENT | Order Service | Shipping Service |
| PRINT_JOB_COMPLETED | Print Service | Order Service |
| PRINT_JOB_FAILED | Print Service | Order Service |
| SHIPMENT_CREATED | Shipping Service | Order Service |
| GRIDFLOCK_MAPPING_READY | GridFlock Service | Order Service |
API Composition¶
Where: apps/gateway/ aggregates health checks from all downstream services into a single /health endpoint.
Why: External monitoring systems (Uptime Kuma) need a single health endpoint rather than polling each service.
3.4 Transactional Messaging¶
Transactional Outbox — Partial¶
Where: apps/order-service/src/events/event-publisher.service.ts bridges local EventEmitter2 events to BullMQ queues. The EventLog table records all published events.
Why: Ensures that domain events are reliably published even if the message broker is temporarily unavailable. The EventLog serves as an audit trail and can be used for event replay.
How it works:
1. Service performs database operation
2. Local EventEmitter2 fires domain event
3. EventPublisherService catches the event via @OnEvent decorator
4. Event is logged to EventLog table
5. Event is published to BullMQ queue
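The five steps above can be sketched in framework-free TypeScript. The class name, event shape, and the arrays standing in for the EventLog table and the BullMQ queue are all illustrative, not the service's actual code:

```typescript
// Illustrative sketch of the outbox-style publish flow (steps 3–5).
type DomainEvent = { type: string; payload: Record<string, unknown> };

class EventPublisher {
  constructor(
    private readonly eventLog: DomainEvent[], // stands in for the EventLog table
    private readonly queue: DomainEvent[],    // stands in for the BullMQ queue
  ) {}

  publish(event: DomainEvent): void {
    this.eventLog.push(event); // step 4: durable audit record written first
    try {
      this.queue.push(event);  // step 5: hand off to the broker
    } catch {
      // Broker unavailable: the EventLog entry survives, so the event
      // can be replayed later instead of being lost.
    }
  }
}
```

Logging before publishing is what makes the EventLog usable for replay: a broker outage leaves a record behind rather than a silently dropped event.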
3.5 External API¶
API Gateway¶
Where: apps/gateway/
Why: Provides a single entry point for the frontend, handling cross-cutting concerns before routing to downstream services.
Evidence:
- apps/gateway/src/routing/route-config.ts — path-based routing with longest-prefix match
- apps/gateway/src/proxy/proxy.middleware.ts — http-proxy-middleware for HTTP proxying
- User context headers (X-User-Id, X-Tenant-Id, X-User-Email, X-User-Roles, X-User-Permissions) injected into downstream requests
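Longest-prefix matching, as used in route-config.ts, can be sketched as follows (the route entries and target URLs are hypothetical, not the gateway's real table):

```typescript
// Sketch of longest-prefix route matching: the most specific prefix wins.
interface Route { prefix: string; target: string }

function matchRoute(routes: Route[], path: string): Route | undefined {
  return routes
    .filter((r) => path.startsWith(r.prefix))
    .sort((a, b) => b.prefix.length - a.prefix.length)[0]; // longest prefix first
}

const routes: Route[] = [
  { prefix: '/api/v1/orders', target: 'http://order-service:3001' },
  { prefix: '/api/v1/orders/fulfillment', target: 'http://order-service:3001' },
  { prefix: '/api/v1/print-jobs', target: 'http://print-service:3002' },
];
```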
Backend for Frontend (BFF) — Implicit¶
Where: The gateway serves as a de facto BFF for the React PWA, tailoring the API surface to frontend needs (session auth, WebSocket proxy, aggregated health).
3.6 Communication Styles¶
Remote Procedure Invocation¶
Where: libs/service-common/src/lib/service-client/base-service-client.ts — typed HTTP clients for synchronous service-to-service calls.
Why: Some operations require synchronous responses (e.g., creating a print job and receiving the job ID).
Clients:
- OrderServiceClient
- PrintServiceClient
- ShippingServiceClient
- GridflockServiceClient
- SlicerClient
All inject x-internal-api-key header for authentication.
Messaging¶
Where: libs/service-common/src/lib/events/bullmq-event-bus.ts — BullMQ-backed async event bus.
Why: Decouples services for fire-and-forget operations (e.g., notifying shipping that an order is ready). Provides retry semantics, at-least-once delivery (with idempotent consumers guarding against duplicate processing), and dead-letter handling.
Configuration:
- 3 retry attempts with exponential backoff
- removeOnComplete: 1000, removeOnFail: 5000
- Dedicated queues per event type
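The settings listed above map onto BullMQ-style job options. This is a sketch of the options object only (the base backoff delay is an assumed value, and the actual wiring lives in bullmq-event-bus.ts):

```typescript
// BullMQ-style job options mirroring the configuration above.
const defaultJobOptions = {
  attempts: 3,                                   // 3 delivery attempts total
  backoff: { type: 'exponential', delay: 1000 }, // base delay is illustrative
  removeOnComplete: 1000,                        // keep the last 1000 completed jobs
  removeOnFail: 5000,                            // keep the last 5000 failed jobs
};

// In the real event bus these options would be passed when enqueueing, e.g.:
// queue.add(eventType, payload, defaultJobOptions)
```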
Idempotent Consumer¶
Where:
- apps/order-service/src/shopify/webhook-idempotency.repository.ts — ProcessedWebhook table with unique constraint on webhookId
- apps/print-service/src/print-jobs/print-jobs.service.ts — checks for existing jobs before creation
Why: Webhooks from Shopify, SimplyPrint, and SendCloud may be delivered multiple times. Idempotency prevents duplicate processing.
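The deduplication idea can be sketched with an in-memory stand-in for the ProcessedWebhook table's unique webhookId constraint (the class and method names are illustrative):

```typescript
// Sketch of idempotent webhook handling: only the first delivery of a
// given webhookId is processed; redeliveries are skipped.
class WebhookIdempotency {
  private readonly processed = new Set<string>(); // stands in for the unique index

  markProcessed(webhookId: string): boolean {
    if (this.processed.has(webhookId)) return false; // duplicate delivery: skip
    this.processed.add(webhookId);
    return true;
  }
}
```

In the real repository the same effect would come from the database's unique constraint, so concurrent duplicate deliveries are also rejected.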
3.7 Reliability¶
Circuit Breaker — Partial (Retry Queue)¶
Where: apps/order-service/src/retry-queue/retry-queue.service.ts and apps/shipping-service/src/retry-queue/
Why: External API calls (Shopify, SimplyPrint, SendCloud) can fail transiently. The retry queue provides resilience with exponential backoff and jitter.
Job types: FULFILLMENT, PRINT_JOB_CREATION, CANCELLATION, NOTIFICATION, SHIPMENT
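Exponential backoff with jitter, as used by the retry queue, can be sketched like this (the base delay and cap are illustrative values, not the service's configuration):

```typescript
// Sketch of exponential backoff with "full jitter": the delay grows as
// 2^attempt up to a cap, then a uniform random value in that range is used
// so that many failing jobs do not retry in lockstep.
function retryDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * exponential); // uniform in [0, exponential)
}
```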
3.8 Security¶
Access Token¶
Where:
- Gateway uses session-based authentication (Redis-backed express-session)
- Internal service-to-service calls use API key (x-internal-api-key) with timing-safe comparison
- User context is propagated via headers (X-User-Id, X-Tenant-Id, etc.)
Guards:
| Guard | Location | Purpose |
|-------|----------|---------|
| SessionGuard | Gateway | Validates user session |
| PermissionsGuard | Gateway | Checks RBAC permissions |
| InternalAuthGuard | All services | Validates internal API key |
| UserContextMiddleware | All services | Extracts user context from headers |
| ShopifyWebhookGuard | Order Service | HMAC-SHA256 verification |
| SimplyPrintWebhookGuard | Print Service | Token verification |
| SendcloudWebhookGuard | Shipping Service | HMAC-SHA256 verification |
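The timing-safe API-key comparison mentioned above can be sketched with Node's crypto module. Hashing both sides first gives fixed-length buffers, which timingSafeEqual requires (the function name is illustrative):

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// Sketch of a constant-time API-key check: no early exit on the first
// differing byte, so response timing does not leak key prefixes.
function apiKeyMatches(provided: string, expected: string): boolean {
  const a = createHash('sha256').update(provided).digest();
  const b = createHash('sha256').update(expected).digest();
  return timingSafeEqual(a, b); // requires equal-length buffers
}
```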
3.9 Observability¶
Log Aggregation¶
Where: libs/observability/src/lib/ — Pino structured JSON logging with sensitive field redaction, request ID correlation.
Distributed Tracing¶
Where: libs/observability/src/lib/otel.config.ts — OpenTelemetry with OTLP exporter. Correlation IDs (X-Trace-Id, X-Request-Id) propagated through all services via Sentry integration.
Exception Tracking¶
Where: libs/observability/src/lib/sentry.config.ts — Sentry integration with environment-based sampling (10% prod, 100% non-prod). Only 5xx errors captured. Domain error codes tagged.
Health Check API¶
Where: Every service exposes /health via @nestjs/terminus. The gateway aggregates all service health into a single endpoint.
Checks: Redis PING, Prisma database connectivity, downstream service reachability.
Audit Logging¶
Where: apps/order-service/src/audit/audit.service.ts — dedicated AuditLog table recording security events (auth.login.success, auth.login.failure, auth.logout) with actor details, IP address, user agent.
3.10 Deployment¶
Service Instance per Container¶
Where: Each service has its own Dockerfile using multi-stage builds (Node 20 Alpine). The deployment/staging/docker-compose.yml orchestrates all containers.
Evidence:
- 5 backend service Dockerfiles + 1 web Dockerfile + 1 slicer Dockerfile
- Docker Compose defines health checks, restart policies, volume mounts
- Traefik provides automatic service discovery via Docker labels
3.11 Cross-Cutting Concerns¶
Microservice Chassis¶
Where: libs/service-common/ provides a shared chassis with reusable NestJS modules.
Modules provided:
| Module | Purpose |
|--------|---------|
| BullMqEventBus | Async event publishing/subscribing |
| BaseServiceClient | Typed HTTP clients with API key auth |
| UserContextMiddleware | User identity extraction from headers |
| InternalAuthGuard | Service-to-service authentication |
| ServiceHealthIndicator | Redis/DB health checks |
| CorrelationMiddleware | Request correlation IDs |
Externalized Configuration¶
Where: .env.example documents all environment variables. Each service validates config at startup via class-validator validation classes (e.g., apps/order-service/src/config/env.validation.ts).
Why: Configuration varies between development, staging, and production environments. Externalization enables the same container image to run in any environment.
3.12 Testing¶
Consumer-Driven Contract Test — Structural¶
Where: libs/domain-contracts/ defines TypeScript interfaces and DTOs that serve as contracts between services.
Contracts defined:
- IOrdersService — Order service contract
- IPrintJobsService — Print service contract
- IShipmentsService — Shipment service contract
- IFulfillmentService — Fulfillment service contract
Why: While not runtime contract tests (e.g., Pact), the shared TypeScript types enforce compile-time compatibility between service consumers and providers.
Service Component Test¶
Where: apps/acceptance-tests/ — Playwright-BDD acceptance tests running against staging services. Per-service Jest test suites with mock dependencies.
4. Design Patterns (GoF)¶
Reference catalog: refactoring.guru/design-patterns/catalog
4.1 Creational Patterns¶
Factory Method¶
Where:
- libs/testing/src/fixtures/ — factory functions: createMockOrder(), createMockPrintJob(), createMockShipment() with variant factories (createProcessingOrder(), createCompletedOrder(), createFailedPrintJob())
- libs/domain/src/errors/ — typed error factories: OrderErrors.notFound(), PrintJobErrors.invalidTransition(), IntegrationErrors.apiTimeout()
Why: Centralizes object creation with sensible defaults while allowing overrides for specific test scenarios. Error factories ensure consistent error codes and HTTP statuses.
Builder¶
Where: Query construction in repositories — e.g., OrdersRepository builds Prisma queries with optional filters (status, date range, search term) via conditional where clause composition.
Why: Complex queries with many optional parameters benefit from step-by-step construction rather than a single constructor with many arguments.
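Conditional where-clause composition can be sketched as a plain object builder (the filter names and shape are assumed for illustration; the real repository hands a similar object to Prisma):

```typescript
// Sketch of step-by-step query construction: each optional filter adds a
// clause only when present, and tenant isolation is always applied.
interface OrderFilters { status?: string; search?: string; from?: Date; to?: Date }

function buildWhere(filters: OrderFilters, tenantId: string) {
  return {
    tenantId, // tenant isolation on every query
    ...(filters.status && { status: filters.status }),
    ...(filters.search && { orderNumber: { contains: filters.search } }),
    ...((filters.from || filters.to) && {
      createdAt: {
        ...(filters.from && { gte: filters.from }),
        ...(filters.to && { lte: filters.to }),
      },
    }),
  };
}
```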
Singleton¶
Where: NestJS dependency injection provides singleton scope by default:
- PrismaService — single database connection pool per service
- ConfigService — single config instance
- BullMqEventBus — single event bus instance
- QueryClient (frontend) — single TanStack Query client
Why: Resource-expensive objects (database connections, Redis connections, event bus) must be shared across the application.
4.2 Structural Patterns¶
Facade¶
Where:
- API Gateway — the entire gateway is a facade over the microservice topology
- apps/web/src/lib/api-client.ts — the frontend API client provides a simplified interface (apiClient.orders.list(), apiClient.printJobs.retry()) hiding HTTP complexity
- apps/order-service/src/orchestration/orchestration.service.ts — facade over the multi-service fulfillment workflow
Why: Complex subsystems are hidden behind simple interfaces. The frontend doesn't need to know which backend service handles a request.
Proxy¶
Where:
- apps/gateway/src/proxy/proxy.middleware.ts — HTTP reverse proxy that intercepts requests, adds authentication context, and forwards to downstream services
- apps/web/vite.config.ts — development proxy for API and Socket.IO requests
Why: The proxy pattern adds cross-cutting behavior (auth, logging, rate limiting) without modifying the downstream services.
Adapter¶
Where:
- apps/print-service/src/simplyprint/simplyprint-api.client.ts — adapts the SimplyPrint REST API to the internal domain model
- apps/shipping-service/src/sendcloud/sendcloud.service.ts — adapts the SendCloud API to the internal shipment model
- apps/order-service/src/shopify/ — adapts Shopify webhook payloads to internal Order entities
- libs/service-common/src/lib/events/bullmq-event-bus.ts — adapts the IEventBus interface to BullMQ implementation
Why: External APIs have different data formats, authentication mechanisms, and error conventions than the internal domain. Adapters translate between the two.
Decorator¶
Where: NestJS decorators extensively used:
- @RequirePermissions('orders.read') — adds RBAC checks to endpoints
- @SkipThrottle() / @WebhookThrottle() — modifies rate limiting behavior
- @OnEvent('ORDER_CREATED') — adds event handling to methods
- Custom parameter decorators for extracting user context, tenant ID
Why: Decorators add behavior to classes and methods without modifying their implementation — a core principle of NestJS.
Composite¶
Where: NestJS module system — AppModule composes feature modules (OrdersModule, PrintJobsModule, etc.), each of which composes providers, controllers, and sub-modules into a unified tree.
Why: The module tree allows features to be organized hierarchically while still being composable.
4.3 Behavioral Patterns¶
Observer¶
Where:
- EventEmitter2 — local event system within each service (e.g., @OnEvent('order.created'))
- BullMQ event bus — cross-service pub/sub
- Socket.IO — real-time frontend notifications (order:created, printjob:completed)
- TanStack Query cache invalidation — Socket.IO events trigger queryClient.invalidateQueries()
Why: Observer decouples event producers from consumers, allowing the orchestration service to react to events from any service without tight coupling.
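The Socket.IO-to-cache-invalidation flow can be reduced to a minimal observer sketch (the Emitter class and the staleKeys set stand in for Socket.IO and queryClient.invalidateQueries; all names are illustrative):

```typescript
// Minimal observer: an emitter notifies subscribers; one subscriber marks
// a cached query key stale, mirroring the TanStack Query invalidation above.
type Listener = (payload: unknown) => void;

class Emitter {
  private listeners = new Map<string, Listener[]>();
  on(event: string, fn: Listener): void {
    this.listeners.set(event, [...(this.listeners.get(event) ?? []), fn]);
  }
  emit(event: string, payload: unknown): void {
    for (const fn of this.listeners.get(event) ?? []) fn(payload);
  }
}

const staleKeys = new Set<string>();
const socket = new Emitter();
socket.on('order:created', () => staleKeys.add('orders')); // invalidate on event
```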
Strategy¶
Where:
- Print job matching in apps/print-service/ — multi-step fallback strategy for matching SimplyPrint webhooks to internal jobs (by UID → by numeric ID → by API query → by queue-item ID)
- Shipping method selection in apps/shipping-service/ — maps carrier + country + delivery type to shipping method IDs
- PWA caching strategies in apps/web/src/sw.ts — StaleWhileRevalidate for API data, NetworkOnly for health, Precache for assets
Why: Different algorithms are needed for different contexts (webhook matching, shipping methods, cache strategies). Strategy allows swapping algorithms without changing the client code.
Template Method¶
Where:
- libs/service-common/src/lib/service-client/base-service-client.ts — defines the HTTP request template (URL construction, header injection, error handling) while subclasses define specific endpoints
- libs/domain/src/errors/base.error.ts — DomainError base class provides the error response template; subclasses (OrderErrors, PrintJobErrors) specialize the error codes
Why: Shared behavior (HTTP call mechanics, error formatting) is defined once in the base class, while subclasses only override the variable parts.
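The Template Method split between BaseServiceClient and its subclasses can be sketched as follows. The class shapes, endpoint, and return value are illustrative; the real client also performs the HTTP call and error handling:

```typescript
// Template Method sketch: the base class fixes the request mechanics
// (URL construction, header injection); subclasses supply only the
// base URL and their specific endpoints.
abstract class BaseServiceClient {
  protected abstract readonly baseUrl: string;

  protected request(path: string, apiKey: string): { url: string; headers: Record<string, string> } {
    return {
      url: `${this.baseUrl}${path}`,
      headers: { 'x-internal-api-key': apiKey }, // injected on every call
    };
  }
}

class PrintServiceClient extends BaseServiceClient {
  protected readonly baseUrl = 'http://print-service:3002';
  createJob(apiKey: string) {
    return this.request('/internal/print-jobs', apiKey); // endpoint is assumed
  }
}
```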
State¶
Where: State machines for domain entities:
- Order: PENDING → PROCESSING → PARTIALLY_COMPLETED → COMPLETED / FAILED / CANCELLED
- PrintJob: QUEUED → ASSIGNED → PRINTING → COMPLETED / FAILED / CANCELLED
- Shipment: Lifecycle managed by SendCloud webhook status updates
- GridflockJob: PENDING → PROCESSING → COMPLETED / FAILED
Why: Each entity has well-defined states with valid transitions. Invalid transitions are rejected with typed domain errors.
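The PrintJob lifecycle above can be expressed as a transition table with a guard function (the exact set of allowed transitions is a plausible reading of the states listed, not verified against the entity code):

```typescript
// Sketch of the PrintJob state machine: a transition map plus a check
// that rejects moves not listed for the current state.
type PrintJobStatus = 'QUEUED' | 'ASSIGNED' | 'PRINTING' | 'COMPLETED' | 'FAILED' | 'CANCELLED';

const transitions: Record<PrintJobStatus, PrintJobStatus[]> = {
  QUEUED: ['ASSIGNED', 'CANCELLED'],
  ASSIGNED: ['PRINTING', 'CANCELLED'],
  PRINTING: ['COMPLETED', 'FAILED', 'CANCELLED'],
  COMPLETED: [], // terminal states allow no further moves
  FAILED: [],
  CANCELLED: [],
};

function canTransition(from: PrintJobStatus, to: PrintJobStatus): boolean {
  return transitions[from].includes(to);
}
```

In the services, a rejected transition would surface as a typed domain error such as INVALID_STATUS_TRANSITION rather than a silent no-op.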
Mediator¶
Where: apps/order-service/src/orchestration/orchestration.service.ts — acts as a mediator between Order, Print, Shipping, and GridFlock services, coordinating multi-service workflows without direct service-to-service knowledge.
Why: Without the mediator, each service would need to know about every other service it interacts with, creating an N-to-N coupling problem.
Chain of Responsibility¶
Where: NestJS middleware and guard pipeline in the gateway:
1. ThrottlerGuard (rate limiting)
2. SessionGuard (authentication)
3. PermissionsGuard (authorization)
4. CorrelationMiddleware (tracing)
5. ProxyMiddleware (routing)
Each handler in the chain can short-circuit the request (e.g., return 401, 403, 429).
Why: Each cross-cutting concern is handled by a focused component, and the chain can be reordered or extended without modifying individual handlers.
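The short-circuiting behavior can be sketched with a plain handler chain (a simplified model of the NestJS guard pipeline, not its actual API):

```typescript
// Sketch of a handler chain: each handler either passes (null) or
// short-circuits with an HTTP status, as the guards above do.
interface Req { path: string; authed: boolean }
type Handler = (req: Req) => number | null;

function runChain(handlers: Handler[], req: Req): number {
  for (const h of handlers) {
    const status = h(req);
    if (status !== null) return status; // e.g. 401, 403, 429
  }
  return 200; // all handlers passed; request proceeds to the proxy
}
```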
Command¶
Where: BullMQ jobs encapsulate operations as serializable commands:
- generate-baseplate / generate-plate-set — GridFlock generation commands
- FULFILLMENT / CANCELLATION / NOTIFICATION — retry queue commands
- Event payloads contain all data needed to execute the operation
Why: Commands can be queued, retried, and processed asynchronously. Failed commands persist for manual intervention.
5. Twelve-Factor App Compliance¶
Reference: 12factor.net
Factor-by-Factor Assessment¶
| # | Factor | Compliance | Evidence |
|---|---|---|---|
| I | Codebase | Full | Single Git repo (Nx monorepo) with multiple deploys (dev, staging, production). Each service is independently deployable from the same codebase. |
| II | Dependencies | Full | pnpm with lockfile (pnpm-lock.yaml). package.json declares all dependencies. Docker images use multi-stage builds with explicit Node 20 Alpine base. No system-level dependencies assumed. |
| III | Config | Full | All configuration via environment variables (.env.example documents 40+ variables). Per-service validation at startup via class-validator. No config in code. Feature flags (SIMPLYPRINT_POLLING_ENABLED, SHIPPING_ENABLED) externalized. |
| IV | Backing Services | Full | PostgreSQL, Redis, Sentry, Shopify API, SimplyPrint API, SendCloud API — all treated as attached resources configured via URL environment variables (DATABASE_URL, REDIS_URL, SENTRY_DSN). |
| V | Build, Release, Run | Full | Azure DevOps pipeline: build (Docker multi-stage) → push (DigitalOcean Container Registry) → deploy (deployment/staging/deploy.sh with image tag versioning). Strict separation of stages. |
| VI | Processes | Full | Services are stateless. Session state stored in Redis (not in-process). File storage in GridFlock uses shared volumes (not local disk). No sticky sessions required. |
| VII | Port Binding | Full | Each service self-binds to a port: Gateway (3000), Order (3001), Print (3002), Shipping (3003), GridFlock (3004), Slicer (3010). Web app served via Nginx on port 80. |
| VIII | Concurrency | Mostly | Scale-out via containers (one process per container). BullMQ workers can scale horizontally. However, current deployment is single-instance per service (no horizontal pod autoscaling yet). |
| IX | Disposability | Full | Fast startup (NestJS boots in seconds). Graceful shutdown via NestJS lifecycle hooks. BullMQ workers drain gracefully. Docker health checks enable zero-downtime deploys. |
| X | Dev/Prod Parity | Full | Docker Compose for local dev mirrors staging topology. Same Dockerfiles, same environment variable structure. PostgreSQL and Redis used in all environments (no SQLite for dev). |
| XI | Logs | Full | Pino structured JSON logging to stdout. No log file management in-process. Dozzle aggregates container logs. Sentry captures exceptions. Logs include request IDs and correlation data. |
| XII | Admin Processes | Full | prisma migrate deploy for database migrations. prisma/seed.ts and prisma/seed-rbac.ts for data seeding. Deploy script (deploy.sh) supports --migrate-only mode. All admin processes run as one-off commands. |
6. Enterprise Application Architecture Patterns¶
Reference catalog: martinfowler.com/eaaCatalog
6.1 Domain Logic Patterns¶
Service Layer¶
Where: Every backend service implements a service layer:
- apps/order-service/src/orders/orders.service.ts
- apps/print-service/src/print-jobs/print-jobs.service.ts
- apps/shipping-service/src/shipments/shipments.service.ts
- apps/gridflock-service/src/gridflock/gridflock.service.ts
Why: The service layer defines the application boundary, coordinates business logic, and orchestrates repository calls and event emission. Controllers are thin (HTTP-only), repositories are data-only.
Domain Model¶
Where: libs/domain/src/entities/ — domain entities (Order, PrintJob, LineItem, ProductMapping) encapsulate both data and behavior (status transitions, validation).
Why: Rich domain entities prevent business logic from leaking into services or controllers. Status transition rules are co-located with the entity definition.
6.2 Data Source Architectural Patterns¶
Data Mapper¶
Where: Prisma ORM acts as a data mapper between TypeScript domain objects and PostgreSQL tables. Each service has repository classes that map between Prisma models and domain entities.
Why: Domain entities remain independent of database schema details. Prisma handles the mapping.
Repository¶
Where:
- apps/order-service/src/orders/orders.repository.ts
- apps/order-service/src/shopify/webhook-idempotency.repository.ts
- apps/print-service/src/print-jobs/print-jobs.repository.ts
- apps/shipping-service/src/shipments/shipments.repository.ts
- apps/gridflock-service/src/gridflock/gridflock.repository.ts
Why: Repositories provide a collection-like interface for accessing domain objects, encapsulating Prisma queries and tenant isolation logic. Services never call Prisma directly.
Example interface:
```typescript
interface IOrdersRepository {
  create(data: CreateOrderData): Promise<Order>;
  findById(id: string): Promise<Order | null>;
  findByShopifyOrderId(shopifyOrderId: string): Promise<Order | null>;
  findAll(filters: OrderFilters): Promise<PaginatedResult<Order>>;
  update(id: string, data: UpdateOrderData): Promise<Order>;
}
```
6.3 Object-Relational Patterns¶
Identity Field¶
Where: All Prisma models use id String @id @default(cuid()) as their identity field, maintaining a one-to-one mapping between in-memory objects and database rows.
Foreign Key Mapping¶
Where: Prisma relations map associations to foreign keys:
- Order → LineItem (one-to-many via orderId)
- ProductMapping → AssemblyPart (one-to-many via productMappingId)
- Order → Shipment (one-to-one via orderId)
- Tenant → all entities (via tenantId)
Single Table Inheritance — Via Enums¶
Where: RetryQueue table uses RetryJobType enum (FULFILLMENT, PRINT_JOB_CREATION, CANCELLATION, NOTIFICATION, SHIPMENT) to distinguish job types in a single table, with a payload JSON column for type-specific data.
Why: Retry jobs share the same lifecycle (pending → processing → completed/failed) but carry different payloads. A single table simplifies the retry processor.
Embedded Value¶
Where: Prisma JSON columns store complex value objects:
- Order.shippingAddress — JSON field storing address components
- ProductMapping.metadata — JSON field for arbitrary product metadata
- RetryQueue.payload — JSON field for operation-specific data
- GridflockJob.parameters — JSON field for generation parameters
Why: Avoids excessive table decomposition for value objects that are always read/written as a unit.
6.4 Web Presentation Patterns¶
Model View Controller¶
Where:
- Backend: NestJS controllers (View), services (Controller logic), domain entities (Model)
- Frontend: React components (View), custom hooks (Controller logic), TanStack Query cache (Model)
Why: Separation of concerns between data, business logic, and presentation.
Front Controller¶
Where: The API Gateway acts as a front controller, handling all incoming requests and routing them to the appropriate service. Traefik provides an additional front controller layer for TLS termination and routing.
Application Controller¶
Where: React Router in apps/web/src/router.tsx — centralized route definitions with lazy loading, route guards (ProtectedRoute, PermissionGatedRoute), and layout composition.
6.5 Distribution Patterns¶
Remote Facade¶
Where: Each NestJS controller exposes a coarse-grained REST API that aggregates multiple fine-grained domain operations:
- POST /api/v1/orders creates an order with line items, triggers orchestration, and returns a complete response
- POST /api/v1/mappings creates a mapping with assembly parts in a single call
Why: Reduces network round-trips between frontend and backend by batching operations into coarse-grained API calls.
Data Transfer Object¶
Where:
- libs/domain-contracts/src/api/ — API request/response DTOs
- libs/domain-contracts/src/lib/types.ts — shared DTO types
- NestJS ValidationPipe + class-validator decorators for runtime validation
- Swagger @ApiProperty decorators for documentation
DTOs defined:
- CreateOrderDto, OrderResponseDto, OrderQueryDto
- CreateProductMappingDto, ProductMappingResponseDto
- InternalCreatePrintJobDto, InternalCreateShipmentDto
- BullMQ event payloads: OrderCreatedEvent, PrintJobCompletedEvent
Why: DTOs decouple the internal domain model from the external API contract, allowing each to evolve independently.
6.6 Base Patterns¶
Gateway¶
Where:
- libs/service-common/src/lib/service-client/base-service-client.ts — gateway to downstream microservices
- apps/print-service/src/simplyprint/simplyprint-api.client.ts — gateway to SimplyPrint API
- apps/shipping-service/src/sendcloud/sendcloud.service.ts — gateway to SendCloud API
- apps/order-service/src/shopify/ — gateway to Shopify API
Why: Encapsulates access to external systems behind a well-defined interface, isolating the rest of the codebase from third-party API details.
Separated Interface¶
Where: libs/domain-contracts/src/ defines interfaces (IOrdersService, IPrintJobsService, IShipmentsService) in a separate library from their implementations. NestJS injection tokens (ORDERS_SERVICE, PRINT_JOBS_SERVICE) enable runtime binding.
Why: Services depend on interfaces, not implementations. This supports testability (mock implementations) and future architectural changes.
Value Object¶
Where:
- libs/domain/src/schemas/shipping-address.schema.ts — shipping address as a value object (validated via Zod)
- libs/domain/src/schemas/print-profile.schema.ts — print profile parameters
- libs/domain/src/schemas/event-metadata.schema.ts — event metadata
Why: Value objects are immutable, compared by value (not identity), and validated on construction. They prevent invalid data from entering the domain.
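The three properties above (immutability, value-based equality, validation on construction) can be shown in a framework-free sketch. The codebase validates these with Zod; the plain class below illustrates the same idea with hypothetical fields:

```typescript
// Sketch of a shipping-address value object: validated when constructed,
// frozen afterwards, and compared field-by-field rather than by identity.
class ShippingAddress {
  constructor(
    readonly street: string,
    readonly city: string,
    readonly countryCode: string,
  ) {
    if (!/^[A-Z]{2}$/.test(countryCode)) {
      throw new Error(`Invalid country code: ${countryCode}`);
    }
    Object.freeze(this); // immutability
  }

  equals(other: ShippingAddress): boolean {
    return (
      this.street === other.street &&
      this.city === other.city &&
      this.countryCode === other.countryCode
    );
  }
}
```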
Special Case (Null Object)¶
Where: libs/domain/src/errors/ — typed domain errors act as special cases for failure scenarios:
- OrderErrors.notFound(id) — returns a 404 with error code ORDER_NOT_FOUND
- PrintJobErrors.invalidTransition(from, to) — returns a 422 with error code INVALID_STATUS_TRANSITION
- CommonErrors.unauthorized() — returns a 401 with error code UNAUTHORIZED
Why: Instead of throwing generic exceptions or returning null, typed errors carry context (code, message, HTTP status) that enable precise error handling upstream.
Registry¶
Where: NestJS IoC container acts as a service registry. Injection tokens (ORDERS_SERVICE, PRINT_JOBS_SERVICE) provide well-known lookup keys for service implementations.
Why: Services are located through the registry rather than direct instantiation, enabling dependency injection and testability.
Layer Supertype¶
Where:
- libs/domain/src/errors/base.error.ts — DomainError is the layer supertype for all domain errors
- libs/service-common/src/lib/service-client/base-service-client.ts — BaseServiceClient is the layer supertype for all HTTP service clients
Why: Common behavior (error formatting, HTTP request mechanics) is defined once in the supertype and inherited by all subtypes in that layer.
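The DomainError supertype and its typed factories can be sketched together. The codes and statuses mirror the examples earlier in this document; the class shape itself is assumed:

```typescript
// Sketch of the layer supertype for domain errors: common fields (code,
// HTTP status) live in the base class; factories produce consistent errors.
class DomainError extends Error {
  constructor(
    readonly code: string,
    message: string,
    readonly httpStatus: number,
  ) {
    super(message);
    this.name = new.target.name;
  }
}

const OrderErrors = {
  notFound: (id: string) =>
    new DomainError('ORDER_NOT_FOUND', `Order ${id} not found`, 404),
};
```

An exception filter can then map any DomainError to a response uniformly, because every subtype carries the same code and httpStatus fields.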
7. Additional Architectural Principles¶
7.1 Domain-Driven Design¶
DDD Concepts Applied:
| Concept | Implementation | Location |
|---|---|---|
| Bounded Context | Each service owns its domain logic | apps/*-service/ |
| Aggregate | Order → LineItem hierarchy | libs/domain/src/entities/ |
| Entity | Order, PrintJob, Shipment (identity-based) | libs/domain/src/entities/ |
| Value Object | ShippingAddress, PrintProfile (value-based) | libs/domain/src/schemas/ |
| Domain Event | ORDER_CREATED, PRINT_JOB_COMPLETED | libs/service-common/src/lib/events/ |
| Domain Service | OrchestrationService, FulfillmentService | apps/order-service/src/orchestration/ |
| Anti-Corruption Layer | Shopify/SimplyPrint/SendCloud adapters | apps/*-service/src/*/ |
| Shared Kernel | libs/domain, libs/domain-contracts | libs/ |
| Ubiquitous Language | Consistent naming across services | All code |
7.2 Event-Driven Architecture¶
Characteristics:
- Event sourcing-like audit trail — EventLog table records all business events with metadata, enabling reconstruction of system state
- Eventual consistency — services are eventually consistent via async BullMQ events
- At-least-once processing — BullMQ delivers each job at least once; idempotent consumers guard against duplicate processing
- Event replay potential — EventLog can be used to replay events for debugging or recovery
7.3 Progressive Web App Architecture¶
Where: apps/web/
| PWA Feature | Implementation | Location |
|---|---|---|
| Service Worker | Workbox with precaching + runtime caching | apps/web/src/sw.ts |
| Offline Support | IndexedDB for pending actions, online sync | apps/web/src/lib/indexed-db.ts |
| Push Notifications | Web Push API with VAPID keys | apps/web/src/hooks/use-push-notifications.ts |
| Install Prompt | beforeinstallprompt event handling | apps/web/src/pwa/install-prompt.tsx |
| Background Sync | Pending action queue with online status detection | apps/web/src/hooks/use-online-status.ts |
| Pull to Refresh | Touch gesture handling for mobile | apps/web/src/pwa/pull-to-refresh.tsx |
| Update Flow | Service worker update detection and prompt | apps/web/src/pwa/sw-update-prompt.tsx |
7.4 Monorepo Architecture¶
Nx Features Used:
- Module boundaries — ESLint rules enforce import constraints
- Affected commands — Only rebuild/test changed projects
- Caching — Build and test outputs cached
- Task orchestration — `dependsOn: ["^build"]` ensures correct build order
- Code generation — Nx generators for new libraries and components
8. Pattern Coverage Matrix¶
Microservices Patterns¶
| Pattern | Status | Notes |
|---|---|---|
| Microservice Architecture | Implemented | 5 services + gateway |
| Decompose by Business Capability | Implemented | Order, Print, Shipping, GridFlock, Platform |
| Decompose by Subdomain | Implemented | Core, supporting, generic subdomains |
| API Gateway | Implemented | NestJS gateway with proxy, auth, rate limiting |
| Backend for Frontend (BFF) | Implicit | Gateway tailored for PWA |
| Database per Service | Partial | Shared DB with logical isolation |
| Saga (Orchestration) | Implemented | Order fulfillment orchestrator |
| Domain Event | Implemented | 11 event types via BullMQ |
| API Composition | Implemented | Health aggregation |
| Transactional Outbox | Partial | EventLog + EventPublisher |
| Remote Procedure Invocation (RPI) | Implemented | Typed HTTP service clients |
| Messaging | Implemented | BullMQ event bus |
| Idempotent Consumer | Implemented | Webhook deduplication |
| Circuit Breaker | Partial | Retry queue with backoff |
| Access Token | Implemented | Session + API key + user context headers |
| Log Aggregation | Implemented | Pino JSON → Dozzle |
| Distributed Tracing | Implemented | OpenTelemetry + Sentry |
| Exception Tracking | Implemented | Sentry |
| Health Check API | Implemented | @nestjs/terminus |
| Audit Logging | Implemented | AuditLog table |
| Service Instance per Container | Implemented | Docker per service |
| Microservice Chassis | Implemented | libs/service-common |
| Externalized Configuration | Implemented | .env + validation |
GoF Design Patterns¶
| Pattern | Status | Primary Location |
|---|---|---|
| Factory Method | Implemented | Test fixtures, error factories |
| Builder | Implemented | Repository query construction |
| Singleton | Implemented | NestJS DI, QueryClient |
| Facade | Implemented | Gateway, API client, orchestration |
| Proxy | Implemented | Gateway proxy middleware |
| Adapter | Implemented | External API clients |
| Decorator | Implemented | NestJS decorators |
| Composite | Implemented | NestJS module tree |
| Observer | Implemented | EventEmitter2, BullMQ, Socket.IO |
| Strategy | Implemented | Job matching, caching strategies |
| Template Method | Implemented | BaseServiceClient, DomainError |
| State | Implemented | Order/PrintJob state machines |
| Mediator | Implemented | OrchestrationService |
| Chain of Responsibility | Implemented | NestJS middleware/guard pipeline |
| Command | Implemented | BullMQ jobs |
Enterprise Application Patterns¶
| Pattern | Status | Primary Location |
|---|---|---|
| Service Layer | Implemented | All service classes |
| Domain Model | Implemented | libs/domain entities |
| Data Mapper | Implemented | Prisma ORM |
| Repository | Implemented | All repository classes |
| Identity Field | Implemented | CUID primary keys |
| Foreign Key Mapping | Implemented | Prisma relations |
| Embedded Value | Implemented | JSON columns |
| Single Table Inheritance | Implemented | RetryQueue with enums |
| MVC | Implemented | NestJS + React |
| Front Controller | Implemented | API Gateway |
| Application Controller | Implemented | React Router |
| Remote Facade | Implemented | Coarse-grained REST APIs |
| Data Transfer Object | Implemented | DTOs in domain-contracts |
| Gateway | Implemented | External API clients |
| Separated Interface | Implemented | domain-contracts lib |
| Value Object | Implemented | Zod schemas |
| Special Case | Implemented | Typed domain errors |
| Registry | Implemented | NestJS IoC container |
| Layer Supertype | Implemented | DomainError, BaseServiceClient |
9. Recommendations¶
9.1 Patterns to Strengthen¶
Database per Service¶
| Current state | All services share a single PostgreSQL instance via Prisma. Repositories scope queries to their bounded context tables, and tenant isolation is enforced via tenantId. |
| Recommendation | Extract to separate databases when horizontal scaling or independent deployment cadences are needed. |
How to implement:
- Phase 1 — Schema ownership boundaries. Each service already accesses a scoped set of tables. Formalize this by creating per-service Prisma schemas (`prisma/order.prisma`, `prisma/print.prisma`, etc.) that only expose the tables owned by that service. Nx can generate the clients independently.
- Phase 2 — Separate connection strings. Introduce per-service `DATABASE_URL` environment variables (`ORDER_DATABASE_URL`, `PRINT_DATABASE_URL`). Initially they can all point to the same PostgreSQL instance — this validates that services do not join across boundaries.
- Phase 3 — Physical separation. Create separate DigitalOcean Managed PostgreSQL databases. Migrate data using `pg_dump`/`pg_restore` per schema. Update Docker Compose and deployment scripts to provision separate connection secrets.
- Cross-boundary reads that currently exist (e.g., Shipping Service reading order data) must be replaced with synchronous HTTP calls via `OrderServiceClient` or denormalized into the local database via domain events.
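The event-driven denormalization option in the last step could look like the sketch below: the Shipping Service keeps a local copy of the order fields it needs, updated by the ORDER_CREATED domain event. The handler and cache interface here are hypothetical:

```typescript
// Illustrative payload shape for the ORDER_CREATED domain event.
interface OrderCreatedPayload {
  orderId: string;
  shippingAddress: unknown;
}

// Local store owned by the consuming service (e.g., a Prisma-backed table).
interface LocalOrderCache {
  upsert(orderId: string, data: { shippingAddress: unknown }): Promise<void>;
}

// Event handler: store only the fields this service needs, instead of
// joining into order tables it no longer owns.
export async function onOrderCreated(
  cache: LocalOrderCache,
  payload: OrderCreatedPayload
): Promise<void> {
  await cache.upsert(payload.orderId, { shippingAddress: payload.shippingAddress });
}
```

The trade-off versus a synchronous `OrderServiceClient` call is eventual consistency in exchange for removing the runtime dependency on the Order Service.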
Circuit Breaker¶
| Current state | External API resilience relies on the RetryQueueService (exponential backoff with jitter, max 5 retries). API clients for Shopify, SimplyPrint, and SendCloud have no circuit breaker — a prolonged outage will keep queuing retries and consuming resources. |
| Recommendation | Add a circuit breaker in front of each external API client using the opossum library. |
How to implement:
- Install opossum — `pnpm add opossum && pnpm add -D @types/opossum`.
- Create a shared circuit breaker factory in `libs/service-common/src/lib/circuit-breaker/`:
```typescript
import CircuitBreaker from 'opossum';
// `logger` here is assumed to be the shared Pino instance from libs/service-common.

export interface CircuitBreakerOptions {
  timeout: number; // ms before an in-flight call counts as a failure
  errorThresholdPercentage: number; // failure rate that trips the breaker
  resetTimeout: number; // ms before a half-open probe is attempted
  volumeThreshold: number; // minimum calls before the threshold applies
}

export function createCircuitBreaker<T>(
  action: (...args: unknown[]) => Promise<T>,
  options: CircuitBreakerOptions
): CircuitBreaker<unknown[], T> {
  const breaker = new CircuitBreaker(action, {
    timeout: options.timeout,
    errorThresholdPercentage: options.errorThresholdPercentage,
    resetTimeout: options.resetTimeout,
    volumeThreshold: options.volumeThreshold,
  });
  breaker.on('open', () => logger.warn('Circuit breaker opened'));
  breaker.on('halfOpen', () => logger.info('Circuit breaker half-open'));
  breaker.on('close', () => logger.info('Circuit breaker closed'));
  return breaker;
}
```
- Wrap external API clients. In `simplyprint-api.client.ts`, `sendcloud-api.client.ts`, and `shopify-api.client.ts`, wrap the core HTTP methods with the circuit breaker:
```typescript
this.breaker = createCircuitBreaker(
  (url, config) => this.axios.get(url, config),
  {
    timeout: 30_000,
    errorThresholdPercentage: 50,
    resetTimeout: 60_000,
    volumeThreshold: 5,
  }
);
```
- Emit circuit breaker state changes to the `EventLog` and Sentry so operators are notified when an integration is degraded.
- Expose breaker state in health checks (`/health`) — report `DEGRADED` when a circuit is open.
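The health-check step reduces to a pure mapping from breaker states to an overall status. A minimal sketch (illustrative shape, not an actual @nestjs/terminus indicator):

```typescript
// opossum breakers are in one of three states.
type BreakerState = 'closed' | 'open' | 'halfOpen';

// Aggregate named breakers into a health payload: any non-closed
// circuit marks the service DEGRADED and names the affected integration.
export function integrationHealth(
  breakers: Record<string, BreakerState>
): { status: 'OK' | 'DEGRADED'; open: string[] } {
  const open = Object.entries(breakers)
    .filter(([, state]) => state !== 'closed')
    .map(([name]) => name);
  return { status: open.length > 0 ? 'DEGRADED' : 'OK', open };
}
```

A terminus custom indicator would wrap this function and feed it the live breaker instances for Shopify, SimplyPrint, and SendCloud.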
Transactional Outbox¶
| Current state | EventPublisherService listens to local EventEmitter2 events and publishes to BullMQ. The EventLog table records events for audit purposes. However, the database write and the BullMQ publish are not atomic — if the service crashes between the DB commit and the BullMQ publish, the event is lost. |
| Recommendation | Formalize a transactional outbox table with a polling publisher or change-data-capture relay. |
How to implement:
- Add an `OutboxEvent` table to the Prisma schema:
```prisma
model OutboxEvent {
  id          String    @id @default(uuid())
  eventType   String
  payload     Json
  published   Boolean   @default(false)
  createdAt   DateTime  @default(now())
  publishedAt DateTime?

  @@index([published, createdAt])
}
```
- Write events in the same transaction as the domain operation. In service methods, use `prisma.$transaction()`:
```typescript
await this.prisma.$transaction(async (tx) => {
  const order = await tx.order.update({ ... });
  await tx.outboxEvent.create({
    data: {
      eventType: 'ORDER_CREATED',
      payload: orderCreatedPayload,
    },
  });
  return order;
});
```
- Create an `OutboxPublisher` service in `libs/service-common/` that polls the outbox table on a short interval (e.g., every 2 seconds via `@Cron`), publishes unpublished events to BullMQ, and marks them as published. This replaces the current `EventEmitter2` → BullMQ bridge.
- Cleanup — add a cron job to delete published outbox events older than 7 days.
- Future enhancement — replace polling with Postgres logical replication / change-data-capture (e.g., Debezium) for lower latency, aligning with the Transaction Log Tailing pattern.
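One polling cycle of the proposed publisher could look like the sketch below, with the Prisma client and BullMQ queue abstracted behind interfaces so the core logic is visible. All names here are hypothetical:

```typescript
// Illustrative outbox row and the two dependencies of one polling cycle.
interface OutboxRow {
  id: string;
  eventType: string;
  payload: unknown;
}
interface OutboxStore {
  fetchUnpublished(limit: number): Promise<OutboxRow[]>;
  markPublished(ids: string[]): Promise<void>;
}
interface EventBus {
  publish(eventType: string, payload: unknown): Promise<void>;
}

// Publish-then-mark yields at-least-once delivery: a crash after publishing
// but before marking re-sends the batch, which idempotent consumers absorb.
export async function publishOutboxBatch(
  store: OutboxStore,
  bus: EventBus,
  limit = 100
): Promise<number> {
  const rows = await store.fetchUnpublished(limit);
  for (const row of rows) {
    await bus.publish(row.eventType, row.payload);
  }
  if (rows.length > 0) {
    await store.markPublished(rows.map((r) => r.id));
  }
  return rows.length;
}
```

The `@Cron`-scheduled method would simply call `publishOutboxBatch` with the real Prisma and BullMQ adapters.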
CQRS¶
| Current state | Read and write operations share the same service layer and repository. DTOs implicitly separate the read model (response DTOs with computed fields) from the write model (create/update DTOs). The AnalyticsService already uses raw Prisma queries optimized for read aggregations separate from the CRUD operations. |
| Recommendation | Evaluate explicit CQRS when read/write performance requirements diverge — particularly for the dashboard, analytics, and order list views. |
How to implement:
- Start with the analytics domain, which already has separate query paths. Extract `AnalyticsRepository` queries into a dedicated read model that can use materialized views or database replicas.
- Introduce read-only repositories — create `OrderReadRepository` alongside `OrdersRepository`. The read repository uses optimized queries (select only needed columns, use joins for list views, leverage PostgreSQL indexes) while the write repository focuses on aggregate consistency.
- Route read traffic to a replica. Add a `DATABASE_REPLICA_URL` environment variable. Inject a second `PrismaService` instance configured with the replica URL. Read repositories use the replica; write repositories use the primary.
- For the frontend, TanStack Query already separates reads (`useQuery`) from writes (`useMutation`). The `staleTime: 30s` configuration means the UI tolerates eventual consistency — CQRS won't require frontend changes.
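The replica-routing step reduces to choosing a connection string per repository kind. A minimal sketch, assuming the `DATABASE_REPLICA_URL` variable proposed above and falling back to the primary when no replica is configured:

```typescript
// Resolve the connection URL for a repository: reads prefer the replica
// (when configured), writes always hit the primary.
export function resolveDatabaseUrl(
  kind: 'read' | 'write',
  env: Record<string, string | undefined>
): string {
  const primary = env['DATABASE_URL'];
  if (!primary) {
    throw new Error('DATABASE_URL is required');
  }
  return kind === 'read' ? env['DATABASE_REPLICA_URL'] ?? primary : primary;
}
```

The fallback matters for local development and staging, where a single database serves both roles and no replica variable is set.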
Consumer-Driven Contracts¶
| Current state | libs/domain-contracts/ defines TypeScript interfaces (IOrdersService, IPrintJobsService, etc.) and DTOs that provide compile-time contract enforcement. Services implement these interfaces and bind them via NestJS injection tokens (ORDERS_SERVICE, PRINT_JOBS_SERVICE). |
| Recommendation | Add runtime contract testing with Pact for deployment confidence, especially as services are deployed independently. |
How to implement:
- Install Pact — `pnpm add -D @pact-foundation/pact`.
- Consumer-side tests. In services that call other services (e.g., Order Service calling Print Service via `PrintServiceClient`), write Pact consumer tests that define the expected API interactions:
```typescript
describe('PrintServiceClient Pact', () => {
  const provider = new PactV4({
    consumer: 'order-service',
    provider: 'print-service',
  });

  it('creates a print job', async () => {
    await provider
      .addInteraction()
      .given('a valid product mapping exists')
      .uponReceiving('a request to create a print job')
      .withRequest('POST', '/api/v1/internal/print-jobs', ...)
      .willRespondWith(201, ...)
      .executeTest(async (mockServer) => {
        const client = new PrintServiceClient(mockServer.url);
        const result = await client.createPrintJob(dto);
        expect(result.status).toBe('QUEUED');
      });
  });
});
```
- Provider-side verification. In Print Service, verify that the real API satisfies the pact generated by the consumer test. This can run in the CI pipeline after both services are built.
- Pact Broker. Deploy a Pact Broker (or use PactFlow) to store and share pacts between consumer and provider pipelines. Add a `can-i-deploy` check to the deployment script to prevent incompatible deployments.
- Scope. Start with the internal HTTP APIs (`/api/v1/internal/*`) since those are the service-to-service contracts. Webhook contracts (Shopify, SimplyPrint, SendCloud) are owned by third parties and should use consumer-side contract tests instead.
9.2 Patterns to Consider Adopting¶
Event Sourcing¶
| Current state | EventLog records business events with type, severity, metadata, and timestamps. Orders and print jobs are stored as current-state snapshots in their respective tables. |
| Rationale | Full event sourcing would enable state reconstruction, temporal queries ("what was the order state at 3pm?"), and simplified debugging of production issues. |
How to implement:
- Start with the Order aggregate — the most complex entity with the richest state machine. Create an `OrderEvent` table:
```prisma
model OrderEvent {
  id        String   @id @default(uuid())
  orderId   String
  eventType String
  payload   Json
  version   Int
  createdAt DateTime @default(now())
  createdBy String?

  @@unique([orderId, version])
  @@index([orderId])
}
```
- Append-only writes. Modify `OrdersService` to append events rather than mutate state. Each state transition produces an event (`OrderCreated`, `PrintJobAssigned`, `StatusChanged`, `OrderFulfilled`).
- Projection for current state. Build a projection that replays events to materialize the current `Order` row. This can be a synchronous fold on read or an async projection updated by event handlers — the existing `EventEmitter2` infrastructure supports this.
- Keep the existing snapshot table as a read-optimized projection. The `orders` table becomes a materialized view updated by event handlers. This avoids rewriting all read queries.
- Temporal queries. With the event log, add an endpoint `GET /api/v1/orders/:id/history` that replays events up to a given timestamp to reconstruct past state.
- Scope carefully. Event sourcing adds complexity. Only apply it to aggregates that genuinely need audit trails and temporal queries (Order, PrintJob). Simpler entities (ProductMapping, Printer) should remain CRUD.
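The projection step can be sketched as a version-checked fold over `OrderEvent` rows. Event names follow the list above; the state shape and transition logic are illustrative, not the actual domain model:

```typescript
// Minimal event and state shapes for the fold (the real payload carries more data).
interface OrderEvent {
  version: number;
  eventType: string;
  payload: Record<string, unknown>;
}
interface OrderState {
  status: string;
  version: number;
}

// Replay events in version order to materialize current state.
// The `@@unique([orderId, version])` constraint guarantees a gapless,
// conflict-free sequence, which this fold enforces defensively.
export function projectOrder(events: OrderEvent[]): OrderState {
  let state: OrderState = { status: 'NEW', version: 0 };
  for (const event of [...events].sort((a, b) => a.version - b.version)) {
    if (event.version !== state.version + 1) {
      throw new Error(`Gap in event stream at version ${event.version}`);
    }
    switch (event.eventType) {
      case 'OrderCreated':
        state = { status: 'PENDING', version: event.version };
        break;
      case 'PrintJobAssigned':
        state = { status: 'PRINTING', version: event.version };
        break;
      case 'OrderFulfilled':
        state = { status: 'FULFILLED', version: event.version };
        break;
      default:
        // Unknown events advance the version without changing status.
        state = { ...state, version: event.version };
    }
  }
  return state;
}
```

The same fold, stopped at an earlier version or timestamp, backs the proposed `/history` endpoint.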
Service Mesh¶
| Current state | Services communicate via direct HTTP calls (BaseServiceClient → http://order-service:3001) and BullMQ events. Inter-service authentication uses a shared INTERNAL_API_KEY header with timing-safe comparison. No mTLS, no service-level retries beyond the retry queue, no traffic shaping. |
| Rationale | As the service count grows, a mesh offloads cross-cutting communication concerns (mTLS, retries, circuit breaking, observability) from application code into infrastructure. |
How to implement:
- Prerequisite — Kubernetes migration. Service meshes (Istio, Linkerd) require a container orchestrator. The current Docker Compose deployment on a single DigitalOcean droplet would need to migrate to a managed Kubernetes cluster (DigitalOcean Kubernetes / DOKS).
- Start with Linkerd (lighter than Istio). Install Linkerd on the cluster and inject sidecar proxies into each service deployment. Linkerd automatically adds mTLS between services, removing the need for `INTERNAL_API_KEY` validation.
- Traffic policies. Define retry budgets and timeouts in Linkerd `ServiceProfile` resources instead of application-level retry logic. This lets the mesh handle transient failures for synchronous HTTP calls, simplifying `BaseServiceClient`.
- Observability. Linkerd's built-in metrics (request rate, success rate, latency percentiles) complement the existing Sentry/OpenTelemetry setup. Grafana dashboards can consume Linkerd metrics via Prometheus.
- Incremental adoption. Inject sidecars one service at a time. The gateway can be the last to migrate since it already handles TLS termination via Traefik.
When to adopt: This is a significant infrastructure investment. Defer until the system needs horizontal scaling, independent deployment of services, or zero-trust networking between services.
Strangler Fig¶
| Current state | The Order Service currently contains 30 modules spanning order management, Shopify integration, orchestration, fulfillment, cancellation, analytics, notifications, retry queue, user management, audit logging, and multi-tenancy. It is the largest and most complex service. |
| Rationale | Extracting cohesive subsets into independent services would reduce the blast radius of changes and allow independent scaling. |
How to implement:
- Identify extraction candidates. The Order Service modules group into natural bounded contexts:
- Phase 1 — Extract the Integration Hub. The Shopify webhook handler, OAuth flow, and integration connection modules (`SimplyPrintConnection`, `SendcloudConnection`) are self-contained. Create an `integration-service` that owns Shopify webhook ingestion and publishes `ORDER_CREATED` events. The gateway routes `/api/v1/shopify/*` to the new service.
- Phase 2 — Extract Platform Services. Auth, Users, Tenancy, and Audit have no order-domain logic. Extract into a `platform-service` that the gateway delegates authentication to. Other services call it for user/tenant resolution instead of reading the shared database.
- Strangler routing. Update `apps/gateway/src/routing/route-config.ts` to route extracted paths to the new services. The gateway's path-based routing table makes this a configuration change:

```typescript
{ path: '/api/v1/shopify', target: 'http://integration-service:3005' },
{ path: '/api/v1/admin/users', target: 'http://platform-service:3006' },
```

- Data migration. Move owned tables (e.g., `ShopifyShop`, `User`, `Role`, `Permission`, `AuditLog`) to the new service's database using the Database per Service migration path described above.
- Keep interfaces stable. The existing `IOrdersService` and domain-contracts interfaces remain unchanged. The Order Service still publishes and consumes the same BullMQ events — only the source of `ORDER_CREATED` shifts to the Integration Service.
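A path-based routing table like the one in route-config implies longest-prefix matching, so more specific extracted routes win over broader legacy ones. A minimal sketch of that resolution logic (the route shape mirrors the config entries; `resolveTarget` is illustrative, not the gateway's actual function):

```typescript
// Route entry shape, as in the gateway's routing table.
interface Route {
  path: string;
  target: string;
}

// Pick the most specific route whose prefix matches the request path.
// Returns undefined when no extracted route matches (i.e., the request
// stays with the legacy service).
export function resolveTarget(routes: Route[], requestPath: string): string | undefined {
  const match = routes
    .filter((r) => requestPath === r.path || requestPath.startsWith(r.path + '/'))
    .sort((a, b) => b.path.length - a.path.length)[0];
  return match?.target;
}
```

Longest-prefix wins is what makes the strangler migration incremental: adding `/api/v1/shopify` to the table diverts only that slice of traffic while everything else keeps flowing to the Order Service.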
Sidecar¶
| Current state | Observability (Sentry, OpenTelemetry, Pino logging) and security (API key validation, webhook HMAC verification) are implemented as NestJS modules within each service via libs/service-common and libs/observability. Each service container bundles all concerns. |
| Rationale | Sidecar containers could offload log collection, metrics export, and TLS termination from application code, simplifying the service chassis. |
How to implement (Docker Compose — no Kubernetes required):
- Log collection sidecar. Replace direct Pino stdout → Dozzle with a Fluent Bit sidecar per service that tails container logs, enriches them with service metadata, and forwards to the OpenTelemetry Collector (or ClickHouse directly). In `docker-compose.yml`:
```yaml
order-service-logs:
  image: fluent/fluent-bit:latest
  volumes:
    - /var/lib/docker/containers:/var/lib/docker/containers:ro
  depends_on:
    - order-service
  environment:
    - SERVICE_NAME=order-service
```
- Metrics sidecar. Add a Prometheus exporter sidecar that scrapes the NestJS `/metrics` endpoint (exposed via `prom-client`) and makes metrics available for Grafana. This separates metrics collection from the application runtime.
- TLS sidecar (for service-to-service). Instead of implementing mTLS in application code, deploy an Envoy proxy sidecar alongside each service. Envoy handles TLS termination for incoming requests and TLS origination for outgoing calls. Services communicate over plaintext localhost; Envoy handles encryption on the wire. This is effectively a lightweight service mesh without Kubernetes.
- Incremental adoption. Start with the log collection sidecar (lowest risk, highest value). Metrics and TLS sidecars can follow as the operational complexity justifies them.
- Keep `libs/service-common` as fallback. The sidecar approach complements the chassis — services retain in-process error handling and domain logging while sidecars handle infrastructure-level concerns.
9.3 Prioritization & Implementation Order¶
Dependency Map¶
The nine recommendations are not independent — several have prerequisites or unlock others. The table below shows the dependency graph.
Dependency Analysis¶
| Recommendation | Depends On | Unlocks |
|---|---|---|
| Circuit Breaker | Nothing | Service Mesh (supersedes it) |
| Transactional Outbox | Nothing | Database per Service, Event Sourcing |
| Consumer-Driven Contracts | Nothing | Database per Service, Strangler Fig |
| Sidecar (Log Collection) | Nothing | Service Mesh (stepping stone) |
| Database per Service | Transactional Outbox, Consumer-Driven Contracts | Strangler Fig, CQRS |
| Strangler Fig | Consumer-Driven Contracts, Database per Service (Phase 1+) | — |
| CQRS | Database per Service (Phase 2+) | Event Sourcing (complementary) |
| Event Sourcing | Transactional Outbox, CQRS (recommended) | — |
| Service Mesh | Kubernetes migration, Sidecar (experience) | — |
Key dependency chains:
- Reliability chain: Circuit Breaker → (standalone, immediate value)
- Data integrity chain: Transactional Outbox → Database per Service → CQRS → Event Sourcing
- Decomposition chain: Consumer-Driven Contracts → Database per Service → Strangler Fig
- Infrastructure chain: Sidecar → Service Mesh
MoSCoW Prioritization¶
Must Have¶
These address active production risks and should be implemented regardless of scaling plans.
| # | Recommendation | Rationale | Effort | Risk if Deferred |
|---|---|---|---|---|
| 1 | Circuit Breaker | A prolonged Shopify, SimplyPrint, or SendCloud outage will exhaust retry queue resources and degrade the entire Order Service. The current retry queue only limits retries — it doesn't prevent the initial flood of failing calls. opossum can be integrated in 1-2 days with the shared factory approach. | S | High — cascading failure during third-party outage |
| 2 | Transactional Outbox | The current EventEmitter2 → BullMQ bridge is not atomic with the database write. A crash between commit and publish silently drops events, causing orders to stall without any print jobs being created. The fix is a Prisma migration + one service refactor. | M | High — silent event loss causes stuck orders |
Should Have¶
These improve development velocity and operational maturity but are not urgent production risks.
| # | Recommendation | Rationale | Effort | Risk if Deferred |
|---|---|---|---|---|
| 3 | Consumer-Driven Contracts | As services are deployed independently (potentially by different developers or at different cadences), compile-time TypeScript checks alone cannot guarantee runtime compatibility. Pact tests catch contract drift before deployment. This is a prerequisite for safely pursuing Database per Service and Strangler Fig. | M | Medium — contract drift during independent deploys |
| 4 | Sidecar (Log Collection) | The existing Dozzle/Pino setup works but doesn't support structured log routing, enrichment, or long-term retention. A Fluent Bit sidecar is low-risk and provides operational visibility improvements. It also builds familiarity with the sidecar pattern before committing to a full service mesh. | S | Low — operational inconvenience, not a system risk |
Could Have¶
These are valuable architectural improvements that become necessary at higher scale or complexity but are premature at current size.
| # | Recommendation | Rationale | Effort | Trigger to Implement |
|---|---|---|---|---|
| 5 | Database per Service | The shared database is adequate at current scale (5 services, single deployment). Physical separation becomes necessary when services need independent scaling, different database engines, or separate deployment lifecycles. Phase 1 (schema ownership) can be done early as a low-risk boundary validation exercise. | L | Service needs independent scaling or different DB engine |
| 6 | Strangler Fig | The Order Service's 30 modules are manageable while a small team owns the full codebase. Extraction becomes worthwhile when team boundaries form or the service's deployment frequency creates a bottleneck. | XL | Multiple teams, or Order Service deploy cadence becomes a bottleneck |
| 7 | CQRS | Read/write performance is adequate with the shared database. Explicit CQRS adds architectural complexity. Only justified when dashboard/analytics queries start impacting write-path performance, or when read traffic volume requires a replica. | L | Read queries impacting write latency, or need for read replicas |
Won't Have (for now)¶
These are architecturally sound but premature given the current deployment model, team size, and scale.
| # | Recommendation | Rationale | Effort | Trigger to Reconsider |
|---|---|---|---|---|
| 8 | Event Sourcing | Significant complexity increase (projections, snapshots, eventual consistency everywhere). The existing EventLog + state snapshots provide sufficient auditability. Only justified if regulatory requirements demand full state reconstruction, or if the Transactional Outbox + CQRS foundation is already in place. | XL | Regulatory audit requirements, or Outbox + CQRS already implemented |
| 9 | Service Mesh | Requires a Kubernetes migration — a major infrastructure investment. The current Docker Compose on a single droplet with Traefik handles the traffic. A mesh is justified when there are 10+ services, zero-trust networking requirements, or multi-node deployments. | XXL | 10+ services, multi-node deployment, zero-trust requirement |
Recommended Implementation Sequence¶
| Phase | Timeframe | Recommendations | Rationale |
|---|---|---|---|
| 1 | Now | Circuit Breaker | Immediate production resilience. Independent, no prerequisites. Small effort (1-2 days). |
| 2 | Now + 1 week | Transactional Outbox | Fixes a data integrity gap. Independent, no prerequisites. Medium effort (3-5 days). |
| 3 | After Phase 2 | Consumer-Driven Contracts | Establishes runtime contract safety. Prerequisite for phases 5-6. Medium effort (1-2 weeks). |
| 4 | Anytime | Sidecar (Log Collection) | Opportunistic improvement. No dependencies on other phases. Small effort (1-2 days). |
| 5 | When scaling triggers | Database per Service (Phase 1-2) | Requires Outbox (Phase 2) and Contracts (Phase 3) in place. Start with schema ownership validation. |
| 6 | When team/complexity triggers | Strangler Fig | Requires Contracts (Phase 3) and DB boundaries (Phase 5). Begin with Integration Hub extraction. |
| 7 | When read performance triggers | CQRS | Requires DB per Service Phase 2+ (Phase 5). Start with analytics read model. |
| 8 | When regulatory/audit triggers | Event Sourcing | Requires Outbox (Phase 2) and ideally CQRS (Phase 7) as foundation. |
| 9 | When infra complexity triggers | Service Mesh | Requires Kubernetes migration. Defer until 10+ services or multi-node deployment. |
Key insight: Phases 1-4 (Must + Should) can all be implemented within the current architecture without disruptive changes. They improve production resilience, data integrity, and operational visibility while laying the foundation for Phases 5-9 if and when those become necessary. Phases 5-9 are triggered by specific scaling or organizational thresholds, not by a calendar.
Effort Legend¶
| Size | Estimate | Scope |
|---|---|---|
| S | 1-2 days | Single library or module change |
| M | 3-5 days | Cross-module change, one Prisma migration |
| L | 1-3 weeks | Multi-service change, infrastructure adjustments |
| XL | 3-6 weeks | Architectural change, data migration, new service |
| XXL | 2-3 months | Infrastructure platform change (e.g., Kubernetes) |
References¶
- Microservice Architecture Patterns — Chris Richardson
- Design Patterns Catalog — Refactoring.Guru
- The Twelve-Factor App — Adam Wiggins (Heroku)
- Patterns of Enterprise Application Architecture — Martin Fowler
- Domain-Driven Design — Eric Evans
- C4 Model — Simon Brown
- Building Microservices — Sam Newman