AI Prompt: Forma3D.Connect - Phase 6: Hardening (Production Readiness)

Purpose: This prompt instructs an AI to implement Phase 6 of Forma3D.Connect
Estimated Effort: 44 hours (~2 weeks)
Prerequisites: Phase 5k completed (All tech debt phases 5e-5k resolved)
Output: Production-ready system with comprehensive testing, monitoring, security hardening, and complete documentation
Status: ⏳ PENDING

🎯 Mission

You are continuing development of Forma3D.Connect, building on the Phase 5 foundation (including all tech debt resolution phases 5e-5k). Your task is to implement Phase 6: Hardening — completing the production readiness requirements to ensure the system is reliable, performant, secure, and fully documented.

Phase 6 delivers:

Comprehensive test suite with 80%+ coverage
Integration and E2E tests for critical paths
Performance and load testing (500+ orders/day capacity)
Production monitoring and alerting infrastructure
Complete technical documentation and runbooks
Security hardening and vulnerability remediation

Phase 6 ensures the system is ready for production workloads:

Testing → Monitoring → Documentation → Security → Production Ready ✅

📋 Phase 6 Context

What Was Built in Previous Phases

The complete automation system is already in place:

Phase 0: Foundation ✅
Nx monorepo with apps/api, apps/web, and shared libs
PostgreSQL database with Prisma schema
NestJS backend structure with modules, services, repositories
Azure DevOps CI/CD pipeline
Phase 1: Shopify Inbound ✅
Shopify webhooks receiver with HMAC verification
Order storage and status management
Product mapping CRUD operations
Event logging service
OpenAPI/Swagger documentation at /api/docs
Aikido Security Platform integration
Phase 1b: Observability ✅
Sentry error tracking and performance monitoring
OpenTelemetry-first architecture
Structured JSON logging with Pino and correlation IDs
React error boundaries
BusinessObservabilityService for state transition and flow tracking
Flow milestone tracking with timing (order automation cycle)
State change logging with old→new state transitions
Phase 1c: Staging Deployment ✅
Docker images with multi-stage builds
Traefik reverse proxy with Let's Encrypt TLS
Zero-downtime deployments via Docker Compose
Staging environment: https://staging-connect.forma3d.be
Phase 1d: Acceptance Testing ✅
Playwright + Gherkin acceptance tests
Given/When/Then scenarios for deployment verification
Azure DevOps pipeline integration
Phase 2: SimplyPrint Core ✅
SimplyPrint API client with HTTP Basic Auth
Automated print job creation from orders
Print job status monitoring (webhook + polling)
Order-job orchestration with order.ready-for-fulfillment event
Phase 3: Fulfillment Loop ✅
Automated Shopify fulfillment creation
Order cancellation handling
Retry queue with exponential backoff
Email notifications for critical failures
API key authentication for admin endpoints
Phase 4: Dashboard MVP ✅
React 19 dashboard with TanStack Query
Order management UI (list, detail, actions)
Product mapping configuration UI
Real-time updates via Socket.IO
Activity logs with filtering and export
Phase 5: Shipping Integration ✅
Sendcloud API client for shipping labels
Automated label generation on order completion
Tracking sync to Shopify fulfillments
Shipping management UI in dashboard
Phase 5b: Domain Boundaries ✅
Correlation ID infrastructure
Domain contracts library (libs/domain-contracts)
Repository encapsulation
Interface-based service dependencies
Phase 5c: Webhook Idempotency ✅
Database-backed webhook idempotency (TD-001 resolved)
Automated cleanup of expired records
Phase 5d: Frontend Tests ✅
Vitest configuration with React Testing Library
MSW API mocking layer
200 frontend tests (TD-002 resolved)
Phase 5e-5k: Tech Debt Resolution ✅ (assumed complete before Phase 6)
F5e: Typed JSON Schemas (TD-003)
F5f: Shared API Types (TD-004)
F5g: Structured Logging (TD-005)
F5h: Controller Tests (TD-006)
F5i: Domain Contract Cleanup (TD-007)
F5j: Typed Error Hierarchy (TD-008)
F5k: Configuration Externalization (TD-009)

What Phase 6 Builds

Feature	Description	Effort
F6.1: Comprehensive Testing	Achieve 80%+ coverage, E2E & load tests	16 hours
F6.2: Monitoring and Alerting	Health checks, alerting, metrics dashboard	8 hours
F6.3: Documentation	Complete technical docs, runbooks, guides	12 hours
F6.4: Security Hardening	Dependency scan, rate limiting, security audit	8 hours

🛠️ Tech Stack Reference

All technologies from previous phases remain. Additional packages for Phase 6:

Package	Purpose
`k6`	Load testing tool
`@nestjs/throttler`	Rate limiting for NestJS
`helmet`	Security headers middleware
`express-rate-limit`	Backup rate limiting (if needed)
`autocannon`	Alternative load testing
`clinic`	Node.js performance profiling

🏗️ Architecture Reference

Detailed Architecture Diagrams

📐 For detailed architecture, refer to the existing PlantUML diagrams:

Diagram	Path	Description
Context View	`docs/03-architecture/c4-model/1-context/C4_Context.puml`	System context diagram
Container View	`docs/03-architecture/c4-model/2-container/C4_Container.puml`	System containers and interactions
Component View	`docs/03-architecture/c4-model/3-component/C4_Component.puml`	Backend component architecture
Order State	`docs/03-architecture/state-machines/C4_Code_State_Order.puml`	Order status state machine
Domain Model	`docs/03-architecture/c4-model/4-code/C4_Code_DomainModel.puml`	Entity relationships

These PlantUML diagrams should be validated and updated as part of Phase 6.

Current System Health Endpoints

Endpoint	API	Web	Description
`/health`	✅	✅	Full health status with build info
`/health/live`	✅	✅	Simple liveness probe
`/health/ready`	✅	-	Readiness probe (checks database)

Phase 6 Focus Areas

┌──────────────────────────────────────────────────────────────────┐
│                    PHASE 6: HARDENING                            │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐     │
│  │   TESTING      │  │   MONITORING   │  │   SECURITY     │     │
│  │                │  │                │  │                │     │
│  │ • Unit 80%+    │  │ • Health checks│  │ • Dep scan     │     │
│  │ • Integration  │  │ • Alerting     │  │ • Rate limits  │     │
│  │ • E2E paths    │  │ • Metrics      │  │ • Headers      │     │
│  │ • Load 500/day │  │ • Runbooks     │  │ • Auth audit   │     │
│  └────────────────┘  └────────────────┘  └────────────────┘     │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                  DOCUMENTATION                          │    │
│  │                                                         │    │
│  │  • README complete       • API docs (Swagger)           │    │
│  │  • Architecture docs     • Troubleshooting guide        │    │
│  │  • Runbook operations    • Environment setup            │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                  │
│                          ↓                                       │
│              ┌──────────────────────┐                            │
│              │  PRODUCTION READY ✅  │                            │
│              └──────────────────────┘                            │
└──────────────────────────────────────────────────────────────────┘

📁 Files to Create/Modify

Testing Infrastructure

apps/api/src/
├── **/__tests__/                      # Unit tests for all modules
│   ├── orders.controller.spec.ts      # Controller tests (if not done in Phase 5h)
│   ├── print-jobs.controller.spec.ts
│   └── ...
│
├── test/
│   ├── integration/
│   │   ├── order-flow.spec.ts         # Order → Print → Fulfill flow
│   │   ├── cancellation-flow.spec.ts  # Cancellation handling
│   │   └── error-recovery.spec.ts     # Error recovery scenarios
│   └── e2e/
│       └── critical-paths.spec.ts     # E2E critical path tests

apps/web/src/
├── **/__tests__/                      # Additional component tests

load-tests/
├── k6/
│   ├── config.js                      # K6 configuration
│   ├── scenarios/
│   │   ├── order-throughput.js        # 500 orders/day simulation
│   │   ├── dashboard-load.js          # Dashboard concurrent users
│   │   └── webhook-burst.js           # Webhook burst handling
│   └── reports/                       # Generated reports

Monitoring Infrastructure

apps/api/src/
├── health/
│   ├── health.module.ts               # UPDATE: Add external service checks
│   ├── health.controller.ts           # UPDATE: Enhanced health endpoints
│   └── indicators/
│       ├── shopify.indicator.ts       # Shopify API health check
│       ├── simplyprint.indicator.ts   # SimplyPrint API health check
│       ├── sendcloud.indicator.ts     # Sendcloud API health check
│       └── database.indicator.ts      # Database health check

deployment/
├── monitoring/
│   ├── alerting-rules.yml             # Alert definitions
│   ├── runbook.md                     # Operations runbook
│   └── dashboard.json                 # Metrics dashboard config

Security Hardening

apps/api/src/
├── common/
│   ├── guards/
│   │   └── throttler.guard.ts         # Rate limiting guard
│   ├── middleware/
│   │   └── security-headers.middleware.ts  # Security headers
│   └── filters/
│       └── global-exception.filter.ts # UPDATE: Enhanced error handling

.github/workflows/
└── security-scan.yml                  # Dependency security scan workflow

Documentation

docs/
├── 04-development/
│   ├── runbook.md                     # NEW: Operations runbook
│   ├── troubleshooting.md             # NEW: Troubleshooting guide
│   └── environment-setup.md           # NEW: Environment setup guide
├── 03-architecture/
│   └── (validate and update all diagrams)
└── README.md                          # UPDATE: Complete project documentation

🔧 Feature F6.1: Comprehensive Testing

Requirements Reference

NFR-MA-002: Test Coverage (> 80%)
NFR-PE-001: Performance Requirements
Success Metric: 99% automation success rate

Implementation

1. Test Coverage Analysis

First, analyze current coverage and identify gaps:

# Generate coverage report for backend
pnpm nx test api --coverage

# Generate coverage report for frontend
pnpm nx test web --coverage

# Identify files with low coverage
# Target: 80%+ statements, functions, branches, lines
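
One way to make the 80% target enforceable rather than aspirational is a coverage threshold in the test runner configuration. The sketch below assumes the API project runs on Jest; the source only shows `pnpm nx test api --coverage`, so the runner, the file path, and the `collectCoverageFrom` globs are assumptions. `coverageThreshold` itself is a standard Jest option that fails the run when a global metric falls below the given percentage.

```typescript
// apps/api/jest.config.ts (hypothetical path; adapt to the project's actual config)
// A plain object is used so the sketch does not depend on Jest's type package.
const COVERAGE_TARGET = 80; // NFR-MA-002: test coverage > 80%

const config = {
  displayName: 'api',
  // Assumed globs: measure source files, skip specs and the bootstrap file.
  collectCoverageFrom: ['src/**/*.ts', '!src/**/*.spec.ts', '!src/main.ts'],
  coverageThreshold: {
    global: {
      statements: COVERAGE_TARGET,
      branches: COVERAGE_TARGET,
      functions: COVERAGE_TARGET,
      lines: COVERAGE_TARGET,
    },
  },
};

export default config;
```

With a threshold in place, the coverage commands above fail in CI when coverage regresses, instead of relying on someone reading the report.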

2. Integration Tests

Create apps/api/test/integration/order-flow.spec.ts:

import { Test, TestingModule } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import * as request from 'supertest';
import { AppModule } from '../../src/app.module';
import { PrismaService } from '../../src/database/prisma.service';
import { OrderStatus, PrintJobStatus } from '@prisma/client';

describe('Order Flow (Integration)', () => {
  let app: INestApplication;
  let prisma: PrismaService;

  beforeAll(async () => {
    const moduleRef: TestingModule = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();

    app = moduleRef.createNestApplication();
    prisma = moduleRef.get<PrismaService>(PrismaService);
    await app.init();
  });

  afterAll(async () => {
    await app.close();
  });

  beforeEach(async () => {
    // Clean up test data
    await prisma.eventLog.deleteMany();
    await prisma.printJob.deleteMany();
    await prisma.lineItem.deleteMany();
    await prisma.shipment.deleteMany();
    await prisma.order.deleteMany();
  });

  describe('Complete Order Flow', () => {
    it('should process order from creation to fulfillment', async () => {
      // 1. Create order via webhook simulation
      const webhookPayload = createMockShopifyOrderWebhook();

      const orderResponse = await request(app.getHttpServer())
        .post('/api/v1/webhooks/shopify')
        .set('X-Shopify-Topic', 'orders/create')
        .set('X-Shopify-Hmac-SHA256', calculateHmac(webhookPayload))
        .send(webhookPayload)
        .expect(200);

      expect(orderResponse.body.success).toBe(true);

      // 2. Verify order was created
      const order = await prisma.order.findFirst({
        where: { shopifyOrderId: webhookPayload.id.toString() },
        include: { lineItems: true, printJobs: true },
      });

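      // findFirst resolves to null (not undefined) when no row matches,
      // so toBeDefined() would pass even if the order was never created
      expect(order).not.toBeNull();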
      expect(order!.status).toBe(OrderStatus.PENDING);
      expect(order!.lineItems).toHaveLength(webhookPayload.line_items.length);

      // 3. Simulate print job completion
      for (const printJob of order!.printJobs) {
        await prisma.printJob.update({
          where: { id: printJob.id },
          data: { status: PrintJobStatus.COMPLETED },
        });
      }

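      // 4. Trigger orchestration check
      // NOTE: the direct Prisma updates above bypass the event handlers that
      // normally advance the order. Invoke the orchestration service (or emit
      // the print-job completion event) here; otherwise step 5 will still see
      // the order in its previous status.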

      // 5. Verify order is ready for fulfillment
      const updatedOrder = await prisma.order.findUnique({
        where: { id: order!.id },
      });

      // Order should be completed when all print jobs are done
      expect(updatedOrder!.status).toBe(OrderStatus.COMPLETED);
    });

    it('should handle order cancellation during printing', async () => {
      // Create order and start printing
      const order = await createTestOrder(prisma);

      // Simulate cancellation webhook
      const cancelWebhook = createMockCancellationWebhook(order.shopifyOrderId);

      await request(app.getHttpServer())
        .post('/api/v1/webhooks/shopify')
        .set('X-Shopify-Topic', 'orders/cancelled')
        .set('X-Shopify-Hmac-SHA256', calculateHmac(cancelWebhook))
        .send(cancelWebhook)
        .expect(200);

      // Verify order and print jobs are cancelled
      const cancelledOrder = await prisma.order.findUnique({
        where: { id: order.id },
        include: { printJobs: true },
      });

      expect(cancelledOrder!.status).toBe(OrderStatus.CANCELLED);
      cancelledOrder!.printJobs.forEach((job) => {
        expect([PrintJobStatus.CANCELLED, PrintJobStatus.COMPLETED]).toContain(job.status);
      });
    });
  });

  describe('Error Recovery', () => {
    it('should retry failed print jobs', async () => {
      // Create order with a print job that will fail
      const order = await createTestOrder(prisma);
      const printJob = order.printJobs[0];

      // Mark as failed
      await prisma.printJob.update({
        where: { id: printJob.id },
        data: { 
          status: PrintJobStatus.FAILED,
          errorMessage: 'Simulated failure',
        },
      });

      // Trigger retry via API
      await request(app.getHttpServer())
        .post(`/api/v1/print-jobs/${printJob.id}/retry`)
        .set('X-API-Key', process.env.API_KEY || 'test-key')
        .expect(200);

      // Verify job is queued again
      const retriedJob = await prisma.printJob.findUnique({
        where: { id: printJob.id },
      });

      expect(retriedJob!.status).toBe(PrintJobStatus.QUEUED);
    });
  });
});

// Helper functions
function createMockShopifyOrderWebhook() {
  return {
    id: Date.now(),
    order_number: 1001,
    email: 'test@example.com',
    total_price: '49.99',
    currency: 'EUR',
    shipping_address: {
      first_name: 'Test',
      last_name: 'Customer',
      address1: '123 Test St',
      city: 'Brussels',
      zip: '1000',
      country_code: 'BE',
    },
    line_items: [
      {
        id: Date.now(),
        variant_id: 12345,
        title: 'Test Product',
        quantity: 1,
        price: '49.99',
        sku: 'TEST-SKU-001',
      },
    ],
  };
}

function createMockCancellationWebhook(shopifyOrderId: string) {
  return {
    id: shopifyOrderId,
    cancelled_at: new Date().toISOString(),
  };
}

function calculateHmac(payload: object): string {
  const crypto = require('crypto');
  const secret = process.env.SHOPIFY_WEBHOOK_SECRET || 'test-secret';
  return crypto
    .createHmac('sha256', secret)
    .update(JSON.stringify(payload))
    .digest('base64');
}

async function createTestOrder(prisma: PrismaService) {
  // Create a test order with line items and print jobs
  return prisma.order.create({
    data: {
      shopifyOrderId: `test-${Date.now()}`,
      shopifyOrderNumber: 'TEST-1001',
      customerName: 'Test Customer',
      customerEmail: 'test@example.com',
      shippingAddress: {
        first_name: 'Test',
        last_name: 'Customer',
        address1: '123 Test St',
        city: 'Brussels',
        zip: '1000',
        country_code: 'BE',
      },
      totalPrice: 49.99,
      currency: 'EUR',
      status: OrderStatus.PROCESSING,
      lineItems: {
        create: [
          {
            shopifyLineItemId: `line-${Date.now()}`,
            shopifyVariantId: '12345',
            title: 'Test Product',
            quantity: 1,
            price: 49.99,
            sku: 'TEST-SKU-001',
          },
        ],
      },
      printJobs: {
        create: [
          {
            simplyPrintJobId: `sp-${Date.now()}`,
            status: PrintJobStatus.PRINTING,
          },
        ],
      },
    },
    include: {
      lineItems: true,
      printJobs: true,
    },
  });
}
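
The `calculateHmac` helper above produces the signature; the receiver has to perform the matching check. The following is a minimal sketch of that verification, not the project's actual webhook guard: the function name `verifyShopifyHmac` and the sample values are hypothetical. It uses Node's `crypto.timingSafeEqual` to avoid leaking information through comparison timing, and it signs the raw body because re-serializing parsed JSON can alter whitespace or key order.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

/**
 * Returns true when `receivedHmac` (base64) matches the HMAC-SHA256 of the
 * raw request body. `verifyShopifyHmac` is an illustrative name.
 */
function verifyShopifyHmac(
  rawBody: string | Buffer,
  receivedHmac: string,
  secret: string,
): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(receivedHmac, 'base64');
  // timingSafeEqual throws when lengths differ, so guard first.
  if (received.length !== expected.length) {
    return false;
  }
  return timingSafeEqual(received, expected);
}

// Usage: sign a body the same way calculateHmac does, then verify it.
const secret = 'test-secret';
const body = JSON.stringify({ id: 1001, email: 'test@example.com' });
const signature = createHmac('sha256', secret).update(body).digest('base64');

const validSignatureAccepted = verifyShopifyHmac(body, signature, secret);
const tamperedBodyRejected = !verifyShopifyHmac(body + ' ', signature, secret);
```

In the integration tests this matters because the HMAC sent by the test and the bytes the server hashes must be identical; if the server parses and re-stringifies the body before hashing, the check can fail for payloads that round-trip differently.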

3. Load Testing with K6

Create load-tests/k6/scenarios/order-throughput.js:

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

// Custom metrics
const orderCreationRate = new Rate('order_creation_success');
const orderCreationDuration = new Trend('order_creation_duration');

// Test configuration
export const options = {
  scenarios: {
    // Simulate 500 orders/day = ~21 orders/hour = ~0.35 orders/minute
    // But we want to test burst capacity too
    sustained_load: {
      executor: 'constant-arrival-rate',
      rate: 1, // 1 order per second (3600/hour for stress test)
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 10,
      maxVUs: 50,
    },
    spike_test: {
      executor: 'ramping-arrival-rate',
      startRate: 0,
      timeUnit: '1s',
      preAllocatedVUs: 50,
      maxVUs: 100,
      stages: [
        { target: 10, duration: '30s' }, // Ramp up to 10/s
        { target: 10, duration: '1m' },  // Hold at 10/s
        { target: 50, duration: '30s' }, // Spike to 50/s
        { target: 50, duration: '30s' }, // Hold spike
        { target: 0, duration: '30s' },  // Ramp down
      ],
      startTime: '6m', // Start after sustained load
    },
  },
  thresholds: {
    'http_req_duration': ['p(95)<2000'], // 95th percentile < 2s
    'http_req_failed': ['rate<0.01'],    // Error rate < 1%
    'order_creation_success': ['rate>0.99'], // 99% success rate
  },
};

const BASE_URL = __ENV.API_URL || 'http://localhost:3000';
const API_KEY = __ENV.API_KEY || 'test-api-key';

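// Read-only API simulation: no orders are created in this scenario.
// NOTE: orderCreationRate and orderCreationDuration are declared above but never
// recorded here, so the order_creation_success threshold has no samples to
// evaluate until a webhook POST step adds them.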
export default function () {
  // Test 1: Health check
  const healthRes = http.get(`${BASE_URL}/health`);
  check(healthRes, {
    'health check status is 200': (r) => r.status === 200,
  });

  // Test 2: Orders list (dashboard simulation)
  const ordersRes = http.get(`${BASE_URL}/api/v1/orders?page=1&pageSize=20`, {
    headers: { 'X-API-Key': API_KEY },
  });
  check(ordersRes, {
    'orders list status is 200': (r) => r.status === 200,
    'orders list has data': (r) => {
      try {
        return Array.isArray(JSON.parse(r.body).orders);
      } catch (e) {
        return false; // a non-JSON body (e.g. an error page) counts as a failed check
      }
    },
  });

  // Test 3: Single order detail
  const orderId = getRandomOrderId();
  if (orderId) {
    const orderDetailRes = http.get(`${BASE_URL}/api/v1/orders/${orderId}`, {
      headers: { 'X-API-Key': API_KEY },
    });
    check(orderDetailRes, {
      'order detail status is 200 or 404': (r) => [200, 404].includes(r.status),
    });
  }

  // Test 4: Shipping methods (unauthenticated)
  const shippingRes = http.get(`${BASE_URL}/api/v1/shipping/methods?country=BE`);
  check(shippingRes, {
    'shipping methods status is 200': (r) => r.status === 200,
  });

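  // No sleep() here: the arrival-rate executors already pace iterations
  // through `rate` and `timeUnit`; sleeping would only keep VUs busy.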
}

function getRandomOrderId() {
  // In real test, fetch from a pool of known order IDs
  // For now, return null to skip order detail test
  return null;
}
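
The rates in the scenario comments can be checked with a few lines of arithmetic. The sketch below only restates numbers already present in the scenario (500 orders/day, 1 iteration/s sustained, 50 iterations/s at the spike); the `headroom` helper is illustrative. Note that one k6 iteration here issues several read requests rather than creating an order, so the factors describe request pressure relative to the average order rate, not order throughput itself.

```typescript
// Convert the daily order target into average rates.
const ORDERS_PER_DAY = 500;

const ordersPerHour = ORDERS_PER_DAY / 24;     // about 20.8
const ordersPerMinute = ordersPerHour / 60;    // about 0.35
const ordersPerSecond = ordersPerMinute / 60;  // about 0.0058

// Arrival rates configured in the k6 scenarios (iterations per second).
const SUSTAINED_RATE_PER_SECOND = 1;
const SPIKE_RATE_PER_SECOND = 50;

// How many times the average production rate a given test rate represents.
function headroom(testRatePerSecond: number): number {
  return testRatePerSecond / ordersPerSecond;
}

console.log(`average: ${ordersPerHour.toFixed(1)}/h, ${ordersPerMinute.toFixed(2)}/min`);
console.log(`sustained load: ~${Math.round(headroom(SUSTAINED_RATE_PER_SECOND))}x average`);
console.log(`spike load: ~${Math.round(headroom(SPIKE_RATE_PER_SECOND))}x average`);
```

The sustained scenario therefore runs at well over a hundred times the average daily rate, which is why it is labeled a stress test in the scenario comments.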

Create load-tests/k6/config.js:

export const environments = {
  local: {
    baseUrl: 'http://localhost:3000',
    apiKey: 'dev-api-key',
  },
  staging: {
    baseUrl: 'https://staging-connect-api.forma3d.be',
    apiKey: '__STAGING_API_KEY__',
  },
};

Add load test scripts to package.json:

{
  "scripts": {
    "load-test:local": "k6 run --env API_URL=http://localhost:3000 load-tests/k6/scenarios/order-throughput.js",
    "load-test:staging": "k6 run --env API_URL=https://staging-connect-api.forma3d.be load-tests/k6/scenarios/order-throughput.js"
  }
}

4. E2E Critical Path Tests

Create apps/api/test/e2e/critical-paths.spec.ts:

/**
 * E2E Critical Path Tests
 * 
 * These tests verify the complete automation flow works end-to-end
 * against a real database (test environment).
 */
import { Test, TestingModule } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import { AppModule } from '../../src/app.module';

describe('Critical Paths E2E', () => {
  let app: INestApplication;

  beforeAll(async () => {
    const moduleRef: TestingModule = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();

    app = moduleRef.createNestApplication();
    await app.init();
  });

  afterAll(async () => {
    await app.close();
  });

  describe('Order → Print Job → Fulfillment Path', () => {
    it('should complete full automation cycle', async () => {
      // This is a placeholder for a full E2E test
      // In a real scenario, this would:
      // 1. Create a Shopify order via webhook
      // 2. Verify print jobs are created
      // 3. Simulate print completion
      // 4. Verify shipping label is generated
      // 5. Verify Shopify fulfillment is created
      expect(true).toBe(true);
    });
  });

  describe('Cancellation Path', () => {
    it('should handle cancellation at any stage', async () => {
      expect(true).toBe(true);
    });
  });

  describe('Error Recovery Path', () => {
    it('should recover from transient failures', async () => {
      expect(true).toBe(true);
    });
  });
});
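
When the placeholders above are replaced with real assertions, the event-driven flow means state changes arrive asynchronously, so asserting immediately after a webhook call tends to be flaky. A small polling helper is one common way to handle this. The sketch below is an assumption about how such a helper could look; `waitFor` and its option names are hypothetical and not part of the existing codebase.

```typescript
/**
 * Polls `probe` until it returns a non-null value or the timeout elapses.
 * Intended for state that is updated by asynchronous event handlers.
 */
async function waitFor<T>(
  probe: () => Promise<T | null | undefined>,
  options: { timeoutMs?: number; intervalMs?: number } = {},
): Promise<T> {
  const timeoutMs = options.timeoutMs ?? 5000;
  const intervalMs = options.intervalMs ?? 100;
  const deadline = Date.now() + timeoutMs;

  for (;;) {
    const value = await probe();
    if (value !== null && value !== undefined) {
      return value;
    }
    if (Date.now() >= deadline) {
      throw new Error(`waitFor: condition not met within ${timeoutMs}ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Usage inside a test (identifiers are illustrative):
// const completed = await waitFor(async () => {
//   const order = await prisma.order.findUnique({ where: { id: orderId } });
//   return order?.status === OrderStatus.COMPLETED ? order : null;
// });
```

Keeping the timeout explicit makes a stuck flow fail with a clear message instead of hanging until the test runner's global timeout.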

🔧 Feature F6.1b: Enhanced Business Observability

Overview

The BusinessObservabilityService provides comprehensive logging for business events, state transitions, and automation flow tracking. This service integrates with Sentry for structured business metrics and the EventLogService for persistent audit trails.

Key Features

State Transition Logging
Tracks old→new state with timing for orders, print jobs, shipments, and fulfillments
Includes correlation IDs for distributed tracing
Persists to EventLog for audit trail
Flow Milestone Tracking
Tracks order automation cycle from receipt to fulfillment
Measures elapsed time between milestones
Records flow completion/failure with total duration
Sentry Business Integration
Sets order/print job context for better error correlation
Adds breadcrumbs for state transitions
Captures flow completion/failure as Sentry events

Available Milestones

Milestone	Trigger Point
`order_received`	Flow starts (startFlow called)
`order_validated`	Order found and validated
`print_jobs_created`	Print jobs created for all line items
`all_jobs_printing`	All jobs transitioned to PRINTING
`all_jobs_completed`	All print jobs completed
`shipping_label_created`	Sendcloud label generated
`fulfillment_created`	Shopify fulfillment created
`flow_completed`	Full automation cycle completed
`flow_failed`	Automation failed at any point

Usage Example

// Start tracking a new order flow
this.businessObservability.startFlow(orderId);

// Set Sentry context
this.businessObservability.setOrderContext({
  id: order.id,
  shopifyOrderId: order.shopifyOrderId,
  shopifyOrderNumber: order.shopifyOrderNumber,
  status: order.status,
});

// Log state transition
await this.businessObservability.logStateTransition({
  entityType: 'order',
  entityId: orderId,
  orderId,
  previousState: OrderStatus.PENDING,
  newState: OrderStatus.PROCESSING,
  trigger: 'webhook_received',
});

// Record milestone
await this.businessObservability.recordMilestone({
  orderId,
  milestone: 'print_jobs_created',
  metadata: { jobCount: 3 },
});

// Complete flow (called automatically when flow_completed milestone is recorded)
await this.businessObservability.recordMilestone({
  orderId,
  milestone: 'flow_completed',
});

Log Output Examples

State Transition:

{
  "message": "[STATE CHANGE] order:order-123 PENDING → PROCESSING",
  "correlationId": "abc-123",
  "entityType": "order",
  "entityId": "order-123",
  "previousState": "PENDING",
  "newState": "PROCESSING",
  "trigger": "webhook_received"
}

Flow Completion:

{
  "message": "[FLOW COMPLETED] Order order-123 automation completed successfully",
  "correlationId": "abc-123",
  "orderId": "order-123",
  "success": true,
  "totalDurationMs": 125000,
  "totalDurationMinutes": 2.1,
  "milestones": {
    "order_received": 0,
    "order_validated": 50,
    "print_jobs_created": 200,
    "all_jobs_completed": 120000,
    "shipping_label_created": 122000,
    "fulfillment_created": 125000
  }
}
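
The `milestones` map in the flow completion log holds each milestone's offset in milliseconds from the start of the flow. The sketch below shows one way such offsets and the rounded minute total could be derived. It is not the implementation of `BusinessObservabilityService`; the `FlowTimeline` class and its injectable clock are hypothetical and exist only to illustrate the calculation.

```typescript
type Milestone =
  | 'order_received'
  | 'order_validated'
  | 'print_jobs_created'
  | 'all_jobs_printing'
  | 'all_jobs_completed'
  | 'shipping_label_created'
  | 'fulfillment_created';

/** Records milestone offsets (ms since flow start) for a single order. */
class FlowTimeline {
  private readonly now: () => number;
  private readonly startedAt: number;
  private readonly offsets = new Map<Milestone, number>();

  // `now` is injectable so tests can control the clock.
  constructor(now: () => number = Date.now) {
    this.now = now;
    this.startedAt = now();
    this.offsets.set('order_received', 0);
  }

  record(milestone: Milestone): void {
    this.offsets.set(milestone, this.now() - this.startedAt);
  }

  summary(): { totalDurationMs: number; totalDurationMinutes: number; milestones: Record<string, number> } {
    const milestones: Record<string, number> = {};
    let totalDurationMs = 0;
    this.offsets.forEach((offset, name) => {
      milestones[name] = offset;
      totalDurationMs = Math.max(totalDurationMs, offset);
    });
    // Minutes rounded to one decimal place, matching the log example.
    const totalDurationMinutes = Math.round(totalDurationMs / 6000) / 10;
    return { totalDurationMs, totalDurationMinutes, milestones };
  }
}

// Usage with a fake clock that reproduces the log example's final offset.
let clock = 0;
const timeline = new FlowTimeline(() => clock);
clock = 50;
timeline.record('order_validated');
clock = 125000;
timeline.record('fulfillment_created');
const summary = timeline.summary();
```

Storing offsets rather than absolute timestamps keeps the log entry self-describing: the last milestone's offset equals the total duration.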

🔧 Feature F6.2: Monitoring and Alerting

Requirements Reference

NFR-AV-001: System Uptime (99%)
NFR-PE-003: Processing Latency (< 2 minutes)
Health check endpoints operational

Implementation

1. Enhanced Health Indicators

Create apps/api/src/health/indicators/shopify.indicator.ts:

import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { ShopifyApiClient } from '../../shopify/shopify-api.client';

@Injectable()
export class ShopifyHealthIndicator extends HealthIndicator {
  constructor(private readonly shopifyClient: ShopifyApiClient) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    let isConnected = false;
    let detail = 'Unable to connect to Shopify API';

    try {
      // Make a lightweight API call to verify connectivity
      isConnected = await this.shopifyClient.ping();
    } catch (error) {
      // Catch variables are `unknown` under strict TypeScript settings
      detail = error instanceof Error ? error.message : String(error);
    }

    if (isConnected) {
      return this.getStatus(key, true);
    }

    // Thrown outside the try block so it is not caught and re-wrapped
    throw new HealthCheckError(
      'Shopify API check failed',
      this.getStatus(key, false, { message: detail })
    );
  }
}

Create apps/api/src/health/indicators/simplyprint.indicator.ts:

import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { SimplyPrintApiClient } from '../../simplyprint/simplyprint-api.client';

@Injectable()
export class SimplyPrintHealthIndicator extends HealthIndicator {
  constructor(private readonly simplyPrintClient: SimplyPrintApiClient) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    let isConnected = false;
    let detail = 'Unable to connect to SimplyPrint API';

    try {
      isConnected = await this.simplyPrintClient.ping();
    } catch (error) {
      // Catch variables are `unknown` under strict TypeScript settings
      detail = error instanceof Error ? error.message : String(error);
    }

    if (isConnected) {
      return this.getStatus(key, true);
    }

    // Thrown outside the try block so it is not caught and re-wrapped
    throw new HealthCheckError(
      'SimplyPrint API check failed',
      this.getStatus(key, false, { message: detail })
    );
  }
}

Create apps/api/src/health/indicators/sendcloud.indicator.ts:

import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { SendcloudApiClient } from '../../sendcloud/sendcloud-api.client';

@Injectable()
export class SendcloudHealthIndicator extends HealthIndicator {
  constructor(private readonly sendcloudClient: SendcloudApiClient) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    if (!this.sendcloudClient.isShippingEnabled()) {
      return this.getStatus(key, true, { status: 'disabled' });
    }

    try {
      // Get shipping methods as a health check
      await this.sendcloudClient.getShippingMethods();
      return this.getStatus(key, true);
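    } catch (error) {
      // Catch variables are `unknown` under strict TypeScript settings
      const message = error instanceof Error ? error.message : String(error);
      throw new HealthCheckError(
        'Sendcloud API check failed',
        this.getStatus(key, false, { error: message })
      );
    }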
  }
}
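
Because these indicators call third-party APIs, a slow upstream could make the health endpoint itself slow. One option is to bound each ping with a timeout. The helper below is a sketch under that assumption; `withTimeout` is a hypothetical utility, and the 3000 ms value in the usage comment is an arbitrary illustration rather than a project requirement.

```typescript
/**
 * Rejects if `operation` has not settled within `timeoutMs`.
 * The underlying request is not cancelled; only the wait is bounded.
 */
async function withTimeout<T>(
  operation: Promise<T>,
  timeoutMs: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });

  try {
    // Whichever settles first wins: the real operation or the timeout.
    return await Promise.race([operation, timeout]);
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}

// Usage inside an indicator (illustrative):
// const isConnected = await withTimeout(this.shopifyClient.ping(), 3000, 'Shopify ping');
```

A timeout rejection surfaces through the indicator's existing catch path, so the dependency is reported as unhealthy rather than stalling the probe.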

2. Update Health Controller

Update apps/api/src/health/health.controller.ts:

import { Controller, Get } from '@nestjs/common';
import { ApiTags, ApiOperation, ApiResponse } from '@nestjs/swagger';
import {
  HealthCheckService,
  HealthCheck,
  PrismaHealthIndicator,
  HealthCheckResult,
} from '@nestjs/terminus';
import { PrismaService } from '../database/prisma.service';
import { ShopifyHealthIndicator } from './indicators/shopify.indicator';
import { SimplyPrintHealthIndicator } from './indicators/simplyprint.indicator';
import { SendcloudHealthIndicator } from './indicators/sendcloud.indicator';

@ApiTags('Health')
@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly prismaHealth: PrismaHealthIndicator,
    private readonly prisma: PrismaService,
    private readonly shopifyHealth: ShopifyHealthIndicator,
    private readonly simplyPrintHealth: SimplyPrintHealthIndicator,
    private readonly sendcloudHealth: SendcloudHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  @ApiOperation({ summary: 'Full health check with all dependencies' })
  @ApiResponse({ status: 200, description: 'System is healthy' })
  @ApiResponse({ status: 503, description: 'System is unhealthy' })
  async check(): Promise<HealthCheckResult> {
    return this.health.check([
      // Database
      () => this.prismaHealth.pingCheck('database', this.prisma),
      // External services
      () => this.shopifyHealth.isHealthy('shopify'),
      () => this.simplyPrintHealth.isHealthy('simplyprint'),
      () => this.sendcloudHealth.isHealthy('sendcloud'),
    ]);
  }

  @Get('live')
  @ApiOperation({ summary: 'Liveness probe - is the process running?' })
  @ApiResponse({ status: 200, description: 'Process is alive' })
  async liveness(): Promise<{ status: string; timestamp: string }> {
    return {
      status: 'ok',
      timestamp: new Date().toISOString(),
    };
  }

  @Get('ready')
  @HealthCheck()
  @ApiOperation({ summary: 'Readiness probe - is the service ready to accept traffic?' })
  @ApiResponse({ status: 200, description: 'Service is ready' })
  @ApiResponse({ status: 503, description: 'Service is not ready' })
  async readiness(): Promise<HealthCheckResult> {
    return this.health.check([
      () => this.prismaHealth.pingCheck('database', this.prisma),
    ]);
  }

  @Get('dependencies')
  @HealthCheck()
  @ApiOperation({ summary: 'Check all external service dependencies' })
  @ApiResponse({ status: 200, description: 'All dependencies healthy' })
  @ApiResponse({ status: 503, description: 'One or more dependencies unhealthy' })
  async dependencies(): Promise<HealthCheckResult> {
    return this.health.check([
      () => this.shopifyHealth.isHealthy('shopify'),
      () => this.simplyPrintHealth.isHealthy('simplyprint'),
      () => this.sendcloudHealth.isHealthy('sendcloud'),
    ]);
  }
}

3. Alerting Rules

Create deployment/monitoring/alerting-rules.yml:

# Alerting Rules for Forma3D.Connect
# Configure in your monitoring system (e.g., Prometheus Alertmanager or Datadog)

groups:
  - name: forma3d-connect
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is above 5% for the last 5 minutes"
          runbook_url: "docs/04-development/runbook.md#high-error-rate"

      # API latency
      - alert: HighAPILatency
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "95th percentile latency is above 2 seconds"
          runbook_url: "docs/04-development/runbook.md#high-latency"

      # Database connection issues
      - alert: DatabaseConnectionFailed
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection failed"
          description: "Cannot connect to PostgreSQL database"
          runbook_url: "docs/04-development/runbook.md#database-connection"

      # External service failures
      - alert: ShopifyAPIDown
        expr: shopify_api_up == 0
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "Shopify API unreachable"
          description: "Cannot connect to Shopify API for 5 minutes"
          runbook_url: "docs/04-development/runbook.md#shopify-down"

      - alert: SimplyPrintAPIDown
        expr: simplyprint_api_up == 0
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "SimplyPrint API unreachable"
          description: "Cannot connect to SimplyPrint API for 5 minutes"
          runbook_url: "docs/04-development/runbook.md#simplyprint-down"

      # Order processing stuck
      - alert: OrdersStuckInProcessing
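        # Placeholder metric: assumes the API exports each order's time in its current status (in minutes), labelled by status.
        # PromQL label matchers cannot compare numeric values, so the comparison is applied to the sample value instead.
        expr: count(orders_status_age_minutes{status="processing"} > 60) > 5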
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Orders stuck in processing"
          description: "Multiple orders have been in processing state for over 60 minutes"
          runbook_url: "docs/04-development/runbook.md#stuck-orders"

      # Retry queue growing
      - alert: RetryQueueBacklog
        expr: retry_queue_size > 50
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Retry queue backlog growing"
          description: "More than 50 items in retry queue for over 15 minutes"
          runbook_url: "docs/04-development/runbook.md#retry-queue-backlog"
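The `OrdersStuckInProcessing` rule needs the application to export a number that the monitoring system can compare against a threshold. A minimal sketch of that calculation, assuming a hypothetical `OrderSnapshot` shape and a helper named `countStuckOrders` (neither is taken from the Forma3D.Connect codebase):

```typescript
// Hypothetical shape: only the fields needed to decide whether an order is stuck.
interface OrderSnapshot {
  id: string;
  status: 'pending' | 'processing' | 'completed' | 'failed';
  statusChangedAt: Date; // when the order entered its current status
}

// Count orders that have been in "processing" longer than the threshold.
function countStuckOrders(
  orders: OrderSnapshot[],
  now: Date,
  thresholdMinutes: number = 60,
): number {
  const thresholdMs = thresholdMinutes * 60000;
  return orders.filter(
    (order) =>
      order.status === 'processing' &&
      now.getTime() - order.statusChangedAt.getTime() > thresholdMs,
  ).length;
}

// Example data: one order has been processing for 90 minutes, one for 30 minutes.
const evaluatedAt = new Date('2025-01-01T12:00:00Z');
const sampleOrders: OrderSnapshot[] = [
  { id: 'a', status: 'processing', statusChangedAt: new Date('2025-01-01T10:30:00Z') },
  { id: 'b', status: 'processing', statusChangedAt: new Date('2025-01-01T11:30:00Z') },
  { id: 'c', status: 'completed', statusChangedAt: new Date('2025-01-01T09:00:00Z') },
];

const stuckCount = countStuckOrders(sampleOrders, evaluatedAt);
console.log(stuckCount);
```

The returned count could be published as a gauge on each scrape. How the metric is named and exported depends on the monitoring stack in use.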

4. Operations Runbook¶

Create docs/04-development/runbook.md:

# Forma3D.Connect Operations Runbook

## Overview

This runbook provides procedures for operating and troubleshooting Forma3D.Connect in production.

## Table of Contents

1. [Service Architecture](#service-architecture)
2. [Health Checks](#health-checks)
3. [Common Issues and Resolutions](#common-issues)
4. [Incident Response](#incident-response)
5. [Maintenance Procedures](#maintenance-procedures)

---

## Service Architecture

### Components

| Component | URL | Purpose |
|-----------|-----|---------|
| API | `https://connect-api.forma3d.be` | Backend NestJS application |
| Web | `https://connect.forma3d.be` | React dashboard |
| Database | PostgreSQL (managed) | Data persistence |
| Traefik | Internal | Reverse proxy with TLS |

### External Dependencies

| Service | Purpose | Documentation |
|---------|---------|---------------|
| Shopify | E-commerce platform | [Shopify API Docs](https://shopify.dev/docs/api) |
| SimplyPrint | 3D print management | [SimplyPrint API](https://simplyprint.io/docs) |
| Sendcloud | Shipping labels | [Sendcloud API](https://api.sendcloud.dev) |
| Sentry | Error monitoring | [Sentry Dashboard](https://sentry.io) |

---

## Health Checks

### Endpoints

```bash
# Full health check
curl https://connect-api.forma3d.be/health

# Liveness probe
curl https://connect-api.forma3d.be/health/live

# Readiness probe
curl https://connect-api.forma3d.be/health/ready

# External dependencies
curl https://connect-api.forma3d.be/health/dependencies
```

Expected Responses¶

Healthy:

{
  "status": "ok",
  "info": {
    "database": { "status": "up" },
    "shopify": { "status": "up" },
    "simplyprint": { "status": "up" },
    "sendcloud": { "status": "up" }
  }
}

Unhealthy (example):

{
  "status": "error",
  "error": {
    "shopify": {
      "status": "down",
      "error": "Connection timeout"
    }
  }
}
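An on-call script could reduce these payloads to just the failing dependencies. A minimal sketch, assuming the response always follows the two shapes shown above; `HealthResponse` and `describeFailures` are illustrative names, not part of the API:

```typescript
// Status entry for a single indicator, as it appears in the example payloads.
interface IndicatorStatus {
  status: 'up' | 'down';
  error?: string;
}

// Assumed envelope: "info" lists healthy indicators, "error" lists failing ones.
interface HealthResponse {
  status: 'ok' | 'error';
  info?: Record<string, IndicatorStatus>;
  error?: Record<string, IndicatorStatus>;
}

// Return one line per failing dependency, e.g. "shopify: Connection timeout".
function describeFailures(response: HealthResponse): string[] {
  return Object.entries(response.error ?? {}).map(
    ([name, indicator]) => `${name}: ${indicator.error ?? 'no error message'}`,
  );
}

const unhealthyResponse: HealthResponse = JSON.parse(`{
  "status": "error",
  "error": { "shopify": { "status": "down", "error": "Connection timeout" } }
}`);

const healthyResponse: HealthResponse = JSON.parse(`{
  "status": "ok",
  "info": { "database": { "status": "up" } }
}`);

const failures = describeFailures(unhealthyResponse);
console.log(failures);
```

For the healthy payload the function returns an empty list, so a script can treat "no lines" as "nothing to escalate".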

Common Issues¶

High Error Rate¶

Symptoms: Error rate > 5%, Sentry alerts

Investigation:

1. Check Sentry for error patterns
2. Review API logs: `docker logs forma3d-api --tail 100`
3. Check database connectivity
4. Verify external service status

Resolution:

1. If database issue: Restart database connection pool
2. If external service: Enable fallback/degraded mode
3. If code bug: Deploy hotfix

High Latency¶

Symptoms: 95th percentile response time > 2s

Investigation:

1. Check database query performance
2. Review slow query logs
3. Check external API response times
4. Monitor memory/CPU usage

Resolution:

1. Scale up resources if needed
2. Optimize slow queries
3. Add caching if appropriate

Database Connection¶

Symptoms: Database health check failing

Investigation:

1. Check PostgreSQL status
2. Verify connection string
3. Check network connectivity
4. Review connection pool settings

Resolution:

1. Restart API service: `docker-compose restart api`
2. Check database credentials
3. Contact database provider if managed

Shopify Down¶

Symptoms: Webhooks not being received, fulfillments failing

Investigation:

1. Check Shopify status page
2. Verify webhook configuration in Shopify admin
3. Review API logs for errors

Resolution:

1. Wait for Shopify to recover
2. Orders will be retried via retry queue
3. Manual reprocessing if needed

SimplyPrint Down¶

Symptoms: Print jobs not being created

Investigation:

1. Check SimplyPrint API status
2. Verify API credentials
3. Check SimplyPrint dashboard

Resolution:

1. Wait for service recovery
2. Failed jobs will be retried automatically
3. Manual retry: `POST /api/v1/print-jobs/{id}/retry`

Stuck Orders¶

Symptoms: Orders in PROCESSING state for > 60 minutes

Investigation:

1. Check print job status in dashboard
2. Verify SimplyPrint job status
3. Check for failed webhooks

Resolution:

1. Force refresh print job status
2. Manually update order status if needed
3. Contact SimplyPrint support if print issues

Retry Queue Backlog¶

Symptoms: > 50 items in retry queue

Investigation:

1. Check retry queue: `GET /api/v1/admin/retry-queue`
2. Identify failing job types
3. Check error messages

Resolution:

1. Fix underlying issue causing failures
2. Clear old/stale entries if safe
3. Increase retry queue processing capacity

Incident Response¶

Severity Levels¶

| Level | Description | Response Time | Examples |
|-------|-------------|---------------|----------|
| P1 - Critical | Complete service outage | 15 minutes | Database down, API unresponsive |
| P2 - High | Major feature broken | 1 hour | Webhooks failing, fulfillments stuck |
| P3 - Medium | Degraded performance | 4 hours | High latency, intermittent errors |
| P4 - Low | Minor issue | 1 business day | UI bugs, documentation issues |

Incident Template¶

## Incident Report

**Date:** YYYY-MM-DD
**Severity:** P1/P2/P3/P4
**Duration:** HH:MM - HH:MM
**Impact:** [Description of user impact]

### Timeline
- HH:MM - Issue detected
- HH:MM - Investigation started
- HH:MM - Root cause identified
- HH:MM - Fix deployed
- HH:MM - Issue resolved

### Root Cause
[Description of what caused the issue]

### Resolution
[What was done to fix it]

### Prevention
[What will be done to prevent recurrence]

Maintenance Procedures¶

Deploying Updates¶

# Pull latest changes
git pull origin main

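# Build and push images (the local tag must match the registry path being pushed)
docker build -t registry.digitalocean.com/forma3d/api:latest apps/api
docker push registry.digitalocean.com/forma3d/api:latest

# Recreate only the API container, leaving its dependencies running
docker-compose up -d --no-deps api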

Database Migrations¶

# Run migrations
pnpm prisma migrate deploy

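# Mark a failed migration as rolled back in the migration history (if needed).
# This does not revert schema changes; undo those manually or restore from a backup.
pnpm prisma migrate resolve --rolled-back MIGRATION_NAME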

Log Rotation¶

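Docker rotates container logs only when the logging driver is configured with limits (for example `max-size` and `max-file` for the `json-file` driver). Manual cleanup:

# Truncate the API container's log file (json-file driver; requires root on the host)
sudo truncate -s 0 "$(docker inspect --format='{{.LogPath}}' forma3d-api)"

# Avoid `docker system prune --volumes` for log cleanup:
# it removes unused volumes and does not truncate logs of running containers.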

Backup Procedures¶

Database backups are handled by the managed PostgreSQL provider.

Manual backup:

pg_dump "$DATABASE_URL" > "backup_$(date +%Y%m%d).sql"

Keys & Certificates Inventory¶

IMPORTANT: Maintain an up-to-date inventory of all API keys, secrets, and certificates.

Create and maintain docs/04-development/keys-certificates-inventory.md:

| Key/Certificate | Purpose | Lifespan | Renewal Location | Renewal Procedure | Last Renewed |
|-----------------|---------|----------|------------------|-------------------|--------------|
| **INFRASTRUCTURE** | | | | | |
| Droplet SSH Key | SSH access to server | No expiry (rotate annually) | DigitalOcean → Settings → Security → SSH Keys | Generate new keypair, add to DO, update `~/.ssh/authorized_keys` on droplet | YYYY-MM-DD |
| Droplet Root Password | Emergency console access | No expiry (rotate annually) | DigitalOcean → Droplet → Access → Reset Root Password | Reset via DO console, store in password manager | YYYY-MM-DD |
| TLS Certificate (Let's Encrypt) | HTTPS for API/Web | 90 days | Let's Encrypt via Traefik ACME | Auto-renewed by Traefik (see note below) | Auto |
| **DATABASE** | | | | | |
| Database CA Certificate | SSL connection to managed DB | 1-5 years (provider-managed) | DigitalOcean → Databases → Your DB → Connection Details → Download CA | Download new CA cert, update `?sslmode=require&sslrootcert=` path | YYYY-MM-DD |
| Database Password | PostgreSQL access | No expiry (rotate quarterly) | DigitalOcean → Databases → Your DB → Users | Reset via provider, update `DATABASE_URL` | YYYY-MM-DD |
| **CONTAINER REGISTRY** | | | | | |
| Container Registry Token | Push/pull Docker images | No expiry | DigitalOcean → Container Registry → API | Generate new token, update CI/CD variables | YYYY-MM-DD |
| Cosign Signing Key | Container image signing | No expiry | Self-generated | `cosign generate-key-pair`, update CI/CD secrets | YYYY-MM-DD |
| **EXTERNAL SERVICES** | | | | | |
| Shopify API Key | Shopify Admin API access | No expiry | Shopify Admin → Apps → Your App | Regenerate in Shopify admin, update `.env` | YYYY-MM-DD |
| Shopify API Secret | App authentication | No expiry | Shopify Admin → Apps → Your App | Regenerate in Shopify admin, update `.env` | YYYY-MM-DD |
| Shopify Access Token | Store-specific access | No expiry (unless revoked) | Shopify Admin → Apps | Reinstall app or regenerate | YYYY-MM-DD |
| Shopify Webhook Secret | HMAC verification | No expiry | Shopify Admin → Notifications → Webhooks | Regenerate in webhooks settings | YYYY-MM-DD |
| SimplyPrint API Key | Print farm API access | No expiry | SimplyPrint Dashboard → API Settings | Generate new key in dashboard | YYYY-MM-DD |
| SimplyPrint Webhook Token | Webhook verification | No expiry | SimplyPrint Dashboard → Webhooks | Configure in webhook settings | YYYY-MM-DD |
| Sendcloud Public Key | Shipping API authentication | No expiry | Sendcloud Panel → Settings → API | Generate new integration | YYYY-MM-DD |
| Sendcloud Secret Key | Shipping API authentication | No expiry | Sendcloud Panel → Settings → API | Generate new integration | YYYY-MM-DD |
| **APPLICATION** | | | | | |
| API_KEY (internal) | Dashboard/Admin access | No expiry (rotate annually) | Self-generated | Generate new UUID, update `.env` | YYYY-MM-DD |
| Sentry DSN | Error tracking | No expiry | Sentry Dashboard → Project Settings | Create new project if needed | YYYY-MM-DD |
| SMTP Credentials | Email notifications | Varies by provider | Email provider dashboard | Regenerate password/API key | YYYY-MM-DD |
| **CI/CD (Azure DevOps)** | | | | | |
| Azure DevOps PAT | Pipeline authentication | 1 year max | Azure DevOps → User Settings → Personal Access Tokens | Generate new PAT, update service connections | YYYY-MM-DD |
| Service Connection (SSH) | Deploy to droplet | Tied to SSH key | Azure DevOps → Project Settings → Service Connections | Update with new SSH private key | YYYY-MM-DD |
| Pipeline Variables | Secrets in pipelines | No expiry | Azure DevOps → Pipelines → Library → Variable Groups | Update individual variables as needed | YYYY-MM-DD |

Let's Encrypt TLS Certificate Auto-Renewal:

Traefik automatically handles Let's Encrypt certificate renewal. Key details:

- Validity: 90 days per certificate
- Auto-renewal: Traefik renews ~30 days before expiry (no manual action required)
- Storage: Certificates stored in Docker volume traefik-certs at /letsencrypt/acme.json
- Challenge: HTTP-01 challenge via port 80 (must remain accessible)
- Configuration: See deployment/staging/traefik.yml → certificatesResolvers.letsencrypt

# traefik.yml - ACME auto-renewal configuration
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@forma3d.be          # Expiry notifications sent here
      storage: /letsencrypt/acme.json  # Persisted in Docker volume
      httpChallenge:
        entryPoint: web                # Port 80 for challenge

Monitoring TLS auto-renewal:

# Check certificate expiry date
echo | openssl s_client -connect staging-connect-api.forma3d.be:443 2>/dev/null | openssl x509 -noout -dates

# Check Traefik logs for renewal activity
docker logs forma3d-traefik 2>&1 | grep -i "acme\|certificate\|renew"

# Verify ACME storage file exists
docker exec forma3d-traefik ls -la /letsencrypt/acme.json

Troubleshooting failed renewal:

1. Ensure port 80 is open and reachable from internet
2. Check DNS still points to correct IP
3. Verify Traefik container is running: `docker ps | grep traefik`
4. Check Traefik logs for ACME errors
5. If needed, remove acme.json and restart Traefik to re-issue certificates

Renewal Calendar:

- Weekly (automated): TLS certificates checked by Traefik, renewed if within 30 days of expiry
- Monthly: Verify TLS auto-renewal is working (check cert dates)
- Quarterly: Rotate database password, review API key usage
- Annually: Rotate SSH keys, Azure DevOps PAT, internal API_KEY, review all external API keys
- Before Expiry: Database CA certificate (monitor provider notifications), Azure DevOps PAT
- On Incident: Rotate any potentially compromised credentials immediately

Monitoring Expiry:

- Set calendar reminders for annually-rotated credentials
- Subscribe to provider notifications (DigitalOcean, Azure DevOps)
- Let's Encrypt sends expiry warnings to admin@forma3d.be (but Traefik should renew automatically)
- Database CA cert expiry: `openssl x509 -enddate -noout -in ca-certificate.crt`
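The monthly certificate check comes down to simple date arithmetic: a certificate that is already inside the ~30-day renewal window suggests auto-renewal did not run. A minimal sketch, assuming the `notAfter` date has already been read from the certificate; the helper names `daysUntilExpiry` and `isInsideRenewalWindow` are illustrative:

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Whole days remaining until the certificate's notAfter date (negative if expired).
function daysUntilExpiry(notAfter: Date, now: Date): number {
  return Math.floor((notAfter.getTime() - now.getTime()) / MS_PER_DAY);
}

// True when the certificate is within the renewal window and should already have been renewed.
function isInsideRenewalWindow(notAfter: Date, now: Date, windowDays: number = 30): boolean {
  return daysUntilExpiry(notAfter, now) <= windowDays;
}

// Example dates (UTC) chosen for illustration only.
const checkedAt = new Date('2025-03-01T00:00:00Z');
const freshCertExpiry = new Date('2025-05-15T00:00:00Z'); // 75 days after checkedAt
const staleCertExpiry = new Date('2025-03-21T00:00:00Z'); // 20 days after checkedAt

console.log(daysUntilExpiry(freshCertExpiry, checkedAt), isInsideRenewalWindow(freshCertExpiry, checkedAt));
console.log(daysUntilExpiry(staleCertExpiry, checkedAt), isInsideRenewalWindow(staleCertExpiry, checkedAt));
```

A certificate flagged by `isInsideRenewalWindow` is a prompt to inspect the Traefik logs, not proof of a failure.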

Renewal Checklist:

1. [ ] Generate new credential in source system
2. [ ] Update .env files (staging and production)
3. [ ] Update secrets in CI/CD (Azure DevOps variables)
4. [ ] Deploy with new credentials
5. [ ] Verify system still works (health checks)
6. [ ] Revoke old credential (if applicable)
7. [ ] Update this inventory with new "Last Renewed" date

Contact Information¶

| Role | Contact | Escalation |
|------|---------|------------|
| On-call Engineer | [email/phone] | Primary |
| Tech Lead | [email/phone] | If P1/P2 |
| Database Admin | [email/phone] | Database issues |
---

## 🔧 Feature F6.3: Documentation

### Requirements Reference

- Complete README with setup instructions
- API documentation (OpenAPI/Swagger)
- Architecture documentation
- Runbook for operations
- Keys & certificates inventory (tabular overview of all API keys, secrets, certificates with lifespans and renewal procedures)

### Implementation

#### 1. Environment Setup Guide

Create `docs/04-development/environment-setup.md`:

```markdown
# Environment Setup Guide

## Prerequisites

- Node.js 20.x or higher
- pnpm 8.x or higher
- Docker and Docker Compose
- PostgreSQL 15 (or use Docker)
- Git

## Quick Start

### 1. Clone Repository

```bash
git clone https://github.com/forma3d/forma3d-connect.git
cd forma3d-connect
```

2. Install Dependencies¶

pnpm install

3. Configure Environment¶

Copy the example environment file:

cp .env.example .env

Edit .env with your configuration:

# Database
DATABASE_URL=postgresql://user:password@localhost:5432/forma3d_connect

# Shopify
SHOPIFY_SHOP_DOMAIN=your-shop.myshopify.com
SHOPIFY_API_KEY=your-api-key
SHOPIFY_API_SECRET=your-api-secret
SHOPIFY_ACCESS_TOKEN=your-access-token
SHOPIFY_WEBHOOK_SECRET=your-webhook-secret

# SimplyPrint
SIMPLYPRINT_API_URL=https://api.simplyprint.io/v1
SIMPLYPRINT_API_KEY=your-api-key

# Sendcloud (optional)
SENDCLOUD_PUBLIC_KEY=your-public-key
SENDCLOUD_SECRET_KEY=your-secret-key
SHIPPING_ENABLED=true

# Application
API_KEY=your-admin-api-key
NODE_ENV=development
PORT=3000
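Missing or blank variables are easier to diagnose at startup than at the first failed API call. A minimal sketch of such a check, assuming the variable list from the sample `.env` above and assuming the Sendcloud keys only matter when `SHIPPING_ENABLED` is `true`; `findMissingVars` is an illustrative helper, not an existing function:

```typescript
// Variables from the sample .env treated as mandatory (assumed list).
const REQUIRED_VARS: string[] = [
  'DATABASE_URL',
  'SHOPIFY_SHOP_DOMAIN',
  'SHOPIFY_API_KEY',
  'SHOPIFY_API_SECRET',
  'SHOPIFY_ACCESS_TOKEN',
  'SHOPIFY_WEBHOOK_SECRET',
  'SIMPLYPRINT_API_URL',
  'SIMPLYPRINT_API_KEY',
  'API_KEY',
];

// Sendcloud keys are treated as required only when shipping is enabled (assumption).
const SHIPPING_VARS: string[] = ['SENDCLOUD_PUBLIC_KEY', 'SENDCLOUD_SECRET_KEY'];

type Env = Record<string, string | undefined>;

function findMissingVars(env: Env): string[] {
  const required =
    env['SHIPPING_ENABLED'] === 'true' ? [...REQUIRED_VARS, ...SHIPPING_VARS] : REQUIRED_VARS;
  // Empty or whitespace-only values count as missing.
  return required.filter((name) => (env[name] ?? '').trim() === '');
}

const sampleEnv: Env = {
  DATABASE_URL: 'postgresql://user:password@localhost:5432/forma3d_connect',
  SHOPIFY_SHOP_DOMAIN: 'your-shop.myshopify.com',
  SHOPIFY_API_KEY: 'your-api-key',
  SHOPIFY_API_SECRET: 'your-api-secret',
  SHOPIFY_ACCESS_TOKEN: 'your-access-token',
  SHOPIFY_WEBHOOK_SECRET: '   ', // whitespace only, so reported as missing
  SIMPLYPRINT_API_URL: 'https://api.simplyprint.io/v1',
  SIMPLYPRINT_API_KEY: 'your-api-key',
  API_KEY: 'your-admin-api-key',
  SHIPPING_ENABLED: 'true', // Sendcloud keys intentionally left out
};

const missingVars = findMissingVars(sampleEnv);
console.log(missingVars);
```

In the application the same function could be called with `process.env` during bootstrap, failing fast when the returned list is not empty.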

4. Start Database¶

Using Docker:

docker-compose up -d postgres

Or use your local PostgreSQL installation.

5. Run Migrations¶

pnpm prisma migrate dev

6. Start Development Servers¶

# Start all services
pnpm dev

# Or start individually
pnpm nx serve api    # Backend on http://localhost:3000
pnpm nx serve web    # Frontend on http://localhost:4200

7. Verify Installation¶

# Check API health
curl http://localhost:3000/health

# Access Swagger docs
open http://localhost:3000/api/docs

# Access dashboard
open http://localhost:4200

Docker Development¶

Build and run everything in Docker:

docker-compose up -d

Testing¶

# Run all tests
pnpm test

# Run with coverage
pnpm test:coverage

# Run E2E tests
pnpm e2e

Troubleshooting¶

Database Connection Issues¶

Verify PostgreSQL is running
Check DATABASE_URL format
Ensure database exists: createdb forma3d_connect

Port Conflicts¶

If port 3000 is in use:

PORT=3001 pnpm nx serve api

Prisma Issues¶

Regenerate Prisma client:

pnpm prisma generate

Reset database:

pnpm prisma migrate reset

#### 2. Troubleshooting Guide

Create `docs/04-development/troubleshooting.md`:

```markdown
# Troubleshooting Guide

## Common Issues

### Build Errors

#### "Cannot find module '@forma3d/...'"

**Cause:** Library not built or missing from node_modules

**Solution:**
```bash
pnpm install
pnpm nx run-many --target=build --all
```

TypeScript compilation errors¶

Cause: Type mismatches or outdated types

Solution:

pnpm prisma generate  # Regenerate Prisma types
pnpm nx reset         # Clear Nx cache
pnpm install          # Reinstall dependencies

Runtime Errors¶

"ECONNREFUSED" to database¶

Cause: Database not running or wrong connection string

Solution:

1. Start database: `docker-compose up -d postgres`
2. Verify DATABASE_URL in .env
3. Check network connectivity

"Invalid API key" from external services¶

Cause: Missing or incorrect API credentials

Solution:

1. Verify credentials in .env
2. Check for extra spaces or newlines
3. Regenerate API keys if needed

Webhook Issues¶

Shopify webhooks not arriving¶

Cause: Webhook URL not accessible or HMAC validation failing

Solution:

1. Use ngrok for local development: `ngrok http 3000`
2. Update webhook URL in Shopify admin
3. Verify SHOPIFY_WEBHOOK_SECRET
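When HMAC validation fails even though the secret is correct, a frequent cause is hashing a re-serialized body instead of the raw bytes Shopify sent. Shopify signs the raw request body with HMAC-SHA256 and sends the base64 digest in the `X-Shopify-Hmac-Sha256` header. The sketch below reproduces that check in isolation for local debugging; the function names and sample values are illustrative and not taken from the project's webhook code:

```typescript
import { Buffer } from 'node:buffer';
import { createHmac, timingSafeEqual } from 'node:crypto';

// Compute the base64 HMAC-SHA256 digest of the raw request body.
function computeShopifyHmac(rawBody: string, secret: string): string {
  return createHmac('sha256', secret).update(rawBody, 'utf8').digest('base64');
}

// Compare the computed digest with the header value in constant time.
function isValidShopifyHmac(rawBody: string, secret: string, headerHmac: string): boolean {
  const expected = Buffer.from(computeShopifyHmac(rawBody, secret), 'utf8');
  const received = Buffer.from(headerHmac, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const exampleSecret = 'example-webhook-secret'; // placeholder, not a real secret
const exampleBody = '{"id":1001,"email":"customer@example.com"}';
const exampleHeader = computeShopifyHmac(exampleBody, exampleSecret);

// Re-serializing the payload (note the added space) changes the digest.
const reserializedBody = '{"id": 1001,"email":"customer@example.com"}';

console.log(isValidShopifyHmac(exampleBody, exampleSecret, exampleHeader));
console.log(isValidShopifyHmac(reserializedBody, exampleSecret, exampleHeader));
```

The second call shows why the raw body must be captured before any JSON parsing middleware runs.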

SimplyPrint webhooks failing¶

Cause: Token mismatch or network issues

Solution:

1. Verify SIMPLYPRINT_WEBHOOK_TOKEN
2. Check firewall/security group rules
3. Review SimplyPrint webhook logs

Performance Issues¶

Slow API responses¶

Cause: Database queries not optimized

Solution:

1. Enable query logging in Prisma
2. Add missing indexes
3. Use pagination for large datasets

Memory issues¶

Cause: Memory leaks or insufficient resources

Solution:

1. Monitor with `docker stats`
2. Increase container memory limits
3. Review for memory leaks in code

Testing Issues¶

Tests timing out¶

Cause: Async operations not completing

Solution:

1. Increase Jest timeout
2. Check for unresolved promises
3. Verify test database is accessible

MSW not intercepting requests¶

Cause: Handler not matching request

Solution:

1. Check handler URL patterns
2. Verify request method (GET/POST)
3. Add console.log to handler to debug

Logs and Debugging¶

View API Logs¶

# Development
pnpm nx serve api --verbose

# Docker
docker logs forma3d-api -f --tail 100

# Staging
ssh staging 'docker logs forma3d-api'

Enable Debug Mode¶

DEBUG=forma3d:*
LOG_LEVEL=debug

Prisma Query Logging¶

DEBUG=prisma:query

Getting Help¶

Check this troubleshooting guide
Search existing GitHub issues
Review Sentry for similar errors
Ask in team Slack channel

Create a GitHub issue with reproduction steps

#### 3. Update README.md

Update the main README with comprehensive documentation covering:

- Project overview
- Features list
- Quick start guide
- Architecture overview
- API documentation link
- Development setup
- Testing instructions
- Deployment guide
- Contributing guidelines

---

## 🔧 Feature F6.4: Security Hardening

### Requirements Reference

- NFR-SE-001: Secure Credential Storage
- NFR-SE-002: Webhook Verification
- Security scan passing

### Implementation

#### 1. Rate Limiting

Create `apps/api/src/common/guards/throttler.guard.ts`:

```typescript
import { Injectable, ExecutionContext } from '@nestjs/common';
import { ThrottlerGuard } from '@nestjs/throttler';

@Injectable()
export class CustomThrottlerGuard extends ThrottlerGuard {
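  protected async getTracker(req: Record<string, any>): Promise<string> {
    // Use the first X-Forwarded-For entry when behind a proxy.
    // Only trust this header if the proxy (e.g. Traefik) sets it; clients can spoof it otherwise.
    const header = req.headers?.['x-forwarded-for'];
    const forwarded: string | undefined = Array.isArray(header) ? header[0] : header;
    if (forwarded) {
      return forwarded.split(',')[0].trim();
    }
    return req.ip;
  }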

  protected async shouldSkip(context: ExecutionContext): Promise<boolean> {
    const request = context.switchToHttp().getRequest();

    // Skip rate limiting for health checks
    if (request.url.startsWith('/health')) {
      return true;
    }

    return false;
  }
}
```

Update apps/api/src/app.module.ts:

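import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerModule } from '@nestjs/throttler';
import { CustomThrottlerGuard } from './common/guards/throttler.guard';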

@Module({
  imports: [
    // ... existing imports
    ThrottlerModule.forRoot([
      {
        name: 'short',
        ttl: 1000,     // 1 second
        limit: 10,      // 10 requests per second
      },
      {
        name: 'medium',
        ttl: 10000,    // 10 seconds
        limit: 50,      // 50 requests per 10 seconds
      },
      {
        name: 'long',
        ttl: 60000,    // 1 minute
        limit: 200,     // 200 requests per minute
      },
    ]),
  ],
  providers: [
    {
      provide: APP_GUARD,
      useClass: CustomThrottlerGuard,
    },
  ],
})
export class AppModule {}
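With several named throttlers configured, a request has to stay within every tier, so the tightest tier at any moment decides the outcome. The sketch below is a simplified in-memory simulation of that interaction for a single client. It is not how `@nestjs/throttler` stores or counts hits internally; `ThrottleTier` and `SlidingWindowLimiter` are illustrative names:

```typescript
interface ThrottleTier {
  name: string;
  ttlMs: number;
  limit: number;
}

// Same three tiers as the ThrottlerModule configuration above.
const TIERS: ThrottleTier[] = [
  { name: 'short', ttlMs: 1000, limit: 10 },
  { name: 'medium', ttlMs: 10000, limit: 50 },
  { name: 'long', ttlMs: 60000, limit: 200 },
];

// Simplified limiter for one client: remembers timestamps of accepted requests.
class SlidingWindowLimiter {
  private readonly tiers: ThrottleTier[];
  private readonly accepted: number[] = [];

  constructor(tiers: ThrottleTier[]) {
    this.tiers = tiers;
  }

  // Returns the name of the first exceeded tier, or null if the request is allowed.
  tryAccept(nowMs: number): string | null {
    for (const tier of this.tiers) {
      const recent = this.accepted.filter((t) => nowMs - t < tier.ttlMs).length;
      if (recent >= tier.limit) {
        return tier.name;
      }
    }
    this.accepted.push(nowMs);
    return null;
  }
}

// Send a burst of 10 requests at the start of each second for 6 seconds.
const limiter = new SlidingWindowLimiter(TIERS);
const rejections: string[] = [];
for (let second = 0; second < 6; second++) {
  for (let i = 0; i < 10; i++) {
    const blockedBy = limiter.tryAccept(second * 1000);
    if (blockedBy !== null) {
      rejections.push(`${second}s:${blockedBy}`);
    }
  }
}
console.log(rejections);
```

In this simulation a client that stays exactly at the `short` limit is still cut off by the `medium` tier once it has used 50 requests within 10 seconds, which is the intended effect of layering the tiers.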

2. Security Headers¶

Create apps/api/src/common/middleware/security-headers.middleware.ts:

import { Injectable, NestMiddleware } from '@nestjs/common';
import { Request, Response, NextFunction } from 'express';
import helmet from 'helmet';

@Injectable()
export class SecurityHeadersMiddleware implements NestMiddleware {
  private readonly helmet = helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        styleSrc: ["'self'", "'unsafe-inline'"],
        scriptSrc: ["'self'"],
        imgSrc: ["'self'", 'data:', 'https:'],
        connectSrc: ["'self'", 'https://api.sentry.io'],
        frameSrc: ["'none'"],
        objectSrc: ["'none'"],
      },
    },
    crossOriginEmbedderPolicy: false, // Required for Swagger UI
    crossOriginOpenerPolicy: { policy: 'same-origin-allow-popups' },
    crossOriginResourcePolicy: { policy: 'cross-origin' },
    hsts: {
      maxAge: 31536000,
      includeSubDomains: true,
      preload: true,
    },
    noSniff: true,
    referrerPolicy: { policy: 'strict-origin-when-cross-origin' },
    xssFilter: true,
  });

  use(req: Request, res: Response, next: NextFunction) {
    this.helmet(req, res, next);
  }
}

3. Dependency Security Scan¶

Create .github/workflows/security-scan.yml:

name: Security Scan

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 0 * * 1' # Weekly on Monday

jobs:
  dependency-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Run pnpm audit
        run: pnpm audit --audit-level=high
        continue-on-error: true

      - name: Run Snyk scan
        uses: snyk/actions/node@master
        continue-on-error: true
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          # Write the SARIF report that the upload step below expects
          args: --sarif-file-output=snyk.sarif

      - name: Upload Snyk report
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: snyk.sarif
        continue-on-error: true

  code-scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write # Required to upload CodeQL results
    steps:
      - uses: actions/checkout@v4

      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: javascript-typescript

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3

4. Authentication Audit¶

Create a checklist for security review:

# Security Audit Checklist

## Authentication & Authorization

- [ ] API key authentication implemented for admin endpoints
- [ ] API keys are stored hashed in database (if applicable)
- [ ] Rate limiting applied to authentication endpoints
- [ ] Failed authentication attempts are logged

## Webhook Security

- [ ] Shopify webhooks use HMAC verification
- [ ] SimplyPrint webhooks use token verification
- [ ] Sendcloud webhooks verified (if applicable)
- [ ] Webhook idempotency implemented

## Data Protection

- [ ] Sensitive data is not logged (passwords, tokens, etc.)
- [ ] Database credentials are not in code
- [ ] API keys are stored in environment variables
- [ ] HTTPS enforced in production

## Input Validation

- [ ] All DTOs have validation decorators
- [ ] JSON payloads are validated at boundaries
- [ ] SQL injection prevented via Prisma
- [ ] XSS prevented via proper output encoding

## Dependencies

- [ ] No critical vulnerabilities in dependencies
- [ ] Dependencies are up to date
- [ ] Lock file is committed

## Infrastructure

- [ ] TLS certificates valid and auto-renewed
- [ ] Security headers configured
- [ ] CORS properly configured
- [ ] Firewall rules reviewed

🧪 Testing Requirements¶

Test Coverage Requirements¶

Per requirements.md (NFR-MA-002):

Unit Tests: > 80% coverage for all services
Integration Tests: All API integrations tested
E2E Tests: Critical paths covered
Load Tests: 500+ orders/day capacity verified

Unit Test Scenarios Required¶

| Category | Scenario | Priority |
|----------|----------|----------|
| Health Indicators | Shopify health check succeeds | High |
| Health Indicators | SimplyPrint health check succeeds | High |
| Health Indicators | Sendcloud health check (disabled) | Medium |
| Rate Limiting | Requests are throttled correctly | High |
| Rate Limiting | Health endpoints bypass throttling | Medium |
| Security Headers | All required headers present | High |

Load Test Scenarios¶

| Scenario | Description | Target |
|----------|-------------|--------|
| Sustained Load | 1 request/second for 5 minutes | < 1% errors |
| Spike Test | Ramp to 50 requests/second | < 5% errors |
| Order Throughput | 500 orders/day simulation | 100% success |
| Dashboard Load | 10 concurrent dashboard users | < 2s latency |
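The scenario targets are easier to compare once they are expressed in the same units. The sketch below works through the arithmetic for the 500 orders/day requirement and the sustained-load scenario. It assumes one request per order and a uniform arrival rate, which real traffic will not follow, so treat the numbers as averages rather than peak estimates:

```typescript
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

// Average rate implied by a daily order volume (uniform arrivals assumed).
function ordersPerSecond(ordersPerDay: number): number {
  return ordersPerDay / SECONDS_PER_DAY;
}

// Total requests generated by a constant-rate scenario.
function totalRequests(ratePerSecond: number, durationSeconds: number): number {
  return ratePerSecond * durationSeconds;
}

// Largest error count that still satisfies a strict "< percent %" target.
function maxAllowedErrors(total: number, percent: number): number {
  return Math.ceil((total * percent) / 100) - 1;
}

const targetOrdersPerDay = 500;
const averageRate = ordersPerSecond(targetOrdersPerDay); // roughly 0.0058 orders/second
const sustainedTotal = totalRequests(1, 5 * 60); // 300 requests in the sustained scenario
const sustainedErrorBudget = maxAllowedErrors(sustainedTotal, 1); // at most 2 failed requests
const headroomFactor = SECONDS_PER_DAY / targetOrdersPerDay; // 1 req/s vs. the average order rate

console.log({ averageRate, sustainedTotal, sustainedErrorBudget, headroomFactor });
```

Under these assumptions the sustained scenario at 1 request/second runs well above the average order rate, so it mainly verifies headroom for bursts rather than the daily volume itself.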

✅ Validation Checklist¶

Infrastructure¶

Testing (F6.1)¶

Unit test coverage > 80% for backend
Unit test coverage > 60% for frontend
Integration tests for order flow
Integration tests for cancellation flow
Integration tests for error recovery
E2E critical path tests passing
Load tests pass with 500 orders/day
Performance meets latency requirements (< 2s)

Monitoring (F6.2)¶

Health check endpoints working
Shopify health indicator implemented
SimplyPrint health indicator implemented
Sendcloud health indicator implemented
Alerting rules defined
Runbook complete

Documentation (F6.3)¶

README.md complete and up to date
Environment setup guide complete
Troubleshooting guide complete
Runbook complete
Keys & certificates inventory complete (with renewal procedures)
API documentation (Swagger) complete
Architecture diagrams validated

Security (F6.4)¶

🚫 Constraints and Rules¶

MUST DO¶

Achieve 80%+ test coverage for backend
Implement load testing for 500+ orders/day
Add health indicators for all external services
Create complete runbook for operations
Configure security headers and rate limiting
Pass dependency security scan
Update all documentation

MUST NOT¶

Skip writing tests to save time
Deploy without load testing
Leave security vulnerabilities unaddressed
Commit hardcoded credentials
Deploy without complete documentation
Skip security audit checklist

🎬 Execution Order¶

Testing (F6.1)¶

Analyze current test coverage and identify gaps
Write missing unit tests to reach 80%+ coverage
Create integration tests for order flow
Create integration tests for cancellation flow
Create integration tests for error recovery
Set up K6 load testing infrastructure
Create load test scenarios for throughput testing
Run load tests and document results
Optimize performance based on load test results

Monitoring (F6.2)¶

Create health indicators for external services
Update health controller with enhanced endpoints
Define alerting rules for monitoring
Create operations runbook with procedures

Documentation (F6.3)¶

Create environment setup guide
Create troubleshooting guide
Update README.md with complete documentation
Validate and update architecture diagrams

Security (F6.4)¶

Implement rate limiting with @nestjs/throttler
Configure security headers with helmet
Create security scan workflow for CI
Complete security audit checklist

Validation¶

Run all tests and verify coverage
Run load tests against staging
Verify all health endpoints work
Complete security audit
Final documentation review

📊 Expected Output¶

When Phase 6 is complete:

Verification Commands¶

# Run all tests with coverage
pnpm test:coverage

# Expected output: > 80% coverage

# Run load tests
pnpm load-test:staging

# Expected output: All thresholds passed

# Check health endpoints
curl https://staging-connect-api.forma3d.be/health
curl https://staging-connect-api.forma3d.be/health/dependencies

# Run security scan
pnpm audit

# Expected output: 0 high/critical vulnerabilities

Success Metrics¶

| Metric | Target | Verification |
|--------|--------|--------------|
| Unit test coverage | > 80% | `pnpm test:coverage` |
| Integration tests | All passing | `pnpm test:integration` |
| Load test (orders/day) | 500+ | K6 load test results |
| API latency (p95) | < 2 seconds | K6 metrics |
| Error rate under load | < 1% | K6 metrics |
| Security vulnerabilities | 0 critical/high | `pnpm audit` |
| Documentation | 100% complete | Manual review |

📝 Documentation Updates¶

CRITICAL: All documentation must be updated to reflect Phase 6 completion.

docs/04-development/implementation-plan.md Updates Required¶

Update the implementation plan to mark Phase 6 as complete:

Mark F6.1 (Comprehensive Testing) as ✅ Completed
Mark F6.2 (Monitoring and Alerting) as ✅ Completed
Mark F6.3 (Documentation) as ✅ Completed
Mark F6.4 (Security Hardening) as ✅ Completed
Update Phase 6 Exit Criteria with checkmarks
Add implementation notes and component paths
Update revision history with completion date

Additional Documentation¶

Update README.md with complete project documentation
Create docs/04-development/runbook.md
Create docs/04-development/troubleshooting.md
Create docs/04-development/environment-setup.md
Create docs/04-development/keys-certificates-inventory.md
Validate all architecture diagrams

🔗 Phase 6 Exit Criteria¶

From implementation-plan.md:

Additional Exit Criteria¶

All health indicators implemented and working
Alerting rules defined
Operations runbook complete
Rate limiting configured
Security headers configured
Dependency scan passing
Security audit checklist complete

🎉 Production Readiness¶

With Phase 6 complete, Forma3D.Connect is ready for production:

Production Checklist¶

Go-Live Steps¶

Final security review
Update DNS for production domains
Configure production environment variables
Deploy to production
Verify health checks
Monitor for first 24 hours
Announce go-live

END OF PROMPT

This prompt concludes the Forma3D.Connect implementation phases. The AI should implement all Phase 6 hardening features to ensure the system is production-ready with comprehensive testing, monitoring, documentation, and security. After Phase 6, the system achieves full production readiness.

| Diagram | Path | Description |
|---------|------|-------------|
| Context View | `docs/03-architecture/c4-model/1-context/C4_Context.puml` | System context diagram |
| Container View | `docs/03-architecture/c4-model/2-container/C4_Container.puml` | System containers and interactions |
| Component View | `docs/03-architecture/c4-model/3-component/C4_Component.puml` | Backend component architecture |
| Order State | `docs/03-architecture/state-machines/C4_Code_State_Order.puml` | Order status state machine |
| Domain Model | `docs/03-architecture/c4-model/4-code/C4_Code_DomainModel.puml` | Entity relationships |