AI Prompt: Forma3D.Connect — Phase 6: Hardening (Production Readiness)

Purpose: This prompt instructs an AI to implement Phase 6 of Forma3D.Connect
Estimated Effort: 44 hours (~2 weeks)
Prerequisites: Phase 5k completed (All tech debt phases 5e-5k resolved)
Output: Production-ready system with comprehensive testing, monitoring, security hardening, and complete documentation
Status: PENDING


🎯 Mission

You are continuing development of Forma3D.Connect, building on the Phase 5 foundation (including all tech debt resolution phases 5e-5k). Your task is to implement Phase 6: Hardening — completing the production readiness requirements to ensure the system is reliable, performant, secure, and fully documented.

Phase 6 delivers:

  • Comprehensive test suite with 80%+ coverage
  • Integration and E2E tests for critical paths
  • Performance and load testing (500+ orders/day capacity)
  • Production monitoring and alerting infrastructure
  • Complete technical documentation and runbooks
  • Security hardening and vulnerability remediation

Phase 6 ensures the system is ready for production workloads:

Testing → Monitoring → Documentation → Security → Production Ready ✅

📋 Phase 6 Context

What Was Built in Previous Phases

The complete automation system is already in place:

  • Phase 0: Foundation
      • Nx monorepo with apps/api, apps/web, and shared libs
      • PostgreSQL database with Prisma schema
      • NestJS backend structure with modules, services, repositories
      • Azure DevOps CI/CD pipeline

  • Phase 1: Shopify Inbound
      • Shopify webhooks receiver with HMAC verification
      • Order storage and status management
      • Product mapping CRUD operations
      • Event logging service
      • OpenAPI/Swagger documentation at /api/docs
      • Aikido Security Platform integration

  • Phase 1b: Observability
      • Sentry error tracking and performance monitoring
      • OpenTelemetry-first architecture
      • Structured JSON logging with Pino and correlation IDs
      • React error boundaries
      • BusinessObservabilityService for state transition and flow tracking
      • Flow milestone tracking with timing (order automation cycle)
      • State change logging with old→new state transitions

  • Phase 1c: Staging Deployment
      • Docker images with multi-stage builds
      • Traefik reverse proxy with Let's Encrypt TLS
      • Zero-downtime deployments via Docker Compose
      • Staging environment: https://staging-connect.forma3d.be

  • Phase 1d: Acceptance Testing
      • Playwright + Gherkin acceptance tests
      • Given/When/Then scenarios for deployment verification
      • Azure DevOps pipeline integration

  • Phase 2: SimplyPrint Core
      • SimplyPrint API client with HTTP Basic Auth
      • Automated print job creation from orders
      • Print job status monitoring (webhook + polling)
      • Order-job orchestration with order.ready-for-fulfillment event

  • Phase 3: Fulfillment Loop
      • Automated Shopify fulfillment creation
      • Order cancellation handling
      • Retry queue with exponential backoff
      • Email notifications for critical failures
      • API key authentication for admin endpoints

  • Phase 4: Dashboard MVP
      • React 19 dashboard with TanStack Query
      • Order management UI (list, detail, actions)
      • Product mapping configuration UI
      • Real-time updates via Socket.IO
      • Activity logs with filtering and export

  • Phase 5: Shipping Integration
      • Sendcloud API client for shipping labels
      • Automated label generation on order completion
      • Tracking sync to Shopify fulfillments
      • Shipping management UI in dashboard

  • Phase 5b: Domain Boundaries
      • Correlation ID infrastructure
      • Domain contracts library (libs/domain-contracts)
      • Repository encapsulation
      • Interface-based service dependencies

  • Phase 5c: Webhook Idempotency
      • Database-backed webhook idempotency (TD-001 resolved)
      • Automated cleanup of expired records

  • Phase 5d: Frontend Tests
      • Vitest configuration with React Testing Library
      • MSW API mocking layer
      • 200 frontend tests (TD-002 resolved)

  • Phase 5e-5k: Tech Debt Resolution ✅ (assumed complete before Phase 6)
      • F5e: Typed JSON Schemas (TD-003)
      • F5f: Shared API Types (TD-004)
      • F5g: Structured Logging (TD-005)
      • F5h: Controller Tests (TD-006)
      • F5i: Domain Contract Cleanup (TD-007)
      • F5j: Typed Error Hierarchy (TD-008)
      • F5k: Configuration Externalization (TD-009)

What Phase 6 Builds

| Feature | Description | Effort |
| --- | --- | --- |
| F6.1: Comprehensive Testing | Achieve 80%+ coverage, E2E & load tests | 16 hours |
| F6.2: Monitoring and Alerting | Health checks, alerting, metrics dashboard | 8 hours |
| F6.3: Documentation | Complete technical docs, runbooks, guides | 12 hours |
| F6.4: Security Hardening | Dependency scan, rate limiting, security audit | 8 hours |

🛠️ Tech Stack Reference

All technologies from previous phases remain. Additional packages for Phase 6:

| Package | Purpose |
| --- | --- |
| k6 | Load testing tool |
| @nestjs/throttler | Rate limiting for NestJS |
| helmet | Security headers middleware |
| express-rate-limit | Backup rate limiting (if needed) |
| autocannon | Alternative load testing |
| clinic | Node.js performance profiling |

🏗️ Architecture Reference

Detailed Architecture Diagrams

📐 For detailed architecture, refer to the existing PlantUML diagrams:

| Diagram | Path | Description |
| --- | --- | --- |
| Context View | docs/03-architecture/c4-model/1-context/C4_Context.puml | System context diagram |
| Container View | docs/03-architecture/c4-model/2-container/C4_Container.puml | System containers and interactions |
| Component View | docs/03-architecture/c4-model/3-component/C4_Component.puml | Backend component architecture |
| Order State | docs/03-architecture/state-machines/C4_Code_State_Order.puml | Order status state machine |
| Domain Model | docs/03-architecture/c4-model/4-code/C4_Code_DomainModel.puml | Entity relationships |

These PlantUML diagrams should be validated and updated as part of Phase 6.

Current System Health Endpoints

Endpoint API Web Description
/health Full health status with build info
/health/live Simple liveness probe
/health/ready - Readiness probe (checks database)

Phase 6 Focus Areas

┌──────────────────────────────────────────────────────────────────┐
│                    PHASE 6: HARDENING                            │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────────┐  ┌────────────────┐  ┌────────────────┐     │
│  │   TESTING      │  │   MONITORING   │  │   SECURITY     │     │
│  │                │  │                │  │                │     │
│  │ • Unit 80%+    │  │ • Health checks│  │ • Dep scan     │     │
│  │ • Integration  │  │ • Alerting     │  │ • Rate limits  │     │
│  │ • E2E paths    │  │ • Metrics      │  │ • Headers      │     │
│  │ • Load 500/day │  │ • Runbooks     │  │ • Auth audit   │     │
│  └────────────────┘  └────────────────┘  └────────────────┘     │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                  DOCUMENTATION                          │    │
│  │                                                         │    │
│  │  • README complete       • API docs (Swagger)           │    │
│  │  • Architecture docs     • Troubleshooting guide        │    │
│  │  • Runbook operations    • Environment setup            │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                  │
│                          ↓                                       │
│              ┌──────────────────────┐                            │
│              │  PRODUCTION READY ✅  │                            │
│              └──────────────────────┘                            │
└──────────────────────────────────────────────────────────────────┘

📁 Files to Create/Modify

Testing Infrastructure

apps/api/
├── src/
│   └── **/__tests__/                      # Unit tests for all modules
│       ├── orders.controller.spec.ts      # Controller tests (if not done in 5h)
│       ├── print-jobs.controller.spec.ts
│       └── ...
│
└── test/                                  # Sits beside src/ (specs import ../../src/app.module)
    ├── integration/
    │   ├── order-flow.spec.ts             # Order → Print → Fulfill flow
    │   ├── cancellation-flow.spec.ts      # Cancellation handling
    │   └── error-recovery.spec.ts         # Error recovery scenarios
    └── e2e/
        └── critical-paths.spec.ts         # E2E critical path tests

apps/web/src/
├── **/__tests__/                      # Additional component tests

load-tests/
├── k6/
│   ├── config.js                      # K6 configuration
│   ├── scenarios/
│   │   ├── order-throughput.js        # 500 orders/day simulation
│   │   ├── dashboard-load.js          # Dashboard concurrent users
│   │   └── webhook-burst.js           # Webhook burst handling
│   └── reports/                       # Generated reports

Monitoring Infrastructure

apps/api/src/
├── health/
│   ├── health.module.ts               # UPDATE: Add external service checks
│   ├── health.controller.ts           # UPDATE: Enhanced health endpoints
│   └── indicators/
│       ├── shopify.indicator.ts       # Shopify API health check
│       ├── simplyprint.indicator.ts   # SimplyPrint API health check
│       ├── sendcloud.indicator.ts     # Sendcloud API health check
│       └── database.indicator.ts      # Database health check

deployment/
├── monitoring/
│   ├── alerting-rules.yml             # Alert definitions
│   ├── runbook.md                     # Operations runbook
│   └── dashboard.json                 # Metrics dashboard config

Security Hardening

apps/api/src/
├── common/
│   ├── guards/
│   │   └── throttler.guard.ts         # Rate limiting guard
│   ├── middleware/
│   │   └── security-headers.middleware.ts  # Security headers
│   └── filters/
│       └── global-exception.filter.ts # UPDATE: Enhanced error handling

.github/workflows/
└── security-scan.yml                  # Dependency security scan workflow

Documentation

docs/
├── 04-development/
│   ├── runbook.md                     # NEW: Operations runbook
│   ├── troubleshooting.md             # NEW: Troubleshooting guide
│   └── environment-setup.md           # NEW: Environment setup guide
├── 03-architecture/
│   └── (validate and update all diagrams)
└── README.md                          # UPDATE: Complete project documentation

🔧 Feature F6.1: Comprehensive Testing

Requirements Reference

  • NFR-MA-002: Test Coverage (> 80%)
  • NFR-PE-001: Performance Requirements
  • Success Metric: 99% automation success rate

Implementation

1. Test Coverage Analysis

First, analyze current coverage and identify gaps:

# Generate coverage report for backend
pnpm nx test api --coverage

# Generate coverage report for frontend
pnpm nx test web --coverage

# Identify files with low coverage
# Target: 80%+ statements, functions, branches, lines
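
To turn the coverage report into a list of gaps, a small script can scan the summary file for anything below the 80% target. The following is a minimal sketch, assuming the test runner is configured to emit Istanbul's json-summary report (typically coverage/coverage-summary.json) and that every pct value is numeric. The function name findCoverageGaps is hypothetical.

```typescript
// Shape of one entry in Istanbul's "json-summary" report (assumed, see lead-in).
type Metric = { total: number; covered: number; skipped: number; pct: number };
type FileCoverage = { lines: Metric; statements: Metric; functions: Metric; branches: Metric };
type CoverageSummary = Record<string, FileCoverage>;

const METRICS = ['lines', 'statements', 'functions', 'branches'] as const;

interface CoverageGap {
  file: string;
  metric: (typeof METRICS)[number];
  pct: number;
}

// Returns every (file, metric) pair below the threshold, lowest coverage first.
function findCoverageGaps(summary: CoverageSummary, threshold = 80): CoverageGap[] {
  const gaps: CoverageGap[] = [];
  for (const file of Object.keys(summary)) {
    if (file === 'total') continue; // aggregate row, not a source file
    for (const metric of METRICS) {
      const pct = summary[file][metric].pct;
      if (pct < threshold) gaps.push({ file, metric, pct });
    }
  }
  return gaps.sort((a, b) => a.pct - b.pct);
}
```

Sorting lowest first puts the files that need the most attention at the top of the list.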

2. Integration Tests

Create apps/api/test/integration/order-flow.spec.ts:

import { Test, TestingModule } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import * as request from 'supertest';
import { AppModule } from '../../src/app.module';
import { PrismaService } from '../../src/database/prisma.service';
import { OrderStatus, PrintJobStatus } from '@prisma/client';

describe('Order Flow (Integration)', () => {
  let app: INestApplication;
  let prisma: PrismaService;

  beforeAll(async () => {
    const moduleRef: TestingModule = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();

    app = moduleRef.createNestApplication();
    prisma = moduleRef.get<PrismaService>(PrismaService);
    await app.init();
  });

  afterAll(async () => {
    await app.close();
  });

  beforeEach(async () => {
    // Clean up test data
    await prisma.eventLog.deleteMany();
    await prisma.printJob.deleteMany();
    await prisma.lineItem.deleteMany();
    await prisma.shipment.deleteMany();
    await prisma.order.deleteMany();
  });

  describe('Complete Order Flow', () => {
    it('should process order from creation to fulfillment', async () => {
      // 1. Create order via webhook simulation
      const webhookPayload = createMockShopifyOrderWebhook();

      const orderResponse = await request(app.getHttpServer())
        .post('/api/v1/webhooks/shopify')
        .set('X-Shopify-Topic', 'orders/create')
        .set('X-Shopify-Hmac-SHA256', calculateHmac(webhookPayload))
        .send(webhookPayload)
        .expect(200);

      expect(orderResponse.body.success).toBe(true);

      // 2. Verify order was created
      const order = await prisma.order.findFirst({
        where: { shopifyOrderId: webhookPayload.id.toString() },
        include: { lineItems: true, printJobs: true },
      });

      // findFirst resolves to null (not undefined) when nothing matches
      expect(order).not.toBeNull();
      expect(order!.status).toBe(OrderStatus.PENDING);
      expect(order!.lineItems).toHaveLength(webhookPayload.line_items.length);

      // 3. Simulate print job completion
      for (const printJob of order!.printJobs) {
        await prisma.printJob.update({
          where: { id: printJob.id },
          data: { status: PrintJobStatus.COMPLETED },
        });
      }

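      // 4. Trigger orchestration check
      // TODO: the direct Prisma updates above bypass the events that drive
      // orchestration in the real flow, so this step must trigger the
      // orchestration check explicitly before the assertion below can pass.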

      // 5. Verify order is ready for fulfillment
      const updatedOrder = await prisma.order.findUnique({
        where: { id: order!.id },
      });

      // Order should be completed when all print jobs are done
      expect(updatedOrder!.status).toBe(OrderStatus.COMPLETED);
    });

    it('should handle order cancellation during printing', async () => {
      // Create order and start printing
      const order = await createTestOrder(prisma);

      // Simulate cancellation webhook
      const cancelWebhook = createMockCancellationWebhook(order.shopifyOrderId);

      await request(app.getHttpServer())
        .post('/api/v1/webhooks/shopify')
        .set('X-Shopify-Topic', 'orders/cancelled')
        .set('X-Shopify-Hmac-SHA256', calculateHmac(cancelWebhook))
        .send(cancelWebhook)
        .expect(200);

      // Verify order and print jobs are cancelled
      const cancelledOrder = await prisma.order.findUnique({
        where: { id: order.id },
        include: { printJobs: true },
      });

      expect(cancelledOrder!.status).toBe(OrderStatus.CANCELLED);
      cancelledOrder!.printJobs.forEach((job) => {
        expect([PrintJobStatus.CANCELLED, PrintJobStatus.COMPLETED]).toContain(job.status);
      });
    });
  });

  describe('Error Recovery', () => {
    it('should retry failed print jobs', async () => {
      // Create order with a print job that will fail
      const order = await createTestOrder(prisma);
      const printJob = order.printJobs[0];

      // Mark as failed
      await prisma.printJob.update({
        where: { id: printJob.id },
        data: { 
          status: PrintJobStatus.FAILED,
          errorMessage: 'Simulated failure',
        },
      });

      // Trigger retry via API
      await request(app.getHttpServer())
        .post(`/api/v1/print-jobs/${printJob.id}/retry`)
        .set('X-API-Key', process.env.API_KEY || 'test-key')
        .expect(200);

      // Verify job is queued again
      const retriedJob = await prisma.printJob.findUnique({
        where: { id: printJob.id },
      });

      expect(retriedJob!.status).toBe(PrintJobStatus.QUEUED);
    });
  });
});

// Helper functions
function createMockShopifyOrderWebhook() {
  return {
    id: Date.now(),
    order_number: 1001,
    email: 'test@example.com',
    total_price: '49.99',
    currency: 'EUR',
    shipping_address: {
      first_name: 'Test',
      last_name: 'Customer',
      address1: '123 Test St',
      city: 'Brussels',
      zip: '1000',
      country_code: 'BE',
    },
    line_items: [
      {
        id: Date.now(),
        variant_id: 12345,
        title: 'Test Product',
        quantity: 1,
        price: '49.99',
        sku: 'TEST-SKU-001',
      },
    ],
  };
}

function createMockCancellationWebhook(shopifyOrderId: string) {
  return {
    id: shopifyOrderId,
    cancelled_at: new Date().toISOString(),
  };
}

function calculateHmac(payload: object): string {
  const crypto = require('crypto');
  const secret = process.env.SHOPIFY_WEBHOOK_SECRET || 'test-secret';
  return crypto
    .createHmac('sha256', secret)
    .update(JSON.stringify(payload))
    .digest('base64');
}

async function createTestOrder(prisma: PrismaService) {
  // Create a test order with line items and print jobs
  return prisma.order.create({
    data: {
      shopifyOrderId: `test-${Date.now()}`,
      shopifyOrderNumber: 'TEST-1001',
      customerName: 'Test Customer',
      customerEmail: 'test@example.com',
      shippingAddress: {
        first_name: 'Test',
        last_name: 'Customer',
        address1: '123 Test St',
        city: 'Brussels',
        zip: '1000',
        country_code: 'BE',
      },
      totalPrice: 49.99,
      currency: 'EUR',
      status: OrderStatus.PROCESSING,
      lineItems: {
        create: [
          {
            shopifyLineItemId: `line-${Date.now()}`,
            shopifyVariantId: '12345',
            title: 'Test Product',
            quantity: 1,
            price: 49.99,
            sku: 'TEST-SKU-001',
          },
        ],
      },
      printJobs: {
        create: [
          {
            simplyPrintJobId: `sp-${Date.now()}`,
            status: PrintJobStatus.PRINTING,
          },
        ],
      },
    },
    include: {
      lineItems: true,
      printJobs: true,
    },
  });
}
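
The calculateHmac helper above only matches the receiver's check if both sides hash exactly the same bytes. The following is a minimal sketch of the verification side, assuming the receiver has access to the raw request body. The names signBody and isValidSignature are hypothetical and are not the project's actual implementation.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Computes the base64 HMAC-SHA256 digest of the exact bytes that were sent.
function signBody(rawBody: string | Buffer, secret: string): string {
  return createHmac('sha256', secret).update(rawBody).digest('base64');
}

// Compares the received header with the expected digest in constant time.
function isValidSignature(rawBody: string | Buffer, header: string, secret: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), 'base64');
  const received = Buffer.from(header, 'base64');
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

const body = JSON.stringify({ id: 1001, line_items: [] });
const signature = signBody(body, 'test-secret');
console.log(isValidSignature(body, signature, 'test-secret'));       // true
console.log(isValidSignature(body + ' ', signature, 'test-secret')); // false
```

The second call shows why the tests sign JSON.stringify(payload): a single extra byte in the body invalidates the signature.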

3. Load Testing with K6

Create load-tests/k6/scenarios/order-throughput.js:

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

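// Custom metrics
// NOTE: the default function below never calls .add() on these, so the
// order_creation_success threshold has no samples to evaluate yet.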
const orderCreationRate = new Rate('order_creation_success');
const orderCreationDuration = new Trend('order_creation_duration');

// Test configuration
export const options = {
  scenarios: {
    // Simulate 500 orders/day = ~21 orders/hour = ~0.35 orders/minute
    // But we want to test burst capacity too
    sustained_load: {
      executor: 'constant-arrival-rate',
      rate: 1, // 1 order per second (3600/hour for stress test)
      timeUnit: '1s',
      duration: '5m',
      preAllocatedVUs: 10,
      maxVUs: 50,
    },
    spike_test: {
      executor: 'ramping-arrival-rate',
      startRate: 0,
      timeUnit: '1s',
      preAllocatedVUs: 50,
      maxVUs: 100,
      stages: [
        { target: 10, duration: '30s' }, // Ramp up to 10/s
        { target: 10, duration: '1m' },  // Hold at 10/s
        { target: 50, duration: '30s' }, // Spike to 50/s
        { target: 50, duration: '30s' }, // Hold spike
        { target: 0, duration: '30s' },  // Ramp down
      ],
      startTime: '6m', // Start after sustained load
    },
  },
  thresholds: {
    'http_req_duration': ['p(95)<2000'], // 95th percentile < 2s
    'http_req_failed': ['rate<0.01'],    // Error rate < 1%
    'order_creation_success': ['rate>0.99'], // 99% success rate
  },
};

const BASE_URL = __ENV.API_URL || 'http://localhost:3000';
const API_KEY = __ENV.API_KEY || 'test-api-key';

// Read-only API checks (no webhooks are sent, so no orders are created)
export default function () {
  // Test 1: Health check
  const healthRes = http.get(`${BASE_URL}/health`);
  check(healthRes, {
    'health check status is 200': (r) => r.status === 200,
  });

  // Test 2: Orders list (dashboard simulation)
  const ordersRes = http.get(`${BASE_URL}/api/v1/orders?page=1&pageSize=20`, {
    headers: { 'X-API-Key': API_KEY },
  });
  check(ordersRes, {
    'orders list status is 200': (r) => r.status === 200,
    'orders list has data': (r) => {
      // Guard against non-JSON error bodies, which would otherwise throw
      try {
        return Array.isArray(JSON.parse(r.body).orders);
      } catch (e) {
        return false;
      }
    },
  });

  // Test 3: Single order detail
  const orderId = getRandomOrderId();
  if (orderId) {
    const orderDetailRes = http.get(`${BASE_URL}/api/v1/orders/${orderId}`, {
      headers: { 'X-API-Key': API_KEY },
    });
    check(orderDetailRes, {
      'order detail status is 200 or 404': (r) => [200, 404].includes(r.status),
    });
  }

  // Test 4: Shipping methods (unauthenticated)
  const shippingRes = http.get(`${BASE_URL}/api/v1/shipping/methods?country=BE`);
  check(shippingRes, {
    'shipping methods status is 200': (r) => r.status === 200,
  });

  sleep(0.1); // 100ms between iterations
}

function getRandomOrderId() {
  // In real test, fetch from a pool of known order IDs
  // For now, return null to skip order detail test
  return null;
}
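
The arithmetic behind the scenario comments can be checked with a short calculation. The following sketch, using hypothetical helper names, converts the 500 orders/day target into average rates and shows how far above that average the configured arrival rates are.

```typescript
// Converts a daily order volume into average arrival rates.
function averageRates(ordersPerDay: number) {
  const perHour = ordersPerDay / 24;
  const perMinute = perHour / 60;
  const perSecond = perMinute / 60;
  return { perHour, perMinute, perSecond };
}

// How many times the average rate a given test rate (requests per second) represents.
function loadMultiple(ordersPerDay: number, testRatePerSecond: number): number {
  return testRatePerSecond / averageRates(ordersPerDay).perSecond;
}

const target = averageRates(500);
console.log(target.perHour.toFixed(1));        // 20.8
console.log(target.perMinute.toFixed(2));      // 0.35
console.log(loadMultiple(500, 1).toFixed(1));  // 172.8
console.log(loadMultiple(500, 50).toFixed(0)); // 8640
```

At 1 request per second the sustained scenario already runs at roughly 173 times the average target rate, so it acts as a stress test and does not simulate a normal day.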

Create load-tests/k6/config.js:

export const environments = {
  local: {
    baseUrl: 'http://localhost:3000',
    apiKey: 'dev-api-key',
  },
  staging: {
    baseUrl: 'https://staging-connect-api.forma3d.be',
    apiKey: '__STAGING_API_KEY__',
  },
};

Add load test scripts to package.json:

{
  "scripts": {
    "load-test:local": "k6 run --env API_URL=http://localhost:3000 load-tests/k6/scenarios/order-throughput.js",
    "load-test:staging": "k6 run --env API_URL=https://staging-connect-api.forma3d.be load-tests/k6/scenarios/order-throughput.js"
  }
}

4. E2E Critical Path Tests

Create apps/api/test/e2e/critical-paths.spec.ts:

/**
 * E2E Critical Path Tests
 * 
 * These tests verify the complete automation flow works end-to-end
 * against a real database (test environment).
 */
import { Test, TestingModule } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import { AppModule } from '../../src/app.module';

describe('Critical Paths E2E', () => {
  let app: INestApplication;

  beforeAll(async () => {
    const moduleRef: TestingModule = await Test.createTestingModule({
      imports: [AppModule],
    }).compile();

    app = moduleRef.createNestApplication();
    await app.init();
  });

  afterAll(async () => {
    await app.close();
  });

  describe('Order → Print Job → Fulfillment Path', () => {
    it('should complete full automation cycle', async () => {
      // This is a placeholder for a full E2E test
      // In a real scenario, this would:
      // 1. Create a Shopify order via webhook
      // 2. Verify print jobs are created
      // 3. Simulate print completion
      // 4. Verify shipping label is generated
      // 5. Verify Shopify fulfillment is created
      expect(true).toBe(true);
    });
  });

  describe('Cancellation Path', () => {
    it('should handle cancellation at any stage', async () => {
      expect(true).toBe(true);
    });
  });

  describe('Error Recovery Path', () => {
    it('should recover from transient failures', async () => {
      expect(true).toBe(true);
    });
  });
});
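
When these placeholders are replaced with real assertions, the tests will need to wait for state that changes asynchronously (for example, an order status updated by an event handler). The following is a minimal sketch of a polling helper, assuming short polling against the test database is acceptable. The name waitFor and its options are hypothetical.

```typescript
// Polls `probe` until `done` accepts its result or the timeout elapses.
async function waitFor<T>(
  probe: () => Promise<T>,
  done: (value: T) => boolean,
  { timeoutMs = 5000, intervalMs = 100 } = {},
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (done(value)) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Pause before the next attempt so the database is not hammered.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a test, the probe could wrap a lookup such as prisma.order.findUnique and the predicate could compare the result's status with OrderStatus.COMPLETED, which fails with a clear timeout error when the transition never happens.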

🔧 Feature F6.1b: Enhanced Business Observability

Overview

The BusinessObservabilityService provides comprehensive logging for business events, state transitions, and automation flow tracking. This service integrates with Sentry for structured business metrics and the EventLogService for persistent audit trails.

Key Features

  1. State Transition Logging
      • Tracks old→new state with timing for orders, print jobs, shipments, and fulfillments
      • Includes correlation IDs for distributed tracing
      • Persists to EventLog for audit trail

  2. Flow Milestone Tracking
      • Tracks order automation cycle from receipt to fulfillment
      • Measures elapsed time between milestones
      • Records flow completion/failure with total duration

  3. Sentry Business Integration
      • Sets order/print job context for better error correlation
      • Adds breadcrumbs for state transitions
      • Captures flow completion/failure as Sentry events

Available Milestones

| Milestone | Trigger Point |
| --- | --- |
| order_received | Flow starts (startFlow called) |
| order_validated | Order found and validated |
| print_jobs_created | Print jobs created for all line items |
| all_jobs_printing | All jobs transitioned to PRINTING |
| all_jobs_completed | All print jobs completed |
| shipping_label_created | Sendcloud label generated |
| fulfillment_created | Shopify fulfillment created |
| flow_completed | Full automation cycle completed |
| flow_failed | Automation failed at any point |

Usage Example

// Start tracking a new order flow
this.businessObservability.startFlow(orderId);

// Set Sentry context
this.businessObservability.setOrderContext({
  id: order.id,
  shopifyOrderId: order.shopifyOrderId,
  shopifyOrderNumber: order.shopifyOrderNumber,
  status: order.status,
});

// Log state transition
await this.businessObservability.logStateTransition({
  entityType: 'order',
  entityId: orderId,
  orderId,
  previousState: OrderStatus.PENDING,
  newState: OrderStatus.PROCESSING,
  trigger: 'webhook_received',
});

// Record milestone
await this.businessObservability.recordMilestone({
  orderId,
  milestone: 'print_jobs_created',
  metadata: { jobCount: 3 },
});

// Complete flow (called automatically when flow_completed milestone is recorded)
await this.businessObservability.recordMilestone({
  orderId,
  milestone: 'flow_completed',
});

Log Output Examples

State Transition:

{
  "message": "[STATE CHANGE] order:order-123 PENDING → PROCESSING",
  "correlationId": "abc-123",
  "entityType": "order",
  "entityId": "order-123",
  "previousState": "PENDING",
  "newState": "PROCESSING",
  "trigger": "webhook_received"
}

Flow Completion:

{
  "message": "[FLOW COMPLETED] Order order-123 automation completed successfully",
  "correlationId": "abc-123",
  "orderId": "order-123",
  "success": true,
  "totalDurationMs": 125000,
  "totalDurationMinutes": 2.1,
  "milestones": {
    "order_received": 0,
    "order_validated": 50,
    "print_jobs_created": 200,
    "all_jobs_completed": 120000,
    "shipping_label_created": 122000,
    "fulfillment_created": 125000
  }
}
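
The milestone offsets and total duration in the flow completion log can be derived from timestamps alone. The following is a minimal in-memory sketch of that calculation; FlowTimer is a hypothetical class for illustration and does not reflect how BusinessObservabilityService is actually implemented.

```typescript
// Records each milestone as a millisecond offset from the start of the flow.
class FlowTimer {
  private readonly offsets: Record<string, number> = { order_received: 0 };

  constructor(private readonly startedAt: number) {}

  record(milestone: string, at: number): void {
    this.offsets[milestone] = at - this.startedAt;
  }

  summary() {
    const values = Object.keys(this.offsets).map((name) => this.offsets[name]);
    const totalDurationMs = Math.max(...values);
    return {
      totalDurationMs,
      // Minutes rounded to one decimal place.
      totalDurationMinutes: Math.round(totalDurationMs / 6000) / 10,
      milestones: { ...this.offsets },
    };
  }
}

const timer = new FlowTimer(0);
timer.record('order_validated', 50);
timer.record('print_jobs_created', 200);
timer.record('all_jobs_completed', 120000);
timer.record('shipping_label_created', 122000);
timer.record('fulfillment_created', 125000);
const summary = timer.summary();
console.log(summary.totalDurationMs, summary.totalDurationMinutes); // 125000 2.1
```

With the offsets from the log example, 125000 ms rounds to 2.1 minutes, matching the totalDurationMinutes field shown above.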


🔧 Feature F6.2: Monitoring and Alerting

Requirements Reference

  • NFR-AV-001: System Uptime (99%)
  • NFR-PE-003: Processing Latency (< 2 minutes)
  • Health check endpoints operational

Implementation

1. Enhanced Health Indicators

Create apps/api/src/health/indicators/shopify.indicator.ts:

import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { ShopifyApiClient } from '../../shopify/shopify-api.client';

@Injectable()
export class ShopifyHealthIndicator extends HealthIndicator {
  constructor(private readonly shopifyClient: ShopifyApiClient) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    try {
      // Make a lightweight API call to verify connectivity
      const isConnected = await this.shopifyClient.ping();

      if (isConnected) {
        return this.getStatus(key, true);
      }

      throw new HealthCheckError(
        'Shopify API check failed',
        this.getStatus(key, false, { message: 'Unable to connect to Shopify API' })
      );
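    } catch (error) {
      // Re-throw the HealthCheckError raised above instead of wrapping it again
      if (error instanceof HealthCheckError) {
        throw error;
      }
      throw new HealthCheckError(
        'Shopify API check failed',
        this.getStatus(key, false, {
          error: error instanceof Error ? error.message : String(error),
        })
      );
    }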
  }
}

Create apps/api/src/health/indicators/simplyprint.indicator.ts:

import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { SimplyPrintApiClient } from '../../simplyprint/simplyprint-api.client';

@Injectable()
export class SimplyPrintHealthIndicator extends HealthIndicator {
  constructor(private readonly simplyPrintClient: SimplyPrintApiClient) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    try {
      const isConnected = await this.simplyPrintClient.ping();

      if (isConnected) {
        return this.getStatus(key, true);
      }

      throw new HealthCheckError(
        'SimplyPrint API check failed',
        this.getStatus(key, false)
      );
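    } catch (error) {
      // Re-throw the HealthCheckError raised above instead of wrapping it again
      if (error instanceof HealthCheckError) {
        throw error;
      }
      throw new HealthCheckError(
        'SimplyPrint API check failed',
        this.getStatus(key, false, {
          error: error instanceof Error ? error.message : String(error),
        })
      );
    }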
  }
}

Create apps/api/src/health/indicators/sendcloud.indicator.ts:

import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { SendcloudApiClient } from '../../sendcloud/sendcloud-api.client';

@Injectable()
export class SendcloudHealthIndicator extends HealthIndicator {
  constructor(private readonly sendcloudClient: SendcloudApiClient) {
    super();
  }

  async isHealthy(key: string): Promise<HealthIndicatorResult> {
    if (!this.sendcloudClient.isShippingEnabled()) {
      return this.getStatus(key, true, { status: 'disabled' });
    }

    try {
      // Get shipping methods as a health check
      await this.sendcloudClient.getShippingMethods();
      return this.getStatus(key, true);
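    } catch (error) {
      // The caught value is typed unknown under strict settings, so narrow it first
      throw new HealthCheckError(
        'Sendcloud API check failed',
        this.getStatus(key, false, {
          error: error instanceof Error ? error.message : String(error),
        })
      );
    }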
  }
}
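
Each indicator above waits on an external API, so a slow dependency could stall the whole health endpoint. One possible mitigation, sketched here under the assumption that the API clients do not already enforce their own timeouts, is to race each call against a timer. The helper name withTimeout is hypothetical.

```typescript
// Rejects if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever promise settles first wins; always clear the timer afterwards
  // so a finished check does not leave a pending timeout behind.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Inside an indicator, a call such as this.shopifyClient.ping() could be wrapped as withTimeout(this.shopifyClient.ping(), 3000, 'Shopify ping'), so the existing catch block reports the timeout as a failed check. The 3000 ms budget is an illustrative value, not a project requirement.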

2. Update Health Controller

Update apps/api/src/health/health.controller.ts:

import { Controller, Get } from '@nestjs/common';
import { ApiTags, ApiOperation, ApiResponse } from '@nestjs/swagger';
import {
  HealthCheckService,
  HealthCheck,
  PrismaHealthIndicator,
  HealthCheckResult,
} from '@nestjs/terminus';
import { PrismaService } from '../database/prisma.service';
import { ShopifyHealthIndicator } from './indicators/shopify.indicator';
import { SimplyPrintHealthIndicator } from './indicators/simplyprint.indicator';
import { SendcloudHealthIndicator } from './indicators/sendcloud.indicator';

@ApiTags('Health')
@Controller('health')
export class HealthController {
  constructor(
    private readonly health: HealthCheckService,
    private readonly prismaHealth: PrismaHealthIndicator,
    private readonly prisma: PrismaService,
    private readonly shopifyHealth: ShopifyHealthIndicator,
    private readonly simplyPrintHealth: SimplyPrintHealthIndicator,
    private readonly sendcloudHealth: SendcloudHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  @ApiOperation({ summary: 'Full health check with all dependencies' })
  @ApiResponse({ status: 200, description: 'System is healthy' })
  @ApiResponse({ status: 503, description: 'System is unhealthy' })
  async check(): Promise<HealthCheckResult> {
    return this.health.check([
      // Database
      () => this.prismaHealth.pingCheck('database', this.prisma),
      // External services
      () => this.shopifyHealth.isHealthy('shopify'),
      () => this.simplyPrintHealth.isHealthy('simplyprint'),
      () => this.sendcloudHealth.isHealthy('sendcloud'),
    ]);
  }

  @Get('live')
  @ApiOperation({ summary: 'Liveness probe - is the process running?' })
  @ApiResponse({ status: 200, description: 'Process is alive' })
  async liveness(): Promise<{ status: string; timestamp: string }> {
    return {
      status: 'ok',
      timestamp: new Date().toISOString(),
    };
  }

  @Get('ready')
  @HealthCheck()
  @ApiOperation({ summary: 'Readiness probe - is the service ready to accept traffic?' })
  @ApiResponse({ status: 200, description: 'Service is ready' })
  @ApiResponse({ status: 503, description: 'Service is not ready' })
  async readiness(): Promise<HealthCheckResult> {
    return this.health.check([
      () => this.prismaHealth.pingCheck('database', this.prisma),
    ]);
  }

  @Get('dependencies')
  @HealthCheck()
  @ApiOperation({ summary: 'Check all external service dependencies' })
  @ApiResponse({ status: 200, description: 'All dependencies healthy' })
  @ApiResponse({ status: 503, description: 'One or more dependencies unhealthy' })
  async dependencies(): Promise<HealthCheckResult> {
    return this.health.check([
      () => this.shopifyHealth.isHealthy('shopify'),
      () => this.simplyPrintHealth.isHealthy('simplyprint'),
      () => this.sendcloudHealth.isHealthy('sendcloud'),
    ]);
  }
}
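
A caller of these endpoints (a deploy script or an alerting hook, for instance) usually only needs to know which dependencies are down. The following sketch extracts that from the response body. The HealthResponse type is a local mirror of the JSON shape returned by a Terminus health check, written from its documented format and not imported from the library, and unhealthyDependencies is a hypothetical helper.

```typescript
// Local mirror of the health check response body (assumed shape, see lead-in).
type IndicatorStatus = { status: 'up' | 'down'; [detail: string]: unknown };

interface HealthResponse {
  status: 'ok' | 'error' | 'shutting_down';
  info?: Record<string, IndicatorStatus>;
  error?: Record<string, IndicatorStatus>;
  details: Record<string, IndicatorStatus>;
}

// Returns the names of all indicators that are not reporting "up".
function unhealthyDependencies(health: HealthResponse): string[] {
  return Object.keys(health.details)
    .filter((name) => health.details[name].status !== 'up')
    .sort();
}

const sample: HealthResponse = {
  status: 'error',
  info: { database: { status: 'up' }, sendcloud: { status: 'up' } },
  error: { shopify: { status: 'down', error: 'timeout' } },
  details: {
    database: { status: 'up' },
    sendcloud: { status: 'up' },
    shopify: { status: 'down', error: 'timeout' },
  },
};
console.log(unhealthyDependencies(sample)); // [ 'shopify' ]
```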

3. Alerting Rules

Create deployment/monitoring/alerting-rules.yml:

```yaml
# Alerting Rules for Forma3D.Connect
# Configure in your monitoring system (e.g., Prometheus AlertManager, Datadog, etc.)

groups:
  - name: forma3d-connect
    rules:
      # High error rate (aggregated across all series so the ratio is well defined)
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is above 5% for the last 5 minutes"
          runbook_url: "docs/04-development/runbook.md#high-error-rate"

      # API latency
      - alert: HighAPILatency
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "95th percentile latency is above 2 seconds"
          runbook_url: "docs/04-development/runbook.md#high-latency"

      # Database connection issues
      - alert: DatabaseConnectionFailed
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection failed"
          description: "Cannot connect to PostgreSQL database"
          runbook_url: "docs/04-development/runbook.md#database-connection"

      # External service failures
      - alert: ShopifyAPIDown
        expr: shopify_api_up == 0
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "Shopify API unreachable"
          description: "Cannot connect to Shopify API for 5 minutes"
          runbook_url: "docs/04-development/runbook.md#shopify-down"

      - alert: SimplyPrintAPIDown
        expr: simplyprint_api_up == 0
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "SimplyPrint API unreachable"
          description: "Cannot connect to SimplyPrint API for 5 minutes"
          runbook_url: "docs/04-development/runbook.md#simplyprint-down"

      # Order processing stuck
      # PromQL label matchers cannot compare numeric values such as an age, so this
      # assumes the API exports a gauge (hypothetical name) that already counts
      # orders that have been in "processing" for more than 60 minutes.
      - alert: OrdersStuckInProcessing
        expr: orders_stuck_in_processing > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Orders stuck in processing"
          description: "Multiple orders have been in processing state for over 60 minutes"
          runbook_url: "docs/04-development/runbook.md#stuck-orders"

      # Retry queue growing
      - alert: RetryQueueBacklog
        expr: retry_queue_size > 50
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Retry queue backlog growing"
          description: "More than 50 items in retry queue for over 15 minutes"
          runbook_url: "docs/04-development/runbook.md#retry-queue-backlog"
```
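
The `OrdersStuckInProcessing` rule only works if the application itself reports how many orders have been processing for too long. The following is a minimal sketch of how such a count could be computed before it is exported as a metric. The `OrderSnapshot` shape, the function names, and the lowercase `processing` status value are illustrative assumptions, not part of the existing codebase.

```typescript
// Hypothetical order shape; the real Prisma model may differ.
interface OrderSnapshot {
  id: string;
  status: string;
  statusChangedAt: Date;
}

// Counts orders that have stayed in `status` for longer than `thresholdMinutes`.
function countStuckOrders(
  orders: OrderSnapshot[],
  now: Date,
  thresholdMinutes = 60,
  status = 'processing',
): number {
  const thresholdMs = thresholdMinutes * 60_000;
  return orders.filter(
    (o) => o.status === status && now.getTime() - o.statusChangedAt.getTime() > thresholdMs,
  ).length;
}

// Mirrors the alert condition: fire when more than `limit` orders are stuck.
function shouldAlert(stuckCount: number, limit = 5): boolean {
  return stuckCount > limit;
}
```

Computing the age inside the application keeps the alert expression a simple threshold comparison on a single gauge.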

#### 4. Operations Runbook

Create `docs/04-development/runbook.md`:

# Forma3D.Connect Operations Runbook

## Overview

This runbook provides procedures for operating and troubleshooting Forma3D.Connect in production.

## Table of Contents

1. [Service Architecture](#service-architecture)
2. [Health Checks](#health-checks)
3. [Common Issues and Resolutions](#common-issues)
4. [Incident Response](#incident-response)
5. [Maintenance Procedures](#maintenance-procedures)

---

## Service Architecture

### Components

| Component | URL | Purpose |
|-----------|-----|---------|
| API | `https://connect-api.forma3d.be` | Backend NestJS application |
| Web | `https://connect.forma3d.be` | React dashboard |
| Database | PostgreSQL (managed) | Data persistence |
| Traefik | Internal | Reverse proxy with TLS |

### External Dependencies

| Service | Purpose | Documentation |
|---------|---------|---------------|
| Shopify | E-commerce platform | [Shopify API Docs](https://shopify.dev/docs/api) |
| SimplyPrint | 3D print management | [SimplyPrint API](https://simplyprint.io/docs) |
| Sendcloud | Shipping labels | [Sendcloud API](https://api.sendcloud.dev) |
| Sentry | Error monitoring | [Sentry Dashboard](https://sentry.io) |

---

## Health Checks

### Endpoints

```bash
# Full health check
curl https://connect-api.forma3d.be/health

# Liveness probe
curl https://connect-api.forma3d.be/health/live

# Readiness probe
curl https://connect-api.forma3d.be/health/ready

# External dependencies
curl https://connect-api.forma3d.be/health/dependencies
```

### Expected Responses

**Healthy:**

```json
{
  "status": "ok",
  "info": {
    "database": { "status": "up" },
    "shopify": { "status": "up" },
    "simplyprint": { "status": "up" },
    "sendcloud": { "status": "up" }
  }
}
```

**Unhealthy (example):**

```json
{
  "status": "error",
  "error": {
    "shopify": {
      "status": "down",
      "error": "Connection timeout"
    }
  }
}
```
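
These response shapes follow from how individual check results are combined: any failing check flips the overall status to `error`, and that entry is reported under `error` instead of `info`. The sketch below illustrates the aggregation in simplified form. It is not the actual `@nestjs/terminus` implementation, and the `aggregateHealth` name and `CheckResult` type are illustrative.

```typescript
type CheckStatus = 'up' | 'down';

interface CheckResult {
  name: string;
  status: CheckStatus;
  error?: string;
}

interface HealthSummary {
  status: 'ok' | 'error';
  info: Record<string, { status: CheckStatus }>;
  error: Record<string, { status: CheckStatus; error?: string }>;
}

// Combines per-dependency results into one response body.
function aggregateHealth(results: CheckResult[]): HealthSummary {
  const summary: HealthSummary = { status: 'ok', info: {}, error: {} };
  for (const r of results) {
    if (r.status === 'up') {
      summary.info[r.name] = { status: 'up' };
    } else {
      // A single failing dependency marks the whole check as failed.
      summary.status = 'error';
      summary.error[r.name] = { status: 'down', ...(r.error ? { error: r.error } : {}) };
    }
  }
  return summary;
}
```

An overall `error` status is what the health endpoints translate into an HTTP 503.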


---

## Common Issues

### High Error Rate

**Symptoms:** Error rate > 5%, Sentry alerts

**Investigation:**

1. Check Sentry for error patterns
2. Review API logs: `docker logs forma3d-api --tail 100`
3. Check database connectivity
4. Verify external service status

**Resolution:**

1. If database issue: Restart database connection pool
2. If external service: Enable fallback/degraded mode
3. If code bug: Deploy hotfix

### High Latency

**Symptoms:** 95th percentile response time > 2s

**Investigation:**

1. Check database query performance
2. Review slow query logs
3. Check external API response times
4. Monitor memory/CPU usage

**Resolution:**

1. Scale up resources if needed
2. Optimize slow queries
3. Add caching if appropriate

### Database Connection

**Symptoms:** Database health check failing

**Investigation:**

1. Check PostgreSQL status
2. Verify connection string
3. Check network connectivity
4. Review connection pool settings

**Resolution:**

1. Restart API service: `docker-compose restart api`
2. Check database credentials
3. Contact database provider if managed

### Shopify Down

**Symptoms:** Webhooks not being received, fulfillments failing

**Investigation:**

1. Check Shopify status page
2. Verify webhook configuration in Shopify admin
3. Review API logs for errors

**Resolution:**

1. Wait for Shopify to recover
2. Orders will be retried via retry queue
3. Manual reprocessing if needed

### SimplyPrint Down

**Symptoms:** Print jobs not being created

**Investigation:**

1. Check SimplyPrint API status
2. Verify API credentials
3. Check SimplyPrint dashboard

**Resolution:**

1. Wait for service recovery
2. Failed jobs will be retried automatically
3. Manual retry: `POST /api/v1/print-jobs/{id}/retry`

### Stuck Orders

**Symptoms:** Orders in PROCESSING state for > 60 minutes

**Investigation:**

1. Check print job status in dashboard
2. Verify SimplyPrint job status
3. Check for failed webhooks

**Resolution:**

1. Force refresh print job status
2. Manually update order status if needed
3. Contact SimplyPrint support if print issues

### Retry Queue Backlog

**Symptoms:** > 50 items in retry queue

**Investigation:**

1. Check retry queue: `GET /api/v1/admin/retry-queue`
2. Identify failing job types
3. Check error messages

**Resolution:**

1. Fix underlying issue causing failures
2. Clear old/stale entries if safe
3. Increase retry queue processing capacity


---

## Incident Response

### Severity Levels

| Level | Description | Response Time | Examples |
|-------|-------------|---------------|----------|
| P1 - Critical | Complete service outage | 15 minutes | Database down, API unresponsive |
| P2 - High | Major feature broken | 1 hour | Webhooks failing, fulfillments stuck |
| P3 - Medium | Degraded performance | 4 hours | High latency, intermittent errors |
| P4 - Low | Minor issue | 1 business day | UI bugs, documentation issues |

### Incident Template

```markdown
## Incident Report

**Date:** YYYY-MM-DD
**Severity:** P1/P2/P3/P4
**Duration:** HH:MM - HH:MM
**Impact:** [Description of user impact]

### Timeline
- HH:MM - Issue detected
- HH:MM - Investigation started
- HH:MM - Root cause identified
- HH:MM - Fix deployed
- HH:MM - Issue resolved

### Root Cause
[Description of what caused the issue]

### Resolution
[What was done to fix it]

### Prevention
[What will be done to prevent recurrence]
```

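---

## Maintenance Procedures

### Deploying Updates

```bash
# Pull latest changes
git pull origin main

# Build and push the image (the build tag must match the registry path being pushed)
docker build -t registry.digitalocean.com/forma3d/api:latest apps/api
docker push registry.digitalocean.com/forma3d/api:latest

# Recreate only the API container, leaving its dependencies running.
# The container is replaced in place, so expect a brief interruption.
docker-compose up -d --no-deps api
```

### Database Migrations

```bash
# Apply pending migrations
pnpm prisma migrate deploy

# Mark a failed migration as rolled back so it can be re-applied.
# This only updates Prisma's migration history; it does not revert schema changes.
pnpm prisma migrate resolve --rolled-back MIGRATION_NAME
```

### Log Rotation

Docker rotates container logs automatically only when the logging driver is configured for it (for example `max-size` and `max-file` on the `json-file` driver). Manual cleanup:

```bash
# Remove stopped containers (and their logs), unused networks, and dangling images.
# Do not add --volumes: it also deletes unused volumes, which could include traefik-certs.
docker system prune
```

### Backup Procedures

Database backups are handled by the managed PostgreSQL provider.

Manual backup:

```bash
pg_dump "$DATABASE_URL" > "backup_$(date +%Y%m%d).sql"
```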

---

## Keys & Certificates Inventory

**IMPORTANT:** Maintain an up-to-date inventory of all API keys, secrets, and certificates.

Create and maintain `docs/04-development/keys-certificates-inventory.md`:

| Key/Certificate | Purpose | Lifespan | Renewal Location | Renewal Procedure | Last Renewed |
|-----------------|---------|----------|------------------|-------------------|--------------|
| **INFRASTRUCTURE** | | | | | |
| Droplet SSH Key | SSH access to server | No expiry (rotate annually) | DigitalOcean → Settings → Security → SSH Keys | Generate new keypair, add to DO, update `~/.ssh/authorized_keys` on droplet | YYYY-MM-DD |
| Droplet Root Password | Emergency console access | No expiry (rotate annually) | DigitalOcean → Droplet → Access → Reset Root Password | Reset via DO console, store in password manager | YYYY-MM-DD |
| TLS Certificate (Let's Encrypt) | HTTPS for API/Web | 90 days | Let's Encrypt via Traefik ACME | Auto-renewed by Traefik (see note below) | Auto |
| **DATABASE** | | | | | |
| Database CA Certificate | SSL connection to managed DB | 1-5 years (provider-managed) | DigitalOcean → Databases → Your DB → Connection Details → Download CA | Download new CA cert, update `?sslmode=require&sslrootcert=` path | YYYY-MM-DD |
| Database Password | PostgreSQL access | No expiry (rotate quarterly) | DigitalOcean → Databases → Your DB → Users | Reset via provider, update `DATABASE_URL` | YYYY-MM-DD |
| **CONTAINER REGISTRY** | | | | | |
| Container Registry Token | Push/pull Docker images | No expiry | DigitalOcean → Container Registry → API | Generate new token, update CI/CD variables | YYYY-MM-DD |
| Cosign Signing Key | Container image signing | No expiry | Self-generated | `cosign generate-key-pair`, update CI/CD secrets | YYYY-MM-DD |
| **EXTERNAL SERVICES** | | | | | |
| Shopify API Key | Shopify Admin API access | No expiry | Shopify Admin → Apps → Your App | Regenerate in Shopify admin, update `.env` | YYYY-MM-DD |
| Shopify API Secret | App authentication | No expiry | Shopify Admin → Apps → Your App | Regenerate in Shopify admin, update `.env` | YYYY-MM-DD |
| Shopify Access Token | Store-specific access | No expiry (unless revoked) | Shopify Admin → Apps | Reinstall app or regenerate | YYYY-MM-DD |
| Shopify Webhook Secret | HMAC verification | No expiry | Shopify Admin → Notifications → Webhooks | Regenerate in webhooks settings | YYYY-MM-DD |
| SimplyPrint API Key | Print farm API access | No expiry | SimplyPrint Dashboard → API Settings | Generate new key in dashboard | YYYY-MM-DD |
| SimplyPrint Webhook Token | Webhook verification | No expiry | SimplyPrint Dashboard → Webhooks | Configure in webhook settings | YYYY-MM-DD |
| Sendcloud Public Key | Shipping API authentication | No expiry | Sendcloud Panel → Settings → API | Generate new integration | YYYY-MM-DD |
| Sendcloud Secret Key | Shipping API authentication | No expiry | Sendcloud Panel → Settings → API | Generate new integration | YYYY-MM-DD |
| **APPLICATION** | | | | | |
| `API_KEY` (internal) | Dashboard/Admin access | No expiry (rotate annually) | Self-generated | Generate new UUID, update `.env` | YYYY-MM-DD |
| Sentry DSN | Error tracking | No expiry | Sentry Dashboard → Project Settings | Create new project if needed | YYYY-MM-DD |
| SMTP Credentials | Email notifications | Varies by provider | Email provider dashboard | Regenerate password/API key | YYYY-MM-DD |
| **CI/CD (Azure DevOps)** | | | | | |
| Azure DevOps PAT | Pipeline authentication | 1 year max | Azure DevOps → User Settings → Personal Access Tokens | Generate new PAT, update service connections | YYYY-MM-DD |
| Service Connection (SSH) | Deploy to droplet | Tied to SSH key | Azure DevOps → Project Settings → Service Connections | Update with new SSH private key | YYYY-MM-DD |
| Pipeline Variables | Secrets in pipelines | No expiry | Azure DevOps → Pipelines → Library → Variable Groups | Update individual variables as needed | YYYY-MM-DD |

**Let's Encrypt TLS Certificate Auto-Renewal:**

Traefik automatically handles Let's Encrypt certificate renewal. Key details:

- **Validity:** 90 days per certificate
- **Auto-renewal:** Traefik renews ~30 days before expiry (no manual action required)
- **Storage:** Certificates stored in Docker volume `traefik-certs` at `/letsencrypt/acme.json`
- **Challenge:** HTTP-01 challenge via port 80 (must remain accessible)
- **Configuration:** See `certificatesResolvers.letsencrypt` in `deployment/staging/traefik.yml`

```yaml
# traefik.yml - ACME auto-renewal configuration
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@forma3d.be          # Expiry notifications sent here
      storage: /letsencrypt/acme.json  # Persisted in Docker volume
      httpChallenge:
        entryPoint: web                # Port 80 for challenge
```

**Monitoring TLS auto-renewal:**

```bash
# Check certificate expiry date
echo | openssl s_client -connect staging-connect-api.forma3d.be:443 2>/dev/null | openssl x509 -noout -dates

# Check Traefik logs for renewal activity
docker logs forma3d-traefik 2>&1 | grep -i "acme\|certificate\|renew"

# Verify ACME storage file exists
docker exec forma3d-traefik ls -la /letsencrypt/acme.json
```

**Troubleshooting failed renewal:**

1. Ensure port 80 is open and reachable from the internet
2. Check that DNS still points to the correct IP
3. Verify the Traefik container is running: `docker ps | grep traefik`
4. Check Traefik logs for ACME errors
5. If needed, remove `acme.json` and restart Traefik to re-issue certificates

**Renewal Calendar:**

- **Weekly (automated):** TLS certificates checked by Traefik, renewed if within 30 days of expiry
- **Monthly:** Verify TLS auto-renewal is working (check cert dates)
- **Quarterly:** Rotate database password, review API key usage
- **Annually:** Rotate SSH keys, Azure DevOps PAT, internal `API_KEY`, review all external API keys
- **Before Expiry:** Database CA certificate (monitor provider notifications), Azure DevOps PAT
- **On Incident:** Rotate any potentially compromised credentials immediately

**Monitoring Expiry:**

- Set calendar reminders for annually-rotated credentials
- Subscribe to provider notifications (DigitalOcean, Azure DevOps)
- Let's Encrypt sends expiry warnings to admin@forma3d.be (but Traefik should renew automatically)
- Database CA cert expiry: `openssl x509 -enddate -noout -in ca-certificate.crt`

**Renewal Checklist:**

1. [ ] Generate new credential in source system
2. [ ] Update `.env` files (staging and production)
3. [ ] Update secrets in CI/CD (Azure DevOps variables)
4. [ ] Deploy with new credentials
5. [ ] Verify system still works (health checks)
6. [ ] Revoke old credential (if applicable)
7. [ ] Update this inventory with new "Last Renewed" date
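
The renewal calendar can be checked mechanically once the inventory records a rotation interval and a last-renewed date per credential. The sketch below shows one way to flag credentials that are due soon. The `Credential` shape, the 30-day warning window, and the helper names are illustrative assumptions; the inventory itself only stores these values as table text.

```typescript
interface Credential {
  name: string;
  lastRenewed: Date;
  // Rotation interval in days; null means "rotate only on incident".
  rotateEveryDays: number | null;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days remaining until the credential should be rotated (negative if overdue).
function daysUntilRotation(c: Credential, today: Date): number | null {
  if (c.rotateEveryDays === null) return null;
  const due = c.lastRenewed.getTime() + c.rotateEveryDays * DAY_MS;
  return Math.floor((due - today.getTime()) / DAY_MS);
}

// Names of credentials whose rotation is due within `warnDays` or already overdue.
function dueSoon(creds: Credential[], today: Date, warnDays = 30): string[] {
  return creds
    .filter((c) => {
      const remaining = daysUntilRotation(c, today);
      return remaining !== null && remaining <= warnDays;
    })
    .map((c) => c.name);
}
```

A scheduled job or a CI step could run such a check and post the result to the team channel, which complements rather than replaces the calendar reminders.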


---

## Contact Information

| Role | Contact | Escalation |
|------|---------|------------|
| On-call Engineer | [email/phone] | Primary |
| Tech Lead | [email/phone] | If P1/P2 |
| Database Admin | [email/phone] | Database issues |

---

## 🔧 Feature F6.3: Documentation

### Requirements Reference

- Complete README with setup instructions
- API documentation (OpenAPI/Swagger)
- Architecture documentation
- Runbook for operations
- **Keys & certificates inventory** (tabular overview of all API keys, secrets, certificates with lifespans and renewal procedures)

### Implementation

#### 1. Environment Setup Guide

Create `docs/04-development/environment-setup.md`:

# Environment Setup Guide

## Prerequisites

- Node.js 20.x or higher
- pnpm 8.x or higher
- Docker and Docker Compose
- PostgreSQL 15 (or use Docker)
- Git

## Quick Start

### 1. Clone Repository

```bash
git clone https://github.com/forma3d/forma3d-connect.git
cd forma3d-connect
```

### 2. Install Dependencies

```bash
pnpm install
```

### 3. Configure Environment

Copy the example environment file:

```bash
cp .env.example .env
```

Edit `.env` with your configuration:

```bash
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/forma3d_connect

# Shopify
SHOPIFY_SHOP_DOMAIN=your-shop.myshopify.com
SHOPIFY_API_KEY=your-api-key
SHOPIFY_API_SECRET=your-api-secret
SHOPIFY_ACCESS_TOKEN=your-access-token
SHOPIFY_WEBHOOK_SECRET=your-webhook-secret

# SimplyPrint
SIMPLYPRINT_API_URL=https://api.simplyprint.io/v1
SIMPLYPRINT_API_KEY=your-api-key

# Sendcloud (optional)
SENDCLOUD_PUBLIC_KEY=your-public-key
SENDCLOUD_SECRET_KEY=your-secret-key
SHIPPING_ENABLED=true

# Application
API_KEY=your-admin-api-key
NODE_ENV=development
PORT=3000
```
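
A missing variable tends to surface only when the first webhook or API call fails, so it can help to validate the environment at startup. The following is a minimal sketch using the variable names from the example above. The `validateEnv` helper, and the rule that the Sendcloud keys are required only when `SHIPPING_ENABLED=true`, are assumptions for illustration rather than existing behaviour.

```typescript
const REQUIRED_VARS = [
  'DATABASE_URL',
  'SHOPIFY_SHOP_DOMAIN',
  'SHOPIFY_API_KEY',
  'SHOPIFY_API_SECRET',
  'SHOPIFY_ACCESS_TOKEN',
  'SHOPIFY_WEBHOOK_SECRET',
  'SIMPLYPRINT_API_URL',
  'SIMPLYPRINT_API_KEY',
  'API_KEY',
];

// Assumption: Sendcloud credentials matter only when shipping is enabled.
const SHIPPING_VARS = ['SENDCLOUD_PUBLIC_KEY', 'SENDCLOUD_SECRET_KEY'];

// Returns the names of variables that are missing or blank.
function validateEnv(env: Record<string, string | undefined>): string[] {
  const required = [...REQUIRED_VARS];
  if (env['SHIPPING_ENABLED'] === 'true') {
    required.push(...SHIPPING_VARS);
  }
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === '';
  });
}

// Possible usage at bootstrap:
//   const missing = validateEnv(process.env);
//   if (missing.length > 0) throw new Error(`Missing env vars: ${missing.join(', ')}`);
```

Failing fast with the full list of missing names is usually easier to debug than a generic connection or authentication error later on.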

### 4. Start Database

Using Docker:

```bash
docker-compose up -d postgres
```

Or use your local PostgreSQL installation.

### 5. Run Migrations

```bash
pnpm prisma migrate dev
```

### 6. Start Development Servers

```bash
# Start all services
pnpm dev

# Or start individually
pnpm nx serve api    # Backend on http://localhost:3000
pnpm nx serve web    # Frontend on http://localhost:4200
```

### 7. Verify Installation

```bash
# Check API health
curl http://localhost:3000/health

# Access Swagger docs
open http://localhost:3000/api/docs

# Access dashboard
open http://localhost:4200
```

## Docker Development

Build and run everything in Docker:

```bash
docker-compose up -d
```

## Testing

```bash
# Run all tests
pnpm test

# Run with coverage
pnpm test:coverage

# Run E2E tests
pnpm e2e
```

## Troubleshooting

### Database Connection Issues

1. Verify PostgreSQL is running
2. Check `DATABASE_URL` format
3. Ensure database exists: `createdb forma3d_connect`

### Port Conflicts

If port 3000 is in use:

```bash
PORT=3001 pnpm nx serve api
```

### Prisma Issues

Regenerate Prisma client:

```bash
pnpm prisma generate
```

Reset database (development only; this drops all data and re-applies every migration):

```bash
pnpm prisma migrate reset
```

#### 2. Troubleshooting Guide

Create `docs/04-development/troubleshooting.md`:

# Troubleshooting Guide

## Common Issues

### Build Errors

#### "Cannot find module '@forma3d/...'"

**Cause:** Library not built or missing from node_modules

**Solution:**
```bash
pnpm install
pnpm nx run-many --target=build --all
```

#### TypeScript compilation errors

**Cause:** Type mismatches or outdated types

**Solution:**

```bash
pnpm prisma generate  # Regenerate Prisma types
pnpm nx reset         # Clear Nx cache
pnpm install          # Reinstall dependencies
```

### Runtime Errors

#### "ECONNREFUSED" to database

**Cause:** Database not running or wrong connection string

**Solution:**

1. Start database: `docker-compose up -d postgres`
2. Verify `DATABASE_URL` in `.env`
3. Check network connectivity

#### "Invalid API key" from external services

**Cause:** Missing or incorrect API credentials

**Solution:**

1. Verify credentials in `.env`
2. Check for extra spaces or newlines
3. Regenerate API keys if needed

### Webhook Issues

#### Shopify webhooks not arriving

**Cause:** Webhook URL not accessible or HMAC validation failing

**Solution:**

1. Use ngrok for local development: `ngrok http 3000`
2. Update webhook URL in Shopify admin
3. Verify `SHOPIFY_WEBHOOK_SECRET`

#### SimplyPrint webhooks failing

**Cause:** Token mismatch or network issues

**Solution:**

1. Verify `SIMPLYPRINT_WEBHOOK_TOKEN`
2. Check firewall/security group rules
3. Review SimplyPrint webhook logs

### Performance Issues

#### Slow API responses

**Cause:** Database queries not optimized

**Solution:**

1. Enable query logging in Prisma
2. Add missing indexes
3. Use pagination for large datasets

#### Memory issues

**Cause:** Memory leaks or insufficient resources

**Solution:**

1. Monitor with `docker stats`
2. Increase container memory limits
3. Review for memory leaks in code

### Testing Issues

#### Tests timing out

**Cause:** Async operations not completing

**Solution:**

1. Increase Jest timeout
2. Check for unresolved promises
3. Verify test database is accessible

#### MSW not intercepting requests

**Cause:** Handler not matching request

**Solution:**

1. Check handler URL patterns
2. Verify request method (GET/POST)
3. Add `console.log` to handler to debug

## Logs and Debugging

### View API Logs

```bash
# Development
pnpm nx serve api --verbose

# Docker
docker logs forma3d-api -f --tail 100

# Staging
ssh staging 'docker logs forma3d-api'
```

### Enable Debug Mode

```bash
DEBUG=forma3d:*
LOG_LEVEL=debug
```

### Prisma Query Logging

```bash
DEBUG=prisma:query
```

## Getting Help

1. Check this troubleshooting guide
2. Search existing GitHub issues
3. Review Sentry for similar errors
4. Ask in team Slack channel
5. Create a GitHub issue with reproduction steps

#### 3. Update README.md

Update the main README with comprehensive documentation covering:

- Project overview
- Features list
- Quick start guide
- Architecture overview
- API documentation link
- Development setup
- Testing instructions
- Deployment guide
- Contributing guidelines

---

## 🔧 Feature F6.4: Security Hardening

### Requirements Reference

- NFR-SE-001: Secure Credential Storage
- NFR-SE-002: Webhook Verification
- Security scan passing

### Implementation

#### 1. Rate Limiting

Create `apps/api/src/common/guards/throttler.guard.ts`:

```typescript
import { Injectable, ExecutionContext } from '@nestjs/common';
import { ThrottlerGuard } from '@nestjs/throttler';

@Injectable()
export class CustomThrottlerGuard extends ThrottlerGuard {
  protected async getTracker(req: Record<string, unknown>): Promise<string> {
    // Use the first X-Forwarded-For entry when behind the reverse proxy.
    // The header is client-controlled unless the proxy overwrites it.
    const headers = req.headers as Record<string, string | string[] | undefined> | undefined;
    const forwarded = headers?.['x-forwarded-for'];
    const first = Array.isArray(forwarded) ? forwarded[0] : forwarded;
    if (first) {
      return first.split(',')[0].trim();
    }
    return req.ip as string;
  }

  protected async shouldSkip(context: ExecutionContext): Promise<boolean> {
    const request = context.switchToHttp().getRequest();

    // Skip rate limiting for health checks
    if (request.url.startsWith('/health')) {
      return true;
    }

    return false;
  }
}
```
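
The tracker logic is easy to unit-test when pulled out into a standalone function. The sketch below shows the header parsing in isolation. `clientIpFrom` is a hypothetical helper, not part of `@nestjs/throttler`, and it assumes the reverse proxy is trusted to set `X-Forwarded-For`; if clients can set the header themselves, they can evade per-IP limits.

```typescript
type HeaderValue = string | string[] | undefined;

// Returns the left-most X-Forwarded-For address, or the socket IP as a fallback.
function clientIpFrom(headers: Record<string, HeaderValue>, socketIp: string): string {
  const raw = headers['x-forwarded-for'];
  // Node may expose repeated headers as an array; use the first occurrence.
  const value = Array.isArray(raw) ? raw[0] : raw;
  const first = value?.split(',')[0]?.trim();
  return first ? first : socketIp;
}
```

The left-most entry is the original client only when every proxy in the chain appends to the header instead of passing through whatever the client sent.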
    

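Update `apps/api/src/app.module.ts`:

```typescript
import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerModule } from '@nestjs/throttler';
import { CustomThrottlerGuard } from './common/guards/throttler.guard';

@Module({
  imports: [
    // ... existing imports
    ThrottlerModule.forRoot([
      {
        name: 'short',
        ttl: 1000, // 1 second (in milliseconds)
        limit: 10, // 10 requests per second
      },
      {
        name: 'medium',
        ttl: 10000, // 10 seconds
        limit: 50, // 50 requests per 10 seconds
      },
      {
        name: 'long',
        ttl: 60000, // 1 minute
        limit: 200, // 200 requests per minute
      },
    ]),
  ],
  providers: [
    // Register the guard globally so every route is throttled by default
    {
      provide: APP_GUARD,
      useClass: CustomThrottlerGuard,
    },
  ],
})
export class AppModule {}
```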
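
With several named throttlers, a request is allowed only if it fits within every window. The sketch below illustrates that behaviour with a simplified in-memory, fixed-window counter. It is an illustration of the idea only, not how `@nestjs/throttler` stores or expires its counters, and the `MultiWindowLimiter` class is hypothetical.

```typescript
interface WindowConfig {
  name: string;
  ttl: number; // window length in milliseconds
  limit: number; // maximum requests per window
}

interface Counter {
  windowStart: number;
  hits: number;
}

class MultiWindowLimiter {
  private readonly windows: WindowConfig[];
  // Keyed by `${tracker}:${windowName}`
  private readonly counters = new Map<string, Counter>();

  constructor(windows: WindowConfig[]) {
    this.windows = windows;
  }

  allow(tracker: string, now: number): boolean {
    // First pass: start a fresh window where the old one has expired.
    const active = this.windows.map((w) => {
      const key = `${tracker}:${w.name}`;
      let counter = this.counters.get(key);
      if (!counter || now - counter.windowStart >= w.ttl) {
        counter = { windowStart: now, hits: 0 };
        this.counters.set(key, counter);
      }
      return { counter, limit: w.limit };
    });

    // Reject if any window is already full.
    if (active.some(({ counter, limit }) => counter.hits >= limit)) {
      return false;
    }

    // Second pass: count the accepted request against every window.
    for (const { counter } of active) {
      counter.hits += 1;
    }
    return true;
  }
}
```

In this sketch rejected requests are not counted, so a client that keeps hammering the API does not extend its own lockout; whether that matches the production guard depends on the storage backend in use.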

#### 2. Security Headers

Create `apps/api/src/common/middleware/security-headers.middleware.ts`:

```typescript
import { Injectable, NestMiddleware } from '@nestjs/common';
import { Request, Response, NextFunction } from 'express';
import helmet from 'helmet';

@Injectable()
export class SecurityHeadersMiddleware implements NestMiddleware {
  private readonly helmet = helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        styleSrc: ["'self'", "'unsafe-inline'"],
        scriptSrc: ["'self'"],
        imgSrc: ["'self'", 'data:', 'https:'],
        connectSrc: ["'self'", 'https://api.sentry.io'],
        frameSrc: ["'none'"],
        objectSrc: ["'none'"],
      },
    },
    crossOriginEmbedderPolicy: false, // Required for Swagger UI
    crossOriginOpenerPolicy: { policy: 'same-origin-allow-popups' },
    crossOriginResourcePolicy: { policy: 'cross-origin' },
    hsts: {
      maxAge: 31536000,
      includeSubDomains: true,
      preload: true,
    },
    noSniff: true,
    referrerPolicy: { policy: 'strict-origin-when-cross-origin' },
    xssFilter: true,
  });

  use(req: Request, res: Response, next: NextFunction) {
    this.helmet(req, res, next);
  }
}
```
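
For orientation, the `contentSecurityPolicy.directives` object corresponds to a single `Content-Security-Policy` header value. The sketch below shows roughly how that serialization works: camelCase directive names become kebab-case, values are space-separated, and directives are joined with semicolons. `buildCsp` is an illustrative helper, not a helmet API, and helmet also merges in its own default directives unless `useDefaults` is disabled, so the real header will contain more entries.

```typescript
// Serializes a directives object into a Content-Security-Policy header value.
function buildCsp(directives: Record<string, string[]>): string {
  return Object.entries(directives)
    .map(([name, values]) => {
      // defaultSrc -> default-src, imgSrc -> img-src, ...
      const kebab = name.replace(/[A-Z]/g, (c) => `-${c.toLowerCase()}`);
      return [kebab, ...values].join(' ');
    })
    .join('; ');
}

const header = buildCsp({
  defaultSrc: ["'self'"],
  imgSrc: ["'self'", 'data:', 'https:'],
  frameSrc: ["'none'"],
});
// header === "default-src 'self'; img-src 'self' data: https:; frame-src 'none'"
```

Comparing such a string against the header actually returned by the API is a quick way to confirm the "all required headers present" unit test scenario.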

#### 3. Dependency Security Scan

Create `.github/workflows/security-scan.yml`:

```yaml
name: Security Scan

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 0 * * 1' # Weekly on Monday

permissions:
  contents: read
  security-events: write # Required to upload SARIF results

jobs:
  dependency-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8

      - name: Install dependencies
        run: pnpm install --frozen-lockfile

      - name: Run pnpm audit
        run: pnpm audit --audit-level=high
        continue-on-error: true

      - name: Run Snyk scan
        uses: snyk/actions/node@master
        continue-on-error: true
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          # Write the SARIF file that the upload step below expects
          args: --sarif-file-output=snyk.sarif

      - name: Upload Snyk report
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: snyk.sarif
        continue-on-error: true

  code-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: javascript-typescript

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
```

#### 4. Authentication Audit

Create a checklist for security review:

```markdown
# Security Audit Checklist

## Authentication & Authorization

- [ ] API key authentication implemented for admin endpoints
- [ ] API keys are stored hashed in database (if applicable)
- [ ] Rate limiting applied to authentication endpoints
- [ ] Failed authentication attempts are logged

## Webhook Security

- [ ] Shopify webhooks use HMAC verification
- [ ] SimplyPrint webhooks use token verification
- [ ] Sendcloud webhooks verified (if applicable)
- [ ] Webhook idempotency implemented

## Data Protection

- [ ] Sensitive data is not logged (passwords, tokens, etc.)
- [ ] Database credentials are not in code
- [ ] API keys are stored in environment variables
- [ ] HTTPS enforced in production

## Input Validation

- [ ] All DTOs have validation decorators
- [ ] JSON payloads are validated at boundaries
- [ ] SQL injection prevented via Prisma
- [ ] XSS prevented via proper output encoding

## Dependencies

- [ ] No critical vulnerabilities in dependencies
- [ ] Dependencies are up to date
- [ ] Lock file is committed

## Infrastructure

- [ ] TLS certificates valid and auto-renewed
- [ ] Security headers configured
- [ ] CORS properly configured
- [ ] Firewall rules reviewed
```
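
As a reference point for the webhook items, Shopify signs each webhook with an HMAC-SHA256 of the raw request body and sends the base64-encoded digest in the `X-Shopify-Hmac-Sha256` header. The sketch below shows the comparison using Node's `crypto` module. `verifyShopifyHmac` is an illustrative name, and the verification already built in Phase 1 may be structured differently.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Returns true when the header value matches the HMAC of the raw body.
function verifyShopifyHmac(rawBody: string | Buffer, headerHmac: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(headerHmac, 'base64');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The check must run against the unparsed request body; re-serializing parsed JSON can change whitespace or key order and make a valid signature fail.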

---

## 🧪 Testing Requirements

### Test Coverage Requirements

Per `requirements.md` (NFR-MA-002):

- Unit Tests: > 80% coverage for all services
- Integration Tests: All API integrations tested
- E2E Tests: Critical paths covered
- Load Tests: 500+ orders/day capacity verified

### Unit Test Scenarios Required

| Category | Scenario | Priority |
|----------|----------|----------|
| Health Indicators | Shopify health check succeeds | High |
| Health Indicators | SimplyPrint health check succeeds | High |
| Health Indicators | Sendcloud health check (disabled) | Medium |
| Rate Limiting | Requests are throttled correctly | High |
| Rate Limiting | Health endpoints bypass throttling | Medium |
| Security Headers | All required headers present | High |

### Load Test Scenarios

| Scenario | Description | Target |
|----------|-------------|--------|
| Sustained Load | 1 request/second for 5 minutes | < 1% errors |
| Spike Test | Ramp to 50 requests/second | < 5% errors |
| Order Throughput | 500 orders/day simulation | 100% success |
| Dashboard Load | 10 concurrent dashboard users | < 2s latency |
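
The order-throughput target translates into a very modest average request rate, which suggests the sustained and spike scenarios are the more demanding ones. The sketch below works through the arithmetic. The 8-hour business window is an illustrative assumption, not a stated requirement.

```typescript
// Average orders per second for a given daily volume and active window.
function ordersPerSecond(ordersPerDay: number, activeHours = 24): number {
  return ordersPerDay / (activeHours * 3600);
}

const evenlySpread = ordersPerSecond(500); // 500 / 86400, about 0.0058 orders/s
const businessHours = ordersPerSecond(500, 8); // 500 / 28800, about 0.0174 orders/s

// How many times the 1 request/second sustained scenario exceeds the average order rate.
const sustainedHeadroom = 1 / evenlySpread; // 172.8
```

Even with all orders compressed into business hours, the sustained-load scenario runs well above the expected order rate, so passing it leaves headroom for the webhook and dashboard traffic that each order generates.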

---

## ✅ Validation Checklist

### Infrastructure

- [ ] All modules compile without errors
- [ ] `pnpm nx build api` succeeds
- [ ] `pnpm nx build web` succeeds
- [ ] `pnpm lint` passes on all files
- [ ] No TypeScript errors

### Testing (F6.1)

- [ ] Unit test coverage > 80% for backend
- [ ] Unit test coverage > 60% for frontend
- [ ] Integration tests for order flow
- [ ] Integration tests for cancellation flow
- [ ] Integration tests for error recovery
- [ ] E2E critical path tests passing
- [ ] Load tests pass with 500 orders/day
- [ ] Performance meets latency requirements (< 2s)

### Monitoring (F6.2)

- [ ] Health check endpoints working
- [ ] Shopify health indicator implemented
- [ ] SimplyPrint health indicator implemented
- [ ] Sendcloud health indicator implemented
- [ ] Alerting rules defined
- [ ] Runbook complete

### Documentation (F6.3)

- [ ] README.md complete and up to date
- [ ] Environment setup guide complete
- [ ] Troubleshooting guide complete
- [ ] Runbook complete
- [ ] Keys & certificates inventory complete (with renewal procedures)
- [ ] API documentation (Swagger) complete
- [ ] Architecture diagrams validated

### Security (F6.4)

- [ ] Rate limiting implemented
- [ ] Security headers configured
- [ ] Dependency scan passing
- [ ] Security audit checklist complete
- [ ] No critical vulnerabilities

---

## 🚫 Constraints and Rules

### MUST DO

- Achieve 80%+ test coverage for backend
- Implement load testing for 500+ orders/day
- Add health indicators for all external services
- Create complete runbook for operations
- Configure security headers and rate limiting
- Pass dependency security scan
- Update all documentation

### MUST NOT

- Skip writing tests to save time
- Deploy without load testing
- Leave security vulnerabilities unaddressed
- Commit hardcoded credentials
- Deploy without complete documentation
- Skip security audit checklist

---

## 🎬 Execution Order

### Testing (F6.1)

1. Analyze current test coverage and identify gaps
2. Write missing unit tests to reach 80%+ coverage
3. Create integration tests for order flow
4. Create integration tests for cancellation flow
5. Create integration tests for error recovery
6. Set up K6 load testing infrastructure
7. Create load test scenarios for throughput testing
8. Run load tests and document results
9. Optimize performance based on load test results

### Monitoring (F6.2)

1. Create health indicators for external services
2. Update health controller with enhanced endpoints
3. Define alerting rules for monitoring
4. Create operations runbook with procedures

### Documentation (F6.3)

1. Create environment setup guide
2. Create troubleshooting guide
3. Update README.md with complete documentation
4. Validate and update architecture diagrams

### Security (F6.4)

1. Implement rate limiting with `@nestjs/throttler`
2. Configure security headers with `helmet`
3. Create security scan workflow for CI
4. Complete security audit checklist

### Validation

1. Run all tests and verify coverage
2. Run load tests against staging
3. Verify all health endpoints work
4. Complete security audit
5. Final documentation review

---

## 📊 Expected Output

When Phase 6 is complete:

### Verification Commands

```bash
# Run all tests with coverage
pnpm test:coverage

# Expected output: > 80% coverage

# Run load tests
pnpm load-test:staging

# Expected output: All thresholds passed

# Check health endpoints
curl https://staging-connect-api.forma3d.be/health
curl https://staging-connect-api.forma3d.be/health/dependencies

# Run security scan
pnpm audit

# Expected output: 0 high/critical vulnerabilities
```

### Success Metrics

| Metric | Target | Verification |
|--------|--------|--------------|
| Unit test coverage | > 80% | `pnpm test:coverage` |
| Integration tests | All passing | `pnpm test:integration` |
| Load test (orders/day) | 500+ | K6 load test results |
| API latency (p95) | < 2 seconds | K6 metrics |
| Error rate under load | < 1% | K6 metrics |
| Security vulnerabilities | 0 critical/high | `pnpm audit` |
| Documentation | 100% complete | Manual review |

---

## 📝 Documentation Updates

**CRITICAL:** All documentation must be updated to reflect Phase 6 completion.

### docs/04-development/implementation-plan.md Updates Required

Update the implementation plan to mark Phase 6 as complete:

- Mark F6.1 (Comprehensive Testing) as ✅ Completed
- Mark F6.2 (Monitoring and Alerting) as ✅ Completed
- Mark F6.3 (Documentation) as ✅ Completed
- Mark F6.4 (Security Hardening) as ✅ Completed
- Update Phase 6 Exit Criteria with checkmarks
- Add implementation notes and component paths
- Update revision history with completion date

### Additional Documentation

- Update `README.md` with complete project documentation
- Create `docs/04-development/runbook.md`
- Create `docs/04-development/troubleshooting.md`
- Create `docs/04-development/environment-setup.md`
- Create `docs/04-development/keys-certificates-inventory.md`
- Validate all architecture diagrams

---

## 🔗 Phase 6 Exit Criteria

### From implementation-plan.md

- [ ] Test coverage > 80%
- [ ] Load testing passed
- [ ] Monitoring operational
- [ ] Documentation complete
- [ ] Security review passed

### Additional Exit Criteria

- [ ] All health indicators implemented and working
- [ ] Alerting rules defined
- [ ] Operations runbook complete
- [ ] Rate limiting configured
- [ ] Security headers configured
- [ ] Dependency scan passing
- [ ] Security audit checklist complete

---

## 🎉 Production Readiness

With Phase 6 complete, Forma3D.Connect is ready for production:

### Production Checklist

- [ ] All tests passing
- [ ] Load testing verified capacity
- [ ] Monitoring and alerting in place
- [ ] Runbook available for operations
- [ ] Security hardened
- [ ] Documentation complete
- [ ] CI/CD pipeline verified

### Go-Live Steps

1. Final security review
2. Update DNS for production domains
3. Configure production environment variables
4. Deploy to production
5. Verify health checks
6. Monitor for first 24 hours
7. Announce go-live

END OF PROMPT


This prompt concludes the Forma3D.Connect implementation phases. The AI should implement all Phase 6 hardening features to ensure the system is production-ready with comprehensive testing, monitoring, documentation, and security. After Phase 6, the system achieves full production readiness.