AI Prompt: Forma3D.Connect - Phase 6: Hardening (Production Readiness)
Purpose: This prompt instructs an AI to implement Phase 6 of Forma3D.Connect
Estimated Effort: 44 hours (~2 weeks)
Prerequisites: Phase 5k completed (All tech debt phases 5e-5k resolved)
Output: Production-ready system with comprehensive testing, monitoring, security hardening, and complete documentation
Status: ⏳ PENDING
🎯 Mission
You are continuing development of Forma3D.Connect, building on the Phase 5 foundation (including all tech debt resolution phases 5e-5k). Your task is to implement Phase 6: Hardening — completing the production readiness requirements to ensure the system is reliable, performant, secure, and fully documented.
Phase 6 delivers:
- Comprehensive test suite with 80%+ coverage
- Integration and E2E tests for critical paths
- Performance and load testing (500+ orders/day capacity)
- Production monitoring and alerting infrastructure
- Complete technical documentation and runbooks
- Security hardening and vulnerability remediation
Phase 6 ensures the system is ready for production workloads:
Testing → Monitoring → Documentation → Security → Production Ready ✅
📋 Phase 6 Context
What Was Built in Previous Phases
The complete automation system is already in place:
- Phase 0: Foundation ✅
  - Nx monorepo with `apps/api`, `apps/web`, and shared libs
  - PostgreSQL database with Prisma schema
  - NestJS backend structure with modules, services, repositories
  - Azure DevOps CI/CD pipeline
- Phase 1: Shopify Inbound ✅
  - Shopify webhooks receiver with HMAC verification
  - Order storage and status management
  - Product mapping CRUD operations
  - Event logging service
  - OpenAPI/Swagger documentation at `/api/docs`
  - Aikido Security Platform integration
- Phase 1b: Observability ✅
  - Sentry error tracking and performance monitoring
  - OpenTelemetry-first architecture
  - Structured JSON logging with Pino and correlation IDs
  - React error boundaries
  - BusinessObservabilityService for state transition and flow tracking
  - Flow milestone tracking with timing (order automation cycle)
  - State change logging with old→new state transitions
- Phase 1c: Staging Deployment ✅
  - Docker images with multi-stage builds
  - Traefik reverse proxy with Let's Encrypt TLS
  - Zero-downtime deployments via Docker Compose
  - Staging environment: https://staging-connect.forma3d.be
- Phase 1d: Acceptance Testing ✅
  - Playwright + Gherkin acceptance tests
  - Given/When/Then scenarios for deployment verification
  - Azure DevOps pipeline integration
- Phase 2: SimplyPrint Core ✅
  - SimplyPrint API client with HTTP Basic Auth
  - Automated print job creation from orders
  - Print job status monitoring (webhook + polling)
  - Order-job orchestration with `order.ready-for-fulfillment` event
- Phase 3: Fulfillment Loop ✅
  - Automated Shopify fulfillment creation
  - Order cancellation handling
  - Retry queue with exponential backoff
  - Email notifications for critical failures
  - API key authentication for admin endpoints
- Phase 4: Dashboard MVP ✅
  - React 19 dashboard with TanStack Query
  - Order management UI (list, detail, actions)
  - Product mapping configuration UI
  - Real-time updates via Socket.IO
  - Activity logs with filtering and export
- Phase 5: Shipping Integration ✅
  - Sendcloud API client for shipping labels
  - Automated label generation on order completion
  - Tracking sync to Shopify fulfillments
  - Shipping management UI in dashboard
- Phase 5b: Domain Boundaries ✅
  - Correlation ID infrastructure
  - Domain contracts library (`libs/domain-contracts`)
  - Repository encapsulation
  - Interface-based service dependencies
- Phase 5c: Webhook Idempotency ✅
  - Database-backed webhook idempotency (TD-001 resolved)
  - Automated cleanup of expired records
- Phase 5d: Frontend Tests ✅
  - Vitest configuration with React Testing Library
  - MSW API mocking layer
  - 200 frontend tests (TD-002 resolved)
- Phase 5e-5k: Tech Debt Resolution ✅ (assumed complete before Phase 6)
  - F5e: Typed JSON Schemas (TD-003)
  - F5f: Shared API Types (TD-004)
  - F5g: Structured Logging (TD-005)
  - F5h: Controller Tests (TD-006)
  - F5i: Domain Contract Cleanup (TD-007)
  - F5j: Typed Error Hierarchy (TD-008)
  - F5k: Configuration Externalization (TD-009)
What Phase 6 Builds
| Feature | Description | Effort |
|---|---|---|
| F6.1: Comprehensive Testing | Achieve 80%+ coverage, E2E & load tests | 16 hours |
| F6.2: Monitoring and Alerting | Health checks, alerting, metrics dashboard | 8 hours |
| F6.3: Documentation | Complete technical docs, runbooks, guides | 12 hours |
| F6.4: Security Hardening | Dependency scan, rate limiting, security audit | 8 hours |
🛠️ Tech Stack Reference
All technologies from previous phases remain. Additional packages for Phase 6:
| Package | Purpose |
|---|---|
| `k6` | Load testing tool |
| `@nestjs/throttler` | Rate limiting for NestJS |
| `helmet` | Security headers middleware |
| `express-rate-limit` | Backup rate limiting (if needed) |
| `autocannon` | Alternative load testing |
| `clinic` | Node.js performance profiling |
🏗️ Architecture Reference
Detailed Architecture Diagrams
📐 For detailed architecture, refer to the existing PlantUML diagrams:
| Diagram | Path | Description |
|---|---|---|
| Context View | `docs/03-architecture/c4-model/1-context/C4_Context.puml` | System context diagram |
| Container View | `docs/03-architecture/c4-model/2-container/C4_Container.puml` | System containers and interactions |
| Component View | `docs/03-architecture/c4-model/3-component/C4_Component.puml` | Backend component architecture |
| Order State | `docs/03-architecture/state-machines/C4_Code_State_Order.puml` | Order status state machine |
| Domain Model | `docs/03-architecture/c4-model/4-code/C4_Code_DomainModel.puml` | Entity relationships |

These PlantUML diagrams should be validated and updated as part of Phase 6.
Current System Health Endpoints
| Endpoint | API | Web | Description |
|---|---|---|---|
| `/health` | ✅ | ✅ | Full health status with build info |
| `/health/live` | ✅ | ✅ | Simple liveness probe |
| `/health/ready` | ✅ | - | Readiness probe (checks database) |
Phase 6 Focus Areas
┌──────────────────────────────────────────────────────────────────┐
│                        PHASE 6: HARDENING                        │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│   ┌────────────────┐   ┌────────────────┐   ┌────────────────┐   │
│   │    TESTING     │   │   MONITORING   │   │    SECURITY    │   │
│   │                │   │                │   │                │   │
│   │ • Unit 80%+    │   │ • Health checks│   │ • Dep scan     │   │
│   │ • Integration  │   │ • Alerting     │   │ • Rate limits  │   │
│   │ • E2E paths    │   │ • Metrics      │   │ • Headers      │   │
│   │ • Load 500/day │   │ • Runbooks     │   │ • Auth audit   │   │
│   └────────────────┘   └────────────────┘   └────────────────┘   │
│                                                                  │
│   ┌──────────────────────────────────────────────────────────┐   │
│   │                      DOCUMENTATION                       │   │
│   │                                                          │   │
│   │  • README complete          • API docs (Swagger)         │   │
│   │  • Architecture docs        • Troubleshooting guide      │   │
│   │  • Runbook operations       • Environment setup          │   │
│   └──────────────────────────────────────────────────────────┘   │
│                                                                  │
│                                ↓                                 │
│                     ┌──────────────────────┐                     │
│                     │  PRODUCTION READY ✅ │                     │
│                     └──────────────────────┘                     │
└──────────────────────────────────────────────────────────────────┘
📁 Files to Create/Modify
Testing Infrastructure
apps/api/
├── src/
│   └── **/__tests__/                      # Unit tests for all modules
│       ├── orders.controller.spec.ts      # Controller tests (if not done in 5h)
│       ├── print-jobs.controller.spec.ts
│       └── ...
│
└── test/                                  # Sibling of src/ (specs import ../../src/app.module)
    ├── integration/
    │   ├── order-flow.spec.ts             # Order → Print → Fulfill flow
    │   ├── cancellation-flow.spec.ts      # Cancellation handling
    │   └── error-recovery.spec.ts         # Error recovery scenarios
    └── e2e/
        └── critical-paths.spec.ts         # E2E critical path tests
apps/web/src/
├── **/__tests__/ # Additional component tests
load-tests/
├── k6/
│ ├── config.js # K6 configuration
│ ├── scenarios/
│ │ ├── order-throughput.js # 500 orders/day simulation
│ │ ├── dashboard-load.js # Dashboard concurrent users
│ │ └── webhook-burst.js # Webhook burst handling
│ └── reports/ # Generated reports
Monitoring Infrastructure
apps/api/src/
├── health/
│ ├── health.module.ts # UPDATE: Add external service checks
│ ├── health.controller.ts # UPDATE: Enhanced health endpoints
│ └── indicators/
│ ├── shopify.indicator.ts # Shopify API health check
│ ├── simplyprint.indicator.ts # SimplyPrint API health check
│ ├── sendcloud.indicator.ts # Sendcloud API health check
│ └── database.indicator.ts # Database health check
deployment/
├── monitoring/
│ ├── alerting-rules.yml # Alert definitions
│ ├── runbook.md # Operations runbook
│ └── dashboard.json # Metrics dashboard config
Security Hardening
apps/api/src/
├── common/
│ ├── guards/
│ │ └── throttler.guard.ts # Rate limiting guard
│ ├── middleware/
│ │ └── security-headers.middleware.ts # Security headers
│ └── filters/
│ └── global-exception.filter.ts # UPDATE: Enhanced error handling
.github/workflows/
└── security-scan.yml # Dependency security scan workflow
Documentation
docs/
├── 04-development/
│ ├── runbook.md # NEW: Operations runbook
│ ├── troubleshooting.md # NEW: Troubleshooting guide
│ └── environment-setup.md # NEW: Environment setup guide
├── 03-architecture/
│ └── (validate and update all diagrams)
└── README.md # UPDATE: Complete project documentation
🔧 Feature F6.1: Comprehensive Testing
Requirements Reference
- NFR-MA-002: Test Coverage (> 80%)
- NFR-PE-001: Performance Requirements
- Success Metric: 99% automation success rate
Implementation
1. Test Coverage Analysis
First, analyze current coverage and identify gaps:
# Generate coverage report for backend
pnpm nx test api --coverage
# Generate coverage report for frontend
pnpm nx test web --coverage
# Identify files with low coverage
# Target: 80%+ statements, functions, branches, lines
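To turn a coverage report into a concrete list of gaps, each file can be compared against the 80% target. The sketch below assumes the report is available in the Istanbul `json-summary` shape (one entry per file plus a `total` entry, each metric carrying a `pct` field); the `findCoverageGaps` function and the sample file paths are hypothetical and only illustrate the check.

```typescript
type Metric = "statements" | "branches" | "functions" | "lines";

interface MetricSummary {
  pct: number;
}

type FileSummary = Record<Metric, MetricSummary>;
type CoverageSummary = Record<string, FileSummary>;

interface CoverageGap {
  file: string;
  metric: Metric;
  pct: number;
}

const METRICS: Metric[] = ["statements", "branches", "functions", "lines"];

// Lists every (file, metric) pair below the threshold, worst first.
function findCoverageGaps(summary: CoverageSummary, threshold = 80): CoverageGap[] {
  const gaps: CoverageGap[] = [];
  for (const [file, metrics] of Object.entries(summary)) {
    if (file === "total") continue; // aggregate row, not a file
    for (const metric of METRICS) {
      const pct = metrics[metric].pct;
      if (pct < threshold) gaps.push({ file, metric, pct });
    }
  }
  return gaps.sort((a, b) => a.pct - b.pct);
}

// Hypothetical sample data (not taken from the real repository).
const sampleSummary: CoverageSummary = {
  total: {
    statements: { pct: 85 }, branches: { pct: 85 }, functions: { pct: 85 }, lines: { pct: 85 },
  },
  "apps/api/src/orders/orders.service.ts": {
    statements: { pct: 92 }, branches: { pct: 71 }, functions: { pct: 100 }, lines: { pct: 93 },
  },
  "apps/api/src/retry/retry.service.ts": {
    statements: { pct: 64 }, branches: { pct: 50 }, functions: { pct: 80 }, lines: { pct: 66 },
  },
};

const coverageGaps = findCoverageGaps(sampleSummary);
console.log(coverageGaps.map((g) => `${g.file} ${g.metric} ${g.pct}%`).join("\n"));
```

Sorting by percentage puts the weakest files first, which is usually where the first new tests pay off most.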
2. Integration Tests
Create apps/api/test/integration/order-flow.spec.ts:
import { Test, TestingModule } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import * as request from 'supertest';
import { AppModule } from '../../src/app.module';
import { PrismaService } from '../../src/database/prisma.service';
import { OrderStatus, PrintJobStatus } from '@prisma/client';
describe('Order Flow (Integration)', () => {
let app: INestApplication;
let prisma: PrismaService;
beforeAll(async () => {
const moduleRef: TestingModule = await Test.createTestingModule({
imports: [AppModule],
}).compile();
app = moduleRef.createNestApplication();
prisma = moduleRef.get<PrismaService>(PrismaService);
await app.init();
});
afterAll(async () => {
await app.close();
});
beforeEach(async () => {
// Clean up test data
await prisma.eventLog.deleteMany();
await prisma.printJob.deleteMany();
await prisma.lineItem.deleteMany();
await prisma.shipment.deleteMany();
await prisma.order.deleteMany();
});
describe('Complete Order Flow', () => {
it('should process order from creation to fulfillment', async () => {
// 1. Create order via webhook simulation
const webhookPayload = createMockShopifyOrderWebhook();
const orderResponse = await request(app.getHttpServer())
.post('/api/v1/webhooks/shopify')
.set('X-Shopify-Topic', 'orders/create')
.set('X-Shopify-Hmac-SHA256', calculateHmac(webhookPayload))
.send(webhookPayload)
.expect(200);
expect(orderResponse.body.success).toBe(true);
// 2. Verify order was created
const order = await prisma.order.findFirst({
where: { shopifyOrderId: webhookPayload.id.toString() },
include: { lineItems: true, printJobs: true },
});
// findFirst returns null (not undefined) when nothing matches, so assert on null
expect(order).not.toBeNull();
expect(order!.status).toBe(OrderStatus.PENDING);
expect(order!.lineItems).toHaveLength(webhookPayload.line_items.length);
// 3. Simulate print job completion
for (const printJob of order!.printJobs) {
await prisma.printJob.update({
where: { id: printJob.id },
data: { status: PrintJobStatus.COMPLETED },
});
}
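// 4. Trigger orchestration check
// (In the real flow this happens via events. Updating rows directly through Prisma
// does not emit them, so invoke the orchestration service or emit the event here.)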
// 5. Verify order is ready for fulfillment
const updatedOrder = await prisma.order.findUnique({
where: { id: order!.id },
});
// Order should be completed when all print jobs are done
expect(updatedOrder!.status).toBe(OrderStatus.COMPLETED);
});
it('should handle order cancellation during printing', async () => {
// Create order and start printing
const order = await createTestOrder(prisma);
// Simulate cancellation webhook
const cancelWebhook = createMockCancellationWebhook(order.shopifyOrderId);
await request(app.getHttpServer())
.post('/api/v1/webhooks/shopify')
.set('X-Shopify-Topic', 'orders/cancelled')
.set('X-Shopify-Hmac-SHA256', calculateHmac(cancelWebhook))
.send(cancelWebhook)
.expect(200);
// Verify order and print jobs are cancelled
const cancelledOrder = await prisma.order.findUnique({
where: { id: order.id },
include: { printJobs: true },
});
expect(cancelledOrder!.status).toBe(OrderStatus.CANCELLED);
cancelledOrder!.printJobs.forEach((job) => {
expect([PrintJobStatus.CANCELLED, PrintJobStatus.COMPLETED]).toContain(job.status);
});
});
});
describe('Error Recovery', () => {
it('should retry failed print jobs', async () => {
// Create order with a print job that will fail
const order = await createTestOrder(prisma);
const printJob = order.printJobs[0];
// Mark as failed
await prisma.printJob.update({
where: { id: printJob.id },
data: {
status: PrintJobStatus.FAILED,
errorMessage: 'Simulated failure',
},
});
// Trigger retry via API
await request(app.getHttpServer())
.post(`/api/v1/print-jobs/${printJob.id}/retry`)
.set('X-API-Key', process.env.API_KEY || 'test-key')
.expect(200);
// Verify job is queued again
const retriedJob = await prisma.printJob.findUnique({
where: { id: printJob.id },
});
expect(retriedJob!.status).toBe(PrintJobStatus.QUEUED);
});
});
});
// Helper functions
function createMockShopifyOrderWebhook() {
return {
id: Date.now(),
order_number: 1001,
email: 'test@example.com',
total_price: '49.99',
currency: 'EUR',
shipping_address: {
first_name: 'Test',
last_name: 'Customer',
address1: '123 Test St',
city: 'Brussels',
zip: '1000',
country_code: 'BE',
},
line_items: [
{
id: Date.now(),
variant_id: 12345,
title: 'Test Product',
quantity: 1,
price: '49.99',
sku: 'TEST-SKU-001',
},
],
};
}
function createMockCancellationWebhook(shopifyOrderId: string) {
return {
id: shopifyOrderId,
cancelled_at: new Date().toISOString(),
};
}
function calculateHmac(payload: object): string {
const crypto = require('crypto');
const secret = process.env.SHOPIFY_WEBHOOK_SECRET || 'test-secret';
return crypto
.createHmac('sha256', secret)
.update(JSON.stringify(payload))
.digest('base64');
}
async function createTestOrder(prisma: PrismaService) {
// Create a test order with line items and print jobs
return prisma.order.create({
data: {
shopifyOrderId: `test-${Date.now()}`,
shopifyOrderNumber: 'TEST-1001',
customerName: 'Test Customer',
customerEmail: 'test@example.com',
shippingAddress: {
first_name: 'Test',
last_name: 'Customer',
address1: '123 Test St',
city: 'Brussels',
zip: '1000',
country_code: 'BE',
},
totalPrice: 49.99,
currency: 'EUR',
status: OrderStatus.PROCESSING,
lineItems: {
create: [
{
shopifyLineItemId: `line-${Date.now()}`,
shopifyVariantId: '12345',
title: 'Test Product',
quantity: 1,
price: 49.99,
sku: 'TEST-SKU-001',
},
],
},
printJobs: {
create: [
{
simplyPrintJobId: `sp-${Date.now()}`,
status: PrintJobStatus.PRINTING,
},
],
},
},
include: {
lineItems: true,
printJobs: true,
},
});
}
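The `calculateHmac` helper signs `JSON.stringify(payload)`, so the receiving side has to hash exactly the same bytes for the signature to match. The sketch below shows the verification half with a constant-time comparison. It is an illustration built on Node's `crypto` module, not the project's actual webhook guard; the `signBody` and `verifyShopifyHmac` names and the `test-secret` value are assumptions.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Base64 HMAC-SHA256 over the raw request body, as sent in X-Shopify-Hmac-SHA256.
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("base64");
}

// Returns true only when the header matches the HMAC of the raw body.
function verifyShopifyHmac(rawBody: string, headerHmac: string, secret: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "base64");
  const received = Buffer.from(headerHmac, "base64");
  // timingSafeEqual throws when lengths differ, so reject those up front.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}

const body = JSON.stringify({ id: 1001, email: "test@example.com" });
const signature = signBody(body, "test-secret");

console.log(verifyShopifyHmac(body, signature, "test-secret")); // true
console.log(verifyShopifyHmac(body + " ", signature, "test-secret")); // false
```

The second call fails because a single extra byte changes the digest, which is why a test that re-serializes the payload differently from what it signed will see signature rejections.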
3. Load Testing with k6
Create load-tests/k6/scenarios/order-throughput.js:
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';
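// Custom metrics (reserved for a write scenario: the read-only iteration below never
// calls .add() on them, so the order_creation_success threshold receives no samples)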
const orderCreationRate = new Rate('order_creation_success');
const orderCreationDuration = new Trend('order_creation_duration');
// Test configuration
export const options = {
scenarios: {
// Simulate 500 orders/day = ~21 orders/hour = ~0.35 orders/minute
// But we want to test burst capacity too
sustained_load: {
executor: 'constant-arrival-rate',
rate: 1, // 1 order per second (3600/hour for stress test)
timeUnit: '1s',
duration: '5m',
preAllocatedVUs: 10,
maxVUs: 50,
},
spike_test: {
executor: 'ramping-arrival-rate',
startRate: 0,
timeUnit: '1s',
preAllocatedVUs: 50,
maxVUs: 100,
stages: [
{ target: 10, duration: '30s' }, // Ramp up to 10/s
{ target: 10, duration: '1m' }, // Hold at 10/s
{ target: 50, duration: '30s' }, // Spike to 50/s
{ target: 50, duration: '30s' }, // Hold spike
{ target: 0, duration: '30s' }, // Ramp down
],
startTime: '6m', // Start after sustained load
},
},
thresholds: {
'http_req_duration': ['p(95)<2000'], // 95th percentile < 2s
'http_req_failed': ['rate<0.01'], // Error rate < 1%
'order_creation_success': ['rate>0.99'], // 99% success rate
},
};
const BASE_URL = __ENV.API_URL || 'http://localhost:3000';
const API_KEY = __ENV.API_KEY || 'test-api-key';
// Read-only API traffic (webhook writes are not simulated in this scenario)
export default function () {
// Test 1: Health check
const healthRes = http.get(`${BASE_URL}/health`);
check(healthRes, {
'health check status is 200': (r) => r.status === 200,
});
// Test 2: Orders list (dashboard simulation)
const ordersRes = http.get(`${BASE_URL}/api/v1/orders?page=1&pageSize=20`, {
headers: { 'X-API-Key': API_KEY },
});
check(ordersRes, {
'orders list status is 200': (r) => r.status === 200,
'orders list has data': (r) => {
const body = JSON.parse(r.body);
return Array.isArray(body.orders);
},
});
// Test 3: Single order detail
const orderId = getRandomOrderId();
if (orderId) {
const orderDetailRes = http.get(`${BASE_URL}/api/v1/orders/${orderId}`, {
headers: { 'X-API-Key': API_KEY },
});
check(orderDetailRes, {
'order detail status is 200 or 404': (r) => [200, 404].includes(r.status),
});
}
// Test 4: Shipping methods (unauthenticated)
const shippingRes = http.get(`${BASE_URL}/api/v1/shipping/methods?country=BE`);
check(shippingRes, {
'shipping methods status is 200': (r) => r.status === 200,
});
sleep(0.1); // 100ms between iterations
}
function getRandomOrderId() {
// In real test, fetch from a pool of known order IDs
// For now, return null to skip order detail test
return null;
}
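The comment in `options` converts 500 orders/day into hourly and per-minute averages, and the scenarios then run far above that average. The arithmetic can be checked with a small helper; the function names below are hypothetical and exist only for this illustration.

```typescript
interface ArrivalRates {
  perHour: number;
  perMinute: number;
  perSecond: number;
}

// Converts a daily volume into average arrival rates, assuming an even spread over 24h.
function dailyToRates(ordersPerDay: number): ArrivalRates {
  const perHour = ordersPerDay / 24;
  const perMinute = perHour / 60;
  const perSecond = perMinute / 60;
  return { perHour, perMinute, perSecond };
}

// How many times the daily average a given test rate (requests per second) represents.
function loadMultiplier(testRatePerSecond: number, ordersPerDay: number): number {
  return testRatePerSecond / dailyToRates(ordersPerDay).perSecond;
}

const target = dailyToRates(500);
console.log(target.perHour.toFixed(1)); // 20.8
console.log(target.perMinute.toFixed(2)); // 0.35
console.log(loadMultiplier(1, 500).toFixed(1)); // 172.8
```

Under the even-spread assumption, the sustained scenario at 1 request/s is roughly 173 times the daily average, so it acts as a stress test rather than a simulation of normal traffic. Real order traffic is bursty, which is the reason the spike scenario exists.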
Create load-tests/k6/config.js:
export const environments = {
local: {
baseUrl: 'http://localhost:3000',
apiKey: 'dev-api-key',
},
staging: {
baseUrl: 'https://staging-connect-api.forma3d.be',
apiKey: '__STAGING_API_KEY__',
},
};
Add load test scripts to package.json:
{
"scripts": {
"load-test:local": "k6 run --env API_URL=http://localhost:3000 load-tests/k6/scenarios/order-throughput.js",
"load-test:staging": "k6 run --env API_URL=https://staging-connect-api.forma3d.be load-tests/k6/scenarios/order-throughput.js"
}
}
4. E2E Critical Path Tests
Create apps/api/test/e2e/critical-paths.spec.ts:
/**
* E2E Critical Path Tests
*
* These tests verify the complete automation flow works end-to-end
* against a real database (test environment).
*/
import { Test, TestingModule } from '@nestjs/testing';
import { INestApplication } from '@nestjs/common';
import { AppModule } from '../../src/app.module';
describe('Critical Paths E2E', () => {
let app: INestApplication;
beforeAll(async () => {
const moduleRef: TestingModule = await Test.createTestingModule({
imports: [AppModule],
}).compile();
app = moduleRef.createNestApplication();
await app.init();
});
afterAll(async () => {
await app.close();
});
describe('Order → Print Job → Fulfillment Path', () => {
it('should complete full automation cycle', async () => {
// This is a placeholder for a full E2E test
// In a real scenario, this would:
// 1. Create a Shopify order via webhook
// 2. Verify print jobs are created
// 3. Simulate print completion
// 4. Verify shipping label is generated
// 5. Verify Shopify fulfillment is created
expect(true).toBe(true);
});
});
describe('Cancellation Path', () => {
it('should handle cancellation at any stage', async () => {
expect(true).toBe(true);
});
});
describe('Error Recovery Path', () => {
it('should recover from transient failures', async () => {
expect(true).toBe(true);
});
});
});
🔧 Feature F6.1b: Enhanced Business Observability
Overview
The BusinessObservabilityService provides comprehensive logging for business events, state transitions, and automation flow tracking. This service integrates with Sentry for structured business metrics and the EventLogService for persistent audit trails.
Key Features
- State Transition Logging
  - Tracks old→new state with timing for orders, print jobs, shipments, and fulfillments
  - Includes correlation IDs for distributed tracing
  - Persists to EventLog for audit trail
- Flow Milestone Tracking
  - Tracks order automation cycle from receipt to fulfillment
  - Measures elapsed time between milestones
  - Records flow completion/failure with total duration
- Sentry Business Integration
  - Sets order/print job context for better error correlation
  - Adds breadcrumbs for state transitions
  - Captures flow completion/failure as Sentry events
Available Milestones
| Milestone | Trigger Point |
|---|---|
| `order_received` | Flow starts (`startFlow` called) |
| `order_validated` | Order found and validated |
| `print_jobs_created` | Print jobs created for all line items |
| `all_jobs_printing` | All jobs transitioned to PRINTING |
| `all_jobs_completed` | All print jobs completed |
| `shipping_label_created` | Sendcloud label generated |
| `fulfillment_created` | Shopify fulfillment created |
| `flow_completed` | Full automation cycle completed |
| `flow_failed` | Automation failed at any point |
Usage Example
// Start tracking a new order flow
this.businessObservability.startFlow(orderId);
// Set Sentry context
this.businessObservability.setOrderContext({
id: order.id,
shopifyOrderId: order.shopifyOrderId,
shopifyOrderNumber: order.shopifyOrderNumber,
status: order.status,
});
// Log state transition
await this.businessObservability.logStateTransition({
entityType: 'order',
entityId: orderId,
orderId,
previousState: OrderStatus.PENDING,
newState: OrderStatus.PROCESSING,
trigger: 'webhook_received',
});
// Record milestone
await this.businessObservability.recordMilestone({
orderId,
milestone: 'print_jobs_created',
metadata: { jobCount: 3 },
});
// Complete flow (called automatically when flow_completed milestone is recorded)
await this.businessObservability.recordMilestone({
orderId,
milestone: 'flow_completed',
});
Log Output Examples
State Transition:
{
"message": "[STATE CHANGE] order:order-123 PENDING → PROCESSING",
"correlationId": "abc-123",
"entityType": "order",
"entityId": "order-123",
"previousState": "PENDING",
"newState": "PROCESSING",
"trigger": "webhook_received"
}
Flow Completion:
{
"message": "[FLOW COMPLETED] Order order-123 automation completed successfully",
"correlationId": "abc-123",
"orderId": "order-123",
"success": true,
"totalDurationMs": 125000,
"totalDurationMinutes": 2.1,
"milestones": {
"order_received": 0,
"order_validated": 50,
"print_jobs_created": 200,
"all_jobs_completed": 120000,
"shipping_label_created": 122000,
"fulfillment_created": 125000
}
}
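In the flow-completion log, each entry in `milestones` is the elapsed time in milliseconds since `order_received`. A minimal in-memory sketch of that bookkeeping follows. It is not the actual `BusinessObservabilityService`; the `FlowTimer` class and its injectable clock are assumptions made so the timing logic can be shown and tested in isolation.

```typescript
type Clock = () => number;

interface FlowSummary {
  totalDurationMs: number;
  totalDurationMinutes: number;
  milestones: Record<string, number>;
}

class FlowTimer {
  private readonly now: Clock;
  private readonly startedAt: number;
  private readonly elapsed = new Map<string, number>();

  constructor(now: Clock = () => Date.now()) {
    this.now = now;
    this.startedAt = now();
    this.elapsed.set("order_received", 0); // the flow starts at offset 0
  }

  // Stores and returns the milliseconds elapsed since the flow started.
  record(milestone: string): number {
    const ms = this.now() - this.startedAt;
    this.elapsed.set(milestone, ms);
    return ms;
  }

  // Total duration is the offset of the latest milestone, rounded to 0.1 min for display.
  summary(): FlowSummary {
    const milestones: Record<string, number> = {};
    let totalDurationMs = 0;
    this.elapsed.forEach((ms, name) => {
      milestones[name] = ms;
      totalDurationMs = Math.max(totalDurationMs, ms);
    });
    const totalDurationMinutes = Math.round((totalDurationMs / 60000) * 10) / 10;
    return { totalDurationMs, totalDurationMinutes, milestones };
  }
}

// Fake clock so the example is deterministic.
let fakeTime = 1000;
const timer = new FlowTimer(() => fakeTime);
fakeTime += 50;
timer.record("order_validated");
fakeTime += 150;
timer.record("print_jobs_created");
fakeTime = 1000 + 125000;
timer.record("fulfillment_created");
const flow = timer.summary();
console.log(JSON.stringify(flow));
```

Injecting the clock keeps the timer testable without real delays; a production version would also need to persist the start time so that a restart mid-flow does not reset the measurements.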
🔧 Feature F6.2: Monitoring and Alerting
Requirements Reference
- NFR-AV-001: System Uptime (99%)
- NFR-PE-003: Processing Latency (< 2 minutes)
- Health check endpoints operational
Implementation
1. Enhanced Health Indicators
Create apps/api/src/health/indicators/shopify.indicator.ts:
import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { ShopifyApiClient } from '../../shopify/shopify-api.client';
@Injectable()
export class ShopifyHealthIndicator extends HealthIndicator {
constructor(private readonly shopifyClient: ShopifyApiClient) {
super();
}
async isHealthy(key: string): Promise<HealthIndicatorResult> {
try {
// Make a lightweight API call to verify connectivity
const isConnected = await this.shopifyClient.ping();
if (isConnected) {
return this.getStatus(key, true);
}
throw new HealthCheckError(
'Shopify API check failed',
this.getStatus(key, false, { message: 'Unable to connect to Shopify API' })
);
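} catch (error) {
// Keep the specific failure raised above instead of re-wrapping it
if (error instanceof HealthCheckError) {
throw error;
}
// `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
const message = error instanceof Error ? error.message : String(error);
throw new HealthCheckError(
'Shopify API check failed',
this.getStatus(key, false, { error: message })
);
}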
}
}
Create apps/api/src/health/indicators/simplyprint.indicator.ts:
import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { SimplyPrintApiClient } from '../../simplyprint/simplyprint-api.client';
@Injectable()
export class SimplyPrintHealthIndicator extends HealthIndicator {
constructor(private readonly simplyPrintClient: SimplyPrintApiClient) {
super();
}
async isHealthy(key: string): Promise<HealthIndicatorResult> {
try {
const isConnected = await this.simplyPrintClient.ping();
if (isConnected) {
return this.getStatus(key, true);
}
throw new HealthCheckError(
'SimplyPrint API check failed',
this.getStatus(key, false)
);
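} catch (error) {
// Keep the specific failure raised above instead of re-wrapping it
if (error instanceof HealthCheckError) {
throw error;
}
// `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
const message = error instanceof Error ? error.message : String(error);
throw new HealthCheckError(
'SimplyPrint API check failed',
this.getStatus(key, false, { error: message })
);
}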
}
}
Create apps/api/src/health/indicators/sendcloud.indicator.ts:
import { Injectable } from '@nestjs/common';
import { HealthIndicator, HealthIndicatorResult, HealthCheckError } from '@nestjs/terminus';
import { SendcloudApiClient } from '../../sendcloud/sendcloud-api.client';
@Injectable()
export class SendcloudHealthIndicator extends HealthIndicator {
constructor(private readonly sendcloudClient: SendcloudApiClient) {
super();
}
async isHealthy(key: string): Promise<HealthIndicatorResult> {
if (!this.sendcloudClient.isShippingEnabled()) {
return this.getStatus(key, true, { status: 'disabled' });
}
try {
// Get shipping methods as a health check
await this.sendcloudClient.getShippingMethods();
return this.getStatus(key, true);
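} catch (error) {
// `error` is `unknown` under strict TypeScript, so narrow before reading `.message`
const message = error instanceof Error ? error.message : String(error);
throw new HealthCheckError(
'Sendcloud API check failed',
this.getStatus(key, false, { error: message })
);
}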
}
}
2. Update Health Controller
Update apps/api/src/health/health.controller.ts:
import { Controller, Get } from '@nestjs/common';
import { ApiTags, ApiOperation, ApiResponse } from '@nestjs/swagger';
import {
HealthCheckService,
HealthCheck,
PrismaHealthIndicator,
HealthCheckResult,
} from '@nestjs/terminus';
import { PrismaService } from '../database/prisma.service';
import { ShopifyHealthIndicator } from './indicators/shopify.indicator';
import { SimplyPrintHealthIndicator } from './indicators/simplyprint.indicator';
import { SendcloudHealthIndicator } from './indicators/sendcloud.indicator';
@ApiTags('Health')
@Controller('health')
export class HealthController {
constructor(
private readonly health: HealthCheckService,
private readonly prismaHealth: PrismaHealthIndicator,
private readonly prisma: PrismaService,
private readonly shopifyHealth: ShopifyHealthIndicator,
private readonly simplyPrintHealth: SimplyPrintHealthIndicator,
private readonly sendcloudHealth: SendcloudHealthIndicator,
) {}
@Get()
@HealthCheck()
@ApiOperation({ summary: 'Full health check with all dependencies' })
@ApiResponse({ status: 200, description: 'System is healthy' })
@ApiResponse({ status: 503, description: 'System is unhealthy' })
async check(): Promise<HealthCheckResult> {
return this.health.check([
// Database
() => this.prismaHealth.pingCheck('database', this.prisma),
// External services
() => this.shopifyHealth.isHealthy('shopify'),
() => this.simplyPrintHealth.isHealthy('simplyprint'),
() => this.sendcloudHealth.isHealthy('sendcloud'),
]);
}
@Get('live')
@ApiOperation({ summary: 'Liveness probe - is the process running?' })
@ApiResponse({ status: 200, description: 'Process is alive' })
async liveness(): Promise<{ status: string; timestamp: string }> {
return {
status: 'ok',
timestamp: new Date().toISOString(),
};
}
@Get('ready')
@HealthCheck()
@ApiOperation({ summary: 'Readiness probe - is the service ready to accept traffic?' })
@ApiResponse({ status: 200, description: 'Service is ready' })
@ApiResponse({ status: 503, description: 'Service is not ready' })
async readiness(): Promise<HealthCheckResult> {
return this.health.check([
() => this.prismaHealth.pingCheck('database', this.prisma),
]);
}
@Get('dependencies')
@HealthCheck()
@ApiOperation({ summary: 'Check all external service dependencies' })
@ApiResponse({ status: 200, description: 'All dependencies healthy' })
@ApiResponse({ status: 503, description: 'One or more dependencies unhealthy' })
async dependencies(): Promise<HealthCheckResult> {
return this.health.check([
() => this.shopifyHealth.isHealthy('shopify'),
() => this.simplyPrintHealth.isHealthy('simplyprint'),
() => this.sendcloudHealth.isHealthy('sendcloud'),
]);
}
}
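The controller above splits checks across endpoints: `/health/ready` only looks at the database, while `/health` and `/health/dependencies` report an error as soon as any external service is down. The aggregation rule behind that behaviour can be sketched in plain TypeScript. This is not Terminus's implementation; the `summarizeHealth` function and its types are hypothetical and only mirror the `status` / `info` / `error` response shape used in this document.

```typescript
type IndicatorStatus = "up" | "down";

interface IndicatorResult {
  name: string;
  status: IndicatorStatus;
  detail?: string;
}

interface HealthSummary {
  status: "ok" | "error";
  info: Record<string, { status: IndicatorStatus }>;
  error: Record<string, { status: IndicatorStatus; error?: string }>;
}

// One failing indicator is enough to flip the overall status to "error".
function summarizeHealth(results: IndicatorResult[]): HealthSummary {
  const summary: HealthSummary = { status: "ok", info: {}, error: {} };
  for (const result of results) {
    if (result.status === "up") {
      summary.info[result.name] = { status: "up" };
    } else {
      summary.error[result.name] = { status: "down", error: result.detail };
      summary.status = "error";
    }
  }
  return summary;
}

const readiness = summarizeHealth([{ name: "database", status: "up" }]);
const full = summarizeHealth([
  { name: "database", status: "up" },
  { name: "shopify", status: "down", detail: "Connection timeout" },
  { name: "simplyprint", status: "up" },
]);
console.log(readiness.status, full.status);
```

With the same inputs, readiness stays `ok` while the full check reports `error`. Keeping external services out of the readiness probe means a Shopify outage does not take the API out of rotation, which matters because the retry queue is meant to absorb exactly that kind of failure.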
3. Alerting Rules
Create deployment/monitoring/alerting-rules.yml:
# Alerting Rules for Forma3D.Connect
# Configure in your monitoring system (e.g., Prometheus AlertManager, Datadog, etc.)
groups:
  - name: forma3d-connect
    rules:
      # High error rate (both sides are summed so the ratio spans all series)
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is above 5% for the last 5 minutes"
          runbook_url: "docs/04-development/runbook.md#high-error-rate"
      # API latency (buckets are aggregated by `le` before taking the quantile)
      - alert: HighAPILatency
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "95th percentile latency is above 2 seconds"
          runbook_url: "docs/04-development/runbook.md#high-latency"
      # Database connection issues
      - alert: DatabaseConnectionFailed
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database connection failed"
          description: "Cannot connect to PostgreSQL database"
          runbook_url: "docs/04-development/runbook.md#database-connection"
      # External service failures
      - alert: ShopifyAPIDown
        expr: shopify_api_up == 0
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "Shopify API unreachable"
          description: "Cannot connect to Shopify API for 5 minutes"
          runbook_url: "docs/04-development/runbook.md#shopify-down"
      - alert: SimplyPrintAPIDown
        expr: simplyprint_api_up == 0
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "SimplyPrint API unreachable"
          description: "Cannot connect to SimplyPrint API for 5 minutes"
          runbook_url: "docs/04-development/runbook.md#simplyprint-down"
      # Order processing stuck
      # PromQL label matchers cannot compare numerically, so the age filter is applied to
      # the sample value. Assumes a per-order gauge named orders_processing_age_minutes.
      - alert: OrdersStuckInProcessing
        expr: count(orders_processing_age_minutes > 60) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Orders stuck in processing"
          description: "Multiple orders have been in processing state for over 60 minutes"
          runbook_url: "docs/04-development/runbook.md#stuck-orders"
      # Retry queue growing
      - alert: RetryQueueBacklog
        expr: retry_queue_size > 50
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Retry queue backlog growing"
          description: "More than 50 items in retry queue for over 15 minutes"
          runbook_url: "docs/04-development/runbook.md#retry-queue-backlog"
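The `HighErrorRate` rule compares the rate of 5xx responses with the rate of all responses over a window. The same ratio can be computed from two snapshots of per-status request counters, which is a convenient way to sanity-check the 5% threshold against real traffic numbers. The `errorRatio` function and the counter values below are hypothetical.

```typescript
// Request counters keyed by HTTP status code, e.g. { "200": 1180, "500": 25 }.
type CounterSnapshot = Record<string, number>;

// Share of 5xx responses among all responses observed between two snapshots.
function errorRatio(previous: CounterSnapshot, current: CounterSnapshot): number {
  let total = 0;
  let errors = 0;
  for (const status of Object.keys(current)) {
    const delta = current[status] - (previous[status] ?? 0);
    total += delta;
    if (/^5\d\d$/.test(status)) errors += delta;
  }
  return total === 0 ? 0 : errors / total;
}

const before: CounterSnapshot = { "200": 1000, "500": 10, "503": 0 };
const after: CounterSnapshot = { "200": 1180, "500": 25, "503": 5 };

const ratio = errorRatio(before, after);
console.log(ratio, ratio > 0.05 ? "would alert" : "ok");
```

Here 20 of 200 new requests failed, a 10% error ratio, so the rule would fire once the condition has held for the configured `for: 5m`.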
4. Operations Runbook
Create docs/04-development/runbook.md:
# Forma3D.Connect Operations Runbook
## Overview
This runbook provides procedures for operating and troubleshooting Forma3D.Connect in production.
## Table of Contents
1. [Service Architecture](#service-architecture)
2. [Health Checks](#health-checks)
3. [Common Issues and Resolutions](#common-issues)
4. [Incident Response](#incident-response)
5. [Maintenance Procedures](#maintenance-procedures)
---
## Service Architecture
### Components
| Component | URL | Purpose |
|-----------|-----|---------|
| API | `https://connect-api.forma3d.be` | Backend NestJS application |
| Web | `https://connect.forma3d.be` | React dashboard |
| Database | PostgreSQL (managed) | Data persistence |
| Traefik | Internal | Reverse proxy with TLS |
### External Dependencies
| Service | Purpose | Documentation |
|---------|---------|---------------|
| Shopify | E-commerce platform | [Shopify API Docs](https://shopify.dev/docs/api) |
| SimplyPrint | 3D print management | [SimplyPrint API](https://simplyprint.io/docs) |
| Sendcloud | Shipping labels | [Sendcloud API](https://api.sendcloud.dev) |
| Sentry | Error monitoring | [Sentry Dashboard](https://sentry.io) |
---
## Health Checks
### Endpoints
```bash
# Full health check
curl https://connect-api.forma3d.be/health
# Liveness probe
curl https://connect-api.forma3d.be/health/live
# Readiness probe
curl https://connect-api.forma3d.be/health/ready
# External dependencies
curl https://connect-api.forma3d.be/health/dependencies
```
Expected Responses¶
Healthy:
{
  "status": "ok",
  "info": {
    "database": { "status": "up" },
    "shopify": { "status": "up" },
    "simplyprint": { "status": "up" },
    "sendcloud": { "status": "up" }
  }
}
Unhealthy (example):
{
  "status": "error",
  "error": {
    "shopify": {
      "status": "down",
      "error": "Connection timeout"
    }
  }
}
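The two response shapes above can be derived from per-dependency check results. A minimal sketch, assuming hypothetical type and function names (the actual health module may produce additional fields):

```typescript
// Result of a single dependency check (hypothetical shape mirroring the JSON above).
type DependencyStatus = { status: 'up' } | { status: 'down'; error: string };

interface HealthResponse {
  status: 'ok' | 'error';
  info?: Record<string, DependencyStatus>;
  error?: Record<string, DependencyStatus>;
}

// Sorts checks into "info" (healthy) and "error" (failing) and sets the overall status.
function buildHealthResponse(checks: Record<string, DependencyStatus>): HealthResponse {
  const info: Record<string, DependencyStatus> = {};
  const error: Record<string, DependencyStatus> = {};
  for (const [name, result] of Object.entries(checks)) {
    (result.status === 'up' ? info : error)[name] = result;
  }
  const failed = Object.keys(error).length > 0;
  return failed ? { status: 'error', info, error } : { status: 'ok', info };
}

const degraded = buildHealthResponse({
  database: { status: 'up' },
  shopify: { status: 'down', error: 'Connection timeout' },
});
```

Any single failing dependency flips the overall status to `error`, so monitoring only needs to inspect the top-level field.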
Common Issues¶
High Error Rate¶
Symptoms: Error rate > 5%, Sentry alerts
Investigation:
1. Check Sentry for error patterns
2. Review API logs: docker logs forma3d-api --tail 100
3. Check database connectivity
4. Verify external service status
Resolution:
1. If database issue: Restart database connection pool
2. If external service: Enable fallback/degraded mode
3. If code bug: Deploy hotfix
High Latency¶
Symptoms: 95th percentile response time > 2s
Investigation:
1. Check database query performance
2. Review slow query logs
3. Check external API response times
4. Monitor memory/CPU usage
Resolution:
1. Scale up resources if needed
2. Optimize slow queries
3. Add caching if appropriate
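The "95th percentile response time > 2s" symptom can be checked against raw latency samples. A minimal sketch using the nearest-rank method; the sample values are made up for illustration:

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Illustrative response times in milliseconds (not real measurements).
const latenciesMs = [120, 180, 200, 250, 300, 320, 400, 450, 900, 2600];
const p95 = percentile(latenciesMs, 95); // 2600
const breachesTarget = p95 > 2000; // true
```

With only ten samples a single slow request dominates the p95, so the metric is more meaningful over larger windows.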
Database Connection¶
Symptoms: Database health check failing
Investigation:
1. Check PostgreSQL status
2. Verify connection string
3. Check network connectivity
4. Review connection pool settings
Resolution:
1. Restart API service: docker-compose restart api
2. Check database credentials
3. Contact database provider if managed
Shopify Down¶
Symptoms: Webhooks not being received, fulfillments failing
Investigation:
1. Check Shopify status page
2. Verify webhook configuration in Shopify admin
3. Review API logs for errors
Resolution:
1. Wait for Shopify to recover
2. Orders will be retried via retry queue
3. Manual reprocessing if needed
SimplyPrint Down¶
Symptoms: Print jobs not being created
Investigation:
1. Check SimplyPrint API status
2. Verify API credentials
3. Check SimplyPrint dashboard
Resolution:
1. Wait for service recovery
2. Failed jobs will be retried automatically
3. Manual retry: POST /api/v1/print-jobs/{id}/retry
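Automatic retries are usually spaced out so a recovering service is not flooded. A minimal sketch of an exponential backoff schedule; the base delay, cap, and function names are assumptions, not the retry queue's actual settings:

```typescript
// Delay before the given 1-based attempt: base, 2x base, 4x base, ... capped at maxMs.
function retryDelayMs(attempt: number, baseMs = 1000, maxMs = 60_000): number {
  return Math.min(baseMs * 2 ** (attempt - 1), maxMs);
}

// Runs `op` up to `maxAttempts` times, sleeping between failures.
// `sleep` is injected so the helper can be tested without real timers.
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) await sleep(retryDelayMs(attempt));
    }
  }
  throw lastError;
}

const schedule = [1, 2, 3, 4].map((attempt) => retryDelayMs(attempt)); // [1000, 2000, 4000, 8000]
```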
Stuck Orders¶
Symptoms: Orders in PROCESSING state for > 60 minutes
Investigation:
1. Check print job status in dashboard
2. Verify SimplyPrint job status
3. Check for failed webhooks
Resolution:
1. Force refresh print job status
2. Manually update order status if needed
3. Contact SimplyPrint support if print issues
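The stuck-order symptom can be expressed as a simple filter over order snapshots. A minimal sketch, assuming a hypothetical `OrderSnapshot` shape with a `statusChangedAt` timestamp (the real Prisma model may name these fields differently):

```typescript
// Hypothetical projection of an order; field names are illustrative.
interface OrderSnapshot {
  id: string;
  status: string;
  statusChangedAt: Date;
}

// Returns orders that entered PROCESSING more than `thresholdMinutes` ago.
function findStuckOrders(
  orders: OrderSnapshot[],
  now: Date,
  thresholdMinutes = 60,
): OrderSnapshot[] {
  const cutoff = now.getTime() - thresholdMinutes * 60_000;
  return orders.filter(
    (order) => order.status === 'PROCESSING' && order.statusChangedAt.getTime() < cutoff,
  );
}

const now = new Date('2024-01-01T12:00:00Z');
const stuck = findStuckOrders(
  [
    { id: 'a', status: 'PROCESSING', statusChangedAt: new Date('2024-01-01T10:30:00Z') },
    { id: 'b', status: 'PROCESSING', statusChangedAt: new Date('2024-01-01T11:30:00Z') },
    { id: 'c', status: 'COMPLETED', statusChangedAt: new Date('2024-01-01T09:00:00Z') },
  ],
  now,
);
```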
Retry Queue Backlog¶
Symptoms: > 50 items in retry queue
Investigation:
1. Check retry queue: GET /api/v1/admin/retry-queue
2. Identify failing job types
3. Check error messages
Resolution:
1. Fix underlying issue causing failures
2. Clear old/stale entries if safe
3. Increase retry queue processing capacity
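Identifying the failing job types is easier once the queue entries are grouped. A minimal sketch, assuming a hypothetical `RetryEntry` shape for the items returned by the retry queue endpoint:

```typescript
// Hypothetical shape of a retry queue entry; the real response may differ.
interface RetryEntry {
  jobType: string;
  lastError: string;
}

// Counts entries per job type, most frequent first.
function countByJobType(entries: RetryEntry[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const entry of entries) {
    counts.set(entry.jobType, (counts.get(entry.jobType) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const summary = countByJobType([
  { jobType: 'fulfillment', lastError: 'timeout' },
  { jobType: 'print-job', lastError: '503' },
  { jobType: 'fulfillment', lastError: 'timeout' },
]);
```

The job type at the top of the list is usually the best place to start the investigation.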
Incident Response¶
Severity Levels¶
| Level | Description | Response Time | Examples |
|---|---|---|---|
| P1 - Critical | Complete service outage | 15 minutes | Database down, API unresponsive |
| P2 - High | Major feature broken | 1 hour | Webhooks failing, fulfillments stuck |
| P3 - Medium | Degraded performance | 4 hours | High latency, intermittent errors |
| P4 - Low | Minor issue | 1 business day | UI bugs, documentation issues |
Incident Template¶
## Incident Report
**Date:** YYYY-MM-DD
**Severity:** P1/P2/P3/P4
**Duration:** HH:MM - HH:MM
**Impact:** [Description of user impact]
### Timeline
- HH:MM - Issue detected
- HH:MM - Investigation started
- HH:MM - Root cause identified
- HH:MM - Fix deployed
- HH:MM - Issue resolved
### Root Cause
[Description of what caused the issue]
### Resolution
[What was done to fix it]
### Prevention
[What will be done to prevent recurrence]
Maintenance Procedures¶
Deploying Updates¶
# Pull latest changes
git pull origin main
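# Build and push images (the build tag must match the registry path that is pushed)
docker build -t registry.digitalocean.com/forma3d/api:latest apps/api
docker push registry.digitalocean.com/forma3d/api:latest
# Pull the new image and recreate only the API container (expect a brief restart)
docker-compose pull api
docker-compose up -d --no-deps api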
Database Migrations¶
# Run migrations
pnpm prisma migrate deploy
# Mark a failed migration as rolled back (updates migration history only; it does not revert schema changes)
pnpm prisma migrate resolve --rolled-back MIGRATION_NAME
Log Rotation¶
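Docker rotates container logs when the logging driver is configured with size limits (for example max-size and max-file). Manual cleanup:
# Remove stopped containers (with their logs), unused networks, and dangling images
# Do not add --volumes here: it also deletes unused volumes
docker system prune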
Backup Procedures¶
Database backups are handled by the managed PostgreSQL provider.
Manual backup:
pg_dump "$DATABASE_URL" > "backup_$(date +%Y%m%d).sql"
Keys & Certificates Inventory¶
IMPORTANT: Maintain an up-to-date inventory of all API keys, secrets, and certificates.
Create and maintain docs/04-development/keys-certificates-inventory.md:
| Key/Certificate | Purpose | Lifespan | Renewal Location | Renewal Procedure | Last Renewed |
|---|---|---|---|---|---|
| — INFRASTRUCTURE — | |||||
| Droplet SSH Key | SSH access to server | No expiry (rotate annually) | DigitalOcean → Settings → Security → SSH Keys | Generate new keypair, add to DO, update ~/.ssh/authorized_keys on droplet | YYYY-MM-DD |
| Droplet Root Password | Emergency console access | No expiry (rotate annually) | DigitalOcean → Droplet → Access → Reset Root Password | Reset via DO console, store in password manager | YYYY-MM-DD |
| TLS Certificate (Let's Encrypt) | HTTPS for API/Web | 90 days | Let's Encrypt via Traefik ACME | Auto-renewed by Traefik (see note below) | Auto |
| — DATABASE — | |||||
| Database CA Certificate | SSL connection to managed DB | 1-5 years (provider-managed) | DigitalOcean → Databases → Your DB → Connection Details → Download CA | Download new CA cert, update ?sslmode=require&sslrootcert= path | YYYY-MM-DD |
| Database Password | PostgreSQL access | No expiry (rotate quarterly) | DigitalOcean → Databases → Your DB → Users | Reset via provider, update DATABASE_URL | YYYY-MM-DD |
| — CONTAINER REGISTRY — | |||||
| Container Registry Token | Push/pull Docker images | No expiry | DigitalOcean → Container Registry → API | Generate new token, update CI/CD variables | YYYY-MM-DD |
| Cosign Signing Key | Container image signing | No expiry | Self-generated | cosign generate-key-pair, update CI/CD secrets | YYYY-MM-DD |
| — EXTERNAL SERVICES — | |||||
| Shopify API Key | Shopify Admin API access | No expiry | Shopify Admin → Apps → Your App | Regenerate in Shopify admin, update .env | YYYY-MM-DD |
| Shopify API Secret | App authentication | No expiry | Shopify Admin → Apps → Your App | Regenerate in Shopify admin, update .env | YYYY-MM-DD |
| Shopify Access Token | Store-specific access | No expiry (unless revoked) | Shopify Admin → Apps | Reinstall app or regenerate | YYYY-MM-DD |
| Shopify Webhook Secret | HMAC verification | No expiry | Shopify Admin → Notifications → Webhooks | Regenerate in webhooks settings | YYYY-MM-DD |
| SimplyPrint API Key | Print farm API access | No expiry | SimplyPrint Dashboard → API Settings | Generate new key in dashboard | YYYY-MM-DD |
| SimplyPrint Webhook Token | Webhook verification | No expiry | SimplyPrint Dashboard → Webhooks | Configure in webhook settings | YYYY-MM-DD |
| Sendcloud Public Key | Shipping API authentication | No expiry | Sendcloud Panel → Settings → API | Generate new integration | YYYY-MM-DD |
| Sendcloud Secret Key | Shipping API authentication | No expiry | Sendcloud Panel → Settings → API | Generate new integration | YYYY-MM-DD |
| — APPLICATION — | |||||
| API_KEY (internal) | Dashboard/Admin access | No expiry (rotate annually) | Self-generated | Generate new UUID, update .env | YYYY-MM-DD |
| Sentry DSN | Error tracking | No expiry | Sentry Dashboard → Project Settings | Create new project if needed | YYYY-MM-DD |
| SMTP Credentials | Email notifications | Varies by provider | Email provider dashboard | Regenerate password/API key | YYYY-MM-DD |
| — CI/CD (Azure DevOps) — | |||||
| Azure DevOps PAT | Pipeline authentication | 1 year max | Azure DevOps → User Settings → Personal Access Tokens | Generate new PAT, update service connections | YYYY-MM-DD |
| Service Connection (SSH) | Deploy to droplet | Tied to SSH key | Azure DevOps → Project Settings → Service Connections | Update with new SSH private key | YYYY-MM-DD |
| Pipeline Variables | Secrets in pipelines | No expiry | Azure DevOps → Pipelines → Library → Variable Groups | Update individual variables as needed | YYYY-MM-DD |
Let's Encrypt TLS Certificate Auto-Renewal:
Traefik automatically handles Let's Encrypt certificate renewal. Key details:
- Validity: 90 days per certificate
- Auto-renewal: Traefik renews ~30 days before expiry (no manual action required)
- Storage: Certificates stored in Docker volume traefik-certs at /letsencrypt/acme.json
- Challenge: HTTP-01 challenge via port 80 (must remain accessible)
- Configuration: See deployment/staging/traefik.yml → certificatesResolvers.letsencrypt
# traefik.yml - ACME auto-renewal configuration
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@forma3d.be # Expiry notifications sent here
      storage: /letsencrypt/acme.json # Persisted in Docker volume
      httpChallenge:
        entryPoint: web # Port 80 for challenge
Monitoring TLS auto-renewal:
# Check certificate expiry date
echo | openssl s_client -connect staging-connect-api.forma3d.be:443 2>/dev/null | openssl x509 -noout -dates
# Check Traefik logs for renewal activity
docker logs forma3d-traefik 2>&1 | grep -i "acme\|certificate\|renew"
# Verify ACME storage file exists
docker exec forma3d-traefik ls -la /letsencrypt/acme.json
Troubleshooting failed renewal:
1. Ensure port 80 is open and reachable from internet
2. Check DNS still points to correct IP
3. Verify Traefik container is running: docker ps | grep traefik
4. Check Traefik logs for ACME errors
5. As a last resort, back up and remove acme.json, then restart Traefik to re-issue certificates (repeated re-issuance can hit Let's Encrypt rate limits)
Renewal Calendar:
- Weekly (automated): TLS certificates checked by Traefik, renewed if within 30 days of expiry
- Monthly: Verify TLS auto-renewal is working (check cert dates)
- Quarterly: Rotate database password, review API key usage
- Annually: Rotate SSH keys, Azure DevOps PAT, internal API_KEY, review all external API keys
- Before Expiry: Database CA certificate (monitor provider notifications), Azure DevOps PAT
- On Incident: Rotate any potentially compromised credentials immediately
Monitoring Expiry:
- Set calendar reminders for annually-rotated credentials
- Subscribe to provider notifications (DigitalOcean, Azure DevOps)
- Let's Encrypt sends expiry warnings to admin@forma3d.be (but Traefik should renew automatically)
- Database CA cert expiry: openssl x509 -enddate -noout -in ca-certificate.crt
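Calendar reminders can be complemented by a small check over the inventory's "Last Renewed" dates. A minimal sketch, assuming a hypothetical `Credential` record and a 30-day warning window (both are illustrative, not part of the inventory format):

```typescript
// Hypothetical inventory record derived from the table columns above.
interface Credential {
  name: string;
  lastRenewed: Date;
  rotateEveryDays: number;
}

const DAY_MS = 86_400_000;

// Whole days left until the credential reaches its rotation date (negative if overdue).
function daysUntilRotation(cred: Credential, today: Date): number {
  const dueMs = cred.lastRenewed.getTime() + cred.rotateEveryDays * DAY_MS;
  return Math.floor((dueMs - today.getTime()) / DAY_MS);
}

// Names of credentials that are overdue or due within `warnDays`.
function dueForRotation(creds: Credential[], today: Date, warnDays = 30): string[] {
  return creds.filter((c) => daysUntilRotation(c, today) <= warnDays).map((c) => c.name);
}

const inventory: Credential[] = [
  { name: 'Database Password', lastRenewed: new Date('2024-01-01T00:00:00Z'), rotateEveryDays: 90 },
  { name: 'Droplet SSH Key', lastRenewed: new Date('2024-01-01T00:00:00Z'), rotateEveryDays: 365 },
];
const due = dueForRotation(inventory, new Date('2024-03-15T00:00:00Z')); // ['Database Password']
```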
Renewal Checklist:
1. [ ] Generate new credential in source system
2. [ ] Update .env files (staging and production)
3. [ ] Update secrets in CI/CD (Azure DevOps variables)
4. [ ] Deploy with new credentials
5. [ ] Verify system still works (health checks)
6. [ ] Revoke old credential (if applicable)
7. [ ] Update this inventory with new "Last Renewed" date
Contact Information¶
| Role | Contact | Escalation |
|---|---|---|
| On-call Engineer | [email/phone] | Primary |
| Tech Lead | [email/phone] | If P1/P2 |
| Database Admin | [email/phone] | Database issues |
2. Install Dependencies¶
pnpm install
3. Configure Environment¶
Copy the example environment file:
cp .env.example .env
Edit .env with your configuration:
# Database
DATABASE_URL=postgresql://user:password@localhost:5432/forma3d_connect
# Shopify
SHOPIFY_SHOP_DOMAIN=your-shop.myshopify.com
SHOPIFY_API_KEY=your-api-key
SHOPIFY_API_SECRET=your-api-secret
SHOPIFY_ACCESS_TOKEN=your-access-token
SHOPIFY_WEBHOOK_SECRET=your-webhook-secret
# SimplyPrint
SIMPLYPRINT_API_URL=https://api.simplyprint.io/v1
SIMPLYPRINT_API_KEY=your-api-key
# Sendcloud (optional)
SENDCLOUD_PUBLIC_KEY=your-public-key
SENDCLOUD_SECRET_KEY=your-secret-key
SHIPPING_ENABLED=true
# Application
API_KEY=your-admin-api-key
NODE_ENV=development
PORT=3000
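A startup check can catch missing variables before the first request fails. A minimal sketch based on the variables listed above; treating the Sendcloud keys as required only when `SHIPPING_ENABLED` is `true` is an assumption, as are the function and constant names:

```typescript
// Variables from the example .env that the application needs in every environment.
const REQUIRED_ENV: string[] = [
  'DATABASE_URL',
  'SHOPIFY_SHOP_DOMAIN',
  'SHOPIFY_API_KEY',
  'SHOPIFY_API_SECRET',
  'SHOPIFY_ACCESS_TOKEN',
  'SHOPIFY_WEBHOOK_SECRET',
  'SIMPLYPRINT_API_URL',
  'SIMPLYPRINT_API_KEY',
  'API_KEY',
];

// Assumed to be required only when shipping is enabled.
const SENDCLOUD_ENV: string[] = ['SENDCLOUD_PUBLIC_KEY', 'SENDCLOUD_SECRET_KEY'];

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  const isBlank = (key: string): boolean => (env[key] ?? '').trim() === '';
  const required =
    env.SHIPPING_ENABLED === 'true' ? [...REQUIRED_ENV, ...SENDCLOUD_ENV] : REQUIRED_ENV;
  return required.filter(isBlank);
}

const missing = missingEnvVars({ DATABASE_URL: 'postgresql://localhost/db', SHIPPING_ENABLED: 'true' });
```

In the API this could be called with `process.env` during bootstrap, failing fast when the returned list is not empty.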
4. Start Database¶
Using Docker:
docker-compose up -d postgres
Or use your local PostgreSQL installation.
5. Run Migrations¶
pnpm prisma migrate dev
6. Start Development Servers¶
# Start all services
pnpm dev
# Or start individually
pnpm nx serve api # Backend on http://localhost:3000
pnpm nx serve web # Frontend on http://localhost:4200
7. Verify Installation¶
# Check API health
curl http://localhost:3000/health
# Access Swagger docs
open http://localhost:3000/api/docs
# Access dashboard
open http://localhost:4200
Docker Development¶
Build and run everything in Docker:
docker-compose up -d
Testing¶
# Run all tests
pnpm test
# Run with coverage
pnpm test:coverage
# Run E2E tests
pnpm e2e
Troubleshooting¶
Database Connection Issues¶
- Verify PostgreSQL is running
- Check DATABASE_URL format
- Ensure database exists:
createdb forma3d_connect
Port Conflicts¶
If port 3000 is in use:
PORT=3001 pnpm nx serve api
Prisma Issues¶
Regenerate Prisma client:
pnpm prisma generate
Reset database:
pnpm prisma migrate reset
#### 2. Troubleshooting Guide
Create `docs/04-development/troubleshooting.md`:
```markdown
# Troubleshooting Guide
## Common Issues
### Build Errors
#### "Cannot find module '@forma3d/...'"
**Cause:** Library not built or missing from node_modules
**Solution:**
```bash
pnpm install
pnpm nx run-many --target=build --all
```
TypeScript compilation errors¶
Cause: Type mismatches or outdated types
Solution:
pnpm prisma generate # Regenerate Prisma types
pnpm nx reset # Clear Nx cache
pnpm install # Reinstall dependencies
Runtime Errors¶
"ECONNREFUSED" to database¶
Cause: Database not running or wrong connection string
Solution:
1. Start database: docker-compose up -d postgres
2. Verify DATABASE_URL in .env
3. Check network connectivity
"Invalid API key" from external services¶
Cause: Missing or incorrect API credentials
Solution:
1. Verify credentials in .env
2. Check for extra spaces or newlines
3. Regenerate API keys if needed
Webhook Issues¶
Shopify webhooks not arriving¶
Cause: Webhook URL not accessible or HMAC validation failing
Solution:
1. Use ngrok for local development: ngrok http 3000
2. Update webhook URL in Shopify admin
3. Verify SHOPIFY_WEBHOOK_SECRET
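When HMAC validation fails, it helps to reproduce the check in isolation. Shopify signs the raw request body with HMAC-SHA256 and sends the base64 digest in the `X-Shopify-Hmac-Sha256` header. A minimal sketch; the function name is hypothetical and this is not necessarily how the existing webhook guard is written:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Recomputes the HMAC-SHA256 of the raw body and compares it to the header value
// in constant time. The body must be the exact bytes received, not re-serialized JSON.
function verifyShopifyHmac(rawBody: string | Buffer, hmacHeader: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(hmacHeader, 'base64');
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A common cause of mismatches is verifying against a parsed and re-stringified body instead of the raw payload.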
SimplyPrint webhooks failing¶
Cause: Token mismatch or network issues
Solution:
1. Verify SIMPLYPRINT_WEBHOOK_TOKEN
2. Check firewall/security group rules
3. Review SimplyPrint webhook logs
Performance Issues¶
Slow API responses¶
Cause: Database queries not optimized
Solution:
1. Enable query logging in Prisma
2. Add missing indexes
3. Use pagination for large datasets
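Pagination in Prisma is expressed with `skip` and `take`. A minimal sketch of a helper that converts page parameters into those arguments; the helper name and the cap of 100 rows per page are assumptions:

```typescript
// Converts 1-based page parameters into Prisma-style skip/take arguments,
// clamping the page size so a client cannot request an unbounded result set.
function pageArgs(page: number, pageSize: number, maxPageSize = 100): { skip: number; take: number } {
  const take = Math.min(Math.max(1, Math.floor(pageSize)), maxPageSize);
  const safePage = Math.max(1, Math.floor(page));
  return { skip: (safePage - 1) * take, take };
}

const thirdPage = pageArgs(3, 20); // { skip: 40, take: 20 }
```

The result can be spread into a query such as `findMany({ ...pageArgs(page, size), orderBy: ... })`; a stable `orderBy` is needed for consistent pages.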
Memory issues¶
Cause: Memory leaks or insufficient resources
Solution:
1. Monitor with docker stats
2. Increase container memory limits
3. Review for memory leaks in code
Testing Issues¶
Tests timing out¶
Cause: Async operations not completing
Solution:
1. Increase Jest timeout
2. Check for unresolved promises
3. Verify test database is accessible
MSW not intercepting requests¶
Cause: Handler not matching request
Solution:
1. Check handler URL patterns
2. Verify request method (GET/POST)
3. Add console.log to handler to debug
Logs and Debugging¶
View API Logs¶
# Development
pnpm nx serve api --verbose
# Docker
docker logs forma3d-api -f --tail 100
# Staging
ssh staging 'docker logs forma3d-api'
Enable Debug Mode¶
DEBUG=forma3d:*
LOG_LEVEL=debug
Prisma Query Logging¶
DEBUG=prisma:query
Getting Help¶
- Check this troubleshooting guide
- Search existing GitHub issues
- Review Sentry for similar errors
- Ask in team Slack channel
- Create a GitHub issue with reproduction steps
#### 3. Update README.md
Update the main README with comprehensive documentation covering:
- Project overview
- Features list
- Quick start guide
- Architecture overview
- API documentation link
- Development setup
- Testing instructions
- Deployment guide
- Contributing guidelines
---
## 🔧 Feature F6.4: Security Hardening
### Requirements Reference
- NFR-SE-001: Secure Credential Storage
- NFR-SE-002: Webhook Verification
- Security scan passing
### Implementation
#### 1. Rate Limiting
Create `apps/api/src/common/guards/throttler.guard.ts`:
```typescript
import { Injectable, ExecutionContext } from '@nestjs/common';
import { ThrottlerGuard } from '@nestjs/throttler';

@Injectable()
export class CustomThrottlerGuard extends ThrottlerGuard {
  // `req` is typed loosely so that headers and ip can be read without casts on every access
  protected async getTracker(req: Record<string, any>): Promise<string> {
    // Use X-Forwarded-For header when behind proxy (only trustworthy if the proxy sets it)
    const forwarded = req.headers?.['x-forwarded-for'] as string | undefined;
    if (forwarded) {
      return forwarded.split(',')[0].trim();
    }
    return req.ip as string;
  }

  protected async shouldSkip(context: ExecutionContext): Promise<boolean> {
    const request = context.switchToHttp().getRequest();
    // Skip rate limiting for health checks
    if (request.url.startsWith('/health')) {
      return true;
    }
    return false;
  }
}
```
Update apps/api/src/app.module.ts:
import { Module } from '@nestjs/common';
import { APP_GUARD } from '@nestjs/core';
import { ThrottlerModule } from '@nestjs/throttler';
import { CustomThrottlerGuard } from './common/guards/throttler.guard';

@Module({
  imports: [
    // ... existing imports
    ThrottlerModule.forRoot([
      {
        name: 'short',
        ttl: 1000, // 1 second
        limit: 10, // 10 requests per second
      },
      {
        name: 'medium',
        ttl: 10000, // 10 seconds
        limit: 50, // 50 requests per 10 seconds
      },
      {
        name: 'long',
        ttl: 60000, // 1 minute
        limit: 200, // 200 requests per minute
      },
    ]),
  ],
  providers: [
    // Register the guard globally so every route is throttled unless skipped
    {
      provide: APP_GUARD,
      useClass: CustomThrottlerGuard,
    },
  ],
})
export class AppModule {}
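The three named limits apply at the same time: a request is rejected as soon as any one window is exhausted. A minimal in-memory sketch of that idea using a sliding log of timestamps; the class and its storage are illustrative assumptions and do not reflect how `@nestjs/throttler` stores hits internally:

```typescript
interface RateWindow {
  name: string;
  ttlMs: number;
  limit: number;
}

// Same limits as the ThrottlerModule configuration above.
const WINDOWS: RateWindow[] = [
  { name: 'short', ttlMs: 1000, limit: 10 },
  { name: 'medium', ttlMs: 10000, limit: 50 },
  { name: 'long', ttlMs: 60000, limit: 200 },
];

class MultiWindowLimiter {
  private hits = new Map<string, number[]>();
  private windows: RateWindow[];

  constructor(windows: RateWindow[]) {
    this.windows = windows;
  }

  // Returns the name of the first exhausted window, or null if the request is allowed.
  // Allowed requests are recorded; rejected ones are not.
  check(tracker: string, nowMs: number): string | null {
    const longest = Math.max(...this.windows.map((w) => w.ttlMs));
    const recent = (this.hits.get(tracker) ?? []).filter((t) => nowMs - t < longest);
    this.hits.set(tracker, recent);
    for (const w of this.windows) {
      const count = recent.filter((t) => nowMs - t < w.ttlMs).length;
      if (count >= w.limit) return w.name;
    }
    recent.push(nowMs);
    return null;
  }
}

const limiter = new MultiWindowLimiter(WINDOWS);
```

Passing the clock in as `nowMs` keeps the sketch deterministic, which is convenient for the "Requests are throttled correctly" unit test scenario.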
2. Security Headers¶
Create apps/api/src/common/middleware/security-headers.middleware.ts:
import { Injectable, NestMiddleware } from '@nestjs/common';
import { Request, Response, NextFunction } from 'express';
import helmet from 'helmet';

@Injectable()
export class SecurityHeadersMiddleware implements NestMiddleware {
  private readonly helmet = helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        styleSrc: ["'self'", "'unsafe-inline'"],
        scriptSrc: ["'self'"],
        imgSrc: ["'self'", 'data:', 'https:'],
        connectSrc: ["'self'", 'https://api.sentry.io'],
        frameSrc: ["'none'"],
        objectSrc: ["'none'"],
      },
    },
    crossOriginEmbedderPolicy: false, // Required for Swagger UI
    crossOriginOpenerPolicy: { policy: 'same-origin-allow-popups' },
    crossOriginResourcePolicy: { policy: 'cross-origin' },
    hsts: {
      maxAge: 31536000,
      includeSubDomains: true,
      preload: true,
    },
    noSniff: true,
    referrerPolicy: { policy: 'strict-origin-when-cross-origin' },
    xssFilter: true,
  });

  use(req: Request, res: Response, next: NextFunction) {
    this.helmet(req, res, next);
  }
}
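A unit test for the "All required headers present" scenario can compare a response's headers against an expected list. A minimal sketch; the helper name and the choice of which headers count as required are assumptions based on the helmet options above:

```typescript
// Headers that the helmet configuration above is expected to emit (lowercase names).
const REQUIRED_SECURITY_HEADERS: string[] = [
  'content-security-policy',
  'strict-transport-security',
  'x-content-type-options',
  'referrer-policy',
];

// Returns the required headers that are absent, ignoring header-name casing.
function missingSecurityHeaders(headers: Record<string, string | string[] | undefined>): string[] {
  const present = new Set(
    Object.entries(headers)
      .filter(([, value]) => value !== undefined)
      .map(([name]) => name.toLowerCase()),
  );
  return REQUIRED_SECURITY_HEADERS.filter((name) => !present.has(name));
}

const absent = missingSecurityHeaders({
  'Content-Security-Policy': "default-src 'self'",
  'X-Content-Type-Options': 'nosniff',
});
```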
3. Dependency Security Scan¶
Create .github/workflows/security-scan.yml:
name: Security Scan
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  schedule:
    - cron: '0 0 * * 1' # Weekly on Monday
permissions:
  contents: read
  security-events: write # Needed to upload SARIF results
jobs:
  dependency-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
      - name: Install pnpm
        uses: pnpm/action-setup@v2
        with:
          version: 8
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
      - name: Run pnpm audit
        run: pnpm audit --audit-level=high
        continue-on-error: true
      - name: Run Snyk scan
        uses: snyk/actions/node@master
        continue-on-error: true
        env:
          SNYK_TOKEN: ${{ secrets.SNYK_TOKEN }}
        with:
          args: --sarif-file-output=snyk.sarif # Write the report that the next step uploads
      - name: Upload Snyk report
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: snyk.sarif
        continue-on-error: true
  code-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: javascript-typescript
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
4. Authentication Audit¶
Create a checklist for security review:
# Security Audit Checklist
## Authentication & Authorization
- [ ] API key authentication implemented for admin endpoints
- [ ] API keys are stored hashed in database (if applicable)
- [ ] Rate limiting applied to authentication endpoints
- [ ] Failed authentication attempts are logged
## Webhook Security
- [ ] Shopify webhooks use HMAC verification
- [ ] SimplyPrint webhooks use token verification
- [ ] Sendcloud webhooks verified (if applicable)
- [ ] Webhook idempotency implemented
## Data Protection
- [ ] Sensitive data is not logged (passwords, tokens, etc.)
- [ ] Database credentials are not in code
- [ ] API keys are stored in environment variables
- [ ] HTTPS enforced in production
## Input Validation
- [ ] All DTOs have validation decorators
- [ ] JSON payloads are validated at boundaries
- [ ] SQL injection prevented via Prisma
- [ ] XSS prevented via proper output encoding
## Dependencies
- [ ] No critical vulnerabilities in dependencies
- [ ] Dependencies are up to date
- [ ] Lock file is committed
## Infrastructure
- [ ] TLS certificates valid and auto-renewed
- [ ] Security headers configured
- [ ] CORS properly configured
- [ ] Firewall rules reviewed
🧪 Testing Requirements¶
Test Coverage Requirements¶
Per requirements.md (NFR-MA-002):
- Unit Tests: > 80% coverage for all services
- Integration Tests: All API integrations tested
- E2E Tests: Critical paths covered
- Load Tests: 500+ orders/day capacity verified
Unit Test Scenarios Required¶
| Category | Scenario | Priority |
|---|---|---|
| Health Indicators | Shopify health check succeeds | High |
| Health Indicators | SimplyPrint health check succeeds | High |
| Health Indicators | Sendcloud health check (disabled) | Medium |
| Rate Limiting | Requests are throttled correctly | High |
| Rate Limiting | Health endpoints bypass throttling | Medium |
| Security Headers | All required headers present | High |
Load Test Scenarios¶
| Scenario | Description | Target |
|---|---|---|
| Sustained Load | 1 request/second for 5 minutes | < 1% errors |
| Spike Test | Ramp to 50 requests/second | < 5% errors |
| Order Throughput | 500 orders/day simulation | 100% success |
| Dashboard Load | 10 concurrent dashboard users | < 2s latency |
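The targets in the table translate into fairly small request rates, which is useful when sizing the K6 scenarios. A minimal sketch of the arithmetic; the eight-hour business window is an illustrative assumption, not a stated requirement:

```typescript
// Average order rate when the daily volume is spread over `activeHours`.
function averageOrdersPerSecond(ordersPerDay: number, activeHours = 24): number {
  return ordersPerDay / (activeHours * 3600);
}

// Total requests issued by a constant-rate scenario.
function requestsInRun(ratePerSecond: number, durationSeconds: number): number {
  return ratePerSecond * durationSeconds;
}

const average = averageOrdersPerSecond(500); // about 0.0058 orders per second
const businessHours = averageOrdersPerSecond(500, 8); // about 0.0174 orders per second
const sustainedRequests = requestsInRun(1, 5 * 60); // 300 requests in the sustained load scenario
const headroom = 1 / average; // about 172.8: 1 request/second is far above the average order rate
```

Because the average rate is so low, the spike scenario is the more demanding check of capacity than the daily throughput target.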
✅ Validation Checklist¶
Infrastructure¶
- All modules compile without errors
- pnpm nx build api succeeds
- pnpm nx build web succeeds
- pnpm lint passes on all files
- No TypeScript errors
Testing (F6.1)¶
- Unit test coverage > 80% for backend
- Unit test coverage > 60% for frontend
- Integration tests for order flow
- Integration tests for cancellation flow
- Integration tests for error recovery
- E2E critical path tests passing
- Load tests pass with 500 orders/day
- Performance meets latency requirements (< 2s)
Monitoring (F6.2)¶
- Health check endpoints working
- Shopify health indicator implemented
- SimplyPrint health indicator implemented
- Sendcloud health indicator implemented
- Alerting rules defined
- Runbook complete
Documentation (F6.3)¶
- README.md complete and up to date
- Environment setup guide complete
- Troubleshooting guide complete
- Runbook complete
- Keys & certificates inventory complete (with renewal procedures)
- API documentation (Swagger) complete
- Architecture diagrams validated
Security (F6.4)¶
- Rate limiting implemented
- Security headers configured
- Dependency scan passing
- Security audit checklist complete
- No critical vulnerabilities
🚫 Constraints and Rules¶
MUST DO¶
- Achieve 80%+ test coverage for backend
- Implement load testing for 500+ orders/day
- Add health indicators for all external services
- Create complete runbook for operations
- Configure security headers and rate limiting
- Pass dependency security scan
- Update all documentation
MUST NOT¶
- Skip writing tests to save time
- Deploy without load testing
- Leave security vulnerabilities unaddressed
- Commit hardcoded credentials
- Deploy without complete documentation
- Skip security audit checklist
🎬 Execution Order¶
Testing (F6.1)¶
- Analyze current test coverage and identify gaps
- Write missing unit tests to reach 80%+ coverage
- Create integration tests for order flow
- Create integration tests for cancellation flow
- Create integration tests for error recovery
- Set up K6 load testing infrastructure
- Create load test scenarios for throughput testing
- Run load tests and document results
- Optimize performance based on load test results
Monitoring (F6.2)¶
- Create health indicators for external services
- Update health controller with enhanced endpoints
- Define alerting rules for monitoring
- Create operations runbook with procedures
Documentation (F6.3)¶
- Create environment setup guide
- Create troubleshooting guide
- Update README.md with complete documentation
- Validate and update architecture diagrams
Security (F6.4)¶
- Implement rate limiting with @nestjs/throttler
- Configure security headers with helmet
- Create security scan workflow for CI
- Complete security audit checklist
Validation¶
- Run all tests and verify coverage
- Run load tests against staging
- Verify all health endpoints work
- Complete security audit
- Final documentation review
📊 Expected Output¶
When Phase 6 is complete:
Verification Commands¶
# Run all tests with coverage
pnpm test:coverage
# Expected output: > 80% coverage
# Run load tests
pnpm load-test:staging
# Expected output: All thresholds passed
# Check health endpoints
curl https://staging-connect-api.forma3d.be/health
curl https://staging-connect-api.forma3d.be/health/dependencies
# Run security scan
pnpm audit
# Expected output: 0 high/critical vulnerabilities
Success Metrics¶
| Metric | Target | Verification |
|---|---|---|
| Unit test coverage | > 80% | pnpm test:coverage |
| Integration tests | All passing | pnpm test:integration |
| Load test (orders/day) | 500+ | K6 load test results |
| API latency (p95) | < 2 seconds | K6 metrics |
| Error rate under load | < 1% | K6 metrics |
| Security vulnerabilities | 0 critical/high | pnpm audit |
| Documentation | 100% complete | Manual review |
📝 Documentation Updates¶
CRITICAL: All documentation must be updated to reflect Phase 6 completion.
docs/04-development/implementation-plan.md Updates Required¶
Update the implementation plan to mark Phase 6 as complete:
- Mark F6.1 (Comprehensive Testing) as ✅ Completed
- Mark F6.2 (Monitoring and Alerting) as ✅ Completed
- Mark F6.3 (Documentation) as ✅ Completed
- Mark F6.4 (Security Hardening) as ✅ Completed
- Update Phase 6 Exit Criteria with checkmarks
- Add implementation notes and component paths
- Update revision history with completion date
Additional Documentation¶
- Update README.md with complete project documentation
- Create docs/04-development/runbook.md
- Create docs/04-development/troubleshooting.md
- Create docs/04-development/environment-setup.md
- Create docs/04-development/keys-certificates-inventory.md
- Validate all architecture diagrams
🔗 Phase 6 Exit Criteria¶
From implementation-plan.md:
- Test coverage > 80%
- Load testing passed
- Monitoring operational
- Documentation complete
- Security review passed
Additional Exit Criteria¶
- All health indicators implemented and working
- Alerting rules defined
- Operations runbook complete
- Rate limiting configured
- Security headers configured
- Dependency scan passing
- Security audit checklist complete
🎉 Production Readiness¶
With Phase 6 complete, Forma3D.Connect is ready for production:
Production Checklist¶
- All tests passing
- Load testing verified capacity
- Monitoring and alerting in place
- Runbook available for operations
- Security hardened
- Documentation complete
- CI/CD pipeline verified
Go-Live Steps¶
- Final security review
- Update DNS for production domains
- Configure production environment variables
- Deploy to production
- Verify health checks
- Monitor for first 24 hours
- Announce go-live
END OF PROMPT
This prompt concludes the Forma3D.Connect implementation phases. The AI should implement all Phase 6 hardening features to ensure the system is production-ready with comprehensive testing, monitoring, documentation, and security. After Phase 6, the system achieves full production readiness.