
Incident Report: GridFlock Pipeline Failure — Order #1032

Date: February 23, 2026
Duration: ~1.5 hours investigation and remediation
Severity: P2 — First customer order for custom grid blocked
Status: Resolved
Affected Order: #1032 (SKU: GF-100x100-IP-NOMAG)


Summary

The first-ever customer order for a custom GridFlock grid failed at the slicing step of the pipeline. The gridflock.pipeline-failed event appeared in Bull Board immediately after the order was placed. Investigation revealed three cascading issues: a segmentation fault in BambuStudio CLI (the slicer engine), a JSON body size limit on the print service, and incorrect tenant ID resolution. All three were resolved and the order was successfully processed.


Timeline

| Time (approx.) | Event |
| --- | --- |
| 18:52 UTC | Order #1032 placed via Shopify for a 100x100mm custom grid |
| 18:52 UTC | GridFlock pipeline triggered automatically by Order Service |
| 18:52 UTC | Pipeline fails at slicing step — gridflock.pipeline-failed event published |
| 19:00 UTC | Investigation begins via SSH to staging server |
| 19:05 UTC | Slicer container logs show Segmentation fault (core dumped) from BambuStudio |
| 19:10 UTC | Identified linuxserver/bambustudio:latest had auto-upgraded to v02.05 |
| 19:15 UTC | Pinned base image to v02.04.00.70-ls128, rebuilt and redeployed |
| 19:20 UTC | v02.04 also segfaults — same crash during "nozzle volume processing" |
| 19:30 UTC | Investigated profile format issues (numeric vs string values in JSON) |
| 19:40 UTC | Fixed all custom profiles to use string values — still segfaults |
| 19:50 UTC | Tried bundled profiles instead of custom ones — still segfaults |
| 20:00 UTC | Tested with a trivial cube STL — still segfaults (rules out model complexity) |
| 20:05 UTC | Tried --orient 0, --no-check, software OpenGL rendering, 2GB swap — all segfault |
| 20:10 UTC | Concluded: BambuStudio v02.04+ has a fundamental CLI bug in headless Docker |
| 20:15 UTC | Tested linuxserver/bambustudio:01.08.03 — slicing succeeds with exit code 0 |
| 20:20 UTC | Updated Dockerfile, added Xvfb, rebuilt and deployed slicer with v01.08.03 |
| 20:25 UTC | Pipeline retry: slicing succeeds, but upload to SimplyPrint returns 500 |
| 20:27 UTC | Identified PayloadTooLargeError — 172KB 3MF exceeds default 100KB JSON body limit |
| 20:30 UTC | Patched print service with 50mb body parser limit, restarted |
| 20:32 UTC | Pipeline retry: slicing and upload succeed, but mapping creation returns 500 |
| 20:33 UTC | Identified FK constraint violation — wrong tenantId passed ("forma3d" vs UUID) |
| 20:35 UTC | Retried with correct tenant UUID 00000000-0000-0000-0000-000000000001 |
| 20:36 UTC | Pipeline completes successfully — 1 file uploaded, mapping created |

Root Causes

Root Cause 1: BambuStudio CLI Segfault (Primary)

What happened: BambuStudio versions 02.04.00.70 and 02.05.00.66/67 crash with a Segmentation fault (core dumped) when run in CLI/headless mode inside a Docker container. The crash occurs consistently after profile loading and model orientation, during internal "nozzle volume processing", regardless of:

  • Input STL complexity (even a 12-triangle cube triggers it)
  • Profile source (custom or bundled BBL profiles)
  • Orient/arrange flags (--orient 0, --orient 1, or omitted)
  • OpenGL configuration (Xvfb, software rendering, or none)
  • Available memory (1GB base + 2GB swap tested)

Why it wasn't caught earlier: The slicer container was new infrastructure (GridFlock Part 4). The Dockerfile originally used linuxserver/bambustudio:latest, which worked when first built but broke when the upstream image updated from v01.x to v02.x.

Fix: Downgraded to linuxserver/bambustudio:01.08.03. See BambuStudio Version Decision below.
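
Because the breakage traces back to an unpinned :latest tag, a small check along the following lines could flag it before deploy. This is a minimal sketch, not existing project code: findUnpinnedImages is a hypothetical helper, and it only inspects FROM lines for a missing or latest tag.

```typescript
// Hypothetical helper: returns the image references in a Dockerfile whose
// FROM line has no tag or uses ":latest". Digest-pinned images are accepted.
// Limitation: a reference to an earlier build stage (FROM builder) or to
// "scratch" has no tag either and would be reported as unpinned.
function findUnpinnedImages(dockerfile: string): string[] {
  const unpinned: string[] = [];
  for (const rawLine of dockerfile.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!/^FROM\s/i.test(line)) continue;
    // Drop the FROM keyword and any --flag=value options such as --platform.
    const tokens = line
      .split(/\s+/)
      .slice(1)
      .filter((token) => !token.startsWith("--"));
    const image = tokens[0];
    if (image === undefined) continue;
    if (image.includes("@")) continue; // pinned by digest
    // Only look for the tag after the last "/", so a registry port
    // (localhost:5000/image) is not mistaken for a tag.
    const lastSegment = image.slice(image.lastIndexOf("/") + 1);
    const colon = lastSegment.indexOf(":");
    const tag = colon === -1 ? "" : lastSegment.slice(colon + 1);
    if (tag === "" || tag === "latest") unpinned.push(image);
  }
  return unpinned;
}

const before = findUnpinnedImages("FROM linuxserver/bambustudio:latest\nRUN apt-get update");
const after = findUnpinnedImages("FROM linuxserver/bambustudio:01.08.03\nRUN apt-get update");
console.log(before, after);
```

A CI job could run such a check against deployment/slicer/Dockerfile and fail the build when the returned list is non-empty.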

Root Cause 2: Print Service Body Size Limit

What happened: The sliced 3MF file (172KB binary) is base64-encoded for the JSON-based internal upload API (POST /internal/simplyprint/upload), producing a ~230KB payload. NestJS/Express defaults to a 100KB JSON body limit, causing a PayloadTooLargeError.

Why it wasn't caught earlier: No load/integration test existed for the full pipeline with a real 3MF output. Unit tests used mocked slicer responses.

Fix: Added app.useBodyParser('json', { limit: '50mb' }) to apps/print-service/src/main.ts. The 50MB limit gives ample headroom for large multi-plate grid assemblies.
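
The size arithmetic behind this failure can be checked with a short sketch. It assumes "172KB" means 172 KiB and that the default limit of 100KB is 100 KiB; the JSON envelope around the base64 string adds a little more on top of the figure computed here.

```typescript
// Length of the padded base64 text produced from `byteLength` raw bytes:
// every group of 3 input bytes becomes 4 output characters.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

const KIB = 1024;
const fileBytes = 172 * KIB; // assumption: "172KB" means 172 KiB
const defaultJsonLimit = 100 * KIB; // assumption: default limit is 100 KiB

const encodedBytes = base64Length(fileBytes);
const exceedsDefault = encodedBytes > defaultJsonLimit;

// Prints: 234840 229.3 true
console.log(encodedBytes, (encodedBytes / KIB).toFixed(1), exceedsDefault);
```

The roughly one-third growth from base64 is what turns a 172KB file into a ~230KB request body, more than twice the default limit.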

Root Cause 3: Misleading Error Classification

What happened: The GridflockPipelineService.determineFailedStep() method uses keyword matching on error messages to classify failures. The generic Axios error "Request failed with status code 500" doesn't contain keywords like "slicer" or "upload", so every HTTP 500 — regardless of which service returned it — was classified as stl-generation (the default fallback).

Impact: Made the logs misleading during investigation. The BullMQ event reported failedStep: "stl-generation" when the actual failure was at the slicing or upload step.

Recommended fix: See Recommendations.
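
To illustrate the failure mode, here is a minimal sketch of keyword-based classification. It is not the actual determineFailedStep() implementation: the keyword table and step names are assumptions chosen to mirror the behaviour described above.

```typescript
type PipelineStep = "stl-generation" | "slicing" | "upload" | "mapping";

// Hypothetical keyword table; the real method may use different keywords.
const STEP_KEYWORDS: ReadonlyArray<readonly [string, PipelineStep]> = [
  ["slicer", "slicing"],
  ["upload", "upload"],
  ["mapping", "mapping"],
];

// Returns the step for the first keyword found in the message, otherwise
// falls back to "stl-generation".
function classifyByKeyword(message: string): PipelineStep {
  const lower = message.toLowerCase();
  for (const [keyword, step] of STEP_KEYWORDS) {
    if (lower.includes(keyword)) return step;
  }
  return "stl-generation";
}

// The generic Axios message contains none of the keywords, so it lands on
// the fallback no matter which service actually returned the 500.
console.log(classifyByKeyword("Request failed with status code 500"));
```

The message text carries no information about its origin, so any classifier that looks only at the message cannot recover the failing step.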


BambuStudio Version Decision

Current State

| Property | Value |
| --- | --- |
| Pinned version | linuxserver/bambustudio:01.08.03 (BambuStudio 01.08.03.89) |
| Previous version | linuxserver/bambustudio:latest (auto-resolved to v02.05) |
| Versions tested | 01.08.03, 02.04.00.70, 02.05.00.66, 02.05.00.67 |

Why the Downgrade

BambuStudio v02.04+ introduced a regression in CLI mode that causes a segmentation fault when running headless in a Docker container. The crash is deep inside the slicing engine (after profile loading and orientation, during nozzle volume processing) and is not recoverable through configuration changes. It affects all tested v02.x builds equally.

BambuStudio v01.08.03 handles CLI slicing reliably. It produces non-fatal OpenGL warnings during thumbnail generation, but these do not affect the sliced output:

[error] glfwInit return error, code 65544
[error] Unable to init glew library
[error] init opengl failed! skip thumbnail generating

What Works on v01.08.03

  • All CLI flags used by the pipeline (--orient 1, --arrange 1, --load-settings, --load-filaments, --slice 0, --export-3mf)
  • All bundled BBL printer profiles (A1, A1 mini, P1S — all nozzle sizes)
  • Bundled process profiles (0.20mm Standard, 0.16mm Optimal, etc.)
  • Bundled filament profiles (Bambu PLA Basic, etc.)
  • Binary and ASCII STL input
  • 3MF output generation
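
The flag set above can be assembled as an argument array, which avoids shell-quoting problems with the spaces in the bundled profile file names. This is a sketch under assumptions: SliceRequest and buildSliceArgs are hypothetical names, not the contents of deployment/slicer/api/routes/slice.js. The flag order and the semicolon-joined --load-settings value follow the test command shown under Path to Upgrading.

```typescript
// Hypothetical request shape; field names are illustrative only.
interface SliceRequest {
  machineProfile: string;
  processProfile: string;
  filamentProfile: string;
  outputPath: string;
  inputPath: string;
}

// Builds the argv for the BambuStudio CLI using only the flags listed above.
function buildSliceArgs(req: SliceRequest): string[] {
  return [
    "--orient", "1",
    "--arrange", "1",
    // Machine and process profiles are passed together, separated by ";".
    "--load-settings", `${req.machineProfile};${req.processProfile}`,
    "--load-filaments", req.filamentProfile,
    "--slice", "0",
    "--export-3mf", req.outputPath,
    req.inputPath,
  ];
}

const profiles = "/opt/bambustudio/resources/profiles/BBL";
const args = buildSliceArgs({
  machineProfile: `${profiles}/machine/Bambu Lab A1 0.4 nozzle.json`,
  processProfile: `${profiles}/process/0.20mm Standard @BBL A1.json`,
  filamentProfile: `${profiles}/filament/Bambu PLA Basic @BBL A1.json`,
  outputPath: "/tmp/test.3mf",
  inputPath: "/tmp/cube.stl",
});
console.log(args.join(" "));
```

Passing the array to a process-spawning API that takes arguments separately (rather than a single shell string) keeps each profile path as one argument.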

What Is Missing or Different vs v02.x

| Feature | v01.08.03 | v02.04+ |
| --- | --- | --- |
| CLI slicing | Working | Segfaults |
| Thumbnail in 3MF | Skipped (OpenGL warning) | N/A (segfaults before reaching this) |
| Profile inheritance system | Simpler, file-based | Richer, uses inherits/from fields |
| Newer printer support | Up to ~mid-2025 models | Latest Bambu Lab printers |
| Profile JSON format | Accepts numeric values | Requires all values as strings |
| Xvfb bundled in image | No (must install) | Yes (included in base) |

Impact Assessment

Low impact for current use case. The GridFlock pipeline:

  • Only targets Bambu Lab A1 (well-supported in v01.08.03)
  • Does not need thumbnails (the 3MF goes directly to SimplyPrint)
  • Uses standard PLA settings
  • Can rely on the bundled A1 profiles, which are functionally equivalent to our custom ones

Potential future impact:

  • If Bambu Lab releases new printers with profiles only available in v02.x+
  • If we need features added in v02.x (e.g., tree support improvements, new infill patterns)
  • If the linuxserver/bambustudio:01.08.03 image is removed from Docker Hub

Path to Upgrading

To move back to BambuStudio v02.x when the CLI bug is fixed:

  1. Monitor upstream — Watch the BambuStudio GitHub issues for CLI/headless mode fixes. Search for terms like "segfault CLI", "headless slicing", "batch mode crash".

  2. Test new releases — When a new v02.x release appears, test it with:

    docker run --rm -it linuxserver/bambustudio:<new-tag> bash
    # Create a test cube, run the slice command, check exit code
    /opt/bambustudio/bin/bambu-studio \
      --orient 1 --arrange 1 \
      --load-settings "/opt/bambustudio/resources/profiles/BBL/machine/Bambu Lab A1 0.4 nozzle.json;/opt/bambustudio/resources/profiles/BBL/process/0.20mm Standard @BBL A1.json" \
      --load-filaments "/opt/bambustudio/resources/profiles/BBL/filament/Bambu PLA Basic @BBL A1.json" \
      --slice 0 --export-3mf /tmp/test.3mf /tmp/cube.stl
    echo "Exit code: $?"
    

  3. Profile format migration — v02.x requires all JSON profile values as strings (e.g., "0.20" instead of 0.20, ["20000"] instead of [20000]). Our custom profiles in deployment/slicer/profiles/ have already been converted to string format, so they should be forward-compatible.

  4. Dockerfile changes — When upgrading:

    • Update base image tag in deployment/slicer/Dockerfile
    • xvfb can be removed from apt-get install (bundled in v02.x images)
    • Verify LD_LIBRARY_PATH still points to correct location
    • Test that --orient flag still accepts a value argument

  5. Alternative slicer engines — If BambuStudio CLI remains broken in v02.x for an extended period, consider:

    • PrusaSlicer CLI — Mature, stable headless mode, well-documented CLI
    • OrcaSlicer CLI — Fork of BambuStudio with active community, may have CLI fixes
    • CuraEngine — Ultimaker's standalone slicer engine, designed for headless use
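
The string-format requirement from step 3 can be sketched as a recursive conversion over a parsed profile. This is illustrative only: stringifyNumbers is a hypothetical helper, not the script used for the profiles in deployment/slicer/profiles/, and the sample keys are made up. Note that String(0.20) yields "0.2", so trailing zeros are not preserved; whether BambuStudio cares about that is not established in this report.

```typescript
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue };

// Recursively replaces every JSON number with its string form; strings,
// booleans and null are returned unchanged.
function stringifyNumbers(value: JsonValue): JsonValue {
  if (typeof value === "number") return String(value);
  if (Array.isArray(value)) return value.map(stringifyNumbers);
  if (value !== null && typeof value === "object") {
    const result: { [key: string]: JsonValue } = {};
    for (const key of Object.keys(value)) {
      const child = value[key];
      if (child !== undefined) result[key] = stringifyNumbers(child);
    }
    return result;
  }
  return value;
}

// Sample keys are placeholders, not real BambuStudio profile fields.
const converted = stringifyNumbers({ height: 0.2, speeds: [20000], name: "sample" });
console.log(JSON.stringify(converted));
```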

Post-Fix Issues Discovered

After the pipeline completed successfully, three additional issues were identified during manual verification:

SimplyPrint File Not Visible in UI

The 3MF file was uploaded successfully via the SimplyPrint API Files endpoint (https://files.simplyprint.io/{companyId}/files/Upload), which returned a file ID. However, API Files are stored separately from user-uploaded files and may not appear in the "Your files" section of the SimplyPrint web UI.

The pipeline logs confirm the upload succeeded:

File uploaded to SimplyPrint: 7c529074... (GF-100x100-IP-NOMAG_plate1.3mf)

Investigation needed: Verify whether API Files appear in SimplyPrint's file browser, the print queue, or only when referenced by file ID via the API. This determines whether operators can visually confirm uploads and whether printers can access the files.

3MF Output Contains Embedded Gcode

The slicer pipeline uses --slice 0 --export-3mf, which produces a "sliced 3MF" — a 3MF project file with embedded gcode. This is the native format Bambu Lab printers expect. The file contains:

  • The original 3D model geometry
  • Print settings (machine, process, filament profiles)
  • Pre-computed gcode (toolpath data)
  • Plate configuration

SimplyPrint may handle this differently than a raw .gcode file. Bambu Lab printers connected to SimplyPrint should be able to process sliced 3MF files natively, but this needs verification with an actual print test.

Print Job Not Created (Line Item ID Mismatch)

The gridflock.mapping-ready event was published with lineItemId: "li-1032" (from the manual retry), but the actual database line item ID is 7be1cbcb-9c2a-4d23-9cbd-41595a677914. The OrchestrationService.handleGridflockMappingReady() handler could not find the line item and bailed out:

Line item li-1032 not found for order 5ed925ad-deb2-45d1-ac1c-f56904f18089

Fix: Re-triggered the pipeline with the correct line item UUID. The handler then successfully created 1 print job:

GridFlock mapping ready: created 1 print jobs for SKU GF-100x100-IP-NOMAG

Root cause for the original automated flow: The Order Service's OrchestrationService passes tenantId: 'default' (line 561 in orchestration.service.ts) when triggering the GridFlock pipeline, rather than the actual tenant UUID. This caused the original pipeline to fail at the mapping creation step (FK constraint) even if slicing had succeeded. The tenantId in handleGridflockLineItem should use the order's actual tenant UUID.
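
One way to surface this class of mistake earlier is to reject non-UUID identifiers before the pipeline is triggered. The sketch below is illustrative only: isUuid and assertUuid are hypothetical helpers, not existing code in the Order Service. The pattern checks only the 8-4-4-4-12 hex shape, so the tenant ID 00000000-0000-0000-0000-000000000001 passes.

```typescript
// Matches the 8-4-4-4-12 hexadecimal layout without enforcing UUID
// version or variant bits.
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function isUuid(value: string): boolean {
  return UUID_PATTERN.test(value);
}

// Hypothetical guard: returns the value when it looks like a UUID and
// throws a descriptive error otherwise.
function assertUuid(field: string, value: string): string {
  if (!isUuid(value)) {
    throw new Error(`${field} must be a UUID, got "${value}"`);
  }
  return value;
}

console.log(isUuid("00000000-0000-0000-0000-000000000001")); // true
console.log(isUuid("default")); // false
console.log(isUuid("forma3d")); // false
console.log(isUuid("li-1032")); // false
```

With such a guard in place, the placeholder values seen in this incident would fail fast with a readable message instead of surfacing later as a foreign key violation or a missing line item.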


Files Changed

| File | Change | Deployed |
| --- | --- | --- |
| deployment/slicer/Dockerfile | Base image 01.08.03, added xvfb | Yes (rebuilt on server) |
| deployment/slicer/api/routes/slice.js | --orient 1 flag, profile mapping, error logging | Yes (rebuilt on server) |
| deployment/slicer/profiles/bambu-a1/*.json | Numeric values converted to strings | Yes (rebuilt on server) |
| apps/print-service/src/main.ts | Added 50mb JSON body parser limit | Hotfixed in container; source updated |
| apps/print-service/src/simplyprint/simplyprint-api.client.ts | Added folder management (ensure/create), API file deletion, auto-promotion to Forma3D.Connect/grids/ folder | Pending deploy |
| apps/print-service/src/internal/internal.controller.ts | Upload flow: folder path support, auto-delete API file after promotion | Pending deploy |
| libs/service-common/src/lib/service-client/print-service.client.ts | uploadFileToSimplyPrint accepts optional folderPath param | Pending deploy |
| apps/gridflock-service/src/gridflock/gridflock-pipeline.service.ts | Pipeline uploads to ["Forma3D.Connect", "grids"] folder | Pending deploy |

Recommendations

Immediate (before next order)

  1. Fix tenantId: 'default' in OrchestrationService. FIXED: The handleGridflockLineItem method now receives and uses the real tenant UUID from the order DTO.

  2. Rebuild and push print service image — The body parser fix was hotfixed inside the running container. A proper image rebuild and push to the registry is needed so the fix survives container restarts.

  3. Verify SimplyPrint API Files visibility — Confirm that files uploaded via the API Files endpoint are accessible by printers via SimplyPrint. Test by queuing a print from the uploaded file ID.

  4. Add integration test for full pipeline — A test that exercises STL generation, slicing, and upload end-to-end (even with mocked SimplyPrint) would have caught both the slicer crash and the body size limit.

Short-term

  1. Fix determineFailedStep classification — Wrap each pipeline step's HTTP calls in step-specific error handling so the failure event accurately reports which step failed. Example approach:

    try {
      gcodeResult = await this.slicerClient.slice(/* ... */);
    } catch (error) {
      throw new PipelineStepError('slicing', error);
    }

  2. Pin Docker image tags in CI — Add a lint rule or CI check that deployment/slicer/Dockerfile never uses :latest. Unpinned tags caused the original breakage when upstream auto-upgraded.

  3. Add slicer health check to pipeline pre-flight — Before starting the pipeline, verify the slicer is healthy and can accept requests. This gives a clearer error message than a generic HTTP 500.
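
The PipelineStepError used in the first short-term recommendation does not exist yet. A minimal sketch of what it could look like follows; the step names, the runStep wrapper and failedStepOf are assumptions for illustration, not existing code in GridflockPipelineService.

```typescript
type PipelineStep = "stl-generation" | "slicing" | "upload" | "mapping";

// Carries the failing step explicitly, so classification no longer depends
// on keywords in the error message.
class PipelineStepError extends Error {
  readonly step: PipelineStep;
  readonly original: unknown;

  constructor(step: PipelineStep, original: unknown) {
    const detail = original instanceof Error ? original.message : String(original);
    super(`Pipeline step "${step}" failed: ${detail}`);
    this.name = "PipelineStepError";
    this.step = step;
    this.original = original;
    // Keeps instanceof working when compiled to older JavaScript targets.
    Object.setPrototypeOf(this, PipelineStepError.prototype);
  }
}

// Hypothetical wrapper: runs one step and tags any failure with its name.
async function runStep<T>(step: PipelineStep, action: () => Promise<T>): Promise<T> {
  try {
    return await action();
  } catch (error) {
    throw new PipelineStepError(step, error);
  }
}

// Reads the step back when building the failure event.
function failedStepOf(error: unknown): PipelineStep | "unknown" {
  return error instanceof PipelineStepError ? error.step : "unknown";
}

const wrapped = new PipelineStepError("slicing", new Error("Request failed with status code 500"));
console.log(failedStepOf(wrapped), wrapped.message);
```

With each step wrapped this way (for example `await runStep("slicing", () => ...)`), the generic Axios message is preserved as detail while the failedStep field comes from the wrapper rather than from string matching.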

Long-term

  1. Monitor BambuStudio upstream releases — Track when the CLI segfault is fixed in v02.x and plan the upgrade (see Path to Upgrading).

  2. Consider multipart upload for large files — The current base64-over-JSON approach for the SimplyPrint upload inflates the payload by roughly a third (the 172KB file became a ~230KB request body). Switching to multipart/form-data would remove that encoding overhead and take file uploads out of reach of the JSON body size limit.

  3. Add pipeline retry mechanism — The current pipeline has no built-in retry logic. A failed pipeline requires manual re-triggering via the internal API. Consider adding BullMQ-based automatic retries with exponential backoff for transient failures.
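
BullMQ supports this through the attempts and backoff job options. The underlying idea, independent of BullMQ, can be sketched as below. The helper names are hypothetical, and the delay formula (base delay doubled on each further attempt) is one common choice rather than a statement about the pipeline's eventual configuration.

```typescript
// Delay before retrying after the given failed attempt (1-based):
// base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * Math.pow(2, attempt - 1);
}

// Retries `operation` up to `maxAttempts` times in total. Errors that
// `isTransient` rejects are rethrown immediately, so permanent failures
// (such as a constraint violation) are not retried.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts: number,
  baseDelayMs: number,
  isTransient: (error: unknown) => boolean,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= maxAttempts || !isTransient(error)) throw error;
      await sleep(backoffDelayMs(attempt, baseDelayMs));
    }
  }
}

const delays = [1, 2, 3].map((attempt) => backoffDelayMs(attempt, 1000));
console.log(delays); // [1000, 2000, 4000]
```

None of the three failures in this incident were transient, so retries alone would not have helped here. The isTransient predicate is what keeps a retry policy from repeatedly hammering a deterministic failure.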