2024-present · TypeScript · FFmpeg · ONNX Runtime · OpenCV · Protobuf

renderbox-engine

Composable type-safe TypeScript DSL that compiles to FFmpeg, ONNX Runtime, and OpenCV

Overview

renderbox-engine is a composable, type-safe DSL library for video, audio, and AI pipelines. Instead of hand-writing FFmpeg commands — where a simple slideshow with Ken Burns effects and crossfade transitions produces a 40-line command — you describe what you want in TypeScript and the compiler figures out how to execute it. The same composable syntax extends to multi-runtime pipelines: a face-redaction pipeline that decodes with FFmpeg, runs ONNX inference for detection, and applies OpenCV blur composes identically to a pure-FFmpeg pipeline. The compiler handles stream labeling, DAG deduplication, crossfade offset arithmetic, parameter escaping, filter graph wiring, runtime partitioning, and boundary marshalling automatically. Published on npm as renderbox-engine.
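To make one item in that list concrete, here is a minimal sketch of crossfade offset arithmetic. It relies on one fact about FFmpeg: the `xfade` filter's `offset` option is measured from the start of its first input, so when transitions are chained, each later offset depends on the length of everything joined so far. The names `crossfadeOffsets` and `CrossfadePlan` are hypothetical and are not taken from renderbox-engine; the library's own implementation may differ.

```typescript
// Hypothetical helper for illustration; not renderbox-engine's actual API.
interface CrossfadePlan {
  offsets: number[];      // one xfade offset (seconds) per transition
  totalDuration: number;  // length of the fully joined stream (seconds)
}

function crossfadeOffsets(durations: number[], transition: number): CrossfadePlan {
  const [first, ...rest] = durations;
  if (first === undefined) return { offsets: [], totalDuration: 0 };

  const offsets: number[] = [];
  let running = first; // length of the stream joined so far
  for (const next of rest) {
    if (transition > Math.min(running, next)) {
      throw new RangeError("transition is longer than one of the segments");
    }
    // The fade begins `transition` seconds before the joined stream ends.
    const offset = running - transition;
    offsets.push(offset);
    // The overlap is shared, so the stream grows by `next - transition`.
    running = offset + next;
  }
  return { offsets, totalDuration: running };
}

// Three 4-second slides joined by 1-second transitions.
const fadePlan = crossfadeOffsets([4, 4, 4], 1);
console.log(fadePlan.offsets);       // [3, 6]
console.log(fadePlan.totalDuration); // 10
```

Getting the second offset wrong (7 instead of 6, say) is the kind of slip that is easy to make by hand and that a compiler can rule out.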

Architecture

[Figure: renderbox-engine architecture diagram]

Every pipeline is represented as a typed abstract syntax tree: pure data, not imperative code. Pipelines are JSON-serializable, inspectable, and optimizable before any process is spawned. A set of interpreters each traverses the shared AST to produce a different output: FFmpeg CLI arguments, optimization rewrites, validation errors, visual graphs, content hashes, or cost estimates. For multi-runtime pipelines, the compiler tags each node with its target runtime, partitions the graph at runtime boundaries, inserts data-format conversion nodes, and emits a serializable execution plan of ordered segments connected by pipe edges. The plan is then dispatched to per-runtime handlers, so adding a new runtime means registering a handler, not modifying the compiler.
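The "one tree, many interpreters" idea can be illustrated with a toy AST. The sketch below is an assumption-laden miniature: the `PipelineNode` shape and the `toFilterChain` and `validate` functions are invented for illustration and do not reflect renderbox-engine's real node types or interpreter signatures.

```typescript
// Toy AST; node shapes are illustrative assumptions, not the library's types.
type PipelineNode =
  | { kind: "input"; path: string }
  | { kind: "filter"; name: string; params: Record<string, number>; source: PipelineNode };

// Interpreter 1: render a linear chain as an FFmpeg-style filter string.
function toFilterChain(node: PipelineNode): string {
  if (node.kind === "input") return "";
  const args = Object.entries(node.params)
    .map(([key, value]) => `${key}=${value}`)
    .join(":");
  const self = args ? `${node.name}=${args}` : node.name;
  const upstream = toFilterChain(node.source);
  return upstream ? `${upstream},${self}` : self;
}

// Interpreter 2: collect validation errors without executing anything.
function validate(node: PipelineNode): string[] {
  if (node.kind === "input") return node.path ? [] : ["input path is empty"];
  const errors = validate(node.source);
  for (const [key, value] of Object.entries(node.params)) {
    if (!Number.isFinite(value)) errors.push(`${node.name}.${key} is not a finite number`);
  }
  return errors;
}

const tree: PipelineNode = {
  kind: "filter",
  name: "fps",
  params: { fps: 30 },
  source: {
    kind: "filter",
    name: "scale",
    params: { w: 1280, h: 720 },
    source: { kind: "input", path: "in.mp4" },
  },
};

console.log(toFilterChain(tree)); // scale=w=1280:h=720,fps=fps=30
console.log(validate(tree));      // []
```

Both functions read the same data and neither knows about the other, which is what lets each interpreter be tested in isolation.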

Key Concepts

Pipelines as Data

Every pipeline is a typed syntax tree — not a sequence of side effects. You can build a sub-pipeline, pass it to a function, optimize it, serialize it, visualize it, or diff it against another pipeline. The tree is the single source of truth; execution is a final interpretation step that leaves the tree unchanged.
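A small sketch of what "pipelines as data" buys in practice, using a made-up `Step` node shape rather than the library's real types: a sub-pipeline is an ordinary value, a transformation is an ordinary function, and the tree survives a JSON round trip unchanged.

```typescript
// Illustrative node shape (an assumption for this sketch only).
type Step = { op: string; args: Record<string, number | string>; from?: Step };

// A reusable transformation is just a function from tree to tree.
const withWatermark = (src: Step): Step => ({
  op: "drawtext",
  args: { text: "DRAFT" },
  from: src,
});

// A read-only interpreter: lists operations from source to sink.
function describe(step: Step): string[] {
  return step.from ? [...describe(step.from), step.op] : [step.op];
}

const base: Step = {
  op: "scale",
  args: { w: 640, h: 360 },
  from: { op: "input", args: { path: "clip.mp4" } },
};
const draft = withWatermark(base);

// Serialize, store or send, then restore: the structure is preserved.
const restored: Step = JSON.parse(JSON.stringify(draft));

console.log(describe(restored));                                  // ["input", "scale", "drawtext"]
console.log(JSON.stringify(restored) === JSON.stringify(draft));  // true
console.log(describe(base));                                      // ["input", "scale"] (base is untouched)
```

Because `withWatermark` builds a new node instead of mutating its argument, `base` can still be reused in other pipelines.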

Multi-Runtime Compilation

A single pipeline can span FFmpeg (media processing), ONNX Runtime (ML inference), and OpenCV (computer vision). The compiler automatically detects runtime boundaries, partitions the graph into segments, inserts marshalling for data format conversion between runtimes, and emits an execution plan that an executor can run as coordinated processes wired by pipes.
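The partitioning step can be sketched for the simplest case, a linear chain of stages. Everything here is a hypothetical simplification: the `Stage`, `Segment`, and `ExecutionPlan` types and the `partition` function are invented for illustration, and the real compiler works on a graph rather than a list.

```typescript
// Hypothetical types for illustration; not renderbox-engine's plan format.
type Runtime = "ffmpeg" | "onnx" | "opencv";
interface Stage { id: string; runtime: Runtime }
interface Segment { runtime: Runtime; stages: string[] }
interface PipeEdge { from: number; to: number; marshal: string }
interface ExecutionPlan { segments: Segment[]; pipes: PipeEdge[] }

// Group consecutive same-runtime stages into segments and record a
// marshalling pipe wherever the runtime changes.
function partition(chain: Stage[]): ExecutionPlan {
  const segments: Segment[] = [];
  const pipes: PipeEdge[] = [];
  for (const stage of chain) {
    const last = segments[segments.length - 1];
    if (last && last.runtime === stage.runtime) {
      last.stages.push(stage.id);
      continue;
    }
    if (last) {
      pipes.push({
        from: segments.length - 1,
        to: segments.length, // index the new segment is about to receive
        marshal: `${last.runtime}->${stage.runtime}`,
      });
    }
    segments.push({ runtime: stage.runtime, stages: [stage.id] });
  }
  return { segments, pipes };
}

const executionPlan = partition([
  { id: "decode", runtime: "ffmpeg" },
  { id: "scale", runtime: "ffmpeg" },
  { id: "detect-faces", runtime: "onnx" },
  { id: "blur-regions", runtime: "opencv" },
  { id: "encode", runtime: "ffmpeg" },
]);

// Four segments (ffmpeg, onnx, opencv, ffmpeg) joined by three pipes.
console.log(JSON.stringify(executionPlan, null, 2));
```

The result is plain data, so it can be serialized and handed to an executor in another process or language.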

Code Highlights

Slideshow with Ken Burns, crossfades, and background music
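// Export names are inferred from how this snippet uses them (assumption).
import {
  kenBurns, gentleZoomIn, zoomIn, panRight, staticImage,
  slideshow, musicBed, input, output, optimize, run,
} from "renderbox-engine";

// Illustrative values (assumptions): 1080p output, 4 s per slide at 30 fps.
const w = 1920, h = 1080;
const frames = 4 * 30;

const slides = [
  kenBurns(gentleZoomIn, frames, w, h)(staticImage("exterior.jpg", 4)),
  kenBurns(zoomIn, frames, w, h)(staticImage("living-room.jpg", 4)),
  kenBurns(panRight, frames, w, h)(staticImage("garden.jpg", 4)),
];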

const { video, totalDuration } = slideshow(slides, 1, {
  segmentDuration: 4,
  transitions: ["fade", "dissolve"],
});

const audio = musicBed({ totalDuration })(input("background-music.mp3"));
const graph = output(video, audio, "output.mp4", {
  codec: "libx264", crf: 20, pixelFormat: "yuv420p", faststart: true,
});
await run(optimize(graph));
Face redaction — three runtimes, one syntax
const pipeline = detection.detect("yolov8n-face")(input("surveillance.mp4"));
const redacted = detection.redact("blur", { radius: 30 })(pipeline);
const graph = output(redacted, null, "redacted.mp4", { codec: "libx264" });

// Compiles to: FFmpeg decode → ONNX detect → OpenCV blur → FFmpeg encode
// Runtime boundaries, marshalling, and pipe wiring are automatic
Template builders for common AI pipelines
import {
  inputVideo, faceRedact, backgroundBlur, TEMPLATES
} from "renderbox-engine";

const video = inputVideo("interview.mp4");

// Direct function call
const graph = faceRedact(video, { method: "blur", confidence: 0.8 });

// Name-based lookup for dynamic dispatch
const builder = TEMPLATES["background-blur"];
const graph2 = builder(video, { radius: 20 });
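When the template name arrives at runtime (from a request or a config file, for example), the lookup can miss. The following sketch shows one way to guard a name-keyed registry. The `registry` map, the `Builder` and `GraphStub` types, and `buildByName` are all hypothetical; how renderbox-engine's `TEMPLATES` handles unknown names is not stated here.

```typescript
// Hypothetical stand-ins; the real TEMPLATES object may be typed differently.
type TemplateOptions = Record<string, unknown>;
interface GraphStub { template: string; source: string; options: TemplateOptions }
type Builder = (source: string, options?: TemplateOptions) => GraphStub;

const registry = new Map<string, Builder>([
  ["face-redact", (source, options = {}) => ({ template: "face-redact", source, options })],
  ["background-blur", (source, options = {}) => ({ template: "background-blur", source, options })],
]);

// Resolve a builder by name and fail loudly on a typo instead of
// calling `undefined` as a function.
function buildByName(name: string, source: string, options?: TemplateOptions): GraphStub {
  const builder = registry.get(name);
  if (!builder) {
    const known = Array.from(registry.keys()).join(", ");
    throw new Error(`Unknown template "${name}". Known templates: ${known}`);
  }
  return builder(source, options);
}

console.log(buildByName("background-blur", "interview.mp4", { radius: 20 }).template); // background-blur
```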

Highlights

  • 21,500+ lines of TypeScript across 179 source files with 183 test files — composable DSL that compiles high-level pipeline descriptions into multi-runtime execution plans
  • 88+ typed video filters, 50+ typed audio filters, and 9 AI domains (including detection, transcription, depth estimation, pose, OCR, scene classification, and segmentation) with compile-time stream type safety
  • Multi-runtime compiler that automatically partitions a single pipeline graph across FFmpeg, ONNX Runtime, and OpenCV — inserting boundary marshalling and emitting a serializable execution plan
  • 12+ pluggable interpreters over a shared typed AST: compile, optimize, validate, visualize (DOT/Mermaid), inspect, hash, cost-estimate, snapshot — each independently testable
  • Binary wire format shared with a companion Rust executor for cross-language plan serialization, with golden fixture regression testing across both codebases