2024-present TypeScript FFmpeg ONNX Runtime OpenCV Protobuf

renderbox-engine

Composable type-safe TypeScript DSL that compiles to FFmpeg, ONNX Runtime, and OpenCV

Overview

renderbox-engine is a composable, type-safe DSL library for video, audio, and AI pipelines. Hand-written FFmpeg commands grow quickly: a simple slideshow with Ken Burns effects and crossfade transitions produces a 40-line command. With renderbox-engine you describe the result you want in TypeScript and the compiler works out how to execute it. The same composable syntax extends to multi-runtime pipelines: a face-redaction pipeline that decodes with FFmpeg, runs ONNX inference for detection, and applies OpenCV blur composes identically to a pure-FFmpeg pipeline. The compiler handles stream labeling, DAG deduplication, crossfade offset arithmetic, parameter escaping, filter graph wiring, runtime partitioning, and boundary marshalling automatically. The library is published on npm as renderbox-engine.

Architecture

Every pipeline is represented as a typed abstract syntax tree (AST): pure data, not imperative code. Pipelines are JSON-serializable, inspectable, and optimizable before any process is spawned. A shared set of interpreters traverses the AST, each producing a different output: FFmpeg CLI arguments, optimization rewrites, validation errors, visual graphs, content hashes, or cost estimates. For multi-runtime pipelines, the compiler tags each node with its target runtime, partitions the graph at runtime boundaries, inserts data-format conversion nodes, and emits a serializable execution plan of ordered segments connected by pipe edges. The plan is then dispatched to per-runtime handlers, so adding a new runtime means registering a handler, not modifying the compiler.
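
To make the "one tree, many interpreters" idea concrete, here is a minimal sketch. The node shapes (VideoNode) and the two interpreter functions (toFilterChain, validate) are hypothetical and much simpler than the library's real AST, which is not shown on this page. Only the FFmpeg filter names scale and boxblur are real.

```typescript
// Hypothetical, simplified AST for illustration; not the renderbox-engine types.
type VideoNode =
  | { kind: "input"; path: string }
  | { kind: "scale"; width: number; height: number; source: VideoNode }
  | { kind: "blur"; radius: number; source: VideoNode };

// Interpreter 1: walk the tree and emit FFmpeg-style filter strings in order.
function toFilterChain(node: VideoNode): string[] {
  switch (node.kind) {
    case "input":
      return [];
    case "scale":
      return [...toFilterChain(node.source), `scale=${node.width}:${node.height}`];
    case "blur":
      return [...toFilterChain(node.source), `boxblur=${node.radius}`];
  }
}

// Interpreter 2: walk the same tree and collect validation errors instead.
function validate(node: VideoNode): string[] {
  switch (node.kind) {
    case "input":
      return node.path === "" ? ["input path is empty"] : [];
    case "scale":
      return [
        ...validate(node.source),
        ...(node.width > 0 && node.height > 0 ? [] : ["scale dimensions must be positive"]),
      ];
    case "blur":
      return [
        ...validate(node.source),
        ...(node.radius > 0 ? [] : ["blur radius must be positive"]),
      ];
  }
}

// The pipeline is plain data; nothing runs until an interpreter is applied.
const tree: VideoNode = {
  kind: "blur",
  radius: 10,
  source: { kind: "scale", width: 1280, height: 720, source: { kind: "input", path: "in.mp4" } },
};

console.log(toFilterChain(tree).join(",")); // scale=1280:720,boxblur=10
console.log(validate(tree));                // []
```

Because both functions only read the tree, adding a third interpreter (for example a cost estimator) would not require touching the first two.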

Key Concepts

Pipelines as Data

Every pipeline is a typed syntax tree — not a sequence of side effects. You can build a sub-pipeline, pass it to a function, optimize it, serialize it, visualize it, or diff it against another pipeline. The tree is the single source of truth; execution is a final interpretation step that leaves the tree unchanged.
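
As a rough illustration of what treating a pipeline as data allows, the sketch below round-trips a tree through JSON and deduplicates structurally identical sub-pipelines. The PipeNode shape and the helpers (src, scale, overlay, keyOf, uniqueNodes) are assumptions made for this example and are not the library's API.

```typescript
// Hypothetical node shape for illustration only.
type PipeNode = {
  op: string;
  args: Record<string, number | string>;
  inputs: PipeNode[];
};

const src = (path: string): PipeNode => ({ op: "input", args: { path }, inputs: [] });
const scale = (w: number, h: number) => (n: PipeNode): PipeNode =>
  ({ op: "scale", args: { w, h }, inputs: [n] });
const overlay = (a: PipeNode, b: PipeNode): PipeNode =>
  ({ op: "overlay", args: {}, inputs: [a, b] });

// Structural key. Property order is fixed by the constructors above,
// so JSON.stringify is stable enough for this sketch.
const keyOf = (n: PipeNode): string => JSON.stringify(n);

// Collect each structurally distinct sub-tree once (a simple form of DAG deduplication).
function uniqueNodes(
  root: PipeNode,
  seen: Map<string, PipeNode> = new Map(),
): Map<string, PipeNode> {
  const k = keyOf(root);
  if (!seen.has(k)) {
    seen.set(k, root);
    for (const child of root.inputs) uniqueNodes(child, seen);
  }
  return seen;
}

// The same sub-pipeline is built twice, independently.
const composite = overlay(scale(640, 360)(src("a.mp4")), scale(640, 360)(src("a.mp4")));

// Serialize and restore without losing structure.
const restored: PipeNode = JSON.parse(JSON.stringify(composite));

console.log(keyOf(restored) === keyOf(composite)); // true
console.log(uniqueNodes(composite).size);          // 3 (overlay, scale, input)
```

The tree holds five nodes but only three distinct ones, which is the kind of sharing a compiler can exploit before spawning any process.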

Multi-Runtime Compilation

A single pipeline can span FFmpeg (media processing), ONNX Runtime (ML inference), and OpenCV (computer vision). The compiler automatically detects runtime boundaries, partitions the graph into segments, inserts marshalling for data format conversion between runtimes, and emits an execution plan that an executor can run as coordinated processes wired by pipes.
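
The partitioning step can be sketched for the simplest case, a linear chain of steps that are already tagged with a runtime. Everything here is an assumption for illustration: the type names (Step, PlanSegment, PipeEdge, ExecutionPlan), the partition function, and the "raw-frames" format label are placeholders, and the real compiler works on a graph, not a list.

```typescript
// Hypothetical types; the real execution-plan schema is not shown on this page.
type RuntimeId = "ffmpeg" | "onnx" | "opencv";

interface Step { op: string; runtime: RuntimeId }
interface PlanSegment { runtime: RuntimeId; ops: string[] }
interface PipeEdge { from: number; to: number; format: string }
interface ExecutionPlan { segments: PlanSegment[]; pipes: PipeEdge[] }

// Group consecutive steps that share a runtime into one segment, and
// record a pipe edge wherever the runtime changes.
function partition(steps: Step[]): ExecutionPlan {
  const segments: PlanSegment[] = [];
  const pipes: PipeEdge[] = [];
  for (const step of steps) {
    const last = segments[segments.length - 1];
    if (last !== undefined && last.runtime === step.runtime) {
      last.ops.push(step.op);
      continue;
    }
    if (last !== undefined) {
      // Runtime boundary: data must be marshalled between processes.
      // "raw-frames" is a placeholder label, not a real format name.
      pipes.push({ from: segments.length - 1, to: segments.length, format: "raw-frames" });
    }
    segments.push({ runtime: step.runtime, ops: [step.op] });
  }
  return { segments, pipes };
}

const redactionSteps: Step[] = [
  { op: "decode", runtime: "ffmpeg" },
  { op: "scale", runtime: "ffmpeg" },
  { op: "detect-faces", runtime: "onnx" },
  { op: "blur-regions", runtime: "opencv" },
  { op: "encode", runtime: "ffmpeg" },
];

const plan = partition(redactionSteps);
console.log(plan.segments.map((s) => `${s.runtime}[${s.ops.join(",")}]`).join(" | "));
// ffmpeg[decode,scale] | onnx[detect-faces] | opencv[blur-regions] | ffmpeg[encode]
console.log(plan.pipes.length); // 3
```

The resulting plan is plain JSON-serializable data, so an executor in another process or language could consume it.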

Code Highlights

Slideshow with Ken Burns, crossfades, and background music

const slides = [
  kenBurns(gentleZoomIn, frames, w, h)(staticImage("exterior.jpg", 4)),
  kenBurns(zoomIn, frames, w, h)(staticImage("living-room.jpg", 4)),
  kenBurns(panRight, frames, w, h)(staticImage("garden.jpg", 4)),
];

const { video, totalDuration } = slideshow(slides, 1, {
  segmentDuration: 4,
  transitions: ["fade", "dissolve"],
});

const audio = musicBed({ totalDuration })(input("background-music.mp3"));
const graph = output(video, audio, "output.mp4", {
  codec: "libx264", crf: 20, pixelFormat: "yuv420p", faststart: true,
});
await run(optimize(graph));
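
The crossfade offset arithmetic the compiler takes over can be sketched as follows. This assumes FFmpeg xfade semantics, where each transition's offset is measured from the start of the accumulated first input, and a single uniform transition duration. The function name crossfadeOffsets is hypothetical, and reading the second argument of slideshow as a 1-second transition is an assumption.

```typescript
// Hypothetical helper: compute xfade offsets for a chain of clips.
// Each transition overlaps the tail of the accumulated output with the
// head of the next clip, so every transition shortens the total by `transition`.
function crossfadeOffsets(
  durations: number[],
  transition: number,
): { offsets: number[]; totalDuration: number } {
  const offsets: number[] = [];
  let elapsed = 0; // duration of the output accumulated so far
  durations.forEach((duration, index) => {
    if (index === 0) {
      elapsed = duration;
      return;
    }
    offsets.push(elapsed - transition); // transition starts this far into the output
    elapsed += duration - transition;   // the overlap is counted only once
  });
  return { offsets, totalDuration: elapsed };
}

// Three 4-second slides with 1-second crossfades.
const timing = crossfadeOffsets([4, 4, 4], 1);
console.log(timing.offsets);       // [ 3, 6 ]
console.log(timing.totalDuration); // 10
```

Getting these numbers wrong by hand is a common source of truncated or misaligned slideshows, and the total is also what a music bed needs to match.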

Face redaction — three runtimes, one syntax

const pipeline = detection.detect("yolov8n-face")(input("surveillance.mp4"));
const redacted = detection.redact("blur", { radius: 30 })(pipeline);
const graph = output(redacted, null, "redacted.mp4", { codec: "libx264" });

// Compiles to: FFmpeg decode → ONNX detect → OpenCV blur → FFmpeg encode
// Runtime boundaries, marshalling, and pipe wiring are automatic

Template builders for common AI pipelines

import {
  inputVideo, faceRedact, backgroundBlur, TEMPLATES
} from "renderbox-engine";

const video = inputVideo("interview.mp4");

// Direct function call
const graph = faceRedact(video, { method: "blur", confidence: 0.8 });

// Name-based lookup for dynamic dispatch
const builder = TEMPLATES["background-blur"];
const graph2 = builder(video, { radius: 20 });
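
Name-based lookup of this kind is typically a registry that maps string keys to builder functions. The sketch below shows one way such a registry could be typed and guarded against unknown names. The Clip and TemplateOptions types, templateRegistry, and buildFromName are all hypothetical and do not reflect how TEMPLATES is implemented.

```typescript
// Hypothetical stand-ins for the library's pipeline and option types.
interface Clip { source: string; effects: string[] }
interface TemplateOptions { method?: string; radius?: number }
type TemplateBuilder = (clip: Clip, options: TemplateOptions) => Clip;

// A Map avoids accidental hits on inherited object keys such as "toString".
const templateRegistry = new Map<string, TemplateBuilder>([
  ["face-redact", (clip, o) => ({
    ...clip,
    effects: [...clip.effects, `redact:${o.method ?? "blur"}`],
  })],
  ["background-blur", (clip, o) => ({
    ...clip,
    effects: [...clip.effects, `blur-background:${o.radius ?? 10}`],
  })],
]);

// Resolve a builder by name at run time, failing loudly on unknown names.
function buildFromName(templateName: string, clip: Clip, options: TemplateOptions): Clip {
  const builder = templateRegistry.get(templateName);
  if (builder === undefined) {
    throw new Error(`Unknown template: ${templateName}`);
  }
  return builder(clip, options);
}

const interview: Clip = { source: "interview.mp4", effects: [] };
const blurred = buildFromName("background-blur", interview, { radius: 20 });
console.log(blurred.effects); // [ 'blur-background:20' ]
```

Builders return a new value and leave the input clip untouched, which keeps the dynamic path consistent with the pipelines-as-data approach.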

Highlights

21,500+ lines of TypeScript across 179 source files with 183 test files — composable DSL that compiles high-level pipeline descriptions into multi-runtime execution plans
88+ typed video filters, 50+ typed audio filters, and 9 AI domains (including detection, transcription, depth estimation, pose, OCR, scene classification, and segmentation) with compile-time stream type safety
Multi-runtime compiler that automatically partitions a single pipeline graph across FFmpeg, ONNX Runtime, and OpenCV — inserting boundary marshalling and emitting a serializable execution plan
12+ pluggable interpreters over a shared typed AST, including compile, optimize, validate, visualize (DOT/Mermaid), inspect, hash, cost-estimate, and snapshot, each independently testable
Binary wire format shared with a companion Rust executor for cross-language plan serialization, with golden fixture regression testing across both codebases

Related Projects

PromiseKit

Production Promise library built from algebraic first principles, shipped to ~50K DAU across multiple iOS apps

Swift GCD NSLock PovioKit

Designed and built a production Promise library from algebraic first principles: map derives from flatMap, reduce from fold, and operators match Haskell's typeclass precedence hierarchy

fluent-html

Zero-dependency, type-safe HTML builder for TypeScript with compile-time Tailwind CSS and HTMX safety

TypeScript Tailwind CSS HTMX Fastify

Designed a mixin architecture using TypeScript declaration merging + prototype assignment that splits a 1,100-line Tag class into focused modules with zero API changes

Redis Server (Zig)

From-scratch Redis-compatible server in Zig with RESP parsing, RDB persistence, and master-replica replication

Zig 0.14 TCP RESP2/RESP3 RDB

Implemented a complete Redis-compatible server from scratch in 915 lines of zero-dependency Zig with RESP protocol parser, key-value store with TTL, and RDB binary format codec
