SML-to-Racket
Source-to-source compiler with lambda calculus IR and Church-encoded algebraic datatypes
Overview
A source-to-source compiler written in Swift that translates Standard ML programs into executable Racket code, with an optional lambda calculus intermediate representation featuring Church-encoded booleans and tuples. The compiler implements a full front-end pipeline (lexical analysis, recursive-descent parsing, name resolution, type inference with polymorphic type variables) and two independent back-ends: direct Racket code generation and a lambda calculus IR that Church-encodes all data structures as higher-order functions. The codebase spans ~8,500 lines of Swift across 84 source files, based on the official SML '97 Definition.
Architecture
The pipeline flows from lexical analysis through recursive-descent parsing with operator precedence climbing (6 levels), name resolution via a scoped symbol table, and bidirectional type inference. Two independent code generation paths branch after semantic analysis: a direct Racket code generator compiling datatypes to structs with pattern matching, and a lambda calculus generator that Church-encodes all data structures as higher-order functions.
Key Concepts
Church-Encoded Algebraic Datatypes
Each constructor of an n-variant datatype becomes an n-ary lambda that selects the corresponding handler. Constructors with associated values get an extra outer lambda to capture the payload. Case expressions become nested function application -- the encoded value applied to handler functions for each branch.
Code Highlights
func handleCase(_ c: AstCase, index: Int) -> (v: String, expression: Tree) {
let n = node.cases.count
let inner = c.associatedType.map { _ in
Tree.application(fn: .variable(name: variables[index]),
value: .variable(name: "a"))
} ?? Tree.application(fn: .variable(name: variables[index]),
value: .constant(value: 0))
let abstraction = (1...n).reduce(inner) {
Tree.abstraction(variable: variables[n - $1], expression: $0)
}
let datatypeConstructor = c.associatedType.map { _ in
Tree.abstraction(variable: "a", expression: abstraction)
} ?? abstraction
return (v: c.name.name, expression: datatypeConstructor)
}Highlights
- Dual-target compiler with direct Racket emission and a lambda calculus IR featuring Church-encoded booleans, tuples, and algebraic datatypes
- Complete SML front-end with bidirectional type inference supporting polymorphic type variables, algebraic datatypes, and multi-clause pattern matching
- Pattern matching compilation: nested tuples/lists compiled to Racket cond with auto-generated car/cdr accessor chains
- Church encoding of algebraic datatypes: each n-variant SML datatype compiles to n lambda terms selecting the i-th handler function
Related Projects
Blang (Bitis)
Lazy functional language with compile-time ownership and borrowing instead of garbage collection
Novel PL contribution: first lazy functional language using ownership/borrowing instead of GC, introducing self-borrows for cyclic data structures (graphs, infinite streams)
Atheris
Complete compiler for a Swift-like OOP language with vtables and polymorphic dispatch
Designed and implemented a complete compiler from scratch for a Swift-like OOP language (~120 Java source files, ~8,000+ lines)
PINS Compiler Course
8 progressive compiler assignments from lexer to interpreter for 100+ university students
Designed a complete 8-assignment compiler course from lexical analysis to a working interpreter (12 to 73 source files incrementally)