Architecture Overview¶
This document describes the complete architecture of the MIDI Markdown (MMD) compiler, including the compilation pipeline, core components, data structures, and runtime playback system.
System Overview¶
MMD is a multi-stage compiler that transforms human-readable MIDI markup into executable MIDI sequences. The system follows a modular architecture with clear separation of concerns:
Input (.mmd)
     ↓
┌───────────────────────────────────────────────────────┐
│  COMPILATION PIPELINE (Parse → Expand → Validate)     │
├───────────────────────────────────────────────────────┤
│  1. Parser           → Lark grammar → AST             │
│  2. Import Resolver  → Device library loading         │
│  3. Alias Resolver   → Alias expansion to MIDI        │
│  4. Command Expander → Variables, loops, sweeps       │
│  5. Validator        → Range/timing/value validation  │
│  6. IR Compiler      → AST → Intermediate Repr.       │
└───────────────────────────────────────────────────────┘
     ↓
┌───────────────────────┬───────────────────────────────┐
│  CODE GENERATION      │  RUNTIME (Phase 3)            │
├───────────────────────┼───────────────────────────────┤
│  - MIDI files (.mid)  │  - Real-time MIDI I/O         │
│  - JSON export        │  - Event scheduler            │
│  - CSV export         │  - Tempo tracker              │
│  - Table display      │  - TUI player                 │
└───────────────────────┴───────────────────────────────┘
           ↓                            ↓
     Output files              Live MIDI Output
Design Principles¶
- Human-readable first — Markdown-inspired syntax designed for readability and scannability
- Device-agnostic core — Base MIDI commands work universally; device-specific aliases extend functionality
- Timing flexibility — Support four timing paradigms (absolute, musical, relative, simultaneous)
- Validation-friendly — Syntax designed to make errors obvious
- Composable — Reusable aliases, imports, and modular components
Technology Stack¶
- Language: Python 3.12+ with modern type hints
- Parser: Lark (LALR parser with contextual lexer)
- MIDI I/O: Mido (MIDI file generation), python-rtmidi (real-time I/O)
- CLI: Typer (declarative CLI with type hints), Rich (beautiful terminal output)
- Testing: pytest (1090+ tests), mypy (static type checking)
- Code Quality: Ruff (linting/formatting), Just (task runner)
Compilation Pipeline¶
The compiler follows a seven-stage pipeline transforming MMD source to MIDI output:
Stage 1: Parser (Lark Grammar → AST)¶
File: src/midi_markdown/parser/
The parser converts MMD text into an Abstract Syntax Tree (AST) using Lark's LALR parser:
class MMLParser:
    def parse_file(self, filepath: str) -> MMLDocument: ...
    def parse_string(self, text: str) -> MMLDocument: ...
Components:
- mml.lark: Lark grammar definition (MIDI commands, timing, directives)
- parser.py: Parser wrapper class (~40 lines)
- transformer.py: AST transformation (~1,370 lines, handles all parse tree nodes)
- ast_nodes.py: AST data structures (MMLDocument, Track, MIDICommand, Timing, etc.)
Key Features:
- Contextual lexer handles comments and strings in different contexts
- Position tracking enabled (line, column information for errors)
- YAML frontmatter parsed separately before grammar parsing
- Auto note_off generation for notes with duration
- Forward reference support for variables (resolved in later stages)
Output: MMLDocument AST with frontmatter, tracks/events, defines, imports, and aliases
Stage 2: Import Resolver¶
File: src/midi_markdown/alias/imports.py
Loads device library files referenced via @import statements:
# Example: @import "devices/quad_cortex.mmd"
loaded_aliases = import_resolver.load_imports(document.imports)
Handles:
- File loading and parsing of device library MMD files
- Cycle detection (prevents circular imports)
- Merging loaded aliases into the document's alias dictionary
Output: Flattened alias dictionary combining document-level and imported aliases
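The cycle detection described above can be illustrated with a depth-first walk that remembers the chain of files currently being loaded. This is a minimal sketch, not the code in imports.py: `resolve_imports`, `CircularImportError`, and the `imports_of` mapping (which stands in for reading and parsing real .mmd files) are all hypothetical names.

```python
class CircularImportError(Exception):
    """Raised when an import chain loops back on itself (hypothetical name)."""


def resolve_imports(entry: str, imports_of: dict[str, list[str]]) -> list[str]:
    """Return files in load order, dependencies first.

    `imports_of` maps each file to the files it imports. In a real
    resolver this information would come from parsing the files.
    """
    order: list[str] = []
    done: set[str] = set()

    def visit(path: str, chain: tuple[str, ...]) -> None:
        if path in chain:
            # The file is already being loaded further up the chain.
            cycle = " -> ".join(chain[chain.index(path):] + (path,))
            raise CircularImportError(f"circular import: {cycle}")
        if path in done:
            return  # already loaded through another import path
        for dependency in imports_of.get(path, []):
            visit(dependency, chain + (path,))
        done.add(path)
        order.append(path)

    visit(entry, ())
    return order
```

Loading dependencies first means that a later merge step can let document-level aliases take precedence simply by applying them last.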
Stage 3: Alias Resolver¶
File: src/midi_markdown/alias/resolver.py
Expands alias calls to base MIDI commands with recursive resolution:
resolver = AliasResolver(all_aliases, max_depth=10)
expanded_events = resolver.resolve_alias_call("cortex_load", [1, 2, 0, 5])
# Returns list of MIDICommand objects
Features:
- Parameter binding: Matches arguments to alias parameters
- Parameter types: Supports int, note, percent, bool, enum, range, channel types
- Conditional branching: @if/@elif/@else evaluation
- Computed values: Expression evaluation ({var = expr})
- Nested aliases: Aliases can call other aliases
- Cycle detection: Stack-based recursion with max_depth limit (default: 10)
- Relative timing: Accumulates timing within aliases ([+100ms], [+1b])
Key Rules:
- Resolution order: document-level aliases → imported aliases → error
- Max depth prevents stack overflow (max_depth defaults to 10)
- Cycle detection: O(n) check on each recursive call
- All parameters must be defined before use (except forward references)
Output: List of expanded MIDICommand events with resolved parameters
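To make the recursion rules concrete, here is a simplified sketch of nested alias expansion with a resolution stack. It is an assumption-laden illustration rather than the AliasResolver implementation: aliases are reduced to lists of strings, parameter binding is omitted, and `expand` and `AliasError` are hypothetical names.

```python
class AliasError(Exception):
    """Hypothetical error type used only in this sketch."""


def expand(
    name: str,
    aliases: dict[str, list[str]],
    max_depth: int = 10,
    _stack: tuple[str, ...] = (),
) -> list[str]:
    """Expand `name` into base commands, following nested aliases."""
    body = aliases.get(name)
    if body is None:
        return [name]  # not an alias: treat it as a base MIDI command
    if name in _stack:
        # Linear scan of the stack: O(n) in the current nesting depth.
        chain = " -> ".join(_stack + (name,))
        raise AliasError(f"alias cycle: {chain}")
    if len(_stack) >= max_depth:
        raise AliasError(f"alias nesting exceeds max_depth={max_depth}")
    expanded: list[str] = []
    for item in body:
        expanded.extend(expand(item, aliases, max_depth, _stack + (name,)))
    return expanded
```

The stack check catches true cycles immediately, while the depth limit bounds legitimately deep but non-cyclic chains.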
Stage 4: Command Expander¶
File: src/midi_markdown/expansion/
Expands advanced features: variables, loops, and sweeps:
Components:
- expander.py: Main orchestrator (~1,000 lines)
- variables.py: @define and ${} substitution
- loops.py: @loop expansion (unrolls into individual events)
- sweeps.py: @sweep ramp expansion (automated parameter changes)
Example - Loop Expansion:
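The idea of unrolling can be shown with a small sketch that copies a loop body once per iteration and shifts each copy in time. It does not reproduce MMD's @loop syntax or the code in loops.py; the `Event` class, `unroll_loop`, and the fixed tick interval are assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Event:
    time: int      # ticks, relative to the start of the loop body
    type: str
    channel: int
    data1: int
    data2: int


def unroll_loop(body: list[Event], count: int, interval: int, start: int = 0) -> list[Event]:
    """Repeat `body` `count` times, one copy every `interval` ticks."""
    unrolled: list[Event] = []
    for iteration in range(count):
        offset = start + iteration * interval
        # Each iteration produces fresh events; the body itself is untouched.
        unrolled.extend(replace(event, time=event.time + offset) for event in body)
    return unrolled
```

With a two-event body (note on at tick 0, note off at tick 240) and an interval of 480 ticks, three iterations produce six events at ticks 0, 240, 480, 720, 960, and 1200.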
Timing Calculation (all four paradigms):
1. ABSOLUTE: [mm:ss.ms] → ticks = seconds * (ppq * tempo / 60)
2. MUSICAL: [bars.beats.ticks] → computed from time signature
3. RELATIVE: [+value unit] → current_time + delta (ms, s, b, m, t)
4. SIMULTANEOUS: [@] → current_time (no time advance)
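As a worked illustration of these conversions, consider the sketch below. It assumes that bars and beats are 1-based and that one beat equals one time-signature denominator unit; the function names are hypothetical and are not taken from the MMD codebase.

```python
def absolute_to_ticks(minutes: int, seconds: float, bpm: float, ppq: int) -> int:
    """[mm:ss.ms] to ticks: seconds * (ppq * tempo / 60)."""
    total_seconds = minutes * 60 + seconds
    return round(total_seconds * ppq * bpm / 60)


def musical_to_ticks(
    bar: int,
    beat: int,
    tick: int,
    ppq: int,
    time_signature: tuple[int, int] = (4, 4),
) -> int:
    """[bars.beats.ticks] to ticks, assuming 1-based bars and beats."""
    numerator, denominator = time_signature
    ticks_per_beat = ppq * 4 // denominator   # ppq counts ticks per quarter note
    ticks_per_bar = numerator * ticks_per_beat
    return (bar - 1) * ticks_per_bar + (beat - 1) * ticks_per_beat + tick


def relative_ms_to_ticks(current: int, delta_ms: float, bpm: float, ppq: int) -> int:
    """[+value ms] to ticks: current time plus the converted delta."""
    return current + round(delta_ms / 1000 * ppq * bpm / 60)
```

At 120 BPM and 480 PPQ, one second is 960 ticks, so 1.5 seconds lands on tick 1440, and the start of bar 2 in 4/4 lands on tick 1920.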
Output: Expanded event list with all variables, loops, and sweeps resolved
Stage 5: Validator¶
File: src/midi_markdown/utils/validation/
Validates the output of the preceding pipeline stages:
Components:
- document_validator.py: Structure validation (track names, imports, etc.)
- timing_validator.py: Timing monotonicity (events must be in order)
- value_validator.py: MIDI value ranges (channels 1-16, CC/note 0-127, etc.)
Validation Rules:
- Timing: Events must have monotonically increasing times within each track
- Channels: 1-16 (MIDI standard)
- Notes: 0-127 (C-1 to G9)
- CC/PC/Velocity: 0-127
- Pitch Bend: -8192 to +8191
- All aliases must be defined
- All imported files must exist
Output: Validated event list (raises ValidationError on failure)
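The range and ordering rules can be expressed as a small table-driven check. The sketch below is illustrative only: `ValidationError` here is a local stand-in for the project's exception, `MIDI_RANGES`, `check_range`, and `check_monotonic` are hypothetical names, and treating equal timestamps as valid is an assumption made so that simultaneous events pass.

```python
class ValidationError(Exception):
    """Local stand-in for the project's ValidationError."""


# Inclusive (low, high) bounds taken from the rules listed above.
MIDI_RANGES: dict[str, tuple[int, int]] = {
    "channel": (1, 16),
    "note": (0, 127),
    "cc": (0, 127),
    "pc": (0, 127),
    "velocity": (0, 127),
    "pitch_bend": (-8192, 8191),
}


def check_range(kind: str, value: int) -> None:
    """Raise ValidationError if `value` is outside the range for `kind`."""
    low, high = MIDI_RANGES[kind]
    if not low <= value <= high:
        raise ValidationError(f"{kind} {value} out of range {low}..{high}")


def check_monotonic(times: list[int]) -> None:
    """Raise ValidationError if any event is earlier than its predecessor."""
    for earlier, later in zip(times, times[1:]):
        if later < earlier:
            raise ValidationError(f"time {later} precedes previous event at {earlier}")
```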
Stage 6: IR Compiler¶
File: src/midi_markdown/core/
Converts validated AST to Intermediate Representation (IR):
ir_program = compile_ast_to_ir(
    document=expanded_document,
    tempo=bpm,
    ppq=ticks_per_beat,
    time_signature=(4, 4)
)
IR Data Structures:
from dataclasses import dataclass

@dataclass
class MIDIEvent:
    time: int              # Absolute ticks
    type: str              # "note_on", "cc", "pc", "pitch_bend", etc.
    channel: int           # 1-16
    data1: int | None      # Note/CC number
    data2: int | None      # Velocity/Value
    metadata: dict         # source_line, source_file, track

@dataclass
class IRProgram:
    events: list[MIDIEvent]
    metadata: dict         # tempo, ppq, time_signature, duration
Features:
- Query methods: events_by_time(), events_by_channel(), by_type()
- Metadata preserved for debugging (source file, line number, track)
- Foundation for real-time playback, diagnostics, and code generation
Output: IRProgram (queryable, structured event list)
Stage 7: Code Generation¶
File: src/midi_markdown/codegen/
Generates output from IR in multiple formats:
Components:
- midi_file.py: MIDI file writer using mido (formats 0/1/2)
- csv_export.py: midicsv-compatible CSV format (for spreadsheet analysis)
- json_export.py: JSON export (complete and simplified formats)
Output Formats:
- .mid — Standard MIDI File (binary)
- .csv — Spreadsheet-compatible event list
- .json — Programmatic access to events and metadata
- Table — Rich terminal display (diagnostics/debugging)
Core Components¶
Parser Layer (src/midi_markdown/parser/)¶
Lark Grammar (mml.lark):
- LALR parser with contextual lexer for different token contexts
- ~12KB grammar covering all MIDI commands, timing formats, and directives
- Tokenization rules: ABSOLUTE_TIME, MUSICAL_TIME, RELATIVE_TIME, NOTE_NAME, DURATION, etc.
- Supports YAML-style frontmatter before grammar parsing
Transformer (transformer.py):
- Converts Lark parse tree to AST (~1,370 lines)
- ~100 transform methods (one per grammar rule)
- Validates parameter types during transformation
- Handles forward references for variables (stored as tuples until resolution)
- Converts note names (C#4, Db5) to MIDI note numbers
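The note-name conversion is simple enough to sketch. The version below assumes the octave convention implied by the validator's note range (C-1 = 0, G9 = 127), which puts middle C (C4) at 60; `note_to_midi` is a hypothetical name and not necessarily how transformer.py implements it.

```python
import re

# Semitone offset of each natural note within an octave.
_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
_ACCIDENTALS = {"": 0, "#": 1, "b": -1}
_NOTE_RE = re.compile(r"^([A-Ga-g])([#b]?)(-?\d+)$")


def note_to_midi(name: str) -> int:
    """Convert a note name such as 'C#4' or 'Db5' to a MIDI note number."""
    match = _NOTE_RE.match(name)
    if match is None:
        raise ValueError(f"invalid note name: {name!r}")
    letter, accidental, octave = match.groups()
    semitone = _SEMITONES[letter.upper()] + _ACCIDENTALS[accidental]
    number = (int(octave) + 1) * 12 + semitone   # octave -1 starts at 0
    if not 0 <= number <= 127:
        raise ValueError(f"note {name!r} is outside the MIDI range 0-127")
    return number
```

Enharmonic spellings map to the same number, so C#4 and Db4 both yield 61.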
AST Nodes (ast_nodes.py):
from dataclasses import dataclass
from typing import Any

@dataclass
class MMLDocument:
    frontmatter: dict[str, Any]      # Tempo, time_signature, metadata
    tracks: list[Track]              # Multi-track mode
    events: list[Any]                # Single-track mode
    defines: list[DefineStatement]   # Variable definitions
    imports: list[ImportStatement]   # Device library imports
    aliases: list[AliasDefinition]   # Alias definitions

@dataclass
class Timing:
    mode: str                        # "absolute", "musical", "relative", "simultaneous"
    value: int | tuple[int, ...]     # Absolute ticks or (bars, beats, ticks)

@dataclass
class MIDICommand:
    type: str                        # "note_on", "cc", "pc", "pitch_bend", etc.
    channel: int | None              # 1-16 (or None for document-level commands)
    data1: int | None                # Note/CC number
    data2: int | None                # Velocity/Value
    duration: int | None             # For note commands (generates note_off)
Alias Layer (src/midi_markdown/alias/)¶
Resolver (resolver.py):
- Recursive alias expansion with cycle detection
- Parameter binding and type checking
- Conditional evaluation (@if/@elif/@else)
- Expression evaluation for computed values
- Relative timing accumulation
Imports (imports.py):
- Loads device library MMD files
- Detects circular imports
- Merges imported aliases with document-level aliases
Models (models.py):
from dataclasses import dataclass
from typing import Any

@dataclass
class AliasDefinition:
    name: str
    parameters: list[Parameter]      # Parameter definitions
    commands: list[Any]              # Command list (AST nodes)
    description: str | None

@dataclass
class Parameter:
    name: str
    type: str                        # "int", "note", "percent", "enum", etc.
    min: int | None
    max: int | None
    default: Any | None
    enum_values: dict[str, int] | None
Expansion Layer (src/midi_markdown/expansion/)¶
Expander (expander.py):
- Main orchestrator coordinating all expansions
- Variable substitution and symbol table management
- Loop unrolling (expands @loop into individual events)
- Sweep expansion (automated parameter ramping)
- Timing calculation for all four paradigms
Variables (variables.py):
- Maintains symbol table during expansion
- Substitutes ${variable} references in commands
- Validates all variables are defined before use
Loops (loops.py):
- Unrolls @loop directives into individual events
- Supports loop variables (loop counters)
- Accumulates timing across loop iterations
Sweeps (sweeps.py):
- Expands @sweep ramps into CC/pitch_bend changes
- Generates intermediate values for smooth automation
- Supports different ramp types (linear, exponential, etc.)
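For the linear case, ramp expansion amounts to interpolating both time and value between two endpoints. The following is a minimal sketch under that assumption; `linear_sweep` and its parameters are hypothetical and do not reflect the @sweep syntax or the sweeps.py API.

```python
def linear_sweep(
    start_value: int,
    end_value: int,
    start_tick: int,
    end_tick: int,
    steps: int,
) -> list[tuple[int, int]]:
    """Return `steps` evenly spaced (tick, value) points, endpoints included."""
    if steps < 2:
        raise ValueError("a sweep needs at least two steps")
    points: list[tuple[int, int]] = []
    for index in range(steps):
        fraction = index / (steps - 1)   # 0.0 at the start, 1.0 at the end
        tick = start_tick + round(fraction * (end_tick - start_tick))
        value = round(start_value + fraction * (end_value - start_value))
        points.append((tick, value))
    return points
```

Each resulting pair would become one CC or pitch bend event; other ramp shapes can be obtained by passing `fraction` through a curve function before scaling.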
Validation Layer (src/midi_markdown/utils/validation/)¶
Document Validator (document_validator.py):
- Checks frontmatter required fields
- Validates track configuration
- Checks for undefined aliases/imports
- Validates import file existence
Timing Validator (timing_validator.py):
- Ensures timing is monotonically increasing per track
- Checks timing values are within valid ranges
- Detects timing conflicts
Value Validator (value_validator.py):
- MIDI value range validation (0-127 for most values)
- Channel range (1-16)
- Pitch bend range (-8192 to +8191)
- Velocity range (0-127)
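The signed pitch bend range maps onto the 14-bit unsigned value that MIDI transmits as two 7-bit data bytes, with 8192 as the centre position. A short sketch of that mapping follows; `pitch_bend_bytes` is a hypothetical helper, and in practice a library such as mido performs this conversion internally.

```python
def pitch_bend_bytes(value: int) -> tuple[int, int]:
    """Convert a signed bend (-8192..8191) to (LSB, MSB) data bytes."""
    if not -8192 <= value <= 8191:
        raise ValueError(f"pitch bend {value} out of range -8192..8191")
    raw = value + 8192                 # shift to unsigned 0..16383
    return raw & 0x7F, (raw >> 7) & 0x7F
```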
Codegen Layer (src/midi_markdown/codegen/)¶
MIDI File (midi_file.py):
def generate_midi_file(ir_program: IRProgram, format: int = 1) -> bytes:
    """Generate MIDI file bytes from IR program."""
    # Returns raw MIDI file data (can be written to .mid file)
CSV Export (csv_export.py):
- Exports to midicsv-compatible format
- One row per MIDI event with columns: Time, Type, Channel, Data1, Data2
- Suitable for spreadsheet analysis and debugging
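A row-per-event export with the columns listed above can be sketched with the standard csv module. This is an illustration of that column layout only: `events_to_csv` is a hypothetical name, events are plain dictionaries rather than MIDIEvent objects, and the output is not claimed to match the exporter's exact format.

```python
import csv
import io

COLUMNS = ["Time", "Type", "Channel", "Data1", "Data2"]


def events_to_csv(events: list[dict]) -> str:
    """Serialize events to CSV text, one row per event."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    for event in events:
        # csv.writer renders None as an empty field.
        writer.writerow(
            [event["time"], event["type"], event["channel"], event["data1"], event["data2"]]
        )
    return buffer.getvalue()
```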
JSON Export (json_export.py):
- Complete format: Full event details + metadata
- Simplified format: Minimal field set for API consumption
- Includes timing, channels, event types, source information
Runtime Layer (src/midi_markdown/runtime/) — Phase 3¶
MIDI I/O (midi_io.py):
class MIDIOutput:
    def __init__(self, port_index: int | str | None = None):
        """Initialize MIDI output port using python-rtmidi."""

    def send_message(self, message: MIDIMessage) -> None:
        """Send MIDI message to output port."""

    @classmethod
    def list_ports(cls) -> list[str]:
        """Get available MIDI output ports."""
Tempo Tracker (tempo_tracker.py):
- Maintains tempo map (tempo changes over time)
- Converts absolute ticks to milliseconds
- Handles dynamic tempo changes during playback
- Accounts for PPQ (ticks per quarter note)
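Converting ticks to milliseconds across tempo changes means summing the duration of each tempo segment up to the target tick. The sketch below shows one way to do that; `ticks_to_ms` and the `(start_tick, bpm)` tempo-map layout are assumptions for illustration, not the TempoTracker interface.

```python
def ticks_to_ms(tick: int, tempo_map: list[tuple[int, float]], ppq: int) -> float:
    """Convert an absolute tick to elapsed milliseconds.

    `tempo_map` is a list of (start_tick, bpm) pairs sorted by start_tick,
    with the first entry at tick 0.
    """
    elapsed_ms = 0.0
    for index, (segment_start, bpm) in enumerate(tempo_map):
        if tick <= segment_start:
            break  # the target lies before this segment begins
        if index + 1 < len(tempo_map):
            segment_end = min(tick, tempo_map[index + 1][0])
        else:
            segment_end = tick
        # One tick lasts 60000 / (bpm * ppq) milliseconds.
        elapsed_ms += (segment_end - segment_start) * 60_000 / (bpm * ppq)
    return elapsed_ms
```

At 480 PPQ, 960 ticks at 120 BPM take 1000 ms; if the tempo then drops to 60 BPM, the next 480 ticks take another 1000 ms.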
Scheduler (scheduler.py):
- Sub-5ms timing precision
- Hybrid sleep/busy-wait algorithm (sleep 95%, busy-wait final 5ms)
- Thread-based event scheduling
- Handles MIDI message queueing
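The hybrid waiting strategy can be sketched in a few lines: sleep for most of the interval to keep CPU usage low, then spin on a high-resolution clock for the last few milliseconds. This is a simplified illustration, not the scheduler.py implementation; `wait_until` and `spin_window` are hypothetical names, and a real scheduler would run this on its own thread.

```python
import time


def wait_until(deadline: float, spin_window: float = 0.005) -> float:
    """Block until `deadline` (a time.perf_counter() value) has passed.

    Returns how late the wake-up was, in seconds.
    """
    remaining = deadline - time.perf_counter()
    if remaining > spin_window:
        # Coarse phase: the OS may wake us late, so stop short of the deadline.
        time.sleep(remaining - spin_window)
    while time.perf_counter() < deadline:
        pass  # fine phase: busy-wait for the final stretch
    return time.perf_counter() - deadline
```

The spin window trades CPU time for precision: a larger window tolerates sloppier OS sleep behaviour at the cost of more busy-waiting per event.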
Player (player.py):
class RealtimePlayer:
    def __init__(self, ir_program: IRProgram, midi_output: MIDIOutput):
        pass

    def play(self, tempo_bpm: int = 120) -> None:
        """Start playback from current position."""

    def pause(self) -> None:
        """Pause playback (resumable)."""

    def stop(self) -> None:
        """Stop and reset to beginning."""

    def resume(self) -> None:
        """Resume from paused position."""

    @property
    def is_playing(self) -> bool:
        """Check if currently playing."""
Terminal UI (tui/):
- state.py — Thread-safe state management with locks
- components.py — Rich UI components (progress bars, time display, port info)
- display.py — TUIDisplayManager with 30 FPS refresh rate
- input.py — KeyboardInputHandler (Space=play/pause, Q=quit, R=reset)
TUI Features:
- Real-time progress indicator (bar + time)
- Current playing event display
- MIDI port information
- Keyboard controls (non-blocking)
- Thread-safe state updates
Data Flow & Structures¶
Input Processing¶
.mmd file (text)
↓
Parser (parse_string)
↓
MMLDocument AST
├─ frontmatter: {tempo, time_signature, ...}
├─ events: [Command, Timing, Command, ...]
├─ imports: [ImportStatement, ...]
└─ aliases: [AliasDefinition, ...]
Intermediate Representation¶
Expanded + Validated AST
↓
IR Compiler
↓
IRProgram
├─ events: [MIDIEvent, ...]
│  └─ Each MIDIEvent:
│     ├─ time: int (absolute ticks)
│     ├─ type: "note_on" | "cc" | "pc" | ...
│     ├─ channel: 1-16
│     ├─ data1: 0-127 (note/CC number)
│     ├─ data2: 0-127 (velocity/value)
│     └─ metadata: {source_line, source_file, track}
├─ metadata: {tempo, ppq, time_signature, duration}
└─ Query methods:
   ├─ events_by_time(start, end) → [events]
   ├─ events_by_channel(ch) → [events]
   └─ by_type(type) → [events]
Output Formats¶
MIDI File:
- Binary format: Standard MIDI File (format 0/1/2)
- Tick-based timing (60 ticks per beat by default, configurable)
- Optimized delta-time encoding
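Standard MIDI Files store each event's time as a delta from the previous event, encoded as a variable-length quantity: seven bits per byte, with the high bit set on every byte except the last. The sketch below illustrates that encoding; `to_deltas` and `encode_vlq` are hypothetical helpers, since mido performs this step internally when writing files.

```python
def to_deltas(absolute_ticks: list[int]) -> list[int]:
    """Convert sorted absolute tick times to per-event delta times."""
    deltas: list[int] = []
    previous = 0
    for tick in absolute_ticks:
        deltas.append(tick - previous)
        previous = tick
    return deltas


def encode_vlq(value: int) -> bytes:
    """Encode a delta time as a MIDI variable-length quantity."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("variable-length quantities hold 0..0x0FFFFFFF")
    groups = [value & 0x7F]            # last byte: continuation bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)   # earlier bytes: bit 7 set
        value >>= 7
    return bytes(reversed(groups))
```

Small deltas, which dominate typical sequences, fit in a single byte, while a delta of 480 ticks needs two.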
CSV Export:
JSON Export:
{
  "metadata": {
    "tempo": 120,
    "ppq": 480,
    "time_signature": [4, 4],
    "duration_seconds": 50.5
  },
  "events": [
    {
      "time": 0,
      "type": "program_change",
      "channel": 1,
      "data1": 0,
      "data2": null
    }
  ]
}
Testing Architecture¶
Test Organization¶
- Unit Tests (tests/unit/): 598 tests, fast (<5 seconds)
  - Component-level testing in isolation
  - Mock external dependencies
  - Examples: test_timing.py, test_aliases.py, test_loops.py
- Integration Tests (tests/integration/): 242 tests
  - Multi-component workflows
  - CLI command testing
  - Device library testing
  - Examples: test_cli.py, test_end_to_end.py
Test Markers¶
@pytest.mark.unit # Fast, isolated tests (598 tests)
@pytest.mark.integration # Multi-component tests (242 tests)
@pytest.mark.e2e # End-to-end workflows
@pytest.mark.cli # CLI command tests
@pytest.mark.slow # Long-running tests (>1 second)
Test Fixtures¶
Valid Fixtures (tests/fixtures/valid/):
- 25 well-formed MMD files covering all features
- Used for positive testing and validation
- Include: timing systems, aliases, loops, sweeps, device imports
Invalid Fixtures (tests/fixtures/invalid/):
- 12 known error cases
- Used for validation testing
- Verify error messages and recovery
Running Tests¶
# All tests
just test # All tests with coverage (1090+)
just test-unit # Unit tests only (598)
just test-integration # Integration tests only (242)
just test-e2e # End-to-end compilation
# Specific
uv run pytest tests/unit/test_timing.py # Single file
uv run pytest -k "test_absolute_timing" # By name
uv run pytest -x # Stop on first failure
Coverage¶
- Current: 72.53% overall (1090 tests)
- Target: 80%+ overall
- Critical paths: Parser (75.81%), Diagnostics (88.77%), Codegen (85%+)
- Reports: HTML in htmlcov/, XML for CI/CD
Key Technical Details¶
Timing Paradigms¶
All four timing systems compile to absolute ticks:
- Absolute Timecode: [mm:ss.milliseconds]
- Musical Time: [bars.beats.ticks]
- Relative Delta: [+value unit]
- Simultaneous: [@]
Command Abbreviations¶
Command types are written as short identifiers. Program Change and Control Change use abbreviations, not their full names:
- pc = Program Change (NOT program_change)
- cc = Control Change (NOT control_change)
- note_on = Note On (standard)
- note_off = Note Off (standard)
- pitch_bend = Pitch Bend
- pressure = Channel Pressure / Aftertouch
Alias Parameter Types¶
Aliases support multiple parameter types:
@alias cortex_load {ch:1-16}.{setlist:0-5}.{group:0-3}.{preset:0-4}
@alias h90_reverb {type=hall:1,room:2,plate:3}.{time:0.0-10.0}
@alias toggle {name}.{on=:0,off:127}
Types:
- int, int:min-max — Integer with optional range
- note, note:C1-C8 — MIDI note names with optional range
- channel:1-16 — Channel (1-16)
- percent:0-100 — Percentage (0-100)
- bool — Boolean (true/false, 1/0, on/off)
- enum=name:value,... — Named enumeration
- {param=default} — Default values
Symbol Table Resolution¶
Variable resolution follows this order:
1. Check local scope (current loop iteration)
2. Check parent scopes (nested loops)
3. Check document scope (@define statements)
4. Check alias parameters
5. Forward reference (return tuple for later resolution)
Architecture Highlights¶
Separation of Concerns¶
- Parser: Grammar + AST transformation (no business logic)
- Alias Resolver: Semantic expansion (recursive with cycle detection)
- Command Expander: Advanced features (variables, loops, sweeps)
- Validator: Constraint enforcement (ranges, timing, existence)
- Codegen: Output format generation (multiple formats)
- Runtime: Real-time playback (separate from compilation)
Immutability¶
- AST nodes are frozen after parsing
- Transformations create new nodes (no mutation)
- Enables validation and debugging consistency
Error Recovery¶
- Parser provides position tracking (line, column)
- Helpful error messages with suggestions
- Spell-correction for alias name typos (Levenshtein distance)
Performance¶
- Single-pass parsing (Lark LALR)
- Lazy evaluation of complex features (loops, sweeps)
- Event scheduling uses hybrid sleep/busy-wait (sub-5ms precision)
- Streaming output generation (doesn't load entire MIDI in memory)
Extensibility¶
- Device libraries as separate MMD files
- Alias system enables semantic command layers
- Multiple output formats without recompilation
- Plugin-ready architecture for future device/format support
References¶
- Full Specification: See spec.md (1,300+ lines)
- CLI Design: See CLAUDE.md (implementation patterns)
- Parser Details: See developer-guide/architecture/parser-summary.md
- Examples: See examples/ (51 working examples)
- Device Libraries: See devices/ (6 pre-built libraries)