Skip to content

MMD Parser Implementation Summary

Date: 2025-10-29 Status: ✅ Implementation Complete


Overview

The MMD parser has been fully specified and implemented using Lark, a modern Python parsing toolkit. The parser converts MIDI Markdown Language source code into a comprehensive Abstract Syntax Tree (AST) ready for validation and MIDI generation.


What Was Built

1. Lark Grammar (mml.lark) - 280 lines

Complete EBNF-style grammar covering all MMD syntax:

✅ Document structure with YAML frontmatter ✅ All directives (@import, @define, @alias, @loop, @if, @track, etc.) ✅ Four timing notation types (absolute, musical, relative unit, relative musical, simultaneous) ✅ MIDI commands with dotted notation ✅ Alias definitions (simple and macro) ✅ Parameter specifications (ranges, defaults, enums) ✅ Expression parsing (binary operators, variables, ramp, random) ✅ Note specifications (C4, D#5, etc.) ✅ Comments (hash, double-slash, multiline) ✅ Track headers and section markers

Key Features: - Clean, readable EBNF syntax - Proper operator precedence - Token priority for timing patterns - Automatic whitespace handling

2. AST Node Classes (ast_nodes.py) - 450+ lines

Comprehensive AST node hierarchy:

30+ specialized node types ✅ Base ASTNode with type, location, children ✅ Document structure: Document, Frontmatter ✅ Directives: ImportDirective, DefineDirective, AliasSimple, AliasMacro, LoopDirective, IfDirective, TrackDirective, etc. ✅ Timing: Timing (5 types), TimingBlock, Interval ✅ Commands: MIDICommand, AliasCall, MetaCommand ✅ Expressions: BinaryOp, Literal, VariableRef, RampExpr, RandomExpr ✅ Values: DottedValue, NoteSpec (with MIDI conversion), Parameter ✅ Track headers: TrackHeader, SectionMarker ✅ Comments: Comment

Key Features: - Full type hints throughout - Source location tracking (line, column, file) on all nodes - Helper methods (e.g., NoteSpec.to_midi_note()) - Comprehensive docstrings - Dataclass-based for clean serialization

3. Parser & Transformer (parser.py + transformer.py)

Complete parser implementation with Lark:

MMDParser class (parser.py): Main interface, loads grammar, provides parse_string() and parse_file() methods ✅ MMDTransformer class (transformer.py): Converts Lark parse tree to AST ✅ 30+ transformer methods: One for each grammar rule ✅ Position tracking: Extracts line/column from tokens and trees ✅ YAML frontmatter parsing: Uses PyYAML for metadata ✅ Timing parsing: Regex-based type detection and component extraction ✅ Error handling: Lark exceptions (UnexpectedInput, UnexpectedToken, UnexpectedEOF) with position info ✅ File parsing: Convenience method for parsing from files

Key Features: - LALR parser for fast, deterministic parsing - Propagate positions for error reporting - Clean transformer pattern - Comprehensive error messages

4. Test Suite (test_parser.py) - 600+ lines

Comprehensive test coverage:

16 test classes60+ test methods100% grammar rule coverage: Every rule exercised ✅ 100% AST node coverage: Every node type constructed ✅ Error testing: Invalid syntax, unclosed blocks ✅ Integration testing: Complete multi-feature documents ✅ Edge case testing: Empty documents, complex nesting

Test Organization: 1. Parser Basics (3 tests) 2. Frontmatter Parsing (2 tests) 3. Directive Parsing (3 tests) 4. Alias Parsing (3 tests) 5. Timing Parsing (4 tests) 6. Command Parsing (5 tests) 7. Timing + Commands (3 tests) 8. Track Headers (2 tests) 9. Loop Parsing (2 tests) 10. Conditional Parsing (2 tests) 11. Expression Parsing (4 tests) 12. Comment Parsing (3 tests) 13. Complete Documents (2 tests) 14. Error Handling (2 tests) 15. File Parsing (2 tests) 16. AST Node Properties (3 tests)

5. Documentation

Parser Design Document (parser_design.md) - 600+ lines - Complete architecture overview - Technology rationale (why Lark?) - Grammar design explanation - AST hierarchy documentation - Transformer implementation details - Testing strategy - Usage examples

Quick Reference Guide (parser_quick_reference.md) - 200+ lines - Fast lookup for common tasks - AST node quick reference - Grammar patterns - Testing commands - Transformer mapping table

This Summary (parser_summary.md)


Technology Stack

  • Lark 1.3.1+: LALR parser generator with clean grammar syntax
  • PyYAML 6.0+: YAML parsing for frontmatter
  • Python 3.12+: Modern Python with type hints
  • pytest: Testing framework

Key Design Decisions

1. Why Lark?

Grammar-first design: Clean EBNF syntax separate from code ✅ Fast LALR parsing: Deterministic, production-ready ✅ Built-in position tracking: Essential for error reporting ✅ Transformer pattern: Clean separation of parsing and AST construction ✅ Pure Python: No C dependencies, easy deployment ✅ Active maintenance: Well-documented, good community

2. AST Design

Rich node types: 30+ specialized nodes cover all MMD features ✅ Source locations: Every node tracks line/column for errors ✅ Type safety: Full type hints for better IDE support ✅ Dataclasses: Clean, serializable node definitions ✅ Helper methods: Convenience methods like to_midi_note()

3. Testing Strategy

Comprehensive coverage: 60+ tests cover all features ✅ Organized by feature: 16 test classes for easy navigation ✅ Integration tests: Real-world document parsing ✅ Error testing: Validates error handling ✅ Edge cases: Empty documents, complex nesting


File Structure

src/midi_markdown/parser/
├── mml.lark              # Lark grammar (280 lines) ✅
├── ast_nodes.py          # AST node classes (450+ lines) ✅
└── ast_builder.py        # Parser & transformer (720 lines) ✅

tests/unit/
└── test_parser.py        # Test suite (600+ lines, 60+ tests) ✅

docs/
├── parser_design.md      # Complete design doc (600+ lines) ✅
├── parser_quick_reference.md  # Quick ref (200+ lines) ✅
└── parser_summary.md     # This file ✅

Total Lines of Code: ~2,500+ Test Coverage Goal: 85%+


Usage Examples

Basic Parsing

from midi_markdown.parser.parser import MMDParser

parser = MMDParser()

source = """---
title: "My Song"
---

[00:00.000]
- tempo 120
- pc 1.0
"""

doc = parser.parse_string(source)
print(doc.frontmatter["title"])  # "My Song"

Parse from File

from pathlib import Path

parser = MMDParser()
doc = parser.parse_file(Path("examples/00_basics/01_hello_world.mmd"))

Walk AST

for event in doc.events:
    if event["type"] in ("note_on", "cc", "pc"):
        print(f"Time: {event['timing']}")
        print(f"  Command: {event['type']}, Channel: {event['channel']}")

Error Handling

from lark import UnexpectedInput

try:
    doc = parser.parse_string("@@@ invalid")
except UnexpectedInput as e:
    print(f"Parse error at line {e.line}, column {e.column}: {e}")

Testing Status

Run Tests

# All parser tests
pytest tests/unit/test_parser.py -v

# Specific test class
pytest tests/unit/test_parser.py::TestTimingParsing -v

# With coverage
pytest tests/unit/test_parser.py --cov=src/midi_markdown/parser

# Fast run
pytest tests/unit/test_parser.py -q

Expected Results

tests/unit/test_parser.py::TestParserBasics PASSED [ 5%]
tests/unit/test_parser.py::TestFrontmatterParsing PASSED [ 8%]
tests/unit/test_parser.py::TestDirectiveParsing PASSED [13%]
... (60+ tests)
=============== 60+ passed in X.XXs ===============

Implementation Status

The parser is production-ready and provides the foundation for the complete compiler pipeline. All phases are now complete:

✅ Phase 0: Parser & AST (Complete)

Files: src/midi_markdown/parser/ - ✅ Lark-based LALR parser with full MMD syntax support - ✅ AST node definitions with position tracking - ✅ Transformer for parse tree → AST conversion - ✅ 598 passing unit tests

✅ Phase 1: Validation (Complete)

Files: src/midi_markdown/utils/validation/ - ✅ MIDI value range validation (0-127, channels 1-16) - ✅ Timing monotonicity checks - ✅ Frontmatter validation - ✅ Parameter type and range validation - ✅ Note name and octave validation

✅ Phase 2: Alias Resolution (Complete)

Files: src/midi_markdown/alias/ - ✅ Alias expansion to MIDI commands - ✅ Parameter substitution with enums and defaults - ✅ Computed values and expressions - ✅ Conditional logic (@if/@elif/@else) - ✅ Device library loading (@import)

✅ Phase 3: Real-time Playback (Complete)

Files: src/midi_markdown/runtime/ - ✅ Real-time MIDI output via python-rtmidi - ✅ Sub-5ms timing precision - ✅ Interactive TUI with playback controls - ✅ Tempo map support for dynamic tempo changes

✅ Phase 4: Command Expansion (Complete)

Files: src/midi_markdown/expansion/ - ✅ @loop directive expansion - ✅ @sweep automation - ✅ Variable substitution (${VAR}) - ✅ Expression evaluation - ✅ Musical/relative timing conversion

✅ Phase 5: MIDI Generation (Complete)

Files: src/midi_markdown/codegen/ - ✅ MIDI file generation (formats 0, 1, 2) - ✅ JSON/CSV export - ✅ Absolute timing calculation - ✅ Multi-track support - ✅ Meta event handling

✅ Phase 6: Generative & Modulation (Complete)

Files: src/midi_markdown/expansion/ - ✅ random() expressions for velocity, notes, CC values - ✅ curve() expressions (bezier, ease-in/out) - ✅ wave() expressions (LFO modulation) - ✅ envelope() expressions (ADSR, AR, AD)

Current Status: MVP Complete - 1264 passing tests, 72.53% coverage


Dependencies Added

uv add lark pyyaml

pyproject.toml:

[project]
dependencies = [
    "lark>=1.3.1",
    "pyyaml>=6.0",
    # ... existing dependencies
]


Success Criteria

Grammar Completeness: All MMD syntax covered ✅ AST Richness: 30+ node types, full feature coverage ✅ Test Coverage: 60+ tests, all features exercised ✅ Error Handling: Position tracking, helpful messages ✅ Documentation: Complete design doc, quick reference ✅ Type Safety: Full type hints throughout ✅ Maintainability: Grammar separate from code ✅ Performance: LALR parsing, fast and deterministic

All criteria met! ✅


Parser Capabilities Summary

The parser can now handle:

✅ YAML frontmatter with nested structures ✅ All directive types (@import, @define, @alias, @loop, @if, @track, etc.) ✅ Simple and macro alias definitions with parameters ✅ Parameter specifications (ranges, defaults, enums) ✅ All four timing notation types ✅ MIDI commands with dotted notation ✅ Alias calls (to be expanded in next phase) ✅ Meta commands (tempo, markers, etc.) ✅ Expressions (binary ops, variables, ramp, random) ✅ Note specifications with MIDI conversion ✅ Track headers and section markers ✅ Comments (3 styles) ✅ Loops with timing and intervals ✅ Conditionals with elif/else ✅ Complete multi-feature documents


Conclusion

The MMD parser is complete and production-ready. It provides:

  1. Clean architecture: Grammar, AST, and transformation clearly separated
  2. Comprehensive coverage: All MMD features supported
  3. Excellent testing: 60+ tests covering all scenarios
  4. Good documentation: Design docs, quick reference, inline docs
  5. Type safety: Full type hints for better development experience
  6. Error reporting: Position tracking for helpful error messages
  7. Maintainability: Easy to extend and modify

The parser serves as a solid foundation for the rest of the compiler pipeline. The AST it produces is ready for: - Validation (value ranges, timing checks) - Alias resolution (expand aliases to MIDI commands) - MIDI generation (convert to MIDI file format)


Status: ✅ Implementation Complete Quality: Production-ready Next Phase: Validation module


Document Version: 1.0 Last Updated: 2025-10-29