MEP 2. Grammar
| Field | Value |
|---|---|
| MEP | 2 |
| Title | Grammar |
| Author | Mochi core |
| Status | Informational |
| Type | Informational |
| Created | 2026-05-08 |
Abstract
Mochi's grammar is encoded directly on Go struct tags processed by participle/v2. There is no separate grammar file. This MEP captures the productions in BNF-like form, identifies the ambiguities that the parser resolves with lookahead, and lists the places where the grammar requires special attention from contributors.
Motivation
A grammar that lives on struct tags is convenient to evolve but easy to misread. The same struct field can be both an AST shape and a parse rule. New contributors need a single place that lists every production so they can reason about parse behaviour without scanning the whole parser.go file.
Specification
Build configuration
The parser is built at parser/parser.go:661-666:
var Parser = participle.MustBuild[Program](
	participle.Lexer(mochiLexer),
	participle.Elide("Whitespace", "Comment"),
	participle.Unquote("String"),
	participle.UseLookahead(999),
)
UseLookahead(999) lets the parser look up to 999 tokens ahead when an alternation is ambiguous, which is effectively unbounded for real programs. That is what lets the parser decide, for example, whether an Ident followed by { starts a StructLiteral or is the condition of an IfExpr followed by the brace that opens its body. The cost is silent backtracking and slower error reporting.
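The backtracking described above can be sketched in plain Go, without participle. This is a toy under stated assumptions: tokens are pre-split strings, field values are single words, and the names tryStructLiteral and parsePrimary are hypothetical, not taken from parser/parser.go.

```go
package main

import "fmt"

// isWord reports whether tok is made of letters, digits, or underscores.
func isWord(tok string) bool {
	if tok == "" {
		return false
	}
	for _, r := range tok {
		ok := r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
		if !ok {
			return false
		}
	}
	return true
}

// tryStructLiteral attempts Ident '{' { Ident ':' value [','] } '}' at pos.
// On failure it reports ok=false and the original pos, so the caller can
// back up and try a different alternative.
func tryStructLiteral(toks []string, pos int) (end int, ok bool) {
	at := func(i int) string {
		if i < len(toks) {
			return toks[i]
		}
		return ""
	}
	i := pos
	if !isWord(at(i)) || at(i+1) != "{" {
		return pos, false
	}
	i += 2
	for at(i) != "}" {
		if !isWord(at(i)) || at(i+1) != ":" || !isWord(at(i+2)) {
			return pos, false // not a field list: backtrack
		}
		i += 3
		if at(i) == "," {
			i++
		}
	}
	return i + 1, true
}

// parsePrimary tries the struct literal first and falls back to a bare identifier.
func parsePrimary(toks []string, pos int) (kind string, end int) {
	if e, ok := tryStructLiteral(toks, pos); ok {
		return "StructLiteral", e
	}
	return "Ident", pos + 1
}

func main() {
	lit := []string{"Point", "{", "x", ":", "a", ",", "y", ":", "b", "}"}
	cond := []string{"ready", "{", "print", "(", "ready", ")", "}"}
	fmt.Println(parsePrimary(lit, 0))  // StructLiteral 10
	fmt.Println(parsePrimary(cond, 0)) // Ident 1
}
```

In the second input the parser has to read past the brace before it learns that the struct literal reading is wrong. That wasted work is the silent backtracking mentioned above.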
The participle tag language uses:
- 'lit' for keyword or punctuation literals.
- @@ to recurse into a sub structure.
- @Token to capture into the field.
- [ ... ] for optional groups.
- { ... } for repetition.
- | for alternation.
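As an illustration of how these tags sit on a struct, here is a small sketch that uses only the standard library to read the tags back. LetSketch and TypeRefSketch are hypothetical node types written for this example, not copies of the real AST, and the sketch assumes the parser:"..." tag key that participle accepts.

```go
package main

import (
	"fmt"
	"reflect"
)

// TypeRefSketch stands in for a type reference node (hypothetical).
type TypeRefSketch struct {
	Name string `parser:"@Ident"`
}

// LetSketch is a hypothetical node. Read together, its tags spell out:
//   'let' Ident [ ':' TypeRef ]
type LetSketch struct {
	Name string         `parser:"'let' @Ident"`
	Type *TypeRefSketch `parser:"[ ':' @@ ]"`
}

// rules lists the grammar fragment attached to each field of v's struct type.
func rules(v any) []string {
	t := reflect.TypeOf(v)
	out := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		out = append(out, f.Name+": "+f.Tag.Get("parser"))
	}
	return out
}

func main() {
	for _, r := range rules(LetSketch{}) {
		fmt.Println(r)
	}
}
```

The field is both the AST slot and the parse rule, which is the double role the Motivation section warns about.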
Top level
Program   = [ 'package' Ident ] Statement*
Statement = TestBlock | BenchBlock | ExpectStmt
          | AgentDecl | StreamDecl | ModelDecl
          | ImportStmt | TypeDecl
          | ExternTypeDecl | ExternVarDecl | ExternFunDecl | ExternObjectDecl
          | FactStmt | RuleStmt | OnHandler | EmitStmt
          | LetStmt | VarStmt | AssignStmt | FunStmt
          | ReturnStmt | IfStmt | WhileStmt | ForStmt
          | BreakStmt | ContinueStmt
          | FetchStmt | UpdateStmt | ExprStmt
The discriminated union is expressed by parser/parser.go:73-104. The parser tries each alternative in order. Alternatives that do not start with a unique keyword (AssignStmt, ExprStmt) are ordered after the forms they could otherwise shadow, with ExprStmt last, because an earlier position would let them consume tokens meant for a more specific form.
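Why order matters in an ordered choice can be shown with a toy, again in plain Go. The matchers below are deliberately crude and hypothetical; participle's real behaviour with lookahead is more forgiving than this first-match loop, so treat it as an illustration of the shadowing risk only.

```go
package main

import "fmt"

// alt is one alternative of an ordered choice. match returns how many
// tokens it consumes from the front of toks, or 0 when it does not apply.
type alt struct {
	name  string
	match func(toks []string) int
}

// matchAssign accepts: Ident '=' value (three tokens in this toy).
func matchAssign(toks []string) int {
	if len(toks) >= 3 && toks[1] == "=" {
		return 3
	}
	return 0
}

// matchExpr accepts any single token as an expression (toy catch-all).
func matchExpr(toks []string) int {
	if len(toks) >= 1 {
		return 1
	}
	return 0
}

// parseStatement returns the first alternative that matches, in list order,
// together with the tokens it leaves unconsumed.
func parseStatement(alts []alt, toks []string) (string, []string) {
	for _, a := range alts {
		if n := a.match(toks); n > 0 {
			return a.name, toks[n:]
		}
	}
	return "", toks
}

func main() {
	toks := []string{"x", "=", "1"}
	specificFirst := []alt{{"AssignStmt", matchAssign}, {"ExprStmt", matchExpr}}
	generalFirst := []alt{{"ExprStmt", matchExpr}, {"AssignStmt", matchAssign}}

	fmt.Println(parseStatement(specificFirst, toks)) // AssignStmt []
	fmt.Println(parseStatement(generalFirst, toks))  // ExprStmt [= 1]
}
```

With the catch-all first, x is accepted as a complete statement and the leftover = 1 has nowhere to go.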
Declarations
LetStmt    = 'let' Ident [ ':' TypeRef ] [ '=' Expr ]
VarStmt    = 'var' Ident [ ':' TypeRef ] [ '=' Expr ]
AssignStmt = Ident IndexOp* FieldOp* '=' Expr
FunStmt    = [ 'export' ] 'fun' Ident [ '<' Ident { ',' Ident } '>' ]
             '(' [ Param { ',' Param } ] ')' [ ':' TypeRef ]
             '{' Statement* '}'
TypeDecl   = 'type' Ident
             ( [ '=' ] '{' TypeMember { [','] TypeMember } [','] '}'
             | '=' TypeVariant { '|' TypeVariant }
             | '=' FunType | '=' GenericType )
TypeMember = TypeField | FunStmt
TypeDecl is the most overloaded production. Three forms:
- Struct: type Point { x: int, y: int } (the = prefix is also accepted: type Point = { ... }).
- Tagged union / simple variant: type Shape = Circle(r: float) | Square(side: float). A bare name like type Id = int also lands in this path: int is parsed as a single nameless variant. The checker distinguishes a true alias from a single-variant declaration.
- Function or generic alias: type F = fun(int) : int or type MyList = list<int>.
The parser distinguishes the three by lookahead on the tokens that follow the type name. Because = may precede any of the forms, the token after the = often decides.
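The decision can be pictured as a hand-written classifier over the tokens that follow type Name. This is only a sketch: participle reaches the same answer by trying the alternatives in order, and the function name and labels below are this example's own.

```go
package main

import "fmt"

// classifyTypeDecl inspects the tokens after `type Name` and names the
// TypeDecl form they select. It looks at no more than three tokens.
func classifyTypeDecl(rest []string) string {
	at := func(i int) string {
		if i < len(rest) {
			return rest[i]
		}
		return ""
	}
	switch {
	case at(0) == "{":
		return "struct"
	case at(0) != "=":
		return "invalid"
	case at(1) == "{":
		return "struct"
	case at(1) == "fun":
		return "function alias"
	case at(1) != "" && at(2) == "<":
		return "generic alias"
	case at(1) != "":
		return "variant"
	}
	return "invalid"
}

func main() {
	fmt.Println(classifyTypeDecl([]string{"{", "x", ":", "int", "}"}))      // struct
	fmt.Println(classifyTypeDecl([]string{"=", "{", "x", ":", "int", "}"})) // struct
	fmt.Println(classifyTypeDecl([]string{"=", "Circle", "(", "r", ")"}))   // variant
	fmt.Println(classifyTypeDecl([]string{"=", "int"}))                     // variant
	fmt.Println(classifyTypeDecl([]string{"=", "fun", "(", "int", ")"}))    // function alias
	fmt.Println(classifyTypeDecl([]string{"=", "list", "<", "int", ">"}))   // generic alias
}
```

Note that type Id = int comes out as a variant here too, matching the single-variant behaviour described above.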
Type references
TypeRef          = FunType | GenericType | InlineStructType | Ident
FunType          = 'fun' '(' [ TypeRef { ',' TypeRef } ] ')' [ ':' TypeRef ]
GenericType      = Ident '<' TypeRef { ',' TypeRef } '>'
InlineStructType = '{' [ TypeField { ',' TypeField } ] [','] '}'
TypeField        = Ident ':' TypeRef
There is no union type literal in TypeRef, only in TypeDecl. That is why let x: int | nil = ... will not parse. The fix is either to lift union into TypeRef or to introduce an option syntax such as int?. This is tracked in MEP 10.
There is no tuple type literal. Heterogeneous data must use a struct.
Expressions
The expression grammar is layered from Expr down to Primary, but all binary operators share a single flat level:
Expr        = BinaryExpr
BinaryExpr  = Unary BinaryOp*
BinaryOp    = ( '==' | '!=' | '<' | '<=' | '>' | '>=' | '+' | '-'
              | '*' | '/' | '%' | 'in' | '&&' | '||'
              | 'union' | 'except' | 'intersect' ) [ 'all' ] PostfixExpr
Unary       = ( '-' | '!' )* PostfixExpr
PostfixExpr = Primary PostfixOp*
PostfixOp   = CallOp | IndexOp | FieldOp | CastOp
Primary     = StructLiteral | CallExpr | QueryExpr | LogicQueryExpr
            | IfExpr | SelectorExpr | ListLiteral | MapLiteral
            | FunExpr | MatchExpr | GenerateExpr | FetchExpr
            | LoadExpr | SaveExpr | Literal | '(' Expr ')'
The flat BinaryOp* list is rebalanced into a precedence tree by the type checker, not by the parser. See types/infer.go:89-198 for the precedence levels. The parser does not enforce associativity. That is a deliberate simplification that pushes the work to a single place.
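To show what rebalancing a flat operator list involves, here is a minimal precedence-climbing sketch in Go. The precedence table is an assumption chosen to look like C; the real levels are the ones in types/infer.go and may differ. Operands are plain strings and the result is a fully parenthesized string rather than an AST. Operators missing from the table (in, union, and so on) are not handled by this sketch.

```go
package main

import "fmt"

// binOp mirrors one entry of the flat BinaryOp* list: an operator and its right operand.
type binOp struct {
	Op    string
	Right string
}

// prec is an ASSUMED precedence table (higher binds tighter).
var prec = map[string]int{
	"||": 1, "&&": 2,
	"==": 3, "!=": 3, "<": 3, "<=": 3, ">": 3, ">=": 3,
	"+": 4, "-": 4,
	"*": 5, "/": 5, "%": 5,
}

// rebalance folds left followed by ops into a parenthesized string,
// using precedence climbing with left associativity.
func rebalance(left string, ops []binOp) string {
	pos := 0
	var climb func(lhs string, minPrec int) string
	climb = func(lhs string, minPrec int) string {
		for pos < len(ops) && prec[ops[pos].Op] >= minPrec {
			op := ops[pos]
			pos++
			rhs := op.Right
			// Let tighter-binding operators to the right claim rhs first.
			for pos < len(ops) && prec[ops[pos].Op] > prec[op.Op] {
				rhs = climb(rhs, prec[ops[pos].Op])
			}
			lhs = "(" + lhs + " " + op.Op + " " + rhs + ")"
		}
		return lhs
	}
	return climb(left, 1)
}

func main() {
	// Flat form of: 1 + 2 * 3 - 4
	fmt.Println(rebalance("1", []binOp{{"+", "2"}, {"*", "3"}, {"-", "4"}}))
	// prints ((1 + (2 * 3)) - 4)
}
```

Keeping this logic in one function is the "single place" benefit the paragraph above describes: the parser never needs to know the table.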
Unary.Ops is a slice, so multiple prefixes stack. --x is parsed as two unary minuses, not as a decrement operator. !!x is two boolean nots.
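The stacking behaviour can be mimicked with a short Go sketch that applies a slice of prefix operators from the innermost outward. The function applyUnary and its use of int and bool values are illustrative assumptions, not the evaluator Mochi actually uses.

```go
package main

import "fmt"

// applyUnary applies stacked prefix operators to v. The operator closest
// to the operand (the last one in ops) is applied first.
func applyUnary(ops []string, v any) (any, error) {
	for i := len(ops) - 1; i >= 0; i-- {
		switch op := ops[i]; op {
		case "-":
			n, ok := v.(int)
			if !ok {
				return nil, fmt.Errorf("operator - needs an int, got %T", v)
			}
			v = -n
		case "!":
			b, ok := v.(bool)
			if !ok {
				return nil, fmt.Errorf("operator ! needs a bool, got %T", v)
			}
			v = !b
		default:
			return nil, fmt.Errorf("unknown unary operator %q", op)
		}
	}
	return v, nil
}

func main() {
	v, err := applyUnary([]string{"-", "-"}, 7) // --7
	fmt.Println(v, err)                         // 7 <nil>
	v, err = applyUnary([]string{"!", "!"}, true) // !!true
	fmt.Println(v, err)                           // true <nil>
}
```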
PostfixOp is also a slice, so chains like f()(2).field as int work.
Queries
QueryExpr     = 'from' Ident 'in' Expr
                { FromClause }
                { JoinClause }
                [ 'where' Expr ]
                [ GroupByClause ]
                [ ( 'sort' | 'order' ) 'by' Expr ]
                [ 'skip' Expr ]
                [ 'take' Expr ]
                'select' [ 'distinct' ] Expr
JoinClause    = [ 'left' | 'right' | 'outer' ] 'join' [ 'from' ] Ident 'in'
                Expr 'on' Expr
GroupByClause = 'group' 'by' Expr { ',' Expr } 'into' Ident [ 'having' Expr ]
The fixed clause order matters. The grammar will not accept select ... from ... where ... because from must come first. This is LINQ shaped, not SQL shaped.
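The fixed order can be expressed as a small checker over the leading keyword of each clause. This is a sketch derived from the QueryExpr production above; the function name checkClauseOrder and the keyword-only view of a query are simplifications made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// rank gives each clause keyword its position in the QueryExpr production.
// 'sort' and 'order' share a slot because the grammar accepts either.
var rank = map[string]int{
	"from": 0, "join": 1, "where": 2, "group": 3,
	"sort": 4, "order": 4, "skip": 5, "take": 6, "select": 7,
}

// checkClauseOrder verifies that clause keywords appear in grammar order.
// Only 'from' and 'join' may repeat.
func checkClauseOrder(clauses []string) error {
	if len(clauses) == 0 || clauses[0] != "from" {
		return errors.New("query must start with from")
	}
	prev := -1
	for _, c := range clauses {
		r, ok := rank[c]
		if !ok {
			return fmt.Errorf("unknown clause %q", c)
		}
		repeatable := c == "from" || c == "join"
		if r < prev || (r == prev && !repeatable) {
			return fmt.Errorf("clause %q is out of order", c)
		}
		prev = r
	}
	if clauses[len(clauses)-1] != "select" {
		return errors.New("query must end with select")
	}
	return nil
}

func main() {
	fmt.Println(checkClauseOrder([]string{"from", "where", "select"})) // <nil>
	fmt.Println(checkClauseOrder([]string{"select", "from", "where"})) // query must start with from
}
```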
Pattern matching
MatchExpr = 'match' Expr '{' MatchCase* '}'
MatchCase = Expr '=>' ( Expr | '{' Statement* '}' )
There is no separate pattern grammar. The pattern is just an expression. The checker is responsible for deciding whether a given expression shape is allowed as a pattern. See MEP 13 for the full rules.
Lambda expressions
FunExpr = 'fun' [ '<' Ident { ',' Ident } '>' ]
          '(' [ Param { ',' Param } ] ')'
          [ ':' TypeRef ]
          ( '{' Statement* '}' | '=>' Expr )
Both block and arrow bodies are accepted. The arrow form yields the expression as the function result.
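One way to picture the two body forms is a node that holds either a block or an arrow expression, with the arrow form read as a single return. The funBody type below is hypothetical and keeps statements as source text; it illustrates the equivalence only and is not a description of how Mochi lowers lambdas.

```go
package main

import "fmt"

// funBody holds one of the two FunExpr body forms (hypothetical shape).
type funBody struct {
	Block []string // statements of a '{' ... '}' body, as source text
	Arrow string   // expression of a '=>' body; empty when Block is used
}

// statements returns the body as a statement list. An arrow body becomes a
// single return of its expression.
func (b funBody) statements() []string {
	if b.Arrow != "" {
		return []string{"return " + b.Arrow}
	}
	return b.Block
}

func main() {
	arrow := funBody{Arrow: "x + 1"}
	block := funBody{Block: []string{"let y = x + 1", "return y"}}
	fmt.Println(arrow.statements()) // [return x + 1]
	fmt.Println(block.statements()) // [let y = x + 1 return y]
}
```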
Logic programming
FactStmt = 'fact' LogicPredicate
RuleStmt = 'rule' LogicPredicate ':-' LogicCond { ',' LogicCond }
LogicCond = LogicPredicate | LogicNeq
LogicNeq = Ident '!=' Ident
LogicPredicate = Ident '(' [ LogicTerm { ',' LogicTerm } ] ')'
LogicTerm = Ident | String | Int
LogicQueryExpr = 'query' LogicPredicate
These productions parse but the runtime VM does not execute them. The interpreter handles them. See MEP 10.
Streams and agents
StreamDecl = 'stream' Ident '{' StreamField* '}'
EmitStmt   = 'emit' Ident '{' [ StructLitField { ',' StructLitField } ] [','] '}'
OnHandler  = 'on' Ident 'as' Ident '{' Statement* '}'
AgentDecl  = 'agent' Ident '{' AgentBlock* '}'
AgentBlock = LetStmt | VarStmt | AssignStmt | OnHandler | IntentDecl
IntentDecl = 'intent' Ident '(' [ Param { ',' Param } ] ')'
             [ ':' TypeRef ] '{' Statement* '}'
The same caveat applies as for logic programming: these productions are parsed and partially typed, but they are not compiled for the VM.
I/O statements
FetchStmt = 'fetch' Expr 'into' Ident [ 'with' Expr ]
LoadExpr = 'load' [ String ] 'as' TypeRef [ 'with' Expr ]
SaveExpr = 'save' Expr [ 'to' String ] [ 'with' Expr ]
load always needs as Type. For any non-default behaviour, both load and save also need an explicit format in the with map.
Rationale
Pushing precedence handling into the type checker means the parser is a single pass that produces a flat AST, which keeps the parser file readable. The price is error quality: a malformed precedence chain is often diagnosed late.
Lookahead 999 was chosen as effectively unbounded so we never have to think about backtracking when we add a new construct. The cost is harder-to-trace error messages when an alternative deep in the search tree fails.
Backwards Compatibility
Informational. No backward compatibility implications.
Reference Implementation
- parser/parser.go:73-104 for the top level statement union.
- parser/parser.go:184-189 for TypeRef.
- parser/parser.go:368, 373 for the BinaryOp shape.
- parser/parser.go:661-666 for parser construction.
- types/infer.go:89-205 for operator precedence applied during inference.
Open Questions
- Union types in TypeRef. Adding union types as first-class type expressions would make let x: int | nil = ... parse. We need to decide between full union types and an int? shorthand.
- Tuple types. Heterogeneous fixed-length sequences have no surface today. We are not in a hurry, but if we ever add pattern destructuring of arbitrary product values we will need them.
- Ambiguity log. We resolve Ident '{' conflicts with lookahead. As the language grows, the cost of UseLookahead(999) will become noticeable. We should track the worst-case input.
References
- LINQ specification: https://learn.microsoft.com/dotnet/csharp/linq/.
Copyright
This document is placed in the public domain.