MEP 2. Grammar

Field	Value
MEP	2
Title	Grammar
Author	Mochi core
Status	Informational
Type	Informational
Created	2026-05-08

Abstract

Mochi's grammar is encoded directly in Go struct tags processed by participle/v2. There is no separate grammar file. This MEP captures the productions in BNF-like form, identifies the ambiguities that the parser resolves with lookahead, and lists the places where the grammar requires special attention from contributors.

Motivation

A grammar that lives on struct tags is convenient to evolve but easy to misread. The same struct field can be both an AST shape and a parse rule. New contributors need a single place that lists every production so they can reason about parse behaviour without scanning the whole parser.go file.

Specification

Build configuration

The parser is built at parser/parser.go:661-666:

var Parser = participle.MustBuild[Program](
    participle.Lexer(mochiLexer),
    participle.Elide("Whitespace", "Comment"),
    participle.Unquote("String"),
    participle.UseLookahead(999),
)

UseLookahead(999) lets the parser look up to 999 tokens ahead when an alternation is ambiguous, which is effectively unbounded for real programs. That is what lets the parser decide, for example, whether an Ident followed by { begins a StructLiteral or is the condition of an IfExpr whose body opens with a brace. The cost is silent backtracking and slower error reporting.

The participle tag language uses:

'lit' for keyword or punctuation literals.
@@ to recurse into a sub structure.
@Token to capture into the field.
[ ... ] for optional groups.
{ ... } for repetition.
| for alternation.
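
As a rough illustration of how these pieces combine on a struct, the sketch below declares a hypothetical node modelled on the LetStmt production and prints its tags with the standard reflect package. The struct names, field names, and exact tag strings are assumptions made for illustration and are not copied from parser/parser.go. participle itself is not imported, so the snippet runs with the standard library alone.

```go
package main

import (
	"fmt"
	"reflect"
)

// TypeRefSketch is a hypothetical stand-in for the real TypeRef node.
type TypeRefSketch struct {
	Name string `parser:"@Ident"`
}

// LetStmtSketch is a hypothetical node modelled on
//   LetStmt = 'let' Ident [ ':' TypeRef ] [ '=' Expr ]
// The Expr side is reduced to a single identifier to keep the sketch small.
type LetStmtSketch struct {
	Name  string         `parser:"'let' @Ident"`
	Type  *TypeRefSketch `parser:"[ ':' @@ ]"`
	Value *string        `parser:"[ '=' @Ident ]"`
}

// grammarOf lists "Field: tag" pairs, showing that each field is both
// an AST slot and a fragment of the parse rule.
func grammarOf(node any) []string {
	t := reflect.TypeOf(node)
	rules := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		rules = append(rules, f.Name+": "+f.Tag.Get("parser"))
	}
	return rules
}

func main() {
	for _, rule := range grammarOf(LetStmtSketch{}) {
		fmt.Println(rule)
	}
}
```

Reading the fields top to bottom reproduces the production: a literal and a captured token, then an optional group that recurses into a sub structure, then a second optional group.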

Top level

Program        = [ 'package' Ident ] Statement*

Statement      = TestBlock | BenchBlock | ExpectStmt
               | AgentDecl | StreamDecl | ModelDecl
               | ImportStmt | TypeDecl
               | ExternTypeDecl | ExternVarDecl | ExternFunDecl | ExternObjectDecl
               | FactStmt | RuleStmt | OnHandler | EmitStmt
               | LetStmt | VarStmt | AssignStmt | FunStmt
               | ReturnStmt | IfStmt | WhileStmt | ForStmt
               | BreakStmt | ContinueStmt
               | FetchStmt | UpdateStmt | ExprStmt

The discriminated union is expressed by parser/parser.go:73-104. The parser tries each alternative in order. Alternatives that do not start with a unique keyword need care: ExprStmt sits at the bottom of the list because it would otherwise consume tokens meant for a more specific form, and AssignStmt, which also begins with a bare Ident, is listed ahead of it.
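
The effect of ordering can be sketched with a toy first-match loop. Everything here is hypothetical: the alt type, the two simplified matchers, and firstMatch are illustrative names, and the toy has no backtracking, so it overstates what happens under UseLookahead(999). It only shows why a catch-all alternative placed early would claim input that belongs to a more specific form.

```go
package main

import "fmt"

// alt is one alternative in an ordered choice. match reports how many
// leading tokens the alternative accepts, or 0 if it does not apply.
type alt struct {
	name  string
	match func(toks []string) int
}

// isIdent is a simplified identifier test (ASCII letters and underscore).
func isIdent(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if !(r == '_' || (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')) {
			return false
		}
	}
	return true
}

// assignStmt is a toy AssignStmt: Ident '=' followed by one value token.
var assignStmt = alt{"AssignStmt", func(t []string) int {
	if len(t) >= 3 && isIdent(t[0]) && t[1] == "=" {
		return 3
	}
	return 0
}}

// exprStmt is a toy catch-all: any leading identifier is an expression.
var exprStmt = alt{"ExprStmt", func(t []string) int {
	if len(t) >= 1 && isIdent(t[0]) {
		return 1
	}
	return 0
}}

// firstMatch tries alternatives in order and returns the first that applies.
func firstMatch(alts []alt, toks []string) (string, int) {
	for _, a := range alts {
		if n := a.match(toks); n > 0 {
			return a.name, n
		}
	}
	return "", 0
}

func main() {
	toks := []string{"x", "=", "1"}

	name, n := firstMatch([]alt{assignStmt, exprStmt}, toks)
	fmt.Println(name, n) // AssignStmt 3

	// With the catch-all first, only "x" is consumed and "= 1" is stranded.
	name, n = firstMatch([]alt{exprStmt, assignStmt}, toks)
	fmt.Println(name, n) // ExprStmt 1
}
```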

Declarations

LetStmt   = 'let' Ident [ ':' TypeRef ] [ '=' Expr ]
VarStmt   = 'var' Ident [ ':' TypeRef ] [ '=' Expr ]
AssignStmt= Ident IndexOp* FieldOp* '=' Expr
FunStmt   = [ 'export' ] 'fun' Ident [ '<' Ident { ',' Ident } '>' ]
            '(' [ Param { ',' Param } ] ')' [ ':' TypeRef ]
            '{' Statement* '}'
TypeDecl  = 'type' Ident
            ( [ '=' ] '{' TypeMember { [','] TypeMember } [','] '}'
            | '=' TypeVariant { '|' TypeVariant }
            | '=' FunType | '=' GenericType )
TypeMember  = TypeField | FunStmt

TypeDecl is the most overloaded production. Three forms:

Struct: type Point { x: int, y: int } (the = prefix is also accepted: type Point = { ... }).
Tagged union / simple variant: type Shape = Circle(r: float) | Square(side: float). A bare name like type Id = int also lands in this path — int is parsed as a single nameless variant. The checker distinguishes a true alias from a single-variant declaration.
Function or generic alias: type F = fun(int) : int or type MyList = list<int>.

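The parser distinguishes the three by lookahead on the tokens that follow the type name: an opening {, with or without a leading =, selects the struct form, while the remaining forms all start with = and are told apart by what comes next.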
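
The decision can be approximated by a hand-written classifier over the tokens that follow type Name. This is a sketch under assumptions: classifyTypeDecl is a hypothetical helper, the real parser reaches the same outcome through participle's alternation and lookahead rather than a switch, and the token slices are pre-split by hand.

```go
package main

import "fmt"

// classifyTypeDecl inspects the tokens after `type Name` and names the
// TypeDecl form they select, following the production in this section.
func classifyTypeDecl(rest []string) string {
	if len(rest) == 0 {
		return "invalid"
	}
	if rest[0] == "{" {
		return "struct" // type Point { ... }
	}
	if rest[0] != "=" || len(rest) < 2 {
		return "invalid"
	}
	switch {
	case rest[1] == "{":
		return "struct" // type Point = { ... }
	case rest[1] == "fun":
		return "function alias" // type F = fun(int) : int
	case len(rest) >= 3 && rest[2] == "<":
		return "generic alias" // type MyList = list<int>
	default:
		// Includes the bare-name case `type Id = int`, which the text
		// above says is parsed as a single variant.
		return "variant"
	}
}

func main() {
	fmt.Println(classifyTypeDecl([]string{"{", "x", ":", "int", "}"}))
	fmt.Println(classifyTypeDecl([]string{"=", "Circle", "(", "r", ":", "float", ")"}))
	fmt.Println(classifyTypeDecl([]string{"=", "list", "<", "int", ">"}))
	fmt.Println(classifyTypeDecl([]string{"=", "int"}))
}
```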

Type references

TypeRef         = FunType | GenericType | InlineStructType | Ident
FunType         = 'fun' '(' [ TypeRef { ',' TypeRef } ] ')' [ ':' TypeRef ]
GenericType     = Ident '<' TypeRef { ',' TypeRef } '>'
InlineStructType= '{' [ TypeField { ',' TypeField } ] [','] '}'
TypeField       = Ident ':' TypeRef

There is no union type literal in TypeRef, only in TypeDecl. That is why let x: int | nil = ... will not parse. The fix is either to lift union into TypeRef or to introduce an option syntax such as int?. This is tracked in MEP 10.

There is no tuple type literal. Heterogeneous data must use a struct.

Expressions

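The expression grammar is layered from binary operators down to primaries, but binary operators themselves are parsed as one flat list rather than one rule per precedence level: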

Expr        = BinaryExpr
BinaryExpr  = Unary BinaryOp*
BinaryOp    = ( '==' | '!=' | '<' | '<=' | '>' | '>=' | '+' | '-'
              | '*' | '/' | '%' | 'in' | '&&' | '||'
              | 'union' | 'except' | 'intersect' ) [ 'all' ] PostfixExpr
Unary       = ( '-' | '!' )* PostfixExpr
PostfixExpr = Primary PostfixOp*
PostfixOp   = CallOp | IndexOp | FieldOp | CastOp
Primary     = StructLiteral | CallExpr | QueryExpr | LogicQueryExpr
            | IfExpr | SelectorExpr | ListLiteral | MapLiteral
            | FunExpr | MatchExpr | GenerateExpr | FetchExpr
            | LoadExpr | SaveExpr | Literal | '(' Expr ')'

The flat BinaryOp* list is rebalanced into a precedence tree by the type checker, not by the parser. See types/infer.go:89-198 for the precedence levels. The parser does not enforce associativity. That is a deliberate simplification that pushes the work to a single place.
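
A minimal sketch of such a rebalancing pass, using precedence climbing over the flat list, is shown here. The binOp type, the prec table, and rebalance are hypothetical: the actual levels live in types/infer.go and may differ, operands are plain strings instead of AST nodes, and every operator is treated as left associative. Operators missing from the table, such as union, are outside the sketch.

```go
package main

import "fmt"

// binOp mirrors the shape of the BinaryOp production: an operator
// followed by its right operand.
type binOp struct {
	op    string
	right string
}

// prec is an assumed precedence table; higher binds tighter.
var prec = map[string]int{
	"||": 1, "&&": 2,
	"==": 3, "!=": 3, "<": 3, "<=": 3, ">": 3, ">=": 3, "in": 3,
	"+": 4, "-": 4,
	"*": 5, "/": 5, "%": 5,
}

// rebalance folds "left op right op right ..." into a fully
// parenthesised string that makes the resulting tree visible.
func rebalance(left string, ops []binOp) string {
	pos := 0
	var climb func(lhs string, minPrec int) string
	climb = func(lhs string, minPrec int) string {
		for pos < len(ops) && prec[ops[pos].op] >= minPrec {
			op := ops[pos]
			pos++
			rhs := op.right
			// Tighter-binding operators to the right claim rhs first.
			for pos < len(ops) && prec[ops[pos].op] > prec[op.op] {
				rhs = climb(rhs, prec[ops[pos].op])
			}
			lhs = "(" + lhs + " " + op.op + " " + rhs + ")"
		}
		return lhs
	}
	return climb(left, 1)
}

func main() {
	// 1 + 2 * 3 - 4 as the parser would hand it over: one operand, flat ops.
	flat := []binOp{{"+", "2"}, {"*", "3"}, {"-", "4"}}
	fmt.Println(rebalance("1", flat)) // ((1 + (2 * 3)) - 4)
}
```

Keeping the table in one map is what makes the single-place design cheap to change: adding an operator to the grammar only requires one new entry here, not a new grammar level.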

Unary.Ops is a slice, so multiple prefixes stack. --x is parsed as two unary minuses, not as a decrement operator. !!x is two boolean nots.

PostfixOp is also a slice, so chains like f()(2).field as int work.

Queries

QueryExpr   = 'from' Ident 'in' Expr
              { FromClause }
              { JoinClause }
              [ 'where' Expr ]
              [ GroupByClause ]
              [ ( 'sort' | 'order' ) 'by' Expr ]
              [ 'skip' Expr ]
              [ 'take' Expr ]
              'select' [ 'distinct' ] Expr
JoinClause  = [ 'left' | 'right' | 'outer' ] 'join' [ 'from' ] Ident 'in'
              Expr 'on' Expr
GroupByClause = 'group' 'by' Expr { ',' Expr } 'into' Ident [ 'having' Expr ]

The fixed clause order matters. The grammar will not accept select ... from ... where ... because from must come first. This is LINQ shaped, not SQL shaped.
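
The ordering rule can be expressed as a rank check over the keyword that opens each clause. This is an illustrative sketch, not the parser's mechanism: rank and validClauseOrder are hypothetical names, the left, right, and outer join prefixes are ignored, and FromClause is assumed to open with from.

```go
package main

import "fmt"

// rank gives each clause keyword its slot in the QueryExpr production.
// 'sort' and 'order' share a slot because the grammar accepts either.
var rank = map[string]int{
	"from": 0, "join": 1, "where": 2, "group": 3,
	"sort": 4, "order": 4, "skip": 5, "take": 6, "select": 7,
}

// validClauseOrder reports whether the clause keywords appear in the
// order the grammar requires: 'from' first, 'select' last, nothing out
// of sequence, and only 'from' and 'join' repeated.
func validClauseOrder(clauses []string) bool {
	if len(clauses) < 2 || clauses[0] != "from" || clauses[len(clauses)-1] != "select" {
		return false
	}
	prev := -1
	for _, c := range clauses {
		r, ok := rank[c]
		if !ok {
			return false
		}
		repeatable := c == "from" || c == "join"
		if r < prev || (r == prev && !repeatable) {
			return false
		}
		prev = r
	}
	return true
}

func main() {
	fmt.Println(validClauseOrder([]string{"from", "where", "select"})) // true
	fmt.Println(validClauseOrder([]string{"select", "from", "where"})) // false
}
```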

Pattern matching

MatchExpr = 'match' Expr '{' MatchCase* '}'
MatchCase = Expr '=>' ( Expr | '{' Statement* '}' )

There is no separate pattern grammar. The pattern is just an expression. The checker is responsible for deciding whether a given expression shape is allowed as a pattern. See MEP 13 for the full rules.

Lambda expressions

FunExpr = 'fun' [ '<' Ident { ',' Ident } '>' ]
          '(' [ Param { ',' Param } ] ')'
          [ ':' TypeRef ]
          ( '{' Statement* '}' | '=>' Expr )

Both block and arrow bodies are accepted. The arrow form yields the expression as the function result.

Logic programming

FactStmt        = 'fact' LogicPredicate
RuleStmt        = 'rule' LogicPredicate ':-' LogicCond { ',' LogicCond }
LogicCond       = LogicPredicate | LogicNeq
LogicNeq        = Ident '!=' Ident
LogicPredicate  = Ident '(' [ LogicTerm { ',' LogicTerm } ] ')'
LogicTerm       = Ident | String | Int
LogicQueryExpr  = 'query' LogicPredicate

These productions parse, but the runtime VM does not execute them; only the interpreter handles them. See MEP 10.

Streams and agents

StreamDecl = 'stream' Ident '{' StreamField* '}'
EmitStmt   = 'emit' Ident '{' [ StructLitField { ',' StructLitField } ] [','] '}'
OnHandler  = 'on' Ident 'as' Ident '{' Statement* '}'
AgentDecl  = 'agent' Ident '{' AgentBlock* '}'
AgentBlock = LetStmt | VarStmt | AssignStmt | OnHandler | IntentDecl
IntentDecl = 'intent' Ident '(' [ Param { ',' Param } ] ')'
             [ ':' TypeRef ] '{' Statement* '}'

The same caveat applies as for logic programming: these forms are parsed and partially typed, but not compiled for the VM.

I/O statements

FetchStmt = 'fetch' Expr 'into' Ident [ 'with' Expr ]
LoadExpr  = 'load' [ String ] 'as' TypeRef [ 'with' Expr ]
SaveExpr  = 'save' Expr [ 'to' String ] [ 'with' Expr ]

load always takes an as TypeRef clause; save has none. For both, any non-default behaviour, such as an explicit format, has to be spelled out in the with map.

Rationale

Pushing precedence handling into the type checker means the parser is a single pass that produces a flat AST. That decision keeps the parser file readable, but it costs error quality, because a malformed precedence chain is often diagnosed late.

Lookahead 999 was chosen as effectively unbounded so we never have to think about backtracking when we add a new construct. The cost is harder-to-trace error messages when an alternative deep in the search tree fails.

Backwards Compatibility

Informational. No backward compatibility implications.

Reference Implementation

parser/parser.go:73-104 — top level statement union.
parser/parser.go:184-189 — TypeRef.
parser/parser.go:368, 373 — BinaryOp shape.
parser/parser.go:661-666 — parser construction.
types/infer.go:89-205 — operator precedence applied during inference.

Open Questions

Union types in TypeRef. Adding union types as first-class type expressions would allow let x: int | nil = ... to parse. We need to decide between full union types and an int? shorthand.
Tuple types. Heterogeneous fixed-length sequences have no surface today. We are not in a hurry, but if we ever add pattern destructuring of arbitrary product values we will need them.
Ambiguity log. We resolve Ident '{' conflicts with lookahead. As the language grows, the cost of UseLookahead(999) will become noticeable. We should track the worst-case input.

References

LINQ specification: https://learn.microsoft.com/dotnet/csharp/linq/.

Copyright

This document is placed in the public domain.
