MEP 14. Query Algebra
| Field | Value |
|---|---|
| MEP | 14 |
| Title | Query Algebra |
| Author | Mochi core |
| Status | Informational |
| Type | Informational |
| Created | 2026-05-08 |
Abstract
Mochi has LINQ-shaped query expressions: from ... select with optional where, join, group, sort, skip, take, and distinct. This MEP documents the type rules per clause, the aggregate operators, the join cardinalities, and the soundness gaps that today's implementation accepts.
Motivation
Queries are how a lot of real Mochi programs are written. The clause type rules touch most of the type system: list types, group types, predicates, ordered values. A single document with the rules per clause keeps the implementation honest.
Specification
Surface
Mochi queries are LINQ-shaped. They start with a from and end with a select. The clause set:
from x in source
{ from y in source }*
{ [side] join z in source on cond }*
[ where pred ]
[ group by k1, ..., kn into g [ having pred ] ]
[ ( sort | order ) by expr ]
[ skip n ]
[ take n ]
select [ distinct ] proj
Required: from and select. Everything else is optional. The order is fixed by the grammar.
Type rules per clause
from x in xs requires xs : [T]. Inside the body of the query, x : T. T032 fires when the source is not a list.
Additional from y in ys clauses behave like nested loops. The projection sees both x and y in scope. There is no implicit cross-join elision; the result enumerates the cartesian product unless joined.
join z in zs on cond requires zs : [U] (T034) and cond : bool (T035). The join side modifier (left, right, outer) controls the cardinality of unmatched rows. A left join may produce z = null for unmatched outer rows; today this returns any because of MEP 10 A2.
where pred requires pred : bool (T033). The body filters rows.
group by k1, ..., kn into g partitions by the key tuple and binds g : group<K, T> where K is the key type (a tuple if multiple) and T is the source row type. The having clause requires bool (T042).
sort by expr. The expression must be ordered. Today the checker accepts any expression and the runtime sorts using a generic comparator. Negation prefix (-expr) sorts descending. We should add a fixture asserting an unordered type (e.g. struct) is rejected, but that requires the comparator to type fail rather than silently fail.
skip n and take n require n : int.
select expr returns [U] where U = ExprType(expr). Inside select the iteration variables are in scope.
select distinct expr is the same with deduplication. The element type must be hashable. Today the runtime falls back to string keys under the hood.
Aggregate operators
These are functions that take a list or a group and return a scalar:
count(g) : int. Accepts list or group.sum(g) : TwhereTis numeric.avg(g) : floatfor ints,bigratfor big numerics.min(g) : Tandmax(g) : Tfor orderedT.
T037, T038, T041 cover misuse of the aggregates.
Joins and cardinality
Inner join: rows with matching cond. Output cardinality is the intersection of left and right.
Left join: every left row appears, with null filling unmatched right-side bindings. Right join is symmetric. Outer join produces both sides.
Cardinality fixtures:
- Inner join on
idwith two rows on each side, two matches: result has two rows. - Left join with one unmatched left row: result has the matched row and one row with the right side
null. - Outer join with disjoint sides: result has the union of rows.
The current null behaviour falls under MEP 10 A2 (null is any). Once options are wired, the right-side bindings in a left join become Option<T> and pattern matching is required to access them.
Group plan and types
group by k1, ..., kn into g. Inside the rest of the query, g has type group<K, T> where K is the key tuple and T is the source row type. Aggregates over g are typed:
count(g) : intsum(g.field) : TifT : numericavg(g.field) : float | bigratmin(g.field) : T,max(g.field) : T
The group plan builder lives at types/query_plan_builder.go and emits a structured plan that the runtime consumes. The plan is also golden-tested under tests/queryplan/.
Distinct
select distinct e deduplicates by structural equality on e. Today the runtime canonicalises through a string form. We should add an explicit hashing path once the type system has hashable bounds, or constrain select distinct to types with a known canonical form (primitives, lists of primitives, structs of primitives).
sort by
sort by e orders the result by e ascending. sort by -e reverses. The expression should be of an ordered type:
- Numeric.
- String.
- Bool.
- Tuple of ordered types.
Today the checker accepts any expression. We should reject unordered expressions (structs without a declared comparator, list element types unless we lift the order to the lexicographic order) once the "ordered" trait is decided.
skip and take
Both require n : int and n >= 0. Negative values cause runtime errors. Adding a runtime guard or rejecting at type check time when the expression is a literal negative number would be a small win.
Soundness obligations
- A query whose source is not a list raises T032.
- A
wherewhose predicate is notboolraises T033. - A
havingwhose predicate is notboolraises T042. - An aggregate misuse raises T037, T038, or T041.
- The result type of a query is
[U]whereUis the projected type.
Targets:
- Reject sort expressions with unordered types.
- Reject join with non-list right-hand side.
- Reject
takeandskipwith non-int.
Fixtures to author
Existing:
query_source_not_list,join_on_not_bool,join_source_not_list,count_invalid_type,avg_non_numeric,query_basic_select,query_sort_take.
New:
query_select_struct_proj(valid).query_where_bool_required(error).query_take_int_required(error).query_skip_int_required(error).query_distinct_int(valid).query_distinct_struct_canonical(valid; pinned current behaviour).query_join_inner_cardinality(valid, snapshots row count).query_join_left_unmatched_null(valid, current behaviour pre A2).query_join_outer_both_sides(valid).query_group_having_bool_required(error).query_group_aggregate_int_sum(valid).query_group_aggregate_avg_float(valid).
Rationale
LINQ-shaped queries are familiar, simple to teach, and easy to translate to a relational plan. We keep the clause order fixed because that simplifies the parser and makes the plan obvious.
The null-from-left-join hack is a known wart. Once MEP 10 A2 lands, the right-side bindings become Option<T> and pattern matching is required to access them. We accept the wart in the meantime because changing it requires the option work.
Backwards Compatibility
Tightening sort, distinct, and skip/take to type-check at compile time is a breaking change for programs that rely on the loose runtime semantics. We pin the current behaviour in fixtures.
Reference Implementation
types/check.go— query type checking entry points.types/query_plan_builder.go— group plan construction.tests/queryplan/— golden tests for the plan.
Open Questions
- Ordered trait. Decide between an explicit interface for
min,max,sortor a hard-coded list of orderable kinds. - Hashable trait. Same question for
distinctand group keys. - Right-side
nullin left join. Tied to MEP 10 A2.
References
- LINQ specification: https://learn.microsoft.com/dotnet/csharp/linq/.
Copyright
This document is placed in the public domain.