Skip to main content

MEP 14. Query Algebra

FieldValue
MEP14
TitleQuery Algebra
AuthorMochi core
StatusInformational
TypeInformational
Created2026-05-08

Abstract

Mochi has LINQ-shaped query expressions: from ... select with optional where, join, group, sort, skip, take, and distinct. This MEP documents the type rules per clause, the aggregate operators, the join cardinalities, and the soundness gaps that today's implementation accepts.

Motivation

Queries are how a lot of real Mochi programs are written. The clause type rules touch most of the type system: list types, group types, predicates, ordered values. A single document with the rules per clause keeps the implementation honest.

Specification

Surface

Mochi queries are LINQ-shaped. They start with a from and end with a select. The clause set:

from x in source
{ from y in source }*
{ [side] join z in source on cond }*
[ where pred ]
[ group by k1, ..., kn into g [ having pred ] ]
[ ( sort | order ) by expr ]
[ skip n ]
[ take n ]
select [ distinct ] proj

Required: from and select. Everything else is optional. The order is fixed by the grammar.

Type rules per clause

from x in xs requires xs : [T]. Inside the body of the query, x : T. T032 fires when the source is not a list.

Additional from y in ys clauses behave like nested loops. The projection sees both x and y in scope. There is no implicit cross-join elision; the result enumerates the cartesian product unless joined.

join z in zs on cond requires zs : [U] (T034) and cond : bool (T035). The join side modifier (left, right, outer) controls the cardinality of unmatched rows. A left join may produce z = null for unmatched outer rows; today this returns any because of MEP 10 A2.

where pred requires pred : bool (T033). The body filters rows.

group by k1, ..., kn into g partitions by the key tuple and binds g : group<K, T> where K is the key type (a tuple if multiple) and T is the source row type. The having clause requires bool (T042).

sort by expr. The expression must be ordered. Today the checker accepts any expression and the runtime sorts using a generic comparator. Negation prefix (-expr) sorts descending. We should add a fixture asserting an unordered type (e.g. struct) is rejected, but that requires the comparator to type fail rather than silently fail.

skip n and take n require n : int.

select expr returns [U] where U = ExprType(expr). Inside select the iteration variables are in scope.

select distinct expr is the same with deduplication. The element type must be hashable. Today the runtime falls back to string keys under the hood.

Aggregate operators

These are functions that take a list or a group and return a scalar:

  • count(g) : int. Accepts list or group.
  • sum(g) : T where T is numeric.
  • avg(g) : float for ints, bigrat for big numerics.
  • min(g) : T and max(g) : T for ordered T.

T037, T038, T041 cover misuse of the aggregates.

Joins and cardinality

Inner join: rows with matching cond. Output cardinality is the intersection of left and right.

Left join: every left row appears, with null filling unmatched right-side bindings. Right join is symmetric. Outer join produces both sides.

Cardinality fixtures:

  • Inner join on id with two rows on each side, two matches: result has two rows.
  • Left join with one unmatched left row: result has the matched row and one row with the right side null.
  • Outer join with disjoint sides: result has the union of rows.

The current null behaviour falls under MEP 10 A2 (null is any). Once options are wired, the right-side bindings in a left join become Option<T> and pattern matching is required to access them.

Group plan and types

group by k1, ..., kn into g. Inside the rest of the query, g has type group<K, T> where K is the key tuple and T is the source row type. Aggregates over g are typed:

  • count(g) : int
  • sum(g.field) : T if T : numeric
  • avg(g.field) : float | bigrat
  • min(g.field) : T, max(g.field) : T

The group plan builder lives at types/query_plan_builder.go and emits a structured plan that the runtime consumes. The plan is also golden-tested under tests/queryplan/.

Distinct

select distinct e deduplicates by structural equality on e. Today the runtime canonicalises through a string form. We should add an explicit hashing path once the type system has hashable bounds, or constrain select distinct to types with a known canonical form (primitives, lists of primitives, structs of primitives).

sort by

sort by e orders the result by e ascending. sort by -e reverses. The expression should be of an ordered type:

  • Numeric.
  • String.
  • Bool.
  • Tuple of ordered types.

Today the checker accepts any expression. We should reject unordered expressions (structs without a declared comparator, list element types unless we lift the order to the lexicographic order) once the "ordered" trait is decided.

skip and take

Both require n : int and n >= 0. Negative values cause runtime errors. Adding a runtime guard or rejecting at type check time when the expression is a literal negative number would be a small win.

Soundness obligations

  • A query whose source is not a list raises T032.
  • A where whose predicate is not bool raises T033.
  • A having whose predicate is not bool raises T042.
  • An aggregate misuse raises T037, T038, or T041.
  • The result type of a query is [U] where U is the projected type.

Targets:

  • Reject sort expressions with unordered types.
  • Reject join with non-list right-hand side.
  • Reject take and skip with non-int.

Fixtures to author

Existing:

  • query_source_not_list, join_on_not_bool, join_source_not_list, count_invalid_type, avg_non_numeric, query_basic_select, query_sort_take.

New:

  • query_select_struct_proj (valid).
  • query_where_bool_required (error).
  • query_take_int_required (error).
  • query_skip_int_required (error).
  • query_distinct_int (valid).
  • query_distinct_struct_canonical (valid; pinned current behaviour).
  • query_join_inner_cardinality (valid, snapshots row count).
  • query_join_left_unmatched_null (valid, current behaviour pre A2).
  • query_join_outer_both_sides (valid).
  • query_group_having_bool_required (error).
  • query_group_aggregate_int_sum (valid).
  • query_group_aggregate_avg_float (valid).

Rationale

LINQ-shaped queries are familiar, simple to teach, and easy to translate to a relational plan. We keep the clause order fixed because that simplifies the parser and makes the plan obvious.

The null-from-left-join hack is a known wart. Once MEP 10 A2 lands, the right-side bindings become Option<T> and pattern matching is required to access them. We accept the wart in the meantime because changing it requires the option work.

Backwards Compatibility

Tightening sort, distinct, and skip/take to type-check at compile time is a breaking change for programs that rely on the loose runtime semantics. We pin the current behaviour in fixtures.

Reference Implementation

  • types/check.go — query type checking entry points.
  • types/query_plan_builder.go — group plan construction.
  • tests/queryplan/ — golden tests for the plan.

Open Questions

  • Ordered trait. Decide between an explicit interface for min, max, sort or a hard-coded list of orderable kinds.
  • Hashable trait. Same question for distinct and group keys.
  • Right-side null in left join. Tied to MEP 10 A2.

References

This document is placed in the public domain.