Generative AI
Mochi treats language models as part of the language. The generate
keyword opens a block describing what to generate. The model keyword
names a provider configuration to reuse across calls. Tool calling,
structured output, streaming, and embeddings are all language features. No
SDK to import.
A first generation
let summary = generate text {
prompt: "Summarize the manual in one sentence."
}
print(summary)
generate text { ... } returns a string. Mochi sends the prompt to the
configured default model and waits for the response.
Set the default provider once with environment variables (OPENAI_API_KEY
and friends) and the call works without further configuration.
Configuring a model
A model block names a provider configuration. generate blocks then
refer to the model by name.
model fast {
provider: "openai"
name: "gpt-5.5-mini"
temperature: 0.3
max_tokens: 200
}
let summary = generate text {
model: "fast"
prompt: "Summarize the manual in one sentence."
}
Recognized fields:
| Field | Type | Notes |
|---|---|---|
| provider | string | "openai", "anthropic", "google", "ollama", etc. |
| name | string | Provider-specific model identifier (e.g. "gpt-5.5", "claude-opus-4.7", "gemini-2.5-pro"). |
| temperature | float | 0.0 to 2.0 typically. |
| max_tokens | int | Output cap. |
| top_p | float | Nucleus sampling. |
| seed | int | Deterministic sampling where supported. |
| base_url | string | Override the API base URL. |
| api_key_env | string | Name of the env var holding the key. |
Fields beyond these pass through to the provider untouched, so new provider features work without waiting for Mochi to model them.
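For instance, a field Mochi does not recognize (reasoning_effort here is illustrative) is forwarded verbatim:
model precise {
  provider: "openai"
  name: "gpt-5.5"
  temperature: 0.1
  // not a recognized Mochi field: forwarded to the provider unchanged
  reasoning_effort: "high"
}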
Multiple model blocks can co-exist. A program can route different
calls to different providers without changing the call sites:
model fast {
provider: "openai"
name: "gpt-5.5-mini"
temperature: 0.3
}
model deep {
provider: "anthropic"
name: "claude-opus-4.7"
temperature: 0.2
}
model cheap {
provider: "google"
name: "gemini-2.5-flash"
temperature: 0.4
}
let outline = generate text { model: "fast", prompt: "Outline a blog post." }
let polished = generate text { model: "deep", prompt: outline }
let tags = generate text { model: "cheap", prompt: "Tag: " + polished } as json
Prompts and system messages
prompt is the user message. system (optional) is the system
instruction.
let answer = generate text {
model: "fast"
system: "You are a terse assistant. Answer in one sentence."
prompt: "Why is bytecode useful?"
}
Prompts are ordinary strings and compose freely with concatenation:
let topic = "agents"
let intro = generate text {
prompt: "Write a one-paragraph intro to " + topic + " for a Mochi user."
}
Structured output
as json parses the response as JSON and returns a typed value:
type Plan {
title: string
steps: list<string>
}
let plan = generate text {
model: "fast"
prompt: "Output a plan with 'title' and 'steps' fields."
} as json
print(plan["title"])
for s in plan["steps"] {
print("•", s)
}
If the response cannot be parsed as JSON, the call raises an error
catchable with try / catch. Most providers honour an explicit JSON request
mode when as json is set; Mochi opts in automatically when the provider
supports it.
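A minimal sketch of guarding against a parse failure:
try {
  let plan = generate text {
    model: "fast"
    prompt: "Output a plan with 'title' and 'steps' fields."
  } as json
  print(plan["title"])
} catch err {
  print("response was not valid JSON:", err)
}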
To get a fully-typed result, annotate the binding:
let plan: Plan = generate text {
model: "fast"
prompt: "Output a plan with 'title' and 'steps' fields."
} as json
Tool calling
Functions can be exposed as tools the model is allowed to call
mid-generation. Pass a list of function values in the tools field. Each tool
takes a description that helps the model choose when to use it.
fun get_weather(city: string): string {
if city == "Paris" { return "sunny, 21°C" }
return "weather data unavailable"
}
let answer = generate text {
model: "fast"
prompt: "What is the weather in Paris?"
tools: [
get_weather { description: "Returns the current weather for a city." }
]
}
print(answer)
Mochi handles the tool-call protocol automatically: the model emits a tool call, Mochi invokes the function, sends the result back, and waits for the model to continue. Multi-step tool calls are supported.
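A sketch of a two-step chain, reusing get_weather above with an illustrative second tool:
fun suggest_outfit(weather: string): string {
  // stand-in logic for the example
  return "a light jacket suits " + weather
}
let advice = generate text {
  model: "fast"
  prompt: "What should I wear in Paris today?"
  tools: [
    get_weather { description: "Returns the current weather for a city." },
    suggest_outfit { description: "Suggests clothing given a weather report." }
  ]
}
print(advice)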
For agents, intents double as MCP-exposed tools. Pass an agent's intents the same way you would a free-standing function.
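A hypothetical sketch, assuming an agent helper with an intent search_docs (both names invented for illustration):
let answer = generate text {
  model: "fast"
  prompt: "Where is streaming documented?"
  tools: [
    helper.search_docs { description: "Searches the project documentation." }
  ]
}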
Multiple turns
For a multi-turn conversation, supply a messages list instead of a single
prompt:
let answer = generate text {
model: "fast"
messages: [
{ role: "system", content: "You are a Mochi mentor." },
{ role: "user", content: "How do I declare a struct?" },
{ role: "assistant", content: "With the `type` keyword." },
{ role: "user", content: "Show an example." }
]
}
The role and content fields map onto the provider's standard chat message
format.
Streaming responses
stream: true returns tokens as they arrive. The block returns a stream
the program iterates over:
for chunk in generate text {
model: "fast"
prompt: "Write a haiku about agents."
stream: true
} {
print(chunk)
}
Most providers support streaming. The chunks are concatenable strings.
Join them with + for the full response.
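To keep the full text while printing chunks, accumulate as you go (a sketch assuming a mutable var binding):
var full = ""
for chunk in generate text {
  model: "fast"
  prompt: "Write a haiku about agents."
  stream: true
} {
  print(chunk)
  full = full + chunk  // chunks concatenate into the complete response
}
print(full)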
Embeddings
generate embedding returns a list<float> instead of a string. It uses
the embedding model configured for the provider.
let vec = generate embedding {
text: "hello world"
normalize: true
}
print(len(vec)) // depends on the model: e.g. 1536
Use embeddings for similarity search, clustering, or retrieval-augmented
generation. The prelude provides cosine_similarity(a, b): float.
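For instance, comparing two embeddings directly:
let a = generate embedding { text: "bytecode interpreter" }
let b = generate embedding { text: "stack-based virtual machine" }
print(cosine_similarity(a, b))  // closer to 1.0 means more similar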
Caching
generate blocks cache results based on the input. Add a cache clause
to enable it:
let summary = generate text {
model: "fast"
prompt: "Summarize the changelog."
cache: true
}
A second call with the same prompt and model returns the cached value.
Caches are keyed by a hash of the request and stored under
~/.cache/mochi/llm by default. Override with the MOCHI_LLM_CACHE env
var.
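A quick illustration: the second call below is served from the cache rather than the provider:
let first = generate text { model: "fast", prompt: "Summarize the changelog.", cache: true }
let second = generate text { model: "fast", prompt: "Summarize the changelog.", cache: true } // cache hit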
Error handling
Network or provider errors are raised as catchable errors:
try {
let summary = generate text {
model: "fast"
prompt: "..."
}
print(summary)
} catch err {
print("generation failed:", err)
}
Models can also return safety-flagged or empty responses. Inspect the return value before using it.
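A minimal guard, using len on the returned string (assumed to work as it does on lists):
let summary = generate text {
  model: "fast"
  prompt: "Summarize the changelog."
}
if len(summary) == 0 {
  print("empty response; retry or fall back")
} else {
  print(summary)
}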
Common patterns
One-shot extraction
type Address { street: string, city: string, zip: string }
let address: Address = generate text {
model: "fast"
prompt: "Extract the address from: '123 Main St, Springfield, 90210'"
} as json
Tool-augmented Q&A
fun lookup_doc(query: string): string {
  // your retrieval logic; a stub return so the sketch runs
  return "no matching documentation for: " + query
}
let answer = generate text {
prompt: "Answer the user's question using the docs."
tools: [
lookup_doc { description: "Search the project documentation." }
]
}
Embedding-based retrieval
// Doc's fields are assumed for this example
type Doc { text: string, embedding: list<float> }
let docs: list<Doc> = load "docs.jsonl" as Doc
fun closest(query: string, n: int): list<Doc> {
let q = generate embedding { text: query }
let scored =
from d in docs
let score = cosine_similarity(q, d.embedding)
sort by -score
take n
select d
return scored
}
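Using it, with Doc declared as above:
let hits = closest("how do I stream responses?", 3)
for d in hits {
  print(d.text)
}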
Common errors
| Message | Cause | Fix |
|---|---|---|
| no provider configured | Missing API key or model block | Set the env var or declare a model. |
| cannot decode response as JSON | as json on a non-JSON response | Adjust the prompt; some providers need explicit JSON instructions. |
| tool call did not return | Tool function panicked | Inspect the panic; consider returning an error string instead. |
| streaming not supported | Provider does not stream | Drop stream: true. |