What's served, how to call it, and how routes map to backends
Models served
Sign in with an account that holds the inference-consumer role and provision a key to see the live model list. Until then, use the route inventory below.
Routes
| Method | Path | Backend | Notes |
|---|---|---|---|
| POST | /v1/chat/completions | qwen3-35b | Primary chat model. Model override: Qwen3.6-35B-A3B. |
| POST | /v1/gemma/chat/completions | gemma-4-26b | Gemma chat. Model override: gemma-4-26B-A4B-it. |
| POST | /v1/completions | qwen3-35b | Legacy text-completions endpoint. |
| POST | /v1/embeddings | qwen3-embedding-4b (50%) + f2llm-v2-4b (50%) | Embeddings traffic is split 50/50 across the two embedding models. |
| GET | /v1/models | models-proxy | Lists every model the gateway can route to. Bypasses LLM validation. |
| POST | /v1/cloud/openai/* | openai | Direct passthrough to OpenAI (external). |
| POST | /v1/cloud/claude/* | anthropic | Direct passthrough to Anthropic Claude (external). |
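The local routes in the table above can be captured in a small shell helper so that scripts refer to a route by a short name rather than a hard-coded path. This is a minimal sketch: the paths are copied from the table, but the short names (`chat`, `gemma`, and so on), the `GATEWAY_BASE` variable, and both function names are hypothetical and not part of the gateway itself. The cloud passthrough routes are left out because their sub-paths are wildcards.

```shell
# Hypothetical helpers; only the paths come from the route table.
# GATEWAY_BASE is an assumed variable name, defaulting to the dev host.
GATEWAY_BASE="${GATEWAY_BASE:-https://agentgateway.dev.drai.auckland.ac.nz}"

# Map an assumed short route name to its gateway path.
route_path() {
  case "$1" in
    chat)        echo "/v1/chat/completions" ;;
    gemma)       echo "/v1/gemma/chat/completions" ;;
    completions) echo "/v1/completions" ;;
    embeddings)  echo "/v1/embeddings" ;;
    models)      echo "/v1/models" ;;
    *)           echo "unknown route: $1" >&2; return 1 ;;
  esac
}

# Print the full URL for a short route name, or fail for unknown names.
gateway_url() {
  _path="$(route_path "$1")" || return 1
  printf '%s%s\n' "$GATEWAY_BASE" "$_path"
}
```

For example, `curl "$(gateway_url models)" -H "Authorization: Bearer YOUR_API_KEY"` would issue the same request as the first quick-start command.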
Quick start (cURL)
# List every model the gateway can route to
curl https://agentgateway.dev.drai.auckland.ac.nz/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Chat completion on the primary chat route (qwen3-35b backend)
curl https://agentgateway.dev.drai.auckland.ac.nz/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Chat completion on the Gemma route (gemma-4-26b backend)
curl https://agentgateway.dev.drai.auckland.ac.nz/v1/gemma/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "default",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Embeddings (traffic is split 50/50 across the two embedding models)
curl https://agentgateway.dev.drai.auckland.ac.nz/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "input": "Embed this text."}'
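The POST calls above share the same headers, so they can be wrapped in one function that reads the key from the environment instead of pasting it into each command. This is a sketch under assumptions: `gateway_post`, `GATEWAY_BASE`, and `GATEWAY_API_KEY` are hypothetical names chosen for illustration, and the function only repeats the header pattern shown in the quick start.

```shell
# Assumed variable names; the default host is the one used in the quick start.
GATEWAY_BASE="${GATEWAY_BASE:-https://agentgateway.dev.drai.auckland.ac.nz}"

# gateway_post PATH JSON_BODY
# Sends an authenticated JSON POST to the gateway.
# Returns 2 without contacting the network if no API key is set.
gateway_post() {
  if [ -z "${GATEWAY_API_KEY:-}" ]; then
    echo "GATEWAY_API_KEY is not set" >&2
    return 2
  fi
  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
  curl --silent --show-error --fail \
    "${GATEWAY_BASE}$1" \
    -H "Authorization: Bearer ${GATEWAY_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$2"
}
```

With a key exported as `GATEWAY_API_KEY`, the embeddings call becomes `gateway_post /v1/embeddings '{"model": "default", "input": "Embed this text."}'`. Failing early on a missing key avoids sending a request with an empty bearer token.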
MCP servers & agents
agentgateway/backends.yaml),
they will appear here automatically.