INFERENCE · OPENAI-COMPATIBLE
Drop-in. Decentralised.
Same SDKs you already use. SpaceRouter places the work on the cheapest available GPU on the network. Llama, DeepSeek, Mixtral, Qwen — plus vision, embeddings and voice.
Inference API
spacerouter.ai
OpenAI-compatible inference, routed across our decentralised GPU network. Same SDKs you already use. Lower prices. Models too large to serve from a single data centre.
Quickstart
One request to inference
curl https://spacerouter.ai/v1/chat/completions \
  -H "Authorization: Bearer $SPACEROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [
      { "role": "user", "content": "Hello from my agent" }
    ]
  }'
Base URL
https://spacerouter.ai/v1
Auth
Bearer token
Format
OpenAI-compatible
Models
Open models, decentralised serving
Requests are routed to the cheapest available GPU that meets each model's VRAM requirement. More models available on request.
Llama 3.1 8B
Fast general-purpose chat
Llama 3.1 70B
High-quality reasoning and chat
Llama 3.1 405B
Frontier-scale open model, served across multiple GPUs
Mistral 7B
Efficient instruction-following
Mixtral 8x7B
Mixture-of-experts, fast and capable
Mixtral 8x22B
Large MoE for complex tasks
DeepSeek V3
State-of-the-art open MoE
DeepSeek Coder V2
Code generation and completion
CodeLlama 34B
Code-specialised Llama variant
Phi-3 Mini
Small but capable, runs on modest GPUs
Qwen 2.5 72B
Multilingual reasoning model
Gemma 2 27B
Efficient mid-size chat
BGE Large
Text embedding model
LLaVA 1.6 34B
Vision-language model
Prices in $/M tokens (input / output). Final pricing on the pricing page.
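The per-million-token arithmetic works out as follows. This is an illustrative sketch only: the rates in the example are placeholders, not SpaceRouter's actual prices, which live on the pricing page.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Cost in dollars for one request, given $/M-token input and output rates."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# e.g. 1,200 input + 300 output tokens at placeholder rates of $0.60 / $0.80 per million:
cost = request_cost(1200, 300, 0.60, 0.80)
print(f"${cost:.6f}")  # → $0.000960
```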
Drop-in compatible
Use the OpenAI Python or TypeScript SDK. Just swap the base URL and API key.
Routed to cheapest GPU
SpaceRouter discovers nodes that can serve your model and picks the best price/latency trade-off.
Voice & embeddings
TTS metered per minute. Embeddings priced per million tokens. One API key works across all endpoints.
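An embeddings call against the same base URL and key can be sketched with the standard library alone. The model identifier "BAAI/bge-large-en-v1.5" is an assumption for the BGE Large entry above; check the models list for the exact name SpaceRouter exposes.

```python
# Build (but do not send) an OpenAI-style POST /v1/embeddings request.
import json
import os
import urllib.request

BASE_URL = "https://spacerouter.ai/v1"

def build_embeddings_request(texts: list) -> urllib.request.Request:
    """Assemble the request; urllib.request.urlopen(req) would send it."""
    body = json.dumps({
        "model": "BAAI/bge-large-en-v1.5",  # assumed identifier for BGE Large
        "input": texts,
    }).encode()
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ.get('SPACEROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embeddings_request(["hello world"])
# A successful response is OpenAI-shaped JSON:
# {"data": [{"embedding": [...], "index": 0}], "model": ..., "usage": ...}
```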