Nexevo.ai: Smart-routing LLM Gateway
100+ frontier models from 23+ providers (OpenAI / Anthropic / Google / DeepSeek / Qwen / Wan / Seedance / Moonshot / Zhipu / MiniMax and more) behind one OpenAI-compatible API. Our 4-layer routing picks the cheapest sufficient model per request — average 30-50% token savings, one API key, one bill.
- 100+ upstream models
- 23+ providers
- 30-50% avg token savings
- <50ms avg latency
- 99.9% SLA
4-layer Smart Routing Architecture
Not just "auto-pick a model" — a complete scheduler with online learning, ELO calibration, and cascade cost optimization
Capability-aware Matching
Match by 80 specialty tags + difficulty: code → Claude, math → o1, long Chinese context → Qwen — fine-grained per sub-skill
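A minimal sketch of tag-plus-difficulty matching as described above. The catalog entries, tag names, and the 1-5 difficulty scale are illustrative assumptions, not Nexevo.ai's actual data or implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    tags: frozenset          # specialty tags the model is strong at (hypothetical)
    max_difficulty: int      # hypothetical 1-5 scale


# Illustrative catalog only; real tag sets are not published in this page.
CATALOG = [
    ModelProfile("claude", frozenset({"code", "refactoring"}), 5),
    ModelProfile("o1", frozenset({"math", "proof"}), 5),
    ModelProfile("qwen", frozenset({"long-context", "chinese"}), 4),
]


def match(query_tags, difficulty, catalog=CATALOG):
    """Return the model name with the largest tag overlap among
    models able to handle the requested difficulty, or None."""
    eligible = [m for m in catalog if m.max_difficulty >= difficulty]
    if not eligible:
        return None
    best = max(eligible, key=lambda m: len(m.tags & set(query_tags)))
    return best.name
```

For example, `match({"code"}, 3)` selects the entry tagged for code, and a difficulty above every model's ceiling returns `None`.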
Bandit Online Learning
Thompson Sampling learns which model performs best per query type, with cost penalty. User feedback updates weights in real time — accuracy improves with usage
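The idea can be sketched as a Beta-Bernoulli Thompson Sampling bandit whose sampled score is reduced by a cost term. The class name, the `cost_weight` parameter, and the linear penalty are assumptions for illustration; in practice one such bandit would be kept per query type.

```python
import random


class BanditArm:
    """One candidate model with a Beta(1, 1) prior on its success rate."""

    def __init__(self, name, cost_per_call):
        self.name = name
        self.cost = cost_per_call
        self.successes = 1
        self.failures = 1

    def sample(self, rng, cost_weight):
        # Draw a plausible success rate, then subtract a cost penalty.
        quality = rng.betavariate(self.successes, self.failures)
        return quality - cost_weight * self.cost

    def update(self, positive_feedback):
        # User feedback shifts the posterior immediately.
        if positive_feedback:
            self.successes += 1
        else:
            self.failures += 1


def choose(arms, rng, cost_weight=0.1):
    """Pick the arm with the highest cost-penalized sample."""
    return max(arms, key=lambda arm: arm.sample(rng, cost_weight))
```

Each request calls `choose(...)`, and each user rating calls `update(...)` on the chosen arm, so arms with better feedback are sampled more often over time.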
ELO Calibration
Weekly ELO tournaments on golden prompts (K=24) keep relative model strength accurate. New models are auto-paired in Swiss rounds for data-driven tier placement
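As a worked example, a single rating update with K=24 might look like the following. The 400-point logistic scale is the standard Elo convention and is assumed here; the page only states the K value.

```python
K = 24  # update step size stated above


def expected_score(rating_a, rating_b):
    """Standard Elo expectation that A beats B (400-point scale assumed)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a, rating_b, score_a):
    """Return new (A, B) ratings. score_a is 1.0 for a win,
    0.5 for a draw, 0.0 for a loss on one golden prompt."""
    expected_a = expected_score(rating_a, rating_b)
    delta = K * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

Two models that both start at 1500 end at 1512 and 1488 after one decisive comparison, since the expected score between equals is 0.5.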
Cascade Cost Optimization
Try cheap tier first, accept if confidence ≥ 0.7, escalate only if needed. Average 30-50% token savings vs always-GPT-4, no quality loss
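The cascade rule can be sketched as a loop over tiers ordered from cheapest to most capable. How confidence is computed is not described here, so each tier is modeled as a hypothetical callable returning `(answer, confidence)`.

```python
def cascade(prompt, tiers, threshold=0.7):
    """Try tiers cheapest-first; accept the first answer whose
    confidence meets the threshold, else fall through to the last tier.
    Returns (answer, index_of_tier_used)."""
    if not tiers:
        raise ValueError("at least one tier is required")
    for index, tier in enumerate(tiers):
        answer, confidence = tier(prompt)
        is_last = index == len(tiers) - 1
        if confidence >= threshold or is_last:
            return answer, index


# Hypothetical stand-ins for real model calls.
def cheap_tier(prompt):
    return "draft answer", 0.55


def strong_tier(prompt):
    return "final answer", 0.92
```

With these stubs, `cascade("q", [cheap_tier, strong_tier])` escalates because 0.55 is below 0.7; a cheap tier reporting 0.7 or higher would be accepted without a second call.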
3 Pricing Tiers, Each with Purpose
Fully usage-based — no subscription, no minimum, no monthly fee
nexevo/fast
Flat-rate entry — simple Q&A / summarization / batch processing
- GPT-4o-mini / Claude Haiku tier
- All 4 routing layers active
- ~5000 calls / $5
nexevo/balanced
Flat-rate flagship — full smart routing enabled, fits most use cases
- Routes across GPT-4o / Claude Sonnet / DeepSeek-R1
- Cascade saves 30-50% automatically
- RAG / function calling / long context all supported
- Cache hits billed at 25% × (1+fee)
Passthrough
Specify exact model_id, transparent upstream cost. Volume discounts at $5k+/mo
- Pick any of 39+ models
- BYOK mode flat 5% fee
- Transparent upstream cost for accounting
$5 free credit on signup, no card required
Complete Capability Stack
100+ Upstream Models · 23+ Providers
OpenAI / Anthropic / Google / DeepSeek / Qwen / Moonshot / Zhipu / MiniMax / Wan / Seedance — 23+ providers, 100+ model variants, one integration
BYOK (Bring Your Own Key)
Use your own OpenAI / Anthropic key through our gateway. A flat 5% fee applies, upstream usage is not drawn from your main account balance, and you get the same smart routing
Multi-region Deployment
HK primary gateway + Shenzhen cn-shenzhen proxy (VPC private). Mainland models route via Shenzhen, overseas via HK — < 50ms latency
Cost Optimal
Cascade routing tries cheap first, cache hits at 25% × (1+fee), mainland models on mainland endpoints (cheaper). 40-60% combined savings
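One reading of "25% × (1+fee)" is that a cache hit is billed at a quarter of the upstream cost, with the gateway fee applied on top. The sketch below works through that arithmetic under this assumption; the function name and the example figures are illustrative.

```python
from decimal import Decimal

CACHE_HIT_RATE = Decimal("0.25")  # cache hits billed at 25% (from the text)


def billed_cost(upstream_cost, fee, cache_hit):
    """Assumed formula: upstream cost, discounted on a cache hit,
    then multiplied by (1 + fee)."""
    multiplier = CACHE_HIT_RATE if cache_hit else Decimal("1")
    return upstream_cost * multiplier * (1 + fee)


# Example: a request costing $0.004 upstream with a 5% fee.
hit = billed_cost(Decimal("0.004"), Decimal("0.05"), cache_hit=True)
miss = billed_cost(Decimal("0.004"), Decimal("0.05"), cache_hit=False)
```

Under these assumptions the cache hit is billed $0.00105 and the miss $0.0042, so the hit costs exactly one quarter of the miss.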
Enterprise Compliance
Stripe + crypto dual payment, 2FA + multi-admin RBAC, full hash-chain audit trail, automated PII scrubbing, opt-in data flywheel
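A hash-chain audit trail generally links each record to the hash of the previous one, so that editing any past entry invalidates everything after it. The following is a generic sketch using SHA-256 and JSON; the record layout is an assumption, not Nexevo.ai's actual schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash preceding the first entry


def _digest(prev_hash, event):
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain, event):
    """Append an event, binding it to the hash of the previous entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "event": event,
             "hash": _digest(prev_hash, event)}
    chain.append(entry)
    return entry


def verify(chain):
    """Recompute every link; any edited or reordered entry fails."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(entry["prev"], entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```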
Real-time Observability
Per-request trace + cost + cache hit + routing decision + rating in monitoring; Prometheus / Grafana ready, 99.9% SLA
Knowledge Base + RAG
Per-tenant Knowledge store (private), shared support KB (curated + chat-promoted), unified RAG on smart routing
Live Customer Support
Floating chat widget site-wide, RAG-powered AI replies + one-click human handoff + customer ratings, conversations auto-feed training corpus
Flexible Billing
No subscription, no minimum, token-based. 10-25% bonus on top-up, auto-recharge, monthly PDF invoice, per-key sub-quota and rate limits
One-line Migration
100% OpenAI compatible — existing OpenAI SDK / LangChain / LlamaIndex apps integrate in one base_url change. No rewrites.
from openai import OpenAI

client = OpenAI(
    api_key="nexevo-sk-xxx",  # create one at /dashboard/keys
    base_url="https://api.nexevo.ai/v1",  # ← change this line
)
response = client.chat.completions.create(
    model="nexevo/balanced",  # recommended tier, flat price $5/$20 per 1M
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

TypeScript SDK (@nexevo/sdk) + Python SDK (nexevo-ai) also available; curl / any OpenAI-compatible client works
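For clients without an OpenAI SDK, the same call can be sketched as a raw HTTP request using only the Python standard library. The `/chat/completions` path and Bearer header follow the usual OpenAI-compatible convention and are assumed to apply here; the helper name is hypothetical, and the request is built but not sent.

```python
import json
import urllib.request


def build_chat_request(api_key, prompt, model="nexevo/balanced",
                       base_url="https://api.nexevo.ai/v1"):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("nexevo-sk-xxx", "Hello")
# To send it: urllib.request.urlopen(request) and parse the JSON response.
```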
vs. Calling Upstream APIs Directly
Why use Nexevo.ai instead of going straight to OpenAI / Claude
| Aspect | Direct upstream | Nexevo.ai |
|---|---|---|
| API key management | One per vendor: register / fund / maintain each separately | 1 key for 39+ models, one bill |
| Model selection | Manual — pay GPT-4 even for trivial tasks | Smart routing picks by difficulty, 30-50% savings |
| China access | Cross-border access is slow or unstable | Shenzhen proxy_cn routing, < 50ms |
| Team management | No sub-accounts, hard to govern | Per-key quotas / IP whitelist / monthly caps / multi-admin RBAC |
| Accounting | N invoices to reconcile | Single monthly PDF + audit chain |
| Failover | Upstream down → your app down | Auto fallback to peer model + circuit breaker |
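The failover row can be illustrated with a small circuit breaker in front of each model and a fallback loop over peers. The thresholds, class name, and cooldown behavior below are assumptions chosen for the sketch, not the gateway's actual settings.

```python
import time


class CircuitBreaker:
    """Opens after max_failures consecutive errors, then blocks calls
    until cooldown_s has elapsed."""

    def __init__(self, max_failures=3, cooldown_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def available(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown over: close the breaker and allow a trial call.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, ok):
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_fallback(prompt, models, breakers):
    """models: ordered list of (name, callable). Returns (name, result)
    from the first healthy model that succeeds."""
    for name, call in models:
        breaker = breakers[name]
        if not breaker.available():
            continue  # breaker open: skip without calling upstream
        try:
            result = call(prompt)
        except Exception:
            breaker.record(False)
            continue
        breaker.record(True)
        return name, result
    raise RuntimeError("all models unavailable")
```

Once the primary's breaker opens, requests go straight to the peer model instead of waiting on a failing upstream.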
Sign up in 5 seconds, get $5 free credit
No card required — enough for ~5000 GPT-4o-mini calls or ~50 DALL-E 3 images
No subscription · No minimum · You don't pay if you don't use it