Nexevo.ai
Nexevo.ai · Flagship Product

Smart-routing LLM Gateway

100+ frontier models from 23+ providers (OpenAI / Anthropic / Google / DeepSeek / Qwen / Wan / Seedance / Moonshot / Zhipu / MiniMax and more) behind one OpenAI-compatible API. Our 4-layer router picks the cheapest sufficient model for each request, averaging 30-50% token savings: one API key, one bill.

100+

Upstream models

23+

Providers

30-50%

Avg token savings

<50ms

Avg latency

99.9%

SLA

4-layer Smart Routing Architecture

Not just "auto-pick a model" — a complete scheduler with online learning, ELO calibration, and cascade cost optimization

Layer 1

Capability-aware Matching

Match by 80 specialty tags + difficulty: code → Claude, math → o1, long Chinese context → Qwen — fine-grained per sub-skill
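A minimal sketch of how tag-plus-difficulty matching could work. The tag names, model names, and difficulty scale below are illustrative assumptions, not Nexevo's actual registry:

```python
# Hypothetical Layer 1 registry: each model advertises specialty tags
# and a difficulty ceiling. All values here are illustrative.
MODEL_TAGS = {
    "claude-sonnet": {"tags": {"code", "refactoring"}, "max_difficulty": 4},
    "o1":            {"tags": {"math", "proofs"}, "max_difficulty": 5},
    "qwen-long":     {"tags": {"chinese", "long-context"}, "max_difficulty": 3},
}

def match_models(request_tags, difficulty):
    """Return candidate models whose tags overlap the request and whose
    difficulty ceiling covers it, best tag overlap first."""
    scored = []
    for name, caps in MODEL_TAGS.items():
        overlap = len(caps["tags"] & request_tags)
        if overlap and difficulty <= caps["max_difficulty"]:
            scored.append((overlap, name))
    return [name for _, name in sorted(scored, reverse=True)]

print(match_models({"code"}, difficulty=3))  # → ['claude-sonnet']
```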

Layer 2

Bandit Online Learning

Thompson Sampling learns which model performs best per query type, with a cost penalty. User feedback updates the weights in real time, so accuracy improves with usage
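Layer 2 can be sketched as a Beta-Bernoulli Thompson Sampling loop with a cost penalty subtracted from the sampled reward. The model names, prices, and `cost_weight` below are illustrative assumptions:

```python
import random

class BanditRouter:
    """Thompson Sampling over models for one query type (sketch)."""

    def __init__(self, costs, cost_weight=0.1):
        self.costs = costs                       # $ per 1M tokens (illustrative)
        self.stats = {m: [1, 1] for m in costs}  # Beta(alpha, beta) priors
        self.cost_weight = cost_weight

    def pick(self):
        # Sample a success rate per model, penalize by cost, take the best.
        def score(model):
            alpha, beta = self.stats[model]
            return random.betavariate(alpha, beta) - self.cost_weight * self.costs[model]
        return max(self.costs, key=score)

    def feedback(self, model, good):
        # A user thumbs-up/down updates the posterior in real time.
        self.stats[model][0 if good else 1] += 1
```

Cheaper models win ties, and a run of bad feedback on a model quickly shifts traffic away from it.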

Layer 3

ELO Calibration

Weekly ELO tournaments on golden prompts (K=24) keep relative model strength accurate. New models are auto-paired into Swiss rounds for data-driven tier placement
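The weekly calibration reduces to the standard Elo update with K = 24; the ratings in the example are illustrative:

```python
K = 24  # K-factor stated above

def expected(r_a, r_b):
    """Expected score of A against B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a):
    """score_a: 1 win, 0.5 draw, 0 loss for model A on a golden prompt."""
    e_a = expected(r_a, r_b)
    return r_a + K * (score_a - e_a), r_b + K * ((1 - score_a) - (1 - e_a))

# Equal-rated models, A wins: A gains 12 points, B loses 12.
print(elo_update(1500, 1500, 1))  # → (1512.0, 1488.0)
```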

Layer 4

Cascade Cost Optimization

Try cheap tier first, accept if confidence ≥ 0.7, escalate only if needed. Average 30-50% token savings vs always-GPT-4, no quality loss
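The cascade can be sketched as a loop over tiers with a 0.7 confidence gate. The model names and the confidence source are assumptions for illustration; a real gateway might derive confidence from logprobs or a verifier model:

```python
THRESHOLD = 0.7  # acceptance threshold stated above

def cascade(prompt, call_model, models=("gpt-4o-mini", "gpt-4o")):
    """call_model(model, prompt) -> (answer, confidence in [0, 1]).

    Try cheap tiers in order; accept the first confident answer,
    otherwise fall through to the most capable (most expensive) model.
    """
    for model in models[:-1]:
        answer, confidence = call_model(model, prompt)
        if confidence >= THRESHOLD:
            return model, answer        # cheap answer was sufficient
    answer, _ = call_model(models[-1], prompt)
    return models[-1], answer           # escalated to the top tier
```

When most traffic is easy, most requests never reach the expensive tier, which is where the quoted 30-50% savings would come from.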

3 Pricing Tiers, Each with Purpose

Fully usage-based — no subscription, no minimum, no monthly fee

nexevo/fast

$0.80 input / $2.00 output per 1M tokens

Flat-rate entry — simple Q&A / summarization / batch processing

  • GPT-4o-mini / Claude Haiku tier
  • All 4 routing layers active
  • ~5000 calls / $5
Recommended

nexevo/balanced

$5.00 input / $20.00 output per 1M tokens

Flat-rate flagship — full smart routing enabled, fits most use cases

  • Routes across GPT-4o / Claude Sonnet / DeepSeek-R1
  • Cascade saves 30-50% automatically
  • RAG / function calling / long context all supported
  • Cache hits billed at 25% × (1+fee)
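For concreteness, here is the cache-hit billing rule worked through in code. The 5% fee value is borrowed from the passthrough tier and is an assumption here:

```python
INPUT_PRICE = 5.00   # $ per 1M input tokens (nexevo/balanced)
CACHE_RATE = 0.25    # cache hits billed at 25% of the input price
SERVICE_FEE = 0.05   # assumed fee value, per the passthrough tier

def cached_input_cost(tokens):
    """Billed cost for cached input tokens: 25% x (1 + fee)."""
    return tokens / 1_000_000 * INPUT_PRICE * CACHE_RATE * (1 + SERVICE_FEE)

# 1M cached input tokens on the balanced tier:
print(f"${cached_input_cost(1_000_000):.4f}")
```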

Passthrough

Upstream price + 5% service fee

Specify an exact model_id for transparent upstream cost. Volume discounts at $5k+/mo

  • Pick any of 100+ models
  • BYOK mode flat 5% fee
  • Transparent upstream cost for accounting
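The passthrough billing rule (upstream price plus a flat 5% fee) works out as follows; the upstream rates in the example are placeholders, not quotes:

```python
SERVICE_FEE = 0.05  # flat passthrough fee stated above

def passthrough_cost(input_toks, output_toks, in_price, out_price):
    """Prices in $ per 1M tokens; returns the billed total."""
    upstream = (input_toks * in_price + output_toks * out_price) / 1_000_000
    return upstream * (1 + SERVICE_FEE)

# e.g. 2M input / 1M output at placeholder rates of $2.50 / $10.00 per 1M:
print(round(passthrough_cost(2_000_000, 1_000_000, 2.50, 10.00), 2))
```

The upstream component is exactly what the provider would have charged, which is what makes the line item easy to reconcile for accounting.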

$5 free credit on signup, no card required

Complete Capability Stack

100+ Upstream Models · 23+ Providers

OpenAI / Anthropic / Google / DeepSeek / Qwen / Moonshot / Zhipu / MiniMax / Wan / Seedance — 23+ providers, 100+ model variants, one integration

BYOK (Bring Your Own Key)

Use your own OpenAI / Anthropic key through our gateway: a flat 5% fee, no drawdown of your Nexevo credit balance, and the same smart routing

Multi-region Deployment

HK primary gateway + Shenzhen (cn-shenzhen) proxy on a private VPC. Mainland models route via Shenzhen, overseas models via HK, keeping latency under 50ms

Cost Optimal

Cascade routing tries cheap models first, cache hits are billed at 25% × (1+fee), and mainland models run on cheaper mainland endpoints. 40-60% combined savings

Enterprise Compliance

Stripe + crypto dual payment, 2FA + multi-admin RBAC, full hash-chain audit trail, automated PII scrubbing, opt-in data flywheel

Real-time Observability

Per-request trace + cost + cache hit + routing decision + rating in monitoring; Prometheus / Grafana ready, 99.9% SLA

Knowledge Base + RAG

Per-tenant Knowledge store (private), shared support KB (curated + chat-promoted), unified RAG on smart routing

Live Customer Support

Floating chat widget site-wide, RAG-powered AI replies + one-click human handoff + customer ratings, conversations auto-feed training corpus

Flexible Billing

No subscription, no minimum, token-based. 10-25% bonus on top-up, auto-recharge, monthly PDF invoice, per-key sub-quota and rate limits

One-line Migration

100% OpenAI compatible: existing OpenAI SDK / LangChain / LlamaIndex apps migrate with a single base_url change. No rewrites.

from openai import OpenAI

client = OpenAI(
    api_key="nexevo-sk-xxx",                    # created at /dashboard/keys
    base_url="https://api.nexevo.ai/v1",        # ← change only this line
)

response = client.chat.completions.create(
    model="nexevo/balanced",                    # recommended tier, flat $5/$20 per 1M
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

TypeScript SDK (@nexevo/sdk) + Python SDK (nexevo-ai) also available; curl / any OpenAI-compatible client works

vs. Calling Upstream APIs Directly

Why use Nexevo.ai instead of going straight to OpenAI / Claude

| Aspect | Direct upstream | Nexevo.ai |
| --- | --- | --- |
| API key management | One per vendor: register, fund, and maintain each separately | 1 key for 100+ models, one bill |
| Model selection | Manual: pay GPT-4 rates even for trivial tasks | Smart routing picks by difficulty, 30-50% savings |
| China access | Cross-border, slow / unstable | Shenzhen proxy_cn routing, < 50ms |
| Team management | No sub-accounts, hard to govern | Per-key quotas / IP whitelist / monthly caps / multi-admin RBAC |
| Accounting | N invoices to reconcile | Single monthly PDF + audit chain |
| Failover | Upstream down → your app down | Auto fallback to peer model + circuit breaker |
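The failover behavior in the table can be sketched as a try-next-peer loop, with a simple failure counter standing in for a real circuit breaker; all names are illustrative:

```python
class Fallback:
    """Try models in order; skip any whose circuit has tripped (sketch)."""

    def __init__(self, models, max_failures=3):
        self.models = models
        self.failures = {m: 0 for m in models}
        self.max_failures = max_failures

    def call(self, fn, prompt):
        """fn(model, prompt) -> answer; raises on upstream failure."""
        last_err = None
        for model in self.models:
            if self.failures[model] >= self.max_failures:
                continue                      # circuit open: skip this model
            try:
                return model, fn(model, prompt)
            except Exception as err:
                self.failures[model] += 1     # count toward tripping the circuit
                last_err = err
        raise RuntimeError("all upstreams failed") from last_err
```

A production breaker would also reset counts after a cool-down so a recovered upstream rejoins the rotation.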

Sign up in 5 seconds, get $5 free credit

No card required — enough for ~5000 GPT-4o-mini calls or ~50 DALL-E 3 images

No subscription · No minimum · You don't pay if you don't use it

Sign up free