At Parasail, we're dedicated to bringing you the best frontier open models with industry-leading latency, cost, and reliability, which is why we're excited to partner with Wafer AI.
Wafer AI optimizes the world's best open models to run faster and cheaper on the latest hardware, without sacrificing accuracy. And they move fast. Their recent release, Kimi K2.6 NVFP4, is the first publicly available NVFP4 quantization of the model. Customers are already running it at scale on Parasail, serving over 10 billion tokens a day in production.
In head-to-head testing on Kimi K2.6 running on a single 8×B200 node, NVFP4 outperforms INT4 across the board:
- Up to 58% more throughput per node
- Up to 43% faster token streaming for users
- Comparable accuracy across GSM8K and MMLU benchmarks
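To make the headline numbers concrete, here is a quick back-of-the-envelope sketch of what those uplifts imply; the INT4 baseline figures below are hypothetical stand-ins for illustration, not measured values:

```python
# Illustrative math only: the 10,000 tok/s node baseline and the 50 tok/s
# per-user streaming rate are assumed figures, not benchmark results.
int4_node_throughput = 10_000   # tokens/sec per 8xB200 node (assumed)
int4_stream_rate = 50           # tokens/sec seen by one user (assumed)

# "Up to 58% more throughput per node"
nvfp4_node_throughput = int4_node_throughput * 1.58

# "Up to 43% faster token streaming for users"
nvfp4_stream_rate = int4_stream_rate * 1.43

print(f"NVFP4 node throughput: {nvfp4_node_throughput:,.0f} tok/s")
print(f"NVFP4 per-user streaming: {nvfp4_stream_rate:.1f} tok/s")
```

In other words, at those assumed baselines, one node would move from 10,000 to roughly 15,800 tokens per second, and a user's stream from 50 to about 71 tokens per second.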

"At Wafer, we obsess over intelligence per watt. We optimize the world's best open models end-to-end, squeezing maximum performance from every layer of the stack. We selected Parasail as our preferred launch partner, so developers can put that work to use immediately, without worrying about the infrastructure layer." — Emilio Andere, CEO, Wafer AI
First Release: Kimi K2.6 NVFP4
Kimi K2.6 NVFP4 is our first joint release. As Wafer AI continues optimizing the world's best open models, Parasail will be the fastest way to run them at scale: production-ready in minutes, with no infrastructure management and no traffic limits.
Get Started
Kimi K2.6 NVFP4 is available on Parasail now, with more Wafer AI-optimized models coming soon, including Kimi K2.6 with speculative decoding for even faster token speeds. Contact us here if you want a sneak preview.