From the team

Blog

Product updates, engineering deep dives, and thought leadership from the Parasail team.

Product

Parasail and Wafer AI: Faster models, lower costs

Parasail and Wafer AI are partnering to make frontier AI cheaper and more accessible. Wafer optimizes models to do more with less compute. Parasail serves them reliably at scale. Developers get the most efficient versions of the best open models, instantly accessible via API. Kimi K2.6 NVFP4 is the first release.

Team Parasail · Apr 30, 2026

Engineering

Making an EAGLE fly: How We Got 2.6x Faster LLM Inference (Without Cheating)

We trained a custom EAGLE-3 speculative decoding head for OLMo-3.1-32B-Think and got 2.6x faster inference. Same model, same weights, same outputs, just faster. A single B200 running our setup outperformed two H200s running without it. This post walks through the full pipeline: dataset prep, 40 TiB of captured hidden states, hyperparameter sweeps that mostly didn't matter, and the inference-time tuning that turned promising training curves into real production throughput.

Gabriel Perácio · Apr 28, 2026

Engineering

Making Cold Start Latencies go Brrrr: A Multi-pronged Approach (Part 1)

On an inference platform serving hundreds of models, cold-start latency is often orders of magnitude higher than steady-state latency. In Part 1 of this series, we walk through how we combined fastsafetensors, O_DIRECT, and io_uring to get fast cold starts and fast warm starts on the same stack.

Meghana Madhyastha · Apr 20, 2026

More posts coming soon.