From the team
Product updates, engineering deep dives, and thought leadership from the Parasail team.
Product
Parasail and Wafer AI are partnering to make frontier AI cheaper and more accessible. Wafer optimizes models to do more with less compute. Parasail serves them reliably at scale. Developers get the most efficient versions of the best open models, instantly accessible via API. Kimi K2.6 NVFP4 is the first release.
Engineering
We trained a custom EAGLE-3 speculative decoding head for OLMo-3.1-32B-Think and got 2.6x faster inference. Same model, same weights, same outputs, just faster. A single B200 running our setup outperformed two H200s without it. This post walks through the full pipeline: dataset prep, 40 TiB of captured hidden states, hyperparameter sweeps that mostly didn't matter, and the inference-time tuning that turned promising training curves into real production throughput.
On an inference platform serving hundreds of models, cold-start latency is often orders of magnitude higher than steady-state latency. In Part I of this series, we walk through how we combined fastsafetensors, O_DIRECT, and io_uring to get fast cold starts and fast warm starts on the same stack.