Swair's Writing

Mar 2026 ML

How Text-to-Speech Models Work

From raw waveforms to voice cloning — understanding Kokoro, CSM, and Pocket TTS from first principles. Neural audio codecs, vector quantization, flow matching, and why a 100M-parameter model can clone your voice from 5 seconds of audio.

Waveform → Codec → Tokens / Latents → Language Model → Speech

Mar 2026 Graphics

Building Ocean Sparkles from First Principles

Raymarching, halftone post-processing, and procedural sparkle generation in a single HTML file. From the mathematics of noise through the physics of Fresnel reflection to risograph-style rendering with blue noise dithering.

Noise & fBm → Raymarching → Kawase Blur → Halftone → Sparkles