Listenr & Fine-Tuning: Building a Personal ASR Pipeline

The goal is simple: as an individual, build enough high-quality audio clips and transcriptions to meaningfully improve a speech recognition model — privately, cheaply, and with a scalable process.

Figure: the fine-tuning pipeline

That means solving three problems:

Data Collection

Capture: a frictionless way to record real-world conversational audio without sending anything to the cloud.
Label: automated transcription plus LLM-assisted post-processing to get clean, accurate ground-truth text.

Data Processing

Build: turn the disparate recordings and transcripts into useful datasets.

Fine-tuning

Train: a reproducible fine-tuning pipeline that runs on consumer hardware (especially AMD GPUs) and produces a model that actually works.
Test: a quantitative way to measure whether the results are useful.
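The overview does not name the metric used for the Test step. A common choice for speech recognition is word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration under simple assumptions (lowercasing and whitespace tokenization only; no punctuation handling). The function names are hypothetical, and a library such as jiwer offers a more complete implementation.

```python
def word_edits(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (edit_count, reference_word_count) using word-level Levenshtein distance.

    Assumption: normalization is just lowercasing plus whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)], len(ref)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance: (S + D + I) / N."""
    edits, n = word_edits(reference, hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return edits / n


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER: total edits over total reference words (not a mean of per-clip WERs)."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        edits, n = word_edits(reference, hypothesis)
        total_edits += edits
        total_words += n
    if total_words == 0:
        raise ValueError("references must contain at least one word in total")
    return total_edits / total_words


if __name__ == "__main__":
    reference = "the quick brown fox"
    print(word_error_rate(reference, "the quick brown box"))  # one substitution out of four words
    print(word_error_rate(reference, "the quick brown fox"))  # perfect transcript
```

Comparing the base model's and the fine-tuned model's `corpus_wer` on the same held-out clips gives a single number for whether fine-tuning helped. Aggregating edits across the whole set, rather than averaging per-clip scores, keeps very short clips from dominating the result.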

This series aims to document the whole journey so that others can do the same!


All Parts

How I locally fine-tuned Whisper using my own voice data and some effort

Fine-Tuning Whisper on AMD Hardware: ROCm, Docker, and Getting It Running

Fine-tune Whisper using Listenr and existing datasets