Vocabulary Hints: Fix Whisper Misrecognitions
WisperCode Team · January 21, 2026 · 8 min read
TL;DR: Whisper sometimes misrecognizes technical terms, proper nouns, and domain jargon. Vocabulary hints tell the model what words to expect, dramatically improving accuracy for specialized terms without retraining the model.
Why Whisper Misrecognizes Some Words
Whisper was trained on 680,000 hours of general internet audio, mostly podcasts, YouTube videos, audiobooks, and lectures. It knows common English words extremely well. Ask it to transcribe a casual conversation and it will deliver near-perfect results. But the moment you start speaking about Kubernetes deployments, PostgreSQL migrations, or HIPAA compliance, accuracy drops.
The reason is statistical. When Whisper's decoder generates text, it picks the most probable next word based on what it learned during training. If the model heard "Kubernetes" a thousand times in its training data, it will transcribe the term correctly. The problem is that many technical terms, brand names, and domain-specific phrases appeared rarely or never in Whisper's training set. When the model encounters an unfamiliar term, it defaults to the closest-sounding common phrase it does know, which is how "Kubernetes" becomes "Cooper Netties."
This is the same phenomenon you experience when talking to someone who has never heard a particular word. You say "Vercel" and they hear "herself." You say "Prisma" and they hear "prism of." The listener is not broken. They simply lack the context to know that specific word exists.
Whisper struggles most with:
- Technical terms: Kubernetes, PostgreSQL, Terraform, Nginx, gRPC
- Brand and product names: WisperCode, Vercel, Supabase, Datadog
- Medical and legal terminology: metformin, colonoscopy, habeas corpus, amicus curiae
- Acronyms and abbreviations: CI/CD, HIPAA, EBITDA, RBAC, ICD-10
- Names of people and places: Unusual proper nouns that do not appear in common speech
The good news is that you do not need to retrain the model or accept poor accuracy. Vocabulary hints solve this problem at inference time.
What Are Vocabulary Hints?
Vocabulary hints provide the Whisper model with a list of expected words and phrases before transcription begins. This biases the model toward recognizing these terms when it encounters ambiguous audio, similar to telling a human listener "I am going to talk about Kubernetes and PostgreSQL" before a conversation starts. The listener now knows those words exist and will correctly identify them when spoken.
Common Misrecognition Examples
Here is what happens without vocabulary hints versus with them. These are real examples that Whisper users encounter regularly.
| What You Said | What Whisper Heard | With Vocabulary Hint |
|---|---|---|
| Kubernetes | Cooper Netties | Kubernetes |
| PostgreSQL | Post Gress Equal | PostgreSQL |
| WisperCode | Whisper Code | WisperCode |
| HIPAA | hippo | HIPAA |
| Figma | fig ma | Figma |
| Tailwind CSS | tailwind sees SS | Tailwind CSS |
| Supabase | super base | Supabase |
| Nginx | engine X | Nginx |
| gRPC | G R P C | gRPC |
| Terraform | terra form | Terraform |
The pattern is consistent. Without hints, Whisper breaks unfamiliar terms into common English fragments. With hints, the model recognizes the intended term because it knows the term exists in this conversation's context.
How to Add Vocabulary Hints in WisperCode
Adding vocabulary hints takes less than a minute.
Step 1: Open WisperCode and go to Settings.
Step 2: Navigate to the Dictionary tab.
Step 3: Click Add Term and type the word or phrase exactly as you want it transcribed. Capitalization matters. If you want "PostgreSQL" with that exact casing, type it that way.
Step 4: Save and test. Hold your hotkey, say the term, and confirm it appears correctly.
Tips for adding terms:
- Add both the full term and its common abbreviation. Add "Kubernetes" and "K8s" as separate entries.
- Multi-word phrases work. "Tailwind CSS," "Visual Studio Code," and "Amazon Web Services" are all valid entries.
- Use the bulk import option if you have a large list. Paste one term per line and WisperCode adds them all at once.
You do not need to restart the application after adding hints. They take effect on your next dictation.
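The bulk import convention above (one term per line) is easy to sketch. The function below is a hypothetical illustration of the cleanup such an importer might perform, such as trimming whitespace, dropping blank lines, and deduplicating case-insensitively while preserving the casing you typed; it is not WisperCode's actual implementation:

```python
def parse_bulk_terms(pasted_text):
    """Normalize a pasted one-term-per-line vocabulary list.

    Trims whitespace, drops blank lines, and deduplicates
    case-insensitively while keeping the exact casing of the
    first occurrence (casing doubles as a formatting guide).
    """
    seen = set()
    terms = []
    for line in pasted_text.splitlines():
        term = line.strip()
        if term and term.lower() not in seen:
            seen.add(term.lower())
            terms.append(term)
    return terms
```

For example, pasting `"Kubernetes\n  gRPC \n\nkubernetes\nTerraform"` yields `["Kubernetes", "gRPC", "Terraform"]`: the duplicate lowercase entry is dropped and the original capitalization survives.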
Best Practices for Vocabulary Hints
- Add terms you use daily first. Start with the ten or twenty terms you say most often that Whisper gets wrong. You will see an immediate improvement in your daily workflow.
- Include proper capitalization. Whisper uses your hints as a formatting guide. If you add "GraphQL" with the capital G, Q, and L, the transcription will match that casing.
- Add common abbreviations AND full forms. If you say both "CI/CD" and "continuous integration continuous deployment," add both. The model benefits from seeing the full form even if you typically use the abbreviation.
- Group by domain. If you work across multiple fields, organize your terms mentally by domain: tech stack terms, client names, project names, medical terms, legal terms. This helps you identify gaps when accuracy is off.
- Do not overload the list. Focus on terms that Whisper actually misrecognizes. Adding hundreds of common English words that Whisper already handles well dilutes the signal. If Whisper already transcribes "JavaScript" correctly, you do not need to add it.
- Test after adding. Speak each term naturally after adding it and confirm the transcription is correct. Occasionally, hints need slight adjustments, such as adding a phonetic variation or removing a conflicting term.
Vocabulary Hints by Profession
Different professions have different problem words. Here are starter lists for common fields.
Software Developers: React, Vue, Svelte, Kubernetes, Docker, PostgreSQL, Redis, Nginx, webpack, GraphQL, Prisma, FastAPI, SQLAlchemy, NumPy, Terraform, Supabase, Vercel, gRPC, OAuth, JWT
For a complete developer setup including IDE-specific tips, read the developer voice dictation guide.
Medical Professionals: HIPAA, ICD-10, metformin, colonoscopy, laparoscopic, hemoglobin, echocardiogram, prednisone, dyspnea, tachycardia, auscultation, biopsy, thrombocytopenia
Legal Professionals: habeas corpus, amicus curiae, voir dire, certiorari, de novo, mens rea, prima facie, subpoena, adjudication, fiduciary, tort, deposition
If you handle confidential documents in these fields, see our guide on voice dictation for sensitive documents.
Financial Professionals: EBITDA, GAAP, fiduciary, amortization, derivative, securitization, LIBOR, yield curve, Sarbanes-Oxley, P/E ratio, EBIT, CAPEX, OPEX
How Hints Work Under the Hood
Whisper accepts an initial_prompt parameter that conditions the decoder before it begins generating text. When you add vocabulary hints in WisperCode, those terms are formatted and passed into this prompt. The model's beam search, the process by which it evaluates multiple possible transcriptions simultaneously, then assigns higher probability to token sequences that match your hint terms.
This is not fine-tuning. The model's weights are not modified. It is inference-time guidance, comparable to priming a language model with context before asking it a question. The model temporarily "knows" to expect these words, so when it encounters ambiguous audio that could be "Cooper Netties" or "Kubernetes," the probability shifts toward "Kubernetes" because that term was provided in the prompt.
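As a concrete sketch, here is how the same mechanism looks when calling the open-source `whisper` Python package directly. The term list, the audio file name, and the "Glossary:" phrasing are illustrative assumptions, not WisperCode's internal format:

```python
def format_hint_prompt(terms):
    """Join vocabulary hints into a short, sentence-like prompt.

    Whisper treats initial_prompt as preceding context, so a
    natural-sounding glossary tends to work better than a bare
    word dump.
    """
    return "Glossary: " + ", ".join(terms) + "."

def transcribe_with_hints(audio_path, terms, model_size="base"):
    """Transcribe audio with vocabulary hints via initial_prompt.

    Assumes the open-source `whisper` package is installed
    (`pip install openai-whisper`) and that audio_path exists.
    """
    import whisper
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path,
                              initial_prompt=format_hint_prompt(terms))
    return result["text"]
```

A call like `transcribe_with_hints("dictation.wav", ["Kubernetes", "PostgreSQL", "gRPC"])` conditions the decoder on those terms before it generates a single token of the transcript.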
The effect is strongest for terms that are phonetically similar to common words. Whisper already handles completely unique-sounding terms reasonably well. The biggest improvements come from terms where the spoken audio genuinely sounds like a common English phrase to an untrained ear.
For a broader explanation of how Whisper processes audio from start to finish, see What is OpenAI Whisper.
Frequently Asked Questions
How many vocabulary hints can I add?
There is no hard limit in WisperCode. Whisper's initial prompt does have a token limit (roughly 224 tokens), so WisperCode prioritizes the most relevant terms when the list is large. In practice, keeping your list focused on terms that Whisper actually misrecognizes, typically fifty to one hundred terms, delivers the best results. Adding thousands of terms provides diminishing returns and can occasionally cause the model to over-correct on words that were already fine.
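To illustrate the prioritization idea, here is a minimal sketch of greedily packing terms into the prompt until a token budget is reached. The roughly-four-characters-per-token estimate and the function itself are assumptions for illustration; a real implementation would count tokens with Whisper's own tokenizer and use its own ranking of "most relevant":

```python
def build_initial_prompt(terms, max_tokens=224):
    """Greedily pack hint terms under a rough token budget.

    Terms are assumed to be pre-sorted by relevance. Token cost
    is estimated crudely at ~1 token per 4 characters, plus 1
    for the separator; packing stops before the budget overflows.
    """
    packed = []
    used = 0
    for term in terms:
        cost = max(1, len(term) // 4) + 1  # +1 for ", " separator
        if used + cost > max_tokens:
            break
        packed.append(term)
        used += cost
    return ", ".join(packed)
```

With the default budget, a typical fifty-term list fits comfortably; a multi-thousand-term list would be cut off early, which is one reason a focused list outperforms an exhaustive one.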
Do hints slow down transcription?
No. The initial prompt is processed once at the start of each transcription. The computational cost is negligible. You will not notice any difference in speed whether you have zero hints or one hundred.
Can I share vocabulary hint lists?
Yes. WisperCode supports exporting and importing vocabulary lists. You can export your list as a file and share it with colleagues who work in the same domain. This is particularly useful for teams where everyone needs the same technical vocabulary. A team lead can set up a comprehensive list and distribute it to the entire team.
Do hints work with all Whisper model sizes?
Yes, vocabulary hints work with every model size from tiny to large-v3. However, larger models need fewer hints because they already handle uncommon terms more accurately due to their greater capacity. If you are using the base model, vocabulary hints are especially valuable. If you are using large-v3, you may only need hints for the most unusual terms in your vocabulary. For a comparison of how each model size performs, see Whisper model sizes compared.
Try WisperCode free during beta -> Download
Related Articles
What Is OpenAI Whisper? A Plain-English Guide
OpenAI Whisper is an open-source speech recognition model that runs locally on your device. Learn how it works, which model to pick, and why it matters for privacy.
February 7, 2026 · 15 min read
Why Local Speech Recognition Changes Everything
Cloud-based dictation is convenient. Local dictation is better. Here is why we bet everything on on-device processing.
February 5, 2026 · 13 min read
Voice Dictation Setup Guide for Mac and Windows
Step-by-step guide to setting up voice dictation on macOS and Windows using WisperCode. Covers installation, permissions, microphone setup, and optimization.
February 4, 2026 · 15 min read