Model Settings | 7 min read

Which transcription model should you use for Mac dictation?

A practical way to choose between Tiny, Base, Small, Medium, Large, and Turbo for local dictation and file transcription on Mac.

Model choice is one of those settings that looks technical until it affects your day.

Pick a model that is too light, and the transcript may need enough cleanup that dictation stops feeling worth it. Pick one that is too heavy, and every short note waits longer than it should. The best setting is not the biggest model you can run. It is the smallest model that gives you text you trust for the job in front of you.

That job changes. A quick note to yourself, a long client voice memo, an AI prompt with product names, and an interview clip do not need the same balance of speed, accuracy, and memory.

Here is a practical way to choose.

Start with what the transcript needs to become

Before changing model size, decide what the text is for.

If the transcript is a rough capture, speed matters. You can tolerate a few awkward phrases because the point is to get the thought onto the page before it disappears. That is true for quick notes, private drafts, loose outlines, and short AI prompt context you plan to edit anyway.

If the transcript is source material, accuracy matters more. Names, numbers, quotes, technical terms, and client details are harder to fix after the fact. A slower pass can be worth it when the transcript will become research notes, a customer follow-up, support context, a published quote, or a file you need to search later.

A useful rule:

  • Use faster models when the cost of cleanup is low.
  • Use larger models when the cost of mistakes is high.
  • Use Turbo when you want a strong default and your Mac handles it comfortably.

The setting should follow the work, not the other way around.
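
The rule above can be written as a small decision function. This is only a sketch: the function name, the three yes/no inputs, and the order in which they are checked are assumptions made for illustration, not anything SpeakLane exposes.

```python
def suggest_model_family(mistakes_are_costly: bool,
                         cleanup_is_cheap: bool,
                         mac_handles_turbo: bool) -> str:
    """Return a rough starting point based on the three bullets above."""
    # High cost of mistakes wins first: names, numbers, and quotes are hard to fix later.
    if mistakes_are_costly:
        return "larger (Medium or Large)"
    # Rough capture you will edit anyway: speed matters most.
    if cleanup_is_cheap:
        return "faster (Tiny or Base)"
    # Otherwise fall back to a default: Turbo if the Mac is comfortable, else Base.
    return "Turbo" if mac_handles_turbo else "Base"


if __name__ == "__main__":
    # A client voice memo with names and figures, on a Mac with headroom.
    print(suggest_model_family(True, False, True))
```

Treat the result as a place to begin testing, not a verdict. The article's own advice is to check the suggestion against the edits you actually make.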

What each model is good for

SpeakLane lets you download and switch local models from Settings > Models. The docs summarize the tradeoff like this:

  • Tiny is fastest and smallest, good for quick drafts.
  • Base balances speed and accuracy.
  • Small is better for longer or noisy audio.
  • Medium is more accurate, but slower and heavier.
  • Large gives the best overall accuracy and needs more memory.
  • Turbo is very fast with near-Large quality.

That list is a starting point, not a universal ranking for every recording.

Tiny can be the right choice if you mostly dictate short, casual thoughts and you want the transcript back quickly. It is especially useful when you are testing a hotkey habit, writing private scratch notes, or using voice for first-pass capture.

Base is a sensible baseline when Tiny is a little too rough but you still care about latency. If you are not sure where to begin, Base gives you a middle ground without making every sentence feel heavy.

Small is where many everyday recordings start to feel more dependable. It is a good candidate for longer dictations, noisy rooms, and file transcription where you want fewer obvious corrections.

Medium and Large are for moments where the transcript has more responsibility. Use them when the recording includes unusual names, product terms, quoted material, or anything you would rather review carefully once than repair sentence by sentence.

Turbo is worth testing as a daily driver if your Mac has the headroom. It can be a strong fit for people who dictate often and do not want to keep switching between a fast model for notes and a stronger model for important material.

Match the model to live dictation

Live dictation has a different feel from file transcription because you are waiting for the result while you work.

If you are using a push-to-talk hotkey, start with short sessions and a model that returns text quickly. One paragraph, one reply, one task description, one prompt section. The shorter the session, the easier it is to review and the less pressure there is to run the largest model every time.

For everyday live dictation, try this sequence:

  1. Start with Base or Turbo.
  2. Dictate a normal paragraph into the app where you actually write.
  3. Check how often you fix names, punctuation, repeated words, and technical phrases.
  4. Move down to a lighter model if the transcript is clean enough and you want faster output.
  5. Move up to a heavier model if cleanup is taking more time than the extra wait would.
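
Steps 4 and 5 amount to stepping one rung at a time along a size ladder. The sketch below assumes you can roughly estimate two numbers after a test paragraph: how long you spent fixing the text, and how much longer a heavier model would make you wait. The function name, parameters, and the choice to keep Turbo off the ladder are illustrative assumptions.

```python
# Lightest to heaviest, in the order the article lists them.
# Assumption: Turbo sits outside the ladder because it is described as a
# fast option with near-Large quality rather than a step in size.
LADDER = ["Tiny", "Base", "Small", "Medium", "Large"]


def next_model(current: str,
               clean_enough: bool,
               want_faster: bool,
               cleanup_seconds: float,
               extra_wait_seconds: float) -> str:
    """Suggest the model to try next after one test paragraph."""
    if current not in LADDER:
        return current  # e.g. Turbo: keep it and judge it on its own
    i = LADDER.index(current)
    # Step 4: clean enough and you want faster output, so move down one rung.
    if clean_enough and want_faster and i > 0:
        return LADDER[i - 1]
    # Step 5: cleanup costs more time than a heavier model would add, so move up.
    if cleanup_seconds > extra_wait_seconds and i < len(LADDER) - 1:
        return LADDER[i + 1]
    return current


if __name__ == "__main__":
    # 30 seconds of fixes versus an estimated 5 extra seconds of waiting.
    print(next_model("Base", False, False, 30.0, 5.0))
```

Moving one rung per test keeps the comparison honest: you change a single variable and can tell whether the difference came from the model.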

Do not evaluate the model from one perfect sentence. Test it on the kind of material you actually say: a customer note, a bug report, a planning thought, or a messy AI prompt with the details you usually skip when typing.

Use larger models for harder files

Existing recordings are less forgiving than live dictation. You cannot move closer to the microphone, slow down, or rephrase the first take. You have the file you have.

For file transcription, choose the model before importing the file. A short, clean voice memo may be fine with Base or Small. A long recording, a noisy clip, or an interview with names and specific details deserves a stronger pass.

Use a larger model when the file includes:

  • Product names, customer names, acronyms, or technical terms.
  • Audio recorded far from the microphone.
  • Background noise, room echo, or compressed audio.
  • Material you plan to quote or reuse.
  • A long recording where repeated small errors would be tiring to fix.

The larger model does not remove the need to review. It reduces the number of obvious problems you have to catch. That matters most when the transcript becomes something other people will read or rely on.
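
If you prefer a checklist to a judgment call, the list above can be turned into a quick pre-import check. This is a minimal sketch: the `FileTraits` class and its field names are hypothetical, and the "any one trait is enough" rule is a direct reading of the list rather than a tested threshold.

```python
from dataclasses import dataclass, fields


@dataclass
class FileTraits:
    """One flag per bullet in the list above (hypothetical field names)."""
    has_names_or_terms: bool = False
    recorded_far_from_mic: bool = False
    noisy_or_compressed: bool = False
    will_be_quoted_or_reused: bool = False
    long_recording: bool = False


def needs_larger_model(traits: FileTraits) -> bool:
    """True if the file shows at least one trait from the checklist."""
    return any(getattr(traits, f.name) for f in fields(traits))


if __name__ == "__main__":
    interview = FileTraits(has_names_or_terms=True, will_be_quoted_or_reused=True)
    print(needs_larger_model(interview))
```

A short, clean voice memo leaves every flag unset, which matches the earlier suggestion that Base or Small may be enough for it.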

Decide on English-only models deliberately

English-only models can be useful when your recordings are consistently in English. If most of your dictation is email, notes, prompts, and documentation in English, they are worth testing.

Be careful when your speech includes mixed-language phrases, names from other languages, or non-English source material. In those cases, an English-only model may not be the better fit even if the surrounding sentence is English.

The practical test is simple: run the same kind of phrase through both options and compare the corrections you actually make. The model that looks best on paper matters less than the one that handles your names, terms, and speaking style with fewer edits.
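
One way to make that comparison concrete is to save the raw transcript from each model, fix both by hand, and count how many words you had to change. The sketch below uses Python's standard `difflib` for the word-level diff. The function name and the sample sentences are made up for illustration, and the count is a rough proxy for effort, not a formal accuracy metric.

```python
import difflib


def count_word_edits(raw: str, corrected: str) -> int:
    """Count words changed, inserted, or deleted between raw and corrected text."""
    a, b = raw.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced span counts as the longer of its two sides.
            edits += max(i2 - i1, j2 - j1)
    return edits


if __name__ == "__main__":
    corrected = "send the report to John at Acme"
    # Hypothetical raw outputs from two model options for the same phrase.
    option_a = "send the report to jon at acme"
    option_b = "send the report to John at acme"
    print(count_word_edits(option_a, corrected))
    print(count_word_edits(option_b, corrected))
```

Run it on several phrases of the kind you really dictate. A single sentence can flatter either option.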

Tune performance before blaming accuracy

Model size is only one part of the result. The recording itself and the performance settings matter too.

In Settings, SpeakLane lets you control the active local model, filler word cleanup, Metal acceleration on Apple Silicon, CPU threads, output behavior, and history storage. If a model feels slow, check performance settings before assuming you need to drop all the way down. If a result is rough, check the microphone and recording conditions before assuming you need the largest model.

The boring inputs still matter:

  • Stay close enough to the microphone.
  • Reduce fan noise, keyboard noise, and room echo when possible.
  • Use short sessions for technical material.
  • Say unusual names clearly the first time.
  • Keep the source audio when you may need to verify a transcript later.

A cleaner recording on Small can beat a messy recording on Large. The model helps, but it cannot recover details that were never clear in the audio.

A simple default setup

If you do not want to think about model choice every day, use a two-setting routine.

Set your everyday live dictation model to Base, Small, or Turbo, depending on what feels responsive on your Mac. Use it for short notes, quick replies, AI prompt sections, and ordinary drafts.

Then switch to a stronger model for important files or difficult recordings. Use Small, Medium, Large, or Turbo when the audio is long, noisy, name-heavy, or likely to be reused.

That gives you a practical split:

  • Daily capture: fast enough that you keep using it.
  • Careful transcription: accurate enough that review does not become a chore.

You can make this more precise over time. After a week, look at the edits you keep making. If they are mostly small wording fixes, the model is probably fine. If you constantly repair names, technical terms, missed words, or sentence boundaries, move up. If you spend more time waiting than editing, move down or test Turbo.

The goal is not to find the most impressive model. The goal is to make voice feel reliable enough that you use it when the thought is still fresh.