Amara Captions - Caption Quality Standards

Last updated by julia.wind 2 months ago

Edited or Exact? 

In general, try to use the same wording as the speaker, but minor edits to what was said can sometimes make the captions more usable. However, be careful that when changing the text you do not change the meaning of what was said. Consider the following: 
You may choose to remove some filler words, such as “you know,” “well…”, “um”, or other non-essential information. However, sometimes a stutter or filler word can be meaningful (showing nervousness for example). 
When a speaker uses grammatically incorrect language or a dialect, it should be reflected in the captions. 
Words in another language should be captioned as spoken (formatted in italics if possible). Never translate words spoken in another language into English for the caption. If you cannot transcribe them, a label such as “[speaking French]” will suffice.
If speech is cut off or low quality and is impossible to understand, use the word “inaudible” in brackets, like so: “[inaudible]”.
Hesitations should be replicated in the captions - long pauses should be shown using ellipses (...), and smaller pauses should be shown through commas (,) or dashes ( – ). 

Symbols and Math 

Generally speaking, when someone says “one plus two”, we would usually write that down on a piece of paper as “1+2”, with the addition symbol. Or with a function, we would write “F of X” as “F(x)”. When captioning, however, we want to write what was said, not exactly what was meant. 
 
So, when you run into symbols, please caption them as the speaker says them. For example, “...two multiplied by 34…”, “...the function F prime of X turns out to equal pi!”

Line Length and Breaks 

Putting in line breaks at the correct places makes a huge difference in the quality of the experience for the viewer. 
Avoid breaking between a word and its modifier, in prepositional phrases, between first and last names or associated titles, or immediately after a conjunction. Instead, lines should be broken at punctuation or natural pauses in speech wherever possible. 
This way, the captioned experience is similar to how we naturally process language. See the Captioning Key - Line Division section for some helpful examples.
Incorrect: 
Mark pushed his black 
truck. 
 
Correct: 
Mark pushed 
his black truck. 
 
Incorrect:  
Mary scampered under 
the table. 
 
Correct:  
Mary scampered 
under the table.  

Helpful Guidelines For Timing: 

There should only be a maximum of 2 lines on the screen at a time. 
Don’t include more than 42 characters on a line (including spaces). 
Where possible, both lines should be about the same length. 
Don’t end and begin a sentence on the same line unless the sentences are extremely short. 
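
The two numeric limits above (at most 2 lines, at most 42 characters per line including spaces) can be checked mechanically. The following is a minimal Python sketch under stated assumptions: it is not part of Amara, and the names `check_caption`, `MAX_LINES`, and `MAX_CHARS_PER_LINE` are hypothetical. It does not judge where a line break falls, which still needs a human eye.

```python
from typing import List

MAX_LINES = 2             # maximum lines on screen at a time
MAX_CHARS_PER_LINE = 42   # spaces count toward the limit


def check_caption(text: str) -> List[str]:
    """Return a list of problems found in one caption box (empty if none)."""
    problems = []
    lines = text.split("\n")
    if len(lines) > MAX_LINES:
        problems.append(
            f"{len(lines)} lines on screen (maximum is {MAX_LINES})"
        )
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(
                f"line {number} has {len(line)} characters "
                f"(maximum is {MAX_CHARS_PER_LINE})"
            )
    return problems


# Example: the "correct" caption from the line-break section passes.
print(check_caption("Mark pushed\nhis black truck."))
```

An empty list means the caption box is within both limits.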

Time On Screen 

The speed of the speech can impact how long a caption is on the screen at a time. In general, we want to stick as close to the speaker’s timing as possible. This means you can typically ignore the error in Amara. 
 
However, if you feel like the time on screen is too short to be readable (some people simply talk too fast!), you can try to stretch the caption a touch to make it more readable. Use caution, though: we don't want to create a domino effect where many other captions would have to be adjusted or would all end up off-timing.
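
One way to stretch a caption without triggering the domino effect is to never extend it past the start of the next caption. The sketch below illustrates that idea in Python; it is an assumption about how one might apply the guidance, not a feature of Amara, and the function name `stretch_end` and its parameters are hypothetical. Times are in seconds.

```python
from typing import Optional


def stretch_end(end: float, next_start: Optional[float], extra: float) -> float:
    """Extend a caption's end time by up to `extra` seconds.

    The new end time never goes past the start of the next caption,
    so no later caption has to be moved. If there is no next caption,
    the full extension is applied.
    """
    wanted = end + extra
    if next_start is None:
        return wanted
    # Clamp to the next caption's start, but never shorten the caption.
    return max(end, min(wanted, next_start))


# The caption ends at 5.0 s and the next one starts at 5.4 s,
# so only 0.4 s of the requested 1.0 s can be added.
print(stretch_end(5.0, 5.4, 1.0))
```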

Formatting 

Screaming and shouting should be shown using ALL CAPS. Capitalize proper nouns such as names, places, and organizations. 
Italicized text should be used: 
When a speaker is quoting someone else. 
For words and phrases that aren’t in the primary language of the video. 
When a person is thinking or daydreaming. 
For offscreen speech.  
Note: In Amara, italicized text won’t carry over to YouTube. Do your best to differentiate these, but don’t stress!
For numbers from one to ten, spell out the numbers. For numbers over ten, use numerals.
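
The number rule can be expressed as a small lookup. This is a minimal Python sketch; the name `format_number` is hypothetical. The guideline only covers one to ten and numbers over ten, so leaving zero and negative numbers as numerals is an assumption made here.

```python
NUMBER_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def format_number(n: int) -> str:
    """Spell out one through ten; use numerals for everything else.

    Assumption: values outside 1-10 (including zero and negatives,
    which the guideline does not mention) are written as numerals.
    """
    return NUMBER_WORDS.get(n, str(n))


print(format_number(3), format_number(10), format_number(11))
```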

Speaker Identification/Changes 

It is important for viewers to understand who is speaking in the captions. Here are some guidelines to consider:  
Start a new line or use labels when a new speaker begins talking. 
If a speaker is visible on screen, introduces themselves, or their name is shown on screen, no identification is needed.  
If it is visually unclear who is speaking, consider identifying the speaker on a line before their speech:
  MR. SMITH: Let's eat lunch.
  OFFSCREEN NARRATOR: This is the next video in our series.
Once a speaker has been identified, we recommend using two arrows to indicate when a speaker changes:
  >> Hi Doctor, nice to meet you.
  >> General, pleasure.
However, if the speaker has been identified but is not on-screen, be sure to still identify them.   

Non-Speech Elements 

Non-speech elements include music without lyrics, sounds, or the absence of sound and can be very important to convey meaning in a video. 
Generally, music or sound effects should be included in their own caption box so they appear separately on a screen rather than within text. 
Non-speech elements should be presented within square brackets “[ ]” 
If a non-speech element, such as music or no audio, lasts through the entire video, you can simply add one caption box at the beginning of the video containing the non-speech element followed by an ellipsis, and adjust the timing so it stays on screen for 15 seconds.
  [Upbeat music playing...] 
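
For illustration, here is how such a single 15-second caption box might look as a SubRip (SRT) cue. This is a sketch only: the choice of SRT as the format is an assumption made for the example, and the helper names `srt_timestamp` and `non_speech_cue` are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def non_speech_cue(description: str, duration: float = 15.0) -> str:
    """Build one SRT cue at the start of the video for a non-speech element.

    The description is wrapped in square brackets and followed by an
    ellipsis, and the cue stays on screen for `duration` seconds.
    """
    return (
        "1\n"
        f"{srt_timestamp(0)} --> {srt_timestamp(duration)}\n"
        f"[{description}...]\n"
    )


print(non_speech_cue("Upbeat music playing"))
```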

Music 

Music is often used to communicate a mood or theme. 
When possible, the title and composer of a musical work should be included.
  [Enya playing "Orinoco Flow"]
For unknown music, the style or presentation of the music should be described objectively.
  [Calm piano music] 
If the message of song lyrics is important, they should be captioned word for word. Surround lyrics with music icons (♪). 
Nonessential background music that doesn’t contribute to the mood can be ignored. 
There is additional helpful guidance in the Captioning Key - Music section.

Other Non-speech 

There may be other non-speech communication in a video that is important to include in captions. 
If speech contains an important emotional cue that is not shown on screen, you can indicate it in the captions. For example:
  [sorrowful]
  My dog died.
If it appears that people are talking, but they actually aren’t, it would be important to let the viewer know that no speech is happening:
  [silence]  
Sound effects can be very important when helping someone who is deaf or hard of hearing understand important context in a video. However, we also want to be careful to only include sound effects that are necessary or helpful in understanding or enjoying the video. When deciding whether to include a sound effect or not, you may want to ask yourself if you think most people would want to read about that sound effect or if it adds any value to the video. 
We recommend reviewing the Captioning Key - Sound Effects section for specific examples.

Additional Resources 

If you would like to learn more about any of the above caption standards or other specific situations, we recommend the below resources for additional reading: 