Subtitling, Workflow

Automatic subtitling: How AI is rewriting the process

Jeroen Verbeeck
October 28, 2021

The demand for automatic subtitling in the audiovisual industry is exploding, and if you don’t invest in it now, you risk being left behind by your competitors. We spend more time than ever watching screens, and most of the video content we see on social media and entertainment platforms is captioned; it has become the de facto standard. Using AI in an “expert in the loop” workflow, you can cut costs by 80% or more. In this way, automatic subtitling can reduce cycle time to a minimum while maintaining consistently high quality.

The importance of good subtitling: Korean speakers claim the English closed captions for the Korean-language drama Squid Game are “so bad” that the original meaning is often lost.

Why the huge growth in demand?

The reasons for the exponential surge in demand for subtitling are manifold. Across the globe, from the US ADA (Americans with Disabilities Act) to the EU EAA (European Accessibility Act) and beyond, there is a legal obligation to offer accessible on-demand content. While the regulations are driven by inclusivity requirements for the hearing impaired – according to the World Health Organisation, 430 million people suffer from disabling hearing loss – the reasons why subtitling is so important reach even further.

Subtitles enable you to maximise your audience: they adapt to a range of contexts, audiences and media, and they reach people you would not reach otherwise, for example speakers of other languages. Several studies show that around 85% of people who view videos on Facebook do so with the sound off, and research from Facebook itself claims that adding captions to your video can boost view time by 12%. A survey of U.S. consumers even found that 92% view videos with the sound off on mobile, and 83% do so in general. Videos with subtitles record far more views on social media than any other content.

Key question: how can you deliver quality automatic subtitling without overspending?

Automatic subtitling: a complex process

A substantial part of the work consists of audio transcription, and this step can be automated very effectively.

But subtitles are not just a transcript cut into pieces of text of two lines and 40 characters per line. Subtitling does not only consist of translating a text from a source language into a target language; it also involves a shift from spoken to written language, removing verbal tics and incorporating nuance, natural audio breaks and pauses. So if you’re using AI for speech-to-text transcription, it is really important to post-edit the transcript before considering spotting or translation. Pro tip: good transcription solutions like Happy Scribe, Trint and Limecraft offer an interface to do so.

Once the transcript has been reviewed, the spotting process begins. Taking the applicable style guide into account, the transcript has to be segmented into subtitles with line breaks positioned as well as possible, so that each subtitle can be read as quickly as possible and viewers can devote most of their attention to the images.
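To make the segmentation constraints concrete, here is a naive Python sketch that greedily wraps a post-edited transcript into subtitle blocks of at most two lines of 40 characters each. This is an illustration of the character-limit rules only, not how any particular tool implements spotting: real spotting must also respect timing, pauses, punctuation and syntactic boundaries, and the function names here are invented for the example.

```python
MAX_CHARS = 40   # characters per line (a common style-guide limit)
MAX_LINES = 2    # lines per subtitle

def wrap_lines(text: str, width: int = MAX_CHARS) -> list[str]:
    """Greedily wrap text at word boundaries, never exceeding `width`."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= width:
            current = candidate
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

def segment(transcript: str) -> list[str]:
    """Group wrapped lines into subtitles of at most MAX_LINES lines."""
    lines = wrap_lines(transcript)
    return ["\n".join(lines[i:i + MAX_LINES])
            for i in range(0, len(lines), MAX_LINES)]
```

A greedy wrap like this will happily split a sentence mid-clause, which is exactly what good spotting avoids; it shows why purely mechanical segmentation produces the “auto-generated” look discussed below.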

AI to the rescue

AI may be better and faster at turning audio into timed text and at cutting a transcript into properly segmented and timed subtitles, but it needs a little help with condensing spoken language into more compact written language. This is where the “expert in the loop” workflow comes in.

“AI software to partially automate the process is ever more intelligent but needs supervision”

Using AI in an “expert in the loop” workflow, you can cut costs by 80% or more and reduce cycle time to a minimum while maintaining consistently high quality. AI software that partially automates the process is ever more intelligent, but it needs supervision – whether you are an occasional producer looking for subtitles, a subtitling professional or a broadcaster, there will be cases where it is better to insource the process.

More and more players, from occasional producers right up to large broadcasters, are insourcing the process, helped by technology. But insourcing versus outsourcing is by far the most controversial issue in the domain, and something we explore further in this blog. It’s a delicate balance between cost, quality and turnaround time.

How do you deal with regional expressions, slang, and “translating” spoken language into written language? With an expert-in-the-loop workflow.

What’s the state of the art in automatic subtitling?

Automatic speech recognition nowadays runs four times faster than real time and has a word error rate (WER) of 2%–5%, depending on the quality of the speech. This is critical: 2% results in a reasonable post-editing time (2 minutes per minute of video), while 5% is the limit (6 minutes per minute or more).
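For readers unfamiliar with the metric: WER is the word-level edit distance (insertions, deletions and substitutions) between the ASR output and a reference transcript, divided by the number of reference words. A minimal Python sketch, with invented names and no claim about how any particular ASR vendor computes it:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float(len(hyp))
    # Rolling-row dynamic programming over the edit-distance matrix.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag = row[0]
        row[0] = i
        for j, h in enumerate(hyp, 1):
            old = row[j]
            row[j] = min(row[j] + 1,                # deletion
                         row[j - 1] + 1,            # insertion
                         prev_diag + (r != h))      # substitution or match
            prev_diag = old
    return row[-1] / len(ref)
```

At typical speech rates of roughly 150 words per minute, a 2% WER means about 3 wrong words per minute of audio to fix, while 5% means 7 or 8 – a rough way to see why post-editing time climbs so steeply with WER.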

Pro tip: use clear speech, complete sentences and no slang to contain the WER and minimise post-editing, whether insourced or outsourced. Also, displaying more than two lines of subtitle text on screen is risky: it may cover the image and damage the entire viewing experience.

Where machine translation is used, it is imperative to always follow up with a review/proofread by a native speaker.

Spotting is more subtle. Good subtitles draw as little attention as possible; bad subtitles look like they have been auto-generated. Limecraft is the only solution delivering automatic spotting of a quality similar to or better than a human’s.

Whether you’re a language professional looking to optimise your margins, a content agency producing corporate content with subtitles in one or more languages, a broadcaster looking to insource the closed-captioning process, or a language service provider seeking greater operational efficiency and shorter turnaround times, Limecraft’s AI-driven automatic subtitling can save you time and money.