Subtitling, Workflow

Automatic subtitling: How AI is rewriting the process

Jeroen Verbeeck
October 28, 2021

The demand for automatic subtitling in the audiovisual industry is exploding, and if you don’t invest in it now, you risk being left behind by your competitors. We spend more time than ever watching screens, and most of the video content we see on social media and entertainment platforms is captioned; it has become the de facto standard. Using AI in an “expert in the loop” workflow, you can cut costs by 80% or more. In this way, automatic subtitling can reduce cycle time to a minimum while maintaining consistently high quality.

The importance of good subtitling: Korean speakers claim the English closed captions for the Korean-language drama Squid Game are “so bad” that the original meaning is often lost.

Why the huge growth in demand?

The reasons for the exponential surge in demand for subtitling are manifold. Across the globe, from the US ADA (Americans with Disabilities Act) to the EU EAA (European Accessibility Act) and beyond, there is a legal obligation to offer accessible on-demand content. While the regulations are driven by inclusivity requirements for the hearing impaired – according to the World Health Organisation, 430 million people suffer from disabling hearing loss – the reasons why subtitling is so important reach even further.

Subtitles enable you to maximise your audience: they adapt to a range of contexts, audiences and media, and they reach people you would not reach otherwise, for example speakers of other languages. Several studies show that around 85% of people who view videos on Facebook do so with the sound off, and research from Facebook itself claims that adding captions to your video can boost view time by 12%. A survey of U.S. consumers even found that 92% view videos with the sound off on mobile, and 83% do so in general. Videos with subtitles record far more views on social media than any other content.

Key question: how can you deliver quality automatic subtitling without overspending?

Automatic subtitling: a complex process

A substantial part of the work consists of audio transcription, and this step can be automated very effectively.

But subtitles are not just a transcript cut into pieces of text of two lines and 40 characters per line. Subtitling does not only consist of translating a text from a source language into a target language; it also involves a shift from spoken to written language, removing verbal tics and incorporating nuance, natural audio breaks and pauses. So if you’re using AI for speech-to-text transcription, it is really important to post-edit the transcript before considering spotting or translation. Pro tip: good transcription solutions like Happy Scribe, Trint and Limecraft offer an interface to do so.

Once the transcript has been reviewed, the spotting process begins. Taking the applicable style guide into account, the transcript has to be segmented into subtitles with line breaks positioned as well as possible, so that each subtitle can be read as quickly as possible and viewers can devote most of their attention to the images.
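To make the segmentation constraints concrete, here is a naive Python sketch that greedily wraps a post-edited transcript into subtitle blocks of at most two lines of 40 characters each. This is an illustration of the character-limit rules only, not how any particular tool implements spotting: real spotting must also respect timing, pauses, punctuation and syntactic boundaries, and the function names here are invented for the example.

```python
MAX_CHARS = 40   # characters per line (a common style-guide limit)
MAX_LINES = 2    # lines per subtitle

def wrap_lines(text: str, width: int = MAX_CHARS) -> list[str]:
    """Greedily wrap text at word boundaries, never exceeding `width`."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= width:
            current = candidate
        else:
            if current:
                lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines

def segment(transcript: str) -> list[str]:
    """Group wrapped lines into subtitles of at most MAX_LINES lines."""
    lines = wrap_lines(transcript)
    return ["\n".join(lines[i:i + MAX_LINES])
            for i in range(0, len(lines), MAX_LINES)]
```

A greedy wrap like this will happily split a sentence mid-clause, which is exactly what good spotting avoids; it shows why purely mechanical segmentation produces the “auto-generated” look discussed below.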

AI to the rescue

AI may be better and faster at turning audio into timed text and at cutting a transcript into properly segmented and timed subtitles, but it needs a little help with condensing spoken language into more compact written language. This is where the “expert in the loop” workflow comes in.

“AI software to partially automate the process is ever more intelligent but needs supervision”

Using AI in an “expert in the loop” workflow, you can cut costs by 80% or more and reduce cycle time to a minimum while maintaining consistently high quality. AI software that partially automates the process is ever more intelligent, but it needs supervision – whether you are an occasional producer looking for subtitles, a subtitling professional or a broadcaster, there will be cases where it is better to insource the process.

More and more players, from occasional producers right up to large broadcasters, are insourcing the process, helped by technology. But insourcing versus outsourcing is by far the most controversial issue in the domain, and something we explore further in this blog. It’s a delicate balance between cost, quality and turnaround time.

How do you deal with regional expressions, slang, and “translating” spoken language into written language? With an expert-in-the-loop workflow.

What’s the state of the art in automatic subtitling?

Automatic speech recognition nowadays runs four times faster than real time and has a word error rate (WER) of 2%–5%, depending on the quality of the speech. This is critical: 2% results in a reasonable post-editing time (2 minutes per minute of video), while 5% is the limit (6 minutes per minute or more).
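For readers unfamiliar with the metric: WER is the word-level edit distance (insertions, deletions and substitutions) between the ASR output and a reference transcript, divided by the number of reference words. A minimal Python sketch, with invented names and no claim about how any particular ASR vendor computes it:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0 if not hyp else float(len(hyp))
    # Rolling-row dynamic programming over the edit-distance matrix.
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev_diag = row[0]
        row[0] = i
        for j, h in enumerate(hyp, 1):
            old = row[j]
            row[j] = min(row[j] + 1,                # deletion
                         row[j - 1] + 1,            # insertion
                         prev_diag + (r != h))      # substitution or match
            prev_diag = old
    return row[-1] / len(ref)
```

At typical speech rates of roughly 150 words per minute, a 2% WER means about 3 wrong words per minute of audio to fix, while 5% means 7 or 8 – a rough way to see why post-editing time climbs so steeply with WER.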

Pro tip: use clear speech, complete sentences and no slang to contain the WER and minimise post-editing, whether insourced or outsourced. Also, displaying more than two lines of subtitle text on screen is risky: it may cover the image and damage the entire viewing experience.

Where machine translation is used, it is imperative to always follow up with a review/proofread by a native speaker.

Spotting is more subtle. Good subtitles draw as little attention as possible; bad subtitles look like they have been auto-generated. Limecraft is the only solution delivering automatic spotting of a quality similar to or better than a human’s.

Whether you’re a language professional looking to optimise your margins, a content agency producing corporate content with subtitles in one or more languages, a broadcaster looking to insource the closed-captioning process, or a language service provider seeking greater operational efficiency and shorter turnaround times, Limecraft’s AI-driven automatic subtitling can save you time and money.