
AI Subtitling for Short-Form Video via an Adobe Panel

Charlotte Coppejans
December 23, 2022

Emphasising its role as a digital innovator, VRT has implemented an online-first strategy. More content has to go out faster and has to reach a wider audience. Improving accessibility by creating subtitles for every asset is a cornerstone of that strategy.

To lower the cost and turnaround time of creating subtitles, VRT and Limecraft joined forces. They set up a co-creation project that is part of the STADIEM (Startup Driven Innovation in European Media) project.

In this fast-track project, running from August 2022 until January 2023, VRT and Limecraft are deploying tools to create more subtitles faster for short-form video. More specifically, they will investigate the feasibility and the added value of recycling subtitles that were originally produced for live content (if possible), the use of Artificial Intelligence (if needed), and what is required to plug these into the day-to-day workflow of video creators.

This blog discusses the objectives and key results of the project, as well as the implementation and the results to date.

Objectives and key results

VRT considers the availability of high-quality subtitles to be of critical importance. This is part of its digital-first strategy. Subtitles increase the accessibility of content to those who need it, render content consumption more convenient to those who want it, and are a signal of relevance for search engines.

Subtitled content is more accessible not only to deaf and hard-of-hearing people, but also to people on social media who watch without audio. On top of that, subtitled content ranks higher in search results.

To be as realistic as possible and take into account the short time to deliver results, VRT narrowed down the use case, exploring the different options that specifically apply to short-form video. VRT is looking for answers to the following three key questions:

  • Is it possible to use AI for translating the majority of the subtitles, so that video editors only have to check and correct them where needed?
  • Overall, is it possible to create high-quality subtitles for short-form videos at a substantially lower cost in less time?
  • Doing so, is it possible to increase the probability of delivering videos with high-quality subtitles?

The challenges and the proposed solution

Short-form video producers are particularly keen on publishing their content much faster than conventional broadcasters. For them, the speed (8 hours of work per hour of content) and the latency (days) of conventional subtitling, as well as the cost (€5-10 per minute depending on the type of audio), are prohibitive.

To address these three issues at the same time, several content producers and broadcasters worldwide (including VRT) are looking for technologies that can assist the professional video editor in the creation of subtitles. The assumption is that when the technology becomes good enough, video editors will accept a co-creation in which artificial intelligence does the grunt work and the editor takes care of the final touch.

As video editors always face the dilemma of either producing more content items or spending more time on the finishing, these AI-based technologies must offer more than a marginal improvement. They have to speed up the process disruptively, by a factor of 3 or 4, to become a real satisfier.

Hence the premise currently under investigation is that we lower the threshold for adopting high-quality subtitles for video editors who are not skilled as subtitle professionals:

1. by re-synchronising pre-existing subtitles initially created for live content (if possible), or

2. by using AI transcription and subtitling to deliver a first approximation of high-quality subtitles (if necessary), and

3. by offering these tools as a plugin or a panel in Adobe Premiere, so that editors do not have to switch or copy and paste between applications.

Implementation

3.1 Recycling of subtitles originally created for live content

For this part of the Proof of Concept, VRT supplied us with a corpus of 20 factual programmes that were subtitled live using respeak. The resulting subtitles are of reasonably good quality (roughly 1% of the subtitles may contain wrong interpretations), but they have a variable delay of around 10 seconds. The key question is to what extent these subtitles can be re-aligned automatically.

To illustrate the experiment, we use the example of De Afspraak, in which a panel discusses current affairs live for 50 minutes. We have the result of the live subtitling (respeak + correction) at our disposal as an STL file, containing 750 subtitles with a timing offset of roughly 10 seconds.

In the image below, we compared the transcript generated by AI transcription with an import of the existing subtitles. The offset of the live subtitles is around 10 seconds.

Screenshot of Limecraft to compare the AI transcript with subtitles created for the hard of hearing using 'respeak', showing that these typically have a delay of 10 seconds

We use a script to turn the STL file into flat text and submit this plain text file for realignment. Provided it contains enough words to be heuristically matched with the audio, we obtain a rendering of the audio that is 100% suitable for subtitling (the language doesn’t need further modification). After running the alignment function, 95% of the subtitles are perfectly timed. The remaining 5% are not timed correctly and need to be corrected manually.

Screenshot of Limecraft showing the result of processed subtitles that were initially created by 'respeak'. By realigning a flat text file and re-cutting it into subtitles, 95% of the subtitles are perfectly timed.
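To make the idea of realignment more concrete, here is a minimal sketch in Python. It is not the alignment function used in the project, whose internals are not described here. It assumes that the live subtitles have already been parsed from the STL file into a list of dictionaries, and that a transcription engine supplies word-level timings. The names `tokenize`, `realign`, `subtitles` and `asr_words` are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever matching heuristic a production system would use.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lower-case a subtitle text and split it into bare words."""
    return re.findall(r"\w+", text.lower())


def realign(subtitles, asr_words):
    """Retime subtitles by matching their words against timed ASR words.

    subtitles: list of dicts with "start", "end" (seconds) and "text".
    asr_words: list of (word, start, end) tuples; word-level timings
               from the transcription engine are an assumption.
    """
    # Flatten all subtitles into one word sequence, remembering for each
    # word which subtitle it came from.
    flat = [(tok, i) for i, sub in enumerate(subtitles)
            for tok in tokenize(sub["text"])]
    sub_tokens = [tok for tok, _ in flat]
    asr_tokens = [word.lower() for word, _, _ in asr_words]

    # Find runs of words that occur in the same order in both sequences.
    matcher = SequenceMatcher(None, sub_tokens, asr_tokens, autojunk=False)
    timings = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            sub_index = flat[block.a + k][1]
            _, start, end = asr_words[block.b + k]
            timings.setdefault(sub_index, []).append((start, end))

    retimed = []
    for i, sub in enumerate(subtitles):
        hits = timings.get(i)
        if hits:
            # First and last matched word define the new in and out points.
            retimed.append({"start": hits[0][0], "end": hits[-1][1],
                            "text": sub["text"], "aligned": True})
        else:
            # No word matched: keep the old timing, flag for manual review.
            retimed.append({**sub, "aligned": False})
    return retimed
```

Subtitles that share no words with the transcript keep their original timing and are flagged with `aligned: False`, which mirrors the small share of subtitles that still needs manual correction. A real implementation would also need a minimum number of matched words per subtitle and a step to re-cut the text into subtitles.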

The two main reasons why the timing sometimes drifts are the following:

  • the episode contains two intermezzi with non-native speakers and open captions. This confuses the text alignment algorithm, and in those passages the text and the audio are not properly matched;
  • on some occasions, the live subtitler has paraphrased the original audio significantly, which makes the result hard to match with the original audio.

During the remainder of the project, we will investigate how we can further optimise the process.

3.2 Using AI transcription and subtitling as a proxy

In cases where no pre-existing texts or subtitles are available, the next best alternative is using AI transcription and the subtitle spotting algorithm. For content with suitable audio (factual programming, a trained voice, grammatically correct language), we know from experience that roughly 90% of the subtitles match the quality of manually produced subtitles, saving 75% of the total turnaround time.

In the clip below, we compare the result of ASR and AI subtitling 36 months ago (in the middle of the screen) with state-of-the-art processing (bottom of the screen).

3.3 Making the results available to the editor using an Adobe Panel

One last and arguably the most important step is to make these technologies accessible to video editors who are not necessarily skilled as professional subtitle editors. To do so, taking into account the specific requirements of VRT, we are developing an Adobe Panel that video editors can use to start AI transcription and subtitling and, upon completion, to pull the results back into Adobe Premiere. As a result, video editors have an automated process at their fingertips, without having to leave their editing environment, that delivers a result which is over 90% accurate and thus good enough for finishing in Adobe.

Preliminary conclusions

  • Recycling subtitles produced for live content is very promising. The provisional results show that 95% of the subtitles are timed correctly, and the remainder can be retimed in roughly real time. We expect the turnaround time to be 1.5 to 2 times the length of the file, compared to 8 or 10 times. If we succeed in further improving the result e.g. by compensating for intermezzos with non-domestic audio and open captions,…
  • Automatic Speech Recognition and AI subtitling are especially useful when starting from studio-quality audio, a trained voice and grammatically correct language, in which case they save roughly 75% of the turnaround time.
  • Using the Adobe Panel, video editors now have AI transcription and subtitling at their fingertips without further software development, and without having to copy and paste between applications.
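The figures above can be related to the threshold mentioned earlier, a speed-up by a factor of 3 or 4. The following back-of-the-envelope sketch in Python does the arithmetic. The function names are hypothetical, and the inputs are the provisional estimates quoted in this post, not measured results.

```python
def turnaround_minutes(content_minutes, factor):
    """Working time when subtitling takes `factor` times the content length."""
    return content_minutes * factor


def speedup_range(manual_factors, assisted_factors):
    """Return the (most conservative, most optimistic) speed-up factor."""
    conservative = min(manual_factors) / max(assisted_factors)
    optimistic = max(manual_factors) / min(assisted_factors)
    return conservative, optimistic


def speedup_from_saving(saving):
    """Convert a fractional time saving (0.75 = 75%) into a speed-up factor."""
    return 1 / (1 - saving)


# Estimates quoted in this post: 8 to 10 times the file length for
# conventional subtitling, 1.5 to 2 times when recycling live subtitles.
manual = (8, 10)
recycled = (1.5, 2)

low, high = speedup_range(manual, recycled)
print(f"Recycled live subtitles: {low:.1f}x to {high:.1f}x faster")
print(f"AI subtitling at 75% saving: {speedup_from_saving(0.75):.1f}x faster")
```

For the 50-minute episode used as an example, this corresponds to 400 to 500 minutes of conventional subtitling work against an expected 75 to 100 minutes when recycling the live subtitles. Under these estimates, both approaches reach or exceed the factor of 3 or 4 that editors are assumed to need.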

Future Work

In the remainder of the project we will fine-tune the retiming algorithm and try to push the percentage of correctly timed subtitles as close as possible to 100%. We will also work on the styling of the Adobe Panel to make sure the look and feel fits the dark background of Adobe. And, last but not least, we will work to publish the Limecraft panel in the Adobe Exchange, so as to make it available to a wider audience of video professionals in need of AI-generated subtitles.

About VRT

VRT is the public broadcaster of the Flemish Community in Belgium. VRT presents high-quality and distinctive content in the areas of information, culture, education, entertainment and sports. With its three television channels, five radio stations and various digital channels, VRT reaches 90% of all Flemish people every week. The broadcaster plays an important role in the Flemish media ecosystem and cooperates with numerous national and international partners from various fields.

About STADIEM

STADIEM (Startup Driven Innovation in European Media), with its piloting and acceleration programme, brings together start-ups, scale-ups, investors and media organisations to foster the development of Next Generation Media solutions.