AI Subtitling for Short-Form Video via an Adobe Panel : Limecraft

Capitalising on its preferred role as a digital innovator, VRT has implemented an online-first strategy. More content has to go out faster and has to reach a wider audience. Improving accessibility by creating subtitles for every asset is the cornerstone of that strategy.

To lower the cost and turnaround time of creating subtitles, VRT and Limecraft joined forces in a co-creation project that is part of STADIEM (Startup Driven Innovation in European Media).

As part of this fast-track project, VRT and Limecraft deployed tools to create more subtitles faster for short-form video. More specifically, we implemented a workflow for recycling subtitles that were originally produced for live content, and an Adobe Panel that lowers the hurdles to adopting Artificial Intelligence for automated subtitling. This allows video creators to deliver short-form video clips with broadcast-quality subtitles in less than a minute.

This blog post discusses the objectives of the project, the implementation, and the results to date.

Objectives and Key results

VRT considers the availability of high-quality subtitles of critical importance. This is part of its digital-first strategy. Subtitles increase the accessibility of content to those who need it, render content consumption more convenient to those who want it, and are a signal of relevance for search engines.

Subtitled content is not only more accessible to deaf and hard-of-hearing people, but also to people on social media who watch without audio. On top of that, subtitled content ranks higher in search results.

To be as realistic as possible and take into account the short time to deliver results, VRT narrowed down the use case, exploring the different options that specifically apply to short-form video. VRT is looking for answers to the following three key questions:

Is it feasible in the first place to use AI for automatically creating subtitles, whereby video creators only have to do some spot checks?
Overall, is it possible to create high-quality subtitles for short-form videos at a substantially lower cost in less time?
Doing so, is it possible to increase the probability of delivering videos with high-quality subtitles?

The challenges and the proposed solution

Short-form video producers are particularly keen on publishing their content much faster in comparison to conventional broadcasters. The speed (8 hours of work per hour of content) and the latency (days) of conventional subtitling, as well as the cost (€5-10 per minute depending on the type of audio), would be prohibitive.

To sort out these 3 issues at the same time, several content producers and broadcasters worldwide (including VRT) are looking for technologies that can assist the professional video editor in the creation of subtitles. The assumption is that when the technology becomes good enough, video editors will accept a co-creation between artificial intelligence doing the grunt work, and the editor taking care of the final touch.

Video editors will always face the dilemma of either producing more content items or spending more time on finishing. To be a real satisfier, AI-based technologies must therefore offer them not a marginal improvement, but a disruptive one that speeds up the process by a factor of 3 or 4.

Hence the premise currently under investigation is the following:

1. by re-synchronising pre-existing subtitles initially created for live content (if possible), or

2. by using AI transcription and subtitling to deliver a first approximation of high-quality subtitles (if necessary), and

3. by making these tools available as a plugin or a panel in Adobe Premiere, so that editors do not have to switch or copy and paste between applications, we are lowering the threshold for adopting high-quality subtitles among video editors who are not trained as subtitle professionals.

Implementation

3.1 Recycling of subtitles originally created for live content

For this part of the Proof of Concept, VRT supplied us with a corpus of 20 factual programmes that have been subtitled live using respeak. The resulting subtitles are of reasonably good quality (roughly 1% of the subtitles may contain wrong interpretations), but they lag the audio by a variable delay of around 10 seconds. The key question is to what extent these subtitles can be re-aligned automatically.

To illustrate the experiment, we use the example of De Afspraak, in which a panel discusses current affairs live for 50 minutes. We have the result of the live subtitling (respeak + correction) at our disposal as an STL file, containing 750 subtitles with a timing offset of roughly 10 seconds.

In the image below, we compared the transcript generated by AI transcription with an import of the existing subtitles. The offset of the live subtitles is around 10 seconds.

Hence we built a compound workflow to resynchronise the subtitles and made it accessible as a custom workflow.

The resynchronisation workflow turns the STL file into flat text, which is used as input for an alignment against the audio. The re-aligned script is then used for automatically spotting or cueing the subtitles. Doing so, we get an interpretation of the audio that is 100% suitable for subtitling (the language doesn’t need further modification).
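The alignment step can be illustrated with a minimal sketch. It assumes the subtitle text is available as a list of strings and the AI transcript as a list of (word, start, end) tuples. It uses Python's standard difflib to match subtitle words against transcript words and takes the new timings from the matches. The function names and data shapes are hypothetical and only illustrate the idea; this is not the actual Limecraft workflow.

```python
import re
from difflib import SequenceMatcher

def normalise(word):
    """Lower-case a word and strip punctuation so both sides compare equal."""
    return re.sub(r"[^\w]", "", word.lower())

def realign(subtitles, asr_words):
    """Assign new in/out times to subtitle texts using transcript word timings.

    subtitles: list of subtitle strings in order (original timing discarded).
    asr_words: list of (word, start_seconds, end_seconds) tuples.
    Returns a list of (text, start, end); start and end are None when no
    word of the subtitle could be matched.
    """
    # Flatten the subtitles into one word sequence, remembering the owner.
    sub_words, owner = [], []
    for idx, text in enumerate(subtitles):
        for w in text.split():
            sub_words.append(normalise(w))
            owner.append(idx)

    asr_norm = [normalise(w) for w, _, _ in asr_words]
    matcher = SequenceMatcher(a=sub_words, b=asr_norm, autojunk=False)

    times = [[None, None] for _ in subtitles]
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            idx = owner[block.a + k]
            _, start, end = asr_words[block.b + k]
            if times[idx][0] is None:
                times[idx][0] = start  # first matched word sets the in-time
            times[idx][1] = end        # last matched word sets the out-time
    return [(text, t[0], t[1]) for text, t in zip(subtitles, times)]
```

Because the timing comes from the transcript while the wording comes from the corrected live subtitles, recognition errors in the transcript only affect the words that fail to match, not the text that ends up on screen.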

💡 What are subtitling spotting rules?

The results are very convincing, as you can verify in the split-screen video below. The screencast on top displays the subtitles produced for the deaf and hard of hearing (SDH); the version below shows the results of the automated resynchronisation.

3.2 Using AI transcription and subtitling as a proxy

In cases where no pre-existing texts or subtitles are available, the next best alternative is using AI transcription and the subtitling spotting algorithm. For content with suitable audio (factual programming, a trained voice, grammatically correct language), we know from experience that around 90% of the subtitles match the quality of manually produced subtitles, saving 75% of the total turnaround time.
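To give an idea of what a spotting algorithm does, here is a minimal sketch that groups timed words into subtitle cues. The rules it applies (a maximum number of characters, a maximum duration, and a break after sentence-ending punctuation) and the default values are illustrative assumptions, not VRT's or Limecraft's actual spotting rules. The function names are hypothetical.

```python
def flush(group):
    """Turn a list of (word, start, end) tuples into one (text, start, end) cue."""
    return (" ".join(w for w, _, _ in group), group[0][1], group[-1][2])

def spot(words, max_chars=74, max_duration=6.0):
    """Group timed words into subtitle cues.

    words: list of (word, start_seconds, end_seconds) tuples.
    A new cue starts when adding the next word would exceed max_chars or
    max_duration, or when the previous word ended a sentence.
    """
    cues, current = [], []
    for word, start, end in words:
        if current:
            text_len = len(" ".join(w for w, _, _ in current)) + 1 + len(word)
            too_long = text_len > max_chars
            too_slow = end - current[0][1] > max_duration
            sentence_done = current[-1][0].endswith((".", "?", "!"))
            if too_long or too_slow or sentence_done:
                cues.append(flush(current))
                current = []
        current.append((word, start, end))
    if current:
        cues.append(flush(current))
    return cues
```

A production spotter would also balance line breaks, respect shot changes and enforce minimum gaps between cues; those rules are left out to keep the sketch short.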

In the clip below, we compare the result of ASR and AI subtitling 36 months ago (in the middle of the screen) with state of the art processing (bottom of the screen).

3.3 Making the results available to the editor using an Adobe Panel

The last, and arguably the most important, deliverable is to make these technologies accessible to video editors who are not necessarily skilled as professional subtitle editors. To do so, taking into account the specific requirements of VRT, we have developed an Adobe Panel that video editors can use to start AI transcription and subtitling and, upon completion, to pull the results back into Adobe Premiere. As a result, video editors have an automated process at their fingertips without having to leave the comfort of their workspace. The result is over 95% accurate and thus good enough for finishing in Adobe.
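How the panel exchanges subtitles with Premiere is not detailed here. As an illustration only, the sketch below serialises cues to SubRip (.srt), a common subtitle interchange format that Premiere Pro can import as captions. The choice of SRT and the function names are assumptions made for this example.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

def to_srt(cues):
    """Serialise a list of (text, start_seconds, end_seconds) cues as SubRip text."""
    blocks = []
    for number, (text, start, end) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

For example, `to_srt([("Good evening.", 0.0, 0.9)])` yields a single numbered cue running from `00:00:00,000` to `00:00:00,900`.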

Provisional conclusions

Recycling subtitles originally produced for live content is very promising. The results so far indicate that 95% of the subtitles are accurate, and the remainder can be retimed more or less in real time. We expect a total turnaround time of 1 to 2 times the length of the file, compared to 8 to 10 times with conventional subtitling.
Automatic Speech Recognition and AI subtitling are especially useful when starting from studio-quality audio, a trained voice and grammatically correct language, in which case they save 75% of the turnaround time.
Using the Adobe Panel, video editors now have AI transcription and subtitling at their fingertips without having to stand by and nurse every render, export and file transfer, and without having to copy and paste between applications.
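To make the expected gain from recycling live subtitles concrete, the turnaround factors from the first conclusion can be applied to the 50-minute De Afspraak example. The sketch below is plain arithmetic on the figures quoted in this post; the function name is hypothetical.

```python
def turnaround_minutes(duration_min, factor_low, factor_high):
    """Return the (lowest, highest) expected turnaround time in minutes."""
    return duration_min * factor_low, duration_min * factor_high

conventional = turnaround_minutes(50, 8, 10)  # (400, 500)
recycled = turnaround_minutes(50, 1, 2)       # (50, 100)

# Most cautious and most favourable speed-up for this programme.
speedup_low = conventional[0] / recycled[1]   # 4.0
speedup_high = conventional[1] / recycled[0]  # 10.0
```

Even in the most cautious case, the speed-up is in the range of the factor of 3 or 4 that editors need before they change their way of working.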

Future Work

The Adobe Panel is now available in private beta. In the next couple of months, we will conduct extensive user acceptance testing and fine-tune the look and feel to maximise operational efficiency. The goal is to publish the Limecraft panel on the Adobe Exchange by mid 2023, making it available to a wider audience of video professionals in need of AI-generated subtitles.

💡 In the meantime, if you are interested in joining the community of early adopters, feel free to send us an email and we will get in touch.

About VRT

VRT is the public broadcaster of the Flemish Community in Belgium. VRT presents high-quality and distinctive content in the areas of information, culture, education, entertainment and sports. With its three television channels, five radio stations and various digital channels, VRT reaches 90% of all Flemish people every week. The broadcaster plays an important role in the Flemish media ecosystem and cooperates with numerous national and international partners from various fields.

About STADIEM

STADIEM (Startup Driven Innovation in European Media), with its piloting and acceleration programme, brings together start-ups, scale-ups, investors and media organisations to foster the development of Next Generation Media solutions. More info: https://www.stadiem.eu/