Using powerful pattern-recognition algorithms, artificial intelligence software is increasingly able to take on the heavy lifting of post-production, freeing up editors to concentrate on creative decisions.
A science fiction horror film about a bio-engineered being who exceeds the expectations of her creators provided fitting material for an experiment in editing using artificial intelligence (AI).
IBM’s AI system Watson was used for a proof of concept to whittle down 90-minute Fox feature Morgan to six minutes of footage that was then used by an editor to cut the official trailer.
It is claimed that the entire process took just 24 hours.
Watson, which is described by IBM as “a technology platform that uses natural language processing and machine learning to reveal insights from large amounts of unstructured data”, was fed hundreds of suspense and horror film trailers and taught to categorise them by type of location, framing, lighting and audio, such as ambient sounds, characters’ tone of voice and musical score.
“We looked at patterns in trailers to teach Watson the horror movie domain and it was able to cluster different kinds of clips into categories such as tender or scary moments,” says IBM manager of multimedia and vision John Smith.
When fed the full-length Morgan, Watson identified 10 moments that would be the best candidates for a trailer, nine of which were used by a craft editor for the final cut.
“We decided not to use one clip because it was felt that it might convey information that could potentially be a spoiler. If we were working with a comedy, it would have a different set of parameters to select different types of moments,” says Smith.
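As a rough illustration of that division of labour, the sketch below ranks scored scenes, keeps the top candidates and lets a human drop anything flagged as a spoiler. It is not IBM's method: the `Scene` fields, the scores and the `spoiler` flag are all hypothetical, and a real system would derive scores from trained audio-visual models.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    start_s: float          # scene start, in seconds
    end_s: float            # scene end, in seconds
    label: str              # hypothetical category, e.g. "scary" or "tender"
    score: float            # hypothetical model confidence between 0 and 1
    spoiler: bool = False   # set by a human reviewer, not by the model


def pick_candidates(scenes, n=10):
    """Return the n highest-scoring scenes, restored to running order."""
    ranked = sorted(scenes, key=lambda s: s.score, reverse=True)[:n]
    return sorted(ranked, key=lambda s: s.start_s)


def editor_cut(candidates):
    """Drop any candidate the human editor has flagged as a spoiler."""
    return [s for s in candidates if not s.spoiler]


# Invented example data: four scenes, one of them flagged by the editor.
scenes = [
    Scene(0, 20, "tender", 0.35),
    Scene(120, 150, "scary", 0.91),
    Scene(300, 340, "scary", 0.88, spoiler=True),
    Scene(600, 615, "tender", 0.72),
]
candidates = pick_candidates(scenes, n=3)
final = editor_cut(candidates)
```

The machine narrows the field and the editor makes the final call, which mirrors the nine-out-of-10 outcome described above.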
Proponents of software that automates part or all of the editing process argue that the technology can save time, and therefore money, on anything from YouTube promos to fixed-rig docs, but does not yet threaten the role of the craft editor.
Nonetheless, predictions of job losses could alarm some working in post-production. “There are two main and conflicting groups,” says Philip Hodgetts, co-founder of organisation and pre-editing software firm Lumberjack Systems.
“One approach is to use intelligent augmentation to enhance human performance. The other uses artificial intelligence to replace the human editor and, ultimately, all of us, with machines.”
If that sounds dramatic then Oren Boiman, co-founder and chief executive of Magisto, an AI-based consumer editing software developer that claims 80 million users, offers little reassurance.
“There are hundreds of tricks an editor uses every day to tell a story or evoke emotion,” says Boiman. “Magisto employs those as algorithms.”
Cognitive computing in the production process boils down to two principles.
The first is the use of mathematical formulae to scan audio-visual content and organise material according to a set of predefined parameters.
The second is machine learning, which means that the capabilities of the software improve over time as it is exposed to more material.
“There is no doubt that computer algorithms will be involved in production’s future,” says Hodgetts.
“Today’s AI is very good at recognising patterns in a large dataset, but it is hard for AI to be creative enough to create good content. Smart people will work out how to master it.”
While developers like Magisto have created their own algorithms, tech giants like Google, Microsoft and IBM are also developing application programming interfaces (APIs) to generate the metadata building blocks for AI.
At a basic level, this includes analysis of in-frame action and camera motion, speech-to-text transcription, and facial, object and even emotion detection.
Combinations of such data are claimed to serve up results as well as, if not better than, a team of trained loggers and assistants – and in a fraction of the time.
“AI is an assistive tool,” says Smith, who explains that IBM is keen to explore more applications in the content creation sector.
“We see a combination of computer and human expertise as the sweet spot. A computer is incredibly powerful at parsing large repositories of data or ‘watching’ hundreds of hours of video and distilling that down to a smaller set of things, while the editor brings a unique set of skills that AI cannot replicate at this time.”
Boiman agrees: “AI can definitely help in identifying and marking up the content to give the editor and director more time and space to make creative decisions. It is automating all the manual heavy lifting.”
Among the target markets for auto-edit packages are consumers and action sports enthusiasts who have neither the skill, the time nor the inclination to edit video shot on a GoPro or smartphone.
Instead, they can upload raw footage to an application like Magisto or Antix and have it returned to them – with degrees of customisation – as a selection of highlights cut together into a storyline and packaged with effects, grading filters, even a music track.
The same techniques are also aimed at corporate communications and promos.
“Making one cut of a promo was fine when TV was the only distribution medium, but today that’s not good enough,” says Boiman.
“If you are creating a trailer for a newspaper website, Facebook, Snapchat, YouTube or Instagram, each one should be formatted differently.
“You might target by gender, age or by country. With so many variants of the same source media required to optimise every impression online, doing so manually is extremely inefficient.”
The exact same pressures, it’s argued, will result in cognitive computing being applied to reality or documentary content, where shoot ratios are increasing and turnaround times decreasing.
“If 60 hours has been shot in a day, even a week, there is no way an editor will be expected to see the rushes,” says Hodgetts.
“The only way to do this is to use modern technologies to automate the task. It’s not about replacing an editor but making life easier for them by pre-assembling the material in a coherent way. Lumberjack automates the logging and pre-editing stages – all the boring stuff – up to the point where an editor takes over.”
Lumberjack has been used on OJ Speaks: The Hidden Tapes for A&E, and by Denmark’s STV to assemble 69 x 10-minute episodes of semi-scripted kids’ series Klassen.
“STV found the basic scene organisation saved them an enormous amount of time. They would not have been able to do this show without it,” says Hodgetts.
His goal is to present a documentary fully constructed by Lumberjack to SMPTE-backed trade association HPA in 2018. “We’re on track to do it,” he says.
Forbidden Technologies business development director Jason Cowan says there may be some advantage to being able to transcribe, but the best transcriptions are only 80% accurate.
“It’s also useful to discount content where nothing is happening – therefore selecting material where something is happening. But the ability to recognise a ‘smile’ from a ‘wry smile’, for example, is the sort of nuance that creates valuable content and is very much a human trait.”
Forbidden has no plans to incorporate AI into its Forscene edit package, but does have AI at the core of its encoding technology Blackbird.
“Every compression tech is based on algorithms but ours takes this further by using AI to examine every frame a thousand times a second and to define the best compression technique based on that frame,” says Cowan. “This means we can offer online access to content without latency anywhere in the world.”
So far, AI is unlikely to be considered for scripted content, which is an area “with very set workflows and high budgets to deal with, so people are unwilling to take a risk”, says Hodgetts.
Automated systems could, however, replace loggers, runners or edit assistants. “Jobs will be lost,” he adds. “Those assistant editing roles where material is organised around a manual transcript will be reduced and probably eradicated completely.
“If you can teach any job in one or two days, then it will probably be automated out of existence in five to 10 years.”
He suggests that this will happen faster in Europe than in Hollywood.
“Europe has less union involvement [than the US] and tends to be more willing to look at new solutions.”
Adam Theobald, founder of auto-edit software firm Antix, admits that trying to sell the app into the video-editing community is a challenge.
“They put up a very big barrier because they are afraid of it replacing their job,” he says. “Yet we can demonstrate how our technology does the logistics and sourcing of content so that editors can concentrate on applying their creative process.”
While dismissing the idea of AI replacement as unlikely, Cowan agrees that it could raise the bar in terms of quality.
“Should it ever happen that AI can take the grunt work out of post, the art of editing and grading will simply advance,” he says.
“If we think about the quality of reality TV compared with drama, there is more headroom for improvement, but this would be a human role.”
Boiman insists that AI offers “editing superpowers” for craft editors to do much more. “Professional video production tools like Avid are extremely sophisticated, but dumb,” he says. “They give the user ultimate control over the video, but such control forces you to do everything. There’s a huge gulf between conventional professional video production, which is 100% manual labour, and software like ours, which can produce entirely automated packages. There is no doubt the gap will close.”
AI edit tools
Antix
Prosumer application with an emphasis on action sports.
Its AI captures wearable data using heart-rate sensors, GPS, an accelerometer and a gyroscope.
“Combined with other contextual data, this can be used to create storylines with emotion,” says Antix founder Adam Theobald.
“For example, we can find clips where the subject has generated a faster heartbeat as those are more likely to represent a physical reaction.”
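A minimal sketch of that idea, assuming each clip arrives with a list of heart-rate samples from a wearable. The function name, the resting rate and the 1.3 multiplier are illustrative assumptions, not Antix's actual logic.

```python
def clips_with_raised_heart_rate(clips, resting_bpm, ratio=1.3):
    """Keep clips whose mean heart rate exceeds resting_bpm * ratio.

    clips is a list of (clip_name, [bpm samples]) pairs. The ratio is an
    arbitrary illustrative threshold, not a value taken from any product.
    """
    threshold = resting_bpm * ratio
    selected = []
    for name, samples in clips:
        if not samples:
            continue  # no sensor data recorded for this clip
        mean_bpm = sum(samples) / len(samples)
        if mean_bpm > threshold:
            selected.append((name, round(mean_bpm, 1)))
    # Most intense clips first
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Invented example data for a day on the slopes.
clips = [
    ("chairlift.mp4", [72, 75, 74]),
    ("big_jump.mp4", [128, 141, 150]),
    ("lunch.mp4", []),
    ("steep_run.mp4", [110, 118, 121]),
]
highlights = clips_with_raised_heart_rate(clips, resting_bpm=70)
```

In practice the heart-rate signal would be only one input, weighed alongside the GPS, accelerometer and gyroscope data mentioned above.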
Users can customise the AI’s output.
“Rather than just pumping video in and out, we’re keen on letting the user put their own stamp on content because this will encourage others to share it more on social media,” Theobald adds.
Magisto
Prosumer application that incorporates professional effects and transitions such as zooming in on important actions, panning to give movement to static shots and lowering the volume of the music when someone is talking.
It applies colour correction, image stabilisation and other image enhancement technologies.
“We understand that AI cannot read the mind of a producer or marketer, not least because they may not know what they want from the process at the beginning,” explains Magisto co-founder and chief executive Oren Boiman.
“We invite them to refine the initial draft by telling the AI, for example, to be more dramatic, or add highlights, incorporate stock footage or apply a new grade. In less than an hour, we can have a very polished video.”
Lumberjack
Professional application that works with Final Cut Pro X and uses a keyword extraction API from machine learning software MonkeyLearn.
On location, one or more people log keywords via a web-enabled device. These are tied to the media and allow a producer or editor to set the AI to look for keywords, concepts and emotions relevant to where they think the story will unfold.
“From the analysis of rushes, you could immediately see that you had, for example, 18 minutes of a certain keyword and two minutes of another,” explains Lumberjack Systems co-founder Philip Hodgetts.
“That could suggest that you look for the story in the 18-minute selection, or, if the keyword with two minutes is important to you, that indicates that you need to find more of that material.”
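The tally Hodgetts describes can be sketched in a few lines, assuming each logged keyword carries start and end times in seconds. The log format and the keywords themselves are invented for illustration and do not reflect Lumberjack's data model.

```python
from collections import defaultdict


def minutes_per_keyword(log_entries):
    """Sum the logged duration of each keyword, in minutes.

    log_entries is an iterable of (keyword, start_s, end_s) tuples,
    a hypothetical format chosen for this example.
    """
    totals = defaultdict(float)
    for keyword, start_s, end_s in log_entries:
        # Ignore malformed ranges where the end precedes the start
        totals[keyword] += max(0.0, end_s - start_s) / 60
    # Largest total first, so the dominant theme is easy to spot
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Invented example log: two ranges tagged "family", one tagged "trial".
log = [
    ("family", 0, 600),
    ("trial", 600, 720),
    ("family", 900, 1380),
]
totals = minutes_per_keyword(log)
```

Here `totals` shows 18 minutes against one keyword and two against the other, the kind of imbalance that tells a producer either where the story lies or what still needs to be shot.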