It has transitioned 16 million assets, from historical film to modern digital media, to the cloud for long-term storage


The BBC Archives Technology and Services team has centralised, digitised and migrated its 100-year-old archives to cloud storage with AWS.

Its aim is to ensure the content is preserved safely and remains accessible for future use.

The BBC began by using Amazon S3 Glacier Instant Retrieval, which is an archive storage class that delivers the lowest-cost storage for long-lived data that is rarely accessed yet requires retrieval in milliseconds.

The broadcaster’s data had been split into separate genre repositories for news, sports, radio, and programmes, stored on on-premises infrastructure.

With the move to the cloud, it wanted to standardise workflows and develop a centralised plan for archiving.

In 2017, the BBC started looking for a way to reduce the complexity of its systems as its existing, disparate datasets were unnecessarily onerous.

Next followed five years of consolidating various storage application layers.

The BBC had been running its media asset processing system on AWS for years, and decided it was a practical next step to migrate to an AWS solution that could contribute to its longer-term preservation strategy.

“We wanted a consistent approach to extract value from inconsistent datasets, create an authoritative single catalogue that matches the media, and drive value for our audiences,” explains Brendan Mallon, head of product and services in BBC Archives Technology and Services. “Using AWS, we can standardise storage for all of our content.”


The BBC’s 100-year-old archives span 16 million assets, from historical film to modern digital media. 

“We want to have a forward-looking strategy, with tools like flexible storage and compute that facilitate the use of machine learning,” says Mallon. “Our goal is to safeguard the content in the archives so that it’s accessible for another 100 years.”

To complete the content migration, the BBC used its existing infrastructure powered by AWS Direct Connect, which is a cloud service that helps users create a dedicated network connection to AWS for smooth and reliable data transfers at a massive scale.

The migration began in November 2022, after about 12 months of planning and consulting with Cloudfirst.io (Cloudfirst), an AWS Partner.

At peak, the team migrated 120 TB of data per day, using AWS Direct Connect to transfer sizable amounts of content to AWS.

Within 10 months, the team had transferred 25 PB of data to the cloud. By doing so, it could retire one of its legacy tape-based media repositories and develop a next-generation abstraction between media asset management systems and public cloud storage.
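To put those figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the decimal units (1 PB = 1,000 TB), the 30-day month and the assumption that transfers ran evenly around the clock are not stated in the source, and the function names are illustrative.

```python
# Rough check of the migration figures quoted in the article.
# Assumptions (not from the article): decimal units, 30-day months,
# and transfers spread evenly across each 24-hour day.

TB_PER_PB = 1000
SECONDS_PER_DAY = 24 * 60 * 60


def average_tb_per_day(total_pb: float, months: float, days_per_month: int = 30) -> float:
    """Average daily volume needed to move total_pb in the given number of months."""
    return total_pb * TB_PER_PB / (months * days_per_month)


def sustained_gbps(tb_per_day: float) -> float:
    """Network rate in Gbit/s needed to move tb_per_day within one day."""
    bits_per_day = tb_per_day * 1e12 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e9


avg = average_tb_per_day(25, 10)   # 25 PB over 10 months
peak_rate = sustained_gbps(120)    # the quoted 120 TB/day peak

print(f"average: {avg:.1f} TB/day")
print(f"peak of 120 TB/day is about {peak_rate:.1f} Gbit/s sustained")
```

Under these assumptions the average works out to roughly 83 TB per day, so the quoted peak of 120 TB per day is about one and a half times the average and corresponds to a sustained rate of around 11 Gbit/s.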

“We were able to retire half the archive’s physical infrastructure,” explains Mark Glanville, senior technical architect in BBC Archives Technology and Services. “This frees up a huge amount of technical space and power in some precious real estate in central London.” 

The BBC migrated most of its data to Amazon S3 Glacier Instant Retrieval. “We worked alongside the AWS team to select the right storage class for the bulk of our content,” says Glanville. “By using Amazon S3 Glacier Instant Retrieval, we can benefit from expedited retrieval from the archive while having cost flexibility.”

The team uses a combination of Amazon S3 Glacier Instant Retrieval and Amazon S3 Intelligent-Tiering, a cloud storage class that delivers automatic storage cost savings, says AWS.

The BBC can choose between these two storage classes depending on its expected level of access.
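As an illustration only, that kind of choice could be expressed as a simple rule. The sketch below is not the BBC's logic: the function name, its parameters and the four-reads-a-year threshold are hypothetical, chosen to reflect the general guidance that Glacier Instant Retrieval suits data read about once a quarter while Intelligent-Tiering suits unknown or changing access patterns. The two returned strings are the storage class identifiers used by the Amazon S3 API.

```python
# Hypothetical rule for picking an S3 storage class per asset.
# The threshold and parameter names are assumptions for illustration.

GLACIER_IR = "GLACIER_IR"                    # S3 Glacier Instant Retrieval
INTELLIGENT_TIERING = "INTELLIGENT_TIERING"  # S3 Intelligent-Tiering

RARE_ACCESS_READS_PER_YEAR = 4  # assumed cut-off: roughly once a quarter


def choose_storage_class(access_known: bool, expected_reads_per_year: float = 0.0) -> str:
    """Return a storage class identifier from the expected level of access."""
    if not access_known:
        # Unknown or changing access pattern: let S3 move the object between tiers.
        return INTELLIGENT_TIERING
    if expected_reads_per_year <= RARE_ACCESS_READS_PER_YEAR:
        # Rarely read, but still needs millisecond retrieval.
        return GLACIER_IR
    return INTELLIGENT_TIERING


print(choose_storage_class(access_known=True, expected_reads_per_year=1))
print(choose_storage_class(access_known=False))
```

The returned value is the kind of string that would be supplied as the storage class when an object is uploaded to S3.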

“By using Amazon S3 Glacier Instant Retrieval and Amazon S3 Intelligent-Tiering, we get archive-like pricing models for content that we previously had in relatively hot storage,” says Tom Cartwright, executive product manager at the BBC. “This is really valuable because we can make decisions early on in a project about where our data lives.”

With the archive now in cloud storage, the BBC is looking to improve discovery through machine learning, using tools like speech-to-text and facial recognition.

“Our shared vision is to set up the business for a sustainable future,” says Mallon. “We want our content to be as discoverable and as accessible as possible.”