For a number of years we have been working with news organisations to facilitate collaboration and information exchange. In 2018 we released the DPP Metadata Exchange for News framework, which provides guidance on using IPTC NewsML-G2 to exchange an agreed minimum set of metadata between systems and parties that share news content.
In November, the DPP hosted a small group of news leaders and specialists to share information on their progress in the quest for metadata enrichment and automation. In attendance were Al Jazeera, CBC/Radio-Canada, the BBC, and Reuters News Agency.
Al Jazeera started the session with an overview of their various systems. “Metadata enrichment” is the term they coined for their programme of activities, which is now well advanced, with work under way to augment the metadata of their content. Their automated systems have evolved to perform contextual analysis of content: they can now add contextual information that goes beyond traditional object detection. For example, a crowd scene can be classified not just as a crowd but by the type of crowd it is, e.g. a protest or a group of sports fans. Further still, the system can identify a protest’s political context, for example whether it leans left or right.
The group also discussed Al Jazeera’s partnership with the Qatar Computing Research Institute, which has developed an automated news aggregator that classifies sources and stories by stance, bias and propaganda metrics, and can detect the propagation of fake news.
Jeremy Tarling, Lead Information Architect for Curation, Authoring and Metadata at the BBC, then talked about content tagging as the first step towards automation. The BBC content catalogue spans not only news but also online, TV and radio content. Their teams face challenges because each area takes a different approach to tagging content, making search and discovery extremely difficult.
BBC News has been manually tagging content for well over six years, building up tens of thousands of terms that describe its content. These are clustered into broader topics, which are then used to surface related and recommended content to the user. Meanwhile, the approaches used elsewhere in the BBC vary; non-news television programmes may simply be tagged by genre and format.
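The clustering idea above can be sketched in a few lines: individual tags roll up into broader topics, and two items are considered related when their topics overlap. This is a hypothetical illustration only; the tag names, topic names and mapping here are invented, not the BBC’s actual vocabulary.

```python
# Hypothetical sketch: tags roll up into broader topics, and topic
# overlap drives "related content". All names are invented examples.

# Each tag maps to one or more broader topics (curated by hand here;
# in practice such a mapping would be maintained editorially).
TAG_TO_TOPICS = {
    "interest-rates": {"economy"},
    "inflation": {"economy"},
    "premier-league": {"football"},
    "transfer-window": {"football"},
}

def topics_for(tags):
    """Collect the broader topics implied by an item's tags."""
    topics = set()
    for tag in tags:
        topics |= TAG_TO_TOPICS.get(tag, set())
    return topics

def related(item, catalogue):
    """Surface catalogue items sharing at least one broad topic."""
    wanted = topics_for(item["tags"])
    return [other for other in catalogue
            if other is not item and topics_for(other["tags"]) & wanted]

catalogue = [
    {"id": "a1", "tags": ["inflation"]},
    {"id": "a2", "tags": ["interest-rates"]},
    {"id": "a3", "tags": ["premier-league"]},
]
print([o["id"] for o in related(catalogue[0], catalogue)])  # ['a2']
```

Note that "a1" and "a2" share no tag at all; they are linked only through the broader "economy" topic, which is what makes clustering useful at the scale of tens of thousands of terms.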
This manual approach presents challenges of human effort, consistency, accuracy and scale. Jeremy highlighted how the same tag can mean different things in different contexts, creating problems. He shared the example of a crime story that could easily be recommended as related to an Olympic sports story, because both were tagged with “shooting”. Viewers may not be so happy to find content about murder recommended alongside their favourite coverage of clay pigeon shooting.
To solve this, the BBC has developed a common metadata model for all content, applied across the tagging process, which captures the mentions, “aboutness” and editorial tone of a piece of content. The “aboutness” of a story is a collection of tags describing the organisations, places, themes and events related to it. BBC Research & Development have also developed a classification system called Starfruit that automatically suggests tags using machine learning.
Unlike Al Jazeera or the BBC, Reuters is a global news agency that produces and distributes news content used by broadcasters and other organisations. Ian McLaren, Technical Director, provided an overview of the Reuters Connect platform, an online portal used as a catalogue for the distribution of news assets. Ian talked about the evolution of NewsML-G2, the foundation of the DPP Metadata Exchange for News framework.
Reuters have been using NewsML-G2 for over a decade and can now integrate with third-party APIs, enabling seamless exchange of metadata with other organisations. For content classification, Reuters use a range of IPTC and internal category codes and topics, augmented by automated entity extraction.
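To make the exchange concrete, here is a minimal sketch of a NewsML-G2 news item carrying a subject code, built with the Python standard library. The guid, provider and subject qcode are placeholders rather than real Reuters or IPTC values, and real items carry far more metadata; the IPTC NewsML-G2 specification is the authoritative reference for the structure.

```python
# Minimal sketch of a NewsML-G2 news item with a subject code.
# The guid and the "medtop:example" qcode are placeholders, not real
# values; consult the IPTC NewsML-G2 specification for full detail.

import xml.etree.ElementTree as ET

NAR = "http://iptc.org/std/nar/2006-10-01/"  # NewsML-G2 namespace
ET.register_namespace("", NAR)

item = ET.Element(f"{{{NAR}}}newsItem",
                  {"guid": "urn:example:item:1", "version": "1"})

item_meta = ET.SubElement(item, f"{{{NAR}}}itemMeta")
ET.SubElement(item_meta, f"{{{NAR}}}itemClass", {"qcode": "ninat:text"})

content_meta = ET.SubElement(item, f"{{{NAR}}}contentMeta")
# Subject codes classify the content; automated entity extraction could
# append further <subject> elements for people, places and organisations.
ET.SubElement(content_meta, f"{{{NAR}}}subject", {"qcode": "medtop:example"})

print(ET.tostring(item, encoding="unicode"))
```

Because the classification travels as machine-readable qcodes rather than free text, a receiving system can map IPTC or internal category codes onto its own vocabulary without human intervention, which is what makes the metadata exchange between organisations practical.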
We also heard from Bruce MacCormack, Senior Director Corporate Strategy and Development, CBC/Radio-Canada, about the work they’re doing in collaboration with the New York Times, BBC and the Partnership on AI around news validation. Authenticating genuine news content and combatting deep fakes has become a high priority. They are exploring the use of a unique identifier that platforms such as YouTube and Facebook could query, to authenticate the source of news content. This work continues, and further updates will be shared on the DPP blog in due course.
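The identifier idea can be sketched as a simple registry: a publisher registers a fingerprint of its content, and a platform queries the registry to check a clip’s provenance before surfacing it. Everything below is hypothetical; the registry API is invented, and a real system would use perceptual fingerprints robust to re-encoding rather than the exact hash used here for simplicity.

```python
# Hypothetical sketch of a provenance registry for news content.
# A real system would use perceptual fingerprints (robust to
# re-encoding and cropping), not an exact SHA-256 hash as here.

import hashlib

class ProvenanceRegistry:
    def __init__(self):
        self._records = {}

    def register(self, content: bytes, publisher: str) -> str:
        """Publisher registers content; the hash acts as the unique ID."""
        uid = hashlib.sha256(content).hexdigest()
        self._records[uid] = publisher
        return uid

    def lookup(self, content: bytes):
        """Platform asks: who, if anyone, published this exact content?"""
        return self._records.get(hashlib.sha256(content).hexdigest())

registry = ProvenanceRegistry()
clip = b"...video essence bytes..."
registry.register(clip, "Example News")

print(registry.lookup(clip))          # 'Example News'
print(registry.lookup(b"tampered"))   # None: no provenance record
```

The key design point is that the platform never needs the publisher’s systems online at playback time: a single lookup against the registry answers the question “did a trusted source actually publish this?”, which is the property that makes the approach attractive against deep fakes.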
To find out more or to get involved with this work, please contact:
Programme Delivery Manager