Some weeks back, I was invited to a DPP panel discussion where one of the topics was how to scale your machine-learning (ML) workflows. I’m a woman of many hats, and in addition to my work at Umeå University (find Stockholm on a map, and let your eyes follow the coast upwards - we’re just below the Arctic Circle), I’m also acting CTO at Adlede AB and Codemill AB, two media tech companies with a strong emphasis on AI/ML in their product offerings. This means that when I was later asked to write a short blog post on the same topic, I had many good people to call on, and there are a few points we wanted to share.
Adlede, which was part of the DPP and Digital Catapult’s AI in Media event in 2020, offers contextual programmatic advertising. This means we use the programmatic ecosystem to place ads in the best possible media context. A sample scenario is an international furniture retailer that wants to target online articles about major life changes. If computing resources were free and infinitely elastic, we’d classify everything that is published online the instant it appears. In the actual world, this is not likely to happen. Our solution is to focus on high-quality publications in popular demand. Anyone who has tried their hand at natural language processing knows that it is easier to get good results with clean data, and analysing a piece of content is only a meaningful investment if the result is used often enough. In short: when you are searching for a target class in a vast heterogeneous dataset, start with the part of the dataset where you would be happiest to find positive instances. If you are lucky, you find enough material for your purposes without having to traverse into the murkier parts of the data.
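As a rough illustration of that ordering idea, here is a minimal Python sketch. It is not Adlede's actual pipeline: the `Article` fields, the quality-times-demand ranking, and the `is_target` classifier callback are all hypothetical names chosen for the example. The point is only that we rank candidates by how happy we would be to find a match there, classify from the top, and stop once we have enough matches or the budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Article:
    url: str
    source_quality: float  # hypothetical 0..1 score for the publication
    expected_views: int    # hypothetical estimate of reader demand
    text: str


def find_matches(
    articles: Iterable[Article],
    is_target: Callable[[str], bool],  # stand-in for an expensive classifier
    needed: int,
    budget: int,
) -> List[Article]:
    """Classify the most promising articles first.

    Stops as soon as `needed` matches are found, and never classifies
    more than `budget` articles.
    """
    # Assumed priority: clean sources with many readers come first.
    ranked = sorted(
        articles,
        key=lambda a: a.source_quality * a.expected_views,
        reverse=True,
    )
    matches: List[Article] = []
    for article in ranked[:budget]:
        if is_target(article.text):
            matches.append(article)
            if len(matches) >= needed:
                break
    return matches
```

With a toy keyword check standing in for the classifier, `find_matches(articles, lambda t: "moving house" in t, needed=10, budget=1000)` would look at no more than the 1000 highest-ranked articles and return early once ten matches turn up.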
Codemill, the sister company of Adlede, helps major Media & Entertainment customers to remaster their video workflows, moving hardware-demanding on-premise workflows to the cloud. This also gives access to a built-out ML/AI machinery that helps increase the speed and quality of content production by improving searchability, automating editing tasks, and adding layers of security in compliance checking. A simple example is to use ML to locate the start and end of intros, and add the metadata needed for users to skip past them. This tagging would otherwise be manual, with the human annotator having to step back and forth through the video frames to find the exact time points.
Another typical use case for ML is to detect explicit content, that is, violence, guns, and rock n roll. This is difficult to solve exactly, but a common way forward is to use over-sensitive classifiers that flag everything that seems remotely problematic, and then have a human annotator check the flagged parts and identify the true occurrences. This hybrid solution is still not perfect, but it may be the only option when the data stream moves so rapidly that a completely manual inspection is not possible. To improve performance, we can recognise that when it comes to training data, a little gold is better than a lot of garbage - correcting a small percentage of misclassifications in the data can have the same effect on classifier accuracy as doubling the size of the dataset. Many of the solutions that we build on top of AWS Rekognition include a feedback loop that allows the user to correct erroneous metadata, so that the system can improve with time.
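The flag-then-review pattern can be sketched in a few lines of Python. This is an assumption-laden toy, not our production code and not the Rekognition API: the per-segment scores, the 0.2 threshold, and the `human_says_explicit` callback are all made up for the example. What it shows is the shape of the loop: a deliberately low threshold over-flags, a human confirms or rejects each flag, and every human decision is kept as a corrected label that can later be used for retraining.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def flag_segments(
    scores: Dict[Segment, float], threshold: float = 0.2
) -> List[Segment]:
    """Return every segment whose score reaches a deliberately low threshold.

    A low threshold trades extra false alarms for fewer missed occurrences.
    """
    return sorted(seg for seg, score in scores.items() if score >= threshold)


def review(
    flagged: List[Segment],
    human_says_explicit: Callable[[Segment], bool],  # stand-in for the annotator
) -> Tuple[List[Segment], List[Tuple[Segment, bool]]]:
    """Ask the human about each flagged segment.

    Returns the confirmed segments, plus every (segment, label) decision
    so that the corrections can feed back into the training data.
    """
    confirmed: List[Segment] = []
    corrections: List[Tuple[Segment, bool]] = []
    for seg in flagged:
        label = human_says_explicit(seg)
        corrections.append((seg, label))
        if label:
            confirmed.append(seg)
    return confirmed, corrections
```

The annotator only ever sees the flagged segments, which is what makes the approach workable when the stream is too fast for full manual inspection. The price is that anything scoring below the threshold is never looked at, so the threshold has to be chosen with that in mind.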
Finally, we can scale not only the technology, but also the team. We think that distributed production is the future, and have put together Accurate Video, a suite of tools for solving everyday use cases with video. Tasks like preparing content, tagging, viewing and even editing can be done via the cloud using a standard web browser, instead of using dedicated equipment, installed software and large storage servers. This makes scaling teams easy, and enables remote working that otherwise would not have been possible. Like operating out of Umeå.
To learn more about the DPP's work in AI and ML, contact Rowan de Pomerai.