Culture of quality
Podcasting is evolving rapidly. And, when it comes to measurement, it’s not just inheriting methods from more traditional media like radio and the web — it’s also inventing its own.
We use the term ‘podcast currency’, which essentially defines the number of impressions the advertiser is buying, and the way the content creator is going to report back. Our experience connecting thousands of advertisers with content creators over the past five years helps us support multiple podcast currencies within the Acast platform.
The industry standard when it comes to podcast currencies — and one we played a prominent role in shaping — is the IAB’s Podcast Measurement Technical Guidelines 2.0.
In the Nordics there’s a similar podcast currency and ranker called PoddIndex — and, thanks to the efforts of Acast and other members, it’s recently announced it will align its measurement guidelines with the IAB’s.
A number of other initiatives are also trying to improve podcasting data, including:
It’s vital that we don’t measure only the incoming requests, though, as that can skew the numbers for a variety of reasons, including:
Instead, the dataset containing all requests is joined with the bytes from the content delivery network (CDN) that are actually uploaded for each request. This data is processed and multiple classifiers are run, depending on the agreed podcast currency.
A podcast player needs to know how long an episode is before it starts playing it, in order to display the timeline correctly. That means that, with server-side audio stitching, you have to create a unique permutation of the episode as soon as the first request from that user for that episode comes in.
The podcatcher sends a request to our media selector service with a byte range starting with zero. Our media selector fetches all audio from the content management system (CMS), implements ad targeting for all ad positions in the episode, then redirects the podcatcher to stitcher.acast.com.
The podcatcher sends requests to stitcher.acast.com with increasing byte ranges until the entire file is fetched, or the user aborts the listening session.
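The two-step flow above can be sketched as follows. The stitcher.acast.com host comes from the text, but the function names, parameters and response shapes are illustrative assumptions rather than Acast’s actual implementation.

```python
# Conceptual sketch of the stitching flow described above. The
# stitcher.acast.com host comes from the text; function names, parameters
# and response shapes are illustrative, not Acast's actual API.

def target_ads(user_id: str, episode_id: str) -> list:
    # Placeholder for ad targeting across all ad positions in the episode.
    return ["pre-roll-ad", "mid-roll-ad"]

def create_stitch(episode_id: str, ads: list) -> str:
    # Placeholder for creating the unique permutation ('stitch') of the episode.
    return f"{episode_id}-{len(ads)}-ads"

def handle_media_selector_request(user_id: str, episode_id: str) -> dict:
    """First request (byte range starting at zero) hits the media selector."""
    ads = target_ads(user_id, episode_id)
    stitch_id = create_stitch(episode_id, ads)
    # Redirect the podcatcher to the stitched file.
    return {"status": 302,
            "location": f"https://stitcher.acast.com/{stitch_id}.mp3"}

def handle_stitcher_request(stitch_id: str, byte_range: tuple) -> dict:
    """Follow-up requests fetch increasing byte ranges until done or aborted."""
    start, end = byte_range
    return {"status": 206, "content_range": f"bytes {start}-{end}"}
```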
In podcasting it’s impossible to identify a user with 100% accuracy. Unlike the web, where you control the code that will run in the user’s browser, we have no control over individual podcast players and don’t know what information we’ll get in each request — or whether we can trust it.
It’s in the podcast player’s interest, however, to let platforms like Acast know that it’s the same user returning for the same episode, because most podcasts are delivered with dynamic ad insertion — meaning the ads, and therefore the length of the mp3 file, can be different for each user.
We call this a ‘stitch’. The podcast player always wants the same stitch, to protect the user experience and ensure the audio isn’t ‘jumping’ back and forth when fetching different chunks of the mp3 file.
Our approach is to identify the user listening to an episode as accurately as possible. That’s based on what we call the ‘best effort’ identifier, built from the following sources in order of priority:
In the beginning, most people downloaded full podcast episodes when they were connected to WiFi. Then, when they listened to it, the entire mp3 file was stored locally.
Measuring this server-side was both easy and difficult at the same time. It was straightforward because clients did a single HTTP GET request to download the file, and for any given threshold it was easy to determine whether or not it was valid. On the other hand, it was impossible to know if the user had actually listened to the episode or just downloaded it to their device and forgotten about it.
Today, of course, that listening pattern has shifted with faster and cheaper mobile internet connections. Most people hit play in their favorite podcast app at the exact moment they want to listen to that episode.
These are called progressive downloads, often loosely referred to as ‘streaming’. It’s great, because it makes it easier to know that the user is actually listening to the episode. But, once again, there’s a catch — because the podcast player now fetches the mp3 file in small batches, we get multiple requests from the same device and episode.
To classify these requests server-side as valid listens, we first need to group together requests that come from the same user requesting the same episode.
We use the combination of best-effort identifier and show + episode ID as the grouping key, and have defined that a request group always starts when we get a range request starting from byte zero. If we get another byte-zero request with the same combination, we close the previous group and start a new one.
This method of grouping lets us very accurately track progressive downloads server-side. Request groups for full downloads will always have a single request per group — and all requests with the same combination within a 72-hour window will belong to the same group.
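A minimal sketch of this grouping logic, assuming each request carries a timestamp, the best-effort identifier, the show + episode ID and the start of its byte range (field names are illustrative):

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=72)  # matches the 72-hour stitch cache lifetime

def group_requests(requests):
    """Group range requests into request groups, as described above.

    Each request is a dict with: timestamp, identifier (the best-effort
    identifier), episode (show + episode ID) and range_start.
    """
    open_groups = {}   # (identifier, episode) -> currently open group
    closed = []
    for req in sorted(requests, key=lambda r: r["timestamp"]):
        key = (req["identifier"], req["episode"])
        group = open_groups.get(key)
        if group is not None:
            opened_at = group["requests"][0]["timestamp"]
            # A byte-zero request, or an expired 72-hour window, closes
            # the open group and starts a new one.
            if req["range_start"] == 0 or req["timestamp"] - opened_at > WINDOW:
                closed.append(open_groups.pop(key))
                group = None
        if group is None:
            open_groups[key] = {"key": key, "requests": [req]}
        else:
            group["requests"].append(req)
    return closed + list(open_groups.values())
```

A full download shows up as a group with a single request; a progressive download becomes one group holding all of its range requests.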
We use a 72-hour aggregation window because all our stitches are cached for that length of time. It also ensures we can follow a listen over a longer time, which is important for two reasons.
Firstly, in certain cases, it might take time for the request group to hit the download threshold to be counted as a listen. Secondly, in order to accurately count the delivery of ads placed in the middle and at the end of an episode, we want to keep aggregating requests belonging to the same request group for as long as the stitch is valid.
This does open up the question of whether this window size is in conflict with those used for frequency capping, but we’ll touch on that shortly.
Below is an example of the HTTP GET requests that hit our CDN when a user is listening to a podcast in Apple’s podcatcher. After 12 minutes and 28 seconds, the final request of the episode reaches our CDN. That’s well within the 72-hour request window and well above the 60-second threshold, so it will be counted as one listen.
The episode in question, however, is more than an hour long — so it’s still impossible to know whether the user finished listening to the whole thing. All we know is that the player fetched the entire episode.
event_timestamp, range_request, bytes_downloaded
2019-06-10 01:36:01, 0-65535, 66367
2019-06-10 01:36:03, 65536-61719293, 33524
2019-06-10 01:36:03, 17059-65535, 49255
2019-06-10 01:36:03, 65536-61719293, 51791633
2019-06-10 01:38:46, 50331648-61719293, 3237211
2019-06-10 01:40:58, 52428800-61719293, 3300191
2019-06-10 01:43:07, 54525952-61719293, 3100699
2019-06-10 01:45:20, 56623104-61719293, 2723460
2019-06-10 01:47:32, 58720256-61719293, 2346214
2019-06-10 01:48:29, 60817408-61719293, 903620
In the example above, it’s clear that Apple’s podcatcher requests byte ranges that overlap. This could be due to some optimisation logic that depends on parameters such as type of internet connection, file size and download speed, but in most cases players use proprietary code that isn’t publicly available for inspection.
What’s important is that we only want to count each byte that the client requests once. If a client requests the first 10 seconds of an episode six times, it shouldn’t be counted as a listen.
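Counting each requested byte only once amounts to merging overlapping byte ranges before summing their sizes. A sketch in Python, using inclusive (start, end) pairs like those in the range_request column above:

```python
def unique_bytes(ranges):
    """Count each requested byte once by merging overlapping byte ranges.

    `ranges` is a list of inclusive (start, end) byte positions, as seen
    in the range_request column of a CDN request log.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:   # overlaps or touches
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return sum(end - start + 1 for start, end in merged)
```

With this approach, a client that requests the first 10 seconds of an episode six times contributes those bytes to the total only once.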
The bytes served by the CDN include more than just the audio track. They also contain response headers, ID3 tags with metadata and, since we do dynamic ad insertion, the ads ingested live upon each request.
If we want to make sure the user did indeed download 60 seconds of the episode, we need to know the sizes and positions of all of these elements — so we can subtract them from the total bytes served before checking whether the threshold was met.
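Conceptually, the threshold check then operates on the audio bytes that remain after subtracting those elements. This is a simplified sketch; the 128 kbps bitrate and the aggregation of all non-audio content into a single byte count are illustrative assumptions:

```python
def meets_listen_threshold(total_bytes_served: int, non_audio_bytes: int,
                           bitrate_kbps: int = 128,
                           threshold_seconds: int = 60) -> bool:
    """Check whether at least `threshold_seconds` of audio was downloaded.

    `non_audio_bytes` is the combined size of response headers, ID3 metadata
    and dynamically inserted ads for this stitch; the platform knows these
    sizes because it built the stitch. The bitrate and threshold values
    here are illustrative.
    """
    audio_bytes = max(0, total_bytes_served - non_audio_bytes)
    # Convert the time threshold into bytes at the episode's bitrate.
    threshold_bytes = threshold_seconds * bitrate_kbps * 1000 // 8
    return audio_bytes >= threshold_bytes
```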
One controversy in podcast measurement, and something we mentioned earlier, is the window size for frequency capping, which determines how long we wait before counting a second listen for the same user and episode.
A short window opens up the risk of double-counting requests in cases where the best effort identifier is less precise. A longer window increases the chance of under-counting listens in cases of, for example, recycled mobile IPs that are true multiple listens. The IAB clearly recommends a 24-hour window, arguing that it balances the two issues.
There’s no right or wrong, as different window sizes optimise for different things. But the key thing is that advertisers and content creators use the same definitions, to make it clear what the advertiser gets and what the creator delivers.
It’s important to always send the same stitch to the user when delivering podcasts with dynamic audio insertion, to ensure a high-quality listening experience. This means, by design, we risk serving the same combination of ads to multiple users — since server-side we can’t tell them apart.
For any two requests coming in with the same best-effort-identifier and show + episode ID, we will serve the same stitch. But some programmatic ad platforms don’t allow us to deliver the same ad multiple times, so we need to mitigate the risk of double-counting by having a bigger window.
Then we have the aforementioned 72-hour request group window. This could be in conflict with podcast currencies and ad reporting requirements so, as part of our filtering, we only mark a request group as valid when it fulfils all the criteria — and we do so only once in a 24-hour period for the same user and episode combination.
For all these windows there are also multiple ways of implementing them, such as sliding, tumbling or hopping windows, and each yields different results and behaviours.
For Acast, it’s important we create data pipelines that are ‘idempotent’: a mathematical term for an operation that produces the same result no matter how many times it’s applied. Therefore, we use tumbling windows based on UTC time for frequency capping.
There are a number of scenarios where we want to filter out requests because they likely come from bots or other non-valid traffic. This can be done by applying different kinds of blocklists of IP addresses, user-agents and referrers, for example. But maintaining these lists manually is tedious and far from guaranteed to capture everything significant.
An alternative approach is to apply machine learning algorithms to automatically detect anomalies. This can be done either in real-time, filtering directly, or offline — where a request classified as non-valid is reprocessed and filtered out.
So far the podcasting industry has been relatively sheltered from non-legitimate traffic patterns, but as the medium matures we will almost certainly see more of this — so we’re working to stay ahead of the game.
A common approach to standardising podcast metrics is to have a third party sitting between the podcast player and the hosting platform — with all requests coming in via the third party so they can ensure listens are counted uniformly, regardless of the hosting provider they use.
While it’s true that they can guarantee listens are counted in the same way, that comes at the cost of accuracy. The proxy will only see the requests clients make to the enclosure URL specified in the RSS feed — but, depending on how the hosting platform delivers the audio to the podcast player, the proxy will not always be given the full picture.
Platforms like Acast handle the initial request for the mp3 file with a media selector service that determines the combination of ads the user will be served, then redirects the player to fetch that specific version of the episode from a global CDN. Distributing content via a CDN is preferable because it ensures the fastest possible response, based on the user’s geographic location when requesting the file.
That means the podcast player is now requesting the mp3 file via the URL from the CDN, rather than the enclosure URL in the RSS feed, and will hence bypass the third-party proxy.
The IAB and PoddIndex also require you to review how many bytes were actually sent to the podcast player, to avoid counting cancelled requests.
This is a conceptual overview of how Acast has implemented all of the above with AWS.
After joining CDN request logs with the bytes actually delivered for each request, we run several transformation jobs that generate these attributes for all requests:
For IAB compliance, we then only count requests with the following attributes:
For PoddIndex compliance, we then only count requests with the following attributes: