Why have my listen counts gone down?
As part of regular maintenance, Pinecast analyzes logs of listens and subscriber events to find and flag bots that were previously unclassified. Unlike other hosts, we keep an individual record of each listen (rather than aggregate totals) along with metadata about that event. As we find new information about bots, we're able to retroactively flag listens from those bots. This has the effect of decreasing listen counts.
While we recognize that this maintenance is disruptive, we overwhelmingly prefer to give you the most accurate view of your listenership that we possibly can.
How do I know that this was done correctly?
Nightly snapshots of our analytics data are stored in case changes are made incorrectly. Streaming log data of all listen events is stored separately for a fixed period of time, allowing us to roll back bad changes.
Additionally, no analytics data is deleted as part of a maintenance operation: flagged listens are kept but excluded from charts and aggregates. We are able to easily un-flag events as necessary.
All changes to analytics data are performed in an auditable two-step process. The first step simulates the operation being performed, allowing the effects to be observed for a sample of flagged events. The second step commits the effects of the operation to the database.
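As a rough illustration of the two-step process and of flagging without deleting, here is a minimal Python sketch. The `Listen` record, the `simulate` and `commit` functions, and the example rule are all hypothetical names invented for this example; they are not Pinecast's actual schema or tooling.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Listen:
    # Hypothetical record shape; the real schema is not described here.
    listen_id: int
    podcast: str
    user_agent: str
    flagged: bool = False


Rule = Callable[[Listen], bool]


def simulate(listens: List[Listen], rule: Rule, sample_size: int = 5) -> Tuple[int, List[Listen]]:
    """Step 1: report what the rule would flag, without changing any data."""
    matches = [l for l in listens if not l.flagged and rule(l)]
    return len(matches), matches[:sample_size]


def commit(listens: List[Listen], rule: Rule) -> List[Listen]:
    """Step 2: apply the rule. Matching listens are flagged, never deleted."""
    return [replace(l, flagged=True) if rule(l) else l for l in listens]


def listen_count(listens: List[Listen]) -> int:
    """Charts and aggregates count only listens that are not flagged."""
    return sum(1 for l in listens if not l.flagged)


def is_generic_client(listen: Listen) -> bool:
    # Example rule (an assumption): a generic HTTP client's default User-Agent.
    return listen.user_agent.startswith("python-requests")


listens = [
    Listen(1, "show-a", "AppleCoreMedia/1.0"),
    Listen(2, "show-a", "python-requests/2.31"),
    Listen(3, "show-b", "python-requests/2.31"),
]

would_flag, sample = simulate(listens, is_generic_client)
updated = commit(listens, is_generic_client)
```

Because `commit` only sets a flag, the same records can be un-flagged later by clearing it, and the original rows remain available for auditing.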
How do you detect unclassified bots?
Our techniques vary and evolve over time. Some common approaches:
- We look for request fingerprints that span many podcasts. It's extremely unlikely for a genuine listener to subscribe to a very large number of Pinecast podcasts.
- Requests that appear to be made by extremely old software (e.g., pre-2010) or by software that couldn't possibly listen to a podcast are often a sign of a bot.
- Requests from a single IP address made so quickly that it's almost impossible for the listens to be genuine are often flagged.
- Requests with identical fingerprints made from a single ASN or a small IP address range are often bots.
- Requests made from newly cataloged data center IP address ranges are likely from bots.
- Requests made using generic HTTP clients without specifying a custom User-Agent string are almost always bots.
When a potential bot is detected during an audit, it is reviewed and a rule is defined that filters the bot from new analytics being ingested. After the rule is deployed to Pinecast's analytics ingestion pipeline, the analytics database is updated.
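Two of the approaches listed above (fingerprints that span many podcasts, and rapid requests from a single IP address) can be sketched in Python as follows. This is an illustrative sketch only: the `Request` shape, the function names, and every threshold are assumptions made for the example, not the rules Pinecast actually uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Request(NamedTuple):
    # Hypothetical request record for illustration.
    fingerprint: str
    ip: str
    podcast: str
    timestamp: float  # seconds since some epoch


def fingerprints_spanning_many_podcasts(
    requests: Iterable[Request], max_podcasts: int = 50
) -> Set[str]:
    """Return fingerprints seen on more than max_podcasts distinct podcasts."""
    podcasts_by_fp: Dict[str, Set[str]] = defaultdict(set)
    for r in requests:
        podcasts_by_fp[r.fingerprint].add(r.podcast)
    return {fp for fp, pods in podcasts_by_fp.items() if len(pods) > max_podcasts}


def ips_requesting_too_fast(
    requests: Iterable[Request], window_seconds: float = 60.0, max_requests: int = 30
) -> Set[str]:
    """Return IPs with more than max_requests inside any sliding window."""
    times_by_ip: Dict[str, List[float]] = defaultdict(list)
    for r in requests:
        times_by_ip[r.ip].append(r.timestamp)

    flagged: Set[str] = set()
    for ip, times in times_by_ip.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Shrink the window until it covers at most window_seconds.
            while t - times[start] > window_seconds:
                start += 1
            if end - start + 1 > max_requests:
                flagged.add(ip)
                break
    return flagged


# Synthetic data: one crawler hitting 100 shows in 100 seconds, one ordinary listener.
crawler = [Request("fp-crawler", "203.0.113.7", f"show-{i}", float(i)) for i in range(100)]
human = [Request("fp-human", "198.51.100.4", "show-1", 3600.0 * i) for i in range(5)]
requests = crawler + human

wide_fingerprints = fingerprints_spanning_many_podcasts(requests)
fast_ips = ips_requesting_too_fast(requests)
```

In practice, results like these would feed a manual review rather than flag anything automatically, matching the review step described above.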
When does this happen?
This maintenance is not performed on a fixed schedule. We audit our analytics databases roughly once a quarter, though we may opt not to make any adjustments if an audit does not turn up enough unclassified bot data to justify maintenance.