Tracking my Podcast Listening
Online bookmarking has been around for a long time. It is common, simple to use, and the services have APIs to export or reuse your data. For a long time I used Delicious, but then I moved to Pinboard. Pinboard has Delicious-compatible APIs that support both JSON and XML formats, allow filtering by tag, and so on. Most internet users also already know how to use these bookmarking sites. Since podcasts are just URLs at the end of the day, I started bookmarking them on Pinboard as soon as I finished listening to them, using Android's Share option. I use PinDroid, a free and open source app available on the Play Store and F-Droid. There are other apps for both Android and iOS if you don't like PinDroid. With the app installed, bookmarking is as easy as sharing the episode with PinDroid. Almost all podcast clients have a way to share episode URLs.
How do I collect data?
Once I have listened to an episode, I tap Share and share it with Pinboard. I tag all of them as "listeningnow"; you can use any other tag, including "podcast". Just make sure to use the same tag every time, because this tag is what we use to track all the listening behaviour. Initially that's all it was. By default Pinboard gives you the URL, the title and the timestamp of when you shared (which is the same as the time I finished listening). Over time I wanted more metadata that I could record at the time of bookmarking, so I started thinking about it. The easiest way was to add more tags. The challenge was to make them follow a standard so they could be read back without any difficulty. Then it suddenly occurred to me that OpenStreetMap uses tagging to store key=value pairs. So I started adding a few tags in the key=value format. I started with two: podcast length and a unique podcast id. I add other subject-related tags if required. My notes go into the description box.
//for example
listeningnow len=30 pod=cooltools
- len=30 means the length of the podcast is 30 minutes. It's always in minutes. I round it to the nearest 10, i.e. if it's 36 minutes it gets logged as 40, and if it's 34 it gets logged as 30. But it's left to you. It's basically how long you have listened to the podcast. You can be as accurate as you want.
- pod=cooltools means the episode belongs to the podcast Cool Tools. The id is unique in my world. It has a matching entry with the details of the podcast in a different table. You could actually derive this value from the URL or domain name, but I am making it easy for myself.
This way I can add as much metadata as I want without breaking my head or the API. It's also easy to share with others and easy to read, even on Pinboard's minimal UI.
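The convention above can be sketched in a few lines of Python. This is only an illustration, not the code I run: the function names `round_to_ten`, `build_tags` and `parse_tags` are made up for this example, and it assumes halves (like 35 minutes) round up.

```python
def round_to_ten(minutes):
    """Round a listening time in minutes to the nearest 10 (halves round up)."""
    return int((minutes + 5) // 10) * 10


def build_tags(minutes, podcast_id, base_tag="listeningnow"):
    """Build the tag string to type into the bookmarking app."""
    return "{} len={} pod={}".format(base_tag, round_to_ten(minutes), podcast_id)


def parse_tags(tagstring):
    """Split a tag string into plain tags and a dict of key=value pairs."""
    plain, pairs = [], {}
    for tag in tagstring.split():
        key, sep, value = tag.partition("=")
        if sep and key and value:
            # A well-formed key=value tag, e.g. "len=30".
            pairs[key] = value
        else:
            # Anything else stays an ordinary tag, e.g. "listeningnow".
            plain.append(tag)
    return plain, pairs
```

Because the parser does not care which keys it sees, adding a new key later does not need any code change. The values come back as strings, so a numeric key such as len still has to be converted by whoever reads it.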
Data parsing
I use the Pinboard API to pull the data and store it in my favourite JSON store, CouchDB. You can either run a CouchDB instance on your machine or use a hosted service like Cloudant. I have a Python script to pull the data from Pinboard, parse the tags, format the data and insert it into a CouchDB database. This piece of code is available on GitHub if you'd like to set up your own.
# Python code to parse key=value tags into attributes
tags = tagstring.split(" ")
post['tags'] = tags
if len(tags) > 2:
    # Log posts that carry more than the basic tags
    print(tags)
for tag in tags:
    if tag.startswith("len="):
        # Listening time in minutes
        length = tag.replace("len=", "")
        post["period"] = int(length)
    if tag.startswith("pod="):
        # Unique id of the podcast
        pod = tag.replace("pod=", "")
        post["podcast"] = pod
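For anyone who wants to see the steps around that snippet, here is a rough sketch of the fetch and store parts using only the standard library. It assumes Pinboard's `posts/all` endpoint with an `auth_token` of the form `user:TOKEN`, and CouchDB's `_bulk_docs` endpoint for inserting many documents at once. The function names are made up for this example, CouchDB authentication is left out, and it is not the script from my repository.

```python
import json
import urllib.parse
import urllib.request

PINBOARD_POSTS_ALL = "https://api.pinboard.in/v1/posts/all"


def pinboard_url(auth_token, tag="listeningnow"):
    """Build the request URL for all bookmarks carrying one tag, as JSON."""
    query = urllib.parse.urlencode(
        {"auth_token": auth_token, "tag": tag, "format": "json"}
    )
    return PINBOARD_POSTS_ALL + "?" + query


def fetch_posts(auth_token, tag="listeningnow"):
    """Download the bookmarks; needs network access and a real API token."""
    with urllib.request.urlopen(pinboard_url(auth_token, tag)) as resp:
        # utf-8-sig also copes with a byte order mark, if one is present.
        return json.loads(resp.read().decode("utf-8-sig"))


def bulk_docs_request(couch_db_url, docs):
    """Prepare one POST that inserts all documents via _bulk_docs."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        couch_db_url.rstrip("/") + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def store_documents(couch_db_url, docs):
    """Send the prepared request; needs a reachable CouchDB database."""
    with urllib.request.urlopen(bulk_docs_request(couch_db_url, docs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The tag parsing shown above would run between `fetch_posts` and `store_documents`, once per post.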
API and Analysis
CouchDB, as you know, stores JSON documents. I have included an example document below for your reference. You can process or query these documents like any other CouchDB documents. I have written some views to get some stats.
//Individual recording in couchDB
{
  "_id": "08cb0557e67a9d89ada74cc3a511d173",
  "_rev": "3-bdd0b2f38d0c4db02919fc10ad04213f",
  "extended": "",
  "description": "20: Matt Cutts, Head of Web Spam Team at Google by Cool Tools | Free Listening on SoundCloud",
  "tags": [
    "listeningnow",
    "len=20",
    "pod=cooltools"
  ],
  "period": 20,
  "href": "http://tracking.feedpress.it/link/7810/529589",
  "meta": "7ad135033cb7100adf5f149ab3e0e32d",
  "time": "2018-01-08T17:39:30Z",
  "shared": "yes",
  "podcast": "cooltools",
  "toread": "no"
}
Total Time Played
This is the view that gives the total length of all the episodes, across all podcasts, that I have listened to until now.
{
  "_id": "_design/total_time_played",
  "_rev": "1-3b5af376022e8455d6b5d8bb5fa10c8a",
  "views": {
    "total_time_played": {
      "reduce": "_sum",
      "map": "function(doc) {\n if(doc.time && doc.period) {\n emit(doc.time, doc.period);\n }\n}"
    }
  },
  "language": "javascript"
}
When you access the view URL, you get the value in minutes.
// https://couch_db_path/listeningnow/_design/total_time_played/_view/total_time_played
{"rows":[
  {"key":null,"value":1810}
]}
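Minutes are not very readable once the number gets large, so here is a small sketch that reads the reduced value out of a response like the one above and prints it as hours and minutes. The helper names `total_minutes` and `format_duration` are made up for this example; the response body is the one shown above.

```python
import json


def total_minutes(view_body):
    """Return the reduced value from a view response, or 0 if there are no rows."""
    rows = json.loads(view_body)["rows"]
    return rows[0]["value"] if rows else 0


def format_duration(minutes):
    """Format a number of minutes as hours and minutes."""
    hours, mins = divmod(minutes, 60)
    return "{}h {:02d}m".format(hours, mins)


body = '{"rows":[{"key":null,"value":1810}]}'
print(format_duration(total_minutes(body)))  # 30h 10m
```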
Total Time Played per podcast
{
  "_id": "_design/total_time_played_podcast",
  "_rev": "1-221fd69b3307d9544460818d0151b451",
  "views": {
    "total_time_played_podcast": {
      "reduce": "_sum",
      "map": "function(doc) {\n if(doc.podcast && doc.period) {\n emit(doc.podcast, doc.period);\n }\n}"
    }
  },
  "language": "javascript"
}
When you access the url with grouping and reducing enabled
// https://couch_db_path/listeningnow/_design/total_time_played_podcast/_view/total_time_played_podcast?group=true&reduce=true
{"rows":[
  {"key":"cooltools","value":280},
  {"key":"dataengg","value":100},
  {"key":"democracynow","value":50},
  {"key":"hiddenbrain","value":120},
  {"key":"hpr","value":30},
  {"key":"initpython","value":30},
  {"key":"iotpodcast","value":170},
  {"key":"irl","value":30},
  {"key":"krshow","value":100},
  {"key":"onbeing","value":50},
  {"key":"outliers","value":30},
  {"key":"podcastinit","value":50},
  {"key":"riskybiz","value":60},
  {"key":"sedaily","value":180},
  {"key":"sysh","value":60},
  {"key":"talkpython","value":110},
  {"key":"theintersection","value":20},
  {"key":"theprepared","value":70},
  {"key":"timferris","value":180},
  {"key":"trackchanges","value":90}
]}
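If you want to check the numbers without CouchDB, the grouped view can be imitated in plain Python. This is a sketch of what the map function plus the `_sum` reduce with `group=true` compute, not how CouchDB implements it; the function names and the sample documents are made up for this example.

```python
from collections import defaultdict


def total_time_per_podcast(docs):
    """Sum the period of every document, grouped by its podcast id."""
    totals = defaultdict(int)
    for doc in docs:
        # Same guard as the map function: skip docs missing either field.
        if doc.get("podcast") and doc.get("period"):
            totals[doc["podcast"]] += doc["period"]
    return dict(totals)


def ranked(totals):
    """Sort podcasts by minutes listened, most first, ties by name."""
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents in the shape stored in CouchDB.
docs = [
    {"podcast": "cooltools", "period": 20},
    {"podcast": "sedaily", "period": 60},
    {"podcast": "cooltools", "period": 30},
    {"description": "a bookmark without key=value tags"},
]
print(ranked(total_time_per_podcast(docs)))
```

The ranking step is the part the view does not give you directly, since CouchDB returns grouped rows sorted by key rather than by value.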
Also, are there any other metadata tags that would be interesting to add or capture? Or any other query views or APIs you want? We could develop a common standard so that one day podcast clients can push this data using a webhook, and we won't need this intermediate bookmarking step.
Hi, nearly two years on from this post, I’m wondering what your setup is today for tracking your podcast listening history.
I am just trying to find a way to track basic listening stats (Podcast name, Episode name, Duration, Original Link, Date released, Date listened, etc).
Hello, I’m quite a noob, but I’m currently trying to go through the same kind of analysis.
I am using AntennaPod for downloading and playing podcasts, and I found 2 ways of getting the data out of it:
– Gpodder connection.
When connected to a gpodder account, it is able to forward all the data about feeds, episodes, listening time.
Gpodder.net doesn’t display it in a satisfying way, but provides some instructions about querying: https://buildmedia.readthedocs.org/media/pdf/gpoddernet/latest/gpoddernet.pdf
– Database export
The AntennaPod export function provides a .db file that can be opened with SQLite and contains some data about the listening history.
Not sure if it helps for your quest.
Please let me know if you get any further with this.