Counting My RSS Feed Subscribers
How many subscribers does my blog have? That's a difficult question to answer. A while back, I moved my feed from Feedburner to my own hosting, so I can now make an estimate from the requests it receives. The estimate is based on the following assumptions.
- Some of the hosted multi-user feedreaders report their subscriber count in the user agent. We can extract that and use it.
- Self-hosted cloud-based ones usually don't, so I count each of them as one subscriber per IP address.
- Folks using clients on their phone or PC without any cloud component are counted as one subscriber per IP address.
- Some cloud-hosted multi-user feedreaders don't report subscribers. Currently, I count each of them as one subscriber. I need a better way to figure this out.
These are the readers that go through my script, which replaced Feedburner. Some folks use the blog's built-in feed. I have not counted those yet; I still need to figure that out.
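import hashlib
import json
import re
import urllib.error
import urllib.request
from datetime import datetime

# Assumed pattern (its definition is not shown in this post): captures the
# number in user agents such as "...; 101 subscribers; ...".
subscribers_re = re.compile(r"(\d+) subscribers")
# DB_URL and AUTH_KEY are defined elsewhere in the script.

def subscribers_log(event):
    # Read the clock once so "datetime" and "date" always agree.
    now = datetime.utcnow().isoformat()
    event_data = {}
    event_data["event"] = "subscriber"
    event_data["datetime"] = now
    event_data["date"] = now[:10]
    if "requestContext" in event:
        requestContext = event["requestContext"]
        if "path" in requestContext:
            event_data["feed"] = requestContext["path"]
        if "identity" in requestContext:
            identity = requestContext["identity"]
            if "sourceIp" in identity:
                event_data["source_ip"] = identity["sourceIp"]
            if identity.get("userAgent"):
                user_agent = identity["userAgent"]
                simplified_user_agent = subscribers_re.sub("X subscribers", user_agent)
                event_data["user_agent"] = user_agent
                event_data["simplified_user_agent"] = simplified_user_agent
                match = subscribers_re.search(user_agent)
                if match:
                    event_data["count"] = int(match.group(1))
                else:
                    event_data["count"] = 1
    # Insert only if you have data
    required = ("feed", "source_ip", "simplified_user_agent")
    if not all(field in event_data for field in required):
        return
    k = (
        event_data["source_ip"]
        + event_data["simplified_user_agent"]
        + event_data["feed"]
    ).encode("utf8")
    if event_data["count"] > 1:
        # The reader reports its own total, so leave the IP out of the key.
        k = (event_data["simplified_user_agent"] + event_data["feed"]).encode(
            "utf8"
        )
    # create unique key
    h = hashlib.md5(k).hexdigest()
    event_data["_id"] = event_data["date"] + "_" + str(h)
    print("**----------------------**")
    print(event_data)
    try:
        req = urllib.request.Request(DB_URL)
        req.add_header("Authorization", AUTH_KEY)
        req.add_header("Content-Type", "application/json; charset=utf-8")
        jsondataasbytes = json.dumps(event_data).encode("utf8")
        req.add_header("Content-Length", str(len(jsondataasbytes)))
        with urllib.request.urlopen(req, jsondataasbytes) as response:
            response.read()
    except urllib.error.HTTPError as err:
        # CouchDB answers 409 Conflict when the _id already exists; that is
        # how repeat fetches on the same day are dropped.
        if err.code != 409:
            print("Error posting data:", err)
    except urllib.error.URLError as err:
        print("Error posting data:", err)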
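Above is the script I use to extract the subscriber count and add it to CouchDB. It borrows from Simon Willison's script. Since _id is the primary key in CouchDB, a second insert with the same _id is rejected as a conflict and the script moves on, so duplicates from the same day are effectively ignored.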
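To make the deduplication concrete, here is a minimal sketch of how the per-day _id is derived. It mirrors the key logic in the script above; the function name dedup_id and the sample IP addresses and user agents are only illustrations, not part of my actual script.

```python
import hashlib


def dedup_id(date, feed, source_ip, simplified_user_agent, count):
    """Build the per-day document ID the same way the script does."""
    if count > 1:
        # The reader reports its own total: leave the IP out so fetches
        # from different servers of one service share a single document.
        key = simplified_user_agent + feed
    else:
        # No reported total: one document per IP, user agent, and feed.
        key = source_ip + simplified_user_agent + feed
    return date + "_" + hashlib.md5(key.encode("utf8")).hexdigest()


feedly = (
    "Feedly/1.0 (+http://www.feedly.com/fetcher.html; "
    "X subscribers; like FeedFetcher-Google)"
)
# Same service, different IP and reported count, same day.
a = dedup_id("2022-04-29", "/thejeshgn", "8.29.198.27", feedly, 101)
b = dedup_id("2022-04-29", "/thejeshgn", "8.29.198.26", feedly, 102)
# Hypothetical desktop client seen from two different IPs.
c = dedup_id("2022-04-29", "/thejeshgn", "203.0.113.7", "SomeReader/1.0", 1)
d = dedup_id("2022-04-29", "/thejeshgn", "203.0.113.8", "SomeReader/1.0", 1)

print(a == b)  # True: only the first Feedly fetch of the day is stored
print(c == d)  # False: each IP counts as its own subscriber
```

One consequence of this design is that, for readers that report a total, only the first count seen each day is kept; later fetches that day collide on the same _id even if the reported number changes.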
{
  "_id": "2022-04-29_04e730996daaa95df14a71e01c9ae326",
  "_rev": "1-d1680d2ec633dffd17c17b70adde0b35",
  "event": "subscriber",
  "datetime": "2022-04-29T03:46:30.710509",
  "date": "2022-04-29",
  "feed": "/thejeshgn",
  "source_ip": "8.29.198.27",
  "user_agent": "Feedly/1.0 (+http://www.feedly.com/fetcher.html; 101 subscribers; like FeedFetcher-Google)",
  "simplified_user_agent": "Feedly/1.0 (+http://www.feedly.com/fetcher.html; X subscribers; like FeedFetcher-Google)",
  "count": 101
}
Then, once a day, I pull the previous day's data and aggregate it. I also pull data from WordPress and add it to the aggregate JSON document.
{
  "_id": "2022-04-29_subscriber_count",
  "event": "subscriber_count",
  "date": "2022-04-29",
  "feed": "/thejeshgn",
  "data": [
    {"provider": "feedly", "count": 101, "type": "rss"},
    {"provider": "theoldreader", "count": 23, "type": "rss"},
    {"provider": "bloglovin", "count": 3, "type": "rss"},
    {"provider": "inoreader", "count": 2, "type": "rss"},
    {"provider": "independent", "count": 70, "type": "rss"},
    {"provider": "wordpress", "count": 1015, "type": "follow"},
    {"provider": "wordpress", "count": 1132, "type": "email"}
  ]
}
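The daily aggregation step could look roughly like the sketch below. This is not my exact job, just one way to do it under a few assumptions: the PROVIDERS mapping, the provider_for and aggregate helpers, and the rule that anything unrecognized falls under "independent" are all hypothetical, and the WordPress follow and email numbers are left out because they come from a separate source.

```python
from collections import defaultdict

# Hypothetical mapping from a user-agent substring to a provider label.
PROVIDERS = {
    "feedly": "feedly",
    "theoldreader": "theoldreader",
    "bloglovin": "bloglovin",
    "inoreader": "inoreader",
}


def provider_for(user_agent):
    """Return a provider label, or "independent" for unrecognized clients."""
    lowered = user_agent.lower()
    for needle, provider in PROVIDERS.items():
        if needle in lowered:
            return provider
    return "independent"


def aggregate(date, feed, events):
    """Sum one day's stored counts for one feed, grouped by provider."""
    totals = defaultdict(int)
    for doc in events:
        if doc.get("date") != date or doc.get("feed") != feed:
            continue
        totals[provider_for(doc.get("user_agent", ""))] += doc.get("count", 1)
    # Largest providers first, matching the layout of the document above.
    data = [
        {"provider": name, "count": count, "type": "rss"}
        for name, count in sorted(totals.items(), key=lambda item: -item[1])
    ]
    return {
        "_id": date + "_subscriber_count",
        "event": "subscriber_count",
        "date": date,
        "feed": feed,
        "data": data,
    }


# Sample event documents; the two non-Feedly user agents are made up.
events = [
    {"date": "2022-04-29", "feed": "/thejeshgn", "count": 101,
     "user_agent": "Feedly/1.0 (+http://www.feedly.com/fetcher.html; "
                   "101 subscribers; like FeedFetcher-Google)"},
    {"date": "2022-04-29", "feed": "/thejeshgn", "count": 1,
     "user_agent": "SomeReader/1.0"},
    {"date": "2022-04-29", "feed": "/thejeshgn", "count": 1,
     "user_agent": "AnotherReader/2.3"},
]
result = aggregate("2022-04-29", "/thejeshgn", events)
print(result["data"])
```

With these three sample events, the result has a "feedly" entry with a count of 101 and an "independent" entry with a count of 2. Summing works here because the _id scheme already keeps at most one document per reader per day.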
The next step is to observe the script for a couple of days and weed out any bugs, and then update my subscription page and widget to reflect these numbers. My final goal is to beat my Twitter follower count, and I know it's not easy.