How I Built a Quick Dashboard for the #SpeakForMe Campaign
#SpeakForMe is a campaign to petition Indian MPs, banks, mobile operators and other service providers to speak for you against the Aadhaar linking coercion. You can go to #SpeakForMe to send your petition. As part of the campaign I built a quick and dirty dashboard for the emails sent. This is a quick note on how I did that.
#SpeakForMe has a Twitter account, @bulletinbabu, which used to tweet updates in a standard format at regular intervals. At first I started parsing these tweets and plotting them on a graph. The parsing script would run every hour, find all the tweets, parse them and insert them into a CouchDB database. The parsed CouchDB document is very simple and can be used directly for charting:
{ "_id":"2017-12-13T18:20:02+05:30", "_rev":"1-67c8a405a19f3a787f42640fa1ac9aef", "govt":32, "stat":"email_sent", "mps":780, "campaign":"#SpeakForMe", "others":13, "mobile":69, "tw":940926800292540417, "total":1000, "banks":106 }
The scraper code is pretty standard too:
#!/usr/bin/env python
# encoding: utf-8

import couchdb
import tweepy  # https://github.com/tweepy/tweepy
import csv
import re
import arrow
import time

# The consumer keys can be found on your application's Details
# page located at https://dev.twitter.com/apps (under "OAuth settings")
consumer_key = ""
consumer_secret = ""

# The access tokens can be found on your application's Details
# page located at https://dev.twitter.com/apps (located
# under "Your access token")
access_key = ""
access_secret = ""

# you will have to change this
couch_url = "https://username:password@mycouchdb.url.com"
remote_server = couchdb.Server(couch_url)
bulletinbabu_db = remote_server['bulletinbabu']


def get_all_tweets(screen_name):
    # Twitter only allows access to a user's most recent 3240 tweets with this method

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.user_timeline(screen_name=screen_name, count=200, tweet_mode="extended")

    # save most recent tweets
    alltweets.extend(new_tweets)

    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    # keep grabbing tweets until there are no tweets left to grab
    while len(new_tweets) > 0:
        break  # the script runs hourly, so the most recent batch is enough

        # all subsequent requests use the max_id param to prevent duplicates
        new_tweets = api.user_timeline(screen_name=screen_name, count=200,
                                       max_id=oldest, tweet_mode="extended")

        # save most recent tweets
        alltweets.extend(new_tweets)

        # update the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1

        print "...%s tweets downloaded so far" % (len(alltweets))

    for tweet in alltweets:
        print "--------------------------------------------------------------------------------------------"
        bulletinbabu = {}
        bulletinbabu['tw'] = tweet.id
        bulletinbabu['campaign'] = "#SpeakForMe"
        bulletinbabu['_id'] = arrow.get(tweet.created_at).to('local').format('YYYY-MM-DDTHH:mm:ssZZ')
        text = tweet.full_text.encode("utf-8")
        print str(text)
        if text.startswith("Emails from #SpeakForMe to:"):
            bulletinbabu['stat'] = "email_sent"
            regex_search = re.search('MPs:(.*) ', text, re.IGNORECASE)
            if regex_search:
                mps = regex_search.group(1)
                mps = mps.replace(",", "")
                print str(mps)
                bulletinbabu['mps'] = int(mps.strip())
            regex_search = re.search('Banks:(.*) ', text, re.IGNORECASE)
            if regex_search:
                banks = regex_search.group(1)
                banks = banks.replace(",", "")
                bulletinbabu['banks'] = int(banks.strip())
            regex_search = re.search('Mobile service providers:(.*)\ ', text, re.IGNORECASE)
            if regex_search:
                mobile = regex_search.group(1)
                mobile = mobile.replace(",", "")
                bulletinbabu['mobile'] = int(mobile.strip())
            regex_search = re.search('Government services:(.*)\ ', text, re.IGNORECASE)
            if regex_search:
                govt = regex_search.group(1)
                govt = govt.replace(",", "")
                bulletinbabu['govt'] = int(govt.strip())
            regex_search = re.search('Others:(.*)\ ', text, re.IGNORECASE)
            if regex_search:
                others = regex_search.group(1)
                others = others.replace(",", "")
                bulletinbabu['others'] = int(others.strip())
            regex_search = re.search('Total:(.*)\ ', text, re.IGNORECASE)
            if regex_search:
                total = regex_search.group(1)
                total = total.replace(",", "")
                bulletinbabu['total'] = int(total.strip())
            print str(bulletinbabu)
            try:
                bulletinbabu_db.save(bulletinbabu)
            except couchdb.http.ResourceConflict:
                print "Already exists"
                break
            time.sleep(0.1)
        elif text.startswith("Top recipients of #SpeakForMe emails:"):
            # bulletinbabu['stat'] = "top_rcpt"
            pass


if __name__ == '__main__':
    # pass in the username of the account you want to download
    get_all_tweets("bulletinbabu")
Since CouchDB provides HTTP RESTful access to data, there was no issue in pulling the data from the database using standard AJAX requests for plotting. A couple of days later the #SpeakForMe team wanted to see how many emails were sent to MPs. So I asked them to post the aggregate analytics [1] they were collecting to my CouchDB. They started posting two types of documents: one aggregating at the service level, the other aggregating at the individual recipient level. Posting, for them, is a simple web POST using Python requests, just like posting to any webhook:
import requests

# the URL should point at the target database (POST /{db} creates a document)
couch_url = "https://username:password@mycouchdb.url.com"

data = {'stat': 'email_sent', 'total': 2339, 'campaign': '#SpeakForMe',
        'mobile': 198, 'tw': 941038005199998978, 'govt': 78, 'mps': 1824,
        'others': 6, 'banks': 233, '_id': u'2017-12-14T01:41:00+05:30'}

r = requests.post(couch_url, json=data)
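Pulling the documents back for the dashboard is just another HTTP request, against CouchDB's built-in _all_docs endpoint with include_docs=true. The browser does this with AJAX, but here is a minimal sketch of the same call in Python, reusing the placeholder credentials and the bulletinbabu database name from the scraper above:

import requests

# placeholder credentials and host, same as in the scraper above
couch_url = "https://username:password@mycouchdb.url.com"

# _all_docs lists every document; include_docs=true embeds the full
# document in each row, which is what the dashboard charts consume
r = requests.get(couch_url + "/bulletinbabu/_all_docs",
                 params={"include_docs": "true"})
rows = r.json()["rows"]

# keep only the aggregate "email_sent" documents for this campaign
email_sent = [row["doc"] for row in rows
              if row["doc"].get("stat") == "email_sent"
              and row["doc"].get("campaign") == "#SpeakForMe"]
print("%d email_sent documents" % len(email_sent))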
The first document is similar to what I used to scrape. The second one is a bigger document: it has the number of emails at the level of an individual service provider or MP. The attribute "stat" differentiates the two types of documents, and "_id", which is the primary key, is just a standard timestamp. As you can see in the partial "mailbox_email_sent" document below, each key has two parts, the type of provider and the provider name, separated by "/". For Airtel it is "mobile/airtel", and so on. For MPs it starts with "mp" followed by the state code and the parliamentary constituency number, e.g. "mp/mh-47". Here is a copy of the full document if you would like to see it.
{ "_id":"2017-12-20T13:10:03.243684+05:30", "_rev":"1-b9ed7d5104bf0c8e1fbd341743829084", "gov/pan":344, "mobile/mts":1, "mp/ar-2":4, "mp/mh-47":28, "bank/bkid":24, "mp/ka-14":19, "mp/ka-15":65, "mp/wb-41":1, "campaign":"#SpeakForMe", "bank/orbc":10, "mp/ke-20":167, "total":32354, "bank/lavb":2, "bank/synb":6, "bank/ibkl":19, "mobile/airtel":595, ... .... ... "mp/or-12":4, "mp/pb-8":17, "mp/wb-30":5, "bank/indb":14, "mobile/idea":183, "stat":"mailbox_email_sent", "bank/vijb":7, "mp/bi-38":18, "mp/bi-40":3 }
On the client side it's just static HTML and JavaScript. I used the Parliamentary Constituencies maps provided by the Data{Meet} Community Maps Project, displayed using Leaflet and D3; in fact I borrowed parts of the code from the DataMeet maps project. I use Lodash to query, filter and manipulate the documents returned by CouchDB. For example:
let all_rows = _.reverse(returned_data.rows);

// Filter emails sent
let rows = _.filter(all_rows, function(o) {
    return o.doc.stat == "email_sent" && o.doc.campaign == "#SpeakForMe";
});
let latest_row = _.last(rows);

let rows_mailbox_email_sent = _.filter(all_rows, function(o) {
    return o.doc.stat == "mailbox_email_sent" && o.doc.campaign == "#SpeakForMe";
});
let latest_mailbox_email_sent = _.last(rows_mailbox_email_sent);
You can see the code that does everything here. I used Frappé Charts for charting. I love them. They are simple and look great.
Basically, the analytics data gets stored in CouchDB and served to the browser through the standard RESTful interface that CouchDB provides. Running CouchDB to receive authenticated webhook POSTs from outside and then serve the data as a RESTful service worked like a charm. Of course, I didn't have much traffic to test it under load, but since CouchDB sits behind CloudFront (Amazon's CDN), I guess it can take quite a bit of load. The team usually pushes data every 5 minutes if there are updates, so the dashboard shows live status (see the last-updated timestamp).
At some point I will create graphs of email traffic (daily emails sent, etc.). Are there any other graphs you would like to see? I will be happy to answer any questions you have.
[1] As you can see in the JSON data documents, these contain only aggregates, no personal information.