How I Built a Quick Dashboard for the #SpeakForMe Campaign

#SpeakForMe is a campaign to petition Indian MPs, banks, mobile operators, and other service providers to speak for you against the Aadhaar linking coercion. You can go to #SpeakForMe to send your petition. As part of the campaign, I built a quick and dirty dashboard for the emails sent. This is a quick note on how I did that.

Part of the #SpeakForMe dashboard showing emails sent to MPs on a parliamentary constituency (PC) map.


#SpeakForMe has a Twitter account, @bulletinbabu, which tweeted updates in a standard format at regular intervals. At first I started parsing these tweets and plotting them on a graph. The parsing script would run every hour, find all the tweets, parse them, and insert them into CouchDB. The parsed CouchDB document is very simple and can be used directly for charting.
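A parsed document presumably looks something like this. The field names follow the aggregate document shown later in the post; the numbers here are invented for illustration:

```python
# Hypothetical parsed tweet document: the tweet's local timestamp is the
# _id (primary key), and each field is one count parsed out of the tweet.
# The numbers are made up for illustration.
doc = {
    "_id": "2017-12-14T01:41:00+05:30",
    "mps": 1824,
    "banks": 233,
    "mobile": 198,
    "govt": 78,
    "others": 6,
    "total": 2339,
}

# Charting only needs the counts keyed by category, so the document
# can be fed to a chart almost as-is.
categories = {k: v for k, v in doc.items() if k != "_id"}
```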


The scraper code is pretty standard too:

#!/usr/bin/env python
# encoding: utf-8
import couchdb
import tweepy
import re
import arrow

# The consumer keys can be found on your application's Details
# page (under "OAuth settings")
consumer_key = ""
consumer_secret = ""

# The access tokens can be found on your application's Details
# page (under "Your access token")
access_key = ""
access_secret = ""

# you will have to change this
couch_url = ""

remote_server = couchdb.Server(couch_url)
bulletinbabu_db = remote_server['bulletinbabu']


def get_all_tweets(screen_name):
	# Twitter only allows access to a user's most recent 3240 tweets with this method
	# authorize twitter, initialize tweepy
	auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
	auth.set_access_token(access_key, access_secret)
	api = tweepy.API(auth)

	# initialize a list to hold all the tweepy Tweets
	alltweets = []

	# make initial request for most recent tweets (200 is the maximum allowed count)
	new_tweets = api.user_timeline(screen_name=screen_name, count=200, tweet_mode="extended")

	# save most recent tweets
	alltweets.extend(new_tweets)

	# save the id of the oldest tweet less one
	oldest = alltweets[-1].id - 1

	# keep grabbing tweets until there are no tweets left to grab
	while len(new_tweets) > 0:
		# all subsequent requests use the max_id param to prevent duplicates
		new_tweets = api.user_timeline(screen_name=screen_name, count=200, max_id=oldest, tweet_mode="extended")

		# save most recent tweets
		alltweets.extend(new_tweets)

		# update the id of the oldest tweet less one
		oldest = alltweets[-1].id - 1
		print("...%s tweets downloaded so far" % len(alltweets))

	for tweet in alltweets:
		bulletinbabu = {}
		# the tweet's local timestamp doubles as the document's primary key
		bulletinbabu['_id'] = arrow.get(tweet.created_at).to('local').format('YYYY-MM-DDTHH:mm:ssZZ')
		text = tweet.full_text
		print(text)
		if text.startswith("Emails from #SpeakForMe to:"):
			# each update tweet carries one count per category, e.g. "MPs: 1,824";
			# strip the thousands separator before storing
			for field, pattern in (('mps', 'MPs:(.*) '),
			                       ('banks', 'Banks:(.*) '),
			                       ('mobile', 'Mobile service providers:(.*) '),
			                       ('govt', 'Government services:(.*) '),
			                       ('others', 'Others:(.*) '),
			                       ('total', 'Total:(.*) ')):
				regex_search = re.search(pattern, text, re.IGNORECASE)
				if regex_search:
					bulletinbabu[field] = regex_search.group(1).strip().replace(",", "")
			print(bulletinbabu)
			try:
				bulletinbabu_db.save(bulletinbabu)
			except couchdb.http.ResourceConflict:
				print("Already exists")
		elif text.startswith("Top recipients of #SpeakForMe emails:"):
			# "Top recipients" tweets are not parsed
			pass


if __name__ == '__main__':
	# pass in the username of the account you want to download
	get_all_tweets("bulletinbabu")

Since CouchDB provides RESTful HTTP access to data, there was no issue pulling the data from the database using standard AJAX requests for plotting. A couple of days later, the #SpeakForMe team wanted to see how many emails were sent to individual MPs. So I asked them to post the aggregate analytics[1] they were collecting to my CouchDB. They started posting two types of documents: the first aggregates at the service level, the second at the individual receiver level. For them, posting is a simple web POST using Python requests, just like posting to any webhook:

import requests

# you will have to change this
couch_url = ""

data = {'stat': 'email_sent', 'campaign': '#SpeakForMe',
        '_id': u'2017-12-14T01:41:00+05:30', 'tw': 941038005199998978,
        'total': 2339, 'mps': 1824, 'banks': 233, 'mobile': 198,
        'govt': 78, 'others': 6}
r = requests.post(couch_url, json=data)

The first document is similar to what I used to scrape. The second one is bigger: it has the number of emails at the level of an individual service provider or MP. The attribute "stat" differentiates the two types of documents. The _id, which is the primary key, is just a standard timestamp. As you can see in the partial "mailbox_email_sent" document below, each key has two parts, the type of provider and the provider name, separated by "/". For Airtel it is "mobile/airtel", and so on. For MPs, the key starts with "mp" and then has the state code and the parliamentary constituency number, e.g. "mp/mh-47". Here is a copy of the full document if you would like to see it.
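To make the key scheme concrete, a hypothetical partial "mailbox_email_sent" document might look like the sketch below. The key format follows the description above; the counts are made up:

```python
# Hypothetical partial "mailbox_email_sent" document. The key scheme
# ("type of provider"/"provider name", and "mp/<state>-<PC number>"
# for MPs) follows the post; the counts are invented.
mailbox_email_sent = {
    "_id": "2017-12-14T01:41:00+05:30",
    "stat": "mailbox_email_sent",
    "campaign": "#SpeakForMe",
    "mobile/airtel": 74,
    "mp/mh-47": 12,
}

# Splitting a key recovers the provider type and name. For an MP, the
# name is "<state code>-<parliamentary constituency number>", which is
# what ties a count to a constituency on the PC map.
kind, name = "mp/mh-47".split("/")
state, pc_number = name.split("-")
```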


On the client side it is just static HTML and JavaScript. I used the Parliamentary Constituencies maps provided by the Data{Meet} Community Maps Project. They are displayed using Leaflet and D3; in fact, I borrowed parts of the code from the DataMeet maps project. I use Lodash to query, filter, and manipulate the documents returned by CouchDB. For example:

let all_rows = _.reverse(returned_data.rows);

// Filter the service-level "email_sent" documents for this campaign
let rows = _.filter(all_rows, function(o) { return o.doc.stat === "email_sent" && o.doc.campaign === "#SpeakForMe"; });
let latest_row = _.last(rows);

// Same for the per-recipient "mailbox_email_sent" documents
let rows_mailbox_email_sent = _.filter(all_rows, function(o) { return o.doc.stat === "mailbox_email_sent" && o.doc.campaign === "#SpeakForMe"; });
let latest_mailbox_email_sent = _.last(rows_mailbox_email_sent);

You can see the code that does everything here. I used Frappé Charts for charting. I love them. They are simple and look great.

Basically, analytics data gets stored in CouchDB and served to the browser through the standard RESTful interface that CouchDB provides. Running CouchDB to receive external authenticated webhook POSTs and then serve the data as a RESTful service worked like a charm. Of course, I didn't have much traffic to test under load, but since CouchDB sits behind CloudFront (Amazon's CDN), I guess it can take quite a bit of load. The team usually pushes data every 5 minutes if there are updates, so the dashboard shows live status (see the last-updated timestamp).

At some point I will create graphs of email traffic (daily emails sent, etc.). Are there any other graphs you would like to see? I will be happy to answer any questions you have.

  1. As you can see in the JSON data documents, these are only aggregates; there is no personal information.