Thejesh GN

A Blog, A Website and A container for all my views, with excerpts from technology, travel, films, India, photography, Kannada, friends and other interests. I am Thejesh GN. Friends call me Thej.

Personal Notes from Open DataCamp Bangalore

Posted by Thejesh GN on April 6, 2012

When we planned the Open DataCamp, we never expected to attract such a big gathering of interesting people. The venue we picked could accommodate a hundred participants, and we didn't expect more than that on a long weekend.
But by the last day we had around 200 participants on the list. I was bracing for the worst :) I woke up at six and was at Google by eight, mostly because I couldn't sleep.

As usual, we started by arranging tables and setting up the projector. I have yet to see a conference where projectors weren't a PITA, and setup took longer than I expected. Surprisingly, my Ubuntu machine was the easiest to get working with the projector, and the Mac was the next best.

By 9:45 we had more people than I expected. The main hall was full and people were standing; we already had 140+ participants. They were very comfortable having conversations in the corridor or making a place for themselves to sit. That's the best thing that could have happened to us, so the numbers didn't worry me after that.

We started sharp at 10 am, with me introducing the concept of BarCamp and the plan for the day, followed by a panel discussion.

From then on it was a smooth ride. I didn't have to do anything other than time management.

I liked all the morning session talks. I could not attend most of the afternoon sessions. Among the ones I attended, Anand's talk on Pictures through Numbers and Shekhar's Open Data & Free Maps were my favorites.

I spent most of the afternoon tweeting, scheduling, or in conversations. I couldn't do my own session; maybe at the next DataCamp.

This camp was slightly different from regular BarCamps. The morning sessions were held in a single hall and were curated to keep the audience interested. The afternoon sessions ran in three halls plus the corridor. This time, due to lack of time, I had to curate the talks all by myself, but next time I would settle for the standard BarCamp way. Nevertheless, most participants liked my curation, so I am happy.

By six we were done, but the conversations in the corridor and at the bar continued till eight. I reached home by 10, dead tired. A day well spent.

A big thanks to my co-organizer Nisha and all the volunteers from DataMeet. It would not have been possible without them. Thanks to Meera for all the pictures; they are on Flickr. All videos were shot and edited by our friends at HasGeek. And finally, thanks to all the sponsors (Google, MSR, IWP, Gramener, Akshara, CIS and HasGeek) for working with us and trusting us. Now I am waiting for the next DataCamp.

First Open DataCamp is here

Posted by Thejesh GN on February 29, 2012

Open DataCamp is a one-day unconference for people working with data in various sectors to come together and share their projects and ideas. The first one is scheduled for March 24th in Bangalore. Google has very graciously agreed to host us in their cafeteria. We are still working on the details of the sessions, but I am sure there will be enough sessions for both technical and non-technical attendees. It will most probably be structured like a workshop in the morning and like a BarCamp in the afternoon, which is why the event page is still under construction. Thanks to VSR for the design. The site code and other material are in the public domain and are on Bitbucket. Any help is welcome.

Yesterday, I sent my first invitation mail to the DataMeet group.

We are writing to invite you to the 1st Open Data Camp in India on Saturday, March 24, 2012, in Bangalore.

This event is dedicated to all aspects of open data, from working with data to getting it and, of course, using it to create impact. It is being organized by DataMeet, an online group of data enthusiasts who hope to use data to create an impact in the lives of people living and working in India.

This Open Data Camp will bring together the main development sector actors working with/on open data. These include Nonprofit Organizations like India Water Portal, Akshara Foundation, Accountability Initiative, Azim Premji University and PRS Legislative;
Also, the Indian Government has just passed the National Data Sharing and Accessibility Policy, essentially the Open Data Policy for India. The policy itself is rotten and has nothing to do with ‘open’ and ‘sharing’, but considering that before this policy all data in India fell under the ‘Official Secrets Act’, this is no small gain.

Policy and Advocacy Groups like the Centre for Internet and Society, Tactical Tech Collective, and of course the Government of India;

and the broader interest group that includes Technologists and Technology Companies (such as Google and Gramener), Designers and Design Schools, Journalists and Media Groups.

I hope you can attend the event. Registration is compulsory. Please register at doattend.com

You can find more about the event at http://odc.datameet.org. It's still a work in progress, and I will keep updating it.

If you have any questions, feel free to contact me at any time.

Thanks,
Thej, Nisha and Team.

Please do register.

Octopoda – MapReduce for Human Beings in Python

Posted by Thejesh GN on February 21, 2012

I have been wanting to learn MapReduce for a long time, but I never had a requirement where I could use it. For the last few weeks I have been dabbling with huge datasets, so it was time, and as usual I started with Wikipedia.

There are huge systems and frameworks built on the concept of MapReduce. They use distributed filesystems, are fault tolerant and can process petabytes of data. But I wanted something simple: something minimalistic, written in Python, that does everything a MapReduce framework should do.

“Map”: The master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes.

“Reduce”: The master node then collects the answers to all the sub-problems and combines them in some way to form the output.
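As an illustration of the two phases (not code from Octopoda or octo.py), here is a minimal single-process sketch in Python. There are no master or worker nodes: the hypothetical `run_mapreduce` helper runs `mapfn` over every input record, groups the emitted pairs by key (the step usually called the shuffle), and then calls `reducefn` once per key.

```python
from collections import defaultdict


def run_mapreduce(source, mapfn, reducefn):
    """Run a MapReduce job in one process: no workers, no network, no fault tolerance."""
    grouped = defaultdict(list)
    # Map: each input (key, value) record can emit any number of (key, value) pairs
    for key, value in source.items():
        for out_key, out_value in mapfn(key, value):
            # Shuffle: gather every value emitted for the same key into one list
            grouped[out_key].append(out_value)
    # Reduce: combine each key's list of values into a single result
    return {key: reducefn(key, values) for key, values in grouped.items()}


def mapfn(key, value):
    # Emit (word, 1) for every word in the line
    for word in value.split():
        yield word, 1


def reducefn(key, values):
    # Add up all the 1s emitted for this word
    return sum(values)


if __name__ == "__main__":
    lines = {1: "Humpty Dumpty sat on a wall", 2: "Humpty Dumpty had a great fall"}
    print(run_mapreduce(lines, mapfn, reducefn))
```

A real framework does the same three things, but runs the map and reduce calls on different machines and moves the grouped values over the network.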

I found MinceMeatPy and Octo.py. Both are single-file Python MapReduce frameworks. mincemeatpy is actively developed, whereas the last check-in to octo.py was probably in 2008.

I thought the best way to learn the concept was to write a framework that implements it. But reinventing the wheel is a waste of everybody's time, so I chose the middle ground: I forked Octo.py and called it Octopoda.

I removed a lot of code, which made it simple and inflexible. I added simple auth and some examples, created a wiki and a road map, and, how could I forget, ASCII art :)

============================================================
        _____                                  _       
       / ___ \       _                        | |      
      | |   | | ____| |_  ___  ____   ___   _ | | ____ 
      | |   | |/ ___)  _)/ _ \|  _ \ / _ \ / || |/ _  |
      | |___| ( (___| |_| |_| | | | | |_| ( (_| ( ( | |
       \_____/ \____)\___)___/| ||_/ \___/ \____|\_||_|
                  MapReduce for HumanBeings
          Repo: http://code.thejeshgn.com/octopoda
============================================================

I am now working on channel encryption and I need help. The project is hosted on Bitbucket, so go ahead, fork it, and send me a pull request with your changes.

A standard MapReduce example is counting words.

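#wordCount.py
from __future__ import print_function  # print() behaves the same on Python 2 and 3

# Input records; the server hands these key/value pairs out to the clients
source = {
    1: "Humpty Dumpty sat on a wall",
    2: "Humpty Dumpty had a great fall",
    3: "All the King's horses and all the King's men",
    4: "Couldn't put Humpty together again",
}

def final(key, value):
    # Called with each reduced (word, count) pair
    print(key, value)

# Run on the clients
def mapfn(key, value):
    # Emit (word, 1) for every word in the line
    for w in value.split():
        yield w, 1

def reducefn(key, values):
    # Sum all the 1s emitted for this word
    return sum(values)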

On the server:
$ python octopoda.py server ./examples/wordCount.py

On each client node:
$ python octopoda.py client localhost_or_server_ip

You can start as many clients as you want; the server handles task distribution and aggregation. I know this is an overly simplistic example. With a little modification, the same example can calculate the word count across all the files in a directory. I will write about that in my next post. Until then, have fun.
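Until that post, here is one possible approach, sketched under assumptions rather than taken from the follow-up: the server script keeps `mapfn`, `reducefn` and `final` from wordCount.py unchanged and only builds `source` differently, with one record per file (file name as key, file text as value). `INPUT_DIR` and `load_source` are hypothetical names, and the files are assumed to be UTF-8 text.

```python
# Hypothetical replacement for the hard-coded `source` dict in wordCount.py
import io
import os

INPUT_DIR = "./texts"  # hypothetical directory of plain-text files


def load_source(directory):
    """Return {file name: file text} for every regular file in directory."""
    records = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            # io.open takes an encoding argument on both Python 2 and 3
            with io.open(path, encoding="utf-8") as f:
                records[name] = f.read()
    return records


# One input record per file; an empty job if the directory does not exist
source = load_source(INPUT_DIR) if os.path.isdir(INPUT_DIR) else {}
```

Each file becomes a single map task here, so a few very large files would make uneven tasks; keying records by file and line number instead is a reasonable alternative.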

Deployments are usually very painful, so we generally write scripts to automate them as much as possible. I wanted my method to be as easy as running a single line at the command prompt, from anywhere in the world, without worrying about anything else. After some experimentation, I have found my way. This post covers deploying PHP/Python applications using Mercurial as the code repo and a fabfile. You can actually use any scripting format instead of a fabfile, but Fabric makes it easy to log into a remote machine and perform tasks. Also, a fabfile is written in Python, which gives a lot of flexibility to customize, and I don't have to learn anything new.

This process is inspired by Heroku's git deployment feature. It works with hg, git and most other DVCSs with minor alterations. It has two major steps:
STEP 1: On your *nix Server

  1. Install Mercurial on your server; it should be easy
  2. Set up SSH access to the Mercurial repository
    Your server should be able to log in to the code repository and pull the latest code. It's easier to use SSH keys than passwords.
    On your server machine:

    1. Open terminal
    2. Enter ssh-keygen
    3. Give the key a name, or use the default name id_rsa
    4. When it asks “Enter passphrase (empty for no passphrase):” press Enter, so the key has no password
    5. Once key generation is complete, you can verify it using ls -a ~/.ssh
    6. Add the new identity to the SSH agent: ssh-add ~/.ssh/id_rsa
    7. Now add the public key to Bitbucket or any other provider. Print it with cat ~/.ssh/id_rsa.pub and copy the output
    8. For Bitbucket, go to Account -> SSH keys and add the above output as one of your SSH keys

    Now your server can access your repositories without a password.

  3. Now set up the Hg repository inside a web-accessible directory. For example, your web-accessible folder could be
    /home/user_home/public_html
    or /var/www/html
    or, in the case of Python, it can be anywhere, such as /home/user_home/my_project_code
  4. To deploy the PHP application tweet4blood, clone the repo inside that directory using the SSH URL:

    cd /home/user_home/public_html
    hg clone ssh://hg@bitbucket.org/thejeshgn/tweet4blood tweet4blood.com
  5. Make sure the .hg folder (the actual repository) is inaccessible to the web server, either with an .htaccess rule or by changing its permissions
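For the permissions option, one way is to make .hg accessible only to the account that owns the checkout. This only helps if the web server runs as a different user (an assumption; on hosts where PHP runs as your own user, the .htaccess rule is the option to use). A minimal Python sketch; `lock_down_hg` and the script name lock_hg.py are hypothetical.

```python
import os
import stat
import sys


def lock_down_hg(repo_dir):
    """Restrict repo_dir/.hg to its owner (mode 0700) and return the new mode."""
    hg_dir = os.path.join(repo_dir, ".hg")
    # Owner keeps rwx so hg pull/update keep working; group and others lose
    # all access, so a web server running as another user cannot read .hg
    os.chmod(hg_dir, stat.S_IRWXU)
    return stat.S_IMODE(os.stat(hg_dir).st_mode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python lock_hg.py /home/user_home/public_html/tweet4blood.com
    print(oct(lock_down_hg(sys.argv[1])))
```

Because the web server cannot even enter the directory, the modes of the files inside .hg no longer matter.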

STEP 2: On your desktop

  1. Install Fabric, which provides the fab command that runs fabfiles. On Ubuntu, search for fabric in the Synaptic Package Manager
  2. Create your fabfile. You can find the latest version of the example fabfile in my snippets project
  3. To call any method in the fabfile:

    $ cd /home/thej/my_deployment_scripts/tweet4blood/
    $ fab hello
  4. Fabric can also chain calls:

    $ cd /home/thej/my_deployment_scripts/tweet4blood/
    $ fab test hello

    Here it calls the test method first, which sets the env variables, and then calls hello.
  5. None of the env variables are required, but setting env.user, env.hosts and env.password saves you from typing them every time
  6. Note that env.user, env.hosts and env.password are those of the SERVER machine
  7. To deploy the latest version to test:

    $ cd /home/thej/my_deployment_scripts/tweet4blood/
    $ fab test deploy:tip

    In this case,

    • the test method sets the env variables for the TEST environment
    • the test method also sets the environment-specific consumer_key, which we later use to set up config.php; similarly you can define database_name, database_user_name, etc.
    • then, following the chain, the deploy method is called with the input variable version, whose value is now “tip”
    • inside deploy, the first call is cd (change directory) on the remote server
    • at this point fab logs into the remote server using env.user, env.hosts, env.password
    • then control goes to the repo directory
    • runs hg pull, which pulls the new changesets from Bitbucket
    • runs hg update -C tip, a clean update to the “tip” version that discards any local changes
    • then CDs into the auth folder
    • uses the Linux sed command to replace the environment-specific values in config.php
  8. To deploy any other version to test: I usually tag versions, so I pass the tag name

    $ cd /home/thej/my_deployment_scripts/tweet4blood/
    $ fab test deploy:v.0.1.0
  9. In case you want to shut down Apache before the deployment and restart it afterwards, you can chain those calls too:
    $ cd /home/thej/my_deployment_scripts/tweet4blood/
    $ fab test apache_stop deploy:tip apache_start
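The fabfile itself is not reproduced here, so the following is only a sketch of the remote commands that the `fab test deploy:tip` walk-through in step 7 describes, written as plain Python so it can be read and tested without a server. Everything specific is hypothetical: the `ENVIRONMENTS` dict stands in for what the `test` task puts on Fabric's `env`, and `CONSUMER_KEY_PLACEHOLDER` is an assumed marker inside config.php. The sed call assumes GNU sed on the server.

```python
try:
    from shlex import quote  # Python 3.3+
except ImportError:
    from pipes import quote  # Python 2, as used by Fabric 1.x

# Hypothetical per-environment settings, standing in for what `test` sets on env
ENVIRONMENTS = {
    "test": {
        "repo_dir": "/home/user_home/public_html/tweet4blood.com",
        "consumer_key": "test-consumer-key",
    },
}

# Hypothetical marker assumed to appear in auth/config.php
PLACEHOLDER = "CONSUMER_KEY_PLACEHOLDER"


def sed_replace_command(placeholder, value, filename):
    """Build a GNU sed command that replaces placeholder with value in place.

    Assumes placeholder is a plain token (letters, digits, underscores)
    and that value contains no newlines.
    """
    # Escape backslash first, then the characters special in the s/// replacement
    escaped = value.replace("\\", "\\\\").replace("/", "\\/").replace("&", "\\&")
    expression = "s/{0}/{1}/g".format(placeholder, escaped)
    return "sed -i {0} {1}".format(quote(expression), quote(filename))


def deploy_steps(env_name, version):
    """Return (remote_directory, command) pairs for `fab <env> deploy:<version>`."""
    settings = ENVIRONMENTS[env_name]
    repo_dir = settings["repo_dir"]
    config_cmd = sed_replace_command(PLACEHOLDER, settings["consumer_key"], "config.php")
    return [
        # Fetch new changesets from Bitbucket into the server's repository
        (repo_dir, "hg pull"),
        # Clean update to the requested tag or "tip", discarding local edits
        (repo_dir, "hg update -C " + quote(version)),
        # Write the environment-specific value into auth/config.php
        (repo_dir + "/auth", config_cmd),
    ]


if __name__ == "__main__":
    for directory, command in deploy_steps("test", "tip"):
        print("cd {0} && {1}".format(directory, command))
```

In a Fabric 1.x fabfile, a `deploy(version)` task could loop over these pairs and run each one inside `with cd(directory): run(command)`, which mirrors the cd, hg pull, hg update -C and sed sequence in the list. Escaping the replacement value matters because a `/` or `&` in a key would otherwise break or silently change the sed expression.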

STEP 3: Go deploy
Below I have embedded a rough version of fabfile.py for quick reference. But as I said, you can find the latest version in my snippets.

Questions and suggestions are welcome.

Audio Books and the Pain of DRM

Posted by Thejesh GN on January 19, 2012

Since last December I have been listening to books rather than reading them. It makes my daily commute enjoyable. The last three books have been Steve Jobs, Art of Inception and Ghost in the Wires. That's three books in one and a half months, not bad. I used to listen to a lot of podcasts while driving; I have cut down on podcasts and shifted to audio books.

I have a Platinum account on Audible, and that plan gives me 2 books a month. It works great, except that the books are DRMed and won't work on Linux. I need to figure out a way to export the audio so I can save the books for later use. Audible does allow backing up to CDs, though. CDs, for heaven's sake! Who uses CDs for backups anymore? And iTunes is a PITA.

I am looking for good software that can remove the DRM from the books for my own personal use. I found this list of tools but I am not sure about them yet. I would love to pay for the software. HELP.

This is such a good example of DRM troubling a god-fearing, paying, non-pirate customer. All I want to do is back up what I have bought, and it shouldn't be such a pain for a paying customer.
