One of my primary jobs as a Data Archivist at DataMeet is to download and archive data from the internet, mostly from government websites. I usually use Python scripts to download, scrape, and clean the data. But sometimes I just need to download many files and store them. I could still use Python, but it's overkill. So here are some of the methods I use.
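For context, the pure-Python route looks roughly like this. This is a minimal sketch using only the standard library; the function name, URL list, and output directory are placeholders, not anything from the original post:

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

def download_all(urls, dest_dir):
    """Fetch each URL and save it under dest_dir, named after the URL path."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Derive a filename from the last component of the URL path.
        name = os.path.basename(urlparse(url).path) or "index.html"
        path = os.path.join(dest_dir, name)
        with urlopen(url) as resp, open(path, "wb") as out:
            out.write(resp.read())
        saved.append(path)
    return saved
```

Even this small amount of ceremony is what makes a dedicated downloader like `wget` or `curl` the lighter choice for plain bulk downloads.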
As you might know, I scrape a lot of web pages as a Data Archivist at DataMeet. I usually use BS4 for this, and it's beautiful, simple, and it works. But I often don't want to write a Python script for that, and I need a simple tool to get data out of HTML.
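When a script is warranted, the extraction itself is only a few lines. Here is a minimal sketch using the standard library's `html.parser` instead of BS4 (the same idea, with no dependency), pulling every link target out of a page; the class and function names are illustrative, not from the original post:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

With BS4 the equivalent is shorter still: `[a["href"] for a in soup.find_all("a", href=True)]`.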
UDISE+ has the most detailed datasets about all aspects of the elementary education system in India, collecting data from 15 Lakh+ schools across the country. It also has data aggregated at the level of educational blocks and educational districts. This post is about those educational blocks and districts. Unified District Information System for Education (UDISE)...