One of my primary jobs as a Data Archivist at DataMeet is to download and archive data from the internet, mostly from government websites. I usually use Python scripts to download, scrape, and clean the data. But sometimes I just need to download many files and store them. I could still use Python, but it's overkill. So here are some of the methods that I use.
The most common use of Jinja2 is in web applications, where it is used to create HTML files from template files. But I have used it outside web applications too.
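As a quick illustration of Jinja2 outside a web application, here is a minimal sketch that renders a plain-text (Markdown-style) list instead of HTML. The `datasets` records and the example.org URLs are hypothetical, chosen only for the demo.

```python
from jinja2 import Environment

# trim_blocks/lstrip_blocks stop the {% ... %} tags from leaving blank lines
env = Environment(trim_blocks=True, lstrip_blocks=True)

# Hypothetical template: a plain-text index of datasets, not an HTML page
template = env.from_string(
    "# Datasets\n"
    "{% for ds in datasets %}\n"
    "- {{ ds.name }}: {{ ds.url }}\n"
    "{% endfor %}\n"
)

# Hypothetical records; for dicts, Jinja2 resolves ds.name as ds["name"]
datasets = [
    {"name": "census", "url": "https://example.org/census.csv"},
    {"name": "rainfall", "url": "https://example.org/rainfall.csv"},
]

output = template.render(datasets=datasets)
print(output)
```

The same approach works for any text format: a config file, a CSV, or a shell script can be generated from a template in exactly the same way.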
I wanted to write 100 posts in 2021, and I am nowhere close. So I looked at my post counts by year to see how I have performed over the years. Of course, I could have done that manually by looking at the yearly archive counts or by running a query on the database. But I have recently started using Xidel, so why not use it? :)
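For comparison, the "query on the database" route might look like the sketch below. It assumes a hypothetical `posts` table with a `published` date column, built in an in-memory SQLite database; a real blog database will have its own schema and table names.

```python
import sqlite3


def build_demo_db():
    """Hypothetical posts table; a real blog database uses its own schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (title TEXT, published TEXT)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [
            ("First post", "2020-05-01 10:00:00"),
            ("Second post", "2021-02-14 09:30:00"),
            ("Third post", "2021-11-30 18:45:00"),
        ],
    )
    return conn


def posts_per_year(conn):
    """Count posts per year, using SQLite's strftime() to extract the year."""
    rows = conn.execute(
        "SELECT strftime('%Y', published) AS year, COUNT(*) "
        "FROM posts GROUP BY year ORDER BY year"
    ).fetchall()
    return {int(year): count for year, count in rows}


counts = posts_per_year(build_demo_db())
print(counts)  # {2020: 1, 2021: 2}
print("Posts still needed for 2021:", 100 - counts.get(2021, 0))
```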
It's in my nature to convert everything into a web service. The biggest reason is that a web service can be called from anywhere, which makes it easy to share the processing power and logic with anyone who has access to a web browser or curl and nothing else.
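To make the idea concrete, here is a minimal sketch using only Python's standard library (`http.server`). The word-count endpoint, the `WordCountHandler` name, and port 8000 are all hypothetical; the point is only that once logic sits behind HTTP, a browser or curl is enough to use it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


class WordCountHandler(BaseHTTPRequestHandler):
    """Hypothetical service: GET /?text=... returns the word count as JSON."""

    def do_GET(self):
        # parse_qs decodes '+' and %XX escapes in the query string
        query = parse_qs(urlparse(self.path).query)
        text = query.get("text", [""])[0]
        body = json.dumps({"words": len(text.split())}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(host="127.0.0.1", port=8000):
    """Build (but do not start) the HTTP server."""
    return HTTPServer((host, port), WordCountHandler)


def main():
    # Blocks until interrupted with Ctrl+C
    make_server().serve_forever()
```

After calling `main()`, `curl "http://127.0.0.1:8000/?text=hello+world"` should return `{"words": 2}`. The Python documentation does not recommend `http.server` for production, so treat this as a way to prototype a service, not to host one.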
As you may know, I scrape a lot of web pages as a Data Archivist at DataMeet. I usually use BS4 (Beautiful Soup) for this, and it's beautiful, simple, and it works. But often I don't want to write a Python script just for that, and I need a simple tool to get data out of HTML.
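For reference, this is the kind of small BS4 script such a tool replaces: a sketch that pulls PDF links out of a page. The sample HTML and the `pdf_links` helper are hypothetical, and it uses the built-in `"html.parser"` so no extra parser such as lxml is required.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a downloaded government page
SAMPLE_HTML = """
<ul>
  <li><a href="/reports/annual-2020.pdf">Annual Report 2020</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="/reports/annual-2021.PDF"> Annual Report 2021 </a></li>
</ul>
"""


def pdf_links(html):
    """Return (link text, href) pairs for every <a> that points to a PDF."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)  # only anchors with an href
        if a["href"].lower().endswith(".pdf")  # case-insensitive match
    ]


for text, href in pdf_links(SAMPLE_HTML):
    print(text, href)
```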