I maintain the readlist in a CouchDB database. Each feed (channel) is a document in that database. I use the file’s name as the primary key, i.e., “_id”. For example, “sri-lankas-economic-crisis.json” is the key to the Sri Lankan Economic Crisis reading list. It’s a single document, and it has many feed items like any JSON Feed. The first few were easy to create and manage. But if I was going to be serious about using it, I needed something simple to manage them.
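A minimal sketch of what one such document might look like, built in Python. Only the “_id” convention (the file’s name) and the JSON Feed shape come from the description above. The helper names `make_feed_doc` and `add_item`, the example URL, and the choice to reuse the URL as the item id are assumptions for illustration, not the actual tooling.

```python
import json

# Version string defined by the JSON Feed 1.1 specification.
JSONFEED_VERSION = "https://jsonfeed.org/version/1.1"


def make_feed_doc(filename, title):
    """Build an empty feed document keyed by its file name (hypothetical helper)."""
    return {
        "_id": filename,  # CouchDB primary key: the file's name
        "version": JSONFEED_VERSION,
        "title": title,
        "items": [],
    }


def add_item(doc, url, title):
    """Append one reading-list entry (hypothetical helper).

    Assumption: the URL doubles as the item's id.
    """
    doc["items"].append({"id": url, "url": url, "title": title})
    return doc


doc = make_feed_doc("sri-lankas-economic-crisis.json", "Sri Lankan Economic Crisis")
add_item(doc, "https://example.com/article", "An example article")

# The resulting dict is what would be stored as a single CouchDB document.
print(json.dumps(doc, indent=2))
```

Since the whole feed lives in one document, adding an item means updating that document rather than creating a new one.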
One of my primary jobs as a Data Archivist at DataMeet is to download and archive data from the internet, mostly from government websites. I usually use Python scripts to download, scrape and clean the data. But sometimes, I just need to download many files and store them. I could still use Python, but it’s overkill. So here are some of the methods that I use.
The most common use of Jinja2 is in web applications, where it is used to create HTML files from template files. But I have used it outside web applications too.
Email is one of those protocols that still works. It reaches me where I am. It also has various clients and libraries that work reasonably well. And it can carry almost any kind of content (of course, there are limitations on attachment size, type, etc.). I use it every day, including to email myself.
I wanted to write 100 posts in 2021, and I am nowhere close to that. I tried to look at the posts by year and see how I have performed over the years. Of course, I could have done that manually by looking at the year archive count or running a query on the database. But recently, I have started using Xidel, so why not use it? :)
As you may know, I scrape a lot of web pages as a Data Archivist at DataMeet. I usually use BS4 for this, and it’s beautiful, simple, and works. But often I don’t want to write a Python script to do that, and I need a simple tool to get data out of HTML.