Building a Web API around a CLI Tool
It's in my nature to convert everything into a web service. The biggest reason is that you can call a web service from anywhere, and it's easy to share the processing power and logic with anyone who has nothing more than a web browser or curl.
I scrape a lot, but sometimes only for a single number. And usually, I track that number over time. Examples are the total number of vaccinations given in India or the number of subscribers to my blog.
So basically, the flow is: download a page, run a CSS or XPath selector on it, and store the result in a database.
Usually, I run xidel and pipe its output to curl to insert a record into a CouchDB database.
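A minimal sketch of that pipeline, not my exact script. The CouchDB URL, the "numbers" database, the document fields, and both function names are assumptions made up for illustration. It also assumes the selector yields a single non-negative number and that the metric name contains no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Hypothetical: COUCH_URL, the "numbers" database, and both function names.
COUCH_URL="${COUCH_URL:-http://localhost:5984/numbers}"

# Build a small JSON document from a metric name and a scraped value.
# Everything except digits is dropped, so "54,656,907" becomes 54656907.
build_doc() {
  local name="$1" raw="$2" digits
  digits="$(printf '%s' "$raw" | tr -cd '0-9')"
  printf '{"name":"%s","value":%s,"fetched_at":"%s"}' \
    "$name" "${digits:-null}" "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Scrape one value with xidel and POST it to CouchDB as a new document.
track_number() {
  local name="$1" url="$2" xpath="$3" raw
  raw="$(xidel "$url" --xpath="$xpath" --silent)" || return 1
  build_doc "$name" "$raw" | curl --silent --request POST \
    --url "$COUCH_URL" \
    --header 'Content-Type: application/json' \
    --data @-
}

# Example call (needs network access, xidel, and a running CouchDB):
# track_number wikipedia-pages https://en.wikipedia.org/wiki/Special:Statistics \
#   "//tr[@class='mw-statistics-pages']/td[2]"
```

Keeping the document building separate from the scraping makes it possible to check the JSON without touching the network.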
But it should be easy to turn this into a web service that runs on a remote machine (I miss webnumbr). I didn't want to write that service myself, so I was looking for a simple hack to convert a CLI into a web service. There are many options available, but I found beefsack/webify to be the simplest and most useful one. So I converted xidel into a web service by running:
# This will start a server on port 8080
webify bash -c "xargs xidel"
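As far as I understand webify, it hands the request body to the command on stdin and returns whatever the command prints. That means xargs is what turns the body into separate arguments for xidel. The sketch below swaps xidel for printf to show that splitting; show_args is a made-up helper name and example.com is a placeholder URL.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each argument xargs would hand to the
# wrapped command, one per line, instead of actually running xidel.
show_args() {
  xargs printf '%s\n'
}

# Example:
# printf '%s' "https://example.com --xpath=\"//div[@class='x']\" --silent" | show_args
```

xargs splits on whitespace and removes the outer double quotes, so the XPath expression stays in one argument with its single quotes intact. It also means every word in the body reaches xidel as an option, which is worth remembering before exposing the service to anyone else.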
Then you can send the xidel parameters as the body of a POST request, and the service responds with the result. Here I am getting the total number of vaccinations in India from the service.
curl --request POST \
--url http://localhost:8080/ \
--header 'Content-Type: text/plain' \
--data 'https://www.mygov.in/covid-19 --xpath="//div[@class='\''total-vcount'\'']/strong/text()" --silent --output-format=json-wrapped'
When I run it, I get the answer.
thej@uma:~/Downloads$ curl --request POST \
> --url http://localhost:8080/ \
> --header 'Content-Type: text/plain' \
> --data 'https://www.mygov.in/covid-19 --xpath="//div[@class='\''total-vcount'\'']/strong/text()" --silent --output-format=json-wrapped'
[
"1,15,79,69,274"
]
Here is another example, where I get the total number of pages on English Wikipedia:
thej@uma:~/Downloads$ curl --request POST \
> --url http://localhost:8080/ \
> --header 'Content-Type: text/plain' \
> --data 'https://en.wikipedia.org/wiki/Special:Statistics --xpath="//tr[@class='\''mw-statistics-pages'\'']/td[2]" --silent --output-format=json-wrapped'
[
"54,656,907"
]
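The replies above are JSON arrays holding a formatted string, so tracking the number over time needs one more step to turn it into a plain integer. Here is a small sketch of a client for that. XIDEL_SERVICE and both function names are assumptions, and it only makes sense when the reply contains exactly one number.

```shell
#!/usr/bin/env bash
# Hypothetical: XIDEL_SERVICE and both function names.
XIDEL_SERVICE="${XIDEL_SERVICE:-http://localhost:8080/}"

# Send a xidel argument string to the service and print the raw reply.
ask_xidel() {
  curl --silent --request POST \
    --url "$XIDEL_SERVICE" \
    --header 'Content-Type: text/plain' \
    --data "$1"
}

# Keep only the digits of whatever arrives on stdin, then end the line.
to_integer() {
  tr -cd '0-9'
  echo
}

# Example (needs the service from above to be running):
# ask_xidel 'https://en.wikipedia.org/wiki/Special:Statistics --xpath="//tr[@class='\''mw-statistics-pages'\'']/td[2]" --silent --output-format=json-wrapped' | to_integer
```

Dropping everything except digits handles both the Indian grouping (1,15,79,69,274) and the Western one (54,656,907) without caring where the commas sit.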
I have not connected this to CouchDB (for saving data) yet. I think this service is useful and practical as an independent, stateless service.
Currently, it's running inside a Docker container. It's not a public web service because I am not sure about its security and I am worried about abuse. But it's an excellent private service that I use daily for getting numbers from the web.
There are many scenarios where this can come in handy: an Android widget showing a number live, grabbing a part of a web page without any hassle, or a simple replacement for good old webnumbr.