Archive any blog/website onto Web using archive-it

by Thejesh GN · October 2, 2014

They say the Internet never forgets. But link decay is very common on the web: what is visible today can vanish tomorrow, links become inaccessible, and the result is link rot. Services like the Internet Archive's Wayback Machine and archive.today keep copies of web pages so that the content behind rotten links can still be found later. So I thought I would create a service that backs up (not in the traditional sense of backing up for recovery) a whole blog onto one of these services, so future readers can read the content even if the blog itself is dead. As a bonus, you can see how your pages changed over time.

The service is very simple. It's a Python script that accepts the URL of a sitemap, so the site needs to support the Sitemaps protocol. It goes through every link in the sitemap and submits it to the archival service of your choice. As of now it supports archive.org and archive.today. It gives you a report at the end. You can also go to the archival service and check whether all the pages have been archived. For example, you can check all the pages of thejeshgn.com archived on archive.today.
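To make the flow concrete, here is a minimal sketch of the same idea using only the Python standard library. It is not the actual code in archive-it. The function names, the report format, and the user agent string are made up for illustration, and it assumes the Wayback Machine captures a page when you request https://web.archive.org/save/ followed by the page URL. It only covers archive.org.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Assumption: requesting this prefix + page URL triggers a Wayback Machine capture.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def extract_urls(sitemap_xml):
    """Return every <loc> value found in a sitemap document.

    Note: a sitemap index also uses <loc> for its child sitemaps;
    this sketch does not follow those.
    """
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def submit(url, timeout=60):
    """Ask the archive to capture one page; return the HTTP status or an error string."""
    request = urllib.request.Request(
        SAVE_ENDPOINT + url,
        headers={"User-Agent": "sitemap-archiver-sketch"},  # hypothetical name
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except Exception as exc:  # keep going so one bad page does not stop the run
        return "failed: %s" % exc


def archive_sitemap(sitemap_url, report_path):
    """Submit every page listed in the sitemap and write a tab-separated report."""
    with urllib.request.urlopen(sitemap_url, timeout=60) as response:
        urls = extract_urls(response.read())
    with open(report_path, "w") as report:
        for url in urls:
            report.write("%s\t%s\n" % (url, submit(url)))
    return len(urls)
```

Calling archive_sitemap("https://thejeshgn.com/sitemap.xml", "output.txt") would walk the sitemap and write one line per page. A real run should probably pause between submissions so it does not hammer the archival service.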

It's very simple to run:

$ git clone https://github.com/thejeshgn/archive-it
$ cd archive-it
$ python archive.py -u https://thejeshgn.com/sitemap.xml -o output.txt -s archiveis

Go ahead, back up your blog to the web.
