Archive any blog/website onto the web using archive-it

They say the Internet never forgets. But link decay is very common on the web: things that are visible today vanish tomorrow, links become inaccessible, and the result is link rot. Services like the Internet Archive's Wayback Machine and archive.today keep copies of web pages, so the content behind rotten links can still be found later. So I thought I would create a service that backs up a whole blog (not in the traditional sense of backing up for recovery) onto one of these services, so future readers can read the content even after the blog itself is dead. As a bonus, you can see how your pages changed over time.

The service is very simple. It's a Python script that accepts the URL of a sitemap, for any site that supports the sitemap protocol. It then goes through every link in the sitemap and submits each one to the archival service of your choice. As of now it supports archive.org and archive.today. It gives you a report at the end. You can also go to the archival service and check whether all the pages have been archived. For example, you can check all the pages of thejeshgn.com archived on archive.today.
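To make the flow above concrete, here is a minimal sketch of the same idea using only the Python standard library. It is not the code in archive.py. The function names are made up for illustration, and the Wayback Machine "save" URL prefix is an assumption about how a page can be submitted to archive.org.

```python
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Assumption: requesting this prefix followed by a page URL asks the
# Wayback Machine to capture that page.
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"


def extract_urls(sitemap_xml):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def save_request_url(page_url):
    """Build the (assumed) submission URL for a single page."""
    return WAYBACK_SAVE_PREFIX + page_url


def archive_sitemap(sitemap_url, timeout=60):
    """Submit every page listed in the sitemap and return a simple report.

    The report maps each page URL to an HTTP status code on success,
    or to an error message if the request failed.
    """
    with urllib.request.urlopen(sitemap_url, timeout=timeout) as resp:
        pages = extract_urls(resp.read())

    report = {}
    for page in pages:
        try:
            with urllib.request.urlopen(save_request_url(page), timeout=timeout) as resp:
                report[page] = resp.status
        except OSError as exc:  # URLError and HTTPError both derive from OSError
            report[page] = str(exc)
    return report
```

Calling archive_sitemap("https://thejeshgn.com/sitemap.xml") would perform the real network requests. The sketch does not follow sitemap index files (a sitemap that lists other sitemaps), and it does not wait between requests, which a real run against a public service probably should.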


It's very simple to run:

$ git clone https://github.com/thejeshgn/archive-it
$ cd archive-it
$ python archive.py -u https://thejeshgn.com/sitemap.xml -o output.txt -s archiveis

Go ahead, back up your blog to the web.
