You could be committing a crime by Scraping or Stumbling

For most of us who work with data, scraping is the default way to get data off the public internet. To give you an example of what that means, let's say you have a published URL like

which gives you the postboxes in the pincode area 571120. If you are smart, you will replace 571120 with 560100 to get the postboxes for the area 560100. But I, as the owner of OpenBangalore, didn't want you to access 560100, so it was "hidden" and not linked from anywhere. But then I was lame enough not to password-protect it. I published it and assumed no one else would get access to it. So when you accessed the 560100 page without my explicit permission, I assume you broke the law.
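To make the "smart" step concrete, here is a minimal Python sketch of that URL guessing. The original OpenBangalore URL is not reproduced in this post, so `https://example.org/postboxes/571120` is a hypothetical stand-in, and `swap_pincode` is a hypothetical helper. It only builds the new URL; it does not fetch anything.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL: the real OpenBangalore address is not shown in the post.
BASE_URL = "https://example.org/postboxes/571120"


def swap_pincode(url: str, old: str, new: str) -> str:
    """Return url with any path segment equal to `old` replaced by `new`."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Replace only whole path segments, so "5711201" would not be touched.
    segments = [new if seg == old else seg for seg in segments]
    # Query string and fragment are carried over unchanged.
    return urlunsplit(parts._replace(path="/".join(segments)))


print(swap_pincode(BASE_URL, "571120", "560100"))
# prints https://example.org/postboxes/560100
```

That one-line substitution is all it takes to land on a page nobody linked to, which is why "hidden but not protected" offers no real protection.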

This is exactly what happened to Weev, who accessed pages on AT&T servers that AT&T had published but hidden from public view. Like any curious hacker, Weev recognized the pattern in the URL and exposed the security flaw through Gawker Media after AT&T had been notified. Unfortunately, AT&T is too big a company to quietly take this kind of embarrassment. Also, the leak exposed the personal data of 114,000 iPad users, including that of celebrities and of people in government and the military. Hence the prosecution.

As I expected, yesterday, November 20, 2012, Weev was found guilty of one count of identity fraud and one count of conspiracy to access a computer (a public web server) without authorization. It's the price one pays for embarrassing the rich and powerful.

But the worst part is that this could make many genuine actions on the internet illegal. What if I am using StumbleUpon and land on one of those pages I am not supposed to see? Or what if I am a data enthusiast scraping a published public website using simple URL logic? Is there a chance of getting sued for these actions? I am sure there is.

End Note 1: I kind of expected Apple/iPad owners to sue AT&T for negligence. Strange that it didn't happen.

End Note 2: I hereby permit you to read, link to and distribute this article. You are also more than welcome to scrape OpenBangalore. In fact, if you are interested, we can work on that together.

End Note 3: Indian IT laws are not very different. I am sure the same would have happened in India.

1 Response

  1. JB says:

    Thanks for the info, Thej. Do similar restrictions apply to web crawlers and search engines?

    What if Google returns the 560100 link from your example as a search result or under “sites similar to your search”?