SQLite – Open Format for Open Data
I scraped Gujarat rainfall data recently. The end result was 6 lakh (600,000) rows of structured data. I released the data as an SQLite file instead of, say, a bunch of CSV or Excel files. It kind of surprised some of my friends, but I think it’s a great file format for releasing open data.
- Database contained in a single file
- Easy to send, store and share. It’s one file after all
- Like any file, you can compress and encrypt it with standard tools
- Control read/write using file read/write permissions
- Size of the database is limited only by how large a file your filesystem and hardware allow, which in practice is virtually unlimited
- Simple UI-based software available to explore it. Just the way you open .xls files with Excel, you can open .sqlite files with software like DB Browser for SQLite. It works on Windows, Mac and Linux
- Apps available for Android and iOS
- Both UI and command-line tools available for Linux, Mac and Windows
- Browser-based tools available like SQLite Manager for Firefox
- As fast as your disk’s read/write speed
- Supports standard SQL
- Supports standard SQL data types
- Supports export and import to and from other formats (like CSV, TSV)
- Client libraries available for virtually every programming language (see the sketch after this list)
- Transparent encryption support through plugins
- GIS support through plugins
- Best of all, it’s public-domain software; you can do whatever you want with it, and it’s going to stay around forever
- The SQLite database file format itself is open
- Above all, SQLite just works
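To make a couple of these points concrete, here is a minimal sketch using Python’s built-in sqlite3 module to import a CSV and query it with plain SQL. The file, table, and column names are made up for illustration.

```python
import csv
import sqlite3

# Open (or create) the database file. The name is hypothetical.
conn = sqlite3.connect("gujarat_rainfall.sqlite")
cur = conn.cursor()

# Import a CSV into a table (table and column names are assumptions).
cur.execute("CREATE TABLE IF NOT EXISTS rainfall (district TEXT, date TEXT, mm REAL)")
with open("rainfall.csv", newline="") as f:
    rows = ((r["district"], r["date"], r["mm"]) for r in csv.DictReader(f))
    cur.executemany("INSERT INTO rainfall VALUES (?, ?, ?)", rows)
conn.commit()

# Query with standard SQL.
for district, total in cur.execute(
    "SELECT district, SUM(mm) FROM rainfall GROUP BY district ORDER BY 2 DESC LIMIT 5"
):
    print(district, total)

conn.close()
```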
So from here on my default file format for Open Data is SQLite.
I believe Firefox uses SQLite internally… if I am not grossly wrong… correct me.
Both Firefox and Chrome use SQLite internally.
Okies,
Firstly, SQLite is unbeatable. Secondly, I think it’s great to release data in that format too, because you can get any other format out of SQLite.
I have had great experiences with SQLite, especially when I have to deal with large XMLs. I never quite understood XML as a data exchange format. It’s a pain to deal with XML parsers because something or the other is always broken with the parsers in nearly every language I have seen. It’s immensely difficult to query XML with standard tools like awk/sed/perl, and often the data size makes it difficult to work with.
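One way to tame a large XML is to stream it into SQLite once and query it with SQL from then on. Here is a rough sketch using only Python’s standard library; the file name and the record layout are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stream a large XML file into SQLite without loading it all into memory.
# The file name and the <record>/<name>/<value> layout are hypothetical.
conn = sqlite3.connect("records.sqlite")
conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT, name TEXT, value TEXT)")

with conn:  # commits on success, rolls back on error
    for event, elem in ET.iterparse("big_dump.xml", events=("end",)):
        if elem.tag == "record":
            conn.execute(
                "INSERT INTO records VALUES (?, ?, ?)",
                (elem.get("id"), elem.findtext("name"), elem.findtext("value")),
            )
            elem.clear()  # free the element's memory as we go

conn.close()
```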
It makes a great tool for analytics of all kinds. The most I’ve pushed so far is 15 million records, imported straight from an XML; the database size was around 3 GB. With 15 million records my queries were a little slow. The performance degraded, but I’m told use of R*Trees (http://www.sqlite.org/rtree.html) can help increase performance a great deal.
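For what it’s worth, the R*Tree module helps specifically with range queries over numeric intervals (bounding boxes, coordinates, time ranges). A minimal sketch, with hypothetical table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The R*Tree module is compiled into most SQLite builds; this raises
# sqlite3.OperationalError if yours was built without it.
conn.execute(
    "CREATE VIRTUAL TABLE stations_idx USING rtree(id, min_lat, max_lat, min_lon, max_lon)"
)

# Index a (made-up) station as a degenerate bounding box: a single point.
conn.execute("INSERT INTO stations_idx VALUES (1, 22.3, 22.3, 70.8, 70.8)")

# Range query: which stations fall inside this lat/lon window?
rows = conn.execute(
    "SELECT id FROM stations_idx"
    " WHERE min_lat >= 21.0 AND max_lat <= 23.0"
    " AND min_lon >= 70.0 AND max_lon <= 72.0"
).fetchall()
print(rows)  # -> [(1,)]

conn.close()
```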
But I’ve also read (on Hacker News) of db file sizes hitting upwards of tens of GBs without much performance degradation.
Lastly, what I need more of is text-processing goodies. These days I have to get data into SQLite and then run heavy regular expressions through Perl to mine data out of fields.
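One way to cut out that round trip is to push the regex into SQLite itself. SQLite defines a REGEXP operator but ships no implementation for it; Python’s sqlite3 lets you supply one with create_function. A sketch, with a made-up table and column:

```python
import re
import sqlite3

conn = sqlite3.connect("records.sqlite")

# Supply the implementation behind SQLite's built-in REGEXP operator.
# SQLite calls it as regexp(pattern, value).
conn.create_function(
    "REGEXP",
    2,
    lambda pattern, value: value is not None
    and re.search(pattern, value) is not None,
)

# Now regular expressions work straight from SQL.
# The table and column here are hypothetical.
for (name,) in conn.execute(
    "SELECT name FROM records WHERE name REGEXP ?", (r"rain.*\d{4}",)
):
    print(name)

conn.close()
```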
SQLite can be a great tool in the times to come.