opendata.json – Format for making Open Data Discoverable

It's a pain to search for Open Data on the web.

I publish quite a bit of data, as you can see on OpenBangalore. The data is in different formats and lives at different URLs. There is no easy way to find it other than going through the list. You can't find it by source, copyright information, contact information, etc.

It's difficult for me to find even my own data. Google search helps only to a certain extent; ultimately it depends on human search skills or on going through catalogs and listings. It shouldn't be that difficult. Also, keeping a centralized catalog up to date is hard and doesn't scale.

There are ways this has been handled on the web. XML sitemaps are one example: the sitemap gives a search engine a list of pages (with other data) to crawl and index. Another example is API JSON, where an API creator publishes information about their APIs in the form of api.json, and aggregators can use it to build their listings. Both are similar and proven models. Why not use the same model for open data?

So please welcome
opendata.json - Format for making your open data discoverable.

What am I working on?

  1. Format specification: The one above is an example of opendata.json. I have yet to write a detailed specification, but I have started on it and will keep you updated. Writing example opendata.json files lets me face all the challenges that an end user (publisher) will face, which helps me write the specification.
  2. Aggregator/Search engine: A FOSS-based aggregator/search engine. I will implement ping and search functionality as part of v0.1.
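
To make the aggregator idea concrete, here is a minimal sketch of the search half. The specification is not written yet, so the field names "title", "format", "url" and "source", the example.org URLs, and the assumption that an opendata.json file is a JSON list of entries are all mine and purely illustrative. The PUBLISHED dictionary stands in for files that publishers would have pinged to the aggregator.

```python
import json

# Hypothetical opendata.json payloads, keyed by where each file was published.
# Field names and the list-of-entries layout are assumptions for this sketch,
# not the specification.
PUBLISHED = {
    "https://example.org/opendata.json": """
    [
      {"title": "Daily rainfall", "format": "csv",
       "url": "https://example.org/data/rainfall.csv",
       "source": "Example Met Office"},
      {"title": "Ward boundaries", "format": "geojson",
       "url": "https://example.org/data/wards.geojson",
       "source": "Example City Council"}
    ]
    """,
}

def build_index(published):
    """Parse every opendata.json payload into one flat list of entries."""
    index = []
    for location, text in published.items():
        for entry in json.loads(text):
            # Remember which opendata.json file listed this entry.
            index.append({**entry, "listed_in": location})
    return index

def search(index, **criteria):
    """Return entries whose fields equal every criterion, ignoring case."""
    def matches(entry):
        return all(
            str(entry.get(field, "")).lower() == str(wanted).lower()
            for field, wanted in criteria.items()
        )
    return [entry for entry in index if matches(entry)]

index = build_index(PUBLISHED)
csv_hits = search(index, format="csv")
```

Searching by source or any other field works the same way, for example search(index, source="Example City Council"). A real aggregator would fetch the files over HTTP and would likely need fuzzier matching than exact equality.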

Why not RDF?
- I wanted it to be extremely simple and developer friendly. I thought JSON was the best format for that.

How about metadata of actual data?
- It's much more complicated. We could probably keep the metadata in a separate JSON file to simplify and decouple the two. I am not working on it now. Let me know if you are interested in it.

I am not a developer or publisher of data, how does it help me?
- You can find open data much more easily. You are an indirect user of this protocol, but the end goal is to make your search easy.

What about time?
- Time is represented in ISO 8601 format. The full format is YYYY-MM-DDTHH:MM:SS+ZZ:ZZ, but you can always specify only the period that makes sense: for example, only YYYY-MM for a month, or YYYY for a year. That said, you can't have just MM; it has to be YYYY-MM. You get the idea?
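
The partial-period rule can be sketched as a small validator. This is only an illustration: the function name is_valid_period is hypothetical, and I am assuming that the full form always carries a timezone offset and that a day-level value (YYYY-MM-DD) is also acceptable. It checks the shape of the string only, not calendar validity (it would accept 2014-02-31).

```python
import re

# Each finer-grained part is allowed only when the coarser parts precede it,
# so a bare month such as "11" is rejected while "2014-11" is accepted.
_PERIOD = re.compile(
    r"\d{4}"                                   # YYYY
    r"(?:-(?:0[1-9]|1[0-2])"                   # -MM
    r"(?:-(?:0[1-9]|[12]\d|3[01])"             # -DD
    r"(?:T(?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d"   # THH:MM:SS
    r"[+-](?:[01]\d|2[0-3]):[0-5]\d"           # +ZZ:ZZ offset (assumed required)
    r")?)?)?"
)

def is_valid_period(value: str) -> bool:
    """Return True if value is YYYY, YYYY-MM, YYYY-MM-DD or the full form."""
    return _PERIOD.fullmatch(value) is not None
```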

What about location?
- I know it's important. Just like time, location is an important dimension on which users would want to search the data. But I have not thought it through yet. I will keep you updated. If you have ideas, please share.

Can a shape file be discoverable through it?
- Yes, though I would like to see more open formats in the "format" tag. We can't avoid shape files just yet.

We don't publish static data but we have an API, so what do we do?
- I am not sure yet. But how about format="api", with url pointing to an api.json file? That way we won't be inventing anything new, but we will still support open data APIs. We certainly need to think more about this.
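
If that idea holds, an aggregator could branch on the format value roughly as follows. This is a sketch of the proposal, not a decided design; resolve_entry and the keys in the returned dictionaries are hypothetical names, and entries are assumed to be plain dictionaries with "format" and "url" fields.

```python
def resolve_entry(entry):
    """Classify an entry as an API description or a direct file download."""
    fmt = str(entry.get("format", "")).lower()
    if fmt == "api":
        # Under the proposal, url points at an api.json file that
        # describes the API rather than at the data itself.
        return {"kind": "api", "api_json_url": entry["url"]}
    # Any other format is treated as a file that can be fetched directly.
    return {"kind": "file", "download_url": entry["url"], "format": fmt}
```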

Is the specification commercial friendly?
- Yes. opendata.json is distributed under the same license as sitemaps, which is Creative Commons ShareAlike. I know almost all commercial and non-commercial sites use sitemaps.

What's the timeline?
- I am planning to complete the v0.1 specification document before 15 December. Then I will talk to some data publishers to see if they can adopt it. By the end of January 2015 I will have an alpha version of the search engine out. If you are a data publisher, let me know; I would love to discuss.

Will it be your individual effort?
- As of now, yes, but I would like to have collaborators. Email me. I will also publish it on Data{Meet}, as I sense people on Data{Meet} will be indirectly using this protocol.

How does it look?
- opendata.json is valid JSON. Here is an example for weather data on OpenBangalore.

How can I help?
- Help me write the specification. Help me by implementing it if you are a publisher. Help me write the initial search engine. Everything we write will be community/commercial friendly.