Mapping Local Government Directory to WikiData

The Indian government maintains the directory and hierarchy of local governments and administrative areas in India, called the Local Government Directory (LGD). Recently I wanted to map the administrative areas in the LGD to items on WikiData, and update those items where necessary.

Sync between LGD and WikiData

As a first step, I downloaded the data from the LGD. It's painful but possible. If you are okay with somewhat old data, you can also use this git repository. I loaded some of those sheets into SQLite so that it's easy for me to work on them and publish. My work is in the project repository at

https://github.com/datameet/india-local-government-directory
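
Loading an LGD sheet into SQLite needs only the Python standard library. A minimal sketch, assuming the sheet has been exported as UTF-8 CSV with a header row; the function name is hypothetical and the table name is whatever you choose, not necessarily what the repository uses.

```python
from __future__ import annotations

import csv
import sqlite3
from collections.abc import Iterable


def load_csv_into_sqlite(conn: sqlite3.Connection, table: str,
                         lines: Iterable[str]) -> int:
    """Create `table` with one TEXT column per CSV header and insert all rows.

    `lines` can be an open file or any iterable of CSV lines. Identifiers are
    double-quoted so headers like "StateName(InEnglish)" are kept verbatim.
    Returns the number of data rows inserted.
    """
    reader = csv.reader(lines)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    rows = [row for row in reader if row]  # csv yields [] for blank lines
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

To load a real file, pass the open file object, e.g. `with open("states.csv", newline="", encoding="utf-8") as f: load_csv_into_sqlite(conn, "states", f)`, where the file name is a placeholder. Keeping every column as TEXT preserves leading zeros in LGD and census codes.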

In the first round, I wanted to match only states and districts. States were easy: get the list of officially valid states from WikiData, match each WikiData label with the "StateName(InEnglish)" column of the LGD, and do the same with UTs. Where there are spelling differences, map them manually. There are 36 States/UTs in India.
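
The label-matching step can be sketched in Python, assuming the WikiData labels have already been fetched into a QID-to-label dict. The normalization rules (case, "&" versus "and", punctuation) and the hand-made overrides dict are my own illustrative assumptions; whatever ends up in the unmatched list is what gets mapped manually.

```python
from __future__ import annotations

import re


def normalize(name: str) -> str:
    """Lower-case, spell out '&' and collapse punctuation and spaces."""
    name = name.lower().replace("&", " and ")
    return re.sub(r"[^a-z0-9]+", " ", name).strip()


def match_by_label(wikidata: dict[str, str], lgd_names: list[str],
                   overrides: dict[str, str] | None = None):
    """Map LGD names (e.g. the "StateName(InEnglish)" column) to WikiData QIDs.

    wikidata maps QID -> English label. overrides maps an LGD name -> QID for
    spelling differences resolved by hand; they win over label matches.
    Returns (matches, unmatched_lgd_names).
    """
    overrides = overrides or {}
    by_label = {normalize(label): qid for qid, label in wikidata.items()}
    matches: dict[str, str] = {}
    unmatched: list[str] = []
    for name in lgd_names:
        qid = overrides.get(name) or by_label.get(normalize(name))
        if qid:
            matches[name] = qid
        else:
            unmatched.append(name)
    return matches, unmatched
```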

The SPARQL query below gets the states of India (wd:Q12443800). It also fetches property wdt:P5578, the 2011 Indian census code, which helps with the mapping.

SELECT DISTINCT ?S ?SLabel ?SDescription ?Indian_census_area_code__2011_ WHERE {
  # ?S is an instance of "state of India" (wd:Q12443800)
  ?S wdt:P31 wd:Q12443800.
  # Exclude items that have a dissolved/abolished date (P576), bound to ?dt
  FILTER(NOT EXISTS { ?S wdt:P576 ?dt. })
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  # Optionally fetch the Indian census area code (2011)
  OPTIONAL { ?S wdt:P5578 ?Indian_census_area_code__2011_. }
}
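
To pull the results into the same workflow, the query can be sent to the public Wikidata Query Service endpoint and the JSON bindings flattened into rows. A sketch using only the standard library; the User-Agent string is yours to fill in, and the row keys (qid, label, census2011) are my own naming, not anything the service defines.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def run_sparql(query: str, user_agent: str) -> dict:
    """POST a SPARQL query to the Wikidata Query Service, return parsed JSON."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        WDQS_ENDPOINT,
        data=body,  # a form-encoded body makes urllib send a POST
        headers={
            "User-Agent": user_agent,  # WDQS asks clients to identify themselves
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_rows(result: dict) -> list[dict]:
    """Flatten SPARQL JSON bindings into {qid, label, census2011} rows."""
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append({
            # ?S is an entity URI such as http://www.wikidata.org/entity/Q...
            "qid": binding["S"]["value"].rsplit("/", 1)[-1],
            "label": binding.get("SLabel", {}).get("value"),
            # OPTIONAL variables are simply absent from a binding when unbound
            "census2011": binding.get("Indian_census_area_code__2011_", {}).get("value"),
        })
    return rows
```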

The SPARQL query below gets the UTs of India (wd:Q467745), again fetching property wdt:P5578, the 2011 Indian census code, to help with the mapping.

SELECT DISTINCT ?S ?SLabel ?SDescription ?Indian_census_area_code__2011_ WHERE {
  # ?S is an instance of "union territory of India" (wd:Q467745)
  ?S wdt:P31 wd:Q467745.
  # Exclude items that have a dissolved/abolished date (P576)
  FILTER(NOT EXISTS { ?S wdt:P576 ?dt. })
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  # Optionally fetch the Indian census area code (2011)
  OPTIONAL { ?S wdt:P5578 ?Indian_census_area_code__2011_. }
}
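
The UT query differs from the state query only in the class item, so one option is to generate both (and the district query) from a single template. A small sketch; the constant and function names are mine, while the class QIDs are the ones used in this post.

```python
from string import Template

STATE_OF_INDIA = "Q12443800"
UNION_TERRITORY_OF_INDIA = "Q467745"
DISTRICT_OF_INDIA = "Q1149652"

# Same shape as the queries above; $cls is replaced by the class QID.
QUERY = Template("""
SELECT DISTINCT ?S ?SLabel ?SDescription ?Indian_census_area_code__2011_ WHERE {
  ?S wdt:P31 wd:$cls.
  FILTER(NOT EXISTS { ?S wdt:P576 ?dt. })
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  OPTIONAL { ?S wdt:P5578 ?Indian_census_area_code__2011_. }
}
""")


def current_items_query(class_qid: str) -> str:
    """Return the query for non-dissolved instances of `class_qid`."""
    if not class_qid.startswith("Q") or not class_qid[1:].isdigit():
        raise ValueError(f"not a QID: {class_qid!r}")
    return QUERY.substitute(cls=class_qid)
```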

Matching the districts was not as easy, though I followed the same procedure. First, I got all the districts of India (wd:Q1149652) from WikiData using the query below:

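SELECT DISTINCT ?S ?SLabel ?SDescription ?Indian_census_area_code__2011_ WHERE {
  # ?S is an instance of "district of India" (wd:Q1149652)
  ?S wdt:P31 wd:Q1149652.
  # Exclude items that have a dissolved/abolished date (P576)
  FILTER(NOT EXISTS { ?S wdt:P576 ?dt. })
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  # Optionally fetch the Indian census area code (2011)
  OPTIONAL { ?S wdt:P5578 ?Indian_census_area_code__2011_. }
}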
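
One thing to watch when counting: SELECT DISTINCT applies to whole rows, so an item with two P5578 values comes back as two rows. A hedged sketch that counts items by QID rather than by row and lists items with conflicting census codes, assuming rows shaped like {"qid": ..., "census2011": ...}; the function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def group_by_item(rows: list[dict]) -> dict[str, set[str]]:
    """Collapse result rows to one entry per QID with all census codes seen."""
    codes: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        item_codes = codes[row["qid"]]  # creates the entry even without a code
        if row.get("census2011"):
            item_codes.add(row["census2011"])
    return dict(codes)


def items_with_multiple_codes(grouped: dict[str, set[str]]) -> dict[str, set[str]]:
    """Items whose census code is ambiguous and worth checking by hand."""
    return {qid: c for qid, c in grouped.items() if len(c) > 1}
```

With this, len(group_by_item(rows)) is the number of distinct items to compare against the LGD count.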

The query returns 744 items, whereas according to the LGD there are only 735 districts. Wikipedia gives yet another number: 741. So I had to figure out which districts were invalid. With some string matching and SQL magic, I came up with this list.
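
The string-matching part can be approximated with Python's difflib: take the names present on only one side and suggest the closest spelling from the other. A sketch under my own assumptions (the normalization and the 0.85 cutoff are arbitrary choices, and the function name is hypothetical); every suggestion still needs a human to confirm it.

```python
from __future__ import annotations

import difflib
import re


def normalize(name: str) -> str:
    """Lower-case and collapse anything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def compare_names(wikidata_labels: list[str], lgd_names: list[str],
                  cutoff: float = 0.85):
    """Return ({WikiData-only name: closest LGD-only name or None}, LGD-only names)."""
    wd = {normalize(n): n for n in wikidata_labels}
    lgd = {normalize(n): n for n in lgd_names}
    only_wd = sorted(set(wd) - set(lgd))
    only_lgd = sorted(set(lgd) - set(wd))
    suggestions = {}
    for key in only_wd:
        # get_close_matches ranks candidates by SequenceMatcher ratio
        close = difflib.get_close_matches(key, only_lgd, n=1, cutoff=cutoff)
        suggestions[wd[key]] = lgd[close[0]] if close else None
    return suggestions, [lgd[k] for k in only_lgd]
```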

WikiDataId | Label | Description | Comments
Q955977 | South Arcot | Former district in Tamil Nadu, India | Needs to be marked as dissolved in WikiData.
Q1900496 | Bangalore | Former district in Karnataka, India | Needs to be marked as dissolved in WikiData.
Q1606061 | Andaman | Former district of the Andaman and Nicobar Islands | Needs to be marked as dissolved in WikiData.
Q24949801 | Shahbazwan | District of Bihar in India | Is this the same as GOPALGANJ district? Marked as a district by mistake in WikiData; should be removed as a district.
Q6007135 | Imphal | Wikimedia disambiguation page | Ex-district that was split. Needs to be marked as dissolved in WikiData.
Q48731903 | Noklak | District in India, Nagaland | New district. LGD needs an update. January 20, 2021.
Q61746013 | Narayanapet | District of Telangana, India | There seems to be a duplicate Narayanpet district (Q85787759), but Q61746013 was created earlier. DataCommons also uses the same item. It also has
Q29025081 | East Karbi Anglong | District of Assam, India | When KARBI ANGLONG was split, the western part became the new "West Karbi Anglong" and the rest remained "Karbi Anglong". There is no "East Karbi Anglong" as such. Should it be removed in WikiData?
Q101088203 | Bajali | District of Assam, India | New district formed on 12 January 2021. LGD needs an update.
Don't know | Vijayanagara | District of Karnataka in India | New district formed in 2020/21. Needs to be added to the LGD. Maybe mark Q1611788 as a district in WikiData?
Don't know | Chachaura | District of Madhya Pradesh | Missing from LGD, WikiData and OSM. No gazette yet.
Don't know | Maihar | District of Madhya Pradesh | Missing from LGD, WikiData and OSM. No gazette yet.
Don't know | Nagda | District of Madhya Pradesh | Missing from LGD and WikiData. No gazette yet.
Q61439260 | Pakke-Kessang | District of Arunachal Pradesh in India | It was missing from the WikiData query results because it was not tagged as a district. I updated WikiData.
Changes to be made

Since Chachaura, Maihar and Nagda have not been gazetted, they are not officially districts yet, so I have not added them. I have synced the rest in my DB. My list now has 738 districts, because I have added Noklak, Bajali and Vijayanagara to the LGD list of 735. I will push the changes to WikiData once I get confirmation from the community.
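
The arithmetic behind 738, written out so it can be rerun when the LGD count changes. The counts and names come from this post; the function is a hypothetical helper that only checks a district is not listed both as added and as pending.

```python
from __future__ import annotations

LGD_DISTRICT_COUNT = 735  # districts in the LGD download

added_to_lgd_list = ["Noklak", "Bajali", "Vijayanagara"]  # new, not yet in LGD
not_gazetted = ["Chachaura", "Maihar", "Nagda"]  # left out until notified


def reconciled_count(base: int, added: list[str], pending: list[str]) -> int:
    """Base LGD count plus new districts; pending (not gazetted) ones excluded."""
    overlap = set(added) & set(pending)
    if overlap:
        raise ValueError(f"listed as both added and pending: {sorted(overlap)}")
    return base + len(set(added))


print(reconciled_count(LGD_DISTRICT_COUNT, added_to_lgd_list, not_gazetted))  # 738
```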

If everything is okay, I will then update WikiData, also adding the Census2011Code, StateCode and DistrictCode values.
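
One possible way to push the codes is a QuickStatements batch; this post does not say which tool will be used. The sketch assumes QuickStatements V1 syntax (item, property and quoted string value separated by tabs). P5578 is the census code property from the queries above; I have not named properties for StateCode and DistrictCode, so pass whichever property the community settles on. The function name is hypothetical.

```python
from __future__ import annotations


def quickstatements_v1(values: dict[str, str], prop: str) -> str:
    """Build QuickStatements V1 lines: QID <TAB> property <TAB> "string value".

    `values` maps a WikiData QID to the code to add under `prop`.
    """
    lines = []
    for qid, value in sorted(values.items()):
        # Codes are plain digits; refuse anything that would break the syntax.
        if any(ch in value for ch in '"\t\n'):
            raise ValueError(f"unsafe value for {qid}: {value!r}")
        lines.append(f'{qid}\t{prop}\t"{value}"')
    return "\n".join(lines)


# Q123 is a placeholder item, not a real district.
print(quickstatements_v1({"Q123": "001"}, "P5578"))
```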

Districts Table

I have deployed the SQLite3 database as a Datasette project on Glitch. You can explore the states and districts tables there without downloading anything, or you can follow the project on GitHub.
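
Datasette also exposes every table as JSON, which is handy for scripting. A sketch of building such a URL; the base URL and database name are placeholders (I am not reproducing the actual Glitch address), `_shape=array` asks Datasette for a plain list of row objects, and `_size=max` asks for as many rows per page as the server allows.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import quote, urlencode


def datasette_json_url(base_url: str, database: str, table: str,
                       filters: dict[str, str] | None = None) -> str:
    """Build a Datasette table URL returning rows as a JSON array.

    Each filter becomes an exact-match column=value pair in the query string.
    """
    params = {"_shape": "array", "_size": "max", **(filters or {})}
    path = f"{quote(database)}/{quote(table)}.json"
    return f"{base_url.rstrip('/')}/{path}?{urlencode(params)}"


def fetch_rows(url: str) -> list[dict]:
    """Download and decode the JSON rows (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```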

Do let me know what you think.

2 Responses

  1. Thejesh GN says:

    DataMeet discussions on this are happening here.

  2. May 31, 2021

    […] educational districts don’t match the administrative or revenue district boundaries. They also differ in numbers. So it’s essential to consider them as entirely different entities. Each […]