Learning SPARQL with Wikidata
SPARQL has a lot in common with SQL, yet it is also very different. Here I am going to get the English works of Rabindranath Tagore from Wikidata and use that as an exercise to learn SPARQL. I have not tried to explain every term, as I expect some knowledge of SQL and semantic triples. Since I go step by step, I think it is easy to follow.
This blog post is a continuation of my earlier posts, "Learning about Semantic Web and Triple" and "Trying WikiData".
Let's start by getting all the works from Wikidata. You can run the queries on Wikidata's query interface, the Wikidata Query Service (https://query.wikidata.org/). Paste the query and press the play button to execute it.
SELECT and LIMIT
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?author
WHERE {
?work wdt:P50 ?author .
}
LIMIT 10
Here ?work and ?author are variables. We are trying to get any ?work that has an ?author. So the statement
?work wdt:P50 ?author
is like any other semantic triple:
a subject, a predicate, and an object.
Just make sure the statement ends with a period (.).
wdt:P50 denotes the author property. We LIMIT the result set to 10 because we are just exploring.
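Variables are not restricted to the subject and object positions. As a minimal sketch of the same triple idea, the query below fixes the subject and leaves the predicate and object open. It assumes wd:Q7241, the Wikidata entity for Rabindranath Tagore, as the subject, and the variable names ?predicate and ?object are only illustrative.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>

# Every predicate/object pair attached to one fixed subject.
# wd:Q7241 is Rabindranath Tagore; ?predicate and ?object are free variables.
SELECT ?predicate ?object
WHERE {
  wd:Q7241 ?predicate ?object .
}
LIMIT 10
```

Browsing a few rows like this is a handy way to discover which properties an entity actually has before writing a more specific query.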
FILTER
So our query lists anything that has an author property. But what if we want just the works of Rabindranath Tagore? We can do that by comparing the author property to Rabindranath Tagore, whose Wikidata ID is Q7241.
We can add a FILTER to keep only the matching rows:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?author
WHERE {
?work wdt:P50 ?author .
FILTER ( ?author = wd:Q7241 )
}
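As a side note, when there is only one fixed value to match, the entity can also be written directly in the object position of the triple pattern, with no FILTER at all. A minimal sketch using the same prefixes and IDs as above (the LIMIT is added here only to keep the result small):

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Works whose author (P50) is Rabindranath Tagore (Q7241).
# The constant takes the place of the ?author variable.
SELECT ?work
WHERE {
  ?work wdt:P50 wd:Q7241 .
}
LIMIT 10
```

This should return the same works as the FILTER version. The trade-off is that ?author is no longer available as a column in the results.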
VALUES
Another way is to use the VALUES keyword. It binds a value to a variable, so it becomes easy to substitute, and the query gets simpler:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?author
WHERE {
?work wdt:P50 ?author .
VALUES ( ?author ) {
( wd:Q7241 )
}
}
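VALUES becomes more useful when there is more than one value to match. The sketch below uses the single-variable form of VALUES and adds a second author purely for illustration: wd:Q42 (Douglas Adams) is not part of the original exercise, and the LIMIT is likewise an addition.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Works by either of two authors.
# wd:Q42 (Douglas Adams) is an illustrative second value, not from the exercise.
SELECT ?work ?author
WHERE {
  VALUES ?author { wd:Q7241 wd:Q42 }
  ?work wdt:P50 ?author .
}
LIMIT 20
```

Each value listed in the braces is tried in turn, much like an IN list in SQL.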
Now let's go one step further and get only the English works. That is done through the language attribute of the work, wdt:P407. The Wikidata entity for the English language is wd:Q1860, so we can filter on it by adding both to the VALUES block:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?author ?pubLang
WHERE {
VALUES ( ?author ?pubLang) {
( wd:Q7241 wd:Q1860)
}
?work wdt:P50 ?author .
?work wdt:P407 ?pubLang .
}
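When several triple patterns share the same subject, SPARQL allows a semicolon shorthand so the subject is written only once. A minimal sketch of the same author and language constraints, with the two entities written inline instead of going through VALUES:

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# The semicolon repeats the subject (?work) for the next predicate/object pair.
SELECT ?work
WHERE {
  ?work wdt:P50 wd:Q7241 ;
        wdt:P407 wd:Q1860 .
}
```

Both forms describe the same two triples. The longhand used in the rest of this post is easier to read while learning.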
Let's get the title of the work, using the title property wdt:P1476.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?title ?pubLang ?pubDate
WHERE {
VALUES ( ?author ?pubLang ) {
( wd:Q7241 wd:Q1860 )
}
?work wdt:P50 ?author .
?work wdt:P1476 ?title .
?work wdt:P407 ?pubLang .
?work wdt:P577 ?pubDate .
}
OPTIONAL
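Now let's look at the published date, wdt:P577. The query above requires it, so a work without a published date is left out of the results. Let's make it optional instead, like an outer join in SQL: the work is included even if the property doesn't exist or has no value.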
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?title ?pubLang ?pubDate
WHERE {
VALUES ( ?author ?pubLang ) {
( wd:Q7241 wd:Q1860 )
}
?work wdt:P50 ?author .
?work wdt:P1476 ?title .
?work wdt:P407 ?pubLang .
OPTIONAL { ?work wdt:P577 ?pubDate } .
}
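To see the outer-join behaviour directly, one option is to list only the works where the optional part found nothing. The sketch below combines OPTIONAL with FILTER(!BOUND(...)), which is roughly the SPARQL counterpart of a LEFT JOIN followed by an IS NULL check in SQL. The entities are written inline here only to keep the example short.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# English works by Tagore that have no published date (P577) recorded.
SELECT ?work ?title
WHERE {
  ?work wdt:P50 wd:Q7241 .
  ?work wdt:P407 wd:Q1860 .
  ?work wdt:P1476 ?title .
  OPTIONAL { ?work wdt:P577 ?pubDate . }
  # Keep only the rows where the OPTIONAL block left ?pubDate unbound.
  FILTER ( !BOUND(?pubDate) )
}
```

If this returns any rows, those are exactly the works that a mandatory wdt:P577 pattern would have dropped.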
Similarly, let's get the label of the language. In Wikidata, labels can exist in many languages, so we are going to filter for English-language labels only.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?title ?pubLang ?pubDate ?language
WHERE {
VALUES ( ?author ?pubLang ) {
( wd:Q7241 wd:Q1860 )
}
?work wdt:P50 ?author .
?work wdt:P1476 ?title .
?work wdt:P407 ?pubLang .
OPTIONAL { ?pubLang rdfs:label ?language } .
OPTIONAL { ?work wdt:P577 ?pubDate } .
FILTER(LANG(?language) = 'en')
}
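One subtlety: in the query above the FILTER sits outside the OPTIONAL block. If a label were ever missing, LANG() would be applied to an unbound variable, the filter would fail, and the whole row would be dropped, which cancels the effect of OPTIONAL. English certainly has an English label, so it does no harm here, but a more defensive sketch moves the FILTER inside the OPTIONAL block. The variable name ?languageLabel is only illustrative.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?work ?title ?pubLang ?languageLabel
WHERE {
  ?work wdt:P50 wd:Q7241 .
  ?work wdt:P1476 ?title .
  ?work wdt:P407 ?pubLang .
  # The language test is part of the optional match, so a work is kept
  # even when no English label is found for its language.
  OPTIONAL {
    ?pubLang rdfs:label ?languageLabel .
    FILTER ( LANG(?languageLabel) = 'en' )
  }
}
```

This version does not restrict ?pubLang to English, so the difference between the two FILTER placements has a chance to show up in the results.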
ORDER BY
Finally, sort the results by ?pubDate in descending order.
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?work ?title ?pubLang ?pubDate ?language
WHERE {
VALUES ( ?author ?pubLang ) {
( wd:Q7241 wd:Q1860 )
}
?work wdt:P50 ?author .
?work wdt:P1476 ?title .
?work wdt:P407 ?pubLang .
OPTIONAL { ?pubLang rdfs:label ?language } .
OPTIONAL { ?work wdt:P577 ?pubDate } .
FILTER(LANG(?language) = 'en')
}
ORDER BY DESC(?pubDate)
When we run that query on Wikidata, we get a table, and here is how it looks. (I have embedded the results iframe from Wikidata.)
I will try to include other, more advanced features in the next part. What do you think? Was this useful?