sparql-gremlin: how to achieve the same goal with Java?

I'm working with sparql-gremlin in a Java program. Currently, I can run the following query in the Gremlin console:
gremlin> g.sparql("SELECT ?name ?age WHERE { ?person v:name ?name . ?person v:age ?age } ORDER BY ASC(?age)")
==>[name:vadas,age:27]
==>[name:marko,age:29]
==>[name:josh,age:32]
==>[name:peter,age:35]
However, how can I run the same query from Java? I tried the following code, but received the wrong result.
Graph graph = TinkerFactory.createModern() ;
try {
SparqlTraversalSource g = graph.traversal(SparqlTraversalSource.class) ;
GraphTraversal iter = g.sparql("SELECT ?name ?age WHERE { ?person v:name ?name . ?person v:age ?age } ORDER BY ASC(?age)").V() ;
while (iter.hasNext()) {
Vertex v = (Vertex) iter.next();
System.out.println(v.id().toString() + ", " + v.property("name") + ", " + v.label() + ", " + v.property("age")) ;
}
} catch (Exception e) {
e.printStackTrace();
}
The result I received is:
log4j:WARN No appenders could be found for logger (org.apache.jena.util.FileManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
1, vp[name->marko], person, vp[age->29]
2, vp[name->vadas], person, vp[age->27]
3, vp[name->lop], software, vp[empty]
4, vp[name->josh], person, vp[age->32]
5, vp[name->ripple], software, vp[empty]
6, vp[name->peter], person, vp[age->35]
1, vp[name->marko], person, vp[age->29]
2, vp[name->vadas], person, vp[age->27]
3, vp[name->lop], software, vp[empty]
4, vp[name->josh], person, vp[age->32]
5, vp[name->ripple], software, vp[empty]
6, vp[name->peter], person, vp[age->35]
1, vp[name->marko], person, vp[age->29]
2, vp[name->vadas], person, vp[age->27]
3, vp[name->lop], software, vp[empty]
4, vp[name->josh], person, vp[age->32]
5, vp[name->ripple], software, vp[empty]
6, vp[name->peter], person, vp[age->35]
1, vp[name->marko], person, vp[age->29]
2, vp[name->vadas], person, vp[age->27]
3, vp[name->lop], software, vp[empty]
4, vp[name->josh], person, vp[age->32]
5, vp[name->ripple], software, vp[empty]
6, vp[name->peter], person, vp[age->35]

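I found the answer myself: I should call fill() after the sparql() invocation instead of appending .V(). A V() step in the middle of a traversal emits every vertex of the graph for each incoming result, so each of the four SPARQL rows was replaced by all six vertices, which is why the output above lists every vertex four times.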
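To see where the 24 lines come from, here is a plain-Java model of the two traversals, with no TinkerPop classes involved. It assumes, as described above, that a mid-traversal V() acts like a flatMap onto every vertex of the graph. The vertex names are copied from the outputs above, and MidTraversalV with its methods is a hypothetical name used only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MidTraversalV {
    // The four rows the SPARQL query returns (ordered by age) and the six vertices of the
    // "modern" toy graph, both copied from the outputs in the question.
    static final List<String> SPARQL_ROWS = List.of("vadas", "marko", "josh", "peter");
    static final List<String> ALL_VERTICES = List.of("marko", "vadas", "lop", "josh", "ripple", "peter");

    // Models g.sparql(...).V(): each incoming row is ignored and replaced by every vertex.
    static List<String> sparqlThenV() {
        return SPARQL_ROWS.stream()
                .flatMap(row -> ALL_VERTICES.stream())
                .collect(Collectors.toList());
    }

    // Models g.sparql(...) consumed directly: just the query rows.
    static List<String> sparqlOnly() {
        return SPARQL_ROWS;
    }

    public static void main(String[] args) {
        System.out.println(sparqlThenV().size() + " results"); // 4 rows x 6 vertices
        System.out.println(sparqlOnly());
    }
}
```

Running it prints "24 results" followed by the four names in age order. In the real program, the corresponding fix is to drop .V() and read the rows straight from the sparql() traversal.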

Related

Filter nested list with sub list of elements

In the code below, I am trying to filter the masterObject list so that it keeps only children whose subList contains 2, 3 or 4. I am not able to filter the list even for a single value. Can you help me find out which lambda function I need to get expectedList as output?
fun main() {
data class ChildObject(var id: Int, var subList: List<Int>)
data class MasterObject(var identifier: Int, var listObject: List<ChildObject>)
val initialList = arrayListOf<MasterObject>(
MasterObject(100, arrayListOf(ChildObject(101, arrayListOf(10, 2, 13)), ChildObject(102, arrayListOf(14, 15, 6)), ChildObject(103, arrayListOf(17, 20, 9)))),
MasterObject(200, arrayListOf(ChildObject(201, arrayListOf(11, 40, 6)), ChildObject(202, arrayListOf(4, 5, 20)), ChildObject(203, arrayListOf(7, 13, 9)))),
MasterObject(300, arrayListOf(ChildObject(301, arrayListOf(1, 2, 30)), ChildObject(302, arrayListOf(4, 15, 60)), ChildObject(303, arrayListOf(7, 20, 90)))))
/*actual goal is to print final list of master objects containing any of (2,3,4) elements in listobject. for now I am stuck with filtering single element */
val expectedList = arrayListOf<MasterObject>(
MasterObject(100, arrayListOf(ChildObject(101, arrayListOf(10, 2, 13)))),
MasterObject(200, arrayListOf(ChildObject(202, arrayListOf(4, 5, 20)))),
MasterObject(300, arrayListOf(ChildObject(301, arrayListOf(1, 2, 30)),ChildObject(302, arrayListOf(4, 15, 60)))))
/* nesting two filters does not work and results in an error */
val finalListWithFilter = initialList.filter { masterObject ->
masterObject.listObject.filter { childObject -> childObject.subList.contains(2) }
}
/* with any, this prints the entire childObject list; with all, it prints no list at all */
val finalListWithAny = initialList.filter { masterObject ->
masterObject.listObject.any { childObject ->
childObject.subList.contains(2)
}
}
/* this prints a list of ChildObject, but I need a list of MasterObject */
val finalListWithMap = initialList.flatMap { masterObject ->
masterObject.listObject.filter { childObject ->
childObject.subList.contains(2)
}
}
println(finalListWithAny)
print(finalListWithMap)
}
Your assumption about how filtering works is wrong in this case. Filtering only narrows down which objects are kept; the objects themselves (their contents) stay exactly the same, regardless of what you filter on. That is also why filtering for children whose subList contains 2, 3 or 4 basically returns the input list: every master object has at least one such child.
As your data classes do not hold mutable lists, I assume that copying the master objects is OK, so the following could be a working solution for you:
val childrenMatchingIds = listOf(2, 3, 4) // the subList values you are interested in
val result = initialList.mapNotNull { master -> // [1]
master.listObject.filter { child -> childrenMatchingIds.any(child.subList::contains) } // [2]
.takeUnless { it.isEmpty() } // [3]
?.let { // [4]
master.copy(listObject = it) // [5]
}
}
[1] mapNotNull ensures that, later on, we skip those master objects where no children match.
[2] filter returns those children whose subList contains any of the wanted values (hence the any inside the filter). Note: this list is detached from the original master object; however, its contents are references to the same objects as in the master object's listObject list.
[3] Here we ensure that if this list is empty, we continue with null instead (see the scope functions takeIf / takeUnless).
[4] We then ignore that value if it is null (see safe calls, ?.).
[5] Finally, we copy (see Data classes - Copying) the old master object into a completely new, detached one: it holds all the details of the old one, and its listObject refers to the same children as the old one's, but filtered.
Back to points [1] and [4]: since empty child lists were turned into null by takeUnless, mapNotNull filters out the master objects that contain none of the children we are looking for.
Feel free to ask clarifying questions if anything is still unclear.
And yes, there are probably a dozen other ways to accomplish this. However, as long as you need a filtered representation of the master object, you will always need a new master object for it. If your lists were mutable and you wanted to remove elements instead, the solution would be completely different (with its own pros and cons).
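For comparison, the same filter-then-copy idea can be sketched in Java (records need Java 16 or later). ChildObject and MasterObject mirror the Kotlin data classes, while filterMasters and sampleInput are hypothetical helpers. Java has no takeUnless or mapNotNull, so this sketch swaps in an Optional per master and flattens away the empty ones; because records are immutable, constructing a new MasterObject plays the role of copy().

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class NestedFilter {
    record ChildObject(int id, List<Integer> subList) {}
    record MasterObject(int identifier, List<ChildObject> listObject) {}

    static List<MasterObject> filterMasters(List<MasterObject> masters, Set<Integer> wanted) {
        return masters.stream()
                .map(master -> {
                    // Keep only the children whose subList contains any wanted value.
                    List<ChildObject> kept = master.listObject().stream()
                            .filter(child -> child.subList().stream().anyMatch(wanted::contains))
                            .collect(Collectors.toList());
                    // No matching children -> empty Optional; otherwise a new master holding only them.
                    return kept.isEmpty()
                            ? Optional.<MasterObject>empty()
                            : Optional.of(new MasterObject(master.identifier(), kept));
                })
                // Drop the empty Optionals, much like mapNotNull drops nulls.
                .flatMap(Optional::stream)
                .collect(Collectors.toList());
    }

    // The initialList from the question.
    static List<MasterObject> sampleInput() {
        return List.of(
                new MasterObject(100, List.of(new ChildObject(101, List.of(10, 2, 13)),
                        new ChildObject(102, List.of(14, 15, 6)), new ChildObject(103, List.of(17, 20, 9)))),
                new MasterObject(200, List.of(new ChildObject(201, List.of(11, 40, 6)),
                        new ChildObject(202, List.of(4, 5, 20)), new ChildObject(203, List.of(7, 13, 9)))),
                new MasterObject(300, List.of(new ChildObject(301, List.of(1, 2, 30)),
                        new ChildObject(302, List.of(4, 15, 60)), new ChildObject(303, List.of(7, 20, 90)))));
    }

    public static void main(String[] args) {
        filterMasters(sampleInput(), Set.of(2, 3, 4)).forEach(System.out::println);
    }
}
```

With Set.of(2, 3, 4) it keeps child 101 under master 100, child 202 under master 200, and children 301 and 302 under master 300, which matches expectedList from the question.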
This works:
fun main() {
println("Hello, world!!!")
val list = listOf(
mapOf( "id" to 100, "name" to "xyz", "sublist" to listOf(2, 3, 4 )),
mapOf( "id" to 101, "name" to "abc", "sublist" to listOf(1, 5, 10 )),
mapOf( "id" to 102, "name" to "qwerty", "sublist" to listOf(2, 6, 12 ))
)
val result = list.filter {
(it["sublist"] as List<Int>).any { it == 2 }
}
println(result)
data class ChildObject(var id: Int, var subList: List<Int>)
data class MasterObject(var identifier: Int, var listObject: List<ChildObject>)
val initialList = arrayListOf<MasterObject>(
MasterObject(100, arrayListOf(ChildObject(101, arrayListOf(1, 2, 3)), ChildObject(102, arrayListOf(4, 5, 6)), ChildObject(103, arrayListOf(7, 2, 9)))),
MasterObject(200, arrayListOf(ChildObject(201, arrayListOf(1, 4, 6)), ChildObject(202, arrayListOf(4, 5, 2)), ChildObject(203, arrayListOf(7, 3, 9)))),
MasterObject(300, arrayListOf(ChildObject(301, arrayListOf(1, 2, 3)), ChildObject(302, arrayListOf(4, 5, 6)), ChildObject(303, arrayListOf(7, 2, 9)))))
/*actual goal is to print final list of master objects containing any of (2,3,4) elements in listobject. for now I am stuck with filtering single element */
val expectedList = arrayListOf<MasterObject>(
MasterObject(100, arrayListOf(ChildObject(101, arrayListOf(1, 2, 3)), ChildObject(103, arrayListOf(7, 2, 9)))),
MasterObject(200, arrayListOf(ChildObject(202, arrayListOf(4, 5, 2)))),
MasterObject(300, arrayListOf(ChildObject(301, arrayListOf(1, 4, 3)))))
val resultList = initialList.filter { master ->
master.listObject.any { child ->
child.subList.any { it == 2 || it == 3 || it == 4 }
}
}
print(resultList)
}
Output:
Hello, world!!!
[{id=100, name=xyz, sublist=[2, 3, 4]}, {id=102, name=qwerty, sublist=[2, 6, 12]}]
[MasterObject(identifier=100, listObject=[ChildObject(id=101, subList=[1, 2, 3]), ChildObject(id=102, subList=[4, 5, 6]), ChildObject(id=103, subList=[7, 2, 9])]), MasterObject(identifier=200, listObject=[ChildObject(id=201, subList=[1, 4, 6]), ChildObject(id=202, subList=[4, 5, 2]), ChildObject(id=203, subList=[7, 3, 9])]), MasterObject(identifier=300, listObject=[ChildObject(id=301, subList=[1, 2, 3]), ChildObject(id=302, subList=[4, 5, 6]), ChildObject(id=303, subList=[7, 2, 9])])]

Exact match of variable string in SPARQL Wikidata Query Service

An exact match on a variable string in the Wikidata Query Service (https://query.wikidata.org) does not give the results I expected.
I expected that I could do:
SELECT * {
hint:Query hint:optimizer "None" .
{ SELECT DISTINCT (xsd:string(?author_name_) AS ?author_name) { wd:Q5565155 skos:altLabel ?author_name_ . } }
?work wdt:P2093 ?author_name .
}
But the Wikidata Query Service returns no results for it.
However, if I use the "=" comparison, I can match the strings:
SELECT * {
hint:Query hint:optimizer "None" .
{ SELECT DISTINCT (xsd:string(?author_name_) AS ?author_name) { wd:Q5565155 skos:altLabel ?author_name_ . } }
?work wdt:P50 wd:Q5565155 .
?work wdt:P2093 ?author_name__ .
FILTER (?author_name = ?author_name__)
}
With the current data in Wikidata, I get five rows returned in this query.
Another way to get this data is by using a BIND:
SELECT * {
BIND("Knudsen GM" AS ?author_name)
?work wdt:P2093 ?author_name .
}
I suppose something might be wrong with the casting, as this does not return anything:
SELECT * {
BIND(xsd:string("Knudsen GM") AS ?author_name)
?work wdt:P2093 ?author_name .
}
Replacing xsd:string with STR, or dropping the conversion altogether, in the original query does not yield any rows either.

BigQuery: Select entire repeated field with group

I'm using Legacy SQL, but I'm not strictly limited to it (though it does have some functions I find useful, HASH for example).
Anyway, the task is simple: I want to group by one top-level column while keeping the first instance of a nested, repeated field alongside it.
So, the following "works", and produces nested output:
SELECT
cd,
subarray.*
FROM [magicalfairy.land]
Then I try to grab the entire first subarray (honestly, I don't expect this to work).
The following is what doesn't work:
SELECT
cd,
FIRST(subarray.*)
FROM [magicalfairy.land]
GROUP BY cd
Any alternate approaches would be appreciated.
Edit: an example of the data and the expected behaviour.
If the input data were roughly:
[
{
"cd": "something",
"subarray": [
{
"hello": 1,
"world": 1
},
{
"hello": 2,
"world": 2
}
]
},
{
"cd": "something",
"subarray": [
{
"hello": 1,
"world": 1
},
{
"hello": 2,
"world": 2
}
]
}
]
I would expect to get:
[
{
"cd": "something",
"subarray": [
{
"hello": 1,
"world": 1
},
{
"hello": 2,
"world": 2
}
]
}
]
You'll have a much better time preserving the structure using standard SQL, e.g.:
WITH T AS (
SELECT
cd,
ARRAY<STRUCT<x INT64, y BOOL>>[
STRUCT(off, MOD(off, 2) = 0),
STRUCT(off - 1, false)] AS subarray
FROM UNNEST([1, 2, 1, 2]) AS cd WITH OFFSET off)
SELECT
cd,
ANY_VALUE(subarray) AS subarray
FROM T
GROUP BY cd;
ANY_VALUE will return some value of subarray for each group. If you wanted to concatenate the arrays instead, you could use ARRAY_CONCAT_AGG.
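The difference between keeping one array per group (ANY_VALUE) and concatenating all of them (ARRAY_CONCAT_AGG) can be modeled in plain Java (16+ for records). This is only a model of the semantics, not BigQuery code: Item and Row are hypothetical types shaped like the example data in the question, and taking the first array seen is just one of the outcomes ANY_VALUE is allowed to produce, since it does not promise which value it returns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupArrays {
    // One {hello, world} struct and one input row with its repeated subarray.
    record Item(int hello, int world) {}
    record Row(String cd, List<Item> subarray) {}

    // ANY_VALUE-style: keep one array per cd (here simply the first one encountered).
    static Map<String, List<Item>> anyValue(List<Row> rows) {
        return rows.stream().collect(Collectors.toMap(
                Row::cd, Row::subarray, (kept, ignored) -> kept, LinkedHashMap::new));
    }

    // ARRAY_CONCAT_AGG-style: concatenate every array that belongs to the same cd.
    static Map<String, List<Item>> concatAgg(List<Row> rows) {
        Map<String, List<Item>> out = new LinkedHashMap<>();
        for (Row row : rows) {
            out.computeIfAbsent(row.cd(), k -> new ArrayList<>()).addAll(row.subarray());
        }
        return out;
    }

    // The two identical rows from the example in the question.
    static List<Row> sampleInput() {
        List<Item> sub = List.of(new Item(1, 1), new Item(2, 2));
        return List.of(new Row("something", sub), new Row("something", sub));
    }

    public static void main(String[] args) {
        System.out.println(anyValue(sampleInput()));
        System.out.println(concatAgg(sampleInput()));
    }
}
```

For the example input, anyValue yields a single two-element subarray for "something", while concatAgg yields a four-element one.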
To run this against your table, try the following:
SELECT
cd,
ANY_VALUE(subarray) AS subarray
FROM `magicalfairy.land`
GROUP BY cd
Try the following (BigQuery Standard SQL):
SELECT cd, subarray
FROM (
SELECT cd, subarray,
ROW_NUMBER() OVER(PARTITION BY cd) AS num
FROM `magicalfairy.land`
) WHERE num = 1
This gives you the expected result, the equivalent of an "ANY ARRAY" aggregate.
This solution can be extended to a "FIRST ARRAY" by adding ORDER BY sort_col to the OVER() clause, assuming that sort_col defines the logical order.
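The FIRST ARRAY variant boils down to keeping, per cd, the row that sorts first. Below is a plain-Java model of that window-function pattern (Java 16+), assuming a hypothetical numeric sort column named sortCol: Collectors.toMap with BinaryOperator.minBy keeps the row with the smallest sortCol for each key.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

public class FirstPerGroup {
    // Hypothetical row: the grouping column, a sort column and the array to keep.
    record Row(String cd, long sortCol, List<Integer> subarray) {}

    // Models ROW_NUMBER() OVER (PARTITION BY cd ORDER BY sort_col) ... WHERE num = 1:
    // for each cd, keep the row with the smallest sortCol.
    static Map<String, Row> firstPerCd(List<Row> rows) {
        Comparator<Row> bySortCol = Comparator.comparingLong(Row::sortCol);
        return rows.stream().collect(Collectors.toMap(
                Row::cd, Function.identity(), BinaryOperator.minBy(bySortCol), TreeMap::new));
    }

    static List<Row> sampleInput() {
        return List.of(
                new Row("a", 2, List.of(20)),
                new Row("a", 1, List.of(10)),
                new Row("b", 5, List.of(50)));
    }

    public static void main(String[] args) {
        firstPerCd(sampleInput()).forEach((cd, row) -> System.out.println(cd + " -> " + row.subarray()));
    }
}
```

On ties, minBy keeps the row that came first in the input, whereas ROW_NUMBER breaks ties arbitrarily, so adding a tie-breaker column to the ORDER BY is worth considering if the choice matters.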

how to programmatically get all available information from a Wikidata entity?

I'm new to Wikidata, and I just realized that Wikidata uses a lot of reification.
Suppose we want to get all the information available about Obama. If we were doing this with DBpedia, we would just use a simple query:
select * where {<http://dbpedia.org/resource/Barack_Obama> ?p ?o .}
This would return all the properties and values that have Obama as the subject. Essentially, the result is the same as this page: http://dbpedia.org/page/Barack_Obama, but the query result is in the format I need.
I'm wondering how to do the same thing with Wikidata. This is the Wikidata page for Obama: https://www.wikidata.org/wiki/Q76. Let's say I want all the statements on this page. But almost all of them are reified, in that they have ranks, qualifiers, etc. For example, the "educated at" statements contain not only the school but also a "start time" and an "end time", and all schools are ranked as normal since Obama no longer attends them.
I could just get all the schools by getting the truthy statements (using https://query.wikidata.org):
SELECT ?school ?schoolLabel WHERE {
wd:Q76 wdt:P69 ?school .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
The above query simply returns all the schools.
If I want to get the start time and end time of the school, I need to do this:
SELECT ?school ?schoolLabel ?start ?end WHERE {
wd:Q76 p:P69 ?school_statement .
?school_statement ps:P69 ?school .
?school_statement pq:P580 ?start .
?school_statement pq:P582 ?end .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
But without looking at the actual page, how would I know that ?school_statement has pq:P580 and pq:P582, namely the "start time" and "end time"? It all comes down to this question: how do I get all the information (including the reified parts) from https://www.wikidata.org/wiki/Q76?
Ultimately, I would expect a table like this:
||predicate||object||objectLabel||qualifier1||qualifier1Value||qualifier2||qualifier2Value||...
You should probably use the Wikidata data API (more specifically, the wbgetentities module) instead of the SPARQL endpoint.
In your case:
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q76
There you should find all the qualifier data you were looking for, for example in entities.Q76.claims.P69.1:
{ mainsnak:
{ snaktype: 'value',
property: 'P69',
datavalue:
{ value: { 'entity-type': 'item', 'numeric-id': 3273124, id: 'Q3273124' },
type: 'wikibase-entityid' },
datatype: 'wikibase-item' },
type: 'statement',
qualifiers:
{ P580:
[ { snaktype: 'value',
property: 'P580',
hash: 'a1db249baf916bb22da7fa5666d426954435256c',
datavalue:
{ value:
{ time: '+1971-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ],
P582:
[ { snaktype: 'value',
property: 'P582',
hash: 'a065bff95f5cb3026ebad306b3df7587c8daa2e9',
datavalue:
{ value:
{ time: '+1979-01-01T00:00:00Z',
timezone: 0,
before: 0,
after: 0,
precision: 9,
calendarmodel: 'http://www.wikidata.org/entity/Q1985727' },
type: 'time' },
datatype: 'time' } ] },
'qualifiers-order': [ 'P580', 'P582' ],
id: 'q76$464382F6-E090-409E-B7B9-CB913F1C2166',
rank: 'normal' }
Then you might be interested in ways to extract readable values from those results.
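One way to turn such a statement into the row layout from the question (predicate, object, qualifier1, qualifier1Value, ...) is to walk the parsed JSON. The following is a sketch in Java, assuming the wbgetentities response has already been parsed into nested Map/List objects by some JSON library (the JDK itself has no JSON parser). ClaimFlattener and its methods are hypothetical; the sketch only handles item and time values, and it does not resolve labels such as objectLabel, which would need a further lookup. sampleStatement() is a trimmed copy of the P69 statement above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ClaimFlattener {

    // Flattens one statement into [property, value, qualifier1, qualifier1Value, ...],
    // following 'qualifiers-order' so that the qualifier columns come out in a stable order.
    @SuppressWarnings("unchecked")
    static List<String> flatten(Map<String, Object> statement) {
        Map<String, Object> mainsnak = (Map<String, Object>) statement.get("mainsnak");
        List<String> row = new ArrayList<>();
        row.add((String) mainsnak.get("property"));
        row.add(snakValue(mainsnak));
        Map<String, Object> qualifiers = (Map<String, Object>) statement.getOrDefault("qualifiers", Map.of());
        List<String> order = (List<String>) statement.getOrDefault("qualifiers-order", List.of());
        for (String qualifierProperty : order) {
            // A qualifier property can carry several snaks; emit one pair per snak.
            for (Object snak : (List<Object>) qualifiers.get(qualifierProperty)) {
                row.add(qualifierProperty);
                row.add(snakValue((Map<String, Object>) snak));
            }
        }
        return row;
    }

    // Extracts a printable value; only item ids and time values are handled here.
    @SuppressWarnings("unchecked")
    static String snakValue(Map<String, Object> snak) {
        if (!"value".equals(snak.get("snaktype"))) {
            return String.valueOf(snak.get("snaktype")); // e.g. "somevalue" or "novalue"
        }
        Map<String, Object> datavalue = (Map<String, Object>) snak.get("datavalue");
        Object value = datavalue.get("value");
        String type = (String) datavalue.get("type");
        if ("wikibase-entityid".equals(type)) {
            return (String) ((Map<String, Object>) value).get("id");
        } else if ("time".equals(type)) {
            return (String) ((Map<String, Object>) value).get("time");
        }
        return String.valueOf(value);
    }

    static Map<String, Object> sampleStatement() {
        Map<String, Object> mainsnak = Map.of("snaktype", "value", "property", "P69",
                "datavalue", Map.of("type", "wikibase-entityid",
                        "value", Map.of("entity-type", "item", "id", "Q3273124")));
        Map<String, Object> start = Map.of("snaktype", "value", "property", "P580",
                "datavalue", Map.of("type", "time", "value", Map.of("time", "+1971-01-01T00:00:00Z")));
        Map<String, Object> end = Map.of("snaktype", "value", "property", "P582",
                "datavalue", Map.of("type", "time", "value", Map.of("time", "+1979-01-01T00:00:00Z")));
        return Map.of("mainsnak", mainsnak,
                "qualifiers", Map.of("P580", List.of(start), "P582", List.of(end)),
                "qualifiers-order", List.of("P580", "P582"),
                "rank", "normal");
    }

    public static void main(String[] args) {
        System.out.println(flatten(sampleStatement()));
    }
}
```

For the sample statement this prints [P69, Q3273124, P580, +1971-01-01T00:00:00Z, P582, +1979-01-01T00:00:00Z]; looping over every statement in entities.Q76.claims would give one such row per statement.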

Querying multiple graphs with aggregate Count and graph in results

Is it possible to count the occurrences of triples in multiple named graphs and return the results as rows in a table? Such as:
?g ?count ?sequence_count
-------- ------- ---------------
graph1 54 54
graph2 120 80
Here is the query that I tried.
SELECT ?g ?count ?sequence_count
FROM NAMED <graph1>
FROM NAMED <graph2>
WHERE {
{
select (COUNT(?identifier) as ?count) (COUNT(?sequence) as ?sequence_count)
WHERE { GRAPH ?g {
?identifier a <http://www.w3.org/2000/01/rdf-schema#Resource> .
OPTIONAL { ?identifier <urn:sequence> ?sequence }
} }
}
}
But the results were:
?g ?count ?sequence_count
-------- ------- ---------------
174 134
I'm trying to avoid having to write out:
select ?count_graph1 ?sequence_count_graph1 ?count_graph2 ...
as there could be hundreds of graphs to query.
First, the query is really close: just move the SELECT inside the GRAPH pattern, which basically says "for each graph, compute these aggregate values". Second, if an ?identifier has more than one <urn:sequence> value, the OPTIONAL produces one row per value, so COUNT(?identifier) counts that identifier more than once; COUNT(DISTINCT ?identifier) avoids this. Try the following:
SELECT *
FROM NAMED <graph1>
FROM NAMED <graph2>
WHERE {
GRAPH ?g {
SELECT (COUNT(DISTINCT ?identifier) as ?count) (COUNT(?sequence) as ?sequence_count)
WHERE {
?identifier a <http://www.w3.org/2000/01/rdf-schema#Resource> .
OPTIONAL { ?identifier <urn:sequence> ?sequence }
}
}
}
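Below is a plain-Java model of what the corrected query computes for each named graph. Binding is a hypothetical stand-in for one solution of the pattern inside GRAPH ?g, with sequence set to null when the OPTIONAL did not match; the sample data is made up so that identifier "a" in graph1 has two <urn:sequence> values.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PerGraphCounts {
    // One solution of the pattern inside GRAPH ?g { ... }; sequence is null when OPTIONAL did not match.
    record Binding(String g, String identifier, String sequence) {}

    record Counts(long count, long countWithoutDistinct, long sequenceCount) {}

    // Groups the solutions by graph, like evaluating the sub-select once per named graph.
    static Map<String, Counts> countPerGraph(List<Binding> bindings) {
        Map<String, List<Binding>> byGraph = bindings.stream()
                .collect(Collectors.groupingBy(Binding::g, TreeMap::new, Collectors.toList()));
        Map<String, Counts> result = new TreeMap<>();
        byGraph.forEach((g, rows) -> result.put(g, new Counts(
                rows.stream().map(Binding::identifier).distinct().count(), // COUNT(DISTINCT ?identifier)
                rows.size(),                                                 // COUNT(?identifier): one per row
                rows.stream().filter(b -> b.sequence() != null).count()))); // COUNT(?sequence) skips unbound values
        return result;
    }

    static List<Binding> sampleBindings() {
        return List.of(
                new Binding("graph1", "a", "s1"),
                new Binding("graph1", "a", "s2"),
                new Binding("graph1", "b", null),
                new Binding("graph2", "c", "s3"));
    }

    public static void main(String[] args) {
        countPerGraph(sampleBindings()).forEach((g, c) -> System.out.println(g + " " + c));
    }
}
```

For graph1 this gives a DISTINCT count of 2 but a plain row count of 3, which is the double counting that COUNT(DISTINCT ?identifier) avoids; COUNT(?sequence) is 2 because unbound values are not counted.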