SPARQL: Is there a way to pivot on a property?

I am querying a DataCube (RDF statistical data) that has 3 dimensions and 1 measure. In this datacube, each observation is composed of 4 statements (3 for the dimensions and 1 for the measure), like the following example (which may be queried at http://kaiko.getalp.org/sparql):
SELECT distinct ?version ?lg ?relation ?count WHERE {
?o a qb:Observation;
qb:dataSet dbnstats:dbnaryNymRelationsCube;
dbnary:wiktionaryDumpVersion ?version;
dbnary:observationLanguage ?lg;
dbnary:nymRelation ?relation;
dbnary:count ?count.
}
The query returns something like:
version      lg    relation            count
"20210601"   "id"  antonym             4
"20210601"   "id"  approximateSynonym  0
"20210601"   "id"  hypernym            0
"20210601"   "id"  synonym             108
"20150602"   "id"  antonym             2
"20150602"   "id"  approximateSynonym  0
"20150602"   "id"  hypernym            0
"20150602"   "id"  synonym             36
"20150702"   "id"  antonym             2
"20150702"   "id"  approximateSynonym  0
"20150702"   "id"  hypernym            0
"20150702"   "id"  synonym             36
I'd like to pivot on the value of the relation to get the following table:
version      lg    antonym  approximateSynonym  hypernym  synonym
"20210601"   "id"  4        0                   0         108
"20150602"   "id"  2        0                   0         36
"20150702"   "id"  2        0                   0         36
I could not find a way to craft a single SPARQL query to get this. Currently, I have to fetch all the data and perform the pivot in whatever client language I use (here, Python).
Is this possible in SPARQL 1.1? If so, how?
I'd rather have a general answer, but the endpoint is currently served by Virtuoso.
Edit: To better explain my expectation: in the DataCube Vocabulary, the structure of a DataCube is described in a way that gives the different dimensions and measures (usually via the ontology). Hence, the dimensions and measures can be considered known to the query developer (at least for a specific version of the ontology).
Here, the values of nymRelation are not known in advance; they are part of the data, not of the structure. The pivot operation seems to be a valid operation on a DataCube (along with slicing, projecting, etc.).
I would like to know if such an operation could be done on the server (through a generic query that does not depend on the actual data on the server). This would make it possible for a client to maintain a lazy datacube object and postpone the actual pivot operation until the results are actually needed.
I suspect (and the first answers seem to imply) that this operation is not possible without either fetching the entire DataCube (to perform the operation in memory on the client side) or fetching the actual distinct property values and automatically crafting a query that depends on this first result.
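For concreteness, that second workaround (fetch the distinct values, then craft the pivot query) can be scripted in a few lines. Below is a minimal, untested sketch using SPARQLWrapper, assuming the endpoint supports SPARQL 1.1 IF and SUM in aggregates; the PREFIX declarations are left as a placeholder for the ones the queries above rely on:

import re
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://kaiko.getalp.org/sparql"
PREFIXES = "..."  # the same PREFIX declarations the queries above rely on

def select(query):
    client = SPARQLWrapper(ENDPOINT)
    client.setQuery(PREFIXES + query)
    client.setReturnFormat(JSON)
    return client.query().convert()["results"]["bindings"]

# Step 1: fetch the distinct relation values (data, not structure).
relations = [b["relation"]["value"] for b in select("""
    SELECT DISTINCT ?relation WHERE {
      ?o a qb:Observation ;
         qb:dataSet dbnstats:dbnaryNymRelationsCube ;
         dbnary:nymRelation ?relation .
    }""")]

# Step 2: craft one conditional aggregate per relation and pivot.
def var_name(uri):
    # Use the local name of the URI as a SPARQL variable name.
    return re.sub(r"\W", "_", re.split(r"[/#]", uri)[-1])

aggs = " ".join(
    "(SUM(IF(?relation = <%s>, ?count, 0)) AS ?%s)" % (uri, var_name(uri))
    for uri in relations)

pivot = select("""
    SELECT ?version ?lg %s WHERE {
      ?o a qb:Observation ;
         qb:dataSet dbnstats:dbnaryNymRelationsCube ;
         dbnary:wiktionaryDumpVersion ?version ;
         dbnary:observationLanguage ?lg ;
         dbnary:nymRelation ?relation ;
         dbnary:count ?count .
    }
    GROUP BY ?version ?lg""" % aggs)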

You need to combine values from distinct observations. If hard-coding the relation names in the queries is not too impractical, you can write separate SELECT statements that bind a common value for ?version and ?lg to pull the counts into a single solution, like this:
SELECT ?version ?lg ?antonym ?approximateSynonym # ...
WHERE {
  {
    SELECT ?version ?lg ?antonym
    WHERE {
      ?o1 a qb:Observation ;
          qb:dataSet dbnstats:dbnaryNymRelationsCube ;
          dbnary:wiktionaryDumpVersion ?version ;
          dbnary:observationLanguage ?lg ;
          dbnary:nymRelation dbnary:antonym ;
          dbnary:count ?antonym . # <--- bind the antonym count value
    }
  }
  {
    SELECT ?version ?lg ?approximateSynonym
    WHERE {
      ?o2 a qb:Observation ;
          qb:dataSet dbnstats:dbnaryNymRelationsCube ;
          dbnary:wiktionaryDumpVersion ?version ;
          dbnary:observationLanguage ?lg ;
          dbnary:nymRelation dbnary:approximateSynonym ;
          dbnary:count ?approximateSynonym . # <--- bind the approximateSynonym count
    }
  }
  # ... And so on for the other columns
}
This requires that all statistics be present for each version/language combination; otherwise there will be no solution for that combination. (Wrapping all but the first subselect in OPTIONAL would relax this, at the cost of unbound values for the missing columns.)
Alternative
If there are too many relation types, you can use the following CONSTRUCT query to aggregate the equivalent of each row into its own observation-like object. The different properties will all be attached to the same ?rowURI. You can parse this result as RDF, or just deal with a JSON serialization if you prefer.
CONSTRUCT {
  ?rowURI dbnary:wiktionaryDumpVersion ?version ;
          dbnary:observationLanguage ?lg ;
          ?relation ?count .
}
WHERE {
  ?o a qb:Observation ;
     qb:dataSet dbnstats:dbnaryNymRelationsCube ;
     dbnary:wiktionaryDumpVersion ?version ;
     dbnary:observationLanguage ?lg ;
     dbnary:nymRelation ?relation ;
     dbnary:count ?count .
  BIND(URI(CONCAT("http://example.org/row/", ?lg, ?version)) AS ?rowURI)
}
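Client-side, regrouping the CONSTRUCT result into rows is then straightforward. A hedged sketch with rdflib (the file name is hypothetical; the graph could equally come from SPARQLWrapper's convert(); the column names are the local names of the dbnary properties):

import re
from collections import defaultdict
from rdflib import Graph

g = Graph()
g.parse("pivot.ttl")  # hypothetical file holding the CONSTRUCT result

rows = defaultdict(dict)
for s, p, o in g:
    # Column name = local name of the predicate (after the last / or #).
    rows[str(s)][re.split(r"[/#]", str(p))[-1]] = str(o)

for uri, row in sorted(rows.items()):
    print(row.get("wiktionaryDumpVersion"), row.get("observationLanguage"),
          row.get("antonym"), row.get("synonym"))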

Related

Get a variable number of columns for output in SPARQL

Is there a way to get a variable number of columns for a given predicate? Essentially, I want to turn this:
title note
A. 1
A. 2
A. 3
B. 4
B. 5
into
title note1 note2 note3
A. 1 2 3
B. 4 5 null
Can I set the number of columns created to the maximum number of "notes" in the query, or something like that? Thanks.
There are several ways you can approach this. One way is to change your query. Now, in the general case it is not possible to do a SELECT query that does exactly what you want. However, if you happen to know in advance what the maximum number of notes per title is, you can sort of do this.
Supposing your original query was something like this:
SELECT ?title ?note
WHERE { ?title :hasNote ?note }
And supposing you know titles have at most 3 notes, you could probably (untested) do something like this:
SELECT ?title ?note1 ?note2 ?note3
WHERE {
?title :hasNote ?note1 .
OPTIONAL { ?title :hasNote ?note2 . FILTER (?note2 != ?note1) }
OPTIONAL { ?title :hasNote ?note3 . FILTER (?note3 != ?note1 && ?note3 != ?note2) }
}
As you can see this is not a very nice solution though: it doesn't scale and is probably very inefficient to process as well.
Alternatives are various forms of post-processing. To make post-processing simpler, you could use an aggregate operator to at least get all notes for a single item on a single line:
SELECT ?title (GROUP_CONCAT(?note) as ?notes)
WHERE { ?title :hasNote ?note }
GROUP BY ?title
result:
title notes
A. "1 2 3"
B. "4 5"
You could then post-process the values of the ?notes variable to split them into the separate notes again.
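For example, in Python (the rows list stands in for the result table above; note that GROUP_CONCAT's default separator is a single space, so pass an explicit SEPARATOR in the query if notes may contain spaces):

# Hypothetical (title, notes) pairs mirroring the result table above.
rows = [("A.", "1 2 3"), ("B.", "4 5")]

# Pad every row to the widest note count so the columns line up.
width = max(len(notes.split()) for _, notes in rows)
for title, notes in rows:
    cells = notes.split() + [None] * width
    print(title, *cells[:width])   # A. 1 2 3  /  B. 4 5 None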
Another solution is that instead of using a SELECT query, you use a CONSTRUCT query to give you back an RDF graph, rather than a table, and work directly with that in your code. Tables are kinda weird in an RDF world if you think about it: you're querying a graph model, why is the query result not a graph but a table?
CONSTRUCT
WHERE { ?title :hasNote ?note }
...and then process the result in whatever API you're using to do the queries.

Maintaining auto ranking as a column in MongoDB

I am using MongoDB as my database.
I have data which contains rank and name as columns. A new row can be inserted with a rank that is different from the ranks already existing, or the same as one of them.
If it is the same, then the ranks of the other rows must be adjusted.
Rows having a rank number greater than or equal to the inserted one must be incremented by one, and the remaining rows keep their ranks as they are.
The feature is something like a numbered list in MS Word-type applications, where inserting a row in between adjusts the numbering of the rows below it.
Rank 1 is the highest rank.
For example, say there are 3 rows:
Name Rank
A 1
B 2
C 3
Now I want to insert a row with D as name and 2 as rank. After the insert, the DB should look like below:
Name Rank
A 1
B 3
C 4
D 2
I can probably achieve this using database triggers to update the other rows.
I have a couple of questions:
(a) Is there any better way than using a database trigger for this kind of scenario? Updating all the rows might be a time-consuming job.
(b) Does MongoDB support database triggers natively?
No, MongoDB does not provide triggers (yet). Also, I don't think a trigger is really a great way to achieve this.
So I would just like to throw out some ideas; see if they make sense.
Approach 1
Maybe instead of disturbing that many documents, you can create a collection with only one document (let's call the collection ranking). In that document, have an array field called ranks. Since it's an array, it already maintains a sequence.
{
_id : "RANK",
"ranks" : ["A","B","C"]
}
Now if you want to add D to this ranking at the 2nd position:
db.ranking.update({_id:"RANK"},{$push : {"ranks":{$each : ["D"],$position:1}}});
it would add D at index 1, which is the 2nd position considering that indexes start at 0:
{
_id : "RANK",
"ranks" : ["A","D","B","C"]
}
But there is a catch: what if you want to move C from the 4th position to the 1st? You need to remove it from the end and put it at the beginning. I am fairly sure both operations can't be achieved in a single update (I didn't dig into the options much), so we can run two queries:
db.ranking.update({_id:"RANK"},{$pull : {"ranks": "C"}});
db.ranking.update({_id:"RANK"},{$push : {"ranks":{$each : ["C"],$position:0}}});
Then it would look like:
{
_id : "RANK",
"ranks" : ["C","A","D","B"]
}
maintaining the rest of the sequence.
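For what it's worth, the same two-step move from a Python client could look like this with pymongo (a sketch; the database and collection names are made up, and the two updates are not atomic, so a concurrent reader could briefly see C missing):

from pymongo import MongoClient

# Hypothetical connection; "mydb" and "ranking" are assumed names.
ranking = MongoClient()["mydb"]["ranking"]

# Move "C" to the front: pull it out, then push it at position 0.
ranking.update_one({"_id": "RANK"}, {"$pull": {"ranks": "C"}})
ranking.update_one(
    {"_id": "RANK"},
    {"$push": {"ranks": {"$each": ["C"], "$position": 0}}})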
Now you would probably want to store ids instead of A, B, C, etc. One document can be 16MB, so this ranks array can store more than 1.3 million id entries if the id is a 12-byte MongoDB ObjectId. If that is not enough, there is still the option of follow-up document(s) with further rankings.
Approach 2
You can also, instead of having rank as a number, just have two fields like followedBy and precededBy.
Your user documents would then look like:
{
_id:"A"
"followedBy":"B",
}
{
_id:"B"
"followedBy":"C",
"precededBy":"A"
}
{
_id:"c"
"precededBy":"B",
}
If you want to add D at the second position, you need to update the documents on either side of the insertion point and insert the new one, so only two existing documents change:
{
_id:"A"
"followedBy":"D", //changed from B to D
}
{
_id:"B"
"followedBy":"C",
"precededBy":"D" //changed from A to D
}
{
_id:"C"
"precededBy":"B",
}
{
_id:"D"
"followedBy":"B",
"precededBy":"A"
}
The downside of this approach is that you cannot sort by ranking in a query; you have to fetch all the documents into the application and rebuild a linked-list-like structure.
This approach just preserves the ranking with minimal DB changes.
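Rebuilding the order client-side is a short walk over the documents; a small sketch using the example documents above:

def ordered_ids(docs):
    """Walk the followedBy chain starting from the single head document."""
    by_id = {d["_id"]: d for d in docs}
    head = next(d for d in docs if "precededBy" not in d)  # no predecessor
    order, current = [], head
    while current is not None:
        order.append(current["_id"])
        current = by_id.get(current.get("followedBy"))
    return order

docs = [
    {"_id": "A", "followedBy": "D"},
    {"_id": "B", "followedBy": "C", "precededBy": "D"},
    {"_id": "C", "precededBy": "B"},
    {"_id": "D", "followedBy": "B", "precededBy": "A"},
]
print(ordered_ids(docs))  # ['A', 'D', 'B', 'C']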

SQL dynamic WHERE clause DataSet

I have SQL queries with WHERE clauses that have to exclude rows based on lists of values for some columns; these lists may be hard-coded (supplied by the users) or constructed from another SELECT query.
The hard-coded lists may also be updated by the users, so I need to update the list in the query every time, which is inconvenient.
I am wondering about the best way to parameterize these lists.
Example of a WHERE clause:
WHERE
Article_Code not in ('PA_003','PA_003','PE_234','FR_980','FA_333','FC_001','TA_999','FC_212','DC_009','FF_333','PR_001')
AND
((Partner_Status != 'Radied') or (Partner_Status = 'Radied' and Partner_Code in ('PR_000453','PR_0004311T','PR_V3345','PR_004D55') ))
AND
(Case_Code not in (select Case_Code from Agreement where DDR = 3))
One thought is to build a parameter table with this structure: (ExclusionCode - Column - ColumnMemberToExclude - ExclusionDescription), where:
ExclusionCode is an internal code that I generate to identify the reason for the exclusion.
Column is the column to use in the WHERE clause (ex: Article_Code).
ColumnMemberToExclude is the member to exclude in the WHERE clause (ex: PA_003).
ExclusionDescription is a functional description (ex: exclude the list of Porsche products).
I would then construct the WHERE clause as a string from this table.
Is this the best way to do it?
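To make the string-building idea concrete, here is a hedged Python sketch (DB-API style; the table and column names are the ones proposed above, and a column literally named Column would need quoting in most SQL dialects). The excluded values travel as bind parameters; only the column names, which come from your own trusted parameter table, are interpolated:

from collections import defaultdict

def build_exclusion_where(cursor):
    # Read the proposed parameter table; "Column" is quoted because it
    # is a reserved word in most SQL dialects.
    cursor.execute(
        'SELECT "Column", ColumnMemberToExclude FROM ExclusionParameters')
    by_column = defaultdict(list)
    for column, member in cursor.fetchall():
        by_column[column].append(member)

    clauses, params = [], []
    for column, members in by_column.items():
        placeholders = ", ".join(["%s"] * len(members))
        clauses.append("%s NOT IN (%s)" % (column, placeholders))
        params.extend(members)
    return " AND ".join(clauses), params

# Usage (hypothetical base query):
# where, params = build_exclusion_where(cursor)
# cursor.execute("SELECT * FROM Orders WHERE " + where, params)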

Search efficiently for records matching given set of properties/attributes and their values (exact match, less than, greater than)

It is a fairly simple problem to describe. However, I could not come up with any reasonable solution, so the solution may or may not be easy to cook up. Here is the problem:
Let there be many records describing some objects. For example:
{
id : 1,
kind : cat,
weight : 25 lb,
color : red,
age : 10,
fluffiness : 98,
attitude : grumpy
}
{
id : 2,
kind : robot,
chassis : aluminum,
year : 2015,
hardware : intel curie,
battery : 5000,
bat-life : 168,
weight : 0.5 lb
}
{
id : 3,
kind : lightsaber,
color : red,
type : single blade,
power : 1000,
weight : 25 lb,
creator : Darth Vader
}
Attributes are not pre-specified so an object could be described using any attribute-value pairs.
If there are 1 000 000 records/objects there could easily be 100 000 different attributes.
My goal is to search efficiently through the data structure(s) that contain all the records and, if possible, to quickly answer which records match the given conditions.
For example, a search query could be: find all cats that weigh more than 20, are older than 9, are fluffier than 98, are red, and whose attitude is "grumpy".
We can assume that there could be infinite number of records and infinite number of attributes but any search query contains no more than 20 numerical (lt,gt) clauses.
One possible implementation using SQL/MySQL I could think of was using fulltext indexes.
For example, I could store non-numeric attributes as "kind_cat color_red attitude_grumpy", search through them to narrow the result set, and then scan a table containing the numeric attributes for matches. It seems, however (I am not sure at this point), that gt/lt searches might be costly in general with this strategy (I would have to do at least N joins for N numerical clauses).
I thought of MongoDB while thinking about the problem, but although MongoDB naturally allows me to store key-value pairs, searching by some fields (not all) means that I would have to create indexes that contain all keys in all possible orders/permutations (and this is impossible).
Can this be done efficiently (maybe in logarithmic time?) using MySQL or any other DBMS? If not, is there a data structure (maybe some multi-dimensional tree?) and an algorithm that allow executing this kind of search efficiently on a large scale (considering both time and space complexity)?
If it isn't possible to solve the problem as defined, are there any heuristic approaches that come close without sacrificing too much?
If I get it right, you're thinking of something like:
create table t
( id int not null
, kind varchar(...) not null
, key varchar(...) not null
, val varchar(...) not null
, primary key (id, kind, key) );
There are several problems with this approach; you can google for EAV to find out more. One example is that you will have to cast val to the appropriate type when doing comparisons ( '2' > '10' ).
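The same pitfall is easy to demonstrate in Python, where string comparison is also lexicographic:

print('2' > '10')            # True  -- character-by-character comparison
print(int('2') > int('10'))  # False -- the numeric answer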
That said, an index like:
create unique index ix1 on t (kind, key, val, id)
will reduce the pain you will be suffering slightly, but the design won't scale well, and with 1E6 rows and 1E5 attributes the performance will be far from good. Your example query would look something like:
select a.id
from ( select id
       from ( select id, val
              from t
              where kind = 'cat'
                and key = 'weight' ) as w
       where cast(val as int) > 20
     ) as a
join ( select id
       from ( select id, val
              from t
              where kind = 'cat'
                and key = 'age' ) as g
       where cast(val as int) > 9
     ) as b
  on a.id = b.id
join ( select id
       from ( select id, val
              from t
              where kind = 'cat'
                and key = 'fluffy' ) as f
       where cast(val as int) > 98
     ) as c
  on a.id = c.id
join ...

How to form SPARQL queries that refer to multiple resources

My question is a follow-up to my first question about SPARQL here.
My SPARQL query results for Mountain objects are here.
From those results I picked a certain object resource.
Now I want to get values of "is dbpedia-owl:highestPlace of" records for this chosen Mountain object.
That is, names of mountain ranges for which this mountain is the highest place.
This is, as I figure, complex, not only because I do not know the required syntax, but also because I get two objects here.
One of them is Mont Blanc Massif, which is of type "place".
The other one is Western Alps, which is of type "mountain range" - my desired record.
I need record 2 above but not record 1. I know record 1 is also relevant, but it doesn't always follow the same pattern. Sometimes the records appear to be of a YAGO type, which can be totally misleading. To be safe, I simply want to discard those records whenever there is a type mismatch.
How can I form my SPARQL query to get these "is dbpedia-owl:highestPlace of" records and also have the type filtering?
You can use this query; note, however, that Mont_Blanc_massif in your example is both a dbpedia-owl:Place and a dbpedia-owl:MountainRange:
select * where {
?place dbpedia-owl:highestPlace :Mont_Blanc.
?place rdf:type dbpedia-owl:MountainRange.
}
Edit after comment: filtering
It is not really clear what you want to filter (YAGO?); technically you can filter, for example, like this:
select * where {
  ?place dbpedia-owl:highestPlace :Mont_Blanc .
  ?place rdf:type dbpedia-owl:MountainRange .
  FILTER NOT EXISTS {
    ?place ?pred ?obj
    FILTER (regex(str(?obj), "yago"))
  }
}
This filters out results that have any object with 'yago' in its URI.
Extending the result from the previous answer, the appropriate query would be
select * where {
?mountain a dbpedia-owl:Mountain ;
dbpedia-owl:abstract ?abstract ;
foaf:depiction ?depiction .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
FILTER(langMatches(lang(?abstract),"EN"))
}
LIMIT 10
This selects mountains with English abstracts that have at least one depiction (or else the pattern wouldn't match) and for which there is some mountain range of which the mountain is the highest place. Without the parts from the earlier question, if you just want to retrieve mountains that are the highest place of a range, you can use a query like this:
select * where {
?mountain a dbpedia-owl:Mountain .
?range a dbpedia-owl:MountainRange ;
dbpedia-owl:highestPlace ?mountain .
}
LIMIT 10