SPARQL - Count occurrence of a substring in object

I'm fairly new to Linked Data and SPARQL, but I understand the concept and some of the querying since I have knowledge of SQL. Using some example data from rdfdata.org, I managed to set up a GraphDB instance with an Elvis impersonator repo.
Using some basic queries like
SELECT * WHERE {?s ?p ?o} and filtering on object values, I was able to display some basic data in tables. I have experience with regular expressions, so I decided to use them in SPARQL to count the occurrences of Elvis within the object. However, whatever I do, I cannot get the count above one.
This is a problem, as I have triples that contain a form of Elvis more than once:
s: http://www.gigmasters.com/elvis/bobjames/
p: ep:influences
o: Elvis Elvis Elvis! I also do a Neil Diamond tribute as well, and have
been a DJ, MC, and musician for many years.
As you can see, there are three occurrences of Elvis, but they are counted as only 1.
Here is the SPARQL query used to select the triple and to count the occurrences:
SELECT ?s ?p ?o (count(regex( ?o ,"[Ee]lvis")) as ?count)
WHERE {
?s ?p ?o.
filter(regex( ?o ,"([Ee]lvis.){3}")) # only return the triple above
}
GROUP BY ?s ?p ?o
Why are these occurrences not counted? I tried using str(?o), but since the object is a string literal to begin with, that should not matter.
Expected result:
A table with 4 columns: | ?s | ?p | ?o | count |,
where count should be "3"^^xsd:integer

You can accomplish this by taking the input string (e.g., "A B A C") and replacing the occurrences of the target (e.g., "A") with the empty string ("") to get an updated string (e.g., " B  C"). Then compute the difference between the length of the input string and the length of the updated string. Divide that by the length of the target, and that's how many times the target appears in the input. For instance:
@prefix : <urn:ex:> .
:a :hasString "I like Elvis." .
:b :hasString "Elvis's name was Elvis." .
:c :hasString "Not mentioned here" .
:d :hasString "daybydaybyday" .
prefix : <urn:ex:>
select ?x ?s ?t ?count where {
values ?t { "Elvis" "daybyday" }
?x :hasString ?s .
bind(((strlen(?s) - strlen(replace(?s, ?t, ""))) / strlen(?t)) as ?count)
}
-------------------------------------------------------
| x | s | t | count |
=======================================================
| :a | "I like Elvis." | "Elvis" | 1.0 |
| :b | "Elvis's name was Elvis." | "Elvis" | 2.0 |
| :c | "Not mentioned here" | "Elvis" | 0.0 |
| :d | "daybydaybyday" | "Elvis" | 0.0 |
| :a | "I like Elvis." | "daybyday" | 0.0 |
| :b | "Elvis's name was Elvis." | "daybyday" | 0.0 |
| :c | "Not mentioned here" | "daybyday" | 0.0 |
| :d | "daybydaybyday" | "daybyday" | 1.0 |
-------------------------------------------------------
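The same arithmetic can be checked outside the triplestore. Below is a minimal Python sketch of the length-difference idea (the function name is made up for illustration). One difference to keep in mind: Python's str.replace treats the target literally, whereas SPARQL's replace() treats it as a regular expression. The sketch uses true division so its results mirror the decimal values in the table above (in SPARQL, dividing two integers yields an xsd:decimal, hence 1.0 rather than 1).

```python
def count_by_length_difference(text: str, target: str) -> float:
    """Count occurrences of `target` in `text` by measuring how much shorter
    the text gets once every occurrence is removed."""
    if not target:
        raise ValueError("target must be a non-empty string")
    # str.replace removes non-overlapping occurrences, scanning left to right
    without_target = text.replace(target, "")
    removed_chars = len(text) - len(without_target)
    # Each removed occurrence accounts for exactly len(target) characters
    return removed_chars / len(target)


samples = {
    ":a": "I like Elvis.",
    ":b": "Elvis's name was Elvis.",
    ":c": "Not mentioned here",
    ":d": "daybydaybyday",
}

if __name__ == "__main__":
    for target in ("Elvis", "daybyday"):
        for name, text in samples.items():
            print(name, repr(text), repr(target), count_by_length_difference(text, target))
```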
There are a couple of caveats here.
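The target string must be a plain (literal) string. For instance, if it were a genuine regular expression pattern that could match text of varying length, this method would not work. Also keep in mind that replace() interprets its second argument as a regular expression, so a target containing characters such as . or ( would need to be escaped.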
You need to be aware of how this handles overlapping strings. For instance, if your input text is "daybydaybyday" and the target is "daybyday", are you expecting to count one occurrence or two? With this method, you'll just get one, because once one occurrence is replaced, the remaining string doesn't have any more.
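To see the overlap caveat concretely, here is a small Python comparison (not SPARQL): counting non-overlapping matches, which is effectively what the replace-based query does, versus counting overlapping matches with a zero-width lookahead. I am not aware of a lookahead construct in the regular-expression dialect SPARQL uses, so treat the second function as an illustration of the alternative interpretation only.

```python
import re


def count_non_overlapping(text: str, target: str) -> int:
    # str.count, like the replace-based trick, resumes scanning after each match
    return text.count(target)


def count_overlapping(text: str, target: str) -> int:
    # A zero-width lookahead can match at every start position, so occurrences
    # that share characters are all counted
    return len(re.findall(f"(?={re.escape(target)})", text))


if __name__ == "__main__":
    text, target = "daybydaybyday", "daybyday"
    print(count_non_overlapping(text, target), count_overlapping(text, target))
```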

SPARQL COUNT counts the number of solutions (bindings) in each group or, simply put, the number of matching rows. There is only one object that matches the REGEX, and thus only one row.
Unfortunately, SPARQL doesn't have any concept of "explode" to create multiple rows out of a single row (or at least, I'm not aware of one).
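The row-versus-match distinction can be illustrated in a few lines of Python. In this sketch, counting per (s, p, o) group mirrors COUNT with GROUP BY ?s ?p ?o and yields 1, while first "exploding" the row into one row per regex match (the operation described above as missing from SPARQL) yields 3. The triple is the one from the question; the prefixed predicate is kept as a plain string.

```python
import re
from collections import Counter

# The triple from the question, with the predicate kept as a prefixed-name string
triples = [
    (
        "http://www.gigmasters.com/elvis/bobjames/",
        "ep:influences",
        "Elvis Elvis Elvis! I also do a Neil Diamond tribute as well, "
        "and have been a DJ, MC, and musician for many years.",
    )
]

# Like COUNT(...) with GROUP BY ?s ?p ?o: one row per triple, so each group counts 1
rows_per_group = Counter(triples)

# "Explode": emit one row per regex match, then count rows per group
exploded = [t for t in triples for _ in re.finditer(r"[Ee]lvis", t[2])]
matches_per_group = Counter(exploded)

if __name__ == "__main__":
    for triple in triples:
        print(rows_per_group[triple], matches_per_group[triple])
```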
As a workaround, I wrote a SPARQL query with REGEX and string hacks. The idea is to
replace each occurrence of Elvis with a special character that hopefully doesn't occur in the data. I've chosen Å here for demonstration.
delete all other characters in the text
compute the length of the remaining string
Query
PREFIX ep: <http://www.snee.com/ns/ep>
SELECT ?s ?p ?o ?cnt
WHERE {
?s ?p ?o.
filter(regex( str(?o) ,"[Ee]lvis"))
bind(
strlen(
replace(
replace(str(?o), "[Ee]lvis", "Å")
, "[^Å]", ""
)
) as ?cnt)
}
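The three steps can be mirrored in Python to check the logic before running the query. This is a sketch, not the query itself; the function name and defaults are illustrative. It uses the pattern [Ee]lvis without a trailing ".": with a trailing ".", as in the question's pattern, an Elvis at the very end of a literal would not be matched. The last check shows the main risk of the trick, namely that a marker character already present in the text inflates the count.

```python
import re


def count_via_marker(text: str, pattern: str = r"[Ee]lvis", marker: str = "Å") -> int:
    """Replace each match with a marker character, delete everything else,
    and return the length of what remains."""
    if len(marker) != 1:
        raise ValueError("marker must be a single character")
    # Step 1: every occurrence of the pattern becomes one marker character
    marked = re.sub(pattern, marker, text)
    # Step 2: delete every character that is not the marker
    only_markers = re.sub(f"[^{re.escape(marker)}]", "", marked)
    # Step 3: the remaining length is the number of occurrences
    return len(only_markers)


if __name__ == "__main__":
    print(count_via_marker("Elvis Elvis Elvis! I also do a Neil Diamond tribute as well"))
    print(count_via_marker("Å Elvis"))  # pre-existing marker is counted too
```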
Output (sample)
+-----------------------------------------------+-------------------------------------+--------------------------------------------------+------+
| s | p | o | cnt |
+-----------------------------------------------+-------------------------------------+--------------------------------------------------+------+
| http://www.gigmasters.com/elvis/ChuckBaril/ | http://www.snee.com/ns/epinfluences | Elvis, Donny Osmond, Barry Manilow, Pebo Bryson, James Ingram, George Benson, and George Strait | 1 |
| http://www.gigmasters.com/elvis/DukeHicks/ | dc:description | Been performing Elvis tribute shows for 10 yrs. Having been in the music business for twenty years Duke knows how to please the audience. Duke started doing his tribute shows after several request from the audience members to do more and more of Elvis' songs and a request for him to do an Elvis Tribute Show. Duke has been asked several times if he is lip-syching to Elvis' songs and the answer is absolutely NO. The sound and stage presence is so close to 'The King' that it has startled many. | 4 |
| http://www.gigmasters.com/elvis/DukeHicks/ | http://www.snee.com/ns/epcategory | Elvis Impersonator, Tribute Band | 1 |
| http://www.gigmasters.com/elvis/DukeHicks/ | http://www.snee.com/ns/epinfluences | Elvis Presley | 1 |
| http://www.gigmasters.com/elvis/ElvisByDano/ | dc:description | For a great time at your next event, how about ELVIS by Dano? His main goal is to provide a show that reflects the raw energy, passion, and humor that The King once shared with us. Dano, being a huge Elvis fan since his eleventh year, has loved singing along with The Man his entire adult life. He started to impersonate Elvis in public about 1995, and his first long solo performance, with a full set of songs, was at a church social in 2002. Dano was also a seven year member of a classic rock band and often contributed an Elvis act that audiences always truly enjoyed. Starting in February, 2004 he has performed in many solo shows for benefits, auctions, various parties , a Theme Park, as well as much time donated to entertain the elderly. He uses quality audio equipment with great sounding background tracks. Longer travel distances will be considered. Contact Dano today if you want your next party 'all shook up'!!! | 3 |
+-----------------------------------------------+-------------------------------------+--------------------------------------------------+------+

Related

Jq strange behavior of pipe involving variable on the left hand side

I came across a weird behavior of jq involving a variable on the left hand side of a pipe.
This question was inspired by the jq manual: the Scoping section (https://stedolan.github.io/jq/manual/#Advancedfeatures) gives the example filter ... | .*3 as $times_three | [. + $times_three] | ... . I believe the correct version is ... | (.*3) as $times_three | [. + $times_three] | ... .
First (https://jqplay.org/s/ffMPsqmsmt)
filter:
. * 3 as $times_three | .
input:
3
output:
9
Second (https://jqplay.org/s/yOFcjRAMLL)
filter:
. * 4 as $times_four | .
input:
3
output:
9
What is happening here?
But (https://jqplay.org/s/IKrTNZjKI8)
filter:
(. * 3) as $times_three | .
input:
3
output:
3
And (https://jqplay.org/s/8zoq2-HN1G)
filter:
(. * 4) as $times_four | .
input:
3
output:
3
So if parentheses are used when the variable is declared ((.*3) or (.*4)), the filter behaves predictably.
But if parentheses are not used (.*3 or .*4), the output is, strangely, 9 in both cases.
Can you explain?
Contrary to what the examples in the Scoping section assume, . * 4 as $times_four | . is equivalent to . * ( 4 as $times_four | . ) and therefore squares its input.
You might expect
. * 4 as $times_four | .
to be equivalent to
( . * 4 ) as $times_four | .
And as you point out, some examples even suggest this is the case. However, the first snippet is actually equivalent to the following:
. * ( 4 as $times_four | . )
And since … as $x produces its context[1], that's the same as
. * ( . | . )
or
. * .
jq's operator precedence is inconsistent and/or quirky.
"def" | "abc" + "def" | length means "def" | ( "abc" + "def" ) | length, but "def" | "abc" + "def" as $x | length means "def" | "abc" + ( "def" as $x | length ).
This behaviour suggests that as isn't a binary operator of the form X as $Y, as one might expect, but a ternary operator of the form X as $Y | Z.
And, in fact, this is how it's documented:
Variable / Symbolic Binding Operator: ... as $identifier | ...
This leads to surprises, especially since it binds much more tightly than expected. And it looks like whoever authored the examples in the Scoping section fell into the trap.
[1] It might produce it multiple times, e.g. .[] as $x.
Indeed, there seems to be a mistake in the manual. The Scoping section contrasts the (faulty) examples
... | .*3 as $times_three | [. + $times_three] | ... # faulty!
and
... | (.*3 as $times_three | [. + $times_three]) | ... # faulty!
While the overall statement stays valid, both examples are missing additional parentheses around .*3. Thus, it should actually read
... | (.*3) as $times_three | [. + $times_three] | ...
and
... | ((.*3) as $times_three | [. + $times_three]) | ...
respectively.
From the manual under section Variable / Symbolic Binding Operator:
The expression exp as $x | ... means: for each value of expression exp, run the rest of the pipeline with the entire original input, and with $x set to that value. Thus as functions as something of a foreach loop.
This means that a variable binding evaluates the one expression to the left of as and binds its value to the variable named on the right of as (and this happens as many times as exp produces an output). But since everything in jq is a filter, the binding itself is one too, and as such it needs to produce output of its own. If you look closely, the full title of that section
Variable / Symbolic Binding Operator: ... as $identifier | ...
also features a pipe symbol next to it, which indicates that it belongs to the assignment's structure. Try just running . as $x. You will get an error because the | ... part is missing. Thus, to simply keep the input context as is (apart from maybe duplicating it as many times as the expression left of as produced an output), a complete assignment would rather look like … as $x | ., or, if the input context is what you wanted to capture in the variable, . as $x | .
That said, let's clarify what happens with your examples by putting explicit parentheses around the assignments:
3 | . * 3 as $times_three | .
3 | . * (3 as $times_three | .)
3 | . * . # with $times_three set to 3
3 * 3 # with $times_three set to 3
9 # with $times_three set to 3
3 | . * 4 as $times_four | .
3 | . * (4 as $times_four | .)
3 | . * . # with $times_four set to 4
3 * 3 # with $times_four set to 4
9 # with $times_four set to 4
3 | (. * 3) as $times_three | .
3 | ((. * 3) as $times_three | .)
3 | ((3 * 3) as $times_three | .)
3 | (9 as $times_three | .)
3 | . # with $times_three set to 9
3 # with $times_three set to 9
3 | (. * 4) as $times_four | .
3 | ((. * 4) as $times_four | .)
3 | ((3 * 4) as $times_four | .)
3 | (12 as $times_four | .)
3 | . # with $times_four set to 12
3 # with $times_four set to 12
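The traces above can be mimicked with a small Python model of jq's evaluation. This is a toy written for illustration, not jq's implementation: a filter is modelled as a function from one input to a list of outputs, and the hypothetical helper bind_as encodes the manual's reading of exp as $v | body as a foreach loop over the outputs of exp. Building the two parses explicitly reproduces the 9 versus 3 results, and the last definition reproduces the footnote's point that .[] as $x can emit its input several times.

```python
from typing import Any, Callable, List

# Toy model (not jq itself): a filter maps one input to a list of outputs.
Filter = Callable[[Any], List[Any]]


def identity(x: Any) -> List[Any]:
    """jq's `.`"""
    return [x]


def literal(value: Any) -> Filter:
    """A constant such as `4`: ignores its input."""
    return lambda _x: [value]


def iterate(x: Any) -> List[Any]:
    """jq's `.[]` on an array."""
    return list(x)


def times(left: Filter, right: Filter) -> Filter:
    """`left * right`: both sides see the same input and every pair of outputs
    is multiplied (output order for multiple outputs is not modelled faithfully)."""
    return lambda x: [a * b for a in left(x) for b in right(x)]


def bind_as(exp: Filter, body: Callable[[Any], Filter]) -> Filter:
    """`exp as $v | body`: for each output v of exp, run body on the ORIGINAL
    input with $v bound to v."""
    return lambda x: [out for v in exp(x) for out in body(v)(x)]


# . * 4 as $times_four | .   parses as   . * (4 as $times_four | .)
unparenthesized = times(identity, bind_as(literal(4), lambda times_four: identity))

# (. * 4) as $times_four | .
parenthesized = bind_as(times(identity, literal(4)), lambda times_four: identity)

# .[] as $x | .   emits its input once per element
per_element = bind_as(iterate, lambda x: identity)

if __name__ == "__main__":
    print(unparenthesized(3), parenthesized(3), per_element([1, 2, 3]))
```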

Lucene Query - AND operator failing in Azure Search?

I have a search index of sandwiches. The index has three fields: id, meat, and bread. Each field is an Edm.String. In this index, here is a subset of my data:
ID | Meat | Bread
-----------------------
1 | Ham | White
2 | Turkey | Hoagie
3 | Tuna | Wheat
4 | Roast Beef | White
5 | Ham | Wheat
6 | Roast Beef | Rye
7 | Turkey | Wheat
I need to write a query that returns all ham or turkey sandwiches on wheat bread. In an attempt to do this, I've created the following:
{
"search":"(meat:(Ham|Turkey) AND bread:\"Wheat\")",
"searchMode":"all",
"select":"id,meat,bread"
}
When I run this query, I'm not seeing any results. What am I missing? What am I doing wrong? I'm trying to understand full queries. Do field-level queries support the phrase operator? I'm not sure what I'm doing wrong.
You need to use "queryType": "full" to request the Lucene syntax. See an example on MSDN.
That said, what you're trying to accomplish is easier and more efficiently done using filters. Assuming you make the relevant fields in your index filterable, you can use the following filter expression for your example: $filter=(meat eq 'Ham' or meat eq 'Turkey') and bread eq 'Wheat'. For more on filters, see this article. Hope this helps!
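As a sketch of the filter approach, the Python below assembles a JSON request body carrying the filter expression from the answer. Assumptions: the meat and bread fields are marked filterable, the body is sent as a POST search request (where the parameter is written filter rather than $filter), and the helper names are hypothetical. Values are quoted as OData string literals, in which an embedded apostrophe is doubled. Nothing is sent over the network here.

```python
import json
from typing import Iterable


def odata_literal(value: str) -> str:
    # OData string literals use single quotes; an embedded quote is doubled
    return "'" + value.replace("'", "''") + "'"


def any_of(field: str, values: Iterable[str]) -> str:
    # Produces (field eq 'a' or field eq 'b' ...)
    return "(" + " or ".join(f"{field} eq {odata_literal(v)}" for v in values) + ")"


def build_filter_body(meats: Iterable[str], bread: str) -> dict:
    return {
        "search": "*",  # match everything; the filter does the narrowing
        "filter": f"{any_of('meat', meats)} and bread eq {odata_literal(bread)}",
        "select": "id,meat,bread",
    }


if __name__ == "__main__":
    print(json.dumps(build_filter_body(["Ham", "Turkey"], "Wheat"), indent=2))
```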

Access SQL aggregate function query

I would like to make an SQL query to extract aggregate statistics from the following table:
Company | ProdX1 | ProdX2 | ... | ProdX10 | ProdY1 | ProdY2 | ... | ProdY10
ABC 5 3 ... 6 5 8 ... 12
EDF 2 NULL ... 5 NULL 1 ... 6
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
XYZ NULL 3 ... 14 7 2 ... 8
The result of the query should look something like this (other design suggestions appreciated)
Product | Average | Min | Covariance with corresponding X or Y Product
ProdX1 Avg(ProdX1) Min(ProdX1) Covar(ProdX1,ProdY1)
ProdX2 Avg(ProdX2) Min(ProdX2) Covar(ProdX2,ProdY2)
.
.
.
ProdY10 Avg(ProdY10) Min(ProdY10) Covar(ProdY10,ProdX10)
I am fine with the individual aggregate functions (and of course Covar(X1,Y1) = Covar(Y1,X1)).
However, I am not sure how to create a query that returns the desired result.
Any suggestions are much appreciated.
Thank you very much.
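To pin down what the requested result table contains, the Python sketch below computes it for the visible sample values only (rows ABC, EDF, XYZ and columns 1, 2 and 10; everything shown as "..." is left out). It is not an Access query. Two assumptions are made that the question does not settle: NULLs are skipped, as SQL aggregates such as AVG and MIN do, and covariance is the sample covariance over companies where both values are present, returned as None when fewer than two such pairs exist. Each ProdXn/ProdYn pair is treated symmetrically, since Covar(Xn,Yn) = Covar(Yn,Xn).

```python
from typing import Dict, List, Optional, Tuple

# Visible sample values from the question (None = NULL); elided columns are omitted
table: Dict[str, Dict[str, Optional[float]]] = {
    "ABC": {"ProdX1": 5, "ProdX2": 3, "ProdX10": 6, "ProdY1": 5, "ProdY2": 8, "ProdY10": 12},
    "EDF": {"ProdX1": 2, "ProdX2": None, "ProdX10": 5, "ProdY1": None, "ProdY2": 1, "ProdY10": 6},
    "XYZ": {"ProdX1": None, "ProdX2": 3, "ProdX10": 14, "ProdY1": 7, "ProdY2": 2, "ProdY10": 8},
}


def sample_covariance(pairs: List[Tuple[float, float]]) -> Optional[float]:
    if len(pairs) < 2:
        return None  # undefined with fewer than two complete pairs
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in pairs) / (n - 1)


def summarize(rows, suffixes=("1", "2", "10")):
    result = {}
    for suffix in suffixes:
        for own, other in (("ProdX", "ProdY"), ("ProdY", "ProdX")):
            col, partner = own + suffix, other + suffix
            # Skip NULLs for Avg and Min
            values = [r[col] for r in rows.values() if r[col] is not None]
            # Keep only companies where both columns are present for Covar
            pairs = [(r[col], r[partner]) for r in rows.values()
                     if r[col] is not None and r[partner] is not None]
            result[col] = {
                "avg": sum(values) / len(values) if values else None,
                "min": min(values) if values else None,
                "covar": sample_covariance(pairs),
            }
    return result


if __name__ == "__main__":
    for product, stats in summarize(table).items():
        print(product, stats)
```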

Query results based on multiple records using compound primary key

What is the best way to get records from the database based on the selected primary keys?
I am able to get the primary keys of the selected (checked) rows from a gridview, and now need to retrieve the corresponding records from the database for those selected compound primary keys.
If each row had a single-column primary key, it would be easier: I could just concatenate the key values (comma-delimited) and use them in a WHERE ... IN clause. But the rows use a composite primary key made of three columns (int, int, string).
I am thinking of using a SELECT JOIN (one SELECT for each set of key values, then joining all the SELECTs), but I'm not sure if this is the most efficient way to do it.
What is the best way to handle this?
Thanks in advance.
Niki
[Update] - I can't reply to individual comments (I might not have enough privileges yet), so I will update the post here to be clear.
Here's an example of what I am doing:
Data coming from one database table with compound key:
TABLE1
KeyCol1 INT (Primary Key),
KeyCol2 STRING (Primary Key),
KeyCol3 INT (Primary Key),
Col4 Decimal,
Col5 STRING,
Other columns . . .
PAGE 1 WITH GRIDVIEW 1:
Sel |KeyCol1 |KeyCol2 |KeyCol3 |Col4 |Col5 |Other Columns . . .
[ ] |100 |CODE1 |01 |10.05 |Description 1 |. . .
[/] |100 |CODE1 |02 | 5.03 |Description 2 |. . .
[ ] |100 |CODE2 |01 |12.45 |Description 4 |. . .
[/] |102 |CODE1 |01 |21.50 |Description 1 |. . .
[/] |102 |CODE2 |01 | 9.10 |Description 5 |. . .
[/] |102 |CODE3 |03 | 7.15 |Description 1 |. . .
. . .
(where Sel column is a checkbox, and [/] is checked)
Then after clicking a button (ex. "Cancel Records") on Page 1, I need to get those rows, to display to another page like this:
PAGE 2 WITH GRIDVIEW 2
KeyCol1 |KeyCol2 |KeyCol3 |Col4 |Other Columns |Reason
100 |CODE1 |02 | 5.03 |. . . |_______
102 |CODE1 |01 |21.50 |. . . |_______
102 |CODE2 |01 | 9.10 |. . . |_______
102 |CODE3 |03 | 7.15 |. . . |_______
(where Reason column is a textbox)
You can do this in a single query by adding an OR clause for each trio of key values (i.e., for each selected row in your gridview). For example:
SELECT whatever FROM Table1 WHERE (KeyCol1=val1-1 AND KeyCol2=val1-2 AND KeyCol3=val1-3) OR (KeyCol1=val2-1 AND KeyCol2=val2-2 AND KeyCol3=val2-3) OR .. etc
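A parameterized version of that query can be sketched with Python's standard-library sqlite3 module. SQLite stands in here for whatever database the gridview actually uses, and the table layout and rows follow the update above; the function names are hypothetical. Binding the values as parameters, rather than concatenating them into the SQL text, also takes care of quoting the string key column. Databases cap the number of bound parameters per statement, so a very large selection may need to be split into batches.

```python
import sqlite3
from typing import List, Sequence, Tuple

Key = Tuple[int, str, int]  # (KeyCol1, KeyCol2, KeyCol3)


def select_by_composite_keys(conn: sqlite3.Connection, keys: Sequence[Key]) -> List[tuple]:
    """Fetch the rows whose three-column key matches any of `keys`."""
    if not keys:
        return []  # an empty OR list would be invalid SQL
    # One parenthesized AND group per selected row, joined with OR
    group = "(KeyCol1 = ? AND KeyCol2 = ? AND KeyCol3 = ?)"
    where = " OR ".join([group] * len(keys))
    # Flatten [(a, b, c), ...] into [a, b, c, ...] to match the placeholders
    params = [value for key in keys for value in key]
    sql = (
        "SELECT KeyCol1, KeyCol2, KeyCol3, Col4, Col5 FROM Table1 "
        f"WHERE {where} ORDER BY KeyCol1, KeyCol2, KeyCol3"
    )
    return conn.execute(sql, params).fetchall()


def demo_connection() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 (KeyCol1 INTEGER, KeyCol2 TEXT, KeyCol3 INTEGER, "
        "Col4 REAL, Col5 TEXT, PRIMARY KEY (KeyCol1, KeyCol2, KeyCol3))"
    )
    conn.executemany(
        "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?)",
        [
            (100, "CODE1", 1, 10.05, "Description 1"),
            (100, "CODE1", 2, 5.03, "Description 2"),
            (100, "CODE2", 1, 12.45, "Description 4"),
            (102, "CODE1", 1, 21.50, "Description 1"),
            (102, "CODE2", 1, 9.10, "Description 5"),
            (102, "CODE3", 3, 7.15, "Description 1"),
        ],
    )
    return conn


if __name__ == "__main__":
    checked = [(100, "CODE1", 2), (102, "CODE1", 1), (102, "CODE2", 1), (102, "CODE3", 3)]
    for row in select_by_composite_keys(demo_connection(), checked):
        print(row)
```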

Extract data from one field into another in mysql

I have an old table which has a column like this
1 | McDonalds (Main Street)
2 | McDonalds (1st Ave)
3 | The Goose
4 | BurgerKing (Central Gardes)
...
I want to match the venues like ' %(%)' and then extract the content in the brackets to a second field
to result in
1 | McDonalds | Main Street
2 | McDonalds | 1st Ave
3 | The Goose | NULL
4 | BurgerKing| Central Gardes
...
How would one go about this?
MySQL provides string functions such as LOCATE() and SUBSTRING_INDEX() for finding characters and extracting substrings. You can also use control flow functions such as IF() to handle rows that have no parenthesized part.
I installed these user-defined functions
http://www.mysqludf.org/lib_mysqludf_preg/
Then I could select the "branches" via
SELECT `id`, `name`, preg_capture('/.*?\\((.*)\\)/',`name`,1) AS branch FROM `venues`
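The same split, including the NULL case for venues without parentheses, can be prototyped in Python before settling on an SQL solution. The pattern below is an assumption modelled on the preg_capture expression above, extended with a second group that captures the name; the helper name is hypothetical.

```python
import re
from typing import Optional, Tuple

# A name, optional whitespace, then a parenthesized branch at the end of the string
BRANCH = re.compile(r"^(?P<name>.*?)\s*\((?P<branch>.*)\)\s*$")


def split_venue(venue: str) -> Tuple[str, Optional[str]]:
    """Split 'Name (Branch)' into ('Name', 'Branch'); return (venue, None) otherwise."""
    match = BRANCH.match(venue)
    if match is None:
        return venue, None
    return match.group("name"), match.group("branch")


if __name__ == "__main__":
    for venue in ["McDonalds (Main Street)", "McDonalds (1st Ave)", "The Goose", "BurgerKing (Central Gardes)"]:
        print(split_venue(venue))
```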