Access SQL aggregate function query - sql

I would like to make an SQL query to extract aggregate statistics from the following table:
Company | ProdX1 | ProdX2 | ... | ProdX10 | ProdY1 | ProdY2 | ... | ProdY10
ABC 5 3 ... 6 5 8 ... 12
EDF 2 NULL ... 5 Null 1 ... 6
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
XYZ NULL 3 ... 14 7 2 ... 8
The result of the query should look something like this (other design suggestions appreciated)
Product | Average | Min | Covariance with corresponding X or Y Product
ProdX1 Avg(ProdX1) Min(ProdX1) Covar(ProdX1,ProdY1)
ProdX2 Avg(ProdX2) Min(ProdX2) Covar(ProdX2,ProdY2)
.
.
.
ProdY10 Avg(ProdY10) Min(ProdY10) Covar(ProdY10,ProdX10)
I am OK with the different aggregate functions, of course Covar (X1,Y1) = Covar(Y1,X1)
However, I am not sure how to create a query that returns the desired result.
Any suggestions are much appreciated.
Thank you very much.

Jq strange behavior of pipe involving variable on the left hand side

I came across a weird behavior of jq involving a variable on the left hand side of a pipe.
For your information, this question was inspired by the jq manual: under Scoping (https://stedolan.github.io/jq/manual/#Advancedfeatures) where it mentions an example filter ... | .*3 as $times_three | [. + $times_three] | .... I believe the correct version is ... | (.*3) as $times_three | [. + $times_three] | ....
First (https://jqplay.org/s/ffMPsqmsmt)
filter:
. * 3 as $times_three | .
input:
3
output:
9
Second (https://jqplay.org/s/yOFcjRAMLL)
filter:
. * 4 as $times_four | .
input:
3
output:
9
What is happening here?
But (https://jqplay.org/s/IKrTNZjKI8)
filter:
(. * 3) as $times_three | .
input:
3
output:
3
And (https://jqplay.org/s/8zoq2-HN1G)
filter:
(. * 4) as $times_four | .
input:
3
output:
3
So if parentheses are used when the variable is declared, as in (.*3) or (.*4), then the filter behaves predictably.
But if parentheses are not used, as in .*3 or .*4, then strangely the output is 9 for both.
Can you explain?
Contrary to what the examples in the Scoping section assume, . * 4 as $times_four | . is equivalent to . * ( 4 as $times_four | . ) and therefore squares its input.
You might expect
. * 4 as $times_four | .
to be equivalent to
( . * 4 ) as $times_four | .
And as you point out, some examples even suggest this is the case. However, the first snippet is actually equivalent to the following:
. * ( 4 as $times_four | . )
And since … as $x produces its context[1], that's the same as
. * ( . | . )
or
. * .
jq's operator precedence is inconsistent and/or quirky.
"def" | "abc" + "def" | length means "def" | ( "abc" + "def" ) | length, but "def" | "abc" + "def" as $x | length means "def" | "abc" + ( "def" as $x | length ).
This behaviour suggests that as isn't a binary operator of the form X as $Y, as one might expect, but a ternary operator of the form X as $Y | Z.
And, in fact, this is how it's documented:
Variable / Symbolic Binding Operator: ... as $identifier | ...
This leads to surprises, especially since it binds a lot more tightly than expected. And it looks like whoever authored the examples in the Scoping section fell into the trap.
[1] It might produce it multiple times, e.g. .[] as $x.
Indeed, there seems to be a mistake in the manual. In section Scoping it is contrasting the (faulty) examples
... | .*3 as $times_three | [. + $times_three] | ... # faulty!
and
... | (.*3 as $times_three | [. + $times_three]) | ... # faulty!
While the overall statement stays valid, both examples are missing additional parentheses around .*3. Thus, it should actually read
... | (.*3) as $times_three | [. + $times_three] | ...
and
... | ((.*3) as $times_three | [. + $times_three]) | ...
respectively.
From the manual under section Variable / Symbolic Binding Operator:
The expression exp as $x | ... means: for each value of expression exp, run the rest of the pipeline with the entire original input, and with $x set to that value. Thus as functions as something of a foreach loop.
This means that a variable assignment takes the one expression left of as and assigns its evaluation to the defined variable right of as (and this happens as many times as exp produces an output). But, as everything in jq is a filter, the assignment itself also is, and as such it needs to have an output itself. If you look closely, the full title of that section
Variable / Symbolic Binding Operator: ... as $identifier | ...
also features a pipe symbol next to it, which indicates that it belongs to the assignment's structure. Try just running . as $x. You will get an error because the | ... part is missing. Thus, to simply keep the input context as is (apart from maybe duplicating it as many times as the expression left of as produced an output), a complete assignment would rather look like … as $x | ., or, if the input context is what you wanted to capture in the variable, . as $x | .
That said, let's clarify what happens with your examples by putting explicit parentheses around the assignments:
3 | . * 3 as $times_three | .
3 | . * (3 as $times_three | .)
3 | . * . # with $times_three set to 3
3 * 3 # with $times_three set to 3
9 # with $times_three set to 3
3 | . * 4 as $times_four | .
3 | . * (4 as $times_four | .)
3 | . * . # with $times_four set to 4
3 * 3 # with $times_four set to 4
9 # with $times_four set to 4
3 | (. * 3) as $times_three | .
3 | ((. * 3) as $times_three | .)
3 | ((3 * 3) as $times_three | .)
3 | (9 as $times_three | .)
3 | . # with $times_three set to 9
3 # with $times_three set to 9
3 | (. * 4) as $times_four | .
3 | ((. * 4) as $times_four | .)
3 | ((3 * 4) as $times_four | .)
3 | (12 as $times_four | .)
3 | . # with $times_four set to 12
3 # with $times_four set to 12
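The two parses traced above can also be modelled outside jq. The following is only a small Python sketch of the semantics described in the manual quote (a filter maps one input to a stream of outputs, and exp as $x | body runs body on the original input once per value of exp). The helper names identity, const, times and bind are made up for this illustration and are not part of jq.

```python
# Toy model of jq filters: each filter takes one input and yields outputs.
# All helper names here are hypothetical; this is not how jq is implemented.

def identity(inp):            # jq: .
    yield inp

def const(value):             # jq: a literal such as 3 or 4
    def run(inp):
        yield value
    return run

def times(left, right):       # jq: left * right
    def run(inp):
        for r in right(inp):
            for l in left(inp):
                yield l * r
    return run

def bind(exp, body):          # jq: exp as $x | body
    # For each value of exp, build the body with $x bound to that value
    # and run it on the ORIGINAL input.
    def run(inp):
        for value in exp(inp):
            yield from body(value)(inp)
    return run

# How jq actually groups `. * 4 as $times_four | .`:
#   . * (4 as $times_four | .)
actual = times(identity, bind(const(4), lambda times_four: identity))

# What the manual's example intended: `(. * 4) as $times_four | .`
intended = bind(times(identity, const(4)), lambda times_four: identity)

print(list(actual(3)))    # [9]
print(list(intended(3)))  # [3]
```

In the first grouping the bound value 4 is never used, so the expression collapses to . * . and squares the input, which is why both .*3 and .*4 gave 9.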

Identifying specific strings and checking subsequent rows for another string

I have the following DataFrame.
df = pd.DataFrame({'1': ['A','.','.','X','.','.'],
'2':['.','.','.','.','A','.'],
'3':['.','.','.','.','.','.'],
'4':['.','.','.','.','.','X']})
I want to identify all instances where 'A' occurs and check to see if 'X' occurs within the next 3 rows.
After doing that I would like to execute a command based on these conditions.
an example of what I am trying to do would be...
for i, idx in df.iterrows():
if idx == A:
if X exists within next 3 rows:
x= idx['1']
y= idx['2']
Any help would be greatly appreciated.
I am sure that the other answer could work if you were to explain what you really want to do. It would be more efficient as iterating over rows is slow.
However, here is a solution based on iterrows:
# rows that contain an 'X' in any column
mask = df.eq('X').any(axis=1)
# extend each True backwards over the 3 preceding rows
mask = mask.where(mask).bfill(limit=3).fillna(False)
for idx, row in df.iterrows():
    if 'A' in row.values and mask[idx]:
        x = row['1']
        y = row['2']
        print(f'row {idx} matches: {x=}, {y=}')
example input (slightly different from yours):
1 2 3 4
0 A . . .
1 . . . .
2 . . A .
3 . . . .
4 X A . .
5 . . X .
output:
row 2 matches: x='.', y='.'
row 4 matches: x='X', y='A'
IIUC, you want to identify the cells where there is a value A and if within the next 3 rows, there is also a value X
I will use a more visual example for clarity (A/X/.):
0 1 2 3 4 5
0 A . . A . A
1 . . X . A .
2 . A . . . A
3 . X . X . X
4 X . . . . .
One can use eq to find the searched values and where+bfill(limit=3)+.fillna to extend the second mask to the previous lines.
# mask for the A
m1 = df.eq('A')
# mask for the X in the next 3 lines
m2 = df.eq('X')
m2 = m2.where(m2).bfill(limit=3).fillna(False)
# example of how to use the masks: replacing A with O
df[m1&m2] = 'O'
Example output:
0 1 2 3 4 5
0 A . . O . O
1 . . X . A .
2 . O . . . O
3 . X . X . X
4 X . . . . .
checking X for any column
Just change the second mask to:
m2 = df.eq('X').any(axis=1)
m2 = m2.where(m2).bfill(limit=3).fillna(False)
output with this mask:
0 1 2 3 4 5
0 O . . O . O
1 . . X . O .
2 . O . . . O
3 . X . X . X
4 X . . . . .
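For readers who want to see what the where + bfill(limit=3) mask computes, here is a plain-Python sketch of the row-level ("any column") variant, using the example input from the iterrows answer. The function name rows_with_a_followed_by_x and the lookahead parameter are made up for this illustration. Note that, like the pandas masks, the window includes the current row itself.

```python
# Example input from the iterrows answer, one list per row (columns '1'..'4').
rows = [
    ["A", ".", ".", "."],
    [".", ".", ".", "."],
    [".", ".", "A", "."],
    [".", ".", ".", "."],
    ["X", "A", ".", "."],
    [".", ".", "X", "."],
]

def rows_with_a_followed_by_x(rows, lookahead=3):
    """Return indices of rows containing 'A' where some row in
    [i, i + lookahead] contains 'X' (same window as bfill(limit=3))."""
    has_x = ["X" in row for row in rows]
    matches = []
    for i, row in enumerate(rows):
        window = has_x[i : i + lookahead + 1]
        if "A" in row and any(window):
            matches.append(i)
    return matches

print(rows_with_a_followed_by_x(rows))  # [2, 4]
```

Row 0 is not reported because the nearest X is 4 rows away, which matches the output of the pandas version.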

Using Imputer with conditions on the cell values of columns

I have a data frame with certain number of 'nan' in certain columns:
(Total number of rows is over 50000)
Data looks something like below (showing for 1st 3 columns)
A    B    C    D  E  F
10   20   5    .  .  .
nan  54   10   .  .  .
23   nan  9    .  .  .
30   32   6    .  .  .
20   22   nan  .  .  .
.    .    .    .  .  .
.    .    .    .  .  .
There is a condition on these columns : A < B and A > C always for all rows.
I wish to use some imputer (preferably KNNImputer) such that after imputing the above conditions are satisfied.
(while applying imputer in a generic way, it turns out that many cells are not satisfying these conditions)
How can this be implemented?

Search a value in column and give the value from another

I have a table called Process. It has 22 columns:
Column 1 is Id, which is serial and the primary key.
Column 2 is "Process Name", Character(50).
Column 3 is "Amount 1", Character Varying.
Column 4 is "Time 1", Integer.
The rest of the columns follow the same pattern as 3 and 4 but with increasing numbers, i.e. column 5 is "Amount 2", column 6 is "Time 2", and so on.
What I need is a query which looks in the amount columns for the value normal and then displays the ID and the corresponding Time column. For example:
Process Table
ID . Process Name . Amount 1 . Time 1 . Amount 2 Time 2
1 . Pick . normal . 20 . normal . 40
2 . Pack . normal . 40 . 3 . 10
3 . Pull . 3 . 20 . 1 . 60
4 . Play . normal . 40
Result
ID . Time 1 . Time 2
1 . 20 . 40
2 . 40
4 . 40
I have tried the following codes :
select public."Process", amount_1 from
names-# (select ID,time_1 FROM public."Process" AS normal_tasks);
select public."Process", amount_1 from
names-# select id, Time_1 from public."Process" where Amount_1!='normal';
but i'm getting syntax errors.
Any Help will be greatly appreciated
Many Thanks
Dave
I think you are looking for CASE
SELECT id,
       CASE WHEN Amount_1 = 'normal'
            THEN Time_1
       END AS Time_1,
       CASE WHEN Amount_2 = 'normal'
            THEN Time_2
       END AS Time_2
FROM Process
WHERE 'normal' IN (Amount_1, Amount_2)
You have to add one CASE expression for each Amount/Time pair among those 22 columns, and list every Amount column in the IN clause.
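To check the CASE approach against the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. SQLite is an assumption made only so the example is self-contained (the question appears to use PostgreSQL), and the lowercase column names amount_1, time_1, amount_2, time_2 are taken from the asker's own attempts rather than from the real schema.

```python
import sqlite3

# In-memory table with the four sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE process ("
    " id INTEGER PRIMARY KEY, process_name TEXT,"
    " amount_1 TEXT, time_1 INTEGER, amount_2 TEXT, time_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO process VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Pick", "normal", 20, "normal", 40),
        (2, "Pack", "normal", 40, "3", 10),
        (3, "Pull", "3", 20, "1", 60),
        (4, "Play", "normal", 40, None, None),
    ],
)

# One CASE per Amount/Time pair; a CASE without ELSE yields NULL.
query = """
SELECT id,
       CASE WHEN amount_1 = 'normal' THEN time_1 END AS time_1,
       CASE WHEN amount_2 = 'normal' THEN time_2 END AS time_2
FROM process
WHERE 'normal' IN (amount_1, amount_2)
ORDER BY id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 20, 40), (2, 40, None), (4, 40, None)]
```

The rows returned correspond to the Result table in the question: row 3 is filtered out, and the Time 2 cell is empty (NULL) wherever Amount 2 is not normal.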

SPARQL - Count occurrence of a substring in object

I'm fairly new to Linked-Data and SPARQL but I understand the concept and some of the querying as I do have knowledge of SQL. Using some example data from rdfdata.org I managed to setup a GraphDB instance with an Elvis impersonator repo.
Using some basic queries like
SELECT * WHERE {?s ?p ?o} and filtering on object values I was able to get some basic data visible in tables. I have experience using regular expressions so I decided to use this with SPARQL to count the occurrence of Elvis within the object. However, whatever I do I am not able to get this above one.
This is a problem as I have triples that contain a form of elvis more than once:
s: http://www.gigmasters.com/elvis/bobjames/
p: ep:influences
o: Elvis Elvis Elvis! I also do a Neil Diamond tribute as well, and have
been a DJ, MC, and musician for many years.
As you can see there are three occurrences of Elvis which are only counted as 1.
Here is the SPARQL query used to select the triple and to count the occurrences:
SELECT ?s ?p ?o (count(regex( ?o ,"[Ee]lvis")) as ?count)
WHERE {
?s ?p ?o.
filter(regex( ?o ,"([Ee]lvis.){3}")) # only return the triple above
}
GROUP BY ?s ?p ?o
How is it possible that these occurrences are not counted? I tried using str(?o) but as the object is a string literal to begin with that should not matter.
Expected result:
A table with 4 columns: | ?s | ?p | ?o | count |,
where count should be "3"^^xsd:integer
You can accomplish this by taking the input string (e.g., "A B A C") and replacing the occurrences of the target (e.g., "A") with the empty string ("") to get an updated string (e.g., " B C"). Then compute the difference between the length of the input string and the length of the updated string. Divide that by the length of the target, and the result is how many times the target appears in the input. For instance:
@prefix : <urn:ex:> .
:a :hasString "I like Elvis." .
:b :hasString "Elvis's name was Elvis." .
:c :hasString "Not mentioned here" .
:d :hasString "daybydaybyday" .
prefix : <urn:ex:>
select ?x ?s ?t ?count where {
values ?t { "Elvis" "daybyday" }
?x :hasString ?s .
bind(((strlen(?s) - strlen(replace(?s, ?t, ""))) / strlen(?t)) as ?count)
}
-------------------------------------------------------
| x | s | t | count |
=======================================================
| :a | "I like Elvis." | "Elvis" | 1.0 |
| :b | "Elvis's name was Elvis." | "Elvis" | 2.0 |
| :c | "Not mentioned here" | "Elvis" | 0.0 |
| :d | "daybydaybyday" | "Elvis" | 0.0 |
| :a | "I like Elvis." | "daybyday" | 0.0 |
| :b | "Elvis's name was Elvis." | "daybyday" | 0.0 |
| :c | "Not mentioned here" | "daybyday" | 0.0 |
| :d | "daybydaybyday" | "daybyday" | 1.0 |
-------------------------------------------------------
There are a couple of caveats here.
The target string must be a "normal" string. For instance, if it were a genuine regular expression pattern that could expand to text of varying length, this method won't work.
You need to be aware of how this handles overlapping strings. For instance, if your input text is "daybydaybyday" and the target is "daybyday", are you expecting to count one occurrence or two? With this method, you'll just get one, because once one occurrence is replaced, the remaining string doesn't have any more.
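The arithmetic behind this method is easy to check outside SPARQL. Below is a minimal Python sketch of the same length-difference idea, using plain (non-regex) replacement; the function name count_occurrences is made up for this illustration.

```python
def count_occurrences(text, target):
    """Count non-overlapping occurrences of a literal target string by
    removing them and dividing the length difference by len(target)."""
    if not target:
        raise ValueError("target must be a non-empty string")
    removed = text.replace(target, "")
    return (len(text) - len(removed)) // len(target)

print(count_occurrences("I like Elvis.", "Elvis"))            # 1
print(count_occurrences("Elvis's name was Elvis.", "Elvis"))  # 2
print(count_occurrences("Not mentioned here", "Elvis"))       # 0
print(count_occurrences("daybydaybyday", "daybyday"))         # 1
```

The last call shows the overlap caveat: once the first "daybyday" is removed, only "byday" remains, so the overlapping second occurrence is not counted.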
SPARQL count is used to count the number of matching possible bindings in the RDF data or simply said, the number of matching rows. Indeed there is only one object that matches the REGEX, thus, only one row.
Unfortunately, SPARQL doesn't have any concept of explode to create multiple rows out of a single row (or at least none that I'm aware of).
As a workaround, I wrote a SPARQL query with REGEX + String hacks. The idea is to
replace each occurrence of Elvis with some special character that hopefully doesn't occur. I've chosen Å here for demonstration.
delete every remaining character in the text
compute the length of the remaining string
Query
PREFIX ep: <http://www.snee.com/ns/ep>
SELECT ?s ?p ?o ?cnt
WHERE {
?s ?p ?o.
filter(regex( str(?o) ,"([Ee]lvis.)"))
bind(
strlen(
replace(
replace(str(?o), "([Ee]lvis.)", "Å")
, "[^Å]", ""
)
) as ?cnt)
}
Output (sample)
+-----------------------------------------------+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+
| s | p | o | cnt |
+-----------------------------------------------+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+
| http://www.gigmasters.com/elvis/ChuckBaril/ | http://www.snee.com/ns/epinfluences | Elvis, Donny Osmond, Barry Manilow, Pebo Bryson, James Ingram, George Benson, and George Strait | 1 |
| http://www.gigmasters.com/elvis/DukeHicks/ | dc:description | Been performing Elvis tribute shows for 10 yrs. Having been in the music business for twenty years Duke knows how to please the audience. Duke started doing his tribute shows after several request from the audience members to do more and more of Elvis' songs and a request for him to do an Elvis Tribute Show. Duke has been asked several times if he is lip-syching to Elvis' songs and the answer is absolutely NO. The sound and stage presence is so close to 'The King' that it has startled many. | 4 |
| http://www.gigmasters.com/elvis/DukeHicks/ | http://www.snee.com/ns/epcategory | Elvis Impersonator, Tribute Band | 1 |
| http://www.gigmasters.com/elvis/DukeHicks/ | http://www.snee.com/ns/epinfluences | Elvis Presley | 1 |
| http://www.gigmasters.com/elvis/ElvisByDano/ | dc:description | For a great time at your next event, how about ELVIS by Dano? His main goal is to provide a show that reflects the raw energy, passion, and humor that The King once shared with us. Dano, being a huge Elvis fan since his eleventh year, has loved singing along with The Man his entire adult life. He started to impersonate Elvis in public about 1995, and his first long solo performance, with a full set of songs, was at a church social in 2002. Dano was also a seven year member of a classic rock band and often contributed an Elvis act that audiences always truly enjoyed. Starting in February, 2004 he has performed in many solo shows for benefits, auctions, various parties , a Theme Park, as well as much time donated to entertain the elderly. He uses quality audio equipment with great sounding background tracks. Longer travel distances will be considered. Contact Dano today if you want your next party 'all shook up'!!! | 3 |
+-----------------------------------------------+-------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+
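The marker trick used in the query (replace each match with a single special character, delete everything else, take the length) can be sketched in Python with the re module to see how the counts come about. The function name count_matches_via_marker is made up for this illustration, and Python's regex dialect is only assumed to behave like the SPARQL one for this simple pattern.

```python
import re

def count_matches_via_marker(text, pattern=r"[Ee]lvis.", marker="Å"):
    """Count regex matches by collapsing each match to one marker character,
    deleting every other character, and measuring what is left."""
    if marker in text:
        # The trick only works if the marker does not occur in the input.
        raise ValueError("marker already occurs in the text")
    marked = re.sub(pattern, marker, text)
    only_markers = re.sub(f"[^{re.escape(marker)}]", "", marked)
    return len(only_markers)

text = "Elvis Elvis Elvis! I also do a Neil Diamond tribute as well"
print(count_matches_via_marker(text))             # 3
print(count_matches_via_marker("Elvis Presley"))  # 1
print(count_matches_via_marker("I love Elvis"))   # 0
```

The last call shows a side effect of the trailing . in the pattern: an occurrence at the very end of the string has no following character and is not counted. The pattern is also case-sensitive beyond the first letter, which is why ELVIS in the sample output does not contribute to the count.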