Postgres SQL aggregates query in BigQuery? - sql

For the Postgres query below I currently use PIVOT in BigQuery. Besides PIVOT, is there any other way to write such a query in BigQuery?
-- Postgres SQL --
SELECT
Apple,
Orange,
Lemon,
CASE WHEN Apple >= 50 THEN 1 ELSE 0 END AS Apple50,
CASE WHEN Orange >= 50 THEN 1 ELSE 0 END AS Orange50,
CASE WHEN Lemon >= 50 THEN 1 ELSE 0 END AS Lemon50
FROM (
SELECT td.timestamp,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 16), 0) as Apple,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 17), 0) as Orange,
COALESCE(MAX(td.value) FILTER (WHERE attribute_id = 18), 0) as Lemon
FROM TableData td
WHERE td.attribute_id IN (16, 17, 18)
GROUP BY td.timestamp
ORDER BY td.timestamp
) AS td2;
-- My attempt BigQuery Query --
SELECT
value_16 as Apple,
value_17 as Orange,
value_18 as Lemon,
CASE WHEN value_16 >= 50 THEN 1 ELSE 0 END as Apple50,
CASE WHEN value_17 >= 50 THEN 1 ELSE 0 END as Orange50,
CASE WHEN value_18 >= 50 THEN 1 ELSE 0 END AS Lemon50
FROM (
SELECT * FROM(
SELECT
timestamp,
attribute_id,
value
FROM `PROJECT_ID.DB_NAME.FRUITS` as td
WHERE td.attribute_id IN (16,17,18)
)PIVOT
(
MAX(value) as value
FOR attribute_id IN (16,17,18)
)
)as td2
Below is the sample relation of the table.
-- TableData --
attribute_id | value | timestamp |
--------------+-----------+------------+
17 | 100 | 1618822794 |
17 | 100 | 1618822861 |
16 | 50 | 1618822794 |
16 | 50 | 1618822861 |
-- TableAttribute --
id | name |
--------------+----------+
16 | Apple |
17 | Orange |
18 | Lemon |
-- Expected Result --
timestamp | Apple | Orange | Lemon | Apple50 | Orange50 | Lemon50 |
--------------+---------+--------+-------+---------+----------+---------+
1618822794 | 50 | 100 | 0 | 1 | 1 | 0
1618822861 | 50 | 100 | 0 | 1 | 1 | 0

Pivot is likely the best way to achieve what you're wanting. Consider the following approach though as it might be simpler to manage:
with aggregate_data as (
select td.timestamp
, ta.name
, td.value as value
from TableData td
full outer join TableAttribute ta
on td.attribute_id = ta.id
)
select timestamp
, value_Apple as Apple
, value_Orange as Orange
, value_Lemon as Lemon
, _50_Apple as Apple50
, _50_Orange as Orange50
, _50_Lemon as Lemon50
from aggregate_data
pivot(max(value) value, max(case when value >=50 then 1 else 0 end) _50 for name in ('Apple', 'Orange', 'Lemon'))
where timestamp is not null

Related

Grouping by column and rows

I have a table like this:
+----+--------------+--------+----------+
| id | name | weight | some_key |
+----+--------------+--------+----------+
| 1 | strawberries | 12 | 1 |
| 2 | blueberries | 7 | 1 |
| 3 | elderberries | 0 | 1 |
| 4 | cranberries | 8 | 2 |
| 5 | raspberries | 18 | 2 |
+----+--------------+--------+----------+
I'm looking for a generic request that would get me all berries where there are three entries with the same 'some_key' and one of the entries (within those three entries belonging to the same some_key) has the weight = 0
in case of the sample table, expected output would be:
1 strawberries
2 blueberries
3 cranberries
As you want to include non-grouped columns, I would approach this with window functions:
select id, name
from (
select id,
name,
count(*) over w as key_count,
count(*) filter (where weight = 0) over w as num_zero_weight
from fruits
window w as (partition by some_key)
) x
where x.key_count = 3
and x.num_zero_weight >= 1
The count(*) over w counts the number of rows in that group (= partition) and the count(*) filter (where weight = 0) over w counts how many of those have a weight of zero.
The window w as ... avoids repeating the same partition by clause for the window functions.
Online example: https://rextester.com/SGWFI49589
Try this-
SELECT some_key,
SUM(weight) --Sample aggregations on column
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- If you want at least 3, use >= 3
AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
As per your edited question, you can try this below-
SELECT id, name
FROM your_table
WHERE some_key IN (
SELECT some_key
FROM your_table
GROUP BY some_key
HAVING COUNT(*) = 3 -- If you want at least 3, use >= 3
AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
)
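To see what the subquery version returns, here is a small sketch that runs it against the sample table from the question. It uses an in-memory SQLite database from Python only for convenience; the table name your_table is the placeholder from the query above, and nothing here is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (id INTEGER, name TEXT, weight INTEGER, some_key INTEGER)"
)
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        (1, "strawberries", 12, 1),
        (2, "blueberries", 7, 1),
        (3, "elderberries", 0, 1),
        (4, "cranberries", 8, 2),
        (5, "raspberries", 18, 2),
    ],
)

# The inner query picks the keys that have exactly three rows and at
# least one zero weight; the outer query returns every row for those keys.
query = """
SELECT id, name
FROM your_table
WHERE some_key IN (
    SELECT some_key
    FROM your_table
    GROUP BY some_key
    HAVING COUNT(*) = 3
       AND SUM(CASE WHEN weight = 0 THEN 1 ELSE 0 END) >= 1
)
ORDER BY id
"""
berries = conn.execute(query).fetchall()
print(berries)
```

Only some_key = 1 qualifies, so the three rows of that group come back, including the zero-weight row itself.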
Try doing this.
Table structure and sample data
CREATE TABLE tmp (
id int,
name varchar(50),
weight int,
some_key int
);
INSERT INTO tmp
VALUES
('1', 'strawberries', '12', '1'),
('2', 'blueberries', '7', '1'),
('3', 'elderberries', '0', '1'),
('4', 'cranberries', '8', '2'),
('5', 'raspberries', '18', '2');
Query
SELECT t1.*
FROM tmp t1
INNER JOIN (SELECT some_key
FROM tmp
GROUP BY some_key
HAVING Count(some_key) >= 3
AND Min(Abs(weight)) = 0) t2
ON t1.some_key = t2.some_key;
Output
+-----+---------------+---------+----------+
| id | name | weight | some_key |
+-----+---------------+---------+----------+
| 1 | strawberries | 12 | 1 |
| 2 | blueberries | 7 | 1 |
| 3 | elderberries | 0 | 1 |
+-----+---------------+---------+----------+
Online Demo: http://sqlfiddle.com/#!15/70cca/26/0
Thank you, @mkRabbani, for reminding me about the negative values.

Find unique dataset with max. value from 3 columns

Imagine the following table:
ID:PrimaryKey (Sequence generated Number)
ColA:ForeignKey(Number)
ColB:ForeignKey(Number)
ColC:ForeignKey(Number)
State:Enumeration(Number) 10,20,30,... 90
ValidFrom:TimeStamp(6)
LastUpdate:TimeStamp(6)
I have now created a query to fetch every combination in the highest states (70 and above). The combination of ColA, ColB and ColC should be unique. If a ValidFrom is available, the highest one wins. If there are 2 rows in state 90, the newest one wins:
So for some table like this
|------|------|------|-------|-------------|------------|
| ColA | ColB | ColC | State |ValidFrom |LastUpdate |
|------|------|------|-------|-------------|------------|
| 1 | 1 | 1 | 10 | null | 10.10.2018 | //Excluded
|------|------|------|-------|-------------|------------|
| 1 | 1 | 1 | 70 | null | 09.10.2018 | // lower State
|------|------|------|-------|-------------|------------|
| 1 | 1 | 1 | 90 | null | 05.05.2018 | // older LastUpdate
|------|------|------|-------|-------------|------------|
| 1 | 1 | 1 | 90 | null | 12.07.2018 | //Should Win
|------|------|------|-------|-------------|------------|
| 1 | 2 | 1 | 90 | 18.10.2018 | 12.07.2018 | //Should Win
|------|------|------|-------|-------------|------------|
| 1 | 2 | 1 | 90 | null | 18.11.2018 | //loses against ValidFrom
|------|------|------|-------|-------------|------------|
| 3 | 2 | 1 | 90 | 02.12.2018 | 04.08.2018 | //lower ValidFrom
|------|------|------|-------|-------------|------------|
| 3 | 2 | 1 | 70 | 19.10.2018 | 17.11.2018 | //lower state
|------|------|------|-------|-------------|------------|
| 3 | 2 | 1 | 90 | 18.10.2018 | 14.08.2018 | //Should win
|------|------|------|-------|-------------|------------|
So as you can see, the combination of ColA, ColB and ColC should be unique at the end.
So I started writing a script that gives me all the data with the highest state per combination:
SELECT MAINSELECT.*
FROM
FOO MAINSELECT
WHERE
MAINSELECT.STATE >= 70
AND NOT EXISTS
( SELECT SUBSELECT.ID
FROM
FOO SUBSELECT
WHERE SUBSELECT.ID <> MAINSELECT.ID
AND SUBSELECT.COLA = MAINSELECT.COLA
AND SUBSELECT.COLB = MAINSELECT.COLB
AND SUBSELECT.COLC = MAINSELECT.COLC
AND SUBSELECT.STATE > MAINSELECT.STATE);
This now gives me all in the highest state. As I do not want to use an OR statement I tried to solve the problem to query either NULL as Validfrom or the MAX in 2 different queries (and use union). So I tried to extend this base SELECT like this to get all with a ValidFrom != null && Max(ValidFrom):
SELECT MAINSELECT.*
FROM
FOO MAINSELECT
WHERE
MAINSELECT.STATE >= 70
AND MAINSELECT.VALIDFROM IS NOT NULL
AND NOT EXISTS
( SELECT SUBSELECT.ID
FROM
FOO SUBSELECT
WHERE SUBSELECT.ID <> MAINSELECT.ID
AND SUBSELECT.COLA = MAINSELECT.COLA
AND SUBSELECT.COLB = MAINSELECT.COLB
AND SUBSELECT.COLC = MAINSELECT.COLC
AND SUBSELECT.STATE > MAINSELECT.STATE)
AND NOT EXISTS
( SELECT SUBSELECT.ID
FROM
FOO SUBSELECT
WHERE SUBSELECT.ID <> MAINSELECT.ID -- Should not be the same
AND SUBSELECT.COLA = MAINSELECT.COLA -- Same combination!
AND SUBSELECT.COLB = MAINSELECT.COLB
AND SUBSELECT.COLC = MAINSELECT.COLC
AND SUBSELECT.STATE = MAINSELECT.STATE --Filter on same state!
AND SUBSELECT.VALIDFROM > MAINSELECT.VALIDFROM);
But this doesn't seem to work because now nothing is printed.
I am expecting just row: 5 and 9! [Starting at 1 ;-)]
And I currently get row: 5, 7 and 9!
So the combination [3,2,1] is duplicate.
I do not get why the 2nd NOT EXISTS does not work. It's like there are 0F*** given!
Use row_number():
dbfiddle demo
select *
from (
select row_number() over (
partition by cola, colb, colc
order by state desc, validfrom desc nulls last, lastupdate desc) rn,
foo.*
from foo)
where rn = 1
7 wins against 9 because 2018-12-02 is newer than 2018-10-18.
Explanation:
partition by cola, colb, colc numbers the rows separately for each combination of these columns,
the order by then lists the criteria: the higher state wins, then the newer non-null validfrom, and finally the newer lastupdate.
For each combination of a, b, c we get a separate set of numbered rows. The outer query keeps only the rows numbered 1.
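The ordering that row_number() applies can be checked outside the database. The following is a plain Python sketch of the same tie-break rules, not the query itself: the row numbers and the ISO date strings are my own transcription of the sample table from the question, and rank_key is a hypothetical helper name.

```python
# (row_no, cola, colb, colc, state, validfrom, lastupdate)
# Dates are written as ISO strings so that string comparison orders them.
rows = [
    (1, 1, 1, 1, 10, None, "2018-10-10"),
    (2, 1, 1, 1, 70, None, "2018-10-09"),
    (3, 1, 1, 1, 90, None, "2018-05-05"),
    (4, 1, 1, 1, 90, None, "2018-07-12"),
    (5, 1, 2, 1, 90, "2018-10-18", "2018-07-12"),
    (6, 1, 2, 1, 90, None, "2018-11-18"),
    (7, 3, 2, 1, 90, "2018-12-02", "2018-08-04"),
    (8, 3, 2, 1, 70, "2018-10-19", "2018-11-17"),
    (9, 3, 2, 1, 90, "2018-10-18", "2018-08-14"),
]


def rank_key(row):
    """Mirror 'state desc, validfrom desc nulls last, lastupdate desc'.

    A larger key means the row sorts earlier in the window, i.e. wins.
    """
    _, _, _, _, state, validfrom, lastupdate = row
    return (state, validfrom is not None, validfrom or "", lastupdate)


# Keep the best row per (cola, colb, colc), like "where rn = 1".
winners = {}
for row in rows:
    combo = row[1:4]
    if combo not in winners or rank_key(row) > rank_key(winners[combo]):
        winners[combo] = row

winner_row_numbers = {combo: row[0] for combo, row in winners.items()}
print(winner_row_numbers)
```

This picks rows 4, 5 and 7, which agrees with the remark above that row 7 beats row 9 on the newer validfrom.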
I found the answer. Instead of using NOT EXISTS I am trying to use the max, rpad and coalesce to create a string which I compare:
SELECT
MAINSELECT.*
FROM
FOO MAINSELECT
WHERE (1 = 1)
AND MAINSELECT.STATE >= 70
AND coalesce(to_char(MAINSELECT.state), rpad('0', 3, '0') ) || coalesce(to_char(MAINSELECT.validfrom,'YYMMDDhh24missFF'), rpad('0', 18, '0') ) || coalesce(to_char(MAINSELECT.lastupdate,'YYMMDDhh24missFF'), rpad('0', 18, '0') )
= (select max(coalesce(to_char(SUBSELECT.state), rpad('0', 3, '0') ) || coalesce(to_char(SUBSELECT.validfrom,'YYMMDDhh24missFF'), rpad('0', 18, '0') )|| coalesce(to_char(SUBSELECT.lastupdate,'YYMMDDhh24missFF'), rpad('0', 18, '0')))
FROM
FOO SUBSELECT
WHERE (1 = 1)
AND SUBSELECT.STATE >= 70
AND SUBSELECT.COLA = MAINSELECT.COLA
AND SUBSELECT.COLB = MAINSELECT.COLB
AND SUBSELECT.COLC = MAINSELECT.COLC
);
This builds a single string from the STATE, VALIDFROM and LASTUPDATE columns and then finds the maximum of those strings per combination. STATE comes first in the string, so the highest state dominates the comparison.

Count who paid group by 1, 2 or 3+

I have a payment table like the example below and I need a query that gives me how many IDs paid (AMOUNT > 0) 1 time, 2 times, 3 or more times. Example:
+----+--------+
| ID | AMOUNT |
+----+--------+
| 1 | 50 |
| 1 | 0 |
| 2 | 10 |
| 2 | 20 |
| 2 | 15 |
| 2 | 10 |
| 3 | 80 |
+----+--------+
I expect the result:
+-----------+------------+-------------+
| 1 payment | 2 payments | 3+ payments |
+-----------+------------+-------------+
| 2 | 0 | 1 |
+-----------+------------+-------------+
ID 1: Paid 1 time (50). The other payment is 0, so I did not count. So, 1 person paid 1 time.
ID 2: Paid 3 times (10,20,15). So, 1 person paid 3 or more time.
ID 3: Paid 1 time (80). So, 2 persons paid 1 time.
I'm doing manually on excel right now but I'm pretty sure there is a more practical solution. Any ideas?
A little sub-query will do the trick
Declare @YourTable table (ID int, AMOUNT int)
Insert into @YourTable values
( 1 , 50 ),
( 1 , 0) ,
( 2 , 10) ,
( 2 , 20) ,
( 2 , 15) ,
( 2 , 10) ,
( 3 , 80)
Select [1_Payment] = sum(case when Cnt=1 then 1 else 0 end)
,[2_Payment] = sum(case when Cnt=2 then 1 else 0 end)
,[3_Payment] = sum(case when Cnt>2 then 1 else 0 end)
From (
Select id
,Cnt=count(*)
From @YourTable
Where Amount<>0
Group By ID
) A
Returns
1_Payment 2_Payment 3_Payment
2 0 1
To get the output you want try using a table to form the data and then SELECT from that:
with c as (
select count(*) count from mytable where amount > 0 group by id)
select
sum(case count when 1 then 1 else 0 end) "1 Payment"
, sum(case count when 2 then 1 else 0 end) "2 Payments"
, sum(case when count > 2 then 1 else 0 end) "3 Payments"
from c
Here is an example you can play with to see how the query is working.
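Both answers follow the same two steps: count the non-zero payments per ID, then bucket those counts. A minimal sketch of that, run against the sample data from the question in an in-memory SQLite database (the table name payments and the alias per_id are assumptions made for this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 50), (1, 0), (2, 10), (2, 20), (2, 15), (2, 10), (3, 80)],
)

# Step 1 (inner query): number of payments with amount > 0 per id.
# Step 2 (outer query): how many ids fall into each bucket.
query = """
SELECT SUM(CASE WHEN cnt = 1  THEN 1 ELSE 0 END) AS one_payment,
       SUM(CASE WHEN cnt = 2  THEN 1 ELSE 0 END) AS two_payments,
       SUM(CASE WHEN cnt >= 3 THEN 1 ELSE 0 END) AS three_plus_payments
FROM (
    SELECT id, COUNT(*) AS cnt
    FROM payments
    WHERE amount > 0
    GROUP BY id
) AS per_id
"""
buckets = conn.execute(query).fetchone()
print(buckets)
```

IDs 1 and 3 each have one non-zero payment and ID 2 has four, so the buckets come out as 2, 0 and 1.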

How to create sql selection based on condition?

I have the following database which shows characteristics of attributes as follows:
attributeId | attributeCode | groupCode
------------+---------------+-----------
1 | 10 | 50
1 | 10 | 50
1 | 12 | 50
My desired result from a select would be:
attributeId | groupcount | code10 | code12
------------+------------+--------+--------
1 | 1 | 2 | 1
Which means: attributeId = 1 has only one groupCode (50), where attributeCode=10 occurs 2 times and attributeCode=12 occurs 1 time.
Of course the following is not valid, but you get the idea of what I'm trying to achieve:
select attributeId,
count(distinct(groupCode)) as groupcount,
attributeCode = 10 as code10,
attributeCode = 12 as code12
from table
group by attributeId;
Try this:
SELECT attributeId, COUNT(DISTINCT groupCode) AS groupcount,
COUNT(CASE WHEN attributeCode = 10 THEN 1 END) AS code10,
COUNT(CASE WHEN attributeCode = 12 THEN 1 END) AS code12
FROM mytable
GROUP BY attributeId
Demo here

Need a query that switches columns

I have this table:
__________________________
| id1 | id2 | count | time |
|-----|-----|-------|------|
| abc | def | 10 | 3 |
| abc | def | 5 | 1 |
| ghi | jkl | 2 | 3 |
+--------------------------+
id1 and id2 are varchar, count is int and time is int.
id1 and id2 together make the primary key.
Time can be 1,2,3,4 or 5 depending on when an item was added (NOT UNIQUE).
I want to write a query that gives me this output instead:
_________________________________________
| id1 | id2 | 1 | 2 | 3 | 4 | 5 |
|-----|-----|-----|-----|-----|-----|-----|
| abc | def | 5 | 0 | 10 | 0 | 0 |
| ghi | jkl | 0 | 0 | 2 | 0 | 0 |
+-----------------------------------------+
Is that possible? I'm sitting here scratching my head but I can't figure it out!
You're in luck. The rule for a pivot is that you still need to know the number and names of the columns in the result set without having to look them up at the time you run the query. As long as you know that, you're okay, and in this case your columns are restricted to the range 1 through 5.
There are a few ways to pivot like this. I still prefer the sum(case) method:
select id1, id2,
sum(case when time = 1 then [count] else 0 end) "1",
sum(case when time = 2 then [count] else 0 end) "2",
sum(case when time = 3 then [count] else 0 end) "3",
sum(case when time = 4 then [count] else 0 end) "4",
sum(case when time = 5 then [count] else 0 end) "5"
from [table]
group by id1, id2
Another option is the PIVOT keyword:
select id1,id2,[1],[2],[3],[4],[5]
from [table]
PIVOT ( SUM([count]) FOR time IN ([1],[2],[3],[4],[5]) ) As Times
Something like this:
select ID1, ID2,
sum(f1) as '1',
sum(f2) as '2',
sum(f3) as '3',
sum(f4) as '4',
sum(f5) as '5'
from ( select ID1, ID2,
case when time =1 then [count] else 0 end as 'f1',
case when time =2 then [count] else 0 end as 'f2',
case when time =3 then [count] else 0 end as 'f3',
case when time =4 then [count] else 0 end as 'f4',
case when time =5 then [count] else 0 end as 'f5'
from dbo._Test
) as v
group by ID1, ID2
The inner query gives you columns for each time value, the outer query sums the values so you don't get two rows for the 'abc' + 'def' row.
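Because the set of time values is known in advance (1 through 5), the repetitive sum(case) columns can also be generated instead of typed out. The sketch below builds the statement in Python and runs it on an in-memory SQLite database with the sample rows from the question. The table name items is an assumption, and double quotes replace the SQL Server brackets because SQLite is used here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE items (id1 TEXT, id2 TEXT, "count" INTEGER, "time" INTEGER)'
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [("abc", "def", 10, 3), ("abc", "def", 5, 1), ("ghi", "jkl", 2, 3)],
)

# The slots are a fixed, known range of integers, so formatting them
# into the SQL text is safe; never do this with user-supplied values.
slots = range(1, 6)
columns = ",\n       ".join(
    f'SUM(CASE WHEN "time" = {slot} THEN "count" ELSE 0 END) AS "{slot}"'
    for slot in slots
)
query = (
    f"SELECT id1, id2,\n       {columns}\n"
    "FROM items\n"
    "GROUP BY id1, id2\n"
    "ORDER BY id1, id2"
)

pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```

The generated statement is the same sum(case) query as in the first answer, so the output has one row per id1/id2 pair with the counts spread over the five time columns.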