Group by and aggregate from tall to wide data in BigQuery SQL


Hi, I want to group by and aggregate data in BigQuery, changing it from tall to wide. How do I do so? I have a lot of sources; here is some sample data.
Here is the table:
date        source  price  id
2022-01-01  A       2      1
2022-01-02  A       2      1
2022-01-03  A       4      1
2022-01-04  A       4      1
2022-01-01  B       1      1
2022-01-02  B       1      1
2022-01-03  B       3      1
2022-01-04  B       3      1
2022-01-01  A       2      2
2022-01-02  A       2      2
2022-01-03  A       4      2
2022-01-04  A       4      2
2022-01-01  B       1      2
2022-01-02  B       1      2
2022-01-03  B       3      2
2022-01-04  B       3      2
I want to turn it into a table that holds, for each id, the minimum price across all sources and the minimum price per source:
id  minPrice  minPriceSourceA  minPriceSourceB
1   2.5       3                2
2   2.5       3                2
Here's my current code:
with Amin as
(
select
id,source,
min(price) as minprice
from table
where source = "A"
group by 1,2
),
Bmin as
(
select
id,source,
min(price) as minprice
from table
where source = "B"
group by 1,2
)
select
t1.id,t1.minprice,
Amin.minprice minPriceSourceA,
Bmin.minprice minPriceSourceB
from(
select
id,source,
min(price) minprice
from table
group by 1,2) t1
left join Amin on t1.id=Amin.id
left join Bmin on t1.id=Bmin.id
The problem is that I have over 100 sources and ids; if I write the query by hand, the code will be very long. Is there an efficient way to do this?

You can use PIVOT to transpose rows into columns and get the MIN of a list of columns at once:
with sample as (
select "2022-01-01" as date, "A" as source, 2 as price, "1" as id
UNION ALL
select "2022-01-02" as date, "A" as source, 1 as price, "1" as id
UNION ALL
select "2022-01-04" as date, "B" as source, 1 as price, "1" as id
UNION ALL
select "2022-01-04" as date, "A" as source, 2 as price, "2" as id
UNION ALL
select "2022-01-04" as date, "A" as source, 4 as price, "2" as id
UNION ALL
select "2022-01-04" as date, "B" as source, 3 as price, "2" as id
),
min_by_source as (
SELECT * FROM
(SELECT id, source, price FROM sample)
PIVOT(MIN(price) AS minPrice FOR source IN ('A', 'B')) -- add the other sources here
),
min_global as (
SELECT id, MIN(price) AS minPrice
FROM sample
GROUP BY id
)
SELECT *
FROM min_global
JOIN min_by_source USING (id)
Output:
id  minPrice  minPrice_A  minPrice_B
1   1         1           1
2   2         2           3
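If PIVOT is not available in your engine, the same shape can be produced with conditional aggregation. Below is a minimal sketch that runs the idea against an in-memory SQLite database from Python; the table name sample and the rows are taken from the answer above, while the use of SQLite instead of BigQuery is an assumption made only so the example is runnable locally.

```python
import sqlite3

# Sample rows copied from the answer above: (date, source, price, id).
sample = [
    ("2022-01-01", "A", 2, "1"),
    ("2022-01-02", "A", 1, "1"),
    ("2022-01-04", "B", 1, "1"),
    ("2022-01-04", "A", 2, "2"),
    ("2022-01-04", "A", 4, "2"),
    ("2022-01-04", "B", 3, "2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (date TEXT, source TEXT, price INTEGER, id TEXT)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?)", sample)

# Each MIN(CASE ...) plays the role of one PIVOT output column:
# rows from other sources yield NULL, which MIN ignores.
query = """
SELECT id,
       MIN(price) AS minPrice,
       MIN(CASE WHEN source = 'A' THEN price END) AS minPrice_A,
       MIN(CASE WHEN source = 'B' THEN price END) AS minPrice_B
FROM sample
GROUP BY id
ORDER BY id
"""
pivot_rows = conn.execute(query).fetchall()
for row in pivot_rows:
    print(row)
```

This should print the same two rows as the PIVOT output above. The trade-off is that every source still needs its own expression, so it does not by itself solve the "over 100 sources" problem.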

Consider the option below:
select * from (
select * except(date),
avg(price) over(partition by id) avgPrice,
min(price) over(partition by id) minPrice
from your_table)
pivot (min(price) minPriceSource for source in ('A', 'B'))
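Applied to the sample data in your question, this returns one row per id with the columns id, avgPrice, minPrice, minPriceSource_A and minPriceSource_B.
You also wrote: "The problem is I have over 100 sources and id, if I do query manually the code will be very long. Is there an efficient way to do this?"
For that, use the dynamic version below: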
execute immediate (select '''
select * from (
select * except(date),
avg(price) over(partition by id) avgPrice,
min(price) over(partition by id) minPrice
from your_table)
pivot (min(price) minPriceSource for source in (''' || string_agg(distinct '"' || source || '"') || '''))
'''
from your_table
)
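The dynamic idea (read the distinct sources first, then generate one column per source) can also be sketched outside BigQuery. The example below is only an illustration under assumptions: it uses Python with an in-memory SQLite database rather than EXECUTE IMMEDIATE, it replaces PIVOT with generated MIN(CASE ...) expressions, and the helper quote_ident and the column prefix minPriceSource_ are names chosen here, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date TEXT, source TEXT, price INTEGER, id INTEGER)")

# The 16 rows from the question: both ids have the same prices per source.
prices = {"A": [2, 2, 4, 4], "B": [1, 1, 3, 3]}
rows = [
    (f"2022-01-0{day}", source, price, id_)
    for id_ in (1, 2)
    for source, series in prices.items()
    for day, price in enumerate(series, start=1)
]
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?)", rows)

# Step 1: discover the sources instead of typing them by hand.
sources = [r[0] for r in conn.execute(
    "SELECT DISTINCT source FROM your_table ORDER BY source")]


def quote_ident(name):
    """Hypothetical helper: quote a column alias, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


# Step 2: generate one aggregate per source. Source values are bound as
# parameters; only the (quoted) column alias is interpolated into the SQL.
columns = ",\n       ".join(
    f"MIN(CASE WHEN source = ? THEN price END) AS {quote_ident('minPriceSource_' + s)}"
    for s in sources
)
query = f"""
SELECT id,
       AVG(price) AS avgPrice,
       MIN(price) AS minPrice,
       {columns}
FROM your_table
GROUP BY id
ORDER BY id
"""
cursor = conn.execute(query, sources)
header = [d[0] for d in cursor.description]
wide_rows = cursor.fetchall()
print(header)
for row in wide_rows:
    print(row)
```

With 100 sources the generated statement simply gets 100 such expressions, which is the same trade the EXECUTE IMMEDIATE version makes: the column list is computed from the data rather than maintained by hand.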

Related

How do I select all columns, plus the result of the sum

I have this select: "Select * from table", which returns:
Id  Value
1   1
1   1
2   10
2   10
My goal is to add the sum of Value for each Id, like this:
Id  Value  Sum
1   1      2
1   1      2
2   10     20
2   10     20
I have tried approaches like:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
But this does not group by Id:
Id  Value  Sum
1   1      1
1   1      1
2   10     10
2   10     10

Aggregation collapses rows, reducing the number of records in the output. In this case you want to attach the result of a computation to each of your records, which is the job of the corresponding window function.
SELECT table.*, SUM(Value) OVER(PARTITION BY Id) AS sum_
FROM table
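To make the difference concrete, here is a small sketch that runs both forms on the question's four rows. It assumes Python with the bundled sqlite3 module and an SQLite build with window-function support (3.25 or later); the table name t is a stand-in, since the question's table is not named.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, Value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (1, 1), (2, 10), (2, 10)])

# GROUP BY collapses each Id to a single row.
grouped = conn.execute(
    "SELECT Id, SUM(Value) FROM t GROUP BY Id ORDER BY Id"
).fetchall()

# The window function keeps every row and repeats the per-Id sum on each.
windowed = conn.execute(
    "SELECT Id, Value, SUM(Value) OVER (PARTITION BY Id) AS sum_ FROM t ORDER BY Id"
).fetchall()

print(grouped)
print(windowed)
```

The grouped query returns two rows, while the windowed query returns all four input rows with the sum attached, which is the layout asked for in the question.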

Your attempt looks correct.
Can you try the query below? It works for me:
SELECT Id, Value,
(SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY ID) as sum
FROM Table v;

You can do it with an inner join against a subquery grouped by id:
select t.*, sum
from _table t
inner join (
select id, sum(Value) as sum
from _table
group by id
) as s on s.id = t.id

Your select is OK if you adjust it a little:
SELECT Id,Value, (SELECT SUM(Value) FROM Table V2 WHERE V2.Id= V.Id GROUP BY IDRNC ) FROM Table v;
GROUP BY IDRNC is a mistake and should be GROUP BY ID.
You should give an alias to the sum column.
The subquery selecting the sum does not need its own table alias to be compared with the outer query that has one (this is not a mistake; it works either way).
Test:
WITH
a_table (ID, VALUE) AS
(
Select 1, 1 From Dual Union All
Select 1, 1 From Dual Union All
Select 2, 10 From Dual Union All
Select 2, 10 From Dual
)
SELECT ID, VALUE, (SELECT SUM(VALUE) FROM a_table WHERE ID = v.ID GROUP BY ID) "ID_SUM" FROM a_table v;
ID VALUE ID_SUM
---------- ---------- ----------
1 1 2
1 1 2
2 10 20
2 10 20

SQL Query to find the Row with first change of data

UniqueId  ITEM  DATE
1         A     2022-01-01
2         A     2022-01-02
3         B     2022-01-03
4         B     2022-01-04
5         A     2022-01-05
6         A     2022-01-06
7         B     2022-01-07
8         B     2022-01-08
9         A     2022-01-09
10        A     2022-01-10
11        A     2022-01-11
I have the table above, where the item changes from A to B and then from B to A (and so on).
The most recent item in the table, based on the date, is A (the last row).
I need to find the date on which this last item (A) started to be in effect.
In this case item A has been in effect from 2022-01-09 onwards (UniqueId 9).
How can I find the UniqueId or the date at which item A came into effect (row 9)?
Thank you.

with data as (
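select *,
-- an explicit full frame is needed; with the default frame last_value() just returns the current row's item
last_value(item) over (order by "date" rows between unbounded preceding and unbounded following) as last_item,
lag(item) over (order by "date") as prev_item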
from T
)
select
max(case when item = last_item and item <> prev_item then "date" end) as max_date
from data;
or
with data as (
select *,
case when item <> lag(item) over (order by "date")
and item = last_value(item) over (order by "date" rows between unbounded preceding and unbounded following)
then 1 end as flag
from T
)
select max("date") as last_transition_date
from data
where flag = 1;
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=bd5f6398c0167d74c26a67fafac5225e
Supposing you need all the data:
with data as (
select *,
case when item <> lag(item) over (order by "date")
and item = last_value(item) over (order by "date" rows between unbounded preceding and unbounded following)
then 1 end as flag
from T
)
select *,
max(case when flag = 1 then "date" end) over () as last_transition_date
from data;

Getting a flag by comparing the current item with the previous item in time, using LAG(), is indeed the way.
But it is entirely sufficient to take the highest date and the highest UniqueId (both are sorted ascending together) among the rows where the flag is 1:
WITH
-- your input
indata(UniqueId,ITEM,DATE) AS (
SELECT 1,'A',DATE '2022-01-01'
UNION ALL SELECT 2,'A',DATE '2022-01-02'
UNION ALL SELECT 3,'B',DATE '2022-01-03'
UNION ALL SELECT 4,'B',DATE '2022-01-04'
UNION ALL SELECT 5,'A',DATE '2022-01-05'
UNION ALL SELECT 6,'A',DATE '2022-01-06'
UNION ALL SELECT 7,'B',DATE '2022-01-07'
UNION ALL SELECT 8,'B',DATE '2022-01-08'
UNION ALL SELECT 9,'A',DATE '2022-01-09'
UNION ALL SELECT 10,'A',DATE '2022-01-10'
UNION ALL SELECT 11,'A',DATE '2022-01-11'
)
-- real query starts here; replace following comma with "WITH"
,
w_change_ind AS (
SELECT
*
, CASE WHEN LAG(item) OVER(ORDER BY date) <> item
THEN 1
ELSE 0
END AS chg_ind
FROM indata
)
SELECT
MAX(uniqueid) AS uqid
, MAX(date) AS dt
FROM w_change_ind
WHERE chg_ind=1
;
-- out uqid | dt
-- out ------+------------
-- out 9 | 2022-01-09
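For anyone who wants to try the LAG() approach without a database server, the sketch below replays it against an in-memory SQLite database from Python. This is an assumption-laden illustration: it relies on SQLite 3.25 or later for window functions, stores the dates as ISO text, and reuses the names indata and w_change_ind from the query above.

```python
import sqlite3

# The eleven rows from the question: ids 1..11 on consecutive January days.
items = "AABBAABBAAA"
rows = [(i, item, f"2022-01-{i:02d}") for i, item in enumerate(items, start=1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indata (UniqueId INTEGER, item TEXT, date TEXT)")
conn.executemany("INSERT INTO indata VALUES (?, ?, ?)", rows)

# Flag every row whose item differs from the previous row's item, then
# keep the latest flagged row. On the first row LAG() is NULL, so the
# comparison is NULL and the CASE falls through to 0.
query = """
WITH w_change_ind AS (
    SELECT UniqueId, item, date,
           CASE WHEN LAG(item) OVER (ORDER BY date) <> item THEN 1 ELSE 0 END AS chg_ind
    FROM indata
)
SELECT MAX(UniqueId) AS uqid, MAX(date) AS dt
FROM w_change_ind
WHERE chg_ind = 1
"""
last_change = conn.execute(query).fetchone()
print(last_change)
```

The latest change is by definition the change into the item that is still in effect, which is why no separate check against the last item is needed here.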

Based on your description, this is one way to do what you want.
select top 1 * from table1
where item ='A'
order by uniqueid desc
If this is not what you want, then you will have to provide additional information.

SQL: Running total count of distinct values

I'm trying to obtain a rolling number of unique values in a window.
Here's what my table looks like:
SELECT
user_id
, order_date
, product
FROM example_table
WHERE user_id = 1
ORDER BY order_date ASC
user_id  order_date  product
1        2021-01-01  A
1        2021-01-02  B
1        2021-01-04  A
1        2021-01-07  C
1        2021-01-09  C
1        2021-01-20  A
Here's what I'm trying to achieve:
user_id  order_date  product  cum_dist_count
1        2021-01-01  A        1
1        2021-01-02  B        2
1        2021-01-04  A        2
1        2021-01-07  C        3
1        2021-01-09  C        3
1        2021-01-20  A        3
In other words, I want to see how many unique items a customer has bought so far, and to see that for any particular date (so for the example above: on 2021-01-04 they had bought 2 unique items, and on 2021-01-07 that number was 3).
I've tried selecting user_id, product and min(order_date) in a grouped CTE, then applying ROW_NUMBER over that CTE. That worked partially: I can see the dates on which the count of unique products changed (for this example: 2021-01-01, 2021-01-02 and 2021-01-07), but I lose the rows "in between", which I still want to be able to access.
with cte as (
SELECT
user_id
, product
, min(order_date) as first_order
FROM example_table
GROUP BY 1,2
)
SELECT
user_id
, first_order
, product
, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY first_order) AS number_of_unique_products
FROM cte
WHERE user_id = 1
With the above, I would get:
user_id  order_date  product  cum_dist_count
1        2021-01-01  A        1
1        2021-01-02  B        2
1        2021-01-07  C        3
The DB is in BigQuery StandardSQL.
Any help is much appreciated!

For each item, you can record the earliest date it appears. Then add those up:
select et.* except (seqnum),
countif(seqnum = 1) over (partition by user_id order by order_date) as running_distinct_count
from (select et.*,
row_number() over (partition by user_id, product order by order_date) as seqnum
from example_table et
) et
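The "mark the first appearance, then add the marks up" idea can be checked on the question's six rows. The sketch below is an illustration under assumptions: it uses Python with an in-memory SQLite database (3.25 or later) instead of BigQuery, and since SQLite has no COUNTIF it swaps in SUM(CASE ...) as the running counter; the alias first_seen is a name chosen here.

```python
import sqlite3

orders = [
    (1, "2021-01-01", "A"),
    (1, "2021-01-02", "B"),
    (1, "2021-01-04", "A"),
    (1, "2021-01-07", "C"),
    (1, "2021-01-09", "C"),
    (1, "2021-01-20", "A"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_table (user_id INTEGER, order_date TEXT, product TEXT)")
conn.executemany("INSERT INTO example_table VALUES (?, ?, ?)", orders)

# seqnum = 1 marks the first time a user buys a product; a running sum of
# those marks is the number of distinct products bought so far.
query = """
SELECT user_id, order_date, product,
       SUM(CASE WHEN seqnum = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY user_id ORDER BY order_date) AS running_distinct_count
FROM (
    SELECT et.*,
           ROW_NUMBER() OVER (PARTITION BY user_id, product ORDER BY order_date) AS seqnum
    FROM example_table et
) AS first_seen
ORDER BY user_id, order_date
"""
running = conn.execute(query).fetchall()
for row in running:
    print(row)
```

Every input row is kept, so the rows "in between" the changes stay available, unlike the grouped CTE attempt in the question.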

Below is for BigQuery
select * except(cum_products),
(select count(distinct product) from t.cum_products product) as cum_dist_count
from (
select *,
array_agg(product) over prev_rows as cum_products
from example_table
window prev_rows as (partition by user_id order by order_date)
) t
If applied to the sample data in your question:
with example_table as (
select 1 user_id, '2021-01-01' order_date, 'A' product union all
select 1, '2021-01-02', 'B' union all
select 1, '2021-01-04', 'A' union all
select 1, '2021-01-07', 'C' union all
select 1, '2021-01-09', 'C' union all
select 1, '2021-01-20', 'A'
)
the cum_dist_count values are 1, 2, 2, 3, 3, 3, matching the expected result.

PostgreSQL Pivot by Last Date

I need to make a pivot table from a source table like this:
FactID UserID Date Product QTY
1 11 01/01/2020 A 600
2 11 02/01/2020 A 400
3 11 03/01/2020 B 500
4 11 04/01/2020 B 200
6 22 06/01/2020 A 1000
7 22 07/01/2020 A 200
8 22 08/01/2020 B 300
9 22 09/01/2020 B 100
I need a pivot like this, where each product column holds the QTY from the last date:
UserID A B
11 400 200
22 200 100
My attempt in PostgreSQL:
Select
UserID,
MAX(CASE WHEN Product='A' THEN QTY END) AS "A",
MAX(CASE WHEN Product='B' THEN QTY END) AS "B"
FROM table
GROUP BY UserID
And the result:
UserID A B
11 600 500
22 1000 300
That is, I get the result for the maximum QTY and not for the maximum date!
What do I need to add to get the results for the maximum (last) date?

Postgres doesn't have "first" and "last" aggregation functions. One method for doing this (without a subquery) uses arrays:
select userid,
(array_agg(qty order by date desc) filter (where product = 'A'))[1] as a,
(array_agg(qty order by date desc) filter (where product = 'B'))[1] as b
from tab
group by userid;
Another method uses select distinct with first_value():
select distinct userid,
first_value(qty) over (partition by userid order by product = 'A' desc, date desc) as a,
first_value(qty) over (partition by userid order by product = 'B' desc, date desc) as b
from tab;
With the appropriate indexes, though, distinct on might be the fastest approach:
select userid,
max(qty) filter (where product = 'A') as a,
max(qty) filter (where product = 'B') as b
from (select distinct on (userid, product) t.*
from tab t
order by userid, product, date desc
) t
group by userid;
In particular, this can use an index on (userid, product, date desc). The improvement in performance will be most notable if there are many dates for a given user.

You can use DENSE_RANK() window function in order to filter by the last date per each product and UserID before applying conditional aggregation such as
SELECT UserID,
MAX(CASE WHEN Product='A' THEN QTY END) AS "A",
MAX(CASE WHEN Product='B' THEN QTY END) AS "B"
FROM
(
SELECT t.*, DENSE_RANK() OVER (PARTITION BY Product,UserID ORDER BY Date DESC) AS rn
FROM tab t
) q
WHERE rn = 1
GROUP BY UserID
This presumes that all date values are distinct (no ties occur for dates).
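Here is a small runnable sketch of the "rank by date, keep rank 1, then pivot" approach on the question's data. It is an illustration under assumptions: it uses Python with an in-memory SQLite database (3.25 or later) rather than PostgreSQL, uses ROW_NUMBER() so that exactly one row per user and product survives, and stores the dates as ISO text, reading the question's dates as DD/MM/YYYY.

```python
import sqlite3

facts = [
    (1, 11, "2020-01-01", "A", 600),
    (2, 11, "2020-01-02", "A", 400),
    (3, 11, "2020-01-03", "B", 500),
    (4, 11, "2020-01-04", "B", 200),
    (6, 22, "2020-01-06", "A", 1000),
    (7, 22, "2020-01-07", "A", 200),
    (8, 22, "2020-01-08", "B", 300),
    (9, 22, "2020-01-09", "B", 100),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tab (FactID INTEGER, UserID INTEGER, Date TEXT, Product TEXT, QTY INTEGER)"
)
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?, ?)", facts)

# rn = 1 is the latest row per (UserID, Product); after that filter each
# MAX(CASE ...) sees exactly one candidate value per column.
query = """
SELECT UserID,
       MAX(CASE WHEN Product = 'A' THEN QTY END) AS A,
       MAX(CASE WHEN Product = 'B' THEN QTY END) AS B
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY UserID, Product ORDER BY Date DESC) AS rn
    FROM tab t
) AS q
WHERE rn = 1
GROUP BY UserID
ORDER BY UserID
"""
last_qty = conn.execute(query).fetchall()
for row in last_qty:
    print(row)
```

With ROW_NUMBER() a tie on the date would be broken arbitrarily, whereas DENSE_RANK() keeps all tied rows and lets MAX pick the larger QTY; which behaviour is preferable depends on the data.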

Select except where different in SQL

I need a bit of help with a SQL query.
Imagine I've got the following table
id | date | price
1 | 1999-01-01 | 10
2 | 1999-01-01 | 10
3 | 2000-02-02 | 15
4 | 2011-03-03 | 15
5 | 2011-04-04 | 16
6 | 2011-04-04 | 20
7 | 2017-08-15 | 20
What I need is all dates where only one price is present.
In this example I need to get rid of rows 5 and 6 (because there are two different prices for the same date) and either row 1 or row 2 (because they are duplicates).
How do I do that?

select date,
count(distinct price) as prices -- included to test
from MyTable
group by date
having count(distinct price) = 1 -- distinct for the duplicate pricing
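The HAVING filter can be tried on the question's seven rows with the sketch below. It assumes Python with an in-memory SQLite database; the table name MyTable comes from the query above, and MIN(price) is added here only to show the single price that each surviving date has.

```python
import sqlite3

prices = [
    (1, "1999-01-01", 10),
    (2, "1999-01-01", 10),
    (3, "2000-02-02", 15),
    (4, "2011-03-03", 15),
    (5, "2011-04-04", 16),
    (6, "2011-04-04", 20),
    (7, "2017-08-15", 20),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, date TEXT, price INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", prices)

# Keep only dates that carry exactly one distinct price; duplicate rows
# for the same date and price collapse into a single output row.
query = """
SELECT date, MIN(price) AS price
FROM MyTable
GROUP BY date
HAVING COUNT(DISTINCT price) = 1
ORDER BY date
"""
single_price_dates = conn.execute(query).fetchall()
for row in single_price_dates:
    print(row)
```

The date 2011-04-04 drops out because it has two prices, and 1999-01-01 appears once even though it has two rows.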

The following should work with any DBMS
SELECT id, date, price
FROM TheTable o
WHERE NOT EXISTS (
SELECT *
FROM TheTable i
WHERE i.date = o.date
AND (
i.price <> o.price
OR (i.price = o.price AND i.id < o.id)
)
)
;
JohnHC's answer is more readable and delivers the information the OP asked for ("[...] I need all the dates [...]").
My answer, though less readable at first, is more general (it allows for more complex tie-breaking criteria) and can also return the full row (with id and price, not just date).

;WITH CTE_1(ID ,DATE,PRICE)
AS
(
SELECT 1 , '1999-01-01',10 UNION ALL
SELECT 2 , '1999-01-01',10 UNION ALL
SELECT 3 , '2000-02-02',15 UNION ALL
SELECT 4 , '2011-03-03',15 UNION ALL
SELECT 5 , '2011-04-04',16 UNION ALL
SELECT 6 , '2011-04-04',20 UNION ALL
SELECT 7 , '2017-08-15',20
)
,CTE2
AS
(
SELECT A.*
FROM CTE_1 A
INNER JOIN
CTE_1 B
ON A.DATE=B.DATE AND A.PRICE!=B.PRICE
)
SELECT * FROM CTE_1 WHERE ID NOT IN (SELECT ID FROM CTE2)
