convert rows into columns - Bigquery

I have a table like the one shown below.
It has two rows for the same subject, each row representing one day.
However, I want to convert them into a single row, as shown below.
Can you help? I checked this post but was unable to translate it to my case.

Let's first transform your original data into a form that we can then pivot.
The query below does this:
#standardSQL
SELECT subject_id, hm_id, icu_id, balance,
DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table`
-- ORDER BY subject_id, hm_id, icu_id, delta
Applied to the sample data from your question, the result is:
Row subject_id hm_id icu_id balance delta
1 124 ab cd 2 1
2 124 ab cd 5 2
3 321 xy pq -6 1
4 321 xy pq 1 2
Now we need to pivot this on the delta column: the balance for delta = 1 goes to day_1_balance, the balance for delta = 2 goes to day_2_balance, and so on.
For now, let's assume there are just two deltas (as in your sample data). In this simplified case, the query below does the trick:
#standardSQL
SELECT subject_id, hm_id, icu_id,
MAX(IF(delta = 1, balance, NULL)) day_1_balance,
MAX(IF(delta = 2, balance, NULL)) day_2_balance
FROM (
SELECT subject_id, hm_id, icu_id, balance,
DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table`
)
GROUP BY subject_id, hm_id, icu_id
-- ORDER BY subject_id, hm_id, icu_id
with the result:
Row subject_id hm_id icu_id day_1_balance day_2_balance
1 124 ab cd 2 5
2 321 xy pq -6 1
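To make the two-step logic concrete, here is a minimal Python sketch of the same idea: compute delta as the day offset from each subject's first day, then spread balance into one column per delta. The `rows` sample and the `pivot_by_delta` helper are illustrative assumptions (the dates are invented, only the balances follow the sample results above); this is not BigQuery code.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows: (subject_id, hm_id, icu_id, day, balance).
# The dates are invented; only their spacing matters for delta.
rows = [
    (124, "ab", "cd", date(2020, 1, 1), 2),
    (124, "ab", "cd", date(2020, 1, 2), 5),
    (321, "xy", "pq", date(2020, 3, 5), -6),
    (321, "xy", "pq", date(2020, 3, 6), 1),
]

def pivot_by_delta(rows):
    # First day per key, like MIN(day) OVER (PARTITION BY subject_id, hm_id, icu_id).
    first_day = {}
    for subject_id, hm_id, icu_id, day, _ in rows:
        key = (subject_id, hm_id, icu_id)
        first_day[key] = min(first_day.get(key, day), day)

    # One column per delta, like MAX(IF(delta = n, balance, NULL)).
    pivoted = defaultdict(dict)
    for subject_id, hm_id, icu_id, day, balance in rows:
        key = (subject_id, hm_id, icu_id)
        delta = (day - first_day[key]).days + 1
        column = f"day_{delta}_balance"
        previous = pivoted[key].get(column)
        pivoted[key][column] = balance if previous is None else max(previous, balance)
    return dict(pivoted)

result = pivot_by_delta(rows)
```

Because the columns are discovered while iterating, the sketch copes with any number of deltas, which is exactly what the fixed two-column SQL cannot do.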
Obviously, in a real case you don't know in advance how many delta columns there are, so you need to build the above query dynamically, and that is exactly where the post you referenced will help you.
You can try again by yourself, or see below for the final solution.
Step 1 - generating query
#standardSQL
WITH temp AS (
SELECT subject_id, hm_id, icu_id, balance,
DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table`
)
SELECT CONCAT('SELECT subject_id, hm_id, icu_id,',
STRING_AGG(
CONCAT(' MAX(IF(delta = ',CAST(delta AS STRING),', balance, NULL)) as day_',CAST(delta AS STRING),'_balance')
ORDER BY delta -- aggregate in delta order; the subquery's ORDER BY alone does not guarantee it
)
,' FROM temp GROUP BY subject_id, hm_id, icu_id ORDER BY subject_id, hm_id, icu_id')
FROM (
SELECT delta
FROM temp
GROUP BY delta
ORDER BY delta
)
The result of step 1 is the text of the final query, which you then run as step 2.
Step 2 - run generated query
#standardSQL
WITH temp AS (
SELECT subject_id, hm_id, icu_id, balance,
DATE_DIFF(day, MIN(day) OVER(PARTITION BY subject_id, hm_id, icu_id), DAY) + 1 delta
FROM `project.dataset.table`
)
SELECT subject_id, hm_id, icu_id,
MAX(IF(delta = 1, balance, NULL)) AS day_1_balance,
MAX(IF(delta = 2, balance, NULL)) AS day_2_balance
FROM temp
GROUP BY subject_id, hm_id, icu_id
-- ORDER BY subject_id, hm_id, icu_id
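If the query text is assembled on the client side rather than in SQL, the generation step might look like the following Python sketch. The function name `build_pivot_query` is hypothetical, and it assumes the list of distinct deltas has already been fetched (for example with a `SELECT DISTINCT delta` query); the `temp` CTE still has to be prepended before running the result.

```python
def build_pivot_query(deltas):
    # Deduplicate and sort so the columns come out as day_1, day_2, ...
    # int() guards against anything but whole numbers ending up in the SQL text.
    columns = ", ".join(
        f"MAX(IF(delta = {int(d)}, balance, NULL)) AS day_{int(d)}_balance"
        for d in sorted(set(deltas))
    )
    return (
        "SELECT subject_id, hm_id, icu_id, "
        + columns
        + " FROM temp GROUP BY subject_id, hm_id, icu_id"
    )

query = build_pivot_query([2, 1, 2])
print(query)
```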

Related

Getting category based on production shift

I have this query
with cte as(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY seq ORDER BY date_time) rn1,
ROW_NUMBER() OVER (PARTITION BY seq, output > 0
ORDER BY date_time) rn2
FROM myTable
)
select
seq,
date_time::date,
MIN(date_time) AS MinDatetime,
MAX(date_time) AS MaxDatetime,
SUM(output) AS sum_output
FROM cte cte
GROUP by
seq,
date_time::date ,
cntpr > 0,
rn1 - rn2
ORDER BY
seq,
MIN(date_time);
Here's the result:
What I would like to do is join my result to this master table,
so that MinDatetime and MaxDatetime are matched against the master table's start and end of shift and the shift information is shown.
Any help would be much appreciated. Thank you!

This is the solution I came up with:
select seq, shift, start_shift, end_shift, MinDateTime, MaxDateTime
from
(
select
seq,
MIN(date_time) AS MinDatetime,
MAX(date_time) AS MaxDatetime,
SUM(output) AS sum_output
FROM cte cte
GROUP by
seq
ORDER BY
seq,
MIN(date_time::date)) t
join mstr
on
CASE
WHEN start_shift < end_shift THEN (MinDateTime::time between start_shift and end_shift) OR (MaxDateTime::time between start_shift and end_shift)
ELSE (MinDateTime::time >= start_shift) OR
(MaxDateTime::time >= start_shift) OR
(MinDateTime::time <= end_shift) OR
(MaxDateTime::time <= end_shift)
END
ORDER BY seq;
Fiddle: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/4208
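Explanation: I compute the groups, then join them to the master table by matching each group's start and end times against the shift intervals. The ELSE branch handles shifts that cross midnight (start_shift > end_shift).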
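The join condition above boils down to one test: does a time of day fall inside a shift, where the shift may wrap past midnight? A minimal Python sketch of that test follows. The `shifts` list is a hypothetical master table (the real one is only shown as an image in the question), and the boundaries are inclusive, like BETWEEN.

```python
from datetime import time

def in_shift(t, start, end):
    # Regular shift: start and end fall on the same day.
    if start < end:
        return start <= t <= end
    # Overnight shift: the interval wraps past midnight.
    return t >= start or t <= end

# Hypothetical master table: (shift name, start_shift, end_shift).
shifts = [
    ("A", time(6, 0), time(14, 0)),
    ("B", time(14, 0), time(22, 0)),
    ("C", time(22, 0), time(6, 0)),
]

def shifts_for(min_time, max_time):
    # A group matches a shift when either of its endpoints falls inside it.
    return [
        name
        for name, start, end in shifts
        if in_shift(min_time, start, end) or in_shift(max_time, start, end)
    ]

print(shifts_for(time(23, 30), time(1, 15)))
```

Note that a group whose endpoints fall in two different shifts matches both of them, just as the SQL join would return two rows.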

Partitioning on non-unique values

I have a table that lists events, the operations in each event, and the time of each operation. The event ID is not unique, because the same event happens at different times. Operations may differ between occurrences of the same type of event. The same event never runs twice in a row.
I want to populate three new columns as per given example. This will allow me to run analysis on the separate events as I'll be able to generate a unique "Event" ID.
Edit:
I already tried partitioning by event, but it did not work: SQL Server sees only two events (A and B) and therefore gives the same start date to all "A" events, even though I need to show them as separate events with different start dates.
Thank you!

This is just window functions:
select t.*,
min(operationtime) over (partition by event) as event_start_time,
max(operationtime) over (partition by event) as event_end_time,
concat(event, '-', min(operationtime) over (partition by event)) as event_id
from t;
Actually, for the event id, you probably want something like:
concat(event, '-', convert(varchar(255), min(operationtime) over (partition by event), 101)) as event_id
or whatever format for the date you really want. I recommend YYYY-MM-DD as a date format.

I understand this as a gaps-and-islands problem, where you want to build groups of consecutive daily events.
One option uses the difference between row numbers to identify the groups:
select
t.*,
min(operation_time) over(partition by event, rn1 - rn2) event_start_time,
max(operation_time) over(partition by event, rn1 - rn2) event_end_time,
concat(event, '-', min(operation_time) over(partition by event, rn1 - rn2)) event_id
from (
select
t.*,
row_number() over(order by operation_time) rn1,
row_number() over(partition by event order by operation_time) rn2
from mytable t
) t
order by operation_time
If there is always one and only one event per day, as shown in your sample data, then one row_number() is sufficient, along with date arithmetic:
select
t.*,
min(operation_time) over(partition by event, grp) event_start_time,
max(operation_time) over(partition by event, grp) event_end_time,
concat(event, '-', min(operation_time) over(partition by event, grp)) event_id
from (
select
t.*,
dateadd(
day,
- row_number() over(partition by event order by operation_time),
operation_time
) grp
from mytable t
) t
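The row-number difference trick can be hard to picture, so here is a small Python sketch of it. The `events` sample and the `label_islands` helper are assumptions for illustration; the dates are ISO strings, so plain string comparison orders them correctly.

```python
from collections import defaultdict

# Hypothetical sample: (event, operation_time as an ISO date string).
events = [
    ("A", "2020-01-08"), ("A", "2020-02-08"),
    ("B", "2020-03-08"), ("B", "2020-04-08"),
    ("A", "2020-05-08"), ("A", "2020-06-08"),
]

def label_islands(rows):
    ordered = sorted(rows, key=lambda r: r[1])
    seen = defaultdict(int)
    keyed = []
    for rn1, (event, ts) in enumerate(ordered, start=1):
        seen[event] += 1  # rn2: row number within the event
        # rn1 - rn2 stays constant while the same event keeps repeating.
        keyed.append(((event, rn1 - seen[event]), ts))

    # MIN/MAX over (partition by event, rn1 - rn2).
    bounds = {}
    for key, ts in keyed:
        lo, hi = bounds.get(key, (ts, ts))
        bounds[key] = (min(lo, ts), max(hi, ts))
    return [(key[0], ts, *bounds[key]) for key, ts in keyed]

labelled = label_islands(events)
```

Each output tuple is (event, operation_time, event_start_time, event_end_time); the two runs of "A" get different start times because their row-number differences are 0 and 2.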

This approach creates the event group explicitly, then it uses a windowing query very similar to the other answers. I created a simple sample table to show results.
Data
drop table if exists #tTEST;
go
select * INTO #tTEST from (values
('A', 'X', '2020-01-08'),
('A', 'Z', '2020-02-08'),
('B', 'X', '2020-03-08'),
('B', 'Z', '2020-04-08'),
('A', 'X', '2020-05-08'),
('A', 'Z', '2020-06-08')) V([Event], [Operation], operation_time);
Query
;with
grp_cte as (
select t.*, case when lag([Event], 1, 0) over (order by operation_time) != [Event] then 1 else 0 end grp_ind
from #tTEST t),
event_grp_cte as (
select gc.*, sum(grp_ind) over (order by operation_time) EventGroup
from grp_cte gc)
select
t.*,
min(operation_time) over(partition by EventGroup) event_start_time,
max(operation_time) over(partition by EventGroup) event_end_time,
concat(event, '-', min(operation_time) over(partition by EventGroup)) event_id
from event_grp_cte t
order by operation_time;
Results
Event Operation operation_time grp_ind EventGroup rn1 rn2 event_start_time event_end_time event_id
A X 2020-01-08 1 1 1 1 2020-01-08 2020-02-08 A-2020-01-08
A Z 2020-02-08 0 1 2 2 2020-01-08 2020-02-08 A-2020-01-08
B X 2020-03-08 1 2 3 1 2020-03-08 2020-04-08 B-2020-03-08
B Z 2020-04-08 0 2 4 2 2020-03-08 2020-04-08 B-2020-03-08
A X 2020-05-08 1 3 5 3 2020-05-08 2020-06-08 A-2020-05-08
A Z 2020-06-08 0 3 6 4 2020-05-08 2020-06-08 A-2020-05-08
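The two CTEs above (a change flag from LAG, then a running SUM of that flag) amount to the following loop. This is only a sketch in Python: `event_groups` is a hypothetical helper and it assumes the events are already ordered by operation_time.

```python
def event_groups(events_in_time_order):
    groups = []
    current_group = 0
    previous = None
    for event in events_in_time_order:
        # grp_ind = 1 whenever the event differs from the previous row (LAG).
        if event != previous:
            current_group += 1  # running SUM(grp_ind)
        groups.append(current_group)
        previous = event
    return groups

print(event_groups(["A", "A", "B", "B", "A", "A"]))
```

For the sample data this gives the same numbering as the EventGroup column in the results above.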

Interview question: How to get last 3 month aggregation at column level?

This is a question I was asked at an Apple onsite interview, and it blew my mind. The data looks like this:
orderdate,unit_of_phone_sale
20190806,3000
20190704,3789
20190627,789
20190503,666
20190402,765
I had to write a query that returns, for each month's sales, the sales values of the last 3 months. Here is the expected output:
order_month,M-1_Sale, M-2_Sale, M-3_Sale
201908,3000,3789,789,666
201907,3789,789,666,765
201906,789,666,765,0
201905,666,765,0,0
201904,765,0,0
I could only get the month-wise sales, and I used a CASE statement with hardcoded months, which was wrong. I banged my head trying to write this SQL, but I could not.
Can anyone help with this? It would really help me prepare for SQL interviews.
Update: This is what I tried:
with abc as(
select to_char(order_date,'YYYYMM') as yearmonth,to_char(order_date,'YYYY') as year,to_char(order_date,'MM') as month, sum(unit_of_phone_sale) as unit_sale
from t1 group by to_char(order_date,'YYYYMM'),to_char(order_date,'YYYY'),to_char(order_date,'MM'))
select yearmonth, year, case when month=01 then unit_sale else 0 end as M1_Sale,
case when month=02 then unit_sale else 0 end as M2_Sale...
case when month=12 then unit_sale else 0 end as M12_Sale
from abc

First, sum the data per month, then use the LAG function to get the previous months' values, as follows:
SELECT
ORDER_MONTH,
LAG(UNIT_OF_PHONE_SALE, 1) OVER(
ORDER BY
ORDER_MONTH
) AS "M-1_Sale",
LAG(UNIT_OF_PHONE_SALE, 2) OVER(
ORDER BY
ORDER_MONTH
) AS "M-2_Sale",
LAG(UNIT_OF_PHONE_SALE, 3) OVER(
ORDER BY
ORDER_MONTH
) AS "M-3_Sale"
FROM
(
SELECT
TO_CHAR(ORDERDATE, 'YYYYMM') AS ORDER_MONTH,
SUM(UNIT_OF_PHONE_SALE) AS UNIT_OF_PHONE_SALE
FROM
DATAA
GROUP BY
TO_CHAR(ORDERDATE, 'YYYYMM')
)
ORDER BY
ORDER_MONTH DESC;
Output:
ORDER_ M-1_Sale M-2_Sale M-3_Sale
------ ---------- ---------- ----------
201908 3789 789 666
201907 789 666 765
201906 666 765
201905 765
201904
Cheers!!
-- Update --
For the requirement mentioned in the comments (calendar months with no sales should count as 0), the following query will work:
WITH CTE AS (
SELECT
TRUNC(ORDERDATE, 'MONTH') AS ORDER_MONTH,
SUM(UNIT_OF_PHONE_SALE) AS UNIT_OF_PHONE_SALE
FROM
DATAA
GROUP BY
TRUNC(ORDERDATE, 'MONTH')
)
SELECT
TO_CHAR(C.ORDER_MONTH,'YYYYMM') as ORDER_MONTH,
NVL(C1.UNIT_OF_PHONE_SALE, 0) AS "M-1_Sale",
NVL(C2.UNIT_OF_PHONE_SALE, 0) AS "M-2_Sale",
NVL(C3.UNIT_OF_PHONE_SALE, 0) AS "M-3_Sale"
FROM
CTE C
LEFT JOIN CTE C1 ON ( C1.ORDER_MONTH = ADD_MONTHS(C.ORDER_MONTH, - 1) )
LEFT JOIN CTE C2 ON ( C2.ORDER_MONTH = ADD_MONTHS(C.ORDER_MONTH, - 2) )
LEFT JOIN CTE C3 ON ( C3.ORDER_MONTH = ADD_MONTHS(C.ORDER_MONTH, - 3) )
ORDER BY
C.ORDER_MONTH DESC
Cheers!!
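The difference between the two queries is that LAG looks at the previous rows, while the self-join looks at the previous calendar months, so a month with no sales shows up as 0 instead of being skipped. Here is a rough Python sketch of the calendar-month variant, using the sample data from the question; `month_minus` and `trailing_sales` are hypothetical helper names.

```python
from collections import defaultdict

# Sample data from the question: (orderdate as YYYYMMDD, unit_of_phone_sale).
orders = [
    ("20190806", 3000), ("20190704", 3789), ("20190627", 789),
    ("20190503", 666), ("20190402", 765),
]

def month_minus(yyyymm, n):
    # Step back n calendar months, like ADD_MONTHS(order_month, -n).
    year, month = int(yyyymm[:4]), int(yyyymm[4:])
    index = year * 12 + (month - 1) - n
    return f"{index // 12:04d}{index % 12 + 1:02d}"

def trailing_sales(orders, lookback=3):
    # Sum per month first, like the GROUP BY in the CTE.
    monthly = defaultdict(int)
    for orderdate, units in orders:
        monthly[orderdate[:6]] += units
    # Missing months fall back to 0, like NVL(..., 0) after the LEFT JOIN.
    return {
        m: [monthly.get(month_minus(m, k), 0) for k in range(1, lookback + 1)]
        for m in sorted(monthly, reverse=True)
    }

sales = trailing_sales(orders)
```

Each value is the list [M-1, M-2, M-3] for that month.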

I think the LEAD function can help here:
SELECT TO_CHAR(orderdate, 'YYYYMM') "DATE"
,unit_of_phone_sale M_1_Sale
,LEAD(unit_of_phone_sale,1,0) OVER(ORDER BY TO_CHAR(orderdate, 'YYYYMM') DESC) M_2_Sale
,LEAD(unit_of_phone_sale,2,0) OVER(ORDER BY TO_CHAR(orderdate, 'YYYYMM') DESC) M_3_Sale
,LEAD(unit_of_phone_sale,3,0) OVER(ORDER BY TO_CHAR(orderdate, 'YYYYMM') DESC) M_4_Sale
FROM table_sales

You can use this query:
select a.order_month, a.unit_of_phone_sale,
LEAD(unit_of_phone_sale, 1, 0) OVER (ORDER BY rownum) AS M_1,
LEAD(unit_of_phone_sale, 2, 0) OVER (ORDER BY rownum) AS M_2,
LEAD(unit_of_phone_sale, 3, 0) OVER (ORDER BY rownum) AS M_3
from (
select TO_CHAR(orderdate, 'YYYYMM') order_month,
unit_of_phone_sale,
rownum
from Y
order by order_month desc) a

Max dates for each sequence within partitions

I would like to see if somebody has an idea how to get the max and min dates within each 'id' using the 'row_num' column as an indicator when the sequence starts/ends in SQL Server 2016.
The screenshot below shows the desired output in columns 'min_date' and 'max_date'.
Any help would be appreciated.

You could use windowed MIN/MAX:
WITH cte AS (
SELECT *,SUM(CASE WHEN row_num > 1 THEN 0 ELSE 1 END)
OVER(PARTITION BY id, cat ORDER BY date_col) AS grp
FROM tab
)
SELECT *, MIN(date_col) OVER(PARTITION BY id, cat, grp) AS min_date,
MAX(date_col) OVER(PARTITION BY id, cat, grp) AS max_date
FROM cte
ORDER BY id, date_col, cat;
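The key step in this query is the running sum that increases by one each time row_num restarts, which numbers the sequences. A short Python sketch of that step for a single (id, cat) partition is shown here; the `rows` sample is invented and `sequence_bounds` is a hypothetical helper.

```python
from itertools import accumulate

# Hypothetical rows of one (id, cat) partition, ordered by date_col:
# (date_col as an ISO string, row_num).
rows = [
    ("2020-01-01", 1), ("2020-01-02", 2),
    ("2020-01-05", 1), ("2020-01-06", 2), ("2020-01-07", 3),
]

def sequence_bounds(rows):
    # Running SUM(CASE WHEN row_num > 1 THEN 0 ELSE 1 END): a new grp per restart.
    grp = list(accumulate(0 if row_num > 1 else 1 for _, row_num in rows))
    # MIN/MAX(date_col) per grp.
    bounds = {}
    for (date_col, _), g in zip(rows, grp):
        lo, hi = bounds.get(g, (date_col, date_col))
        bounds[g] = (min(lo, date_col), max(hi, date_col))
    return [(date_col, row_num, *bounds[g]) for (date_col, row_num), g in zip(rows, grp)]

bounded = sequence_bounds(rows)
```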

Try something like
SELECT
Q1.id, Q1.cat,
MIN(Q1.[date]) AS min_dat,
MAX(Q1.[date]) AS max_dat
FROM
(SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id, cat ORDER BY [date]) AS r1,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY [date]) AS r2
FROM tab -- table name assumed; the original snippet omitted the FROM clause
) AS Q1
GROUP BY
Q1.id, Q1.cat, Q1.r2 - Q1.r1

sql server - returning text for multiple vendors based on average values

I have a table giving ratings for various suppliers/areas. The format is below.
I would like to know, for each distinct month, and supplier (exp_id)
What was the highest and lowest rated pickup_ward_text
My expected output is similar to:
[year][month][exp_id] [highest rated pickup_ward] [lowest rated pickup_ward]
Where the 'rated' is an average of rating_driver, rating_punctuality & rating_vehicle
I am completely lost on how to achieve this. I have tried to paste the first line of the table below.
Year Month exp_id RATING_DRIVER RATING_PUNCTUALITY RATING_VEHICLE booking_id pickup_date ratingjobref rating_date PICKUP_WARD_TEXT
2013 10 4 5.00 5.00 5.00 1559912 30:00.0 1559912 12/10/2013 18:29 N4

There's a common pattern using row_number() to find either the minimum or the maximum. You can combine them with a little trickery:
select
year,
month,
exp_id,
max(case rn1 when 1 then pickup_ward_text end) as min_pickup_ward_text,
max(case rn2 when 1 then pickup_ward_text end) as max_pickup_ward_text
from (
select
year,
month,
exp_id,
pickup_ward_text,
row_number() over (
partition by year, month, exp_id
order By rating_driver + rating_punctuality + rating_vehicle
) rn1,
row_number() over (
partition by year, month, exp_id
order By rating_driver + rating_punctuality + rating_vehicle desc
) rn2
from
mytable
) x
where
rn1 = 1 or rn2 = 1 -- this line isn't necessary, but might make things quicker
group by
year,
month,
exp_id
order by
year,
month,
exp_id
It may actually be faster to use two derived tables, one for each part, and inner join them. Some testing is in order:
select
n.year,
n.month,
n.exp_id,
n.pickup_ward_text as min_pickup_ward_text,
x.pickup_ward_text as max_pickup_ward_text
from (
select
year,
month,
exp_id,
pickup_ward_text,
row_number() over (
partition by year, month, exp_id
order By rating_driver + rating_punctuality + rating_vehicle
) rn
from
mytable
) n
inner join (
select
year,
month,
exp_id,
pickup_ward_text,
row_number() over (
partition by year, month, exp_id
order By rating_driver + rating_punctuality + rating_vehicle desc
) rn
from
mytable
) x
on n.year = x.year and n.month = x.month and n.exp_id = x.exp_id
where
n.rn = 1 and
x.rn = 1
order by
n.year,
n.month,
n.exp_id
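Both queries above rank individual rating rows by the sum of the three ratings. If the intent is to rank wards by their average rating within each month and supplier, as the question describes, the logic might look like this Python sketch. It is a different technique from the queries above (it averages per ward first), and the `ratings` sample and the `best_and_worst_wards` name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (year, month, exp_id, rating_driver,
# rating_punctuality, rating_vehicle, pickup_ward_text).
ratings = [
    (2013, 10, 4, 5.0, 5.0, 5.0, "N4"),
    (2013, 10, 4, 3.0, 4.0, 2.0, "E1"),
    (2013, 10, 4, 4.0, 4.0, 4.0, "N4"),
    (2013, 10, 7, 2.0, 3.0, 1.0, "SW9"),
]

def best_and_worst_wards(ratings):
    # Accumulate [sum of per-booking averages, booking count] per ward and group.
    totals = defaultdict(lambda: [0.0, 0])
    for year, month, exp_id, driver, punctuality, vehicle, ward in ratings:
        entry = totals[(year, month, exp_id, ward)]
        entry[0] += (driver + punctuality + vehicle) / 3
        entry[1] += 1

    # Average rating per ward within each (year, month, exp_id).
    by_group = defaultdict(dict)
    for (year, month, exp_id, ward), (total, count) in totals.items():
        by_group[(year, month, exp_id)][ward] = total / count

    # (highest rated ward, lowest rated ward) per group.
    return {
        group: (max(wards, key=wards.get), min(wards, key=wards.get))
        for group, wards in by_group.items()
    }

summary = best_and_worst_wards(ratings)
```

In SQL the same idea would need a GROUP BY on the ward with AVG before the row_number() ranking; ties are broken arbitrarily in both versions.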
