To pivot a table based on a specific event value using Query


I want to turn Table A into Table B.
I'd like to see which events each user triggered before each purchase event.
I've tried row_number() over (partition by client_id, event_type order by time), but that only gives me a plain pivot. How do I write this logic?
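Table A
+-----------+------------+-------+-------------+
| client_id | event_type | count | time        |
+-----------+------------+-------+-------------+
| A         | cart       | 1     | AM 12:00:00 |
| A         | view       | 4     | AM 12:01:00 |
| A         | purchase   | 2     | AM 12:05:00 |
| A         | view       | 2     | AM 12:10:00 |
| B         | view       | 3     | AM 12:03:00 |
| B         | purchase   | 1     | AM 12:05:00 |
| B         | view       | 2     | AM 12:10:00 |
+-----------+------------+-------+-------------+
Table B
+-----------+------+------+----------+
| client_id | view | cart | purchase |
+-----------+------+------+----------+
| A         | 4    | 1    | 2        |
| A         | 2    | 0    | 0        |
| B         | 3    | 0    | 1        |
| B         | 2    | 0    | 0        |
+-----------+------+------+----------+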

Here is one way of doing this. In the grp_split block I mark where each "session/activity" that ends in a purchase begins: a row gets a group number when it is the client's first row or comes right after a purchase, and every other row gets NULL.
Then, in the block x, I complete the grouping by replacing each NULL with the last non-null value before it, using max(grp) over (partition by client_id order by time1) as grp2.
After that, it's just a matter of pivoting the view, cart and purchase columns.
with data
as (
select 'A' as client_id,'cart' as event_type , 1 as count1, cast('AM 12:00:00' as time) as time1 union all
select 'A' as client_id,'view' as event_type , 4 as count1, cast('AM 12:01:00' as time) as time1 union all
select 'A' as client_id,'purchase' as event_type , 2 as count1, cast('AM 12:05:00' as time) as time1 union all
select 'A' as client_id,'view' as event_type , 2 as count1, cast('AM 12:10:00' as time) as time1 union all
select 'B' as client_id,'view' as event_type , 3 as count1, cast('AM 12:03:00' as time) as time1 union all
select 'B' as client_id,'purchase' as event_type , 1 as count1, cast('AM 12:05:00' as time) as time1 union all
select 'B' as client_id,'view' as event_type , 2 as count1, cast('AM 12:10:00' as time) as time1
)
,grp_split
as(
-- a row opens a new group (gets a number) when it is the client's first row or follows a purchase; other rows get NULL
select case when lag(event_type) over(partition by client_id order by time1)='purchase'
or lag(event_type) over(partition by client_id order by time1) is null
then
row_number() over(partition by client_id order by time1)
end as grp
,*
from data
)
select x.client_id
,max(case when event_type='view' then count1 else 0 end) as view
,max(case when event_type='cart' then count1 else 0 end) as cart
,max(case when event_type='purchase' then count1 else 0 end) as purchase
from (
-- the running max ignores NULLs, so it carries each group number forward
select *
,max(grp) over(partition by client_id order by time1) as grp2
from grp_split
)x
group by client_id
,grp2
order by client_id
,grp2
Output:
+-----------+------+------+----------+
| client_id | view | cart | purchase |
+-----------+------+------+----------+
| A | 4 | 1 | 2 |
| A | 2 | 0 | 0 |
| B | 3 | 0 | 1 |
| B | 2 | 0 | 0 |
+-----------+------+------+----------+
Working example:
https://dbfiddle.uk/?rdbms=postgres_12&fiddle=aeeb0878b9094e061c469bb0efb7a024
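To try the grp_split / running-MAX idea without a Postgres instance, here is a minimal sketch that runs the same logic on SQLite through Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (for window functions) and stores 'AM 12:00:00' as the sortable text '00:00:00'; the helper name pivot_before_purchase is hypothetical, not part of the answer.

```python
import sqlite3

# Table A from the question. Assumption: 'AM 12:xx:00' is stored as the
# sortable text '00:xx:00' so that ORDER BY time1 works on plain strings.
ROWS = [
    ("A", "cart", 1, "00:00:00"),
    ("A", "view", 4, "00:01:00"),
    ("A", "purchase", 2, "00:05:00"),
    ("A", "view", 2, "00:10:00"),
    ("B", "view", 3, "00:03:00"),
    ("B", "purchase", 1, "00:05:00"),
    ("B", "view", 2, "00:10:00"),
]

QUERY = """
WITH grp_split AS (
    -- A row opens a new group when it is the client's first row or follows a
    -- purchase; it then gets its row number, every other row gets NULL.
    SELECT client_id, event_type, count1, time1,
           CASE WHEN LAG(event_type) OVER (PARTITION BY client_id ORDER BY time1) IS NULL
                  OR LAG(event_type) OVER (PARTITION BY client_id ORDER BY time1) = 'purchase'
                THEN ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY time1)
           END AS grp
    FROM data
), x AS (
    -- The running MAX skips NULLs, carrying each group number forward.
    SELECT *, MAX(grp) OVER (PARTITION BY client_id ORDER BY time1) AS grp2
    FROM grp_split
)
SELECT client_id,
       MAX(CASE WHEN event_type = 'view' THEN count1 ELSE 0 END) AS "view",
       MAX(CASE WHEN event_type = 'cart' THEN count1 ELSE 0 END) AS cart,
       MAX(CASE WHEN event_type = 'purchase' THEN count1 ELSE 0 END) AS purchase
FROM x
GROUP BY client_id, grp2
ORDER BY client_id, grp2
"""


def pivot_before_purchase(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE data (client_id TEXT, event_type TEXT, count1 INTEGER, time1 TEXT)")
    con.executemany("INSERT INTO data VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pivot_before_purchase(ROWS):
        print(row)
```

With the sample rows, this yields the four rows of Table B, one per client and purchase-bounded group.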

Related

Select with limited join

I have two tables: products and products_prices.
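products table:
+----+------------+---------+
| id | name       | user_id |
+----+------------+---------+
| 1  | Headphones | 1       |
| 2  | Phone      | 1       |
+----+------------+---------+
products_prices table:
+----+------------+-------+------+
| id | product_id | price | time |
+----+------------+-------+------+
| 1  | 1          | 10    | 1    |
| 2  | 1          | 15    | 2    |
| 3  | 1          | 20    | 3    |
| 4  | 2          | 10    | 4    |
| 5  | 2          | 15    | 5    |
| 6  | 2          | 20    | 6    |
+----+------------+-------+------+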
I have a simple query:
SELECT * FROM products WHERE (user_id = 1) LIMIT 1 OFFSET 1
I need to get a limited set of rows from the products table and, for each of those rows, only the two most recent prices (ordered by time) from products_prices.
(In other words, I need each product with its two latest prices.)
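This is an example of what I want to get:
+----+---------+-------+------------+------------+
| id | user_id | name  | curr_price | prev_price |
+----+---------+-------+------------+------------+
| 2  | 1       | Phone | 20         | 15         |
+----+---------+-------+------------+------------+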
And here is my query:
select products.*,
(SELECT price FROM products_prices WHERE product_id = products.id ORDER BY time desc LIMIT 1 OFFSET 0) as curr_price,
(SELECT price FROM products_prices WHERE product_id = products.id ORDER BY time desc LIMIT 1 OFFSET 1) as prev_price
from "products"
where (products."user_id" = 1)
limit 1 offset 1
Is it possible to do it without subqueries?

Not sure I find any of these easier to read...
0th approach, using window functions (LEAD and ROW_NUMBER) and a CTE:
With products as (SELECT 1 ID, 'Headphones' name, 1 user_id UNION ALL
SELECT 2 ID, 'Phone' name, 1 user_id ),
products_Prices as (SELECT 1 ID, 1 Product_ID, 10 price, 1 time UNION ALL
SELECT 2 ID, 1 Product_ID, 15 price, 2 time UNION ALL
SELECT 3 ID, 1 Product_ID, 20 price, 3 time UNION ALL
SELECT 4 ID, 2 Product_ID, 33 price, 4 time UNION ALL
SELECT 5 ID, 2 Product_ID, 22 price, 5 time UNION ALL
SELECT 6 ID, 2 Product_ID, 11 price, 6 time),
STEP1 as (
SELECT P.ID, P.Name, P.user_ID,
price as CurrentPrice, lead(price) over (partition by P.ID order by time desc) Prev_Price, time,
row_number() over (Partition by P.ID order by time Desc) RN
FROM Products P
LEFT JOIN Products_Prices Z
on Z.Product_ID = P.ID)
SELECT Id, Name, User_ID, CurrentPRice, PRev_Price
From Step1 where RN = 1
Giving us:
+----+------------+---------+--------------+------------+
| id | name | user_id | currentprice | prev_price |
+----+------------+---------+--------------+------------+
| 1 | Headphones | 1 | 20 | 15 |
| 2 | Phone | 1 | 11 | 22 |
+----+------------+---------+--------------+------------+
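The same LEAD/ROW_NUMBER approach can be checked locally with a small sketch on SQLite via Python's sqlite3 module, assuming SQLite 3.25 or newer. It uses the question's original prices (10, 15, 20 for both products) rather than the answer's modified ones; the helper name latest_two_prices is hypothetical.

```python
import sqlite3

PRODUCTS = [(1, "Headphones", 1), (2, "Phone", 1)]
# Prices as given in the question (not the answer's modified demo data).
PRICES = [
    (1, 1, 10, 1), (2, 1, 15, 2), (3, 1, 20, 3),
    (4, 2, 10, 4), (5, 2, 15, 5), (6, 2, 20, 6),
]

QUERY = """
WITH step1 AS (
    SELECT p.id, p.name, p.user_id,
           z.price AS current_price,
           -- Rows are ordered newest first, so LEAD() reads the next older price.
           LEAD(z.price) OVER (PARTITION BY p.id ORDER BY z.time DESC) AS prev_price,
           ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY z.time DESC) AS rn
    FROM products p
    LEFT JOIN products_prices z ON z.product_id = p.id
)
SELECT id, name, user_id, current_price, prev_price
FROM step1
WHERE rn = 1  -- keep only the newest row per product
ORDER BY id
"""


def latest_two_prices(products, prices):
    """Hypothetical helper: build both tables in memory and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (id INTEGER, name TEXT, user_id INTEGER)")
    con.execute("CREATE TABLE products_prices "
                "(id INTEGER, product_id INTEGER, price INTEGER, time INTEGER)")
    con.executemany("INSERT INTO products VALUES (?, ?, ?)", products)
    con.executemany("INSERT INTO products_prices VALUES (?, ?, ?, ?)", prices)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(latest_two_prices(PRODUCTS, PRICES))
```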
1st approach, using analytic functions and a CTE. Note: I changed the price numbers to show variance.
With products as (SELECT 1 ID, 'Headphones' name, 1 user_id UNION ALL
SELECT 2 ID, 'Phone' name, 1 user_id ),
products_Prices as (SELECT 1 ID, 1 Product_ID, 10 price, 1 time UNION ALL
SELECT 2 ID, 1 Product_ID, 15 price, 2 time UNION ALL
SELECT 3 ID, 1 Product_ID, 20 price, 3 time UNION ALL
SELECT 4 ID, 2 Product_ID, 33 price, 4 time UNION ALL
SELECT 5 ID, 2 Product_ID, 22 price, 5 time UNION ALL
SELECT 6 ID, 2 Product_ID, 11 price, 6 time),
STEP1 as (SELECT P.ID, P.Name, P.user_ID, PP.price, row_number() over (partition by PP.product_ID order by time desc) RN
FROM Products P
LEFT JOIN products_prices PP
on P.ID = PP.Product_ID)
SELECT ID, Name, User_ID, max(case when RN = 1 then Price end) as Current_price, max(case when RN=2 then price end) as Last_price
FROM STEP1
WHERE RN <=2
GROUP BY ID, name, User_ID
Giving us:
+----+------------+---------+---------------+------------+
| id | name | user_id | current_price | last_price |
+----+------------+---------+---------------+------------+
| 2 | Phone | 1 | 11 | 22 |
| 1 | Headphones | 1 | 20 | 15 |
+----+------------+---------+---------------+------------+
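A minimal sketch of the row_number-plus-conditional-aggregation version on SQLite (via Python's sqlite3, SQLite 3.25 or newer assumed), using the answer's modified Phone prices. One small change is labeled in the code: the window is partitioned by p.id rather than pp.product_id, so each product without prices still gets its own rn = 1 from the LEFT JOIN. The helper name current_and_last is hypothetical.

```python
import sqlite3

PRODUCTS = [(1, "Headphones", 1), (2, "Phone", 1)]
# The answer's demo prices (Phone changed to 33, 22, 11 to show variance).
PRICES = [
    (1, 1, 10, 1), (2, 1, 15, 2), (3, 1, 20, 3),
    (4, 2, 33, 4), (5, 2, 22, 5), (6, 2, 11, 6),
]

QUERY = """
WITH step1 AS (
    -- Change from the answer: partition by p.id, which is never NULL after the LEFT JOIN.
    SELECT p.id, p.name, p.user_id, pp.price,
           ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY pp.time DESC) AS rn
    FROM products p
    LEFT JOIN products_prices pp ON pp.product_id = p.id
)
SELECT id, name, user_id,
       MAX(CASE WHEN rn = 1 THEN price END) AS current_price,  -- newest
       MAX(CASE WHEN rn = 2 THEN price END) AS last_price      -- second newest
FROM step1
WHERE rn <= 2
GROUP BY id, name, user_id
ORDER BY id
"""


def current_and_last(products, prices):
    """Hypothetical helper: build both tables in memory and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (id INTEGER, name TEXT, user_id INTEGER)")
    con.execute("CREATE TABLE products_prices "
                "(id INTEGER, product_id INTEGER, price INTEGER, time INTEGER)")
    con.executemany("INSERT INTO products VALUES (?, ?, ?)", products)
    con.executemany("INSERT INTO products_prices VALUES (?, ?, ?, ?)", prices)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(current_and_last(PRODUCTS, PRICES))
```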
Option 2, using LATERAL:
With products as (SELECT 1 ID, 'Headphones' name, 1 user_id UNION ALL
SELECT 2 ID, 'Phone' name, 1 user_id ),
products_Prices as (SELECT 1 ID, 1 Product_ID, 10 price, 1 time UNION ALL
SELECT 2 ID, 1 Product_ID, 15 price, 2 time UNION ALL
SELECT 3 ID, 1 Product_ID, 20 price, 3 time UNION ALL
SELECT 4 ID, 2 Product_ID, 33 price, 4 time UNION ALL
SELECT 5 ID, 2 Product_ID, 22 price, 5 time UNION ALL
SELECT 6 ID, 2 Product_ID, 11 price, 6 time)
SELECT P.ID, P.Name, P.user_ID, PP.price, time
FROM Products P
LEFT JOIN lateral (SELECT Product_ID, Price, time
FROM Products_Prices Z
WHERE Z.Product_ID = P.ID
ORDER BY Time Desc LIMIT 2) PP
on TRUE
ORDER BY TIME DESC;
Giving us the following (unpivoted); using the row_number logic above, we could pivot it:
+----+------------+---------+-------+------+
| id | name | user_id | price | time |
+----+------------+---------+-------+------+
| 2 | Phone | 1 | 11 | 6 |
| 2 | Phone | 1 | 22 | 5 |
| 1 | Headphones | 1 | 20 | 3 |
| 1 | Headphones | 1 | 15 | 2 |
+----+------------+---------+-------+------+
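SQLite has no LATERAL join, so this sketch (run through Python's sqlite3 module) swaps in a different technique: a correlated IN (... ORDER BY time DESC LIMIT 2) filter in the join condition, which keeps only the two newest price rows per product. With the answer's demo data it returns the same four unpivoted rows as the lateral version; the helper name two_newest_rows is hypothetical.

```python
import sqlite3

PRODUCTS = [(1, "Headphones", 1), (2, "Phone", 1)]
# The answer's demo prices (Phone changed to 33, 22, 11 to show variance).
PRICES = [
    (1, 1, 10, 1), (2, 1, 15, 2), (3, 1, 20, 3),
    (4, 2, 33, 4), (5, 2, 22, 5), (6, 2, 11, 6),
]

# Swapped-in technique (not LATERAL): keep a price row only if its id is
# among the two newest ids for the same product.
QUERY = """
SELECT p.id, p.name, p.user_id, pp.price, pp.time
FROM products p
LEFT JOIN products_prices pp
       ON pp.product_id = p.id
      AND pp.id IN (SELECT z.id
                    FROM products_prices z
                    WHERE z.product_id = p.id
                    ORDER BY z.time DESC
                    LIMIT 2)
ORDER BY pp.time DESC
"""


def two_newest_rows(products, prices):
    """Hypothetical helper: build both tables in memory and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE products (id INTEGER, name TEXT, user_id INTEGER)")
    con.execute("CREATE TABLE products_prices "
                "(id INTEGER, product_id INTEGER, price INTEGER, time INTEGER)")
    con.executemany("INSERT INTO products VALUES (?, ?, ?)", products)
    con.executemany("INSERT INTO products_prices VALUES (?, ?, ?, ?)", prices)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in two_newest_rows(PRODUCTS, PRICES):
        print(row)
```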

Split Columns into two equal number of Rows

I have the table structure below.
I need to split CouponNumber into two columns, CouponNumber1 and CouponNumber2, with an equal number of coupons in each, as shown in the figure.
This is my query:
SELECT Name, MobileNumber, CouponNumber, IsDispatched, Status
FROM CouponInvoicePrescription

Try this:
WITH
input(ord,name,mobno,couponno,isdispatched,status) AS (
SELECT 0,'amar',8888888888,'CPever901',FALSE,1
UNION ALL SELECT 1,'amar',8888888888,'CP00005' ,FALSE,1
UNION ALL SELECT 2,'pt3' ,7777777777,'cp9090' ,FALSE,1
UNION ALL SELECT 3,'pt3' ,7777777777,'ev2' ,FALSE,1
UNION ALL SELECT 4,'pt3' ,7777777777,'cp9909' ,FALSE,1
UNION ALL SELECT 5,'pt3' ,7777777777,'cp10' ,FALSE,1
)
SELECT
name
, MAX(CASE ord % 2 WHEN 1 THEN couponno END) AS couponno1
, MAX(CASE ord % 2 WHEN 0 THEN couponno END) AS couponno2
, isdispatched
, status
FROM input
GROUP BY
ord / 2 -- rows 0-1, 2-3, 4-5, ... each form one output row; ord % 2 picks the column
, name
, isdispatched
, status
ORDER BY 1
-- out name | couponno1 | couponno2 | isdispatched | status
-- out ------+-----------+-----------+--------------+--------
-- out amar | CP00005 | CPever901 | f | 1
-- out pt3 | cp10 | cp9909 | f | 1
-- out pt3 | ev2 | cp9090 | f | 1
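Here is a minimal sketch of the ord / 2 pairing trick on SQLite through Python's sqlite3 module (SQLite 3.25 or newer assumed). It makes one change, named plainly: instead of a single global ord, the counter is restarted per person with ROW_NUMBER() partitioned by mobno, so a pair can never span two people, and the first coupon of each pair lands in couponno1. The ord values come from the answer's sample data; the helper name split_coupons is hypothetical.

```python
import sqlite3

# Sample rows from the answer; ord is the assumed ordering column.
ROWS = [
    (0, "amar", 8888888888, "CPever901", 0, 1),
    (1, "amar", 8888888888, "CP00005", 0, 1),
    (2, "pt3", 7777777777, "cp9090", 0, 1),
    (3, "pt3", 7777777777, "ev2", 0, 1),
    (4, "pt3", 7777777777, "cp9909", 0, 1),
    (5, "pt3", 7777777777, "cp10", 0, 1),
]

QUERY = """
WITH numbered AS (
    -- Restart the 0-based counter for every person so pairs stay within one person.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY mobno ORDER BY ord) - 1 AS rn
    FROM input
)
SELECT name, mobno,
       MAX(CASE rn % 2 WHEN 0 THEN couponno END) AS couponno1,  -- 1st of the pair
       MAX(CASE rn % 2 WHEN 1 THEN couponno END) AS couponno2,  -- 2nd of the pair
       isdispatched, status
FROM numbered
GROUP BY mobno, name, isdispatched, status, rn / 2  -- integer division pairs rows
ORDER BY name, rn / 2
"""


def split_coupons(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE input (ord INTEGER, name TEXT, mobno INTEGER, "
                "couponno TEXT, isdispatched INTEGER, status INTEGER)")
    con.executemany("INSERT INTO input VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in split_coupons(ROWS):
        print(row)
```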

Try this:
SELECT * FROM
(
SELECT
sub.rn,
sub.Name,
sub.MobileNumber,
sub.CouponNumber as CouponNumber1,
LEAD(sub.CouponNumber,1) OVER (PARTITION BY sub.MobileNumber ORDER BY sub.rn) as CouponNumber2,
sub.IsDispatched,
sub.Status
FROM
(
SELECT
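-- ORDER BY Name does not fix the order of one person's coupons; use a real ordering column if you have one
ROW_NUMBER() OVER (PARTITION by MobileNumber ORDER BY Name) as rn,
*
FROM
CouponInvoicePrescription
) sub
) t -- SQL Server, MySQL and older PostgreSQL versions require an alias on a derived table
-- odd rows (1, 3, 5, ...) carry their own coupon plus the next one via LEAD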
WHERE rn % 2 <> 0
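A sketch of the LEAD-based version on SQLite via Python's sqlite3 module (SQLite 3.25 or newer assumed). Because ORDER BY Name leaves the order of one person's coupons undefined, it orders by an assumed ord column (taken from the previous answer's sample data) and uses that answer's shorter, hypothetical column names (mobno, couponno); the helper name pair_with_lead is hypothetical too.

```python
import sqlite3

# Sample rows; ord is an assumed ordering column, not part of the question's table.
ROWS = [
    (0, "amar", 8888888888, "CPever901", 0, 1),
    (1, "amar", 8888888888, "CP00005", 0, 1),
    (2, "pt3", 7777777777, "cp9090", 0, 1),
    (3, "pt3", 7777777777, "ev2", 0, 1),
    (4, "pt3", 7777777777, "cp9909", 0, 1),
    (5, "pt3", 7777777777, "cp10", 0, 1),
]

QUERY = """
SELECT name, mobno, couponno1, couponno2, isdispatched, status
FROM (
    SELECT name, mobno, isdispatched, status, rn,
           couponno AS couponno1,
           -- pull the coupon from the following row of the same person
           LEAD(couponno) OVER (PARTITION BY mobno ORDER BY rn) AS couponno2
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY mobno ORDER BY ord) AS rn
        FROM input
    ) sub
) t
WHERE rn % 2 = 1  -- rows 1, 3, 5, ... of each person start a pair
ORDER BY name, rn
"""


def pair_with_lead(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE input (ord INTEGER, name TEXT, mobno INTEGER, "
                "couponno TEXT, isdispatched INTEGER, status INTEGER)")
    con.executemany("INSERT INTO input VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pair_with_lead(ROWS):
        print(row)
```

The LEAD is computed in the inner query, before the odd-row filter, so each kept row still sees the coupon that follows it.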

Distinct Conditional Counting to Avoid Overlap

Consider this table:
[Table1]
------------------------
| Person_ID | Yes | No |
|-----------|-----|----|
| 1 | 1 | 0 |
|-----------|-----|----|
| 1 | 1 | 0 |
|-----------|-----|----|
| 2 | 0 | 1 |
|-----------|-----|----|
| 2 | 0 | 1 |
|-----------|-----|----|
| 3 | 1 | 0 |
|-----------|-----|----|
| 3 | 1 | 0 |
|-----------|-----|----|
| 3 | 0 | 1 |
|-----------|-----|----|
| 3 | 1 | 0 |
------------------------
I need a distinct count on Person_ID to get the number of people that are marked Yes and No. However, if someone has a single instance of No, they should be counted as a No and not be included in the Yes count no matter how many Yes they have.
My first thought was to try something similar to:
select count(distinct (case when Yes = 1 then Person_ID else null end)) Yes_People
, count(distinct (case when No = 1 then Person_ID else null end)) No_People
from Table1
but this will result in Person_ID 3 being counted in both the Yes and No counts.
My desired output would be:
--------------------------
| Yes_People | No_People |
|------------|-----------|
| 1 | 2 |
--------------------------
I'm hoping to avoid the performance hit from having to evaluate a subquery against each row but if it has to be the way to go I will accept that.

Aggregate first at the person level and then overall:
select sum(yes_only) as yes_only,
sum(1 - yes_only) as no
from (select person_id,
(case when max(yes) = min(yes) and max(yes) = 1
then 1
else 0
end) as yes_only
from Table1
group by person_id
) t
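A quick check of the person-level aggregation, with the else 0 in place (without it, 1 - yes_only is NULL for No people and the second sum comes out wrong), on SQLite through Python's sqlite3 module. The yes/no columns are double-quoted because NO is an SQLite keyword; the helper name count_people is hypothetical.

```python
import sqlite3

# Table1 from the question: (Person_ID, Yes, No).
ROWS = [(1, 1, 0), (1, 1, 0), (2, 0, 1), (2, 0, 1),
        (3, 1, 0), (3, 1, 0), (3, 0, 1), (3, 1, 0)]

QUERY = """
SELECT SUM(yes_only) AS yes_people,
       SUM(1 - yes_only) AS no_people
FROM (
    -- 1 only if every row of the person is a Yes; any No row makes it 0.
    SELECT person_id,
           CASE WHEN MAX("yes") = MIN("yes") AND MAX("yes") = 1 THEN 1 ELSE 0 END AS yes_only
    FROM table1
    GROUP BY person_id
) t
"""


def count_people(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 (person_id INTEGER, "yes" INTEGER, "no" INTEGER)')
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


if __name__ == "__main__":
    print(count_people(ROWS))
```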

You can first group the rows by person.
Then the CASE for the Yes people can require No = 0, so anyone with at least one No is left out of the Yes count.
SELECT
COUNT(CASE WHEN No = 0 AND Yes = 1 THEN Person_ID END) AS Yes_People,
COUNT(CASE WHEN No = 1 THEN Person_ID END) AS No_People
FROM
(
select Person_ID
, MAX(Yes) as Yes
, MAX(No) as No
FROM Table1
GROUP BY Person_ID
) q

You could use a window function to rank the rows for each person_id so that a 'No' takes priority over a 'Yes', but that will require a subquery:
select count(case when yes=1 then 1 end) as yes_count,
count(case when no=1 then 1 end) as no_count
from (
select person_id, yes, no, row_number() over (partition by person_id order by no desc, yes desc) as rn
from table1
) t
where rn = 1
The inner subquery plus the where filter will get you a single row per person_id, giving priority to the 'no' records.
This of course assumes yes/no are mutually exclusive, and if that's true, you should probably change the model to a single field.
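The ROW_NUMBER variant can be checked the same way (Python's sqlite3, SQLite 3.25 or newer assumed): ranking each person's rows by "no" DESC puts a No row first whenever one exists, so filtering on rn = 1 leaves one row per person. Columns are double-quoted because NO is an SQLite keyword; the helper name count_by_priority is hypothetical.

```python
import sqlite3

# Table1 from the question: (Person_ID, Yes, No).
ROWS = [(1, 1, 0), (1, 1, 0), (2, 0, 1), (2, 0, 1),
        (3, 1, 0), (3, 1, 0), (3, 0, 1), (3, 1, 0)]

QUERY = """
SELECT COUNT(CASE WHEN "yes" = 1 THEN 1 END) AS yes_people,
       COUNT(CASE WHEN "no" = 1 THEN 1 END) AS no_people
FROM (
    -- One numbering per person; a No row sorts first if the person has one.
    SELECT person_id, "yes", "no",
           ROW_NUMBER() OVER (PARTITION BY person_id
                              ORDER BY "no" DESC, "yes" DESC) AS rn
    FROM table1
) t
WHERE rn = 1
"""


def count_by_priority(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 (person_id INTEGER, "yes" INTEGER, "no" INTEGER)')
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


if __name__ == "__main__":
    print(count_by_priority(ROWS))
```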

I think you need to pre-check every person with a window function:
with t as (select 1 p_id, 1 yes, 0 no from dual
union all select 1 p_id, 1 yes, 0 no from dual
union all select 2 p_id, 0 yes, 1 no from dual
union all select 2 p_id, 0 yes, 1 no from dual
union all select 3 p_id, 1 yes, 0 no from dual
union all select 3 p_id, 0 yes, 1 no from dual
union all select 3 p_id, 1 yes, 0 no from dual)
, chk as (
select max(no) over (partition by p_id) n
, max(yes) over (partition by p_id) y
, p_id
from t)
-- select * from chk;
select count(distinct decode(y-n,1,p_id,null )) yes_people
, count(distinct decode(n,1,p_id,null )) no_people
from chk
group by 1;
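DECODE is Oracle-specific, so this SQLite sketch (Python's sqlite3, SQLite 3.25 or newer assumed) swaps in an equivalent CASE expression; the per-person window MAX and the COUNT(DISTINCT ...) are unchanged, and the unneeded GROUP BY 1 is dropped. The helper name count_with_window is hypothetical.

```python
import sqlite3

# Table1 from the question: (Person_ID, Yes, No).
ROWS = [(1, 1, 0), (1, 1, 0), (2, 0, 1), (2, 0, 1),
        (3, 1, 0), (3, 1, 0), (3, 0, 1), (3, 1, 0)]

QUERY = """
WITH chk AS (
    -- Every row carries its person's overall flags.
    SELECT person_id,
           MAX("no") OVER (PARTITION BY person_id) AS n,
           MAX("yes") OVER (PARTITION BY person_id) AS y
    FROM table1
)
-- y - n = 1 only when the person has a Yes and no No at all.
SELECT COUNT(DISTINCT CASE WHEN y - n = 1 THEN person_id END) AS yes_people,
       COUNT(DISTINCT CASE WHEN n = 1 THEN person_id END) AS no_people
FROM chk
"""


def count_with_window(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE table1 (person_id INTEGER, "yes" INTEGER, "no" INTEGER)')
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchone()
    con.close()
    return result


if __name__ == "__main__":
    print(count_with_window(ROWS))
```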

You can use conditional aggregation as follows:
with table1 as (select 1 PERSON_ID, 1 yes, 0 no from dual
union all select 1 PERSON_ID, 1 yes, 0 no from dual
union all select 2 PERSON_ID, 0 yes, 1 no from dual
union all select 2 PERSON_ID, 0 yes, 1 no from dual
union all select 3 PERSON_ID, 1 yes, 0 no from dual
union all select 3 PERSON_ID, 0 yes, 1 no from dual
union all select 3 PERSON_ID, 1 yes, 0 no from dual)
SELECT
SUM(CASE WHEN NOS = 0 AND YES > 0 THEN 1 END) YES_PEOPLE,
SUM(CASE WHEN NOS > 0 THEN 1 END) NO_PEOPLE
FROM
(
SELECT
SUM(NO) NOS,
PERSON_ID,
SUM(YES) YES
FROM TABLE1
GROUP BY PERSON_ID
);
YES_PEOPLE NO_PEOPLE
---------- ----------
1 2
Cheers!!

BigQuery: Querying with standard sql

I have this table:
client_id session_id time action transaction_id
------------------------------------------------------
1 1 15:01 view NULL
1 1 15:02 basket NULL
1 1 15:03 basket NULL
1 1 15:04 purchase 1
1 2 15:05 basket NULL
1 2 15:06 purchase 2
1 2 15:07 view NULL
Within each session, I want each action leading up to a purchase to be tagged with that purchase's transaction_id, but only the first time that action occurs (so at 15:03, the repeated basket, transaction_id = NULL):
session_id time transaction_id
------------------------------------
1 15:01 1
1 15:02 1
1 15:03 NULL
1 15:04 1
2 15:05 2
2 15:06 2
2 15:07 NULL

Hmmm . . . assuming that there is only one transaction id per session, then you can use window functions:
select t.*,
(case when row_number() over (partition by client_id, session_id, action
order by time) = 1
then max(transaction_id) over (partition by client_id, session_id)
end) as new_transaction_id
from t

Below is for BigQuery Standard SQL
#standardSQL
SELECT
client_id, session_id, time, action,
(CASE
WHEN ROW_NUMBER()
OVER (PARTITION BY client_id, session_id, grp, action ORDER BY time) = 1
THEN MAX(transaction_id) OVER (PARTITION BY client_id, session_id, grp) END
) AS transaction_id
FROM (
SELECT *,
COUNTIF(transaction_id IS NOT NULL)
OVER(PARTITION BY client_id, session_id
ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM YourTable
)
-- ORDER BY client_id, session_id, time
You can test and play with it using dummy data as below:
#standardSQL
WITH YourTable AS (
SELECT 1 AS client_id, 1 AS session_id, '15:01' AS time, 'view' AS action, NULL AS transaction_id UNION ALL
SELECT 1, 1, '15:02', 'basket', NULL UNION ALL
SELECT 1, 1, '15:03', 'basket', NULL UNION ALL
SELECT 1, 1, '15:04', 'purchase', 1 UNION ALL
SELECT 1, 1, '15:05', 'basket', NULL UNION ALL
SELECT 1, 1, '15:06', 'basket', NULL UNION ALL
SELECT 1, 1, '15:07', 'purchase', 3 UNION ALL
SELECT 1, 2, '15:08', 'basket', NULL UNION ALL
SELECT 1, 2, '15:09', 'purchase', 2 UNION ALL
SELECT 1, 2, '15:10', 'view', NULL
)
SELECT
client_id, session_id, time, action,
(CASE
WHEN ROW_NUMBER()
OVER (PARTITION BY client_id, session_id, grp, action ORDER BY time) = 1
THEN MAX(transaction_id) OVER (PARTITION BY client_id, session_id, grp) END
) AS transaction_id
FROM (
SELECT *,
COUNTIF(transaction_id IS NOT NULL)
OVER(PARTITION BY client_id, session_id
ORDER BY time ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
FROM YourTable
)
-- ORDER BY client_id, session_id, time
Output is as expected
client_id session_id time action transaction_id
1 1 15:01 view 1
1 1 15:02 basket 1
1 1 15:03 basket null
1 1 15:04 purchase 1
1 1 15:05 basket 3
1 1 15:06 basket null
1 1 15:07 purchase 3
1 2 15:08 basket 2
1 2 15:09 purchase 2
1 2 15:10 view null
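BigQuery's COUNTIF is not available in SQLite, so this sketch (Python's sqlite3, SQLite 3.25 or newer assumed) swaps in COUNT(transaction_id), which also counts only non-NULL values, over the same ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING frame. The output alias is renamed to new_transaction_id so it does not shadow the input column; the table name events and the helper name tag_first_actions are hypothetical.

```python
import sqlite3

# The answer's dummy data; 'HH:MM' text sorts correctly as a string.
EVENTS = [
    (1, 1, "15:01", "view", None),
    (1, 1, "15:02", "basket", None),
    (1, 1, "15:03", "basket", None),
    (1, 1, "15:04", "purchase", 1),
    (1, 1, "15:05", "basket", None),
    (1, 1, "15:06", "basket", None),
    (1, 1, "15:07", "purchase", 3),
    (1, 2, "15:08", "basket", None),
    (1, 2, "15:09", "purchase", 2),
    (1, 2, "15:10", "view", None),
]

QUERY = """
SELECT client_id, session_id, time, "action",
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY client_id, session_id, grp, "action"
                                    ORDER BY time) = 1
            THEN MAX(transaction_id) OVER (PARTITION BY client_id, session_id, grp)
       END AS new_transaction_id
FROM (
    -- grp = rows with a transaction_id strictly before this one in the session,
    -- so a purchase closes its group and the following row starts a new one.
    SELECT *,
           COUNT(transaction_id) OVER (
               PARTITION BY client_id, session_id ORDER BY time
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS grp
    FROM events
) t
ORDER BY client_id, session_id, time
"""


def tag_first_actions(events):
    """Hypothetical helper: load events into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE events (client_id INTEGER, session_id INTEGER, '
                'time TEXT, "action" TEXT, transaction_id INTEGER)')
    con.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", events)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in tag_first_actions(EVENTS):
        print(row)
```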

Oracle SQL: Transform rows to multiple columns

I'm using Oracle 11G and need a way to turn rows into new groups of columns in a select statement. We're transitioning to a 1:3 relationship for some of our data and need a way to get it into a view. Can you help us transform data that looks like this:
+---------+------------+
| User_Id | Station_Id |
+---------+------------+
| 1 | 203 |
| 1 | 204 |
| 2 | 203 |
| 3 | 487 |
| 3 | 3787 |
| 3 | 738 |
+---------+------------+
into this:
+---------+-------------+-------------+---------------+
| User_Id | Station_One | Station_Two | Station_Three |
+---------+-------------+-------------+---------------+
| 1 | 203 | 204 | Null |
| 2 | 203 | Null | Null |
| 3 | 487 | 3787 | 738 |
+---------+-------------+-------------+---------------+
Let me know whatever other specifics you would like, and thank you for any help you can give!

You can use row_number and self joins:
with cte as
(
select user_id, station_id,
row_number() over(partition by user_id order by station_id) rn
from tbl
)
select distinct c1.user_id,
c1.station_id station_one,
c2.station_id station_two,
c3.station_id station_three
from cte c1
left join cte c2 on c1.user_id=c2.user_id and c2.rn=2
left join cte c3 on c1.user_id=c3.user_id and c3.rn=3
where c1.rn=1
You can also do it with row_number and subqueries:
with cte as
(
select user_id, station_id,
row_number() over(partition by user_id order by station_id) rn
from tbl
)
select distinct user_id,
(select station_id from cte c where c.user_id=cte.user_id and c.rn=1) station_one,
(select station_id from cte c where c.user_id=cte.user_id and c.rn=2) station_two,
(select station_id from cte c where c.user_id=cte.user_id and c.rn=3) station_three
from cte
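A minimal sketch of the row_number / self-join approach on SQLite via Python's sqlite3 module (SQLite 3.25 or newer assumed), with the question's column names. Because the window orders by station_id, user 3 comes out as 487, 738, 3787 rather than in the order shown in the question. The helper name stations_by_join is hypothetical.

```python
import sqlite3

# Input rows from the question: (User_Id, Station_Id).
ROWS = [(1, 203), (1, 204), (2, 203), (3, 487), (3, 3787), (3, 738)]

QUERY = """
WITH cte AS (
    SELECT user_id, station_id,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY station_id) AS rn
    FROM tbl
)
-- c1 supplies one row per user; c2/c3 are LEFT JOINed so missing slots stay NULL.
SELECT c1.user_id,
       c1.station_id AS station_one,
       c2.station_id AS station_two,
       c3.station_id AS station_three
FROM cte c1
LEFT JOIN cte c2 ON c2.user_id = c1.user_id AND c2.rn = 2
LEFT JOIN cte c3 ON c3.user_id = c1.user_id AND c3.rn = 3
WHERE c1.rn = 1
ORDER BY c1.user_id
"""


def stations_by_join(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (user_id INTEGER, station_id INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in stations_by_join(ROWS):
        print(row)
```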

The easiest way to accomplish this in my experience is to use conditional aggregation:
WITH mydata AS (
SELECT 1 AS user_id, 203 AS station_id FROM dual
UNION ALL
SELECT 1 AS user_id, 204 AS station_id FROM dual
UNION ALL
SELECT 2 AS user_id, 203 AS station_id FROM dual
UNION ALL
SELECT 3 AS user_id, 487 AS station_id FROM dual
UNION ALL
SELECT 3 AS user_id, 3787 AS station_id FROM dual
UNION ALL
SELECT 3 AS user_id, 738 AS station_id FROM dual
)
SELECT user_id
, MAX(CASE WHEN rn = 1 THEN station_id END) AS station_one
, MAX(CASE WHEN rn = 2 THEN station_id END) AS station_two
, MAX(CASE WHEN rn = 3 THEN station_id END) AS station_three
FROM (
SELECT user_id, station_id, ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY rownum ) AS rn
FROM mydata
) GROUP BY user_id;
Just replace the mydata CTE in the above query with whatever your table's name is:
SELECT user_id
, MAX(CASE WHEN rn = 1 THEN station_id END) AS station_one
, MAX(CASE WHEN rn = 2 THEN station_id END) AS station_two
, MAX(CASE WHEN rn = 3 THEN station_id END) AS station_three
FROM (
SELECT user_id, station_id, ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY rownum ) AS rn
FROM mytable
) GROUP BY user_id;
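The conditional-aggregation version can be sketched on SQLite as well (Python's sqlite3, SQLite 3.25 or newer assumed). SQLite has no ROWNUM, so this swaps in rowid, which for this freshly filled table follows insertion order; like ROWNUM it is not a real business ordering, so a proper ordering column would be safer in practice. Here the result matches the layout in the question; the helper name stations_by_case is hypothetical.

```python
import sqlite3

# Input rows from the question, inserted in the order shown: (User_Id, Station_Id).
ROWS = [(1, 203), (1, 204), (2, 203), (3, 487), (3, 3787), (3, 738)]

QUERY = """
SELECT user_id,
       MAX(CASE WHEN rn = 1 THEN station_id END) AS station_one,
       MAX(CASE WHEN rn = 2 THEN station_id END) AS station_two,
       MAX(CASE WHEN rn = 3 THEN station_id END) AS station_three
FROM (
    -- Assumption: rowid stands in for Oracle's ROWNUM (insertion order here).
    SELECT user_id, station_id,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY rowid) AS rn
    FROM mytable
) t
GROUP BY user_id
ORDER BY user_id
"""


def stations_by_case(rows):
    """Hypothetical helper: load rows into an in-memory table and run QUERY."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (user_id INTEGER, station_id INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in stations_by_case(ROWS):
        print(row)
```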
