How to write the Pivot clause for this specific table? - sql

I am using SQL Server 2014. Below is an extract of my table (t1):
Name RoomType Los RO BB HB FB StartDate EndDate CaptureDate
A DLX 7 0 0 154 200 2022-01-01 2022-01-07 2021-12-31
B SUP 7 110 0 0 0 2022-01-01 2022-01-07 2021-12-31
C COS 7 0 0 200 139 2022-01-01 2022-01-07 2021-12-31
D STD 7 0 75 0 500 2022-01-01 2022-01-07 2021-12-31
I need a Pivot query to convert the above table into the following output:
Name RoomType Los MealPlan Price StartDate EndDate CaptureDate
A DLX 7 RO 0 2022-01-01 2022-01-07 2021-12-31
A DLX 7 BB 0 2022-01-01 2022-01-07 2021-12-31
A DLX 7 HB 154 2022-01-01 2022-01-07 2021-12-31
A DLX 7 FB 200 2022-01-01 2022-01-07 2021-12-31
B SUP 7 RO 110 2022-01-01 2022-01-07 2021-12-31
B SUP 7 BB 0 2022-01-01 2022-01-07 2021-12-31
B SUP 7 HB 0 2022-01-01 2022-01-07 2021-12-31
B SUP 7 FB 0 2022-01-01 2022-01-07 2021-12-31
C COS 7 RO 0 2022-01-01 2022-01-07 2021-12-31
C COS 7 BB 0 2022-01-01 2022-01-07 2021-12-31
C COS 7 HB 200 2022-01-01 2022-01-07 2021-12-31
C COS 7 FB 139 2022-01-01 2022-01-07 2021-12-31
D STD 7 RO 0 2022-01-01 2022-01-07 2021-12-31
D STD 7 BB 75 2022-01-01 2022-01-07 2021-12-31
D STD 7 HB 0 2022-01-01 2022-01-07 2021-12-31
D STD 7 FB 500 2022-01-01 2022-01-07 2021-12-31
I had a look at the following article but it does not seem to address my problem:
SQL Server Pivot Clause
I did some further research but did not find any site that provides a solution to this problem.
Any help would be highly appreciated.

You actually want an UNPIVOT here (comparison docs).
SELECT Name, RoomType, Los, MealPlan, Price,
       StartDate, EndDate, CaptureDate
FROM dbo.t1
UNPIVOT (Price FOR MealPlan IN ([RO], [BB], [HB], [FB])) AS u;
Example db<>fiddle
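For comparison, the same wide-to-long reshape can be sketched in pandas with melt. This is only an illustrative sketch of the unpivot idea, not part of the SQL Server answer, and the sample frame below mirrors just the first two rows of the question's table:

```python
import pandas as pd

# Two sample rows mirroring the question's table t1.
t1 = pd.DataFrame({
    "Name": ["A", "B"],
    "RoomType": ["DLX", "SUP"],
    "Los": [7, 7],
    "RO": [0, 110],
    "BB": [0, 0],
    "HB": [154, 0],
    "FB": [200, 0],
    "StartDate": ["2022-01-01", "2022-01-01"],
    "EndDate": ["2022-01-07", "2022-01-07"],
    "CaptureDate": ["2021-12-31", "2021-12-31"],
})

# melt turns the four meal-plan columns into (MealPlan, Price) pairs,
# much like UNPIVOT does in T-SQL.
long_t1 = t1.melt(
    id_vars=["Name", "RoomType", "Los", "StartDate", "EndDate", "CaptureDate"],
    value_vars=["RO", "BB", "HB", "FB"],
    var_name="MealPlan",
    value_name="Price",
)
print(long_t1[["Name", "RoomType", "MealPlan", "Price"]])
```

One caveat when comparing the two: T-SQL UNPIVOT silently drops rows whose value is NULL, whereas melt keeps them.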

Related

standard sql: Get customer count and first purchase date per customer and store_id

I use standard SQL and need a query that gets the total count of purchases per customer for each store_id, and also the first purchase date per customer for each store_id.
I have a table with this structure:
customer_id  store_id  product_no  customer_no  purchase_date  price
1            10        100         200          2022-01-01     50
1            10        110         200          2022-01-02     70
1            20        120         200          2022-01-02     60
1            20        130         200          2022-01-02     40
1            30        140         200          2022-01-02     60
Current query:
SELECT
    customer_id,
    store_id,
    product_no,
    customer_no,
    purchase_date,
    price,
    first_value(purchase_date) over (partition by customer_no order by purchase_date) as first_purchase_date,
    count(customer_no) over (partition by customer_id, store_id, customer_no) as customer_purchase_count
FROM my_table
This gives me this type of output:
customer_id  store_id  product_no  customer_no  purchase_date  price  first_purchase_date  customer_purchase_count
1            10        100         200          2022-01-01     50     2022-01-01           2
1            10        110         200          2022-01-02     70     2022-01-01           2
1            20        120         210          2022-01-02     60     2022-01-02           2
1            20        130         210          2022-01-02     40     2022-01-02           2
1            30        140         220          2022-01-10     60     2022-01-10           3
1            10        140         220          2022-01-10     60     2022-01-10           3
1            10        140         220          2022-01-10     60     2022-01-10           3
1            10        150         220          2022-01-10     60     2022-01-10           1
However, I want it to look like the table below in its final form. How can I achieve that? If possible, I would also like to add 4 columns called "only_in_store_10", "only_in_store_20", "only_in_store_30" and "only_in_store_40" for all customer_no that only shopped at that store. It should place a ○ on each row of each customer_no that satisfies the condition.
customer_id  store_id  product_no  customer_no  purchase_date  price  first_purchase_date  customer_purchase_count  first_purchase_date_per_store  first_purchase_date_per_store  store_row_nr
1            10        100         200          2022-01-01     50     2022-01-01           2                        2022-01-01                     1                              1
1            10        110         200          2022-01-02     70     2022-01-01           2                        2022-01-02                     1                              1
1            20        120         210          2022-01-02     60     2022-01-02           2                        2022-01-02                     2                              1
1            20        130         210          2022-01-03     40     2022-01-02           2                        2022-01-02                     2                              1
1            30        140         220          2022-01-10     60     2022-01-10           3                        2022-01-10                     1                              1
1            10        140         220          2022-01-11     50     2022-01-11           3                        2022-01-11                     2                              1
1            10        140         220          2022-01-12     40     2022-01-11           3                        2022-01-11                     2                              2
1            10        150         220          2022-01-13     60     2022-01-13           1                        2022-01-13                     1                              1
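The per-store columns in the desired output follow the same windowed pattern as the current query. As a sketch only (the question asks for standard SQL, where MIN(...) OVER (PARTITION BY customer_no, store_id), COUNT(...) OVER (...) and ROW_NUMBER() OVER (... ORDER BY purchase_date) play the same roles), here is the grouped logic in pandas; the column name purchase_count_per_store is a hypothetical label, and the data is the five-row sample from the question:

```python
import pandas as pd

# Sample rows mirroring the question's input table (assumed data).
df = pd.DataFrame({
    "customer_id": [1, 1, 1, 1, 1],
    "store_id": [10, 10, 20, 20, 30],
    "product_no": [100, 110, 120, 130, 140],
    "customer_no": [200, 200, 200, 200, 200],
    "purchase_date": pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-02", "2022-01-02", "2022-01-02"]),
    "price": [50, 70, 60, 40, 60],
})

grp = df.groupby(["customer_no", "store_id"])
# First purchase date per customer and store (MIN(...) OVER (PARTITION BY ...)).
df["first_purchase_date_per_store"] = grp["purchase_date"].transform("min")
# Purchase count per customer and store (COUNT(...) OVER (PARTITION BY ...)); hypothetical name.
df["purchase_count_per_store"] = grp["purchase_date"].transform("size")
# Row number per customer and store, ordered by date (ROW_NUMBER() OVER (... ORDER BY ...)).
df["store_row_nr"] = grp["purchase_date"].rank(method="first").astype(int)
print(df)
```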

Get Data in a row with specific values

I have a series of data like the example below:
Customer  Date        Value
a         2022-01-02  100
a         2022-01-03  100
a         2022-01-04  100
a         2022-01-05  100
a         2022-01-06  100
b         2022-01-02  100
b         2022-01-03  100
b         2022-01-04  100
b         2022-01-05  100
b         2022-01-06  090
b         2022-01-07  100
c         2022-02-03  100
c         2022-02-04  100
c         2022-02-05  100
c         2022-02-06  100
c         2022-02-07  100
d         2022-04-10  100
d         2022-04-11  100
d         2022-04-12  100
d         2022-04-13  100
d         2022-04-14  100
d         2022-04-15  090
e         2022-04-10  100
e         2022-04-11  100
e         2022-04-12  080
e         2022-04-13  070
e         2022-04-14  100
e         2022-04-15  100
The result I want is customers A, C and D only, because A, C and D have value 100 for 5 days in a row.
The start date of each customer is different.
What query do I need to write in BigQuery for the case above?
Thank you so much
Would you consider the query below?
SELECT DISTINCT Customer
FROM sample_table
QUALIFY 5 = COUNTIF(Value = 100) OVER (
PARTITION BY Customer ORDER BY UNIX_DATE(Date) RANGE BETWEEN 4 PRECEDING AND CURRENT ROW
);
+-----+----------+
| Row | Customer |
+-----+----------+
| 1 | a |
| 2 | c |
| 3 | d |
+-----+----------+
Note that it assumes Date column has DATE type.
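If you want to sanity-check the window logic outside BigQuery, here is a pandas sketch of the same 5-calendar-day test, using only customers a and e from the sample data; rolling("5D") plays the role of RANGE BETWEEN 4 PRECEDING AND CURRENT ROW:

```python
import pandas as pd

# Subset of the sample data: customer a qualifies, customer e does not.
df = pd.DataFrame({
    "Customer": ["a"] * 5 + ["e"] * 6,
    "Date": pd.to_datetime(
        ["2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05", "2022-01-06",
         "2022-04-10", "2022-04-11", "2022-04-12", "2022-04-13", "2022-04-14",
         "2022-04-15"]),
    "Value": [100] * 5 + [100, 100, 80, 70, 100, 100],
})

def has_five_day_run(g):
    # Indicator series (1 where Value == 100) indexed by date; a 5-calendar-day
    # window summing to 5 means five consecutive days all at 100.
    s = g.sort_values("Date").set_index("Date")["Value"].eq(100).astype(int)
    return bool((s.rolling("5D").sum() == 5).any())

qualifying = sorted(c for c, g in df.groupby("Customer") if has_five_day_run(g))
print(qualifying)  # ['a']
```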

pandas retain values on different index dataframes

I need to merge two dataframes with different frequencies (daily and weekly) and would like to retain the weekly values when merging onto the daily dataframe.
There is a grouping variable in the data, group.
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
daily={'date':[datetime.date(2022,1,1)+relativedelta(day=i) for i in range(1,10)]*2,
'group':['A' for x in range(1,10)]+['B' for x in range(1,10)],
'daily_value':[x for x in range(1,10)]*2}
weekly={'date':[datetime.date(2022,1,1),datetime.date(2022,1,7)]*2,
'group':['A','A']+['B','B'],
'weekly_value':[100,200,300,400]}
daily_data=pd.DataFrame(daily)
weekly_data=pd.DataFrame(weekly)
daily_data output:
date group daily_value
0 2022-01-01 A 1
1 2022-01-02 A 2
2 2022-01-03 A 3
3 2022-01-04 A 4
4 2022-01-05 A 5
5 2022-01-06 A 6
6 2022-01-07 A 7
7 2022-01-08 A 8
8 2022-01-09 A 9
9 2022-01-01 B 1
10 2022-01-02 B 2
11 2022-01-03 B 3
12 2022-01-04 B 4
13 2022-01-05 B 5
14 2022-01-06 B 6
15 2022-01-07 B 7
16 2022-01-08 B 8
17 2022-01-09 B 9
weekly_data output:
date group weekly_value
0 2022-01-01 A 100
1 2022-01-07 A 200
2 2022-01-01 B 300
3 2022-01-07 B 400
The desired output
desired={'date':[datetime.date(2022,1,1)+relativedelta(day=i) for i in range(1,10)]*2,
'group':['A' for x in range(1,10)]+['B' for x in range(1,10)],
'daily_value':[x for x in range(1,10)]*2,
'weekly_value':[100]*6+[200]*3+[300]*6+[400]*3}
desired_data=pd.DataFrame(desired)
desired_data output:
date group daily_value weekly_value
0 2022-01-01 A 1 100
1 2022-01-02 A 2 100
2 2022-01-03 A 3 100
3 2022-01-04 A 4 100
4 2022-01-05 A 5 100
5 2022-01-06 A 6 100
6 2022-01-07 A 7 200
7 2022-01-08 A 8 200
8 2022-01-09 A 9 200
9 2022-01-01 B 1 300
10 2022-01-02 B 2 300
11 2022-01-03 B 3 300
12 2022-01-04 B 4 300
13 2022-01-05 B 5 300
14 2022-01-06 B 6 300
15 2022-01-07 B 7 400
16 2022-01-08 B 8 400
17 2022-01-09 B 9 400
Use merge_asof, sorting both frames by datetime first; then restore the original order by sorting on both columns:
daily_data['date'] = pd.to_datetime(daily_data['date'])
weekly_data['date'] = pd.to_datetime(weekly_data['date'])
df = (pd.merge_asof(daily_data.sort_values('date'),
weekly_data.sort_values('date'),
on='date',
by='group').sort_values(['group','date'], ignore_index=True))
print (df)
date group daily_value weekly_value
0 2022-01-01 A 1 100
1 2022-01-02 A 2 100
2 2022-01-03 A 3 100
3 2022-01-04 A 4 100
4 2022-01-05 A 5 100
5 2022-01-06 A 6 100
6 2022-01-07 A 7 200
7 2022-01-08 A 8 200
8 2022-01-09 A 9 200
9 2022-01-01 B 1 300
10 2022-01-02 B 2 300
11 2022-01-03 B 3 300
12 2022-01-04 B 4 300
13 2022-01-05 B 5 300
14 2022-01-06 B 6 300
15 2022-01-07 B 7 400
16 2022-01-08 B 8 400
17 2022-01-09 B 9 400

pandas how to populate missing rows

I have a dataset like:
Dept, Date, Number
dept1, 2020-01-01, 12
dept1, 2020-01-03, 34
dept2, 2020-01-03, 56
dept3, 2020-01-03, 78
dept2, 2020-01-04, 11
dept3, 2020-01-04, 12
...
eg, I want to fill zero for missing dept2 & dept3 on date 2020-01-01
Dept, Date, Number
dept1, 2020-01-01, 12
dept2, 2020-01-01, 0 <--need to be added
dept3, 2020-01-01, 0 <--need to be added
dept1, 2020-01-03, 34
dept2, 2020-01-03, 56
dept3, 2020-01-03, 78
dept1, 2020-01-04, 0 <--need to be added
dept2, 2020-01-04, 11
dept3, 2020-01-04, 12
In other words, for unique dept, I need them to be shown on every unique date.
Is there a way to achieve this? Thanks!
You could use the complete function from pyjanitor to abstract the process; simply pass the columns that you wish to expand:
In [598]: df.complete('Dept', 'Date').fillna(0)
Out[598]:
Dept Date Number
0 dept1 2020-01-01 12.0
1 dept1 2020-01-03 34.0
2 dept1 2020-01-04 0.0
3 dept2 2020-01-01 0.0
4 dept2 2020-01-03 56.0
5 dept2 2020-01-04 11.0
6 dept3 2020-01-01 0.0
7 dept3 2020-01-03 78.0
8 dept3 2020-01-04 12.0
You could also stick solely to Pandas and use the reindex method; complete covers cases where the index is not unique, or there are nulls; it is an abstraction/convenience wrapper:
(df
.set_index(['Dept', 'Date'])
.pipe(lambda df: df.reindex(pd.MultiIndex.from_product(df.index.levels),
fill_value = 0))
.reset_index()
)
Dept Date Number
0 dept1 2020-01-01 12
1 dept1 2020-01-03 34
2 dept1 2020-01-04 0
3 dept2 2020-01-01 0
4 dept2 2020-01-03 56
5 dept2 2020-01-04 11
6 dept3 2020-01-01 0
7 dept3 2020-01-03 78
8 dept3 2020-01-04 12
Use pivot, then stack:
out = df.pivot(*df.columns).fillna(0).stack().reset_index(name='Number')
Dept Date Number
0 dept1 2020-01-01 12.0
1 dept1 2020-01-03 34.0
2 dept1 2020-01-04 0.0
3 dept2 2020-01-01 0.0
4 dept2 2020-01-03 56.0
5 dept2 2020-01-04 11.0
6 dept3 2020-01-01 0.0
7 dept3 2020-01-03 78.0
8 dept3 2020-01-04 12.0

7 days hourly mean with pandas

I need some help calculating a 7-day mean for every hour.
The timeseries has an hourly resolution and I need the 7-day mean for each hour, e.g. for 13:00:
date, x
2020-07-01 13:00 , 4
2020-07-01 14:00 , 3
.
.
.
2020-07-02 13:00 , 3
2020-07-02 14:00 , 7
.
.
.
I tried it with pandas and a rolling mean, but a plain rolling mean takes the last 7 days as a whole.
Thanks for any hints!
Add a new hour column, group by it, and then take a rolling mean over 7 rows within each group, so the average is calculated over the same hour across 7 days. This is consistent with the intent of the question.
df['hour'] = df.index.hour
df = df.groupby(df.hour)['x'].rolling(7).mean().reset_index()
df.head(35)
hour level_1 x
0 0 2020-07-01 00:00:00 NaN
1 0 2020-07-02 00:00:00 NaN
2 0 2020-07-03 00:00:00 NaN
3 0 2020-07-04 00:00:00 NaN
4 0 2020-07-05 00:00:00 NaN
5 0 2020-07-06 00:00:00 NaN
6 0 2020-07-07 00:00:00 48.142857
7 0 2020-07-08 00:00:00 50.285714
8 0 2020-07-09 00:00:00 60.000000
9 0 2020-07-10 00:00:00 63.142857
10 1 2020-07-01 01:00:00 NaN
11 1 2020-07-02 01:00:00 NaN
12 1 2020-07-03 01:00:00 NaN
13 1 2020-07-04 01:00:00 NaN
14 1 2020-07-05 01:00:00 NaN
15 1 2020-07-06 01:00:00 NaN
16 1 2020-07-07 01:00:00 52.571429
17 1 2020-07-08 01:00:00 48.428571
18 1 2020-07-09 01:00:00 38.000000
19 2 2020-07-01 02:00:00 NaN
20 2 2020-07-02 02:00:00 NaN
21 2 2020-07-03 02:00:00 NaN
22 2 2020-07-04 02:00:00 NaN
23 2 2020-07-05 02:00:00 NaN
24 2 2020-07-06 02:00:00 NaN
25 2 2020-07-07 02:00:00 46.571429
26 2 2020-07-08 02:00:00 47.714286
27 2 2020-07-09 02:00:00 42.714286
28 3 2020-07-01 03:00:00 NaN
29 3 2020-07-02 03:00:00 NaN
30 3 2020-07-03 03:00:00 NaN
31 3 2020-07-04 03:00:00 NaN
32 3 2020-07-05 03:00:00 NaN
33 3 2020-07-06 03:00:00 NaN
34 3 2020-07-07 03:00:00 72.571429
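As a self-contained sanity check of the groupby-plus-rolling approach, with synthetic data assumed purely for illustration:

```python
import pandas as pd

# Synthetic hourly series over 10 days: x simply counts hours since the start.
idx = pd.date_range("2020-07-01", periods=240, freq="h")
df = pd.DataFrame({"x": range(240)}, index=idx)

df["hour"] = df.index.hour
out = df.groupby("hour")["x"].rolling(7).mean().reset_index()

# At hour 0 the first seven daily values are 0, 24, ..., 144, whose mean is 72,
# so the first non-NaN entry for hour 0 lands on 2020-07-07.
row = out[(out["hour"] == 0) & (out["level_1"] == pd.Timestamp("2020-07-07"))]
print(row["x"].iloc[0])  # 72.0
```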