Standard SQL: Get purchase count and first purchase date per customer and store_id

I use standard SQL and need a query that returns the total count of purchases per customer for each store_id, as well as each customer's first purchase date for each store_id.
I have a table with this structure:

customer_id  store_id  product_no  customer_no  purchase_date  price
1            10        100         200          2022-01-01     50
1            10        110         200          2022-01-02     70
1            20        120         200          2022-01-02     60
1            20        130         200          2022-01-02     40
1            30        140         200          2022-01-02     60
Current query:

select
  customer_id,
  store_id,
  product_no,
  customer_no,
  purchase_date,
  price,
  first_value(purchase_date) over (partition by customer_no order by purchase_date) as first_purchase_date,
  count(customer_no) over (partition by customer_id, store_id, customer_no) as customer_purchase_count
from my_table
This gives me this type of output:

customer_id  store_id  product_no  customer_no  purchase_date  price  first_purchase_date  customer_purchase_count
1            10        100         200          2022-01-01     50     2022-01-01           2
1            10        110         200          2022-01-02     70     2022-01-01           2
1            20        120         210          2022-01-02     60     2022-01-02           2
1            20        130         210          2022-01-02     40     2022-01-02           2
1            30        140         220          2022-01-10     60     2022-01-10           3
1            10        140         220          2022-01-10     60     2022-01-10           3
1            10        140         220          2022-01-10     60     2022-01-10           3
1            10        150         220          2022-01-10     60     2022-01-10           1
However, I want it to look like the table below in its final form. How can I achieve that? If possible, I would also like to add four columns called "only_in_store_10", "only_in_store_20", "only_in_store_30" and "only_in_store_40" for all customer_no values that only shopped at that store. Each row of each customer_no that satisfies the condition should be marked with a ○.
customer_id  store_id  product_no  customer_no  purchase_date  price  first_purchase_date  customer_purchase_count  first_purchase_date_per_store  first_purchase_date_per_store  store_row_nr
1            10        100         200          2022-01-01     50     2022-01-01           2                        2022-01-01                     1                              1
1            10        110         200          2022-01-02     70     2022-01-01           2                        2022-01-02                     1                              1
1            20        120         210          2022-01-02     60     2022-01-02           2                        2022-01-02                     2                              1
1            20        130         210          2022-01-03     40     2022-01-02           2                        2022-01-02                     2                              1
1            30        140         220          2022-01-10     60     2022-01-10           3                        2022-01-10                     1                              1
1            10        140         220          2022-01-11     50     2022-01-11           3                        2022-01-11                     2                              1
1            10        140         220          2022-01-12     40     2022-01-11           3                        2022-01-11                     2                              2
1            10        150         220          2022-01-13     60     2022-01-13           1                        2022-01-13                     1                              1
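No accepted answer is recorded in this thread; below is a minimal sketch of the window functions that produce the extra per-store columns, demonstrated on SQLite (any engine with standard window functions behaves the same way). The table name and a subset of the sample rows follow the question; the `only_in_store_10` flag is derived with MIN/MAX over the customer partition, since `COUNT(DISTINCT ...) OVER` is not portable.

```python
# Sketch only: per-store first purchase date, per-store purchase count,
# and an "only shopped in store 10" flag. Requires SQLite >= 3.25 for
# window functions (bundled with any modern Python).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE my_table (
    customer_id INT, store_id INT, product_no INT,
    customer_no INT, purchase_date TEXT, price INT)""")
con.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 10, 100, 200, "2022-01-01", 50),
     (1, 10, 110, 200, "2022-01-02", 70),
     (1, 20, 120, 210, "2022-01-02", 60),
     (1, 30, 140, 220, "2022-01-10", 60)])

rows = con.execute("""
SELECT customer_no, store_id, purchase_date,
       FIRST_VALUE(purchase_date) OVER w AS first_purchase_date_per_store,
       COUNT(*) OVER (PARTITION BY customer_no, store_id) AS store_purchase_count,
       -- customer shopped in exactly one store, and that store is 10
       CASE WHEN MIN(store_id) OVER c = 10 AND MAX(store_id) OVER c = 10
            THEN '○' ELSE '' END AS only_in_store_10
FROM my_table
WINDOW w AS (PARTITION BY customer_no, store_id ORDER BY purchase_date),
       c AS (PARTITION BY customer_no)
ORDER BY customer_no, store_id, purchase_date
""").fetchall()
for r in rows:
    print(r)
```

The remaining flags (`only_in_store_20`, etc.) repeat the same CASE with a different store id, and `store_row_nr` would be a plain `ROW_NUMBER() OVER w`.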

Get Data in a row with specific values

I have a series of data like the example below:
Customer  Date        Value
a         2022-01-02  100
a         2022-01-03  100
a         2022-01-04  100
a         2022-01-05  100
a         2022-01-06  100
b         2022-01-02  100
b         2022-01-03  100
b         2022-01-04  100
b         2022-01-05  100
b         2022-01-06  090
b         2022-01-07  100
c         2022-02-03  100
c         2022-02-04  100
c         2022-02-05  100
c         2022-02-06  100
c         2022-02-07  100
d         2022-04-10  100
d         2022-04-11  100
d         2022-04-12  100
d         2022-04-13  100
d         2022-04-14  100
d         2022-04-15  090
e         2022-04-10  100
e         2022-04-11  100
e         2022-04-12  080
e         2022-04-13  070
e         2022-04-14  100
e         2022-04-15  100
The result I want is customers A, C and D only, because A, C and D have the value 100 for 5 days in a row.
The start date of each customer is different.
What is the query I need to write in BigQuery for the case above?
Thank you so much.
Would you consider the query below?
SELECT DISTINCT Customer
FROM sample_table
QUALIFY 5 = COUNTIF(Value = 100) OVER (
PARTITION BY Customer ORDER BY UNIX_DATE(Date) RANGE BETWEEN 4 PRECEDING AND CURRENT ROW
);
+-----+----------+
| Row | Customer |
+-----+----------+
| 1 | a |
| 2 | c |
| 3 | d |
+-----+----------+
Note that it assumes the Date column has DATE type.
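For readers without BigQuery at hand, the same check can be mirrored in pandas: a rolling window of 5 calendar days plays the role of `RANGE BETWEEN 4 PRECEDING AND CURRENT ROW`. A sketch over the question's sample rows:

```python
import pandas as pd

# Rebuild the sample table from the question.
recs = (
    [("a", f"2022-01-0{d}", 100) for d in range(2, 7)]
    + [("b", f"2022-01-0{d}", 100) for d in range(2, 6)]
    + [("b", "2022-01-06", 90), ("b", "2022-01-07", 100)]
    + [("c", f"2022-02-0{d}", 100) for d in range(3, 8)]
    + [("d", f"2022-04-1{d}", 100) for d in range(0, 5)]
    + [("d", "2022-04-15", 90)]
    + [("e", "2022-04-10", 100), ("e", "2022-04-11", 100),
       ("e", "2022-04-12", 80), ("e", "2022-04-13", 70),
       ("e", "2022-04-14", 100), ("e", "2022-04-15", 100)]
)
df = pd.DataFrame(recs, columns=["Customer", "Date", "Value"])
df["Date"] = pd.to_datetime(df["Date"])

def has_streak(g):
    hit = g.set_index("Date")["Value"].eq(100).astype(int)
    # 5-calendar-day window ending at each row, like RANGE 4 PRECEDING
    return hit.rolling("5D").sum().eq(5).any()

result = sorted(c for c, g in df.groupby("Customer") if has_streak(g))
print(result)  # ['a', 'c', 'd']
```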

Fill day gaps with two tables in SQL

I have three different IDs; the ids are dynamic.
For each id, I need to complete a calendar with the last existing value.
Example:
ID  VALUE  date
1   30     1/1/2020
1   29     3/1/2020
2   65     1/1/2020
3   30     2/1/2020
1   11     6/1/2020
2   40     4/1/2020
3   23     5/1/2020
OUTPUT EXPECTED

ID  VALUE  date
1   30     1/1/2020
1   30     2/1/2020
1   29     3/1/2020
1   29     4/1/2020
1   29     5/1/2020
1   11     6/1/2020
2   65     1/1/2020
2   65     2/1/2020
2   65     3/1/2020
2   40     4/1/2020
2   40     5/1/2020
2   40     6/1/2020
3   30     2/1/2020
3   30     3/1/2020
3   30     4/1/2020
3   23     5/1/2020
3   23     6/1/2020
--- Complete the fields until today for each id ---
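No answer was captured for this question; a common approach is to reindex each id onto one shared calendar and forward-fill the last existing value. A pandas sketch over the sample rows (dates read as d/m/yyyy, i.e. January 2020):

```python
import pandas as pd

df = pd.DataFrame({
    "ID":    [1, 1, 2, 3, 1, 2, 3],
    "VALUE": [30, 29, 65, 30, 11, 40, 23],
    "date":  pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-01",
                             "2020-01-02", "2020-01-06", "2020-01-04",
                             "2020-01-05"]),
})

# One calendar spanning all ids; to "complete until today" use
# pd.Timestamp.today() as the end of the range instead.
cal = pd.date_range(df["date"].min(), df["date"].max(), freq="D", name="date")

out = (df.set_index("date")
         .groupby("ID")["VALUE"]
         .apply(lambda s: s.reindex(cal).ffill())
         .dropna()          # an id has no value before its first date
         .reset_index())
print(out)
```

This reproduces the expected output: id 3 only starts on 2/1, ids 1 and 2 run from 1/1, and every id is carried forward to the last calendar date.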

pandas retain values on different index dataframes

I need to merge two dataframes with different frequencies (daily and weekly). However, I would like to retain the weekly values when merging onto the daily dataframe.
There is a grouping variable in the data, group.
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

daily = {'date': [datetime.date(2022, 1, 1) + relativedelta(day=i) for i in range(1, 10)] * 2,
         'group': ['A' for x in range(1, 10)] + ['B' for x in range(1, 10)],
         'daily_value': [x for x in range(1, 10)] * 2}
weekly = {'date': [datetime.date(2022, 1, 1), datetime.date(2022, 1, 7)] * 2,
          'group': ['A', 'A'] + ['B', 'B'],
          'weekly_value': [100, 200, 300, 400]}
daily_data = pd.DataFrame(daily)
weekly_data = pd.DataFrame(weekly)
daily_data output:
date group daily_value
0 2022-01-01 A 1
1 2022-01-02 A 2
2 2022-01-03 A 3
3 2022-01-04 A 4
4 2022-01-05 A 5
5 2022-01-06 A 6
6 2022-01-07 A 7
7 2022-01-08 A 8
8 2022-01-09 A 9
9 2022-01-01 B 1
10 2022-01-02 B 2
11 2022-01-03 B 3
12 2022-01-04 B 4
13 2022-01-05 B 5
14 2022-01-06 B 6
15 2022-01-07 B 7
16 2022-01-08 B 8
17 2022-01-09 B 9
weekly_data output:
date group weekly_value
0 2022-01-01 A 100
1 2022-01-07 A 200
2 2022-01-01 B 300
3 2022-01-07 B 400
The desired output
desired={'date':[datetime.date(2022,1,1)+relativedelta(day=i) for i in range(1,10)]*2,
'group':['A' for x in range(1,10)]+['B' for x in range(1,10)],
'daily_value':[x for x in range(1,10)]*2,
'weekly_value':[100]*6+[200]*3+[300]*6+[400]*3}
desired_data=pd.DataFrame(desired)
desired_data output:
date group daily_value weekly_value
0 2022-01-01 A 1 100
1 2022-01-02 A 2 100
2 2022-01-03 A 3 100
3 2022-01-04 A 4 100
4 2022-01-05 A 5 100
5 2022-01-06 A 6 100
6 2022-01-07 A 7 200
7 2022-01-08 A 8 200
8 2022-01-09 A 9 200
9 2022-01-01 B 1 300
10 2022-01-02 B 2 300
11 2022-01-03 B 3 300
12 2022-01-04 B 4 300
13 2022-01-05 B 5 300
14 2022-01-06 B 6 300
15 2022-01-07 B 7 400
16 2022-01-08 B 8 400
17 2022-01-09 B 9 400
Use merge_asof, sorting the values by datetime first; at the end, sort like the original by both columns:
daily_data['date'] = pd.to_datetime(daily_data['date'])
weekly_data['date'] = pd.to_datetime(weekly_data['date'])
df = (pd.merge_asof(daily_data.sort_values('date'),
weekly_data.sort_values('date'),
on='date',
by='group').sort_values(['group','date'], ignore_index=True))
print (df)
date group daily_value weekly_value
0 2022-01-01 A 1 100
1 2022-01-02 A 2 100
2 2022-01-03 A 3 100
3 2022-01-04 A 4 100
4 2022-01-05 A 5 100
5 2022-01-06 A 6 100
6 2022-01-07 A 7 200
7 2022-01-08 A 8 200
8 2022-01-09 A 9 200
9 2022-01-01 B 1 300
10 2022-01-02 B 2 300
11 2022-01-03 B 3 300
12 2022-01-04 B 4 300
13 2022-01-05 B 5 300
14 2022-01-06 B 6 300
15 2022-01-07 B 7 400
16 2022-01-08 B 8 400
17 2022-01-09 B 9 400

Want SQL query for this scenario

tbl_employee
empid empname openingbal
2 jhon 400
3 smith 500
tbl_transection1
tid empid amount creditdebit date
1 2 100 1 2016-01-06 00:00:00.000
2 2 200 1 2016-01-08 00:00:00.000
3 2 100 2 2016-01-11 00:00:00.000
4 2 700 1 2016-01-15 00:00:00.000
5 3 100 1 2016-02-03 00:00:00.000
6 3 200 2 2016-02-06 00:00:00.000
7 3 400 1 2016-02-07 00:00:00.000
tbl_transection2
tid empid amount creditdebit date
1 2 100 1 2016-01-07 00:00:00.000
2 2 200 1 2016-01-08 00:00:00.000
3 2 100 2 2016-01-09 00:00:00.000
4 2 700 1 2016-01-14 00:00:00.000
5 3 100 1 2016-02-04 00:00:00.000
6 3 200 2 2016-02-05 00:00:00.000
7 3 400 1 2016-02-08 00:00:00.000
Here 1 stands for credit and 2 for debit.
I want output like:

empid  empname  details        debitamount  creditamount  balance  Dr/Cr  date
2      jhon     opening Bal                               400      Cr
2      jhon     transection 1               100           500      Cr     2016-01-06 00:00:00.000
2      jhon     transection 2               100           600      Cr     2016-01-07 00:00:00.000
2      jhon     transection 1               200           800      Cr     2016-01-08 00:00:00.000
2      jhon     transection 2               200           1000     Cr     2016-01-08 00:00:00.000
2      jhon     transection 2  100                        900      Dr     2016-01-09 00:00:00.000
2      jhon     transection 1  100                        800      Dr     2016-01-11 00:00:00.000
2      jhon     transection 2               700           1500     Cr     2016-01-14 00:00:00.000
2      jhon     transection 1               700           2200     Cr     2016-01-15 00:00:00.000
3      smith    opening Bal                               500      Cr
3      smith    transection 1               100           600      Cr     2016-02-03 00:00:00.000
3      smith    transection 2               100           700      Cr     2016-02-04 00:00:00.000
3      smith    transection 2  200                        500      Dr     2016-02-05 00:00:00.000
3      smith    transection 1  200                        300      Dr     2016-02-06 00:00:00.000
3      smith    transection 1               400           700      Cr     2016-02-07 00:00:00.000
3      smith    transection 2               400           1100     Cr     2016-02-08 00:00:00.000
You can do it with something like this:
select
empid, sum(amount) over (partition by empid order by date) as balance, details
from (
select
empid, case creditdebit when 1 then amount else -amount end as amount, date, details
from (
select empid, openingbal as amount, 1 as creditdebit, '19000101' as date, 'opening Bal' as details
from tbl_employee
union all
select empid, amount, creditdebit, date, 'transection 1'
from tbl_transection1
union all
select empid, amount, creditdebit, date, 'transection 2'
from tbl_transection2
) X
) Y
The innermost select gathers the data from the 3 tables, the next one applies the +/- sign to the amounts, and the outermost calculates the running balance.
Example in SQL Fiddle
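The union-then-window approach can also be checked locally; the sketch below runs the same shape of query on SQLite with a subset of the sample rows (ISO date strings stand in for SQL Server datetimes, and '1900-01-01' sorts each opening balance first):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl_employee (empid INT, empname TEXT, openingbal INT);
INSERT INTO tbl_employee VALUES (2, 'jhon', 400), (3, 'smith', 500);
CREATE TABLE tbl_transection1 (tid INT, empid INT, amount INT, creditdebit INT, date TEXT);
INSERT INTO tbl_transection1 VALUES (1, 2, 100, 1, '2016-01-06'),
                                    (3, 2, 100, 2, '2016-01-11');
CREATE TABLE tbl_transection2 (tid INT, empid INT, amount INT, creditdebit INT, date TEXT);
INSERT INTO tbl_transection2 VALUES (1, 2, 100, 1, '2016-01-07');
""")

rows = con.execute("""
SELECT empid,
       SUM(amount) OVER (PARTITION BY empid ORDER BY date) AS balance,
       details
FROM (
    SELECT empid,
           CASE creditdebit WHEN 1 THEN amount ELSE -amount END AS amount,
           date, details
    FROM (
        SELECT empid, openingbal AS amount, 1 AS creditdebit,
               '1900-01-01' AS date, 'opening Bal' AS details
        FROM tbl_employee
        UNION ALL
        SELECT empid, amount, creditdebit, date, 'transection 1' FROM tbl_transection1
        UNION ALL
        SELECT empid, amount, creditdebit, date, 'transection 2' FROM tbl_transection2
    ) X
) Y
ORDER BY empid, date
""").fetchall()
for r in rows:
    print(r)
```

The running SUM window (default frame: unbounded preceding to current row) turns the signed amounts into a balance per employee.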

How to extract info based on the latest row

I have two tables:
TABLE A:
ORNO DEL PONO QTY
801 123 1 80
801 123 2 60
801 123 3 70
801 151 1 95
801 151 3 75
802 130 1 50
802 130 2 40
802 130 3 30
802 181 2 55
TABLE B:
ORNO PONO STATUS ITEM
801 1 12 APPLE
801 2 12 ORANGE
801 3 12 MANGO
802 1 22 PEAR
802 2 22 KIWI
802 3 22 MELON
I wish to extract the info based on the latest DEL (in Table A) using SQL. The final output should look like this:
OUTPUT:
ORNO PONO STATUS ITEM QTY
801 1 12 APPLE 95
801 2 12 ORANGE 60
801 3 12 MANGO 75
802 1 22 PEAR 50
802 2 22 KIWI 55
802 3 22 MELON 30
Thanks.
select b.*, y.QTY
from
(
    select a.ORNO, a.PONO, MAX(a.DEL) [max]
    from #tA a
    group by a.ORNO, a.PONO
) x
join #tA y on y.ORNO = x.ORNO and y.PONO = x.PONO and y.DEL = x.max
join #tB b on b.ORNO = y.ORNO and b.PONO = y.PONO
Output:
ORNO PONO STATUS ITEM QTY
----------- ----------- ----------- ---------- -----------
801 1 12 APPLE 95
801 2 12 ORANGE 60
801 3 12 MANGO 75
802 1 22 PEAR 50
802 2 22 KIWI 55
802 3 22 MELON 30
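The join-on-max pattern above can be verified locally as well; a sketch on SQLite with the question's sample data (the temp tables #tA/#tB renamed to tA/tB, and the bracketed T-SQL alias written portably):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tA (ORNO INT, DEL INT, PONO INT, QTY INT);
INSERT INTO tA VALUES
 (801, 123, 1, 80), (801, 123, 2, 60), (801, 123, 3, 70),
 (801, 151, 1, 95), (801, 151, 3, 75),
 (802, 130, 1, 50), (802, 130, 2, 40), (802, 130, 3, 30),
 (802, 181, 2, 55);
CREATE TABLE tB (ORNO INT, PONO INT, STATUS INT, ITEM TEXT);
INSERT INTO tB VALUES
 (801, 1, 12, 'APPLE'), (801, 2, 12, 'ORANGE'), (801, 3, 12, 'MANGO'),
 (802, 1, 22, 'PEAR'), (802, 2, 22, 'KIWI'), (802, 3, 22, 'MELON');
""")

rows = con.execute("""
SELECT b.ORNO, b.PONO, b.STATUS, b.ITEM, y.QTY
FROM (
    -- latest delivery per order/position
    SELECT a.ORNO, a.PONO, MAX(a.DEL) AS max_del
    FROM tA a
    GROUP BY a.ORNO, a.PONO
) x
JOIN tA y ON y.ORNO = x.ORNO AND y.PONO = x.PONO AND y.DEL = x.max_del
JOIN tB b ON b.ORNO = y.ORNO AND b.PONO = y.PONO
ORDER BY b.ORNO, b.PONO
""").fetchall()
for r in rows:
    print(r)
```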