Standard SQL: Get purchase count and first purchase date per customer and store_id

I use standard SQL and need a query that returns the total count of purchases per customer for each store_id, as well as each customer's first purchase date for each store_id.
I have a table with this structure:

customer_id  store_id  product_no  customer_no  purchase_date  price
1            10        100         200          2022-01-01     50
1            10        110         200          2022-01-02     70
1            20        120         200          2022-01-02     60
1            20        130         200          2022-01-02     40
1            30        140         200          2022-01-02     60
Current query:

select
  customer_id,
  store_id,
  product_no,
  customer_no,
  purchase_date,
  price,
  first_value(purchase_date) over (partition by customer_no order by purchase_date) as first_purchase_date,
  count(customer_no) over (partition by customer_id, store_id, customer_no) as customer_purchase_count
from my_table
This gives me this type of output:

customer_id  store_id  product_no  customer_no  purchase_date  price  first_purchase_date  customer_purchase_count
1            10        100         200          2022-01-01     50     2022-01-01           2
1            10        110         200          2022-01-02     70     2022-01-01           2
1            20        120         210          2022-01-02     60     2022-01-02           2
1            20        130         210          2022-01-02     40     2022-01-02           2
1            30        140         220          2022-01-10     60     2022-01-10           3
1            10        140         220          2022-01-10     60     2022-01-10           3
1            10        140         220          2022-01-10     60     2022-01-10           3
1            10        150         220          2022-01-10     60     2022-01-10           1
However, I want it to look like the table below in its final form. How can I achieve that? If possible, I would also like to add four columns called "only_in_store_10", "only_in_store_20", "only_in_store_30" and "only_in_store_40" for all customer_no values that only shopped at that store. Each row of each customer_no that satisfies the condition should be marked with a ○.
customer_id  store_id  product_no  customer_no  purchase_date  price  first_purchase_date  customer_purchase_count  first_purchase_date_per_store  first_purchase_date_per_store  store_row_nr
1            10        100         200          2022-01-01     50     2022-01-01           2                        2022-01-01                     1                              1
1            10        110         200          2022-01-02     70     2022-01-01           2                        2022-01-02                     1                              1
1            20        120         210          2022-01-02     60     2022-01-02           2                        2022-01-02                     2                              1
1            20        130         210          2022-01-03     40     2022-01-02           2                        2022-01-02                     2                              1
1            30        140         220          2022-01-10     60     2022-01-10           3                        2022-01-10                     1                              1
1            10        140         220          2022-01-11     50     2022-01-11           3                        2022-01-11                     2                              1
1            10        140         220          2022-01-12     40     2022-01-11           3                        2022-01-11                     2                              2
1            10        150         220          2022-01-13     60     2022-01-13           1                        2022-01-13                     1                              1
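No accepted answer is recorded in this thread; below is a minimal sketch of the window functions that produce the extra per-store columns, demonstrated on SQLite (any engine with standard window functions behaves the same way). The table name and a subset of the sample rows follow the question; the `only_in_store_10` flag is derived with MIN/MAX over the customer partition, since `COUNT(DISTINCT ...) OVER` is not portable.

```python
# Sketch only: per-store first purchase date, per-store purchase count,
# and an "only shopped in store 10" flag. Requires SQLite >= 3.25 for
# window functions (bundled with any modern Python).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE my_table (
    customer_id INT, store_id INT, product_no INT,
    customer_no INT, purchase_date TEXT, price INT)""")
con.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 10, 100, 200, "2022-01-01", 50),
     (1, 10, 110, 200, "2022-01-02", 70),
     (1, 20, 120, 210, "2022-01-02", 60),
     (1, 30, 140, 220, "2022-01-10", 60)])

rows = con.execute("""
SELECT customer_no, store_id, purchase_date,
       FIRST_VALUE(purchase_date) OVER w AS first_purchase_date_per_store,
       COUNT(*) OVER (PARTITION BY customer_no, store_id) AS store_purchase_count,
       -- customer shopped in exactly one store, and that store is 10
       CASE WHEN MIN(store_id) OVER c = 10 AND MAX(store_id) OVER c = 10
            THEN '○' ELSE '' END AS only_in_store_10
FROM my_table
WINDOW w AS (PARTITION BY customer_no, store_id ORDER BY purchase_date),
       c AS (PARTITION BY customer_no)
ORDER BY customer_no, store_id, purchase_date
""").fetchall()
for r in rows:
    print(r)
```

The remaining flags (`only_in_store_20`, etc.) repeat the same CASE with a different store id, and `store_row_nr` would be a plain `ROW_NUMBER() OVER w`.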

Get Data in a row with specific values

I have a series of data like the example below:
Customer  Date        Value
a         2022-01-02  100
a         2022-01-03  100
a         2022-01-04  100
a         2022-01-05  100
a         2022-01-06  100
b         2022-01-02  100
b         2022-01-03  100
b         2022-01-04  100
b         2022-01-05  100
b         2022-01-06  090
b         2022-01-07  100
c         2022-02-03  100
c         2022-02-04  100
c         2022-02-05  100
c         2022-02-06  100
c         2022-02-07  100
d         2022-04-10  100
d         2022-04-11  100
d         2022-04-12  100
d         2022-04-13  100
d         2022-04-14  100
d         2022-04-15  090
e         2022-04-10  100
e         2022-04-11  100
e         2022-04-12  080
e         2022-04-13  070
e         2022-04-14  100
e         2022-04-15  100
The result I want is customers A, C and D only, because A, C and D have the value 100 for 5 days in a row.
The start date of each customer is different.
What is the query I need to write in BigQuery for the case above?
Thank you so much.
Would you consider the query below?
SELECT DISTINCT Customer
FROM sample_table
QUALIFY 5 = COUNTIF(Value = 100) OVER (
PARTITION BY Customer ORDER BY UNIX_DATE(Date) RANGE BETWEEN 4 PRECEDING AND CURRENT ROW
);
+-----+----------+
| Row | Customer |
+-----+----------+
| 1 | a |
| 2 | c |
| 3 | d |
+-----+----------+
Note that it assumes the Date column has DATE type.
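For readers without BigQuery at hand, the same check can be mirrored in pandas: a rolling window of 5 calendar days plays the role of `RANGE BETWEEN 4 PRECEDING AND CURRENT ROW`. A sketch over the question's sample rows:

```python
import pandas as pd

# Rebuild the sample table from the question.
recs = (
    [("a", f"2022-01-0{d}", 100) for d in range(2, 7)]
    + [("b", f"2022-01-0{d}", 100) for d in range(2, 6)]
    + [("b", "2022-01-06", 90), ("b", "2022-01-07", 100)]
    + [("c", f"2022-02-0{d}", 100) for d in range(3, 8)]
    + [("d", f"2022-04-1{d}", 100) for d in range(0, 5)]
    + [("d", "2022-04-15", 90)]
    + [("e", "2022-04-10", 100), ("e", "2022-04-11", 100),
       ("e", "2022-04-12", 80), ("e", "2022-04-13", 70),
       ("e", "2022-04-14", 100), ("e", "2022-04-15", 100)]
)
df = pd.DataFrame(recs, columns=["Customer", "Date", "Value"])
df["Date"] = pd.to_datetime(df["Date"])

def has_streak(g):
    hit = g.set_index("Date")["Value"].eq(100).astype(int)
    # 5-calendar-day window ending at each row, like RANGE 4 PRECEDING
    return hit.rolling("5D").sum().eq(5).any()

result = sorted(c for c, g in df.groupby("Customer") if has_streak(g))
print(result)  # ['a', 'c', 'd']
```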

Fill day gaps with two tables in SQL

I have three different IDs; the ids are dynamic.
For each id, I need to complete a calendar with the last existing value.
Example:
ID  VALUE  date
1   30     1/1/2020
1   29     3/1/2020
2   65     1/1/2020
3   30     2/1/2020
1   11     6/1/2020
2   40     4/1/2020
3   23     5/1/2020
OUTPUT EXPECTED

ID  VALUE  date
1   30     1/1/2020
1   30     2/1/2020
1   29     3/1/2020
1   29     4/1/2020
1   29     5/1/2020
1   11     6/1/2020
2   65     1/1/2020
2   65     2/1/2020
2   65     3/1/2020
2   40     4/1/2020
2   40     5/1/2020
2   40     6/1/2020
3   30     2/1/2020
3   30     3/1/2020
3   30     4/1/2020
3   23     5/1/2020
3   23     6/1/2020
--- Complete the fields until today for each id ---
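No answer was captured for this question; a common approach is to reindex each id onto one shared calendar and forward-fill the last existing value. A pandas sketch over the sample rows (dates read as d/m/yyyy, i.e. January 2020):

```python
import pandas as pd

df = pd.DataFrame({
    "ID":    [1, 1, 2, 3, 1, 2, 3],
    "VALUE": [30, 29, 65, 30, 11, 40, 23],
    "date":  pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-01",
                             "2020-01-02", "2020-01-06", "2020-01-04",
                             "2020-01-05"]),
})

# One calendar spanning all ids; to "complete until today" use
# pd.Timestamp.today() as the end of the range instead.
cal = pd.date_range(df["date"].min(), df["date"].max(), freq="D", name="date")

out = (df.set_index("date")
         .groupby("ID")["VALUE"]
         .apply(lambda s: s.reindex(cal).ffill())
         .dropna()          # an id has no value before its first date
         .reset_index())
print(out)
```

This reproduces the expected output: id 3 only starts on 2/1, ids 1 and 2 run from 1/1, and every id is carried forward to the last calendar date.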

pandas retain values on different index dataframes

I need to merge two dataframes with different frequencies (daily and weekly). However, I would like to retain the weekly values when merging onto the daily dataframe.
There is a grouping variable in the data, group.
import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta

daily = {'date': [datetime.date(2022, 1, 1) + relativedelta(day=i) for i in range(1, 10)] * 2,
         'group': ['A' for x in range(1, 10)] + ['B' for x in range(1, 10)],
         'daily_value': [x for x in range(1, 10)] * 2}
weekly = {'date': [datetime.date(2022, 1, 1), datetime.date(2022, 1, 7)] * 2,
          'group': ['A', 'A'] + ['B', 'B'],
          'weekly_value': [100, 200, 300, 400]}
daily_data = pd.DataFrame(daily)
weekly_data = pd.DataFrame(weekly)
daily_data output:
date group daily_value
0 2022-01-01 A 1
1 2022-01-02 A 2
2 2022-01-03 A 3
3 2022-01-04 A 4
4 2022-01-05 A 5
5 2022-01-06 A 6
6 2022-01-07 A 7
7 2022-01-08 A 8
8 2022-01-09 A 9
9 2022-01-01 B 1
10 2022-01-02 B 2
11 2022-01-03 B 3
12 2022-01-04 B 4
13 2022-01-05 B 5
14 2022-01-06 B 6
15 2022-01-07 B 7
16 2022-01-08 B 8
17 2022-01-09 B 9
weekly_data output:
date group weekly_value
0 2022-01-01 A 100
1 2022-01-07 A 200
2 2022-01-01 B 300
3 2022-01-07 B 400
The desired output
desired={'date':[datetime.date(2022,1,1)+relativedelta(day=i) for i in range(1,10)]*2,
'group':['A' for x in range(1,10)]+['B' for x in range(1,10)],
'daily_value':[x for x in range(1,10)]*2,
'weekly_value':[100]*6+[200]*3+[300]*6+[400]*3}
desired_data=pd.DataFrame(desired)
desired_data output:
date group daily_value weekly_value
0 2022-01-01 A 1 100
1 2022-01-02 A 2 100
2 2022-01-03 A 3 100
3 2022-01-04 A 4 100
4 2022-01-05 A 5 100
5 2022-01-06 A 6 100
6 2022-01-07 A 7 200
7 2022-01-08 A 8 200
8 2022-01-09 A 9 200
9 2022-01-01 B 1 300
10 2022-01-02 B 2 300
11 2022-01-03 B 3 300
12 2022-01-04 B 4 300
13 2022-01-05 B 5 300
14 2022-01-06 B 6 300
15 2022-01-07 B 7 400
16 2022-01-08 B 8 400
17 2022-01-09 B 9 400
Use merge_asof, sorting the values by datetime first; at the end, sort like the original by both columns:
daily_data['date'] = pd.to_datetime(daily_data['date'])
weekly_data['date'] = pd.to_datetime(weekly_data['date'])
df = (pd.merge_asof(daily_data.sort_values('date'),
weekly_data.sort_values('date'),
on='date',
by='group').sort_values(['group','date'], ignore_index=True))
print (df)
date group daily_value weekly_value
0 2022-01-01 A 1 100
1 2022-01-02 A 2 100
2 2022-01-03 A 3 100
3 2022-01-04 A 4 100
4 2022-01-05 A 5 100
5 2022-01-06 A 6 100
6 2022-01-07 A 7 200
7 2022-01-08 A 8 200
8 2022-01-09 A 9 200
9 2022-01-01 B 1 300
10 2022-01-02 B 2 300
11 2022-01-03 B 3 300
12 2022-01-04 B 4 300
13 2022-01-05 B 5 300
14 2022-01-06 B 6 300
15 2022-01-07 B 7 400
16 2022-01-08 B 8 400
17 2022-01-09 B 9 400

Want SQL query for this scenario

tbl_employee
empid empname openingbal
2 jhon 400
3 smith 500
tbl_transection1
tid empid amount creditdebit date
1 2 100 1 2016-01-06 00:00:00.000
2 2 200 1 2016-01-08 00:00:00.000
3 2 100 2 2016-01-11 00:00:00.000
4 2 700 1 2016-01-15 00:00:00.000
5 3 100 1 2016-02-03 00:00:00.000
6 3 200 2 2016-02-06 00:00:00.000
7 3 400 1 2016-02-07 00:00:00.000
tbl_transection2
tid empid amount creditdebit date
1 2 100 1 2016-01-07 00:00:00.000
2 2 200 1 2016-01-08 00:00:00.000
3 2 100 2 2016-01-09 00:00:00.000
4 2 700 1 2016-01-14 00:00:00.000
5 3 100 1 2016-02-04 00:00:00.000
6 3 200 2 2016-02-05 00:00:00.000
7 3 400 1 2016-02-08 00:00:00.000
Here 1 stands for credit and 2 for debit.
I want output like:

empid  empname  details        debitamount  creditamount  balance  Dr/Cr  date
2      jhon     opening Bal                               400      Cr
2      jhon     transection 1               100           500      Cr     2016-01-06 00:00:00.000
2      jhon     transection 2               100           600      Cr     2016-01-07 00:00:00.000
2      jhon     transection 1               200           800      Cr     2016-01-08 00:00:00.000
2      jhon     transection 2               200           1000     Cr     2016-01-08 00:00:00.000
2      jhon     transection 2  100                        900      Dr     2016-01-09 00:00:00.000
2      jhon     transection 1  100                        800      Dr     2016-01-11 00:00:00.000
2      jhon     transection 2               700           1500     Cr     2016-01-14 00:00:00.000
2      jhon     transection 1               700           2200     Cr     2016-01-15 00:00:00.000
3      smith    opening Bal                               500      Cr
3      smith    transection 1               100           600      Cr     2016-02-03 00:00:00.000
3      smith    transection 2               100           700      Cr     2016-02-04 00:00:00.000
3      smith    transection 2  200                        500      Dr     2016-02-05 00:00:00.000
3      smith    transection 1  200                        300      Dr     2016-02-06 00:00:00.000
3      smith    transection 1               400           700      Cr     2016-02-07 00:00:00.000
3      smith    transection 2               400           1100     Cr     2016-02-08 00:00:00.000
You can do it with something like this:
select
empid, sum(amount) over (partition by empid order by date) as balance, details
from (
select
empid, case creditdebit when 1 then amount else -amount end as amount, date, details
from (
select empid, openingbal as amount, 1 as creditdebit, '19000101' as date, 'opening Bal' as details
from tbl_employee
union all
select empid, amount, creditdebit, date, 'transection 1'
from tbl_transection1
union all
select empid, amount, creditdebit, date, 'transection 2'
from tbl_transection2
) X
) Y
The innermost select gathers the data from the 3 tables, the next one applies the +/- sign to the amounts, and the outermost calculates the running balance.
Example in SQL Fiddle
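The union-then-window approach can also be checked locally; the sketch below runs the same shape of query on SQLite with a subset of the sample rows (ISO date strings stand in for SQL Server datetimes, and '1900-01-01' sorts each opening balance first):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tbl_employee (empid INT, empname TEXT, openingbal INT);
INSERT INTO tbl_employee VALUES (2, 'jhon', 400), (3, 'smith', 500);
CREATE TABLE tbl_transection1 (tid INT, empid INT, amount INT, creditdebit INT, date TEXT);
INSERT INTO tbl_transection1 VALUES (1, 2, 100, 1, '2016-01-06'),
                                    (3, 2, 100, 2, '2016-01-11');
CREATE TABLE tbl_transection2 (tid INT, empid INT, amount INT, creditdebit INT, date TEXT);
INSERT INTO tbl_transection2 VALUES (1, 2, 100, 1, '2016-01-07');
""")

rows = con.execute("""
SELECT empid,
       SUM(amount) OVER (PARTITION BY empid ORDER BY date) AS balance,
       details
FROM (
    SELECT empid,
           CASE creditdebit WHEN 1 THEN amount ELSE -amount END AS amount,
           date, details
    FROM (
        SELECT empid, openingbal AS amount, 1 AS creditdebit,
               '1900-01-01' AS date, 'opening Bal' AS details
        FROM tbl_employee
        UNION ALL
        SELECT empid, amount, creditdebit, date, 'transection 1' FROM tbl_transection1
        UNION ALL
        SELECT empid, amount, creditdebit, date, 'transection 2' FROM tbl_transection2
    ) X
) Y
ORDER BY empid, date
""").fetchall()
for r in rows:
    print(r)
```

The running SUM window (default frame: unbounded preceding to current row) turns the signed amounts into a balance per employee.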

How to extract info based on the latest row

I have two tables:
TABLE A:
ORNO DEL PONO QTY
801 123 1 80
801 123 2 60
801 123 3 70
801 151 1 95
801 151 3 75
802 130 1 50
802 130 2 40
802 130 3 30
802 181 2 55
TABLE B:
ORNO PONO STATUS ITEM
801 1 12 APPLE
801 2 12 ORANGE
801 3 12 MANGO
802 1 22 PEAR
802 2 22 KIWI
802 3 22 MELON
I wish to extract the info based on the latest DEL (in Table A) using SQL. The final output should look like this:
OUTPUT:
ORNO PONO STATUS ITEM QTY
801 1 12 APPLE 95
801 2 12 ORANGE 60
801 3 12 MANGO 75
802 1 22 PEAR 50
802 2 22 KIWI 55
802 3 22 MELON 30
Thanks.
select b.*, y.QTY
from
(
    select a.ORNO, a.PONO, MAX(a.DEL) [max]
    from #tA a
    group by a.ORNO, a.PONO
) x
join #tA y on y.ORNO = x.ORNO and y.PONO = x.PONO and y.DEL = x.max
join #tB b on b.ORNO = y.ORNO and b.PONO = y.PONO
Output:
ORNO PONO STATUS ITEM QTY
----------- ----------- ----------- ---------- -----------
801 1 12 APPLE 95
801 2 12 ORANGE 60
801 3 12 MANGO 75
802 1 22 PEAR 50
802 2 22 KIWI 55
802 3 22 MELON 30
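The join-on-max pattern above can be verified locally as well; a sketch on SQLite with the question's sample data (the temp tables #tA/#tB renamed to tA/tB, and the bracketed T-SQL alias written portably):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tA (ORNO INT, DEL INT, PONO INT, QTY INT);
INSERT INTO tA VALUES
 (801, 123, 1, 80), (801, 123, 2, 60), (801, 123, 3, 70),
 (801, 151, 1, 95), (801, 151, 3, 75),
 (802, 130, 1, 50), (802, 130, 2, 40), (802, 130, 3, 30),
 (802, 181, 2, 55);
CREATE TABLE tB (ORNO INT, PONO INT, STATUS INT, ITEM TEXT);
INSERT INTO tB VALUES
 (801, 1, 12, 'APPLE'), (801, 2, 12, 'ORANGE'), (801, 3, 12, 'MANGO'),
 (802, 1, 22, 'PEAR'), (802, 2, 22, 'KIWI'), (802, 3, 22, 'MELON');
""")

rows = con.execute("""
SELECT b.ORNO, b.PONO, b.STATUS, b.ITEM, y.QTY
FROM (
    -- latest delivery per order/position
    SELECT a.ORNO, a.PONO, MAX(a.DEL) AS max_del
    FROM tA a
    GROUP BY a.ORNO, a.PONO
) x
JOIN tA y ON y.ORNO = x.ORNO AND y.PONO = x.PONO AND y.DEL = x.max_del
JOIN tB b ON b.ORNO = y.ORNO AND b.PONO = y.PONO
ORDER BY b.ORNO, b.PONO
""").fetchall()
for r in rows:
    print(r)
```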