Counting unique combinations up until a date - per month - sql

I am looking into a table with transaction data of a two-sided platform, where you have buyers and sellers. I want to know the total number of unique combinations of buyers and sellers. Let's say Abe buys from Brandon in January: that's 1 combination. If Abe buys from Cece in February, that makes 2, but if Abe then buys from Brandon again, it's still 2.
My solution was to use the DENSE_RANK() function:
WITH
combos AS (
SELECT
t.buyerid, t.sellerid,
DENSE_RANK() OVER (ORDER BY t.buyerid, t.sellerid) AS combinations
FROM transactions t
WHERE t.transaction_date < '2018-05-01'
)
SELECT
MAX(combinations) AS total_combinations
FROM combos
This works fine. Each distinct combo gets its own rank, and if you select the MAX of that result, you know the number of unique combos.
However, I want to know this total amount of unique combos on a per month basis. The problem here is that if I group per transaction month, it only counts the unique combos in that month. In the example of Abe, it would be a unique combo in January, and then another combo in the next month, because that's how grouping works in SQL.
Example:
transaction_date buyerid sellerid
2018-01-03 3828 219
2018-01-08 2831 123
2018-02-10 3828 219
The output of DENSE_RANK() named combinations over all these rows is:
transaction_date buyerid sellerid combinations
2018-01-03 3828 219 2
2018-01-08 2831 123 1
2018-02-10 3828 219 2
And therefore, when selecting the MAX of combinations, you know the number of unique buyer/seller combos, which is 2 here.
However, I would like to see a running total of unique combos up until each start of the month, for all months until now. But, when we group on month, it would go like this:
transaction_date buyerid sellerid month combinations
2018-01-03 3828 219 jan 2
2018-01-08 2831 123 jan 1
2018-02-10 3828 219 feb 1
While I actually would want an output like:
month total_combinations_at_month_start
jan 0
feb 2
mar 2
How should I solve this? I've tried to find help on all kinds of window functions, but no luck until now. Thanks!

Here is one method:
WITH combos AS (
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY sellerid, buyerid ORDER BY t.transaction_date) as combo_seqnum,
ROW_NUMBER() OVER (PARTITION BY sellerid, buyerid, date_trunc('month', t.transaction_date) ORDER BY t.transaction_date) as combo_month_seqnum
FROM transactions t
WHERE t.transaction_date < '2018-05-01'
)
SELECT 'Overall' as which, COUNT(*)
FROM combos
WHERE combo_seqnum = 1
UNION ALL
SELECT to_char(transaction_date, 'YYYY-MM'), COUNT(*)
FROM combos
WHERE combo_month_seqnum = 1
GROUP BY to_char(transaction_date, 'YYYY-MM');
This puts the results in separate rows. If you want a cumulative number and the number per month side by side, reuse the combos CTE with:
SELECT to_char(transaction_date, 'YYYY-MM'),
SUM( (combo_month_seqnum = 1)::int ) as uniques_in_month,
SUM(SUM( (combo_seqnum = 1)::int )) OVER (ORDER BY to_char(transaction_date, 'YYYY-MM')) as uniques_through_month
FROM combos
GROUP BY to_char(transaction_date, 'YYYY-MM')
Here is a rextester illustrating the solution.
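For a runnable illustration of the same idea, here is a minimal sketch that loads the three sample rows from the question into an in-memory SQLite database through Python's sqlite3 module. It is not the Postgres query above: it finds each combination's first month with MIN() and counts with a correlated subquery rather than ROW_NUMBER(), and strftime stands in for date_trunc/to_char. The column names uniques_before_month and uniques_through_month are made up for this example.

```python
import sqlite3

# In-memory SQLite database holding the three sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (transaction_date TEXT, buyerid INTEGER, sellerid INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2018-01-03", 3828, 219),
        ("2018-01-08", 2831, 123),
        ("2018-02-10", 3828, 219),
    ],
)

QUERY = """
WITH first_seen AS (
    -- the month in which each buyer/seller combination first appears
    SELECT buyerid, sellerid,
           MIN(strftime('%Y-%m', transaction_date)) AS first_month
    FROM transactions
    GROUP BY buyerid, sellerid
),
months AS (
    -- only months that actually contain transactions
    SELECT DISTINCT strftime('%Y-%m', transaction_date) AS month
    FROM transactions
)
SELECT m.month,
       (SELECT COUNT(*) FROM first_seen f
         WHERE f.first_month <  m.month) AS uniques_before_month,
       (SELECT COUNT(*) FROM first_seen f
         WHERE f.first_month <= m.month) AS uniques_through_month
FROM months m
ORDER BY m.month
"""

monthly_uniques = conn.execute(QUERY).fetchall()
for row in monthly_uniques:
    print(row)
```

The uniques_before_month column corresponds to the "at month start" figure the question asks for. Months without any transactions (March in the question's desired output) do not appear here; a calendar table or generated series of months would be needed for that.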

Related

Unpivoting for large dataset and greater number of unique columns

The PIVOT and UNPIVOT functions in Snowflake are not efficient for turning 30+ unique columns into rows.
Use case: I have 35 different month columns that need to become rows, and another 35 columns that hold the quantity for the corresponding month.
So in the end there will be 2 columns (one for the month and another for the quantity) in place of the 70 unique columns,
and the quantity would be aggregated by month.
But unpivoting is not at all efficient. The query below scans 15 GB of data from the main table:
select part_num ,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year,
sum(quantity) as quantities
from table_name
unpivot(dates for cols in (month_1, 30 other uniue cols)),
unpivot(quantity for cols in (qunatity_1, 30 other uniue cols)),
group by part_num, month_year
Is there any other approach to unpivot a large dataset?
Thanks
Alternative approach could be using conditional aggregation:
with cte as (
select part_num
,concat(date_part(year, dates),'-',date_part(month, dates)) as month_year
,sum(quantity) as quantities
from table_name
group by part_num, month_year
)
SELECT part_num
-- lowest date
,'2020-01' AS "2020-01"
,MAX(IFF(month_year='2020-01', quantities, NULL)) AS "quantities_2020-01"
-- next date
,...
-- last date
,'2022-04' AS "2022-04"
,MAX(IFF(month_year='2022-04', quantities, NULL)) AS "quantities_2022-04"
FROM cte
GROUP BY part_num;
Version using single GROUP BY and TO_VARCHAR with format:
SELECT part_num
-- lowest date
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2020-01','2020-01',NULL)) AS "2020-01"
,SUM(IFF(TO_VARCHAR(dates,'YYYY-MM')='2020-01',quantity,NULL)) AS "quantities_2020-01"
-- next date
,...
-- last date
,MAX(IFF(TO_VARCHAR(dates,'YYYY-MM')='2022-04','2022-04',NULL)) AS "2022-04"
,SUM(IFF(TO_VARCHAR(dates,'YYYY-MM')='2022-04',quantity,NULL)) AS "quantities_2022-04"
FROM table_name
GROUP BY part_num;
So let's take some example data and test whether what is happening is what is wanted.
Here is a trivial and tiny CTE worth of data:
with table_name(part_num, month_1, month_2, month_3, qunatity_1, qunatity_2, qunatity_3) as (
select * from values
(1, '2022-01-01'::date, '2022-02-01'::date, '2022-03-01'::date, 4, 5, 6)
)
Now pointing your SQL at it (after making it compile):
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,sum(quantity) as quantities
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
unpivot(quantity for quan in (qunatity_1, qunatity_2, qunatity_3))
group by part_num, month_year
gives:
PART_NUM MONTH_YEAR QUANTITIES
1 2022-01 15
1 2022-02 15
1 2022-03 15
which is not what I think you are after.
If we look at the unaggregated rows:
PART_NUM MONTH DATES QUAN QUANTITY
1 MONTH_1 2022-01-01 QUNATITY_1 4
1 MONTH_1 2022-01-01 QUNATITY_2 5
1 MONTH_1 2022-01-01 QUNATITY_3 6
1 MONTH_2 2022-02-01 QUNATITY_1 4
1 MONTH_2 2022-02-01 QUNATITY_2 5
1 MONTH_2 2022-02-01 QUNATITY_3 6
1 MONTH_3 2022-03-01 QUNATITY_1 4
1 MONTH_3 2022-03-01 QUNATITY_2 5
1 MONTH_3 2022-03-01 QUNATITY_3 6
We are getting a cross join, which I do not believe is what you want.
My understanding is that you want a one-to-one relationship between month (1-35) and quantity (1-35),
thus a mix like:
PART_NUM MONTH DATES QUAN QUANTITY
1 MONTH_1 2022-01-01 QUNATITY_1 4
1 MONTH_2 2022-02-01 QUNATITY_2 5
1 MONTH_3 2022-03-01 QUNATITY_3 6
Guessed Answer:
My guess at what you really are wanting is:
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,array_construct(qunatity_1, qunatity_2, qunatity_3)[split_part(month,'_',2)::number - 1] as qunatity
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
order by 1,2;
which gives (for the same above CTE data):
PART_NUM MONTH_YEAR QUNATITY
1 2022-01 4
1 2022-02 5
1 2022-03 6
Another way to get that guessed answer:
select
part_num
,to_char(dates, 'yyyy-mm') as month_year
,sum(iff(split_part(month,'_',2)=split_part(q_name,'_',2), q_val, null)) as qunatity
from table_name
unpivot(dates for month in (month_1, month_2, month_3))
unpivot(q_val for q_name in (qunatity_1, qunatity_2, qunatity_3))
group by 1,2
order by 1,2;
This uses the double unpivot, so it might be slow, but it only aggregates the values when the month and quantity suffixes match. That feels almost as gross as building an array just to rip it apart, but the array version does not need large joins, only some per-row grossness.
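To make the pairing of month_N with qunatity_N concrete, here is a minimal sketch that can be run locally with Python's sqlite3 module. SQLite has no UNPIVOT, so this swaps in a different technique, one UNION ALL branch per column pair, and it says nothing about how that would perform in Snowflake. The table and column names (including the qunatity spelling) follow the CTE above; the subquery alias paired is made up for this example.

```python
import sqlite3

# In-memory SQLite table with the same single row as the CTE above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE table_name (
           part_num INTEGER,
           month_1 TEXT, month_2 TEXT, month_3 TEXT,
           qunatity_1 INTEGER, qunatity_2 INTEGER, qunatity_3 INTEGER
       )"""
)
conn.execute(
    "INSERT INTO table_name VALUES "
    "(1, '2022-01-01', '2022-02-01', '2022-03-01', 4, 5, 6)"
)

QUERY = """
SELECT part_num,
       substr(dates, 1, 7) AS month_year,   -- 'YYYY-MM' prefix of the ISO date
       SUM(quantity)       AS quantities
FROM (
    -- one branch per (month_N, qunatity_N) pair, so no cross join can occur
    SELECT part_num, month_1 AS dates, qunatity_1 AS quantity FROM table_name
    UNION ALL
    SELECT part_num, month_2, qunatity_2 FROM table_name
    UNION ALL
    SELECT part_num, month_3, qunatity_3 FROM table_name
) AS paired
GROUP BY part_num, substr(dates, 1, 7)
ORDER BY part_num, month_year
"""

paired_rows = conn.execute(QUERY).fetchall()
for row in paired_rows:
    print(row)
```

Each month ends up with only its own quantity (4, 5, 6) rather than the cross-joined total of 15.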
Assuming your data is already aggregated at part_num level, you could divide and conquer like this
with year_month as
(select a.part_num, b.index+1 as month_num, left(b.value,7) as year_month
from my_table a,table(flatten(input=>array_construct(m1,m2,m3...))) b),
quantities as
(select a.part_num, b.index+1 as month_num, b.value::int as quantity
from my_table a,table(flatten(input=>array_construct(q1,q2,q3...))) b)
select a.part_num, a.year_month, b.quantity
from year_month a
join quantities b on a.part_num=b.part_num and a.month_num=b.month_num

Count distinct customers who bought in previous period and not in next period Bigquery

I have a dataset in bigquery which contains order_date: DATE and customer_id.
order_date | CustomerID
2019-01-01 | 111
2019-02-01 | 112
2020-01-01 | 111
2020-02-01 | 113
2021-01-01 | 115
2021-02-01 | 119
I am trying to count distinct customer_id values between each month of the previous year and the same month of the current year (for example, from 2019-01-01 to 2020-01-01, then from 2019-02-01 to 2020-02-01), and then count those who did not buy in the same period of the next year (2020-01-01 to 2021-01-01, then 2020-02-01 to 2021-02-01).
The output I expect:
order_date| count distinct CustomerID|who did not buy in the next period
2020-01-01| 5191 |250
2020-02-01| 4859 |500
2020-03-01| 3567 |349
..........| .... |......
The next periods shouldn't include the previous ones.
I tried the code below, but it works differently:
with customers as (
select distinct date_trunc(date(order_date),month) as dates,
CUSTOMER_WID
from t
where date(order_date) between '2018-01-01' and current_date()-1
)
select
dates,
customers_previous,
customers_next_period
from
(
select dates,
count(CUSTOMER_WID) as customers_previous,
count(case when customer_wid_next is null then 1 end) as customers_next_period,
from (
select prev.dates,
prev.CUSTOMER_WID,
next.dates as next_dates,
next.CUSTOMER_WID as customer_wid_next
from customers as prev
left join customers
as next on next.dates=date_add(prev.dates,interval 1 year)
and prev.CUSTOMER_WID=next.CUSTOMER_WID
) as t2
group by dates
)
order by 1,2
Thanks in advance.
If I understand correctly, you are trying to count values over a window of time, and for that I recommend using window functions (docs here, and here is a great article explaining how they work).
That said, my recommendation would be:
SELECT DISTINCT
periods,
COUNT(DISTINCT customer_id) OVER last_12_mos AS count_customers_last_12_mos
FROM (
SELECT
order_date,
FORMAT_DATE('%Y%m', order_date) AS periods,
customer_id
FROM dataset
)
WINDOW last_12_mos AS ( # window of last 12 months without current month
PARTITION BY periods ORDER BY periods DESC
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
)
I believe from this you can build some customizations to improve the aggregations you want.
You can generate the periods using unnest(generate_date_array()). Then use joins to bring in the customers from the previous 12 months and the next 12 months. Finally, aggregate and count the customers:
select period,
count(distinct c_prev.customer_wid),
count(distinct c_next.customer_wid)
from unnest(generate_date_array(date '2020-01-01', date '2021-01-01', interval 1 month)) period join
customers c_prev
on c_prev.order_date <= period and
c_prev.order_date > date_add(period, interval -12 month) left join
customers c_next
on c_next.customer_wid = c_prev.customer_wid and
c_next.order_date > period and
c_next.order_date <= date_add(period, interval 12 month)
group by period;
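The join logic can be tried out locally with a minimal sketch like the one below, which uses Python's sqlite3 module rather than BigQuery. Several things here are assumptions for illustration: SQLite's date(period, '-12 months') stands in for date_add, the periods come from a VALUES list instead of generate_date_array, the column is called customer_id, and one extra row ('2020-06-01', 111) is added to the question's sample data so that at least one customer buys again. Since the join-based query counts customers who did buy in the next period, the sketch subtracts that from the previous-period count to get those who did not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (order_date TEXT, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("2019-01-01", 111),
        ("2019-02-01", 112),
        ("2020-01-01", 111),
        ("2020-02-01", 113),
        ("2021-01-01", 115),
        ("2021-02-01", 119),
        ("2020-06-01", 111),  # hypothetical extra row, not in the question
    ],
)

QUERY = """
WITH periods(period) AS (VALUES ('2020-01-01'), ('2020-02-01'))
SELECT p.period,
       COUNT(DISTINCT c_prev.customer_id) AS customers_previous,
       COUNT(DISTINCT c_prev.customer_id)
         - COUNT(DISTINCT c_next.customer_id) AS not_in_next_period
FROM periods p
JOIN customers c_prev
  ON c_prev.order_date <= p.period
 AND c_prev.order_date >  date(p.period, '-12 months')
LEFT JOIN customers c_next
  ON c_next.customer_id = c_prev.customer_id
 AND c_next.order_date >  p.period
 AND c_next.order_date <= date(p.period, '+12 months')
GROUP BY p.period
ORDER BY p.period
"""

retention_rows = conn.execute(QUERY).fetchall()
for row in retention_rows:
    print(row)
```

For both periods, two customers bought in the previous 12 months and one of them (customer 111, thanks to the hypothetical row) bought again in the following 12 months, leaving one who did not.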

I have a table of calls data. I want to count the unique accounts called every day and sum the unique accounts called on a monthly basis

I have a table with 2 columns: one has an account number and the other is the date. The sample data is given below.
Date account
9/8/2020 555
9/8/2020 666
9/8/2020 777
9/8/2020 888
9/9/2020 555
9/9/2020 999
9/10/2020 555
9/10/2020 222
9/10/2020 333
9/11/2020 666
9/11/2020 111
I would like to calculate the number of unique accounts called every day and accumulate it over the month. For example, if account number 555 is called on 8 Sept, 9 Sept and 10 Sept, it should be counted only once in the cumulative sum. The result should look like this:
date Cumulative Unique Accounts Called SO Far this month
9/8/2020 4
9/9/2020 5
9/10/2020 7
9/11/2020 8
Thank you in advance for your help.
You can do this with aggregation and window functions. First, get the first date for each account, then aggregate and accumulate:
select min_date,
count(*) as as_of_date,
sum(count(*)) over (partition by year(min_date), month(min_date)
order by min_date
) as cumulative_unique_count
from (select account, min(date) as min_date
from t
group by account, year(date), month(date)
) t
group by min_date;
You can try the below -
with cte as
(
select date,count(*) as total from
(
select date,account,row_number() over(partition by account order by date) as rn
from tablename
)A where rn=1 group by date
)
select date,sum(total) over(order by date) as cum_sum
from cte
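As a check against the expected output in the question (4, 5, 7, 8), here is a minimal sketch of the "first date per account, then running sum" approach, run in SQLite through Python's sqlite3 module. It assumes SQLite 3.25 or later for window functions, stores the dates as ISO strings so they sort correctly, and uses made-up names (calls, call_date, first_call, new_per_day) that are not in the original posts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_date TEXT, account INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("2020-09-08", 555), ("2020-09-08", 666),
        ("2020-09-08", 777), ("2020-09-08", 888),
        ("2020-09-09", 555), ("2020-09-09", 999),
        ("2020-09-10", 555), ("2020-09-10", 222), ("2020-09-10", 333),
        ("2020-09-11", 666), ("2020-09-11", 111),
    ],
)

QUERY = """
WITH first_call AS (
    -- first day in each month on which an account was called
    SELECT account,
           substr(call_date, 1, 7) AS month,
           MIN(call_date)          AS min_date
    FROM calls
    GROUP BY account, substr(call_date, 1, 7)
),
new_per_day AS (
    -- how many accounts were called for the first time that month, per day
    SELECT month, min_date, COUNT(*) AS new_accounts
    FROM first_call
    GROUP BY month, min_date
)
SELECT min_date,
       new_accounts,
       SUM(new_accounts) OVER (PARTITION BY month ORDER BY min_date)
           AS cumulative_unique_count
FROM new_per_day
ORDER BY min_date
"""

cumulative_rows = conn.execute(QUERY).fetchall()
for row in cumulative_rows:
    print(row)
```

A day on which no new account is called produces no row in this form, because the grouping is on each account's first date; joining back to the list of distinct dates would be needed to show such days.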

How to count no of days per user on a rolling basis in Oracle SQL?

Following on from a previous question, I have a table called orders with information about the time an order was placed and who made that order.
order_timestamp user_id
-------------------- ---------
1-JUN-20 02.56.12 123
3-JUN-20 12.01.01 533
23-JUN-20 08.42.18 123
12-JUN-20 02.53.59 238
19-JUN-20 02.33.72 34
I would like to calculate a daily rolling count of the number of days on which a user made an order in the past 10 days.
For example, in the last 10 days from the 20th June, user 34 made an order on 5 of those days. Then in the last 10 days from the 21st June, user 34 made an order on 6 of those days
In the end the table should be like this:
date user_id no_of_days
----------- --------- ------------
20-JUN-20 34 5
20-JUN-20 123 10
20-JUN-20 533 2
20-JUN-20 238 3
21-JUN-20 34 6
21-JUN-20 123 10
How would the query be written for this kind of analysis?
Please let me know if my question is unclear or more information is required.
Thanks in advance.
You can use window functions for this. Start by getting one row per user per day. And then use a rolling sum:
select day, user_id,
count(*) over (partition by user_id order by day range between interval '10' day preceding and current row) as no_of_days
from (select distinct trunc(order_timestamp) as day, user_id
from t
) t
Assuming that a user places one order a day maximum, you can use window functions as follows:
select
t.*,
count(*) over(partition by user_id order by trunc(order_timestamp) range 10 preceding) no_of_days
from mytable t
Otherwise, you can get the distinct orders per day first:
select
order_day,
user_id,
count(*) over(partition by user_id order by order_day range 10 preceding) no_of_days
from (select distinct trunc(order_timestamp) order_day, user_id from mytable) t
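The "distinct days first, then count within a trailing window" idea can be sketched outside Oracle as well. The example below uses Python's sqlite3 module with a correlated subquery instead of a RANGE window, and every row of sample data is hypothetical. It counts the 10 calendar days ending on the row's own day, inclusive; note that a frame of `range 10 preceding` reaches back 10 days in addition to the current day, so the two are not identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_timestamp TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        # hypothetical sample rows
        ("2020-06-10 08:00:00", 34),
        ("2020-06-10 17:30:00", 34),   # second order on the same day
        ("2020-06-15 09:00:00", 34),
        ("2020-06-21 10:00:00", 34),
        ("2020-06-01 02:56:12", 123),
        ("2020-06-23 08:42:18", 123),
    ],
)

QUERY = """
WITH order_days AS (
    -- one row per user per calendar day with at least one order
    SELECT DISTINCT date(order_timestamp) AS order_day, user_id
    FROM orders
)
SELECT d.order_day,
       d.user_id,
       (SELECT COUNT(*)
          FROM order_days p
         WHERE p.user_id   = d.user_id
           AND p.order_day <= d.order_day
           AND p.order_day >  date(d.order_day, '-10 days')) AS no_of_days
FROM order_days d
ORDER BY d.user_id, d.order_day
"""

rolling_rows = conn.execute(QUERY).fetchall()
for row in rolling_rows:
    print(row)
```

Like the queries above, this only returns rows for days on which the user actually ordered; producing a row for every calendar date, as in the question's desired table, would additionally need a generated list of dates.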

Oracle sql: Order by with GROUP BY ROLLUP

I'm looking everywhere for an answer but nothing seems to compare with my problem. So, using rollup with query:
select year, month, count (sale_id) from sales
group by rollup (year, month);
Will give the result like:
YEAR MONTH TOTAL
2015 1 200
2015 2 415
2015 null 615
2016 1 444
2016 2 423
2016 null 867
null null 1482
And I would like to sort by total desc, but with the year that has the biggest total on top (important: together with all records that belong to that year), followed by the records for the other years. So I would like it to look like:
YEAR MONTH TOTAL
null null 1482
2016 null 867
2016 1 444
2016 2 423
2015 null 615
2015 2 415
2015 1 200
Or something like that. The main purpose is not to "split" the records belonging to one year while sorting by total. Can somebody help me with that?
Try using window function max to get max of total for each year in the order by clause:
select year, month, count(sale_id) total
from sales
group by rollup(year, month)
order by max(total) over (partition by year) desc, total desc;
Hmmm. I think this does what you want:
select year, month, count(sale_id) as cnt
from sales
group by rollup (year, month)
order by sum(count(sale_id)) over (partition by year) desc, year;
Actually, I've never used window functions in an order by with a rollup query. I wouldn't be surprised if a subquery were necessary.
I think you need to use GROUPING SETS and GROUP_ID. These will help you identify a NULL caused by a subtotal. Take a look at the doc: https://docs.oracle.com/cd/B19306_01/server.102/b14223/aggreg.htm
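The ordering idea from the first answer (sort by each year's largest total, then by total) can be tried with a minimal sketch in Python's sqlite3 module. SQLite has no ROLLUP, so the subtotal rows are emulated here with UNION ALL, and the window function is computed in a subquery first. The sale counts are hypothetical and much smaller than in the question, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER)"
)
# Hypothetical number of sales per (year, month).
counts = {(2015, 1): 2, (2015, 2): 4, (2016, 1): 5, (2016, 2): 3}
for (year, month), n in counts.items():
    conn.executemany(
        "INSERT INTO sales (year, month) VALUES (?, ?)", [(year, month)] * n
    )

QUERY = """
WITH rolled AS (
    -- emulation of GROUP BY ROLLUP (year, month)
    SELECT year, month, COUNT(sale_id) AS total FROM sales GROUP BY year, month
    UNION ALL
    SELECT year, NULL, COUNT(sale_id) FROM sales GROUP BY year
    UNION ALL
    SELECT NULL, NULL, COUNT(sale_id) FROM sales
),
ranked AS (
    -- the largest total within a year is that year's subtotal row
    SELECT year, month, total,
           MAX(total) OVER (PARTITION BY year) AS year_total
    FROM rolled
)
SELECT year, month, total
FROM ranked
ORDER BY year_total DESC, total DESC
"""

ordered_rows = conn.execute(QUERY).fetchall()
for row in ordered_rows:
    print(row)
```

The grand-total row has a NULL year, forms its own partition with the largest value, and therefore sorts first; each year's subtotal then leads that year's months.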