Unique cumulative customers by each day - sql

Task: Get the total number of unique cumulative customers by each decline reason and by each day.
Input data sample:
+---------+--------------+------------+------+
| Cust_Id | Decline_Dt | Reason | Days |
+---------+--------------+------------+------+
| A | 08-09-2020 | Reason_1 | 0 |
| A | 08-09-2020 | Reason_1 | 1 |
| A | 08-09-2020 | Reason_1 | 2 |
| A | 08-09-2020 | Reason_1 | 4 |
| B | 08-09-2020 | Reason_1 | 0 |
| B | 08-09-2020 | Reason_1 | 2 |
| B | 08-09-2020 | Reason_1 | 3 |
| C | 08-09-2020 | Reason_1 | 1 |
+---------+--------------+------------+------+
1) Decline_dt - The date on which the payment was declined. (Ignore it for this task)
2) Days - The number of days after the payment decline on which the customer interacted with the IVR channel.
3) Reason - Indicates the payment decline reason
--Expected Output:
+---------------+-----------+---------------+----------------------------+
| Reason | Days | Unique_mtns | total_cumulative_customers |
+---------------+-----------+---------------+----------------------------+
| Reason_1 | 0 | 2 | 2 |
| Reason_1 | 1 | 2 | 3 |
| Reason_1 | 2 | 2 | 3 |
| Reason_1 | 3 | 1 | 3 |
| Reason_1 | 4 | 1 | 3 |
+---------------+-----------+---------------+----------------------------+
My Hive query:
select a.Reason
, a.days
-- , count(distinct a.cust_id) as unique_mtns
, count(distinct a.cust_id) over (partition by Reason
order by a.days rows between unbounded preceding and current row)
as total_cumulative_customers
from table as a
group by a.reason
, a.days
Output (Incorrect):
+---------------+-----------+----------------------------+
| Reason | Days | total_cumulative_customers |
+---------------+-----------+----------------------------+
| Reason_1 | 0 | 2 |
| Reason_1 | 1 | 2 |
| Reason_1 | 2 | 2 |
| Reason_1 | 3 | 1 |
| Reason_1 | 4 | 1 |
+---------------+-----------+----------------------------+
Ideally, I would expect the window function to be executed without group by.
However, I get an error without group by. When I use group by, I don't get the cumulative customers.

If I follow you correctly, you can use a subquery to compute the first day per customer/reason tuple, and then do conditional aggregation:
select reason, days,
count(distinct cust_id) as unique_mtns,
sum(sum(case when days = min_days then 1 else 0 end))
over(partition by reason order by days) as total_cumulative_customers
from (
select reason, cust_id, days,
min(days) over(partition by reason, cust_id) as min_days
from mytable
) t
group by reason, days

I would recommend using row_number() to enumerate the rows for a given customer and reason. Your code uses count(distinct) on the customer id, suggesting that you might have duplicates on a given day.
This would be:
select reason, days, count(distinct cust_id) as unique_mtns,
sum(sum(case when seqnum = 1 then 1 else 0 end)) over (partition by reason order by days) as total_cumulative_customers
from (select t.*,
row_number() over (partition by reason, cust_id order by days) as seqnum
from t
) t
group by reason, days
order by reason, days;
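As a quick check of the "count each customer only on their first day" idea, here is a minimal sketch that runs against the sample data from the question. It assumes SQLite 3.25+ (via Python's sqlite3 module) rather than Hive, a table named t, and a hypothetical helper cumulative_customers; it is a variant of the answers above, not the exact queries. It uses count(distinct ...) for the new customers so that duplicate rows on a customer's first day are not counted twice.

```python
import sqlite3

# Sample rows from the question: (cust_id, reason, days). Decline_Dt is left out.
ROWS = [
    ("A", "Reason_1", 0), ("A", "Reason_1", 1), ("A", "Reason_1", 2),
    ("A", "Reason_1", 4), ("B", "Reason_1", 0), ("B", "Reason_1", 2),
    ("B", "Reason_1", 3), ("C", "Reason_1", 1),
]

# Inner query: per (reason, days), count distinct customers and the distinct
# customers whose first day is this day. Outer query: running sum of the latter.
QUERY = """
select d.reason, d.days, d.unique_mtns,
       sum(d.new_customers) over (partition by d.reason order by d.days)
           as total_cumulative_customers
from (select t.reason, t.days,
             count(distinct t.cust_id) as unique_mtns,
             count(distinct case when t.days = f.first_day
                                 then t.cust_id end) as new_customers
      from t
      join (select reason, cust_id, min(days) as first_day
            from t
            group by reason, cust_id) f
        on f.reason = t.reason and f.cust_id = t.cust_id
      group by t.reason, t.days) d
order by d.reason, d.days
"""


def cumulative_customers(rows):
    """Load rows into an in-memory table t and run the query (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (cust_id text, reason text, days integer)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in cumulative_customers(ROWS):
        print(row)
```

Each returned tuple is (reason, days, unique_mtns, total_cumulative_customers), in the same column order as the expected output in the question.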

Related

SQL Select until Quantity Met

I need a SQL query that returns something like this.
My data:
| id | Purchase Number | Qty |
|:---- |:------:| -----:|
| 1 | A | 3 |
| 2 | B | 2 |
| 3 | C | 4 |
For example, I need to take a total Qty of 6.
I want the result to look like this:
| id | Purchase Number | Qty |
|:---- |:------:| -----:|
| 1 | A | 3 |
| 2 | B | 2 |
| 3 | C | 1 |
I've read similar threads but can't find what I need.
You can use a cumulative sum:
select id, purchase_number,
(case when running_qty < 6 then qty
else 6 - (running_qty - qty)
end) as qty
from (select t.*,
sum(qty) over (order by id) as running_qty
from t
) t
where running_qty - qty < 6;
Here is a db<>fiddle (which uses Postgres but this is standard SQL).
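To see the running-total logic on the sample data, here is a small sketch using SQLite 3.25+ through Python's sqlite3 module. The table name t, the column name taken_qty, the :needed parameter, and the helper take_until are assumptions made for illustration; the query itself follows the cumulative-sum answer above.

```python
import sqlite3

# :needed is the quantity to take. A row is kept while the running total
# *before* it is still below :needed; the last kept row is trimmed to fit.
QUERY = """
select id, purchase_number,
       case when running_qty < :needed then qty
            else :needed - (running_qty - qty)
       end as taken_qty
from (select t.*, sum(qty) over (order by id) as running_qty
      from t) t
where running_qty - qty < :needed
order by id
"""


def take_until(rows, needed):
    """Return (id, purchase_number, taken_qty) rows covering `needed` units."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id integer, purchase_number text, qty integer)")
    con.executemany("insert into t values (?, ?, ?)", rows)
    result = con.execute(QUERY, {"needed": needed}).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    data = [(1, "A", 3), (2, "B", 2), (3, "C", 4)]
    print(take_until(data, 6))
```

With the question's data and a needed quantity of 6, the running totals are 3, 5 and 9, so the third row is kept but reduced to 6 - (9 - 4) = 1.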

SQL - Calculate number of occurrences of previous day?

For each day, I want to count the number of people who also had an occurrence on the previous day, but I'm not sure how to do this.
Sample Table:
| ID | Date |
+----+-----------+
| 1 | 1/10/2020 |
| 1 | 1/11/2020 |
| 2 | 2/20/2020 |
| 3 | 2/20/2020 |
| 3 | 2/21/2020 |
| 4 | 2/23/2020 |
| 4 | 2/24/2020 |
| 5 | 2/22/2020 |
| 5 | 2/23/2020 |
| 5 | 2/24/2020 |
+----+-----------+
Desired Output:
| Date | Count |
+-----------+-------+
| 1/11/2020 | 1 |
| 2/21/2020 | 1 |
| 2/23/2020 | 1 |
| 2/24/2020 | 2 |
+-----------+-------+
Edit: Added desired output. The output count should be unique to the ID, not the number of date occurrences. i.e. an ID 5 can appear on this list 10 times for dates 2/23/2020 and 2/24/2020, but that would count as "1".
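Use lag(), deduplicating the (id, date) pairs first so that repeated rows for the same ID and date are counted once:
select date, count(*)
from (select t.*, lag(date) over (partition by id order by date) as prev_date
from (select distinct id, date from t) t
) t
where prev_date = dateadd(day, -1, date)
group by date;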
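The lag() approach can be tried out with the sketch below. It is an adaptation, not the exact query above: it assumes SQLite 3.25+ (via Python's sqlite3), which has no dateadd(), so dates are stored as ISO strings in a column named d and the previous day is computed with date(d, '-1 day'). The table name t and the helper consecutive_day_counts are also assumptions.

```python
import sqlite3

# Sample rows from the question as (id, ISO date), plus one repeated row for
# ID 5 to show that duplicates on the same date are counted once.
ROWS = [
    (1, "2020-01-10"), (1, "2020-01-11"),
    (2, "2020-02-20"),
    (3, "2020-02-20"), (3, "2020-02-21"),
    (4, "2020-02-23"), (4, "2020-02-24"),
    (5, "2020-02-22"), (5, "2020-02-23"), (5, "2020-02-24"),
    (5, "2020-02-24"),
]

# Deduplicate (id, d), look up each ID's previous date, and keep the rows
# where that previous date is exactly one day earlier.
QUERY = """
select d, count(*) as cnt
from (select id, d, lag(d) over (partition by id order by d) as prev_d
      from (select distinct id, d from t) u) x
where prev_d = date(d, '-1 day')
group by d
order by d
"""


def consecutive_day_counts(rows):
    """Return (date, number of IDs also present on the previous day)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id integer, d text)")
    con.executemany("insert into t values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in consecutive_day_counts(ROWS):
        print(row)
```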

SQL group by changing column

Suppose I have a table sorted by date as so:
+-------------+--------+
| DATE | VALUE |
+-------------+--------+
| 01-09-2020 | 5 |
| 01-15-2020 | 5 |
| 01-17-2020 | 5 |
| 02-03-2020 | 8 |
| 02-13-2020 | 8 |
| 02-20-2020 | 8 |
| 02-23-2020 | 5 |
| 02-25-2020 | 5 |
| 02-28-2020 | 3 |
| 03-13-2020 | 3 |
| 03-18-2020 | 3 |
+-------------+--------+
I want to group by changes in value within that given date range, and add a value that increments each time as an added column to denote that.
I have tried a number of different things, such as using the lag function:
SELECT value, value - lag(value) over (order by date) as count
GROUP BY value
In short, I want to take the table above and have it look like:
+-------------+--------+-------+
| DATE | VALUE | COUNT |
+-------------+--------+-------+
| 01-09-2020 | 5 | 1 |
| 01-15-2020 | 5 | 1 |
| 01-17-2020 | 5 | 1 |
| 02-03-2020 | 8 | 2 |
| 02-13-2020 | 8 | 2 |
| 02-20-2020 | 8 | 2 |
| 02-23-2020 | 5 | 3 |
| 02-25-2020 | 5 | 3 |
| 02-28-2020 | 3 | 4 |
| 03-13-2020 | 3 | 4 |
| 03-18-2020 | 3 | 4 |
+-------------+--------+-------+
I want to eventually have it all in one small table with the earliest date for each.
+-------------+--------+-------+
| DATE | VALUE | COUNT |
+-------------+--------+-------+
| 01-09-2020 | 5 | 1 |
| 02-03-2020 | 8 | 2 |
| 02-23-2020 | 5 | 3 |
| 02-28-2020 | 3 | 4 |
+-------------+--------+-------+
Any help would be very appreciated
You can use a combination of the Row_number and Dense_rank functions to get the required results, like below. The difference between the two row numbers stays constant within each run of equal values, so it identifies the run:
;with cte
as
(
select t.DATE,t.VALUE
,Row_number() over(order by t.DATE)
- Row_number() over(partition by t.VALUE order by t.DATE) as grp
from table t
)
Select min(DATE) as DATE,VALUE
,Dense_rank() over(order by min(DATE)) as count
from cte
group by VALUE,grp
You can use lag(), a cumulative sum, and a subquery:
SELECT date, value,
SUM(CASE WHEN prev_value = value THEN 0 ELSE 1 END) OVER (ORDER BY date) AS count
FROM (SELECT t.*, LAG(value) OVER (ORDER BY date) as prev_value
FROM t
) t
Here is a db<>fiddle.
You can apply the lag() and then the row_number() analytic functions in succession, filtering on the inequality between each value and its predecessor:
WITH t2 AS
(
SELECT LAG(value,1,value-1) OVER (ORDER BY date) as lg,
t.*
FROM t
)
SELECT t2.date,t2.value, ROW_NUMBER() OVER (ORDER BY t2.date) as count
FROM t2
WHERE value - lg != 0
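For the final "one small table" the question asks for, the lag-plus-cumulative-sum idea can be extended with a GROUP BY, as in this sketch. It assumes SQLite 3.25+ through Python's sqlite3, ISO date strings in a column named d, a table named t, and a hypothetical helper first_date_per_run; none of these names come from the original thread.

```python
import sqlite3

# Sample rows from the question as (ISO date, value).
ROWS = [
    ("2020-01-09", 5), ("2020-01-15", 5), ("2020-01-17", 5),
    ("2020-02-03", 8), ("2020-02-13", 8), ("2020-02-20", 8),
    ("2020-02-23", 5), ("2020-02-25", 5),
    ("2020-02-28", 3), ("2020-03-13", 3), ("2020-03-18", 3),
]

# Step a: previous value per row. Step b: running count of value changes,
# which numbers each run. Outer query: earliest date of each run.
QUERY = """
select min(d) as first_date, value, grp
from (select d, value,
             sum(case when prev_value = value then 0 else 1 end)
                 over (order by d) as grp
      from (select d, value, lag(value) over (order by d) as prev_value
            from t) a) b
group by grp, value
order by grp
"""


def first_date_per_run(rows):
    """Return (first_date, value, run_number) for each run of equal values."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (d text, value integer)")
    con.executemany("insert into t values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in first_date_per_run(ROWS):
        print(row)
```

The first row has no previous value, so the comparison is NULL and the CASE falls through to 1, which starts the numbering at 1.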

SQL calculating sum and number of distinct values within group

I want to calculate
(1) total sales amount
(2) number of distinct stores per product
in one query, if possible. Suppose we have data:
+-----------+---------+-------+--------+
| store | product | month | amount |
+-----------+---------+-------+--------+
| Anthill | A | 1 | 1 |
| Anthill | A | 2 | 1 |
| Anthill | A | 3 | 1 |
| Beetle | A | 1 | 1 |
| Beetle | A | 3 | 1 |
| Cockroach | A | 1 | 1 |
| Cockroach | A | 2 | 1 |
| Cockroach | A | 3 | 1 |
| Anthill | B | 1 | 1 |
| Beetle | B | 2 | 1 |
| Cockroach | B | 3 | 1 |
+-----------+---------+-------+--------+
I have tried this with no luck:
select
[product]
,[month]
,[amount]
,cnt_distinct_stores = count(distinct(stores))
from dbo.temp
group by
[product]
,[month]
order by 1,2
Would it be possible to combine a GROUP BY clause with window functions such as SUM(amount) OVER(partition by [product],[month] ORDER BY [month] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)?
Try
SELECT product,
SUM(amount),
COUNT(DISTINCT store)
FROM dbo.temp
GROUP BY product
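A short sketch of this aggregation on the sample data, using SQLite through Python's sqlite3 module. The table name sales (instead of dbo.temp), the column aliases, and the helper sales_per_product are assumptions for illustration only.

```python
import sqlite3

# Sample rows from the question: (store, product, month, amount).
ROWS = [
    ("Anthill", "A", 1, 1), ("Anthill", "A", 2, 1), ("Anthill", "A", 3, 1),
    ("Beetle", "A", 1, 1), ("Beetle", "A", 3, 1),
    ("Cockroach", "A", 1, 1), ("Cockroach", "A", 2, 1), ("Cockroach", "A", 3, 1),
    ("Anthill", "B", 1, 1), ("Beetle", "B", 2, 1), ("Cockroach", "B", 3, 1),
]

# One plain GROUP BY gives both the total and the distinct store count.
QUERY = """
select product,
       sum(amount) as total_amount,
       count(distinct store) as cnt_distinct_stores
from sales
group by product
order by product
"""


def sales_per_product(rows):
    """Return (product, total_amount, cnt_distinct_stores) per product."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table sales (store text, product text, month integer, amount integer)"
    )
    con.executemany("insert into sales values (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(sales_per_product(ROWS))
```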

Aggregation by positive/negative values v.2

I've posted several topics and every query had some problems :( Changed table and examples for better understanding
I have a table called PROD_COST with 5 fields
(ID,Duration,Cost,COST_NEXT,COST_CHANGE).
I need extra field called "groups" for aggregation.
Duration = number of days the price is valid (1 day=1row).
Cost = product price in this day.
Cost_next = lead(cost,1,0).
Cost_change = Cost_next - Cost.
example:
+----+---------+------+-------------+-------+
|ID |Duration | Cost | Cost_change | Groups|
+----+---------+------+-------------+-------+
| 1 | 1 | 10 | -1.5 | 1 |
| 2 | 1 | 8.5 | 3.7 | 2 |
| 3 | 1 | 12.2 | 0 | 2 |
| 4 | 1 | 12.2 | -2.2 | 3 |
| 5 | 1 | 10 | 0 | 3 |
| 6 | 1 | 10 | 3.2 | 4 |
| 7 | 1 | 13.2 | -2.7 | 5 |
| 8 | 1 | 10.5 | -1.5 | 5 |
| 9 | 1 | 9 | 0 | 5 |
| 10 | 1 | 9 | 0 | 5 |
| 11 | 1 | 9 | -1 | 5 |
| 12 | 1 | 8 | 1.5 | 6 |
+----+---------+------+-------------+-------+
Now I need to assign the groups (the "Groups" field) based on Cost_change, which can be positive, negative or 0.
Some kind guy advised me this query:
select id, COST_CHANGE, sum(GRP) over (order by id asc) +1
from
(
select *, case when sign(COST_CHANGE) != sign(isnull(lag(COST_CHANGE)
over (order by id asc),COST_CHANGE)) and Cost_change!=0 then 1 else 0 end as GRP
from PROD_COST
) X
But there is a problem: if there are 0 values between two positive or two negative values, then it puts them in separate groups, for example:
+-------------+--------+
| Cost_change | Groups |
+-------------+--------+
| 9.262 | 5777 |
| -9.262 | 5778 |
| 9.262 | 5779 |
| 0.000 | 5779 |
| 9.608 | 5780 |
| -11.231 | 5781 |
| 10.000 | 5782 |
+-------------+--------+
I need to have:
+-------------+--------+
| Cost_change | Groups |
+-------------+--------+
| 9.262 | 5777 |
| -9.262 | 5778 |
| 9.262 | 5779 |
| 0.000 | 5779 |
| 9.608 | 5779 | -- Here
| -11.231 | 5780 |
| 10.000 | 5781 |
+-------------+--------+
In other words, if there are 0 values between two positive or two negative values, then they should all be in one group, because the sequence MINUS-0-0-MINUS contains no sign change. But if I had MINUS-0-0-PLUS, then GROUPS should be 1-1-1-2, because the sign changes from negative to positive.
Thank you for attention!
I'm Using Sql Server 2012
I think the best approach is to remove the zeros, do the calculation, and then re-insert them. So:
with pcg as (
select pc.*, min(id) over (partition by grp, sign(cost_change)) as grpid
from (select pc.*,
(row_number() over (order by id) -
row_number() over (partition by sign(cost_change)
order by id
)
) as grp
from prod_cost pc
where cost_change <> 0
) pc
)
select pc.*, max(pcg.groups) over (order by pc.id) as groups
from prod_cost pc left join
(select pcg.*, dense_rank() over (order by grpid) as groups
from pcg
) pcg
on pc.id = pcg.id;
The CTE assigns a group identifier based on the lowest id in the group, where the groups are bounded by actual sign changes. The subquery turns this into a number. The outer query then accumulates the maximum value, to give a value to the 0 records.
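The "remove the zeros, number the runs, then fill the zeros back in" idea can be checked with the sketch below. It is a simplified variant, not the query above: it assumes SQLite 3.25+ via Python's sqlite3, detects sign changes with lag() instead of a row_number() difference, and uses a CASE expression in place of sign(), which is not available in every SQLite build. The names nz, marked, numbered, grp_filled and sign_groups are made up for illustration.

```python
import sqlite3

# nz: non-zero rows with their sign. marked: 1 where the sign differs from the
# previous non-zero row. numbered: running count of those marks = group number.
# Final select: rows with cost_change = 0 inherit the latest group via a running max.
QUERY = """
with nz as (
    select id, case when cost_change > 0 then 1 else -1 end as sgn
    from prod_cost
    where cost_change <> 0
),
marked as (
    select id,
           case when sgn = lag(sgn) over (order by id) then 0 else 1 end
               as is_start
    from nz
),
numbered as (
    select id, sum(is_start) over (order by id) as grp
    from marked
)
select pc.id, pc.cost_change,
       max(n.grp) over (order by pc.id) as grp_filled
from prod_cost pc
left join numbered n on n.id = pc.id
order by pc.id
"""


def sign_groups(cost_changes):
    """Return the group number for each cost_change, in input order."""
    con = sqlite3.connect(":memory:")
    con.execute("create table prod_cost (id integer, cost_change real)")
    con.executemany(
        "insert into prod_cost values (?, ?)",
        list(enumerate(cost_changes, start=1)),
    )
    groups = [row[2] for row in con.execute(QUERY)]
    con.close()
    return groups


if __name__ == "__main__":
    print(sign_groups([9.262, -9.262, 9.262, 0.0, 9.608, -11.231, 10.0]))
```

Zero rows that come before the first non-zero row have no earlier group to inherit, so they get NULL (None in Python) in this sketch.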