Find customer counts and other metrics based on purchase frequency (new & repeat) - SQL

I am trying to find the customer count and sales by the type of customer (New vs. Returning) and the number of times they have purchased.

txn_date  | Customer_ID | Transaction_Number | Sales | Reference (not in the SQL table) | Customer type (not in the SQL table)
----------|-------------|--------------------|-------|----------------------------------|-------------------------------------
1/2/2019  | 1           | 12345              | $10   | Second Purchase SLS              | Repeat
4/3/2018  | 1           | 65890              | $20   | First Purchase SLS               | Repeat
3/22/2019 | 3           | 64453              | $30   | First Purchase SLS               | new
4/3/2019  | 4           | 88567              | $20   | First Purchase SLS               | new
5/21/2019 | 4           | 85446              | $15   | Second Purchase SLS              | new
1/23/2018 | 5           | 89464              | $40   | First Purchase SLS               | Repeat
4/3/2019  | 5           | 99674              | $30   | Second Purchase SLS              | Repeat
4/3/2019  | 6           | 32224              | $20   | Second Purchase SLS              | Repeat
1/23/2018 | 6           | 46466              | $30   | First Purchase SLS               | Repeat
1/20/2018 | 7           | 56558              | $30   | First Purchase SLS               | new
I am using the below code to get the aggregate sales and customer count for the total customers:

select seqnum, count(distinct customer_id), sum(sales)
from (
    select co.*,
           row_number() over (partition by customer_id order by txn_date) as seqnum
    from somya co
)
group by seqnum
order by seqnum;
I want to get the same data by customer type. For example, for the new customers my result should show:

New Customers | Customer_Count | Sum(Sales)
--------------|----------------|-----------
1st Purchase  | 3              | $80
2nd Purchase  | 1              | $15

Returning Customers | Customer_Count | Sum(Sales)
--------------------|----------------|-----------
1st Purchase        | 3              | $90
2nd Purchase        | 3              | $60
I am trying the below queries to get the data for new and repeat customers.

New Customers:

select seqnum, count(distinct customer_id), sum(sales)
from (
    select co.*,
           row_number() over (partition by customer_id order by trunc(txn_date)) as seqnum,
           MIN(TRUNC(TXN_DATE)) OVER (PARTITION BY customer_id) as MIN_TXN_DATE
    from somya co
)
where MIN_TXN_DATE between '01-JAN-19' and '31-DEC-19'
group by seqnum
order by seqnum asc;
Returning Customers:

select seqnum, count(distinct customer_id), sum(sales)
from (
    select co.*,
           row_number() over (partition by customer_id order by trunc(txn_date)) as seqnum,
           MIN(TRUNC(TXN_DATE)) OVER (PARTITION BY customer_id) as MIN_TXN_DATE
    from somya co
)
where MIN_TXN_DATE < '01-JAN-19'
group by seqnum
order by seqnum asc;
I am not able to figure out what is wrong with my query, or whether there is a problem with my logic.
This is just sample data; I have transactions from all years in my database, so I need to narrow the transaction date in the query. But as soon as I narrow down the data using the transaction date, the repeat-customer query doesn't return anything, and the new-customer query gives me the total customers for that period.

If I understand correctly, you need to know the first time someone becomes a customer, and then use that to classify each transaction:

select (case when first_year < 2019 then 'returning' else 'new' end) as custtype,
       seqnum, count(*), sum(sales)
from (select co.*,
             row_number() over (partition by customer_id, extract(year from txn_date) order by txn_date) as seqnum,
             min(extract(year from txn_date)) over (partition by customer_id) as first_year
      from somya co
     ) s
where txn_date >= date '2019-01-01' and
      txn_date < date '2020-01-01'
group by (case when first_year < 2019 then 'returning' else 'new' end),
         seqnum
order by custtype, seqnum;

You can categorize your sales data to assign a customer type and a purchase sequence using windowing functions, like this:
SELECT sd.txn_date,
       sd.customer_id,
       sd.transaction_number,
       sd.sales,
       case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01'
             AND max(txn_date) over ( partition by customer_id ) >= DATE '2019-01-01'
            THEN 'Repeat'
            ELSE 'New' END customer_type,
       row_number() over ( partition by customer_id order by txn_date ) purchase_sequence
FROM sales_data sd
+-----------+-------------+--------------------+-------+---------------+-------------------+
| TXN_DATE  | CUSTOMER_ID | TRANSACTION_NUMBER | SALES | CUSTOMER_TYPE | PURCHASE_SEQUENCE |
+-----------+-------------+--------------------+-------+---------------+-------------------+
| 03-APR-18 | 1           | 65890              | 20    | Repeat        | 1                 |
| 02-JAN-19 | 1           | 12345              | 10    | Repeat        | 2                 |
| 22-MAR-19 | 3           | 64453              | 30    | New           | 1                 |
| 03-APR-19 | 4           | 88567              | 20    | New           | 1                 |
| 21-MAY-19 | 4           | 85446              | 15    | New           | 2                 |
| 23-JAN-18 | 5           | 89464              | 40    | Repeat        | 1                 |
| 03-APR-19 | 5           | 99674              | 30    | Repeat        | 2                 |
| 23-JAN-18 | 6           | 46466              | 30    | Repeat        | 1                 |
| 03-APR-19 | 6           | 32224              | 20    | Repeat        | 2                 |
| 20-JAN-18 | 7           | 56558              | 30    | New           | 1                 |
+-----------+-------------+--------------------+-------+---------------+-------------------+
Then, you can wrap that in a common table expression (aka "WITH" clause) and summarize by the customer type and purchase sequence:
WITH categorized_sales_data AS (
    SELECT sd.txn_date,
           sd.customer_id,
           sd.transaction_number,
           sd.sales,
           case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01'
                 AND max(txn_date) over ( partition by customer_id ) >= DATE '2019-01-01'
                THEN 'Repeat' ELSE 'New' END customer_type,
           row_number() over ( partition by customer_id order by txn_date ) purchase_sequence
    FROM sales_data sd
)
SELECT customer_type, purchase_sequence, count(*), sum(sales)
FROM categorized_sales_data
group by customer_type, purchase_sequence
order by customer_type, purchase_sequence
+---------------+-------------------+----------+------------+
| CUSTOMER_TYPE | PURCHASE_SEQUENCE | COUNT(*) | SUM(SALES) |
+---------------+-------------------+----------+------------+
| New           | 1                 | 3        | 80         |
| New           | 2                 | 1        | 15         |
| Repeat        | 1                 | 3        | 90         |
| Repeat        | 2                 | 3        | 60         |
+---------------+-------------------+----------+------------+
Here's the full SQL with test data:

with sales_data (txn_date, customer_id, transaction_number, sales) as (
    SELECT TO_DATE('1/2/2019','MM/DD/YYYY'), 1, 12345, 10 FROM DUAL UNION ALL
    SELECT TO_DATE('4/3/2018','MM/DD/YYYY'), 1, 65890, 20 FROM DUAL UNION ALL
    SELECT TO_DATE('3/22/2019','MM/DD/YYYY'), 3, 64453, 30 FROM DUAL UNION ALL
    SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 4, 88567, 20 FROM DUAL UNION ALL
    SELECT TO_DATE('5/21/2019','MM/DD/YYYY'), 4, 85446, 15 FROM DUAL UNION ALL
    SELECT TO_DATE('1/23/2018','MM/DD/YYYY'), 5, 89464, 40 FROM DUAL UNION ALL
    SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 5, 99674, 30 FROM DUAL UNION ALL
    SELECT TO_DATE('4/3/2019','MM/DD/YYYY'), 6, 32224, 20 FROM DUAL UNION ALL
    SELECT TO_DATE('1/23/2018','MM/DD/YYYY'), 6, 46466, 30 FROM DUAL UNION ALL
    SELECT TO_DATE('1/20/2018','MM/DD/YYYY'), 7, 56558, 30 FROM DUAL
),
-- Query starts here
/* WITH */ categorized_sales_data AS (
    SELECT sd.txn_date,
           sd.customer_id,
           sd.transaction_number,
           sd.sales,
           case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01'
                 AND max(txn_date) over ( partition by customer_id ) >= DATE '2019-01-01'
                THEN 'Repeat' ELSE 'New' END customer_type,
           row_number() over ( partition by customer_id order by txn_date ) purchase_sequence
    FROM sales_data sd
)
SELECT customer_type, purchase_sequence, count(*), sum(sales)
FROM categorized_sales_data
group by customer_type, purchase_sequence
order by customer_type, purchase_sequence
Response to comment from OP:
"all the customers whose first purchase date is in 2019 would be a new customer. Any customer who has transacted in 2019 but their first purchase date is before 2019 would be a repeat customer"
So, change

case when min(txn_date) over ( partition by customer_id ) < DATE '2019-01-01'
      AND max(txn_date) over ( partition by customer_id ) >= DATE '2019-01-01'
     THEN 'Repeat'
     ELSE 'New' END customer_type

to

case when min(txn_date) over ( partition by customer_id )
          BETWEEN DATE '2019-01-01' AND DATE '2020-01-01' - INTERVAL '1' SECOND
     THEN 'New'
     ELSE 'Repeat' END customer_type
i.e., if and only if a customer's first purchase was in 2019 then they are "new".
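For reference, here is a minimal sketch of the full summary with that revised CASE swapped in (same sales_data test data and CTE shape as above):

WITH categorized_sales_data AS (
    SELECT sd.txn_date,
           sd.customer_id,
           sd.transaction_number,
           sd.sales,
           -- 'New' if and only if the customer's first purchase falls in 2019
           case when min(txn_date) over ( partition by customer_id )
                     BETWEEN DATE '2019-01-01' AND DATE '2020-01-01' - INTERVAL '1' SECOND
                THEN 'New' ELSE 'Repeat' END customer_type,
           row_number() over ( partition by customer_id order by txn_date ) purchase_sequence
    FROM sales_data sd
)
SELECT customer_type, purchase_sequence, count(*), sum(sales)
FROM categorized_sales_data
group by customer_type, purchase_sequence
order by customer_type, purchase_sequence

Under this definition, customer 7 (a single purchase on 20-JAN-2018) counts as 'Repeat' rather than 'New', since their first purchase was not in 2019.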

Related

SQL/DB2 getting single row of results per employee with a UNION

I'm currently using a UNION on 2 select statements, and while I'm getting the correct data, it's not exactly what I need when it comes time to use it in a front-end view.
I'm currently using this query:
SELECT T.employee as employee,
       'Orders' as TYPE,
       SUM(CASE WHEN t.order_count < QUANT THEN t.order_count ELSE QUANT END) as DATA
FROM schemaOne.order_list T
WHERE t.order_date > CURRENT_DATE - 35 DAYS
GROUP BY t.employee
UNION
SELECT T.employee as employee,
       'Sales' as TYPE,
       sum(price * quant) as DATA
FROM schemaOne.sales T
WHERE T.sales_date > CURRENT_DATE - 35 DAYS
GROUP BY T.employee
ORDER BY data desc;
with these dummy tables as examples, getting the following result:

order_list
employee | order_count | quant | order_date
---------|-------------|-------|-----------
123      | 5           | 1     | 2022-03-02
456      | 1           | 5     | 2022-03-02

sales
employee | price | quant | order_date
---------|-------|-------|-----------
123      | 500   | 1     | 2022-03-02
456      | 1000  | 1     | 2022-03-02

Result
employee | type   | data
---------|--------|-----
123      | Orders | 1
123      | Sales  | 500
456      | Orders | 5
456      | Sales  | 1000
Is there a way to keep using a UNION but alter it so that I get a single row for each employee, dropping the type/data columns and instead putting each piece of data in its own column (the type becomes the column name), like so:
Desired Result
employee | Orders | Sales
---------|--------|------
123      | 1      | 500
456      | 5      | 1000
Try adding an outer query:

select employee,
       MAX(case when type = 'Orders' then data end) as orders,
       MAX(case when type = 'Sales' then data end) as sales
from (
    SELECT T.employee as employee,
           'Orders' as TYPE,
           SUM(CASE WHEN t.order_count < QUANT THEN t.order_count ELSE QUANT END) as DATA
    FROM schemaOne.order_list T
    WHERE t.order_date > CURRENT_DATE - 35 DAYS
    GROUP BY t.employee
    UNION
    SELECT T.employee as employee,
           'Sales' as TYPE,
           sum(price * quant) as DATA
    FROM schemaOne.sales T
    WHERE T.sales_date > CURRENT_DATE - 35 DAYS
    GROUP BY T.employee
) as t1
GROUP BY employee;
Note that I removed the order by data desc; it has no effect inside the union.
You can also join the tables through their employee columns, such as:
SELECT o.employee,
SUM(CASE
WHEN o.order_count < o.quant THEN
o.order_count
ELSE
o.quant
END) AS Orders,
SUM(s.price * s.quant) AS Sales
FROM schemaOne.order_list o
JOIN schemaOne.sales s
ON s.employee = o.employee
AND s.sales_date = o.order_date
WHERE o.order_date > current_date - 35 DAYS
GROUP BY o.employee
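One caveat with joining the raw tables: if an employee can have several order rows or several sales rows on a matching date, the join fans out and inflates both sums. A minimal sketch that avoids this (assuming the same schemaOne tables) pre-aggregates each side before joining:

SELECT o.employee,
       o.orders,
       s.sales
FROM (
    -- one row per employee: capped order counts over the last 35 days
    SELECT employee,
           SUM(CASE WHEN order_count < quant THEN order_count ELSE quant END) AS orders
    FROM schemaOne.order_list
    WHERE order_date > CURRENT_DATE - 35 DAYS
    GROUP BY employee
) AS o
JOIN (
    -- one row per employee: sales revenue over the last 35 days
    SELECT employee,
           SUM(price * quant) AS sales
    FROM schemaOne.sales
    WHERE sales_date > CURRENT_DATE - 35 DAYS
    GROUP BY employee
) AS s
ON s.employee = o.employee

A FULL OUTER JOIN (or the UNION approach above) would also keep employees that appear in only one of the two tables.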

query to keep partitions separate when physically separated

I have a table that contains order/shipment history. A basic dummy version is:
ORDERS
order_no | order_stat | stat_date
---------|------------|------------
2        | Planned    | 01-Jan-2000
2        | Picked     | 15-Jan-2000
2        | Planned    | 17-Jan-2000
2        | Planned    | 05-Feb-2000
2        | Planned    | 31-Mar-2000
2        | Picked     | 05-Apr-2000
2        | Shipped    | 10-Apr-2000
I need to figure out how long each order has been in each order status/phase. The only problem is when I create a partition on the order_no and order_stat, I get results that make sense but are not what I am looking for.
My SQL:

select order_no,
       order_stat,
       stat_date,
       lag(stat_date, 1) over (partition by order_no order by stat_date) prev_stat_date,
       stat_date - lag(stat_date, 1) over (partition by order_no order by stat_date) date_diff,
       row_number() over (partition by order_no, order_stat order by stat_date) rnk
from orders
This will give me the following results:

order_no | order_stat | stat_date   | prev_stat_date | rnk
---------|------------|-------------|----------------|----
2        | Planned    | 01-Jan-2000 |                | 1
2        | Picked     | 15-Jan-2000 | 01-Jan-2000    | 1
2        | Planned    | 17-Jan-2000 | 15-Jan-2000    | 2
2        | Planned    | 05-Feb-2000 | 17-Jan-2000    | 3
2        | Planned    | 31-Mar-2000 | 05-Feb-2000    | 4
2        | Picked     | 05-Apr-2000 | 31-Mar-2000    | 2
2        | Shipped    | 10-Apr-2000 | 05-Apr-2000    | 1
I would like results that look like this (the rnk starts over when the order reverts back to a previous order_stat):

order_no | order_stat | stat_date   | prev_stat_date | rnk
---------|------------|-------------|----------------|----
2        | Planned    | 01-Jan-2000 |                | 1
2        | Picked     | 15-Jan-2000 | 01-Jan-2000    | 1
2        | Planned    | 17-Jan-2000 | 15-Jan-2000    | 1
2        | Planned    | 05-Feb-2000 | 17-Jan-2000    | 2
2        | Planned    | 31-Mar-2000 | 05-Feb-2000    | 3
2        | Picked     | 05-Apr-2000 | 31-Mar-2000    | 1
2        | Shipped    | 10-Apr-2000 | 05-Apr-2000    | 1
I'm trying to get a running count of how long the order has been in its current status (one that starts over even if that status has occurred before, instead of being folded into the earlier partition), but I have no idea how to approach this. Any and all insight would be greatly appreciated.
If I understand correctly, this is a gaps-and-islands problem.
The difference of row numbers can be used to identify the "island"s and then to enumerate the values:
select t.*,
       row_number() over (partition by order_no, order_stat, seqnum - seqnum_2 order by stat_date) as your_rank
from (select o.*,
             row_number() over (partition by order_no order by stat_date) as seqnum,
             row_number() over (partition by order_no, order_stat order by stat_date) as seqnum_2
      from orders o
     ) t;
I've left out the other columns (like the lag()) so you can see the logic. It can be a bit hard to follow why this works. If you stare at some rows from the subquery, you will probably see how the difference of the row numbers defines the groups you want.
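To make that concrete, here is what the subquery produces for the sample data (worked out by hand):

order_stat | stat_date   | seqnum | seqnum_2 | seqnum - seqnum_2
-----------|-------------|--------|----------|------------------
Planned    | 01-Jan-2000 | 1      | 1        | 0
Picked     | 15-Jan-2000 | 2      | 1        | 1
Planned    | 17-Jan-2000 | 3      | 2        | 1
Planned    | 05-Feb-2000 | 4      | 3        | 1
Planned    | 31-Mar-2000 | 5      | 4        | 1
Picked     | 05-Apr-2000 | 6      | 2        | 4
Shipped    | 10-Apr-2000 | 7      | 1        | 6

The first Planned row has difference 0 and forms its own island; the later run of three Planned rows shares difference 1, so they rank 1, 2, 3; and the second Picked row gets difference 4, separating it from the first Picked row.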
Continuing @Gordon's Tabibitosan approach, once you have the groupings you can get both the order within each group and the elapsed number of days for each member of the group:
-- CTE for sample data
with orders (order_no, order_stat, stat_date) as (
    select 2, 'Planned', date '2000-01-01' from dual
    union all select 2, 'Picked',  date '2000-01-15' from dual
    union all select 2, 'Planned', date '2000-01-17' from dual
    union all select 2, 'Planned', date '2000-02-05' from dual
    union all select 2, 'Planned', date '2000-03-31' from dual
    union all select 2, 'Picked',  date '2000-04-05' from dual
    union all select 2, 'Shipped', date '2000-04-10' from dual
)
-- actual query
select order_no, order_stat, stat_date, grp,
       dense_rank() over (partition by order_no, order_stat, grp order by stat_date) as rnk,
       stat_date - min(stat_date) keep (dense_rank first order by stat_date)
                       over (partition by order_no, order_stat, grp) as stat_days
from (
    select order_no, order_stat, stat_date,
           row_number() over (partition by order_no order by stat_date)
           - row_number() over (partition by order_no, order_stat order by stat_date) as grp
    from orders
)
order by order_no, stat_date;
  ORDER_NO ORDER_S STAT_DATE         GRP        RNK  STAT_DAYS
---------- ------- ---------- ---------- ---------- ----------
         2 Planned 2000-01-01          0          1          0
         2 Picked  2000-01-15          1          1          0
         2 Planned 2000-01-17          1          1          0
         2 Planned 2000-02-05          1          2         19
         2 Planned 2000-03-31          1          3         74
         2 Picked  2000-04-05          4          1          0
         2 Shipped 2000-04-10          6          1          0
The inline view is essentially what Gordon did, except it trivially does the subtraction at that level. The outer query then gets the rank the same way, but also uses an analytic function to get the earliest date for that group, and subtracts it from the current row's date. You don't have to include grp or rnk in your final result of course, they're shown to give more insight into what's happening.
It isn't clear exactly what you want, but you can expand even further to, for instance:
with cte1 (order_no, order_stat, stat_date, grp) as (
    select order_no, order_stat, stat_date,
           row_number() over (partition by order_no order by stat_date)
           - row_number() over (partition by order_no, order_stat order by stat_date)
    from orders
),
cte2 (order_no, order_stat, stat_date, grp, grp_date, rnk) as (
    select order_no, order_stat, stat_date, grp,
           min(stat_date) keep (dense_rank first order by stat_date)
               over (partition by order_no, order_stat, grp),
           dense_rank() over (partition by order_no, order_stat, grp order by stat_date)
    from cte1
)
select order_no, order_stat, stat_date, grp, grp_date, rnk,
       stat_date - grp_date as stat_days_so_far,
       case
           when order_stat != 'Shipped' then
               coalesce(first_value(stat_date)
                            over (partition by order_no order by grp_date
                                  range between 1 following and unbounded following),
                        trunc(sysdate))
               - min(stat_date) keep (dense_rank first order by stat_date)
                     over (partition by order_no, order_stat, grp)
       end as stat_days_total,
       stat_date - min(stat_date) over (partition by order_no) as order_days_so_far,
       case
           when max(order_stat) keep (dense_rank last order by stat_date)
                    over (partition by order_no) = 'Shipped' then
               max(stat_date) over (partition by order_no)
           else
               trunc(sysdate)
       end
       - min(stat_date) over (partition by order_no) as order_days_total
from cte2
order by order_no, stat_date;
which for your sample data gives:
  ORDER_NO ORDER_S STAT_DATE         GRP GRP_DATE          RNK STAT_DAYS_SO_FAR STAT_DAYS_TOTAL ORDER_DAYS_SO_FAR ORDER_DAYS_TOTAL
---------- ------- ---------- ---------- ---------- ---------- ---------------- --------------- ----------------- ----------------
         2 Planned 2000-01-01          0 2000-01-01          1                0              14                 0              100
         2 Picked  2000-01-15          1 2000-01-15          1                0               2                14              100
         2 Planned 2000-01-17          1 2000-01-17          1                0              79                16              100
         2 Planned 2000-02-05          1 2000-01-17          2               19              79                35              100
         2 Planned 2000-03-31          1 2000-01-17          3               74              79                90              100
         2 Picked  2000-04-05          4 2000-04-05          1                0               5                95              100
         2 Shipped 2000-04-10          6 2000-04-10          1                0                               100              100
I've included some logic to assume that 'Shipped' is the final status, and if that hasn't been reached then the last status is still running - so counting up to today. That might be wrong, and you might have other end-status values (e.g. cancelled). Anyway, a few things for you to explore and play with...
You might be able to do something similar with match_recognize, but I'll leave that to someone else.
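For what it's worth, here is a rough match_recognize sketch of the same ranking (Oracle 12c or later; treat it as a starting point rather than a tested solution). Each maximal run of equal statuses becomes one match, and the running row count within the match is the rank:

select order_no, order_stat, stat_date, grp, rnk
from orders
match_recognize (
    partition by order_no
    order by stat_date
    measures match_number() as grp,  -- one match per island of equal statuses
             count(*)       as rnk   -- running count of rows in the match so far
    all rows per match
    pattern (strt same*)
    define same as order_stat = prev(order_stat)  -- row continues the run while its status equals the previous row's
);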

Query for negative account balance period in bigquery

I am playing around with bigquery and hit an interesting use case. I have a collection of customers and account balances. The account balances collection records any account balance change.
Customers:

+----+-------+
| ID | Name  |
+----+-------+
| 1  | Alice |
| 2  | Bob   |
+----+-------+

Account balances:

+----+-------------+-------+------------+
| ID | customer_id | value | timestamp  |
+----+-------------+-------+------------+
| 1  | 1           | -500  | 2019-02-12 |
| 2  | 1           | -200  | 2019-02-10 |
| 3  | 2           | 200   | 2019-02-10 |
| 4  | 1           | 0     | 2019-02-09 |
+----+-------------+-------+------------+
The goal is to find out for how long a customer has had a negative account balance. The resulting collection would look like this:

+----+-------+--------------------------------+
| ID | Name  | Negative account balance since |
+----+-------+--------------------------------+
| 1  | Alice | 2 days                         |
+----+-------+--------------------------------+
Bob is not in the collection, because his last account record shows a positive value.
I think the following steps are involved:
- get the last account balance per customer and see if it is negative
- go through the account balance values until you hit a positive (or no more) value
- compute the datediff
Is something like this even possible in SQL? Do you have any ideas on how to create such a query? To get the customers that currently have a negative account balance, I use this query:
SELECT customer_id FROM (
    SELECT t.customer_id, t.value AS account_balance,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY timestamp DESC) AS seqnum
    FROM `account_balances` t
) t
WHERE seqnum = 1 AND account_balance < 0
Below is for BigQuery Standard SQL:

#standardSQL
SELECT customer_id, name,
       SUM(IF(negative_positive < 0, days, 0)) negative_days,
       SUM(IF(negative_positive = 0, days, 0)) zero_days,
       SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
    SELECT customer_id, negative_positive, grp,
           1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
    FROM (
        SELECT customer_id, ts, SIGN(value) negative_positive,
               COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
        FROM (
            SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
            FROM `project.dataset.balances`
        )
    )
    GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.balances` AS (
    SELECT 1 customer_id, -500 value, DATE '2019-02-12' ts UNION ALL
    SELECT 1, -200, '2019-02-10' UNION ALL
    SELECT 2, 200, '2019-02-10' UNION ALL
    SELECT 1, 0, '2019-02-09'
), `project.dataset.customers` AS (
    SELECT 1 id, 'Alice' name UNION ALL
    SELECT 2, 'Bob'
)
SELECT customer_id, name,
       SUM(IF(negative_positive < 0, days, 0)) negative_days,
       SUM(IF(negative_positive = 0, days, 0)) zero_days,
       SUM(IF(negative_positive > 0, days, 0)) positive_days
FROM (
    SELECT customer_id, negative_positive, grp,
           1 + DATE_DIFF(MAX(ts), MIN(ts), DAY) days
    FROM (
        SELECT customer_id, ts, SIGN(value) negative_positive,
               COUNTIF(flag) OVER(PARTITION BY customer_id ORDER BY ts) grp
        FROM (
            SELECT *, SIGN(value) = IFNULL(LEAD(SIGN(value)) OVER(PARTITION BY customer_id ORDER BY ts), 0) flag
            FROM `project.dataset.balances`
        )
    )
    GROUP BY customer_id, negative_positive, grp
)
LEFT JOIN `project.dataset.customers`
ON id = customer_id
GROUP BY customer_id, name
-- ORDER BY customer_id
with this result:

Row | customer_id | name  | negative_days | zero_days | positive_days
----|-------------|-------|---------------|-----------|--------------
1   | 1           | Alice | 3             | 1         | 0
2   | 2           | Bob   | 0             | 0         | 1
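If you want the literal "negative account balance since" date from the question rather than day counts, here is a minimal sketch in the same style (assuming the same project.dataset.balances table): take the earliest negative row that comes after the customer's last non-negative balance.

#standardSQL
SELECT customer_id, MIN(ts) AS negative_since
FROM (
    SELECT customer_id, ts, value,
           -- non-negative balances seen so far vs. in total:
           -- equal only on rows after the last non-negative balance
           COUNTIF(value >= 0) OVER(PARTITION BY customer_id ORDER BY ts) nonneg_so_far,
           COUNTIF(value >= 0) OVER(PARTITION BY customer_id) nonneg_total
    FROM `project.dataset.balances`
)
WHERE value < 0 AND nonneg_so_far = nonneg_total
GROUP BY customer_id

For the sample data this returns 2019-02-10 for Alice (two days before her latest entry) and no row for Bob.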

Count and pivot a table by date

I would like to identify the returning customers from an Oracle(11g) table like this:
CustID | Date
-------|----------
XC321 | 2016-04-28
AV626 | 2016-05-18
DX970 | 2016-06-23
XC321 | 2016-05-28
XC321 | 2016-06-02
So I can see which customers returned within various windows, for example within 10, 20, 30, 40 or 50 days. For example:

CustID | 10_day | 20_day | 30_day | 40_day | 50_day
-------|--------|--------|--------|--------|-------
XC321  |        |        | 1      |        |
XC321  |        |        |        | 1      |
I would even accept a result like this:

CustID | Date       | days_from_last_visit
-------|------------|---------------------
XC321  | 2016-05-28 | 30
XC321  | 2016-06-02 | 5
I guess it would use a partition by windowing clause with unbounded following and preceding clauses... but I cannot find any suitable examples.
Any ideas...?
Thanks
You can do this with conditional aggregation, using a CASE expression over the gap to the previous visit (LEAD with a descending sort returns the previous visit date):

SELECT t.custID,
       COUNT(CASE WHEN (t.date - last_visit) <= 10 THEN 1 END) as "10_day",
       COUNT(CASE WHEN (t.date - last_visit) between 11 and 20 THEN 1 END) as "20_day",
       COUNT(CASE WHEN (t.date - last_visit) between 21 and 30 THEN 1 END) as "30_day",
       .....
FROM (SELECT s.custID,
             s.date,
             LEAD(s.date) OVER(PARTITION BY s.custID ORDER BY s.date DESC) as last_visit
      FROM YourTable s) t
GROUP BY t.custID

For the sample data this gives XC321 one visit in the "10_day" bucket (5 days) and one in the "30_day" bucket (30 days).
Oracle Setup:
CREATE TABLE customers ( CustID, Activity_Date ) AS
SELECT 'XC321', DATE '2016-04-28' FROM DUAL UNION ALL
SELECT 'AV626', DATE '2016-05-18' FROM DUAL UNION ALL
SELECT 'DX970', DATE '2016-06-23' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-05-28' FROM DUAL UNION ALL
SELECT 'XC321', DATE '2016-06-02' FROM DUAL;
Query:
SELECT *
FROM (
SELECT CustID,
Activity_Date AS First_Date,
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '10' DAY FOLLOWING )
- 1 AS "10_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '20' DAY FOLLOWING )
- 1 AS "20_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '30' DAY FOLLOWING )
- 1 AS "30_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '40' DAY FOLLOWING )
- 1 AS "40_Day",
COUNT(1) OVER ( PARTITION BY CustID
ORDER BY Activity_Date
RANGE BETWEEN CURRENT ROW AND INTERVAL '50' DAY FOLLOWING )
- 1 AS "50_Day",
ROW_NUMBER() OVER ( PARTITION BY CustID ORDER BY Activity_Date ) AS rn
FROM Customers
)
WHERE rn = 1;
Output:

CUSTID FIRST_DATE          10_Day 20_Day 30_Day 40_Day 50_Day RN
------ ------------------- ------ ------ ------ ------ ------ --
AV626  2016-05-18 00:00:00      0      0      0      0      0  1
DX970  2016-06-23 00:00:00      0      0      0      0      0  1
XC321  2016-04-28 00:00:00      0      0      1      2      2  1
Here is an answer that works for me. I have based it on your answers above; thanks for the contributions from MT0 and Sagi:
SELECT CustID,
       visit_date,
       Prev_Visit,
       COUNT(CASE WHEN Days_between_visits <= 10 THEN 1 END) AS "0-10_day",
       COUNT(CASE WHEN Days_between_visits BETWEEN 11 AND 20 THEN 1 END) AS "11-20_day",
       COUNT(CASE WHEN Days_between_visits BETWEEN 21 AND 30 THEN 1 END) AS "21-30_day",
       COUNT(CASE WHEN Days_between_visits BETWEEN 31 AND 40 THEN 1 END) AS "31-40_day",
       COUNT(CASE WHEN Days_between_visits BETWEEN 41 AND 50 THEN 1 END) AS "41-50_day",
       COUNT(CASE WHEN Days_between_visits > 50 THEN 1 END) AS "51+_day"
FROM (SELECT CustID,
             visit_date,
             LEAD(T1.visit_date) OVER (PARTITION BY T1.CustID ORDER BY T1.visit_date DESC) AS Prev_visit,
             visit_date - LEAD(T1.visit_date) OVER (PARTITION BY T1.CustID ORDER BY T1.visit_date DESC) AS Days_between_visits
      FROM T1
     ) T2
WHERE Days_between_visits > 0
GROUP BY T2.CustID,
         T2.visit_date,
         T2.Prev_visit,
         T2.Days_between_visits;
This returns:

CUSTID | VISIT_DATE | PREV_VISIT | DAYS_BETWEEN_VISITS | 0-10_DAY | 11-20_DAY | 21-30_DAY | 31-40_DAY | 41-50_DAY | 51+_DAY
-------|------------|------------|---------------------|----------|-----------|-----------|-----------|-----------|--------
XC321  | 2016-05-28 | 2016-04-28 | 30                  |          |           | 1         |           |           |
XC321  | 2016-06-02 | 2016-05-28 | 5                   | 1        |           |           |           |           |

Querying for an ID that has the most number of reads

Suppose I have a table like the one below:
+----+-----------+
| ID | TIME      |
+----+-----------+
| 1  | 12-MAR-15 |
| 2  | 23-APR-14 |
| 2  | 01-DEC-14 |
| 1  | 01-DEC-15 |
| 3  | 05-NOV-15 |
+----+-----------+
What I want to do is, for each year (taken from the TIME date column), list the ID that has the highest count in that year. So, for example, ID 1 occurs the most in 2015, ID 2 occurs the most in 2014, etc.
What I have for a query is:
SELECT EXTRACT(year from time) "YEAR", COUNT(ID) "ID"
FROM table
GROUP BY EXTRACT(year from time)
ORDER BY COUNT(ID) DESC;
But this query just counts how many rows each year has; how do I change it to get the ID with the highest count in that year?
Output:

+------+----+
| YEAR | ID |
+------+----+
| 2015 | 2  |
| 2012 | 2  |
+------+----+
Expected Output:

+------+----+
| YEAR | ID |
+------+----+
| 2015 | 1  |
| 2014 | 2  |
+------+----+
Starting with your sample query, the first change is simply to group by the ID as well as by the year.
SELECT EXTRACT(year from time) "YEAR" , id, COUNT(*) "TOTAL"
FROM table
GROUP BY EXTRACT(year from time), id
ORDER BY EXTRACT(year from time) DESC, COUNT(*) DESC
With that, you could find the rows you want by visual inspection (the first row for each year is the ID with the most rows).
To have the query just return the rows with the highest totals, there are several different ways to do it. You need to consider what you want to do if there are ties - do you want to see all IDs tied for highest in a year, or just an arbitrary one?
Here is one approach - if there is a tie, this should return just the lowest of the tied IDs:
WITH groups AS (
    SELECT EXTRACT(year from time) "YEAR", id, COUNT(*) "TOTAL"
    FROM table
    GROUP BY EXTRACT(year from time), id
)
SELECT year, MIN(id) KEEP (DENSE_RANK FIRST ORDER BY total DESC)
FROM groups
GROUP BY year
ORDER BY year DESC
You need to count per ID and then apply a RANK on that count:

SELECT *
FROM
(
    SELECT EXTRACT(year from time) "YEAR", ID, COUNT(*) AS cnt,
           RANK() OVER (PARTITION BY EXTRACT(year from time) ORDER BY COUNT(*) DESC) AS rnk
    FROM table
    GROUP BY EXTRACT(year from time), ID
) dt
WHERE rnk = 1

(The window function repeats the EXTRACT expression because Oracle does not let it reference the "YEAR" alias defined in the same SELECT.)
If this returns multiple rows with the same high count per year and you want just one of them, you can switch to ROW_NUMBER.
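For instance, here is a minimal sketch of that variant (the extra ID in the ORDER BY is an assumption, added only to make the tie-break deterministic):

SELECT *
FROM
(
    SELECT EXTRACT(year from time) "YEAR", ID, COUNT(*) AS cnt,
           -- ROW_NUMBER gives exactly one row per year; ties broken by lowest ID
           ROW_NUMBER() OVER (PARTITION BY EXTRACT(year from time)
                              ORDER BY COUNT(*) DESC, ID) AS rn
    FROM table
    GROUP BY EXTRACT(year from time), ID
) dt
WHERE rn = 1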
This should do what you're after, I think:

with sample_data as (
    select 1 id, to_date('12/03/2015', 'dd/mm/yyyy') time from dual union all
    select 2 id, to_date('23/04/2014', 'dd/mm/yyyy') time from dual union all
    select 2 id, to_date('01/12/2014', 'dd/mm/yyyy') time from dual union all
    select 1 id, to_date('01/12/2015', 'dd/mm/yyyy') time from dual union all
    select 3 id, to_date('05/11/2015', 'dd/mm/yyyy') time from dual
)
-- End of creating a subquery to mimic a table called "sample_data" containing your input data.
-- See SQL below:
select yr,
       id most_frequent_id,
       cnt_id_yr cnt_of_most_freq_id
from (select to_char(time, 'yyyy') yr,
             id,
             count(*) cnt_id_yr,
             dense_rank() over (partition by to_char(time, 'yyyy') order by count(*) desc) dr
      from sample_data
      group by to_char(time, 'yyyy'),
               id)
where dr = 1;
YR   MOST_FREQUENT_ID CNT_OF_MOST_FREQ_ID
---- ---------------- -------------------
2014                2                   2
2015                1                   2