SQL: How to group data into bands - sql

I've created a query that shows the number of times an individual client appears in a list of transactions....
select Client_Ref, count(*)
from Transactions
where Start_Date >= '2015-01-01'
group by Client_Ref
order by Client_Ref
...this returns data like this...
Client1 1
Client2 4
Client3 1
Client4 3
..What I need to do is summarize this into bands of frequency so that I get something like this...
No. of Clients with 1 transaction 53
No. of Clients with 2 transaction 157
No. of Clients with 3 transaction 25
No. of Clients with >3 transactions 259
I can't think how to do this in SQL. I could probably figure it out in Excel, but I'd rather it was done at server level.

I call this a "histogram of histogram" query. Just use group by twice:
select cnt, count(*), min(Client_Ref), max(Client_Ref)
from (select Client_Ref, count(*) as cnt
from Transactions
where Start_Date >= '2015-01-01'
group by Client_Ref
) t
group by cnt
order by cnt;
I include the min and max client ref, because I often want to investigate certain values further.
If you want a limit at 3, you can use case:
select (case when cnt <= 3 then cast(cnt as varchar(255)) else '4+' end) as grp,
count(*), min(Client_Ref), max(Client_Ref)
from (select Client_Ref, count(*) as cnt
from Transactions
where Start_Date >= '2015-01-01'
group by Client_Ref
) t
group by (case when cnt <= 3 then cast(cnt as varchar(255)) else '4+' end)
order by min(cnt);
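For anyone who wants to check the banded query against known counts, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than the asker's (unstated) database, so the cast target is text instead of varchar(255). The rows are made up to match the four clients shown in the question, plus one pre-2015 row to exercise the date filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Transactions (Client_Ref text, Start_Date text)")

# Hypothetical data: Client1 x1, Client2 x4, Client3 x1, Client4 x3 (all in 2015)
for client, n in [("Client1", 1), ("Client2", 4), ("Client3", 1), ("Client4", 3)]:
    conn.executemany(
        "insert into Transactions values (?, ?)",
        [(client, "2015-02-01")] * n,
    )
# One old row that the where clause should filter out
conn.execute("insert into Transactions values ('Client1', '2014-12-31')")

# Inner query: transactions per client. Outer query: clients per band.
banded = conn.execute("""
    select (case when cnt <= 3 then cast(cnt as text) else '>3' end) as band,
           count(*) as clients
    from (select Client_Ref, count(*) as cnt
          from Transactions
          where Start_Date >= '2015-01-01'
          group by Client_Ref
         ) t
    group by (case when cnt <= 3 then cast(cnt as text) else '>3' end)
    order by min(cnt)
""").fetchall()

print(banded)
```

Note that a band with no clients (here, 2 transactions) simply does not appear in the result; the conditional-sum form is one way to get a zero for it.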

select cnt, count(*) from
(
select case count(*) when 1 then 'No. of Clients with 1 transaction'
when 2 then 'No. of Clients with 2 transactions'
when 3 then 'No. of Clients with 3 transactions'
else 'No. of Clients with >3 transactions'
end as cnt
from Transactions
where Start_Date >= '2015-01-01'
group by Client_Ref
) t
group by cnt

You can do a conditional SUM() to pull the total for each grouping:
Select 'No. of Clients with 1 transaction' = Sum(Case When A.Total = 1 Then 1 Else 0 End),
'No. of Clients with 2 transactions' = Sum(Case When A.Total = 2 Then 1 Else 0 End),
'No. of Clients with 3 transactions' = Sum(Case When A.Total = 3 Then 1 Else 0 End),
'No. of Clients with >3 transactions' = Sum(Case When A.Total > 3 Then 1 Else 0 End)
From
(
Select Client_Ref, count(*) As Total
From Transactions
Where Start_Date >= '2015-01-01'
Group by Client_Ref
) A

You can create the buckets separately and then use a union all:
with COUNT1 as (
select Client_Ref, count(*) as count1
from Transactions
where Start_Date >= '2015-01-01'
group by Client_Ref
)
,COUNT2 as (
select cast(C.count1 as varchar(5)) as count1,count(Client_Ref) as count2
from COUNT1 C
where count1 <= 3
group by C.count1
)
,COUNT3 as (
select '> 3' as count1,count(*) as count2
from COUNT1 C
where C.count1 > 3
)
select * from COUNT2
union all
select * from COUNT3
You can manually enter that text ('No. of Clients with N transactions') if you want to.

select Client_Ref
,count(*) as Count
,case when count(*) < 4 then count(*) else 4 end as Band
from Transactions
where Start_Date >= '2015-01-01'
group by Client_Ref
order by Client_Ref

Related

Flag=1/0 based on multiple criteria on same column

I have a temp table that is being created, we will say that column 1 is YearMonth, column2 as user_id, Column 3 is Type.
YearMonth User_id Type
200101 1 x
200101 2 y
200101 2 z
200102 1 x
200103 2 x
200103 2 p
200103 2 q
I want to count userids based on flag based on type. Hence I am trying to set flag to 1 and 0 but it always results in 0.
So for e.g. when the type contains x or y or z AND type contains P or Q then flag=1 by YearMonth.
I am trying something like
SELECT count (distinct t1.user_id) as count,
t1.YearMonth,
case when t1.type in ('x','y','z')
and
t1.type in ('p','q') then 1 else 0 end as flag
FROM table t1
group by 2,3;
I would like to know why it doesn't give output as below:
count YearMonth Flag
0 200001 1
2 200001 0
1 200002 1
1 200002 0
What am I missing here? Thanks
If I follow you correctly, you can use two levels of aggregation:
select yearmonth, flag, count(*) cnt
from (
select yearmonth, user_id,
case when max(case when t1.type in ('x', 'y', 'z') then 1 else 0 end) = 1
and max(case when t1.type in ('p', 'q') then 1 else 0 end) = 1
then 1
else 0
end as flag
from mytable t1
group by yearmonth, user_id
) t
group by yearmonth, flag
This first flags users for each month, using conditional aggregation, then aggregates by flag and month.
If you also want to display 0 for flags that do not appear for a given month, you can generate the combinations with a cross join first, then bring in the above resultset with a left join:
select y.yearmonth, f.flag, count(t.user_id) cnt
from (select distinct yearmonth from mytable) y
cross join (values (0), (1)) f(flag)
left join (
select yearmonth, user_id,
case when max(case when t1.type in ('x', 'y', 'z') then 1 else 0 end) = 1
and max(case when t1.type in ('p', 'q') then 1 else 0 end) = 1
then 1
else 0
end as flag
from mytable t1
group by yearmonth, user_id
) t on t.yearmonth = y.yearmonth and t.flag = f.flag
group by y.yearmonth, f.flag
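To see what the cross join plus left join version returns for the sample rows in the question, here is a small runnable sketch. It assumes SQLite via Python's sqlite3, so the `(values (0), (1)) f(flag)` derived table is swapped for `select 0 as flag union all select 1`, which SQLite accepts. The table name mytable comes from the answer, not the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (yearmonth integer, user_id integer, type text)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        (200101, 1, "x"), (200101, 2, "y"), (200101, 2, "z"),
        (200102, 1, "x"),
        (200103, 2, "x"), (200103, 2, "p"), (200103, 2, "q"),
    ],
)

# Step 1 (subquery t): one flag per user per month via conditional aggregation.
# Step 2: every month x flag combination, left-joined so empty groups show 0.
rows = conn.execute("""
    select y.yearmonth, f.flag, count(t.user_id) as cnt
    from (select distinct yearmonth from mytable) y
    cross join (select 0 as flag union all select 1) f
    left join (
        select yearmonth, user_id,
               case when max(case when type in ('x', 'y', 'z') then 1 else 0 end) = 1
                     and max(case when type in ('p', 'q') then 1 else 0 end) = 1
                    then 1 else 0
               end as flag
        from mytable
        group by yearmonth, user_id
    ) t on t.yearmonth = y.yearmonth and t.flag = f.flag
    group by y.yearmonth, f.flag
    order by y.yearmonth, f.flag
""").fetchall()

for row in rows:
    print(row)
```

Only user 2 in 200103 has both an x/y/z type and a p/q type, so that is the only month with a non-zero count for flag 1.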
I had a very similar idea to GMB's, but, like GMB, I don't get the expected results. Most likely the expected results in the question are wrong:
SELECT COUNT(DISTINCT UserID) AS [Count],
YearMonth,
CASE WHEN COUNT(CASE WHEN [Type] IN ('x','y','z') THEN 1 END) > 0
AND COUNT(CASE WHEN [Type] IN ('p','q') THEN 1 END) > 0 THEN 1 ELSE 0
END AS Flag
FROM (VALUES(200101,1,'x'),
(200101,2,'y'),
(200101,2,'z'),
(200102,1,'x'),
(200103,2,'x'),
(200103,2,'p'),
(200103,2,'q')) V(YearMonth,UserID,[Type])
GROUP BY YearMonth;

Counting number of orders depending on city

I have a temp table that is being created, we will say that column 1 is an order_id, and column 2 is user_id, column 3 is start_date, column 4 is end_date and column 5 is city.
order_id user_id Start_date end_date city
101 1 200001 200101 X
101 2 200101 200110 y
101 3 200110 200112 z
101 3 200112 200210 z
I want to count by city the number of order_ids that moved out of it to another city and in another column the number of order_ids that moved into it from another city.
I would like it to come out as a table, like this:
city moved_out_orders moved_into_orders
x 1 0
y 1 1
z 0 1
You can do:
with
x as (
select a.city as from_city, b.city as to_city
from t a
join t b on a.order_id = b.order_id
and a.city <> b.city
and a.end_date = b.start_date
),
o (city, cnt) as (
select from_city, count(*) from x group by from_city
),
i (city, cnt) as (
select to_city, count(*) from x group by to_city
)
select
coalesce(i.city, o.city) as city,
coalesce(o.cnt, 0) as moved_out_orders,
coalesce(i.cnt, 0) as moved_into_orders
from i
full join o on o.city = i.city
Hmmm . . . I think you just want to enumerate the rows for each order and then discard the highest and lowest for each count:
select city,
sum(case when seqnum_desc > 1 then 1 else 0 end) as moved_out,
sum(case when seqnum_asc > 1 then 1 else 0 end) as moved_in
from (select t.*,
row_number() over (partition by order_id order by start_date) as seqnum_asc,
row_number() over (partition by order_id order by start_date desc) as seqnum_desc
from t
) t
group by city;
EDIT:
You appear to have adjacent rows in the same city. Seems strange, but instead you can use lead() and lag():
select city,
sum(case when next_city <> city then 1 else 0 end) as moved_out,
sum(case when prev_city <> city then 1 else 0 end) as moved_in
from (select t.*,
lag(city) over (partition by order_id order by start_date) as prev_city,
lead(city) over (partition by order_id order by start_date) as next_city
from t
) t
group by city;
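A quick way to confirm the lag()/lead() version against the table in the question is sketched below. It assumes SQLite 3.25 or later (needed for window functions) through Python's sqlite3, uses the column names order_id and start_date from the question, and writes every city in lower case, since the question mixes X and x.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (order_id integer, user_id integer,"
    " start_date integer, end_date integer, city text)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?, ?)",
    [
        (101, 1, 200001, 200101, "x"),
        (101, 2, 200101, 200110, "y"),
        (101, 3, 200110, 200112, "z"),
        (101, 3, 200112, 200210, "z"),
    ],
)

# A row counts as "moved out" when the next row of the same order is in a
# different city, and as "moved in" when the previous row is. The first and
# last rows compare against NULL, which fails the <> test and counts as 0.
moves = conn.execute("""
    select city,
           sum(case when next_city <> city then 1 else 0 end) as moved_out,
           sum(case when prev_city <> city then 1 else 0 end) as moved_in
    from (select t.*,
                 lag(city) over (partition by order_id order by start_date) as prev_city,
                 lead(city) over (partition by order_id order by start_date) as next_city
          from t
         ) s
    group by city
    order by city
""").fetchall()

print(moves)
```

The two consecutive z rows do not register as a move in either direction, which matches the desired output in the question.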

SQL query to get daily acquisitions

I have a sales table:
date, user_id, product
there are 26 products(a-z), and those users who have purchased both 'a' and 'b' product are classified as acquired customers.
What I want is the daily level count of acquired customers as a SQL query
Say for eg, A user 'X' bought product 'a' on 1st apr, and bought product 'b' on 20th apr then he will be deemed as acquired on 20th apr.
Need a SQL query for this
Sample data:
date user_id Product sale
01-04-2019 123 a 200
01-04-2019 234 b 300
01-04-2019 345 a 200
02-04-2019 123 b 300
03-04-2019 234 b 300
04-04-2019 555 g 400
05-04-2019 666 a 200
05-04-2019 666 b 300
Desired Output from sql query:
date acquired_users
01-04-2019 0
02-04-2019 1
03-04-2019 0
04-04-2019 0
05-04-2019 1
obviously there will be a lot more data
You can use window functions for this. First, get the "start" date for each user, i.e. the first date by which the user has bought both 'a' and 'b':
select user_id, min(date) as acquired_date
from (select t.*,
sum(case when product = 'a' then 1 else 0 end) over (partition by user_id order by date) as cnt_a,
sum(case when product = 'b' then 1 else 0 end) over (partition by user_id order by date) as cnt_b
from t
) t
where cnt_a > 0 and cnt_b > 0
group by user_id;
Then aggregate this:
select acquired_date, count(*) as acquired_users
from (select user_id, min(date) as acquired_date
from (select t.*,
sum(case when product = 'a' then 1 else 0 end) over (partition by user_id order by date) as cnt_a,
sum(case when product = 'b' then 1 else 0 end) over (partition by user_id order by date) as cnt_b
from t
) t
where cnt_a > 0 and cnt_b > 0
group by user_id
) u
group by acquired_date
order by acquired_date;
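The running-count idea can be checked on the sample data from the question with the sketch below. Several things here are assumptions rather than part of the answer: SQLite 3.25+ via Python's sqlite3, a table named sales, dates rewritten in ISO format so that text ordering is chronological, and an extra left join from the list of sale dates so that days with no acquisitions show 0, as in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table sales (date text, user_id integer, product text, sale integer)")
conn.executemany(
    "insert into sales values (?, ?, ?, ?)",
    [
        ("2019-04-01", 123, "a", 200), ("2019-04-01", 234, "b", 300),
        ("2019-04-01", 345, "a", 200), ("2019-04-02", 123, "b", 300),
        ("2019-04-03", 234, "b", 300), ("2019-04-04", 555, "g", 400),
        ("2019-04-05", 666, "a", 200), ("2019-04-05", 666, "b", 300),
    ],
)

# acquired: first date on which a user's running counts of 'a' and 'b'
# are both positive. Rows on the same date are peers in the default window
# frame, so buying both products on one day counts for that day.
daily = conn.execute("""
    with acquired as (
        select user_id, min(date) as acquired_date
        from (select s.*,
                     sum(case when product = 'a' then 1 else 0 end)
                         over (partition by user_id order by date) as cnt_a,
                     sum(case when product = 'b' then 1 else 0 end)
                         over (partition by user_id order by date) as cnt_b
              from sales s
             ) r
        where cnt_a > 0 and cnt_b > 0
        group by user_id
    )
    select d.date, count(a.user_id) as acquired_users
    from (select distinct date from sales) d
    left join acquired a on a.acquired_date = d.date
    group by d.date
    order by d.date
""").fetchall()

for row in daily:
    print(row)
```

User 123 is acquired on 2019-04-02 and user 666 on 2019-04-05; nobody else ever buys both products.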

Tuning oracle subquery in select statement

I have a master table and a reference table as below.
WITH MAS as (
SELECT 10 as CUSTOMER_ID, 1 PROCESS_ID, 44 PROCESS_TYPE, 200 as AMOUNT FROM DUAL UNION ALL
SELECT 10 as CUSTOMER_ID, 1 PROCESS_ID, 44 PROCESS_TYPE, 250 as AMOUNT FROM DUAL UNION ALL
SELECT 10 as CUSTOMER_ID, 2 PROCESS_ID, 45 PROCESS_TYPE, 300 as AMOUNT FROM DUAL UNION ALL
SELECT 10 as CUSTOMER_ID, 2 PROCESS_ID, 45 PROCESS_TYPE, 350 as AMOUNT FROM DUAL
), REFTAB as (
SELECT 44 PROCESS_TYPE, 'A' GROUP_ID FROM DUAL UNION ALL
SELECT 44 PROCESS_TYPE, 'B' GROUP_ID FROM DUAL UNION ALL
SELECT 45 PROCESS_TYPE, 'C' GROUP_ID FROM DUAL UNION ALL
SELECT 45 PROCESS_TYPE, 'D' GROUP_ID FROM DUAL
) SELECT ...
My first select statement which works correctly is this one:
SELECT CUSTOMER_ID,
SUM(AMOUNT) as AMOUNT1,
SUM(CASE WHEN PROCESS_TYPE IN (SELECT PROCESS_TYPE FROM REFTAB WHERE GROUP_ID = 'A')
THEN AMOUNT ELSE NULL END) as AMOUNT2,
COUNT(CASE WHEN PROCESS_TYPE IN (SELECT PROCESS_TYPE FROM REFTAB WHERE GROUP_ID = 'D')
THEN 1 ELSE NULL END) as COUNT1
FROM MAS
GROUP BY CUSTOMER_ID
However, to address a performance issue, I changed it to this select statement:
SELECT CUSTOMER_ID,
SUM(AMOUNT) as AMOUNT1,
SUM(CASE WHEN GROUP_ID = 'A' THEN AMOUNT ELSE NULL END) as AMOUNT2,
COUNT(CASE WHEN GROUP_ID = 'D' THEN 1 ELSE NULL END) as COUNT1
FROM MAS A
LEFT JOIN REFTAB B ON A.PROCESS_TYPE = B.PROCESS_TYPE
GROUP BY CUSTOMER_ID
For the AMOUNT2 and COUNT1 columns, the values stay the same. But for AMOUNT1, the value is multiplied because of the join with the reference table.
I know I can add 1 more left join with an additional join condition on GROUP_ID. But that won't be any different from using a subquery.
Any idea how to make the query work with just 1 left join while not multiplying the AMOUNT1 value?
"I know I can add 1 more left join with an additional join condition on GROUP_ID. But that won't be any different from using a subquery."
You'd be surprised. Having 2 left joins instead of subqueries in the SELECT gives the optimizer more ways of optimizing the query. I would still try it:
select m.customer_id,
sum(m.amount) as amount1,
sum(case when grpA.group_id is not null then m.amount end) as amount2,
count(grpD.group_id) as count1
from mas m
left join reftab grpA
on grpA.process_type = m.process_type
and grpA.group_id = 'A'
left join reftab grpD
on grpD.process_type = m.process_type
and grpD.group_id = 'D'
group by m.customer_id
You can also try this query, which uses the SUM() analytic function to calculate the amount1 value before the join to avoid the duplicate value problem:
select m.customer_id,
m.customer_sum as amount1,
sum(case when r.group_id = 'A' then m.amount end) as amount2,
count(case when r.group_id = 'D' then 'X' end) as count1
from (select customer_id,
process_type,
amount,
sum(amount) over (partition by customer_id) as customer_sum
from mas) m
left join reftab r
on r.process_type = m.process_type
group by m.customer_id,
m.customer_sum
You can test both options, and see which one performs better.
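The difference between the single join and the two filtered joins is easy to reproduce with the MAS and REFTAB rows from the question. The sketch below uses SQLite through Python's sqlite3 as a stand-in for Oracle, so it only checks the results and says nothing about which plan Oracle would choose or how fast it runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table mas (customer_id int, process_id int, process_type int, amount int);
    create table reftab (process_type int, group_id text);
    insert into mas values (10, 1, 44, 200), (10, 1, 44, 250),
                           (10, 2, 45, 300), (10, 2, 45, 350);
    insert into reftab values (44, 'A'), (44, 'B'), (45, 'C'), (45, 'D');
""")

# Single unfiltered join: each MAS row matches two REFTAB rows, so the
# plain sum is doubled.
naive = conn.execute("""
    select m.customer_id, sum(m.amount) as amount1
    from mas m
    left join reftab r on r.process_type = m.process_type
    group by m.customer_id
""").fetchall()

# Two joins, each filtered to one group_id: at most one match per MAS row,
# so no row is duplicated.
fixed = conn.execute("""
    select m.customer_id,
           sum(m.amount) as amount1,
           sum(case when grpA.group_id is not null then m.amount end) as amount2,
           count(grpD.group_id) as count1
    from mas m
    left join reftab grpA
           on grpA.process_type = m.process_type and grpA.group_id = 'A'
    left join reftab grpD
           on grpD.process_type = m.process_type and grpD.group_id = 'D'
    group by m.customer_id
""").fetchall()

print(naive)
print(fixed)
```

This relies on (process_type, group_id) being unique in REFTAB; if a pair could repeat, the filtered joins would multiply rows again.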
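Starting from your original query, replacing the IN subqueries with correlated EXISTS may help, although the optimizer often produces the same plan for both, so compare the execution plans. Also note that SUM ignores NULLs: ELSE NULL versus ELSE 0 only matters when no row matches, in which case the sum is NULL rather than 0. Keep ELSE NULL inside COUNT, since COUNT would count a 0.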
SELECT CUSTOMER_ID,
SUM(AMOUNT) as AMOUNT1,
SUM(CASE WHEN EXISTS(SELECT 1 FROM REFTAB WHERE REFTAB.GROUP_ID = 'A' AND REFTAB.PROCESS_TYPE = MAS.PROCESS_TYPE)
THEN AMOUNT ELSE NULL END) as AMOUNT2,
COUNT(CASE WHEN EXISTS(SELECT 1 FROM REFTAB WHERE REFTAB.GROUP_ID = 'D' AND REFTAB.PROCESS_TYPE = MAS.PROCESS_TYPE)
THEN 1 ELSE NULL END) as COUNT1
FROM MAS
GROUP BY CUSTOMER_ID
The normal way is to aggregate the values before the join. You can also use conditional aggregation, if the rest of the query is correct:
SELECT CUSTOMER_ID,
SUM(CASE WHEN seqnum = 1 THEN AMOUNT END) as AMOUNT1,
SUM(CASE WHEN GROUP_ID = 'A' THEN AMOUNT ELSE NULL END) as AMOUNT2,
COUNT(CASE WHEN GROUP_ID = 'D' THEN 1 ELSE NULL END) as COUNT1
FROM MAS A LEFT JOIN
(SELECT B.*, ROW_NUMBER() OVER (PARTITION BY PROCESS_TYPE ORDER BY PROCESS_TYPE) as seqnum
FROM REFTAB B
) B
ON A.PROCESS_TYPE = B.PROCESS_TYPE
GROUP BY CUSTOMER_ID;
This ignores the duplicates created by the joins.

combine multiple select [group by] queries with condition

I have following 3 select queries which returns 2 columns set.
Is there any way that quizno, correct, wrong and notattempted columns come at once like following:
quizno correct wrong notattempted
1 80 10 10
2 60 20 20
3 100 0 0
These are the separated queries:
select quizno, count(*) as correct from v_t1 where examid=96 AND result='correct'
group by quizno order by count(*) desc
select quizno, count(*) as wrong from v_t1 where examid=96 AND result='wrong'
group by quizno order by count(*) desc
select quizno, count(*) as notattempted from v_t1 where examid=96 AND result='notattempted'
group by quizno order by count(*) desc
You can use conditional aggregation (SUM over a CASE expression) to get all three counts in one pass:
select quizno,
sum( case when result='correct' then 1 else 0 end) as 'correct',
sum( case when result='wrong' then 1 else 0 end) as 'wrong',
sum( case when result='notattempted' then 1 else 0 end) as 'notattempted'
from v_t1
where examid = 96
group by quizno
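As a sanity check, the conditional aggregation can be run on a few rows. The data below is invented (the question does not show any rows of v_t1), v_t1 is created as a plain table rather than a view, and SQLite via Python's sqlite3 is assumed; the aliases are left unquoted because quoting rules for aliases differ between databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table v_t1 (examid integer, quizno integer, result text)")
# Hypothetical rows: exam 96 has two quizzes, exam 97 should be filtered out
conn.executemany(
    "insert into v_t1 values (?, ?, ?)",
    [
        (96, 1, "correct"), (96, 1, "correct"), (96, 1, "wrong"),
        (96, 2, "notattempted"), (96, 2, "correct"),
        (97, 1, "wrong"),
    ],
)

# Each sum counts only the rows whose result matches its own label.
summary = conn.execute("""
    select quizno,
           sum(case when result = 'correct' then 1 else 0 end) as correct,
           sum(case when result = 'wrong' then 1 else 0 end) as wrong,
           sum(case when result = 'notattempted' then 1 else 0 end) as notattempted
    from v_t1
    where examid = 96
    group by quizno
    order by quizno
""").fetchall()

print(summary)
```

Unlike joining the three separate queries, this returns a row for every quiz and fills in 0 where a result type never occurs.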