How to find repetitive values per groups using SQL? - sql

I'm trying to find repetitions between rows based on the column. I've tried window functions with row_number() / rank() but they group all the values that are found (similar to GROUP BY) which I do not expect.
How can I find repetitions of the values?
I tried to do something like this:
SELECT *, rank() OVER(PARTITION BY customer ORDER BY id) FROM customers ORDER BY id
And got the following result:
id
customer
rank
1
customer_1
1
2
customer_2
1
3
customer_2
2
4
customer_1
2
5
customer_3
1
6
customer_1
3
What I want to do:
id
customer
rank
1
customer_1
1
2
customer_2
1
3
customer_2
2
4
customer_1
1
5
customer_3
1
6
customer_1
1

You are looking for counts within adjacent rows. This is a type of gaps-and-islands problem. You can define the adjacent rows with the difference of row_numbers() and then enumerate them:
SELECT c.*,
ROW_NUMBER() OVER (PARTITION BY customer, seqnum - seqnum_2 ORDER BY id) as ranking
FROM (SELECT c.*,
ROW_NUMBER() OVER (ORDER BY id) as seqnum,
ROW_NUMBER() OVER (PARTITION BY customer ORDER BY id) as seqnum_2
FROM customers c
) c
ORDER BY id

You can use a recursive query:
WITH RECURSIVE repeatitions(id, customer, repeat_count) AS (
SELECT id, customer, 1 as repeat_count
FROM customers
UNION ALL
SELECT c.id, c.customer, r.repeat_count + 1
FROM customers c, repeatitions r
WHERE c.id = r.id + 1 AND c.customer = r.customer
)
SELECT id, customer, repeat_count
FROM repeatitions
ORDER by id
I created a working fiddle to demonstrate it.

Related

How do I find the Sum and Max value per Unique ID in HIVE?

basically how do I turn
id name quantity
1 Jerry 1
1 Jerry 2
1 Nana 1
2 Max 4
2 Lenny 3
into
id name quantity
1 Jerry 3
2 Max 4
in HIVE?
I want to sum up and find the highest quantity for each unique ID
You can use window functions with aggregation:
select id, name, quantity
from (select id, name, sum(quantity) as quantity,
row_number() over (partition by id order by sum(quantity) desc) as seqnum
from t
group by id, name
) t
where seqnum = 1;
You can first calculate the sum of quantity per group, then rank them according to descending quantity, and finally filter the rows with rank = 1.
select
id, name, quantity
from (
select
*,
row_number() over (partition by id order by quantity desc) as rn
from (
select id, name, sum(quantity) as quantity
from mytable
group by id, name
)
) where rn = 1;
try like below
with cte as
(
select id,name,sum(quantity) as q
from table_name group by id,name
) select id,name,q from cte t1
where t1.q=( select max(q) from cte t2 where t1.id=t2.id)

How to rank groups of data?

Given the following, and tasked with ranking the raw data by the SUM(volume) within each group:
group_id volume
1 2
1 3
2 5
3 1
3 3
How can I obtain the following?
group_id volume group_volume rank
1 2 5 1
1 3 5 1
2 5 5 2
3 1 4 3
3 3 4 3
I can get group_volume easily, but am struggling on how to break the ties in rank without grouping by + ranking in a separate subquery and joining in.
SELECT *
, SUM(volume) OVER (PARTITION BY group_id) AS grouped_volume
, ... AS rank
FROM groups
Use CTE and Dense_rank
WITH CTE1 AS (SELECT group_id, volume,
sum(volume) over(partition by group_id) group_volume
from table1)
SELECT A.*, dense_rank() over( order by group_id, group_volume) rank FROM CTE1 A;
Use two levels of window functions for this:
select g.*,
dense_rank() over (order by group_volume desc, group_id) as rank
from (select g.*,
sum(volume) over (partition by group_id) as group_volume
from groups g
) g;
There is no need for a JOIN.

SQL : Return joint most frequent values from a column

I have the following table named customerOrders.
ID user order
1 1 2
2 1 3
3 1 1
4 2 1
5 1 5
6 2 4
7 3 1
8 6 2
9 2 2
10 2 3
I want to return to users with most orders. Currently, I have the following QUERY:
SELECT user, COUNT(user) AS UsersWithMostOrders
FROM customerOrders
GROUP BY user
ORDER BY UsersWithMostOrders DESC;
This returns me all the values grouped by total orders like.
user UsersWithMostOrders
1 4
2 4
3 1
6 1
I only want to return the users with most orders. In my case that would be user 1 and 2 since both of them have 4 orders. If I use TOP 1 or LIMIT, it will only return the first user. If I use TOP 2, it will only work in this scenario, it will return invalid data when top two users have different count of orders.
Required Result
user UsersWithMostOrders
1 4
2 4
You can use TOP 1 WITH TIES:
SELECT TOP 1 WITH TIES
[user], COUNT(*) AS UsersWithMostOrders
FROM customerOrders
GROUP BY [user]
ORDER BY UsersWithMostOrders DESC;
See the demo.
Results:
> user | UsersWithMostOrders
> ---: | ------------------:
> 1 | 4
> 2 | 4
Option 1
Should work with most versions of SQL.
select *
from (
select *,
rank() over(order by numOrders desc) as rrank
from (
select `user`, count(*) as numOrders
from customerOrders
group by `user`
) summed
) ranked
where rrank = 1
Play around with the code here
Option 2
If your version of SQL allows window functions (with), here is a much more readable solution which does the same thing
with summed as (
select `user`, count(*) as numOrders
from customerOrders
group by `user`
),
ranked as (
select *,
rank() over(order by numOrders desc) as rrank
from summed
)
select *
from ranked
where rrank = 1
Play around with the code here
You can use a CTE to attain this Req:
;WITH CTE AS(
SELECT [user], COUNT(user) AS UsersWithMostOrders
FROM #T
GROUP BY [user])
SELECT M.* from CTE M
INNER JOIN ( SELECT
MAX(UsersWithMostOrders) AS MaximumOrders FROM CTE) S ON
M.UsersWithMostOrders=S.MaximumOrders
Below Oracle Query can help:
WITH test_table AS
(
SELECT user, COUNT(order) AS total_order , DENSE_RANK() OVER (ORDER BY
total_order desc) AS rank_orders FROM customerOrders
GROUP BY user
)
select * from test_table where rank_orders = 1

how to find the number has more than two consecutive appearences?

The source table:
id num
-------------------
1 1
2 1
3 1
4 2
5 2
6 1
The output:(appear at least 2 times)
num times
--------------
1 3
2 2
Based on the addition logic defined in the comments it appears this is what you're after:
WITH YourTable AS(
SELECT V.id,
V.num
FROM (VALUES(1,1),
(2,1),
(3,1),
(4,2),
(5,2),
(6,1),
(7,1))V(id,num)), --Added extra row due to logic defined in comments
Grps AS(
SELECT YT.id,
YT.num,
ROW_NUMBER() OVER (ORDER BY id) -
ROW_NUMBER() OVER (PARTITION BY Num ORDER BY id) AS Grp
FROM YourTable YT),
Counts AS(
SELECT num,
COUNT(num) AS Times
FROM grps
GROUP BY grp,
num)
SELECT num,
MAX(times) AS times
FROM Counts
GROUP BY num;
This uses a CTE and ROW_NUMBER to define the groups, and then an additional CTE to get the COUNT per group. Finally you can then get the MAX COUNT per num.
I would adress this with a gaps-and-islands technique:
select num, max(cnt)
from (
select num, count(*) cnt
from (
select
id,
num,
row_number() over(order by id) rn1,
row_number() over(partition by num order by id) rn2
from mytable
) t
group by num, rn1 - rn2
) t
group by num
The most inner query computes row numbers over the whole table and within num groups; the difference between the row numbers gives you the group of adjacent records that each record belong to (you can run that subquery independently and follow how the difference evolves to understand more).
Then, the next level count the number of records in each group of adjacent records. The most outer query takes the maximum count of adjacent records in for each num.
Demo on DB Fiddle:
num | (No column name)
--: | ---------------:
1 | 3
2 | 2
this will work for you
select num,count(num) times from Tabl
group by num

SQL rank grouping variation

I'm trying to achieve the following "rank" result given the original dataset composed by the column ID and CODE.
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 4
10 A 4
Using the RANK_DENSE instruction over the CODE column i get the following result (with the A code getting the same rank value also after "the break" between the rows)
id code rank
1 A 1
2 A 1
3 A 1
4 B 2
5 B 2
6 C 3
7 C 3
8 C 3
9 A 1
10 A 1
Is it possible to achieve the results as shown in the first (example) table, with the A code changing rank when there is a separation between the group formed by id: 1-2-3 and the one formed by id: 9-10 without using a cursor?
Thanks
You want to find sequences of values and give them a rank. You can do this with a difference of row numbers approach. The following assigns a different number to each grouping:
select o.*, dense_rank() over (order by grp, code)
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o;
If you want the assignment in the same order as the original data, then you can order by the id, but that requires an additional window function:
select o.*, dense_rank() over (order by minid) as therank
from (select o.*, min(id) over (partition by grp, code) as minid
from (select o.*,
(row_number() over (order by id) -
row_number() over (partition by code order by id)
) as grp
from original o
) o
) o;
SUM by if current is the same as previous row. Works from SQL Server 2012.
WITH CTE AS (
SELECT id, code,
CASE Code WHEN LAG(CODE) OVER (ORDER BY id) THEN 0 ELSE 1 END AS Diff
FROM Table1)
SELECT id, code, SUM(Diff) OVER (ORDER BY id) FROM CTE
Please also see similar question at How to make row numbering with ordering, partitioning and grouping