Select 5 of each distinct value - sql

I have the following table in PostgreSQL:
| a | b | c |
===================
| 'w' | 2 | 3 |
| 'w' | 7 | 2 |
| 'w' | 8 | 1 |
| 'w' | 3 | 6 |
| 'w' | 0 | 8 |
| 'w' | 2 | 9 |
| 'w' | 2 | 9 |
| 'z' | 4 | 9 |
| 'z' | 0 | 9 |
| 'z' | 0 | 8 |
| 'z' | 3 | 6 |
| 'z' | 2 | 7 |
| 'z' | 3 | 1 |
| 'z' | 3 | 2 |
| 'z' | 3 | 3 |
I want to select all records, but limit them to 5 records for each distinct value in column a.
So the result would look like:
| a | b | c |
===================
| 'w' | 2 | 3 |
| 'w' | 7 | 2 |
| 'w' | 8 | 1 |
| 'w' | 3 | 6 |
| 'w' | 0 | 8 |
| 'z' | 4 | 9 |
| 'z' | 0 | 9 |
| 'z' | 0 | 8 |
| 'z' | 3 | 6 |
| 'z' | 2 | 7 |
What is the most efficient way to achieve that in RoR? Thanks!

You can use row_number(), but you have to specify an order, or you will get unpredictable results:
with cte as (
    select
        *,
        row_number() over(partition by a order by b, c) as row_num
    from table1
)
select a, b, c
from cte
where row_num <= 5
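For anyone who wants to try this without a PostgreSQL instance, here is a minimal sketch that runs the same query against the sample data. It assumes an in-memory SQLite database as a stand-in (SQLite 3.25+ supports the same window-function syntax). Note that the rows kept depend on the ORDER BY b, c chosen above, so they differ from the arbitrary "first five" shown in the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; needs SQLite 3.25+ for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a TEXT, b INTEGER, c INTEGER)")
rows = [
    ("w", 2, 3), ("w", 7, 2), ("w", 8, 1), ("w", 3, 6), ("w", 0, 8), ("w", 2, 9), ("w", 2, 9),
    ("z", 4, 9), ("z", 0, 9), ("z", 0, 8), ("z", 3, 6), ("z", 2, 7), ("z", 3, 1), ("z", 3, 2), ("z", 3, 3),
]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

# Number the rows inside each value of a, then keep the first five per value.
top5 = conn.execute("""
    WITH cte AS (
        SELECT a, b, c,
               ROW_NUMBER() OVER (PARTITION BY a ORDER BY b, c) AS row_num
        FROM table1
    )
    SELECT a, b, c
    FROM cte
    WHERE row_num <= 5
    ORDER BY a, b, c
""").fetchall()

for row in top5:
    print(row)
```

From Rails, the same SQL string could be passed to the database as a raw query; that part is not shown here.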

Related

Get the count of longest streak including the break point

I am working on a problem where I have to get the length of the longest streak per customer, but to get the exact result I also have to count the row where the streak breaks. My table looks like this:
+-----------------+--------+-------+
| customer_number | Months | Flags |
+-----------------+--------+-------+
| 1 | 12 | 1 |
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 1 |
| 1 | 4 | 1 |
| 1 | 5 | 1 |
| 1 | 8 | 1 |
| 1 | 9 | 1 |
| 1 | 10 | 1 |
| 1 | 11 | 1 |
| 6 | 12 | 1 |
| 6 | 1 | 1 |
| 6 | 2 | 1 |
| 6 | 3 | 1 |
| 6 | 4 | 1 |
| 6 | 5 | 4 |
| 6 | 9 | 1 |
| 6 | 10 | 1 |
| 6 | 11 | 1 |
| 7 | 5 | 1 |
| 8 | 9 | 1 |
| 8 | 10 | 1 |
| 8 | 11 | 1 |
| 9 | 9 | 1 |
| 9 | 10 | 1 |
| 9 | 11 | 1 |
| 10 | 11 | 1 |
+-----------------+--------+-------+
My desired output is:
+----------+--------------------+
| Customer | Consecutive streak |
+----------+--------------------+
| 1 | 10 |
| 6 | 6 |
| 7 | 1 |
| 8 | 3 |
| 9 | 3 |
| 10 | 1 |
+----------+--------------------+
The code I have:
SELECT customer_number, max(streak) max_consecutive_streak FROM (
SELECT customer_number, COUNT(*) as streak
FROM
(select *,
(row_number() over (order by customer_number) -
row_number() over (order by customer_number)
) as counts
from table1
) cc
group by customer_number, counts
)
GROUP BY 1;
It works well, but for customer_number 6 it returns 5 and I want it to be 6. That is, it should also count the row with flag 4 as part of the longest streak, because the streak breaks at that point. Any idea how I can achieve that?
You can use a cte with row_number:
with cte(r, id, flag) as (
    -- list the columns explicitly so they match the three names declared for the cte
    select row_number() over (order by c.customer_number), c.customer_number, c.Flags from customers c
),
freq(id, t, f) as (
    select c2.id, c2.f, count(*) from
    (select c.id, (select sum(c1.flag!=c.flag) from cte c1 where c1.id=c.id and c1.r <= c.r) f from cte c)
    c2 group by c2.id, c2.f
)
select id, max(f) from freq group by id;
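Another way to count the break row, as a rough sketch under two assumptions that are not in the question: there is an explicit ordering column (called seq here, hypothetical), and any flag other than 1 marks a break. Each row is grouped by the number of break rows strictly before it, so a break row stays in the same group as the run it ends. SQLite is used only as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: seq is a hypothetical ordering column; the Months column is omitted.
conn.execute("CREATE TABLE table1 (customer_number INTEGER, seq INTEGER, flag INTEGER)")
flags = {
    1: [1] * 10,
    6: [1, 1, 1, 1, 1, 4, 1, 1, 1],
    7: [1],
    8: [1, 1, 1],
    9: [1, 1, 1],
    10: [1],
}
data = [(cust, i, f) for cust, fs in flags.items() for i, f in enumerate(fs, start=1)]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", data)

streaks = conn.execute("""
    WITH marked AS (
        SELECT customer_number,
               -- breaks seen strictly before this row; the break row itself
               -- therefore still belongs to the run it terminates
               COALESCE(SUM(CASE WHEN flag <> 1 THEN 1 ELSE 0 END) OVER (
                   PARTITION BY customer_number ORDER BY seq
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
        FROM table1
    ),
    runs AS (
        SELECT customer_number, grp, COUNT(*) AS streak
        FROM marked
        GROUP BY customer_number, grp
    )
    SELECT customer_number, MAX(streak)
    FROM runs
    GROUP BY customer_number
    ORDER BY customer_number
""").fetchall()

print(streaks)
```

With the sample flags, customer 6 gets one group of six rows (five 1s plus the 4) and one group of three rows.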

How to assign duplicate increment in SQL?

While going through the rows, whenever the Calc column contains the text "NEW", increment a counter (starting at 1) and write it to the Results column.
It should look like this on the output:
The following uses an id column to resolve the order issue. Replace that with your corresponding expression. This also addresses the requirement to start the display sequence with 1 and also show 0 for the 'NEW' rows.
The SQL (updated):
SELECT logs.*
, CASE WHEN text = 'NEW' THEN 0
ELSE
COALESCE(SUM(CASE WHEN text = 'NEW' THEN 1 END) OVER (PARTITION BY xrank ORDER BY id)+1, 1)
END AS display
FROM logs
ORDER BY id
The result:
+----+-------+------+---------+
| id | xrank | text | display |
+----+-------+------+---------+
| 1 | 1 | A | 1 |
| 2 | 1 | B | 1 |
| 3 | 1 | C | 1 |
| 4 | 1 | NEW | 0 |
| 5 | 1 | D | 2 |
| 6 | 1 | Q | 2 |
| 7 | 1 | B | 2 |
| 8 | 1 | NEW | 0 |
| 9 | 1 | D | 3 |
| 10 | 1 | Z | 3 |
| 11 | 2 | A | 1 |
| 12 | 2 | B | 1 |
| 13 | 2 | C | 1 |
| 14 | 2 | NEW | 0 |
| 15 | 2 | D | 2 |
| 16 | 2 | Q | 2 |
| 17 | 2 | B | 2 |
| 18 | 2 | NEW | 0 |
| 19 | 2 | D | 3 |
| 20 | 2 | Z | 3 |
+----+-------+------+---------+
You need a column that specifies the ordering for the table. With that, just use a cumulative sum:
select t.*,
1 + sum(case when Calc = 'NEW' then 1 else 0 end) over (partition by Rank_Id order by Seq) as display
from t;
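A small runnable sketch of the cumulative-sum query, assuming the table and column names used in that query (t, Rank_Id, Seq, Calc) and an in-memory SQLite database as a stand-in. Unlike the first answer, this version gives the 'NEW' rows the incremented value instead of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: Seq is the ordering column the answer asks for.
conn.execute("CREATE TABLE t (Rank_Id INTEGER, Seq INTEGER, Calc TEXT)")
calc_values = ["A", "B", "C", "NEW", "D", "Q", "B", "NEW", "D", "Z"]
conn.executemany(
    "INSERT INTO t VALUES (1, ?, ?)",
    list(enumerate(calc_values, start=1)),  # (Seq, Calc) pairs
)

result = conn.execute("""
    SELECT Calc,
           1 + SUM(CASE WHEN Calc = 'NEW' THEN 1 ELSE 0 END)
               OVER (PARTITION BY Rank_Id ORDER BY Seq) AS display
    FROM t
    ORDER BY Seq
""").fetchall()

# Running count of 'NEW' rows seen so far, shifted to start at 1.
displays = [display for _, display in result]
print(displays)
```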

SQL - Partition restarted based on a column value

I need to create a new column that restarts at every 0 value of Column Repeated Call of each Customer_ID:
+-------------+---------+----------------------+---------------+
| Customer_ID | Call_ID | Days Since Last Call | Repeated Call |
+-------------+---------+----------------------+---------------+
| 1 | 1 | Null | 0 |
| 1 | 2 | 45 | 0 |
| 1 | 3 | 0 | 1 |
| 1 | 4 | 0 | 1 |
| 1 | 5 | 0 | 1 |
| 1 | 6 | 48 | 0 |
| 1 | 7 | 1 | 1 |
| 2 | 8 | Null | 0 |
| 2 | 9 | 1 | 1 |
+-------------+---------+----------------------+---------------+
Into something like this:
+-------------+---------+----------------------+---------------+-------------+
| Customer_ID | Call_ID | Days Since Last Call | Repeated Call | Order_Group |
+-------------+---------+----------------------+---------------+-------------+
| 1 | 1 | Null | 0 | 1 |
| 1 | 2 | 45 | 0 | 2 |
| 1 | 3 | 0 | 1 | 2 |
| 1 | 4 | 0 | 1 | 2 |
| 1 | 5 | 0 | 1 | 2 |
| 1 | 6 | 48 | 0 | 3 |
| 1 | 7 | 1 | 1 | 3 |
| 2 | 8 | Null | 0 | 1 |
| 2 | 9 | 1 | 1 | 1 |
+-------------+---------+----------------------+---------------+-------------+
Appreciate your suggestion, thanks!
You can use SUM() window function:
select t.*,
sum(case when Repeated_Call = 0 then 1 else 0 end)
over (partition by Customer_ID order by Call_Id) Order_Group
from tablename t
See the demo (for MySQL, but it is standard SQL).
Results:
| Customer_ID | Call_ID | Days Since Last Call | Repeated_Call | Order_Group |
| ----------- | ------- | -------------------- | ------------- | ----------- |
| 1 | 1 | | 0 | 1 |
| 1 | 2 | 45 | 0 | 2 |
| 1 | 3 | 0 | 1 | 2 |
| 1 | 4 | 0 | 1 | 2 |
| 1 | 5 | 0 | 1 | 2 |
| 1 | 6 | 48 | 0 | 3 |
| 1 | 7 | 1 | 1 | 3 |
| 2 | 8 | | 0 | 1 |
| 2 | 9 | 1 | 1 | 1 |
You can count every 0 value in the Repeated Call column (for each customer) using the window function COUNT with ROWS UNBOUNDED PRECEDING:
SELECT *,
COUNT(CASE WHEN Repeated_Call = 0 THEN 1 ELSE NULL END) OVER (PARTITION BY Customer_ID
ORDER BY Call_ID ROWS UNBOUNDED PRECEDING) AS Order_Gr FROM tablename
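To check the running-count idea against the sample rows, here is a minimal sketch. It assumes the table is called tablename and the column is named Repeated_Call (as in the first answer), and it uses in-memory SQLite as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (Customer_ID INTEGER, Call_ID INTEGER, Repeated_Call INTEGER)"
)
calls = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 1), (1, 5, 1), (1, 6, 0), (1, 7, 1),
    (2, 8, 0), (2, 9, 1),
]
conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", calls)

# Every row with Repeated_Call = 0 starts a new group within its customer.
groups = conn.execute("""
    SELECT Call_ID,
           SUM(CASE WHEN Repeated_Call = 0 THEN 1 ELSE 0 END)
               OVER (PARTITION BY Customer_ID ORDER BY Call_ID) AS Order_Group
    FROM tablename
    ORDER BY Customer_ID, Call_ID
""").fetchall()

for call_id, order_group in groups:
    print(call_id, order_group)
```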

Find relation level from text field

I have a table containing geographical structure of units. There are parent-child relation columns but I want to use the existing text field (instead of recursion) to find the relation level between the items.
(here's a table creation script)
drop table if exists #temp_structure
create table #temp_structure
(org_id int,
parent_org_id int,
org_name nvarchar(255),
search_tree nvarchar(255))
insert into #temp_structure
values
(1,null,'World','| 1 |'),
(2,1,'Europe','| 1 | 2 |'),
(3,1,'North America','| 1 | 3 |'),
(4,1,'South America','| 1 | 4 |'),
(5,1,'Asia','| 1 | 5 |'),
(6,1,'Africa','| 1 | 6 |'),
(7,1,'Australia','| 1 | 7 |'),
(8,2,'Spain','| 1 | 2 | 8 |'),
(9,2,'Germany','| 1 | 2 | 9 |'),
(10,2,'Italy','| 1 | 2 | 10 |'),
(11,2,'France','| 1 | 2 | 11 |'),
(12,8,'Madrid ','| 1 | 2 | 8 | 12 |'),
(13,8,'Barcelona ','| 1 | 2 | 8 | 13 |'),
(14,9,'Berlin','| 1 | 2 | 9 | 14 |'),
(15,9,'Munich','| 1 | 2 | 9 | 15 |'),
(16,10,'Rome','| 1 | 2 | 10 | 16 |'),
(17,10,'Milano','| 1 | 2 | 10 | 17 |'),
(18,11,'Paris','| 1 | 2 | 11 | 18 |'),
(19,11,'Marseille','| 1 | 2 | 11 | 19 |')
The expected result I would like to achieve is presented below (I listed only one 4th level example):
+--------+-------------+------------+
| org_id | search_item | nest_level |
+--------+-------------+------------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 1 | 2 |
| 3 | 3 | 1 |
| 3 | 1 | 2 |
| 4 | 4 | 1 |
| 4 | 1 | 2 |
| 5 | 5 | 1 |
| 5 | 1 | 2 |
| 6 | 6 | 1 |
| 6 | 1 | 2 |
| 7 | 7 | 1 |
| 7 | 1 | 2 |
| 8 | 8 | 1 |
| 8 | 2 | 2 |
| 8 | 1 | 3 |
| 9 | 9 | 1 |
| 9 | 2 | 2 |
| 9 | 1 | 3 |
| 10 | 10 | 1 |
| 10 | 2 | 2 |
| 10 | 1 | 3 |
| 11 | 11 | 1 |
| 11 | 2 | 2 |
| 11 | 1 | 3 |
| 12 | 12 | 1 |
| 12 | 8 | 2 |
| 12 | 2 | 3 |
| 12 | 1 | 4 |
.....................................
+--------+-------------+------------+
I was able to pull the org_id-search_item relation using STRING_SPLIT, but I am still missing the tricky level part (I am thinking about enumerating the '|' characters):
SELECT t.org_id
--,substring(replace(search_tree, ' ', ''), 2, len(replace(search_tree, ' ', '')) - 2)
,ss.value as search_item
FROM #temp_structure t
CROSS APPLY string_split(substring(replace(search_tree, ' ', ''), 2, len(replace(search_tree, ' ', '')) - 2),'|') ss
I have not thoroughly tested this, but you could try something like the following:
-- Table mock-up (a table variable, so the name starts with @).
DECLARE @temp TABLE ( org_id int, parent_org_id int, org_name nvarchar(255), search_tree nvarchar(255) )
-- Insert sample data...
INSERT INTO @temp VALUES
(1,null,'World','| 1 |'),(2,1,'Europe','| 1 | 2 |'),
(3,1,'North America','| 1 | 3 |'),(4,1,'South America','| 1 | 4 |'),
(5,1,'Asia','| 1 | 5 |'),(6,1,'Africa','| 1 | 6 |'),
(7,1,'Australia','| 1 | 7 |'),(8,2,'Spain','| 1 | 2 | 8 |'),
(9,2,'Germany','| 1 | 2 | 9 |'),(10,2,'Italy','| 1 | 2 | 10 |'),
(11,2,'France','| 1 | 2 | 11 |'),(12,8,'Madrid ','| 1 | 2 | 8 | 12 |');
-- Select data in a nested level...
-- Note: ordering by search_item DESC assumes ids grow with depth, as they do in the sample data.
SELECT
org_id,
search_item,
ROW_NUMBER() OVER ( PARTITION BY org_id ORDER BY org_id, parent_org_id, search_item DESC ) AS nest_level
FROM @temp AS tmp
CROSS APPLY (
SELECT CAST ( [value] AS INT ) AS search_item FROM STRING_SPLIT ( tmp.search_tree, '|' )
WHERE NULLIF ( [value], '' ) IS NOT NULL
) AS tree
ORDER BY
org_id, parent_org_id, search_item DESC;
Returns
+--------+-------------+------------+
| org_id | search_item | nest_level |
+--------+-------------+------------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 2 | 1 | 2 |
| 3 | 3 | 1 |
| 3 | 1 | 2 |
| 4 | 4 | 1 |
| 4 | 1 | 2 |
| 5 | 5 | 1 |
| 5 | 1 | 2 |
| 6 | 6 | 1 |
| 6 | 1 | 2 |
| 7 | 7 | 1 |
| 7 | 1 | 2 |
| 8 | 8 | 1 |
| 8 | 2 | 2 |
| 8 | 1 | 3 |
| 9 | 9 | 1 |
| 9 | 2 | 2 |
| 9 | 1 | 3 |
| 10 | 10 | 1 |
| 10 | 2 | 2 |
| 10 | 1 | 3 |
| 11 | 11 | 1 |
| 11 | 2 | 2 |
| 11 | 1 | 3 |
| 12 | 12 | 1 |
| 12 | 8 | 2 |
| 12 | 2 | 3 |
| 12 | 1 | 4 |
+--------+-------------+------------+
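The ROW_NUMBER ordering above relies on ids increasing with depth. As an illustration of deriving the level from the position in the path instead, here is a small sketch in plain Python (not T-SQL); the function name nest_levels is made up for this example.

```python
def nest_levels(org_id, search_tree):
    """Return (org_id, search_item, nest_level) tuples for one row.

    The level comes from the item's position in search_tree, counted
    from the end, so it does not depend on how the ids are numbered.
    """
    # Split on '|' and drop the blank pieces left by the leading/trailing delimiters.
    path = [int(part) for part in search_tree.split("|") if part.strip()]
    # The last id in the path is the unit itself (level 1); walk back toward the root.
    return [(org_id, item, level) for level, item in enumerate(reversed(path), start=1)]


madrid = nest_levels(12, "| 1 | 2 | 8 | 12 |")
world = nest_levels(1, "| 1 |")
print(madrid)
print(world)
```

The same idea carries over to SQL wherever the split function can also return each item's position.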

SQL: Ranking Sections separately of a Rollup over multiple columns

I am trying to do a rollup over multiple columns and then apply a ranking to each stage/section of the rollup. The result should look somewhat like the following:
| ColA | ColB | ColC | RankingCriteria | Ranking |
|------|------|------|-----------------|---------|
| - | - | - | 10 | 1 |
|------|------|------|-----------------|---------|
| A | - | - | 10 | 1 |
| B | - | - | 8 | 2 |
|------|------|------|-----------------|---------|
| A | a | - | 9 | 1 |
| A | b | - | 7 | 2 |
| A | c | - | 5 | 3 |
| A | d | - | 2 | 4 |
|------|------|------|-----------------|---------|
| B | a | - | 8 | 1 |
| B | c | - | 7 | 2 |
| B | b | - | 2 | 3 |
|------|------|------|-----------------|---------|
| A | a | x | 7 | 1 |
| A | a | y | 5 | 2 |
| A | a | z | 4 | 3 |
|------|------|------|-----------------|---------|
| A | b | y | 6 | 1 |
|------|------|------|-----------------|---------|
| A | c | w | 10 | 1 |
| A | c | y | 10 | 1 |
| A | c | z | 8 | 2 |
| A | c | x | 6 | 3 |
|------|------|------|-----------------|---------|
| A | d | y | 4 | 1 |
|------|------|------|-----------------|---------|
| B | a | w | 10 | 1 |
| B | a | x | 8 | 2 |
|------|------|------|-----------------|---------|
| B | b | y | 6 | 1 |
| B | b | z | 5 | 2 |
| B | b | w | 4 | 3 |
|------|------|------|-----------------|---------|
| B | c | x | 6 | 1 |
|------|------|------|-----------------|---------|
As you can see, each grouping set has its own ranking.
The basic Rollup-Query for this is simple but the ranking is giving me headaches and I am running out of ideas on how to achieve this.
Select ColA, ColB, ColC, RankingCriteria
From table
Group By Rollup(ColA, ColB, ColC)
The problem is that I cannot use a normal Rank() over (Partition by ...) because there is no partition I could use that'd work on the whole thing.
I think this will produce what you want:
SELECT r.*,
row_number() over (partition by (case when colb is null and colc is null and cola is not null
then 1 else 0 end),
(case when colb is null and colc is null and cola is not null
then NULL else ColA end),
(case when colb is null and colc is null and cola is not null
then NULL else ColB end)
order by RankingCriteria desc) as seqnum
FROM (Select ColA, ColB, ColC, RankingCriteria
From table
Group By Rollup(ColA, ColB, ColC)
) r;
The way I read the logic, partitioning by ColA and ColB works for all but the second group. That is why this uses the three CASE expressions.
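If each section should be ranked among the rows that share the same parent, with ties sharing a rank as in the expected output, one option is to partition by the rollup depth plus the parent columns and use DENSE_RANK. The sketch below is only an illustration: rolled is a hypothetical table that already holds rows as ROLLUP would return them (SQLite, used here as a stand-in, has no GROUP BY ROLLUP), and NULL is assumed to appear only in rolled-up columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: "rolled" holds a subset of the rollup output; NULL marks a rolled-up column.
conn.execute("CREATE TABLE rolled (ColA TEXT, ColB TEXT, ColC TEXT, RankingCriteria INTEGER)")
conn.executemany("INSERT INTO rolled VALUES (?, ?, ?, ?)", [
    (None, None, None, 10),
    ("A", None, None, 10), ("B", None, None, 8),
    ("A", "a", None, 9), ("A", "b", None, 7), ("B", "a", None, 8),
    ("A", "c", "w", 10), ("A", "c", "y", 10), ("A", "c", "z", 8), ("A", "c", "x", 6),
])

ranked = conn.execute("""
    SELECT ColA, ColB, ColC,
           DENSE_RANK() OVER (
               PARTITION BY
                   -- rollup depth: 0 = grand total ... 3 = full detail
                   CASE WHEN ColC IS NOT NULL THEN 3
                        WHEN ColB IS NOT NULL THEN 2
                        WHEN ColA IS NOT NULL THEN 1
                        ELSE 0 END,
                   -- parent keys: ColA for depth 2 and 3, ColB for depth 3 only
                   CASE WHEN ColB IS NOT NULL THEN ColA END,
                   CASE WHEN ColC IS NOT NULL THEN ColB END
               ORDER BY RankingCriteria DESC
           ) AS Ranking
    FROM rolled
""").fetchall()

ranking = {(a, b, c): r for a, b, c, r in ranked}
for key, rank in ranking.items():
    print(key, rank)
```

On an engine with real ROLLUP support, a GROUPING() function (where available) would be a safer way to detect rolled-up columns than testing for NULL.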