Row Number() Order Issue - sql

Apologies in advance if this specific scenario has been asked previously, but I can't seem to get these to order properly (which is probably from staring at it for too long).
I'm using Netezza/Oracle, and in the data set below I need order_num to come out as 1,2,2,2,2,3,4, i.e. grouped by Department and Desc1. (Desc1 is not unique because there are different codes for each year, but I'm only interested in the type, not the year.) Among other attempts, I've tried:
row_number () over (partition by a.department order by desc1) order_num
That orders the rows alphabetically. I've also tried ordering by seq_no and desc1, but that only works if I want alphabetical order.
Thanks in advance.

Assuming that Country is consistent with the grouping you have shown, you can get the minimum seq_no per country in either a CTE or a sub-query and use that value in your dense_rank function, e.g.
SELECT
    m.Department,
    m.Desc1,
    m.seq_no,
    m.Country,
    m.beg_date,
    m.end_date,
    dense_rank() OVER (PARTITION BY m.Department ORDER BY mintbl.MinSeq) AS order_num
FROM dbo.mytable AS m
JOIN ( SELECT min(t.seq_no) AS MinSeq,
              t.Department,
              t.Country
       FROM dbo.mytable AS t
       GROUP BY t.Department, t.Country
     ) AS mintbl ON mintbl.Department = m.Department AND mintbl.Country = m.Country
ORDER BY m.seq_no

You want dense_rank() rather than row_number():
dense_rank() over (partition by a.department order by desc1) order_num
If you want to maintain the seq_no order, you can use a subquery to calculate:
min(seq_no) over (partition by department, desc1) as min_seq_no
Then in the outer query, use min_seq_no in the order by of the dense_rank().
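To make the subquery idea concrete, here is a minimal sketch run through Python's sqlite3 module so it is self-contained. The table name mytable, the sample rows and the column min_seq_no are made up for illustration, since the original data set was not shown, and SQLite (3.25 or later) stands in for Netezza/Oracle.

```python
import sqlite3

# Hypothetical sample rows (department, desc1, seq_no); the real data was not shown.
rows = [
    ("A", "Zeta", 1),
    ("A", "Alpha", 2),
    ("A", "Alpha", 3),
    ("A", "Alpha", 4),
    ("A", "Alpha", 5),
    ("A", "Mid", 6),
    ("A", "Beta", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (department TEXT, desc1 TEXT, seq_no INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

query = """
SELECT seq_no,
       desc1,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY min_seq_no) AS order_num
FROM (
    -- first seq_no at which each desc1 appears within its department
    SELECT department, desc1, seq_no,
           MIN(seq_no) OVER (PARTITION BY department, desc1) AS min_seq_no
    FROM mytable
) AS s
ORDER BY seq_no
"""
result = conn.execute(query).fetchall()
order_nums = [order_num for _, _, order_num in result]
print(order_nums)
```

Ranking by the first seq_no of each desc1 keeps the groups in the order they first appear, instead of the alphabetical order you get from ranking on desc1 directly.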

Can you not use
dense_rank() over(partition by department, desc1 order by beg_date)
Or...
dense_rank() over(partition by department,desc1 order by seq_no)

SQL ZOO Window LAG #8

Question: For each country that has had at least 1000 new cases in a single day, show the date of the peak number of new cases.
Here are a few sample rows from the covid table.
What I wrote:
SELECT name,date,MAX(confirmed-lag) AS PeakNew
FROM(
SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') date, confirmed,
LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) lag
FROM covid
ORDER BY confirmed
) temp
GROUP BY name
HAVING PeakNew>=1000
ORDER BY PeakNew DESC;
The result I got is weird: PeakNew seems correct, but the date shown with it is not.
My answer
The right answer
Can anyone help me get the right answer? Thank you!
The query below works fine for me. The dates and values are correct, but the output is still marked as wrong because the row order is different. Here the rows are ordered by date (the second column).
SELECT z1.name, DATE_FORMAT(c.dt,'%Y-%m-%d'), z1.mx
FROM
(
SELECT z.name, MAX(z.nc) AS 'mx'
FROM (
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid ) z
WHERE z.nc >= 1000
GROUP BY z.name
) z1
INNER JOIN
(
SELECT DATE(whn) AS 'dt', name, confirmed - LAG(confirmed,1) OVER(PARTITION BY name ORDER BY DATE(whn) ASC) AS 'nc'
FROM covid
) c
ON c.nc = z1.mx
AND c.name = z1.name
ORDER BY 2 ASC
The date value in the outer query doesn't correspond to the row where MAX(confirmed-lag) is found; it's just an arbitrary date value from within that group. Check out the section titled "The ONLY_FULL_GROUP_BY Issue" in this blog post for more information: https://www.percona.com/blog/2019/05/13/solve-query-failures-regarding-only_full_group_by-sql-mode/
I used the ROW_NUMBER() function to get the entire row corresponding to the maximum new cases. However, my final result wasn't ordered the way the answer was, and there's no specification to how it should be ordered, so I still didn't get that satisfying happy emoji.
You need to self join to obtain the date on which the max count occurred:
WITH CTE1 AS
(SELECT name, DATE_FORMAT(whn, "%Y-%m-%d") AS date,
confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY DATE(whn)) AS increase
FROM covid),
CTE2 AS
(SELECT name, MAX(increase) AS max_increase
FROM CTE1
WHERE increase > 999
GROUP BY name)
SELECT c1.name, c1.date, c2.max_increase AS peakNewCases
FROM CTE1 AS c1
JOIN CTE2 AS c2
ON c1.name = c2.name AND c1.increase = c2.max_increase
Using ROW_NUMBER() instead of a self join:
WITH CTE1 AS
(SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') AS date_form, confirmed - LAG(confirmed,1) OVER (PARTITION BY name ORDER BY whn) AS newcases
FROM covid)
SELECT name, date_form, newcases FROM
(
SELECT name, date_form, newcases, ROW_NUMBER() OVER (PARTITION BY name ORDER BY newcases DESC) AS rn
FROM CTE1
WHERE newcases > 999
) cte2
WHERE rn = 1
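For anyone who wants to try the LAG plus ROW_NUMBER() idea outside SQLZoo, here is a small sketch using Python's sqlite3 module. The covid rows below are invented, whn is stored as plain 'YYYY-MM-DD' text, and the final ORDER BY whn is only a guess at a sensible output order, not something the exercise specifies.

```python
import sqlite3

# Invented cumulative counts (name, whn, confirmed); not the real SQLZoo data.
rows = [
    ("Italy", "2020-03-01", 1000),
    ("Italy", "2020-03-02", 1500),
    ("Italy", "2020-03-03", 3000),
    ("Italy", "2020-03-04", 4200),
    ("Spain", "2020-03-01", 500),
    ("Spain", "2020-03-02", 2500),
    ("Spain", "2020-03-03", 3600),
    ("Malta", "2020-03-01", 10),
    ("Malta", "2020-03-02", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE covid (name TEXT, whn TEXT, confirmed INTEGER)")
conn.executemany("INSERT INTO covid VALUES (?, ?, ?)", rows)

query = """
WITH daily AS (
    -- new cases per day = today's cumulative total minus yesterday's
    SELECT name, whn,
           confirmed - LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn) AS new_cases
    FROM covid
),
ranked AS (
    -- number the qualifying days per country, biggest increase first
    SELECT name, whn, new_cases,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY new_cases DESC) AS rn
    FROM daily
    WHERE new_cases >= 1000
)
SELECT name, whn, new_cases
FROM ranked
WHERE rn = 1
ORDER BY whn
"""
peaks = conn.execute(query).fetchall()
print(peaks)
```

Because the date comes from the same row that ROW_NUMBER() put first, it always matches the peak value, which is exactly what the GROUP BY version cannot guarantee.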

Distinct rows in a table in sql

I have a table with multiple rows for the same member id. I need only distinct rows based on 2 unique columns.
Example: there are 100 different customers, but the table has 1000 rows because every customer has multiple cities and segments assigned to him.
I need 100 distinct rows for these customers, each with one segment and city combination. There is no specific requirement for this combination; just the first from the table is fine.
Currently the table looks somewhat like this:
Hope this helps.
use row_number()
select * from (select *,row_number() over(partition by memberid order by sales) rn
from table_name
) a where a.rn=1
SQL Server's top(1) with ties syntax is handy for that:
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by sales)
As you have no particular requirement for exactly which row to select, any column will do in the order by; (select null) works as well:
select top(1) with ties t.*
from table_name t
order by row_number() over(partition by memberid order by (select null))
The simplest way to do this is to use the ROW_NUMBER() OVER (PARTITION BY ...) syntax. The ordering does not matter, since you want an arbitrary row, but only one, for each member; SQL Server still requires an ORDER BY inside the OVER clause, so ORDER BY (SELECT NULL) serves as a placeholder.
Since you need only the expected data, and not the ROW_NUMBER value, make sure that you list the fields returned, like below:
SELECT
MemberId,
city,
segment,
sales
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY MemberId ORDER BY (SELECT NULL)) AS Seq
FROM [Status]
) src
WHERE Seq = 1
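Here is a rough, self-contained sketch of the one-row-per-member pattern, run in SQLite through Python's sqlite3 module rather than SQL Server. The table name members and its rows are invented, and the sketch orders by sales only so that the picked row is predictable; any ordering would satisfy the original requirement.

```python
import sqlite3

# Invented rows (memberid, city, segment, sales); the real table was not shown.
rows = [
    (1, "Paris", "A", 50),
    (1, "Lyon", "B", 20),
    (2, "Rome", "A", 70),
    (2, "Milan", "C", 90),
    (3, "Oslo", "B", 10),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (memberid INTEGER, city TEXT, segment TEXT, sales INTEGER)")
conn.executemany("INSERT INTO members VALUES (?, ?, ?, ?)", rows)

query = """
SELECT memberid, city, segment, sales
FROM (
    -- number the rows of each member; rn = 1 is the row we keep
    SELECT m.*,
           ROW_NUMBER() OVER (PARTITION BY memberid ORDER BY sales) AS rn
    FROM members AS m
) AS src
WHERE rn = 1
ORDER BY memberid
"""
picked = conn.execute(query).fetchall()
print(picked)
```

Listing the columns in the outer SELECT keeps the helper column rn out of the result, as suggested above.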

Dense_Rank with Case statement is not giving Rank with Date

I have an issue while using Dense_Rank with a CASE statement. Below is a screenshot of the sample table.
My requirement is to provide a rank to every employee based on Emp_Dep_id:
Req 1-->If Emp_Dep_id is the same, give the same rank
Req 2-->If Emp_Dep_id is null, give the same rank only when Emp_Joining_Date and Emp_Country are the same
Below is the code to give the rank:
Select case
when Emp.Emp_Dep_Id IS NULL
then
DENSE_RANK() over (order by Emp.Emp_Dep_Id desc,
Emp.Emp_Joining_Date desc,Emp.Emp_Country)
else
DENSE_RANK() over (order by Emp.Emp_Dep_Id desc)
end as
rnk ,*
from Employee Emp with (nolock)
Below is the output-->
So, I am facing two issues:
Why is the rank skipping values when the same rank occurs? For example, why does the sixth rank come right after the second rank?
I want to rank on the basis of Emp_Joining_Date. Currently it first assigns ranks where Emp_Dep_Id is not null and then continues with the rows where Emp_Dep_Id is null.
I want the rank based on the latest Emp_Joining_Date, meaning that a 2016 joining date with a null Emp_Dep_Id should come first.
Thanks, guys, for your valuable responses. I fixed my issue this way:
--Step 1
Select case
when Emp.Emp_Dep_Id IS NULL
then
DENSE_RANK() over (order by Emp.Emp_Joining_Date,Emp.Emp_Country)
else
DENSE_RANK() over (order by Emp.Emp_Dep_Id desc)
end as
rnk ,*
into #Emp_Output_Tbl
from Employee Emp
Select * from #Emp_Output_Tbl order by Emp_Joining_date desc
--Step 2
Select distinct rnk,Emp_Joining_date into #Emp_New_Tbl from #Emp_Output_Tbl order by Emp_Joining_date desc
Select * from #Emp_New_Tbl order by Emp_Joining_Date desc
Select * from #Emp_Output_Tbl order by Emp_Joining_Date desc
--Step 3
Select * from #Emp_Output_Tbl where rnk in(
Select TOP 5 rnk from #Emp_New_Tbl
)
order by Emp_Joining_Date desc
Output as per expectation.
I hope this helps.
You want a single dense_rank(). The trick is to get the logic into the order by clause. I think this is what you want:
Select dense_rank() over (order by Emp.Emp_Dep_Id,
(case when Emp.Emp_Dep_Id IS NULL then Emp.Emp_Joining_Date end) desc,
(case when Emp.Emp_Dep_Id IS NULL then Emp.Emp_Country end) desc
) as rnk, Emp.*
from Employee Emp
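To see how a single dense_rank() with CASE expressions in the order by behaves, here is a small sketch in SQLite via Python's sqlite3 module. The Employee rows and the Emp_Id column are invented. Note that where NULLs sort is engine-specific: SQLite and SQL Server put NULLs first in ascending order, while Oracle puts them last by default, so the absolute rank numbers may differ elsewhere.

```python
import sqlite3

# Invented rows (Emp_Id, Emp_Dep_Id, Emp_Joining_Date, Emp_Country).
rows = [
    (1, 10, "2015-01-01", "IN"),
    (2, 10, "2016-05-01", "US"),
    (3, 20, "2014-03-01", "IN"),
    (4, None, "2016-02-01", "IN"),
    (5, None, "2016-02-01", "IN"),
    (6, None, "2016-02-01", "US"),
    (7, None, "2013-07-01", "IN"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp_Id INTEGER, Emp_Dep_Id INTEGER, "
    "Emp_Joining_Date TEXT, Emp_Country TEXT)"
)
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", rows)

query = """
SELECT Emp_Id,
       DENSE_RANK() OVER (
           ORDER BY Emp_Dep_Id,
                    -- date and country only break ties when the department is NULL
                    CASE WHEN Emp_Dep_Id IS NULL THEN Emp_Joining_Date END DESC,
                    CASE WHEN Emp_Dep_Id IS NULL THEN Emp_Country END DESC
       ) AS rnk
FROM Employee
ORDER BY Emp_Id
"""
ranks = dict(conn.execute(query).fetchall())
print(ranks)
```

Employees 1 and 2 share a rank because their department matches (Req 1), while the NULL-department rows share a rank only when both joining date and country match (Req 2). There are no gaps, because only one dense_rank() is numbering the whole set.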
Dense rank is working as expected for this result set and the number of records in it. Please refer to the link below:
https://msdn.microsoft.com/en-IN/library/ms173825.aspx?
Use the query below to understand the result better. Also, add an order by at the end of the SQL statement to order the output properly.
Select
DENSE_RANK() over (order by Emp.Emp_Dep_Id desc,
Emp.Emp_Joining_Date desc,Emp.Emp_Country),
DENSE_RANK() over (order by Emp.Emp_Dep_Id desc),
case
when Emp.Emp_Dep_Id IS NULL
then
DENSE_RANK() over (order by Emp.Emp_Dep_Id desc,
Emp.Emp_Joining_Date desc,Emp.Emp_Country)
else
DENSE_RANK() over (order by Emp.Emp_Dep_Id desc)
end as
rnk ,*
from Employee Emp with (nolock)
Could you please try the query below and check whether it meets your requirement? I believe dense rank will operate as you require. You don't need an extra case statement; you only need to specify the order of the columns, and it is handled automatically.
Select
DENSE_RANK() over (order by Emp.Emp_Dep_Id desc,Emp.Emp_Joining_Date desc,Emp.Emp_Country) as rnk ,*
from Employee Emp with (nolock)

How to write a derived query in Netezza SQL?

I need to query the data per inviteid. For each inviteid I need the top 5 IDs and ID descriptions.
The query I wrote takes forever to fetch. I didn't notice an error or anything wrong with it.
The code is:
SELECT count(distinct ID),
IDdesc,
inviteid,
A
FROM (
SELECT
ID,
IDdesc,
inviteid,
RANK() OVER(order by invtypeid asc ) A
FROM Fact_s
--WHERE dateid ='26012013'
GROUP BY invteid,IDdesc,ID
ORDER BY invteid,IDdesc,ID
) B
WHERE A <=5
GROUP BY A, IDDESC, inviteid
ORDER BY A
I'm not sure I understood your requirement completely, but as far as I can tell the group by in the derived table is not necessary (and neither is the order by, as Mark mentioned) because you are using a window function.
And you probably want row_number() instead of rank() in there.
Including the result of rank() in the outer query seems dubious as well.
So this leads to the following statement:
SELECT count(distinct ID),
IDdesc,
inviteid
FROM (
SELECT ID,
IDdesc,
inviteid,
row_number() OVER (order by invtypeid asc ) as rn
FROM Fact_s
) B
WHERE rn <= 5
GROUP BY IDDESC, inviteid;
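If the goal really is the top N rows for each inviteid, one possible variation (an assumption on my part, not something the question confirms) is to add PARTITION BY inviteid to the window. A small sketch in SQLite via Python's sqlite3 module, with invented Fact_s rows and N set to 2 to keep the data short:

```python
import sqlite3

# Invented rows (ID, IDdesc, inviteid, invtypeid); the real Fact_s was not shown.
rows = [
    (101, "a", 1, 3),
    (102, "b", 1, 1),
    (103, "c", 1, 2),
    (201, "d", 2, 5),
    (202, "e", 2, 4),
    (203, "f", 2, 6),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fact_s (ID INTEGER, IDdesc TEXT, inviteid INTEGER, invtypeid INTEGER)")
conn.executemany("INSERT INTO Fact_s VALUES (?, ?, ?, ?)", rows)

TOP_N = 2  # the question asks for 5; 2 keeps the sample small

query = """
SELECT inviteid, ID, IDdesc
FROM (
    -- restart the numbering for every inviteid
    SELECT ID, IDdesc, inviteid,
           ROW_NUMBER() OVER (PARTITION BY inviteid ORDER BY invtypeid ASC) AS rn
    FROM Fact_s
) AS b
WHERE rn <= ?
ORDER BY inviteid, rn
"""
top_rows = conn.execute(query, (TOP_N,)).fetchall()
print(top_rows)
```

Without the PARTITION BY, the numbering runs over the whole table, so rn <= 5 returns five rows in total rather than five per inviteid.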

Query uses rank() needs optimization

select * from
(
Select DISTINCT
DocManREPORT_View.DOCINPUTDATE,
DocManREPORT_View.REACTIVATEDATE,
DocManREPORT_View.TRACENO,
DocManREPORT_View.CLIENTNAME,
DocManREPORT_View.DOCUMENTID,DocManREPORT_View.BARCODEID,
DocManREPORT_View.INPUTMODE,
DocManREPORT_View.INPUTSOURCE,PI.start_time,
RANK() OVER (PARTITION BY process_instance_id
ORDER BY last_modified_date desc) rank,
PI.STATUS AS PROCESSSTATUS
FROM DocManREPORT_View
INNER JOIN PROCESS_INSTANCE PI ON
(pi.instance_id = DocManREPORT_View.process_instance_id)
)
where rank = 1;
I presume the DISTINCT clause could hurt performance. I would recommend getting rid of it by including those columns in the partition by clause, and then seeing what you get.
If you can, try to use
RANK() OVER (PARTITION BY process_instance_id
ORDER BY last_modified_date desc) rank,
inside the view, since I think the view already has all the data needed to perform this step.