Instead of ROW_NUMBER ORDER BY - sql

In my last post (linked below), I used ROW_NUMBER() with ORDER BY and eventually got the solution I needed:
Get Wages Yearly Increment Column Wise Using Sql
Now I am trying to do the same in SQL Server 2000, just for demonstration purposes. I know ROW_NUMBER() can't be used there, so after some googling I tried the following instead (1st query):
SELECT k.ID, k.[Name], m.Amt,
(SELECT COUNT(*) FROM EmpIncrement l WHERE l.EmpID <= m.EmpID) as RowNum
FROM EmpIncrement m
JOIN Employee k ON m.EmpID = k.ID
And I got this output:
ID Name Amt RowNum
1 John 2000 2
2 Jack 8000 4
1 John 1000 2
2 Jack 4000 4
When I use ROW_NUMBER() with ORDER BY instead, the output is different (2nd query):
SELECT k.ID, k.Name, m.Amt,
ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY DATEPART(yy,IncrementDate)) as RowNum
FROM EmpIncrement m
JOIN Employee k ON k.ID = m.EmpID
Output:
ID Name Amt RowNum
1 John 1000 1
1 John 2000 2
2 Jack 4000 1
2 Jack 8000 2
So the row numbers per employee (RowNum) differ between the two queries, and the output of the second query is the correct one. I would like to know why the outputs differ and whether the 1st query is equivalent to ROW_NUMBER() with ORDER BY. Thanks.
Note: I haven't repeated the table structure and sample data here; see the earlier post for those.

To recreate the PARTITION BY EmpId, your subquery should use l.EmpId = m.EmpId. With l.EmpId <= m.EmpId the count also includes every row belonging to employees with a lower id, which is why you get 2 and 4 instead of 1 and 2. You also need a unique column, or set of columns, that identifies each row for this version to work properly. Based on the given data, if (EmpId, Amt) is a unique pair you can add and l.Amt <= m.Amt. If the table has a surrogate id, that would be a better choice than Amt.
select
k.id
, k.[Name]
, m.Amt
, ( select count(*)
from EmpIncrement l
where l.Empid = m.Empid
and l.Amt <= m.Amt
) as RowNum
from EmpIncrement m
inner join Employee k
on m.Empid = k.id
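To see the two predicates side by side, here is a minimal sketch that runs both counting subqueries against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server 2000 here, and the sample rows and the `run` helper are assumptions reconstructed from the outputs shown in the question, not the original tables.

```python
import sqlite3

# Assumed sample data, reconstructed from the outputs in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE EmpIncrement (EmpID INTEGER, Amt INTEGER);
    INSERT INTO Employee VALUES (1, 'John'), (2, 'Jack');
    INSERT INTO EmpIncrement VALUES (1, 1000), (1, 2000), (2, 4000), (2, 8000);
""")

def run(predicate):
    """Run the counting subquery with the given WHERE predicate (hypothetical helper)."""
    sql = f"""
        SELECT k.ID, k.Name, m.Amt,
               (SELECT COUNT(*) FROM EmpIncrement l WHERE {predicate}) AS RowNum
        FROM EmpIncrement m
        JOIN Employee k ON m.EmpID = k.ID
        ORDER BY k.ID, m.Amt
    """
    return conn.execute(sql).fetchall()

# Predicate from the question: counts rows of this and all lower employee ids.
global_count = run("l.EmpID <= m.EmpID")
# Predicate from the answer: counts only rows of the same employee up to this Amt.
per_employee = run("l.EmpID = m.EmpID AND l.Amt <= m.Amt")

for row in global_count:
    print("question:", row)
for row in per_employee:
    print("answer:  ", row)
```

With this data the first predicate yields 2, 2, 4, 4 and the second yields 1, 2, 1, 2, which is the numbering ROW_NUMBER() OVER (PARTITION BY EmpID ...) would give when Amt increases with the increment date.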
If you have no set of columns to uniquely identify and order the rows, you can use a temporary table with an identity() column.
create table #temp (tmpid int identity(1,1) not null, id int, [Name] varchar(32), Amt int);
-- no semicolon after the column list: the select below feeds the insert
insert into #temp (id, [Name], Amt)
select
k.id
, k.[Name]
, m.Amt
from EmpIncrement m
inner join Employee k
on m.Empid = k.id;
select
t.id
, t.[Name]
, t.Amt
, ( select count(*)
from #temp i
where i.id = t.id -- #temp stores the employee id in its id column
and i.tmpId <= t.tmpId
) as RowNum
from #temp t

Related

Display invoice in 3 columns

I have an invoice table in an Oracle database. I need to select the invoice_id column from invoice, laid out in 3 columns and ordered by data_creation.
The result should look like this:
invoice_id invoice_id invoice_id
1 2 3
4 5 6
...
You can do this by using rownum and inline views over the same column to split the values into columns, and then joining them. Your query would be something like:
select g.empno, f.empno from (select e.empno from emp e
where rownum < 3 or rownum >2)f, emp g
where g.empno = f.empno
and rownum < 3
This is just a pointer, not the exact answer. You can build your own logic from here.
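OK, I found it myself:
select invoice_id, next_invoice, next_invoice2
from (
SELECT invoice_id,
LEAD (invoice_id,1) OVER (ORDER BY invoice_date) AS next_invoice ,
LEAD (invoice_id,2) OVER (ORDER BY invoice_date) AS next_invoice2 ,
-- number the rows here: a rownum test in the outer WHERE never matches, because rownum only advances for rows that pass the filter
ROW_NUMBER() OVER (ORDER BY invoice_date) AS rn
FROM invoice)
where mod(rn,3) = 1 -- keep rows 1, 4, 7, ... (the first invoice of each group of three)
order by rn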
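A sketch of the LEAD-based layout, with the row number computed inside the inline view, run against an in-memory SQLite database from Python. SQLite stands in for Oracle here, so `%` replaces MOD(), window functions need SQLite 3.25 or later, and the seven sample invoices are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (invoice_id INTEGER, invoice_date TEXT)")
# Hypothetical data: seven invoices on consecutive days.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(i, f"2020-01-{i:02d}") for i in range(1, 8)],
)

rows = conn.execute("""
    SELECT invoice_id, next_invoice, next_invoice2
    FROM (
        SELECT invoice_id,
               LEAD(invoice_id, 1) OVER (ORDER BY invoice_date) AS next_invoice,
               LEAD(invoice_id, 2) OVER (ORDER BY invoice_date) AS next_invoice2,
               ROW_NUMBER() OVER (ORDER BY invoice_date) AS rn
        FROM invoice
    ) AS t
    WHERE rn % 3 = 1      -- keep rows 1, 4, 7, ...: the first invoice of each triple
    ORDER BY rn
""").fetchall()

for row in rows:
    print(row)
```

The last output row is padded with NULLs (None in Python) when the number of invoices is not a multiple of three.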

SELECT a single field by ordered value

Consider the following two tables:
student_id score date
-------------------------
1 10 05-01-2013
2 100 05-15-2013
2 60 05-01-2012
2 95 05-14-2013
3 15 05-01-2011
3 40 05-01-2012
class_id student_id
----------------------------
1 1
1 2
2 3
I want to get the unique class_ids where the score is above a certain threshold for at least one student, considering only each student's latest score.
So for instance, if I wanted the list of classes where the score is > 80, I would get class_id 1 as a result, since student 2's latest score is above 80.
How would I go about this in T-SQL?
Are you asking for this?
SELECT DISTINCT
t2.[class_ID]
FROM
t1
JOIN t2
ON t2.[student_id] = t1.[student_id]
WHERE
t1.[score] > 80
Edit: based on your date requirement, you could use row_number() to get the result:
select c.class_id
from class_student c
inner join
(
select student_id,
score,
date,
row_number() over(partition by student_id order by date desc) rn
from student_score
) s
on c.student_id = s.student_id
where s.rn = 1
and s.score >80;
Or you can use a WHERE EXISTS:
select c.class_id
from class_student c
where exists (select 1
from student_score s
where c.student_id = s.student_id
and s.score > 80
and s.[date] = (select max(date)
from student_score s1
where s.student_id = s1.student_id));
select distinct(class_id) from table2 where student_id in
(select distinct(student_id) from table1 where score > thresholdScore)
This should do the trick:
SELECT DISTINCT
CS.Class_ID
FROM
dbo.ClassStudent CS
CROSS APPLY (
SELECT TOP 1 *
FROM dbo.StudentScore S
WHERE CS.Student_ID = S.Student_ID
ORDER BY S.Date DESC
) L
WHERE
L.Score > 80
;
And here's another way:
WITH LastScore AS (
SELECT TOP 1 WITH TIES *
FROM dbo.StudentScore
ORDER BY Row_Number() OVER (PARTITION BY Student_ID ORDER BY Date DESC)
)
SELECT DISTINCT
CS.Class_ID
FROM
dbo.ClassStudent CS
WHERE
EXISTS (
SELECT *
FROM LastScore L
WHERE
CS.Student_ID = L.Student_ID
AND L.Score > 80
)
;
Depending on the data and the indexes, these two queries could have very different performance characteristics. It is worth trying several to see if one stands out as superior to the others.
It seems like there could be some version of the query where the engine would stop looking as soon as it finds just one student with the requisite score, but I am not sure at this moment how to accomplish that.
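The "latest score above a threshold" requirement in this thread can be checked with a small sketch that runs the WHERE EXISTS variant against an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server, the dates from the question are rewritten in ISO format so that MAX() compares them correctly, and `classes_above` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_score (student_id INTEGER, score INTEGER, date TEXT);
    CREATE TABLE class_student (class_id INTEGER, student_id INTEGER);
    INSERT INTO student_score VALUES
        (1, 10,  '2013-05-01'),
        (2, 100, '2013-05-15'),
        (2, 60,  '2012-05-01'),
        (2, 95,  '2013-05-14'),
        (3, 15,  '2011-05-01'),
        (3, 40,  '2012-05-01');
    INSERT INTO class_student VALUES (1, 1), (1, 2), (2, 3);
""")

def classes_above(threshold):
    """Class ids with at least one student whose latest score exceeds threshold."""
    sql = """
        SELECT DISTINCT c.class_id
        FROM class_student c
        WHERE EXISTS (
            SELECT 1
            FROM student_score s
            WHERE s.student_id = c.student_id
              AND s.score > ?
              AND s.date = (SELECT MAX(s1.date)
                            FROM student_score s1
                            WHERE s1.student_id = s.student_id)
        )
        ORDER BY c.class_id
    """
    return [row[0] for row in conn.execute(sql, (threshold,))]

print(classes_above(80))
print(classes_above(30))
```

With a threshold of 80 only class 1 qualifies (student 2's latest score is 100); lowering it to 30 also brings in class 2, because student 3's latest score is 40.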

grouping and aggregates with subqueries

I have a query that is designed to find the number of people who went to a hospital more than once. What I have works, but is there a way to do it without the subquery?
SELECT count(*) as counts, hospitals.hospitalname
FROM Patient INNER JOIN
hospitals ON Patient.hospitalnpi = hospitals.npi
WHERE (hospitals.hospitalname = 'X')
group by patientid, hospitalname
having count(patient.patientid) >1
order by count(*) desc
This always returns the correct number of rows (30), but not the number 30 itself. If I remove the GROUP BY patientid, I get the entire result set back.
I solved this problem by doing
select COUNT(*),hospitalname
from
(
SELECT count(*) as counts,hospitals.hospitalname
FROM hospitals INNER JOIN
Patient ON hospitals.npi = Patient.hospitalnpi
group by patientid, hospitals.hospitalname
having count(patient.patientid) >1
) t
group by t.hospitalname
order by t.hospitalname desc
I feel that there has to be a more elegant solution than using subqueries all the time. How could this be improved?
sample data from first query
row # revisits
1 2
2 2
3 2
4 2
sample data from second, working query
row# hosp. name revisitAggregate
1 x 30
2 y 15
3 z 5
Simple one-to-many relationship between patient and hospitals
It's super hacky, but here you are:
SELECT TOP 1
ROW_NUMBER() OVER (order by patient.patientid) as Count
FROM
Patient
INNER JOIN hospitals
ON Patient.hospitalnpi = hospitals.npi
WHERE
(hospitals.hospitalname = 'X')
GROUP BY
patientid,
hospitalname
HAVING
count(patient.patientid) >1
ORDER BY
Count desc
select distinct hospitalname, count(*) over (partition by hospitalname) from (
SELECT hospitalname, count(*) over (partition by patientid,
hospitals.hospitalname) as counter
FROM hospitals INNER JOIN
Patient ON hospitals.npi = Patient.hospitalnpi
WHERE (hospitals.hospitalname = 'X')
) Z
where counter > 1
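For reference, the two-level aggregation from the question (first find the patients with more than one visit per hospital, then count them per hospital) can be sketched against an in-memory SQLite database from Python. The rows below are made up, and they assume that Patient holds one row per visit, which the question implies but does not state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hospitals (npi INTEGER PRIMARY KEY, hospitalname TEXT);
    CREATE TABLE Patient (patientid INTEGER, hospitalnpi INTEGER);
    INSERT INTO hospitals VALUES (100, 'X'), (200, 'Y');
    -- Assumed layout: one row per visit.
    INSERT INTO Patient VALUES
        (1, 100), (1, 100),
        (2, 100), (2, 100), (2, 100),
        (3, 100),
        (4, 200), (4, 200),
        (5, 200);
""")

repeat_counts = conn.execute("""
    SELECT t.hospitalname, COUNT(*) AS repeat_patients
    FROM (
        -- inner level: one row per (patient, hospital) with more than one visit
        SELECT p.patientid, h.hospitalname
        FROM hospitals h
        JOIN Patient p ON h.npi = p.hospitalnpi
        GROUP BY p.patientid, h.hospitalname
        HAVING COUNT(*) > 1
    ) AS t
    GROUP BY t.hospitalname
    ORDER BY t.hospitalname
""").fetchall()

print(repeat_counts)
```

Hospital X has two repeat patients (1 and 2) and hospital Y has one (4); patients 3 and 5 visited only once and are dropped by the HAVING clause.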

Sql Server Find Next Most Recent Changed Record

In my employee history table I'm trying to find what the salary was and then what it was changed to. Each Salary change inserts a new record because the new salary is considered a new "job" so it has a start and end date attached. I can select all these dates fine but I keep getting duplicates because I can't seem to compare the current record only against its most recent prior record for that employee. (if that makes sense)
I would like the results to be along the lines of:
Employee Name, OldSalary, NewSalary, ChangeDate(EndDate)
Joe 40,000 42,000 01/10/2011
Example data looks like
EmployeeHistId EmpId Name Salary StartDate EndDate
1 45 Joe 40,000.00 01/05/2011 01/10/2011
2 45 Joe 42,000.00 01/11/2011 NULL
3 46 Bob 20,000.00 01/12/2011 NULL
The Swiss army ROW_NUMBER() to the rescue:
with cte as (
select EmployeeHistId
, EmpId
, Name
, Salary
, StartDate
, EndDate
, row_number () over (
partition by EmpId order by StartDate desc) as StartDateRank
from EmployeeHist)
select n.EmpId
, n.Name
, o.Salary as OldSalary
, n.Salary as NewSalary
, o.EndDate as ChangeDate
from cte n
join cte o on o.EmpId = n.EmpId
and n.StartDateRank = 1
and o.StartDateRank = 2;
Use an outer join to also get employees who never got a raise.
These kinds of queries are always tricky because of data purity issues, for instance if StartDate and EndDate overlap.
I assume the StartDate of the new job is the same as the EndDate of the previous job.
If that's the case, try this:
SELECT a.Name AS EmployeeName, b.Salary AS OldSalary, a.Salary AS NewSalary, a.StartDate AS ChangeDate
FROM EMPLOYEE A, EMPLOYEE B
WHERE a.EmpID = b.EmpID
AND a.EndDate IS NULL
AND a.StartDate = b.EndDate
You can use the correlated join operator APPLY which can solve these types of challenges easily
select e.name, curr.salary, prev.salary, prev.enddate
from employee e
cross apply ( -- to get the current
select top(1) *
from emphist h
where e.empid = h.empid -- related to the employee
order by startdate desc) curr
outer apply ( -- to get the prior, if any
select top(1) *
from emphist h
where e.empid = h.empid -- related to the employee
and h.EmployeeHistId <> curr.EmployeeHistId -- prevent curr=prev
order by enddate desc) prev -- last ended
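The ranking approach from this thread, including the outer join for employees without an earlier record, can be tried on the question's sample rows with a small sketch using an in-memory SQLite database from Python. SQLite (3.25 or later for window functions) stands in for SQL Server, salaries are stored as plain integers, and the dates are assumed to be month/day/year and rewritten in ISO format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EmployeeHist (
        EmployeeHistId INTEGER PRIMARY KEY,
        EmpId INTEGER, Name TEXT, Salary INTEGER,
        StartDate TEXT, EndDate TEXT
    );
    INSERT INTO EmployeeHist VALUES
        (1, 45, 'Joe', 40000, '2011-01-05', '2011-01-10'),
        (2, 45, 'Joe', 42000, '2011-01-11', NULL),
        (3, 46, 'Bob', 20000, '2011-01-12', NULL);
""")

changes = conn.execute("""
    WITH cte AS (
        SELECT EmpId, Name, Salary, StartDate, EndDate,
               ROW_NUMBER() OVER (PARTITION BY EmpId
                                  ORDER BY StartDate DESC) AS StartDateRank
        FROM EmployeeHist
    )
    SELECT n.Name,
           o.Salary  AS OldSalary,
           n.Salary  AS NewSalary,
           o.EndDate AS ChangeDate
    FROM cte n
    LEFT JOIN cte o                 -- outer join keeps employees with one record
           ON o.EmpId = n.EmpId
          AND o.StartDateRank = 2   -- the record just before the current one
    WHERE n.StartDateRank = 1       -- the current record
    ORDER BY n.EmpId
""").fetchall()

for row in changes:
    print(row)
```

Joe comes back with both salaries and the end date of his old record, while Bob, who has a single record, comes back with NULLs in the old-salary and change-date columns.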

Tricky SQL SELECT Statement

I have a performance issue when selecting data in my project.
There is a table with 3 columns: "id","time" and "group"
The ids are just unique ids as usual.
The time is the creation date of the entry.
The group is there to accumulate certain entries together.
So the table data may look like this:
ID | TIME | GROUP
------------------------
1 | 20090805 | A
2 | 20090804 | A
3 | 20090804 | B
4 | 20090805 | B
5 | 20090803 | A
6 | 20090802 | B
...and so on.
The task is now to select the "current" entries (their ids) in each group for a given date. That is, for each group find the most recent entry for a given date.
Following preconditions apply:
I do not know the different groups in advance - there may be many different ones changing over time
The selection date may lie "in between" the dates of the entries in the table. Then I have to find the closest one in each group. That is, TIME is less than the selection date but the maximum of those to which this rule applies in a group.
What I currently do is a multi-step process which I would like to change into single SELECT statement:
1) SELECT DISTINCT group FROM table to find the available groups
2) For each group found in 1), SELECT * FROM table WHERE time<selectionDate AND group=loop ORDER BY time DESC
3) Take the first row of each result found in 2)
Obviously this is not optimal.
So I would be very happy if some more experienced SQL expert could help me to find a solution to put these steps in a single statement.
Thank you!
The following will work on SQL Server 2005+ and Oracle 9i+:
WITH groups AS (
SELECT t.group,
MAX(t.time) 'maxtime'
FROM TABLE t
GROUP BY t.group)
SELECT t.id,
t.time,
t.group
FROM TABLE t
JOIN groups g ON g.group = t.group AND g.maxtime = t.time
Any database should support:
SELECT t.id,
t.time,
t.group
FROM TABLE t
JOIN (SELECT t.group,
MAX(t.time) 'maxtime'
FROM TABLE t
GROUP BY t.group) g ON g.group = t.group AND g.maxtime = t.time
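Note that the two queries above pick the latest entry per group overall. To honour the selection date from the question, the date filter has to go inside the grouped subquery, before MAX() is taken. A minimal sketch using an in-memory SQLite database from Python, with the table and column renamed to `entries` and `grp` (assumed names, chosen to avoid the keywords) and a hypothetical `current_entries` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, time INTEGER, grp TEXT)")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        (1, 20090805, "A"),
        (2, 20090804, "A"),
        (3, 20090804, "B"),
        (4, 20090805, "B"),
        (5, 20090803, "A"),
        (6, 20090802, "B"),
    ],
)

def current_entries(selection_date):
    """Most recent entry per group with time strictly before selection_date."""
    sql = """
        SELECT t.id, t.time, t.grp
        FROM entries t
        JOIN (SELECT grp, MAX(time) AS maxtime
              FROM entries
              WHERE time < ?          -- filter first, then take the maximum
              GROUP BY grp) g
          ON g.grp = t.grp AND g.maxtime = t.time
        ORDER BY t.grp
    """
    return conn.execute(sql, (selection_date,)).fetchall()

print(current_entries(20090805))
print(current_entries(20090804))
```

If two entries in a group share the same maximum time, this form returns both of them; a tie-breaker such as the id would be needed to get exactly one row per group.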
Here's how I would do it in SQL Server:
SELECT * FROM [table] t WHERE t.id =
(SELECT TOP 1 i.id FROM [table] i WHERE i.[time] < @selectionDate AND i.[group] = t.[group] ORDER BY i.[time] DESC)
The solution will vary by database server, since the syntax for TOP queries varies. Basically you are looking for a "top n per group" query, so you can Google that if you want.
Here is a solution in SQL Server. The following will return the top 10 players who hit the most home runs per year since 1990. The key is to calculate the "Home Run Rank" of each player for each year.
select
HRRanks.*
from
(
Select
b.yearID, b.PlayerID, sum(b.Hr) as TotalHR,
rank() over (partition by b.yearID order by sum(b.hr) desc) as HR_Rank
from
Batting b
where
b.yearID > 1990
group by
b.yearID, b.playerID
)
HRRanks
where
HRRanks.HR_Rank <= 10
Here is a solution in Oracle (top 10 departments by average salary):
SELECT deptno, avg_sal
FROM(
SELECT deptno, AVG(sal) avg_sal
FROM emp
GROUP BY deptno
ORDER BY AVG(sal) DESC
)
WHERE ROWNUM <= 10;
Or using analytic functions:
SELECT deptno, avg_sal
FROM (
SELECT deptno, avg_sal, RANK() OVER (ORDER BY avg_sal DESC) rank
FROM
(
SELECT deptno, AVG(sal) avg_sal
FROM emp
GROUP BY deptno
)
)
WHERE rank <= 10;
Or same again, but using DENSE_RANK() instead of RANK()
select * from things where (GROUP, TIME) in (
select GROUP, max(TIME) from things
where TIME < 20090804
group by GROUP
)
Tested with MySQL (but I had to change the table and column names because they are keywords).
SELECT *
FROM TABB T1
QUALIFY ROW_NUMBER() OVER ( PARTITION BY GROUPP ORDER BY TIMEE DESC )=1