sql server row number order by - sql

I have the table below in SQL Server 2014:
Empno Resign Hour Dept
1000 2999-01-01 40 20
1000 2999-01-01 40 21
1001 2999-01-01 40 22
1001 2999-01-01 40 23
I need to pick the top record based on resignation date and hour. It doesn't matter which dept's row gets picked. So I went with this query:
SELECT *
FROM(
SELECT Empno,Resign, Hour, Dept,
ROW_NUMBER() OVER(PARTITION BY Empno ORDER BY Resign DESC,
hour DESC) AS Row
FROM Table ) AS master
WHERE master.Row = 1
AND master.Empno = '1000';
I got back with
EmployeeNumber ResignationDate Hour Dept Row
1000 2999-01-01 40 20 1
I understand SQL Server doesn't guarantee the order of the row numbers (in this case, which dept's row comes first) unless Dept is included in the ORDER BY clause.
I don't mind which dept's row gets picked, but would the query consistently pick the same one, based on some index or id? How would the query plan produce the top row?
In the row_number I could simply add another ORDER BY column based on dept so that it consistently picks one, but I don't want to do that.

No. You have to add an ORDER BY on a unique combination of columns to force non-arbitrary output.
Why? It's an easy question: SQL Server doesn't see a table the way a human does. It reads and finds pages, which may not be adjacent or in any particular order.
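A minimal sketch of the tie-breaker idea, run from Python against an in-memory SQLite database rather than SQL Server (an assumption made only to keep the example self-contained; the `emp` table name is hypothetical). Adding `Dept` as a final sort key makes the choice of row repeatable regardless of how the rows are stored:

```python
import sqlite3

# In-memory SQLite stands in for SQL Server here (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (Empno INTEGER, Resign TEXT, Hour INTEGER, Dept INTEGER)")
# Rows are deliberately inserted out of Dept order.
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        (1000, "2999-01-01", 40, 21),
        (1000, "2999-01-01", 40, 20),
        (1001, "2999-01-01", 40, 23),
        (1001, "2999-01-01", 40, 22),
    ],
)

query = """
SELECT Empno, Resign, Hour, Dept
FROM (
    SELECT Empno, Resign, Hour, Dept,
           ROW_NUMBER() OVER (
               PARTITION BY Empno
               ORDER BY Resign DESC, Hour DESC, Dept  -- Dept breaks the tie
           ) AS rn
    FROM emp
) AS ranked
WHERE rn = 1 AND Empno = 1000
"""
top_row = conn.execute(query).fetchone()
print(top_row)
```

With `Dept` in the sort key, the lowest dept always wins for each employee, whatever the insertion order.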

Related

How to use Max while taking other values from another column?

I am new in SQL and have problem picking the biggest value of a column for every manager_id and also other information in the same row.
Let me show the example - consider this table:
name manager_id sales
John 1 100
David 1 80
Selena 2 26
Leo 1 120
Frank 2 97
Sara 2 105
and the result I am expecting would be like this:
name manager_id top_sales
Leo 1 120
Sara 2 105
I tried using Max, but the problem is that I must group by manager_id, so I'm not able to get the name of the salesperson.
select manager_id, max(sales) as top_sales
from table
group by manager_id ;
This is just an example; the actual query is very long and pulls information from several tables. I know that I can join the same table again, but as I mentioned the query is very long, with multiple conditions across different tables, and I don't want to create a temporary table to hold the result. It should be done in one single query. I actually did solve this, but the query is super long because the inner join made me write the original table twice.
My question is: can I use Max and still get the value in the name column, or is there another method to solve this?
Appreciate all help
You can use row_number() with CTE to get the highest sales for each manager as below:
with MaxSales as (
select name, manager_id, sales,row_number() over (partition by manager_id order by sales desc) rownumber from table
)
select name , manager_id ,sales from MaxSales where rownumber=1
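As a quick check of the row_number() approach against the sample data from the question, here is a sketch using Python with an in-memory SQLite database (an assumption for the sake of a runnable example; `sales_people` is a hypothetical table name standing in for the real one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_people (name TEXT, manager_id INTEGER, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales_people VALUES (?, ?, ?)",
    [
        ("John", 1, 100), ("David", 1, 80), ("Selena", 2, 26),
        ("Leo", 1, 120), ("Frank", 2, 97), ("Sara", 2, 105),
    ],
)

# Rank rows within each manager by sales, highest first, and keep rank 1.
rows = conn.execute("""
WITH MaxSales AS (
    SELECT name, manager_id, sales,
           ROW_NUMBER() OVER (PARTITION BY manager_id ORDER BY sales DESC) AS rownumber
    FROM sales_people
)
SELECT name, manager_id, sales AS top_sales
FROM MaxSales
WHERE rownumber = 1
ORDER BY manager_id
""").fetchall()
print(rows)
```

If two people under the same manager tie on sales, ROW_NUMBER still returns only one of them; RANK() would return both.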

Optimize SELECT query for working with large database

This is a part of my database:
ID EmployeeID Status EffectiveDate
1 110545 Active 2011-08-01
2 110700 Active 2012-01-05
3 110060 Active 2012-01-05
4 110222 Active 2012-06-30
5 110545 Resigned 2012-07-01
6 110545 Active 2013-02-12
I want to generate records which select Active employees:
ID EmployeeID Status EffectiveDate
2 110700 Active 2012-01-05
3 110060 Active 2012-01-05
4 110222 Active 2012-06-30
So, I tried this query:
SELECT *
FROM Employee AS E
WHERE E.Status='Active' AND
E.EffectiveDate BETWEEN '2011-08-01' AND '2012-07-02' AND NOT
EXISTS(SELECT * FROM Employee AS E2
WHERE E2.EmployeeID = E.EmployeeID AND E2.Status = 'Resigned'
AND E2.EffectiveDate between '2011-08-01' and '2012-07-02'
);
It works with a small amount of data, but I get a timeout error with a large database.
Can you help me optimize this?
This is how I read your request: You want to show active employees. For this to happen, you look at their latest entry, which is either 'Active' or 'Resigned'.
You want to restrict this to a certain time range. That probably means you want to find all employees that became active without becoming immediately inactive again within that time frame.
So, get the latest date per employee first, then stay with those rows in case they are active.
select *
from employee
where (employeeid, effectivedate) in
(
select employeeid, max(effectivedate)
from employee
where effectivedate between date '2011-08-01' and date '2012-07-02'
group by employeeid
)
and status = 'Active'
order by employeeid;
The subquery tries to find a time range and then look at each employee to find their latest date within. I'd offer the DBMS this index:
create index idx on employee (effectivedate, employeeid);
The main query wants to find that row again by using employeeid and effectivedate and would then look up the status. The above index could be used again. We could even add the status in order to ease the lookup:
create index idx on employee (effectivedate, employeeid, status);
The DBMS may use this index or not. That's up to the DBMS to decide. I find it likely that it will, for it can be used for all steps in the execution of the query and even contains all columns the query works with, so the table itself wouldn't even have to be read.
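The "latest row per employee, then keep the active ones" idea can be tried on the sample rows from the question. This is only a sketch: it uses Python with an in-memory SQLite database (an assumption, since the target DBMS is not stated), stores dates as ISO text, and cannot say anything about timing on a large table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER, employeeid INTEGER, status TEXT, effectivedate TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, 110545, "Active", "2011-08-01"),
        (2, 110700, "Active", "2012-01-05"),
        (3, 110060, "Active", "2012-01-05"),
        (4, 110222, "Active", "2012-06-30"),
        (5, 110545, "Resigned", "2012-07-01"),
        (6, 110545, "Active", "2013-02-12"),
    ],
)
# Covering index suggested in the answer; whether it is used is up to the DBMS.
conn.execute("CREATE INDEX idx ON employee (effectivedate, employeeid, status)")

rows = conn.execute("""
SELECT id, employeeid, status, effectivedate
FROM employee
WHERE (employeeid, effectivedate) IN (
    -- latest entry per employee inside the time range
    SELECT employeeid, MAX(effectivedate)
    FROM employee
    WHERE effectivedate BETWEEN '2011-08-01' AND '2012-07-02'
    GROUP BY employeeid
)
AND status = 'Active'
ORDER BY employeeid
""").fetchall()
print(rows)
```

Employee 110545 drops out because its latest entry in the range is the resignation on 2012-07-01.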
I have tried to achieve the above result set using Case Statements.
Hope this helps.
CREATE TABLE employee_test
(rec NUMBER,
employee_id NUMBER,
status VARCHAR2(100),
effectivedate DATE);
INSERT INTO employee_test VALUES(1,110545,'Active',TO_DATE('01-08-2011','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(2,110545,'Active',TO_DATE('05-01-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(3,110545,'Active',TO_DATE('05-01-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(4,110545,'Active',TO_DATE('30-06-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(5,110545,'Resigned',TO_DATE('01-07-2012','DD-MM-YYYY'));
INSERT INTO employee_test VALUES(6,110545,'Active',TO_DATE('12-02-2013','DD-MM-YYYY'));
COMMIT;
SELECT * FROM(
SELECT e.* ,
CASE WHEN (effectivedate BETWEEN TO_DATE('2011-08-01','YYYY-MM-DD') AND TO_DATE('2012-07-02','YYYY-MM-DD') AND status='Active')
THEN 'Y' ELSE 'N' END AS FLAG
FROM Employee_Test e)
WHERE Flag='Y'
;
I'm adding another answer with another interpretation of the request. Just in case :-)
The table shows statuses per employee. An employee can become active, then retired, then active again. But they can not become active and then active again, without becoming retired in between, of course.
We are looking at a time range and want to find all employees that became active but never retired within - no matter whether they became active again after retirement in that period.
This makes this easy. We are looking for employees, that have exactly one row in that time range and that row is active. One way to do this:
select employeeid, any_value(effectivedate), max(status)
from employee
where effectivedate between date '2011-08-01' and date '2012-07-02'
group by employeeid
having max(status) = 'Active'
order by employeeid;
As in my other answer, an appropriate index would be
create index idx on employee (effectivedate, employeeid, status);
as we want to look into the date range and look up the statuses per employee.

Group BY Statement error to get unique records

I am new to SQL Server (I used to work with MySQL) and am trying to get the records from a table using Group By.
The table structure is given below:
SELECT S1.ID,S1.Template_ID,S1.Assigned_By,S1.Assignees,S1.Active FROM "Schedule" AS S1;
Output:
ID Template_ID Assigned_By Assignees Active
2 25 1 3 1
3 25 5 6 1
6 26 5 6 1
I need to get the values of all columns using the Group By statement below
SELECT Template_ID FROM "Schedule" WHERE "Assignees" IN(6, 3) GROUP BY "Template_ID";
Output:
Template_ID
25
26
I tried the following code to fetch the table using Group By, but it's fetching all the rows.
SELECT S1.ID,S1.Template_ID,S1.Assigned_By,S1.Assignees,S1.Active FROM "Schedule" AS S1 INNER JOIN(SELECT Template_ID FROM "Schedule" WHERE "Assignees" IN(6, 3) GROUP BY "Template_ID") AS S2 ON S2.Template_ID=S1.Template_ID
My Output Should be like,
ID Template_ID Assigned_By Assignees Active
2 25 1 3 1
6 26 5 6 1
I was wondering whether I can get ID of the column as well? I use the ID for editing the records in the web.
The query doesn't work as expected in MySQL either, except by accident.
Selecting nonaggregated columns this way in MySQL isn't part of the SQL standard, and isn't even allowed in MySQL 5.7 and later unless the default ONLY_FULL_GROUP_BY mode is disabled.
In earlier versions the result is non-deterministic:
The server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.
This means there was no way to know which rows would be returned by this query:
SELECT S1.ID,S1.Template_ID,S1.Assigned_By,S1.Assignees,S1.Active
FROM "Schedule" AS S1
GROUP BY Template_ID;
To get deterministic results you'd need a way to rank rows with the ranking functions introduced in MySQL 8, like ROW_NUMBER(). SQL Server has supported these since SQL Server 2005. The syntax is the same for both databases:
WITH ranked AS
(
SELECT
ID, Template_ID, Assigned_By, Assignees, Active,
ROW_NUMBER() OVER (PARTITION BY Template_ID ORDER BY ID) AS RN
FROM Schedule
WHERE Assignees IN (6, 3)
)
SELECT ID, Template_ID, Assigned_By, Assignees, Active
FROM ranked
WHERE RN = 1
PARTITION BY Template_ID splits the result rows based on their Template_ID value into separate partitions. Within that partition, the rows are ordered based on the ORDER BY clause. Finally, ROW_NUMBER calculates a row number for each ordered partition row.
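The ranking approach can be checked against the three sample rows from the question. The sketch below runs it from Python on an in-memory SQLite database (an assumption; the question targets SQL Server, which accepts the same window function syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Schedule (ID INTEGER, Template_ID INTEGER, "
    "Assigned_By INTEGER, Assignees INTEGER, Active INTEGER)"
)
conn.executemany(
    "INSERT INTO Schedule VALUES (?, ?, ?, ?, ?)",
    [(2, 25, 1, 3, 1), (3, 25, 5, 6, 1), (6, 26, 5, 6, 1)],
)

# Number the rows of each template by ID and keep the first one.
rows = conn.execute("""
WITH ranked AS (
    SELECT ID, Template_ID, Assigned_By, Assignees, Active,
           ROW_NUMBER() OVER (PARTITION BY Template_ID ORDER BY ID) AS RN
    FROM Schedule
    WHERE Assignees IN (6, 3)
)
SELECT ID, Template_ID, Assigned_By, Assignees, Active
FROM ranked
WHERE RN = 1
ORDER BY Template_ID
""").fetchall()
print(rows)
```

Ordering by ID inside the partition is what makes the lowest ID of each template the one that survives.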

I want to retrieve a column value as of the latest date up to a given date, and also the second-highest salary

In the employee table the amount is 100.62 for 04/04/2013, 102.62 for 05/04/2013, and so on. I want to retrieve the amount for the latest date up to a given date, and then the second-highest incremented salary of the person.
I have tried these queries so far for my problem.
I am very new to SQL Server, and I wrote:
select amount from employee where max(date)<=15/04/2013
and
select top1 from employee(select top2 from employee order by salary desc) order by salary
You could use OFFSET FETCH Clause. For example your second query could be like this:
SELECT * FROM employee ORDER BY salary DESC OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY
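One thing to watch with skipping a single row is ties: if two people share the top salary, the second row is still the top salary. The sketch below compares the offset approach with a DENSE_RANK() variant, which is a swapped-in alternative and not part of the answer above. It runs on an in-memory SQLite database from Python, so the offset is spelled `LIMIT 1 OFFSET 1` instead of `OFFSET ... FETCH`; the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
# Hypothetical data with a tie for the highest salary.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("A", 300), ("B", 300), ("C", 200), ("D", 100)],
)

# Skip one row, take the next: returns the second row, not the second salary.
offset_row = conn.execute(
    "SELECT name, salary FROM employee ORDER BY salary DESC, name LIMIT 1 OFFSET 1"
).fetchone()

# DENSE_RANK gives tied salaries the same rank, so rank 2 is the second distinct salary.
ranked_rows = conn.execute("""
SELECT name, salary
FROM (
    SELECT name, salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employee
) AS r
WHERE rnk = 2
""").fetchall()

print(offset_row, ranked_rows)
```

Which behaviour is wanted depends on whether "second maximum" means the second row or the second distinct value.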

Getting total number of records while fetching limited set - Oracle

I have the query below that fetches 500 records according to certain criteria. The table has many rows (millions). I want to get the total number of records so I can say "display 500 out of .... rows". Can I do this with this query? Currently I have a separate query to do that, but I was wondering if I can do it in the same query.
Cheers,
Tam
The Query:
SELECT * FROM APPL_PERF_STATS
WHERE (GENERIC_METHOD_NAME != 'NULL' AND CURRENT_APPL_ID != 'NULL' AND EVENT_NAME != 'NULL')
AND ROWNUM < 500
AND current_appl_id LIKE '%OrderingGUI%'
AND event_name LIKE '%/ccui%'
AND generic_method_name LIKE '%com.telus.customermgt.service.CustomerMgtSvc.getCustomer%' AND appl_perf_interval_typ_id = 1440
AND cover_period_start_ts >= to_date('06-07-2008 11:53','dd-mm-yyyy HH24:mi')
AND cover_period_start_ts <= to_date('11-08-2009 11:53','dd-mm-yyyy HH24:mi')
ORDER BY CURRENT_APPL_ID, EVENT_NAME, GENERIC_METHOD_NAME, CREATE_TS
In practice, when I am given such a problem I generally select one more than I am willing to display (say 500, so pull 501), and if I reached this amount then tell the user "More than 500 records were returned" which gives the user the ability to refine the query. You could run a count of the query first, then return top n rows, but that will require another trip to the server and depending on query complexity and data volume, may take a long time.
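The "fetch one more than you display" idea might look roughly like this on the application side. This is a sketch in Python against an in-memory SQLite database; the `stats` table, its columns, and the row count are all hypothetical stand-ins.

```python
import sqlite3

PAGE_SIZE = 500  # number of rows we are willing to display

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, event_name TEXT)")
conn.executemany(
    "INSERT INTO stats (event_name) VALUES (?)",
    [(f"/ccui/{i}",) for i in range(1200)],
)

# Ask for one row more than we will show.
fetched = conn.execute(
    "SELECT id, event_name FROM stats ORDER BY id LIMIT ?", (PAGE_SIZE + 1,)
).fetchall()

has_more = len(fetched) > PAGE_SIZE  # the extra row only signals "there is more"
page = fetched[:PAGE_SIZE]           # what actually gets displayed

if has_more:
    message = f"More than {PAGE_SIZE} records were returned"
else:
    message = f"{len(page)} records returned"
print(message)
```

The database never has to count the full result set; it stops after `PAGE_SIZE + 1` rows.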
Another method would be to maintain counts in a statistics table, which you can sum up right before you execute this query. The theory is that your statistics table would hold much less data than your primary tables and therefore can be processed by the server quickly. You could write a process (stored procedures work best) to update these counts and then return your results.
I'm not sure about your application, or your users. But mine generally either don't care about the total amount of records, or want only the total amount of records and don't want the detail.
In Oracle at least you can do this using analytic functions:
For example:
select
count(*) over (partition by null) total_num_items,
p.*
from
APPL_PERF_STATS p
where
...
Note that (as APC mentioned) you'll need to embed the ordered query in a sub-query before using ROWNUM to restrict the output to n lines.
Although this is a method of getting the total number of lines in the returned resultset, in the background Oracle will be counting all the rows. So, if there are "millions of rows" then there will be a performance hit. If the performance hit is excessive, the approach I'd use would be to pre-aggregate the total row count as a separate batch job. You may find materialized views useful if this is the case.
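To illustrate how the analytic count travels with each returned row, here is a small sketch using Python and an in-memory SQLite database. This is an assumption made for runnability: the row limit is written as `LIMIT` rather than Oracle's ROWNUM on an outer query, and the `stats` table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (id INTEGER PRIMARY KEY, event_name TEXT)")
# 30 rows, of which every third one matches the filter below.
conn.executemany(
    "INSERT INTO stats (event_name) VALUES (?)",
    [(f"/ccui/{i}" if i % 3 == 0 else f"/other/{i}",) for i in range(30)],
)

# COUNT(*) OVER () is evaluated over all filtered rows before the limit applies,
# so every returned row carries the total number of matches.
rows = conn.execute("""
SELECT COUNT(*) OVER () AS total_num_items, id, event_name
FROM stats
WHERE event_name LIKE '/ccui/%'
ORDER BY id
LIMIT 5
""").fetchall()

total = rows[0][0]
print(f"display {len(rows)} out of {total} rows")
```

The database still has to visit every matching row to produce the total, which is the performance cost described above.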
ORDER BY and ROWNUM don't interact the way you think they interact. ROWNUM gets applied first:
SQL> select ename from emp
2 where rownum < 5
3 order by ename
4 /
ENAME
----------
CLARKE
PADFIELD
ROBERTSON
VAN WIJK
SQL> select * from (
2 select ename from emp
3 order by ename
4 )
5 where rownum < 5
6 /
ENAME
----------
BILLINGTON
BOEHMER
CAVE
CLARKE
SQL>
The desire to display page 1 of 100 comes from over-exposure to Google. As in other areas what Google does is irrelevant to enterprise IT. Google guesses, but it has the architecture to make its guesses fairly accurate.
If you're using an out-of-the-box RDBMS this is not true of your set-up. You have to do it the hard way, by executing two queries: one to get the count, then one to get the rows. If the query is well indexed, doing the initial count might not be too expensive, but it's still two queries.
SELECT rn, total_rows, x.OWNER, x.object_name, x.object_type
FROM (SELECT COUNT (*) OVER (PARTITION BY owner) AS TOTAL_ROWS,
ROW_NUMBER () OVER (ORDER BY 1) AS rn, uo.*
FROM all_objects uo
WHERE owner = 'CSEIS') x
WHERE rn BETWEEN 6 AND 10
RN TOTAL_ROWS OWNER OBJECT_NAME OBJECT_TYPE
6 1262 CSEIS CG$BDS_MODIFICATION_TYPES TRIGGER
7 1262 CSEIS CG$AUS_MODIFICATION_TYPES TRIGGER
8 1262 CSEIS CG$BDR_MODIFICATION_TYPES TRIGGER
9 1262 CSEIS CG$ADS_MODIFICATION_TYPES TRIGGER
10 1262 CSEIS CG$BIS_LANGUAGES TRIGGER
In practice I've found such data is almost never helpful, and it is expensive to get. Especially if the tables are being written to, your total count is constantly changing and is therefore a pretty unreliable number to show the user.