I have a query that returns:
EMP DOC DATE
1 78 01/01
1 96 02/01
1 96 02/01
1 105 07/01
2 4 04/01
2 7 04/01
3 45 07/01
3 45 07/01
3 67 09/01
I want to add a row number (I'll use it as a primary ID) that restarts whenever EMP changes and does not increment when DOC is the same as on the previous row, like this:
EMP DOC DATE ID
1 78 01/01 1
1 96 02/01 2
1 96 02/01 2
1 105 07/01 3
2 4 04/01 1
2 7 04/01 2
3 45 07/01 1
3 45 07/01 1
3 67 09/01 2
In SQL Server I could use LAG to compare against the previous DOC, but I can't find a way to do that in Sybase SQL Anywhere. I'm using ROW_NUMBER partitioned by EMP, but it's not what I need:
SELECT EMP, DOC, DATE, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY EMP, DOC, DATE) ID -- <== this changes the row number even for the same DOC within the same EMP, so it doesn't work
Can anyone point me in the right direction?
You seem to want dense_rank():
select
emp,
doc,
date,
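dense_rank() over(partition by emp order by doc) id
from mytable
This numbers rows within groups that have the same emp and increments only when doc changes, without gaps. (Ordering by date instead would give EMP 2 the same ID for DOC 4 and DOC 7, because both share the date 04/01.)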
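To check which ORDER BY column reproduces the expected ID column, here is a minimal sketch that runs the dense_rank() query against the question's sample data. It uses SQLite through Python's built-in sqlite3 module as a stand-in for SQL Anywhere, which is an assumption, and it needs SQLite 3.25 or later for window functions. The table name mytable comes from the answer.

```python
import sqlite3

# The question's sample rows: (emp, doc, date), dates kept as 'MM/DD' text.
ROWS = [
    (1, 78, "01/01"), (1, 96, "02/01"), (1, 96, "02/01"), (1, 105, "07/01"),
    (2, 4, "04/01"), (2, 7, "04/01"),
    (3, 45, "07/01"), (3, 45, "07/01"), (3, 67, "09/01"),
]


def dense_rank_ids(order_col: str):
    """Return dense_rank() IDs partitioned by emp and ordered by order_col."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (emp INTEGER, doc INTEGER, date TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    # order_col is only ever one of our own fixed column names, never user input.
    sql = f"""
        SELECT dense_rank() OVER (PARTITION BY emp ORDER BY {order_col}) AS id
        FROM mytable
        ORDER BY emp, doc, date
    """
    ids = [row[0] for row in con.execute(sql)]
    con.close()
    return ids


print(dense_rank_ids("doc"))   # [1, 2, 2, 3, 1, 2, 1, 1, 2]
print(dense_rank_ids("date"))  # [1, 2, 2, 3, 1, 1, 1, 1, 2]
```

Only the doc ordering matches the expected ID column. With date ordering, EMP 2's two documents tie on 04/01 and both get 1.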
If performance is not an issue in your case, you can try something like:
SELECT tx.EMP, tx.DOC, tx.DATE, y.ID
FROM table_xxx tx
JOIN (SELECT EMP, DOC, ROW_NUMBER() OVER (PARTITION BY EMP ORDER BY DOC) ID
FROM (SELECT EMP, DOC FROM table_xxx GROUP BY EMP, DOC) x) y
ON tx.EMP = y.EMP AND tx.DOC = y.DOC
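Here is a sketch of the same idea, again run in SQLite via Python's sqlite3 as an assumed stand-in for SQL Anywhere (SQLite 3.25+ needed). The query works in three steps:

- The inner GROUP BY collapses each (emp, doc) pair to one row.
- ROW_NUMBER() numbers those distinct pairs within each emp.
- The join copies the number back onto every original row.

table_xxx is the answer's placeholder name.

```python
import sqlite3

# The question's sample rows: (emp, doc, date).
ROWS = [
    (1, 78, "01/01"), (1, 96, "02/01"), (1, 96, "02/01"), (1, 105, "07/01"),
    (2, 4, "04/01"), (2, 7, "04/01"),
    (3, 45, "07/01"), (3, 45, "07/01"), (3, 67, "09/01"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table_xxx (emp INTEGER, doc INTEGER, date TEXT)")
con.executemany("INSERT INTO table_xxx VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT tx.emp, tx.doc, tx.date, y.id
FROM table_xxx tx
JOIN (
    -- one row per distinct (emp, doc), numbered 1, 2, 3 ... within each emp
    SELECT emp, doc,
           ROW_NUMBER() OVER (PARTITION BY emp ORDER BY doc) AS id
    FROM (SELECT emp, doc FROM table_xxx GROUP BY emp, doc) x
) y ON tx.emp = y.emp AND tx.doc = y.doc
ORDER BY tx.emp, tx.doc, tx.date
"""

result = con.execute(QUERY).fetchall()
con.close()
for row in result:
    print(row)
```

Because the numbering is done on the distinct pairs, duplicate rows cannot push the counter forward. The cost is the extra GROUP BY and join, which is presumably what the performance caveat refers to.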
I have two simple tables, as follows:
Student
---------------------------------------------
student_id student_name student_class
107 paul A Level-I
108 susan Diploma
109 jack O Level-II
---------------------------------------------
Student_Positions
--------------------------------------------------
position_id student_id position date
1 107 1 1-1-2020
2 107 1 1-1-2021
3 109 2 1-1-2021
4 109 1 1-6-2019
I want a left outer join on these tables that returns the latest position of every student, as follows:
student_id student_name position date
107 paul 1 1-1-2021
108 susan
109 jack 2 1-1-2021
I have made multiple attempts, placing max(date) and GROUP BY in different ways, but in vain.
Please help with the correct query.
The canonical SQL solution uses a window function such as row_number():
select s.*, sp.position, sp.date
from student s left join
(select sp.*,
row_number() over (partition by student_id order by date desc) as seqnum
from student_positions sp
) sp
on sp.student_id = s.student_id and sp.seqnum = 1;
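Here is a minimal check of the answer's query in SQLite through Python's sqlite3 (SQLite 3.25+ assumed). The table names follow the question (student, student_positions). The dates are stored as ISO 'YYYY-MM-DD' strings so that text ordering matches date ordering. Reading 1-6-2019 as 1 June 2019 is an assumption, and it does not change the result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student (student_id INTEGER, student_name TEXT, student_class TEXT);
CREATE TABLE student_positions (position_id INTEGER, student_id INTEGER,
                                position INTEGER, date TEXT);
INSERT INTO student VALUES (107, 'paul', 'A Level-I'), (108, 'susan', 'Diploma'),
                           (109, 'jack', 'O Level-II');
INSERT INTO student_positions VALUES (1, 107, 1, '2020-01-01'), (2, 107, 1, '2021-01-01'),
                                     (3, 109, 2, '2021-01-01'), (4, 109, 1, '2019-06-01');
""")

QUERY = """
SELECT s.student_id, s.student_name, sp.position, sp.date
FROM student s
LEFT JOIN (
    -- seqnum = 1 marks each student's most recent position row
    SELECT sp.*,
           ROW_NUMBER() OVER (PARTITION BY student_id ORDER BY date DESC) AS seqnum
    FROM student_positions sp
) sp ON sp.student_id = s.student_id AND sp.seqnum = 1
ORDER BY s.student_id
"""
latest = con.execute(QUERY).fetchall()
for row in latest:
    print(row)
```

Putting sp.seqnum = 1 in the ON clause rather than in WHERE is what keeps susan in the result. She has no positions, so her row comes back with NULLs.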
So, I have a task at uni to get the maximum stipend in each faculty from a table of stipends.
Faculty table is:
ID_FACULTY FACULTY_NAME DEAN TELEPHON
---------- ------------------------------ -------------------- --------
10 Informacijas tehnologiju Vitols 63023095
11 Lauksaimniecibas Gaile 63022584
12 Tehniska Dukulis 53020762
13 Partikas tehnologijas Sabovics 63021075
Money table is:
ID_PAYOUT STUDENT_ID PAYOUT_DA STIPEND COMPENSATION
---------- ---------- --------- ---------- ------------
100 1 24-SEP-20 45.25 15
101 7 20-SEP-20 149.99 0
102 3 18-SEP-20 100 0
103 17 02-SEP-20 90.85 20
104 9 03-SEP-20 85 20
105 19 09-SEP-20 70.75 0
106 25 15-SEP-20 55 15
107 17 17-SEP-20 105.54 0
108 15 22-SEP-20 94 0
109 27 28-SEP-20 100 20
And the student table is:
ID_STUDENT SURNAME NAME COURSE_YEAR FACULTY_ID BIRTHDATE
---------- ------------------------- -------------------- ----------- ---------- ---------
1 Lapa Juris 4 13 27-SEP-96
3 Vilkauss Fredis 2 10 17-MAY-99
5 Karlsone Rasa 1 11 13-MAR-00
7 Grozitis Guntars 3 12 16-APR-97
9 Sonciks Jurgis 2 10 17-MAR-99
11 Berzajs Olafs 3 10 14-FEB-97
13 Vike Ilvija 2 13 14-MAY-99
15 Baure Inga 3 11 12-APR-97
17 Viskers Zigmunds 2 13 15-AUG-99
19 Talmanis Harijs 3 13 15-JUL-97
21 Livmanis Indulis 1 10 19-JAN-00
23 Shaveja Uva 2 13 18-FEB-98
25 Lacis Guntis 4 10 17-SEP-96
27 Liepa Guna 4 11 18-AUG-96
29 Klava Juris 2 10 19-MAY-98
I have tried many variations of queries. I think I have even tried every possible combination of joins, but I cannot achieve the necessary result.
One of my queries looked like this:
SELECT ROW_NUMBER() OVER (ORDER BY surname) "Nr.",
f.faculty_name,
s.surname,
s.name,
MAX(m.stipend)
FROM faculty f, student s INNER JOIN money m ON s.id_student = m.student_id
WHERE s.faculty_id = f.id_faculty
GROUP BY f.faculty_name, s.surname, s.name
ORDER BY s.surname;
It returned the following result:
Nr. FACULTY_NAME SURNAME NAME MAX(M.STIPEND)
---------- ------------------------------ ------------------------- -------------------- --------------
1 Lauksaimniecibas Baure Inga 94
2 Tehniska Grozitis Guntars 149.99
3 Informacijas tehnologiju Lacis Guntis 55
4 Partikas tehnologijas Lapa Juris 45.25
5 Lauksaimniecibas Liepa Guna 100
6 Informacijas tehnologiju Sonciks Jurgis 85
7 Partikas tehnologijas Talmanis Harijs 70.75
8 Informacijas tehnologiju Vilkauss Fredis 100
9 Partikas tehnologijas Viskers Zigmunds 105.54
9 rows selected.
So the goal of this task is to retrieve the maximum stipend granted to a student in each faculty.
Can someone please tell me what I am doing wrong here?
Just the max amount per faculty:
SELECT
f.faculty_name,
MAX(m.stipend)
FROM
faculty f
INNER JOIN student s ON s.faculty_id = f.id_faculty
INNER JOIN money m ON s.id_student = m.student_id
GROUP BY f.faculty_name
The max amount and all the other details too:
SELECT * FROM
(
SELECT
ROW_NUMBER() OVER (PARTITION BY f.faculty_name ORDER BY m.stipend desc) rn,
f.*,
s.*,
m.*
FROM
faculty f
INNER JOIN student s ON s.faculty_id = f.id_faculty
INNER JOIN money m ON s.id_student = m.student_id
) x
WHERE x.rn = 1
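Both queries can be checked against the question's data with a sketch in SQLite via Python's sqlite3, used here as an assumed stand-in for Oracle (SQLite 3.25+ needed). Two simplifications keep it short:

- Columns the queries do not use are left out of the tables.
- The ROW_NUMBER() version selects a few named columns instead of f.*, s.*, m.*.

```python
import sqlite3

FACULTY = [(10, "Informacijas tehnologiju"), (11, "Lauksaimniecibas"),
           (12, "Tehniska"), (13, "Partikas tehnologijas")]
# (id_student, surname, faculty_id)
STUDENT = [(1, "Lapa", 13), (3, "Vilkauss", 10), (5, "Karlsone", 11),
           (7, "Grozitis", 12), (9, "Sonciks", 10), (11, "Berzajs", 10),
           (13, "Vike", 13), (15, "Baure", 11), (17, "Viskers", 13),
           (19, "Talmanis", 13), (21, "Livmanis", 10), (23, "Shaveja", 13),
           (25, "Lacis", 10), (27, "Liepa", 11), (29, "Klava", 10)]
# (id_payout, student_id, stipend)
MONEY = [(100, 1, 45.25), (101, 7, 149.99), (102, 3, 100), (103, 17, 90.85),
         (104, 9, 85), (105, 19, 70.75), (106, 25, 55), (107, 17, 105.54),
         (108, 15, 94), (109, 27, 100)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE faculty (id_faculty INTEGER, faculty_name TEXT)")
con.execute("CREATE TABLE student (id_student INTEGER, surname TEXT, faculty_id INTEGER)")
con.execute("CREATE TABLE money (id_payout INTEGER, student_id INTEGER, stipend REAL)")
con.executemany("INSERT INTO faculty VALUES (?, ?)", FACULTY)
con.executemany("INSERT INTO student VALUES (?, ?, ?)", STUDENT)
con.executemany("INSERT INTO money VALUES (?, ?, ?)", MONEY)

JOINS = """
    FROM faculty f
    INNER JOIN student s ON s.faculty_id = f.id_faculty
    INNER JOIN money m ON s.id_student = m.student_id
"""

# The group is the whole faculty; the stipend is aggregated.
grouped = dict(con.execute(f"""
    SELECT f.faculty_name, MAX(m.stipend) {JOINS}
    GROUP BY f.faculty_name
"""))

# Same maxima, but keeping the detail columns of the top-numbered row.
detailed = con.execute(f"""
    SELECT faculty_name, surname, stipend FROM (
        SELECT f.faculty_name, s.surname, m.stipend,
               ROW_NUMBER() OVER (PARTITION BY f.faculty_name
                                  ORDER BY m.stipend DESC) AS rn
        {JOINS}
    ) x
    WHERE x.rn = 1
    ORDER BY faculty_name
""").fetchall()

print(grouped)
for row in detailed:
    print(row)
```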
Points of note:
Do not use old-style joins. If you ever write table_name, other_table_name in a FROM clause, you're using an old-style join. Don't do it; they became bad news about 30 years ago, when SQL-92 introduced explicit JOIN syntax.
When you have a max-n-per-group problem, you specify how finely detailed the group is. If you GROUP BY f.faculty_name, s.surname, s.name (as your query does), then your groups are "every unique combination of faculty/surname/name". The only way you'll get multiple items in a group is if there are two John Smiths in Mathematics. If the group is to be the whole of Mathematics, then the faculty name (and anything else uniquely related 1:1 to it, like the faculty ID) is all you can put in your GROUP BY. Anything not in the group must be in an aggregation, like MAX.
When you want the other details too, you have two options:
- Group and MAX the data, then join this group-maxed data back to the original data to use it as a filter.
- Use an approach like the one here: a ROW_NUMBER (or RANK) with a partition, which behaves like a grouped summary automatically joined back to each row.
There is no GROUP BY here. The row numbering acts like a group because it restarts from 1 for every faculty and increments as the stipend decreases, so the highest stipend is always row number 1.
Unlike a group-max that you join back to get the detail, the ROW_NUMBER route does not produce duplicate rows when several stipends tie for the highest. It returns just one of the tied rows, and which one is arbitrary. Use RANK() instead if you want all of them.
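Here is a small sketch of that tie behaviour, run in SQLite via Python's sqlite3 (SQLite 3.25+ assumed). The payout table and its rows are hypothetical, made up so that two students tie for the top stipend in one faculty.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE payout (faculty TEXT, student TEXT, stipend REAL)")
# Hypothetical rows: Ann and Bob tie for the highest stipend in Math.
con.executemany("INSERT INTO payout VALUES (?, ?, ?)", [
    ("Math", "Ann", 100), ("Math", "Bob", 100), ("Math", "Cid", 80),
    ("Art", "Dee", 60),
])

# Group-max joined back as a filter: every row equal to the max survives.
join_back = set(con.execute("""
    SELECT p.faculty, p.student
    FROM payout p
    JOIN (SELECT faculty, MAX(stipend) AS mx FROM payout GROUP BY faculty) g
      ON g.faculty = p.faculty AND g.mx = p.stipend
"""))


def top_rows(func):
    """Rows numbered 1 by the given ranking function within each faculty."""
    # func is one of our fixed function names, never user input.
    return con.execute(f"""
        SELECT faculty, student FROM (
            SELECT faculty, student,
                   {func}() OVER (PARTITION BY faculty ORDER BY stipend DESC) AS r
            FROM payout
        ) x
        WHERE r = 1
    """).fetchall()


by_row_number = top_rows("ROW_NUMBER")  # one row per faculty; the tied winner is arbitrary
by_rank = set(top_rows("RANK"))          # tied rows share rank 1, so both are kept

print(sorted(join_back))
print(by_row_number)
print(sorted(by_rank))
```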
I have a requirement to compute a bonus payout based on the spread goal and the date achieved, as follows:
Spread Goal | Date Achieved | Bonus Payout
----------------------------------------------
$3,500 | < 27 wks | $2,000
$3,500 | 27 wks to 34 wks | $1,000
$3,500 | > 34 wks | $0
I have a table in SQL Server 2014 where the subset of the data is as follows:
EMP_ID WK_NUM NET_SPRD_LCL
123 10 0
123 11 1500
123 15 3600
123 18 3800
123 19 4000
Based on the requirement, I need to look for records where NET_SPRD_LCL is greater than or equal to 3500 for two consecutive WK_NUM values.
So, in my example, WK_NUM 15 and 18 (which are consecutive in my case, because I join to a calendar table that excludes holiday weeks) are under 27 weeks and have NET_SPRD_LCL >= 3500.
For this case, I want to output the MAX(WK_NUM), its associated NET_SPRD_LCL, and BONUSPAYOUT = 2000. So, the output should be as follows:
EMP_ID WK_NUM NET_SPRD_LCL BONUSPAYOUT
123 18 3800 2000
If the first requirement is met, the script should output that row and stop. If not, I will look for the second requirement, where the date achieved is between 27 and 34 weeks.
I hope I was able to explain my requirement clearly :-)
Thanks for the help.
Nice question! I racked my brain over cases like four consecutive rows at 3500 or more, and came up with this.
You can use a CTE, a recursive CTE, and ROW_NUMBER():
;WITH cte AS(
SELECT EMP_ID,
WK_NUM,
NET_SPRD_LCL,
ROW_NUMBER() OVER (PARTITION BY EMP_ID ORDER BY WK_NUM) rn
FROM YourTable
)
, recur AS (
SELECT EMP_ID,
WK_NUM,
NET_SPRD_LCL,
rn,
1 as lev
FROM cte
WHERE rn = 1
UNION ALL
SELECT c.EMP_ID,
c.WK_NUM,
c.NET_SPRD_LCL,
c.rn,
CASE WHEN c.NET_SPRD_LCL < 3500 THEN Lev+1 ELSE Lev END
FROM cte c
INNER JOIN recur r
ON r.EMP_ID = c.EMP_ID AND r.rn + 1 = c.rn
)
SELECT TOP 1 WITH TIES
EMP_ID,
WK_NUM,
NET_SPRD_LCL,
CASE WHEN WK_NUM < 27 THEN $2000
WHEN WK_NUM between 27 and 34 THEN $1000
ELSE $0 END as Bonus
FROM recur
WHERE NET_SPRD_LCL >= 3500
ORDER BY ROW_NUMBER() OVER(PARTITION BY EMP_ID,lev ORDER BY WK_NUM)%2
Output for the data you provided:
EMP_ID WK_NUM NET_SPRD_LCL Bonus
123 18 3800 2000,00
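As a cross-check of the expected output, here is a plain-Python restatement of the requirement as I read it. It is not a translation of the recursive CTE. It makes two assumptions:

- The rows for one employee are already in calendar order with holiday weeks removed, as the question's calendar join does.
- The later week of the first qualifying pair picks the bonus tier, as the CASE expression above does.

The function name first_bonus is hypothetical.

```python
def first_bonus(rows, goal=3500):
    """Find the first pair of consecutive rows whose spread is >= goal.

    rows: list of (emp_id, wk_num, net_sprd_lcl) tuples for one employee,
    in week order. Returns (emp_id, wk_num, net_sprd_lcl, bonus) for the
    later row of that pair, or None if no pair qualifies.
    """
    for prev, cur in zip(rows, rows[1:]):
        if prev[2] >= goal and cur[2] >= goal:
            emp_id, wk_num, spread = cur
            if wk_num < 27:
                bonus = 2000
            elif wk_num <= 34:
                bonus = 1000
            else:
                bonus = 0
            return (emp_id, wk_num, spread, bonus)
    return None


data = [(123, 10, 0), (123, 11, 1500), (123, 15, 3600), (123, 18, 3800), (123, 19, 4000)]
print(first_bonus(data))  # (123, 18, 3800, 2000)
```

The scan stops at the earliest qualifying pair. That means the "< 27 wks" tier is naturally checked before the "27 wks to 34 wks" tier, which matches the "output and quit" rule.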
I have created an emp table with the following records in it.
create table emp(
EMPNO integer,
EMPNAME varchar2(20),
SALARY number);
select * from emp;
empno empname salary
10 bill 2000
11 bill 2000
12 mark 3000
12 mark 3000
12 mark 3000
12 philip 3000
12 john 3000
13 tom 4000
14 tom 4000
14 jerry 5000
14 matt 5000
15 susan 5000
To delete duplicates I have been using the row_number() function with PARTITION BY and ORDER BY clauses, in the following query:
delete from emp where rowid in
(
select rid from
(
select rowid rid,
row_number() over(partition by empno order by empno) rn
from emp
)
where rn > 1
);
--6 rows deleted
The query deletes all but one employee record for each duplicated empno, and the result looks something like this:
empno empname salary
10 bill 2000
11 bill 2000
12 mark 3000
13 tom 4000
14 tom 4000
15 susan 5000
When I use the inner query to fetch the row numbers for all the rows in the table, it gives me the following result:
select rowid as rid,empno,empname,
row_number() over(partition by empno order by empno) rn
from emp;
rid empno empname rn
AACDJUAAPAAGLlTAAA 10 bill 1
AACDJUAAPAAGLlTAAB 11 bill 1
AACDJUAAPAAGLlTAAE 12 mark 1
AACDJUAAPAAGLlTAAD 12 mark 2
AACDJUAAPAAGLlTAAC 12 mark 3
AACDJUAAPAAGLlTAAF 12 philip 4
AACDJUAAPAAGLlTAAG 12 john 5
AACDJUAAPAAGLlTAAH 13 tom 1
AACDJUAAPAAGLlTAAI 14 tom 1
AACDJUAAPAAGLlTAAJ 14 jerry 2
AACDJUAAPAAGLlTAAK 14 matt 3
AACDJUAAPAAGLlTAAL 15 susan 1
But when I use rank() in place of the row_number() function, it gives me the following result:
select rowid as rid,empno,empname,
rank() over(partition by empno order by empno) rn
from emp;
rid empno empname rn
AACDJUAAPAAGLlTAAA 10 bill 1
AACDJUAAPAAGLlTAAB 11 bill 1
AACDJUAAPAAGLlTAAE 12 mark 1
AACDJUAAPAAGLlTAAD 12 mark 1
AACDJUAAPAAGLlTAAC 12 mark 1
AACDJUAAPAAGLlTAAF 12 philip 1
AACDJUAAPAAGLlTAAG 12 john 1
AACDJUAAPAAGLlTAAH 13 tom 1
AACDJUAAPAAGLlTAAI 14 tom 1
AACDJUAAPAAGLlTAAJ 14 jerry 1
AACDJUAAPAAGLlTAAK 14 matt 1
AACDJUAAPAAGLlTAAL 15 susan 1
So my question is: why does rank() give the same value to every record in the table even though there are duplicate empnos?
That's the way RANK() works. It would be rather surprising to get different RANK values for equal-ranking rows within a partition. The ORDER BY clause is what drives RANK within a partition, but you're ordering by the same column you partition by. Every row in a partition therefore has the same ordering value, so they all tie and every row ranks first within its partition.
See an explanation in this blog post, where this SQL (PostgreSQL syntax)
SELECT
v,
ROW_NUMBER() OVER (w) row_number,
RANK() OVER (w) rank,
DENSE_RANK() OVER (w) dense_rank
FROM t
WINDOW w AS (ORDER BY v)
ORDER BY v
... produces this output
+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a | 1 | 1 | 1 |
| a | 2 | 1 | 1 |
| a | 3 | 1 | 1 |
| b | 4 | 4 | 2 |
| c | 5 | 5 | 3 |
| c | 6 | 5 | 3 |
| d | 7 | 7 | 4 |
| e | 8 | 8 | 5 |
+---+------------+------+------------+
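Here is a minimal sketch of that point in action, run in SQLite via Python's sqlite3 (SQLite 3.25+ assumed, standing in for Oracle). It loads the question's emp rows and compares two orderings:

- By empno, which is the partition key itself.
- By empname, a column that actually varies inside each partition.

```python
import sqlite3

EMP = [(10, "bill", 2000), (11, "bill", 2000), (12, "mark", 3000),
       (12, "mark", 3000), (12, "mark", 3000), (12, "philip", 3000),
       (12, "john", 3000), (13, "tom", 4000), (14, "tom", 4000),
       (14, "jerry", 5000), (14, "matt", 5000), (15, "susan", 5000)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (empno INTEGER, empname TEXT, salary INTEGER)")
con.executemany("INSERT INTO emp VALUES (?, ?, ?)", EMP)


def ranks(order_col):
    """rank() within each empno partition, ordered by order_col."""
    # order_col is one of our fixed column names, never user input.
    return con.execute(f"""
        SELECT empno, empname,
               rank() OVER (PARTITION BY empno ORDER BY {order_col}) AS rn
        FROM emp
        ORDER BY empno, empname
    """).fetchall()


# Ordering by the partition key: every row ties, so every rank is 1.
print({rn for _, _, rn in ranks("empno")})  # {1}
# Ordering by empname: the three 'mark' rows tie at 2, so 'philip' gets 5.
for row in ranks("empname"):
    print(row)
```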
There are three "ranking" analytic functions: row_number(), rank(), and dense_rank().
These all work very similarly. They assign numbers, in order, to rows within a group. The group is defined by the partition by clause. The ordering is defined by the order by clause. The difference between the three is how they handle duplicate values.
row_number() always returns sequential numbers within the group. When there are ties, equal-valued rows still get different, sequential values (in no guaranteed order).
dense_rank() assigns sequential values with no gaps. However, equal-valued rows are given the same value, and the next distinct value gets a rank one higher.
rank() assigns sequential values with gaps. Equal valued rows have the same value but subsequent rows have a gap.
Here is an example:
value row_number dense_rank rank
a 1 1 1
b 2 2 2
b 3 2 2
b 4 2 2
c 5 3 5
d 6 4 6
d 7 4 6
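The example table can be reproduced with a short sketch in SQLite via Python's sqlite3 (SQLite 3.25+ assumed). The single-column table t holding the values a, b, b, b, c, d, d is set up only for this illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (v TEXT)")
con.executemany("INSERT INTO t VALUES (?)", [(v,) for v in "abbbcdd"])

rows = con.execute("""
    SELECT v,
           row_number() OVER (ORDER BY v) AS rn,
           dense_rank() OVER (ORDER BY v) AS dr,
           rank()       OVER (ORDER BY v) AS rk
    FROM t
    ORDER BY v, rn  -- rn breaks ties so the printed order is deterministic
""").fetchall()

print("value row_number dense_rank rank")
for row in rows:
    print(*row)
```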
I'm fairly new to MySQL and need a query I just can't figure out. Given a table like this:
emp cat date amt cum
44 e1 2009-01-01 1 1
44 e2 2009-01-02 2 2
44 e1 2009-01-03 3 4
44 e1 2009-01-07 5 9
44 e7 2009-01-04 5 5
44 e2 2009-01-04 3 5
44 e7 2009-01-05 1 6
55 e7 2009-01-02 2 2
55 e1 2009-01-05 4 4
55 e7 2009-01-03 4 6
I need to select the latest transaction (by date) per emp and per cat. The table above should produce something like:
emp cat date amt cum
44 e1 2009-01-07 5 9
44 e2 2009-01-04 3 5
44 e7 2009-01-05 1 6
55 e1 2009-01-05 4 4
55 e7 2009-01-03 4 6
I've tried something like:
select * from orders where emp=44 and cat='e1' order by date desc limit 1;
select * from orders where emp=44 and cat='e2' order by date desc limit 1;
....
but this doesn't feel right. Can anyone point me in the right direction?
This should work, but I haven't tested it.
SELECT orders.* FROM orders
INNER JOIN (
SELECT emp, cat, MAX(date) date
FROM orders
GROUP BY emp, cat
) criteria USING (emp, cat, date)
Basically, this uses a subquery to get the latest date for each emp and cat, then joins that against the original table to get all the data for that order (since amt and cum can't simply be selected alongside MAX(date) in the grouped query).
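Here is a quick way to try this answer's query against the question's sample data, run in SQLite through Python's sqlite3 as a stand-in for MySQL (an assumption). The dates are stored as ISO text, which sorts the same way as MySQL DATE values.

```python
import sqlite3

ORDERS = [
    (44, "e1", "2009-01-01", 1, 1), (44, "e2", "2009-01-02", 2, 2),
    (44, "e1", "2009-01-03", 3, 4), (44, "e1", "2009-01-07", 5, 9),
    (44, "e7", "2009-01-04", 5, 5), (44, "e2", "2009-01-04", 3, 5),
    (44, "e7", "2009-01-05", 1, 6), (55, "e7", "2009-01-02", 2, 2),
    (55, "e1", "2009-01-05", 4, 4), (55, "e7", "2009-01-03", 4, 6),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (emp INTEGER, cat TEXT, date TEXT, amt INTEGER, cum INTEGER)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", ORDERS)

latest = con.execute("""
    SELECT orders.* FROM orders
    INNER JOIN (
        SELECT emp, cat, MAX(date) date   -- latest date per (emp, cat)
        FROM orders
        GROUP BY emp, cat
    ) criteria USING (emp, cat, date)
    ORDER BY orders.emp, orders.cat
""").fetchall()

for row in latest:
    print(row)
```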
The answer given by @R.Bemrose should work, and here's another trick for comparison:
SELECT o1.*
FROM orders o1
LEFT OUTER JOIN orders o2
ON (o1.emp = o2.emp AND o1.cat = o2.cat AND o1.date < o2.date)
WHERE o2.emp IS NULL;
This assumes that the columns (emp, cat, date) form a candidate key, i.e. there can be only one row per date for a given pair of emp and cat.
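The candidate-key assumption matters when an (emp, cat) pair has two rows on its latest date. The sketch below shows this in SQLite via Python's sqlite3, using hypothetical rows built for that case. The anti-join then returns both rows, and so does the GROUP BY/join approach, since both compare only dates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (emp INTEGER, cat TEXT, date TEXT, amt INTEGER, cum INTEGER)")
# Hypothetical rows: two transactions share the latest date for (44, 'e1').
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (44, "e1", "2009-01-01", 1, 1),
    (44, "e1", "2009-01-07", 5, 6),
    (44, "e1", "2009-01-07", 2, 8),
])

# Anti-join: keep o1 when no o2 with the same emp/cat has a strictly later date.
newest = con.execute("""
    SELECT o1.*
    FROM orders o1
    LEFT OUTER JOIN orders o2
      ON (o1.emp = o2.emp AND o1.cat = o2.cat AND o1.date < o2.date)
    WHERE o2.emp IS NULL
    ORDER BY o1.amt
""").fetchall()

# GROUP BY/join approach from the previous answer, for comparison.
by_max = con.execute("""
    SELECT orders.* FROM orders
    INNER JOIN (SELECT emp, cat, MAX(date) date FROM orders GROUP BY emp, cat) criteria
    USING (emp, cat, date)
""").fetchall()

print(newest)  # [(44, 'e1', '2009-01-07', 2, 8), (44, 'e1', '2009-01-07', 5, 6)]
print(len(by_max))  # 2
```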