I am stuck on how to fetch the previous row after joining multiple tables. Below is the data set produced by the join:
CARRCD FLTNBR IND DEPDATETIME
---- -------- ----- --------
AB 123 0 2020-10-29T14:00:00
AB 124 0 2020-10-29T10:00:00
AB 119 0 2020-10-29T09:00:00
AB 100 0 2020-10-29T08:00:00
AB 105 1 2020-10-29T07:00:00 ---------> Match
AB 99 1 2020-10-29T06:00:00
AB 135 1 2020-10-29T04:00:00
AB 178 1 2020-10-29T02:00:00
Once I have the above data set, I need to find the first record whose IND equals 1 and then return the record just before it. In the data set above, the first record with IND = 1 is "AB 105", so I have to return the previous record:
AB 100 0 2020-10-29T08:00:00
Please help
If you want the first time this happens, then:
select t.*
from (select t.*,
lead(ind) over (order by depdatetime desc) as next_ind
from t
) t
where t.ind = 0 and t.next_ind = 1
order by depdatetime
fetch first 1 row only;
However, I suspect that you want this per CARRCD. If so, you need partition by and some more logic:
select t.*
from (select t.*,
row_number() over (partition by carrcd order by depdatetime) as seqnum
from (select t.*,
lead(ind) over (partition by carrcd order by depdatetime desc) as next_ind
from t
) t
where t.ind = 0 and t.next_ind = 1
) t
where seqnum = 1;
Note that the above is quite general. In particular, it works:
When ind might have more than two values.
When ind can return to 0 after 1.
When the first row is 1 (which is rejected as a candidate).
If the problem is more constrained, there are likely other solutions.
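As a quick check of the first query against the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. The table name t comes from the answer; running it on SQLite is an assumption (window functions need SQLite 3.25 or later), and because SQLite has no fetch first clause, limit 1 stands in for it.

```python
import sqlite3

# In-memory database holding the joined result from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (carrcd text, fltnbr integer, ind integer, depdatetime text)"
)
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [
        ("AB", 123, 0, "2020-10-29T14:00:00"),
        ("AB", 124, 0, "2020-10-29T10:00:00"),
        ("AB", 119, 0, "2020-10-29T09:00:00"),
        ("AB", 100, 0, "2020-10-29T08:00:00"),
        ("AB", 105, 1, "2020-10-29T07:00:00"),
        ("AB", 99, 1, "2020-10-29T06:00:00"),
        ("AB", 135, 1, "2020-10-29T04:00:00"),
        ("AB", 178, 1, "2020-10-29T02:00:00"),
    ],
)

# lead(ind) looks at the next row in descending time order, so a row with
# ind = 0 whose next_ind = 1 sits directly before the first IND = 1 match.
query = """
select carrcd, fltnbr, ind, depdatetime
from (select t.*,
             lead(ind) over (order by depdatetime desc) as next_ind
      from t
     ) t
where ind = 0 and next_ind = 1
order by depdatetime
limit 1  -- SQLite substitute for "fetch first 1 row only"
"""
previous_row = conn.execute(query).fetchall()
print(previous_row)
```

On the sample data this returns the single row for flight AB 100 at 08:00, which is the row the question asks for.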
Based on additional info you provided in comment, you can locate requested row as the one which has ind=0 and is followed by row with ind=1. This is done using lead analytical function.
Assuming t is your relation, this blindly typed SQL should work:
select CARRCD, FLTNBR, IND, DEPDATETIME
from (
select t.*, lead(IND) over (order by depdatetime desc) next_ind
from t
) x
where x.ind = 0 and x.next_ind = 1
I have the following table (just a sample) and would like to subtract the Points of Record 1 from the Points of Record 2 (Record2 - Record1), using the latest record of each. Records are entered per Match; one Match consists of two records, Record 1 and Record 2.
The expected output is 3, because the newest records are IDs 3 and 4 from Match 2 (8 - 5 = 3).
ID  Name      Points  TimeRecorded  Match
--  --------  ------  ------------  -----
1   Record 1  3       2-Mar 2pm     1
2   Record 2  5       2-Mar 2pm     1
3   Record 1  5       4-Mar 5pm     2
4   Record 2  8       4-Mar 5pm     2
I tried to get the value by subtracting one query from the other, as below. But I feel this is not a good approach, because the Match and the record Name are hard-coded. How can I construct a better query that gets the latest records of the grouped match and calculates the points by subtracting Record 1 from Record 2?
SELECT
(select Points from RunRecord where Name= 'Record2' AND Match = 2)
- (select Points from RunRecord where Name= 'Record1' AND Match = 2)
You could use:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY TimeRecorded DESC) rn
FROM yourTable
)
SELECT
MAX(CASE WHEN Name = 'Record 2' THEN Points END) -
MAX(CASE WHEN Name = 'Record 1' THEN Points END) AS diff
FROM cte
WHERE rn = 1;
The CTE assigns a row number for each group of records of the same name, with 1 being assigned to the most recent record. Then, we aggregate over the entire table and pivot out the points to find the difference.
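To see the row-number-then-pivot idea on the sample rows, here is a small sketch run through Python's built-in sqlite3 module. The table name yourTable is taken from the answer; the ISO-style timestamps (including the year) are an assumption made so that the text values sort chronologically, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Match" is quoted because MATCH is also an SQL keyword in SQLite.
conn.execute(
    'create table yourTable (ID integer, Name text, Points integer, '
    'TimeRecorded text, "Match" integer)'
)
conn.executemany(
    "insert into yourTable values (?, ?, ?, ?, ?)",
    [
        (1, "Record 1", 3, "2021-03-02 14:00", 1),  # year is assumed
        (2, "Record 2", 5, "2021-03-02 14:00", 1),
        (3, "Record 1", 5, "2021-03-04 17:00", 2),
        (4, "Record 2", 8, "2021-03-04 17:00", 2),
    ],
)

# rn = 1 marks the most recent row per Name; the conditional MAX calls then
# pull the two Points values onto one row so they can be subtracted.
query = """
with cte as (
    select *, row_number() over (partition by Name order by TimeRecorded desc) rn
    from yourTable
)
select max(case when Name = 'Record 2' then Points end) -
       max(case when Name = 'Record 1' then Points end) as diff
from cte
where rn = 1
"""
diff = conn.execute(query).fetchone()[0]
print(diff)
```

With these rows the latest Record 2 has 8 points and the latest Record 1 has 5, so the difference is 3, matching the expected output in the question.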
You can use the rank() window function to rank the records by match descending. Then take the top of the ranked records and use conditional aggregation to control the sign of the points added.
SELECT sum(CASE x.name
WHEN 'Record2' THEN
x.points
WHEN 'Record1' THEN
-x.points
END)
FROM (SELECT rr.name,
rr.points,
rank() OVER (ORDER BY rr.match DESC) r
FROM runrecord rr
WHERE name IN ('Record1',
'Record2')) x
WHERE x.r = 1;
Below is my data:
My requirement is to get the first 3 consecutive approvals. So from above data, ID 4, 5 and 6 are the rows that I need to select. ID 1 and 2 are not eligible, because ID 3 is a rejection and hence breaks the consecutive condition of actions. Basically, I am looking for the last rejection in the list and then finding the 3 consecutive approvals after that.
Also, if there are no rejections in the chain of actions then the first 3 actions should be the result. For below data:
So my output should be ID 11, 12 and 13.
And if there are less than 3 approvals, then the output should be the list of approvals. For below data:
output should be ID 21 and 22.
Is there any way to achieve this with a SQL query only, i.e. without PL/SQL code?
Here is one method that uses window functions:
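For each row, count the approvals among that row and the two rows that follow it.
Find the minimum action_at among the rows where that count is three.
Filter to approvals at or after that time (or to all approvals, if no such row exists).
Keep the first three rows.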
This version uses fetch which is in Oracle 12+:
select t.*
from (select t.*,
min(case when has_approval_3 = 3 then action_at end) over () as first_action_at
from (select t.*,
sum(case when action = 'APPROVAL' then 1 else 0 end) over (order by action_at rows between current row and 2 following) as has_approval_3
from t
) t
) t
where action = 'APPROVAL' and
(action_at >= first_action_at or first_action_at is null)
order by action_at
fetch first 3 rows only;
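The query above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the question's data was not preserved, so the rows below are hypothetical ones built from its description, the 'REJECTION' label and the id column are assumptions, and limit 3 replaces fetch first 3 rows only because SQLite (3.25 or later assumed) does not support that clause.

```python
import sqlite3

QUERY = """
select id
from (select t.*,
             min(case when has_approval_3 = 3 then action_at end) over () as first_action_at
      from (select t.*,
                   sum(case when action = 'APPROVAL' then 1 else 0 end)
                       over (order by action_at
                             rows between current row and 2 following) as has_approval_3
            from t
           ) t
     ) t
where action = 'APPROVAL' and
      (action_at >= first_action_at or first_action_at is null)
order by action_at
limit 3
"""


def first_approvals(rows):
    """Load hypothetical (id, action, action_at) rows and return the chosen ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer, action text, action_at text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


# Case 1: a rejection at id 3 breaks the run, so ids 4, 5 and 6 are expected.
with_rejection = first_approvals([
    (1, "APPROVAL", "2020-01-01"),
    (2, "APPROVAL", "2020-01-02"),
    (3, "REJECTION", "2020-01-03"),
    (4, "APPROVAL", "2020-01-04"),
    (5, "APPROVAL", "2020-01-05"),
    (6, "APPROVAL", "2020-01-06"),
])

# Case 2: no rejection at all, so the first three approvals are expected.
no_rejection = first_approvals(
    [(i, "APPROVAL", f"2020-02-{i:02d}") for i in range(11, 16)]
)

# Case 3: fewer than three approvals, so all of them are expected.
few_approvals = first_approvals([
    (21, "APPROVAL", "2020-03-01"),
    (22, "APPROVAL", "2020-03-02"),
])

print(with_rejection, no_rejection, few_approvals)
```

The three calls correspond to the three scenarios in the question: approvals after a rejection, no rejection at all, and fewer than three approvals.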
You can use a scalar subquery and the ROW_NUMBER analytic function as follows:
SELECT * FROM
( SELECT
Y.*,
ROW_NUMBER() OVER(ORDER BY Y.ACTION_AT) AS RN
FROM YOUR_TABLE Y
WHERE Y.ACTION = 'APPROVE'
AND Y.ACTION_AT >= COALESCE(
(SELECT MAX(YIN.ACTION_AT)
FROM YOUR_TABLE YIN
WHERE YIN.ACTION = 'REJECT'
), Y.ACTION_AT) )
WHERE RN <= 3;
Cheers!!
I want to identify the users who visited section a and then subsequently visited b. Given the following data structure. The table contains 300,000 rows and updates daily with approx. 8,000 rows:
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 1 b 0
2 1 b 0
1 3 b 1
Ideally I want a new column that flags the visit to section b. For example on the third visit User 1 visited section b for the first time. I was attempting to do this using a CASE WHEN statement but after many failed attempts I am not sure it is even possible with CASE WHEN and feel that I should take a different approach, I am just not sure what that approach should be. I do also have a date column at my disposal.
Any suggestions on a new way to approach the problem would be appreciated. Thanks!
Correlated sub-queries should be avoided at all cost when working with Redshift. Keep in mind there are no indexes for Redshift so you'd have to rescan and restitch the column data back together for each value in the parent resulting in an O(n^2) operation (in this particular case going from 300 thousand values scanned to 90 billion).
The best approach when you are looking to span a series of rows is to use an analytic function. There are a couple of options depending on how your data is structured but in the simplest case, you could use something like
select case
when section != lag(section) over (partition by userid order by visitid)
then 1
else 0
end
from ...
This assumes that your data for userid 2 increments the visitid as below. If not, you could also order by your timestamp column
**USERID** **VISITID** **SECTION** Desired Solution--> **Conversion**
1 1 a 0
1 2 a 0
2 1 b 0
2 *2* b 0
2 *3* b 0
1 3 b 1
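As a rough illustration of the lag() approach on the adjusted sample data, the sketch below runs it through Python's built-in sqlite3 module rather than Redshift. The table name visits is hypothetical, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table visits (userid integer, visitid integer, section text)")
conn.executemany(
    "insert into visits values (?, ?, ?)",
    [
        (1, 1, "a"),
        (1, 2, "a"),
        (2, 1, "b"),
        (2, 2, "b"),
        (2, 3, "b"),
        (1, 3, "b"),
    ],
)

# lag(section) is NULL on each user's first visit, so the comparison is NULL
# there and the CASE falls through to 0.
query = """
select userid, visitid, section,
       case
         when section != lag(section) over (partition by userid order by visitid)
         then 1
         else 0
       end as conversion
from visits
order by userid, visitid
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

Only user 1's third visit is flagged here. Note that the comparison marks any change of section, so a move from b back to a would also get a 1; an extra condition on section = 'b' would be needed if that matters.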
select t.*, case when v.ts is null then 0 else 1 end as conversion
from tbl t
left join (select *
from tbl x
where section = 'b'
and exists (select 1
from tbl y
where y.userid = x.userid
and y.section = 'a'
and y.ts < x.ts)) v
on t.userid = v.userid
and t.visitid = v.visitid
and t.section = v.section
Fiddle:
http://sqlfiddle.com/#!15/5b954/5/0
I added sample timestamp data as that field is necessary to determine whether a comes before b or after b.
To incorporate analytic functions you could use:
(I've also made it so that only the first occurrence of B (after an A) will get flagged with the 1)
select t.*,
case
when v.first_b_after_a is not null
then 1
else 0
end as conversion
from tbl t
left join (select userid, min(ts) as first_b_after_a
from (select t.*,
sum( case when t.section = 'a' then 1 end)
over( partition by userid
order by ts ) as a_sum
from tbl t) x
where section = 'b'
and a_sum is not null
group by userid) v
on t.userid = v.userid
and t.ts = v.first_b_after_a
Fiddle: http://sqlfiddle.com/#!1/fa88f/2/0
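The running-sum version can be checked locally with Python's built-in sqlite3 module. This is a sketch, not the fiddle's data: the ts values are made up, and a fourth visit for user 1 is added (an assumption) to show that only the first b after an a is flagged. SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tbl (userid integer, visitid integer, section text, ts text)"
)
conn.executemany(
    "insert into tbl values (?, ?, ?, ?)",
    [
        (1, 1, "a", "2014-01-01"),
        (1, 2, "a", "2014-01-02"),
        (2, 1, "b", "2014-01-03"),
        (2, 2, "b", "2014-01-04"),
        (2, 3, "b", "2014-01-05"),
        (1, 3, "b", "2014-01-06"),
        (1, 4, "b", "2014-01-07"),  # extra row: a later b that must stay 0
    ],
)

# a_sum counts the a visits seen so far per user; it stays NULL until the
# first a, so "a_sum is not null" keeps only b visits that follow an a.
query = """
select t.userid, t.visitid, t.section,
       case when v.first_b_after_a is not null then 1 else 0 end as conversion
from tbl t
left join (select userid, min(ts) as first_b_after_a
           from (select t.*,
                        sum(case when t.section = 'a' then 1 end)
                            over (partition by userid order by ts) as a_sum
                 from tbl t) x
           where section = 'b'
             and a_sum is not null
           group by userid) v
  on t.userid = v.userid
 and t.ts = v.first_b_after_a
order by t.userid, t.ts
"""
conversions = conn.execute(query).fetchall()
for row in conversions:
    print(row)
```

User 2 never visits a, so none of that user's rows are flagged, and user 1's second b visit stays at 0 because the join matches only the earliest qualifying timestamp.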
So I am really happy being able to rank results based on effective dates, but currently I'm having an issue where one data element repeats (POD) while another changes based on EFFDT (DEPT).
I only want to rank unique values for Pod, and later Dept. However Pod is based on Dept, which changes more frequently. The below code gives me:
EENBR PodRank POD DeptRank DeptNbr DeptEffdt
100 1 73 1 12420 4/11/2005
100 2 73 2 12560 5/22/2005
100 3 73 3 12501 6/24/2007
200 1 12 1 50768 3/14/2005
200 2 13 2 10949 9/9/2012
300 1 73 1 12450 3/21/2005
300 2 73 2 12471 12/25/2005
300 3 73 3 12581 12/21/2008
300 4 73 4 12585 6/6/2010
300 5 73 5 12432 5/19/2013
SELECT DISTINCT
AL4.FULL_NAME,
AL4.EMPLOYEE_NUMBER,
dense_rank() over (partition by AL4.EMPLOYEE_NUMBER
order by AL3.EFFECTIVE_START_DATE) as POD_RANKING,
AL7.POD_NBR as POD,
row_number() over (partition by AL4.EMPLOYEE_NUMBER
order by AL3.EFFECTIVE_START_DATE) as DEPT_RANKING,
AL3.RECORDVALUE AS DEPT_NUMBER,
AL3.EFFECTIVE_START_DATE AS "DEPT EFFECTIVE DATE"
FROM T1 AL3,
T2 AL4,
T3 AL7
WHERE AL4.PERSON_ID = AL3.PERSON_ID
AND AL4.EMPLOYEE_NUMBER = AL3.EMPLOYEE_NUMBER
AND AL3.RECORDTYPE = 'DEPARTMENT_NUMBER'
AND AL7.DEPT_NBR = AL3.RECORDVALUE
Order By AL4.Employee_Number;
Is there a function that only ranks unique values?
The function you are looking for is the analytic function dense_rank():
dense_rank() over (partition by eenbr order by pod) as ranking
This is the simplest way to get what you want. You can just add it in the select clause of your query.
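To show what this ranking produces, here is a small sketch using Python's built-in sqlite3 module with the rows for employees 100 and 200 from the question. The flat table emp and its column names are hypothetical stand-ins for the three-table join, the dates are rewritten in ISO form, and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table emp (eenbr integer, pod integer, deptnbr integer, depteffdt text)"
)
conn.executemany(
    "insert into emp values (?, ?, ?, ?)",
    [
        (100, 73, 12420, "2005-04-11"),
        (100, 73, 12560, "2005-05-22"),
        (100, 73, 12501, "2007-06-24"),
        (200, 12, 50768, "2005-03-14"),
        (200, 13, 10949, "2012-09-09"),
    ],
)

# dense_rank() gives tied pod values the same rank and leaves no gaps,
# so repeated pods within an employee do not advance the ranking.
query = """
select eenbr, pod, deptnbr,
       dense_rank() over (partition by eenbr order by pod) as ranking
from emp
order by eenbr, depteffdt
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```

Employee 100 keeps rank 1 on all three rows because the pod never changes, while employee 200 moves from 1 to 2 when the pod changes from 12 to 13. The ranking follows the pod value itself, not the effective date.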
There's no function for this, but you can get the result when you use nested window functions:
SELECT dt.*,
SUM(flag) OVER (PARTITION BY EMPLOYEE_NUMBER
ORDER BY "DEPT EFFECTIVE DATE") AS POD_RANKING
FROM
(
SELECT
AL4.FULL_NAME,
AL4.EMPLOYEE_NUMBER,
AL7.POD_NBR AS POD,
ROW_NUMBER() OVER (PARTITION BY AL4.EMPLOYEE_NUMBER
ORDER BY AL3.EFFECTIVE_START_DATE) AS DEPT_RANKING,
AL3.RECORDVALUE AS DEPT_NUMBER,
AL3.EFFECTIVE_START_DATE AS "DEPT EFFECTIVE DATE",
CASE WHEN ROW_NUMBER()
OVER (PARTITION BY AL4.EMPLOYEE_NUMBER,AL7.POD_NBR
ORDER BY AL3.EFFECTIVE_START_DATE) = 1 THEN 1 ELSE 0 END AS flag
FROM T1 AL3,
T2 AL4,
T3 AL7
WHERE AL4.PERSON_ID = AL3.PERSON_ID
AND AL4.EMPLOYEE_NUMBER = AL3.EMPLOYEE_NUMBER
AND AL3.RECORDTYPE = 'DEPARTMENT_NUMBER'
AND AL7.DEPT_NBR = AL3.RECORDVALUE
) dt
ORDER BY EMPLOYEE_NUMBER;
Edit:
Ok, I noticed this is an overly complex version of a simple DENSE_RANK with a different order, shortly before Gordon posted his answer :-)
dense_rank() over (partition by AL4.EMPLOYEE_NUMBER order by AL7.POD_NBR)
Correction to my question....
I'm trying to select and sort in a query from a single table. The primary key for the table is a combination of a serialized number and a time/date stamp.
The table's name in the database is "A12", the columns are defined as:
Serial2D (PK, char(25), not null)
Completed (PK, datetime, not null)
Result (smallint, null)
MachineID (FK, smallint, null)
PT_1 (float, null)
PT_2 (float, null)
PT_3 (float, null)
PT_4 (float, null)
Since the primary key for the table is a combination of the "Serial2D" and "Completed", there can be multiple "Serial2D" entries with different values in the "Completed" and "Result" columns. (I did not make this database... I have to work with what I got)
I want to write a query that will use the value of the "Result" column (always a "0" or "1") and retrieve only one row for each "Serial2D" value. If the "Result" column has a "1" for a row, I want to choose it over any entries for that serial that have a "0" in the Result column. There should be only one entry in the table that has a Result column entry of "1" for any Serial2D value.
Ex. table
Serial2d Completed Result PT_1 PT_2 PT_3 PT_4
------- ------- ------ ---- ---- ---- ----
A1 1:00AM 0 32.5 20 26 29
A1 1:02AM 0 32.5 10 29 40
A1 1:03AM 1 10 5 4 3
B1 1:04AM 0 29 4 1 9
B1 1:05AM 0 40 3 4 9
C1 1:06AM 1 9 7 6 4
What I would like to be able to retrieve is:
Serial2d Completed Result PT_1 PT_2 PT_3 PT_4
------- ------- ------ ---- ---- ---- ----
A1 1:03AM 1 10 5 4 3
B1 1:05AM 0 40 3 4 9
C1 1:06AM 1 9 7 6 4
I'm new to SQL and I'm still learning ALL the syntax. I'm finding it difficult to search for the correct operators to use since I'm not sure what I need, so please forgive my ignorance. A post with my answer could be staring me right in the face and I wouldn't know it, so please just point me to it.
I appreciate the answers to my previous post, but the answers weren't sufficient for me due to MY lack of information and ineptness with SQL. I know this is probably insanely easy for some, but try to remember when you first started SQL... that's where I'm at.
Since you are using SQL Server, you can use Windowing Functions to get this data.
Using a sub-query:
select *
from
(
select *,
row_number() over(partition by serial2d
order by result desc, completed desc) rn
from a12
) x
where rn = 1
See SQL Fiddle with Demo
Or you can use CTE for this query:
;with cte as
(
select *,
row_number() over(partition by serial2d
order by result desc, completed desc) rn
from a12
)
select *
from cte c
where rn = 1;
See SQL Fiddle With Demo
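For a quick local check of the row_number() approach, the sketch below uses Python's built-in sqlite3 module instead of SQL Server. The PT_1 to PT_4 columns are left out for brevity, and the Completed values are written as 24-hour text (an assumption) so that they sort correctly; SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table a12 (serial2d text, completed text, result integer, "
    "primary key (serial2d, completed))"
)
conn.executemany(
    "insert into a12 values (?, ?, ?)",
    [
        ("A1", "01:00", 0),
        ("A1", "01:02", 0),
        ("A1", "01:03", 1),
        ("B1", "01:04", 0),
        ("B1", "01:05", 0),
        ("C1", "01:06", 1),
    ],
)

# Ordering by result desc puts a Result = 1 row first when one exists;
# completed desc then breaks ties in favour of the latest entry.
query = """
select serial2d, completed, result
from (select *,
             row_number() over (partition by serial2d
                                order by result desc, completed desc) rn
      from a12
     ) x
where rn = 1
order by serial2d
"""
picked = conn.execute(query).fetchall()
for row in picked:
    print(row)
```

This yields one row per serial: the Result = 1 row for A1 and C1, and the latest Result = 0 row for B1, in line with the desired output in the question.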
You can group by Serial to get the MAX of each Time.
SELECT Serial, MAX([Time]) AS [Time]
FROM myTable
GROUP BY Serial
HAVING MAX(Result) >= 0
SELECT
t.Serial,
max_Result,
MAX([time]) AS max_time
FROM
myTable t inner join
(SELECT
Serial,
MAX([Result]) AS max_Result
FROM
myTable
GROUP BY
Serial) m on
t.serial = m.serial and
t.result = m.max_result
group by
t.serial,
max_Result
This can be solved using a correlated sub-query:
SELECT
T.serial,
T.[time],
T.result
FROM tablename T
WHERE
T.result = 1
OR
NOT EXISTS(
SELECT 1
FROM tablename
WHERE
serial = T.serial
AND (
[time] > T.[time]
OR
result = 1
)
)