SQL - display selective output that is not sequenced last in a multiple occurrence

I have a table like below
OP    OP_var  SPS    SPS_sq
1010  01      KEB_x  01
1010  01      KEK_x  02
1010  02      KEH_c  01
1010  02      KEK_y  02
1010  02      KEB_d  03
1020  01      KEK_f  01
1020  01      KEE_g  02
The OP column has a variance (OP_var), and within it is a group of SPS rows. SPS_sq is the sequencing of these SPS lines within the OP+OP_var.
I would like to display KEK% rows where the KEK%'s SPS_sq is not last (meaning the KEK% is either first or anywhere in the middle of the sequence for the OP and OP_var, as long as it is not last).
The output should look like this :
OP    OP_var  SPS    SPS_sq
1010  02      KEK_y  02
1020  01      KEK_f  01
Ignore all KEK% rows whose SPS_sq is last within the OP+OP_var.

I assume you're looking for a random row per (op, op_var) combination. The random row has to have an SPS like 'KEK%', and it cannot have the same SPS as the last row (which implies it cannot be the last row itself).
This example uses window functions, which are available in SQL Server, Oracle, and Postgres. It uses a SQL Server-specific trick (newid()) to create a random order.
select *
from   (
       select row_number() over (
                  partition by yt1.OP, yt1.OP_var
                  order by newid()) as rn2 -- Random order
       ,      yt1.*
       from   dbo.YourTable yt1
       join   (
              select row_number() over (
                         partition by OP, OP_var
                         order by SPS_sq desc) as rn
              ,      *
              from   dbo.YourTable
              ) as last_row
       on     yt1.OP = last_row.OP
              and yt1.OP_var = last_row.OP_var
              and last_row.rn = 1 -- Highest SPS_sq
       where  yt1.SPS <> last_row.SPS
              and yt1.SPS like 'KEK%'
       ) SubQueryAlias
where  rn2 = 1 -- Random KEK row that doesn't share SPS with last row
Example at SQL Fiddle.

If you want all KEK rows where the KEK is not at the max sps_sq for (op, op_var):
select *
from   Table1 t
where  t.sps like 'KEK%'
and    not exists
       (select null
        from   Table1 t1
        inner join (select MAX(t2.sps_sq) as maxsps_sq, t2.op, t2.op_var
                    from   Table1 t2
                    group by t2.op, t2.op_var) as getmax
        on     t1.op = getmax.op
        and    t1.op_var = getmax.op_var
        and    t1.sps_sq = getmax.maxsps_sq
        where  t1.op = t.op
        and    t1.op_var = t.op_var
        and    t1.sps = t.sps
        and    t1.sps_sq = t.sps_sq
       );
Caution:
As Andomar noticed, this will take all the KEK% rows for an [op, op_var] which don't have the last sps_sq number.
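Both answers chase the same goal: KEK% rows that are not last in their (OP, OP_var) group. A minimal sketch of that logic, using SQLite (3.25+, for window function support) and the question's table and column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (OP TEXT, OP_var TEXT, SPS TEXT, SPS_sq TEXT);
INSERT INTO YourTable VALUES
  ('1010','01','KEB_x','01'), ('1010','01','KEK_x','02'),
  ('1010','02','KEH_c','01'), ('1010','02','KEK_y','02'),
  ('1010','02','KEB_d','03'), ('1020','01','KEK_f','01'),
  ('1020','01','KEE_g','02');
""")

# Compare each row's SPS_sq against the partition maximum; keep the KEK%
# rows that are not last in their (OP, OP_var) group.
rows = conn.execute("""
    SELECT OP, OP_var, SPS, SPS_sq
    FROM (SELECT *, MAX(SPS_sq) OVER (PARTITION BY OP, OP_var) AS last_sq
          FROM YourTable)
    WHERE SPS LIKE 'KEK%' AND SPS_sq <> last_sq
    ORDER BY OP, OP_var
""").fetchall()
print(rows)  # [('1010', '02', 'KEK_y', '02'), ('1020', '01', 'KEK_f', '01')]
```

This reproduces exactly the output the question asks for: KEK_x in 1010/01 is excluded because its sequence number is the group's last.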

Related

Two layered SQL Filter on single column

I am trying to find a better alternative to the common approach (probably why there is a typical answer) of checking IN (subquery ordered by column A) when looking for a latest version. A sample follows.
SELECT v.id
FROM   tblitemversion v,
       tblitem i
WHERE  v.id = i.id
AND    v.versionid IN
       ( -- latest revision
        SELECT v2.versionid
        FROM   tblversionextended ve,
               tblitemversion v2
        WHERE  ve.id = v2.id
        AND    v2.secondaryid = ir.secondaryid
        ORDER BY ve.datecreated DESC,
                 ve.lastupdated DESC,
                 v2.versionid DESC
        OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY)
ID  VersionID
A1  00
A1  01
A1  01V1
A1  01Z1
B1  00
C1  00
C1  01
C1  02
C1  03
What I would like to do is write it so that I can have two filters. For IDs like B1 and C1 I only go by the max value, since it is numeric. For A1 I know that 01V1 and 01Z1 can only be distinguished by the created date from the joined table, so I have to include that in the filter. I was curious whether there is a conditional or better approach than the existing query that would increase performance (this runs against millions of records).
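One common alternative to the ordered-IN subquery is a single ROW_NUMBER() pass per id. The sketch below uses a simplified, hypothetical single-table version of the schema (the real query joins tblversionextended for datecreated; the dates here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblitemversion (id TEXT, versionid TEXT, datecreated TEXT);
INSERT INTO tblitemversion VALUES
  ('A1','00','2020-01-01'), ('A1','01','2020-02-01'),
  ('A1','01V1','2020-03-01'), ('A1','01Z1','2020-04-01'),
  ('B1','00','2020-01-01'),
  ('C1','00','2020-01-01'), ('C1','01','2020-02-01'),
  ('C1','02','2020-03-01'), ('C1','03','2020-04-01');
""")

# Rank versions per id by created date, with versionid as a tie-break;
# one scan replaces the correlated ordered-IN subquery per row.
rows = conn.execute("""
    SELECT id, versionid
    FROM (SELECT *, ROW_NUMBER() OVER (
              PARTITION BY id
              ORDER BY datecreated DESC, versionid DESC) AS rn
          FROM tblitemversion)
    WHERE rn = 1
    ORDER BY id
""").fetchall()
print(rows)  # [('A1', '01Z1'), ('B1', '00'), ('C1', '03')]
```

The ORDER BY inside the window covers both filters at once: the date handles cases like 01V1 vs 01Z1, and versionid breaks ties for purely numeric ids.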

To verify an ID passes through all previous stages

id  current_stage  previous_stage
1   06             05
1   06             03
2   04             03
2   04             02
Suppose there are 5 stages for an id (02, 03, 04, etc.). An id should go through each of the stages. In the example, id=1 skips stages 04 and 02, but id=2 passes through all of them (current stage, then -1, -2, and so on).
I have to identify the ids which skip stages. I need to do it with a PostgreSQL query.
Assuming that current/previous is unique per id:
select id
from tab
group by id
having min(previous_stage) <> 2 -- doesn't start with 2
or max(current_stage) - 2 <> count(*) -- there's at least one missing stage
Edit:
Apply distinct if previous_stage is not unique within id:
select id
from tab
group by id
having min(previous_stage) <> 2 -- doesn't start with 2
or max(current_stage) - 2 <> count(distinct previous_stage) -- there's at least one missing stage
Edit:
My previous queries had a wrong logic, it should have been or instead of and.
This should cover your requirements:
select id
from tab
group by id
having not
-- these are the correct ids
( min(previous_stage) = 2 -- start with 2
and max(current_stage) - min(previous_stage) = count(distinct previous_stage) -- no missing steps
and max(previous_stage) = max(current_stage) -1 -- no greater step
)
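As a quick check of that final HAVING logic, here it is run over the sample data in SQLite (integer stage values assumed; the query behaves the same in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (id INT, current_stage INT, previous_stage INT);
INSERT INTO tab VALUES (1, 6, 5), (1, 6, 3), (2, 4, 3), (2, 4, 2);
""")

# Flag ids that skip stages: a correct id starts at 2, has no gaps,
# and its highest previous stage is current_stage - 1.
rows = conn.execute("""
    SELECT id
    FROM tab
    GROUP BY id
    HAVING NOT
        ( MIN(previous_stage) = 2
          AND MAX(current_stage) - MIN(previous_stage) = COUNT(DISTINCT previous_stage)
          AND MAX(previous_stage) = MAX(current_stage) - 1 )
""").fetchall()
print(rows)  # [(1,)]
```

Only id=1 is returned: its lowest previous stage is 3, so it fails the "starts with 2" test, while id=2 satisfies all three conditions.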

Find out the last updated record in my DB using MAX in CASE statement

I have an APPLICATIONSTATUSLOG_ID primary key field on my table.
In order to find the last updated record in my DB, MAX(APPLICATIONSTATUSLOG_ID) is presumed to identify the most recent record.
I tried this code:
SELECT
MAX(CASE WHEN MAX(d.ApplicationStatusLog_ID) = d.ApplicationStatusLog_ID THEN d.ApplicationStatusID END) AS StatusID,
FROM
ApplicationStatusLog d
But I get error:
Msg 130, Level 15, State 1, Line 53 Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
My table looks like
ApplicationID - ApplicationStatusID - ApplicationStatusLogID
10000 17 100
10000 08 101
10000 10 102
10001 06 103
10001 10 104
10002 06 105
10002 07 106
My output should be:
10000 10
10001 10
10002 07
Please help me understand and resolve my problem.
If you just want to find the last updated row, given that it has the max value in the APPLICATIONSTATUSLOG_ID column, the query would be:
SELECT *
FROM ApplicationStatusLog
WHERE ApplicationStatusLog_ID = (SELECT MAX(ApplicationStatusLog_ID) FROM ApplicationStatusLog )
EDIT
So as you stated in comment, the query for it will be:
DECLARE @statusId INT
SELECT @statusId = STATUSID
FROM ApplicationStatusLog
WHERE ApplicationStatusLog_ID = (SELECT MAX(ApplicationStatusLog_ID) FROM ApplicationStatusLog)
EDIT 2:
The query as per your edit in question will be:
WITH C AS
(
    SELECT ApplicationID, ApplicationStatusID, ApplicationStatusLogID,
           ROW_NUMBER() OVER (PARTITION BY ApplicationID ORDER BY ApplicationStatusLogID DESC) AS ranking
    FROM ApplicationStatusLog
)
SELECT ApplicationID, ApplicationStatusID
FROM C
WHERE ranking = 1
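That ROW_NUMBER() CTE can be sanity-checked against the sample data; a minimal SQLite run (integer status values assumed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ApplicationStatusLog
  (ApplicationID INT, ApplicationStatusID INT, ApplicationStatusLogID INT);
INSERT INTO ApplicationStatusLog VALUES
  (10000, 17, 100), (10000,  8, 101), (10000, 10, 102),
  (10001,  6, 103), (10001, 10, 104),
  (10002,  6, 105), (10002,  7, 106);
""")

# Rank rows per application by log id (newest first) and keep the top row.
rows = conn.execute("""
    WITH C AS (
        SELECT ApplicationID, ApplicationStatusID,
               ROW_NUMBER() OVER (PARTITION BY ApplicationID
                                  ORDER BY ApplicationStatusLogID DESC) AS ranking
        FROM ApplicationStatusLog
    )
    SELECT ApplicationID, ApplicationStatusID
    FROM C
    WHERE ranking = 1
    ORDER BY ApplicationID
""").fetchall()
print(rows)  # [(10000, 10), (10001, 10), (10002, 7)]
```

This matches the output the question asks for: the latest status per ApplicationID, taken from the row with the highest log id.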
You can join the same table twice, like this:
select IT.JoiningID, JT.MAXAPPLICATIONSTATUSID
from dbo.[Table] IT
inner join (
    select JoiningID, MAX(APPLICATIONSTATUSID) AS MAXAPPLICATIONSTATUSID
    from dbo.[Table]
    group by JoiningID
) JT on IT.JoiningID = JT.JoiningID
Now you have MAXAPPLICATIONSTATUSID per ID, so you can write what you want based on MAXAPPLICATIONSTATUSID.
Without the full query:
SELECT
    x.StatusId
    ...
FROM <Table> a
CROSS APPLY
(
    SELECT x.APPLICATIONSTATUSID AS StatusId
    FROM <Table> x
    GROUP BY x.APPLICATIONSTATUSID
    HAVING MAX(x.APPLICATIONSTATUSLOG_ID) = a.APPLICATIONSTATUSLOG_ID
) x

How can I select rows where number went down as date went up

If I have a program at a repair shop and I want to select all of the cars in my RepairOrder table where the mileage of the later repair order is less than the mileage of the prior repair order, how can I build that select statement?
ID VehicleID Mileage RepairDate
01 1 18425 2013-08-13
02 1 28952 2013-02-26
03 2 22318 2012-08-27
04 3 21309 2012-08-07
05 3 16311 2012-02-27
06 3 16310 2012-02-11
07 4 11098 2011-03-23
08 5 21309 2012-08-07
09 5 16309 2012-02-27
10 5 16310 2012-02-11
In this case I should only be selecting VehicleID 1, because it has a RepairDate that is greater than the previous row but a Mileage that is less than the previous row. There could also be 3 rows for the same vehicle where the middle date has a mileage of 3 or 5,000,000, and I will need to select those VehicleIDs as well.
Results from using the LEAD() function
ID RepairDate Mileage
25 2011-12-23 45934
48 2009-02-26 13
48 2009-04-24 10
71 2011-07-26 31163
71 2015-01-13 65656
This is a great place to use the LEAD() function, available in SQL Server 2012 and later.
WITH NextM as (
SELECT
* ,
LEAD(Mileage, 1, null) over (partition by VehicleID order by RepairDate) NextMileage
FROM RepairOrder
)
SELECT *
FROM NextM
WHERE Mileage > NextMileage
My solution shows all columns so you can check which rows have the problem.
Also, I avoid using DISTINCT because, as the OP suggests, there may be several mistakes for the same car, and this way you can see them all.
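The LEAD() approach is easy to verify; here is a small SQLite (3.25+) run over the sample rows. Note that vehicle 5 also has a mileage drop in the sample data (16310 on 2012-02-11, then 16309 on 2012-02-27), so it is flagged along with vehicle 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RepairOrder (ID TEXT, VehicleID INT, Mileage INT, RepairDate TEXT);
INSERT INTO RepairOrder VALUES
  ('01', 1, 18425, '2013-08-13'), ('02', 1, 28952, '2013-02-26'),
  ('03', 2, 22318, '2012-08-27'),
  ('04', 3, 21309, '2012-08-07'), ('05', 3, 16311, '2012-02-27'),
  ('06', 3, 16310, '2012-02-11'),
  ('07', 4, 11098, '2011-03-23'),
  ('08', 5, 21309, '2012-08-07'), ('09', 5, 16309, '2012-02-27'),
  ('10', 5, 16310, '2012-02-11');
""")

# Pair every repair with the vehicle's next visit by date, and flag
# vehicles where mileage is higher than the next visit's mileage.
rows = conn.execute("""
    WITH NextM AS (
        SELECT *, LEAD(Mileage) OVER (PARTITION BY VehicleID
                                      ORDER BY RepairDate) AS NextMileage
        FROM RepairOrder
    )
    SELECT DISTINCT VehicleID FROM NextM
    WHERE Mileage > NextMileage
    ORDER BY VehicleID
""").fetchall()
print(rows)  # [(1,), (5,)]
```

Dropping the DISTINCT (and selecting *) gives the answer's variant, which shows every offending row rather than just the vehicle ids.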
It's not terribly efficient, but you could do a pairwise selection:
select t1.VehicleID
from table t1, table t2
where t1.VehicleId = t2.VehicleId
AND t1.Mileage > t2.Mileage
AND t1.RepairDate < t2.RepairDate
There is likely a better solution, as pairwise selections get extremely slow, but this should work as-is.
select distinct RO.VehicleID
from RepairOrder RO
where exists(select *
from RepairOrder
where ID != RO.ID
and VehicleID = RO.VehicleID and RepairDate > RO.RepairDate
and Mileage < RO.Mileage);
WITH RepairSeqs AS (
    SELECT
        DateSeq = ROW_NUMBER() OVER (PARTITION BY VehicleID ORDER BY RepairDate),
        MileageSeq = ROW_NUMBER() OVER (PARTITION BY VehicleID ORDER BY Mileage),
        *
    FROM dbo.RepairOrder
)
SELECT *
FROM RepairSeqs
WHERE DateSeq <> MileageSeq;
select distinct t.VehicleId
from (
select t.*, LEAD(Mileage) OVER (Partition by VehicleId ORDER BY RepairDate) LeadMileageValue
from RepairOrder t
) t
where t.Mileage > t.LeadMileageValue

Aggregate adjacent only records with T-SQL

I have (simplified for the example) a table with the following data
Row Start Finish ID Amount
--- --------- ---------- -- ------
1 2008-10-01 2008-10-02 01 10
2 2008-10-02 2008-10-03 02 20
3 2008-10-03 2008-10-04 01 38
4 2008-10-04 2008-10-05 01 23
5 2008-10-05 2008-10-06 03 14
6 2008-10-06 2008-10-07 02 3
7 2008-10-07 2008-10-08 02 8
8 2008-10-08 2008-11-08 03 19
The dates represent a period in time, the ID is the state a system was in during that period and the amount is a value related to that state.
What I want to do is to aggregate the Amounts for adjacent rows with the same ID number, but keep the same overall sequence so that contiguous runs can be combined. Thus I want to end up with data like:
Row Start Finish ID Amount
--- --------- ---------- -- ------
1 2008-10-01 2008-10-02 01 10
2 2008-10-02 2008-10-03 02 20
3 2008-10-03 2008-10-05 01 61
4 2008-10-05 2008-10-06 03 14
5 2008-10-06 2008-10-08 02 11
6 2008-10-08 2008-11-08 03 19
I am after a T-SQL solution that can be put into an SP; however, I can't see how to do that with simple queries. I suspect it may require iteration of some sort, but I don't want to go down that path.
The reason I want to do this aggregation is that the next step in the process is to do a SUM() and Count() grouped by the unique ID's that occur within the sequence, so that my final data will look something like:
ID Counts Total
-- ------ -----
01 2 71
02 2 31
03 2 33
However if I do a simple
SELECT COUNT(ID), SUM(Amount) FROM data GROUP BY ID
On the original table I get something like
ID Counts Total
-- ------ -----
01 3 71
02 3 31
03 2 33
Which is not what I want.
If you read the book "Developing Time-Oriented Database Applications in SQL" by R T Snodgrass (the pdf of which is available from his web site under publications), and get as far as Figure 6.25 on p165-166, you will find the non-trivial SQL which can be used in the current example to group the various rows with the same ID value and continuous time intervals.
The query development below is close to correct, but there is a problem spotted right at the end, that has its source in the first SELECT statement. I've not yet tracked down why the incorrect answer is being given. [If someone can test the SQL on their DBMS and tell me whether the first query works correctly there, it would be a great help!]
It looks something like:
-- Derived from Figure 6.25 from Snodgrass "Developing Time-Oriented
-- Database Applications in SQL"
CREATE TABLE Data
(
Start DATE,
Finish DATE,
ID CHAR(2),
Amount INT
);
INSERT INTO Data VALUES('2008-10-01', '2008-10-02', '01', 10);
INSERT INTO Data VALUES('2008-10-02', '2008-10-03', '02', 20);
INSERT INTO Data VALUES('2008-10-03', '2008-10-04', '01', 38);
INSERT INTO Data VALUES('2008-10-04', '2008-10-05', '01', 23);
INSERT INTO Data VALUES('2008-10-05', '2008-10-06', '03', 14);
INSERT INTO Data VALUES('2008-10-06', '2008-10-07', '02', 3);
INSERT INTO Data VALUES('2008-10-07', '2008-10-08', '02', 8);
INSERT INTO Data VALUES('2008-10-08', '2008-11-08', '03', 19);
SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS (SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS (SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS (SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)));
The output from that query is:
01 2008-10-01 2008-10-02
01 2008-10-03 2008-10-05
02 2008-10-02 2008-10-03
02 2008-10-06 2008-10-08
03 2008-10-05 2008-10-06
03 2008-10-05 2008-11-08
03 2008-10-08 2008-11-08
Edited: There's a problem with the penultimate row - it should not be there. And I'm not clear (yet) where it is coming from.
Now we need to treat that complex expression as a query expression in the FROM clause of another SELECT statement, which will sum the amount values for a given ID over the entries that overlap with the maximal ranges shown above.
SELECT M.ID, M.Start, M.Finish, SUM(D.Amount)
FROM Data AS D,
(SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS (SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS (SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS (SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish)
OR (T2.Start <= L.Finish AND L.Finish < T2.Finish)))) AS M
WHERE D.ID = M.ID
AND M.Start <= D.Start
AND M.Finish >= D.Finish
GROUP BY M.ID, M.Start, M.Finish
ORDER BY M.ID, M.Start;
This gives:
ID Start Finish Amount
01 2008-10-01 2008-10-02 10
01 2008-10-03 2008-10-05 61
02 2008-10-02 2008-10-03 20
02 2008-10-06 2008-10-08 11
03 2008-10-05 2008-10-06 14
03 2008-10-05 2008-11-08 33 -- Here be trouble!
03 2008-10-08 2008-11-08 19
Edited: This is almost the correct data set on which to do the COUNT and SUM aggregation requested by the original question, so the final answer is:
SELECT I.ID, COUNT(*) AS Number, SUM(I.Amount) AS Amount
FROM (SELECT M.ID, M.Start, M.Finish, SUM(D.Amount) AS Amount
FROM Data AS D,
(SELECT DISTINCT F.ID, F.Start, L.Finish
FROM Data AS F, Data AS L
WHERE F.Start < L.Finish
AND F.ID = L.ID
-- There are no gaps between F.Finish and L.Start
AND NOT EXISTS
(SELECT *
FROM Data AS M
WHERE M.ID = F.ID
AND F.Finish < M.Start
AND M.Start < L.Start
AND NOT EXISTS
(SELECT *
FROM Data AS T1
WHERE T1.ID = F.ID
AND T1.Start < M.Start
AND M.Start <= T1.Finish))
-- Cannot be extended further
AND NOT EXISTS
(SELECT *
FROM Data AS T2
WHERE T2.ID = F.ID
AND ((T2.Start < F.Start AND F.Start <= T2.Finish) OR
(T2.Start <= L.Finish AND L.Finish < T2.Finish)))
) AS M
WHERE D.ID = M.ID
AND M.Start <= D.Start
AND M.Finish >= D.Finish
GROUP BY M.ID, M.Start, M.Finish
) AS I
GROUP BY I.ID
ORDER BY I.ID;
id number amount
01 2 71
02 2 31
03 3 66
Review:
Oh! Drat...the entry for 3 has twice the 'amount' that it should have. Previous 'edited' parts indicate where things started to go wrong. It looks as though either the first query is subtly wrong (maybe it is intended for a different question), or the optimizer I'm working with is misbehaving. Nevertheless, there should be an answer closely related to this that will give the correct values.
For the record: tested on IBM Informix Dynamic Server 11.50 on Solaris 10. However, should work fine on any other moderately standard-conformant SQL DBMS.
Probably need to create a cursor and loop through the results, keeping track of which id you are working with and accumulating the data along the way. When the id changes you can insert the accumulated data into a temporary table and return the table at the end of the procedure (select all from it). A table-based function might be better as you can then just insert into the return table as you go along.
I suspect that it may require iteration of some sort but I don't want to go down that path.
I think that's the route you'll have to take, use a cursor to populate a table variable. If you have a large number of records you could use a permanent table to store the results then when you need to retrieve the data you could process only the new data.
I would add a bit field with a default of 0 to the source table to keep track of which records have been processed. Assuming no one is using select * on the table, adding a column with a default value won't affect the rest of your application.
Add a comment to this post if you want help coding the solution.
Well, I decided to go down the iteration route using a mixture of joins and cursors. By JOINing the data table against itself I can create a linked list of only those records that are consecutive.
INSERT INTO #CONSEC
SELECT a.ID, a.Start, b.Finish, b.Amount
FROM Data a JOIN Data b
ON (a.Finish = b.Start) AND (a.ID = b.ID)
Then I can unwind the list by iterating over it with a cursor, doing updates back to the Data table to adjust the amounts (and deleting the now-extraneous records from the Data table):
DECLARE @ID CHAR(2), @Start DATE, @Finish DATE, @Amount INT
DECLARE @ID_Last CHAR(2), @Start_Last DATE, @Finish_Last DATE
DECLARE @Total INT = 0

DECLARE CCursor CURSOR FOR
    SELECT ID, Start, Finish, Amount FROM #CONSEC ORDER BY Start DESC

OPEN CCursor
FETCH NEXT FROM CCursor INTO @ID, @Start, @Finish, @Amount
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Total = @Total + @Amount
    SET @Start_Last = @Start
    SET @Finish_Last = @Finish
    SET @ID_Last = @ID
    DELETE FROM Data WHERE Start = @Finish
    FETCH NEXT FROM CCursor INTO @ID, @Start, @Finish, @Amount
    IF (@ID_Last <> @ID) OR (@Finish <> @Start_Last)
    BEGIN
        UPDATE Data
        SET Amount = Amount + @Total
        WHERE Start = @Start_Last
        SET @Total = 0
    END
END
CLOSE CCursor
DEALLOCATE CCursor
This all works and has acceptable performance for typical data that I am using.
I did find one small issue with the above code. Originally I was updating the Data table on each pass through the cursor loop, but this didn't work: it seems you can only do one update on a record, and multiple updates (intended to keep accumulating data) revert back to reading the original contents of the record.
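For reference, on any DBMS with window functions the same adjacent-run aggregation can be done set-based with the classic gaps-and-islands trick: within a contiguous run of one ID, the difference between a global row number and a per-ID row number is constant. A minimal SQLite (3.25+) sketch over the sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data (Start TEXT, Finish TEXT, ID TEXT, Amount INT);
INSERT INTO Data VALUES
  ('2008-10-01','2008-10-02','01',10), ('2008-10-02','2008-10-03','02',20),
  ('2008-10-03','2008-10-04','01',38), ('2008-10-04','2008-10-05','01',23),
  ('2008-10-05','2008-10-06','03',14), ('2008-10-06','2008-10-07','02', 3),
  ('2008-10-07','2008-10-08','02', 8), ('2008-10-08','2008-11-08','03',19);
""")

# Grouping by (ID, row-number difference) collapses each contiguous run.
runs = conn.execute("""
    WITH g AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY Start)
                - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Start) AS grp
        FROM Data
    )
    SELECT ID, MIN(Start), MAX(Finish), SUM(Amount)
    FROM g GROUP BY ID, grp ORDER BY MIN(Start)
""").fetchall()
print(runs[2])  # ('01', '2008-10-03', '2008-10-05', 61)

# The per-ID counts and totals the question asks for, over collapsed runs:
totals = conn.execute("""
    WITH g AS (
        SELECT *, ROW_NUMBER() OVER (ORDER BY Start)
                - ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Start) AS grp
        FROM Data
    ),
    runs AS (SELECT ID, SUM(Amount) AS Amount FROM g GROUP BY ID, grp)
    SELECT ID, COUNT(*), SUM(Amount) FROM runs GROUP BY ID ORDER BY ID
""").fetchall()
print(totals)  # [('01', 2, 71), ('02', 2, 31), ('03', 2, 33)]
```

This reproduces both desired result sets from the question without a cursor or the self-join approach from Snodgrass, and it leaves the source table untouched.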