How can I select rows where number went down as date went up - sql

If I have a program at a repair shop and I want to select all of the cars in my RepairOrder table where the mileage of the later repair order is less than the mileage of the prior repair order, how can I build that select statement?
ID VehicleID Mileage RepairDate
01 1 18425 2013-08-13
02 1 28952 2013-02-26
03 2 22318 2012-08-27
04 3 21309 2012-08-07
05 3 16311 2012-02-27
06 3 16310 2012-02-11
07 4 11098 2011-03-23
08 5 21309 2012-08-07
09 5 16309 2012-02-27
10 5 16310 2012-02-11
In this case I should only be selecting VehicleID 1 because it has a RepairDate that is greater than the previous row, but a Mileage that is less than the previous row. There could also be 3 rows with the same vehicle where the middle date has a mileage of 3 or 5000000, and I will need to select those VehicleIDs as well.
Results from using the LEAD() function
ID RepairDate Mileage
25 2011-12-23 45934
48 2009-02-26 13
48 2009-04-24 10
71 2011-07-26 31163
71 2015-01-13 65656

This is a great place to use the LEAD() function, available in SQL Server 2012+.
SQL FIDDLE DEMO
WITH NextM as (
SELECT
* ,
LEAD(Mileage, 1, null) over (partition by VehicleID order by RepairDate) NextMileage
FROM RepairOrder
)
SELECT *
FROM NextM
WHERE Mileage > NextMileage
My solution shows all columns so you can check which rows have the problem.
I also avoid DISTINCT because, as the OP suggests, there may be several mistakes for the same car, and this way you can see them all.
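To try the LEAD() idea without a SQL Server instance, here is a minimal sketch that runs the same query shape against an in-memory SQLite database through Python's sqlite3 module. It assumes SQLite 3.25 or later (needed for window functions); the helper name find_mileage_drops is made up for this illustration. On the question's sample data it flags VehicleID 5 as well as VehicleID 1, because vehicle 5's mileage drops from 16310 to 16309 between 2012-02-11 and 2012-02-27.

```python
import sqlite3

# Sample rows copied from the question: (ID, VehicleID, Mileage, RepairDate).
REPAIR_ORDERS = [
    (1, 1, 18425, "2013-08-13"),
    (2, 1, 28952, "2013-02-26"),
    (3, 2, 22318, "2012-08-27"),
    (4, 3, 21309, "2012-08-07"),
    (5, 3, 16311, "2012-02-27"),
    (6, 3, 16310, "2012-02-11"),
    (7, 4, 11098, "2011-03-23"),
    (8, 5, 21309, "2012-08-07"),
    (9, 5, 16309, "2012-02-27"),
    (10, 5, 16310, "2012-02-11"),
]

# Same shape as the answer's query: pair each row with the next mileage
# for the same vehicle, then keep rows whose mileage later goes down.
QUERY = """
WITH NextM AS (
    SELECT *,
           LEAD(Mileage) OVER (PARTITION BY VehicleID
                               ORDER BY RepairDate) AS NextMileage
    FROM RepairOrder
)
SELECT ID, VehicleID, Mileage, NextMileage
FROM NextM
WHERE Mileage > NextMileage
ORDER BY ID
"""


def find_mileage_drops(rows):
    """Hypothetical helper: load rows into SQLite and run the LEAD() query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE RepairOrder ("
        "ID INTEGER PRIMARY KEY, VehicleID INTEGER, "
        "Mileage INTEGER, RepairDate TEXT)"
    )
    conn.executemany("INSERT INTO RepairOrder VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


drops = find_mileage_drops(REPAIR_ORDERS)
for row in drops:
    print(row)
```

Rows with no later repair order get a NULL NextMileage, and the comparison against NULL filters them out, which is why no extra NULL check is needed.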

It's not terribly efficient, but you could do a pairwise selection
select t1.VehicleID
from RepairOrder t1, RepairOrder t2
where t1.VehicleId = t2.VehicleId
AND t1.Mileage > t2.Mileage
AND t1.RepairDate < t2.RepairDate
There is likely a better solution as pair-wise selections get EXTREMELY SLOW, but this should work as-is.

select distinct RO.VehicleID
from RepairOrder RO
where exists(select *
from RepairOrder
where ID != RO.ID
and VehicleID = RO.VehicleID and RepairDate > RO.RepairDate
and Mileage < RO.Mileage);

WITH RepairSeqs AS(
SELECT
DateSeq = ROW_NUMBER() OVER (PARTITION BY VehicleID ORDER BY RepairDate),
MileageSeq = ROW_NUMBER() OVER (PARTITION BY VehicleID ORDER BY Mileage),
*
FROM
dbo.RepairOrder
)
SELECT *
FROM RepairSeqs
WHERE DateSeq <> MileageSeq;

select distinct t.VehicleId
from (
select t.*, LEAD(Mileage) OVER (Partition by VehicleId ORDER BY RepairDate) LeadMileageValue
from RepairOrder t
) t
where t.Mileage > t.LeadMileageValue

Related

How to find observations that occurs at least 3 times spanning at least 15 days but no more than 90 days for each unique ID in SQL?

Suppose I have this table:
CREATE TABLE #t1
(
PersonID int ,
ExamDates date,
Score varchar(50) SPARSE NULL,
);
SET dateformat mdy;
INSERT INTO #t1 (PersonID, ExamDates, Score)
VALUES (1, '1.1.2018',70),
(1, '1.13.2018', 100),
(1, '1.18.2018', 85),
(2, '1.1.2018', 90),
(2, '2.1.2018', 95),
(2, '3.15.2018', 95),
(2, '7.30.2018', 100),
(3, '1.1.2018', 80),
(3, '1.2.2018', 80),
(3, '5.3.2018', 50),
(4, '2.1.2018', 90),
(4, '2.20.2018', 100);
I would like to find observations that occur at least 3 times, spanning at least 15 days but no more than 90 days, for each unique ID.
My final table should look like this:
PersonID ExamDates Score
1 1/1/2018 70
1 1/13/2018 100
1 1/18/2018 85
2 1/1/2018 90
2 2/1/2018 95
2 3/15/2018 95
We have code working for this using R, but would like to avoid pulling large datasets into R just to run this code. We are doing this in a very large dataset and concerned about efficiency of the query.
Thanks!
-Peter
To start with, the common name for this situation is Gaps and Islands. That will help you as you search for answers or come up with similar problems in the future.
That out of the way, here is my solution. Start with this:
WITH Leads As (
SELECT t1.*
, datediff(day, ExamDates, lead(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates)) As Diff
FROM t1
)
SELECT *
FROM Leads
WHERE Diff BETWEEN 15 AND 90
I have to use the CTE, because you can't put a windowing function in a WHERE clause. It produces this result, which is only part of what you want:
PersonID ExamDates Score Diff
1 2018-01-01 70 17
2 2018-01-01 90 73
This shows the first record in each group. We can use it to join back to the original table and find all the records that meet the requirements.
But first, we have a problem. The sample data only has groups with exactly three records. However, the real data might end up with groups with more than three items. In that case this would find multiple first records from the same group.
You can see it in this updated SQL Fiddle, which adds an additional record for PersonID #1 that is still inside the date range.
PersonID ExamDates Score Diff
1 2018-01-01 70 17
1 2018-01-13 100 29
2 2018-01-01 90 73
I'll be using this additional record in every step from now on.
To account for this, we also need to check to see each record is not in the middle or end of a valid group. That is, also look a couple records both ahead and behind.
WITH Diffs As (
SELECT #t1.*
, datediff(day, ExamDates, lead(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates)) As LeadDiff2
, datediff(day, ExamDates, lead(ExamDates, 1, NULL) over (partition by PersonID ORDER BY ExamDates)) As LeadDiff1
, datediff(day, lag(ExamDates, 1, NULL) over (partition by PersonID ORDER BY ExamDates), ExamDates) as LagDiff1
, datediff(day, lag(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates), ExamDates) as LagDiff2
FROM #t1
)
SELECT *
FROM Diffs
WHERE LeadDiff2 BETWEEN 15 AND 90
AND coalesce(LeadDiff1 + LagDiff1,100) > 90 /* Not in the middle of a valid group */
AND coalesce(Lagdiff2, 100) > 90 /* Not at the end of a valid group */
This code gets us back to the original results, even with the additional record. Here's the updated fiddle:
http://sqlfiddle.com/#!18/ea12ad/23
Now we can join back to the original table and find all records in each group:
WITH Diffs As (
SELECT #t1.*
, datediff(day, ExamDates, lead(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates)) As LeadDiff2
, datediff(day, ExamDates, lead(ExamDates, 1, NULL) over (partition by PersonID ORDER BY ExamDates)) As LeadDiff1
, datediff(day, lag(ExamDates, 1, NULL) over (partition by PersonID ORDER BY ExamDates), ExamDates) as LagDiff1
, datediff(day, lag(ExamDates, 2, NULL) over (partition by PersonID ORDER BY ExamDates), ExamDates) as LagDiff2
FROM #t1
), FirstRecords AS (
SELECT PersonID, ExamDates, DATEADD(day, 90, ExamDates) AS FinalDate
FROM Diffs
WHERE LeadDiff2 BETWEEN 15 AND 90
AND coalesce(LeadDiff1 + LagDiff1,100) > 90 /* Not in the middle of a valid group */
AND coalesce(lagdiff2, 100) > 90 /* Not at the end of a valid group */
)
SELECT t.*
FROM FirstRecords f
INNER JOIN #t1 t ON t.PersonID = f.PersonID
AND t.ExamDates >= f.ExamDates
AND t.ExamDates <= f.FinalDate
ORDER BY t.PersonID, t.ExamDates
That gives me this, which matches your desired output and my extra record:
PersonID ExamDates Score
1 2018-01-01 70
1 2018-01-13 100
1 2018-01-18 85
1 2018-02-11 89
2 2018-01-01 90
2 2018-02-01 95
2 2018-03-15 95
See it work here:
http://sqlfiddle.com/#!18/ea12ad/26
Here's Eli's idea done a bit more simply, and moving all of the heavy computation to the cte, where it may possibly be more efficient:
With cte As (
Select PersonID, ExamDates
,Case When Datediff(DAY,ExamDates, Lead(ExamDates,2,Null) Over (Partition by PersonID Order by ExamDates)) Between 15 and 90
Then Lead(ExamDates,2,Null) Over (Partition by PersonID Order by ExamDates)
Else NULL End as EndDateRange
From #t1
)
Select Distinct B.*
From cte Inner Join #t1 B On B.PersonID=cte.PersonID
And B.ExamDates Between cte.ExamDates and cte.EndDateRange
The Case statement in the CTE only returns a valid date if the entry two items later satisfies the overall condition; that date is used to form a range with the current record's ExamDate. By returning NULL on non-qualified ranges we ensure the join in the outer part of the SQL is not satisfied. The Distinct clause is needed to collapse duplicates when there are 4+ consecutive observations within the 15-90 day range.
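The effect of the NULL range and of Distinct can be checked with a small sketch, here run on SQLite through Python's sqlite3 module rather than on SQL Server. It assumes SQLite 3.25+ for LEAD(), uses julianday() subtraction as a stand-in for DATEDIFF(DAY, ...), and rewrites the sample dates in ISO format; the table name t1 and the helper qualifying_exams are choices made for this illustration.

```python
import sqlite3

# Sample rows from the question, dates rewritten as ISO strings so that
# julianday() can parse them: (PersonID, ExamDates, Score).
EXAMS = [
    (1, "2018-01-01", 70), (1, "2018-01-13", 100), (1, "2018-01-18", 85),
    (2, "2018-01-01", 90), (2, "2018-02-01", 95), (2, "2018-03-15", 95),
    (2, "2018-07-30", 100),
    (3, "2018-01-01", 80), (3, "2018-01-02", 80), (3, "2018-05-03", 50),
    (4, "2018-02-01", 90), (4, "2018-02-20", 100),
]

# EndDateRange is the date two exams later when that exam falls 15-90 days
# ahead, otherwise NULL. A NULL bound makes BETWEEN evaluate to NULL, so
# non-qualifying rows never join.
QUERY = """
WITH cte AS (
    SELECT PersonID, ExamDates,
           CASE WHEN (julianday(LEAD(ExamDates, 2) OVER (
                          PARTITION BY PersonID ORDER BY ExamDates))
                      - julianday(ExamDates)) BETWEEN 15 AND 90
                THEN LEAD(ExamDates, 2) OVER (
                         PARTITION BY PersonID ORDER BY ExamDates)
           END AS EndDateRange
    FROM t1
)
SELECT {distinct} b.PersonID, b.ExamDates, b.Score
FROM cte
JOIN t1 AS b
  ON b.PersonID = cte.PersonID
 AND b.ExamDates BETWEEN cte.ExamDates AND cte.EndDateRange
ORDER BY b.PersonID, b.ExamDates
"""


def qualifying_exams(rows, distinct=True):
    """Hypothetical helper: return exams inside a qualifying 3-exam window."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (PersonID INTEGER, ExamDates TEXT, Score INTEGER)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    sql = QUERY.format(distinct="DISTINCT" if distinct else "")
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


sample_result = qualifying_exams(EXAMS)

# Adding a fourth exam for person 1 creates two overlapping windows,
# which is the case where DISTINCT matters.
extended = EXAMS + [(1, "2018-02-11", 89)]
with_duplicates = qualifying_exams(extended, distinct=False)
deduplicated = qualifying_exams(extended, distinct=True)
print(len(with_duplicates), len(deduplicated))
```

With the extra exam, the windows starting on 2018-01-01 and 2018-01-13 both cover the exams of 2018-01-13 and 2018-01-18, so those rows come back twice unless duplicates are collapsed.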
You'll need a CTE to identify the base for the conditions which you described.
This code works with your sample set, and should work even when you have a larger set - though may require a distinct if you have overlapping results, i.e. 5 exam dates in the 15-90 range.
WITH cte AS(
SELECT
PERSONID
,EXAMDATES
,Score
,COUNT(*) OVER (PARTITION BY PERSONID ORDER BY ExamDates ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )AS COUNTS
,LAG(ExamDates,2,NULL) OVER (PARTITION BY PERSONID ORDER BY ExamDates) DIFFS
FROM #t1
)
SELECT B.*
FROM CTE
INNER JOIN #T1 B ON CTE.PERSONID = B.PERSONID
WHERE CTE.COUNTS >=3
AND DATEDIFF(DAY,CTE.DIFFS,CTE.EXAMDATES) BETWEEN 15 AND 90
AND B.EXAMDATES BETWEEN CTE.DIFFS AND CTE.EXAMDATES

Find out the last updated record in my DB using MAX in CASE statement

I have APPLICATIONSTATUSLOG_ID primary key field on my table.
I want to find the last updated record in my DB; the row with MAX(APPLICATIONSTATUSLOG_ID) is presumed to be the most recent record.
I tried this code :
SELECT
MAX(CASE WHEN MAX(d.ApplicationStatusLog_ID) = d.ApplicationStatusLog_ID THEN d.ApplicationStatusID END) AS StatusID,
FROM
ApplicationStatusLog d
But I get error:
Msg 130, Level 15, State 1, Line 53 Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
My table looks like
ApplicationID - ApplicationStatusID - ApplicationStatusLogID
10000 17 100
10000 08 101
10000 10 102
10001 06 103
10001 10 104
10002 06 105
10002 07 106
My output should be:
10000 10
10001 10
10002 07
Please help me understand and resolve my problem.
If you just want to find the last updated row, assuming it has the max value in the APPLICATIONSTATUSLOG_ID column, the query would be:
SELECT *
FROM ApplicationStatusLog
WHERE ApplicationStatusLog_ID = (SELECT MAX(ApplicationStatusLog_ID) FROM ApplicationStatusLog )
EDIT
So as you stated in comment, the query for it will be:
DECLARE @statusId INT
SELECT @statusId = STATUSID
FROM ApplicationStatusLog
WHERE ApplicationStatusLog_ID = (SELECT MAX(ApplicationStatusLog_ID) FROM ApplicationStatusLog )
EDIT 2:
The query as per your edit in question will be:
WITH C AS
(
SELECT ApplicationID,ApplicationStatusID,ApplicationStatusLogID, ROW_NUMBER() OVER (PARTITION BY ApplicationID ORDER BY ApplicationStatusLogID DESC) AS ranking
FROM ApplicationStatusLog
)
SELECT ApplicationID,ApplicationStatusID
FROM C
WHERE ranking = 1
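The original attempt fails because SQL Server does not allow one aggregate (MAX) to be nested inside another; ranking the rows per ApplicationID sidesteps that. As a quick check of the ROW_NUMBER() approach, here is a sketch on SQLite via Python's sqlite3 module (window functions need SQLite 3.25+). The status IDs are stored as integers, so 08 appears as 8, and the helper name latest_status is made up for the example.

```python
import sqlite3

# Rows from the question: (ApplicationID, ApplicationStatusID, ApplicationStatusLogID).
LOG_ROWS = [
    (10000, 17, 100), (10000, 8, 101), (10000, 10, 102),
    (10001, 6, 103), (10001, 10, 104),
    (10002, 6, 105), (10002, 7, 106),
]

# Rank each application's log rows from newest to oldest, keep rank 1.
QUERY = """
WITH C AS (
    SELECT ApplicationID, ApplicationStatusID,
           ROW_NUMBER() OVER (PARTITION BY ApplicationID
                              ORDER BY ApplicationStatusLogID DESC) AS ranking
    FROM ApplicationStatusLog
)
SELECT ApplicationID, ApplicationStatusID
FROM C
WHERE ranking = 1
ORDER BY ApplicationID
"""


def latest_status(rows):
    """Hypothetical helper: latest status per application."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ApplicationStatusLog ("
        "ApplicationID INTEGER, ApplicationStatusID INTEGER, "
        "ApplicationStatusLogID INTEGER PRIMARY KEY)"
    )
    conn.executemany("INSERT INTO ApplicationStatusLog VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


latest = latest_status(LOG_ROWS)
for application_id, status_id in latest:
    print(application_id, status_id)
```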
You can join same table twice like this:
select IT.JoiningID, JT.MAXAPPLICATIONSTATUSID FROM dbo.[Table] IT
INNER JOIN (
Select JoiningID, MAX (APPLICATIONSTATUSID) MAXAPPLICATIONSTATUSID
FROM dbo.[Table]
GROUP BY JoiningID
) JT ON IT.JoiningID = JT.JoiningID
Now you have MAXAPPLICATIONSTATUSID per ID, so you can write what you want based on MAXAPPLICATIONSTATUSID.
Without full query
SELECT
x.StatusId
...
FROM <Table> a
CROSS APPLY
(
SELECT s.APPLICATIONSTATUSID as StatusId
FROM <Table> s
GROUP BY s.APPLICATIONSTATUSID
HAVING MAX(s.APPLICATIONSTATUSLOG_ID) = a.APPLICATIONSTATUSLOG_ID
) x

How to display only few columns of a table based on the data

Consider the table below
Name partno. sch_date WO# owed panels
aa 1234 08/22/2017 121 22 26
aa 1234 08/22/2017 222 22 27
aa 1234 08/22/2017 242 22 27
aa 1234 08/29/2017 152 20 24
aa 1234 08/29/2017 167 20 24
aa 1234 08/29/2017 202 20 26
Is it possible to display the data in such a way that, when the number of panels is greater than owed, I don't display the other schedules for that partno. on the same date (sch_date)?
Expected Result
Name partno. sch_date WO# owed panels
aa 1234 08/22/2017 121 22 26
aa 1234 08/29/2017 152 20 24
CROSS APPLY may help here. (Note: this is why I asked about order in my earlier comment. The order of records in a table is not guaranteed, so we need to know in what order you want the records evaluated. Date isn't enough, unless it has a time component, not displayed, that differs between rows.)
WORKING example on Rextester: http://rextester.com/CAUK18185
Many assumptions made:
When owed is > panels you don't need to see the record.
You want to see the lowest WO# when owed is < panels and suppress all other records including ones where owed > panels.
If there are no records for a date, name and partno that have owed < panels, you want to see no records.
If these assumptions are incorrect, please provide a better sample set and expected results to test these types of situations and explain what you want to have happen.
SELECT Distinct B.*
FROM tblName Z
CROSS APPLY (SELECT TOP 1 A.*
FROM tblName A
WHERE A.owed < A.panels
and Z.Name = A.Name
and Z.[partno.] = a.[partno.]
and Z.sch_date = a.sch_date
ORDER by A.name, A.[partno.], A.sch_date, A.[wo#]) B
For each record in Z, run a query which returns the row with the lowest WO# for the same name, partno. and sch_date when owed < panels.
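For readers without SQL Server at hand, the same "lowest WO# per name, partno. and sch_date" rule can be sketched on SQLite through Python's sqlite3 module. SQLite has no CROSS APPLY, so this sketch swaps in ROW_NUMBER() (SQLite 3.25+), the technique the update further down also uses. The column names partno and wo are simplified stand-ins for [partno.] and [WO#], and first_qualifying_orders is a made-up helper name.

```python
import sqlite3

# Rows from the question: (name, partno, sch_date, wo, owed, panels).
SCHEDULE = [
    ("aa", 1234, "08/22/2017", 121, 22, 26),
    ("aa", 1234, "08/22/2017", 222, 22, 27),
    ("aa", 1234, "08/22/2017", 242, 22, 27),
    ("aa", 1234, "08/29/2017", 152, 20, 24),
    ("aa", 1234, "08/29/2017", 167, 20, 24),
    ("aa", 1234, "08/29/2017", 202, 20, 26),
]

# Number the qualifying rows (owed < panels) within each group by work
# order, then keep only the first one per group.
QUERY = """
SELECT name, partno, sch_date, wo, owed, panels
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY name, partno, sch_date
                              ORDER BY wo) AS rn
    FROM tblName
    WHERE owed < panels
)
WHERE rn = 1
ORDER BY wo
"""


def first_qualifying_orders(rows):
    """Hypothetical helper: lowest work order per group where owed < panels."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblName (name TEXT, partno INTEGER, sch_date TEXT, "
        "wo INTEGER, owed INTEGER, panels INTEGER)"
    )
    conn.executemany("INSERT INTO tblName VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


first_orders = first_qualifying_orders(SCHEDULE)
for row in first_orders:
    print(row)
```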
UPDATED:
I see in a comment you want to keep records if owed > panels... if it's encountered first.... but what if it's not encountered first?
http://rextester.com/NXS51018
--First we get all the records w/ a owed < panels per group and assign the earliest row (that having the lowest WO) a RN of 1. then we return that set.
WITH cte as (
Select A.*, row_number() over (Partition by Name, [partno.], sch_date ORDER BY [WO#]) RN
from tblName A
where owed < panels)
Select * from cte where RN =1
UNION ALL
--We then union in the records where owed >=panels and their WO# < the wo# from the CTE.
SELECT Z.*, 0 as rn FROM tblName Z where owed >=panels
and exists (Select * from cte
where Z.name = CTE.name
and Z.[partno.] = cte.[partno.]
and Z.sch_date = cte.sch_date
and CTE.[WO#] > Z.[WO#]) --Now this line may not be needed, depending on if you want all or just some of the WO#'s when owed >=panels.
ORDER BY name, [partno.], Sch_date, [Wo#]
After last comment update:
WITH cte as (
Select A.*, row_number() over (Partition by Name, [partno.], sch_date ORDER BY [WO#]) RN
from tblName A
where owed < panels),
cte2 as (Select * from cte where RN =1
UNION ALL
SELECT Z.*, 0 as rn FROM tblName Z where owed >=panels
and exists (Select * from cte
where Z.name = CTE.name
and Z.[partno.] = cte.[partno.]
and Z.sch_date = cte.sch_date
and CTE.[WO#] > Z.[WO#]))
Select * into SOQ#45619304 from CTE2; --This line creates the table based on the 2nd cte results.
Select * from SOQ#45619304;
You can try this -
SELECT Name, [partno.], sch_date, [WO#], owed, panels
FROM YOUR_TABLE
WHERE panels < owed
UNION ALL
SELECT Name, [partno.], sch_date, MIN([WO#]), owed, MIN(panels)
FROM YOUR_TABLE
WHERE panels > owed
GROUP BY Name, [partno.], sch_date, owed
ORDER BY Name

What's the most efficient way to match values between 2 tables based on most recent prior date?

I've got two tables in MS SQL Server:
dailyt - which contains daily data:
date val
---------------------
2014-05-22 10
2014-05-21 9.5
2014-05-20 9
2014-05-19 8
2014-05-18 7.5
etc...
And periodt - which contains data coming in at irregular periods:
date val
---------------------
2014-05-21 2
2014-05-18 1
Given a row in dailyt, I want to adjust its value by adding the corresponding value in periodt with the closest date prior or equal to the date of the dailyt row. So, the output would look like:
addt
date val
---------------------
2014-05-22 12 <- add 2 from 2014-05-21
2014-05-21 11.5 <- add 2 from 2014-05-21
2014-05-20 10 <- add 1 from 2014-05-18
2014-05-19 9 <- add 1 from 2014-05-18
2014-05-18 8.5 <- add 1 from 2014-05-18
I know that one way to do this is to join the dailyt and periodt tables on periodt.date <= dailyt.date, add a ROW_NUMBER() OVER (PARTITION BY dailyt.date ORDER BY periodt.date DESC) column, and then filter with a WHERE condition on the row number = 1.
Is there another way to do this that would be more efficient? Or is this pretty much optimal?
I think using APPLY would be the most efficient way:
SELECT d.Val,
p.Val,
NewVal = d.Val + ISNULL(p.Val, 0)
FROM Dailyt AS d
OUTER APPLY
( SELECT TOP 1 Val
FROM Periodt p
WHERE p.Date <= d.Date
ORDER BY p.Date DESC
) AS p;
Example on SQL Fiddle
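Here is a small sketch of the same "most recent prior row" lookup that can be run without SQL Server, using SQLite through Python's sqlite3 module. SQLite has no OUTER APPLY, so the sketch swaps in a correlated scalar subquery with ORDER BY ... LIMIT 1, and COALESCE plays the role of ISNULL. The helper name adjusted_values is made up for this example.

```python
import sqlite3

# Sample rows from the question: (date, val).
DAILY = [
    ("2014-05-22", 10.0),
    ("2014-05-21", 9.5),
    ("2014-05-20", 9.0),
    ("2014-05-19", 8.0),
    ("2014-05-18", 7.5),
]
PERIOD = [
    ("2014-05-21", 2),
    ("2014-05-18", 1),
]

# For each daily row, look up the latest period row on or before that date.
# COALESCE(..., 0) leaves the daily value unchanged when no such row exists.
QUERY = """
SELECT d.date,
       d.val + COALESCE((SELECT p.val
                         FROM periodt AS p
                         WHERE p.date <= d.date
                         ORDER BY p.date DESC
                         LIMIT 1), 0) AS new_val
FROM dailyt AS d
ORDER BY d.date DESC
"""


def adjusted_values(daily_rows, period_rows):
    """Hypothetical helper: daily values plus the latest prior period value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dailyt (date TEXT PRIMARY KEY, val REAL)")
    conn.execute("CREATE TABLE periodt (date TEXT PRIMARY KEY, val REAL)")
    conn.executemany("INSERT INTO dailyt VALUES (?, ?)", daily_rows)
    conn.executemany("INSERT INTO periodt VALUES (?, ?)", period_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


adjusted = adjusted_values(DAILY, PERIOD)
for day, new_val in adjusted:
    print(day, new_val)
```

Whichever form is used, an index (here the primary key) on the period table's date column is what lets the engine seek straight to the latest qualifying row instead of scanning.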
If there are relatively few periodt rows, there is an option that may prove quite efficient.
Convert periodt into a From/To ranges table using subqueries or CTEs. (Obviously performance depends on how efficiently this initial step can be done, which is why a small number of periodt rows is preferable.) Then the join to dailyt will be extremely efficient. E.g.
;WITH PIds AS (
SELECT ROW_NUMBER() OVER(ORDER BY PDate) RN, *
FROM @periodt
),
PRange AS (
SELECT f.PDate AS FromDate, t.PDate as ToDate, f.PVal
FROM PIds f
LEFT OUTER JOIN PIds t ON
t.RN = f.RN + 1
)
SELECT d.*, p.PVal
FROM @dailyt d
LEFT OUTER JOIN PRange p ON
d.DDate >= p.FromDate
AND (d.DDate < p.ToDate OR p.ToDate IS NULL)
ORDER BY 1 DESC
If you want to try the query, the following produces the sample data using table variables; run it in the same batch, before the query above. Note I added an extra row to dailyt to demonstrate the case with no periodt entries on or before that date.
DECLARE @dailyt table (
DDate date NOT NULL,
DVal float NOT NULL
)
INSERT INTO @dailyt(DDate, DVal)
SELECT '20140522', 10
UNION ALL SELECT '20140521', 9.5
UNION ALL SELECT '20140520', 9
UNION ALL SELECT '20140519', 8
UNION ALL SELECT '20140518', 7.5
UNION ALL SELECT '20140517', 6.5
DECLARE @periodt table (
PDate date NOT NULL,
PVal int NOT NULL
)
INSERT INTO @periodt
SELECT '20140521', 2
UNION ALL SELECT '20140518', 1

Selecting and sorting data from a single table

Correction to my question....
I'm trying to select and sort in a query from a single table. The primary key for the table is a combination of a serialized number and a time/date stamp.
The table's name in the database is "A12", the columns are defined as:
Serial2D (PK, char(25), not null)
Completed (PK, datetime, not null)
Result (smallint, null)
MachineID (FK, smallint, null)
PT_1 (float, null)
PT_2 (float, null)
PT_3 (float, null)
PT_4 (float, null)
Since the primary key for the table is a combination of the "Serial2D" and "Completed", there can be multiple "Serial2D" entries with different values in the "Completed" and "Result" columns. (I did not make this database... I have to work with what I got)
I want to write a query that will use the value of the "Result" column (always a "0" or "1") and retrieve only one row for each "Serial2D" value. If the "Result" column has a "1" for that row, I want to choose it over any entries with that Serial that have a "0" in the Result column. There should be only one entry in the table that has a Result column entry of "1" for any Serial2D value.
Ex. table
Serial2d Completed Result PT_1 PT_2 PT_3 PT_4
------- ------- ------ ---- ---- ---- ----
A1 1:00AM 0 32.5 20 26 29
A1 1:02AM 0 32.5 10 29 40
A1 1:03AM 1 10 5 4 3
B1 1:04AM 0 29 4 1 9
B1 1:05AM 0 40 3 4 9
C1 1:06AM 1 9 7 6 4
What I would like to be able to retrieve would be:
Serial2d Completed Result PT_1 PT_2 PT_3 PT_4
------- ------- ------ ---- ---- ---- ----
A1 1:03AM 1 10 5 4 3
B1 1:05AM 0 40 3 4 9
C1 1:06AM 1 9 7 6 4
I'm new to SQL and I'm still learning ALL the syntax. I'm finding it difficult to search for the correct operators to use since I'm not sure what I need, so please forgive my ignorance. A post with my answer could be staring me right in the face and i wouldn't know it, please just point me to it.
I appreciate the answers to my previous post, but the answers weren't sufficient for me due to MY lack of information and ineptness with SQL. I know this is probably insanely easy for some, but try to remember when you first started SQL... that's where I'm at.
Since you are using SQL Server, you can use Windowing Functions to get this data.
Using a sub-query:
select *
from
(
select *,
row_number() over(partition by serial2d
order by result desc, completed desc) rn
from a12
) x
where rn = 1
See SQL Fiddle with Demo
Or you can use CTE for this query:
;with cte as
(
select *,
row_number() over(partition by serial2d
order by result desc, completed desc) rn
from a12
)
select *
from cte c
where rn = 1;
See SQL Fiddle With Demo
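The ordering inside the window does the work here: Result DESC puts a row with Result = 1 first when one exists, and Completed DESC breaks the remaining ties in favour of the latest row. A minimal sketch that checks this against the question's sample, run on SQLite through Python's sqlite3 module (SQLite 3.25+ assumed). The 24-hour time strings are stand-ins for the real datetime values, only the first few columns are kept, and pick_rows is a made-up helper name.

```python
import sqlite3

# Sample rows from the question, trimmed to (Serial2D, Completed, Result, PT_1).
# Completed is written as a sortable 24-hour string standing in for a datetime.
A12_ROWS = [
    ("A1", "01:00", 0, 32.5),
    ("A1", "01:02", 0, 32.5),
    ("A1", "01:03", 1, 10.0),
    ("B1", "01:04", 0, 29.0),
    ("B1", "01:05", 0, 40.0),
    ("C1", "01:06", 1, 9.0),
]

# Rank rows per serial: a Result of 1 wins, then the most recent Completed.
QUERY = """
SELECT Serial2D, Completed, Result, PT_1
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Serial2D
                              ORDER BY Result DESC, Completed DESC) AS rn
    FROM A12
)
WHERE rn = 1
ORDER BY Serial2D
"""


def pick_rows(rows):
    """Hypothetical helper: one preferred row per Serial2D."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE A12 (Serial2D TEXT NOT NULL, Completed TEXT NOT NULL, "
        "Result INTEGER, PT_1 REAL, PRIMARY KEY (Serial2D, Completed))"
    )
    conn.executemany("INSERT INTO A12 VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


picked = pick_rows(A12_ROWS)
for row in picked:
    print(row)
```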
You can group by Serial to get the MAX of each Time.
SELECT Serial, MAX([Time]) AS [Time]
FROM myTable
GROUP BY Serial
HAVING MAX(Result) >= 0
SELECT
t.Serial,
max_Result,
MAX([time]) AS max_time
FROM
myTable t inner join
(SELECT
Serial,
MAX([Result]) AS max_Result
FROM
myTable
GROUP BY
Serial) m on
t.serial = m.serial and
t.result = m.max_result
group by
t.serial,
max_Result
This can be solved using a correlated sub-query:
SELECT
T.serial,
T.[time],
T.result
FROM tablename T
WHERE
T.result = 1
OR
NOT EXISTS(
SELECT 1
FROM tablename
WHERE
serial = T.serial
AND (
[time] > T.[time]
OR
result = 1
)
)