Select most recent inspection - sql

I have a ROAD_INSPECTION table:
+----+------------------------+-----------+
| ID | DATE | CONDITION |
+----+------------------------+-----------+
| 1 | 01/01/2009 | 20 |
| 1 | 05/01/2013 | 16 |
| 1 | 04/29/2016 10:02:52 AM | 15 |
+----+------------------------+-----------+
| 2 | 01/01/2009 | 8 |
| 2 | 06/06/2012 9:55:13 AM | 8 |
| 2 | 04/28/2015 | 11 |
+----+------------------------+-----------+
| 3 | 06/11/2012 | 10 |
| 3 | 04/21/2015 | 19 |
+----+------------------------+-----------+
What is the most efficient way to select the most recent inspection? The query would need to include the ID and CONDITION columns, despite the fact that they wouldn't group by cleanly:
+----+------------------------+-----------+
| ID | DATE | CONDITION |
+----+------------------------+-----------+
| 1 | 04/29/2016 10:02:52 AM | 15 |
+----+------------------------+-----------+
| 2 | 04/28/2015 | 11 |
+----+------------------------+-----------+
| 3 | 04/21/2015 | 19 |
+----+------------------------+-----------+

One way is to retrieve the id and the maximum date per id in a derived table, then join that output back to the main table to pick up the corresponding value from the condition column, as below.
SELECT t1.id,
t1.date1,
t2.CONDITION1
FROM
(SELECT id,
max(date1) AS date1
FROM table1
GROUP BY id) t1
JOIN table1 t2 ON t1.id = t2.id
AND t1.date1 = t2.date1;
Result:
id date1 CONDITION1
-------------------------------------
1 29.04.2016 10:02:52 15
2 28.04.2015 00:00:00 11
3 21.04.2015 00:00:00 19
Or, if your RDBMS supports window functions, use the query below.
SELECT id,
date1,
condition1
FROM
(SELECT id,
date1,
condition1,
row_number() over(PARTITION BY id
ORDER BY date1 DESC) AS rn
FROM table1 ) t1
WHERE rn = 1;
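To compare the two queries above side by side, here is a minimal sketch that loads the sample inspections into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for whatever RDBMS is actually in use, storing the dates as ISO-8601 text is an assumption made so that MAX() and ORDER BY sort them chronologically, and the window query needs SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: dates are stored as ISO-8601 text so string order equals date order.
conn.execute("CREATE TABLE table1 (id INTEGER, date1 TEXT, condition1 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "2009-01-01 00:00:00", 20),
        (1, "2013-05-01 00:00:00", 16),
        (1, "2016-04-29 10:02:52", 15),
        (2, "2009-01-01 00:00:00", 8),
        (2, "2012-06-06 09:55:13", 8),
        (2, "2015-04-28 00:00:00", 11),
        (3, "2012-06-11 00:00:00", 10),
        (3, "2015-04-21 00:00:00", 19),
    ],
)

# Derived table with MAX(date1) per id, joined back to fetch condition1.
join_query = """
SELECT t2.id, t2.date1, t2.condition1
FROM (SELECT id, MAX(date1) AS date1 FROM table1 GROUP BY id) t1
JOIN table1 t2 ON t1.id = t2.id AND t1.date1 = t2.date1
ORDER BY t2.id
"""

# ROW_NUMBER() ranks each id's rows from newest to oldest; keep rank 1.
window_query = """
SELECT id, date1, condition1
FROM (SELECT id, date1, condition1,
             ROW_NUMBER() OVER (PARTITION BY id ORDER BY date1 DESC) AS rn
      FROM table1) t1
WHERE rn = 1
ORDER BY id
"""

join_rows = conn.execute(join_query).fetchall()
window_rows = conn.execute(window_query).fetchall()
for row in join_rows:
    print(row)
```

One difference worth keeping in mind: if two rows for the same id share the latest date, the join version returns both of them, while the ROW_NUMBER() version returns exactly one, chosen arbitrarily unless the ORDER BY gets a tie-breaker.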

Related

How to sum rows before a condition is met in SQL

I have a table which has multiple records for the same id. Looks like this, and the rows are sorted by sequence number.
+----+--------+----------+----------+
| id | result | duration | sequence |
+----+--------+----------+----------+
| 1 | 12 | 7254 | 1 |
+----+--------+----------+----------+
| 1 | 12 | 2333 | 2 |
+----+--------+----------+----------+
| 1 | 11 | 1000 | 3 |
+----+--------+----------+----------+
| 1 | 6 | 5 | 4 |
+----+--------+----------+----------+
| 1 | 3 | 20 | 5 |
+----+--------+----------+----------+
| 2 | 1 | 230 | 1 |
+----+--------+----------+----------+
| 2 | 9 | 10 | 2 |
+----+--------+----------+----------+
| 2 | 6 | 0 | 3 |
+----+--------+----------+----------+
| 2 | 1 | 5 | 4 |
+----+--------+----------+----------+
| 2 | 12 | 3 | 5 |
+----+--------+----------+----------+
E.g. for id=1, I would like to sum the duration of all the rows up to and including the row where result=6, which is 7254+2333+1000+5. The same goes for id=2, where it would be 230+10+0. Anything after the row where result=6 will be left out.
My expected output:
+----+----------+
| id | duration |
+----+----------+
| 1 | 10592 |
+----+----------+
| 2 | 240 |
+----+----------+
The sequence has to be in ascending order.
I'm not sure how I can do this in sql.
Thank you in advance!
I think you want:
select t.id, sum(t.duration)
from t
where t.sequence <= (select t2.sequence
from t t2
where t2.id = t.id and t2.result = 6
)
group by t.id;
In PrestoDB, I would recommend window functions:
select id, sum(duration)
from (select t.*,
min(case when result = 6 then sequence end) over (partition by id) as sequence_6
from t
) t
where sequence <= sequence_6
group by id;
You can use a simple aggregate query with a condition that uses a correlated subquery to find the sequence of the record whose result is 6:
SELECT t.id, SUM(t.duration) total_duration
FROM mytable t
WHERE t.sequence <= (
SELECT sequence
FROM mytable
WHERE id = t.id AND result = 6
)
GROUP BY t.id
This demo on DB Fiddle with your test data returns :
| id | total_duration |
| --- | -------------- |
| 1 | 10592 |
| 2 | 240 |
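As a quick check of the correlated-subquery approach, the sketch below runs it against the question's sample rows in an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for the asker's engine here, and the MIN() inside the subquery is an added assumption that guards against an id having more than one row with result = 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, result INTEGER, duration INTEGER, sequence INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, 12, 7254, 1), (1, 12, 2333, 2), (1, 11, 1000, 3), (1, 6, 5, 4), (1, 3, 20, 5),
        (2, 1, 230, 1), (2, 9, 10, 2), (2, 6, 0, 3), (2, 1, 5, 4), (2, 12, 3, 5),
    ],
)

# Keep only rows at or before the first result = 6 row of the same id,
# then sum their durations per id.
query = """
SELECT t.id, SUM(t.duration) AS total_duration
FROM t
WHERE t.sequence <= (SELECT MIN(t2.sequence)  -- MIN() is an added safeguard
                     FROM t t2
                     WHERE t2.id = t.id AND t2.result = 6)
GROUP BY t.id
ORDER BY t.id
"""
totals = conn.execute(query).fetchall()
print(totals)
```

An id that has no row with result = 6 drops out of the output entirely, because the subquery yields NULL and the comparison is never true.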
Basic group by query should solve your issue
select
id,
sum(duration) duration
from t
group by id
For specific rows only:
select
id,
sum(duration) duration
from t
where id = 1
group by id
If you want to include the totals alongside the detail rows in your result set:
select id, duration, sequence from t
union all
select
id,
sum(duration) duration,
null sequence
from t
group by id

Hive window functions: last value of previous partition

Using Hive window functions, I would like to get the last value of the previous partition:
| name | rank | type |
| one | 1 | T1 |
| two | 2 | T2 |
| thr | 3 | T2 |
| fou | 4 | T1 |
| fiv | 5 | T2 |
| six | 6 | T2 |
| sev | 7 | T2 |
Following query:
SELECT
name,
rank,
first_value(rank) over (partition by type order by rank) as new_rank
FROM my_table
Would give:
| name | rank | type | new_rank |
| one | 1 | T1 | 1 |
| two | 2 | T2 | 2 |
| thr | 3 | T2 | 2 |
| fou | 4 | T1 | 4 |
| fiv | 5 | T2 | 5 |
| six | 6 | T2 | 5 |
| sev | 7 | T2 | 5 |
But what I need is "the last value of the previous partition":
| name | rank | type | new_rank |
| one | 1 | T1 | NULL |
| two | 2 | T2 | 1 |
| thr | 3 | T2 | 1 |
| fou | 4 | T1 | 3 |
| fiv | 5 | T2 | 4 |
| six | 6 | T2 | 4 |
| sev | 7 | T2 | 4 |
This seems quite tricky. It is a variant of the gaps-and-islands problem. Here is the idea:
Identify the "islands" where type is the same (using difference of row numbers).
Then use lag() to introduce the previous rank into the island.
Do a min scan to get the new rank that you want.
So:
with gi as (
select t.*,
(seqnum - seqnum_t) as grp
from (select t.*,
row_number() over (partition by type order by rank) as seqnum_t,
row_number() over (order by rank) as seqnum
from t
) t
),
gi2 as (
select gi.*, lag(rank) over (order by gi.rank) as prev_rank
from gi
)
select gi2.*,
min(prev_rank) over (partition by type, grp) as new_rank
from gi2
order by rank;
Here is a SQL Fiddle (albeit using Postgres).
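The three steps above can be sketched end to end against the question's sample data. This version runs in an in-memory SQLite database through Python's sqlite3 module rather than Hive, so treat it as an illustration of the logic only. It needs SQLite 3.25 or newer for window functions, and the column names rnk and typ are hypothetical substitutes for rank and type, chosen to avoid any clash with function names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# rnk / typ are hypothetical stand-ins for the question's rank / type columns.
conn.execute("CREATE TABLE t (name TEXT, rnk INTEGER, typ TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("one", 1, "T1"), ("two", 2, "T2"), ("thr", 3, "T2"), ("fou", 4, "T1"),
        ("fiv", 5, "T2"), ("six", 6, "T2"), ("sev", 7, "T2"),
    ],
)

query = """
WITH gi AS (
    -- Step 1: the difference of the two row numbers is constant within an island.
    SELECT s.*, (seqnum - seqnum_t) AS grp
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY typ ORDER BY rnk) AS seqnum_t,
                 ROW_NUMBER() OVER (ORDER BY rnk) AS seqnum
          FROM t) s
),
gi2 AS (
    -- Step 2: bring the previous row's rank onto every row.
    SELECT gi.*, LAG(rnk) OVER (ORDER BY rnk) AS prev_rnk
    FROM gi
)
-- Step 3: the smallest prev_rnk in an island is the last rank of the island before it.
SELECT name, rnk, typ,
       MIN(prev_rnk) OVER (PARTITION BY typ, grp) AS new_rnk
FROM gi2
ORDER BY rnk
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The partition in the last step uses both typ and grp because the row-number difference alone can repeat across different types.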

How to get max value from sql table with group by?

I have a table
Id | Version | DateFrom | DateTo |
_______________________________________
1 | 1 | 2015-09-15 | 2015-09-18 |
1 | 2 | 2015-09-15 | 2015-09-18 |
1 | 3 | 2015-09-15 | 2015-09-20 | --different date
1 | 4 | 2015-09-15 | 2015-09-18 |
2 | 1 | 2015-09-15 | 2015-09-18 |
2 | 2 | 2015-09-15 | 2015-09-18 |
And I'm trying to make a view which returns the record with the latest version for each Id, regardless of the values in the other columns.
So for example I expect:
Id | Version | DateFrom | DateTo |
_______________________________________
1 | 4 | 2015-09-15 | 2015-09-18 |
2 | 2 | 2015-09-15 | 2015-09-18 |
This is what I already did:
Select
Id,
MAX([Version]) AS Version,
DateFrom,
DateTo
FROM
dbo.Table_1
Group By
Id,
DateFrom,
DateTo
But the result is:
Id | Version | DateFrom | DateTo |
_______________________________________
1 | 4 | 2015-09-15 | 2015-09-18 |
1 | 3 | 2015-09-15 | 2015-09-20 |
2 | 2 | 2015-09-15 | 2015-09-18 |
Can also be done with a left join :
SELECT t.*
FROM YourTable t
LEFT JOIN YourTable s
ON(t.id = s.id AND t.version < s.version)
WHERE s.version is null
Have a sub-query that returns each id's max version. Join with that result:
select t1.*
from tablename t1
join (select id, max(version) as maxversion
from tablename
group by id) t2
on t1.id = t2.id and t1.version = t2.maxversion
You can run that kind of query :
Select ID, version, dateFrom, dateTo
from table_1 a
where not exists (
select 1
from table_1 b
where a.id = b.id and a.version < b.version)
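To confirm that the three answers above agree, here is a small sketch that runs each of them over the question's sample rows in an in-memory SQLite database using Python's sqlite3 module. SQLite is a stand-in for SQL Server here, the single table name Table_1 is used for all three queries, and an ORDER BY has been added only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table_1 (Id INTEGER, Version INTEGER, DateFrom TEXT, DateTo TEXT)"
)
conn.executemany(
    "INSERT INTO Table_1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2015-09-15", "2015-09-18"),
        (1, 2, "2015-09-15", "2015-09-18"),
        (1, 3, "2015-09-15", "2015-09-20"),
        (1, 4, "2015-09-15", "2015-09-18"),
        (2, 1, "2015-09-15", "2015-09-18"),
        (2, 2, "2015-09-15", "2015-09-18"),
    ],
)

queries = {
    # Anti-join: keep rows for which no higher version of the same Id exists.
    "left_join": """
        SELECT t.* FROM Table_1 t
        LEFT JOIN Table_1 s ON t.Id = s.Id AND t.Version < s.Version
        WHERE s.Version IS NULL
        ORDER BY t.Id""",
    # Join back to the per-Id maximum version.
    "max_subquery": """
        SELECT t1.* FROM Table_1 t1
        JOIN (SELECT Id, MAX(Version) AS MaxVersion
              FROM Table_1 GROUP BY Id) t2
          ON t1.Id = t2.Id AND t1.Version = t2.MaxVersion
        ORDER BY t1.Id""",
    # NOT EXISTS form of the same anti-join.
    "not_exists": """
        SELECT Id, Version, DateFrom, DateTo FROM Table_1 a
        WHERE NOT EXISTS (SELECT 1 FROM Table_1 b
                          WHERE a.Id = b.Id AND a.Version < b.Version)
        ORDER BY a.Id""",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

The original attempt returned an extra row because DateTo was part of the GROUP BY, so the row with the different date formed its own group. All three queries here group or compare on Id alone.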

Update using Self Join Sql Server

I have huge data and sample of the table looks like below
+-----------+------------+-----------+-----------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+-----------+
| 1 | 6/3/2014 | 1 | 6/3/2014 |
| 1 | 5/22/2015 | 2 | NULL |
| 1 | 6/3/2015 | 3 | NULL |
| 1 | 11/20/2015 | 4 | NULL |
| 2 | 2/25/2014 | 1 | 2/25/2014 |
| 2 | 7/31/2014 | 2 | NULL |
| 2 | 8/26/2014 | 3 | NULL |
+-----------+------------+-----------+-----------+
Now I need to check the difference between Date in the 2nd row and Flag_Date in the 1st row. If the difference is more than 180 days, the 2nd row's Flag_Date should be set to the Date in the 2nd row; otherwise it should be set to the Flag_Date of the 1st row. The same rule applies, row by row, to all rows with the same Unique_ID.
update a
set a.Flag_Date=case when DATEDIFF(dd,b.Flag_Date,a.[Date])>180 then a.[Date] else b.Flag_Date end
from Table1 a
inner join Table1 b
on a.RowNumber=b.RowNumber+1 and a.Unique_ID=b.Unique_ID
When the above update query is executed once, only the second row under each Unique_ID gets updated, and the result looks like this:
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | NULL |
| 1 | 2015-11-20 | 4 | NULL |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | NULL |
+-----------+------------+-----------+------------+
And I need to run four times to achieve my desired result
+-----------+------------+-----------+------------+
| Unique_ID | Date | RowNumber | Flag_Date |
+-----------+------------+-----------+------------+
| 1 | 2014-06-03 | 1 | 2014-06-03 |
| 1 | 2015-05-22 | 2 | 2015-05-22 |
| 1 | 2015-06-03 | 3 | 2015-05-22 |
| 1 | 2015-11-20 | 4 | 2015-11-20 |
| 2 | 2014-02-25 | 1 | 2014-02-25 |
| 2 | 2014-07-31 | 2 | 2014-02-25 |
| 2 | 2014-08-26 | 3 | 2014-08-26 |
+-----------+------------+-----------+------------+
Is there a way to run the update only once so that all the rows are updated?
Thank you!
If you are using SQL Server 2012+, then you can use lag():
with toupdate as (
select t1.*,
lag(flag_date) over (partition by unique_id order by rownumber) as prev_flag_date
from table1 t1
)
update toupdate
set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end);
Both this version and your version can take advantage of an index on table1(unique_id, rownumber) or, better yet, table1(unique_id, rownumber, flag_date).
EDIT:
In earlier versions, this might have better performance:
with toupdate as (
select t1.*, t2.flag_date as prev_flag_date
from table1 t1 outer apply
(select top 1 t2.flag_date
from table1 t2
where t2.unique_id = t1.unique_id and
t2.rownumber < t1.rownumber
order by t2.rownumber desc
) t2
)
update toupdate
set Flag_Date = (case when DATEDIFF(day, prev_Flag_Date, toupdate.[Date]) > 180
then toupdate.[Date] else prev_Flag_Date
end);
The CTE can make use of the same index, and it is important to have the index. The better performance comes from the fact that your join on row_number() cannot use an index on that field.
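Since each row's Flag_Date in the question depends on the value already computed for the row before it, another way to get everything in one statement is a recursive CTE that walks each Unique_ID in RowNumber order. The sketch below is not taken from the answers above: it uses SQLite through Python's sqlite3 module instead of SQL Server, stores dates as ISO-8601 text, and replaces DATEDIFF with a julianday() difference, all of which are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: ISO-8601 text dates; SQLite accepts [Date] as a quoted identifier.
conn.execute(
    "CREATE TABLE Table1 (Unique_ID INTEGER, [Date] TEXT, RowNumber INTEGER, Flag_Date TEXT)"
)
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?)",
    [
        (1, "2014-06-03", 1, "2014-06-03"),
        (1, "2015-05-22", 2, None),
        (1, "2015-06-03", 3, None),
        (1, "2015-11-20", 4, None),
        (2, "2014-02-25", 1, "2014-02-25"),
        (2, "2014-07-31", 2, None),
        (2, "2014-08-26", 3, None),
    ],
)

# walk starts at RowNumber = 1 and carries the running Flag_Date forward,
# resetting it whenever the gap to the current Date exceeds 180 days.
conn.execute("""
WITH RECURSIVE walk(Unique_ID, RowNumber, Flag_Date) AS (
    SELECT Unique_ID, RowNumber, Flag_Date FROM Table1 WHERE RowNumber = 1
    UNION ALL
    SELECT t.Unique_ID, t.RowNumber,
           CASE WHEN julianday(t.[Date]) - julianday(w.Flag_Date) > 180
                THEN t.[Date] ELSE w.Flag_Date END
    FROM walk w
    JOIN Table1 t ON t.Unique_ID = w.Unique_ID AND t.RowNumber = w.RowNumber + 1
)
UPDATE Table1
SET Flag_Date = (SELECT w.Flag_Date FROM walk w
                 WHERE w.Unique_ID = Table1.Unique_ID
                   AND w.RowNumber = Table1.RowNumber)
""")

flags = conn.execute(
    "SELECT Unique_ID, RowNumber, Flag_Date FROM Table1 ORDER BY Unique_ID, RowNumber"
).fetchall()
for row in flags:
    print(row)
```

The sketch relies on RowNumber being gap-free within each Unique_ID, as in the sample data. On a large table the recursion is likely to be slower than a set-based update, so it would need to be measured before use.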

SQL Query to get results that match between three tables, or a single result for no match

Is there a way to use a where clause to check whether there were zero matches between tables for a record from the first table, and produce one row of results reflecting that?
I'm trying to get results that look like this:
+----------+----------+-----------+----------+-------------+
| Results |
+----------+----------+-----------+----------+-------------+
| Date | Queue ID | From Date | To Date | Campaign ID |
| 3/1/2014 | 1 | 2/24/2014 | 3/2/2014 | 1 |
| 3/1/2014 | 2 | (NULL) | (NULL) | (NULL) |
+----------+----------+-----------+----------+-------------+
From a combination of tables that look like this:
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| Table 1 | | Table 2 | | Table 3 |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| Date | Queue | | Queue | SP | | SP | From Date | To Date | Campaign |
| | ID | | ID | ID | | ID | | | ID |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
| 3/1/2014 | 1 | | 1 | 1 | | 1 | 2/24/2014 | 3/2/2014 | 1 |
| 3/1/2014 | 2 | | 1 | 2 | | 2 | 3/3/2014 | 3/9/2014 | 5 |
| | | | 1 | 3 | | 3 | 3/10/2014 | 3/16/2014 | 1 |
| | | | 1 | 4 | | 4 | 3/17/2014 | 3/23/2014 | 1 |
| | | | 1 | 5 | | 5 | 3/24/2014 | 3/30/2014 | 4 |
| | | | 2 | 6 | | 6 | 3/3/2014 | 3/9/2014 | 5 |
| | | | 2 | 7 | | 7 | 3/10/2014 | 3/16/2014 | 5 |
| | | | 2 | 8 | | 8 | 3/17/2014 | 3/23/2014 | 5 |
| | | | 2 | 9 | | 9 | 3/24/2014 | 3/30/2014 | 5 |
+----------+-------+ +-------+----+ +----+-----------+-----------+----------+
I'm joining Table 1 to Table 2 on QUEUE ID,
and Table 2 to Table 3 on SP ID,
and DATE from Table 1 should fall between Table 3's FROM DATE and TO DATE.
I want a single record returned for each queue, including if there were no date matches.
Unfortunately any combinations of joins or where clauses I've tried so far only result in either one record for Queue ID 1 or multiple records for each Queue ID.
I would suggest this:
SELECT
t1.Date,
t1.QueueID,
s.FromDate,
s.ToDate,
s.CampaignID
FROM
Table1 t1
LEFT JOIN
(
SELECT
t2.QueueID,
t3.FromDate,
t3.ToDate,
t3.CampaignID
FROM
Table2 t2
INNER JOIN
Table3 t3 ON
t2.SPID = t3.SPID
) s ON
t1.QueueID = s.QueueID AND
t1.Date BETWEEN s.FromDate AND s.ToDate
SQL Fiddle here with an abbreviated dataset
A trivial amendment to AHiggins's code. Using a CTE perhaps makes it a little easier to read.
With AllDates as
(
SELECT
t2.QueueID,
t3.FromDate,
t3.ToDate,
t3.CampaignID
FROM Table2 t2
INNER JOIN Table3 t3 ON
t2.SPID = t3.SPID
)
SELECT
t1.Date,
t1.QueueID,
s.FromDate,
s.ToDate,
s.CampaignID
FROM Table1 t1
LEFT JOIN AllDates s ON
t1.QueueID = s.QueueID AND
t1.Date BETWEEN s.FromDate AND s.ToDate
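The LEFT JOIN against the pre-joined Table2/Table3 set can be checked with the question's data. The sketch below does that in an in-memory SQLite database via Python's sqlite3 module. SQLite is a stand-in for the real engine, the dates are stored as ISO-8601 text so that BETWEEN compares them correctly, and the space-free column names follow the answer's query rather than the question's table headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 ([Date] TEXT, QueueID INTEGER);
CREATE TABLE Table2 (QueueID INTEGER, SPID INTEGER);
CREATE TABLE Table3 (SPID INTEGER, FromDate TEXT, ToDate TEXT, CampaignID INTEGER);
""")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [("2014-03-01", 1), ("2014-03-01", 2)])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                 [(1, 1), (1, 2), (1, 3), (1, 4), (1, 5),
                  (2, 6), (2, 7), (2, 8), (2, 9)])
conn.executemany("INSERT INTO Table3 VALUES (?, ?, ?, ?)", [
    (1, "2014-02-24", "2014-03-02", 1), (2, "2014-03-03", "2014-03-09", 5),
    (3, "2014-03-10", "2014-03-16", 1), (4, "2014-03-17", "2014-03-23", 1),
    (5, "2014-03-24", "2014-03-30", 4), (6, "2014-03-03", "2014-03-09", 5),
    (7, "2014-03-10", "2014-03-16", 5), (8, "2014-03-17", "2014-03-23", 5),
    (9, "2014-03-24", "2014-03-30", 5),
])

# The date filter sits in the ON clause, so a queue with no matching
# date range still yields one row, padded with NULLs.
query = """
SELECT t1.[Date], t1.QueueID, s.FromDate, s.ToDate, s.CampaignID
FROM Table1 t1
LEFT JOIN (SELECT t2.QueueID, t3.FromDate, t3.ToDate, t3.CampaignID
           FROM Table2 t2
           INNER JOIN Table3 t3 ON t2.SPID = t3.SPID) s
  ON t1.QueueID = s.QueueID
 AND t1.[Date] BETWEEN s.FromDate AND s.ToDate
ORDER BY t1.QueueID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Moving the date condition into a WHERE clause instead would filter out the NULL-padded row and drop Queue ID 2 from the result. A queue whose date falls inside several overlapping ranges would still produce one row per matching range.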
You want something like:
select distinct t1.date, t1.queue_id, IFNULL(t3.from_date,'NULL'),
IFNULL(t3.to_date,'NULL'), IFNULL(t3.campaign,'NULL')
FROM table1 t1
LEFT OUTER JOIN table2 t2 on t1.queue_id = t2.queue_id
left outer join table3 t3 on t2.sp_id = t3.sp_id
where t3.from_date <= t1.date
AND t3.to_date >= t1.date
This will select distinct records from the table (eliminating duplicate null rows and replacing the nulls with the string 'NULL').
SELECT t1.[Date], t1.[Queue ID], s.[From Date], s.[To Date], s.[Campaign ID]
FROM table1 t1
LEFT JOIN (SELECT t3.*, t2.[Queue ID] FROM table3 t3 JOIN table2 t2 ON t2.[SP ID] = t3.[SP ID]) s
ON s.[Queue ID] = t1.[Queue ID] AND t1.[Date] BETWEEN s.[From Date] AND s.[To Date]