SQL How to combine Timestamps to show a duration? - sql

I have the following query where the data is the location of an ITEM at any given time in a STORAGE_BOX, where the location of the item can be specified further with a delimiter so you could have STORAGE_BOX/SLOT/TRAY/ROW_ID. This is why I do the SUBSTRING for that column.
What I'm trying to do: create a view that shows the START_DATE and END_DATE between which a particular item was in a certain storage box.
select at.primary_key as 'ITEM_ID', f.name as 'INSTITUTION', SUBSTRING(sc.location +'/',0, CHARINDEX('/', sc.location + '/'))
as 'STORAGE_BOX', at.db_timestamp as 'TIMESTAMP' from [dbo].AUDIT_TRAIL at
RIGHT JOIN [dbo].BIOMATERIAL bio on at.primary_key = bio.id
LEFT JOIN [dbo].FACILITY f on bio.at_facility_id = f.id
LEFT JOIN [dbo].STORAGE_CONTAINER sc on at.primary_key = sc.id
where at.table_name = 'Biomaterial' AND sc.location IS NOT NULL
So I want to take the following output from the above query
+---------+-------------+-------------+------------+
| ITEM_ID | INSTITUTION | STORAGE_BOX | TIMESTAMP |
+---------+-------------+-------------+------------+
| 1 | Building#1 | STORAGE_0 | 2012-03-25 |
| 1 | Building#1 | STORAGE_0 | 2013-12-25 |
| 1 | Building#1 | STORAGE_1 | 2015-03-25 |
| 2 | Building#2 | STORAGE_3 | 2012-03-25 |
| 2 | Building#2 | STORAGE_4 | 2013-03-25 |
| 2 | Building#2 | STORAGE_5 | 2015-03-25 |
+---------+-------------+-------------+------------+
and change it into the result below, where START_DATE is the first timestamp of a new STORAGE_BOX and END_DATE is the first timestamp of whichever STORAGE_BOX comes next, or the current timestamp if the item is still there.
I have no idea how to compute these fields in the above query to get it to show
+---------+-------------+-------------+------------+---------------------+
| ITEM_ID | INSTITUTION | STORAGE_BOX | START_DATE | END_DATE |
+---------+-------------+-------------+------------+---------------------+
| 1 | Building#1 | STORAGE_0 | 2012-03-25 | 2015-03-25 |
| 1 | Building#1 | STORAGE_1 | 2015-03-25 | {Current_TimeStamp} |
| 2 | Building#2 | STORAGE_3 | 2012-03-25 | 2013-03-25 |
| 2 | Building#2 | STORAGE_4 | 2013-03-25 | 2015-03-25 |
| 2 | Building#2 | STORAGE_5 | 2015-03-25 | {Current_TimeStamp} |
+---------+-------------+-------------+------------+---------------------+
EDIT
I used the answer provided by Gordon Linoff to create the following query within the limitations of SQL Server 2008
with t as (
select at.transaction_uid,at.primary_key as BIOMATERIAL_ID, f.name as INSTITUTION,
at.new_value as FREEZER,
at.db_timestamp as TIMESTAMP
from [dbo].AUDIT_TRAIL at RIGHT JOIN
[dbo].BIOMATERIAL bio
on at.primary_key = bio.id LEFT JOIN
[dbo].FACILITY f
on bio.at_facility_id = f.id
where at.table_name = 'Biomaterial' AND at.column_name = 'container_id.location' AND at.new_value IS NOT NULL
),
t1 as (
select t.*,
row_number() over (partition by BIOMATERIAL_ID, INSTITUTION, FREEZER
order by timestamp) as seqnum
from t
),
t2 as(
select t1.*,
-- order by the timestamp so that seqnum_b follows the sequence of moves
ROW_NUMBER() over (partition by BIOMATERIAL_ID order by TIMESTAMP) as seqnum_b
from t1
where t1.seqnum = 1
)
SELECT a.BIOMATERIAL_ID, a.INSTITUTION, a.FREEZER, a.TIMESTAMP as START_DATE,coalesce(b.TIMESTAMP, getdate()) as END_DATE
-- b is the row that follows a, so b.TIMESTAMP is the end of a's stay
FROM t2 a left join t2 b on a.BIOMATERIAL_ID = b.BIOMATERIAL_ID AND b.seqnum_b = (a.seqnum_b + 1) order by a.BIOMATERIAL_ID

You can do this using window functions. First use row_number() to get just the first row for each group:
with t as (
select at.primary_key as ITEM_ID, f.name as INSTITUTION,
SUBSTRING(sc.location +'/',0, CHARINDEX('/', sc.location + '/')) as STORAGE_BOX,
at.db_timestamp as TIMESTAMP
from [dbo].AUDIT_TRAIL at RIGHT JOIN
[dbo].BIOMATERIAL bio
on at.primary_key = bio.id LEFT JOIN
[dbo].FACILITY f
on bio.at_facility_id = f.id LEFT JOIN
[dbo].STORAGE_CONTAINER sc
on at.primary_key = sc.id
where at.table_name = 'Biomaterial' AND sc.location IS NOT NULL
),
t1 as (
select t.*,
row_number() over (partition by ITEM_ID, INSTITUTION, STORAGE_BOX
order by timestamp) as seqnum
from t
),
t2 as (
select t1.*, lead(timestamp) over (partition by item_id, institution order by timestamp) as next_timestamp
from t1
where seqnum = 1
)
select t2.ITEM_ID, t2.INSTITUTION, t2.STORAGE_BOX,
t2.timestamp as START_DATE,
coalesce(t2.next_timestamp, getdate()) as END_DATE
from t2 ;
The first CTE is your query. The second enumerates the rows for each item, institution, and storage box to eliminate duplicates. This appears to be the logic for your query, although if a storage box is used twice for the same item/location somewhat more complicated logic may be necessary.
The third CTE, t2, gets the next timestamp for each row. The final query then turns that into START_DATE and END_DATE, falling back to the current time when there is no next row.
This assumes SQL Server 2012+ (based on your syntax I'm assuming SQL Server). You can do something similar with outer apply in earlier versions.
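To make the row_number() plus lead() idea easy to try out, here is a minimal sketch that runs the same pattern against an in-memory SQLite database from Python. It is not the original SQL Server query: the table item_location is a hypothetical stand-in for the first CTE, the column is called ts instead of TIMESTAMP, and a bound parameter takes the place of getdate() so the output is repeatable.

```python
import sqlite3

# Sample rows copied from the question's first result table.
ROWS = [
    (1, "Building#1", "STORAGE_0", "2012-03-25"),
    (1, "Building#1", "STORAGE_0", "2013-12-25"),
    (1, "Building#1", "STORAGE_1", "2015-03-25"),
    (2, "Building#2", "STORAGE_3", "2012-03-25"),
    (2, "Building#2", "STORAGE_4", "2013-03-25"),
    (2, "Building#2", "STORAGE_5", "2015-03-25"),
]

QUERY = """
with t1 as (
    select t.*,
           row_number() over (partition by item_id, institution, storage_box
                              order by ts) as seqnum
    from item_location t      -- hypothetical table standing in for the first CTE
),
t2 as (
    select t1.*,
           lead(ts) over (partition by item_id, institution order by ts) as next_ts
    from t1
    where seqnum = 1          -- keep only the first row per storage box
)
select item_id, institution, storage_box,
       ts as start_date,
       coalesce(next_ts, :now) as end_date
from t2
order by item_id, ts
"""


def durations(now="CURRENT"):
    """Return (item, institution, box, start, end) rows for the sample data."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table item_location "
        "(item_id int, institution text, storage_box text, ts text)"
    )
    con.executemany("insert into item_location values (?, ?, ?, ?)", ROWS)
    return con.execute(QUERY, {"now": now}).fetchall()


for row in durations():
    print(row)
```

The window functions need SQLite 3.25 or later. As in the answer, a storage box that is used twice for the same item would be collapsed into its first visit.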

Related

Tie-breaking multiple matches on MAX() in SQL

I have a table that looks like this:
| client_id | program_id | provider_id | date_of_service | data_entry_date | data_entry_time |
| --------- | ---------- | ----------- | --------------- | --------------- | --------------- |
| 2 | 5 | 6 | 02/02/2022 | 02/02/2022 | 0945 |
| 2 | 5 | 6 | 02/02/2022 | 02/07/2022 | 0900 |
| 2 | 5 | 6 | 02/04/2022 | 02/04/2022 | 1000 |
| 2 | 5 | 6 | 02/04/2022 | 02/04/2022 | 1700 |
| 2 | 5 | 6 | 02/04/2022 | 02/05/2022 | 0800 |
| 2 | 5 | 6 | 02/04/2022 | 02/05/2022 | 0900 |
I need to get the most recent date_of_service entered. From the table above, the desired result/row is:
date_of_service = 02/04/2022, data_entry_date = 02/05/2022, data_entry_time = 0900
This resulting date_of_service will be left joined to the master table.
This query mostly works:
SELECT t1.client_id, t1.program_id, t1.provider_id, t2.date_of_service
FROM table1 as t1
LEFT JOIN
(SELECT client_id, program_id, provider_id, date_of_service
FROM table2) as t2
ON t2.client_id = t1.client_id
AND t2.program_id = t1.program_id
AND t2.provider_id = t1.provider_id
AND t2.date_of_service =
(SELECT MAX(date_of_service)
FROM table2 as t3
WHERE t3.client_id = t1.client_id
AND t3.program_id = t1.program_id
AND t3.provider_id = t1.provider_id
)
WHERE t1.provider_id = '6'
But it also returns multiple rows whenever there is more than one match on the max(date_of_service).
To solve this, I need to use the max data_entry_date to break any ties whenever there is more than one row that matches the max(date_of_service). Likewise, I also need to use the max data_entry_time to break any ties whenever there is more than one row that also matches the max data_entry_date.
I tried the following:
SELECT t1.client_id, t1.program_id, t1.provider_id, t2.date_of_service
FROM table1 as t1
WHERE provider_id = '6'
LEFT JOIN
(SELECT TOP(1) client_id, program_id, provider_id, date_of_service, data_entry_date, data_entry_time
FROM table2
ORDER BY date_of_service DESC, data_entry_date DESC, data_entry_time DESC
) as t2
ON t2.client_id = t1.client_id
AND t2.program_id = t1.program_id
AND t2.provider_id = t1.provider_id
But I can only get it to return null values for the date_of_service.
Likewise, this:
SELECT t1.client_id, t1.program_id, t1.provider_id, t2.date_of_service
FROM table1 as t1
WHERE provider_id = '6'
LEFT JOIN
(
SELECT TOP(1) client_id AS client_id2, program_id AS program_id2, provider_id AS provider_id2, date_of_service, data_entry_date, data_entry_time
FROM table2 AS t3
JOIN
(SELECT
MAX(date_of_service) AS max_date_of_service
,MAX(data_entry_date) AS max_data_entry_date
FROM table1
WHERE date_of_service = (SELECT MAX(date_of_service) FROM table2)
) AS t4
ON t3.date_of_service = t4.max_date_of_service
AND t3.data_entry_date = t4.max_data_entry_date
ORDER BY data_entry_time
) AS t2
ON t2.client_id2 = t1.client_id
AND t2.program_id2 = t1.program_id
AND t2.provider_id2 = t1.provider_id
... works (meaning it doesn't throw any errors), but it only seems to return null values for me.
I've tried various combinations of MAX, ORDER BY, and multiple variations of JOIN's, but haven't found one that works yet.
I don't know what version my SQL database is, but it doesn't appear to handle window functions like OVER and PARTITION or other things like COALESCE. I've been using DBeaver 22.2.0 to test the SQL scripts.
Based on what you've provided, it looks like you can simply query table2:
SELECT client_id, program_id, provider_id, MAX(date_of_service), MAX(data_entry_date), MAX(data_entry_time)
FROM table2
GROUP BY client_id, program_id, provider_id
If you need to join this result set to table1, just JOIN to the statement above on client_id, program_id, provider_id
Try using below query. This is using just joins and sub query.
SELECT TOP 1 * FROM table1 t1
JOIN (
SELECT
MAX(date_of_Service) AS Max_date_of_Service
,MAX(data_entry_date) AS Max_data_entry_date
FROM table1
WHERE date_of_Service = (SELECT MAX(date_of_Service) FROM table1)
)t2
ON t1.date_of_Service = t2.Max_date_of_Service
AND t1.data_entry_date = t2.Max_data_entry_date
ORDER BY data_entry_time DESC -- latest entry time first, so TOP 1 picks it
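For anyone who wants to check the tie-breaking on the sample rows without window functions, here is a sketch using a correlated NOT EXISTS, run through SQLite from Python. This is only one possible approach, not something from the question or the answers, and it assumes the dates are stored (or can be converted) in a form that sorts chronologically; the sample dates are rewritten as ISO strings for that reason. The second query is there for comparison and shows that taking MAX() of each column independently can combine values from different rows.

```python
import sqlite3

# Rows from the question's table, with dates rewritten as ISO strings so that
# plain string comparison orders them chronologically (an assumption here).
ROWS = [
    (2, 5, 6, "2022-02-02", "2022-02-02", "0945"),
    (2, 5, 6, "2022-02-02", "2022-02-07", "0900"),
    (2, 5, 6, "2022-02-04", "2022-02-04", "1000"),
    (2, 5, 6, "2022-02-04", "2022-02-04", "1700"),
    (2, 5, 6, "2022-02-04", "2022-02-05", "0800"),
    (2, 5, 6, "2022-02-04", "2022-02-05", "0900"),
]

# Keep a row only if no other row of the same client/program/provider is later
# on (date_of_service, data_entry_date, data_entry_time), compared in that order.
LATEST = """
select a.client_id, a.program_id, a.provider_id,
       a.date_of_service, a.data_entry_date, a.data_entry_time
from table2 a
where not exists (
    select 1
    from table2 b
    where b.client_id = a.client_id
      and b.program_id = a.program_id
      and b.provider_id = a.provider_id
      and (b.date_of_service > a.date_of_service
           or (b.date_of_service = a.date_of_service
               and (b.data_entry_date > a.data_entry_date
                    or (b.data_entry_date = a.data_entry_date
                        and b.data_entry_time > a.data_entry_time))))
)
"""

# Independent MAX() per column, for comparison only.
PER_COLUMN_MAX = """
select max(date_of_service), max(data_entry_date), max(data_entry_time)
from table2
group by client_id, program_id, provider_id
"""


def run(sql):
    """Load the sample rows into a fresh in-memory table and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table table2 (client_id int, program_id int, provider_id int,"
        " date_of_service text, data_entry_date text, data_entry_time text)"
    )
    con.executemany("insert into table2 values (?, ?, ?, ?, ?, ?)", ROWS)
    return con.execute(sql).fetchall()


print(run(LATEST))          # one row per client/program/provider
print(run(PER_COLUMN_MAX))  # may mix values from different rows
```

On the sample data the first query returns the single row the question asks for (02/04/2022, 02/05/2022, 0900), while the per-column MAX() returns 02/07/2022 as the entry date and 1700 as the entry time, which belong to other rows. The LATEST query can be used as the derived table in the LEFT JOIN to the master table.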

Linking tables on multiple criteria

I've got myself in a bit of a mess on something I'm doing where I'm trying to get two tables linked together based on multiple bits of info.
I want to link one table to another based on the following rules (in this order):
the main link is where orderid matches between the two tables
only records from table2 where valid = y
from those valid records, the ones with the highest seqn1 value, and then from those the one with the highest seqn2 value
table1
orderid | date | otherinfo
223344 | 22/10/2020 | okokkokokooeodijjf
table2
orderid | seqn1 | seqn2 | valid | additonaldata
223344 | 1 | 3 | y | sdfsfsf
223344 | 2 | 1 | y | sffferfr
223344 | 2 | 2 | y | sfrfrefr -- This row
223344 | 2 | 3 | n | rfrg66rr
223344 | 2 | 4 | n | adwere
223344 | 3 | 4 | n | adwere
so would want the final record to be
orderid | date | otherinfo | seqn1 | seqn2 | valid | additonaldata
223344 | 22/10/2020 | okokkokokooeodijjf | 2 | 2 | y | sfrfrefr
I started off with the code below but I'm not sure I'm doing it right and I can't seem to get it to pay attention to the valid flag when i try to add it in.
SELECT * FROM table1
left JOIN table2
ON table1.orderid = table2.orderid
AND table2.seqn1 = (SELECT MAX(table2.seqn1) FROM table2 WHERE table1.orderid = table2.orderid)
AND table2.seqn2 = (SELECT MAX(table2.seqn2) FROM table2 WHERE table1.orderid = table2.orderid
AND table2.seqn1 = (SELECT MAX(table2.seqn1) FROM table2 WHERE table1.orderid = table2.orderid))
Could someone help me amend the code please.
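Use the row_number analytic function, partitioned by orderid and ordered by the sequence numbers in the order you need. There is no need for multiple subselects. To add further selection criteria for the single row, use CASE to map your values to numbers and order by them as well.
Fiddle here (the query below uses the table names header and line, which stand for table1 and table2).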
with l as (
select *,
row_number() over(partition by orderid order by seqn1 desc, seqn2 desc) as rn
from line
where valid = 'y'
)
select *
from header as h
join l
on h.orderid = l.orderid
and l.rn = 1
How about something like this:
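;
with cte_seqn1 as
(
SELECT orderid
,MAX(seqn1) as seqn1 -- highest seqn1 among the valid rows
FROM table2
where valid = 'y'
group by orderid --check if you need to add 'valid' to the group by but I don't think so.
),
cte_table2 as
(
SELECT t.orderid
,t.seqn1
,MAX(t.seqn2) as seqn2 -- highest seqn2 within that seqn1, so both values come from one row
FROM table2 t
inner join cte_seqn1 m on m.orderid = t.orderid and m.seqn1 = t.seqn1
where t.valid = 'y'
group by t.orderid, t.seqn1
)
SELECT
t1.*
,t3.seqn1
,t3.seqn2
,t3.valid
,t3.additonaldata
--,t3.[OtherFields]
from table1 t1
inner join cte_table2 t2 on t1.orderid = t2.orderid -- first match on id
left join table2 t3 on t3.orderid = t2.orderid and t3.seqn1 = t2.seqn1 and t3.seqn2 = t2.seqn2 and t3.valid = 'y'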
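As a quick way to check the "highest seqn1, then highest seqn2, valid rows only" rule against the sample data, here is a small sketch that runs a row_number() query in an in-memory SQLite database from Python. It uses the question's table and column names; the CTE name ranked and the function name are made up for the example, and SQLite stands in for whatever database is actually used.

```python
import sqlite3


def best_valid_line():
    """Join table1 to the single best valid table2 row per orderid."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table table1 (orderid int, "date" text, otherinfo text);
        create table table2 (orderid int, seqn1 int, seqn2 int,
                             valid text, additonaldata text);
        insert into table1 values (223344, '22/10/2020', 'okokkokokooeodijjf');
        insert into table2 values
            (223344, 1, 3, 'y', 'sdfsfsf'),
            (223344, 2, 1, 'y', 'sffferfr'),
            (223344, 2, 2, 'y', 'sfrfrefr'),
            (223344, 2, 3, 'n', 'rfrg66rr'),
            (223344, 2, 4, 'n', 'adwere'),
            (223344, 3, 4, 'n', 'adwere');
    """)
    return con.execute("""
        with ranked as (
            select t2.*,
                   row_number() over (partition by orderid
                                      order by seqn1 desc, seqn2 desc) as rn
            from table2 t2
            where valid = 'y'      -- filter before numbering the rows
        )
        select t1.orderid, t1."date", t1.otherinfo,
               r.seqn1, r.seqn2, r.valid, r.additonaldata
        from table1 t1
        left join ranked r on r.orderid = t1.orderid and r.rn = 1
    """).fetchall()


print(best_valid_line())
```

The valid filter sits inside the CTE so that invalid rows never receive a row number; putting it in the outer WHERE instead would drop orders whose top-ranked row happens to be invalid.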

postgresql - How to get one row the min value

I have a table (t_image) with these columns
datacd | imagecode | indexdate
----------------------------------
A | 1 | 20170213
A | 2 | 20170213
A | 3 | 20170214
B | 4 | 20170201
B | 5 | 20170202
desired result is this
datacd | imagecode | indexdate
----------------------------------
A | 1 | 20170213
B | 4 | 20170201
In the above table, I want to retrieve 1 row for each datacd who has the minimum index date
Here is my query, but the result returns 2 rows for datacd A
select *
from (
select datacd, min(indexdate) as indexdate
from t_image
group by datacd
) as t1 inner join t_image as t2 on t2.datacd = t1.datacd and t2.indexdate = t1.indexdate;
The Postgres proprietary distinct on () operator is typically the fastest solution for greatest-n-per-group queries:
select distinct on (datacd) *
from t_image
order by datacd, indexdate, imagecode; -- imagecode breaks ties on indexdate
One option uses ROW_NUMBER():
SELECT t.datacd,
t.imagecode,
t.indexdate
FROM
(
SELECT datacd, imagecode, indexdate,
ROW_NUMBER() OVER (PARTITION BY datacd ORDER BY indexdate, imagecode) rn
FROM t_image
) t
WHERE t.rn = 1
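Here is a small runnable sketch of the ROW_NUMBER() variant on the question's sample rows. It uses an in-memory SQLite database from Python rather than Postgres (distinct on is Postgres-only, so it is not shown), and it assumes that ties on indexdate should go to the lowest imagecode, which is what the desired result suggests. The function name is made up for the example.

```python
import sqlite3


def first_image_per_datacd():
    """Return one row per datacd: the earliest indexdate, lowest imagecode on ties."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t_image (datacd text, imagecode int, indexdate text)")
    con.executemany(
        "insert into t_image values (?, ?, ?)",
        [
            ("A", 1, "20170213"),
            ("A", 2, "20170213"),
            ("A", 3, "20170214"),
            ("B", 4, "20170201"),
            ("B", 5, "20170202"),
        ],
    )
    return con.execute("""
        select datacd, imagecode, indexdate
        from (
            select datacd, imagecode, indexdate,
                   row_number() over (partition by datacd
                                      order by indexdate, imagecode) as rn
            from t_image
        ) t
        where rn = 1
        order by datacd
    """).fetchall()


print(first_image_per_datacd())
```

Without imagecode in the window's ORDER BY, either of the two rows for A dated 20170213 could come back.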

row counter with condition in two different columns

I have the following tables with sport results (e.g. football):
tblGoals (RowId, GameRowId, PlayerRowId, TeamRowId, GoalMinute)
RowId | GameRowId | PlayerRowId | TeamRowId | GoalMinute
--------------------------------------------------------
1 | 1 | 1 | 1 | 25
2 | 1 | 2 | 2 | 45
3 | 1 | 3 | 1 | 66
tblPlayers (RowId, PlayerName)
RowId | PlayerName
------------------
1 | John Snow
2 | Frank Underwood
3 | Jack Bauer
tblGames (RowId, TeamHomeRowId, TeamGuestRowId, GameDate)
RowId | TeamHomeRowId | TeamGuestRowId | GameDate
---------------------------------------------------
1 | 1 | 2 | 2015-01-01
Now I want to get a list of all goals. The list should look like this:
GoalMinute | PlayerName | GoalsHome | GoalsGuest
-----------------------------------------------------
25 | John Snow | 1 | 0
45 | Frank Underwood | 1 | 1
66 | Jack Bauer | 2 | 1
GoalsHome and GoalsGuest should be running counts of the goals each team has scored so far. So, for example, the last row shows that the score is 2:1 for the home team.
To get this list of goals, I used this statement:
SELECT t_gol.GoalMinute,
t_ply.PlayerName,
CASE WHEN
t_gol.TeamRowId = t_gam.TeamHomeRowId
THEN ROW_NUMBER() OVER (PARTITION BY t_gam.TeamHomeRowId ORDER BY t_gam.TeamHomeRowId)
END AS GoalsHome,
CASE WHEN
t_gol.TeamRowId = t_gam.TeamGuestRowId
THEN ROW_NUMBER() OVER (PARTITION BY t_gam.TeamGuestRowId ORDER BY t_gam.TeamGuestRowId)
END AS GoalsGuest
FROM dbo.tblGoalsFussball AS t_gol
LEFT JOIN dbo.tblPlayersFussball AS t_ply ON (t_ply.RowId = t_gol.PlayerRowId)
LEFT JOIN dbo.tblGames AS t_gam ON (t_gam.RowId = t_gol.GameRowId)
WHERE t_gol.GameRowId = #match_row
But what I get is this here:
GoalMinute | PlayerName | GoalsHome | GoalsGuest
-----------------------------------------------------
25 | John Snow | 1 | NULL
45 | Frank Underwood | NULL | 2
66 | Jack Bauer | 3 | NULL
Maybe ROW_NUMBER() is the wrong approach?
I would do the running total using sum() as a windowed aggregate function with the over ... clause, which works in SQL Server 2012+.
select
g.RowId, g.GameDate, t.GoalMinute, p.PlayerName,
GoalsHome = COALESCE(SUM(case when TeamRowId = g.TeamHomeRowId then 1 end) OVER (PARTITION BY gamerowid ORDER BY goalminute),0),
GoalsGuest = COALESCE(SUM(case when TeamRowId = g.TeamGuestRowId then 1 end) OVER (PARTITION BY gamerowid ORDER BY goalminute),0)
from tblGoals t
join tblPlayers p on t.PlayerRowId = p.RowId
join tblGames g on t.GameRowId = g.RowId
order by t.GameRowId, t.GoalMinute
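To see the running totals on the sample data, here is a sketch that runs the same windowed SUM() pattern in an in-memory SQLite database from Python. SQLite is only a stand-in for SQL Server here, the GoalsHome = ... column syntax is replaced by the portable AS form, and the function name and the game_row_id parameter are made up for the example.

```python
import sqlite3


def goal_list(game_row_id=1):
    """Return (GoalMinute, PlayerName, GoalsHome, GoalsGuest) for one game."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        create table tblGoals (RowId int, GameRowId int, PlayerRowId int,
                               TeamRowId int, GoalMinute int);
        create table tblPlayers (RowId int, PlayerName text);
        create table tblGames (RowId int, TeamHomeRowId int,
                               TeamGuestRowId int, GameDate text);
        insert into tblGoals values (1, 1, 1, 1, 25), (2, 1, 2, 2, 45), (3, 1, 3, 1, 66);
        insert into tblPlayers values (1, 'John Snow'), (2, 'Frank Underwood'),
                                      (3, 'Jack Bauer');
        insert into tblGames values (1, 1, 2, '2015-01-01');
    """)
    return con.execute("""
        select t.GoalMinute, p.PlayerName,
               -- running count of home goals up to and including this minute
               coalesce(sum(case when t.TeamRowId = g.TeamHomeRowId then 1 end)
                        over (partition by t.GameRowId order by t.GoalMinute), 0) as GoalsHome,
               -- running count of guest goals up to and including this minute
               coalesce(sum(case when t.TeamRowId = g.TeamGuestRowId then 1 end)
                        over (partition by t.GameRowId order by t.GoalMinute), 0) as GoalsGuest
        from tblGoals t
        join tblPlayers p on t.PlayerRowId = p.RowId
        join tblGames g on t.GameRowId = g.RowId
        where t.GameRowId = ?
        order by t.GoalMinute
    """, (game_row_id,)).fetchall()


for row in goal_list():
    print(row)
```

The COALESCE matters for the first row: until the guest team has scored, the windowed SUM() only sees NULLs and would return NULL rather than 0.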
Another approach (that also works in older versions) is to use a self-join and sum up the rows with lower goalminutes. For ease of reading I've used a common table expression to split the goals into two columns for home and guest team:
;with t as (
select
g.GoalMinute, g.PlayerRowId, g.GameRowId,
case when TeamRowId = ga.TeamHomeRowId then 1 end HomeGoals,
case when TeamRowId = ga.TeamGuestRowId then 1 end GuestGoals
from tblGoals g
join tblGames ga on g.GameRowId = ga.RowId
)
select
g.RowId, g.GameDate, t.GoalMinute, p.PlayerName,
GoalsHome = (select sum(coalesce(HomeGoals,0)) from t t2 where t2.GoalMinute <= t.GoalMinute and t2.GameRowId = t.GameRowId),
GoalsGuest = (select sum(coalesce(GuestGoals,0)) from t t2 where t2.GoalMinute <= t.GoalMinute and t2.GameRowId = t.GameRowId)
from t
join tblPlayers p on t.PlayerRowId = p.RowId
join tblGames g on t.GameRowId = g.RowId
order by t.GameRowId, t.GoalMinute
The CTE isn't necessary though, you could just as well use a derived table
Sample SQL Fiddle
I think the easiest way is with subqueries.
SELECT
tgs.GoalMinute,
tpl.PlayerName,
( SELECT
COUNT(t.RowId)
FROM
tblgoals AS t
WHERE t.GoalMinute <= tgs.GoalMinute
AND t.GameRowId = tgm.RowId
AND t.TeamRowId = tgm.TeamHomeRowId
) AS HomeGoals,
( SELECT
COUNT(t.RowId)
FROM
tblgoals AS t
WHERE t.GoalMinute <= tgs.GoalMinute
AND t.GameRowId = tgm.RowId
AND t.TeamRowId = tgm.TeamGuestRowId
) AS GuestGoals
FROM
tblgoals AS tgs
JOIN tblplayers AS tpl ON tgs.PlayerRowId = tpl.RowId
JOIN tblGames AS tgm ON tgm.RowId = tgs.GameRowId
ORDER BY tgs.GoalMinute

how to query range?

Raw Data
| ID | STATUS |
| 1 | A |
| 2 | A |
| 3 | B |
| 4 | B |
| 5 | B |
| 6 | A |
| 7 | A |
| 8 | A |
| 9 | C |
Result
| START | END |
| 1 | 2 |
| 6 | 8 |
The result should list the ranges of consecutive IDs that have STATUS A.
How can I query this?
This should give you the correct ranges:
SELECT
STATUS,
MIN(ID),
max_id
FROM (
SELECT
t1.STATUS,
t1.ID,
COALESCE(MAX(t2.ID), t1.ID) max_id
FROM
yourtable t1 LEFT JOIN yourtable t2
ON t1.STATUS=t2.STATUS AND t1.ID<t2.ID
WHERE
NOT EXISTS (SELECT NULL
FROM yourtable t3
WHERE
t3.STATUS!=t1.STATUS
AND t3.ID>t1.ID AND t3.ID<t2.ID)
GROUP BY
t1.ID,
t1.STATUS
) s
WHERE
status = 'A'
GROUP BY
STATUS,
max_id
Please see fiddle here.
You are probably better off with a cursor-based solution or a client-side function.
However, if you were using Oracle - the following would work.
WITH LOWER_VALS AS
( -- All the Ids with no immediate predecessor
SELECT ROWNUM AS RN, STATUS, ID AS LOWER FROM
(
SELECT STATUS, ID
FROM RAWDATA RD1
WHERE RD1.ID -1 NOT IN
(SELECT ID FROM RAWDATA PRED_TABLE WHERE PRED_TABLE.STATUS = RD1.STATUS)
ORDER BY STATUS, ID
)
) ,
UPPER_VALS AS
( -- All the Ids with no immediate successor
SELECT ROWNUM AS RN, STATUS, ID AS UPPER FROM
(
SELECT STATUS, ID
FROM RAWDATA RD2
WHERE RD2.ID +1 NOT IN
(SELECT ID FROM RAWDATA SUCC_TABLE WHERE SUCC_TABLE.STATUS = RD2.STATUS)
ORDER BY STATUS, ID
)
)
SELECT
L.STATUS, L.LOWER, U.UPPER
FROM
LOWER_VALS L
JOIN UPPER_VALS U ON
U.RN = L.RN;
Results in the set
A 1 2
A 6 8
B 3 5
C 9 9
http://sqlfiddle.com/#!4/10184/2
There is not a lot to go on from what you posted, but I think this might work. I am using T-SQL because I don't know which database you are using.
SELECT
min(ID)
, max(ID)
FROM RawData
WHERE [Status] = 'A'
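For comparison, here is a sketch of the common "gaps and islands" technique for this kind of range query, run against an in-memory SQLite database from Python. It is not taken from any of the answers above: it relies on window functions (SQLite 3.25+, SQL Server 2005+, Oracle, Postgres), and the names numbered, grp, range_start and range_end are made up for the example. The difference between a row's overall position and its position within its own status stays constant inside each unbroken run, so it can serve as a grouping key.

```python
import sqlite3

# Sample rows from the question: (ID, STATUS).
ROWS = [(1, "A"), (2, "A"), (3, "B"), (4, "B"), (5, "B"),
        (6, "A"), (7, "A"), (8, "A"), (9, "C")]

QUERY = """
with numbered as (
    select ID, STATUS,
           row_number() over (order by ID)
           - row_number() over (partition by STATUS order by ID) as grp
    from RawData
)
select min(ID) as range_start, max(ID) as range_end
from numbered
where STATUS = ?
group by STATUS, grp        -- one group per unbroken run of the status
order by range_start
"""


def status_ranges(status="A"):
    """Return (start, end) ID pairs for each unbroken run of the given status."""
    con = sqlite3.connect(":memory:")
    con.execute("create table RawData (ID int, STATUS text)")
    con.executemany("insert into RawData values (?, ?)", ROWS)
    return con.execute(QUERY, (status,)).fetchall()


print(status_ranges("A"))
```

Because both row numbers are derived from the ordering of ID rather than from the ID values themselves, this sketch does not require the IDs to be free of gaps.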