PostgreSQL: extract(epoch) function over multiple rows - SQL

I am using PostgreSQL 8.3 (no choice in the version at this time).
My raw data is as follows:
ID | From | To | Time
01 | n/a | open | 06:56
01 | open | pt1 | 07:56
01 | pt1 | pt2 | 07:59
01 | pt2 | pt3 | 08:36
01 | pt3 | pt4 | 08:56
01 | pt4 | close | 09:58
What I want to end up with is:
ID | Open_Time | Close_Time
01 | 06:56 | 09:58
I don't care about the time intervals between the individual parts. I have many ID numbers, and each can have this many part intervals or more. I'm fairly new to SQL, so I am pretty lost here. I'm stuck on how to merge the beginning and end rows into one row in a new view.

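select
  rd.id as ID,
  -- only the 'open' row is non-NULL here, so min() returns its time
  min(case rd.to when 'open' then rd."time" end) as Open_Time,
  -- likewise, only the 'close' row contributes to Close_Time
  min(case rd.to when 'close' then rd."time" end) as Close_Time
from raw_data rd
group by rd.id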
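The query above is conditional aggregation: each CASE expression is non-NULL on exactly one row per ID, and min() ignores NULLs, so every ID collapses to a single row. A minimal way to try it is the sketch below, which runs the same idea against an in-memory SQLite database through Python's sqlite3 module rather than PostgreSQL 8.3. The table name raw_data comes from the answer; the quoted "from", "to" and "time" columns mirror the question's columns and are otherwise an assumption.

```python
import sqlite3

# Hypothetical table mirroring the question's raw data; "from", "to" and
# "time" are quoted because FROM and TO are SQL keywords.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE raw_data (id TEXT, "from" TEXT, "to" TEXT, "time" TEXT)')
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?, ?)",
    [
        ("01", "n/a", "open", "06:56"),
        ("01", "open", "pt1", "07:56"),
        ("01", "pt1", "pt2", "07:59"),
        ("01", "pt2", "pt3", "08:36"),
        ("01", "pt3", "pt4", "08:56"),
        ("01", "pt4", "close", "09:58"),
    ],
)

# Each CASE yields the time only on the matching row and NULL elsewhere;
# MIN() skips the NULLs and keeps that single time.
rows = conn.execute(
    """
    SELECT id,
           MIN(CASE "to" WHEN 'open'  THEN "time" END) AS open_time,
           MIN(CASE "to" WHEN 'close' THEN "time" END) AS close_time
    FROM raw_data
    GROUP BY id
    """
).fetchall()
print(rows)  # [('01', '06:56', '09:58')]
```

An ID that has no 'close' row yet would simply get a NULL (None) Close_Time.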

Related

Grouping Data 3 Hours after the Initial Time

I need to be able to filter down a dataset to only show the first instance every 3 hours. If an instance is found, any other instances that occur up to 3 hours afterwards should be hidden.
The closest thing I've been able to find is using date_trunc to get the first instance each hour, but I specifically need to hide everything up to exactly 3 hours after the first instance.
Example Data:
+------------------------+-------+
| Timestamp | Value |
+------------------------+-------+
| "2015-12-29 13:35:00" | 65 |
| "2015-12-29 13:40:00" | 26 |
| "2015-12-29 13:45:00" | 80 |
| "2015-12-29 13:50:00" | 10 |
| "2015-12-29 16:40:00" | 76 |
| "2015-12-29 16:45:00" | 73 |
| "2016-01-04 08:05:00" | 87 |
| "2016-01-04 08:10:00" | 90 |
| "2016-01-04 08:15:00" | 52 |
| "2016-01-04 08:20:00" | 90 |
| "2016-01-04 08:25:00" | 23 |
| "2016-01-04 08:30:00" | 96 |
| "2016-01-04 13:35:00" | 53 |
| "2016-01-04 13:40:00" | 15 |
| "2016-01-04 13:45:00" | 85 |
+------------------------+-------+
Expected Result:
+------------------------+-------+
| Timestamp | Value |
+------------------------+-------+
| "2015-12-29 13:35:00" | 65 |
| "2015-12-29 16:40:00" | 76 |
| "2016-01-04 08:05:00" | 87 |
| "2016-01-04 13:35:00" | 53 |
+------------------------+-------+
Anyone have any ideas? Thank you so much for your help.
This is tricky, because you need to keep track of the last picked record to identify the next one, so you can't just group by 3-hour intervals.
Here is one approach using a recursive cte:
with recursive cte(ts, value) as (
  -- anchor: the earliest record in the table
  select ts, value
  from mytable
  where ts = (select min(ts) from mytable)
  union all
  -- recursive step: the first record at least 3 hours after the last one picked
  select x.*
  from (select ts from cte order by ts desc limit 1) c
  cross join lateral (
    select t.ts, t.value
    from mytable t
    where t.ts >= c.ts + interval '3' hour
    order by t.ts
    limit 1
  ) x
)
select * from cte order by ts
The idea is to start from the earliest record in the table, then iterate by picking the first available record that is at least 3 hours later (this assumes no duplicates in the timestamp column).
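SQLite has no LATERAL join, so the sketch below adapts the same iterate-from-the-last-pick idea to SQLite's recursive CTEs by using a correlated scalar subquery with MIN() for each step. This is a swapped-in formulation, not the answer's query. It runs through Python's sqlite3 module on the question's sample data, with the column renamed ts as in the answer and the table name readings assumed. Like the answer, it assumes timestamps are unique.

```python
import sqlite3

# The question's sample data; ISO-8601 text matches what SQLite's datetime() returns.
READINGS = [
    ("2015-12-29 13:35:00", 65),
    ("2015-12-29 13:40:00", 26),
    ("2015-12-29 13:45:00", 80),
    ("2015-12-29 13:50:00", 10),
    ("2015-12-29 16:40:00", 76),
    ("2015-12-29 16:45:00", 73),
    ("2016-01-04 08:05:00", 87),
    ("2016-01-04 08:10:00", 90),
    ("2016-01-04 08:15:00", 52),
    ("2016-01-04 08:20:00", 90),
    ("2016-01-04 08:25:00", 23),
    ("2016-01-04 08:30:00", 96),
    ("2016-01-04 13:35:00", 53),
    ("2016-01-04 13:40:00", 15),
    ("2016-01-04 13:45:00", 85),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (ts TEXT PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", READINGS)

QUERY = """
WITH RECURSIVE picked(ts) AS (
    SELECT MIN(ts) FROM readings                -- anchor: earliest reading
    UNION ALL
    SELECT (SELECT MIN(r.ts) FROM readings r    -- first reading >= 3 h later
            WHERE r.ts >= datetime(p.ts, '+3 hours'))
    FROM picked p
    WHERE p.ts IS NOT NULL                      -- stop once nothing later exists
)
SELECT r.ts, r.value
FROM picked p
JOIN readings r ON r.ts = p.ts                  -- also drops the final NULL row
ORDER BY r.ts
"""

picked = conn.execute(QUERY).fetchall()
for ts, value in picked:
    print(ts, value)
```

The printed rows match the DB Fiddle output below (65, 76, 87 and 53).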
Note that timestamp is not a good choice for a column name, because it is also a keyword (the name of a data type). I renamed it to ts in the query.
Demo on DB Fiddle:
ts | value
:------------------ | ----:
2015-12-29 13:35:00 | 65
2015-12-29 16:40:00 | 76
2016-01-04 08:05:00 | 87
2016-01-04 13:35:00 | 53

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID.
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot it or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and the data set are almost impossible to match, I think this is an approximation of what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
-- stays that began more than a year ago only count from one year ago
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
-- open stays (no EndDate yet) run until today
isnull(EndDate,getdate())
) as days
from @your_data
)
-- the windowed SUM repeats each client's total on every row of that client
select *,
sum(days) over (partition by ClientId) as ClientTotal
from data
https://rextester.com/HCKOR53440
You need a subquery that computes the sum grouped by Client_id, and a join between your table and that subquery, e.g.:
select y.Stay_id, y.Client_id, y.StayDateStart, y.StayDateEnd, t.sum_duration
from your_table y
inner join (
-- one row per client: total nights across all of that client's stays
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = y.Client_id
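Both answers (the windowed SUM and the GROUP BY subquery joined back) can be checked side by side with the sketch below, which runs in an in-memory SQLite database through Python's sqlite3 module instead of SQL Server or Crystal Reports. Assumptions: "today" is pinned to 2019-06-05 (the date that makes stays 4 and 5 come out at 21 and 19 nights, as in the question) so the output is reproducible; julianday() differences stand in for DATEDIFF(day, ...); and the table and column names are hypothetical. SUM() OVER needs SQLite 3.25 or later.

```python
import sqlite3

TODAY = "2019-06-05"  # assumption: stands in for CURRENT_TIMESTAMP

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stays (stay_id INTEGER, client_id INTEGER, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO stays VALUES (?, ?, ?, ?)",
    [
        (1, 38, "2018-01-01", "2019-01-31"),
        (2, 16, "2019-01-03", "2019-01-07"),
        (3, 27, "2019-01-10", "2019-01-12"),
        (4, 27, "2019-05-15", None),
        (5, 38, "2019-05-17", None),
    ],
)

# Nights per stay: clamp the start to one year before TODAY and treat a
# missing end date as TODAY. MAX(a, b) with two arguments is SQLite's scalar max.
DURATIONS = """
    SELECT stay_id, client_id,
           CAST(julianday(COALESCE(end_date, :today))
                - julianday(MAX(start_date, date(:today, '-1 year'))) AS INTEGER) AS days
    FROM stays
"""

# Window-function version: every stay keeps its own days plus the client total.
windowed = conn.execute(
    "SELECT stay_id, client_id, days,"
    " SUM(days) OVER (PARTITION BY client_id) AS client_total"
    f" FROM ({DURATIONS}) ORDER BY stay_id",
    {"today": TODAY},
).fetchall()

# GROUP BY + join version: aggregate per client, then join back to each stay.
joined = conn.execute(
    "SELECT s.stay_id, s.client_id, t.client_total FROM stays s"
    " JOIN (SELECT client_id, SUM(days) AS client_total"
    f" FROM ({DURATIONS}) GROUP BY client_id) t"
    " ON t.client_id = s.client_id ORDER BY s.stay_id",
    {"today": TODAY},
).fetchall()

print(windowed)
print(joined)
```

Both versions give the same per-client totals. Under these assumptions stay 1 counts 240 nights (5 June 2018 to 31 January 2019) rather than the 210 shown in the question, so client 38 totals 259.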

Aggregating continuous rows within a SQL table

I am trying to run an aggregate function on the following SQL table to sum up all the "LengthOfRecord" grouped by "Long+Lat" and only rows that are contiguous (i.e. "RowNumber" that is in running sequence).
+-----------+-----------+---------------+----------------+
| RowNumber | Vessel ID | Long+Lat | LengthOfRecord |
+-----------+-----------+---------------+----------------+
| 102313179 | Vessel 01 | 123.751 1.196 | 181 |
| 102313180 | Vessel 01 | 123.751 1.196 | 179 |
| 102313181 | Vessel 01 | 123.751 1.196 | 361 |
| 102313182 | Vessel 01 | 123.751 1.196 | 359 |
| 102313183 | Vessel 01 | 123.751 1.196 | 180 |
| 102313184 | Vessel 01 | 123.751 1.196 | 181 |
| 102313185 | Vessel 01 | 123.751 1.196 | 179 |
| 102313186 | Vessel 01 | 123.751 1.196 | 180 |
| 102313187 | Vessel 01 | 123.751 1.196 | 360 |
| 102313188 | Vessel 01 | 123.751 1.196 | 360 |
| 102313189 | Vessel 01 | 123.751 1.196 | 180 |
| 102313191 | Vessel 01 | 123.751 1.196 | 181 |
| 102313298 | Vessel 01 | 123.750 1.197 | 180 |
| 102313375 | Vessel 01 | 123.742 1.196 | 179 |
| 102313376 | Vessel 01 | 123.742 1.196 | 359 |
| 102313377 | Vessel 01 | 123.742 1.196 | 180 |
| 102313379 | Vessel 01 | 123.742 1.196 | 181 |
| 102313380 | Vessel 01 | 123.742 1.196 | 178 |
+-----------+-----------+---------------+----------------+
The following is the result that I am trying to achieve through SQL statements. Is there any way I can do this with an SQL query?
+-----------+---------------+----------------+
| Vessel ID | Long+Lat | LengthOfRecord |
+-----------+---------------+----------------+
| Vessel 01 | 123.751 1.196 | 2881 |
| Vessel 01 | 123.750 1.197 | 180 |
| Vessel 01 | 123.742 1.196 | 1077 |
+-----------+---------------+----------------+
You can do this using a difference in row numbers approach:
select vesselId, latLong, sum(lengthOfRecord) as lengthOfRecord
from (select t.*,
             -- position of the row within the vessel's whole history
             row_number() over (partition by vesselId order by rowNumber) as seqnum,
             -- position of the row among the vessel's rows with this latLong
             row_number() over (partition by vesselId, latlong order by rowNumber) as seqnum_latlong
      from yourtable t
     ) t
group by (seqnum - seqnum_latlong), latLong, vesselId;
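The difference of row numbers approach is a bit tricky to explain. It identifies adjacent rows with the same values: within a run of adjacent rows with the same latLong, both row numbers go up by one per row, so their difference stays constant. Whenever rows with a different latLong intervene, seqnum advances further than seqnum_latlong, so the next run with that latLong gets a different difference. Grouping by the difference therefore keeps separate runs apart. If you run the subquery on its own, you will see how the calculation works.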
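To see the difference of row numbers idea run end to end, here is a minimal sketch that loads the question's rows into an in-memory SQLite database via Python's sqlite3 module and runs the same grouping. The table and column names (positions, rec_no, vessel_id, lat_long, length_of_record) are hypothetical stand-ins for the question's columns, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# The question's rows; column names are hypothetical snake_case stand-ins.
ROWS = [
    (102313179, "Vessel 01", "123.751 1.196", 181),
    (102313180, "Vessel 01", "123.751 1.196", 179),
    (102313181, "Vessel 01", "123.751 1.196", 361),
    (102313182, "Vessel 01", "123.751 1.196", 359),
    (102313183, "Vessel 01", "123.751 1.196", 180),
    (102313184, "Vessel 01", "123.751 1.196", 181),
    (102313185, "Vessel 01", "123.751 1.196", 179),
    (102313186, "Vessel 01", "123.751 1.196", 180),
    (102313187, "Vessel 01", "123.751 1.196", 360),
    (102313188, "Vessel 01", "123.751 1.196", 360),
    (102313189, "Vessel 01", "123.751 1.196", 180),
    (102313191, "Vessel 01", "123.751 1.196", 181),
    (102313298, "Vessel 01", "123.750 1.197", 180),
    (102313375, "Vessel 01", "123.742 1.196", 179),
    (102313376, "Vessel 01", "123.742 1.196", 359),
    (102313377, "Vessel 01", "123.742 1.196", 180),
    (102313379, "Vessel 01", "123.742 1.196", 181),
    (102313380, "Vessel 01", "123.742 1.196", 178),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (rec_no INTEGER, vessel_id TEXT, lat_long TEXT, length_of_record INTEGER)"
)
conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
SELECT vessel_id, lat_long, SUM(length_of_record) AS total
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (PARTITION BY vessel_id ORDER BY rec_no) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY vessel_id, lat_long ORDER BY rec_no) AS seqnum_latlong
    FROM positions p
)
-- the difference is constant within a run of adjacent equal positions
GROUP BY vessel_id, lat_long, seqnum - seqnum_latlong
ORDER BY MIN(rec_no)
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The gap between 102313189 and 102313191 does not split the first run, because only the order of RowNumber matters, not its exact values.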
This may be long but hopefully covers your requirements in a relatively readable manner:
declare @t table (RowNumber int not null, VesselID varchar(17) not null,
LatLong varchar(19),LengthOfRecord int not null)
insert into @t(RowNumber,VesselID,LatLong,LengthOfRecord) values
(102313179,'Vessel 01','123.751 1.196',181),
(102313180,'Vessel 01','123.751 1.196',179),
(102313181,'Vessel 01','123.751 1.196',361),
(102313182,'Vessel 01','123.751 1.196',359),
(102313183,'Vessel 01','123.751 1.196',180),
(102313184,'Vessel 01','123.751 1.196',181),
(102313185,'Vessel 01','123.751 1.196',179),
(102313186,'Vessel 01','123.751 1.196',180),
(102313187,'Vessel 01','123.751 1.196',360),
(102313188,'Vessel 01','123.751 1.196',360),
(102313189,'Vessel 01','123.751 1.196',180),
(102313191,'Vessel 01','123.751 1.196',181),
(102313298,'Vessel 01','123.750 1.197',180),
(102313375,'Vessel 01','123.742 1.195',179),
(102313376,'Vessel 01','123.742 1.195',359),
(102313377,'Vessel 01','123.742 1.195',180),
(102313379,'Vessel 01','123.742 1.195',181),
(102313380,'Vessel 01','123.742 1.195',178)
;With ContiguousRN as (
select
*,
ROW_NUMBER() OVER (PARTITION BY VesselID ORDER BY RowNumber) as rn
from
@t
), Starts as (
select
r1.VesselID,
r1.rn,
r1.LatLong,
ROW_NUMBER() OVER (PARTITION BY r1.VesselID ORDER BY r1.rn) as srn
from
ContiguousRN r1
left join
ContiguousRN r2
on
r1.rn = r2.rn + 1 and
r1.VesselID = r2.VesselID and
r1.LatLong = r2.LatLong
where
r2.rn is null
), Ends as (
select
r1.VesselID,
r1.rn,
r1.LatLong,
ROW_NUMBER() OVER (PARTITION BY r1.VesselID ORDER BY r1.rn) as srn
from
ContiguousRN r1
left join
ContiguousRN r2
on
r1.rn = r2.rn - 1 and
r1.VesselID = r2.VesselID and
r1.LatLong = r2.LatLong
where
r2.rn is null
), Sequences as (
select
s.VesselID,
s.LatLong,
s.rn as StartRow,e.rn as EndRow
from
Starts s
inner join
Ends e
on
s.VesselID = e.VesselID and
s.srn = e.srn
)
select
seq.VesselID,
seq.LatLong,
(select SUM(LengthOfRecord) from ContiguousRN r
where r.VesselID = seq.VesselID and
r.rn between seq.StartRow and seq.EndRow) as LengthOfRecord
from Sequences seq
I've changed some of the column names so that I don't have to keep quoting them because they contain spaces or punctuation. I'd also recommend you either store the position in a genuine geography-typed column or you store lat and long in separate columns.
So, here is how the query above works. The first CTE (ContiguousRN) just assigns row numbers (rn) that have no gaps, unlike RowNumber. The second and third CTEs (Starts and Ends) locate the rows that start and end each run, i.e. rows whose immediately preceding or succeeding row for the same vessel has a different LatLong value. We also generate a separate series of row numbers (srn) for just these rows, so that, in Sequences, we can pair each start row with its corresponding end row.
Finally, in the last select, we bring this together and we total up all of the rows that sit between each start and end marker.
I've assumed throughout that VesselID should be used as some form of partitioning value and that your actual data may contain details for more than one vessel and this process shouldn't mingle the data together. If that's not so, you can remove most of the conditions around VesselID in the above.
Results:
VesselID LatLong LengthOfRecord
----------------- ------------------- --------------
Vessel 01 123.751 1.196 2881
Vessel 01 123.750 1.197 180
Vessel 01 123.742 1.195 1077
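The Starts idea above (a row begins a run when the preceding row for the same vessel has a different LatLong) can also be expressed more compactly with LAG plus a running SUM of start flags, which numbers the runs directly instead of pairing starts with ends. This is a swapped-in variant, not the answer's query. The sketch runs it in SQLite through Python's sqlite3 module on a small hypothetical data set in which one position reappears after a different one, to show that non-adjacent visits to the same position stay separate. Table and column names are hypothetical; SQLite 3.25+ is assumed for LAG and the windowed SUM.

```python
import sqlite3

# Hypothetical rows: V1 is at A, then B, then A again; V2 has a single row.
ROWS = [
    (1, "V1", "A", 10),
    (2, "V1", "A", 20),
    (3, "V1", "B", 5),
    (4, "V1", "A", 7),
    (5, "V2", "A", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE positions (rec_no INTEGER, vessel_id TEXT, lat_long TEXT, length_of_record INTEGER)"
)
conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?)", ROWS)

QUERY = """
WITH flagged AS (
    -- 1 when there is no previous row for the vessel or its position differs,
    -- otherwise 0 (LAG returns NULL on the first row, so the CASE falls to ELSE)
    SELECT p.*,
           CASE WHEN LAG(lat_long) OVER (PARTITION BY vessel_id ORDER BY rec_no) = lat_long
                THEN 0 ELSE 1 END AS is_start
    FROM positions p
),
numbered AS (
    -- running count of run starts = run number within the vessel
    SELECT f.*,
           SUM(is_start) OVER (PARTITION BY vessel_id ORDER BY rec_no) AS run_id
    FROM flagged f
)
SELECT vessel_id, lat_long, SUM(length_of_record) AS total
FROM numbered
GROUP BY vessel_id, run_id, lat_long
ORDER BY vessel_id, run_id
"""

runs = conn.execute(QUERY).fetchall()
print(runs)
```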

SQL: T-SQL query to get the count of equipment on a weekly basis

ID | Equipment | HireDate | HireTodate | ActualOffhireDate
---------------------------------------------------------------------
01 | Printer | 01/01/2013 | 31/12/2016 |
02 | Printer | 01/05/2015 | 31/12/2016 |
03 | Laptop | 17/01/2016 | 31/12/2016 |
04 | Laptop | 01/01/2015 | 31/12/2016 | 28/01/2016
I have a table like the one above and would like to get weekly counts (weeks running Friday to Thursday) for the month of January 2016, as shown below:
Equipment | January count | Week 1| Week 2| Week 3| Week 4
------------------------------------------------------------------
Printer | 02 | 02 | 02 | 02 | 02
Laptop | 02 | 01 | 01 | 02 | 01
A calendar table makes this kind of query fast and easy to write.
Here is one way I used to generate a calendar table, and this is how it looks in my environment:
Once you have a calendar table, all you have to do is join it on the date you want to count, which is as simple as this:
select
equipment,
count(*) as 'Jancount',
sum(case when wkno =1 then 1 else 0 end) 'Week 1',
sum(case when wkno =2 then 1 else 0 end) 'Week 2',
sum(case when wkno =3 then 1 else 0 end) 'Week 3',
sum(case when wkno =4 then 1 else 0 end) 'Week 4'
from
calendar c
join testtable p
on p.hiretodate=c.date
group by equipment
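The calendar-table approach can be tried on the question's sample data with the sketch below, which uses Python's sqlite3 module and an in-memory SQLite database rather than SQL Server. It makes several assumptions that are not in the original answer: the January 2016 calendar is generated on the fly with a recursive CTE instead of being stored; week numbers come from (day - 1) / 7 + 1, which gives Friday-to-Thursday weeks only because 1 January 2016 was a Friday; a unit counts for a week if it is on hire on that week's Thursday; and a unit is on hire from HireDate through HireTodate, up to but not including ActualOffhireDate. Instead of the answer's join on HireTodate, it joins each calendar day to the equipment on hire that day. Table and column names are hypothetical, and dates are converted to ISO-8601 text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE equipment_hire (id TEXT, equipment TEXT, hire_date TEXT,"
    " hire_to_date TEXT, actual_offhire_date TEXT)"
)
conn.executemany(
    "INSERT INTO equipment_hire VALUES (?, ?, ?, ?, ?)",
    [
        ("01", "Printer", "2013-01-01", "2016-12-31", None),
        ("02", "Printer", "2015-05-01", "2016-12-31", None),
        ("03", "Laptop", "2016-01-17", "2016-12-31", None),
        ("04", "Laptop", "2015-01-01", "2016-12-31", "2016-01-28"),
    ],
)

QUERY = """
WITH RECURSIVE calendar(d) AS (
    SELECT '2016-01-01'
    UNION ALL
    SELECT date(d, '+1 day') FROM calendar WHERE d < '2016-01-31'
),
weeks AS (
    -- valid for January 2016 only: the 1st was a Friday
    SELECT d,
           (CAST(strftime('%d', d) AS INTEGER) - 1) / 7 + 1 AS wkno,
           strftime('%w', d) = '4' AS is_thursday      -- %w: 0 = Sunday
    FROM calendar
),
on_hire AS (
    SELECT w.wkno, w.is_thursday, e.id, e.equipment
    FROM weeks w
    JOIN equipment_hire e
      ON e.hire_date <= w.d AND w.d <= e.hire_to_date
     AND (e.actual_offhire_date IS NULL OR w.d < e.actual_offhire_date)
)
SELECT equipment,
       COUNT(DISTINCT id) AS january_count,
       COUNT(DISTINCT CASE WHEN wkno = 1 AND is_thursday THEN id END) AS week_1,
       COUNT(DISTINCT CASE WHEN wkno = 2 AND is_thursday THEN id END) AS week_2,
       COUNT(DISTINCT CASE WHEN wkno = 3 AND is_thursday THEN id END) AS week_3,
       COUNT(DISTINCT CASE WHEN wkno = 4 AND is_thursday THEN id END) AS week_4
FROM on_hire
GROUP BY equipment
ORDER BY equipment
"""

counts = conn.execute(QUERY).fetchall()
for row in counts:
    print(row)
```

With these assumptions the result matches the expected table in the question (Laptop 2, 1, 1, 2, 1; Printer 2, 2, 2, 2, 2). Counting a unit as on hire on its off-hire date would change laptop week 4 to 2.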

SQL - How do I query for re-admissions in TSQL?

I'm trying to figure out how to query for readmissions on SQL Server 2008 R2. Here is the basic structure of the visit table. There are other fields, but none that I thought would be helpful. One issue is that some of these may be transfers instead of discharges, which I have no easy way to deduce, but that issue can be ignored for now. I tried my hand at this, but I guess my understanding of SQL needs more work. I tried to find any info I could online, but none of the queries led me to a useful conclusion, or I just didn't understand them. Any suggestions would be appreciated.
EDIT: Readmission is if a patient returns within 30 days of previous discharge.
+---------+--------+-----------------+-----------------+
| VisitID | UID | AdmitDT | DischargeDT |
+---------+--------+-----------------+-----------------+
| 12 | 2 | 6/17/2013 6:51 | 6/17/2013 6:51 |
| 16 | 3 | 6/19/2013 4:48 | 6/21/2013 13:35 |
| 18 | 3 | 6/11/2013 12:08 | 6/11/2013 12:08 |
| 21 | 3 | 6/12/2013 14:40 | 6/12/2013 14:40 |
| 22 | 3 | 6/13/2013 10:00 | 6/14/2013 12:00 |
| 25 | 2 | 6/11/2013 16:13 | 6/11/2013 16:13 |
| 30 | 1 | 6/20/2013 8:35 | 6/20/2013 8:35 |
| 31 | 7 | 6/13/2013 6:12 | 6/13/2013 6:12 |
| 34 | 3 | 6/12/2013 8:40 | NULL |
| 35 | 1 | 6/12/2013 8:52 | NULL |
| 38 | 2 | 6/12/2013 10:10 | 6/12/2013 10:10 |
+---------+--------+-----------------+-----------------+
Attempt at Code:
SELECT N2.*
FROM visitTable AS N1
INNER JOIN
visitTable AS N2 ON N1.UID = N2.UID
WHERE N1.EncounterID <> N2.EncounterID AND ( N2.AdmitDT BETWEEN N1.DischargeDT and DATEADD(DD,30, N1.DischargeDT))
Here's a start:
It gets each visit for each UID in order of AdmitDT, then pairs each visit with the next visit in that result. If the next visit's admit date is between the previous discharge date and 30 days after it, the pair is selected. There are some weird points, though: UID 1 is shown to have been admitted on 6/12/2013 and never discharged, but then admitted again on 6/20/2013 and discharged the same day.
edit: restructured a bit to reduce the number of joins
WITH cte AS (
  -- number each patient's visits in admission order
  SELECT visitid,uid,dischargedt,admitdt,
    row_number()over(partition BY uid ORDER BY admitdt) AS r
  FROM visitTable
)
SELECT
  c1.visitid AS v1, c2.visitid AS v2,
  c1.uid,
  c1.dischargedt as [Discharged from first visit],
  c2.admitdt as [Admitted to next visit]
FROM cte c1
-- pair each visit with the same patient's next visit
INNER JOIN cte c2 ON c1.uid=c2.uid
WHERE c1.visitid<>c2.visitid
  AND c1.r+1=c2.r
  -- next admission within 30 days of the previous discharge
  -- (a NULL discharge date never matches)
  AND c2.admitdt BETWEEN c1.dischargedt AND dateadd(d,30,c1.dischargedt )
ORDER BY c1.uid
Results:
| V1 | V2 | UID | DISCHARGED FROM FIRST VISIT | ADMITTED TO NEXT VISIT |
|----|----|-----|-----------------------------|-----------------------------|
| 25 | 38 | 2 | June, 11 2013 16:13:00+0000 | June, 12 2013 10:10:00+0000 |
| 38 | 12 | 2 | June, 12 2013 10:10:00+0000 | June, 17 2013 06:51:00+0000 |
| 18 | 34 | 3 | June, 11 2013 12:08:00+0000 | June, 12 2013 08:40:00+0000 |
| 21 | 22 | 3 | June, 12 2013 14:40:00+0000 | June, 13 2013 10:00:00+0000 |
| 22 | 16 | 3 | June, 14 2013 12:00:00+0000 | June, 19 2013 04:48:00+0000 |
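The pairing logic (number each patient's visits by admission time, join each visit to the next one, keep pairs where the next admission falls within 30 days of the previous discharge) can be checked against the question's data with this sketch. It runs in an in-memory SQLite database through Python's sqlite3 module, so dateadd becomes datetime(..., '+30 days'); timestamps are converted to ISO-8601 text, and the table and column names are hypothetical. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# The question's visit table, with timestamps converted to ISO-8601 text.
VISITS = [
    (12, 2, "2013-06-17 06:51:00", "2013-06-17 06:51:00"),
    (16, 3, "2013-06-19 04:48:00", "2013-06-21 13:35:00"),
    (18, 3, "2013-06-11 12:08:00", "2013-06-11 12:08:00"),
    (21, 3, "2013-06-12 14:40:00", "2013-06-12 14:40:00"),
    (22, 3, "2013-06-13 10:00:00", "2013-06-14 12:00:00"),
    (25, 2, "2013-06-11 16:13:00", "2013-06-11 16:13:00"),
    (30, 1, "2013-06-20 08:35:00", "2013-06-20 08:35:00"),
    (31, 7, "2013-06-13 06:12:00", "2013-06-13 06:12:00"),
    (34, 3, "2013-06-12 08:40:00", None),
    (35, 1, "2013-06-12 08:52:00", None),
    (38, 2, "2013-06-12 10:10:00", "2013-06-12 10:10:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits (visit_id INTEGER, uid INTEGER, admit_dt TEXT, discharge_dt TEXT)"
)
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", VISITS)

QUERY = """
WITH cte AS (
    -- number each patient's visits in admission order
    SELECT visit_id, uid, admit_dt, discharge_dt,
           ROW_NUMBER() OVER (PARTITION BY uid ORDER BY admit_dt) AS r
    FROM visits
)
SELECT c1.visit_id AS v1, c2.visit_id AS v2, c1.uid
FROM cte c1
JOIN cte c2 ON c2.uid = c1.uid AND c2.r = c1.r + 1    -- next visit, same patient
WHERE c2.admit_dt BETWEEN c1.discharge_dt
                      AND datetime(c1.discharge_dt, '+30 days')
ORDER BY c1.uid, c1.r
"""

pairs = conn.execute(QUERY).fetchall()
for v1, v2, uid in pairs:
    print(f"UID {uid}: visit {v1} -> readmitted as visit {v2}")
```

As in the SQL Server results above, visits 34 and 35 (NULL discharge date) never start a pair, because BETWEEN against NULL is never true.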
Try this (shows the visits whose admission date falls after, and within 30 days of, the discharge date of another visit by the same patient):
Select * From visitTable v
Where Exists (Select * From visitTable p
              Where p.uid = v.uid
                and v.AdmitDT > p.DischargeDT
                and v.AdmitDT <= DateAdd(day, 30, p.DischargeDT))
You have not explained any business rules, so I'll take a guess: a readmission is any visit by a UID that appears more than once, i.e. every record for that UID except the first one.
Here is another method using window functions.
SELECT VT.*
FROM visitTable VT
INNER JOIN
(
SELECT VisitID, ROW_NUMBER() OVER (PARTITION BY UID ORDER BY AdmitDT) VisitCount
FROM visitTable
) RA
ON RA.VisitCount > 1 AND RA.VisitID = VT.VisitID