Date difference between similar records in SQL

Date difference between similar records in SQL - sql

I have a table and need to get the difference between two dates for a very similar set of records. I've tried a few methods today but cannot seem to get this one to work.
Example Table:
Payment_ID | Created_Date | Version_ID | Status
----------------------------------------------------------
1526 | 20/10/2015 | 1 | Opened
1526 | 20/10/2015 | 2 | Verified Open
1526 | 22/10/2015 | 3 | Assigned
1526 | 23/10/2015 | 4 | Contact Made
1859 | 20/10/2015 | 1 | Opened
1859 | 20/10/2015 | 2 | Verified Open
1859 | 22/10/2015 | 3 | Assigned
1859 | 22/10/2015 | 3.5 | Re-Assigned
1859 | 22/10/2015 | 4.5 | Contact Failed
1859 | 23/10/2015 | 4 | Contact Made
1859 | 24/10/2015 | 5 | Assigned Updated
1859 | 25/10/2015 | 6 | Contact Made
1859 | 26/10/2015 | 7 | Resolved
1859 | 21/10/2015 | 8 | Closed
1852 | 26/10/2015 | 1 | Opened
1778 | 21/09/2015 | 1 | Opened
1778 | 22/09/2015 | 2 | Verified Open
1778 | 23/09/2015 | 3 | Assigned
1778 | 24/09/2015 | 4 | Contact Made
1778 | 25/09/2015 | 5 | Assigned Updated
The requirement is to return the Payment_ID and StatusDateDiff for a given Status, in this case the Contact_Made one and only the first one if a Payment_ID has more than one, then take the difference between that date and the previous status date for any of them.
So taking 1526 "Contact_Made" was on the 24/10/2015 and the previous status, regardless of what that was, is 23/10/2015 so the difference is 1.
For the above it would look like this:
Payment_ID | StatusDateDiff
-----------------------------
1526 | 1
1859 | 1
1852 | 0
1778 | 1
I tried a few sub queries to get the distinct Payment_ID and Min(Created_Date), but that resulted in duplicates once put together.
Also tried a Common Table Expression but that lead to the same - though I'm not too familiar with them.
Any thoughts would be appreciated.

Use LAG() (available in SQL Server 2012+):
select payment_id, datediff(day, prev_created_date, created_date)
from (select t.*,
lag(created_date) over (partition by payment_id order by created_date) as prev_created_date,
row_number() over (partition by payment_id, status order by created_date) as seqnum
from t
) t
where status = 'Contact Made' and seqnum = 1;

This is untested, but this should point you in the right direction. You can use a windowed ROW_NUMBER() function to determine which values are the latest, and do a DATEDIFF() to find the number of days they are different.
Edit: I just noticed you have a SQL Server tag and an Oracle tag - this answer is for SQL Server
;With Ver As
(
Select *,
Row_Number() Over (Partition By Payment_Id Order By Version Desc) Row_Number
From Table
)
Select Latest.Payment_Id,
DateDiff(Day, Coalesce(Previous.Created_Date, Latest.CreatedDate), Latest.CreatedDate) As StatusDateDiff
From Ver As Latest
Left Join Ver As Previous On Latest.Payment_Id = Previous.Payment_Id
And Previous.Row_Number = 2
Where Latest.Row_Number = 1

Related

Select only record until timestamp from another table

I have three tables.
The first one is Device table
+----------+------+
| DeviceId | Type |
+----------+------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
+----------+------+
The second one is History table - data received by different devices.
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 1 | 31 | 15.08.2020 1:42:00 |
| 2 | 100 | 15.08.2020 1:42:01 |
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 1 | 34 | 15.08.2020 1:45:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
| 2 | 45 | 15.08.2020 1:47:00 |
+----------+-------------+--------------------+
The third one is DeviceStatusHistory table
+----------+---------+--------------------+
| DeviceId | State | TimeStamp |
+----------+---------+--------------------+
| 1 | 1(OK) | 15.08.2020 1:42:00 |
| 2 | 1(OK) | 15.08.2020 1:43:00 |
| 1 | 1(OK) | 15.08.2020 1:44:00 |
| 1 | 0(FAIL) | 15.08.2020 1:44:30 |
| 1 | 0(FAIL) | 15.08.2020 1:46:00 |
| 2 | 0(FAIL) | 15.08.2020 1:46:10 |
+----------+---------+--------------------+
I want to select the last temperature of devices, but take into account only those history records that occurs until the first device failure.
Since device1 starts failing from 15.08.2020 1:44:30, I don't want its records that go after that timestamp.
The same for the device2.
So as a final result, I want to have only data of all devices until they get first FAIL status:
+----------+-------------+--------------------+
| DeviceId | Temperature | TimeStamp |
+----------+-------------+--------------------+
| 2 | 40 | 15.08.2020 1:43:00 |
| 1 | 32 | 15.08.2020 1:44:00 |
| 3 | 20 | 15.08.2020 1:46:00 |
+----------+-------------+--------------------+
I can select an appropriate history only if device failed at least once:
SELECT * FROM Device D
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE D.Id = H.DeviceId
and H.DeviceTimeStamp <
(select MIN(UpdatedOn) from DeviceStatusHistory Y where [State]=0 and DeviceId=D.Id)
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY D.Id;
The problems is, if a device never fails, I don't get its history at all.
Update:
My idea is to use something like this
SELECT * FROM DeviceHardwarePart HP
CROSS APPLY
(SELECT TOP 1 * FROM History H
WHERE HP.Id = H.DeviceId
and H.DeviceTimeStamp <
(select ISNULL((select MIN(UpdatedOn) from DeviceMetadataPart where [State]=0 and DeviceId=HP.Id),
cast('12/31/9999 23:59:59.997' as datetime)))
ORDER BY H.DeviceTimeStamp desc) X
ORDER BY HP.Id;
I'm not sure whether it is a good solution

You can use COALESCE: coalesce(min(UpdateOn), cast('9999-12-31 23:59:59' as datetime)). This ensures you always have an upperbound for your select instead of NULL.

I will treat this as two parts problem
I will try to find the time at which device has failed and if it hasn't failed I will keep it as a large value like some timestamp in 2099
Once I have the above I can simply join with histories table and take the latest value before the failed timestamp.
In order to get one, I guess there can be several approaches. From top of my mind something like below should work
select device_id, coalesce(min(failed_timestamps), cast('01-01-2099 01:01:01' as timestamp)) as failed_at
(select device_id, case when state = 0 then timestamp else null end as failed_timestamps from History) as X
group by device_id
This gives us the minimum of failed timestamp for a particular device, and an arbitrary large value for the devices which have never failed.
I guess after this the solution is straight forward.

SQL - Group two rows by columns that value and null on different columns

Question
Say I have a table with such rows:
id | country | place | last_action | second_to_last_action
----------------------------------------------------------
1 | US | 2 | reply |
1 | US | 2 | | comment
4 | DE | 5 | reply |
4 | | | | comment
What I want to do is to combine these by id, country and place so that the last_action and second_to_last_action would be on the same row
id | country | place | last_action | second_to_last_action
----------------------------------------------------------
1 | US | 2 | reply | comment
4 | DE | 5 | reply | comment
How would I approach this? I guess I would need an aggregate here but my mind is hitting completely blank on which one should I use.
It can be expected that there will always be a matching pair.
Background:
Note: this table has been derived from something like this:
id | country | place | action | time
----------------------------------------------------------
1 | US | 2 | reply | 16:15
1 | US | 2 | comment | 15:16
1 | US | 2 | view | 13:16
4 | DE | 5 | reply | 17:15
4 | DE | 5 | comment | 16:16
4 | DE | 5 | view | 14:12
Code used to partition was:
row_number() over (partition by id order by time desc) as event_no
And then I got the last and second_to_last action by getting event_no 1 & 2. So if there's more efficient way to get the last two actions in two distinct columns I would be happy to hear that.

You can fix your first data by using aggregation:
select id, country, place, max(last_action), max(second_to_last_action)
from derived
group by id, country, place;
You can do this from the original table using conditional aggregation:
select id, country, place,
max(case when seqnum = 1 then action end) as last_action,
max(case when seqnum = 2 then action end) as second_to_last_action
from (select t.*,
row_number() over (partition by id order by time desc) as seqnum
from t
) t
group by id, country, place;

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!

Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare #your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into #your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from #your_data
)
select *,
sum(days) over (partition by ClientId)
from data
https://rextester.com/HCKOR53440

You need a subquery for sum based on group by client_id and a join between you table the subquery eg:
select Stay_id, client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id

SQL Server Active Record counts by month

I've created a database storing Incident tickets.
I have created a fact and a number of dimension tables.
Here is some sample data
+---------------------+--------------+--------------+-------------+------------+
| LastModifiedDateKey | TicketNumber | Status | factCurrent | Date |
+---------------------+--------------+--------------+-------------+------------+
| 2774 | T:9992260 | Open | 1 | 4/12/2017 |
| 2777 | T:9992805 | Open | 1 | 7/12/2017 |
| 2777 | T:9993068 | Open | 1 | 7/12/2017 |
| 2777 | T:9993098 | Open | 0 | 7/12/2017 |
| 2793 | T:9993098 | Acknowledged | 0 | 23/12/2017 |
| 2928 | T:9993098 | Closed | 1 | 5/01/2018 |
| 2777 | T:9993799 | Open | 0 | 7/12/2017 |
| 2928 | T:9993799 | Closed | 1 | 5/01/2018 |
| 2778 | T:9994729 | Open | 1 | 8/12/2017 |
| 2774 | T:9994791 | Open | 0 | 4/12/2017 |
| 2928 | T:9994791 | Closed | 1 | 5/01/2018 |
| 2777 | T:9994912 | Open | 1 | 7/12/2017 |
| 2778 | T:9995201 | Open | 0 | 8/12/2017 |
| 2793 | T:9995201 | Closed | 1 | 23/12/2017 |
| 2931 | T:9718629 | Open | 1 | 8/01/2018 |
| 2933 | T:9718629 | Closed | 1 | 10/01/2018 |
| 2932 | T:9855664 | Open | 1 | 9/01/2018 |
| 2931 | T:9891975 | Open | 1 | 8/01/2018 |
+---------------------+--------------+--------------+-------------+------------+
I want a query that will give me the total of tickets open at the end of each month.
In the data January should have 8 and Feb 2.
Note: that a ticket can have multiple rows with same status because a dimension key has changed or multiple rows with different status all in the same month. e.g. T:9993098.

This approach first uses ROW_NUMBER to identify the most recent record for each ticket, for each month/year. It is assumed that the most recent record in a month will contain the status in which a ticket ended for that month. Then, it aggregates over this modified table, counting only tickets which ended the month in an open status.
SELECT
YEAR(Date) + "-" + MONTH(Date) AS date,
COUNT(*) AS num_open_tickets
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARITION BY YEAR(Date), MONTH(Date), TicketNumber
ORDER BY BY Date DESC) rn
FROM yourTable
) t
WHERE t.rn = 1 AND t.Status = 'Open'
GROUP BY
YEAR(Date) + "-" + MONTH(Date);

First, I would generate the months. Then do a cumulative count of the opens minus the closes. Alas, that is a bit tricky because of the repeated rows for a ticket and because you are using an old version of SQL Server.
But . . . you can do this:
with months as (
select dateadd(day, 1 - day(min(date)), min(date)) as mon_start,
max(date) as max_date
from sample
union all
select dateadd(month, 1, mon_start), max_date
from months
where dateadd(month, 1, mon_start) < max_date
)
select m.mon_end,
(select count(distinct case when status = 'Open' then ticket end) -
count(distinct case when status = 'Closed' then ticket end)
from sample s
where s.date <= m.mon_end
) as open_tickets
from (select dateadd(day, -1, mon_start) as mon_end
from months
) m;
This uses a recursive CTE to generate the months. It is easier to generate the first day of the months and then subtract one day afterwards (what is the date when you add 1 month to the last day of February?)
The rest uses a correlated subquery to count the number of open tickets on that date.

How do I do multiple selection based on a flowchart of criteria?

Table name: Copies
+------------------------------------------------------------------------------------+
| group_id | my_id | previous | in_this | higher_value | most_recent |
+----------------------------------------------------------------------------------------------------------------
| 900 | 1 | null | Y | 7 | May16 |
| 900 | 2 | null | Y | 3 | Oct 16 |
| 900 | 3 | null | N | 9 | Oct 16 |
| 901 | 4 | 378 | Y | 3 | Oct 16 |
| 901 | 5 | null | N | 2 | Oct 16 |
| 902 | 6 | null | N | 5 | May16 |
| 902 | 7 | null | N | 9 | Oct 16 |
| 903 | 8 | null | Y | 3 | Oct 16 |
| 903 | 9 | null | Y | 3 | May16 |
| 904 | 10 | null | N | 0 | May 16 |
| 904 | 11 | null | N | 0 | May16
--------------------------------------------------------------------------------------
Output table
+---------------------------------------------------------------------------------------------------+
| group_id | my_id | previous | in_this | higher_value |most_recent|
+----------------------------------------------------------------------------------------------------
| 900 | 1 | null | Y | 7 | May16 |
| 902 | 7 | null | N | 9 | Oct 16 |
| 903 | 8 | null | Y | 3 | Oct 16 |
---------------------------------------------------------------------------------------------------------
Hi all, I need help with a query that returns one record within a group based on the importance of the field. The importance is ranked as follows:
previous- if one record within the group_id is not null, then neither record within a group_id is returned (because according to our rules, all records within a group should have the same previous value)
in_this- If one record is Y, and the other is N within a group_id, then we keep the Y; If all records are Y or all are N, then we move to the next attribute
Higher_value- If all records in the ‘in_this’ field are equal, then we need to select the record with the greater value from this field. If both records have an equal value, we move to the next attribute
Most_recent- If all records were of equal value in the ‘higher_value’ field, then we consider the newest record. If these are equal, then nothing is returned.
This is a simplified version of the table I am looking at, but I just would like to get the gist of how something like this would work. Basically, my table has multiple copies of records that have been grouped through some algorithm. I have been tasked with selecting which of these records within a group is the ‘good’ one, and we are basing this on these fields.
I’d like the output to actually show all fields, because I will likely attempt to refine the query to include other fields (there are over 40 to consider), but the most important is the group_id and my_id fields. It would be neat if we could also somehow flag why each record got picked, but that isn’t necessary.
It seems like something like this should be easy, but I have a hard time wrapping my head around how to pick from within a group_id. Thanks for your help.

You can use analytic functions for this. The trick is establishing the right variables for each condition:
select t.*
from (select t.*,
max(in_this) over (partition by group_id) as max_in_this,
min(higher_value) over (partition by group_id) as min_higher_value,
max(higher_value) over (partition by group_id) as max_higher_value,
row_number() over (partition by group_id, higher_value order by my_id) as seqnum_ghv,
min(most_recent) over (partition by group_id) as min_most_recent,
max(most_recent) over (partition by group_id) as max_most_recent,
row_number() over (partition by group_id order by most_recent) as seqnum_mr
from t
) t
where max_in_this is not null and
( (min_higher_value <> max_higher_value and seqnum_ghv = 1) or
(min_higher_value = max_higher_value and min_most_recent <> max_most_recent and seqnum_mr = 1
)
);
The third condition as stated makes no sense, but you should get the idea for how to implement this.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Date difference between similar records in SQL - sql

Related

Select only record until timestamp from another table

SQL - Group two rows by columns that value and null on different columns

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

SQL Server Active Record counts by month

How do I do multiple selection based on a flowchart of criteria?

Categories

Resources