SQL Query Pivot Data 2 rows into 1 - sql

I am trying to write a query that pivots activity data into summary rows. The input data has an activity date, a description, and a status of Start, Stop, or null if it is an informative row.
This is an example of the data format:
ID | ActivityID | ActivityType | ActivityDate | Status | Activity
----------------------------------------------------------------------
701 | 26 | Start | 02/07/13 15:16 | 10 | Run Job
728 | 26 | No Change | 05/07/13 09:30 | 20 | Running
859 | 26 | Stop | 22/07/13 12:45 | 30 | Error
1064 | 26 | Start | 10/08/13 13:26 | 11 | Restarted
1524 | 26 | Stop | 28/08/13 10:19 | 31 | Error
1785 | 26 | Stop | 07/09/13 11:48 | 31 | Error
2205 | 26 | Start | 17/09/13 09:05 | 10 | Restarted
2528 | 26 | Start | 14/10/13 17:56 | 11 | Restarted
2528 | 26 | Stop | 25/10/13 20:47 | 32 | Completed
And this is the expected result:
ActivityID | Start_Type | Start_Date | Start_Status | Start_Activity | Stop_Type | Stop_Date | Stop_Status | Stop_Activity
---------------------------------------------------------------------------------
26 | Start | 02/07/13 15:16 | 10 | Run Job | Stop | 22/07/13 12:45 | 30 | Error
26 | Start | 10/08/13 13:26 | 11 | Restarted | Stop | 28/08/13 10:19 | 31 | Error
26 | Start | 17/09/13 09:05 | 10 | Restarted | Stop | 25/10/13 20:47 | 32 | Done
I want to get all the starts and put the corresponding stop in the same row, so that activity_start_date etc. sit alongside activity_stop_date etc.
My problem is that I want the first stop after each start, which I have done, but I also want to ignore a start that has another start before it. For instance, Start, Stop, Stop, Start, Start, Stop, Start would return: Start Stop, Start Stop, Start null.
I have tried a left join to join the 1st stop after a start but this also includes the second start matching the same stop twice.
I thought I needed a temp table but I think this is unnecessary. Would a date variable work where the first start sets the variable then the stop is the first stop after the variable which sets the variable and the next start is the first start after the new set variable?
Your help is appreciated!
This is what I have so far:
SELECT
activity.ID,
activity.ActivityType AS StartActivityType,
activity.ActivityDate AS StartActivityDate,
activity.Status AS StartStatus,
activity.Activity AS StartActivity,
activity2.ActivityType AS StopActivityType,
activity2.ActivityDate AS StopActivityDate,
activity2.Status AS StopStatus,
activity2.Activity AS StopActivity
FROM tempdb..#TempTable activity
FULL OUTER JOIN #TempTable activity2
ON activity.ID = activity2.ID
AND activity2.ActivityType = 'Stop'
AND (activity2.ActivityDate > activity.ActivityDate)
AND (activity2.ActivityDate = ( SELECT MIN(activity3.ActivityDate)
FROM tempdb..#TempTable activity3
WHERE activity.ID = activity2.ID
AND activity3.ActivityType = 'Stop'
AND activity3.ActivityDate > activity.ActivityDate))
WHERE activity.ActivityType = 'Start'
--does not have a start before
AND activity.ActivityDate > ( SELECT MAX(activity3.ActivityDate)
FROM tempdb..#TempTable activity2
WHERE activity2.PathwayID = activity.ID
AND activity2.ActivityType IN ('Stop','Start')
AND activity2.ActivityDate > activity.ActivityDate))
--AND activity2.ActivityDate != LAG(activity2.ActivityDate) OVER (ORDER BY activity.ActivityDate),
--AND activity.ActivityDate IS NULL
ORDER BY activity.ID ASC

select * from #TempTable a1
cross apply
(
select top 1 *
from
#TempTable a2
where
a2.ActivityType = 'Stop'
and a2.ActivityID = a1.ActivityID
and a2.ActivityDate > a1.ActivityDate
order by
a2.ActivityDate
) a2
where
a1.ActivityType = 'Start'
order by
a1.ActivityDate;
You have some oddities in your data that I do not understand, but this should get you 90% there and it's much cleaner than trying to join back on ActivityDate.

shriop's query is almost correct, but it fails where consecutive rows have ActivityType = 'Start', like 2205 and 2528. For those, the duplicate rows (rows with the same ActivityType as the preceding row) need to be dropped.
If the OP uses SQL Server 2012 or later, this can be done using LAG:
WITH DATA AS (
SELECT ActivityID
, ActivityType
, ActivityDate
, Status
, Activity
-- type of the previous Start/Stop row within the same activity; 'Stop' is the default for the first row
, LastActivity = LAG(ActivityType, 1, 'Stop') OVER (PARTITION BY ActivityID ORDER BY ActivityDate)
FROM Table1
WHERE ActivityType IN ('Start', 'Stop')
)
SELECT a1.ActivityID
, Start_Type = a1.ActivityType
, Start_Date = a1.ActivityDate
, Start_Status = a1.Status
, Start_Activity = a1.Activity
, Stop_Type = a2.ActivityType
, Stop_Date = a2.ActivityDate
, Stop_Status = a2.Status
, Stop_Activity = a2.Activity
FROM DATA a1
-- OUTER APPLY keeps a trailing Start that has no Stop yet (its Stop_* columns are NULL)
OUTER APPLY (SELECT top 1 *
FROM Table1 a2
WHERE a2.ActivityType = 'Stop'
AND a2.ActivityID = a1.ActivityID
AND a2.ActivityDate > a1.ActivityDate
ORDER BY a2.ActivityDate) a2
WHERE a1.ActivityType = 'Start'
-- skip a Start that directly follows another Start
AND a1.LastActivity <> a1.ActivityType
ORDER BY a1.ActivityDate;
Otherwise, LAG can be simulated with row numbering and a self join.
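A minimal sketch of that row-numbering idea, run through Python's built-in sqlite3 module so it can be tried without a SQL Server instance. The table name Table1 follows the query above; the ISO date strings, the function name paired_starts, and the use of a scalar subquery instead of APPLY are assumptions made for this illustration. It needs an SQLite build with window function support (3.25 or later), and the SQL would need adapting for SQL Server.

```python
import sqlite3

# Sample rows from the question: (ID, ActivityID, ActivityType, ActivityDate, Status, Activity).
# Dates are rewritten as ISO strings so that text ordering matches time ordering.
ROWS = [
    (701, 26, "Start", "2013-07-02 15:16", 10, "Run Job"),
    (728, 26, "No Change", "2013-07-05 09:30", 20, "Running"),
    (859, 26, "Stop", "2013-07-22 12:45", 30, "Error"),
    (1064, 26, "Start", "2013-08-10 13:26", 11, "Restarted"),
    (1524, 26, "Stop", "2013-08-28 10:19", 31, "Error"),
    (1785, 26, "Stop", "2013-09-07 11:48", 31, "Error"),
    (2205, 26, "Start", "2013-09-17 09:05", 10, "Restarted"),
    (2528, 26, "Start", "2013-10-14 17:56", 11, "Restarted"),
    (2528, 26, "Stop", "2013-10-25 20:47", 32, "Completed"),
]

QUERY = """
WITH numbered AS (
    -- number the Start/Stop rows of each activity in date order
    SELECT ActivityID, ActivityType, ActivityDate,
           ROW_NUMBER() OVER (PARTITION BY ActivityID ORDER BY ActivityDate) AS rn
    FROM Table1
    WHERE ActivityType IN ('Start', 'Stop')
)
SELECT cur.ActivityID,
       cur.ActivityDate AS Start_Date,
       -- first Stop after this Start (NULL when there is none)
       (SELECT MIN(s.ActivityDate)
        FROM Table1 AS s
        WHERE s.ActivityType = 'Stop'
          AND s.ActivityID = cur.ActivityID
          AND s.ActivityDate > cur.ActivityDate) AS Stop_Date
FROM numbered AS cur
-- the self join on rn - 1 plays the role of LAG(..., 1)
LEFT JOIN numbered AS prev
       ON prev.ActivityID = cur.ActivityID
      AND prev.rn = cur.rn - 1
WHERE cur.ActivityType = 'Start'
  -- a missing previous row counts as 'Stop', like the LAG default
  AND COALESCE(prev.ActivityType, 'Stop') <> 'Start'
ORDER BY cur.ActivityDate
"""


def paired_starts():
    """Return (ActivityID, Start_Date, Stop_Date) for each Start not preceded by a Start."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Table1 (ID INTEGER, ActivityID INTEGER, ActivityType TEXT, "
        "ActivityDate TEXT, Status INTEGER, Activity TEXT)"
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows
```

On the sample data this yields three pairs, matching the three rows of the expected result in the question.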

Related

SQL: Selecting data from multiple tables with multiple records to select and join

I have three tables: VolunteerRelationships, Organizations, and CampaignDates. I'm trying to write a query that will give me the organization id and name, and the org's start and end campaign dates for the current campaign year <#CampaignYear>, based on the selected volunteer <#SelectedInd>.
Dates are stored as separate column values for day, month and year, which I'm trying to cast into a formatted date value. If I can get this, I'd also like to use a CASE statement to get the status of the campaign based on whether the campaign dates are upcoming, currently running, or already closed, but I need to get the first part of the query working first.
Sorry if I'm leaving a lot of needed info out; this is my first time posting a question to this forum. Thank you!
VolunteerRelationships
id | name | managesId |expiryDate
1 | john | 1 |
2 | jack | 2 |6/30/2020
3 | jerry| 3 |12/31/2021
Organizations
id | name1
1 | ACME
CampaignDates
orgId | dateDay | dateMonth | dateYear | dateType | Campaign Year
1 | 5 | 11 | 2020 | Start | 2020
1 | 15 | 11 | 2020 | End | 2020
Result
orgId | orgName | startDate | endDate | Status
1 | ACME | 2020-01-01| 2020-01-15 | Closed
select
v.MANAGEDACCOUNT,
o.Name1,
select * from
(select cast(cast dateyear*1000 + datemonth*100 + dateday as varchar(255)) as date as date1 from <#Schema>.CampaignDates where datetype = 'Start' and campaignyear = <#CampaignYear> and orgaccountnumber = v.MANAGEDACCOUNT) d1,
(select cast(cast dateyear*1000 + datemonth*100 + dateday as varchar(255)) as date as date2 from <#Schema>.CampaignDates where datetype = 'End' and campaignyear = <#CampaignYear> and orgaccountnumber = v.MANAGEDACCOUNT) d2
from <#Schema>.VolunteerRelationships v
inner join <#Schema>.organizations o
on o.accountnumber=v.MANAGEDACCOUNT
where v.VOLUNTEERACCOUNT = <#SelectedInd> and ( v.EXPIRYDATE IS NULL OR v.EXPIRYDATE > <#Today> )
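The date assembly and status check described in the question can be sketched outside SQL as follows. This is only an illustration of the intended logic, not a fix for the query above: the function names, the tuple layout, and the status labels "Upcoming", "Running" and "Closed" are assumptions chosen for the example.

```python
from datetime import date


def campaign_window(rows, campaign_year):
    """Build (start, end) dates from rows of (day, month, year, date_type, campaign_year)."""
    found = {}
    for day, month, year, date_type, row_campaign_year in rows:
        if row_campaign_year == campaign_year:
            # combine the three separate columns into one real date value
            found[date_type] = date(year, month, day)
    return found.get("Start"), found.get("End")


def campaign_status(start, end, today):
    """Classify a campaign relative to today (labels are hypothetical)."""
    if today < start:
        return "Upcoming"
    if today <= end:
        return "Running"
    return "Closed"
```

Building a real date from the three parts avoids arithmetic on the combined number, where the year would have to be multiplied by 10000 (not 1000) to get a yyyymmdd value.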

Is there a way to add the date difference values we get to the date automatically?

I have two dates and I am using DATEDIFF to get the difference between them. For example, I have a planned start date and an actual start date, and the difference between them is 5 days. Now I want to add those days to the finish date.
If the finish date is later than I assumed, I want to add that difference to find the next finish date, because being behind pushes the upcoming dates back as well.
Sum (DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate)) OVER
(Partition
By ts.Id)as TotalVariance,
Case when (Sum (DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate))
OVER
(Partition By ts.Id) >30) then 'Positive' end as Violation,
DATEADD (day, DATEDIFF(day, sa.PlannedStartDate, sa.ActualStartDate))as
Summar violations,
If the activity 1 - planned Start date is 8/21/2019 but the actual start date is 9/21/2019, in this case we are behind 30 days.
Now the next activity will be delayed, so I want to add this difference to the next activity.
If the second activity planned Start date was 08/25/2019, but because of the delay of activity 1 the start date will change for second activity, in this case I want to find that new date.
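Activity   | PlannedStartDate | ActualStartDate | Variance | NewPlannedStartDate
-----------+------------------+-----------------+----------+--------------------
Activity 1 | 8/21/2019        | 9/21/2019       | 30       |
Activity 2 | 8/26/2019        | null            |          | 9/26/2019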
Here's an example you can run in SSMS:
-- CREATE ACTIVITY TABLE AND ADD SOME DATA --
DECLARE @Activity TABLE ( ActivityId INT, PlannedStart DATE, ActualStart DATE );
INSERT INTO @Activity (
ActivityId, PlannedStart, ActualStart
)
VALUES
( 1, '08/21/2019', '08/27/2019' ), ( 1, '08/26/2019', NULL ), ( 1, '09/14/2019', NULL );
Query @Activity to see what's in it:
SELECT * FROM @Activity ORDER BY ActivityId, PlannedStart;
@Activity content:
+------------+--------------+-------------+
| ActivityId | PlannedStart | ActualStart |
+------------+--------------+-------------+
| 1 | 2019-08-21 | 2019-08-27 |
| 1 | 2019-08-26 | NULL |
| 1 | 2019-09-14 | NULL |
+------------+--------------+-------------+
Query @Activity to factor the new starting dates:
-- Assumed for this example: the activity to report on
DECLARE @ActivityId INT = 1;
;WITH Activity_CTE AS (
SELECT
ROW_NUMBER() OVER ( ORDER BY PlannedStart ) AS Id,
ActivityId, PlannedStart, ActualStart, DATEDIFF( dd, PlannedStart, ActualStart ) Delayed
FROM @Activity
WHERE
ActivityId = @ActivityId
)
SELECT
ActivityId,
PlannedStart,
ActualStart,
DATEADD( dd, Delays.DaysDelayed, PlannedStart ) AS NewStart
FROM Activity_CTE AS Activity
OUTER APPLY (
SELECT CASE
WHEN ( Delayed IS NOT NULL ) THEN Delayed
ELSE ISNULL( ( SELECT TOP 1 Delayed FROM Activity_CTE WHERE Id < Activity.Id AND Delayed IS NOT NULL ORDER BY Id DESC ), 0 )
END AS DaysDelayed
) AS Delays
ORDER BY
PlannedStart;
Returns
+------------+--------------+-------------+------------+
| ActivityId | PlannedStart | ActualStart | NewStart |
+------------+--------------+-------------+------------+
| 1 | 2019-08-21 | 2019-08-27 | 2019-08-27 |
| 1 | 2019-08-26 | NULL | 2019-09-01 |
| 1 | 2019-09-14 | NULL | 2019-09-20 |
+------------+--------------+-------------+------------+
The real "magic" here is this line:
ELSE ISNULL( ( SELECT TOP 1 Delayed FROM Activity_CTE WHERE Id < Activity.Id AND Delayed IS NOT NULL ORDER BY Id DESC ), 0 )
It checks for any prior records that have a delay. If none are found, it returns 0. This value is then used to add days to the PlannedStart date to determine the NewStart date. The ORDER BY is of particular note too: sorting in DESC order ensures we get the "closest" delay prior to the current row.
Using a CTE in this way also takes into account the idea that the delay may not happen on the very first record (e.g., say the 08/26 planned was delayed instead of 08/21). It conveniently gives us a subtable to query against in our OUTER APPLY.
This is what you would see if you included all columns on the CTE's SELECT:
+----+------------+--------------+-------------+---------+-------------+
| Id | ActivityId | PlannedStart | ActualStart | Delayed | DaysDelayed |
+----+------------+--------------+-------------+---------+-------------+
| 1 | 1 | 2019-08-21 | 2019-08-27 | 6 | 6 |
| 2 | 1 | 2019-08-26 | NULL | NULL | 6 |
| 3 | 1 | 2019-09-14 | NULL | NULL | 6 |
+----+------------+--------------+-------------+---------+-------------+
Because the very first record is the only record with a delay, its delay of 6 days persists through each of the following records.
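The carry-forward rule can also be sketched in plain Python, which may make the logic easier to follow than the OUTER APPLY. This is an illustration under the same assumptions as the query (a row with its own delay uses it, any other row inherits the closest earlier delay, and 0 applies until a delay is seen); the function name shifted_starts is made up for the example.

```python
from datetime import date, timedelta


def shifted_starts(rows):
    """rows: (planned_start, actual_start or None) pairs, in any order.

    Returns (planned_start, new_start) pairs ordered by planned_start.
    """
    result = []
    carried = 0  # most recent known delay in days; 0 until one is seen
    for planned, actual in sorted(rows, key=lambda row: row[0]):
        if actual is not None:
            # this row has its own delay, which also becomes the carried value
            carried = (actual - planned).days
        result.append((planned, planned + timedelta(days=carried)))
    return result


SAMPLE = [
    (date(2019, 8, 21), date(2019, 8, 27)),
    (date(2019, 8, 26), None),
    (date(2019, 9, 14), None),
]
```

Calling shifted_starts(SAMPLE) gives the same NewStart values as the table above: 2019-08-27, 2019-09-01 and 2019-09-20.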

Most efficient way to count number of matching rows with multiple criteria at once

I have a very large table (called device_operation, with 50 million rows) which holds all the operations of a product in its lifecycle (such as "start", "stop", "refill", ...) and the status of these operations (column status: Completed, Failed), with the ID of the associated device (column device_id) and a timestamp for each operation (column create_date).
Something like this :
/------+-----------+------------------+---------\
| ID | Device ID | Create_Date | Status |
+------+-----------+------------------+---------+
| 1 | 1 | 2012-03-04 01:43 | Success |
| 2 | 4 | 2012-04-04 02:34 | Failed |
| 3 | 9 | 2013-01-01 01:23 | Failed |
| 4 | 4 | 2013-12-12 12:34 | Success |
| 5 | 23 | 2014-02-01 03:45 | Success |
| 6 | 1 | 2014-05-03 08:34 | Failed |
\------+-----------+------------------+---------/
I also have another table (called subscription) that tells me when the warranty started (column create_date) for the product (column device_id). The warranty lasts one year.
/-----------+------------------\
| Device ID | Create_Date |
+-----------+------------------+
| 2 | 2011-04-03 05:00 |
| 4 | 2012-03-05 03:45 |
| 5 | 2012-03-05 06:07 |
| ... | ... |
\-----------+------------------/
I am using PostgreSQL.
I want to do the following :
1. List all device IDs which had at least one successful operation before a given date (2014-07-06)
2. For each of those devices, count:
   a. The number of failed operations after that date + 2 days (2014-07-08), and the device was under warranty when the operation was attempted
   b. The number of failed operations after that date + 2 days (2014-07-08), and the device was outside warranty when the operation was attempted
   c. The number of successful operations after that date (device being under warranty or not)
I had some limited success with the following (the query has been simplified a little for readability: there are other joins involved to get to the subscription table, and other criteria to include the devices in the list):
SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(*)
FROM device_operation dop
WHERE dop.device_id = device_operation.device_id and
dop.create_date > '2014-07-08' and
dop.status = 'Success'
) as success,
(
SELECT COUNT(*)
FROM device_operation dop2
WHERE
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08' and
dop2.status = 'Failed' and
dop2.create_date <= subscription.create_date + interval '1 year'
) as failed_during_warranty,
(
SELECT COUNT(*)
FROM device_operation dop2
WHERE
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08' and
dop2.status = 'Failed' and
dop2.create_date > subscription.create_date + interval '1 year'
) as failed_after_warranty,
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_during_warranty DESC, failed_after_warranty DESC
As you can guess, it's so slow I cannot run the query. However it gives you an idea of the structure.
I have tried to use NULLIF to combine the requests into one, in the hope it's going to make PostgreSQL only list the subquery once instead of 3, but it returns "subquery must return only one column" :
SELECT distinct device_operation.device_id as did, subscription.create_date,
(
SELECT COUNT(NULLIF(dop2.status != 'Success', true)) as completed,
COUNT(NULLIF(dop2.status != 'Failed' or not (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_in_warranty,
COUNT(NULLIF(dop2.status != 'Failed' or (dop2.create_date <= subscription.create_date + interval '1 year'), true)) as failed_after_warranty
FROM device_operation dop2
WHERE
dop2.device_id = device_operation.device_id and
dop2.device_id = subscription.device_id and
dop2.create_date > '2014-07-08'
) as subq
FROM device_operation, subscription
WHERE
device_operation.status = 'Success' and -- list operations which are successful
device_operation.create_date <= '2014-07-06' and -- list operations before that date
device_operation.device_id = subscription.device_id -- get warranty start for each operation
ORDER BY success DESC, failed_in_warranty DESC, failed_outside_warranty DESC
I also tried to move the subquery to the FROM clause but that doesn't work as I need to run the subquery for each row of the main query (or do I ? maybe there's a better way)
What I expect is something like this :
/-----------+---------+------------------------+-----------------------\
| Device ID | Success | Failed during warranty | Failed after warranty |
+-----------+---------+------------------------+-----------------------+
| 194853 | 10 | 0 | 0 |
| 7853 | 5 | 5 | 0 |
| 5848 | 3 | 0 | 56 |
| 8546455 | 0 | 45 | 0 |
| 102 | 0 | 4 | 1 |
| 69329548 | 0 | 0 | 9 |
| 17 | 0 | 0 | 0 |
\-----------+---------+------------------------+-----------------------/
Can someone help me find the most efficient way to do it ?
EDIT: Corner cases: You can consider all devices have an entry in subscription.
Thank you very much !
I think you just require conditional aggregation. I find the data structure and logic a bit hard to follow, but I think the following is basically what you need:
SELECT d.device_id,
-- all failures after the cutoff + 2 days; failures under warranty = NumFails - NumFailsNoWarranty
SUM(CASE WHEN d.status = 'Failed' AND d.create_date > date '2014-07-06' + interval '2 day'
THEN 1 ELSE 0
END) as NumFails,
-- failures after the cutoff + 2 days that happened once the one-year warranty had ended
SUM(CASE WHEN d.status = 'Failed' AND d.create_date > date '2014-07-06' + interval '2 day' AND
d.create_date > s.create_date + interval '1 year'
THEN 1 ELSE 0
END) as NumFailsNoWarranty,
SUM(CASE WHEN d.status = 'Success' AND d.create_date > date '2014-07-06' + interval '2 day'
THEN 1 ELSE 0
END) as NumSuccesses
FROM device_operation d JOIN
subscription s
ON d.device_id = s.device_id
GROUP BY d.device_id
-- keep only devices with at least one success on or before the cutoff date
HAVING SUM(CASE WHEN d.status = 'Success' AND d.create_date <= '2014-07-06' THEN 1 ELSE 0 END) > 0;
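To see conditional aggregation at work on a small data set, here is a sketch that runs an equivalent query through Python's built-in sqlite3 module. The sample rows are made up, the column aliases follow the question's expected output, and SQLite's datetime(..., '+1 year') stands in for PostgreSQL's interval arithmetic, so treat it as an illustration rather than a drop-in PostgreSQL query.

```python
import sqlite3

# Made-up sample data: (device_id, create_date, status)
SAMPLE_OPERATIONS = [
    (1, "2014-01-10 08:00:00", "Success"),  # success before the cutoff
    (1, "2014-07-09 09:00:00", "Failed"),   # failed while under warranty
    (1, "2014-07-10 09:00:00", "Success"),  # success after the cutoff
    (1, "2015-03-01 09:00:00", "Failed"),   # failed after the warranty ended
    (2, "2014-07-09 10:00:00", "Failed"),   # device 2 never succeeded before the cutoff
]
# (device_id, warranty start)
SAMPLE_SUBSCRIPTIONS = [(1, "2014-01-01 00:00:00"), (2, "2014-01-01 00:00:00")]

QUERY = """
SELECT d.device_id,
       SUM(CASE WHEN d.status = 'Success' AND d.create_date > :after
                THEN 1 ELSE 0 END) AS success,
       SUM(CASE WHEN d.status = 'Failed' AND d.create_date > :after
                 AND d.create_date <= datetime(s.create_date, '+1 year')
                THEN 1 ELSE 0 END) AS failed_during_warranty,
       SUM(CASE WHEN d.status = 'Failed' AND d.create_date > :after
                 AND d.create_date > datetime(s.create_date, '+1 year')
                THEN 1 ELSE 0 END) AS failed_after_warranty
FROM device_operation AS d
JOIN subscription AS s ON s.device_id = d.device_id
GROUP BY d.device_id
-- keep only devices with at least one success on or before the cutoff
HAVING SUM(CASE WHEN d.status = 'Success' AND d.create_date <= :before
                THEN 1 ELSE 0 END) > 0
ORDER BY d.device_id
"""


def count_operations(operations, subscriptions, before="2014-07-06", after="2014-07-08"):
    """Return (device_id, success, failed_during_warranty, failed_after_warranty) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE device_operation (device_id INTEGER, create_date TEXT, status TEXT)")
    conn.execute("CREATE TABLE subscription (device_id INTEGER, create_date TEXT)")
    conn.executemany("INSERT INTO device_operation VALUES (?, ?, ?)", operations)
    conn.executemany("INSERT INTO subscription VALUES (?, ?)", subscriptions)
    rows = conn.execute(QUERY, {"before": before, "after": after}).fetchall()
    conn.close()
    return rows
```

Each table is read once and every counter is a filtered sum over the same joined rows, which is the reason this shape tends to be cheaper than one correlated subquery per counter.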

Access - How to delete records if there is data with a newer date

I need to delete some records from Access. My data looks like this:
RDNUMB | RD SEQ | COUNTDATE | COUNT
-------+--------+-----------+--------
101200 | 10 | 3/25/12 | 120
101200 | 20 | 2/27/13 | 1400
101200 | 20 | 6/15/11 | 905
101200 | 20 | 10/1/07 | 1020
I need to figure out a way to look at the RDNUMB and RD SEQ and delete the entire record if there are records with a newer date. In this case I need to delete the records dated 6/15/11 and 10/1/07.
RD SEQ is not unique to this RDNUMB; it is used over and over again.
Thanks for your thoughts and time
From a SQL Perspective this is what you want to do:
DELETE T
FROM YourTable AS T
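JOIN ( SELECT YT.RDNUMB, YT.RDSEQ, YT.COUNTDATE
FROM YourTable AS YT
-- keep only rows that are not the newest for their RDNUMB/RDSEQ pair
-- (a derived table cannot refer to the outer alias T, so the match on T happens in the ON clause)
WHERE YT.COUNTDATE <> (SELECT MAX(M.COUNTDATE) FROM YourTable AS M
WHERE M.RDNUMB = YT.RDNUMB AND M.RDSEQ = YT.RDSEQ)
) AS R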
ON R.RDNUMB = T.RDNUMB AND R.RDSEQ = T.RDSEQ AND R.COUNTDATE = T.COUNTDATE
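The same idea can be tried on the sample data with a correlated subquery, here run through Python's built-in sqlite3 module. The table name counts, the column names, and the ISO date strings are assumptions for this sketch, and Access has its own SQL dialect, so the statement may need adjusting there.

```python
import sqlite3

# Sample rows from the question: (rdnumb, rdseq, countdate, cnt), dates as ISO strings
ROWS = [
    (101200, 10, "2012-03-25", 120),
    (101200, 20, "2013-02-27", 1400),
    (101200, 20, "2011-06-15", 905),
    (101200, 20, "2007-10-01", 1020),
]


def delete_superseded(conn):
    """Delete every row that has a newer row for the same (rdnumb, rdseq) pair."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            DELETE FROM counts
            WHERE countdate < (SELECT MAX(c2.countdate)
                               FROM counts AS c2
                               WHERE c2.rdnumb = counts.rdnumb
                                 AND c2.rdseq = counts.rdseq)
            """
        )


def demo():
    """Load the sample rows, run the delete, and return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE counts (rdnumb INTEGER, rdseq INTEGER, countdate TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO counts VALUES (?, ?, ?, ?)", ROWS)
    delete_superseded(conn)
    remaining = conn.execute(
        "SELECT rdseq, countdate, cnt FROM counts ORDER BY rdseq"
    ).fetchall()
    conn.close()
    return remaining
```

After the delete, only the newest row of each RDNUMB/RD SEQ pair is left, i.e. the 3/25/12 and 2/27/13 records.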

Selecting Historical Record Outside of Date Range

I have a situation where I need to select an address that was current for a particular date and time from an address history table. Some sample records might be as follows:
Address/Client JOIN Table (Address_Client_JOIN):
-------------------------
|AddressId | ClientId |
-------------------------
|5 | 8888887 |
-------------------------
|6 | 8888887 |
-------------------------
History Table (Address_History):
-------------------------------------------------------------------------------------------
|HistoryId | AddressId | AddTypeId | StreetAddress | CreatedDate | ModifiedDate |
-------------------------------------------------------------------------------------------
|1 | 5 | 1 | 123 Home Street| 2013-03-11 21:08 | 2013-04-02 13:18|
-------------------------------------------------------------------------------------------
|2 | 5 | 2 | 456 My Avenue | 2013-03-11 21:08 | 2013-04-08 15:00|
-------------------------------------------------------------------------------------------
|3 | 6 | 1 | 789 Cat Road | 2013-05-17 12:00 | 2013-05-17 12:00|
-------------------------------------------------------------------------------------------
The requirements for this query are that I have to grab the earliest record where @dateOfService falls between the CreatedDate and the ModifiedDate and where the AddTypeId is "1", if there is one, otherwise any other AddTypeId. The query I've created so far is:
SELECT TOP 1 ah.HistoryId, ah.AddTypeId, ah.AddressId, ah.StreetAddress,
ah.CreatedDate, ah.ModifiedDate
FROM Address_Client_JOIN acj WITH (NOLOCK)
INNER JOIN Address_History ah WITH (NOLOCK) ON ah.AddressId = acj.AddressId
WHERE acj.ClientId = @clientId
AND (ah.CreatedDate <= @dateOfService
AND (@dateOfService <= ah.ModifiedDate ))
ORDER BY
ah.HistoryId ASC, CASE WHEN ah.AddTypeId = 1 THEN 0 ELSE 1 END
This works fine as long as @dateOfService falls between the CreatedDate and ModifiedDate. However, when I've got a @dateOfService that occurs after the ModifiedDate, I get nothing, obviously. I need to be able to account for a situation where (using the above data) @dateOfService is after the ModifiedDate of 5/17/2013. For example, where @dateOfService = '2013-08-01 12:30'.
Thanks in advance.
You are only selecting the top row. That means that you can move the where filter into the order by clause. Then, it becomes a priority rather than a filter.
So, if nothing matches the filter, you will still be able to get a row. I think the query you want is something like:
SELECT TOP 1 ah.HistoryId, ah.AddTypeId, ah.AddressId, ah.StreetAddress,
ah.CreatedDate, ah.ModifiedDate
FROM Address_Client_JOIN acj WITH (NOLOCK) INNER JOIN
Address_History ah WITH (NOLOCK)
ON ah.AddressId = acj.AddressId
WHERE acj.ClientId = @clientId
-- the date-range rank comes first so that it acts as the main priority;
-- HistoryId is unique, so it goes last as the tie-breaker
ORDER BY (case when ah.CreatedDate <= @dateOfService AND @dateOfService <= ah.ModifiedDate then 1
when @dateOfService > ah.ModifiedDate then 2
else 3
end),
(CASE WHEN ah.AddTypeId = 1 THEN 0 ELSE 1 END),
ah.HistoryId ASC
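The "priority rather than filter" idea can be sketched in plain Python with a tuple sort key. This assumes the date-range rank is the leading key, followed by the AddTypeId preference and then HistoryId as the tie-breaker; the function name pick_address and the dictionary keys are made up for the illustration.

```python
from datetime import datetime


def pick_address(history, date_of_service):
    """Return the best-ranked history row, or None when there are no rows."""

    def priority(row):
        if row["created"] <= date_of_service <= row["modified"]:
            date_rank = 1  # service date falls inside the row's validity range
        elif date_of_service > row["modified"]:
            date_rank = 2  # row was last modified before the service date
        else:
            date_rank = 3  # row was created after the service date
        type_rank = 0 if row["add_type_id"] == 1 else 1
        return (date_rank, type_rank, row["history_id"])

    return min(history, key=priority) if history else None


# Sample rows from the question's Address_History table
HISTORY = [
    {"history_id": 1, "add_type_id": 1,
     "created": datetime(2013, 3, 11, 21, 8), "modified": datetime(2013, 4, 2, 13, 18)},
    {"history_id": 2, "add_type_id": 2,
     "created": datetime(2013, 3, 11, 21, 8), "modified": datetime(2013, 4, 8, 15, 0)},
    {"history_id": 3, "add_type_id": 1,
     "created": datetime(2013, 5, 17, 12, 0), "modified": datetime(2013, 5, 17, 12, 0)},
]
```

Because nothing is filtered out, a service date of 2013-08-01 12:30 still returns a row (HistoryId 1 under this ordering), while a date inside a row's range, such as 2013-04-05, prefers that row (HistoryId 2).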