Finding runs of a particular value - sql

I have a table in Oracle 10 that is defined like this:
LOCATION HOUR STATUS
--------------------------------------
10 12/10/09 5:00PM 1
10 12/10/09 6:00PM 1
10 12/10/09 7:00PM 2
10 12/10/09 8:00PM 1
10 12/10/09 9:00PM 3
10 12/10/09 10:00PM 3
10 12/10/09 11:00PM 3
This table continues for various locations and for a small number of status values. Each row covers one hour for one location. Data is collected from a particular location over the course of that hour, and processed in chunks. Sometimes the data is available, sometimes it isn't, and that information is encoded in the status. I am trying to find runs of a particular status, so that I could convert the above table into something like:
LOCATION STATUS START END
-----------------------------------------------------------
10 1 12/10/09 5:00PM 12/10/09 7:00PM
10 2 12/10/09 7:00PM 12/10/09 8:00PM
10 1 12/10/09 8:00PM 12/10/09 9:00PM
10 3 12/10/09 9:00PM 12/11/09 12:00AM
Basically, I want to condense the table into rows that each define one stretch of a particular status. I have tried various tricks, like using lead/lag to figure out where runs start and end, but all of them have failed. The only approach that works so far is going through the values one by one programmatically, which is slow. Any ideas for doing it directly in Oracle? Thanks!

Here's an ANSI SQL solution:
select t1.location
, t1.status
, min(t1.hour) AS "start" -- first of stretch of same status
, coalesce(t2.hour, max(t1.hour) + INTERVAL '1' HOUR) AS "end"
from t_intervals t1 -- base table, this is what we are condensing
left join t_intervals t2 -- finding the first datetime after a stretch of t1
on t1.location = t2.location -- demand same location
and t1.hour < t2.hour -- demand t1 before t2
and t1.status <> t2.status -- demand different status
left join t_intervals t3 -- finding rows not like t1, with hour between t1 and t2
on t1.location = t3.location
and t1.status <> t3.status
and t1.hour < t3.hour
and t2.hour > t3.hour
where t3.status is null -- demand that t3 does not exist, in other words, t2 marks a status transition
group by t1.location -- condense on location, status.
, t1.status
, t2.hour -- this pins the status transition
order by t1.location
, t1.status
, min(t1.hour)
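To sanity-check the self-join logic, here is a minimal Python sketch that runs the same query on the sample rows with the standard-library `sqlite3` module. It is not Oracle-tested: SQLite has no `INTERVAL` type, so this assumes hours are stored as ISO-8601 text and uses SQLite's `datetime(..., '+1 hour')` in place of `+ INTERVAL '1' HOUR`. The output is ordered by start time rather than by status, which is easier to read.

```python
import sqlite3

# Sample rows from the question; storing hours as ISO-8601 text is an assumption for SQLite.
ROWS = [
    (10, "2009-12-10 17:00:00", 1),
    (10, "2009-12-10 18:00:00", 1),
    (10, "2009-12-10 19:00:00", 2),
    (10, "2009-12-10 20:00:00", 1),
    (10, "2009-12-10 21:00:00", 3),
    (10, "2009-12-10 22:00:00", 3),
    (10, "2009-12-10 23:00:00", 3),
]

# Same self-join as the answer; only the "+ 1 hour" expression is SQLite-specific.
QUERY = """
select t1.location, t1.status,
       min(t1.hour) as start_hour,
       coalesce(t2.hour, datetime(max(t1.hour), '+1 hour')) as end_hour
from t_intervals t1
left join t_intervals t2
  on t1.location = t2.location and t1.hour < t2.hour and t1.status <> t2.status
left join t_intervals t3
  on t1.location = t3.location and t1.status <> t3.status
 and t1.hour < t3.hour and t2.hour > t3.hour
where t3.status is null
group by t1.location, t1.status, t2.hour
order by t1.location, min(t1.hour)
"""


def condense(rows):
    """Load rows into an in-memory table and return (location, status, start, end) runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t_intervals (location integer, hour text, status integer)")
    conn.executemany("insert into t_intervals values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for run in condense(ROWS):
        print(run)
```

The last run has no later transition row (t2 is NULL), which is why the `coalesce` falls back to the last hour plus one.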

OK, I apologize for not knowing Oracle syntax, but I hope the Sybase version below is clear enough.
(I split it into 3 queries that create 2 temp tables for readability, but you can recombine them as subqueries. I don't know how to add/subtract 1 hour in Oracle; dateadd(hh, ...) does it in Sybase.)
-- Rows that start a period: no row with the same status one hour earlier
SELECT * INTO #START_OF_PERIODS
FROM T
WHERE NOT EXISTS (
SELECT 1 FROM T T_BEFORE
WHERE T.LOCATION = T_BEFORE.LOCATION
AND T.STATUS = T_BEFORE.STATUS
AND T.HOUR = dateadd(hh, 1, T_BEFORE.HOUR)
)
-- Rows that end a period: no row with the same status one hour later
SELECT * INTO #END_OF_PERIODS
FROM T
WHERE NOT EXISTS (
SELECT 1 FROM T T_AFTER
WHERE T.LOCATION = T_AFTER.LOCATION
AND T.STATUS = T_AFTER.STATUS
AND T.HOUR = dateadd(hh, -1, T_AFTER.HOUR)
)
-- Pair each start with the nearest end at or after it;
-- END is the last hour + 1 to match the desired output
SELECT T1.LOCATION, T1.STATUS, T1.HOUR AS 'START', dateadd(hh, 1, MIN(T2.HOUR)) AS 'END'
FROM #START_OF_PERIODS T1, #END_OF_PERIODS T2
WHERE T1.LOCATION = T2.LOCATION
AND T1.STATUS = T2.STATUS
AND T1.HOUR <= T2.HOUR
GROUP BY T1.LOCATION, T1.STATUS, T1.HOUR
-- T2.LOCATION and T2.STATUS equal T1's by the WHERE clause, so they need not be grouped
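The same start/end pairing can be checked on the sample rows with a small Python sketch. It only illustrates the logic of the three queries, assuming one row per location and hour with hours as `datetime` values; `periods` is a hypothetical helper name, not part of the answer.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)

# Sample rows from the question: (location, hour, status).
SAMPLE = [(10, datetime(2009, 12, 10, h), s)
          for h, s in [(17, 1), (18, 1), (19, 2), (20, 1), (21, 3), (22, 3), (23, 3)]]


def periods(rows):
    """Condense (location, hour, status) rows into (location, status, start, end) periods."""
    present = set(rows)
    # Start of a period: no row with the same location and status one hour earlier.
    starts = sorted(r for r in present if (r[0], r[1] - HOUR, r[2]) not in present)
    # End of a period: no row with the same location and status one hour later.
    ends = [r for r in present if (r[0], r[1] + HOUR, r[2]) not in present]
    result = []
    for loc, start, status in starts:
        # Nearest end of the same location and status at or after this start.
        last = min(h for l, h, s in ends if l == loc and s == status and h >= start)
        # END is exclusive (last hour + 1), matching the question's desired output.
        result.append((loc, status, start, last + HOUR))
    return result


if __name__ == "__main__":
    for period in periods(SAMPLE):
        print(period)
```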

Ever thought about a stored procedure? I think that would be the most readable solution.
Basic Idea:
run a select statement that returns the rows in the right order for one location
iterate over the result line by line and write a new 'run'-record every time the status changes and when reaching the end of the result set.
You need to test if it is also the fastest way. Depending on the number of records, this might not be an issue at all.
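Outside the database, the loop described above might look like this Python sketch. It illustrates the algorithm only (an actual stored procedure would be written in PL/SQL), and it assumes one row per location and hour; `runs` is a hypothetical name.

```python
from datetime import datetime, timedelta
from itertools import groupby

HOUR = timedelta(hours=1)

# Sample rows from the question: (location, hour, status).
SAMPLE = [(10, datetime(2009, 12, 10, h), s)
          for h, s in [(17, 1), (18, 1), (19, 2), (20, 1), (21, 3), (22, 3), (23, 3)]]


def runs(rows):
    """One pass over rows ordered by location and hour; emit a run whenever the status changes."""
    result = []
    for loc, group in groupby(sorted(rows), key=lambda r: r[0]):
        status = start = last = None
        for _, hour, row_status in group:
            if status is not None and row_status != status:
                # Status changed: close the current run (END = last hour + 1, exclusive).
                result.append((loc, status, start, last + HOUR))
                status = None
            if status is None:
                status, start = row_status, hour  # open a new run
            last = hour
        # End of this location's rows: close the final run.
        result.append((loc, status, start, last + HOUR))
    return result


if __name__ == "__main__":
    for run in runs(SAMPLE):
        print(run)
```

Because the rows are read once in order, the cost is dominated by the sort (or by the ORDER BY, when done in the database).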

Related

Update column with autonumber

I need to figure out when each person will complete a task based on a work calendar that doesn't include sequential dates. I have the data in two tables, T1
Name DaysRemaining Complete
Joe 3
Mary 2
and T2
Date Count
6/1/2018
6/8/2018
6/10/2018
6/15/2018
Now if Joe has 3 days remaining I would like to count 3 records forward from today in T2 and return the date to the Complete column. If today is 6/1/2018 I would want the Update query to return 6/10/2018 to the Complete column for Joe.
My thought is that I could update T2.Count daily with a query that starts at today's date and then auto-increments. After that I could join T1 and T2 on DaysRemaining and Count. I can do the join, but I haven't found a working way to update T2.Count with an auto-increment. Any better ideas? I am using a linked SharePoint table, so creating a new field each time is not an option.
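I think this will work. MS Access doesn't support CROSS JOIN, so the tables are listed with a comma instead. Counting from date() with >= makes today the first record, so Joe gets 6/10/2018 as in your example:
select t1.*, t2.[date]
from t1, t2
where t1.daysremaining = (select count(*)
from t2 as tt2
where tt2.[date] <= t2.[date] and tt2.[date] >= date()
);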
This is an expensive query (the correlated subquery recounts T2 for every candidate row), and one that is easier to express and more efficient in almost any other database.
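The counting rule from the question (today counts as the first record) can be stated directly: keep the calendar dates on or after today, sort them, and take the N-th one. This Python sketch only illustrates that rule, not Access SQL; `completion_date` is a hypothetical name.

```python
from datetime import date


def completion_date(work_days, days_remaining, today):
    """Return the days_remaining-th work day counting forward from today (today counts as 1).

    Returns None when the calendar has too few entries left.
    """
    upcoming = sorted(d for d in work_days if d >= today)
    if not 1 <= days_remaining <= len(upcoming):
        return None
    return upcoming[days_remaining - 1]


# Calendar (T2) and people (T1) from the question.
CALENDAR = [date(2018, 6, 1), date(2018, 6, 8), date(2018, 6, 10), date(2018, 6, 15)]
PEOPLE = {"Joe": 3, "Mary": 2}

if __name__ == "__main__":
    for name, remaining in PEOPLE.items():
        print(name, completion_date(CALENDAR, remaining, today=date(2018, 6, 1)))
```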

SQL select from multiple tables based on datetime

I am working on a script to analyze some data contained in thousands of tables on a SQL Server 2008 database.
For simplicity's sake, the tables can be broken down into groups of 4-8 semi-related tables. By semi-related I mean that they are data collections for the same item but they do not have any actual SQL relationship. Each table consists of a date-time stamp (datetime2 data type), value (can be a bit, int, or float depending on the particular item), and some other columns that are currently not of interest. The date-time stamp is set for every 15 minutes (on the quarter hour) within a few seconds; however, not all of the data is recorded precisely at the same time...
For example:
TABLE1:
TIMESTAMP VALUE
2014-11-27 07:15:00.390 1
2014-11-27 07:30:00.390 0
2014-11-27 07:45:00.373 0
2014-11-27 08:00:00.327 0
TABLE2:
TIMESTAMP VALUE
2014-11-19 08:00:07.880 0
2014-11-19 08:15:06.867 0.0979999974370003
2014-11-19 08:30:08.593 0.0979999974370003
2014-11-19 08:45:07.397 0.0979999974370003
TABLE3
TIMESTAMP VALUE
2014-11-27 07:15:00.390 0
2014-11-27 07:30:00.390 0
2014-11-27 07:45:00.373 1
2014-11-27 08:00:00.327 1
As you can see, not all of the tables will start with the same quarterly TIMESTAMP. Basically, what I am after is a query that will return the VALUE for each of the 3 tables for every 15 minute interval starting with the earliest TIMESTAMP out of the 3 tables. For the example given, I'd want to start at 2014-11-27 07:15 (don't care about seconds... thus, would need to allow for the timestamp to be +- 1 minute or so). Returning NULL for the value when there is no record for the particular TIMESTAMP is ok. So, the query for my listed example would return something like:
TIMESTAMP VALUE1 VALUE2 VALUE3
2014-11-27 07:15 1 NULL 0
2014-11-27 07:30 0 NULL 0
2014-11-27 07:45 0 NULL 1
2014-11-27 08:00 0 NULL 1
...
2014-11-19 08:00 0 0 1
2014-11-19 08:15 0 0.0979999974370003 0
2014-11-19 08:30 0 0.0979999974370003 0
2014-11-19 08:45 0 0.0979999974370003 0
I hope this makes sense. Any help/pointers/guidance will be appreciated.
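Use a FULL OUTER JOIN on the timestamps converted to SMALLDATETIME (which has one-minute accuracy). Join TABLE3 to whichever of a or b matched, and group on the converted value so readings in the same minute collapse into one row:
SELECT COALESCE(CONVERT(SMALLDATETIME, a.[TIMESTAMP]), CONVERT(SMALLDATETIME, b.[TIMESTAMP]), CONVERT(SMALLDATETIME, c.[TIMESTAMP])) [TIMESTAMP],
Isnull(Max(a.VALUE), 0) VALUE1,
Max(b.VALUE) VALUE2,
Isnull(Max(c.VALUE), 0) VALUE3
FROM TABLE1 a
FULL OUTER JOIN TABLE2 b
ON CONVERT(SMALLDATETIME, a.[TIMESTAMP]) = CONVERT(SMALLDATETIME, b.[TIMESTAMP])
FULL OUTER JOIN TABLE3 c
ON CONVERT(SMALLDATETIME, c.[TIMESTAMP]) = COALESCE(CONVERT(SMALLDATETIME, a.[TIMESTAMP]), CONVERT(SMALLDATETIME, b.[TIMESTAMP]))
GROUP BY COALESCE(CONVERT(SMALLDATETIME, a.[TIMESTAMP]), CONVERT(SMALLDATETIME, b.[TIMESTAMP]), CONVERT(SMALLDATETIME, c.[TIMESTAMP]))
ORDER BY [TIMESTAMP] DESC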
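The idea behind this answer (put every reading on a minute-level key, then full-outer-join the tables on that key) can be sketched in Python. This is a minimal illustration, not SQL Server code: `to_minute` rounds to the nearest minute, which only approximates what `CONVERT(SMALLDATETIME, ...)` does, and a table with no reading for a minute contributes None.

```python
from datetime import datetime, timedelta

# Readings from the question as (timestamp, value) pairs.
TABLE1 = [(datetime(2014, 11, 27, 7, 15, 0, 390000), 1),
          (datetime(2014, 11, 27, 7, 30, 0, 390000), 0),
          (datetime(2014, 11, 27, 7, 45, 0, 373000), 0),
          (datetime(2014, 11, 27, 8, 0, 0, 327000), 0)]
TABLE2 = [(datetime(2014, 11, 19, 8, 0, 7, 880000), 0),
          (datetime(2014, 11, 19, 8, 15, 6, 867000), 0.0979999974370003),
          (datetime(2014, 11, 19, 8, 30, 8, 593000), 0.0979999974370003),
          (datetime(2014, 11, 19, 8, 45, 7, 397000), 0.0979999974370003)]
TABLE3 = [(datetime(2014, 11, 27, 7, 15, 0, 390000), 0),
          (datetime(2014, 11, 27, 7, 30, 0, 390000), 0),
          (datetime(2014, 11, 27, 7, 45, 0, 373000), 1),
          (datetime(2014, 11, 27, 8, 0, 0, 327000), 1)]


def to_minute(ts):
    """Round to the nearest minute (close to, but not exactly, SMALLDATETIME's rounding)."""
    return (ts + timedelta(seconds=30)).replace(second=0, microsecond=0)


def full_outer_merge(*tables):
    """Return (minute, value1, value2, ...) for every minute present in any table."""
    merged = {}
    for i, table in enumerate(tables):
        for ts, value in table:
            row = merged.setdefault(to_minute(ts), [None] * len(tables))
            row[i] = value  # other tables keep None unless they also have this minute
    return [(minute, *values) for minute, values in sorted(merged.items())]


if __name__ == "__main__":
    for row in full_outer_merge(TABLE1, TABLE2, TABLE3):
        print(row)
```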
The first thing I would do is normalize the timestamps to the minute. You can do this with an update to the existing column
UPDATE TABLENAME
SET TIMESTAMP = dateadd(minute,datediff(minute,0,TIMESTAMP),0)
or in a new column
ALTER TABLE TABLENAME ADD NORMTIME DATETIME;
UPDATE TABLENAME
SET NORMTIME = dateadd(minute,datediff(minute,0,TIMESTAMP),0)
For details on flooring dates, see this post: Floor a date in SQL Server
The next step is to make a table that has all of the (normalized) timestamps you expect to see, that is, one row for every 15 minutes. Let's call this table TIME_PERIOD and the column EVENT_TIME for my examples (call it whatever you want).
There are many ways to make such a table: a recursive CTE, ROW_NUMBER(), or even brute force. I leave that part up to you.
Now the problem is a simple select with left joins and a filter for valid values, like this:
SELECT TP.EVENT_TIME, a.VALUE as VALUE1, b.VALUE as VALUE2, c.VALUE as VALUE3
FROM TIME_PERIOD TP
LEFT JOIN TABLE1 a ON a.[TIMESTAMP] = TP.EVENT_TIME
LEFT JOIN TABLE2 b ON b.[TIMESTAMP] = TP.EVENT_TIME
LEFT JOIN TABLE3 c ON c.[TIMESTAMP] = TP.EVENT_TIME
WHERE COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]) is not null
ORDER BY TP.EVENT_TIME DESC
The WHERE clause may need to be a little more complex if the columns are of different types, so you can always use this instead (not as concise as COALESCE, but it always works):
WHERE a.[TIMESTAMP] IS NOT NULL OR
b.[TIMESTAMP] IS NOT NULL OR
c.[TIMESTAMP] IS NOT NULL
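The steps of this answer (floor each timestamp to the minute, build a quarter-hour TIME_PERIOD list, left join every table to it, keep slots where at least one table has data) can be sketched in Python as an illustration. The function names are hypothetical, and only TABLE1 and TABLE3 from the question are used to keep the example short.

```python
from datetime import datetime, timedelta

SLOT = timedelta(minutes=15)

# Readings from the question as (timestamp, value) pairs.
TABLE1 = [(datetime(2014, 11, 27, 7, 15, 0, 390000), 1),
          (datetime(2014, 11, 27, 7, 30, 0, 390000), 0),
          (datetime(2014, 11, 27, 7, 45, 0, 373000), 0),
          (datetime(2014, 11, 27, 8, 0, 0, 327000), 0)]
TABLE3 = [(datetime(2014, 11, 27, 7, 15, 0, 390000), 0),
          (datetime(2014, 11, 27, 7, 30, 0, 390000), 0),
          (datetime(2014, 11, 27, 7, 45, 0, 373000), 1),
          (datetime(2014, 11, 27, 8, 0, 0, 327000), 1)]


def floor_to_minute(ts):
    """Same effect as dateadd(minute, datediff(minute, 0, ts), 0) in T-SQL."""
    return ts.replace(second=0, microsecond=0)


def time_periods(start, end):
    """The TIME_PERIOD rows: every quarter hour from start to end, inclusive."""
    slots = []
    t = start
    while t <= end:
        slots.append(t)
        t += SLOT
    return slots


def left_join_on_slots(slots, *tables):
    """LEFT JOIN each table to the slots; keep slots where at least one table has a reading."""
    lookups = [{floor_to_minute(ts): value for ts, value in table} for table in tables]
    rows = []
    for slot in slots:
        if any(slot in lookup for lookup in lookups):  # the IS NOT NULL filter
            rows.append((slot, *(lookup.get(slot) for lookup in lookups)))
    return rows


if __name__ == "__main__":
    slots = time_periods(datetime(2014, 11, 27, 7, 0), datetime(2014, 11, 27, 8, 0))
    for row in left_join_on_slots(slots, TABLE1, TABLE3):
        print(row)
```

The 07:00 slot has no readings in either table, so the filter drops it. Membership is tested with `in` rather than truthiness because a legitimate value can be 0.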
Here is an updated version of NoDisplayName's answer that does what you want. It works on SQL Server 2012 and later; on older versions you could replace the DATETIMEFROMPARTS function with a combination of other functions to get the same result.
;WITH
NewT1 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table1),
NewT2 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table2),
NewT3 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table3)
SELECT COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]) [TIMESTAMPs],
Isnull(Max(a.VALUE), 0) VALUE1,
Isnull(Max(b.VALUE), 0) VALUE2,
Isnull(Max(c.VALUE), 0) VALUE3
FROM NewT1 a
FULL OUTER JOIN NewT2 b
ON a.[TIMESTAMP] = b.[TIMESTAMP]
FULL OUTER JOIN NewT3 c
ON c.[TIMESTAMP] = COALESCE(a.[TIMESTAMP], b.[TIMESTAMP])
GROUP BY COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP])
ORDER BY [TIMESTAMPs]

Delete rows with continuous dates within date period in SQL Server [duplicate]

Closed 10 years ago.
Possible Duplicate:
Trying to consolidate employer records who are continuously work for same department
I am trying to consolidate the records of employees who have been continuously enrolled (with gaps of less than 45 days) in a specific department.
Note: if the date difference between EMP_EFF_TO_DATE and the next row's EMP_EFF_FROM_DATE is less than 45 days, the enrollment is considered continuous.
INPUT:
EMP_ID DEPT_ID EMP_EFF_FROM_DATE EMP_EFF_TO_DATE
-----------------------------------------------------------------------
10 10001 8/1/2008 10/31/2009
10 10001 11/1/2009 2/25/2010
10 10001 2/26/2010 5/1/2011
10 10001 8/1/2011 10/30/2011
10 10001 12/1/2011 10/31/2012
10 10003 7/1/2007 10/31/2007
10 10004 9/27/2004 6/8/2006
10 10004 6/30/2006 6/29/2007
10 10007 6/25/2006 6/20/2007
10 10007 8/25/2007 5/25/2008
Output desired:
EMP_ID DEPT_ID EMP_EFF_FROM_DATE EMP_EFF_TO_DATE
-------------------------------------------------------------------------
10 10001 2008-08-01 2011-05-01
10 10001 2011-08-01 2012-10-31
10 10003 2007-07-01 2007-10-31
10 10004 2004-09-27 2007-06-29
10 10007 2006-06-25 2007-06-20
10 10007 2007-08-25 2007-06-29
I had to do something very similar recently, and my first thought was a recursive common table expression (CTE). It works, but it may not be the best solution, depending on the amount of data in your table.
It is not clear whether you want to actually delete the rows from the database, or just view the results as required based on the records as they currently are.
SOLUTION 1 (SQL Fiddle)
This uses a recursive CTE just to select the results. For each row it repeatedly finds the next row whose from date is within 45 days of the current row's to date, and keeps looping until there are no more matches. It then keeps only the deepest result for each from date (the MaxRecursion field) and excludes every row that falls within the date range of another row.
WITH CTE AS
( SELECT *, [Recursion] = 0
FROM T
UNION ALL
SELECT T.EMP_ID,
T.DEPT_ID,
T.EMP_EFF_FROM_DATE,
T2.EMP_EFF_TO_DATE,
T.[Recursion] + 1
FROM CTE T
INNER JOIN T T2
ON T.EMP_ID = T2.EMP_ID
AND T.DEPT_ID = T2.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
), CTE2 AS
( SELECT *,
[MaxRecursion] = MAX(Recursion) OVER(PARTITION BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE)
FROM CTE
)
SELECT T.EMP_ID,
T.DEPT_ID,
T.EMP_EFF_FROM_DATE,
T.EMP_EFF_TO_DATE
FROM CTE2 T
WHERE Recursion = MaxRecursion
AND NOT EXISTS
( SELECT 1
FROM CTE2 T2
WHERE T.EMP_ID = T2.EMP_ID
AND T.DEPT_ID = T2.DEPT_ID
AND T2.EMP_EFF_FROM_DATE < T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE >= T.EMP_EFF_TO_DATE
)
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE;
SOLUTION 2 (SQL Fiddle)
This actually updates existing rows and deletes redundant ones, so you can just select from the table to get the desired results. If, of course, you don't want to delete from the database, you could insert the data into a temp table and apply the same principle (example here). In my case this solution ran a lot faster than the recursive CTE, because at each pass of the loop the query deals with less data rather than more.
WHILE EXISTS
( SELECT 1
FROM T
INNER JOIN T T2
ON T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
)
BEGIN
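UPDATE T
SET EMP_EFF_TO_DATE = T2.EMP_EFF_TO_DATE
FROM T
INNER JOIN
( SELECT *
FROM T
) T2
ON T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE > T.EMP_EFF_TO_DATE
AND T2.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T.EMP_EFF_TO_DATE)
-- Only extend rows that start a chain; otherwise a row absorbed in this pass
-- would also absorb its successor, and the successor's end date would be lost
WHERE NOT EXISTS
( SELECT 1
FROM T T0
WHERE T0.EMP_ID = T.EMP_ID
AND T0.DEPT_ID = T.DEPT_ID
AND T.EMP_EFF_FROM_DATE > T0.EMP_EFF_TO_DATE
AND T.EMP_EFF_FROM_DATE <= DATEADD(DAY, 45, T0.EMP_EFF_TO_DATE)
)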
DELETE T
FROM T
WHERE EXISTS
( SELECT 1
FROM T T2
WHERE T2.EMP_ID = T.EMP_ID
AND T2.DEPT_ID = T.DEPT_ID
AND T2.EMP_EFF_FROM_DATE < T.EMP_EFF_FROM_DATE
AND T2.EMP_EFF_TO_DATE BETWEEN T.EMP_EFF_FROM_DATE AND T.EMP_EFF_TO_DATE
)
END;
SELECT *
FROM T
ORDER BY EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE;
All of these solutions differ from your sample data in the last row, which appears to be an error:
I think this row:
10 10007 2007-08-25 2007-06-29
should be:
10 10007 2007-08-25 2008-05-25
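As a cross-check of the expected output, including the corrected last row, here is a sort-and-sweep sketch in Python. It uses a different technique from the recursive CTE and the WHILE loop. For each employee and department it sorts the records by start date and extends the current range while the gap is under 45 days. The strict "less than 45 days" threshold follows the question. The SQL above uses `<= 45` via DATEADD, and both give the same result on this data.

```python
from datetime import date
from itertools import groupby

# (EMP_ID, DEPT_ID, EMP_EFF_FROM_DATE, EMP_EFF_TO_DATE) rows from the question.
RECORDS = [
    (10, 10001, date(2008, 8, 1), date(2009, 10, 31)),
    (10, 10001, date(2009, 11, 1), date(2010, 2, 25)),
    (10, 10001, date(2010, 2, 26), date(2011, 5, 1)),
    (10, 10001, date(2011, 8, 1), date(2011, 10, 30)),
    (10, 10001, date(2011, 12, 1), date(2012, 10, 31)),
    (10, 10003, date(2007, 7, 1), date(2007, 10, 31)),
    (10, 10004, date(2004, 9, 27), date(2006, 6, 8)),
    (10, 10004, date(2006, 6, 30), date(2007, 6, 29)),
    (10, 10007, date(2006, 6, 25), date(2007, 6, 20)),
    (10, 10007, date(2007, 8, 25), date(2008, 5, 25)),
]


def consolidate(records, max_gap_days=45):
    """Merge each employee/department's records whose gap is under max_gap_days."""
    merged = []
    for (emp, dept), group in groupby(sorted(records), key=lambda r: (r[0], r[1])):
        start = end = None
        for _, _, row_from, row_to in group:  # sorted by from date within the group
            if end is not None and (row_from - end).days < max_gap_days:
                end = max(end, row_to)  # continuous: extend the current range
            else:
                if end is not None:
                    merged.append((emp, dept, start, end))
                start, end = row_from, row_to  # gap too large: start a new range
        merged.append((emp, dept, start, end))
    return merged


if __name__ == "__main__":
    for row in consolidate(RECORDS):
        print(row)
```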
Assuming the "next row" is the next one by EMP_EFF_FROM_DATE for the same employee and department, here is a way to find every record that continues into the next one (the rows that need to be merged):
WITH DATA
AS (SELECT *,
Row_number()
OVER (
PARTITION BY EMP_ID, DEPT_ID
ORDER BY EMP_EFF_FROM_DATE) rn
FROM TEST)
SELECT t1.*
FROM DATA t1
INNER JOIN DATA t2
ON t1.EMP_ID = t2.EMP_ID
AND t1.DEPT_ID = t2.DEPT_ID
AND t1.RN = t2.RN - 1
WHERE Datediff(DAY, t1.EMP_EFF_TO_DATE, t2.EMP_EFF_FROM_DATE) <= 45
The full solution is here
Let me know if it's not exactly what you wanted.

Is there a way to sum the rows above the current row

I am trying to replicate a spreadsheet in a database, and I am stumped on how to sum only the rows above the current row, as I can in Excel. For example, the formula for one of the columns is as follows:
=SUM($B$9:$B9)/SUM($E$9:$E9)
=SUM($B$9:$B10)/SUM($E$9:$E10)
=SUM($B$9:$B11)/SUM($E$9:$E11)
.... and so on.
I need to reproduce this formula in my SELECT query, but I'm not sure how.
SELECT Column1,Column2, SUM(Column1) + SUM(Column2) as Expr1
From tblTest
Group By Column1,Column2
Any ideas?
Col1 Col2 Col3 Total Col1/Total Trying to get this (running Col1 / running Total)
7 0 3 10 70.00% 70.00%
1 0 0 1 100.00% 72.73%
2 0 4 6 33.33% 58.82%
3 1 1 5 60.00% 59.09%
The trick is you want to use a subquery with a way to identify all the previous rows.
SELECT ID,
(
SELECT SUM(ValueColumn)
FROM Test T2
WHERE T2.ID <= T1.ID
) RunningSum
FROM
Test T1
ORDER BY
T1.ID
This will work. If you have a large data set, you may want to just select the data and do the calculation in your application: keeping a running sum while looping through the rows is more efficient than this query, which re-sums all previous rows for every row.
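Applied to the question's target column, the same correlated-subquery idea needs two running sums, one for Col1 and one for the total. Here is a minimal sketch run through Python's `sqlite3` to check the numbers. It assumes an ID column defines row order (as in the answer), that Total = Col1 + Col2 + Col3 (true for every sample row), and that the missing Col3 in row 2 is 0. The `* 1.0` avoids integer division, which SQL Server would also apply to two integers.

```python
import sqlite3

# Sample rows as (ID, Col1, Col2, Col3); the ID ordering column is an assumption.
ROWS = [(1, 7, 0, 3), (2, 1, 0, 0), (3, 2, 0, 4), (4, 3, 1, 1)]

# Running SUM(Col1) divided by running SUM(Total), one correlated subquery per running sum.
QUERY = """
SELECT T1.ID,
       (SELECT SUM(T2.Col1) FROM tblTest T2 WHERE T2.ID <= T1.ID) * 1.0
       / (SELECT SUM(T2.Col1 + T2.Col2 + T2.Col3) FROM tblTest T2 WHERE T2.ID <= T1.ID)
       AS RunningRatio
FROM tblTest T1
ORDER BY T1.ID
"""


def running_ratios(rows):
    """Return (ID, running Col1 / running Total) for each row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblTest (ID INTEGER PRIMARY KEY, Col1 INTEGER, "
                 "Col2 INTEGER, Col3 INTEGER)")
    conn.executemany("INSERT INTO tblTest VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row_id, ratio in running_ratios(ROWS):
        print(row_id, f"{ratio:.2%}")
```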

SQL Server 2005: How to write a T-SQL query for my business logic?

I have two tables:
Table_01
Table01ID (int)
TestID (uniqueidentifier)
TestDate (Datetime)
TestNo (varchar)
Table_02
Table02ID (int)
TestID (Uniqueidentifier - Fk from Table01)
TransDate (DateTIme)
status = (int - 0 or 1)
Rules:
Table_02 can have more than one record per TestID (FK).
Records for a TestID have status = 1 while the transaction is in progress;
once a record has status = 0, the transaction is done.
For example (Table_02):
Table02ID TestID TransDate Status
----------------------------------------------------------------------------------
1 {21EC2020-3AEA-1069-A2DD-08002B30309D} 01-10-2010 11:30:00.000 1
2 {21EC2020-3AEA-1069-A2DD-08002B30309D} 01-10-2010 11:35:00.000 1
3 {21EC2020-3AEA-1069-A2DD-08002B30309D} 01-10-2010 11:40:00.000 1
4 {21EC2020-3AEA-1069-A2DD-08002B30309D} 01-10-2010 11:59:00.000 1
5 {21EC2020-3AEA-1069-A2DD-08002B30309D} 01-10-2010 12:20:00.000 0
Now I need something like:
SELECT t1.*
FROM Table_01 t1
JOIN Table_02 t2 on t1.TestID = t2.TestID
WHERE
-- if any t2 record for the TestID has Status = 0, ignore that TestID
-- if all t2 records have Status = 1, return only the top record ordered by t2.TransDate DESC
I hope you get what I mean. Please help!
@Novice: Here's how I'd do it.
Select * from Table_01
Join
(Select MAX(TransDate) as LatestDate, TestID, status from Table_02
where TestID not in
(Select TestID from Table_02 where status = 0)
group by TestID, status) as latestTrans
on latestTrans.TestID = Table_01.TestID
order by latestTrans.LatestDate DESC
The derived table in the middle (the parenthesized subquery) gets the latest record for each TestID, filtering out those you don't want, and exposes it as "latestTrans". Joining this to the original Table_01 gives you all relevant Table_01 columns for each entry in latestTrans.
Raymund's answer does the same thing with slightly different syntax. I'm not sure which solution would perform better. My script is easier to maintain, because the GROUP BY clause won't grow if you need more columns from Table_01, and it is easier to add more joins if needed.
UPDATE
I changed the script to take out any TestIDs where the status is 0. The script's starting to look rather complex for a seemingly simple idea, but it's complicated in that you need to group the data twice, first by status and then by TestID (to get the MaxDate).
Note: I took out the check for status = 1 assuming all possible values for 'status' are 0 and 1. Put it back in if that's not the case.
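To see the derived-table query behave as described, here is a small check through Python's `sqlite3`, selecting two columns instead of `*`. The question's Table_02 rows are used, with dates assumed to be MM-DD-YYYY and stored as ISO text. A second TestID (`hypothetical-open-test`) and the TestNo values `T-001`/`T-002` are hypothetical, added so that one transaction is still open. Only the open TestID should come back, with its latest TransDate.

```python
import sqlite3

DONE = "21EC2020-3AEA-1069-A2DD-08002B30309D"  # from the question; has a status 0 row
OPEN = "hypothetical-open-test"                # hypothetical TestID with only status 1 rows
TABLE_01 = [(1, DONE, "2010-01-10", "T-001"), (2, OPEN, "2010-01-11", "T-002")]
TABLE_02 = [
    (1, DONE, "2010-01-10 11:30:00", 1),
    (2, DONE, "2010-01-10 11:35:00", 1),
    (3, DONE, "2010-01-10 11:40:00", 1),
    (4, DONE, "2010-01-10 11:59:00", 1),
    (5, DONE, "2010-01-10 12:20:00", 0),
    (6, OPEN, "2010-01-11 09:00:00", 1),
    (7, OPEN, "2010-01-11 09:15:00", 1),
]

# The answer's query: drop TestIDs that have a status 0 row, then take the latest TransDate.
QUERY = """
SELECT Table_01.TestNo, latestTrans.LatestDate
FROM Table_01
JOIN (SELECT MAX(TransDate) AS LatestDate, TestID, status FROM Table_02
      WHERE TestID NOT IN (SELECT TestID FROM Table_02 WHERE status = 0)
      GROUP BY TestID, status) AS latestTrans
  ON latestTrans.TestID = Table_01.TestID
ORDER BY latestTrans.LatestDate DESC
"""


def open_tests():
    """Build both tables in memory and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table_01 (Table01ID INTEGER, TestID TEXT, TestDate TEXT, TestNo TEXT)")
    conn.execute("CREATE TABLE Table_02 (Table02ID INTEGER, TestID TEXT, TransDate TEXT, status INTEGER)")
    conn.executemany("INSERT INTO Table_01 VALUES (?, ?, ?, ?)", TABLE_01)
    conn.executemany("INSERT INTO Table_02 VALUES (?, ?, ?, ?)", TABLE_02)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(open_tests())
```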
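Try this (the NOT EXISTS drops TestIDs whose transaction is already done, i.e. that have a status 0 record):
SELECT t1.Table01ID, t1.TestNo, t1.TestDate, MAX(t2.TransDate) AS LatestTransDate, t2.TestID, t2.status
FROM Table_02 t2 INNER JOIN
Table_01 t1 ON t2.TestID = t1.TestID
WHERE t2.status = 1
AND NOT EXISTS (SELECT 1 FROM Table_02 t3 WHERE t3.TestID = t2.TestID AND t3.status = 0)
GROUP BY t2.status, t2.TestID, t1.Table01ID, t1.TestDate, t1.TestNo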