SQL select from multiple tables based on datetime - sql

I am working on a script to analyze some data contained in thousands of tables on a SQL Server 2008 database.
For simplicity's sake, the tables can be broken down into groups of 4-8 semi-related tables. By semi-related I mean that they are data collections for the same item but they do not have any actual SQL relationship. Each table consists of a date-time stamp (datetime2 data type), a value (can be a bit, int, or float depending on the particular item), and some other columns that are currently not of interest. The date-time stamp is set for every 15 minutes (on the quarter hour) within a few seconds; however, not all of the data is recorded at precisely the same time...
For example:
TABLE1:
TIMESTAMP VALUE
2014-11-27 07:15:00.390 1
2014-11-27 07:30:00.390 0
2014-11-27 07:45:00.373 0
2014-11-27 08:00:00.327 0
TABLE2:
TIMESTAMP VALUE
2014-11-19 08:00:07.880 0
2014-11-19 08:15:06.867 0.0979999974370003
2014-11-19 08:30:08.593 0.0979999974370003
2014-11-19 08:45:07.397 0.0979999974370003
TABLE3
TIMESTAMP VALUE
2014-11-27 07:15:00.390 0
2014-11-27 07:30:00.390 0
2014-11-27 07:45:00.373 1
2014-11-27 08:00:00.327 1
As you can see, not all of the tables will start with the same quarterly TIMESTAMP. Basically, what I am after is a query that will return the VALUE for each of the 3 tables for every 15 minute interval starting with the earliest TIMESTAMP out of the 3 tables. For the example given, I'd want to start at 2014-11-27 07:15 (don't care about seconds... thus, would need to allow for the timestamp to be +- 1 minute or so). Returning NULL for the value when there is no record for the particular TIMESTAMP is ok. So, the query for my listed example would return something like:
TIMESTAMP VALUE1 VALUE2 VALUE3
2014-11-27 07:15 1 NULL 0
2014-11-27 07:30 0 NULL 0
2014-11-27 07:45 0 NULL 1
2014-11-27 08:00 0 NULL 1
...
2014-11-19 08:00 0 0 1
2014-11-19 08:15 0 0.0979999974370003 0
2014-11-19 08:30 0 0.0979999974370003 0
2014-11-19 08:45 0 0.0979999974370003 0
I hope this makes sense. Any help/pointers/guidance will be appreciated.

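Use a FULL OUTER JOIN on the timestamps rounded to the minute:
SELECT CONVERT(SMALLDATETIME, COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP])) AS [TIMESTAMP],
Isnull(Max(a.VALUE), 0) AS VALUE1,
Max(b.VALUE) AS VALUE2,
Isnull(Max(c.VALUE), 0) AS VALUE3
FROM TABLE1 a
FULL OUTER JOIN TABLE2 b
ON CONVERT(SMALLDATETIME, a.[TIMESTAMP]) = CONVERT(SMALLDATETIME, b.[TIMESTAMP])
FULL OUTER JOIN TABLE3 c
-- COALESCE lets TABLE3 rows also match rows that exist only in TABLE2
ON CONVERT(SMALLDATETIME, COALESCE(a.[TIMESTAMP], b.[TIMESTAMP])) = CONVERT(SMALLDATETIME, c.[TIMESTAMP])
-- group by the rounded value so rows recorded a few seconds apart land in the same interval
GROUP BY CONVERT(SMALLDATETIME, COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]))
ORDER BY [TIMESTAMP] DESC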

The first thing I would do is normalize the timestamps to the minute. You can do this with an update to the existing column
UPDATE TABLENAME
SET [TIMESTAMP] = dateadd(minute,datediff(minute,0,[TIMESTAMP]),0)
or in a new column
ALTER TABLE TABLENAME ADD NORMTIME DATETIME;
UPDATE TABLENAME
SET NORMTIME = dateadd(minute,datediff(minute,0,[TIMESTAMP]),0)
For details on flooring dates, see this post: Floor a date in SQL server
The next step is to make a table that has all of the (normalized) timestamps that you expect to see, that is, one row for every 15 minutes. Let's call this table TIME_PERIOD and the column EVENT_TIME for my examples (call it whatever you want).
There are many ways to make such a table: a recursive CTE, ROW_NUMBER(), even brute force. I leave that part up to you.
Now the problem is a simple select with left joins and a filter for valid values, like this:
SELECT TP.EVENT_TIME, a.VALUE as VALUE1, b.VALUE as VALUE2, c.VALUE as VALUE3
FROM TIME_PERIOD TP
LEFT JOIN TABLE1 a ON a.[TIMESTAMP] = TP.EVENT_TIME
LEFT JOIN TABLE2 b ON b.[TIMESTAMP] = TP.EVENT_TIME
LEFT JOIN TABLE3 c ON c.[TIMESTAMP] = TP.EVENT_TIME
WHERE COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]) is not null
ORDER BY TP.EVENT_TIME DESC
The where might get a little more complex if they are different types so you can always use this (which is not as good as coalesce but will always work):
WHERE a.[TIMESTAMP] IS NOT NULL OR
b.[TIMESTAMP] IS NOT NULL OR
c.[TIMESTAMP] IS NOT NULL
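The whole approach (floor the timestamps, generate the 15-minute slots, then left join) can be tried end to end with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, so the date functions differ: strftime here plays the role of the dateadd/datediff flooring. The table names, the ts column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two tables whose readings are a few seconds off the quarter hour.
conn.executescript("""
CREATE TABLE table1 (ts TEXT, value REAL);
CREATE TABLE table2 (ts TEXT, value REAL);
INSERT INTO table1 VALUES
  ('2014-11-27 07:15:00.390', 1),
  ('2014-11-27 07:30:00.390', 0),
  ('2014-11-27 07:45:00.373', 0);
INSERT INTO table2 VALUES
  ('2014-11-27 07:30:07.880', 0.098),
  ('2014-11-27 07:45:06.867', 0.098);
""")

query = """
WITH RECURSIVE time_period(event_time) AS (
    -- one row per 15-minute slot between the chosen start and end
    SELECT '2014-11-27 07:15:00'
    UNION ALL
    SELECT datetime(event_time, '+15 minutes')
    FROM time_period
    WHERE event_time < '2014-11-27 08:00:00'
)
SELECT tp.event_time, a.value AS value1, b.value AS value2
FROM time_period tp
-- floor each timestamp to the minute before comparing it with the slot
LEFT JOIN table1 a ON strftime('%Y-%m-%d %H:%M:00', a.ts) = tp.event_time
LEFT JOIN table2 b ON strftime('%Y-%m-%d %H:%M:00', b.ts) = tp.event_time
-- drop slots for which no table has a reading
WHERE a.ts IS NOT NULL OR b.ts IS NOT NULL
ORDER BY tp.event_time
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On SQL Server the same shape should apply, with the CTE introduced by WITH alone (no RECURSIVE keyword) and DATEADD(minute, 15, EVENT_TIME) producing the next slot.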

Here is an updated version of NoDisplayName's answer that does what you want. DATETIMEFROMPARTS requires SQL Server 2012 or later; on earlier versions you could replace it with a series of other functions to get the same result.
;WITH
NewT1 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table1),
NewT2 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table2),
NewT3 as (
SELECT DATETimeFROMPARTS( DATEPART(year,Timestamp) , DATEPART(month,timestamp) , datepart(day,timestamp),datepart(hour,timestamp), datepart(minute,timestamp),0,0 ) as TimeStamp, Value
FROM Table3)
SELECT COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP]) [TIMESTAMPs],
Isnull(Max(a.VALUE), 0) VALUE1,
Isnull(Max(b.VALUE), 0) VALUE2,
Isnull(Max(c.VALUE), 0) VALUE3
FROM NewT1 a
FULL OUTER JOIN NewT2 b
ON a.[TIMESTAMP] = b.[TIMESTAMP]
FULL OUTER JOIN NewT3 c
ON COALESCE(a.[TIMESTAMP], b.[TIMESTAMP]) = c.[TIMESTAMP]
GROUP BY COALESCE(a.[TIMESTAMP], b.[TIMESTAMP], c.[TIMESTAMP])
ORDER BY [TIMESTAMPs]

Related

Compare 2 time datatype

I have a table in SQL Server which has 2 columns, StartTime and EndTime. The datatype of both columns is time(7). So when I view data in the table it can look like this:
08:33:00.0000000
or
19:33:00.0000000
I want to return all the rows in the table where the StartTime and EndTime conflict with another row.
Example table TimeTable
RowID StartTime EndTime
1 08:33:00.0000000 19:33:00.0000000
2 10:34:00.0000000 15:32:00.0000000
3 03:00:00.0000000 05:00:00.0000000
Type of query I am trying to do:
SELECT * FROM TimeTable
WHERE RowID = 1
AND
TimeTable.StartTime AND EndTime
Falls in range
(SELECT * FROM TimeTable WHERE RowID <> 1)
Expected result:
2 10:34:00.0000000 15:32:00.0000000
You can use logic like this for at least a fraction of a second of overlap:
select tt.*
from timetable tt
where exists (select 1
from timetable tt2
where tt2.rowid <> tt.rowid and
tt2.endtime > tt.starttime and
tt2.starttime < tt.endtime
);
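The overlap test (another row ends after this one starts, and starts before this one ends) can be checked against the sample rows with a small sketch. It runs on SQLite through Python's sqlite3 module rather than SQL Server, and it stores the times as 'HH:MM:SS' text, which compares correctly as strings; both choices are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TimeTable (RowID INTEGER PRIMARY KEY, StartTime TEXT, EndTime TEXT);
INSERT INTO TimeTable VALUES
  (1, '08:33:00', '19:33:00'),
  (2, '10:34:00', '15:32:00'),
  (3, '03:00:00', '05:00:00');
""")

# A row is reported when at least one other row overlaps its time range.
overlapping = conn.execute("""
SELECT tt.RowID
FROM TimeTable tt
WHERE EXISTS (SELECT 1
              FROM TimeTable tt2
              WHERE tt2.RowID <> tt.RowID
                AND tt2.EndTime > tt.StartTime
                AND tt2.StartTime < tt.EndTime)
ORDER BY tt.RowID
""").fetchall()
print(overlapping)
```

Rows 1 and 2 overlap each other, while row 3 ends before either of them starts, so only the first two are returned.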

SQL:How do I group by the ids with only the null directly located directly below it?

How do I group each id together with only the nulls located directly below it, and then get the sum of the time?
ID time
1 time1
null time1
null time1
null time1
2 time1
null time1
null time1
3 time1
null time1
null time1
Result wanted
ID time
1 sumTime
2 sumTime
3 sumTime
SQL tables represent unordered sets. In order for you to do what you want, you need a column that specifies the ordering. Once you have that, you can identify the groups by counting up the cumulative number of non-null values in id and aggregating:
select max(id) as id, sum(time) as total_time
from (select t.*,
count(id) over (order by <ordering col>) as grp
from t
) t
group by grp;
If you do not have an ordering column, your question does not make sense, because the table is unordered.
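The cumulative-count idea can be sketched as follows. This is a minimal example on SQLite (version 3.25 or later, for window functions) through Python's sqlite3 module, not SQL Server; the seq ordering column and the numeric time values are hypothetical stand-ins. Note that the outer query groups by the computed grp, not by id, so that each NULL row stays with the id above it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (seq INTEGER, id INTEGER, time INTEGER)")

# seq is the assumed ordering column; time simply reuses seq as a dummy amount.
ids = [1, None, None, None, 2, None, None, 3, None, None]
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(seq, id_, seq) for seq, id_ in enumerate(ids, start=1)],
)

totals = conn.execute("""
SELECT MAX(id) AS id, SUM(time) AS total_time
FROM (SELECT t.*,
             -- running count of non-NULL ids: increases by one at every new id
             COUNT(id) OVER (ORDER BY seq) AS grp
      FROM t) AS g
GROUP BY grp
ORDER BY grp
""").fetchall()
print(totals)
```

COUNT(id) ignores NULLs, so the running count stays flat across the NULL rows and every block of rows shares one grp value.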
I agree with Gordon Linoff that what you are asking falls outside the rules of SQL Server because tables in SQL Server are unordered sets.
However, assuming that running the command
SELECT * FROM YourTimeTable
returns the data in the order and structure you showed:
ID time
1 time1
null time1
null time1
null time1
2 time1
null time1
null time1
3 time1
null time1
null time1
You can make it work with the following strategy:
Add a new column with row numbers that we can use for ordering.
Then run an update statement to set each NULL ID to the highest ID among the rows with a smaller row number than the current row.
if OBJECT_ID('tempdb.dbo.#tempTimeTable') IS NOT NULL
begin
drop table #tempTimeTable
end
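-- Assumes that ordering by [time] reproduces the row order shown above
SELECT ROW_NUMBER() OVER (ORDER BY [time]) AS RowN, *
INTO #tempTimeTable
FROM YourTimeTable
-- Fill each NULL ID with the highest ID found in the preceding rows
UPDATE t1
SET ID = (SELECT MAX(ID) FROM #tempTimeTable t2 WHERE t2.RowN < t1.RowN)
FROM #tempTimeTable t1
WHERE ID IS NULL
SELECT ID, SUM([time]) FROM #tempTimeTable GROUP BY [ID]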
What we are doing is:
Insert the data from the original table into a temp table with a new column added that indicates the row number.
Update the ID on the rows where it is NULL, setting it to the highest ID among the lower-numbered rows only. It will look like this:
1 time1
1 time1
1 time1
1 time1
2 time1
2 time1
2 time1
3 time1
3 time1
3 time1
Retrieve the data after summing all the times for each ID together.
Let me know if this works for you.

How to use multiple counts in where clause to compare data of a table in sql?

I want to compare data of a table with its other records. The count of rows with a specific condition has to match the count of rows without the where clause but on the same grouping.
Below is the table
-------------
id name time status
1 John 10 C
2 Alex 10 R
3 Dan 10 C
4 Tim 11 C
5 Tom 11 C
Output should be time = 11 as the count for grouping on time column is different when a where clause is added on status = 'C'
SELECT q1.time
FROM (SELECT time,
Count(id)
FROM table
GROUP BY time) AS q1
INNER JOIN (SELECT time,
Count(id)
FROM table
WHERE status = 'C'
GROUP BY time) AS q2
ON q1.time = q2.time
WHERE q1.count = q2.count
This is giving the desired output but is there a better and efficient way to get the desired result?
Are you looking for this (it returns the full rows rather than just the times)?
select t.*
from table t
where not exists (select 1 from table t1 where t1.time = t.time and t1.status <> 'C');
Alternatively, you can do:
select time
from table t
group by time
having sum(case when status <> 'C' then 1 else 0 end) = 0;
If you want the times where the rows all satisfy the where clause, then in Postgres, you can express this as:
select time
from t
group by time
having count(*) = count(*) filter (where status = 'C');
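The single-pass version can be checked against the sample data with a short sketch. It uses SQLite through Python's sqlite3 module, and the table name people is hypothetical (the original table name is not given). The CASE form is used instead of FILTER because it is accepted by most engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER, name TEXT, time INTEGER, status TEXT);
INSERT INTO people VALUES
  (1, 'John', 10, 'C'),
  (2, 'Alex', 10, 'R'),
  (3, 'Dan',  10, 'C'),
  (4, 'Tim',  11, 'C'),
  (5, 'Tom',  11, 'C');
""")

# Keep a time only when every row in its group has status 'C'.
times = conn.execute("""
SELECT time
FROM people
GROUP BY time
HAVING COUNT(*) = SUM(CASE WHEN status = 'C' THEN 1 ELSE 0 END)
""").fetchall()
print(times)
```

The table is scanned once and grouped once, instead of being aggregated twice and joined.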

Select min date after selected date

I have a schema that looks like this:
------------------------------------
ID | Time | Type | Description | obj
------------------------------------
And some data will be like
1 | 01/01/1900 01:01:01 AM | 1 | Start | O1
2 | 01/01/1900 01:01:02 AM | 1 | Start | O2
3 | 01/01/1900 01:01:03 AM | 2 | Stop | O1
4 | 01/01/1900 01:01:04 AM | 2 | Stop | O2
Notes:
The O1, O2 etc. is the ID of the object that the process operated on. It will be consistent between the start/stop times, but there will be multiple start/stop times for each object (the process will start and finish operating on a specific object multiple times, and my query will need to select records for each time the process processed that object)
The description says Start/Stop for the benefit of this question's clarity. In practice, it has all kinds of data that is parsed out.
So, what I need are the pairs of start times and stop times that are closest to each other. Stated another way: for every start time, I need the next closest stop time. So the result of a select statement for the sample data above (that only selected ids) would return:
(1, 3)
(2, 4)
What I've tried:
SELECT obj,
[Time] AS StartTime,
(SELECT MIN([TIME]) AS t
FROM TheTable
WHERE [Type] = 2
HAVING MIN([Time]) > StartTime) AS StopTime
FROM TheTable
WHERE [Type] = 1;
This obviously doesn't work as StartTime is unknown to the inner select.
Without the Having clause in the inner select, it runs but I get the same StopTime for all entries, as you would expect. Which is, of course, not what I need.
Is there any way that I can solve this?
SELECT t1.obj, t1.Time as Start, min(t2.Time) as Stop
FROM TheTable t1
LEFT OUTER JOIN TheTable t2
ON t1.obj = t2.obj and t2.Description = 'Stop' and t2.Time > t1.time
WHERE t1.Description = 'Start'
GROUP BY t1.obj, t1.Time
I use a left outer join because there might be a start time that does not have a stop time yet.
I am not sure why it doesn't work.
You can always use an outer column reference in a subquery; you are missing a comma and a GROUP BY, by the way.
SELECT obj,
[Time] AS StartTime,
(SELECT MIN([TIME]) AS t
FROM TheTable
WHERE [Type] = 2
and t1.obj = obj
HAVING t > StartTime) AS StopTime
FROM TheTable t1
WHERE [Type] = 1
group by obj,TIME;
EDIT:
I am not an expert in SQL Server and have no idea why the column alias is not working. This query works in other databases, such as Teradata, which I was using. Anyhow, you can use a table alias to work around this.
SELECT obj,
[TIME] AS StartTime,
(SELECT MIN([TIME]) As [tt]
FROM TheTable
WHERE [Type] = 2
and t1.obj = obj
HAVING MIN([TIME]) > t1.TIME
) AS StopTime
FROM TheTable t1
WHERE [Type] = 1
group by obj,TIME;
SQLFiddle:
http://sqlfiddle.com/#!6/3a745/14
It seems that a column alias is not allowed in the HAVING clause in SQL Server:
SELECT obj,
min([tdate]) AS StartTime from thetable group by obj having starttime>5 ;
Invalid column name 'starttime'.: SELECT obj, min([tdate]) AS StartTime from thetable group by obj having starttime>5
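Pairing each start with the next stop for the same object can also be sketched with a correlated subquery that returns the stop row's ID directly. The example below runs on SQLite through Python's sqlite3 module, with the sample times rewritten as ISO text so they sort correctly as strings; both are assumptions for illustration. On SQL Server the subquery would use TOP 1 instead of LIMIT 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TheTable (ID INTEGER, Time TEXT, Type INTEGER, Description TEXT, obj TEXT);
INSERT INTO TheTable VALUES
  (1, '1900-01-01 01:01:01', 1, 'Start', 'O1'),
  (2, '1900-01-01 01:01:02', 1, 'Start', 'O2'),
  (3, '1900-01-01 01:01:03', 2, 'Stop',  'O1'),
  (4, '1900-01-01 01:01:04', 2, 'Stop',  'O2');
""")

pairs = conn.execute("""
SELECT s.ID AS start_id,
       -- earliest stop for the same object that comes after this start
       (SELECT e.ID
        FROM TheTable e
        WHERE e.Type = 2
          AND e.obj = s.obj
          AND e.Time > s.Time
        ORDER BY e.Time
        LIMIT 1) AS stop_id
FROM TheTable s
WHERE s.Type = 1
ORDER BY s.ID
""").fetchall()
print(pairs)
```

A start that has no later stop yields NULL for stop_id, which matches the behaviour of the left outer join approach.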

Tricky SQL SELECT statement - combine two rows into two columns

My problem:
I have a table with a Channel <int> and a Value <float> column, along with a timestamp and a couple of other columns with additional data. Channel is either 1 or 2, and there are either 1 or 2 rows that have everything except channel and value the same.
What I'd like to do is select this data into a new form, where the two channels show up as columns. I tried to do something with GROUP BY, but I couldn't figure out how to get the values into the correct columns based on the channel on the same row.
Example:
For those of you who would rather look at the data I have and the data I want and figure it out from there, here it is. What I have:
Channel Value Timestamp OtherStuff
1 0.2394 2010-07-09 13:00:00 'some other stuff'
2 1.2348 2010-07-09 13:00:00 'some other stuff'
1 24.2348 2010-07-09 12:58:00 'some other stuff'
2 16.3728 2010-07-09 12:58:00 'some other stuff'
1 12.284 2010-07-09 13:00:00 'unrelated things'
2 9.6147 2010-07-09 13:00:00 'unrelated things'
What I want:
Value1 Value2 Timestamp OtherStuff
0.2394 1.2348 2010-07-09 13:00:00 'some other stuff'
24.2348 16.3728 2010-07-09 12:58:00 'some other stuff'
12.284 9.6147 2010-07-09 13:00:00 'unrelated things'
Update in response to some questions that have arisen in comments, and a few follow-up questions/clarifications:
Yes, it is the combination of Timestamp and OtherStuff that links the two rows together. (OtherStuff is actually more than one column, but I simplified for brevity.) There are also a couple of other columns that are not necessarily equal, but should be kept just as they are.
The table in question is already joined from two tables, where Value, Channel and Timestamp come from one of them, and the rest (a total of 7 more columns, out of which 4 are always equal for "linked" rows, and the other three are mostly not) come from the other. There have been a couple of suggestions using INNER JOIN - will these still work if I'm already joining stuff together (even though I don't have a myTable to join to itself)?
There are a lot of rows with the same timestamp, so I need information from both the tables I'm joining to figure out which rows to link together.
I have a lot of data. The input comes from measurement devices stationed all over the country, and most of them (if not all) upload measurements (for up to 4 channels) every 2 minutes. Right now we have about 1000 devices online, so this means an addition of approximately 1000 rows every minute on average. I need to consider values that are up to at least 3, preferably 6, hours old, which means 180 000 to 360 000 rows in the table with channel, value and timestamp.
As long as you have something that links the 2 rows, something like this
SELECT
c1.Value AS Value1, c2.Value AS Value2, c1.timestamp, c2.otherstuff
FROM
MyTable c1
JOIN
MyTable c2 ON c1.timestamp = c2.timestamp AND c1.otherstuff = c2.otherstuff
WHERE
c1.Channel = 1 AND c2.Channel = 2
If you don't have anything that links the 2 rows, then it probably can't be done because how do you know they are paired?
If you have 1 or 2 rows (edit: and don't know which channel value you have)
SELECT
c1.Value AS Value1, c2.Value AS Value2,
COALESCE(c1.timestamp, c2.timestamp) AS [timestamp],
COALESCE(c1.otherstuff, c2.otherstuff) AS otherstuff
FROM
(
SELECT Value, timestamp, otherstuff
FROM MyTable
WHERE Channel = 1
) c1
FULL OUTER JOIN
(
SELECT Value, timestamp, otherstuff
FROM MyTable
WHERE Channel = 2
) c2 ON c1.timestamp = c2.timestamp AND c1.otherstuff = c2.otherstuff
Something like...
SELECT MAX(CASE Channel WHEN 1 THEN Value ELSE 0 END) AS Value1,
MAX(CASE Channel WHEN 2 THEN Value ELSE 0 END) AS Value2,
Timestamp,
OtherStuff
FROM {tablename}
GROUP BY Timestamp, OtherStuff
(I haven't tested this!)
(and this assumes your Value is always positive!)
Alternatively (see comments below)...
SELECT SUM(CASE Channel WHEN 1 THEN Value ELSE 0 END) AS Value1,
SUM(CASE Channel WHEN 2 THEN Value ELSE 0 END) AS Value2,
Timestamp,
OtherStuff
FROM {tablename}
GROUP BY Timestamp, OtherStuff
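The conditional aggregation can be tried on the sample rows with a small sketch. It runs on SQLite through Python's sqlite3 module, and the table name measurements is hypothetical. This variant leaves out the ELSE branch, so a missing channel comes back as NULL and the result does not depend on the values being positive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurements (Channel INTEGER, Value REAL, Timestamp TEXT, OtherStuff TEXT);
INSERT INTO measurements VALUES
  (1, 0.2394,  '2010-07-09 13:00:00', 'some other stuff'),
  (2, 1.2348,  '2010-07-09 13:00:00', 'some other stuff'),
  (1, 24.2348, '2010-07-09 12:58:00', 'some other stuff'),
  (2, 16.3728, '2010-07-09 12:58:00', 'some other stuff'),
  (1, 12.284,  '2010-07-09 13:00:00', 'unrelated things'),
  (2, 9.6147,  '2010-07-09 13:00:00', 'unrelated things');
""")

# One output row per (Timestamp, OtherStuff); each channel becomes its own column.
pivoted = conn.execute("""
SELECT MAX(CASE WHEN Channel = 1 THEN Value END) AS Value1,
       MAX(CASE WHEN Channel = 2 THEN Value END) AS Value2,
       Timestamp,
       OtherStuff
FROM measurements
GROUP BY Timestamp, OtherStuff
ORDER BY Timestamp, OtherStuff
""").fetchall()
for row in pivoted:
    print(row)
```

Because the grouping key is the pair (Timestamp, OtherStuff), any further columns that link the two rows would need to be added to both the SELECT list and the GROUP BY.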
SELECT a.Value as Value1, b.Value as Value2,
a.TimeStamp, a.OtherStuff
FROM myTable a INNER JOIN myTable b
ON a.OtherStuff = b.OtherStuff and a.TimeStamp = b.TimeStamp
WHERE a.Channel = 1 AND b.Channel = 2
Written without a query editor.
Edit: INNER JOIN could also be used here.