SQL-Server-2017: Self join a table using timestamp criteria - sql

I have a table named events that looks like this:
timestamp | intvalue | hostname | attributes
2019-03-13 14:43:05.437| 257 | room04 | Success 000
2019-03-13 14:43:05.317| 257 | room03 | Success 000
2019-03-13 14:43:03.450| 2049 | room05 | Error 108
2019-03-13 14:43:03.393| 0 | room05 | TicketNumber=3
2019-03-13 14:43:02.347| 0 | room04 | TicketNumber=2
2019-03-13 14:43:02.257| 0 | room03 | TicketNumber=1
The above is a sample of a table containing thousands of rows like this.
I'll explain in a few words what you see in this table. The timestamp column gives the date and time of when each event happened. In the intvalue column, 257 means successful entry, 2049 means error and 0 means a ticket made a request. The hostname gives the name of the card/ticket reader that reads each ticket and the attributes column gives some details like the number of the ticket (1, 2, 3 etc) or the type of error (i.e 108 or 109) and if the event is successful.
In this situation there is a pattern: if a valid ticket requests entry at a time like 14:43:02.257, then the successful-entry message will be written to the database (as a new event) at most 6 seconds after the ticket was read by the ticket reader (that is, at 14:43:08.257 at the latest).
If the ticket fails to enter, then the error message will be written to the database within a time margin of 100 ms.
So in this example what I want to do is create a table like the one below:
timestamp | intvalue | hostname | result | ticketnumber
2019-03-13 14:43:05.437| 257 | room04 | Success 000 | TicketNumber=2
2019-03-13 14:43:05.317| 257 | room03 | Success 000 | TicketNumber=1
2019-03-13 14:43:03.450| 2049 | room05 | Error 108 | TicketNumber=3
As you can see, the ticket with TicketNumber=3 is matched with the result Error 108 because, in the initial table, the two events are less than 100 ms apart. The other two tickets are matched 1-to-1 with their respective results because the time margin is less than 6 seconds (and more than 100 ms). You can also see that the hostnames help with the matching: the row with the attribute TicketNumber=3 has a hostname of room05, just like the next row, which has the attribute Error 108.
I've been trying to self join this table or join it with a CTE. I've used cross apply and I have also tried methods using datediff, but I've failed miserably and I'm stuck.
Is there anyone that can help me and show me a correct way of achieving the desired outcome?
Thank you very much for your time.

The time lags don't really seem to make a difference, unless somehow a single room could be interleaved with both success and failure messages. Assuming that two requests do not happen in a row with no intervening event, then you can use lag():
select e.*
from (select timestamp, intvalue, hostname, attributes,
             lag(attributes) over (partition by hostname order by timestamp) as ticketnumber
      from events
     ) e
where intvalue > 0
order by timestamp
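If the time margins do turn out to matter, the pairing can also be written as an explicit time-window lookup instead of lag(). What follows is only a minimal sketch, and it assumes SQLite driven from Python rather than SQL Server: the column name ts and the julianday arithmetic are choices of this sketch, not part of the original schema. Each result row picks the most recent request on the same reader that is at most 6 seconds older.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ts" stands in for the question's timestamp column (assumption of this sketch).
conn.execute(
    "CREATE TABLE events (ts TEXT, intvalue INTEGER, hostname TEXT, attributes TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2019-03-13 14:43:05.437", 257, "room04", "Success 000"),
        ("2019-03-13 14:43:05.317", 257, "room03", "Success 000"),
        ("2019-03-13 14:43:03.450", 2049, "room05", "Error 108"),
        ("2019-03-13 14:43:03.393", 0, "room05", "TicketNumber=3"),
        ("2019-03-13 14:43:02.347", 0, "room04", "TicketNumber=2"),
        ("2019-03-13 14:43:02.257", 0, "room03", "TicketNumber=1"),
    ],
)

# For every result row (intvalue > 0), look up the latest request row
# (intvalue = 0) on the same hostname that is no more than 6 seconds older.
query = """
SELECT r.ts, r.intvalue, r.hostname, r.attributes AS result,
       (SELECT q.attributes
          FROM events AS q
         WHERE q.hostname = r.hostname
           AND q.intvalue = 0
           AND q.ts <= r.ts
           AND (julianday(r.ts) - julianday(q.ts)) * 86400.0 <= 6.0
         ORDER BY q.ts DESC
         LIMIT 1) AS ticketnumber
  FROM events AS r
 WHERE r.intvalue > 0
 ORDER BY r.ts DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A result with no request inside the window gets a NULL ticketnumber, which makes unmatched events easy to spot.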

OK...here is the result you asked for based on the data you provided. This is just an example of how to write a self join to get the results in your example. I hope this pushes you in the right direction.
IF OBJECT_ID('tempdb..#t') IS NOT NULL
BEGIN
DROP TABLE #t
END
CREATE TABLE #t
(
[timestamp] DATETIME,
intValue INT,
hostName VARCHAR(50),
attributes VARCHAR(50)
)
INSERT INTO #t([timestamp], intValue, hostName, attributes)
VALUES ('2019-03-13 14:43:05.437', 257, 'room04', 'Success 000'),
('2019-03-13 14:43:05.317',257, 'room03','Success 000'),
('2019-03-13 14:43:03.450',2049, 'room05','Error 108'),
('2019-03-13 14:43:03.393',0, 'room05','TicketNumber=3'),
('2019-03-13 14:43:02.347',0, 'room04','TicketNumber=2'),
('2019-03-13 14:43:02.257',0, 'room03','TicketNumber=1')
SELECT x.[timestamp], x.intValue, x.hostName, x.attributes result, y.attributes
ticketnumber
FROM (SELECT * FROM #t WHERE intValue > 0) AS x
INNER JOIN #t y
ON x.hostName = y.hostName AND y.intValue = 0
GROUP BY x.[timestamp], x.intValue, x.hostName, x.attributes, y.attributes
ORDER BY x.[timestamp] DESC
I would not copy this into your project and use it as is; it is just an example of how to use the join. I would need much more information about what you want to accomplish before posting a full-blown solution, as there are much better ways to produce reports for large data sets.
- Bill

Since you're using SQL 2017, you can make use of lead/lag.
with evt(timestamp,intvalue,hostname,attributes) as
(
select cast('2019-03-13 14:43:05.437' as datetime), 257 , 'room04','Success 000' union all
select cast('2019-03-13 14:43:05.317' as datetime), 257 , 'room03','Success 000' union all
select cast('2019-03-13 14:43:03.450' as datetime), 2049 , 'room05','Error 108' union all
select cast('2019-03-13 14:43:03.393' as datetime), 0 , 'room05','TicketNumber=3' union all
select cast('2019-03-13 14:43:02.347' as datetime), 0 , 'room04','TicketNumber=2' union all
select cast('2019-03-13 14:43:02.257' as datetime), 0 , 'room03','TicketNumber=1'
)
select [timestamp], intvalue, hostname, attributes, lag(attributes) over (partition by hostname order by timestamp) ticketnumber, datediff(ss,lag([timestamp]) over (partition by hostname order by timestamp), [timestamp]) lapse
from evt
order by timestamp

SQL Having/Where clause to compare MAX from current/another table

I have a table that has date information and is being copied to another table and trying to perform an incremental load.
date = date format
hour = int
| person | date | hour |
|--------|------------|------|
| bob | 2023-01-01 | 1 |
| bill | 2023-01-02 | 2 |
select * into test.person_copy from
(select * from original.person)
My thought process of performing the incremental load is to check on the max(date) & max(hour) from the original table against the copied table to identify what is the gap between the max values from both tables. However, I'm not entirely sure how to implement the logic as it doesn't seem straight forward with the where clause. Having clause might make more sense, but also doesn't seem correct?
select * into test.person_copy from
(select * from original.person org
Having max(org.date, org.hour) > (select max(copy.date,copy.hour) from test.person_copy copy)
)
The other variation I had in mind was to use HAVING NOT IN
Having max(org.date, org.hour) NOT IN (select max(copy.date,copy.hour) from test.person_copy copy)
I wasn't sure whether the logic is correct. The hour field is important, but I can live with just the date field.
Expected output would be that the logic would check for existing max(date) and only insert if it doesn't exist. Example below, 2023-01-03
| person | date | hour |
|--------|------------|------|
| bob | 2023-01-01 | 1 |
| bill | 2023-01-02 | 2 |
| test | 2023-01-03 | 2 |
Don't have access to a RedShift environment but the following query should work:
-- insert into (rather than select ... into), because test.person_copy already exists
insert into test.person_copy
select *
from original.person org
where dateadd(hrs, org.hour, org.date) >
      (select max(dateadd(hrs, cpy.hour, cpy.date))
       from test.person_copy cpy
      )
This assumes that when the previous copy was made, the entire set of source rows for that date and hour was copied (so the new incremental load picks up all rows for the dates and hours not already copied). This means that you need additional criteria in the select to make sure that you include only completed date-hours (i.e. make sure that you don't include the rows with hour=10 while the time is still 10:30).
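The high-water-mark idea can be tried out locally. This is a rough sketch only, assuming SQLite via Python instead of Redshift: the table names original_person and person_copy, the datetime(...) modifier arithmetic, and the COALESCE fallback for an empty copy are all assumptions of the sketch, not Redshift syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema-less stand-ins for original.person and test.person_copy (assumed names).
conn.executescript("""
CREATE TABLE original_person (person TEXT, date TEXT, hour INTEGER);
CREATE TABLE person_copy     (person TEXT, date TEXT, hour INTEGER);
INSERT INTO original_person VALUES
    ('bob',  '2023-01-01', 1),
    ('bill', '2023-01-02', 2),
    ('test', '2023-01-03', 2);
INSERT INTO person_copy VALUES
    ('bob',  '2023-01-01', 1),
    ('bill', '2023-01-02', 2);
""")

# Copy only rows whose date+hour is later than the newest date+hour already
# copied. COALESCE(..., '') lets the very first load (empty copy) take everything.
INCREMENTAL_LOAD = """
INSERT INTO person_copy
SELECT org.person, org.date, org.hour
  FROM original_person AS org
 WHERE datetime(org.date, '+' || org.hour || ' hours') >
       COALESCE((SELECT MAX(datetime(cpy.date, '+' || cpy.hour || ' hours'))
                   FROM person_copy AS cpy), '')
"""

def copy_count():
    return conn.execute("SELECT COUNT(*) FROM person_copy").fetchone()[0]

before = copy_count()
conn.execute(INCREMENTAL_LOAD)
after_first = copy_count()
conn.execute(INCREMENTAL_LOAD)   # running it again should copy nothing new
after_second = copy_count()

copied = conn.execute(
    "SELECT person, date, hour FROM person_copy ORDER BY date, hour"
).fetchall()
print(before, after_first, after_second, copied)
```

Running the load a second time leaves the copy unchanged, which is the property an incremental load needs.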

Getting warning: Null value is eliminated by an aggregate or other SET operation

I have this schema
create table t(id int, d date)
insert into t (id, d) values (1, getdate()),
(2, NULL)
When doing
declare @mindate date
select @mindate = min(d) from t
I get the warning
Null value is eliminated by an aggregate or other SET operation
Why and what can I do about it?
Mostly you should do nothing about it.
It is possible to disable the warning by setting ansi_warnings off but this has other effects, e.g. on how division by zero is handled and can cause failures when your queries use features like indexed views, computed columns or XML methods.
In some limited cases you can rewrite the aggregate to avoid it. e.g. COUNT(nullable_column) can be rewritten as SUM(CASE WHEN nullable_column IS NULL THEN 0 ELSE 1 END) but this isn't always possible to do straightforwardly without changing the semantics.
It's just an informational message required in the SQL standard. Apart from adding unwanted noise to the messages stream it has no ill effects (other than meaning that SQL Server can't just bypass reading NULL rows, which can have an overhead but disabling the warning doesn't give better execution plans in this respect)
The reason for returning this message is that throughout most operations in SQL nulls propagate.
SELECT NULL + 3 + 7 returns NULL (regarding NULL as an unknown quantity this makes sense as ? + 3 + 7 is also unknown)
but
SELECT SUM(N)
FROM (VALUES (NULL),
(3),
(7)) V(N)
Returns 10 and the warning that nulls were ignored.
However these are exactly the semantics you want for typical aggregation queries. Otherwise the presence of a single NULL would mean aggregations on that column over all rows would always end up yielding NULL which is not very useful.
Which is the heaviest cake below? (Image omitted; the original was a Creative Commons image, cropped and annotated by the answer's author.)
After the third cake was weighed the scales broke and so no information is available about the fourth but it was still possible to measure the circumference.
+--------+--------+---------------+
| CakeId | Weight | Circumference |
+--------+--------+---------------+
| 1 | 50 | 12.0 |
| 2 | 80 | 14.2 |
| 3 | 70 | 13.7 |
| 4 | NULL | 13.4 |
+--------+--------+---------------+
The query
SELECT MAX(Weight) AS MaxWeight,
AVG(Circumference) AS AvgCircumference
FROM Cakes
Returns
+-----------+------------------+
| MaxWeight | AvgCircumference |
+-----------+------------------+
| 80 | 13.325 |
+-----------+------------------+
even though technically it is not possible to say with certainty that 80 was the weight of the heaviest cake (as the unknown number may be larger) the results above are generally more useful than simply returning unknown.
+-----------+------------------+
| MaxWeight | AvgCircumference |
+-----------+------------------+
| ? | 13.325 |
+-----------+------------------+
So likely you want NULLs to be ignored, and the warning just alerts you to the fact that this is happening.
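The difference between NULL propagation in expressions and NULL elimination in aggregates is easy to reproduce. Here is a small sketch, assuming SQLite via Python as a stand-in for SQL Server (SQLite follows the same aggregate semantics but does not emit the warning); the Cakes table is filled with the values from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In an ordinary expression, NULL propagates: the result is NULL (None in Python).
null_sum = conn.execute("SELECT NULL + 3 + 7").fetchone()[0]

# In an aggregate, the NULL row is skipped and the remaining values are summed.
agg_sum = conn.execute(
    "SELECT SUM(n) FROM (SELECT NULL AS n UNION ALL SELECT 3 UNION ALL SELECT 7)"
).fetchone()[0]

conn.execute("CREATE TABLE Cakes (CakeId INTEGER, Weight INTEGER, Circumference REAL)")
conn.executemany(
    "INSERT INTO Cakes VALUES (?, ?, ?)",
    [(1, 50, 12.0), (2, 80, 14.2), (3, 70, 13.7), (4, None, 13.4)],
)
# MAX ignores the unknown weight; AVG uses all four known circumferences.
max_weight, avg_circumference = conn.execute(
    "SELECT MAX(Weight), AVG(Circumference) FROM Cakes"
).fetchone()

print(null_sum, agg_sum, max_weight, round(avg_circumference, 3))
```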
@juergen provided two good answers:
Suppress the warning using SET ANSI_WARNINGS OFF
Assuming you want to include NULL values and treat them as (say) 1900-01-01 (which is what cast(0 as datetime) yields), use select @mindate = min(isnull(d, cast(0 as datetime))) from t
However, if you want to ignore rows where the d column is null and not concern yourself with the ANSI_WARNINGS option, then you can do this by excluding all rows where d is null, like so:
select @mindate = min(d) from t where (d IS NOT NULL)
I think you can ignore this warning in this case, since you are using the MIN function.
"Except for COUNT, aggregate functions ignore null values"
Please refer to Aggregate Functions (Transact-SQL).
What should min() return in your case as lowest value of d?
The warning informs you that the min() function did not take into account records that are null.
So if it should ignore the NULL values and return the lowest existing date then you can ignore this warning.
If you also like to suppress warnings for this single statement then you can do it like this
set ansi_warnings off
select @mindate = min(d) from t
set ansi_warnings on
If you want NULL values taken into account by using a default value for them then you can set a default date value like this
select @mindate = min(isnull(d, cast(0 as datetime)))
from t
If you want to make aggregates consider null values and treat the result as null you can use:
SELECT IIF(COUNT(N) != COUNT(*), NULL, SUM(N)) as [Sum]
FROM (VALUES (NULL),
(3),
(7)) V(N)
This returns null if not all values are given.
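The COUNT(N) versus COUNT(*) comparison can be checked in isolation. This is a hedged sketch that assumes SQLite via Python; it uses a CASE expression in place of IIF, and the helper strict_sum and the table v are hypothetical names introduced only for this illustration.

```python
import sqlite3

def strict_sum(values):
    """Sum the values, but return None (NULL) if any of them is NULL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE v (n INTEGER)")
    conn.executemany("INSERT INTO v VALUES (?)", [(x,) for x in values])
    # COUNT(n) skips NULLs while COUNT(*) counts every row, so the two
    # differ exactly when at least one value is NULL.
    sql = "SELECT CASE WHEN COUNT(n) <> COUNT(*) THEN NULL ELSE SUM(n) END FROM v"
    return conn.execute(sql).fetchone()[0]

with_null = strict_sum([None, 3, 7])
without_null = strict_sum([3, 7])
print(with_null, without_null)
```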

Using SQL to get the last item before n

I am not quite sure how to ask this so I will start off with an example. Let's say I have a table in my database that looks like this:
id | time | event | pnumber
---------------------------
1 | 1200 | foo | 23
2 | 1130 | bar | 52
3 | 1045 | bat | 13
...
n | 0 | baz | 7
Now say I wanted to get the last known pnumber after a certain time. For example at time = 1135, it would have to go back and find the last known time in the table (1130) and then return that pnumber. So for t = 1130, it would return pnumber = 52. But as soon as the t = 1045 it would return pnumber = 13. (Time counts down in this context from 1200 to 0).
Here's what I have so far.
SELECT pnumber FROM table WHERE time = (SELECT time FROM table WHERE time <= '1135' ORDER BY time LIMIT 1)
Is there an easier way to do this? Without using multiple statements. I am using sqlite3
Sure. You can condense that query by doing:
SELECT pnumber FROM table WHERE time >= 1135 ORDER BY time DESC LIMIT 1;
No need to nest the select to get a specific time first, this should work.
EDIT: Got the inequality sign mixed around -- if you're looking for the first record AFTER a specific time, you'll want time >= 1135 and order by time descending with a limit of one.
Why do you need the second query? Could you do something like this:
SELECT TOP 1 pnumber FROM table WHERE time >= '1135' ORDER BY TIME DESC
I'm a bit confused. You are asking that 1135 would return the value for 1130, yet you are using greater than or equal to instead of less than. If your example is what you are looking for, try this.
SELECT PNUMBER FROM TABLE WHERE TIME<=1135 ORDER BY TIME DESC LIMIT 1
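Since the question is about sqlite3, the last query can be checked directly from Python. A minimal sketch: the table is called events here because table itself is a reserved word, the row with id 4 stands in for the question's row n, and the helper name last_known_pnumber is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, time INTEGER, event TEXT, pnumber INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [(1, 1200, "foo", 23), (2, 1130, "bar", 52), (3, 1045, "bat", 13), (4, 0, "baz", 7)],
)

def last_known_pnumber(t):
    """Return the pnumber of the row with the greatest time that is <= t."""
    row = conn.execute(
        "SELECT pnumber FROM events WHERE time <= ? ORDER BY time DESC LIMIT 1",
        (t,),
    ).fetchone()
    return row[0] if row else None

results = {t: last_known_pnumber(t) for t in (1135, 1130, 1100, 1045)}
print(results)
```

With an index on time, SQLite can answer this from a single index seek instead of sorting the table.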

Is this possible with an SQL query?

Sorry for the generic title of the question, but I didn't know how else to put it.. So here goes:
I have a single table that holds the following information:
computerName | userName | date | logOn | startUp
| | | |
ID_000000001 | NULL | 2012-08-14 08:00:00.000 | NULL | 1
ID_000000001 | NULL | 2012-08-15 09:00:00.000 | NULL | 0
ID_000000003 | user02 | 2012-08-15 19:00:00.000 | 1 | NULL
ID_000000004 | user02 | 2012-08-16 20:00:00.000 | 0 | NULL
computername and username are self-explanatory I suppose
logOn is 1 when the user logged on at the machine and 0 when he logged off.
startUp is 1 when the machine was turned on and 0 when it got turned off.
The other entry is always NULL, since we can't log in and start up at the exact same time.
Now my task is: find out which computers have been turned on the least amount of time over the last month (or any given amount of time, but for now let's say one month). Is this even possible with SQL? <-- Careful: I don't need to know how many times a PC was turned on, but how many hours/minutes each computer was turned on over the given time span.
There's two little problems as well:
We cannot say that the first entry of each computer is a 1 in the startUp column since the script that logs those events was installed recently and thus maybe a computer was already running when it started logging.
We cannot assume that if we order by date and only show the startUp column the entries will all be alternating 1's and 0's, because if the computer is forced to shut down (by pulling the plug, for example) there won't be a log entry for the shutdown and there could be two 1's in a row.
EDIT: userName is of course NULL when startUp has a value, since turning on/shutting down doesn't show which user did that.
In a stored procedure, with cursors and fetch loops.
And you use a temptable to store by computername the uptime.
I give you the main plan, I'll let you see for the details in the TSQL guide.
Another link: a good example with TSQL Cursor.
declare @total_hour_by_computername int
declare @computer_name varchar(255)
declare @current_date datetime
declare @previousDate datetime
declare @RowNum int
--Now in your ComputerNameList cursor, you have all the different computernames:
declare ComputerNameList cursor for
select DISTINCT computername from myTable
-- We open the cursor
OPEN ComputerNameList
--You begin your foreach computername loop:
FETCH NEXT FROM ComputerNameList
INTO @computer_name
set @RowNum = 0
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @total_hour_by_computername=0;
    --This query selects all startup 1 dates by computername order by date.
    select @current_date=date from myTable where startup = 1 and computername = @computer_name order by date
    --You use a 2nd loop on the dates that were sent you back:
    --This query gives you the previous date
    select TOP(1) @previousDate=date from myTable
    where computername = @computer_name and date < @current_date and startup is not null
    order by date DESC
    --If it comes null, you can decide not to take it into account.
    --Else
    SET @total_hour_by_computername=@total_hour_by_computername+datediff(hour, @previousDate, @current_date);
    --Once all dates have been parsed, you insert into your temptable the results for this computername
    INSERT INTO TEMPTABLE(Computername,uptime) VALUES (@computer_name,@total_hour_by_computername)
    --End of the @computer_name loop
    FETCH NEXT FROM ComputerNameList
    INTO @computer_name
END
CLOSE ComputerNameList
DEALLOCATE ComputerNameList
You only need a select into your temptable to determine which one of the computers has been up the most time.
You could group by computer, and use where to filter for startups in a particular month:
select computerName
, count(*)
from YourTable
where '2012-08-01' <= [date] and [date] < '2012-09-01'
and startup = 1
group by
computerName
order by
count(*) desc
As RoadWarrior pointed out, an accurate report is not possible when shutdown messages are dropped. But here is an attempt to generate something useful. I'm going to assume the table name is computers:
SELECT c1.computerName,
       timediff(MIN(c2.date), c1.date) as upTime
FROM computers as c1, computers as c2
WHERE c1.computerName=c2.computerName
  AND c1.startUp=1 AND c2.startUp=0
  AND c2.date >= c1.date
GROUP BY c1.computerName, c1.date
ORDER BY c1.computerName, c1.date;
This will generate a list of all the periods a computer was on. To generate your requested report you can use the above query as a subquery:
SELECT
    c3.computerName,
    SEC_TO_TIME(SUM(TIME_TO_SEC(c3.upTime))) AS totalUpTime
FROM
    (SELECT c1.computerName,
            timediff(MIN(c2.date), c1.date) AS upTime
     FROM computers AS c1, computers AS c2
     WHERE c1.computerName=c2.computerName
       AND c1.startUp=1 AND c2.startUp=0
       AND c2.date >= c1.date
     GROUP BY c1.computerName, c1.date
    ) AS c3
GROUP BY c3.computerName
ORDER BY totalUpTime;
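The pairing of each startup with the next shutdown can be tried on a small data set. The following is only a sketch, assuming SQLite via Python rather than MySQL (strftime('%s', ...) replaces timediff). The two rows for ID_000000001 come from the question; the rows for ID_000000002 are hypothetical data added so the ordering is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE computers (computerName TEXT, date TEXT, startUp INTEGER)")
conn.executemany(
    "INSERT INTO computers VALUES (?, ?, ?)",
    [
        ("ID_000000001", "2012-08-14 08:00:00", 1),
        ("ID_000000001", "2012-08-15 09:00:00", 0),
        # Hypothetical second machine, not in the original question.
        ("ID_000000002", "2012-08-14 10:00:00", 1),
        ("ID_000000002", "2012-08-14 12:30:00", 0),
    ],
)

# Inner query: one row per startup, paired with the earliest later shutdown.
# Outer query: total the seconds per computer and convert to hours.
query = """
SELECT computerName, SUM(seconds_up) / 3600.0 AS hours_up
  FROM (SELECT s.computerName,
               strftime('%s', MIN(e.date)) - strftime('%s', s.date) AS seconds_up
          FROM computers AS s
          JOIN computers AS e
            ON e.computerName = s.computerName
           AND e.startUp = 0
           AND e.date >= s.date
         WHERE s.startUp = 1
         GROUP BY s.computerName, s.date)
 GROUP BY computerName
 ORDER BY hours_up
"""
uptimes = conn.execute(query).fetchall()
print(uptimes)
```

Note that two startups in a row (a missing shutdown entry) would both pair with the same later shutdown, so this approach overcounts in exactly the case the question warns about.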
Try this query (replace table_name with the name of your table):
SELECT computerName, SUM(startUp) AS startupTimes
FROM table_name
GROUP BY computerName
ORDER BY startupTimes
This will output the number of times each computer has been started. To get just the first row (the computer that has the least amount of startups) you can append LIMIT 1 to the query.
If (per your last paragraph) you aren't recording all shutdown events, then you don't have the information available to generate a report showing the amount of time each computer has been switched on. Because you aren't recording all instances of computer shutdown, it doesn't matter what SQL query you use.
FWIW, this schema isn't 3NF. A more common approach would be to have a single column recording each event, for example:
ComputerId:UserId:EventId:EventDate
The first three columns are each a foreign key into another table where the details are stored. Although even with this schema, the UserID would be null for startup/shutdown events.

Need a Complex SQL Query

I need to make a rather complex query, and I need help bad. Below is an example I made.
Basically, I need a query that will return one row for each case_id where the type is support, status start, and date meaning the very first one created (so that in the example below, only the 2/1/2009 John's case gets returned, not the 3/1/2009). The search needs to be dynamic to the point of being able to return all similar rows with different case_id's etc from a table with thousands of rows.
There's more after that but I don't know all the details yet, and I think I can figure it out if you guys (and gals) can help me out here. :)
ID | Case_ID | Name | Date | Status | Type
48 | 450 | John | 6/1/2009 | Fixed | Support
47 | 450 | John | 4/1/2009 | Moved | Support
46 | 451 | Sarah | 3/1/2009 | |
45 | 432 | John | 3/1/2009 | Fixed | Critical
44 | 450 | John | 3/1/2009 | Start | Support
42 | 450 | John | 2/1/2009 | Start | Support
41 | 440 | Ben | 2/1/2009 | |
40 | 432 | John | 1/1/2009 | Start | Critical
...
Thanks a bunch!
Edit:
To answer some people's questions, I'm using SQL Server 2005. And the date is just plain date, not string.
Ok so now I got further in the problem. I ended up with Bliek's solution which worked like a charm. But now I ran into the problem that sometimes the status never starts, as it's solved immediately. I need to include this in as well. But only for a certain time period.
I imagine I'm going to have to check for the case table referenced by FK Case_ID here. So I'd need a way to check for each Case_ID created in the CaseTable within the past month, and then run a search for these in the same table and same manner as posted above, returning only the first result as before. How can I use the other table like that?
As usual I'll try to find the answer myself while waiting, thanks again!
Edit 2:
Seems this is the answer. I don't have access to the full DB yet so I can't fully test it, but it seems to be working with the dummy tables I created, to continue from Bliek's code's WHERE clause:
WHERE RowNumber = 1 AND Case_ID IN (SELECT Case_ID FROM CaseTable
WHERE (Date BETWEEN '2007/11/1' AND '2007/11/30'))
The date's screwed again but you get the idea I'm sure. Thanks for the help everyone! I'll get back if there're more problems, but I think with this info I can improvise my way through most of the SQL problems I currently have to deal with. :)
Maybe something like:
select Case_ID, Name, MIN(date), Status, Type
from table
where type = 'Support'
and status = 'Start'
group by Case_ID, Name, Status, Type
EDIT: You haven't provided a lot of details about what you really want, so I'd suggest that you read all the answers and choose one that suits your problem best. So far I'd say that Tomalak's answer is closest to what you're looking for...
SELECT
c.ID,
c.Case_ID,
c.Name,
c.Date,
c.Status,
c.Type
FROM
CaseTable c
WHERE
c.Type = 'Support'
AND c.Status = 'Start'
AND c.Date = (
SELECT MIN(Date)
FROM CaseTable
WHERE Case_ID = c.Case_ID AND Type = c.Type AND Status = c.Status)
/* GROUP BY only needed when for a given Case_ID several rows
exist that fulfill the WHERE clause */
GROUP BY
c.ID,
c.Case_ID,
c.Name,
c.Date,
c.Status,
c.Type
This query benefits greatly from indexes on the Case_ID, Date, Status and Type columns.
An added benefit is that the filter on Type ('Support') and Status ('Start') only needs to be set in one place.
As an alternative to the GROUP BY clause, you can do SELECT DISTINCT, which would increase readability (this may or may not affect overall performance, I suggest you measure both variants against each other). If you are sure that for no Case_ID in your table two rows exist that have the same Date, you won't need GROUP BY or SELECT DISTINCT at all.
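The correlated-subquery approach can be run against the sample rows from the question. This is a minimal sketch assuming SQLite via Python instead of SQL Server 2005; the dates are stored as ISO strings and read as month/day/year, which is an assumption about the question's format, and the empty Status/Type cells are stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CaseTable (ID INTEGER, Case_ID INTEGER, Name TEXT,"
    " Date TEXT, Status TEXT, Type TEXT)"
)
conn.executemany(
    "INSERT INTO CaseTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (48, 450, "John", "2009-06-01", "Fixed", "Support"),
        (47, 450, "John", "2009-04-01", "Moved", "Support"),
        (46, 451, "Sarah", "2009-03-01", None, None),
        (45, 432, "John", "2009-03-01", "Fixed", "Critical"),
        (44, 450, "John", "2009-03-01", "Start", "Support"),
        (42, 450, "John", "2009-02-01", "Start", "Support"),
        (41, 440, "Ben", "2009-02-01", None, None),
        (40, 432, "John", "2009-01-01", "Start", "Critical"),
    ],
)

# Keep a Support/Start row only if its Date is the earliest one among the
# rows with the same Case_ID, Type and Status.
query = """
SELECT c.ID, c.Case_ID, c.Name, c.Date, c.Status, c.Type
  FROM CaseTable AS c
 WHERE c.Type = 'Support'
   AND c.Status = 'Start'
   AND c.Date = (SELECT MIN(Date)
                   FROM CaseTable
                  WHERE Case_ID = c.Case_ID
                    AND Type = c.Type
                    AND Status = c.Status)
"""
first_starts = conn.execute(query).fetchall()
print(first_starts)
```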
In SQL Server 2005 and beyond I would use Common Table Expressions (CTE). This offers lots of possibilities like so:
With ResultTable (RowNumber
,ID
,Case_ID
,Name
,Date
,Status
,Type)
AS
(
SELECT Row_Number() OVER (PARTITION BY Case_ID
ORDER BY Date ASC)
,ID
,Case_ID
,Name
,Date
,Status
,Type
FROM CaseTable
WHERE Type = 'Support'
AND Status = 'Start'
)
SELECT ID
,Case_ID
,Name
,Date
,Status
,Type
FROM ResultTable
WHERE RowNumber = 1
Don't apologize for your date formatting, it makes more sense that way.
SELECT ID, Case_ID, Name, MIN(Date), Status, Type
FROM caseTable
WHERE Type = 'Support'
AND status = 'Start'
GROUP BY ID, Case_ID, Name, Status, Type