I have a table named events that looks like this:
timestamp | intvalue | hostname | attributes
2019-03-13 14:43:05.437| 257 | room04 | Success 000
2019-03-13 14:43:05.317| 257 | room03 | Success 000
2019-03-13 14:43:03.450| 2049 | room05 | Error 108
2019-03-13 14:43:03.393| 0 | room05 | TicketNumber=3
2019-03-13 14:43:02.347| 0 | room04 | TicketNumber=2
2019-03-13 14:43:02.257| 0 | room03 | TicketNumber=1
The above is a sample of a table containing thousands of rows like this.
I'll explain briefly what you see in this table. The timestamp column gives the date and time at which each event happened. In the intvalue column, 257 means a successful entry, 2049 means an error, and 0 means a ticket made a request. The hostname column gives the name of the card/ticket reader that read the ticket, and the attributes column gives details such as the ticket number (1, 2, 3, etc.), the type of error (e.g. 108 or 109), or whether the event was successful.
There is a pattern here: if a ticket requests entry, is valid, and the request happened at a time like 14:43:02.257, then the message of the successful entry will be written to the database (as a new event) at most 6 seconds after the ticket was read by the ticket reader (that is, by 14:43:08.257 at the latest).
If the ticket fails to enter, the error message will be written to the database within a time margin of 100 ms.
So in this example, what I want to do is create a table like the one below:
timestamp | intvalue | hostname | result | ticketnumber
2019-03-13 14:43:05.437| 257 | room04 | Success 000 | TicketNumber=2
2019-03-13 14:43:05.317| 257 | room03 | Success 000 | TicketNumber=1
2019-03-13 14:43:03.450| 2049 | room05 | Error 108 | TicketNumber=3
As you can see, the ticket with TicketNumber=3 is matched with the result Error 108 because, in the initial table, the two events are less than 100 ms apart. The other two tickets are matched 1-to-1 with their respective results because the time margin is less than 6 seconds (and more than 100 ms). Notice also that the hostnames can help with the matching: the row with the attribute TicketNumber=3 has a hostname of room05, just like the next row, which has the attribute Error 108.
I've been trying to self join this table or join it with a CTE. I've used CROSS APPLY and I have also tried methods using DATEDIFF, but I've failed miserably and I'm stuck.
Is there anyone that can help me and show me a correct way of achieving the desired outcome?
Thank you very much for your time.
The time lags don't really seem to make a difference, unless somehow a single room could be interleaved with both success and failure messages. Assuming that two requests do not happen in a row with no intervening event, then you can use lag():
select e.*
from (select timestamp, intvalue, hostname, attributes,
lag(attributes) over (partition by hostname order by timestamp) as ticketnumber
from events
) e
where intvalue > 0
order by timestamp
OK...here is the result you asked for based on the data you provided. This is just an example of how to write a self join to get the results in your example. I hope this pushes you in the right direction.
IF OBJECT_ID('tempdb..#t') IS NOT NULL
BEGIN
DROP TABLE #t
END
CREATE TABLE #t
(
[timestamp] DATETIME,
intValue INT,
hostName VARCHAR(50),
attributes VARCHAR(50)
)
INSERT INTO #t([timestamp], intValue, hostName, attributes)
VALUES ('2019-03-13 14:43:05.437', 257, 'room04', 'Success 000'),
('2019-03-13 14:43:05.317',257, 'room03','Success 000'),
('2019-03-13 14:43:03.450',2049, 'room05','Error 108'),
('2019-03-13 14:43:03.393',0, 'room05','TicketNumber=3'),
('2019-03-13 14:43:02.347',0, 'room04','TicketNumber=2'),
('2019-03-13 14:43:02.257',0, 'room03','TicketNumber=1')
SELECT x.[timestamp], x.intValue, x.hostName, x.attributes AS result, y.attributes AS ticketnumber
FROM (SELECT * FROM #t WHERE intValue > 0) AS x
INNER JOIN #t y
ON x.hostName = y.hostName AND y.intValue = 0
GROUP BY x.[timestamp], x.intValue, x.hostName, x.attributes, y.attributes
ORDER BY x.[timestamp] DESC
I would not copy this into your project and use it as is; it is just an example of how to use the join. I would need much more information about what you want to accomplish before posting a full-blown solution, as there are much better ways to produce reports for large data sets.
- Bill
Since you're using SQL 2017, you can make use of lead/lag.
with evt(timestamp,intvalue,hostname,attributes) as
(
select cast('2019-03-13 14:43:05.437' as datetime), 257 , 'room04','Success 000' union all
select cast('2019-03-13 14:43:05.317' as datetime), 257 , 'room03','Success 000' union all
select cast('2019-03-13 14:43:03.450' as datetime), 2049 , 'room05','Error 108' union all
select cast('2019-03-13 14:43:03.393' as datetime), 0 , 'room05','TicketNumber=3' union all
select cast('2019-03-13 14:43:02.347' as datetime), 0 , 'room04','TicketNumber=2' union all
select cast('2019-03-13 14:43:02.257' as datetime), 0 , 'room03','TicketNumber=1'
)
select [timestamp], intvalue, hostname, attributes,
       lag(attributes) over (partition by hostname order by [timestamp]) as ticketnumber,
       datediff(ss, lag([timestamp]) over (partition by hostname order by [timestamp]), [timestamp]) as lapse
from evt
order by timestamp
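Neither lag() query above enforces the time windows from the question. As a rough sketch of how that rule could be added, here is a small Python script that uses the standard sqlite3 module as a stand-in for SQL Server. It matches each result row with the latest request on the same hostname that falls within 6 seconds (or within 100 ms when intvalue is 2049). The column name ts and the use of julianday() are assumptions made for SQLite only; on SQL Server the same idea would more likely be written with DATEDIFF and OUTER APPLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "ts" stands in for the timestamp column (an assumption for this sketch).
conn.execute(
    "CREATE TABLE events (ts TEXT, intvalue INTEGER, hostname TEXT, attributes TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2019-03-13 14:43:05.437", 257, "room04", "Success 000"),
        ("2019-03-13 14:43:05.317", 257, "room03", "Success 000"),
        ("2019-03-13 14:43:03.450", 2049, "room05", "Error 108"),
        ("2019-03-13 14:43:03.393", 0, "room05", "TicketNumber=3"),
        ("2019-03-13 14:43:02.347", 0, "room04", "TicketNumber=2"),
        ("2019-03-13 14:43:02.257", 0, "room03", "TicketNumber=1"),
    ],
)

query = """
SELECT r.ts, r.intvalue, r.hostname, r.attributes AS result,
       (SELECT q.attributes
          FROM events q
         WHERE q.hostname = r.hostname
           AND q.intvalue = 0              -- only ticket requests
           AND q.ts <= r.ts                -- request precedes the result
           -- elapsed seconds must fit the window: 0.1 s for errors, 6 s otherwise
           AND (julianday(r.ts) - julianday(q.ts)) * 86400.0
               <= CASE WHEN r.intvalue = 2049 THEN 0.1 ELSE 6.0 END
         ORDER BY q.ts DESC                -- latest matching request wins
         LIMIT 1) AS ticketnumber
  FROM events r
 WHERE r.intvalue > 0                      -- only result rows
 ORDER BY r.ts DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A result with no request inside its window comes back with ticketnumber set to NULL (None in Python), which makes unmatched events easy to spot.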
So I've looked through a lot of questions about subtraction and all that for SQL but haven't found the exact same use.
I'm using a single table and trying to find an average response time between two people talking on my site. Here's the data sample:
id created_at conversation_id sender_id receiver_id
307165 2017-05-03 20:03:27 96557 24 1755
307166 2017-05-03 20:04:22 96557 1755 24
303130 2017-04-20 18:03:53 102458 2518 4475
302671 2017-04-18 20:11:20 102505 3100 1079
302670 2017-04-18 20:09:38 103014 3100 2676
350570 2017-09-18 20:59:56 103496 5453 929
290458 2017-02-16 13:38:47 103575 2841 2282
300001 2017-04-08 16:42:16 104159 2740 1689
304204 2017-04-24 17:31:25 104531 5963 1118
284873 2017-01-12 22:33:19 104712 3657 3967
284872 2017-01-12 22:31:38 104712 3967 3657
What I want is to find an Average Response Time based on the conversation_id
Hmmm . . . You can get the "response" for a given row by getting the next row between the two conversers. The rest is getting the average -- which is database dependent.
Something like this:
select avg(next_created_at - created_at) -- exact syntax depends on the database
from (select m.*,
(select min(m2.created_at)
from messages m2
where m2.sender_id = m.receiver_id and m.sender_id = m2.receiver_id and
m2.conversation_id = m.conversation_id and
m2.created_at > m.created_at
) next_created_at
from messages m
) mm
where next_created_at is not null;
A CTE will take care of bringing the conversation start and end into the same row.
Then use DATEDIFF to compute the response time, and average it.
Assumes there are only ever two entries per conversation (ignores others with 1 or more than 2).
WITH X AS (
SELECT conversation_id, MIN(created_at) AS convstart, MAX(created_at) AS convend
FROM theTable
GROUP BY conversation_id
HAVING COUNT(*) = 2
)
SELECT AVG(DATEDIFF(second,convstart,convend)) AS AvgResponse
FROM X
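If conversations can have more than two messages, one possible variation is to compare each message with the previous one in the same conversation and count it as a response only when the sender changed. The sketch below runs that idea against the sample rows using Python's standard sqlite3 module (window functions need SQLite 3.25 or later). The table name messages and the strftime('%s', ...) conversion are assumptions for SQLite; other databases have their own date-difference functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, created_at TEXT, "
    "conversation_id INTEGER, sender_id INTEGER, receiver_id INTEGER)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        (307165, "2017-05-03 20:03:27", 96557, 24, 1755),
        (307166, "2017-05-03 20:04:22", 96557, 1755, 24),
        (303130, "2017-04-20 18:03:53", 102458, 2518, 4475),
        (302671, "2017-04-18 20:11:20", 102505, 3100, 1079),
        (302670, "2017-04-18 20:09:38", 103014, 3100, 2676),
        (284873, "2017-01-12 22:33:19", 104712, 3657, 3967),
        (284872, "2017-01-12 22:31:38", 104712, 3967, 3657),
    ],
)

query = """
WITH ordered AS (
    SELECT conversation_id, sender_id, created_at,
           -- previous message in the same conversation
           LAG(sender_id)  OVER (PARTITION BY conversation_id ORDER BY created_at) AS prev_sender,
           LAG(created_at) OVER (PARTITION BY conversation_id ORDER BY created_at) AS prev_created_at
      FROM messages
)
SELECT conversation_id,
       AVG(CAST(strftime('%s', created_at) AS INTEGER)
         - CAST(strftime('%s', prev_created_at) AS INTEGER)) AS avg_response_seconds
  FROM ordered
 WHERE prev_sender IS NOT NULL
   AND prev_sender <> sender_id      -- a reply, not a follow-up by the same person
 GROUP BY conversation_id
 ORDER BY conversation_id
"""
averages = conn.execute(query).fetchall()
for conversation_id, avg_seconds in averages:
    print(conversation_id, avg_seconds)
```

Conversations with a single message produce no row, because there is nothing to measure. Dropping the GROUP BY gives one overall average instead of one per conversation.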
I have difficulties formulating my issue.
I have a view which brings these results. There's a need to add a column to the view, which will pair up round-trip flights with identical number.
Flt_No From_Airport To_Airport Dep_Date RequiredResult
124 |LCA |CDG |10/19/14 5:00 1
125 |CDG |LCA |10/19/14 10:00 1
197 |LCA |BCN |10/4/12 5:00 2
198 |BCN |LCA |10/4/12 11:00 2
501 |LCA |HER |15/8/12 12:05 3
502 |HER |LCA |15/8/12 15:15 3
I.e. flight 124 is going from Larnaca to CDG, and flight 125 is going back from CDG to Larnaca - they both have to have the same identifier.
Round-trip flights will always have consecutive flight numbers.
I have a bunch of conditions which I won't write now.
Omitting hours is not an option, they're important.
I was thinking dense_rank() but I don't know how to create one identifier for 2 flights with different numbers, please help.
If your data is similar to the sample data posted, then the following query should give the required result:
SELECT *,
DENSE_RANK() OVER (ORDER BY CASE
WHEN From_Airport < To_Airport THEN From_Airport
ELSE To_Airport
END)
FROM mytable
Join conditions are not limited to simple equality. Assuming {Flight No, Departure, Destination} is unique on any one day, then a self join should do it:
select whatever
from flights outbound
inner join flights inbound on outbound.flt_no+1 = inbound.flt_no
and cast(outbound.dep_date as date)
= cast(inbound.dep_date as date)
and outbound.From_Airport = inbound.To_Airport
and outbound.To_Airport = inbound.From_Airport
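To turn that self join into the single identifier the question asks for, one option is to give every flight a pair key (the outbound flight number when a matching outbound leg exists, otherwise its own number) and rank the keys. The following is only a sketch, run with Python's standard sqlite3 module. The table name flights, the lowercase column names, the pair_key and pair_id aliases, and the ISO-formatted dates are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (flt_no INTEGER, from_airport TEXT, "
    "to_airport TEXT, dep_date TEXT)"
)
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        (124, "LCA", "CDG", "2014-10-19 05:00"),
        (125, "CDG", "LCA", "2014-10-19 10:00"),
        (197, "LCA", "BCN", "2012-10-04 05:00"),
        (198, "BCN", "LCA", "2012-10-04 11:00"),
        (501, "LCA", "HER", "2012-08-15 12:05"),
        (502, "HER", "LCA", "2012-08-15 15:15"),
    ],
)

query = """
WITH keyed AS (
    SELECT f.flt_no,
           -- use the outbound leg's number when this row is the return leg
           COALESCE((SELECT o.flt_no
                       FROM flights o
                      WHERE o.flt_no + 1 = f.flt_no
                        AND o.from_airport = f.to_airport
                        AND o.to_airport = f.from_airport
                        AND date(o.dep_date) = date(f.dep_date)),
                    f.flt_no) AS pair_key
      FROM flights f
)
SELECT flt_no, DENSE_RANK() OVER (ORDER BY pair_key) AS pair_id
  FROM keyed
 ORDER BY flt_no
"""
pairs = conn.execute(query).fetchall()
for flt_no, pair_id in pairs:
    print(flt_no, pair_id)
```

A flight with no matching return leg keeps its own number as the key, so it still receives an identifier of its own rather than being dropped by the join.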
I am using MS SQL Server 2012 on Windows7 platform and Qt5 to develop an application that uses a database. I'm good at Qt but I don't have SQL skills.
Currently I have a table named Devices that looks like this:
DeviceSerial DeviceIPAddr DeviceSwVersion
===========================================
1000 192.168.1.1 8.00
1043 192.168.1.2 8.00
1045 192.168.1.2 8.01
1049 192.168.1.3 8.00
1055 192.168.1.4 8.00
1058 192.168.1.6 8.00
1060 192.168.1.5 8.00
1061 192.168.1.8 8.01
1066 192.168.1.3 8.00
1070 192.168.1.10 8.00
1071 192.168.1.12 8.00
...
There is also another table named CommandQueue (which is empty, or should be, before this operation). It should be populated with data based on the DeviceIPAddr field of the first table and the value of an integer parameter from my app, like below:
TargetIP CommandID
========================
192.168.1.1 30
192.168.1.2 30
192.168.1.3 30
192.168.1.4 30
192.168.1.5 30
192.168.1.6 30
192.168.1.8 30
192.168.1.10 30
192.168.1.12 30
...
The value of commandID is the same at one moment of time for all rows of the CommandQueue table. For the sake of the example/sample, I chose 30.
How to create a SQL query to populate the CommandQueue table?
Use an INSERT INTO ... SELECT construct like this:
insert into CommandQueue(TargetIP, CommandID)
select DeviceIPAddr,
@someparameter_value
from Devices;
You can also hard-code that value if you want:
insert into CommandQueue(TargetIP, CommandID)
select DeviceIPAddr, 30 -- assuming CommandID is INT
from Devices;
Per your comment, you can then change the query itself like this:
insert into CommandQueue(TargetIP, CommandID)
select d.DeviceIPAddr,
@someparameter_value
from Devices d
left join CommandQueue ck on d.DeviceIPAddr = ck.TargetIP
and ck.CommandID <> @someparameter_value
where ck.TargetIP is null;
-- TRUNCATE is the quickest way to make sure it's empty.
-- A DELETE over millions of rows logs every row and can
-- take minutes. TRUNCATE deallocates the data pages, so it
-- is typically near-instant regardless of row count.
TRUNCATE TABLE CommandQueue
GO
INSERT INTO CommandQueue
( TargetIP ,
CommandID )
SELECT DISTINCT -- Only unique rows
DeviceIPAddr ,
@parameter1
FROM Devices;
GO
The following will insert into CommandQueue only data from Devices where the combination of DeviceIPAddr and @parameter1 is not already in CommandQueue. The records that are inserted must have DeviceSwVersion = 8.
INSERT INTO CommandQueue
( TargetIP ,
CommandID )
SELECT DISTINCT -- Only unique rows
a.DeviceIPAddr ,
@parameter1
FROM Devices a
LEFT OUTER JOIN CommandQueue b ON a.DeviceIPAddr = b.TargetIP
AND b.CommandID = @parameter1
WHERE b.TargetIP IS NULL
AND a.DeviceSwVersion = 8
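To check the de-duplication logic without a SQL Server instance, here is a small sketch using Python's standard sqlite3 module. It passes the command ID as a bound parameter and uses NOT EXISTS in place of the LEFT JOIN ... IS NULL pattern. The helper name enqueue and the :cmd placeholder are made up for this example, the software version filter is left out, and the placeholder syntax in a Qt application would depend on the driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Devices (DeviceSerial INTEGER, DeviceIPAddr TEXT, DeviceSwVersion TEXT)"
)
conn.execute("CREATE TABLE CommandQueue (TargetIP TEXT, CommandID INTEGER)")
conn.executemany(
    "INSERT INTO Devices VALUES (?, ?, ?)",
    [
        (1000, "192.168.1.1", "8.00"), (1043, "192.168.1.2", "8.00"),
        (1045, "192.168.1.2", "8.01"), (1049, "192.168.1.3", "8.00"),
        (1055, "192.168.1.4", "8.00"), (1058, "192.168.1.6", "8.00"),
        (1060, "192.168.1.5", "8.00"), (1061, "192.168.1.8", "8.01"),
        (1066, "192.168.1.3", "8.00"), (1070, "192.168.1.10", "8.00"),
        (1071, "192.168.1.12", "8.00"),
    ],
)


def enqueue(conn, command_id):
    """Queue command_id once per distinct device IP, skipping existing pairs."""
    conn.execute(
        """
        INSERT INTO CommandQueue (TargetIP, CommandID)
        SELECT DISTINCT d.DeviceIPAddr, :cmd
          FROM Devices d
         WHERE NOT EXISTS (SELECT 1
                             FROM CommandQueue q
                            WHERE q.TargetIP = d.DeviceIPAddr
                              AND q.CommandID = :cmd)
        """,
        {"cmd": command_id},
    )
    # Return the current queue size so repeated calls can be compared.
    return conn.execute("SELECT COUNT(*) FROM CommandQueue").fetchone()[0]


count_first = enqueue(conn, 30)
count_second = enqueue(conn, 30)  # second call should add nothing
queued_ips = {ip for (ip,) in conn.execute("SELECT TargetIP FROM CommandQueue")}
print(count_first, count_second, sorted(queued_ips))
```

Calling the helper twice with the same command ID leaves the queue unchanged the second time, which is the behaviour the anti-join is meant to give.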
I have 2 rows like below:
941 78 252 3008 86412 1718502 257796 2223252 292221 45514 114894
980 78 258 3064 88318 1785623 269374 2322408 305467 46305 116970
I want to insert the current timestamp while inserting each row.
In the end, the rows in my Hive table should look like this:
941 78 252 3008 86412 1718502 257796 2223252 292221 45514 114894 2014-10-21
980 78 258 3064 88318 1785623 269374 2322408 305467 46305 116970 2014-10-22
Is there any way I can insert timestamp directly into hive without using pig script?
You can use from_unixtime(unix_timestamp()) while inserting.
For example, suppose you have the following tables:
create table t1(c1 String);
create table t2(c1 String, c2 timestamp);
Now you can populate table t2 from t1 with current timestamp:
insert into table t2 select *, from_unixtime(unix_timestamp()) from t1;