SQL query to find the visitor together with the date time - sql

My visitor log table has id, visitor, department,vtime fields.
id | visitor | Visittime | Department_id
--------------------------------------------------------------
1 1 2019-05-07 13:53:50 1
2 2 2019-05-07 13:56:54 1
3 1 2019-05-07 14:54:10 3
4 2 2019-05-08 13:54:49 1
5 1 2019-05-08 13:58:15 1
6 2 2019-05-08 18:54:30 2
7 1 2019-05-08 18:54:37 2
And I have already have the following index
CREATE INDEX Idx_VisitorLog_Visitor_VisitTime_Includes ON VisitorLog
(Visitor, VisitTime) INCLUDE (DepartmentId, ID)
From the above table 4 filters are passed from User interface, visitor 1 and visitor 2 and visiting start time and end time.
In what are the department visitor 1 and visitor 2 both together with the VisitTime difference with in 5 mins those need to be filtered
Output shout be
id | visitor | Visittime | Department_id
--------------------------------------------------------------
1 1 2019-05-07 13:53:50 1
2 2 2019-05-07 13:56:54 1
4 2 2019-05-08 13:54:49 1
5 1 2019-05-08 13:58:15 1
For that I had used the following query,
;with CTE1 AS(
Select id,visitor,Visittime,department_id from visitorlog where visitor=1
)
,CTE2 AS(
Select id,visitor,Visittime,department_id from visitorlog where visitor=2
)
select * from CTE2 V2
Inner join CTE1 V1 on V2.department_id=V1.department_id and DATEDIFF(minute,V2.Visittime,V1.Visittime)between -5 and 5**
The above query takes too much of time to give response. Because in my table, almost 20 million records are available
Could any one suggest the correct way for my requirement.
Thanks in advance

This is a completely revised answer, based upon your additional information above.
After reviewing the data file above and the results you desire, this seems like the cleanest way to provide your results. First, we need a different index:
create index idx_POC_visitorlog on visitorlog
(visitor, Department_id, Visittime) include(id);
With this index, we can limit the queries to only the two passed in IDs. To simulate that, I created variables to hold their values. This query returns the data you are looking for.
DECLARE #Visitor1 int = 1,
#Visitor2 int = 2
;with t as (
select Department_id,
dateadd(minute, -5, visittime) as EarlyTime,
dateadd(minute, 5, Visittime) as LateTime,
id
from visitorlog
where visitor = #Visitor1
),
v as (
select v.id,
t.id as tid
from visitorlog v
INNER JOIN t
ON v.visitor = #Visitor2
AND v.Department_id = t.Department_id
and v.Visittime BETWEEN t.EarlyTime and t.LateTime
)
SELECT *
FROM visitorlog vl
WHERE ID IN (
SELECT v.id
FROM v
UNION
SELECT v.tid
FROM v
)
ORDER BY visittime;

If your version of SQL Server supports the LAG and LEAD functions, try rewriting the query as follows:
with t as (
select
*,
dateadd(minute, 5,
lag(Visittime) over(partition by Department_id order by Visittime)) lag_visit_time,
dateadd(minute, -5,
lead(Visittime) over(partition by Department_id order by Visittime)) lead_visit_time
from visitorlog
where visitor in(1, 2)
)
select
id, visitor, visittime, department_id
from t
where lag_visit_time >= Visittime or lead_visit_time <= Visittime;
This index is called a POC.
Results:
+----+---------+----------------------+---------------+
| id | visitor | visittime | department_id |
+----+---------+----------------------+---------------+
| 1 | 1 | 2019-05-07T13:53:50Z | 1 |
| 2 | 2 | 2019-05-07T13:56:54Z | 1 |
| 4 | 2 | 2019-05-08T13:54:49Z | 1 |
| 5 | 1 | 2019-05-08T13:58:15Z | 1 |
| 6 | 2 | 2019-05-08T18:54:30Z | 2 |
| 7 | 1 | 2019-05-08T18:54:37Z | 2 |
+----+---------+----------------------+---------------+
Demo.

Related

Teradata sql query from grouping records using Intervals

In Teradata SQL how to assign same row numbers for the group of records created with in 8 seconds of time Interval.
Example:-
Customerid Customername Itembought dateandtime
(yyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01
100 ALex Circketball 2017-02-10 10:10:06
100 ALex Baseball 2017-02-10 10:10:08
100 ALex volleyball 2017-02-10 10:11:01
100 ALex footbball 2017-02-10 10:11:05
100 ALex ringball 2017-02-10 10:11:08
100 Alex football 2017-02-10 10:12:10
My Expected result shoud have additional column with Row_number where it should assign the same number for all the purchases of the customer with in 8 seconds: Refer the below expected result
Customerid Customername Itembought dateandtime Row_number
(yyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01 1
100 ALex Circketball 2017-02-10 10:10:06 1
100 ALex Baseball 2017-02-10 10:10:08 1
100 ALex volleyball 2017-02-10 10:11:01 2
100 ALex footbball 2017-02-10 10:11:05 2
100 ALex ringball 2017-02-10 10:11:08 2
100 Alex football 2017-02-10 10:12:10 3
This is one way to do it with a recursive cte. Reset the running total of difference from the previous row's timestamp when it gets > 8 to 0 and start a new group.
WITH ROWNUMS AS
(SELECT T.*
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TM) AS RNUM
/*Replace DATEDIFF with Teradata specific function*/
,DATEDIFF(SECOND,COALESCE(MIN(TM) OVER(PARTITION BY ID
ORDER BY TM ROWS BETWEEN 1 PRECEDING AND CURRENT ROW), TM),TM) AS DIFF
FROM T --replace this with your tablename and add columns as required
)
,RECURSIVE CTE(ID,TM,DIFF,SUM_DIFF,RNUM,GRP) AS
(SELECT ID,
TM,
DIFF,
DIFF,
RNUM,
CAST(1 AS int)
FROM ROWNUMS
WHERE RNUM=1
UNION ALL
SELECT T.ID,
T.TM,
T.DIFF,
CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN 0 ELSE C.SUM_DIFF+T.DIFF END,
T.RNUM,
CAST(CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN T.RNUM ELSE C.GRP END AS int)
FROM CTE C
JOIN ROWNUMS T ON T.RNUM=C.RNUM+1 AND T.ID=C.ID
)
SELECT ID,
TM,
DENSE_RANK() OVER(PARTITION BY ID ORDER BY GRP) AS row_num
FROM CTE
Demo in SQL Server
I am going to interpret the problem differently from vkp. Any row within 8 seconds of another row should be in the same group. Such values can chain together, so the overall span can be more than 8 seconds.
The advantage of this method is that recursive CTEs are not needed, so it should be faster. (Of course, this is not an advantage if the OP does not agree with the definition.)
The basic idea is to look at the previous date/time value; if it is more than 8 seconds away, then add a flag. The cumulative sum of the flag is the row number you are looking for.
select t.*,
sum(case when prev_dt >= dateandtime - interval '8' second
then 0 else 1
end) over (partition by customerid order by dateandtime
) as row_number
from (select t.*,
max(dateandtime) over (partition by customerid order by dateandtime row between 1 preceding and 1 preceding) as prev_dt
from t
) t;
Using Teradata's PERIOD data type and the awesome td_normalize_overlap_meet:
Consider table test32:
SELECT * FROM test32
+----+----+------------------------+
| f1 | f2 | f3 |
+----+----+------------------------+
| 1 | 2 | 2017-05-11 03:59:00 PM |
| 1 | 3 | 2017-05-11 03:59:01 PM |
| 1 | 4 | 2017-05-11 03:58:58 PM |
| 1 | 5 | 2017-05-11 03:59:26 PM |
| 1 | 2 | 2017-05-11 03:59:28 PM |
| 1 | 2 | 2017-05-11 03:59:46 PM |
+----+----+------------------------+
The following will group your records:
WITH
normalizedCTE AS
(
SELECT *
FROM TABLE
(
td_normalize_overlap_meet(NEW VARIANT_TYPE(periodCTE.f1), periodCTE.fper)
RETURNS (f1 integer, fper PERIOD(TIMESTAMP(0)), recordCount integer)
HASH BY f1
LOCAL ORDER BY f1, fper
) as output(f1, fper, recordcount)
),
periodCTE AS
(
SELECT f1, f2, f3, PERIOD(f3, f3 + INTERVAL '9' SECOND) as fper FROM test32
)
SELECT t2.f1, t2.f2, t2.f3, t1.fper, DENSE_RANK() OVER (PARTITION BY t2.f1 ORDER BY t1.fper) as fgroup
FROM normalizedCTE t1
INNER JOIN periodCTE t2 ON
t1.fper P_INTERSECT t2.fper IS NOT NULL
Results:
+----+----+------------------------+-------------+
| f1 | f2 | f3 | fgroup |
+----+----+------------------------+-------------+
| 1 | 2 | 2017-05-11 03:59:00 PM | 1 |
| 1 | 3 | 2017-05-11 03:59:01 PM | 1 |
| 1 | 4 | 2017-05-11 03:58:58 PM | 1 |
| 1 | 5 | 2017-05-11 03:59:26 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:28 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:46 PM | 3 |
+----+----+------------------------+-------------+
A Period in Teradata is a special data type that holds a date or datetime range. The first parameter is the start of the range and the second is the ending time (up to, but not including which is why it's "+ 9 seconds"). The result is that we get a 8 second time "Period" where each record might "intersect" with another record.
We then use td_normalize_overlap_meet to merge records that intersect, sharing the f1 field's value as the key. In your case that would be customerid. The result is three records for this one customer since we have three groups that "overlap" or "meet" each other's time periods.
We then join the td_normalize_overlap_meet output with the output from when we determined the periods. We use the P_INTERSECT function to see which periods from the normalized CTE INTERSECT with the periods from the initial Period CTE. From the result of that P_INTERSECT join we grab the values we need from each CTE.
Lastly, Dense_Rank() gives us a rank based on the normalized period for each group.

How to account for Postgresql rank() ties

I've a table teams with 30 rows and has a handful of statistics stored as attributes. For example, goals for, goals against, etc and I've created a view that uses rank() and does a good job ranking the records. Here's an abridged query example and resulting table:
SELECT name,
points,
rank() OVER (ORDER BY points DESC) AS point_tank
FROM teams;
name | points | point_rank
-----------------------+-----------+----------------
Team 1 | 14 | 1
Team 2 | 11 | 2
Team 3 | 9 | 3
Team 4 | 9 | 3
I would like to add an additional column that would return boolean based on whether or not the rank is a tie. eg Team 3 and Team 4 in this example. It might look something like this:
name | points | point_rank | tie
-----------------------+-----------+----------------+----------------
Team 1 | 14 | 1 | false
Team 2 | 11 | 2 | false
Team 3 | 9 | 3 | true
Team 4 | 9 | 3 | true
Any ideas here? Or am I approaching this incorrectly and abusing rank() here? Thanks in advance!
You could use a CTE and then use the lag/lead functions to check for ties:
with ranked as (
SELECT name,
points,
rank() OVER (ORDER BY points DESC) AS point_rank
FROM teams
)
select name, points, point_rank,
( point_rank = lag(point_rank, 1, -1::bigint) over (order by point_rank)
or point_rank = lead(point_rank, 1, -1::bigint) over (order by point_rank)
) as is_tie
from ranked;
The default value for the lag and lead function is needed for the first and last row, to avoid checking for null there.
Example: https://dbfiddle.uk/-01aFLr4
One option would be to place your current query into a common table expression and then use it to identify which ranks are duplicate:
WITH cte AS (
SELECT name,
points,
rank() OVER (ORDER BY points DESC) AS point_rank
FROM teams;
)
SELECT cte.name,
cte.points,
cte.point_rank
CASE WHEN t.point_rank IS NOT NULL THEN 'false' ELSE 'true' END AS tie
FROM cte
LEFT JOIN
(
SELECT point_rank
FROM cte
GROUP BY point_rank
HAVING COUNT(*) = 1
) t
ON cte.point_rank = t.point_rank
SELECT name
, points
, rank() OVER (rrr) AS point_rank
-- , count(*) OVER (ppp) AS ppp_cnt
, rank() OVER (pp2) AS sub_rank
, (COUNT(*) OVER (ppp) > 1) AS is_tie
FROM teams
WINDOW ppp AS (PARTITION BY points )
, pp2 AS (PARTITION BY points ORDER BY ctid )
, rrr AS (ORDER BY points DESC)
ORDER BY points DESC
;
Result (I added two extra rows):
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 6
name | points | point_rank | sub_rank | is_tie
--------+--------+------------+----------+--------
Team_1 | 14 | 1 | 1 | f
Team_2 | 11 | 2 | 1 | f
Team_3 | 9 | 3 | 1 | t
Team_4 | 9 | 3 | 2 | t
Team_5 | 5 | 5 | 1 | t
Team_6 | 5 | 5 | 2 | t
(6 rows)

Measuring two arrays for equality of lowest numbers

I have the following SQL Tables:
create table events (
sensor_id integer not null,
event_type integer not null,
value integer not null,
time timestamp unique not null
);
And it looks something like this:
sensor_id | event_type | value | time
-----------+------------+------------+--------------------
2 | 2 | 5 | 2014-02-13 12:42:00
2 | 4 | -42 | 2014-02-13 13:19:57
2 | 2 | 2 | 2014-02-13 14:48:30
3 | 2 | 7 | 2014-02-13 12:54:39
2 | 3 | 54 | 2014-02-13 13:32:36
what is the easiest way to return the most recent value in terms of time by sensor_id and event_type? so it looks like this:
sensor_id | event_type | value
-----------+------------+-----------
2 | 2 | 2
2 | 3 | 54
2 | 4 | -42
3 | 2 | 7
I cant get my head around it
try the t-sql code below:
SELECT e.sensor_id, e.event_type, e.value
FROM
(SELECT e.sensor_id, time = MAX(e.time)
FROM dbo.events e WITH(NOLOCK)
GROUP BY e.sensor_id, e.event_type) m
JOIN dbo.events e WITH(NOLOCK) ON e.sensor_id = m.sensor_id
AND e.time = m.time
ORDER BY e.sensor_id
how are your sorting the rows? can't seem to see the pattern.
EDIT: got the sorting now!
SELECT e.sensor_id, e.event_type, e.value
FROM
(SELECT e.sensor_id, time = MAX(e.time)
FROM dbo.events e WITH(NOLOCK)
GROUP BY e.sensor_id, e.event_type) m
JOIN dbo.events e WITH(NOLOCK) ON e.sensor_id = m.sensor_id
AND e.time = m.time
ORDER BY e.sensor_id, e.sensor_id + e.event_type
One way is to do:
SELECT value FROM events WHERE sensor_id=? AND event_type=? ORDER BY time DESC
Then the first one is the most recent.
Databases can sometimes limit you to just one, depending on the database you want.
I put a question mark where you would need to put a number.
SELECT
sensor_id, event_type, value
FROM
(
SELECT
ROW_NUMBER() OVER
(PARTITION BY sensor_id, event_type ORDER BY time DESC) AS Ordering,
*
FROM events
) data
WHERE
data.Ordering = 1

subtract data from single column

I have a database table with 2 columns naming piece and diff and type.
Here's what the table looks like
id | piece | diff | type
1 | 20 | NULL | cake
2 | 15 | NULL | cake
3 | 10 | NULL | cake
I want like 20 - 15 = 5 then 15 -10 = 5 , then so on so fort with type as where.
Result will be like this
id | piece | diff | type
1 | 20 | 0 | cake
2 | 15 | 5 | cake
3 | 10 | 5 | cake
Here's the code I have so far but i dont think I'm on the right track
SELECT
tableblabla.id,
(tableblabla.cast(pieces as decimal(7, 2)) - t.cast(pieces as decimal(7, 2))) as diff
FROM
tableblabla
INNER JOIN
tableblablaas t ON tableblabla.id = t.id + 1
Thanks for the help
Use LAG/LEAD window function.
Considering that you want to find Difference per type else remove Partition by from window functions
select id, piece,
Isnull(lag(piece)over(partition by type order by id) - piece,0) as Diff,
type
From yourtable
If you are using Sql Server prior to 2012 use this.
;WITH cte
AS (SELECT Row_number()OVER(partition by type ORDER BY id) RN,*
FROM Yourtable)
SELECT a.id,
a.piece,
Isnull(b.piece - a.piece, 0) AS diff,
a.type
FROM cte a
LEFT JOIN cte b
ON a.rn = b.rn + 1

SQL duration between two dates in different rows

I would really appreciate some assistance if somebody could help me construct a MSSQL Server 2000 query that would return the duration between a customer's A entry and their B entry.
Not all customers are expected to have a B record and so no results would be returned.
Customers Audit
+---+---------------+---+----------------------+
| 1 | Peter Griffin | A | 2013-01-01 15:00:00 |
| 2 | Martin Biggs | A | 2013-01-02 15:00:00 |
| 3 | Peter Griffin | C | 2013-01-05 09:00:00 |
| 4 | Super Mario | A | 2013-01-01 15:00:00 |
| 5 | Martin Biggs | B | 2013-01-03 18:00:00 |
+---+---------------+---+----------------------+
I'm hoping for results similar to:
+--------------+----------------+
| Martin Biggs | 1 day, 3 hours |
+--------------+----------------+
Something like the below (don't know your schema, so you'll need to change names of objects) should suffice.
SELECT ABS(DATEDIFF(HOUR, CA.TheDate, CB.TheDate)) AS HoursBetween
FROM dbo.Customers CA
INNER JOIN dbo.Customers CB
ON CB.Name = CA.Name
AND CB.Code = 'B'
WHERE CA.Code = 'A'
SELECT A.CUSTOMER, DATEDIFF(HOUR, A.ENTRY_DATE, B.ENTRY_DATE) DURATION
FROM CUSTOMERSAUDIT A, CUSTOMERSAUDIT B
WHERE B.CUSTOMER = A.CUSTOMER AND B.ENTRY_DATE > A.ENTRY_DATE
This is Oracle query but all features available in MS Server as far as I know. I'm sure I do not have to tell you how to concatenate the output to get desired result. All values in output will be in separate columns - days, hours, etc... And it is not always easy to format the output here:
SELECT id, name, grade
, NVL(EXTRACT(DAY FROM day_time_diff), 0) days
, NVL(EXTRACT(HOUR FROM day_time_diff), 0) hours
, NVL(EXTRACT(MINUTE FROM day_time_diff), 0) minutes
, NVL(EXTRACT(SECOND FROM day_time_diff), 0) seconds
FROM
(
SELECT id, name, grade
, (begin_date-end_date) day_time_diff
FROM
(
SELECT id, name, grade
, CAST(start_date AS TIMESTAMP) begin_date
, CAST(end_date AS TIMESTAMP) end_date
FROM
(
SELECT id, name, grade, start_date
, LAG(start_date, 1, to_date(null)) OVER (ORDER BY id) end_date
FROM stack_test
)
)
)
/
Output:
ID NAME GRADE DAYS HOURS MINUTES SECONDS
------------------------------------------------------------
1 Peter Griffin A 0 0 0 0
2 Martin Biggs A 1 1 0 0
3 Peter Griffin C 2 17 0 0
4 Super Mario A -3 -18 0 0
5 Martin Biggs A 2 3 0 0
The table structure/columns I used - it would be great if you took care of this and data in advance:
CREATE TABLE stack_test
(
id NUMBER
,name VARCHAR2(50)
,grade VARCHAR2(3)
,start_date DATE
)
/