How to find the average time difference between rows in a table? - sql

I have a MySQL database that stores some timestamps. Assume the table contains only an ID and a timestamp. The timestamps might be duplicated.
I want to find the average time difference between consecutive rows that are not duplicates (time-wise). Is there a way to do it in SQL?

If your table is t, and your timestamp column is ts, and you want the answer in seconds:
SELECT TIMESTAMPDIFF(SECOND, MIN(ts), MAX(ts) )
/
(COUNT(DISTINCT(ts)) -1)
FROM t
This will be much quicker on large tables because it avoids an n-squared JOIN.
This uses a neat mathematical trick that helps with this problem. Ignore the problem of duplicates for the moment. The average time difference between consecutive rows is the difference between the first timestamp and the last timestamp, divided by the number of rows minus 1.
Proof: The average distance between consecutive rows is the sum of the distances between consecutive rows, divided by the number of consecutive pairs. But the sum of the differences between consecutive rows is just the distance between the first row and the last row (assuming they are sorted by timestamp). And the number of consecutive pairs is the total number of rows minus 1.
Then we just condition the timestamps to be distinct.
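A quick way to convince yourself of the trick is to compute the average both ways outside the database. The Python sketch below is only an illustration: the sample timestamps and the function names are made up, and it assumes at least two distinct timestamps (otherwise the divisor is zero).

```python
from datetime import datetime, timedelta

def average_gap_pairwise(timestamps):
    """Average gap between consecutive distinct timestamps, summed pair by pair."""
    ordered = sorted(set(timestamps))
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)

def average_gap_shortcut(timestamps):
    """Same value via (max - min) / (distinct count - 1)."""
    distinct = set(timestamps)
    return (max(distinct) - min(distinct)) / (len(distinct) - 1)

# Hypothetical sample data; the 10:00:10 reading is duplicated on purpose.
sample = [
    datetime(2024, 1, 1, 10, 0, 0),
    datetime(2024, 1, 1, 10, 0, 10),
    datetime(2024, 1, 1, 10, 0, 10),
    datetime(2024, 1, 1, 10, 0, 40),
    datetime(2024, 1, 1, 10, 1, 40),
]
print(average_gap_pairwise(sample))
print(average_gap_shortcut(sample))
```

Both calls print the same duration, because the pairwise gaps telescope to the distance between the first and last timestamp.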

Are the IDs contiguous?
You could do something like,
SELECT
a.ID
, b.ID
, a.Timestamp
, b.Timestamp
, TIMESTAMPDIFF(SECOND, b.Timestamp, a.Timestamp) as Difference
FROM
MyTable a
JOIN MyTable b
ON a.ID = b.ID + 1 AND a.Timestamp <> b.Timestamp
That'll give you a list of time differences on each consecutive row pair...
Then you could wrap that up in an AVG grouping...

Here's one way:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on cur.id = prev.id + 1
and cur.datecol <> prev.datecol
The timestampdiff function allows you to choose between days, months, seconds, and so on.
If the ids are not consecutive, you can select the previous row by adding a rule that there are no other rows in between:
select avg(timestampdiff(MINUTE,prev.datecol,cur.datecol))
from table cur
inner join table prev
on prev.datecol < cur.datecol
and not exists (
select *
from table inbetween
where prev.datecol < inbetween.datecol
and inbetween.datecol < cur.datecol
)
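The "no rows in between" join can be tried out with Python's built-in sqlite3 module. This is a rough sketch under assumptions: the table name readings, the integer ts column (epoch seconds) and the sample rows are invented, and SQLite stands in for MySQL. It joins over distinct timestamps, because a duplicated timestamp would otherwise contribute its neighbouring gaps more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, ts INTEGER)")
# Hypothetical data: non-contiguous ids, ts in epoch seconds, one duplicated ts.
conn.executemany(
    "INSERT INTO readings (id, ts) VALUES (?, ?)",
    [(1, 100), (4, 110), (5, 110), (9, 140), (12, 200)],
)

query = """
SELECT AVG(cur.ts - prev.ts)
FROM (SELECT DISTINCT ts FROM readings) AS cur
JOIN (SELECT DISTINCT ts FROM readings) AS prev
  ON prev.ts < cur.ts
 AND NOT EXISTS (
       SELECT 1
       FROM readings AS inbetween
       WHERE prev.ts < inbetween.ts
         AND inbetween.ts < cur.ts)
"""
# Each row pairs a timestamp with its closest earlier distinct timestamp.
average_gap = conn.execute(query).fetchone()[0]
print(average_gap)
```

For this sample the distinct gaps are 10, 30 and 60 seconds, so the average is 100 / 3.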

Old post, but...
The easiest way is to use the LAG window function (available in MySQL 8.0 and later) together with TIMESTAMPDIFF:
SELECT
id,
TIMESTAMPDIFF(MINUTE, PREVIOUS_TIMESTAMP, TIMESTAMP) AS TIME_DIFF_IN_MINUTES
FROM (
SELECT
id,
TIMESTAMP,
LAG(TIMESTAMP, 1) OVER (ORDER BY TIMESTAMP) AS PREVIOUS_TIMESTAMP
FROM TABLE_NAME
) AS t

Adapted for SQL Server from this discussion.
Essential columns used are:
cmis_load_date: A date/time stamp associated with each record.
extract_file: The full path to a file from which the record was loaded.
Comments:
There can be many records in each file. Records have to be grouped by the files loaded on the extract_file column. Intervals of days may pass between one file and the next being loaded. There is no reliable sequential value in any column, so the grouped rows are sorted by the minimum load date in each file group, and the ROW_NUMBER() function then serves as an ad hoc sequential value.
SELECT
AVG(DATEDIFF(day, t2.MinCMISLoadDate, t1.MinCMISLoadDate)) as ElapsedAvg
FROM
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t1
LEFT JOIN
(
SELECT
ROW_NUMBER() OVER (ORDER BY MIN(cmis_load_date)) as RowNumber,
MIN(cmis_load_date) as MinCMISLoadDate,
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END as ExtractFile
FROM
TrafTabRecordsHistory
WHERE
court_id = 17
and
cmis_load_date >= '2019-09-01'
GROUP BY
CASE WHEN NOT CHARINDEX('\', extract_file) > 0 THEN '' ELSE RIGHT(extract_file, CHARINDEX('\', REVERSE(extract_file)) - 1) END
) t2 on t2.RowNumber + 1 = t1.RowNumber

Related

SQL: Filter records based on record creation date and other criteria

I am struggling to find a better solution to pick unique records from my user call data table.
My table structure is as follows:
SELECT [MarketName],
[WebsiteName] ,
[ID] ,
[UserID],
[CreationDate],
[CallDuration],
[FromPhone] ,
[ToPhone],
[IsAnswered],
[Source]
FROM [dbo].[UserCallData]
This table contains multiple entries, some sharing the same ID and some not. I want to check whether a [FromPhone] and [ToPhone] pair occurs more than once within the last 3 months. If it does, I want to pick the first record (by [CreationDate]) with all of its columns, count the occurrences as TotalCallCount, and sum the call durations as TotalCallDuration, all in a single record. If the pair does not occur more than once, I want to pick all columns as they are. I have put together the partial query below, but it cannot return all columns without adding them to the GROUP BY clause, and it does not satisfy all of my criteria. Any help would be highly appreciated.
select [FromPhone],
MIN([CreationDate]),
[ToPhone],
marketname,
count(*) as TotalCallCount ,
sum(CallDuration) as TotalCallDuration
from [dbo].[UserCallData]
where [CreationDate] >= DATEADD(MONTH, -3, GETDATE())
group by [FromPhone],[ToPhone], marketname
having count([FromPhone]) > 1 and count([ToPhone]) >1
Try to use ROW_NUMBER()
;with cte as
(
select *, ROW_NUMBER() OVER(PARTITION BY FromPhone, ToPhone ORDER BY CreationDate) as RN
from UserCallData
where CreationDate >= DATEADD(MONTH, -3, GETDATE())
),
cte_totals as
(
select C1.FromPhone, C1.ToPhone, COUNT(*) as TotalCallCount, SUM(CallDuration) as TotalCallDuration
from cte C1
where exists(select * from cte C2 where C1.FromPhone = C2.FromPhone and C1.ToPhone = C2.ToPhone and C2.RN > 1)
group by C1.FromPhone, C1.ToPhone
)
select C1.*, TotalCallCount, TotalCallDuration
from cte C1
inner join cte_totals C2 on C1.FromPhone = C2.FromPhone and C1.ToPhone = C2.ToPhone
where C1.RN = 1
I wrote this query directly here, so it may contain mistakes or typos, but the main idea should be clear.
I'm not entirely sure I've understood the question, but if I have, the following may be what you want (or be a useful starting point):
SELECT
ucd.FromPhone,
min(ucd.CreationDate) as MinCreationDate,
ucd.ToPhone,
ucd.MarketName,
count(*) as TotalCallCount,
sum(ucd.CallDuration) as TotalCallDuration,
case
when min(ucd.WebsiteName) = max(ucd.WebsiteName) then min(ucd.WebsiteName)
else '* Various'
end as WebsiteName,
case
when min(ucd.ID) = max(ucd.ID) then min(ucd.ID)
else '* Various'
end as ID,
case
when min(ucd.UserID) = max(ucd.UserID) then min(ucd.UserID)
else '* Various'
end as UserID,
case
when min(ucd.IsAnswered) = max(ucd.IsAnswered) then min(ucd.IsAnswered)
else '* Some'
end as IsAnswered,
case
when min(ucd.Source) = max(ucd.Source) then min(ucd.Source)
else '* Various'
end as Source
FROM
dbo.UserCallData ucd
WHERE
ucd.CreationDate >= DATEADD(MONTH, -3, GETDATE())
GROUP BY
ucd.FromPhone,
ucd.ToPhone,
ucd.MarketName
Where rows are collapsed together, if all the rows agree on a given column (so min(Field) = max(Field)), I return the min(Field) value (which is the same as all the others, and avoids needing additional "group by" columns that would interfere with the other cases). Where they don't all agree, I've returned "* something".
The code assumes that all the columns are text columns (you haven't said); if they aren't, you may get conversion errors. It also assumes that none of these fields are null. The code can be adapted if those assumptions aren't correct. If you aren't able to do that yourself, let me know about any issues and I'll be happy to do what I can.

ROW_NUMBER() Query Plan SORT Optimization

The query below accesses the Votes table that contains over 30 million rows. The result set is then selected from using WHERE n = 1. In the query plan, the SORT operation in the ROW_NUMBER() windowed function is 95% of the query's cost and it is taking over 6 minutes to complete execution.
I already have an index on same_voter, eid, country include vid, nid, sid, vote, time_stamp, new to cover the where clause.
Is the most efficient way to correct this to add an index on vid, nid, sid, new DESC, time_stamp DESC or is there an alternative to using the ROW_NUMBER() function for this to achieve the same results in a more efficient manner?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
ROW_NUMBER() OVER (
PARTITION BY v.vid, v.nid, v.sid ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
AND v.eid <= @EId
AND v.eid > (@EId - 5)
AND v.country = @Country
One possible alternative to using ROW_NUMBER():
SELECT
V.vid,
V.nid,
V.sid,
V.vote,
V.time_stamp,
V.new,
V.eid
FROM
dbo.Votes V
LEFT OUTER JOIN dbo.Votes V2 ON
V2.vid = V.vid AND
V2.nid = V.nid AND
V2.sid = V.sid AND
V2.same_voter <> 1 AND
V2.eid <= @EId AND
V2.eid > (@EId - 5) AND
V2.country = @Country AND
(V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
V.same_voter <> 1 AND
V.eid <= @EId AND
V.eid > (@EId - 5) AND
V.country = @Country AND
V2.vid IS NULL
The query basically says to get all rows matching your criteria, then join to any other rows that match the same criteria but would be ranked higher within the partition based on the new and time_stamp columns. If no such row is found, the current row must be the one you want (it's ranked highest), and in that case V2.vid will be NULL. I'm assuming that vid can otherwise never be NULL. If it's a NULLable column in your table then you'll need to adjust that last line of the query.
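The anti-join pattern is easier to see on a cut-down table. The sketch below uses Python's built-in sqlite3 module with an invented three-column votes table and made-up rows; it keeps, for each vid, the row with the latest time_stamp. It illustrates the pattern only and says nothing about how SQL Server would plan the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (vid INTEGER, vote INTEGER, time_stamp INTEGER)")
# Hypothetical rows: two votes each for vid 1 and 2, one vote for vid 3.
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [(1, 5, 10), (1, 3, 20), (2, 4, 15), (2, 2, 5), (3, 1, 7)],
)

query = """
SELECT v.vid, v.vote, v.time_stamp
FROM votes AS v
LEFT JOIN votes AS v2
  ON v2.vid = v.vid
 AND v2.time_stamp > v.time_stamp   -- v2 would outrank v
WHERE v2.vid IS NULL                -- keep v only if nothing outranks it
ORDER BY v.vid
"""
latest = conn.execute(query).fetchall()
print(latest)
```

If two rows in a group tie on the ranking columns, both survive the anti-join, whereas ROW_NUMBER() would pick exactly one.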

Overlapping Spans

I am trying to write a query that reorders date ranges around particular spans. It should do something that looks like this
Row  Rank  Begin Date  End Date
1    B     3/24/13     11/1/13
2    A     10/30/13    4/9/15
3    B     3/26/15     12/31/15
and have it become
Row  Rank  Begin Date  End Date
1    B     3/24/13     10/29/13
2    A     10/30/13    4/9/15
3    B     4/10/15     12/31/15
To explain further: row 2 is ranked higher (A > B), so the dates in rows 1 and 3 have to change around the dates in row 2 in order to avoid overlapping dates.
I am using SQL Server 2008 R2.
You can use the following query:
;WITH CTE AS (
SELECT Row, Rank, BeginDate, EndDate,
ROW_NUMBER() OVER (ORDER BY BeginDate) AS rn
FROM mytable
), ToUpdate AS (
SELECT c1.Row, c1.Rank, c1.BeginDate, c1.EndDate,
c2.Rank AS pRank, c2.EndDate AS pEndDate,
c3.Rank AS nRank, c3.BeginDate AS nBeginDate
FROM CTE AS c1
LEFT JOIN CTE AS c2 ON c1.rn = c2.rn + 1
LEFT JOIN CTE AS c3 ON c1.rn = c3.rn - 1
WHERE c1.Rank = 'B'
)
UPDATE ToUpdate
SET BeginDate = CASE
WHEN pEndDate IS NULL
THEN BeginDate
WHEN (pEndDate >= BeginDate) AND (pRank = 'A')
THEN DATEADD(d, 1, pEndDate)
ELSE BeginDate
END,
EndDate = CASE
WHEN nBeginDate IS NULL
THEN EndDate
WHEN (nBeginDate <= EndDate) AND (nRank = 'A')
THEN DATEADD(d, -1, nBeginDate)
ELSE EndDate
END
A CTE is first constructed to assign consecutive, ascending numbers to every record of your table. The ROW_NUMBER() window function is used for this purpose.
Using this CTE as a basis, we construct ToUpdate. This second CTE contains the date values of the current record as well as those of the previous and next records.
This LEFT JOIN:
LEFT JOIN CTE AS c2 ON c1.rn = c2.rn + 1
is used to join with the previous record, whereas this one:
LEFT JOIN CTE AS c3 ON c1.rn = c3.rn - 1
is used to join with the next record.
Using CASE expressions we can now easily identify overlaps, and, in case there is one, perform an update.
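The same previous/next comparison can be traced in plain Python on the three rows from the question. This is a sketch of the logic only, not of the UPDATE itself; the function name trim_overlaps and the tuple layout (rank, begin, end) are invented for the illustration.

```python
from datetime import date, timedelta

def trim_overlaps(spans):
    """spans: list of (rank, begin, end). Returns the spans sorted by begin,
    with every 'B' span trimmed so it no longer overlaps an adjacent 'A' span."""
    ordered = sorted(spans, key=lambda s: s[1])
    one_day = timedelta(days=1)
    result = []
    for i, (rank, begin, end) in enumerate(ordered):
        if rank == "B":
            prev = ordered[i - 1] if i > 0 else None
            nxt = ordered[i + 1] if i + 1 < len(ordered) else None
            # Previous span is an 'A' that runs into this one: start after it.
            if prev and prev[0] == "A" and prev[2] >= begin:
                begin = prev[2] + one_day
            # Next span is an 'A' that starts before this one ends: stop before it.
            if nxt and nxt[0] == "A" and nxt[1] <= end:
                end = nxt[1] - one_day
        result.append((rank, begin, end))
    return result

spans = [
    ("B", date(2013, 3, 24), date(2013, 11, 1)),
    ("A", date(2013, 10, 30), date(2015, 4, 9)),
    ("B", date(2015, 3, 26), date(2015, 12, 31)),
]
trimmed = trim_overlaps(spans)
for row in trimmed:
    print(row)
```

Row 1 ends on 10/29/13 and row 3 begins on 4/10/15, matching the desired output in the question.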
Please use the query below to update the table.
Update table_name
set End_Date = DATEADD(day, -1, (select Begin_Date from table_name where
row = 2))
where row = 1;
You need to change the row numbers every time you run the query. Let me know if this works for you.
I suggest first creating a view:
CREATE VIEW tempview AS
SELECT row, begin_date FROM table_name
WHERE row > 1;
Then use this query to update all the rows. It may not update just the first row.
Update table_name
set End_Date = DATEADD(day, -1, (select Begin_Date from tempview))
Hope this works.

troubles with next and previous query

I have a list and the returned table looks like this. I took the preview of only one car but there are many more.
What I need to do now is check that the current KM value is larger than the previous one and smaller than the next one. I need to add a field called Trustworthy and fill it with 1 or 0 (true/false) depending on whether this holds.
The result that I have so far is this:
validKMstand and validkmstand2 are how I calculate it. It did not work in one list so that is why I separated it.
In both of my tries my code does not work.
Here is the code that I have so far.
WITH FullList as (
SELECT
*
FROM
eMK_Mileage as Mileage
)
, ValidChecked1 as (
SELECT
UL1.*,
CASE WHEN EXISTS(
SELECT TOP(1)UL2.*
FROM FullList AS UL2
WHERE
UL2.FK_CarID = UL1.FK_CarID AND
UL1.KM_Date > UL2.KM_Date AND
UL1.KM > UL2.KM
ORDER BY UL2.KM_Date DESC
)
THEN 1
ELSE 0
END AS validkmstand
FROM FullList as UL1
)
, ValidChecked2 as (
SELECT
List1.*,
(CASE WHEN List1.KM > ulprev.KM
THEN 1
ELSE 0
END
) AS validkmstand2
FROM ValidChecked1 as List1 outer apply
(SELECT TOP(1)UL3.*
FROM ValidChecked1 AS UL3
WHERE
UL3.FK_CarID = List1.FK_CarID AND
UL3.KM_Date <= List1.KM_Date AND
List1.KM > UL3.KM
ORDER BY UL3.KM_Date DESC) ulprev
)
SELECT * FROM ValidChecked2 order by FK_CarID, KM_Date
Maybe something like this is what you are looking for?
;with data as
(
select *, rn = row_number() over (partition by fk_carid order by km_date)
from eMK_Mileage
)
select
d.FK_CarID, d.KM, d.KM_Date,
valid =
case
when (d.KM > d_prev.KM /* or d_prev.KM is null */)
and (d.KM < d_next.KM /* or d_next.KM is null */)
then 1 else 0
end
from data d
left join data d_prev on d.FK_CarID = d_prev.FK_CarID and d_prev.rn = d.rn - 1
left join data d_next on d.FK_CarID = d_next.FK_CarID and d_next.rn = d.rn + 1
order by d.FK_CarID, d.KM_Date
With SQL Server 2012 and later you could use the lag() and lead() analytic functions to access the previous/next rows, but in earlier versions you can accomplish the same thing by numbering rows within partitions of the set. There are other ways too, like using correlated subqueries.
I left a couple of conditions commented out that deal with the first and last rows for every car. Maybe those rows should be considered valid if they fulfill only one part of the comparison (since the previous/next rows are null)?
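For comparison, here is the same previous/next check in plain Python. It is a sketch under assumptions: the function name flag_readings, the tuple layout (car_id, km_date, km) and the sample readings are invented, dates are ISO strings so they sort correctly, and a missing neighbour counts as passing (the behaviour of the commented-out conditions).

```python
from itertools import groupby
from operator import itemgetter

def flag_readings(readings):
    """readings: iterable of (car_id, km_date, km).
    Returns (car_id, km_date, km, valid) tuples ordered by car and date."""
    flagged = []
    ordered = sorted(readings, key=itemgetter(0, 1))
    for car_id, group in groupby(ordered, key=itemgetter(0)):
        rows = list(group)
        for i, (_, km_date, km) in enumerate(rows):
            prev_km = rows[i - 1][2] if i > 0 else None
            next_km = rows[i + 1][2] if i + 1 < len(rows) else None
            # A missing neighbour (first or last reading) is treated as passing.
            ok_prev = prev_km is None or km > prev_km
            ok_next = next_km is None or km < next_km
            flagged.append((car_id, km_date, km, 1 if ok_prev and ok_next else 0))
    return flagged

# Hypothetical readings for one car; the 1500 and 1200 readings contradict each other.
sample = [
    (7, "2020-01-01", 1000),
    (7, "2020-02-01", 1500),
    (7, "2020-03-01", 1200),
    (7, "2020-04-01", 2000),
]
flags = [row[3] for row in flag_readings(sample)]
print(flags)
```

Note that one bad reading flags both itself and its neighbour, since each of the two fails its comparison against the other.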

detect gaps in integer sequence

Intention: detect whether a numeric sequence contains gaps. No need to identify the missing elements, just flag (true / false) the sequence if it contains gaps.
CREATE TABLE foo(x INTEGER);
INSERT INTO foo(x) VALUES (1), (2), (4);
Below is my (apparently correctly functioning) query to detect gaps:
WITH cte AS
(SELECT DISTINCT x FROM foo)
SELECT
( (SELECT COUNT(*) FROM cte a
CROSS JOIN cte b
WHERE b.x=a.x-1)
=(SELECT COUNT(*)-1 FROM cte))
OR (NOT EXISTS (SELECT 1 FROM cte))
where the OR is needed for the edge case where the table is empty. The query's logic is based on the observation that in a contiguous sequence the number of links equals the number of elements minus 1.
Is there anything more idiomatic or performant? (Should I be worried by the CROSS JOIN for particularly long sequences?)
Try this:
SELECT
CASE WHEN ((MAX(x)-MIN(x)+1 = COUNT(DISTINCT X)) OR
(COUNT(DISTINCT X) = 0) )
THEN 'TRUE'
ELSE 'FALSE'
END
FROM foo
The following should detect whether or not there are gaps:
select (case when max(x) - min(x) + 1 = count(distinct x)
then 'No Gaps'
else 'Some Gaps'
end)
from foo;
If there are no gaps or duplicates, then the number of distinct values of x is the max minus the min plus 1.
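The rule is easy to check on small inputs outside the database. A minimal Python sketch, with an invented function name, treating an empty sequence as gap-free (matching the empty-table edge case in the question):

```python
def has_gaps(values):
    """True if the distinct integers in values do not form a contiguous run."""
    distinct = set(values)
    if not distinct:
        return False  # empty input: nothing can be missing
    # A contiguous run from min to max holds exactly max - min + 1 distinct values.
    return max(distinct) - min(distinct) + 1 != len(distinct)

print(has_gaps([1, 2, 4]))     # 3 is missing
print(has_gaps([1, 2, 2, 3]))  # duplicates do not count as gaps
```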
A different approach...
If you subtract your min value from the max value, and add 1, you should equal the count.
if count = (max-min)+1 then "no gaps!"
If you can express that in SQL, it should be very efficient.
SELECT 'Has ' || count(*) - 1 || ' gaps.' AS gaps
FROM foo f1
LEFT JOIN foo f2 ON f2.x = f1.x + 1
WHERE f2.x IS NULL;
The trick is to count rows, where the next row is missing - which only happens for the last row(s) if there are no gaps.
If there are no rows, you get 'Has -1 gaps.'.
If there are no gaps, you get 'Has 0 gaps.'.
Else you get 'Has n gaps.' .. n being the exact number of gaps, no matter how big.
The count can be increased for duplicates, but 0 and -1 are immune to dupes.
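The three cases can be reproduced with Python's built-in sqlite3 module. This is a sketch with an assumed helper name, describe_gaps, run against the foo(x) table from the question. The subtraction is parenthesised because SQLite binds || more tightly than -, which would otherwise change the meaning of the expression.

```python
import sqlite3

def describe_gaps(values):
    """Load values into foo(x) and count rows whose successor x + 1 is missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo(x INTEGER)")
    conn.executemany("INSERT INTO foo(x) VALUES (?)", [(v,) for v in values])
    query = """
    SELECT 'Has ' || (COUNT(*) - 1) || ' gaps.' AS gaps
    FROM foo f1
    LEFT JOIN foo f2 ON f2.x = f1.x + 1
    WHERE f2.x IS NULL
    """
    message = conn.execute(query).fetchone()[0]
    conn.close()
    return message

print(describe_gaps([1, 2, 4]))
print(describe_gaps([1, 2, 3]))
print(describe_gaps([]))
```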