Records 1 Hour Before and 1 Hour After - sql

I have a table with a set of trouble ticket data. I would like to write a query that selects all the records in this table that have occurred from 1 hour before to 1 hour after a particular record was inserted.
Example:
Error "xyz" occurred at 2018-01-03 15:30:06.000
I would like to return EVERY trouble ticket that was created between 14:30:06.000 and 16:30:06.000 on 2018-01-03. I would like this to happen for all occurrences of that error since the beginning of the year.
Is this possible?
This is what I have, considering the example provided above. I'm still only returning the results in the temp table, not the records from 1 hour before to 1 hour after each of those matches in the original table.
select * into #temp
from Incident
where INCIDENT_REPORTED_DATE_TIME > '01/01/2018'
and SUMMARY like '%error%'
select i.*
from Incident i
join #temp t on i.INCIDENT_ID = t.INCIDENT_ID
where i.INCIDENT_REPORTED_DATE_TIME >= DATEADD(HH, -1, t.INCIDENT_REPORTED_DATE_TIME)
and i.INCIDENT_REPORTED_DATE_TIME < DATEADD(HH, 1, t.INCIDENT_REPORTED_DATE_TIME)
order by i.INCIDENT_REPORTED_DATE_TIME

Here is an ANSI SQL approach:
select t.*
from t join
t t2
on t2.col = 'xyz' and
t.created >= t2.created - interval '1 hour' and
t.created <= t2.created + interval '1 hour'
order by t.created;
Note that the exact syntax varies by database (which isn't specified as I write this). But this idea should work in almost any database, with the right date/time functions.
EDIT:
In SQL Server, this looks like:
select t.*
from t join
t t2
on t2.col = 'xyz' and
t.created >= dateadd(hour, -1, t2.created) and
t.created <= dateadd(hour, 1, t2.created)
order by t.created;
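Applied to the Incident table from the question, the same self-join removes the need for the temp table. A sketch, assuming the column names from the attempt above (INCIDENT_REPORTED_DATE_TIME, SUMMARY, INCIDENT_ID) and keeping the '%error%' filter as the marker for the trigger rows:
select i.*
from Incident i
join Incident e
  on e.SUMMARY like '%error%'
  and e.INCIDENT_REPORTED_DATE_TIME >= '2018-01-01'
  and i.INCIDENT_REPORTED_DATE_TIME >= dateadd(hour, -1, e.INCIDENT_REPORTED_DATE_TIME)
  and i.INCIDENT_REPORTED_DATE_TIME <= dateadd(hour, 1, e.INCIDENT_REPORTED_DATE_TIME)
order by i.INCIDENT_REPORTED_DATE_TIME;
Note that if two error occurrences are less than two hours apart, the same ticket can show up twice; a select distinct (or grouping on INCIDENT_ID) would collapse those duplicates.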

Postgresql left join date_trunc with default values

I have 3 tables which I'm querying to get the data based on different conditions. I have from and to params and these are the ones I'm using to create a range of time in which I'm looking for the data in those tables.
For instance, if from equals '2020-07-01' and to equals '2020-08-01', I expect to receive the row values of the tables grouped by week. If a given week has no records I want to return 0, and if several tables have records for the same week I'd like to sum them.
Currently I have this:
SELECT d.day, COALESCE(t.total, 0)
FROM (
SELECT day::date
FROM generate_series(timestamp '2020-07-01',
timestamp '2020-08-01',
interval '1 week') day
) d
LEFT JOIN (
SELECT date AS day,
SUM(total) AS total
FROM table1
WHERE id = '1'
AND date BETWEEN '2020-07-01' AND '2020-08-01'
GROUP BY day
) t USING (day)
ORDER BY d.day;
I'm generating a series of dates grouped by week, and on top of that I'm adding a left join. Now for some reason it only works if the dates match exactly; otherwise COALESCE(t.total, 0) returns 0 even if the SUM(total) for that week is not 0.
In the same way as this LEFT JOIN, I'm using other left joins with other tables in the same query, so I'm running into the same problem there.
Please see if this works for you. Whenever you find yourself aggregating more than once, ask yourself whether it is necessary.
Rather than try to match on discrete days, use time ranges.
with limits as (
select '2020-07-01'::timestamp as dt_start,
'2020-08-01'::timestamp as dt_end
), weeks as (
SELECT x.day::date as day, least(x.day::date + 7, dt_end::date) as day_end
FROM limits l
CROSS JOIN LATERAL
generate_series(l.dt_start, l.dt_end, interval '1 week') as x(day)
WHERE x.day::date != least(x.day::date + 7, dt_end::date)
), t1 as (
select w.day,
sum(coalesce(t.total, 0)) as t1total
from weeks w
left join table1 t
on t.id = 1
and t.date >= w.day
and t.date < w.day_end
group by w.day
), t2 as (
select w.day,
sum(coalesce(t.sum_measure, 0)) as t2total
from weeks w
left join table2 t
on t.something = 'whatever'
and t.date >= w.day
and t.date < w.day_end
group by w.day
)
select t1.day,
t1.t1total,
t2.t2total
from t1
join t2 on t2.day = t1.day;
You can keep adding tables like that with CTEs.
My earlier example with multiple left joins was bad because it blows out the row count due to a lack of join conditions between the left-joined tables.
There is an interesting corner case for e.g. 2019-02-01 to 2019-03-01 which returns an empty interval as the last week. I have updated to filter that out.
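To see that corner case, a quick check of the weeks CTE is enough (a sketch; the February dates are chosen purely as an illustration):
-- For 2019-02-01 .. 2019-03-01 generate_series yields 2019-02-01, 2019-02-08,
-- 2019-02-15, 2019-02-22 and 2019-03-01; the last row would be a zero-length
-- week (day = day_end), which the WHERE clause in the weeks CTE filters out.
select x.day::date as day,
       least(x.day::date + 7, date '2019-03-01') as day_end
from generate_series(timestamp '2019-02-01',
                     timestamp '2019-03-01',
                     interval '1 week') as x(day);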

Sampling issue with query in BigQuery (Standard SQL)

I have been running a query of the format below
SELECT b.date as Date,COUNT(DISTINCT user_id) AS NewUsers FROM (
SELECT user_id,MIN(date) as min_date
FROM tableA
WHERE date >= '2018-10-10'
AND filter1 = "XYZ"
GROUP BY user_id) a
CROSS JOIN (
SELECT date FROM tableB
WHERE date >= '2018-10-19' AND date <= CURRENT_DATE()
GROUP BY 1) b
WHERE a.min_date >= DATE_SUB(b.date, INTERVAL 6 DAY) AND a.min_date <= b.date
GROUP BY 1
Let's say the above is result1
SELECT b.date as Date,COUNT(DISTINCT user_id) AS NewUsers FROM (
SELECT user_id,MIN(date) as min_date
FROM tableA
WHERE date >= '2018-07-10'
AND filter1 = "XYZ"
GROUP BY user_id) a
CROSS JOIN (
SELECT date FROM tableB
WHERE date >= '2018-07-19' AND date <= CURRENT_DATE()
GROUP BY 1) b
WHERE a.min_date >= DATE_SUB(b.date, INTERVAL 6 DAY) AND a.min_date <= b.date
GROUP BY 1
The above is result2
Here 2018-07-19 is the launch date.
Since I only need the results from 2018-10-19 onwards, I want to run the query from the later date to optimize the cost and the amount of data the query scans, but somehow I am getting incorrect data.
But, if I run the same query from the launch date, I am getting the correct results.
I mean the NewUsers from result1 for the corresponding dates (like date >= 2018-10-19) are more than the NewUsers from result2.
Not sure where I am missing something.
Any help would be greatly appreciated.
Thanks
I think it is because of the use of MIN(date). You see a shift in the counts because you restricted the dates, so users who were actually first seen on earlier dates now get their first-seen date inside the restricted range; those same "old" users are then counted as new for recent days, hence the confusion.
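If the goal is correct counts for the recent dates only, one sketch (table and column names taken from the question) is to anchor the MIN(date) scan at the launch date, so first-seen dates cover the full history, and restrict only the reporting dates coming from tableB:
SELECT b.date AS Date, COUNT(DISTINCT user_id) AS NewUsers FROM (
  SELECT user_id, MIN(date) AS min_date
  FROM tableA
  WHERE date >= '2018-07-19'   -- launch date: first-seen must cover the full history
  AND filter1 = "XYZ"
  GROUP BY user_id) a
CROSS JOIN (
  SELECT date FROM tableB
  WHERE date >= '2018-10-19' AND date <= CURRENT_DATE()   -- only report recent dates
  GROUP BY 1) b
WHERE a.min_date >= DATE_SUB(b.date, INTERVAL 6 DAY) AND a.min_date <= b.date
GROUP BY 1
This still scans tableA from the launch date, which is unavoidable if "new" is defined against all history; the date restriction on tableB, however, is safe.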

Query to check number of records created in a month.

My table creates a new record with a timestamp daily when an integration is successful. I am trying to create a query (preferably automated) that compares the number of days in a month with the number of records in the table within that time frame.
For example, January has 31 days, so I would like to know how many days in January my process was not successful. If the number of records is less than 31, then I know the job failed 31 - x times.
I tried the following but was not getting very far:
SELECT COUNT (DISTINCT CompleteDate)
FROM table
WHERE CompleteDate BETWEEN '01/01/2015' AND '01/31/2015'
Every 7 days the system executes the job twice, so I get two records on the same day, but I am trying to determine the number of days on which nothing happened (failures), so I assume some truncation of the date field is needed?!
One way to do this is to use a calendar/date table as the main source of dates in the range, left join your table to it, and count the rows where the join found no match (null values).
In absence of a proper date table you can generate a range of dates using a number sequence like the one found in the master..spt_values table:
select count(*) failed
from (
select dateadd(day, number, '2015-01-01') date
from master..spt_values where type='P' and number < 365
) a
left join your_table b on a.date = b.CompleteDate
where b.CompleteDate is null
and a.date BETWEEN '01/01/2015' AND '01/31/2015'
Sample SQL Fiddle (with count grouped by month)
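For reference, the per-month grouping from that fiddle might look like this (a sketch; the 365-day series and your_table are the same assumptions as above, and CompleteDate is assumed to hold dates without a time component):
select datepart(year, a.date) as yr,
       datepart(month, a.date) as mon,
       count(case when b.CompleteDate is null then 1 end) as failed
from (
  select dateadd(day, number, '2015-01-01') date
  from master..spt_values where type='P' and number < 365
) a
left join your_table b on a.date = b.CompleteDate
group by datepart(year, a.date), datepart(month, a.date)
order by yr, mon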
Assuming you have an Integers table*. This query will pull all dates where no record is found in the target table:
declare #StartDate datetime = '01/01/2013',
#EndDate datetime = '12/31/2013'
;with d as (
select *, date = dateadd(d, i - 1 , #StartDate)
from dbo.Integers
where i <= datediff(d, #StartDate, #EndDate) + 1
)
select d.date
from d
where not exists (
select 1 from <target> t
where DATEADD(dd, DATEDIFF(dd, 0, t.<timestamp>), 0) = DATEADD(dd, DATEDIFF(dd, 0, d.date), 0)
)
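* If you don't have an Integers table yet, a minimal sketch for building one (the name dbo.Integers and the 100,000-row cap are arbitrary):
-- One column of integers 1..100000, generated from a system catalog cross join
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
INTO dbo.Integers
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;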
BETWEEN is not safe here: if CompleteDate has a time component, BETWEEN '01/01/2015' AND '01/31/2015' misses anything after midnight on January 31. Use an inclusive lower bound and an exclusive upper bound instead:
SELECT 31 - count(distinct(convert(date, CompleteDate)))
FROM table
WHERE CompleteDate >= '01/01/2015' AND CompleteDate < '02/01/2015'
You can use the following query:
SELECT DATEDIFF(day, t.d, dateadd(month, 1, t.d)) - COUNT(DISTINCT CompleteDate)
FROM mytable
CROSS APPLY (SELECT CAST(YEAR(CompleteDate) AS VARCHAR(4)) +
RIGHT('0' + CAST(MONTH(CompleteDate) AS VARCHAR(2)), 2) +
'01') t(d)
GROUP BY t.d
SQL Fiddle Demo
Explanation:
The value produced by CROSS APPLY, i.e. t.d, is the ANSI date string of the first day of the month that CompleteDate belongs to, e.g. '20150101' for 12/01/2015 or 18/01/2015.
DATEDIFF uses the above mentioned value, i.e. t.d, in order to calculate the number of days of the month that CompleteDate belongs to.
GROUP BY essentially groups by (Year, Month), hence COUNT(DISTINCT CompleteDate) returns the number of distinct CompleteDate values per month.
The values returned by the query are the differences between the two, i.e. the number of days in the month minus the number of distinct dates with records, which is the number of failures per month, for each (Year, Month) of your initial data.
If you want to query a specific Year, Month then just simply add a WHERE clause to the above:
WHERE YEAR(CompleteDate) = 2015 AND MONTH(CompleteDate) = 1
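Putting the pieces together for the single month in the question, the filtered version would be (the same query as above, only with the WHERE clause added):
SELECT DATEDIFF(day, t.d, DATEADD(month, 1, t.d)) - COUNT(DISTINCT CompleteDate) AS Failures
FROM mytable
CROSS APPLY (SELECT CAST(YEAR(CompleteDate) AS VARCHAR(4)) +
                    RIGHT('0' + CAST(MONTH(CompleteDate) AS VARCHAR(2)), 2) +
                    '01') t(d)
WHERE YEAR(CompleteDate) = 2015 AND MONTH(CompleteDate) = 1
GROUP BY t.d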

Return records between now and previous week

I have the following SQL query in oracle:
SELECT * FROM
(
SELECT s.singleid,s.titel,a.naam,s.taal,SUM(b.aantal) AS "AANTAL VERKOCHT"
FROM singles s
JOIN artiesten a on a.artiestid = s.artiestid
JOIN bestellingen b on b.singleid = s.singleid
GROUP BY s.singleid,s.titel,a.naam,s.taal,b.datum
ORDER BY sum(b.aantal) DESC
)
WHERE ROWNUM <= 5
This works, but I need to return only the records where b.datum is between now and one week ago.
How do I do this?
You should be able to add a BETWEEN clause to the inner query:
WHERE b.datum BETWEEN SYSDATE - 7 AND SYSDATE
You would want to extract from the BESTELLINGEN table only those rows where [datum] is greater than or equal to (today at midnight minus 7 days) and less than or equal to 'now' (or less than tomorrow at midnight, according to your requirement). I would probably make this set of Bestellingen rows an inline view and join the other tables to it, and then do your grouping.
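A sketch of that inline-view version against the tables in the question (TRUNC(SYSDATE) - 7 is "midnight seven days ago"; adjust the upper bound to SYSDATE or to tomorrow at midnight as needed):
SELECT * FROM
(
  SELECT s.singleid, s.titel, a.naam, s.taal, SUM(b.aantal) AS "AANTAL VERKOCHT"
  FROM (SELECT singleid, aantal
        FROM bestellingen
        WHERE datum >= TRUNC(SYSDATE) - 7
          AND datum <= SYSDATE) b
  JOIN singles s ON s.singleid = b.singleid
  JOIN artiesten a ON a.artiestid = s.artiestid
  GROUP BY s.singleid, s.titel, a.naam, s.taal
  ORDER BY SUM(b.aantal) DESC
)
WHERE ROWNUM <= 5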
In SQL Server the syntax is:
AND (b.datum > DATEADD(week, -1, GetDate()) and b.datum < GetDate())
I would assume the syntax is the same, or very similar, in Oracle.
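Oracle has no DATEADD, but the equivalent filter can be written with plain date arithmetic (assuming datum is a DATE column):
AND b.datum > SYSDATE - 7 AND b.datum < SYSDATE
-- or, using interval arithmetic:
AND b.datum > SYSDATE - INTERVAL '7' DAY AND b.datum < SYSDATE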

SQL - alert if there is a new unique record inserted in the last hour

I am trying to find an elegant solution in the form of a SQL query for the following problem.
New records will be inserted in the Log table.
I need to detect any new records (inserted in the last hour) that I haven't seen before and generate an alert (e.g. # of these records > 0)
ID, Url, DOB
1, site1.com/page1, "5/06/2012 20:01"
2, site2.com/page2, "5/06/2012 21:20"
3, site1.com/page1, "6/06/2012 10:05"
If "now" is 6/06/2012 10:40 - I see that there was 1 new record (id=3) inserted but I don't want to generate an alert because we have seen this URL before (id=1).
if we have
4, site3.com/pageX, "6/06/2012 10:08"
then I want to generate an alert (return count=1) because this row was inserted in the last hour and we haven't seen it before.
What is the best way to implement it? Ideally without nested queries.
I think this is what you are after. This will retrieve new entries from the last hour (where "new" means the same URL had not been visited until the last hour):
SELECT *
FROM Log
WHERE DOB > DATEADD(HOUR, -1, CURRENT_TIMESTAMP)
AND NOT EXISTS
( SELECT 1
FROM Log T1
WHERE T1.URL = Log.URL
AND T1.DOB < DATEADD(HOUR, -1, CURRENT_TIMESTAMP)
)
Working example on SQL Fiddle
EDIT
Just seen a comment that you only need a count:
SELECT COUNT(*)
FROM Log
WHERE DOB > DATEADD(HOUR, -1, CURRENT_TIMESTAMP)
AND NOT EXISTS
( SELECT 1
FROM Log T1
WHERE T1.URL = Log.URL
AND T1.DOB < DATEADD(HOUR, -1, CURRENT_TIMESTAMP)
)
EDIT 2
I am not sure why there is a requirement for only a single select; however, the closest I can get to a single select is this:
SELECT COUNT(*)
FROM ( SELECT *, MIN(DOB) OVER(PARTITION BY URL) [FirstViewed]
FROM Log
) Log
WHERE FirstViewed >= DATEADD(HOUR, -1, CURRENT_TIMESTAMP)
This will still return 2 if the same page has been visited twice in the last hour.
http://sqlfiddle.com/#!3/5a8bc/1
This one takes an alternative approach: first find the unique URLs by grouping, then extract those from the last hour.
SELECT x1.*
FROM
(SELECT URL,
COUNT(ID) AS urlcount,
MAX(DOB) AS uniqueurl
FROM Log
GROUP BY URL HAVING count(ID) = 1
OR MIN(DOB) > dateadd(HOUR ,-1 , CURRENT_TIMESTAMP)) AS x1
WHERE x1.uniqueurl > dateadd(HOUR ,-1 , CURRENT_TIMESTAMP);
http://sqlfiddle.com/#!3/250e0/45/0
I cannot tell whether this has acceptable performance without looking at an execution plan, but I think the sort operation involved in the GROUP BY could be a bottleneck.
Without nested query (SQLFiddle):
SELECT COUNT(DISTINCT T0.URL)
FROM Log AS T0
LEFT OUTER JOIN Log AS T1 ON
T1.URL = T0.URL
AND T1.DOB < DATEADD(HOUR, -1, CURRENT_TIMESTAMP)
WHERE
T0.DOB > DATEADD(HOUR, -1, CURRENT_TIMESTAMP)
AND T1.ID IS NULL
But it really is the same solution as GarethD's, performance-wise.
Try this:
SELECT DISTINCT a.id, a.url, a.dob
FROM Log a JOIN Log b ON (a.url = b.url)
WHERE UNIX_TIMESTAMP(NOW())-UNIX_TIMESTAMP(a.DOB)<=3600
AND UNIX_TIMESTAMP(NOW())-UNIX_TIMESTAMP(b.DOB)>3600;
It should return all the records that follow the pattern you specified in the question.
Observe that I use UNIX_TIMESTAMP to translate the dates to seconds, so the subtraction returns a time difference expressed as a number of seconds, and the comparison must be made against 3600 seconds.
EDIT:
The query has been corrected. But it's for MySQL (I didn't see the sql-server-2005 tag).
select distinct(a.url) from tbl a, tbl b where a.dob>(now-hour) and b.dob<=(now-hour) and a.url=b.url;
(Replace the time manipulation with whatever functions your database of choice provides, and index the URL and DOB columns.)
Also, hope that your database is sensible enough to do the DOB comparison before the join, and to join using indexes.