SQL Server - Select all top of the hour records

I have a large table with records created every second and want to select only those records that were created at the top of each hour for the last 2 months, so we would get 24 selected records for every day over the last 60 days.
The table structure is Dateandtime, Value1, Value2, etc.
Many thanks.

You could group on the date part (cast(dateandtime as date)) and the hour part (datepart(hh, dateandtime)). Then pick the minimum datetime for each hour and filter on that:
select *
from YourTable yt
join (
    select min(dateandtime) as dt
    from YourTable
    where datediff(day, dateandtime, getdate()) <= 60
    group by
        cast(dateandtime as date)
        , datepart(hh, dateandtime)
) filter
    on filter.dt = yt.dateandtime
Alternatively, you can group on a date format that only includes the date and the hour. For example, convert(varchar(13), getdate(), 120) returns 2013-05-11 18.
...
group by
convert(varchar(13), dateandtime, 120)
) filter
...
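To check the group-on-date-and-hour idea end to end, here is a minimal sketch that runs the same MIN-per-hour join in SQLite through Python's built-in sqlite3 module. It is a stand-in, not T-SQL: SQLite's strftime('%Y-%m-%d %H', ...) plays the role of convert(varchar(13), ..., 120), timestamps are assumed to be stored as ISO text, the 60-day filter is left out, and the table contents are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta


def first_per_hour(rows):
    """Return the earliest row in each (date, hour) bucket.

    rows: iterable of (timestamp_text, value) with timestamps formatted
    as 'YYYY-MM-DD HH:MM:SS', so text order equals time order.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (dateandtime TEXT, value1 INTEGER)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    # Group on 'YYYY-MM-DD HH', take MIN per group, then join back
    # to the table to fetch the full row for that timestamp.
    sql = """
        SELECT yt.dateandtime, yt.value1
        FROM YourTable AS yt
        JOIN (
            SELECT MIN(dateandtime) AS dt
            FROM YourTable
            GROUP BY strftime('%Y-%m-%d %H', dateandtime)
        ) AS hourly
          ON hourly.dt = yt.dateandtime
        ORDER BY yt.dateandtime
    """
    return con.execute(sql).fetchall()


# Hypothetical sample: one row every 20 minutes starting at 03:20.
start = datetime(2013, 5, 11, 3, 20)
sample = [((start + timedelta(minutes=20 * i)).strftime("%Y-%m-%d %H:%M:%S"), i)
          for i in range(8)]
print(first_per_hour(sample))
```

Note that the 03:00 hour returns its 03:20 row: the MIN approach picks the earliest record in each hour whether or not one exists exactly at :00:00.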

For clarity's sake, I would probably use a two-step, CTE-based approach (this works in SQL Server 2005 and newer - you didn't clearly specify which version of SQL Server you're using, so I'm just hoping you're not on an ancient version like 2000 anymore):
-- define a "base" CTE to get the day and hour components of your
-- "DateAndTime" column and make them accessible under their own names
;WITH BaseCTE AS
(
    SELECT
        ID, DateAndTime,
        Value1, Value2,
        DayPart = DATEADD(DAY, DATEDIFF(DAY, 0, DateAndTime), 0),
        HourPart = DATEPART(HOUR, DateAndTime)
    FROM dbo.YourTable
    WHERE DateAndTime >= @SomeThresholdDateHere
),
-- define a second CTE which "partitions" the data by day and "HourPart",
-- and numbers all rows in each partition starting at 1. So the "last"
-- event for each hour of each day is the one with RN = 1
HourlyCTE AS
(
    SELECT ID, DateAndTime, Value1, Value2,
        RN = ROW_NUMBER() OVER(PARTITION BY DayPart, HourPart ORDER BY DateAndTime DESC)
    FROM BaseCTE
)
SELECT *
FROM HourlyCTE
WHERE RN = 1
Also: I wasn't sure what exactly you mean by "top of the hour" - the row that's been created right at the beginning of each hour (e.g. at 04:00:00) - or rather the last row created in that hour's time span? If you mean the first one for each hour - then you'd need to change the ORDER BY DateAndTime DESC to ORDER BY DateAndTime ASC
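A small sketch of the ROW_NUMBER() idea, again in SQLite through Python's sqlite3 (window functions need SQLite 3.25 or newer). It partitions by calendar day and hour; the hypothetical `first` flag switches between ORDER BY ... ASC (first row of the hour) and DESC (last row). The table name and rows are made up for illustration.

```python
import sqlite3


def pick_per_hour(rows, first=True):
    """Return one row per (day, hour): the earliest if first=True, else the latest."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE YourTable (DateAndTime TEXT, Value1 INTEGER)")
    con.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    direction = "ASC" if first else "DESC"  # only these two fixed values are interpolated
    # Partition by calendar day AND hour; partitioning by the hour number
    # alone would merge the 04:xx rows of every day into a single group.
    sql = f"""
        WITH HourlyCTE AS (
            SELECT DateAndTime, Value1,
                   ROW_NUMBER() OVER (
                       PARTITION BY date(DateAndTime), strftime('%H', DateAndTime)
                       ORDER BY DateAndTime {direction}) AS RN
            FROM YourTable
        )
        SELECT DateAndTime, Value1 FROM HourlyCTE
        WHERE RN = 1
        ORDER BY DateAndTime
    """
    return con.execute(sql).fetchall()


rows = [
    ("2013-05-10 04:00:00", 1),
    ("2013-05-10 04:30:00", 2),
    ("2013-05-11 04:00:00", 3),
    ("2013-05-11 04:59:59", 4),
]
print(pick_per_hour(rows))               # first row of each hour
print(pick_per_hour(rows, first=False))  # last row of each hour
```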

You can use an option with the EXISTS operator. It keeps a row only if it is the latest one within its clock hour (use MIN instead of MAX to keep the earliest):
SELECT *
FROM dbo.tableName t
WHERE t.DateAndTime >= @YourDateCondition
    AND EXISTS (
        SELECT 1
        FROM dbo.tableName t2
        WHERE t2.Dateandtime >= DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime), 0)
            AND t2.Dateandtime < DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime) + 1, 0)
        HAVING MAX(t2.Dateandtime) = t.Dateandtime
    )
Or an option with the CROSS APPLY operator:
SELECT *
FROM dbo.tableName t CROSS APPLY (
    SELECT 1
    FROM dbo.tableName t2
    WHERE t2.Dateandtime >= DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime), 0)
        AND t2.Dateandtime < DATEADD(HOUR, DATEDIFF(HOUR, 0, t.Dateandtime) + 1, 0)
    HAVING MAX(t2.Dateandtime) = t.Dateandtime
) o(IsMatch)
WHERE t.DateAndTime >= @YourDateCondition
To improve performance, use this index:
CREATE INDEX x ON dbo.tableName(DateAndTime) INCLUDE(Value1, Value2)

Try:
select * from mytable
where datepart(mi, dateandtime)=0 and
datepart(ss, dateandtime)=0 and
datediff(d, dateandtime, getdate()) <=60
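The exact-match filter only returns an hour if a row was written at exactly :00:00. The sketch below (SQLite via Python's sqlite3; strftime('%M') and strftime('%S') stand in for datepart(mi, ...) and datepart(ss, ...); the rows are hypothetical) shows an hour being silently skipped when its :00:00 sample is missing.

```python
import sqlite3


def exact_top_of_hour(rows):
    """Return only rows whose minute and second are both zero."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (dateandtime TEXT, value1 INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    sql = """
        SELECT dateandtime, value1 FROM mytable
        WHERE strftime('%M', dateandtime) = '00'
          AND strftime('%S', dateandtime) = '00'
        ORDER BY dateandtime
    """
    return con.execute(sql).fetchall()


rows = [
    ("2013-05-11 03:00:00", 1),
    ("2013-05-11 03:00:01", 2),
    ("2013-05-11 04:00:01", 3),  # the 04:00:00 sample is missing
    ("2013-05-11 05:00:00", 4),
]
print(exact_top_of_hour(rows))
```

If the data really has one row per second with no gaps, this is fine; otherwise the per-hour MIN or ROW_NUMBER() answers are safer.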

You can use window functions for this:
select dateandtime, val1, val2, . . .
from (select t.*,
             row_number() over (partition by cast(dateandtime as date), datepart(hour, dateandtime)
                                order by dateandtime
                               ) as seqnum
      from t
     ) t
where seqnum = 1
The function row_number() assigns a sequential number to each group defined by the partition clause -- in this case each hour of each day. Within this group, it orders by the dateandtime value, so the one closest to the top of the hour gets a value of 1. The outer query just selects this one record for each group.
You may need an additional filter clause to get records in the last 60 days. Use this in the subquery:
where dateandtime >= getdate() - 60

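This helped me get the top of the hour: anything that ends in ":00:00". (For a datetime column, use CONVERT with style 120; a plain CAST to VARCHAR uses the default style, which omits the seconds.)
WHERE CONVERT(VARCHAR(19), [DATETIME], 120) LIKE '%:00:00'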

Related

Recursive CTE in Amazon Redshift

We are trying to port some code to run on Amazon Redshift, but Redshift won't run the recursive CTE. Does anyone know how to port this?
with recursive tt as (
    select t.*, row_number() over (partition by id order by time) as seqnum
    from t
),
cte as (
    select tt.*, time as grp_start
    from tt
    where seqnum = 1
    union all
    select tt.*,
           (case when tt.time < cte.grp_start + interval '3 second'
                 then cte.grp_start
                 else tt.time
            end)
    from cte join
         tt
         on tt.id = cte.id and tt.seqnum = cte.seqnum + 1
)
select cte.*,
       (case when grp_start = lag(grp_start) over (partition by id order by time)
             then 0 else 1
        end) as isValid
from cte;
Alternatively, different code that reproduces the logic below would also work.
The result is a binary flag:
it is 1 if it is the first known value of an ID;
it is 1 if it is 3 seconds or more after the previous "1" of that ID;
it is 0 if it is less than 3 seconds after the previous "1" of that ID.
Note 1: this is not the difference in seconds from the previous record.
Note 2: there are many IDs in the data set.
Note 3: the original dataset has ID and Date.
Desired output:
https://i.stack.imgur.com/k4KUQ.png
Dataset poc:
http://www.sqlfiddle.com/#!15/41d4b
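For reference, the rule above can be written as a plain loop, which makes it easy to check any SQL port against it. This is a minimal Python sketch assuming the rows arrive as hypothetical (id, time) pairs; it tracks the time of the last "1" per ID rather than the previous record, as Note 1 requires.

```python
from datetime import datetime, timedelta


def is_valid_flags(rows, window=timedelta(seconds=3)):
    """Flag each (id, time) row with 1 or 0.

    1: first row of an id, or at least `window` after the last row
       of that id that was flagged 1.
    0: otherwise (within `window` of the last flagged row).
    """
    last_flagged = {}  # id -> time of the most recent row flagged 1
    result = []
    for row_id, t in sorted(rows):  # sorts by id, then time
        start = last_flagged.get(row_id)
        if start is None or t >= start + window:
            last_flagged[row_id] = t
            result.append((row_id, t, 1))
        else:
            result.append((row_id, t, 0))
    return result


base = datetime(2018, 1, 1, 12, 0, 0)
sample = [(1, base + timedelta(seconds=s)) for s in (0, 1, 2, 3, 4, 6)]
for row_id, t, flag in is_valid_flags(sample):
    print(row_id, t.time(), flag)
```

With rows 1 to 2 seconds apart, comparing each row with the previous record would flag only the first row; comparing with the last flagged row gives 1, 0, 0, 1, 0, 1.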

As of this writing, Redshift does support recursive CTEs; see the Redshift documentation for the WITH clause.
Things to note when creating a recursive CTE in Redshift:
start the query with: with recursive
column names must be declared for all recursive CTEs
Consider the following example for creating a list of dates using recursive CTEs:
with recursive
    start_dt as (select current_date s_dt)
    , end_dt as (select dateadd(day, 1000, current_date) e_dt)
    -- the recursive cte, note declaration of the column `dt`
    , dates (dt) as (
        -- start at the start date
        select s_dt dt from start_dt
        union all
        -- recursive lines
        select dateadd(day, 1, dt)::date dt -- converted to date to avoid type mismatch
        from dates
        where dt < (select e_dt from end_dt) -- stop once the end date has been produced
    )
select *
from dates
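The stop condition in the recursive member decides whether the end date itself is the last row. A quick way to see this is the same date-series pattern in SQLite (WITH RECURSIVE with a declared column list, run through Python's sqlite3); date(dt, '+1 day') stands in for Redshift's dateadd, and the dates are arbitrary.

```python
import sqlite3


def date_series(start, end, inclusive_guard=False):
    """Generate 'YYYY-MM-DD' strings with a recursive CTE.

    inclusive_guard=False uses `dt < end` as the recursion guard;
    True uses `dt <= end`, which overshoots by one day.
    """
    op = "<=" if inclusive_guard else "<"
    sql = f"""
        WITH RECURSIVE dates(dt) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(dt, '+1 day') FROM dates
            WHERE dt {op} date(?)
        )
        SELECT dt FROM dates
    """
    con = sqlite3.connect(":memory:")
    return [row[0] for row in con.execute(sql, (start, end))]


print(date_series("2018-07-10", "2018-07-13"))
print(date_series("2018-07-10", "2018-07-13", inclusive_guard=True))
```

With `<`, the series ends exactly on the end date; with `<=`, the recursive step runs once more and emits the day after it.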

The code below could help you.
SELECT id, time,
       CASE WHEN sec_diff is null or prev_sec_diff - sec_diff > 3
            then 1
            else 0
       end
FROM (
    select id, time, sec_diff,
           lag(sec_diff) over (partition by id order by time asc) as prev_sec_diff
    from (
        select id, time,
               date_part('s', time - lag(time) over (partition by id order by time asc)) as sec_diff
        from hon
    ) x
) y

SQL how to write a query that return missing date ranges?

I am trying to figure out how to write a query that looks at certain records and finds missing date ranges between today and 9999-12-31.
My data looks like below:
ID |start_dt |end_dt |prc_or_disc_1
10412 |2018-07-17 00:00:00.000 |2018-07-20 00:00:00.000 |1050.000000
10413 |2018-07-23 00:00:00.000 |2018-07-26 00:00:00.000 |1040.000000
So for this data I would want my query to return:
2018-07-10 | 2018-07-16
2018-07-21 | 2018-07-22
2018-07-27 | 9999-12-31
I'm not really sure where to start. Is this possible?

You can do that using the lag() function, which is available in SQL Server 2012 and later.
with myData as
(
select *,
lag(end_dt,1) over (order by start_dt) as lagEnd
from myTable),
myMax as
(
select Max(end_dt) as maxDate from myTable
)
select dateadd(d,1,lagEnd) as StartDate, dateadd(d, -1, start_dt) as EndDate
from myData
where lagEnd is not null and dateadd(d,1,lagEnd) < start_dt
union all
select dateAdd(d,1,maxDate) as StartDate, cast('99991231' as Datetime) as EndDate
from myMax
where maxDate < '99991231';
lag() is not available in SQL Server 2008; there you can mimic it with row_number() and a self-join.
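To sanity-check the LAG()-based logic against the sample data in the question, here is a sketch that runs the same two-part query in SQLite via Python's sqlite3 (3.25+ for LAG). Assumptions: dates are stored as 'YYYY-MM-DD' text, date(x, '+1 day') replaces dateadd, and the maximum end date comes from a derived table. Like the T-SQL version, it does not produce the leading gap that starts today.

```python
import sqlite3

ROWS = [
    (10412, "2018-07-17", "2018-07-20"),
    (10413, "2018-07-23", "2018-07-26"),
]


def missing_ranges(rows):
    """Return (start, end) gaps between inclusive ranges, plus the open tail."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE myTable (ID INTEGER, start_dt TEXT, end_dt TEXT)")
    con.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    sql = """
        WITH myData AS (
            SELECT start_dt, end_dt,
                   LAG(end_dt, 1) OVER (ORDER BY start_dt) AS lagEnd
            FROM myTable
        )
        SELECT date(lagEnd, '+1 day') AS StartDate,
               date(start_dt, '-1 day') AS EndDate
        FROM myData
        WHERE lagEnd IS NOT NULL AND date(lagEnd, '+1 day') < start_dt
        UNION ALL
        SELECT date(maxDate, '+1 day'), '9999-12-31'
        FROM (SELECT MAX(end_dt) AS maxDate FROM myTable)
        WHERE maxDate < '9999-12-31'
        ORDER BY 1
    """
    return con.execute(sql).fetchall()


print(missing_ranges(ROWS))
```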

select
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then end_dt +1 END as F1,
CASE WHEN DATEDIFF(day, end_dt, ISNULL(LEAD(start_dt) over (order by ID), '99991231')) > 1 then ISNULL(LEAD(start_dt) over (order by ID) - 1, '99991231') END as F2
from t
Working SQLFiddle example is -> Here
For SQL Server 2008:
SELECT
X.end_dt + 1 as F1,
ISNULL(Y.start_dt-1, '99991231') as F2
FROM t X
LEFT JOIN (
SELECT
*
, (SELECT MAX(ID) FROM t WHERE ID < A.ID) as ID2
FROM t A) Y ON X.ID = Y.ID2
WHERE DATEDIFF(day, X.end_dt, ISNULL(Y.start_dt, '99991231')) > 1
Working SQLFiddle example is -> Here

This should work in 2008. It assumes that the ranges in your table do not overlap. It will also eliminate rows where the end_dt of the current row is the day before the start_dt of the next row.
with dtRanges as (
select start_dt, end_dt, row_number() over (order by start_dt) as rownum
from table1
)
select t2.end_dt + 1, coalesce(start_dt_next -1,'99991231')
FROM
( select dr1.start_dt, dr1.end_dt,dr2.start_dt as start_dt_next
from dtRanges dr1
left join dtRanges dr2 on dr2.rownum = dr1.rownum + 1
) t2
where
t2.end_dt + 1 <> coalesce(start_dt_next,'99991231')

http://sqlfiddle.com/#!18/65238/1
SELECT
*
FROM
(
SELECT
end_dt+1 AS start_dt,
LEAD(start_dt-1, 1, '9999-12-31')
OVER (ORDER BY start_dt)
AS end_dt
FROM
yourTable
)
gaps
WHERE
gaps.end_dt >= gaps.start_dt
I would, however, strongly urge you to use end dates that are "exclusive". That is, the range is everything up to but excluding the end_dt.
That way, a range of one day becomes '2018-07-09', '2018-07-10'.
It's really clear that the range is one day long: if you subtract one from the other, you get a day.
Also, if you ever change to needing hour granularity or minute granularity you don't need to change your data. It just works. Always. Reliably. Intuitively.
If you search the web you'll find plenty of documentation on why inclusive-start and exclusive-end is a very good idea from a software perspective. (Then, in the query above, you can remove the wonky +1 and -1.)
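As an illustration of why half-open ranges help, here is a minimal Python sketch (not from the answer) that finds gaps when every end date is exclusive. The question's two rows are rewritten with exclusive ends (2018-07-20 inclusive becomes 2018-07-21 exclusive), the ranges are assumed not to overlap, and the function name is hypothetical. No +1 or -1 adjustments appear, and a gap's length is simply end minus start.

```python
from datetime import date


def gaps_half_open(ranges, horizon_end):
    """Return the gaps between non-overlapping [start, end) ranges.

    Every range and every returned gap is half-open: start is included,
    end is excluded. The last gap runs up to horizon_end (also exclusive).
    """
    if not ranges:
        return []  # no range to measure from
    ordered = sorted(ranges)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if prev_end < next_start:  # touching ranges (prev_end == next_start) leave no gap
            gaps.append((prev_end, next_start))
    last_end = ordered[-1][1]
    if last_end < horizon_end:
        gaps.append((last_end, horizon_end))
    return gaps


booked = [
    (date(2018, 7, 17), date(2018, 7, 21)),  # 07-17 .. 07-20 inclusive
    (date(2018, 7, 23), date(2018, 7, 27)),  # 07-23 .. 07-26 inclusive
]
gaps = gaps_half_open(booked, date(9999, 12, 31))
for start, end in gaps:
    print(start, end, (end - start).days, "day(s)")
```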

This solves your case, but please provide some sample data if there will ever be overlaps, edge cases, etc.
Take one day after your end date and one day before the next row's start date.
DECLARE @t TABLE (ID int, start_dt DATETIME, end_dt DATETIME, prc VARCHAR(100))
INSERT INTO @t (id, start_dt, end_dt, prc)
VALUES
    (10410, '2018-07-09 00:00:00.00','2018-07-12 00:00:00.000','1025.000000'),
    (10412, '2018-07-17 00:00:00.00','2018-07-20 00:00:00.000','1050.000000'),
    (10413, '2018-07-23 00:00:00.00','2018-07-26 00:00:00.000','1040.000000')
SELECT DATEADD(DAY, 1, end_dt)
     , DATEADD(DAY, -1, LEAD(start_dt, 1, '9999-12-31') OVER(ORDER BY id))
FROM @t

You may want to take a look at this:
http://sqlfiddle.com/#!18/3a224/1
You just have to edit the begin range to today and the end range to 9999-12-31.

How to query database for rows from next 5 days

How can I write a query in SQL Server that returns all rows for the next 5 days?
The problem is that they have to be days with records, so the next 5 days might be something like today, tomorrow, some day next month, etc.
Basically, I want to query the database for the records of the next X non-empty days.
The table has a column called Date, which is the one I want to filter on.

Why not split the search into 2 queries? The first one finds the dates, and the second uses that result to search for records on the dates returned by the first query.

@Anagha's answer is close; with a small modification it works:
SELECT *
FROM TABLE
WHERE DATE IN (
SELECT DISTINCT TOP 5 DATE
FROM TABLE
WHERE DATE >= referenceDate
ORDER BY DATE
)

You can use the following SQL query, which first fetches 5 distinct dates and then returns all rows for those dates:
declare @n int = 5;
select *
from myData
where
    cast(datecol as date) in (
        SELECT distinct top (@n) cast(datecol as date) as datecol
        FROM myData
        WHERE datecol >= '20180101'
        ORDER BY datecol
    )

Try this:
select date from table where date in (select distinct top 5 date
from table where date >= getdate() order by date)

If your values are dates, you can use dense_rank():
select t.*
from (select t.*, dense_rank() over (order by datecol) as seqnum
from t
where datecol >= cast(getdate() as date)
) t
where seqnum <= 5;
If the column has a time component and you still want to define days by midnight-to-midnight (as suggested by the question), just convert to date:
select t.*
from (select t.*,
dense_rank() over (order by cast(datetimecol as date)) as seqnum
from t
where datetimecol >= cast(getdate() as date)
) t
where seqnum <= 5;
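Here is a runnable sketch of the DENSE_RANK() version in SQLite through Python's sqlite3 (3.25+). date(datetimecol) stands in for cast(datetimecol as date), the start point is passed as a parameter instead of getdate(), and the table contents are hypothetical. Because every row of the same day gets the same rank, "5 days" means 5 non-empty days, however far apart they are.

```python
import sqlite3


def next_n_days(rows, since, n=5):
    """Return all rows on the first n distinct days (with data) at or after `since`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (datetimecol TEXT, val INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?)", rows)
    sql = """
        SELECT datetimecol, val
        FROM (SELECT t.*,
                     DENSE_RANK() OVER (ORDER BY date(datetimecol)) AS seqnum
              FROM t
              WHERE datetimecol >= ?) AS ranked
        WHERE seqnum <= ?
        ORDER BY datetimecol
    """
    return con.execute(sql, (since, n)).fetchall()


rows = [
    ("2018-07-09 23:00:00", 0),  # before the start point, excluded
    ("2018-07-10 08:00:00", 1),
    ("2018-07-10 17:30:00", 2),  # same day as the row above: same rank
    ("2018-07-11 09:00:00", 3),
    ("2018-07-20 10:00:00", 4),
    ("2018-08-02 11:00:00", 5),
    ("2018-08-03 12:00:00", 6),
    ("2018-08-05 13:00:00", 7),  # sixth non-empty day, excluded
]
print(next_n_days(rows, "2018-07-10"))
```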

SQL SELECT Only Closest To 15-Minute Timestamp

I would like to do a SQL SELECT for the closest value to each 15-minute time value. For example:
00:15,
00:30,
00:45,
01:00,
01:15 etc...
The timestamps ([time]) are not exactly on the :00 second. Using the following query, I have managed to round every value down to its 15-minute boundary, but I only want the closest one, e.g.:
SELECT dateadd(minute, -1 * datediff(minute, 0,
cast(convert(varchar(20),[time],100) as smalldatetime)) % 15,
dateadd(minute, datediff(minute, 0, [time]), 0)) as [TIMESTAMP],
cast(convert(varchar(20),[time],100) as smalldatetime), [time],
tagname , value
FROM hdata
INNER JOIN rtdata
ON hdata.tag_id = rtdata.id
WHERE tagname = 'M1_WH_004'
order by [TIMESTAMP] desc
(Note: I need the inner join to pull the tagname, as it is not in the hdata table.)
This produces the following output (image not available):
For each 15-minute interval, I only want the value closest to the 15-minute boundary. For the data above, it would be 09:45:15.383 and 09:30:17.463 for 09:45 and 09:30 respectively.
Do I need a subquery or case statement?
Any help would be greatly appreciated!
Further to this, I already had a table that looked like the data in the solution (data for every 15 minutes), and the subquery performed a calculation based on the last two values like so:
SELECT DD1.[TIME_STAMP] AS [TIME_STAMP], DD1.[kWh1] AS [kWh1], DD1.[kWh2] AS [kWh2],
       (DD1.[kWh1] + DD1.[kWh2]) AS [Total]
FROM (SELECT a.ID
           , a.TIME_STAMP
           , (a.[1_M1_Wh] - (SELECT TOP 1 b.[1_M1_Wh] FROM TagCapture b
                             WHERE b.TIME_STAMP = DATEADD(MINUTE, -15, a.TIME_STAMP))) * 0.04 AS kWh1
           , (a.[1_M2_Wh] - (SELECT TOP 1 b.[1_M2_Wh] FROM TagCapture b
                             WHERE b.TIME_STAMP = DATEADD(MINUTE, -15, a.TIME_STAMP))) * 0.04 AS kWh2
      FROM [TagCapture] a) DD1
How can I use the solution provided in this query? I'm a little confused by all the subqueries.
That is, based on the data defined by the t subquery, I want to take the count value for one 15-minute interval, subtract the previous one, and multiply to get the required value. Where would I insert the t subquery in each FROM clause? I can't seem to get it to work. Here, the t subquery would define the two different tagnames, 'M1' and 'M2'.
Thanks again in advance!

It looks like you are using SQL Server. If so, you can use row_number() to solve this:
select t.*
from (select t.*,
             row_number() over (partition by tagname, [TIMESTAMP] order by [time]) as seqnum
      from (SELECT dateadd(minute, datediff(minute, 0, [time]) / 15 * 15, 0) as [TIMESTAMP],
                   [time], tagname, value
            FROM hdata INNER JOIN rtdata
                 ON hdata.tag_id = rtdata.id
            WHERE tagname = 'M1_WH_004'
           ) t
     ) t
where seqnum = 1
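To see the bucketing and the row_number() pick work together, here is a sketch in SQLite via Python's sqlite3 (3.25+). It is a simplification: the hdata/rtdata join is collapsed into one hypothetical readings table that already carries tagname, and the 15-minute floor is computed from Unix seconds (strftime('%s') divided by 900, then multiplied back) instead of the T-SQL dateadd/datediff expression. The 09:30:17.463 and 09:45:15.383 times come from the question; the other rows are made up.

```python
import sqlite3


def first_per_15min(rows):
    """Return the earliest reading per (tagname, 15-minute bucket)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE readings (tagname TEXT, time TEXT, value REAL)")
    con.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    sql = """
        SELECT tagname, time15, time, value
        FROM (SELECT r.*,
                     ROW_NUMBER() OVER (PARTITION BY tagname, time15
                                        ORDER BY time) AS seqnum
              FROM (SELECT tagname, time, value,
                           -- floor to the 15-minute boundary (900 seconds)
                           datetime(CAST(strftime('%s', time) AS INTEGER) / 900 * 900,
                                    'unixepoch') AS time15
                    FROM readings) AS r) AS numbered
        WHERE seqnum = 1
        ORDER BY time15
    """
    return con.execute(sql).fetchall()


rows = [
    ("M1_WH_004", "2018-07-10 09:30:17.463", 1.0),
    ("M1_WH_004", "2018-07-10 09:31:17.470", 2.0),
    ("M1_WH_004", "2018-07-10 09:45:15.383", 3.0),
    ("M1_WH_004", "2018-07-10 09:46:15.390", 4.0),
]
for row in first_per_15min(rows):
    print(row)
```

Taking the earliest row after each boundary matches the question's definition of "closest"; a reading at, say, 09:44:59 would still count toward the 09:30 bucket.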

How to select the user with max count by day

I have a table with three columns
UserID, Count, Date
I'd like to be able to select the userid with the highest count for each date.
I've tried a few different variations of queries with inline select statements but none have worked 100%, and I'm not too fond of having a select with three inline selects.
Is doing inline selects the only way to go without using temp tables? What's the best way to tackle this?

This solution will give you multiple records if there is a tie in Count, but it should work.
SELECT a.Date, a.UserId, a.[Count]
FROM yourTable a INNER JOIN (
SELECT MAX([Count]) as [Count], Date
FROM yourTable
GROUP BY Date
) b ON a.[Count] = b.[Count] AND a.Date = b.Date
ORDER BY a.Date

If [Date] is in fact a DATE column with no time component:
;WITH x AS
(
SELECT [Date], [Count], UserID, rn = ROW_NUMBER() OVER
(PARTITION BY [Date] ORDER BY [Count] DESC)
FROM dbo.table
)
SELECT [Date], [Count], UserID
FROM x
WHERE rn = 1
ORDER BY [Date];
If [Date] is a DATETIME column with a time component, then:
;WITH x AS
(
SELECT [Date] = DATEADD(DAY, DATEDIFF(DAY, '19000101', [Date]), '19000101'),
[Count], UserID, rn = ROW_NUMBER() OVER
(PARTITION BY DATEADD(DAY, DATEDIFF(DAY, '19000101', [Date]), '19000101')
ORDER BY [Count] DESC)
FROM dbo.table
)
SELECT [Date], [Count], UserID
FROM x
WHERE rn = 1
ORDER BY [Date];
If you want to pick a specific row in the event of a tie, you can add a tie-breaker to the ORDER BY within the OVER() clause. If you want to include multiple rows in the case of ties, you can change ROW_NUMBER() to DENSE_RANK().
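The difference between the two tie-handling choices is easiest to see on a tiny example. This sketch runs the CTE pattern in SQLite via Python's sqlite3 (3.25+); the table name tb and its rows are hypothetical, and the "Date" and "Count" columns are quoted to avoid any clash with function names. ROW_NUMBER() gets UserID as a tie-breaker so exactly one row per day comes back; DENSE_RANK() orders by Count only, so tied users share rank 1.

```python
import sqlite3


def top_users(rows, keep_ties=False):
    """Return the user(s) with the highest Count for each Date."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE tb (UserID INTEGER, "Count" INTEGER, "Date" TEXT)')
    con.executemany("INSERT INTO tb VALUES (?, ?, ?)", rows)
    if keep_ties:
        # tied counts share rank 1, so every tied user is returned
        ranking = 'DENSE_RANK() OVER (PARTITION BY "Date" ORDER BY "Count" DESC)'
    else:
        # UserID breaks ties, so exactly one row per day gets rn = 1
        ranking = 'ROW_NUMBER() OVER (PARTITION BY "Date" ORDER BY "Count" DESC, UserID)'
    sql = f"""
        WITH x AS (
            SELECT "Date", "Count", UserID, {ranking} AS rn
            FROM tb
        )
        SELECT "Date", "Count", UserID FROM x
        WHERE rn = 1
        ORDER BY "Date", UserID
    """
    return con.execute(sql).fetchall()


rows = [
    (1, 5, "2020-01-01"),
    (2, 7, "2020-01-01"),
    (3, 7, "2020-01-01"),  # ties with user 2
    (1, 4, "2020-01-02"),
]
print(top_users(rows))
print(top_users(rows, keep_ties=True))
```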

SELECT x.*
FROM (
SELECT Date
FROM atable
GROUP BY Date
) t
CROSS APPLY (
SELECT TOP 1 WITH TIES
UserID, Count, Date
FROM atable
WHERE Date = t.Date
ORDER BY Count DESC
) x
If Date is datetime type and can have a non-zero time component, change the t table like this:
…
FROM (
SELECT Date = DATEADD(DAY, DATEDIFF(DAY, 0, Date), 0)
FROM atable
GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, Date), 0)
) t
…
References:
TOP (Transact-SQL)
Using APPLY

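For SQL Server 2005 and later (a window function cannot appear directly in WHERE, so rank in a derived table and filter outside it):
select UserID, [Count], [Date]
from (select UserID, [Count], [Date],
             rank() over (partition by [Date] order by [Count] desc, UserID desc) as rnk
      from tb
     ) ranked
where rnk = 1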
