Ignoring Duplicate Records SQL


In need of some help :)
So I have a table of records with the following columns:
Key (PK, FK, int)
DT (smalldatetime)
Value (real)
The DT is a datetime for every half hour of the day, with an associated value.
E.g.
Key DT VALUE
1000 2010-01-01 08:00:00 80
1000 2010-01-01 08:30:00 75
1000 2010-01-01 09:00:00 100
I have a query that finds the max value for every 24-hour period and its associated time. However, on one day the max value occurs twice, so that date is duplicated, which is causing processing issues. I have tried using ROW_NUMBER(), which works, but I can't use a calculated column in my WHERE clause.
Currently I have:
SELECT cast(T1.DT as date) as 'Date',Cast(T1.DT as time(0)) as 'HH', ROW_NUMBER() over (PARTITION BY cast(DT as date) ORDER BY DT) AS 'RowNumber'
FROM TABLE_1 AS T1
INNER JOIN (
SELECT CAST([DT] as date) as 'DATE'
, MAX([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) AS MAX_DT
ON MAX_DT.[DATE] = CAST(T1.[DT] as date)
AND T1.VALUE = MAX_DT.MAX_HH
WHERE DT > '6-nov-2016' and [KEY] = '1000'
ORDER BY DT
This results in
Key DT VALUE HH
1000 2010-01-01 80 07:00:00
1000 2010-02-01 100 17:30:00
1000 2010-02-01 100 18:00:00
I need to remove the duplicate date (I have no preference which HH it keeps).
I think I've explained that terribly; let me know if it makes no sense and I'll try to rewrite it.
Any ideas?

Can you try this? The changes are the MIN() around the time column and the GROUP BY at the end:
SELECT cast(T1.DT as date) as 'Date', MIN(Cast(T1.DT as time(0))) as 'HH' -- MIN picks one time per date
FROM TABLE_1 AS T1
INNER JOIN (
SELECT CAST([DT] as date) as 'DATE'
, MAX([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) AS MAX_DT
ON MAX_DT.[DATE] = CAST(T1.[DT] as date)
AND T1.VALUE = MAX_DT.MAX_HH
WHERE DT > '6-nov-2016' and [KEY] = '1000'
-- group by the date so that tied maximums collapse into one row
GROUP BY cast(T1.DT as date)
-- DT is no longer a plain column after the GROUP BY, so order by the grouped date
ORDER BY cast(T1.DT as date)
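To see why MIN() plus GROUP BY removes the duplicate, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. SQLite's date() and time() stand in for CAST(... AS date) and CAST(... AS time(0)), and the rows are hypothetical sample data in which the second day has its maximum (100) at two different times.

```python
import sqlite3

# Hypothetical rows shaped like TABLE_1; 2010-01-02 has a tied maximum of 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_1 ([Key] INTEGER, DT TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO TABLE_1 VALUES (?, ?, ?)",
    [
        (1000, "2010-01-01 07:00:00", 80),
        (1000, "2010-01-01 07:30:00", 75),
        (1000, "2010-01-02 17:30:00", 100),
        (1000, "2010-01-02 18:00:00", 100),
        (1000, "2010-01-02 18:30:00", 60),
    ],
)

# Join each row to its day's maximum, then GROUP BY the day and keep MIN(time),
# so a tied maximum collapses to a single row per date.
query = """
SELECT date(T1.DT) AS Day, MIN(time(T1.DT)) AS HH
FROM TABLE_1 AS T1
JOIN (SELECT date(DT) AS D, MAX(Value) AS MAX_HH
      FROM TABLE_1 WHERE [Key] = 1000 GROUP BY date(DT)) AS M
  ON M.D = date(T1.DT) AND T1.Value = M.MAX_HH
WHERE T1.[Key] = 1000
GROUP BY date(T1.DT)
ORDER BY 1
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('2010-01-01', '07:00:00'), ('2010-01-02', '17:30:00')]
```

MIN() is an arbitrary but deterministic choice; MAX() would work equally well since the question has no preference.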

I would do something like this. I haven't tried it, but I think it's correct.
SELECT cast(T1.DT as date) as 'Date', Cast(T1.DT as time(0)) as 'HH', T1.VALUE
FROM TABLE_1 T1
WHERE T1.[DT] IN (
--select the latest DT among each day's max-value rows
SELECT MAX(T2.[DT]) AS max_date
FROM TABLE_1 T2
-- SQL Server has no (a, b) IN (...) row comparison, so match day and value with a join
INNER JOIN (
SELECT CAST([DT] as date) as 'CAST_DATE'
,MAX([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) AS M
ON CAST(T2.[DT] as date) = M.CAST_DATE AND T2.[VALUE] = M.MAX_HH
WHERE T2.[KEY] = '1000'
GROUP BY CAST(T2.[DT] as date)
)
AND T1.DT > '6-nov-2016' and T1.[KEY] = '1000'

Change the JOIN to an APPLY.
The APPLY operation will allow you to limit the connected relation to just one result for each source relation.
SELECT v.[Key], cast(v.DT As Date) as "Date", v.[Value], cast(v.DT as Time(0)) as "HH"
FROM
( -- First a projection to get just the exact dates you want
SELECT DISTINCT [Key], CAST(DT as DATE) as DT
FROM Table_1
WHERE [Key] = '1000' AND DT > '20161106'
) dates
CROSS APPLY (
-- Then use APPLY rather than JOIN to find just the exact one record you need for each date
SELECT TOP 1 *
FROM Table_1
WHERE [Key] = dates.[Key] AND cast(DT as DATE) = dates.DT ORDER BY [Value] DESC
) v
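The APPLY pattern picks exactly one row per date. SQLite has no CROSS APPLY, so this sketch (Python's sqlite3, hypothetical rows) uses correlated subqueries with ORDER BY ... LIMIT 1 as a stand-in for TOP 1. The extra DT tie-breaker is an addition for the sketch: without it, TOP 1 among tied values is not guaranteed to return any particular row.

```python
import sqlite3

# Hypothetical rows shaped like Table_1; 2010-01-02 has a tied maximum of 100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table_1 ([Key] INTEGER, DT TEXT, Value REAL)")
conn.executemany(
    "INSERT INTO Table_1 VALUES (?, ?, ?)",
    [
        (1000, "2010-01-01 07:00:00", 80),
        (1000, "2010-01-01 07:30:00", 75),
        (1000, "2010-01-02 17:30:00", 100),
        (1000, "2010-01-02 18:00:00", 100),
    ],
)

# First project the distinct (Key, date) pairs, then fetch one row per pair:
# highest Value first, earliest DT as a tie-breaker, LIMIT 1 in place of TOP 1.
query = """
SELECT dates.[Key], dates.D,
       (SELECT t.Value FROM Table_1 AS t
        WHERE t.[Key] = dates.[Key] AND date(t.DT) = dates.D
        ORDER BY t.Value DESC, t.DT LIMIT 1) AS Value,
       (SELECT time(t.DT) FROM Table_1 AS t
        WHERE t.[Key] = dates.[Key] AND date(t.DT) = dates.D
        ORDER BY t.Value DESC, t.DT LIMIT 1) AS HH
FROM (SELECT DISTINCT [Key], date(DT) AS D FROM Table_1 WHERE [Key] = 1000) AS dates
ORDER BY dates.D
"""
rows = conn.execute(query).fetchall()
print(rows)
```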
A final note: both this query and your sample query in the question will include values from Nov 6, 2016. The query says > 2016-11-06 with an exclusive inequality, but it still compares full datetime values, so the literal carries an implied time component of midnight (00:00:00). A reading at 12:30 AM on Nov 6 is therefore still greater than 12:00:00 AM on Nov 6. If you want to exclude all of Nov 6, you either need to compare against a time value at the end of that date, or cast to date before making the > comparison.
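The boundary effect can be checked with plain Python datetime objects; this is only an analogy for the SQL comparison, with a hypothetical 00:30 reading.

```python
from datetime import datetime

cutoff = datetime(2016, 11, 6)          # '6-nov-2016' is read as midnight
reading = datetime(2016, 11, 6, 0, 30)  # a half-hourly reading on Nov 6

# Comparing full datetimes: 00:30 on Nov 6 is after midnight, so the row is kept.
kept_by_datetime = reading > cutoff
# Casting to date first: Nov 6 is not > Nov 6, so the row is excluded.
kept_by_date = reading.date() > cutoff.date()
print(kept_by_datetime, kept_by_date)  # True False
```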

With SQL you can use SELECT DISTINCT.
The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values, and sometimes you only want to list the different (distinct) values.
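As a rough Python analogy (a set behaves like SELECT DISTINCT over the listed columns), using the result rows from the question: DISTINCT only removes rows that are identical in every selected column, so it removes the duplicate date only if HH is left out of the select list.

```python
# Result rows from the question, reduced to (Date, HH).
rows = [
    ("2010-01-01", "07:00:00"),
    ("2010-02-01", "17:30:00"),
    ("2010-02-01", "18:00:00"),
]

# DISTINCT over (Date, HH): the tied rows differ in HH, so all three survive.
distinct_date_hh = sorted(set(rows))

# DISTINCT over Date alone: the duplicate date collapses, but HH is dropped.
distinct_date = sorted({day for day, _ in rows})

print(len(distinct_date_hh), distinct_date)  # 3 ['2010-01-01', '2010-02-01']
```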

Related

Finding the maximum value in a 24h period SQL

In need of some help :)
So I have a table of records with the following columns:
Key (PK, FK, int)
DT (smalldatetime)
Value (real)
The DT is a datetime for every half hour of the day, with an associated value.
E.g.
Key DT VALUE
1000 2010-01-01 08:00:00 80
1000 2010-01-01 08:30:00 75
1000 2010-01-01 09:00:00 100
I need to find the max value and its associated DT for every 24-hour period, for a particular key and date range.
Currently I have:
SELECT CAST(LEFT([DT],11) as smalldatetime) as 'DATE'
,max([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST(LEFT([DT],11) as smalldatetime)
ORDER BY 'DATE'
But this returns the max values for the date e.g.
Key DT VALUE
1000 2010-01-01 00:00:00 100
Any ideas on how to pull the full DT?
Thanks guys!

Assuming you're using a database with support for windowed functions, we can use ROW_NUMBER() (or RANK if you want to support/pull in values that are tied for first place):
declare @t table ([Key] int not null , DT smalldatetime not null, Value int not null)
insert into @t([Key],DT,VALUE) values
(1000,'2010-01-01T08:00:00',80 ),
(1000,'2010-01-01T08:30:00',75 ),
(1000,'2010-01-01T09:00:00',100)
;With Numbered as (
select *,
ROW_NUMBER() OVER (PARTITION BY [Key],CAST(DT as date) ORDER BY Value desc) as rn
from @t
)
select * from Numbered
where rn=1
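The same ROW_NUMBER() pattern can be tried outside SQL Server with Python's built-in sqlite3 module, assuming the bundled SQLite is version 3.25 or later (the first with window functions). SQLite's date(DT) stands in for CAST(DT as date); the rows are the sample rows from the answer.

```python
import sqlite3

# Same three sample rows as the table variable above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t ([Key] INTEGER NOT NULL, DT TEXT NOT NULL, Value INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1000, "2010-01-01 08:00:00", 80),
        (1000, "2010-01-01 08:30:00", 75),
        (1000, "2010-01-01 09:00:00", 100),
    ],
)

# Number rows within each (Key, day) by Value descending; rn = 1 is the daily max.
# ROW_NUMBER never repeats a number, so even a tie yields one row per day.
query = """
WITH Numbered AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY [Key], date(DT) ORDER BY Value DESC) AS rn
    FROM t
)
SELECT [Key], DT, Value FROM Numbered WHERE rn = 1
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1000, '2010-01-01 09:00:00', 100)]
```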

Damien's answer is very good. If you can't (or don't want to) use windowed functions, try this:
SELECT T1.*
FROM TABLE_1 AS T1
INNER JOIN (
SELECT CAST([DT] as date) as 'DATE'
, MAX([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) AS MAX_DT
ON MAX_DT.[DATE] = CAST(T1.[DT] as date)
AND T1.VALUE = MAX_DT.MAX_HH
WHERE DT > '6-nov-2016' and [KEY] = '1000'
ORDER BY DT
By the way, it's best not to use reserved keywords (e.g. DATE) as object names.

SELECT DateTime not in SQL

I have the following table:
oDateTime pvalue
2017-06-01 00:00:00 70
2017-06-01 01:00:00 65
2017-06-01 02:00:00 90
ff.
2017-08-01 08:00:00 98
The oDateTime field holds hourly data, so it cannot contain duplicate values.
My question is: how can I check whether the oDateTime data is complete? I need to make sure the data does not "jump"; it should always be on an hourly basis.
Am I missing a date? Am I missing a time?
Please advise. Thank you.

Based on this answer, you can get the missing times from your table MyLogTable like this:
DECLARE @StartDate DATETIME = '20170601', @EndDate DATETIME = '20170801'
SELECT DATEADD(hour, nbr - 1, @StartDate)
FROM ( SELECT ROW_NUMBER() OVER ( ORDER BY c.object_id ) AS Nbr
FROM sys.columns c
) nbrs
WHERE nbr - 1 <= DATEDIFF(hour, @StartDate, @EndDate) AND
NOT EXISTS (SELECT 1 FROM MyLogTable WHERE DATEADD(hour, nbr - 1, @StartDate)= oDateTime )
If you need to check a longer period, you can add a CROSS JOIN like this:
FROM sys.columns c
CROSS JOIN sys.columns c1
This lets you check far more than the roughly one thousand rows of sys.columns in a single query.
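The numbers-table idea (generate every expected hour, then keep the ones with no matching row) can be sketched in plain Python. The readings are hypothetical, and the first and last reading stand in for @StartDate and @EndDate.

```python
from datetime import datetime, timedelta

# Hypothetical hourly readings for oDateTime with 02:00 missing.
readings = {
    datetime(2017, 6, 1, 0),
    datetime(2017, 6, 1, 1),
    datetime(2017, 6, 1, 3),
    datetime(2017, 6, 1, 4),
}

# Stand-ins for @StartDate/@EndDate: simply the first and last reading.
start, end = min(readings), max(readings)

# Generate every hour from start to end inclusive (the numbers-table step)...
hour_count = (end - start) // timedelta(hours=1)
expected = [start + timedelta(hours=n) for n in range(hour_count + 1)]

# ...and keep the hours that have no matching row (the NOT EXISTS step).
missing = [h for h in expected if h not in readings]
print(missing)  # [datetime.datetime(2017, 6, 1, 2, 0)]
```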

Since your table has no unique ID column, use ROW_NUMBER() in a CTE to number the rows, then self-join each row to the next one by row number and take the difference in oDateTime. This shows exactly which rows are more than one hour apart from the next row.
;with cte(oDateTime,pValue,Rid)
As
(
select *,row_number() over(order by oDateTime) from [YourTableName] t1
)
select *,datediff(HH,c1.oDateTime,c2.oDateTime) as HourDiff from cte c1
inner join cte c2
on c1.Rid=c2.Rid-1 where datediff(HH,c1.oDateTime,c2.oDateTime) >1
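The "join each row to the next one" step amounts to pairing consecutive sorted values. A plain-Python sketch with hypothetical readings:

```python
from datetime import datetime, timedelta

# Hypothetical oDateTime values, sorted as ROW_NUMBER() OVER (ORDER BY oDateTime) numbers them.
readings = sorted([
    datetime(2017, 6, 1, 0),
    datetime(2017, 6, 1, 1),
    datetime(2017, 6, 1, 3),
    datetime(2017, 6, 1, 4),
])

# Pair each row with the next one (the c1.Rid = c2.Rid - 1 join) and keep
# pairs that are more than one hour apart, together with the gap in hours.
gaps = [
    (current, following, (following - current) // timedelta(hours=1))
    for current, following in zip(readings, readings[1:])
    if following - current > timedelta(hours=1)
]
print(gaps)
```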

You could use DENSE_RANK() to number the hours in a day from 1 to 24. Then all you have to do is check whether the max rank for a day is 24. If there is at least one entry for each hour, the dense rank will have a maximum value of 24.
Use the following query to find the dates on which an oDateTime is missing:
SELECT [date]
FROM
(
SELECT *
, CAST(oDateTime AS DATE) AS [date]
, DENSE_RANK() OVER(PARTITION BY CAST(oDateTime AS DATE) ORDER BY DATEPART(HOUR, oDateTime)) AS rank_num
FROM Test
) AS t
GROUP BY [date]
HAVING(MAX(rank_num) != 24);
If you need validation for each oDateTime row, you could do a self-join based on the rank and get the missing hour for each oDateTime.
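The maximum DENSE_RANK over the hours of a day equals the number of distinct hours seen that day, so the check reduces to counting distinct hours. A plain-Python sketch with hypothetical data:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical data: June 1 is complete, June 2 is missing 05:00.
readings = [datetime(2017, 6, 1, h) for h in range(24)]
readings += [datetime(2017, 6, 2, h) for h in range(24) if h != 5]

# Collect the distinct hours seen on each day.
hours_per_day = defaultdict(set)
for r in readings:
    hours_per_day[r.date()].add(r.hour)

# A day is incomplete exactly when it has fewer than 24 distinct hours,
# i.e. when its max dense rank is not 24.
incomplete = [day for day, hours in sorted(hours_per_day.items()) if len(hours) != 24]
print(incomplete)  # [datetime.date(2017, 6, 2)]
```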

Perhaps you are looking for this? It returns the dates that have fewer than 24 rows, which indicates a "jump":
;WITH datecount
AS ( SELECT CAST(oDateTime AS DATE) AS [date] ,
COUNT(CAST(oDateTime AS DATE)) AS [count]
FROM #temp
GROUP BY ( CAST(oDateTime AS DATE) )
)
SELECT *
FROM datecount
WHERE [count] < 24;
EDIT: Since you changed the requirement from "how do I know whether anything is missing" to "what is missing", here's an updated query:
DECLARE @calendar AS TABLE ( oDateTime DATETIME )
DECLARE @min DATETIME = (SELECT MIN([oDateTime]) FROM #yourTable)
DECLARE @max DATETIME = (SELECT MAX([oDateTime]) FROM #yourTable)
WHILE ( @min <= @max )
BEGIN
INSERT INTO @calendar
VALUES ( @min );
SET @min = DATEADD(hh, 1, @min);
END;
SELECT t1.[oDateTime]
FROM @calendar t1
LEFT JOIN #yourTable t2 ON t1.[oDateTime] = t2.[oDateTime]
GROUP BY t1.[oDateTime]
HAVING COUNT(t2.[oDateTime]) = 0;
I first created an hourly calendar spanning your MIN and MAX oDateTime, then compared your actual table to the calendar to find out where there is a "jump".

Find missing date as compare to calendar

I'll explain the problem briefly.
select distinct DATE from #Table where DATE >='2016-01-01'
Output :
Date
2016-11-23
2016-11-22
2016-11-21
2016-11-19
2016-11-18
Now I need to find the missing dates compared to the calendar dates for the year 2016,
e.g. here the date '2016-11-20' is missing.
I want a list of the missing dates.
Thanks for reading this. Have a nice day.

You need to generate the dates and then find the missing ones. Below, I have done it with a recursive CTE:
;WITH CTE AS
(
SELECT CONVERT(DATE,'2016-01-01') AS DATE1
UNION ALL
SELECT DATEADD(DD,1,DATE1) FROM CTE WHERE DATE1<'2016-12-31'
)
SELECT DATE1 MISSING_ONE FROM CTE
EXCEPT
SELECT * FROM #TABLE1
option(maxrecursion 0)
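The generate-then-subtract approach can be sketched in plain Python using the dates from the question. Note that 2016 is a leap year, so the generated calendar has 366 days; every day outside the few dates in the table counts as missing.

```python
from datetime import date, timedelta

# The distinct dates from the question's output.
present = {date(2016, 11, d) for d in (18, 19, 21, 22, 23)}

# Generate every day of 2016 (the recursive CTE step)...
start, end = date(2016, 1, 1), date(2016, 12, 31)
calendar = [start + timedelta(days=n) for n in range((end - start).days + 1)]

# ...and remove the dates that exist in the table (the EXCEPT step).
missing = [d for d in calendar if d not in present]
print(len(calendar), len(missing))  # 366 361
```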

Use a CTE to get all the dates, then compare them with your table:
CREATE TABLE #yourTable(_Values DATE)
INSERT INTO #yourTable(_Values)
SELECT '2016-11-23' UNION ALL
SELECT '2016-11-22' UNION ALL
SELECT '2016-11-21' UNION ALL
SELECT '2016-11-19' UNION ALL
SELECT '2016-11-18'
DECLARE @DATE DATE = '2016-11-01'
;WITH CTEYear (_Date) AS
(
SELECT @DATE
UNION ALL
SELECT DATEADD(DAY,1,_Date)
FROM CTEYear
WHERE _Date < EOMONTH(@DATE,0)
)
SELECT * FROM CTEYear
WHERE NOT EXISTS(SELECT 1 FROM #yourTable WHERE _Date = _Values)
OPTION(maxrecursion 0)

You need to generate the dates and then find the missing ones. A recursive CTE is one way to generate a handful of dates. Another way is to use master..spt_values as a list of numbers:
with n as (
select row_number() over (order by (select null)) - 1 as n
from master..spt_values
),
d as (
select dateadd(day, n.n, cast('2016-01-01' as date)) as dte
from n
where n <= 365
)
select d.dte
from d left join
#table t
on d.dte = t.date
where t.date is null;
If you are happy enough with ranges of missing dates, you don't need a list of dates at all:
select date, (datediff(day, date, next_date) - 1) as num_missing
from (select t.*, lead(t.date) over (order by t.date) as next_date
from #table t
where t.date >= '2016-01-01'
) t
where next_date <> dateadd(day, 1, date);
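The LEAD-based version pairs each date with the next one and reports the size of each hole. A plain-Python sketch with the question's dates; like the SQL, it only sees gaps between existing dates, not days before the first or after the last one.

```python
from datetime import date

# The distinct dates from the question's output, in order.
present = sorted(date(2016, 11, d) for d in (18, 19, 21, 22, 23))

# Pair each date with the next one (LEAD) and report how many days are missing between them.
gaps = [
    (current, (following - current).days - 1)
    for current, following in zip(present, present[1:])
    if (following - current).days > 1
]
print(gaps)  # [(datetime.date(2016, 11, 19), 1)]
```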

SQL calculate date segments within calendar year

What I need is to calculate the missing time periods within the calendar year given a table such as this in SQL:
DatesTable
ID DateStart DateEnd
1 NULL NULL
2 2015-1-1 2015-12-31
3 2015-3-1 2015-12-31
4 2015-1-1 2015-9-30
5 2015-1-1 2015-3-31
5 2015-6-1 2015-12-31
6 2015-3-1 2015-6-30
6 2015-7-1 2015-10-31
The expected result would be:
1 2015-1-1 2015-12-31
3 2015-1-1 2015-2-28
4 2015-10-1 2015-12-31
5 2015-4-1 2015-5-31
6 2015-1-1 2015-2-28
6 2015-11-1 2015-12-31
These are essentially work blocks. What I need to show is the part of the calendar year that was NOT worked. So for ID = 3, he worked from 3/1 through the rest of the year, but he did not work from 1/1 till 2/28. That's what I'm looking for.

You can do it using the LEAD and LAG window functions, available from SQL Server 2012+:
;WITH CTE AS (
SELECT ID,
LAG(DateEnd) OVER (PARTITION BY ID ORDER BY DateEnd) AS PrevEnd,
DateStart,
DateEnd,
LEAD(DateStart) OVER (PARTITION BY ID ORDER BY DateEnd) AS NextStart
FROM DatesTable
)
SELECT ID, DateStart, DateEnd
FROM (
-- Get interval right before current [DateStart, DateEnd] interval
SELECT ID,
CASE
WHEN DateStart IS NULL THEN '20150101'
WHEN DateStart > start THEN start
ELSE NULL
END AS DateStart,
CASE
WHEN DateStart IS NULL THEN '20151231'
WHEN DateStart > start THEN DATEADD(d, -1, DateStart)
ELSE NULL
END AS DateEnd
FROM CTE
CROSS APPLY (SELECT COALESCE(DATEADD(d, 1, PrevEnd), '20150101')) x(start)
-- If there is no next interval then get interval right after current
-- [DateStart, DateEnd] interval (up-to end of year)
UNION ALL
SELECT ID, DATEADD(d, 1, DateEnd) AS DateStart, '20151231' AS DateEnd
FROM CTE
WHERE DateStart IS NOT NULL -- Do not re-examine [Null, Null] interval
AND NextStart IS NULL -- There is no next [DateStart, DateEnd] interval
AND DateEnd < '20151231' -- Current [DateStart, DateEnd] interval
-- does not terminate on 31/12/2015
) AS t
WHERE t.DateStart IS NOT NULL
ORDER BY ID, DateStart
The idea behind the above query is simple: for every [DateStart, DateEnd] interval get 'not worked' interval right before it. If there is no interval following the current interval, then also get successive 'not worked' interval (if any).
Also note that I assume that if DateStart is NULL, then DateEnd is also NULL for the same ID.
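The "gap before each block, plus a gap after the last block" logic can be checked in plain Python against the question's data. This sketch assumes, as the SQL implicitly does, that the blocks for one ID never overlap; a running cursor plays the role of COALESCE(DATEADD(d, 1, PrevEnd), '20150101').

```python
from datetime import date, timedelta

YEAR_START, YEAR_END = date(2015, 1, 1), date(2015, 12, 31)
ONE_DAY = timedelta(days=1)

# DatesTable from the question; (None, None) means no work block at all.
rows = [
    (1, None, None),
    (2, date(2015, 1, 1), date(2015, 12, 31)),
    (3, date(2015, 3, 1), date(2015, 12, 31)),
    (4, date(2015, 1, 1), date(2015, 9, 30)),
    (5, date(2015, 1, 1), date(2015, 3, 31)),
    (5, date(2015, 6, 1), date(2015, 12, 31)),
    (6, date(2015, 3, 1), date(2015, 6, 30)),
    (6, date(2015, 7, 1), date(2015, 10, 31)),
]


def not_worked(rows):
    """Return (id, start, end) gaps, assuming blocks for one id never overlap."""
    blocks_by_id = {}
    for id_, start, end in rows:
        blocks_by_id.setdefault(id_, []).append((start, end))

    gaps = []
    for id_ in sorted(blocks_by_id):
        blocks = blocks_by_id[id_]
        if blocks[0][0] is None:  # [NULL, NULL]: the whole year is a gap
            gaps.append((id_, YEAR_START, YEAR_END))
            continue
        cursor = YEAR_START  # first day after the previous block (or Jan 1)
        for start, end in sorted(blocks, key=lambda b: b[1]):  # ORDER BY DateEnd
            if start > cursor:  # gap right before this block
                gaps.append((id_, cursor, start - ONE_DAY))
            cursor = end + ONE_DAY
        if cursor <= YEAR_END:  # no next block: gap up to the end of the year
            gaps.append((id_, cursor, YEAR_END))
    return gaps


for gap in not_worked(rows):
    print(gap)
```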

If your data is not too big, this approach will work. It expands all the days and ids and then re-groups them:
with d as (
select cast('2015-01-01' as date) as d
union all
select dateadd(day, 1, d)
from d
where d < cast('2015-12-31' as date)
),
td as (
select *
from d cross join
(select distinct id from t) t
where not exists (select 1
from t t2
where t2.id = t.id and
d.d between t2.startdate and t2.enddate
)
)
select id, min(d) as startdate, max(d) as enddate
from (select td.*,
dateadd(day, - row_number() over (partition by id order by d), d) as grp
from td
) td
group by id, grp
order by id, grp
option (maxrecursion 0); -- 365 days exceeds the default recursion limit of 100
An alternative method relies on cumulative sums and similar functionality that is much easier to express in SQL Server 2012+.
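The regrouping step relies on the fact that subtracting the row number from each day gives the same value for every day in a run of consecutive days (the grp column). A plain-Python sketch of that trick, with hypothetical uncovered days for a single ID:

```python
from datetime import date, timedelta
from itertools import groupby

# Hypothetical uncovered days for a single ID, already sorted
# (the td rows produced by the NOT EXISTS filter).
days = [
    date(2015, 1, 1), date(2015, 1, 2), date(2015, 1, 3),
    date(2015, 11, 1), date(2015, 11, 2),
]


def grp(numbered):
    """d - row_number(): constant within each run of consecutive days."""
    row_number, day = numbered
    return day - timedelta(days=row_number)


# Group on grp and take MIN/MAX of each group to get the date ranges.
ranges = []
for _, run in groupby(enumerate(days, start=1), key=grp):
    run_days = [day for _, day in run]
    ranges.append((min(run_days), max(run_days)))

print(ranges)
```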

A somewhat simpler approach, I think.
Basically, create a list of dates for all work block ranges (A). Then create a list of dates for the whole year for each ID (B). Then remove A from B, and compile the remaining list of dates into date ranges for each ID.
DECLARE @startdate DATETIME, @enddate DATETIME
SET @startdate = '2015-01-01'
SET @enddate = '2015-12-31'
--Build date ranges from remaining date list
;WITH dateRange(ID, dates, Grouping)
AS
(
SELECT dt1.id, dt1.Dates, dt1.Dates + row_number() over (order by dt1.id asc, dt1.Dates desc) AS Grouping
FROM
(
--Remove (A) from (B)
SELECT distinct dt.ID, tmp.Dates FROM DatesTable dt
CROSS APPLY
(
--GET (B) here
SELECT DATEADD(DAY, number, @startdate) [Dates]
FROM master..spt_values
WHERE type = 'P' AND DATEADD(DAY, number, @startdate) <= @enddate
) tmp
left join
(
--GET (A) here
SELECT DISTINCT T.Id,
D.Dates
FROM DatesTable AS T
INNER JOIN master..spt_values as N on N.number between 0 and datediff(day, T.DateStart, T.DateEnd)
CROSS APPLY (select dateadd(day, N.number, T.DateStart)) as D(Dates)
WHERE N.type ='P'
) dr
ON dr.Id = dt.Id and dr.Dates = tmp.Dates
WHERE dr.id is null
) dt1
)
SELECT ID, CAST(MIN(Dates) AS DATE) DateStart, CAST(MAX(Dates) AS DATE) DateEnd
FROM dateRange
GROUP BY ID, Grouping
ORDER BY ID
Here's the code:
http://sqlfiddle.com/#!3/f3615/1
I hope this helps!

How can I sum values per day and then plot them on calendar from start date to last date

I have a table, part of which is given below. It contains multiple values (durations) per day. I need two things: 1) the sum of durations per day, and 2) plotting them on a calendar so that the start date is First_Date from the table and the end date is Last_Update from the table. I want to show 0 for dates with no duration. I think it will be something like the query below, but I need help.
;WITH AllDates AS(
SELECT @Fromdate As TheDate
UNION ALL
SELECT TheDate + 1
FROM AllDates
WHERE TheDate + 1 <= @ToDate
)SELECT UserId,
TheDate,
COALESCE(
SUM(
-- When the game starts and ends in the same date
CASE WHEN DATEDIFF(DAY, GameStartTime, GameEndTime) = 0
Here is what I am looking for

Another way to generate the date range you are after would be something like this:
;WITH DateLimits AS
(
SELECT MIN(First_Date) FirstDate
,MAX(Last_Update) LastDate
FROM TableName
),
DateRange AS
(
SELECT TOP (SELECT DATEDIFF(DAY,FirstDate,LastDate) + 1 FROM DateLimits)
DATEADD(DAY
,ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 -- start at 0 so FirstDate itself is included
, (SELECT FirstDate FROM DateLimits)
) AS Dates
FROM master..spt_values a cross join master..spt_values b
)
SELECT * FROM DateRange --<-- you have the desired date range here
-- other query whatever you need.
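Both parts of the question (summing durations per day, then listing every day from the first to the last date with 0 where nothing was recorded) can be sketched in plain Python. The rows and both boundary dates are hypothetical stand-ins for the table's durations, First_Date and Last_Update.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical (day, duration) rows; several rows may share a day.
rows = [(date(2017, 3, 1), 30), (date(2017, 3, 1), 15), (date(2017, 3, 3), 20)]
# Hypothetical stand-ins for First_Date / Last_Update.
first_date, last_date = date(2017, 3, 1), date(2017, 3, 4)

# 1) Sum the durations per day.
totals = defaultdict(int)
for day, duration in rows:
    totals[day] += duration

# 2) One entry per calendar day from first_date to last_date inclusive (note the + 1),
#    with 0 where a day has no duration, like a LEFT JOIN plus COALESCE(..., 0).
day_count = (last_date - first_date).days + 1
calendar = [first_date + timedelta(days=n) for n in range(day_count)]
per_day = [(day, totals.get(day, 0)) for day in calendar]
print(per_day)
```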
