How to count most consecutive occurrences of a value in a Column in SQL Server - sql

I have a table Attendance in my database.
Date | Present
------------------------
20/11/2013 | Y
21/11/2013 | Y
22/11/2013 | N
23/11/2013 | Y
24/11/2013 | Y
25/11/2013 | Y
26/11/2013 | Y
27/11/2013 | N
28/11/2013 | Y
I want to count the most consecutive occurrence of a value Y or N.
For example in the above table Y occurs 2, 4 & 1 times. So I want 4 as my result.
How to achieve this in SQL Server?
Any help will be appreciated.

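Try this. Subtracting each row's row number (numbered within its Present value) from its date gives a value that stays constant across a run of consecutive dates, so that value can be used as a group key: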
Select max(Sequence)
from
(
select present ,count(*) as Sequence,
min(date) as MinDt, max(date) as MaxDt
from (
select t.Present,t.Date,
dateadd(day,
-(row_number() over (partition by present order by date))
,date
) as grp
from Table1 t
) t
group by present, grp
)a
where Present ='Y'
SQL FIDDLE
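To see why the subtraction works, here is a minimal sketch that lists the intermediate grp value for the question's sample data. The table variable @Attendance and the ISO date literals are assumptions made for illustration; the query above reads from Table1.

```sql
-- Hypothetical table variable holding the question's sample data.
DECLARE @Attendance TABLE ([Date] date, Present char(1));
INSERT INTO @Attendance ([Date], Present) VALUES
  ('2013-11-20', 'Y'), ('2013-11-21', 'Y'), ('2013-11-22', 'N'),
  ('2013-11-23', 'Y'), ('2013-11-24', 'Y'), ('2013-11-25', 'Y'),
  ('2013-11-26', 'Y'), ('2013-11-27', 'N'), ('2013-11-28', 'Y');

-- rn counts rows within each Present value; grp = Date minus rn days.
-- Inside a run of consecutive dates both Date and rn grow by one per row,
-- so their difference does not change until the run is broken.
SELECT [Date],
       Present,
       ROW_NUMBER() OVER (PARTITION BY Present ORDER BY [Date]) AS rn,
       DATEADD(day,
               -(ROW_NUMBER() OVER (PARTITION BY Present ORDER BY [Date])),
               [Date]) AS grp
FROM @Attendance
ORDER BY [Date];
```

For the Y rows, grp comes out as 2013-11-19 for the first two dates, 2013-11-20 for the next four, and 2013-11-21 for the last one, so grouping by Present and grp yields runs of 2, 4 and 1. The N row on 2013-11-22 also gets grp 2013-11-21, which is why Present has to be part of the GROUP BY.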

You can do this with a recursive CTE:
;WITH cte AS (SELECT Date,Present,ROW_NUMBER() OVER(ORDER BY Date) RN
FROM Table1)
,cte2 AS (SELECT Date,Present,RN,ct = 1
FROM cte
WHERE RN = 1
UNION ALL
SELECT a.Date,a.Present,a.RN,ct = CASE WHEN a.Present = b.Present THEN ct + 1 ELSE 1 END
FROM cte a
JOIN cte2 b
ON a.RN = b.RN+1)
SELECT TOP 1 *
FROM cte2
ORDER BY CT DESC
Demo: SQL Fiddle
Note: the dates in the demo were altered because of the format in which you posted them in your question.
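One caveat with the recursive approach: by default SQL Server stops a recursive CTE after 100 recursion levels, and this query recurses once per row. A minimal sketch, assuming the question's data sits in a hypothetical table variable @Attendance, that lifts the limit with the MAXRECURSION query hint:

```sql
-- Hypothetical table variable holding the question's sample data.
DECLARE @Attendance TABLE ([Date] date, Present char(1));
INSERT INTO @Attendance ([Date], Present) VALUES
  ('2013-11-20', 'Y'), ('2013-11-21', 'Y'), ('2013-11-22', 'N'),
  ('2013-11-23', 'Y'), ('2013-11-24', 'Y'), ('2013-11-25', 'Y'),
  ('2013-11-26', 'Y'), ('2013-11-27', 'N'), ('2013-11-28', 'Y');

;WITH cte AS (
    -- Number the rows in date order so each row can find its predecessor.
    SELECT [Date], Present, ROW_NUMBER() OVER (ORDER BY [Date]) AS RN
    FROM @Attendance
),
cte2 AS (
    -- Anchor: the first row starts a run of length 1.
    SELECT [Date], Present, RN, ct = 1
    FROM cte
    WHERE RN = 1
    UNION ALL
    -- Recursive step: extend the run if the value repeats, otherwise restart at 1.
    SELECT a.[Date], a.Present, a.RN,
           ct = CASE WHEN a.Present = b.Present THEN b.ct + 1 ELSE 1 END
    FROM cte a
    JOIN cte2 b ON a.RN = b.RN + 1
)
SELECT TOP 1 Present, ct
FROM cte2
ORDER BY ct DESC
OPTION (MAXRECURSION 0);  -- 0 removes the default limit of 100 levels
```

On the sample data this returns Present = 'Y' with ct = 4. With MAXRECURSION 0 there is no safety net against runaway recursion, so a finite value slightly above the expected row count may be the safer choice.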

Related

First value in DATE minus 30 days SQL

I have a bunch of data, from which I'm showing the ID, the max date and its corresponding values (user id, type, ...). Then I need to take the MAX date for each ID, subtract 30 days, and show the first date within that period along with its corresponding values.
Example:
ID Date Name
1 01.05.2018 AAA
1 21.04.2018 CCC
1 05.04.2018 BBB
1 28.03.2018 AAA
expected:
ID max_date max_name previous_date previous_name
1 01.05.2018 AAA 05.04.2018 BBB
I have a working solution using subselects, but as I have quite a large WHERE clause, the refresh takes ages.
The subselect looks like this:
(SELECT MIN(N.name)
FROM t1 N
WHERE N.ID = T.ID
AND (N.date < MAX(T.date) AND N.date >= (MAX(T.date)-30))
AND (...)) AS PreviousName
How'd you write the select?
I'm using TSQL
Thanks
I can do this with 2 CTEs to build up the dates and names.
SQL Fiddle
MS SQL Server 2017 Schema Setup:
CREATE TABLE t1 (ID int, theDate date, theName varchar(10)) ;
INSERT INTO t1 (ID, theDate, theName)
VALUES
( 1,'2018-05-01','AAA' )
, ( 1,'2018-04-21','CCC' )
, ( 1,'2018-04-05','BBB' )
, ( 1,'2018-03-27','AAA' )
, ( 2,'2018-05-02','AAA' )
, ( 2,'2018-05-21','CCC' )
, ( 2,'2018-03-03','BBB' )
, ( 2,'2018-01-20','AAA' )
;
Main Query:
;WITH cte1 AS (
SELECT t1.ID, t1.theDate, t1.theName
, DATEADD(day,-30,t1.theDate) AS dMinus30
, ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.theDate DESC) AS rn
FROM t1
)
, cte2 AS (
SELECT c2.ID, c2.theDate, c2.theName
, ROW_NUMBER() OVER (PARTITION BY c2.ID ORDER BY c2.theDate) AS rn
, COUNT(*) OVER (PARTITION BY c2.ID) AS theCount
FROM cte1
INNER JOIN cte1 c2 ON cte1.ID = c2.ID
AND c2.theDate >= cte1.dMinus30
WHERE cte1.rn = 1
GROUP BY c2.ID, c2.theDate, c2.theName
)
SELECT cte1.ID, cte1.theDate AS max_date, cte1.theName AS max_name
, cte2.theDate AS previous_date, cte2.theName AS previous_name
, cte2.theCount
FROM cte1
INNER JOIN cte2 ON cte1.ID = cte2.ID
AND cte2.rn=1
WHERE cte1.rn = 1
Results:
| ID | max_date | max_name | previous_date | previous_name |
|----|------------|----------|---------------|---------------|
| 1 | 2018-05-01 | AAA | 2018-04-05 | BBB |
| 2 | 2018-05-21 | CCC | 2018-05-02 | AAA |
cte1 numbers each ID's rows with a ROW_NUMBER() window function ordered by date descending, so rn = 1 marks the most recent row (max_date and max_name), and it also computes the date 30 days before each row's date. cte2 joins back to that list to get every row whose date falls within the 30 days up to cte1's max date, and numbers those rows by ascending date, so its rn = 1 marks the earliest date in the window. The outer query then joins the two results on ID, keeping only the most recent row from cte1 and the earliest row from cte2.
I'm not sure how well it will scale with your data, but the CTEs give the optimizer a single statement to work with, so it is worth comparing against the subselect version.
EDIT: For the additional requirement, I just added in another COUNT() window function to cte2.
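As a possible alternative without the self-join, the following sketch computes the per-ID max date with a windowed MAX() and then filters to the 30-day window before numbering the rows. It is an illustration, not the answer's own query; the table variable @t1 and the CTE names w and r are hypothetical.

```sql
-- Hypothetical table variable with the same rows as the schema setup above.
DECLARE @t1 TABLE (ID int, theDate date, theName varchar(10));
INSERT INTO @t1 (ID, theDate, theName) VALUES
  (1, '2018-05-01', 'AAA'), (1, '2018-04-21', 'CCC'),
  (1, '2018-04-05', 'BBB'), (1, '2018-03-27', 'AAA'),
  (2, '2018-05-02', 'AAA'), (2, '2018-05-21', 'CCC'),
  (2, '2018-03-03', 'BBB'), (2, '2018-01-20', 'AAA');

;WITH w AS (
    -- Attach each ID's latest date to every one of its rows.
    SELECT ID, theDate, theName,
           MAX(theDate) OVER (PARTITION BY ID) AS maxDate
    FROM @t1
),
r AS (
    -- Keep only rows within 30 days of the latest date, numbered both ways.
    SELECT ID, theDate, theName,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY theDate DESC) AS rnDesc,
           ROW_NUMBER() OVER (PARTITION BY ID ORDER BY theDate ASC)  AS rnAsc
    FROM w
    WHERE theDate >= DATEADD(day, -30, maxDate)
)
SELECT ID,
       MAX(CASE WHEN rnDesc = 1 THEN theDate END) AS max_date,
       MAX(CASE WHEN rnDesc = 1 THEN theName END) AS max_name,
       MAX(CASE WHEN rnAsc  = 1 THEN theDate END) AS previous_date,
       MAX(CASE WHEN rnAsc  = 1 THEN theName END) AS previous_name
FROM r
GROUP BY ID;
```

On the sample rows this gives the same two result rows as the table above. If an ID has only one row inside the window, previous_date equals max_date, which matches the behavior of the CTE query.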
I would do:
select id,
max(case when seqnum = 1 then date end) as max_date,
max(case when seqnum = 1 then name end) as max_name,
max(case when seqnum = 2 then date end) as prev_date,
max(case when seqnum = 2 then name end) as prev_name
from (select e.*, row_number() over (partition by id order by date desc) as seqnum
from example e
) e
group by id;

Count of duplicate values by two columns in SQL Server

From this table:
Number Value
1 a
2 b
3 a
2 c
2 b
3 a
2 b
I need to get count of all duplicate rows by Number and Value, i.e. 5.
Thanks.
I think this query is what you want:
SELECT SUM(t.cnt)
FROM
(
SELECT COUNT(*) cnt
FROM table_name
GROUP BY number, value
HAVING COUNT(*) > 1
)t;
Maybe something like this?
select value,number,max(cnt) as Count_distinct from (
select *,row_number () over (partition by value,number order by number) as cnt
from #sample
)t
group by value,number
Output
+---------------------------------+
| Value | Number | Count_Distinct |
| a | 1 | 1 |
| b | 2 | 3 |
| c | 2 | 1 |
| a | 3 | 2 |
+---------------------------------+
Select
count(distinct Number) as Distinct_Numbers,
count(distinct Value) as Distinct_Values
from
Table
This shows how many distinct values are in each column. Does this help?
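Assign a row number partitioned by both columns and ordered by both columns, then count the rows whose row number is greater than 1. Note that this counts only the extra copies of each pair (3 for the sample data), not every row that belongs to a duplicated pair.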
Query
;with cte as(
select [rn] = row_number() over(
partition by [Number], [Value]
order by [Number], [Value]
), *
from [your_table_name]
)
select count(*) from cte
where [rn] > 1;
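To make the difference between the two counts concrete, here is a small sketch on the question's sample data. The table variable @sample is an assumption for illustration. It uses COUNT(*) OVER to tag every row with the size of its (Number, Value) group, which is a different technique from the row-number query above.

```sql
-- Hypothetical table variable holding the question's sample data.
DECLARE @sample TABLE (Number int, Value char(1));
INSERT INTO @sample (Number, Value) VALUES
  (1, 'a'), (2, 'b'), (3, 'a'), (2, 'c'), (2, 'b'), (3, 'a'), (2, 'b');

-- cnt is the number of rows sharing the same (Number, Value) pair.
-- Rows with cnt > 1 are all members of a duplicated pair, first copy included.
SELECT COUNT(*) AS DuplicateRows
FROM (
    SELECT COUNT(*) OVER (PARTITION BY Number, Value) AS cnt
    FROM @sample
) t
WHERE cnt > 1;
```

This returns 5 (three rows of 2/b plus two rows of 3/a), the figure asked for in the question, whereas filtering on rn > 1 returns 3.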
I think you mean the number of unique number-value pairs. If so, you can use:
SELECT count(*)
FROM
(SELECT ROW_NUMBER() OVER (PARTITION BY number, value ORDER BY (select 1)) AS rnk from mytable) i
where i.rnk = 1
Maybe this query will help you:
select * from [dbo].[Sample_table1]
;WITH
DupContactRecords(number,value,DupsCount)
AS
(
SELECT number,value, COUNT(*) AS TotalCount FROM [Sample_table1] GROUP BY number,value HAVING COUNT(*) > 1
)
--to get the duplicates
/*select * from DupContactRecords*/
SELECT sum(DupsCount) FROM DupContactRecords

select top N records for each entity

I have a table like below -
ID | Reported Date | Device_ID
-------------------------------------------
1 | 2016-03-09 09:08:32.827 | 1
2 | 2016-03-08 09:08:32.827 | 1
3 | 2016-03-08 09:08:32.827 | 1
4 | 2016-03-10 09:08:32.827 | 2
5 | 2016-03-05 09:08:32.827 | 2
Now I want the top 1 row, based on the date column, for each Device_ID.
Expected Output
ID | Reported Date | Device_ID
-------------------------------------------
1 | 2016-03-09 09:08:32.827 | 1
4 | 2016-03-10 09:08:32.827 | 2
I am using SQL Server 2008 R2. I could write a stored procedure to handle it, but I wanted to do it with a simple query.
****************EDIT**************************
The answer by 'Felix Pamittan' worked well. For the top 'N' rows, just change it to
SELECT
Id, [Reported Date], Device_ID
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY Device_ID ORDER BY [Reported Date] DESC)
FROM tbl
)t
WHERE Rn <= N
He had mentioned this in a comment; I thought I'd add it to the question so that nobody misses it.
Use ROW_NUMBER:
SELECT
Id, [Reported Date], Device_ID
FROM (
SELECT *,
Rn = ROW_NUMBER() OVER(PARTITION BY Device_ID ORDER BY [Reported Date] DESC)
FROM tbl
)t
WHERE Rn = 1
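A minimal sketch of the top-N variant on the question's sample data. The table variable @tbl, the variable @N, and the Id tie-breaker in the ORDER BY are assumptions added for illustration; without a tie-breaker, rows that share a Reported Date are numbered in an arbitrary order.

```sql
-- Hypothetical table variable holding the question's sample data.
DECLARE @tbl TABLE (Id int, [Reported Date] datetime, Device_ID int);
INSERT INTO @tbl (Id, [Reported Date], Device_ID) VALUES
  (1, '2016-03-09T09:08:32.827', 1),
  (2, '2016-03-08T09:08:32.827', 1),
  (3, '2016-03-08T09:08:32.827', 1),
  (4, '2016-03-10T09:08:32.827', 2),
  (5, '2016-03-05T09:08:32.827', 2);

DECLARE @N int = 2;  -- how many rows to keep per device

SELECT Id, [Reported Date], Device_ID
FROM (
    SELECT *,
           -- Latest row first; Id breaks ties between equal timestamps.
           Rn = ROW_NUMBER() OVER (PARTITION BY Device_ID
                                   ORDER BY [Reported Date] DESC, Id)
    FROM @tbl
) t
WHERE Rn <= @N
ORDER BY Device_ID, Rn;
```

With @N = 2 this keeps Ids 1 and 2 for device 1 and Ids 4 and 5 for device 2. Setting @N = 1 reproduces the expected output in the question.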
You can also try using a CTE:
With DeviceCTE AS
(SELECT *, ROW_NUMBER() OVER(PARTITION BY Device_ID ORDER BY [Reported Date] DESC) AS Num
FROM tblname)
SELECT Id, [Reported Date], Device_ID
From DeviceCTE
Where Num = 1
If you can't use an analytic function, e.g. because your application layer won't allow it, then you can try the following solution which uses a subquery to arrive at the answer:
SELECT t1.ID, t2.maxDate, t1.Device_ID
FROM yourTable t1
INNER JOIN
(
SELECT Device_ID, MAX([Reported Date]) AS maxDate
FROM yourTable
GROUP BY Device_ID
) t2
ON t1.Device_ID = t2.Device_ID AND t1.[Reported Date] = t2.maxDate
Select * from DEVICE_TABLE D
where [Reported Date] = (Select Max([Reported Date]) from DEVICE_TABLE where Device_ID = D.Device_ID)
should do the trick, assuming that "top 1 row based on date column" means you want the latest reported date for each Device_ID.
As for your title, to select the top 5 rows for each Device_ID:
Select * from DEVICE_TABLE D
where [Reported Date] in (Select top 5 [Reported Date] from DEVICE_TABLE D2 where D2.Device_ID = D.Device_ID order by [Reported Date] desc)
order by Device_ID, [Reported Date] desc
will give you the 5 latest reports for each device ID.
The ORDER BY inside the subquery is what makes TOP 5 pick the latest dates, and the inner table needs its own alias (D2 here) so that the comparison refers to the outer row. If several rows share a date, more than 5 rows per device can come back.
Again with no analytic functions, you can use CROSS APPLY:
DECLARE @tbl TABLE(Id INT,[Reported Date] DateTime , Device_ID INT)
INSERT INTO @tbl
VALUES
(1,'2016-03-09 09:08:32.827',1),
(2,'2016-03-08 09:08:32.827',1),
(3,'2016-03-08 09:08:32.827',1),
(4,'2016-03-10 09:08:32.827',2),
(5,'2016-03-05 09:08:32.827',2)
SELECT r.*
FROM ( SELECT DISTINCT Device_ID FROM @tbl ) d
CROSS APPLY ( SELECT TOP 1 *
FROM @tbl t
WHERE d.Device_ID = t.Device_ID
ORDER BY t.[Reported Date] DESC ) r
This can easily be modified to support N records by changing TOP 1 to TOP N.
Credits go to wBob answering this question here

Any other alternative to write this SQL query

I need to select data based upon three conditions:
Find the latest date (StorageDate column) in the table for each record
See if there is more than one entry for the date (StorageDate column) found in the first step for the same ID (ID column)
and then see if DuplicateTypeID = 2
So if table has following data:
ID |StorageDate | DuplicateTypeID
1 |2014-10-22 | 1
1 |2014-10-22 | 2
1 |2014-10-18 | 1
2 |2014-10-12 | 1
3 |2014-10-11 | 1
4 |2014-09-02 | 1
4 |2014-09-02 | 2
Then I should get following results
ID
1
4
I have written the following query, but it is really slow. I was wondering if anyone has a better way to write it.
SELECT DISTINCT(TD.RecordID)
FROM dbo.MyTable TD
JOIN (
SELECT T1.RecordID, T2.MaxDate,COUNT(*) AS RecordCount
FROM MyTable T1 WITH (nolock)
JOIN (
SELECT RecordID, MAX(StorageDate) AS MaxDate
FROM MyTable WITH (nolock)
GROUP BY RecordID)T2
ON T1.RecordID = T2.RecordID AND T1.StorageDate = T2.MaxDate
GROUP BY T1.RecordID, T2.MaxDate
HAVING COUNT(*) > 1
)PT ON TD.RecordID = PT.RecordID AND TD.StorageDate = PT.MaxDate
WHERE TD.DuplicateTypeID = 2
Try this and see how the performance goes:
;WITH
tmp AS
(
SELECT *,
RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
COUNT(ID) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1 -- latest date for each ID
AND StorageDateCount > 1 -- more than 1 entry for date
AND DuplicateTypeID = 2 -- DuplicateTypeID = 2
You can use the analytic function RANK. Can you try this query?
Select recordId from
(
select *, rank() over ( partition by recordId order by [StorageDate] desc) as rn
from mytable
) T
where rn =1
group by recordId
having count(*) >1
and sum( case when duplicatetypeid =2 then 1 else 0 end) >=1

Query to return first date of missing date ranges

Looking for help with a query using SQL 2008 R2... I have a table with client and date fields. Most clients have a record for most dates, however some don't.
For example I have this data:
CLIENTID DT
1 5/1/14
1 5/2/14
2 5/3/14
3 5/1/14
3 5/2/14
I can find the missing dates for each CLIENTID by creating a temp table with all possible dates for the period and then joining that to each CLIENTID and DT and only selecting where there is a NULL.
This is what I can get easily for the date range 5/1/14 to 5/4/14:
CLIENTID DTMISSED
1 5/3/14
1 5/4/14
2 5/1/14
2 5/2/14
2 5/4/14
3 5/3/14
3 5/4/14
However I want to group each consecutive missed period together and get the beginning of each period and the length.
For example, if I use the date range 5/1/14 to 5/4/14 I'd like to get:
CLIENTID DTSTART MISSED
1 5/3/14 2
2 5/1/14 2
2 5/4/14 1
3 5/3/14 2
Thanks for helping!
It's fascinating how much more elegantly, and also more efficiently, this kind of problem can be solved in SQL Server 2012.
First, the tables:
create table #t (CLIENTID int, DT date)
go
insert #t values
(1, '5/1/14'),
(1, '5/2/14'),
(2, '5/3/14'),
(3, '5/1/14'),
(3, '5/2/14')
go
create table #calendar (dt date)
go
insert #calendar values ('5/1/14'),('5/2/14'),('5/3/14'),('5/4/14')
go
Here's the 2008 solution:
;with x as (
select *, row_number() over(order by clientid, dt) as rn
from #calendar c
cross join (select distinct clientid from #t) x
where not exists (select * from #t where c.dt=#t.dt and x.clientid=#t.clientid)
),
y as (
select x1.*, x2.dt as x2_dt, x2.clientid as x2_clientid
from x x1
left join x x2 on x1.clientid=x2.clientid and x1.dt=dateadd(day,1,x2.dt)
),
z as (
select *, (select sum(case when x2_dt is null then 1 else 0 end) from y y2 where y2.rn<=y.rn) as grp
from y
)
select clientid, min(dt), count(*)
from z
group by clientid, grp
order by clientid
Compare it to 2012:
;with x as (
select *, row_number() over(order by dt) as rn
from #calendar c
cross join (select distinct clientid from #t) x
where not exists (select * from #t where c.dt=#t.dt and x.clientid=#t.clientid)
),
y as (
select x1.*, sum(case when x2.dt is null then 1 else 0 end) over(order by x1.clientid,x1.dt) as grp
from x x1
left join x x2 on x1.clientid=x2.clientid and x1.dt=dateadd(day,1,x2.dt)
)
select clientid, min(dt), count(*)
from y
group by clientid, grp
order by clientid
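For comparison, the missed periods can also be grouped without the self-join by subtracting a per-client row number from each missed date, which is a different technique from the two queries above. This is a sketch under assumptions: the table variables @t and @calendar stand in for #t and #calendar, the dates are written as ISO literals, and the result column names are taken from the question. It uses only ROW_NUMBER(), so it should also run on 2008.

```sql
-- Hypothetical table variables standing in for #t and #calendar.
DECLARE @t TABLE (CLIENTID int, DT date);
INSERT INTO @t (CLIENTID, DT) VALUES
  (1, '2014-05-01'), (1, '2014-05-02'),
  (2, '2014-05-03'),
  (3, '2014-05-01'), (3, '2014-05-02');

DECLARE @calendar TABLE (dt date);
INSERT INTO @calendar (dt) VALUES
  ('2014-05-01'), ('2014-05-02'), ('2014-05-03'), ('2014-05-04');

;WITH missing AS (
    -- Every (client, calendar date) pair that has no row in @t.
    -- The row number is assigned after the WHERE filter, so it counts
    -- missed dates only; date minus row number is constant per streak.
    SELECT c.CLIENTID,
           cal.dt,
           DATEADD(day,
                   -(ROW_NUMBER() OVER (PARTITION BY c.CLIENTID ORDER BY cal.dt)),
                   cal.dt) AS grp
    FROM @calendar cal
    CROSS JOIN (SELECT DISTINCT CLIENTID FROM @t) c
    WHERE NOT EXISTS (SELECT 1
                      FROM @t t
                      WHERE t.CLIENTID = c.CLIENTID
                        AND t.DT = cal.dt)
)
SELECT CLIENTID, MIN(dt) AS DTSTART, COUNT(*) AS MISSED
FROM missing
GROUP BY CLIENTID, grp
ORDER BY CLIENTID, DTSTART;
```

On the sample data this yields the four rows the question asks for: client 1 from 2014-05-03 with 2 missed days, client 2 from 2014-05-01 with 2 and from 2014-05-04 with 1, and client 3 from 2014-05-03 with 2.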