Related
I am working with a set of data in a table.
For simplicity i have the table like below with some sample data:
Some of the data in this table came from a different source, such data are the ones that have cqmRecordID != null
I need to find duplicate values in this table and delete the duplicate ones that came over from the other source (ones with a cqmRecordID)
A record is considered duplicate if they have the same values for these cols:
[Name]
Cast([CreatedDate] as Date)
[CreatedBy]
So in the sample data i have above, record #5 and record #6 would be considered duplicates.
As solutions I came up with these two queries:
Query #1:
select * from (
select recordid, cqmrecordid, ROW_NUMBER() over (partition by name, cast(createddate as date), createdby
order by cqmrecordid, recordid) as rownum
from vmsNCR ) A
where cqmrecordid is not null
order by recordid
Query #2:
select A.recordID, A.cqmRecordID, B.RecordID, B.cqmRecordID
from vmsNCR A
join vmsNCR B
on A.Name = B.Name
and cast(A.CreatedDate as date) = cast(B.CreatedDate as date)
and A.CreatedBy = B.CreatedBy
and A.RecordID != B.RecordID
and A.cqmRecordID is not null
order by A.RecordID
Is there a better approach to this? Is one better than the other performance wise?
If you want to fetch all the rows without duplicates, then:
select t.* -- or all columns except seqnum
from (select t.*,
row_number() over (partition by name, cast(createddate as date), createdby
order by (case when cqmRecordId is not null then 1 else 2 end)
) as seqnum
from t
) t
where seqnum = 1;
If you want performance, create a columns and then an index:
alter table t add cqmRecordId_flag as (case when cqmRecordId is null then 0 else 1 end) persisted;
alter table t add createddate_date as (cast(createddate as date)) persisted;
And then an index:
create index idx_t_4 on t(name, createddate_date, createdby, cqmRecordId_flag desc);
EDIT:
If you actually just want to delete the NULL values from the table, you can use:
delete t from t
where t.cqmRecordId is null and
exists (select 1
from t t2
where t2.name = t.name and
convert(date, t2.createddate_date) =convert(date, t.createddate_date) and
t2.createdby = t.createdby and
t2.cqmRecordId is not null
);
You can use the same logic with select to just select the duplicates.
Try below Query it might work for You
;WITH TestCTE
AS
(
SELECT *,ROW_NUMBER() OVER(
PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy]
ORDER BY RecordId
) AS RowNumber
)
DELETE FROM TestCTE
WHERE RowNumber > 1
Use the below code to eliminate duplicates
;WITH CTE
AS
(
SELECT ROW_NUMBER() OVER(
PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy]
ORDER BY cqmRecordId
) AS Rnk
,*
)
DELETE FROM CTE
WHERE Rnk <> 1
I have been struggling for a while now with that problem and I need some help.
I have the following query :
CREATE TABLE Example(
Start NVARCHAR(8),
Endd NVARCHAR(8),
Col1 NVARCHAR(2),
Col2 NVARCHAR(2));
INSERT into Example (Start,Endd,Col1,Col2)
VALUES ('20130801','20140316','02','01'),
('20140317','20140319','04','02'),
('20140320','20140320','04','02'),
('20140321','20140421','02','Z8'),
('20140422','20140429','02','Z9'),
('20140430','20140902','04','02'),
('20140903','20150201','04','02'),
('20150202','20150223','04','02'),
('20150224','20150527','04','02'),
('20150528','99991231','04','02')
;
select MIN(Start)AS Start,MAX(Endd) AS Endd,Col1,Col2 from
(
SELECT top (100000000) Start, Endd,Col1, Col2,dense_rank() over(partition by Col1, Col2 order by Start,Endd) as rank
,LEAD (Col1) OVER (order by Start,Endd DESC) as l1
,LEAD (Col2) OVER (order by Start,Endd DESC) as l2
,LAG (Col1) OVER (order by Start,Endd DESC) as l11
,LAG (Col2) OVER (order by Start,Endd DESC) as l22
FROM Example sp
order by Start,Endd
)rq
GROUP BY Col1,Col2,case when (rq.l1=Col1 and rq.l2=Col2) or (rq.l11=Col1 and rq.l22=Col2) then 0 else rank end
order by Start,Endd;
My goal is to merge those data to have the following result:
However as you can see in the query result, when i have the same values for Col1 and Col2 on different time periods, the merge is not done correctly. It basically tries to merge them all in one, which create issues in the value for the new period.
Would someone be able to help me?
You were getting close in your query and you may have found a solution by now. This is a classic Islands and Gaps problem. I am giving the longer version with no use of LEAD AND LAG. You can replace perhaps 45% of the code below by using those windowing functions with perhaps a dense rank.
DECLARE #Example TABLE(
Start NVARCHAR(8),
Endd NVARCHAR(8),
Col1 NVARCHAR(2),
Col2 NVARCHAR(2));
INSERT into #Example (Start,Endd,Col1,Col2)
VALUES ('20130801','20140316','02','01'),
('20140317','20140319','04','02'),
('20140320','20140320','04','02'),
('20140321','20140421','02','Z8'),
('20140422','20140429','02','Z9'),
('20140430','20140902','04','02'),
('20140903','20150201','04','02'),
('20150202','20150223','04','02'),
('20150224','20150527','04','02'),
('20150528','99991231','04','02')
SELECT
TableID=MAX(TableID),Col1=MAX(Col1),Col2=MAX(Col2),Start=MIN(Start),Endd=MAX(Endd)
FROM
(
SELECT
TableID,Col1,Col2,Start,Endd,ChangeID=MAX(ChangeOnlyTableID)
FROM
(
SELECT
AllRecords.TableID,AllRecords.Col1,AllRecords.Col2,AllRecords.Start,AllRecords.Endd,ChangeOnlyTableID=ChangesOnly.TableID
FROM
(
SELECT * FROM
(
SELECT
This.Start,This.Endd,This.TableID,This.Col1,This.Col2,
Changed=CASE WHEN (Next.Col1=This.Col1 AND Next.Col2=This.Col2) THEN 0 ELSE 1 END
FROM
(
SELECT TableID=ROW_NUMBER() OVER(ORDER BY Start,Endd,Col1,Col2),Start,Endd,Col1,Col2 FROM #Example
)AS This
LEFT OUTER JOIN
(
SELECT TableID=ROW_NUMBER() OVER(ORDER BY Start,Endd,Col1,Col2),Start,Endd,Col1,Col2 FROM #Example
)
AS Next ON This.TableID=Next.TableID+1
)
AS ChangeMarkers
WHERE Changed=1
)
AS AllRecords
INNER JOIN
(
SELECT * FROM
(
SELECT
This.Start,This.Endd,This.TableID,This.Col1,This.Col2,
Changed=CASE WHEN (Next.Col1=This.Col1 AND Next.Col2=This.Col2) THEN 0 ELSE 1 END
FROM
(
SELECT TableID=ROW_NUMBER() OVER(ORDER BY Start,Endd,Col1,Col2),Start,Endd,Col1,Col2 FROM #Example
) AS This
LEFT OUTER JOIN
(
SELECT TableID=ROW_NUMBER() OVER(ORDER BY Start,Endd,Col1,Col2),Start,Endd,Col1,Col2 FROM #Example
) AS Next ON This.TableID=Next.TableID+1
)
AS ChangeMarkers
WHERE Changed=1
)
AS ChangesOnly ON ChangesOnly.Col1=AllRecords.Col1 AND ChangesOnly.Col2=AllRecords.Col2 AND ChangesOnly.TableID<=AllRecords.TableID
)AS JoinedResults
GROUP BY
TableID,Col1,Col2,Start,Endd
)
AS Final
GROUP BY
Col1,Col2,ChangeID
ORDER BY
MAX(TableID)
You may choose to shorten this somewhat with a few CTE's to produce a query such as:
;WITH TableWithIDs AS
(
SELECT TableID=ROW_NUMBER() OVER(ORDER BY Start,Endd,Col1,Col2),Start,Endd,Col1,Col2 FROM #Example
)
,ChangeMarkers AS
(
SELECT
This.Start,This.Endd,This.TableID,This.Col1,This.Col2,
Changed=CASE WHEN (Next.Col1=This.Col1 AND Next.Col2=This.Col2) THEN 0 ELSE 1 END
FROM
TableWithIDs AS This
LEFT OUTER JOIN TableWithIDs AS Next ON This.TableID=Next.TableID+1
)
,ChangesOnly AS
(
SELECT * FROM ChangeMarkers WHERE Changed=1
)
,
JoinedResults AS
(
SELECT
AllRecords.TableID,AllRecords.Col1,AllRecords.Col2,AllRecords.Start,AllRecords.Endd,ChangeOnlyTableID=ChangesOnly.TableID
FROM
ChangeMarkers AllRecords
INNER JOIN ChangesOnly ON ChangesOnly.Col1=AllRecords.Col1 AND ChangesOnly.Col2=AllRecords.Col2 AND ChangesOnly.TableID<=AllRecords.TableID
)
SELECT
TableID=MAX(TableID),Col1=MAX(Col1),Col2=MAX(Col2),Start=MIN(Start),Endd=MAX(Endd)
FROM
(
SELECT
TableID,Col1,Col2,Start,Endd,ChangeID=MAX(ChangeOnlyTableID)
FROM
JoinedResults
GROUP BY
TableID,Col1,Col2,Start,Endd
)
AS Final
GROUP BY
Col1,Col2,ChangeID
ORDER BY
MAX(TableID)
There are also some clever hacks that can be applied further using virtual keys however I went the most direct but more verbose route. You should be able to improve on this using a DENSE_RANK() with LEAD() OR LAG()
I have a table like this
Date----- ----------Value--------- Group <br>
2017-01-01--------10--------------1--<br>
2017-01-02---------9---------------1--<br>
2017-01-03 --------5---------------2--<br>
2017-01-04 --------4---------------2--<br>
i want to update all value column in the table such that it is set to minimum date's value in that group
like this
Date----- ----------Value--------- Group <br>
2017-01-01--------10--------------1--<br>
2017-01-02---------10---------------1--<br>
2017-01-03 --------5---------------2--<br>
2017-01-04 --------5---------------2--<br>
Here you go, 2 sub-queries, the first to calculate min date per group then join back to original table to get the associated value. Then finally join this to the original table to update all associated groups with that value:
UPDATE M SET M.Value = RESULT.Value FROM MyTable M
INNER JOIN (
SELECT MV.Group, M.Value FROM MyTable M
INNER JOIN (
SELECT MIN(Date) as MinDateValue, Group FROM MyTable
GROUP BY Group
) MV ON MV.MinDateValue = M.Date AND MV.Group = M.Group
) RESULT ON RESULT.Group = M.Group
First get min date and value from sub query.Based on this result update main table
CREATE TABLE #Table(_Date Date,value INT,_Group INT)
INSERT INTO #Table(_Date ,value ,_Group)
SELECT '2017-01-01',10,1 UNION ALL
SELECT '2017-01-02',9,1 UNION ALL
SELECT '2017-01-03',5,2 UNION ALL
SELECT '2017-01-04',4,2
UPDATE #Table SET value = _Output._Value
FROM
(
SELECT A._Date , A._Group , T.value _Value
FROM #Table T
JOIN
(
SELECT MIN(_Date) _Date ,_Group
FROM #Table
GROUP BY _Group
) A ON A._Date = T._Date
) _Output WHERE _Output._Group = #Table._Group
SELECT * FROM #Table
You can also use a CTE.
Query
;with cte as(
select [rn] = row_number() over(
partition by [Group]
order by [Date]
), *
from [your_table_name]
)
update t1
set t1.[Value] = t2.[Value]
from cte t1
join cte t2
on t1.[Group] = t2.[Group]
and t1.[rn] > t2.[rn];
I've been staring at this for hours and hours and can't come up with an "elegant" set-based way of getting the result set I need...
Here's my sample data (my real data could be 1,000,000+ rows)...
DECLARE #t AS TABLE (ID int,ID1 nvarchar(15),[DATE] date,PERIOD int,[TYPE] nchar(1));
INSERT INTO #t (ID,ID1,[DATE],PERIOD,[TYPE])
VALUES
(1,N'NUM1','2016-01-01',1,N'B'),
(2,N'NUM1','2016-01-01',2,N'A'),
(3,N'NUM1','2016-01-01',3,N'A'),
(4,N'NUM1','2016-01-01',4,N'B'),
(5,N'NUM1','2016-01-01',4,N'A'),
(6,N'NUM1','2016-01-01',5,N'A'),
(7,N'NUM1','2016-01-02',1,N'A'),
(8,N'NUM1','2016-01-02',2,N'A'),
(9,N'NUM1','2016-01-02',3,N'A'),
(10,N'NUM1','2016-01-02',4,N'A'),
(11,N'NUM1','2016-01-02',5,N'A'),
(12,N'NUM2','2016-01-01',1,N'A'),
(13,N'NUM2','2016-01-01',1,N'B'),
(14,N'NUM2','2016-01-01',2,N'A'),
(15,N'NUM2','2016-01-01',3,N'A'),
(16,N'NUM2','2016-01-01',4,N'B'),
(17,N'NUM2','2016-01-01',4,N'A'),
(18,N'NUM2','2016-01-01',5,N'A'),
(19,N'NUM2','2016-01-02',1,N'A'),
(20,N'NUM2','2016-01-02',2,N'B'),
(21,N'NUM2','2016-01-02',3,N'A'),
(22,N'NUM2','2016-01-02',4,N'A'),
(23,N'NUM2','2016-01-02',4,N'B'),
(24,N'NUM2','2016-01-02',5,N'A');
Here is the result set I'm trying to get...
1,'NUM1','2016-01-01',1,'B'
2,'NUM1','2016-01-01',2,'A'
3,'NUM1','2016-01-01',3,'A'
5,'NUM1','2016-01-01',4,'A'
6,'NUM1','2016-01-01',5,'A'
7,'NUM1','2016-01-02',1,'A'
8,'NUM1','2016-01-02',2,'A'
9,'NUM1','2016-01-02',3,'A'
10,'NUM1','2016-01-02',4,'A'
11,'NUM1','2016-01-02',5,'A'
12,'NUM2','2016-01-01',1,'A'
14,'NUM2','2016-01-01',2,'A'
15,'NUM2','2016-01-01',3,'A'
17,'NUM2','2016-01-01',4,'A'
18,'NUM2','2016-01-01',5,'A'
19,'NUM2','2016-01-02',1,'A'
20,'NUM2','2016-01-02',2,'B'
21,'NUM2','2016-01-02',3,'A'
22,'NUM2','2016-01-02',4,'A'
24,'NUM2','2016-01-02',5,'A'
Simply put, each day has 5 periods. They can be of type A or B. I need to get the A types. but if there are no A types, I need to get the B types... (Sounds so simple when I write it out.., but my brain will not come up with something suitable)
Pleeeeeease put me out of my misery..
You can use ROW_NUMBER for this:
SELECT ID, ID1, [DATE], PERIOD, [TYPE]
FROM (
SELECT ID, ID1, [DATE], PERIOD, [TYPE],
ROW_NUMBER() OVER (PARTITION BY ID1, [DATE], PERIOD
ORDER BY [TYPE]) AS rn
FROM #t) AS t
WHERE t.rn = 1
Using ORDER BY [TYPE] in the OVER clause of ROW_NUMBER places 'A' records on top of 'B' records. If there are no 'A' records for a given ID1, [DATE], PERIOD then B records are assigned rn = 1.
Your desired outpout contradicts the statement that "I need to get the A types. but if there are no A types, I need to get the B types... ". Every date in the data has one or more 'A' types. By the statement, the output should include only the 'A' types. But if the statement is correct, then this should work:
Select d.[DATE], t.Id, t.ID1, t.PERIOD, t.[TYPE]
from (select distinct [date] from #t) d
left join #t t
on t.[date] = d.[date]
and t.type = case when exists
(select * from #t
where [date] = d.[Date]
and type = 'A') then 'A'
else 'B' End
I've just come up with
SELECT * FROM #t WHERE [TYPE]='A'
UNION ALL
SELECT * FROM #t t1 WHERE [TYPE]='B' AND NOT EXISTS (SELECT ID FROM #t WHERE ID1=t1.ID1 AND [TYPE]='A' AND [DATE]=t1.[DATE] AND Period=t1.Period)
ORDER BY ID;
which give's me what I need...
I have a table like below.
I need to get the data like below.
I have created two temp tables and achieved the result like this. Please help me to do the same with PIVOT.
At least I wouldn't use pivot for that, to my mind this is simpler to do with group by and row_number:
select UserId, max(starttime) as starttime, max(endtime) as endtime
from (
select UserId,
case when StartOrEnd = 'S' then time end as starttime,
case when StartOrEnd = 'E' then time end as endtime,
row_number() over (partition by UserID order by time asc)
+ case when StartOrEnd = 'S' then 1 else 0 end as GRP
from table1
) X
group by UserId, GRP
order by starttime
The derived table splits the time into start / end time columns (to handle cases where only one exists) and uses a trick with row number to group the S / E items together. The outer select just groups the rows into the same row.
Example in SQL Fiddle
Not a efficient solution as JamesZ but should work
create table #tst (userid int,start_end char(1),times datetime)
insert #tst values
(1,'S','07-27-2015 16:45'),
(1,'E','07-27-2015 16:46'),
(2,'S','07-27-2015 16:47'),
(2,'E','07-27-2015 16:48'),
(1,'S','07-27-2015 16:49'),
(1,'E','07-27-2015 16:50')
WITH cte
AS (SELECT Row_number()OVER(ORDER BY times) rn,*
FROM #tst),
cte1
AS (SELECT a.userid,
a.start_end,
a.times,
CASE WHEN a.userid = b.userid THEN 0 ELSE 1 END AS com,
a.rn
FROM cte a
LEFT OUTER JOIN cte b
ON a.rn = b.rn + 1),
cte2
AS (SELECT userid,
start_end,
times,
(SELECT Sum(com)
FROM cte1 b
WHERE b.rn <= a.rn) AS row_num
FROM cte1 a)
SELECT USERID,
starttime=Min(CASE WHEN start_end = 's' THEN times END),
endtime=Max(CASE WHEN start_end = 'e' THEN times END)
FROM cte2
GROUP BY USERID,
row_num
Here is another method
declare #t table(userid int, StartOrEnd char(1), time datetime)
insert into #t
select 1,'S','2015-07-27 16:45' union all
select 1,'E','2015-07-27 16:46' union all
select 2,'S','2015-07-27 16:47' union all
select 2,'E','2015-07-27 16:48' union all
select 1,'S','2015-07-27 16:49' union all
select 1,'E','2015-07-27 16:50'
select userid,min(time) as minimum_time, max(time) as maximum_time from
(
select *, row_number() over (partition by cast(UserID as varchar(10))
+StartOrEnd order by time asc) as sno
from #t
) as t
group by userid,sno
Result
userid minimum_time maximum_time
----------- ----------------------- -----------------------
1 2015-07-27 16:45:00.000 2015-07-27 16:46:00.000
2 2015-07-27 16:47:00.000 2015-07-27 16:48:00.000
1 2015-07-27 16:49:00.000 2015-07-27 16:50:00.000