Records apart from max date - sql

With help from Metal, I wrote the SQL below:
select id
, OrderDate
, RejectDate
, max(case when RejectDate = '1900-01-01' then '9999-12-31' else RejectDate end) as rSum
from tableA
group by id, OrderDate, RejectDate
Now I would like to find all the records for a particular id that fall below the max reject date, so that I can delete them from a transformation table.

An option is to use row_number():
select
id,
OrderDate,
RejectDate
from (
select
t.*,
row_number() over(
partition by id
order by case when RejectDate = '1900-01-01' then '9999-12-31' else RejectDate end desc
) rn
from tableA t
) t
where rn > 1
The advantage of this technique is that it avoids aggregation, which may lead to better performance. You can also easily turn this into a delete statement by using an updatable CTE, as follows:
with cte as (
select
row_number() over(
partition by id
order by case when RejectDate = '1900-01-01' then '9999-12-31' else RejectDate end desc
) rn
from tableA t
)
delete from cte where rn > 1
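The row_number() selection above can be tried on a small data set. This is only a sketch: it assumes Python's sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in engine, dates stored as ISO text, and made-up sample rows. It covers the select step only, not the delete.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, OrderDate TEXT, RejectDate TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-02-01"),
        (1, "2020-01-05", "1900-01-01"),  # sentinel: treated as the latest date
        (1, "2020-01-09", "2020-03-01"),
        (2, "2020-01-02", "2020-02-15"),
    ],
)

# Rank rows per id, newest reject date first; everything after rank 1
# is a record below the max reject date.
rows_below_max = conn.execute(
    """
    SELECT id, OrderDate, RejectDate
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY CASE WHEN RejectDate = '1900-01-01'
                                 THEN '9999-12-31' ELSE RejectDate END DESC
               ) AS rn
        FROM tableA t
    ) AS ranked
    WHERE rn > 1
    ORDER BY id, RejectDate
    """
).fetchall()
print(rows_below_max)
```

In this sample the sentinel row wins for id 1, so the two rows with real reject dates are returned, and id 2 returns nothing because it has a single row.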

This should work:
SELECT *
FROM tableA t1
INNER JOIN (
SELECT ID, MAX(RejectDate) as MaxRejectDate
FROM tableA
GROUP BY ID) t2 ON t1.ID = t2.ID
WHERE t1.RejectDate < t2.MaxRejectDate

Related

Finding duplicate values in a table where all the columns are not the same

I am working with a set of data in a table.
For simplicity, I have a table like the one below with some sample data:
Some of the data in this table came from a different source; those rows are the ones where cqmRecordID is not null.
I need to find duplicate values in this table and delete the duplicates that came over from the other source (the ones with a cqmRecordID).
A record is considered a duplicate if it has the same values as another record in these columns:
[Name]
Cast([CreatedDate] as Date)
[CreatedBy]
So in the sample data above, record #5 and record #6 would be considered duplicates.
As solutions I came up with these two queries:
Query #1:
select * from (
select recordid, cqmrecordid, ROW_NUMBER() over (partition by name, cast(createddate as date), createdby
order by cqmrecordid, recordid) as rownum
from vmsNCR ) A
where cqmrecordid is not null
order by recordid
Query #2:
select A.recordID, A.cqmRecordID, B.RecordID, B.cqmRecordID
from vmsNCR A
join vmsNCR B
on A.Name = B.Name
and cast(A.CreatedDate as date) = cast(B.CreatedDate as date)
and A.CreatedBy = B.CreatedBy
and A.RecordID != B.RecordID
and A.cqmRecordID is not null
order by A.RecordID
Is there a better approach to this? Is one better than the other performance-wise?
If you want to fetch all the rows without duplicates, then:
select t.* -- or all columns except seqnum
from (select t.*,
row_number() over (partition by name, cast(createddate as date), createdby
order by (case when cqmRecordId is not null then 1 else 2 end)
) as seqnum
from t
) t
where seqnum = 1;
If you want performance, create computed columns and then an index:
alter table t add cqmRecordId_flag as (case when cqmRecordId is null then 0 else 1 end) persisted;
alter table t add createddate_date as (cast(createddate as date)) persisted;
And then an index:
create index idx_t_4 on t(name, createddate_date, createdby, cqmRecordId_flag desc);
EDIT:
If you actually just want to delete the NULL values from the table, you can use:
delete t from t
where t.cqmRecordId is null and
exists (select 1
from t t2
where t2.name = t.name and
convert(date, t2.createddate_date) =convert(date, t.createddate_date) and
t2.createdby = t.createdby and
t2.cqmRecordId is not null
);
You can use the same logic with select to just select the duplicates.
Try the query below; it might work for you:
;WITH TestCTE
AS
(
SELECT *,ROW_NUMBER() OVER(
PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy]
ORDER BY RecordId
) AS RowNumber
FROM vmsNCR
)
DELETE FROM TestCTE
WHERE RowNumber > 1
Use the code below to eliminate duplicates:
;WITH CTE
AS
(
SELECT ROW_NUMBER() OVER(
PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy]
ORDER BY cqmRecordId
) AS Rnk
,*
FROM vmsNCR
)
DELETE FROM CTE
WHERE Rnk <> 1

get only row that meet condition if such row exist and if not get the row that meet another condition

This sounds like a simple question, but I just can't find the right way.
Given the simplified table:
with t as (
select ordernumber, orderdate, case when ordertype in (5,21) then 1 else 0 end is_restore , ordertype, row_number() over(order by orderdate) rn from
(
select to_date('29.08.08','DD.MM.YY') orderdate,'313' ordernumber, 1 as ordertype from dual union all
select to_date('13.03.15','DD.MM.YY') orderdate, '90/4/2' ordernumber, 5 as ordertype from dual
)
)
select * from t -- where clause should be here
For every row, is_restore is guaranteed to be 1 or 0.
If the table has a row where is_restore=1, then select ordernumber, orderdate of that row and nothing else.
If the table does not have a row where is_restore=1, then select ordernumber, orderdate of the row where rn=1 (a row where rn=1 is guaranteed to exist in the table).
Given the requirements above, what do I need to put in the where clause to get the following?
You could use ROW_NUMBER:
CREATE TABLE t
AS
select ordernumber, orderdate,
case when ordertype in (5,21) then 1 else 0 end is_restore, ordertype,
row_number() over(order by orderdate) rn
from (
select to_date('29.08.08','DD.MM.YY') orderdate,'313' ordernumber,
1 as ordertype
from dual union all
select to_date('13.03.15','DD.MM.YY') orderdate, '90/4/2' ordernumber,
5 as ordertype
from dual);
-------------------
with cte as (
select t.*,
ROW_NUMBER() OVER(/*PARTITION BY ...*/ ORDER BY is_restore DESC, rn) AS rnk
from t
)
SELECT *
FROM cte
WHERE rnk = 1;
Here is SQL that doesn't use window functions. It may be useful for those whose databases don't support OVER ( ... ), or when the query is based on indexed fields.
SELECT
*
FROM t
WHERE t.is_restore = 1
OR (
NOT EXISTS (SELECT 1 FROM t WHERE t.is_restore = 1)
AND t.rn = 1
)
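The "prefer the is_restore row, otherwise fall back to rn=1" rule can be checked for both cases. A minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in engine; the helper pick_row and the second sample set are hypothetical and exist only for illustration.

```python
import sqlite3

def pick_row(rows):
    """rows: list of (ordernumber, is_restore, rn). Returns the single selected row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (ordernumber TEXT, is_restore INTEGER, rn INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    # Rows with is_restore = 1 sort first; ties and the fallback use rn.
    return conn.execute(
        """
        WITH cte AS (
            SELECT t.*,
                   ROW_NUMBER() OVER (ORDER BY is_restore DESC, rn) AS rnk
            FROM t
        )
        SELECT ordernumber, is_restore, rn FROM cte WHERE rnk = 1
        """
    ).fetchone()

# Case 1: a restore row exists, so it is chosen over rn = 1.
with_restore = pick_row([("313", 0, 1), ("90/4/2", 1, 2)])
# Case 2: no restore row, so the rn = 1 row is chosen.
without_restore = pick_row([("313", 0, 1), ("77", 0, 2)])
print(with_restore, without_restore)
```

Note that the window-function version always returns exactly one row, whereas the NOT EXISTS version returns every row with is_restore = 1 if there are several.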

penultimate date for each record

I'm struggling to create a select that shows the penultimate date for each record in my DB.
For example:
id date
1 01.01.2018
1 05.01.2018
1 06.02.2018
2 01.06.2018
2 03.06.2018
3 12.12.2017
From these records I need to write a select that shows the following:
ID max_date penultimate
1 06.02.2018 05.01.2018
2 03.06.2018 01.06.2018
3 12.12.2017 NULL
Any idea how to do it? Many thanks in advance.
Use conditional aggregation and the ANSI-standard row_number() or dense_rank() functions:
select id,
max(date) as max_date,
max(case when seqnum = 2 then date end) as penultimate_date
from (select t.*,
dense_rank() over (partition by id order by date desc) as seqnum
from t
) t
where seqnum in (1, 2)
group by id;
Use row_number() instead if tied dates should count as separate rows, so that the penultimate date can equal the max date when the latest date occurs more than once.
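The conditional-aggregation query can be run against the question's sample data. A minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in engine, dates rewritten as ISO text, and a hypothetical column name dt in place of date:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dt TEXT)")
# Sample rows from the question (DD.MM.YYYY converted to YYYY-MM-DD).
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "2018-01-01"), (1, "2018-01-05"), (1, "2018-02-06"),
    (2, "2018-06-01"), (2, "2018-06-03"),
    (3, "2017-12-12"),
])

# Keep only the two newest distinct dates per id, then pivot them
# into max_date and penultimate_date with conditional aggregation.
result = conn.execute("""
    SELECT id,
           MAX(dt) AS max_date,
           MAX(CASE WHEN seqnum = 2 THEN dt END) AS penultimate_date
    FROM (
        SELECT t.*,
               DENSE_RANK() OVER (PARTITION BY id ORDER BY dt DESC) AS seqnum
        FROM t
    ) AS ranked
    WHERE seqnum IN (1, 2)
    GROUP BY id
    ORDER BY id
""").fetchall()
print(result)
```

For id 3, which has a single date, no row has seqnum = 2, so the CASE expression yields only NULLs and penultimate_date comes back as None.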
Use GROUP BY to get the MAX and a correlated subquery with another MAX but this time lower than the former.
SELECT
T.id,
MAX(T.date) max_date,
(
SELECT
MAX(N.date)
FROM
YourTable N
WHERE
N.id = T.id AND
N.date < MAX(T.date)
) penultimate
FROM
YourTable T
GROUP BY
T.id
Just an optimized query:
;WITH cte AS
(
SELECT id AS ID
,[date] AS max_date
,LEAD ([date], 1) OVER (PARTITION BY id ORDER BY [date] DESC) AS penultimate -- default is NULL when there is no earlier date
,ROW_NUMBER() OVER(PARTITION BY id ORDER BY [date] DESC) AS RN
FROM Table3
)
SELECT ID,max_date,penultimate
FROM cte
WHERE RN=1
I wrote it this way:
SELECT ID
,max(StartDate) MaxDate
,(
SELECT StartDate
FROM YourTable t2
WHERE t2.id = t1.id
ORDER BY StartDate DESC OFFSET 1 ROWS FETCH NEXT 1 ROW ONLY
) penultimate
FROM YourTable t1
GROUP BY id

convert row to column using Pivot without any clause

I have a table like below.
I need to get the data like below.
I have created two temp tables and achieved the result like this. Please help me to do the same with PIVOT.
I wouldn't use PIVOT for that; to my mind this is simpler to do with group by and row_number:
select UserId, max(starttime) as starttime, max(endtime) as endtime
from (
select UserId,
case when StartOrEnd = 'S' then time end as starttime,
case when StartOrEnd = 'E' then time end as endtime,
row_number() over (partition by UserID order by time asc)
+ case when StartOrEnd = 'S' then 1 else 0 end as GRP
from table1
) X
group by UserId, GRP
order by starttime
The derived table splits the time into start / end time columns (to handle cases where only one exists) and uses a trick with row number to group the S / E items together. The outer select just groups the rows into the same row.
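The grouping trick is easier to see with numbers: within one user, an S row gets row_number + 1 and the following E row gets row_number + 0, so both land on the same GRP value. A minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in engine and the six sample rows used elsewhere in this thread:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (UserId INTEGER, StartOrEnd TEXT, time TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, "S", "2015-07-27 16:45"), (1, "E", "2015-07-27 16:46"),
    (2, "S", "2015-07-27 16:47"), (2, "E", "2015-07-27 16:48"),
    (1, "S", "2015-07-27 16:49"), (1, "E", "2015-07-27 16:50"),
])

# For user 1 the per-user row numbers are 1, 2, 3, 4 (S, E, S, E),
# so grp becomes 2, 2, 4, 4: each S/E pair shares one group.
pairs = conn.execute("""
    SELECT UserId, MAX(starttime) AS starttime, MAX(endtime) AS endtime
    FROM (
        SELECT UserId,
               CASE WHEN StartOrEnd = 'S' THEN time END AS starttime,
               CASE WHEN StartOrEnd = 'E' THEN time END AS endtime,
               ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY time)
                 + CASE WHEN StartOrEnd = 'S' THEN 1 ELSE 0 END AS grp
        FROM table1
    ) AS x
    GROUP BY UserId, grp
    ORDER BY starttime
""").fetchall()
print(pairs)
```

The sketch relies on S and E rows strictly alternating per user, as they do in this sample data.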
Not as efficient a solution as JamesZ's, but it should work:
create table #tst (userid int,start_end char(1),times datetime)
insert #tst values
(1,'S','07-27-2015 16:45'),
(1,'E','07-27-2015 16:46'),
(2,'S','07-27-2015 16:47'),
(2,'E','07-27-2015 16:48'),
(1,'S','07-27-2015 16:49'),
(1,'E','07-27-2015 16:50')
;WITH cte
AS (SELECT Row_number()OVER(ORDER BY times) rn,*
FROM #tst),
cte1
AS (SELECT a.userid,
a.start_end,
a.times,
CASE WHEN a.userid = b.userid THEN 0 ELSE 1 END AS com,
a.rn
FROM cte a
LEFT OUTER JOIN cte b
ON a.rn = b.rn + 1),
cte2
AS (SELECT userid,
start_end,
times,
(SELECT Sum(com)
FROM cte1 b
WHERE b.rn <= a.rn) AS row_num
FROM cte1 a)
SELECT USERID,
starttime=Min(CASE WHEN start_end = 'S' THEN times END),
endtime=Max(CASE WHEN start_end = 'E' THEN times END)
FROM cte2
GROUP BY USERID,
row_num
Here is another method
declare @t table(userid int, StartOrEnd char(1), time datetime)
insert into @t
select 1,'S','2015-07-27 16:45' union all
select 1,'E','2015-07-27 16:46' union all
select 2,'S','2015-07-27 16:47' union all
select 2,'E','2015-07-27 16:48' union all
select 1,'S','2015-07-27 16:49' union all
select 1,'E','2015-07-27 16:50'
select userid,min(time) as minimum_time, max(time) as maximum_time from
(
select *, row_number() over (partition by cast(UserID as varchar(10))
+StartOrEnd order by time asc) as sno
from @t
) as t
group by userid,sno
Result
userid minimum_time maximum_time
----------- ----------------------- -----------------------
1 2015-07-27 16:45:00.000 2015-07-27 16:46:00.000
2 2015-07-27 16:47:00.000 2015-07-27 16:48:00.000
1 2015-07-27 16:49:00.000 2015-07-27 16:50:00.000

Get average time between record creation

So I have data like this:
UserID CreateDate
1 10/20/2013 4:05
1 10/20/2013 4:10
1 10/21/2013 5:10
2 10/20/2012 4:03
I need to group by user and get the average time between CreateDates. My desired results would be like this:
UserID AvgTime(minutes)
1 753.5
2 0
How can I find the difference between CreateDates for all records returned for a User grouping?
EDIT:
Using SQL Server 2012
Try this:
SELECT A.UserID,
AVG(CAST(DATEDIFF(MINUTE,B.CreateDate,A.CreateDate) AS FLOAT)) AvgTime
FROM #YourTable A
OUTER APPLY (SELECT TOP 1 *
FROM #YourTable
WHERE UserID = A.UserID
AND CreateDate < A.CreateDate
ORDER BY CreateDate DESC) B
GROUP BY A.UserID
This approach should also work:
;WITH CTE AS (
Select userId, createDate,
row_number() over (partition by userid order by createdate) rn
from Table1
)
select t1.userid,
isnull(avg(datediff(second, t1.createdate, t2.createdate)*1.0/60),0) AvgTime
from CTE t1 left join CTE t2 on t1.UserID = t2.UserID and t1.rn +1 = t2.rn
group by t1.UserID;
Updated: Thanks to @Lemark for pointing out that the number of differences = recordCount - 1.
Since you're using 2012, you can use lead() to do this:
with cte as
(select
userid,
(datediff(second, createdate,
lead(CreateDate) over (Partition by userid order by createdate)
)/60.0) datdiff -- divide by 60.0 to avoid integer truncation
From table1
)
select
userid,
avg(datdiff)
from cte
group by userid
Something like this:
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreateDate) RN,
UserID,
CreateDate
FROM Tbl
)
SELECT
T1.UserID,
ISNULL(AVG(DATEDIFF(mi, T1.CreateDate, T2.CreateDate) * 1.0), 0) AvgTime -- T2 is the next row; the last row per user has no T2 and is ignored by AVG
FROM CTE T1
LEFT JOIN CTE T2
ON T1.UserID = T2.UserID
AND T1.RN = T2.RN - 1
GROUP BY T1.UserID
With SQL Server 2012 you can use the ROW_NUMBER function and a self-join to find the "previous" row in each group:
WITH Base AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreateDate) RowNum,
UserId,
CreateDate
FROM Users
)
SELECT
B1.UserID,
ISNULL(
AVG(
DATEDIFF(mi,B2.CreateDate,B1.CreateDate) * 1.0
)
,0) [Average]
FROM Base B1
LEFT JOIN Base B2
ON B1.UserID = B2.UserID
AND B1.RowNum = B2.RowNum + 1
GROUP BY B1.UserId
Although I get a different answer for UserID 1 - I get an average of (5 + 1500) / 2 = 752.5.
This only works in SQL Server 2012 or later. You can use the LEAD analytic function:
CREATE TABLE dates (
id integer,
created datetime not null
);
INSERT INTO dates (id, created)
SELECT 1 AS id, '10/20/2013 4:05' AS created
UNION ALL SELECT 1, '10/20/2013 4:10'
UNION ALL SELECT 1, '10/21/2013 5:10'
UNION ALL SELECT 2, '10/20/2012 4:03';
SELECT id, isnull(avg(diff), 0)
FROM (
SELECT id,
datediff(MINUTE,
created,
LEAD(created, 1, NULL) OVER(partition BY id ORDER BY created)
) AS diff
FROM dates
) as diffs
GROUP BY id;
http://sqlfiddle.com/#!6/4ce89/22
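The LEAD approach can be reproduced outside SQL Server to check the arithmetic. A minimal sketch, assuming Python's sqlite3 module (SQLite 3.25 or later) as a stand-in engine; since SQLite has no DATEDIFF, the gap is computed from strftime('%s', ...) epoch seconds, which is a substitution and not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dates (id INTEGER, created TEXT NOT NULL)")
# Sample rows from the question, rewritten as ISO timestamps.
conn.executemany("INSERT INTO dates VALUES (?, ?)", [
    (1, "2013-10-20 04:05"), (1, "2013-10-20 04:10"),
    (1, "2013-10-21 05:10"), (2, "2012-10-20 04:03"),
])

# LEAD fetches the next creation time per id; the last row of each id
# has no successor, so its gap is NULL and AVG skips it.
avg_gaps = conn.execute("""
    SELECT id, COALESCE(AVG(diff_minutes), 0) AS avg_minutes
    FROM (
        SELECT id,
               (strftime('%s', LEAD(created) OVER (PARTITION BY id ORDER BY created))
                - strftime('%s', created)) / 60.0 AS diff_minutes
        FROM dates
    ) AS diffs
    GROUP BY id
    ORDER BY id
""").fetchall()
print(avg_gaps)
```

User 1 has gaps of 5 and 1500 minutes, averaging 752.5; user 2 has a single row and therefore no gap, so the COALESCE fallback of 0 applies. Dividing by 60.0 instead of 60 keeps the fractional part.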