Records apart from max date


Metal helped me write the SQL below:
select id
, OrderDate
, RejectDate
, max(case when RejectDate = '1900-01-01' then '9999-12-31' else RejectDate end) as rSum
from tableA
group by id, OrderDate, RejectDate
Now I would like to find all the records for a particular id that fall below the max reject date, so that I can delete them from a transformation table.

An option is to use row_number():
select
id,
OrderDate,
RejectDate
from (
select
t.*,
row_number() over(
partition by id
order by case when RejectDate = '1900-01-01' then '9999-12-31' else RejectDate end desc
) rn
from tableA t
) t
where rn > 1
The advantage of this technique is that it avoids aggregation, which may lead to better performance. Also, you can easily turn this into a delete statement by leveraging the concept of updateable CTE, as follows:
with cte as (
select
row_number() over(
partition by id
order by case when RejectDate = '1900-01-01' then '9999-12-31' else RejectDate end desc
) rn
from tableA t
)
delete from cte where rn > 1
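To try the ranking idea without a SQL Server instance, here is a minimal sketch that runs the same row_number() logic in SQLite through Python's sqlite3 module. The sample rows are made up, and it assumes an SQLite build with window functions (3.25 or later). SQLite cannot delete through a CTE, so the sketch deletes by rowid instead; that part is a swapped-in technique, not the updateable-CTE form shown above.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, OrderDate TEXT, RejectDate TEXT)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-02-01"),
        (1, "2020-01-05", "2020-03-01"),
        (1, "2020-01-09", "1900-01-01"),  # placeholder date, ranked as the latest
        (2, "2020-01-02", "2020-02-15"),
    ],
)

# Rank rows per id, newest reject date first; '1900-01-01' sorts as '9999-12-31'.
RANKED = """
    SELECT rowid AS rid, id, OrderDate, RejectDate,
           ROW_NUMBER() OVER (
               PARTITION BY id
               ORDER BY CASE WHEN RejectDate = '1900-01-01'
                             THEN '9999-12-31' ELSE RejectDate END DESC
           ) AS rn
    FROM tableA
"""

# Every row except the top-ranked one per id.
older_rows = conn.execute(
    f"SELECT id, OrderDate, RejectDate FROM ({RANKED}) WHERE rn > 1 "
    "ORDER BY id, RejectDate"
).fetchall()

# SQLite-specific delete: remove those rows by rowid.
conn.execute(
    f"DELETE FROM tableA WHERE rowid IN (SELECT rid FROM ({RANKED}) WHERE rn > 1)"
)
remaining = conn.execute(
    "SELECT id, RejectDate FROM tableA ORDER BY id"
).fetchall()

print(older_rows)
print(remaining)
```

After the delete, each id keeps only its top-ranked row, including the row whose reject date is the 1900-01-01 placeholder.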

This should work:
SELECT *
FROM tableA t1
INNER JOIN (
SELECT ID, MAX(RejectDate) as MaxRejectDate
FROM tableA
GROUP BY ID -- one max reject date per ID
) t2 ON t1.ID = t2.ID
WHERE t1.RejectDate < t2.MaxRejectDate

Related

Finding duplicate values in a table where all the columns are not the same

I am working with a set of data in a table.
For simplicity, the table looks like the one below, with some sample data:
Some of the data in this table came from a different source; those rows are the ones where cqmRecordID is not null.
I need to find duplicate values in this table and delete the duplicates that came over from the other source (the ones with a cqmRecordID).
A record is considered a duplicate if it has the same values for these columns:
[Name]
Cast([CreatedDate] as Date)
[CreatedBy]
So in the sample data above, record #5 and record #6 would be considered duplicates.
As solutions, I came up with these two queries:
Query #1:
select * from (
select recordid, cqmrecordid, ROW_NUMBER() over (partition by name, cast(createddate as date), createdby
order by cqmrecordid, recordid) as rownum
from vmsNCR ) A
where cqmrecordid is not null
order by recordid
Query #2:
select A.recordID, A.cqmRecordID, B.RecordID, B.cqmRecordID
from vmsNCR A
join vmsNCR B
on A.Name = B.Name
and cast(A.CreatedDate as date) = cast(B.CreatedDate as date)
and A.CreatedBy = B.CreatedBy
and A.RecordID != B.RecordID
and A.cqmRecordID is not null
order by A.RecordID
Is there a better approach to this? Is one better than the other performance wise?

If you want to fetch all the rows without duplicates, then:
select t.* -- or all columns except seqnum
from (select t.*,
row_number() over (partition by name, cast(createddate as date), createdby
order by (case when cqmRecordId is not null then 1 else 2 end)
) as seqnum
from t
) t
where seqnum = 1;
If you want performance, create two persisted computed columns and then an index on them:
alter table t add cqmRecordId_flag as (case when cqmRecordId is null then 0 else 1 end) persisted;
alter table t add createddate_date as (cast(createddate as date)) persisted;
And then an index:
create index idx_t_4 on t(name, createddate_date, createdby, cqmRecordId_flag desc);
EDIT:
If you actually just want to delete the NULL values from the table, you can use:
delete t from t
where t.cqmRecordId is null and
exists (select 1
from t t2
where t2.name = t.name and
convert(date, t2.createddate_date) =convert(date, t.createddate_date) and
t2.createdby = t.createdby and
t2.cqmRecordId is not null
);
You can use the same logic with select to just select the duplicates.
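As a rough illustration of the select form, here is a minimal sketch in SQLite via Python's sqlite3 module. It points the EXISTS check in the direction the question describes (imported rows, i.e. those with a cqmRecordID, that duplicate a native row); that direction, the sample rows, and the use of SQLite's date() in place of cast(... as date) are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vmsNCR (RecordID INTEGER, cqmRecordID INTEGER, "
    "Name TEXT, CreatedDate TEXT, CreatedBy TEXT)"
)
# Hypothetical rows: record 3 was imported and matches record 2
# on name, calendar day, and author.
conn.executemany(
    "INSERT INTO vmsNCR VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "Alpha", "2015-03-01 09:00:00", "ann"),
        (2, None, "Beta", "2015-03-01 10:30:00", "bob"),
        (3, 77, "Beta", "2015-03-01 17:45:00", "bob"),
        (4, 78, "Gamma", "2015-03-02 08:15:00", "cat"),
    ],
)

# Imported rows that have a native twin with the same name, day, and author.
duplicates = conn.execute(
    """
    SELECT t.RecordID
    FROM vmsNCR AS t
    WHERE t.cqmRecordID IS NOT NULL
      AND EXISTS (
          SELECT 1
          FROM vmsNCR AS t2
          WHERE t2.Name = t.Name
            AND date(t2.CreatedDate) = date(t.CreatedDate)
            AND t2.CreatedBy = t.CreatedBy
            AND t2.cqmRecordID IS NULL
      )
    ORDER BY t.RecordID
    """
).fetchall()

print(duplicates)
```

Record 4 is imported too, but it has no native twin, so it is not reported.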

Try the query below; it might work for you:
;WITH TestCTE
AS
(
SELECT *,ROW_NUMBER() OVER(
PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy]
ORDER BY RecordId
) AS RowNumber
FROM vmsNCR -- table name taken from the question
)
DELETE FROM TestCTE
WHERE RowNumber > 1

Use the code below to eliminate duplicates:
;WITH CTE
AS
(
SELECT ROW_NUMBER() OVER(
PARTITION BY [Name],Cast([CreatedDate] as Date),[CreatedBy]
ORDER BY cqmRecordId
) AS Rnk
,*
FROM vmsNCR -- table name taken from the question
)
DELETE FROM CTE
WHERE Rnk <> 1

get only row that meet condition if such row exist and if not get the row that meet another condition

This sounds like a simple question, but I just can't find the right way.
Given the simplified table:
with t as (
select ordernumber, orderdate, case when ordertype in (5,21) then 1 else 0 end is_restore , ordertype, row_number() over(order by orderdate) rn from
(
select to_date('29.08.08','DD.MM.YY') orderdate,'313' ordernumber, 1 as ordertype from dual union all
select to_date('13.03.15','DD.MM.YY') orderdate, '90/4/2' ordernumber, 5 as ordertype from dual
)
)
select * from t -- where clause should be here
For every row, is_restore is guaranteed to be 1 or 0.
If the table has a row where is_restore=1, then select ordernumber, orderdate of that row and nothing else.
If the table does not have a row where is_restore=1, then select ordernumber, orderdate of the row where rn=1 (a row where rn=1 is guaranteed to exist in the table).
Given the requirements above, what do I need to put in the where clause to get the following?

You could use ROW_NUMBER:
CREATE TABLE t
AS
select ordernumber, orderdate,
case when ordertype in (5,21) then 1 else 0 end is_restore, ordertype,
row_number() over(order by orderdate) rn
from (
select to_date('29.08.08','DD.MM.YY') orderdate,'313' ordernumber,
1 as ordertype
from dual union all
select to_date('13.03.15','DD.MM.YY') orderdate, '90/4/2' ordernumber,
5 as ordertype
from dual);
-------------------
with cte as (
select t.*,
ROW_NUMBER() OVER(/*PARTITION BY ...*/ ORDER BY is_restore DESC, rn) AS rnk
from t
)
SELECT *
FROM cte
WHERE rnk = 1;

Here is SQL that doesn't use window functions. It may be useful for databases that don't support OVER ( ... ), or when the columns the query filters on are indexed.
SELECT
*
FROM t
WHERE t.is_restore = 1
OR (
NOT EXISTS (SELECT 1 FROM t WHERE t.is_restore = 1)
AND t.rn = 1
)
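To compare the two approaches side by side, here is a small sketch in SQLite via Python's sqlite3 module. The table is precomputed rather than built from dual, dates are stored as ISO text, and the order number '77' in the second data set is made up; pick() is a hypothetical helper, not part of either answer.

```python
import sqlite3

# Window-function version: restore rows sort first, then the lowest rn.
WINDOW_SQL = """
    SELECT ordernumber, orderdate
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (ORDER BY is_restore DESC, rn) AS rnk
        FROM t
    )
    WHERE rnk = 1
"""

# NOT EXISTS version: fall back to rn = 1 only when no restore row exists.
EXISTS_SQL = """
    SELECT ordernumber, orderdate
    FROM t
    WHERE is_restore = 1
       OR (rn = 1 AND NOT EXISTS (SELECT 1 FROM t AS x WHERE x.is_restore = 1))
"""


def pick(rows, sql):
    """Load rows into a fresh in-memory table t and run one of the queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (ordernumber TEXT, orderdate TEXT, "
        "is_restore INTEGER, rn INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


with_restore = [("313", "2008-08-29", 0, 1), ("90/4/2", "2015-03-13", 1, 2)]
without_restore = [("313", "2008-08-29", 0, 1), ("77", "2015-03-13", 0, 2)]

for data in (with_restore, without_restore):
    print(pick(data, WINDOW_SQL), pick(data, EXISTS_SQL))
```

One difference worth keeping in mind: if several rows have is_restore=1, the window version returns exactly one of them, while the NOT EXISTS version returns all of them.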

penultimate date for each record

I'm struggling to write a select that shows the penultimate date for each record in my DB.
For example:
id date
1 01.01.2018
1 05.01.2018
1 06.02.2018
2 01.06.2018
2 03.06.2018
3 12.12.2017
From these records I need a select that returns the following:
ID max_date penultimate
1 06.02.2018 05.01.2018
2 03.06.2018 01.06.2018
3 12.12.2017 NULL
Any idea how to do it? many thanks in advance

Use conditional aggregation and the ANSI-standard row_number() or dense_rank() functions:
select id,
max(date) as max_date,
max(case when seqnum = 2 then date end) as penultimate_date
from (select t.*,
dense_rank() over (partition by id order by date desc) as seqnum
from t
) t
where seqnum in (1, 2)
group by id;
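Use row_number() instead if a repeated latest date should count as the penultimate date, that is, if max_date and penultimate_date are allowed to be the same when there are ties.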
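The difference between the two ranking functions only shows up with ties, so here is a minimal sketch in SQLite via Python's sqlite3 module. It uses the question's data converted to ISO text, plus a made-up id 4 whose latest date occurs twice; the column is named dt instead of date, and penultimate() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "2018-01-01"), (1, "2018-01-05"), (1, "2018-02-06"),
        (2, "2018-06-01"), (2, "2018-06-03"),
        (3, "2017-12-12"),
        # Hypothetical id with a tie on the latest date.
        (4, "2018-02-01"), (4, "2018-03-01"), (4, "2018-03-01"),
    ],
)


def penultimate(rank_fn):
    """Run the conditional-aggregation query with the given ranking function."""
    sql = f"""
        SELECT id,
               MAX(dt) AS max_date,
               MAX(CASE WHEN seqnum = 2 THEN dt END) AS penultimate_date
        FROM (
            SELECT id, dt,
                   {rank_fn}() OVER (PARTITION BY id ORDER BY dt DESC) AS seqnum
            FROM t
        )
        WHERE seqnum IN (1, 2)
        GROUP BY id
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


by_dense_rank = penultimate("DENSE_RANK")
by_row_number = penultimate("ROW_NUMBER")
print(by_dense_rank)
print(by_row_number)
```

With dense_rank() the tied id 4 reports the next distinct date (2018-02-01) as penultimate; with row_number() it reports the repeated 2018-03-01. The other ids come out the same either way.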

Use GROUP BY to get the MAX and a correlated subquery with another MAX but this time lower than the former.
SELECT
T.id,
MAX(T.date) max_date,
(
SELECT
MAX(N.date)
FROM
YourTable N
WHERE
N.id = T.id AND
N.date < MAX(T.date)
) penultimate
FROM
YourTable T
GROUP BY
T.id

Just an optimized query:
;WITH cte AS
(
SELECT id AS ID
,[date] AS max_date
,LEAD ([date], 1) OVER (PARTITION BY id ORDER BY [date] DESC) AS penultimate -- NULL when there is no earlier date
,ROW_NUMBER() OVER(PARTITION BY id ORDER BY [date] DESC) AS RN
FROM Table3
)
SELECT ID,max_date,penultimate
FROM cte
WHERE RN=1

I wrote it this way:
SELECT ID
,max(StartDate) MaxDate
,(
SELECT StartDate
FROM YourTable t2
WHERE t2.id = t1.id
ORDER BY StartDate DESC OFFSET 1 ROWS FETCH NEXT 1 ROW ONLY
) penultimate
FROM YourTable t1
GROUP BY id

convert row to column using Pivot without any clause

I have a table like below.
I need to get the data like below.
I have created two temp tables and achieved the result like this. Please help me to do the same with PIVOT.

I wouldn't use pivot for that; to my mind this is simpler to do with group by and row_number:
select UserId, max(starttime) as starttime, max(endtime) as endtime
from (
select UserId,
case when StartOrEnd = 'S' then time end as starttime,
case when StartOrEnd = 'E' then time end as endtime,
row_number() over (partition by UserID order by time asc)
+ case when StartOrEnd = 'S' then 1 else 0 end as GRP
from table1
) X
group by UserId, GRP
order by starttime
The derived table splits the time into start / end time columns (to handle cases where only one exists) and uses a trick with row number to group the S / E items together. The outer select just groups the rows into the same row.
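To see why the row-number trick pairs each S with its E, here is a minimal sketch in SQLite via Python's sqlite3 module. The rows are the sample values from the answers on this page; the time column is renamed ts, and the table layout is an assumption since the question's table was not preserved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (UserId INTEGER, StartOrEnd TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "S", "2015-07-27 16:45"),
        (1, "E", "2015-07-27 16:46"),
        (2, "S", "2015-07-27 16:47"),
        (2, "E", "2015-07-27 16:48"),
        (1, "S", "2015-07-27 16:49"),
        (1, "E", "2015-07-27 16:50"),
    ],
)

# Per user, an S row gets row_number + 1 and the following E row gets
# row_number + 0, so both land on the same grp value (2, 4, 6, ...).
sessions = conn.execute(
    """
    SELECT UserId, MAX(starttime) AS start_ts, MAX(endtime) AS end_ts
    FROM (
        SELECT UserId,
               CASE WHEN StartOrEnd = 'S' THEN ts END AS starttime,
               CASE WHEN StartOrEnd = 'E' THEN ts END AS endtime,
               ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY ts)
                 + CASE WHEN StartOrEnd = 'S' THEN 1 ELSE 0 END AS grp
        FROM table1
    )
    GROUP BY UserId, grp
    ORDER BY start_ts
    """
).fetchall()

for row in sessions:
    print(row)
```

For user 1 the four rows get row numbers 1 to 4, so the grp values are 2, 2, 4, 4 and the two sessions stay separate.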

Not as efficient a solution as JamesZ's, but it should work:
create table #tst (userid int,start_end char(1),times datetime)
insert #tst values
(1,'S','07-27-2015 16:45'),
(1,'E','07-27-2015 16:46'),
(2,'S','07-27-2015 16:47'),
(2,'E','07-27-2015 16:48'),
(1,'S','07-27-2015 16:49'),
(1,'E','07-27-2015 16:50')
;WITH cte -- the leading semicolon terminates the preceding INSERT
AS (SELECT Row_number()OVER(ORDER BY times) rn,*
FROM #tst),
cte1
AS (SELECT a.userid,
a.start_end,
a.times,
CASE WHEN a.userid = b.userid THEN 0 ELSE 1 END AS com,
a.rn
FROM cte a
LEFT OUTER JOIN cte b
ON a.rn = b.rn + 1),
cte2
AS (SELECT userid,
start_end,
times,
(SELECT Sum(com)
FROM cte1 b
WHERE b.rn <= a.rn) AS row_num
FROM cte1 a)
SELECT USERID,
starttime=Min(CASE WHEN start_end = 'S' THEN times END),
endtime=Max(CASE WHEN start_end = 'E' THEN times END)
FROM cte2
GROUP BY USERID,
row_num

Here is another method:
declare @t table(userid int, StartOrEnd char(1), time datetime)
insert into @t
select 1,'S','2015-07-27 16:45' union all
select 1,'E','2015-07-27 16:46' union all
select 2,'S','2015-07-27 16:47' union all
select 2,'E','2015-07-27 16:48' union all
select 1,'S','2015-07-27 16:49' union all
select 1,'E','2015-07-27 16:50'
select userid,min(time) as minimum_time, max(time) as maximum_time from
(
select *, row_number() over (partition by cast(UserID as varchar(10))
+StartOrEnd order by time asc) as sno
from @t
) as t
group by userid,sno
Result
userid minimum_time maximum_time
----------- ----------------------- -----------------------
1 2015-07-27 16:45:00.000 2015-07-27 16:46:00.000
2 2015-07-27 16:47:00.000 2015-07-27 16:48:00.000
1 2015-07-27 16:49:00.000 2015-07-27 16:50:00.000

Get average time between record creation

So I have data like this:
UserID CreateDate
1 10/20/2013 4:05
1 10/20/2013 4:10
1 10/21/2013 5:10
2 10/20/2012 4:03
I need to group by each user get the average time between CreateDates. My desired results would be like this:
UserID AvgTime(minutes)
1 753.5
2 0
How can I find the difference between CreateDates for all records returned for a User grouping?
EDIT:
Using SQL Server 2012

Try this:
SELECT A.UserID,
AVG(CAST(DATEDIFF(MINUTE,B.CreateDate,A.CreateDate) AS FLOAT)) AvgTime
FROM #YourTable A
OUTER APPLY (SELECT TOP 1 *
FROM #YourTable
WHERE UserID = A.UserID
AND CreateDate < A.CreateDate
ORDER BY CreateDate DESC) B
GROUP BY A.UserID

This approach should also work:
;WITH CTE AS (
Select userId, createDate,
row_number() over (partition by userid order by createdate) rn
from Table1
)
select t1.userid,
isnull(avg(datediff(second, t1.createdate, t2.createdate)*1.0/60),0) AvgTime
from CTE t1 left join CTE t2 on t1.UserID = t2.UserID and t1.rn +1 = t2.rn
group by t1.UserID;
Updated: Thanks to @Lemark for pointing out that the number of differences is recordCount - 1.

Since you're using SQL Server 2012, you can use lead() to do this:
with cte as
(select
userid,
(datediff(second, createdate,
lead(CreateDate) over (Partition by userid order by createdate)
)/60.0) datdiff -- 60.0 avoids integer division
From table1
)
select
userid,
avg(datdiff)
from cte
group by userid

Something like this:
;WITH CTE AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreateDate) RN,
UserID,
CreateDate
FROM Tbl
)
SELECT
T1.UserID,
ISNULL(AVG(DATEDIFF(mi, T1.CreateDate, T2.CreateDate) * 1.0), 0) AvgTime -- T2 is the next row; the last row per user has no gap and is skipped
FROM CTE T1
LEFT JOIN CTE T2
ON T1.UserID = T2.UserID
AND T1.RN = T2.RN - 1
GROUP BY T1.UserID

With SQL 2012 you can use the ROW_NUMBER function and self-join to find the "previous" row in each group:
WITH Base AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreateDate) RowNum,
UserId,
CreateDate
FROM Users
)
SELECT
B1.UserID,
ISNULL(
AVG(
DATEDIFF(mi,B2.CreateDate,B1.CreateDate) * 1.0
)
,0) [Average]
FROM Base B1
LEFT JOIN Base B2
ON B1.UserID = B2.UserID
AND B1.RowNum = B2.RowNum + 1
GROUP BY B1.UserId
Although I get a different answer for UserID 1: an average of (5 + 1500) / 2 = 752.5.
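The arithmetic can be checked with a small sketch in SQLite via Python's sqlite3 module. It uses LAG() instead of the self-join (a swapped-in but equivalent way to reach the previous row), stores the question's dates as ISO text, and computes minutes from strftime('%s', ...) epoch seconds because SQLite has no DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserID INTEGER, CreateDate TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [
        (1, "2013-10-20 04:05:00"),
        (1, "2013-10-20 04:10:00"),
        (1, "2013-10-21 05:10:00"),
        (2, "2012-10-20 04:03:00"),
    ],
)

# Gap to the previous row per user, in minutes. The first row of each user
# has no previous row, so its gap is NULL and AVG ignores it.
averages = conn.execute(
    """
    SELECT UserID, COALESCE(AVG(gap_minutes), 0) AS avg_minutes
    FROM (
        SELECT UserID,
               (strftime('%s', CreateDate)
                - strftime('%s', LAG(CreateDate) OVER (
                      PARTITION BY UserID ORDER BY CreateDate))) / 60.0
                 AS gap_minutes
        FROM Users
    )
    GROUP BY UserID
    ORDER BY UserID
    """
).fetchall()

print(averages)
```

User 1 has gaps of 5 and 1500 minutes, which average to 752.5; user 2 has a single record and therefore no gaps, so the COALESCE turns the NULL average into 0.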

This only works in SQL Server 2012 or later, where you can use the LEAD analytic function:
CREATE TABLE dates (
id integer,
created datetime not null
);
INSERT INTO dates (id, created)
SELECT 1 AS id, '10/20/2013 4:05' AS created
UNION ALL SELECT 1, '10/20/2013 4:10'
UNION ALL SELECT 1, '10/21/2013 5:10'
UNION ALL SELECT 2, '10/20/2012 4:03';
SELECT id, isnull(avg(diff), 0)
FROM (
SELECT id,
datediff(MINUTE,
created,
LEAD(created, 1, NULL) OVER(partition BY id ORDER BY created)
) AS diff
FROM dates
) as diffs
GROUP BY id;
http://sqlfiddle.com/#!6/4ce89/22
