Different Rules to Handle duplicate values in Table

Duplicates in table1 are identified as follows:
select quote_ref, count(*)
from table1
group by quote_ref
having count(*) > 1
Now I want to eliminate the duplicates based on the rules below:
Take the entry that has Status = Complete.
If none is in Complete status, then take the one with max([created_date]).
Otherwise, flag it for review.
I suppose I need a CASE statement with a DELETE, but I'm not sure how to construct it.

For SQL Server 2005+, you can do the following:
;WITH CTE AS
(
SELECT *,
ROW_NUMBER()
OVER(PARTITION BY quote_ref
ORDER BY CASE WHEN [Status]='COMPLETE' THEN 1 ELSE 2 END,
created_date DESC) RowNum
FROM table1
)
DELETE FROM CTE
WHERE RowNum != 1
In this case, I'm assuming that the row you want to keep is the one with Status = 'COMPLETE' or, if there is none, the one with the latest created_date. If it is the other way around, you can simply change the WHERE condition.
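Before running the DELETE, the same ranking can be run as a plain SELECT to see which rows it would remove. This is a sketch on made-up sample rows, assuming table1 has only quote_ref, status and created_date columns. It compares against 'Complete' exactly, because under a case-sensitive collation 'COMPLETE' would not match. It should run on SQL Server 2008 or later.

```sql
-- Hypothetical sample data for the table1 described in the question.
CREATE TABLE table1 (
    quote_ref    VARCHAR(10),
    status       VARCHAR(10),
    created_date DATE
);

INSERT INTO table1 (quote_ref, status, created_date) VALUES
    ('Q1', 'Open',     '2023-01-01'),
    ('Q1', 'Complete', '2023-01-02'),
    ('Q1', 'Open',     '2023-01-03'),
    ('Q2', 'Open',     '2023-02-01'),
    ('Q2', 'Open',     '2023-02-05'),
    ('Q3', 'Complete', '2023-03-01');

-- Dry run: rank each quote_ref's rows with the same ORDER BY as the DELETE
-- (Complete rows first, then newest created_date), and label rn = 1 as the
-- row to keep and every other row as a delete candidate.
SELECT quote_ref, status, created_date,
       CASE WHEN rn = 1 THEN 'keep' ELSE 'delete' END AS action
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY quote_ref
               ORDER BY CASE WHEN status = 'Complete' THEN 1 ELSE 2 END,
                        created_date DESC) AS rn
    FROM table1 t
) ranked
ORDER BY quote_ref, rn;
```

With these rows, Q1 keeps its Complete row even though a later Open row exists, Q2 (no Complete row) keeps the 2023-02-05 row, and Q3 has nothing to delete.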

Related

SQL query to combine Select duplicates with count and grouping with delete based on Top but not the top 1 of each duplicate

I am looking to combine these two statements into one, to run as a stored procedure if possible.
I have not used temp tables in queries before and may have to here; I'm not sure, so I'm asking for advice.
I did not write the original queries. I manually run the first one, which returns a table listing the IDs with duplicate data and how many records each has. Then each ID is put into the second query to remove all but the TOP 1 record, based on additional filtering criteria.
I have looked at using a CTE (from "SQL select into delete DIRECTLY") but am still at a loss on how to pass each result row's ID value into the delete query.
The queries, edited for public consumption, are:
SELECT id, count(*) FROM [DEV].[dbo].[7dtest] where FileVer = 1 and CALC_DATE > FORMAT(DATEADD(DD,-7,GETDATE()), 'yyyy-MM-dd') group by id having count(*) > 1 order by count(*) desc
This returns a table with each id and its number of duplicate rows.
I then take the id of each row and put it into this delete statement:
delete from [DEV].[dbo].[7dtest] where AutoID not in (
SELECT TOP 1 AutoID FROM [DEV].[dbo].[7dtest] where FileVer = 1 and id = '123' and CALC_DATE > FORMAT(DATEADD(DD,-7,GETDATE()), 'yyyy-MM-dd')
order by COMPLETED_DATE_CHECK_3 desc, COMPLETED_DATE_CHECK_2 desc, COMPLETED_DATE_CHECK_1 desc)
and FileVer = 1 and id = '123' and CALC_DATE > FORMAT(DATEADD(DD,-7,GETDATE()), 'yyyy-MM-dd')
Can this be done with CTE or do I need to create a temp table and some looping to get the ID one row at a time? Is there a better way I should be doing this?
TIA
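The ROW_NUMBER() pattern from the first answer can handle this in a single set-based statement, with no temp table or loop: rank the rows inside each id and delete everything ranked below first. Ids with only one row get rank 1 and are left alone, so the separate GROUP BY ... HAVING step is not needed. This is a sketch, assuming AutoID is unique and using made-up sample rows; the CALC_DATE filter is left out for brevity and would go in the inner WHERE clause next to FileVer = 1.

```sql
-- Hypothetical stand-in for [DEV].[dbo].[7dtest], with only the columns
-- the delete logic uses.
CREATE TABLE [7dtest] (
    AutoID                 INT PRIMARY KEY,
    id                     VARCHAR(10),
    FileVer                INT,
    COMPLETED_DATE_CHECK_3 DATE,
    COMPLETED_DATE_CHECK_2 DATE,
    COMPLETED_DATE_CHECK_1 DATE
);

INSERT INTO [7dtest] VALUES
    (1, '123', 1, '2023-05-01', '2023-04-01', '2023-03-01'),
    (2, '123', 1, '2023-05-02', '2023-04-01', '2023-03-01'),
    (3, '123', 1, '2023-04-30', '2023-04-01', '2023-03-01'),
    (4, '456', 1, '2023-05-01', '2023-04-01', '2023-03-01'),
    (5, '789', 1, '2023-05-01', '2023-04-02', '2023-03-01'),
    (6, '789', 1, '2023-05-01', '2023-04-01', '2023-03-01'),
    (7, '123', 2, '2023-01-01', '2023-01-01', '2023-01-01');

-- Rank rows within each id using the same ORDER BY as the original TOP 1,
-- then delete every row in scope (FileVer = 1) that is not ranked first.
DELETE FROM [7dtest]
WHERE AutoID IN (
    SELECT AutoID
    FROM (
        SELECT AutoID,
               ROW_NUMBER() OVER (
                   PARTITION BY id
                   ORDER BY COMPLETED_DATE_CHECK_3 DESC,
                            COMPLETED_DATE_CHECK_2 DESC,
                            COMPLETED_DATE_CHECK_1 DESC) AS rn
        FROM [7dtest]
        WHERE FileVer = 1
    ) ranked
    WHERE rn > 1
);
```

In the sample, AutoIDs 2, 4, 5 and 7 remain: 2 and 5 win their groups, 4 was never duplicated, and 7 is outside the FileVer = 1 filter. On SQL Server you can also put the ranking in a CTE and DELETE from the CTE directly, as in the first answer.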

SQL Help: SELECT CASE?

I'm looking for some general guidance on the best solution for a recurring SQL query. Basically, I want to create a view of a table that has a lot of nearly identical rows, except for one discerning column called [Status], which can be either 'Closed' or 'Draft'.
I want to return distinct data for each [Port]: if both 'Closed' and 'Draft' exist, return only the 'Draft' row data; if only 'Closed' exists, return the 'Closed' row data.
Please refer to the attached files for a visual. Any assistance is greatly appreciated! I believe this solution will lend itself well to other practical cases/solutions for me in the future - thank you!
Original Table Data:
Example Output:

Try this,
select c.Port,c.DateAdded,max(Status) as Status
from myTable c
group by c.Port,c.DateAdded
Basically, group the table and take the highest Status value.
Because 'Draft' sorts after 'Closed', MAX(Status) returns 'Draft' when both exist.

Use NOT EXISTS:
SELECT t1.*
FROM tablename t1
WHERE t1.Status = 'Draft'
OR NOT EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.Port = t1.Port AND t2.Status = 'Draft'
)
Or with ROW_NUMBER() window function:
SELECT Port, DateAdded, Status
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Port ORDER BY CASE WHEN Status = 'Draft' THEN 1 ELSE 2 END) rn
FROM tablename
) t
WHERE rn = 1
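A small check of the NOT EXISTS approach on made-up rows (the table name tablename and the values are illustrative only). Note that the correlated subquery has to test t2.Status: testing t1.Status there makes it a condition on the outer row, which the first branch already covers.

```sql
-- Hypothetical rows: P1 has both a Closed and a Draft row, P2 only a Closed row.
CREATE TABLE tablename (
    Port      VARCHAR(10),
    DateAdded DATE,
    Status    VARCHAR(10)
);
INSERT INTO tablename VALUES
    ('P1', '2023-01-01', 'Closed'),
    ('P1', '2023-01-01', 'Draft'),
    ('P2', '2023-01-02', 'Closed');

-- Keep every Draft row, plus any row whose Port has no Draft row at all.
SELECT t1.Port, t1.DateAdded, t1.Status
FROM tablename t1
WHERE t1.Status = 'Draft'
   OR NOT EXISTS (
        SELECT 1
        FROM tablename t2
        WHERE t2.Port = t1.Port
          AND t2.Status = 'Draft')
ORDER BY t1.Port;
```

This returns P1/Draft and P2/Closed. It yields one row per Port only when each Port has at most one Draft and at most one Closed row; the ROW_NUMBER() version always returns exactly one row per Port.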

A rewording of your requirement is to return just one row per Port, where Draft rows take precedence over Closed rows.
You don't make clear whether the rows can have different dates, though. For example, if one port has two Draft rows or two Closed rows, do you want the earlier-dated row or the later-dated row?
The code below presumes the dates can indeed be different and that you prefer the later-dated row.
WITH
sorted AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY port ORDER BY status DESC, dateAdded DESC) AS seq_num
FROM
YourTable
)
SELECT
*
FROM
sorted
WHERE
seq_num = 1
If the dates are always identical, MAX(status) with GROUP BY port, dateAdded is easily sufficient.
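To see the two sort keys at work, here is the same query on made-up rows where the dates differ (table and values are hypothetical). P1 has a Closed row that is newer than both of its Draft rows, yet a Draft row still wins, because status is sorted before dateAdded.

```sql
-- Hypothetical rows: P1 has two Draft rows and a later Closed row;
-- P2 has two Closed rows.
CREATE TABLE YourTable (
    port      VARCHAR(10),
    dateAdded DATE,
    status    VARCHAR(10)
);
INSERT INTO YourTable VALUES
    ('P1', '2023-01-01', 'Draft'),
    ('P1', '2023-01-05', 'Draft'),
    ('P1', '2023-01-10', 'Closed'),
    ('P2', '2023-01-02', 'Closed'),
    ('P2', '2023-01-08', 'Closed');

-- status DESC puts 'Draft' before 'Closed'; dateAdded DESC then picks the
-- later row within the same status.
WITH sorted AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY port ORDER BY status DESC, dateAdded DESC) AS seq_num
    FROM YourTable t
)
SELECT port, dateAdded, status
FROM sorted
WHERE seq_num = 1
ORDER BY port;
```

The result is P1/2023-01-05/Draft and P2/2023-01-08/Closed.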

I'd use a full outer join, and coalesce the results so that the "draft" row is preferred over the "closed" row:
SELECT COALESCE(d.Port, c.Port),
COALESCE(d.DateAdded, c.DateAdded),
COALESCE(d.Status, c.Status)
FROM (SELECT Port, DateAdded, Status
FROM mytable
WHERE Status = 'Draft') d
FULL OUTER JOIN (SELECT Port, DateAdded, Status
FROM mytable
WHERE Status = 'Closed') c ON d.Port = c.Port

Listing multiple columns in a single row in SQL

(select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,ROW_NUMBER() OVER(PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID ) AS SEQNUM
from AC_POS_TRANSACTION_TRK aptt WHERE [RESULT] ='Success'
GROUP BY ID, EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE )
Hello,
With the above query, I want to get the rows of transaction IDs that have both seqnum=1 and seqnum=2.
But if a transaction ID has no second row (seqnum=2), I don't want to get any rows for that transaction ID.
Thanks!!
Something like this

Not 100% sure if this is correct without your table definition, but my understanding is that you want to EXCLUDE records if that record has an entry with seqnum=2 -- you can't use a WHERE clause alone because that would still return seqnum = 1.
You can use an EXISTS/NOT EXISTS or IN/NOT IN clause like this:
(select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,ROW_NUMBER() OVER(PARTITION BY EXTERNAL_TRANSACTION_ID ORDER BY ID ) AS SEQNUM
from AC_POS_TRANSACTION_TRK aptt WHERE [RESULT] ='Success'
and not exists ( select 1 from AC_POS_TRANSACTION_TRK a where a.id = aptt.id
and a.seqnum = 2)
GROUP BY ID, EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE )
Basically, this excludes records for which a matching record exists, as specified in the NOT EXISTS query.

One option you can try is to add a count of rows per group using the same partitioning criteria, and then filter accordingly. I'm not entirely sure about your query without seeing it in context and with sample data: there's no aggregation, so why use GROUP BY?
However, try something along these lines:
select * from (
select ID,EXTERNAL_TRANSACTION_ID,EXTERNAL_TRANSACTION_TYPE,
Row_Number() over(partition by EXTERNAL_TRANSACTION_ID order by ID) as SEQNUM,
Count(*) over(partition by EXTERNAL_TRANSACTION_ID) Qty
from AC_POS_TRANSACTION_TRK
where [RESULT] ='Success'
)x
where SEQNUM in (1,2) and Qty>1
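Here is the same windowed-count query run on made-up rows (the values are hypothetical; the column names come from the question). Qty counts the successful rows per transaction, so single-row transactions drop out even though their one row has SEQNUM = 1.

```sql
-- Hypothetical sample rows: T1 and T4 have a second successful row, T2 does not,
-- and T3's second row failed, so [RESULT] = 'Success' filters it out.
CREATE TABLE AC_POS_TRANSACTION_TRK (
    ID                        INT,
    EXTERNAL_TRANSACTION_ID   VARCHAR(10),
    EXTERNAL_TRANSACTION_TYPE VARCHAR(10),
    [RESULT]                  VARCHAR(10)
);
INSERT INTO AC_POS_TRANSACTION_TRK VALUES
    (1, 'T1', 'SALE',   'Success'),
    (2, 'T1', 'REFUND', 'Success'),
    (3, 'T2', 'SALE',   'Success'),
    (4, 'T3', 'SALE',   'Success'),
    (5, 'T3', 'VOID',   'Failed'),
    (6, 'T4', 'SALE',   'Success'),
    (7, 'T4', 'REFUND', 'Success'),
    (8, 'T4', 'REFUND', 'Success');

select * from (
    select ID, EXTERNAL_TRANSACTION_ID, EXTERNAL_TRANSACTION_TYPE,
           Row_Number() over (partition by EXTERNAL_TRANSACTION_ID order by ID) as SEQNUM,
           -- Qty = number of successful rows for this transaction id
           Count(*) over (partition by EXTERNAL_TRANSACTION_ID) as Qty
    from AC_POS_TRANSACTION_TRK
    where [RESULT] = 'Success'
) x
where SEQNUM in (1, 2) and Qty > 1
order by ID;
```

This returns IDs 1, 2, 6 and 7: both rows of T1, and the first two rows of T4.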

This should do the job.
With Qry As (
-- Your original query goes here
)
Select Qry.*
From Qry
Where Exists (
Select *
From Qry Qry1
Where Qry1.EXTERNAL_TRANSACTION_ID = Qry.EXTERNAL_TRANSACTION_ID
And Qry1.SEQNUM = 1
)
And Exists (
Select *
From Qry Qry2
Where Qry2.EXTERNAL_TRANSACTION_ID = Qry.EXTERNAL_TRANSACTION_ID
And Qry2.SEQNUM = 2
)
BTW, your original query looks problematic to me: specifically, I think that instead of a GROUP BY, those columns should be in the PARTITION BY clause of the OVER clause, but without knowing more about the table structures and what you're trying to achieve, I can't say for sure.

Lag Function to skip over previous record where there is a null

I am trying to get a previous value using the LAG function; however, it only works when the previous record is populated. What I want is to skip over the previous record if it is NULL and use the nearest earlier record that is not NULL.
Select LAG(previous_reference_no) OVER (ORDER BY createdon) FROM TableA
So if I am at record 5, record 4 is NULL and record 3 is not NULL, then from record 5 I want to display the value of record 3.
Hope this makes sense; please help.

Add a PARTITION BY clause?
Select LAG(previous_reference_no) OVER (PARTITION BY CASE WHEN previous_reference_no IS NULL THEN 0 ELSE 1 END
ORDER BY createdon)
FROM TableA

Standard SQL has the syntax for this:
SELECT LAG(previous_reference_no) IGNORE NULLS OVER (ORDER BY createdon)
FROM TableA
Unfortunately, SQL Server did not support this before SQL Server 2022. One method uses two levels of window functions and some logic:
SELECT (CASE WHEN previous_reference_no IS NULL
THEN MAX(previous_reference_no) OVER (PARTITION BY grp)
ELSE LAG(previous_reference_no) OVER (PARTITION BY (CASE WHEN previous_reference_no IS NOT NULL THEN 1 ELSE 0 END)
ORDER BY createdon)
END)
FROM (SELECT a.*,
COUNT(previous_reference_no) OVER (ORDER BY a.createdon) as grp
FROM TableA a
) a;
The logic is:
Create a grouping that has a given reference number and all following NULL values in one group.
If the reference number is NULL, then get the first value for the start of the group. This would be the previous non-NULL value.
If the reference number is not NULL, then use LAG within the partition of non-NULL rows to get the previous non-NULL value.
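A sketch of the two-level approach on made-up data, assuming TableA has only the createdon and previous_reference_no columns. The rows mirror the question: record 4 is NULL, so record 5 should pick up record 3's value. It uses LAG and ordered COUNT windows, so it needs SQL Server 2012 or later.

```sql
-- Hypothetical TableA with one row per day; records 4 and 6 are NULL.
CREATE TABLE TableA (
    createdon             DATE,
    previous_reference_no VARCHAR(10)
);

INSERT INTO TableA (createdon, previous_reference_no) VALUES
    ('2023-01-01', 'R1'),
    ('2023-01-02', 'R2'),
    ('2023-01-03', 'R3'),
    ('2023-01-04', NULL),
    ('2023-01-05', 'R5'),
    ('2023-01-06', NULL);

SELECT a.createdon,
       a.previous_reference_no,
       CASE WHEN a.previous_reference_no IS NULL
            -- NULL row: its group holds at most one non-NULL value, the most recent one
            THEN MAX(a.previous_reference_no) OVER (PARTITION BY a.grp)
            -- non-NULL row: LAG over the non-NULL rows only, so NULLs are skipped
            ELSE LAG(a.previous_reference_no) OVER (
                     PARTITION BY CASE WHEN a.previous_reference_no IS NOT NULL THEN 1 ELSE 0 END
                     ORDER BY a.createdon)
       END AS prev_non_null
FROM (
    -- grp = running count of non-NULL values, so each non-NULL value and
    -- the NULLs that follow it share one group number
    SELECT t.*,
           COUNT(t.previous_reference_no) OVER (ORDER BY t.createdon) AS grp
    FROM TableA t
) a
ORDER BY a.createdon;
```

For these rows, prev_non_null is NULL, R1, R2, R3, R3, R5; record 5 gets R3, as the question asks.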
Another method -- which is likely to be much slower -- uses APPLY:
select a.*, aprev.previous_reference_no as prev_non_null
from TableA a outer apply
(select top (1) aprev.*
from TableA aprev
where aprev.createdon < a.createdon and
aprev.previous_reference_no is not null
order by aprev.createdon desc
) aprev;
For a small table, the performance hit might be worth the simplicity of the code.

Select last duplicate row with different id Oracle 11g

I have a table that looks like this:
The problem is that I need to get the last record among the duplicates in the column "NRODENUNCIA".

You can use MAX(DENUNCIAID), along with GROUP BY... HAVING to find the duplicates and select the row with the largest DENUNCIAID:
SELECT MAX(DENUNCIAID), NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
FROM YourTable
GROUP BY NRODENUNCIA, FECHAEMISION, ADUANA, MES, NOMBREESTADO
HAVING COUNT(1) > 1
This will only show rows that have at least one duplicate. If you want to see non-duplicate rows too, just remove the HAVING COUNT(1) > 1 clause.

There are a number of solutions for your problem. One is to use row_number.
Note that I've ordered by DENUNCIAID in the OVER clause. This defines the "last record" as the one with the largest DENUNCIAID. If you want to define it differently, you'd need to change the column being ordered by.
with dupes as (
SELECT
t.*,
ROW_NUMBER() OVER (Partition by NRODENUNCIA ORDER BY DENUNCIAID DESC) RN
FROM
YourTable t
)
SELECT * FROM dupes where rn = 1
This only gets the last record per dupe.
If you want to include only records that have dupes, change the WHERE clause to:
WHERE rn = 1
and NRODENUNCIA in (select NRODENUNCIA from dupes where rn > 1)
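A runnable sketch of the filtered version, assuming a hypothetical cut-down YourTable with just DENUNCIAID, NRODENUNCIA and one status column, and made-up rows. It uses one INSERT per row because Oracle 11g has no multi-row VALUES list, and no AS before table aliases, which Oracle does not accept.

```sql
-- Hypothetical subset of the question's table.
CREATE TABLE YourTable (
    DENUNCIAID   INTEGER,
    NRODENUNCIA  VARCHAR(20),
    NOMBREESTADO VARCHAR(20)
);
INSERT INTO YourTable VALUES (1, 'D-100', 'OPEN');
INSERT INTO YourTable VALUES (2, 'D-100', 'CLOSED');
INSERT INTO YourTable VALUES (3, 'D-200', 'OPEN');
INSERT INTO YourTable VALUES (4, 'D-300', 'OPEN');
INSERT INTO YourTable VALUES (5, 'D-300', 'VOID');
INSERT INTO YourTable VALUES (6, 'D-300', 'CLOSED');

-- rn = 1 marks the row with the largest DENUNCIAID per NRODENUNCIA;
-- the IN filter keeps only numbers that also have an rn > 1 row (a dupe).
WITH dupes AS (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY NRODENUNCIA ORDER BY DENUNCIAID DESC) AS rn
    FROM YourTable t
)
SELECT *
FROM dupes
WHERE rn = 1
  AND NRODENUNCIA IN (SELECT NRODENUNCIA FROM dupes WHERE rn > 1)
ORDER BY DENUNCIAID;
```

This returns DENUNCIAID 2 (for D-100) and 6 (for D-300); D-200 has no duplicate and is left out.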
