SQL Data Duplication Query


Greetings of the day!
I have a table with multiple columns of data and rows with different statuses.
Assume I have 500 rows with status 'Valid' and 150 rows with status 'chkDuplicate'.
Now I have to write a query that updates the status of these 150 records to Valid or Invalid by comparing a few columns for duplication, such as Address, City and State.
How can I achieve this? It needs to support large tables as well.
Thanks in advance.
TABLE DEFINITION
CREATE TABLE XYZ
(
ID bigint,
ADDRESS nvarchar(100), -- column lengths are placeholders; the original gave none
CITY nvarchar(100),
STATE nvarchar(100),
ZIP nvarchar(100),
STATUS nvarchar(100)
)
The status should be updated based on the duplication query.
Important: for duplicate data, the first record should be valid and the others should be invalid. If I re-process the invalid data, it should not disturb the valid records.
If I run the query again, the above table should stay the same: records 1,3 should be 'Success' and 3,4 should be 'Duplicate'. Even if I add a few more rows, 1,3 should always stay 'Success', and the other duplicates should be updated to 'Duplicate'.

This query returns the duplicate rows:
select tbl.data1, tbl.data2, tbl.data3
from TestTable1 tbl
inner join (
SELECT data1 , data2, data3 , COUNT(*) AS dupCount
FROM TestTable1
GROUP BY data1, data2, data3
HAVING COUNT(*) > 1
) oc on tbl.data1 = oc.data1 and tbl.data2 = oc.data2 and tbl.data3 = oc.data3
Then use a cursor to update the duplicate rows.
Cursor Example

I added ID to the ORDER BY clause, and now it works for me even if I re-process the duplication call multiple times.
WITH TABLE_DATA_DUPLICATE AS
(SELECT *, ROW_NUMBER() OVER(
PARTITION BY STREET1, CITY, STATE, ZIP
ORDER BY STREET1, CITY, STATE, ZIP, ID -- ID makes the "first" row in each group deterministic
) NO_OF_REPEATS
FROM YOURTABLE) -- NOLOCK hint dropped: it is deprecated on the target table of an UPDATE
UPDATE TABLE_DATA_DUPLICATE SET STATUS = (CASE WHEN NO_OF_REPEATS = 1 THEN 'VALID' ELSE 'DUPLICATE' END);
Thanks, everyone, for the support. Cheers!
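To try the ROW_NUMBER technique from the self-answer without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module (it assumes SQLite 3.25 or later, which added window functions). It uses the question's XYZ columns. Two details differ from the T-SQL above and are assumptions of this sketch: SQLite cannot UPDATE through a CTE, so the ranking is first written to a temporary table, and rows already marked VALID are ranked ahead of unprocessed rows so that a re-run never demotes them, which is one way to meet the "should not disturb the valid records" requirement.

```python
import sqlite3


def mark_duplicates(conn: sqlite3.Connection) -> None:
    """Mark the first row of each (ADDRESS, CITY, STATE, ZIP) group VALID, the rest DUPLICATE."""
    # Freeze the ranking in a temp table before any STATUS value changes.
    conn.execute("DROP TABLE IF EXISTS temp.ranked")
    conn.execute(
        """
        CREATE TEMP TABLE ranked AS
        SELECT ID,
               ROW_NUMBER() OVER (
                   PARTITION BY ADDRESS, CITY, STATE, ZIP
                   -- Existing VALID rows win; otherwise the lowest ID wins.
                   ORDER BY CASE WHEN STATUS = 'VALID' THEN 0 ELSE 1 END, ID
               ) AS rn
        FROM XYZ
        """
    )
    conn.execute(
        """
        UPDATE XYZ
        SET STATUS = CASE
            WHEN ID IN (SELECT ID FROM temp.ranked WHERE rn = 1) THEN 'VALID'
            ELSE 'DUPLICATE'
        END
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE XYZ (ID INTEGER, ADDRESS TEXT, CITY TEXT, STATE TEXT, ZIP TEXT, STATUS TEXT)"
    )
    rows = [
        (1, "1 Main St", "Springfield", "IL", "62701", "chkDuplicate"),
        (2, "2 Oak Ave", "Springfield", "IL", "62701", "chkDuplicate"),
        (3, "1 Main St", "Springfield", "IL", "62701", "chkDuplicate"),
    ]
    conn.executemany("INSERT INTO XYZ VALUES (?, ?, ?, ?, ?, ?)", rows)
    mark_duplicates(conn)
    # prints [(1, 'VALID'), (2, 'VALID'), (3, 'DUPLICATE')]
    print(conn.execute("SELECT ID, STATUS FROM XYZ ORDER BY ID").fetchall())
```

Because existing VALID rows are ranked first, inserting a new duplicate with a lower ID and re-running leaves the earlier VALID row untouched; with ORDER BY ID alone, the new row would take over as the VALID one.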

Related

SQL Server 2008 R2: update one occurrence of a group's NULL value and delete the rest

I have a table of orders which has multiple rows of orders missing a Type and I'm struggling to get the queries right. I'm pretty new to SQL so please bear with me.
I've illustrated an example in the picture below. I need help creating the query that will take the table on the left and UPDATE it to look like the table on the right.
The orders are sorted by group. Each group should have exactly one instance of type OK (if a NULL or OK already exists) and no instances of NULL. I would like to achieve this by updating one of each group's NULL orders to type OK and deleting the rest of that group's NULL rows.
I've managed to get the rows that I want to keep as follows:
Create a temporary table where I insert the orders and replace NULL types with EMPTY.
From the temporary table, get the existing OK orders for groups that already have one OK order, or else an EMPTY order that should be changed to OK.
I've done this with the following:
SELECT * FROM Orders
SELECT *
INTO #modified
FROM
(SELECT
Id, IdGroup,
CASE WHEN Type IS NULL
THEN 'EMPTY'
ELSE Type
END Type
FROM
Orders) AS XXX
SELECT MIN(x.Id) Id, x.IdGroup, x.Type
FROM #modified x
JOIN
(SELECT
IdGroup, MIN (Type) AS min_Type
FROM #modified a
WHERE Type = 'OK' OR Type = 'EMPTY'
GROUP BY IdGroup) y ON y.IdGroup = x.IdGroup AND y.min_Type = x.Type
GROUP BY x.IdGroup, x.Type
DROP TABLE #modified
After this step, the rest of the EMPTY orders should be deleted, but I don't know how to proceed from here. Maybe this was a poor approach from the beginning, and maybe it could be done more easily?

Well done for writing a question that shows some effort and clearly explains what you're after. That's a rare thing unfortunately!
This is how I would do it:
First, back up the table (I like to put backups into a different schema to keep things neat):
CREATE SCHEMA bak;
SELECT * INTO bak.Orders FROM dbo.Orders;
Now you can do a trial run on the bak table if you like.
Anyway...
Set all the NULL types to OK
UPDATE Orders SET Type = 'OK' WHERE Type IS NULL;
Now repeatedly delete redundant records. Find groups with more than one OK and delete the lowest-Id OK row in each:
DELETE Orders WHERE Id IN
(
SELECT MIN(Id) Id
FROM Orders
WHERE Type = 'OK'
GROUP BY IdGroup
HAVING COUNT(*) > 1
);
You'll need to run that one a few times, until it affects zero records.
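The "run it until it affects zero records" step can be automated by checking the affected-row count after each pass. Below is a minimal sketch using Python's sqlite3 module against a hypothetical in-memory Orders table with the question's Id, IdGroup and Type columns; SQLite requires DELETE FROM where T-SQL also accepts a bare DELETE.

```python
import sqlite3


def collapse_null_types(conn: sqlite3.Connection) -> int:
    """Turn NULL Types into OK, then delete surplus OK rows until none remain; return the pass count."""
    conn.execute("UPDATE Orders SET Type = 'OK' WHERE Type IS NULL")
    passes = 0
    while True:
        # Each pass removes the lowest-Id OK row from every group that still has more than one.
        cur = conn.execute(
            """
            DELETE FROM Orders
            WHERE Id IN (
                SELECT MIN(Id)
                FROM Orders
                WHERE Type = 'OK'
                GROUP BY IdGroup
                HAVING COUNT(*) > 1
            )
            """
        )
        passes += 1
        if cur.rowcount == 0:  # a pass that deletes nothing means every group is done
            break
    conn.commit()
    return passes
```

Since the lowest Id is deleted on each pass, the surviving OK row in each group is the one with the highest Id, whether it was originally OK or NULL.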

Assuming there are no multiple OKs and each group has at least one OK or NULL value, you can do:
select t.id, t.idGroup, t.Type
from lefttable t
where t.Type is not null and t.Type <> 'OK'
union all
select t.id, t.idGroup, 'OK'
from (select t.*, row_number() over (partition by idGroup order by coalesce(t.Type, 'ZZZ')) as seqnum
from lefttable t
where t.Type is null or t.Type = 'OK'
) t
where seqnum = 1;
Actually, this will work even if you do have multiple OKs, but it will keep only one of those rows.
The first subquery selects all rows that are neither OK nor NULL. The second chooses exactly one OK-or-NULL row per group and assigns it the type OK.
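The query can be checked against sample data. This sketch runs it through Python's sqlite3 module (SQLite 3.25+ for ROW_NUMBER) on a hypothetical Orders table; the only change is an extra Id tie-breaker in the window ORDER BY, added here so that the NULL row chosen in each group is deterministic.

```python
import sqlite3

QUERY = """
SELECT t.Id, t.IdGroup, t.Type
FROM Orders t
WHERE t.Type IS NOT NULL AND t.Type <> 'OK'
UNION ALL
SELECT t.Id, t.IdGroup, 'OK'
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY IdGroup
               ORDER BY COALESCE(t.Type, 'ZZZ'), Id  -- OK sorts before NULL (mapped to 'ZZZ')
           ) AS seqnum
    FROM Orders t
    WHERE t.Type IS NULL OR t.Type = 'OK'
) t
WHERE seqnum = 1
ORDER BY 1
"""


def cleaned_orders(conn: sqlite3.Connection) -> list:
    """Return the rows the cleaned table should contain, without modifying Orders."""
    return conn.execute(QUERY).fetchall()
```

Note that this returns the cleaned rows rather than changing Orders in place; applying it would mean writing the result back, for example into a new table.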

If you want to keep any existing OK rows in preference to NULL ones, this will work. It creates a temp table with everything we need to work on (OK and NULL) and numbers the rows from one within each group, ordered so that OK records are listed before NULL ones. Then it makes sure every first record is OK and deletes all the rest.
Create table #work (Id int, RowNo int)
--Get a list of all the rows we need to work on, and number them for each group
--(order by type desc puts OK before nulls)
Insert into #work (Id, RowNo)
Select Id, ROW_NUMBER() over (partition by IdGroup order by type desc) as RowNo
From Orders O
where (type is null OR type = 'OK');
-- Make sure the one we keep is OK, not null
Update O set type = 'OK'
from #work W
inner join Orders O on O.Id = W.Id
Where W.RowNo = 1 and O.type IS NULL;
--Delete the remaining ones (any rowno > 1)
Delete O
from #work W
inner join Orders O on O.Id = W.Id
Where W.RowNo > 1;
drop table #work;
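For comparison, here is the same temp-table approach as a minimal sketch in Python's sqlite3 module, again on a hypothetical Orders table. To stay portable across SQLite versions, it swaps the T-SQL UPDATE/DELETE joins for Id IN (...) subqueries against the work table. ORDER BY Type DESC still puts OK before NULL, because SQLite, like SQL Server, sorts NULLs last in descending order; Id is added as a tie-breaker.

```python
import sqlite3


def keep_one_ok_per_group(conn: sqlite3.Connection) -> None:
    """Keep one OK row per group (preferring an existing OK), turning a NULL into OK if needed."""
    conn.execute("DROP TABLE IF EXISTS temp.work")
    # Number the OK/NULL rows within each group; OK comes first because NULLs sort last in DESC order.
    conn.execute(
        """
        CREATE TEMP TABLE work AS
        SELECT Id,
               ROW_NUMBER() OVER (PARTITION BY IdGroup ORDER BY Type DESC, Id) AS RowNo
        FROM Orders
        WHERE Type IS NULL OR Type = 'OK'
        """
    )
    # Make sure the row we keep is OK, not NULL.
    conn.execute(
        "UPDATE Orders SET Type = 'OK' "
        "WHERE Type IS NULL AND Id IN (SELECT Id FROM temp.work WHERE RowNo = 1)"
    )
    # Delete the remaining OK/NULL rows (RowNo > 1).
    conn.execute("DELETE FROM Orders WHERE Id IN (SELECT Id FROM temp.work WHERE RowNo > 1)")
    conn.execute("DROP TABLE temp.work")
    conn.commit()
```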

Can't you just delete the rows where Type equals null?
DELETE FROM Orders WHERE Type IS NULL

Nested SQL Queries with Self JOIN - How to filter rows OUT

I have an SQLite3 database with a table that I need to filter by several factors. One such factor is to filter out rows based on the content of other rows within the same table.
From what I've researched, a self JOIN is going to be required, but I am not sure how I would use one to filter the table by several factors.
Here is a sample table of the data:
Name     Part #   Status   Amount
----------------------------------
Item 1   12345    New      $100.00
Item 2   12345    New      $15.00
Item 3   35864    Old      $132.56
Item 4   12345    Old      $15.00
What I need to do is find any items that have the same Part #, where one of them has an "Old" status and the Amount is the same.
So, first we would get all rows with Part # "12345" and then check whether any of those rows has an "Old" status with a matching Amount. In this example, the result would be Item 2 and Item 4.
What then needs to be done is to return the REST of the rows in the table that have a "New" status, essentially discarding those two items.
Desired Output:
Name     Part #   Status   Amount
----------------------------------
Item 1   12345    New      $100.00
Removed all "Old" status rows and any "New" row that had a matching "Part #" and "Amount" with an "Old" row. (I'm sorry, I know that's very confusing, hence my need for help.)
I have looked into the following resources to try and figure this out on my own, but there are so many levels that I am getting confused.
Self-join of a subquery
ZenTut
Compare rows and columns of same table
The first two links dealt with comparing columns within the same table. The third one does seem to be a pretty similar question, but does not have a readable answer (for me, anyway).
I do Java development as well and it would be fairly simple to do this there, but I am hoping for a single SQL query (nested), if possible.

The NOT EXISTS predicate should do the trick:
select * from table t1
where t1.Status = 'New'
and not exists (select * from table t2
where t2.Status = 'Old'
and t2.Part = t1.Part
and t2.Amount = t1.Amount);
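Here is a quick check of the NOT EXISTS answer against the sample data from the question, sketched with Python's sqlite3 module. The table name items and the column names Part and Amount are assumptions (the question's column is "Part #", and table is a reserved word); amounts are stored as plain numbers.

```python
import sqlite3


def new_items_without_old_match(conn: sqlite3.Connection) -> list:
    """Return names of New rows that have no Old row with the same Part and Amount."""
    return conn.execute(
        """
        SELECT t1.Name
        FROM items t1
        WHERE t1.Status = 'New'
          AND NOT EXISTS (
              SELECT 1
              FROM items t2
              WHERE t2.Status = 'Old'
                AND t2.Part = t1.Part
                AND t2.Amount = t1.Amount
          )
        ORDER BY t1.Name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (Name TEXT, Part TEXT, Status TEXT, Amount REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [
            ("Item 1", "12345", "New", 100.00),
            ("Item 2", "12345", "New", 15.00),
            ("Item 3", "35864", "Old", 132.56),
            ("Item 4", "12345", "Old", 15.00),
        ],
    )
    print(new_items_without_old_match(conn))  # prints [('Item 1',)]
```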

This is a T-SQL answer; hopefully it is translatable. If you have a big data set for matches, you might change the NOT IN to NOT EXISTS.
select *
from table
where Name not in(
select t1.Name
from table t1
join table t2
on t1.PartNumber = t2.PartNumber
AND t1.Status='New'
AND t2.Status='Old'
and t1.Amount=t2.Amount)
and Status = 'New'

You could use an INNER JOIN to a grouped SELECT that finds the Part #/Amount pairs that have an Old status and not only that status:
select my_table.* from
my_table
INNER JOIN (
select
Part_#
, Amount
, count(distinct Status) as status_count
, sum(case when Status = 'Old' then 1 else 0 end) as old_count
from my_table
group by Part_#, Amount
having count(distinct Status) > 1
and sum(case when Status = 'Old' then 1 else 0 end) > 0
) t on t.Part_# = my_table.Part_#
and my_table.Status = 'New'
and my_table.Amount <> t.Amount

I tried to understand what you want as best I could...
SELECT DISTINCT yt.PartNum, yt.Status, yt.Amount
FROM YourTable yt
JOIN YourTable yt2
ON yt2.PartNum = yt.PartNum
AND yt2.Status = 'Old'
AND yt2.Amount != yt.Amount
WHERE yt.Status = 'New'
This gives everything with a new status that has an old status with a different price.

MS SQL Script to find rows where value does not exist

I have a situation where one table has record 'a' with order number 0 and also record 'a' with order number 1; this is correct.
I also have record 'b' with order number 1, but there is no row for record 'b' where order number = 0; this is not correct.
I need to create a script that will find all records where order number = 1 but order number 0 doesn't exist. Can you guys help with this?
I cannot simply use:
SELECT DISTINCT record FROM tablename WHERE order_number <> 0
because it would also give me record 'a', which I don't want in the results.
I was thinking about using NOT EXISTS, but it always seems to compare two tables, whereas all my records are in one table.
Regards

Using NOT IN in the WHERE clause will eliminate 'a' and give only 'b'.
Try this:
SELECT DISTINCT record FROM tablename WHERE order_number <> 0
and record not in (Select record from tablename WHERE order_number = 0);
hope this helps:-)
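NOT EXISTS, which the asker considered, also works here: the subquery simply refers to the same table under a second alias, so only one table is needed. Below is a minimal sketch using Python's sqlite3 module, with tablename, record and order_number taken from the question and the sample rows made up for illustration. Unlike NOT IN, NOT EXISTS is not thrown off when the subquery would return a NULL record value (with NOT IN, a single NULL makes the predicate unknown, so no rows come back).

```python
import sqlite3


def records_missing_order_zero(conn: sqlite3.Connection) -> list:
    """Return records that have order_number 1 but no row with order_number 0."""
    return conn.execute(
        """
        SELECT DISTINCT t1.record
        FROM tablename t1
        WHERE t1.order_number = 1
          AND NOT EXISTS (
              SELECT 1
              FROM tablename t2            -- same table, second alias
              WHERE t2.record = t1.record
                AND t2.order_number = 0
          )
        ORDER BY t1.record
        """
    ).fetchall()
```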

Order by data as per supplied Id in sql

Query:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
Right now the data is ordered by the auto ID or by whatever ORDER BY clause I pass.
But I want the data to come back in the same sequence as the IDs I have passed.
Expected Output:
All Data for 1247881
All Data for 174772
All Data for 808454
All Data for 2326154
Note:
The number of IDs to be passed will be 300,000.

One option would be to create a CTE containing the ration_card_id values and the order you want to impose, and then join to it:
WITH cte AS (
SELECT 1247881 AS ration_card_id, 1 AS position
UNION ALL
SELECT 174772, 2
UNION ALL
SELECT 808454, 3
UNION ALL
SELECT 2326154, 4
)
SELECT t1.*
FROM [MemberBackup].[dbo].[OriginalBackup] t1
INNER JOIN cte t2
ON t1.ration_card_id = t2.ration_card_id
ORDER BY t2.position
Edit:
If you have many IDs, then neither the answer above nor the answer given using a CASE expression will suffice. In this case, your best bet would be to load the list of IDs into a table, containing an auto increment ID column. Then, each number would be labelled with a position as its record is being loaded into your database. After this, you can join as I have done above.
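The load-then-join idea from the edit can be sketched with Python's sqlite3 module: executemany inserts the IDs in the order supplied, and an INTEGER PRIMARY KEY column (SQLite's auto-assigned rowid) records each one's position. The OriginalBackup table and its member_name column are hypothetical stand-ins for the real table.

```python
import sqlite3
from typing import Iterable


def fetch_in_given_order(conn: sqlite3.Connection, ids: Iterable[int]) -> list:
    """Return OriginalBackup rows for the given ration_card_ids, in the order the ids were supplied."""
    conn.execute("DROP TABLE IF EXISTS temp.wanted")
    # position is an INTEGER PRIMARY KEY, so SQLite assigns 1, 2, 3, ... in insertion order.
    conn.execute("CREATE TEMP TABLE wanted (position INTEGER PRIMARY KEY, ration_card_id INTEGER)")
    conn.executemany(
        "INSERT INTO temp.wanted (ration_card_id) VALUES (?)", ((i,) for i in ids)
    )
    return conn.execute(
        """
        SELECT b.*
        FROM OriginalBackup b
        JOIN temp.wanted w ON w.ration_card_id = b.ration_card_id
        ORDER BY w.position
        """
    ).fetchall()
```

Rows sharing the same ration_card_id come out together, but their order relative to each other is not specified; add a second ORDER BY column if that matters.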

If the desired order does not reflect a sequential ordering of some preexisting data, you will have to specify the ordering yourself. One way to do this is with a CASE expression:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
ORDER BY CASE ration_card_id
WHEN 1247881 THEN 0
WHEN 174772 THEN 1
WHEN 808454 THEN 2
WHEN 2326154 THEN 3
END
Stating the obvious, but note that this ordering is most likely not backed by any index, so the sort will not be able to use one.

Insert your ration_card_id values into a #temp table with an identity column.
Then rewrite your SQL query as:
SELECT a.*
FROM [MemberBackup].[dbo].[OriginalBackup] a
JOIN #temp b
on a.ration_card_id = b.ration_card_id
order by b.id

How to update specific rows based on identified rows

I have identified specific rows based on a unique ID in the data. I want to update one column in those rows. I'm trying to use an UPDATE command, but it's not working.
UPDATE L03_A_AVOX_DATA
SET PWC_Exclusion_Flag =
(CASE
WHEN (L03_A_AVOX_DATA.PWC_SEQ_AVOX IN
(SELECT PWC_SEQ_AVOX
FROM L03_A_AVOX_DATA
WHERE client_id IN
(SELECT DISTINCT client_id
FROM ( SELECT DISTINCT
client_id,
extract_type,
COUNT (*)
FROM temp
GROUP BY client_id,
extract_type
HAVING COUNT (*) = 1))
AND extract_type = '0'))
THEN
1
ELSE
L03_A_AVOX_DATA.PWC_Exclusion_Flag
END )
Can anyone help me?

You could simplify this statement by simulating an UPDATE with a JOIN.
For more details, see here:
Update statement with inner join on Oracle
This idea should work for your case too.
That way, you update the records that have counterparts in the temp table.
It seems you don't want to update those without counterparts anyway.

You're trying to update the PWC_Exclusion_Flag to 1 if the client_id has exactly 1 record of extract_type 0 in the temp table, am I right?
Try this:
update L03_A_AVOX_DATA
set PWC_Exclusion_Flag = 1
where client_id in (
select client_id
from temp
where extract_type = '0'
group by client_id
having count(1) = 1
);
This also leaves the other records in L03_A_AVOX_DATA untouched.
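To check the answer's logic on a few rows, here is a minimal sketch in Python's sqlite3 module. The table and column names come from the question, except that temp is renamed to the hypothetical temp_data, because temp is the name of SQLite's temporary schema; the sample rows are made up.

```python
import sqlite3


def flag_single_type0_clients(conn: sqlite3.Connection) -> int:
    """Set PWC_Exclusion_Flag = 1 for clients with exactly one extract_type '0' row; return rows updated."""
    cur = conn.execute(
        """
        UPDATE L03_A_AVOX_DATA
        SET PWC_Exclusion_Flag = 1
        WHERE client_id IN (
            SELECT client_id
            FROM temp_data
            WHERE extract_type = '0'
            GROUP BY client_id
            HAVING COUNT(*) = 1
        )
        """
    )
    conn.commit()
    return cur.rowcount
```

A client with two type-'0' rows, or with none, is left with its original flag, matching the answer's "leaves the other records untouched" behaviour.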
