Update Query using a Subquery to mark Duplicates

I have a table that regularly gets duplicate values added to it. A simple fix would be to add an extra column that flags duplicates, so I can review them and remove them accordingly. My subquery SELECT statement works on its own, but not when I place it inside the UPDATE statement. I am using SSMS v18.7.1 with the latest SQL Server database engine (2019 Express, I believe).
Sample data (produced with a GROUP BY query)
I understand that UPDATE and GROUP BY don't mix well, which is why I thought a subquery could perform the requested action. Ideally I would also like to remove these duplicates, but other columns such as ApptDate and ActualDelivery come into play. For now, my only request is to set Dupcheck to 'Yes' where appropriate; I will work out the deletion logic afterwards.
Update a
Set Dupcheck = 'Yes'
from [Local DB].[dbo].[Test] a
where (
Select
ID,
count(*) as Count
From [Local DB].[dbo].[Test]
group by UID
having count(*) > 1)

You appear to be using SQL Server. I would suggest an updatable CTE:
with toupdate as (
select t.*, count(*) over (partition by uid) as cnt
from [Local DB].[dbo].[Test] t
)
update toupdate
set Dupcheck = 'Yes'
where cnt > 1;
Note: If you want all but one of the rows to have the flag set, then use row_number() rather than count(*).

I think you need to use IN, comparing on UID since that is the grouped column:
Update a
Set Dupcheck = 'Yes'
from [Local DB].[dbo].[Test] a
where a.UID in (
Select UID
From [Local DB].[dbo].[Test]
group by UID
having count(*) > 1)

Looking at your query, it seems you want to flag every duplicate row with Yes.
You can use the following query to mark all the duplicates as Yes:
Update t
Set Dupcheck = 'Yes'
From Test t
Where exists
(select 1 from Test tt
where t.uid = tt.uid
And t.id <> tt.id);
If you want to mark all but one record in each group as a duplicate, use > or < instead of <> in the EXISTS clause, e.g. And t.id > tt.id
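The difference between <> and > in the EXISTS clause can be tried out with a small sketch. It uses Python's standard-library sqlite3 module rather than SQL Server, and the table contents and the flag_duplicates helper are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, uid TEXT, dupcheck TEXT)")
conn.executemany(
    "INSERT INTO test (id, uid) VALUES (?, ?)",
    [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "C"), (6, "C")],
)

def flag_duplicates(conn, keep_first):
    """Reset the flag, then mark rows whose uid also appears on another row.

    keep_first=False compares with <> (every row of a duplicate group is flagged);
    keep_first=True compares with > (the lowest id of each group stays unflagged).
    """
    op = ">" if keep_first else "<>"
    conn.execute("UPDATE test SET dupcheck = NULL")
    conn.execute(
        f"""
        UPDATE test
        SET dupcheck = 'Yes'
        WHERE EXISTS (
            SELECT 1 FROM test AS tt
            WHERE tt.uid = test.uid AND test.id {op} tt.id
        )
        """
    )
    return [r[0] for r in conn.execute(
        "SELECT id FROM test WHERE dupcheck = 'Yes' ORDER BY id")]

all_flagged = flag_duplicates(conn, keep_first=False)
extra_flagged = flag_duplicates(conn, keep_first=True)
print(all_flagged)    # every row that belongs to a duplicate group
print(extra_flagged)  # the same rows minus the lowest id of each group
```

With > the row that survives is the one with the smallest id; switching to < would keep the largest id instead.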

Related

INSERT or UPDATE the table from SELECT in sql server

I have a requirement: if a record for the business date already exists in the table, I need to update its values from the SELECT statement; otherwise I need to insert a row for that business date from the SELECT statement. Below is my full query, which currently only inserts:
INSERT INTO
gstl_calculated_daily_fee(business_date,fee_type,fee_total,range_id,total_band_count)
select
@tlf_business_date,
'FEE_LOCAL_CARD',
SUM(C.settlement_fees),
C.range_id,
Count(1)
From
(
select
*
from
(
select
rowNumber = @previous_mada_switch_fee_volume_based_count + (ROW_NUMBER() OVER(PARTITION BY DATEPART(MONTH, x_datetime) ORDER BY x_datetime)),
tt.x_datetime
from gstl_trans_temp tt where (message_type_mapping = '0220') and card_type ='GEIDP1' and response_code IN('00','10','11') and tran_amount_req >= 5000 AND merchant_type NOT IN(5542,5541,4829)
) A
CROSS APPLY
(
select
rtt.settlement_fees,
rtt.range_id
From gstl_mada_local_switch_fee_volume_based rtt
where A.rowNumber >= rtt.range_start
AND (A.rowNumber <= rtt.range_end OR rtt.range_end IS NULL)
) B
) C
group by CAST(C.x_datetime AS DATE),C.range_id
I have tried to use IF EXISTS but could not fit it into the full query above.
if exists (select
business_date
from gstl_calculated_daily_fee
where
business_date = @tlf_business_date)
UPDATE gstl_calculated_daily_fee
SET fee_total = @total_mada_local_switch_fee_low
WHERE fee_type = 'FEE_LOCAL_CARD'
AND business_date = @tlf_business_date
else
INSERT INTO
Please help.

You need a MERGE statement with a join.
The main issue with MERGE here is that we only want to merge against a subset of the target table. To do this, we pre-filter the target table in a CTE. We can also put the source query in a CTE.
Be very careful when you write a MERGE against a CTE. You must make sure the CTE filters the target down to exactly the rows you want to merge against, and then match the rows using ON:
;with source as (
select
business_date = @tlf_business_date,
fee_total = SUM(B.settlement_fees),
B.range_id,
total_band_count = Count(1)
From
(
select
rowNumber = @previous_mada_switch_fee_volume_based_count + (ROW_NUMBER() OVER(PARTITION BY DATEPART(MONTH, x_datetime) ORDER BY x_datetime)),
tt.x_datetime
from gstl_trans_temp tt where (message_type_mapping = '0220') and card_type ='GEIDP1' and response_code IN('00','10','11') and tran_amount_req >= 5000 AND merchant_type NOT IN(5542,5541,4829)
) A
CROSS APPLY
(
select
rtt.settlement_fees,
rtt.range_id
From gstl_mada_local_switch_fee_volume_based rtt
where A.rowNumber >= rtt.range_start
AND (A.rowNumber <= rtt.range_end OR rtt.range_end IS NULL)
) B
group by CAST(A.x_datetime AS DATE), B.range_id
),
target as (
select
business_date,fee_type,fee_total,range_id,total_band_count
from gstl_calculated_daily_fee
where business_date = @tlf_business_date AND fee_type = 'FEE_LOCAL_CARD'
)
MERGE INTO target t
USING source s
ON t.business_date = s.business_date AND t.range_id = s.range_id
WHEN NOT MATCHED BY TARGET THEN INSERT
(business_date,fee_type,fee_total,range_id,total_band_count)
VALUES
(s.business_date,'FEE_LOCAL_CARD', s.fee_total, s.range_id, s.total_band_count)
WHEN MATCHED THEN UPDATE SET
fee_total = @total_mada_local_switch_fee_low
;
A MERGE statement essentially performs a FULL JOIN between the source and target tables, using the ON clause to match rows. It then applies various conditions to the resulting join and executes statements based on them.
There are three possible conditions:
WHEN MATCHED THEN
WHEN NOT MATCHED [BY TARGET] THEN
WHEN NOT MATCHED BY SOURCE THEN
And three possible statements, all of which act on the target table: UPDATE, INSERT, DELETE. Not every statement is valid under every condition: INSERT applies only to WHEN NOT MATCHED BY TARGET, while UPDATE and DELETE apply to the other two.
A common problem is that we only want to consider a subset of the target table. There are a number of possible solutions to this:
We could filter the matching inside the WHEN MATCHED clause, e.g. WHEN MATCHED AND target.somefilter = @somefilter. This can often cause a full table scan, though.
Instead, we put the filtered target table inside a CTE and MERGE into that. The CTE must follow the rules for updatable views, and it must select every column we wish to insert or update. We must also make sure the CTE fully filters the target; otherwise a DELETE under WHEN NOT MATCHED BY SOURCE would remove every unmatched row in the whole table.
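For readers who want to experiment with the matched / not-matched logic without a SQL Server instance, here is a rough sketch in Python's standard-library sqlite3 module. SQLite has no MERGE, so the two branches are emulated with an UPDATE followed by an INSERT ... WHERE NOT EXISTS inside one transaction, which is a swapped-in technique and not the MERGE statement itself. The daily_fee and source tables are hypothetical stand-ins, and the update takes its value from the source row rather than from a variable:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_fee (
    business_date TEXT, fee_type TEXT, range_id INTEGER,
    fee_total REAL, total_band_count INTEGER
);
-- Hypothetical stand-in for the aggregated "source" CTE.
CREATE TABLE source (
    business_date TEXT, range_id INTEGER,
    fee_total REAL, total_band_count INTEGER
);
INSERT INTO daily_fee VALUES ('2021-01-01', 'FEE_LOCAL_CARD', 1, 10.0, 2);
INSERT INTO daily_fee VALUES ('2021-01-01', 'FEE_OTHER', 1, 99.0, 9);
INSERT INTO source VALUES ('2021-01-01', 1, 12.5, 3);
INSERT INTO source VALUES ('2021-01-01', 2, 7.0, 1);
""")

FEE_TYPE = "FEE_LOCAL_CARD"

with conn:  # one transaction: both statements commit or roll back together
    # "WHEN MATCHED": update only target rows inside the filtered subset.
    conn.execute("""
        UPDATE daily_fee
        SET fee_total = (SELECT s.fee_total FROM source s
                         WHERE s.business_date = daily_fee.business_date
                           AND s.range_id = daily_fee.range_id),
            total_band_count = (SELECT s.total_band_count FROM source s
                                WHERE s.business_date = daily_fee.business_date
                                  AND s.range_id = daily_fee.range_id)
        WHERE fee_type = ?
          AND EXISTS (SELECT 1 FROM source s
                      WHERE s.business_date = daily_fee.business_date
                        AND s.range_id = daily_fee.range_id)
    """, (FEE_TYPE,))
    # "WHEN NOT MATCHED BY TARGET": insert source rows with no counterpart.
    conn.execute("""
        INSERT INTO daily_fee
            (business_date, fee_type, range_id, fee_total, total_band_count)
        SELECT s.business_date, ?, s.range_id, s.fee_total, s.total_band_count
        FROM source s
        WHERE NOT EXISTS (SELECT 1 FROM daily_fee t
                          WHERE t.fee_type = ?
                            AND t.business_date = s.business_date
                            AND t.range_id = s.range_id)
    """, (FEE_TYPE, FEE_TYPE))

rows = conn.execute(
    "SELECT fee_type, range_id, fee_total, total_band_count "
    "FROM daily_fee ORDER BY fee_type, range_id"
).fetchall()
print(rows)
```

The FEE_OTHER row is left alone because both statements filter on fee_type, which plays the role of the pre-filtered target CTE.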

"FOR UPDATE is not allowed with aggregate functions" in PostgreSQL

Here is pseudo code for what I'm trying to do:
rate_count = SELECT COUNT(id) FROM job WHERE last_processed_at >= ?
current_limit = rate_limit - rate_count
if current_limit > 0
UPDATE job SET state='processing'
WHERE id IN(
SELECT id FROM job
WHERE state='pending'
LIMIT :current_limit
)
I have it working except for concurrency issues. When run from multiple sessions at the same time, both sessions SELECT the same rows and therefore update the same rows.
I'm able to make the second query atomic by adding FOR UPDATE to its SELECT subquery. But I can't add FOR UPDATE to the first query, because FOR UPDATE isn't allowed with aggregate functions.
How can I make this piece an atomic transaction?

You can do FOR UPDATE within a subquery:
rate_count := COUNT(id)
FROM (
SELECT id FROM job
WHERE last_processed_at >= ? FOR UPDATE
) a;
You can also do this whole thing in a single query:
UPDATE job SET state='processing'
WHERE id IN (
SELECT id FROM job
WHERE state='pending'
LIMIT (SELECT GREATEST(0, rate_limit - COUNT(id))
FROM (SELECT id FROM job
WHERE last_processed_at >= ? FOR UPDATE) a
)
)

I got the same error below:
ERROR: FOR UPDATE is not allowed with aggregate functions
Because I use count() and FOR UPDATE as shown below:
SELECT count(*) FROM person FOR UPDATE;
So, I changed the query above to one of 2 queries below:
SELECT count(*) FROM (SELECT * FROM person FOR UPDATE) AS result;
WITH result AS (SELECT * FROM person FOR UPDATE) SELECT count(*) FROM result;
Then, I could use count() and FOR UPDATE together:
count
-------
7
(1 row)

How to update specific rows based on identified rows

I have identified specific rows based on a unique id in the data, and I want to update one column on those rows. I'm trying to use an UPDATE command, but it's not working:
UPDATE L03_A_AVOX_DATA
SET PWC_Exclusion_Flag =
(CASE
WHEN (L03_A_AVOX_DATA.PWC_SEQ_AVOX IN
(SELECT PWC_SEQ_AVOX
FROM L03_A_AVOX_DATA
WHERE client_id IN
(SELECT DISTINCT client_id
FROM ( SELECT DISTINCT
client_id,
extract_type,
COUNT (*)
FROM temp
GROUP BY client_id,
extract_type
HAVING COUNT (*) = 1))
AND extract_type = '0'))
THEN
1
ELSE
L03_A_AVOX_DATA.PWC_Exclusion_Flag
END )
Can anyone help me?

You should simplify this statement by simulating an UPDATE with a JOIN.
For more details, see:
Update statement with inner join on Oracle
This idea should work for your case too.
Records that have counterparts in the temp table get updated.
Records without counterparts are left alone, which seems to be what you want anyway.

You're trying to update the PWC_Exclusion_Flag to 1 if the client_id has exactly 1 record of extract_type 0 in the temp table, am I right?
Try this:
update L03_A_AVOX_DATA
set PWC_Exclusion_Flag = 1
where client_id in (
select client_id
from temp
where extract_type = '0'
group by client_id
having count(1) = 1
);
This also leaves the other records in L03_A_AVOX_DATA untouched.
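A quick way to check that behaviour is a small sketch with Python's standard-library sqlite3 module (SQLite instead of Oracle; the shortened table and column names avox, seq and flag are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE avox (seq INTEGER PRIMARY KEY, client_id TEXT, flag INTEGER);
CREATE TABLE temp (client_id TEXT, extract_type TEXT);
INSERT INTO avox VALUES (1, 'c1', 0), (2, 'c2', 0), (3, 'c3', 5);
-- c1: exactly one '0' row, c2: two '0' rows, c3: no '0' row
INSERT INTO temp VALUES ('c1', '0'), ('c2', '0'), ('c2', '0'), ('c3', '1');
""")
conn.execute("""
    UPDATE avox SET flag = 1
    WHERE client_id IN (
        SELECT client_id FROM temp
        WHERE extract_type = '0'
        GROUP BY client_id
        HAVING COUNT(*) = 1
    )
""")
flags = conn.execute("SELECT seq, flag FROM avox ORDER BY seq").fetchall()
print(flags)
```

Only the row for c1 changes; the rows for c2 and c3 keep whatever flag value they had before.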

SQL: How to properly check if a record exists

While reading some SQL Tuning-related documentation, I found this:
SELECT COUNT(*) :
Counts the number of rows.
Often is improperly used to verify the existence of a record.
Is SELECT COUNT(*) really that bad?
What's the proper way to verify the existence of a record?

It's better to use either of the following:
-- Method 1.
SELECT 1
FROM table_name
WHERE unique_key = value;
-- Method 2.
SELECT COUNT(1)
FROM table_name
WHERE unique_key = value;
The first alternative should give you no result or one result, the second count should be zero or one.
How old is the documentation you're using? Although you've read good advice, most query optimizers in recent RDBMS's optimize SELECT COUNT(*) anyway, so while there is a difference in theory (and older databases), you shouldn't notice any difference in practice.

I would prefer not to use the COUNT function at all:
IF [NOT] EXISTS ( SELECT 1 FROM MyTable WHERE ... )
<do smth>
For example, if you want to check whether a user exists before inserting it into the database, the query can look like this:
IF NOT EXISTS ( SELECT 1 FROM Users WHERE FirstName = 'John' AND LastName = 'Smith' )
BEGIN
INSERT INTO Users (FirstName, LastName) VALUES ('John', 'Smith')
END
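The same check-then-insert idea can be sketched as a single statement using Python's standard-library sqlite3 module. This assumes SQLite, which has no IF ... BEGIN ... END blocks, so the NOT EXISTS test moves into the INSERT itself; the add_user_if_missing helper is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (first_name TEXT, last_name TEXT)")

def add_user_if_missing(conn, first, last):
    """Insert the user only when no matching row exists; return rows inserted."""
    cur = conn.execute(
        """
        INSERT INTO users (first_name, last_name)
        SELECT ?, ?
        WHERE NOT EXISTS (
            SELECT 1 FROM users WHERE first_name = ? AND last_name = ?
        )
        """,
        (first, last, first, last),
    )
    return cur.rowcount

first_try = add_user_if_missing(conn, "John", "Smith")   # row is inserted
second_try = add_user_if_missing(conn, "John", "Smith")  # nothing to do
total = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(first_try, second_try, total)
```

Under concurrent writers an existence check alone can still race, so a unique constraint on the identifying columns remains the more robust guard.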

You can use:
SELECT 1 FROM MyTable WHERE <MyCondition>
If no record matches the condition, the resulting recordset is empty.

You can use:
SELECT 1 FROM MyTable WHERE... LIMIT 1
Use select 1 to prevent the checking of unnecessary fields.
Use LIMIT 1 to prevent the checking of unnecessary rows.

SELECT COUNT(1) FROM MyTable WHERE ...
will loop through all the matching records. This is why it is a poor choice for checking whether a record exists.
I would use
SELECT TOP 1 * FROM MyTable WHERE ...
After finding 1 record, it will terminate the loop.

The other answers are quite good, but it would also be useful to add LIMIT 1 (or the equivalent) to prevent the checking of unnecessary rows.

You can use:
SELECT COUNT(1) FROM MyTable WHERE ...
or
WHERE [NOT] EXISTS
( SELECT 1 FROM MyTable WHERE ... )
This will be more efficient than SELECT * since you're simply selecting the value 1 for each row, rather than all the fields.
There's also a subtle difference between COUNT(*) and COUNT(column name):
COUNT(*) will count all rows, including nulls
COUNT(column name) will only count non null occurrences of column name
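The NULL-handling difference, and an EXISTS-based check, can be seen in a short sketch using Python's standard-library sqlite3 module (the table t and its contents are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, None)],
)

# COUNT(*) counts rows; COUNT(email) skips rows where email IS NULL.
count_star, count_col = conn.execute(
    "SELECT COUNT(*), COUNT(email) FROM t").fetchone()

# EXISTS answers the yes/no question directly (SQLite returns 1 or 0).
has_row = conn.execute(
    "SELECT EXISTS (SELECT 1 FROM t WHERE id = 2)").fetchone()[0]

print(count_star, count_col, has_row)
```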

Other option:
SELECT CASE
WHEN EXISTS (
SELECT 1
FROM [MyTable] AS [MyRecord])
THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT)
END

I'm using this way:
SELECT IIF(EXISTS (SELECT 1 FROM Users WHERE FirstName = 'John'), 1, 0) AS DoesJohnExist

Update based on subquery fails

I am trying to do the following update in Oracle 10gR2:
update
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
set t.port_seq = t.new_seq
Voyage_port_id is the primary key, voyage_id is a foreign key. I'm trying to assign a sequence number based on the dates within each voyage.
However, the above fails with ORA-01732: data manipulation operation not legal on this view
What is the problem, and how can I avoid it?

Since you can't update subqueries with row_number, you'll have to calculate the row number in the set part of the update. At first I tried this:
update voyage_port a
set a.port_seq = (
select
row_number() over (partition by voyage_id order by arrival_date)
from voyage_port b
where b.voyage_port_id = a.voyage_port_id
)
But that doesn't work, because the subquery only selects one row, and then the row_number() is always 1. Using another subquery allows a meaningful result:
update voyage_port a
set a.port_seq = (
select c.rn
from (
select
voyage_port_id
, row_number() over (partition by voyage_id
order by arrival_date) as rn
from voyage_port b
) c
where c.voyage_port_id = a.voyage_port_id
)
It works, but it is more complex than I'd expect for this task.

You can update some views, but there are restrictions, and one is that the view must not contain analytic functions. See SQL Language Reference on UPDATE and search for the first occurrence of "analytic".
This will work, provided no voyage visits more than one port on the same day (or the dates include a time component that makes them unique):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and vp2.arrival_date <= vp.arrival_date
)
I think this handles the case where a voyage visits more than 1 port per day and there is no time component (though the sequence of ports visited on the same day is then arbitrary):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and ( vp2.arrival_date < vp.arrival_date
or ( vp2.arrival_date = vp.arrival_date
and vp2.voyage_port_id <= vp.voyage_port_id
)
)
)
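The tie-breaking count can be tried out with a minimal sketch in Python's standard-library sqlite3 module. It assumes SQLite rather than Oracle, uses made-up sample rows, and counts the rows that sort before or at the current row, with voyage_port_id breaking ties on equal dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE voyage_port (
    voyage_port_id INTEGER PRIMARY KEY,
    voyage_id INTEGER, arrival_date TEXT, port_seq INTEGER
);
INSERT INTO voyage_port (voyage_port_id, voyage_id, arrival_date) VALUES
    (10, 1, '2020-01-03'),
    (11, 1, '2020-01-01'),
    (12, 1, '2020-01-03'),
    (13, 2, '2020-02-01');
""")
conn.execute("""
    UPDATE voyage_port
    SET port_seq = (
        SELECT COUNT(*) FROM voyage_port AS vp2
        WHERE vp2.voyage_id = voyage_port.voyage_id
          AND (vp2.arrival_date < voyage_port.arrival_date
               OR (vp2.arrival_date = voyage_port.arrival_date
                   AND vp2.voyage_port_id <= voyage_port.voyage_port_id))
    )
""")
seqs = conn.execute(
    "SELECT voyage_port_id, port_seq FROM voyage_port ORDER BY voyage_port_id"
).fetchall()
print(seqs)
```

Ports 10 and 12 share an arrival date, so the lower voyage_port_id gets the lower sequence number.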

I don't think you can update a derived table. I'd rewrite it as:
update voyage_port
set port_seq = t.new_seq
from
voyage_port p
inner join
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
on p.voyage_port_id = t.voyage_port_id

The first token after UPDATE should be the name of the table to update, followed by your columns to update. I'm not sure what you are trying to achieve with the select statement where it is, but you can't legally update the result set of a select.
A version of the sql, guessing what you have in mind, might look like...
update voyage_port t
set t.port_seq = (<select statement that generates new value of port_seq>)
NOTE: to use a select statement to set a value like this, you must make sure the select returns only one row!
EDIT : modified statement above to reflect what I was trying to explain. The question has been answered very nicely by Andomar above
