select N-1 records for update - sql

I have a query where I want to update n-1 records from a result set. Can this be done without loops?
If my query is like this:
with cte(id, count)
as
(
select id, count(*) as count
from data
where id in (multiple values)
group by id
having count(*) >1
)
Now I want to update the rows in another table with the resulting ids, but only n-1 rows (any of them) for each id value from the above query. Something like this:
update top( count-1 or n-1) from data2
inner join cte on data2.id = cte.id
set somecolumn = 'some value'
where id in (select id from cte)
The id column is not unique. There are multiple rows with the same id values in table data2.

This query will do what you want. It uses two CTEs; the first generates the list of eligible id values to update, and the second generates row numbers for id values in data2 which match those in the first CTE. The second CTE is then updated if the row number is greater than 1 (so only n-1 rows get updated):
with cte(id, count) as (
select id, count(*) as count
from data
where id in (2, 3, 4, 6, 7)
group by id
having count(*) >1
),
cte2 as (
select d.id, d.somecolumn,
row_number() over (partition by d.id order by newid()) as rn
from data2 d
join cte on cte.id = d.id
)
update cte2
set somecolumn = 'some value'
where rn > 1
Note that I've chosen to order the row numbers randomly; you might have some other scheme for deciding which n-1 rows you want to update (e.g. ordered by id, or ...).
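For anyone who wants to try the row-numbering idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or later (the release that added window functions). SQLite cannot UPDATE through a CTE, so the sketch numbers the data2 rows per id in a subquery and matches them back by rowid instead; the table contents are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: `data` decides which ids are eligible,
# `data2` holds the (non-unique) rows to update.
conn.executescript("""
CREATE TABLE data  (id INTEGER);
CREATE TABLE data2 (id INTEGER, somecolumn TEXT);
INSERT INTO data  VALUES (2), (2), (3), (4), (4), (9), (9);
INSERT INTO data2 (id) VALUES (2), (2), (2), (3), (3), (4), (4);
""")

# Number the data2 rows within each eligible id and update every row
# numbered above 1, i.e. n-1 rows per id. SQLite cannot UPDATE a CTE,
# so the numbered rows are matched back to data2 by rowid.
conn.execute("""
UPDATE data2
SET somecolumn = 'some value'
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT d.rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY d.id ORDER BY random()) AS rn
        FROM data2 AS d
        WHERE d.id IN (SELECT id
                       FROM data
                       WHERE id IN (2, 3, 4, 6, 7)
                       GROUP BY id
                       HAVING COUNT(*) > 1)
    )
    WHERE rn > 1
)
""")

# Count the updated rows per id.
updated = dict(conn.execute("""
SELECT id, COUNT(*)
FROM data2
WHERE somecolumn = 'some value'
GROUP BY id
ORDER BY id
""").fetchall())
print(updated)  # {2: 2, 4: 1}
```

Which particular rows get updated depends on random(), but the number per id (n-1) does not; id 3 is untouched because it appears only once in data.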

Is this what you're looking for? The CTE identifies ALL of the target rows and numbers them within each id, but the WHERE clause in the UPDATE statement limits the updates to n-1 rows per id.
WITH cte AS
(
SELECT
t.id,
t.<whatever>,
ROW_NUMBER() OVER (PARTITION BY t.id ORDER BY (SELECT 0)) AS RowNum
FROM otherTable AS t
WHERE t.id IN (SELECT id FROM data)
)
UPDATE cte
SET <whatever> = <whateverElse>
WHERE
RowNum > 1;

I believe this would work just fine
;with cte(id, count)
as
(
select id, count(*) as count
from data
where id in (multiple values)
group by id
having count(*) >1
)
update data
set somecolumn = 'some value'
from data join cte on cte.id = data.id
;

Related

Update column as Duplicate

I have a table with three columns, A, B, and status.
First, I filter the table to get only the duplicate values, using this query:
SELECT A
FROM Table_1
GROUP BY A
HAVING COUNT(A) >1
In the second step, I need to check whether column B has a duplicate value; if it does, I need to update the status to D.
I tried this query:
UPDATE Table_1
SET status = 'D'
WHERE exists
(SELECT B
FROM Table_1
GROUP BY B
HAVING COUNT(B) >1)
but it updated all the rows.
The following does what you need, using row_number to identify any group with a duplicate and an updatable CTE to check for any row that's part of a group with a duplicate:
with d as (
select *, row_number() over(partition by a,b order by a,b) dn
from t
)
update d set d.status='D'
where exists (select * from d d2 where d2.a=d.a and d2.b=d.b and d2.dn>1)
You can do this with an updatable CTE, without any further joins, by using a windowed COUNT:
WITH d AS (
SELECT *,
cnt = COUNT(*) OVER (PARTITION BY a, b)
FROM t
)
UPDATE d
SET status = 'D'
WHERE cnt > 1;
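A minimal sketch of the windowed COUNT idea in SQLite, run through Python's sqlite3 module (SQLite 3.25+ assumed for window functions). Because SQLite has no updatable CTEs, the flagged rows are targeted through rowid; the table t and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows: (x, 1) and (z, 4) are duplicated on (a, b).
conn.executescript("""
CREATE TABLE t (a TEXT, b TEXT, status TEXT);
INSERT INTO t (a, b) VALUES
    ('x', '1'), ('x', '1'), ('x', '2'),
    ('y', '3'), ('z', '4'), ('z', '4');
""")

# COUNT(*) OVER (PARTITION BY a, b) gives each row the size of its (a, b)
# group; rows in a group larger than 1 are marked 'D'.
conn.execute("""
UPDATE t
SET status = 'D'
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               COUNT(*) OVER (PARTITION BY a, b) AS cnt
        FROM t
    )
    WHERE cnt > 1
)
""")

rows = conn.execute("SELECT a, b, status FROM t ORDER BY rowid").fetchall()
for row in rows:
    print(row)
```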

sql: first row after the last row with a property

I would like to write a query that returns the first row immediately after the last row with a given property (ordered by id). Ids may not be consecutive.
Ideally it would look something like this:
...
JOIN (select max(id) id from my_table where CONDITION) m
JOIN (select min(id) from my_table where id > m.id) n
However, I cannot use the identifier m in the second subselect.
It is possible to use nested queries within nested queries, but is there an easier way?
Thank you.
You could use lead() to get the next id before applying the condition:
select t.*
from my_table t join
(select max(next_id) as max_next_id
from (select t.*, lead(id) over (order by id) as next_id
from my_table t
) t
where <condition>
) tt
on t.id = tt.max_next_id;
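Here is a quick check of the LEAD approach in SQLite through Python's sqlite3 module (SQLite 3.25+ assumed for LEAD). It reuses the sample rows from the Snowflake example further down and assumes the condition is con1 = 'b'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows; note that the ids are not consecutive.
conn.executescript("""
CREATE TABLE my_table (id INTEGER, con1 TEXT, other INTEGER);
INSERT INTO my_table VALUES
    (1, 'a', 123), (2, 'b', 234), (3, 'a', 345),
    (5, 'b', 456), (7, 'a', 567), (10, 'c', 678);
""")

# LEAD(id) attaches each row's successor id *before* the condition is
# applied; the largest successor among matching rows is the row we want.
row = conn.execute("""
SELECT t.*
FROM my_table t
JOIN (SELECT MAX(next_id) AS max_next_id
      FROM (SELECT t.*, LEAD(id) OVER (ORDER BY id) AS next_id
            FROM my_table t) t
      WHERE con1 = 'b') tt
  ON t.id = tt.max_next_id
""").fetchone()
print(row)  # (7, 'a', 567)
```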
You could also do:
select t.*
from my_table t
where t.id > (select max(t2.id) from my_table t2 where <condition>)
order by t.id asc
fetch first 1 row only;
I am not sure how this is getting woven into the rest of your query, so I have used a CTE:
WITH max_next AS (
SELECT r.id as max_id
,r.next_id
FROM (
SELECT m.id
,m.next_id
,ROW_NUMBER() OVER (ORDER BY m.id DESC) AS rn
FROM (
SELECT n.* -- to provide data to satisfy CONDITIONS
,LEAD(n.id) OVER(ORDER BY n.id) as next_id
FROM my_table AS n
) AS m
WHERE CONDITIONS
) AS r
WHERE r.rn = 1
)
I would also shrink n.* down to just the columns needed by CONDITIONS, rather than being implicit, as * slows compilation down (or historically has): all the metadata needs to be read to work out which columns the * expands to, and while the compiler can prune unused columns, it's faster if you just ask for what you want (in the best case that is just a compile-time saving; in the worst case it reads all the data when you only need a few columns).
And borrowing from Gordon's solution, the ROW_NUMBER part could be simpler:
WITH max_next AS (
SELECT m.id
,m.next_id
--, plus what ever other things you want from m
FROM (
SELECT n.* -- to satisfy CONDITIONS needs
,LEAD(n.id) OVER(ORDER BY n.id) as next_id
FROM my_table AS n
) AS m
WHERE CONDITIONS
ORDER BY m.id DESC LIMIT 1
)
So, as an example for #PIG:
WITH my_table AS (
SELECT column1 AS id
,column2 AS con1
,column3 AS other
FROM VALUES (1,'a',123),(2,'b',234),(3,'a',345),(5,'b',456),(7,'a',567),(10,'c',678)
)
SELECT m.id
,m.next_id
,m.other
FROM (
SELECT n.* -- to satisfy CONDITIONS needs
,LEAD(n.id) OVER(ORDER BY n.id) as next_id
FROM my_table AS n
) AS m
WHERE m.con1 = 'b'
ORDER BY m.id DESC LIMIT 1;
This gives 5, 7, 456: the id of the last 'b' row, the id of the row after it, and an extra column from my_table for entertainment purposes. (I ran it on Snowflake too, which means I also fixed the prior SQL.)
This should work. It's pretty straightforward, and it's good that you know records may not be stored in an ordered/consecutive fashion.
SELECT *
FROM my_table
WHERE id = (
SELECT min(id)
FROM my_table
WHERE id > (
SELECT max(id)
FROM my_table
WHERE CONDITION));
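A small sketch of the nested MIN/MAX query run in SQLite via Python's sqlite3 module, using the same sample my_table rows as the Snowflake example, with CONDITION assumed to be a filter on the con1 column passed as a parameter. It also shows that when the last matching row is the final row of the table, the query returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER, con1 TEXT, other INTEGER);
INSERT INTO my_table VALUES
    (1, 'a', 123), (2, 'b', 234), (3, 'a', 345),
    (5, 'b', 456), (7, 'a', 567), (10, 'c', 678);
""")

# Innermost: last id matching the condition.
# Middle: smallest id after it (NULL if there is none).
# Outer: fetch that row; "id = NULL" matches nothing.
query = """
SELECT *
FROM my_table
WHERE id = (
    SELECT MIN(id)
    FROM my_table
    WHERE id > (
        SELECT MAX(id)
        FROM my_table
        WHERE con1 = ?))
"""

print(conn.execute(query, ("b",)).fetchall())  # [(7, 'a', 567)]
print(conn.execute(query, ("c",)).fetchall())  # [] (id 10 is the last row)
```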

Can't find largest duplicate value in SQL Server

I have multiple duplicate rows in my table, and I want to fetch only the largest values from the duplicate data.
I added an image as an example: from it I want to get only the last two rows, because the first row's first-column value is lower than the others and the service ids are the same. I am trying to do this by counting the data but can't get the final result.
Currently I am using this query to count the data:
SELECT
ServiceId, COUNT(*) Count_Duplicate
FROM
TestDeleteTable
GROUP BY
ServiceId
HAVING
COUNT(*) > 1
ORDER BY
COUNT(*) DESC
Thanks for any help
The following query should work for you:
SELECT ServiceId,RowId FROM
(
SELECT *, COUNT(ServiceId) OVER(PARTITION BY ServiceId ORDER BY ROWID) CT, ROW_NUMBER() OVER(PARTITION BY ServiceId ORDER BY ROWID) RN
FROM TestDeleteTable
)T
WHERE T.RN> 1 AND T.CT > 1
Another approach:
;WITH CTE AS
(
SELECT ServiceId, MIN(ROWID) M
FROM TestDeleteTable
GROUP BY ServiceId
HAVING COUNT(*) > 1
)
SELECT * FROM TestDeleteTable T
WHERE EXISTS
(
SELECT 1 FROM CTE C WHERE C.ServiceId=T.ServiceId AND T.ROWID > C.M
)
Or simply use an INNER JOIN with the CTE, like the following:
;WITH CTE AS
(
SELECT ServiceId, MIN(ROWID) MinValue, Count(ServiceId) CountService
FROM TestDeleteTable
GROUP BY ServiceId
HAVING COUNT(*) > 1
)
SELECT T.* FROM TestDeleteTable T
INNER JOIN CTE C ON T.ServiceId= C.ServiceId
WHERE C.CountService> 1 AND T.ROWID > C.MinValue
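A minimal sketch of the EXISTS approach in SQLite via Python's sqlite3 module. The TestDeleteTable rows below are invented, since the question's data was only shown as an image; the RowId values are assumed to increase with each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: ServiceId 100 appears three times, 300 twice, 200 once.
# A declared column named RowId takes precedence over SQLite's internal rowid.
conn.executescript("""
CREATE TABLE TestDeleteTable (RowId INTEGER, ServiceId INTEGER);
INSERT INTO TestDeleteTable VALUES
    (1, 100), (2, 100), (3, 100),
    (4, 200),
    (5, 300), (6, 300);
""")

# Keep every duplicated row except the one with the smallest RowId.
rows = conn.execute("""
WITH CTE AS (
    SELECT ServiceId, MIN(RowId) AS M
    FROM TestDeleteTable
    GROUP BY ServiceId
    HAVING COUNT(*) > 1
)
SELECT T.RowId, T.ServiceId
FROM TestDeleteTable AS T
WHERE EXISTS (
    SELECT 1 FROM CTE AS C
    WHERE C.ServiceId = T.ServiceId AND T.RowId > C.M
)
ORDER BY T.RowId
""").fetchall()
print(rows)  # [(2, 100), (3, 100), (6, 300)]
```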

Make value from every second row appear in new 3rd column

Let's assume my data looks like this:
Every second row represents old (previous value) in a table that holds historical data.
table 1 :
id value
------------
1 a
1 b
2 c
2 d
3 a
3 b
and I want the value of every second row to appear in a new third column, like this:
table 2:
id new_value old_value
------------------------
1 a b
2 c d
3 a b
EDIT:
For clarity, I'll post the skeleton of the query that's producing the data I want to transform (so it's clear that I am already using WITH and can't use an additional one, because Oracle does not yet allow nesting of WITH elements):
Skeleton code that produces the data in table 1:
with candidates as
(
--select list of candidates
)
SELECT * FROM
(
(
--select new values
MINUS
--select old values
)
UNION
(
--select old values
MINUS
--select new values
)
)
ORDER BY id;
The goal is to finally get only a list of ids that changed with their old and new values.
Thanks in advance.
Use a CTE:
;WITH CTE AS(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) RN
FROM TableName
)
SELECT ID,
MIN(CASE WHEN RN=1 THEN [value] END) NewValue,
MIN(CASE WHEN RN=2 THEN [value] END) OldValue
FROM CTE
GROUP BY ID
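A sketch of the conditional-aggregation answer in SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). Because ORDER BY ID inside PARTITION BY ID leaves the numbering arbitrary, the sample data adds a hypothetical sort_order column (1 = new, 2 = old) to make RN deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table 1 from the question plus a hypothetical sort_order column.
conn.executescript("""
CREATE TABLE TableName (id INTEGER, value TEXT, sort_order INTEGER);
INSERT INTO TableName VALUES
    (1, 'a', 1), (1, 'b', 2),
    (2, 'c', 1), (2, 'd', 2),
    (3, 'a', 1), (3, 'b', 2);
""")

# RN = 1 is the new value, RN = 2 the old one; MIN(CASE ...) pivots the
# two rows of each id into two columns.
rows = conn.execute("""
WITH CTE AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY sort_order) AS RN
    FROM TableName
)
SELECT id,
       MIN(CASE WHEN RN = 1 THEN value END) AS NewValue,
       MIN(CASE WHEN RN = 2 THEN value END) AS OldValue
FROM CTE
GROUP BY id
ORDER BY id
""").fetchall()
print(rows)  # [(1, 'a', 'b'), (2, 'c', 'd'), (3, 'a', 'b')]
```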
It is quite possible that the overall query can be written in a much simpler way. Just join the intermediate results with old and new values together on id to put them in two different columns, instead of unioning them into the same column.
WITH
candidates
AS
(
--select list of candidates
)
,CTE_NewValues
AS
(
--select new values
select id, value AS new_value
FROM candidates
WHERE ...
-- assumes id is unique, one row per id
)
,CTE_OldValues
AS
(
--select old values
select id, value AS old_value
FROM candidates
WHERE ...
-- assumes id is unique, one row per id
)
SELECT
CTE_NewValues.id
,CTE_NewValues.new_value
,CTE_OldValues.old_value
FROM
CTE_NewValues
INNER JOIN CTE_OldValues ON CTE_NewValues.id = CTE_OldValues.id
WHERE
CTE_NewValues.new_value <> CTE_OldValues.old_value
ORDER BY
CTE_NewValues.id;
If we stick to the skeleton of the query in the question, there are also many ways to do it. A self-join is likely to be less efficient than using analytic functions like ROW_NUMBER and LEAD.
Sorting just by id is not enough to unambiguously define which value is new or old. You need to have some extra column to resolve it.
You don't "nest" WITH (common-table expressions), you "chain" them. Something like the following. As you do that, make sure to add the sort_order column to be able to distinguish old and new values, if you don't have a similar column already.
WITH
candidates
AS
(
--select list of candidates
)
,CTE_YourQuery
AS
(
SELECT * FROM
(
(
--select new values
select 1 AS sort_order, id, value
MINUS
--select old values
select 1 AS sort_order, id, value
)
UNION ALL
(
--select old values
select 2 AS sort_order, id, value
MINUS
--select new values
select 2 AS sort_order, id, value
)
)
)
,CTE_RowNumber
AS
(
SELECT
id
,value AS new_value
,ROW_NUMBER() OVER (PARTITION BY id ORDER BY sort_order) AS rn
,LEAD(value) OVER (PARTITION BY id ORDER BY sort_order) AS old_value
FROM CTE_YourQuery
)
SELECT
id
,new_value
,old_value
FROM CTE_RowNumber
WHERE rn = 1
ORDER BY id;
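The ROW_NUMBER/LEAD step can be tried in SQLite via Python's sqlite3 module (SQLite 3.25+ assumed). Here CTE_YourQuery is stood in for by a VALUES list of hypothetical rows that already carry the sort_order column, including one id (4) that only has a new value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for CTE_YourQuery: (sort_order, id, value),
# where sort_order 1 = new value and 2 = old value.
rows = conn.execute("""
WITH CTE_YourQuery(sort_order, id, value) AS (
    VALUES (1, 1, 'a'), (2, 1, 'b'),
           (1, 2, 'c'), (2, 2, 'd'),
           (1, 3, 'a'), (2, 3, 'b'),
           (1, 4, 'e')
),
CTE_RowNumber AS (
    SELECT id,
           value AS new_value,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY sort_order) AS rn,
           LEAD(value) OVER (PARTITION BY id ORDER BY sort_order) AS old_value
    FROM CTE_YourQuery
)
SELECT id, new_value, old_value
FROM CTE_RowNumber
WHERE rn = 1
ORDER BY id
""").fetchall()
print(rows)  # [(1, 'a', 'b'), (2, 'c', 'd'), (3, 'a', 'b'), (4, 'e', None)]
```

LEAD returns NULL when there is no following row in the partition, so an id with only a new value still shows up, with an empty old_value.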
Assuming there is some other column which defines the "order" in which the new and old value appears, you can do this:
select t1.id, t1.value as old_value, t2.value as new_value
from the_table t1
join the_table t2 on t1.id = t2.id and t1.sort_order < t2.sort_order
But you have to have some column that distinguishes the row that is considered "old" from the one that is considered "new".

remove rows with some duplicate column value

Suppose I have a table A with a column a, like the following:
a
--
x
y
m
x
n
y
I want to delete all rows that have a duplicate value in column a, keeping just one row per value.
After this operation, my column should look like the result of:
select distinct a from A;
I know how to select the rows with repeated values in column a, but I can't just replace SELECT with DELETE because it would delete the unique values too.
Any help would be greatly appreciated.
In Oracle, you can do this by using the hidden column rowid and a correlated subquery:
delete from a
where rowid > (select min(rowid)
from a a2
where a.a = a2.a
);
Alternatively, you can phrase this as a not in:
delete from a
where rowid not in (select min(rowid)
from a a2
group by a2.a
);
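SQLite also exposes a rowid, so the correlated-subquery version can be tried directly with Python's sqlite3 module on the sample column from the question. This is a sketch: SQLite's rowid is an integer, unlike Oracle's, but the query has the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The sample column from the question.
conn.executescript("""
CREATE TABLE a (a TEXT);
INSERT INTO a VALUES ('x'), ('y'), ('m'), ('x'), ('n'), ('y');
""")

# Delete every row whose rowid is above the smallest rowid for its value,
# i.e. keep only the first occurrence of each value.
conn.execute("""
DELETE FROM a
WHERE rowid > (SELECT MIN(rowid)
               FROM a AS a2
               WHERE a.a = a2.a)
""")

remaining = conn.execute("SELECT a FROM a ORDER BY rowid").fetchall()
print(remaining)  # [('x',), ('y',), ('m',), ('n',)]
```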
You can use a combination of a CTE and a ranking function:
;With cte As
(
Select ROW_NUMBER() OVER (PARTITION BY colA ORDER BY colA) as rNum
From yourTable
)
Delete From cte
Where rNum<>1
In SQL Server, you can use a CTE and delete the duplicated rows. See the query below:
WITH CTE AS(
SELECT a,
RN = ROW_NUMBER()OVER(PARTITION BY a ORDER BY a)
FROM A
)
DELETE FROM CTE WHERE RN > 1
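SQLite does not allow DELETE through a CTE, so a sketch of the ROW_NUMBER idea there (Python's sqlite3 module, SQLite 3.25+ assumed) selects the rowids with RN > 1 and deletes those. Ordering by rowid inside the partition, rather than by a, is a choice made for this sketch so that the first inserted copy of each value is the one kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (a TEXT);
INSERT INTO A VALUES ('x'), ('y'), ('m'), ('x'), ('n'), ('y');
""")

# Number the copies of each value (first inserted = 1) and delete the rest.
conn.execute("""
DELETE FROM A
WHERE rowid IN (
    SELECT rid
    FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY a ORDER BY rowid) AS RN
        FROM A
    )
    WHERE RN > 1
)
""")

remaining = conn.execute("SELECT a FROM A ORDER BY rowid").fetchall()
print(remaining)  # [('x',), ('y',), ('m',), ('n',)]
```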