Update based on subquery fails - sql

I am trying to do the following update in Oracle 10gR2:
update
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
set t.port_seq = t.new_seq
Voyage_port_id is the primary key, voyage_id is a foreign key. I'm trying to assign a sequence number based on the dates within each voyage.
However, the above fails with ORA-01732: data manipulation operation not legal on this view
What is the problem, and how can I avoid it?

Since you can't update subqueries with row_number, you'll have to calculate the row number in the set part of the update. At first I tried this:
update voyage_port a
set a.port_seq = (
select
row_number() over (partition by voyage_id order by arrival_date)
from voyage_port b
where b.voyage_port_id = a.voyage_port_id
)
But that doesn't work, because the subquery only selects one row, and then the row_number() is always 1. Using another subquery allows a meaningful result:
update voyage_port a
set a.port_seq = (
select c.rn
from (
select
voyage_port_id
, row_number() over (partition by voyage_id
order by arrival_date) as rn
from voyage_port b
) c
where c.voyage_port_id = a.voyage_port_id
)
It works, but it is more complex than I'd expect for this task.
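The difference between the two attempts can be reproduced on a small table. The following is only a sketch: it uses Python's sqlite3 module with an in-memory SQLite database standing in for Oracle (window functions need SQLite 3.25 or later), and the sample rows and column types are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voyage_port (
        voyage_port_id INTEGER PRIMARY KEY,
        voyage_id      INTEGER,
        arrival_date   TEXT,
        port_seq       INTEGER
    );
    -- hypothetical sample data: three ports on voyage 10, one on voyage 20
    INSERT INTO voyage_port (voyage_port_id, voyage_id, arrival_date) VALUES
        (1, 10, '2009-01-03'),
        (2, 10, '2009-01-01'),
        (3, 10, '2009-01-02'),
        (4, 20, '2009-02-01');
""")

def port_seqs():
    rows = conn.execute(
        "SELECT port_seq FROM voyage_port ORDER BY voyage_port_id")
    return [r[0] for r in rows]

# First attempt: the WHERE clause leaves a single row, so the window
# function only ever numbers that one row.
conn.execute("""
    UPDATE voyage_port
    SET port_seq = (
        SELECT row_number() OVER (PARTITION BY voyage_id ORDER BY arrival_date)
        FROM voyage_port b
        WHERE b.voyage_port_id = voyage_port.voyage_port_id
    )
""")
first_attempt = port_seqs()

# Second attempt: number all rows in an inner query first, then pick
# the matching row from that numbered result.
conn.execute("""
    UPDATE voyage_port
    SET port_seq = (
        SELECT c.rn
        FROM (
            SELECT voyage_port_id,
                   row_number() OVER (PARTITION BY voyage_id
                                      ORDER BY arrival_date) AS rn
            FROM voyage_port
        ) c
        WHERE c.voyage_port_id = voyage_port.voyage_port_id
    )
""")
second_attempt = port_seqs()

print(first_attempt)   # [1, 1, 1, 1]
print(second_attempt)  # [3, 1, 2, 1]
```

The filter on voyage_port_id has to sit outside the query that computes row_number(), otherwise the numbering sees only one row.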

You can update some views, but there are restrictions, and one is that the view must not contain analytic functions. See the SQL Language Reference on UPDATE and search for the first occurrence of "analytic".
This will work, provided no voyage visits more than one port on the same day (or the dates include a time component that makes them unique):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and vp2.arrival_date <= vp.arrival_date
)
I think this handles the case where a voyage visits more than 1 port per day and there is no time component (though the sequence of ports visited on the same day is then arbitrary):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and ( vp2.arrival_date < vp.arrival_date
or ( vp2.arrival_date = vp.arrival_date
and vp2.voyage_port_id <= vp.voyage_port_id
)
)
)
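As a rough check of the counting approach with a tie-break on the primary key, here is a small sketch run through Python's sqlite3 module (SQLite standing in for Oracle; the rows are invented). It assumes the intended logic is "earlier date, or same date and a lower-or-equal voyage_port_id", with the whole OR condition kept inside the voyage_id filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE voyage_port (
        voyage_port_id INTEGER PRIMARY KEY,
        voyage_id      INTEGER,
        arrival_date   TEXT,
        port_seq       INTEGER
    );
    -- hypothetical data: ports 2 and 3 of voyage 10 share an arrival date
    INSERT INTO voyage_port (voyage_port_id, voyage_id, arrival_date) VALUES
        (1, 10, '2009-01-01'),
        (2, 10, '2009-01-02'),
        (3, 10, '2009-01-02'),
        (4, 10, '2009-01-03'),
        (5, 20, '2009-01-02');
""")

# Count the rows of the same voyage that sort at or before the current row:
# strictly earlier dates, plus same-date rows up to and including this id.
conn.execute("""
    UPDATE voyage_port
    SET port_seq = (
        SELECT count(*)
        FROM voyage_port vp2
        WHERE vp2.voyage_id = voyage_port.voyage_id
          AND (   vp2.arrival_date < voyage_port.arrival_date
               OR (    vp2.arrival_date = voyage_port.arrival_date
                   AND vp2.voyage_port_id <= voyage_port.voyage_port_id))
    )
""")

seqs = conn.execute(
    "SELECT voyage_port_id, port_seq FROM voyage_port ORDER BY voyage_port_id"
).fetchall()
print(seqs)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 1)]
```

Without the outer parentheses, AND binds more tightly than OR, and the same-date branch would count rows from other voyages as well.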

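I don't think you can update a derived table. I'd rewrite it as follows (note that UPDATE ... FROM is SQL Server-style syntax, which Oracle 10gR2 does not accept):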
update voyage_port
set port_seq = t.new_seq
from
voyage_port p
inner join
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
on p.voyage_port_id = t.voyage_port_id

The first token after the UPDATE should be the name of the table to update, then your columns-to-update. I'm not sure what you are trying to achieve with the select statement where it is, but you can't legally update the result set from the select.
A version of the sql, guessing what you have in mind, might look like...
update voyage_port t
set t.port_seq = (<select statement that generates new value of port_seq>)
NOTE: to use a select statement to set a value like this, you must make sure that only one row will be returned from the select!
EDIT : modified statement above to reflect what I was trying to explain. The question has been answered very nicely by Andomar above

Related

CTE and Update needs to be joined

I am fairly new to CTEs, but I didn't think what I was trying to do was terribly complicated. Essentially, I want to update a date field with another date field (plus 1 day), so I thought I would just use a CTE so I could use the temporary table in my update statement.
However, I keep getting the error message Invalid column name 'effdt_end_new'. I created the column name in the CTE, when I do a full reference it says that it cannot bind to it. Any ideas?
The SQL code is
WITH GRAB_END_DATE AS
(
SELECT emplid, LEAD(DATEADD(day,1,effdt),1) OVER(PARTITION BY emplid ORDER BY effdt) AS effdt_end_new
FROM [dw].[stage_dim_wd_staff_1]
WHERE emplid = '040089671'
)
UPDATE [dw].[stage_dim_wd_staff_1]
SET effdt_end = effdt_end_new
WHERE emplid = '040089671'
You need to update the CTE, not the base table:
WITH GRAB_END_DATE AS
(
SELECT
*,
LEAD(DATEADD(day, 1, st.effdt)) OVER (PARTITION BY st.emplid ORDER BY st.effdt) AS effdt_end_new
FROM dw.stage_dim_wd_staff_1 st
WHERE emplid = '040089671'
)
UPDATE GRAB_END_DATE
SET effdt_end = effdt_end_new;
Note that there is no join here, the CTE is updated directly, and this feeds through to the base table of the CTE.
You can also do this using a derived table or a view, it's the same thing.
UPDATE GRAB_END_DATE
SET effdt_end = effdt_end_new
FROM (
SELECT
*,
LEAD(DATEADD(day, 1, st.effdt)) OVER (PARTITION BY st.emplid ORDER BY st.effdt) AS effdt_end_new
FROM dw.stage_dim_wd_staff_1 st
WHERE emplid = '040089671'
) GRAB_END_DATE;

Big query De-duplication query is not working properly

Can anyone please tell me why the query below is not working properly? It is supposed to delete only the duplicate records and keep one of them (the latest record), but it deletes all the records instead of keeping one of each set of duplicates. Why is that?
delete
from
dev_rahul.page_content_insights
where
(sha_id,
etl_start_utc_dttm) in (
select
(a.sha_id,
a.etl_start_utc_dttm)
from
(
select
sha_id,
etl_start_utc_dttm,
ROW_NUMBER() over (Partition by sha_id
order by
etl_start_utc_dttm desc) as rn
from
dev_rahul.page_content_insights
where
(snapshot_dt) >= '2021-03-25' ) a
where
a.rn <> 1)
Query looks ok, though I don't use that syntax for cleaning up duplicates.
Can I confirm the following:
sha_id, etl_start_utc_dttm is your primary key?
You wish to keep sha_id and the latest row based on etl_start_utc_dttm field descending?
If so, try this two query pattern:
create or replace table dev_rahul.rows_not_to_delete as
SELECT col.* FROM (SELECT ARRAY_AGG(pci ORDER BY etl_start_utc_dttm desc LIMIT 1
)[OFFSET(0)] col
FROM dev_rahul.page_content_insights pci
where snapshot_dt >= '2021-03-25'
GROUP BY sha_id
);
delete dev_rahul.page_content_insights p
where not exists (select 1 from dev_rahul.rows_not_to_delete d
where p.sha_id = d.sha_id and p.etl_start_utc_dttm = d.etl_start_utc_dttm
) and snapshot_dt >= '2021-03-25';
You could do this in a single query by putting the first statement into a CTE.

Remove duplicate row based on select statement

I have two select statements which are returning duplicated data. What I'm trying to accomplish is to remove a duplicated leg, but I'm having a hard time getting to the second row programmatically.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes,i.ABID from inv_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Minutes = i2.Minutes
and i.Uid <> i2.Uid
and i.abid = i2.abid
order by i.EndDate
This select statement returns the following data.
As you can see it returns duplicate rows where minutes are the same ABID is the same but InvID are different. What I need to do is to remove one of the InvID where the criteria matches. Doesn't matter which one.
The second select statement is returning different data.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes from InvoiceLines_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Uid = i2.Uid
and i.Abid <> i2.Abid
and i.Language <> i2.Language
order by i.startdate desc
In this select statement I want to remove an InvID where UID is the same, then select the lowest Minutes. In this case, I would remove the following InvIDs: 2537676, 2537210
My goal is to remove those rows...
I could accomplish this using cursor grab the InvID and remove it by simple delete statement, but I'm trying to stay away from cursors.
Any suggestions on how I can accomplish this?
You can use exists to delete all duplicates except the one with the highest InvID, by deleting those rows for which another row exists with the same values but a higher InvID:
delete from inv_v
where exists (
select 1 from inv_v i2
where i2.InvID > inv_v.InvID
and i2.minutes = inv_v.minutes
and i2.EndDate = inv_v.EndDate
and i2.abid = inv_v.abid
and i2.uid <> inv_v.uid -- not sure why <> is used here, copied from question
)
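To see the EXISTS pattern in action, here is a minimal sketch using Python's sqlite3 module. The table name inv, its column types and the three rows are assumptions made for the example, and the uid <> condition from the question is left out so that only the "same values, higher InvID" rule is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-in for inv_v
    CREATE TABLE inv (
        InvID   INTEGER PRIMARY KEY,
        UID     TEXT,
        EndDate TEXT,
        Minutes INTEGER,
        ABID    INTEGER
    );
    INSERT INTO inv VALUES
        (101, 'u1', '2014-08-05', 30, 7),
        (102, 'u2', '2014-08-05', 30, 7),   -- same leg as 101
        (103, 'u3', '2014-08-09', 45, 8);
""")

# Delete every row for which a "twin" with a higher InvID exists,
# so the highest InvID of each group of duplicates survives.
conn.execute("""
    DELETE FROM inv
    WHERE EXISTS (
        SELECT 1
        FROM inv i2
        WHERE i2.InvID   > inv.InvID
          AND i2.Minutes = inv.Minutes
          AND i2.EndDate = inv.EndDate
          AND i2.ABID    = inv.ABID
    )
""")

remaining = [r[0] for r in conn.execute("SELECT InvID FROM inv ORDER BY InvID")]
print(remaining)  # [102, 103]
```

Because the comparison is on a unique key, exactly one row per group has no higher twin, so one row per group is always kept.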
I have faced similar problems with duplicate data. Someone told me to use PARTITION BY and other methods, but those caused performance issues.
However, I had a primary key in my table, through which I was able to select one row from each set of duplicates and then delete it.
For example, in the first select statement, "minutes" and "ABID" are the criteria that define a duplicate, but "InvID" can be used to distinguish between the duplicate rows.
So you can use the query below to remove the duplicates.
delete from inv_i where inv_id in (select max(inv_id) from inv_i group by minutes,abid having count(*) > 1 );
This simple concept was helpful to me. It can be helpful in your case if "Inv_id" is unique.
;WITH CTE AS
(
SELECT InvID
,[UID]
,StartDate
,EndDate
,[Minutes]
,ROW_NUMBER() OVER (PARTITION BY InvID, [UID] ORDER BY [Minutes] ASC) rn
FROM InvoiceLines_v
)
SELECT *
FROM CTE
WHERE rn = 1
Replace the ORIGINAL_TABLE with your table name.
QUERY 1:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY minutes, ABID ORDER BY minutes, ABID) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
QUERY 2:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY UID ORDER BY minutes) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;

SQL - Returning CTE with Top 1

I am trying to return a set of results and decided to try my luck with CTE, the first table "Vendor", has a list of references, the second table "TVView", has ticket numbers that were created using a reference from the "Vendor" table. There may be one or more tickets using the same ticket number depending on the state of that ticket and I am wanting to return the last entry for each ticket found in "TVView" that matches a selected reference from "Vendor". Also, the "TVView" table has a seed field that is incremented.
I got this to return the right number of entries (meaning each duplicated ticket is shown only once), but I cannot figure out how to add an additional layer to go back through and select the last entry for that ticket and return some other fields. I can figure out how to sum, which is actually easy, but I really need the top 1 of each ticket entry in "TVView", regardless of whether it's a duplicate or not, while returning all references from "Vendor". It would be nice if SQL supported "Last".
How do you do that?
Here is what I have done so far:
with cteTickets as (
Select s.Mth2, c.Ticket, c.PyRt from Vendor s
Inner join
TVView c on c.Mth1 = s.Mth1 and c.Vendor = s.Vendor
)
Select Mth2, Ticket, PayRt from cteTickets
Where cteTickets.Vendor >='20'
and cteTickets.Vendor <='40'
and cteTickets.Mth2 ='8/15/2014'
Group by cteTickets.Ticket
order by cteTickets.Ticket
Several rdbms's that support Common Table Expressions (CTE) that I am aware of also support analytic functions, including the very useful ROW_NUMBER(), so the following should work in Oracle, TSQL (MSSQL/Sybase), DB2, PostgreSQL.
In the suggestions the intention is to return just the most recent entry for each ticket found in TVView. This is done by using ROW_NUMBER() which is PARTITIONED BY Ticket that instructs row_number to recommence numbering for each change of the Ticket value. The subsequent ORDER BY Mth1 DESC is used to determine which record within each partition is assigned 1, here it will be the most recent date.
The output of row_number() needs to be referenced by a column alias, so using it in a CTE or derived table permits selection of just the most recent records by RN = 1 which you will see used in both options below:
-- using a CTE
WITH
TVLatest
AS (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
)
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN TVLatest l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v.Vendor <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
v.Ticket
;
-- using a derived table instead
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
) l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v.Vendor <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
v.Ticket
;
Please note: "SELECT *" is a convenience, used here as an abbreviation because the full details are unknown. The queries above may not run without correctly specifying the field list (e.g. 'as is' they would fail in Oracle).
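The "RN = 1" idea on its own, without the join to Vendor, can be tried out with the sketch below. It runs through Python's sqlite3 module (window functions need SQLite 3.25 or later); the TVView columns, their types and the sample rows are guesses for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical shape of TVView; Seed is the incrementing field
    CREATE TABLE TVView (
        Seed   INTEGER PRIMARY KEY,
        Ticket TEXT,
        Mth1   TEXT,
        PayRt  REAL
    );
    INSERT INTO TVView (Ticket, Mth1, PayRt) VALUES
        ('T1', '2014-06-01', 10.0),
        ('T1', '2014-07-01', 12.0),
        ('T2', '2014-07-01', 20.0),
        ('T1', '2014-08-01', 15.0);
""")

# Number the rows of each ticket from newest to oldest, then keep number 1.
latest = conn.execute("""
    WITH TVLatest AS (
        SELECT Ticket, Mth1, PayRt,
               ROW_NUMBER() OVER (PARTITION BY Ticket
                                  ORDER BY Mth1 DESC) AS RN
        FROM TVView
    )
    SELECT Ticket, Mth1, PayRt
    FROM TVLatest
    WHERE RN = 1
    ORDER BY Ticket
""").fetchall()

print(latest)  # [('T1', '2014-08-01', 15.0), ('T2', '2014-07-01', 20.0)]
```

If two rows of a ticket can share the same Mth1, adding the incrementing seed field as a second ORDER BY key (descending) would make the choice of the "last" row deterministic.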

SQL UPDATE row Number

I have a table serviceClusters with a column identity(1590 values). Then I have another table serviceClustersNew with the columns ID, text and comment. In this table, I have some values for text and comment, the ID is always 1. Here an example for the table:
[1, dummy1, hello1;
1, dummy2, hello2;
1, dummy3, hello3;
etc.]
What I want now for the values in the column ID is the continuing index of the table serviceClusters plus the current row number. In our case, this would be 1591, 1592 and 1593.
I tried to solve the problem like this: first I updated the column ID with the maximum value, then I tried to add the row number, but this doesn't work:
-- Update ID to the maximum value 1590
UPDATE serviceClustersNew
SET ID = (SELECT MAX(ID) FROM serviceClusters);
-- This command returns the correct values 1591, 1592 and 1593
SELECT ID+ROW_NUMBER() OVER (ORDER BY Text_ID) AS RowNumber
FROM serviceClustersNew
-- But I'm not able to update the table with this command
UPDATE serviceClustersNew
SET ID = (SELECT ID+ROW_NUMBER() OVER (ORDER BY Text_ID) AS RowNumber FROM
serviceClustersNew)
By sending the last command, I get the error "Syntax error: Ordered Analytical Functions are not allowed in subqueries.". Do you have any suggestions, how I could solve the problem? I know I could do it with a volatile table or by adding a column, but is there a way without creating a new table / altering the current table?
You have to rewrite it using UPDATE FROM, the syntax is just a bit bulky:
UPDATE serviceClustersNew
FROM
(
SELECT text_id,
(SELECT MAX(ID) FROM serviceClusters) +
ROW_NUMBER() OVER (ORDER BY Text_ID) AS newID
FROM serviceClustersNew
) AS src
SET ID = newID
WHERE serviceClustersNew.Text_ID = src.Text_ID
You are not dealing with a lot of data, so a correlated subquery can serve the same purpose:
UPDATE serviceClustersNew
SET ID = (select max(ID) from serviceClusters) +
(select count(*)
from serviceClustersNew scn2
where scn2.Text_Id <= serviceClustersNew.Text_Id
)
This assumes that the text_id is unique along the rows.
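A small sketch of this correlated-subquery variant, run through Python's sqlite3 module rather than Teradata, with a shortened serviceClusters table (three rows ending at 1590 instead of 1590 rows). The column types are assumptions, and the maximum is taken from serviceClusters so that the result does not depend on an earlier UPDATE having been run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE serviceClusters (ID INTEGER PRIMARY KEY);
    CREATE TABLE serviceClustersNew (ID INTEGER, Text_ID TEXT, Comment TEXT);

    -- shortened stand-in: only the last few of the existing IDs
    INSERT INTO serviceClusters (ID) VALUES (1588), (1589), (1590);
    INSERT INTO serviceClustersNew VALUES
        (1, 'dummy1', 'hello1'),
        (1, 'dummy2', 'hello2'),
        (1, 'dummy3', 'hello3');
""")

# New ID = highest existing ID + rank of the row by Text_ID,
# where the rank is the number of rows with a Text_ID <= the current one.
conn.execute("""
    UPDATE serviceClustersNew
    SET ID = (SELECT max(ID) FROM serviceClusters)
           + (SELECT count(*)
              FROM serviceClustersNew scn2
              WHERE scn2.Text_ID <= serviceClustersNew.Text_ID)
""")

new_ids = [r[0] for r in conn.execute(
    "SELECT ID FROM serviceClustersNew ORDER BY Text_ID")]
print(new_ids)  # [1591, 1592, 1593]
```

If Text_ID were not unique, rows sharing a value would receive the same count and therefore the same ID.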
Apparently you can update a base table through a CTE... had no idea. So, just change your last UPDATE statement to this, and you should be good. Just be sure to include any fields in the CTE that you desire to update.
;WITH cte_TEST AS
( SELECT
ID,
ID+ROW_NUMBER() OVER (ORDER BY TEXT_ID) AS RowNumber FROM serviceClustersNew)
UPDATE cte_TEST
SET cte_TEST.ID = cte_TEST.RowNumber
Source:
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/ee06f451-c418-4bca-8288-010410e8cf14/update-table-using-rownumber-over