CTE and Update needs to be joined - sql

I am fairly new to CTEs but I didnt think what I was trying to do was terribly complicated. Essentially, I want to update a date field with another date field (plus 1 day) and so I thought I would just use a CTE so I could use the temporary table in my update statement.
However, I keep getting the error message Invalid column name 'effdt_end_new'. I created the column name in the CTE, when I do a full reference it says that it cannot bind to it. Any ideas?
.
The SQL code is
WITH GRAB_END_DATE AS
(
SELECT emplid, LEAD(DATEADD(day,1,effdt),1) OVER(PARTITION BY emplid ORDER BY effdt) AS effdt_end_new
FROM [dw].[stage_dim_wd_staff_1]
WHERE emplid = '040089671'
)
UPDATE [dw].[stage_dim_wd_staff_1]
SET effdt_end = effdt_end_new
WHERE emplid = '040089671'`

You need to update the CTE, not the base table
WITH GRAB_END_DATE AS
(
SELECT
*,
LEAD(DATEADD(day, 1, st.effdt)) OVER (PARTITION BY st.emplid ORDER BY st.effdt) AS effdt_end_new
FROM dw.stage_dim_wd_staff_1 st
WHERE emplid = '040089671'
)
UPDATE GRAB_END_DATE
SET effdt_end = effdt_end_new;
Note that there is no join here, the CTE is updated directly, and this feeds through to the base table of the CTE.
You can also do this using a derived table or a view, it's the same thing.
UPDATE GRAB_END_DATE
SET effdt_end = effdt_end_new
FROM (
SELECT
*,
LEAD(DATEADD(day, 1, st.effdt)) OVER (PARTITION BY st.emplid ORDER BY st.effdt) AS effdt_end_new
FROM dw.stage_dim_wd_staff_1 st
WHERE emplid = '040089671'
) GRAB_END_DATE;

Related

Big query De-duplication query is not working properly

anyone please tell me the below query is not working properly, It suppose to delete the duplicate records only and keep the one of them (latest record) but it is deleting all the record instead of keeping one of the duplicate records, why is it so?
delete
from
dev_rahul.page_content_insights
where
(sha_id,
etl_start_utc_dttm) in (
select
(a.sha_id,
a.etl_start_utc_dttm)
from
(
select
sha_id,
etl_start_utc_dttm,
ROW_NUMBER() over (Partition by sha_id
order by
etl_start_utc_dttm desc) as rn
from
dev_rahul.page_content_insights
where
(snapshot_dt) >= '2021-03-25' ) a
where
a.rn <> 1)
Query looks ok, though I don't use that syntax for cleaning up duplicates.
Can I confirm the following:
sha_id, etl_start_utc_dttm is your primary key?
You wish to keep sha_id and the latest row based on etl_start_utc_dttm field descending?
If so, try this two query pattern:
create or replace table dev_rahul.rows_not_to_delete as
SELECT col.* FROM (SELECT ARRAY_AGG(pci ORDER BY etl_start_utc_dttm desc LIMIT 1
) OFFSET(0)] col
FROM dev_rahul.page_content_insights pci
where snapshot_dt >= '2021-03-25' )
GROUP BY sha_id
);
delete dev_rahul.page_content_insights p
where not exists (select 1 from DW_pmo.rows_not_to_delete d
where p.sha_id = d.sha_id and p.etl_start_utc_dttm = d.etl_start_utc_dttm
) and snapshot_dt >= '2021-03-25';
You could do this in a singe query by putting the first statement into a CTE.

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID,
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem--they're both remitts for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
enter image description here
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id,
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that that that does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as
REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Remove duplicate row based on select statement

I have two select statements which is returning duplicated data. What I'm trying to accomplish is to remove a duplicated leg. But I'm having hard times to get to the second row programmatically.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes,i.ABID from inv_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Minutes = i2.Minutes
and i.Uid <> i2.Uid
and i.abid = i2.abid
order by i.EndDate
This select statement returns the following data.
As you can see it returns duplicate rows where minutes are the same ABID is the same but InvID are different. What I need to do is to remove one of the InvID where the criteria matches. Doesn't matter which one.
The second select statement is returning different data.
select i.InvID, i.UID, i.StartDate, i.EndDate, i.Minutes from InvoiceLines_v i, InvoiceLines_v i2 where
i.Period = '2014/08'
and i.EndDate = i2.EndDate
and i.Uid = i2.Uid
and i.Abid <> i2.Abid
and i.Language <> i2.Language
order by i.startdate desc
In this select statement I want to remove an InvID where UID is the same then select the lowest Mintues. In This case, I would remove the following InvIDs: 2537676 , 2537210
My goal is to remove those rows...
I could accomplish this using cursor grab the InvID and remove it by simple delete statement, but I'm trying to stay away from cursors.
Any suggestions on how I can accomplish this?
You can use exists to delete all duplicates except the one with the highest InvID by deleting those rows where another row exists with the same values but with a higher InvID
delete from inv_v
where exists (
select 1 from inv_v i2
where i2.InvID > inv_v.InvID
and i2.minutes = inv_v.minutes
and i2.EndDate = inv_v.EndDate
and i2.abid = inv_v.abid
and i2.uid <> inv_v.uid -- not sure why <> is used here, copied from question
)
I have faced similar problems regarding duplicate data and some one told me to use partition by and other methods but those were causing performance issues
However , I had a primary key in my table through which I was able to select one row from the duplicate data and then delete it.
For example in the first select statement "minutes" and "ABID" are the criteria to consider duplicacy in data.But "Invid" can be used to distinguish between the duplicate rows.
So you can use below query to remove duplicacy.
delete from inv_i where inv_id in (select max(inv_id) from inv_i group by minutes,abid having count(*) > 1 );
This simple concept was helpful to me. It can be helpful in your case if "Inv_id" is unique.
;WITH CTE AS
(
SELECT InvID
,[UID]
,StartDate
,EndDate
,[Minutes]
,ROW_NUMBER() OVER (PARTITION BY InvID, [UID] ORDER BY [Minutes] ASC) rn
FROM InvoiceLines_v
)
SELECT *
FROM CTE
WHERE rn = 1
Replace the ORIGINAL_TABLE with your table name.
QUERY 1:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY minutes, ABID ORDER BY minutes, ABID) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;
QUERY 2:
WITH DUP_TABLE AS
(
SELECT ROW_NUMBER()
OVER (PARTITION BY UID ORDER BY minutes) As ROW_NO
FROM <ORIGINAL_TABLE>
)
DELETE FROM DUP_TABLE WHERE ROW_NO > 1;

SQL - Returning CTE with Top 1

I am trying to return a set of results and decided to try my luck with CTE, the first table "Vendor", has a list of references, the second table "TVView", has ticket numbers that were created using a reference from the "Vendor" table. There may be one or more tickets using the same ticket number depending on the state of that ticket and I am wanting to return the last entry for each ticket found in "TVView" that matches a selected reference from "Vendor". Also, the "TVView" table has a seed field that is incremented.
I got this to return the right amount of entries (meaning not showing the duplicate tickets but only once) but I cannot figure out how to add an additional layer to go back through and select the last entry for that ticket and return some other fields. I can figure out how to sum which is actually easy, but I really need the Top 1 of each ticket entry in "TVView" regardless if its a duplicate or not while returning all references from "Vendor". Would be nice if SQL supported "Last"
How do you do that?
Here is what I have done so far:
with cteTickets as (
Select s.Mth2, c.Ticket, c.PyRt from Vendor s
Inner join
TVView c on c.Mth1 = s.Mth1 and c.Vendor = s.Vendor
)
Select Mth2, Ticket, PayRt from cteTickets
Where cteTickets.Vendor >='20'
and cteTickets.Vendor <='40'
and cteTickets.Mth2 ='8/15/2014'
Group by cteTickets.Ticket
order by cteTickets.Ticket
Several rdbms's that support Common Table Expressions (CTE) that I am aware of also support analytic functions, including the very useful ROW_NUMBER(), so the following should work in Oracle, TSQL (MSSQL/Sybase), DB2, PostgreSQL.
In the suggestions the intention is to return just the most recent entry for each ticket found in TVView. This is done by using ROW_NUMBER() which is PARTITIONED BY Ticket that instructs row_number to recommence numbering for each change of the Ticket value. The subsequent ORDER BY Mth1 DESC is used to determine which record within each partition is assigned 1, here it will be the most recent date.
The output of row_number() needs to be referenced by a column alias, so using it in a CTE or derived table permits selection of just the most recent records by RN = 1 which you will see used in both options below:
-- using a CTE
WITH
TVLatest
AS (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
)
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN TVLatest l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
v.Ticket
;
-- using a derived table instead
SELECT
Mth2
, Ticket
, PayRt
FROM Vendor v
INNER JOIN (
SELECT
* -- specify the fields
, ROW_NUMBER() OVER (PARTITION BY Ticket
ORDER BY Mth1 DESC) AS RN
FROM TVView
) TVLatest l ON v.Mth1 = l.Mth1
AND v.Vendor = l.Vendor
AND l.RN = 1
WHERE v.Vendor >= '20'
AND v <= '40'
AND v.Mth2 = '2014-08-15'
ORDER BY
v.Ticket
;
please note: "SELECT *" is a convenience or used as an abbreviation if full details are unknown. The queries above may not operate without correctly specifying the field list (eg. 'as is' they would fail in Oracle).

Update based on subquery fails

I am trying to do the following update in Oracle 10gR2:
update
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
set t.port_seq = t.new_seq
Voyage_port_id is the primary key, voyage_id is a foreign key. I'm trying to assign a sequence number based on the dates within each voyage.
However, the above fails with ORA-01732: data manipulation operation not legal on this view
What is the problem and how can I avoid it ?
Since you can't update subqueries with row_number, you'll have to calculate the row number in the set part of the update. At first I tried this:
update voyage_port a
set a.port_seq = (
select
row_number() over (partition by voyage_id order by arrival_date)
from voyage_port b
where b.voyage_port_id = a.voyage_port_id
)
But that doesn't work, because the subquery only selects one row, and then the row_number() is always 1. Using another subquery allows a meaningful result:
update voyage_port a
set a.port_seq = (
select c.rn
from (
select
voyage_port_id
, row_number() over (partition by voyage_id
order by arrival_date) as rn
from voyage_port b
) c
where c.voyage_port_id = a.voyage_port_id
)
It works, but more complex than I'd expect for this task.
You can update some views, but there are restrictions and one is that the view must not contain analytic functions. See SQL Language Reference on UPDATE and search for first occurence of "analytic".
This will work, provided no voyage visits more than one port on the same day (or the dates include a time component that makes them unique):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and vp2.arrival_date <= vp.arrival_date
)
I think this handles the case where a voyage visits more than 1 port per day and there is no time component (though the sequence of ports visited on the same day is then arbitrary):
update voyage_port vp
set vp.port_seq =
( select count(*)
from voyage_port vp2
where vp2.voyage_id = vp.voyage_id
and (vp2.arrival_date <= vp.arrival_date)
or ( vp2.arrival_date = vp.arrival_date
and vp2.voyage_port_id <= vp.voyage_port_id
)
)
Don't think you can update a derived table, I'd rewrite as:
update voyage_port
set port_seq = t.new_seq
from
voyage_port p
inner join
(select voyage_port_id, voyage_id, arrival_date, port_seq,
row_number() over (partition by voyage_id order by arrival_date) as new_seq
from voyage_port) t
on p.voyage_port_id = t.voyage_port_id
The first token after the UPDATE should be the name of the table to update, then your columns-to-update. I'm not sure what you are trying to achieve with the select statement where it is, but you can' update the result set from the select legally.
A version of the sql, guessing what you have in mind, might look like...
update voyage_port t
set t.port_seq = (<select statement that generates new value of port_seq>)
NOTE: to use a select statement to set a value like this you must make sure only 1 row will be returned from the select !
EDIT : modified statement above to reflect what I was trying to explain. The question has been answered very nicely by Andomar above