Update with row_number() not working, why?

I have the following table:
CREATE TABLE t_overview
(
obj_uid uuid,
obj_parent_uid uuid,
obj_no integer,
obj_text text,
obj_path text,
isdir integer,
intid bigint,
intparentid bigint
);
I want to move from uuid to bigint, so I created the new columns intid and intparentid. I need a unique integer for intid (obj_uid is the primary key), so I simply wanted to update it with row_number() over (order by ...).
That did not seem to work. So I tried writing the results into a temp table and updating via a join, but I got 1 for every intid.
Yet when I select from the same join I use in the update, I get 1, 2, 3, 4, 5, 6 etc. What am I missing?
DROP TABLE IF EXISTS mytable;
CREATE TEMP TABLE mytable AS
WITH CTE AS
(
SELECT obj_uid, obj_parent_uid, obj_no
, obj_text, obj_path, isdir
, intid as cteIntId
, intparentid as cteParentId
, row_number() over (order by obj_uid) as rn
FROM T_Overview
)
SELECT * FROM CTE;
UPDATE T_Overview SET intid = mytable.rn
FROM T_Overview AS bt
INNER JOIN mytable
ON mytable.obj_uid = bt.obj_uid
-- UPDATE T_Overview SET intid = CTE.rn FROM CTE;
-- UPDATE T_Overview SET intparentid = CTE.intid FROM CTE;

@Frank already provided an explanation for your error.
But you don't need the temporary table at all:
BEGIN;
LOCK T_Overview; -- if there is concurrent write access
WITH cte AS (
SELECT obj_uid, obj_parent_uid
, row_number() OVER (ORDER BY obj_uid) AS intid
FROM T_Overview
)
UPDATE T_Overview t
SET intid = upd.intid
, intparentid = upd.intparentid
FROM (
SELECT t1.*, t2.intid AS intparentid
FROM cte t1
LEFT JOIN cte t2 ON t2.obj_uid = t1.obj_parent_uid
) upd
WHERE t.obj_uid = upd.obj_uid;
COMMIT;
The transaction wrapper and the explicit lock are only needed if there can be concurrent write access. (Even more so with a temp table, where the gap between reading and writing is much bigger.)
This assumes referential integrity, i.e. a FK constraint from T_Overview.obj_parent_uid to T_Overview.obj_uid. NULL values in obj_parent_uid are translated to NULL in intparentid.
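A minimal runnable sketch of the same idea (number every row once, then look up both the row's own number and its parent's number from that numbering), assuming SQLite via Python's sqlite3 as a stand-in for PostgreSQL. The uuids are plain TEXT and the four sample rows are made up. Because older SQLite builds lack UPDATE ... FROM, the join is rewritten as correlated subqueries; this is a swapped-in form, not the answer's exact statement.

```python
import sqlite3

# SQLite stands in for PostgreSQL; uuid values are plain TEXT and the
# sample rows are hypothetical. Window functions need SQLite 3.25 or later.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t_overview (
    obj_uid        TEXT PRIMARY KEY,
    obj_parent_uid TEXT REFERENCES t_overview (obj_uid),
    intid          INTEGER,
    intparentid    INTEGER
);
INSERT INTO t_overview (obj_uid, obj_parent_uid) VALUES
    ('a', NULL), ('b', 'a'), ('c', 'a'), ('d', 'c');
""")

# Number every row once in the CTE, then fetch each row's own number and its
# parent's number from that same numbering. A row without a parent gets NULL,
# which matches the LEFT JOIN in the answer.
con.execute("""
WITH cte AS (
    SELECT obj_uid, obj_parent_uid,
           row_number() OVER (ORDER BY obj_uid) AS intid
    FROM t_overview
)
UPDATE t_overview
SET intid       = (SELECT c.intid FROM cte AS c
                   WHERE c.obj_uid = t_overview.obj_uid),
    intparentid = (SELECT p.intid FROM cte AS c
                   JOIN cte AS p ON p.obj_uid = c.obj_parent_uid
                   WHERE c.obj_uid = t_overview.obj_uid)
""")
con.commit()

rows = con.execute(
    "SELECT obj_uid, intid, intparentid FROM t_overview ORDER BY obj_uid"
).fetchall()
print(rows)  # [('a', 1, None), ('b', 2, 1), ('c', 3, 1), ('d', 4, 3)]
```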

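Your update is wrong: there is no relation between the target T_Overview and the FROM list (T_Overview AS bt joined to mytable). Every target row is therefore paired with every joined row, and only one of those joined rows, not predictably which, is used to update each target row. That is why you see the same value everywhere.
This one should work: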
UPDATE T_Overview SET intid = mytable.rn
FROM mytable
WHERE mytable.obj_uid = T_Overview.obj_uid;
Off-topic: your CTE doesn't add anything; a plain SELECT would give you the same result.
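A small runnable sketch of the corrected pattern, assuming SQLite via Python's sqlite3 (version 3.33 or later, which added UPDATE ... FROM) in place of PostgreSQL, with three made-up rows. The point it shows: the FROM list holds only mytable, and the WHERE clause ties each target row to exactly one row of mytable.

```python
import sqlite3

# SQLite 3.33+ stands in for PostgreSQL; the sample rows are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE t_overview (obj_uid TEXT PRIMARY KEY, intid INTEGER);
INSERT INTO t_overview (obj_uid) VALUES ('x'), ('y'), ('z');

-- Step 1: materialise the numbering in a temp table, as in the question.
CREATE TEMP TABLE mytable AS
SELECT obj_uid, row_number() OVER (ORDER BY obj_uid) AS rn
FROM t_overview;

-- Step 2: the fix. Only mytable appears in FROM, and WHERE correlates it
-- with the target row, so each target row matches exactly one rn.
UPDATE t_overview SET intid = mytable.rn
FROM mytable
WHERE mytable.obj_uid = t_overview.obj_uid;
""")

rows = con.execute("SELECT obj_uid, intid FROM t_overview ORDER BY obj_uid").fetchall()
print(rows)  # [('x', 1), ('y', 2), ('z', 3)]
```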

Related

How to update statement in snowflake using with CTE (WITH CTE- common_table_expression)?

I am trying to update one field in the final table. I have the logic below: I have to update last_version_flag when a certain condition is met. How do I use the WITH CTE concept in Snowflake? Thank you in advance.
update dw.tb_fidctp_order
set last_version_flag = 'N'
from dw.tb_fidctp_order
where (with my_cte as (
select order_id, MAX(cast(VERSION as NUMBER(18,0))) as max_version
from stg.tb_fidctp_order_input
group by order_id))
DW.tb_fidctp_order.order_id = my_cte.order_id and DW.tb_fidctp_order.version < my_cte.max_version

The correct syntax is:
UPDATE target_table
SET col_name = value
FROM additional_tables
WHERE condition
For example:
CREATE TABLE tb_fidctp_order(version INT, order_id INT, last_version_flag TEXT);
CREATE TABLE tb_fidctp_order_input(version INT,order_id INT, last_version_flag TEXT);
update tb_fidctp_order
set last_version_flag = 'N'
from (
WITH cte AS (
select order_id, MAX(cast(VERSION as NUMBER(18,0))) as max_version
from tb_fidctp_order_input
group by order_id
)
SELECT * FROM cte
) AS my_cte
where tb_fidctp_order.order_id = my_cte.order_id
and tb_fidctp_order.version < my_cte.max_version;
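A runnable sketch of this shape (the CTE lives inside a derived table in FROM, not in WHERE), assuming SQLite 3.33 or later via Python's sqlite3 as a stand-in for Snowflake. NUMBER(18,0) is replaced by SQLite's INTEGER, and the sample rows are made up.

```python
import sqlite3

# SQLite 3.33+ stands in for Snowflake; the sample rows are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tb_fidctp_order (version INT, order_id INT, last_version_flag TEXT);
CREATE TABLE tb_fidctp_order_input (version INT, order_id INT, last_version_flag TEXT);
INSERT INTO tb_fidctp_order VALUES (1, 10, 'Y'), (2, 10, 'Y'), (1, 20, 'Y');
INSERT INTO tb_fidctp_order_input VALUES (1, 10, NULL), (2, 10, NULL), (1, 20, NULL);

-- Flag every row whose version is below the latest version of its order.
UPDATE tb_fidctp_order
SET last_version_flag = 'N'
FROM (
    WITH cte AS (
        SELECT order_id, MAX(CAST(version AS INTEGER)) AS max_version
        FROM tb_fidctp_order_input
        GROUP BY order_id
    )
    SELECT * FROM cte
) AS my_cte
WHERE tb_fidctp_order.order_id = my_cte.order_id
  AND tb_fidctp_order.version < my_cte.max_version;
""")

rows = con.execute(
    "SELECT order_id, version, last_version_flag FROM tb_fidctp_order "
    "ORDER BY order_id, version"
).fetchall()
print(rows)  # [(10, 1, 'N'), (10, 2, 'Y'), (20, 1, 'Y')]
```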

How do I return the count for both matching column in a single return

I'm trying to count the rows that already exist and the rows that don't, to validate my data before inserting it. As shown in the image below, I'm inserting 3 rows: 1 row has the same ID number as a row in the destination table, 1 row has the same quotation number as a row in the destination table, and the last row is a new entry.
Okay, here's my requirement. I am sending my source table data to be inserted into the dest table. Before inserting, I want to validate it by mapping it against the entire dest table, like this:
SELECT COUNT(*) FROM sourceTable WHERE exists(
SELECT * FROM sourceTable WHERE QuotationId IN
(SELECT A.QuotationId FROM sourceTable A
JOIN DestTable B ON A.QuotationId = B.QuotationId
JOIN DestTable C ON A.QuotationNum = C.QuotationNum))

Without further details on table structure and so on it's difficult to tell, but something like the following might do the trick:
WITH cteMatch AS(
SELECT s.QuotationID AS src, d.QuotationID as dst
FROM sourceTable s
LEFT JOIN destTable d ON d.QuotationID = s.QuotationID
)
SELECT CASE WHEN dst IS NULL THEN N'NonExist' ELSE N'Exist' END AS ValExist, COUNT(*) cnt
FROM cteMatch
GROUP BY CASE WHEN dst IS NULL THEN N'NonExist' ELSE N'Exist' END
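A runnable sketch of this grouped count, assuming SQLite via Python's sqlite3 and made-up rows modelled on the question (one source row shares an ID with the destination, one shares a quotation number, one is new). The join is spelled out as ON d.QuotationID = s.QuotationID, which is an assumption about the intended key; the T-SQL N'...' prefixes are dropped because SQLite does not use them.

```python
import sqlite3

# Hypothetical data: same ID, same quotation number, and a new entry.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE destTable (QuotationID INTEGER, QuotationNum TEXT);
CREATE TABLE sourceTable (QuotationID INTEGER, QuotationNum TEXT);
INSERT INTO destTable VALUES (1, 'Q-100'), (2, 'Q-200');
INSERT INTO sourceTable VALUES (1, 'Q-900'), (3, 'Q-200'), (4, 'Q-400');
""")

result = con.execute("""
WITH cteMatch AS (
    SELECT s.QuotationID AS src, d.QuotationID AS dst
    FROM sourceTable s
    LEFT JOIN destTable d ON d.QuotationID = s.QuotationID
)
SELECT CASE WHEN dst IS NULL THEN 'NonExist' ELSE 'Exist' END AS ValExist,
       COUNT(*) AS cnt
FROM cteMatch
GROUP BY CASE WHEN dst IS NULL THEN 'NonExist' ELSE 'Exist' END
ORDER BY ValExist
""").fetchall()
print(result)  # [('Exist', 1), ('NonExist', 2)]
```

Because this join only compares QuotationID, the row that matches on quotation number alone is counted as NonExist; matching on either column needs the OR condition used in the next answer.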

You seem to want:
select count(*)
from sourcetable s
where exists(
select 1
from desttable d
where d.quotationid = s.quotationid or d.quotationnum = s.quotationnum
)
This counts how many rows in the source table have a quotation id or num that exists in the target table. If you want the count of both existing and non-existing rows, I would recommend:
select sum(flag) as cnt_exists, sum(1 - flag) as cnt_not_exists
from (
select
case when exists (
select 1
from desttable d
where d.quotationid = s.quotationid or d.quotationnum = s.quotationnum
) then 1 else 0 end as flag
from sourcetable s
) t
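A runnable sketch of the flag-and-sum query, assuming SQLite via Python's sqlite3 and the same kind of made-up rows as in the question (one ID match, one quotation-number match, one new entry). Since the EXISTS test accepts a match on either column, two rows count as existing.

```python
import sqlite3

# Hypothetical data: same ID, same quotation number, and a new entry.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE desttable (quotationid INTEGER, quotationnum TEXT);
CREATE TABLE sourcetable (quotationid INTEGER, quotationnum TEXT);
INSERT INTO desttable VALUES (1, 'Q-100'), (2, 'Q-200');
INSERT INTO sourcetable VALUES (1, 'Q-900'), (3, 'Q-200'), (4, 'Q-400');
""")

# One 0/1 flag per source row, then sum the flags and their complements.
counts = con.execute("""
SELECT SUM(flag) AS cnt_exists, SUM(1 - flag) AS cnt_not_exists
FROM (
    SELECT CASE WHEN EXISTS (
               SELECT 1
               FROM desttable d
               WHERE d.quotationid = s.quotationid
                  OR d.quotationnum = s.quotationnum
           ) THEN 1 ELSE 0 END AS flag
    FROM sourcetable s
) t
""").fetchone()
print(counts)  # (2, 1)
```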

Stored procedure order of execution

CREATE PROCEDURE [dbo].[uspGetLogs]
(
@StartDate DATETIME,
@EndDate DATETIME
)
AS
SELECT
sl.ID,
LOG10(sl.Value)
FROM
dbo.SampleList sl
INNER JOIN
(
SELECT
ID,
RANK() OVER(PARTITION BY Codec ORDER BY TimeStampUTC DESC, ID DESC) ranked
FROM
dbo.SampleList
WHERE
ListDate BETWEEN @StartDate AND @EndDate
) r
ON
r.ID = sl.ID AND
r.ranked = 1
I tried this stored procedure with @StartDate = 2014-01-29 and @EndDate = 2015-03-14
and got this error:
An invalid floating point operation occurred
This error is raised when a mathematical function receives an invalid argument. For example, both of these statements raise it:
SELECT LOG10(-3);
SELECT LOG10(0);
I was able to find a single value in the whole table that is less than one. But its ListDate is 2015-03-14, so it should not be included, because it is not covered by the date range passed to the stored procedure.
So it seems that the stored procedure evaluates the function over the whole set before joining and filtering the dataset by date range.
Is this expected?

I think there could be a data issue here, as the basic logic of your code doesn't cause a problem. See the sample below:
CREATE TABLE #temp1 ( id INT, val INT )
CREATE TABLE #temp2 ( id INT, val INT )
INSERT INTO #temp1 ( id, val )
VALUES ( 1, 1 ), ( 2, 10 ), ( 3, -1 ) -- Negative value for id=3 excluded in subquery
INSERT INTO #temp2 ( id, val )
VALUES ( 1, 1 ), ( 2, 10 ), ( 3, 20 )
SELECT t1.id ,
LOG10(t1.val) AS Val
FROM #temp1 t1
INNER JOIN ( SELECT * ,
RANK() OVER ( PARTITION BY id ORDER BY val ) ranked
FROM #temp2
WHERE id BETWEEN 1 AND 2 -- excludes id 3
) t2 ON t2.id = t1.id
AND t2.ranked = 1
DROP TABLE #temp1
DROP TABLE #temp2
Produces:
id val
1 0
2 1
If you modify the BETWEEN clause to WHERE id BETWEEN 1 AND 3, you do see the error as the negative value is included.
So I'd triple check the data and if there's still an issue, try to post a small sample that recreates the issue.

I don't think there's a guaranteed point at which the function gets executed, so my answer is: it depends on the query plan, i.e. on how "early" the optimizer decides to evaluate the function (before or after the join). To make sure the function is only evaluated for valid (positive) values, you can change it to:
CASE WHEN sl.Value <= 0 THEN NULL ELSE LOG10(sl.Value) END
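A small runnable sketch of the CASE-guard idea, assuming SQLite via Python's sqlite3. SQLite does not raise on invalid logarithms by itself, so the sketch registers a hypothetical STRICT_LOG10 function that raises like SQL Server's LOG10 does; the table and rows are made up. It maps non-positive values to NULL.

```python
import math
import sqlite3

def strict_log10(x):
    # Hypothetical helper: raise on non-positive input, mimicking SQL Server.
    if x is None:
        return None
    if x <= 0:
        raise ValueError("invalid floating point operation")
    return math.log10(x)

con = sqlite3.connect(":memory:")
con.create_function("STRICT_LOG10", 1, strict_log10)
con.executescript("""
CREATE TABLE sample_list (id INTEGER, value REAL);
INSERT INTO sample_list VALUES (1, 1), (2, 10), (3, -1), (4, 0);
""")

# Unguarded: the invalid rows reach the function and the query fails.
try:
    con.execute("SELECT id, STRICT_LOG10(value) FROM sample_list").fetchall()
    unguarded_failed = False
except sqlite3.OperationalError:  # raised when a user function throws
    unguarded_failed = True

# Guarded: CASE only calls the function for positive values.
guarded = con.execute("""
    SELECT id, CASE WHEN value <= 0 THEN NULL ELSE STRICT_LOG10(value) END
    FROM sample_list
    ORDER BY id
""").fetchall()
print(unguarded_failed, guarded)  # True [(1, 0.0), (2, 1.0), (3, None), (4, None)]
```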

Delete older from a duplicate select

I have been working on a query to search and delete duplicate column values. Currently I have this query (returns duplicates):
SELECT NUIP, FECHA_REGISTRO
FROM registros_civiles_nacimiento
WHERE NUIP IN (
SELECT NUIP
FROM registros_civiles_nacimiento
GROUP BY NUIP
HAVING (COUNT(NUIP) > 1)
) order by NUIP
This works, returning a table like this:
NUIP FECHA_REGISTRO
38120100138 1975-05-30
38120100138 1977-08-31
40051800275 1980-09-24
40051800275 1999-11-29
42110700118 1972-10-26
42110700118 1982-04-22
44030700535 1982-10-19
44030700535 1993-05-05
46072300777 1991-01-17
46072300777 1979-03-30
I need to delete the rows with duplicate NUIP values, specifically the row with the oldest date. For example, for the result above, this is the list that must be kept once the query has run:
NUIP FECHA_REGISTRO
38120100138 1977-08-31
40051800275 1999-11-29
42110700118 1982-04-22
44030700535 1993-05-05
46072300777 1991-01-17
How can I do this using plain SQL?

--PULL YOUR SELECT OF RECS WITH DUPES INTO A TEMP TABLE
--(OR CREATE A NEW TABLE SO THAT YOU CAN KEEP THEM AROUND FOR LATER, JUST IN CASE)
SELECT NUIP,FECHA_REGISTRO
INTO #NUIP
FROM registros_civiles_nacimiento
WHERE NUIP IN (
SELECT NUIP
FROM registros_civiles_nacimiento
GROUP BY NUIP
HAVING (COUNT(NUIP) > 1)
)
--CREATE FLAG FOR DETERMINING DUPES
ALTER TABLE #NUIP ADD DUPLICATETOREMOVE bit
--USE `RANK()` TO SET FLAG (RANK 1 = OLDEST FECHA_REGISTRO PER NUIP)
UPDATE A
SET DUPLICATETOREMOVE = CASE X.RANK
WHEN 1 THEN 1
ELSE 0
END
--SELECT *
FROM #NUIP A
INNER JOIN (SELECT NUIP,FECHA_REGISTRO,RANK() OVER (PARTITION BY [NUIP] ORDER BY FECHA_REGISTRO ASC) AS RANK
FROM #NUIP) X ON X.NUIP = A.NUIP AND X.FECHA_REGISTRO = A.FECHA_REGISTRO
--HERE IS YOUR DELETE LIST
SELECT *
FROM registros_civiles_nacimiento R
JOIN #NUIP N ON N.NUIP = R.NUIP AND N.FECHA_REGISTRO = R.FECHA_REGISTRO
WHERE N.DUPLICATETOREMOVE = 1
--HERE IS YOUR KEEP LIST
SELECT *
FROM registros_civiles_nacimiento R
JOIN #NUIP N ON N.NUIP = R.NUIP AND N.FECHA_REGISTRO = R.FECHA_REGISTRO
WHERE N.DUPLICATETOREMOVE = 0
--ZAP THEM AND COMMIT YOUR TRANSACTION; YOU'VE STILL GOT A RECORD OF THE DELETED ROWS FOR AS LONG AS #NUIP IS IN SCOPE
BEGIN TRAN --COMMIT --ROLLBACK
DELETE R
FROM registros_civiles_nacimiento R
JOIN #NUIP N ON N.NUIP = R.NUIP AND N.FECHA_REGISTRO = R.FECHA_REGISTRO
WHERE N.DUPLICATETOREMOVE = 1

You can use analytical functions for this:
;WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY NUIP ORDER BY FECHA_REGISTRO DESC) RN
FROM registros_civiles_nacimiento
)
DELETE FROM CTE
WHERE RN > 1;
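A runnable sketch of the same ROW_NUMBER technique on the question's sample rows, assuming SQLite via Python's sqlite3. Deleting through a CTE as above is SQL Server specific, so this sketch instead deletes by SQLite's rowid: within each NUIP, rows numbered 2 or higher (ordered newest first) are removed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE registros_civiles_nacimiento (NUIP TEXT, FECHA_REGISTRO TEXT);
INSERT INTO registros_civiles_nacimiento VALUES
  ('38120100138', '1975-05-30'), ('38120100138', '1977-08-31'),
  ('40051800275', '1980-09-24'), ('40051800275', '1999-11-29'),
  ('42110700118', '1972-10-26'), ('42110700118', '1982-04-22'),
  ('44030700535', '1982-10-19'), ('44030700535', '1993-05-05'),
  ('46072300777', '1991-01-17'), ('46072300777', '1979-03-30');

-- Newest row per NUIP gets rn = 1; every older row is deleted by rowid.
WITH cte AS (
  SELECT rowid AS rid,
         ROW_NUMBER() OVER (PARTITION BY NUIP ORDER BY FECHA_REGISTRO DESC) AS rn
  FROM registros_civiles_nacimiento
)
DELETE FROM registros_civiles_nacimiento
WHERE rowid IN (SELECT rid FROM cte WHERE rn > 1);
""")

kept = con.execute(
    "SELECT NUIP, FECHA_REGISTRO FROM registros_civiles_nacimiento ORDER BY NUIP"
).fetchall()
print(kept)
```

ISO-formatted date strings sort chronologically, so ordering the TEXT column works here; with a real DATE column the ordering is the same.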

Use RANK() to create the result set ordered by date.
Use WHERE EXISTS to delete from the source.
(Note: if you run the rank function over just your duplicates, you should get your results. I've referred to the whole table below.)
This statement works in Oracle (replace the SELECT * with DELETE if it works for you):
SELECT *
FROM registros_civiles_nacimiento ALL_
WHERE EXISTS
(SELECT * FROM
(SELECT * FROM
(SELECT NUIP,
FECHA_REGISTRO,
RANK() OVER (PARTITION BY NUIP ORDER BY FECHA_REGISTRO) AS ORDER_
FROM registros_civiles_nacimiento)
WHERE ORDER_ = 1) OLDEST
WHERE ALL_.NUIP = OLDEST.NUIP
AND ALL_.FECHA_REGISTRO = OLDEST.FECHA_REGISTRO);
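A runnable sketch of the DELETE version of this EXISTS/RANK statement, assuming SQLite via Python's sqlite3 instead of Oracle, with a few of the question's rows plus one hypothetical non-duplicate row. Following the note about restricting the rank to duplicates, the sketch adds a COUNT(*) OVER filter (my addition, not part of the answer); without it, the oldest row of every NUIP, including single rows, would be deleted.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE registros_civiles_nacimiento (NUIP TEXT, FECHA_REGISTRO TEXT);
INSERT INTO registros_civiles_nacimiento VALUES
  ('38120100138', '1975-05-30'), ('38120100138', '1977-08-31'),
  ('46072300777', '1991-01-17'), ('46072300777', '1979-03-30'),
  ('99999999999', '2000-01-01');  -- hypothetical row without duplicates

-- Delete a row if it is the oldest (rank 1) of a NUIP that has copies.
DELETE FROM registros_civiles_nacimiento
WHERE EXISTS (
  SELECT 1
  FROM (SELECT NUIP, FECHA_REGISTRO,
               RANK() OVER (PARTITION BY NUIP ORDER BY FECHA_REGISTRO) AS order_,
               COUNT(*) OVER (PARTITION BY NUIP) AS copies
        FROM registros_civiles_nacimiento) oldest
  WHERE oldest.order_ = 1
    AND oldest.copies > 1
    AND oldest.NUIP = registros_civiles_nacimiento.NUIP
    AND oldest.FECHA_REGISTRO = registros_civiles_nacimiento.FECHA_REGISTRO
);
""")

remaining = con.execute(
    "SELECT NUIP, FECHA_REGISTRO FROM registros_civiles_nacimiento ORDER BY NUIP"
).fetchall()
print(remaining)
```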

SQL: How to update multiple fields so empty field content is moved to the logically last columns - lose blank address lines

I have three address line columns, aline1, aline2 and aline3, for a street address. As staged from inconsistent data, any or all of them can be blank. I want to move the first non-blank to addrline1, the 2nd non-blank to addrline2, and clear line 3 if there aren't three non-blank lines, else leave it. ("First" means aline1 is first unless it's blank, aline2 is first if aline1 is blank, and aline3 is first if aline1 and 2 are both blank.)
The rows in this staging table do not have a key and there could be duplicate rows. I could add a key.
Not counting a big case statement that enumerates the possible combinations of blank and non-blank and moves the fields around, how can I update the table? (This same problem comes up with a lot more than 3 lines, so that's why I don't want to use a case statement.)
I'm using Microsoft SQL Server 2008.

Another alternative. It uses the undocumented %%physloc%% function to work without a key. You would be much better off adding a key to the table.
CREATE TABLE #t
(
aline1 VARCHAR(100),
aline2 VARCHAR(100),
aline3 VARCHAR(100)
)
INSERT INTO #t VALUES(NULL, NULL, 'a1')
INSERT INTO #t VALUES('a2', NULL, 'b2')
;WITH cte
AS (SELECT *,
MAX(CASE WHEN RN=1 THEN value END) OVER (PARTITION BY %%physloc%%) AS new_aline1,
MAX(CASE WHEN RN=2 THEN value END) OVER (PARTITION BY %%physloc%%) AS new_aline2,
MAX(CASE WHEN RN=3 THEN value END) OVER (PARTITION BY %%physloc%%) AS new_aline3
FROM #t
OUTER APPLY (SELECT ROW_NUMBER() OVER (ORDER BY CASE WHEN value IS NULL THEN 1 ELSE 0 END, idx) AS
RN, idx, value
FROM (VALUES(1,aline1),
(2,aline2),
(3,aline3)) t (idx, value)) d)
UPDATE cte
SET aline1 = new_aline1,
aline2 = new_aline2,
aline3 = new_aline3
SELECT *
FROM #t
DROP TABLE #t

Here's an alternative.
Sample table for discussion; don't worry about the nonsensical data, the values just need to be null or not:
create table taddress (id int,a varchar(10),b varchar(10),c varchar(10));
insert taddress
select 1,1,2,3 union all
select 2,1, null, 3 union all
select 3,null, 1, 2 union all
select 4,null,null,2 union all
select 5,1, null, null union all
select 6,null, 4, null
The query, which really just normalizes the data:
;with tmp as (
select *, rn=ROW_NUMBER() over (partition by t.id order by sort)
from taddress t
outer apply
(
select 1, t.a where t.a is not null union all
select 2, t.b where t.b is not null union all
select 3, t.c where t.c is not null
--- EXPAND HERE
) u(sort, line)
)
select t0.id, t1.line, t2.line, t3.line
from taddress t0
left join tmp t1 on t1.id = t0.id and t1.rn=1
left join tmp t2 on t2.id = t0.id and t2.rn=2
left join tmp t3 on t3.id = t0.id and t3.rn=3
--- AND HERE
order by t0.id
EDIT: for the update back into the table:
;with tmp as (
select *, rn=ROW_NUMBER() over (partition by t.id order by sort)
from taddress t
outer apply
(
select 1, t.a where t.a is not null union all
select 2, t.b where t.b is not null union all
select 3, t.c where t.c is not null
--- EXPAND HERE
) u(sort, line)
)
UPDATE taddress
set a = t1.line,
b = t2.line,
c = t3.line
from taddress t0
left join tmp t1 on t1.id = t0.id and t1.rn=1
left join tmp t2 on t2.id = t0.id and t2.rn=2
left join tmp t3 on t3.id = t0.id and t3.rn=3
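A runnable sketch of the same three steps (unpivot the non-null lines, renumber them per id, write positions 1 to 3 back), assuming SQLite via Python's sqlite3 and the answer's sample data stored as text. SQLite has no OUTER APPLY, so the unpivot is written as UNION ALL branches over the table, and the write-back uses correlated subqueries in place of the three LEFT JOINs; this is an adaptation, not the answer's exact T-SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE taddress (id INTEGER, a TEXT, b TEXT, c TEXT);
INSERT INTO taddress VALUES
  (1, '1', '2', '3'), (2, '1', NULL, '3'), (3, NULL, '1', '2'),
  (4, NULL, NULL, '2'), (5, '1', NULL, NULL), (6, NULL, '4', NULL);

WITH tmp AS (
  -- Unpivot: one row per non-null line, remembering its original position.
  SELECT id, line,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY sort) AS rn
  FROM (
    SELECT id, 1 AS sort, a AS line FROM taddress WHERE a IS NOT NULL
    UNION ALL
    SELECT id, 2, b FROM taddress WHERE b IS NOT NULL
    UNION ALL
    SELECT id, 3, c FROM taddress WHERE c IS NOT NULL
  )
)
-- Pivot back: the nth surviving line goes to the nth column, else NULL.
UPDATE taddress
SET a = (SELECT line FROM tmp WHERE tmp.id = taddress.id AND tmp.rn = 1),
    b = (SELECT line FROM tmp WHERE tmp.id = taddress.id AND tmp.rn = 2),
    c = (SELECT line FROM tmp WHERE tmp.id = taddress.id AND tmp.rn = 3);
""")

rows = con.execute("SELECT id, a, b, c FROM taddress ORDER BY id").fetchall()
print(rows)
```

More columns only need another UNION ALL branch and another SET line, which is the same "EXPAND HERE" idea as in the T-SQL.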

Update: changed the statement to an UPDATE statement and removed the CASE statement solution.
With this solution, you will need a unique key in the staging table.
With Inputs As
(
Select PK, 1 As LineNum, aline1 As Value
From StagingTable
Where aline1 Is Not Null
Union All
Select PK, 2, aline2
From StagingTable
Where aline2 Is Not Null
Union All
Select PK, 3, aline3
From StagingTable
Where aline3 Is Not Null
)
, ResequencedInputs As
(
Select PK, Value
, Row_Number() Over( Partition By PK Order By LineNum ) As LineNum
From Inputs
)
, NewValues As
(
Select S.PK
, Min( Case When R.LineNum = 1 Then R.Value End ) As addrline1
, Min( Case When R.LineNum = 2 Then R.Value End ) As addrline2
, Min( Case When R.LineNum = 3 Then R.Value End ) As addrline3
From StagingTable As S
Left Join ResequencedInputs As R
On R.PK = S.PK
Group By S.PK
)
Update OtherTable
Set addrline1 = T2.addrline1
, addrline2 = T2.addrline2
, addrline3 = T2.addrline3
From OtherTable As T
Left Join NewValues As T2
On T2.PK = T.PK

R. A. Cyberkiwi, Thomas, and Martin, thanks very much; these were very generous responses from each of you. All of these answers were the type of spoonfeeding I was looking for. I'd say they all rely on a key-like device and work by dividing addresses into lines, some of which are empty and some of which aren't, and excluding the empties. In the case of address lines, in my opinion this is semantically a gimmick to make the problem fit what SQL does well, and it's not a natural way to conceptualize the problem. Address lines are not "really" separate rows in a table that just got denormalized for a report. But that's debatable, and whether you agree or not, I (a rank beginner) think each of your alternatives is an idiomatic solution worth elaborating on and studying.
I also get lots of similar cases where there really is normalization to be done, e.g. collatDesc1, collatCode1, collatLastAppraisal1, ... collatLastAppraisal5, with more complex criteria about what to exclude and how to order than with addresses, and I think techniques from your answers will be helpful.
%%physloc%% is fun; since I'm able to create a key in this case, I won't use it (as Martin advises). There were other things in Martin's answer I wasn't familiar with too, and I'm still turning them all over.
FWIW, here's the trigger I tried out; I don't know that I'll actually use it for the problem at hand. I think this qualifies as a "bubble sort", with the swapping expressed in a peculiar way.
create trigger fixit on lines
instead of insert as
declare @maybeBlank1 as varchar(max)
declare @maybeBlank2 as varchar(max)
declare @maybeBlank3 as varchar(max)
set @maybeBlank1 = (select line1 from inserted)
set @maybeBlank2 = (select line2 from inserted)
set @maybeBlank3 = (select line3 from inserted)
declare @counter int
set @counter = 0
while @counter < 3
begin
set @counter = @counter + 1
if @maybeBlank2 = ''
begin
set @maybeBlank2 = @maybeBlank3
set @maybeBlank3 = ''
end
if @maybeBlank1 = ''
begin
set @maybeBlank1 = @maybeBlank2
set @maybeBlank2 = ''
end
end
select * into #kludge from inserted
update #kludge
set line1 = @maybeBlank1,
line2 = @maybeBlank2,
line3 = @maybeBlank3
insert into lines
select * from #kludge

You could make an insert and update trigger that checks whether the fields are empty and then moves them.
