Netezza Update a Table Column by Joining to Another Table - sql

I am getting an error when running an update in Netezza, but I cannot figure out where the issue is. I would appreciate some help.
ERROR [42S02] ERROR: relation does not exist DEVML_WORK.AGRINSHPUN.A
update Table A
set A.COL1 = B.COL2
from A left outer join B
on A.CU_NUM=B.CU_NUM;

In general, correlated updates in Netezza perform slowly. Below are two examples that should get your query to work. In my experience, the second one speeds up large updates.
-- Slow but works
update Table A
set A.COL1 = B.COL2
from B
where A.CU_NUM=B.CU_NUM;
-- Faster
update A set col1 = sub.col2
from (select a.rowid as rown, b.COL2
from A a inner join
B b
on a.cu_num= b.cu_num) sub
where rowid = sub.rown;
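For anyone who wants to check the semantics without a Netezza box: here is a minimal sketch in Python with an in-memory SQLite database. It is a stand-in, not Netezza syntax, and the sample rows are made up. It uses a correlated-subquery UPDATE to show that a join-style update only touches rows of A that have a match in B, whereas the LEFT OUTER JOIN in the question would have set unmatched rows to NULL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Netezza; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (cu_num INTEGER, col1 TEXT);
    CREATE TABLE B (cu_num INTEGER, col2 TEXT);
    INSERT INTO A VALUES (1, 'old'), (2, 'old'), (3, 'old');
    INSERT INTO B VALUES (1, 'new-1'), (2, 'new-2');
""")

# Correlated update: the EXISTS guard restricts the update to rows of A
# that have a matching cu_num in B (inner-join behaviour).
conn.execute("""
    UPDATE A
    SET col1 = (SELECT b.col2 FROM B b WHERE b.cu_num = A.cu_num)
    WHERE EXISTS (SELECT 1 FROM B b WHERE b.cu_num = A.cu_num)
""")

rows = conn.execute("SELECT cu_num, col1 FROM A ORDER BY cu_num").fetchall()
print(rows)  # row 3 has no match in B and keeps 'old'
```

Dropping the EXISTS guard would overwrite the unmatched row with NULL, which is what a left-join style update amounts to.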

Related

How to solve ERROR: Target table must be part of an equijoin predicate in Redshift?

I want to update two columns of a Redshift table (table1) with columns from another Redshift table (table2) based on certain conditions. I get the equijoin predicate error when running the query below.
UPDATE schema1.table1
SET col1 = sub.col3,
col2 = sub.col4
FROM (
SELECT a.col1, a.col2, b.col3, b.col4 FROM schema1.table1 a LEFT JOIN schema2.table2 b
ON ((a.col5 > b.col6) AND (a.col5 < b.col7))
)sub;
So the issue is that Redshift cannot figure out which row you want to update. There should be a WHERE clause indicating how the results from the from-select are matched up with the table being updated.
Here's an example from the Redshift docs:
update category set catid=100
from (select event.catid from event left join category cat on event.catid=cat.catid) eventcat
where category.catid=eventcat.catid
and catgroup='Concerts';
Even though the table "category" is referenced in the "from-select", Redshift needs the WHERE clause to align the result with the table being updated. Also note that the reference to "category" inside the "from-select" needs to be aliased to avoid confusion.
Now let's look at your query (formatted):
UPDATE schema1.table1 SET col1 = sub.col3, col2 = sub.col4
FROM (SELECT a.col1, a.col2, b.col3, b.col4
FROM schema1.table1 a
LEFT JOIN schema2.table2 b
ON ( ( a.col5 > b.col6 ) AND ( a.col5 < b.col7 ) ))sub;
You need a WHERE clause to allow Redshift to know where to apply each row of the "from-select". This could be as simple as
WHERE table1.col1 = sub.col1 AND table1.col2 = sub.col2
but since I don't know your data model, this could be completely off.
There is also a hazard here in that using inequality joins in producing the FROM results could lead to a large number of rows and repeated updates.
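To make that hazard concrete, here is a small sketch in Python with in-memory SQLite (a stand-in, not Redshift). The id column and the sample ranges are hypothetical; only col5, col6 and col7 come from the question. It counts how many table2 rows each table1 row matches under the range condition.

```python
import sqlite3

# Assumption: SQLite stands in for Redshift; id and all values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, col5 INTEGER);
    CREATE TABLE table2 (col3 TEXT, col6 INTEGER, col7 INTEGER);
    INSERT INTO table1 VALUES (1, 15), (2, 50);
    -- Overlapping ranges: col5 = 15 falls inside both of the first two.
    INSERT INTO table2 VALUES ('x', 10, 20), ('y', 12, 30), ('z', 40, 60);
""")

# Count the candidate source rows per target row under the inequality join.
matches = conn.execute("""
    SELECT a.id, COUNT(*) AS n_matches
    FROM table1 a
    JOIN table2 b ON a.col5 > b.col6 AND a.col5 < b.col7
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(matches)
```

Any target row with more than one match has more than one candidate value for col1 and col2, so it is worth running a count like this before the UPDATE.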

MSSQL How to copy data from one table to another with a condition

I'm trying to update table a with data from some of the columns in table b. Column names are matching in both tables, cannot figure out the syntax, can anyone help?
This is what I want to do (expressed out-of-syntax):
UPDATE table_a
SET table_a.col1 = table_b.col1, table_a.col2 = table_b.col2
WHERE table_a.id = table_b.id
Maybe (probably) I would need some kind of JOIN-clause, but I haven't gotten my head around those yet.... :-/
You can update your table using a JOIN of the two tables:
UPDATE table_a a
INNER JOIN table_b b ON a.id = b.id
SET a.col1 = b.col1, a.col2 = b.col2
I see you are on MYSQL. Not sure if the version above works. If not, try:
UPDATE table_a a
INNER JOIN table_b b
SET a.col1 = b.col1, a.col2 = b.col2
WHERE a.id = b.id
Ok, so I found it myself... :-)
MERGE (without using the WHEN NOT MATCHED clause) is the answer to my problem.
My solution:
MERGE INTO table_a AS a
USING table_b AS b
ON a.id = b.id
WHEN MATCHED THEN UPDATE SET
col1 = b.col1, col2 = b.col2;

Is it necessary to use GROUP BY to reduce the number of updates?

There are two tables
Table A col1,col2,col3
100,200,aaa;
101,200,bbb;
102,200,ccc;
Table B col1,col2,col3
aaa,1,ok;
aaa,2,ok;
aaa,3,ok;
bbb,1,fine;
bbb,3,fine;
Assume table A is a very large table and table B is a small table. In table B, each col1 value has only one col3 value, e.g. if col1 is 'aaa', col3 must be 'ok'.
case 1:
update a set a.col2 = b.col3
from A a, B b
where a.col3 = b.col1
case 2:
update a set a.col2 = b.col3
from A a, (select col1, col3 from B group by col1,col3) b
where a.col3 = b.col1
The results of case 1 and case 2 are the same, but which statement is better? Will case 1 update table A 5 times? Will the GROUP BY in case 2 cost more computation?
You should run EXPLAIN on both these queries to see how your database is actually handling things. That being said, one thing does stand out in terms of performance. In your first query:
update a set a.col2 = b.col3
from A a, B b
where a.col3 = b.col1
you are joining table A with B via the col3 and col1 columns. If there were an index on B.col1 then the join could proceed much faster than if the database were forced to do a full table scan of B. But an index on B.col1 probably would not help in your second query:
update a set a.col2 = b.col3
from A a, (select col1, col3 from B group by col1,col3) b
where a.col3 = b.col1
Here you are joining A to a table derived from B and as such no index is likely available. So I would opt for your first query.
By the way, you are using the old pre ANSI-92 syntax for joining in your first query and you might want to update it.
Since these 2 statements are logically equal (result wise) they might have the same execution plan and therefore have the same performance.
Different execution plans might give an advantage to each of the statements.
I would like to emphasize one thing -
Nested loops are not the only way to implement a JOIN, and in databases that support HASH JOIN they are rarely used for equality joins. So the whole way you are thinking about what is going on here needs to be revised.
Thank you both. According to the SQL execution plan, the data is deduplicated in the background before the update, so there is no need to apply DISTINCT manually. See the screenshot below.
[Screenshot: SQL Server automatically sorts/distincts the rows]
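If you want to see the row counts behind the two cases without a server at hand, here is a rough sketch in Python with in-memory SQLite, loaded with the sample rows from the question. It runs the two joins as SELECTs rather than UPDATEs, so it says nothing about how any particular engine plans the update; it only shows how many candidate rows each form feeds in.

```python
import sqlite3

# Assumption: SQLite stands in for the real engine; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE B (col1 TEXT, col2 INTEGER, col3 TEXT);
    INSERT INTO A VALUES (100, '200', 'aaa'), (101, '200', 'bbb'), (102, '200', 'ccc');
    INSERT INTO B VALUES ('aaa', 1, 'ok'), ('aaa', 2, 'ok'), ('aaa', 3, 'ok'),
                         ('bbb', 1, 'fine'), ('bbb', 3, 'fine');
""")

# Case 1: join A directly to B.
case1 = conn.execute("""
    SELECT a.col1, b.col3
    FROM A a JOIN B b ON a.col3 = b.col1
    ORDER BY a.col1
""").fetchall()

# Case 2: join A to B deduplicated with GROUP BY.
case2 = conn.execute("""
    SELECT a.col1, b.col3
    FROM A a
    JOIN (SELECT col1, col3 FROM B GROUP BY col1, col3) b ON a.col3 = b.col1
    ORDER BY a.col1
""").fetchall()

print(len(case1), len(case2))
```

Case 1 produces one candidate row per matching B row, case 2 one per distinct (col1, col3) pair; the set of distinct values is the same in both, which is why the final table contents agree.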

Joins, conditions and speed in SQL

While preparing some requests, I was writing this :
SELECT *
FROM ta A
JOIN tb B
ON A.col1 = B.col1
JOIN tc C
ON B.col2 = C.col2
WHERE B.col3 = 'whatever'
AND C.col4 = 'whatever2'
And I began to think about the following :
SELECT *
FROM ta A
JOIN (SELECT * FROM tb WHERE col3 = 'whatever') B
ON A.col1 = B.col1
JOIN (SELECT * FROM tc WHERE col4 = 'whatever2') C
ON B.col2 = C.col2
(If I'm not mistaken, the result would be the same.) I'm wondering whether it would be significantly faster. My guess is that it would, but I'd be interested in knowing why or why not.
(Because our server is down at the moment, I can't test it myself right now, so I'm asking here, I hope you won't mind.)
(In case it matters, the engine is Vertica, but my question isn't really specific to Vertica)
Your second query is a little off, it should be:
SELECT *
FROM ta A
JOIN (SELECT * FROM tb WHERE tb.col3 = 'whatever') B
ON A.col1 = B.col1
JOIN (SELECT * FROM tc WHERE tc.col4 = 'whatever2') C
ON B.col2 = C.col2
Notice the inline view where clauses need to reference the table in scope, not the alias for the view. B and C are out of scope within the inline views.
In any case, because you are doing an inner join, it won't matter from a results perspective because the condition is the same whether it occurs pre-join or post-join.
You can reasonably rely on the optimizer to do the following:
Only materialize the columns required when needed.
Push predicates down where it makes sense.
That said, there should be no difference between the two statements. Most likely it is pushing down predicates for the first one to make it more like the second one. If you have statistics gathered, the optimizer should be smart enough to query these the same way (or really close).
That isn't to say I haven't seen what you have in your second query "fix" query issues for me in Vertica... but usually it's only when I am using multiple COUNT(DISTINCT ...) expressions or theta joins, etc.
Now if this were an outer join, then the statements would be different. The first one would apply the filter after the join, the second would be before the join.
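Here is a small sketch of that inner-versus-outer difference, in Python with in-memory SQLite rather than Vertica. The table and column names follow the question, the rows are made up, and run() is a hypothetical helper. It runs the same filter in the WHERE clause and inside the inline view, once with an INNER JOIN and once with a LEFT JOIN.

```python
import sqlite3

# Assumption: SQLite stands in for Vertica; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ta (col1 INTEGER);
    CREATE TABLE tb (col1 INTEGER, col3 TEXT);
    INSERT INTO ta VALUES (1), (2), (3);
    INSERT INTO tb VALUES (1, 'whatever'), (2, 'other');
""")

def run(join_kind, filter_in_subquery):
    """Run the join with the filter either inside the inline view or in WHERE."""
    if filter_in_subquery:
        sql = f"""SELECT A.col1, B.col3 FROM ta A
                  {join_kind} JOIN (SELECT * FROM tb WHERE col3 = 'whatever') B
                  ON A.col1 = B.col1 ORDER BY A.col1"""
    else:
        sql = f"""SELECT A.col1, B.col3 FROM ta A
                  {join_kind} JOIN tb B ON A.col1 = B.col1
                  WHERE B.col3 = 'whatever' ORDER BY A.col1"""
    return conn.execute(sql).fetchall()

inner_where = run("INNER", False)
inner_sub = run("INNER", True)
left_where = run("LEFT", False)
left_sub = run("LEFT", True)

print(inner_where == inner_sub)  # same rows either way for the inner join
print(left_where, left_sub)      # the two left-join forms differ
```

With the LEFT JOIN, the WHERE filter discards the NULL-extended rows after the join, while the filtered inline view keeps every row of ta and simply fails to match some of them.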
Of course, I'll mention that you really just need to do an explain of both methods. Just make sure statistics are gathered.
Hope it helps.
Your first query will work fine, but the second query as originally posted fails with an error. The cause is the derived table JOIN (SELECT * FROM tb WHERE B.col3 = 'whatever') B ON A.col1 = B.col1.
Inside the subquery the alias B is not yet in scope, so B.col3 cannot be resolved; reference the column as col3 (or tb.col3) instead. Using * inside the subquery is legal, but listing only the columns you need keeps the derived table narrow, as in the query below:
SELECT *
FROM ta A
JOIN (SELECT col1, col2 FROM tb WHERE col3 = 'whatever') B
ON A.col1 = B.col1
JOIN (SELECT col2 FROM tc WHERE col4 = 'whatever2') C
ON B.col2 = C.col2
This executes and returns a result. The first subquery selects two columns, col1 and col2, because B.col2 is used in the second join condition. In the outer SELECT you can still use * to get all columns exposed by the three sources.
There is not much difference between the two queries. The first may run faster if the engine evaluates the two subqueries separately instead of folding them into the join, so compare the plans before choosing.

join clause, match or null

So I have some procs I inherited that I am trying to clean up. One of the things I see over and over in them is the following:
Update Table_A
Set A.ColX = B.Colx
From Table_A A
Join Table_B B on B.col1 =A.col1
and B.col2 = A.col2
Update Table_A
Set A.ColX = B.Colx
From Table_A A
Join Table_B B on a.col1 =b.col1
and B.col2 is null
Now, I have tried to combine these into a single query using the following alternative final lines (not at the same time!):
1) and (B.col2 = A.col2 or B.col2 is null)
2) and (isnull(B.col2,'') = COALESCE(a.col2, ''))
However, it always seems to do one of the updates, not both. I feel like I am missing something rather obvious. Is there a good way to combine these two queries?
thanks
This query should work:
Update Table_A
Set A.ColX = B.Colx
From Table_A A
Join Table_B B on B.col1 = A.col1
and (B.col2 = A.col2 or B.col2 is null)
which you said you tried - but you may try it as a SELECT first and see what the results are. That may shed some light on why you're not getting the results you expect.
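A rough sketch of that SELECT check, in Python with in-memory SQLite rather than SQL Server. The table and column names follow the question; the rows and the 'exact'/'fallback' labels are made up. It lists every Table_B row that the combined condition offers to each Table_A row.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_A (col1 INTEGER, col2 TEXT, ColX TEXT);
    CREATE TABLE Table_B (col1 INTEGER, col2 TEXT, ColX TEXT);
    INSERT INTO Table_A VALUES (1, 'k', NULL), (2, 'k', NULL);
    INSERT INTO Table_B VALUES (1, 'k', 'exact'),
                               (1, NULL, 'fallback'),
                               (2, NULL, 'fallback');
""")

# Same join condition as the combined UPDATE, run as a SELECT.
candidates = conn.execute("""
    SELECT A.col1, B.ColX
    FROM Table_A A
    JOIN Table_B B ON B.col1 = A.col1
                  AND (B.col2 = A.col2 OR B.col2 IS NULL)
    ORDER BY A.col1, B.ColX
""").fetchall()
print(candidates)
```

Where a Table_A row gets both an exact match and a NULL match, a single UPDATE ... FROM applies only one of them, and SQL Server does not guarantee which. That may be why it looks as if only one of the two updates ran.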
I would expect the following query to work in SQL Server:
Update A
Set ColX = B.Colx
From Table_A A Join
Table_B B
on a.col1 = b.col1 and
(B.col2 = A.col2 or B.col2 is null);
Notes:
You should use the alias defined in the from clause after the update. My understanding is that if you use the table name and the table is not in the from clause without an alias, then all rows will be updated.
Although I was pretty sure that SQL Server does not support table aliases in the set, I appear to be wrong about that, as this simple SQL Fiddle shows. Perhaps this was not allowed in some ancient version of SQL Server, and the limitation just stuck with me.