In my scenario, I select all the entries from a table where the condition is true, put them into a vector, and run an UPDATE statement in a loop, passing in the vector's values. It works.
SELECT * FROM MAP AS A WHERE EXISTS
(SELECT (X, Y) FROM MAP AS B WHERE B.X = A.X + 1 AND B.Y = A.Y ) AND EXISTS
(SELECT (X, Y) FROM MAP AS C WHERE C.X = A.X - 1 AND C.Y = A.Y ) ;
for...
UPDATE MAP SET VAL = 2 WHERE X = ? AND Y = ?;
...
But I wanted to use a single statement to do this. A table can be updated using a SELECT statement, but in my scenario there are two keys that need to be checked before selecting a record, so I'm not able to write a single WHERE condition on X or Y alone.
UPDATE MAP SET USED = 1 WHERE EXISTS (
SELECT * FROM MAP AS A WHERE EXISTS
(SELECT (X, Y) FROM MAP AS B WHERE B.X = A.X + 1 AND B.Y = A.Y ) AND EXISTS
(SELECT (X, Y) FROM MAP AS C WHERE C.X = A.X - 1 AND C.Y = A.Y ) );
When I put a WHERE EXISTS condition like the one above, it updates all the entries. How do I update the table in one query?
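The problem with your subquery is that it does not refer to the table (MAP) in the UPDATE statement, so the EXISTS condition does not depend on the row being updated and is true for every row as long as the subquery returns anything.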
Just drop the MAP AS A subquery and refer to MAP directly (UPDATE does not allow table aliases):
UPDATE MAP
SET USED = 1
WHERE EXISTS (SELECT 1 FROM MAP AS B WHERE B.X = MAP.X + 1 AND B.Y = MAP.Y)
AND EXISTS (SELECT 1 FROM MAP AS C WHERE C.X = MAP.X - 1 AND C.Y = MAP.Y)
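To see the difference concretely, here is a minimal sketch using Python's built-in sqlite3 module. The sample cells and the helper name `updated_cells` are made up for illustration, and the nested subqueries use SELECT 1 instead of SELECT (X, Y). It runs the uncorrelated statement from the question and the correlated one above against the same data.

```python
import sqlite3

def updated_cells(update_sql):
    """Run update_sql on a fresh sample MAP table and return the cells marked USED."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MAP (X INTEGER, Y INTEGER, USED INTEGER DEFAULT 0)")
    # Hypothetical data: a horizontal run of three cells plus one isolated cell.
    conn.executemany("INSERT INTO MAP (X, Y) VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (7, 5)])
    conn.execute(update_sql)
    rows = conn.execute(
        "SELECT X, Y FROM MAP WHERE USED = 1 ORDER BY X, Y").fetchall()
    conn.close()
    return rows

# The subquery never mentions the row being updated, so EXISTS is all-or-nothing.
UNCORRELATED = """
    UPDATE MAP SET USED = 1 WHERE EXISTS (
        SELECT 1 FROM MAP AS A
        WHERE EXISTS (SELECT 1 FROM MAP AS B WHERE B.X = A.X + 1 AND B.Y = A.Y)
          AND EXISTS (SELECT 1 FROM MAP AS C WHERE C.X = A.X - 1 AND C.Y = A.Y))
"""

# MAP.X and MAP.Y refer to the row currently being updated.
CORRELATED = """
    UPDATE MAP SET USED = 1
    WHERE EXISTS (SELECT 1 FROM MAP AS B WHERE B.X = MAP.X + 1 AND B.Y = MAP.Y)
      AND EXISTS (SELECT 1 FROM MAP AS C WHERE C.X = MAP.X - 1 AND C.Y = MAP.Y)
"""

uncorrelated_result = updated_cells(UNCORRELATED)
correlated_result = updated_cells(CORRELATED)
print(uncorrelated_result)  # every cell is marked
print(correlated_result)    # only the cell with a neighbour on both sides
```

With this sample data only the cell (2, 1) has a neighbour at both X - 1 and X + 1, so it is the only one the correlated statement marks.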
Since you've verified the subquery is returning the rows you want to update, your update should then look something like this:
UPDATE MAP SET USED = 1
WHERE (X,Y) IN (
SELECT X, Y FROM MAP AS A WHERE EXISTS
(SELECT X, Y FROM MAP AS B WHERE B.X = A.X + 1 AND B.Y = A.Y ) AND EXISTS
(SELECT X, Y FROM MAP AS C WHERE C.X = A.X - 1 AND C.Y = A.Y ) );
My current query is as follows:
UPDATE A
SET Computation = CASE WHEN Type = 'P' THEN
B.X * A.Y *-1
ELSE
B.X * A.Y
END
FROM table A
INNER JOIN table B
ON A.Link = B.Link
If it's possible, how can I simplify it so that the formula B.X * A.Y appears only once in my query? Or is this query already good enough performance-wise?
You can bring B.X * A.Y outside of the CASE expression:
(CASE
WHEN Type = 'P' THEN -1
ELSE 1
end
)*B.X * A.Y
Your query:
UPDATE A
SET Computation = (CASE WHEN Type = 'P' THEN -1
ELSE 1 END
)*B.X * A.Y
FROM table A
INNER JOIN table B
ON A.Link = B.Link
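As a quick sanity check that factoring out the sign does not change the result, here is a small sketch using Python's built-in sqlite3 module. The table contents are invented for illustration, and it compares the two expressions in a SELECT rather than running the T-SQL style UPDATE ... FROM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Link INTEGER, Y INTEGER, Type TEXT);
    CREATE TABLE B (Link INTEGER, X INTEGER);
    -- Hypothetical sample rows: one of type 'P', one of another type.
    INSERT INTO A VALUES (1, 3, 'P'), (2, 4, 'Q');
    INSERT INTO B VALUES (1, 5), (2, 6);
""")

# First column: the original CASE; second column: the factored form.
rows = conn.execute("""
    SELECT CASE WHEN A.Type = 'P' THEN B.X * A.Y * -1 ELSE B.X * A.Y END AS original,
           (CASE WHEN A.Type = 'P' THEN -1 ELSE 1 END) * B.X * A.Y AS factored
    FROM A
    INNER JOIN B ON A.Link = B.Link
    ORDER BY A.Link
""").fetchall()
print(rows)  # [(-15, -15), (24, 24)]
```

Both columns agree on every row, which is what you would expect since the rewrite only moves a multiplication by 1 or -1.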
You can use CROSS APPLY any time you wish to use the results of a calculation more than once.
UPDATE A
SET Computation = CASE WHEN Type = 'P' THEN X.Value * -1 ELSE X.Value END
FROM table A
INNER JOIN table B ON B.?? = A.??
CROSS APPLY (VALUES (B.X * A.Y)) X (Value)
For a simple calculation such as this, it's unlikely to make much difference to performance (check the execution plan to confirm). But personally, from a readability and maintainability perspective, I like having my calculations in one place.
You can refactor to simplify for readability, but performance should be similar:
UPDATE A
SET Computation = IIF(Type = 'P', -1, 1) * B.X * A.Y
FROM table A
INNER JOIN table B
ON A.Link = B.Link
Schema for table A: A(x,y,z)
Schema for table B: B(u,x,v)
[Primary keys mentioned in bold]
For the following SQL query:
SELECT x
FROM A
WHERE x in ( SELECT x
FROM B
WHERE x<10)
How does the inner query resolve that this x mentioned is from the table B and not the table A?
x is resolved from the innermost query out. It is always better to qualify column names, so write this query as:
SELECT A.x
FROM A
WHERE A.x IN (SELECT B.x
FROM B
WHERE B.x < 10
);
This has the advantage that if B.x does not exist, you will get an error. Without the qualifier, the x in IN (SELECT x ...) would silently refer to A.x (but only when B.x does not exist).
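The silent fallback is easy to reproduce. The following sketch uses Python's built-in sqlite3 module with a made-up variant of the schema in which B deliberately has no column x. Other databases should resolve names the same way, but the error text shown is SQLite's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (x INTEGER, y INTEGER, z INTEGER);
    CREATE TABLE B (u INTEGER, v INTEGER);  -- hypothetical: B has no column x
    INSERT INTO A VALUES (1, 0, 0), (5, 0, 0), (20, 0, 0);
    INSERT INTO B VALUES (100, 0), (200, 0);
""")

# Unqualified: the inner x cannot be found in B, so it resolves to the outer A.x.
unqualified = conn.execute(
    "SELECT x FROM A WHERE x IN (SELECT x FROM B WHERE x < 10) ORDER BY x"
).fetchall()
print(unqualified)  # [(1,), (5,)]: every A.x below 10, regardless of B's contents

# Qualified: the same mistake is reported instead of silently accepted.
try:
    conn.execute(
        "SELECT A.x FROM A WHERE A.x IN (SELECT B.x FROM B WHERE B.x < 10)")
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)
print(qualified_error)
```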
I am performing joins in Hive:
select * from
(select * from
(select * from A join B on A.x = B.x) t1
join C on t1.y = C.y) t2
join D on t2.x = D.x
I am getting the error "column x cannot be resolved", since A and B both contain a column x. How should I use qualified names, or is there a way to drop the duplicate column in Hive?
Because table A and table B both have the column x, you must assign an alias to this column within this select:
select * from A join B on A.x = B.x
Something like this:
select A.x as x1, B.x as x2, ...
from A join B on A.x = B.x
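The aliasing pattern can be sketched end to end as follows. This runs on SQLite through Python's built-in sqlite3 module, not on Hive, so it only illustrates the shape of the query. The extra columns (b_val, c_val, d_val), the alias names (a_x, a_y) and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (x INTEGER, y INTEGER);
    CREATE TABLE B (x INTEGER, b_val TEXT);
    CREATE TABLE C (y INTEGER, c_val TEXT);
    CREATE TABLE D (x INTEGER, d_val TEXT);
    INSERT INTO A VALUES (1, 10);
    INSERT INTO B VALUES (1, 'b');
    INSERT INTO C VALUES (10, 'c');
    INSERT INTO D VALUES (1, 'd');
""")

# Each level selects explicitly named columns, so no derived table
# ever exposes two columns called x.
rows = conn.execute("""
    SELECT t2.a_x, t2.b_val, t2.c_val, D.d_val
    FROM (SELECT t1.a_x, t1.a_y, t1.b_val, C.c_val
          FROM (SELECT A.x AS a_x, A.y AS a_y, B.b_val
                FROM A JOIN B ON A.x = B.x) t1
          JOIN C ON t1.a_y = C.y) t2
    JOIN D ON t2.a_x = D.x
""").fetchall()
print(rows)  # [(1, 'b', 'c', 'd')]
```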
You can do something similar to the following, but it means you cannot use special characters in column names.
set hive.support.quoted.identifiers=none;
select * from
(select C.*,t1.`(y)?+.+` from
(select A.*,B.`(x)?+.+` from A join B on A.x = B.x) t1
join C on t1.y = C.y) t2
join D on t2.x = D.x
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-REGEXColumnSpecification
I had exactly the same problem, and the solution for me was to simply rename the duplicate columns by recreating the DataFrame with a modified schema. Here is some sample code:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructType
import scala.collection.mutable

def renameDuplicatedColumns(df: DataFrame): DataFrame = {
  // Column names that occur more than once in the DataFrame.
  val duplicatedColumns = df.columns
    .groupBy(identity)
    .filter(_._2.length > 1)
    .keys
    .toSet
  // Next suffix to hand out for each duplicated name.
  val newIndexes = mutable.Map[String, Int]().withDefaultValue(0)
  val schema: StructType = StructType(
    df.schema
      .collect {
        case field if duplicatedColumns.contains(field.name) =>
          val idx = newIndexes(field.name)
          newIndexes.update(field.name, idx + 1)
          field.copy(name = field.name + "__" + idx)
        case field =>
          field
      }
  )
  // Rebuild the DataFrame over the same rows with the de-duplicated schema.
  df.sqlContext.createDataFrame(df.rdd, schema)
}
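The renaming rule itself is independent of Spark. As a rough illustration, here is the same idea applied to a plain list of column names in Python. The function name `rename_duplicated` and the sample names are made up, and this is not a replacement for the DataFrame code above.

```python
from collections import Counter

def rename_duplicated(names):
    """Suffix every repeated name with __0, __1, ... in order of appearance."""
    counts = Counter(names)   # how often each name occurs overall
    next_index = Counter()    # next suffix to hand out per duplicated name
    renamed = []
    for name in names:
        if counts[name] > 1:
            renamed.append(f"{name}__{next_index[name]}")
            next_index[name] += 1
        else:
            renamed.append(name)  # unique names are left untouched
    return renamed

result = rename_duplicated(["x", "y", "x", "z", "y"])
print(result)  # ['x__0', 'y__0', 'x__1', 'z', 'y__1']
```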
I have two tables that share one column/field, and I need to update table A with data from table B.
Each A.x value needs to be looked up against B.w, and the corresponding B.z value written back to A.x. The values in x, x1, x2, etc. differ, so each one has to be looked up against w in table B and replaced with the corresponding z value in table A.
Table A (columns j, x, x1,x2,x3..x20 and so on)
---------
j x x1 x2 ..x20 and y y1 y2 .. y20
Table B (columns w and z)
--------
w z
UPDATE TableA a SET a.x = (SELECT b.w
FROM TableB b
WHERE a.x = b.z)
WHERE a.j='somevalue';
If I write it this way I need to write 40 UPDATE statements. Is there an easier way to do these updates?
The subquery might also return multiple rows, and I need to handle that too.
Thanks,
Ashraf
Since a.x holds a different value from a.x1, a.x2, etc., you need one subquery per column, because each one looks up a different record in TableB.
UPDATE TableA a
SET a.x = (SELECT b.w FROM TableB b WHERE a.x = b.z)
, a.x1 = (SELECT b.w FROM TableB b WHERE a.x1 = b.z)
, a.x2 = (SELECT b.w FROM TableB b WHERE a.x2 = b.z)
...
WHERE a.j = 'somevalue';
You might want to consider changing your table design for tableA, so as to have rows holding the values rather than columns. That would make updates (and queries in general for that matter) much easier.
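A rough sketch of what that redesign could look like, using Python's built-in sqlite3 module. The table name TableAValues, its slot column and all the sample values are hypothetical. The lookup direction (match on b.z, write b.w) follows the query in the question, and the added EXISTS guard leaves values with no match in TableB untouched instead of setting them to NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableB (w TEXT, z TEXT);
    -- Hypothetical row-oriented layout: one row per former column of TableA.
    CREATE TABLE TableAValues (j TEXT, slot TEXT, val TEXT);
    INSERT INTO TableB VALUES ('W1', 'z1'), ('W2', 'z2');
    INSERT INTO TableAValues VALUES
        ('somevalue', 'x',  'z1'),
        ('somevalue', 'x1', 'z2'),
        ('somevalue', 'x2', 'unmatched'),
        ('other',     'x',  'z1');
""")

# One statement covers every former column, because each value is now a row.
conn.execute("""
    UPDATE TableAValues
    SET val = (SELECT b.w FROM TableB b WHERE b.z = TableAValues.val)
    WHERE j = 'somevalue'
      AND EXISTS (SELECT 1 FROM TableB b WHERE b.z = TableAValues.val)
""")

rows = conn.execute(
    "SELECT j, slot, val FROM TableAValues ORDER BY j, slot").fetchall()
for row in rows:
    print(row)
```

The sketch assumes z is unique in TableB. If it is not, the subquery can match several rows and you would still need to decide which one wins.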
I am using a subquery in an UPDATE:
UPDATE tableA
SET x,y,z = ( (SELECT x, y, z
FROM tableB b
WHERE tableA.id = b.id
AND (tableA.x != b.x
OR tableA.y != b.y
OR tableA.z != b.z)) );
My question is, what happens if the subquery returns no rows? Will it do an update with nulls?
Secondly, is there a better way to write this. I am basically updating three fields in tableA from tableB, but the update should only happen if any of the three fields are different.
"What happens if the subquery returns no rows? Will it do an update with nulls?"
Yes. You can test this with:
update YourTable
set col1 = (select 1 where 1=0)
This will fill col1 with NULLs. In case the subquery returns multiple rows, like:
update YourTable
set col1 = (select 1 union select 2)
The database will generate an error.
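The NULL case can be checked with a short sketch using Python's built-in sqlite3 module. The table contents are made up for illustration. Only the empty-subquery case is shown, because SQLite in particular does not raise an error for a multi-row scalar subquery (it uses the first row), so the error case has to be tried on the database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER PRIMARY KEY, col1 INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(1, 10), (2, 20)])

# The scalar subquery returns no rows, so every col1 is overwritten with NULL.
conn.execute("UPDATE YourTable SET col1 = (SELECT 1 WHERE 1 = 0)")

values = [row[0] for row in
          conn.execute("SELECT col1 FROM YourTable ORDER BY id")]
print(values)  # [None, None]
```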
"Secondly, is there a better way to write this? I am basically updating three fields in tableA from tableB, but the update should only happen if any of the three fields are different."
Intuitively I wouldn't worry about the performance. If you really wish to avoid the update, you can write it like:
UPDATE a
SET x = b.x, y = b.y, z = b.z
FROM tableA a, tableB b
WHERE a.id = b.id AND (a.x <> b.x OR a.y <> b.y OR a.z <> b.z)
The WHERE clause prevents updates with NULL.
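The UPDATE ... FROM form above is SQL Server style. As a sketch of the same idea with plain correlated subqueries, here is a version run on SQLite through Python's built-in sqlite3 module. The sample rows are invented: id 1 is identical in both tables, id 2 differs in y, and id 3 has no counterpart in tableB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, z INTEGER);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, z INTEGER);
    INSERT INTO tableA VALUES (1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
    INSERT INTO tableB VALUES (1, 1, 1, 1), (2, 2, 9, 2);
""")

# The EXISTS guard restricts the update to rows that have a match in tableB
# and differ from it in at least one of the three columns.
cur = conn.execute("""
    UPDATE tableA
    SET x = (SELECT b.x FROM tableB b WHERE b.id = tableA.id),
        y = (SELECT b.y FROM tableB b WHERE b.id = tableA.id),
        z = (SELECT b.z FROM tableB b WHERE b.id = tableA.id)
    WHERE EXISTS (SELECT 1 FROM tableB b
                  WHERE b.id = tableA.id
                    AND (tableA.x <> b.x OR tableA.y <> b.y OR tableA.z <> b.z))
""")
updated = cur.rowcount
rows = conn.execute("SELECT id, x, y, z FROM tableA ORDER BY id").fetchall()
print(updated)  # 1
print(rows)     # [(1, 1, 1, 1), (2, 2, 9, 2), (3, 3, 3, 3)]
```

Note that <> never evaluates to true when either side is NULL, so a column that is NULL in one table and non-NULL in the other is not detected as different by this condition.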
On Informix I used a variation of Andomar's solution:
UPDATE tableA
SET x,y,z = ( (SELECT x, y, z
FROM tableB b
WHERE tableA.id = b.id) )
WHERE tableA.id IN (SELECT fromTable.id
FROM tableA toTable, tableB fromTable
WHERE toTable.id = fromTable.id
AND ((toTable.x <> fromTable.x)
OR (toTable.y <> fromTable.y)
OR (toTable.z <> fromTable.z)));