Simplify SQL query for repeating formula in an update statement

My current query is as follows:
UPDATE A
SET Computation = CASE WHEN Type = 'P' THEN
B.X * A.Y *-1
ELSE
B.X * A.Y
END
FROM table A
INNER JOIN table B
ON A.Link = B.Link
If it's possible, how can I simplify it so that the formula B.X * A.Y appears only once in my query? Or, performance-wise, is this query already good enough?

You can move B.X * A.Y outside the CASE expression:
(CASE
WHEN Type = 'P' THEN -1
ELSE 1
end
)*B.X * A.Y
Your query then becomes:
UPDATE A
SET Computation = (CASE WHEN Type = 'P' THEN -1
ELSE 1 END
)*B.X * A.Y
FROM table A
INNER JOIN table B
ON A.Link = B.Link
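To check that factoring the product out of the CASE expression gives the same numbers, you can evaluate both forms side by side. This is a minimal sketch using Python's built-in sqlite3 module; the table names TableA and TableB and the sample rows are hypothetical stand-ins for the question's tables, and it compares the two expressions in a SELECT rather than running the SQL Server UPDATE ... FROM statement itself.

```python
import sqlite3

# Hypothetical stand-ins for the question's "table A" and "table B".
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Link INTEGER, Type TEXT, Y REAL);
    CREATE TABLE TableB (Link INTEGER, X REAL);
    INSERT INTO TableA VALUES (1, 'P', 2.0), (2, 'Q', 3.0), (3, 'P', -4.0);
    INSERT INTO TableB VALUES (1, 10.0), (2, 5.0), (3, 1.5);
""")

rows = conn.execute("""
    SELECT A.Link,
           -- original form: the product is written twice
           CASE WHEN A.Type = 'P' THEN B.X * A.Y * -1 ELSE B.X * A.Y END AS original,
           -- refactored form: only the sign depends on Type
           (CASE WHEN A.Type = 'P' THEN -1 ELSE 1 END) * B.X * A.Y AS refactored
    FROM TableA AS A
    INNER JOIN TableB AS B ON A.Link = B.Link
    ORDER BY A.Link
""").fetchall()

for link, original, refactored in rows:
    print(link, original, refactored)
```

Each printed line should show the same value in the last two columns.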

In SQL Server, you can use CROSS APPLY whenever you want to use the result of a calculation more than once.
UPDATE A
SET Computation = CASE WHEN Type = 'P' THEN X.Value * -1 ELSE X.Value END
FROM table A
INNER JOIN table B ON A.Link = B.Link
CROSS APPLY (VALUES (B.X * A.Y)) X (Value)
For a simple calculation such as this, it's unlikely to make much difference to performance (check the execution plan to see). But personally, from a readability and maintainability perspective, I like having my calculations in one place.
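SQLite, used here only so the idea can be run, has no CROSS APPLY, so this sketch swaps in a derived table to get a similar "write the calculation once, reuse it by name" effect. It is not the SQL Server syntax above; the table names, the alias calc, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Link INTEGER, Type TEXT, Y REAL);
    CREATE TABLE TableB (Link INTEGER, X REAL);
    INSERT INTO TableA VALUES (1, 'P', 2.0), (2, 'Q', 3.0);
    INSERT INTO TableB VALUES (1, 10.0), (2, 5.0);
""")

rows = conn.execute("""
    SELECT Link,
           CASE WHEN Type = 'P' THEN Value * -1 ELSE Value END AS Computation
    FROM (
        -- the product B.X * A.Y is written exactly once, then reused as Value
        SELECT A.Link, A.Type, B.X * A.Y AS Value
        FROM TableA AS A
        INNER JOIN TableB AS B ON A.Link = B.Link
    ) AS calc
    ORDER BY Link
""").fetchall()
print(rows)  # [(1, -20.0), (2, 15.0)]
```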

You can refactor for readability, but performance should be similar (IIF requires SQL Server 2012 or later):
UPDATE A
SET Computation = IIF(Type = 'P', -1, 1) * B.X * A.Y
FROM table A
INNER JOIN table B
ON A.Link = B.Link

Related

sql join scenario

I have 2 tables, A and B. Table A has columns X, Y and Z, and table B has columns P, Q and R. In my case, both tables have blank values in all the columns for a few rows.
I need to join these 2 tables such that if A.X<>'' and B.X<>'', it joins on that column. If A.X='' and B.X='', it should check the next columns, A.Y<>'' and B.Y<>''. If these are also blank, it should join on the next condition, A.Z<>'' and B.Z<>''. If all 3 conditions involve blanks, that row should not be joined.
How can we achieve this using a SQL join?
You can use conditional JOINs, as shown below:
SELECT *
FROM A
LEFT OUTER JOIN B as b1
ON A.X = b1.X AND b1.X <> '' -- JOIN only rows WHERE x is not blank
LEFT OUTER JOIN B as b2
ON A.Y = b2.Y AND b2.Y <> '' -- JOIN only rows WHERE y is not blank
LEFT OUTER JOIN B AS b3
ON A.Z = b3.Z AND b3.Z <> '' -- JOIN only rows WHERE z is not blank
WHERE
b1.X IS NOT NULL OR
b2.Y IS NOT NULL OR
b3.Z IS NOT NULL
Ramu's answer is close (I upvoted it), but it needs to be refined. The important part that is correct is that the equality conditions in the JOINs make the query easier to optimize.
However, it is better written as:
SELECT a.*,
COALESCE(b1.P, b2.P, b3.P) as P,
COALESCE(b1.Q, b2.Q, b3.Q) as Q,
COALESCE(b1.R, b2.R, b3.R) as R
FROM A LEFT JOIN
B b1
ON A.X = b1.X LEFT JOIN
B b2
ON A.Y = b2.Y AND
b1.X IS NULL LEFT JOIN -- no previous match
B b3
ON A.Z = b3.Z AND
b2.Y IS NULL AND
b1.X IS NULL -- no previous match
WHERE b1.X IS NOT NULL OR
b2.Y IS NOT NULL OR
b3.Z IS NOT NULL ;
The two key changes are:
The LEFT JOIN conditions check that previous columns did not match.
The SELECT uses COALESCE() to fetch columns.
Also, I don't think the condition on empty strings is needed. There will be no match if there are no empty-string values in B for that column. If both tables have empty strings, then you apparently do want a match, and an empty string matches an empty string in SQL Server.
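Here is a small sketch of the prioritized LEFT JOIN pattern, run through Python's sqlite3 module. It assumes, hypothetically, that B holds the match columns X, Y, Z plus a payload column P, and that B contains no empty strings; each row of A should pick up P from the highest-priority column that matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, X TEXT, Y TEXT, Z TEXT);
    CREATE TABLE B (X TEXT, Y TEXT, Z TEXT, P TEXT);
    INSERT INTO A VALUES
        (1, 'k1', 'k2', 'k3'),   -- matches on X (and also on Y, which must be ignored)
        (2, '',   'm2', 'm3'),   -- X is blank, so it falls back to Y
        (3, '',   '',   'n3'),   -- X and Y are blank, so it falls back to Z
        (4, 'q1', 'q2', 'q3');   -- matches nothing and is filtered out
    INSERT INTO B VALUES
        ('k1',  'b1y', 'b1z', 'via X'),
        ('b2x', 'k2',  'b2z', 'via Y, lower priority'),
        ('b3x', 'm2',  'b3z', 'via Y'),
        ('b4x', 'b4y', 'n3',  'via Z');
""")

rows = conn.execute("""
    SELECT A.id, COALESCE(b1.P, b2.P, b3.P) AS P
    FROM A
    LEFT JOIN B AS b1 ON A.X = b1.X
    LEFT JOIN B AS b2 ON A.Y = b2.Y AND b1.X IS NULL                   -- only if X found nothing
    LEFT JOIN B AS b3 ON A.Z = b3.Z AND b2.Y IS NULL AND b1.X IS NULL  -- only if X and Y found nothing
    WHERE b1.X IS NOT NULL OR b2.Y IS NOT NULL OR b3.Z IS NOT NULL
    ORDER BY A.id
""").fetchall()
print(rows)  # [(1, 'via X'), (2, 'via Y'), (3, 'via Z')]
```

Row 1 also matches B on Y, but the b1.X IS NULL guard keeps that lower-priority match out.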
You can also express this using APPLY:
SELECT a.*, b.*
FROM A CROSS APPLY
(SELECT TOP (1) WITH TIES B.*
FROM B
WHERE A.X = b.X OR
A.Y = b.Y OR
A.Z = b.Z
ORDER BY (CASE WHEN A.X = B.X THEN 1
WHEN A.Y = B.Y THEN 2
WHEN A.Z = B.Z THEN 3
END
)
) B;
However, I would expect the previous version to have much better performance.
Use a CASE expression in the ON clause:
SELECT *
FROM A INNER JOIN B
ON 1 = CASE
WHEN A.X <> '' AND B.P <> '' AND A.X = B.P THEN 1
WHEN A.Y <> '' AND B.Q <> '' AND A.Y = B.Q THEN 1
WHEN A.Z <> '' AND B.R <> '' AND A.Z = B.R THEN 1
END
or:
SELECT *
FROM A INNER JOIN B
ON 1 = CASE
WHEN A.X <> '' AND A.X = B.P THEN 1
WHEN A.Y <> '' AND A.Y = B.Q THEN 1
WHEN A.Z <> '' AND A.Z = B.R THEN 1
END
There is no need for further joins.
The CASE expression makes sure that the conditions are checked in the order you want.
So if the first condition is satisfied, the rows of the 2 tables are joined; if not, the second condition is checked, and so on.
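A minimal sketch of the CASE-in-ON approach, again via Python's sqlite3 module, with hypothetical id columns and sample rows. It uses the first variant (both sides non-blank) and shows that a row of A that is blank in every column does not join, even with a row of B that is also completely blank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, X TEXT, Y TEXT, Z TEXT);
    CREATE TABLE B (id INTEGER, P TEXT, Q TEXT, R TEXT);
    INSERT INTO A VALUES (1, 'k1', '', ''), (2, '', 'm2', ''), (3, '', '', '');
    INSERT INTO B VALUES (10, 'k1', 'x', 'x'), (20, '', 'm2', 'y'), (30, '', '', '');
""")

pairs = conn.execute("""
    SELECT A.id, B.id
    FROM A INNER JOIN B
      ON 1 = CASE
               WHEN A.X <> '' AND B.P <> '' AND A.X = B.P THEN 1
               WHEN A.Y <> '' AND B.Q <> '' AND A.Y = B.Q THEN 1
               WHEN A.Z <> '' AND B.R <> '' AND A.Z = B.R THEN 1
             END  -- no ELSE: the CASE yields NULL, and 1 = NULL is not true
    ORDER BY A.id
""").fetchall()
print(pairs)  # [(1, 10), (2, 20)]
```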

Alternative to <> in SQL Developer

I did some searching on this site and couldn't find exactly what I'm looking for, so I hope this isn't a duplicate. I have an issue where a query in a view is taking about 39 seconds to run, which is dragging down a report query that joins to this view multiple times.
To keep this simple, I've simplified the code, but the structure is exactly as it is in the view. Here is the SELECT statement:
SELECT ....
FROM A a
JOIN B b on a.x = b.x
JOIN C c ON c.s = 'P' AND c.y = b.y
JOIN B AS b2 ON b2.y = c.y AND b2.x <> a.x
JOIN B b3 ON b3.x = b2.x
The x's and y's are the same column names in all join predicates.
The issue comes with the line AND b2.x <> a.x. Without it, the query runs in about 1 second, but with it, it always takes over 30 seconds. I've tried rewriting this predicate multiple times:
b2.x IN (select b2.x FROM B b2 join A a on b2.x <> a.x)
b2.x NOT IN (select b2.x FROM b b2 JOIN A a on b2.x <> a.x)
NOT b2.x = a.x
Or even removing it and putting it in a WHERE clause after the joins, with all of the above variants and also:
WHERE b2.x NOT IN (SELECT x FROM a)
WHERE b2.x NOT IN (SELECT DISTINCT x FROM a)
I'm running out of ideas and need to figure out a way to optimize this. Any suggestions or hints about what else I can look at? Just running
SELECT b2.x from B b2 JOIN A a ON b2.x <> a.x
runs very quickly, so I don't think the underlying tables are the issues.
If the query runs really fast without the condition, but poorly with it, then I might suggest a materialized CTE:
WITH abc as (
SELECT /*+ materialize */...., b2.x as b2x, a.x as ax
FROM A a JOIN
B b
ON a.x = b.x JOIN
C c
ON c.s = 'P' AND c.y = b.y JOIN
B b2
ON b2.y = c.y JOIN
B b3
ON b3.x = b2.x
)
SELECT abc.*
FROM ABC
WHERE b2x <> ax;
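The idea behind the CTE is to run the fast join without the b2.x <> a.x predicate and apply that filter afterwards. SQLite treats /*+ materialize */ as an ordinary comment, so the sketch below (Python's sqlite3 module, hypothetical tiny tables) cannot show the performance effect; it only checks that filtering after the join returns the same rows as putting the predicate in the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (x INTEGER);
    CREATE TABLE B (x INTEGER, y TEXT);
    CREATE TABLE C (s TEXT, y TEXT);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1, 'g'), (2, 'g'), (3, 'g'), (4, 'h');
    INSERT INTO C VALUES ('P', 'g'), ('Q', 'h');
""")

# Predicate inside the join, as in the original view.
inline = conn.execute("""
    SELECT a.x, b2.x
    FROM A a
    JOIN B b ON a.x = b.x
    JOIN C c ON c.s = 'P' AND c.y = b.y
    JOIN B b2 ON b2.y = c.y AND b2.x <> a.x
    JOIN B b3 ON b3.x = b2.x
""").fetchall()

# Join without the predicate first, then filter the intermediate result.
filtered_later = conn.execute("""
    WITH abc AS (
        SELECT a.x AS ax, b2.x AS b2x
        FROM A a
        JOIN B b ON a.x = b.x
        JOIN C c ON c.s = 'P' AND c.y = b.y
        JOIN B b2 ON b2.y = c.y
        JOIN B b3 ON b3.x = b2.x
    )
    SELECT ax, b2x FROM abc WHERE b2x <> ax
""").fetchall()

print(sorted(inline) == sorted(filtered_later))  # True
print(sorted(inline))  # [(1, 2), (1, 3), (2, 1), (2, 3)]
```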

Update table for all entries from select statement

In my scenario, I select all the entries from a table where the condition is true, put them into a vector, and run an UPDATE statement in a loop, passing in the vector's values. It works.
SELECT * FROM MAP AS A WHERE EXISTS
(SELECT (X, Y) FROM MAP AS B WHERE B.X = A.X + 1 AND B.Y = A.Y ) AND EXISTS
(SELECT (X, Y) FROM MAP AS C WHERE C.X = A.X - 1 AND C.Y = A.Y ) ;
for...
UPDATE MAP SET VAL = 2 WHERE X = ? AND Y = ?;
...
But I wanted to try to do this with a single statement. While a table can be updated using a SELECT statement, in my scenario there are 2 keys that need to be checked before selecting a record, so I'm not able to write a WHERE condition on X and Y together.
UPDATE MAP SET USED = 1 WHERE EXISTS (
SELECT * FROM MAP AS A WHERE EXISTS
(SELECT (X, Y) FROM MAP AS B WHERE B.X = A.X + 1 AND B.Y = A.Y ) AND EXISTS
(SELECT (X, Y) FROM MAP AS C WHERE C.X = A.X - 1 AND C.Y = A.Y ) );
When I add a WHERE EXISTS condition like the one above, it updates all the entries. How do I update the table in one query?
The problem with your subquery is that it does not refer to the table (MAP) in the UPDATE statement.
Just drop the MAP AS A subquery and refer to MAP directly (UPDATE does not allow table aliases):
UPDATE MAP
SET USED = 1
WHERE EXISTS (SELECT 1 FROM MAP AS B WHERE B.X = MAP.X + 1 AND B.Y = MAP.Y)
AND EXISTS (SELECT 1 FROM MAP AS C WHERE C.X = MAP.X - 1 AND C.Y = MAP.Y)
Since you've verified the subquery is returning the rows you want to update, your update should then look something like this:
UPDATE MAP SET USED = 1
WHERE (X,Y) IN (
SELECT X, Y FROM MAP AS A WHERE EXISTS
(SELECT X, Y FROM MAP AS B WHERE B.X = A.X + 1 AND B.Y = A.Y ) AND EXISTS
(SELECT X, Y FROM MAP AS C WHERE C.X = A.X - 1 AND C.Y = A.Y ) );
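Both corrected UPDATE statements can be tried against a tiny in-memory table. This sketch uses Python's sqlite3 module; the MAP contents are hypothetical points, and a cell should be marked USED only when it has a neighbour at both X + 1 and X - 1 on the same Y. The (X, Y) IN (...) row-value form needs a database that supports row values (SQLite does since 3.15).

```python
import sqlite3

def make_map():
    # Hypothetical grid: a horizontal run at Y = 0 plus one isolated point.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MAP (X INTEGER, Y INTEGER, USED INTEGER DEFAULT 0)")
    points = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1)]
    conn.executemany("INSERT INTO MAP (X, Y) VALUES (?, ?)", points)
    return conn

def used_cells(conn):
    return conn.execute("SELECT X, Y FROM MAP WHERE USED = 1 ORDER BY X, Y").fetchall()

# Correlated EXISTS form: MAP.X / MAP.Y refer to the row being updated.
exists_conn = make_map()
exists_conn.execute("""
    UPDATE MAP SET USED = 1
    WHERE EXISTS (SELECT 1 FROM MAP AS B WHERE B.X = MAP.X + 1 AND B.Y = MAP.Y)
      AND EXISTS (SELECT 1 FROM MAP AS C WHERE C.X = MAP.X - 1 AND C.Y = MAP.Y)
""")

# Row-value IN form: the inner query selects the keys to update.
in_conn = make_map()
in_conn.execute("""
    UPDATE MAP SET USED = 1
    WHERE (X, Y) IN (
        SELECT X, Y FROM MAP AS A
        WHERE EXISTS (SELECT 1 FROM MAP AS B WHERE B.X = A.X + 1 AND B.Y = A.Y)
          AND EXISTS (SELECT 1 FROM MAP AS C WHERE C.X = A.X - 1 AND C.Y = A.Y))
""")

print(used_cells(exists_conn))  # [(1, 0), (2, 0)]
print(used_cells(in_conn))      # [(1, 0), (2, 0)]
```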

Using a single SQL correlated sub-query to get two columns

My problem is represented by the following query:
SELECT
b.row_id, b.x, b.y, b.something,
(SELECT a.x FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_x,
(SELECT a.y FROM my_table a WHERE a.row_id = (b.row_id - 1), a.something != 42 ) AS source_y
FROM
my_table b
I'm using the same subquery twice, to get both source_x and source_y.
So I'm wondering whether it's possible to do this with only one subquery.
When I run this query on my real data (millions of rows), it never seems to finish and takes hours, if not days (my connection hangs up before the end).
I am using PostgreSQL 8.4
I think you can use this approach:
SELECT b.row_id
, b.x
, b.y
, b.something
, a.x
, a.y
FROM my_table b
left join my_table a on a.row_id = (b.row_id - 1)
and a.something != 42
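To see that the single LEFT JOIN returns the same source_x and source_y values as the two correlated subqueries (with the comma replaced by AND), here is a sketch on hypothetical sample rows using Python's sqlite3 module. It assumes row_id is unique, as the question implies; otherwise the two forms are not equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (row_id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, something INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 0), (2, 20, 200, 42), (3, 30, 300, 7), (4, 40, 400, 0)],
)

# Two correlated subqueries, one per column (AND instead of a comma).
subqueries = conn.execute("""
    SELECT b.row_id,
           (SELECT a.x FROM my_table a WHERE a.row_id = b.row_id - 1 AND a.something != 42) AS source_x,
           (SELECT a.y FROM my_table a WHERE a.row_id = b.row_id - 1 AND a.something != 42) AS source_y
    FROM my_table b
    ORDER BY b.row_id
""").fetchall()

# One LEFT JOIN fetches both columns at once.
joined = conn.execute("""
    SELECT b.row_id, a.x AS source_x, a.y AS source_y
    FROM my_table b
    LEFT JOIN my_table a ON a.row_id = b.row_id - 1 AND a.something != 42
    ORDER BY b.row_id
""").fetchall()

print(subqueries == joined)  # True
print(joined)  # [(1, None, None), (2, 10, 100), (3, None, None), (4, 30, 300)]
```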
@DavidEG posted the best syntax for the query.
However, your problem is definitely not just the query technique. A JOIN instead of two subqueries can speed things up by a factor of two at best, most likely less. That doesn't explain "hours". Even with millions of rows, a decently set-up Postgres should finish this simple query in seconds, not hours.
First thing that stands out is the syntax error in your query:
... WHERE a.row_id = (b.row_id - 1), a.something != 42
AND or OR is needed here, not a comma.
The next thing to check is indexes. If row_id is not the primary key, you may not have an index on it. For optimum performance of this particular query, create a multi-column index on (row_id, something) like this:
CREATE INDEX my_table_row_id_something_idx ON my_table (row_id, something)
If the filter always excludes the same value (something != 42), you can use a partial index instead for an additional speed-up:
CREATE INDEX my_table_row_id_something_idx ON my_table (row_id)
WHERE something != 42
This will only make a substantial difference if 42 is a common value or if something is a bigger column than just an integer. (An index on two integer columns normally occupies the same space on disk as an index on just one, due to data alignment.) See:
Calculating and saving space in PostgreSQL
Is a composite index also good for queries on the first field?
When performance is an issue, it is always a good idea to check your server settings. In many distributions, the default Postgres settings use minimal resources and are not up to handling "millions of rows".
Depending on your actual version of Postgres, an upgrade to a current version (9.1 at the time of writing) may help a lot.
Ultimately, hardware is always a factor, too. Tuning and optimizing can only get you so far.
Old-fashioned syntax:
SELECT
b.row_id, b.x, b.y, b.something
, a.x AS source_x
, a.y AS source_y
FROM my_table b
,my_table a
WHERE a.row_id = b.row_id - 1
AND a.something != 42
;
JOIN syntax:
SELECT
b.row_id, b.x, b.y, b.something
, a.x AS source_x
, a.y AS source_y
FROM my_table b
JOIN my_table a
ON (a.row_id = b.row_id - 1)
WHERE a.something != 42
;
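One difference worth checking: both forms above are inner joins, so unlike the original subqueries (which yield NULL), they drop any row whose predecessor is missing or has something = 42. A quick sketch with Python's sqlite3 module and hypothetical rows shows this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (row_id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, something INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 0), (2, 20, 200, 42), (3, 30, 300, 7), (4, 40, 400, 0)],
)

# Inner join with the filter in WHERE, as in the JOIN-syntax answer.
inner = conn.execute("""
    SELECT b.row_id, a.x AS source_x, a.y AS source_y
    FROM my_table b
    JOIN my_table a ON (a.row_id = b.row_id - 1)
    WHERE a.something != 42
    ORDER BY b.row_id
""").fetchall()

# Rows 1 (no predecessor) and 3 (predecessor has something = 42) are missing.
print(inner)  # [(2, 10, 100), (4, 30, 300)]
```

If those rows must be kept with NULLs, a LEFT JOIN with the filter in the ON clause is the closer match to the original subqueries.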
SELECT b.row_id, b.x, b.y, b.something, a.x, a.y
FROM my_table b
LEFT JOIN (
SELECT row_id + 1 AS row_id, x, y
FROM my_table
WHERE something != 42
) AS a ON a.row_id = b.row_id;
Postgres (9.3 or later; LATERAL is not available in 8.4):
SELECT
b.row_id, b.x, b.y, b.something,
source_x,
source_y
FROM
my_table b,
LATERAL (SELECT a.x AS source_x, a.y AS source_y FROM my_table a WHERE a.row_id = (b.row_id - 1) AND a.something != 42) AS s
SQL Server:
SELECT
b.row_id, b.x, b.y, b.something,
source_x,
source_y
FROM
my_table b
OUTER APPLY (SELECT a.x AS source_x, a.y AS source_y FROM my_table a WHERE a.row_id = (b.row_id - 1) AND a.something != 42) AS s

Update via subquery, what if the subquery returns no rows?

I am using a subquery in an UPDATE:
UPDATE tableA
SET x,y,z = ( (SELECT x, y, z
FROM tableB b
WHERE tableA.id = b.id
AND (tableA.x != b.x
OR tableA.y != b.y
OR tableA.z != b.z)));
My question is, what happens if the subquery returns no rows? Will it do an update with nulls?
Secondly, is there a better way to write this? I am basically updating three fields in tableA from tableB, but the update should only happen if any of the three fields differ.
what happens if the subquery returns no rows? Will it do an update with nulls?
Yes. You can test this with:
update YourTable
set col1 = (select 1 where 1=0)
This will fill col1 with NULLs. If the subquery returns multiple rows, as in:
update YourTable
set col1 = (select 1 union select 2)
The database will generate an error.
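The NULL behaviour is easy to reproduce. This sketch uses Python's sqlite3 module with a hypothetical YourTable. The multi-row error is database-specific, though: SQLite, for example, does not raise it and simply uses the first row of a scalar subquery, so only the no-rows case is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, col1 INTEGER)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?)", [(1, 5), (2, 6)])

# The subquery returns no rows, so every col1 is set to NULL (None in Python).
conn.execute("UPDATE YourTable SET col1 = (SELECT 1 WHERE 1 = 0)")

rows = conn.execute("SELECT id, col1 FROM YourTable ORDER BY id").fetchall()
print(rows)  # [(1, None), (2, None)]
```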
Secondly, is there a better way to write this? I am basically updating three fields in tableA from tableB, but the update should only happen if any of the three fields are different.
Intuitively, I wouldn't worry about performance. If you really want to avoid unnecessary updates, you can write it like this:
UPDATE a
SET x = b.x, y = b.y, z = b.z
FROM tableA a, tableB b
WHERE a.id = b.id AND (a.x <> b.x OR a.y <> b.y OR a.z <> b.z)
Because the WHERE clause only keeps tableA rows that have a matching row in tableB, no row is updated with NULLs.
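SQLite before version 3.33 has no UPDATE ... FROM, so this sketch expresses the same guarded update with correlated subqueries and an EXISTS condition instead (a swapped-in formulation, run through Python's sqlite3 module). The tables and rows are hypothetical: id 1 is identical in both tables, id 2 differs in y, and id 3 has no match in tableB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, z INTEGER);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, z INTEGER);
    INSERT INTO tableA VALUES (1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3);
    INSERT INTO tableB VALUES (1, 1, 1, 1), (2, 2, 9, 2);
""")

cur = conn.execute("""
    UPDATE tableA
    SET x = (SELECT b.x FROM tableB b WHERE b.id = tableA.id),
        y = (SELECT b.y FROM tableB b WHERE b.id = tableA.id),
        z = (SELECT b.z FROM tableB b WHERE b.id = tableA.id)
    -- only rows that have a match in tableB and differ in at least one column
    WHERE EXISTS (SELECT 1 FROM tableB b
                  WHERE b.id = tableA.id
                    AND (b.x <> tableA.x OR b.y <> tableA.y OR b.z <> tableA.z))
""")
updated = cur.rowcount
rows = conn.execute("SELECT * FROM tableA ORDER BY id").fetchall()
print(updated)  # 1
print(rows)     # [(1, 1, 1, 1), (2, 2, 9, 2), (3, 3, 3, 3)]
```

Only id 2 is touched; id 3 keeps its values instead of being set to NULL.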
On Informix, I used a variation of Andomar's solution:
UPDATE tableA
SET x,y,z = ( (SELECT x, y, z
FROM tableB b
WHERE tableA.id = b.id) )
WHERE tableA.id IN (SELECT fromTable.id
FROM tableA toTable, tableB fromTable
WHERE toTable.id = fromTable.id
AND ((toTable.x <> fromTable.x)
OR (toTable.y <> fromTable.y)
OR (toTable.z <> fromTable.z)));