General rules for simplifying SQL statements - sql

I'm looking for some "inference rules" (similar to set operation rules or logic rules) which I can use to reduce the complexity or size of a SQL query.
Does anything like that exist? Any papers, any tools? Any equivalences that you found on your own? It's somewhat similar to query optimization, but not in terms of performance.
To state it differently: given a (complex) query with JOINs, SUBSELECTs, and UNIONs, is it possible (or not) to reduce it to a simpler, equivalent SQL statement that produces the same result, by using some transformation rules?
So, I'm looking for equivalent transformations of SQL statements, like the fact that most SUBSELECTs can be rewritten as a JOIN.

This answer was written in 2009. Some of the query optimization tricks described here are obsolete by now, others can be made more efficient, yet others still apply. The statements about feature support by different database systems apply to versions that existed at the time of this writing.
That's exactly what optimizers do for a living (not that I'm saying they always do this well).
Since SQL is a set-based language, there is usually more than one way to transform one query into another.
Like this query:
SELECT *
FROM mytable
WHERE col1 > #value1 OR col2 < #value2
can be transformed into this one (provided that mytable has a primary key):
SELECT *
FROM mytable
WHERE col1 > #value1
UNION
SELECT *
FROM mytable
WHERE col2 < #value2
or this one:
SELECT mo.*
FROM (
SELECT id
FROM mytable
WHERE col1 > #value1
UNION
SELECT id
FROM mytable
WHERE col2 < #value2
) mi
JOIN mytable mo
ON mo.id = mi.id
, which look uglier but can yield better execution plans.
One of the most common things to do is replacing this query:
SELECT *
FROM mytable
WHERE col IN
(
SELECT othercol
FROM othertable
)
with this one:
SELECT *
FROM mytable mo
WHERE EXISTS
(
SELECT NULL
FROM othertable o
WHERE o.othercol = mo.col
)
In some RDBMS's (like PostgreSQL 8.4), DISTINCT and GROUP BY use different execution plans, so sometimes it's better to replace the one with the other:
SELECT mo.grouper,
(
SELECT SUM(col)
FROM mytable mi
WHERE mi.grouper = mo.grouper
)
FROM (
SELECT DISTINCT grouper
FROM mytable
) mo
vs.
SELECT mo.grouper, SUM(col)
FROM mytable mo
GROUP BY
mo.grouper
In PostgreSQL, DISTINCT sorts and GROUP BY hashes.
MySQL 5.6 lacks FULL OUTER JOIN, so this query:
SELECT t1.col1, t2.col2
FROM table1 t1
FULL OUTER JOIN
table2 t2
ON t1.id = t2.id
can be rewritten as follows:
SELECT t1.col1, t2.col2
FROM table1 t1
LEFT JOIN
table2 t2
ON t1.id = t2.id
UNION ALL
SELECT NULL, t2.col2
FROM table1 t1
RIGHT JOIN
table2 t2
ON t1.id = t2.id
WHERE t1.id IS NULL
, but see this article in my blog on how to do this more efficiently in MySQL:
Emulating FULL OUTER JOIN in MySQL
This hierarchical query in Oracle 11g:
SELECT DISTINCT(animal_id) AS animal_id
FROM animal
START WITH
animal_id = :id
CONNECT BY
PRIOR animal_id IN (father, mother)
ORDER BY
animal_id
can be transformed to this:
SELECT DISTINCT(animal_id) AS animal_id
FROM (
SELECT 0 AS gender, animal_id, father AS parent
FROM animal
UNION ALL
SELECT 1, animal_id, mother
FROM animal
)
START WITH
animal_id = :id
CONNECT BY
parent = PRIOR animal_id
ORDER BY
animal_id
, the latter one being more efficient.
See this article in my blog for the execution plan details:
Genealogy query on both parents
To find all ranges that overlap the given range, you can use the following query:
SELECT *
FROM ranges
WHERE end_date >= #start
AND start_date <= #end
, but in SQL Server this more complex query yields the same results faster:
SELECT *
FROM ranges
WHERE (start_date > #start AND start_date <= #end)
OR (#start BETWEEN start_date AND end_date)
, and believe it or not, I have an article in my blog on this too:
Overlapping ranges: SQL Server
SQL Server 2008 also lacks an efficient way to do cumulative aggregates, so this query:
SELECT mi.id, SUM(mo.value) AS running_sum
FROM mytable mi
JOIN mytable mo
ON mo.id <= mi.id
GROUP BY
mi.id
can be more efficiently rewritten using, Lord help me, cursors (you heard me right: "cursors", "more efficiently" and "SQL Server" in one sentence).
See this article in my blog on how to do it:
Flattening timespans: SQL Server
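For a flavour of what the cursor approach looks like, here is a minimal sketch (T-SQL, assuming mytable(id, value) as in the query above); the linked article describes a more careful version:
-- Running total via a cursor: scan the rows once in id order and accumulate
DECLARE @id INT, @value INT, @running_sum INT = 0;
DECLARE @result TABLE (id INT, running_sum INT);
DECLARE cur CURSOR FAST_FORWARD FOR
    SELECT id, value FROM mytable ORDER BY id;
OPEN cur;
FETCH NEXT FROM cur INTO @id, @value;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @running_sum = @running_sum + @value;
    INSERT INTO @result (id, running_sum) VALUES (@id, @running_sum);
    FETCH NEXT FROM cur INTO @id, @value;
END;
CLOSE cur;
DEALLOCATE cur;
SELECT id, running_sum FROM @result;
Unlike the triangular join above, this reads each row once instead of joining every row to all of its predecessors.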
There is a certain kind of query, commonly met in financial applications, that pulls effective exchange rate for a currency, like this one in Oracle 11g:
SELECT TO_CHAR(SUM(xac_amount * rte_rate), 'FM999G999G999G999G999G999D999999')
FROM t_transaction x
JOIN t_rate r
ON (rte_currency, rte_date) IN
(
SELECT xac_currency, MAX(rte_date)
FROM t_rate
WHERE rte_currency = xac_currency
AND rte_date <= xac_date
)
This query can be heavily rewritten to use an equality condition which allows a HASH JOIN instead of NESTED LOOPS:
WITH v_rate AS
(
SELECT cur_id AS eff_currency, dte_date AS eff_date, rte_rate AS eff_rate
FROM (
SELECT cur_id, dte_date,
(
SELECT MAX(rte_date)
FROM t_rate ri
WHERE rte_currency = cur_id
AND rte_date <= dte_date
) AS rte_effdate
FROM (
SELECT (
SELECT MAX(rte_date)
FROM t_rate
) - level + 1 AS dte_date
FROM dual
CONNECT BY
level <=
(
SELECT MAX(rte_date) - MIN(rte_date)
FROM t_rate
)
) v_date,
(
SELECT 1 AS cur_id
FROM dual
UNION ALL
SELECT 2 AS cur_id
FROM dual
) v_currency
) v_eff
LEFT JOIN
t_rate
ON rte_currency = cur_id
AND rte_date = rte_effdate
)
SELECT TO_CHAR(SUM(xac_amount * eff_rate), 'FM999G999G999G999G999G999D999999')
FROM (
SELECT xac_currency, TRUNC(xac_date) AS xac_date, SUM(xac_amount) AS xac_amount, COUNT(*) AS cnt
FROM t_transaction x
GROUP BY
xac_currency, TRUNC(xac_date)
)
JOIN v_rate
ON eff_currency = xac_currency
AND eff_date = xac_date
Despite being bulky as hell, the latter query is six times as fast.
The main idea here is replacing <= with =, which requires building an in-memory calendar table to join with.
Converting currencies

Here are a few from working with Oracle 8 & 9 (of course, sometimes doing the opposite might make the query simpler or faster):
Parentheses can be removed if they are not used to override operator precedence. A simple example is when all the boolean operators in your where clause are the same: where ((a or b) or c) is equivalent to where a or b or c.
A sub-query can often (if not always) be merged with the main query to simplify it. In my experience, this often improves performance considerably:
select foo.a,
bar.a
from foomatic foo,
bartastic bar
where foo.id = bar.id and
bar.id = (
select ban.id
from bantabulous ban
where ban.bandana = 42
)
;
is equivalent to
select foo.a,
bar.a
from foomatic foo,
bartastic bar,
bantabulous ban
where foo.id = bar.id and
bar.id = ban.id and
ban.bandana = 42
;
Using ANSI joins separates a lot of "code monkey" logic from the really interesting parts of the where clause: The previous query is equivalent to
select foo.a,
bar.a
from foomatic foo
join bartastic bar on bar.id = foo.id
join bantabulous ban on ban.id = bar.id
where ban.bandana = 42
;
If you want to check for the existence of a row, don't use count(*); instead, use either rownum = 1 or put the query in a where exists clause, so that only one row is fetched instead of all of them.
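For illustration, a minimal Oracle-style sketch (the orders table and customer_id column are hypothetical):
-- Avoid: counts every matching row just to test existence
SELECT COUNT(*) FROM orders WHERE customer_id = 42;
-- Better: stop after the first matching row
SELECT 1 FROM orders WHERE customer_id = 42 AND ROWNUM = 1;
-- Or push the test into an EXISTS clause
SELECT 'found' FROM dual
WHERE EXISTS (SELECT NULL FROM orders WHERE customer_id = 42);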

I suppose the obvious one is to look for any cursors that can be replaced with a set-based SQL operation.
Next on my list is to look for any correlated sub-queries that can be re-written as un-correlated queries (there is a sketch of this rewrite after these notes).
In long stored procedures, break separate SQL statements out into their own stored procedures. That way they will get their own cached query plan.
Look for transactions that can have their scope shortened. I regularly find statements inside a transaction that can safely be outside.
Sub-selects can often be re-written as straightforward joins (modern optimisers are good at spotting simple ones).
As @Quassnoi mentioned, the optimiser often does a good job. One way to help it is to ensure indexes and statistics are up to date, and that suitable indexes exist for your query workload.
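A minimal sketch of the correlated-to-uncorrelated rewrite mentioned above (the orders table and its columns are hypothetical):
-- Correlated: the subquery is evaluated once per outer row
SELECT o.order_id, o.amount
FROM orders o
WHERE o.amount > (SELECT AVG(i.amount)
                  FROM orders i
                  WHERE i.customer_id = o.customer_id);
-- Uncorrelated: aggregate once per customer, then join
SELECT o.order_id, o.amount
FROM orders o
JOIN (SELECT customer_id, AVG(amount) AS avg_amount
      FROM orders
      GROUP BY customer_id) a
  ON a.customer_id = o.customer_id
WHERE o.amount > a.avg_amount;
Whether the second form is actually faster depends on the optimiser and the available indexes, but it makes the aggregation explicit and easier to reason about.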

I like everyone on a team to follow a set of standards to make code readable, maintainable, understandable, washable, etc.. :)
everyone uses the same alias
no cursors. no loops
why even think of IN when you can EXISTS
INDENT
Consistency in coding style
There is some more stuff here: What are some of your most useful database standards?

I like to replace all sorts of subselects with join queries.
This one is obvious:
SELECT *
FROM mytable mo
WHERE EXISTS
(
SELECT *
FROM othertable o
WHERE o.othercol = mo.col
)
by
SELECT mo.*
FROM mytable mo inner join othertable o on o.othercol = mo.col
(if othertable can contain several rows matching mo.col, add a DISTINCT to keep the result identical to the EXISTS version)
And this one is underestimated:
SELECT *
FROM mytable mo
WHERE NOT EXISTS
(
SELECT *
FROM othertable o
WHERE o.othercol = mo.col
)
by
SELECT mo.*
FROM mytable mo left outer join othertable o on o.othercol = mo.col
WHERE o.othercol is null
It can help the DBMS choose a good execution plan for a big query.

Given the nature of SQL, you absolutely have to be aware of the performance implications of any refactoring. Refactoring SQL Applications is a good resource on refactoring with a heavy emphasis on performance (see Chapter 5).

Although simplification may not equal optimization, simplification can be important in writing readable SQL code, which is in turn critical to being able to check your SQL code for conceptual correctness (not syntactic correctness, which your development environment should check for you). It seems to me that in an ideal world, we would write the most simple, readable SQL code and then the optimizer would rewrite that SQL code to be in whatever form (perhaps more verbose) would run the fastest.
I have found that thinking of SQL statements as based on set logic is very useful, particularly if I need to combine where clauses or figure out a complex negation of a where clause. I use the laws of boolean algebra in this case.
The most important ones for simplifying a where clause are probably DeMorgan's Laws (note that "·" is "AND" and "+" is "OR"):
NOT (x · y) = NOT x + NOT y
NOT (x + y) = NOT x · NOT y
This translates in SQL to:
NOT (expr1 AND expr2) -> NOT expr1 OR NOT expr2
NOT (expr1 OR expr2) -> NOT expr1 AND NOT expr2
These laws can be very useful in simplifying where clauses with lots of nested AND and OR parts.
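For example, a small sketch with a hypothetical orders table:
-- Exclude shipped orders placed before 2009
SELECT *
FROM orders
WHERE NOT (status = 'SHIPPED' AND order_date < DATE '2009-01-01');
-- Equivalent form after applying De Morgan's law
SELECT *
FROM orders
WHERE status <> 'SHIPPED' OR order_date >= DATE '2009-01-01';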
It is also useful to remember that the statement field1 IN (value1, value2, ...) is equivalent to field1 = value1 OR field1 = value2 OR ... . This allows you to negate the IN () one of two ways:
NOT field1 IN (value1, value2) -- for longer lists
NOT field1 = value1 AND NOT field1 = value2 -- for shorter lists
A sub-query can be thought of this way also. For example, this negated where clause:
NOT (table1.field1 = value1 AND EXISTS (SELECT * FROM table2 WHERE table1.field1 = table2.field2))
can be rewritten as:
NOT table1.field1 = value1 OR NOT EXISTS (SELECT * FROM table2 WHERE table1.field1 = table2.field2)
These laws do not tell you how to transform a SQL query using a subquery into one using a join, but boolean logic can help you understand join types and what your query should be returning. For example, with tables A and B, an INNER JOIN is like A AND B, a LEFT OUTER JOIN is like (A AND B) OR (A AND NOT B), which simplifies to A, and a FULL OUTER JOIN is (A AND B) OR (A AND NOT B) OR (NOT A AND B), which simplifies to A OR B.

jOOQ supports pattern based transformation, which can be used in the online SQL parser and translator (look for the "patterns" dropdown), or as a parser CLI, or programmatically.
Since you're mainly looking for ways to turn your query into something simpler, not necessarily faster (which may depend on the target RDBMS), jOOQ could help you here.
Some examples include:
CASE to CASE abbreviation
-- Original
SELECT
CASE WHEN x IS NULL THEN y ELSE x END,
CASE WHEN x = y THEN NULL ELSE x END,
CASE WHEN x IS NOT NULL THEN y ELSE z END,
CASE WHEN x IS NULL THEN y ELSE z END,
CASE WHEN x = 1 THEN y WHEN x = 2 THEN z END
FROM tab;
-- Transformed
SELECT
NVL(x, y), -- If available in the target dialect, otherwise COALESCE
NULLIF(x, y),
NVL2(x, y, z), -- If available in the target dialect
NVL2(x, z, y), -- If available in the target dialect
CHOOSE(x, y, z) -- If available in the target dialect
FROM tab;
COUNT(*) scalar subquery comparison
-- Original
SELECT (SELECT COUNT(*) FROM tab) > 0;
-- Transformed
SELECT EXISTS (SELECT 1 FROM tab);
Flatten CASE
-- Original
SELECT
CASE
WHEN a = b THEN 1
ELSE CASE
WHEN c = d THEN 2
END
END
FROM tab;
-- Transformed
SELECT
CASE
WHEN a = b THEN 1
WHEN c = d THEN 2
END
FROM tab;
NOT AND (De Morgan's rules)
-- Original
SELECT
NOT (x = 1 AND y = 2),
NOT (x = 1 AND y = 2 AND z = 3)
FROM tab;
-- Transformed
SELECT
NOT (x = 1) OR NOT (y = 2),
NOT (x = 1) OR NOT (y = 2) OR NOT (z = 3)
FROM tab;
Unnecessary EXISTS subquery clauses
-- Original
SELECT EXISTS (SELECT DISTINCT a, b FROM t);
-- Transformed
SELECT EXISTS (SELECT 1 FROM t);
There's a lot more.
Disclaimer: I work for the company behind jOOQ

My approach is to learn relational theory in general and relational algebra in particular. Then learn to spot the constructs used in SQL to implement operators from the relational algebra (e.g. universal quantification a.k.a. division) and calculus (e.g. existential quantification). The gotcha is that SQL has features not found in the relational model e.g. nulls, which are probably best refactored away anyhow. Recommended reading: SQL and Relational Theory: How to Write Accurate SQL Code By C. J. Date.
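For instance, universal quantification (relational division) typically surfaces in SQL as a double NOT EXISTS; a sketch with hypothetical suppliers/parts/shipments tables:
-- Suppliers who supply ALL parts (relational division)
SELECT s.supplier_id
FROM suppliers s
WHERE NOT EXISTS (
    SELECT *
    FROM parts p
    WHERE NOT EXISTS (
        SELECT *
        FROM shipments sp
        WHERE sp.supplier_id = s.supplier_id
          AND sp.part_id = p.part_id
    )
);
Recognising that pattern as division makes the intent of the query much easier to see than reading the nested subqueries literally.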
In this vein, I'm not convinced "the fact that most SUBSELECTs can be rewritten as a JOIN" represents a simplification.
Take this query for example:
SELECT c
FROM T1
WHERE c NOT IN ( SELECT c FROM T2 );
Rewrite using JOIN
SELECT DISTINCT T1.c
FROM T1 NATURAL LEFT OUTER JOIN T2
WHERE T2.c IS NULL;
The join is more verbose!
Alternatively, recognize that the construct is implementing an antijoin on the projection of c, e.g. in pseudo-algebra
T1 { c } antijoin T2 { c }
Simplification using relational operators:
SELECT c FROM T1 EXCEPT SELECT c FROM T2;

Related

Impala Error : AnalysisException: Multiple subqueries are not supported in expression: (CASE *** WHEN THEN)

I have a WHERE clause in which I need to check whether values exist in a table, and I'm doing that with a subquery. The problem is that the check depends on one of two
values, 'FIX' and 'VAR'. Depending on which, we need to check a different table (subquery). To achieve that goal I'm using a CASE WHEN statement in the WHERE clause, as shown below:
select *
FROM T1
where
(upper(trim(ITAXAVAR)) = 'S'
and
(
upper(trim(CTIPAMOR)) not in ('A','U','F')
)
)
and
--problem starts here.....
(case ucase(trim(CTIPTXFX)) --Values 'FIX';'VAR';'PUR'
WHEN 'FIX'
THEN
(concat(trim(CPRZTXFX),trim(CTAXAREF)) not in
(select trim(A.tayd91c0_celemtab)
from cd_estruturais.tat91_tabelas A
where A.tayd91c0_ctabela = 'W03' and
--data_date_part = '${Data_ref}' and --sometimes we don't have a TAT91 update for the same data_ref as the tables
A.data_date_part = (select max(B.data_date_part)
from cd_estruturais.tat91_tabelas B
where A.tayd91c0_ctabela = B.tayd91c0_ctabela and
B.data_date_part > date_add(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())),-5)
)
and length(nvl(trim(A.tayd91c0_celemtab),'')) <> 0
)
)
WHEN 'VAR'
THEN
(concat(trim(CTAXAREF),trim(CPERRVTX)) not in
(select concat(trim(A.CTXREF),trim(A.CPERRVTX))
from land_estruturais.cat01_taxref A
where A.data_date_part > date_add(TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP())),-5)
and length(nvl(concat(trim(A.CTXREF),trim(A.CPERRVTX)),'')) <> 0
)
)
END
)
;
Below is a simplified view of the same query:
select *
FROM T1
where
(--first criteria
)
and
--problem starts here.....
(case ucase(trim(CTIPTXFX)) --Values 'FIX';'VAR';'PUR'
WHEN 'FIX'
THEN
(field1 not in
(subquery 1)
)
WHEN 'VAR'
THEN
(field1 not in
(subquery 2)
)
END
)
;
Can anyone tell me what I'm doing wrong, please?
It seems to me that Impala does not support subqueries inside a CASE WHEN statement.
Thank you.
Impala doesn't support subqueries in the select list.
So, you need to rewrite the SQL like below:
Use LEFT ANTI JOIN in place of NOT IN() to link the subqueries to T1.
To handle the CASE WHEN, use UNION ALL for the different conditions.
SELECT * FROM T1
LEFT ANTI JOIN subqry1 y ON T1.id = y.id
WHERE col='FIX'
UNION ALL
SELECT * FROM T1
LEFT ANTI JOIN subqry2 y ON T1.id = y.id
WHERE col='VAR'
I tried to change the simple SQL you posted above. The main SQL is too complex and needs table setup and data to prove the logic.
Here is my version of your simple SQL -
select * FROM T1
LEFT ANTI JOIN subquery1 ON subquery1.column = T1.field1
where (/* first criteria */)
and ucase(trim(CTIPTXFX))='FIX'
UNION ALL
select * FROM T1
LEFT ANTI JOIN subquery2 ON subquery2.column = T1.field1
where (/* first criteria */)
and ucase(trim(CTIPTXFX))='VAR'
Please note that anti joins and UNION ALL can be expensive, so if your table size is huge, please tune them accordingly.

Sql query is slow(exists/joins)

I've got a SQL query to update records. I use the EXISTS function, but it's very slow. Now I want to rewrite my query using joins.
UPDATE zp
SET ZP.TEST1=NULL,
ZP.TEST2=NULL,
ZP.TEST3=NULL,
ZP.TEST4=NULL,
ZP.TEST5=NULL,
ZP.TEST6=NULL,
ZP.TEST7=NULL,
ZP.TEST8=NULL,
ZP.TEST9=NULL,
ZP.TEST10=NULL,
ZP.TEST11=NULL,
ZP.TEST12=NULL,
ZP.TEST13=NULL,
ZP.TEST14=NULL,
ZP.TEST15=NULL
from TestTable ZP
WHERE NOT(
(ZP.name='I'
AND
surname='S'
OR
addr='S'
AND
ClientID IS NOT NULL)
AND EXISTS(
SELECT * FROM table2 P
WHERE P.OrgID=ZP.OrgID AND
P.CATEGORY='D'
)
)
A slight efficiency improvement would be....
EXISTS(
SELECT 1 FROM table2 P
WHERE P.OrgID=ZP.OrgID AND
P.CATEGORY='D'
)
Namely 1 rather than *.
But I'm sure that's not the complete solution.
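One way to take it further (a sketch only, in T-SQL style; whether it is actually faster depends on your indexes and the plan) is to apply De Morgan's law, NOT (A AND E) = NOT A OR NOT E, and turn the EXISTS into a LEFT JOIN against a de-duplicated derived table:
UPDATE ZP
SET    ZP.TEST1 = NULL,
       ZP.TEST2 = NULL   -- ... and the remaining TEST columns as in the original
FROM   TestTable ZP
LEFT JOIN (SELECT DISTINCT OrgID
           FROM table2
           WHERE CATEGORY = 'D') P   -- DISTINCT keeps the join from multiplying rows
       ON P.OrgID = ZP.OrgID
WHERE  NOT (ZP.name = 'I' AND surname = 'S' OR addr = 'S' AND ClientID IS NOT NULL)
   OR  P.OrgID IS NULL;              -- P.OrgID IS NULL plays the role of NOT EXISTS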

SQL query: Iterate over values in table and use them in subquery

I have a simple SQL table containing some values, for example:
id | value (table 'values')
----------
0 | 4
1 | 7
2 | 9
I want to iterate over these values, and use them in a query like so:
SELECT value[0], x1
FROM (some subquery where value[0] is used)
UNION
SELECT value[1], x2
FROM (some subquery where value[1] is used)
...
etc
In order to get a result set like this:
4 | x1
7 | x2
9 | x3
It has to be in SQL as it will actually represent a database view. Of course the real query is a lot more complicated, but I tried to simplify the question while keeping the essence as much as possible.
I think I have to select from values and join the subquery, but as the value should be used in the subquery I'm lost on how to accomplish this.
Edit: I oversimplified my question; in reality I want to have 2 rows from the subquery and not only one.
Edit 2: As suggested, I'm posting the real query. I simplified it a bit to make it clearer, but it's a working query and the problem is there. Note that I have hardcoded the value '2' in this query in two places. I want to replace that with values from a different table; with the example table above, I would want the combined results of this query with 4, 7 and 9 as values instead of the currently hardcoded 2.
SELECT x.fantasycoach_id, SUM(round_points)
FROM (
SELECT DISTINCT fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
LEFT JOIN fantasyworld_fantasyformation AS ff ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
AND _rr.sequence <= 2 /* HARDCODED USE OF VALUE */
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
AND fpc.round_sequence = 2 /* HARDCODED USE OF VALUE */
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Edit 3: I'm using PostgreSQL.
SQL works with tables as a whole, which basically involves set operations. There is no explicit iteration, and generally no need for any. In particular, the most straightforward implementation of what you described would be this:
SELECT value, (some subquery where value is used) AS x
FROM values
Do note, however, that a correlated subquery such as that is very hard on query performance. Depending on the details of what you're trying to do, it may well be possible to structure it around a simple join, an uncorrelated subquery, or a similar, better-performing alternative.
Update:
In view of the update to the question indicating that the subquery is expected to yield multiple rows for each value in table values, contrary to the example results, it seems a better approach would be to just rewrite the subquery as the main query. If it does not already do so (and maybe even if it does) then it would join table values as another base table.
Update 2:
Given the real query now presented, this is how the values from table values could be incorporated into it:
SELECT x.fantasycoach_id, SUM(round_points) FROM
(
SELECT DISTINCT
fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
-- one row for each combination of coach and value:
CROSS JOIN values
LEFT JOIN fantasyworld_fantasyformation AS ff
ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr
ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff
ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
-- use the value obtained from values:
AND _rr.sequence <= values.value
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
-- use the value obtained from values again:
AND fpc.round_sequence = values.value
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Note in particular the CROSS JOIN which forms the cross product of two tables; this is the same thing as an INNER JOIN without any join predicate, and it can be written that way if desired.
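As a tiny illustration with two hypothetical tables a and b:
-- These two queries produce the same cross product:
SELECT a.x, b.y FROM a CROSS JOIN b;
SELECT a.x, b.y FROM a JOIN b ON TRUE;  -- an inner join whose predicate never filters anything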
The overall query could be at least a bit simplified, but I do not do so because it is a working example rather than an actual production query, so it is unclear what other changes would translate to the actual application.
In the example I create two tables. See how the outer table has an alias that you use in the inner select?
SQL Fiddle Demo
SELECT T.[value], (SELECT [property] FROM Table2 P WHERE P.[value] = T.[value])
FROM Table1 T
This is a better way for performance
SELECT T.[value], P.[property]
FROM Table1 T
INNER JOIN Table2 p
on P.[value] = T.[value];
Table 2 can be a QUERY instead of a real table
Third Option
Using a cte to calculate your values and then join back to the main table. This way you have the subquery logic separated from your final query.
WITH cte AS (
SELECT
T.[value],
T.[value] * T.[value] as property
FROM Table1 T
)
SELECT T.[value], C.[property]
FROM Table1 T
INNER JOIN cte C
on T.[value] = C.[value];
It might be helpful to extract the computation to a function that is called in the SELECT clause and is executed for each row of the result set
Here's the documentation for CREATE FUNCTION for SQL Server. It's probably similar to whatever database system you're using, and if not you can easily Google for it.
Here's an example of creating a function and using it in a query:
CREATE FUNCTION dbo.DoComputation(@parameter1 int)
RETURNS int
AS
BEGIN
-- Do some calculations here and return the function result.
-- This example returns the value of @parameter1 squared.
-- You can add additional parameters to the function definition if needed
DECLARE @Result int
SET @Result = @parameter1 * @parameter1
RETURN @Result
END
Here is an example of using the example function above in a query.
SELECT v.value, dbo.DoComputation(v.value) as ComputedValue
FROM [Values] v
ORDER BY value

Updating A Column Based on Another Column

I'm wondering if it's possible to do something like the following:
UPDATE SomeTable st
SET MyColumn1 = (SELECT SomeValue
FROM WhereverTable),
MyColumn2 = ((SELECT AnotherValue
FROM AnotherTable)
*
MyColumn1)
WHERE MyColumn4 = 'condition'
I'm thinking that when I multiply AnotherValue with MyColumn1, it will still have the old value of MyColumn1 rather than the new one which is supposed to be SomeValue.
I'm using DB2 if that matters.
You can count on the multiplication expression using the original value of MyColumn1, not the value assigned in the UPDATE. If you want to use the new value of MyColumn1 in the multiplication formula, then specify the new expression there, too. Also, you should place a MIN, MAX, or FETCH FIRST ROW ONLY in the subqueries to prevent multiple rows from being returned.
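For example, a minimal sketch using MAX and MIN to force each subquery down to a single value (untested, using the table names from the question):
UPDATE SomeTable
SET MyColumn1 = (SELECT MAX(SomeValue) FROM WhereverTable),
    MyColumn2 = MyColumn1 * (SELECT MIN(AnotherValue) FROM AnotherTable)
WHERE MyColumn4 = 'condition';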
Without more specifics than that, it is hard to give you a solid answer. But I'll take a stab at it:
UPDATE st
SET MyColumn1 = wt.SomeValue,
MyColumn2 = at.AnotherValue
FROM SomeTable st
CROSS JOIN (
SELECT SomeValue FROM WhereverTable
) wt
CROSS JOIN (
SELECT AnotherValue FROM AnotherTable
) at
WHERE MyColumn4 = 'condition'
If they truly aren't related, then CROSS JOIN Is what you want. But beware that the subqueries (in this case, wt and at) that are cross-joined need to have only 1 record in them, or the JOINs will cause more than one record to be generated in the FROM clause. Not sure what it would do to this query, but it would probably make the resultset non-updateable.
Note that I'm using SQL Server's T-SQL syntax, as that is what I'm more familiar with. But a quick google found that DB2 does support cross joins (see here).
Try this (untested):
UPDATE SomeTable
SET MyColumn1 = (SELECT SomeValue
FROM WhereverTable),
MyColumn2 = MyColumn1 * (SELECT AnotherValue
FROM AnotherTable)
WHERE MyColumn4 = 'condition';
This Standard SQL-92 syntax requires scalar subqueries, i.e. both WhereverTable and AnotherTable must each consist of zero or one row. It is more often the case that the rows need to be 'correlated' using identifiers (or conditions or similar) in the subqueries, doing so in both the SET clause and the WHERE clause of the UPDATE statement, e.g. (SQL-92, untested):
UPDATE SomeTable
SET MyColumn1 = (
SELECT wt.SomeValue
FROM WhereverTable AS wt
WHERE wt.some_table_ID = SomeTable.some_table_ID
),
MyColumn2 = MyColumn1 * (
SELECT av.AnotherValue
FROM AnotherTable AS av
WHERE av.another_table_ID = SomeTable.another_table_ID
)
WHERE MyColumn4 = 'condition'
AND EXISTS (
SELECT *
FROM WhereverTable AS wt
WHERE wt.some_table_ID = SomeTable.some_table_ID
)
OR EXISTS (
SELECT *
FROM AnotherTable AS av
WHERE av.another_table_ID = SomeTable.another_table_ID
);
This can be rewritten using the SQL:2003 MERGE statement, giving less 'repetitive' code.
Assuming that the tables always return one (and only one) row, the following should work just fine:
UPDATE SomeTable st SET (MyColumn1, MyColumn2) = (SELECT SomeValue,
AnotherValue * MyColumn1
FROM WhereverTable wt
CROSS JOIN AnotherTable at)
WHERE MyColumn4 = 'condition'
This will update MyColumn2 as desired (using the old value of MyColumn1).
Obviously if there are more/optional rows things get more complicated.

What is the WITH statement doing in this example? I'm trying to randomly generate data

INSERT INTO files (fileUID, filename)
WITH fileUIDS(fileUID) AS
( VALUES(1) UNION ALL
SELECT fileUID+1 FROM fileUIDS WHERE fileUID < 1000 )
SELECT fileUID,
TRANSLATE ( CHAR(BIGINT(RAND() * 10000000000 )), 'abcdefgHij', '1234567890' )
FROM fileUIDS;
The WITH syntax works much like a local temp table or inline view. It's supported in SQL Server (2005+, where it's called Common Table Expressions), Oracle (9i+, where it's called Subquery Factoring), DB2, and PostgreSQL (8.4+), among others. The intended use is for creating a basic view that is used (i.e. joined to) multiple times in a single query.
Here's a typical example:
WITH example AS (
SELECT q.question_id,
t.tag_name
FROM QUESTIONS q
JOIN QUESTION_TAG_XREF qtf ON qtf.question_id = q.question_id
JOIN TAGS t ON t.tag_id = qtf.tag_id)
SELECT t.title,
e1.tag_name
FROM QUESTIONS t
JOIN example e1 ON e1.question_id = t.question_id
...which will return identical results if you use:
SELECT t.title,
e1.tag_name
FROM QUESTIONS t
JOIN (SELECT q.question_id,
t.tag_name
FROM QUESTIONS q
JOIN QUESTION_TAG_XREF qtf ON qtf.question_id = q.question_id
JOIN TAGS t ON t.tag_id = qtf.tag_id) e1 ON e1.question_id = t.question_id
The example you provided:
WITH fileUIDS(fileUID) AS (
VALUES(1)
UNION ALL
SELECT t.fileUID+1
FROM fileUIDS t
WHERE t.fileUID < 1000 )
INSERT INTO files
(fileUID, filename)
SELECT f.fileUID,
TRANSLATE ( CHAR(BIGINT(RAND() * 10000000000 )), 'abcdefgHij', '1234567890' )
FROM fileUIDS f;
...is a recursive one. It starts at 1 and generates fileUIDs 1 through 1000, i.e. 1,000 rows in total.
WITH x AS (...)
This will take the output of the ... and treat it as a table named x, temporarily.
WITH x AS (...)
SELECT * FROM x
This statement will essentially give you the exact same thing as the ... outputs but it will instead be referenced as the table x
The WITH word is used to create a Common Table Expression (CTE). In this case, it's creating an inline table that the "select fileUID, ..." part is pulling data from.
It is creating a CTE (Common Table Expression). This is basically a table that you don't have to create, drop, or declare in any way. It will be automatically deleted after the batch has run.
Check out https://web.archive.org/web/20210927200924/http://www.4guysfromrolla.com/webtech/071906-1.shtml for more info.