SQL UPDATE order of evaluation - sql

What is the order of evaluation in the following query:
UPDATE tbl SET q = q + 1, p = q;
That is, will "tbl"."p" be set to q or q + 1? Is order of evaluation here governed by SQL standard?
Thanks.
UPDATE
After considering Migs' answer, I ran some tests on all DBs I could find. While I don't know what the standard says, implementations vary.
Given
CREATE TABLE tbl (p INT NOT NULL, q INT NOT NULL);
INSERT INTO tbl VALUES (1, 5); -- p := 1, q := 5
UPDATE tbl SET q = q + 1, p = q;
I found the values of "p" and "q" were:
database p q
-----------------+---+---
Firebird 2.1.3 | 6 | 6 -- But see "Update 2" below
InterBase 2009 | 5 | 6
MySQL 5.0.77 | 6 | 6 -- See "Update 3" below
Oracle XE (10g) | 5 | 6
PostgreSQL 8.4.2 | 5 | 6
SQLite 3.3.6 | 5 | 6
SQL Server 2016 | 5 | 6
UPDATE 2
Firebird 2.5 changes its behavior to match the majority of other SQL engines I tested, leaving MySQL alone. The relevant Release Notes entry, "Logic Change in SET Clause", strongly suggests that the majority behavior is correct per SQL specifications.
I've bugged MySQL to comment on this behavior (bug no. 52861), as they seem to be the outlier.
UPDATE 3
The aforementioned bug is today (2010-05-19) closed, and the documentation set to be updated to make this behavior explicit both in the UPDATE description and in the Differences from Standard SQL section.
Bravo, MySQL.

MySQL does "left to right" evaluation and does "see" the new values.
(Tested on 5.0.45-community-nt-log MySQL Community Edition)
Furthermore, from the MySQL manual: "Single-table UPDATE assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order."
Now, "generally" is quite vague and "no guarantee" is very bad given that the order of evaluation is important.
So, in order to answer the question: IS the behaviour specified by "the SQL standard" or is it just a convention?
UPDATE: Got hold of the SQL92 specs which state at "13.10 update statement: searched" item "6) The (value expression)s are effectively evaluated for each row of T before updating any row of T."
IMHO not absolutely unambiguous, but enough to consider that the STANDARD is NOT to "see" the results of your own update.
Considering your example, the way Oracle, PostgreSQL and Interbase do it.

The UPDATE does not see the results of its work.
p will be set to q as of before update.
The following code will just swap the columns:
DECLARE #test TABLE (p INT, q INT)
INSERT
INTO #test
VALUES (2, 3)
SELECT *
FROM #test
p q
--- ---
2 3
UPDATE #test
SET p = q,
q = p
SELECT *
FROM #test
p q
--- ---
3 2

The write to the table has to occur after transaction that was well under way when the read was completed.

Related

Grouping sets columns in aggregate arguments and NULL replacement

There are many grouping sets examples on the internet like query Q1 in the example below. But query Q2 is different because A2 is a grouping column and it is used as the argument to SUM().
Which one of the following is correct for Q2 according to the SQL Standard (any version since 2003 that supports grouping sets)? If (1) is correct, please explain why with reference to the Standard.
A2 is replaced by NULL unless it is in an argument to an aggregate. This interpretation would give results R1 below. This is Oracle's behaviour (which seems more useful).
A2 is replaced by NULL including where it is used in an aggregate: this means that the aggregate will return NULL.
This interpretation would give results R2 below. This is how I have understood the SQL Standard (possibly incorrectly).
Example code:
-- Setup
create table A (A1 int, A2 int, A3 int);
insert into A values (1, 1, 100);
insert into A values (1, 2, 40);
insert into A values (2, 1, 70);
insert into A values (5, 1, 90);
-- Query Q1
-- Expected/Observed results:
--
-- A1 A2 SUM(A3)
-- ---------- ---------- ----------
-- 1 - 140
-- 2 - 70
-- 5 - 90
-- - 1 260
-- - 2 40
-- - - 300
select A1, A2, sum (A3)
from A
group by grouping sets ((A1), (A2), ())
order by 1, 2;
-- Query Q2
-- Results R1 (Oracle):
-- A1 A2 SUM(A2)
-- ---------- ---------- ----------
-- 1 - 3
-- 2 - 1
-- 5 - 1
-- - 1 3
-- - 2 2
-- - - 5
--
-- Results R2 (SQL Standard?):
-- A1 A2 SUM(A2)
-- ---------- ---------- ----------
-- 1 - -
-- 2 - -
-- 5 - -
-- - 1 3
-- - 2 2
-- - - - -- NULL row
select A1, A2, sum (A2)
from A
group by grouping sets ((A1), (A2), ())
order by 1, 2;
I am aware of this from SQL 2003 7.9 Syntax 17, which describes how columns are replaced with NULLs. However, I might have missed or misunderstood a rule elsewhere that excludes arguments to aggregates.
m) For each GS_i:
iii) Case:
1) If GS_i is an <ordinary grouping set>, then
A) Transform SL2 to obtain SL3, and transform HC to obtain
HC3, as follows:
II) Replace each <column reference> in SL2 and HC that
references PC_k by "CAST(NULL AS DTPCk)"
As with many difficult SQL features, it can help to look at earlier versions of the standard where the phrasing might be simpler. And it turns out that grouping sets were introduced in SQL 1999 and were then revised in SQL 2003.
SQL 1999
Syntax Rule 4 states:
Let SING be the <select list> constructed by removing from SL every <select
sublist> that is not a <derived column> that contains at least one <set
function specification>.
Then Syntax Rule 11 defines PC_k as the column references contained in the group by list. It constructs a derived table projecting the union of GSQQL_i, which are query specifications projecting the PC_k or NULL as appropriate, the PCBIT_i grouping function indicators and SING.
Thus any that contains a set function will not have its argument replaced, and its columns won't be replaced either. So answer (1) is correct.
However, in the following query the GSQQL_i corresponding to the <grand total> doesn't group by C1 so I think it will give an error rather than replacing C1 with NULL for that grouping set.
select C1 + MAX(C2) from T group by grouping sets ((C1), ());
SQL 2003 - 2011
I still don't have a definitive answer for this. It hinges on what they meant (or forgot to specify?) by "references" in the replacement rule. It would be clearer if it said one of "immediately contained", "simply contained" or "directly contained", as defined in ISO 9075-1 (SQL Part 1: Framework).
The note (number 134 in SQL 2003) at the start of General Rules says "As a
result of the syntactic transformations specified in the Syntax Rules of this
Sub-clause, only primitive <group by clause>s are left to consider." So the
aggregate argument either has or has not actually been replaced: we aren't
expected to evaluate aggregates in a special way (whereas if General Rule 3 were in effect applied before the NULL substitution of Syntax Rule 17 then answer (1) would be correct).
I found a draft of Technical Corrigendum 5 [pdf], which is a "diff" towards SQL 2003. This includes the relevant changes to on pages 80-87. Unfortunately the bulk of the change has only the brief rationale "Provide a correct, unified treatment of CUBE and ROLLUP". General Rule 3, quoted above, has the rationale "clarify the semantics of column references".

SQL UPDATE: previous updated columns value in further columns

Suppose I have this query:
UPDATE TEST SET
a = a + 23,
b = (b+5)/a,
c = c + a + b
WHERE
d = 6
OR
d = 10
and the original values of the columns is
a = 0
b = 5
c = 10
Will the query crash because of the 0 value of a (so (b+5)/a won't calculate) or will a already have value = 23.
The general question is: in an UPDATE statement, the values used to update the further columns are the original values of the already updated columns or the updated values?
Are there difference between the database used? MySQL, SQL Server, Oracle, DB2, ...?
EDIT
What would be the best practise to have the updated values on the "right side"?
Subqueries will work?
How to solve the problem of multiple values updated? Subqueries should return only 1 value
UPDATE TEST SET
a = a + 23,
b = (b+5)/(SELECT a FROM TEST WHERE d = 6),
c = c + a + b
WHERE
d = 6
OR
d = 10
In all SQL databases I know of except MySQL, all updates are done exactly simultaneously, which means a will be zero in the update of b.
In MySQL, the updates are done in order, which means a will be 23.
A very simple PostgreSQL example and a MySQL example with a different result.
In all ANSI compliant database systems, transactions are atomic and isolated, which means they happen all at once. Your query will crash for the given values.

How to label a big set of “transitive groups” with a constraint?

EDIT after #NealB solution: the #NealB's solution is very very fast comparated with any another one, and dispenses this new question about "add a constraint to improve performance". The #NealB's not need any improve, have O(n) time and is very simple.
The problem of "label transitive groups with SQL" have an elegant solution using recursion and CTE... But this solution consumes an exponential time (!). I need to work with 10000 itens: with 1000 itens need 1 second, with 2000 need 1 day...
Constraint: in my case is possible to break the problem into pieces of ~100 itens or less, but only to select one group of ~10 itens, and discard all the other ~90 labeled itens...
There are a generic algotithm to add and use this kind of "pre-selection", to reduce the quadratic, O(N^2), time? Perhaps, as showed by comments and #wildplasser, a O(N log(N)) time; but I expect, with "pre-selection" to reduce to O(N) time.
(EDIT)
I try to use alternative algorithm, but it need some improvement to use as solution here; or, to really increase performance (to O(N) time), need to use "pre-selection".
The "pre-selection" (constraint) is based on a "super-set grouping"... Stating by the original "How to label 'transitive groups' with SQL?" question t1 table,
table T1
(original T1 augmented by "super-set grouping label" ssg, and more one row)
ID1 | ID2 | ssg
1 | 2 | 1
1 | 5 | 1
4 | 7 | 1
7 | 8 | 1
9 | 1 | 1
10 | 11 | 2
So there are three groups,
g1: {1,2,5,9} because "1 t 2", "1 t 5" and "9 t 1"
g2: {4,7,8} because "4 t 7" and "7 t 8"
g3: {10,11} because "10 t 11"
The super-group is only a auxiliary grouping,
ssg1: {g1,g2}
ssg2: {g3}
If we have M super-group-items and N total T1 items, the average group length will be less tham N/M. We can suppose (for my typical problem) also that ssg maximum length is ~N/M.
So, the "label algorithm" need to run only M times with ~N/M items if it use the ssg constraint.
An SQL only soulution appears to be a bit of a problem here. With the help of some procedural
programming on top of SQL the solution appears to be failry simple and efficient. Here is a brief outline
of a solution as could be implemented using any procedural language invoking SQL.
Declare table R with primary key ID where ID corresponds the same domain as ID1 and ID2 of table T1.
Table R contains one other non-key column, a Label number
Populate table R with the range of values found in T1. Set Label to zero (no label).
Using your example data, the initial setup for R would look like:
Table R
ID Label
== =====
1 0
2 0
4 0
5 0
7 0
8 0
9 0
Using a host language cursor plus an auxiliary counter, read each row from T1. Lookup ID1 and ID2 in R. You will find one of
four cases:
Case 1: ID1.Label == 0 and ID2.Label == 0
In this case neither one of these IDs have been "seen" before: Add 1 to the counter and then update both
rows of R to the value of the counter: update R set R.Label = :counter where R.ID in (:ID1, :ID2)
Case 2: ID1.Label == 0 and ID2.Label <> 0
In this case, ID1 is new but ID2 has already been assigned a label. ID1 needs to be assigned to the
same label as ID2: update R set R.Lablel = :ID2.Label where R.ID = :ID1
Case 3: ID1.Label <> 0 and ID2.Label == 0
In this case, ID2 is new but ID1 has already been assigned a label. ID2 needs to be assigned to the
same label as ID1: update R set R.Lablel = :ID1.Label where R.ID = :ID2
Case 4: ID1.Label <> 0 and ID2.Label <> 0
In this case, the row contains redundant information. Both rows of R should contain the same Label value. If not,
there is some sort of data integrity problem. Ahhhh... not quite see edit...
EDIT I just realized that there are situations where both Label values here could be non-zero and different. If both are non-zero and different then two Label groups need to be merged at this point. All you need to do is choose one Label and update the others to match with something like: update R set R.Label to ID1.Label where R.Label = ID2.Label. Now both groups have been merged with the same Label value.
Upon completion of the cursor, table R will contain Label values needed to update T2.
Table R
ID Label
== =====
1 1
2 1
4 2
5 1
7 2
8 2
9 1
Process table T2
using something along the lines of: set T2.Label to R.Label where T2.ID1 = R.ID. The end result should be:
table T2
ID1 | ID2 | LABEL
1 | 2 | 1
1 | 5 | 1
4 | 7 | 2
7 | 8 | 2
9 | 1 | 1
This process is puerly iterative and should scale to fairly large tables without difficulty.
I suggest you check this and use some
general-purpose language for solving it.
http://en.wikipedia.org/wiki/Disjoint-set_data_structure
Traverse the graph, maybe run DFS or BFS from each node,
then use this disjoint set hint. I think this should work.
The #NealB solution is the faster(!) See an example of PostgreSQL implementation here.
Below an example of another "brute force algorithm", only for curiosity!
As #peter.petrov and #RBarryYoung suggested, some performance problems can be avoided abandoning the CTE recursion... I do some issues at the basic labeler, and, abover I add the constraint for grouping by a super-set label. This new transgroup1_loop() function is working!
PS: this solution still have performance limitations, please post your answer with better, or with some adaptation of this one.
-- DROP table transgroup1;
CREATE TABLE transgroup1 (
id serial NOT NULL PRIMARY KEY,
items integer[], -- two or more items in the transitive relationship
ssg_label varchar(12), -- the super-set gropuping label
dels integer[] DEFAULT array[]::integer[]
);
INSERT INTO transgroup1(items,ssg_label) values
(array[1, 2],'1'),
(array[1, 5],'1'),
(array[4, 7],'1'),
(array[7, 8],'1'),
(array[9, 1],'1'),
(array[10, 11],'2');
-- or SELECT array[id1, id2],ssg_label FROM t1, with 10000 items
them, with these two functions we can solve the problem,
CREATE FUNCTION transgroup1_loop(p_ssg varchar, p_max_i integer DEFAULT 100)
RETURNS integer AS $funcBody$
DECLARE
cp_dels integer[];
i integer;
BEGIN
i:=1;
LOOP
UPDATE transgroup1
SET items = array_uunion(transgroup1.items,t2.items),
dels = transgroup1.dels || t2.id
FROM transgroup1 AS t1, transgroup1 AS t2
WHERE transgroup1.id=t1.id AND t1.ssg_label=$1 AND
t1.id>t2.id AND t1.items && t2.items;
cp_dels := array(
SELECT DISTINCT unnest(dels) FROM transgroup1
); -- ensures all itens to del
RAISE NOTICE '-- bug, repeting dels, item-%; % dels! %', i, array_length(cp_dels,1), array_to_string(cp_dels,';','*');
EXIT WHEN i>p_max_i OR array_length(cp_dels,1)=0;
DELETE FROM transgroup1
WHERE ssg_label=$1 AND id IN (SELECT unnest(cp_dels));
UPDATE transgroup1 SET dels=array[]::integer[];
i:=i+1;
END LOOP;
UPDATE transgroup1 -- only to beautify
SET items = ARRAY(SELECT unnest(items) ORDER BY 1 desc);
RETURN i;
END;
$funcBody$ LANGUAGE plpgsql VOLATILE;
to run and see results, you can use
SELECT transgroup1_loop('1'); -- run with ssg-1 items only
SELECT transgroup1_loop('2'); -- run with ssg-2 items only
-- show all with a sequential group label:
SELECT *, dense_rank() over (ORDER BY id) AS group_label from transgroup1;
results:
id | items | ssg_label | dels | group_label
----+-----------+-----------+------+-------------
4 | {8,7,4} | 1 | {} | 1
5 | {9,5,2,1} | 1 | {} | 2
6 | {11,10} | 2 | {} | 3
PS: the function array_uunion() is the same as original,
CREATE FUNCTION array_uunion(anyarray,anyarray) RETURNS anyarray AS $$
-- ensures distinct items of a concatemation
SELECT ARRAY(SELECT unnest($1) UNION SELECT unnest($2))
$$ LANGUAGE sql immutable;

Aggregation over order-dependent partition?

I have a source data set like this (simplified to be more clear):
Key F1 F2
1 X 4
2 X 5
3 Y 6
4 X 9
5 X 7
6 X 8
7 Y 9
8 X 6
9 X 5
10 Y 3
The data is sorted by the Key field. Now, I want to compute an aggregate of the F2 field over partitions that are defined by the F1 field: A partition starts at the first X value and ends with the first subsequent Y value.
So, for example, I might want wo compute the MIN() over the partitions defined as described above. Then the result set would look like this:
rownum MIN(F2)
1 4
2 7
3 3
I have tried a number of resources (incl. our own intranet community and of course stackoverflow) but found nothing for my case. Usually partitioning only works with a field that can be used to identify the partitions. Here, the partitions are defined by a change in a field's content with respect to a given order.
Although I am aware that I may have to resort to writing a procedural solution I would prefer to solve this in pure SQL.
Any ideas how such a partitioning could be achieved with a SQL select statement?
Thanks and regards
Kai.
A little bit shorter solution: http://sqlfiddle.com/#!12/7390d/24
Query:
select min(f2)
from t t1
group by (select max(key)
from t t2
where t2.f1='Y' and
t1.key > t2.key)
Result:
| MIN |
-------
| 4 |
| 7 |
| 3 |
The idea is to find the key of preceding 'Y' for each row and group by it. Should work with any SQL engine.
You didn't specify engine or dialect or version so I assumed SQL Server 2012.
Example that you can run to see the solution: http://sqlfiddle.com/#!6/f5d38/21
You solve it by creating correct partitions in your set. Code looks like this.
WITH groupLimits as
(
SELECT
[Key] AS groupend
,COALESCE(LAG([Key]) OVER (order by [Key]),0)+1 AS groupstart
FROM sourceData
WHERE F1 = 'Y'
)
SELECT
MIN(sourceData.F2)
FROM groupLimits
INNER JOIN sourceData
ON sourceData.[Key] BETWEEN groupLimits.groupstart and groupLimits.groupend
GROUP BY groupLimits.groupstart
ORDER BY groupLimits.groupstart

SQL Recursive Tables

I have the following tables, the groups table which contains hierarchically ordered groups and group_member which stores which groups a user belongs to.
groups
---------
id
parent_id
name
group_member
---------
id
group_id
user_id
ID PARENT_ID NAME
---------------------------
1 NULL Cerebra
2 1 CATS
3 2 CATS 2.0
4 1 Cerepedia
5 4 Cerepedia 2.0
6 1 CMS
ID GROUP_ID USER_ID
---------------------------
1 1 3
2 1 4
3 1 5
4 2 7
5 2 6
6 4 6
7 5 12
8 4 9
9 1 10
I want to retrieve the visible groups for a given user. That it is to say groups a user belongs to and children of these groups. For example, with the above data:
USER VISIBLE_GROUPS
9 4, 5
3 1,2,4,5,6
12 5
I am getting these values using recursion and several database queries. But I would like to know if it is possible to do this with a single SQL query to improve my app performance. I am using MySQL.
Two things come to mind:
1 - You can repeatedly outer-join the table to itself to recursively walk up your tree, as in:
SELECT *
FROM
MY_GROUPS MG1
,MY_GROUPS MG2
,MY_GROUPS MG3
,MY_GROUPS MG4
,MY_GROUPS MG5
,MY_GROUP_MEMBERS MGM
WHERE MG1.PARENT_ID = MG2.UNIQID (+)
AND MG1.UNIQID = MGM.GROUP_ID (+)
AND MG2.PARENT_ID = MG3.UNIQID (+)
AND MG3.PARENT_ID = MG4.UNIQID (+)
AND MG4.PARENT_ID = MG5.UNIQID (+)
AND MGM.USER_ID = 9
That's gonna give you results like this:
UNIQID PARENT_ID NAME UNIQID_1 PARENT_ID_1 NAME_1 UNIQID_2 PARENT_ID_2 NAME_2 UNIQID_3 PARENT_ID_3 NAME_3 UNIQID_4 PARENT_ID_4 NAME_4 UNIQID_5 GROUP_ID USER_ID
4 2 Cerepedia 2 1 CATS 1 null Cerebra null null null null null null 8 4 9
The limit here is that you must add a new join for each "level" you want to walk up the tree. If your tree has less than, say, 20 levels, then you could probably get away with it by creating a view that showed 20 levels from every user.
2 - The only other approach that I know of is to create a recursive database function, and call that from code. You'll still have some lookup overhead that way (i.e., your # of queries will still be equal to the # of levels you are walking on the tree), but overall it should be faster since it's all taking place within the database.
I'm not sure about MySql, but in Oracle, such a function would be similar to this one (you'll have to change the table and field names; I'm just copying something I did in the past):
CREATE OR REPLACE FUNCTION GoUpLevel(WO_ID INTEGER, UPLEVEL INTEGER) RETURN INTEGER
IS
BEGIN
DECLARE
iResult INTEGER;
iParent INTEGER;
BEGIN
IF UPLEVEL <= 0 THEN
iResult := WO_ID;
ELSE
SELECT PARENT_ID
INTO iParent
FROM WOTREE
WHERE ID = WO_ID;
iResult := GoUpLevel(iParent,UPLEVEL-1); --recursive
END;
RETURN iResult;
EXCEPTION WHEN NO_DATA_FOUND THEN
RETURN NULL;
END;
END GoUpLevel;
/
Joe Cleko's books "SQL for Smarties" and "Trees and Hierarchies in SQL for Smarties" describe methods that avoid recursion entirely, by using nested sets. That complicates the updating, but makes other queries (that would normally need recursion) comparatively straightforward. There are some examples in this article written by Joe back in 1996.
I don't think that this can be accomplished without using recursion. You can accomplish it with with a single stored procedure using mySQL, but recursion is not allowed in stored procedures by default. This article has information about how to enable recursion. I'm not certain about how much impact this would have on performance verses the multiple query approach. mySQL may do some optimization of stored procedures, but otherwise I would expect the performance to be similar.
Didn't know if you had a Users table, so I get the list via the User_ID's stored in the Group_Member table...
SELECT GroupUsers.User_ID,
(
SELECT
STUFF((SELECT ',' +
Cast(Group_ID As Varchar(10))
FROM Group_Member Member (nolock)
WHERE Member.User_ID=GroupUsers.User_ID
FOR XML PATH('')),1,1,'')
) As Groups
FROM (SELECT User_ID FROM Group_Member GROUP BY User_ID) GroupUsers
That returns:
User_ID Groups
3 1
4 1
5 1
6 2,4
7 2
9 4
10 1
12 5
Which seems right according to the data in your table. But doesn't match up with your expected value list (e.g. User 9 is only in one group in your table data but you show it in the results as belonging to two)
EDIT: Dang. Just noticed that you're using MySQL. My solution was for SQL Server. Sorry.
-- Kevin Fairchild
There was already similar question raised.
Here is my answer (a bit edited):
I am not sure I understand correctly your question, but this could work My take on trees in SQL.
Linked post described method of storing tree in database -- PostgreSQL in that case -- but the method is clear enough, so it can be adopted easily for any database.
With this method you can easy update all the nodes depend on modified node K with about N simple SELECTs queries where N is distance of K from root node.
Good Luck!
I don't remember which SO question I found the link under, but this article on sitepoint.com (second page) shows another way of storing hierarchical trees in a table that makes it easy to find all child nodes, or the path to the top, things like that. Good explanation with example code.
PS. Newish to StackOverflow, is the above ok as an answer, or should it really have been a comment on the question since it's just a pointer to a different solution (not exactly answering the question itself)?
There's no way to do this in the SQL standard, but you can usually find vendor-specific extensions, e.g., CONNECT BY in Oracle.
UPDATE: As the comments point out, this was added in SQL 99.