Will using and updating the same field in one UPDATE statement cause undefined behaviour? - sql

Here is an example
UPDATE some_table SET duration = datediff(ss, statustime, getdate()), statustime = getdate() WHERE id = 2009
Is the field duration going to be assigned an undefined value, since statustime is both used and assigned in the same statement? (i.e. a positive value if datediff is processed first, or a negative value if statustime is processed first)
I can definitely update it in two separate statements, but I am curious whether it is possible to do it in one statement.

No. Both values are calculated before either assignment is made.
Update:
I tracked down the ANSI-92 spec, and section 13.10 on the UPDATE statement says this:
The <value expression>s are effectively evaluated for each row
of T before updating any row of T.
The only other applicable rules refer to section 9.2, but that only deals with one assignment in isolation.
There is some room for ambiguity here: it could calculate and update all statustime rows first and all duration rows afterward and still technically follow the spec, but that would be a very ... odd ... way to implement it.
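A quick way to convince yourself (a minimal T-SQL sketch with throwaway names, not from the original post): swap two columns in one UPDATE and observe that both right-hand sides see the pre-update row.
CREATE TABLE #t (a int, b int);
INSERT INTO #t (a, b) VALUES (1, 2);
-- Both assignments read the original values, so the columns are swapped
-- rather than ending up equal.
UPDATE #t SET a = b, b = a;
SELECT a, b FROM #t;  -- returns a = 2, b = 1
DROP TABLE #t;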

My gut instinct says 'no', but this will vary depending on the SQL implementation, query parser, and so on. Your best bet in situations like these is to run a quick experiment on your server (wrap it in a transaction that you roll back, so it doesn't actually modify data) and see how your particular implementation behaves.
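For example, a minimal sketch of such an experiment (assuming the hypothetical some_table from the question):
BEGIN TRANSACTION;
UPDATE some_table
SET duration = datediff(ss, statustime, getdate()),
    statustime = getdate()
WHERE id = 2009;
-- Inspect the outcome, then undo it.
SELECT duration, statustime FROM some_table WHERE id = 2009;
ROLLBACK TRANSACTION;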


is an update which joins on itself guaranteed to be atomic?

I have an update to execute in PostgreSQL which needs to update the table's value based on its old (current) value, and return both the old value and the new value. It must use the value locked by the update as its old value (i.e. for my purposes, I cannot allow it to use a stale prior value!)
basically:
UPDATE mytable new
SET balance=master.balance
FROM master, mytable old
WHERE new.id=$1 AND old.id=new.id AND master.id=new.masterid
RETURNING old.balance, new.balance
I need to know for certain that old.x holds the values that were in new.x at the time of this action -- that nothing can be interleaved between reading that value and updating the row -- but I need to refer to both (the prior and new balances).
Can anyone direct me to where I can know for certain?
And if this is not true (old can indeed be stale), do you have a suggestion as to how to enforce it -- essentially a SELECT ... FOR UPDATE that works syntactically here?
I would still love to know if the above is or is not guaranteed to be atomic (nothing can change for that row on mytable between the read & write)...
However, I am going with this for now which I do believe has that guarantee:
WITH old AS
(SELECT * FROM mytable WHERE id=$1 FOR UPDATE)
UPDATE mytable new
SET balance=master.balance
FROM old, master
WHERE new.id=old.id AND master.id=old.masterid
RETURNING old.balance, new.balance
It is possible that your query will return stale values for old. If an update is in progress while your statement is executing, the query can see the old value, but the update will block and see the new value once the concurrent transaction is done. The exact behavior probably depends on the execution plan chosen.
Your answer shows one way to avoid the problem. The other, perhaps simpler, way is to use the REPEATABLE READ transaction isolation level. Then you get a serialization error if the old values would be stale.
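A rough sketch of that alternative (assuming the same mytable/master schema as above; the retry loop lives in the application):
BEGIN ISOLATION LEVEL REPEATABLE READ;
UPDATE mytable new
SET balance = master.balance
FROM master, mytable old
WHERE new.id = $1 AND old.id = new.id AND master.id = new.masterid
RETURNING old.balance, new.balance;
-- If a concurrent transaction changed the row first, this fails with
-- SQLSTATE 40001 (serialization_failure); roll back and retry.
COMMIT;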

Oracle error code ORA-00913 - IN CLAUSE limitation with more than 65000 values (Used OR condition for every 1k values)

My application team is trying to fetch 85,000 values from a table using a SELECT query that is being built on the fly by their program.
SELECT * FROM TEST_TABLE
WHERE (
ID IN (00001,00002, ..., 01000)
OR ID IN (01001,01002, ..., 02000)
...
OR ID IN (84001,84002, ..., 85000)
);
But I am getting the error "ORA-00913: too many values".
If I reduce the IN lists to only 65,000 values in total, I don't get this error. Is there a limit on the number of values for the IN clause (combined with OR) like this?
The issue isn't about in lists; it is about a limit on the number of or-delimited compound conditions. I believe the limit applies not to or specifically, but to any compound condition using any combination of or, and, and not, with or without parentheses. And, importantly, this doesn't seem to be documented anywhere, nor acknowledged by anyone at Oracle.
As you clearly know already, there is a limit of 1000 items in an in list - and you have worked around that.
The parser expands an in condition as a compound, or-delimited condition. The limit that applies to you is the one I mentioned already.
The limit is 65,535 "atomic" conditions (put together with or, and, not). It is not difficult to write examples that confirm this.
The better question is why (and, of course, how to work around it).
My suspicion: To evaluate such compound conditions, the compiled code must use a stack, which is very likely implemented as an array. The array is indexed by unsigned 16-bit integers (why so small, only Oracle can tell). So the stack size can be no more than 2^16 = 65,536; and actually only one less, because Oracle thinks that array indexes start at 1, not at 0 - so they lose one index value (0).
Workaround: create a temporary table to store your 85,000 values. Note that the idea of using tuples (artificial as it is) allows you to overcome the 1000 values limit for a single in list, but it does not work around the limit of 65,535 "atomic" conditions in an or-delimited compound condition; this limit applies in the most general case, regardless of where the conditions come from originally (in lists or anything else).
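A sketch of that workaround (illustrative names only, not from the original thread):
-- One-time DDL: a global temporary table to hold the lookup values.
CREATE GLOBAL TEMPORARY TABLE tmp_ids (id NUMBER PRIMARY KEY)
    ON COMMIT DELETE ROWS;
-- The application bulk-inserts its 85,000 values, e.g. in batches:
-- INSERT INTO tmp_ids (id) VALUES (:1);
-- Then a join replaces the giant or-delimited predicate entirely:
SELECT t.*
FROM test_table t
JOIN tmp_ids i ON i.id = t.id;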
More information on AskTom - you may want to start at the bottom (my comments, which are the last ones in the threads):
https://asktom.oracle.com/pls/apex/f?p=100:11:10737011707014::::P11_QUESTION_ID:9530196000346534356#9545388800346146842
https://asktom.oracle.com/pls/apex/f?p=100:11:10737011707014::::P11_QUESTION_ID:778625947169#9545394700346458835

How to cache return value of a function for a single query

I want to use the getdate() function 3-4 times in a single query for validation checks. But I want all 3-4 places that ask for the current datetime within a single query execution to get the same value. Technically, computers are fast enough that 99.9% of the time I will get the same datetime in all places in the query, but theoretically it may lead to a bug. So how can I cache what getdate() returns by calling it once, and use that cached value throughout the query?
To add: I want to write this in a check constraint, so I can't declare local variables or anything of that sort.
SQL Server has the concept of run-time constant functions. The best way to describe these is that the first thing the execution engine does is pull the function references out from the query plan and execute each such function once per query.
Note that the function references appear to be column-based. So different columns can have different values, but different rows should have the same value within a column.
The two most common functions in this category are getdate() and rand(). Ironically, I find that this is a good thing for getdate(), but a bad thing for rand() (what kind of random number generator always returns the same value?).
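A quick demonstration of the per-query behaviour (a throwaway T-SQL snippet; nothing here comes from the original question):
SELECT getdate() AS d, rand() AS r
FROM (VALUES (1), (2), (3)) v(x);
-- All three rows show the same d and the same r: each reference is
-- evaluated once per query, not once per row.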
For some reason, I can't find the actual documentation on run-time constant functions. But here are some respected blog posts that explain the matter:
https://sqlperformance.com/2014/06/t-sql-queries/dirty-secrets-of-the-case-expression
http://sqlblog.com/blogs/andrew_kelly/archive/2008/03/01/when-a-function-is-indeed-a-constant.aspx
https://blogs.msdn.microsoft.com/conor_cunningham_msft/2010/04/23/conor-vs-runtime-constant-functions/

'-999' used for all condition

I have a sample of a stored procedure like this (from my previous working experience):
Select * from table where (id=@id or id='-999')
Based on my understanding of this query, the '-999' is used to avoid an exception when no value is transferred from the user. So far in my research, I have not found this usage on the internet or in other companies' implementations.
@id is transferred from the user.
Any help in the form of some links related to it will be appreciated.
I'd like to add my two guesses on this, although please note that, to my disadvantage, I'm one of the very youngest in the field, so this is not coming from much history or experience.
Also, please note that whatever reason anybody provides, you might not be able to confirm it 100%. Your oven might just not have any leftover evidence in and of itself.
Now, per another question I read before, extreme integers were used in some systems to denote missing values, since text and NULL weren't options in those systems. Say I'm looking for ID #84 and I cannot find it in the table:
Not Found Is Unlikely:
Perhaps in some systems it's far more likely that a record exists with a missing/incorrect ID than that it doesn't exist at all? Hence, when no match is found, designers preferred all records without valid IDs to be returned?
This however has a few problems. First, depending on the design, the user might not recognize that the results are a set of records with missing IDs, especially if only one was returned. Second, the current query poses a problem, as it will always return the missing-ID records in addition to the normal matches. Perhaps they relied on ORDERing to ease readability?
Exception Above SQL:
AFAIK, SQL is fine with a zero-row result, but maybe whatever thing that calls/used to call it wasn't as robust, and something went wrong (hard exception, soft UI bug, etc.) when zero rows were returned? Perhaps, then, this ID represented a dummy row (e.g. blanks and zeroes) to keep things running.
Then again, this also suffers from the same arguments above regarding "a record is always returned" and ORDER, with the added possibility that the SQL caller had dedicated logic for when the -999 record is the only record returned, which I doubt was the most practical approach even in whatever era this was done.
... the more I type, the more I think this is the oven, and only the great grandmother can explain this to us.
If you want to avoid an exception when no value is transferred from the user, declare the parameter as null in your stored procedure, like @id int = null.
For instance:
CREATE PROCEDURE [dbo].[TableCheck]
@id int = null
AS
BEGIN
Select * from [table] where (id=@id)
END
Now you can execute it either way:
exec [dbo].[TableCheck] 2 or exec [dbo].[TableCheck]
Remember, it's a separate thing if you want to return the whole table when your input parameter is null.
To answer your id = -999 condition: I tried it your way, and it doesn't prevent any exception.
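If returning the whole table on a missing value is actually the behaviour the -999 trick was meant to emulate, a common pattern is an explicit NULL check instead of a magic value (a hedged sketch, reusing the hypothetical bracketed table from above):
CREATE PROCEDURE [dbo].[TableCheckAll]
@id int = null
AS
BEGIN
-- Matches a specific id when one is supplied, everything otherwise.
Select * from [table] where (id=@id or @id is null)
END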

When using GETDATE() in many places, is it better to use a variable?

By better, I mean does it improve performance by some non-marginal amount?
That is to say, each time I call GETDATE(), what amount of work does the server do to return that value?
If I'm using GETDATE() in many places in a stored procedure, should I instead be creating a variable to store the date of the transaction?
declare @transDate datetime = GETDATE()
Benchmarking data would be fantastic.
EDIT I want to clarify: I'm interested mainly in the actual performance difference between these two possibilities, and whether or not it is significant.
[NOTE: If you are going to downvote this answer, please leave a comment explaining why. It has already been downvoted many times, and finally ypercube (thank you) explained at least one reason why. I can't remove the answer because it is accepted, so you might as well help to improve it.]
According to this exchange on Microsoft, GETDATE() switched from being constant within a query to non-deterministic in SQL Server 2005. In retrospect, I don't think that is accurate. I think it was completely non-deterministic prior to SQL Server 2005 and was then hacked into something called a "non-deterministic runtime constant" as of SQL Server 2005. The latter phrase really seems to mean "constant within a query".
(And GETDATE() is defined as unambiguously and proudly non-deterministic, with no qualifiers.)
Alas, in SQL Server, non-deterministic does not mean that a function is evaluated for every row. SQL Server really does make this needlessly complicated and ambiguous with very little documentation on the subject.
The documentation says the function call is evaluated when the query is running rather than once when the query is compiled, and that its value changes each time it is called. In practice, GETDATE() is only evaluated once for each expression where it is used -- at execution time rather than compile time. Microsoft puts rand() and getdate() into a special category, called non-deterministic runtime constant functions. By contrast, Postgres doesn't jump through such hoops; it just labels functions whose value is constant within a statement as "stable".
Despite Martin Smith's comment, SQL Server documentation is simply not explicit on this matter -- GETDATE() is described both as "nondeterministic" and as a "non-deterministic runtime constant", but the latter term isn't really explained. In the one place I have found the term, for instance, the very next lines in the documentation say not to use nondeterministic functions in subqueries. That would be silly advice for a "non-deterministic runtime constant".
I would suggest using a variable holding the value, even within a query, so you have a consistent value. This also makes the intention quite clear: you want a single value inside the query. Within a single query, you can do something like:
select . . .
from (select getdate() as now) params cross join
. . .
Actually, this is a subquery that should be evaluated only once in the query, but there might be exceptions. Confusion arises because getdate() returns the same value on all rows -- but it can return different values in different columns. Each expression containing getdate() is evaluated independently.
This is obvious if you run:
select rand(), rand()
from (values (1), (2), (3)) v(x);
Within a stored procedure, you would want to have a single value in a variable. What happens if the stored procedure is run as midnight passes by, and the date changes? What impact does that have on the results?
As for performance, my guess is that the date/time lookup is minimal and, for a query, occurs once per expression as the query starts to run. This should not really be a performance issue; it is more of a code-consistency issue.
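A minimal sketch of the variable approach inside a procedure (table and column names are made up for illustration):
CREATE PROCEDURE dbo.ProcessBatch
AS
BEGIN
    -- Capture the time once so every statement in the batch agrees,
    -- even if execution straddles midnight.
    DECLARE @now datetime = GETDATE();

    UPDATE dbo.Orders
    SET processed_at = @now
    WHERE processed_at IS NULL;

    INSERT INTO dbo.AuditLog (event_time, message)
    VALUES (@now, 'batch processed');
END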
My suggestion would be to use a variable, mainly because if you have a long-running process, the GetDate() value might differ between calls.
Unless you are only using the date part of GetDate(), you cannot be sure you are always working with the same value.
One reason to use a variable with getdate() or functions like suser_sname() is a huge performance difference when you are inserting rows or doing a GROUP BY. You will notice this if you insert a large number of rows.
I suffered this myself migrating 300GB of data into several tables.
I was testing a couple of stored procedures that used the GETDATE() function in a variable within the SP, and I was seeing increased IO reads and execution time because the query optimizer does not know the value it will operate on (read Stored Procedure Execution with Parameters, Variables, and Literals). That said, you can use the GETDATE() function in every single part of the SP; as @Gordon Linoff mentioned, its value does not change during execution. To avoid/remove the suspicion that the value might change, I created a parameter this way:
CREATE PROC TestGetdate
(
@CurrentDate DATETIME = NULL
)
AS
SET @CurrentDate = GETDATE()
.....
and then use the parameter as you see fit; you'll see good results.
Any comments or suggestions are welcome.
I used
WHERE ActualDateShipped + 30 > dbo.Today()
in combination with the function below. It brought my query time from 13 seconds down to 2 seconds. No prior answer in this post helped with this problem in SQL 2008/R2.
CREATE FUNCTION [dbo].[Today]()
RETURNS date
AS
BEGIN
DECLARE @today date = getdate()
RETURN @today
END