Oracle DETERMINISTIC hint overhead

The DETERMINISTIC hint (as Oracle calls it) is used to cache the result of a function that is declared deterministic, but what is the overhead of that benefit?
I'll try to explain this better:
CREATE OR REPLACE FUNCTION betwnstr (
  string_in IN VARCHAR2
, start_in  IN INTEGER
, end_in    IN INTEGER
)
  RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
  RETURN (SUBSTR (string_in, start_in, end_in - start_in + 1));
END;
/
This simple function extracts the characters between the given start and end indexes of a string.
Now I'll start to use this function in SELECTs against different tables (and in other functions, procedures, packages, etc.), and Oracle will start caching the results for repeated inputs.
Sure, this is a wonderful result for just adding a single word to the function declaration, but what is the side effect of intensive use of this? For example, what if this function is called millions of times with different inputs?
I could have many other DETERMINISTIC functions, for example:
a DETERMINISTIC function to calculate the difference (in days) between two given dates
etc.

The documentation says:
DETERMINISTIC
Tells the optimizer that the function returns the same value whenever it is invoked with the same parameter values (if this is not true, then specifying DETERMINISTIC causes unpredictable results). If the function was invoked previously with the same parameter values, the optimizer can use the previous result instead of invoking the function again.
The optimizer can use the previous result, but doesn't have to; this just asserts that if Oracle would need to call the function multiple times with the same parameter values - generally within a single query - it can choose to make the call only once, since you've promised it that it will always get the same result. That doesn't necessarily imply that function results are cached anywhere between queries, though they may be cached by other mechanisms (I think).
When Oracle does cache things it manages the cache size to stay within available memory, and to optimise the memory available for various functionality. Basically, you don't need to worry about side effects from making a function deterministic, assuming you're using it properly.
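To see the effect within a single query, here is a sketch (the counter package and the row source are my own, not from the question); strictly speaking the counter side effect violates the deterministic contract, which is exactly what lets it reveal skipped calls:

```sql
-- Hypothetical demo: count how often the function body actually runs.
CREATE OR REPLACE PACKAGE call_counter AS
  calls PLS_INTEGER := 0;
END;
/

CREATE OR REPLACE FUNCTION betwnstr_counted (
  string_in IN VARCHAR2
, start_in  IN INTEGER
, end_in    IN INTEGER
)
  RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
  call_counter.calls := call_counter.calls + 1;
  RETURN SUBSTR (string_in, start_in, end_in - start_in + 1);
END;
/

-- Call it with identical arguments for every row; with DETERMINISTIC,
-- call_counter.calls typically ends up far below the 1000 rows fetched.
SELECT betwnstr_counted ('abcdef', 2, 4)
FROM   all_objects
WHERE  ROWNUM <= 1000;
```

Without the DETERMINISTIC keyword, you would generally see the counter close to the row count; with it, Oracle is free to reuse the earlier result within the query.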
There's more documentation here, including how this relates to function-based indexes etc.

Related

How postgresql 'remember' result of function?

I've run into some strange behavior in PostgreSQL; could you please clarify it for me?
I created a function which returns constants from a constants table.
CREATE TABLE constants ( key varchar PRIMARY KEY , value varchar );
CREATE OR REPLACE FUNCTION get_constant(_key varchar) RETURNS varchar
AS $$ SELECT value FROM constants WHERE key = _key; $$ LANGUAGE sql
IMMUTABLE;
Then I added a constant to the table.
insert into constants(key, value)
values('const', '1')
;
Then, if I change the value of the constant and call the function directly:
select get_constant('const');
the result is CORRECT.
BUT!
If I call the function from another procedure, for example:
create or REPLACE PROCEDURE etl.test()
LANGUAGE plpgsql
AS $$
declare
begin
raise notice '%', etl.get_constant('const');
END $$;
Then it remembers the first result of the call, and the output of the raise notice doesn't change, even if I change the constant in the table.
But if I recompile the procedure, then the new constant value prints correctly.
I tried to find documentation about it and googled things like 'cache results of PostgreSQL procedure', etc., but found nothing.
Could you clarify this and attach a link to documentation on this issue?
The documentation for CREATE FUNCTION says this about the IMMUTABLE keyword:
IMMUTABLE indicates that the function cannot modify the database and always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use information not directly present in its argument list. If this option is given, any call of the function with all-constant arguments can be immediately replaced with the function value.
So by declaring etl.get_constant with that keyword, you're telling Postgres "the output of this function will always be the same for a given input, forever".
The call etl.get_constant('const') has "all-constant arguments" - the value 'const' won't ever change. Since you've told Postgres that etl.get_constant will always return the same output for the same input, it immediately replaces the function call with the result.
So when you call etl.test() it doesn't run etl.get_constant at all, it just returns the value it got earlier, which you told it would be valid forever.
Compare that with the next paragraph on the same page (emphasis mine):
STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements. This is the appropriate selection for functions whose results depend on database lookups, parameter variables (such as the current time zone), etc.
So if your "constant" is subject to change, but not within the scope of a particular query, you should mark it STABLE, not IMMUTABLE.
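In this case the fix is simply to re-declare the function from the question with the volatility category that matches reality:

```sql
CREATE OR REPLACE FUNCTION get_constant(_key varchar) RETURNS varchar
AS $$ SELECT value FROM constants WHERE key = _key; $$ LANGUAGE sql
STABLE;  -- result may change between statements, but not within one scan
```

With STABLE, the lookup is executed at run time rather than being folded into a constant at planning time, so etl.test() will see the current table value.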

What is the purpose of using `timestamp(nullif('',''))`

Folks,
I am in the process of moving a decade-old back-end from DB2 9.5 to Oracle 19c.
I frequently see, in SQL queries and view definitions, bizarre timestamp(nullif('','')) constructs used instead of a plain null.
What is the point of doing so? Why would anyone in their right mind want to do that?
Disclaimer: my SQL skills are fairly mediocre. I might well be missing something obvious.
It appears to create a NULL value with a TIMESTAMP data type.
The TIMESTAMP DB2 documentation states:
TIMESTAMP scalar function
The TIMESTAMP function returns a timestamp from a value or a pair of values.
TIMESTAMP(expression1, [expression2])
expression1 and expression2
The rules for the arguments depend on whether expression2 is specified and the data type of expression2.
If only one argument is specified it must be an expression that returns a value of one of the following built-in data types: a DATE, a TIMESTAMP, or a character string that is not a CLOB.
If you try to pass an untyped NULL to the TIMESTAMP function:
TIMESTAMP(NULL)
Then you get the error:
The invocation of routine "TIMESTAMP" is ambiguous. The argument in position "1" does not have a best fit.
To invoke the function, you need to pass one of the required DATE, TIMESTAMP or a non-CLOB string to the function which means that you need to coerce the NULL to have one of those types.
This could be:
TIMESTAMP(CAST(NULL AS VARCHAR(14)))
TIMESTAMP(NULLIF('',''))
Using NULLIF is more confusing but, if I have to make an excuse for using it, it is slightly less typing than casting a NULL to a string.
The equivalent in Oracle would be:
CAST(NULL AS TIMESTAMP)
This also works in DB2 (and is even less to type).
It is not clear why - in any SQL dialect, no matter how old - one would use an argument like nullif('',''). Whatever it evaluates to, it is a constant that could be computed once and for all and passed as the argument to timestamp(). Very likely it evaluates to null in any dialect and any version, so it should be the same as timestamp(null). The code you found suggests that whoever wrote it didn't know what they were doing.
One might need to write something like that - rather than a plain null - to get null of a specific data type. Even though "theoretical" SQL says null does not have a data type, you may need something like that, for example in a view, to define the data type of the column defined by an expression like that.
In Oracle you can use the cast() function, as MT0 demonstrated already - that is by far the most common and most elegant equivalent.
If you want something much closer in spirit to what you saw in that old code, to_timestamp(null) will have the same effect. No reason to write something more complicated for null given as argument, though - along the lines of that nullif() call.
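One place where a typed null genuinely matters is a view column, since each column must get a definite data type from its defining expression. A sketch (the view and table names are made up):

```sql
-- Oracle: CAST gives the placeholder column a definite TIMESTAMP type
CREATE OR REPLACE VIEW orders_v AS
SELECT order_id,
       CAST(NULL AS TIMESTAMP) AS shipped_at   -- typed NULL placeholder
FROM   orders;
```

A bare NULL in that position would leave the column's type up to the dialect's defaulting rules, which is exactly what constructs like timestamp(nullif('','')) were trying to avoid.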

Function defining views data type as VARCHAR2(4000) affecting performance?

I've created a function that returns a VARCHAR2 variable, the variable it returns is typed as VARCHAR2(200) within the function itself.
I've also created a view that uses that function as a column in it. This automatically sets the datatype of that column in the view to VARCHAR2(4000).
Does this cause any performance/storage issues?
Furthermore, would it be better for me to throw in a SUBSTR to limit it to the proper 200 characters it should be?
Oracle 11g BTW.
Even if using VARCHAR2(4000) everywhere may not cause problems as long as everything you do stays in SQL (and it may very well not cause problems in that case), it can become a pain once you start pulling data into applications. An application may have no choice but to allocate memory based on the maximum declared length of the strings in a column, so you can use up memory very quickly that way. Not to mention GUIs that will reserve and format space for 4000 characters for whatever the function returns...
To limit the size of the column in the view, the solution is what you already said: if you wrap your function within substr() in the select clause of your view, the column datatype will be set to VARCHAR2(200). That seems like the right approach to me.
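A sketch of that approach (the view, table, and function names are placeholders):

```sql
-- Wrapping the function call in SUBSTR caps the view column's
-- declared type at VARCHAR2(200) instead of VARCHAR2(4000).
CREATE OR REPLACE VIEW my_view AS
SELECT t.id,
       SUBSTR(my_function(t.some_col), 1, 200) AS short_col
FROM   my_table t;
```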

In PostgreSQL, weird issue about citext performance?

The PostgreSQL manual says that citext is simply a module that implements a TEXT data type with a call to LOWER():
The citext module provides a case-insensitive character string type,
citext. Essentially, it internally calls lower when comparing values.
Otherwise, it behaves almost exactly like text.
On the other hand, at the end of the documentation it says:
citext is not as efficient as text because the operator functions and
the B-tree comparison functions must make copies of the data and
convert it to lower case for comparisons. It is, however, slightly
more efficient than using lower to get case-insensitive matching.
So I'm confused: if it uses LOWER(), how can it be "slightly more efficient than using lower"?
It doesn't call the SQL function lower. As the documentation says, it essentially internally calls lower.
The calls happen within the C functions which implement the citext comparison operations. And rather than actually calling lower, they go directly to the underlying str_tolower() routine. You can see this for yourself in the source code, most of which is relatively easy to follow in this case.
So what you're saving, more or less, is the overhead of two SQL function calls per comparison. Which is not insignificant, compared with the cost of the comparison itself, but you'd probably never notice either of them next to the other costs in a typical query.
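To make the two styles concrete, a minimal sketch (requires the citext extension):

```sql
CREATE EXTENSION IF NOT EXISTS citext;

-- citext: the case-insensitive comparison happens internally, in C
SELECT 'Hello'::citext = 'HELLO'::citext;   -- t

-- text: the same result, but via two SQL-level lower() calls
SELECT lower('Hello') = lower('HELLO');     -- t
```

Both queries return true; the difference is only where the lower-casing work happens.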

What does it mean by "Non-deterministic User-Defined functions can be used in a deterministic manner"?

According to MSDN SQL BOL (Books Online) page on Deterministic and Nondeterministic Functions, non-deterministic functions can be used "in a deterministic manner"
The following functions are not always deterministic, but can be used in indexed views or indexes on computed columns when they are specified in a deterministic manner.
What does it mean that non-deterministic functions can be used "in a deterministic manner"?
Can someone illustrate how that can be done? and where you would do so?
That a function is deterministic means that it is guaranteed always to return the same output value for the same input arguments.
Using a non-deterministic function in a deterministic manner, I assume, means that you ensure that the range of arguments you will pass to the function is such that the return value will be deterministic, i.e. dependent only upon those arguments.
What this implies in practice depends on what the function does and in what way it is non-deterministic.
An example:
RAND(1) -- deterministic, always returns the same number
versus:
RAND() -- non-deterministic, returns a new random number on each call
Note this uses the MSDN article's definition of the word "deterministic"
the BOL actually states:
The following functions are not always deterministic, but can be used in indexed views or indexes on computed columns when they are specified in a deterministic manner.
and then below it states what conditions must be met to make them deterministic.
E.g.
CAST - Deterministic unless used with datetime, smalldatetime, or sql_variant
In other words you need to meet those condition to use them in deterministic manner
For example when you create a table
CREATE TABLE [dbo].[deterministicTest](
[intDate] [int] NULL,
[dateDateTime] [datetime] NULL,
[castIntToDateTime] AS (CONVERT([datetime],[intDate],0)),
[castDateTimeToInt] AS (CONVERT([int],[dateDateTime],0)),
[castIntToVarchar] AS (CONVERT([varchar],[intDate],0))
) ON [PRIMARY]
you can apply index on castIntToVarchar but if you try to add index to castDateTimeToInt or castIntToDateTime you will get the following error:
Column 'castDateTimeToInt'(castIntToDateTime) in table 'dbo.deterministicTest' cannot be used in an index or statistics or as a partition key because it is non-deterministic.
So datetime can be used neither as the source nor as the target format of the CONVERT function if you want to stay deterministic.
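A sketch of the index attempts described above (the index names are my own):

```sql
-- Succeeds: the int <-> varchar CONVERT is deterministic
CREATE INDEX ix_int_to_varchar
    ON dbo.deterministicTest (castIntToVarchar);

-- Fails with the error quoted above, because the datetime
-- conversion is treated as non-deterministic:
-- CREATE INDEX ix_int_to_datetime
--     ON dbo.deterministicTest (castIntToDateTime);
```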
BOL definitions should read:
"Deterministic functions always return the same result on the same row any time they are called with a specific set of input values (row) and given the same state of the database.
In other words, deterministic functions always return the same result for any particular fixed value from their domain (in this case the domain is a row).
Nondeterministic functions may return different results each time they are called with a specific set of input values (row), even if the database state they access remains the same.
In this case nondeterministic functions either:
a) return different values every time they are called, or
b) depend on values outside of the row they are applied to.
Examples of group a): NEWID(), GETDATE(), GETUTCDATE(), RAND() with no seed specified.
Examples of group b): GET_TRANSMISSION_STATUS(), LAG(), RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE(), SUM() when specified with the OVER and ORDER BY clauses."
Please note that some authors use different definition of deterministic functions which may lead to confusion.