Using the smaller of two values in SQL condition - sql

I have a database of videos with a field for both the video width, and height.
I have queries set up to get videos of a specific resolution however it fails to return any videos that are portrait/vertical.
I would like to be able to do something like WHERE MIN(width, height) == 1080 but to my knowledge, this isn't possible.
Is there anyway I can get my desired effect in SQLite?

SQLite supports multi argument min function which behaves like LEAST function.
min(X,Y,...)
The multi-argument min() function returns the argument with the
minimum value. The multi-argument min() function searches its
arguments from left to right for an argument that defines a collating
function and uses that collating function for all string comparisons.
If none of the arguments to min() define a collating function, then
the BINARY collating function is used. Note that min() is a simple
function when it has 2 or more arguments but operates as an aggregate
function if given only a single argument.
So you must be able to use it in the WHERE clause as you have mentioned in the question

You are looking for a CASE expression in your SELECT.
Something like
CASE WHEN width>height THEN height ELSE width END = 1000

Related

What is the purpose of using `timestamp(nullif('',''))`

Folks
I am in the process of moving a decade old back-end from DB2 9.5 to Oracle 19c.
I frequently see in SQL queries and veiw definitions bizarre timestamp(nullif('','')) constructs used instead of a plain null.
What is the point of doing so? Why would anyone in their same mind would want to do so?
Disclaimer: my SQL skills are fairly mediocre. I might well miss something obvious.
It appears to create a NULL value with a TIMESTAMP data type.
The TIMESTAMP DB2 documentation states:
TIMESTAMP scalar function
The TIMESTAMP function returns a timestamp from a value or a pair of values.
TIMESTAMP(expression1, [expression2])
expression1 and expression2
The rules for the arguments depend on whether expression2 is specified and the data type of expression2.
If only one argument is specified it must be an expression that returns a value of one of the following built-in data types: a DATE, a TIMESTAMP, or a character string that is not a CLOB.
If you try to pass an untyped NULL to the TIMESTAMP function:
TIMESTAMP(NULL)
Then you get the error:
The invocation of routine "TIMESTAMP" is ambiguous. The argument in position "1" does not have a best fit.
To invoke the function, you need to pass one of the required DATE, TIMESTAMP or a non-CLOB string to the function which means that you need to coerce the NULL to have one of those types.
This could be:
TIMESTAMP(CAST(NULL AS VARCHAR(14)))
TIMESTAMP(NULLIF('',''))
Using NULLIF is more confusing but, if I have to try to make an excuse for using it, is slightly less to type than casting a NULL to a string.
The equivalent in Oracle would be:
CAST(NULL AS TIMESTAMP)
This also works in DB2 (and is even less to type).
It is not clear why - in any SQL dialect, no matter how old - one would use an argument like nullif('',''). Regardless of the result, that is a constant that can be calculated once and for all, and given as argument to timestamp(). Very likely, it should be null in any dialect and any version. So that should be the same as timestamp(null). The code you found suggests that whoever wrote it didn't know what they were doing.
One might need to write something like that - rather than a plain null - to get null of a specific data type. Even though "theoretical" SQL says null does not have a data type, you may need something like that, for example in a view, to define the data type of the column defined by an expression like that.
In Oracle you can use the cast() function, as MT0 demonstrated already - that is by far the most common and most elegant equivalent.
If you want something much closer in spirit to what you saw in that old code, to_timestamp(null) will have the same effect. No reason to write something more complicated for null given as argument, though - along the lines of that nullif() call.

Postgres pg_trgm how to compare similarity for array of strings

I'm attempting to use pg_trgm for string fuzzy matching and I know it may be used like this:
SELECT * FROM artists WHERE SIMILARITY(name, 'Claud Monay') > 0.4;
where a scalar value may be used to compare against the similarity. However, I've seen this way of using SIMILARITY with an array of strings:
SELECT * FROM artists WHERE 'Cadinsky' % ANY(STRING_TO_ARRAY(name, ' '));
which uses the % operator which is a shorthand for comparing against the default value of 0.3. I'm trying to find the proper syntax to use ANY(STRING_TO_ARRAY(...)) but with the first form where an arbitrary scalar value may be given to compare the similarity against.
This is, most likely, just a simple question of properly using the syntax for ANY, but I'm failing at understanding what the correct form is.
There is no syntax to use ANY with 3 arguments (the string, the array of strings, and the similarity threshold). The way to do it is to set pg_trgm.similarity_threshold to the value you want rather than the default of 0.3, and then use % ANY.
If you want to use different thresholds in different parts of the query, you are out of luck with the ANY construct.
You can always define your own function, but you will probably not be able to get it to use an index.
create or replace function most_similar(text, text[]) returns double precision
language sql as $$
select max(similarity($1,x)) from unnest($2) f(x)
$$;
SELECT * FROM artists WHERE most_similar('Cadinsky', STRING_TO_ARRAY(name, ' '))>0.4;
I am not a DB expert nor good at SQL but here is my solution.
I basically use a function called unnest(). Thus, I can iterate over the array and check the similarity value for each item then compare it to similarity input, which is a float.
Using something like set pg_trgm.similarity_threshold=0.6; is a global setting as far as I know. The question is specifically asking for an explicit threshold.
Also, if you create a function to do the job and the function is not VOLATILE but is STABLE, you cannot use set pg_trgm.similarity_threshold. (At least that was what happened to me).
Caution: I didn't compare my approach to (ANY) approach in terms of performance.
Example Code:
CREATE OR REPLACE FUNCTION your_function_name (input text, similarity float) RETURNS
SELECT * FROM your_table_name
WHERE EXISTS
(SELECT
FROM unnest(ARRAY['item','anotherItem', 'third-ish']) element
WHERE SIMILARITY (input, element) > similarity
);
$ function $

Oracle DETERMINISTIC HINT Overhead

DETERMINISTIC HIT (as Oracle says) is used to cache the result of a function if it could be deterministic, but what is the overhead of that benefit?
I'll try to explain this better:
CREATE OR REPLACE FUNCTION betwnstr (
string_in IN VARCHAR2
, start_in IN INTEGER
, end_in IN INTEGER
)
RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
RETURN (SUBSTR (string_in, start_in, end_in - start_in + 1));
END;
/
This simple function extract the characters from BEGIN and END index from a given string.
Now i'll start to use this Function in different tables as SELECT result(other functions, procedure, package etc) and Oracle will start caching all the result from the same input.
Sure this is a wonderful result just adding a simple world on function declaration, but what is the side effect of an intensive use of this? For example, if this function is called million of times with different input ?
I could have many other functions as DETERMINISTICT for example:
AN DETERMINISTIC function to calculate the difference (in DAYS) from two given date
ecc
The documentation says:
DETERMINISTIC
Tells the optimizer that the function returns the same value whenever it is invoked with the same parameter values (if this is not true, then specifying DETERMINISTIC causes unpredictable results). If the function was invoked previously with the same parameter values, the optimizer can use the previous result instead of invoking the function again.
The optimizer can use the previous result, but doesn't have too; this is just asserting that if it needed to call it multiple times for the same parameter values - generally within a single query - it can choose to only make the call once, since you're promised it that it would always get the same result. That doesn't necessarily imply that function results could be cached somewhere between queries, though they may be cached by other mechanisms (I think).
When Oracle does cache things it manages the cache size to stay within available memory, and to optimise the memory available for various functionality. Basically you don't need to worry about side-effects from making a function deterministic, assuming you're using it properly.
There's more documentation here, including how this relates to function-based indexes etc.

Infinite optional parameters

In essence, I'd like the ability to create a scalar function which accepts a variable number of parameters and concatenates them together to return a single VARCHAR. In other words, I want the ability to create a fold over an uncertain number of variables and return the result of the fold as a VARCHAR, similar to .Aggregate in C# or Concatenate in Common Lisp.
My (procedural) pseudo code for such a function is as follows:
define a VARCHAR variable
foreach non-null parameter convert it to a VARCHAR and add it to the VARCHAR variable
return the VARCHAR variable as the result of the function
Is there an idiomatic way to do something like this in MS-SQL? Does MS-SQL Server have anything similar to the C# params/Common Lisp &rest keyword?
-- EDIT --
Is it possible to do something similar to this without using table-valued parameters, so that a call to the function could look like:
MY_SCALAR_FUNC('A', NULL, 'C', 1)
instead of having to go through the rigmarole of setting up and inserting into a new temporary table each time the function is called?
For a set of items, you could consider passing a table of values to your function?
Pass table as parameter into sql server UDF
See also http://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx
To answer your question directly, no, there is no equivalent to the params keyword. The approach I'd use is the one above - Create a user-defined table type, populate that one row per value, and pass that to your scalar function to operate on.
EDIT: If you want to avoid table parameters, and are on SQL 2012, look at the CONCAT function:
http://technet.microsoft.com/en-us/library/hh231515.aspx
CONCAT ( string_value1, string_value2 [, string_valueN ] )
This is only for the built-in CONCAT function, you couldn't roll-your-own function with "params" style declaration.

What does it mean by "Non-deterministic User-Defined functions can be used in a deterministic manner"?

According to MSDN SQL BOL (Books Online) page on Deterministic and Nondeterministic Functions, non-deterministic functions can be used "in a deterministic manner"
The following functions are not always deterministic, but can be used in indexed views or indexes on computed columns when they are specified in a deterministic manner.
What does it mean by non-deterministic functions can be used in a deterministic manner?
Can someone illustrate how that can be done? and where you would do so?
That a function is deterministic means that it is guaranteed always to return the same output value for the same input arguments.
Using a non-deterministic function in a deterministic manner I assume means that you ensure that the range of arguments you will pass to the function is such that the return value will be deterministic, ie. dependent only opon those arguments.
What this implies in practice depends on what the function does and in what way it is non-deterministic.
An example:
RAND(1) // deterministic, always returns the same number
versus:
RAND() // non-deterministic, returns new random number on each call
Note this uses the MSDN article's definition of the word "deterministic"
the BOL actually states:
The following functions are not
always deterministic, but can be
used in indexed views or indexes on
computed columns when they are
specified in a deterministic manner.
and then below it states what conditions must be met to make them deterministic.
E.g.
CAST - Deterministic unless used with
datetime, smalldatetime, or
sql_variant
In other words you need to meet those condition to use them in deterministic manner
For example when you create a table
CREATE TABLE [dbo].[deterministicTest](
[intDate] [int] NULL,
[dateDateTime] [datetime] NULL,
[castIntToDateTime] AS (CONVERT([datetime],[intDate],0)),
[castDateTimeToInt] AS (CONVERT([int],[dateDateTime],0)),
[castIntToVarchar] AS (CONVERT([varchar],[intDate],0))
) ON [PRIMARY]
you can apply index on castIntToVarchar but if you try to add index to castDateTimeToInt or castIntToDateTime you will get the following error:
Column 'castDateTimeToInt'(castIntToDateTime) in table 'dbo.deterministicTest' cannot be used in an index or statistics or as a partition key because it is non-deterministic.
So the dateTime cannot be used neither as a source nor the target format of the CONVERT function if you want to stay deterministic
BOL definitions should read:
”Deterministic functions always return the same result on the same row any time they are called with a specific set of input values (row) and given the same state of the database.
In other words deterministic functions always return the same result on any particular fixed value from their domain (in this case domain is a row).
Nondeterministic functions may return different results each time they are called with a specific set of input values (row) even if the database state that they access remains the same.
In this case nondeterministic functions
a) Return different values every time they called.
b) Depend on the values outside of the row they applied on.
Examples of group a): NEWID(), GETDATE(), GETUTDATE(), RAND() with no seed specified.
Examples of group b): GET_TRANSMISSION_STATUS(), LAG(), RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE(), SUM() when specified with the OVER and ORDER BY clauses.
”
Please note that some authors use different definition of deterministic functions which may lead to confusion.