Postgres pg_trgm: how to compare similarity for an array of strings

I'm attempting to use pg_trgm for string fuzzy matching and I know it may be used like this:
SELECT * FROM artists WHERE SIMILARITY(name, 'Claud Monay') > 0.4;
where the similarity is compared against an arbitrary scalar threshold. However, I've also seen the similarity check applied to an array of strings:
SELECT * FROM artists WHERE 'Cadinsky' % ANY(STRING_TO_ARRAY(name, ' '));
which uses the % operator, a shorthand for comparing the similarity against the default threshold of 0.3. I'm trying to find the proper syntax for using ANY(STRING_TO_ARRAY(...)) with the first form, where an arbitrary scalar threshold may be given.
This is most likely just a simple question of using the ANY syntax properly, but I can't work out what the correct form is.

There is no syntax to use ANY with 3 arguments (the string, the array of strings, and the similarity threshold). The way to do it is to set pg_trgm.similarity_threshold to the value you want rather than the default of 0.3, and then use % ANY.
If you want to use different thresholds in different parts of the query, you are out of luck with the ANY construct.
You can always define your own function, but you will probably not be able to get it to use an index.
create or replace function most_similar(text, text[]) returns double precision
language sql as $$
select max(similarity($1,x)) from unnest($2) f(x)
$$;
SELECT * FROM artists WHERE most_similar('Cadinsky', STRING_TO_ARRAY(name, ' '))>0.4;
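If a single threshold per query is enough, the setting-based approach above can be scoped to one transaction with SET LOCAL, so the rest of the session keeps its previous value. This is only a sketch: it reuses the artists table and name column from the question and assumes a PostgreSQL version in which the pg_trgm.similarity_threshold setting mentioned above is available.

```sql
-- Sketch: limit the custom threshold to a single transaction.
BEGIN;

-- SET LOCAL lasts only until the end of the current transaction.
SET LOCAL pg_trgm.similarity_threshold = 0.4;

-- % now compares against 0.4 instead of the default 0.3.
SELECT *
FROM artists
WHERE 'Cadinsky' % ANY (STRING_TO_ARRAY(name, ' '));

COMMIT;
```

This still applies one threshold to every % in the query, so it does not help when different parts of the query need different thresholds.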

I am not a DB expert nor good at SQL but here is my solution.
I basically use the unnest() function. With it I can iterate over the array, compute the similarity for each item, and compare it to the similarity input, which is a float.
Using something like set pg_trgm.similarity_threshold=0.6; is a global setting as far as I know. The question specifically asks for an explicit threshold.
Also, if you wrap the query in a function that is declared STABLE rather than VOLATILE, you cannot use set pg_trgm.similarity_threshold inside it. (At least that is what happened to me.)
Caution: I didn't compare the performance of my approach with the ANY approach.
Example Code:
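CREATE OR REPLACE FUNCTION your_function_name (input text, similarity float)
RETURNS SETOF your_table_name  -- return type assumed from the SELECT * below
LANGUAGE sql AS
$function$
SELECT * FROM your_table_name
WHERE EXISTS
(SELECT  -- an empty select list is enough for EXISTS
FROM unnest(ARRAY['item','anotherItem', 'third-ish']) element
WHERE SIMILARITY (input, element) > similarity
);
$function$;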

Related

Using the smaller of two values in SQL condition

I have a database of videos with a field for both the video width, and height.
I have queries set up to get videos of a specific resolution however it fails to return any videos that are portrait/vertical.
I would like to be able to do something like WHERE MIN(width, height) == 1080 but to my knowledge, this isn't possible.
Is there any way I can get my desired effect in SQLite?
SQLite supports a multi-argument min() function, which behaves like the LEAST function.
min(X,Y,...)
The multi-argument min() function returns the argument with the
minimum value. The multi-argument min() function searches its
arguments from left to right for an argument that defines a collating
function and uses that collating function for all string comparisons.
If none of the arguments to min() define a collating function, then
the BINARY collating function is used. Note that min() is a simple
function when it has 2 or more arguments but operates as an aggregate
function if given only a single argument.
So you should be able to use it in the WHERE clause, as you mentioned in the question.
You are looking for a CASE expression in the WHERE clause of your SELECT.
Something like
CASE WHEN width>height THEN height ELSE width END = 1080
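To try the multi-argument min() approach end to end, here is a small sketch for SQLite. The videos table and its rows are made up for illustration; only the width and height columns come from the question.

```sql
-- Hypothetical table and sample rows, for illustration only.
CREATE TABLE videos (id INTEGER PRIMARY KEY, width INTEGER, height INTEGER);

INSERT INTO videos (width, height) VALUES
  (1920, 1080),  -- landscape 1080p
  (1080, 1920),  -- portrait 1080p
  (1280, 720);   -- landscape 720p

-- With two or more arguments, min() is a simple (scalar) function,
-- so it can be used directly in the WHERE clause.
SELECT id, width, height
FROM videos
WHERE min(width, height) = 1080;
```

With these sample rows the query should return both 1080p videos, landscape and portrait, and skip the 720p one.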

Regular expression metacharacter in SQL yields different results in Oracle vs. Postgres

I'm trying to convert some queries from an Oracle environment to Postgres. This is a simplified version of one of the queries:
SELECT * FROM TABLE
WHERE REGEXP_LIKE(TO_CHAR(LINK_ID),'\D')
I believe the equivalent PostgreSQL should be this:
SELECT * FROM TABLE
WHERE CAST(LINK_ID AS TEXT) ~ '\D'
But when I run these queries in their respective environments on the exact same dataset, the first query outputs no records (which is correct) and the second query outputs all records in the table. I didn't write the original code, but as I understand it, it's looking for any values in the numeric field LINK_ID that are non-digit characters. Is the \D metacharacter supposed to behave differently in Oracle vs. postgres? I'm not seeing anything in documentation to say they should.
The documentation for Oracle's TO_CHAR(number) states
If you omit fmt, then n is converted to a VARCHAR2 value exactly long enough to hold its significant digits.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions181.htm
This means that the only non-digit characters that might be produced are a negative sign and a decimal point. If the number is positive and has no fractional part, it will not match the regular expression \D.
On the other hand, in PostgreSQL, casting a numeric(38,8) value to TEXT returns a string with the number of decimal places given in the type specification, in this case 8.
E.g.:
cast( cast(12341234 as numeric(38,8)) as TEXT)
generates 12341234.00000000
The result of such a cast will always contain a decimal point and therefore will always match the regular expression \D.
You may find that replacing it with this solves your problem:
(LINK_ID % 1) <> 0.0
Alternatively, if you need to keep the regex (e.g. to simplify migration work), consider changing it to '\.0*[1-9]', i.e. a decimal point followed by any nonzero digit, possibly after some zeros.
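A quick way to compare the two patterns against the cast from the example above. This is a sketch assuming PostgreSQL with standard_conforming_strings on (the default), so the backslashes reach the regex engine unchanged; the column aliases are made up for illustration.

```sql
SELECT
  -- Original pattern: the decimal point alone is a non-digit character.
  CAST(CAST(12341234 AS numeric(38,8)) AS text) ~ '\D'        AS old_pattern,
  -- Suggested pattern on a whole number: only zeros follow the point.
  CAST(CAST(12341234 AS numeric(38,8)) AS text) ~ '\.0*[1-9]' AS whole_number,
  -- Suggested pattern on a value with a real fractional part.
  CAST(CAST(12.5 AS numeric(38,8)) AS text) ~ '\.0*[1-9]'     AS has_fraction;
```

If the explanation above holds, old_pattern and has_fraction should come back true and whole_number false.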

How can I use scientific notation in SQL scripts

I am creating a database in which certain derived attributes are computed using the Universal Gravitational Constant (G), whose approximate value is 6.673 * 10^-11.
I understand that a normal integer constant can be defined using a scalar function as follows
CREATE FUNCTION MY_CONST()
RETURNS INT
AS
BEGIN
RETURN 123456789
END
Thing is, I'm new to SQL and not sure how to store a complex value such as G in there. In popular high-level programming languages like Java, I usually define the value as 6.673e-11 at the top of the editor and call it whenever I need it in my calculations. I would like to know how to simply do the same in SQL.
I really just don't get how to translate the value into SQL code as a constant.

Infinite optional parameters

In essence, I'd like the ability to create a scalar function which accepts a variable number of parameters and concatenates them together to return a single VARCHAR. In other words, I want the ability to create a fold over an uncertain number of variables and return the result of the fold as a VARCHAR, similar to .Aggregate in C# or Concatenate in Common Lisp.
My (procedural) pseudo code for such a function is as follows:
define a VARCHAR variable
foreach non-null parameter convert it to a VARCHAR and add it to the VARCHAR variable
return the VARCHAR variable as the result of the function
Is there an idiomatic way to do something like this in MS-SQL? Does MS-SQL Server have anything similar to the C# params/Common Lisp &rest keyword?
-- EDIT --
Is it possible to do something similar to this without using table-valued parameters, so that a call to the function could look like:
MY_SCALAR_FUNC('A', NULL, 'C', 1)
instead of having to go through the rigmarole of setting up and inserting into a new temporary table each time the function is called?
For a set of items, you could consider passing a table of values to your function?
Pass table as parameter into sql server UDF
See also http://technet.microsoft.com/en-us/library/ms191165(v=sql.105).aspx
To answer your question directly: no, there is no equivalent to the params keyword. The approach I'd use is the one above: create a user-defined table type, populate it with one row per value, and pass that to your scalar function to operate on.
EDIT: If you want to avoid table parameters, and are on SQL Server 2012, look at the CONCAT function:
http://technet.microsoft.com/en-us/library/hh231515.aspx
CONCAT ( string_value1, string_value2 [, string_valueN ] )
This applies only to the built-in CONCAT function; you couldn't roll your own function with a "params"-style declaration.

Column that shows number of elements in another col (Int Array) SQL (postgres 8.3)

I have a column of Int Array. I want to add another column to the table that always shows the number of elements in that array for that row. It should update this value automatically. Is there a way to embed a function as a default value? If so, how would this function know where to pick up its argument (the int array column/row number)?
In a normalized table you would not include this functionally dependent and redundant information as a separate column.
It is easy and fast enough to compute it on the fly:
SELECT array_dims ('{1,2,3}'::int[]);
Or:
SELECT array_length('{1,2,3}'::int[], 1);
array_length() was introduced in PostgreSQL 8.4. Maybe an incentive to upgrade? 8.3 is going out of service soon.
With Postgres 8.3 you can use:
SELECT array_upper('{1,2,3}'::int[], 1);
But that's inferior, because the array index can start with any number if entered explicitly. array_upper() would not tell you the actual length then; you would have to subtract array_lower() first (and add 1). Also note that in PostgreSQL arrays can always contain multiple dimensions, regardless of how many dimensions have been declared. I quote the manual here:
The current implementation does not enforce the declared number of
dimensions either. Arrays of a particular element type are all
considered to be of the same type, regardless of size or number of
dimensions. So, declaring the array size or number of dimensions in
CREATE TABLE is simply documentation; it does not affect run-time
behavior.
(True for 8.3 and 9.1 alike.) That's why I mentioned array_dims() first, to give a complete picture.
Details about array functions in the manual.
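To illustrate the point about array_upper() and explicit lower bounds, here is a small sketch using an array literal whose index starts at 5. The literal and the column aliases are made up for illustration.

```sql
-- One-dimensional array with explicit bounds 5..7 (three elements).
SELECT array_upper('[5:7]={1,2,3}'::int[], 1) AS upper_bound,
       -- Element count from the bounds, without array_length().
       array_upper('[5:7]={1,2,3}'::int[], 1)
         - array_lower('[5:7]={1,2,3}'::int[], 1) + 1 AS element_count;
```

Here upper_bound is 7 while element_count is 3, which is why array_upper() alone is not a reliable length.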
You may want to create a view to include that functionally dependent column:
CREATE VIEW v_tbl AS
SELECT arr_col, array_length(arr_col, 1) AS arr_len
FROM tbl;