SQLite IN check set coverage

Can SQLite check set coverage using the IN operator?
I.e., would
(SELECT n from nums where n < 4) IN (1,2,3,4)
(where nums is the set of whole numbers) return true?
I have searched the documentation but can only find examples that use a single value on the left of IN.
My tests suggest that it can, but I need to confirm this is a valid use case and not a side effect, like SQLite's ability to return aggregate queries without proper GROUP BY clauses.

Documentation says no.
The IN and NOT IN operators take a single scalar operand on the left and a vector operand on the right formed by an explicit list of zero or more scalars or by a single subquery.
(Emphasis mine: "a single scalar operand on the left".)
https://www.sqlite.org/lang_expr.html#in_op

The documentation says:
The IN and NOT IN operators take a single scalar operand on the left
So if you use a subquery on the left side, it is treated as a scalar subquery, which does not behave as you want it to:
The result of the expression is the value of the only column in the first row returned by the SELECT statement. If the SELECT yields more than one result row, all rows after the first are ignored.
To check for set coverage, you have to check if there is any element in the left set that is not in the right set:
WITH a(n) AS (
SELECT n
FROM nums
WHERE n < 4
),
b(n) AS (
VALUES (1), (2), (3), (4)
)
SELECT *
FROM a
WHERE NOT EXISTS (SELECT 1
FROM b
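WHERE b.n = a.n);
If this query returns no rows, every element of the left set is in the right set, i.e. the left set is covered.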
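A minimal sketch using Python's built-in sqlite3 module to contrast the two behaviours described above. The nums table and its contents (the whole numbers 1 to 9) are assumptions made for illustration, and the helper names scalar_in, missing_elements and is_covered are hypothetical. An ORDER BY is added to the scalar subquery so that its "first row" is deterministic.

```python
import sqlite3

# In-memory database with a hypothetical "nums" table holding 1..9.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (n INTEGER)")
conn.executemany("INSERT INTO nums (n) VALUES (?)", [(i,) for i in range(1, 10)])


def scalar_in(limit):
    """Subquery on the left of IN: SQLite treats it as a scalar subquery,
    so only its first row takes part in the comparison."""
    sql = "SELECT (SELECT n FROM nums WHERE n < ? ORDER BY n) IN (1, 2, 3, 4)"
    return conn.execute(sql, (limit,)).fetchone()[0]


def missing_elements(limit):
    """Elements of the left set that have no match in the right set."""
    sql = """
        WITH a(n) AS (SELECT n FROM nums WHERE n < ?),
             b(n) AS (VALUES (1), (2), (3), (4))
        SELECT n FROM a
        WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.n = a.n)
        ORDER BY n
    """
    return [row[0] for row in conn.execute(sql, (limit,))]


def is_covered(limit):
    """The left set is covered when no element is missing."""
    return not missing_elements(limit)


print(scalar_in(6), missing_elements(6), is_covered(4))  # 1 [5] True
```

Note that scalar_in(6) returns 1 (true) even though 5 is not in the list, because only the first row (1) is compared; the NOT EXISTS version correctly reports 5 as missing.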

Related

How to use regexp_matches() in an UPDATE statement?

I am trying to clean up a table that has a very messy varchar column, with entries of the sorts:
<u><font color="#0000FF">VA Lidar</font></u> OR <u><font color="#0000FF">InPort Metadata</font></u>
I would like to update the column by keeping only the HTML links, separated by commas if there is more than one. Ideally I would do something like this:
UPDATE mytable
SET column = array_to_string(regexp_matches(column,'(?<=href=").+?(?=\")','g') , ',');
But unfortunately this returns an error in Postgres 10:
ERROR: set-returning functions are not allowed in UPDATE
I assume regexp_matches() is the said set-returning function. Any ideas on how I can achieve this?
Notes
1. You don't need to base the correlated subquery on a separate instance of the base table (like other answers suggested). That would be doing more work for nothing.
2. For simple cases an ARRAY constructor is cheaper than array_agg(). See:
Why is array_agg() slower than the non-aggregate ARRAY() constructor?
3. I use a regular expression without lookahead and lookbehind constraints, with capturing parentheses instead: href="([^"]+)
See query 1.
This works because parenthesized subexpressions are captured by regexp_matches() (and several other Postgres regexp functions). So we can replace the more sophisticated constraints with plain parentheses. The manual on regexp_match():
If a match is found, and the pattern contains no parenthesized
subexpressions, then the result is a single-element text array
containing the substring matching the whole pattern. If a match is
found, and the pattern contains parenthesized subexpressions, then the
result is a text array whose n'th element is the substring matching
the n'th parenthesized subexpression of the pattern
And for regexp_matches():
This function returns no rows if there is no match, one row if there
is a match and the g flag is not given, or N rows if there are N
matches and the g flag is given. Each returned row is a text array
containing the whole matched substring or the substrings matching
parenthesized subexpressions of the pattern, just as described above
for regexp_match.
4. regexp_matches() returns a set of arrays (setof text[]) for a reason: not only can a regular expression match several times in a single string (hence the set), it can also produce multiple strings for each single match with multiple capturing parentheses (hence the array). That does not occur with this regexp; every array in the result holds a single element. But future readers should not be led into a trap:
Feeding the resulting 1-D arrays to array_agg() (or an ARRAY constructor) produces a 2-D array, which is only possible at all since Postgres 9.5 added a variant of array_agg() that accepts array input. See:
Is there something like a zip() function in PostgreSQL that combines two arrays?
However, quoting the manual:
inputs must all have same dimensionality, and cannot be empty or NULL
I think this can never fail here, as the same regexp always produces the same number of array elements; ours always produces one. But that may be different with other regexps. If so, there are various options:
Only take the first element with (regexp_matches(...))[1]. See query 2.
Unnest arrays and use string_agg() on base elements. See query 3.
Each approach works here, too.
Query 1
UPDATE tbl t
SET col = (
SELECT array_to_string(ARRAY(SELECT regexp_matches(col, 'href="([^"]+)', 'g')), ',')
);
Columns with no match are set to '' (empty string).
Query 2
UPDATE tbl
SET col = (
SELECT string_agg(t.arr[1], ',')
FROM regexp_matches(col, 'href="([^"]+)', 'g') t(arr)
);
Columns with no match are set to NULL.
Query 3
UPDATE tbl
SET col = (
SELECT string_agg(elem, ',')
FROM regexp_matches(col, 'href="([^"]+)', 'g') t(arr)
, unnest(t.arr) elem
);
Columns with no match are set to NULL.
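The NULL versus empty-string difference between the three queries can be modelled in plain Python. This is only an approximation: Python's re.finditer stands in for regexp_matches(..., 'g') (one tuple per match holding the captured groups, or the whole match when the pattern has no groups), and the function names below are hypothetical. The sample string is made up.

```python
import re
from typing import List, Optional, Tuple

HREF_PATTERN = r'href="([^"]+)'


def regexp_matches_g(value: str, pattern: str = HREF_PATTERN) -> List[Tuple[Optional[str], ...]]:
    """Approximates regexp_matches(value, pattern, 'g'): one tuple per match,
    holding the captured groups, or the whole match if there are no groups."""
    compiled = re.compile(pattern)
    return [m.groups() if compiled.groups else (m.group(0),)
            for m in compiled.finditer(value)]


def query1(value: str) -> str:
    # array_to_string(ARRAY(...), ','): every element of every match, '' if none.
    return ",".join(e for arr in regexp_matches_g(value) for e in arr if e is not None)


def query2(value: str) -> Optional[str]:
    # string_agg(t.arr[1], ','): first element of each match, NULL if no rows.
    firsts = [arr[0] for arr in regexp_matches_g(value) if arr[0] is not None]
    return ",".join(firsts) if firsts else None


def query3(value: str) -> Optional[str]:
    # string_agg(elem, ',') over unnest(t.arr): all elements, NULL if no rows.
    elems = [e for arr in regexp_matches_g(value) for e in arr if e is not None]
    return ",".join(elems) if elems else None


def coalesce_variant(value: str) -> str:
    # COALESCE(<aggregated links>, bad_data): keep the original when nothing matched.
    links = query3(value)
    return links if links is not None else value


sample = '<a href="http://a.example/1">VA Lidar</a> OR <a href="http://b.example/2">InPort Metadata</a>'
print(query1(sample))           # http://a.example/1,http://b.example/2
print(repr(query1("no link")))  # ''
print(query2("no link"))        # None
```

With a pattern that has two capturing groups, each match yields a two-element tuple; query2 keeps only the first element of each, while query1 and query3 flatten all of them, mirroring options 2 and 3 above.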
You could use a correlated subquery to deal with the offending set-returning function (which is regexp_matches). Something like this, with c standing in for your column:
update mytable t
set c = (
select array_to_string(array_agg(x), ',')
from (
select regexp_matches(t2.c, '(?<=href=").+?(?=\")', 'g')
from mytable t2
where t2.id = t.id
) dt(x)
);
You're still stuck with the "CSV in a column" nastiness but that's a separate issue and presumably not a problem for you.
Building on mu is too short's approach, with a slightly different regex and a COALESCE to retain values that do not contain href links:
UPDATE a
SET bad_data = COALESCE(
(SELECT Array_to_string(Array_agg(x), ',')
FROM (SELECT Regexp_matches(a.bad_data,
'(?<=href=")[^"]+', 'g'
) AS x
FROM a a2
WHERE a2.id = a.id) AS sub), bad_data
);

Check if value exists in Postgres array

Using Postgres 9.0, I need a way to test if a value exists in a given array. So far I came up with something like this:
select '{1,2,3}'::int[] @> (ARRAY[]::int[] || value_variable::int)
But I keep thinking there should be a simpler way to do this; I just can't see it. This seems better:
select '{1,2,3}'::int[] @> ARRAY[value_variable::int]
I believe it will suffice. But if you have other ways to do it, please share!
Simpler with the ANY construct:
SELECT value_variable = ANY ('{1,2,3}'::int[])
The right operand of ANY (between parentheses) can either be a set (result of a subquery, for instance) or an array. There are several ways to use it:
SQLAlchemy: how to filter on PgArray column types?
IN vs ANY operator in PostgreSQL
Important difference: Array operators (<@, @>, && et al.) expect array types as operands and support GIN or GiST indices in the standard distribution of PostgreSQL, while the ANY construct expects an element type as left operand and can be supported with a plain B-tree index (with the indexed expression to the left of the operator, not the other way round like it seems to be in your example). Example:
Index for finding an element in a JSON array
None of this works for NULL elements. To test for NULL:
Check if NULL exists in Postgres array
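Watch out for the trap I fell into: when checking whether a certain value is not present in an array, you shouldn't do:
SELECT value_variable != ANY('{1,2,3}'::int[])
but use
SELECT value_variable != ALL('{1,2,3}'::int[])
instead. != ANY is true as soon as the value differs from at least one element, so it is true for 1 even though 1 is in the array; != ALL is true only if the value differs from every element.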
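A plain-Python model of the trap described above, with ne_any and ne_all as hypothetical helper names. It deliberately ignores SQL NULL semantics (a NULL element would make the real operators return NULL in some cases), so treat it as an illustration of the quantifiers only.

```python
def ne_any(value, arr):
    """Models value != ANY(arr): true if value differs from at least one element."""
    return any(value != x for x in arr)


def ne_all(value, arr):
    """Models value != ALL(arr): true only if value differs from every element."""
    return all(value != x for x in arr)


arr = [1, 2, 3]
print(ne_any(1, arr), ne_all(1, arr), ne_all(4, arr))  # True False True
```

The empty-array case matches Postgres as well: != ALL over an empty array is true and != ANY over it is false.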
"But if you have other ways to do it, please share."
You can compare two arrays. If any of the values in the left array overlap the values in the right array, then it returns true. It's kind of hackish, but it works.
SELECT '{1}' && '{1,2,3}'::int[]; -- true
SELECT '{1,4}' && '{1,2,3}'::int[]; -- true
SELECT '{4}' && '{1,2,3}'::int[]; -- false
In the first and second queries, the value 1 is in the right array.
Notice that the second query is true, even though the value 4 is not contained in the right array.
For the third query, no values in the left array (i.e., 4) are in the right array, so it returns false.
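The difference between overlap (&&) and containment (@>) can be modelled with Python sets. This is a sketch under simplifying assumptions (no NULL elements, duplicates ignored), and overlaps/contains are hypothetical names; it shows why && with several values means "any of them" while @> means "all of them".

```python
def overlaps(left, right):
    """Models left && right: true if the arrays share at least one element."""
    return bool(set(left) & set(right))


def contains(left, right):
    """Models left @> right: true if every element of right appears in left."""
    return set(right) <= set(left)


print(overlaps([1], [1, 2, 3]), overlaps([1, 4], [1, 2, 3]), overlaps([4], [1, 2, 3]))
print(contains([1, 2, 3], [1]), contains([1, 2, 3], [1, 4]))
```

For a single value both operators agree, so either works for an existence test; they diverge only when the left array has several elements.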
unnest() can be used as well.
It expands an array into a set of rows, and then checking whether a value exists is as simple as using IN or NOT IN.
For example, given these column types:
id => uuid
exception_list_ids => uuid[]
select * from table where id NOT IN (select unnest(exception_list_ids) from table2)
This one works fine for me; maybe it is useful for someone:
select * from your_table where array_column ::text ilike ANY (ARRAY['%text_to_search%'::text]);
ANY works well. Just make sure the ANY construct is on the right side of the comparison operator, i.e. after the equals sign.
Below statement will throw error: ERROR: syntax error at or near "any"
select 1 where any('{hello}'::text[]) = 'hello';
Whereas below example works fine
select 1 where 'hello' = any('{hello}'::text[]);
When looking for the existence of an element in an array, proper casting is required to get past the Postgres SQL parser. Here is an example query using the array contains operator in the join clause.
For simplicity I only list the relevant part:
table1 other_name text[]; -- is an array of text
The join part of the SQL:
from table1 t1 join table2 t2 on t1.other_name::text[] @> ARRAY[t2.panel::text]
The following also works:
on t2.panel = ANY(t1.other_name)
I am just guessing that the extra casting is required because the parser does not have to fetch the table definition to figure out the exact type of the column. Others, please comment on this.

Oracle where condition priority

I have a table with a varchar column (A) and an integer column (B) indicating the type of data present in A. If B is 0, then A will always contain numeric digits.
So when I form an sql like this
SELECT COUNT(*) FROM TAB WHERE B = 0 AND TO_NUMBER(A) = 123;
I get an exception invalid number.
I expect B = 0 to be evaluated first, and then TO_NUMBER(A) second, but from the above scenario I suspect TO_NUMBER(A) is evaluated first. Is my guess correct?
In contrast to programming languages like C, C#, Java etc., SQL doesn't have so-called conditional logical operators. For conditional logical operators, the right operand is only evaluated if it can influence the result. So the evaluation of && stops if the left operand returns false. For || it stops if the left operand returns true.
In SQL, both operands are always evaluated. And it's up to the query optimizer to choose which one is evaluated first.
I propose you create the following function, which is useful in many cases:
CREATE OR REPLACE FUNCTION IS_NUMBER(P_NUMBER VARCHAR2)
RETURN NUMBER DETERMINISTIC
IS
X NUMBER;
BEGIN
X := TO_NUMBER(P_NUMBER);
RETURN X;
EXCEPTION
WHEN OTHERS THEN RETURN NULL;
END IS_NUMBER;
Then you can rewrite your query as:
SELECT COUNT(*) FROM TAB WHERE B = 0 AND IS_NUMBER(A) = 123;
You can also use the function to check whether a string is a number.
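A rough SQLite stand-in for the IS_NUMBER idea, using Python's sqlite3.Connection.create_function to register a Python function under that name. The tab table, its sample rows, and the use of float() in place of TO_NUMBER are assumptions for illustration; Oracle's conversion rules differ in detail. The point it shows is that returning NULL instead of raising makes the query safe whichever predicate is evaluated first.

```python
import sqlite3


def is_number(text):
    """Rough analogue of IS_NUMBER: the converted number, or None (NULL) on failure."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


conn = sqlite3.connect(":memory:")
conn.create_function("IS_NUMBER", 1, is_number)
conn.execute("CREATE TABLE tab (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO tab (a, b) VALUES (?, ?)",
    [("123", 0), ("456", 0), ("abc", 1), ("123", 1)],  # hypothetical sample rows
)

# 'abc' turns into NULL rather than an error, so predicate order no longer matters.
count = conn.execute(
    "SELECT COUNT(*) FROM tab WHERE b = 0 AND IS_NUMBER(a) = 123"
).fetchone()[0]
print(count)  # 1
```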
Here's a simple way to force the check on B to occur first.
SELECT COUNT(*) FROM TAB
WHERE 123 = DECODE(B, 0, TO_NUMBER(A), NULL);
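SQLite has no DECODE, but a CASE expression is the portable equivalent. In this sketch a strict Python converter is registered under the hypothetical name TO_NUMBER (it raises on non-numeric text, as Oracle's would), and every value it receives is recorded, so you can see that the conversion only runs on rows where b = 0. The table and rows are made up, and SQLite is only a stand-in for Oracle here.

```python
import sqlite3

calls = []  # every value the strict converter actually receives


def strict_to_number(text):
    """Stand-in for TO_NUMBER: raises ValueError on non-numeric input such as 'abc'."""
    calls.append(text)
    return float(text)


conn = sqlite3.connect(":memory:")
conn.create_function("TO_NUMBER", 1, strict_to_number)
conn.execute("CREATE TABLE tab (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO tab (a, b) VALUES (?, ?)",
    [("123", 0), ("456", 0), ("abc", 1), ("123", 1)],
)

# CASE plays the role of DECODE(B, 0, TO_NUMBER(A), NULL): rows with b <> 0
# compare 123 = NULL and never reach the conversion.
count = conn.execute(
    "SELECT COUNT(*) FROM tab "
    "WHERE 123 = CASE WHEN b = 0 THEN TO_NUMBER(a) ELSE NULL END"
).fetchone()[0]
print(count, sorted(calls))  # 1 ['123', '456']
```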
You can use a subquery to be confident in the correctness of the result:
select /*+NO_MERGE(T)*/ count(*)
from (
select *
from TAB
where B = 0
) T
where TO_NUMBER(A) = 123
In your particular example, you can compare varchars like this:
SELECT COUNT(*) FROM TAB WHERE B = 0 AND A = '123';
Or, if you trust Oracle to do the implicit conversion, you could write the following. Note, however, that when a character column is compared with a number, Oracle converts the character value to a number, so this can raise the same invalid number error unless B = 0 happens to be tested first, and it would be hard to debug if something went wrong:
SELECT COUNT(*) FROM TAB WHERE B = 0 AND A = 123;
I'm not sure whether your guess is correct without seeing sample data.
SELECT *
FROM DUAL
WHERE 1=0 AND 1/0=0
You can try adding 1/0 = 0 as the last predicate of your query (as in the query above) to find out whether Oracle short-circuits the logical operator: if no division-by-zero error is raised, that predicate was not evaluated.

What does =+ mean in an Oracle query

Normally in C++, the plus means addition, as in the example below:
int x;
x += 1;
However, in a PL/SQL query I am confused by the same usage. There it does not mean addition. In that case, what is the meaning of =+?
Select c.* From alf_numeric a, run_of_id b, tail_of_st c
WHERE category_id IN(33,36) AND a.flow_id =+ b.flow_id
Any idea?
This:
...
FROM alf_numeric a, run_of_id b
WHERE a.flow_id = b.flow_id (+)
would mean:
...
FROM alf_numeric a
LEFT JOIN run_of_id b
ON a.flow_id = b.flow_id
My guess is that:
a.flow_id =+b.flow_id
is parsed as the (simple):
a.flow_id = (+b.flow_id)
and so is the same as:
a.flow_id = b.flow_id
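A small sqlite3 sketch of the practical consequence: if =+ is just = followed by a unary plus, the query behaves as an inner join and drops unmatched rows, whereas Oracle's (+) notation corresponds to a LEFT JOIN that keeps them. SQLite does not support (+), so the outer join is written in ANSI form; the table names come from the question and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE alf_numeric (flow_id INTEGER);
    CREATE TABLE run_of_id (flow_id INTEGER);
    INSERT INTO alf_numeric (flow_id) VALUES (1);
    INSERT INTO alf_numeric (flow_id) VALUES (2);
    INSERT INTO run_of_id (flow_id) VALUES (1);
""")

# "=+" tokenizes as "=" followed by a unary plus: a plain inner join.
inner = conn.execute(
    "SELECT a.flow_id, b.flow_id FROM alf_numeric a, run_of_id b "
    "WHERE a.flow_id =+ b.flow_id ORDER BY a.flow_id"
).fetchall()

# What a.flow_id = b.flow_id (+) means, written as an ANSI LEFT JOIN.
outer = conn.execute(
    "SELECT a.flow_id, b.flow_id FROM alf_numeric a "
    "LEFT JOIN run_of_id b ON a.flow_id = b.flow_id ORDER BY a.flow_id"
).fetchall()
print(inner, outer)  # [(1, 1)] [(1, 1), (2, None)]
```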
It looks to me like the '+' part of '=+' is a no-op. Try running the following statements:
CREATE TABLE test1 (v1 NUMBER);
INSERT INTO test1(v1) VALUES (-1);
INSERT INTO test1(v1) VALUES (1);
CREATE TABLE test2(v2 NUMBER);
INSERT INTO test2(v2) VALUES (-1);
INSERT INTO test2(v2) VALUES (1);
SELECT *
FROM test1 t1
INNER JOIN test2 t2
ON (t1.v1 = t2.v2)
WHERE t1.v1 =+ t2.v2;
which returns
V1 V2
-1 -1
1 1
Thus, it appears the '+' operator isn't doing anything; it just returns whatever is there. As a test of this, run the following statement:
SELECT V1, +V1 AS PLUS_V1, ABS(V1) AS ABS_V1, -V1 AS NEG_V1 FROM TEST1;
and you'll find it returns
V1 PLUS_V1 ABS_V1 NEG_V1
-1 -1 1 1
1 1 1 -1
which seems to confirm that a unary '+' is effectively a no-op.
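The same experiment can be reproduced with Python's sqlite3 module, assuming SQLite is an acceptable stand-in for Oracle here (SQLite's documentation also describes unary + as a no-op). Tables and data mirror the statements above; the NUMERIC column type is a substitution for Oracle's NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (v1 NUMERIC);
    INSERT INTO test1 (v1) VALUES (-1);
    INSERT INTO test1 (v1) VALUES (1);
    CREATE TABLE test2 (v2 NUMERIC);
    INSERT INTO test2 (v2) VALUES (-1);
    INSERT INTO test2 (v2) VALUES (1);
""")

# Join filtered with "=+", i.e. "=" followed by a unary plus.
joined = conn.execute(
    "SELECT t1.v1, t2.v2 FROM test1 t1 INNER JOIN test2 t2 ON (t1.v1 = t2.v2) "
    "WHERE t1.v1 =+ t2.v2 ORDER BY t1.v1"
).fetchall()

# Unary plus next to ABS and unary minus.
signs = conn.execute("SELECT v1, +v1, ABS(v1), -v1 FROM test1 ORDER BY v1").fetchall()

print(joined)  # [(-1, -1), (1, 1)]
print(signs)   # [(-1, -1, 1, 1), (1, 1, 1, -1)]
```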
In your SELECT statement, the clause
a.flow_id =+b.flow_id
is mainly a comparison. It tests whether the value of a.flow_id is equal to the value of b.flow_id. So the + operator in this case is an arithmetic operator working on a single operand. It turns the sign of the value to positive.
Update:
It seems I was slightly wrong. The operator doesn't change the sign. It has basically no effect.
It's probably a typo for the old left join syntax in Sybase, which would be =* instead of =+. If that's true, you can rewrite the query in a clearer way using joins, like:
select c.*
From alf_numeric a
left join
run_of_id b
on a.flow_id = b.flow_id
cross join
tail_of_st c
WHERE category_id IN(33,36)
This would basically return the entire table tail_of_st for each entry in alf_numeric, with a filter on category_id (not sure which table that's in). A mysterious query!
In your C++ example, the + designates the positive sign, it has nothing to do with addition. Just as you can write x = -1, you can also write x = +1 (which is equal to x = 1, since + as sign can be omitted - and is, in most cases, since it does in fact have no effect whatsoever). But both these cases are an assignment in C++, not an addition - no actual calculation is involved; you're probably thinking of x += 1 (the order is important!), which would increase x by 1.
In your SQL query, I think the + is supposed to have a special meaning: it should probably indicate an outer join. However, if I read the Oracle documentation correctly, it should actually be a.flow_id = b.flow_id (+); as written here, I doubt that the query parser will recognize it as an outer join, but will instead just interpret it as a positive sign, just as in your C++ example.
I believe that's a join syntax thing. The standard way is to say something like tableA join tableB on <whatever> but some DBs, such as Sybase and Oracle support alternate syntax. In Sybase, it's =* or *=. Postgres probably does the same. From the format, I'd guess a right outer join, but it's hard to say. I looked in the PG docs, but didn't immediately see it.
BTW, in C you'd have x += 1 not x = +1.

SQL : ERROR: more than one row returned by a subquery used as an expression

The thing is that it does return one row.
Here's the thing.
SELECT...
FROM...
WHERE...
GROUP BY...
HAVING randomNumber > (SELECT value FROM.....)
Whenever I use operators such as = or >, I always get this error. When I use IN, I don't.
Are you not supposed to use comparison signs when comparing to another table?
When you type:
SomeValue IN (SELECT ...)
it is equivalent to using:
SomeValue = ANY (SELECT ...)
Don't use the second notation, but it illustrates a point. When the SELECT returns more than one value, you must use ANY or ALL with the comparison operator. When you omit ANY or ALL, the SELECT must return at most one row (no rows yields NULL).
IN accepts multiple values. If you are using >, =, < etc., reduce the subquery to a single value, for example:
HAVING randomNUmber > (SELECT MAX(value) FROM ......)
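The relationship between > ALL / > ANY and the MAX / MIN rewrites can be modelled in plain Python. This sketch assumes the subquery returns no NULLs, and gt_all, gt_any and gt_max are hypothetical names. It also shows one edge case where the rewrites differ: over an empty subquery, > ALL is vacuously true, while > (SELECT MAX(...)) compares against NULL and so does not pass the HAVING filter.

```python
def gt_all(x, values):
    """Models x > ALL (subquery): true when x exceeds every value (true if none)."""
    return all(x > v for v in values)


def gt_any(x, values):
    """Models x > ANY (subquery): true when x exceeds at least one value."""
    return any(x > v for v in values)


def gt_max(x, values):
    """Models x > (SELECT MAX(...)): None stands for the NULL MAX returns on no rows."""
    return x > max(values) if values else None


vals = [3, 7, 5]
print(gt_all(8, vals), gt_max(8, vals), gt_any(4, vals))  # True True True
print(gt_all(5, []), gt_max(5, []))                       # True None
```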