Operator does not exist: integer = integer[] in a query with ANY - sql

I frequently used integer = ANY(integer[]) syntax, but now ANY operator doesn't work. This is the first time I use it to compare a scalar with an integer returned from CTE, but I thought this shouldn't cause problems.
My query:
WITH bar AS (
SELECT array_agg(b) AS bs
FROM foo
WHERE c < 3
)
SELECT a FROM foo WHERE b = ANY ( SELECT bs FROM bar);
When I run it, it throws following error:
ERROR: operator does not exist: integer = integer[]: WITH bar AS (
SELECT array_agg(b) AS bs FROM foo WHERE c < 3 ) SELECT a FROM foo
WHERE b = ANY ( SELECT bs FROM bar)
Details in this SQL Fiddle.
So what am I doing wrong?

Based on the error message portion operator does not exist: integer = integer[], it appears that the bs column needs to be unnested, in order to get the right hand side back to an integer so the comparison operator can be found:
WITH bar AS (
SELECT array_agg(b) AS bs
FROM foo
WHERE c < 3
)
SELECT a
FROM foo
WHERE b = ANY ( SELECT unnest(bs) FROM bar);
This results in the output:
A
2
3
Given the doc for the ANY function:
The right-hand side is a parenthesized subquery, which must return
exactly one column. The left-hand expression is evaluated and compared
to each row of the subquery result using the given operator, which
must yield a Boolean result. The result of ANY is "true" if any true
result is obtained. The result is "false" if no true result is found
(including the case where the subquery returns no rows).
... the error makes sense, as the left-hand expression is an integer -- column b -- while the right-hand expression is an array of integers, or integer[], and so the comparison ends up being of the form integer = integer[], which doesn't have an operator, and therefore results in the error.
unnesting the integer[] value makes the left- and right-hand expressions integers, and so the comparison can continue.
Modified SQL Fiddle.
Note: that the same behavior is seen when using IN instead of = ANY.

without unnest
WITH bar AS (
SELECT array_agg(b) AS bs
FROM foo
WHERE c < 3
)
SELECT a FROM foo WHERE ( SELECT b = ANY (bs) FROM bar);

FYI, For me,
SELECT ... WHERE "id" IN (SELECT unnest(ids) FROM tablewithids)
was incomparably faster than
SELECT ... WHERE "id" = ANY((SELECT ids FROM tablewithids)::INT[])
Didn't do any research into why that was though.

column needs to be unnest
WITH bar AS (
SELECT array_agg(b) AS bs
FROM foo
WHERE c < 3
)
SELECT a
FROM foo
WHERE b = ANY ( SELECT unnest(bs) FROM bar);

Related

postgres extract int array from text

I have the following postgresql statement:
SELECT 1 = ANY( jsonb_array_elements_text('[2, 1, 3]') );
Basically I have a string which contains an array of integers seperated by comma, like: [1, 2, 3] and sometimes this array could be empty too, like: []. Now, I want to write a query (as part of a bigger query) where I would be able to find out if an element is matching any integers in the text. For example:
SELECT 1 = ANY( jsonb_array_elements_text('[2, 1, 3]') ); -- Should return true
SELECT 1 = ANY( jsonb_array_elements_text('[]') ); -- should return false
However, the above query fails with an error message:
ERROR: op ANY/ALL (array) requires array on right side
LINE 1: SELECT 1 = ANY( jsonb_array_elements_text('[2, 1, 3]') );
Any help how I can extract an integer array out of a text so that I can use it in a join condition ?
I am using postgres 9.4 if it matters.
I have found it. The answer is:
SELECT 1 IN (SELECT json_array_elements('[2, 1, 3]')::text::int);
SELECT 1 IN (SELECT json_array_elements('[]')::text::int);
SELECT 1 IN (SELECT json_array_elements('[12, 10, 3]')::text::int);

Hive - joining two tables to find string that like string in reference table

I stumbled in a case where requires to mask data using keyword from other reference table, illustrated below:
1:
Table A contains thousands of keyword and Table B contains 15 millions ++ row for each day processing..
How can I replace data in table B using keyword in table A in new column?
I tried to use join but join can only match when the string exactly the same
Here is my code
select
sourcetype, hourx,minutex,t1.adn,hostname,t1.appsid,t1.taskid,
product_id,
location,
smsIncoming,
case
when smsIncoming regexp keyword = true then keyword
else 'undef' end smsIncoming_replaced
from(
select ... from ...
)t1
left join
(select adn,keyword,type,mapping_param,mapping_param_json,appsid,taskid,is_api,charlentgh,wordcount,max(datex)
from ( select adn,keyword,type,mapping_param,mapping_param_json,appsid,taskid,is_api,charlentgh,wordcount,datex ,last_update,
max(last_update) over (partition by keyword) as last_modified
from sqm_stg.reflex_service_map ) as sub
where last_update = last_modified
group by adn,keyword,type,mapping_param,mapping_param_json,appsid,taskid,is_api,charlentgh,wordcount)t2
on t1.adn=t2.adn and t1.appsid=t2.appsid and t1.taskid=t2.taskid
Need advice :)
Thanks
Use instr(string str, string substr) function: Returns the position of the first occurrence of substr in str. Returns null if either of the arguments are null and returns 0 if substr could not be found in str. Be aware that this is not zero based. The first character in str has index 1.
case
when instr(smsIncoming,keyword) >0 then keyword
else 'undef'
end smsIncoming_replaced

How to aggragate integers in postgresql?

I have a query that gives list of IDs:
ID
2
3
4
5
6
25
ID is integer.
I want to get that result like that in ARRAY of integers type:
ID
2,3,4,5,6,25
I wrote this query:
select string_agg(ID::text,',')
from A
where .....
I have to convert it to text otherwise it won't work. string_agg expect to get (text,text)
this works fine the thing is that this result should later be used in many places that expect ARRAY of integers.
I tried :
select ('{' || string_agg(ID::text,',') || '}')::integer[]
from A
WHERE ...
which gives: {2,3,4,5,6,25} in type int4 integer[]
but this isn't the correct type... I need the same type as ARRAY.
for example SELECT ARRAY[4,5] gives array integer[]
in simple words I want the result of my query to work with (for example):
select *
from b
where b.ID = ANY (FIRST QUERY RESULT) // aka: = ANY (ARRAY[2,3,4,5,6,25])
this is failing as ANY expect array and it doesn't work with regular integer[], i get an error:
ERROR: operator does not exist: integer = integer[]
note: the result of the query is part of a function and will be saved in a variable for later work. Please don't take it to places where you bypass the problem and offer a solution which won't give the ARRAY of Integers.
EDIT: why does
select *
from b
where b.ID = ANY (array [4,5])
is working. but
select *
from b
where b.ID = ANY(select array_agg(ID) from A where ..... )
doesn't work
select *
from b
where b.ID = ANY(select array_agg(4))
doesn't work either
the error is still:
ERROR: operator does not exist: integer = integer[]
Expression select array_agg(4) returns set of rows (actually set of rows with 1 row). Hence the query
select *
from b
where b.id = any (select array_agg(4)) -- ERROR
tries to compare an integer (b.id) to a value of a row (which has 1 column of type integer[]). It raises an error.
To fix it you should use a subquery which returns integers (not arrays of integers):
select *
from b
where b.id = any (select unnest(array_agg(4)))
Alternatively, you can place the column name of the result of select array_agg(4) as an argument of any, e.g.:
select *
from b
cross join (select array_agg(4)) agg(arr)
where b.id = any (arr)
or
with agg as (
select array_agg(4) as arr)
select *
from b
cross join agg
where b.id = any (arr)
More formally, the first two queries use ANY of the form:
expression operator ANY (subquery)
and the other two use
expression operator ANY (array expression)
like it is described in the documentation: 9.22.4. ANY/SOME
and 9.23.3. ANY/SOME (array).
How about this query? Does this give you the expected result?
SELECT *
FROM b b_out
WHERE EXISTS (SELECT 1
FROM b b_in
WHERE b_out.id = b_in.id
AND b_in.id IN (SELECT <<first query that returns 2,3,4,...>>))
What I've tried to do is to break down the logic of ANY into two separate logical checks in order to achieve the same result.
Hence, ANY would be equivalent with a combination of EXISTS at least one of the values IN your list of values returned by the first SELECT.

Execute a WHERE clause before another one

I have the following statement
SELECT * FROM foo
WHERE LEN(bar) = 4 AND CONVERT(Int,bar) >= 5000
The values in bar with a length of exactly 4 characters are integers. The other values are not integers and therefore it throws an conversion exception, when trying to convert one of them to an integer.
I thought it's enough to put the LEN(bar) before the CONVERT(Int,bar) >= 5000. But it's not.
How can I kind of prioritize a specific where clause? In my example I obviously want to select all values with a length of 4, before converting and comparing them.
6 answers and 5 of them don't work (for SQL Server)...
SELECT *
FROM foo
WHERE CASE WHEN LEN(bar) = 4 THEN
CASE WHEN CONVERT(Int,bar) >= 5000 THEN 1 ELSE 0 END
END = 1;
The WHERE/INNER JOIN conditions can be executed in any order that the query optimizer determines is best. There is no short-circuit boolean evaluation.
Specifically for your question, since you KNOW that the data with 4-characters is a number, then you can do a direct lexicographical (text) comparison (yes it works):
SELECT *
FROM foo
WHERE LEN(bar) = 4 AND bar > '5000';
try this
SELECT bar FROM
(
SELECT CASE
WHEN LEN(bar) = 4 THEN CAST( bar as int)
ELSE CAST(-1 as int) END bar
FROM Foo
) Foo
WHERE bar>5000

How do I map true/false/unknown to -1/0/null without repetition?

I am currently working on a tool to help my users port their SQL code to SQL-Server 2005. For this purpose, I parse the SQL into a syntax tree, analyze it for constructs which need attentions, modify it and transform it back into T-SQL.
On thing that I want to support, is the "bools are values too" semantics of other RDBMS. For example, MS-Access allows me to write select A.x and A.y as r from A, which is impossible in T-SQL because:
Columns can't have boolean type (column values can't be and'ed)
Logical predicates can not be used where expressions are expected.
Therefore, my transformation routine converts the above statement into this:
select case
when (A.x<>0) and (A.y<>0)
then -1
when not((A.x<>0) and (A.y<>0))
then 0
else
null
end as r
from A;
Which works, but is annoying, because I have to duplicate the logical expression (which can be very complex or contain subqueries etc.) in order to distinguish between true, false and unknown - the latter shall map to null. So I wonder if the T-SQL pro's here know a better way to achieve this?
UPDATE:
I would like to point out, that solutions which try to keep the operands in the integer domain have to take into account, that some operands may be logical expressions in the first place. This means that a efficient solution to convert a bool to a value is stil required. For example:
select A.x and exists (select * from B where B.y=A.y) from A;
I don't think there's a good answer, really it's a limitation of TSQL.
You could create a UDF for each boolean expression you need
CREATE FUNCTION AndIntInt
(
#x as int,#y as int
)
RETURNS int
AS
BEGIN
if (#x<>0) and (#y<>0)
return -1
if not((#x<>0) and (#y<>0))
return 0
return null
END
used via
select AndIntInt(A.x,A.y) as r from A
Boolean handling
Access seems to use the logic that given 2 booleans
Both have to be true to return true
Either being false returns false (regardless of nulls)
Otherwise return null
I'm not sure if this is how other DBMS (Oracle, DB2, PostgreSQL) deal with bool+null, but this answer is based on the Access determination (MySQL and SQLite agree). The table of outcomes is presented below.
X Y A.X AND B.Y
0 0 0
0 -1 0
0 (null) 0
-1 0 0
-1 -1 -1
-1 (null) (null)
(null) 0 0
(null) -1 (null)
(null) (null) (null)
SQL Server helper 1: function for boolean from any "single value"
In SQL Server in general, this function will fill the gap for the missing any value as boolean functionality. It returns a ternary result, either 1/0/null - 1 and 0 being the SQL Server equivalent of true/false (without actually being boolean).
drop function dbo.BoolFromAny
GO
create function dbo.BoolFromAny(#v varchar(max)) returns bit as
begin
return (case
when #v is null then null
when isnumeric(#v) = 1 and #v like '[0-9]%' and (#v * 1.0 = 0) then 0
else 1 end)
end
GO
Note: taking Access as a starting point, only the numeric value 0 evaluates to FALSE
This uses some SQL Server tricks
everything is convertible to varchar. Therefore only one function taking varchar input is required.
isnumeric is not comprehensive, '.' returns 1 for isnumeric but will fail at #v * 1.0, so an explicit test for LIKE [0-9]%`` is required to "fix" isnumeric.
#v * 1.0 is required to overcome some arithmetic issues. If you pass the string "1" into the function without *1.0, it will bomb
Now we can test the function.
select dbo.BoolFromAny('abc')
select dbo.BoolFromAny(1)
select dbo.BoolFromAny(0) -- the only false
select dbo.BoolFromAny(0.1)
select dbo.BoolFromAny(-1)
select dbo.BoolFromAny('')
select dbo.BoolFromAny('.')
select dbo.BoolFromAny(null) -- the only null
You can now safely use it in a query against ANY SINGLE COLUMN, such as
SELECT dbo.BoolFromAny(X) = 1
SQL Server helper 2: function to return result of BOOL AND BOOL
Now the next part is creating the same truth table in SQL Server. This query shows you how two bit columns interact and the simple CASE statement to produce the same table as Access and your more complicated one.
select a.a, b.a,
case
when a.a = 0 or b.a = 0 then 0
when a.a = b.a then 1
end
from
(select 1 A union all select 0 union all select null) a,
(select 1 A union all select 0 union all select null) b
order by a.a, b.a
This is easily expressed as a function
create function dbo.BoolFromBits(#a bit, #b bit) returns bit as
begin
return case
when #a = 0 or #b = 0 then 0
when #a = #b then 1
end
end
SQL Server conversion of other expressions (not of a single value)
Due to lack of support for bit-from-boolean conversion, expressions that are already [true/false/null] in SQL Server require repetition in a CASE statement.
An example is a "true boolean" in SQL Server, which cannot be the result for a column.
select A > B -- A=B resolves to one of true/false/null
from C
Needs to be expressed as
select case when A is null or B is null then null when A > B then 1 else 0 end
from C
But if A is not a scalar value but a subquery like (select sum(x)...), then as you can see A will appear twice and be evaluated twice in the CASE statement (repeated).
FINAL TEST
Now we put all the conversion rules to use in this long expression
SELECT X AND Y=Z AND C FROM ..
( assume X is numeric 5, and C is varchar "H" )
( note C contributes either TRUE or NULL in Access )
This translates to SQL Server (chaining the two functions and using CASE)
SELECT dbo.BoolFromBits(
dbo.BoolFromBits(dbo.BoolFromAny(X), CASE WHEN Y=Z then 1 else 0 end),
dbo.BoolFromAny(C))
FROM ...
Access Bool or bool
For completeness, here is the truth table for Access bool OR bool. Essentially, it is the opposite of AND, so
Both have to be false to return false
Either being true returns true (regardless of nulls)
Otherwise return null
The SQL SERVER case statement would therefore be
case
when a.a = 1 or b.a = 1 then 1
when a.a = b.a then 0
end
(the omission of an ELSE clause is intentional as the result is NULL when omitted)
EDIT: Based on additional information added to the Question and comments made on one of the suggested Answers, I am reformulating this answer:
If you are porting to SQL Server then I would expect that you are also transforming the data to match SQL Server types. If you have a boolean field then True, False, and Unknown map to 1, 0, and NULL as a NULLable BIT field.
With this in mind, you only need to worry about transforming pure boolean values. Expressions such as:
exists (select * from B where B.y=A.y)
and:
A.x in (1,2,3)
are already in a workable form. Meaning, statements like:
IF (EXISTS(SELECT 1 FROM Table))
and:
IF (value IN (list))
are already correct. So you just need to worry about the fact that "1" for "True" is not by itself enough. Hence, you can convert "1" values to boolean expressions by testing if they are in fact equal to "1". For example:
IF (value = 1)
is the equivalent of what you previously had as:
IF (value)
Putting all of this together, you should be able to simply translate all instances of pure boolean values of the old code into boolean expressions in the form of "value = 1" since a 1 will produce a True, 0 will produce False, and NULL will give you False.
HOWEVER, the real complexity is that SELECTing the value vs. testing with it via a WHERE condition is different. Boolean expressions evaluate correctly in WHERE conditions but have no direct representation to SELECT (especially since NULL / Unknown isn't really boolean). So, you can use the "Value = 1" translation in WHERE conditions but you will still need a CASE statement if you want to SELECT it as a result.
As mentioned briefly a moment ago, since NULL / Unknown isn't truly boolean, it is meaningless trying to convert "NULL AND NULL" to NULL for the purposes of a WHERE condition. In effect, NULL is truly FALSE since it cannot be determined to be TRUE. Again, this might be different for your purposes in a SELECT statement which is again why the CASE statement is your only choice there.