Exclude matched array elements - sql

How can I exclude matched elements of one array from another?
Postgres code:
a1 := '{1, 2, 5, 15}'::int[];
a2 := '{1, 2, 3, 6, 7, 9, 15}'::int[];
a3 := a2 ??magic_operator?? a1;
In a3 I expect exactly '{3, 6, 7, 9}'
Final Result
My and lad2025 solutions works fine.
Solution with array_position() required PostgreSQL 9.5 and later, executes x3 faster.

It looks like XOR between arrays:
WITH set1 AS
(
SELECT * FROM unnest('{1, 2, 5, 15}'::int[])
), set2 AS
(
SELECT * FROM unnest('{1, 2, 3, 6, 7, 9, 15}'::int[])
), xor AS
(
(SELECT * FROM set1
UNION
SELECT * FROM set2)
EXCEPT
(SELECT * FROM set1
INTERSECT
SELECT * FROM set2)
)
SELECT array_agg(unnest ORDER BY unnest)
FROM xor
Output:
"{3,5,6,7,9}"
How it works:
Unnest both arrays
Calculate SUM
Calculate INTERSECT
From SUM - INTERSECT
Combine to array
Alternatively you could use sum of both minus(except) operations:
(A+B) - (A^B)
<=>
(A-B) + (B-A)
Utilizing FULL JOIN:
WITH set1 AS
(
SELECT *
FROM unnest('{1, 2, 5, 15}'::int[])
), set2 AS
(
SELECT *
FROM unnest('{1, 2, 3, 6, 7, 9, 15}'::int[])
)
SELECT array_agg(COALESCE(s1.unnest, s2.unnest)
ORDER BY COALESCE(s1.unnest, s2.unnest))
FROM set1 s1
FULL JOIN set2 s2
ON s1.unnest = s2.unnest
WHERE s1.unnest IS NULL
OR s2.unnest IS NULL;
EDIT:
If you want only elements from second array that are not is first use simple EXCEPT:
SELECT array_agg(unnest ORDER BY unnest)
FROM (SELECT * FROM unnest('{1, 2, 3, 6, 7, 9, 15}'::int[])
EXCEPT
SELECT * FROM unnest('{1, 2, 5, 15}'::int[])) AS sub
Output:
"{3,6,7,9}"

The additional module intarray provides a simple and fast subtraction operator - for integer arrays, exactly the magic_operator you are looking for:
test=# SELECT '{1, 2, 3, 6, 7, 9, 15}'::int[] - '{1, 2, 5, 15}'::int[] AS result;
?column?
-----------
{3,6,7,9}
You need to install the module once per database:
CREATE EXTENSION intarray;
It also provides special operator classes for indexes:
Postgresql intarray error: undefined symbol: pfree
Note that it only works for:
... null-free arrays of integers.

I found a little similar case and modify.
That SQL solve my case.
with elements (element) as (
select unnest(ARRAY[1, 2, 3, 6, 7, 9, 15])
)
select array_agg(element)
from elements
where array_position(ARRAY[1, 2, 5, 15],element) is null
PostgreSQL 9.5 and later required.

Related

How to use a table in SQL WITH statement?

I am trying to use a pre-existing table in the SQL statement at the bottom of the question rather than the data that is being generated in the SQL statement. Currently, there is some data that is generated using:
WITH polys(poly_id, geom) AS (VALUES (1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)),
However, let's say I already have a table named polys with the poly_id and geom columns, exactly as what would be created above. How can I insert my pre-existing polys table into this SQL statement (i.e. what syntax would I use)?
I have tried the following to add a pre-existing polys table using:
CREATE TABLE polys_pts AS
WITH polys(poly_id, geom) AS,
with the following error:
ERROR: syntax error at or near ","
LINE 2: WITH polys(poly_id, geom) AS,
^
Full Code:
CREATE TABLE polys_pts AS
WITH polys(poly_id, geom) AS (VALUES (1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)),
pnt_clusters AS (SELECT polys.poly_id,
CASE
WHEN ST_Area(polys.geom)>9 THEN ST_ClusterKMeans(pts.geom, 8) OVER(PARTITION BY polys.poly_id)
ELSE ST_ClusterKMeans(pts.geom, 2) OVER(PARTITION BY polys.poly_id)
END AS cluster_id, pts.geom FROM polys,
LATERAL ST_Dump(ST_GeneratePoints(polys.geom, 1000, 1)) AS pts),
centroids AS (SELECT cluster_id, ST_PointOnSurface(ST_collect(geom)) AS geom FROM pnt_clusters GROUP BY poly_id, cluster_id),
neg_buffer AS (SELECT poly_id, (ST_Buffer(geom, -0.4, 'endcap=flat join=round')) geom FROM polys GROUP BY poly_id, polys.geom),
neg_buffer_pts_out AS (SELECT a.cluster_id, (a.geom) geom FROM centroids a WHERE EXISTS (SELECT 1 FROM neg_buffer b WHERE ST_Intersects(a.geom, b.geom))),
neg_buffer_pts_in AS (SELECT a.cluster_id, (a.geom) geom FROM centroids a WHERE NOT EXISTS (SELECT 1 FROM neg_buffer b WHERE ST_Intersects(a.geom, b.geom))),
snap_pts_clusters_in AS (SELECT DISTINCT ST_ClosestPoint(ST_ExteriorRing(a.geom), b.geom) AS geom FROM neg_buffer a, neg_buffer_pts_in b),
node_pts AS (SELECT ST_StartPoint(ST_ExteriorRing(geom)) geom FROM neg_buffer),
snap_pts AS (SELECT b.cluster_id, a.geom FROM snap_pts_clusters_in a JOIN centroids b ON ST_DWithin(a.geom, b.geom, 0.4))
SELECT a.cluster_id, (a.geom) geom FROM snap_pts a WHERE NOT EXISTS (SELECT 1 FROM node_pts b WHERE ST_Intersects(a.geom, b.geom))
UNION SELECT c.cluster_id, (c.geom) geom FROM neg_buffer_pts_out c ORDER BY cluster_id;
I'm not sure of understanding your question so i give you a broad answer.
To create a table from a query you must use:
CREATE TABLE foo AS
SELECT * FROM my_table;
CTEs are builded as:
WITH
tmp1 AS (
SELECT * from my_table1
), -- commna
tmp2 AS (
SELECT * from my_table2
)
SELECT * from tmp1 JOIN tmp2 ON tmp1.id = tmp2.id -- no comma
;
Note that the are , to separate different "temporary" tables defined in the CTE but the final sentence is not preceded with a ,
So to create a table from a CTE the syntax will be:
CREATE TABLE foo AS
WITH
tmp1 AS (
SELECT * from my_table1
),
tmp2 AS (
SELECT * from my_table2
)
SELECT * from tmp1 JOIN tmp2 ON tmp1.id = tmp2.id -- no comma
;
Create a table from a VALUES clause is the same as the other cases:
CREATE TABLE polys2 AS
VALUES
(1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)
;
If you already have a table called polys2 that has been created for example like is shown in the previous example, you can replace
CREATE TABLE polys_pts AS
WITH
polys(poly_id, geom) AS (
VALUES
(1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)),
pnt_clusters AS (SELECT polys.poly_id, ...
with
CREATE TABLE polys_pts AS
WITH
polys(poly_id, geom) AS (
SELECT poly_id, geom FROM polys2
),
pnt_clusters AS (SELECT polys.poly_id, ...
um, the question is not 100% clear to me - ... I am not familiar with pecularities of postgresql, but my first bet would be to try
WITH polys(...) AS (...),
pnt_clusters AS (...)
CREATE polys_pts AS (
SELECT ..
FROM polys... etc.
)
but I guess this is not allowed since WITH only goes with DML statements (data manipulation unlike data definition (DDL) statements like CREATE)
so.. my next bet would be to try using polys and pnt_clusters that you defined inside WITH clause, inline inside the SELECT statement, given that
WITH a AS (
SELECT x, y FROM z
)
SELECT *
FROM a
is the same as
SELECT *
FROM (
SELECT x, y
FROM z
) AS a
well, otherwise I would split the process into two steps - create some kind of temporary tables first for polys and pnt_clusters and then do the create...
The definition of a CTE must be a complete statement, so you have to use
WITH polys(poly_id, geom) AS (
SELECT *
FROM (VALUES
(1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
(2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)
) AS p(p, g)
)

BigQuery arrays - SELECT DISTINCT ordering guarantees?

I want to filter out the duplicates from a BigQuery array. I also need the order of the elements to be preserved. The docs mention that this can be done by combining SELECT DISTINCT with UNNEST. However, it doesn't mention any ordering behavior. I ran this query and got the desired ordering of [5, 3, 1, 4, 10, 8].
WITH an_array AS (
SELECT [5, 5, 3, 1, 4, 4, 10, 8, 5, 1] AS nums
)
SELECT
ARRAY((
SELECT DISTINCT num
FROM UNNEST(nums) num
))
FROM an_array;
I don't know if that's coincidence or if that ordering is guaranteed. I also tried adding WITH OFFSET with an ORDER BY to specify the order explicitly, but in that case I get Query error: ORDER BY clause expression references table alias offset which is not visible after SELECT DISTINCT.
You should always be explicit about ordering if you care about it:WITH an_array AS (
WITH an_array as (
SELECT [5, 5, 3, 1, 4, 4, 10, 8, 5, 1] AS nums
)
SELECT ARRAY((SELECT num
FROM UNNEST(nums) num WITH OFFSET o
GROUP BY num
ORDER BY MIN(o)
)
)
FROM an_array;

Check if array element appears more than once

Is there a way to check an array of integers for any elements that appear more than once? Either boolean False if there are redundancies or list of offending elements would work.
unnest() the array and then use GROUP BY and HAVING with count() to filter for values that appear more than once.
SELECT un.n
FROM unnest('{1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 10}'::integer[]) un (n)
GROUP BY un.n
HAVING count(*) > 1;
To just get a Boolean you can use EXISTS and the above in a subquery.
SELECT EXISTS (SELECT un.n
FROM unnest('{1, 2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 10}'::integer[]) un (n)
GROUP BY un.n
HAVING count(*) > 1);
If you have the intarray extension installed, this will return false if you have duplicates:
icount(your_array) = icount(uniq(your_array))
If not, then I would use unnest() to return false in case of duplicates.
select count(unnest) = count(distinct unnest)
from your_table
cross join lateral unnest(your_array)

SQL ARRAY: Select ID from my_table where "arrayvalue" = "defined_arrayvalue"

This is a beginner-question relating arrays. I hope the answer is simple.
The example is taken from Oracle Spatial, but I think it is valid for all arrays.
I have this SELECT:
SELECT
D.FID
, D.GEOM.SDO_ELEM_INFO -- column GEOM contains spatial data
FROM
my_table D
I get this result:
73035 MDSYS.SDO_ELEM_INFO_ARRAY(1, 2, 1)
73036 MDSYS.SDO_ELEM_INFO_ARRAY(1, 4, 3, 1, 2, 1, 11, 2, 2, 19, 2, 1)
73037 MDSYS.SDO_ELEM_INFO_ARRAY(1, 2, 1)
Now I want to SELECT all rows where (1,2,1) is defined:
SELECT
D.FID
, D.GEOM.SDO_ELEM_INFO
FROM
my_table D
WHERE
-- Pseudo-Code is following
D.GEOM.SDO_ELEM_INFO is "(1, 2, 1)";
So, in simple words: "array_from_row = defined_array".
I found a lot about IMPLODE and TABLE and COLLECT etc. But how to define a clause on two arrays?
Thanks for help!
Try IN clause, you can also use both
SELECT
D.FID
, D.GEOM.SDO_ELEM_INFO
FROM
my_table D
WHERE
D.GEOM.SDO_ELEM_INFO in (1, 2, 1) or ( D.GEOM.SDO_ELEM_INFO = 1 or D.GEOM.SDO_ELEM_INFO = 2 or D.GEOM.SDO_ELEM_INFO = 3);

Google BigQuery Check If One Array is Superset/Subset of another

Given two arrays in Google BigQuery, I need to figure out whether ALL elements in one array are contained in the other.
As an example, I am trying to complete the following query:
WITH
ArrayA AS (
SELECT
[1, 2, 3] arrA,
UNION ALL
SELECT
[4, 5, 6])
ArrayB AS (
SELECT
[1, 2, 3, 4, 5] arrB)
SELECT
*
FROM
ArrayA
CROSS JOIN
ArrayB
WHERE
<your code goes here>
such that the result looks like
arrA | arrB
[1,2,3] | [1,2,3,4,5]
, since [1,2,3,4,5] is a superset of [1,2,3] but not a superset of [4,5,6].
Many thanks in advance.
You can check for every item in arrA, and then get minimum of it.
If all the items of arrA in arrB, there will be 3 trues, so the minimum will be true.
If at least one of them is not in arrB, there will be 2 true and 1 false, so the minimum will be false.
WITH
ArrayA AS (
SELECT [1, 2, 3] arrA,
UNION ALL
SELECT [4, 5, 6]
),
ArrayB AS (
SELECT [1, 2, 3, 4, 5] arrB
)
SELECT
*,
(
SELECT min(a in UNNEST(arrB))
FROM UNNEST(arrA) as a
) as is_a_in_b
FROM ArrayA
CROSS JOIN ArrayB
You can also make it a function and use it in many places. Sorry for bad naming :)
CREATE TEMP FUNCTION is_array_in_array(subset ARRAY<int64>, main ARRAY<int64>) AS ((SELECT min(a in UNNEST(main)) FROM UNNEST(subset) as a));
WITH
ArrayA AS (
SELECT [1, 2, 3] arrA,
UNION ALL
SELECT [4, 5, 6]
),
ArrayB AS (
SELECT [1, 2, 3, 4, 5] arrB
)
SELECT
*,
is_array_in_array(arrA, arrB) as is_a_in_b
FROM ArrayA
CROSS JOIN ArrayB
I think these conditions do what you want:
WITH
ArrayA AS (
SELECT ARRAY[1, 2, 3] arrA,
UNION ALL
SELECT ARRAY[4, 5, 6]),
ArrayB AS (
SELECT ARRAY[1, 2, 3, 4, 5] arrB)
SELECT *
FROM ArrayA a CROSS JOIN
ArrayB b
WHERE NOT EXISTS (SELECT a_el
FROM UNNEST(a.arrA) a_el LEFT JOIN
UNNEST(b.arrB) b_el
ON a_el = b_el
WHERE b_el IS NULL
) AND
NOT EXISTS (SELECT COUNT(*)
FROM UNNEST(a.arrA) a_el LEFT JOIN
UNNEST(b.arrB) b_el
ON a_el = b_el
HAVING COUNT(*) <> COUNT(b_el)
) ;