Select distinct values of comma-separated values column excluding subsets in PostgreSQL

Assume a table foo with a column bar that holds comma-separated values:
('a,b',
'a,b,c',
'a,b,c,d',
'd,e')
How can I select the largest combination and exclude all the subsets included in that combination (the largest one)?
For the data set above, the result should be:
('a,b,c,d', 'd,e'); the first two entries ('a,b', 'a,b,c') are excluded because they are subsets of ('a,b,c,d').
Note that the values in each comma-separated string are sorted alphabetically.
I tried the query below, but the results are far from what I need:
select distinct a.bar from foo a inner join foo b
on a.bar like '%'|| b.bar||'%'
and a.bar != b.bar

You can use string_to_array() to split the strings into arrays. With the contains operator, @>, you can check whether one array contains another. (See "9.18. Array Functions and Operators".)
Use that in a NOT EXISTS clause. fi.ctid <> fo.ctid makes sure the physical addresses of the compared pair of rows are not equal, since the array of a row would of course contain itself when compared to the same row.
SELECT fo.bar
FROM foo fo
WHERE NOT EXISTS (SELECT *
FROM foo fi
WHERE fi.ctid <> fo.ctid
AND string_to_array(fi.bar, ',') @> string_to_array(fo.bar, ','));
SQL Fiddle
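To make the NOT EXISTS logic easier to follow, here is a minimal sketch of the same idea in Python rather than SQL, so it can be run without a database. The function name maximal_rows is made up for illustration. Rows are compared by position, which plays the role of ctid; note that, like the query, this drops both copies if the same value appears twice.

```python
def maximal_rows(rows):
    """Keep rows whose element set is not contained in any other row's set."""
    sets = [set(r.split(",")) for r in rows]
    kept = []
    for i, row in enumerate(rows):
        # Compare by position (like ctid in the query), not by value.
        contained = any(j != i and sets[j] >= sets[i] for j in range(len(rows)))
        if not contained:
            kept.append(row)
    return kept


print(maximal_rows(["a,b", "a,b,c", "a,b,c,d", "d,e"]))  # ['a,b,c,d', 'd,e']
```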
But: Don't use comma-separated strings in a relational database. You've got something way better. It's called "table".

First process the string into sets of characters, and then cross join the character-sets with itself, excluding rows where the character-sets on both sides are the same.
Next, aggregate and use BOOL_OR in a HAVING clause to filter out any character-set that is a subset of any other character-set.
With a sample table declared in the CTE, the query becomes:
WITH foo(bar) AS (SELECT '("a,b" , "a,b,c" , "a,b,c,d" , "d,e")'::TEXT)
SELECT bar, string_to_array(elems[1], ',') not_subset
FROM foo
CROSS JOIN regexp_matches(bar, '[\w|,]+', 'g') elems
CROSS JOIN regexp_matches(bar, '[\w|,]+', 'g') elems2
WHERE elems2[1] != elems[1]
-- my regex also matches the ',' between sets which need to be ignored
-- alternatively, i have to refine the regex
AND elems2[1] != ','
AND elems[1] != ','
GROUP BY 1, 2
HAVING NOT BOOL_OR(string_to_array(elems[1], ',') <@ string_to_array(elems2[1], ','))
produces the output
bar not_subset
'("a,b" , "a,b,c" , "a,b,c,d" , "d,e")' {'d','e'}
'("a,b" , "a,b,c" , "a,b,c,d" , "d,e")' {'a','b','c','d'}
Example in SQL Fiddle

Related

concat two strings and put smaller string at first in sql server

I want to concatenate two varchar values from columns A and B, like "1923X" and "11459", with a hash sign between them, and I always want the smaller string to come first. How should I do this in a SQL Server query?
inputs:
Two Columns
A="1923X"
B="11459"
procedure:
we compare the two inputs character by character from left to right; in this example the second character in B (1) is smaller than the second character in A (9), so B is smaller.
result: new column C
"11459#1923X"
Original answer:
If you need to order the input strings, not only by second character, STRING_AGG() is also an option:
DECLARE @a varchar(5) = '1923X'
DECLARE @b varchar(5) = '11459'
SELECT STRING_AGG(v.String, '#') WITHIN GROUP (ORDER BY v.String) AS Result
FROM (VALUES (@a), (@b)) v (String)
Output:
Result
11459#1923X
Update:
You changed the requirements (now the strings are stored in two columns), so you need a different statement:
SELECT
A,
B,
C = (
SELECT STRING_AGG(v.String, '#') WITHIN GROUP (ORDER BY v.String)
FROM (VALUES (A), (B)) v (String)
)
FROM (VALUES ('1923X', '11459')) t (a, b)
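The idea behind both statements is simply "sort the two strings, then join them with the separator". A minimal sketch of that in Python (the helper name concat_smallest_first is hypothetical) looks like the block below. One assumption to keep in mind: Python compares strings by code point, while SQL Server orders by the column's collation, so the two can disagree for mixed-case or accented input.

```python
def concat_smallest_first(a, b, sep="#"):
    """Join two strings with sep, putting the lexicographically smaller first."""
    return sep.join(sorted([a, b]))


print(concat_smallest_first("1923X", "11459"))  # 11459#1923X
```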

List characters (and their count) which meet a regex

This query lists the values of column col that contain at least one non-alphanumeric character:
select col
from foo
where col ~ '[^a-zA-Z0-9]';
My aim is to list all the distinct characters that match the regex, along with how often each one occurs.
Try this:
SELECT c.char[1], count(*) AS count
FROM foo
CROSS JOIN LATERAL regexp_matches(col, '[^a-zA-Z0-9]', 'g') AS c(char)
GROUP BY c.char
see the test result in dbfiddle.
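For comparison, the same "find every match, then group and count" logic can be sketched in Python. This is only an illustration of what the query does, not part of the answer; the function name count_non_alnum and the sample values are made up.

```python
import re
from collections import Counter


def count_non_alnum(values):
    """Count every non-alphanumeric character across all values."""
    counts = Counter()
    for value in values:
        # findall returns each matching character, like regexp_matches with 'g'.
        counts.update(re.findall(r"[^a-zA-Z0-9]", value))
    return counts


print(count_non_alnum(["a-b", "c_d-e", "xyz"]))
```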

SQL Array with Null

I'm trying to group BigQuery columns using an array like so:
with test as (
select 1 as A, 2 as B
union all
select 3, null
)
select *,
[A,B] as grouped_columns
from test
However, this won't work, since there is a null value in column B row 2.
In fact this won't work either:
select [1, null] as test_array
When reading the documentation on BigQuery though, it says Nulls should be allowed.
In BigQuery, an array is an ordered list consisting of zero or more
values of the same data type. You can construct arrays of simple data
types, such as INT64, and complex data types, such as STRUCTs. The
current exception to this is the ARRAY data type: arrays of arrays are
not supported. Arrays can include NULL values.
There don't seem to be any attributes or a SAFE prefix that can be used with ARRAY() to handle nulls.
So what is the best approach for this?
Per the documentation for the ARRAY type:
Currently, BigQuery has the following two limitations with respect to NULLs and ARRAYs:
BigQuery raises an error if the query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
BigQuery translates a NULL ARRAY into an empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
So, for your example, you can use the "trick" below:
with test as (
select 1 as A, 2 as B union all
select 3, null
)
select *,
array(select cast(el as int64) el
from unnest(split(translate(format('%t', t), '()', ''), ', ')) el
where el != 'NULL'
) as grouped_columns
from test t
above gives below output
Note: the above approach does not require explicitly referencing every column involved!
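If the string manipulation in that query is hard to follow, here is a rough Python sketch of the same steps. It assumes that format('%t', t) renders a row as text like "(1, 2)" or "(3, NULL)"; the function name non_null_ints is hypothetical.

```python
def non_null_ints(row_text):
    """Mimic translate/split/filter/cast on a row rendered as text."""
    # translate(..., '()', ''): drop the surrounding parentheses.
    inner = row_text.translate(str.maketrans("", "", "()"))
    # split(..., ', ') then keep everything except the literal text NULL.
    return [int(el) for el in inner.split(", ") if el != "NULL"]


print(non_null_ints("(1, 2)"))     # [1, 2]
print(non_null_ints("(3, NULL)"))  # [3]
```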
My current solution, and I'm not a fan of it, is to use a combination of IFNULL(), UNNEST() and ARRAY() like so:
select
*,
array(
select *
from unnest(
[
ifnull(A, ''),
ifnull(B, '')
]
) as grouping
where grouping <> ''
) as grouped_columns
from test
Alternatively, you can replace the NULL value with some non-NULL value using IFNULL(), for example IFNULL(null, 0), as shown below:
with test as (
select 1 as A, 2 as B
union all
select 3, IFNULL(null, 0)
)
select *,
[A,B] as grouped_columns
from test

Extract a string in Postgresql and remove null/empty elements

I need to extract values from a string with PostgreSQL.
But for my special scenario, if an element value is null I want to remove it and move the next element one index closer.
e.g.
assume my string is: "a$$b"
If I use
select string_to_array('a$$b','$')
The result is:
{a,,b}
If I try
SELECT unnest(string_to_array('a__b___d_','_')) EXCEPT SELECT ''
It changes the order
1.d
2.a
3.b
The order changes, which is bad for me.
I have found another solution:
select array_remove( string_to_array(a||','||b||','||c,',') , '')
from (
select
split_part('a__b','_',1) a,
split_part('a__b','_',2) b,
split_part('a__b','_',3) c
) inn
Returns
{a,b}
And then I need to extract values from the array by index,
e.g. Extract(ARRAY,2)
But this seems like overkill to me. Is there something better or simpler to use?
You can use with ordinality to preserve the index information during unnesting:
select a.c
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null
order by idx;
If you want that back as an array:
select array_agg(a.c order by a.idx)
from unnest(string_to_array('a__b___d_','_')) with ordinality as a(c,idx)
where nullif(trim(c), '') is not null;
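To show what the ordered filter is doing, here is a small Python sketch of the same "split, drop blank pieces, keep the original order" logic. It is only an illustration; the helper name split_non_empty is made up.

```python
def split_non_empty(text, sep):
    """Split text on sep and drop empty or whitespace-only parts, keeping order."""
    return [part for part in text.split(sep) if part.strip()]


parts = split_non_empty("a__b___d_", "_")
print(parts)     # ['a', 'b', 'd']
print(parts[1])  # b  (second element, index 2 in PostgreSQL's 1-based arrays)
```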

Return string of description given string of IDs (separated by commas)

I have one table A that has a column C and a lookup table (lookup) that provides a description given an ID.
Here is the setup:
table A with column C values:
1,2,3
2,3,4
table lookup:
1, 'This'
2, 'is'
3, 'tricky'
4, 'SQL'
Provide a SQL (SQL Server 2005) statement that returns the following strings:
Input: 1,2,3 Output: 'This','Is','tricky'
Input: 2,3,4 Output: 'Is','tricky','SQL'
Basically, this turns the string of IDs (from an input table A) into a string of descriptions.
The Samples that come with SQL Server 2005 include a CLR function called Split(). It's the best way of splitting comma-separated lists like this by far.
Suppose you have a table called inputs, with a column called input, and a lookup table called mapping, with columns number and string.
I forget what the particular outputs of dbo.Split() are... so work with me here. Let's call the fields id and val, where id tells us which entry it is in the list.
WITH
separated AS (
SELECT i.input, s.id, s.val
FROM
dbo.inputs AS i
CROSS APPLY
dbo.Split(i.input) AS s
)
, converted AS (
SELECT s.input, s.id, m.string
FROM
separated AS s
JOIN
dbo.mapping AS m
ON m.number = CAST(s.val AS varchar(5))
)
SELECT c.input, (SELECT string + ' ' FROM converted AS c2 WHERE c2.input = c.input ORDER BY id FOR XML PATH('')) AS converted_string
FROM converted AS c
GROUP BY c.input;
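The three steps in that query (split the input, map each ID to its description, concatenate in the original order) can be sketched in Python as follows. This is just an illustration using the lookup values from the question; the names LOOKUP and describe are hypothetical, and unlike the query above it joins with commas and quotes to match the requested output.

```python
# Lookup values taken from the question's lookup table.
LOOKUP = {1: "This", 2: "is", 3: "tricky", 4: "SQL"}


def describe(id_list, lookup=LOOKUP):
    """Turn a comma-separated string of IDs into quoted descriptions."""
    ids = [int(x) for x in id_list.split(",")]       # split step
    words = [lookup[i] for i in ids]                 # mapping step, order kept
    return ",".join("'" + w + "'" for w in words)    # concatenation step


print(describe("1,2,3"))  # 'This','is','tricky'
print(describe("2,3,4"))  # 'is','tricky','SQL'
```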