How to get unique values from a column of arrays in PostgreSQL

I am trying to extract the distinct values from a column of arrays.
For example, if I have two rows:
{jonathan,michelle}
{jonathan,michael}
The output should be:
{jonathan,michelle,michael}
The output can be an array or a "virtual column"; either is fine.

The following seems to do the trick:
SELECT DISTINCT(unnest(tbl.ar)) FROM tbl

You can unnest and aggregate back, ignoring duplicates:
select array_agg(distinct u.val) new_ar
from mytable t
cross join lateral unnest(t.ar) as u(val)
Note that this does not guarantee the order in which elements will appear in the final array (there are options, but you did not specify what you wanted in that regard).
Demo on DB Fiddle:
| new_ar |
| :-------------------------- |
| {jonathan,michael,michelle} |
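To sanity-check what the query should return, the same logic can be modeled outside the database. This is only a rough Python sketch, not the PostgreSQL implementation: the function name distinct_elements is made up for illustration, and the result is sorted here only to make the output deterministic.

```python
# Sample data taken from the question: one list per table row.
rows = [["jonathan", "michelle"], ["jonathan", "michael"]]


def distinct_elements(arrays):
    """Flatten every array (like unnest) and keep each value once (like DISTINCT)."""
    seen = set()
    for ar in arrays:
        # A NULL array contributes no elements, as with unnest(NULL).
        seen.update(ar or [])
    # Sorting is an assumption made for reproducible output only.
    return sorted(seen)


print(distinct_elements(rows))
```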

Related

Get Distinct value from a list in SQL Server

I have a DB column that has a comma delimited list:
VALUES ID
--------------------
1,11,32 A
11,12,28 B
1 C
32,12,1 D
When I run my SQL statement, in my WHERE clause I have tried IN, CONTAINS and LIKE with varying degrees of errors and success, but none offer an exact return of what I need.
What I need is a WHERE clause that returns all IDs whose list contains the value '1' (the string, NOT the number).
Example of problem:
WHERE values like (1)
This will return A,B,C,D because 1 is included in the value (11). I would expect IDs (A,C,D).
WHERE values like (2)
This will return A,B,D because 2 is included in the values (32,28,12). I would expect zero records.
Thanks in advance for your help!
I will begin my answer by quoting the spot-on comment given by @Jarlh above:
Never, ever store data as comma separated items. It will only cause you lots of trouble.
That being said, if you're really stuck with this design, you could use:
SELECT *
FROM yourTable
WHERE ',' + [VALUES] + ',' LIKE '%,1,%';
The trick here is to convert every VALUES entry into something looking like:
,11,12,28,
Then, we can search for a target number with comma delimiters on both sides. Since we placed commas at both ends, then every number in the CSV list is now guaranteed to have commas around it.
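The delimiter trick can be tried out quickly with an in-memory SQLite database from Python. This is a sketch under assumptions: SQLite concatenates with || instead of SQL Server's +, the column is renamed to vals (a hypothetical name, to avoid the reserved word VALUES), and ids_containing is a helper invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (vals TEXT, id TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("1,11,32", "A"), ("11,12,28", "B"), ("1", "C"), ("32,12,1", "D")],
)


def ids_containing(target):
    """Return the ids whose list contains target as a whole item."""
    # Wrap the stored list in commas so every item has a comma on both sides,
    # then look for ",target," anywhere in the wrapped string.
    pattern = f"%,{target},%"
    rows = conn.execute(
        "SELECT id FROM t WHERE ',' || vals || ',' LIKE ? ORDER BY id",
        (pattern,),
    ).fetchall()
    return [r[0] for r in rows]


print(ids_containing("1"))
print(ids_containing("2"))
```

With the sample rows from the question, searching for 1 should no longer match the row that only contains 11, and searching for 2 should match nothing.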
If you are stuck with such a poor data model, I would suggest:
select t.*
from t
where exists (select 1
from string_split(t.[values], ',') s
where s.value = '1'
);
I echo exactly what jarlh and Tim say: a relational table is not the right place to store comma-delimited strings.
Here is an approach that may be able to use an index on column x, if there is one:
select distinct x
from t
cross apply string_split(t.x,',')
where value=1 /* this value can be parameterized */
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b9b3084f52b0f42ffd17d90427016999
--SQL Server older versions
with data
as (
SELECT t.c.value('.', 'VARCHAR(1000)') as val
,y
,x
FROM (
SELECT x1 = CAST('<t>' +
REPLACE(x , ',', '</t><t>') + '</t>' AS XML)
,y
,x
FROM t
) a
CROSS APPLY x1.nodes('/t') t(c)
)
select x,y
from data
where val = '1' -- filter on the split value, matching the output below
+---------+
| x |
+---------+
| 1 |
| 1,11,32 |
| 32,12,1 |
+---------+
working example
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=011a096bbdd759ea5fe3aa74b08bc895

How to retrieve records using comma separated values with IN clause?

I would like to retrieve certain records from a table. I am using comma-separated values with an IN clause. The table rows look like this:
Here is my SQL query, but it completes with an empty result set:
DECLARE @input VARCHAR(1000) = '2,3,17,10,16'
SELECT * FROM locations
WHERE
east_zone in (SELECT VALUE FROM string_split(@input,','))
OR
west_zone in (SELECT VALUE FROM string_split(@input,','))
Appreciate your help!
While this can be accomplished, I would ask you to rethink your data model. It's a bad idea to store a comma-separated list of ids/references in your database. I strongly agree with the comments of Tim Biegeleisen.
An alternative would be to store the list of zones/titles in a separate table.
Here is a way to accomplish this
with data
as (select 'model_check_holding' as col1,'1,2,3,4,5' as str union all
select 'model_cash_holding' as col1,'5,8,9' as str
)
,split_data
as (select *
from data
cross apply string_split(str,',')
)
,user_input
as(select '2,8,1' as input_val)
select *
from split_data
where value in (select x.value
from user_input
cross apply string_split(input_val,',') x
)
+---------------------+-----------+-------+
| col1 | str | value |
+---------------------+-----------+-------+
| model_check_holding | 1,2,3,4,5 | 1 |
| model_check_holding | 1,2,3,4,5 | 2 |
| model_cash_holding | 5,8,9 | 8 |
+---------------------+-----------+-------+
dbfiddle link
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=1cc9b224e443369744df19c1d7a7d789
Tim is 110% correct. Your data model is totally messed up: not only are you storing multiple values in a delimited string, you are also storing numbers as strings. Wrong, wrong, wrong.
But if you are stuck with someone else's really, really, really bad design choices, you do have an option:
DECLARE @input VARCHAR(1000) = '2,3,17,10,16';
SELECT l.*
FROM locations l
WHERE EXISTS (SELECT 1
FROM string_split(@input, ',') s1 JOIN
string_split(concat(l.east_zone, ',', l.west_zone), ',') s2
ON s1.value = s2.value
);
I do not recommend this approach. I merely suggest it as a stop-gap until you can fix the data model.
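The matching rule behind that EXISTS query can be modeled in a few lines of Python to check expectations before running it. This is a sketch only: row_matches is a hypothetical helper, and the zone values in the usage lines are invented, since the question's table contents are not shown.

```python
def row_matches(east_zone, west_zone, user_input):
    """True if any item of the input list equals any item stored in either zone column."""
    wanted = set(user_input.split(","))
    # Mirror concat(east_zone, ',', west_zone): NULL (None) is treated as an empty string.
    stored = set(f"{east_zone or ''},{west_zone or ''}".split(","))
    # Items are compared as whole strings, so '17' does not match '170'.
    return bool(wanted & stored)


user_input = "2,3,17,10,16"
print(row_matches("1,2", "7", user_input))
print(row_matches("4,5", "6,7", user_input))
```

Like string_split, this compares items verbatim, so stray spaces around the commas would prevent a match.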

Remove duplicate entries from string array column of postgres

I have a PostgreSQL table with a column that holds an array of strings. Some rows contain only unique strings, while others contain duplicates. I want to remove the duplicate strings from each row where they exist.
I have tried some queries but couldn't make it work.
Following is the table:
veh_id | vehicle_types
--------+----------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8","viper"}
7 | {"ferrariff","viper","viper","volt"}
I am expecting following output:
veh_id | vehicle_types
--------+----------------------------------------
1 | {"byd_tang","volt","viper","laferrari"}
2 | {"volt","viper"}
3 | {"byd_tang","sonata","jaguarxf"}
4 | {"swift","teslax","mirai"}
5 | {"volt","viper"}
6 | {"viper","ferrariff","bmwi8"}
7 | {"ferrariff","viper","volt"}
Since each row's array is independent, a plain correlated subquery with an ARRAY constructor would do the job:
SELECT *, ARRAY(SELECT DISTINCT unnest (vehicle_types)) AS vehicle_types_uni
FROM vehicle;
See:
Why is array_agg() slower than the non-aggregate ARRAY() constructor?
Note that NULL is converted to an empty array ('{}'). We'd need to special-case it, but it is excluded in the UPDATE below anyway.
Fast and simple. But don't use this. You didn't say so, but typically you'd want to preserve original order of array elements. Your rudimentary sample suggests as much. Use WITH ORDINALITY in the correlated subquery, which becomes a bit more sophisticated:
SELECT *, ARRAY (SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
) AS vehicle_types_uni
FROM vehicle;
See:
PostgreSQL unnest() with element number
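The idea behind GROUP BY with ORDER BY min(ord) is to remember the first position of each value and then sort by that position. A rough Python model of that logic follows; it is an illustration only, and dedupe_keep_order is a name made up for this sketch.

```python
def dedupe_keep_order(values):
    """Drop duplicates while keeping each value at its first position."""
    first_pos = {}
    # enumerate(..., start=1) plays the role of WITH ORDINALITY.
    for ord_, v in enumerate(values, start=1):
        # Keep only the smallest ordinal per value, like min(ord) per group.
        first_pos.setdefault(v, ord_)
    # ORDER BY min(ord)
    return sorted(first_pos, key=first_pos.get)


print(dedupe_keep_order(["viper", "ferrariff", "bmwi8", "viper"]))
print(dedupe_keep_order(["ferrariff", "viper", "viper", "volt"]))
```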
UPDATE to actually remove dupes:
UPDATE vehicle
SET vehicle_types = ARRAY (
SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
)
WHERE cardinality(vehicle_types) > 1 -- optional
AND vehicle_types <> ARRAY (
SELECT v
FROM unnest(vehicle_types) WITH ORDINALITY t(v,ord)
GROUP BY 1
ORDER BY min(ord)
); -- suppress empty updates (optional)
Both added WHERE conditions are optional to improve performance. The 1st one is completely redundant. Each condition also excludes the NULL case. The 2nd one suppresses all empty updates.
See:
How do I (or can I) SELECT DISTINCT on multiple columns?
If you tried to do that without preserving original order, you'd likely update most rows without need, just because the order of elements changed even without dupes.
Requires Postgres 9.4 or later.
db<>fiddle here
I don't claim it's efficient, but something like this might work:
with expanded as (
select veh_id, unnest (vehicle_types) as vehicle_type
from vehicles
)
select veh_id, array_agg (distinct vehicle_type)
from expanded
group by veh_id
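If you really want to get fancy and do it in a single pass over each array while keeping the original order, you can write a custom function. Note that the = ANY check rescans the output array for every element, so the worst case is O(n²) rather than O(n), which is fine for short arrays: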
create or replace function unique_array(input_array text[])
returns text[] as $$
DECLARE
output_array text[];
i integer;
BEGIN
output_array = array[]::text[];
for i in 1..cardinality(input_array) loop
if not (input_array[i] = any (output_array)) then
output_array := output_array || input_array[i];
end if;
end loop;
return output_array;
END;
$$
language plpgsql
Usage example:
select veh_id, unique_array(vehicle_types)
from vehicles

Extracting Multiple Numerical Values from Text

048(70F-Y),045(DDI-Y),454(CMDE-Y)
I have the above data in a column. I need to extract each number before the (, so in the above example I would want to see 048, 045, 454.
Note that the data in the field changes from record to record. In the example above there are 3 sets of numbers, but sometimes there may be just one set, or 6 sets. I just need to capture all sets of numbers that are to the left of the (.
Ideally I would want the results to show in a new column like below. I have tried a few things and gotten nowhere; any help would be greatly appreciated.
I would expect the result to look like the below:
+----------+-----------------------------------+---------------+
| EventId | PAEditTypes | Edits |
+----------+-----------------------------------+---------------+
| 6929107 | 082(SPA-Y),177(QL-Y) | 082, 177 |
| 26534980 | 048(70F-Y),045(DDI-Y),454(CMDE-Y) | 045, 048, 454 |
+----------+-----------------------------------+---------------+
You can get the desired output with the following steps:
use string_split with cross apply to isolate each item
use left to get only the first part of each item together with CHARINDEX to know where you have to stop
use STRING_AGG to build the final result, adding WITHIN GROUP clause to enforce ordering (if ordering is not important just remove WITHIN GROUP clause)
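The three steps above can be checked quickly outside the database. Here is a small Python sketch of the same split / take-left / join logic; it is not the T-SQL itself, and extract_edits is a hypothetical helper name.

```python
def extract_edits(pa_edit_types):
    """Return the codes that precede each '(' as a sorted, comma-separated string."""
    # Step 1: isolate each item (like string_split) and sort the items.
    items = sorted(pa_edit_types.split(","))
    # Step 2: keep only the text left of the first '('.
    # item.index raises ValueError if an item has no '(' at all.
    codes = [item[: item.index("(")] for item in items]
    # Step 3: glue the codes back together (like STRING_AGG).
    return ", ".join(codes)


print(extract_edits("048(70F-Y),045(DDI-Y),454(CMDE-Y)"))
print(extract_edits("082(SPA-Y),177(QL-Y)"))
```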
This is a TSQL sample that should work:
declare @tmp table ( EventId varchar(50), PAEditTypes varchar(200) )
insert into @tmp values
('6929107' ,'082(SPA-Y),177(QL-Y)' )
,('26534980','048(70F-Y),045(DDI-Y),454(CMDE-Y)')
select
EventId
, PAEditTypes
, STRING_AGG(left(value,CHARINDEX('(',value)-1),', ') WITHIN GROUP (ORDER BY value ASC) as Edits
from
@tmp
cross apply
string_split(PAEditTypes, ',')
group by
EventId
, PAEditTypes
order by
EventId desc
Output:

Using Sybase ASE 12.5.4 and looking to list / concat string values

I have a table that looks like this
I want to create a query so that same values of column 1 and column 2 (tcbname and status) are grouped and column 3 (scope_name) lists all the scopes related to that status and tcb_name.
Below is my expected outcome
| TUVAmerica, inc | | E | |<all the scope_name values that match first two column>|
It seems to me you need group_concat()
select tcb_name ,status,
group_concat(scope_name separator ',') as list_of_scope
from your_table
group by tcb_name,status
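To see what a grouped concatenation produces, here is a small runnable sketch using Python's built-in sqlite3 module. Two caveats: it uses SQLite's group_concat(expr, separator) form, and it makes no claim that Sybase ASE 12.5.4 offers the same function. The sample rows are invented, because the question's table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table (tcb_name TEXT, status TEXT, scope_name TEXT)"
)
# Hypothetical rows; only the tcb_name and status 'E' appear in the question.
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        ("TUVAmerica, inc", "E", "scope_a"),
        ("TUVAmerica, inc", "E", "scope_b"),
        ("TUVAmerica, inc", "A", "scope_c"),
    ],
)

query = """
    SELECT tcb_name, status, group_concat(scope_name, ',') AS list_of_scope
    FROM your_table
    GROUP BY tcb_name, status
"""
# SQLite does not promise an order inside the concatenated list,
# so each list is split and sorted before use.
grouped = {
    (tcb_name, status): sorted(scopes.split(","))
    for tcb_name, status, scopes in conn.execute(query)
}
print(grouped)
```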