Transform set of data into a single column - sql

Let's say I have this set of integers enclosed in parentheses: (1,2,3,4,5).
Data I have:
(1,2,3,4,5)
And I would want them to be in a single column.
Expected Output:
column
--------
1
2
3
4
5
(5 rows)
How can I do this? I've tried using an array and then unnest, but with no luck. I know I'm doing something wrong.
I need this to optimize a query that uses a large IN list: I want to put the values in a temp table and then join it to the main table.

You can convert the string to an array, then do the unnest:
select *
from unnest(translate('(1,2,3,4,5)', '()', '{}')::int[]);
The translate() call converts '(1,2,3,4,5)' to '{1,2,3,4,5}' which is the string representation of an array. That string is then cast to an array using ::int[].
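To see the two steps separately, here is a minimal sketch (PostgreSQL assumed, as in the rest of this answer; the column aliases are only illustrative) that selects the intermediate string next to the final array:

```sql
-- Step 1: swap the parentheses for braces to get a valid array literal (still text).
-- Step 2: cast that text to an integer array.
select translate('(1,2,3,4,5)', '()', '{}')        as array_literal,
       translate('(1,2,3,4,5)', '()', '{}')::int[] as int_array;
```

Both columns print the same way in psql, but only the second one can be passed to unnest() or used with = any (...) against an integer column.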
You don't need a temp table; you can join directly to the result of the unnest:
select *
from some_table t
join unnest(translate('(1,2,3,4,5)', '()', '{}')::int[]) as l(id)
on t.id = l.id;
Another option is to simply use that array in a where condition:
select *
from some_table t
where t.id = any (translate('(1,2,3,4,5)', '()', '{}')::int[]);

Related

BigQuery, Converting string into an ARRAY

I have the following table in BigQuery:
A       B
First   [joe, becky, smith]
Second  [joe, matthew]
Column B has type STRING.
I want to convert column B into a BigQuery array (ARRAY<STRING>).
I attempted to use JSON_EXTRACT_ARRAY, but this does not work because the elements inside the arrays of B are not enclosed in double quotes (") (i.e. they are not of the form ["joe", "becky", "smith"]).
Consider below
select a,
array(select trim(val) from unnest(split(trim(b, '[]'))) val) b
from `project.dataset.table`
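When applied to the sample data in your question, column b comes back as an array of strings (joe, becky, smith for the first row and joe, matthew for the second).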
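To try this without a real table, the sample rows from the question can be inlined in a CTE. This is only a sketch: the CTE name sample is made up for illustration, and it assumes the names themselves never contain a comma.

```sql
-- BigQuery Standard SQL; "sample" stands in for `project.dataset.table`.
with sample as (
  select 'First' as a, '[joe, becky, smith]' as b union all
  select 'Second', '[joe, matthew]'
)
select a,
  -- trim(b, '[]') strips the brackets, split() cuts on commas (the default),
  -- and trim(val) removes the space left after each comma.
  array(select trim(val) from unnest(split(trim(b, '[]'))) val) as b
from sample
```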

Hive Delimiter using :

I want to extract a column A that has values such as W:X:Y:Z.
I am interested to extract Z from Column A.
I tried multiple commands such as SPLIT(Table.A, "[:]"[3] ) but get an error.
What is the best way to do this?
The split function returns an array, so the index [3] has to be applied to the result of split(), not to the delimiter argument. Array indexes start at 0, so [3] is the fourth element:
with yourtable as ( -- use your table instead of this
select 'W:X:Y:Z' as A
)
select split(A,'\\:')[3] from yourtable;
Result:
Z
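If the number of parts varies and you always want the last one, one possible variation (a sketch, assuming Hive's built-in size() function and zero-based array indexing; the alias last_part is made up) computes the index instead of hard-coding 3:

```sql
with yourtable as ( -- use your table instead of this
select 'W:X:Y:Z' as A
)
-- size() returns the number of elements, so the last index is size - 1
select split(A, ':')[size(split(A, ':')) - 1] as last_part
from yourtable;
```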

Get max on comma separated values in column

How can I get the max of the comma-separated values in the Original_Ids column, with the max value in one column and the remaining ids in a different column?
|Original_Ids | Max_Id| Remaining_Ids |
|123,534,243,345| 534 | 123,243,345 |
Update:
What if I already have Max_id and just need the equation below?
Remaining_Ids = Original_Ids - Max_id
Thanks
Thanks to the excellent possibilities of array manipulation in Postgres, this can be done relatively easily by converting the string to an array and from there to a set.
Regular queries on that set are then possible. With max() the maximum can be selected, and with EXCEPT ALL the maximum can be removed from the set.
The set can then be converted back to an array with ARRAY(), and array_to_string() turns that array into a delimited string again.
SELECT ids original_ids,
       (SELECT max(un.id::integer)
        FROM unnest(string_to_array(ids, ',')) un(id)) max_id,
       array_to_string(ARRAY((SELECT un.id::integer
                              FROM unnest(string_to_array(ids, ',')) un(id)
                              EXCEPT ALL
                              SELECT max(un.id::integer)
                              FROM unnest(string_to_array(ids, ',')) un(id))),
                       ',') remaining_ids
FROM elbat;
Another option would have been regexp_split_to_table(), which directly produces a set (or regexp_split_to_array(), but then we'd have the possible regular expression overhead and would still have to convert the array to a set).
Nevertheless, you should (almost) never use delimited lists (nor arrays). Use a table; that's (almost) always the best option.
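As a rough illustration of the "use a table" advice, here is a sketch with one row per id. The table elbat_ids and its columns are hypothetical names, not something from the question:

```sql
-- Hypothetical normalized layout: one row per (group, id) instead of a delimited string.
CREATE TEMP TABLE elbat_ids (
    group_id integer NOT NULL,
    id       integer NOT NULL
);

INSERT INTO elbat_ids (group_id, id)
VALUES (1, 123), (1, 534), (1, 243), (1, 345);

-- No splitting or casting needed: plain aggregates do the work.
SELECT group_id,
       max(id) AS max_id,
       array_remove(array_agg(id ORDER BY id), max(id)) AS remaining_ids
FROM   elbat_ids
GROUP  BY group_id;
```

Note that array_remove() drops every occurrence of the maximum, whereas the EXCEPT ALL query above removes only one, so the two differ when the maximum is duplicated.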
You can use a window function (https://www.postgresql.org/docs/current/static/tutorial-window.html) to get the max element per unnested array. After that you can reaggregate the elements and remove the calculated max value from the array.
Result:
a                   max_elem  remaining
123,534,243,345     534       123,243,345
3,23,1              23        3,17
42                  42
56,123,234,345,345  345       56,123,234
This query needs only one split/unnest as well as only one max calculation.
SELECT
    a,
    max_elem,
    array_remove(array_agg(elements), max_elem) as remaining -- C
FROM (
    SELECT
        *,
        MAX(elements) OVER (PARTITION BY a) as max_elem -- B
    FROM (
        SELECT
            a,
            unnest((string_to_array(a, ','))::int[]) as elements -- A
        FROM arrays
    )s
)s
GROUP BY a, max_elem
A: string_to_array converts the string list into an array. Because the result is a string array, you need to cast it to an integer array by adding ::int[]. The unnest() expands all array elements into their own rows.
B: the window function MAX gives the maximum value of each array as max_elem.
C: array_agg reaggregates the elements through the GROUP BY a, max_elem. After that, array_remove removes the max_elem value from the array.
If you prefer to store them as a string list again rather than as arrays, you could add array_to_string. But I wouldn't recommend this, because your data are integer arrays and not strings; every further calculation would need this string cast again. An even better way (as already stated by #stickybit) is not to store the elements as arrays but as unnested data. As you can see, nearly every operation has to do the unnest first.
Note:
It would be better to use an ID to address the columns/arrays instead of the original string, as in the SQL Fiddle with IDs.
If you install the extension intarray this is quite easy.
First you need to create the extension (you have to be superuser to do that):
create extension intarray;
Then you can do the following:
select original_ids,
original_ids[1] as max_id,
sort(original_ids - original_ids[1]) as remaining_ids
from (
select sort_desc(string_to_array(original_ids,',')::int[]) as original_ids
from bad_design
) t
But you shouldn't be storing comma-separated values to begin with.

Postgres Array[VarChar] uppercase?

I'm trying to find a way to force an array to make it upper or lowercase. This is so that no matter what the user inputs they get a result. This is the query:
select * from table where any(:id) = databasecolumn
:id is an array of chars that the user inputs (it can be lowercase or uppercase), and I need to make sure that whatever the user inputs, they get a result.
This works as long as the user inputs in uppercase (because the database values are also uppercase). But when they input lowercase letters they get no response.
I tried this:
select * from table where any(upper(:id)) = upper(databasecolumn)
but this does not work because the function "upper" is not for arrays. It works fine when I do it with a single input but not arrays.
Do you have any pointers? I couldn't find an equivalent function for an array of varchars.
You could use ILIKE:
select *
from table
where databasecolumn ILIKE any(:id);
This:
with data (col) as (
values ('one'), ('Two'), ('THREE')
)
select *
from data
where col ilike any(array['one', 'two', 'three']);
returns:
col
-----
one
Two
THREE
You can use double casting, like here:
t=# with a as (select '{caSe1,cAse2}'::text[] r) select r,upper(r::text)::text[] from a where true;
r | upper
---------------+---------------
{caSe1,cAse2} | {CASE1,CASE2}
(1 row)
It neglects the benefits of using ANY, though.
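Another possibility that keeps ANY is to uppercase the elements one by one and rebuild the array. The following is a sketch only: the CTEs data and input and their column names are made up to make it self-contained, and input.ids stands in for the :id parameter.

```sql
with data (col) as (
    values ('ONE'), ('TWO'), ('THREE')
),
input (ids) as (
    -- stands in for the user-supplied :id array
    values (array['one', 'Three']::text[])
)
select d.col
from data d
cross join input i
-- unnest the array, uppercase each element, and collect the results into a new array
where d.col = any (array(select upper(u.x) from unnest(i.ids) as u(x)));
```

Unlike ILIKE, this is a plain equality comparison, so % and _ in the user input are not treated as wildcards.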

Postgres query to calculate matching strings

I have following table:
id description additional_info
123 XYZ XYD
And an array as:
[{It is known to be XYZ},{It is know to be none},{It is know to be XYD}]
I need to match the two in such a way that, for every record of the table, I can determine the number of successful matches.
The result of the above example will be:
id RID Matches
1 123 2
Only the content at position 0 and 2 match the record's description/additional_info so Matches is 2 in the result.
I am struggling to transform this to a query in Postgres - dynamic SQL to create a VIEW in a PL/pgSQL function to be precise.
It's undefined how to deal with array elements that match both description and additional_info at the same time. I'll assume you want to count that as 1 match.
It's also undefined where id = 1 comes from in the result.
One way is to unnest() the array and LEFT JOIN the main table to each element on a match on either of the two columns:
SELECT 1 AS id, t.id AS "RID", count(a.txt) AS "Matches"
FROM tbl t
LEFT JOIN unnest(my_arr) AS a(txt) ON a.txt ~ t.description
OR a.txt ~ t.additional_info
GROUP BY t.id;
I use a regular expression match (~) here. Characters such as . \ ? in the strings on the right-hand side have a special meaning in a pattern, so you may have to escape them.
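If the column values are meant as plain substrings rather than patterns, one way to avoid escaping altogether is strpos(), which does no pattern interpretation. A self-contained sketch using the sample data from the question (the CTE stands in for the real table, and the array literal for my_arr):

```sql
WITH tbl (id, description, additional_info) AS (
   VALUES (123, 'XYZ', 'XYD')
)
SELECT t.id AS "RID", count(a.txt) AS "Matches"
FROM   tbl t
LEFT   JOIN unnest(ARRAY['It is known to be XYZ'
                       , 'It is know to be none'
                       , 'It is know to be XYD']) AS a(txt)
       -- strpos() returns 0 when the substring is not found
       ON strpos(a.txt, t.description) > 0
       OR strpos(a.txt, t.additional_info) > 0
GROUP  BY t.id;
```

For the sample row this counts 2 matches, the same as the regular expression version, since XYZ and XYD contain no special characters.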
Addressing your comment
You should have mentioned that you are using a plpgsql function with EXECUTE. Probably 2 errors:
The variable array_content is not visible inside EXECUTE, you need to pass the value with a USING clause - or concatenate it as string literal in a CREATE VIEW statement which does not allow parameters.
Missing single quotes around the string 'brand_relevance_calculation_view'. It's still a string literal before you concatenate it as an identifier. You did well to use format() with %I there.
Demo:
DO
$do$
DECLARE
array_content varchar[]:= '{FREE,DAY}';
BEGIN
EXECUTE format('
CREATE VIEW %I AS
SELECT id, description, additional_info, name, count(a.text) AS business_objectives
, multi_city, category IS NOT NULL AS category
FROM initial_events i
LEFT JOIN unnest(%L::varchar[]) AS a(text) ON a.text ~ i.description
OR a.text ~ i.additional_info
GROUP BY i.id -- needed because of count(); assumes id is the primary key of initial_events'
, 'brand_relevance_calculation_view', array_content);
END
$do$;