PostgreSQL query on text array value

I have a table where one column holds an array, but stored in text format:
mytable
id | ids
---+--------
 1 | '[3,4]'
 2 | '[3,5]'
 3 | '[3]'
etc ...
I want to find all records that have the value 5 as an array element in the ids column.
I was trying to achieve this by using the string_to_array function and removing the [ and ] symbols with the translate function, but couldn't find a way.

You can do this: http://www.sqlfiddle.com/#!1/5c148/12
select *
from tbl
where translate(ids, '[]','{}')::int[] && array[5];
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
You can also use bool_or: http://www.sqlfiddle.com/#!1/5c148/11
with a as
(
  select id, unnest(translate(ids, '[]','{}')::int[]) as elem
  from tbl
)
select id
from a
group by id
having bool_or(elem = 5);
To see the original elements:
with a as
(
  select id, unnest(translate(ids, '[]','{}')::int[]) as elem
  from tbl
)
select id, '[' || array_to_string(array_agg(elem), ',') || ']' as ids
from a
group by id
having bool_or(elem = 5);
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
PostgreSQL DDL is atomic; if it's not too late in your project, just restructure your stringly-typed array into a real array column: http://www.sqlfiddle.com/#!1/6e18c/2
alter table tbl
add column id_array int[];
update tbl set id_array = translate(ids,'[]','{}')::int[];
alter table tbl drop column ids;
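Since the DDL is atomic, the whole migration can go in a single transaction, so a failure rolls everything back (the same statements as above, just wrapped):
begin;
alter table tbl add column id_array int[];
update tbl set id_array = translate(ids,'[]','{}')::int[];
alter table tbl drop column ids;
commit;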
Query:
select *
from tbl
where id_array && array[5];
Output:
| ID | ID_ARRAY |
-----------------
| 2 | 3,5 |
You can also use the contains operator (@>): http://www.sqlfiddle.com/#!1/6e18c/6
select *
from tbl
where id_array @> array[5];
I prefer the && syntax though; it directly connotes intersection, reflecting that you are detecting whether there's an intersection between two sets (treating each array as a set).
http://www.postgresql.org/docs/8.2/static/functions-array.html
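A minimal side-by-side sketch of the two operators, using literal arrays so no table is needed:
select array[3,5] && array[5] as has_overlap,  -- true: the arrays share an element
       array[3,5] @> array[5] as does_contain; -- true: the left array contains every element of the right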

If you store the string representation of your arrays slightly differently, you can cast to array of integer directly:
INSERT INTO mytable
VALUES
(1, '{3,4}')
,(2, '{3,5}')
,(3, '{3}');
SELECT id, ids::int[]
FROM mytable;
Otherwise, you have to add one more step:
SELECT (translate(ids, '[]','{}'))::int[]
FROM mytable
I would consider making the column an array type to begin with.
Either way, you can find your row like this:
SELECT id, ids
FROM (
  SELECT id, ids, unnest(ids::int[]) AS elem
  FROM mytable
) x
WHERE elem = 5;
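For matching a single value, 5 = ANY(...) should work too, without the subquery (a sketch, assuming the '{...}' storage format from this answer):
SELECT id, ids
FROM mytable
WHERE 5 = ANY (ids::int[]);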

Related

SQLite: How to update rows with a sequence of numbers?

In SQLite I would like to renumber the values in a specific column with a sequence of numbers.
For example the relevance-column in these rows:
relevance | value
-------------------
3 | value1
5 | valueb
8 | valuex
9 | valueaa
must be updated starting from 1 with increment 1:
relevance | value
-------------------
1 | value1
2 | valueb
3 | valuex
4 | valueaa
What I'm looking for, is something like this:
-- first set all to startvalue
UPDATE MyTable SET relevance = 0;
-- then renumber:
UPDATE MyTable SET relevance = (some function to increase by 1 to the previous row);
I tried this, but it's not increasing; it seems like Max is not evaluated for each row:
UPDATE MyTable SET relevance = (SELECT Max(relevance ))+1;
First create a temporary table holding the relevance column from your table along with a new-sequence column generated by the ROW_NUMBER() window function, then update from this temporary table:
drop table if exists temp.tmp;
create temporary table tmp as
select relevance, row_number() over (order by relevance) rn
from MyTable;
update MyTable
set relevance = (
  select rn
  from temp.tmp
  where temp.tmp.relevance = MyTable.relevance
);
drop table temp.tmp;
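On SQLite 3.33+ an UPDATE ... FROM can do this without the temporary table; joining on rowid also avoids ambiguity if relevance values ever repeat (a sketch, assuming MyTable is not a WITHOUT ROWID table):
UPDATE MyTable
SET relevance = numbered.rn
FROM (
  SELECT rowid AS rid, row_number() OVER (ORDER BY relevance) AS rn
  FROM MyTable
) AS numbered
WHERE MyTable.rowid = numbered.rid;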

Split SQL string with specific string instead of separator?

I have a table that looks like:
|ID | String
|546 | 1,2,1,5,7,8
|486 | 2,4,8,1,5,1
|465 | 18,11,20,1,4,18,11
|484 | 11,10,11,12,50,11
I want to split the string to this:
|ID | String
|546 | 1,2
|546 | 1,5
|486 | 1,5,1
|486 | 1
|465 | 1,4
My goal is to show ID and all the strings starting with 1 with just the next number after them.
I filtered out all rows without '%1,%', but I don't know how to continue.
If you use SQL Server 2016+, you may try a JSON-based approach: transform the data into a valid JSON array and parse it with OPENJSON(). Note that STRING_SPLIT() is not an option here, because as mentioned in the documentation, the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
Table:
CREATE TABLE Data (
ID int,
[String] varchar(100)
)
INSERT INTO Data
(ID, [String])
VALUES
(546, '1,2,1,5,7,8'),
(486, '2,4,8,1,5,1'),
(465, '18,11,20,1,4,18,11'),
(484, '11,10,11,12,50,11')
Statement:
SELECT
   ID,
   CONCAT(FirstValue, ',', SecondValue) AS [String]
FROM (
   SELECT
      d.ID,
      j.[value] AS FirstValue,
      LEAD(j.[value]) OVER (PARTITION BY d.ID ORDER BY CONVERT(int, j.[key])) AS SecondValue
   FROM Data d
   CROSS APPLY OPENJSON(CONCAT('[', d.[String], ']')) j
) t
WHERE t.FirstValue = '1'
Result:
----------
ID String
----------
465 1,4
486 1,5
486 1,
546 1,2
546 1,5
Something like:
SELECT ID, S.value
FROM Data
CROSS APPLY STRING_SPLIT(REPLACE(',' + String, ',1,', '#1,'), '#') AS S
WHERE value LIKE '1,%'
?
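For reference, tracing the REPLACE trick by hand for ID 546 shows how the '#' markers isolate the fragments that start with 1 (values traced manually):
-- ',' + '1,2,1,5,7,8'          ->  ',1,2,1,5,7,8'
-- REPLACE(..., ',1,', '#1,')   ->  '#1,2#1,5,7,8'
-- STRING_SPLIT(..., '#')       ->  '', '1,2', '1,5,7,8'
-- WHERE value LIKE '1,%'       ->  '1,2', '1,5,7,8'
Note the last fragment keeps the whole tail ('1,5,7,8' rather than '1,5'), so trimming to just the next number would still need an extra step, which may be why this answer ends with a question mark.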

SQL grouping by distinct values in a multi-value string column

I want to perform a group-by based on the distinct values in a string column that has multiple values.
The said column has a list of strings in a standard format separated by commas. The potential values are only a,b,c,d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For all of the below I used this table:
create table tmp (
id INT auto_increment,
test VARCHAR(255),
PRIMARY KEY (id)
);
insert into tmp (test) values
("a,b"),
("b,c"),
("b,c,a"),
("d")
;
If the possible values are only a,b,c,d you can try one of these:
Take note that this will only work if you don't have similar values like test and test_new, because then test would also be joined with all the test_new rows and the count would not match.
select collection, COUNT(*) as count from tmp JOIN (
  select CONCAT("%", tb.collection, "%") as like_collection, collection from (
    select "a" COLLATE utf8_general_ci as collection
    union select "b" COLLATE utf8_general_ci as collection
    union select "c" COLLATE utf8_general_ci as collection
    union select "d" COLLATE utf8_general_ci as collection
  ) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
Which will give you the result you want
collection | count
a | 2
b | 3
c | 2
d | 1
or you can try this one
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
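A hedged alternative that sidesteps the similar-values pitfall noted above: MySQL's FIND_IN_SET matches whole comma-separated tokens rather than substrings (a sketch against the same tmp table):
select vals.collection, COUNT(*) as count
from tmp
join (
  select "a" as collection
  union select "b"
  union select "c"
  union select "d"
) vals on FIND_IN_SET(vals.collection, tmp.test) > 0
group by vals.collection;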
What you need to do is first explode the collection column into separate rows (like a flatMap operation). In Redshift the only way to generate new rows is to JOIN, so let's CROSS JOIN your input table with a static table of consecutive numbers and take only the rows whose number is less than or equal to the number of elements in the collection. Then we'll use the split_part function to read the item at the correct index. Once we have the exploded table, we'll do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]') then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART respectively.
with
index as (
  select 1 as i
  union all select 2
  union all select 3
  union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
  select 'a,b' as collection
  union all select 'b,c'
  union all select 'b,c,a'
  union all select 'd'
)
select
  split_part(collection, ',', i) as item,
  count(*)
from index, agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only get rows where the number of items matches
group by 1
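For the JSON-stored variant mentioned above, a sketch following the same pattern (json_extract_array_element_text uses zero-based positions; the sample strings are assumed):
with
index as (
  select 1 as i
  union all select 2
  union all select 3
  union all select 4
),
agg as (
  select '["a","b"]' as collection
  union all select '["b","c"]'
  union all select '["b","c","a"]'
  union all select '["d"]'
)
select
  json_extract_array_element_text(collection, i - 1) as item, -- zero-based position
  count(*)
from index, agg
where json_array_length(agg.collection) >= index.i
group by 1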

Convert an array into a Map

I have a table with a column like
[{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}]
Which is of the format array<struct<key:string,value:array<string>>>
I want to convert the column into below format :
{"e":["253","203","204"],"st":["mi"],"k2":["1","2"]}
which is of the type map<string,array<string>>
I have tried exploding the array but that does not work. Any ideas how I can do this in Hive?
Without the use of external libraries it's impossible. Please refer to Brickhouse or create your own UDAF.
Note: the code below provides snippets both to reproduce the problem and to solve what Hive's built-in functions can solve, i.e. map<string,string>, not map<string,array<string>>.
-- reproducing the problem
CREATE TABLE test_table(id INT, input ARRAY<STRUCT<key:STRING,value:ARRAY<STRING>>>);
INSERT INTO TABLE test_table
SELECT
1 AS id,
ARRAY(
named_struct("key","e", "value", ARRAY("253","203","204")),
named_struct("key","st", "value", ARRAY("mi")),
named_struct("key","k2", "value", ARRAY("1", "2"))
) AS input;
SELECT id, input FROM test_table;
+-----+-------------------------------------------------------------------------------------------------------+--+
| id | input |
+-----+-------------------------------------------------------------------------------------------------------+--+
| 1 | [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}] |
+-----+-------------------------------------------------------------------------------------------------------+--+
With exploding and using STRUCT features, we can split the keys and values.
SELECT id, exploded_input.key, exploded_input.value
FROM (
SELECT id, exploded_input
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x;
+-----+------+----------------------+--+
| id | key | value |
+-----+------+----------------------+--+
| 1 | e | ["253","203","204"] |
| 1 | st | ["mi"] |
| 1 | k2 | ["1","2"] |
+-----+------+----------------------+--+
The idea is to use your UDAF to "collect" a map while aggregating on id.
What Hive can solve with built-in functions is generating map<string,string>: convert each row to a string with a special delimiter, aggregate the rows with another special delimiter, and then use the str_to_map built-in function on those delimiters.
SELECT
  id,
  str_to_map(
    -- outputs: e:253,203,204#st:mi#k2:1,2 with delimiters between aggregated rows
    concat_ws('#', collect_list(list_to_string)),
    '#', -- first delimiter
    ':'  -- second delimiter
  ) mapped_output
FROM (
  SELECT
    id,
    -- outputs 3 rows: (e:253,203,204), (st:mi), (k2:1,2)
    CONCAT(exploded_input.key, ':', CONCAT_WS(',', exploded_input.value)) as list_to_string
  FROM (
    SELECT id, exploded_input
    FROM test_table LATERAL VIEW explode(input) d AS exploded_input
  ) x
) y
GROUP BY id;
Which outputs a string to string map like:
+-----+-------------------------------------------+--+
| id | mapped_output |
+-----+-------------------------------------------+--+
| 1 | {"e":"253,203,204","st":"mi","k2":"1,2"} |
+-----+-------------------------------------------+--+
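The built-in route above only yields map<string,string>. For a true map<string,array<string>>, a hedged sketch with the Brickhouse collect UDAF (assuming test_table from above; the jar path is hypothetical and the class name is taken to be brickhouse.udf.collect.CollectUDAF):
ADD JAR /path/to/brickhouse.jar; -- hypothetical path, adjust to your install
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

SELECT id, collect(e.key, e.value) AS mapped_output -- aggregates key/value pairs into one map per id
FROM test_table LATERAL VIEW explode(input) d AS e
GROUP BY id;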
with input_set as (
  select array(
    named_struct('key','e','value',array('253','203','204')),
    named_struct('key','st','value',array('mi')),
    named_struct('key','k2','value',array('1','2'))
  ) as input_array
), break_input_set as (
  select y.col_num as y_col_num, y.col_value as y_col_value
  from input_set lateral view posexplode(input_set.input_array) y as col_num, col_value
), create_map as (
  select map(y_col_value.key, y_col_value.value) as final_map
  from break_input_set
)
select * from create_map;
Note that this produces one single-entry map per array element rather than one merged map per row; merging still requires a UDAF as described above.
var arr = [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}];
var obj = {};
for (var i = 0; i < arr.length; i++) {
  obj[arr[i].key] = arr[i].value;
}
obj will be in the required format (note that this is a client-side JavaScript approach, not a Hive one).

Postgres: count unique array entries from subquery

If my subquery foo returns the rows:
ID, USERS
1 {23129}
2 {23142}
3 {23300,23300}
4 {23129,23300}
How can I get a count of unique users in a query using a window function such as:
SELECT ... FROM ( <subquery> ) FOO
I tried this:
array_length(array_agg(array_length(array(SELECT Distinct unnest(users))),1)) over(), 1)
But I get the error that the array dimensions are not the same.
NOTE WELL: I cannot change the subquery to solve this problem.
I can get the IDs in an array as follows:
string_to_array(string_agg(array_to_string(user_ids, ','), ',') over(),',')
But they are not distinct.
You're overcomplicating things - you can unnest the array, and then query a distinct count from it:
SELECT COUNT(DISTINCT u)
FROM (SELECT UNNEST(users) AS u
FROM mytable) t
You can always use a known algorithm in a simple SQL function:
create or replace function array_unique_elements(arr anyarray)
returns integer
language sql immutable
as $$
select count(distinct a)::int
from unnest(arr) a
$$;
Use:
select *, array_unique_elements(users)
from (
values
(1, '{23129}'::int[]),
(2, '{23142}'),
(3, '{23300,23300}'),
(4, '{23129,23300}')
) foo (id, users)
id | users | array_unique_elements
----+---------------+-----------------------
1 | {23129} | 1
2 | {23142} | 1
3 | {23300,23300} | 1
4 | {23129,23300} | 2
(4 rows)
I would just count distinct as well, as Mureinik suggests.
And regarding the error that you get, here's a tight-syntax example with array_length:
t=# with a(v) as (values('{1,2}'::int[]),('{2,3}'))
select array_length(array_agg(distinct unnest),1) from (
select unnest(v) from a
) a;
array_length
--------------
3
(1 row)
Of course it will NOT work with window aggregation, only with GROUP BY.
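If the distinct count has to accompany every row (as a window function would), one workaround is a scalar subquery that re-reads the same subquery output; a sketch, using the VALUES list from above as a stand-in for the unchangeable subquery:
with foo as (
  select *
  from (values
    (1, '{23129}'::int[]),
    (2, '{23142}'),
    (3, '{23300,23300}'),
    (4, '{23129,23300}')
  ) v (id, users)
)
select foo.*,
       (select count(distinct u)
        from foo f2, unnest(f2.users) u) as unique_users
from foo;
Every row then carries unique_users = 3 (the distinct values 23129, 23142 and 23300).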