Presto: unnest a varchar array field stored with {}

I have a column with an inconsistent data format: some values are arrays written with [], others use curly braces {} like JSON objects:
id  prices
1   [100,100,110]
2   {200,210,190}
create table test(id integer, prices varchar(255));
insert into test
values
(1,'[100,100,110]'),
(2,'{200,210,190}');
When I try to unnest, the query works for the first row but fails on the second. Is there a way to convert the {} form into an array []?
This is my query:
select id,prices,price from test
cross join UNNEST(cast(json_parse(prices) as array<varchar>)) as t (price)

You can use replace() to swap the braces for brackets and then parse the result into an array:
select json_parse(replace(replace('{200,210,190}', '}', ']'), '{', '['))
Output:
_col0
[200,210,190]
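The replace-then-parse idea can be checked outside Presto. Below is a minimal Python model of it, assuming the stored values only use [] or {} as outer delimiters and never contain braces inside; normalize_prices is a hypothetical helper, not a Presto function, and json.loads stands in for json_parse:

```python
import json


def normalize_prices(raw: str) -> list:
    """Model of json_parse(replace(replace(raw, '}', ']'), '{', '['))."""
    # Swap braces for brackets, exactly like the two nested replace() calls.
    as_array = raw.replace("}", "]").replace("{", "[")
    # json.loads plays the role of json_parse. Numbers come back as ints here,
    # whereas CAST(... AS array<varchar>) in Presto would yield strings.
    return json.loads(as_array)


rows = [(1, "[100,100,110]"), (2, "{200,210,190}")]

# Emulate CROSS JOIN UNNEST: one output row per array element.
unnested = [
    (row_id, raw, price)
    for row_id, raw in rows
    for price in normalize_prices(raw)
]

for r in unnested:
    print(r)
```

In the original query, the same expression would take the place of json_parse(prices) inside UNNEST: cast(json_parse(replace(replace(prices, '}', ']'), '{', '[')) as array<varchar>).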

Related

hive string json list to array with specific field

In Hive, I want to extract an array of one specific field from a string that holds a JSON list.
For example, given
[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]
the query should return the array of key1 values:
[val1,val3,val5]
How can I do this?
Convert the string to an array of JSON strings: remove the outer [], then split on the commas between } and {. Then extract key1 from each element and use collect_list to gather the values into an array. See the comments in the code:
with mytable as(--data example with single row
select '[{"key1":"val1","key2":"val2"},{"key1":"val3","key2":"val4"},{"key1":"val5","key2":"val6"}]' as json_string
)
select collect_list( --collect array
get_json_object(json_map_string,'$.key1') --key1 extracted
) as key1_array
from
(
select split(regexp_replace(json_string,'^\\[|\\]$',''), --remove []
'(?<=\\}),(?=\\{)' --split by comma only after } and before {
) as json_array --converted to array of json strings (map)
from mytable
)s
lateral view outer explode(json_array) e as json_map_string --explode array elements
;
Result:
key1_array
["val1","val3","val5"]
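The three steps of the Hive query (strip the brackets, split with a lookbehind/lookahead, extract key1) can be modeled in Python to see what each stage produces. This is a sketch, not Hive: re.sub and re.split stand in for regexp_replace and split, json.loads stands in for get_json_object, and it assumes no element contains the sequence "},{" inside a string value:

```python
import json
import re

json_string = (
    '[{"key1":"val1","key2":"val2"},'
    '{"key1":"val3","key2":"val4"},'
    '{"key1":"val5","key2":"val6"}]'
)

# Step 1: remove the outer [] (regexp_replace(json_string, '^\\[|\\]$', '')).
inner = re.sub(r"^\[|\]$", "", json_string)

# Step 2: split only on commas preceded by } and followed by {,
# the same lookaround pattern as '(?<=\\}),(?=\\{)' in the Hive query.
json_array = re.split(r"(?<=\}),(?=\{)", inner)

# Step 3: pull key1 out of each JSON object and collect the values.
key1_array = [json.loads(element)["key1"] for element in json_array]

print(key1_array)
```

The lookarounds matter: splitting on every comma would also cut each object at the comma between "key1" and "key2".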

Getting duplicate values in a jsonb query in PostgreSQL

In a table, I use a jsonb column to store multiple values in a JSON array. Now I want to write a query that returns the entries where day is monday:
[{"day":"monday","time":"8 am"},{"day":"tuesday","time":"8 am"},{"day":"monday","time":"11 am"},{"day":"friday","time":"8 am"}]
Query:
SELECT array_to_json(array_agg(j))
FROM demo t, jsonb_array_elements(t.di_item) j
WHERE j->>'day' = 'monday'
Result:
[{"day":"monday","time":"8 am"},{"day":"monday","time":"11 am"},{"day":"monday","time":"8 am"},{"day":"monday","time":"11 am"}]
Expected:
[{"day":"monday","time":"8 am"},{"day":"monday","time":"11 am"}]
Each value appears twice.
First, there is no need to aggregate the JSON objects into an array and then convert that to a JSON array: you can use json[b]_agg() directly. Then, use distinct to remove the duplicates:
SELECT jsonb_agg(distinct j)
FROM demo t
CROSS JOIN LATERAL jsonb_array_elements(t.di_item) j
WHERE j->>'day' = 'monday'
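A small Python model of why distinct helps. The question does not say where the duplicates come from; this sketch assumes, hypothetically, that demo holds two rows with the same di_item array, so each Monday object is produced twice by the unnesting step. Deduplicating by a canonical serialization mimics jsonb equality for these flat objects; the sketch does not try to reproduce the element order PostgreSQL returns:

```python
import json

# Hypothetical table contents: two rows holding the same jsonb array.
item = [
    {"day": "monday", "time": "8 am"},
    {"day": "tuesday", "time": "8 am"},
    {"day": "monday", "time": "11 am"},
    {"day": "friday", "time": "8 am"},
]
demo = [item, item]

# jsonb_array_elements(t.di_item) + WHERE j->>'day' = 'monday'
mondays = [j for di_item in demo for j in di_item if j["day"] == "monday"]

# jsonb_agg(DISTINCT j): keep one copy of each distinct object,
# comparing objects by a key-sorted JSON serialization.
seen = {}
for j in mondays:
    seen.setdefault(json.dumps(j, sort_keys=True), j)
result = list(seen.values())

print(json.dumps(result))
```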

How to get maximum value of a specific part of strings?

I have the following records:
Id Title
500006 FS/97/98/037
500007 FS/97/04/035
500008 FS/97/01/036
500009 FS/97/104/040
I need to split the Title field, take the 4th part, and return the maximum value. In this example, my query should return 040 or 40.
select max(cast(right(Title, charindex('/', reverse(Title) + '/') - 1) as int))
from your_table
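The RIGHT/CHARINDEX/REVERSE expression extracts everything after the last '/'. Here is a Python model of it, to show why the extra '/' is appended: CHARINDEX is 1-based and returns 0 when nothing is found, so appending '/' guarantees a hit and a title without any '/' yields the whole string. last_part is a hypothetical helper name:

```python
def last_part(title: str) -> str:
    """Model of RIGHT(title, CHARINDEX('/', REVERSE(title) + '/') - 1)."""
    # 1-based position of the first '/' in the reversed string,
    # i.e. of the last '/' counted from the end of the original title.
    pos = (title[::-1] + "/").index("/") + 1
    n = pos - 1
    # RIGHT(title, n); slicing from len - n keeps RIGHT(title, 0) == ''.
    return title[len(title) - n:]


titles = ["FS/97/98/037", "FS/97/04/035", "FS/97/01/036", "FS/97/104/040"]

# MAX(CAST(... AS int)) over the last parts.
print(max(int(last_part(t)) for t in titles))
```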
You can use the PARSENAME function, since you always have exactly 4 parts (confirmed in the comments):
select max(cast(parsename(replace(Title,'/','.'),1) as int))
from yourtable
If you want to split the data in the Title column and get a part of the split text by position, you can try a JSON-based approach with a simple string transformation. Transform the data in the Title column into a valid JSON array (FS/97/98/037 into ["FS","97","98","037"]) and then parse the data with OPENJSON(). With the default schema, OPENJSON() on a JSON array returns a table with the columns key, value and type, where the key column holds the 0-based index of each item in the array:
Note that STRING_SPLIT() is not an option here, because the order of the returned rows is not guaranteed.
Table:
CREATE TABLE Data (
Id varchar(6),
Title varchar(50)
)
INSERT INTO Data
(Id, Title)
VALUES
('500006', 'FS/97/98/037'),
('500007', 'FS/97/04/035'),
('500008', 'FS/97/01/036'),
('500009', 'FS/97/104/040')
Statement:
SELECT MAX(j.[value])
FROM Data d
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(d.Title, '/', '","'), '"]')) j
WHERE (j.[key] + 1) = 4
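The same transformation can be modeled in Python: build the JSON array text with the CONCAT/REPLACE trick, parse it, and keep the item whose 0-based key plus 1 equals 4. This is a sketch of the logic only; json.loads stands in for OPENJSON, and it assumes titles never contain double quotes or backslashes, which would break the generated JSON:

```python
import json

titles = ["FS/97/98/037", "FS/97/04/035", "FS/97/01/036", "FS/97/104/040"]


def to_json_array(title: str) -> str:
    # CONCAT('["', REPLACE(title, '/', '","'), '"]')
    return '["' + title.replace("/", '","') + '"]'


values = []
for title in titles:
    # OPENJSON on an array yields 0-based keys; key + 1 = 4 selects the 4th item.
    for key, value in enumerate(json.loads(to_json_array(title))):
        if key + 1 == 4:
            values.append(value)

# MAX(j.[value]) compares strings, which is only correct here because
# every 4th part is zero-padded to the same width.
print(max(values))
```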
If your data always has the fixed 4-part format, this simpler approach may also help:
SELECT MAX(PARSENAME(REPLACE(Title, '/', '.'), 1))
FROM Data
You can also try the following query:
SELECT Top 1
CAST('<x>' + REPLACE(Title,'/','</x><x>') + '</x>' AS XML).value('/x[4]','int') as Value
from Data
order by 1 desc
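The XML trick turns each '/' into an element boundary, so '/x[4]' picks the 4th part (XPath positions are 1-based). A Python model with xml.etree.ElementTree, as a sketch: ElementTree needs a single root, so the fragment is wrapped in a hypothetical <r> element, and like the T-SQL version it assumes titles contain no XML special characters such as < or &:

```python
import xml.etree.ElementTree as ET


def fourth_part(title: str) -> int:
    # Same string trick as the query: 'FS/97/98/037' -> '<x>FS</x><x>97</x>...'
    fragment = "<x>" + title.replace("/", "</x><x>") + "</x>"
    # Wrap in a single root so ElementTree can parse it.
    root = ET.fromstring("<r>" + fragment + "</r>")
    # 'x[4]' selects the 4th <x> child, like .value('/x[4]', 'int').
    return int(root.find("x[4]").text)


titles = ["FS/97/98/037", "FS/97/04/035", "FS/97/01/036", "FS/97/104/040"]

# SELECT TOP 1 ... ORDER BY 1 DESC is the largest value.
top = sorted((fourth_part(t) for t in titles), reverse=True)[0]
print(top)
```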

How to get the first field from an anonymous row type in PostgreSQL 9.4?

=# select row(0, 1) ;
row
-------
(0,1)
(1 row)
How can I get the 0 within the same query? I found that the following sort of works, but is there a simpler way?
=# select json_agg(row(0, 1))->0->'f1' ;
?column?
----------
0
(1 row)
I had no luck with array-like syntax such as [0].
Thanks!
Your row type is anonymous and therefore you cannot access its elements easily. What you can do is create a TYPE and then cast your anonymous row to that type and access the elements defined in the type:
CREATE TYPE my_row AS (
x integer,
y integer
);
SELECT (row(0,1)::my_row).x;
As Craig Ringer commented on your question, you should avoid producing anonymous rows in the first place if you can, and give explicit types to the data you use in your data model and queries.
If you just want the first element from any row, convert the row to JSON and select f1:
SELECT row_to_json(row(0,1))->'f1'
Or, if you are always going to have two integers or some other fixed structure, you can create a helper table (or type) with a matching row type, plus a function that selects the first column.
CREATE TABLE tmptable(f1 int, f2 int);
CREATE FUNCTION gettmpf1(tmptable) RETURNS int AS 'SELECT $1.f1' LANGUAGE SQL;
SELECT gettmpf1(ROW(0,1));
Resources:
https://www.postgresql.org/docs/9.2/static/functions-json.html
https://www.postgresql.org/docs/9.2/static/sql-expressions.html
The json solution is very elegant. Just for fun, this is a solution using regexp (much uglier):
WITH r AS (SELECT row('quotes, "commas",
and a line break".',null,null,'"fourth,field"')::text AS r)
--WITH r AS (SELECT row('',null,null,'')::text AS r)
--WITH r AS (SELECT row(0,1)::text AS r)
SELECT CASE WHEN r.r ~ '^\("",' THEN ''
WHEN r.r ~ '^\("' THEN regexp_replace(regexp_replace(regexp_replace(right(r.r, -2), '""', '\"', 'g'), '([^\\])",.*', '\1'), '\\"', '"', 'g')
ELSE (regexp_matches(right(r.r, -1), '^[^,]*'))[1] END
FROM r
When converting a row to text, PostgreSQL uses quoted CSV formatting. I couldn't find any tools for importing quoted CSV into an array, so the above is a crude text manipulation via mostly regular expressions. Maybe someone will find this useful!
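Since the row's text form is CSV-like, a real CSV parser can replace the regular expressions when the text is processed outside the database. A Python sketch, assuming PostgreSQL's record output rules: fields that need quoting are wrapped in double quotes, with embedded double quotes and backslashes each doubled. first_field is a hypothetical helper; like the regexp version, it returns '' for both an empty string and a NULL first field:

```python
import csv
import io


def first_field(record_text: str) -> str:
    """Return the first field of a PostgreSQL record literal such as '(0,1)'."""
    if not (record_text.startswith("(") and record_text.endswith(")")):
        raise ValueError("expected a parenthesized record literal")
    inner = record_text[1:-1]
    # doublequote=True turns "" back into ", and escapechar='\\' turns \\ back into \.
    # newline='' lets quoted fields span line breaks.
    reader = csv.reader(io.StringIO(inner, newline=""), doublequote=True, escapechar="\\")
    return next(reader)[0]


print(first_field("(0,1)"))
print(first_field('("quotes, ""commas"",",,)'))
```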
With PostgreSQL 13+, you can reference individual fields of the row with .fN notation. For your example:
select (row(0, 1)).f1; --> returns 0.
See https://www.postgresql.org/docs/13/sql-expressions.html#SQL-SYNTAX-ROW-CONSTRUCTORS

Selecting data into a Postgres array

I have the following data:
name id url
John 1 someurl.com
Matt 2 cool.com
Sam 3 stackoverflow.com
How can I write an SQL statement in Postgres to select this data into a multi-dimensional array, i.e.:
{{John, 1, someurl.com}, {Matt, 2, cool.com}, {Sam, 3, stackoverflow.com}}
I've seen this kind of array usage before in Postgres but have no idea how to select data from a table into this array format.
Assuming here that all the columns are of type text.
You cannot use array_agg() to produce multi-dimensional arrays, at least not up to PostgreSQL 9.4.
(PostgreSQL 9.5 added a variant of array_agg() that takes array input and can.)
What you get out of @Matt Ball's query is an array of records (record[]).
An array can only hold elements of the same base type, and your data mixes numeric and string types. Convert all columns (that aren't already text) to text to make it work.
You can create an aggregate function for this, as I demonstrated to you before.
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[name, id::text, url]]) AS tbl_mult_arr
FROM tbl;
Note the additional ARRAY[] layer that makes it a multidimensional (2-dimensional, to be precise) array.
Instant demo:
WITH tbl(id, txt) AS (
VALUES
(1::int, 'foo'::text)
,(2, 'bar')
,(3, '}b",') -- txt has meta-characters
)
, x AS (
SELECT array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS t
FROM tbl
)
SELECT *, t[1][1] AS arr_element_1_1, t[3][2] AS arr_element_3_2
FROM x;
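To see why the extra ARRAY[] layer matters, here is a Python model of array_agg_mult, using nested lists for arrays. It is a sketch, not PostgreSQL: array_cat on two 2-D arrays stacks their rows, which list concatenation mimics, and element is a hypothetical helper for PostgreSQL's default 1-based subscripts (PostgreSQL returns NULL for out-of-range subscripts, while Python raises IndexError):

```python
from functools import reduce


def array_cat(state, value):
    # SFUNC = array_cat: append the rows of a 2-D input to the 2-D state.
    return state + value


def array_agg_mult(inputs):
    # INITCOND = '{}': start from an empty array.
    return reduce(array_cat, inputs, [])


def element(arr, i, j):
    # PostgreSQL array subscripts are 1-based by default.
    return arr[i - 1][j - 1]


tbl = [(1, "foo"), (2, "bar"), (3, '}b",')]

# ARRAY[ARRAY[id::text, txt]] wraps each row in an extra layer (a 1x2 array),
# so concatenation stacks rows instead of flattening them into one list.
t = array_agg_mult([[[str(id_), txt]] for id_, txt in tbl])

print(t)
print(element(t, 1, 1), element(t, 3, 2))
```

Without the outer layer, each input would be a flat ['1', 'foo'] and the concatenation would produce a single 1-D list of six strings.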
You need to use an aggregate function; array_agg should do what you need.
SELECT array_agg(s) FROM (SELECT name, id, url FROM the_table ORDER BY id) AS s;