How to UNNEST a double-nested array in PostgreSQL?

I am struggling to unnest an array in this format (newbie alert!). Use case: I want to count all v=1234 in a table where custom_fields = {f=[{v=1234}, {v=[]}]}
I tried to use:
select custom_fields[safe_offset(1)]
from database
limit 10
It gives me the column, but everything is still nested.
Then I tried this:
SELECT tickets.id, cf
FROM db.tickets
CROSS JOIN UNNEST(tickets.custom_fields) AS cf
limit 10
Same behaviour as the first query.
Then I tried this with [][]:
SELECT
custom_fields[1][1]
FROM db.tickets
limit 10
*Array element access with array[position] is not supported. Use
array[OFFSET(zero_based_offset)] or array[ORDINAL(one_based_ordinal)]
But yeah, that's the query at the beginning of this message.
I am pretty lost. Does anyone have an idea?

Not sure I fully understood your question, but I replicated your example, adding an id column and a json_col column containing the JSON. The following statement extracts each v value into a different row while keeping it related to the id:
with my_tbl as (
select 1 id, '{"f":[{"v":1234}, {"v":2345}, {"v":7777}]}'::jsonb as json_col UNION ALL
select 2 id, '{"f":[{"v":6789}, {"v":3333}]}'::jsonb as json_col
)
select * from my_tbl, jsonb_to_recordset(jsonb_extract_path(json_col, 'f')) as x(v int);
The SQL uses JSONB_EXTRACT_PATH to extract the f part, and JSONB_TO_RECORDSET to create a row for each v value. More info on JSON functions can be found in the documentation.
id | json_col | v
----+------------------------------------------------+------
1 | {"f": [{"v": 1234}, {"v": 2345}, {"v": 7777}]} | 1234
1 | {"f": [{"v": 1234}, {"v": 2345}, {"v": 7777}]} | 2345
1 | {"f": [{"v": 1234}, {"v": 2345}, {"v": 7777}]} | 7777
2 | {"f": [{"v": 6789}, {"v": 3333}]} | 6789
2 | {"f": [{"v": 6789}, {"v": 3333}]} | 3333
(5 rows)
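To answer the original use case (counting how many v values equal 1234), you can filter on the extracted value and aggregate. A minimal sketch, assuming the same my_tbl shape as above; comparing the jsonb value directly avoids cast errors if some v values are arrays rather than numbers:
select count(*)
from my_tbl, jsonb_array_elements(jsonb_extract_path(json_col, 'f')) as t(elem)
where elem -> 'v' = '1234'::jsonb;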

Order query results by items in WHERE clause [duplicate]

How can I make the query results follow the exact order of the items passed in the WHERE clause?
For example, using this query:
SELECT id, name FROM my_table
WHERE id in (1,3,5,2,4,6)
ORDER BY id
Result:
id | name
---------
1 | a
2 | b
3 | c
4 | d
5 | e
6 | f
What I expected:
id | name
---------
1 | a
3 | c
5 | e
2 | b
4 | d
6 | f
I noticed that there is a FIELD() function in MySQL. Is there an equivalent function in PostgreSQL?
Pass an array and use WITH ORDINALITY. That's cleanest and fastest:
SELECT id, t.name
FROM unnest ('{1,3,5,2,4,6}'::int[]) WITH ORDINALITY u(id, ord)
JOIN my_table t USING (id)
ORDER BY u.ord;
This assumes the values in the passed array are distinct. Otherwise, this solution preserves duplicates while IN removes them, so you would have to define which behavior you want. But then the desired sort order would also be ambiguous, which would make the question moot.
See:
ORDER BY the IN value list
PostgreSQL unnest() with element number
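The same pattern works with a bound parameter instead of an array literal. A minimal sketch, assuming $1 is passed as an int[] such as '{1,3,5,2,4,6}':
SELECT id, t.name
FROM unnest($1::int[]) WITH ORDINALITY u(id, ord)
JOIN my_table t USING (id)
ORDER BY u.ord;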
@Chris Kao, use POSITION in PostgreSQL.
Approach 1:
SELECT id, name FROM my_table
WHERE id in (1,3,5,2,4,6)
order by position(id::text in '1,3,5,2,4,6')
output:
id|name|
--+----+
1|a |
3|c |
5|e |
2|b |
4|d |
6|f |
Approach 2:
select id, name
from my_table mt
where id in (1,3,5,2,4,6)
order by array_position(array[1,3,5,2,4,6], mt.id);

Snowflake returns 'invalid query block' error when using `=ANY()` subquery operator

I'm trying to filter a table with a list of strings as a parameter, but since I want to make the parameter optional (in a Python SQL use case) I can't use the IN operator.
With postgresql I was able to build the query like this:
SELECT *
FROM table1
WHERE (id = ANY(ARRAY[%(param_id)s]::INT[]) OR %(param_id)s IS NULL)
;
Then in Python one could choose to pass a list of param_id or just None, which will return all results from table1. E.g.
pandas.read_sql(query, con=con, params={param_id: [id_list or None]})
However, I couldn't do the same with Snowflake, because even the following query fails:
SELECT *
FROM table1
WHERE id = ANY(param_id)
;
Does Snowflake not have an ANY operator? It is in their documentation.
If the parameter is a single string literal like 1,2,3, it first needs to be parsed into multiple rows with SPLIT_TO_TABLE:
SELECT *
FROM table1
WHERE id IN (SELECT s.value
FROM TABLE (SPLIT_TO_TABLE(%(param_id)s, ',')) AS s);
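To keep the optional-parameter behaviour from the PostgreSQL version, the same idea can be combined with the NULL check. A sketch under the same assumptions (%(param_id)s arrives either as a comma-separated string or as NULL; I have not verified every edge case of SPLIT_TO_TABLE with a NULL input):
SELECT *
FROM table1
WHERE %(param_id)s IS NULL
   OR id IN (SELECT s.value
             FROM TABLE (SPLIT_TO_TABLE(%(param_id)s, ',')) AS s);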
Agree with @Yuya. This is not very clear in the documentation. As per the doc:
"IN is shorthand for = ANY, and is subject to the same restrictions as ANY subqueries."
However, it does not work this way: IN works with an IN list, whereas ANY only works with a subquery.
Example -
select * from values (1,2),(2,3),(4,5);
+---------+---------+
| COLUMN1 | COLUMN2 |
|---------+---------|
| 1 | 2 |
| 2 | 3 |
| 4 | 5 |
+---------+---------+
IN works fine with a list of literals -
select * from values (1,2),(2,3),(4,5) where column1 in (1,2);
+---------+---------+
| COLUMN1 | COLUMN2 |
|---------+---------|
| 1 | 2 |
| 2 | 3 |
+---------+---------+
The below gives an error (though as per the doc, IN and = ANY are the same) -
select * from values (1,2),(2,3),(4,5) where column1 = ANY (1,2);
002076 (42601): SQL compilation error:
Invalid query block: (.
Using ANY with a subquery runs fine -
select * from values (1,2),(2,3),(4,5) where column1 = ANY (select column1 from values (1),(2));
+---------+---------+
| COLUMN1 | COLUMN2 |
|---------+---------|
| 1 | 2 |
| 2 | 3 |
+---------+---------+
Would it not make more sense for both Snowflake and PostgreSQL to have two functions/stored procedures, one with one parameter and one with two?
Then the one with the "default" simply does not have to ask this fake question (is the IN/ANY value some or none?) and is simpler. Albeit your question is interesting.

How to convert arrays from two different table columns to parallel rows?

I'm working with Hive and I have a table in the following format (I present only one row, but it has many rows):
_______________________________
segments | rates | sessID
---------|-----------|---------
'1,2,3' | '10,20,30'| 555
Namely, two columns contain strings representing arrays of the same length, and the third column has an integer. I want to flatten the arrays such that the first member of the first array appears in the same row as the first member of the second array, and so on.
Something like:
----------------------------
segment | rate | sessId
--------|------|------------
1 | 10 | 555
2 | 20 | 555
3 | 30 | 555
I've tried the following query (for simplicity I've hardcoded the values):
SELECT explode(segments), explode (rates), sessID FROM
(SELECT Split('1,2,3', ',') as segments, Split('10,20,30', ',') as rates, 555 as sessID) data ;
However, this does not produce the required result; it returns an error:
FAILED: SemanticException 1:26 Only a single expression in the SELECT clause is supported with UDTF's. Error encountered near token 'rates'
When I try to flatten just one column it does work:
The query:
SELECT explode(segments) FROM (
SELECT Split('1,2,3', ',') as segments, Split('10,20,30', ',') as rates, 555 as sessID) data ;
the result:
1
2
3
How can I get the result I want?
I don't have access to Hive to test this, but the approach should basically work.
POSEXPLODE() can be used to get two columns, the position within an array and the item itself. Then you can use that position to look up the corresponding item from the other array...
SELECT
yourData.sessID,
segment.item AS segment,
SPLIT(yourData.rates, ',')[segment.pos] AS rate
FROM
yourData
LATERAL VIEW
POSEXPLODE(SPLIT(yourData.segments,',')) segment AS pos, item
Note that POSEXPLODE() returns positions starting from 0, which matches Hive's 0-based array indexing, so [segment.pos] should be correct as written.
Please give this a try.
select sessID,tf1.val as segments, tf2.val as rates
from (SELECT Split('1,2,3', ',') as segments, Split('10,20,30', ',') as rates, 555 as sessID) t
lateral view posexplode(segments) tf1
lateral view posexplode(rates) tf2
where tf1.pos = tf2.pos;
+---------+-----------+--------+--+
| sessid | segments | rates |
+---------+-----------+--------+--+
| 555 | 1 | 10 |
| 555 | 2 | 20 |
| 555 | 3 | 30 |
+---------+-----------+--------+--+
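Applied to the real table instead of the hardcoded values, the same pattern would look roughly like this (a sketch; my_table and the column names segments, rates, sessID are assumed from the question, with the strings split inline):
select sessID, tf1.val as segment, tf2.val as rate
from my_table
lateral view posexplode(split(segments, ',')) tf1 as pos, val
lateral view posexplode(split(rates, ',')) tf2 as pos, val
where tf1.pos = tf2.pos;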

Filter json values regardless of keys in PostgreSQL

I have a table called diary which includes columns listed below:
| id | user_id | custom_foods |
|----|---------|--------------------|
| 1 | 1 | {"56": 2, "42": 0} |
| 2 | 1 | {"19861": 1} |
| 3 | 2 | {} |
| 4 | 3 | {"331": 0} |
I would like to count, for each user, how many diaries have custom_foods value(s) larger than 0. I don't care about the keys, since the keys can be any number as a string.
The desired output is:
| user_id | count |
|---------|---------|
| 1 | 2 |
| 2 | 0 |
| 3 | 0 |
I started with:
select *
from diary as d
join json_each_text(d.custom_foods) as e
on d.custom_foods != '{}'
where e.value > 0
I don't even know whether the syntax is correct. Now I am getting the error:
ERROR: function json_each_text(text) does not exist
LINE 3: join json_each_text(d.custom_foods) as e
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
The version I am using is psql (10.5 (Ubuntu 10.5-1.pgdg14.04+1), server 9.4.19). According to the PostgreSQL 9.4.19 documentation, that function should exist. I am so confused that I don't know how to proceed.
Threads that I referred to:
Postgres and jsonb - search value at any key
Query postgres jsonb by value regardless of keys
Your custom_foods column is defined as text, so you should cast it to json before applying json_each_text. Because json_each_text produces no rows for an empty JSON object, you can get the count of 0 for empty objects from a separate CTE and combine the results with a UNION ALL:
WITH empty AS
( SELECT DISTINCT user_id,
0 AS COUNT
FROM diary
WHERE custom_foods = '{}' )
SELECT user_id,
count(CASE
WHEN VALUE::int > 0 THEN 1
END)
FROM diary d,
json_each_text(d.custom_foods::JSON)
GROUP BY user_id
UNION ALL
SELECT *
FROM empty
ORDER BY user_id;
Demo
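If you want exactly the desired output (the number of diaries per user with at least one value above 0), an alternative sketch for 9.4+ uses an EXISTS test per diary row with a filtered aggregate; column names are taken from the question:
select d.user_id,
       count(*) filter (where exists (
         select 1
         from json_each_text(d.custom_foods::json) e
         where e.value::int > 0)) as count
from diary d
group by d.user_id
order by d.user_id;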

PostgreSQL: Efficiently split JSON array into rows

I have a table (Table A) that includes a text column that contains JSON encoded data.
The JSON data is always an array of between one and a few thousand plain objects.
I have another table (Table B) with a few columns, including a column with a datatype of JSON.
I want to select all the rows from Table A, split the JSON array into its elements, and insert each element into Table B.
Bonus objective: Each object (almost) always has a key, x. I want to pull the value of x out into a column, and delete x from the original object (if it exists).
E.g.: Table A
| id | json_array (text) |
+----+--------------------------------+
| 1 | '[{"x": 1}, {"y": 8}]' |
| 2 | '[{"x": 2, "y": 3}, {"x": 1}]' |
| 3 | '[{"x": 8, "z": 2}, {"z": 3}]' |
| 4 | '[{"x": 5, "y": 2, "z": 3}]' |
...would become: Table B
| id | a_id | x | json (json) |
+----+------+------+--------------------+
| 0 | 1 | 1 | '{}' |
| 1 | 1 | NULL | '{"y": 8}' |
| 2 | 2 | 2 | '{"y": 3}' |
| 3 | 2 | 1 | '{}' |
| 4 | 3 | 8 | '{"y": 2}' |
| 5 | 3 | NULL | '{"z": 3}' |
| 6 | 4 | 5 | '{"y": 2, "z": 3}' |
This initially has to work on a few million rows, and would then need to be run at regular intervals, so making it efficient would be a priority.
Is it possible to do this without using a loop and PL/PgSQL? I haven't been making much progress.
The json data type is not particularly suitable (or intended) for modification at the database level. Extracting "x" objects from the JSON object is therefore cumbersome, although it can be done.
You should create your table B (with hopefully a more creative column name than "json"; I am using item here) and make the id column a serial that starts at 0. A pure json solution then looks like this:
INSERT INTO b (a_id, x, item)
SELECT sub.a_id, sub.x,
('{' ||
string_agg(
CASE WHEN i.k IS NULL THEN '' ELSE '"' || i.k || '":' || i.v END,
', ') ||
'}')::json
FROM (
SELECT a.id AS a_id, (j.items->>'x')::integer AS x, j.items
FROM a, json_array_elements(json_array) j(items) ) sub
LEFT JOIN json_each(sub.items) i(k,v) ON i.k <> 'x'
GROUP BY sub.a_id, sub.x
ORDER BY sub.a_id;
The sub-query extracts the a_id and x values, as well as the JSON object. In the outer query the JSON object is broken into its individual pieces and the objects with key x are thrown out (the LEFT JOIN ON i.k <> 'x'). In the select list the pieces are put back together again with string concatenation and grouped into compound objects.
This necessarily has to be like this because json has no built-in manipulation functions of any consequence. This works on PG versions 9.3+, i.e. since time immemorial insofar as JSON support is concerned.
If you are using PG9.5+, the solution is much simpler through a cast to jsonb:
INSERT INTO b (a_id, x, item)
SELECT a.id, (j.items->>'x')::integer, j.items #- '{x}'
FROM a, jsonb_array_elements(json_array::jsonb) j(items);
The #- operator on the jsonb data type does all the dirty work here. Obviously, there is a lot of work going on behind the scenes, converting json to jsonb, so if you find that you need to manipulate your JSON objects more frequently then you are better off using the jsonb type to begin with. In your case I suggest you do some benchmarking with EXPLAIN ANALYZE SELECT ... (you can safely forget about the INSERT while testing) on perhaps 10,000 rows to see which works best for your setup.