Remove numeric item from JsonB array - sql

I have a jsonb value with a nested JSON array and need to remove an element:
{"values": ["11", "22", "33"]}
jsonb_set(column_name, '{values}', ((column_name -> 'values') - '33')) -- WORKS!
I also have a similar jsonb value with numbers, not strings:
{"values": [11, 22, 33]}
jsonb_set(column_name, '{values}', ((column_name -> 'values') - 33)) -- FAILS!
In this case 33 is used as index of the array.
How do I remove items from a JSON array when those items are numbers?

Two assertions:
Many Postgres JSON functions and operators target the key in key/value pairs. Strings ("abc" or "33") in JSON arrays are treated like keys without value. But numeric (33 or 123.45) array elements are treated as values.
There are currently three variants of the - operator. Two of them apply here. As the recently clarified manual describes (currently /devel):
| Operator | Description | Example(s) |
| :--- | :--- | :--- |
| jsonb - text → jsonb | Deletes a key (and its value) from a JSON object, or matching string value(s) from a JSON array. | '{"a": "b", "c": "d"}'::jsonb - 'a' → {"c": "d"}; '["a", "b", "c", "b"]'::jsonb - 'b' → ["a", "c"] |
| ... | | |
| jsonb - integer → jsonb | Deletes the array element with specified index (negative integers count from the end). Throws an error if JSON value is not an array. | '["a", "b"]'::jsonb - 1 → ["a"] |
With a numeric literal as the right operand, Postgres operator type resolution arrives at the latter variant.
Unfortunately, we cannot use the former variant to begin with, due to assertion 1: it only matches string elements.
So we have to use a workaround like:
SELECT jsonb_set(column_name
               , '{values}'
               , COALESCE(
                    (SELECT jsonb_agg(val)
                     FROM   jsonb_array_elements(t.column_name -> 'values') x(val)
                     WHERE  val <> jsonb '33')
                  , '[]')  -- jsonb_agg() returns NULL when no element is left
                ) AS column_name
FROM   tbl t;
db<>fiddle here -- with extended test case
Do not cast unnested elements to integer (like another answer suggests).
Numeric values may not fit integer.
JSON arrays (unlike Postgres arrays) can hold a mix of element types. So some array elements may be numeric, but others string, etc.
It's more expensive to cast all array elements (on the left). Just cast the value to remove (on the right).
So this works for any types, not just integer (JSON numeric). Example:
'{"values": ["abc", "22", 33]}'
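To make the point about mixed element types concrete, here is a minimal sketch that models the same filter-and-reaggregate logic in Python rather than in Postgres. The function name remove_json_value and its parameters are hypothetical; the comparison is done on JSON values, so nothing is cast to integer and a string element such as "abc" cannot cause an error.

```python
import json


def remove_json_value(doc: str, key: str, target_json: str) -> str:
    """Drop every array element under `key` that equals the JSON value `target_json`."""
    obj = json.loads(doc)
    target = json.loads(target_json)

    def same(a, b):
        # Booleans only match booleans (Python would otherwise treat True == 1).
        if isinstance(a, bool) or isinstance(b, bool):
            return a is b
        # Numbers are compared by value.
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            return a == b
        # Everything else must have the same type, so 33 never matches "33".
        return type(a) is type(b) and a == b

    obj[key] = [v for v in obj[key] if not same(v, target)]
    return json.dumps(obj)


# Remove the number 33; the strings are left alone.
print(remove_json_value('{"values": ["abc", "22", 33]}', "values", "33"))
# Remove the string "33"; the number 33 is left alone.
print(remove_json_value('{"values": ["abc", "33", 33]}', "values", '"33"'))
```

The first call prints {"values": ["abc", "22"]} and the second prints {"values": ["abc", 33]}, which mirrors what comparing against jsonb '33' versus jsonb '"33"' is meant to achieve in the query above.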

Unfortunately, Postgres json operator - only supports string values, as explained in the documentation:
operator: -
right operand type: text
description: Delete key/value pair or string element from left operand. Key/value pairs are matched based on their key value.
On the other hand, if you pass an integer value as right operand, Postgres considers it the index of the array element that needs to be removed.
An alternative option is to unnest the array with jsonb_array_elements() and a lateral join, filter out the unwanted value, then re-aggregate:
select jsonb_set(column_name, '{values}', new_values) new_column_name
from mytable t
left join lateral (
select jsonb_agg(val) new_values
from jsonb_array_elements(t.column_name -> 'values') x(val)
where val::int <> 33
) x on 1 = 1
Demo on DB Fiddle:
with mytable as (select '{"values": [11, 22, 33]}'::jsonb column_name)
select jsonb_set(column_name, '{values}', new_values) new_column_name
from mytable t
left join lateral (
select jsonb_agg(val) new_values
from jsonb_array_elements(t.column_name -> 'values') x(val)
where val::int <> 33
) x on 1 = 1
| new_column_name |
| :------------------- |
| {"values": [11, 22]} |

Related

Postgres perform regex query on jsonb field

I have a column in my Postgres database that stores jsonb type values. Some of these values are raw strings (not a list or dictionary). I want to be able to perform a regex search on this column, such as
select * from database where jsonb_column::text ~ regex_expression.
The issue is that for values that are already strings, converting from jsonb to text adds additional escaped double quotes at the beginning and end of the value. I don't want these included in the regex query. I understand why Postgres does this, but if we assume that all values stored in the jsonb field are jsonb strings, is there a workaround? I know you can use ->> to get a value out of a jsonb dictionary, but I can't figure out a solution for just jsonb strings on their own.
Once I figure out how to make this query in normal Postgres, I want to translate it into Peewee. However, any and all help with even just the initial query would be appreciated!
Just cast the json to text. Here is an example:
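from peewee import Model, CharField
from playhouse.postgres_ext import PostgresqlExtDatabase, BinaryJSONField

# Assumption: 'my_database' is a placeholder; use your own connection settings.
db = PostgresqlExtDatabase('my_database')

class Reg(Model):
    key = CharField()
    data = BinaryJSONField()

    class Meta:
        database = db

for i in range(10):
    Reg.create(key='k%s' % i, data={'k%s' % i: 'v%s' % i})

# Find the row that contains the json string "k1": "v1".
expr = Reg.data.cast('text').regexp('"k1": "v1"')
query = Reg.select().where(expr)
for row in query:
    print(row.key, row.data)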
Prints
k1 {'k1': 'v1'}
To extract a plain string (string primitive without key name) from a JSON value (json or jsonb), you can extract the "empty path" like:
SELECT jsonb '"my string"' #>> '{}';
This also works for me (with jsonb but not with json), but it's more of a hack:
SELECT jsonb '"my string"' ->> 0
So:
SELECT * FROM tbl WHERE (jsonb_column #>> '{}') ~ 'my regex here';
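The difference between the two text forms can be illustrated outside the database. This is only a Python model of the behaviour described in the question and in this answer (it does not run Postgres): json.dumps stands in for the cast to text, which keeps the surrounding double quotes, and json.loads stands in for extracting the plain string with #>> '{}'.

```python
import json
import re

value = "my string"

# Like jsonb_column::text for a JSON string: the quotes are part of the text.
as_json_text = json.dumps(value)

# Like jsonb_column #>> '{}': the plain string without quotes.
as_plain_text = json.loads(as_json_text)

# An anchored pattern shows the difference.
pattern = re.compile(r"^my")
print(bool(pattern.search(as_json_text)))   # False: the text starts with a quote
print(bool(pattern.search(as_plain_text)))  # True
```

Unanchored patterns often match either form, which is why the extra quotes can go unnoticed until a pattern uses ^ or $.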

Hive and HQL: how do you format JSON strings to be sorted by keys

In Hive/HQL, how do you format JSON strings to be ordered by their keys? For example:
| id | some_str |
| :--- | :--- |
| 1 | {"b":1, "c":2, "a":0} |
I want the output to be ordered by the JSON keys (i.e. a<b<c):
| id | some_str |
| :--- | :--- |
| 1 | {"a":0, "b": 1, "c":2} |
I know I might be able to use get_json_object and do some hard-coded formatting, but that would only work if I have only a few keys, but unfortunately that doesn't apply here.
What would you suggest? Thanks!!
Additional Q's:
How do I check equality of {"b":1, "c":2, "a":0} and {"a":0, "b": 1, "c":2} (their equality should be True)?
How do I get the value of the largest/smallest key? i.e. expected results:
| id | some_str |
| :--- | :--- |
| 1 | 0 |
(smallest key = "a")
or
| id | some_str |
| :--- | :--- |
| 1 | 2 |
(largest key = "c")
Hive does not support a JSON data type; JSON can be parsed with JSONSerDe or get_json_object/json_tuple. If the data in your question were of type map, it would be sortable/comparable based on keys/values. Strings are compared as normal strings, regardless of whether they contain JSON.
Consider storing it as map<string,int> or map<string,string> type.
Also you can convert it to map using str_to_map function (returns map<string, string>) and regexp_replace to remove quotes, spaces, curly braces, then use IN operator to compare. See this demo:
with mydata as (
select '{"b":1, "c":2, "a":0}' as A, '{"a":0, "b": 1, "c":2}' as B
)
select A, B, A in (B), B in (A) from
(
select str_to_map(regexp_replace(regexp_replace(regexp_replace(A,': +',':'),', +',','),'"|\\{|\\}','')) A,
str_to_map(regexp_replace(regexp_replace(regexp_replace(B,': +',':'),', +',','),'"|\\{|\\}','')) B
from mydata
)s
Result:
| a | b | a_equal_b | b_equal_a |
| :--- | :--- | :--- | :--- |
| {"a":"0","b":"1","c":"2"} | {"a":"0","b":"1","c":"2"} | true | true |
Both maps are equal, note they are displayed in the same keys order (a<b<c). After conversion to map, you can order, compare and easily extract keys, values and convert values to int if necessary.
You can also convert the JSON string to a map, specifying the key and value types, with the brickhouse json_map function. This needs no extra regexp_replace transformation of the JSON string and is the most efficient method:
json_map('{"b":1, "c":2, "a":0}', 'string,int')
Read how to install brickhouse functions here.
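The idea behind converting the strings to maps can be sketched outside Hive. The following Python model (not HQL) parses the two strings from the question into dictionaries and then answers all three parts of the question: key-ordered output, order-independent equality, and the value of the smallest or largest key. The variable names are illustrative only.

```python
import json

a = '{"b":1, "c":2, "a":0}'
b = '{"a":0, "b": 1, "c":2}'

map_a = json.loads(a)
map_b = json.loads(b)

# Equality of maps does not depend on key order or whitespace.
print(map_a == map_b)

# Serialize with keys in sorted order (a < b < c).
sorted_a = json.dumps(map_a, sort_keys=True)
print(sorted_a)

# Value of the smallest and of the largest key.
smallest_key_value = map_a[min(map_a)]
largest_key_value = map_a[max(map_a)]
print(smallest_key_value, largest_key_value)
```

Comparing the raw strings a and b directly would report them as different, which is the same limitation the answer describes for Hive strings.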

unnest() not exploding array, returns error Column alias list has 1 entries but 't' has 2 columns available

I have some json data which includes a property 'characters' and it looks like this:
select json_data['characters'] from latest_snapshot_events
Returns: [{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":10,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":3},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":39,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":2},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":6801450488388220,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":1,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":8355588830097610,"shards":0,"CHAR_TPIECES":5,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4}]
This is returned on a single row. I would like a single row for each item within the array.
I found several SO posts and other blogs advising me to use unnest(). I've tried this several times and cannot get a result to return. For example, here is the documentation from presto. The bottom covers unnest as a stand in for hive's lateral view explode:
SELECT student, score
FROM tests
CROSS JOIN UNNEST(scores) AS t (score);
So I tried to apply this to my table:
characters as (
select
jdata.characters
from latest_snapshot_events
cross join unnest(json_data) as t(jdata)
)
select * from characters;
where json_data is the field in latest_snapshot_events that contains the property 'characters', which is an array like the one shown above.
This returns an error:
[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 69:12: Column alias list has 1 entries but 't' has 2 columns available
How can I unnest/explode latest_snapshot_events.json_data['characters'] onto multiple rows?
Since characters is a JSON array in textual representation, you'll have to:
Parse the JSON text with json_parse to produce a value of type JSON.
Convert the JSON value into a SQL array using CAST.
Explode the array using UNNEST.
For instance:
WITH data(characters) AS (
VALUES '[{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":10,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":3},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":39,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":2},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":6801450488388220,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":1,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":8355588830097610,"shards":0,"CHAR_TPIECES":5,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4}]'
)
SELECT entry
FROM data, UNNEST(CAST(json_parse(characters) AS array(json))) t(entry)
which produces:
entry
-----------------------------------------------------------------------
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,...
In the example above, I convert the JSON value into an array(json), but
you can further convert it to something more concrete if the values inside each
array entry have a regular schema. For example, for your data, it is
possible to cast it to an array(map(varchar, json)) since every element in the
array is a JSON object.
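For readers less familiar with the parse, cast, unnest sequence, here is a rough Python analogy (not Presto/Athena code). The sample value is a shortened version of the data in the question, keeping only two keys per object.

```python
import json

# Shortened version of the 'characters' value from the question.
characters = '[{"ITEM": 10, "ITEM_LEVEL": 3}, {"ITEM": 39, "ITEM_LEVEL": 2}]'

# Step 1: parse the JSON text (the role of json_parse).
parsed = json.loads(characters)

# Steps 2 and 3: treat the parsed value as a list and produce one row per
# element (the role of CAST(... AS array(json)) followed by UNNEST).
rows = list(parsed)

for row in rows:
    print(row["ITEM"], row["ITEM_LEVEL"])
```

Each element of rows corresponds to one output row of the query above; skipping the parse step would leave a single string, which cannot be exploded.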
json_parse works if your initial data is a JSON string. However, for array(row) types (i.e. an array of objects/dictionaries), casting to array(json) will convert each row into an array, removing all keys from the object and preventing you from using dot notation or json_extract functions.
To unnest array(row) data, the syntax is much simpler:
CROSS JOIN UNNEST(my_array) AS my_row
I got stuck with this error trying to unpivot data.
This might help someone:
SELECT a_col, b_col
FROM
(
SELECT MAP(
ARRAY['a', 'b', 'c', 'd'],
ARRAY[1, 2, 3, 4]
) my_col
) CROSS JOIN UNNEST(my_col) as t(a_col, b_col)
t() allows you to define multiple columns as outputs.

Dynamically cast element to JSON array if it is a JSON string in PostgreSQL

Basically, I have to select a value from a JSON object that has an awful format. This value might be a string or a JSON array, and, if it is an array, I should aggregate its elements into a single, comma-separated string.
So, for example, my table might look like this:
id | field
------+--------------------------------------
1 | {"key": "string"}
---------------------------------------------
2 | {"key": ["array", "of", "strings"]}
and I would need a result like
"string",
"array, of, strings"
I'm using the following query:
SELECT my_table.id,
array_to_string(ARRAY(SELECT json_array_elements_text(field -> json_object_keys(field))), ', ')
FROM my_table
to aggregate the JSON array into a string. However, when the value is a string, I get the ERROR: cannot call json_array_elements_text on a scalar.
So now the next step seems like casting the value to a JSON array when it's a string (basically just wrap it with brackets).
How would I do such a thing? I've looked at the Postgres doc but see no functions to cast to different JSON types.
I'm using PostgreSQL 10
with c(id,field) as (values(1,'{"key": "string"}'::json),(2, '{"key": ["array", "of", "strings"]}'))
, pg10 as (select id, case when json_typeof(field->'key') != 'array' then json_build_array(field->>'key') else field->'key' end field from c)
, m as (select id, json_array_elements_text(field) field from pg10)
select distinct id, string_agg(field,', ') over (partition by id) field from m;
As you mentioned in a comment, an extra step is needed to take the set-returning function (SRF) out of the CASE expression.
https://www.db-fiddle.com/f/mr5rWGpC6xRoBvUKwR7RTN/0
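The normalization step (wrap a scalar in an array, then aggregate) can be modeled in a few lines of Python. This is only a sketch of the logic, not Postgres code; the function name key_as_text is hypothetical and the key name "key" is taken from the sample data.

```python
import json


def key_as_text(field_json: str, key: str = "key") -> str:
    """Return the value under `key` as a comma-separated string."""
    value = json.loads(field_json)[key]
    # Wrap a scalar in a list so both shapes take the same path,
    # like json_build_array() in the CASE expression.
    items = value if isinstance(value, list) else [value]
    # Join the elements, like string_agg(field, ', ').
    return ", ".join(str(item) for item in items)


print(key_as_text('{"key": "string"}'))
print(key_as_text('{"key": ["array", "of", "strings"]}'))
```

The two calls print string and array, of, strings, matching the result asked for in the question.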

how to convert array elements from string to int in postgres?

I have a column in the table as jsonb [14,21,31]
and I want to get all the rows with selected element eg
SELECT *
FROM t_customers
WHERE tags ?| array['21','14']
but the jsonb elements are integers.
How do I convert the SQL array elements into integers?
I tried removing the quotes from the array, but it gives an error.
A naive solution would be:
t=# with t_customers(tags) as (values('[14,21,31]'::jsonb))
select
tags
, translate(tags::text,'[]','{}')::int[] jsonb_int_to_arr
, translate(tags::text,'[]','{}')::int[] @> array['21','14']::int[] includes
from
t_customers;
tags | jsonb_int_to_arr | includes
--------------+------------------+----------
[14, 21, 31] | {14,21,31} | t
(1 row)
https://www.postgresql.org/docs/current/static/functions-array.html
If you cast to an array, use the @> operator to check whether it contains all of the listed values; the && operator checks for overlap (any of the values), which is closer to what ?| does.
(At first I proposed it because I misunderstood the question, so it goes the opposite way, "turning" jsonb into an array and checking whether it contains the values, but now maybe this naive approach is the shortest.)
The right approach here would probably be:
t=# with t_customers(tags) as (values('[14,21,31]'::jsonb))
, t as (select tags,jsonb_array_elements(tags) from t_customers)
select jsonb_agg(jsonb_array_elements::text) ?| array['21','14'] tags from t group by tags;
tags
------
t
(1 row)
which is basically "repacking" jsonb array with text representations of integers
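The "repacking" idea can be modeled in Python to show why it works. This is a sketch of the logic only, not Postgres code; has_any is a hypothetical name. Each array element is converted to its JSON text form (the role of ::text), and the check succeeds if any wanted string is among them (the role of ?|).

```python
import json


def has_any(tags_json: str, wanted: list) -> bool:
    """True if any string in `wanted` equals the text form of an array element."""
    # json.dumps(14) gives '14', mirroring jsonb_array_elements(...)::text.
    as_text = {json.dumps(tag) for tag in json.loads(tags_json)}
    return any(item in as_text for item in wanted)


print(has_any('[14,21,31]', ['21', '14']))  # True
print(has_any('[14,21,31]', ['99']))        # False
```

Note that the text form of a JSON string keeps its double quotes, so this model, like the query above, is intended for arrays of numbers.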