How to cast postgres JSON column to int without key being present in JSON (simple JSON values)?

I am working on data in postgresql as in the following mytable with the fields id (type int) and val (type json):

id | val
---+--------
 1 | "null"
 2 | "0"
 3 | "2"
The values in the json column val are simple JSON values, i.e. just strings with surrounding quotes, without any key.
I have looked at the SO post How to convert postgres json to integer and attempted something like the solution presented there
SELECT (mytable.val->>'key')::int FROM mytable;
but in my case, I do not have a key to address the field and leaving it empty does not work:
SELECT (mytable.val->>'')::int as val_int FROM mytable;
This returns NULL for all rows.
The best I have come up with is the following (casting to varchar first, trimming the quotes, filtering out the string "null" and then casting to int):
SELECT id, nullif(trim('"' from mytable.val::varchar), 'null')::int as val_int FROM mytable;
which works, but surely cannot be the best way to do it, right?
Here is a db<>fiddle with the example table and the statements above.

Found the way to do it:
You can access the content via the keypath (see e.g. this PostgreSQL JSON cheatsheet):
Using the #> operator, you can access the json fields through a keypath. Specifying an empty keypath like {} allows you to get your content without a key.
Using the double-arrow form #>> in the accessor returns the content without the quotes, so there is no need for the trim() function.
Overall, the statement
select id
, nullif(val#>>'{}', 'null')::int as val_int
from mytable
;
will return the contents of the former json column as int, respectively NULL (in postgresql >= 9.4):
id | val_int
---+---------
 1 | NULL
 2 | 0
 3 | 2
See updated db<>fiddle here.
--
Note: As pointed out by @Mike in his comment above, if the column type is jsonb, you can also use val->>0 to dereference scalars. However, if the type is json, the ->> operator will yield NULL as the result. See this db<>fiddle.
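For illustration, the difference described in the note looks like this:

SELECT jsonb '"2"' ->> 0;  -- returns '2' (works for jsonb scalars)
SELECT json  '"2"' ->> 0;  -- returns NULL (json does not dereference scalars this way)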

Related

Postgresql query by json key condition

I have a table with a JSON column and want to select rows where the JSON key 'k' has the value 'value'. The JSON may consist of several [k, v] pairs.
[
{"k":"esr:code","v":"800539"},
{"k":"lit","v":"yes"},
{"k":"name","v":"5 ΠΊΠΌ"},
{"k":"railway","v":"halt"},
{"k":"uic_ref","v":"2040757"}
]
I tried the following query, but it's wrong.
SELECT *
FROM public.node
where ((node.tags)::json->>'k' like 'name')
How can I fix it, if that's possible? (Here node is the table name and tags is the json column.)
You can use the JSONB containment operator @>:
SELECT *
FROM public.node
where node.tags @> '[{"k": "name"}]';
This will do an exact match against name. Your usage of like might indicate you are looking for a partial match; however, as your like condition doesn't use a wildcard, it's the same as =.
This assumes that tags is defined as jsonb (which it should be). If it's not you need to cast it: node.tags::jsonb
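If you actually need a partial match, one option (a sketch, assuming tags is jsonb) is to expand the array and apply LIKE to the unnested elements:

SELECT DISTINCT n.*
FROM public.node AS n
CROSS JOIN LATERAL jsonb_array_elements(n.tags) AS e(elem)
WHERE e.elem ->> 'k' LIKE 'name%';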

Postgres perform regex query on jsonb field

I have a column in my Postgres database that stores jsonb type values. Some of these values are raw strings (not a list or dictionary). I want to be able to perform a regex search on this column, such as
select * from database where jsonb_column::text ~ regex_expression.
The issue is that for values that are already strings, converting from jsonb to text adds additional escaped double quotes at the beginning and end of the value. I don't want these included in the regex query. I understand why Postgres does this, but if, say we assume all values stored in the jsonb field were jsonb strings, is there a work around? I know you can use ->> to get a value out of a jsonb dictionary, but can't figure out a solution for just jsonb strings on their own.
Once I figure out how to make this query in normal Postgres, I want to translate it into Peewee. However, any and all help with even just the initial query would be appreciated!
Just cast the json to text. Here is an example:
from peewee import CharField, Model
from playhouse.postgres_ext import BinaryJSONField, PostgresqlExtDatabase

db = PostgresqlExtDatabase('my_database')  # connection details assumed

class Reg(Model):
    key = CharField()
    data = BinaryJSONField()

    class Meta:
        database = db

db.create_tables([Reg])

for i in range(10):
    Reg.create(key='k%s' % i, data={'k%s' % i: 'v%s' % i})

# Find the row that contains the json string "k1": "v1".
expr = Reg.data.cast('text').regexp('"k1": "v1"')
query = Reg.select().where(expr)
for row in query:
    print(row.key, row.data)
Prints
k1 {'k1': 'v1'}
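For reference, the plain SQL this ORM query amounts to is roughly (a sketch; the table name reg is assumed):

SELECT key, data
FROM reg
WHERE CAST(data AS text) ~ '"k1": "v1"';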
To extract a plain string (string primitive without key name) from a JSON value (json or jsonb), you can extract the "empty path" like:
SELECT jsonb '"my string"' #>> '{}';
This also works for me (with jsonb but not with json), but it's more of a hack:
SELECT jsonb '"my string"' ->> 0
So:
SELECT * FROM tbl WHERE (jsonb_column #>> '{}') ~ 'my regex here';

Hive and HQL: how do you format JSON strings to be sorted by keys

In Hive/HQL, how do you format JSON strings to be ordered by their keys? For example:
id | some_str
---+-----------------------
 1 | {"b":1, "c":2, "a":0}
I want the output to be ordered by the json keys (i.e. a<b<c):
id | some_str
---+------------------------
 1 | {"a":0, "b": 1, "c":2}
I know I might be able to use get_json_object and do some hard-coded formatting, but that would only work if I have only a few keys, but unfortunately that doesn't apply here.
What would you suggest? Thanks!!
Additional Q's:
How do I check equality of {"b":1, "c":2, "a":0} and {"a":0, "b": 1, "c":2} (their equality should be True)?
How do I get the value of the largest/smallest key? i.e. expected results:
id | some_str
---+----------
 1 | 0

(smallest key = "a")
or:

id | some_str
---+----------
 1 | 2

(largest key = "c")
Hive does not support a JSON data type; JSON can be parsed with a JSONSerDe or with get_json_object/json_tuple. If the data in your question were of type map, it would be sortable/comparable based on keys/values. Strings are compared as plain strings, no matter whether they contain JSON or not.
Consider storing it as map<string,int> or map<string,string> type.
Also, you can convert it to a map using the str_to_map function (returns map<string, string>) and regexp_replace to remove quotes, spaces and curly braces, then use the IN operator to compare. See this demo:
with mydata as (
    select '{"b":1, "c":2, "a":0}' as A, '{"a":0, "b": 1, "c":2}' as B
)
select A, B, A in (B), B in (A)
from (
    select str_to_map(regexp_replace(regexp_replace(regexp_replace(A,': +',':'),', +',','),'"|\\{|\\}','')) A,
           str_to_map(regexp_replace(regexp_replace(regexp_replace(B,': +',':'),', +',','),'"|\\{|\\}','')) B
    from mydata
) s
Result:
a b a_equal_b b_equal_a
{"a":"0","b":"1","c":"2"} {"a":"0","b":"1","c":"2"} true true
Both maps are equal; note they are displayed in the same key order (a<b<c). After conversion to map, you can order, compare, and easily extract keys and values, and convert values to int if necessary.
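For the additional questions (value of the smallest/largest key), here is a possible sketch using standard Hive functions (map_keys, sort_array, size) on a map built the same way; the column aliases are illustrative:

with mydata as (
    select str_to_map(regexp_replace(regexp_replace(regexp_replace('{"b":1, "c":2, "a":0}',': +',':'),', +',','),'"|\\{|\\}','')) as m
)
select m[sort_array(map_keys(m))[0]]           as value_of_smallest_key, -- '0'
       m[sort_array(map_keys(m))[size(m) - 1]] as value_of_largest_key   -- '2'
from mydata;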
Also, you can convert a JSON string to a map, specifying the types for key and value, using the brickhouse json_map function, without the need to additionally transform the JSON string with regexp_replace; this is the most efficient method:
json_map('{"b":1, "c":2, "a":0}', 'string,int')
Read how to install brickhouse functions here.

unnest() not exploding array, returns error Column alias list has 1 entries but 't' has 2 columns available

I have some json data which includes a property 'characters' and it looks like this:
select json_data['characters'] from latest_snapshot_events
Returns: [{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":10,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":3},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":39,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":2},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":6801450488388220,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":1,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":8355588830097610,"shards":0,"CHAR_TPIECES":5,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4}]
This is returned on a single row. I would like a single row for each item within the array.
I found several SO posts and other blogs advising me to use unnest(). I've tried this several times and cannot get a result to return. For example, here is the documentation from Presto; the bottom covers unnest as a stand-in for Hive's lateral view explode:
SELECT student, score
FROM tests
CROSS JOIN UNNEST(scores) AS t (score);
So I tried to apply this to my table:
characters as (
    select jdata.characters
    from latest_snapshot_events
    cross join unnest(json_data) as t(jdata)
)
select * from characters;
where json_data is the field in latest_snapshot_events that contains the property 'characters', which is an array like the one shown above.
This returns an error:
[Simba]AthenaJDBC An error has been thrown from the AWS Athena client. SYNTAX_ERROR: line 69:12: Column alias list has 1 entries but 't' has 2 columns available
How can I unnest/explode latest_snapshot_events.json_data['characters'] onto multiple rows?
Since characters is a JSON array in textual representation, you'll have to:
1. Parse the JSON text with json_parse to produce a value of type JSON.
2. Convert the JSON value into a SQL array using CAST.
3. Explode the array using UNNEST.
For instance:
WITH data(characters) AS (
VALUES '[{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":10,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":3},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":39,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":2},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":6801450488388220,"shards":0,"CHAR_TPIECES":0,"CHAR_A5_LVL":1,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4},{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,"CHAR_TIER":1,"ITEM":8355588830097610,"shards":0,"CHAR_TPIECES":5,"CHAR_A5_LVL":0,"CHAR_A2_LVL":1,"CHAR_A4_LVL":1,"ITEM_CATEGORY":"Character","ITEM_LEVEL":4}]'
)
SELECT entry
FROM data, UNNEST(CAST(json_parse(characters) AS array(json))) t(entry)
which produces:
entry
-----------------------------------------------------------------------
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":60,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":50,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":80,"CHAR_A3_LVL":1,...
{"CHAR_STARS":1,"CHAR_A1_LVL":1,"ITEM_POWER":85,"CHAR_A3_LVL":1,...
In the example above, I convert the JSON value into an array(json), but you can further convert it to something more concrete if the values inside each array entry have a regular schema. For example, for your data, it is possible to cast it to an array(map(varchar, json)), since every element in the array is a JSON object.
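For instance, reusing the data CTE above, a sketch of that more concrete cast (the ITEM_POWER field is taken from the JSON shown earlier):

SELECT entry['ITEM_POWER'] AS item_power
FROM data, UNNEST(CAST(json_parse(characters) AS array(map(varchar, json)))) t(entry);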
json_parse works if your initial data is a JSON string. However, for array(row) types (i.e. an array of objects/dictionaries), casting to array(json) will convert each row into an array, removing all keys from the object and preventing you from using dot notation or json_extract functions.
To unnest array(row) data, the syntax is much simpler:
CROSS JOIN UNNEST(my_array) AS my_row
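For example, a sketch with a hypothetical events table whose characters column is of type array(row(item bigint, item_level bigint)); the alias list expands the row fields into columns:

SELECT item, item_level
FROM events
CROSS JOIN UNNEST(characters) AS t(item, item_level);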
I got stuck with this error trying to unpivot data.
This might help someone:
SELECT a_col, b_col
FROM (
    SELECT MAP(
        ARRAY['a', 'b', 'c', 'd'],
        ARRAY[1, 2, 3, 4]
    ) my_col
)
CROSS JOIN UNNEST(my_col) as t(a_col, b_col)
t() allows you to define multiple columns as outputs.
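With the map above, this yields one row per key/value pair, something like:

a_col | b_col
------+-------
a     | 1
b     | 2
c     | 3
d     | 4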

How to query a json column for empty objects?

Looking to find all rows where a certain json column contains an empty object, {}. This is possible with JSON arrays, or if I am looking for a specific key in the object. But I just want to know if the object is empty. Can't seem to find an operator that will do this.
dev=# \d test
    Table "public.test"
 Column | Type | Modifiers
--------+------+-----------
 foo    | json |
dev=# select * from test;
foo
---------
{"a":1}
{"b":1}
{}
(3 rows)
dev=# select * from test where foo != '{}';
ERROR: operator does not exist: json <> unknown
LINE 1: select * from test where foo != '{}';
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
dev=# select * from test where foo != to_json('{}'::text);
ERROR: operator does not exist: json <> json
LINE 1: select * from test where foo != to_json('{}'::text);
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
dwv=# select * from test where foo != '{}'::json;
ERROR: operator does not exist: json <> json
LINE 1: select * from test where foo != '{}'::json;
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
There is no equality (or inequality) operator for the data type json as a whole, because equality is hard to establish. Consider jsonb in Postgres 9.4 or later, where this is possible. More details in this related answer on dba.SE (last chapter):
How to remove known elements from a JSON[] array in PostgreSQL?
SELECT DISTINCT json_column ... or ... GROUP BY json_column fail for the same reason (no equality operator).
Casting both sides of the expression to text allows = or <> operators, but that's not normally reliable as there are many possible text representations for the same JSON value. In Postgres 9.4 or later, cast to jsonb instead. (Or use jsonb to begin with.)
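With jsonb, for example, the comparison just works (9.4 or later):

select * from test where foo::jsonb <> '{}'::jsonb;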
However, for this particular case (empty object) the text cast works just fine:
select * from test where foo::text <> '{}'::text;
Empty JSON array [] could also be relevant.
Then this could work for both [] and {}:
select * from test where length(foo::text) > 2;
You have to be careful: casting all your data to a different type so you can compare it will have performance issues on a large database.
If your data has a consistent key, then you can look for the existence of that key. For example, if the plan data is {} or {"id": "1"},
then you can look for items without 'id' (note that the ? existence operator requires jsonb; cast with plan::jsonb if the column is json):
SELECT * FROM public."user"
where NOT(plan ? 'id')
As of PostgreSQL 9.5, this type of query with JSON data is not possible. On the other hand, I agree it would be very useful and created a request for it:
https://postgresql.uservoice.com/forums/21853-general/suggestions/12305481-check-if-json-is-empty
Feel free to vote for it, and hopefully it will be implemented!
In 9.3 it is possible to count the pairs in each object and filter the ones with none:
create table test (foo json);
insert into test (foo) values
('{"a":1, "c":2}'), ('{"b":1}'), ('{}');
select *
from test
where (select count(*) from json_each(foo) s) = 0;
foo
-----
{}
or test the existence, probably faster for big objects
select *
from test
where not exists (select 1 from json_each(foo) s);
Both techniques will work flawlessly regardless of formatting.
According to the JSON Functions and Operators documentation, you can use the double-arrow operator (->>) to get a json object or array field as text, then do an equality check against a string.
So this worked for me:
SELECT jsonb_col from my_table
WHERE jsonb_col ->> 'key' = '{}';
Or if it's nested more than one level use the path function (#>>)
SELECT jsonb_col from my_table
WHERE jsonb_col #>> '{key, nestedKey}' = '{}';