Postgres perform regex query on jsonb field - sql

I have a column in my Postgres database that stores jsonb type values. Some of these values are raw strings (not a list or dictionary). I want to be able to perform a regex search on this column, such as
select * from database where jsonb_column::text ~ regex_expression.
The issue is that for values that are already strings, casting from jsonb to text adds double quotes at the beginning and end of the value. I don't want these included in the regex match. I understand why Postgres does this, but if we assume all values stored in the jsonb field are jsonb strings, is there a workaround? I know you can use ->> to get a value out of a jsonb dictionary, but I can't figure out a solution for bare jsonb strings on their own.
Once I figure out how to make this query in normal Postgres, I want to translate it into Peewee. However, any and all help with even just the initial query would be appreciated!

Just cast the json to text. Here is an example:
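from peewee import Model, CharField
from playhouse.postgres_ext import PostgresqlExtDatabase, BinaryJSONField

db = PostgresqlExtDatabase('my_database')  # hypothetical database name

class Reg(Model):
    key = CharField()
    data = BinaryJSONField()

    class Meta:
        database = db

db.create_tables([Reg])
for i in range(10):
    Reg.create(key='k%s' % i, data={'k%s' % i: 'v%s' % i})

# Find the row whose JSON text contains "k1": "v1".
expr = Reg.data.cast('text').regexp('"k1": "v1"')
query = Reg.select().where(expr)
for row in query:
    print(row.key, row.data)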
Prints
k1 {'k1': 'v1'}
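To show why this regex matches, here is a small Python sketch (plain Python, no Peewee and no database) that uses json.dumps as a stand-in for the jsonb-to-text cast. It assumes Postgres prints these single-key objects with '": "' between key and value, which is the spacing the pattern relies on; drop the space and nothing matches.

```python
import json
import re

# Hypothetical in-memory stand-in for the Reg table: (key, data) pairs.
rows = [('k%s' % i, {'k%s' % i: 'v%s' % i}) for i in range(10)]

def matching_keys(rows, pattern):
    """Return keys whose JSON text form matches `pattern`.

    json.dumps stands in for the jsonb -> text cast; both put '": "'
    between a key and its value for single-key objects like these.
    """
    return [key for key, data in rows if re.search(pattern, json.dumps(data))]

print(matching_keys(rows, '"k1": "v1"'))  # ['k1']
print(matching_keys(rows, '"k1":"v1"'))   # []: the regex sees the exact text spacing
```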

To extract a plain string (string primitive without key name) from a JSON value (json or jsonb), you can extract the "empty path" like:
SELECT jsonb '"my string"' #>> '{}';
This also works for me (with jsonb but not with json), but it's more of a hack:
SELECT jsonb '"my string"' ->> 0
So:
SELECT * FROM tbl WHERE (jsonb_column #>> '{}') ~ 'my regex here';
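The difference between the two forms can be sketched in Python, assuming json.dumps behaves like jsonb_column::text for a string value and json.loads like jsonb_column #>> '{}'. An anchored regex fails on the quoted form and succeeds on the unwrapped one. The cast also escapes inner quotes, which is another reason to match against the unwrapped text.

```python
import json
import re

value = "my string"               # a jsonb string value as stored
as_text = json.dumps(value)       # like jsonb_column::text   -> '"my string"'
unwrapped = json.loads(as_text)   # like jsonb_column #>> '{}' -> 'my string'

pattern = r'^my string$'
print(bool(re.search(pattern, as_text)))    # False: the added quotes get in the way
print(bool(re.search(pattern, unwrapped)))  # True

# Inner quotes are escaped in the text form as well.
print(json.dumps('say "hi"'))               # "say \"hi\""
```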

Related

Postgresql query by json key condition

I have a table with a JSON column and want to select rows where the JSON key 'k' has the value 'value'. The JSON may contain several [k, v] pairs.
[
{"k":"esr:code","v":"800539"},
{"k":"lit","v":"yes"},
{"k":"name","v":"5 км"},
{"k":"railway","v":"halt"},
{"k":"uic_ref","v":"2040757"}
]
I tried to use the following query, but it's wrong.
SELECT *
FROM public.node
where ((node.tags)::json->>'k' like 'name')
How can I fix it, if it's possible?
Where node - table name, tags - json column.
You can use the JSONB contains operator @>:
SELECT *
FROM public.node
where node.tags @> '[{"k":"name"}]';
This will do an exact match against name. Your usage of like might indicate you are looking for a partial match - however as your like condition doesn't use a wildcard it's the same as =.
This assumes that tags is defined as jsonb (which it should be). If it's not you need to cast it: node.tags::jsonb
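A rough idea of how @> decides a match: an object contains another if every key in the right-hand side is present with a contained value, and an array contains another if every right-hand element is contained in some left-hand element. The Python sketch below is a simplified emulation of those rules (it ignores the top-level special case where an array may contain a bare scalar), applied to the tags array from the question.

```python
def contains(doc, pattern):
    """Simplified emulation of jsonb `doc @> pattern`."""
    if isinstance(pattern, dict):
        # Every key of the pattern must exist with a contained value.
        return isinstance(doc, dict) and all(
            k in doc and contains(doc[k], v) for k, v in pattern.items())
    if isinstance(pattern, list):
        # Every pattern element must be contained in SOME doc element.
        return isinstance(doc, list) and all(
            any(contains(d, p) for d in doc) for p in pattern)
    # Scalars: equal values, and don't let Python treat True as 1.
    return doc == pattern and isinstance(doc, bool) == isinstance(pattern, bool)

tags = [
    {"k": "esr:code", "v": "800539"},
    {"k": "lit", "v": "yes"},
    {"k": "name", "v": "5 км"},
    {"k": "railway", "v": "halt"},
    {"k": "uic_ref", "v": "2040757"},
]
print(contains(tags, [{"k": "name"}]))  # True: exact match on the key's value
print(contains(tags, [{"k": "nam"}]))   # False: no partial matching
```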

Why is array_remove (POSTGRESQL) not working in this case?

This is not working correctly:
SELECT array_remove(array_agg(s1->>'karten'),'8 eich.jpg') from spiele;
The output: ["3 eich.jpg","8 eich.jpg","5 sche.jpg","2 herz.jpg","1 laub.jpg","4 eich.jpg","2 sche.jpg","5 laub.jpg","4 herz.jpg","4 sche.jpg"]
The datatype of s1 is json; s1->>'karten' is an array
If karten refers to a JSON array in the json value, then s1 ->> 'karten' doesn't return each element individually, but a single string representing the whole array. So array_agg() doesn't aggregate the individual elements, only that one string per row. The result is an array whose element happens to look like a JSON array, and array_remove() finds no element equal to '8 eich.jpg'.
You can remove an element from a JSON array if the value is jsonb (the recommended data type for handling JSON in Postgres anyway) using the - operator:
select (s1 -> 'karten')::jsonb - '8 eich.jpg' from spiele;
This returns a jsonb array without the element '8 eich.jpg'.
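The failure mode can be mimicked in Python, with json.dumps standing in for s1->>'karten' and a list comprehension standing in for both array_remove() and the jsonb - operator. The shortened sample and the helper name jsonb_minus are illustrative only. Filtering the aggregated array removes nothing, because its only element is the whole serialized array; filtering the parsed elements removes the matching string.

```python
import json

s1 = {"karten": ["3 eich.jpg", "8 eich.jpg", "5 sche.jpg"]}  # shortened sample

# s1->>'karten' yields ONE text value: the serialized array.
as_text = json.dumps(s1["karten"])
aggregated = [as_text]                                # array_agg over one row
after_array_remove = [x for x in aggregated if x != "8 eich.jpg"]
print(after_array_remove == aggregated)               # True: nothing was removed

def jsonb_minus(arr, elem):
    """Hypothetical helper mimicking `jsonb_array - 'text'`: drop equal strings."""
    return [x for x in arr if x != elem]

print(jsonb_minus(s1["karten"], "8 eich.jpg"))        # ['3 eich.jpg', '5 sche.jpg']
```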
Unfortunately there is no easy conversion from a JSON array to a native array. Search this site, there are multiple answers for that.

How to cast postgres JSON column to int without key being present in JSON (simple JSON values)?

I am working on data in postgresql as in the following mytable with the fields id (type int) and val (type json):
id  val
1   "null"
2   "0"
3   "2"
The values in the json column val are simple JSON values, i.e. just strings with surrounding quotes and have no key.
I have looked at the SO post How to convert postgres json to integer and attempted something like the solution presented there
SELECT (mytable.val->>'key')::int FROM mytable;
but in my case, I do not have a key to address the field and leaving it empty does not work:
SELECT (mytable.val->>'')::int as val_int FROM mytable;
This returns NULL for all rows.
The best I have come up with is the following (casting to varchar first, trimming the quotes, filtering out the string "null" and then casting to int):
SELECT id, nullif(trim('"' from mytable.val::varchar), 'null')::int as val_int FROM mytable;
which works, but surely cannot be the best way to do it, right?
Here is a db<>fiddle with the example table and the statements above.
Found the way to do it:
You can access the content via the key path (see e.g. this PostgreSQL JSON cheatsheet):
Using the #> and #>> operators, you can access json fields through a key path. Specifying an empty key path, '{}', gives you the content without a key.
Using #>> (with the double angle brackets) returns the content as text without the quotes, so there is no need for the trim() function.
Overall, the statement
select id
, nullif(val#>>'{}', 'null')::int as val_int
from mytable
;
will return the contents of the former json column as int, or NULL respectively (in PostgreSQL >= 9.4):
id  val_int
1   NULL
2   0
3   2
See updated db<>fiddle here.
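The per-row logic of nullif(val #>> '{}', 'null')::int can be sketched in Python, assuming json.loads plays the role of #>> '{}' (unwrapping the JSON scalar into plain text). The helper name val_int is hypothetical.

```python
import json

# Raw JSON text of the `val` column for ids 1..3 (each value is a JSON string).
raw = {1: '"null"', 2: '"0"', 3: '"2"'}

def val_int(json_text):
    """Mimic nullif(val #>> '{}', 'null')::int for one JSON scalar."""
    value = json.loads(json_text)  # like #>> '{}': unwrap the scalar
    # Non-string scalars go back to their JSON text; in this sketch a JSON
    # null becomes 'null', which the nullif step maps to None as well.
    text = value if isinstance(value, str) else json.dumps(value)
    return None if text == 'null' else int(text)  # nullif(..., 'null')::int

print({k: val_int(v) for k, v in raw.items()})  # {1: None, 2: 0, 3: 2}
```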
--
Note: As pointed out by @Mike in his comment above, if the column type is jsonb, you can also use val->>0 to dereference scalars. However, if the type is json, the ->> operator yields NULL. See this db<>fiddle.

Flatten data source in Snowflake from Array

I am trying to fix an array in a dataset. Currently, I have a data set that has a reference number to multiple different uuids. What I would like to do is flatten this out in Snowflake to make it so the reference number has separate row for each uuid. For example
Reference UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75 "[
""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"",
""df662812-7f97-0b43-9d3e-12f64f504fbb"",
""08644a69-76ed-ce2d-afff-b236a22efa69"",
""f1162c2e-eeb5-83f6-5307-2ed644e6b9eb"",
]"
Should end up looking like:
Reference UUID
1) 9f823c2a-ced5-4dbe-be65-869311462f75 05554f65-6aa9-4dd1-6271-8ce2d60f10c4
2) 9f823c2a-ced5-4dbe-be65-869311462f75 df662812-7f97-0b43-9d3e-12f64f504fbb
3) 9f823c2a-ced5-4dbe-be65-869311462f75 08644a69-76ed-ce2d-afff-b236a22efa69
4) 9f823c2a-ced5-4dbe-be65-869311462f75 f1162c2e-eeb5-83f6-5307-2ed644e6b9eb
I just started working in Snowflake, so I am new to it. It looks like there is a LATERAL FLATTEN, but it is either not working or giving me all sorts of errors. Snowflake's documentation is a bit perplexing when it comes to this.
While FLATTEN is the right approach when exploding an array, the UUID column value shown in the original description is invalid if interpreted as JSON syntax: "[""val1"", ""val2""]" and that'll need correction before a LATERAL FLATTEN approach can be applied by treating it as a VARIANT type.
If your data sample in the original description is a literal one and applies for all columnar values, then the following query will help transform it into a valid JSON syntax and then apply a lateral flatten to yield the desired result:
SELECT
T.REFERENCE,
X.VALUE AS UUID
FROM (
SELECT
REFERENCE,
-- Attempts to transform an invalid JSON array syntax such as "[""a"", ""b""]"
-- to valid JSON: ["a", "b"] by stripping away unnecessary quotes
PARSE_JSON(REPLACE(REPLACE(REPLACE(UUID, '""', '"'), '"[', '['), ']"', ']')) AS UUID_ARR_CLEANED
FROM TABLENAME) T,
LATERAL FLATTEN(T.UUID_ARR_CLEANED) X
If your data is already a valid VARIANT (PARSE_JSON was applied to the UUID column during ingest) and the example in the description only looks invalid because of how it was formatted in the post, then this simpler version of the same query will suffice:
SELECT REFERENCE, X.VALUE AS UUID
FROM TABLENAME, LATERAL FLATTEN(TABLENAME.UUID) X
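The cleanup-then-flatten idea can be checked outside Snowflake with a Python emulation: str.replace stands in for REPLACE (collapse doubled quotes, then drop the stray quote before '[' and after ']'), json.loads for PARSE_JSON, and a list comprehension for LATERAL FLATTEN. The sample is an assumption: it has no trailing comma, because the question's sample ends with one and Python's strict json.loads rejects trailing commas.

```python
import json

def clean_uuid_array(raw):
    """Collapse doubled quotes, then strip the quote before '[' and after ']'."""
    s = raw.replace('""', '"')
    s = s.replace('"[', '[').replace(']"', ']')
    return json.loads(s)  # stands in for PARSE_JSON

def flatten(reference, raw):
    """Emulate LATERAL FLATTEN: one (reference, uuid) row per array element."""
    return [(reference, uuid) for uuid in clean_uuid_array(raw)]

REFERENCE = "9f823c2a-ced5-4dbe-be65-869311462f75"
raw = ('"[""05554f65-6aa9-4dd1-6271-8ce2d60f10c4"", '
       '""df662812-7f97-0b43-9d3e-12f64f504fbb""]"')
for row in flatten(REFERENCE, raw):
    print(row)
```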

Get an average value for element in column of arrays of json data in postgres

I have some data in a postgres table that is a string representation of an array of json data, like this:
[
{"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
{"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},
{"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]
This is the data in one cell from a single column of similar data in my database.
The datatype of this stored in the db is varchar(max).
My goal is to find the average RetailPrice of EVERY json item with "Role"=>"Abstract", including all of the json elements in the array, and all of the rows in the database.
Something like:
SELECT avg(json_extract_path_text(json_item, 'RetailPrice'))
FROM (
SELECT cast(json_items to varchar[]) as json_item
FROM my_table
WHERE json_extract_path_text(json_item, 'Role') like 'Abstract'
)
Now, obviously this particular query wouldn't work for a few reasons. Postgres doesn't let you directly convert a varchar to a varchar[]. Even after I had an array, this query would do nothing to iterate through the array. There are probably other issues with it too, but I hope it helps to clarify what it is I want to get.
Any advice on how to get the average retail price from all of these arrays of json data in the database?
It does not seem like Redshift would support the json data type per se. At least, I found nothing in the online manual.
But I found a few JSON functions in the manual, which should be instrumental:
JSON_ARRAY_LENGTH
JSON_EXTRACT_ARRAY_ELEMENT_TEXT
JSON_EXTRACT_PATH_TEXT
Since generate_series() is not supported, we have to substitute for that ...
SELECT tbl_id
, round(avg((json_extract_path_text(elem, 'RetailPrice'))::numeric), 2) AS avg_retail_price
FROM (
SELECT *, json_extract_array_element_text(json_items, pos) AS elem
FROM (VALUES (0),(1),(2),(3),(4),(5)) a(pos)
CROSS JOIN tbl
) sub
WHERE json_extract_path_text(elem, 'Role') = 'Abstract'
GROUP BY 1;
I substituted with a poor man's solution: A dummy table counting from 0 to n (the VALUES expression). Make sure you count up to the maximum number of possible elements in your array. If you need this on a regular basis create an actual numbers table.
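The numbers-table trick can be emulated in Python to make its limit visible: each row is paired with positions 0 to 5, the element at each position is checked for Role = 'Abstract', and RetailPrice is averaged per tbl_id. The sample rows are hypothetical and assumed to hold valid JSON (the question's "=>" syntax is not JSON). In this sketch, positions past the end of an array are simply skipped. A seventh element is never examined, so a numbers table that is too short silently drops data.

```python
import json
from statistics import mean

# Hypothetical rows: (tbl_id, json_items) holding valid JSON text.
rows = [
    (1, '[{"Role": "Abstract", "RetailPrice": 2}, {"Role": "Text", "RetailPrice": 2}]'),
    (2, '[{"Role": "Abstract", "RetailPrice": 3}, {"Role": "Abstract", "RetailPrice": 5}]'),
]
MAX_POS = 6  # like VALUES (0),...,(5): must cover the longest array

def avg_abstract_price(rows):
    prices = {}
    for tbl_id, json_items in rows:
        items = json.loads(json_items)
        for pos in range(MAX_POS):      # CROSS JOIN with the position list
            if pos >= len(items):       # this sketch skips positions past the end
                continue
            elem = items[pos]
            if elem.get("Role") == "Abstract":
                prices.setdefault(tbl_id, []).append(elem["RetailPrice"])
    # round(avg(...), 2) ... GROUP BY tbl_id
    return {k: round(mean(v), 2) for k, v in prices.items()}

print(avg_abstract_price(rows))  # {1: 2, 2: 4}
```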
Modern Postgres has much better options, like json_array_elements() to unnest a json array. Compare to your sibling question for Postgres:
Can get an average of values in a json array using postgres?
I tested in Postgres with the related operator ->>, where it works:
SQL Fiddle.