check if a jsonb field contains an array - type-inference

I have a jsonb field in a PostgreSQL table which was supposed to contain a dictionary like data aka {} but few of its entries got an array due to source data issues.
I want to weed out those entries. One of the ways is to perform following query -
select json_field from data_table where cast(json_field as text) like '[%]'
But this requires converting each jsonb field into text. With data_table having order of 200 million entries, this looks like bit of an overkill.
I investigated pg_typeof but it returns jsonb which doesn't help differentiate between a dictionary and an array.
Is there a more efficient way to achieve the above?

How about using the json_typeof function?
select json_field from data_table where json_typeof(json_field) = 'array'

Related

Update a value in json column array of objects in postgresql raw query

I have a json type column in postgreSQL and I want to update the specific field in that json column. Below is the column I want to update in,
[{"id":"xyyc","answered":false,"payable":true,"productIncentiveCost":{"incentive":0,"cost":0,"dollarIncentive":0,"dollarCost":0},"reward":0,"amountInDollar":0,"delayToNextProduct":"","extraDelayToNextProduct":""}].
I want to update the "reward" at 0 index of this column using postgreSQL raw query.
I have tried updating it using update method set and set_json but no luck.
update data
set value = tmp.upd_json_arr
from (
select
jsonb_agg(jsonb_row || '{"reward": 167}') as upd_json_arr
from data,
jsonb_array_elements(value) jsonb_row
) tmp;
Details:
First, you need to 'expand' your jsonb array using jsonb_array_elements() function. As a result you'll get jsonb rows.
Then, using concatenation operation you can rewrite a jsonb field: jsonb_row || '{"reward": 167}'.
Finally, jsonb_agg() function can help you to flatten all the jsonb rows into an array upd_json_arr.
upd_json_arr is used in a set-statement.
Here is the demo.
This answer shows different ways to manipulate with jsonb fields.

Extract key value pair from json column in redshift

I have a table mytable that stores columns in the form of JSON strings, which contain multiple key-value pairs. Now, I want to extract only a particular value corresponding to one key.
The column that stores these strings is of varchar datatype, and is created as:
insert into mytable(empid, json_column) values (1,'{"FIRST_NAME":"TOM","LAST_NAME" :"JENKINS", "DATE_OF_JOINING" :"2021-06-10", "SALARY" :"1000" }').
As you can see, json_column is created by inserting only a string. Now, I want to do something like:
select json_column.FIRST_NAME from mytable
I just want to extract the value corresponding to key FIRST_NAME.
Though my actual table is far more complex than this example, and I cannot convert these JSON keys into different columns themselves. But, this example clearly illustrates my issue.
This needs to be done over Redshift, please help me out with any valuable suggestions.
using function json_extract_path_text of Redshift can solve this problem easily, as follows:
select json_extract_path_text(json_column, 'FIRST_NAME') from mytable;

Query JSONB column for any value where =?

I have a jsonb column which has the unfortunate case of being very unpredictable, in some cases its value may be an array with nested values:
["UserMailer", "applicant_setup_3", ["5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"]]
Sometimes it will be something with key/values like this:
[{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", "job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}]
Is there a way to write a query which just treats the whole column like text and does a like to see if I can find the uuid in the big text blob? I want to find all the records where a particular uuid string is present in the jsonb column.
The query doesn't need to be fast or efficient.
Postgres has search operator ? for jsonb, but that would require you to search the json content recursively.
A possible, although not very efficient method, would to stringify the object and use LIKE to search it:
myjsonb::text LIKE '%"5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"%'
myjsonb::text LIKE '%"' || myuuid || '"%'
Demo on DB Fiddle:
The problem with the jsonb operator ? is that it only considers top-level keys (including array elements), not values, and no nested objects.
You seem to be looking for values and array elements (not keys) on any level. You can get that with a full text search on top of your json(b) column:
SELECT * FROM tbl
WHERE to_tsvector('simple', jsonb_column)
## tsquery '5cbffeb7-8d5e-4b52-a475-3cf320b2cee9';
db<>fiddle here
to_tsvector() extracts values and array elements on all levels - just what you need.
Requires Postgres 10 or later. json(b)_to_tsvector() in Postgres 11 offers more flexibility.
That's attractive for tables of non-trivial size as it can be supported with a full text index very efficiently:
CREATE INDEX tbl_jsonb_column_fts_gin_idx ON tbl USING GIN (to_tsvector('simple', jsonb_column));
I use the 'simple' text search configuration in the example. You might want a language-specific one, like 'english'. Doesn't matter much while you only look for UUID strings, but stemming for a particular language might make the index a bit smaller ...
Related:
LIKE query on elements of flat jsonb array
Does the phrase search operator <-> work with JSONB documents or only relational tables?
While you are only looking for UUIDs, you might optimize further with a custom (IMMUTABLE) function to extract UUIDs from the JSON document as array (uuid[]) and build a functional GIN index on top of it. (Considerably smaller index, yet.) Then:
SELECT * FROM tbl
WHERE my_uuid_extractor(jsonb_column) #> '{5cbffeb7-8d5e-4b52-a475-3cf320b2cee9}';
Such a function can be expensive, but does not matter much with a functional index that stores and operates on pre-computed values.
You can split the array elements first by using jsonb_array_elements(json), and then filter the casted string from those elements by like operator
select q.elm
from
(
select jsonb_array_elements(js) as elm
from tab
) q
where elm::varchar like '%User%'
elm
----------------------------------------------------------------------------------------------------------------------
"UserMailer"
{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", "job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}
Demo

Search a JSON array for an object containing a value matching a pattern

I have a DB with a jsonb column where each row essentially holds an array of name value pairs. Example for a single jsonb value:
[
{"name":"foo", "value":"bar"},
{"name":"biz", "value":"baz"},
{"name":"beep", "value":"boop"}
]
How would I query for rows that contain a partial value? I.e., find rows with the JSON object key value ilike '%ba%'?
I know that I can use SELECT * FROM tbl WHERE jsoncol #> '[{"value":"bar"}]' to find rows where the JSON is that specific value, but how would I query for rows containing a pattern?
There are no built in jsonb operators nor any indexes supporting this kind of filter directly (yet).
I suggest an EXISTS semi-join:
SELECT t.*
FROM tbl t
WHERE EXISTS (
SELECT FROM jsonb_array_elements(t.jsoncol) elem
WHERE elem->>'value' LIKE '%ba%'
);
It avoids redundant evaluations and the final DISTINCT step you would need to get distinct rows with a plain CROSS JOIN.
If this still isn't fast enough, a way more sophisticated specialized solution for the given type of query would be to extract a concatenated string of unique values (with a delimiter that won't interfere with your search patterns) per row in an IMMUTABLE function, build a trigram GIN index on the functional expression and use the same expression in your queries.
Related:
Search for nested values in jsonb array with greater operator
Find rows containing a key in a JSONB array of records
Create Postgres JSONB Index on Array Sub-Object
Aside, if your jsonb values really look like the example, you could trim a lot of noise and just store:
[
{"foo":"bar"},
{"biz":"baz"},
{"beep":"boop"}
]
You can use the function jsonb_array_elements() in a lateral join and use its result value in the WHERE clause:
select distinct t.*
from my_table t
cross join jsonb_array_elements(jsoncol)
where value->>'value' like '%ba%'
Please, read How to query jsonb arrays with IN operator for notes about distinct and performance.

PostgreSQL - best way to return an array of key-value pairs

I'm trying to select a number of fields, one of which needs to be an array with each element of the array containing two values. Each array item needs to contain a name (character varying) and an ID (numeric). I know how to return an array of single values (using the ARRAY keyword) but I'm unsure of how to return an array of an object which in itself contains two values.
The query is something like
SELECT
t.field1,
t.field2,
ARRAY(--with each element containing two values i.e. {'TheName', 1 })
FROM MyTable t
I read that one way to do this is by selecting the values into a type and then creating an array of that type. Problem is, the rest of the function is already returning a type (which means I would then have nested types - is that OK? If so, how would you read this data back in application code - i.e. with a .Net data provider like NPGSQL?)
Any help is much appreciated.
ARRAYs can only hold elements of the same type
Your example displays a text and an integer value (no single quotes around 1). It is generally impossible to mix types in an array. To get those values into an array you have to create a composite type and then form an ARRAY of that composite type like you already mentioned yourself.
Alternatively you can use the data types json in Postgres 9.2+, jsonb in Postgres 9.4+ or hstore for key-value pairs.
Of course, you can cast the integer to text, and work with a two-dimensional text array. Consider the two syntax variants for a array input in the demo below and consult the manual on array input.
There is a limitation to overcome. If you try to aggregate an ARRAY (build from key and value) into a two-dimensional array, the aggregate function array_agg() or the ARRAY constructor error out:
ERROR: could not find array type for data type text[]
There are ways around it, though.
Aggregate key-value pairs into a 2-dimensional array
PostgreSQL 9.1 with standard_conforming_strings= on:
CREATE TEMP TABLE tbl(
id int
,txt text
,txtarr text[]
);
The column txtarr is just there to demonstrate syntax variants in the INSERT command. The third row is spiked with meta-characters:
INSERT INTO tbl VALUES
(1, 'foo', '{{1,foo1},{2,bar1},{3,baz1}}')
,(2, 'bar', ARRAY[['1','foo2'],['2','bar2'],['3','baz2']])
,(3, '}b",a{r''', '{{1,foo3},{2,bar3},{3,baz3}}'); -- txt has meta-characters
SELECT * FROM tbl;
Simple case: aggregate two integer (I use the same twice) into a two-dimensional int array:
Update: Better with custom aggregate function
With the polymorphic type anyarray it works for all base types:
CREATE AGGREGATE array_agg_mult (anyarray) (
SFUNC = array_cat
,STYPE = anyarray
,INITCOND = '{}'
);
Call:
SELECT array_agg_mult(ARRAY[ARRAY[id,id]]) AS x -- for int
,array_agg_mult(ARRAY[ARRAY[id::text,txt]]) AS y -- or text
FROM tbl;
Note the additional ARRAY[] layer to make it a multidimensional array.
Update for Postgres 9.5+
Postgres now ships a variant of array_agg() accepting array input and you can replace my custom function from above with this:
The manual:
array_agg(expression)
...
input arrays concatenated into array of one
higher dimension (inputs must all have same dimensionality, and cannot
be empty or NULL)
I suspect that without having more knowledge of your application I'm not going to be able to get you all the way to the result you need. But we can get pretty far. For starters, there is the ROW function:
# SELECT 'foo', ROW(3, 'Bob');
?column? | row
----------+---------
foo | (3,Bob)
(1 row)
So that right there lets you bundle a whole row into a cell. You could also make things more explicit by making a type for it:
# CREATE TYPE person(id INTEGER, name VARCHAR);
CREATE TYPE
# SELECT now(), row(3, 'Bob')::person;
now | row
-------------------------------+---------
2012-02-03 10:46:13.279512-07 | (3,Bob)
(1 row)
Incidentally, whenever you make a table, PostgreSQL makes a type of the same name, so if you already have a table like this you also have a type. For example:
# DROP TYPE person;
DROP TYPE
# CREATE TABLE people (id SERIAL, name VARCHAR);
NOTICE: CREATE TABLE will create implicit sequence "people_id_seq" for serial column "people.id"
CREATE TABLE
# SELECT 'foo', row(3, 'Bob')::people;
?column? | row
----------+---------
foo | (3,Bob)
(1 row)
See in the third query there I used people just like a type.
Now this is not likely to be as much help as you'd think for two reasons:
I can't find any convenient syntax for pulling data out of the nested row.
I may be missing something, but I just don't see many people using this syntax. The only example I see in the documentation is a function taking a row value as an argument and doing something with it. I don't see an example of pulling the row out of the cell and querying against parts of it. It seems like you can package the data up this way, but it's hard to deconstruct after that. You'll wind up having to make a lot of stored procedures.
Your language's PostgreSQL driver may not be able to handle row-valued data nested in a row.
I can't speak for NPGSQL, but since this is a very PostgreSQL-specific feature you're not going to find support for it in libraries that support other databases. For example, Hibernate isn't going to be able to handle fetching an object stored as a cell value in a row. I'm not even sure the JDBC would be able to give Hibernate the information usefully, so the problem could go quite deep.
So, what you're doing here is feasible provided you can live without a lot of the niceties. I would recommend against pursuing it though, because it's going to be an uphill battle the whole way, unless I'm really misinformed.
A simple way without hstore
SELECT
jsonb_agg(to_jsonb (t))
FROM (
SELECT
unnest(ARRAY ['foo', 'bar', 'baz']) AS table_name
) t
>>> [{"table_name": "foo"}, {"table_name": "bar"}, {"table_name": "baz"}]