Index for finding an element in a JSON array - sql

I have a table that looks like this:
CREATE TABLE tracks (id SERIAL, artists JSON);
INSERT INTO tracks (id, artists)
VALUES (1, '[{"name": "blink-182"}]');
INSERT INTO tracks (id, artists)
VALUES (2, '[{"name": "The Dirty Heads"}, {"name": "Louis Richards"}]');
There's several other columns that aren't relevant to this question. There's a reason to have them stored as JSON.
What I'm trying to do is lookup a track that has a specific artist name (exact match).
I'm using this query:
SELECT * FROM tracks
WHERE 'ARTIST NAME' IN
(SELECT value->>'name' FROM json_array_elements(artists))
for example
SELECT * FROM tracks
WHERE 'The Dirty Heads' IN
(SELECT value->>'name' FROM json_array_elements(artists))
However, this does a full table scan, and it isn't very fast. I tried creating a GIN index using a function names_as_array(artists), and used 'ARTIST NAME' = ANY names_as_array(artists), however the index isn't used and the query is actually significantly slower.

jsonb in Postgres 9.4+
The binary JSON data type jsonb largely improves index options. You can now have a GIN index on a jsonb array directly:
CREATE TABLE tracks (id serial, artists jsonb); -- !
CREATE INDEX tracks_artists_gin_idx ON tracks USING gin (artists);
No need for a function to convert the array. This would support a query:
SELECT * FROM tracks WHERE artists #> '[{"name": "The Dirty Heads"}]';
#> being the jsonb "contains" operator, which can use the GIN index. (Not for json, only jsonb!)
Or you use the more specialized, non-default GIN operator class jsonb_path_ops for the index:
CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (artists jsonb_path_ops); -- !
Same query.
Currently jsonb_path_ops only supports the #> operator. But it's typically much smaller and faster. There are more index options, details in the manual.
If the column artists only holds names as displayed in the example, it would be more efficient to store just the values as JSON text primitives and the redundant key can be the column name.
Note the difference between JSON objects and primitive types:
Using indexes in json array in PostgreSQL
CREATE TABLE tracks (id serial, artistnames jsonb);
INSERT INTO tracks VALUES (2, '["The Dirty Heads", "Louis Richards"]');
CREATE INDEX tracks_artistnames_gin_idx ON tracks USING gin (artistnames);
Query:
SELECT * FROM tracks WHERE artistnames ? 'The Dirty Heads';
? does not work for object values, just keys and array elements.
Or:
CREATE INDEX tracks_artistnames_gin_idx ON tracks
USING gin (artistnames jsonb_path_ops);
Query:
SELECT * FROM tracks WHERE artistnames #> '"The Dirty Heads"'::jsonb;
More efficient if names are highly duplicative.
json in Postgres 9.3+
This should work with an IMMUTABLE function:
CREATE OR REPLACE FUNCTION json2arr(_j json, _key text)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY(SELECT elem->>_key FROM json_array_elements(_j) elem)';
Create this functional index:
CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (json2arr(artists, 'name'));
And use a query like this. The expression in the WHERE clause has to match the one in the index:
SELECT * FROM tracks
WHERE '{"The Dirty Heads"}'::text[] <# (json2arr(artists, 'name'));
Updated with feedback in comments. We need to use array operators to support the GIN index.
The "is contained by" operator <# in this case.
Notes on function volatility
You can declare your function IMMUTABLE even if json_array_elements() isn't wasn't.
Most JSON functions used to be only STABLE, not IMMUTABLE. There was a discussion on the hackers list to change that. Most are IMMUTABLE now. Check with:
SELECT p.proname, p.provolatile
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'pg_catalog'
AND p.proname ~~* '%json%';
Functional indexes only work with IMMUTABLE functions.

Related

How to speed up SELECT for a JSONB column in Postgres when the first level key is unknown?

I have a table with a JSONB column called "attributes" that contains a JSON object with various keys and values. The keys are dynamic and I do not know their names until the time of the query. I have over 20 million rows in this table and the queries on this column are currently very slow. Is there a way to improve the search performance in this scenario without using dynamically generated indexes?
How my data stored:
attributes
JSONB
JSON looks like this:
{
dynamicName1: 'value',
dynamicName2: 'value',
dynamicName3: 'value',
...
}
Example of query:
SELECT * FROM table WHERE "attributes" ->> 'dynamicName1' = 'SomeValue'
SELECT * FROM table WHERE "attributes" ->> 'abcdefg' = 'SomeValue'
SELECT * FROM table WHERE "attributes" ->> 'anyPossibleName' = 'SomeValue'
Create table:
CREATE TABLE "table" ("id" SERIAL NOT NULL, "attributes" JSONB)
Explain:
Gather (cost=1000.00..3460271.08 rows=91075 width=1178)
Workers Planned: 2
" -> Parallel Seq Scan on ""table"" (cost=0.00..3450163.58 rows=37948 width=1178)"
" Filter: ((""attributes"" ->> 'Beak'::text) = 'Yellow'::text)"
I have attempted to research the use of indexes to improve search performance on JSONB columns, but have been unable to find any information that specifically addresses my scenario where the keys in the JSON object are dynamic and unknown until the time of the query.
You don't need to specify the keys within the jsonb object to build a useful index on its column.
create index on "table" using gin("attributes" jsonb_path_ops);
and then use ##jsonpath or #>jsonb operators that are supported by GIN. You can omit the jsonb_path_ops operator class if you'll need to use other operators with this index.
select * from "table" where "attributes" ## '$.dynamicName1 == "SomeValue"';
select * from "table" where "attributes" #> '{"dynamicName1":"SomeValue"}'::jsonb;
Online demo where this speeds things up about three orders of magnitude on 400k random records.

Index Creation on Postgresql

I have a issue where i am facing difficulty in creating index.
TableName(My_Table)
ColumnName(Repo_id(INT),Data JSONB)
JSONB structure:
{
"Property_1":'1',
"Property_2":'2',
"Property_3":'3',
"Property_4":'4',
"Property_5":'5'
}
For one query:
select *
from my_table
where repo_id = 1
and Data ->> 'Property_1' = '1'
I added btree index (repo_id,(Data ->> 'Property_1') ), It worked fine for that scenario.
for other scenarios like
select *
from my_table
where repo_id = 2
and Data ->> 'Property_2' = '2'
It is not giving me optimal plan. for that i have to modify previous index as covered index (repo_id,(Data ->> 'Property_1',((Data ->> 'Property_2')) ) and this gave me optimal plan.
I have more than 100 json attributes in column and for each repo_id in where condition ...json attribute filters will be different. i dont think it will be wise to add all those columns as covered index it will increase index size.
Please suggest how can i efficiently create index on dynamic json attribute filter.
Use a GIN index and change your WHERE clause:
create index on the_table using gin (data);
Then use the contains operator #> :
select *
from my_table
where data #> '{"Property_2": 2}';
The condition where data ->> 'Property_2' = '2' will not use that index. You have to use one of the supported operator classes for the index to be used.
If the #> operator will support all you ever want to do, using a different operator class makes the index more efficient:
create index on the_table using gin (data jsonb_path_ops);
With that operator class, the operators ? ?& and ?| would not make use of that index.

Query JSONB column for any value where =?

I have a jsonb column which has the unfortunate case of being very unpredictable, in some cases its value may be an array with nested values:
["UserMailer", "applicant_setup_3", ["5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"]]
Sometimes it will be something with key/values like this:
[{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", "job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}]
Is there a way to write a query which just treats the whole column like text and does a like to see if I can find the uuid in the big text blob? I want to find all the records where a particular uuid string is present in the jsonb column.
The query doesn't need to be fast or efficient.
Postgres has search operator ? for jsonb, but that would require you to search the json content recursively.
A possible, although not very efficient method, would to stringify the object and use LIKE to search it:
myjsonb::text LIKE '%"5cbffeb7-8d5e-4b52-a475-3cf320b2cee9"%'
myjsonb::text LIKE '%"' || myuuid || '"%'
Demo on DB Fiddle:
The problem with the jsonb operator ? is that it only considers top-level keys (including array elements), not values, and no nested objects.
You seem to be looking for values and array elements (not keys) on any level. You can get that with a full text search on top of your json(b) column:
SELECT * FROM tbl
WHERE to_tsvector('simple', jsonb_column)
## tsquery '5cbffeb7-8d5e-4b52-a475-3cf320b2cee9';
db<>fiddle here
to_tsvector() extracts values and array elements on all levels - just what you need.
Requires Postgres 10 or later. json(b)_to_tsvector() in Postgres 11 offers more flexibility.
That's attractive for tables of non-trivial size as it can be supported with a full text index very efficiently:
CREATE INDEX tbl_jsonb_column_fts_gin_idx ON tbl USING GIN (to_tsvector('simple', jsonb_column));
I use the 'simple' text search configuration in the example. You might want a language-specific one, like 'english'. Doesn't matter much while you only look for UUID strings, but stemming for a particular language might make the index a bit smaller ...
Related:
LIKE query on elements of flat jsonb array
Does the phrase search operator <-> work with JSONB documents or only relational tables?
While you are only looking for UUIDs, you might optimize further with a custom (IMMUTABLE) function to extract UUIDs from the JSON document as array (uuid[]) and build a functional GIN index on top of it. (Considerably smaller index, yet.) Then:
SELECT * FROM tbl
WHERE my_uuid_extractor(jsonb_column) #> '{5cbffeb7-8d5e-4b52-a475-3cf320b2cee9}';
Such a function can be expensive, but does not matter much with a functional index that stores and operates on pre-computed values.
You can split the array elements first by using jsonb_array_elements(json), and then filter the casted string from those elements by like operator
select q.elm
from
(
select jsonb_array_elements(js) as elm
from tab
) q
where elm::varchar like '%User%'
elm
----------------------------------------------------------------------------------------------------------------------
"UserMailer"
{"reference_id": "5cbffeb7-8d5e-4b52-a475-3cf320b2cee9", "job_dictionary": ["StatusUpdater", "FollowTwitterUsersJob"]}
Demo

Search a JSON array for an object containing a value matching a pattern

I have a DB with a jsonb column where each row essentially holds an array of name value pairs. Example for a single jsonb value:
[
{"name":"foo", "value":"bar"},
{"name":"biz", "value":"baz"},
{"name":"beep", "value":"boop"}
]
How would I query for rows that contain a partial value? I.e., find rows with the JSON object key value ilike '%ba%'?
I know that I can use SELECT * FROM tbl WHERE jsoncol #> '[{"value":"bar"}]' to find rows where the JSON is that specific value, but how would I query for rows containing a pattern?
There are no built in jsonb operators nor any indexes supporting this kind of filter directly (yet).
I suggest an EXISTS semi-join:
SELECT t.*
FROM tbl t
WHERE EXISTS (
SELECT FROM jsonb_array_elements(t.jsoncol) elem
WHERE elem->>'value' LIKE '%ba%'
);
It avoids redundant evaluations and the final DISTINCT step you would need to get distinct rows with a plain CROSS JOIN.
If this still isn't fast enough, a way more sophisticated specialized solution for the given type of query would be to extract a concatenated string of unique values (with a delimiter that won't interfere with your search patterns) per row in an IMMUTABLE function, build a trigram GIN index on the functional expression and use the same expression in your queries.
Related:
Search for nested values in jsonb array with greater operator
Find rows containing a key in a JSONB array of records
Create Postgres JSONB Index on Array Sub-Object
Aside, if your jsonb values really look like the example, you could trim a lot of noise and just store:
[
{"foo":"bar"},
{"biz":"baz"},
{"beep":"boop"}
]
You can use the function jsonb_array_elements() in a lateral join and use its result value in the WHERE clause:
select distinct t.*
from my_table t
cross join jsonb_array_elements(jsoncol)
where value->>'value' like '%ba%'
Please, read How to query jsonb arrays with IN operator for notes about distinct and performance.

How to filter a value of any key of json in postgres

I have a table users with a jsonb field called data. I have to retrieve all the users that have a value in that data column matching a given string. For example:
user1 = data: {"property_a": "a1", "property_b": "b1"}
user2 = data: {"property_a": "a2", "property_b": "b2"}
I want to retrieve any user that has a value data matching 'b2', in this case that will be 'user2'.
Any idea how to do this in an elegant way? I can retrieve all keys from data of all users and create a query manually but that will be neither fast nor elegant.
In addition, I have to retrieve the key and value matched, but first things first.
There is no easy way. Per documentation:
GIN indexes can be used to efficiently search for keys or key/value
pairs occurring within a large number of jsonb documents (datums)
Bold emphasis mine. There is no index over all values. (Those can have non-compatible data types!) If you do not know the name(s) of all key(s) you have to inspect all JSON values in every row.
If there are just two keys like you demonstrate (or just a few well-kown keys), it's still easy enough:
SELECT *
FROM users
WHERE data->>'property_a' = 'b2' OR
data->>'property_b' = 'b2';
Can be supported with a simple expression index:
CREATE INDEX foo_idx ON users ((data->>'property_a'), (data->>'property_b'))
Or with a GIN index:
SELECT *
FROM users
WHERE data #> '{"property_a": "b2"}' OR
data #> '{"property_b": "b2"}'
CREATE INDEX bar_idx ON users USING gin (data jsonb_path_ops);
If you don't know all key names, things get more complicated ...
You could use jsonb_each() or jsonb_each_text() to unnest all values into a set and then check with an ANY construct:
SELECT *
FROM users
WHERE jsonb '"b2"' = ANY (SELECT (jsonb_each(data)).value);
Or
...
WHERE 'b2' = ANY (SELECT (jsonb_each_text(data)).value);
db<>fiddle here
But there is no index support for the last one. You could instead extract all values into and array and create an expression index on that, and match that expression in queries with array operators ...
Related:
How do I query using fields inside the new PostgreSQL JSON datatype?
Index for finding an element in a JSON array
Can PostgreSQL index array columns?
Try this query.
SELECT * FROM users
WHERE data::text LIKE '%b2%'
Of course it won't work if your key will contain such string too.