What's the proper index for querying structures in arrays in Postgres jsonb?

I'm experimenting with keeping values like the following in a jsonb field in Postgres 9.4:
[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]
I'm executing queries like:
SELECT * FROM locations
WHERE EXISTS (
   SELECT 1 FROM jsonb_array_elements(events) AS e
   WHERE (
      e->>'event_slug' = 'test_1' AND
      (
         e->>'start_time' >= '2014-10-30 14:04:06 -0400' OR
         e->>'end_time'   >= '2014-10-30 14:04:06 -0400'
      )
   )
);
How would I create an index on that data so that queries like the above can utilize it? Does this sound like a reasonable design for a few million rows that each contain ~10 events in that column?
Worth noting that it seems I'm still getting sequential scans with:
CREATE INDEX events_gin_idx ON some_table USING GIN (events);
which I'm guessing is because the first thing the query does is convert the data to JSON array elements.

First of all, you cannot access JSON array values like that. For a given json value:
[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]
A valid test against the first array element would be:
WHERE e->0->>'event_slug' = 'test_1'
But you probably don't want to limit your search to the first element of the array. With the jsonb data type in Postgres 9.4 you have additional operators and index support. To index elements of an array you need a GIN index.
The built-in operator classes for GIN indexes do not support "greater than" or "less than" operators > >= < <=. This is true for jsonb as well, where you can choose between two operator classes. The manual:
Name            Indexed Data Type   Indexable Operators
...
jsonb_ops       jsonb               ? ?& ?| @>
jsonb_path_ops  jsonb               @>
(jsonb_ops being the default.) The @> operator covers the equality test, but neither operator class covers your requirement for >= comparisons. You would need a btree index for that.
Basic solution
To support the equality check with an index:
CREATE INDEX locations_events_gin_idx ON locations
USING gin (events jsonb_path_ops);
SELECT * FROM locations WHERE events @> '[{"event_slug":"test_1"}]';
This might be good enough if the filter is selective enough.
Assuming end_time >= start_time, we don't need two checks. Checking only end_time is cheaper and equivalent:
SELECT l.*
FROM   locations l
     , jsonb_array_elements(l.events) e
WHERE  l.events @> '[{"event_slug":"test_1"}]'
AND   (e->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;
Utilizing an implicit JOIN LATERAL. Details (last chapter):
PostgreSQL unnest() with element number
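Spelled out with explicit LATERAL syntax, the same query reads:
SELECT l.*
FROM   locations l
CROSS  JOIN LATERAL jsonb_array_elements(l.events) e
WHERE  l.events @> '[{"event_slug":"test_1"}]'
AND   (e->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;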
Careful with the different data types! What you have in the JSON value looks like timestamp [without time zone], while your predicates use timestamp with time zone literals. A timestamp value is interpreted according to the current time zone setting, while the given timestamptz literals must be cast to timestamptz explicitly, or the time zone part is ignored! The above query should work as desired. Detailed explanation:
Ignoring time zones altogether in Rails and PostgreSQL
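A quick demonstration of the difference (the session time zone here is just an example):
SET timezone = 'America/New_York';
SELECT '2014-10-30 14:04:06 -0400'::timestamp;    -- offset discarded: 2014-10-30 14:04:06
SELECT '2014-10-30 14:04:06 -0400'::timestamptz;  -- offset honored:   2014-10-30 14:04:06-04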
More explanation for jsonb_array_elements():
PostgreSQL joining using JSONB
Advanced solution
If the above is not good enough, I would consider a MATERIALIZED VIEW that stores relevant attributes in normalized form. This allows plain btree indexes.
The code assumes that your JSON values have a consistent format as displayed in the question.
Setup:
CREATE TYPE event_type AS (
  event_slug text
, start_time timestamp
, end_time   timestamp
);
CREATE MATERIALIZED VIEW loc_event AS
SELECT l.location_id, e.event_slug, e.end_time -- start_time not needed
FROM locations l, jsonb_populate_recordset(null::event_type, l.events) e;
Related answer for jsonb_populate_recordset():
How to convert PostgreSQL 9.4's jsonb type to float
CREATE INDEX loc_event_idx ON loc_event (event_slug, end_time, location_id);
Also including location_id to allow index-only scans. (See manual page and Postgres Wiki.)
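A materialized view is a snapshot, so it has to be refreshed after changes to locations; a sketch, assuming a periodic refresh fits your workload:
REFRESH MATERIALIZED VIEW loc_event;
-- Or, to keep the view readable during the refresh (requires a UNIQUE index
-- covering all rows, which only works if this combination is in fact unique):
-- CREATE UNIQUE INDEX ON loc_event (location_id, event_slug, end_time);
-- REFRESH MATERIALIZED VIEW CONCURRENTLY loc_event;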
Query:
SELECT *
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz;
Or, if you need full rows from the underlying locations table:
SELECT l.*
FROM  (
   SELECT DISTINCT location_id
   FROM   loc_event
   WHERE  event_slug = 'test_1'
   AND    end_time >= '2014-10-30 14:04:06 -0400'::timestamptz
   ) le
JOIN  locations l USING (location_id);

An expression index has to be created on the table itself, not on the function json_array_elements(). Something like this (covering only the first array element):
CREATE INDEX locations_event_slug_idx ON locations ((events->0->>'event_slug'));
Should get you started in the right direction.

Related

How to speed up SELECT for a JSONB column in Postgres when the first level key is unknown?

I have a table with a JSONB column called "attributes" that contains a JSON object with various keys and values. The keys are dynamic and I do not know their names until the time of the query. I have over 20 million rows in this table and the queries on this column are currently very slow. Is there a way to improve the search performance in this scenario without using dynamically generated indexes?
How my data is stored: a single JSONB column named attributes. The JSON looks like this:
{
  "dynamicName1": "value",
  "dynamicName2": "value",
  "dynamicName3": "value",
  ...
}
Example of query:
SELECT * FROM table WHERE "attributes" ->> 'dynamicName1' = 'SomeValue'
SELECT * FROM table WHERE "attributes" ->> 'abcdefg' = 'SomeValue'
SELECT * FROM table WHERE "attributes" ->> 'anyPossibleName' = 'SomeValue'
Create table:
CREATE TABLE "table" ("id" SERIAL NOT NULL, "attributes" JSONB)
Explain:
Gather  (cost=1000.00..3460271.08 rows=91075 width=1178)
  Workers Planned: 2
  ->  Parallel Seq Scan on "table"  (cost=0.00..3450163.58 rows=37948 width=1178)
        Filter: (("attributes" ->> 'Beak'::text) = 'Yellow'::text)
I have attempted to research the use of indexes to improve search performance on JSONB columns, but have been unable to find any information that specifically addresses my scenario where the keys in the JSON object are dynamic and unknown until the time of the query.
You don't need to specify the keys within the jsonb object to build a useful index on its column.
create index on "table" using gin("attributes" jsonb_path_ops);
and then use the @@ (jsonpath match) or @> (containment) operators that are supported by GIN. You can omit the jsonb_path_ops operator class if you'll need to use other operators with this index.
select * from "table" where "attributes" @@ '$.dynamicName1 == "SomeValue"';
select * from "table" where "attributes" @> '{"dynamicName1":"SomeValue"}'::jsonb;
Online demo where this speeds things up about three orders of magnitude on 400k random records.
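To confirm the index is picked up, compare the plan before and after, reusing the filter from the question's explain output:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM "table" WHERE "attributes" @> '{"Beak": "Yellow"}';
-- Expect a Bitmap Index Scan on the GIN index instead of the Parallel Seq Scan.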

Postgres greater than or null

I am trying to efficiently use an index for the greater than or null query.
Example of a query would be:
select * from table where date > '2020-01-01' or date is null
What index, if any, do I need so that Postgres can do this efficiently? I tried creating an index with
create index date on table (date asc nulls last)
but it does not seem to work, and the best I am able to get is two bitmap scans (one for greater than and one for null).
If you are able to rewrite your condition, you could replace the null value with a date that is guaranteed to be greater than the comparison value:
where coalesce(date, 'infinity') > date '2020-01-01'
Then create an index on that expression:
create index on the_table ( (coalesce(date, 'infinity')) )
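The query then has to use the exact same expression for the planner to match the index:
select * from the_table
where coalesce(date, 'infinity') > date '2020-01-01';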
See also PostgreSQL docs:
Date/Time Types, 8.5.1.4. Special Values for infinity value
Conditional Expressions, 9.18.2. COALESCE for coalesce function
Does Postgres use the index correctly when you use union all?
select *
from table
where date > '2020-01-01'
union all
select *
from table
where date is null;
The issue might be the inequality combined with the NULL comparison. If this is some sort of "end date", then you might consider using some far out future value such as 9999-01-01 or infinity.
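A sketch of that sentinel idea, assuming the application can treat 'infinity' as "no end date":
update the_table set date = 'infinity' where date is null;
-- The original filter then collapses to a plain range condition
-- that an ordinary btree index on (date) handles well:
select * from the_table where date > '2020-01-01';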

Index Creation on Postgresql

I have an issue where I am facing difficulty in creating an index.
Table name: My_Table
Columns: Repo_id (INT), Data (JSONB)
JSONB structure:
{
  "Property_1": "1",
  "Property_2": "2",
  "Property_3": "3",
  "Property_4": "4",
  "Property_5": "5"
}
For one query:
select *
from my_table
where repo_id = 1
and Data ->> 'Property_1' = '1'
I added a btree index on (repo_id, (Data ->> 'Property_1')), and it worked fine for that scenario.
For other scenarios like:
select *
from my_table
where repo_id = 2
and Data ->> 'Property_2' = '2'
It is not giving me an optimal plan. For that I had to extend the previous index into a covering index (repo_id, (Data ->> 'Property_1'), (Data ->> 'Property_2')), and this gave me an optimal plan.
I have more than 100 JSON attributes in the column, and the attribute filters in the WHERE condition differ for each repo_id. I don't think it would be wise to add all those attributes to a covering index; it would inflate the index size.
Please suggest how I can efficiently create an index for dynamic JSON attribute filters.
Use a GIN index and change your WHERE clause:
create index on the_table using gin (data);
Then use the contains operator @>:
select *
from my_table
where data @> '{"Property_2": "2"}';
The condition where data ->> 'Property_2' = '2' will not use that index. You have to use one of the supported operators for the index to be used.
If the @> operator supports all you ever want to do, using a different operator class makes the index more efficient:
create index on the_table using gin (data jsonb_path_ops);
With that operator class, the operators ? ?& and ?| would not make use of that index.
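For reference, those are the key-existence operators, which only the default jsonb_ops class can support, e.g.:
select * from my_table where data ? 'Property_2';                        -- key exists
select * from my_table where data ?| array['Property_1', 'Property_2']; -- any of the keys exist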

Search a JSON array for an object containing a value matching a pattern

I have a DB with a jsonb column where each row essentially holds an array of name value pairs. Example for a single jsonb value:
[
{"name":"foo", "value":"bar"},
{"name":"biz", "value":"baz"},
{"name":"beep", "value":"boop"}
]
How would I query for rows that contain a partial value, i.e. rows where any object's "value" key matches ilike '%ba%'?
I know that I can use SELECT * FROM tbl WHERE jsoncol @> '[{"value":"bar"}]' to find rows where the JSON array contains that specific value, but how would I query for rows containing a pattern?
There are no built-in jsonb operators nor any indexes supporting this kind of filter directly (yet).
I suggest an EXISTS semi-join:
SELECT t.*
FROM   tbl t
WHERE  EXISTS (
   SELECT FROM jsonb_array_elements(t.jsoncol) elem
   WHERE  elem->>'value' LIKE '%ba%'
   );
It avoids redundant evaluations and the final DISTINCT step you would need to get distinct rows with a plain CROSS JOIN.
If this still isn't fast enough, a way more sophisticated specialized solution for the given type of query would be to extract a concatenated string of unique values (with a delimiter that won't interfere with your search patterns) per row in an IMMUTABLE function, build a trigram GIN index on the functional expression and use the same expression in your queries.
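A minimal sketch of that trigram idea, assuming the array shape from the question; the function and index names are illustrative, and the space delimiter assumes your patterns contain no spaces:
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE FUNCTION jsonb_values_text(jsonb)
  RETURNS text
  LANGUAGE sql IMMUTABLE AS
$$SELECT string_agg(elem->>'value', ' ') FROM jsonb_array_elements($1) elem$$;

CREATE INDEX tbl_values_trgm_idx ON tbl USING gin (jsonb_values_text(jsoncol) gin_trgm_ops);

-- The query must repeat the indexed expression:
SELECT * FROM tbl WHERE jsonb_values_text(jsoncol) LIKE '%ba%';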
Related:
Search for nested values in jsonb array with greater operator
Find rows containing a key in a JSONB array of records
Create Postgres JSONB Index on Array Sub-Object
Aside, if your jsonb values really look like the example, you could trim a lot of noise and just store:
[
{"foo":"bar"},
{"biz":"baz"},
{"beep":"boop"}
]
You can use the function jsonb_array_elements() in a lateral join and use its result value in the WHERE clause:
select distinct t.*
from my_table t
cross join jsonb_array_elements(jsoncol)
where value->>'value' like '%ba%'
Please read How to query jsonb arrays with IN operator for notes about distinct and performance.

How to filter a value of any key of json in postgres

I have a table users with a jsonb field called data. I have to retrieve all the users that have a value in that data column matching a given string. For example:
user1 = data: {"property_a": "a1", "property_b": "b1"}
user2 = data: {"property_a": "a2", "property_b": "b2"}
I want to retrieve any user that has a value in the data column matching 'b2'; in this case that will be 'user2'.
Any idea how to do this in an elegant way? I can retrieve all keys from data of all users and create a query manually but that will be neither fast nor elegant.
In addition, I have to retrieve the key and value matched, but first things first.
There is no easy way. Per documentation:
GIN indexes can be used to efficiently search for keys or key/value
pairs occurring within a large number of jsonb documents (datums)
There is no index over all values. (Those can have non-compatible data types!) If you do not know the name(s) of all key(s) you have to inspect all JSON values in every row.
If there are just two keys like you demonstrate (or just a few well-known keys), it's still easy enough:
SELECT *
FROM users
WHERE data->>'property_a' = 'b2' OR
data->>'property_b' = 'b2';
Can be supported with a simple expression index:
CREATE INDEX foo_idx ON users ((data->>'property_a'), (data->>'property_b'));
Or with a GIN index:
SELECT *
FROM users
WHERE data @> '{"property_a": "b2"}' OR
      data @> '{"property_b": "b2"}';
CREATE INDEX bar_idx ON users USING gin (data jsonb_path_ops);
If you don't know all key names, things get more complicated ...
You could use jsonb_each() or jsonb_each_text() to unnest all values into a set and then check with an ANY construct:
SELECT *
FROM users
WHERE jsonb '"b2"' = ANY (SELECT (jsonb_each(data)).value);
Or
...
WHERE 'b2' = ANY (SELECT (jsonb_each_text(data)).value);
But there is no index support for the last one. You could instead extract all values into and array and create an expression index on that, and match that expression in queries with array operators ...
Related:
How do I query using fields inside the new PostgreSQL JSON datatype?
Index for finding an element in a JSON array
Can PostgreSQL index array columns?
Try this query.
SELECT * FROM users
WHERE data::text LIKE '%b2%'
Of course, it can give false positives, e.g. if a key (rather than a value) contains such a string too.