Index Creation on PostgreSQL

I have an issue where I am having difficulty creating an index.
Table: My_Table
Columns: Repo_id (INT), Data (JSONB)
JSONB structure:
{
  "Property_1": "1",
  "Property_2": "2",
  "Property_3": "3",
  "Property_4": "4",
  "Property_5": "5"
}
For one query:
select *
from my_table
where repo_id = 1
and Data ->> 'Property_1' = '1'
I added a btree index on (repo_id, (Data ->> 'Property_1')), and it worked fine for that scenario.
For other scenarios, like:
select *
from my_table
where repo_id = 2
and Data ->> 'Property_2' = '2'
It does not give me an optimal plan. For that I had to extend the previous index into a covering index: (repo_id, (Data ->> 'Property_1'), (Data ->> 'Property_2')), and this gave me an optimal plan.
I have more than 100 JSON attributes in the column, and the attribute filters in the WHERE condition differ for each repo_id. I don't think it would be wise to add all of those expressions to a covering index; it would blow up the index size.
Please suggest how I can efficiently index dynamic JSON attribute filters.

Use a GIN index and change your WHERE clause:
create index on my_table using gin (data);
Then use the "contains" operator @> :
select *
from my_table
where data @> '{"Property_2": "2"}';
The condition where data ->> 'Property_2' = '2' will not use that index. You have to use one of the supported operator classes for the index to be used.
If the @> operator supports everything you will ever want to do, using a different operator class makes the index more efficient:
create index on my_table using gin (data jsonb_path_ops);
With that operator class, however, the operators ?, ?& and ?| would not make use of the index.
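To verify that the planner picks the index up, a quick check (a sketch, reusing the question's table and key names; repo_id can still be filtered alongside, either as a plain filter or via a separate btree index):
explain (analyze)
select *
from my_table
where repo_id = 2
and data @> '{"Property_2": "2"}';
The plan should show a bitmap index scan on the GIN index instead of a sequential scan.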

Related

How to speed up SELECT for a JSONB column in Postgres when the first level key is unknown?

I have a table with a JSONB column called "attributes" that contains a JSON object with various keys and values. The keys are dynamic and I do not know their names until the time of the query. I have over 20 million rows in this table and the queries on this column are currently very slow. Is there a way to improve the search performance in this scenario without using dynamically generated indexes?
How my data is stored:
attributes JSONB
JSON looks like this:
{
  "dynamicName1": "value",
  "dynamicName2": "value",
  "dynamicName3": "value",
  ...
}
Example queries:
SELECT * FROM table WHERE "attributes" ->> 'dynamicName1' = 'SomeValue'
SELECT * FROM table WHERE "attributes" ->> 'abcdefg' = 'SomeValue'
SELECT * FROM table WHERE "attributes" ->> 'anyPossibleName' = 'SomeValue'
Create table:
CREATE TABLE "table" ("id" SERIAL NOT NULL, "attributes" JSONB)
Explain:
Gather  (cost=1000.00..3460271.08 rows=91075 width=1178)
  Workers Planned: 2
  ->  Parallel Seq Scan on "table"  (cost=0.00..3450163.58 rows=37948 width=1178)
        Filter: (("attributes" ->> 'Beak'::text) = 'Yellow'::text)
I have attempted to research the use of indexes to improve search performance on JSONB columns, but have been unable to find any information that specifically addresses my scenario where the keys in the JSON object are dynamic and unknown until the time of the query.
You don't need to specify the keys within the jsonb object to build a useful index on its column.
create index on "table" using gin("attributes" jsonb_path_ops);
and then use the @@ (jsonpath match) or @> (jsonb containment) operators that are supported by GIN. You can omit the jsonb_path_ops operator class if you'll need to use other operators with this index.
select * from "table" where "attributes" @@ '$.dynamicName1 == "SomeValue"';
select * from "table" where "attributes" @> '{"dynamicName1":"SomeValue"}'::jsonb;
An online demo showed this speeding things up by about three orders of magnitude on 400k random records.
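A minimal sketch of such a setup (the table definition follows the question; the generated key names and values are illustrative):
CREATE TABLE "table" ("id" SERIAL NOT NULL, "attributes" JSONB);
INSERT INTO "table" ("attributes")
SELECT jsonb_build_object('dynamicName' || (i % 100), md5(random()::text))
FROM generate_series(1, 400000) i;
CREATE INDEX ON "table" USING gin ("attributes" jsonb_path_ops);
EXPLAIN ANALYZE
SELECT * FROM "table" WHERE "attributes" @> '{"dynamicName1": "SomeValue"}';
The parallel seq scan from the question should be replaced by a bitmap scan on the GIN index.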

How to make a multi column JSON index in Postgres?

I have a query that filters by a normal text column and by a value in a JSON column. What I'm wondering is how to create the best index for that query.
This is the query:
explain select * from "tags" where "slug"->>'en' = 'slugName'
and "type" in ('someType1','someType1');
-------
Seq Scan on tags  (cost=0.00..1.47 rows=1 width=888)
  Filter: (((type)::text = ANY ('{dsfdsf,fgsdf}'::text[])) AND ((slug ->> 'en'::text) = 'dsfdsf'::text))
The "slug" column is type JSON and the "type" column is type varchar(191). I'm familiar that I can add an index to the JSON column like:
CREATE INDEX tag_slug_index ON tags USING btree ((slug ->> 'en'));
But I'm wondering, how do I create a multi-column index on the slug name combined with the type column?
There is nothing special about it, you just do it the normal way, by separating them with a comma:
CREATE INDEX ON tags USING btree ("type", (slug ->> 'en'));
The expression does still need to be in an extra set of parentheses, same as if it were the only 'column' in the index.
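A quick sanity check (a sketch; the expression in the WHERE clause has to match the indexed expression exactly, including the ->> operator and the key):
EXPLAIN SELECT * FROM tags
WHERE "type" IN ('someType1', 'someType2')
AND slug ->> 'en' = 'slugName';
With enough rows in the table, the seq scan turns into a scan on the new two-column index.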

What's the proper index for querying structures in arrays in Postgres jsonb?

I'm experimenting with keeping values like the following in a jsonb field in Postgres 9.4:
[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]
I'm executing queries like:
SELECT * FROM locations
WHERE EXISTS (
SELECT 1 FROM jsonb_array_elements(events) AS e
WHERE (
e->>'event_slug' = 'test_1' AND
(
e->>'start_time' >= '2014-10-30 14:04:06 -0400' OR
e->>'end_time' >= '2014-10-30 14:04:06 -0400'
)
)
)
How would I create an index on that data for queries like the above to utilize? Does this sound like a reasonable design for a few million rows that each contain ~10 events in that column?
Worth noting that it seems I'm still getting sequential scans with:
CREATE INDEX events_gin_idx ON some_table USING GIN (events);
which I'm guessing is because the first thing I'm doing in the query is converting data to json array elements.
First of all, you cannot access JSON array values like that. For a given json value:
[{"event_slug":"test_1","start_time":"2014-10-08","end_time":"2014-10-12"},
{"event_slug":"test_2","start_time":"2013-06-24","end_time":"2013-07-02"},
{"event_slug":"test_3","start_time":"2014-03-26","end_time":"2014-03-30"}]
A valid test against the first array element would be:
WHERE e->0->>'event_slug' = 'test_1'
But you probably don't want to limit your search to the first element of the array. With the jsonb data type in Postgres 9.4 you have additional operators and index support. To index elements of an array you need a GIN index.
The built-in operator classes for GIN indexes do not support the "greater than" or "less than" operators >, >=, <, <=. This is true for jsonb as well, where you can choose between two operator classes. The manual:
Name            Indexed Data Type   Indexable Operators
...
jsonb_ops       jsonb               ? ?& ?| @>
jsonb_path_ops  jsonb               @>
(jsonb_ops being the default.) You can cover the equality test, but neither of those operators covers your requirement for >= comparison. You would need a btree index.
Basic solution
To support the equality check with an index:
CREATE INDEX locations_events_gin_idx ON locations
USING gin (events jsonb_path_ops);
SELECT * FROM locations WHERE events @> '[{"event_slug":"test_1"}]';
This might be good enough if the filter is selective enough.
Assuming end_time >= start_time, so we don't need two checks. Checking only end_time is cheaper and equivalent:
SELECT l.*
FROM locations l
, jsonb_array_elements(l.events) e
WHERE l.events @> '[{"event_slug":"test_1"}]'
AND (e->>'end_time')::timestamp >= '2014-10-30 14:04:06 -0400'::timestamptz;
Utilizing an implicit JOIN LATERAL. Details (last chapter):
PostgreSQL unnest() with element number
Careful with the different data types! What you have in the JSON value looks like timestamp [without time zone], while your predicates use timestamp with time zone literals. The timestamp value is interpreted according to the current time zone setting, while the given timestamptz literals must be cast to timestamptz explicitly, or the time zone would be ignored. The query above should work as desired. Detailed explanation:
Ignoring time zones altogether in Rails and PostgreSQL
More explanation for jsonb_array_elements():
PostgreSQL joining using JSONB
Advanced solution
If the above is not good enough, I would consider a MATERIALIZED VIEW that stores relevant attributes in normalized form. This allows plain btree indexes.
The code assumes that your JSON values have a consistent format as displayed in the question.
Setup:
CREATE TYPE event_type AS (
  event_slug text
, start_time timestamp
, end_time timestamp
);
CREATE MATERIALIZED VIEW loc_event AS
SELECT l.location_id, e.event_slug, e.end_time -- start_time not needed
FROM locations l, jsonb_populate_recordset(null::event_type, l.events) e;
Related answer for jsonb_populate_recordset():
How to convert PostgreSQL 9.4's jsonb type to float
CREATE INDEX loc_event_idx ON loc_event (event_slug, end_time, location_id);
Also including location_id to allow index-only scans. (See manual page and Postgres Wiki.)
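One caveat: a materialized view does not stay current by itself. After changes to locations it has to be refreshed (assuming a periodic refresh fits the workload):
REFRESH MATERIALIZED VIEW loc_event;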
Query:
SELECT *
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz;
Or, if you need full rows from the underlying locations table:
SELECT l.*
FROM (
SELECT DISTINCT location_id
FROM loc_event
WHERE event_slug = 'test_1'
AND end_time >= '2014-10-30 14:04:06 -0400'::timestamptz
) le
JOIN locations l USING (location_id);
An expression index has to be created on the table itself, not on json_array_elements(); for example (covering only the first array element):
CREATE INDEX json_array_elements_index ON locations ((events->0->>'event_slug'));
Should get you started in the right direction.

multi-column index for string match + string similarity with pg_trgm?

Given this table:
foos
integer id
string name
string type
And a query like this:
select * from foos where name ilike '%bar%'
I can make a pg_trgm index like this to make lookups faster:
CREATE INDEX ON foos USING gin (name gin_trgm_ops)
(right?)
my question: what about a query like this:
select * from foos where name ilike '%bar%' AND type = 'baz'
Can I possibly make an index that will help the lookup of both columns?
(I know that trigram isn't strictly fulltext but I'm tagging this question as such anyway)
You can use a multicolumn index combining different types.
First, add the two extensions required in your case:
CREATE EXTENSION pg_trgm;
CREATE EXTENSION btree_gist;
pg_trgm allows you to use trigram indexes and btree_gist allows you to combine gist and b-tree indexes, which is what you want!
For a query like:
SELECT * FROM foos WHERE type = 'baz' AND name ILIKE '%bar%';
You can now create an index like:
CREATE INDEX ON foos USING gist (type, name gist_trgm_ops);
A multicolumn GiST index can be used with conditions on any subset of its columns, but put the column you filter most often first: the condition on the leading column determines how much of the index has to be scanned.
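To confirm both predicates are served by the combined index (a sketch; with enough rows, EXPLAIN should show a bitmap index scan on the GiST index):
EXPLAIN ANALYZE
SELECT * FROM foos WHERE type = 'baz' AND name ILIKE '%bar%';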
Use a composite index:
CREATE INDEX ON foos(name, type)
However, you might want:
CREATE INDEX ON foos(lower(name), type)
I don't see why a full text index is needed for your queries.

Index for finding an element in a JSON array

I have a table that looks like this:
CREATE TABLE tracks (id SERIAL, artists JSON);
INSERT INTO tracks (id, artists)
VALUES (1, '[{"name": "blink-182"}]');
INSERT INTO tracks (id, artists)
VALUES (2, '[{"name": "The Dirty Heads"}, {"name": "Louis Richards"}]');
There's several other columns that aren't relevant to this question. There's a reason to have them stored as JSON.
What I'm trying to do is lookup a track that has a specific artist name (exact match).
I'm using this query:
SELECT * FROM tracks
WHERE 'ARTIST NAME' IN
(SELECT value->>'name' FROM json_array_elements(artists))
for example
SELECT * FROM tracks
WHERE 'The Dirty Heads' IN
(SELECT value->>'name' FROM json_array_elements(artists))
However, this does a full table scan, and it isn't very fast. I tried creating a GIN index using a function names_as_array(artists) and querying with 'ARTIST NAME' = ANY (names_as_array(artists)); however, the index isn't used and the query is actually significantly slower.
jsonb in Postgres 9.4+
The binary JSON data type jsonb largely improves index options. You can now have a GIN index on a jsonb array directly:
CREATE TABLE tracks (id serial, artists jsonb); -- !
CREATE INDEX tracks_artists_gin_idx ON tracks USING gin (artists);
No need for a function to convert the array. This would support a query:
SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';
@> being the jsonb "contains" operator, which can use the GIN index. (Not for json, only jsonb!)
Or you use the more specialized, non-default GIN operator class jsonb_path_ops for the index:
CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (artists jsonb_path_ops); -- !
Same query.
Currently jsonb_path_ops only supports the @> operator. But it's typically much smaller and faster. There are more index options, details in the manual.
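You can check the size difference yourself after building either variant (a sketch; the index name is the one created above):
SELECT pg_size_pretty(pg_relation_size('tracks_artists_gin_idx'));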
If the column artists only holds names as displayed in the example, it would be more efficient to store just the values as JSON text primitives, with the (otherwise redundant) key becoming the column name.
Note the difference between JSON objects and primitive types:
Using indexes in json array in PostgreSQL
CREATE TABLE tracks (id serial, artistnames jsonb);
INSERT INTO tracks VALUES (2, '["The Dirty Heads", "Louis Richards"]');
CREATE INDEX tracks_artistnames_gin_idx ON tracks USING gin (artistnames);
Query:
SELECT * FROM tracks WHERE artistnames ? 'The Dirty Heads';
? does not work for object values, just keys and array elements.
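A few probes illustrate the difference:
SELECT '["The Dirty Heads"]'::jsonb ? 'The Dirty Heads';         -- true: array element
SELECT '{"name": "The Dirty Heads"}'::jsonb ? 'name';            -- true: object key
SELECT '{"name": "The Dirty Heads"}'::jsonb ? 'The Dirty Heads'; -- false: object value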
Or:
CREATE INDEX tracks_artistnames_gin_idx ON tracks
USING gin (artistnames jsonb_path_ops);
Query:
SELECT * FROM tracks WHERE artistnames @> '"The Dirty Heads"'::jsonb;
More efficient if names are highly duplicative.
json in Postgres 9.3+
This should work with an IMMUTABLE function:
CREATE OR REPLACE FUNCTION json2arr(_j json, _key text)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY(SELECT elem->>_key FROM json_array_elements(_j) elem)';
Create this functional index:
CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (json2arr(artists, 'name'));
And use a query like this. The expression in the WHERE clause has to match the one in the index:
SELECT * FROM tracks
WHERE '{"The Dirty Heads"}'::text[] <# (json2arr(artists, 'name'));
Updated with feedback in comments. We need to use array operators to support the GIN index.
The "is contained by" operator <# in this case.
Notes on function volatility
You can declare your function IMMUTABLE even though json_array_elements() wasn't.
Most JSON functions used to be only STABLE, not IMMUTABLE. There was a discussion on the hackers list to change that, and most are IMMUTABLE now. Check with:
SELECT p.proname, p.provolatile
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'pg_catalog'
AND p.proname ~~* '%json%';
Functional indexes only work with IMMUTABLE functions.