multi-column index for string match + string similarity with pg_trgm?

Given this table:
foos
integer id
string name
string type
And a query like this:
select * from foos where name ilike '%bar%'
I can make a pg_trgm index like this to make lookups faster:
CREATE INDEX ON foos USING gin (name gin_trgm_ops)
(right?)
my question: what about a query like this:
select * from foos where name ilike '%bar%' AND type = 'baz'
Can I possibly make an index that will help the lookup of both columns?
(I know that trigram isn't strictly fulltext but I'm tagging this question as such anyway)

You can use a multicolumn index combining different types.
First, add the two extensions required in your case:
CREATE EXTENSION pg_trgm;
CREATE EXTENSION btree_gist;
pg_trgm gives you the trigram operator classes, and btree_gist lets you include plain B-tree-indexable columns (like type) in a GiST index, which is what you want!
For a query like:
SELECT * FROM foos WHERE type = 'baz' AND name ilike '%bar%';
You can now create an index like:
CREATE INDEX ON foos USING gist (type, name gist_trgm_ops);
Note that the order of conditions in the WHERE clause doesn't matter to the planner; what counts is the order of columns in the index declaration, and putting the equality column (type) first works well here.
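To check that the planner actually uses it, a quick sketch (assuming the foos table from the question, populated with enough rows that an index scan wins):

EXPLAIN (ANALYZE)
SELECT * FROM foos WHERE type = 'baz' AND name ilike '%bar%';
-- Expected shape: a Bitmap Index Scan on the GiST index, then a
-- Bitmap Heap Scan that rechecks the ilike condition.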

Use a composite index:
CREATE INDEX ON foos (name, type);
However, since ilike is case-insensitive, you might want:
CREATE INDEX ON foos (lower(name), type);
I don't see why a full text index is needed for your queries.
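One caveat worth spelling out: a B-tree index (plain or on lower(name)) only helps exact or left-anchored matches, not an infix pattern like '%bar%'. And for the expression index to be used at all, the query must repeat the same expression. A minimal sketch:

SELECT * FROM foos WHERE lower(name) = 'bar' AND type = 'baz';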

Related

How to speed up SELECT for a JSONB column in Postgres when the first level key is unknown?

I have a table with a JSONB column called "attributes" that contains a JSON object with various keys and values. The keys are dynamic and I do not know their names until the time of the query. I have over 20 million rows in this table and the queries on this column are currently very slow. Is there a way to improve the search performance in this scenario without using dynamically generated indexes?
How my data is stored:
attributes
JSONB
The JSON looks like this:
{
  "dynamicName1": "value",
  "dynamicName2": "value",
  "dynamicName3": "value",
  ...
}
Example of query:
SELECT * FROM table WHERE "attributes" ->> 'dynamicName1' = 'SomeValue'
SELECT * FROM table WHERE "attributes" ->> 'abcdefg' = 'SomeValue'
SELECT * FROM table WHERE "attributes" ->> 'anyPossibleName' = 'SomeValue'
Create table:
CREATE TABLE "table" ("id" SERIAL NOT NULL, "attributes" JSONB)
Explain:
Gather  (cost=1000.00..3460271.08 rows=91075 width=1178)
  Workers Planned: 2
  ->  Parallel Seq Scan on "table"  (cost=0.00..3450163.58 rows=37948 width=1178)
        Filter: (("attributes" ->> 'Beak'::text) = 'Yellow'::text)
I have attempted to research the use of indexes to improve search performance on JSONB columns, but have been unable to find any information that specifically addresses my scenario where the keys in the JSON object are dynamic and unknown until the time of the query.
You don't need to specify the keys within the jsonb object to build a useful index on its column.
create index on "table" using gin("attributes" jsonb_path_ops);
and then use the @@ (jsonpath match) or @> (containment) operators, which are supported by GIN. You can omit the jsonb_path_ops operator class if you'll need to use other operators with this index.
select * from "table" where "attributes" @@ '$.dynamicName1 == "SomeValue"';
select * from "table" where "attributes" @> '{"dynamicName1":"SomeValue"}'::jsonb;
Online demo where this speeds things up about three orders of magnitude on 400k random records.
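To verify, re-run the question's EXPLAIN once the index exists (a sketch; costs and row estimates will vary):

EXPLAIN SELECT * FROM "table" WHERE "attributes" @> '{"Beak": "Yellow"}'::jsonb;
-- Expected: a Bitmap Index Scan on the GIN index replaces the
-- Parallel Seq Scan from the plan shown in the question.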

Index Creation on PostgreSQL

I have an issue where I am facing difficulty creating an index.
Table: my_table
Columns: repo_id (int), data (jsonb)
JSONB structure:
{
  "Property_1": "1",
  "Property_2": "2",
  "Property_3": "3",
  "Property_4": "4",
  "Property_5": "5"
}
For one query:
select *
from my_table
where repo_id = 1
and Data ->> 'Property_1' = '1'
I added a B-tree index on (repo_id, (data ->> 'Property_1')), and it worked fine for that scenario.
For other scenarios like:
select *
from my_table
where repo_id = 2
and Data ->> 'Property_2' = '2'
it does not give me an optimal plan. For that I had to extend the previous index to cover both expressions, (repo_id, (data ->> 'Property_1'), (data ->> 'Property_2')), and this gave an optimal plan.
I have more than 100 JSON attributes in the column, and the attribute filters differ for each repo_id in the WHERE condition. I don't think it would be wise to cover all of those attributes in one index; it would inflate the index size.
Please suggest how I can efficiently index these dynamic JSON attribute filters.
Use a GIN index and change your WHERE clause:
create index on the_table using gin (data);
Then use the "contains" operator @>:
select *
from my_table
where data @> '{"Property_2": "2"}';
The condition where data ->> 'Property_2' = '2' will not use that index; you have to use one of the operators supported by the index's operator class.
If the @> operator covers everything you'll ever want to do, a more specialized operator class makes the index smaller and more efficient:
create index on the_table using gin (data jsonb_path_ops);
With that operator class, however, the operators ?, ?& and ?| would not make use of the index.
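If repo_id is selective, one further option (my addition, not part of the answer above): the btree_gin extension lets the scalar column share a multicolumn GIN index with the jsonb column. A sketch:

CREATE EXTENSION IF NOT EXISTS btree_gin;
CREATE INDEX ON my_table USING gin (repo_id, data jsonb_path_ops);

-- Both conditions can then be served by the one index:
SELECT *
FROM my_table
WHERE repo_id = 2
  AND data @> '{"Property_2": "2"}';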

Index on part of a column in SQLite

Is doing something like the following possible in SQLite:
create INDEX idx on mytable (synopsis(20));
In other words, indexing by something less than the full text field? This is useful on long-text fields where I don't want to index everything into memory (the index itself could take up more space than the entire table).
You seem to be looking for an index on an expression:
Use a CREATE INDEX statement to create a new index on one or more expressions just like you would to create an index on columns. The only difference is that expressions are listed as the elements to be indexed rather than column names.
Consider:
CREATE INDEX idx ON mytable(SUBSTR(synopsis, 1, 20));
Please note that, as explained in the documentation, for this index to be considered by the SQLite query planner, you need to use the exact same expression that was given when creating the index.
So this query would use the index:
SELECT * FROM mytable WHERE SUBSTR(synopsis, 1, 20) = 'a text with 20 chars';
While, typically, this would not:
SELECT * FROM mytable WHERE synopsis LIKE 'a text with 20 chars%';
Note: yes, 'a text with 20 chars' is 20 chars long...
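Putting it together, a self-contained sketch (expression indexes need SQLite 3.9.0 or later, and the EXPLAIN QUERY PLAN wording varies by version):

CREATE TABLE mytable (id INTEGER PRIMARY KEY, synopsis TEXT);
CREATE INDEX idx ON mytable (SUBSTR(synopsis, 1, 20));

-- Should report a search on mytable using index idx:
EXPLAIN QUERY PLAN
SELECT * FROM mytable WHERE SUBSTR(synopsis, 1, 20) = 'a text with 20 chars';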

How to query an array of values within a JSONB field dictionary?

I have a jsonb column that contains a dictionary which has a key that points to an array of string values. I need to query against that array.
The table (called "things") looks like this:
------------------
| my_column |
|----------------|
| { "a": ["X"] } |
------------------
I need to write two queries:
Does the array contain value "X"?
Does the array not contain value "X"?
my_column has a non-null constraint, but it can contain an empty dictionary. The dictionary can also contain other key/value pairs.
The first query was easy:
SELECT * FROM things
WHERE my_column -> 'a' ? 'X';
The second one is proving to be more challenging. I started there with:
SELECT * FROM things
WHERE NOT my_column -> 'a' ? 'X';
... but that excluded all the records that had dictionaries that didn't include key 'a'. So I modified it like so:
SELECT * FROM things
WHERE my_column -> 'a' IS NULL OR NOT
my_column -> 'a' ? 'X';
This works, but is there a better way? Also, is it possible to index this query, and if so, how?
I'm not sure there's any better way; that honestly looks pretty straightforward to me.
As for indexing, there are a couple things you can do.
First, you can index the jsonb field. Putting a GIN index on that field should help with any use of "exists"-type operators (like ?).
If that isn't the solution you want for whatever reason, Postgres supports functional and partial indexes. A functional index might look like:
CREATE INDEX ON things ((my_column -> 'a'));
(Note: the extra parentheses around the expression are required by Postgres's CREATE INDEX syntax; a single-parenthesis form is rejected.)
A partial index would get even more specific, and could even look like:
CREATE INDEX ON things (my_column)
WHERE my_column -> 'a' IS NULL OR NOT
my_column -> 'a' ? 'X';
Obviously, that won't help for more general queries.
At a guess, indexing the whole column with a GIN index is the right way to go, or at least the right place to start.
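For the first query specifically, a GIN index on just the sub-object keeps the index small while still supporting the ? operator. A minimal sketch building on the functional-index idea above:

CREATE INDEX ON things USING gin ((my_column -> 'a'));

-- Can use the index:
SELECT * FROM things WHERE my_column -> 'a' ? 'X';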

Index for finding an element in a JSON array

I have a table that looks like this:
CREATE TABLE tracks (id SERIAL, artists JSON);
INSERT INTO tracks (id, artists)
VALUES (1, '[{"name": "blink-182"}]');
INSERT INTO tracks (id, artists)
VALUES (2, '[{"name": "The Dirty Heads"}, {"name": "Louis Richards"}]');
There are several other columns that aren't relevant to this question. There's a reason to have them stored as JSON.
What I'm trying to do is lookup a track that has a specific artist name (exact match).
I'm using this query:
SELECT * FROM tracks
WHERE 'ARTIST NAME' IN
(SELECT value->>'name' FROM json_array_elements(artists))
for example
SELECT * FROM tracks
WHERE 'The Dirty Heads' IN
(SELECT value->>'name' FROM json_array_elements(artists))
However, this does a full table scan, and it isn't very fast. I tried creating a GIN index using a function names_as_array(artists) and querying with 'ARTIST NAME' = ANY (names_as_array(artists)); however, the index isn't used and the query is actually significantly slower.
jsonb in Postgres 9.4+
The binary JSON data type jsonb largely improves index options. You can now have a GIN index on a jsonb array directly:
CREATE TABLE tracks (id serial, artists jsonb); -- !
CREATE INDEX tracks_artists_gin_idx ON tracks USING gin (artists);
No need for a function to convert the array. This would support a query:
SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';
@> being the jsonb "contains" operator, which can use the GIN index. (Not for json, only jsonb!)
Or you use the more specialized, non-default GIN operator class jsonb_path_ops for the index:
CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (artists jsonb_path_ops); -- !
Same query.
Currently jsonb_path_ops only supports the @> operator. But the index is typically much smaller and faster. There are more index options; details in the manual.
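A quick way to confirm which index the planner chooses (a sketch; the exact plan depends on table size and statistics):

EXPLAIN ANALYZE
SELECT * FROM tracks WHERE artists @> '[{"name": "The Dirty Heads"}]';
-- Expect a Bitmap Index Scan on tracks_artists_gin_idx once the
-- table is large enough that a sequential scan loses.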
If the column artists only holds names as displayed in the example, it would be more efficient to store just the values as JSON text primitives; the (otherwise redundant) key can become the column name.
Note the difference between JSON objects and primitive types:
Using indexes in json array in PostgreSQL
CREATE TABLE tracks (id serial, artistnames jsonb);
INSERT INTO tracks VALUES (2, '["The Dirty Heads", "Louis Richards"]');
CREATE INDEX tracks_artistnames_gin_idx ON tracks USING gin (artistnames);
Query:
SELECT * FROM tracks WHERE artistnames ? 'The Dirty Heads';
? does not work for object values, just keys and array elements.
Or:
CREATE INDEX tracks_artistnames_gin_idx ON tracks
USING gin (artistnames jsonb_path_ops);
Query:
SELECT * FROM tracks WHERE artistnames @> '"The Dirty Heads"'::jsonb;
More efficient if names are highly duplicative.
json in Postgres 9.3+
This should work with an IMMUTABLE function:
CREATE OR REPLACE FUNCTION json2arr(_j json, _key text)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
'SELECT ARRAY(SELECT elem->>_key FROM json_array_elements(_j) elem)';
Create this functional index:
CREATE INDEX tracks_artists_gin_idx ON tracks
USING gin (json2arr(artists, 'name'));
And use a query like this. The expression in the WHERE clause has to match the one in the index:
SELECT * FROM tracks
WHERE '{"The Dirty Heads"}'::text[] <# (json2arr(artists, 'name'));
Updated with feedback in comments. We need to use array operators to support the GIN index.
The "is contained by" operator <# in this case.
Notes on function volatility
You can declare your function IMMUTABLE even though json_array_elements() wasn't at the time.
Most JSON functions used to be only STABLE, not IMMUTABLE. There was a discussion on the hackers list to change that. Most are IMMUTABLE now. Check with:
SELECT p.proname, p.provolatile
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'pg_catalog'
AND p.proname ~~* '%json%';
Functional indexes only work with IMMUTABLE functions.
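If the function is not IMMUTABLE, index creation fails up front. A sketch of the failure mode (json2arr_stable is a hypothetical STABLE variant of the function above):

CREATE OR REPLACE FUNCTION json2arr_stable(_j json, _key text)
  RETURNS text[] LANGUAGE sql STABLE AS
'SELECT ARRAY(SELECT elem->>_key FROM json_array_elements(_j) elem)';

CREATE INDEX ON tracks USING gin (json2arr_stable(artists, 'name'));
-- ERROR:  functions in index expression must be marked IMMUTABLE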