I have the following query:
SELECT "survey_results".* FROM "survey_results" WHERE (raw #> '{"client":{"token":"test_token"}}');
EXPLAIN ANALYZE returns the following results:
Seq Scan on survey_results (cost=0.00..352.68 rows=2 width=2039) (actual time=132.942..132.943 rows=1 loops=1)
Filter: (raw @> '{"client": {"token": "test_token"}}'::jsonb)
Rows Removed by Filter: 2133
Planning time: 0.157 ms
Execution time: 132.991 ms
(5 rows)
I want to add an index on the client key inside the raw field so that the search will be faster, but I don't know how to do it. When I add an index on the whole raw column like this:
CREATE INDEX test_index on survey_results USING GIN (raw);
then everything works as expected. However, I don't want to index the whole raw column, because I have a lot of records in the database and don't want to increase its size.
Since you are querying JSON objects, as in your example, you can create an index on just the client key, like this:
CREATE INDEX test_client_index ON survey_results USING GIN ((raw->'client'));
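Note that an expression index like this is only used when the query references the same expression; a minimal sketch of a matching query (assuming the index above):
SELECT "survey_results".* FROM "survey_results"
WHERE raw->'client' @> '{"token": "test_token"}';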
But since you are using the @> operator in your query, it might make sense in your case to create an index that supports only that operator, like this:
CREATE INDEX test_client_index ON survey_results USING GIN (raw jsonb_path_ops);
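With this index in place, your original containment query should be able to use it as-is; you can verify with:
EXPLAIN ANALYZE
SELECT "survey_results".* FROM "survey_results"
WHERE raw @> '{"client": {"token": "test_token"}}';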
See the PostgreSQL documentation on jsonb indexing for more details.
I have the following SQL statement to filter data with a regex search:
select * from others.table
where vintage ~* '(17|18|19|20)[0-9]{2,}'
Upon some research, I found that I need to create a GIN/GiST index for better performance:
create index idx_vintage_gist on others.table using gist (vintage gist_trgm_ops);
create index idx_vintage_gin on others.table using gin (vintage gin_trgm_ops);
create index idx_vintage_varchar on others.table using btree (vintage varchar_pattern_ops);
Looking at the explain plan, it is not using any of the indexes but a sequential scan:
Seq Scan on table t (cost=0.00..45412.25 rows=1070800 width=91) (actual time=0.038..8518.830 rows=1075980 loops=1)
Filter: (vintage ~* '(17|18|19|20)[0-9]{2,}'::text)
Rows Removed by Filter: 25400
Planning Time: 0.481 ms
Execution Time: 8767.998 ms
There are total 1101380 rows in the table.
My question is why is it not using any index for the regex search?
(Answer was in comments; posting as community wiki.)
From the execution plan, 1070800 rows were expected to be returned, which is 1070800/1101380 ≈ 97.2% of the table. With so much of the table being in the results, using an index wouldn't be advantageous, so a sequential scan is performed.
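If you want to double-check that the indexes are usable at all, one common trick (in a test session only, never in production) is to discourage sequential scans and re-run the plan; a sketch:
SET enable_seqscan = off;  -- testing only; makes the planner avoid seq scans in this session
EXPLAIN ANALYZE
select * from others.table
where vintage ~* '(17|18|19|20)[0-9]{2,}';
RESET enable_seqscan;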
Consider a table defined as follows:
CREATE TABLE test (
id int4 NOT NULL,
tag_counts jsonb[] NOT NULL DEFAULT ARRAY[]::jsonb[]
);
INSERT INTO test(id, tag_counts) VALUES (1, ARRAY['{"type":1, "count":4}','{"type":2, "count":10}']::jsonb[]);
How can I create an index on json key type and how can I query on it?
Edit: Previously, there were no indexes on json keys and select queries used an unnest operation as shown below:
select * from (SELECT unnest(tag_counts) as tc
FROM public.test) as t
where tc->'type' = '2';
The problem is, if the table has a large number of rows, the above query will not only do a full table scan but also filter through each jsonb array.
There is a way to index this, though I am not sure how fast it will be.
If that were a "regular" jsonb column, you could use a condition like where tag_counts @> '[{"type": 2}]', which can use a GIN index on the column.
You can use that operator if you convert the array to a "plain" jsonb value:
select *
from test
where to_jsonb(tag_counts) @> '[{"type": 2}]'
Unfortunately, to_jsonb() is not marked as immutable (I guess because of potential timestamp conversion in there) which is a requirement if you want to use an expression in an index.
But for your data, this is indeed immutable, so we can create a little wrapper function:
create function as_jsonb(p_input jsonb[])
returns jsonb
as
$$
select to_jsonb(p_input);
$$
language sql
immutable;
And with that function we can create an index:
create index on test using gin ( as_jsonb(tag_counts) jsonb_path_ops);
You will need to use that function in your query:
select *
from test
where as_jsonb(tag_counts) @> '[{"type": 2}]'
On a table with a million rows, I get the following execution plan:
Bitmap Heap Scan on stuff.test (cost=1102.62..67028.01 rows=118531 width=252) (actual time=15.145..684.062 rows=147293 loops=1)
Output: id, tag_counts
Recheck Cond: (as_jsonb(test.tag_counts) @> '[{"type": 2}]'::jsonb)
Heap Blocks: exact=25455
Buffers: shared hit=25486
-> Bitmap Index Scan on ix_test (cost=0.00..1072.99 rows=118531 width=0) (actual time=12.347..12.356 rows=147293 loops=1)
Index Cond: (as_jsonb(test.tag_counts) @> '[{"type": 2}]'::jsonb)
Buffers: shared hit=31
Planning:
Buffers: shared hit=23
Planning Time: 0.444 ms
Execution Time: 690.160 ms
I have been using Postgres for the past few days, and I came up with a requirement for which I am trying to find a solution.
I am using PostgreSQL.
I have a table which looks like this:
CREATE TABLE location (id integer, details jsonb);
INSERT INTO location (id,details)
VALUES (1,'[{"Slno" : 1, "value" : "Bangalore"},
{"Slno" : 2, "value" : "Delhi"}]');
INSERT INTO location (id,details)
VALUES (2,'[{"Slno" : 5, "value" : "Chennai"}]');
From the above queries you can see that there is a jsonb column named details which stores an array of JSON objects as its value.
The data is stored like this because of a requirement.
Here I want to create an index on the Slno property present in the jsonb column values.
Can someone help me find a solution for this? It would be very helpful.
Thanks
This is not an index on a single JSON property but on the whole attribute; still, you can use it to perform the search you want:
CREATE INDEX locind ON location USING GIN (details jsonb_path_ops);
If you need the index for operators other than @> (contains), omit the jsonb_path_ops; this will make the index larger and slower, though.
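For reference, the default (jsonb_ops) variant, which additionally supports the key-existence operators ?, ?| and ?&, would look like this (the index name is illustrative):
CREATE INDEX locind_ops ON location USING GIN (details);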
Now you can search for the property using the index:
EXPLAIN (VERBOSE) SELECT id FROM location WHERE details @> '[{"Slno" : 1}]';
QUERY PLAN
-------------------------------------------------------------------------
Bitmap Heap Scan on laurenz.location (cost=8.00..12.01 rows=1 width=4)
Output: id
Recheck Cond: (location.details @> '[{"Slno": 1}]'::jsonb)
-> Bitmap Index Scan on locind (cost=0.00..8.00 rows=1 width=0)
Index Cond: (location.details @> '[{"Slno": 1}]'::jsonb)
(5 rows)
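If you also need the matching array element itself, not just the row, you can combine the indexable containment test with an unnest; a sketch (the alias elem is just illustrative):
SELECT id, elem->>'value' AS value
FROM location, jsonb_array_elements(details) AS elem
WHERE details @> '[{"Slno": 1}]'   -- this predicate can use the GIN index
  AND (elem->>'Slno')::int = 1;    -- this one filters the unnested elements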
I have a table with an integer column called account_id. I have an index on that column.
But it seems Postgres doesn't want to use my index:
EXPLAIN ANALYZE SELECT "invoices".* FROM "invoices" WHERE "invoices"."account_id" = 1;
Seq Scan on invoices (cost=0.00..6504.61 rows=117654 width=186) (actual time=0.021..33.943 rows=118027 loops=1)
Filter: (account_id = 1)
Rows Removed by Filter: 51462
Total runtime: 39.917 ms
(4 rows)
Any idea why that would be?
Because of:
Seq Scan on invoices (...) (actual ... rows=118027 <— this
Filter: (account_id = 1)
Rows Removed by Filter: 51462 <— vs this
Total runtime: 39.917 ms
You're selecting so many rows that it's cheaper to read the entire table.
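You can convince yourself of this by querying a value that matches only a few rows; with a selective enough predicate the planner should pick the index, e.g. (hypothetical account_id):
EXPLAIN ANALYZE SELECT "invoices".* FROM "invoices" WHERE "invoices"."account_id" = 999;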
Related earlier questions and answers from today for further reading:
Why doesn't Postgresql use index for IN query?
Postgres using wrong index when querying a view of indexed expressions?
(See also Craig's longer answer on the second one for additional notes on index subtleties.)
Given a string column with a value similar to /123/12/34/56/5/, what is the optimal way of querying for all the records that include the given number (12 for example)?
The solution off the top of my head is:
SELECT id FROM things WHERE things.path LIKE '%/12/%'
But AFAIK this query can't use indexes on the column due to the leading %.
There must be something better. What is it?
Using PostgreSQL, but would prefer the solution that would work across other DBs too.
If you're happy turning that column into an array of integers, like:
'/123/12/34/56/5/' becomes ARRAY[123,12,34,56,5]
so that it is stored in a column path_arr of type INTEGER[], then you can create a GIN index on that column:
CREATE INDEX ON things USING gin(path_arr);
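If path_arr doesn't exist yet, one way to derive it from the string column (a sketch, assuming the leading/trailing slashes shown in the example):
ALTER TABLE things ADD COLUMN path_arr integer[];
UPDATE things
SET path_arr = string_to_array(trim(both '/' from path), '/')::integer[];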
A query for all items containing 12 then becomes:
SELECT * FROM things WHERE ARRAY[12] <@ path_arr;
This query will use the index. In my test (with a million rows), I get plans like:
EXPLAIN SELECT * FROM things WHERE ARRAY[12] <@ path_arr;
QUERY PLAN
----------------------------------------------------------------------------------------
Bitmap Heap Scan on things (cost=5915.75..9216.99 rows=1000 width=92)
Recheck Cond: (path_arr @> '{12}'::integer[])
-> Bitmap Index Scan on things_path_arr_idx (cost=0.00..5915.50 rows=1000 width=0)
Index Cond: ('{12}'::integer[] <@ path_arr)
(4 rows)
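The same index also covers containment of several values at once, e.g. all records whose path includes both 12 and 34:
SELECT * FROM things WHERE ARRAY[12, 34] <@ path_arr;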
In PostgreSQL 9.1 you could utilize the pg_trgm module and build a GIN index with it.
CREATE EXTENSION pg_trgm; -- once per database
CREATE INDEX things_path_trgm_gin_idx ON things USING gin (path gin_trgm_ops);
Your LIKE expression can use this index even if it is not left-anchored.
See a detailed demo by depesz here.
Normalize it if you can, though.
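For completeness, a minimal normalized sketch (table and column names are hypothetical, and it assumes things.id is the primary key): store one row per number in the path, and a plain b-tree index handles the lookup:
CREATE TABLE thing_numbers (
    thing_id integer NOT NULL REFERENCES things(id),
    num      integer NOT NULL
);
CREATE INDEX ON thing_numbers (num);

-- all things whose path contains 12:
SELECT DISTINCT thing_id FROM thing_numbers WHERE num = 12;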