How to optimize this simple yet slow query - sql

I have a relatively simple table with several columns, two of which are expires_at (Date) and museum_id (BIGINT, foreign key). Both are indexed, and there is also a compound index on the pair. The table contains around 3 million rows.
Running a query as simple as this takes around 90 seconds to complete:
SELECT *
FROM external_users
WHERE museum_id = 356
AND ((expires_at > '2022-02-16 07:35:39.818117') OR expires_at IS NULL)
Here is the explain analyze output:
Bitmap Heap Scan on external_users (cost=2595.76..148500.40 rows=59259 width=1255) (actual time=4901.257..90786.702 rows=94272 loops=1)
Recheck Cond: (((museum_id = 356) AND (expires_at > '2022-02-16'::date)) OR ((museum_id = 356) AND (expires_at IS NULL)))
Rows Removed by Index Recheck: 391889
Heap Blocks: exact=34133 lossy=33698
-> BitmapOr (cost=2595.76..2595.76 rows=63728 width=0) (actual time=4671.804..4671.806 rows=0 loops=1)
-> Bitmap Index Scan on index_external_users_on_museum_id_and_expires_at (cost=0.00..2187.79 rows=54336 width=0) (actual time=1229.564..1229.564 rows=33671 loops=1)
Index Cond: ((museum_id = 356) AND (expires_at > '2022-02-16'::date))
-> Bitmap Index Scan on index_external_users_on_museum_id_and_expires_at (cost=0.00..378.34 rows=9391 width=0) (actual time=3442.238..3442.238 rows=64337 loops=1)
Index Cond: ((museum_id = 356) AND (expires_at IS NULL))
Planning Time: 266.470 ms
Execution Time: 90838.777 ms
I can't really see anything helpful in the explain/analyze output, but that might be due to my lack of experience with it. My peer reviewer also didn't see anything interesting in there, which makes me wonder: is there anything I can do to help Postgres handle queries like this faster, or is this just the way it is for tables with over 3M records?

Here are some rules and techniques for optimizing this query.
1 - When you use OR in the WHERE conditions, the database often cannot use indexes efficiently. It is recommended to rewrite the query with UNION ALL. Example:
select *
from external_users
where museum_id = 356
and expires_at > '2022-02-16 07:35:39.818117'
union all
select *
from external_users
where museum_id = 356
and expires_at is null
2 - Your expires_at field is probably a timestamp type, but date types are faster than timestamp types, because a timestamp also stores hours, minutes, and seconds, and an index on a timestamp column is larger than an index on a date column. If you need to store the full datetime, you can cast in the query. For the best performance you must then create a function-based index (in PostgreSQL this is called an expression index) rather than a standard index.
select *
from external_users
where museum_id = 356
and expires_at::date > '2022-02-16'
union all
select *
from external_users
where museum_id = 356
and expires_at is NULL
/*
We must cast the `expires_at` field to date when creating the index, because the query casts the column to date, so the index must be built on the same cast expression.
*/
create index external_users_expires_at_idx
ON external_users USING btree ((expires_at::date));
3 - If your WHERE conditions always use the same two or three fields together, it is recommended to create one compound index on those fields rather than separate indexes. In your query, museum_id and expires_at are probably always used together. Sample index:
create index external_users_full_index on external_users using btree (museum_id, (expires_at::date));
The most important of all these rules is the first one: avoid the OR condition.
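A further option in the same spirit as rule 3 (a sketch only, not tested against this table): a partial index can serve the IS NULL branch of the UNION ALL directly, and it stays small because it only contains rows without an expiry.
-- hypothetical partial index for the "expires_at is null" branch
create index external_users_museum_id_no_expiry_idx
on external_users using btree (museum_id)
where expires_at is null;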

Related

Why does my query use a Filter instead of an Index Cond when I use an `OR` condition?

I have a transactions table in PostgreSQL with block_height and index as BIGINT values. Those two values are used for determining the order of the transactions in this table.
So if I want to query transactions from this table that come after a given block_height and index, I'd have to put this in the condition:
If two transactions are in the same block_height, then check the ordering of their index
Otherwise compare their block_height
For example if I want to get 10 transactions that came after block_height 100000 and index 5:
SELECT * FROM transactions
WHERE (
(block_height = 10000 AND index > 5)
OR (block_height > 10000)
)
ORDER BY block_height, index ASC
LIMIT 10
However, I find this query to be extremely slow; it took up to 60 seconds on a table with 50 million rows.
However if I split up the condition and run them individually like so:
SELECT * FROM transactions
WHERE block_height = 10000 AND index > 5
ORDER BY block_height, index ASC
LIMIT 10
and
SELECT * FROM transactions
WHERE block_height > 10000
ORDER BY block_height, index ASC
LIMIT 10
Both queries took at most 200ms on the same table! It is much faster to do both queries and then UNION the final result instead of putting an OR in the condition.
This is the part of the query plan for the slow query (OR-ed condition):
-> Nested Loop (cost=0.98..11689726.68 rows=68631 width=73) (actual time=10230.480..10234.289 rows=10 loops=1)
-> Index Scan using src_transactions_block_height_index on src_transactions (cost=0.56..3592792.96 rows=16855334 width=73) (actual time=10215.698..10219.004 rows=1364 loops=1)
Filter: (((block_height = $1) AND (index > $2)) OR (block_height > $3))
Rows Removed by Filter: 2728151
And this is the query plan for the fast query:
-> Nested Loop (cost=0.85..52.62 rows=1 width=73) (actual time=0.014..0.014 rows=0 loops=1)
-> Index Scan using src_transactions_block_height_index on src_transactions (cost=0.43..22.22 rows=5 width=73) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: ((block_height = $1) AND (index > $2))
I see the main difference to be the use of Filter instead of Index Cond between the query plans.
Is there any way to do this query in a performant way without resorting to the UNION workaround?
The fact that block_height is compared to two different parameters which you know just happen to be equal might be a problem. What if you use $1 twice, rather than $1 and $3?
But better yet, try a tuple comparison
WHERE (block_height, index) > (10000, 5)
This can become fast with a two-column index on (block_height, index).
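A minimal sketch of the full rewrite (the index name is an assumption):
CREATE INDEX transactions_block_height_index_idx
ON transactions (block_height, index);

SELECT * FROM transactions
WHERE (block_height, index) > (10000, 5)
ORDER BY block_height, index
LIMIT 10;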

With more than 30 million entries in a table, a select count on it takes 19 minutes to perform

I am using PostgreSQL 9.4.8 32 bits on a Windows 7 64 bits OS.
I am using a RAID 5 on 3 disks of 2T each. The CPU is a Xeon E3-1225v3 with 8G of RAM.
I have inserted more than 30 million entries into a table (I want to go up to 50 million).
Performing a select count(*) on this table takes more than 19 minutes. Running the query a second time reduces that to 14 minutes, but it is still slow. Indexes don't seem to do anything.
My postgresql.conf is set up like this at the end of the file:
max_connections = 100
shared_buffers = 512MB
effective_cache_size = 6GB
work_mem = 13107kB
maintenance_work_mem = 512MB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 1.2
Here is the schema of this table:
CREATE TABLE recorder.records
(
recorder_id smallint NOT NULL DEFAULT 200,
rec_start timestamp with time zone NOT NULL,
rec_end timestamp with time zone NOT NULL,
deleted boolean NOT NULL DEFAULT false,
channel_number smallint NOT NULL,
channel_name text,
from_id text,
from_name text,
to_id text,
to_name text,
type character varying(32),
hash character varying(128),
codec character varying(16),
id uuid NOT NULL,
status smallint,
duration interval,
CONSTRAINT records_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
)
CREATE INDEX "idxRecordChanName"
ON recorder.records
USING btree
(channel_name COLLATE pg_catalog."default");
CREATE INDEX "idxRecordChanNumber"
ON recorder.records
USING btree
(channel_number);
CREATE INDEX "idxRecordEnd"
ON recorder.records
USING btree
(rec_end);
CREATE INDEX "idxRecordFromId"
ON recorder.records
USING btree
(from_id COLLATE pg_catalog."default");
CREATE INDEX "idxRecordStart"
ON recorder.records
USING btree
(rec_start);
CREATE INDEX "idxRecordToId"
ON recorder.records
USING btree
(to_id COLLATE pg_catalog."default");
CREATE INDEX "idxRecordsStart"
ON recorder.records
USING btree
(rec_start);
CREATE TRIGGER trig_update_duration
AFTER INSERT
ON recorder.records
FOR EACH ROW
EXECUTE PROCEDURE recorder.fct_update_duration();
My query is like this:
select count(*) from recorder.records as rec where rec.rec_start < '2016-01-01' and channel_number != 42;
Explain analyse of this query:
Aggregate (cost=1250451.14..1250451.15 rows=1 width=0) (actual time=956017.494..956017.494 rows=1 loops=1)
-> Seq Scan on records rec (cost=0.00..1195534.66 rows=21966592 width=0) (actual time=34.581..950947.593 rows=23903295 loops=1)
Filter: ((rec_start < '2016-01-01 00:00:00-06'::timestamp with time zone) AND (channel_number <> 42))
Rows Removed by Filter: 7377886
Planning time: 0.348 ms
Execution time: 956017.586 ms
The same query, but with seqscan disabled:
Aggregate (cost=1456272.87..1456272.88 rows=1 width=0) (actual time=929963.288..929963.288 rows=1 loops=1)
-> Bitmap Heap Scan on records rec (cost=284158.85..1401356.39 rows=21966592 width=0) (actual time=118685.228..925629.113 rows=23903295 loops=1)
Recheck Cond: (rec_start < '2016-01-01 00:00:00-06'::timestamp with time zone)
Rows Removed by Index Recheck: 2798893
Filter: (channel_number <> 42)
Rows Removed by Filter: 612740
Heap Blocks: exact=134863 lossy=526743
-> Bitmap Index Scan on "idxRecordStart" (cost=0.00..278667.20 rows=22542169 width=0) (actual time=118628.930..118628.930 rows=24516035 loops=1)
Index Cond: (rec_start < '2016-01-01 00:00:00-06'::timestamp with time zone)
Planning time: 0.279 ms
Execution time: 929965.547 ms
How can I make this kind of query faster?
Added:
I have created an index on rec_start and channel_number, and after a vacuum analyse that took 57 minutes, the query now completes in a little more than 3 minutes:
CREATE INDEX "plopLindex"
ON recorder.records
USING btree
(rec_start, channel_number);
Explain buffers verbose of the same query:
explain (analyse, buffers, verbose) select count(*) from recorder.records as rec where rec.rec_start < '2016-01-01' and channel_number != 42;
Aggregate (cost=875328.61..875328.62 rows=1 width=0) (actual time=199610.874..199610.874 rows=1 loops=1)
Output: count(*)
Buffers: shared hit=69490 read=550462 dirtied=75118 written=51880
-> Index Only Scan using "plopLindex" on recorder.records rec (cost=0.56..814734.15 rows=24237783 width=0) (actual time=66.115..197609.019 rows=23903295 loops=1)
Output: rec_start, channel_number
Index Cond: (rec.rec_start < '2016-01-01 00:00:00-06'::timestamp with time zone)
Filter: (rec.channel_number <> 42)
Rows Removed by Filter: 612740
Heap Fetches: 5364345
Buffers: shared hit=69490 read=550462 dirtied=75118 written=51880
Planning time: 12.416 ms
Execution time: 199610.988 ms
Then I ran this query a second time (without explain): 11 seconds! A great improvement.
Seeing your number of rows, this doesn't sound abnormal to me, and it would be the same on any other RDBMS.
You have too many rows to get the result fast, and since you have a WHERE clause, the only solution to get your row count fast is to create specific table(s) to keep track of it, populated by either a TRIGGER on INSERT or a batch job.
The TRIGGER solution is 100% accurate but more intensive; the batch solution is approximate but more flexible, and the more you increase the batch job frequency, the more accurate your stats are.
In your case I would go for the second solution and create one or several aggregation tables.
You can, for instance, have a batch job that counts all rows grouped by date and channel.
An example of an aggregation table for this specific need would be:
CREATE TABLE agr_table (AGR_TYPE CHAR(50), AGR_DATE DATE, AGR_CHAN SMALLINT, AGR_CNT INT)
Your batch job would do:
DELETE FROM agr_table WHERE AGR_TYPE='group_by_date_and_channel';
INSERT INTO agr_table
SELECT 'group_by_date_and_channel', rec_start::date, channel_number, count(*) as cnt
FROM recorder.records
GROUP BY rec_start::date, channel_number
;
Then you can retrieve stats fast by doing:
SELECT SUM(cnt)
FROM agr_table
WHERE AGR_DATE < '2016-01-01' and AGR_CHAN != 42
That's a very simple example, of course. You should design your aggregation table(s) depending on the stats you need to retrieve fast.
I would suggest reading Postgres Slow Counting and Postgres Count Estimate carefully.
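For completeness, a minimal sketch of the trigger-based counter mentioned above, maintaining the same agr_table on every insert (function and trigger names are hypothetical, and this simple upsert is not concurrency-safe):
CREATE OR REPLACE FUNCTION recorder.fct_count_by_date_and_channel() RETURNS trigger AS $$
BEGIN
    -- bump the per-(date, channel) count; insert a new row the first time we see the pair
    UPDATE agr_table
       SET agr_cnt = agr_cnt + 1
     WHERE agr_type = 'group_by_date_and_channel'
       AND agr_date = NEW.rec_start::date
       AND agr_chan = NEW.channel_number;
    IF NOT FOUND THEN
        INSERT INTO agr_table
        VALUES ('group_by_date_and_channel', NEW.rec_start::date, NEW.channel_number, 1);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trig_count_by_date_and_channel
AFTER INSERT ON recorder.records
FOR EACH ROW EXECUTE PROCEDURE recorder.fct_count_by_date_and_channel();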
Yes, you've created the correct index to cover your query arguments, and Thomas G suggested a nice workaround; I totally agree.
But there is another thing I want to share with you as well: the fact that the second run took only 11 seconds (against 3 minutes for the first one) suggests to me that you are facing a "caching issue".
When you ran the first execution, Postgres pulled table pages from disk into RAM, and when you did the second run, everything it needed was already in memory, so it took only 11 seconds.
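If you want to confirm the caching hypothesis, a quick diagnostic sketch using the standard pg_stat_database view (blks_hit and blks_read are its built-in counters):
-- fraction of block reads served from shared_buffers for the current database
SELECT blks_hit::numeric / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();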
I used to have exactly the same problem, and my "best" solution was simply to give Postgres more shared_buffers. I don't rely on the OS's file cache; I reserve as much memory as I can for Postgres. But it's a pain to make that simple change on Windows: you have OS limitations, and Windows "wastes" too much memory just running itself. It's a shame.
Believe me, you don't have to change your hardware by adding more RAM (although adding more memory is always a good thing!). The most effective change is to change your OS. If you have a "dedicated" server, why waste such precious memory on video/sound/drivers/services/AV and other things you don't (and won't) ever use?
Go to a Linux OS (Ubuntu Server, perhaps?) and get much more performance out of exactly the same hardware.
Change kernel.shmmax to a greater value:
sysctl -w kernel.shmmax=14294967296
echo kernel.shmmax = 14294967296 >>/etc/sysctl.conf
and then you can change postgresql.conf to:
shared_buffers = 6GB
effective_cache_size = 7GB
work_mem = 128MB
You're going to feel the difference right away.

How to optimize query by index PostgreSQL

I want to fetch users that have 1 or more processed bets. I do this with the following SQL:
SELECT user_id FROM bets
WHERE bets.state in ('guessed', 'losed')
GROUP BY user_id
HAVING count(*) > 0;
But running EXPLAIN ANALYZE I noticed that no index is used and the query execution time is very high. I tried adding a partial index like:
CREATE INDEX processed_bets_index ON bets(state) WHERE state in ('guessed', 'losed');
But the EXPLAIN ANALYZE output did not change:
HashAggregate (cost=34116.36..34233.54 rows=9375 width=4) (actual time=235.195..237.623 rows=13310 loops=1)
Filter: (count(*) > 0)
-> Seq Scan on bets (cost=0.00..30980.44 rows=627184 width=4) (actual time=0.020..150.346 rows=626674 loops=1)
Filter: ((state)::text = ANY ('{guessed,losed}'::text[]))
Rows Removed by Filter: 20951
Total runtime: 238.115 ms
(6 rows)
There are only a few records with statuses other than guessed and losed.
How do I create proper index?
I'm using PostgreSQL 9.3.4.
I assume that the state mostly consists of 'guessed' and 'losed', with maybe a few other states in there as well. So most probably the optimizer does not see the need to use the index, since it would still have to fetch most of the rows.
What you do need is an index on the user_id, so perhaps something like this would work:
CREATE INDEX idx_bets_user_id_in_guessed_losed ON bets(user_id) WHERE state in ('guessed', 'losed');
Or, by not using a partial index:
CREATE INDEX idx_bets_state_user_id ON bets(state, user_id);
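Note also that HAVING count(*) > 0 is redundant, since GROUP BY only produces groups containing at least one row, so the query itself can be simplified (a sketch):
SELECT user_id
FROM bets
WHERE state IN ('guessed', 'losed')
GROUP BY user_id;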

IN vs OR in the SQL WHERE clause

When dealing with big databases, which performs better: IN or OR in the SQL WHERE clause?
Is there any difference about the way they are executed?
I assume you want to know the performance difference between the following:
WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'
According to the MySQL manual, if the values are constant, IN sorts the list and then uses a binary search. I would imagine that OR evaluates them one by one in no particular order. So IN is faster in some circumstances.
The best way to know is to profile both on your database with your specific data to see which is faster.
I tried both on a MySQL table with 1,000,000 rows. When the column is indexed there is no discernible difference in performance - both are nearly instant. When the column is not indexed I got these results:
SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)
SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)
So in this case the method using OR is about 30% slower. Adding more terms makes the difference larger. Results may vary on other databases and on other data.
The best way to find out is to look at the execution plan.
I tried it with Oracle, and it was exactly the same.
CREATE TABLE performance_test AS ( SELECT * FROM dba_objects );
SELECT * FROM performance_test
WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB' );
Even though the query uses IN, the Execution Plan says that it uses OR:
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 8 | 1416 | 163 (2)| 00:00:02 |
|* 1 | TABLE ACCESS FULL| PERFORMANCE_TEST | 8 | 1416 | 163 (2)| 00:00:02 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("OBJECT_NAME"='DBMS_LOB' OR "OBJECT_NAME"='DBMS_REGISTRY' OR
"OBJECT_NAME"='DBMS_STANDARD')
The OR operator needs a much more complex evaluation process than the IN construct, because it allows many kinds of conditions, not only equality like IN.
Here is a list of what you can use with OR but that is not compatible with IN:
greater than, greater than or equal, less than, less than or equal, LIKE, and some more, such as Oracle's REGEXP_LIKE.
In addition, consider that the conditions may not always compare the same value.
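An illustrative example, reusing the performance_test table above, of a condition that OR can express but a single IN list cannot (mixed operators and columns):
SELECT * FROM performance_test
WHERE object_name LIKE 'DBMS%'
   OR object_type = 'TABLE'
   OR created < DATE '2010-01-01';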
For the query optimizer it's easier to manage the IN operator, because it is just a construct that defines the OR operator over multiple equality conditions on the same value. If you use the OR operator, the optimizer may not recognize that you're always using the = operator on the same value and, unless it performs a deeper and more complex analysis, it may not conclude that all the involved conditions are equality comparisons against the same column, which precludes optimized search methods such as the binary search already mentioned.
[EDIT]
A given optimizer may not implement an optimized IN evaluation, but that doesn't mean it never will (for example, after a database version upgrade). If you use the OR operator, that optimized evaluation will not be used in your case.
I think Oracle is smart enough to convert the less efficient one (whichever that is) into the other, so the answer should rather depend on the readability of each (where I think IN clearly wins).
OR makes sense (from a readability point of view) when there are few values to compare.
IN is especially useful when you have a dynamic source of values you want to compare against.
Another alternative is to use a JOIN with a temporary table.
I don't think performance should be a problem, provided you have the necessary indexes.
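A minimal sketch of the temporary-table JOIN alternative, reusing the t_inner table from the benchmark above (the temporary table name is hypothetical):
CREATE TEMPORARY TABLE wanted_values (val INT PRIMARY KEY);
INSERT INTO wanted_values VALUES (1000), (2000), (3000);

-- the join plays the role of the IN list; an index on t_inner.val makes it fast
SELECT t.*
FROM t_inner t
JOIN wanted_values w ON w.val = t.val;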
I'll add info for PostgreSQL version 11.8 (released 2020-05-14).
IN may be significantly faster, e.g. on a table with ~23M rows.
Query with OR:
explain analyse select sum(mnozstvi_rozdil)
from product_erecept
where okres_nazev = 'Brno-město' or okres_nazev = 'Pardubice';
-- execution plan
Finalize Aggregate (cost=725977.36..725977.37 rows=1 width=32) (actual time=4536.796..4540.748 rows=1 loops=1)
-> Gather (cost=725977.14..725977.35 rows=2 width=32) (actual time=4535.010..4540.732 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=724977.14..724977.15 rows=1 width=32) (actual time=4519.338..4519.339 rows=1 loops=3)
-> Parallel Bitmap Heap Scan on product_erecept (cost=15589.71..724264.41 rows=285089 width=4) (actual time=135.832..4410.525 rows=230706 loops=3)
Recheck Cond: (((okres_nazev)::text = 'Brno-město'::text) OR ((okres_nazev)::text = 'Pardubice'::text))
Rows Removed by Index Recheck: 3857398
Heap Blocks: exact=11840 lossy=142202
-> BitmapOr (cost=15589.71..15589.71 rows=689131 width=0) (actual time=140.985..140.986 rows=0 loops=1)
-> Bitmap Index Scan on product_erecept_x_okres_nazev (cost=0.00..8797.61 rows=397606 width=0) (actual time=99.371..99.371 rows=397949 loops=1)
Index Cond: ((okres_nazev)::text = 'Brno-město'::text)
-> Bitmap Index Scan on product_erecept_x_okres_nazev (cost=0.00..6450.00 rows=291525 width=0) (actual time=41.612..41.612 rows=294170 loops=1)
Index Cond: ((okres_nazev)::text = 'Pardubice'::text)
Planning Time: 0.162 ms
Execution Time: 4540.829 ms
Query with IN:
explain analyse select sum(mnozstvi_rozdil)
from product_erecept
where okres_nazev in ('Brno-město', 'Pardubice');
-- execution plan
Aggregate (cost=593199.90..593199.91 rows=1 width=32) (actual time=855.706..855.707 rows=1 loops=1)
-> Index Scan using product_erecept_x_okres_nazev on product_erecept (cost=0.56..591477.07 rows=689131 width=4) (actual time=1.326..645.597 rows=692119 loops=1)
Index Cond: ((okres_nazev)::text = ANY ('{Brno-město,Pardubice}'::text[]))
Planning Time: 0.136 ms
Execution Time: 855.743 ms
Even if you use the IN operator, MS SQL Server will automatically convert it to the OR operator. If you analyze the execution plans you will be able to see this. So it can be better to use OR instead of a long IN list; it will at least save a few nanoseconds of the operation.
I ran a SQL query with a large number of ORs (350). Postgres did it in 437.80 ms.
The same query using IN:
23.18 ms

How to optimize my PostgreSQL DB for prefix search?

I have a table called "nodes" with roughly 1.7 million rows in my PostgreSQL db
=#\d nodes
Table "public.nodes"
Column | Type | Modifiers
--------+------------------------+-----------
id | integer | not null
title | character varying(256) |
score | double precision |
Indexes:
"nodes_pkey" PRIMARY KEY, btree (id)
I want to use information from that table for autocompletion of a search field, showing the user a list of the ten titles having the highest score fitting to his input. So I used this query (here searching for all titles starting with "s")
=# explain analyze select title,score from nodes where title ilike 's%' order by score desc;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Sort (cost=64177.92..64581.38 rows=161385 width=25) (actual time=4930.334..5047.321 rows=161264 loops=1)
Sort Key: score
Sort Method: external merge Disk: 5712kB
-> Seq Scan on nodes (cost=0.00..46630.50 rows=161385 width=25) (actual time=0.611..4464.413 rows=161264 loops=1)
Filter: ((title)::text ~~* 's%'::text)
Total runtime: 5260.791 ms
(6 rows)
This was much too slow to use for autocomplete. With some information from Using PostgreSQL in Web 2.0 Applications I was able to improve that with a special index:
=# create index title_idx on nodes using btree(lower(title) text_pattern_ops);
=# explain analyze select title,score from nodes where lower(title) like lower('s%') order by score desc limit 10;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=18122.41..18122.43 rows=10 width=25) (actual time=1324.703..1324.708 rows=10 loops=1)
-> Sort (cost=18122.41..18144.60 rows=8876 width=25) (actual time=1324.700..1324.702 rows=10 loops=1)
Sort Key: score
Sort Method: top-N heapsort Memory: 17kB
-> Bitmap Heap Scan on nodes (cost=243.53..17930.60 rows=8876 width=25) (actual time=96.124..1227.203 rows=161264 loops=1)
Filter: (lower((title)::text) ~~ 's%'::text)
-> Bitmap Index Scan on title_idx (cost=0.00..241.31 rows=8876 width=0) (actual time=90.059..90.059 rows=161264 loops=1)
Index Cond: ((lower((title)::text) ~>=~ 's'::text) AND (lower((title)::text) ~<~ 't'::text))
Total runtime: 1325.085 ms
(9 rows)
So this gave me a speedup of a factor of 4. But can this be improved further? What if I want to use '%s%' instead of 's%'? Do I have any chance of getting decent performance with PostgreSQL in that case, too? Or should I try a different solution (Lucene? Sphinx?) for implementing my autocomplete feature?
You will need a text_pattern_ops index if you're not in the C locale.
See: index types.
Tips for further investigation:
Partition the table on the title key. This makes the lists that Postgres needs to work with smaller.
Give PostgreSQL more memory so the cache hit rate is above 98%. This table will take about 0.5 GB; I think 2 GB should be no problem nowadays. Make sure statistics collection is enabled and read up on the pg_stats tables.
Make a second table with a reduced substring of the title, e.g. 12 characters, so the complete table fits in fewer database blocks. An index on a substring may also work, but requires careful querying.
The longer the substring, the faster the query will run. Create a separate table for small substrings, and store as the value the top ten or however many choices you would want to show (see the sketch after these tips). There are about 20000 combinations of 1, 2, and 3 character strings.
You can use the same idea if you want to handle %abc% queries, but at that point switching to Lucene probably makes sense.
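A sketch of the small-prefix table idea from the tips above (assuming a PostgreSQL version with window functions; table and index names are hypothetical):
-- precompute, for each 3-character prefix, the top ten titles by score
CREATE TABLE title_prefixes AS
SELECT prefix, title, score
FROM (
    SELECT lower(substr(title, 1, 3)) AS prefix, title, score,
           row_number() OVER (PARTITION BY lower(substr(title, 1, 3))
                              ORDER BY score DESC) AS rn
    FROM nodes
) ranked
WHERE rn <= 10;

CREATE INDEX title_prefixes_prefix_idx ON title_prefixes (prefix);

-- autocomplete lookup for the input 'sta'
SELECT title, score
FROM title_prefixes
WHERE prefix = 'sta'
ORDER BY score DESC;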
You're obviously not interested in 150000+ results, so you should limit them:
select title,score
from nodes
where title ilike 's%'
order by score desc
limit 10;
You can also consider creating a functional index, and using ">=" and "<":
create index nodes_title_lower_idx on nodes (lower(title));
select title,score
from nodes
where lower(title)>='s' and lower(title)<'t'
order by score desc
limit 10;
You should also create an index on score, which will help in the ilike '%s%' case.
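A sketch of that score index (the name is an assumption; a plain ascending index would also work, since btrees can be scanned backwards for ORDER BY score DESC):
create index nodes_score_idx on nodes (score desc);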