SQLite speed up select with collate nocase

I have SQLite db:
CREATE TABLE IF NOT EXISTS Commits
(
GlobalVer INTEGER PRIMARY KEY,
Data blob NOT NULL
) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
I want to make one select:
SELECT Commits.Data
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ?
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
After that I want to make another select:
SELECT Commits.Data,Streams.Name
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE
Streams.Name = ? COLLATE NOCASE
ORDER BY Streams.GlobalVer
LIMIT ? OFFSET ?
The problem is that the 2nd select runs super slow. I think this is because of COLLATE NOCASE, and I want to speed it up. I tried to add an index but it doesn't help (maybe I did something wrong?). How can I execute the 2nd query at approximately the same speed as the 1st query?

An index can be used to speed up a search only if it uses the same collation as the query.
By default, an index takes the collation from the table column, so you could change the table definition:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL COLLATE NOCASE,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
However, this would make the first query slower.
To speed up both queries, you need two indexes, one for each collation: keep the default collation for the implicit primary-key index, and use NOCASE for an explicit second index:
CREATE TABLE IF NOT EXISTS Streams
(
Name char(40) NOT NULL,
GlobalVer INTEGER NOT NULL,
PRIMARY KEY(Name, GlobalVer)
) WITHOUT ROWID;
CREATE INDEX IF NOT EXISTS Streams_nocase_idx ON Streams(Name COLLATE NOCASE, GlobalVer);
(Adding the second column to the index speeds up the ORDER BY in this query.)
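To confirm that each query picks its matching index, you can inspect the plans. A minimal sketch, assuming the schema above (the exact EXPLAIN QUERY PLAN wording varies between SQLite versions):
EXPLAIN QUERY PLAN
SELECT Commits.Data
FROM Streams JOIN Commits ON Streams.GlobalVer=Commits.GlobalVer
WHERE Streams.Name = 'foo' COLLATE NOCASE
ORDER BY Streams.GlobalVer;
-- should report something like:
--   SEARCH Streams USING COVERING INDEX Streams_nocase_idx (Name=?)
-- while the first query (without COLLATE NOCASE) should instead report
--   SEARCH Streams USING PRIMARY KEY (Name=?)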

Related

In TimescaleDB, while querying compressed chunks, segmentby indexes are not getting used. How to make sure that indexes are used while querying?

Below is my table schema:
CREATE TABLE public.table1
(
machinecode character varying(100) COLLATE pg_catalog."default" NOT NULL,
groupname character varying(50) COLLATE pg_catalog."default" NOT NULL,
sensor character varying(100) COLLATE pg_catalog."default" NOT NULL,
"time" timestamp without time zone NOT NULL,
value1 integer,
value2 real,
PRIMARY KEY (machinecode, groupname, sensor, "time")
);
CREATE INDEX ON table1 (machinecode, groupname, sensor, "time" DESC);
SELECT create_hypertable('table1', 'time');
SELECT set_chunk_time_interval('table1', INTERVAL '24 hours');
ALTER TABLE table1 SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'machinecode,groupname,sensor'
);
SELECT add_compression_policy('table1', INTERVAL '7 days');
[Image 1: EXPLAIN output showing no index scan on machinecode]
But as can be seen from image 1, the EXPLAIN output uses no index scan on machinecode.
select * from _timescaledb_catalog.hypertable
where id =
(SELECT compressed_hypertable_id FROM _timescaledb_catalog.hypertable
WHERE table_name='table1');
[Image 2: matching row from _timescaledb_catalog.hypertable]
From image 2 it can be seen that _compressed_hypertable_18 has been created, and the indexes on it are:
CREATE INDEX _compressed_hypertable_18_machinecode__ts_meta_sequence_num_idx ON _timescaledb_internal._compressed_hypertable_18 USING btree (machinecode, _ts_meta_sequence_num)
CREATE INDEX _compressed_hypertable_18_groupname__ts_meta_sequence_num_idx ON _timescaledb_internal._compressed_hypertable_18 USING btree (groupname, _ts_meta_sequence_num)
CREATE INDEX _compressed_hypertable_18_sensor__ts_meta_sequence_num_idx ON _timescaledb_internal._compressed_hypertable_18 USING btree (sensor, _ts_meta_sequence_num)
But still, in EXPLAIN, the query is not using the indexes.
How can I make sure that the indexes are used while querying?
Your query plan shows that the issue is the column being cast to TEXT. This answer on GitHub suggests using TEXT for the segmentby columns. In your case you use character varying, which introduces the need for casting; and since a cast is applied to the column, the index cannot be used.
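A minimal sketch of that change, assuming the columns can be altered in place (with compression enabled you may first need to decompress the chunks or remove the compression settings; check the TimescaleDB docs for your version):
ALTER TABLE public.table1 ALTER COLUMN machinecode TYPE text;
ALTER TABLE public.table1 ALTER COLUMN groupname TYPE text;
ALTER TABLE public.table1 ALTER COLUMN sensor TYPE text;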

Postgresql Query is very Slow

I have a table with 300000 rows, and when I run a simple query like
select * from diario_det;
it takes 41041 ms to return the rows. Is that normal? How can I optimize the query?
I use PostgreSQL 9.3 on CentOS 7.
Here's is my table
CREATE TABLE diario_det
(
cod_empresa numeric(2,0) NOT NULL,
nro_asiento numeric(8,0) NOT NULL,
nro_secue_pase numeric(4,0) NOT NULL,
descripcion_pase character varying(150) NOT NULL,
monto_debe numeric(16,3),
monto_haber numeric(16,3),
estado character varying(1) NOT NULL,
cod_pcuenta character varying(15) NOT NULL,
cod_local numeric(2,0) NOT NULL,
cod_centrocosto numeric(4,0) NOT NULL,
cod_ejercicio numeric(4,0) NOT NULL,
nro_comprob character varying(15),
conciliado_por character varying(10),
CONSTRAINT fk_diario_det_cab FOREIGN KEY (cod_empresa, cod_local, cod_ejercicio, nro_asiento)
REFERENCES diario_cab (cod_empresa, cod_local, cod_ejercicio, nro_asiento) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION,
CONSTRAINT fk_diario_det_pc FOREIGN KEY (cod_empresa, cod_pcuenta)
REFERENCES plan_cuenta (cod_empresa, cod_pcuenta) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=TRUE
);
ALTER TABLE diario_det
OWNER TO postgres;
-- Index: pk_diario_det_ax
-- DROP INDEX pk_diario_det_ax;
CREATE INDEX pk_diario_det_ax
ON diario_det
USING btree
(cod_pcuenta COLLATE pg_catalog."default", cod_local, estado COLLATE pg_catalog."default");
Very roughly, the size of one row is 231 bytes; times 300000 rows, that's 69300000 bytes (~69 MB) that has to be transferred from server to client.
I think 41 seconds is a bit long, but the query still has to be slow because of the amount of data that has to be loaded from disk and transferred.
You can optimize the query by:
selecting just the columns you are going to use, not all of them (if you need just cod_empresa, this would reduce the total amount of transferred data to ~1.2 MB, but the server would still have to iterate through all the records, which is slow)
filtering to only the rows you are going to use: a WHERE clause on indexed columns can really speed the query up (see the sketch at the end of this answer)
If you want to know what is happening in your query, play around with EXPLAIN and EXPLAIN ANALYZE.
Also, if you're running a dedicated database server, be sure to configure it properly so it can use plenty of system resources.
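For example, a sketch combining both suggestions; the account-code value here is made up, and cod_pcuenta is the leading column of the pk_diario_det_ax index above, so the planner can use it:
EXPLAIN ANALYZE
SELECT cod_empresa, nro_asiento, monto_debe
FROM diario_det
WHERE cod_pcuenta = '110101';  -- hypothetical account code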

Any way to achieve fulltext-like search on InnoDB

I have a very simple query:
SELECT ... WHERE row LIKE '%some%' OR row LIKE '%search%' OR row LIKE '%string%'
to search for some search string, but as you can see, it matches each word individually, and leading-wildcard LIKE is also bad for performance.
Is there a way to recreate a fulltext-like search using LIKE on an InnoDB table? Of course, I know I can use something like Sphinx to achieve this, but I'm looking for a pure MySQL solution.
Use a MyISAM fulltext table to index back into your InnoDB tables. For example:
Build your system using innodb:
create table users (...) engine=innodb;
create table forums (...) engine=innodb;
create table threads
(
forum_id smallint unsigned not null,
thread_id int unsigned not null default 0,
user_id int unsigned not null,
subject varchar(255) not null, -- gonna want to search this... !!
created_date datetime not null,
next_reply_id int unsigned not null default 0,
view_count int unsigned not null default 0,
primary key (forum_id, thread_id) -- composite clustered PK index
)
engine=innodb;
Now the fulltext search table, which we will use just to index back into our InnoDB tables. You can maintain rows in this table either with a trigger (a sketch follows the table definition below) or with nightly batch updates, etc.
create table threads_ft
(
forum_id smallint unsigned not null,
thread_id int unsigned not null default 0,
subject varchar(255) not null,
fulltext (subject), -- fulltext index on subject
primary key (forum_id, thread_id) -- composite non-clustered index
)
engine=myisam;
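As mentioned above, one way to keep threads_ft in sync is a trigger. A hypothetical sketch for inserts (updates and deletes would need matching triggers):
create trigger threads_after_insert after insert on threads
for each row
insert into threads_ft (forum_id, thread_id, subject)
values (new.forum_id, new.thread_id, new.subject);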
Finally, the search stored procedure which you call from your PHP application:
drop procedure if exists ft_search_threads;
delimiter #
create procedure ft_search_threads
(
in p_search varchar(255)
)
begin
select
t.*,
f.title as forum_title,
u.username,
match(tft.subject) against (p_search in boolean mode) as rank
from
threads_ft tft
inner join threads t on tft.forum_id = t.forum_id and tft.thread_id = t.thread_id
inner join forums f on t.forum_id = f.forum_id
inner join users u on t.user_id = u.user_id
where
match(tft.subject) against (p_search in boolean mode)
order by
rank desc
limit 100;
end;
call ft_search_threads('+innodb +clustered +index');
Hope this helps :)
Using PHP to construct the query. This is a horrible hack. Once seen, it can't be unseen...
$words=dict($userQuery);
$numwords = sizeof($words);
$innerquery="";
for($i=0;$i<$numwords;$i++) {
$words[$i] = mysql_real_escape_string($words[$i]);
if($i>0) $innerquery .= " AND ";
$innerquery .= "
(
field1 LIKE \"%$words[$i]%\" OR
field2 LIKE \"%$words[$i]%\" OR
field3 LIKE \"%$words[$i]%\" OR
field4 LIKE \"%$words[$i]%\"
)
";
}
SELECT fields FROM table WHERE $innerquery AND whatever;
dict() is a dictionary function that splits the user query into an array of words.
InnoDB full-text search (FTS) is finally available in the MySQL 5.6.4 release.
These indexes are physically represented as entire InnoDB tables, which are acted upon by SQL keywords such as the FULLTEXT clause of the CREATE INDEX statement, the MATCH() ... AGAINST syntax in a SELECT statement, and the OPTIMIZE TABLE statement.
From FULLTEXT Indexes
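So on 5.6.4+ the MyISAM shadow table is no longer needed. A sketch reusing the threads table from the answer above (assuming the same column names):
alter table threads add fulltext index ft_subject (subject);
select forum_id, thread_id, subject
from threads
where match(subject) against ('+innodb +clustered +index' in boolean mode);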

MySQL query slow when selecting VARCHAR

I have this table:
CREATE TABLE `search_engine_rankings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`keyword_id` int(11) DEFAULT NULL,
`search_engine_id` int(11) DEFAULT NULL,
`total_results` int(11) DEFAULT NULL,
`rank` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
`indexed_at` date DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
KEY `search_engine_rankings_search_engine_id_fk` (`search_engine_id`),
CONSTRAINT `search_engine_rankings_keyword_id_fk` FOREIGN KEY (`keyword_id`) REFERENCES `keywords` (`id`) ON DELETE CASCADE,
CONSTRAINT `search_engine_rankings_search_engine_id_fk` FOREIGN KEY (`search_engine_id`) REFERENCES `search_engines` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=244454637 DEFAULT CHARSET=utf8
It has about 250M rows in production.
When I do:
select id,
rank
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very quickly.
When I add the url column (VARCHAR):
select id,
rank,
url
from search_engine_rankings
where keyword_id = 19
and search_engine_id = 11
and indexed_at = "2010-12-03";
...it runs very slowly.
Any ideas?
The first query can be satisfied by the index alone -- no need to read the base table to obtain the values in the SELECT clause. The second statement requires reads of the base table because the url column is not part of the index.
UNIQUE KEY `unique_ranking` (`keyword_id`,`search_engine_id`,`rank`,`indexed_at`),
The rows in the base table are not in the same physical order as the rows in the index, so the read of the base table can involve considerable disk-thrashing.
You can think of it as a kind of proof of optimization -- on the first query the disk-thrashing is avoided because the engine is smart enough to consult the index for the values requested in the SELECT clause; it has already read that index into RAM for the WHERE clause, so it takes advantage of that fact.
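One possible fix implied by this answer, sketched here as an assumption rather than a tested solution, is a covering index that includes url, so the second query can also be satisfied from the index alone (the index name is made up):
ALTER TABLE search_engine_rankings
ADD INDEX `covering_url_idx` (`keyword_id`, `search_engine_id`, `indexed_at`, `rank`, `url`);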
In addition to Tim's answer: an index in MySQL can only be used left-to-right, which means it can use the columns of your index in your WHERE clause only up to the point where you use them.
Currently, your UNIQUE index is keyword_id,search_engine_id,rank,indexed_at. This can filter on the columns keyword_id and search_engine_id, but it still needs to scan the remaining rows to filter for indexed_at, because the unconstrained rank column sits between them in the index.
But if you change it to keyword_id,search_engine_id,indexed_at,rank (just the order), it will be able to filter on keyword_id, search_engine_id, and indexed_at.
I believe it will be able to fully use that index to read the appropriate part of your table.
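A sketch of that reordering, done in a single ALTER so the unique constraint is rebuilt in one step:
ALTER TABLE search_engine_rankings
DROP INDEX `unique_ranking`,
ADD UNIQUE KEY `unique_ranking` (`keyword_id`, `search_engine_id`, `indexed_at`, `rank`);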
I know it's an old post, but I was experiencing the same situation and didn't find an answer.
This really happens in MySQL: when you select varchar columns, it takes a lot of time processing. My query took about 20 sec to process 1.7M rows, and now it takes about 1.9 sec.
OK, first of all, create a view from this query:
CREATE VIEW view_one AS
select id,rank
from search_engine_rankings
where keyword_id = 19000
and search_engine_id = 11
and indexed_at = "2010-12-03";
Second, run the same query but with an inner join:
select v.*, s.url
from view_one AS v
inner join search_engine_rankings s ON s.id=v.id;
TL;DR: I solved this by running OPTIMIZE TABLE on the table.
I experienced the same just now. Even lookups on the primary key, selecting just a few rows, were slow. Testing a bit, I found it was not limited to the varchar column; selecting an int also took a considerable amount of time.
A query roughly looking like this took around 3s:
select someint from mytable where id in (1234, 12345, 123456).
While a query roughly looking like this took <10ms:
select count(*) from mytable where id in (1234, 12345, 123456).
The approved answer here is to just make an index spanning someint as well, and it will be fast, since MySQL can fetch all the information it needs from the index and won't have to touch the table. That probably works in some settings, but I think it's a silly workaround: something is clearly wrong, and it should not take three seconds to fetch three rows from a table! Besides, most applications just do a "select * from mytable", and making changes on the application side is not always trivial.
After OPTIMIZE TABLE, both queries take <10ms.
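For reference, the fix was just this (mytable is this answer's placeholder name; on InnoDB, OPTIMIZE TABLE rebuilds the table and its indexes):
OPTIMIZE TABLE mytable;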

Will creating index help in this case

I'm still a learning user of SQL Server 2005.
Here is my table structure
CREATE TABLE [dbo].[Trn_PostingGroups](
[ControlGroup] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[PracticeCode] [char](5) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[ScanDate] [smalldatetime] NULL,
[DepositDate] [smalldatetime] NULL,
[NameOfFile] [varchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[DepositValue] [decimal](11, 2) NULL,
[RecordStatus] [char](1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
CONSTRAINT [PK_Trn_PostingGroups_1] PRIMARY KEY CLUSTERED
(
[ControlGroup] ASC,
[PracticeCode] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Scenario 1: Suppose I have a query like this...
Select * from Trn_PostingGroups where PracticeCode = 'ABC'
Will indexing on PracticeCode separately help make my query faster?
Scenario 2:
Select * from Trn_PostingGroups
where
ControlGroup = 12701
and PracticeCode = 'ABC'
and NameOfFile = 'FileName1'
Will indexing on NameOfFile separately help make my query faster?
If you were only selecting on the first field (ControlGroup), it is the primary sort of the clustered index and you wouldn't need to index the other field.
If you select on the other primary key fields, then adding a separate index on the other fields should help with such selects.
In general, you should index fields that are commonly used in ORDER BY and WHERE clauses. This is of course oversimplified.
See this article for more information about optimizing (statistics and query analyser).
You can only utilize one index per table per query (unless you consider self-joins or CTEs). If multiple indexes could be used on the same table in the same query, SQL Server will use statistics to determine which one is better to use.
In Scenario 1, if you create an index on PracticeCode alone, it will usually be used, as long as there are enough rows that a table scan costs more and there is a diverse range of values in that column. An index will not be used if there are only a few rows in the table (it is faster to just look at them all), and it will also not be used if most of the values in that column are the same. The PK will not be used in this query; it would be like looking for a first name in the phone book: you can't use the index because it is ordered by last name + first name. You might consider reversing your PK to PracticeCode+ControlGroup if you never search on ControlGroup by itself.
In Scenario 2, if you have an index on NameOfFile alone, the optimizer will probably use the PK and ignore the NameOfFile index (unless you make the NameOfFile index unique, and then it is a tossup). You might try to create an index (in addition to your PK) on ControlGroup+PracticeCode+NameOfFile; if you have many files per ControlGroup+PracticeCode, it may select that index over the PK index (see the sketch below).
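Sketches of the indexes discussed in the two scenarios (the index names are made up):
CREATE INDEX IX_Trn_PostingGroups_PracticeCode
ON dbo.Trn_PostingGroups (PracticeCode); -- Scenario 1
CREATE INDEX IX_Trn_PostingGroups_CG_PC_File
ON dbo.Trn_PostingGroups (ControlGroup, PracticeCode, NameOfFile); -- Scenario 2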