Sqlite query optimisation needed - sql

I'm using sqlite for a small validation application. I have a simple one table database with 4 varhchar columns and one integer primary key. There are close to 1 million rows in the table. I have optimised it and done a vacuum on it.
I am using the following query to retrieve a presence count from the table. I have changed the fields and names for privacy.
SELECT
count(*) as 'test'
FROM
my_table
WHERE
LOWER(surname) = LOWER('Oliver')
AND
UPPER(address_line2) = UPPER('Somewhere over the rainbow')
AND
house_number IN ('3','4','5');
This query takes about 1.5-1.9 seconds to run. I have tried indexes and they make no difference really. This time may not sound bad but I have to run this test about 40,000 times on a read in csv file so as you may imagine it adds up pretty quickly. Any ideas on how to reduce the execution time. I normally develop in mssql or mysql so if there are some tricks I am missing in sqlite I would be happy to hear them.
All the best.

When you use a function over an indexed column, SQLite cannot use the index, because the function may not preserve the ordering -- i.e. there can be functions such as 1>2, but F(1)<F(2). There are some ways to solve this situation, though:
If you want to use indexes to make your query faster, you must save
the value in a fixed case (upper or lower) and then convert only the
query parameter to the same case:
SELECT count(*) as 'test'
FROM my_table
WHERE surname = LOWER('Oliver')
You can use the case-insensitive LIKE operator (I don't know how indexes are affected!):
SELECT count(*) as 'test'
FROM my_table
WHERE surname LIKE 'Oliver';
Or you can create each column as text collate nocase and don't worry about case differences regarding this column anymore:
CREATE TABLE my_table (surname text collate nocase, <... other fields here ...>);
SELECT count(*) as 'test'
FROM my_table
WHERE surname ='Oliver';
You can find more information about the = and LIKE operators here.

SELECT
count(1) as 'test'
FROM
my_table
WHERE
surname = 'Oliver'
AND
address_line2 = 'Somewhere over the rainbow'
AND
house_number IN ('3','4','5')
COLLATE NOCASE;

Related

Improving performance on an alphanumeric text search query

I have table where millions of records are there I'm just posting sample data. Actually I'm looking to get only Endorsement data by using LIKE or LEFT but there is no difference between them in Execution time. IS there any fine way to get data in less time while dealing with Alphanumeric Data. I have 4.4M records in table. Suggest me
declare #t table (val varchar(50))
insert into #t(val)values
('0-1AB11BC11yerw123Endorsement'),
('0-1AB114578Endorsement'),
('0-1BC11BC11yerw122553Endorsement'),
('0-1AB11BC11yerw123newBusiness'),
('0-1AB114578newBusiness'),
('0-1BC11BC11yerw122553newBusiness'),
('0-1AB11BC11yerw123Renewal'),
('0-1AB114578Renewal'),
('0-1BC11BC11yerw122553Renewal')
SELECT * FROM #t where RIGHT(val,11) = 'Endorsement'
SELECT * FROM #t where val like '%Endorsement%'
Imagine you'd have to find names in a telephone book that end with a certain string. All you could do is read every single name and compare. It doesn't help you at all to see where the names with A, B, C, etc. start, because you are not interested in the initial characters of the names but only in the last characters instead. Well, the only thing you could do to speed this up is ask some friends to help you and each person scans a range of pages only. In a DBMS it is the same. The DBMS performs a full table scan and does this parallelized if possible.
If however you had a telephone book listing the words backwards, so you'd see which words end with A, B, C, etc., that sure would help. In SQL Server: Create a computed column on the reverse string:
alter table t add reverse_val as reverse(val);
And add an index:
create index idx_reverse_val on t(reverse_val);
Then query the string with LIKE. The DBMS should notice that it can use the index for speeding up the search process.
select * from t where reverse_val like reverse('Endorsement') + '%';
Having said this, it seems strange that you are interested in the end of your strings at all. In a good database you store atomic information, e.g. you would not store a person's name and birthdate in the same column ('John Miller 12.12.2000'), but in separate columns instead. Sure, it does happen that you store names and want to look for names starting with, ending with, containing substrings, but this is a rare thing after all. Check your column and think about whether its content should be separate columns instead. If you had the string ('Endorsement', 'Renewal', etc.) in a separate column, this would really speed up the lookup, because all you'd have to do is ask where val = 'Endorsement' and with an index on that column this is a super-simple task for the DBMS.
try charindex or patindex:
SELECT *
FROM #t t
WHERE CHARINDEX('endorsement', t.val) > 0
SELECT *
FROM #t t
WHERE PATINDEX('%endorsement%', t.val) > 0
CREATE TABLE tbl
(val varchar(50));
insert into tbl(val)values
('0-1AB11BC11yerw123Endorsement'),
('0-1AB114578Endorsement'),
('0-1BC11BC11yerw122553Endorsement'),
('0-1AB11BC11yerw123newBusiness'),
('0-1AB114578newBusiness'),
('0-1BC11BC11yerw122553newBusiness'),
('0-1AB11BC11yerw123Renewal'),
('0-1AB114578Renewal'),
('0-1BC11BC11yerw122553Renewal');
CREATE CLUSTERED INDEX inx
ON dbo.tbl(val)
SELECT * FROM tbl where val like '%Endorsement';
--LIKE '%Endorsement' will give better performance it will utilize the index well efficiently than RIGHT(val,ll)

Fastest way to check if any case of a pattern exist in a column using SQL

I am trying to write code that allows me to check if there are any cases of a particular pattern inside a table.
The way I am currently doing is with something like
select count(*)
from database.table
where column like (some pattern)
and seeing if the count is greater than 0.
I am curious to see if there is any way I can speed up this process as this type of pattern finding happens in a loop in my query and all I need to know is if there is even one such case rather than the total number of cases.
Any suggestions will be appreciated.
EDIT: I am running this inside a Teradata stored procedure for the purpose of data quality validation.
Using EXISTS will be faster if you don't actually need to know how many matches there are. Something like this would work:
IF EXISTS (
SELECT *
FROM bigTbl
WHERE label LIKE '%test%'
)
SELECT 'match'
ELSE
SELECT 'no match'
This is faster because once it finds a single match it can return a result.
If you don't need the actual count, the most efficient way in Teradata will use EXISTS:
select 1
where exists
( select *
from database.table
where column like (some pattern)
)
This will return an empty result set if the pattern doesn't exist.
In terms of performance, a better approach is to:
select the result set based on your pattern;
limit the result set's size to 1.
Check whether a result was returned.
Doing this prevents the database engine from having to do a full table scan, and the query will return as soon as the first matching record is encountered.
The actual query depends on the database you're using. In MySQL, it would look something like:
SELECT id FROM database.table WHERE column LIKE '%some pattern%' LIMIT 1;
In Oracle it would look like this:
SELECT id FROM database.table WHERE column LIKE '%some pattern%' AND ROWNUM = 1;

One select for multiple records by composite key

Such a query as in the title would look like this I guess:
select * from table t where (t.k1='apple' and t.k2='pie') or (t.k1='strawberry' and t.k2='shortcake')
... --10000 more key pairs here
This looks quite verbose to me. Any better alternatives? (Currently using SQLite, might use MYSQL/Oracle.)
You can use for example this on Oracle, i assume that if you use regular concatenate() instead of Oracle's || on other DB, it would work too (as it is simply just a string comparison with the IN list). Note that such query might have suboptimal execution plan.
SELECT *
FROM
TABLE t
WHERE
t.k1||','||t.k2 IN ('apple,pie',
'strawberry,shortcake' );
But if you have your value list stored in other table, Oracle supports also the format below.
SELECT *
FROM
TABLE t
WHERE (t.k1,t.k2) IN ( SELECT x.k1, x.k2 FROM x );
Don't be afraid of verbose syntax. Concatenation tricks can easily mess up the selectivity estimates or even prevent the database from using indexes.
Here is another syntax that may or may not work in your database.
select *
from table t
where (k1, k2) in(
('apple', 'pie')
,('strawberry', 'shortcake')
,('banana', 'split')
,('raspberry', 'vodka')
,('melon', 'shot')
);
A final comment is that if you find yourself wanting to submit 1000 values as filters you should most likely look for a different approach all together :)
select * from table t
where (t.k1+':'+t.k2)
in ('strawberry:shortcake','apple:pie','banana:split','etc:etc')
This will work in most of the cases as it concatenate and finds in as one column
off-course you need to choose a proper separator which will never come in the value of k1 and k2.
for e.g. if k1 and k2 are of type int you can take any character as separator
SELECT * FROM tableName t
WHERE t.k1=( CASE WHEN t.k2=VALUE THEN someValue
WHEN t.k2=otherVALUE THEN someotherValue END)
- SQL FIDDLE

Need to UPPER SQL statement with INNER JOIN SELECT

I'm using Pervasive SQL 10.3 (let's just call it MS SQL since almost everything is the same regarding syntax) and I have a query to find duplicate customers using their email address as the duplicate key:
SELECT arcus.idcust, arcus.email2
FROM arcus
INNER JOIN (
SELECT arcus.email2, COUNT(*)
FROM arcus WHERE RTRIM(arcus.email2) != ''
GROUP BY arcus.email2 HAVING COUNT(*)>1
) dt
ON arcus.email2=dt.email2
ORDER BY arcus.email2";
My problem is that I need to do a case insensitive search on the email2 field. I'm required to have UPPER() for the conversion of those fields.
I'm a little stuck on how to do an UPPER() in this query. I've tried all sorts of combinations including one that I thought for sure would work:
... ON UPPER(arcus.email2)=UPPER(dt.email2) ...
... but that didn't work. It took it as a valid query, but it ran for so long I eventually gave up and stopped it.
Any idea of how to do the UPPER conversion on the email2 field?
Thanks!
If your database is set up to be case sensitive, then your inner query will have to take account of this to perform the grouping as you intended. If it is not case sensitive, then you won't require UPPER functions.
Assuming your database IS case sensitive, you could try the query below. Maybe this will run faster...
SELECT arcus.idcust, arcus.email2
FROM arcus
INNER JOIN (
SELECT UPPER(arcus.email2) as upperEmail2, COUNT(*)
FROM arcus WHERE RTRIM(arcus.email2) != ''
GROUP BY UPPER(arcus.email2) HAVING COUNT(*)>1
) dt
ON UPPER(arcus.email2) = dt.upperEmail2
Check out this blog post which discusses case insensitive searches in SQL. In essence, the reason why it was so slow was that most likely none of the current table indexes could be used in the query, so the database engine had to perform a full table scan, likely multiple times.
An index on arcus.email2 is completely useless when wanting to compare between the uppercased versions (UPPER(arcus.email2)), because the database engine cannot look up the values in the index (because they're different values!).
To improve the performance, you can create an index specifically on the result of applying UPPER to the field.
CREATE INDEX IX_arcus_UPPER_email2
ON arcus (UPPER(email2));
The collation of a character string will determine how SQL Server compares character strings. If you store your data using a case-insensitive format then when comparing the character string “AAAA” and “aaaa” they will be equal. You can place a collate Latin1_General_CI_AS for your email column in the where clause.
Check the link below for how to implement collation in a sql query.
How to do a case sensitive search in WHERE clause

Get all records that contain a number

It is possible to write a query get all that record from a table where a certain field contains a numeric value?
something like "select street from tbladdress where street like '%0%' or street like '%1%' ect ect"
only then with one function?
Try this
declare #t table(street varchar(50))
insert into #t
select 'this address is 45/5, Some Road' union all
select 'this address is only text'
select street from #t
where street like '%[0-9]%'
street
this address is 45/5, Some Road
Yes, but it will be inefficient, and probably slow, with a wildcard on the leading edge of the pattern
LIKE '%[0-9]%'
Searching for text within a column is horrendously inefficient and does not scale well (per-row functions, as a rule, all have this problem).
What you should be doing is trading disk space (which is cheap) for performance (which is never cheap) by creating a new column, hasNumerics for example, adding an index to it, then using an insert/update trigger to set it based on the data going into the real column.
This means the calculation is done only when the row is created or modified, not every single time you extract the data. Databases are almost always read far more often than they're written and using this solution allows you to amortize the cost of the calculation over many select statement executions.
Then, when you want your data, just use:
select * from mytable where hasNumerics = 1; -- or true or ...
and watch it leave a regular expression query or like '%...%' monstrosity in its dust.
To fetch rows that contain only numbers,use this query
select street
from tbladdress
where upper(street) = lower(street)
Works in oracle .
I found this solution " select street from tbladresse with(nolock) where patindex('%[0-9]%',street) = 1"
it took me 2 mins to search 3 million on an unindexed field