SQLite FTS3 Query Slower than Standard Table - sql

I built sqlite3 from source to include FTS3 support and then created a new table in an existing SQLite database containing 1.5 million rows of data, using
CREATE VIRTUAL TABLE data USING FTS3(codes text);
Then used
INSERT INTO data(codes) SELECT originalcodes FROM original_data;
Then queried each table with
SELECT * FROM original_data WHERE originalcodes='RH12';
This comes back instantly, as I have an index on that column.
The query on the FTS3 table
SELECT * FROM data WHERE codes='RH12';
takes almost 28 seconds.
Can someone help explain what I have done wrong? I expected this to be significantly quicker.

The documentation explains:
FTS tables can be queried efficiently using SELECT statements of two different forms:
Query by rowid. If the WHERE clause of the SELECT statement contains a sub-clause of the form "rowid = ?", where ? is an SQL expression, FTS is able to retrieve the requested row directly using the equivalent of an SQLite INTEGER PRIMARY KEY index.
Full-text query. If the WHERE clause of the SELECT statement contains a sub-clause of the form "<column> MATCH ?", FTS is able to use the built-in full-text index to restrict the search to those documents that match the full-text query string specified as the right-hand operand of the MATCH clause.
If neither of these two query strategies can be used, all queries on FTS tables are implemented using a linear scan of the entire table.
For an efficient query, you should use
SELECT * FROM data WHERE codes MATCH 'RH12'
but this finds every record that contains the search term as a word, not just records whose column equals it.
To do 'normal' queries efficiently, you have to keep a copy of the data in a normal table.
(If you want to save space, you can use a contentless or external content table; both are FTS4 features.)
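To illustrate the difference between the two query forms, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build with FTS3 enabled, and the sample rows are made up for the demonstration. Both queries return results, but only the MATCH form can use the full-text index, and it returns every row containing the token rather than only rows equal to it.

```python
import sqlite3

# In-memory database; assumes the linked SQLite library was built with FTS3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE data USING fts3(codes TEXT)")
conn.executemany(
    "INSERT INTO data(codes) VALUES (?)",
    [("RH12",), ("RH12 RH13",), ("RH14",)],  # made-up sample rows
)

# Equality on an FTS column: correct result, but a linear scan of the table.
eq_rows = [r[0] for r in conn.execute(
    "SELECT codes FROM data WHERE codes = 'RH12'")]

# MATCH: uses the full-text index and returns rows that contain the token.
match_rows = [r[0] for r in conn.execute(
    "SELECT codes FROM data WHERE codes MATCH 'RH12' ORDER BY rowid")]

print(eq_rows)
print(match_rows)
```

If exact equality is what the application needs, this is why the answer suggests keeping the data in an ordinary indexed table as well.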

You should read the documentation more carefully.
Any query against a virtual FTS table using WHERE col = 'value' will be slow (except for a query by ROWID), but a query using WHERE col MATCH 'value' will use the full-text index and be fast.

I'm not an expert on this, but here are a few things to think about.
Your test is flawed (I think). You are contrasting a scenario where you have an exact text match (the index can be used on original_data - nothing is going to outperform this scenario) with an equality on the fts3 table (I'm not sure that FTS3 would even come into play in this type of query). If you want to compare apples to apples (to see the benefit of FTS3), you're going to want to compare a "like" operation on original_data against the FTS3 "match" operation on data.

Related

Send notifications to user when row with specific keywords is inserted

I am using SQL Server on Azure and would like to let users define keywords so that, when an article is inserted whose title matches one of those keywords, the user receives an alert.
Of course, there could be 100,000 users, each with 100 keywords defined.
Running such a query every time an article is inserted is obviously not feasible.
My idea is to create a job that would run every hour or so but, since for many reasons that also doesn't strike me as ideal, I was wondering if anyone would suggest a better option. Ideally using the azure infrastructure and not only a SQL based solution.
This is a question about how to query a large amount of data efficiently in SQL.
Based on your description, a database index can improve the performance of the query.
We can create an index on the keyword column and use T-SQL like this:
Select count(1) From T Where Keyword = XXX
This way, the database engine will use the index instead of a full table scan.
In Azure SQL Database, we can create an index using T-SQL: CREATE INDEX (Transact-SQL)
We can also use SSMS to create the index. For more information about indexes in Azure SQL Database, refer to: Clustered and Nonclustered Indexes Described
Here are some query optimization tips that I hope will help:
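As a rough sketch of the indexed lookup described above, the example below turns the problem around: instead of testing every user's keywords against the title, it splits the new title into words and looks those words up in an indexed keyword table. It uses Python's sqlite3 module purely as a stand-in for SQL Server, and the table name user_keywords, its columns, and the function users_to_alert are hypothetical names chosen for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_keywords (user_id INTEGER, keyword TEXT)")
conn.execute("CREATE INDEX idx_user_keywords_keyword ON user_keywords(keyword)")
conn.executemany(
    "INSERT INTO user_keywords VALUES (?, ?)",
    [(1, "azure"), (2, "sqlite"), (3, "azure"), (3, "index")],
)

def users_to_alert(title):
    """Return ids of users with at least one keyword among the title's words."""
    words = sorted(set(title.lower().split()))
    if not words:
        return []
    # One indexed equality probe per word in the title, not per stored keyword.
    placeholders = ",".join("?" * len(words))
    sql = (
        "SELECT DISTINCT user_id FROM user_keywords "
        f"WHERE keyword IN ({placeholders}) ORDER BY user_id"
    )
    return [row[0] for row in conn.execute(sql, words)]

alerted = users_to_alert("New Azure index features")
print(alerted)
```

The cost of each insert then depends on the number of words in the title rather than on the total number of stored keywords. Splitting on whitespace is a simplification; real titles would need punctuation handling.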
1. To optimize the query, avoid full table scanning as much as possible, and first consider indexing the columns involved in where and order by.
2. Avoid testing a column for NULL in the WHERE clause where possible. Otherwise, it can cause the engine to abandon the index and scan the whole table, for example:
Select id from t where num is null
You can set a default value of 0 on num so that the num column never contains NULL, and then query:
Select id from t where num=0
3. Try to avoid using the "!=" or "<>" operators in the WHERE clause; otherwise the engine will discard the index and perform a full table scan.
4. Avoid joining conditions with OR in the WHERE clause where possible; otherwise the engine may abandon the index and perform a full table scan, for example:
Select id from t where num=10 or num=20
It can be rewritten like this:
Select id from t where num=10
Union all
Select id from t where num=20
5. IN and NOT IN should also be used with caution; otherwise the whole table will be scanned, for example:
Select id from t where num in (1,2,3)
For continuous values, use between instead of in:
Select id from t where num between 1 and 3
6. The following query will also result in a full table scan:
Select id from t where name like'%abc%'
To improve efficiency, consider full-text search.
You could use Logic Apps for your use-case.
There is a SQL connector in the logic app where you can achieve your requirements easily.
I've built a sample flow, explained below.
Explanation
Create a trigger on your SQL table that fires when an item is inserted into a particular table (Customer_Feedback).
Execute a stored procedure (action) and get back its result. The stored procedure can be a simple SELECT statement that searches for a keyword according to your requirements. Be sure to follow the indexing advice in Lee Liu's answer above.
Add a condition that checks the output of the stored procedure against the corresponding keyword.
If the condition is satisfied, send mail to that user via the send mail task.
You can also adapt this flow to your own needs.

SQLite FTS3 - Full-Text Search over multiple tables

I have a brand_name column in my brand_names table and a product_name column in my product_names table.
At the moment, I have two separate SELECTs (one on brand_names.brand_name and one on product_names.product_name) and I use a UNION to OR the two resultsets together. However, when a search is made for "SomeBrandName Some Product Name", even though such a product exists, my SQL returns zero results (this is because the terms - SomeBrandName Some Product and Name - don't all appear in brand_names.brand_name and they don't all appear in product_names.product_name).
So I need help to work out the SQLite / FTS3 equivalent of something like...
SELECT
(brand_names.brand_name || ' ' || product_names.product_name) AS brand_and_product_name
FROM
brand_names, product_names
WHERE
brand_and_product_name MATCH 'SomeBrandName Some Product Name'
What is the actual SQLite / FTS3 SQL that I need to achieve this?
In terms of research, I have read through the SQLite FTS3 guide but it doesn't mention multiple tables.
I've also seen a similar question which is a bit more advanced and so may well be overkill for the simple search I am trying to achieve here.
An FTS search can be done only in FTS indexes.
If you want to have a result for "Brand Product", you have to create an FTS table that contains these words in a single row.
(To reduce storage, try using an external content table on a view.)
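A minimal sketch of the single-row approach follows, using Python's sqlite3 module and assuming FTS3 is available. The question does not say how products relate to brands, so the id and brand_id columns, the product_search table, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: brand_id links each product to its brand.
conn.executescript("""
    CREATE TABLE brand_names (id INTEGER PRIMARY KEY, brand_name TEXT);
    CREATE TABLE product_names (
        id INTEGER PRIMARY KEY, brand_id INTEGER, product_name TEXT);
    CREATE VIRTUAL TABLE product_search USING fts3(brand_and_product_name);
""")
conn.executemany("INSERT INTO brand_names VALUES (?, ?)",
                 [(1, "SomeBrandName"), (2, "OtherBrand")])
conn.executemany("INSERT INTO product_names VALUES (?, ?, ?)",
                 [(10, 1, "Some Product Name"),
                  (11, 2, "Some Product Name"),
                  (12, 1, "Different Thing")])

# Put brand and product words into one FTS row per product, keyed by product id.
conn.execute("""
    INSERT INTO product_search (docid, brand_and_product_name)
    SELECT p.id, b.brand_name || ' ' || p.product_name
    FROM product_names p JOIN brand_names b ON b.id = p.brand_id
""")

# All terms must appear in the same row (implicit AND in the default syntax).
matches = [row[0] for row in conn.execute(
    "SELECT docid FROM product_search WHERE brand_and_product_name MATCH ? "
    "ORDER BY docid",
    ("SomeBrandName Some Product Name",))]
print(matches)
```

The combined table has to be kept in sync when brands or products change, which is the price of this approach.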
For that, you first have to create FTS virtual tables for both of your tables.
Then you can run an indexed (FTS) search on them, joining them the same way you would join ordinary tables.

The best way to use a LIKE query in Web SQL

I have two tables, old_test and new_test, in a Bible database.
old_test table has 7959 rows
new_test table has 23145 rows
I want to use a LIKE query to search for verses in both tables.
For example:
SELECT *
FROM old_test
where text like "%'+searchword+'%"
union all
SELECT *
FROM new_test
where text like "%'+searchword+'%"
It works, but it takes a long time to show the result.
What is the best solution to search much faster on above condition?
Thanks
Your %searchword% query causes a table scan, so it will get slower as the number of records increases. Use a searchword% query to get a fast, index-based query.
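The difference can be checked with EXPLAIN QUERY PLAN. The sketch below uses Python's sqlite3 module rather than Web SQL itself, and it assumes the text column is declared COLLATE NOCASE, because with default settings SQLite only applies its LIKE prefix optimization to an index that uses the NOCASE collation. The index name and the helper plan_for are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE collation on the column lets the default LIKE use the index for prefixes.
conn.execute(
    "CREATE TABLE old_test (id INTEGER PRIMARY KEY, text TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX idx_old_test_text ON old_test(text)")

def plan_for(sql):
    """Return the query plan details for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # detail is the last column

contains_plan = plan_for("SELECT * FROM old_test WHERE text LIKE '%light%'")
prefix_plan = plan_for("SELECT * FROM old_test WHERE text LIKE 'light%'")

print(contains_plan)  # expected to report a SCAN of the table
print(prefix_plan)    # expected to report a SEARCH using the index
```

A prefix query only helps when the search word is known to start the column value, which is rarely true for verse text, so this mostly confirms why the leading wildcard is slow.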
What you need is full-text search, which is not available in websql.
I suggest my own open source library, https://github.com/yathit/ydn-db-fulltext for full-text search implementation. It works with newer IndexedDB API as well.
The main problem with your query is that LIKE has to search every field, segment by segment, to find the string. Building an index that can be queried instead should alleviate the problem.
Looking at Web SQL it uses the SQLite engine:
User agents must implement the SQL dialect supported by Sqlite 3.6.19.
http://www.w3.org/TR/webdatabase/#parsing-and-processing-sql-statements
Based on that, I would recommend trying to build a full-text index over the table to make these searches run quickly http://www.sqlite.org/fts3.html

How should I set up a temp table used to search for matches in a larger table?

Table A has millions of rows of indexed phrases (1-5 words). I'm looking for matches to about 20-30 phrases, e.g., ('bird', 'cat', 'cow', 'purple rain', etc.). I know that the IN operator is generally a bad idea when the search set is large - so the solution is to create a temp table (in memory) and JOIN it against the table I'm looking for.
I can create a TEMP TABLE B using my search phrases, and I know that if I do the join, the SQL engine will work against the Table A indices. Does it make any difference at all to index TEMP TABLE B phrases?
Edit... I just realized you're asking about SQLite. I'd say the same principle of keeping the very small joined table in cache would still apply though.
When joining tables, SQL Server will put the relevant contents of one table in cache, if possible. Your 20 to 30 phrases will certainly fit in cache, so there would really be no point in indexing them. Indexing is useful for looking up values, but SQL Server will already have these values in cache. Also, since SQL Server reads data a page at a time (a page is 8K), it will be able to read that entire table in one read.
When you make your temp table, make sure to use the same datatype so SQL Server doesn't have to convert values to match.
Why would IN be a bad idea when the search terms are many?
From what I understand when I read about the SQLite query planner, a list of IN(1,2,3,4,5,6,N) would generate the same query plan as a join against a temporary table with the same rows.
An index on a temporary search term table will not make the query any faster since you process all terms. Going via index only adds processing time.
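As a small sketch of the two approaches being compared, the example below runs the same lookup once with an IN list and once with a join against an unindexed temp table, using Python's sqlite3 module. The table and column names (a, b, phrase) and the sample phrases are placeholders; it only shows that the results are identical, not which form is faster on a given data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (phrase TEXT)")
conn.execute("CREATE INDEX idx_a_phrase ON a(phrase)")
conn.executemany(
    "INSERT INTO a VALUES (?)",
    [("bird",), ("cat",), ("dog",), ("purple rain",), ("red car",)],
)

search_terms = ["bird", "cat", "cow", "purple rain"]

# Variant 1: IN list with one bound parameter per search term.
placeholders = ",".join("?" * len(search_terms))
in_rows = sorted(row[0] for row in conn.execute(
    f"SELECT phrase FROM a WHERE phrase IN ({placeholders})", search_terms))

# Variant 2: unindexed temp table joined against the indexed big table.
conn.execute("CREATE TEMP TABLE b (phrase TEXT)")
conn.executemany("INSERT INTO b VALUES (?)", [(t,) for t in search_terms])
join_rows = sorted(row[0] for row in conn.execute(
    "SELECT a.phrase FROM a JOIN b ON a.phrase = b.phrase"))

print(in_rows)
print(join_rows)
```

Prefixing either query with EXPLAIN QUERY PLAN would show how the planner treats each form on a particular SQLite version.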

What is a good way to optimize an Oracle query looking for a substring match?

I have a column in a non-partitioned Oracle table defined as VARCHAR2(50); the column has a standard b-tree index. I was wondering if there is an optimal way to query this column to determine whether it contains a given value. Here is the current query:
SELECT * FROM my_table m WHERE m.my_column LIKE '%'||v_value||'%';
I looked at Oracle Text, but that seems like overkill for such a small column. However, there are millions of records in this table so looking for substring matches is taking more time than I'd like. Is there a better way?
No.
That query is a table scan. If v_value is an actual word, then you may very well want to look at Oracle Text or a simple inverted index scheme you roll on your own. But as is, it's horrible.
Oracle Text covers a number of different approaches, not all of them heavyweight. As your column is quite small you could index it with a CTXCAT index.
SELECT * FROM my_table m
WHERE catsearch(m.my_column, v_value, null) > 0
/
Unlike the other type of Text index, CTXCAT indexes are transactional, so they do not require synchronisation. Such indexes consume a lot of space, but you have to pay some price for improved performance.
You have three choices:
live with it;
use something like Oracle Text for full-text searching; or
redefine the problem so you can implement a faster solution.
The simplest way to redefine the problem is to say the column has to start with the search term (so lose the first %), which will then use the index.
An alternative way is to say that the search starts on word boundaries (so "est" will match "estimate" but not "test"). MySQL (MyISAM) and SQL Server have functions that will do matching like this. Not sure if Oracle does. If it doesn't you could create a lookup table of words to search instead of the column itself and you could populate that table on a trigger.
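The word lookup table idea can be sketched as follows. This is not Oracle code: it uses Python's sqlite3 module as a stand-in, populates the lookup table from application code instead of a trigger, and the names my_table_words, insert_row, and search_word_prefix are hypothetical. The prefix test is written as a range comparison so that an ordinary index on the word column can serve it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, my_column TEXT)")
# Hypothetical lookup table: one row per distinct word per source row.
conn.execute("CREATE TABLE my_table_words (word TEXT, row_id INTEGER)")
conn.execute("CREATE INDEX idx_my_table_words ON my_table_words(word)")

def insert_row(row_id, text):
    """Insert a row and index its lowercase words (a trigger would do this in Oracle)."""
    conn.execute("INSERT INTO my_table VALUES (?, ?)", (row_id, text))
    conn.executemany(
        "INSERT INTO my_table_words VALUES (?, ?)",
        [(word, row_id) for word in set(text.lower().split())],
    )

def search_word_prefix(prefix):
    """Return column values containing a word that starts with the prefix."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # Smallest string greater than every string starting with the prefix.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    rows = conn.execute(
        "SELECT DISTINCT m.id, m.my_column FROM my_table_words w "
        "JOIN my_table m ON m.id = w.row_id "
        "WHERE w.word >= ? AND w.word < ? ORDER BY m.id",
        (prefix, upper),
    )
    return [text for _, text in rows]

insert_row(1, "cost estimate")
insert_row(2, "unit test")
insert_row(3, "established rules")
found = search_word_prefix("est")
print(found)
```

With this data, "est" finds the rows containing "estimate" and "established" but not the one containing "test", which is the word-boundary behaviour described above.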
You could put a function-based index on the column, using the REGEXP_LIKE function. You might need to create the function-based index with a CASE expression that returns 1 on a match, as boolean-returning functions don't seem to be valid in function-based indexes.
Here is an example.
Create the index:
CREATE INDEX regexp_like_on_myCol ON my_table (
CASE WHEN REGEXP_LIKE(my_column, '[static exp]', 'i')
THEN 1
END);
And then to use it, instead of:
SELECT * FROM my_table m WHERE m.my_column LIKE '%'||v_value||'%';
you will need to perform a query like the following:
SELECT * FROM my_table m WHERE (
CASE WHEN REGEXP_LIKE(m.my_column, '[static exp]', 'i')
THEN 1
END) IS NOT NULL;
A significant shortcoming of this approach is that you need to know your '[static exp]' at the time you create the index. If you are looking for a performance increase for ad hoc queries, this might not be the solution for you.
A bonus though, as the function name indicates, is that you have the opportunity to create this index using regex, which could be a powerful tool in the end. The evaluation hit will be taken when items are added to the table, not during the search.
You could try INSTR:
...WHERE INSTR(m.my_column, v_value) > 0
I don't have access to Oracle to test & find out if it is faster than LIKE with wildcarding.
For the most generic case, where you do not know in advance the string you are searching for, the best access path you can hope for is a fast full index scan. You'd have to focus on keeping the index as small as possible, which might have its own problems of course, and you could look at a compressed index if the data is not very high cardinality.