I have a brand_name column in my brand_names table and a product_name column in my product_names table.
At the moment, I have two separate SELECTs (one on brand_names.brand_name and one on product_names.product_name) and I use a UNION to OR the two result sets together. However, when a search is made for "SomeBrandName Some Product Name", even though such a product exists, my SQL returns zero results. This is because the terms (SomeBrandName, Some, Product and Name) don't all appear in brand_names.brand_name and they don't all appear in product_names.product_name.
So I need help working out the SQLite / FTS3 equivalent of something like...
SELECT
(brand_names.brand_name || ' ' || product_names.product_name) AS brand_and_product_name
FROM
brand_names, product_names
WHERE
brand_and_product_name MATCH 'SomeBrandName Some Product Name'
What is the actual SQLite / FTS3 SQL that I need to achieve this?
In terms of research, I have read through the SQLite FTS3 guide, but it doesn't mention multiple tables.
I've also seen a similar question which is a bit more advanced and so may well be overkill for the simple search I am trying to achieve here.
An FTS search can be done only in FTS indexes.
If you want to have a result for "Brand Product", you have to create an FTS table that contains these words in a single row.
(To reduce storage, try using an external content table on a view.)
To do that, you first have to create FTS virtual tables for both of your tables.
Then you can run an indexed (FTS) search on them, using a join just as you would with ordinary tables.
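As a sketch, assuming product_names carries a brand_id foreign key into brand_names and both tables have an id column (the question doesn't show its schema, so these names are guesses), the combined FTS table could be built like this:

```sql
-- Hypothetical schema: brand_names(id, brand_name),
-- product_names(id, brand_id, product_name).
CREATE VIRTUAL TABLE brand_and_product USING fts3(brand_and_product_name);

INSERT INTO brand_and_product(rowid, brand_and_product_name)
SELECT p.id, b.brand_name || ' ' || p.product_name
FROM product_names AS p
JOIN brand_names AS b ON b.id = p.brand_id;

-- All four terms now live in a single row, so this matches:
SELECT rowid
FROM brand_and_product
WHERE brand_and_product_name MATCH 'SomeBrandName Some Product Name';
```

The returned rowid values point back at product_names.id, so you can join back to the source tables to display results.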
Related
I have a question for text analysis and database experts. I would like to match person names from one database table to text articles in another table. For example:
SELECT text FROM article
INNER JOIN person
ON article.text LIKE "%" || person.name || "%"
This method is very slow on any database I tried, like Netezza, Redshift and traditional RDS's like MySQL or SQL server.
What system is best suited for queries like this?
It is slow because you don't have an index, so every query ends up doing multiple full table scans. You can stay with an RDBMS if you like; the only thing you need to create is an index table.
This table looks like this:
word varchar(n),
document_id int
The idea is to create one entry in this table for every word in a document, where document_id points to the row in your source table.
Then you create a database index on the word column, and each lookup costs O(log n) instead of a full table scan.
You can also try IBM Db2 Text Search or similar tools from other vendors, which do essentially the same thing.
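A minimal sketch of that index table (the names are mine, not from the question, and it assumes article has an integer id column):

```sql
CREATE TABLE word_index (
    word        VARCHAR(100) NOT NULL,
    document_id INT          NOT NULL   -- points at article.id
);

-- one row per (word, document) pair, filled by your tokenizer
CREATE INDEX idx_word_index_word ON word_index (word);

-- the slow LIKE join becomes an indexed equality join
SELECT a.text
FROM person AS p
JOIN word_index AS w ON w.word = p.name
JOIN article   AS a ON a.id = w.document_id;
```

Note this only works as-is for single-word names; a multi-word name needs one join (or an intersection) per token.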
I have a situation where I am trying to JOIN two tables based on partially matching text data. I have read the question Using Full-Text Search in SQL Server 2005 across multiple tables, columns and it appears that my best option is to create a VIEW and add a full-text index on the VIEW.
Let me start by giving a little background of the situation. I have an Excel spreadsheet that I need to calculate some pricing for drugs, but the drug names in the spreadsheet do not match exactly to the database where I am pulling the pricing information. So I figured that using full-text search may be the way to go.
What I have done so far, is exported the spreadsheet as a CSV file and used BULK INSERT to import the data into my database. Now, my drug database has a primary key on NDC, but that information is not available on the spreadsheet unfortunately, or my job would be much easier.
I need to basically be able to match 'AMLODIPINE TAB 5MG' and 'AMLODIPINE BESYLATE 5MG TAB'. This is just one example, but the other drugs are similar. My issue is that I'm not even sure how I would be able to create a VIEW in order to add both columns, without them matching.
Is there a way to use a full-text search in a JOIN statement, something like:
SELECT i.Description, m.ProdDescAbbr
FROM dbo.ImportTable i
LEFT JOIN dbo.ManufNames m ON m.ProdDescAbbr <something similar to> i.Description
EDIT:
Not all of the drug names will contain extra words, another example that I am trying to match is: 'ACYCLOVIR TAB 800MG' AND 'ACYCLOVIR 800MG TAB'
At work I came across this (fancy, to me) function CONTAINSTABLE, which uses a full-text index. It may be too complicated for this situation, but I wanted to share it.
Returns a table of zero, one, or more rows for those columns containing precise or fuzzy (less precise) matches to single words and phrases, the proximity of words within a certain distance of one another, or weighted matches
Overall, you will need to prepare the search condition (build it as text) before looking for it.
Example:
SELECT select_list
FROM table AS FT_TBL
INNER JOIN CONTAINSTABLE(table, column, contains_search_condition) AS KEY_TBL
ON FT_TBL.unique_key_column = KEY_TBL.[KEY];
source http://msdn.microsoft.com/en-us/library/ms189760.aspx
You can add a
CREATE VIEW view_name WITH SCHEMABINDING
AS
in front of your SQL to create the view. Then you could
CREATE UNIQUE CLUSTERED INDEX idx_name
ON view_name(Description, ProdDescAbbr)
Then you can
CREATE FULLTEXT INDEX ON view_name (Description, ProdDescAbbr)
KEY INDEX idx_name
That will let you run a search with
WHERE CONTAINS( (Description, ProdDescAbbr), 'search_term')
I have a postgresql view that is comprised as a combination of 3 tables:
create view search_view as
select u.first_name, u.last_name, a.notes, a.summary, a.search_index
from user as u, assessor as a, connector as c
where a.connector_id = c.id and c.user_id = u.id;
However, I need to concatenate tsvector fields from 2 of the 3 tables into a single tsvector field in the view, which provides full-text search across 4 fields: 2 from one table and 2 from another.
I've read the documentation stating that I can use the concatenation operator to combine two tsvector fields, but I'm not certain what this looks like syntactically, and also whether there are potential gotchas with this implementation.
I'm looking for example code that concatenates two tsvector fields from separate tables into a view, and also commentary on whether this is good or bad practice in PostgreSQL land.
I was wondering the same thing. I don't think we are supposed to be combining tsvectors from multiple tables like this. Best solution is to:
create a new tsv column in each of your tables (user, assessor, connector)
update the new tsv column in each table with all of the text you want to search; for example, in the user table you would update the tsv column of all records by concatenating the first_name and last_name columns.
create an index on the new tsv column, this will be faster than indexing on the individual columns
Run your queries as usual, and let Postgres do the "thinking" about which indexes to use. It may or may not use all indexes in queries involving more than one table.
use the ANALYZE and EXPLAIN commands to look at how Postgres is utilizing your new indexes for particular queries, and this will give you insight into speeding things up further.
This will be my approach, at least. I too have been doing lots of reading and have found that people aren't combining data from multiple tables into tsvectors. In fact, I don't think it's possible; it may only be possible to use the columns of the current table when creating a tsvector.
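A sketch of steps 1–3 above for the user table (this assumes PostgreSQL 12+ for generated columns; on older versions you would maintain the column with a trigger instead):

```sql
-- "user" is a reserved word in PostgreSQL, hence the quoting
ALTER TABLE "user"
  ADD COLUMN tsv tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english',
                coalesce(first_name, '') || ' ' || coalesce(last_name, ''))
  ) STORED;

-- GIN index over the stored tsvector
CREATE INDEX user_tsv_idx ON "user" USING gin (tsv);

-- queries of this shape can use the index
SELECT * FROM "user" WHERE tsv @@ to_tsquery('english', 'smith');
```

The explicit 'english' configuration matters: to_tsvector with a single argument is not immutable, so it can't be used in a generated column or expression index.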
Concatenating tsvectors works, but as per the comments, the indexes are probably not used this way (I'm not an expert, so I can't say for certain whether they are or not).
SELECT * FROM newsletters
LEFT JOIN campaigns ON newsletters.campaign_id=campaigns.id
WHERE newsletters.tsv || campaigns.tsv @@ to_tsquery(unaccent(?))
The reason why you'd want this is to search for an AND string like txt1 & txt2 & txt3, which is a very common usage scenario. If you simply split the search into an OR, e.g. WHERE campaigns.tsv @@ to_tsquery(unaccent(?)), this won't work, because it will try to match all 3 tokens within each tsv column, but the tokens could be spread across both columns.
One solution I found is to use triggers to insert and update the tsv column in table1 whenever table2 changes, see: https://dba.stackexchange.com/questions/154011/postgresql-full-text-search-tsv-column-trigger-with-many-to-many but this is not a definitive answer, and using that many triggers is error-prone and hacky.
Official documentation and some tutorials also show concatenating all the wanted columns into a tsvector on the fly, without using a tsv column. But it is unclear how much slower the on-the-fly approach is versus the tsv-column approach; I can't find a single benchmark or explanation about this. The documentation simply states:
Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches. (This is more important when using a GiST index than a GIN index; see Section 12.9.) The expression-index approach is simpler to set up, however, and it requires less disk space since the tsvector representation is not stored explicitly.
All I can tell from this is that tsv columns are probably a waste of resources and just complicate things, but it'd be nice to see some hard numbers. Still, if you can concatenate tsv columns like this, then I guess it's no different from doing it in a WHERE clause.
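For reference, the expression-index ("on the fly") alternative the documentation describes looks like this (the column names here are placeholders, not taken from the question's schema):

```sql
-- no tsv column is stored; the index is built over the expression itself
CREATE INDEX newsletters_fts_idx ON newsletters
  USING gin (to_tsvector('english',
             coalesce(subject, '') || ' ' || coalesce(body, '')));

-- the query must repeat the exact same expression to hit the index
SELECT *
FROM newsletters
WHERE to_tsvector('english',
                  coalesce(subject, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('english', 'txt1 & txt2 & txt3');
```

Note that this indexes the columns of one table; it does not solve the cross-table concatenation problem, which is why the question above remains awkward.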
I have 2 tables, old_test and new_test (a Bible database).
old_test table has 7959 rows
new_test table has 23145 rows
I want to use a LIKE query to search for verses in both tables.
For example:
SELECT *
FROM old_test
where text like "%'+searchword+'%"
union all
SELECT *
FROM new_test
where text like "%'+searchword+'%"
It works, but it takes a lot of time to show the result.
What is the best solution to make this search much faster?
Thanks
Your %searchword% pattern causes a table scan, which gets slower as the number of records increases. Use a searchword% pattern (no leading wildcard) to get a fast, index-based query.
What you need is full-text search, which is not available in websql.
I suggest my own open source library, https://github.com/yathit/ydn-db-fulltext for full-text search implementation. It works with newer IndexedDB API as well.
The main problem with your query is that LIKE with a leading wildcard has to scan entire fields, segment by segment, to find the string; building an index that can be queried instead should alleviate the problem.
Looking at Web SQL it uses the SQLite engine:
User agents must implement the SQL dialect supported by Sqlite 3.6.19.
http://www.w3.org/TR/webdatabase/#parsing-and-processing-sql-statements
Based on that, I would recommend trying to build a full-text index over the table to make these searches run quickly http://www.sqlite.org/fts3.html
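A sketch of what that could look like for the two verse tables (a source column is added so you can tell which testament a hit came from; the table and column names are mine):

```sql
-- one FTS3 table replaces the two LIKE scans + UNION
CREATE VIRTUAL TABLE verse_search USING fts3(source, text);

INSERT INTO verse_search(source, text) SELECT 'old', text FROM old_test;
INSERT INTO verse_search(source, text) SELECT 'new', text FROM new_test;

-- indexed full-text search over both tables' verses at once
SELECT source, text
FROM verse_search
WHERE text MATCH 'searchword';
```

As always with FTS, use MATCH rather than LIKE; a MATCH query uses the full-text index, while LIKE against the virtual table would still scan.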
I built sqlite3 from source to include FTS3 support and then created a new table in an existing SQLite database containing 1.5 million rows of data, using
CREATE VIRTUAL TABLE data USING FTS3(codes text);
Then used
INSERT INTO data(codes) SELECT originalcodes FROM original_data;
Then queried each table with
SELECT * FROM original_data WHERE originalcodes='RH12';
This comes back instantly, as I have an index on that column.
The query on the FTS3 table
SELECT * FROM data WHERE codes='RH12';
Takes almost 28 seconds
Can someone help explain what I have done wrong as I expected this to be significantly quicker
The documentation explains:
FTS tables can be queried efficiently using SELECT statements of two different forms:
Query by rowid. If the WHERE clause of the SELECT statement contains a sub-clause of the form "rowid = ?", where ? is an SQL expression, FTS is able to retrieve the requested row directly using the equivalent of an SQLite INTEGER PRIMARY KEY index.
Full-text query. If the WHERE clause of the SELECT statement contains a sub-clause of the form "<column> MATCH ?", FTS is able to use the built-in full-text index to restrict the search to those documents that match the full-text query string specified as the right-hand operand of the MATCH clause.
If neither of these two query strategies can be used, all queries on FTS tables are implemented using a linear scan of the entire table.
For an efficient query, you should use
SELECT * FROM data WHERE codes MATCH 'RH12'
but this will find all records that contain the search string.
To do 'normal' queries efficiently, you have to keep a copy of the data in a normal table.
(If you want to save space, you can use a contentless or external content table.)
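The external-content variant mentioned above could look like this. Note it requires FTS4 rather than FTS3, since FTS3 does not support the content= option, and it assumes original_data keeps a column named originalcodes as in the question:

```sql
-- the normal table keeps the data; "data" stores only the full-text index
CREATE VIRTUAL TABLE data USING fts4(content="original_data", originalcodes);

-- populate / rebuild the index from the content table
INSERT INTO data(data) VALUES('rebuild');

-- fast full-text lookup via the FTS table
SELECT * FROM data WHERE originalcodes MATCH 'RH12';

-- fast exact lookup stays on the normal, conventionally indexed table
SELECT * FROM original_data WHERE originalcodes = 'RH12';
```

With this layout it is your responsibility to keep the index in sync (triggers or re-running the rebuild) whenever original_data changes.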
You should read the documentation more carefully.
Any query against a virtual FTS table using WHERE col = 'value' will be slow (except for queries against ROWID), but a query using WHERE col MATCH 'value' will use the full-text index and be fast.
I'm not an expert on this, but here are a few things to think about.
Your test is flawed (I think). You are contrasting a scenario where you have an exact, indexed text match (the index can be used on original_data; nothing is going to outperform that) with an equality test on the FTS3 table (I'm not sure FTS3 even comes into play in that type of query). If you want to compare apples to apples, to see the benefit of FTS3, compare a LIKE operation on original_data against the FTS3 MATCH operation on data.