Suppose someone enters this search (on a form):
Nicole Kidman films
Which SQL can I use to find "the best" results?
I suppose something like this:
SELECT * FROM myTable WHERE ( Field='%Nicole Kidman Films%' OR Field='%Nicole%' OR Field='%Kidman%' OR Field='%Films%' )
My question is: how do I get the most relevant results?
Thank you very much!
Full-Text Search:
SELECT * FROM myTable WHERE MATCH(Field) AGAINST('Nicole Kidman Films')
This query will return rows in order of relevancy, as defined by the full-text search algorithm. It requires a FULLTEXT index on Field.
Note: This is for MySQL; other DBMSs have similar functionality.
What you're looking for is often called a "full text search" or a "natural language search". Unfortunately it's not standard SQL. Here's a tutorial on how to do it in MySQL: http://devzone.zend.com/article/1304
You should be able to find examples for other database engines.
In SQL, the equals sign doesn't support wildcards in it - your query should really be:
SELECT *
FROM myTable
WHERE Field LIKE '%Nicole Kidman Films%'
OR Field LIKE '%Nicole%'
OR Field LIKE '%Kidman%'
OR Field LIKE '%Films%'
But wildcarding the left side won't use an index, if one exists.
A better approach is to use Full Text Searching, which most databases provide natively, though there are also 3rd party engines like Sphinx. Each has its own algorithm to assign a rank/score based on the search criteria, in order to display what the algorithm deems most relevant.
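Where full-text search is not available, a crude relevance score can still be had by counting how many of the search words each row contains. The following is only a sketch of that idea, in Python with an in-memory SQLite database; the helper name search_ranked, the sample rows and the weighting (exact phrase counts for more than any single word) are assumptions for illustration, not part of any answer above.

```python
import sqlite3

def search_ranked(conn, phrase):
    """Return (Field, score) rows, best match first (hypothetical helper)."""
    terms = phrase.split()
    # First pattern is the whole phrase, then one pattern per word.
    patterns = [f"%{phrase}%"] + [f"%{term}%" for term in terms]
    # Assumed weighting: a whole-phrase hit outweighs any single-word hit.
    weights = [len(terms) + 1] + [1] * len(terms)
    # In SQLite a LIKE comparison evaluates to 1 or 0, so the hits can be summed.
    score = " + ".join(f"{weight} * (Field LIKE ?)" for weight in weights)
    sql = (
        f"SELECT Field, score FROM (SELECT Field, {score} AS score FROM myTable) "
        "WHERE score > 0 ORDER BY score DESC, Field"
    )
    return conn.execute(sql, patterns).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (Field TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?)",
    [
        ("Tom Cruise films",),
        ("Films starring Nicole Kidman",),
        ("Nicole Kidman films",),
        ("Kidman biography",),
        ("Dinosaurs",),
    ],
)
results = search_ranked(conn, "Nicole Kidman Films")
```

The search words are bound as parameters rather than pasted into the SQL text. Note that % and _ typed by the user would still act as wildcards here, and every LIKE has a leading wildcard, so no index is used.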
I have a repository of SQL queries and I want to understand which queries use certain tables or fields.
Let's say I want to understand what queries use the email field, how can I write it?
Example SQL query:
select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users
So, to state the problem more accurately: you are sorting through a list of SQL queries (as text), and you need to find the queries that use certain fields, using SQL and RegEx (regular expressions) in PostgreSQL. (Please tag the question so that Stack Overflow indexes it correctly and, more importantly, readers have more context.)
PostgreSQL has regular expression support OOTB (out of the box), so we can skip exploring other ways to do this. (If you are reading this as a Microsoft SQL Server person, I strongly suggest you read this brilliant article on Microsoft's website on defining a table-valued UDF (user-defined function).)
The simplest way I could think of to approach your problem is to throw away what we don't want from the query text first, and then filter what's left.
This way, after throwing away the stuff you don't need, you are left with a set of "tokens" that you can easily filter. I'm putting "token" in quotes since we are not really parsing the SQL language, but if we were, extracting tokens would be the first step. (:
Take this query for example:
With Queries (
Id
, QueryText
) As (
values (1, 'select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users')
)
Select QueryText
, found
From (
Select Id
, QueryText
, regexp_split_to_table (QueryText, '(--[\s\w]+|select|from|as|where|[ \s\n,])') As found
From Queries
) As Result
Where found != ''
And found = 'back_email'
I have mocked up the "query repository" with a WITH statement to keep the example simple.
I have also selected a few words/characters to split QueryText on, like select, where, etc. We don't need these in our 'found' set.
In the end, as you can see above, I simply treated found as what's left and filtered it by the field name you are looking for (assuming you know the field you are looking for).
You could improve upon the RegEx I wrote, or change the method as you wish to make it better, but I think the general concept addresses what you need to achieve. One problem I can see with my solution right off the bat is that you can search for anything really, not just names of the selected fields, which begs the question: why use RegEx and not LIKE? But again, as I mentioned, you can improve upon the RegEx to address specific requirements you may have, and using LIKE might limit you in that direction. (In other words, only you know what's good for you. I can't say that from here.)
You can play with the query online here: db-fiddle query and use https://regex101.com/ for testing your RegEx.
Disclaimer: I'm not a PostgreSQL developer. There must be other, perhaps better, ways of doing this. (:
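If the query texts can be pulled out of the database, the same "match the field name as a whole token" idea can be sketched with word boundaries instead of splitting. This is only an illustration in Python; the function name uses_field and the sample queries are made up, and since nothing here parses SQL, an alias that happens to be called email would still count as a hit.

```python
import re

def uses_field(query_text, field):
    """True if `field` appears as a whole identifier in the query text."""
    # Drop "-- ..." line comments so commented-out names are not counted.
    without_comments = re.sub(r"--[^\n]*", "", query_text)
    # Accept the bare name or a qualified one (users.email), but not a longer
    # identifier that merely contains it (back_email, email_user).
    pattern = rf"(?<![\w.])(?:\w+\.)?{re.escape(field)}(?!\w)"
    return re.search(pattern, without_comments, flags=re.IGNORECASE) is not None

# Hypothetical repository of query texts keyed by id.
queries = {
    1: "select users.email as email_user from users",
    2: "select back_email as wrong_email -- not the email field\nfrom users",
    3: "select email from users",
}
matching_ids = [qid for qid, text in queries.items() if uses_field(text, "email")]
```

Query 2 is skipped because back_email and wrong_email are longer identifiers and the only bare "email" sits inside a comment.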
Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and get back related records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? I.e.:
Dinosores
(Poor sore dino). How can we search while allowing for errors in spelling? We can associate common misspellings we see in searches with the correct spelling, and then search on the original terms + corrected term, but this is time-consuming to maintain.
Is there any way to do this programmatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526).
You can also use the DIFFERENCE function (on the same link as SOUNDEX), which compares levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
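To see why both spellings collapse to D526, and what happens with a multi-word search like "Dinosores wrocks", here is a sketch of the classic Soundex rules in Python. It is an illustration only, not SQL Server's implementation (which may differ in edge cases), and the helper names soundex and matching_words are made up. It assumes non-empty, purely alphabetic words.

```python
import re

def soundex(word):
    """Classic 4-character Soundex code for a non-empty alphabetic word."""
    codes = {}
    groups = ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"]
    for digit, letters in enumerate(groups, start=1):
        for ch in letters:
            codes[ch] = str(digit)
    word = word.lower()
    result = word[0].upper()          # the first letter is kept as-is
    previous = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")      # vowels, y, h and w have no code
        if code and code != previous:
            result += code
        if ch not in "hw":            # h and w do not separate equal codes
            previous = code
    return (result + "000")[:4]       # pad or truncate to letter + 3 digits

def matching_words(search, body):
    """Search words whose Soundex code matches some word in the body."""
    body_codes = {soundex(w) for w in re.findall(r"[A-Za-z]+", body)}
    return [w for w in re.findall(r"[A-Za-z]+", search) if soundex(w) in body_codes]

body = "Hello my name is Tom I like dinosaurs to talk about SQL."
hits = matching_words("Dinosores wrocks", body)
```

Comparing word by word like this finds "Dinosores". "wrocks" would not match "rocks" even if the text contained it, because Soundex always keeps the first letter (W620 versus R200).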
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzy Logic answer provided by @Neil Knight (+1 to that, for me!).
This Stack Overflow article also details possible sources for implementations of Fuzzy Logic in T-SQL. One respondent also outlined Full Text Indexing as an option that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that sound alike and thus corrects for phonetic misspellings. Because it ignores vowels it also happens to tolerate "Dinosars" (missing the final U) and "Dinosayrs", but if the user truly "fat-fingered" a consonant and entered "Dibosaurs", SoundEx would not return a match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
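The "closest dictionary word" step described above can be sketched as follows. This is a minimal illustration in Python, not what any search engine actually runs: it uses the optimal string alignment variant of edit distance (substitutions, additions, deletions and swaps of adjacent letters), and the names edit_distance and did_you_mean, the tiny dictionary and the cutoff of 2 are all assumptions.

```python
def edit_distance(a, b):
    """Fewest substitutions, insertions, deletions and adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                   # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j                   # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # substitution (or match)
            )
            # Two adjacent letters typed in the wrong order count as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

def did_you_mean(word, dictionary, max_distance=2):
    """Closest known word, or None if nothing is within max_distance edits."""
    word = word.lower()
    best = min(dictionary, key=lambda known: edit_distance(word, known.lower()))
    return best if edit_distance(word, best.lower()) <= max_distance else None

known_terms = ["dinosaurs", "rocks", "sql"]   # hypothetical "good" search terms
suggestion = did_you_mean("Dinosayrs", known_terms)
```

This compares the typed word against every dictionary entry, which is fine for a small list of known terms but would need an index or pre-filter for a large vocabulary.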
You can also try SUBSTR() to compare only the first 3 or so characters of each name. Below is an example of how that can be achieved:
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3) = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;
I would like to create a SQL query, which does the following..
- I have a few parameters, for instance "John", "Smith"
- I have an articles table with a content column, which I would like to search
- How can I find the rows in the articles table which contain any one of those values ("John", "Smith")?
I cannot use content LIKE "%john%" or content LIKE "%smith%", as there could be any number of incoming parameters.
Can you guys please tell me a way to do this?
Thanks
Have you considered full-text search?
While HLGEM's solution is ideal, if full-text search is not possible, you could construct a regular expression that you could test only once per row. How exactly you do that depends on the DBMS you're using.
This depends a lot on the DBMS you're using. Generally, if you don't want to use full-text search, you can almost always use regular expressions to achieve this goal. For MySQL see this manual page; they even have an example answering your question.
If full text search is overkill, consider putting the parameters in a table and using LIKE in the JOIN condition, e.g.
SELECT * -- column list in production code
FROM Entities AS E1
INNER JOIN Params AS P1
ON E1.entity_name LIKE '%' + P1.param + '%';
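As a runnable sketch of the parameter-table idea, here is the same join in Python against an in-memory SQLite database. The articles/params table layout and the helper name find_articles are assumptions for illustration. SQLite concatenates with || rather than +, and DISTINCT is added so that a row matching several parameters comes back only once.

```python
import sqlite3

def find_articles(conn, terms):
    """Rows of articles whose content contains any of the given terms."""
    # Load the incoming parameters into a scratch table, one row per term.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS params (param TEXT)")
    conn.execute("DELETE FROM params")
    conn.executemany("INSERT INTO params (param) VALUES (?)", [(t,) for t in terms])
    sql = """
        SELECT DISTINCT a.id, a.content
        FROM articles AS a
        INNER JOIN params AS p
            ON a.content LIKE '%' || p.param || '%'
        ORDER BY a.id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO articles (id, content) VALUES (?, ?)",
    [
        (1, "John went home"),
        (2, "Mr Smith arrived"),
        (3, "Nothing to see here"),
        (4, "john smith wrote this"),
    ],
)
found = find_articles(conn, ["John", "Smith"])
```

The number of parameters no longer affects the shape of the query, only the number of rows in params.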
I was curious, since I read it in a doc. Does writing
select * from CONTACTS where id = '098' and name like 'Tom%';
speed up the query as opposed to
select * from CONTACTS where name like 'Tom%' and id = '098';
The first has an indexed column on the left side. Does it actually speed things up or is it superstition?
Using PHP and MySQL.
Check the query plans with explain. They should be exactly the same.
This is purely superstition. I see no reason that the two queries would differ in speed. If it were an OR query rather than an AND query, however, then I could see that having it on the left might speed things up.
Interesting question; I tried this once. The query plans are the same (using EXPLAIN).
But considering short-circuit evaluation, I was wondering too why there is no difference (or does MySQL fully evaluate boolean expressions?)
You may be misremembering or misreading something else, regarding which side of a string literal the wildcard is on in a LIKE predicate. Putting the wildcard on the right (as in your example) allows the query engine to use any indexes that might exist on the column you are searching (in this case, name). But if you put the wildcard on the left,
select * from CONTACTS where name like '%Tom' and id = '098';
then the engine cannot use any existing index and must do a complete table scan.
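The effect of the wildcard position can be checked by asking the engine for its plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN from Python rather than MySQL's EXPLAIN, so the output format differs from what you would see in MySQL; the table layout, the index name idx_contacts_name and the plan helper are assumptions. The name column is declared COLLATE NOCASE because SQLite only uses an index for its case-insensitive LIKE when the index has that collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id TEXT PRIMARY KEY, name TEXT COLLATE NOCASE, phone TEXT)"
)
conn.execute("CREATE INDEX idx_contacts_name ON contacts (name)")

def plan(sql):
    """Human-readable plan steps; the detail text is the last column of each row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Wildcard on the right: the prefix 'Tom' can be looked up in the index.
prefix_plan = plan("SELECT * FROM contacts WHERE name LIKE 'Tom%'")
# Wildcard on the left: nothing to look up, so the name index is of no help.
suffix_plan = plan("SELECT * FROM contacts WHERE name LIKE '%Tom'")

# The two predicate orders from the question, for comparing their plans.
id_first = plan("SELECT * FROM contacts WHERE id = '098' AND name LIKE 'Tom%'")
name_first = plan("SELECT * FROM contacts WHERE name LIKE 'Tom%' AND id = '098'")
same_plan_either_order = id_first == name_first
```

Printing prefix_plan and suffix_plan shows whether each query searches via idx_contacts_name or scans the table, and same_plan_either_order lets you check the predicate-order question directly.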
I have a query. I am developing a site that has a search engine. The search engine is the main function. It is a business directory site.
At present I have basic search SQL that searches the database using the "LIKE" keyword.
My client has asked me to change the search function so that, instead of using the "LIKE" keyword, it does something along the lines of a "StartsWith" keyword.
To clarify this need here is an example.
If somebody types "plu" for plumbers in the textbox it currently returns, e.g.,
CENTRE STATE PLUMBING & ROOFING
PLUMBING UNLIMITED
The client only wants to return "PLUMBING UNLIMITED", because it starts with "plu" rather than merely containing "plu".
I know this is a weird and maybe silly request; however, does anyone have any example SQL code to point me in the right direction on how to achieve this goal?
Any help would be greatly appreciated, thanks...
How about this:
SELECT * FROM MyTable WHERE MyColumn LIKE 'PLU%'
Please note that the % sign is only on the right side of the string.
This example is for MS SQL.
Instead of:
select * from professions where name like '%plu%'
use a WHERE clause without the leading %:
select * from professions where name like 'plu%'
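Here is a small sketch of the difference, run from Python against an in-memory SQLite table holding the two names from the question. The helper names are made up, and the escaping step is an addition: without it, a user typing % or _ into the textbox would have those characters treated as wildcards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE professions (name TEXT)")
conn.executemany(
    "INSERT INTO professions VALUES (?)",
    [("CENTRE STATE PLUMBING & ROOFING",), ("PLUMBING UNLIMITED",)],
)

def names_like(pattern):
    """Names matching a raw LIKE pattern, sorted."""
    rows = conn.execute(
        "SELECT name FROM professions WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]

def starts_with(prefix):
    """Names beginning with the text the user typed, taken literally."""
    # Escape the escape character first, then the two LIKE wildcards.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return names_like(escaped + "%")

contains = names_like("%plu%")     # old behaviour: "plu" anywhere in the name
begins = starts_with("plu")        # requested behaviour: "plu" at the start
```

SQLite's LIKE ignores ASCII case by default, which is why the lowercase "plu" finds the uppercase names; other engines depend on the column's collation.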
LIKE won't give you the performance you need for a really effective search. Look into something like Lucene or your engine's Full Text Search equivalent.