I am not a SQL expert. I'm trying to elegantly solve a query problem that others must have run into, but surprisingly, Google is not returning anything helpful. Basically, my application has a "search" box that lets a user search for customers in the system. I have a table called "Customer" in my SQL Server 2008 database. This table is defined as follows:
Customer
UserName (nvarchar)
FirstName (nvarchar)
LastName (nvarchar)
As you can imagine, my users will enter queries in varying cases and will probably misspell customers' names regularly. How do I query my customer table and return the 25 results that are closest to their query? I have no idea how to do this ranking while taking all three fields in my table into account.
Thank you!
I would suggest full-text search. Full-text search will provide plenty of options for dealing with some name variants and can rank the "closeness" of the results using CONTAINSTABLE. If you find that full-text search is not sufficient, you might consider a third-party indexing tool like Lucene.
You might want to try using SOUNDEX or DIFFERENCE as an alternative to full text search.
SELECT TOP 25 UserName, FirstName, LastName
FROM Customer
WHERE DIFFERENCE( UserName, @SearchValue ) > 2
ORDER BY DIFFERENCE( UserName, @SearchValue ) DESC, UserName
The case issue is easy to solve by giving the relevant columns a case-insensitive collation.
I'm not sure how to handle the misspellings, but have a look at the full-text search capabilities of SQL Server.
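To make the "closest 25" idea concrete outside of SQL Server, here is a rough Python sketch. It does not use SOUNDEX, DIFFERENCE, or full-text search; it swaps in `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure. The sample rows and the `rank_customers` helper are made up for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical sample rows: (UserName, FirstName, LastName)
CUSTOMERS = [
    ("jsmith", "John", "Smith"),
    ("mjones", "Mary", "Jones"),
    ("jsmythe", "Jon", "Smythe"),
    ("bwayne", "Bruce", "Wayne"),
]

def similarity(a, b):
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def rank_customers(query, rows, limit=25):
    """Score each row by its best-matching field and return the top `limit`."""
    scored = [(max(similarity(query, field) for field in row), row) for row in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:limit]]

results = rank_customers("SMITH", CUSTOMERS)
```

Taking the best score across the three fields is only one possible design; summing or weighting the per-field scores would rank rows differently.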
I have a repository of SQL queries and I want to understand which queries use certain tables or fields.
Let's say I want to understand what queries use the email field, how can I write it?
Example SQL query:
select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users
So to state the problem more accurately, you are sorting through a list of SQL queries [as text], and you now need to find the queries that use certain fields using SQL & RegEx (Regular Expressions) in PostgreSQL. (please tag the question so that StackOverflow indexes your question correctly, more importantly, readers have more context about the question)
PostgreSQL has regular expression support out of the box (OOTB), so we can skip exploring other ways to do this. (If you are reading this as a Microsoft SQL Server person, I strongly suggest reading this brilliant article on Microsoft's website on defining a table-valued UDF (user-defined function).)
The simplest way I could think of to approach your problem, is to throw away what we don't want out of the query text first, and then filter out what's left.
This way, after throwing away the stuff you don't need, you will be left with a set of "tokens" that you can easily filter, and I'm putting token in quotes since we are not really parsing the SQL language, but if we did that would be the first step: to extract tokens.. (:
Take this query for example:
With Queries (
Id
, QueryText
) As (
values (1, 'select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users')
)
Select QueryText
, found
From (
Select Id
, QueryText
, regexp_split_to_table (QueryText, '(--[\s\w]+|select|from|as|where|[ \s\n,])') As found
From Queries
) As Result
Where found != ''
And found = 'back_email'
I have sourced the concept of a "query repository" with a WITH statement for ease of doing the pseudo-code.
I have also selected a few words/characters to split QueryText on, such as select and where. We don't need these in our 'found' set.
And in the end, as you can see above, I simply used found as what's left and filtered it with the field name you are looking for. (Assuming that you know the field you are looking for)
You could improve upon the RegEx I did, or change the method as you wish to make it better. But I think the general concept addresses what you need to achieve. One problem I can see with my solution right off the bat is the fact that you can search for anything really, not just names of the selected fields - which begs the question, why use RegEx, and not Like statements? But again, as I mentioned, you can improve upon the RegEx and address specific requirements you may have. Using Like might limit you in that direction. (In other words, only you know what's good for you. I can't say that from here.)
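For readers who want to experiment with the split-and-filter idea outside the database, here is a small Python sketch of the same approach. It is not a SQL parser. The `tokens` and `uses_field` helpers are hypothetical names, and the pattern differs from the one above in two assumed ways: comments are stripped to the end of the line first, and keywords are wrapped in `\b` so that `as` does not split words that merely contain it.

```python
import re

QUERY_TEXT = """select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users"""

# Strip "--" comments to end of line, then split on keywords and separators.
# \b keeps keywords such as "as" from splitting words that merely contain them.
COMMENT = re.compile(r"--[^\n]*")
SEPARATORS = re.compile(r"\b(?:select|from|as|where)\b|[\s,]+", re.IGNORECASE)

def tokens(query_text):
    """Return the non-empty pieces left after removing comments and keywords."""
    without_comments = COMMENT.sub("", query_text)
    return [t for t in SEPARATORS.split(without_comments) if t]

def uses_field(query_text, field):
    """True if `field` appears bare or as the column part of table.column."""
    return any(t == field or t.endswith("." + field) for t in tokens(query_text))

found = tokens(QUERY_TEXT)
```

As with the SQL version, aliases and table names end up in the token list too, so this only narrows the candidates; it cannot tell a column from an alias.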
You can play with the query online here: db-fiddle query and use https://regex101.com/ for testing your RegEx.
Disclaimer: I'm not a PostgreSQL developer. There must be other, perhaps better, ways of doing this. (:
I'm trying to convert some searching C# code over to an old SQL project but I don't know how to:
split strings in SQL
perform a "WHERE LIKE IN (@subTerms)"
Edit: I'm using MS SQL Server 2012.
Sorry I thought my question had enough information, the corresponding SQL table would look like:
TableName: Products
Columns: ID (int), Name (varchar)
Data:
MyProduct
My Stuff
Super Product
My SQL would look something like:
DECLARE @term varchar;
SET @term = "My Product"
DECLARE @subTerms varchar/array;
SET @subTerms = ??? (Split Term by ' ')
SELECT *
FROM Products
WHERE Name LIKE IN (@subTerms)
The term I have provided should pull all of the records available in the database.
Hard to say exactly, based on what you've written, but it should be something like this:
SELECT * FROM Items WHERE Name in ('MyProduct', 'My Stuff', 'Super Product')
By the way, I've found LINQPad to be very useful in situations like this: you can point it at a database, write a LINQ query, and then click the "SQL" tab to see what SQL was produced for that query. The SQL tends to be overly verbose, but you can usually get a pretty good idea of what it's doing and come up with a simplified version yourself.
Update
Based on the updated question, it's more clear what you want to do. I would recommend that you create a full-text search capable field, and use FREETEXT to query it. This is exactly the sort of thing that full-text search was made for.
Any text, including words, phrases or sentences, can be entered. Matches are generated if any term or the forms of any term is found in the full-text index.
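Full-text search is the recommendation above. For comparison, the literal "split the term and LIKE each part" idea from the question can be sketched as follows. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for SQL Server, `search_products` is a made-up helper, and `%` or `_` characters typed by the user are not escaped.

```python
import sqlite3

def search_products(conn, term):
    """Return rows whose Name contains any whitespace-separated part of `term`."""
    sub_terms = term.split()
    if not sub_terms:
        return []
    # One "Name LIKE ?" per sub-term, combined with OR; values are bound as
    # parameters rather than concatenated into the SQL text.
    where = " OR ".join("Name LIKE ?" for _ in sub_terms)
    params = [f"%{t}%" for t in sub_terms]
    sql = f"SELECT ID, Name FROM Products WHERE {where} ORDER BY ID"
    return conn.execute(sql, params).fetchall()

# In-memory SQLite stands in for SQL Server here (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Products (Name) VALUES (?)",
    [("MyProduct",), ("My Stuff",), ("Super Product",)],
)
rows = search_products(conn, "My Product")
```

With the sample data from the question, "My Product" splits into "My" and "Product", and every row contains at least one of them, which matches the expectation that the term pulls all records.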
I am trying to generate a query where I want to select columns(text) matching multiple values.
E.g., I have two columns, id and description. Suppose the first row's description is
Google is website and an awesome search engine
and the second row's description is
Amazon website is an awesome eCommerce store
I have created a query
Select * from table_name where description REGEXP 'Website | Search'
It returns both rows, i.e. both Google and Amazon, but I want it to return only Google, since I want only the rows that contain both words, website and search. The number of words to be matched is also not fixed; basically, the query I am creating is for a search drop-down.
All the words that are passed should be present in the column; their order in the column is not important. If there are better options besides regex, please do point them out.
Edit: the number of words passed is dynamic and not known in advance; the user may pass additional words to be matched against the column. I will be using the query within a stored procedure.
I really don't think the regex solution is going to be good for you from a performance point of view. I think you should be looking at full-text search.
Specifically you need to create a full text index with something like this in the table definition:
create table testTable
(
Id int auto_increment not null primary key, -- auto_increment columns must be a key
TextCol varchar(500),
fulltext(TextCol)
);
Then your query gets easier:
select * from testTable where Match(TextCol) against ('web')
AND Match(TextCol) against ('server')
I strongly suggest you read the MySQL docs on FULLTEXT matching; there are lots of little tricks and features that will be useful in this task (including more efficient ways to run the query above).
Edit: Perhaps Boolean mode will get you to an easy solution like this:
Match(TextCol) against ('+web +server' in boolean mode)
All you have to do is build the against string, so I think this can be done in an SP without dynamic SQL.
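Building that string can also be done on the application side. MySQL's boolean mode marks a required word with a leading `+`, so a minimal sketch (with a hypothetical `boolean_mode_query` helper, and assuming the words arrive as a list from the search box) might look like this:

```python
import re

def boolean_mode_query(words):
    """Build an AGAINST string that requires every word, e.g. '+web +server'."""
    cleaned = []
    for word in words:
        # Keep only letters, digits and underscores so user input cannot
        # inject boolean-mode operators such as + - * " ( ) < > ~ @.
        token = re.sub(r"\W+", "", word)
        if token:
            cleaned.append("+" + token)
    return " ".join(cleaned)

against = boolean_mode_query(["website", "search"])
# Pass `against` as a bound parameter, for example (placeholder style
# depends on the driver and is an assumption here):
#   SELECT * FROM testTable
#   WHERE MATCH(TextCol) AGAINST (%s IN BOOLEAN MODE)
```

Stripping non-word characters is a deliberately blunt choice; it keeps the sketch safe but also drops hyphens and apostrophes that a real search box may want to preserve.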
Suppose someone enters this search (on a form):
Nicole Kidman films
Which SQL can I use to find "the best" results?
I suppose something like this:
SELECT * FROM myTable WHERE ( Field='%Nicole Kidman Films%' OR Field='%Nicole%' OR Field='%Kidman%' OR Field='%Films%' )
My question is how to get the most relevant results.
Thank you very much!
Full-Text Search:
SELECT * FROM myTable WHERE MATCH(Field) AGAINST('Nicole Kidman Films')
This query will return rows in order of relevancy, as defined by the full-text search algorithm.
Note: This is for MySQL, other DBMS have similar functionality.
What you're looking for is often called a "full text search" or a "natural language search". Unfortunately it's not standard SQL. Here's a tutorial on how to do it in mysql: http://devzone.zend.com/article/1304
You should be able to find examples for other database engines.
In SQL, the equals sign doesn't support wildcards in it - your query should really be:
SELECT *
FROM myTable
WHERE Field LIKE '%Nicole Kidman Films%'
OR Field LIKE '%Nicole%'
OR Field LIKE '%Kidman%'
OR Field LIKE '%Films%'
But wildcarding the left side won't use an index, if one exists.
A better approach is to use full-text searching, which most databases provide natively, though there are also third-party engines like Sphinx. Each has its own algorithm for assigning a rank/score based on the search criteria, in order to display what the algorithm deems most relevant.
I've been asked to put together a search for one of our databases.
The requirement is that the user types into a search box, and SQL then needs to split up all the words in the search and look for each of them across multiple fields (probably 2 or 3). It then needs to weight the results: for example, a result where all the words appear will be the top result, and one where only 1 word appears will be weighted lower.
For example if you search for "This is a demo post"
The results would be ranked like this
Rank Field1 Field2
1: "This is a demo post" ""
2: "demo post" ""
3: "demo" "post"
4: "post" ""
Hope that makes some sort of sense; it's kind of a basic Google-like search.
Any way I can think of doing this is very messy.
Any suggestions would be great.
"Google-like search" means: fulltext search. Check it out!
Understanding fulltext indexing on SQL Server
Understanding SQL Server full-text indexing
Getting started with SQL Server 2005 fulltext searching
SQL Server fulltext search: language features
With SQL Server 2008, it's totally integrated into the SQL Server engine.
Before that, it was a bit of a quirky add-on. Another good reason to upgrade to SQL Server 2008! (and the SP1 is out already, too!)
Marc
Logically you can do this reasonably easily, although it may get hard to optimise - especially if someone uses a particularly long phrase.
Here's a basic example based on a table I have to hand...
SELECT TOP 100 Score, Forename FROM
(
SELECT
CASE
WHEN Forename LIKE '%Kerry James%' THEN 100
WHEN Forename LIKE '%Kerry%' AND Forename LIKE '%James%' THEN 75
WHEN Forename LIKE '%Kerry%' THEN 50
WHEN Forename LIKE '%James%' THEN 50
END AS Score,
Forename
FROM
tblPerson
) [Query]
WHERE
Score > 0
ORDER BY
Score DESC
In this example, I'm saying that an exact match is worth 100, a match with both terms (but not together) is worth 75 and a match of a single word is worth 50. You can make this as complicated as you wish and even include SOUNDEX matches too - but this is a simple example to point you in the right direction.
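The same tiering can be prototyped outside the database to try out different weights. The sketch below mirrors the 100 / 75 / 50 scheme in plain Python; the `score` and `search` helpers and the sample names are made up, and substring checks stand in for `LIKE '%...%'`.

```python
def score(text, phrase):
    """Score `text` against `phrase` using the 100 / 75 / 50 tiers above."""
    haystack = text.lower()
    needle = phrase.lower()
    words = needle.split()
    if not words:
        return 0
    if needle in haystack:
        return 100                      # whole phrase appears as-is
    hits = sum(1 for w in words if w in haystack)
    if hits == len(words):
        return 75                       # every word appears, not adjacent
    if hits > 0:
        return 50                       # at least one word appears
    return 0

def search(rows, phrase, limit=100):
    """Return (score, row) pairs with score > 0, best first."""
    scored = [(score(r, phrase), r) for r in rows]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches[:limit]

names = ["Kerry James", "James Kerry", "Kerry", "Alice"]  # made-up data
ranked = search(names, "Kerry James")
```

Because the words come from splitting the phrase, this version handles any number of search terms without adding a new `WHEN` branch per word.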
I ended up creating a full text index on the table and joining my search results to FREETEXTTABLE, allowing me to see the ranked value of each result
The SQL ended up looking something like this
SELECT
Msgs.RecordId,
Msgs.Title,
Msgs.Body
FROM
[Messages] AS Msgs
INNER JOIN FREETEXTTABLE([Messages],Title,@SearchText) AS TitleRanks ON Msgs.RecordId = TitleRanks.[Key]
ORDER BY
TitleRanks.[Rank] DESC
I've used full-text indexes in the past but never realised you could use FREETEXTTABLE like that. I was very impressed with how easy it was to code and how well it works.