I have a database table that looks like this; the address is a free-text field (I did not design it):
5 records from the Address Table:
1 The street
2 Pine Street,Lincoln,Lincolnshire
77 Drove Way,Grantham
Drove Way Lincoln
Some house on Ambleside
I have an application that has an address field that is free text (again, I did not design it). I want a user to start typing an address into the address field and then see a list of possible matches (hopefully just one). I have thought of a few ways to approach this:
1) Use a LIKE statement e.g. select * FROM dbaddress where address like '%1 The Street%'. This seems like a bad idea.
2) Free text search. I have not used this before.
Which is the "better option" for my requirements? Is there an alternative approach?
I have dealt with something like this before, and free text search was the better option in my case. If you can use LIKE 'abc%', which can use an index, that is also a good option.
For a field like address, a "begins with" search is the better fit.
In the end it depends on your needs. If I were you, I would run both queries and compare their execution plans.
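To make the difference between the two LIKE patterns concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table name dbaddress comes from the question; the column layout and the suggest() helper are my own assumptions, and SQLite's index behaviour differs from SQL Server's, so this only illustrates which rows each pattern returns, not how fast it runs.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbaddress (address TEXT)")
conn.executemany(
    "INSERT INTO dbaddress VALUES (?)",
    [
        ("1 The street",),
        ("2 Pine Street,Lincoln,Lincolnshire",),
        ("77 Drove Way,Grantham",),
        ("Drove Way Lincoln",),
        ("Some house on Ambleside",),
    ],
)


def suggest(typed, anywhere=False):
    """Return addresses matching what the user has typed so far.

    anywhere=False builds 'typed%'  (prefix search, index friendly).
    anywhere=True  builds '%typed%' (match anywhere, forces a scan).
    Note: % and _ inside `typed` are still treated as wildcards here.
    """
    pattern = ("%" if anywhere else "") + typed + "%"
    rows = conn.execute(
        "SELECT address FROM dbaddress WHERE address LIKE ? ORDER BY address",
        (pattern,),
    )
    return [row[0] for row in rows]


print(suggest("Drove Way"))                 # prefix search
print(suggest("Drove Way", anywhere=True))  # match anywhere
```

The prefix search finds only the address that starts with the typed text, while the leading-wildcard version also finds "77 Drove Way,Grantham", at the cost of not being able to seek on an index.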
In SQL Server, I have a table containing 46 million rows.
I want to search the "Title" column of the table. The search word may appear at any position in the field value.
For example:
Value in table: BROTHERS COMPANY
Search string: ROTHER
I want this search to match the given record. This is exactly what LIKE '%ROTHER%' does. However, LIKE '%...%' should not be used on large tables because of performance issues. How can I achieve this?
Though I don't know your requirements, your best approach may be to challenge them. Middle-of-the-string searches are usually not very practical. If you can get your users to perform prefix searches (broth%) then you can easily use Full Text's wildcard search (CONTAINS(*, '"broth*"')). Full Text can also handle suffix searches (%rothers) with a little extra work.
But when it comes to middle-of-the-string searches with SQL Server, you're stuck using LIKE. However you may be able to improve performance of LIKE by using a binary collation as explained in this article. (I hate to post a link without including its content but it is way too long of an article to post here and I don't understand the approach enough to sum it up.)
If that doesn't help and if middle-of-the-string searches are that important of a requirement then you should consider using a different search solution like Lucene.
Add a full-text index if you want.
You can search the table using CONTAINS:
SELECT *
FROM YourTable
WHERE CONTAINS(TableColumnName, 'SearchItem')
I work for an organization that has a serious data quality problem with names. There are fifteen databases that contain information about people. For example:
Database 1
Name=Fre&d Blo-ggs DOB 01/01/1980
Database 2
Name=Freddy Bloggs DOB 01/01/1980
If a user searches for Fred Bloggs using my search tool then I want both records to be found. I was thinking about something like this:
SELECT * FROM Person WHERE Soundex('Fred Bloggs') = Soundex('Fre&d Blo-ggs')
Is it advisable to use Soundex like this rather than using replace statements like this:
select Replace(Replace(Replace(Name,',',''),'&',''),'#','') from Person
where Replace(Replace(Replace(Name,',',''),'&',''),'#','') = @Name
@Name is the variable passed in. Is there a better way of doing it, e.g. using regular expressions? Does Soundex affect performance?
Nice idea. I would not suggest using it, though. I suppose that "John Right" is not the same person as "John Write", even though the names sound the same. In the end, what matters is what you want to compare. If you want to check whether two names sound the same, then SOUNDEX is fine.
However, I would suggest correcting your data somehow. This would be a real solution, although I can imagine that is not an easy one.
Hope I helped!
Whether Soundex is better than regex depends on your data. For example, there are different Soundex versions for different languages. You have to check with your own data which works better.
Of course Soundex affects performance, as does any other additional function you call. If performance becomes a problem, I would advise adding an additional column that holds the precomputed Soundex value or normalized name, and creating an index on it.
From my own experience, a normalized / simplified search criterion, e.g. parts of the surname, first name and month of birth, should be sufficient to find all matching persons but not too many, so the user can decide which person (s)he really wants to choose.
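As a rough illustration of the "additional normalized column plus index" idea, here is a sketch in Python with sqlite3 standing in for the real database. The NameKey column, the index name and both helper functions are hypothetical, and the normalization rule (keep ASCII letters, lower-case, collapse spaces) is only one possible choice; it would drop accented letters, for instance.

```python
import re
import sqlite3


def normalize_name(name):
    """Strip everything except ASCII letters and spaces, then lower-case.

    This rule is an assumption for the sketch, not a recommendation.
    """
    letters_and_spaces = re.sub(r"[^A-Za-z\s]", "", name)
    return " ".join(letters_and_spaces.lower().split())


conn = sqlite3.connect(":memory:")
# NameKey is the extra precomputed column (hypothetical name).
conn.execute("CREATE TABLE Person (Name TEXT, NameKey TEXT)")
conn.execute("CREATE INDEX IX_Person_NameKey ON Person (NameKey)")
for raw in ("Fre&d Blo-ggs", "Freddy Bloggs"):
    conn.execute("INSERT INTO Person VALUES (?, ?)", (raw, normalize_name(raw)))


def find_person(search):
    """Look up by the normalized key, so the index can be used."""
    rows = conn.execute(
        "SELECT Name FROM Person WHERE NameKey = ?", (normalize_name(search),)
    )
    return [row[0] for row in rows]


print(find_person("Fred Bloggs"))
```

Searching for "Fred Bloggs" finds the "Fre&d Blo-ggs" record because both normalize to the same key. It does not find "Freddy Bloggs", which is why a phonetic or fuzzy key may still be needed on top of simple cleaning.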
Soundex won't help you. You will get stuck if a consonant appears in the name by mistake.
It is better to go for string distance and specify a percentage threshold, i.e. a kind of fuzzy matching.
Have a look at the link below for fuzzy matching using the Levenshtein edit distance algorithm.
Levenshtein edit distance - MS SQL SERVER
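For anyone who cannot follow the link, this is roughly what the edit distance approach looks like. It is a plain Python sketch of the standard dynamic-programming Levenshtein algorithm, not the T-SQL from the linked answer; the similarity() helper and the idea of expressing it as a fraction of the longer string are my own assumptions about how to "specify a percentage".

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete from a
                    current[j - 1] + 1,      # insert into a
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity between 0.0 and 1.0, relative to the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("Fre&d Blo-ggs", "Fred Bloggs"))
print(levenshtein("Freddy Bloggs", "Fred Bloggs"))
print(round(similarity("Freddy Bloggs", "Fred Bloggs"), 2))
```

Both example names from the question are two edits away from "Fred Bloggs", so a threshold of, say, 80% similarity would accept both. Where to put the threshold is something you would have to tune against your own data.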
How would I go about splitting one column that holds the last, first and middle name into their own separate columns in a SQL Server 2008 query?
The column is called NAME
NAME(char(25),null)
Mctasrren ,David Max
Cressler ,Patti L
Basil ,Vessen Eddie
Chapplestait ,Victoy
This is what I've used so far. My main issue is the middle name, or whether someone has a better way to shorten the first name code.
--last name code
left([NAME],charindex(' ,',[NAME]))
--first name code
substring([NAME],charindex(',',[NAME])+1,charindex(' ',substring([NAME],charindex(',',[NAME])+1,25-charindex(',',[NAME])+1)))
Do you want to split it into columns as part of a result set, create computed columns on the table, or actually update the schema to have the data split in the source?
In any case the basic nuts and bolts can be done by either:
Use a combination of CHARINDEX, SUBSTRING, and LEFT or RIGHT to find commas or spaces and split around them. If you are sure your data will always be 'L_NAME ,F_NAME M_NAME_OR_INITIAL', that will be pretty easy. I am actually surprised I didn't find a similar question here near the top of a Google search, but there is a similar example from SQLServerCentral.
Use a RegEx via the CLR, which can be more robust if there is any variety in the data. If you are familiar with RegEx this should be a straightforward parse. Again, a simplified example can be found on MSDN.
Whatever you choose, you'll probably quickly run into names that don't easily follow that format. In that case you will want to build more logic into a function to handle different types of names.
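The splitting logic itself is easier to see outside of nested CHARINDEX/SUBSTRING calls. Here is a small Python sketch of the same rule, assuming the 'L_NAME ,F_NAME M_NAME_OR_INITIAL' layout from the question; split_name() is a hypothetical helper, and treating everything after the first name as the middle name is my own simplification.

```python
def split_name(raw):
    """Split 'Last ,First Middle' into (last, first, middle).

    Everything before the first comma is the last name. After the
    comma, the first word is the first name and the rest (possibly
    empty) is the middle name. Padding from char(25) is ignored.
    """
    last, _, rest = raw.partition(",")
    parts = rest.split()
    first = parts[0] if parts else ""
    middle = " ".join(parts[1:])
    return last.strip(), first, middle


for value in (
    "Mctasrren ,David Max",
    "Cressler ,Patti L",
    "Basil ,Vessen Eddie",
    "Chapplestait ,Victoy",
):
    print(split_name(value))
```

A row with no middle name simply yields an empty string for that part, which is the case the nested CHARINDEX expression tends to get wrong.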
Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? For example:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Is there any way to do this programmatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526).
You can also use the DIFFERENCE function (on the same link as SOUNDEX), which compares levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
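If you are curious why both spellings end up as D526, here is a sketch of the classic American Soundex rules in Python. This is my own implementation for illustration, not SQL Server's code, and SQL Server's SOUNDEX may differ in edge cases (for example in how it treats H, W or non-letter characters).

```python
# Consonant groups of classic Soundex; vowels, Y, H and W have no code.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character classic Soundex code for a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = CODES.get(first, "")
    for letter in letters[1:]:
        if letter in "HW":
            continue  # H and W do not separate repeated codes
        code = CODES.get(letter, "")
        if code and code != previous:
            digits.append(code)
        previous = code  # a vowel resets this, so repeats count again
    return (first + "".join(digits) + "000")[:4]


print(soundex("Dinosaurs"), soundex("Dinosores"))
```

Only the first letter and the first three consonant codes survive (N=5, S=2, R=6), so the differing vowels in the middle of the two words never make it into the code.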
Edit:
After hunting around a bit for a multi-word option, it seems that this isn't all that easy. I would refer you to the link in the Fuzzy Logic answer provided by @Neil Knight (+1 to that, from me!).
This Stack Overflow article also details possible sources for implementations of fuzzy logic in T-SQL. One respondent also mentioned full-text indexing as an option that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: there is nothing built into most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that sound alike and thus corrects for phonetic misspellings, and because it ignores vowels it also tolerates slips such as "Dinosars" (missing the U) or "Dinosayrs", which still encode to D526. If the user fat-fingers a consonant, however, as in "Dibosaurs" (D126), SoundEx would not return an exact match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
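To give a feel for the "closest dictionary word" step in the business logic layer, here is a minimal Python sketch. Note that it swaps in the standard library's difflib similarity ratio instead of a true edit distance with transpositions, and the KNOWN_TERMS list, did_you_mean() and correct_query() are all hypothetical names; a real dictionary would be built from your indexed content or search logs.

```python
import difflib

# Hypothetical dictionary of known "good" search terms.
KNOWN_TERMS = ["dinosaurs", "sql", "amazing", "rocks"]


def did_you_mean(term, known=KNOWN_TERMS, cutoff=0.6):
    """Return the closest known term, or None if nothing is close enough."""
    term = term.lower()
    if term in known:
        return term
    matches = difflib.get_close_matches(term, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def correct_query(query):
    """Replace each keyword by its suggestion, keeping unknown words as-is."""
    return [did_you_mean(word) or word for word in query.split()]


print(correct_query("Dinosores wrocks"))
```

The corrected keywords can then be fed into the original LIKE or full-text query. The cutoff of 0.6 is difflib's default and would need tuning; too low and short words start matching almost anything.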
You can also try SUBSTR() to compare only the first 3 or so characters of each name. Below is an example of how that can be achieved:
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3) = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;
I want to show the closest related item for a product. So say I am showing a product and the style number is SG-sfs35s. Is there a way to select whatever product's style number is closest to that?
Thanks.
EDIT: To answer your questions: I definitely want to keep the first 2 letters, as that is the manufacturer code, but for the part after the first dash, just whatever matches closest. So, for example, SG-sfs35s would match SG-shs35s much more than SG-sht64s. I hope this makes sense. Whenever I do LIKE product_style_number it only pulls the exact match.
There normally isn't a simple way to match product codes that are roughly similar.
A more SQL friendly solution is to create a new table that maps each product to all the products it is similar to.
This table would either need to be maintained manually, or a more sophisticated script can be executed periodically to update it.
If your product codes follow a consistent pattern (all the letters are the same for similar products, with only the numbers changing), then you should be able to use a regular expression to match the similar items. There are docs on this here...
It sounds like what you want is Levenshtein distance.
Unfortunately, there isn't a built-in Levenshtein function for MySQL, but some folks have come up with a user-defined function that does it (dead link).
You will probably want to do it as a stored procedure, as I expect that the algorithm may not be trivial.
For example, you may split the term at the -, so you have two parts. You do a LIKE query on each part and use that to make a decision.
In your stored procedure you could just loop, though, replacing the last character with "%" until you get at least one result.
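Here is a sketch of that loop in Python, using an in-memory list instead of a table so it can be run as-is. closest_by_prefix() and the catalogue are hypothetical; str.startswith(prefix) plays the role of LIKE 'prefix%', and min_prefix=3 is my assumption for keeping the manufacturer code and dash ("SG-") fixed.

```python
def closest_by_prefix(code, catalogue, min_prefix=3):
    """Shorten the code one character at a time until some other
    product shares the remaining prefix, then return those products.

    min_prefix stops the loop before the manufacturer part is cut.
    """
    others = [item for item in catalogue if item != code]
    for length in range(len(code), min_prefix - 1, -1):
        prefix = code[:length]  # same as LIKE 'prefix%'
        matches = [item for item in others if item.startswith(prefix)]
        if matches:
            return matches
    return []


catalogue = ["SG-sfs35s", "SG-shs35s", "SG-sht64s", "AB-sfs35s"]
print(closest_by_prefix("SG-sfs35s", catalogue))
```

For the example in the question the loop stops at the prefix "SG-s" and returns both SG-shs35s and SG-sht64s. It cannot tell that the first one is the closer match, since the codes differ early on; for that kind of ranking you would need an edit distance instead.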
Sounds like you need something like Lucene, though I'm not sure if that would be overkill for your situation. It certainly would be able to do text searches and return the most similar results first.
If you need something simpler, I would start by searching with the full product code; then, if that doesn't work, try wildcards or remove some characters until you get a result.
JD Isaacks.
This situation of yours is very simple to solve.
It's not like you need to use artificial intelligence the way Google does.
http://www.w3schools.com/sql/sql_wildcards.asp
Take a look at this manual at w3schools about wildcards to use with your SELECT code.
You will also need to create a new table with 3 columns: LeftCode, RightCode and WildCard.
Example:
Rows on Table:
LeftCode = SG | RightCode = 35s | WildCard = SG-s_s35s
LeftCode = SG | RightCode = 64s | WildCard = SG-s_t64s
SQL Code
If the user typed the code that matches the row1 of the table:
SELECT * FROM PRODUCTS WHERE CODE LIKE "$WildCard";
Where $WildCard is the PHP variable containing the column 3 of the new table.
I hope I helped, even 4 years late...