How to search in SQL when words contain errors

I would like to execute a SQL command. However, the keywords may contain errors. For example, the correct command should be
select id from my_table where name = 'Tommy'
It would return 1.
However, if someone executes the following incorrect command:
select id from my_table where name = 'Tomyy'
How can I change the command so that it still returns 1?
Thanks a lot.

There are many ways to tackle this, but please keep in mind this isn't the easiest of tasks. What you're looking for is a fuzzy search algorithm.
This should get you started: Fuzzy searches in SQL Server (Redgate)
Code project also has some interesting options here: Implementing phonetic name searches
If you're looking for an easier but more barebones solution, look into SOUNDEX or DIFFERENCE (assuming your DBMS is SQL Server). I've been playing a bit with DIFFERENCE and it's pretty cool what it can do out of the box.

Try this
select id from my_table where SOUNDEX(name) = SOUNDEX('Tomyy')
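To see why the SOUNDEX comparison above matches 'Tomyy' to 'Tommy', here is a minimal Python sketch of the classic American Soundex scheme. The helper names `soundex` and `find_ids` and the sample rows are made up for illustration, and a given database's SOUNDEX may differ in edge cases.

```python
# Letter-to-digit table of the classic American Soundex scheme.
_CODES = {}
for _digit, _letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for _letter in _letters:
        _CODES[_letter] = str(_digit)


def soundex(word):
    """Return a four-character Soundex code (hypothetical helper)."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        # Append a digit only when it differs from the previous letter's digit.
        if digit and digit != previous:
            code += digit
        # H and W do not separate two consonants with the same code;
        # vowels (and Y) do, so they reset the previous digit.
        if ch not in "HW":
            previous = digit
    # Pad with zeros and keep the first letter plus three digits.
    return (code + "000")[:4]


def find_ids(rows, name):
    """Return ids whose name has the same Soundex code as the search name."""
    wanted = soundex(name)
    return [row_id for row_id, row_name in rows if soundex(row_name) == wanted]


# Sample data standing in for my_table (assumption, not from the question).
rows = [(1, "Tommy"), (2, "Alice")]
print(find_ids(rows, "Tomyy"))
```

Both 'Tommy' and 'Tomyy' reduce to the same code because vowels and Y are skipped and the repeated M collapses into a single digit.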

Related

Regexp search SQL query fields

I have a repository of SQL queries and I want to understand which queries use certain tables or fields.
Let's say I want to understand what queries use the email field, how can I write it?
Example SQL query:
select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users
So to state the problem more accurately: you are sorting through a list of SQL queries stored as text, and you need to find the queries that use certain fields, using SQL and regular expressions (RegEx) in PostgreSQL. (Please tag the question so that Stack Overflow indexes it correctly and, more importantly, readers have more context.)
PostgreSQL supports regular expressions out of the box, so we can skip exploring other ways to do this. (If you are reading this as a Microsoft SQL Server person, I strongly suggest reading this brilliant article on Microsoft's website on defining a table-valued UDF (user-defined function).)
The simplest approach I could think of is to throw away what we don't want from the query text first, and then filter what's left.
After throwing away the parts you don't need, you are left with a set of "tokens" that you can easily filter. I'm putting "token" in quotes because we are not really parsing the SQL language, but if we were, extracting tokens would be the first step. (:
Take this query for example:
With Queries (
Id
, QueryText
) As (
values (1, 'select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users')
)
Select QueryText
, found
From (
Select Id
, QueryText
, regexp_split_to_table (QueryText, '(--[\s\w]+|select|from|as|where|[ \s\n,])') As found
From Queries
) As Result
Where found != ''
And found = 'back_email'
I have simulated the "query repository" with a WITH clause (a common table expression) to keep the example self-contained.
I have also selected a few words and characters to split QueryText on, such as select and where. We don't need these in our found set.
Finally, as you can see above, I treat found as what's left and filter it by the field name you are looking for (assuming you know which field that is).
You could improve upon my RegEx, or change the method as you wish to make it better, but I think the general concept addresses what you need to achieve. One problem I can see with my solution right off the bat is that you can search for anything really, not just the names of the selected fields, which raises the question: why use RegEx and not LIKE? But again, as I mentioned, you can improve upon the RegEx to address specific requirements you may have, and using LIKE might limit you in that direction. (In other words, only you know what's good for you. I can't say that from here.)
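One possible refinement of the same idea, sketched in Python rather than PostgreSQL: strip the comments, pull out identifier-like tokens instead of splitting on unwanted text, and drop the keywords. The names `tokens` and `uses_field` and the keyword list are assumptions for illustration. Like the query above, this is not a SQL parser, and it does not tell column names apart from aliases.

```python
import re

# A comment runs from "--" to the end of the line.
COMMENT = re.compile(r"--[^\n]*")
# An identifier, optionally qualified (for example users.email).
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*")
# Keywords we are not interested in (deliberately short list).
KEYWORDS = {"select", "from", "as", "where"}


def tokens(query_text):
    """Return identifier-like tokens, without comments and keywords."""
    without_comments = COMMENT.sub("", query_text)
    found = IDENTIFIER.findall(without_comments)
    return [tok for tok in found if tok.lower() not in KEYWORDS]


def uses_field(query_text, field):
    """True if the field appears as a whole token, qualified or not."""
    field = field.lower()
    return any(tok.lower().split(".")[-1] == field for tok in tokens(query_text))


QUERY = """select
users.email as email_user
,users.email as email_user_too
,email as email_user_too_2,
email as email_user_too_3,
back_email as wrong_email -- wrong field
from users"""

print(uses_field(QUERY, "email"))
print(uses_field(QUERY, "field"))
```

Because whole tokens are compared, "as" is never matched inside a longer name, and words that appear only in a comment are ignored.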
You can play with the query online here: db-fiddle query and use https://regex101.com/ for testing your RegEx.
Disclaimer: I'm not a PostgreSQL developer. There are probably other, perhaps better, ways of doing this. (:

PLSQL to SQL - DatabaseLink Select

Straightforward question, but interestingly enough, I didn't find anything. Probably I'm searching for the wrong keywords:
We have 2 databases, one Oracle, one SQL, connected via links.
I'd like to check whether some data is in the Oracle part but not in the SQL one.
Selecting it from PLSQLDev is pretty straightforward:
SELECT * from Core.Event@Link
But as soon as I try to select specific fields like:
SELECT Id from Core.Event@Link
It tells me the qualifier is invalid.
I tried all kinds of shenanigans, like an aliased select:
SELECT ie.Id from Core.Event@Link ie
But it keeps telling me the qualifier is invalid.
Is there a special syntax I have to keep in mind?
Thanks in advance and a good weekend.
Matthias

How to find strings which are similar to given string in SQL server?

I have a SQL server table which contains several string columns. I need to write an application which gets a string and search for similar strings in SQL server table.
For example, if I give the "مختار" or "مختر" as input string, I should get these from SQL table:
1 - مختاری
2 - شهاب مختاری
3 - شهاب الدین مختاری
I've searched the net for a solution but I have found nothing useful. I've read this question, but it will not help me because:
I am using MS SQL Server not MySQL
my table contents are in Persian, so I can't use Levenshtein distance and similar methods
I prefer an SQL Server only solution, not an indexing or daemon based solution.
The best solution would be one that helps us sort results by similarity, but that's optional.
Do you have any suggestions for that?
Thanks
MSSQL supports LIKE, which seems like it should work (use the N prefix so the Persian literal is treated as Unicode). Is there a reason it's not suitable for your program?
SELECT * FROM table WHERE input LIKE N'%مختار%'
Hmm... considering that you read the other post, you probably know about the LIKE operator already. Maybe your problem is "getting the string and searching for something similar"?
--This part finds the string you want
declare @MyString nvarchar(max)
set @MyString = (Select column from table
where **LOGIC TO FIND THE STRING GOES HERE**)
--This part searches for that string
select searchColumn, ABS(Len(searchColumn) - Len(@MyString)) as Similarity
from table where searchColumn LIKE '%' + @MyString + '%'
Order by Similarity, searchColumn
The similarity part is something like the thing you posted. Strings that are "more similar", meaning closer in length to the search string, appear higher in the results.
The ABS call can obviously be dropped, since any match contains the search string and so is at least as long, but I left it in just in case.
Hope that helps =-)
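The ranking idea above (substring match, then order by length difference) can be tried out with a small sketch. This one uses Python's built-in sqlite3 module instead of SQL Server so that it runs anywhere. The `people` table, its `name` column, and the non-matching fourth row are assumptions for illustration.

```python
import sqlite3

# In-memory database standing in for the real table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("شهاب الدین مختاری",), ("مختاری",), ("رضا احمدی",), ("شهاب مختاری",)],
)


def find_similar(connection, search):
    """Substring match, ordered by how close each name's length is to the input."""
    sql = """
        SELECT name, ABS(LENGTH(name) - LENGTH(:s)) AS similarity
        FROM people
        WHERE name LIKE '%' || :s || '%'
        ORDER BY similarity, name
    """
    return [row[0] for row in connection.execute(sql, {"s": search})]


for name in find_similar(conn, "مختار"):
    print(name)
```

This approach only finds rows that contain the input exactly, so the shorter input "مختر" from the question returns nothing here. Handling it needs a real similarity measure.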
Besides the LIKE operator, you can use the condition WHERE instr(columnname, search) > 0 (in SQL Server the equivalent function is CHARINDEX); however, this is generally slower. It returns the starting position of one string within another. Thus, searching ABCDEFG for CD returns 3; 3 > 0, so the record is returned. However, in the case you've described, LIKE seems to be the best solution.
The general problem is that in languages where the same letter has different written forms at the beginning, middle and end of a word, and thus different codes, we can try to use specific Persian collations, but in general this will not help.
The second option is to use SQL Server's full-text search (FTS) abilities, but again, if it has no special language module for the language, it is much less useful.
The most general way is to use your own language processing, which is a very complex task. The following keywords and Google can help you understand the size of the problem: DLP, words and terms, bigrams, n-grams, grammar and morphology, inflection.
Try the built-in SOUNDEX() and DIFFERENCE() functions. I hope they work fine for Persian.
Look at the following reference:
http://blog.hoegaerden.be/2011/02/05/finding-similar-strings-with-fuzzy-logic-functions-built-into-mds/
The Similarity() function helps you sort results by similarity (as you asked in your question). It can also use algorithms other than Levenshtein edit distance, depending on the value of the @method parameter:
0 The Levenshtein edit distance algorithm
1 The Jaccard similarity coefficient algorithm
2 A form of the Jaro-Winkler distance algorithm
3 Longest common subsequence algorithm
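To give a feel for option 0 in the list above, here is a minimal Python sketch of Levenshtein edit distance used to sort candidates by similarity. It is not the Similarity() function itself, and `levenshtein` and `rank_by_similarity` are hypothetical helper names. Python strings are sequences of code points, so the same function runs on Persian text; whether the ranking is linguistically meaningful is a separate question.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def rank_by_similarity(search, candidates):
    """Sort candidates so that the smallest edit distance comes first."""
    return sorted(candidates, key=lambda c: (levenshtein(search, c), c))


names = ["شهاب الدین مختاری", "شهاب مختاری", "مختاری"]
for name in rank_by_similarity("مختر", names):
    print(levenshtein("مختر", name), name)
```

Unlike a LIKE pattern, this also ranks the input "مختر", which is not an exact substring of any stored value.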
The LIKE operator may not do what he is asking for. For example, suppose I have the value 'please, i want to ask a question' in a database record, and my query needs to find a similar match for 'Can i ask a question, please'. LIKE can only do this with patterns such as %[your sentence] or [your sentence]%, but it is not advisable for string similarity because sentences may change and your LIKE logic may not fetch the matching records. It is advisable to use naive Bayes text classification for similarities, assigning labels to your sentences, or you can try the semantic search feature in SQL Server.

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? I.e.:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Is there any way to do this programmatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526).
You can also use the DIFFERENCE function (on the same link as SOUNDEX), which compares levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzy Logic answer provided by @Neil Knight (+1 to that, for me!).
This Stack Overflow article also details possible sources for implementations of Fuzzy Logic in T-SQL. One respondent also outlined full-text indexing as an option that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that sound alike and thus corrects for phonetic misspellings, but it only encodes the first letter and the next few consonant sounds. Typos that touch only vowels, such as "Dinosars" (missing the U) or "Dinosayrs", still produce the same code (D526), whereas a truly "fat-fingered" consonant, as in "Dinodaurs" (D536), would not return an exact match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
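As a rough illustration of the dictionary lookup described above, here is a small Python sketch that would live in the business logic layer. It uses the standard library's difflib, whose similarity ratio stands in for a true edit distance (an assumption, not the algorithm any search engine actually uses). The word list `KNOWN_TERMS` and the function `suggest` are made up for the example.

```python
from difflib import get_close_matches

# Hypothetical dictionary of known "good" search terms, e.g. built from the articles.
KNOWN_TERMS = ["dinosaurs", "rocks", "sql", "amazing", "tom"]


def suggest(search, known_terms=KNOWN_TERMS, cutoff=0.6):
    """Replace each unknown word with its closest known term, if one is close enough."""
    corrected = []
    for word in search.lower().split():
        if word in known_terms:
            corrected.append(word)
            continue
        # n=1 keeps only the best candidate; cutoff is the minimum similarity ratio.
        matches = get_close_matches(word, known_terms, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


print(suggest("Dinosores wrocks"))
```

The corrected terms can then be fed into the existing LIKE conditions, optionally alongside the original terms.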
You can also try SUBSTR() to compare only the first 3 or so characters of each name. Below is an example of how that can be achieved (SUBSTR and || are Oracle/SQLite syntax; SQL Server uses SUBSTRING and +):
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3) = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;

How to update an SQLite database with a search and replace query?

My SQL knowledge is very limited, especially about SQLite, although I believe this will be some sort of generic query... Or maybe not, because of the search and replace...
I have this music database in SQLite that has various fields, of course, but the important ones here are "media_item_id" and "content_url".
Here's an example of a "content_url":
file:///c:/users/nazgulled/music/band%20albums/devildriver/%5b2003%5d%20devildriver/08%20-%20what%20does%20it%20take%20(to%20be%20a%20man).mp3
I'm looking for a query that will search for entries like those, where "content_url" follows that pattern and replace it (the "content_url") with something else.
For instance, a generic "content_url" can be this:
file:///c:/users/nazgulled/music/band%20albums/BAND_NAME/ALBUM_NAME/SONG_NAME.mp3
And I want to replace all these entries with:
file:///c:/users/nazgulled/music/bands/studio%20albums/BAND_NAME/ALBUM_NAME/SONG_NAME.mp3
How can I do it in one query?
P.S: I'm using Firefox SQLite Manager (couldn't find a better and free alternative for Windows).
You are probably looking for the replace function.
For example,
update table_name set
content_url = replace(content_url, 'band%20albums', 'bands/studio%20albums')
where
content_url like '%nazgulled/music/band_20albums/%';
More documentation at http://sqlite.org/lang_corefunc.html
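If you want to try the update on a throwaway copy first, here is a minimal sketch using Python's built-in sqlite3 module. The table name `media_items` and the second, non-matching row are assumptions for illustration. In the LIKE pattern, `_` matches any single character, which is how the pattern matches the literal `%` in `band%20albums` without it acting as a wildcard.

```python
import sqlite3

# Throwaway in-memory database; the table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media_items (media_item_id INTEGER PRIMARY KEY, content_url TEXT)"
)
conn.executemany(
    "INSERT INTO media_items (media_item_id, content_url) VALUES (?, ?)",
    [
        (1, "file:///c:/users/nazgulled/music/band%20albums/devildriver/"
            "%5b2003%5d%20devildriver/08%20-%20what%20does%20it%20take%20"
            "(to%20be%20a%20man).mp3"),
        (2, "file:///c:/users/nazgulled/music/other/song.mp3"),
    ],
)

# Same statement as above, with the table name filled in.
conn.execute(
    """
    UPDATE media_items SET
        content_url = replace(content_url, 'band%20albums', 'bands/studio%20albums')
    WHERE content_url LIKE '%nazgulled/music/band_20albums/%'
    """
)
conn.commit()

urls = dict(conn.execute("SELECT media_item_id, content_url FROM media_items"))
print(urls[1])
```

Only rows whose URL matches the pattern are touched; the second row keeps its original value.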