Microsoft SQL and query-ing with spelling errors - sql

If I wanted to run a query where certain fields in my query were to be mispelt (up to a certain point of course), is there a native way to do this with MS SQL server or does one have to implement something else entirely?
What I think it boils to is can you select where levenstein distance is a max of X
Example
select from myTable WHERE name closeTo('Erik')
and it might return things like Eric, ie, a Levenhstein distance of 1

Have a look at T-SQL SOUNDEX and DIFFERENCE: http://msdn.microsoft.com/en-us/library/ms187384.aspx

Related

sql statement not working on AND (... OR ... OR ...)

this is probably a little thing
but i try to use this sql statement:
SELECT * FROM Colors
WHERE colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory=4 OR fk_subCategory=5 OR fk_subCategory=11)
And in the results i get the perfect colorHueWarmth and colorV but i also get the fk_subcategories for other values than 4, 5 or 11.
i tried changing the values but no results, is it even possible to do such a statement?
Does anyone what i am doing wrong?
Thanks in advance
You've actually got multiple options; although I'd point out that the query (in your qusetion) actually works for me (see this Sql Fiddle)
SELECT
*
FROM
Colors
WHERE
colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory=4 OR fk_subCategory=5 OR fk_subCategory=11)
As stated in one of the comments I would guess that your original didn't have braces on the fk_subCategory clause (the third table in my previous fiddle). Brackets are immensely important when working with logic and should always be used to group items together.
The easiest solution is as follows:
SELECT
*
FROM
Colors
WHERE
colorHueWarmth < 0
AND colorV >=0.7
AND (fk_subCategory IN(4,5,11));
You will find loads of documentation online regarding the LIKE clause here are a few you might find useful:
http://webcheatsheet.com/sql/interactive_sql_tutorial/sql_in.php
http://www.w3schools.com/sql/sql_in.asp (note W3Schools can't always be taken on face value and are often excluded from suggested links due to the errors/omissions they often contain)
http://msdn.microsoft.com/en-gb/library/ms177682.aspx
Given the size of the foreign key constraint (4,5 or 11) the IN clause is a reasonable option, if you have other queries using something similar with large collections this can become quite inefficient in which case you could create a temporary table which contains the ID's and INNER JOIN onto that. (here is a question regarding alternatives to LIKE)

How to find strings which are similar to given string in SQL server?

I have a SQL server table which contains several string columns. I need to write an application which gets a string and search for similar strings in SQL server table.
For example, if I give the "مختار" or "مختر" as input string, I should get these from SQL table:
1 - مختاری
2 - شهاب مختاری
3 - شهاب الدین مختاری
I've searched the net for a solution but I have found nothing useful. I've read this question , but this will not help me because:
I am using MS SQL Server not MySQL
my table contents are in Persian, so I can't use Levenshtein distance and similar methods
I prefer an SQL Server only solution, not an indexing or daemon based solution.
The best solution would be a solution which help us sort result by similarity, but, its optional.
Do you have any suggestion for that?
Thanks
MSSQL supports LIKE which seems like it should work. Is there a reason it's not suitable for your program?
SELECT * FROM table WHERE input LIKE '%مختار%'
Hmm.. considering that you read the other post you probably know about the like operator already... maybe your problem is "getting the string and searching for something similar"?
--This part searches for a string you want
declare #MyString varchar(max)
set #MyString = (Select column from table
where **LOGIC TO FIND THE STRING GOES HERE**)
--This part searches for that string
select searchColumn, ABS(Len(searchColumn) - Len(#MyString)) as Similarity
from table where data LIKE '%' + #MyString + '%'
Order by Similarity, searchColumn
The similarity part is something like the thing you posted. If the strings are "more similar" meaning that they have a similar length, they will be higher on the results query.
The absolute part can be avoided obviously but I did it just in case.
Hope that helps =-)
Besides like operator, you can use the condition WHERE instr(columnname, search) > 0; however this is generally slower. What it does is return the starting position of a string within another string. thus if searching in ABCDEFG for CD it would return 3. 3>0, so the record would be returned. However in the case you've described, like seems to be the best solution.
The general problem is that in languages where the same letter has different writing form in the beginning, middle and at the end of word, and thus - different codes - we can try to use specific Persian collations, but in general this will not help.
The second option - is to use SQL FTS abilities, but again - if it has not special language module for the language - it is much less useful.
And most general way - to use your own language processing - which is very complex task at all. The next keywords and google can help to understand the size of the problem: DLP, words and terms, bi-gramms, n-gramms, grammar and morphology inflection
Try to use the Built-in Soundex() And Difference() functions. I hope they work fine for Persian.
Look at the following reference:
http://blog.hoegaerden.be/2011/02/05/finding-similar-strings-with-fuzzy-logic-functions-built-into-mds/
Similarity() function helps you to sort result by similarity (as you asked in your question) and it is also possible using algorithms different from Levenshtein edit distance depends on the Value for #method Algorithm:
0 The Levenshtein edit distance algorithm
1 The Jaccard similarity coefficient algorithm
2 A form of the Jaro-Winkler distance algorithm
3 Longest common subsequence algorithm
Like operator may not do what he is asking for. Like for example, if i have a record value "please , i want to ask a question' in my database record. and lets say on my query, i want to find a match similarity like this 'Can i ask a question, please'. like operator may do this using like %[your senttence] or [your sentence]% but it is not advisable to use it for string similarity cos sentences may change and all your like logic may not fetch the matching records. It is advisable to use naive bayes text classification for similarities assigning labels to your sentences or you can try the semantic search function in MSSQL server

What happens if I SELECT something AS a name then something else AS the same name?

I've already answered my question but didn't see it on here, so here we go. Please feel free to link to the question if it has been asked exactly.
I simplified my question to the following code:
SELECT 'a' AS col1, 'b' AS col1
Will this give a same column name error?
Will the last value always be returned or is there a chance col1 could be 'a'?
I'm not sure why you would ever want this, but I tried it in Oracle (10g) and it worked fine, returning both columns. I realize you've asked about SQL Server specifically, but I found it interesting that this worked at all.
Edit: It also works on MySQL.
It works in the final query:
However when you do it in a subselect and refer to the ambiguous column aliases in an outer query you get an error:
In SQL 2008 r2 it is valid as a stand alone query. Under certain circumstances it will produce errors (incomplete list):
Inline views
Common Table Entries
Stored Procedures when the output is used by reporting services and presumably similarly integrated tools
It's hard to imagine a case where you would want duplicate row names, and it's easy to think of ways in which writing queries with repeats now could turn sour in the future.

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. But it appears that the substring call is being applied before the filtering happens, the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try using a negative number for in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular Database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names, (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
Martin gave this link that pretty much explains what is going on - the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it i will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if i can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsfroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
Build a working table from all of
the table constructors in the FROM
clause.
Remove from the working table those
rows that do not satisfy the WHERE
clause.
Construct the expressions in the
SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called query execution plan. It's based on query optimization rules, indexes, temporaty buffers and execution time statistics. If you are using SQL Managment Studio you have toolbox over your query editor where you can look at estimated execution plan, it shows how your query will change to gain some speed. So if just used your Name table and it is in buffer, engine might first try to subquery your data, and then join it with other table.

Expression Too Complex In Access 2007

When I try to run this query in Access through the ODBC interface into a MySQL database I get an "Expression too complex in query expression" error. The essential thing I'm trying to do is translate abbreviated names of languages into their full body English counterparts. I was curious if there was some way to "trick" access into thinking the expression is smaller with sub queries, or if someone else had a better idea of how to solve this problem. I thought about making a temporary table and doing a join on it, but that's not supported in Access SQL.
Just as an FYI, the query worked fine until I added the big long IFF chain. I tested the query on a smaller IFF chain for three languages, and that wasn't an issue, so the problem definitely stems from the huge IFF chain (It's 26 deep). Also, I might be able to drop some of the options (like combining the different forms of Chinese or Portuguese)
As a test, I was able to get the SQL query to work after paring it down to 14 IFF() statements, but that's a far cry from the 26 languages I'd like to represent.
SELECT TOP 5 Count( * ) AS [Number of visits by language], IIf(login.lang="ar","Arabic",IIf(login.lang="bg","Bulgarian",IIf(login.lang="zh_CN","Chinese (Simplified Han)",IIf(login.lang="zh_TW","Chinese (Traditional Han)",IIf(login.lang="cs","Czech",IIf(login.lang="da","Danish",IIf(login.lang="de","German",IIf(login.lang="en_US","United States English",IIf(login.lang="en_GB","British English",IIf(login.lang="es","Spanish",IIf(login.lang="fr","French",IIf(login.lang="el","Greek",IIf(login.lang="it","Italian",IIf(login.lang="ko","Korean",IIf(login.lang="hu","Hungarian",IIf(login.lang="nl","Dutch",IIf(login.lang="pl","Polish",IIf(login.lang="pt_PT","European Portuguese",IIf(login.lang="pt_BR","Brazilian Portuguese",IIf(login.lang="ru","Russian",IIf(login.lang="sk","Slovak",IIf(login.lang="sl","Slovenian","IIf(login.lang="fi","Finnish",IIf(login.lang="sv","Swedish",IIf(login.lang="tr","Turkish","Unknown")))))))))))))))))))))))))) AS [Language]
FROM login, reservations, reservation_users, schedules
WHERE (reservations.start_date Between DATEDIFF('s','1970-01-01 00:00:00',[Starting Date in the Following Format YYYY/MM/DD]) And DATEDIFF('s','1970-01-01 00:00:00',[Ending Date in the Following Format YYYY/MM/DD])) And reservations.is_blackout=0 And reservation_users.memberid=login.memberid And reservation_users.resid=reservations.resid And reservation_users.invited=0 And reservations.scheduleid=schedules.scheduleid And scheduletitle=[Schedule Title]
GROUP BY login.lang
ORDER BY Count( * ) DESC;
# Michael Todd
I completely agree. The list of languages should have been a table in the database and the login.lang should have been a FK into that table. Unfortunately this isn't how the database was written, and it's not really mine to modify. The languages are placed into the login.lang field by the PHP running on top of the database.
I thought about making a temporary table and doing a join on it, but that's not supported in Access SQL.
Did you try making a table of languages within Access, and joining it to the MySQL tables?
You may try the below expression. what I did is, your expression is cut down to two parts, then a final 'IIf' check will do the trick. You will have additional 2 fields and you may ignore those. I had the same situation and this worked well for me. PS: You may need to double check the closing brackets in the below expression. I did it quickly.
Thanks,
Shibin
IIf(login.lang="ar","Arabic",IIf(login.lang="bg","Bulgarian",IIf(login.lang="zh_CN","Chinese (Simplified Han)",IIf(login.lang="zh_TW","Chinese (Traditional Han)",IIf(login.lang="cs","Czech",IIf(login.lang="da","Danish",IIf(login.lang="de","German",IIf(login.lang="en_US","United States English",IIf(login.lang="en_GB","British English",IIf(login.lang="es","Spanish",IIf(login.lang="fr","French",IIf(login.lang="el","Greek",IIf(login.lang="it","Italian",""))))))))))))) as l1,
IIf(login.lang="ko","Korean",IIf(login.lang="hu","Hungarian",IIf(login.lang="nl","Dutch",IIf(login.lang="pl","Polish",IIf(login.lang="pt_PT","European Portuguese",IIf(login.lang="pt_BR","Brazilian Portuguese",IIf(login.lang="ru","Russian",IIf(login.lang="sk","Slovak",IIf(login.lang="sl","Slovenian","IIf(login.lang="fi","Finnish",IIf(login.lang="sv","Swedish",IIf(login.lang="tr","Turkish","Unknown")))))))))))) as l2,
IIf(l1="",l2,l1) AS [Language]
If you can't use a lookup table, create a custom VB function, so that instead of 26 IIf statements, you have one function call.