Openbase SQL case-sensitivity oddities ('=' vs. LIKE) - porting to MySQL - sql

We are porting an app which formerly used Openbase 7 to now use MySQL 5.0.
OB 7 did have quite badly defined (i.e. undocumented) behavior regarding case-sensitivity. We only found this out now when trying the same queries with MySQL.
It appears that OB 7 treats lookups using "=" differently from those using "LIKE": If you have two values "a" and "A", and make a query with WHERE f="a", then it finds only the "a" field, not the "A" field. However, if you use LIKE instead of "=", then it finds both.
Our tests with MySQL showed that if we're using a non-binary collation (e.g. latin1), then both "=" and "LIKE" compare case-insensitively. However, to simulate OB's behavior, we need to get only "=" to be case-sensitive.
We're now trying to figure out how to deal with this in MySQL without having to add a lot of LOWER() function calls to all our queries (there are a lot!).
We have full control over the MySQL DB, meaning we can choose its collation mode as we like (our table names and unique indexes are not affected by the case sensitivity issues, fortunately).
Any suggestions how to simulate the OpenBase behaviour on MySQL with the least amount of code changes?
(I realize that a few smart regex replacements in our source code to add the LOWER calls might do the trick, but we'd rather find a different way)

Another idea .. does MySQL offer something like User Defined Functions? You could then write a UDF-version of like that is case insesitive (ci_like or so) and change all like's to ci_like. Probably easier to do than regexing a call to lower in ..

These two articles talk about case sensitivity in mysql:
Case Sensitive mysql
mySql docs "Case Sensitivity"
Both were early hits in this Google search:
case sensitive mysql

I know that this is not the answer you are looking for .. but given that you want to keep this behaviour, shouldn't you explicitly code it (rather than changing some magic 'config' somewhere)?
It's probably quite some work, but at least you'd know which areas of your code are affected.

A quick look at the MySQL docs seems to indicate that this is exactly how MySQL does it:
This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a.

Related

Replace Character in String with SQL Server Table Trigger on Insert\Update

**Answered
I am attempting to create a trigger that will replace a character ’ (MS Word Smart Quote) with a proper apostrophe ' when new data is inserted or updated by a user from our website.
The special apostrophe may be found anywhere on a 5000 NVarchar column and may be found multiple times in the same string.
Any easy replace statement for this?
REPLACE(Column,'’','''')
I'm going to argue that you should probably look at doing this in your applications instead of from within SQL Server. That's NOT the answer you're looking for - but it would probably make more sense.
Typically, when I see questions like this I instantly worry about devs trying to 'defeat' SQL Injection. If that's the case, this approach will NEVER work - as per:
http://sqlmag.com/database-security/sql-injection-beyond-basics
That said, if you're not focused on that and just need to get rid of 'pesky' characters, then REPLACE() will work (and likely be your best option), but I'd still argue that you're probably better off tackling 'formatting' issues like this from within your applications. Or in other words, treat SQL Server as your data repository - something that stores your raw data. Then, if you need to make it 'pretty' or 'tweak' it for various outputs/displays, then do that on the way out to your users by means of your application(s).

DB2 complex like

I have to write a select statement following the following pattern:
[A-Z][0-9][0-9][0-9][0-9][A-Z][0-9][0-9][0-9][0-9][0-9]
The only thing I'm sure of is that the first A-Z WILL be there. All the rest is optional and the optional part is the problem. I don't really know how I could do that.
Some example data:
B/0765/E 3
B/0765/E3
B/0764/A /02
B/0749/K
B/0768/
B/0784//02
B/0807/
My guess is that I best remove al the white spaces and the / in the data and then execute the select statement. But I'm having some problems writing the like pattern actually.. Anyone that could help me out?
The underlying reason for this is that I'm migrating a database. In the old database the values are just in 1 field but in the new one they are splitted into several fields but I first have to write a "control script" to know what records in the old database are not correct.
Even the following isn't working:
where someColumn LIKE '[a-zA-Z]%';
You can use Regular Expression via xQuery to define this pattern. There are many question in StackOverFlow that talk about patterns in DB2, and they have been solved with Regular Expressions.
DB2: find field value where first character is a lower case letter
Emulate REGEXP like behaviour in SQL

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
Copes fine with returning the correct set of records.
How would we cope however, if a user mispells dinosaurs? IE:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Any way programatically?
Edit
Appears SOUNDEX could help, but can anyone give me an example using soundex where entering the search term:
Dinosores wrocks
returns records instead of doing:
WHERE articleBody LIKE '%Dinosaurs%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526) .
You can also use DIFFERENCE function (on same link as soundex) that will compare levels of similarity (4 being the most similar, 0 being the least).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-text option, it seems that this isn't all that easy. I would refer you to the link on the Fuzzt Logic answer provided by #Neil Knight (+1 to that, for me!).
This stackoverflow article also details possible sources for implentations for Fuzzy Logic in TSQL. Once respondant also outlined Full text Indexing as a potential that you might want to investigate.
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer, there is nothing built in to most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that would sound alike and thus correct for phonetic misspellings, but if the user typed in "Dinosars" missing the final U, or truly "fat-fingered" it and entered "Dinosayrs", SoundEx would not return an exact match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
You can also try the SubString(), to eliminate the first 3 or so characters . Below is an example of how that can be achieved
SELECT Fname, Lname
FROM Table1 ,Table2
WHERE substr(Table1.Fname, 1,3) || substr(Table1.Lname,1 ,3) = substr(Table2.Fname, 1,3) || substr(Table2.Lname, 1 , 3))
ORDER BY Table1.Fname;

Problems with Turkish SQL Collation (Turkish "I")

I'm having problems with our MSSQL database set to any of the Turkish Collations. Becuase of the "Turkish I" problem, none of our queries containing an 'i' in them are working correctly. For example, if we have a table called "Unit" with a column "UnitID" defined in that case, the query "select unitid from unit" no longer works because the lower case "i" in "id" differs from the defined capital I in "UnitID". The error message would read "Invalid column name 'unitid'."
I know that this is occurring because in Turkish, the letter i and I are seen as different letters. However, I am not sure as to how to fix this problem? It is not an option to go through all 1900 SPs in the DB and correct the casing of the "i"s.
Any help would be appreciated, even suggestions of other collations that could be used instead of Turkish but would support their character set.
Turns out that the best solution was to in fact refactor all SQL and the code.
In the last few days I've written a refactoring app to fix up all Stored procs, functions, views, tablenames to be consistent and use the correct casing eg:
select unitid from dbo.unit
would be changed to
select UnitId from dbo.Unit
The app also then goes through the code and replaces any occurrences of the stored proc and its parameters and corrects them to match the case defined in the DB. All datatables in the app are set to invariant locale (thanks to FXCop for pointing out all the datatables..), this prevents the calls from within code having to be case sensitive.
If anyone would like the app or any advice on the process you can contact me on dotnetvixen#gmail.com.
I developed so many systems with Turkish support and this is well known problem as you said.
Best practice to do change your database settings to UTF-8, and that's it. It should solve the all problem.
You might run into problems if you want to support case-sensitivity in (ı-I,i-İ) that can be a problematic to support in SQL Server. If the whole entrance is from Web ensure that is UTF-8 as well.
If you keep your Web UTF-8 input and SQL Server settings as UTF-8 everything should goes smoothly.
Perhaps I don't understand the problem here, but is this not more likely because the database is case sensitive and your query is not? For example, on Sybase I can do the following:
USE master
GO
EXEC sp_server_info 16
GO
Which tells me that my database is case-insensitive:
attribute_id attribute_name attribute_value
16 IDENTIFIER_CASE MIXED
If you can change the collation that you're using then try the Invariant locale. But make sure you don't impact other things like customer names and addresses. If a customer is accustomed to having case insensitive searching for their own name, they won't like it if ı and I stop being equivalent, or if i and İ stop being equivalent.
Can you change the database collation to the default: this will leave all your text columns with the Turkish colllation?
Queries will work but data will behave correctly. In theory...
There are some gotchas with temp tables and table variables with varchar columns: you'll have to add COLLATE clauses to these
I realize you don't want to go through all the stored procedures to fix the issue but maybe you'd be OK to use a refactoring tool to solve the problem. I say take a look at SQL Refactor. I didn't use it but looks promising.
Changing the Regional Settings of your machine to English(US) completely saves the day!

How can I make MS Access Query Parameters Optional?

I have a query that I would like to filter in different ways at different times. The way I have done this right now by placing parameters in the criteria field of the relevant query fields, however there are many cases in which I do not want to filter on a given field but only on the other fields. Is there any way in which a wildcard of some sort can be passed to the criteria parameter so that I can bypass the filtering for that particular call of the query?
If you construct your query like so:
PARAMETERS ParamA Text ( 255 );
SELECT t.id, t.topic_id
FROM SomeTable t
WHERE t.id Like IIf(IsNull([ParamA]),"*",[ParamA])
All records will be selected if the parameter is not filled in.
Note the * wildcard with the LIKE keyword will only have the desired effect in ANSI-89 Query Mode.
Many people mistakenly assume the wildcard character in Access/Jet is always *. Not so. Jet has two wildcards: % in ANSI-92 Query Mode and * in ANSI-89 Query Mode.
ADO is always ANSI-92 and DAO is always ANSI-89 but the Access interface can be either.
When using the LIKE keyword in a database object (i.e. something that will be persisted in the mdb file), you should to think to yourself: what would happen if someone used this database using a Query Mode other than the one I usually use myself? Say you wanted to restrict a text field to numeric characters only and you'd written your Validation Rule like this:
NOT LIKE "*[!0-9]*"
If someone unwittingly (or otherwise) connected to your .mdb via ADO then the validation rule above would allow them to add data with non-numeric characters and your data integrity would be shot. Not good.
Better IMO to always code for both ANSI Query Modes. Perhaps this is best achieved by explicitly coding for both Modes e.g.
NOT LIKE "*[!0-9]*" AND NOT LIKE "%[!0-9]%"
But with more involved Jet SQL DML/DDL, this can become very hard to achieve concisely. That is why I recommend using the ALIKE keyword, which uses the ANSI-92 Query Mode wildcard character regardless of Query Mode e.g.
NOT ALIKE "%[!0-9]%"
Note ALIKE is undocumented (and I assume this is why my original post got marked down). I've tested this in Jet 3.51 (Access97), Jet 4.0 (Access2000 to 2003) and ACE (Access2007) and it works fine. I've previously posted this in the newsgroups and had the approval of Access MVPs. Normally I would steer clear of undocumented features myself but make an exception in this case because Jet has been deprecated for nearly a decade and the Access team who keep it alive don't seem interested in making deep changes to the engines (or bug fixes!), which has the effect of making the Jet engine a very stable product.
For more details on Jet's ANSI Query modes, see About ANSI SQL query mode.
Back to my previous exampe in your previous question. Your parameterized query is a string looking like that:
qr = "Select Tbl_Country.* From Tbl_Country WHERE id_Country = [fid_country]"
depending on the nature of fid_Country (number, text, guid, date, etc), you'll have to replace it with a joker value and specific delimitation characters:
qr = replace(qr,"[fid_country]","""*""")
In order to fully allow wild cards, your original query could also be:
qr = "Select Tbl_Country.* From Tbl_Country _
WHERE id_Country LIKE [fid_country]"
You can then get wild card values for fid_Country such as
qr = replace(qr,"[fid_country]","G*")
Once you're done with that, you can use the string to open a recordset
set rs = currentDb.openRecordset(qr)
I don't think you can. How are you running the query?
I'd say if you need a query that has that many open variables, put it in a vba module or class, and call it, letting it build the string every time.
I'm not sure this helps, because I suspect you want to do this with a saved query rather than in VBA; however, the easiest thing you can do is build up a query line by line in VBA, and then creating a recordset from it.
A quite hackish way would be to re-write the saved query on the fly and then access that; however, if you have multiple people using the same DB you might run into conflicts, and you'll confuse the next developer down the line.
You could also programatically pass default value to the query (as discussed in you r previous question)
Well, you can return non-null values by passing * as the parameter for fields you don't wish to use in the current filter. In Access 2003 (and possibly earlier and later versions), if you are using like [paramName] as your criterion for a numeric, Text, Date, or Boolean field, an asterisk will display all records (that match the other criteria you specify). If you want to return null values as well, then you can use like [paramName] or Is Null as the criterion so that it returns all records. (This works best if you are building the query in code. If you are using an existing query, and you don't want to return null values when you do have a value for filtering, this won't work.)
If you're filtering a Memo field, you'll have to try another approach.