oracle - query data having non-latin characters - sql

I'm not familiar with Oracle, but I have to use PHP and PL/SQL to query an Oracle database. The data is old and has been stored very inconsistently.
For example, a person's name, say ÇİĞDEM, may be stored in several different ways, such as CIGDEM, ÇÝÐDEM, ÇIĞDEM or CİĞDEM :(
What I've done so far is replace the characters I've found using REPLACE, but I don't like it. It works for most cases, but I can't just accept that; it should work for all possible combinations.
SELECT ... FROM ... WHERE replace(replace(CONVERT(ADI, 'US7ASCII', 'US7ASCII'), chr(221), 'I'), 'Ü', 'U') LIKE :myvariablehere ...
Is there an elegant way to search this kind of data?
EDIT
Database version is 10g

I've had good experiences with the SOUNDEX function (the Soundex algorithm as described by Knuth), which is available in Oracle SQL. Alternatively, use Oracle Text, as described at http://docs.oracle.com/cd/E18283_01/text.112/e16594/search.htm
Good luck!
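Rather than chaining REPLACE calls, the question's approach can be generalized into a single character-mapping table applied to both the stored value and the search term. Below is a minimal Python sketch. The mapping table is an assumption built from the variants listed in the question: the Turkish letters, plus Ý, Ð and Þ, which is how İ, Ğ and Ş appear when Turkish text is misread as Latin-1. Extend it as you find more cases.

```python
# Hypothetical mapping table (extend as needed): Turkish letters plus the
# Latin-1 characters they turn into when Turkish text is mis-decoded.
FROM_CHARS = "ÇçİıĞğŞşÖöÜüÝýÐðÞþ"
TO_CHARS   = "CcIiGgSsOoUuIiGgSs"
FOLD_TABLE = str.maketrans(FROM_CHARS, TO_CHARS)


def search_key(name: str) -> str:
    """Fold every known spelling variant of a name to one ASCII, upper-case key."""
    # Translate first so that upper-casing only ever sees plain ASCII letters.
    return name.translate(FOLD_TABLE).upper()


variants = ["ÇİĞDEM", "CIGDEM", "ÇÝÐDEM", "ÇIĞDEM", "CİĞDEM"]
print({search_key(v) for v in variants})  # {'CIGDEM'}
```

On the Oracle side, TRANSLATE(expr, from_string, to_string) performs this kind of one-to-one character mapping in a single call. The same table could therefore stand in for the nested REPLACE calls, assuming the database character set actually stores these characters as shown.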

Related

Replace Character in String with SQL Server Table Trigger on Insert\Update

I am attempting to create a trigger that will replace a character ’ (MS Word Smart Quote) with a proper apostrophe ' when new data is inserted or updated by a user from our website.
The special apostrophe may be found anywhere in a 5000-character NVARCHAR column and may appear multiple times in the same string.
Is there an easy REPLACE statement for this?
REPLACE(Column, N'’', N'''')
I'm going to argue that you should probably look at doing this in your applications instead of from within SQL Server. That's NOT the answer you're looking for - but it would probably make more sense.
Typically, when I see questions like this I instantly worry about devs trying to 'defeat' SQL Injection. If that's the case, this approach will NEVER work - as per:
http://sqlmag.com/database-security/sql-injection-beyond-basics
That said, if you're not focused on that and just need to get rid of 'pesky' characters, then REPLACE() will work (and likely be your best option), but I'd still argue that you're probably better off tackling 'formatting' issues like this from within your applications. Or in other words, treat SQL Server as your data repository - something that stores your raw data. Then, if you need to make it 'pretty' or 'tweak' it for various outputs/displays, then do that on the way out to your users by means of your application(s).
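If you take the advice to handle this in the application rather than in a trigger, the cleanup is one string replacement when the text is written out to your users. Below is a minimal Python sketch. It assumes your application receives the text as a plain string, and the function name is hypothetical. It only tidies formatting and offers no protection against SQL injection, which still calls for parameterized queries.

```python
# The MS Word "smart" apostrophe (U+2019) mentioned in the question.
SMART_APOSTROPHE = "\u2019"


def normalize_apostrophes(text: str) -> str:
    """Replace every smart apostrophe with a plain ASCII apostrophe."""
    return text.replace(SMART_APOSTROPHE, "'")


print(normalize_apostrophes("It\u2019s Tom\u2019s"))  # It's Tom's
```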

DB2 complex like

I have to write a SELECT statement matching the following pattern:
[A-Z][0-9][0-9][0-9][0-9][A-Z][0-9][0-9][0-9][0-9][0-9]
The only thing I'm sure of is that the first A-Z WILL be there. All the rest is optional and the optional part is the problem. I don't really know how I could do that.
Some example data:
B/0765/E 3
B/0765/E3
B/0764/A /02
B/0749/K
B/0768/
B/0784//02
B/0807/
My guess is that I'd best remove all the whitespace and the / characters from the data and then execute the SELECT statement. But I'm having some problems actually writing the LIKE pattern. Could anyone help me out?
The underlying reason is that I'm migrating a database. In the old database the values are stored in one field, but in the new one they are split into several fields, so I first have to write a "control script" to find out which records in the old database are not correct.
Even the following isn't working:
where someColumn LIKE '[a-zA-Z]%';
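DB2's LIKE predicate only understands the % and _ wildcards, so '[a-zA-Z]%' looks for values that literally start with the text [a-zA-Z]. Instead, you can use regular expressions via XQuery to define this pattern. There are many questions on Stack Overflow about patterns in DB2 that have been solved with regular expressions: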
DB2: find field value where first character is a lower case letter
Emulate REGEXP like behaviour in SQL
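For the control script, the check itself is easy to express as a regular expression once spaces and slashes are removed. Below is a Python sketch, for example to run over rows exported from the old database. The pattern encodes one assumed reading of the rules: a mandatory letter, then optionally exactly four digits followed by an optional letter and up to five more digits. Adjust it if "optional" means something else in your data. The same expression could be used with whatever regular-expression support your DB2 setup offers, such as the XQuery route mentioned above.

```python
import re

# Assumed rules: letter, then optionally 4 digits + optional letter + 0-5 digits.
VALID_CODE = re.compile(r"[A-Z](?:[0-9]{4}[A-Z]?[0-9]{0,5})?")


def normalize(raw: str) -> str:
    """Drop whitespace and '/' separators, as suggested in the question."""
    return re.sub(r"[\s/]", "", raw)


def is_valid(raw: str) -> bool:
    """True if the normalized value matches the whole pattern."""
    return VALID_CODE.fullmatch(normalize(raw)) is not None


samples = ["B/0765/E 3", "B/0765/E3", "B/0764/A /02", "B/0749/K",
           "B/0768/", "B/0784//02", "B/0807/"]
bad = [s for s in samples if not is_valid(s)]
print(bad)  # []
```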

SQL - searching database with the LIKE operator

Given your data stored somewhere in a database:
Hello my name is Tom I like dinosaurs to talk about SQL.
SQL is amazing. I really like SQL.
We want to implement a site search, allowing visitors to enter terms and return relating records. A user might search for:
Dinosaurs
And the SQL:
WHERE articleBody LIKE '%Dinosaurs%'
This copes fine, returning the correct set of records.
How would we cope, however, if a user misspells dinosaurs? I.e.:
Dinosores
(Poor sore dino). How can we search allowing for error in spelling? We can associate common misspellings we see in search with the correct spelling, and then search on the original terms + corrected term, but this is time consuming to maintain.
Is there a way to do this programmatically?
Edit
It appears SOUNDEX could help, but can anyone give me an example using SOUNDEX where entering the search term:
Dinosores wrocks
returns records, instead of doing:
WHERE articleBody LIKE '%Dinosores%' OR articleBody LIKE '%Wrocks%'
which would return squadoosh?
If you're using SQL Server, have a look at SOUNDEX.
For your example:
select SOUNDEX('Dinosaurs'), SOUNDEX('Dinosores')
Returns identical values (D526) .
You can also use the DIFFERENCE function (documented alongside SOUNDEX), which compares the SOUNDEX values of two strings and returns a similarity level from 0 (least similar) to 4 (most similar).
SELECT DIFFERENCE('Dinosaurs', 'Dinosores'); --returns 4
Edit:
After hunting around a bit for a multi-term option, it seems that this isn't all that easy. I would refer you to the link in the Fuzzy Logic answer provided by @Neil Knight (+1 to that, from me!).
This Stack Overflow article also details possible sources of fuzzy-logic implementations in T-SQL. One respondent also suggested full-text indexing as something you might want to investigate.
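The multi-word case from the question can be approached by computing a Soundex code for every word of each article. The search then returns articles that share a code with any search term, which gives OR semantics like the LIKE ... OR query. Below is a sketch in Python rather than T-SQL. The helper names are hypothetical, and this classic Soundex may differ from SQL Server's SOUNDEX in edge cases. In a database, you would need to precompute and index the per-word codes for this to perform.

```python
import re
from typing import List

# Classic (American) Soundex letter groups.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def soundex(word: str) -> str:
    """Four-character Soundex code, e.g. 'Dinosaurs' -> 'D526'."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code, prev = letters[0], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate equal codes; vowels do
            prev = digit
    return (code + "000")[:4]


def fuzzy_search(articles: List[str], query: str) -> List[int]:
    """Indexes of articles containing a word that sounds like any query term."""
    wanted = {soundex(term) for term in query.split()} - {""}
    hits = []
    for i, text in enumerate(articles):
        codes = {soundex(w) for w in re.findall(r"[A-Za-z]+", text)}
        if codes & wanted:
            hits.append(i)
    return hits


articles = ["Hello my name is Tom I like dinosaurs to talk about SQL.",
            "SQL is amazing. I really like SQL."]
print(soundex("Dinosaurs"), soundex("Dinosores"))  # D526 D526
print(fuzzy_search(articles, "Dinosores wrocks"))  # [0]
```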
Perhaps your RDBMS has a SOUNDEX function? You didn't mention which one was involved here.
SQL Server's SOUNDEX
Just to throw an alternative out there. If SSIS is an option, then you can use Fuzzy Lookup.
SSIS Fuzzy Lookup
I'm not sure if introducing a separate "search engine" is possible, but if you look at products like the Google search appliance or Autonomy, these products can index a SQL database and provide more searching options - for example, handling misspellings as well as synonyms, search results weighting, alternative search recommendations, etc.
Also, SQL Server's full-text search feature can be configured to use a thesaurus, which might help:
http://msdn.microsoft.com/en-us/library/ms142491.aspx
Here is another SO question from someone setting up a thesaurus to handle common misspellings:
FORMSOF Thesaurus in SQL Server
Short answer: there is nothing built into most SQL engines that can do dictionary-based correction of "fat fingers". SoundEx does work as a tool to find words that sound alike, and thus corrects for phonetic misspellings. Because it ignores vowels, it even tolerates "Dinosars" (missing the U) or "Dinosayrs" (both give D526). But if the user truly "fat-fingered" a consonant and entered "Dinodaurs" (D536), or mistyped the first letter, as in "Sinosaurs" (S526), SoundEx would not return a match.
Sounds like you want something on the level of Google Search's "Did you mean __?" feature. I can tell you that is not as simple as it looks. At a 10,000-foot level, the search engine would look at each of those keywords and see if it's in a "dictionary" of known "good" search terms. If it isn't, it uses an algorithm much like a spell-checker suggestion to find the dictionary word that is the closest match (requires the fewest letter substitutions, additions, deletions and transpositions to turn the given word into the dictionary word). This will require some heavy procedural code, either in a stored proc or CLR Db function in your database, or in your business logic layer.
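The "closest dictionary word" step can be sketched with the optimal string alignment variant of Damerau-Levenshtein distance. It counts exactly the substitutions, insertions, deletions and adjacent transpositions described above. Below is a Python sketch. The dictionary contents and the max_distance threshold are assumptions you would tune for your own search terms.

```python
def edit_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (case-insensitive)."""
    a, b = a.lower(), b.lower()
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(b) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def did_you_mean(term, dictionary, max_distance=3):
    """Closest dictionary word, or None if nothing is within max_distance."""
    if not dictionary:
        return None
    best = min(dictionary, key=lambda word: edit_distance(term, word))
    return best if edit_distance(term, best) <= max_distance else None


dictionary = ["dinosaurs", "rocks", "sql"]  # hypothetical known-good terms
print([did_you_mean(t, dictionary) for t in ["Dinosores", "Dinodaurs", "wrocks"]])
# ['dinosaurs', 'dinosaurs', 'rocks']
```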
You can also try SUBSTR() to compare only the first three or so characters of each name, so that misspellings later in the word are ignored. Below is an example of how that can be achieved:
SELECT Table1.Fname, Table1.Lname
FROM Table1, Table2
WHERE SUBSTR(Table1.Fname, 1, 3) || SUBSTR(Table1.Lname, 1, 3) = SUBSTR(Table2.Fname, 1, 3) || SUBSTR(Table2.Lname, 1, 3)
ORDER BY Table1.Fname;
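The prefix comparison can be tried without an Oracle instance. Below is a Python sketch using the standard-library sqlite3 module, since SQLite also supports substr() and the || operator. The sample names are hypothetical. The design trade-off is that errors after the third character are tolerated, but a typo within the first three characters still breaks the match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Fname TEXT, Lname TEXT);
    CREATE TABLE Table2 (Fname TEXT, Lname TEXT);
    INSERT INTO Table1 VALUES ('Christopher', 'Johnson'), ('Maria', 'Lopez');
    INSERT INTO Table2 VALUES ('Chris', 'Johnston');
""")
# Join rows whose first three letters of first and last name agree.
matches = conn.execute("""
    SELECT Table1.Fname, Table1.Lname
    FROM Table1, Table2
    WHERE substr(Table1.Fname, 1, 3) || substr(Table1.Lname, 1, 3)
        = substr(Table2.Fname, 1, 3) || substr(Table2.Lname, 1, 3)
    ORDER BY Table1.Fname
""").fetchall()
print(matches)  # [('Christopher', 'Johnson')]
conn.close()
```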

Why are SQL entries written in uppercase? [duplicate]

Possible Duplicate:
Why should I capitalize my SQL keywords?
hi,
I'm pretty new to SQL, but I've noticed that writing
SELECT * FROM table_name
is almost always used, even though
select * from table_name
yields exactly the same result. I can't find anything online about this. Is this just a convention? Or will not using uppercase break the script on older systems, or on systems I'm not aware of?
thanks
SQL was developed in the 1970s when the popular programming languages (like COBOL) used ALL CAPS, and the convention must have stuck.
It's because that is the way it is defined in the ANSI standard. See section 5, "Lexical elements"; I presume it caught on from there.
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
They are completely equivalent, the uppercase just makes the query easier to read.
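A quick way to confirm that keyword case makes no difference is to run the same query in different cases. Below is a Python sketch using the standard-library sqlite3 module and a hypothetical people table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.execute("insert into people values ('Tom')")

# Same query, three keyword spellings.
upper = conn.execute("SELECT * FROM people").fetchall()
lower = conn.execute("select * from people").fetchall()
mixed = conn.execute("SeLeCt * FrOm people").fetchall()
print(upper == lower == mixed)  # True
conn.close()
```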
Note that this really depends on your SQL database implementation. Oracle folds unquoted identifiers (table and column names) to uppercase; PostgreSQL, on the contrary, folds them to lowercase.
For identifiers (tables, columns, ...) you can prevent your database from being "clever" by double-quoting them.
select "TeST" from MyTable;
is treated by Oracle as SELECT "TeST" FROM MYTABLE; and by PostgreSQL as select "TeST" from mytable;
Also keep this behaviour in mind when using JDBC, for example: the column names retrieved in the ResultSet follow the same rules, so double-quoting identifiers might be good practice if you care about portability.
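The folding rules can be summarized in a few lines. Below is a simplified Python model (a hypothetical helper, not either database's actual parser). Quoted identifiers keep their exact spelling, while unquoted ones are folded to uppercase in Oracle and to lowercase in PostgreSQL.

```python
def stored_name(identifier: str, dialect: str) -> str:
    """Simplified model of how each database resolves an identifier."""
    if len(identifier) >= 2 and identifier[0] == identifier[-1] == '"':
        # Quoted: case is preserved exactly ("" inside quotes is an escaped quote).
        return identifier[1:-1].replace('""', '"')
    if dialect == "oracle":
        return identifier.upper()  # unquoted identifiers fold to uppercase
    if dialect == "postgresql":
        return identifier.lower()  # unquoted identifiers fold to lowercase
    raise ValueError(f"unsupported dialect: {dialect}")


for dialect in ("oracle", "postgresql"):
    print(dialect, stored_name('"TeST"', dialect), stored_name("MyTable", dialect))
# oracle TeST MYTABLE
# postgresql TeST mytable
```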

How to update an SQLite database with a search and replace query?

My SQL knowledge is very limited, especially about SQLite, although I believe this will be some sort of generic query... or maybe not, because of the search and replace...
I have this music database in SQLite that has various fields, of course, but the important ones here are "media_item_id" and "content_url".
Here's an example of a "content_url":
file:///c:/users/nazgulled/music/band%20albums/devildriver/%5b2003%5d%20devildriver/08%20-%20what%20does%20it%20take%20(to%20be%20a%20man).mp3
I'm looking for a query that will search for entries like those, where "content_url" follows that pattern and replace it (the "content_url") with something else.
For instance, a generic "content_url" can be this:
file:///c:/users/nazgulled/music/band%20albums/BAND_NAME/ALBUM_NAME/SONG_NAME.mp3
And I want to replace all these entries with:
file:///c:/users/nazgulled/music/bands/studio%20albums/BAND_NAME/ALBUM_NAME/SONG_NAME.mp3
How can I do it in one query?
P.S: I'm using Firefox SQLite Manager (couldn't find a better and free alternative for Windows).
You are probably looking for the replace function.
For example,
update table_name set
content_url = replace(content_url, 'band%20albums', 'bands/studio%20albums')
where
content_url like '%nazgulled/music/band_20albums/%';
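Note that in the LIKE pattern, _ matches any single character, so band_20albums matches the literal % in band%20albums; a bare % there would be read as a wildcard matching any sequence of characters. More documentation at http://sqlite.org/lang_corefunc.html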
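To try the statement safely before touching the real library, below is a Python sketch that runs it with the standard-library sqlite3 module against an in-memory copy. The table name media and the second row are hypothetical. The first row is the example URL from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (media_item_id INTEGER PRIMARY KEY, content_url TEXT)")
rows = [
    (1, "file:///c:/users/nazgulled/music/band%20albums/devildriver/"
        "%5b2003%5d%20devildriver/08%20-%20what%20does%20it%20take%20(to%20be%20a%20man).mp3"),
    (2, "file:///c:/users/nazgulled/music/other/track.mp3"),  # hypothetical non-match
]
conn.executemany("INSERT INTO media VALUES (?, ?)", rows)

# The UPDATE from the answer; '_' in the LIKE pattern matches the literal '%'.
conn.execute("""
    UPDATE media
    SET content_url = replace(content_url, 'band%20albums', 'bands/studio%20albums')
    WHERE content_url LIKE '%nazgulled/music/band_20albums/%'
""")
conn.commit()

updated = dict(conn.execute("SELECT media_item_id, content_url FROM media"))
for media_id, url in sorted(updated.items()):
    print(media_id, url)
conn.close()
```

Only the first row changes, because the second URL does not match the LIKE pattern.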