I have a table called A with a column called Keywords, of type VARCHAR(255). The Keywords column can contain strings like "TEST, CÃO, ódio" and so on, with or without accents:
ID  Keywords
--  ----------------------------------------
1   TEST, CÃO, ódio, oracle, SQL, açaí
2   Valor, Deputado Rafael, Costelão, estilo
3   São Sebastião, cao, projeto de lei
I'm trying to create a SQL query that compares strings while ignoring Brazilian accents (áéíóúç and so on). So if the user searches for "cao", it should return rows 1 and 3 in the example.
I tried something like:
SELECT keywords
FROM A WHERE UPPER(TRANSLATE(keywords,
'ÁÇÉÍÓÚÀÈÌÒÙÂÊÎÔÛÃÕËÜáçéíóúàèìòùâêîôûãõëü','ACEIOUAEIOUAEIOUAOEUaceiouaeiouaeiouaoeu'))
LIKE UPPER((TRANSLATE('%cao%',
'ÁÇÉÍÓÚÀÈÌÒÙÂÊÎÔÛÃÕËÜáçéíóúàèìòùâêîôûãõëü', 'ACEIOUAEIOUAEIOUAOEUaceiouaeiouaeiouaoeu')));
But it doesn't work.
I also tried using NLS_SORT, but it is Oracle-only, and I need a query that works on both SQL Server and Oracle (it's a client requirement). How can I do that?
One issue is that Microsoft SQL Server did not have a TRANSLATE function until SQL Server 2017. It does now, but since your query doesn't work, you are probably not on that version yet.
You can use nested REPLACE calls instead. This is not difficult, but it is tedious to write. Once it is written and tested, it will be fine.
The Microsoft SQL Server documentation explains this: https://learn.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql
You should also be aware of the character encoding used in Oracle and SQL Server. With the TRANSLATE and REPLACE functions you should be OK, but if you ever transfer data via files, it will be important. I have described more of this at: http://www.thedatastudio.net/dodgy_characters.htm
Here's an example for the first few characters you want to translate:
select
replace
(
replace
(
replace
(
replace
(
'ABÇDÉFGHÍJÁBÇDÉFGHÍJ', 'Á', 'A'
), 'Ç', 'C'
), 'É', 'E'
), 'Í', 'I'
) as clean_keyword;
Just substitute your keyword for 'ABÇDÉFGHÍJÁBÇDÉFGHÍJ'.
The result is:
ABCDEFGHIJABCDEFGHIJ
There is an example on https://learn.microsoft.com/en-us/sql/t-sql/functions/translate-transact-sql too.
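The underlying idea (fold both sides to unaccented text, then compare) is the same whether it is written with TRANSLATE or with nested REPLACE calls. A minimal Python model of it, useful for checking the expected results before writing the SQL for each database; it reuses the exact accent list from the question's TRANSLATE call, and the names `fold` and `search` are illustrative only:

```python
# Accent list copied from the question's TRANSLATE call (20 upper + 20 lower).
ACCENTED = "ÁÇÉÍÓÚÀÈÌÒÙÂÊÎÔÛÃÕËÜáçéíóúàèìòùâêîôûãõëü"
PLAIN = "ACEIOUAEIOUAEIOUAOEUaceiouaeiouaeiouaoeu"
FOLD = str.maketrans(ACCENTED, PLAIN)


def fold(text):
    """Replace accented letters with plain ones, then uppercase (TRANSLATE + UPPER)."""
    return text.translate(FOLD).upper()


def search(rows, term):
    """Return the IDs whose keywords contain `term`, ignoring accents and case.

    Same logic as: UPPER(fold(Keywords)) LIKE UPPER(fold('%' + term + '%')).
    """
    needle = fold(term)
    return [row_id for row_id, keywords in rows if needle in fold(keywords)]


# Sample rows from the question (table A).
ROWS = [
    (1, "TEST, CÃO, ódio, oracle, SQL, açaí"),
    (2, "Valor, Deputado Rafael, Costelão, estilo"),
    (3, "São Sebastião, cao, projeto de lei"),
]

print(search(ROWS, "cao"))  # [1, 3]
```

Note that the folding is applied to both the column and the search term, so searching for "CÃO" gives the same rows as "cao".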
Related
I was reading the article at mssqltips and wanted to try the caret in regex. I understand regex pretty well and use it often, although not much in SQL Server queries.
For the following list of names, I had thought that 1) select * from people where name like '%[^m]%'; would return those names that do not contain 'm'. But it doesn't work like that. I know I can use 2) select * from people where name not like '%m%'; to get the result I want, but I'm baffled as to why 1) doesn't work as expected.
Amy
Jasper
Jim
Kathleen
Marco
Mike
Mitchell
I am using SQL Server 2017, but here is a fiddle:
sql fiddle
'%[^m]%' would be true for any string containing a character that is not m. An expanded version would be '%[Any character not m]%'. Since all of those strings contain a character other than m, they are valid results.
If you had a string like mmm, where name like '%[^m]%' would not return that row.
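The difference between "some character is not m" and "no character is m" can be checked with a small Python model of the two predicates. Regex searches stand in for the LIKE wildcards here, and a case-insensitive collation is assumed (so 'M' counts as 'm'); the function names are illustrative:

```python
import re

NAMES = ["Amy", "Jasper", "Jim", "Kathleen", "Marco", "Mike", "Mitchell"]


def like_not_m(name):
    # LIKE '%[^m]%': true if ANY single character is something other than m
    return re.search(r"[^m]", name, re.IGNORECASE) is not None


def not_like_m(name):
    # NOT LIKE '%m%': true only if NO character is m
    return re.search(r"m", name, re.IGNORECASE) is None


print([n for n in NAMES if like_not_m(n)])  # every name
print([n for n in NAMES if not_like_m(n)])  # ['Jasper', 'Kathleen']
```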
I previously posted this same question for Oracle SQL here, and was given SQL that works.
However, I now need to do the same in a DB2 database, and if I run the same SQL, it errors out.
I need to find rows where the phone number field contains unexpected characters.
Most of the values in this field look like:
123456-7890
This is expected. However, we are also seeing character values in this field such as * and #.
I want to find all rows where these unexpected character values exist.
Expected:
Numbers are expected
Hyphen with numbers is expected (hyphen alone is not)
NULL is expected
Empty is expected
This SQL works in Oracle:
...
WHERE regexp_like(phone_num, '[^ 0123456789-]|^-|-$')
When using the same SQL above in DB2, the statement errors out.
I found it easiest to answer your question by phrasing a regex which matches the positive cases. Then, we can just use NOT to find the negative cases. DB2 supports a REGEXP_LIKE function:
SELECT *
FROM yourTable
WHERE
NOT REGEXP_LIKE(phone_num, '^[0-9]+(-?[0-9]+)*$') AND
COALESCE(phone_num, '') <> '';
Here is a demo of the regex:
Demo
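To compare the question's "negative" Oracle pattern with the answer's "positive" pattern, here is a minimal Python model of both WHERE clauses (Python's `re` stands in for REGEXP_LIKE; the function names are illustrative). They agree on the listed requirements, but they are not identical: the Oracle pattern allows spaces and doubled hyphens, while the positive pattern flags them.

```python
import re

# Question's Oracle pattern: flags any character other than space, digit or
# hyphen, and any leading or trailing hyphen.
BAD = re.compile(r"[^ 0123456789-]|^-|-$")
# Answer's positive pattern: digit groups, optionally joined by single hyphens.
GOOD = re.compile(r"[0-9]+(-?[0-9]+)*")


def flagged_by_negative(phone):
    # regexp_like on NULL is not true, so NULL is never flagged
    return phone is not None and BAD.search(phone) is not None


def flagged_by_positive(phone):
    # NOT REGEXP_LIKE(phone_num, '^...$') AND COALESCE(phone_num, '') <> ''
    if phone is None or phone == "":
        return False
    return GOOD.fullmatch(phone) is None


for value in ["123456-7890", "12345*-7890", "-", "123-", "", None, "12 34"]:
    print(repr(value), flagged_by_negative(value), flagged_by_positive(value))
```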
For newer versions of Db2, regexp is the way to go. If you are on an older version (perhaps that's why you get an error), you can translate all accepted characters to blanks and check whether the result differs from an empty string (trailing blanks are ignored in the comparison). I can't check right now, but from memory, it would be
WHERE TRANSLATE(phone_num, '', '0123456789-')<>''
EDIT:
For what it's worth, your regexp works on V11, so you probably have an older version of Db2. Here is an example of TRANSLATE and regexp side by side:
$ db2 "with t(s) as ( values '123456-7890', '12345*-7890' )
select s, 'regexp' as method from t
where regexp_like(s, '[^ 0123456789-]|^-|-$')
union all
select s, 'translate' as method
from t where TRANSLATE(s, '', '0123456789-')<>''"
S METHOD
----------- ---------
12345*-7890 translate
12345*-7890 regexp
2 record(s) selected.
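Why `TRANSLATE(s, '', '0123456789-') <> ''` works, as I understand Db2's rules: a to-string shorter than the from-string is padded with blanks, so every accepted character becomes a blank, and Db2 pads the shorter operand with blanks when comparing strings, so an all-blank result equals ''. A minimal Python model of that behaviour (`rstrip()` stands in for the blank-padded comparison; the function name is illustrative). Unlike the regex, it does not catch a leading or trailing hyphen:

```python
ACCEPTED = "0123456789-"
# Empty to-string padded with blanks: each accepted character maps to " ".
TO_BLANKS = str.maketrans(ACCEPTED, " " * len(ACCEPTED))


def flagged_by_translate(phone):
    if phone is None:  # TRANSLATE(NULL, ...) is NULL, so the row is not selected
        return False
    # Db2 ignores trailing blanks when comparing with '', modelled by rstrip()
    return phone.translate(TO_BLANKS).rstrip() != ""


print(flagged_by_translate("123456-7890"))  # False
print(flagged_by_translate("12345*-7890"))  # True
print(flagged_by_translate("-123"))         # False: leading hyphen not caught
```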
I have a requirement where I have a table REPLACE_Table. This table would have 2 columns: one would be Original_string and the other would be Replacement_String.
I have a cursor running on the Item_master table. For each record, I need to scan the Item_description column for each REPLACE_Table.Original_string and replace it with the corresponding REPLACE_Table.Replacement_String.
For example, if my REPLACE_Table has these 2 rows:
Original_string Replacement_String
--------------------------------------
LO ##
WO ()
If my first Item_Description is 'HELLO WORLD', then I should get the result as 'HEL## ()RLD'.
I cannot use nested REPLACE calls in SQL because I do not know the number of records in my REPLACE_Table. I cannot use XLATE because this is not a character-to-character replacement.
The only solution I have in mind is to read REPLACE_Table in a loop and keep replacing the Item_Description value using REPLACE in SQL.
Is there any other good solution?
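The loop approach described above is easy to reason about outside SQL. A minimal Python sketch, where the hypothetical `RULES` list stands in for the rows read from REPLACE_Table; it also shows why the loop needs a fixed order (for example an ORDER BY on some column) when one replacement can produce text that another rule matches:

```python
def apply_replacements(text, rules):
    """Apply each (original, replacement) pair in order, like looping over REPLACE_Table."""
    for original, replacement in rules:
        text = text.replace(original, replacement)  # replaces every occurrence, like REPLACE
    return text


RULES = [("LO", "##"), ("WO", "()")]
print(apply_replacements("HELLO WORLD", RULES))  # HEL## ()RLD

# Order matters when rules interact: A->B then B->C turns "A" into "C",
# but B->C then A->B leaves "B".
print(apply_replacements("A", [("A", "B"), ("B", "C")]))  # C
print(apply_replacements("A", [("B", "C"), ("A", "B")]))  # B
```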
Ok, so you're dealing with outputting XML and you're concerned about special characters...
Personally, I'd look at using the CDATA section for any data which might contain special characters...
<name><![CDATA[Mike & Son's Auto]]></name>
is handled by an XML parser just like
<name>Mike &amp; Son's Auto</name>
would be.
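This equivalence is easy to confirm with any XML parser. A minimal check using Python's standard `xml.etree.ElementTree` (used here only as a convenient parser, not as part of the Db2 for i solution): both forms yield the same text, while a bare `&` is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

cdata_text = ET.fromstring("<name><![CDATA[Mike & Son's Auto]]></name>").text
escaped_text = ET.fromstring("<name>Mike &amp; Son's Auto</name>").text
print(cdata_text == escaped_text)  # True

# A bare & is not well-formed, so the parser refuses it.
try:
    ET.fromstring("<name>Mike & Son's Auto</name>")
except ET.ParseError as err:
    print("rejected:", err)
```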
Also consider looking at whatever tools you might be using for web services. Scott Klement's excellent open source HTTP API includes an http_EscapeXml() procedure already.
Failing that, consider using the XMLTEXT() function built into Db2 for i
myText = 'Mike & Son''s Auto';
exec SQL
values (XMLSERIALIZE(XMLTEXT(:myText)
as varchar(50)
excluding XMLDECLARATION
)) into :myXmlText;
Although XMLTEXT() only converts & and < from what I can see...
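For element text, converting only & and < is actually enough: in XML content those are the two mandatory escapes (> only matters inside the sequence ]]>, and quotes only inside attribute values). A minimal Python sketch of that rule, round-tripped through a parser to show the text survives; `escape_text` is an illustrative name, not a Db2 or HTTPAPI function:

```python
import xml.etree.ElementTree as ET


def escape_text(value):
    """Escape only & and < for use as element text.

    & is replaced first so the &lt; produced afterwards is not escaped again.
    """
    return value.replace("&", "&amp;").replace("<", "&lt;")


original = 'Mike & Son\'s "Auto" <Ltd> 2 > 1'
parsed = ET.fromstring("<name>" + escape_text(original) + "</name>").text
print(parsed == original)  # True
```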
I require a SELECT query that adds a space to the data based on the placement of the capital letters, i.e. 'HelpMe' would be displayed as 'Help Me'. Note that I cannot use a stored function to do this; it must be done in the query itself. The data is of variable length and the query must be in SQL. Any help will be appreciated.
Thanks
You need to use a user-defined function for this until Microsoft gives us support for regular expressions. The solution would be something like:
SELECT col1, dbo.RegExReplace(col1, '([A-Z])',' \1') FROM Table
This would produce a leading space, though, which you can remove with TRIM.
Regular expression replace function:
http://connect.microsoft.com/SQLServer/feedback/details/378520
About dbo.RegexReplace you can read at:
TSQL Replace all non a-z/A-Z characters with an empty string
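Since dbo.RegExReplace is a user-defined function rather than a built-in, here is a minimal Python model of the same replace-then-trim step, useful for checking expected output; Python's `re.sub` with a capture group and `\1` backreference plays the role of the UDF, and `space_before_capitals` is an illustrative name:

```python
import re


def space_before_capitals(text):
    # Same idea as dbo.RegExReplace(col1, '([A-Z])', ' \1'), followed by TRIM
    return re.sub(r"([A-Z])", r" \1", text).strip()


print(space_before_capitals("HelpMe"))  # Help Me
```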
If you are using Oracle RDBMS, you can use REGEXP_REPLACE:
SELECT REGEXP_REPLACE('ILikeToWatchCSIMiami',
'([A-Z.])', ' \1')
AS RX_REPLACE
FROM dual
;
Managed to get this output (see SQLFIDDLE), but as you can see, it doesn't handle words such as CSI well: each of its capitals gets its own space.
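One way to keep acronyms together is to split in two passes instead of before every capital: first between a lowercase and an uppercase letter, then between an uppercase letter and an uppercase-plus-lowercase pair. A minimal Python sketch of that idea (the function names are illustrative). Both passes use only capture groups and \1/\2 backreferences, which Oracle's REGEXP_REPLACE also supports, so two nested REGEXP_REPLACE calls should carry over, though I have not run that version against Oracle:

```python
import re


def naive(text):
    # Same pattern as the Oracle answer: a space before every capital (or dot)
    return re.sub(r"([A-Z.])", r" \1", text)


def split_camel(text):
    # Pass 1: lowercase followed by uppercase, e.g. "eT" becomes "e T"
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    # Pass 2: capital followed by a capital+lowercase pair, e.g. "IMi" becomes "I Mi"
    return re.sub(r"([A-Z])([A-Z][a-z])", r"\1 \2", text)


print(repr(naive("ILikeToWatchCSIMiami")))  # ' I Like To Watch C S I Miami'
print(split_camel("ILikeToWatchCSIMiami"))  # I Like To Watch CSI Miami
```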
Suppose there is a table "A" with 2 columns - ID (INT), DATA (VARCHAR(100)).
Executing "SELECT DATA FROM A" results in a table that looks like:
DATA
---------------------
Nowshak 7,485 m
Maja e Korabit (Golem Korab) 2,764 m
Tahat 3,003 m
Morro de Moco 2,620 m
Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)
Mount Kosciuszko 2,229 m
Grossglockner 3,798 m
// the DATA continues...
---------------------
How can I extract only the numerical data using some kind of string processing function in the SELECT SQL query so that the result from a modified SELECT would look like this:
DATA (in INTEGER - not varchar)
---------------------
7485
2764
3003
2620
6960
2229
3798
// the DATA in INTEGER continues...
---------------------
By the way, it would be best if this could be done in a single SQL statement. (I am using IBM DB2 version 9.5)
Thanks :)
I know this thread is old, and the OP doesn't need the answer, but I had to figure this out with a few hints from this and other threads. They all seem to be missing the exact answer.
The easy way to do this is to TRANSLATE all unneeded characters to a single character, then REPLACE that single character with an empty string.
DATA = 'Nowshak 7,485 m'
# removes all non-numeric characters, leaving only numbers
REPLACE(TRANSLATE(TRIM(DATA), '_____________________________________________________________________________________________', ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ`~!@#$%^&*()-_=+\|[]{};:",.<>/?'), '_', '')
=> '7485'
To break down the TRANSLATE command:
TRANSLATE( FIELD or String, <to characters>, <from characters> )
e.g.
DATA = 'Sample by John'
TRANSLATE(DATA, 'XYZ', 'abc')
=> a becomes X, b becomes Y, c becomes Z
=> 'SXmple Yy John'
** Note: I can't speak to performance or version compatibility. I'm on a 9.x version of DB2, and new to the technology. Hope this helps someone.
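A rough Python model of the TRANSLATE/REPLACE trick, handy for checking a from-list before running it. `db2_translate` is a hypothetical helper that mimics Db2's argument order (string, to, from) and its padding of a short to-string with blanks; that padding is why the to-string of underscores must be at least as long as the from-string, otherwise the leftover characters become blanks that REPLACE(..., '_', '') does not remove:

```python
def db2_translate(s, to_chars, from_chars, pad=" "):
    """Model of Db2 TRANSLATE(s, to, from): each character of from_chars is
    replaced by the character at the same position in to_chars; a shorter
    to_chars is padded with `pad` (a blank by default)."""
    to_chars = to_chars.ljust(len(from_chars), pad)
    mapping = {}
    for f, t in zip(from_chars, to_chars):
        mapping.setdefault(f, t)  # this sketch keeps the first mapping of a repeated character
    return "".join(mapping.get(c, c) for c in s)


# Every non-digit character we expect in DATA (same idea as the list above).
NON_DIGITS = (" abcdefghijklmnopqrstuvwxyz"
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "`~!@#$%^&*()-_=+\\|[]{};:\",.<>/?")


def extract_digits(data):
    """REPLACE(TRANSLATE(TRIM(data), '___...', NON_DIGITS), '_', '')."""
    underscores = "_" * len(NON_DIGITS)
    return db2_translate(data.strip(), underscores, NON_DIGITS).replace("_", "")


print(db2_translate("Sample by John", "XYZ", "abc"))  # SXmple Yy John
print(extract_digits("Nowshak 7,485 m"))               # 7485
```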
In Oracle:
SELECT TO_NUMBER(REGEXP_REPLACE(data, '[^0-9]', ''))
FROM a
In PostgreSQL:
SELECT CAST(REGEXP_REPLACE(data, '[^0-9]', '', 'g') AS INTEGER)
FROM a
In MS SQL Server and DB2, you'll need to create UDFs for regular expressions and query like this.
See links for more details.
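The Oracle and PostgreSQL queries share the same logic: delete every non-digit, then convert. A minimal Python model (`re.sub` replaces every match, like Oracle's default and PostgreSQL's 'g' flag). It also shows a case worth handling: a row with no digits gives an empty string, which Oracle treats as NULL, whereas PostgreSQL's CAST of '' to INTEGER raises an error. `to_integer` is an illustrative name:

```python
import re

DATA = [
    "Nowshak 7,485 m",
    "Maja e Korabit (Golem Korab) 2,764 m",
    "Tahat 3,003 m",
    "Morro de Moco 2,620 m",
    "Cerro Aconcagua 6,960 m (located in the northwestern corner of the province of Mendoza)",
    "Mount Kosciuszko 2,229 m",
    "Grossglockner 3,798 m",
]


def to_integer(data):
    """Model of TO_NUMBER(REGEXP_REPLACE(data, '[^0-9]', ''))."""
    digits = re.sub(r"[^0-9]", "", data)  # remove every non-digit character
    return int(digits) if digits else None  # no digits: Oracle yields NULL


print([to_integer(d) for d in DATA])
```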
Doing a quick search online for DB2, the best built-in function I can find is TRANSLATE. It lets you specify a list of characters you want to change to other characters. It's not ideal, but you can specify every character that you want to strip out, that is, every non-numeric character available...
(Yes, that's a long list, a very long list, which is why I say it's not ideal)
REPLACE(TRANSLATE(data, ' ', 'abc...XYZ,./\<>?|[and so on]'), ' ', '')
Alternatively, you need to create a user-defined function to search for the number. There are a few ways to do that:
Check each character one by one and keep it only if it's a numeric.
If you know what precedes the number and what follows the number, you can search for those and keep what is in between...
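The first option, checking each character one by one, is simple enough to sketch. A minimal Python model of what such a UDF body would do (`keep_digits` is an illustrative name; a real Db2 UDF would loop with SUBSTR over the string):

```python
def keep_digits(text):
    """Check each character one by one and keep it only if it is 0-9."""
    kept = []
    for ch in text:
        if "0" <= ch <= "9":  # ASCII digits only; str.isdigit() would also accept other Unicode digits
            kept.append(ch)
    return "".join(kept)


value = keep_digits("Maja e Korabit (Golem Korab) 2,764 m")
print(int(value))  # 2764
```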
To elaborate on Dems's suggestion, the approach I've used is a scalar user-defined function (UDF) that accepts an alphanumeric string, recursively iterates through it (one byte per iteration), and suppresses the non-numeric characters from the output. The recursive expression generates a row per iteration, but only the final row is kept and returned to the calling application.
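Python has no recursive common table expression, so this sketch models the idea with a recursive generator: each call emits one (position, digits so far) "row", non-numeric characters are suppressed, and only the last row is kept. The function names are illustrative; Python's default recursion limit (about 1000) is ample for a VARCHAR(100) value.

```python
def digit_rows(text, pos=0, digits=""):
    """Yield one (position, digits_so_far) row per iteration, like a recursive CTE."""
    yield pos, digits
    if pos < len(text):
        ch = text[pos]
        # keep the character only if it is 0-9, otherwise suppress it
        kept = digits + ch if "0" <= ch <= "9" else digits
        yield from digit_rows(text, pos + 1, kept)


def numeric_only(text):
    """Generate all rows but return only the final row's accumulated digits."""
    rows = list(digit_rows(text))
    return rows[-1][1]


print(numeric_only("Mount Kosciuszko 2,229 m"))  # 2229
```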