searching sql DB table for japanese terms - sql

I have a table with columns that allow for different language formats (using nvarchar) and the problem is that when I try to search for these terms; particularly Japanese/Chinese terms, the typical select statement does not work
select * from jtable where searchterm = 'ろくでなし'
It will return 0 which is incorrect since it is definitely in the table. Someone mentioned using cast(....) but not sure how to do this.

Need an N to make the string literal unicode.
select * from jtable where searchterm = N'ろくでなし'
Without the N the 'ろくでなし' is implicit varchar and is seen as '?????'
See my related answer about khmer text for examples of why: Khmer Unicode, English and Microsoft SQL Server 2008 results in questionmarks

Related

Query to ignore rows which have non hex values within field

Initial situation
I have a relatively large table (ca. 0.7 Mio records) where an nvarchar field "MediaID" contains largely media IDs in proper hexadecimal notation (as they should).
Within my "sequential" query (each query depends on the output of the query before, this is all in pure T-SQL) I have to convert these hexadecimal values into decimal bigint values in order to do further calculations and filtering on these calculated values for the subsequent queries.
--> So far, no problem. The "sequential" query works fine.
Problem
Unfortunately, some of these Media IDs do contain non-hex characters - most probably because there was some typing errors by the people which have added them or through import errors from the previous business system.
Because of these non-hex chars, the whole query fails (of course) because the conversion hits an error.
For my current purpose, such rows must be skipped/ignored as they are clearly wrong and cannot be used (there are no medias / data carriers in use with the current business system which can have non-hex character IDs).
Manual editing of the data is not an option as there are too many errors and it is not clear with what the data must be replaced.
Challenge
To create a query which only returns records which have valid hex values within the media ID field.
(Unfortunately, my SQL skills are not enough to create the above query. Your help is highly appreciated.)
The relevant section of the larger query looks like this (xxxx is where your help comes in :-))
select
pureMediaID
, mediaID
, CUSTOMERID
,CONTRACT_CUSTOMERID
from
(
select concat('0x', Replace(Ltrim(Replace(mediaID, '0', ' ')), ' ', '0')) AS pureMediaID
--, CUSTOMERID
, *
from M_T_CONTRACT_CUSTOMERS
where mediaID is not null
and mediaID like '0%'
and xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
) as inner1
EDIT: As per request I have added here some good and some bad data:
Good:
4335463357
4335459809
1426427996
4335463509
4335515039
4335465134
4427370396
4335415661
4427369036
4335419089
004BB03433
004e7cf9c6
00BD23133
00EE13D8C1
00CCB5522C
00C46522C
00dbbe3433
Bad:
4564589+
AB6B8BFC.8
7B498DFCnm
DB218DFChb
d<tgfh8CFC
CB9E8AFCzj
B458DFCjhl
rytzju8DFC
BFCtdsjshj
DB9888FCgf
9BC08CFCyx
EB198DFCzj
4B628CFChj
7B2B8DFCgg
After I did upgrade the compatibility level of the SQL instance to SQL2016 (it was below 2012 before) I could use try_convert with same syntax as the original convert function as donPablo has pointed out. With that the query could run fully through and every MediaID which is not a correct hex value gets nicely converted into a null value - really, really nice.
Exactly what I needed.
Unfortunately, the solution of ALICE... didn't work out for me as this was also (strangely) returning records which had the "+" character within them.
Edit: The added comment of Alice... where you create a calculated field like this:
CASE WHEN "KEY" LIKE '%[^0-9A-F]%' THEN 0 ELSE 1 end as xyz
and then filter in the next query like this:
where xyz = 1
works also with SQL Instances with compatibility level < SQL 2012.
Great addition for people which still have to work with older SQL instances.
An option (although not ideal in terms of performance) is to check the characters in the MediaID through a case statement and regular expression
Hexadecimals cannot contain characters other than A-F and numbers between 0 and 9
CASE WHEN MediaID LIKE '%[0-9A-F]%' THEN 1 ELSE 0 END
I would recommend writing a function that can be used to evaluate MediaID first and checks if it is hexadecimal and then running the query for conversion

Can I convert centigrade to farenheit in a query (not a function)

In Oracle I can convert centigrade to farenheit in an SQL query, see below. It seems SQL Server does not have full regex functionality. Is it possible to do this without dropping into a function, which I currently do?
(UNISTR('00B0') is the degree symbol we use)
The requirement is for any string that contains [digits]°C to be converted to same string with [new_digits]°F.
SELECT replace(replace(v_text_f,replace(regexp_substr(v_text_f,'\-? [[:digit:]]+\.?[[:digit:]]*'||UNISTR('\00B0')||'C'),UNISTR('\00B0')||'C'),
replace(regexp_substr(v_text_f,'\-?[[:digit:]]+\.?[[:digit:]]*'||UNISTR('\00B0')||'C'),UNISTR('\00B0')||'C')*9/5+32||UNISTR('\00B0')||'F'),
UNISTR('\00B0')||'F'||UNISTR('\00B0')||'C',UNISTR('\00B0') ||'F')
FROM (SELECT '38'||UNISTR('\00B0')||'C' as v_text_f FROM DUAL)
Try this, comparing to Oracle code, extremely simplified version:
DECLARE #C nvarchar(10) = '38'+CHAR(0x00B0)+'C' --38°C
SELECT CONVERT(nvarchar(10),CONVERT(int ,LEFT(#C, CHARINDEX(CHAR(0x00B0), #C)-1))*9/5+32)+CHAR(0x00B0)+'F'
--100°F

Cannot insert character '≤' in SQL Server 2008

I have a SQL Server 2008 database and a nvarchar(256) field of a table. The crazy problem is that when I run this query:
update ruds_values_short_text
set value = '≤ asjdklasd'
where rud_id=12202 and field_code='detection_limit'
and then
select * from ruds_values_short_text
where rud_id=12202 and field_code='detection_limit'
I get this result:
12202 detection_limit = asjdklasd 11
You can see that the character ≤ has been transformed in =
It's an encoding related problem, for sure, in fact, if I try to paste '≤' in Notepad++ it pastes '=' but I get '≤' when I convert ANSI to UTF-8.
So.. I think I should write the query in UTF8.. but how? Thanks.
You need to use the N prefix so the literal is treated as Unicode rather than being treated as character data in the code page of your database's default collation.
update ruds_values_short_text
set value = N'≤ asjdklasd'
where rud_id=12202 and field_code='detection_limit'
Try update ruds_values_short_text set value = N'≤ asjdklasd' where rud_id=12202 and field_code='detection_limit'. The N indicates that you are providing national language so it respects the encoding.

Arabic SQL query (on Oracle DB) returns empty result

I have this query (that runs on Oracle 10g database):
SELECT ge.*, ge.concept AS glossarypivot
FROM s_glossary_entries ge
WHERE (ge.glossaryid = '161' OR ge.sourceglossaryid = '161')
AND (ge.approved != 0 OR ge.userid = 361)
AND concept like 'م%' ORDER BY ge.concept
The query must display all words that begin with the arabic letter "م"
but unfortunately, it returns empty result ..
However, if I run the same query on the same database which runs on MYSQL, it works well and displays the correct result ..
and also, if I run the same query with an english letter (m), like this:
SELECT ge.*, ge.concept AS glossarypivot
FROM s_glossary_entries ge
WHERE (ge.glossaryid = '161' OR ge.sourceglossaryid = '161')
AND (ge.approved != 0 OR ge.userid = 361)
AND concept like 'm%' ORDER BY ge.concept
it displays result correctly and not empty !!
What should I do in order to get this query working the right way on oracle 10 database?
P.S. the oracle database character set is : "AL32UTF8"
thank you so much in advance
Sure that this works in MySQL? I would do this part:
AND concept = 'م'
like this:
AND concept LIKE 'م%'
or because it's arabic and the first char is the right one's like this:
AND concept LIKE '%م'
But i have no idea if Oracle even have LIKE, i never worked with Oracle.
if I put a UTF8 character : " ظ… " instead of the arabic character "م" it will work on oracle ...
The obvious question is, do you have matching data.
You can use SELECT DUMP(concept), DUMP('م') FROM ... to see the bytes that actually form the value. My database gives me 217/133. I believe there are some characters which can have different bytes in UTF-8 but the same physical appearance, though I couldn't say whether this is one of them.
Also, consult the Globalization guide.
i thjink its a mismatch in your oracle client codepage. it should be defined in the same character set as the database, otherwise there will be some character conversion.

Regular expressions inside SQL Server

I have stored values in my database that look like 5XXXXXX, where X can be any digit. In other words, I need to match incoming SQL query strings like 5349878.
Does anyone have an idea how to do it?
I have different cases like XXXX7XX for example, so it has to be generic. I don't care about representing the pattern in a different way inside the SQL Server.
I'm working with c# in .NET.
You can write queries like this in SQL Server:
--each [0-9] matches a single digit, this would match 5xx
SELECT * FROM YourTable WHERE SomeField LIKE '5[0-9][0-9]'
stored value in DB is: 5XXXXXX [where x can be any digit]
You don't mention data types - if numeric, you'll likely have to use CAST/CONVERT to change the data type to [n]varchar.
Use:
WHERE CHARINDEX(column, '5') = 1
AND CHARINDEX(column, '.') = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
CHARINDEX
ISNUMERIC
i have also different cases like XXXX7XX for example, so it has to be generic.
Use:
WHERE PATINDEX('%7%', column) = 5
AND CHARINDEX(column, '.') = 0 --to stop decimals if needed
AND ISNUMERIC(column) = 1
References:
PATINDEX
Regex Support
SQL Server 2000+ supports regex, but the catch is you have to create the UDF function in CLR before you have the ability. There are numerous articles providing example code if you google them. Once you have that in place, you can use:
5\d{6} for your first example
\d{4}7\d{2} for your second example
For more info on regular expressions, I highly recommend this website.
Try this
select * from mytable
where p1 not like '%[^0-9]%' and substring(p1,1,1)='5'
Of course, you'll need to adjust the substring value, but the rest should work...
In order to match a digit, you can use [0-9].
So you could use 5[0-9][0-9][0-9][0-9][0-9][0-9] and [0-9][0-9][0-9][0-9]7[0-9][0-9][0-9]. I do this a lot for zip codes.
SQL Wildcards are enough for this purpose. Follow this link: http://www.w3schools.com/SQL/sql_wildcards.asp
you need to use a query like this:
select * from mytable where msisdn like '%7%'
or
select * from mytable where msisdn like '56655%'