SQL: finding rows that only contain characters from a certain Unicode range

I recently asked how to get rows that contain characters in a certain Unicode range.
SELECT *
FROM #kanjinames
WHERE UNICODE(LEFT(ForeNames, 1)) BETWEEN 0x4e00 AND 0x9fff
A very helpful user shared the above with me. As I understand it, the query checks the leftmost character and returns the row if that character is within the Unicode range. My testing suggests this works.
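To make that behaviour concrete, here is a small Python sketch (not T-SQL) that mirrors the predicate. It looks only at the code point of the first character, as UNICODE(LEFT(ForeNames, 1)) does. The helper name is hypothetical. Treating an empty value as "no match" is also an assumption of the sketch.

```python
# Hypothetical helper mirroring: UNICODE(LEFT(value, 1)) BETWEEN 0x4E00 AND 0x9FFF
CJK_START, CJK_END = 0x4E00, 0x9FFF

def first_char_in_range(value: str) -> bool:
    if not value:
        # An empty value has no first character; treated as "no match" here (assumption)
        return False
    # Only the first character's code point is compared, like LEFT(value, 1)
    return CJK_START <= ord(value[0]) <= CJK_END

print(first_char_in_range("石山コンタクトレンズ"))  # True, because only 石 is checked
print(first_char_in_range("コンタクト"))          # False
```

This is why a value that merely starts with a kanji passes the original query.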
My current problem is how to check that the entire column value is within the range. For example:
石山コンタクトレンズ
The value above contains characters outside the range (only the first two characters are within it). The query above still returns it, because it checks only the first character. I am not sure how to check the entire field. I am aware of using something like
NOT LIKE N'%[^a-z]%'
for the English alphabet; I'm just not sure how to apply it in this situation.
Any help would be great on this.

I think this will work:
SELECT *
FROM #kanjinames
WHERE ForeNames NOT LIKE N'%[^' + NCHAR(0x4e00) + N'-' + NCHAR(0x9fff) + N']%';
That is, it keeps only strings that contain no character outside that range.
Edit: I had to alter this slightly to get it to work, using the decimal values instead of the hex literals:
SELECT *
FROM #kanjinames
WHERE ForeNames NOT LIKE N'%[^' + NCHAR(19968) + N'-' + NCHAR(40959) + N']%'; -- 19968 = 0x4E00, 40959 = 0x9FFF
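This still returns blank values, because an empty string contains no character outside the range and so satisfies the NOT LIKE condition. I removed those separately.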
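The NOT LIKE '%[^...]%' idea can be sketched in Python as "every character is inside the range". The sketch compares raw code points. In SQL Server, LIKE bracket ranges may follow the collation's sort order instead, so a binary collation (for example COLLATE Latin1_General_BIN2) may be needed to get the same strict code-point behaviour. The helper name is hypothetical.

```python
# Hypothetical helper mirroring: value NOT LIKE N'%[^<start>-<end>]%' using code points
CJK_START, CJK_END = 0x4E00, 0x9FFF

def all_chars_in_range(value: str) -> bool:
    # "No character outside the range" is the same as "every character inside it";
    # all() over an empty string is True, which is why blank values also match
    return all(CJK_START <= ord(ch) <= CJK_END for ch in value)

print(all_chars_in_range("石山"))                # True
print(all_chars_in_range("石山コンタクトレンズ"))  # False
print(all_chars_in_range(""))                  # True
```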

Related

How to search for separated values in columns from a merged values column

I have a database where the data I need to work with is stored in two different columns. I also need to import an Excel file, and in this file the data is combined into a single value, separated only by a dash. So I need to figure out either how to create a query (maybe with an alias) or how to split the value on the dash and then query with the split data.
The code I was trying was the following:
SELECT
CAST(dbo_predios.codigo_manzana_predio AS nvarchar(55)) + '-'
+ CAST(dbo_predios.codigo_lote_predio AS nvarchar(55)) AS ROL_AVALUO
FROM dbo_predios
WHERE ROL_AVALUO like '%9132-2%'
That is one approach I tried, but I don't know how to split a value on a given symbol. The data in the Excel file comes in exactly the same form as the value in the LIKE portion of the code.
I believe this is what you are after from the sounds of it:
SELECT
    [locateDashInString] = CHARINDEX('-', e.FieldHere) --just showing you where it finds the dash (0 if there is none)
    ,[SubstringBeforeItemLocated] =
        SUBSTRING(
            e.FieldHere --string to take the substring from
            ,0 --start position 0: SQL Server then returns (length - 1) characters, i.e. everything before the dash
            ,CHARINDEX('-', e.FieldHere) --length: the position of the dash
        )
    ,[SubstringAfterItemLocated] =
        SUBSTRING(
            e.FieldHere --string to take the substring from
            ,CHARINDEX('-', e.FieldHere) + 1 --start just after the dash
            ,LEN(e.FieldHere) --length: long enough to reach the end of the string
        )
FROM ExcelImportedDataTable e
The locateDashInString column only shows you where it finds the '-' symbol; you don't actually need it. The other two columns split the value, so '9132-2' becomes two values in two columns.
Just note that this only works if the data is always in the format val1-val2. As long as the format is consistent, it should be fine.
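The split can be sketched in Python to show exactly what the two SUBSTRING columns return, including the edge case with no dash. The function name is hypothetical. The sketch mirrors SQL Server's rule that SUBSTRING with a start below 1 returns max(start + length - 1, 0) characters from the beginning.

```python
def split_on_dash(value: str) -> tuple[str, str]:
    # Hypothetical helper mirroring the two SUBSTRING/CHARINDEX columns above
    pos = value.find("-") + 1          # CHARINDEX is 1-based and returns 0 when not found
    # SUBSTRING(value, 0, pos) returns max(pos - 1, 0) characters from the start
    before = value[: max(pos - 1, 0)]
    # SUBSTRING(value, pos + 1, LEN(value)) returns everything after the dash
    after = value[pos:]
    return before, after

print(split_on_dash("9132-2"))  # ('9132', '2')
```

With no dash, the "before" part is empty and the "after" part is the whole value. That is why the val1-val2 format matters. The two parts could then be compared with codigo_manzana_predio and codigo_lote_predio respectively.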

Sort a VARCHAR column in SQL Server that contains numbers?

I have a column whose data contains numbers along with letters.
For example:
1 name
2 names ....
100 names
When I sort this data, it is not sorted correctly (sorted as text, '100 names' comes before '2 names'). How can I fix this? I wrote a query, but it doesn't sort correctly:
select name_subagent
from Subagent
order by
case IsNumeric(name_subagent)
when 1 then Replicate('0', 100 - Len(name_subagent)) + name_subagent
else name_subagent
end
This should work:
select name_subagent
from Subagent
order by CAST(LEFT(name_subagent, PATINDEX('%[^0-9]%', name_subagent + 'a') - 1) as int)
This expression finds the first character in the string that is not a digit (appending 'a' guarantees there is one). It then treats everything before that position as a number.
You will need to adapt this statement to your needs as apparently your data is not in Latin characters.
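The sort key can be sketched in Python to show the difference between text order and numeric order. The helper name is hypothetical. Returning 0 when there are no leading digits mirrors CAST('' AS int), which yields 0 in SQL Server.

```python
def leading_number(value: str) -> int:
    # Hypothetical helper mirroring CAST(LEFT(v, PATINDEX('%[^0-9]%', v + 'a') - 1) AS int)
    padded = value + "a"  # guarantees at least one non-digit, as in the SQL
    first_non_digit = next(i for i, ch in enumerate(padded) if ch not in "0123456789")
    digits = value[:first_non_digit]
    return int(digits) if digits else 0  # CAST('' AS int) gives 0

names = ["100 names", "2 names", "1 name"]
print(sorted(names))                      # ['1 name', '100 names', '2 names']
print(sorted(names, key=leading_number))  # ['1 name', '2 names', '100 names']
```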
With a bit of tweaking you should be able to achieve exactly what you're looking for:
select
name_subagent
from
Subagent
order by
CAST(SUBSTRING(name_subagent,0,PATINDEX('%[A-Z]%',name_subagent)) as numeric)
Note the '%[A-Z]%' expression: it only finds the first letter in the string, and everything before that letter is cast to a number.
I'm not considering special characters such as '!', '#' and so on. This is the bit you might want to play around with and adapt to your needs.

How to get a substring with REGEXP_SUBSTR

My table has rows containing values like
sample>test Y10,
Sample> y21
I want to extract a substring like y10 or y21 from every row. How can I do this? I tried REGEXP_SUBSTR and INSTR but was not able to find the solution.
I am assuming that the string in your column is divided by single spaces.
This gives you the last token, i.e. everything after the last ' ' (space):
substr(your_string, instr(your_string, ' ', -1) + 1)
Or you can achieve this using REGEXP_SUBSTR, anchoring the pattern to the end of the string:
regexp_substr(your_string, '[^[:space:]]+$')
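Both "last token" ideas can be sketched in Python and checked against the two sample rows. The function names are hypothetical. This is an illustration of the logic, not Oracle code.

```python
import re
from typing import Optional

def last_token_by_position(value: str) -> str:
    # Everything after the last space; rfind returns -1 when there is no space,
    # so the whole string is returned in that case
    return value[value.rfind(" ") + 1:]

def last_token_by_regex(value: str) -> Optional[str]:
    # Trailing run of non-whitespace characters, anchored to the end with $
    match = re.search(r"\S+$", value)
    return match.group(0) if match else None

for row in ["sample>test Y10,", "Sample> y21"]:
    print(last_token_by_position(row), last_token_by_regex(row))
```

Note that the trailing comma in the first sample row is kept, since it is part of the last space-separated token.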
Assuming that yxx is always preceded by a space, it should be as easy as doing this:
TRIM(REGEXP_SUBSTR(mycolumn, ' y\d+', 1, 1, 'i'))
The regular expression above matches a space followed by y, and then one or more digits. The 'i' match parameter makes it case-insensitive, so it matches Y as well. TRIM then removes the leading space. If you want to match exactly two digits, replace \d+ with \d{2}.
Also, please note that it will get the first occurrence only. Getting multiple occurrences is a bit more complicated, but it can still be done.
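The difference between the first occurrence and all occurrences can be sketched with Python's re module, using the same pattern and case-insensitive flag. The helper names are hypothetical. In Oracle, "all occurrences" would correspond to calling REGEXP_SUBSTR with occurrence 1, 2, 3, and so on.

```python
import re

PATTERN = re.compile(r" y\d+", re.IGNORECASE)  # same pattern and 'i' flag as above

def first_match(value):
    # Like TRIM(REGEXP_SUBSTR(value, ' y\d+', 1, 1, 'i')): first occurrence only
    m = PATTERN.search(value)
    return m.group(0).strip() if m else None

def all_matches(value):
    # Every occurrence, trimmed of the leading space
    return [m.strip() for m in PATTERN.findall(value)]

print(first_match("sample>test Y10,"))   # Y10
print(first_match("Sample> y21"))        # y21
print(all_matches("a Y10 b y21 c y3"))   # ['Y10', 'y21', 'y3']
```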

Query for blank white space before AND after a number string

How would I construct a query that returns all material numbers that have a blank space either BEFORE or AFTER the number string? We export straight from SSMS to Excel, and we see the problem in the spreadsheet. If I could return all of the material numbers with spaces, I could edit them or do a REPLACE to fix the issue before exporting. (The material numbers are imported via a Windows application to which users upload an Excel template. This template has all of this data, and sometimes users put spaces before or after the material number.) The query we have used to work, but now it does not return anything. Yet upon export we identify the problems you see highlighted in the screenshot (left). When we query for that material number in the table (right screenshot), it does indeed have a space before the 1.
Currently the query we use looks like:
SELECT Mtrl
FROM dbo.Source
WHERE Mtrl LIKE '% %'
Since you are getting the data from a query, you should just have that query remove any potential spaces using LTRIM and RTRIM:
LTRIM(RTRIM([MTRL]))
Keep in mind that these two functions remove only spaces, not tabs, carriage returns, or other white-space characters.
Doing this ensures that the entire result set is clean, whether or not you find and fix the affected rows.
Or, since you are copying and pasting from the Results Grid into Excel, you can just CONVERT the value to a number, which naturally removes any spaces:
SELECT CONVERT(INT, ' 12 ');
Returns:
12
So you would just use:
CONVERT(INT, [Mtrl])
Now, if you want to find the data that has anything that is not a digit in it, you would use this:
SELECT Mtrl
FROM dbo.Source
WHERE [Mtrl] LIKE '%[^0-9]%'; -- any single non-digit character
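A Python sketch of what the LIKE '%[^0-9]%' test accepts may help. The helper name is hypothetical. It uses an explicit set of ASCII digits because Python's str.isdigit() also accepts other Unicode digits such as '²'. The sketch assumes only 0-9 should count as digits.

```python
DIGITS = set("0123456789")

def has_non_digit(value: str) -> bool:
    # Like value LIKE '%[^0-9]%': true if any single character is not 0-9
    return any(ch not in DIGITS for ch in value)

print(has_non_digit(" 1234"))  # True (leading space)
print(has_non_digit("1234"))   # False
```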
If the issue is with non-space white-space characters, you can find out which ones they are via the following (to find them at the beginning instead of at the end, change the RIGHT to be LEFT):
;WITH cte AS
(
SELECT UNICODE(RIGHT([MTRL], 1)) AS [CharVal]
FROM dbo.Source
)
SELECT *
FROM cte
WHERE cte.[CharVal] NOT BETWEEN 48 AND 57 -- digits 0 - 9
AND cte.[CharVal] <> 32; -- space
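The CTE's diagnostic can be sketched in Python with a few sample values. The function name and the sample values are hypothetical. Empty values are simply skipped here, since they have no last character.

```python
def unexpected_trailing_codes(values):
    """Mirror the CTE: code point of the last character, excluding digits and space."""
    codes = []
    for value in values:
        if not value:
            continue                      # no last character to inspect
        code = ord(value[-1])             # like UNICODE(RIGHT(value, 1))
        if not (48 <= code <= 57) and code != 32:
            codes.append(code)
    return codes

print(unexpected_trailing_codes(["1234", "1234 ", "1234\t", "1234\u00a0"]))  # [9, 160]
```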
And you can fix in one shot using the following, which removes regular spaces (char 32 via LTRIM/RTRIM), tabs (char 9), and non-breaking spaces (char 160):
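UPDATE src
SET src.[Mtrl] = LTRIM(RTRIM(
        REPLACE(
            REPLACE(
                src.[Mtrl],
                CHAR(160), -- non-breaking space
                ''),
            CHAR(9), -- tab
            '')
        )) -- trim last, so spaces next to a removed tab or NBSP are trimmed too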
FROM dbo.Source src
WHERE src.[Mtrl] LIKE '%[' -- find rows with any of the following characters
+ CHAR(9) -- tab
+ CHAR(32) -- space
+ CHAR(160) -- non-breaking space
+ ']%';
Here I used the same kind of WHERE condition you have, matching anywhere in the string. Since there shouldn't be any spaces at all, it doesn't matter whether you check both ends or the whole value. A single LIKE may also be faster than two.
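The order of the clean-up steps matters, which a short Python sketch can show. The helper names are hypothetical. Removing tabs and non-breaking spaces first and trimming ordinary spaces last catches a space that sits behind a leading tab. Trimming first misses that space, because the tab shields it from the trim.

```python
def clean_material_number(value: str) -> str:
    # Drop non-breaking spaces (160) and tabs (9) anywhere in the value...
    without_special = value.replace("\u00a0", "").replace("\t", "")
    # ...then trim ordinary spaces from both ends, like LTRIM(RTRIM(...))
    return without_special.strip(" ")

def trim_then_remove(value: str) -> str:
    # Opposite order: the trim runs while the tab still shields the space
    return value.strip(" ").replace("\u00a0", "").replace("\t", "")

print(repr(clean_material_number("\t 1234 ")))  # '1234'
print(repr(trim_then_remove("\t 1234 ")))       # ' 1234'
```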

SQL query for an alphanumeric ID in hex

I want to be able to differentiate between a string that is alphanumeric and a string that is in hex format.
My current query is:
<columnName> LIKE '?_____=' + REPLICATE('[0-9A-Fa-f]',16)
I found this method of searching for hex IDs online and thought it was working. However, a significantly larger sample shows a high false-positive rate in my results. The query gives me all the results I do want, but also a bunch of results I don't care about. For example:
I want to see:
<url>.php?mains=d7ad916d1c0396ff
but I don't want to see:
<url>.php?mblID=2007012422060265
The difference between the two strings is that in the second one, the 16 characters at the end are all numeric rather than a hex ID. What are some ways you limit the results to hex IDs only? Thanks in advance.
UPDATE:
Juergen brought up a good point: the second number could be a hex value too, since not all hex numbers contain the letters A-F. I would like to rephrase the question: I am looking for an ID that contains both letters and numbers, not just numbers.
The simplest way is just to add a separate clause for that restriction:
<columnName> LIKE '?_____=' + REPLICATE('[0-9A-Fa-f]',16)
AND <columnName> NOT LIKE '?_____=' + REPLICATE('[0-9]',16)
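The two LIKE clauses can be sketched in Python with regular expressions. Because neither pattern starts or ends with %, LIKE must match the whole value, so the sketch uses fullmatch. It assumes, as the pattern implies, that the column holds just the query-string part, such as ?mains=d7ad916d1c0396ff. The names are hypothetical.

```python
import re

HEX_ID = re.compile(r"\?.{5}=[0-9A-Fa-f]{16}")  # '?_____=' + REPLICATE('[0-9A-Fa-f]', 16)
DIGIT_ID = re.compile(r"\?.{5}=[0-9]{16}")      # '?_____=' + REPLICATE('[0-9]', 16)

def is_mixed_hex_id(value: str) -> bool:
    # Matches the hex pattern, but not the all-digits pattern
    return bool(HEX_ID.fullmatch(value)) and not DIGIT_ID.fullmatch(value)

print(is_mixed_hex_id("?mains=d7ad916d1c0396ff"))  # True
print(is_mixed_hex_id("?mblID=2007012422060265"))  # False
```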
It should be fairly simple to determine if a string contains only numbers...
Setting up a test table:
CREATE TABLE #Temp (Data varchar(200) not null)
INSERT #Temp
values ('<url>.php?mains=d7ad916d1c0396ff')
,('<url>.php?mblID=2007012422060265')
Write a query:
SELECT
right(Data, 16) StringToCheck
,isnumeric(right(Data, 16)) IsNumeric
from #Temp
Get results:
StringToCheck IsNumeric
d7ad916d1c0396ff 0
2007012422060265 1
So, if the IsNumeric function returns 0, it could be a hex string.
This makes several assumptions:
The rightmost 16 characters are what you want to check
You only ever check 16 characters. I don't know at what length the string would get too long to check.
A non-numeric character means hex. Any chance of "Q" or "~" being embedded in the string?
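ISNUMERIC is a loose test: it returns 1 for some strings that are not plain digits, such as '1e5'. A hex ID whose only letter is an 'e' could therefore be classified as numeric. The following Python sketch is a stricter variant of the same idea. It takes the rightmost 16 characters and requires them to be hex digits that are not all decimal digits. The helper name is hypothetical, and the sketch keeps the answer's assumption that the ID is the last 16 characters.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
DIGITS = set("0123456789")

def looks_like_hex_id(data: str) -> bool:
    candidate = data[-16:]                   # like RIGHT(Data, 16)
    return (
        len(candidate) == 16
        and set(candidate) <= HEX_DIGITS     # only hex digits...
        and not set(candidate) <= DIGITS     # ...and not only decimal digits
    )

print(looks_like_hex_id("<url>.php?mains=d7ad916d1c0396ff"))  # True
print(looks_like_hex_id("<url>.php?mblID=2007012422060265"))  # False
```

This also answers the "Q" or "~" concern: any character outside 0-9 and A-F makes the check fail.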