Finding Unicode Characters in String Field with Regex in Oracle SQL - sql

I have a string field (comments) that contains a user id such as 'THOMASAN'. However, the string field is dynamic and can have a plethora of things written in it. But it always has the pattern 'UserID'. I am trying to use the REGEXP_SUBSTR function in Oracle SQL to pull the name out.
I have tried REGEXP_SUBSTR(comments,'[A-Z]*') but it brings back null. In a string field how do I pull out this userid?

UPDATE:
For the specific unicode you mentioned 
with cte as ( SELECT ' the left padding thomsan the right padding' comments FROM dual),
cte2 as (select ASCIISTR(upper(comments)) cmt from cte)
SELECT replace(regexp_substr( cmt, '\F7FD[A-Z]+', 1), 'F7FD','') userid from cte2;

Related

Get total number of user where username have defferrent case

I have SQL table where username have different cases for example "ACCOUNTS\Ninja.Developer" or "ACCOUNTS\ninja.developer"
I want to find the how many records where username where first in first and last name capitalize ? how can use Regex to find the total ?
x table
User
"ACCOUNTS\James.McAvoy"
"ACCOUNTS\michael.fassbender"
"ACCOUNTS\nicholas.hoult"
"ACCOUNTS\Oscar.Isaac"
Do you want something like this?
select count(*)
from t
where name rlike 'ACCOUNTS\[A-Z][a-z0-9]*[.][A-Z][a-z0-9]*'
Of course, different databases implement regular expressions differently, so the actual comparator may not be rlike.
In SQL Server, you can do:
select count(*)
from t
where name like 'ACCOUNTS\[A-Z][^.][.][A-Z]%';
You might need to be sure that you have a case-sensitive collation.
In most cases in MS SQL string collation is case insensitive so we need some trick. Here is an example:
declare #accts table(acct varchar(100))
--sample data
insert #accts values
('ACCOUNTS\James.McAvoy'),
('ACCOUNTS\michael.fassbender'),
('ACCOUNTS\nicholas.hoult'),
('ACCOUNTS\Oscar.Isaac')
;with accts as (
select
--cleanup and split values
left(replace(acct,'ACCOUNTS\',''),charindex('.',replace(acct,'ACCOUNTS\',''),0)-1) frst,
right(replace(acct,'ACCOUNTS\',''),charindex('.',replace(acct,'ACCOUNTS\',''),0)) last
from #accts
)
,groups as (--add comparison columns
select frst, last,
case when CAST(frst as varbinary(max)) = CAST(lower(frst) as varbinary(max)) then 'lower' else 'Upper' end frstCase, --circumvert case insensitive
case when CAST(last as varbinary(max)) = CAST(lower(last) as varbinary(max)) then 'lower' else 'Upper' end lastCase
from accts
)
--and gather fruit
select frstCase, lastCase, count(frst) cnt
from groups
group by frstCase,lastCase
Your question is a little vague but;
You might be looking for the DISTINCT command.
REF
I don't think you need regex.
Maybe do something like:
Get distinct names from Table X as Table A
Use inputs table A as where clause on Table X
count
union
I hope this helps,
Rhys
Given your example set you can use a combination of techniques. First if the user name always begins with "ACCOUNTS\" then you can use substr to select the characters that start after the "\" character.
For the first name:
Then you can use a regex function to see if it matches against [A-Z] or [a-z] assuming your username must start with an alpha character.
For the last name:
Use the instr function on the substr and search for the character '.' and again apply the regex function to match against [A-Z] or [a-z] to see if the last name starts with an upper or a lower character.
To total:
Select all matches where both first and last match against upper and do a count. Repeat for the lower matches and you'll have both totals.

SQL: Finding dynamic length characters in a data string

I am not sure how to do this, but I have a string of data. I need to isolate a number out of the string that can vary in length. The original string also varies in length. Let me give you an example. Here is a set of the original data string:
:000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:
:000000000:715186816:P:000001996:::H1009671:H1009671:
For these two examples, I need 3SA70000SUPPL from the first and H1009671 from the second. How would I do this using SQL? I have heard that case statements might work, but I don't see how. Please help.
This works in Oracle 11g:
with tbl as (
select ':000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:' str from dual
union
select ':000000000:715186816:P:000001996:::H1009671:H1009671:' str from dual
)
select REGEXP_SUBSTR(str, '([^:]*)(:|$)', 1, 8, NULL, 1) data
from tbl;
Which can be described as "look at the 8th occurrence of zero or more non-colon characters that are followed by a colon or the end of the line, and return the 1st subgroup (which is the data less the colon or end of the line).
From this post: REGEX to select nth value from a list, allowing for nulls
Sorry, just saw you are using DB2. I don't know if there is an equivalent regular expression function, but maybe it will still help.
For the fun of it: SQL Fiddle
first substring gets the string at ::: and second substring retrieves the string starting from ::: to :
declare #x varchar(1024)=':000000000:715186816:P:000001996:::H1009671:H1009671:'
declare #temp varchar(1024)= SUBSTRING(#x,patindex('%:::%', #x)+3, len(#x))
SELECT SUBSTRING( #temp, 0,CHARINDEX(':', #temp, 0))

How do I modify these group of functions to get everything left of #?

I have a column that has values stored in the following format:
name#URL
All data is stored with this and a second # is never present.
I've got the following statement that strips the URL from this column:
SELECT SUBSTRING ( wf_name ,PATINDEX ( '%#%' , wf_name )+1 , LEN(wf_name)-(PATINDEX ( '%#%' , wf_name )) )
However I want to take the name also (everything left of the #). Unfortuantely I don't understand the functions I'm using above (having read the documentation I'm still confused). Could somebody please help me to understand the flow and how I can adjust this to get everything left of #?
Have a look at the following example
; WITH Table1 AS (
SELECT 'TADA#TEST' AS NameURL
)
SELECT *,
LEFT(NameURL,PATINDEX('%#%',NameURL) - 1) LeftText,
RIGHT(NameURL,PATINDEX('%#%',NameURL)- 1) RightText
FROM Table1
SQL Fiddle DEMO
Using functions
PATINDEX (Transact-SQL)
Returns the starting position of the first occurrence of a pattern in
a specified expression, or zeros if the pattern is not found, on all
valid text and character data types.
LEFT (Transact-SQL)
Returns the left part of a character string with the specified number
of characters.
RIGHT (Transact-SQL)
Returns the right part of a character string with the specified number
of characters.

Oracle: a query, which counts occurrences of all non alphanumeric characters in a string

What would be the best way to count occurrences of all non alphanumeric characters that appear in a string in an Oracle database column.
When attempting to find a solution I realised I had a query that was unrelated to the problem, but I noticed I could modify it in the hope to solve this problem. I came up with this:
SELECT COUNT (*), SUBSTR(TITLE, REGEXP_INSTR(UPPER(TITLE), '[^A-Z,^0-9]'), 1)
FROM TABLE_NAME
WHERE REGEXP_LIKE(UPPER(TITLE), '[^A-Z,^0-9]')
GROUP BY SUBSTR(TITLE, REGEXP_INSTR(UPPER(TITLE), '[^A-Z,^0-9]'), 1)
ORDER BY COUNT(*) DESC;
This works to find the FIRST non alphanumeric character, but I would like to count the occurrences throughout the entire string, not just the first occurrence. E. g. currently my query analysing "a (string)" would find one open parenthesis, but I need it to find one open parenthesis and one closed parenthesis.
There is an obscure Oracle TRANSLATE function that will let you do that instead of regexp:
select a.*,
length(translate(lower(title),'.0123456789abcdefghijklmnopqrstuvwxyz','.'))
from table_name a
Try this:
SELECT a.*, LENGTH(REGEXP_REPLACE(TITLE, '[^a-zA-Z0-9]'), '')
FROM TABLE_NAME a
The best option, as you discovered is to use a PL/SQL procedure. I don't think there's any way to create a regex expression that will return multiple counts like you're expecting (at least, not in Oracle).
One way to get around this is to use a recursive query to examine each character individually, which could be used to return a row for each character found. The following example will work for a single row:
with d as (
select '(1(2)3)' as str_value
from dual)
select char_value, count(*)
from (select substr(str_value,level,1) as char_value
from d
connect by level <= length(str_value))
where regexp_instr(upper(char_value), '[^A-Z,^0-9]'), 1) <> 0
group by char_value;

Reading a part of a alpha numeric string in SQL

I have a table with one column " otname "
table1.otname contains multiple rows of alpha-numeric string resembling the following data sample:
11.10.32.12.U.A.F.3.2.21.249.1
2001.1.1003.8281.A.LE.P.P
2010.1.1003.8261.A.LE.B.B
I want to read the fourth number in every string ( part of the string in bold ) and write a query in Oracle 10g
to read its description stored in another table. My dilemma is writing the first part of the query.i.e. choosing the fourth number of every string in a table
My second query will be something like this:
select description_text from table2 where sncode = 8281 -- fourth part of the data sample in every string
Many thanks.
novice
Works with 9i+:
WITH portion AS (
SELECT SUBSTR(t.otname, INSTR(t.otname, ".", 1, 3)+1, INSTR(t.otname, ".", 1, 4)) 'sncode'
FROM TABLE t)
SELECT t.description_text
FROM TABLE2 t
JOIN portion p ON p.sncode = t.sncode
The use of SUBSTR should be obvious; INSTR is being used to find location the period (.), starting at the first character in the string (parameter value 1), on the 3rd and 4th appearance in the string. You might have to subtract one from the position returned for the 4th instance of the period - test this first to be sure you're getting the right values:
SELECT SUBSTR(t.otname, INSTR(t.otname, ".", 1, 3)+1, INSTR(t.otname, ".", 1, 4)) 'sncode'
FROM TABLE t
I used subquery factoring so the substring happens before you join to the second table. It can be done as a subquery, but subquery factoring is faster.
Newer versions of oracle (including 10g) have various regular expression functions. So you can do something like this:
where sncode = to_number(regexp_replace(otname, '^(\d+\.\d+\.\d+\.(\d+))?.+$', '\2'))
This matches 3 sets of digits-followed-by-a-dot, and a fourth grouped set of digits, followed by the rest of the string, and returns a string consisting of all that entirely replaced by the first group (the fourth set of digits).
Here's a complete query (if I understood your description of the two tables correctly):
select t2.description_text
from table1 t1, table2 t2
where t2.sncode = to_number(regexp_replace(t1.otname, '^(\d+\.\d+\.\d+\.(\d+))?.+$', '\2'))
Another slightly shorter alternative regex:
where t2.sncode = to_number(regexp_replace(t1.otname, '^((\d+\.){3}(\d+))?.+$', '\3'))