SQL: Finding dynamic length characters in a data string - sql

I am not sure how to do this, but I have a string of data. I need to isolate a number out of the string that can vary in length. The original string also varies in length. Let me give you an example. Here is a set of the original data string:
:000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:
:000000000:715186816:P:000001996:::H1009671:H1009671:
For these two examples, I need 3SA70000SUPPL from the first and H1009671 from the second. How would I do this using SQL? I have heard that case statements might work, but I don't see how. Please help.

This works in Oracle 11g:
with tbl as (
select ':000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:' str from dual
union
select ':000000000:715186816:P:000001996:::H1009671:H1009671:' str from dual
)
select REGEXP_SUBSTR(str, '([^:]*)(:|$)', 1, 8, NULL, 1) data
from tbl;
This can be read as: "look at the 8th occurrence of zero or more non-colon characters that are followed by a colon or the end of the line, and return the 1st subgroup" (which is the data without the trailing colon or end of line).
From this post: REGEX to select nth value from a list, allowing for nulls
Sorry, just saw you are using DB2. I don't know if there is an equivalent regular expression function, but maybe it will still help.
For the fun of it: SQL Fiddle
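To check the "8th occurrence, 1st subgroup" reading outside the database, here is a minimal Python sketch that applies the same pattern with the standard re module. The helper name nth_field is made up for illustration, and Python's regex engine is only assumed to behave like Oracle's for this simple pattern.

```python
import re

# Same pattern as in the REGEXP_SUBSTR call: a run of non-colon characters
# (possibly empty) followed by a colon or the end of the string.
FIELD = re.compile(r"([^:]*)(:|$)")

def nth_field(text, n):
    """Return group 1 of the n-th match (1-based), or None if there is none."""
    for i, match in enumerate(FIELD.finditer(text), start=1):
        if i == n:
            return match.group(1)
    return None

rows = [
    ":000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:",
    ":000000000:715186816:P:000001996:::H1009671:H1009671:",
]
for row in rows:
    print(nth_field(row, 8))
```

The leading colon and the two empty fields inside ::: each count as an occurrence, which is why the wanted value is number 8 and not number 5.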

The first SUBSTRING returns everything after the ::: marker, and the second returns that remainder up to the next colon:
declare @x varchar(1024)=':000000000:715186816:P:000001996:::H1009671:H1009671:'
declare @temp varchar(1024)= SUBSTRING(@x,patindex('%:::%', @x)+3, len(@x))
SELECT SUBSTRING( @temp, 0,CHARINDEX(':', @temp, 0))
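The same two-step idea (skip past the ::: marker, then cut at the next colon) can be sketched in Python to test it against both sample rows. The function name value_after_marker is hypothetical, and returning None when the marker is missing is an assumption, not something the answer specifies.

```python
def value_after_marker(text, marker=":::"):
    """Return the text between the first marker and the next colon."""
    start = text.find(marker)
    if start == -1:
        return None  # assumption: no marker means no value
    rest = text[start + len(marker):]
    end = rest.find(":")
    # If no closing colon follows, keep everything after the marker.
    return rest if end == -1 else rest[:end]

print(value_after_marker(":000000000:715186816:P:000001996:::H1009671:H1009671:"))
```

This relies on the wanted value always following the first run of three colons, which holds for both examples in the question.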

Related

How to remove leftmost group of numbers from string in Oracle SQL?

I have a string like T_44B56T4 that I'd like to make T_B56T4. I can't use positional logic because the string could instead be TE_2BMT that I'd like to make TE_BMT.
What is the most concise Oracle SQL logic to remove the leftmost group of consecutive digits from the string?
EDIT:
regexp_replace is unavailable, but I have LTRIM, REPLACE, SUBSTR, etc.
Would this fit the bill? I am assuming there are alphanumeric characters, then an underscore, and then the digits you want to remove, followed by anything.
select regexp_replace(s, '^([[:alnum:]]+)_\d*(.*)$', '\1_\2')
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual
)
It uses regular expressions with matched groups.
Alphanumeric characters before underscore are matched and stored in first group, then underscore followed by 0-many digits (it will match as many digits as possible) followed by anything else that is stored in second group.
If we have a match, the string will be replaced by content of the first group followed by underscore and content of the second group.
If there is no match, the string is left unchanged.
It seems that you must use standard string functions, as regular expression functions are not available to you. (Comment under Gordon Linoff's answer; it would help if you would add the same at the bottom of your original question, marked clearly as EDIT).
Also, it seems that the input will always have at least one underscore, and any digits that must be removed will always be immediately after the first underscore.
If so, here is one way you could solve it:
select s, substr(s, 1, instr(s, '_')) ||
ltrim(substr(s, instr(s, '_') + 1), '0123456789') as result
from (
select 'T_44B56T4' s from dual union all
select 'TXM_1JK7B' from dual union all
select '34_AB3_1D' from dual
)
S RESULT
--------- ------------------
T_44B56T4 T_B56T4
TXM_1JK7B TXM_JK7B
34_AB3_1D 34_AB3_1D
I added one more test string, to show that only digits immediately following the first underscore are removed; any other digits are left unchanged.
Note that this solution would very likely be faster than regexp solutions, too (assuming that matters; sometimes it does, but often it doesn't).
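For readers who want to trace the substr/instr/ltrim logic step by step, here is a rough Python equivalent. The function name is invented for this sketch, and the no-underscore branch reflects my reading of how the Oracle expression would behave (instr returns 0, so the whole string is left-trimmed of digits).

```python
DIGITS = "0123456789"

def strip_digits_after_first_underscore(s):
    """Keep everything up to the first underscore, then drop the digits
    that immediately follow it; later digits are left alone."""
    head, sep, tail = s.partition("_")
    if not sep:
        # Assumed to mirror the SQL when there is no underscore:
        # the prefix is empty and the whole string is trimmed.
        return s.lstrip(DIGITS)
    return head + sep + tail.lstrip(DIGITS)

for value in ("T_44B56T4", "TXM_1JK7B", "34_AB3_1D"):
    print(value, strip_digits_after_first_underscore(value))
```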
If I understand correctly, you can use regexp_replace():
select regexp_replace('T_44B56T4', '_[0-9]+', '_') from dual
Here is a db<>fiddle with your two examples.
Note: Your question says the leftmost grouping, but the examples all have the number following an underscore, so the underscore seems to be important.
EDIT:
If you really just want the first string of digits replaced without reference to the underscore:
select regexp_replace(code, '[0-9]+', '', 1, 1)
from (select 'T_44B56T4' as code from dual union all select 'TE_2BMT' from dual ) t

Oracle SQL: select last n qualifiers of a delimited string

I have a delimited string in a column, and I want to select the last 5 qualifiers. For example, in the example below I would like to get the result '3,4,5,6,7'.
select '1,2,3,4,5,6,7' as val from dual
I am currently fiddling with reversing the string and trying to do a regexp_substr (maybe in combination with a regexp_count and a row_number?) on it, but I can't quite figure it out yet.
I can find several similar threads, but can't find the answer for oracle sql yet. If I find the solution I will post it here!
You can use regexp_substr():
select regexp_substr('1,2,3,4,5,6,7', '([^,]+[,]?){5}$') from dual
You can try something like :
select substr(val, instr(val, ',', -1, 5) + 1)
This simply finds the fifth occurrence of ',' starting from the right and then returns the string from that character on
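The "fifth comma from the right" search can be sketched in Python with repeated rfind calls. The name last_n_items is hypothetical, and returning the whole string when there are fewer than n separators is an assumption about the desired behavior.

```python
def last_n_items(val, n, sep=","):
    """Return the last n separator-delimited items of val."""
    pos = len(val)
    for _ in range(n):
        # Search leftwards from the previously found separator.
        pos = val.rfind(sep, 0, pos)
        if pos == -1:
            return val  # fewer than n separators: keep everything
    return val[pos + 1:]

print(last_n_items("1,2,3,4,5,6,7", 5))
```

Counting n separators from the right yields n items because the final item has no trailing separator.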

replace all occurrences of a sub string between 2 characters using sql

Input string: ["1189-13627273","89-13706681","118-13708388"]
Expected Output: ["14013627273","14013706681","14013708388"]
What I am trying to achieve is to replace any numbers till the '-' for each item with hard coded text like '140'
SELECT replace(value_to_replace, '-', '140')
FROM (
VALUES ('1189-13627273-77'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
check this
I found the right way to achieve that using the below regular expression.
SELECT REGEXP_REPLACE (string_to_change, '\\"[0-9]+\\-', '140')
You don't need a regexp for this, it's as easy as concatenation of 140 and the substring from - (or the second part when you split by -)
select '140'||substring('89-13706681' from position('-' in '89-13706681')+1 for 1000)
select '140'||split_part('89-13706681','-',2)
Also, it's important to consider whether some values might not contain - and what the output should be in that case.
Use regexp_replace(text,text,text) function to do so giving the pattern to match and replacement string.
First argument is the value to be replaced, second is the POSIX regular expression and third is a replacement text.
Example
SELECT regexp_replace('1189-13627273', '.*-', '140');
Output: 14013627273
Sample data set query
SELECT regexp_replace(value_to_replace, '.*-', '140')
FROM (
VALUES ('1189-13627273'), ('89-13706681'), ('118-13708388')
) t(value_to_replace);
Caution! The pattern .*- is greedy: it replaces everything up to and including the last occurrence of - with the text 140.
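The effect of the greedy pattern is easiest to see on a value with two hyphens, such as '1189-13627273-77' from the question's sample data. The Python sketch below contrasts it with a pattern anchored to the leading digits; the anchored variant is my own suggestion, not part of the answer, and Python's re module is assumed to treat these simple patterns the same way as POSIX regular expressions.

```python
import re

value = "1189-13627273-77"

# Greedy: .* runs to the LAST hyphen, so the middle group is lost as well.
greedy = re.sub(r".*-", "140", value)

# Anchored to the leading digits and the first hyphen only.
anchored = re.sub(r"^[0-9]+-", "140", value)

print(greedy)
print(anchored)
```

For values with a single hyphen, such as '89-13706681', both patterns give the same result.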

Get total number of users where username has different case

I have a SQL table where usernames have different cases, for example "ACCOUNTS\Ninja.Developer" or "ACCOUNTS\ninja.developer".
I want to find how many records have a username in which both the first name and the last name are capitalized. How can I use a regex to find the total?
x table
User
"ACCOUNTS\James.McAvoy"
"ACCOUNTS\michael.fassbender"
"ACCOUNTS\nicholas.hoult"
"ACCOUNTS\Oscar.Isaac"
Do you want something like this?
select count(*)
from t
where name rlike 'ACCOUNTS\[A-Z][a-z0-9]*[.][A-Z][a-z0-9]*'
Of course, different databases implement regular expressions differently, so the actual comparator may not be rlike.
In SQL Server, you can do:
select count(*)
from t
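where name like 'ACCOUNTS\[A-Z]%.[A-Z]%';
The comparison must be case-sensitive for this to work. Note that under many non-binary collations a range such as [A-Z] can also match lowercase letters, so a binary collation (for example, name collate Latin1_General_BIN like ...) is the safer choice.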
In most cases the string collation in MS SQL is case-insensitive, so we need a trick. Here is an example:
declare @accts table(acct varchar(100))
--sample data
insert @accts values
('ACCOUNTS\James.McAvoy'),
('ACCOUNTS\michael.fassbender'),
('ACCOUNTS\nicholas.hoult'),
('ACCOUNTS\Oscar.Isaac')
;with accts as (
select
--cleanup and split values at the first '.'
left(replace(acct,'ACCOUNTS\',''),charindex('.',replace(acct,'ACCOUNTS\',''),0)-1) frst,
--everything after the '.' (total length minus the position of the '.')
right(replace(acct,'ACCOUNTS\',''),len(replace(acct,'ACCOUNTS\',''))-charindex('.',replace(acct,'ACCOUNTS\',''),0)) last
from @accts
)
,groups as (--add comparison columns
select frst, last,
case when CAST(frst as varbinary(max)) = CAST(lower(frst) as varbinary(max)) then 'lower' else 'Upper' end frstCase, --binary comparison circumvents the case-insensitive collation
case when CAST(last as varbinary(max)) = CAST(lower(last) as varbinary(max)) then 'lower' else 'Upper' end lastCase
from accts
)
--and gather fruit
select frstCase, lastCase, count(frst) cnt
from groups
group by frstCase,lastCase
Your question is a little vague but;
You might be looking for the DISTINCT command.
REF
I don't think you need regex.
Maybe do something like:
Get distinct names from Table X as Table A
Use inputs table A as where clause on Table X
count
union
I hope this helps,
Rhys
Given your example set you can use a combination of techniques. First if the user name always begins with "ACCOUNTS\" then you can use substr to select the characters that start after the "\" character.
For the first name:
Then you can use a regex function to see if it matches against [A-Z] or [a-z] assuming your username must start with an alpha character.
For the last name:
Use the instr function on the substr and search for the character '.' and again apply the regex function to match against [A-Z] or [a-z] to see if the last name starts with an upper or a lower character.
To total:
Select all matches where both first and last match against upper and do a count. Repeat for the lower matches and you'll have both totals.
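The steps described above (strip the prefix, split at the dot, test the first letter of each part, count) can be sketched in Python to make the logic concrete. The function name and the choice to skip rows without the ACCOUNTS\ prefix or without a dot are assumptions made for this illustration.

```python
PREFIX = "ACCOUNTS\\"  # a single literal backslash after ACCOUNTS

def count_capitalized(usernames):
    """Count usernames whose first and last names both start with A-Z."""
    total = 0
    for name in usernames:
        if not name.startswith(PREFIX):
            continue  # assumption: ignore rows without the expected prefix
        first, dot, last = name[len(PREFIX):].partition(".")
        # An empty part fails the range test, so malformed rows are skipped.
        if dot and "A" <= first[:1] <= "Z" and "A" <= last[:1] <= "Z":
            total += 1
    return total

users = [
    "ACCOUNTS\\James.McAvoy",
    "ACCOUNTS\\michael.fassbender",
    "ACCOUNTS\\nicholas.hoult",
    "ACCOUNTS\\Oscar.Isaac",
]
print(count_capitalized(users))
```

Swapping the range test for "a" to "z" gives the count of all-lowercase names in the same way.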

How to get rightmost 10 places of a string in oracle

I am trying to fetch an id from an oracle table. It's something like TN0001234567890345. What I want is to sort the values according to the right most 10 positions (e.g. 4567890345). I am using Oracle 11g. Is there any function to cut the rightmost 10 places in Oracle SQL?
You can use SUBSTR function as:
select substr('TN0001234567890345',-10) from dual;
Output:
4567890345
codaddict's solution works if your string is known to be at least as long as the length it is to be trimmed to. However, if you could have shorter strings (e.g. trimming to last 10 characters and one of the strings to trim is 'abc') this returns null which is likely not what you want.
Thus, here's the slightly modified version that will take rightmost 10 characters regardless of length as long as they are present:
select substr(colName, -least(length(colName), 10)) from tableName;
Another way of doing it though more tedious. Use the REVERSE and SUBSTR functions as indicated below:
SELECT REVERSE(SUBSTR(REVERSE('TN0001234567890345'), 1, 10)) FROM DUAL;
The first REVERSE function will return the string 5430987654321000NT.
The SUBSTR function will read our new string 5430987654321000NT from the first character to the tenth character which will return 5430987654.
The last REVERSE function will return our original string minus the first 8 characters i.e. 4567890345
SQL> SELECT SUBSTR('00000000123456789', -10) FROM DUAL;
Result: 0123456789
Yeah, this is an old post, but it popped up in the list because someone edited it, and I was appalled that a regular expression solution was not included! So here's a solution using regexp_substr in the order by clause, just as an exercise in futility. The regex matches the last 10 characters in the string:
with tbl(str) as (
select 'TN0001239567890345' from dual union
select 'TN0001234567890345' from dual
)
select str
from tbl
order by to_number(regexp_substr(str, '.{10}$'));
An assumption is made that the ID part of the string is at least 10 digits.
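As a quick cross-check of the ordering, the same sort key (the last 10 characters converted to a number) can be written in Python. This is only a sketch and, like the query, it assumes every ID ends in at least 10 digits.

```python
ids = ["TN0001239567890345", "TN0001234567890345"]

def last_ten_as_number(value):
    """Sort key: the rightmost 10 characters interpreted as an integer."""
    return int(value[-10:])

ordered = sorted(ids, key=last_ten_as_number)
print(ordered)
```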