Select PART of a SQL field between two known string elements - sql

I want to extract only a portion of a string from a comment field in a SQL table. The current string looks like this: "TN:mba trucking|HR:cf82267|TR:solomon|AI:|N/A". I want to select everything from TN: up to the next pipe, and likewise everything from HR: up to the next pipe. Note that the data between these markers is not always the same length, so I cannot use SUBSTRING with fixed positions.

For Oracle you could use INSTR and SUBSTR, which cope with values of varying length. In my example I did not create a table to select your sample string from, so the literal is repeated and the query looks a bit clunky. With an ordinary column name in place of the repeated literal it becomes much more compact.
SELECT SUBSTR('TN:mba trucking|HR:cf82267|TR:solomon|AI:|N/A',
INSTR('TN:mba trucking|HR:cf82267|TR:solomon|AI:|N/A', 'TN:'),
INSTR('TN:mba trucking|HR:cf82267|TR:solomon|AI:|N/A', '|') - 1)
FROM dual;
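The third argument of SUBSTR is a length, and the -1 cuts the pipe off. Note that this length is only right because TN: starts at position 1. For a keyword further into the string, such as HR:, start the pipe search at the keyword and subtract the keyword's position: SUBSTR(s, INSTR(s, 'HR:'), INSTR(s, '|', INSTR(s, 'HR:')) - INSTR(s, 'HR:')), where s stands for the string or column. If you want to, you can cut the TN: off too by adding 3 to the start position and subtracting 3 from the length. This is a very basic approach and not the best solution: you will run into problems if the text itself contains one of the keywords you pass to INSTR. Depending on the overall context it may be wise to use a regex, as simsim mentioned before.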
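The position arithmetic can be checked outside the database. The following is a Python sketch of the same logic, not Oracle code: `between` is a hypothetical helper, and `str.find` stands in for INSTR (it is 0-based and returns -1 when nothing is found).

```python
def between(text, keyword, delimiter="|"):
    """Return the text after `keyword` up to the next `delimiter`."""
    start = text.find(keyword)          # like INSTR(text, keyword), but 0-based
    if start == -1:
        return None                     # keyword not present
    start += len(keyword)               # skip the keyword itself, e.g. 'TN:'
    end = text.find(delimiter, start)   # search for the pipe from the keyword on
    if end == -1:
        end = len(text)                 # last field: no trailing delimiter
    return text[start:end]


comment = "TN:mba trucking|HR:cf82267|TR:solomon|AI:|N/A"
print(between(comment, "TN:"))  # mba trucking
print(between(comment, "HR:"))  # cf82267
```

Starting the delimiter search at the keyword is what makes the HR: case work; searching from the beginning of the string would always find the first pipe.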

For example, I wrote this code to get the list of numbers inside a string:
SELECT DISTINCT A.REZ FROM
(
SELECT REGEXP_SUBSTR('1213-1201+1202+1203+1204+1205+1206+1207+1208+1209+1210+1211', '[0-9]+', 1, LEVEL) AS REZ FROM dual
CONNECT BY REGEXP_SUBSTR('1213-1201+1202+1203+1204+1205+1206+1207+1208+1209+1210+1211', '[0-9]+', 1, LEVEL) IS NOT NULL
) A;
You can write your own regular expression to get the text between each pair of pipes, so it will be something like this:
SELECT A.REZ FROM
(
SELECT REGEXP_SUBSTR(Your_Var, Your_Regular_Expression, 1, LEVEL) AS REZ FROM dual
CONNECT BY REGEXP_SUBSTR(Your_Var, Your_Regular_Expression, 1, LEVEL) IS NOT NULL
) A;
Here Your_Var holds the text you want to work with, and Your_Regular_Expression is the regular expression you construct to meet your needs (for pipe-separated fields, '[^|]+' matches each non-empty field).
Note: this is Oracle SQL syntax.
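To see what such a pattern returns for the pipe-delimited comment from the question, here is a small Python sketch. It assumes '[^|]+' as the regular expression, which is my choice and not something the answer prescribes; each match corresponds to one LEVEL of the CONNECT BY query.

```python
import re

comment = "TN:mba trucking|HR:cf82267|TR:solomon|AI:|N/A"

# One match per run of non-pipe characters, i.e. one per LEVEL.
fields = re.findall(r"[^|]+", comment)
print(fields)

# Split each 'KEY:value' field on its first colon; fields without a colon
# (such as 'N/A') are skipped.
pairs = dict(f.split(":", 1) for f in fields if ":" in f)
print(pairs["TN"], "/", pairs["HR"])
```

Note that '[^|]+' skips completely empty fields (two adjacent pipes), which may or may not matter for your data.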

Related

Use REGEXP_SUBSTR to extract string of varied length

I want to extract alphanumeric text of varied length from a string between the second occurrence of a specific characters.
I have tried various forms of substr and regexp_substr but can't seem to get the syntax right. This is for use in Teradata SQL assistant. In the past I would have to create a temp table and use substr twice before trimming down the string to what I need. I want to do it all in one go.
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num\\:+(\\:+)',1,2, ':') as RESULTING_STRING
My desired result is to return ONLY what is between "Num:" and the next "," in this case "12345T6". The length of the order number can vary so it is not a fixed length. When I run my code the actual output is a '?' returned by Teradata. What am I doing wrong?
This seems to work:
SELECT regexp_substr('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:(\w*)', 1, 1, NULL, 1) as RESULTING_STRING from dual
It finds Num: and then captures as many word characters as are available (a comma is not a word character, so the match stops there). The last parameter, subexpr, specifies which subexpression (that is, capture group) you want; without it the whole match is returned (Num:12345T6).
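The difference between the whole match and the capture group is easy to see in any regex engine. This Python sketch uses the standard `re` module purely as an illustration; it is not Oracle or Teradata syntax, and Python's regex dialect differs from both in other respects.

```python
import re

text = ("Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, "
        "Charge Codes:TAXES,,GBRAX")

match = re.search(r"Num:(\w*)", text)
print(match.group(0))  # whole match: Num:12345T6
print(match.group(1))  # capture group only: 12345T6

# Variant that stops at the next comma instead of at the first non-word character.
up_to_comma = re.search(r"Num:([^,]*)", text).group(1)
print(up_to_comma)     # 12345T6
```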
Assuming you use Teradata SQL Assistant to query a Teradata system (but why do you tag Oracle then?), the regex syntax is slightly different, because the two systems use different regex dialects.
Teradata's RegExp_Substr doesn't support the subexpression parameter. You can either switch to the undocumented (I really don't know why) RegExp_Substr_gpl:
RegExp_Substr_gpl(x, 'Num:([^,]*)', 1, 1, 'i', 1)
or tell the RegEx to forget the previous match using \K:
RegExp_Substr(x, 'Num:\K[^,]*', 1,1, 'i')
You can give the pattern search below a try:
SELECT REGEXP_REPLACE ((REGEXP_SUBSTR('Channel:DF GB, Order Num:12345T6, Order Date:01/01/2019, Charge Codes:TAXES,,GBRAX', 'Num:[A-Za-z0-9]*',1,1, 'i')),'Num:','',1,1,'i') AS RESULTING_STRING
The Regexp_substr pattern 'Num:[A-Za-z0-9]*' first picks out 'Num:' together with the alphanumeric characters that follow it; the asterisk matches zero or more occurrences of the preceding character class.
For example, 'Num:12345T6' is extracted from the string provided. Also note that the last parameter of regexp_substr is 'i', which makes the search case-insensitive.
Lastly, Regexp_replace replaces the pattern 'Num:' in the output of regexp_substr with an empty string, resulting in the final string '12345T6'.

Oracle SQL: select last n qualifiers of a delimited string

I have a delimited string in a column, and I want to select the last 5 qualifiers. For example, in the example below I would like to get the result '3,4,5,6,7'.
select '1,2,3,4,5,6,7' as val from dual
I am currently fiddling with reversing the string and trying to do a regexp_substr (maybe in combination with a regexp_count and a row_number?) on it, but I can't quite figure it out yet.
I can find several similar threads, but can't find the answer for oracle sql yet. If I find the solution I will post it here!
You can use regexp_substr():
select regexp_substr('1,2,3,4,5,6,7', '([^,]+[,]?){5}$') from dual
You can try something like :
select substr(val, instr(val, ',', -1, 5) + 1)
This simply finds the fifth occurrence of ',' counting from the right and then returns the string starting just after that comma.
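Both answers can be sanity-checked side by side. The Python sketch below mirrors them under the assumption that the items themselves contain no commas. It only illustrates the logic; `str.rfind` stands in for INSTR with a negative start position.

```python
import re

val = "1,2,3,4,5,6,7"
n = 5

# Regex approach: n groups of "item plus optional comma", anchored at the end.
by_regex = re.search(r"([^,]+,?){%d}$" % n, val).group(0)

# Position approach: walk back to the n-th comma from the right,
# like INSTR(val, ',', -1, n), then keep everything after it.
pos = len(val)
for _ in range(n):
    pos = val.rfind(",", 0, pos)
    if pos == -1:      # fewer than n commas: keep the whole string
        break
by_position = val[pos + 1:]

print(by_regex)      # 3,4,5,6,7
print(by_position)   # 3,4,5,6,7
```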

Comparing fields when a field has data in between 2 characters that match the field being compared

I have code that looks like this:
left outer join
gme_batch_header bh
on
substr(ln.lot_number,instr(ln.lot_number,'(') + 1,
instr(ln.lot_number,')') - instr(ln.lot_number,'(') - 1)
=
bh.batch_no
It works fine, but I have come across a few lot numbers that have two sections between parentheses. How would I compare what is between the second set of parentheses? Here is an example of the data in the lot number field:
E142059-307-SCRAP-(74055)
This one works with the code,
58LF-3-B-2-2-2 (SCRAP)-(61448)
This one tries comparing SCRAP with the batch no, which isn't correct. It needs to be the 61448.
The result is always the last item in parenthesis.
After more research, I actually got it to work with this code:
substr(ln.lot_number,instr(ln.lot_number,'(',-1) + 1, instr(ln.lot_number,')',-1) - instr(ln.lot_number,'(',-1) - 1)
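The idea of searching from the end can be sketched in Python as follows. `last_parenthesised` is a hypothetical helper name, and `str.rfind` plays the role of INSTR with a start position of -1.

```python
def last_parenthesised(text):
    """Return the text between the last '(' and the last ')'."""
    open_pos = text.rfind("(")     # like INSTR(text, '(', -1)
    close_pos = text.rfind(")")    # like INSTR(text, ')', -1)
    if open_pos == -1 or close_pos < open_pos:
        return None                # no complete trailing pair
    return text[open_pos + 1:close_pos]


print(last_parenthesised("E142059-307-SCRAP-(74055)"))       # 74055
print(last_parenthesised("58LF-3-B-2-2-2 (SCRAP)-(61448)"))  # 61448
```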
Assuming SQL Server 2005+, and that it is always the last occurrence you want, I would suggest finding the last instance of a ( and taking the substring from there. To find the last instance, reverse the string and search from the front. For a lot number that ends with ), something like this returns the text between the last pair of parentheses:
REVERSE(SUBSTRING(REVERSE(lot_number), 2, CHARINDEX('(', REVERSE(lot_number)) - 2))
If your version of Oracle supports regular expressions try this:
substr(regexp_substr(ln.lot_number,'[0-9]+\)$'),1,length(regexp_substr(ln.lot_number,'[0-9]+\)$'))-1)
Explanation:
regexp_substr(scrap_row,'[0-9]+\)$') finds a run of digits followed by a ) at the very end of the string. It returns the digits but includes the closing parenthesis.
To remove the closing parenthesis, pass the result through substr, taking characters from position 1 up to one character short of its length.
Query for analysis:
with scrap
as (select '58LF-3-B-2-2-2 (SCRAP)-(61448)' as scrap_row from dual)
select scrap_row,
regexp_substr(scrap_row,'[0-9]+\)$') as regex_substring,
length(regexp_substr(scrap_row,'[0-9]+\)$')) as length_regex_substring,
substr(regexp_substr(scrap_row,'[0-9]+\)$'),1,length(regexp_substr(scrap_row,'[0-9]+\)$'))-1) as regex_sans_parenthesis
from scrap
If you have 11g, this will do it pretty simply by using the subgroup argument of regexp_substr() and constructing the regex appropriately:
SQL> with tbl(data) as
(
select 'E142059-307-SCRAP-(74055)' from dual
union
select '58LF-3-B-2-2-2 (SCRAP)-(61448)' from dual
)
select data from tbl
where regexp_substr(data, '\((\d+)\)$', 1, 1, NULL, 1)
= '61448';
DATA
------------------------------
58LF-3-B-2-2-2 (SCRAP)-(61448)
The regular expression can be read as:
\( - Search for a literal left paren
( - Start a remembered subgroup
\d+ - followed by 1 or more digits
) - End remembered subgroup
\) - followed by a literal right paren
$ - at the end of the line.
The regexp_substr function arguments are:
Source - the source string
Pattern - The regex pattern to look for
position - Position in the string to start looking for the pattern
occurrence - If the pattern occurs multiple times, which occurrence you want
match_params - See the docs, not used here
subexpression - which subexpression to use (the remembered group)
So in English, look for a series of 1 or more digits surrounded by parens, where it occurs at the end of the line and save the digit part only to use to compare. IMHO a lot easier to follow/maintain than nested instr(), substr().
For reusability, make a function called get_last_number_in_parens() that contains this code and takes the string to search as its argument. This way the logic is encapsulated and can be reused by folks who may not be so comfortable with regular expressions but can still benefit from the power! There is also only one place to maintain the code. Then call it like this:
select data from tbl
where get_last_number_in_parens(data) = '61448';
How easy is that?!
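The body of such a function is not shown above. As a rough illustration of the encapsulation idea only, here is a Python sketch with the same name and the same regex; the real thing would be a PL/SQL function wrapping regexp_substr.

```python
import re


def get_last_number_in_parens(text):
    """Return the digits inside a trailing '(...)', or None if there are none."""
    match = re.search(r"\((\d+)\)$", text)
    return match.group(1) if match else None


rows = ["E142059-307-SCRAP-(74055)", "58LF-3-B-2-2-2 (SCRAP)-(61448)"]
hits = [row for row in rows if get_last_number_in_parens(row) == "61448"]
print(hits)
```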
Hello, you can check with this code. It works as long as the lot number ends with the closing parenthesis of the value you want:
SELECT SUBSTR('58LF-3-B-2-2-2-(61448)',instr('58LF-3-B-2-2-2-(61448)','(',-1)+1,LENGTH('58LF-3-B-2-2-2-(61448)')-instr('58LF-3-B-2-2-2-(61448)','(',-1)-1)
FROM dual;
SELECT SUBSTR('58LF-3-B-2-2-2 (SCRAP)-(61448)',instr('58LF-3-B-2-2-2 (SCRAP)-(61448)','(',-1)+1,LENGTH('58LF-3-B-2-2-2 (SCRAP)-(61448)')-instr('58LF-3-B-2-2-2 (SCRAP)-(61448)','(',-1)-1)
FROM dual;
Output
==================================
61448
==================================

How to match and replace sections of a string in SQL

I'm pulling a list of popular sites from my database, but I want to combine results that are from the same domain. I've been able to do this partially by using :
REGEXP_REPLACE(site, '%|^www([123])?\.|^m\.|^mobile\.|^desktop\.')) as site
so that "www.facebook.com" and "facebook.com" or "m.facebook.com"
- all of which appear in the database - are treated as the same when I do a select distinct.
However, I want to take this a step further by writing an expression that looks at each string between periods. If a match is found consecutively in three or more strings between periods, then I want to treat those as the same. I simply can't predict every possible string that could come before "facebook.com", or any other site.
So for example:
"my.careerone.com.au" and
"careerone.com.au" match in three places.
Or "yahoo.realestate.com.au" and "rs.realestate.com.au" match in three places.
Any ideas on how to achieve this?
@David's code will work in Vertica as well, though perhaps not so well performance-wise.
You can use Vertica's own built-in functions such as TRIM and REGEXP_REPLACE.
After borrowing @David Faber's regex I ended up with this:
select TRIM(LEADING '.' from REGEXP_REPLACE(col_name,'^.*((\.[^.]+){3})$', '\1')) AS fixed_dn from table_name;
I don't have Vertica available so I tested this in Oracle SQL (which does have REGEXP_REPLACE() that is similar to Vertica's). Not sure what the CTE syntax would be in Vertica but you'll be querying against a table anyway:
WITH d1 AS (
SELECT 'my.careerone.com.au' AS domain_nm FROM dual
UNION ALL
SELECT 'careerone.com.au' FROM dual
UNION ALL
SELECT 'yahoo.realestate.com.au' FROM dual
UNION ALL
SELECT 'rs.realestate.com.au' FROM dual
)
SELECT domain_nm, TRIM('.' FROM REGEXP_REPLACE(domain_nm, '^.*((\.[^.]+){3})$', '\1')) AS domain_nm_fix
FROM d1;
What REGEXP_REPLACE() does here is strip the leading subdomains from the domain name when it has more than three levels, keeping only the last three. Each kept level is captured together with the . in front of it, so the result starts with a . that then has to be trimmed. If there are only three levels, the regex does not match and nothing is replaced. So, for example, careerone.com.au will be unaltered, while my.careerone.com.au will be changed to .careerone.com.au by REGEXP_REPLACE(), from which the leading . is then trimmed.
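The same pattern can be tried out in Python. This is only a sketch to check the regex behaviour: `base_domain` is a hypothetical name, `re.sub` stands in for REGEXP_REPLACE, and `lstrip` stands in for TRIM.

```python
import re


def base_domain(name):
    """Keep the last three dot-separated labels of a longer domain name."""
    # When the name has more than three labels, group 1 is the last three,
    # each with its leading dot; otherwise the pattern does not match.
    trimmed = re.sub(r"^.*((\.[^.]+){3})$", r"\1", name)
    return trimmed.lstrip(".")


for name in ["my.careerone.com.au", "careerone.com.au",
             "yahoo.realestate.com.au", "rs.realestate.com.au"]:
    print(name, "->", base_domain(name))
```

One limitation of the fixed three-level cutoff: a name such as www.facebook.com has only three labels, so it is left unchanged and would still need the www-stripping replace from the question.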

SQL: Finding dynamic length characters in a data string

I am not sure how to do this, but I have a string of data. I need to isolate a number out of the string that can vary in length. The original string also varies in length. Let me give you an example. Here is a set of the original data string:
:000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:
:000000000:715186816:P:000001996:::H1009671:H1009671:
For these two examples, I need 3SA70000SUPPL from the first and H1009671 from the second. How would I do this using SQL? I have heard that case statements might work, but I don't see how. Please help.
This works in Oracle 11g:
with tbl as (
select ':000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:' str from dual
union
select ':000000000:715186816:P:000001996:::H1009671:H1009671:' str from dual
)
select REGEXP_SUBSTR(str, '([^:]*)(:|$)', 1, 8, NULL, 1) data
from tbl;
This can be described as "look at the 8th occurrence of zero or more non-colon characters that are followed by a colon or the end of the line, and return the 1st subgroup" (which is the data less the colon or end of the line).
From this post: REGEX to select nth value from a list, allowing for nulls
Sorry, just saw you are using DB2. I don't know if there is an equivalent regular expression function, but maybe it will still help.
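For anyone without a regex-capable database at hand, the occurrence counting can be reproduced in Python. This is a sketch of the same pattern, with `nth_field` as a hypothetical helper; empty fields between adjacent colons are counted, which is the point of the pattern.

```python
import re


def nth_field(text, n, sep=":"):
    """Return the n-th (1-based) sep-delimited field, counting empty fields."""
    pattern = "([^{0}]*)({0}|$)".format(re.escape(sep))
    for index, match in enumerate(re.finditer(pattern, text), start=1):
        if index == n:
            return match.group(1)   # the field without its trailing separator
    return None


rows = [
    ":000000000:370765:P:000001359:::3SA70000SUPPL:3SA70000SUPPL:",
    ":000000000:715186816:P:000001996:::H1009671:H1009671:",
]
for row in rows:
    print(nth_field(row, 8))
```

Because the strings start with a colon, the first field is empty, which is why the wanted value is field 8 and not field 7.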
For the fun of it: SQL Fiddle
The first SUBSTRING takes everything after the ::: and the second SUBSTRING takes that remainder up to the next : (this is T-SQL):
declare @x varchar(1024)=':000000000:715186816:P:000001996:::H1009671:H1009671:'
declare @temp varchar(1024)= SUBSTRING(@x,patindex('%:::%', @x)+3, len(@x))
SELECT SUBSTRING( @temp, 0,CHARINDEX(':', @temp, 0))