How to extract multiple dates from a VARCHAR2(4000) multiline string using SQL?

I have two columns in the original table: ID (NUMBER) and DESCRIPTION (VARCHAR2(4000)).
The DESCRIPTION column contains multiline strings.
I need to extract the dates from each line of the string and also find the earliest date, so the result would look like the expected result table.
Original table:
Expected table:
Using this expression:
to_date((regexp_substr(A.Description , '\d{1,2}/\d{1,2}/\d{4}')), 'MM-DD-YYYY')
I was able to extract the date from the first line,
Discontinued:09/10/2015:Rappaport Family Institute for Research:;
but not from the other two.

OK, I think I found a solution similar to the other post, but simpler. Note that regexp_substr() returns only one match per call. Here is an example with a string containing embedded line feeds (they do not really matter, but I added them to show that the approach works in this case):
WITH A AS
(SELECT 'this is a test:12/01/2015 01/05/2018'
|| chr(13)
||chr(10)
|| ' this is the 2nd line: 07/07/2017' Description
FROM dual
)
SELECT to_date(regexp_substr(A.Description , '\d{1,2}/\d{1,2}/\d{4}',1,level),'MM/DD/YYYY')
FROM A
CONNECT BY level <= regexp_count(a.description, '\d{1,2}/\d{1,2}/\d{4}')
Output:
12/01/2015
01/05/2018
07/07/2017
If you are not familiar with hierarchical queries in Oracle, LEVEL is a pseudo-column. Passing it as the 4th parameter (occurrence) of regexp_substr makes each level return the next match: level 1 returns the first date, level 2 the second, and so on. regexp_count counts how many times the pattern matches, so CONNECT BY keeps generating levels until every occurrence has been returned.
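To see what the CONNECT BY loop computes, here is a minimal Python sketch of the same idea outside the database: collect every match of the date pattern in order, parse each as MM/DD/YYYY, and take the minimum for the earliest date the question asks about. Python's re module is not Oracle's regex engine, but for this simple pattern the matches should be the same; the sample string is the one from the answer and extract_dates is a hypothetical helper name.

```python
import re
from datetime import datetime

# Same pattern as the SQL: 1-2 digit month and day, 4-digit year.
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")

def extract_dates(description):
    """Every date in the text, in order of appearance.

    Mirrors regexp_substr(description, pattern, 1, level)
    for level = 1 .. regexp_count(description, pattern).
    """
    return [datetime.strptime(m, "%m/%d/%Y") for m in DATE_PATTERN.findall(description)]

description = "this is a test:12/01/2015 01/05/2018\r\n this is the 2nd line: 07/07/2017"
dates = extract_dates(description)
earliest = min(dates)  # the "earliest date" step, like MIN() over the extracted rows

print([d.strftime("%m/%d/%Y") for d in dates])  # ['12/01/2015', '01/05/2018', '07/07/2017']
print(earliest.strftime("%m/%d/%Y"))            # 12/01/2015
```

In the SQL version the last step would be an aggregate such as MIN() over the dates the hierarchical query returns.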

Related

How to get a specific substring from a column in sql

I have an Orders table with a column called "details" that contains values like:
Contact ID: A18YTX7GWEJRU8 City/Site and Site Name: Orlando - Orlando (UFL4) Date of Call (MM/DD/YYYY): 01/23/2017 Time of Call (Local Time): 16:44 Order ID(s): 112-0654231-9637802 Call Summary: Cx did not receive. Order marked as delivered to doorstep at 16:27 created by flx-cstech on behalf of sssmiley.
The values in that column vary. Another one could look like:
Short Description: Dry Ice Retrieval Please enter the following information for the site ops to pick up the dry ice from the customer: Contact ID:AD3R60PA1QCCF Order ID:112-6254812-3186644
Or anything else.
I just want to extract the Order ID(s): 112-0654231-9637802 part from it. How do I do that?
SELECT REGEXP_SUBSTR(
your_column,
'Order\s+ID(\s*\(s\))?:\s*\d{3}-\d{7}-\d{7}'
)
FROM your_table
To get just the number, wrap it in a capture group:
SELECT REGEXP_SUBSTR(
your_column,
'Order\s+ID(\s*\(s\))?:\s*(\d{3}-\d{7}-\d{7})',
1, -- Start from the 1st character
1, -- Get the 1st match
NULL, -- Apply default flags
2 -- Get the 2nd capture group
)
FROM your_table
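A quick way to check which group holds the number is to run the same pattern through Python's re module, shown here as a sketch (order_id is a hypothetical helper; the samples are shortened from the question). Group 1 is the optional "(s)" and group 2 is the order number, which is why the SQL passes 2 as the last argument.

```python
import re

# Group 1: the optional "(s)"; group 2: the order number itself.
ORDER_ID = re.compile(r"Order\s+ID(\s*\(s\))?:\s*(\d{3}-\d{7}-\d{7})")

def order_id(details):
    match = ORDER_ID.search(details)
    # None plays the role of the NULL REGEXP_SUBSTR returns when nothing matches.
    return match.group(2) if match else None

samples = [
    "Time of Call (Local Time): 16:44 Order ID(s): 112-0654231-9637802 Call Summary: Cx did not receive.",
    "Contact ID:AD3R60PA1QCCF Order ID:112-6254812-3186644",
]
for s in samples:
    print(order_id(s))
```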
Or, if you do not have anything else with the same 3-digit, dash, 7-digit, dash, 7-digit format:
SELECT REGEXP_SUBSTR(
your_column,
'\d{3}-\d{7}-\d{7}'
)
FROM your_table
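If your string had a fixed format, the simplest way would be:
select substr(details, 160, 31) from orders
This only works if the Order ID always sits at the same offset, which the second sample above shows is not guaranteed.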
https://docs.oracle.com/cd/B12037_01/appdev.101/b10795/adfns_re.htm
REGEXP_LIKE: This function searches a character column for a pattern. Use this function in the WHERE clause of a query to return rows matching the regular expression you specify.
The same page describes the similar functions.

Comparing fields when a field has data in between 2 characters that match the field being compared

I have code that looks like this:
left outer join
gme_batch_header bh
on
substr(ln.lot_number,instr(ln.lot_number,'(') + 1,
instr(ln.lot_number,')') - instr(ln.lot_number,'(') - 1)
=
bh.batch_no
It works fine, but I have come across a few lot numbers that contain two parenthesized sections. How would I compare what is between the second set of parentheses? Here are examples of the data in the lot number field:
E142059-307-SCRAP-(74055)
This one works with the code,
58LF-3-B-2-2-2 (SCRAP)-(61448)
This one compares SCRAP with the batch number, which isn't correct; it needs to be 61448.
The value I need is always the last item in parentheses.
After more research, I actually got it to work with this code:
substr(ln.lot_number,instr(ln.lot_number,'(',-1) + 1, instr(ln.lot_number,')',-1) - instr(ln.lot_number,'(',-1) - 1)
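The trick in this fix is that instr(..., -1) searches backwards from the end. A rough Python equivalent, offered as a sketch for testing the logic, uses str.rfind (0-based, and -1 rather than 0 when absent); last_paren_value is a hypothetical name, and the None guard is an addition of this sketch, not part of the SQL.

```python
def last_paren_value(lot_number):
    """Text between the last '(' and the last ')', like
    substr(s, instr(s,'(',-1)+1, instr(s,')',-1) - instr(s,'(',-1) - 1)."""
    start = lot_number.rfind("(")  # like instr(s, '(', -1), but 0-based
    end = lot_number.rfind(")")    # like instr(s, ')', -1), but 0-based
    if start == -1 or end < start:
        return None  # guard added in this sketch
    return lot_number[start + 1:end]

for lot in ["E142059-307-SCRAP-(74055)", "58LF-3-B-2-2-2 (SCRAP)-(61448)"]:
    print(last_paren_value(lot))
```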
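Assuming SQL Server 2005+ (CHARINDEX and SUBSTRING are SQL Server functions, so this will not run on Oracle as written) and that it is always the last occurrence you want, I would suggest finding the last instance of ( in your value and taking the substring from there. To get the last instance you could use something like: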
REVERSE(SUBSTRING(REVERSE(lot_number),0,CHARINDEX('(',REVERSE(lot_number))))
If your version of Oracle supports regular expressions try this:
substr(regexp_substr(ln.lot_number,'[0-9]+\)$'),1,length(regexp_substr(ln.lot_number,'[0-9]+\)$'))-1)
Explanation:
regexp_substr(scrap_row,'[0-9]+\)$') finds the digits that are immediately followed by ) at the end of the string. This returns the number, but it includes the closing parenthesis.
To remove the closing parenthesis, pass the result through substr, taking everything from the first character up to one character before the end.
Query for analysis:
with scrap
as (select '58LF-3-B-2-2-2 (SCRAP)-(61448)' as scrap_row from dual)
select scrap_row,
regexp_substr(scrap_row,'[0-9]+\)$') as regex_substring,
length(regexp_substr(scrap_row,'[0-9]+\)$')) as length_regex_substring,
substr(regexp_substr(scrap_row,'[0-9]+\)$'),1,length(regexp_substr(scrap_row,'[0-9]+\)$'))-1) as regex_sans_parenthesis
from scrap
If you have 11g, this will do it pretty simply by using the subgroup argument of regexp_substr() and constructing the regex appropriately:
SQL> with tbl(data) as
(
select 'E142059-307-SCRAP-(74055)' from dual
union
select '58LF-3-B-2-2-2 (SCRAP)-(61448)' from dual
)
select data from tbl
where regexp_substr(data, '\((\d+)\)$', 1, 1, NULL, 1)
= '61448';
DATA
------------------------------
58LF-3-B-2-2-2 (SCRAP)-(61448)
The regular expression can be read as:
\( - Search for a literal left paren
( - Start a remembered subgroup
\d+ - followed by 1 or more digits
) - End remembered subgroup
\) - followed by a literal right paren
$ - at the end of the string.
The regexp_substr function arguments are:
Source - the source string
Pattern - The regex pattern to look for
position - Position in the string to start looking for the pattern
occurrence - If the pattern occurs multiple times, which occurrence you want
match_params - See the docs, not used here
subexpression - which subexpression to use (the remembered group)
So in English, look for a series of 1 or more digits surrounded by parens, where it occurs at the end of the line and save the digit part only to use to compare. IMHO a lot easier to follow/maintain than nested instr(), substr().
For reusability, create a function called get_last_number_in_parens() that contains this code and takes the string to search as its argument. That way the logic is encapsulated and can be reused by people who may not be comfortable with regular expressions but can still benefit from their power, and there is only one place to maintain the code. Then call it like this:
select data from tbl
where get_last_number_in_parens(data) = '61448';
How easy is that?!
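The answer has a PL/SQL function in mind; as a language-neutral illustration, here is a Python sketch of what get_last_number_in_parens() would do with the same regex and subgroup. Only the function name comes from the answer; the body is an assumption, and Python's re is used in place of Oracle's engine.

```python
import re

# \( literal "(", (\d+) remembered digits, \) literal ")", $ end of string.
LAST_NUMBER_IN_PARENS = re.compile(r"\((\d+)\)$")

def get_last_number_in_parens(data):
    match = LAST_NUMBER_IN_PARENS.search(data)
    # group(1) is the remembered subgroup, like subexpression 1 in regexp_substr.
    return match.group(1) if match else None

rows = ["E142059-307-SCRAP-(74055)", "58LF-3-B-2-2-2 (SCRAP)-(61448)"]
print([r for r in rows if get_last_number_in_parens(r) == "61448"])
```

A value whose last parenthesized part is not numeric, such as "58LF-3-B-2-2-2 (SCRAP)", simply yields no match.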
Hello, you can check with this code. It takes everything after the last ( up to, but not including, the final character, so it works as long as the value ends with ):
SELECT SUBSTR('58LF-3-B-2-2-2-(61448)',instr('58LF-3-B-2-2-2-(61448)','(',-1)+1,LENGTH('58LF-3-B-2-2-2-(61448)')-instr('58LF-3-B-2-2-2-(61448)','(',-1)-1)
FROM dual;
SELECT SUBSTR('58LF-3-B-2-2-2 (SCRAP)-(61448)',instr('58LF-3-B-2-2-2 (SCRAP)-(61448)','(',-1)+1,LENGTH('58LF-3-B-2-2-2 (SCRAP)-(61448)')-instr('58LF-3-B-2-2-2 (SCRAP)-(61448)','(',-1)-1)
FROM dual;
Output
==================================
61448
==================================

Finding first and second word in a string in SQL Developer

How can I find the first word and second word in a string separated by unknown number of spaces in SQL Developer? I need to run a query to get the expected result.
String:
Hello Monkey this is me
Different sentences have different number of spaces between the first and second word and I need a generic query to get the result.
Expected Result:
Hello
Monkey
I have managed to find the first word using substr and instr. However, I do not know how to find the second word due to the unknown number of spaces between the first and second word.
select substr((select ltrim(sentence) from table1),1,
(select (instr((select ltrim(sentence) from table1),' ',1,1)-1)
from table1))
from table1
Since you seem to want them as separate result rows, you could use a simple common table expression to duplicate the rows: once with the full sentence, then with the first word removed. Then all you have to do is get the first word from each:
WITH cte AS (
SELECT value FROM table1
UNION ALL
SELECT SUBSTR(TRIM(value), INSTR(TRIM(value), ' ')) FROM table1
)
SELECT SUBSTR(TRIM(value), 1, INSTR(TRIM(value), ' ') -1) word
FROM cte
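Note that this very simple example assumes the sentence has at least three words: with only two, the second word comes back as NULL (there is no space after it for INSTR to find), and with a single word NULL is returned for both.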
While Joachim Isaksson's answer is a robust and fast approach, you can also consider splitting the string and selecting from the resulting set of pieces. This is just meant as a hint for another approach, in case your requirements change (e.g. you need more than two pieces).
You could split by the regex [ ]+ and so get the words between the blanks.
Find more about splitting here: How do I split a string so I can access item x?
This will strongly depend on the SQL dialect you are using.
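Since the SQL side depends on the dialect, here is the splitting idea itself as a small Python sketch (first_two_words is a hypothetical name): trim, split on runs of one or more spaces, and pick the pieces you need.

```python
import re

def first_two_words(sentence):
    # Split on one or more spaces (the [ ]+ idea) after trimming the ends.
    words = re.split(r" +", sentence.strip())
    first = words[0] if words[0] else None  # an empty string splits to ['']
    second = words[1] if len(words) > 1 else None
    return first, second

print(first_two_words("Hello    Monkey this is me"))  # ('Hello', 'Monkey')
```

Because the split handles any number of spaces, the same approach extends naturally to the third or fourth word.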
Try this with REGEXP_SUBSTR:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),
REGEXP_SUBSTR(REGEXP_SUBSTR(sentence,'\s+(\w+)\s+(\w+)'),'\w+$'),
REGEXP_SUBSTR(sentence,'\s+(\w+)\s+$')
FROM table1;
result:
1      2       3             4      5
Hello  Monkey  Monkey this   this   is_me
To learn more about REGEXP_SUBSTR, see "Using Regular Expressions With Oracle Database".
Test it on SQL Fiddle: http://sqlfiddle.com/#!4/8e9ef/9
If you only want the first and second words, use REGEXP_INSTR to get the start position of the second word:
SELECT
REGEXP_SUBSTR(sentence,'\w+\s+') AS FIRST,
REGEXP_SUBSTR(sentence,'\w+\s',REGEXP_INSTR(sentence,'\w+\s+')+length(REGEXP_SUBSTR(sentence,'\w+\s+'))) AS SECOND
FROM table1;
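The second expression starts searching right after the first match: REGEXP_INSTR gives where the first word starts and LENGTH adds the length of that word plus its spaces. A Python sketch of the same two-step search (first_and_second is a hypothetical name) uses the match's end() and the pos argument of a compiled pattern's search().

```python
import re

def first_and_second(sentence):
    first = re.search(r"\w+\s+", sentence)  # like REGEXP_SUBSTR(sentence, '\w+\s+')
    if first is None:
        return None, None
    # first.end() is the position just after the first word and its spaces,
    # the role REGEXP_INSTR(...) + LENGTH(...) plays in the SQL.
    second = re.compile(r"\w+\s").search(sentence, first.end())
    return first.group(), second.group() if second else None

print(first_and_second("Hello   Monkey this is me"))  # ('Hello   ', 'Monkey ')
```

Like the SQL, the results keep their trailing whitespace, and the '\w+\s' pattern needs a blank after the second word, so "Hello Monkey" alone yields no second word here.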

How to extract group from regular expression in Oracle?

I got this query and want to extract the value between the brackets.
select de_desc, regexp_substr(de_desc, '\[(.+)\]', 1)
from DATABASE
where col_name like '[%]';
It however gives me the value with the brackets such as "[TEST]". I just want "TEST". How do I modify the query to get it?
The third parameter of the REGEXP_SUBSTR function indicates the position in the target string (de_desc in your example) where you want to start searching. Assuming a match is found in the given portion of the string, it doesn't affect what is returned.
In Oracle 11g there is a sixth parameter to the function, which I think is what you are trying to use; it indicates the capture group that you want returned. An example of proper use would be:
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]', 1,1,NULL,1) from dual;
The last parameter, 1, indicates the number of the capture group you want returned. The Oracle documentation for REGEXP_SUBSTR describes this parameter (subexpr).
10g does not appear to have this option, but in your case you can achieve the same result with:
select substr( match, 2, length(match)-2 ) from (
SELECT regexp_substr('abc[def]ghi', '\[(.+)\]') match FROM dual
);
since you know that a match will have exactly one excess character at the beginning and end. (Alternatively, you could use RTRIM and LTRIM to remove brackets from both ends of the result.)
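The difference between the whole match, the capture group, and the 10g trimming workaround can be seen side by side in this Python sketch, using the answer's sample string. It illustrates the idea with Python's re rather than Oracle itself.

```python
import re

text = "abc[def]ghi"
match = re.search(r"\[(.+)\]", text)

whole = match.group(0)  # "[def]": what regexp_substr returns without a subexpression argument
inner = match.group(1)  # "def": the 11g form, subexpression 1
trimmed = whole[1:-1]   # "def": the 10g workaround, substr(match, 2, length(match) - 2)
print(whole, inner, trimmed)

# .+ is greedy in Python: with two bracketed parts it spans from the first [ to the last ].
greedy = re.search(r"\[(.+)\]", "a[b]c[d]e").group(1)
print(greedy)  # b]c[d
```

If a value can contain more than one bracketed part, check how the pattern behaves on such data in Oracle before relying on it.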
You need to do a replace and use a regex pattern that matches the whole string.
select regexp_replace(de_desc, '.*\[(.+)\].*', '\1') from DATABASE;
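Because the pattern matches the whole string, replacing it with \1 leaves only the group. One consequence worth checking is what happens when there are no brackets: nothing matches, so the string comes back unchanged, which is why the question's WHERE col_name like '[%]' filter still matters. A Python sketch of the same replace (bracket_contents is a hypothetical name):

```python
import re

def bracket_contents(de_desc):
    # .* before and after the brackets makes the match cover the whole string,
    # so the replacement \1 leaves only the captured text.
    return re.sub(r".*\[(.+)\].*", r"\1", de_desc)

print(bracket_contents("abc[TEST]xyz"))  # TEST
print(bracket_contents("no brackets"))   # no brackets (no match, returned unchanged)
```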

Reading a part of a alpha numeric string in SQL

I have a table with one column, otname.
table1.otname contains multiple rows of alpha-numeric string resembling the following data sample:
11.10.32.12.U.A.F.3.2.21.249.1
2001.1.1003.8281.A.LE.P.P
2010.1.1003.8261.A.LE.B.B
I want to read the fourth number in every string (for example 8281 in the second sample) and write a query in Oracle 10g to look up its description, which is stored in another table. My dilemma is writing the first part of the query, i.e. choosing the fourth number of every string in the table.
My second query will be something like this:
select description_text from table2 where sncode = 8281 -- fourth part of the data sample in every string
Many thanks.
novice
Works with 9i+:
WITH portion AS (
SELECT SUBSTR(t.otname, INSTR(t.otname, '.', 1, 3) + 1, INSTR(t.otname, '.', 1, 4) - INSTR(t.otname, '.', 1, 3) - 1) AS sncode
FROM table1 t)
SELECT t.description_text
FROM table2 t
JOIN portion p ON p.sncode = t.sncode
SUBSTR extracts the piece, and INSTR finds the position of the period (.), searching from the first character of the string (parameter value 1), for its 3rd and 4th appearances. Note that SUBSTR's third argument is a length, not an end position, so the length is the position of the 4th period minus the position of the 3rd period minus one. Test this first to be sure you're getting the right values:
SELECT SUBSTR(t.otname, INSTR(t.otname, '.', 1, 3) + 1, INSTR(t.otname, '.', 1, 4) - INSTR(t.otname, '.', 1, 3) - 1) AS sncode
FROM table1 t
I used subquery factoring so the substring happens before you join to the second table. It can be done as a subquery, but subquery factoring is faster.
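To check the position arithmetic before running it against the table, here is a Python sketch of the INSTR/SUBSTR calculation. nth_index and fourth_field are hypothetical helpers; nth_index returns a 1-based position (0 when absent) to match INSTR, and the None guard is an addition of this sketch.

```python
def nth_index(s, ch, n):
    """1-based position of the n-th ch in s, or 0 if absent, like INSTR(s, ch, 1, n)."""
    pos = -1
    for _ in range(n):
        pos = s.find(ch, pos + 1)
        if pos == -1:
            return 0
    return pos + 1

def fourth_field(otname):
    p3 = nth_index(otname, ".", 3)
    p4 = nth_index(otname, ".", 4)
    if p3 == 0 or p4 == 0:
        return None  # guard added in this sketch
    # SUBSTR(s, p3 + 1, p4 - p3 - 1) in 0-based slice terms is s[p3 : p4 - 1].
    return otname[p3:p4 - 1]

for name in ["11.10.32.12.U.A.F.3.2.21.249.1", "2001.1.1003.8281.A.LE.P.P", "2010.1.1003.8261.A.LE.B.B"]:
    print(fourth_field(name))
```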
Newer versions of Oracle (including 10g) have various regular expression functions, so you can do something like this:
where sncode = to_number(regexp_replace(otname, '^(\d+\.\d+\.\d+\.(\d+))?.+$', '\2'))
This matches three sets of digits each followed by a dot, then a fourth set of digits in its own group, followed by the rest of the string, and replaces the entire string with the second (inner) group, i.e. the fourth set of digits.
Here's a complete query (if I understood your description of the two tables correctly):
select t2.description_text
from table1 t1, table2 t2
where t2.sncode = to_number(regexp_replace(t1.otname, '^(\d+\.\d+\.\d+\.(\d+))?.+$', '\2'))
Another slightly shorter alternative regex:
where t2.sncode = to_number(regexp_replace(t1.otname, '^((\d+\.){3}(\d+))?.+$', '\3'))
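Both regexes can be checked against the sample data with a Python sketch like the one below, where int() stands in for to_number. The pattern strings are copied from the answers; sncode and sncode_short are hypothetical names, and Python's re is assumed to behave like Oracle for these simple patterns.

```python
import re

# Nested groups: the fourth number is the inner group, so the replacement is \2.
PATTERN_NESTED = re.compile(r"^(\d+\.\d+\.\d+\.(\d+))?.+$")
# (\d+\.){3} is group 2, so the fourth number is group 3.
PATTERN_REPEATED = re.compile(r"^((\d+\.){3}(\d+))?.+$")

def sncode(otname):
    # to_number(regexp_replace(otname, pattern, '\2'))
    return int(PATTERN_NESTED.sub(r"\2", otname))

def sncode_short(otname):
    # to_number(regexp_replace(otname, pattern, '\3'))
    return int(PATTERN_REPEATED.sub(r"\3", otname))

for name in ["11.10.32.12.U.A.F.3.2.21.249.1", "2001.1.1003.8281.A.LE.P.P", "2010.1.1003.8261.A.LE.B.B"]:
    print(name, sncode(name), sncode_short(name))
```

When the optional group does not match, the replacement is an empty string; in Python int("") raises ValueError, whereas in Oracle the empty string is NULL, so to_number would return NULL and the row would simply not join.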