Get first value and last value - sql

i have below record like '8|12|53|123|97' and i need to find the range of values between 8 to 97, so that i need the number 8 and 97.

You can use REGEXP_SUBSTR as following:
SQL> SELECT
2 REGEXP_SUBSTR('8|12|53|123|97', '^[0-9]+') FIRSTVAL,
3 REGEXP_SUBSTR('8|12|53|123|97', '[0-9]+$') LASTVAL
4 FROM
5 DUAL;
FIRSTVAL LASTVAL
---------- ----------
8 97
SQL>
^ matches the beginning of a string.
$ matches the end of a string.
Cheers!!

Here is a solution which will work for any string which has at least one pipe.
with cte as (
select '8|12|53|123|97' str from dual
)
, rng as (
select to_number(substr(str, 1, instr(str, '|')-1)) as token_1
,to_number(substr(str, instr(str, '|', -1)+1)) as token_2
from cte )
select token_1 + level - 1 as tkn
from rng
connect by level <= (token_2 - token_1) + 1
/
The first subquery is just your test data. The second subquery identifies the first number (token_1) and the last number (token_2) in the string. It uses substr() and instr() just because they are faster than regex. instr() with a negative offset finds the last occurence of the search argument.
The main query generates a range of numbers from the bounds of the rng subquery. Not sure if that's in your requirement (depends on what you mean by "range of values between").
Because this model is not in First Normal Form you are exposed to data quality issues. The query will not produce results if the first or last tokens are not numeric, or there's only one token or the separator is not a pipe.

Related

Issues with SUBSTR function Oracle_SQL

I used the SUBSTR function for the similar purposes, but I encountered the following issue:
I am extracting 6 characters from the right, but the data in column is inconsistent and for some rows it has characters less than 6, i.e. 5 or 4. So for such rows, the function returns blanks. How can I fix this?
Example Scenario 1:
SUBSTR('0000123456',-6,6)
Output: 123456
Scenario 2 (how do I fix this?, I need it to return '23456'):
SUBSTR('23456',-6,6)
Output: ""
You can use a case expression: if the string length is strictly greater than 6 then return just the last 6 characters; otherwise return the string itself. This way you don't need to call substr unless it is really needed.
Alternatively, if speed is not the biggest issue and you are allowed to use regular expressions, you can write this more compactly - select between 0 and 6 characters - as many as possible - at the end of the string.
Finally, if you don't mind using undocumented functions, you can use reverse and standard substr (starting from character 1 and extracting the first 6 characters; that will work as expected even if the string has length less than 6). So: reverse the string, extract first (up to) 6 characters, and then reverse again to restore the order. WARNING: This is shown only for fun; DO NOT USE THIS METHOD!
with
test_data (str) as (
select '0123449389' from dual union all
select '00000000' from dual union all
select null from dual union all
select 'abcd' from dual
)
select str,
case when length(str) > 6 then substr(str, -6) else str end as case_substr,
regexp_substr(str, '.{0,6}$') as regexp_substr,
reverse(substr(reverse(str), 1, 6)) as rev_substr
from test_data
;
STR CASE_SUBSTR REGEXP_SUBSTR REV_SUBSTR
---------- ------------- ------------- --------------
0123449389 449389 449389 449389
00000000 000000 000000 000000
abcd abcd abcd abcd
One method uses coalesce():
select coalesce(substr('23456', -6, 6), '23456')
Another tweaks the length:
select substr('23456', greatest(- length('23456'), -6), 6)

How to pull a value in between multiple values?

I have a column named Concatenated Segments which has 12 segment values, and I'm looking to edit the formula on the column to only show the 5th segment. The segments are separated by periods.
How would I need to edit the formula to do this?
Would using a substring work?
Alternatively, using good old SUBSTR + INSTR combination
possibly faster on large data sets
which doesn't care about uninterrupted strings (can contain anything between dots)
SQL> WITH
2 -- thank you for typing, #marcothesane
3 indata(s) AS (
4 SELECT '1201.0000.5611005.0099.211003.0000.2199.00099.00099.0000.0000.00000' FROM dual
5 )
6 select substr(s, instr(s, '.', 1, 4) + 1,
7 instr(s, '.', 1, 5) - instr(s, '.', 1, 4) - 1
8 ) result
9 from indata;
RESULT
------
211003
SQL>
Use REGEXP_SUBSTR(), searching for the 5th uninterrupted string of digits, or the 5th uninterrupted string of anything but a dot (\d and [^\.]) starting from position 1 of the input string:
WITH
-- your input ... paste it as text next time, so I don't have to manually re-type it ....
indata(s) AS (
SELECT '1201.0000.5611005.0099.211003.0000.2199.00099.00099.0000.0000.00000' FROM dual
)
SELECT
REGEXP_SUBSTR(s,'\d+',1,5) AS just_digits
, REGEXP_SUBSTR(s,'[^\.]+',1,5) AS between_dots
FROM indata;
-- out just_digits | between_dots
-- out -------------+--------------
-- out 211003 | 211003

Retrivieng specific occurrences of a given Regex with Oracle SQL

In a simplified form, I'm attempting to retrieve either the first occurrence of the '.*?=(.*?);.*' regex, or the second, or the third -- that is, either x or y or z (that is, I want to be able to hardcode in this query that I want the first, second or third values) in this following example:
select regexp_replace(
'margin=x;margin=y;margin=z;',
'.*?=(.*?);.*',
'\1',
1 -- occurrences. I thought that picking 1, 2 or 3 would solve my problem?
) from dual;
-- This returns "xyz", which is terrible. I was expecting it to return "x", in this case.
Looking at the Oracle documentation, I thought this would be relatively straightforward, as the last parameter (occurrences), apparently allows me to select which groups to take into consideration. But it doesn't! Why?
Thanks
i´m goingoff to another completly different solution. Would combining a hierarchial substring select with a regexp_replace be an option for your needs?
This way you could create an option to either select one or multiple values, depending on your needs. You wouldn´t need to write a concatinating regex value and you could adjust the select a bit more to your needs
select regexp_replace(subselect.val, '.*=(.*?);', '\1') -- remove "margin="
from (select regexp_substr(
'margin=x;margin=y;margin=z;',
'.*?=(.*?);',
1,
level) val,
level lvl
from dual
connect by regexp_substr('margin=x;margin=y;margin=z;',
'.*?=(.*?);',
1,
level) is not null) subselect -- This select represents each margin=T as a single row
where lvl = 1; -- cou could define multiple values to select aswell.
You need a regex that will match 1 to n occurrences of the whole group. E.g.
([^=]*=([^;]*);){2}.*
(replaced with \2 backreference) will get the 2nd attribute value. Your regex can also be used (though it is quite synonymous to the above pattern): (.*?=(.*?);){2}.*.
See the regex demo
If you define the index variable as IDX, you can use something like
select regexp_replace(
'margin=x;margin=y;margin=z;',
CONCAT('([^=]*=([^;]*);){', IDX, '}.*'),
'\2'
) from dual;
NOTE: If you want to get an empty string as a result of trying to obtain a non-existing value, add |.* at the end of the regex:
(.*?=(.*?);){4}.*|.*
See this regex demo (with your input string, the result will be empty string).
Perhaps all you need is this.... The fourth parameter is NOT the occurrence but the POSITION from which the search starts. The FIFTH parameter is the occurrence.
https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions130.htm
Also, are you sure you want REPLACE and not SUBSTR?
EDITED: To clarify (it seems at least one person was confused). I show a possible solution to what you need (perhaps) at the end, but first let's look at REGEXP_REPLACE. I rewrote your query to use different occurrences; I put the index in a CTE, but you can instead make idx into a bind variable, or any other mechanism you need to use. As you will see, the output makes no sense.
with t1 ( idx ) as (select 1 from dual union all select 2 from dual
union all select 3 from dual)
select idx,
regexp_replace('margin=x;margin=y;margin=z;', '.*?=(.*?);.*', '\1', 1, idx) as val
from t1;
Output:
IDX VAL
---------- -----------------------
1 xmargin=y;margin=z;
2 margin=x;ymargin=z;
3 margin=x;margin=y;z
3 rows selected.
I guess this is not what you needed - but it demonstrates what was wrong in your query. The fourth argument to REGEXP_REPLACE, 1 in all cases in the above query, is the position from which the search begins. The fifth argument, idx, is the occurrence. This query replaces the first, second, third occurrence with the subexpression - probably not what you wanted.
If you need to extract x, or y, or z, depending on the occurrence number, you must use REGEXP_SUBSTR, not REGEXP_REPLACE. Note also that I changed the match pattern - the .*? at the beginning and the .* at the end are unnecessary. If you want to find x, y or z in something like margin=x; but not in length=x; then you must make that explicit, the match pattern should be 'margin=(.*?);'.
with t1 ( idx ) as (select 1 from dual union all select 2 from dual
union all select 3 from dual)
select idx,
regexp_replace('margin=x;margin=y;margin=z;', '=(.*?);', '\1', 1, idx) as val
from t1;
Output:
IDX VAL
---------- -------
1 x
2 y
3 z

PLSQL show digits from end of the string

I have the following problem.
There is a String:
There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235
I need to show only just the last date from this string: 2015.06.07.
I tried with regexp_substr with insrt but it doesn't work.
So this is just test, and if I can solve this after it with this solution I should use it for a CLOB query where there are multiple date, and I need only the last one. I know there is regexp_count, and it is help to solve this, but the database what I use is Oracle 10g so it wont work.
Can somebody help me?
The key to find the solution of this problem is the idea of reversing the words in the string presented in this answer.
Here is the possible solution:
WITH words AS
(
SELECT regexp_substr(str, '[^[:space:]]+', 1, LEVEL) word,
rownum rn
FROM (SELECT 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 2015.06.08 2015.06.17. 2015.07.01. 12345678999 12125235' str
FROM dual) tab
CONNECT BY LEVEL <= LENGTH(str) - LENGTH(REPLACE(str, ' ')) + 1
)
, words_reversed AS
(
SELECT *
FROM words
ORDER BY rn DESC
)
SELECT regexp_substr(word, '\d{4}\.\d{2}\.\d{2}', 1, 1)
FROM words_reversed
WHERE regexp_like(word, '\d{4}\.\d{2}\.\d{2}')
AND rownum = 1;
From the documentation on regexp_substr, I see one problem immediately:
The . (period) matches any character. You need to escape those with a backslash: \. in order to match only a period character.
For reference, I am linking this post which appears to be the approach you are taking with substr and instr.
Relevant documentation from Oracle:
INSTR(string , substring [, position [, occurrence]])
When position is negative, then INSTR counts and searches backward from the end of string. The default value of position is 1, which means that the function begins searching at the beginning of string.
The problem here is that your regular expression only returns a single value, as explained here, so you will be giving the instr function the appropriate match in the case of multiple dates.
Now, because of this limitation, I recommend using the approach that was proposed in this question, namely reverse the entire string (and your regular expression, i.e. \d{2}\.\d{2}\.\d{4}) and then the first match will be the 'last match'. Then, perform another string reversal to get the original date format.
Maybe this isn't the best solution, but it should work.
There are three different PL/SQL functions that will get you there.
The INSTR function will identify where the first "period" in the date string appears.
SUBSTR applied to the entire string using the value from (1) as the start point
TO_DATE for a specific date mask: YYYY.MM.DD will convert the result from (2) into a Oracle date time type.
To make this work in procedural code, the standard blocks apply:
DECLARE
v_position pls_integer;
... other variables
BEGIN
sql code and function calls;
END
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE finddate
(column1 varchar2(11), column2 varchar2(39))
;
INSERT ALL
INTO finddate (column1, column2)
VALUES ('row1', '1234567 242424 2015.06.07. 12125235')
INTO finddate (column1, column2)
VALUES ('string2', '1234567 242424 2015.06.07. 12125235')
SELECT * FROM dual
;
Query 1:
select instr(column2,'.',1) from finddate
where column1 = 'string2'
select substr(column2,(20-4),10) from finddate
select to_date('2015.06.07','YYYY.MM.DD') from finddate
Results:
| TO_DATE('2015.06.07','YYYY.MM.DD') |
|------------------------------------|
| June, 07 2015 00:00:00 |
| June, 07 2015 00:00:00 |
Here's a way using regexp_replace() that should work with 10g, assuming the format of the lines will be the same:
with tbl(col_string) as
(
select 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235'
from dual
)
select regexp_replace(col_string, '^.*(\d{4}\.\d{2}\.\d{2})\. \d*$', '\1')
from tbl;
The regex can be read as:
^ - Match the start of the line
. - followed by any character
* - followed by 0 or more of the previous character (which is any character)
( - Start a remembered group
\d{4}\.\d{2}\.\d{2} - 4 digits followed by a literal period followed by 2 digits, etc
) - End the first remembered group
\. - followed by a literal period
- followed by a space
\d* - followed by any number of digits
$ - followed by the end of the line
regexp_replace then replaces all that with the first remembered group (\1).
Basically describe the whole line as a regular expression, group around what you want to return. You will most likely need to tweak the regex for the end of the line if it could be other characters than digits but this should give you an idea.
For the sake of argument this works too ONLY IF there are 2 occurrences of the date pattern:
with tbl(col_string) as
(
select 'There is something 2015.06.06. in the air 1234567 242424 2015.06.07. 12125235' from dual
)
select regexp_substr(col_string, '\d{4}\.\d{2}\.\d{2}', 1, 2)
from tbl;
returns the second occurrence of the pattern. I expect the above regexp_replace more accurately describes the solution.

Retrieve segment from value

I have this value in my field which have 5 segment for example 100-200-300-400-500.
How do I query to only retrieve the first 3 segment? Which mean the query result will display as 100-200-300.
The old SUBSTR and INSTR will be faster and less CPU intensive as compared to REGEXP.
SQL> WITH DATA AS(
2 SELECT '100-200-300-400-500' str FROM dual
3 )
4 SELECT substr(str, 1, instr(str, '-', 1, 3)-1) str
5 FROM DATA
6 /
STR
-----------
100-200-300
SQL>
The above SUBSTR and INSTR query uses the logic to find the 3rd occurrence of the hyphen "-" and then take the substring from position 1 till the third occurrence of '-'.
((\d)+-(\d)+-(\d)+)
If the Position of this sequence is arbitrary, you might go for REGularEXPressions
select regexp_substr(
'Test-Me 100-200-300-400-500 AGain-Home',
'((\d)+-(\d)+-(\d)+)'
) As Result
from dual
RESULT
-----------
100-200-300
Otherwise Simple SUBSTR will do
you have tow way, the first is substring.
The second is fast, us a REGEXP like this.
REGEXP_SUBSTR('100-200-300-400-500','[[:digit:]]{3}-[[:digit:]]{3}-[[:digit:]]{3}')"REGEXPR_SUBSTR" FROM DUAL;