Extract the second word from a string in ODI Expression

These two expressions return the second word of a string in Oracle:
SELECT REGEXP_SUBSTR('Hello this is an example', '\s+(\w+)\s') AS syntax1,
SUBSTR('Hello this is an example',
INSTR('Hello this is an example', ' ', 1, 1) + 1,
INSTR('Hello this is an example', ' ', 1, 2)
- INSTR('Hello this is an example', ' ', 1)
) AS syntax2
FROM dual;
Result:
syntax1 syntax2
------- -------
this this
I'm working in ODI (Oracle Data Integrator), and neither expression works there:
in ODI the regexp is not valid, and the INSTR function accepts only 2 parameters.
Can you suggest a solution that works in ODI?
Thank you.

I think it should support ' [[:alpha:]]+ '

Then you can apply the SUBSTR() function twice:
WITH t2(str) AS
(
SELECT SUBSTR( TRIM( str ), INSTR( TRIM( str ), ' ') + 1, LENGTH( TRIM(str) ) )
FROM t --> original table
)
SELECT SUBSTR( str, 1, INSTR(str, ' ') - 1 ) AS extracted_string
FROM t2
extracted_string
----------------
this
If the installed ODI version is 12 or later, you can also use REGEXP_REPLACE(), as below:
SELECT REGEXP_REPLACE(str, '(\w+)\s(\w+)( .*)', '\2' ) AS extracted_string
FROM t

I finally used this expression:
SELECT
SUBSTR (
SUBSTR ('one two three four',
INSTR ('one two three four', ' ') + 1,
999999),
0,
INSTR (
SUBSTR ('one two three four',
INSTR ('one two three four', ' ') + 1,
999999),
' ')
- 1)
FROM DUAL
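
The expression above returns NULL when the string has only two words, because the inner INSTR finds no second space and the length becomes -1. A minimal sketch of one possible workaround, still using only the two-argument INSTR that ODI accepts: append a space before searching, so the second word always has a terminator. The inline view and the column name str are made up for illustration, and leading or repeated spaces are not handled.

```sql
-- Sketch only: str and the inline view are hypothetical test data.
SELECT str,
       SUBSTR(
         SUBSTR(str || ' ', INSTR(str || ' ', ' ') + 1),   -- text after the first space
         1,
         INSTR(SUBSTR(str || ' ', INSTR(str || ' ', ' ') + 1), ' ') - 1  -- up to the next space
       ) AS second_word
FROM (
  SELECT 'one two three four' AS str FROM DUAL UNION ALL
  SELECT 'one two'                   FROM DUAL UNION ALL
  SELECT 'one'                       FROM DUAL
);
```

For these three rows the expression should give two, two and NULL.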

Related

Remove all dots except its last occurrence

I need a regex that removes all dots from a number except the last one.
What I'd like to do:
100.000.10 -> 100000.10
I tried with:
SELECT REGEXP_REPLACE ('100.100.10', '\.(?![^.]+$)|[^0-9.]','') FROM dual;
But it returns 100.100.10

You do not need (slow) regular expressions and can use (much faster) simple string functions:
SELECT REPLACE(SUBSTR(value, 1, INSTR(value, '.', -1) - 1), '.')
|| SUBSTR(value, INSTR(value, '.', -1)) AS updated_value
FROM table_name;
Which, for the sample data:
CREATE TABLE table_name (value) AS
SELECT '100.000.10' FROM DUAL;
Outputs:
UPDATED_VALUE
100000.10
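
The pattern in the question fails because Oracle's regular expressions do not support lookahead such as (?!...); the answer instead uses INSTR with a negative start position, which searches backwards from the end of the string. A small sketch of the intermediate values, using an inline literal instead of table_name; the aliases last_dot, int_part and frac_part are illustrative only.

```sql
-- Sketch only: shows each piece of the expression for one sample value.
SELECT value,
       INSTR(value, '.', -1)                                     AS last_dot,   -- 8
       REPLACE(SUBSTR(value, 1, INSTR(value, '.', -1) - 1), '.') AS int_part,   -- 100000
       SUBSTR(value, INSTR(value, '.', -1))                      AS frac_part   -- .10
FROM (SELECT '100.000.10' AS value FROM DUAL);
```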

Find space after nth characters and split into new row

I have a large string stored in a table as a single line. I need a select query that splits the string into rows after every 100 characters without splitting in the middle of a word. Basically, the query should find the first space after 100 characters and break there.
I have used this query. It splits after every 100 characters, but it breaks in the middle of words.
SELECT REGEXP_REPLACE ( col_large_string , '(.{100})' , '\1' || CHR (10) ) AS split_to_rows
FROM tab_large_string where string_id = 1;

You do not need (slow) regular expressions and can do it with simple (quicker) string functions.
If you want to replace spaces with newlines then:
WITH bounds ( str, end_pos ) AS (
SELECT col_large_string,
INSTR(col_large_string, ' ', 101)
FROM tab_large_string
UNION ALL
SELECT SUBSTR(str, 1, end_pos - 1)
|| CHR(10)
|| SUBSTR(str, end_pos + 1),
INSTR(str, ' ', end_pos + 101)
FROM bounds
WHERE end_pos > 0
)
SELECT str AS split_to_lines
FROM bounds
WHERE end_pos = 0;
and if you want to have each line in a new row then:
WITH bounds ( str, start_pos, end_pos ) AS (
SELECT col_large_string,
1,
INSTR(col_large_string, ' ', 101)
FROM tab_large_string
UNION ALL
SELECT str,
end_pos + 1,
INSTR(str, ' ', end_pos + 101)
FROM bounds
WHERE end_pos > 0
)
SELECT CASE end_pos
WHEN 0
THEN SUBSTR(str, start_pos)
ELSE SUBSTR(str, start_pos, end_pos - start_pos)
END AS split_to_rows
FROM bounds;
If you do want to use regular expressions then:
SELECT REGEXP_REPLACE(
col_large_string,
'(.{100,}?) ',
'\1' || CHR (10)
) AS split_to_lines
FROM tab_large_string
WHERE string_id = 1;
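
To see how the lazy quantifier behaves, the same pattern can be tried with a smaller width on a literal. This is only a sketch: the sample sentence and the width of 10 are arbitrary choices, and each break should land on the first space that follows at least 10 characters of the current line.

```sql
-- Sketch only: same technique as above with a width of 10 instead of 100.
SELECT REGEXP_REPLACE(
         'the quick brown fox jumps over the lazy dog',
         '(.{10,}?) ',        -- at least 10 characters, as few as possible, then a space
         '\1' || CHR(10)      -- keep the text and swap that space for a newline
       ) AS split_to_lines
FROM DUAL;
```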

You can use this regular expression:
SELECT REGEXP_REPLACE ( col_large_string , '((\w+\s+){100})' , '\1' || CHR (10) ) AS split_to_rows
FROM tab_large_string where string_id = 1;
\w+ matches one or more occurrences of a word character.
\s+ matches one or more occurrences of a whitespace character.
(\w+\s+) matches a word followed by whitespace.
(\w+\s+){100} then matches (a word followed by whitespace) 100 times, so this breaks after every 100 words rather than every 100 characters.

Oracle: instr+substr instead of regexp_substr

I got this query from another post of mine; it uses REGEXP_SUBSTR() to pull specific information out of a string in Oracle. It works well, but only for small data sets. On tables with 300,000+ records it is very slow, and from what I have read, INSTR + SUBSTR might be faster. The example query is:
SELECT REGEXP_SUBSTR(value, '(^|\|)\s*24=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "24",
REGEXP_SUBSTR(value, '(^|\|)\s*35=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "35",
REGEXP_SUBSTR(value, '(^|\|)\s*47A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "47A",
REGEXP_SUBSTR(value, '(^|\|)\s*98A=\s*(.*?)\s*(\||$)', 1, 1, NULL, 2) AS "98A"
FROM table_name
Table example:
CREATE TABLE table_name (value ) AS
SELECT '35= 88234.00 | 47A= Shawn | 98A= This is a comment |' FROM DUAL UNION ALL
SELECT '24= 123.00 | 98A= This is a comment | 47A= Derick |' FROM DUAL
Output of query would be:
24      35        47A     98A
------  --------  ------  -----------------
        88234.00  Shawn   This is a comment
123.00            Derick  This is a comment
Can someone give me an example of how this same query would look if I was doing instr+substr instead?
Thank you.

SELECT CASE
WHEN start_24 > 0
THEN TRIM(
SUBSTR(
value,
start_24 + 5,
INSTR(value, '|', start_24 + 5) - (start_24+5)
)
)
END AS "24",
CASE
WHEN start_35 > 0
THEN TRIM(
SUBSTR(
value,
start_35 + 5,
INSTR(value, '|', start_35 + 5) - (start_35+5)
)
)
END AS "35",
CASE
WHEN start_47a > 0
THEN TRIM(
SUBSTR(
value,
start_47a + 6,
INSTR(value, '|', start_47a + 6) - (start_47a+6)
)
)
END AS "47A",
CASE
WHEN start_98a > 0
THEN TRIM(
SUBSTR(
value,
start_98a + 6,
INSTR(value, '|', start_98a + 6) - (start_98a+6)
)
)
END AS "98A"
FROM (
SELECT value,
INSTR(value, '| 24=') AS start_24,
INSTR(value, '| 35=') AS start_35,
INSTR(value, '| 47A=') AS start_47a,
INSTR(value, '| 98A=') AS start_98a
FROM (
SELECT '| ' || value AS value FROM table_name
)
);
Which, for your sample data, outputs:
24      35        47A     98A
------  --------  ------  -----------------
        88234.00  Shawn   This is a comment
123.00            Derick  This is a comment
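
The offsets + 5 and + 6 in the query are the lengths of the search strings '| 24=' and '| 47A='. A sketch for a single key that derives the offset with LENGTH instead of hard-coding it; the aliases v and p and the inline literal are assumptions made for illustration.

```sql
-- Sketch only: extracts the 47A field from one sample row.
SELECT TRIM(
         SUBSTR(v,
                p + LENGTH('| 47A='),                    -- first character after the key
                INSTR(v, '|', p + LENGTH('| 47A='))      -- position of the next pipe
                  - (p + LENGTH('| 47A='))               -- minus the start gives the length
         )
       ) AS "47A"
FROM (
  SELECT v, INSTR(v, '| 47A=') AS p
  FROM (
    SELECT '| ' || '35= 88234.00 | 47A= Shawn | 98A= This is a comment |' AS v
    FROM DUAL
  )
);
```

For this row the query should return Shawn. A row without the key gives p = 0, which is why the full query wraps each field in a CASE.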

Given the data in your example, you could also use a procedural approach for the extraction, but I'm sceptical whether it would be faster.
The following function get24, for example, extracts column "24" using only INSTR and SUBSTR.
CREATE OR REPLACE FUNCTION get24(value IN VARCHAR2) RETURN VARCHAR2
IS
i PLS_INTEGER;
s VARCHAR2(32767);
BEGIN
i := INSTR(value, '24= ');
IF (i <> 1) THEN
RETURN NULL;
END IF;
s := SUBSTR(value, i + 4);
i := INSTR(s, ' | ');
IF (i = 0) THEN
RETURN NULL;
END IF;
RETURN SUBSTR(s, 1, i - 1);
END;
/
SELECT get24(value) "24" FROM table_name;
You could then also try using a pipelined function and do all the data extraction within the pipelined function.

How to sort version numbers (like 5.3.60.8)

I have strings like:
5.3.60.8
6.0.5.94
3.3.4.1
How can I sort these values as version numbers in Oracle SQL?
I want the order to be like this:
6.0.5.94
5.3.60.8
3.3.4.1

with
inputs ( str ) as (
select '6.0.5.94' from dual union all
select '5.3.60.8' from dual union all
select '3.3.4.1' from dual
)
select str from inputs
order by to_number(regexp_substr(str, '\d+', 1, 1)),
to_number(regexp_substr(str, '\d+', 1, 2)),
to_number(regexp_substr(str, '\d+', 1, 3)),
to_number(regexp_substr(str, '\d+', 1, 4))
;
STR
--------
3.3.4.1
5.3.60.8
6.0.5.94
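
The question asks for the highest version first, while the query above sorts ascending. A sketch of the descending variant: DESC is added to each key, and NULLS LAST keeps a shorter version such as 5.3.60 (an extra row added here purely for illustration) after 5.3.60.8, since Oracle otherwise places NULLs first in a descending sort.

```sql
-- Sketch only: the row '5.3.60' is hypothetical test data, not from the question.
WITH inputs (str) AS (
  SELECT '3.3.4.1'  FROM DUAL UNION ALL
  SELECT '5.3.60'   FROM DUAL UNION ALL
  SELECT '6.0.5.94' FROM DUAL UNION ALL
  SELECT '5.3.60.8' FROM DUAL
)
SELECT str
FROM inputs
ORDER BY TO_NUMBER(REGEXP_SUBSTR(str, '\d+', 1, 1)) DESC NULLS LAST,
         TO_NUMBER(REGEXP_SUBSTR(str, '\d+', 1, 2)) DESC NULLS LAST,
         TO_NUMBER(REGEXP_SUBSTR(str, '\d+', 1, 3)) DESC NULLS LAST,
         TO_NUMBER(REGEXP_SUBSTR(str, '\d+', 1, 4)) DESC NULLS LAST;
```

With these rows the order should be 6.0.5.94, 5.3.60.8, 5.3.60, 3.3.4.1.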

You could pad numbers with zeroes on the left in the order by clause:
select version
from versions
order by regexp_replace(
regexp_replace(version, '(\d+)', lpad('\1', 11, '0')),
'\d+(\d{10})',
'\1'
) desc
This works for more number parts as well, up to about 200 of them.
If you expect to have numbers with more than 10 digits, increase the number passed as second argument to the lpad function, and also the braced number in the second regular expression. The first should be one more (because \1 is two characters but could represent only one digit).
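
A sketch that exposes the intermediate strings for one version, to make the padding step easier to follow. The inline literal and the aliases padded and sort_key are illustrative. LPAD('\1', 11, '0') is evaluated before the regular expression runs and yields nine zeros followed by \1, so every number gains nine leading zeros; the second replace then trims any number longer than 10 digits back to its last 10.

```sql
-- Sketch only: shows both stages of the sort key for a single value.
SELECT version,
       REGEXP_REPLACE(version, '(\d+)', LPAD('\1', 11, '0')) AS padded,
       REGEXP_REPLACE(
         REGEXP_REPLACE(version, '(\d+)', LPAD('\1', 11, '0')),
         '\d+(\d{10})',
         '\1'
       ) AS sort_key
FROM (SELECT '5.3.60.8' AS version FROM DUAL);
```

Here padded should be 0000000005.0000000003.00000000060.0000000008 and sort_key should be 0000000005.0000000003.0000000060.0000000008, which compares correctly as plain text.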
Highest version
To get only the highest version, sort in an inline view and filter on Oracle's rownum pseudocolumn in the outer query. The filter has to sit outside the sorted subquery because rownum is assigned before the order by is applied:
select version
from (
select version
from versions
order by regexp_replace(
regexp_replace(version, '(\d+)', lpad('\1', 11, '0')),
'\d+(\d{10})',
'\1'
) desc)
where rownum <= 1;
See this Q&A for several alternatives, also depending on your Oracle version.

Here is the answer from AskTom, which works with versions that have different numbers of parts:
WITH inputs
AS (SELECT 1 as id, '6.0.5.94' as col FROM DUAL
UNION ALL
SELECT 2,'5.3.30.8' FROM DUAL
UNION ALL
SELECT 3,'5.3.4.8' FROM DUAL
UNION ALL
SELECT 4,'3' FROM DUAL
UNION ALL
SELECT 5,'3.3.40' FROM DUAL
UNION ALL
SELECT 6,'3.3.4.1.5' FROM DUAL
UNION ALL
SELECT 7,'3.3.4.1' FROM DUAL)
SELECT col, MAX (SYS_CONNECT_BY_PATH (v, '.')) p
FROM (SELECT t.col, TO_NUMBER (SUBSTR (x.COLUMN_VALUE, 1, 5)) r, SUBSTR (x.COLUMN_VALUE, 6) v, id rid
FROM inputs t,
TABLE (
CAST (
MULTISET (
SELECT TO_CHAR (LEVEL, 'fm00000')
|| TO_CHAR (TO_NUMBER (SUBSTR ('.' || col || '.', INSTR ('.' || col || '.', '.', 1, ROWNUM) + 1, INSTR ('.' || col || '.', '.', 1, ROWNUM + 1) - INSTR ('.' || col || '.', '.', 1, ROWNUM) - 1)), 'fm0000000000')
FROM DUAL
CONNECT BY LEVEL <= LENGTH (col) - LENGTH (REPLACE (col, '.', '')) + 1) AS SYS.odciVarchar2List)) x)
START WITH r = 1
CONNECT BY PRIOR rid = rid AND PRIOR r + 1 = r
GROUP BY col
ORDER BY p

SQL Concatenate strings across multiple columns with corresponding values

I'm looking for a way to achieve this in a SELECT statement.
FROM
Column1 Column2 Column3
A,B,C 1,2,3 x,y,z
TO
Result
A|1|x,B|2|y,C|3|z
The delimiters don't matter. I'm just trying to get all the data into a single column. Ideally I'd like to do this in DB2, but I'd also like to know if there's an easier way to do it in Oracle.
Thanks

You can do it like this using INSTR and SUBSTR:
select
substr(column1,1,instr(column1,',',1)-1) || '|' ||
substr(column2,1,instr(column2,',',1)-1) || '|' ||
substr(column3,1,instr(column3,',',1)-1) ||
',' ||
substr(column1 ,instr(column1 ,',',1,1)+1,instr(column1 ,',',1,2) - instr(column1 ,',',1)-1) || '|' ||
substr(column2 ,instr(column2 ,',',1,1)+1,instr(column2 ,',',1,2) - instr(column2 ,',',1)-1) || '|' ||
substr(column3 ,instr(column3 ,',',1,1)+1,instr(column3 ,',',1,2) - instr(column3 ,',',1)-1) ||
',' ||
substr(column1 ,instr(column1 ,',',1,2)+1) || '|' ||
substr(column2 ,instr(column2 ,',',1,2)+1) || '|' ||
substr(column3 ,instr(column3 ,',',1,2)+1)
from yourtable

I tried the following.
First I created a table called t_ask_test and inserted the data from the question, then built the result with string functions.
Sample table:
create table t_ask_test(column1 varchar(10), column2 varchar(10),column3 varchar(10));
Insert a row:
insert into T_ASK_TEST values ('A,B,C','1,2,3','x,y,z');
The following query builds the result (it assumes exactly three entries per column):
select substr(column1,1,instr(column1,',',1,1)-1)||'|'||substr(column2,1,instr(column2,',',1,1)-1)||'|'||substr(column3,1,instr(column3,',',1,1)-1) ||','||
substr(column1,instr(column1,',',1,1)+1,instr(column1,',',1,2)-instr(column1,',',1,1)-1)||'|'||substr(column2,instr(column2,',',1,1)+1,instr(column2,',',1,2)-instr(column2,',',1,1)-1)||'|'||substr(column3,instr(column3,',',1,1)+1,instr(column3,',',1,2)-instr(column3,',',1,1)-1) ||','||
substr(column1,instr(column1,',',1,2)+1,length(column1)-instr(column1,',',1,2))||'|'||substr(column2,instr(column2,',',1,2)+1,length(column2)-instr(column2,',',1,2))||'|'||substr(column3,instr(column3,',',1,2)+1,length(column3)-instr(column3,',',1,2)) as test from t_ask_test;
The output is as follows:
TEST
---------------
A|1|x,B|2|y,C|3|z

If you have a dynamic number of entries for each row then:
Oracle 11g R2 Schema Setup:
CREATE TABLE TEST ( Column1, Column2, Column3 ) AS
SELECT 'A,B,C', '1,2,3', 'x,y,z' FROM DUAL
UNION ALL SELECT 'D,E', '4,5', 'v,w' FROM DUAL;
Query 1:
WITH ids AS (
SELECT t.*, ROWNUM AS id
FROM TEST t
)
SELECT LISTAGG(
REGEXP_SUBSTR( i.Column1, '[^,]+', 1, n.COLUMN_VALUE )
|| '|' || REGEXP_SUBSTR( i.Column2, '[^,]+', 1, n.COLUMN_VALUE )
|| '|' || REGEXP_SUBSTR( i.Column3, '[^,]+', 1, n.COLUMN_VALUE )
, ','
) WITHIN GROUP ( ORDER BY n.COLUMN_VALUE ) AS value
FROM ids i,
TABLE(
CAST(
MULTISET(
SELECT LEVEL
FROM DUAL
CONNECT BY LEVEL <= GREATEST(
REGEXP_COUNT( i.COLUMN1, '[^,]+' ),
REGEXP_COUNT( i.COLUMN2, '[^,]+' ),
REGEXP_COUNT( i.COLUMN3, '[^,]+' )
)
)
AS SYS.ODCINUMBERLIST
)
) n
GROUP BY i.ID
Results:
| VALUE |
|-------------------|
| A|1|x,B|2|y,C|3|z |
| D|4|v,E|5|w |

You need to use:
SUBSTR
INSTR
|| concatenation operator
It is easier to follow if you build the output piece by piece:
WITH t AS
  ( SELECT 'A,B,C' Column1, '1,2,3' Column2, 'x,y,z' Column3 FROM dual
  )
SELECT SUBSTR(column1, 1, INSTR(column1, ',', 1) - 1)
  ||'|'
  || SUBSTR(column2, 1, INSTR(column2, ',', 1) - 1)
  ||'|'
  || SUBSTR(column3, 1, INSTR(column3, ',', 1) - 1)
  ||','
  || SUBSTR(column1, INSTR(column1, ',', 1, 1) + 1,
            INSTR(column1, ',', 1, 2) - INSTR(column1, ',', 1, 1) - 1)
  ||'|'
  || SUBSTR(column2, INSTR(column2, ',', 1, 1) + 1,
            INSTR(column2, ',', 1, 2) - INSTR(column2, ',', 1, 1) - 1)
  ||'|'
  || SUBSTR(column3, INSTR(column3, ',', 1, 1) + 1,
            INSTR(column3, ',', 1, 2) - INSTR(column3, ',', 1, 1) - 1)
  ||','
  || SUBSTR(column1, INSTR(column1, ',', 1, 2) + 1)
  ||'|'
  || SUBSTR(column2, INSTR(column2, ',', 1, 2) + 1)
  ||'|'
  || SUBSTR(column3, INSTR(column3, ',', 1, 2) + 1)
  AS "new_column"
FROM t;
new_column
-----------------
A|1|x,B|2|y,C|3|z
On a side note, you should avoid storing delimited values in a single column. Consider normalizing the data.
From Oracle 11g and above, you could create a VIRTUAL COLUMN using the above expression and use it instead of executing the SQL frequently.

It's very simple in Oracle: just use the concatenation operator ||.
In the solution below, I have used an underscore as the delimiter:
select Column1 ||'_'||Column2||'_'||Column3 from table_name;
