Passing regexp_replace matches into arguments - sql

I've got a set of strings in a database which describe calculations like this:
"#id1# = #id2# + #id3#"
and a table with the ids like this:
ID Human_friendly_name
id1 Name1
id2 Name2
id3 Name3
I'd like to substitute the human-friendly names in for the #id# format, giving me a result of:
Name1 = Name2 + Name3
The calculations do not have a limit on how many variables they can include; some contain hundreds.
One potential way to do this would be to split the equation into multiple rows (using a recursive trim, for example), do a lookup for the names and then use LISTAGG to recombine the strings. But that seems overly complicated.
What I'd really like to do is use REGEXP_REPLACE to pass the matches into the argument for the replacement string, i.e.:
REGEXP_REPLACE('My calculation string',
'#\d+#',
(select max(name) from table where id = REGEX_MATCH)
)
I haven't been able to find any examples of passing the matched value into the replacement_string argument (although the SELECT part works). Can anyone tell me how to do this, or confirm that it's impossible?

Thought about it some more... Perhaps you meant something different?
Do you have a table with strings, like '#id1# = #id2# + #id3#', and you are looking for a query that will substitute 'Name1' in place of '#id1#', etc. - that is, the + sign in the string has NO meaning whatsoever, and you are simply wanting to do a string replacement based on a substitution table? So, for example, if you had another string '#id1# is better than a glass of #id2#' you would want the output 'Name1 is better than a glass of Name2'?
If so, you will need regular expressions AND a recursive process of some sort. Below I show how this can be done in Oracle versions 11.2 and higher (since I use the recursive subquery factoring introduced in 11.2).
Input tables:
Table: INPUT_STRINGS
Columns: INP_STR
INP_STR
------------------------------------------------------------
#id1# = #id2# + #id3# + #id1# / #id3#
Let #id2# be equal to #id4# - 5 + #id1##id2#
Table: HUMAN_READABLE
Columns: ID, HUMAN_READABLE_NAME
ID HUMAN_READABLE_NAME
-------------------- -----------------------------
id1 James Bond
id2 cat$5FISH
id3
id4 star
Query:
with t (input_str, temp_str, ct) as (
select inp_str, inp_str, regexp_count(inp_str, '#') from input_strings
union all
select input_str, regexp_replace(temp_str, '#[^#]*#',
(select human_readable_name from human_readable
where regexp_substr(temp_str, '#[^#]*#') = '#'||id||'#'), 1, 1), ct - 2
from t
where ct != 0
)
select t.input_str, temp_str as human_readable_str from t where ct = 0;
Output:
INPUT_STR                                     HUMAN_READABLE_STR
--------------------------------------------  ------------------------------------------------------------
Let #id2# be equal to #id4# - 5 + #id1##id2#  Let cat$5FISH be equal to star - 5 + James Bondcat$5FISH
#id1# = #id2# + #id3# + #id1# / #id3#         James Bond = cat$5FISH + + James Bond /
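For comparison, outside the database the same substitution can be done in a single pass when the regex library accepts a replacement callback. The Python sketch below is not the Oracle approach above; humanize and the names dictionary are hypothetical stand-ins for the query and the HUMAN_READABLE table. As in the query, an id with a NULL or missing name is replaced by an empty string.

```python
import re

# Hypothetical in-memory copy of the HUMAN_READABLE table (None stands for NULL)
names = {"id1": "James Bond", "id2": "cat$5FISH", "id3": None, "id4": "star"}

TOKEN = re.compile(r"#[^#]*#")  # same pattern as in the query


def humanize(text, names):
    """Replace every #id# token with its human-readable name in one pass."""
    def lookup(match):
        token_id = match.group(0)[1:-1]  # drop the surrounding '#'
        return names.get(token_id) or ""  # NULL or unknown id -> empty string

    # A function replacement is inserted literally, so '$' or '\' in a name is safe
    return TOKEN.sub(lookup, text)


print(humanize("Let #id2# be equal to #id4# - 5 + #id1##id2#", names))
```

Because the callback receives each match, no recursion or counter is needed, and replaced text is never rescanned.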

Interesting problem. I think the issue is with when Oracle evaluates the backreferences in regexp_replace (so instead of sending the value of \1 you are actually sending the literal string '\1'). Anyway, here is a solution using SQL modeling (I like mathguy's answer too, this is just a different approach):
First, set up your ref table holding the id=>name translations:
create table my_ref
(
id varchar2(50) not null primary key,
name varchar2(50)
);
insert into my_ref (id, name) values ('id1','name1');
insert into my_ref (id, name) values ('id2','name2');
insert into my_ref (id, name) values ('id3','name3');
insert into my_ref (id, name) values ('id4','name4');
insert into my_ref (id, name) values ('id5','name5');
insert into my_ref (id, name) values ('id6','name6');
commit;
And the main table with a few example entries:
create table my_tab
(
formula varchar2(50)
);
insert into my_tab values ('#id1# = #id2# + #id3#');
insert into my_tab values ('#test# = some val #id4#');
commit;
Next, a basic function to translate a single id to name (lookup function):
create or replace function my_ref_fn(i_id in varchar2)
return varchar2
as
rv my_ref.name%type;
begin
begin
select
-- replace id with name, keeping spaces
regexp_replace(i_id, '( *)#(.+)#( *)', '\1' || name || '\3')
into rv
from my_ref
where id = ltrim(rtrim(trim(i_id), '#'),'#');
exception
when no_data_found then null;
end;
dbms_output.put_line('"' || i_id || '" => "' || rv || '"');
return rv;
end;
And to use it, we need to use SQL modeling:
select formula, new_val
from my_tab
MODEL
PARTITION BY (ROWNUM rn)
DIMENSION BY (0 dim)
MEASURES(formula, CAST('' AS VARCHAR2(255)) word, CAST('' AS VARCHAR2(255)) new_val)
RULES ITERATE(99) UNTIL (word[0] IS NULL)
(word[0] = REGEXP_SUBSTR(formula[0], '( *)[^ ]+( *|$)', 1, ITERATION_NUMBER + 1)
, new_val[0] = new_val[0] || nvl(my_ref_fn(word[0]), word[0])
);
Which gives:
FORMULA;NEW_VAL
"#id1# = #id2# + #id3#";"name1 = name2 + name3"
"#test# = some val #id4#";"#test# = some val name4"

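The fallback in the rule above, nvl(my_ref_fn(word[0]), word[0]), keeps any token that has no translation. A minimal Python sketch of that idea, assuming a hypothetical dictionary ref in place of the my_ref table; unlike the MODEL query it matches the #id# tokens directly instead of splitting the formula on spaces:

```python
import re

# Hypothetical stand-in for the my_ref table: id1..id6 -> name1..name6
ref = {f"id{i}": f"name{i}" for i in range(1, 7)}


def translate(formula, ref):
    """Swap known #id# tokens for their names; leave unknown tokens untouched."""
    return re.sub(
        r"#([^#\s]+)#",
        lambda m: ref.get(m.group(1), m.group(0)),  # fall back to the original token
        formula,
    )


print(translate("#test# = some val #id4#", ref))
```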
Related

Use replace to update column value from another column

I have database that looks like this
CREATE TABLE code (
id SERIAL,
name VARCHAR(255) NOT NULL
);
INSERT INTO code (name) VALUES ('random_value1_random');
INSERT INTO code (name) VALUES ('random_value123_random');
CREATE TABLE value (
id SERIAL,
name VARCHAR(255) NOT NULL
);
INSERT INTO value (name) VALUES ('value1');
INSERT INTO value (name) VALUES ('value123');
UPDATE code SET name = REPLACE(name, SELECT name from value , '');
I want to update my table code to remove a portion of a code and that code is coming from another table. My goal is to update all values of code and remove the portion of the string that matches another value. My end goal is to make all code.name in the example look like: random_random removing the value from the value table.
When I tried to use REPLACE with a subquery, I got an error:
[21000] ERROR: more than one row returned by a subquery used as an expression
What is a cleaner better way to do this?
You can use REGEXP_REPLACE to replace multiple substrings in a string. You can use STRING_AGG to get the search pattern from the single search values.
UPDATE code SET name =
REGEXP_REPLACE( name,
(SELECT '(' || STRING_AGG(name, '|') || ')' from value),
''
);
This will leave you with 'random__random', not 'random_random'. If you only want to look for substrings separated by the underscore character, then use
UPDATE code SET name =
TRIM('_' FROM
REGEXP_REPLACE(name,
(SELECT '(' || STRING_AGG('_?' || name || '_?', '|') || ')' from value),
'_'
)
);
Demo: https://dbfiddle.uk/RrOel8Ns
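The pattern-building step can be sketched outside the database as well. The Python version below is only an illustration of the same idea; strip_values is a hypothetical helper, and the list of values stands in for the value table. Two differences from the PostgreSQL query are assumptions of this sketch: the values are escaped so that regex metacharacters in them are taken literally, and they are sorted longest first because Python tries alternatives from left to right.

```python
import re


def strip_values(name, values):
    """Remove any of the given values (with adjacent underscores) from name."""
    if not values:
        return name
    # Longest first, so 'value123' is tried before its prefix 'value1'
    ordered = sorted(values, key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in ordered)
    pattern = re.compile(rf"_?(?:{alternatives})_?")
    # Collapse each match to a single underscore, then trim stray ones at the ends
    return pattern.sub("_", name).strip("_")


print(strip_values("random_value123_random", ["value1", "value123"]))
```

Note that this replaces every occurrence, whereas regexp_replace without the 'g' flag replaces only the first one.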
This is T-SQL (I don't have Postgres) and it isn't elegant, but it works.
;with l as (
-- Match the longest value first
select c.id c_id, v.id v_id, ROW_NUMBER () over (partition by c.id order by len(v.name) desc) r
from code c
join value v on charindex (v.name, c.name) > 0)
, l1 as (
-- Select the longest value first
select c_id, v_id from l where r = 1
)
update c set name = REPLACE(c.name, v.name, '')
from l1
join code c on c.id = l1.c_id
join value v on v.id = l1.v_id

Removing all characters before a given special character [Oracle SQL]

my oracle table has a column with these data:
ROW_ID FILE_NAME
1 ZASWEFFT%Contract V1.pdf
2 ZZZZxxxx12%Contract 03.12.14.pdf
I need to remove everything before and including the % character, which would give me:
ROW_ID FILE_NAME
1 Contract V1.pdf
2 Contract 03.12.14.pdf
I found this similar question
I changed it to fit my need and the select statement works:
SELECT SUBSTR(value, INSTR(value, '%')+1) invalue
FROM (SELECT FILE_NAME value FROM SFDC.PROJECT_ATT);
result:
INVALUE
Contract V1.pdf
Contract 03.12.14.pdf
But I'm not able to transform this into an update statement. My last try was:
UPDATE SIEBEL.S_PROJ_ATT T1
SET T1.FILE_NAME =
(SELECT SUBSTR(value,
INSTR(value,
'%') + 1) invalue
FROM (SELECT T2.FILE_NAME value
FROM SIEBEL.S_PROJ_ATT T2
WHERE T1.ROW_ID = T2.ROW_ID))
Oracle says the syntax is rubbish: ORA-00904: "T1"."ID": invalid identifier
You made it too complicated.
CREATE TABLE S_PROJ_ATT (
ROW_ID INTEGER,
FILE_NAME VARCHAR(32)
);
INSERT INTO S_PROJ_ATT
(ROW_ID, FILE_NAME)
VALUES
('1', 'ZASWEFFT%Contract V1.pdf');
INSERT INTO S_PROJ_ATT
(ROW_ID, FILE_NAME)
VALUES
('2', 'ZZZZxxxx12%Contract 03.12.14.pdf');
UPDATE S_PROJ_ATT
SET FILE_NAME = SUBSTR(FILE_NAME, INSTR(FILE_NAME, '%')+1)
2 rows affected
SELECT * FROM S_PROJ_ATT
ROW_ID | FILE_NAME
-----: | :--------------------
1 | Contract V1.pdf
2 | Contract 03.12.14.pdf
You don't need the subqueries; you can just do:
update s_proj_att
set file_name = substr(file_name, instr(file_name, '%') + 1)
where instr(file_name, '%') > 0;
The where clause stops it from updating any file names that don't contain a % character.
The question you linked to is using a subquery (an inline view) to generate the value from a string literal. You don't need to do that here, as you already have the column value.
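The update can be tried without an Oracle instance. The sketch below runs the same statement against an in-memory SQLite database from Python; SQLite is an assumption made here for convenience (its substr and instr behave the same way for this case), and the third row is a made-up file name without a % to show the effect of the where clause.

```python
import sqlite3


def strip_prefix_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE s_proj_att (row_id INTEGER, file_name TEXT)")
    conn.executemany(
        "INSERT INTO s_proj_att (row_id, file_name) VALUES (?, ?)",
        [
            (1, "ZASWEFFT%Contract V1.pdf"),
            (2, "ZZZZxxxx12%Contract 03.12.14.pdf"),
            (3, "plain.pdf"),  # hypothetical row with no '%'
        ],
    )
    # Keep everything after the first '%'; skip rows that have none
    cur = conn.execute(
        "UPDATE s_proj_att "
        "SET file_name = substr(file_name, instr(file_name, '%') + 1) "
        "WHERE instr(file_name, '%') > 0"
    )
    updated = cur.rowcount
    rows = conn.execute(
        "SELECT row_id, file_name FROM s_proj_att ORDER BY row_id"
    ).fetchall()
    conn.close()
    return updated, rows


updated, rows = strip_prefix_demo()
print(updated, rows)
```

Without the where clause the third row would still end up unchanged, since instr returns 0 and substr then starts at position 1, but it would be counted as an updated row.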

Select query where a column contains set of numbers

I have to write a query on a table which has a varchar column. Values in this column may contain numbers as substrings.
Let's say the possible column values are
Data
-----------------------
abc=123/efg=143/ijk=163
abc=123/efg=153/ijk=173
Now I have to query the table for rows where data contains the numbers [123,143,163] but doesn't contain any other number.
How can I write this select query ?
This looks like a very bad database design. If you are interested in separate information stored in a string, then don't store the string but the separate information in separate columns. Change this if possible and such queries will become super simple.
However, for the time being it's easy to find the records as described, provided there are always three numbers in the string as in your sample data. Add a slash at the end of the string, so every number has a leading = and a trailing /. Then look up the numbers in the string with LIKE.
select *
from mytable
where data || '/' like '%=123/%'
and data || '/' like '%=143/%'
and data || '/' like '%=163/%';
If these three numbers are in the string, then all numbers match. Hence there is no other number not matching.
If there can be more numbers in the string but no duplicates, then count equal signs to determine how many numbers are in the string:
select *
from mytable
where data || '/' like '%=123/%'
and data || '/' like '%=143/%'
and data || '/' like '%=163/%'
and regexp_count(data, '=') = 3;
And here is a query accepting even duplicate numbers in the string:
select *
from mytable
where regexp_count(data, '=') >= 3
and regexp_count(data, '=') =
regexp_count(data || '/', '=123/') +
regexp_count(data || '/', '=143/') +
regexp_count(data || '/', '=163/');
Oracle Setup:
CREATE TABLE table_name ( data ) AS
SELECT 'abc=123/efg=143/ijk=163' FROM DUAL UNION ALL
SELECT 'abc=123/efg=153/ijk=173' FROM DUAL;
Then you can create some virtual columns to represent the data:
ALTER TABLE table_name ADD abc GENERATED ALWAYS AS (
TO_NUMBER( REGEXP_SUBSTR( data, '(^|/)abc=(\d+)(/|$)', 1, 1, NULL, 2 ) )
) VIRTUAL;
ALTER TABLE table_name ADD efg GENERATED ALWAYS AS (
TO_NUMBER( REGEXP_SUBSTR( data, '(^|/)efg=(\d+)(/|$)', 1, 1, NULL, 2 ) )
) VIRTUAL;
ALTER TABLE table_name ADD ijk GENERATED ALWAYS AS (
TO_NUMBER( REGEXP_SUBSTR( data, '(^|/)ijk=(\d+)(/|$)', 1, 1, NULL, 2 ) )
) VIRTUAL;
And can add appropriate indexes if you want:
CREATE INDEX table_name__abc_efg_ijk__idx ON table_name( abc, efg, ijk );
Query:
Then if you are only going to have those three keys you can do:
SELECT abc, efg, ijk
FROM table_name
WHERE abc = 123
AND efg = 143
AND ijk = 163;
However, if you could get more than three keys and want to ignore additional values then you could do:
CREATE TYPE intlist AS TABLE OF INT;
/
SELECT *
FROM table_name
WHERE INTLIST( 143, 123, 163 )
=
CAST(
MULTISET(
SELECT TO_NUMBER(
REGEXP_SUBSTR(
t.data,
'[^/=]+=(\d+)(/|$)',
1,
LEVEL,
NULL,
1
)
)
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT( t.data, '[^/=]+=(\d+)(/|$)' )
)
AS INTLIST
);
This has the added bonus that INTLIST(123, 143, 163) can be passed as a bind parameter (depending on the client program you are using and the Oracle driver) so that you can simply change how many and what numbers you want to filter for (and that the order of the values does not matter).
Also, if you want it to contain at least those values then you can change INTLIST( ... ) = to INTLIST( ... ) SUBMULTISET OF.
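The two comparisons, exact multiset equality and SUBMULTISET OF, can be illustrated outside Oracle with a small Python sketch. It is only an analogy: collections.Counter plays the role of the INTLIST collection, and numbers_in, has_exactly and has_at_least are hypothetical helper names.

```python
import re
from collections import Counter

# A key=value pair preceded by start-of-string or '/', followed by '/' or end
PAIR = re.compile(r"(?:^|/)[^/=]+=(\d+)(?=/|$)")


def numbers_in(data):
    """Multiset of all numeric values found in the string."""
    return Counter(int(n) for n in PAIR.findall(data))


def has_exactly(data, wanted):
    """True if the string holds exactly the wanted numbers, in any order."""
    return numbers_in(data) == Counter(wanted)


def has_at_least(data, wanted):
    """True if every wanted number occurs in the string (submultiset test)."""
    return not (Counter(wanted) - numbers_in(data))


print(has_exactly("abc=123/efg=143/ijk=163", [143, 123, 163]))
```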

extract lowest number before and after slashmark in sql

I have a large dataset of measurements, stored in a free text field. I need to extract the lowest systolic amount (value to the left of the slash) and the lowest diastolic amount (value to the right of the slash). I've included a sample dataset below... As you can see, some of the records, like #2, are simple, but others are complex. Is there any way to do this in SQL or Teradata? Any suggestions are appreciated!
record 1 lt 90/50 rt 128/88
record 2 left arm regular cuff 144/100
record 3 156/72;134/82
record 4 204/127, 189/122, 196/121
What's your TD release?
In TD14 there's a StrTok_Split_to_Table/StrTok function:
DROP TABLE tab;
CREATE TABLE tab (id INT, TEXT VARCHAR(200));
INSERT INTO tab(1,'lt 90/50 rt 128/88');
INSERT INTO tab(2,'left arm regular cuff 144/100');
INSERT INTO tab(3,'156/72;134/82');
INSERT INTO tab(4,'204/127, 189/122, 196/121');
SELECT id,
MIN(CAST(STRTOK(token,'/',1) AS INT)) AS systolic,
MIN(CAST(STRTOK(token,'/',2) AS INT)) AS diastolic
FROM
TABLE (STRTOK_SPLIT_TO_TABLE(tab.id, tab.TEXT, ' ,;')
RETURNS (id INT, tokennum INTEGER, token VARCHAR(256) CHARACTER SET UNICODE)
) AS dt
WHERE token LIKE '%/%'
GROUP BY 1
ORDER BY 1,2;
You have to check for any additional delimiters besides ' ,;', otherwise the CAST to INT will fail.
If your release is pre-TD14 the STRTOK_SPLIT_TO_TABLE can probably be replaced by a way more complicated recursive query. How many rows exist in that table and how large is the text column defined?
Edit:
This is a quick&dirty recursive query, assuming blood pressure for living people is never single or four digits :-)
WITH RECURSIVE cte
(id,
pos,
remaining,
token
) AS (
SELECT
id,
POSITION('/' IN TEXT) AS pos,
SUBSTRING(TEXT FROM pos +4) AS remaining,
TRIM(BOTH ',' FROM (TRIM(BOTH ';' FROM (TRIM(BOTH ' ' FROM (SUBSTRING(TEXT FROM pos -3 FOR 7))))))) AS token
FROM tab
WHERE POSITION('/' IN TEXT) > 0
UNION ALL
SELECT
id,
POSITION('/' IN remaining) AS newpos,
SUBSTRING(remaining FROM newpos +4) AS remaining,
TRIM(BOTH ',' FROM (TRIM(BOTH ';' FROM (TRIM(BOTH ' ' FROM (SUBSTRING(remaining FROM newpos -3 FOR 7))))))) AS token
FROM cte
WHERE POSITION('/' IN remaining) > 0
)
SELECT id,
MIN(CAST(SUBSTRING(token FROM 1 FOR POSITION('/' IN token) -1) AS INT)) AS systolic,
MIN(CAST(SUBSTRING(token FROM POSITION('/' IN token) +1) AS INT)) AS diastolic
FROM cte
GROUP BY 1
Btw, both queries will fail if there are no numbers before/after the slash
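If the text can be processed outside Teradata, the extraction reduces to finding every digits/digits pair and taking the minimum of each side. The following Python sketch is not part of either query above; lowest_pressures is a hypothetical helper, and it assumes that any digits/digits pair in the text is a reading (a date written with slashes would be misread).

```python
import re

READING = re.compile(r"(\d+)/(\d+)")  # systolic/diastolic


def lowest_pressures(text):
    """Return (lowest systolic, lowest diastolic), or None if no reading is found."""
    pairs = [(int(s), int(d)) for s, d in READING.findall(text)]
    if not pairs:
        return None
    return min(s for s, _ in pairs), min(d for _, d in pairs)


for record in [
    "lt 90/50 rt 128/88",
    "left arm regular cuff 144/100",
    "156/72;134/82",
    "204/127, 189/122, 196/121",
]:
    print(record, "->", lowest_pressures(record))
```

Returning None for rows without a reading avoids the failure case mentioned above.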

Select 2 columns in one and combine them

Is it possible to select 2 columns in just one and combine them?
Example:
select something + somethingElse as onlyOneColumn from someTable
(SELECT column1 as column FROM table )
UNION
(SELECT column2 as column FROM table )
Yes, just like you did:
select something + somethingElse as onlyOneColumn from someTable
If you queried the database, you would have gotten the right answer.
What happens is you ask for an expression. A very simple expression is just a column name, a more complicated expression can have formulas etc in it.
Yes,
SELECT CONCAT(field1, field2) AS WHOLENAME FROM TABLE
WHERE ...
will result in data set like:
WHOLENAME
field1field2
None of the other answers worked for me but this did:
SELECT CONCAT(Cust_First, ' ', Cust_Last) AS CustName FROM customer
Yes it's possible, as long as the datatypes are compatible. If they aren't, use a CONVERT() or CAST()
SELECT firstname + ' ' + lastname AS name FROM customers
The + operator should do the trick just fine. Keep something in mind though, if one of the columns is null or does not have any value, it will give you a NULL result. Instead, combine + with the function COALESCE and you'll be set.
Here is an example:
SELECT COALESCE(column1,'') + COALESCE(column2,'') FROM table1
For this example, if column1 is NULL, then the results of column2 will show up, instead of a simple NULL.
Hope this helps!
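The NULL behaviour is easy to check in a scratch database. The sketch below uses SQLite from Python, which is an assumption made for convenience: SQLite concatenates with || rather than +, but it propagates NULL in the same way as described above. (Oracle's || is different and treats NULL as an empty string.) The customers table and its rows are made up for the example.

```python
import sqlite3


def concat_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (firstname TEXT, lastname TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("Ann", "Lee"), ("Bob", None)],  # Bob has a NULL lastname
    )
    rows = conn.execute(
        "SELECT firstname || ' ' || lastname, "
        "       COALESCE(firstname, '') || ' ' || COALESCE(lastname, '') "
        "FROM customers ORDER BY firstname"
    ).fetchall()
    conn.close()
    return rows


for plain, safe in concat_demo():
    print(repr(plain), repr(safe))
```

For the row with the NULL lastname, the plain concatenation comes back as NULL while the COALESCE version still returns the first name.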
To complete the answer of @Pete Carter, I would add an "ALL" on the UNION (if you need to keep the duplicate entries).
(SELECT column1 as column FROM table )
UNION ALL
(SELECT column2 as column FROM table )
DROP TABLE IF EXISTS #9
CREATE TABLE #9
(
USER1 int
,USER2 int
)
INSERT INTO #9
VALUES(1, 2), (1, 3), (1, 4), (2, 3)
------------------------------------------------
(SELECT USER1 AS 'column' from #9)
UNION ALL
(SELECT USER2 AS 'column' from #9)
Would then return : Result
Yes, you can combine columns easily enough such as concatenating character data:
select col1 || col2 as bothcols from tbl ...
or adding (for example) numeric data:
select col1 + col2 as bothcols from tbl ...
In both those cases, you end up with a single column bothcols, which contains the combined data. You may have to coerce the data type if the columns are not compatible.
If one of the columns is a number, in my experience Oracle treats '+' as the addition operator rather than concatenation.
For example:
select (id + name) as one from table1; (id is numeric)
throws an invalid number exception.
In that case you can use the || operator, which performs concatenation:
select (id || name) as one from table1;
Your syntax should work; maybe add a space between the columns, like
SELECT something + ' ' + somethingElse as onlyOneColumn FROM someTable
I hope this answer helps:
SELECT (CAST(id AS NVARCHAR)+','+name) AS COMBINED_COLUMN FROM TABLENAME;
select column1 || ' ' || column2 as whole_name FROM tablename;
Here || is the concatenation operator, which joins the values into a single column, and the ' ' between the two || operators adds a space between the two columns.
SELECT firstname || ' ' || lastname FROM users;