BigQuery function to remove words in string by lookup table


Given a string, I want to create a function that removes any word or phrase in the string that exists in a lookup table.
For example, given the string 's1 s2 s3 s4 s5 s6' and this lookup table:
word
-----
s2
s4 s5
Expected result:
select fn.remove_substring('s1 s2 s3 s4 s5 s6')
-- Expected output: 's1 s3 s6'
I have a working function in PostgreSQL, but I am not sure how to rewrite it in BigQuery, because BigQuery UDFs do not allow cursors or loops.
CREATE OR REPLACE FUNCTION fn.remove_substring(s text)
RETURNS text
LANGUAGE plpgsql
AS $function$
declare
replaced_string text := s;
t_cur cursor for
select word from public.lookup order by word desc;
begin
for row in t_cur loop
replaced_string := regexp_replace(replaced_string, '\y'||row.word||'\y', '', 'gi');
end loop;
return replaced_string;
end;
$function$
;
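For readers who want to check the expected output outside the database, here is a minimal Python sketch of what the PL/pgSQL loop above does. It assumes the lookup table is a plain list of strings, uses `\b` as the Python counterpart of PostgreSQL's `\y` word boundary, and adds a final whitespace-collapsing step that is not in the original function (without it the result keeps double spaces). The name `remove_substring` only mirrors the SQL function.

```python
import re


def remove_substring(s: str, lookup: list) -> str:
    """Remove every lookup word/phrase from s, mimicking the PL/pgSQL cursor loop."""
    replaced = s
    # Same iteration order as "ORDER BY word DESC" in the cursor.
    for word in sorted(lookup, reverse=True):
        # \b plays the role of PostgreSQL's \y; re.escape treats the word literally.
        # re.IGNORECASE matches the 'i' flag; re.sub replaces all matches like 'g'.
        pattern = r"\b" + re.escape(word) + r"\b"
        replaced = re.sub(pattern, "", replaced, flags=re.IGNORECASE)
    # Extra step (assumption, not in the original): collapse leftover whitespace.
    return " ".join(replaced.split())


print(remove_substring("s1 s2 s3 s4 s5 s6", ["s2", "s4 s5"]))  # s1 s3 s6
```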

You might consider the query below.
WITH sample_table AS (
SELECT 's1 s2 S3 S4 s5 s6' str
),
lookup_table AS (
SELECT 's2' word UNION ALL
SELECT 's4 s5'
)
SELECT str,
REGEXP_REPLACE(
str, (SELECT '(?i)(' || STRING_AGG(word, '|' ORDER BY LENGTH(word) DESC) || ')' FROM lookup_table), ''
) AS removed_str
FROM sample_table;
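Why order STRING_AGG by LENGTH(word) DESC? Regex alternation tries the alternatives from left to right and takes the first one that matches at a position, so a short entry that is a prefix of a longer one would win. The Python sketch below rebuilds the same pattern; it assumes Python's `re` behaves like BigQuery's RE2 for this simple alternation (both use leftmost-first matching by default), and the lookup values `s4` and `s4 s5` are made up to show the difference.

```python
import re


def build_pattern(words: list, longest_first: bool = True) -> str:
    """Mimic '(?i)(' || STRING_AGG(word, '|' ORDER BY LENGTH(word) DESC) || ')'."""
    ordered = sorted(words, key=len, reverse=True) if longest_first else list(words)
    return "(?i)(" + "|".join(re.escape(w) for w in ordered) + ")"


def remove_words(text: str, words: list, longest_first: bool = True) -> str:
    # REGEXP_REPLACE replaces every non-overlapping match, like re.sub.
    return re.sub(build_pattern(words, longest_first), "", text)


lookup = ["s4", "s4 s5"]  # hypothetical lookup where one entry is a prefix of another
print(repr(remove_words("s3 s4 s5 s6", lookup, longest_first=False)))  # 's3  s5 s6'
print(repr(remove_words("s3 s4 s5 s6", lookup)))                       # 's3  s6'
```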
Implemented as a UDF:
CREATE TEMP TABLE lookup_table AS
SELECT 's2' word UNION ALL
SELECT 's4 s5'
;
CREATE TEMP FUNCTION remove_substring(str STRING) AS (
REGEXP_REPLACE(
str, (SELECT '(?i)(' || STRING_AGG(word, '|' ORDER BY LENGTH(word) DESC) || ')' FROM lookup_table), ''
)
);
SELECT remove_substring('s1 s2 s3 s4 s5 s6');

This uses the same approach as @jaytiger's answer. However, you can also build the regular expression by running the SELECT STRING_AGG only once; that way, if your lookup table is large, you don't have to execute the same query for each row.
Example:
declare regex String default '';
create temp table main AS (
select 's1 s2 s3 s4 s5 s6' str
);
create temp table lookup_table AS (
select 's2' word union all
select 's4' union all
select 's5'
);
set regex = ( select string_agg(word, '|' order by length(word) desc) from lookup_table ) ;
select regexp_replace(str, regex, '') as new_str from main;
N.B. The above query is case-sensitive; modify it to suit your requirements.
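The benefit of the scripted version is that the pattern is assembled once and then reused for every row. In Python the same idea is compiling the regex once; the sketch below also covers the case-sensitivity note, using `re.IGNORECASE` as the counterpart of prefixing the pattern with `(?i)`. The sample rows and the `case_insensitive` parameter are illustrative assumptions.

```python
import re


def make_remover(lookup: list, case_insensitive: bool = False):
    """Build the alternation once (like SET regex = ...) and return a reusable function."""
    alternation = "|".join(re.escape(w) for w in sorted(lookup, key=len, reverse=True))
    flags = re.IGNORECASE if case_insensitive else 0
    compiled = re.compile(alternation, flags)  # compiled a single time, reused per row
    return lambda s: compiled.sub("", s)


rows = ["s1 s2 s3 s4 s5 s6", "S2 S4 s7"]
remove = make_remover(["s2", "s4", "s5"])
print([remove(r) for r in rows])  # case-sensitive: the uppercase S2/S4 survive
```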

Related

How can I get value with key in JSON_DATAGUIDE (dynamically)

JSON_DATAGUIDE gives me only the keys (paths such as "$.a"), not the values. How can I get key-value pairs, as in the example below?
select json_dataguide('{a:100, b:200, c:300}')
from dual;
JSON_DATAGUIDE('{A:100,B:200,C:300}')
--------------------------------------------------------------------------------
[{"o:path":"$.a","type":"number","o:length":4},{"o:path":"$.b","type":"number","o:length":4},{"o:path":"$.c","type":"number","o:length":4}]
I need a result like this, as a table:
Key  Value
a    100
b    200
c    300
I want to do this without DECLARE, BEGIN, etc., using only built-in functions such as JSON_TABLE and JSON_DATAGUIDE. I don't want to declare a function or anything similar.

An alternative would be a PL/SQL block:
declare
j JSON_OBJECT_T;
i NUMBER;
k JSON_KEY_LIST;
CURSOR c_json IS
SELECT '{a:100, b:200, c:300}' as myJsonCol from dual;
begin
FOR rec IN c_json
LOOP
j := JSON_OBJECT_T.parse(rec.myJsonCol);
k := j.get_keys;
dbms_output.put_line('KEY VAL');
FOR i in 1..k.COUNT
LOOP
dbms_output.put_line(k(i) || ' ' || j.get_Number(k(i)));
END LOOP;
END LOOP;
END;
/
You can then return the result in a SYS_REFCURSOR if you want, or even create a table function.
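The PL/SQL block parses the document once, asks for its key list (get_keys) and looks up each value by key. A Python sketch of the same loop, assuming strict JSON with quoted keys (Python's `json` module will not parse the lax `{a:100, ...}` form used in the question); `key_value_rows` is an illustrative name.

```python
import json


def key_value_rows(doc: str) -> list:
    """Return (key, value) pairs, like looping over JSON_OBJECT_T.get_keys."""
    obj = json.loads(doc)              # ~ JSON_OBJECT_T.parse
    keys = list(obj.keys())            # ~ get_keys, in document order
    return [(k, obj[k]) for k in keys]  # ~ get_Number(k(i))


for key, val in key_value_rows('{"a":100, "b":200, "c":300}'):
    print(key, val)
```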

In Oracle 12c and later, you can include functions in the subquery factoring (WITH) clause of a SELECT statement. Then you can use:
WITH FUNCTION get_key(
pos IN PLS_INTEGER,
json IN CLOB
) RETURN VARCHAR2
AS
doc_keys JSON_KEY_LIST;
BEGIN
doc_keys := JSON_OBJECT_T.PARSE ( json ).GET_KEYS;
RETURN doc_keys( pos );
END get_key;
SELECT get_key( j.pos, t.value ) AS key,
j.value
FROM table_name t
CROSS APPLY JSON_TABLE(
t.value,
'$.*'
COLUMNS (
pos FOR ORDINALITY,
value PATH '$'
)
) j;
Which, for the sample data:
CREATE TABLE table_name ( value VARCHAR2(4000) CHECK (value IS JSON) );
INSERT INTO table_name (value) VALUES ('{a:100, b:200, c:300}');
Outputs:
KEY  VALUE
a    100
b    200
c    300
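The WITH FUNCTION answer pairs two lists by position: JSON_TABLE over '$.*' yields the values numbered by FOR ORDINALITY, and get_key(pos) returns the key at the same position. That assumes both enumerate the members in the same order, which holds for this sample. A Python sketch of the positional pairing (1-based, like ORDINALITY; `get_key` here is an illustrative stand-in, not the PL/SQL function itself):

```python
import json


def get_key(pos: int, doc: str) -> str:
    """Return the pos-th key (1-based), like the get_key helper in the WITH clause."""
    return list(json.loads(doc).keys())[pos - 1]


def key_value_by_position(doc: str) -> list:
    values = list(json.loads(doc).values())  # ~ JSON_TABLE(t.value, '$.*' ...)
    # enumerate(start=1) plays the role of "pos FOR ORDINALITY"
    return [(get_key(pos, doc), value) for pos, value in enumerate(values, start=1)]


print(key_value_by_position('{"a":100, "b":200, "c":300}'))
```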
Only Built-in function for example json_table,json_dataguide
You are going to struggle with those limitations, because:
JSON_QUERY only allows literal values for the path; you cannot pass a dynamic path value.
JSON_TABLE does appear to allow dynamic paths in the COLUMNS clause but does not return a value for those dynamic paths.
For example:
SELECT t.value AS json,
SUBSTR(p.path, 3) AS key,
JSON_QUERY(t.value, p.path) AS value
FROM table_name t
CROSS JOIN LATERAL(
SELECT JSON_DATAGUIDE(t.value) AS data
FROM DUAL
) d
CROSS JOIN LATERAL(
SELECT path
FROM JSON_TABLE(
d.data,
'$[*]'
COLUMNS(
path VARCHAR2(20) PATH '$."o:path"'
)
)
) p;
Outputs:
ORA-40454: path expression not a literal
and:
SELECT t.value AS json,
SUBSTR(p.path, 3) AS key,
v.val AS value
FROM table_name t
CROSS JOIN LATERAL(
SELECT JSON_DATAGUIDE(t.value) AS data
FROM DUAL
) d
CROSS JOIN LATERAL(
SELECT path
FROM JSON_TABLE(
d.data,
'$[*]'
COLUMNS(
path VARCHAR2(20) PATH '$."o:path"'
)
)
) p
CROSS JOIN LATERAL(
SELECT val
FROM JSON_TABLE(
t.value,
'$'
COLUMNS(
val VARCHAR2(20) PATH p.path
)
)
) v;
Outputs:
JSON                           KEY  VALUE
{"a":100, "b":200, "c":300}    a    <null>
{"a":100, "b":200, "c":300}    b    <null>
{"a":100, "b":200, "c":300}    c    <null>
Although the query runs, it does not retrieve the values dynamically. (Note: the query would work if you used a literal path instead of a dynamic path.)
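Conceptually, the two queries above try to turn each dataguide path such as "$.a" into a key (SUBSTR(p.path, 3)) and then use that path to fetch the value. Outside SQL, where a path is just a runtime value, this is straightforward; the Python sketch below assumes a dataguide shaped like the one shown in the question and handles only top-level paths of the form "$.key".

```python
import json


def values_from_dataguide(doc: str, dataguide: str) -> list:
    obj = json.loads(doc)
    rows = []
    for entry in json.loads(dataguide):
        path = entry["o:path"]            # e.g. "$.a"
        key = path[2:]                    # same as SUBSTR(p.path, 3)
        rows.append((key, obj.get(key)))  # runtime ("dynamic") lookup by key
    return rows


doc = '{"a":100, "b":200, "c":300}'
guide = ('[{"o:path":"$.a","type":"number","o:length":4},'
         '{"o:path":"$.b","type":"number","o:length":4},'
         '{"o:path":"$.c","type":"number","o:length":4}]')
print(values_from_dataguide(doc, guide))
```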

Oracle function to compare strings in a not ordered way

I need a function in Oracle that compares two strings without considering the order of their characters.
For example, "asd" and "sad" should be considered equal.
Is there a built-in function for this, or do I need to write my own?

This can be done with a simple Java function that sorts the characters of a string alphabetically:
CREATE AND COMPILE JAVA SOURCE NAMED SORTSTRING AS
public class SortString {
public static String sort( final String value )
{
final char[] chars = value.toCharArray();
java.util.Arrays.sort( chars );
return new String( chars );
}
};
/
You can then create a PL/SQL function to invoke it:
CREATE FUNCTION SORTSTRING( in_value IN VARCHAR2 ) RETURN VARCHAR2
AS LANGUAGE JAVA NAME 'SortString.sort( java.lang.String ) return java.lang.String';
/
Then you can do a simple comparison on the sorted strings:
SELECT CASE
WHEN SORTSTRING( 'ads' ) = SORTSTRING( 'das' )
THEN 'Equal'
ELSE 'Not Equal'
END
FROM DUAL;
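The Java helper simply sorts the characters and compares the sorted results. The same idea in Python, handy for checking expected results: `Arrays.sort` on a `char[]` orders by UTF-16 code unit, so uppercase letters sort before lowercase ones and the comparison is case-sensitive; Python's `sorted()` on a `str` behaves the same way for characters in the Basic Multilingual Plane. The function names are illustrative.

```python
def sort_string(value: str) -> str:
    """Sort the characters of value, like SortString.sort in the Java source."""
    return "".join(sorted(value))


def same_letters(a: str, b: str) -> bool:
    return sort_string(a) == sort_string(b)


print(same_letters("ads", "das"))   # True
print(same_letters("asd", "sadx"))  # False
```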

Not exactly rocket science, but it works (kind of, at least in simple cases).
What does it do? It sorts the letters of each string alphabetically and compares the results.
SQL> with test (col1, col2) as
  (select 'asd', 'sad' from dual),
inter as
  (select
     col1, regexp_substr(col1, '[^.]', 1, level) c1,
     col2, regexp_substr(col2, '[^.]', 1, level) c2
   from test
   connect by level <= greatest(length(col1), length(col2))
  ),
agg as
  (select listagg(c1, '') within group (order by c1) col1_new,
          listagg(c2, '') within group (order by c2) col2_new
   from inter
  )
select case when col1_new = col2_new then 'Equal'
            else 'Different'
       end result
From agg;
RESULT
---------
Equal
SQL> with test (col1, col2) as
  (select 'asd', 'sadx' from dual),
<snip>
RESULT
---------
Different
SQL>

Yet another solution, using the SUBSTR function and a CONNECT BY loop.
Query 1:
WITH a
AS (SELECT ROWNUM rn, a1.*
FROM ( SELECT SUBSTR ('2asd', LEVEL, 1) s1
FROM DUAL
CONNECT BY LEVEL <= LENGTH ('2asd')
ORDER BY s1) a1),
b
AS (SELECT ROWNUM rn, a2.*
FROM ( SELECT SUBSTR ('asd2', LEVEL, 1) s2
FROM DUAL
CONNECT BY LEVEL <= LENGTH ('asd2')
ORDER BY s2) a2)
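-- FULL OUTER JOIN so that extra characters in a longer string count as mismatches
SELECT CASE COUNT (CASE WHEN a.s1 = b.s2 THEN NULL ELSE 1 END)
          WHEN 0 THEN 'EQUAL' ELSE 'NOT EQUAL' END
          res
  FROM a FULL OUTER JOIN b ON a.rn = b.rn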
Results:
| RES |
|-------|
| EQUAL |
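The query numbers the sorted characters of each string with ROWNUM and joins them on that number. If that join is an inner join, extra characters in the longer string have no partner and are silently dropped, so 'asd' and 'asdx' would compare as EQUAL; pairing with a full outer join (or comparing the lengths first) avoids that. A small Python sketch contrasting the two pairings, where `zip` behaves like the inner join on rn and `itertools.zip_longest` like a full outer join:

```python
from itertools import zip_longest


def equal_by_inner_join(a: str, b: str) -> bool:
    # zip stops at the shorter string, like INNER JOIN ... ON a.rn = b.rn
    return all(x == y for x, y in zip(sorted(a), sorted(b)))


def equal_by_full_join(a: str, b: str) -> bool:
    # zip_longest pads with None, like FULL OUTER JOIN; a missing partner is a mismatch
    return all(x == y for x, y in zip_longest(sorted(a), sorted(b)))


print(equal_by_inner_join("asd", "asdx"))  # True (wrong)
print(equal_by_full_join("asd", "asdx"))   # False
```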
EDIT: A PL/SQL sort function for alphanumeric strings:
CREATE OR replace FUNCTION fn_sort(str VARCHAR2)
RETURN VARCHAR2 DETERMINISTIC AS
v_s VARCHAR2(4000);
BEGIN
SELECT LISTAGG(substr(str, LEVEL, 1), '')
within GROUP ( ORDER BY substr(str, LEVEL, 1) )
INTO v_s
FROM dual
CONNECT BY LEVEL <= length(str);
RETURN v_s;
END;
/
select fn_sort('shSdf3213Js') as s
from dual;
| S |
|-------------|
| 1233JSdfhss |

If you want to write your own sort function, you can use the code below:
CREATE OR REPLACE FUNCTION sort_text (p_text_to_sort VARCHAR2) RETURN VARCHAR2
IS
v_sorted_text VARCHAR2(1000);
BEGIN
v_sorted_text := p_text_to_sort;
FOR i IN 1..LENGTH(p_text_to_sort)
LOOP
FOR j IN 1..LENGTH(p_text_to_sort)
LOOP
IF SUBSTR(v_sorted_text, j, 1)||'' > SUBSTR(v_sorted_text, j+1, 1)||'' THEN
v_sorted_text := SUBSTR(v_sorted_text, 1, j-1)||
SUBSTR(v_sorted_text, j+1, 1)||
SUBSTR(v_sorted_text, j, 1)||
SUBSTR(v_sorted_text, j+2);
END IF;
END LOOP;
END LOOP;
RETURN v_sorted_text;
END;
/
SELECT SORT_TEXT('zlkdsadfsdfasdf') SORTED_TEXT
FROM dual;
SORTED_TEXT
---------------
aaddddfffklsssz
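sort_text is a bubble sort over characters: each inner pass swaps adjacent characters that are out of order, and LENGTH(p_text_to_sort) passes are enough to finish. A Python transcription of the same loop structure, using 0-based indexes instead of SUBSTR's 1-based positions:

```python
def sort_text(text: str) -> str:
    chars = list(text)
    n = len(chars)
    for _ in range(n):            # outer FOR i IN 1..LENGTH(p_text_to_sort)
        for j in range(n - 1):    # inner loop; the last character has no right neighbour
            if chars[j] > chars[j + 1]:
                # swap the adjacent pair, like the SUBSTR splice in the PL/SQL
                chars[j], chars[j + 1] = chars[j + 1], chars[j]
    return "".join(chars)


print(sort_text("zlkdsadfsdfasdf"))  # aaddddfffklsssz
```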

PostgreSQL: Get values of a register as multiple rows

Using PostgreSQL 9.3, I'm creating a JasperReports template to make a PDF report. I want to create reports of different tables, with multiple columns, all with the same template. One solution could be to get the values of each register (row) as pairs of column name and value per id.
For example, if I had a table like:
id | Column1 | Column2 | Column3
-------------------------------------------------
1 | Register1C1 | Register1C2 | Register1C3
I would like to get that row as:
Id | ColumnName | Value
-----------------------------
1 | Column1 | Register1C1
1 | Column2 | Register1C2
1 | Column3 | Register1C3
The data types of the value columns can vary.
Is it possible? How can I do this?

If all your columns share the same data type and order of rows does not have to be enforced:
SELECT t.id, v.*
FROM tbl t, LATERAL (
VALUES
('col1', col1)
, ('col2', col2)
, ('col3', col3)
-- etc.
) v(col, val);
About LATERAL (requires Postgres 9.3 or later):
What is the difference between LATERAL and a subquery in PostgreSQL?
Combining it with a VALUES expression:
Crosstab transpose query request
SELECT DISTINCT on multiple columns
For varying data types, the common denominator would be text, since every type can be cast to text. Plus, order enforced:
SELECT t.id, v.col, v.val
FROM tbl t, LATERAL (
VALUES
(1, 'col1', col1::text)
, (2, 'col2', col2::text)
, (3, 'col3', col3::text)
-- etc.
) v(rank, col, val)
ORDER BY t.id, v.rank;
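The LATERAL VALUES trick emits one output row per listed column, with the rank column fixing the order. A Python sketch of that unpivot, assuming each table row is a dict and casting every value to text as the ::text casts do (a NULL stays NULL, shown as None); the table and column names are the question's sample.

```python
def unpivot(rows: list, columns: list, id_col: str = "id") -> list:
    """Return (id, column name, value as text), ordered by id and column rank."""
    out = []
    for row in rows:
        for rank, col in enumerate(columns, start=1):
            value = row[col]
            text = None if value is None else str(value)  # like col::text
            out.append((row[id_col], rank, col, text))
    out.sort(key=lambda r: (r[0], r[1]))  # ORDER BY t.id, v.rank
    return [(i, col, val) for i, _, col, val in out]


table = [{"id": 1, "Column1": "Register1C1", "Column2": "Register1C2", "Column3": "Register1C3"}]
for r in unpivot(table, ["Column1", "Column2", "Column3"]):
    print(r)
```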
In Postgres 9.4 or later, use the new unnest() with multiple arrays:
SELECT t.id, v.*
FROM tbl t, unnest('{col1,col2,col3}'::text[]
, ARRAY[col1,col2,col3]) v(col, val);
-- , ARRAY[col1::text,col2::text,col3::text]) v(col, val);
The commented-out line is the alternative for varying data types.
Full automation for Postgres 9.4:
The query above is convenient to automate for a dynamic set of columns:
CREATE OR REPLACE FUNCTION f_transpose (_tbl regclass, VARIADIC _cols text[])
RETURNS TABLE (id int, col text, val text) AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
'SELECT t.id, v.* FROM %s t, unnest($1, ARRAY[%s]) v'
, _tbl, array_to_string(_cols, '::text,') || '::text')
-- , _tbl, array_to_string(_cols, ','))  -- simple alternative for only text
USING _cols;
END
$func$ LANGUAGE plpgsql;
Call it with a table name and any number of column names, of any data types:
SELECT * FROM f_transpose('table_name', 'column1', 'column2', 'column3');
Weakness: the list of column names is not safe against SQL injection. You could gather column names from pg_attribute instead. Example:
How to perform the same aggregation on every column, without listing the columns?
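To see what f_transpose actually executes, here is a Python sketch that assembles the same string: array_to_string(_cols, '::text,') || '::text' appends a cast to every column, and %s inserts the names unquoted, which is the injection weakness mentioned above. The `quote_ident` helper is a hypothetical stand-in for PostgreSQL's quote_ident() (or format's %I): it wraps a name in double quotes and doubles any embedded quote, but unlike the real function it quotes every name, even ones that would not need it.

```python
def quote_ident(name: str) -> str:
    # Hypothetical helper: always double-quote, doubling embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def transpose_sql(table: str, cols: list, quote: bool = False) -> str:
    names = [quote_ident(c) for c in cols] if quote else list(cols)
    # Same as array_to_string(_cols, '::text,') || '::text'
    cast_list = "::text,".join(names) + "::text"
    return "SELECT t.id, v.* FROM %s t, unnest($1, ARRAY[%s]) v" % (table, cast_list)


print(transpose_sql("table_name", ["column1", "column2", "column3"]))
```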

SELECT id
,unnest(string_to_array('col1,col2,col3', ',')) col_name
,unnest(string_to_array(col1 || ',' || col2 || ',' || col3, ',')) val
FROM t
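That query joins the column values with commas and splits both the name list and the value string, relying on the two unnest() calls advancing together element by element. A Python sketch of the same pairing; it also shows a catch worth checking against your data: a value that itself contains a comma shifts the pairing. `zip_longest` is used because PostgreSQL 10+ pads the shorter set-returning function with NULLs when the counts differ.

```python
from itertools import zip_longest


def pair_by_split(names_csv: str, values: list) -> list:
    names = names_csv.split(",")         # string_to_array('col1,col2,col3', ',')
    vals = ",".join(values).split(",")   # col1 || ',' || col2 || ',' || col3, then split
    return list(zip_longest(names, vals))  # the two unnest() calls advance together


print(pair_by_split("col1,col2,col3", ["x", "y", "z"]))
print(pair_by_split("col1,col2,col3", ["x", "y,1", "z"]))  # comma inside a value
```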
Try the following method:
My sample table is named t. To get the names of its n columns, you can use this query:
select string_agg(column_name,',') cols from information_schema.columns where
table_name='t' and column_name<>'id'
This query selects all columns in your table except the id column. If you want to specify the schema name, add table_schema = 'your_schema_name' to the WHERE clause.
To create the SELECT query dynamically:
SELECT 'select id,unnest(string_to_array(''' || cols || ''','','')) col_name,unnest(string_to_array(' || cols1 || ','','')) val from t'
FROM (
SELECT string_agg(column_name, ',') cols -- here we'll get all the columns in table t
,string_agg(column_name, '||'',''||') cols1
FROM information_schema.columns
WHERE table_name = 't'
AND column_name <> 'id'
) tb;
The following PL/pgSQL function dynamically creates SELECT id, unnest(string_to_array('....')) col_name, unnest(string_to_array(.., ',')) val FROM t and executes it:
CREATE OR replace FUNCTION fn ()
RETURNS TABLE (
id INT
,columname TEXT
,columnvalues TEXT
) AS $$
DECLARE qry TEXT;
BEGIN
SELECT 'select id,unnest(string_to_array(''' || cols || ''','','')) col_name,unnest(string_to_array(' || cols1 || ','','')) val from t'
INTO qry
FROM (
SELECT string_agg(column_name, ',') cols
,string_agg(column_name, '||'',''||') cols1
FROM information_schema.columns
WHERE table_name = 't'
AND column_name <> 'id'
) tb;
RETURN QUERY
EXECUTE format(qry);
END;$$
LANGUAGE plpgsql;
Call the function like this: select * from fn();

using Oracle SQL - regexp_substr to split a record

I need to split the value of column CMD.NUM_MAI, which may contain ',' or ';' as separators.
I tried this, but it gave me an error:
SELECT REGEXP_SUBSTR (expression.num_mai,
'[^;|,]+',
1,
LEVEL)
FROM (SELECT CMD.num_cmd,
(SELECT COMM.com
FROM COMM
WHERE COMM.cod_soc = CMD.cod_soc AND COMM.cod_com = 'URL_DSD')
AS cod_url,
NVL (CONTACT.nom_cta, TIERS.nom_ct1) AS nom_cta,
NVL (CONTACT.num_mai, TIERS.num_mai) AS num_mai,
NVL (CONTACT.num_tel, TIERS.num_tel) AS num_tel,
TO_CHAR (SYSDATE, 'hh24:MI') AS heur_today
FROM CMD, TIERS, CONTACT
WHERE ( (CMD.cod_soc = :CMD_cod_soc)
AND (CMD.cod_eta = :CMD.cod_eta)
AND (CMD.typ_cmd = :CMD.typ_cmd)
AND (CMD.num_cmd = :CMD.num_cmd))
AND (TIERS.cod_soc(+) = CMD.cod_soc)
AND (TIERS.cod_trs(+) = CMD.cod_trs_tra)
AND (TIERS.cod_soc = CONTACT.cod_soc(+))
AND (TIERS.cod_trs = CONTACT.cod_trs(+))
AND (CONTACT.lib_cta(+) = 'EDITION')) experssion
CONNECT BY REGEXP_SUBSTR (expression.num_mai,'[^;|,]+',1,LEVEL)

Error 1:
The CONNECT BY clause needs a condition, but REGEXP_SUBSTR(...) on its own is just an expression. You have to compare it with something, i.e. specify both the left- and right-hand operands.
Try something like:
CONNECT BY REGEXP_SUBSTR (expression.num_mai,'[^;|,]+',1,LEVEL) IS NOT NULL
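The CONNECT BY ... IS NOT NULL pattern keeps generating LEVEL = 1, 2, 3, ... and stops at the first level whose n-th match of '[^;|,]+' is NULL. A Python sketch of that loop; `nth_match` is an illustrative stand-in for REGEXP_SUBSTR with an occurrence argument, and the e-mail addresses are made up. Note that | inside the brackets is a literal character, so a pipe is treated as a separator too.

```python
import re


def nth_match(s: str, pattern: str, occurrence: int):
    """Like REGEXP_SUBSTR(s, pattern, 1, occurrence): the n-th match, or None."""
    matches = re.findall(pattern, s)
    return matches[occurrence - 1] if occurrence <= len(matches) else None


def split_like_connect_by(s: str, pattern: str = r"[^;|,]+") -> list:
    tokens, level = [], 1
    while (token := nth_match(s, pattern, level)) is not None:  # "... IS NOT NULL"
        tokens.append(token)
        level += 1
    return tokens


print(split_like_connect_by("a@x.com;b@y.com,c@z.com"))
```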
Error 2:
Your bind variable names look wrong. For example, :CMD.cod_eta should probably be :CMD_cod_eta, like :CMD_cod_soc in the same WHERE clause.
Perhaps you meant this:
( (CMD.cod_soc = :CMD_cod_soc)
AND (CMD.cod_eta = :CMD_cod_eta)
AND (CMD.typ_cmd = :CMD_typ_cmd)
AND (CMD.num_cmd = :CMD_num_cmd))

This is a common question. I'd put the logic into a function and then call it as needed:
CREATE OR REPLACE function fn_split(i_string in varchar2, i_delimiter in varchar2 default ',', b_dedup_tokens in number default 0)
return sys.dbms_debug_vc2coll
as
l_tab sys.dbms_debug_vc2coll;
begin
select regexp_substr(i_string,'[^' || i_delimiter || ']+', 1, level)
bulk collect into l_tab
from dual
connect by regexp_substr(i_string, '[^' || i_delimiter || ']+', 1, level) is not null
order by level;
if (b_dedup_tokens > 0) then
return l_tab multiset union distinct l_tab;
end if;
return l_tab;
end;
/
This returns sys.dbms_debug_vc2coll, a preinstalled collection type of VARCHAR2(1000) owned by SYS (or you could create your own type, perhaps using VARCHAR2(4000)). Here is an example using it, with space, comma, and semicolon as delimiters:
with test_data as (
select 1 as id, 'A;test;test;string' as test_string from dual
union
select 2 as id, 'Another string' as test_string from dual
union
select 3 as id,'A,CSV,string' as test_string from dual
)
select d.*, column_value as token
from test_data d, table(fn_split(test_string, ' ,;', 0));
Output:
ID TEST_STRING TOKEN
1 A;test;test;string A
1 A;test;test;string test
1 A;test;test;string test
1 A;test;test;string string
2 Another string Another
2 Another string string
3 A,CSV,string A
3 A,CSV,string CSV
3 A,CSV,string string
Pass 1 instead of 0 as the third argument of fn_split to deduplicate the tokens (such as the repeated "test" token above).
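fn_split builds a character class from the delimiter string, so each character of ' ,;' acts as a separator, and MULTISET UNION DISTINCT removes duplicates. A rough Python equivalent; it keeps the first occurrence of each token when deduplicating, whereas Oracle does not promise a particular order for MULTISET UNION DISTINCT, so treat the order as an assumption. Unlike the PL/SQL version, it escapes the delimiters before putting them in the class.

```python
import re


def fn_split(s: str, delimiters: str = ",", dedup: bool = False) -> list:
    # '[^' || i_delimiter || ']+' : runs of characters that are not delimiters
    pattern = "[^" + re.escape(delimiters) + "]+"
    tokens = re.findall(pattern, s)
    if dedup:
        tokens = list(dict.fromkeys(tokens))  # drop repeats, keep first occurrence
    return tokens


print(fn_split("A;test;test;string", " ,;"))
print(fn_split("A;test;test;string", " ,;", dedup=True))
```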

Oracle - pl sql selecting from SYS_REFCURSOR

I have a function that returns a SYS_REFCURSOR with a single row but multiple columns. I want to write a SQL query with nested subqueries that use the column values returned in the SYS_REFCURSOR. Alternative ideas, such as types, would be appreciated. The code below was written on the fly and hasn't been checked for syntax.
--Oracle function
CREATE DummyFunction(dummyValue AS NUMBER) RETURN SYS_REFCURSOR
IS
RETURN_DATA SYS_REFCURSOR;
BEGIN
OPEN RETURN_DATA
SELECT
TO_CHAR(dummyValue) || 'A' AS ColumnA
,TO_CHAR(dummyValue) || 'B' AS ColumnB
FROM
DUAL;
RETURN RETURN_DATA;
END;
--sample query with sub-queries; does not work
SELECT
SELECT ColumnA FROM DummyFunction(1) FROM DUAL AS ColumnA
,SELECT ColumnB FROM DummyFunction(1) FROM DUAL AS ColumnB
FROM
DUAL;

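A SYS_REFCURSOR won't work for the intended use; you need to create Oracle types: an object type for the row and a collection (nested table) type that TABLE() can query:
CREATE TYPE your_type AS OBJECT (
ColumnA VARCHAR2(100),
ColumnB VARCHAR2(100)
);
/
CREATE TYPE your_type_tab AS TABLE OF your_type;
/
Update your function:
CREATE OR REPLACE FUNCTION DummyFunction(dummyValue IN NUMBER)
RETURN your_type_tab
IS
l_result your_type_tab;
BEGIN
-- Build one your_type instance per row and collect them into the nested table
SELECT your_type(TO_CHAR(dummyValue) || 'A',
TO_CHAR(dummyValue) || 'B')
BULK COLLECT INTO l_result
FROM DUAL;
RETURN l_result;
END;
/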
Then you can use:
SELECT (SELECT ColumnA FROM table(DummyFunction(1))) AS ColumnA,
(SELECT ColumnB FROM table(DummyFunction(1))) AS ColumnB
FROM DUAL
The example is overcomplicated; all you need is:
SELECT x.columna,
x.columnb
FROM table(DummyFunction(1)) x
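The reason this works is that TABLE(...) lets a function that returns a collection be queried like the rows of a table, which a ref cursor cannot be used for directly in a FROM clause. As a loose Python analogy only (`YourType` and `dummy_function` mirror the Oracle names; nothing here is Oracle API):

```python
from dataclasses import dataclass


@dataclass
class YourType:
    column_a: str
    column_b: str


def dummy_function(dummy_value: int) -> list:
    # Returns a collection of rows, like a function returning a TABLE OF your_type
    return [YourType(f"{dummy_value}A", f"{dummy_value}B")]


# SELECT x.columna, x.columnb FROM table(DummyFunction(1)) x
rows = [(x.column_a, x.column_b) for x in dummy_function(1)]
print(rows)
```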
