PostgreSQL: subtract a comma-separated string in one column from another column - SQL

The format is like:

col1                col2
V1,V2,V3,V4,V5,V6   V4,V1,V6
V1,V2,V3            V2,V3
I want to create another column, col3, which contains the result of subtracting col2 from col1.
What I have tried:
UPDATE myTable
SET col3 = replace(col1, col2, '');
It works for rows like row 2, where the values of col2 appear in col1 contiguously and in the same order, but for rows like row 1 replace() does nothing (see the demonstration below).
I was wondering if there's a clean way to achieve the same goal for rows like row 1.
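A quick demonstration of the problem (a sketch, using the sample rows above):

-- row 2: col2 occurs contiguously and in order, so replace() mostly works
select replace('V1,V2,V3', 'V2,V3', '');              -- 'V1,' (leftover comma aside)
-- row 1: col2's values are scattered through col1, so nothing matches
select replace('V1,V2,V3,V4,V5,V6', 'V4,V1,V6', '');  -- returned unchanged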
So the desired output would be:

col1                col2       col3
V1,V2,V3,V4,V5,V6   V4,V1,V6   V2,V3,V5
V1,V2,V3            V2,V3      V1
Any suggestions would be appreciated!

Split the values into rows, subtract the sets, and then assemble the result back into a string. All of this can be done in a single expression that defines a new query column.
with t (col1, col2) as (values
  ('V1,V2,V3,V4,V5,V6', 'V4,V1,V6'),
  ('V1,V2,V3', 'V2,V3')
)
select col1, col2
     , (select string_agg(v, ',')
        from (
          select v from unnest(string_to_array(t.col1, ',')) as a1(v)
          except
          select v from unnest(string_to_array(t.col2, ',')) as a2(v)
        ) x
       ) as col3
from t;
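For the two sample rows this should yield col3 = 'V2,V3,V5' and 'V1'. Note that EXCEPT does not guarantee the element order, so add an order by inside string_agg() (as the next answer does) if the order matters.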

You will have to unnest the elements, apply an EXCEPT on the "unnested" rows, and aggregate the result back:
select col1,
       col2,
       (select string_agg(item, ',' order by item)
        from (
          select *
          from string_to_table(col1, ',') as c1(item)
          except
          select *
          from string_to_table(col2, ',') as c2(item)
        ) t)
from the_table;
I wouldn't store that result in a separate column, but if you really want to introduce even more problems by storing yet another comma-separated list:
update the_table
set col3 = (select string_agg(item, ',' order by item)
            from (
              select *
              from string_to_table(col1, ',') as c1(item)
              except
              select *
              from string_to_table(col2, ',') as c2(item)
            ) t);
string_to_table() requires Postgres 14 or newer. If you are using an older version, you need to use unnest(string_to_array(col1, ',')) instead.
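For example, the subquery from the UPDATE rewritten for Postgres 13 and older (a sketch; only the unnesting function changes, the logic stays the same):

select string_agg(item, ',' order by item)
from (
  select *
  from unnest(string_to_array(col1, ',')) as c1(item)
  except
  select *
  from unnest(string_to_array(col2, ',')) as c2(item)
) t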
If you need that a lot, consider creating a function:
create function remove_items(p_one text, p_other text)
  returns text
as
$$
  select string_agg(item, ',' order by item)
  from (
    select *
    from string_to_table(p_one, ',') as c1(item)
    except
    select *
    from string_to_table(p_other, ',') as c2(item)
  ) t;
$$
language sql
immutable;
Then the above can be simplified to:
select col1, col2, remove_items(col1, col2)
from the_table;
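And if you do decide to store it, the update from above then becomes (same caveat about storing another comma-separated list):

update the_table
set col3 = remove_items(col1, col2);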

Note, PostgreSQL is not my forte, but I thought I'd have a go at it. Try:
SELECT col1, col2,
       RTRIM(REGEXP_REPLACE(col1, CONCAT('\m(?:', REPLACE(col2, ',', '|'), ')\M,?'), '', 'g'), ',') AS col3
FROM myTable
The idea is to use a regular expression to replace all the values, based on the following pattern:
\m - Word-boundary at start of word;
(?:V4|V1|V6) - A non-capture group that holds the alternatives from col2;
\M - Word-boundary at end of word;
,? - Optional comma.
When the matches are replaced with nothing, we need to clean up a possible trailing comma with RTRIM(); the example below traces this on row 1.
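A minimal trace of row 1, with the pattern that CONCAT() builds from col2 = 'V4,V1,V6' written out literally:

SELECT RTRIM(REGEXP_REPLACE('V1,V2,V3,V4,V5,V6', '\m(?:V4|V1|V6)\M,?', '', 'g'), ',');
-- 'V1,', 'V4,' and 'V6' are removed, leaving 'V2,V3,V5,'
-- RTRIM() then strips the trailing comma: 'V2,V3,V5'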

Related

How do I select columns based on a string pattern in BigQuery

I have a table in BigQuery with hundreds of columns, and it just happens that I want to select all of them except for those that begin with an underscore. I know how to select the columns beginning with an underscore using the INFORMATION_SCHEMA.COLUMNS table, but I can't figure out how to use that query to select the columns I want. I know BigQuery has EXCEPT, but I want to avoid writing out each column that begins with an underscore, and I can't seem to pass it a subquery or even something like a._*.
Consider the approach below:
execute immediate (select '''
select * except(''' || string_agg(col) || ''') from your_table
'''
from (
  select col
  from (select * from your_table limit 1) t,
    -- serialize the row to JSON and strip braces and quotes -> col1:1,_c:2,...
    unnest([struct(translate(to_json_string(t), '{}"', '') as kvs)]),
    -- split the result into key:value pairs
    unnest(split(kvs)) kv,
    -- keep just the column name (the part before the colon)
    unnest([struct(split(kv, ':')[offset(0)] as col)])
  where starts_with(col, '_')
));
Applied to a table that contains, for example, columns named _c and _e, it generates the statement below
select * except(_c,_e) from your_table
and the query then returns every column except those starting with an underscore.

Concatenate all columns, with the column names included, into one string for every row

CREATE TABLE myTable
(
  COL1 int,
  COL2 varchar(10),
  COL3 float
);

INSERT INTO myTable
VALUES (1, 'c2r1', NULL), (2, 'c2r2', 2.335);
I want an output with, for every row of the table, one string containing all columns and their names.
Something like:
COL1=1|COL2=c2r1|COL3=NULL
COL1=2|COL2=c2r2|COL3=2.335
I have a table with a lot of columns, so it has to be dynamic (I would use it on different tables as well). Is there an easy solution where I can choose the separator and things like that? It has to deal with NULL values and numeric values as well.
I am using SQL Server 2019.
Since you are on 2019: string_agg() with a bit of JSON.
Example
Select NewVal
From MyTable A
Cross Apply ( Select NewVal = string_agg([key] + '=' + isnull(value, 'null'), '|')
              From OpenJson((Select A.* For JSON Path, Without_Array_Wrapper, INCLUDE_NULL_VALUES))
            ) B
Results
NewVal
COL1=1|COL2=c2r1|COL3=null
COL1=2|COL2=c2r2|COL3=2.335000000000000e+000 -- Don't like the float
EDIT to Trap FLOATs
Select NewVal
From MyTable A
Cross Apply ( Select NewVal = string_agg([key] + '=' + isnull(case when value like '%0e+0%'
                                                                   then concat('', convert(decimal(15,3), convert(float, value)))
                                                                   else value
                                                              end, 'null'), '|')
              From OpenJson((Select A.* For JSON Path, Without_Array_Wrapper, INCLUDE_NULL_VALUES))
            ) B
Results
NewVal
COL1=1|COL2=c2r1|COL3=null
COL1=2|COL2=c2r2|COL3=2.335
Would one dare to abuse json for this?
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(ca.js, '":', '='), ',"', '|'), '"', ''), '[{', ''), '}]', '') AS data
FROM (SELECT col1 AS id FROM myTable) AS list
CROSS APPLY
(
  SELECT t.col1
       , t.col2
       , cast(t.col3 as decimal(16,3)) as col3
  FROM myTable t
  WHERE t.col1 = list.id
  FOR JSON AUTO, INCLUDE_NULL_VALUES
) ca(js)
It'll work with a simple SELECT t.* in the cross apply as well, but the floats tend to be a bit too long then (see the sketch below).
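That simpler variant, sketched for completeness (identical cleanup, no per-column cast, so floats keep their raw JSON representation):

SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(ca.js, '":', '='), ',"', '|'), '"', ''), '[{', ''), '}]', '') AS data
FROM (SELECT col1 AS id FROM myTable) AS list
CROSS APPLY
(
  SELECT t.*
  FROM myTable t
  WHERE t.col1 = list.id
  FOR JSON AUTO, INCLUDE_NULL_VALUES
) ca(js)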

Remove Substring according to specific pattern

I need to remove a substring from a SQL Server database column, according to a pattern:
Before: Winter_QZ6P91712017_115BPM
After: Winter_115BPM
Or
Before: cpx_Note In My Calendar_QZ6P91707044
After: cpx_Note In My Calendar
Basically: delete the substring that matches the pattern _ + 12 chars.
I've tried PatIndex('_\S{12}', myCol) to get the index of the substring, but it doesn't match anything.
Assuming you mean an underscore followed by 12 characters that are not underscores, you can use this pattern:
SELECT *,
       CASE WHEN PATINDEX('%[_][^_][^_][^_][^_][^_][^_][^_][^_][^_][^_][^_][^_]%', str) > 0
            THEN STUFF(str, PATINDEX('%[_][^_][^_][^_][^_][^_][^_][^_][^_][^_][^_][^_][^_]%', str), 13, '')
            ELSE str
       END
FROM (VALUES
  ('Winter_QZ6P91712017_115BPM'),
  ('Winter_115BPM_QZ6P91712017')
) AS tests(str)
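Both test rows should come back as 'Winter_115BPM': in the second row the first underscore is followed by '115BPM_...', which fails the twelve [^_] checks, so the pattern matches at the second underscore instead.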
Late to the party, but you could also use the STRING_SPLIT function to explode the string on underscores and check the length of each segment between underscores. If a segment's length is >= 12, it must be removed from the original string via REPLACE, applied recursively.
drop table if exists Tbl;
drop table if exists #temptable;

create table Tbl (input nvarchar(max));

insert into Tbl VALUES
('Winter_QZ6P91712017_115BPM'),
('cpx_Note In My Calendar_QZ6P91707044'),
('stuff_asdasd_QZ6P91712017'),
('stuff_asdasd_QZ6P91712017_stuff_asdasd_QZ6P91712017'),
('stuff_asdasd_QZ6P917120117_stuff_asdasd_QZ6P91712017');

select input,
       value as replacethisstring,
       rn = row_number() over (partition by input order by (select 1))
into #temptable
from (
  select input, value as hyphensplit
  from Tbl
  cross apply string_split(input, '_')
) T
cross apply string_split(hyphensplit, ' ')
where len(value) >= 12;

;with cte as (
  select input, inputtrans = replace(input, replacethisstring, ''), level = 1
  from #temptable
  where rn = 1
  union all
  select T.input, inputtrans = replace(cte.inputtrans, T.replacethisstring, ''), level = level + 1
  from cte
  inner join #temptable T on T.input = cte.input and rn = level + 1
)
select input, inputtrans
from (
  select *, rn = row_number() over (partition by input order by level desc)
  from cte
) T
where rn = 1;
SQL Server doesn't support regex. Considering, however, that you just want to remove the first '_' and the 12 characters after it, you could use CHARINDEX to find the location of said underscore, and then STUFF to remove the 13 characters:
SELECT V.YourString,
       STUFF(V.YourString, CHARINDEX('_', V.YourString), 13, '') AS NewString
FROM (VALUES('Winter_QZ6P91712017_115BPM')) V(YourString);
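For the sample value, CHARINDEX finds the underscore at position 7, so STUFF removes the 13 characters '_QZ6P91712017' and returns 'Winter_115BPM'. Note that this assumes the substring to remove always follows the first underscore; it would mangle the second example from the question ('cpx_Note In My Calendar_QZ6P91707044').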

Presto: insert values into a column of array<struct<pos:int, date:string>>

I have a column col2 which is of type
array<struct<pos:int, date:string>>
I need to check whether the column is empty, insert values into it if so, and then unnest the values in the column:
case WHEN CARDINALITY(col2) = 0 THEN ARRAY[(0,'value1'),(0,'value2')] else col2 end as col2
Below is the SQL:
WITH CTE AS (
  SELECT
    col1,
    case
      WHEN CARDINALITY(col2) = 0 THEN ARRAY[(0,'value1'),(0,'value2')]
      else col2
    end as col2
  FROM table1
)
SELECT
  col1,
  column2.value1 AS pos,
  column2.value2 AS date
FROM CTE
CROSS JOIN UNNEST(col2) AS t(column2)
Because the case expression returns [{field1=1, field2=2020-03-01}, {field1=1, field2=2020-01-09}], I am not able to unpack it as value1 and value2, and the above expression throws an error.
Can anyone help me fix this?
When the elements of an array are of type row, UNNEST expands them into separate columns. You need to adjust the UNNEST clause to reflect this.
Here's an example (tested with Trino 351, formerly known as Presto SQL):
WITH
  data(entries) AS (VALUES
    ARRAY[],
    ARRAY[(1,'x'),(2,'y')]
  ),
  cte(entries) AS (
    SELECT if(cardinality(entries) = 0, ARRAY[(0,'value1'),(0,'value2')], entries)
    FROM data
  )
SELECT pos, date
FROM cte
CROSS JOIN UNNEST(entries) AS t(pos, date)
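For the sample data (the first row's empty array is replaced with the default values), this should produce something like:

 pos | date
-----+--------
   0 | value1
   0 | value2
   1 | x
   2 | y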

Double IN Statements in SQL

Just curious about the IN statement in SQL.
I know I can search multiple columns for one value by doing
'val1' IN (col1, col2)
and I can search a column for multiple values:
col1 IN ('val1', 'val2')
But is there a way to do both of these simultaneously, without resorting to a repeating AND/OR in the SQL? I am looking to do this in the most scalable way, independent of how many values/columns I need to search.
So essentially:
('val1','val2') IN (col1, col2)
but valid.
You could do something like this (which I've also put on SQLFiddle):
-- Test data:
WITH t(col1, col2) AS (
  SELECT 'val1', 'valX' UNION ALL
  SELECT 'valY', 'valZ'
)
-- Solution:
SELECT *
FROM t
WHERE EXISTS (
  SELECT 1
  -- Join all columns with all values to see if any column matches any value
  FROM (VALUES(t.col1), (t.col2)) t1(col)
  JOIN (VALUES('val1'), ('val2')) t2(val)
    ON col = val
)
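For this test data, only the first row ('val1', 'valX') is returned, because its col1 matches 'val1'.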
Of course, one could argue about which version is more concise.
Yes, for example you can do this in Oracle:
select x, y
from (select 1 as x, 2 as y from dual)
where (x, y) in (select 1 as p, 2 as q from dual)
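The same idea also works with literal tuples instead of a subquery; a sketch (valid in Oracle and PostgreSQL, with the caveat that tuple matching is positional, not "any value in any column"):

select x, y
from (select 1 as x, 2 as y from dual)
where (x, y) in ((1, 2), (3, 4))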