Remove items in a delimited list that are non numeric in SQL for Redshift - sql

I am working with a field called codes that is a delimited list of values, separated by commas. Within each item there is a title ending in a colon and then a code number following the colon. I want a list of only the code numbers after each colon.
Example Value:
name-form-na-stage0:3278648990379886572,rules-na-unwanted-sdfle2:6886328308933282817,us-disdg-order-stage1:1273671130817907765
Desired Output:
3278648990379886572,6886328308933282817,1273671130817907765
The title does always start with a letter and the end with a colon so I can see how REGEXP_REPLACE might work to replace any string between starting with a letter and ending with a colon with '' might work but I am not good at REGEXP_REPLACE patterns. Chat GPT is down fml.
Side note, if anyone knows of a good guide for understanding pattern notation for regular expressions it would be much appreciated!
I tried this and it is not working REGEXP_REPLACE(REPLACE(REPLACE(codes,':', ' '), ',', ' ') ,' [^0-9]+ ', ' ')

This solution assumes a few things:
No colons anywhere else except immediately before the numbers
No number at the very start
At a high level, this query finds how many colons there are, splits the entire string into that many parts, and then only keeps the number up to the comma immediately after the number, and then aggregates the numbers into a comma-delimited list.
Assuming a table like this:
create temp table tbl_string (id int, strval varchar(1000));
insert into tbl_string
values
(1, 'name-form-na-stage0:3278648990379886572,rules-na-unwanted-sdfle2:6886328308933282817,us-disdg-order-stage1:1273671130817907765');
with recursive cte_num_of_delims AS (
select max(regexp_count(strval, ':')) AS num_of_delims
from tbl_string
), cte_nums(nums) AS (
select 1 as nums
union all
select nums + 1
from cte_nums
where nums <= (select num_of_delims from cte_num_of_delims)
), cte_strings_nums_combined as (
select id,
strval,
nums as index
from cte_nums
cross join tbl_string
), prefinal as (
select *,
split_part(strval, ':', index) as parsed_vals
from cte_strings_nums_combined
where parsed_vals != ''
and index != 1
), final as (
select *,
case
when charindex(',', parsed_vals) = 0
then parsed_vals
else left(parsed_vals, charindex(',', parsed_vals) - 1)
end as final_vals
from prefinal
)
select listagg(final_vals, ',')
from final

Related

Split strings in a column based on text values and numerical values such as patindex

I have a column that displays stock market options data like below:
GME240119C00020000
QQQ240119C00305000
NFLX240119P00455000
I want to be able to split these up so they show up like:
GME|240119|C|00020000
QQQ|240119|C|00305000
NFLX|240119|P|00455000
I was able to split the first portion with the ticker name by using the code below, but I don't know how to split the rest of the strings.
case patindex('%[0-9]%', str)
when 0 then str
else left(str, patindex('%[0-9]%', str) -1 )
end
from t
edit: for anyone who is wondering, I used Dale's solution below to get my desired outcome. I edited the query he provided to make the parts show up as individual columns
select
substring(T.contractSymbol,1,C1.Position-1) as a
,substring(T.contractSymbol,C1.Position,6) as b
,substring(S1.Part,1,1) as c
,substring(S1.Part,2,len(S1.Part)) as d
from Options_Data_All T
cross apply (
values (patindex('%[0-9]%', T.contractSymbol))
) C1 (Position)
cross apply (
values (substring(contractSymbol, C1.Position+6, len(T.contractSymbol)))
) S1 (Part);
Just keep doing what you started doing by using SUBSTRING. So as you did find the first number and actually in your case, based on the data provided, everything else is fixed length, so you don't have to search anymore, just split the string.
declare #Test table (Contents nvarchar(max));
insert into #Test (Contents)
values
('GME240119C00020000'),
('QQQ240119C00305000'),
('NFLX240119P00455000');
select
substring(T.Contents,1,C1.Position-1) + '|' + substring(T.Contents,C1.Position,6) + '|' + substring(S1.Part,1,1) + '|' + substring(S1.Part,2,len(S1.Part))
from #Test T
cross apply (
values (patindex('%[0-9]%', T.Contents))
) C1 (Position)
cross apply (
values (substring(Contents, C1.Position+6, len(T.Contents)))
) S1 (Part);
Returns:
Data
GME|240119|C|00020000
QQQ|240119|C|00305000
NFLX|240119|P|00455000
If one can assume that all but the first column are fixed width then a simple SUBSTRING solution would suffice e.g.
select
substring(Contents,1,len(Contents)-15)
+ '|' + substring(Contents,len(Contents)-14,6)
+ '|' + substring(Contents,len(Contents)-8,1)
+ '|' + substring(Contents,len(Contents)-7,8) [Data]
from #Test;
Note: CROSS APPLY is just a fancy way to use a sub-query to avoid needing to repeat a calculation.

Using string methods in a SELECT query to select up to the second space?

In an MS-Access database I'm working with, one of the tables has a field called "Name". The format of this field will generally be along the lines of "firstname surname integer", but sometimes may just be "firstname surname".
I need to select just the first name and the surname from the name field.
I've looked at using the Left function
SELECT DISTINCT LEFT([Name], x)
However since names are different lengths, this isn't going to work since there is no constant integer to use as the second parameter. Nor can it be used with
SELECT DISTINCT LEFT(InStr([Name], " "), x)
for the above reason, but also because because that would split the field at the first space.
Is there a way using LEFT, TRIM, SPLIT or any other string manipulation that I can create a query to select just the first two parts of the name? I need the space included.
You can try this.
SELECT DISTINCT IIf( ( InStr( InStr([Name],' ') + 1 , [Name], ' ') > 0 ), Left( [Name], InStr(InStr([Name],' ') + 1 , [Name], ' ') ), [Name])
FROM MyTable;

How to remove specific value from comma separated string in oracle

I want remove specific value from comma separated sting using oracle.
Sample Input -
col
1,2,3,4,5
Suppose i want to remove 3 from the string.
Sample Output -
col
1,2,4,5
Please suggest how i can do this using oracle query.
Thanks.
Here is a solution that uses only standard string functions (rather than regular expressions) - which should result in faster execution in most cases; it removes 3 only when it is the first character followed by comma, the last character preceded by comma, or preceded and followed by comma, and it removes the comma that precedes it in the middle case and it removes the comma that follows it in the first and third case.
It is able to remove two 3's in a row (which some of the other solutions offered are not able to do) while leaving in place consecutive commas (which presumably stand in for NULL) and do not disturb numbers like 38 or 123.
The strategy is to first double up every comma (replace , with ,,) and append and prepend a comma (to the beginning and the end of the string). Then remove every occurrence of ,3,. From what is left, replace every ,, back with a single , and finally remove the leading and trailing ,.
with
test_data ( str ) as (
select '1,2,3,4,5' from dual union all
select '1,2,3,3,4,4,5' from dual union all
select '12,34,5' from dual union all
select '1,,,3,3,3,4' from dual
)
select str,
trim(both ',' from
replace( replace(',' || replace(str, ',', ',,') || ',', ',3,'), ',,', ',')
) as new_str
from test_data
;
STR NEW_STR
------------- ----------
1,2,3,4,5 1,2,4,5
1,2,3,3,4,4,5 1,2,4,4,5
12,34,5 12,34,5
1,,,3,3,3,4 1,,,4
4 rows selected.
Note As pointed out by MT0 (see Comments below), this will trim too much if the original string begins or ends with commas. To cover that case, instead of wrapping everything within trim(both ',' from ...) I should wrap the rest within a subquery, and use something like substr(new_str, 2, length(new_str) - 2) in the outer query.
Here is one method:
select trim(both ',' from replace(',' || '1,2,3,4,5' || ',', ',' || '3' || ',', ','))
That said, storing comma-delimited strings is a really, really bad idea. There is almost no reason to do such a thing. Oracle supports JSON, XML, and nested tables -- all of which are better alternatives.
The need to remove an element suggests a poor data design.
You can convert the list rows using an XMLTABLE, filter to remove the unwanted rows and then re-aggregate them:
SELECT LISTAGG( x.value.getStringVal(), ',' ) WITHIN GROUP ( ORDER BY idx )
FROM XMLTABLE(
( '1,2,3,4,5' )
COLUMNS value XMLTYPE PATH '.',
idx FOR ORDINALITY
) x
WHERE x.value.getStringVal() != 3;
For a simple filter this is probably not worth it and you should use something like (based on #mathguy's solution):
SELECT SUBSTR( new_list, 2, LENGTH( new_list ) - 2 ) AS new_list
FROM (
SELECT REPLACE(
REPLACE(
',' || REPLACE( :list, ',', ',,' ) || ',',
',' || :value_to_replace || ','
),
',,',
','
) AS new_list
FROM DUAL
)
However, if the filtering is more complicated then it might be worth converting the list to rows, filtering and re-aggregating.
I do not knwo how to do this in Oracle, but with SQL-Server I'd use a trick:
convert the list to XML by replacing the comma with tags
use XQuery to filter the data
reconcatenate
This is SQL Server syntax but might point you the direction:
declare #s varchar(100)='1,2,2,3,3,4';
declare #exclude int=3;
WITH Casted AS
(
SELECT CAST('<x>' + REPLACE(#s,',','</x><x>') + '</x>' AS XML) AS TheXml
)
SELECT x.value('.','int')
FROM Casted
CROSS APPLY TheXml.nodes('/x[text()!=sql:variable("#exclude")]') AS A(x)
UPDATE
I just found this answer which seems to show pretty well how to start...
I agree with Gordon regarding the fact that storing comma delimited data in a column is a really bad idea.
I just preceed the csv with a ',', then use the replace function followed by a left trim function to clean-up the preceeding ','.
SCOTT#tst>VAR b_number varchar2(5);
SCOTT#tst>EXEC :b_number:= '3';
PL/SQL procedure successfully completed.
SCOTT#tst>WITH srce AS (
2 SELECT
3 ',' || '3,1,2,3,3,4,5,3' col
4 FROM
5 dual
6 ) SELECT
7 ltrim(replace(col,',' ||:b_number),',') col
8 FROM
9 srce;
COL
1,2,4,5

How to Extract the middle characters from a string without an specific index to start and end it before the space character is read?

I have a column that contains data like:
AB-123 XYZ
ABCD-456 AAA
BCD-789 BBB
ZZZ-963
Y-85
and this is what i need from those string:
123
456
789
963
85
I need the characters from the left after the dash('-') character, then ends before the space character is read.
Thank You guys.
Note: Original tag on this question was Oracle and this answer is based on that tag. Now that, tag is updated to SqlServer, this answer is no longer valid, if somebody looking for Oracle solution, this may help.
Use regular expression to arrive at sub string.
select trim(substr(regexp_substr('ABCD-456 AAA','-[0-9]+ '),2)) from dual
'-[0-9]+ ' will grab any string pattern which starts with dash has one or more digits and ends with a ' ' and returns number with dash
substr will remove '-' from above output
trim will remove any trailing ' '
Check This.
Using Substring and PatIndex.
select
SUBSTRING(colnm, PATINDEX('%[0-9]%',colnm),
PATINDEX('%[^0-9]%',ltrim(RIGHT(colnm,LEN(colnm)-CHARINDEX('-',colnm)))))
from
(
select 'AB-123 XYZ' colnm union
select 'ABCD-456 AAA' union
select 'BCD-789 BBB' union
select 'ZX- 23 BBB'
)a
OutPut :
Try this
http://rextester.com/YTBPQD69134
CREATE TABLE Table1 ([col] varchar(12));
INSERT INTO Table1
([col])
VALUES
('AB-123 XYZ'),
('ABCD-456 AAA'),
('BCD-789 BBB');
select substring
(col,
charindex('-',col,1)+1,
charindex(' ',col,1)-charindex('-',col,1)
) from table1;
Assume all values has '-' and followed by a space ' '. Below solution will not tolerant to exception case:
SELECT
*,
SUBSTRING(Value, StartingIndex, Length) AS Result
FROM
-- You can replace this section of code with your table name
(VALUES
('AB-123 XYZ'),
('ABCD-456 AAA'),
('BCD-789 BBB')
) t(Value)
-- Use APPLY instead of sub-query is for debugging,
-- you can view the actual parameters in the select
CROSS APPLY
(
SELECT
-- Get the first index of character '-'
CHARINDEX('-', Value) + 1 AS StartingIndex,
-- Get the first index of character ' ', then calculate the length
CHARINDEX(' ', Value) - CHARINDEX('-', Value) - 1 AS Length
) b

Teradata : Sum up values in a column

Problem Statement
Example is shown in below image :
The last 2 rows have the patterns like "1.283 2 3" in a single cell. The numbers are seperated by space in the column. We need to add those nos and represent in the format given in Output.
So, the cell having "1.283 2 3" must be converted to 6.283
Challenges facing :
The column values are in string format.
Add nos after casting them into integer
Donot want to take data in UNIX box and manipulate the same.
In TD14 there would be a built-in table UDF named STRTOK_SPLIT_TO_TABLE, before you need to implement your own UDF or use a recursive query.
I modified an existing string splitting script to use blanks as delimiter:
CREATE VOLATILE TABLE Strings
(
groupcol INT NOT NULL,
string VARCHAR(991) NOT NULL
) ON COMMIT PRESERVE ROWS;
INSERT INTO Strings VALUES (1,'71.792');
INSERT INTO Strings VALUES (2,'71.792 1 2');
INSERT INTO Strings VALUES (3,'1.283 2 3');
WITH RECURSIVE cte
(groupcol,
--string,
len,
remaining,
word,
pos
) AS (
SELECT
GroupCol,
--String,
POSITION(' ' IN String || ' ') - 1 AS len,
TRIM(LEADING FROM SUBSTRING(String || ' ' FROM len + 2)) AS remaining,
TRIM(SUBSTRING(String FROM 1 FOR len)) AS word,
1
FROM strings
UNION ALL
SELECT
GroupCol,
--String,
POSITION(' ' IN remaining)- 1 AS len_new,
TRIM(LEADING FROM SUBSTRING(remaining FROM len_new + 2)),
TRIM(SUBSTRING(remaining FROM 1 FOR len_new)),
pos + 1
FROM cte
WHERE remaining <> ''
)
SELECT
groupcol,
-- remove the NULLIF to get 0 for blank strings
SUM(CAST(NULLIF(word, '') AS DECIMAL(18,3)))
FROM cte
GROUP BY 1
This might use a lot of spool, hopefully you're not running that on a large table.