Split strings in a column based on text values and numerical values such as patindex - sql

I have a column that displays stock market options data like below:
GME240119C00020000
QQQ240119C00305000
NFLX240119P00455000
I want to be able to split these up so they show up like:
GME|240119|C|00020000
QQQ|240119|C|00305000
NFLX|240119|P|00455000
I was able to split the first portion with the ticker name by using the code below, but I don't know how to split the rest of the strings.
select case patindex('%[0-9]%', str)
when 0 then str
else left(str, patindex('%[0-9]%', str) - 1)
end
from t
Edit: for anyone who is wondering, I used Dale's solution below to get my desired outcome. I edited the query he provided to make the parts show up as individual columns:
select
substring(T.contractSymbol,1,C1.Position-1) as a
,substring(T.contractSymbol,C1.Position,6) as b
,substring(S1.Part,1,1) as c
,substring(S1.Part,2,len(S1.Part)) as d
from Options_Data_All T
cross apply (
values (patindex('%[0-9]%', T.contractSymbol))
) C1 (Position)
cross apply (
values (substring(contractSymbol, C1.Position+6, len(T.contractSymbol)))
) S1 (Part);

Keep doing what you started, using SUBSTRING. Once you have found the first digit, everything after it is fixed length (based on the data provided), so there is nothing more to search for; just split the string at fixed offsets.
declare @Test table (Contents nvarchar(max));
insert into @Test (Contents)
values
('GME240119C00020000'),
('QQQ240119C00305000'),
('NFLX240119P00455000');
select
substring(T.Contents,1,C1.Position-1) + '|' + substring(T.Contents,C1.Position,6) + '|' + substring(S1.Part,1,1) + '|' + substring(S1.Part,2,len(S1.Part)) as [Data]
from @Test T
-- C1.Position: index of the first digit, i.e. where the ticker ends
cross apply (
values (patindex('%[0-9]%', T.Contents))
) C1 (Position)
-- S1.Part: everything after the 6-digit date
cross apply (
values (substring(T.Contents, C1.Position+6, len(T.Contents)))
) S1 (Part);
Returns:
Data
GME|240119|C|00020000
QQQ|240119|C|00305000
NFLX|240119|P|00455000
If you can assume that all but the first column are fixed width, a simple SUBSTRING solution counting from the end of the string suffices, e.g.:
select
substring(Contents,1,len(Contents)-15)
+ '|' + substring(Contents,len(Contents)-14,6)
+ '|' + substring(Contents,len(Contents)-8,1)
+ '|' + substring(Contents,len(Contents)-7,8) [Data]
from @Test;
Note: CROSS APPLY is just a fancy way to use a sub-query to avoid needing to repeat a calculation.
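To sanity-check the expected output outside the database, the same splitting logic can be sketched in Python. This is only an illustration, not part of the answer: the function name split_option_symbol is hypothetical, and the sketch assumes, as the answer does, that the ticker contains no digits and is followed by a 6-digit date, a 1-character type and the strike.

```python
import re

def split_option_symbol(symbol: str) -> str:
    """Return ticker|date|type|strike for a symbol like GME240119C00020000."""
    match = re.search(r"[0-9]", symbol)    # same role as patindex('%[0-9]%', ...)
    if match is None:
        return symbol                       # no digit found: nothing to split
    pos = match.start()                     # 0-based index of the first digit
    rest = symbol[pos + 6:]                 # everything after the 6-digit date
    return "|".join([symbol[:pos], symbol[pos:pos + 6], rest[:1], rest[1:]])

for s in ("GME240119C00020000", "QQQ240119C00305000", "NFLX240119P00455000"):
    print(split_option_symbol(s))
```

Note that Python indexes from 0 while PATINDEX and SUBSTRING count from 1, which is why the offsets differ by one from the SQL.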

Related

Remove items in a delimited list that are non numeric in SQL for Redshift

I am working with a field called codes that is a delimited list of values, separated by commas. Within each item there is a title ending in a colon and then a code number following the colon. I want a list of only the code numbers after each colon.
Example Value:
name-form-na-stage0:3278648990379886572,rules-na-unwanted-sdfle2:6886328308933282817,us-disdg-order-stage1:1273671130817907765
Desired Output:
3278648990379886572,6886328308933282817,1273671130817907765
The title always starts with a letter and ends with a colon, so I can see how REGEXP_REPLACE might work: replace any string that starts with a letter and ends with a colon with ''. But I am not good at REGEXP_REPLACE patterns. ChatGPT is down, fml.
Side note, if anyone knows of a good guide for understanding pattern notation for regular expressions it would be much appreciated!
I tried this and it is not working: REGEXP_REPLACE(REPLACE(REPLACE(codes,':', ' '), ',', ' ') ,' [^0-9]+ ', ' ')
This solution assumes a few things:
No colons anywhere else except immediately before the numbers
No number at the very start
At a high level, this query counts the colons, splits the entire string on them, drops the first part (a title), keeps from each remaining part only the number up to the comma immediately after it, and then aggregates the numbers into a comma-delimited list.
Assuming a table like this:
create temp table tbl_string (id int, strval varchar(1000));
insert into tbl_string
values
(1, 'name-form-na-stage0:3278648990379886572,rules-na-unwanted-sdfle2:6886328308933282817,us-disdg-order-stage1:1273671130817907765');
with recursive cte_num_of_delims AS (
select max(regexp_count(strval, ':')) AS num_of_delims
from tbl_string
), cte_nums(nums) AS (
select 1 as nums
union all
select nums + 1
from cte_nums
where nums <= (select num_of_delims from cte_num_of_delims)
), cte_strings_nums_combined as (
select id,
strval,
nums as index
from cte_nums
cross join tbl_string
), prefinal as (
select *,
split_part(strval, ':', index) as parsed_vals
from cte_strings_nums_combined
where parsed_vals != ''
and index != 1
), final as (
select *,
case
when charindex(',', parsed_vals) = 0
then parsed_vals
else left(parsed_vals, charindex(',', parsed_vals) - 1)
end as final_vals
from prefinal
)
select listagg(final_vals, ',')
from final
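The steps of the query can be mirrored in a few lines of Python to check the expected result. This is a rough sketch under the same two assumptions the answer states (colons appear only immediately before the numbers, and the string does not start with a number); the function name extract_codes is hypothetical.

```python
def extract_codes(codes: str) -> str:
    """Keep only the code numbers that follow each colon."""
    parts = codes.split(":")                    # like split_part(strval, ':', index)
    numbers = []
    for part in parts[1:]:                      # the first part is a title, never a number
        if part == "":
            continue                            # mirrors: where parsed_vals != ''
        numbers.append(part.split(",", 1)[0])   # keep the text up to the next comma
    return ",".join(numbers)                    # like listagg(final_vals, ',')

example = (
    "name-form-na-stage0:3278648990379886572,"
    "rules-na-unwanted-sdfle2:6886328308933282817,"
    "us-disdg-order-stage1:1273671130817907765"
)
print(extract_codes(example))
```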

Pulling floats to sum data in array structure using SQL

I'm trying to pull numbers from an array structure and then I want to sum them.
Example row entry:
{"DBA":50.0},{"RST":132.0},{"ZIT":752}
I would want to sum all of the number values so 50 + 132 + 752 = 934
What I have tried: col = column name
SELECT SUBSTRING(col, LEN(LEFT(col, CHARINDEX (':', col))) + 1, LEN(col) - LEN(LEFT(col,
CHARINDEX (':', col))) - LEN(RIGHT(col, LEN(col) - CHARINDEX ('}', col))) - 1)
FROM table
This works to grab the first value (so 50.0) in the above example, but will not grab each value. Any idea how I can make this query grab multiple values and then sum them together?
I would, personally, convert your data into actual well-formed JSON. Then you can easily SUM the values:
DECLARE @YourString nvarchar(MAX) = N'{"DBA":50.0},{"RST":132.0},{"ZIT":752}';
SELECT SUM(TRY_CONVERT(decimal(5,1),[value]))
FROM (VALUES(CONCAT('{',REPLACE(REPLACE(@YourString,'{',''),'}',''),'}')))V(JSONString)
CROSS APPLY OPENJSON(V.JSONString);
Or you could add a WITH to the OPENJSON call and then add (+) the values:
DECLARE @YourString nvarchar(MAX) = N'{"DBA":50.0},{"RST":132.0},{"ZIT":752}';
SELECT OJ.DBA + OJ.RST + OJ.ZIT
FROM (VALUES(CONCAT('{',REPLACE(REPLACE(@YourString,'{',''),'}',''),'}')))V(JSONString)
CROSS APPLY OPENJSON(V.JSONString)
WITH (DBA decimal(5,1),
RST decimal(5,1),
ZIT decimal(5,1)) OJ;
The content is almost valid JSON, so you can fix it up and parse it with the built-in JSON support using OPENJSON() (valid JSON would be [{"DBA":50.0},{"RST":132.0},{"ZIT":752}]):
SELECT
t.[Column],
[Sum] = (
SELECT SUM(CONVERT(numeric(10, 1), j2.value))
FROM OPENJSON(CONCAT('[', t.[Column], ']')) j1
CROSS APPLY OPENJSON(j1.[value]) j2
)
FROM (VALUES
('{"DBA":50.0},{"RST":132.0},{"ZIT":752}')
) t ([Column])
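The "wrap it in brackets, then parse it as JSON" idea is easy to verify outside SQL Server. The following Python sketch is only an illustration of that idea; the function name sum_values is hypothetical, and Decimal is used (an assumption of this sketch) so the result matches exact decimal arithmetic rather than floating point.

```python
import json
from decimal import Decimal

def sum_values(raw: str) -> Decimal:
    """Sum every value in a string like {"DBA":50.0},{"RST":132.0},{"ZIT":752}."""
    # Wrapping the content in [ ] turns it into a valid JSON array of objects.
    objects = json.loads("[" + raw + "]", parse_float=Decimal, parse_int=Decimal)
    return sum((value for obj in objects for value in obj.values()), Decimal(0))

print(sum_values('{"DBA":50.0},{"RST":132.0},{"ZIT":752}'))  # 934.0
```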

sql extract rightmost number in string and increment

I have transaction codes like
"A0004", "1B2005","20CCCCCCC21"
I need to extract the rightmost number and increment the transaction code by one:
"AA0004"----->"AA0005"
"1B2005"------->"1B2006"
"20CCCCCCCC21"------>"20CCCCCCCC22"
in SQL Server 2012.
The length of the string is unknown.
The rightmost n characters are always a number.
Dealing with an unknown string length and an unknown number length is out of my league.
Some logic is always missing:
LEFT(@a,2)+RIGHT('000'+CONVERT(NVARCHAR,CONVERT(INT,SUBSTRING( SUBSTRING(@a,2,4),2,3))+1),3)
First, I want to be clear about this: I totally agree with the comments to the question from a_horse_with_no_name and Jeroen Mostert.
You should be storing one data point per column, period.
Having said that, I do realize that a lot of times the database structure can't be changed - so here's one possible way to get that calculation for you.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
col varchar(100)
);
INSERT INTO @T (col) VALUES
('A0004'),
('1B2005'),
('1B2000'),
('1B00'),
('20CCCCCCC21');
(I've added a couple of strings as edge cases you didn't mention in the question)
Then, using a couple of cross apply to minimize code repetition, I came up with this:
SELECT col,
LEFT(col, LEN(col) - LastCharIndex + 1) +
REPLICATE('0', LEN(NumberString) - LEN(CAST(NumberString as int))) +
CAST((CAST(NumberString as int) + 1) as varchar(100)) As Result
FROM @T
CROSS APPLY
(
SELECT PATINDEX('%[^0-9]%', Reverse(col)) As LastCharIndex
) As Idx
CROSS APPLY
(
SELECT RIGHT(col, LastCharIndex - 1) As NumberString
) As NS
Results:
col Result
A0004 A0005
1B2005 1B2006
1B2000 1B2001
1B00 1B01
20CCCCCCC21 20CCCCCCC22
The LastCharIndex represents the position of the last non-digit char in the string, counted from the right (it is found in the reversed string).
The NumberString represents the number to increment, as a string (to preserve the leading zeroes if they exist).
From there, it's simply taking the left part of the string (that is, up until the number), and concatenating it to a newly calculated number string, using Replicate to pad the result of the addition with the exact number of leading zeroes the original number string had.
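The same idea (split off the trailing digits, add one, pad back) can be checked quickly in Python. This is a minimal sketch rather than a translation of the query: the function name increment_trailing_number is hypothetical, and it pads to the width of the original digit run instead of counting leading zeroes.

```python
import re

def increment_trailing_number(code: str) -> str:
    """Increment the rightmost run of digits, keeping its original width."""
    match = re.search(r"[0-9]+\Z", code)    # trailing run of digits
    if match is None:
        return code                          # no trailing number: leave unchanged
    digits = match.group()
    bumped = str(int(digits) + 1).zfill(len(digits))  # pad to the original width
    return code[:match.start()] + bumped

for c in ("A0004", "1B2005", "1B2000", "1B00", "20CCCCCCC21"):
    print(c, increment_trailing_number(c))
```

Padding to the original width keeps the length stable when the increment carries into a leading zero (A0009 gives A0010) and still lets the number grow when there is no room left (99 gives 100).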
Try This
DECLARE @test nvarchar(1000) ='"A0004", "1B2005","20CCCCCCC21"'
DECLARE @Temp AS TABLE (ID INT IDENTITY,Data nvarchar(1000))
INSERT INTO @Temp
SELECT @test
;WITH CTE
AS
(
SELECT Id,LTRIM(RTRIM((REPLACE(Split.a.value('.' ,' nvarchar(max)'),'"','')))) AS Data
,RIGHT(LTRIM(RTRIM((REPLACE(Split.a.value('.' ,' nvarchar(max)'),'"','')))),1)+1 AS ReqData
FROM
(
SELECT ID,
CAST ('<S>'+REPLACE(Data,',','</S><S>')+'</S>' AS XML) AS Data
FROM @Temp
) AS A
CROSS APPLY Data.nodes ('S') AS Split(a)
)
SELECT CONCAT('"'+Data+'"','-------->','"'+CONCAT(LEFT(Data,LEN(Data)-1),CAST(ReqData AS VARCHAR))+'"') AS ExpectedResult
FROM CTE
Result
ExpectedResult
-----------------
"A0004"-------->"A0005"
"1B2005"-------->"1B2006"
"20CCCCCCC21"-------->"20CCCCCCC22"
STUFF(@X
,LEN(@X)-CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END+1
,LEN(((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
,((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
Works on number-only strings.
99 becomes 100.
Increments are modulo @N (the result is the next multiple of @N).

Trying to extract number between 2 characters '|' MS SQL

I have column and need to extract number between 2 pipes |, example data inside is AAA|12345678|#RRR. I need to get this number 12345678.
my code is:
SELECT SUBSTRING(column_name,CHARINDEX('|',column_name) + 1, CHARINDEX('|',column_name) - CHARINDEX('|',column_name) - 1)
FROM [name].[name].[table_name]
Using your own code:
SELECT SUBSTRING(column_name,CHARINDEX('|',column_name) + 1,
CHARINDEX('|',column_name) - CHARINDEX('|',column_name) - 1)
FROM [name].[name].[table_name]
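The length argument of SUBSTRING is not correct: both CHARINDEX calls find the same (first) pipe, so the length works out to -1. It should be:
SELECT SUBSTRING(column_name,CHARINDEX('|',column_name) + 1,
CHARINDEX('|',column_name, CHARINDEX('|',column_name) + 1) - CHARINDEX('|',column_name) - 1)
FROM [name].[name].[table_name]
The nested CHARINDEX starts searching just after the first pipe, so it returns the position of the second pipe. Subtracting the position of the first pipe, and 1, gives the length, so the SUBSTRING starts after the first pipe and stops before the second.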
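The two-step search (find the first pipe, then search again starting just after it) looks like this in Python. It is a small sketch to illustrate the position arithmetic only; the function name between_first_two_pipes is hypothetical, and returning None when a pipe is missing is a choice made for this sketch.

```python
from typing import Optional

def between_first_two_pipes(value: str) -> Optional[str]:
    """Return the text between the first and second '|', or None if absent."""
    first = value.find("|")                  # like CHARINDEX('|', column_name)
    if first == -1:
        return None
    second = value.find("|", first + 1)      # the search starts after the first pipe
    if second == -1:
        return None
    return value[first + 1:second]           # from after the first pipe up to the second

print(between_first_two_pipes("AAA|12345678|#RRR"))  # 12345678
```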
Assuming the 2nd position, you can use a little XML or ParseName()
XML Example
Declare @YourTable table (ID int,column_name varchar(max))
Insert Into @YourTable values
(1,'AAA|12345678|#RRR')
Select ID
,SomeValue = Cast('<x>' + replace(column_name,'|','</x><x>')+'</x>' as xml).value('/x[2]','varchar(max)')
From @YourTable
ParseName() Example
Select ID
,SomeValue = parsename(replace(column_name,'|','.'),2)
From @YourTable
Both would Return
ID SomeValue
1 12345678
String extraction is generally tricky in SQL Server. But if you only have one numeric value and are looking for it, then the code isn't that bad:
select patindex('%[0-9]|%', str),
substring(str, patindex('%|[0-9]%', str) + 1, patindex('%[0-9]|%', str) - patindex('%|[0-9]%', str))
from (values ('AAA|12345678|#RRR')) v(str)
I would use PARSENAME() :
select parsename(replace(str, '|', '.'), 2)
from ( values ('AAA|12345678|#RRR')
) v(str);

Teradata : Sum up values in a column

Problem Statement
Example is shown in below image :
The last 2 rows have patterns like "1.283 2 3" in a single cell. The numbers in the column are separated by spaces. We need to add those numbers and present the result in the format given in Output.
So, the cell containing "1.283 2 3" must be converted to 6.283.
Challenges faced:
The column values are in string format.
The numbers have to be added after casting them to numbers.
We do not want to move the data to a UNIX box and manipulate it there.
In TD14 there is a built-in table UDF named STRTOK_SPLIT_TO_TABLE; before that release you need to implement your own UDF or use a recursive query.
I modified an existing string splitting script to use blanks as delimiter:
CREATE VOLATILE TABLE Strings
(
groupcol INT NOT NULL,
string VARCHAR(991) NOT NULL
) ON COMMIT PRESERVE ROWS;
INSERT INTO Strings VALUES (1,'71.792');
INSERT INTO Strings VALUES (2,'71.792 1 2');
INSERT INTO Strings VALUES (3,'1.283 2 3');
WITH RECURSIVE cte
(groupcol,
--string,
len,
remaining,
word,
pos
) AS (
SELECT
GroupCol,
--String,
POSITION(' ' IN String || ' ') - 1 AS len,
TRIM(LEADING FROM SUBSTRING(String || ' ' FROM len + 2)) AS remaining,
TRIM(SUBSTRING(String FROM 1 FOR len)) AS word,
1
FROM strings
UNION ALL
SELECT
GroupCol,
--String,
POSITION(' ' IN remaining)- 1 AS len_new,
TRIM(LEADING FROM SUBSTRING(remaining FROM len_new + 2)),
TRIM(SUBSTRING(remaining FROM 1 FOR len_new)),
pos + 1
FROM cte
WHERE remaining <> ''
)
SELECT
groupcol,
-- remove the NULLIF to get 0 for blank strings
SUM(CAST(NULLIF(word, '') AS DECIMAL(18,3)))
FROM cte
GROUP BY 1
This might use a lot of spool; hopefully you're not running it on a large table.
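To check the expected sums for the sample rows, the split-and-add logic can be sketched in Python. This is only a cross-check, not a Teradata solution: the function name sum_tokens is hypothetical, and unlike the query with NULLIF (which yields NULL for a blank string) this sketch returns 0 for a blank cell.

```python
from decimal import Decimal

def sum_tokens(cell: str) -> Decimal:
    """Add up the whitespace-separated numbers in one cell."""
    # split() with no argument splits on runs of whitespace and skips blanks
    return sum((Decimal(token) for token in cell.split()), Decimal(0))

for cell in ("71.792", "71.792 1 2", "1.283 2 3"):
    print(cell, "->", sum_tokens(cell))
```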