Extracting numbers separately from string column

I have a column that is VARCHAR type and would like to grab all the numbers within it individually. For example, if a record in the column has the value 12 to 24 MONTHS Fl, then I would like to grab 12 and 24 separately. I've seen other posts where the digits end up concatenated (1224 in this case), but how can I keep the numbers separate? I do not know in advance how many digits each number has. Thanks.
For an example like 12 to 24 months APY1.8semi, the output would need to be 12, 24 and 1.8, but to be clear, the column contains only whole numbers and no . characters.

I shamelessly copied this answer from another post but made a small modification to preserve your spaces. It replaces every letter and listed punctuation character with the # symbol, then replaces each # with a space.
select id, REPLACE(TRANSLATE([comments], 'abcdefghijklmnopqrstuvwxyz+()- ,#+', '##################################'), '#', ' ')
from my_data
id  (No column name)
1   12 24
2   12 24 1.8
Or if you would prefer results as a tall table, then you could apply the string_split function.
select id, value from (
select id, ca.value
from my_data
cross apply string_split (REPLACE(TRANSLATE([comments], 'abcdefghijklmnopqrstuvwxyz+()- ,#+', '##################################'), '#', ','), ',')ca
)z
where value <> ''
id  value
1   12
1   24
2   12
2   24
2   1.8

First we create this function.
create function [dbo].[GetNumbersFromText](@String varchar(2000))
returns table as return
(
  with C as
  (
    -- Anchor: take the first run of digits/points and keep the rest of the string
    select cast(substring(S.Value, S1.Pos, S2.L) as decimal(10,2)) as Number,
           stuff(s.Value, 1, S1.Pos + S2.L, '') as Value
    from (select @String+' ') as S(Value)
      cross apply (select patindex('%[0-9.]%', S.Value)) as S1(Pos)
      cross apply (select patindex('%[^0-9.]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
    union all
    -- Recursive step: repeat on the remainder while it still contains a digit or point
    select cast(substring(S.Value, S1.Pos, S2.L) as decimal(10,2)),
           stuff(S.Value, 1, S1.Pos + S2.L, '')
    from C as S
      cross apply (select patindex('%[0-9.]%', S.Value)) as S1(Pos)
      cross apply (select patindex('%[^0-9.]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
    where patindex('%[0-9.]%', S.Value) > 0
  )
  select number
  from C
)
Then we use it in our query.
select string_agg(number, ', ') as result from GetNumbersFromText('12 to 24 months APY1.8semi')
result
12.00, 24.00, 1.80
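The call above passes a string literal. To apply the same function to every row of a table, a CROSS APPLY along the following lines should work. This is only a sketch: the table my_data and its columns id and comments are borrowed from the first answer and are assumptions here, and it assumes every row contains at least one number, since the function's anchor query does not filter out strings without digits.

```sql
-- Assumes dbo.GetNumbersFromText (defined above) already exists, and that
-- my_data(id, comments) is the table from the first answer (an assumption).
select d.id, n.Number
from my_data as d
cross apply dbo.GetNumbersFromText(d.comments) as n
-- The function is built on a recursive CTE; the default limit of 100
-- recursion levels can be lifted by the calling query if needed.
option (maxrecursion 0);
```

This returns one row per number, which can then be aggregated with string_agg per id if a single delimited value is preferred.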

Related

Extract & Expand Numbers

Using SQL with Microsoft SQL Server. I have a column that has values like this:
5-7(A-C) 15(A-C)
3(A-C)
I am trying to extract the numbers and if there is a dash then I need those numbers plus all the numbers in between. So for this example the output would be 5, 6, 7, 15 for the first row and 3 for the second row. I will use the results to see if they exist in another table.
I have been using this but it does not get the numbers between the dash:
SELECT
CASE
WHEN CHARINDEX('-', SUBSTRING(cc_EXPRESSION, 1, CHARINDEX('(', cc_EXPRESSION) - 1)) > 0
THEN CAST(LEFT(SUBSTRING(cc_EXPRESSION, 1, CHARINDEX('(', cc_EXPRESSION) - 1), CHARINDEX('-', SUBSTRING(cc_EXPRESSION, 1, CHARINDEX('(', cc_EXPRESSION) - 1)) - 1) AS INT)
ELSE CAST(SUBSTRING(cc_EXPRESSION, 1, CHARINDEX('(', cc_EXPRESSION) - 1) AS INT)
END AS extracted_number

Here is an option that illustrates how you can "stack" expressions via a CROSS APPLY and JOIN an ad-hoc tally/numbers table.
You may notice I used TOP 1000 ... feel free to pick a more reasonable number
Example
Select A.cc_Expression
,NewValue = string_agg(N+R1,',')
From YourTable A
Cross Apply string_split(cc_Expression,' ') B
Cross Apply (values ( replace(left(B.Value,charindex('(',B.Value)-1 ),'-','.') ) )C(Rng)
Cross Apply (values (try_convert(int,coalesce(parsename(C.Rng,2),parsename(C.Rng,1) )) ,try_convert(int,parsename(C.Rng,1) )) ) D(R1,R2)
Join ( Select Top 1000 N=-1+Row_Number() Over (Order By (Select NULL)) From master..spt_values n1, master..spt_values n2 ) E on N<=R2-R1
Group By A.cc_Expression
Results
cc_Expression      NewValue
3(A-C)             3
5-7(A-C) 15(A-C)   5,6,7,15
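The range-expansion step in the query above can be looked at in isolation. The sketch below is an illustration rather than part of the original answer: it hard-codes a single range of 5 to 7 (hypothetical values standing in for R1 and R2) and joins it to the same ad-hoc tally, so that each offset N from 0 to R2-R1 yields one number.

```sql
-- D(R1, R2) is a hard-coded stand-in for the parsed range bounds.
select Value = D.R1 + E.N
from (values (5, 7)) as D(R1, R2)
join (
    -- Ad-hoc tally: N = 0, 1, 2, ... (capped at 1000 rows, as in the answer)
    select top 1000 N = -1 + row_number() over (order by (select null))
    from master..spt_values n1, master..spt_values n2
) as E on E.N <= D.R2 - D.R1
order by Value;
```

For the range 5 to 7 this should return the rows 5, 6 and 7. A single value such as 15 is treated as the range 15 to 15, so only the offset 0 matches.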

With the following table:
CREATE TABLE I_have_a_column_that_has_values_like_this (COL VARCHAR(256));
INSERT INTO I_have_a_column_that_has_values_like_this
VALUES ('5-7(A-C) 15(A-C)'), ('3(A-C)');
You can do it like this (GENERATE_SERIES requires SQL Server 2022 or later):
WITH
T0 AS
(
SELECT COL, LEFT(value, CHARINDEX('(', value) -1) AS VAL
FROM I_have_a_column_that_has_values_like_this
CROSS APPLY STRING_SPLIT(COL, ' ')
),
T1 AS
(
SELECT COL, CASE WHEN VAL NOT LIKE '%-%' THEN VAL + '-' + VAL ELSE VAL END AS VAL
FROM T0
)
SELECT COL, value AS VALS
FROM T1
CROSS APPLY GENERATE_SERIES(CAST(LEFT(VAL, CHARINDEX('-', VAL)-1) AS INT),
CAST(RIGHT(VAL, CHARINDEX('-', REVERSE(VAL))-1) AS INT)) AS G
The result will be:
COL                    VALS
---------------------- -----------
5-7(A-C) 15(A-C)       5
5-7(A-C) 15(A-C)       6
5-7(A-C) 15(A-C)       7
5-7(A-C) 15(A-C)       15
3(A-C)                 3

How to split names into first/middle/last if there are people who typed last/first/middle? Order known

I am trying to split names into first, middle, last, based on an indicated order. I am at a loss on how to do this, and any help would be much appreciated. I am using SQL Server 2008 for work.
I attached an example dataset and the ideal dataset I would like to create.
ID  ORDER                 NAME
1   first, middle, last   Bruce, Batman, Wayne
2   middle, last, first   Superman, Kent, Clark
3   last, first, middle   Prince, Diana, Wonderwoman
INTO:
ID  ORDER   NAME
1   first   Bruce
1   middle  Batman
1   last    Wayne
2   middle  Superman
2   last    Kent
2   first   Clark
3   last    Prince
3   first   Diana
3   middle  Wonderwoman

SQL Server does not have very good string processing functions. You can do this using a recursive CTE, though:
with cte as (
select id,
convert(varchar(max), left(ord, charindex(',', ord) - 1)) as ord,
convert(varchar(max), left(name, charindex(',', name) - 1)) as name,
convert(varchar(max), stuff(ord, 1, charindex(',', ord) + 1, '')) as ord_rest,
convert(varchar(max), stuff(name, 1, charindex(',', name) + 1, '')) as name_rest,
1 as lev
from t
union all
select id,
convert(varchar(max), left(ord_rest, charindex(',', ord_rest + ',') - 1)) as ord,
convert(varchar(max), left(name_rest, charindex(',', name_rest + ',') - 1)) as name,
convert(varchar(max), stuff(ord_rest, 1, charindex(',', ord_rest + ',') + 1, '')) as ord_rest,
convert(varchar(max), stuff(name_rest, 1, charindex(',', name_rest + ',') + 1, '')) as name_rest,
lev + 1
from cte
where ord_rest <> '' and lev < 10
)
select id, ord, name
from cte
order by id, lev

With the help of a parse/split function that returns the sequence, this becomes a small matter using a CROSS APPLY
Example
Select A.ID
,B.*
From YourTable A
Cross Apply (
Select [Order] = B1.RetVal
,[Name] = B2.RetVal
From [dbo].[tvf-Str-Parse]([ORDER],',') B1
Join [dbo].[tvf-Str-Parse]([NAME] ,',') B2 on B1.RetSeq=B2.RetSeq
) B
Returns
ID Order Name
1 first Bruce
1 middle Batman
1 last Wayne
2 middle Superman
2 last Kent
2 first Clark
3 last Prince
3 first Diana
3 middle Wonderwoman
The Function if Interested
CREATE FUNCTION [dbo].[tvf-Str-Parse] (@String varchar(max), @Delimiter varchar(10))
Returns Table
As
Return (
    Select RetSeq = row_number() over (order by 1/0)
          ,RetVal = ltrim(rtrim(B.i.value('(./text())[1]', 'varchar(max)')))
    From ( values (cast('<x>' + replace((Select replace(@String,@Delimiter,'§§Split§§') as [*] For XML Path('')),'§§Split§§','</x><x>')+'</x>' as xml).query('.'))) as A(x)
    Cross Apply x.nodes('x') AS B(i)
);

I found the other answers a bit hard to follow. They're neat tricks for sure, but I think anyone coming to maintain them might be like "whaaaat?". Here, the first CTE works out the indexes of the commas (the index of the first comma goes in o1/n1, the index of the second in o2/n2). The second CTE cuts the strings up (the substring before the first comma, the substring between the first and second commas, and the substring after the second comma). Finally, a couple of UNION ALLs turn the results from 7 columns into 3.
WITH idxs AS
(
SELECT
id,
[order],
name,
CHARINDEX(',', [order]) as o1,
CHARINDEX(',', [order], CHARINDEX(',', [order]) + 1) as o2,
CHARINDEX(',', name) as n1,
CHARINDEX(',', name, CHARINDEX(',', name) + 1) as n2
FROM
t
),
cuts as (
SELECT
id,
SUBSTRING([order], 1, o1-1) as ord1,
SUBSTRING([order], o1+1, o2-o1-1) as ord2,
SUBSTRING([order], o2+1, 4000) as ord3,
SUBSTRING(name, 1, n1-1) as nam1,
SUBSTRING(name, n1+1, n2-n1-1) as nam2,
SUBSTRING(name, n2+1, 4000) as nam3
FROM
idxs
)
SELECT id, ord1 as [order], nam1 as name FROM cuts
UNION ALL
SELECT id, ord2, nam2 FROM cuts
UNION ALL
SELECT id, ord3, nam3 FROM cuts
Note that if your data sometimes has spaces after the commas and sometimes does not, you'll benefit from applying LTRIM/RTRIM to the output.
If the spaces are always there after a comma, you could instead adjust the substring indexes to cut the spaces out (any start index that is x+1 would become x+2, and the matching length, for example o2-o1-1, would become o2-o1-2).
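As a rough illustration of the LTRIM/RTRIM suggestion, the sketch below wraps one of the SUBSTRING calls in both functions. The sample row is hard-coded, and the aliases v, a, b and middle_part are made up for the example; only functions available in SQL Server 2008 are used.

```sql
-- Extract the part between the first and second commas, then trim it.
select ltrim(rtrim(substring(v.name, a.n1 + 1, b.n2 - a.n1 - 1))) as middle_part
from (values ('Bruce, Batman, Wayne')) as v(name)
cross apply (select charindex(',', v.name)) as a(n1)             -- first comma
cross apply (select charindex(',', v.name, a.n1 + 1)) as b(n2);  -- second comma
```

Without the trimming, the substring would keep the space that follows the first comma.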

SQL Server : to get last 4 character in first column and get the first letter of the words in 2nd column but ignore non alphabets

Is it possible to do this in SQL? I am trying to build the value of a new column from two existing columns, ID and Description.
For ID, simply retrieve the last 4 characters.
For Description, get the first letter of each word, ignoring numbers and symbols.

SQL Server has lousy string processing capabilities. Even string_split() doesn't guarantee the order of the words that it returns.
One approach to this uses a recursive CTE to split the strings and accumulate the initials:
with t as (
select v.*
from (values (2004120, 'soccer field 2010'), (2004121, 'ruby field')) v(id, description)
),
cte as (
select id, description, convert(varchar(max), left(description, charindex(' ', description + ' '))) as word,
convert(varchar(max), stuff(description, 1, charindex(' ', description + ' ') , '')) as rest,
1 as lev,
(case when description like '[a-zA-Z]%' then convert(varchar(max), left(description, 1)) else '' end) as inits
from t
union all
select id, description, convert(varchar(max), left(rest, charindex(' ', rest + ' '))) as word,
convert(varchar(max), stuff(rest, 1, charindex(' ', rest + ' ') , '')) as rest,
lev + 1,
(case when rest like '[a-zA-Z]%' then convert(varchar(max), inits + left(rest, 1)) else inits end) as inits
from cte
where rest > ''
)
select id, description, inits + right(id, 4)
from (select cte.*, max(lev) over (partition by id) as max_lev
from cte
) cte
where lev = max_lev;

To get the last 4 digits of the ID you could use:
SELECT Id%10000 as New_Id from Tablename;
To get the first letter of Description you could use (call the result String2):
LEFT(Description,1)
This is equivalent to using SUBSTRING(Description,1,1).
Note that it returns only the first character of the whole string, so the words have to be split out before it can be applied to each one.
To concatenate both of them you could use the CONCAT function:
SELECT CONCAT(String2,New_Id)
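The two ways of taking the last four characters shown in these answers differ in one respect that is easy to see side by side: the modulo returns a number, so leading zeros are lost, while RIGHT on the converted string keeps them. A small sketch with hard-coded IDs (the second value is hypothetical, chosen only to show the difference):

```sql
select v.id,
       v.id % 10000                        as modulo_result,    -- numeric result
       right(cast(v.id as varchar(20)), 4) as last_four_chars   -- string result
from (values (2004120), (2000045)) as v(id);
```

For 2004120 both columns show 4120. For 2000045 the modulo gives 45, while the string version gives 0045.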

Selecting between quotes (") in SQL Server 2012

I have a table holding IDs in one column and a string in the second column like below.
COLUMN01 COLUMN02
----------------------------------------------------------------------------------
1 abc"11444,12,13"efg"14,15"hij"16,17,18,19"opqr
2 ahsdhg"21,22,23"ghshds"24,25"fgh"26,27,28,28"shgshsg
3 xvd"3142,32,33"hty"34,35"okli"36,37,38,39"adfd
Now I want to have the following result
COLUMN01 COLUMN02
-----------------------------------------------------------
1 11444,12,13,14,15,16,17,18,19
2 21,22,23,24,25,26,27,28,28
3 3142,32,33,34,35,36,37,38,39
How can I do that?
Thanks so much

Here is one way (maybe not the best, but it seems to work). I am NOT a SQL guru...
First, create this SQL Function. It came from: Extract numbers from a text in SQL Server
create function [dbo].[GetNumbersFromText](@String varchar(2000))
returns table as return
(
  with C as
  (
    -- Anchor: take the first run of digits and keep the rest of the string
    select cast(substring(S.Value, S1.Pos, S2.L) as int) as Number,
           stuff(s.Value, 1, S1.Pos + S2.L, '') as Value
    from (select @String+' ') as S(Value)
      cross apply (select patindex('%[0-9]%', S.Value)) as S1(Pos)
      cross apply (select patindex('%[^0-9]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
    union all
    -- Recursive step: repeat on the remainder while it still contains a digit
    select cast(substring(S.Value, S1.Pos, S2.L) as int),
           stuff(S.Value, 1, S1.Pos + S2.L, '')
    from C as S
      cross apply (select patindex('%[0-9]%', S.Value)) as S1(Pos)
      cross apply (select patindex('%[^0-9]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
    where patindex('%[0-9]%', S.Value) > 0
  )
  select Number
  from C
)
Then, you can do something like this to get the results you were asking for. Note that I broke the query up into 3 parts for clarity. And, obviously, you don't need to declare the table variable and insert data into it.
DECLARE @tbl
TABLE (
    COLUMN01 int,
    COLUMN02 varchar(max)
)
INSERT INTO @tbl VALUES (1, 'abc"11444,12,13"efg"14,15"hij"16,17,18,19"opqr')
INSERT INTO @tbl VALUES (2, 'ahsdhg"21,22,23"ghshds"24,25"fgh"26,27,28,28"shgshsg')
INSERT INTO @tbl VALUES (3, 'xvd"3142,32,33"hty"34,35"okli"36,37,38,39"adfd')
SELECT COLUMN01, SUBSTRING(COLUMN02, 2, LEN(COLUMN02) - 1) as COLUMN02 FROM
(
    SELECT COLUMN01, REPLACE(COLUMN02, ' ', '') as COLUMN02 FROM
    (
        -- number is an int, so cast it to make ',' + number a string concatenation
        SELECT COLUMN01, (select ',' + cast(number as varchar(20)) as 'data()' from dbo.GetNumbersFromText(Column02) for xml path('')) as COLUMN02 FROM @tbl
    ) t
) tt
GO
output:
COLUMN01 COLUMN02
1 11444,12,13,14,15,16,17,18,19
2 21,22,23,24,25,26,27,28,28
3 3142,32,33,34,35,36,37,38,39

I know you want to do it using SQL, but I once had nearly the same problem, and fetching the data into a string in PHP or another language and then parsing it there is one way to do it. For example, you can use this kind of code after receiving the data into a string.
function clean($string) {
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
return preg_replace('/-+/', '-', $string); // Replaces multiple hyphens with single one.
}
For more information, you might want to look at the post I took this function from: Remove all special characters from a string
As I said, this is an easy way to do it. I hope this helps.

Extract multiple decimal numbers from string in T-SQL

I have a table in SQL Server Management Studio with columns containing ranges of numbers as strings. I am trying to find a way to extract the numeric values from the string and insert them into a new table.
For example, in the table I have the value 12.45% - 42.32% as a string. I'd like to be able to get 12.45 and 42.32 and insert them into a new table with columns min_percent and max_percent.
I found several ways to extract a single numeric value from a string using SQL, and also tried modifying the function from Extract numbers from a text in SQL Server (which returns multiple integers, but not decimals), but so far I haven't been able to get it working. Thanks in advance for any suggestions

Assuming your data is consistent, this should work fine, and has the added advantage of being easier on the eyes. Also consider decimal if you're going for precision.
select
cast(left(r, charindex('%', r) - 1) AS float) as minVal,
cast(replace(right(r, charindex('-', r) - 1), '%', '') as float) AS maxVal
from ( select '22.45% - 42.32%' as r ) as tableStub
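Following the suggestion to consider decimal, a variation that splits on the hyphen and writes into a target table might look like this. It is a sketch under assumptions: the table variables @ranges and @result and the column val are invented for the example (min_percent and max_percent come from the question), decimal(5,2) is a guess at a suitable precision, and the values are assumed to be non-negative so that the only hyphen is the separator.

```sql
declare @ranges table (val varchar(50));
insert into @ranges (val) values ('12.45% - 42.32%');

declare @result table (min_percent decimal(5,2), max_percent decimal(5,2));

-- Left of the hyphen is the minimum, right of it the maximum;
-- strip the % signs and surrounding spaces before casting.
insert into @result (min_percent, max_percent)
select cast(ltrim(rtrim(replace(left(r.val, p.dash - 1), '%', ''))) as decimal(5,2)),
       cast(ltrim(rtrim(replace(stuff(r.val, 1, p.dash, ''), '%', ''))) as decimal(5,2))
from @ranges as r
cross apply (select charindex('-', r.val)) as p(dash);  -- position of the separator

select min_percent, max_percent from @result;
```

For the sample row this should store 12.45 and 42.32. Unlike the RIGHT-based expression, it does not rely on both halves of the string having the same length.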

The function is quite close. You just need to use a decimal type and add the decimal point to the patterns:
with C as
(
  select cast(substring(S.Value, S1.Pos, S2.L) as decimal(16,2)) as Number,
         stuff(s.Value, 1, S1.Pos + S2.L, '') as Value
  from (select @String+' ') as S(Value)
    cross apply (select patindex('%[0-9,.]%', S.Value)) as S1(Pos)
    cross apply (select patindex('%[^0-9,.]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
  union all
  select cast(substring(S.Value, S1.Pos, S2.L) as decimal(16,2)),
         stuff(S.Value, 1, S1.Pos + S2.L, '')
  from C as S
    cross apply (select patindex('%[0-9,.]%', S.Value)) as S1(Pos)
    cross apply (select patindex('%[^0-9,.]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
  where patindex('%[0-9,.]%', S.Value) > 0
)
select Number
from C

Here is a brute force approach using the string operations available in SQL Server:
with t as (
select '12.45% - 42.32%' as val
)
select cast(SUBSTRING(val, 1, charindex('%', val) - 1) as float) as minval,
cast(replace(substring(val, len(val) - charindex(' ', reverse(val))+2, 100), '%', '') as float) as maxval
from t
