Extract float from String/Text SQL Server - sql

I have a data field that is supposed to hold floating-point values (prices); however, the DB designers messed up and now I have to run aggregate functions on that field. About 80% of the time the data is in the correct format, e.g. '80.50', but sometimes it is saved as '$80.50' or '$80.50 per sqm'.
The data field is nvarchar. What I need to do is extract the floating-point number from the nvarchar. I came across this: Article on SQL Authority
This, however, solves only half my problem, or compounds it, some might say. That function just returns all the digits in a string. That is, '$80.50 per m2' will return 80502. Obviously that won't work. I tried to change the pattern from =>
PATINDEX('%[^0-9]%', @strAlphaNumeric) to =>
PATINDEX('%[^0-9].[^0-9]%', @strAlphaNumeric)
but that doesn't work. Any help would be appreciated.

This will do what you need, tested on (http://sqlfiddle.com/#!6/6ef8e/53)
DECLARE @data varchar(max) = '$70.23 per m2'
-- Start at the first numeric character, then keep everything up to the next non-numeric one.
-- Appending 'X' guarantees a terminator, so a value that ends in a digit is not cut to ''.
Select LEFT(SubString(@data, PatIndex('%[0-9.-]%', @data),
len(@data) - PatIndex('%[0-9.-]%', @data) +1
),
PatIndex('%[^0-9.-]%', SubString(@data, PatIndex('%[0-9.-]%', @data),
len(@data) - PatIndex('%[0-9.-]%', @data) +1) + 'X') - 1
)
But as jpw already mentioned, a regular expression in a CLR function would be better.

This should work too, but it assumes that the float numbers are followed by a white space in case there's text after.
-- sample data
DECLARE @tab TABLE (strAlphaNumeric NVARCHAR(30))
INSERT @tab VALUES ('80.50'),('$80.50'),('$80.50 per sqm')
-- actual query
SELECT
strAlphaNumeric AS Original,
CAST (
SUBSTRING(stralphanumeric, PATINDEX('%[0-9]%', strAlphaNumeric),
CASE WHEN PATINDEX('%[ ]%', strAlphaNumeric) = 0
THEN LEN(stralphanumeric)
ELSE
PATINDEX('%[ ]%', strAlphaNumeric) - PATINDEX('%[0-9]%', strAlphaNumeric)
END
)
AS FLOAT) AS CastToFloat
FROM @tab
From the sample data above it generates:
Original                       CastToFloat
------------------------------ ----------------------
80.50                          80,5
$80.50                         80,5
$80.50 per sqm                 80,5
Sample SQL Fiddle.
If you want something more robust, you might want to consider writing a CLR function to do regex parsing instead, as described in this MSDN article: Regular Expressions Make Pattern Matching And Data Extraction Easier

Inspired by @deterministicFail, I thought of a way to extract only the numeric part (although it's not 100% there yet):
DECLARE @NUMBERS TABLE (
Val VARCHAR(20)
)
INSERT INTO @NUMBERS VALUES
('$70.23 per m2'),
('$81.23'),
('181.93 per m2'),
('1211.21'),
(' There are 4 tokens'),
(' No numbers '),
(''),
(' ')
select
CASE
WHEN ISNUMERIC(RTRIM(LEFT(RIGHT(RTRIM(LTRIM(n.Val)), 1+LEN(RTRIM(LTRIM(n.Val)))-PatIndex('%[0-9.-]%', RTRIM(LTRIM(n.Val)))), LEN(RIGHT(RTRIM(LTRIM(n.Val)), 1+LEN(RTRIM(LTRIM(n.Val)))-PatIndex('%[0-9.-]%', RTRIM(LTRIM(n.Val)))))- PATINDEX('%[^0-9.-]%',RIGHT(RTRIM(LTRIM(n.Val)), 1+LEN(RTRIM(LTRIM(n.Val)))-PatIndex('%[0-9.-]%', RTRIM(LTRIM(n.Val))))))))=1 THEN
RTRIM(LEFT(RIGHT(RTRIM(LTRIM(n.Val)), 1+LEN(RTRIM(LTRIM(n.Val)))-PatIndex('%[0-9.-]%', RTRIM(LTRIM(n.Val)))), LEN(RIGHT(RTRIM(LTRIM(n.Val)), 1+LEN(RTRIM(LTRIM(n.Val)))-PatIndex('%[0-9.-]%', RTRIM(LTRIM(n.Val)))))- PATINDEX('%[^0-9.-]%',RIGHT(RTRIM(LTRIM(n.Val)), 1+LEN(RTRIM(LTRIM(n.Val)))-PatIndex('%[0-9.-]%', RTRIM(LTRIM(n.Val)))))))
ELSE '0.0'
END
FROM @NUMBERS n
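Since the goal in the question is to aggregate the values, here is a minimal sketch that combines the PATINDEX extraction used in the answers above with TRY_CONVERT, so that rows without a usable number become NULL instead of raising a conversion error. It assumes SQL Server 2012 or later (for TRY_CONVERT), and the table variable @prices and the column name raw are made up for illustration. It only handles unsigned numbers with a dot as the decimal separator.

```sql
-- Hypothetical sample data; @prices and raw are illustrative names only.
DECLARE @prices TABLE (raw nvarchar(50));
INSERT INTO @prices (raw) VALUES
(N'80.50'), (N'$80.50'), (N'$80.50 per sqm'), (N'No numbers'), (N'');

SELECT p.raw,
       TRY_CONVERT(float, num.txt) AS price   -- NULL when nothing numeric could be extracted
FROM @prices AS p
CROSS APPLY (
    -- Everything from the first digit onward (NULL when there is no digit at all).
    SELECT SUBSTRING(p.raw, NULLIF(PATINDEX('%[0-9]%', p.raw), 0), 4000) AS tail
) AS t
CROSS APPLY (
    -- Keep the leading run of digits and dots; the appended 'X' guarantees a terminator.
    SELECT LEFT(t.tail, PATINDEX('%[^0-9.]%', t.tail + 'X') - 1) AS txt
) AS num;
```

Because the extracted text stops at the first character that is neither a digit nor a dot, '$80.50 per m2' gives 80.5 rather than 80502. Aggregates such as SUM or AVG skip the NULL rows, which may or may not be what you want for the malformed values.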

Related

sql extract rightmost number in string and increment

I have transaction codes like
"A0004", "1B2005","20CCCCCCC21"
I need to extract the rightmost number and increment the transaction code by one
"AA0004"----->"AA0005"
"1B2005"------->"1B2006"
"20CCCCCCCC21"------>"20CCCCCCCC22"
in SQL Server 2012.
The length of the string is unknown.
The rightmost n characters are always a number.
Handling an arbitrary string length and number length is out of my league;
some logic is always missing from my attempts:
LEFT(@a,2)+RIGHT('000'+CONVERT(NVARCHAR,CONVERT(INT,SUBSTRING( SUBSTRING(@a,2,4),2,3))+1)),3
First, I want to be clear about this: I totally agree with the comments to the question from a_horse_with_no_name and Jeroen Mostert.
You should be storing one data point per column, period.
Having said that, I do realize that a lot of times the database structure can't be changed - so here's one possible way to get that calculation for you.
First, create and populate sample table (Please save us this step in your future questions):
DECLARE @T AS TABLE
(
col varchar(100)
);
INSERT INTO @T (col) VALUES
('A0004'),
('1B2005'),
('1B2000'),
('1B00'),
('20CCCCCCC21');
(I've added a couple of strings as edge cases you didn't mention in the question)
Then, using a couple of cross apply to minimize code repetition, I came up with that:
SELECT col,
LEFT(col, LEN(col) - LastCharIndex + 1) +
REPLICATE('0', LEN(NumberString) - LEN(CAST(NumberString as int))) +
CAST((CAST(NumberString as int) + 1) as varchar(100)) As Result
FROM @T
CROSS APPLY
(
SELECT PATINDEX('%[^0-9]%', Reverse(col)) As LastCharIndex
) As Idx
CROSS APPLY
(
SELECT RIGHT(col, LastCharIndex - 1) As NumberString
) As NS
Results:
col          Result
A0004        A0005
1B2005       1B2006
1B2000       1B2001
1B00         1B01
20CCCCCCC21  20CCCCCCC22
LastCharIndex is the position of the last non-digit character, counted from the end of the string (it is found in the reversed string).
NumberString is the number to increment, as a string (to preserve the leading zeroes if they exist).
From there, it's simply a matter of taking the left part of the string (that is, up to the number) and concatenating it with the newly calculated number string, using Replicate to pad the result of the addition with the same number of leading zeroes the original number string had.
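One case worth checking is a carry: with the padding computed from the original number, a value such as '1B09' gets one leading zero plus '10' and comes out as '1B010'. The sketch below is one possible variation, not the answer author's code: it pads based on the incremented number instead, and appends a sentinel 'X' to the reversed string so that all-digit codes do not make PATINDEX return 0. The table variable @codes and the NextNumber alias are made up for illustration, and the number part is assumed to fit in an int.

```sql
-- Hypothetical sample data, including carry and all-digit cases.
DECLARE @codes TABLE (col varchar(100));
INSERT INTO @codes (col) VALUES ('A0004'), ('1B09'), ('1B00'), ('20CCCCCCC21'), ('99');

SELECT c.col,
       LEFT(c.col, LEN(c.col) - LEN(ns.NumberString))      -- prefix before the trailing digits
       + CASE WHEN LEN(inc.NextNumber) >= LEN(ns.NumberString)
              THEN inc.NextNumber                           -- no room left for padding (e.g. 99 -> 100)
              ELSE REPLICATE('0', LEN(ns.NumberString) - LEN(inc.NextNumber)) + inc.NextNumber
         END AS Result
FROM @codes AS c
CROSS APPLY (
    -- The sentinel 'X' guarantees a non-digit is found even when col is all digits.
    SELECT PATINDEX('%[^0-9]%', REVERSE(c.col) + 'X') AS LastCharIndex
) AS idx
CROSS APPLY (
    SELECT RIGHT(c.col, idx.LastCharIndex - 1) AS NumberString
) AS ns
CROSS APPLY (
    SELECT CAST(CAST(ns.NumberString AS int) + 1 AS varchar(100)) AS NextNumber
) AS inc;
```

With this variation '1B09' becomes '1B10', 'A0004' still becomes 'A0005', and '99' becomes '100'. A code with no trailing digits at all would get '1' appended, so filter those rows out first if that is not wanted.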
Try This
DECLARE @test nvarchar(1000) ='"A0004", "1B2005","20CCCCCCC21"'
DECLARE @Temp AS TABLE (ID INT IDENTITY,Data nvarchar(1000))
INSERT INTO @Temp
SELECT @test
;WITH CTE
AS
(
SELECT Id,LTRIM(RTRIM((REPLACE(Split.a.value('.' ,' nvarchar(max)'),'"','')))) AS Data
,RIGHT(LTRIM(RTRIM((REPLACE(Split.a.value('.' ,' nvarchar(max)'),'"','')))),1)+1 AS ReqData
FROM
(
SELECT ID,
CAST ('<S>'+REPLACE(Data,',','</S><S>')+'</S>' AS XML) AS Data
FROM @Temp
) AS A
CROSS APPLY Data.nodes ('S') AS Split(a)
)
SELECT CONCAT('"'+Data+'"','-------->','"'+CONCAT(LEFT(Data,LEN(Data)-1),CAST(ReqData AS VARCHAR))+'"') AS ExpectedResult
FROM CTE
Result
ExpectedResult
-----------------
"A0004"-------->"A0005"
"1B2005"-------->"1B2006"
"20CCCCCCC21"-------->"20CCCCCCC22"
STUFF(@X
,LEN(@X)-CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END+1
,LEN(((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
,((RIGHT(@X,CASE PATINDEX('%[A-Z]%',REVERSE(@X)) WHEN 0 THEN LEN(@X) ELSE PATINDEX('%[A-Z]%',REVERSE(@X))-1 END)/@N)+1)*@N)
works on number only strings
99 becomes 100
mod(@N) increments

Edit string column in SQL - remove sections between separators

I have a string column in my table that contains 'Character-separated' data such as this:
"Value|Data|4|Z|11/06/2012"
This data is fed into a 'parser' and deserialised into a particular object. (The details of this aren't relevant and can't be changed)
The structure of my object has changed and now I would like to get rid of some of the 'sections' of data
So I want the previous value to turn into this
"Value|Data|11/06/2012"
I was hoping I might be able to get some help on how I would go about doing this in T-SQL.
The data always has the same number of sections, 'n', and I will want to remove the same sections for all rows, 'n-x' and 'n-y'
So far I know I need an update statement to update my column value.
I've found various ways of splitting a string but I'm struggling to apply it to my scenario.
In C# I would do
string RemoveSections(string value)
{
// Split on the separator, then keep every section except the unwanted ones.
string[] bits = value.Split('|');
List<string> wantedBits = new List<string>();
for (var i = 0; i < bits.Length; i++)
{
if (i == 2 || i == 3) // positions of the sections I no longer want
{
continue;
}
wantedBits.Add(bits[i]);
}
return string.Join("|", wantedBits);
}
But how I would do this in SQL I'm not sure where to start. Any help here would be appreciated
Thanks
Ps. I need to run this SQL on SQL Server 2012
Edit: It looks like parsing to xml in some manner could be a popular answer here, however I can't guarantee my string won't have characters such as '<' or '&'
Using NGrams8K you can easily write a nasty fast customized splitter. The logic here is based on DelimitedSplit8K. This will likely outperform even the C# code you posted.
DECLARE @string VARCHAR(8000) = '"Value|Data|4|Z|11/06/2012"',
@delim CHAR(1) = '|';
SELECT newString =
(
SELECT SUBSTRING(
@string, split.pos+1,
ISNULL(NULLIF(CHARINDEX(@delim,@string,split.pos+1),0),8000)-split.pos)
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY d.Pos), d.Pos
FROM
(
SELECT 0 UNION ALL
SELECT ng.position
FROM samd.ngrams8k(@string,1) AS ng
WHERE ng.token = @delim
) AS d(Pos)
) AS split(ItemNumber,Pos)
WHERE split.ItemNumber IN (1,2,5)
ORDER BY split.ItemNumber
FOR XML PATH('')
);
Returns:
newString
----------------------------
"Value|Data|11/06/2012"
Not the most elegant way, but works:
SELECT SUBSTRING(@str,1, CHARINDEX('|',@str,CHARINDEX('|',@str,1)+1)-1)
+ SUBSTRING(@str, CHARINDEX('|',@str,CHARINDEX('|',@str,CHARINDEX('|',@str,CHARINDEX('|',@str,1)+1)+1)+1), LEN(@str))
----------------------
Value|Data|11/06/2012
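The nested CHARINDEX calls above can be unrolled with CROSS APPLY so that each delimiter position gets a name, and then used directly in the UPDATE the question asks for. This is only a sketch under the question's stated assumption that every row has the same five sections; the table variable @t and column val are made up for illustration. It involves no XML, so characters such as '<' or '&' in the data are not a problem.

```sql
-- Hypothetical sample data; the second row contains XML-unsafe characters on purpose.
DECLARE @t TABLE (val varchar(200));
INSERT INTO @t (val) VALUES ('Value|Data|4|Z|11/06/2012'), ('A&B|<x>|1|2|end');

UPDATE t
SET val = LEFT(t.val, p2.pos)                          -- sections 1 and 2, including the 2nd '|'
        + SUBSTRING(t.val, p4.pos + 1, LEN(t.val))     -- everything after the 4th '|'
FROM @t AS t
CROSS APPLY (SELECT CHARINDEX('|', t.val) AS pos) AS p1
CROSS APPLY (SELECT CHARINDEX('|', t.val, p1.pos + 1) AS pos) AS p2
CROSS APPLY (SELECT CHARINDEX('|', t.val, p2.pos + 1) AS pos) AS p3
CROSS APPLY (SELECT CHARINDEX('|', t.val, p3.pos + 1) AS pos) AS p4
-- Skip rows that do not contain at least four delimiters.
WHERE p1.pos > 0 AND p2.pos > 0 AND p3.pos > 0 AND p4.pos > 0;

SELECT val FROM @t;
```

For the sample rows this leaves 'Value|Data|11/06/2012' and 'A&B|<x>|end'. Removing a different pair of sections means changing which positions are used in the SET clause.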
You might try some XQuery:
DECLARE @s VARCHAR(100)='Value|Data|4|Z|11/06/2012';
SELECT CAST('<x>' + REPLACE(@s,'|','</x><x>') + '</x>' AS XML)
.value('concat(/x[1],"|",/x[2],"|",/x[5])','nvarchar(max)');
In short: The value is transformed to XML by some string replacements. Then we use XQuery's concat to join the first, second and fifth elements together again.
This version is a bit less efficient but safe with forbidden characters:
SELECT CAST('<x>' + REPLACE((SELECT @s AS [*] FOR XML PATH('')),'|','</x><x>') + '</x>' AS XML)
.value('concat(/x[1],"|",/x[2],"|",/x[5])','nvarchar(max)')
Just to add a non-xml option for fun:
Edit and Caveat - In case anyone tries this for a different solution and doesn't read the comments...
HABO rightly noted that this is easily broken if any of the columns have a period (".") in them. PARSENAME is dependent on a 4 part naming structure and will return NULL if that is exceeded. This solution will also break if any values ever contain another pipe ("|") or another delimited column is added - the substring in my answer is specifically there as a workaround for the dependency on the 4 part naming. If you are trying to use this solution on, say, a variable with 7 delimited columns, it would need to be reworked or scrapped in favor of one of the other answers here.
DECLARE
@a VARCHAR(100)= 'Value|Data|4|Z|11/06/2012'
SELECT
PARSENAME(REPLACE(SUBSTRING(@a,0,LEN(@a)-CHARINDEX('|',REVERSE(@a))+1),'|','.'),4)+'|'+
PARSENAME(REPLACE(SUBSTRING(@a,0,LEN(@a)-CHARINDEX('|',REVERSE(@a))+1),'|','.'),3)+'|'+
SUBSTRING(@a,LEN(@a)-CHARINDEX('|',REVERSE(@a))+2,LEN(@a))
Here is a quick way to do it.
CREATE FUNCTION [dbo].StringSplitXML
(
@String VARCHAR(MAX), @Separator CHAR(1)
)
RETURNS @RESULT TABLE(id int identity(1,1),Value VARCHAR(MAX))
AS
BEGIN
DECLARE @XML XML
SET @XML = CAST(
('<i>' + REPLACE(@String, @Separator, '</i><i>') + '</i>')
AS XML)
INSERT INTO @RESULT
SELECT t.i.value('.', 'VARCHAR(MAX)')
FROM @XML.nodes('i') AS t(i)
WHERE t.i.value('.', 'VARCHAR(MAX)') <> ''
RETURN
END
GO
SELECT * FROM dbo.StringSplitXML( 'Value|Data|4|Z|11/06/2012','|')
WHERE id not in (3,4)
Note that using a UDF will slow things down, so this solution should be considered only if you have a reasonably small data set to work with.

How to trim/replace any letters in the value?

I have a few columns in my old database whose values combine numbers and letters. I have to clean these up and import them into the new table. Most of the values that need to be converted look like this:
40M or 85M or NR or 5NR ...
Since the old system did no validation of what users could enter, there can still be values like 40A or 3R and so on. I want to import only numeric values into my new table, so if there are any letters in the value I want to trim them. What is the best way to do that in SQL Server? I have tried this:
CASE WHEN CHARINDEX('M',hs_ptr1) <> 0 THEN 1 ELSE 0 END AS hs_ptr1
but this only identifies whether one particular letter is in the value. If anyone can help please let me know. Thanks!
You can use PATINDEX to search for the pattern. Try this code:
Code:
CREATE TABLE #temp
(
TXT NVARCHAR(50)
)
INSERT INTO #temp (TXT)
VALUES
('40M'),
('85M'),
('NR'),
('5NR')
SELECT LEFT(subsrt, PATINDEX('%[^0-9]%', subsrt + 't') - 1)
FROM (
SELECT subsrt = SUBSTRING(TXT, pos, LEN(TXT))
FROM (
SELECT TXT, pos = PATINDEX('%[0-9]%', TXT)
FROM #temp
) d
) t
DROP TABLE #temp
Here's a way without a function....
declare @table table (c varchar(256))
insert into @table
values
('40M'),
('30'),
('5NR'),
('3(-4_')
select
replace(LEFT(SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000),
PATINDEX('%[^0-9.-]%', SUBSTRING(replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',',''), PATINDEX('%[0-9.-]%', replace(replace(replace(replace(replace(c,'(',''),')',''),'-',''),' ',''),',','')), 8000) + 'X') -1),'.','')
from @table
You can use the PATINDEX function to search for the first character that is not a digit. If such an index exists, grab everything to the left of it. Something like this:
SELECT LEFT(your_field_name, PATINDEX('%[^0-9]%', your_field_name) - 1)
FROM your_table_name
UPDATE
Well, you need to take care of the edge cases. E.g. if the value contains no non-digit character, PATINDEX returns 0, so the calculation yields -1, which is an invalid length for LEFT.
I would suggest using a Common Table Expression to calculate the index of the first non-digit character, and then an IIF expression to select the correct characters. E.g.
WITH cte AS
(
SELECT *, PATINDEX('%[^0-9]%', your_field_name) AS NumLength
FROM your_table_name
)
SELECT any_other_field, IIF(NumLength = 0,
your_field_name,
LEFT(your_field_name, NumLength - 1)
)
FROM cte

How to sort a varchar column that contains numbers and letters in SQL Server?

I have a varchar column that contains numbers (1-99999) and letters (AM0001-BF9999).
Since it has letters, I can't just convert it to int.
Is there a way to maybe use grouping_id to sort this column by numbers (small to large), followed by letters (alphabetically)?
Thanks..
You need to know what the maximum length of your field is. Assuming 25 characters for illustrative purposes, this will work:
select
v
from (
select
right(space(25) + v,25) as v
from ( values
('1-99999')
,('AM0001-BF9999')
) data(v)
)data
order by v
to yield:
v
-------------------------
                  1-99999
            AM0001-BF9999
You can try using the ISNUMERIC function like this:
select * from test_table
order by
case isnumeric(test_column)
when 1 then convert(int,test_column)
else 999999 end, test_column
Sql fiddle demo.
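One thing to be aware of with ISNUMERIC is that it returns 1 for some strings that cannot be converted to int, such as '$', '.', '-' or '1e5', so the CONVERT can still fail on dirty data. As a hedged alternative, here is a sketch using TRY_CONVERT, which returns NULL instead of raising an error. It assumes SQL Server 2012 or later, and the sample rows are made up for illustration.

```sql
-- Hypothetical sample data mixing pure numbers and prefixed codes.
DECLARE @test_table TABLE (test_column varchar(20));
INSERT INTO @test_table (test_column) VALUES
('10'), ('2'), ('BF9999'), ('AM0001'), ('99999'), ('1');

SELECT test_column
FROM @test_table
ORDER BY CASE WHEN TRY_CONVERT(int, test_column) IS NULL THEN 1 ELSE 0 END,  -- numbers first
         TRY_CONVERT(int, test_column),                                       -- numbers in numeric order
         test_column;                                                         -- the rest alphabetically
```

The first sort key separates the two groups explicitly, so there is no need for a magic value such as 999999 that a real number could exceed.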
That's what you get when you denormalize your database schema.
Prefix and number should be stored separately.
That said, this is what I did when I had the same problem:
SELECT * FROM YOUR_TABLE
ORDER BY dbo.GetNumbers(YOUR_FIELD), YOUR_FIELD
Create Function dbo.GetNumbers(@Data VarChar(8000))
Returns int
AS
Begin
Return CAST(Left(
SubString(@Data, PatIndex('%[0-9.-]%', @Data), 8000),
PatIndex('%[^0-9.-]%', SubString(@Data, PatIndex('%[0-9.-]%', @Data), 8000) + 'X')-1) AS int)
End
See also this post for extracting numbers from strings
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/extracting-numbers-with-sql-server/

TSQL How to get the 2nd number from a string

We have the row below in MS SQL:
Got event with: 123.123.123.123, event 34, brown fox
How can we reliably extract the 2nd number, i.e. the 34, in one line of SQL?
Here's one way to do it using SUBSTRING and PATINDEX -- I used a CTE just so it wouldn't look so awful :)
WITH CTE AS (
SELECT
SUBSTRING(Data,CHARINDEX(',',Data)+1,LEN(Data)) data
FROM Test
)
SELECT LEFT(SUBSTRING(Data, PATINDEX('%[0-9]%', Data), 8000),
PATINDEX('%[^0-9]%',
SUBSTRING(Data, PATINDEX('%[0-9]%', Data), 8000) + 'X')-1)
FROM CTE
And here is some sample Fiddle.
As commented, CTEs will only work with 2005 and higher. If by chance you're using 2000, then this will work without the CTE:
SELECT LEFT(SUBSTRING(SUBSTRING(Data,CHARINDEX(',',Data)+1,LEN(Data)),
PATINDEX('%[0-9]%', SUBSTRING(Data,CHARINDEX(',',Data)+1,LEN(Data))), 8000),
PATINDEX('%[^0-9]%',
SUBSTRING(SUBSTRING(Data,CHARINDEX(',',Data)+1,LEN(Data)),
PATINDEX('%[0-9]%', SUBSTRING(Data,CHARINDEX(',',Data)+1,LEN(Data))), 8000) + 'X')-1)
FROM Test
Simply replace @s with your column name to apply this to a table. This assumes that the number is between the last comma and the space before the last comma. Sql-Fiddle-Demo
declare @s varchar(100) = '123.123.123.123, event 34, brown fox'
select right(first, charindex(' ', reverse(first),1) ) final
from (
select left(@s,len(@s) - charindex(',',reverse(@s),1)) first
--from tableName
) X
OR, if it is between the first and second commas, then try this: DEMO
select substring(first, charindex(' ',first,1),
charindex(',', first,1)-charindex(' ',first,1)) final
from (
select right(@s,len(@s) - charindex(',',@s,1)-1) first
) X
I've thought of another way that's not been mentioned yet. Presuming the following are true:
Always one comma before the second "part"
It's always the word "event" with the number in the second part
You are using SQL Server 2005+
Then you could use the built in ParseName function meant for parsing the SysName datatype.
--Variable to hold your example
DECLARE @test NVARCHAR(100) -- wide enough to hold the whole sample string without truncation
SET @test = 'Got event with: 123.123.123.123, event 34, brown fox'
SELECT Ltrim(Rtrim(Replace(Parsename(Replace(Replace(@test, '.', ''), ',', '.'), 2), 'event', '')))
Results:
34
ParseName parses around dots, but we want it to parse around commas. Here's the logic of what I've done:
Remove all existing dots in the string, in this case swap them with empty string.
Swap all commas for dots for ParseName to use
Use ParseName and ask for the second "piece". In your example this gives us the value
" event 34".
Remove the word "event" from the string.
Trim both ends and return the value.
I've no comments on performance vs. the other solutions, and it looks just as messy. Thought I'd throw the idea out there anyway!