Remove alphanumeric characters from the Varchar columns and then convert to Float


I have a Laboratory-Test table with 120 columns, all of datatype varchar (they are supposed to be FLOAT), but these columns also contain characters like ^, *, A-Z, a-z, commas, and sentences ending with a full stop ".". I am using the following function to keep only the numeric characters, including ".".
The issue is the dot: if I use @KeepValues as varchar(50) = '%[^0-9]%' then it removes all the dots (e.g. 1.05*L becomes 105), which is not what I want.
Could you please help me resolve this? Any alternative solution would also be great.
Create Function [dbo].[RAC]
(@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
    Declare @KeepValues as varchar(50) = '%[^0-9.]%'
    While PatIndex(@KeepValues, @Temp) > 0
        Set @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
    Return @Temp
End
My T-SQL CASE statement is:
,CASE WHEN LTRIM(RTRIM(DBO.RAC([INR]))) NOT IN ('','.')
THEN round(AVG(NULLIF(CAST(DBO.RAC([INR]) as FLOAT), 0)), 2)
END AS [INR]

Since you have SQL2012, you can take advantage of the TRY_CONVERT() function
CREATE FUNCTION [dbo].[RAC] (@input varchar(max))
RETURNS TABLE AS
RETURN (
    WITH number_list AS (SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) i FROM sys.objects a)
    SELECT TOP 1 TRY_CONVERT(float,LEFT(@input,i)) float_conversion
    FROM number_list
    WHERE i <= LEN(@input) AND TRY_CONVERT(float,LEFT(@input,i)) IS NOT NULL
    ORDER BY i DESC
)
GO
If you have an actual numbers (tally) table, which is very useful to have, use that instead of the number_list CTE built from sys.objects.
DECLARE @table TABLE (data varchar(max))
INSERT @table VALUES
('123.124'),
('123.567 blah.'),
('123.567E10 blah.'),
('blah 45.2')
SELECT *
FROM @table
OUTER APPLY [dbo].[RAC](data) t
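To tie this back to the aggregate in the question, the inline function can be applied per row and its result averaged. This is only a minimal sketch: dbo.LabTest is a hypothetical table name, and dbo.RAC is assumed to be the table-valued function defined above (not the scalar function from the question). Rows with no parseable leading number come back as NULL, which AVG ignores.

```sql
-- dbo.LabTest is a hypothetical table name; [INR] is the column from the question.
-- NULLIF(..., 0) keeps the question's behaviour of excluding zero values from the average.
SELECT ROUND(AVG(NULLIF(t.float_conversion, 0)), 2) AS [INR]
FROM dbo.LabTest AS l
OUTER APPLY dbo.RAC(l.[INR]) AS t;
```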

You need a somewhat basic Regular Expression that will allow you to get digits with a single decimal between two sets of digits (or perhaps digits with no decimal at all). This requires using SQLCLR for the RegEx function. You can find numerous examples of those, or you can use the freely available SQLCLR library SQL# (SQLsharp) (which I am the author of, but the function needed to answer this question is in the Free version).
DECLARE @Expression NVARCHAR(100) = N'\d+(\.\d+)?(e[-+]?\d+)?';
SELECT
SQL#.RegEx_MatchSimple(N'This is a test. Number here 1.05*L.',
@Expression, 1, 'IgnoreCase') AS [TheNumber],
CONVERT(FLOAT, SQL#.RegEx_MatchSimple(N'This is a test. Number here 1.05*L.',
@Expression, 1, 'IgnoreCase')) AS [Float],
CONVERT(FLOAT, SQL#.RegEx_MatchSimple(N'Another test. New number 1.05e4*L.',
@Expression, 1, 'IgnoreCase')) AS [Float2],
CONVERT(FLOAT, SQL#.RegEx_MatchSimple(N'One more test. Yup 1.05e-4*L.',
@Expression, 1, 'IgnoreCase')) AS [Float3]
/*
Returns:
TheNumber Float Float2 Float3
1.05 1.05 10500 0.000105
*/
The only issue with the pattern would be if there is another number in the text (you did say there are full sentences) prior to the one that you want. If you are 100% certain that the value you want will always have a decimal, you could use a simpler expression as follows:
\d+\.\d+(e[-+]?\d+)?
The regular expression allows for optional ( e / e+ / e- ) notation.

PATINDEX supports pattern matching, but only with T-SQL wildcard patterns, and getting such a pattern to do what you need may be impossible.
It sounds like you will need a regular expression for this, which means either a CLR user-defined function or doing the work outside SQL Server in an application.
The marked answer to this question will help you get what you need.
Here is a copy of the code for ease of reference:
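using System;
using System.Data;
using System.Text.RegularExpressions;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString StripNonNumeric(SqlString input)
    {
        // Pass SQL NULL through instead of throwing on input.Value
        if (input.IsNull) return SqlString.Null;
        // Note: \D removes every non-digit, including the decimal point;
        // a pattern such as [^0-9.] would keep the dots.
        Regex regEx = new Regex(@"\D");
        return regEx.Replace(input.Value, "");
    }
};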

Related

How do I convert decimal encoded hex to ascii in SQL Server 2012

What I have is a string of integer values that represent Letters and Numbers in pairs. I would like to use SQL to do the conversion from '12371 12595 8224' to 'S031 '
I've gotten to here...
CAST(CAST(CAST(SUBSTRING(DataString,1,CHARINDEX(CHAR(9),DataString)-1) as bigint) AS VARBINARY(2)) AS VARCHAR(2)) but that gets me '0S'
Note: The CHAR(9) is because the integers are tab delimited.

So here's one way you can get your desired output. Because you're using 2012 there are some useful functions for splitting and aggregating strings that are not available to you.
This makes use of a tally/numbers table - simulated here with values() - but you would have a permanent table of numbers to use.
This splits the string into rows, and then simply extracts the high/low bytes and uses for xml path to aggregate back into a string.
declare @v varchar(50)='12371 12595 8224', @delimiter varchar(1)=' ';
with v as (
    select Convert(int,Substring(@delimiter+@v+@delimiter, n.n+1, CharIndex(@delimiter, @delimiter+@v+@delimiter, n.n+1)-n.n-1)) as v
    from
    /* replace this with tally / numbers table */
    (select * from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20))n(n))n
    where n.n <= len(@delimiter+@v+@delimiter)-1
    and substring(@delimiter+@v+@delimiter, n.n, 1) = @delimiter
)
select Concat(Char(v & 0x00ff), char(v/256))
from v
for xml path(''), type
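The byte arithmetic is easier to follow on a single value. As a small worked sketch (the variable name @v is only illustrative): 12371 is 0x3053, so the low byte 0x53 is 'S' and the high byte 0x30 is '0', which is why the low byte has to be emitted first.

```sql
DECLARE @v int = 12371;                     -- 0x3053 = 48 * 256 + 83
SELECT @v & 0x00FF        AS low_byte,      -- 83
       @v / 256           AS high_byte,     -- 48
       CHAR(@v & 0x00FF)  AS low_char,      -- 'S'
       CHAR(@v / 256)     AS high_char;     -- '0'
```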

I dug a bit deeper into the datastring and found that I only had to decode the first 6 integers. So I went ugly. Considering how far I was already, the missing piece was the Reverse function.
REVERSE(CAST(CAST(SUBSTRING(DataString,1,5) as bigint) AS VARBINARY(2)))+
REVERSE(CAST(CAST(SUBSTRING(DataString,7,5) as bigint) AS VARBINARY(2)))+
REVERSE(CAST(CAST(SUBSTRING(DataString,13,4) as bigint) AS VARBINARY(2)))+
REVERSE(CAST(CAST(SUBSTRING(DataString,18,4) as bigint) AS VARBINARY(2)))+
REVERSE(CAST(CAST(SUBSTRING(DataString,23,5) as bigint) AS VARBINARY(2)))+
REVERSE(CAST(CAST(SUBSTRING(DataString,29,5) as bigint) AS VARBINARY(2))) AS [Output Code]
Had I needed to go through all of the integers in the string, a While loop would have worked fine.
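For completeness, such a loop might look like the following. It is a sketch only, assuming tab-delimited integers as described in the question, with no empty tokens and a result that fits in varchar(100); the variable names are illustrative.

```sql
DECLARE @data varchar(200) = '12371' + CHAR(9) + '12595' + CHAR(9) + '8224';
DECLARE @out varchar(100) = '', @pos int, @v int;

SET @data = @data + CHAR(9);                 -- sentinel so the last value is terminated too
WHILE LEN(@data) > 0
BEGIN
    SET @pos  = CHARINDEX(CHAR(9), @data);   -- end of the current integer
    SET @v    = CAST(LEFT(@data, @pos - 1) AS int);
    SET @out  = @out + CHAR(@v & 255) + CHAR(@v / 256);   -- low byte first, then high byte
    SET @data = SUBSTRING(@data, @pos + 1, LEN(@data));   -- drop the consumed token
END

SELECT @out AS [Output Code];                -- 'S031' followed by two spaces
```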

how to sum up value within one cell SQL

I have some binary values such as 00, 0000, 001000, 11111000, 1111100000.
I need to sum the digits so that they become 0, 0, 1, 5, 5 (add up the 0s and 1s).
How can we do that in SQL, please?
Thanks

Assumptions:
The binary values are stored as strings.
Each value is in its own cell in a table, something like:
BinaryValues (consider it a column name)
00
0000
001000
and so on.
You want to add up the individual digits to get the sum.
The SQL product you are using supports functions, looping, and string manipulation such as substring and string length.
To the best of my knowledge, these primitives are available in all SQL products.
Solution:
Write a function (call it anything, e.g. AddBinaryDigits) that takes the binary value in string format as input.
Inside the function, do the string manipulation: extract each digit and add it up. Return the sum as the result.
Call the function:
If using binary values stored in a table:
SELECT AddBinaryDigits(BinaryValues) FROM <WhatEverTableName>
If using a fixed value:
SELECT AddBinaryDigits('00')
SELECT AddBinaryDigits('0000')
SELECT AddBinaryDigits('001000')
and so on.
Edited to include the request to create the function.
CREATE FUNCTION <functionName>
(
@ParameterName AS VARCHAR(expected string length like 10/15/20 etc.)
)
RETURNS INT
BEGIN
SQL Code to sum
RETURN SummedUpValue
END

Use the function below.
create function dbo.fnSumChars(@someInt VARCHAR(20))
RETURNS INT
AS
BEGIN
    DECLARE @count INT = LEN(@someInt),
            @counter INT = 1
    DECLARE @sum INT = 0
    WHILE @counter <= @count
    BEGIN
        -- add the digit at the current position
        SELECT @sum += CAST(SUBSTRING(@someInt, @counter, 1) AS int)
        SELECT @counter += 1
    END
    RETURN @sum -- 5 for '1111100000'
END
You can call the function like this:
SELECT dbo.fnSumChars('1111100000')

If these are already in string format, this is the easiest:
select len(replace('1111100000', '0', ''))
No need for a function either, because it can be inlined in the query. Functions, even light ones, incur a performance penalty.
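Applied to the sample values from the question, the inlined expression might look like this. The VALUES list simply stands in for a real table, and the column name BinaryValue is an assumption. Removing every '0' leaves only the '1' characters, so the remaining length equals the digit sum.

```sql
SELECT v.BinaryValue,
       LEN(REPLACE(v.BinaryValue, '0', '')) AS SumOfDigits   -- 0, 0, 1, 5, 5
FROM (VALUES ('00'), ('0000'), ('001000'), ('11111000'), ('1111100000')) AS v(BinaryValue);
```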

Looking for a scalar function to find the last occurrence of a character in a string

Table FOO has a column FILEPATH of type VARCHAR(512). Its entries are absolute paths:
FILEPATH
------------------------------------------------------------
file://very/long/file/path/with/many/slashes/in/it/foo.xml
file://even/longer/file/path/with/more/slashes/in/it/baz.xml
file://something/completely/different/foo.xml
file://short/path/foobar.xml
There's ~50k records in this table and I want to know all distinct filenames, not the file paths:
foo.xml
baz.xml
foobar.xml
This looks easy, but I couldn't find a DB2 scalar function that allows me to search for the last occurrence of a character in a string. Am I overlooking something?
I could do this with a recursive query, but this appears to be overkill for such a simple task and (no surprise) is extremely slow:
WITH PATHFRAGMENTS (POS, PATHFRAGMENT) AS (
SELECT
1,
FILEPATH
FROM FOO
UNION ALL
SELECT
POSITION('/', PATHFRAGMENT, OCTETS) AS POS,
SUBSTR(PATHFRAGMENT, POSITION('/', PATHFRAGMENT, OCTETS)+1) AS PATHFRAGMENT
FROM PATHFRAGMENTS
)
SELECT DISTINCT PATHFRAGMENT FROM PATHFRAGMENTS WHERE POS = 0

I think what you're looking for is the LOCATE_IN_STRING() scalar function. This is what Info Center has to say if you use a negative start value:
If the value of the integer is less than zero, the search begins at
LENGTH(source-string) + start + 1 and continues for each position to
the beginning of the string.
Combine that with the LENGTH() and RIGHT() scalar functions, and you can get what you want:
SELECT
RIGHT(
FILEPATH
,LENGTH(FILEPATH) - LOCATE_IN_STRING(FILEPATH,'/',-1)
)
FROM FOO

One way to do this is by taking advantage of the power of DB2's XQuery engine. The following worked for me (and it was fast):
SELECT DISTINCT XMLCAST(
XMLQuery('tokenize($P, ''/'')[last()]' PASSING FILEPATH AS "P")
AS VARCHAR(512) )
FROM FOO
Here I use tokenize to split the file path into a sequence of tokens and then select the last of these tokens. The rest is only conversion from SQL to XML types and back again.

I know the OP's problem was already solved, but I decided to post the following anyway to hopefully help others like me who land here.
I came across this thread while searching for a solution to a similar problem with the exact same requirement, but for a different kind of database that also lacks the REVERSE function.
In my case this was an OpenEdge (Progress) database, which has a slightly different syntax. It does offer the INSTR function that most Oracle-style databases provide.
So I came up with the following code:
SELECT
SUBSTRING(
foo.filepath,
INSTR(foo.filepath, '/',1, LENGTH(foo.filepath) - LENGTH( REPLACE( foo.filepath, '/', '')))+1,
LENGTH(foo.filepath))
FROM foo
However, for my specific situation (the OpenEdge (Progress) database) this did not produce the desired behaviour, because replacing the character with an empty string gave the same length as the original string. This doesn't make much sense to me, but I was able to work around the problem with the code below:
SELECT
SUBSTRING(
foo.filepath,
INSTR(foo.filepath, '/',1, LENGTH( REPLACE( foo.filepath, '/', 'XX')) - LENGTH(foo.filepath))+1,
LENGTH(foo.filepath))
FROM foo
Now I understand that this code won't solve the problem for T-SQL, because T-SQL has no built-in equivalent of INSTR with an occurrence argument.
Just to be thorough, I'll add the code needed to create this scalar function in SQL Server so it can be used the same way as in the examples above.
-- Drop the function if it already exists
IF OBJECT_ID('INSTR', 'FN') IS NOT NULL
    DROP FUNCTION INSTR
GO
-- User-defined function to implement Oracle INSTR in SQL Server
CREATE FUNCTION INSTR (@str VARCHAR(8000), @substr VARCHAR(255), @start INT, @occurrence INT)
RETURNS INT
AS
BEGIN
    DECLARE @found INT = @occurrence,
            @pos INT = @start;
    WHILE 1=1
    BEGIN
        -- Find the next occurrence
        SET @pos = CHARINDEX(@substr, @str, @pos);
        -- Nothing found
        IF @pos IS NULL OR @pos = 0
            RETURN @pos;
        -- The required occurrence found
        IF @found = 1
            BREAK;
        -- Prepare to find another occurrence
        SET @found = @found - 1;
        SET @pos = @pos + 1;
    END
    RETURN @pos;
END
GO
To state the obvious: when the REVERSE function is available, you do not need to create this scalar function and can get the required result like this:
SELECT
SUBSTRING(
foo.filepath,
LEN(foo.filepath) - CHARINDEX('/', REVERSE(foo.filepath))+2,
LEN(foo.filepath))
FROM foo

You could just do it in a single statement:
select distinct reverse(substring(reverse(FILEPATH), 1, charindex('/', reverse(FILEPATH))-1))
from filetable
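One caveat with the REVERSE/CHARINDEX approach (T-SQL only, since the original question notes that DB2 lacks these functions): a value that contains no slash makes CHARINDEX return 0, and the resulting negative length raises an error. A minimal guarded sketch, using a VALUES list in place of the real table, appends a sentinel slash to the reversed string so a match is always found:

```sql
SELECT DISTINCT
       -- The sentinel '/' guarantees CHARINDEX > 0, so a path without any slash
       -- is returned unchanged instead of raising an invalid-length error.
       RIGHT(f.FILEPATH, CHARINDEX('/', REVERSE(f.FILEPATH) + '/') - 1) AS FileName
FROM (VALUES ('file://very/long/file/path/with/many/slashes/in/it/foo.xml'),
             ('file://short/path/foobar.xml'),
             ('foo.xml')) AS f(FILEPATH);
-- Returns the distinct names foo.xml and foobar.xml
```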

tsql, picking out value-pairs

I have a column that has the following data:
PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"
Is there any good, clean tsql syntax to grab the 551723 (the value associated with EntityId)? The combination of Substring and Patindex I'm using seems quite unwieldy.

That string looks just like an XML attribute list for an element, so you can wrap it in an XML element and use XPath:
declare @t table (t nvarchar(max));
insert into @t (t) values (
N'PersonId="315618" LetterId="43" MailingGroupId="1"
EntityId="551723" trackedObjectId="9538"
EmailAddress="myemailaddy@addy.com"');
with xte as (
select cast(N'<x '+t+N'/>' as xml) as x from @t)
select
n.value(N'@PersonId', N'int') as PersonId
, n.value(N'@LetterId', N'int') as LetterId
, n.value(N'@EntityId', N'int') as EntityId
, n.value(N'@EmailAddress', N'varchar(256)') as EmailAddress
from xte
cross apply x.nodes(N'/x') t(n);
Whether this is better or worse than string manipulation depends on a variety of factors, not least the size of the string and the number of records to parse. I prefer the simple and clean XPath syntax over char-index-based manipulation (the code is much more maintainable).

If that's the text in the column, then you're going to have to use substring at some stage.
declare @l_debug varchar(1000)
select @l_debug = 'PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"'
select substring(@l_debug, patindex('%EntityId="%', @l_debug) + 10, 6)
If you don't know how long EntityId could be, then you'll need the patindex of the next double quote after EntityId="
declare @l_debug varchar(1000), @l_sub varchar(100), @l_index2 int
select @l_debug = 'PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538" EmailAddress="myemailaddy@addy.com"'
select @l_sub = substring(@l_debug, patindex('%EntityId="%', @l_debug) + 10 /* length of EntityId=" */, len(@l_debug))
select @l_index2 = patindex('%"%', @l_sub)
select substring(@l_debug, patindex('%EntityId="%', @l_debug) + 10, @l_index2 - 1)
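The same idea can be written a little more compactly with CHARINDEX, whose third argument lets the search for the closing quote start right after the key. This is a sketch under two assumptions: the key is always present (otherwise CHARINDEX returns 0 and the result is meaningless), and no other attribute name ends in EntityId. The variable names are illustrative.

```sql
DECLARE @s varchar(1000) = 'PersonId="315618" LetterId="43" MailingGroupId="1" EntityId="551723" trackedObjectId="9538"';
DECLARE @key varchar(50) = 'EntityId="';
DECLARE @start int = CHARINDEX(@key, @s) + LEN(@key);     -- first character of the value

-- Take everything from @start up to, but not including, the closing double quote.
SELECT SUBSTRING(@s, @start, CHARINDEX('"', @s, @start) - @start) AS EntityId;   -- 551723
```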

If you possibly can, break out your data. Either normalize your tables or store XML in the column (with an XML data type) instead of name, value pairs. You'll then be able to use the full power and speed of SQL Server, or at least be able to issue XPath queries (assuming a relatively recent version of SQL Server).
I know this probably won't help you in the short term, but it's a goal to work towards. :)

Substring(
Substring(EventArguments,PATINDEX('%EntityId%', EventArguments)+10,10),0,
PATINDEX('%"%', Substring(EventArguments,
PATINDEX('%EntityId%', EventArguments)+10,10))
)

Is there any simple way to format decimals in T-SQL?

I know it could be done trivially in a non-SQL environment [post-data processing, frontend, what have you], but that's not possible at the moment. Is there a way to take a decimal(5,2) and convert it to a varchar without the trailing zeroes/decimal points? For example:
declare @number decimal(5,2)
set @number = 123.00
select cast(@number as varchar) as FormattedNumber
And the result is '123.00'. Is there a (simple) way to get '123' instead? And likewise, instead of '123.30', '123.3'? Could do it by figuring out whether or not the hundredths/tenths places were 0 and manually trimming characters, but I wanted to know if there was a more elegant solution.

What about:
SELECT CAST(CAST(@number AS float) AS varchar(10))
However you may want to test this carefully with your raw data first.

This way is pretty simple:
DECLARE @Number DECIMAL(5,2)
SELECT @Number = 123.65
SELECT FormattedNumber = CAST(CAST(@Number AS DECIMAL(3,0)) AS VARCHAR(4))
Returns '124'.
The only thing to consider is whether you want to round up/down, or just strip the zeroes and decimal points without rounding; you'd cast the DECIMAL as an INT in the second case.

For controlled formatting of numbers in T-SQL you should use the FORMAT() function. For example:
DECLARE @number DECIMAL(9,2); SET @number = 1234567.12;
DECLARE @formatted VARCHAR(MAX); SET @formatted = FORMAT(@number, 'N0', 'en-AU');
PRINT @formatted;
The result will be:
1,234,567
The arguments to the FORMAT() function are:
FORMAT(value, format [, culture])
The value argument is your number. The format argument is a CLR type formatting string (in this example, I specified "normal number, zero precision"). The optional culture argument allows you to override the server culture setting to format the number as per a desired culture.
See also the MSDN ref page for FORMAT().

The Convert function may do what you want to do.
ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/tsqlref9/html/a87d0850-c670-4720-9ad5-6f5a22343ea8.htm

Let me try this again....
CREATE FUNCTION dbo.saneDecimal(@input decimal(5,2)) returns varchar(10)
AS
BEGIN
    DECLARE @output varchar(10)
    SET @output = CAST(@input AS varchar(10))
    DECLARE @trimmable table (trimval char(1))
    INSERT @trimmable VALUES ('0')
    INSERT @trimmable VALUES ('.')
    -- Only trim while a decimal point is still present, so 100.00 becomes '100' rather than '1'
    WHILE @output LIKE '%.%' AND EXISTS (SELECT * FROM @trimmable WHERE trimval = CAST(SUBSTRING(@output, LEN(@output), 1) AS char(1)))
        SET @output = LEFT(@output, LEN(@output) - 1)
    RETURN @output
END
GO
SELECT dbo.saneDecimal(1.00)

You could strip the trailing zeroes in a while loop:
declare @number decimal(5,2)
declare @str varchar(100)
set @number = 123.00
set @str = @number
-- stop once the decimal separator is gone, so whole numbers such as 100.00 keep their zeroes
while patindex('%[.,]%', @str) > 0 and substring(@str,len(@str),1) in ('0','.',',')
set @str = substring(@str,1,len(@str)-1)
But as AdaTheDev commented, this is more easily done client-side.

Simple and elegant? Not so much...but that's T-SQL for you:
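DECLARE @number decimal(5,2) = 123.00
-- varchar(7) holds the widest decimal(5,2) value ('-999.99'); varchar(5) would truncate '123.00'
DECLARE @formatted varchar(7) = CAST(@number as varchar(7))
-- Caution: a value such as 100.00 is trimmed to '1', because zeroes before the decimal point are removed too
SELECT
LEFT(
@formatted,
LEN(@formatted)
- PATINDEX('%[^0.]%', REVERSE(@formatted))
+ 1
)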
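One thing to watch with any approach that trims trailing '0' and '.' characters is a whole number ending in zero, such as 100.00, which must come out as '100' rather than '1'. A minimal sketch that sidesteps this, assuming the input is always decimal(5,2) (so the text always has exactly two fractional digits), only ever looks at the last three characters:

```sql
DECLARE @number decimal(5,2) = 100.50;
DECLARE @s varchar(7) = CAST(@number AS varchar(7));   -- '-999.99' is the widest possible value

SELECT CASE
           WHEN @s LIKE '%.00' THEN LEFT(@s, LEN(@s) - 3)   -- 100.00 -> 100
           WHEN @s LIKE '%0'   THEN LEFT(@s, LEN(@s) - 1)   -- 100.50 -> 100.5
           ELSE @s                                          -- 123.45 stays as is
       END AS FormattedNumber;
```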

Use the Format(value,format string,culture) function in SQL Server 2012+

If you have SQL Server 2012 or Greater you can use the format function like this:
select format(@number,'0') as FormattedNumber
Of course the format function will return an nvarchar, and not a varchar. You can cast to get a specific type.
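Note that the '0' format string rounds to a whole number. To keep significant decimals while dropping trailing zeroes, as the question asks, a custom format string with optional-digit placeholders can be used. This is a sketch: the explicit 'en-US' culture is an assumption added here to pin the decimal separator to a dot, and the column alias val is illustrative.

```sql
SELECT n.val,
       FORMAT(n.val, '0.##', 'en-US') AS FormattedNumber   -- '#' digits are omitted when they are trailing zeroes
FROM (VALUES (CAST(123.00 AS decimal(5,2))), (123.30), (100.00), (123.45)) AS n(val);
-- FormattedNumber: 123, 123.3, 100, 123.45
```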

Also, take a look at the T-SQL STR function in Books Online; this can be used for formatting floats and might work for your case. For some reason it doesn't come up in Google searches relating to this problem.
