Extract largest number from a string in T-SQL - sql

I am working with data imported from Excel files. There is a column with a string that can contain multiple numbers. I am trying to extract the largest number in the string, or 0 if there is no string.
The strings are in formats similar to:
"100% post-consumer recycled paper, 50% post-consumer recycled cover, 90% post-consumer recycled wire."
"Paper contains 30% post-consumer content."
or sometimes an empty string or NULL.
Given the irregular formatting of the string I am having trouble and any help would be appreciated.

Here's a scalar function that takes a string as input and returns the largest whole number it finds, up to a maximum of 3 digits (from your question I've assumed you're dealing with percentages). If you need more digits, add further IF blocks for the longer lengths.
Paste this into SSMS and run it to create the function. To call it, do something like:
SELECT dbo.GetLargestNumberFromString(MyStringField) as [Largest Number in String]
FROM MyMessedUpData
Function:
CREATE FUNCTION dbo.GetLargestNumberFromString
(
@s varchar(max)
)
RETURNS int
AS
BEGIN
DECLARE @LargestNumber int, @i int
SET @i = 1
SET @LargestNumber = 0
-- Slide over the string and test the 3-, 2- and 1-character runs starting at @i
WHILE @i <= LEN(@s)
BEGIN
IF SUBSTRING(@s, @i, 3) like '[0-9][0-9][0-9]'
BEGIN
IF CAST(SUBSTRING(@s, @i,3) as int) > @LargestNumber
SET @LargestNumber = CAST(SUBSTRING(@s, @i,3) as int);
END
IF SUBSTRING(@s, @i, 2) like '[0-9][0-9]'
BEGIN
IF CAST(SUBSTRING(@s, @i,2) as int) > @LargestNumber
SET @LargestNumber = CAST(SUBSTRING(@s, @i,2) as int);
END
IF SUBSTRING(@s, @i, 1) like '[0-9]'
BEGIN
IF CAST(SUBSTRING(@s, @i,1) as int) > @LargestNumber
SET @LargestNumber = CAST(SUBSTRING(@s, @i,1) as int);
END
SET @i = @i + 1
END
RETURN @LargestNumber
END
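As a quick check of the function above, here is a small sketch that runs it against the sample strings from the question. The derived table and its alias names (s, txt) are made up for illustration, and it assumes the function was created in the dbo schema.

```sql
-- Sample strings taken from the question, plus the empty and NULL cases
SELECT s.txt,
       dbo.GetLargestNumberFromString(s.txt) AS LargestNumber
FROM (VALUES
        ('100% post-consumer recycled paper, 50% post-consumer recycled cover, 90% post-consumer recycled wire.'),
        ('Paper contains 30% post-consumer content.'),
        (''),
        (NULL)
     ) AS s(txt);
-- Expected: 100 and 30 for the two sentences, 0 for the empty string and for NULL
```

The NULL row returns 0 because the WHILE condition never evaluates to true when LEN returns NULL, so the initial value is returned unchanged.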

Pull the data into SQL as-is
Write a query to get a distinct list of options in that column
Add a new column to store the desired value
Write an update statement to populate the new column
As far as determining the largest size, I think you need to look at your data set first, but the update could be as simple as:
DECLARE @COUNTER INT=1000
WHILE EXISTS (SELECT * FROM <Table> WHERE NewColumn IS NULL) AND @COUNTER>=0
BEGIN
UPDATE <Table> SET NewColumn=@COUNTER WHERE <SearchColumn> LIKE '%' + CONVERT(VARCHAR(10),@COUNTER) + '%' AND NewColumn IS NULL
SET @COUNTER=@COUNTER-1
END

SQL Fiddle Demo
Generate the LEN(txt) possible RIGHT() fragments of txt. Trim each fragment at the first non-digit character. Test if the remainder is an int. Return the MAX().
SELECT
txt
,MAX(TRY_CONVERT(int,LEFT(RIGHT(txt,i),PATINDEX('%[^0-9]%',RIGHT(txt,i)+' ')-1)))
FROM MyTable
CROSS APPLY (
SELECT TOP(LEN(txt)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) i FROM master.dbo.spt_values a, master.dbo.spt_values b
) x
GROUP BY txt
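To see what the intermediate steps produce, here is a rough sketch that lists the fragments for a single string instead of aggregating them. The variable @txt and the aliases n, f and d are illustrative only, and it assumes master.dbo.spt_values (type 'P') is available as a number source, which limits it to strings of up to 2047 characters.

```sql
DECLARE @txt varchar(100) = 'Paper contains 30% post-consumer content.';

SELECT n.i, f.fragment, d.leading_digits
FROM (SELECT number AS i
      FROM master.dbo.spt_values
      WHERE type = 'P' AND number BETWEEN 1 AND LEN(@txt)) AS n
-- the last i characters of the string
CROSS APPLY (SELECT RIGHT(@txt, n.i) AS fragment) AS f
-- everything before the first non-digit; the appended space guarantees a match
CROSS APPLY (SELECT LEFT(f.fragment, PATINDEX('%[^0-9]%', f.fragment + ' ') - 1) AS leading_digits) AS d
WHERE d.leading_digits <> ''
ORDER BY n.i;
```

For this input only two fragments start with a digit, giving leading_digits of '0' and '30'; taking the MAX of those as integers yields 30.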

I ended up creating a function that handled it. Here is the code:
CREATE FUNCTION [dbo].[cal_GetMaxPercentFromString]
(
-- The parameter list is missing from the post; the name comes from the body, the type is assumed to match @temp
@string varchar(2000)
)
RETURNS float
AS
BEGIN
declare @Numbers Table(number float)
insert into @Numbers
Select 0
declare @temp as varchar(2000) = @string
declare @position int, @length int, @offset int
WHILE CHARINDEX('%', @temp) > 0
BEGIN
set @position = CHARINDEX('%', @temp)
set @offset = 1
set @length = -1
-- Walk left from the % sign until a non-digit is found
WHILE @position - @offset > 0 and @length < 0
BEGIN
if SUBSTRING(@temp, @position - @offset, 1) not LIKE '[0-9]'
set @length = @offset - 1
set @offset = @offset + 1
END
-- The digits ran all the way back to the start of the string (e.g. '100% ...')
if @length < 0
set @length = @position - 1
if @length > 0
BEGIN
insert into @Numbers
select CAST(SUBSTRING(@temp, @position - @length, @length) as float)
END
-- Drop the % just handled and continue with the next one
set @temp = SUBSTRING(@temp, 1, @position - 1) + SUBSTRING(@temp, @position + 1, LEN(@temp) - @position)
END
declare @return as float
select @return = MAX(number) from @Numbers
return @return
END

Related

SQL: how to check palindrome of a string without using reverse function?

I'm using SQL Server and I want to check whether the given string is a palindrome or not - but without using the reverse function.
There are multiple ways to achieve this. One of them is to check first and last character, slicing them if they're equal and continuing the process in a loop.
DECLARE @string NVARCHAR(100)
DECLARE @counter INT
SET @string = 'Your string'
SET @counter = LEN(@string)/2
WHILE (@counter > 0)
BEGIN
IF LEFT(@string,1) = RIGHT(@string,1)
BEGIN
SET @string = SUBSTRING(@string,2,len(@string)-2)
SET @counter = @counter - 1
END
ELSE
BEGIN
PRINT ('Given string is not a Palindrome')
BREAK
END
END
IF(@counter = 0)
PRINT ('Given string is a Palindrome')
A select without loops
DECLARE @Test VARCHAR(100)
SELECT @Test = 'qwerewq'
SELECT CASE WHEN LEFT(@Test, LEN(@Test)/2) =
(
SELECT '' + SUBSTRING(RIGHT(@Test, LEN(@Test)/2), number, 1)
FROM master.dbo.spt_values
WHERE type='P' AND number BETWEEN 1 AND LEN(@Test)/2
ORDER BY number DESC
FOR XML PATH('')
)
THEN 1
ELSE 0
END
Here's an example using LEFT and RIGHT. I use the @count variable to change position, then grab the character at that position from the left and from the right:
DECLARE @mystring varchar(100) = 'redivider'
DECLARE @count int = 1
-- <= so that the innermost pair is compared too (with < a string such as 'ab' passes unchecked)
WHILE (@count <= LEN(@mystring) / 2) AND @count <> 0
BEGIN
IF (RIGHT(LEFT(@mystring, @count), 1) <> LEFT(RIGHT(@mystring, @count), 1))
SET @count = 0
ELSE SET @count += 1
END
SELECT CASE WHEN @count = 0
THEN 'Not a Palindrome'
ELSE 'Palindrome'
END [Result]
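The same position-by-position comparison can also be sketched as a single set-based query. This is only an illustration: the variable name @s is made up, and it assumes master.dbo.spt_values (type 'P') as a number source, as one of the other answers does.

```sql
DECLARE @s varchar(100) = 'redivider';

-- A string is a palindrome when no position in the first half
-- differs from its mirror position counted from the end.
SELECT CASE WHEN EXISTS (
           SELECT 1
           FROM master.dbo.spt_values
           WHERE type = 'P'
             AND number BETWEEN 1 AND LEN(@s) / 2
             AND SUBSTRING(@s, number, 1) <> SUBSTRING(@s, LEN(@s) - number + 1, 1)
       )
       THEN 'Not a Palindrome'
       ELSE 'Palindrome'
       END AS [Result];
```

The comparison follows the collation in effect, so on a case-insensitive collation 'Level' counts as a palindrome, and LEN ignores trailing spaces.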

How to change case in string

My table has one column that contains strings like: "HRM_APPLICATION_DELAY_IN"
I want to perform the operations below on each row of this column:
convert to lower case
remove underscore "_"
change case (convert to upper case) of the character after the underscore like: "hrm_Application_Delay_In"
I need help with the conversion. Thanks in advance.
Here is a function to achieve it:
create function f_test
(
@a varchar(max)
)
returns varchar(max)
as
begin
set @a = lower(@a)
-- replace each '_x' pair with 'X' until no underscore is left
while @a LIKE '%\_%' ESCAPE '\'
begin
select @a = stuff(@a, v, 2, upper(substring(@a, v+1,1)))
from (select charindex('_', @a) v) a
end
return @a
end
Example:
select dbo.f_test('HRM_APPLICATION_DELAY_IN')
Result:
hrmApplicationDelayIn
To update your table here is an example how to write the syntax with the function:
UPDATE <yourtable>
SET <yourcolumn> = dbo.f_test(<yourcolumn>)
WHERE <yourcolumn> LIKE '%\_%' ESCAPE '\'
For a variable this is overkill, but I'm using this to demonstrate a pattern
declare @str varchar(100) = 'HRM_APPLICATION_DELAY_IN';
;with c(one,last,rest) as (
select cast(lower(left(@str,1)) as varchar(max)),
left(@str,1), stuff(lower(@str),1,1,'')
union all
select one+case when last='_'
then upper(left(rest,1))
else left(rest,1) end,
left(rest,1), stuff(rest,1,1,'')
from c
where rest > ''
)
select max(one)
from c;
That can be extended to a column in a table
-- Sample table
declare @tbl table (
id int identity not null primary key clustered,
str varchar(100)
);
insert @tbl values
('HRM_APPLICATION_DELAY_IN'),
('HRM_APPLICATION_DELAY_OUT'),
('_HRM_APPLICATION_DELAY_OUT'),
(''),
(null),
('abc<de_fg>hi');
-- the query
;with c(id,one,last,rest) as (
select id,cast(lower(left(str,1)) as varchar(max)),
left(str,1), stuff(lower(str),1,1,'')
from @tbl
union all
select id,one+case when last='_'
then upper(left(rest,1))
else left(rest,1) end,
left(rest,1), stuff(rest,1,1,'')
from c
where rest > ''
)
select id,max(one)
from c
group by id
option (maxrecursion 0);
-- result
ID COLUMN_1
1 hrm_Application_Delay_In
2 hrm_Application_Delay_Out
3 _Hrm_Application_Delay_Out
4
5 (null)
6 abc<de_Fg>hi
select
replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(replace(
replace(replace(replace(replace(replace(replace(lower('HRM_APPLICATION_DELAY_IN'),'_a','A'),'_b','B'),'_c','C'),'_d','D'),'_e','E'),'_f','F'),
'_g','G'),'_h','H'),'_i','I'),'_j','J'),'_k','K'),'_l','L'),
'_m','M'),'_n','N'),'_o','O'),'_p','P'),'_q','Q'),'_r','R'),
'_s','S'),'_t','T'),'_u','U'),'_v','V'),'_w','W'),'_x','X'),
'_y','Y'),'_z','Z'),'_','')
The steps below can solve the problem. As an example I use sys.tables; you can use any table.
declare @Ret varchar(8000), @RetVal varchar(8000), @i int, @count int = 1;
declare @c varchar(10), @Text varchar(8000), @PrevCase varchar, @ModPrefix varchar(10);
DECLARE @FileDataTable TABLE(TableName varchar(200))
INSERT INTO @FileDataTable
select name FROM sys.tables where object_name(object_id) not like 'sys%' order by name
SET @ModPrefix = 'Pur'
DECLARE crsTablesTruncIns CURSOR
FOR select TableName FROM @FileDataTable
OPEN crsTablesTruncIns
FETCH NEXT FROM crsTablesTruncIns INTO @Text
WHILE @@FETCH_STATUS = 0
BEGIN
SET @RetVal = '';
select @i=1, @Ret = '';
while (@i <= len(@Text))
begin
SET @c = substring(@Text,@i,1)
--SET @Ret = @Ret + case when @Reset=1 then UPPER(@c) else LOWER(@c)
IF(@PrevCase = '_' OR @i = 1)
SET @Ret = UPPER(@c)
ELSE
SET @Ret = LOWER(@c)
--@Reset = case when @c like '[a-zA-Z]' then 0 else 1 end,
if(@c like '[a-zA-Z]')
SET @RetVal = @RetVal + @Ret
if(@c = '_')
SET @PrevCase = '_'
else
SET @PrevCase = ''
SET @i = @i +1
end
SET @RetVal = @ModPrefix + @RetVal
print cast(@count as varchar) + ' ' + @RetVal
SET @count = @count + 1
-- Caution: this renames the table itself
EXEC sp_rename @Text , @RetVal
SET @RetVal = ''
FETCH NEXT FROM crsTablesTruncIns INTO @Text
END
CLOSE crsTablesTruncIns
DEALLOCATE crsTablesTruncIns
I'd like to show you my nice and simple solution. It uses a tally function to split the string by a delimiter, in our case the underscore. To understand tally functions, read this article.
This is what my tally function looks like:
CREATE FUNCTION [dbo].[tvf_xt_tally_split](
@String NVARCHAR(max)
,@Delim CHAR(1))
RETURNS TABLE
as
return
(
WITH Tally AS (SELECT top (select isnull(LEN(@String),100)) n = ROW_NUMBER() OVER(ORDER BY [name]) from master.dbo.syscolumns)
(
SELECT LTRIM(RTRIM(SUBSTRING(@Delim + @String + @Delim,N+1,CHARINDEX(@Delim,@Delim + @String + @Delim,N+1)-N-1))) Value, N as Ix
FROM Tally
WHERE N < LEN(@Delim + @String + @Delim)
AND SUBSTRING(@Delim + @String + @Delim,N,1) = @Delim
)
)
This function returns a table where each row represents the part of the string between two @Delim characters (in our case between underscores). The rest of the work is simple, just a combination of the LEFT, RIGHT, LEN, UPPER and LOWER functions.
declare @string varchar(max)
set @string = ' HRM_APPLICATION_DELAY_IN'
-- convert to lower case
set @string = LOWER(@string)
declare @output varchar(max)
-- build string
select @output = coalesce(@output + '_','') +
UPPER(left(Value,1)) + RIGHT(Value, LEN(Value) - 1)
from dbo.tvf_xt_tally_split(@string, '_')
-- lower first char
select left(lower(@output),1) + RIGHT(@output, LEN(@output) - 1)

Get a count of ranged and comma separated items in a string in SQL

In SQL SERVER 2008 R2, I need to get a count of items contained in a string that can have any of the following characteristics (controlled by the user and not the system):
each item separated by a comma
sequential items summarized by the first and last item separated by a dash
the non-incremental character attached to only the first item in a dash separated range
multiple characters representing the non-incremental portion of the designation
combination of the above
All of the following are possible:
R1,R2,R3,R4
R1-R4
R1-4
CP10-CP12
R1,R15-R19,RN5
If they were all comma separated, I could just count the commas +1, but that is actually less common than the other options.
A method to count the last option should work for all options. The result should be 7
My expected approach would be to:
separate the items without a dash but separated by a comma => get a count
isolate the dash separated items and remove the non-incremental character(s)
subtract the smaller number from the larger number and add 1
Add that number to the first number for a total count
I am totally stuck on even where to begin. Any ideas?
This can be cleaned up/optimized and is intentionally verbose but should get you started. Notably, the logic inside the last IF is almost identical to that of the WHILE and the block that gets the numeric value of the left/right elements is repeated four times.
declare @input varchar(max)
set @input = 'R1,R15-R19,RN5-RN6'
select @input
declare @elements table
(
Element varchar(10),
[Count] int
)
declare @element varchar(10)
declare @index int
declare @count int
declare @left varchar(10)
declare @right varchar(10)
declare @position int
while (len(@input) > 0 and charindex(',', @input) > 0)
begin
set @element = substring(@input, 0, charindex(',', @input))
if (charindex('-', @element) > 0)
begin
set @index = charindex('-', @element)
set @left = left(@element, @index - 1)
set @right = substring(@element, @index + 1, len(@element) - len(@left))
set @position = 0
while (isnumeric(substring(@left, @position, 1)) = 0)
begin
set @position = @position + 1
end
set @left = substring(@left, @position, len(@left))
set @position = 0
while (isnumeric(substring(@right, @position, 1)) = 0)
begin
set @position = @position + 1
end
set @right = substring(@right, @position, len(@right))
set @count = cast(@right as int) - cast(@left as int) + 1
end
else
begin
set @count = 1
end
insert into @elements select @element, @count
set @input = replace(@input, @element + ',', '')
end
if (len(@input) > 0)
begin
set @element = @input
if (charindex('-', @element) > 0)
begin
set @index = charindex('-', @element)
set @left = left(@element, @index - 1)
set @right = substring(@element, @index + 1, len(@element) - len(@left))
set @position = 0
while (isnumeric(substring(@left, @position, 1)) = 0)
begin
set @position = @position + 1
end
set @left = substring(@left, @position, len(@left))
set @position = 0
while (isnumeric(substring(@right, @position, 1)) = 0)
begin
set @position = @position + 1
end
set @right = substring(@right, @position, len(@right))
set @count = cast(@right as int) - cast(@left as int) + 1
end
else
begin
set @count = 1
end
insert into @elements select @element, @count
end
select * from @elements
select sum([Count]) from @elements
Outputs the following results:
R1,R15-R19,RN5-RN6
R1 1
R15-R19 5
RN5-RN6 2
8
You can use a trick to count the number of commas in a comma separated list:
select len(str) - len(replace(str, ',', ''))
For the complete solution, you need to do something more complicated. Long ago, I downloaded a function called split that takes a delimited string and returns the components as if they were a table. In fact, it looks like I picked this up from here . . . T-SQL: Opposite to string concatenation - how to split string into multiple records.
So, the idea is that you split the string and then parse the components to count. If there is no hyphen, count "1". If there is a hyphen, you need to parse the string to get the count.
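A minimal sketch of that split-then-count idea, without the split function (which is not shown in the post). It assumes master.dbo.spt_values (type 'P') as a number source, so inputs are limited to 2047 characters, and it assumes every range end finishes with its numeric part. The names @input, positions, items, bounds, range_start and range_end are made up for illustration.

```sql
DECLARE @input varchar(8000) = 'R1,R15-R19,RN5';

WITH positions AS (
    SELECT number AS n
    FROM master.dbo.spt_values
    WHERE type = 'P' AND number BETWEEN 1 AND LEN(@input)
),
items AS (
    -- an item starts at n when the character just before it is a comma
    -- (a leading comma is prepended so the first item qualifies too)
    SELECT SUBSTRING(@input, n, CHARINDEX(',', @input + ',', n) - n) AS item
    FROM positions
    WHERE SUBSTRING(',' + @input, n, 1) = ','
),
bounds AS (
    SELECT item,
           CHARINDEX('-', item) AS dash,
           LEFT(item, CASE WHEN CHARINDEX('-', item) > 0
                           THEN CHARINDEX('-', item) - 1 ELSE 0 END) AS range_start,
           SUBSTRING(item, CHARINDEX('-', item) + 1, LEN(item)) AS range_end
    FROM items
)
SELECT SUM(CASE WHEN dash = 0 THEN 1   -- single item
                -- range: strip the non-numeric prefix from each end, then count inclusively
                ELSE ABS(CAST(SUBSTRING(range_end, PATINDEX('%[0-9]%', range_end), LEN(range_end)) AS int)
                       - CAST(SUBSTRING(range_start, PATINDEX('%[0-9]%', range_start), LEN(range_start)) AS int)) + 1
           END) AS ItemCount
FROM bounds;
```

For 'R1,R15-R19,RN5' this should return 7 (1 + 5 + 1), the result the question asks for. Forms such as 'R1-4' work as well, because a range end with no prefix is used as-is.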

Storing phone nos with only numbers and with "x" for extension?

I have a test function that sanitizes phone numbers and allows only digits and the characters "x" or "X" to be stored. It does most of this, except that it allows multiple x's, which I don't want. Can anybody help me add that to the regular expression, and let me know if you spot potential issues?
CREATE Function [dbo].[RemoveAlphaCharacters](@Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
While PatIndex('%[^0-9,x,X]%', @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex('%[^0-9,x,X]%', @Temp), 1, '')
Return @Temp
End
The problem with PATINDEX here is that it can't really determine that the pattern should change after it hits a string for the first time. So maybe this approach will be simpler:
CREATE FUNCTION [dbo].[RemoveAlphaCharacters]
(
@Temp VARCHAR(1000)
)
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @i INT, @hitX BIT, @t VARCHAR(1000), @c CHAR(1);
SELECT @i = 1, @hitX = 0, @t = '';
WHILE @i <= LEN(@Temp)
BEGIN
SET @c = SUBSTRING(@Temp, @i, 1);
-- keep only the first x/X
IF LOWER(@c) = 'x' AND @hitX = 0
BEGIN
SET @t = @t + @c;
SET @hitX = 1;
END
IF @c LIKE '[0-9]'
BEGIN
SET @t = @t + @c;
END
SET @i = @i + 1;
END
RETURN(@t);
END
GO
SELECT dbo.RemoveAlphaCharacters('401-867-9092');
SELECT dbo.RemoveAlphaCharacters('401-867-9092x32');
SELECT dbo.RemoveAlphaCharacters('401-867-9092x32x54');
Results:
4018679092
4018679092x32
4018679092x3254
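On the "potential issues" part of the question: inside a bracket class the commas are literal characters, so the original pattern [^0-9,x,X] also lets commas through. A small check, assuming the intent was to keep digits and x/X only (the column aliases are just for illustration):

```sql
-- PATINDEX returns the position of the first character matched by the negated class, or 0 if none
SELECT PATINDEX('%[^0-9,x,X]%', '401,867') AS OriginalPattern,  -- 0: the comma is in the allowed set
       PATINDEX('%[^0-9xX]%',   '401,867') AS WithoutCommas;    -- 4: the comma is flagged for removal
```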

How to check upper case existence length in a string - Sql Query

How to check upper case existence length in a string using Sql Query?
For Eg:
1.KKart - from this string the result should be 2, because it has 2 upper case letters.
2.WPOaaa - from this string the result should be 3, because it has 3 upper case letters.
Thanks in advance
There is no built-in T-SQL function for that.
You can use a user-defined function like this one:
CREATE FUNCTION CountUpperCase
(
@input nvarchar(50)
)
RETURNS int
AS
BEGIN
declare @len int
declare @i int
declare @count int
declare @ascii int
set @len = len(@input)
set @i = 1
set @count = 0
while @i <= @len
begin
set @ascii = ascii(substring(@input, @i, 1))
-- ASCII codes 65 to 90 are 'A' to 'Z'
if @ascii >= 65 and @ascii <= 90
begin
set @count = @count +1
end
set @i = @i + 1
end
return @count
END
Usage (with the examples from your question):
select dbo.CountUpperCase('KKart') returns 2.
select dbo.CountUpperCase('WPOaaa') returns 3.
How about something like this :
SELECT len(replace(my_string_field,'abcdefghijklmnopqrstuvwxyz','')) as 'UpperLen'
FROM my_table
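The principle is simply to replace every lower-case character with nothing and count what remains. Note that REPLACE as written only removes the exact 26-letter sequence 'abcdefghijklmnopqrstuvwxyz', so in practice each letter needs its own REPLACE, evaluated under a case-sensitive collation.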
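A set-based count can also be sketched by testing each character on its own. This is only an illustration: the variable @s is made up, it assumes master.dbo.spt_values (type 'P') as a number source, and it forces a binary collation so that the range [A-Z] covers the upper-case ASCII letters only.

```sql
DECLARE @s varchar(50) = 'WPOaaa';

-- one row per character position; keep the positions holding an upper-case A-Z
SELECT COUNT(*) AS UpperLen
FROM master.dbo.spt_values
WHERE type = 'P'
  AND number BETWEEN 1 AND LEN(@s)
  AND SUBSTRING(@s, number, 1) COLLATE Latin1_General_BIN LIKE '[A-Z]';
-- returns 3 for 'WPOaaa'
```

Without the explicit collation, a case-insensitive database default would make [A-Z] match lower-case letters as well.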