How to compare if two strings contain the same words in T-SQL for SQL Server 2008?

How to compare if two strings contain the same words in T-SQL for SQL Server 2008? - sql

When I compare two strings in SQL Server, there are couple of simple ways with = or LIKE.
I want to redefine equality as:
If two strings contain the same words - no matter in what order - they are equal, otherwise they are not.
For example:
'my word' and 'word my' are equal
'my word' and 'aaamy word' are not
What's the best simple solution for this problem?

I don't think there is a simple solution for what you are trying to do in SQL Server. My first thought would be to create a CLR UDF that:
Accepts two strings
Breaks them into two arrays using the split function on " "
Compare the contents of the two arrays, returning true if they contain the same elements.
If this is a route you'd like to go, take a look at this article to get started on creating CLR UDFs.

Try this... The StringSorter function breaks strings on a space and then sorts all the words and puts the string back together in sorted word order.
CREATE FUNCTION dbo.StringSorter(#sep char(1), #s varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
DECLARE #ResultVar varchar(8000);
WITH sorter_cte AS (
SELECT CHARINDEX(#sep, #s) as pos, 0 as lastPos
UNION ALL
SELECT CHARINDEX(#sep, #s, pos + 1), pos
FROM sorter_cte
WHERE pos > 0
)
, step2_cte AS (
SELECT SUBSTRING(#s, lastPos + 1,
case when pos = 0 then 80000
else pos - lastPos -1 end) as chunk
FROM sorter_cte
)
SELECT #ResultVar = (select ' ' + chunk
from step2_cte
order by chunk
FOR XML PATH(''));
RETURN #ResultVar;
END
GO
Here is a test case just trying out the function:
SELECT dbo.StringSorter(' ', 'the quick brown dog jumped over the lazy fox');
which produced these results:
brown dog fox jumped lazy over quick the the
Then to run it from a select statement using your strings
SELECT case when dbo.StringSorter(' ', 'my word') =
dbo.StringSorter(' ', 'word my')
then 'Equal' else 'Not Equal' end as ResultCheck
SELECT case when dbo.StringSorter(' ', 'my word') =
dbo.StringSorter(' ', 'aaamy word')
then 'Equal' else 'Not Equal' end as ResultCheck
The first one shows that they are equal, and the second does not.
This should do exactly what you are looking for with a simple function utilizing a recursive CTE to sort your string.
Enjoy!

There is no simple way to do this. You are advised to write a function or stored procedure that does he processing involved with this requirement.
Your function can use other functions that split the stings into parts, sort by words etc.
Here's how you can split the strings:
T-SQL: Opposite to string concatenation - how to split string into multiple records

Scenario is as follows. You would want to use a TVF to split the first and the second strings on space and then full join the resulting two tables on values and if you have nulls on left or right you've got inequality otherwise they are equal.

A VERY simple way to do this...
JC65100
ALTER FUNCTION [dbo].[ITS_GetDifCharCount]
(
#str1 VARCHAR(MAX)
,#str2 VARCHAR(MAX)
)
RETURNS INT
AS
BEGIN
DECLARE #result INT
SELECT #result = COUNT(*)
FROM dbo.ITS_CompareStrs(#str1,#str2 )
RETURN #result
END
ALTER FUNCTION [dbo].[ITS_CompareStrs]
(
#str1 VARCHAR(MAX)
,#str2 VARCHAR(MAX)
)
RETURNS
#Result TABLE (ind INT, c1 char(1), c2 char(1))
AS
BEGIN
DECLARE #i AS INT
,#c1 CHAR(1)
,#c2 CHAR(1)
SET #i = 1
WHILE LEN (#str1) > #i-1 OR LEN (#str2) > #i-1
BEGIN
IF LEN (#str1) > #i-1
SET #c1 = substring(#str1, #i, 1)
IF LEN (#str2) > #i-1
SET #c2 = substring(#str2, #i, 1)
INSERT INTO #Result([ind],c1,c2)
SELECT #i,#c1,#c2
SELECT #i=#i+1
,#c1=NULL
,#c2=NULL
END
DELETE FROM #Result
WHERE c1=c2
RETURN
END

You can add a precomputed column in the base table that is evaluated in INSERT/UPDATE trigger (or UDF default) that splits, sorts and then concatenates words from the original column.
Then use = to compare these precomputed columns.

There is library called http://www.sqlsharp.com/ that contains a whole range of useful string/math functions.
It has a function called String_CompareSplitValues which does precisely what you want.
I am not sure if it is in the community version or the paid for version.

declare #s1 varchar(50) = 'my word'
declare #s2 varchar(50) = 'word my'
declare #t1 table (word varchar(50))
while len(#s1)>0
begin
if (CHARINDEX(' ', #s1)>0)
begin
insert into #t1 values(ltrim(rtrim(LEFT(#s1, charindex(' ', #s1)))))
set #s1 = LTRIM(rtrim(right(#s1, len(#s1)-charindex(' ', #s1))))
end
else
begin
insert into #t1 values (#s1)
set #s1=''
end
end
declare #t2 table (word varchar(50))
while len(#s2)>0
begin
if (CHARINDEX(' ', #s2)>0)
begin
insert into #t2 values(ltrim(rtrim(LEFT(#s2, charindex(' ', #s2)))))
set #s2 = LTRIM(rtrim(right(#s2, len(#s2)-charindex(' ', #s2))))
end
else
begin
insert into #t2 values (#s2)
set #s2=''
end
end
select case when exists(SELECT * FROM #t1 EXCEPT SELECT * FROM #t2) then 'are not' else 'are equal' end

Related

How can I count the number of words in a column in SQL Server

Is there a query that will return the total number of words in a column? I found some code that can allow me to count the words in a string, but cannot apply it to the entire column.
I first create the function found from http://www.sql-server-helper.com/functions/count-words.aspx:
CREATE FUNCTION [dbo].[WordCount] ( #InputString VARCHAR(4000) )
RETURNS INT
AS
BEGIN
DECLARE #Index INT
DECLARE #Char CHAR(1)
DECLARE #PrevChar CHAR(1)
DECLARE #WordCount INT
SET #Index = 1
SET #WordCount = 0
WHILE #Index <= LEN(#InputString)
BEGIN
SET #Char = SUBSTRING(#InputString, #Index, 1)
SET #PrevChar = CASE WHEN #Index = 1 THEN ' '
ELSE SUBSTRING(#InputString, #Index - 1, 1)
END
IF #PrevChar = ' ' AND #Char != ' '
SET #WordCount = #WordCount + 1
SET #Index = #Index + 1
END
RETURN #WordCount
END
GO
Next, test it on a string:
DECLARE #String VARCHAR(4000)
SET #String = 'Health Insurance is an insurance against expenses incurred through illness of the insured.'
SELECT [dbo].[WordCount] ( #String )
In this example, this returns 13. However, I am trying to get the totals of an entire column. For example, if I had a column with 2 rows and each row contained this string in it, I would like it to return 26 to reflect the total words in the column rather than an individual string.

You could sum this function call:
SELECT SUM([dbo].[WordCount]([my_column]))
FROM [my_table]

Why you are using WHILE loop, since you can just count the words as
WITH TBL AS
(
SELECT 'One' Str
UNION
SELECT 'One Two'
UNION
SELECT 'One Two Three'
UNION
SELECT 'One Two Three Four'
)
SELECT SUM((LEN(Str) - LEN(REPLACE(Str, ' ', ''))) + 1)
FROM TBL;
--WHERE Str <> '' AND Str IS NOT NULL;
This way you will count all the words in that column.

Another word count using SQL solution is provided at referred tutorial where instead of WHILE loop REPLACE SQL function is used for determining the count.
The problem or missing part with these SQL solutions that they do not consider patterns like web addresses. Since a URL has ".com" that will cause plus 1 in the total word count.
All scalar functions can be applied to a table's specific columns' values as
SELECT dbo.scalarFunction(columnName) FROM tableName

STRING_SPLIT was introduced in SQL 2016 and I use it for my counting as following (TRIM is to remove any spaces before or after the string)
SELECT value AS number_of_words
FROM YourTable a
CROSS APPLY STRING_SPLIT(TRIM(a.YourStringColumnWithWordsToCount),' ')
It returns you all the words and now you can either SUM on the value or SUM DISTINCT if you need to eliminate duplicate words (e.g. if you send words for translation).

Is there a LastIndexOf in SQL Server?

I am trying to parse out a value from a string that involves getting the last index of a string. Currently, I am doing a horrible hack that involves reversing a string:
SELECT REVERSE(SUBSTRING(REVERSE(DB_NAME()), 1,
CHARINDEX('_', REVERSE(DB_NAME()), 1) - 1))
To me this code is nearly unreadable. I just upgraded to SQL Server 2016 and I hoping there is a better way.
Is there?

If you want everything after the last _, then use:
select right(db_name(), charindex('_', reverse(db_name()) + '_') - 1)
If you want everything before, then use left():
select left(db_name(), len(db_name()) - charindex('_', reverse(db_name()) + '_'))

Wrote 2 functions, 1 to return LastIndexOf for the selected character.
CREATE FUNCTION dbo.LastIndexOf(#source nvarchar(80), #pattern char)
RETURNS int
BEGIN
RETURN (LEN(#source)) - CHARINDEX(#pattern, REVERSE(#source))
END;
GO
and 1 to return a string before this LastIndexOf. Maybe it will be useful to someone.
CREATE FUNCTION dbo.StringBeforeLastIndex(#source nvarchar(80), #pattern char)
RETURNS nvarchar(80)
BEGIN
DECLARE #lastIndex int
SET #lastIndex = (LEN(#source)) - CHARINDEX(#pattern, REVERSE(#source))
RETURN SUBSTRING(#source, 0, #lastindex + 1)
-- +1 because index starts at 0, but length at 1, so to get up to 11th index, we need LENGTH 11+1=12
END;
GO

No, SQL server doesnt have LastIndexOf.
This are the available string functions
But you can always can create your own function
CREATE FUNCTION dbo.LastIndexOf(#source text, #pattern char)
RETURNS
AS
BEGIN
DECLARE #ret text;
SELECT into #ret
REVERSE(SUBSTRING(REVERSE(#source), 1,
CHARINDEX(#pattern, REVERSE(#source), 1) - 1))
RETURN #ret;
END;
GO

Once you have one of the split strings from here,you can do it in a set based way like this..
declare #string varchar(max)
set #string='C:\Program Files\Microsoft SQL Server\MSSQL\DATA\AdventureWorks_Data.mdf'
;with cte
as
(select *,row_number() over (order by (select null)) as rownum
from [dbo].[SplitStrings_Numbers](#string,'\')
)
select top 1 item from cte order by rownum desc
**Output:**
AdventureWorks_Data.mdf

CREATE FUNCTION dbo.LastIndexOf(#text NTEXT, #delimiter NTEXT)
RETURNS INT
AS
BEGIN
IF (#text IS NULL) RETURN NULL;
IF (#delimiter IS NULL) RETURN NULL;
DECLARE #Text2 AS NVARCHAR(MAX) = #text;
DECLARE #Delimiter2 AS NVARCHAR(MAX) = #delimiter;
DECLARE #Index AS INT = CHARINDEX(REVERSE(#Delimiter2), REVERSE(#Text2));
IF (#Index < 1) RETURN 0;
DECLARE #ContentLength AS INT = (LEN('|' + #Text2 + '|') - 2);
DECLARE #DelimiterLength AS INT = (LEN('|' + #Delimiter2 + '|') - 2);
DECLARE #Result AS INT = (#ContentLength - #Index - #DelimiterLength + 2);
RETURN #Result;
END
Allows for multi-character delimiters like ", " (comma space).
Returns 0 if the delimiter is not found.
Takes a NTEXT for comfort reasons as NVARCHAR(MAX)s are implicitely cast into NTEXT but not vice-versa.
Handles delimiters with leading or tailing space correctly!

Try:
select LEN('tran van abc') + 1 - CHARINDEX(' ', REVERSE('tran van abc'))
So, the last index of ' ' is : 9

I came across this thread while searching for a solution to my similar problem which had the exact same requirement but was for a different kind of database that was lacking the REVERSE function.
In my case this was for a OpenEdge (Progress) database, which has a slightly different syntax. This made the INSTR function available to me that most Oracle typed databases offer.
So I came up with the following code:
SELECT
INSTR(foo.filepath, '/',1, LENGTH(foo.filepath) - LENGTH( REPLACE( foo.filepath, '/', ''))) AS IndexOfLastSlash
FROM foo
However, for my specific situation (being the OpenEdge (Progress) database) this did not result into the desired behaviour because replacing the character with an empty char gave the same length as the original string. This doesn't make much sense to me but I was able to bypass the problem with the code below:
SELECT
INSTR(foo.filepath, '/',1, LENGTH( REPLACE( foo.filepath, '/', 'XX')) - LENGTH(foo.filepath)) AS IndexOfLastSlash
FROM foo
Now I understand that this code won't solve the problem for T-SQL because there is no alternative to the INSTR function that offers the Occurence property.
Just to be thorough I'll add the code needed to create this scalar function so it can be used the same way like I did in the above examples. And will do exactly what the OP wanted, serve as a LastIndexOf method for SQL Server.
-- Drop the function if it already exists
IF OBJECT_ID('INSTR', 'FN') IS NOT NULL
DROP FUNCTION INSTR
GO
-- User-defined function to implement Oracle INSTR in SQL Server
CREATE FUNCTION INSTR (#str VARCHAR(8000), #substr VARCHAR(255), #start INT, #occurrence INT)
RETURNS INT
AS
BEGIN
DECLARE #found INT = #occurrence,
#pos INT = #start;
WHILE 1=1
BEGIN
-- Find the next occurrence
SET #pos = CHARINDEX(#substr, #str, #pos);
-- Nothing found
IF #pos IS NULL OR #pos = 0
RETURN #pos;
-- The required occurrence found
IF #found = 1
BREAK;
-- Prepare to find another one occurrence
SET #found = #found - 1;
SET #pos = #pos + 1;
END
RETURN #pos;
END
GO
To avoid the obvious, when the REVERSE function is available you do not need to create this scalar function and you can just get the required result like this:
SELECT
LEN(foo.filepath) - CHARINDEX('\', REVERSE(foo.filepath))+1 AS LastIndexOfSlash
FROM foo

Try this.
drop table #temp
declare #brokername1 nvarchar(max)='indiabullssecurities,canmoney,indianivesh,acumencapitalmarket,sharekhan,edelweisscapital';
Create Table #temp
(
ID int identity(1,1) not null,
value varchar(100) not null
)
INSERT INTO #temp(value) SELECT value from STRING_SPLIT(#brokername1,',')
declare #id int;
set #id=(select max(id) from #temp)
--print #id
declare #results varchar(500)
select #results = coalesce(#results + ',', '') + convert(varchar(12),value)
from #temp where id<#id
order by id
print #results

Split words with a capital letter in sql

Does anyone know how to split words starting with capital letters from a string?
Example:
DECLARE #var1 varchar(100) = 'OneTwoThreeFour'
DECLARE #var2 varchar(100) = 'OneTwoThreeFourFive'
DECLARE #var3 varchar(100) = 'One'
SELECT #var1 as Col1, <?> as Col2
SELECT #var2 as Col1, <?> as Col2
SELECT #var3 as Col1, <?> as Col2
expected result:
Col1 Col2
OneTwoThreeFour One Two three Four
OneTwoThreeFourFive One Two Three Four Five
One One
If this is not possible (or if too long) an scalar function would be okay as well.

Here is a function I created that is similar to the "removing non-alphabetic characters". How to strip all non-alphabetic characters from string in SQL Server?
This one uses a case sensitive collation which actively seeks out a non-space/capital letter combination and then uses the STUFF function to insert the space. This IS a scalar UDF, so some folks will immediately say that it will be slower than other solutions. To that notion, I say, please test it. This function does not use any table data and only loops as many times as necessary, so it will likely give you very good performance.
Create Function dbo.Split_On_Upper_Case(#Temp VarChar(1000))
Returns VarChar(1000)
AS
Begin
Declare #KeepValues as varchar(50)
Set #KeepValues = '%[^ ][A-Z]%'
While PatIndex(#KeepValues collate Latin1_General_Bin, #Temp) > 0
Set #Temp = Stuff(#Temp, PatIndex(#KeepValues collate Latin1_General_Bin, #Temp) + 1, 0, ' ')
Return #Temp
End
Call it like this:
Select dbo.Split_On_Upper_Case('OneTwoThreeFour')
Select dbo.Split_On_Upper_Case('OneTwoThreeFour')
Select dbo.Split_On_Upper_Case('One')
Select dbo.Split_On_Upper_Case('OneTwoThree')
Select dbo.Split_On_Upper_Case('stackOverFlow')
Select dbo.Split_On_Upper_Case('StackOverFlow')

Here is a function I have just created.
FUNCTION
CREATE FUNCTION dbo.Split_On_Upper_Case
(
#String VARCHAR(4000)
)
RETURNS VARCHAR(4000)
AS
BEGIN
DECLARE #Char CHAR(1);
DECLARE #i INT = 0;
DECLARE #OutString VARCHAR(4000) = '';
WHILE (#i <= LEN(#String))
BEGIN
SELECT #Char = SUBSTRING(#String, #i,1)
IF (#Char = UPPER(#Char) Collate Latin1_General_CS_AI)
SET #OutString = #OutString + ' ' + #Char;
ELSE
SET #OutString = #OutString + #Char;
SET #i += 1;
END
SET #OutString = LTRIM(#OutString);
RETURN #OutString;
END
Test Data
DECLARE #TABLE TABLE (Strings VARCHAR(1000))
INSERT INTO #TABLE
VALUES ('OneTwoThree') ,
('FourFiveSix') ,
('SevenEightNine')
Query
SELECT dbo.Split_On_Upper_Case(Strings) AS Vals
FROM #TABLE
Result Set
╔══════════════════╗
║ Vals ║
╠══════════════════╣
║ One Two Three ║
║ Four Five Six ║
║ Seven Eight Nine ║
╚══════════════════╝

If a single query is needed 26 REPLACE can be used to check every upper case letter like
SELECT #var1 col1, REPLACE(
REPLACE(
REPLACE(
...
REPLACE(#var1, 'A', ' A')
, ...
, 'X', ' X')
, 'Y', ' Y')
, 'Z', ' Z') col2
Not the most beautiful thing but it'll work.
EDIT
Just to add another function to do the same thing in a different way of the other answers
CREATE FUNCTION splitCapital (#param Varchar(MAX))
RETURNS Varchar(MAX)
BEGIN
Declare #ret Varchar(MAX) = '';
declare #len int = len(#param);
WITH Base10(N) AS (
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
UNION ALL SELECT 8 UNION ALL SELECT 9
), Chars(N) As (
Select TOP(#len)
nthChar
= substring(#param, u.N + t.N*10 + h.N*100 + th.N*1000 + 1, 1)
Collate Latin1_General_CS_AI
FROM Base10 u
CROSS JOIN Base10 t
CROSS JOIN Base10 h
CROSS JOIN Base10 th
WHERE u.N + t.N*10 + h.N*100 + th.N*1000 < #len
ORDER BY u.N + t.N*10 + h.N*100 + th.N*1000
)
SELECT #ret += Case nthChar
When UPPER(nthChar) Then ' '
Else ''
End + nthChar
FROM Chars
RETURN #ret;
END
This one uses the possibility of TSQL to concatenate string variable, I had to use the TOP N trick to force the Chars CTE rows in the right order

Build a Numbers table. There are some excellent posts on SO to show you how to do this. Populate it with values up the maximum length of your input string. Select the values from 1 through the actual length of the current input string. Cross join this list of numbers to the input string. Use the result to SUBSTRING() each character. Then you can either compare the resulting list of one-charachter values to a pre-populated table-valued variable or convert each character to an integer using ASCII() and choose only those between 65 ('A') and 90 ('Z'). At this point you have a list which is the position of each upper-case character in your input string. UNION the maximum length of your input string onto the end of this list. You'll see why in just a second. Now you can SUBSTRING() your input variable, starting at the Number given by row N and taking a length of (the Number given by row N+1) - (The number given by row N). This is why you have to UNION the extra Number on the end. Finally concatenate all these substring together, space-separated, using the algorithm of your choice.
Sorry, don't have an instance in front of me to try out code. Sounds like a fun task. I think doing it with nested SELECT statements will get convoluted and un-maintainable; better to lay it out as CTEs, IMHO.

I know that there are already some good answers out there, but if you wanted to avoid creating a function, you could also use a recursive CTE to accomplish this. It's certainly not a clean way of doing this, but it works.
DECLARE
#camelcase nvarchar(4000) = 'ThisIsCamelCased'
;
WITH
split
AS
(
SELECT
[iteration] = 0
,[string] = #camelcase
UNION ALL
SELECT
[iteration] = split.[iteration] + 1
,[string] = STUFF(split.[string], pattern.[index] + 1, 0, ' ')
FROM
split
CROSS APPLY
( SELECT [index] = PATINDEX(N'%[^ ][A-Z]%' COLLATE Latin1_General_Bin, split.[string]) )
pattern
WHERE
pattern.[index] > 0
)
SELECT TOP (1)
[spaced] = split.[string]
FROM
split
ORDER BY
split.[iteration] DESC
;
As I said, this isn't a pretty way to write a query, but I use things like this when I'm just writing up some ad-hoc queries where I would not want to add new artifacts to the database. You could also use this to create your function as an inline table valued function, which is always a tad nicer.

Please Try This:
declare #t nvarchar (100) ='IamTheTestString'
declare #len int
declare #Counter int =0
declare #Final nvarchar (100) =''
set #len =len( #t)
while (#Counter <= #len)
begin
set #Final= #Final + Case when ascii(substring (#t,#Counter,1))>=65 and
ascii(substring (#t,#Counter,1))<=90 then ' '+substring (#t,#Counter,1) else
substring (#t,#Counter,1) end
set #Counter=#Counter+1
end
print ltrim(#Final)

Query to get only numbers from a string

I have data like this:
string 1: 003Preliminary Examination Plan
string 2: Coordination005
string 3: Balance1000sheet
The output I expect is
string 1: 003
string 2: 005
string 3: 1000
And I want to implement it in SQL.

First create this UDF
CREATE FUNCTION dbo.udf_GetNumeric
(
#strAlphaNumeric VARCHAR(256)
)
RETURNS VARCHAR(256)
AS
BEGIN
DECLARE #intAlpha INT
SET #intAlpha = PATINDEX('%[^0-9]%', #strAlphaNumeric)
BEGIN
WHILE #intAlpha > 0
BEGIN
SET #strAlphaNumeric = STUFF(#strAlphaNumeric, #intAlpha, 1, '' )
SET #intAlpha = PATINDEX('%[^0-9]%', #strAlphaNumeric )
END
END
RETURN ISNULL(#strAlphaNumeric,0)
END
GO
Now use the function as
SELECT dbo.udf_GetNumeric(column_name)
from table_name
SQL FIDDLE
I hope this solved your problem.
Reference

Try this one -
Query:
DECLARE #temp TABLE
(
string NVARCHAR(50)
)
INSERT INTO #temp (string)
VALUES
('003Preliminary Examination Plan'),
('Coordination005'),
('Balance1000sheet')
SELECT LEFT(subsrt, PATINDEX('%[^0-9]%', subsrt + 't') - 1)
FROM (
SELECT subsrt = SUBSTRING(string, pos, LEN(string))
FROM (
SELECT string, pos = PATINDEX('%[0-9]%', string)
FROM #temp
) d
) t
Output:
----------
003
005
1000

Query:
DECLARE #temp TABLE
(
string NVARCHAR(50)
)
INSERT INTO #temp (string)
VALUES
('003Preliminary Examination Plan'),
('Coordination005'),
('Balance1000sheet')
SELECT SUBSTRING(string, PATINDEX('%[0-9]%', string), PATINDEX('%[0-9][^0-9]%', string + 't') - PATINDEX('%[0-9]%',
string) + 1) AS Number
FROM #temp

Please try:
declare #var nvarchar(max)='Balance1000sheet'
SELECT LEFT(Val,PATINDEX('%[^0-9]%', Val+'a')-1) from(
SELECT SUBSTRING(#var, PATINDEX('%[0-9]%', #var), LEN(#var)) Val
)x

Getting only numbers from a string can be done in a one-liner.
Try this :
SUBSTRING('your-string-here', PATINDEX('%[0-9]%', 'your-string-here'), LEN('your-string-here'))
NB: Only works for the first int in the string, ex: abc123vfg34 returns 123.

I found this approach works about 3x faster than the top voted answer. Create the following function, dbo.GetNumbers:
CREATE FUNCTION dbo.GetNumbers(#String VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN;
WITH
Numbers
AS (
--Step 1.
--Get a column of numbers to represent
--every character position in the #String.
SELECT 1 AS Number
UNION ALL
SELECT Number + 1
FROM Numbers
WHERE Number < LEN(#String)
)
,Characters
AS (
SELECT Character
FROM Numbers
CROSS APPLY (
--Step 2.
--Use the column of numbers generated above
--to tell substring which character to extract.
SELECT SUBSTRING(#String, Number, 1) AS Character
) AS c
)
--Step 3.
--Pattern match to return only numbers from the CTE
--and use STRING_AGG to rebuild it into a single string.
SELECT #String = STRING_AGG(Character,'')
FROM Characters
WHERE Character LIKE '[0-9]'
--allows going past the default maximum of 100 loops in the CTE
OPTION (MAXRECURSION 8000)
RETURN #String
END
GO
Testing
Testing for purpose:
SELECT dbo.GetNumbers(InputString) AS Numbers
FROM ( VALUES
('003Preliminary Examination Plan') --output: 003
,('Coordination005') --output: 005
,('Balance1000sheet') --output: 1000
,('(111) 222-3333') --output: 1112223333
,('1.38hello#f00.b4r#\-6') --output: 1380046
) testData(InputString)
Testing for performance:
Start off setting up the test data...
--Add table to hold test data
CREATE TABLE dbo.NumTest (String VARCHAR(8000))
--Make an 8000 character string with mix of numbers and letters
DECLARE #Num VARCHAR(8000) = REPLICATE('12tf56se',800)
--Add this to the test table 500 times
DECLARE #n INT = 0
WHILE #n < 500
BEGIN
INSERT INTO dbo.NumTest VALUES (#Num)
SET #n = #n +1
END
Now testing the dbo.GetNumbers function:
SELECT dbo.GetNumbers(NumTest.String) AS Numbers
FROM dbo.NumTest -- Time to complete: 1 min 7s
Then testing the UDF from the top voted answer on the same data.
SELECT dbo.udf_GetNumeric(NumTest.String)
FROM dbo.NumTest -- Time to complete: 3 mins 12s
Inspiration for dbo.GetNumbers
Decimals
If you need it to handle decimals, you can use either of the following approaches, I found no noticeable performance differences between them.
change '[0-9]' to '[0-9.]'
change Character LIKE '[0-9]' to ISNUMERIC(Character) = 1 (SQL treats a single decimal point as "numeric")
Bonus
You can easily adapt this to differing requirements by swapping out WHERE Character LIKE '[0-9]' with the following options:
WHERE Letter LIKE '[a-zA-Z]' --Get only letters
WHERE Letter LIKE '[0-9a-zA-Z]' --Remove non-alphanumeric
WHERE Letter LIKE '[^0-9a-zA-Z]' --Get only non-alphanumeric

With the previous queries I get these results:
'AAAA1234BBBB3333' >>>> Output: 1234
'-çã+0!\aº1234' >>>> Output: 0
The code below returns All numeric chars:
1st output: 12343333
2nd output: 01234
declare #StringAlphaNum varchar(255)
declare #Character varchar
declare #SizeStringAlfaNumerica int
declare #CountCharacter int
set #StringAlphaNum = 'AAAA1234BBBB3333'
set #SizeStringAlfaNumerica = len(#StringAlphaNum)
set #CountCharacter = 1
while isnumeric(#StringAlphaNum) = 0
begin
while #CountCharacter < #SizeStringAlfaNumerica
begin
if substring(#StringAlphaNum,#CountCharacter,1) not like '[0-9]%'
begin
set #Character = substring(#StringAlphaNum,#CountCharacter,1)
set #StringAlphaNum = replace(#StringAlphaNum, #Character, '')
end
set #CountCharacter = #CountCharacter + 1
end
set #CountCharacter = 0
end
select #StringAlphaNum

declare #puvodni nvarchar(20)
set #puvodni = N'abc1d8e8ttr987avc'
WHILE PATINDEX('%[^0-9]%', #puvodni) > 0 SET #puvodni = REPLACE(#puvodni, SUBSTRING(#puvodni, PATINDEX('%[^0-9]%', #puvodni), 1), '' )
SELECT #puvodni

A solution for SQL Server 2017 and later, using TRANSLATE:
DECLARE #T table (string varchar(50) NOT NULL);
INSERT #T
(string)
VALUES
('003Preliminary Examination Plan'),
('Coordination005'),
('Balance1000sheet');
SELECT
result =
REPLACE(
TRANSLATE(
T.string COLLATE Latin1_General_CI_AI,
'abcdefghijklmnopqrstuvwxyz',
SPACE(26)),
SPACE(1),
SPACE(0))
FROM #T AS T;
Output:
result
003
005
1000
The code works by:
Replacing characters a-z (ignoring case & accents) with a space
Replacing spaces with an empty string.
The string supplied to TRANSLATE can be expanded to include additional characters.

I did not have rights to create functions but had text like
["blahblah012345679"]
And needed to extract the numbers out of the middle
Note this assumes the numbers are grouped together and not at the start and end of the string.
select substring(column_name,patindex('%[0-9]%', column_name),patindex('%[0-9][^0-9]%', column_name)-patindex('%[0-9]%', column_name)+1)
from table name

Although this is an old thread its the first in google search, I came up with a different answer than what came before. This will allow you to pass your criteria for what to keep within a string, whatever that criteria might be. You can put it in a function to call over and over again if you want.
declare #String VARCHAR(MAX) = '-123. a 456-78(90)'
declare #MatchExpression VARCHAR(255) = '%[0-9]%'
declare #return varchar(max)
WHILE PatIndex(#MatchExpression, #String) > 0
begin
set #return = CONCAT(#return, SUBSTRING(#string,patindex(#matchexpression, #string),1))
SET #String = Stuff(#String, PatIndex(#MatchExpression, #String), 1, '')
end
select (#return)

This UDF will work for all types of strings:
CREATE FUNCTION udf_getNumbersFromString (#string varchar(max))
RETURNS varchar(max)
AS
BEGIN
WHILE #String like '%[^0-9]%'
SET #String = REPLACE(#String, SUBSTRING(#String, PATINDEX('%[^0-9]%', #String), 1), '')
RETURN #String
END

Just a little modification to #Epsicron 's answer
SELECT SUBSTRING(string, PATINDEX('%[0-9]%', string), PATINDEX('%[0-9][^0-9]%', string + 't') - PATINDEX('%[0-9]%',
string) + 1) AS Number
FROM (values ('003Preliminary Examination Plan'),
('Coordination005'),
('Balance1000sheet')) as a(string)
no need for a temporary variable

Firstly find out the number's starting length then reverse the string to find out the first position again(which will give you end position of number from the end). Now if you deduct 1 from both number and deduct it from string whole length you'll get only number length. Now get the number using SUBSTRING
declare #fieldName nvarchar(100)='AAAA1221.121BBBB'
declare #lenSt int=(select PATINDEX('%[0-9]%', #fieldName)-1)
declare #lenEnd int=(select PATINDEX('%[0-9]%', REVERSE(#fieldName))-1)
select SUBSTRING(#fieldName, PATINDEX('%[0-9]%', #fieldName), (LEN(#fieldName) - #lenSt -#lenEnd))

T-SQL function to read all the integers from text and return the one at the indicated index, starting from left or right, also using a starting search term (optional):
create or alter function dbo.udf_number_from_text(
#text nvarchar(max),
#search_term nvarchar(1000) = N'',
#number_position tinyint = 1,
#rtl bit = 0
) returns int
as
begin
declare #result int = 0;
declare #search_term_index int = 0;
if #text is null or len(#text) = 0 goto exit_label;
set #text = trim(#text);
if len(#text) = len(#search_term) goto exit_label;
if len(#search_term) > 0
begin
set #search_term_index = charindex(#search_term, #text);
if #search_term_index = 0 goto exit_label;
end;
if #search_term_index > 0
if #rtl = 0
set #text = trim(right(#text, len(#text) - #search_term_index - len(#search_term) + 1));
else
set #text = trim(left(#text, #search_term_index - 1));
if len(#text) = 0 goto exit_label;
declare #patt_number nvarchar(10) = '%[0-9]%';
declare #patt_not_number nvarchar(10) = '%[^0-9]%';
declare #number_start int = 1;
declare #number_end int;
declare #found_numbers table (id int identity(1,1), val int);
while #number_start > 0
begin
set #number_start = patindex(#patt_number, #text);
if #number_start > 0
begin
if #number_start = len(#text)
begin
insert into #found_numbers(val)
select cast(substring(#text, #number_start, 1) as int);
break;
end;
else
begin
set #text = right(#text, len(#text) - #number_start + 1);
set #number_end = patindex(#patt_not_number, #text);
if #number_end = 0
begin
insert into #found_numbers(val)
select cast(#text as int);
break;
end;
else
begin
insert into #found_numbers(val)
select cast(left(#text, #number_end - 1) as int);
if #number_end = len(#text)
break;
else
begin
set #text = trim(right(#text, len(#text) - #number_end));
if len(#text) = 0 break;
end;
end;
end;
end;
end;
if #rtl = 0
select #result = coalesce(a.val, 0)
from (select row_number() over (order by m.id asc) as c_row, m.val
from #found_numbers as m) as a
where a.c_row = #number_position;
else
select #result = coalesce(a.val, 0)
from (select row_number() over (order by m.id desc) as c_row, m.val
from #found_numbers as m) as a
where a.c_row = #number_position;
exit_label:
return #result;
end;
Example:
select dbo.udf_number_from text(N'Text text 10 text, 25 term', N'term',2,1);
returns 10;

This is one of the simplest and easiest one. This will work on the entire String for multiple occurences as well.
CREATE FUNCTION dbo.fn_GetNumbers(#strInput NVARCHAR(500))
RETURNS NVARCHAR(500)
AS
BEGIN
DECLARE #strOut NVARCHAR(500) = '', #intCounter INT = 1
WHILE #intCounter <= LEN(#strInput)
BEGIN
SELECT #strOut = #strOut + CASE WHEN SUBSTRING(#strInput, #intCounter, 1) LIKE '[0-9]' THEN SUBSTRING(#strInput, #intCounter, 1) ELSE '' END
SET #intCounter = #intCounter + 1
END
RETURN #strOut
END

Following a solution using a single common table expression (CTE).
DECLARE #s AS TABLE (id int PRIMARY KEY, value nvarchar(max));
INSERT INTO #s
VALUES
(1, N'003Preliminary Examination Plan'),
(2, N'Coordination005'),
(3, N'Balance1000sheet');
SELECT * FROM #s ORDER BY id;
WITH t AS (
SELECT
id,
1 AS i,
SUBSTRING(value, 1, 1) AS c
FROM
#s
WHERE
LEN(value) > 0
UNION ALL
SELECT
t.id,
t.i + 1 AS i,
SUBSTRING(s.value, t.i + 1, 1) AS c
FROM
t
JOIN #s AS s ON t.id = s.id
WHERE
t.i < LEN(s.value)
)
SELECT
id,
STRING_AGG(c, N'') WITHIN GROUP (ORDER BY i ASC) AS value
FROM
t
WHERE
c LIKE '[0-9]'
GROUP BY
id
ORDER BY
id;

DECLARE #index NVARCHAR(20);
SET #index = 'abd565klaf12';
WHILE PATINDEX('%[0-9]%', #index) != 0
BEGIN
SET #index = REPLACE(#index, SUBSTRING(#index, PATINDEX('%[0-9]%', #index), 1), '');
END
SELECT #index;
One can replace [0-9] with [a-z] if numbers only are wanted with desired castings using the CAST function.

If we use the User Define Function, the query speed will be greatly reduced. This code extracts the number from the string....
SELECT
Reverse(substring(Reverse(rtrim(ltrim( substring([FieldName] , patindex('%[0-9]%', [FieldName] ) , len([FieldName]) )))) , patindex('%[0-9]%', Reverse(rtrim(ltrim( substring([FieldName] , patindex('%[0-9]%', [FieldName] ) , len([FieldName]) )))) ), len(Reverse(rtrim(ltrim( substring([FieldName] , patindex('%[0-9]%', [FieldName] ) , len([FieldName]) ))))) )) NumberValue
FROM dbo.TableName

CREATE OR REPLACE FUNCTION count_letters_and_numbers(input_string TEXT)
RETURNS TABLE (letters INT, numbers INT) AS $$
BEGIN
RETURN QUERY SELECT
sum(CASE WHEN input_string ~ '[A-Za-z]' THEN 1 ELSE 0 END) as letters,
sum(CASE WHEN input_string ~ '[0-9]' THEN 1 ELSE 0 END) as numbers
FROM unnest(string_to_array(input_string, '')) as input_string;
END;
$$ LANGUAGE plpgsql;

For the hell of it...
This solution is different to all earlier solutions, viz:
There is no need to create a function
There is no need to use pattern matching
There is no need for a temporary table
This solution uses a recursive common table expression (CTE)
But first - note the question does not specify where such strings are stored. In my solution below, I create a CTE as a quick and dirty way to put these strings into some kind of "source table".
Note also - this solution uses a recursive common table expression (CTE) - so don't get confused by the usage of two CTEs here. The first is simply to make the data avaliable to the solution - but it is only the second CTE that is required in order to solve this problem. You can adapt the code to make this second CTE query your existing table, view, etc.
Lastly - my coding is verbose, trying to use column and CTE names that explain what is going on and you might be able to simplify this solution a little. I've added in a few pseudo phone numbers with some (expected and atypical, as the case may be) formatting for the fun of it.
with SOURCE_TABLE as (
select '003Preliminary Examination Plan' as numberString
union all select 'Coordination005' as numberString
union all select 'Balance1000sheet' as numberString
union all select '1300 456 678' as numberString
union all select '(012) 995 8322 ' as numberString
union all select '073263 6122,' as numberString
),
FIRST_CHAR_PROCESSED as (
select
len(numberString) as currentStringLength,
isNull(cast(try_cast(replace(left(numberString, 1),' ','z') as tinyint) as nvarchar),'') as firstCharAsNumeric,
cast(isNull(cast(try_cast(nullIf(left(numberString, 1),'') as tinyint) as nvarchar),'') as nvarchar(4000)) as newString,
cast(substring(numberString,2,len(numberString)) as nvarchar) as remainingString
from SOURCE_TABLE
union all
select
len(remainingString) as currentStringLength,
cast(try_cast(replace(left(remainingString, 1),' ','z') as tinyint) as nvarchar) as firstCharAsNumeric,
cast(isNull(newString,'') as nvarchar(3999)) + isNull(cast(try_cast(nullIf(left(remainingString, 1),'') as tinyint) as nvarchar(1)),'') as newString,
substring(remainingString,2,len(remainingString)) as remainingString
from FIRST_CHAR_PROCESSED fcp2
where fcp2.currentStringLength > 1
)
select
newString
,* -- comment this out when required
from FIRST_CHAR_PROCESSED
where currentStringLength = 1
So what's going on here?
Basically in our CTE we are selecting the first character and using try_cast (see docs) to cast it to a tinyint (which is a large enough data type for a single-digit numeral). Note that the type-casting rules in SQL Server say that an empty string (or a space, for that matter) will resolve to zero, so the nullif is added to force spaces and empty strings to resolve to null (see discussion) (otherwise our result would include a zero character any time a space is encountered in the source data).
The CTE also returns everything after the first character - and that becomes the input to our recursive call on the CTE; in other words: now let's process the next character.
Lastly, the field newString in the CTE is generated (in the second SELECT) via concatenation. With recursive CTEs the data type must match between the two SELECT statements for any given column - including the column size. Because we know we are adding (at most) a single character, we are casting that character to nvarchar(1) and we are casting the newString (so far) as nvarchar(3999). Concatenated, the result will be nvarchar(4000) - which matches the type casting we carry out in the first SELECT.
If you run this query and exclude the WHERE clause, you'll get a sense of what's going on - but the rows may be in a strange order. (You won't necessarily see all rows relating to a single input value grouped together - but you should still be able to follow).
Hope it's an interesting option that may help a few people wanting a strictly expression-based solution.

In Oracle
You can get what you want using this:
SUBSTR('ABCD1234EFGH',REGEXP_INSTR ('ABCD1234EFGH', '[[:digit:]]'),REGEXP_COUNT ('ABCD1234EFGH', '[[:digit:]]'))
Sample Query:
SELECT SUBSTR('003Preliminary Examination Plan ',REGEXP_INSTR ('003Preliminary Examination Plan ', '[[:digit:]]'),REGEXP_COUNT ('003Preliminary Examination Plan ', '[[:digit:]]')) SAMPLE1,
SUBSTR('Coordination005',REGEXP_INSTR ('Coordination005', '[[:digit:]]'),REGEXP_COUNT ('Coordination005', '[[:digit:]]')) SAMPLE2,
SUBSTR('Balance1000sheet',REGEXP_INSTR ('Balance1000sheet', '[[:digit:]]'),REGEXP_COUNT ('Balance1000sheet', '[[:digit:]]')) SAMPLE3 FROM DUAL

If you are using Postgres and you have data like '2000 - some sample text' then try substring and position combination, otherwise if in your scenario there is no delimiter, you need to write regex:
SUBSTRING(Column_name from 0 for POSITION('-' in column_name) - 1) as
number_column_name

How to extract numbers from a string using TSQL

I have a string:
#string='TEST RESULTS\TEST 1\RESULT 1
The string/text remains the same except for the numbers
need the 1 from TEST
need 1 from RESULT
to be used in a query like:
SET #sql = "SELECT *
FROM TABLE
WHERE test = (expression FOR CASE 1 resulting IN INT 1)
AND result = (expression FOR CASE 2 resulting IN INT 1)"

Looks like you already have a solution that met your needs but I have a little trick that I use to extract numbers from strings that I thought might benefit someone. It takes advantage of the FOR XML statement and avoids explicit loops. It makes a good inline table function or simple scalar. Do with it what you will :)
DECLARE #String varchar(255) = 'This1 Is2 my3 Test4 For Number5 Extr#ct10n';
SELECT
CAST((
SELECT CASE --// skips alpha. make sure comparison is done on upper case
WHEN ( ASCII(UPPER(SUBSTRING(#String, Number, 1))) BETWEEN 48 AND 57 )
THEN SUBSTRING(#String, Number, 1)
ELSE ''END
FROM
(
SELECT TOP 255 --// east way to get a list of numbers
--// change value as needed.
ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) AS Number
FROM master.sys.all_columns a
CROSS JOIN master.sys.all_columns b
) AS n
WHERE Number <= LEN(#String)
--// use xml path to pivot the results to a row
FOR XML PATH('') ) AS varchar(255)) AS Result
Result ==> 1234510

You can script an sql function which can used through your search queries.
Here is the sample code.
CREATE FUNCTION udf_extractInteger(#string VARCHAR(2000))
RETURNS VARCHAR(2000)
AS
BEGIN
DECLARE #count int
DECLARE #intNumbers VARCHAR(1000)
SET #count = 0
SET #intNumbers = ''
WHILE #count <= LEN(#string)
BEGIN
IF SUBSTRING(#string, #count, 1)>='0' and SUBSTRING (#string, #count, 1) <='9'
BEGIN
SET #intNumbers = #intNumbers + SUBSTRING (#string, #count, 1)
END
SET #count = #count + 1
END
RETURN #intNumbers
END
GO
QUERY :
SELECT dbo.udf_extractInteger('hello 123 world456') As output
OUTPUT:
123456
Referred from : http://www.ittutorials.in/source/sql/sql-function-to-extract-only-numbers-from-string.aspx

Since you have stable text and only 2 elements, you can make good use of replace and parsename:
declare #string varchar(100) = 'TEST RESULTS\TEST 1\RESULT 2'
select cast(parsename(replace(replace(#string, 'TEST RESULTS\TEST ', ''), '\RESULT ', '.'), 2) as int) as Test
, cast(parsename(replace(replace(#string, 'TEST RESULTS\TEST ', ''), '\RESULT ', '.'), 1) as int) as Result
/*
Test Result
----------- -----------
1 2
*/
The replace portion does assume the same text and spacing always, and sets up for parsename with the period.

This method uses SUBSTRING, PARSENAME, and PATINDEX:
SELECT
SUBSTRING(PARSENAME(c,2), PATINDEX('%[0-9]%',PARSENAME(c,2)), LEN(c)) Test,
SUBSTRING(PARSENAME(c,1), PATINDEX('%[0-9]%',PARSENAME(c,1)), LEN(c)) Result
FROM ( SELECT REPLACE(#val, '\', '.') c) t
Use PARSENAME to split the string. The text of the string won't matter -- it will just need to contain the 2 back slashes to parse to 3 elements. Use PATINDEX with a regular expression to replace non-numeric values from the result. This would need adjusting if the text in front of the number ever contained numbers.
If needed, CAST/CONVERT the results to int or the appropriate data type.
Here is some sample Fiddle.
Good luck.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

How to compare if two strings contain the same words in T-SQL for SQL Server 2008? - sql

Scenario is as follows. You would want to use a TVF to split the first and the second strings on space and then full join the resulting two tables on values and if you have nulls on left or right you've got inequality otherwise they are equal.

You can add a precomputed column in the base table that is evaluated in INSERT/UPDATE trigger (or UDF default) that splits, sorts and then concatenates words from the original column. Then use = to compare these precomputed columns.

There is library called http://www.sqlsharp.com/ that contains a whole range of useful string/math functions. It has a function called String_CompareSplitValues which does precisely what you want. I am not sure if it is in the community version or the paid for version.

Related

How can I count the number of words in a column in SQL Server

Is there a LastIndexOf in SQL Server?

Split words with a capital letter in sql

Query to get only numbers from a string

How to extract numbers from a string using TSQL

Categories

Resources