T-SQL How to create function that compares string, checks difference, and do special function - sql

First, sorry for my English. Second, I'm learning T-SQL.
Goal:
I want to get the difference between two strings, then check which column the difference is in. If the difference is in the first column, do one thing; if it is in the second column, do something else.
What I'm actually doing
The 'messages' column is a string that contains a list of IDs. I replace every '#' with ', ' and delete the last ',', which gives me the ActualID and BeforeID columns. See below:
DECLARE @string VARCHAR(512);
DECLARE @string2 VARCHAR(512);
DECLARE @string3 VARCHAR(512);
SET @string = '41#42#43#44#45#46#47#48#49#50#51#52#53#54#55#56#57#58#59#';
SET @string2 = REPLACE((SELECT messages FROM USERS WHERE userid = 4), '#', ', ' )
SET @string3 = left(@string2, len(@string2) - 1);
SET @string2 = REPLACE(@string, '#', ', ' )
SET @string = left(@string2, len(@string2) - 1);
SELECT @string3 as ActualID, @string as BeforeID
So now I want to compare BeforeID with ActualID. For example:
In BeforeID we have 1, 2, 3 / In ActualID 1, 2, 3, 4
In the example above, 4 was added. So, if it was added, I want to add it to @AddedElements.
If 4, 5, 7 were added, then SELECT @AddedElements as AddedElements should return 4, 5, 7 (with commas).
But that's not all.
If BeforeID = 1, 5, 10, 14 and ActualID = 1, 5, 14, I want the element that is in BeforeID but not in ActualID to be added to @DeletedElements.
So SELECT @DeletedElements as DeletedElements should return 10
Added and deleted elements should be returned together. I mean, the full result I want to get is
SELECT @AddedElements as AddedElements, @DeletedElements as DeletedElements
Is it possible? If so, how do I do it?

First of all, I have to start by saying that this is just poor design; but having said that, I've also found myself in all kinds of situations where I couldn't change the way things worked, only try to make them work better in the current configuration. Therefore, I recommend something like this:
1: Create a UDF (User-Defined Function) that can handle splitting the strings and returning them in table-formed data that you can work with:
CREATE FUNCTION [dbo].[UDF_StringDelimiter]
/*********************************************************
** Takes Parameter "LIST" and transforms it for use **
** to select individual values or ranges of values. **
** **
** EX: 'This,is,a,test' = 'This' 'Is' 'A' 'Test' **
*********************************************************/
(
@LIST VARCHAR(8000)
,@DELIMITER VARCHAR(255)
)
RETURNS @TABLE TABLE
(
[RowID] INT IDENTITY
,[Value] VARCHAR(255)
)
WITH SCHEMABINDING
AS
BEGIN
DECLARE
@LISTLENGTH AS SMALLINT
,@LISTCURSOR AS SMALLINT
,@VALUE AS VARCHAR(255)
;
SELECT
@LISTLENGTH = LEN(@LIST) - LEN(REPLACE(@LIST,@DELIMITER,'')) + 1
,@LISTCURSOR = 1
,@VALUE = ''
;
WHILE @LISTCURSOR <= @LISTLENGTH
BEGIN
INSERT INTO @TABLE (Value)
SELECT
CASE
WHEN @LISTCURSOR < @LISTLENGTH
THEN SUBSTRING(@LIST,1,PATINDEX('%' + @DELIMITER + '%',@LIST) - 1)
ELSE SUBSTRING(@LIST,1,LEN(@LIST))
END
;
SET @LIST = STUFF(@LIST,1,PATINDEX('%' + @DELIMITER + '%',@LIST),'')
;
SET @LISTCURSOR = @LISTCURSOR + 1
;
END
;
RETURN
;
END
;
2: Consider dropping the whole "Switching out commas" thing, because it's pointless - the function I've written here takes two arguments: The string itself, and the delimiter (the mini-string that separates the individual strings within the big string, in your case '#') Then you just have to do a couple of quick comparisons to find out what was added and what was deleted.
DECLARE
@AddedElements VARCHAR(255) = ''
,@DeletedElements VARCHAR(255) = ''
,@ActualID VARCHAR(255) = '41#42#43#44#45#46#47#48#49#50#51#52#53#54#55#56#57#58#59#'
,@BeforeID VARCHAR(255) = '41#42#43#44#45#46#47#48#50#51#52#53#54#55#56#57#58#59#60#'
;
SET @AddedElements = @AddedElements +
SUBSTRING(
(
SELECT ', ' + Value
FROM dbo.UDF_StringDelimiter(@ActualID,'#')
WHERE Value NOT IN
(
SELECT Value
FROM dbo.UDF_StringDelimiter(@BeforeID,'#')
)
GROUP BY ', ' + Value
FOR XML PATH('')
)
,3,255)
;
SET @DeletedElements = @DeletedElements +
SUBSTRING(
(
SELECT ', ' + Value
FROM dbo.UDF_StringDelimiter(@BeforeID,'#')
WHERE Value NOT IN
(
SELECT Value
FROM dbo.UDF_StringDelimiter(@ActualID,'#')
)
GROUP BY ', ' + Value
FOR XML PATH('')
)
,3,255)
;
SELECT @AddedElements AS AddedElements,@DeletedElements AS DeletedElements
;
Using this method, if you add a value to @ActualID that does not exist in @BeforeID, it will show up in @AddedElements.
Likewise, if you remove an element from @ActualID that had previously existed in @BeforeID, it will show up in @DeletedElements.
All of this is, of course, assuming that the dynamic string (the one really being compared here) is @ActualID. I operated with the understanding that @BeforeID is actually a stored value in the DB, and @ActualID is a dynamic string being passed in from...somewhere. If this is wrong, update me and I'll change the tactic appropriately.
Quick note: It's important to me to point out that this is just one way of dealing with a situation like this, and I'm sure there are better ways; but with the information I have, it's the best I could come up with without spending too much time and energy on it.
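For comparison, on newer versions the same added/deleted check could be sketched without a custom splitter. This is only an illustration, assuming SQL Server 2017 or later (for the built-in STRING_SPLIT and STRING_AGG functions) and IDs that are unique within each list; the shortened sample strings are made up for the example.

```sql
-- Assumption: SQL Server 2017+ (STRING_SPLIT needs compatibility level 130, STRING_AGG needs 2017).
DECLARE @ActualID varchar(255) = '41#42#43#49#';   -- hypothetical sample data
DECLARE @BeforeID varchar(255) = '41#42#43#60#';   -- hypothetical sample data

WITH a AS (   -- the trailing '#' yields an empty element, so filter it out
    SELECT value FROM STRING_SPLIT(@ActualID, '#') WHERE value <> ''
),
b AS (
    SELECT value FROM STRING_SPLIT(@BeforeID, '#') WHERE value <> ''
)
SELECT
    -- in Actual but not in Before; ordering is by string value, not numeric value
    AddedElements = (
        SELECT STRING_AGG(x.value, ', ') WITHIN GROUP (ORDER BY x.value)
        FROM (SELECT value FROM a EXCEPT SELECT value FROM b) AS x
    ),
    -- in Before but not in Actual
    DeletedElements = (
        SELECT STRING_AGG(x.value, ', ') WITHIN GROUP (ORDER BY x.value)
        FROM (SELECT value FROM b EXCEPT SELECT value FROM a) AS x
    );
```

With these sample strings the query should return 49 as AddedElements and 60 as DeletedElements; a column comes back NULL when nothing was added or deleted.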

Related

How to create a function to split date and time from a string in SQL?

How can I remove value before '_' and show date and time in one row in TSQL Function?
Below is sample:
Declare @inputstring as varchar(50) = 'Studio9_20230126_203052' ;
select value from STRING_SPLIT( @inputstring ,'_')
Output Required: 2023-01-26 20:30:52.000
If we can safely assume that the value is always in the format {Some String}_{yyyyMMdd}_{hhmmss} then you can use STUFF a few times, firstly to remove the leading string up to the first underscore (_) character (using CHARINDEX to find that character), and then to inject 2 colon (:) characters. Finally you can REPLACE the remaining underscore with a space ( ), and then use TRY_CONVERT to attempt to convert the value to a datetime2(0).
DECLARE @inputstring varchar(50) = 'Studio9_20230126_203052';
SELECT TRY_CONVERT(datetime2(0),REPLACE(STUFF(STUFF(STUFF(@inputstring,1,CHARINDEX('_',@inputstring),''),14,0,':'),12,0,':'),'_',' '));
Note that this doesn't give the value you state you want in your question (2023-01-26 20:05:52.000) , but I assume this is a typographical error, and that the 05 for minutes should be 30.
Creating function
CREATE FUNCTION [dbo].[convert_to_date] (@inputstring NVARCHAR(MAX))
RETURNS DATETIME AS
BEGIN
DECLARE @finalString varchar(50), @out varchar(100)
SET @finalString = REPLACE ( (SUBSTRING (@inputstring, CHARINDEX('_', @inputstring)+1 , LEN(@inputstring))), '_', ' ')
--SELECT @finalString
SET @out = LEFT (@finalString, 4) + '-'
+ SUBSTRING(@finalString, 5, 2) + '-'
+ SUBSTRING(@finalString, 7, 2) + ' '
+ SUBSTRING(@finalString, 10, 2) + ':'
+ SUBSTRING(@finalString, 12, 2) + ':'
+ SUBSTRING(@finalString, 14, 2) + '.000'
RETURN @out
END
Select Query
SELECT dbo.[convert_to_date] ('Studio54541659_20230126_203052')
Output
2023-01-26 20:30:52.000
This will tolerate "somestring" in the format of "somestring_YYYYMMDD_HHMISS" being variable in length.
Declare @inputstring as varchar(50) = 'Studio9_20230126_203052' ;
SELECT DateAndTime = CONVERT(DATETIME,STUFF(STUFF(STUFF(v2.DT,14,0,':'),12,0,':'),9,1,' '))
,Identifier = LEFT(@inputstring,v1.Pos1-1) --Included this because I know how people are :D --Comment out if not wanted.
,Original = @inputstring --Original string just for checking. Comment out when happy.
FROM (VALUES(CHARINDEX('_',@inputstring)))v1(Pos1) --Position of first Underscore
CROSS APPLY (VALUES(SUBSTRING(@inputstring,v1.Pos1+1,50)))v2(DT) --String after first Underscore
;
Output looks like this and you end up with a DATETIME datatype. Comment out what you don't want for columns in the return.
I'll let you have some of the fun by converting it into an iTVF (inline Table Valued Function). Remember that any function that contains a "BEGIN" is ultimately going to be a part of a performance issue, so make sure it's an iTVF :D
EDIT: Crud... I've gotta remember to scroll down. @Lamu already posted the same thing, but it's probably better and faster if you just want the time and not the identifier I included.
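One possible shape for that iTVF is sketched below. The function name dbo.ParseStampedName is hypothetical, and the sketch assumes the input always looks like somestring_YYYYMMDD_HHMISS; it swaps CONVERT for TRY_CONVERT (SQL Server 2012+) so a malformed value yields NULL instead of an error.

```sql
-- Hypothetical name; inline table-valued function (single RETURN SELECT, no BEGIN/END).
CREATE FUNCTION dbo.ParseStampedName (@inputstring varchar(50))
RETURNS TABLE
AS
RETURN
    SELECT DateAndTime = TRY_CONVERT(datetime,
               -- 'YYYYMMDD_HHMISS' -> 'YYYYMMDD HH:MI:SS'
               STUFF(STUFF(STUFF(v2.DT, 14, 0, ':'), 12, 0, ':'), 9, 1, ' '))
    FROM (VALUES (CHARINDEX('_', @inputstring))) AS v1(Pos1)                     -- first underscore
    CROSS APPLY (VALUES (SUBSTRING(@inputstring, v1.Pos1 + 1, 50))) AS v2(DT);   -- text after it
GO

-- Usage: against a variable or literal...
SELECT p.DateAndTime
FROM dbo.ParseStampedName('Studio9_20230126_203052') AS p;
-- ...or per row of a table: FROM SomeTable AS t CROSS APPLY dbo.ParseStampedName(t.SomeColumn) AS p
```

SomeTable and SomeColumn in the last comment are placeholders, not names from the question.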

SQL Server 2012 string functions

I have a field that can vary in length of the format CxxRyyy where x and y are numeric. I want to choose xx and yyy. For instance, if the field value is C1R12, then I want to get 1 and 12. if I use substring and charindex then I have to use a length, but I would like to use a position like
SUBSTRING(WPLocationNew, CHARINDEX('C',WPLocationNew,1)+1, CHARINDEX('R',WPLocationNew,1)-1)
or
SUBSTRING(WPLocationNew, CHARINDEX('C',WPLocationNew,1)+1, LEN(WPLocationNew) - CHARINDEX('R',WPLocationNew,1))
to get x, but I know that doesn't work. I feel like there is a fairly simple solution, but I am not coming up with it yet. Any suggestions?
If these are cell references and will always be in the form C{1-5 digits}R{1-5 digits} you can do this:
DECLARE @t TABLE(Original varchar(32));
INSERT @t(Original) VALUES ('C14R4535'),('C1R12'),('C57R123');
;WITH src AS
(
SELECT Original, c = REPLACE(REPLACE(Original,'C',''),'R','.')
FROM @t
)
SELECT Original, C = PARSENAME(c,2), R = PARSENAME(c,1)
FROM src;
Output
Original | C  | R
C14R4535 | 14 | 4535
C1R12    | 1  | 12
C57R123  | 57 | 123
If you need to protect against other formats, you can add
FROM @t WHERE Original LIKE 'C%[0-9]%R%[0-9]%'
AND PATINDEX('%[^CR0-9]%', Original) = 0
It appears that you are attempting to parse an Excel cell reference. Those are predictably structured or I wouldn't suggest such an embarrassing hack as this.
Basically, take advantage of the fact that a try_cast in SQL ignores spaces when converting strings to numbers.
declare @val as varchar(20) = 'C1R12'
declare @newval as varchar(20)
declare @c as smallint
declare @r as smallint
--replace the C with 5 spaces
set @newval = replace(@val,'C',space(5))
--replace the R with 5 spaces
set @newval = replace(@newval,'R',space(5))
--take a look at the intermediate result: 5 spaces, '1', 5 spaces, '12'
select @newval
set @c = try_cast(left(@newval,11) as smallint)
set @r = try_cast(right(@newval,6) as smallint)
--take a look at the results... two smallint, 1 and 12
select @c, @r
That can all be accomplished in one line for each element (a line for column and a line for row) but I wanted you to be able to understand what was happening so this example goes through the steps individually.
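As a rough sketch of that condensed form, assuming the five-space padding described in the comments and the same 'C1R12' sample value:

```sql
DECLARE @val varchar(20) = 'C1R12';

SELECT
    -- pad both letters to 5 spaces, then cast the left 11 characters (column part)
    C = TRY_CAST(LEFT(REPLACE(REPLACE(@val, 'C', SPACE(5)), 'R', SPACE(5)), 11) AS smallint),
    -- same padded string, cast the right 6 characters (row part)
    R = TRY_CAST(RIGHT(REPLACE(REPLACE(@val, 'C', SPACE(5)), 'R', SPACE(5)), 6) AS smallint);
```

For 'C1R12' this should return 1 and 12. Because the target type is smallint, a part above 32767 comes back as NULL from TRY_CAST rather than raising an error.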
Here's yet another way:
declare @val as varchar(20) = 'C12R345'
declare @c as varchar(5)
declare @r as varchar(5)
set @c = SUBSTRING(@val, patindex('C%', @val)+1,(patindex('%R%', @val)-1)-patindex('C%', @val) )
set @r = SUBSTRING(@val, patindex('%R%', @val)+1, LEN(@val) -patindex('%R%', @val))
select cast(@c as int) as 'C', cast(@r as int) as 'R'
There are lots of different ways to approach string parsing. Here's just one possible idea:
declare @s varchar(10) = 'C01R002';
select
rtrim( left(replace(stuff(@s, 1, 1, ''), 'R', space(10)), 10)) as c,
ltrim(right(replace(substring(@s, 2, 10), 'R', space(10)), 10)) as r
Strip out the 'C' and then replace the 'R' with enough spaces so that the left and right sides can be extracted using a fixed length and then easily trimmed back.
stuff() and substring() as used above are just different ways to accomplish exactly the same thing. One advantage here is that it uses fairly portable string functions, and it's conceivable that this is somewhat faster. This is also done inline and without multiple steps.

How to identify and redact all instances of a matching pattern in T-SQL

I have a requirement to run a function over certain fields to identify and redact any numbers which are 5 digits or longer, ensuring all but the last 4 digits are replaced with *
For example: "Some text with 12345 and 1234 and 12345678" would become "Some text with *2345 and 1234 and ****5678"
I've used PATINDEX to identify the starting character of the pattern:
PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', TEST_TEXT)
I can recursively call that to get the starting character of all the occurrences, but I'm struggling with the actual redaction.
Does anyone have any pointers on how this can be done? I know to use REPLACE to insert the *s where they need to be, it's just the identification of what I should actually be replacing I'm struggling with.
Could do it on a program, but I need it to be T-SQL (can be a function if needed).
Any tips greatly appreciated!
You can do this using the built-in functions of SQL Server. All of the functions used in this example are present in SQL Server 2008 and higher.
DECLARE @String VARCHAR(500) = 'Example Input: 1234567890, 1234, 12345, 123456, 1234567, 123asd456'
DECLARE @StartPos INT = 1, @EndPos INT = 1;
DECLARE @Input VARCHAR(500) = ISNULL(@String, '') + ' '; --Sets the input and adds a control character at the end to make the loop easier.
DECLARE @OutputString VARCHAR(500) = ''; --Initialize an empty string to avoid NULL concatenation
WHILE (@StartPos <> 0)
BEGIN
SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
IF @StartPos <> 0
BEGIN
SET @OutputString += SUBSTRING(@Input, 1, @StartPos - 1); --Separate all contents before the first occurrence of our filter
SET @Input = SUBSTRING(@Input, @StartPos, 500); --Cut from the match to the end. The length must be at least the original string length to keep everything.
SET @EndPos = (PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input)); --First occurrence of 4 digits followed by a non-digit.
SET @Input = STUFF(@Input, 1, (@EndPos - 1), REPLICATE('*', (@EndPos - 1))); --@EndPos - 1 is the number of characters to mask.
END
END
SET @OutputString += @Input; --Append the last element
SET @OutputString = LEFT(@OutputString, LEN(@OutputString)) --LEN ignores trailing spaces, so this drops the control character
SELECT @OutputString;
Which outputs the following:
Example Input: ******7890, 1234, *2345, **3456, ***4567, 123asd456
This entire code could also be made as a function since it only requires an input text.
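A minimal sketch of such a function follows. The name dbo.RedactLongNumbers is hypothetical, and the 500-character limit simply mirrors the example above; the loop is the same one, only wrapped so it can be called per row.

```sql
-- Hypothetical name; masks all but the last 4 digits of every run of 5 or more digits.
CREATE FUNCTION dbo.RedactLongNumbers (@String varchar(500))
RETURNS varchar(500)
AS
BEGIN
    DECLARE @StartPos int = 1, @EndPos int;
    DECLARE @Input varchar(501) = ISNULL(@String, '') + ' ';  -- trailing space acts as a sentinel
    DECLARE @Output varchar(501) = '';

    WHILE @StartPos <> 0
    BEGIN
        -- start of the next run of at least 5 digits
        SET @StartPos = PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @Input);
        IF @StartPos <> 0
        BEGIN
            SET @Output += SUBSTRING(@Input, 1, @StartPos - 1);   -- keep the text before the run
            SET @Input = SUBSTRING(@Input, @StartPos, 501);       -- remainder now starts at the run
            -- position where the last 4 digits of the run begin
            SET @EndPos = PATINDEX('%[0-9][0-9][0-9][0-9][^0-9]%', @Input);
            SET @Input = STUFF(@Input, 1, @EndPos - 1, REPLICATE('*', @EndPos - 1));
        END
    END

    -- RTRIM removes the sentinel (and any trailing spaces the input already had)
    RETURN RTRIM(@Output + @Input);
END
GO

SELECT dbo.RedactLongNumbers('Some text with 12345 and 1234 and 12345678');
```

For the sample sentence from the question this should return "Some text with *2345 and 1234 and ****5678". Being a scalar function with a loop, it may be slow over large tables.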
A dirty solution with recursive CTE
DECLARE
@tags nvarchar(max) = N'Some text with 12345 and 1234 and 12345678',
@c nchar(1) = N' ';
;
WITH Process (s, i)
as
(
SELECT @tags, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', @tags)
UNION ALL
SELECT value, PATINDEX('%[0-9][0-9][0-9][0-9][0-9]%', value)
FROM
(SELECT SUBSTRING(s,0,i)+'*'+SUBSTRING(s,i+1,len(s)) value
FROM Process
WHERE i >0) calc
-- each pass masks the first digit of the first run of 5 or more digits,
-- so the recursion stops once every run is down to 4 visible digits
)
SELECT * FROM Process
WHERE i=0
I think a better solution is to add a CLR function to MS SQL Server to handle regular expressions.
sql-clr/RegEx
Here is an option using the DelimitedSplit8K_LEAD which can be found here. https://www.sqlservercentral.com/articles/reaping-the-benefits-of-the-window-functions-in-t-sql-2 This is an extension of Jeff Moden's splitter that is even a little bit faster than the original. The big advantage this splitter has over most of the others is that it returns the ordinal position of each element. One caveat to this is that I am using a space to split on based on your sample data. If you had numbers crammed in the middle of other characters this will ignore them. That may be good or bad depending on your specific requirements.
declare @Something varchar(100) = 'Some text with 12345 and 1234 and 12345678';
with MyCTE as
(
select x.ItemNumber
, Result = isnull(case when TRY_CONVERT(bigint, x.Item) is not null then isnull(replicate('*', len(convert(varchar(20), TRY_CONVERT(bigint, x.Item))) - 4), '') + right(convert(varchar(20), TRY_CONVERT(bigint, x.Item)), 4) end, x.Item)
from dbo.DelimitedSplit8K_LEAD(@Something, ' ') x
)
select Output = stuff((select ' ' + Result
from MyCTE
order by ItemNumber
FOR XML PATH('')), 1, 1, '')
This produces: Some text with *2345 and 1234 and ****5678

SSMS replace all commas outside of quotation marks in string

I've written the following function in SSMS to replace any commas that are outside of quotation marks with ||||:
CREATE FUNCTION dbo.fixqualifier (@string nvarchar(max))
returns nvarchar(max)
as begin
DECLARE @STRINGTOPAD NVARCHAR(MAX)
DECLARE @position int = 1,@newstring nvarchar(max) ='',@QUOTATIONMODE INT = 0
WHILE(LEN(@string)>0)
BEGIN
SET @STRINGTOPAD = SUBSTRING(@string,0,IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)))
SET @newstring = @newstring + IIF(@QUOTATIONMODE = 1, REPLACE(@STRINGTOPAD,',','||||'),@STRINGTOPAD)
SET @QUOTATIONMODE = IIF(@QUOTATIONMODE = 1,0,1)
set @string = SUBSTRING(@string,1+IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)),LEN(@string))
END
return @newstring
end
The idea is for the function to find the first ", replace all ',' before that then switch to quotation mode 1 so it knows to not replace the , until it changes back to quotation mode 0 when it hits the 2nd " and so on.
so for example the string:
qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl
would become:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
It works as expected but it's really inefficient when it comes to doing this for several thousand rows.
Is there a better way of doing this, or at least of speeding the function up?
Do a simple trick by Modulus
DECLARE @VAR VARCHAR(100) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl'
,@OUTPUT VARCHAR(100) = '';
SELECT @OUTPUT = @OUTPUT + CASE WHEN (LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', ''))) % 2 = 0
THEN REPLACE(VAL, ',', '||||') ELSE VAL END
FROM (
SELECT SUBSTRING(@VAR, NUMBER, 1) VAL
FROM master.dbo.spt_values
WHERE type = 'P'
AND NUMBER BETWEEN 1 AND LEN(@VAR)
) A
PRINT @OUTPUT
Result:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
The expression LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', '')) gives the number of " characters seen so far. Take that count modulo 2: if the result is zero, the count is even, you are outside quotes, and the comma can be replaced; otherwise you are inside quotes and the comma is kept.
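To see the parity check in isolation, here is a tiny illustration on a made-up prefix of the sample string (the variable name @prefix is just for this example):

```sql
-- Hypothetical prefix: everything read so far, ending inside an open quotation.
DECLARE @prefix varchar(100) = 'qwer,tyu,"edffs';

SELECT
    QuoteCount   = LEN(@prefix) - LEN(REPLACE(@prefix, '"', '')),        -- number of " so far
    InsideQuotes = (LEN(@prefix) - LEN(REPLACE(@prefix, '"', ''))) % 2;  -- 1 = odd = inside quotes
```

Here both columns should be 1, so a comma arriving next would be left alone. Note that LEN ignores trailing spaces, which does not affect the difference as long as the quote character itself is not a space.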
This uses DelimitedSplit8k and completely avoids any RBAR methods (such as a WHILE or @Variable = @Variable +... (which is a hidden form of RBAR)).
It firstly splits on the quotation, and then on the commas, where the string isn't quoted. Finally it then puts the strings back together again, using the "old" STUFF and FOR XML PATH method:
USE Sandbox;
DECLARE @String varchar(8000) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl';
WITH Splits AS(
SELECT QS.ItemNumber AS QuoteNumber, CS.ItemNumber AS CommaNumber, ISNULL(CS.Item, '"' + QS.Item + '"') AS DelimitedItem
FROM dbo.DelimitedSplit8K(@String,'"') QS
OUTER APPLY (SELECT *
FROM dbo.DelimitedSplit8K(QS.Item,',')
WHERE QS.ItemNumber % 2 = 1) CS
WHERE QS.Item <> ',')
SELECT STUFF((SELECT '||||' + S.DelimitedItem
FROM Splits S
ORDER BY S.QuoteNumber, S.CommaNumber
FOR XML PATH('')),1,4,'') AS DelimitedList; --strip the whole leading 4-character delimiter
(Note, DelimitedSplit8K does not accept more than 8,000 characters. If you have more than that, SQL Server is really not the right tool. STRING_SPLIT does not provide the ordinal position, so you would be unable to guarantee the rebuild order with it.)

Extract a number from String in SQL

I have the following string:
"FLEETWOOD DESIGNS 535353110XXXXX" (The X's are actually numbers I just wanted to hide them here)
Does anyone know how I can search through strings in SQL and extract numbers that are longer than, let's say, 10 characters?
This is quite an old post, but this might help someone else. I was searching for a user-defined function in SQL Server to extract only the numbers from a given string and, surprisingly, I could not find exactly what I was looking for.
Let me put here the code of a function to "Extract a number from string in SQL" (valid for SQL Server). This is taken from the fantastic blog of Pinal Dave; I've modified it just to return NULL if a NULL value is passed to the function.
CREATE FUNCTION [dbo].[ExtractInteger](@String VARCHAR(2000))
RETURNS VARCHAR(1000)
AS
BEGIN
DECLARE @Count INT
DECLARE @IntNumbers VARCHAR(1000)
SET @Count = 0
SET @IntNumbers = ''
IF @String IS NULL
RETURN NULL;
WHILE @Count <= LEN(@String)
BEGIN
IF SUBSTRING(@String,@Count,1) >= '0' AND SUBSTRING(@String,@Count,1) <= '9'
BEGIN
SET @IntNumbers = @IntNumbers + SUBSTRING(@String,@Count,1)
END
SET @Count = @Count + 1
END
RETURN @IntNumbers
END
Tests
select '"' + dbo.ExtractInteger('1a2b3c4d5e6f7g8h9i') + '"'
GO
select '"' + dbo.ExtractInteger('abcdefghi') + '"'
GO
select '"' + dbo.ExtractInteger(NULL) + '"'
GO
select '"' + dbo.ExtractInteger('') + '"'
GO
Results
"123456789"
""
NULL
""
You don't mention the DB engine, so we don't know what features are available...
If regular expressions are available, then a pattern like \d{10,} would match numbers with 10 or more digits.
In MySQL, REGEXP can only return true or false (0 or 1), so you'd have to use some ugly hack like
SELECT
LEAST(
INSTR(field,'0'),
INSTR(field,'1'),
INSTR(field,'2'),
INSTR(field,'3'),
INSTR(field,'4'),
INSTR(field,'5'),
INSTR(field,'6'),
INSTR(field,'7'),
INSTR(field,'8'),
INSTR(field,'9')
) AS startPos,
REVERSE(field) AS backward,
LEAST(
INSTR(backward,'0'),
INSTR(backward,'1'),
INSTR(backward,'2'),
INSTR(backward,'3'),
INSTR(backward,'4'),
INSTR(backward,'5'),
INSTR(backward,'6'),
INSTR(backward,'7'),
INSTR(backward,'8'),
INSTR(backward,'9')
) AS endPos,
SUBSTRING(field, startPos, endPos - startPos + 1)
FROM tab
WHERE(field REGEXP '[0-9]{10,}')
but this isn't perfect - it would extract the wrong substring for a string like "ABC 9 A 1234567891", not to mention that it is probably so slooooow that it would be faster to go through the data by hand.
SUBSTRING('FLEETWOOD DESIGNS 535353110XXXXX', 19, 14)
You could also use LEN() to get the length of the string itself. If you know the serial number length, subtract it from LEN() and add 1 to get the start position of the substring (here 32 - 14 + 1 = 19); the third argument is the number of characters to return.
It could be done like this
Declare @X varchar(100)
Select @X= 'Here is where15234Numbers'
--
Select @X= SubString(@X,PATINDEX('%[0-9]%',@X),Len(@X))
Select @X= SubString(@X,0,PATINDEX('%[^0-9]%',@X))
--// show result
Select @X
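To address the "at least 10 digits" part of the question in T-SQL, the same PATINDEX idea could be extended along these lines. This is a sketch only; the sample value is hypothetical (the hidden X's are replaced with made-up digits) and the variable names are illustrative.

```sql
-- Hypothetical sample: the XXXXX from the question replaced with made-up digits.
DECLARE @s varchar(100) = 'FLEETWOOD DESIGNS 53535311012345';

-- Build a pattern of 10 consecutive [0-9] classes and find where the first such run starts.
DECLARE @start int = PATINDEX('%' + REPLICATE('[0-9]', 10) + '%', @s);

SELECT LongNumber =
    CASE WHEN @start > 0
         THEN SUBSTRING(@s, @start,
                  -- length of the run: first non-digit after @start, minus 1;
                  -- the appended 'x' guarantees a non-digit exists even if the run ends the string
                  PATINDEX('%[^0-9]%', SUBSTRING(@s, @start, LEN(@s)) + 'x') - 1)
    END;   -- NULL when no run of 10 or more digits exists
```

For the sample value this should return 53535311012345. Shorter digit runs earlier in the string are skipped because the pattern only matches where ten digits appear in a row.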