Unexpected results with PATINDEX

I am working on some string manipulation with PATINDEX to fix some incorrect time formatting in XML, e.g. (2018-12-20T17:00:00-05:00).
The issue I am having is that PATINDEX finds a match for @Pattern in the @IncorrectMatchIndex string.
You can recreate the issue by running the following:
DECLARE @Pattern nvarchar(36) = '%<EstmatedTime>%T%-%</EstmatedTime>%',
    @CorrectMatchIndex nvarchar(100) = '<DiscountedRate>263.34</DiscountedRate><EstmatedTime>2018-12-20T17:00:00-05:00</EstmatedTime></Rate>',
    @CorrectMatchIndex2 nvarchar(94) = '<DiscountedRate>263.34</DiscountedRate><EstmatedTime>2018-12-20T17:00:00</EstmatedTime></Rate>',
    @IncorrectMatchIndex nvarchar(296) = '<DiscountedRate>263.34</DiscountedRate><EstmatedTime>2018-12-20T17:00:00</EstmatedTime></Rate><Rate><Carrier>FedEx Freight</Carrier><Service>FEDEX_FREIGHT_PRIORITY</Service><PublishedRate>520.6</PublishedRate><DiscountedRate>272.04</DiscountedRate><EstmatedTime>2018-12-18T17:00:00</EstmatedTime>'
SELECT
    PATINDEX(@Pattern, @CorrectMatchIndex) AS CorrectMatchIndex,
    PATINDEX(@Pattern, @CorrectMatchIndex2) AS CorrectMatchIndex2,
    PATINDEX(@Pattern, @IncorrectMatchIndex) AS IncorrectMatchIndex

At a pure guess, I suspect you want:
DECLARE @Pattern nvarchar(300) = '%<EstmatedTime>[1-2][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]-[0-9][0-9]:[0-9][0-9]</EstmatedTime>%'
This then returns 0 for IncorrectMatchIndex.
Of course, the comments are right: you should really be using XQuery for this. I can't provide a sample for this, however, as none of the XML data you have supplied is valid XML (for example, @CorrectMatchIndex ends with '</Rate>' but that node is never opened).
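
If the data were well-formed, the XQuery route could look roughly like the sketch below. The <Rates> wrapper element, the variable name @Rates and the two sample rows are assumptions made only so that the fragment parses as XML; the tag names are taken from the question.

```sql
-- Hypothetical, well-formed version of the data from the question.
DECLARE @Rates xml = N'<Rates>
  <Rate><DiscountedRate>263.34</DiscountedRate><EstmatedTime>2018-12-20T17:00:00-05:00</EstmatedTime></Rate>
  <Rate><DiscountedRate>272.04</DiscountedRate><EstmatedTime>2018-12-18T17:00:00</EstmatedTime></Rate>
</Rates>';

-- One row per <Rate>; the LIKE test is applied to a single element's text,
-- so it cannot run on into a neighbouring element.
SELECT
    r.value('(DiscountedRate/text())[1]', 'decimal(10,2)') AS DiscountedRate,
    r.value('(EstmatedTime/text())[1]', 'nvarchar(40)')    AS EstmatedTime,
    CASE WHEN r.value('(EstmatedTime/text())[1]', 'nvarchar(40)')
              LIKE '%T%[-+][0-9][0-9]:[0-9][0-9]'
         THEN 1 ELSE 0 END                                  AS HasUtcOffset
FROM @Rates.nodes('/Rates/Rate') AS x(r);
```

Because each value is extracted before it is tested, only the first row should be flagged as carrying an offset.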

The @IncorrectMatchIndex string does not contain a match to %<EstmatedTime>%T%-%</EstmatedTime>% as far as I can see. There is no dash between the T and the closing </EstmatedTime>.
Yes, there is: a second set of <EstmatedTime> tags appears later in the string, and there most certainly is a '-' character between the first T and the last </EstmatedTime>.
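
The behaviour discussed in these comments can be reproduced with two shortened strings. This is only a sketch: @OneTag and @TwoTags are made-up inputs, and neither timestamp carries a UTC offset.

```sql
-- The loose pattern from the question.
DECLARE @Pattern nvarchar(36) = N'%<EstmatedTime>%T%-%</EstmatedTime>%';

-- Shortened, hypothetical inputs without any UTC offset.
DECLARE @OneTag  nvarchar(200) = N'<EstmatedTime>2018-12-20T17:00:00</EstmatedTime>';
DECLARE @TwoTags nvarchar(200) = @OneTag + N'<EstmatedTime>2018-12-18T17:00:00</EstmatedTime>';

SELECT
    PATINDEX(@Pattern, @OneTag)  AS OneTagIndex,   -- 0: no '-' follows the T
    PATINDEX(@Pattern, @TwoTags) AS TwoTagsIndex;  -- 1: '%' runs on into the second date
```

In the second string the '-' comes from the date inside the next element, which is why a pattern that spells out every digit position is needed to keep the match inside one element.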

Related

SQL Server: Return a string in a specific format

In T-SQL, I need to format a string in a predefined format.
For example:
SNO  STRING      FORMAT        OUTPUT
1    A5233GFCOP  *XXXXX-XXXXX  *A5233-GFCOP
2    K92374      /X-000XXXXX   /K-00092374
3    H91543987   XXXXXXXXX     H91543987
I am trying the built-in FORMATMESSAGE() function.
For example:
FORMATMESSAGE('*%s-%s','A5233','GFCOP')
FORMATMESSAGE('/%s-000%s','K','92374')
FORMATMESSAGE('%s','H91543987')
I can get the first argument with the REPLACE function, but the issue is the second, third, fourth and later arguments.
I don't know how to count the X's between the various delimiters so that I can use SUBSTRING to pass in the second, third and later arguments. If I could count the number of X's in each group of the Format column, I feel SUBSTRING would get me there, but I am not sure how to count them.
Please let me know how to get through it or if there is any other simple approach.
Appreciate your help.
Thanks!

In theory this is quite simple. It could probably be done set-based using STRING_SPLIT, but that is not ideal because the ordering of its output is not guaranteed. As the strings are fairly short, a scalar function should suffice, although I don't think it can benefit from function inlining.
The logic is very simple: keep a counter for each string, loop one character at a time, and pull a character from one or the other into the output depending on whether the format character is an X or not.
create or alter function dbo.fnFormatString(@string varchar(20), @format varchar(20))
returns varchar(20)
as
begin
    -- @scount and @fcount are the current positions in @string and @format
    declare @scount int=1, @fcount int=1, @flen int=Len(@format), @output varchar(20)=''
    -- walk the format string one character at a time until it is exhausted
    while @fcount<=@flen
    begin
        if Substring(@format,@fcount,1)='X'
        begin
            -- placeholder: take the next character from the input string
            set @output+=Substring(@string,@scount,1)
            select @scount+=1, @fcount +=1
        end
        else
        begin
            -- literal: copy the format character as-is
            set @output+=Substring(@format,@fcount,1)
            set @fcount +=1
        end
    end
    return @output
end;
go
select *, dbo.fnFormatString(string, [format])
from t
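
The set-based route mentioned at the start of this answer can be sketched without STRING_SPLIT by joining to a list of positions and ordering explicitly. This is an illustration under assumptions, not a tested replacement for the function: it needs STRING_AGG (SQL Server 2017 or later), and the table variables @t and @formatted are made-up names holding the sample rows from the question.

```sql
-- Sample rows from the question; @t and @formatted are hypothetical names.
DECLARE @t TABLE (sno int, string varchar(20), [format] varchar(20));
INSERT INTO @t (sno, string, [format]) VALUES
    (1, 'A5233GFCOP', '*XXXXX-XXXXX'),
    (2, 'K92374',     '/X-000XXXXX'),
    (3, 'H91543987',  'XXXXXXXXX');

DECLARE @formatted TABLE (sno int, formatted varchar(40));

WITH n AS (
    -- Positions 1..20, enough for a varchar(20) format string.
    SELECT pos
    FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),
                 (11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) AS v(pos)
),
chars AS (
    SELECT t.sno, t.string, n.pos,
           SUBSTRING(t.[format], n.pos, 1) AS fchar,
           -- Running count of X placeholders up to and including this position.
           SUM(CASE WHEN SUBSTRING(t.[format], n.pos, 1) = 'X' THEN 1 ELSE 0 END)
               OVER (PARTITION BY t.sno ORDER BY n.pos ROWS UNBOUNDED PRECEDING) AS xcount
    FROM @t AS t
    JOIN n ON n.pos <= LEN(t.[format])
)
INSERT INTO @formatted (sno, formatted)
SELECT sno,
       -- An X takes the xcount-th input character; anything else is copied as a literal.
       STRING_AGG(CASE WHEN fchar = 'X' THEN SUBSTRING(string, xcount, 1) ELSE fchar END, '')
           WITHIN GROUP (ORDER BY pos)
FROM chars
GROUP BY sno;

SELECT sno, formatted FROM @formatted ORDER BY sno;
```

The running count answers the original question of how to count the X's: the n-th X in the format always maps to the n-th character of the input string.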

SQL SERVER, use patIndex to find position of a character to the left of another character position

I have used patindex to find a character position in my nvarchar:
declare @pmReportText nvarchar(max)
set @pmReportText = 'lots and lots of text';
declare @emptyTag int
select @emptyTag = patindex('%></%',@pmReportText) +1
From that point (the value in @emptyTag) I need to find the first match to the left (patindex('>%<',@pmReportText)).
I need the position I get to relate to the current nvarchar (@pmReportText), so I don't think using something like
declare @leftOfTag nvarchar(max)
select @leftOfTag = rtrim(left(@pmReportText, @emptyTag-1))
select @leftOfTag = reverse(@leftOfTag)
declare @startEmptyTag int
select @startEmptyTag = patindex('>%<',@leftOfTag)
will work, because the position will not relate to the original nvarchar @pmReportText.
Is what I am trying to do possible? If so, how would I go about doing it?
Edit: below is a snippet of data from the nvarchar I am working with.
Green is the position I am at (@emptyTag); I want to get the position in red.

Yes, what you've started with will lead to a solution.
Your @startEmptyTag will contain the location of the search character in the REVERSE of the substring, as you know.
Consider, then, that this is equivalent to the distance (number of characters) that the search character lies to the left of @emptyTag in the original string.
Now you just have to subtract that distance from the location of @emptyTag and you've got the location of your search character. You will have to do some math with the length of the search string if it is more than one character, of course.
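
A worked version of that arithmetic, as a sketch: the sample text and the @distance variable are made up for illustration, the search is for a single '<' using CHARINDEX rather than the question's PATINDEX pattern, and the RTRIM from the question is left out because trimming would shorten the left part and shift the offset.

```sql
DECLARE @pmReportText nvarchar(max) = N'<a>text</a><b></b><c>more</c>';

-- Position of the '<' in the first '></', as in the question (PATINDEX finds the '>').
DECLARE @emptyTag int = PATINDEX(N'%></%', @pmReportText) + 1;

-- Everything to the left of @emptyTag, reversed so the nearest '<' comes first.
DECLARE @leftOfTag nvarchar(max) = REVERSE(LEFT(@pmReportText, @emptyTag - 1));

-- Distance in characters from @emptyTag back to that '<' (0 would mean "not found").
DECLARE @distance int = CHARINDEX(N'<', @leftOfTag);

-- Subtracting the distance gives a position in the original string.
DECLARE @startEmptyTag int = @emptyTag - @distance;

SELECT @emptyTag AS EmptyTag,
       @startEmptyTag AS StartEmptyTag,
       SUBSTRING(@pmReportText, @startEmptyTag, @emptyTag - @startEmptyTag) AS OpeningTag;
```

With this sample, @emptyTag is 15, the distance is 3, and @startEmptyTag is 12, which is where '<b>' begins in the original string.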

SQL statement to check for empty string - T-SQL

I am trying to write a WHERE clause for where a certain string variable is not null or empty. The problem I am running into is that certain non-empty strings equal the N'' literal. For instance:
declare @str nvarchar(max) = N'㴆';
select case when @str = N'' then 1 else 0 end;
Yields 1. From what I can gather on Wikipedia, this particular Unicode character is a pictograph for submerging something, which is not semantically equal to an empty string. Also, the string length is 1, at least in T-SQL.
Is there a better (accurate) way to check a T-SQL variable for the empty string?

I found a blog, https://bbzippo.wordpress.com/2013/09/10/sql-server-collations-and-string-comparison-issues/
which explained that
The problem is because the “default” collation setting
(SQL_Latin1_General_CP1_CI_AS) for SQL Server cannot properly compare
Unicode strings that contain so called Supplementary Characters
(4-byte characters).
A fix is to use a collation that doesn't have problems with the supplementary characters. For example:
select case when N'㴆' COLLATE Latin1_General_100_CI_AS_KS_WS = N'' then 1 else 0 end;
will return 0. See the blog for more examples.
Since you are comparing to the empty string, another solution would be to test the string length.
declare @str1 nvarchar(max) =N'㴆';
select case when len(@str1) = 0 then 1 else 0 end;
This will return 0 as expected.
This also yields 0 when the string is null.
EDIT:
Thanks to devio's comment, I dug a bit deeper and found a comment from Erland Sommarskog https://groups.google.com/forum/#!topic/microsoft.public.sqlserver.server/X8UhQaP9KF0
that in addition to not supporting Supplementary Characters, the Latin1_General_CP1_CI_AS collation doesn't handle new Unicode characters correctly. So I'm guessing that the 㴆 character is a new Unicode character.
Specifying the collation Latin1_General_100_CI_AS will also fix this issue.
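
The two checks above can be put side by side, together with one further option that is not part of the answer: DATALENGTH, which counts bytes. It is included here as an assumption about what "empty" should mean, because LEN ignores trailing spaces and therefore treats an all-space string as empty. The variable @spaces is made up for the comparison.

```sql
DECLARE @str    nvarchar(max) = N'㴆';
DECLARE @spaces nvarchar(max) = N'   ';

SELECT
    -- Explicit version-100 collation, as suggested above.
    CASE WHEN @str COLLATE Latin1_General_100_CI_AS = N'' THEN 1 ELSE 0 END AS StrEmptyByCollation,
    -- Length test from the answer.
    CASE WHEN LEN(@str) = 0 THEN 1 ELSE 0 END                             AS StrEmptyByLen,
    -- LEN ignores trailing spaces, so an all-space string counts as empty here.
    CASE WHEN LEN(@spaces) = 0 THEN 1 ELSE 0 END                          AS SpacesEmptyByLen,
    -- DATALENGTH counts bytes, trailing spaces included.
    CASE WHEN DATALENGTH(@spaces) = 0 THEN 1 ELSE 0 END                   AS SpacesEmptyByDataLength;
```

In a WHERE clause the same idea would read `col IS NOT NULL AND DATALENGTH(col) > 0`, if strings consisting only of spaces should count as non-empty.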

String or binary data would be truncated - large strings

We have an extremely large nvarchar(max) field that contains html. Within this html is an img tag.
Example:
<img style="float:right" src="data:image/png;base64,/9j/4AAQSkZJRgABAQEBLAEsAAD/7gAOQW....
The length of this column is 1645151; what is being replaced is a bit less than this, but not by much.
What we are trying to do is a replace in SQL on the column:
declare @url varchar(50) = 'myimageurl';
UPDATE table SET field =
CAST(REPLACE(CAST(field as NVARCHAR(MAX)),@source,'@url') AS NVARCHAR(MAX))
Here @source is the above image bytes as a string, assigned to an nvarchar(max) variable before running the replace, and the replacement (@url) is the URL of an image rather than the image's bytes as a string.
However, I still get the message "String or binary data would be truncated".
Does anyone know if it is possible in SQL to replace strings as large as this?

I had the same error, but on a different function.
The fault was that my pattern was longer than my expression, which means the search pattern gets truncated.
I hope this helps someone.
Also, make sure you put the pattern and the expression in the right argument positions of your function.

Instead of doing the replace, can you rebuild the entire field by parsing out the rest of the img tag?
Something like:
declare @Field nvarchar(max) = '<img style="float:right" src="data:image/png;base64,/9j/4AAQSkZJRgA....BAQEBLAEsAAD/7gAOQW" />'
declare @Source nvarchar(max) = 'data:image/png;base64,/9j/4AAQSkZJRgA....BAQEBLAEsAAD/7gAOQW'
declare @URL nvarchar(max) = 'www.img.img/img.png'
declare @Chars int = 20
select left(@Field,patindex('%' + left(@Source,@Chars) + '%', @Field) - 1) as HTMLStart
    ,@URL as ImgURL
    ,right(@Field,len(@Field) - patindex('%' + right(@Source,@Chars) + '%', @Field) - @Chars + 1) as HTMLEnd
If you wanted to run this on a whole dataset at once, you would simply need to look for the src="data:image/png;base64, element and work backwards from there using a similar methodology to the above. It depends on how you are identifying which binary data to replace and what to replace it with.
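
A minimal sketch of that marker-based variant follows. It rests on assumptions that are not in the question: each value holds at most one embedded image, and the data URI ends at the next double quote. The sample HTML, @Marker and the position variables are made up for illustration.

```sql
-- Hypothetical sample value with a (very short) embedded image.
DECLARE @Field nvarchar(max) =
    N'<p>intro</p><img style="float:right" src="data:image/png;base64,AAAABBBBCCCC" /><p>outro</p>';
DECLARE @URL    nvarchar(max) = N'www.img.img/img.png';
DECLARE @Marker nvarchar(50)  = N'src="data:image/png;base64,';

-- Where the src attribute starts, where its value starts, and where the closing quote is.
DECLARE @MarkerPos bigint = CHARINDEX(@Marker, @Field);
DECLARE @DataStart bigint = @MarkerPos + LEN(N'src="');
DECLARE @DataEnd   bigint = CHARINDEX(N'"', @Field, @DataStart);

-- Rebuild as: everything before the data URI + the URL + everything from the closing quote on.
DECLARE @Rebuilt nvarchar(max) =
    CASE WHEN @MarkerPos > 0 AND @DataEnd > 0
         THEN LEFT(@Field, @DataStart - 1) + @URL + SUBSTRING(@Field, @DataEnd, LEN(@Field))
         ELSE @Field  -- no embedded image found: leave the value untouched
    END;

SELECT @Rebuilt AS Rebuilt;
```

Only the short marker is searched for, so the large base64 payload never has to be passed to a function as a search string.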

SQL Server : Update Table - Data Mask

I have created a data mask that finds a 16 digit number anywhere within a string and replaces all but the last four characters with X's.
But instead of manually setting the string I need to update all data within a column located in a table. Please see my code so far:
DECLARE
@NOTES AS VARCHAR(8000)
SET @NOTES = 'Returns the starting position of the first occurrence of a pattern in a specified expression, 1234567891234567 or zeros if the pattern is not found, on all valid text and character data types'
SELECT
REPLACE(@NOTES, SUBSTRING(@NOTES, PATINDEX('%1%2%3%4%5%6%7%8%9%', @NOTES), 16), 'XXXXXXXXXXXX' + RIGHT(SUBSTRING(@NOTES, PATINDEX('%1%2%3%4%5%6%7%8%9%', @NOTES),16),4)) AS REPLACEMENT
Any help would be much appreciated :-)

Create a function with your logic:
CREATE FUNCTION MyMask(
@NOTES VARCHAR(8000))
returns varchar(8000)
BEGIN
RETURN
REPLACE(@NOTES, SUBSTRING(@NOTES, PATINDEX('%1%2%3%4%5%6%7%8%9%', @NOTES), 16), 'XXXXXXXXXXXX' + RIGHT(SUBSTRING(@NOTES, PATINDEX('%1%2%3%4%5%6%7%8%9%', @NOTES),16),4))
END
This is how you use it:
update table
set field = dbo.myMask(field)
where some condition

The function provided by Horaciux works on a statically declared string, but PATINDEX always returned 0 when used in an update query.
The workaround was to change the PATINDEX pattern from '%1%2%3%4%5%6%7%8%9%' to '%[123456789]%'. I have included the full function below:
CREATE FUNCTION [dbo].[MyMask](@NOTES VARCHAR(8000)) RETURNS VARCHAR(8000)
BEGIN
RETURN
REPLACE(@NOTES, SUBSTRING(@NOTES, PATINDEX('%[123456789]%', @NOTES), 16), 'XXXXXXXXXXXX' + RIGHT(SUBSTRING(@NOTES, PATINDEX('%[123456789]%', @NOTES),16),4))
END
I hope this is useful to others :-)
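
Note that '%[123456789]%' matches the first non-zero digit anywhere in the text, not specifically the start of a 16-digit run. A stricter pattern can be sketched by repeating a digit class sixteen times. This is an alternative offered under assumptions, not the method used above: the sample text, @Pattern, @Pos and @Masked are made-up names, only the first run is masked, and a longer run of digits would also match on its first 16.

```sql
DECLARE @NOTES varchar(8000) = 'Order 42 was paid with card 1234567891234567 on 3 May.';

-- '%' + sixteen copies of [0-9] + '%': matches 16 consecutive digits.
DECLARE @Pattern varchar(200) = '%' + REPLICATE('[0-9]', 16) + '%';

-- Start of the first 16-digit run, or 0 if there is none.
DECLARE @Pos int = PATINDEX(@Pattern, @NOTES);

-- Overwrite the first 12 digits of the run and keep the last four.
DECLARE @Masked varchar(8000) =
    CASE WHEN @Pos > 0
         THEN STUFF(@NOTES, @Pos, 12, REPLICATE('X', 12))
         ELSE @NOTES
    END;

SELECT @Masked AS Masked;
```

Here the short number 42 is skipped, because it is not followed by fifteen more digits, and the card number becomes XXXXXXXXXXXX4567.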
