Apply a Mask to Format a String in SQL Server Query/View - sql

Is there a neat way to apply a mask to a string in a SQL Server query?
I have two tables: one stores the phone number as a varchar with no literal characters (0155567890), and the other stores the phone type, which has a mask for that type of number: (##) #### ####
What is the best way to return a string (for a merge Document) so that the query returns the fully formatted phone number:
(01) 5556 7890

As noted in the comment, my original answer below will result in terrible performance if used in a large number of rows. i-one's answer is preferred if performance is a consideration.
I needed this also, and thanks to Sjuul's pseudocode, I was able to create a function to do this.
CREATE FUNCTION [dbo].[fx_FormatUsingMask]
(
    -- Add the parameters for the function here
    @input nvarchar(1000),
    @mask nvarchar(1000)
)
RETURNS nvarchar(1000)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @result nvarchar(1000) = ''
    DECLARE @inputPos int = 1
    DECLARE @maskPos int = 1
    DECLARE @maskSign char(1) = ''
    WHILE @maskPos <= Len(@mask)
    BEGIN
        set @maskSign = substring(@mask, @maskPos, 1)
        IF @maskSign = '#'
        BEGIN
            set @result = @result + substring(@input, @inputPos, 1)
            set @inputPos += 1
            set @maskPos += 1
        END
        ELSE
        BEGIN
            set @result = @result + @maskSign
            set @maskPos += 1
        END
    END
    -- Return the result of the function
    RETURN @result
END

Just in case someone ever needs a table-valued function.
Approach 1 (see #2 for a faster version)
create function ftMaskPhone
(
    @phone varchar(30),
    @mask varchar(50)
)
returns table as
return
with ci(n, c, nn) as (
    select
        1,
        case
            when substring(@mask, 1, 1) = '#' then substring(@phone, 1, 1)
            else substring(@mask, 1, 1)
        end,
        case when substring(@mask, 1, 1) = '#' then 1 else 0 end
    union all
    select
        n + 1,
        case
            when substring(@mask, n + 1, 1) = '#' then substring(@phone, nn + 1, 1)
            else substring(@mask, n + 1, 1)
        end,
        case when substring(@mask, n + 1, 1) = '#' then nn + 1 else nn end
    from ci where n < len(@mask))
select (select c + '' from ci for xml path(''), type).value('text()[1]', 'varchar(50)') PhoneMasked
GO
Then apply it as
declare @mask varchar(50)
set @mask = '(##) #### ####'
select pm.PhoneMasked
from Phones p
outer apply ftMaskPhone(p.PhoneNum, @mask) pm
Approach 2
I'm going to leave the above version for historical purposes. However, this one has better performance.
CREATE FUNCTION dbo.ftMaskPhone
(
    @phone varchar(30),
    @mask varchar(50)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    WITH v1(N) AS (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
        UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),
    v2(N) AS (SELECT 1 FROM v1 a, v1 b),
    v3(N) AS (SELECT TOP (ISNULL(LEN(@mask), 0)) ROW_NUMBER() OVER (ORDER BY @@SPID) FROM v2),
    v4(N, C) AS (
        SELECT N, ISNULL(SUBSTRING(@phone, CASE WHEN c.m = 1 THEN ROW_NUMBER() OVER (PARTITION BY c.m ORDER BY N) END, 1), SUBSTRING(@mask, v3.N, 1))
        FROM v3
        CROSS APPLY (SELECT CASE WHEN SUBSTRING(@mask, v3.N, 1) = '#' THEN 1 END m) c
    )
    SELECT MaskedValue = (
        SELECT c + ''
        FROM v4
        ORDER BY N
        FOR XML PATH(''), TYPE
    ).value('text()[1]', 'varchar(50)')
);
GO
Schema binding, in combination with this being a single-statement table-valued-function, makes this version eligible for inlining by the query optimizer. Implement the function using a CROSS APPLY as in the example above, or for single values, like this:
SELECT *
FROM dbo.ftMaskPhone('0012345678910', '### (###) ###-####')
Results look like:
MaskedValue
001 (234) 567-8910

This is just what came up in my head. I don't know whether it's the best solution but I think it should be workable.
Make a function with the name applyMask (or so).
Pseudocode:
currentPosition = 1
maskPosition = 1
WHILE maskPosition <= Length(Mask)
    currentSign = Mid(Mask, maskPosition, 1)
    IF currentSign = "#"
        result += Mid(PhoneNr, currentPosition, 1)
        currentPosition++
    ELSE
        result += currentSign
    END
    maskPosition++
END
Return result
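The loop described in the pseudocode above can be tried outside SQL Server. Below is a minimal Python sketch of the same idea, not a translation of any particular answer; the name apply_mask and the placeholder parameter are assumptions made for illustration.

```python
def apply_mask(digits: str, mask: str, placeholder: str = "#") -> str:
    """Fill each placeholder in mask with the next character of digits."""
    result = []
    pos = 0  # index of the next unused character in digits
    for sign in mask:
        if sign == placeholder:
            # Slicing past the end yields '', much like SUBSTRING in T-SQL.
            result.append(digits[pos:pos + 1])
            pos += 1
        else:
            # Literal mask characters are copied through unchanged.
            result.append(sign)
    return "".join(result)


print(apply_mask("0155567890", "(##) #### ####"))  # (01) 5556 7890
```

Because the phone number stays a string throughout, the leading zero survives, which is the main difference from approaches that cast to a number first.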

As noted by @Sean, SQL Server 2012 and up supports the FORMAT function, which almost gives you what you need, with the following caveats:
It takes a number to format, rather than a VARCHAR. This could be worked around by using a CAST.
The mask as provided ((##) #### ####), coupled with a CAST, would remove the leading zero, leaving you with (1) 5556 7890. You could update the mask to (0#) #### ####. Going out on a limb and assuming that you're representing an Australian phone number, it seems that the leading 0 is always there anyway:
Within Australia, to access the "Number" of a landline telephone in an "Area" other than that in which the caller is located (including a caller using a "Mobile" 'phone), firstly it is necessary to dial the Australian "Trunk Access Code" of 0 plus the "Area" code, followed by the "Local" Number. Thus, the "Full National Number" (FNN) has ten digits: 0x xxxx xxxx.
But ultimately, I would argue that SQL Server is not the best place to handle representation/formatting of your data (as with dates, so with phone numbers). I would recommend doing this client-side using something like Google's libphonenumber. When a phone number is entered into the database, you could store the phone number itself and the country to which it belongs, which you could then use when displaying the phone number (or doing something like calling it or checking for validity).

There is the built in FORMAT function, which almost works. Unfortunately it takes an int as the first parameter, so it strips off the leading zero:
select format(0155567890 ,'(##) #### ####')
(1) 5556 7890

If you need to "mask" a string, that is, hide the real value behind another one, and then "unmask" it, you can try this function, or extend it for that matter. :)
https://stackoverflow.com/a/22023329/2175524

I wanted to hide some information, so I used the RIGHT function. It shows only the last 4 characters of the string.
CONCAT('xxx-xx-', RIGHT('03466045896', 4))
The code above returns "xxx-xx-5896".

Related

SQL change a string using a pattern

I need to do something special in SQL, I don't know if a standard function exists, I actually don't know what to search... ! So any advice would be appreciated.
Here is my problem:
I have a data which is a number: 7000000
And I have a "formatting pattern": ****5**
My goal is to merge both: result for this example is: 7000500
(a star means to keep the original value, and a number means to change it)
another example:
7894321
*0**9*1
-------
7094921
(I use SQL Server)
This task can be performed in any programming language with a basic for-loop and some built-in functions to find a substring and replace it.
Here is how it's done in SQL Server (given that the string and the format are of the same length).
Create your own function
CREATE FUNCTION dbo.Formatter
(
    @str NVARCHAR(MAX),
    @format NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
AS
BEGIN
    DECLARE @i int = 1, @len int = LEN(@str)
    -- Iterate over each char in the FORMAT and replace the original char where it applies
    WHILE @i <= @len
    BEGIN
        IF SUBSTRING(@format, @i, 1) <> '*'
            SET @str = STUFF(@str, @i, 1, SUBSTRING(@format, @i, 1))
        SET @i = @i + 1
    END
    RETURN @str
END
USE your function in your SELECTs, e.g.
DECLARE @str VARCHAR(MAX) = '7894321'
DECLARE @format VARCHAR(MAX) = '*0**9*1'
PRINT('Format: ' + @format)
PRINT('Old string: ' + @str)
PRINT('New string: ' + dbo.Formatter(@str, @format))
Result:
Format: *0**9*1
Old string: 7894321
New string: 7094921
I would split the string into its individual characters, use a CASE expression to determine which character should be retained, and then remerge. I'm going to assume you're on a recent version of SQL Server, and thus have access to STRING_AGG; if not, you'll want to use the "old" FOR XML PATH method. I also assume a string of length 10 or less (if it's more, then just increase the size of the tally).
DECLARE @Format varchar(10) = '****5**';
WITH Tally AS(
    SELECT V.I
    FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10))V(I))
SELECT STRING_AGG(CASE SSf.C WHEN '*' THEN SSc.C ELSE SSf.C END,'') WITHIN GROUP (ORDER BY T.I) AS NewString
FROM (VALUES(1,'7894321'),(2,'7000000'))YT(SomeID,YourColumn) --This would be your table
     JOIN Tally T ON LEN(YT.YourColumn) >= T.I
     CROSS APPLY (VALUES(SUBSTRING(YT.YourColumn,T.I,1)))SSc(C)
     CROSS APPLY (VALUES(SUBSTRING(@Format,T.I,1)))SSf(C)
GROUP BY YT.SomeID;
If your value is an int and your format is an int also, then you can do this:
DECLARE @formatNum int = CAST(REPLACE(@format, '*', '0') AS int);
SELECT f.Formatted
FROM [Table] t
CROSS APPLY (
    SELECT Formatted = SUM(CASE
            WHEN ((@formatNum / Base) % 10) > 0
            THEN ((@formatNum / Base) % 10) * Base
            ELSE ((t.MyNumber / Base) % 10) * Base END)
    FROM (VALUES
        -- start at 1 so the units digit is included
        (1),(10),(100),(1000),(10000),(100000),(1000000),(10000000),(100000000),(1000000000)
    ) v(Base)
) f
The / is integer division, % is modulo, so dividing a number by Base and taking the modulo 10 will give you exactly one digit.
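To see the digit arithmetic in isolation, here is a small Python sketch of the same divide-and-modulo idea. The function name merge_digits is an assumption for illustration. Note one consequence of replacing '*' with 0: a literal 0 in the format cannot be told apart from "keep the original digit", so the second example from the question does not come out as 7094921 with this approach.

```python
def merge_digits(number: int, fmt: str) -> int:
    """Overlay the non-zero digits of fmt (with '*' read as 0) onto number."""
    format_num = int(fmt.replace("*", "0"))
    total = 0
    base = 1
    for _ in range(len(fmt)):
        fmt_digit = (format_num // base) % 10  # one digit of the format
        num_digit = (number // base) % 10      # same digit of the value
        # A non-zero format digit wins; otherwise keep the original digit.
        total += (fmt_digit if fmt_digit > 0 else num_digit) * base
        base *= 10
    return total


print(merge_digits(7000000, "****5**"))  # 7000500
print(merge_digits(7894321, "*0**9*1"))  # 7894921 (the literal 0 is ignored)
```

If the format must be able to force a 0, the string-based approaches above are the safer choice.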

mssql cdc update_mask filter changes made only in column TS

I want to find the rows in my mssql cdc table where only the column "TS" has been changed.
So I found some logic to check if a specific column was changed (this works), but I need to check if only the column TS was changed:
SET @colorder = sys.fn_cdc_get_column_ordinal('dbo_mytable', 'TS')
SELECT case when substring([__$update_mask],len([__$update_mask]) - ((@colorder-1)/8),1) & power(2,(@colorder-1)%8) > 0 then 1 else 0 end
FROM cdc.fn_cdc_get_all_changes_dbo_MYTABLE(@from_lsn, @to_lsn, 'all') PD
I've been meaning to write functions like this for a while, thanks for giving me a reason to actually do it.
Please do some unit testing of your own; I have only done a few very basic checks.
-- inline tabular function because it's more versatile
-- cross applies used purely for ease of reading,
-- could just make nested calls but hard to read. Up to you.
-- pass null to flip, otherwise pass explicit value you want the bit to be set to
create function dbo.setbit(@b varbinary(128), @bitpos tinyint, @value bit = null)
returns table as
return
(
    select result = cast(result.val as varbinary(128))
    from (select len(@b) - ((@bitpos - 1) / 8)) bytepos(val)
    cross apply (select substring(@b, bytepos.val, 1)) byte(val)
    cross apply (select power(2, (@bitpos - 1) % 8)) mask(val)
    cross apply (
        select cast
        (
            case @value
                when 1 then byte.val | mask.val
                when 0 then byte.val & ~mask.val
                else byte.val ^ mask.val
            end
            as binary(1)
        )
    ) newbyte(val)
    cross apply (select stuff(@b, bytepos.val, 1, newbyte.val)) result(val)
);
-- scalar wrapper for tabular function
create function dbo.setbitscalar(@b varbinary(128), @bitpos tinyint, @value bit = null)
returns varbinary(128) as
begin
    return (select result from dbo.setbit(@b, @bitpos, @value));
end
-- how it works
declare @b varbinary(128) = 0x0101 -- 2 bytes!
select
    dbo.setbitscalar(@b, 1, 1), -- set bit 1 to 1
    dbo.setbitscalar(@b, 1, 0), -- set bit 1 to 0
    dbo.setbitscalar(@b, 1, default) -- flip bit 1
-- how to use it in your case:
-- set the @colorder bit in the mask to zero,
-- then see if the resulting mask is zero
-- if it is, then only TS changed
SET @colorder = sys.fn_cdc_get_column_ordinal('dbo_mytable', 'TS')
select only_TS_changed = iif(t.result = 0x, 1, 0)
from cdc.fn_cdc_get_all_changes_dbo_MYTABLE(@from_lsn, @to_lsn, 'all') PD
cross apply dbo.setbit(PD.[__$update_mask], @colorder, 0) t
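The bit arithmetic is easier to check on plain bytes. The Python sketch below mirrors the idea (clear one bit, then test whether anything is left); the function names are assumptions for illustration, and bit 1 is taken to be the lowest bit of the rightmost byte, as in the expressions above. Unlike the query, it also requires that the target bit was set to begin with.

```python
def clear_bit(mask: bytes, bitpos: int) -> bytes:
    """Return mask with bit number bitpos (1-based, counted from the right) set to 0."""
    byte_index = len(mask) - 1 - (bitpos - 1) // 8  # rightmost byte holds bits 1..8
    bit = 1 << ((bitpos - 1) % 8)
    out = bytearray(mask)
    out[byte_index] &= ~bit & 0xFF
    return bytes(out)


def only_column_changed(mask: bytes, bitpos: int) -> bool:
    """True when bitpos is the one and only bit set in mask."""
    cleared = clear_bit(mask, bitpos)
    was_set = cleared != mask
    return was_set and not any(cleared)


print(only_column_changed(b"\x00\x04", 3))  # True: only bit 3 is set
print(only_column_changed(b"\x00\x06", 3))  # False: bit 2 is set as well
```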

TSQL - Split GUID/UNIQUEIDENTIFIER

Case: We have smart guids in a table and need to extract 2nd and 4th parts out of it. I was thinking about writing a function that can take in @partnumber and return the extracted value for it.
e.g.
DECLARE @Guid UNIQUEIDENTIFIER = 'A7DDAA60-C33A-4D7A-A2D8-ABF20127C9AE'
1st part = A7DDAA60, 2nd part = C33A, 3rd part = 4D7A, 4th part = A2D8, and 5th part = ABF20127C9AE
Based on the @partnumber, it would return one of those values.
I'm trying to figure out how to split it most efficiently (STRING_SPLIT doesn't guarantee order).
I am not sure exactly what you mean by "smart" guids, but why not just cast it to a char and pull out the parts by position?
create table t(myguid uniqueidentifier);
declare @p tinyint = 5;
select case @p
           when 1 then left(c.v, 8)
           when 2 then substring(c.v, 10, 4)
           when 3 then substring(c.v, 15, 4)
           when 4 then substring(c.v, 20, 4)
           when 5 then right(c.v, 12)
       end
from t
cross apply (select cast(t.myguid as char(36))) c(v)
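The fixed offsets work because the text form of a GUID always has the 8-4-4-4-12 layout. As a quick cross-check of those offsets, here is a Python sketch; the names PART_SLICES and guid_part are assumptions for illustration, and the slices are the zero-based equivalents of the substring positions above.

```python
import uuid

# Zero-based slices matching the 8-4-4-4-12 text layout of a GUID.
PART_SLICES = {
    1: slice(0, 8),
    2: slice(9, 13),
    3: slice(14, 18),
    4: slice(19, 23),
    5: slice(24, 36),
}


def guid_part(guid: str, part: int) -> str:
    """Return the requested dash-separated part (1 to 5) of a GUID string."""
    text = str(uuid.UUID(guid)).upper()  # validates and normalizes to 36 chars
    return text[PART_SLICES[part]]


print(guid_part("A7DDAA60-C33A-4D7A-A2D8-ABF20127C9AE", 2))  # C33A
print(guid_part("A7DDAA60-C33A-4D7A-A2D8-ABF20127C9AE", 4))  # A2D8
```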
You can use OPENJSON:
DECLARE @Guid UNIQUEIDENTIFIER = 'A7DDAA60-C33A-4D7A-A2D8-ABF20127C9AE',
        @s varchar(100)
Select @s = replace(@Guid,'-','","')
Select * from
(
    Select [key] + 1 as Position, Value as Part
    FROM OPENJSON('["' + @s + '"]')
) Q
Where Position in (2,4)

Parsing / Indexing a Binary String in SQL Server

I have searched extensively for a relevant answer, but none quite satisfy what I need to be doing.
For our purposes I have a column with a 50 character binary string. In our database, it is actually hundreds of characters long.
There is one string for each unique item ID in our database. The location of each '1' flags a specific criteria being true, and a '0' false, so the indexed location of the ones and zeros are very important. Mostly, I care about where the 1's are.
I am not updating any databases, so I first decided to try and make a loop to look through each string and create a list of the 1's locations.
declare @binarystring varchar(50) = '10000010000110000001000000000000000000000000000001'
declare @position int = 0
declare @list varchar(200) = ''
while (@position <= len(@binarystring))
begin
    set @position = charindex('1', @binarystring, @position)
    set @list = @list + ', ' + convert(varchar(10),@position)
    set @position = charindex('1', @binarystring, @position)+1
end
select right(@list, len(@list)-2)
This creates the following list:
1, 7, 12, 13, 20, 50
However, the loop will bomb if there is not a '1' at the end of the string, as I am searching through the string via occurrences of 1's rather than one character at a time. I am not sure how to satisfy the break criteria when the loop would normally reach the end of the string without there being a 1.
Is there a simple solution to my loop bombing, and should I even be looping in the first place?
I have tried other methods of parsing, union joining, indexing, etc, but given this very specific set of circumstances I couldn't find any combination that did quite what I needed. The above code is the best I've got so far.
I don't specifically need a comma delimited list as an output, but I need to know the location of all 1's within the string. The amount of 1's vary, but the string size is always the same.
This is my first time posting to stackoverflow, but I have used answers many times. I seek to give a clear question with relevant information. If there is anything I can do to help, I will try to fulfill any requests.
How about changing the while condition to this?
while (charindex('1', @binarystring, @position) > 0)
while (@position <= len(@binarystring))
begin
    set @position = charindex('1', @binarystring, @position)
    if @position != 0
    begin
        set @list = @list + ', ' + convert(varchar(10),@position)
        set @position = charindex('1', @binarystring, @position)+1
    end
    else
    begin
        break
    end;
end
It's often useful to have a source of large ranges of sequential integers handy. I have a table, dbo.range, that has a single column, id, containing all the sequential integers from -500,000 to +500,000. That column is a clustered primary key, so lookups against it are fast. With such a table, solving your problem is easy.
Assuming your table has a schema something like
create table dbo.some_table_with_flags
(
id int not null primary key ,
flags varchar(1000) not null ,
)
The following query should do you:
select row_id = t.id ,
       flag_position = r.id
from dbo.some_table_with_flags t
join dbo.range r on r.id between 1 and len(t.flags)
                and substring(t.flags,r.id,1) = '1'
For each 1 value in the flags column, you'll get a row containing the ID from your source table's ID column, plus the position in which the 1 was found in flags.
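Both ideas in this thread (testing every position, and jumping from one 1 to the next with a proper stop condition) can be checked quickly outside the database. The Python sketch below shows each; the function names are assumptions for illustration, and positions are 1-based to match CHARINDEX and SUBSTRING.

```python
def one_positions(flags: str) -> list[int]:
    """Test every position, like joining the string against a numbers table."""
    return [i for i, ch in enumerate(flags, start=1) if ch == "1"]


def one_positions_find(flags: str) -> list[int]:
    """Jump from one '1' to the next, stopping when no further '1' exists."""
    positions = []
    start = 0
    while True:
        idx = flags.find("1", start)  # -1 plays the role of CHARINDEX returning 0
        if idx == -1:
            break
        positions.append(idx + 1)
        start = idx + 1
    return positions


print(one_positions("1000001000011"))          # [1, 7, 12, 13]
print(one_positions_find("1000001000011000"))  # [1, 7, 12, 13]
```

The second function ends cleanly even when the string finishes with zeros, which is the case that broke the original loop.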
There are a number of techniques for generating such sequences. This link shows several:
http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
For instance, you could use common table expressions (CTEs) to generate your sequences, like this:
WITH
s1(n) AS -- 10 (10^1)
( SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
) ,
s2(n) as ( select 1 from s1 a cross join s1 b ) , -- 10^2 100
s3(n) as ( select 1 FROM s1 a cross join s2 b ) , -- 10^3 1,000
s4(n) as ( select 1 from s1 a cross join s3 b ) , -- 10^4 10,000
s5(n) as ( select 1 from s1 a cross join s4 b ) , -- 10^5 100,000
s6(n) as ( select 1 from s1 a cross join s5 b ) , -- 10^6 1,000,000
seq(n) as ( select row_number() over ( order by n ) from s6 )
select *
from dbo.some_table_with_flags t
join seq s on s.n between 1 and len(t.flags)
          and substring(t.flags,s.n,1) = '1'

Split string in SQL Server to a maximum length, returning each as a row

Is there a way to split a string (from a specific column) to n-number chars without breaking words, with each result in its own row?
Example:
2012-04-24 Change request #3 for the contract per terms and conditions and per John Smith in the PSO department Customer states terms should be Net 60 not Net 30. Please review signed contract for this information.
Results:
2012-04-24 Change request #3 for the contract per terms and conditions and per John Smith in the
PSO department Customer states terms should be Net 60 not Net 30.
Please review signed contract for this information.
I know I can use charindex to find the last space, but I'm not sure how I can get the remaining ones and return them as rows.
Try something like this. Maybe you can create a SQL function from the following implementation.
DECLARE @Str VARCHAR(1000)
SET @Str = '2012-04-24 Change request #3 for the contract per terms and conditions and per John Smith in the PSO department Customer states terms should be Net 60 not Net 30. Please review signed contract for this information.'
DECLARE @End INT
DECLARE @Split INT
SET @Split = 100
declare @SomeTable table
(
    Content varchar(3000)
)
WHILE (LEN(@Str) > 0)
BEGIN
    IF (LEN(@Str) > @Split)
    BEGIN
        SET @End = LEN(LEFT(@Str, @Split)) - CHARINDEX(' ', REVERSE(LEFT(@Str, @Split)))
        INSERT INTO @SomeTable VALUES (RTRIM(LTRIM(LEFT(LEFT(@Str, @Split), @End))))
        SET @Str = SUBSTRING(@Str, @End + 1, LEN(@Str))
    END
    ELSE
    BEGIN
        INSERT INTO @SomeTable VALUES (RTRIM(LTRIM(@Str)))
        SET @Str = ''
    END
END
SELECT *
FROM @SomeTable
Output will be like this:
2012-04-24 Change request #3 for the contract per terms and conditions and per John Smith in the
PSO department Customer states terms should be Net 60 not Net 30. Please review signed contract
for this information.
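The approach above (cut at the last space that fits, then repeat on the remainder) can be sketched in Python to experiment with different widths. This is not a line-by-line translation of the T-SQL; the function name split_lines is an assumption, as is the choice to hard-break a word longer than the width.

```python
def split_lines(text: str, width: int) -> list[str]:
    """Split text into chunks of at most width characters, breaking at spaces."""
    lines = []
    rest = text.strip()
    while len(rest) > width:
        # Last space at index <= width, so the chunk before it fits in width.
        cut = rest.rfind(" ", 0, width + 1)
        if cut <= 0:
            cut = width  # assumption: a word longer than width is hard-broken
        lines.append(rest[:cut].rstrip())
        rest = rest[cut:].lstrip()
    if rest:
        lines.append(rest)
    return lines


for line in split_lines("Please review signed contract for this information.", 20):
    print(line)
```

Each returned line is guaranteed to be no longer than the requested width, which is easy to assert when trying other inputs.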
I read several articles, and each solution either had errors, performed badly, or did not work for small or large chunk lengths. You can read my comments under the other answers here. I finally found a good answer and decided to share it on this question. I haven't checked performance in various scenarios, but I think it is acceptable, and it works fine for both small and large chunk lengths.
This is the code:
CREATE function SplitString
(
    @str varchar(max),
    @length int
)
RETURNS @Results TABLE( Result varchar(50),Sequence INT )
AS
BEGIN
    DECLARE @Sequence INT
    SET @Sequence = 1
    DECLARE @s varchar(50)
    WHILE len(@str) > 0
    BEGIN
        SET @s = left(@str, @length)
        INSERT @Results VALUES (@s,@Sequence)
        IF(len(@str)<@length)
            BREAK
        SET @str = right(@str, len(@str) - @length)
        SET @Sequence = @Sequence + 1
    END
    RETURN
END
The source is @Rhyno's answer to this question: TSQL UDF To Split String Every 8 Characters
Hope this helps.
Just to see if it could be done, I came up with a solution that doesn't loop. It's based on somebody else's function to split a string based on a delimiter.
Note:
This requires that you know the maximum token length ahead of time. The function will stop returning lines upon encountering a token longer than the specified line length. There are probably other bugs lurking as well, so use this code at your own caution.
CREATE FUNCTION SplitLines
(
    @pString VARCHAR(7999),
    @pLineLen INT,
    @pDelim CHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
WITH
E1(N) AS ( --=== Create Ten 1's
    SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 --10
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4),
lines AS (
    SELECT TOP 1
        1 as LineNumber,
        ltrim(rtrim(SUBSTRING(@pString, 1, N))) as Line,
        N + 1 as start
    FROM cteTally
    WHERE N <= DATALENGTH(@pString) + 1
      AND N <= @pLineLen + 1
      AND SUBSTRING(@pString + @pDelim, N, 1) = @pDelim
    ORDER BY N DESC
    UNION ALL
    SELECT LineNumber, Line, start
    FROM (
        SELECT LineNumber + 1 as LineNumber,
            ltrim(rtrim(SUBSTRING(@pString, start, N))) as Line,
            start + N + 1 as start,
            ROW_NUMBER() OVER (ORDER BY N DESC) as r
        FROM cteTally, lines
        WHERE N <= DATALENGTH(@pString) + 1 - start
          AND N <= @pLineLen
          AND SUBSTRING(@pString + @pDelim, start + N, 1) = @pDelim
    ) A
    WHERE r = 1
)
SELECT LineNumber, Line
FROM lines
It's actually quite fast and you can do cool things like join on it. Here's a simple example that gets the first 'line' from every row in a table:
declare @table table (
    id int,
    paragraph varchar(7999)
)
insert into @table values (1, '2012-04-24 Change request #3 for the contract per terms and conditions and per John Smith in the PSO department Customer states terms should be Net 60 not Net 30. Please review signed contract for this information.')
insert into @table values (2, 'Is there a way to split a string (from a specific column) to n-number chars without breaking words, with each result in its own row?')
select t.id, l.LineNumber, l.Line, len(Line)
from @table t
cross apply SplitLines(t.paragraph, 42, ' ') l
where l.LineNumber = 1
I know this is a bit late, but a recursive CTE would let you achieve this.
Also you could make use of a seed table containing a sequence of numbers to feed into the substring as a multiplier for the start index.