mssql cdc update_mask filter changes made only in column TS - sql

I want to find the rows in my mssql cdc table where only the column "TS" has been changed.
So I found some logic to check if a specific column was changed (this works), but I need to check if only the column TS was changed:
SET @colorder = sys.fn_cdc_get_column_ordinal('dbo_mytable', 'TS')
SELECT case when substring([__$update_mask],len([__$update_mask]) - ((@colorder-1)/8),1) & power(2,(@colorder-1)%8) > 0 then 1 else 0 end
FROM cdc.fn_cdc_get_all_changes_dbo_MYTABLE(@from_lsn, @to_lsn, 'all') PD
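To make the byte and bit arithmetic in that expression easier to follow, here is a minimal sketch that traces it on a literal mask. The mask value 0x0202 and the ordinal 10 are made up for illustration, and datalength is used instead of len to count bytes; nothing here touches the real CDC functions.

```sql
-- Hypothetical two-byte mask with bits 2 and 10 set.
-- Column ordinals are 1-based and counted from the rightmost bit.
declare @mask varbinary(128) = 0x0202;
declare @ordinal int = 10;

select
    -- which byte (counted from the left) holds the bit for this ordinal
    byte_position = datalength(@mask) - ((@ordinal - 1) / 8),
    -- value of that bit inside its byte
    bit_value = power(2, (@ordinal - 1) % 8),
    -- same test as the query above: 1 when the column's bit is set
    is_set = case
        when substring(@mask, datalength(@mask) - ((@ordinal - 1) / 8), 1)
             & power(2, (@ordinal - 1) % 8) > 0
        then 1 else 0
    end;
-- byte_position = 1, bit_value = 2, is_set = 1
```

Ordinal 10 is the second bit of the second byte from the right, which in a two-byte mask is byte 1 from the left.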

I've been meaning to write functions like this for a while, thanks for giving me a reason to actually do it.
Please do some unit testing of your own; I have only done a few very basic checks.
-- inline tabular function because it's more versatile
-- cross applies used purely for ease of reading,
-- could just make nested calls but hard to read. Up to you.
-- pass null to flip, otherwise pass explicit value you want the bit to be set to
create function dbo.setbit(@b varbinary(128), @bitpos tinyint, @value bit = null)
returns table as
return
(
select result = cast(result.val as varbinary(128))
from (select len(@b) - ((@bitpos - 1) / 8)) bytepos(val)
cross apply (select substring(@b, bytepos.val, 1)) byte(val)
cross apply (select power(2, (@bitpos - 1) % 8)) mask(val)
cross apply (
select cast
(
case @value
when 1 then byte.val | mask.val
when 0 then byte.val & ~mask.val
else byte.val ^ mask.val
end
as binary(1)
)
) newbyte(val)
cross apply (select stuff(@b, bytepos.val, 1, newbyte.val)) result(val)
);
-- scalar wrapper for tabular function
create function dbo.setbitscalar(@b varbinary(128), @bitpos tinyint, @value bit = null)
returns varbinary(128) as
begin
return (select result from dbo.setbit(@b, @bitpos, @value));
end
-- how it works
declare @b varbinary(128) = 0x0101 -- 2 bytes!
select
dbo.setbitscalar(@b, 1, 1), -- set bit 1 to 1
dbo.setbitscalar(@b, 1, 0), -- set bit 1 to 0
dbo.setbitscalar(@b, 1, default) -- flip bit 1
-- how to use it in your case:
-- set the @colorder bit in the mask to zero,
-- then see if the resulting mask is zero
-- if it is, then only TS changed
SET @colorder = sys.fn_cdc_get_column_ordinal('dbo_mytable', 'TS')
select only_TS_changed = iif(t.result = 0x, 1, 0)
from cdc.fn_cdc_get_all_changes_dbo_MYTABLE(@from_lsn, @to_lsn, 'all') PD
cross apply dbo.setbit(PD.[__$update_mask], @colorder, 0) t

Related

TSQL - Split GUID/UNIQUEIDENTIFIER

Case: We have smart guids in a table and need to extract the 2nd and 4th parts from them. I was thinking about writing a function that can take in @partnumber and return the extracted value for it.
e.g.
DECLARE @Guid UNIQUEIDENTIFIER = 'A7DDAA60-C33A-4D7A-A2D8-ABF20127C9AE'
1st part = A7DDAA60, 2nd part = C33A, 3rd part = 4D7A, 4th part = A2D8, and 5th part = ABF20127C9AE
Based on the @partnumber, it would return one of those values.
I'm trying to figure out how to split it most efficiently (STRING_SPLIT doesn't guarantee order).
I am not sure exactly what you mean by "smart" guids, but why not just cast it to a char and pull out the parts by position?
create table t(myguid uniqueidentifier);
declare @p tinyint = 5;
select case @p
when 1 then left(c.v, 8)
when 2 then substring(c.v, 10, 4)
when 3 then substring(c.v, 15, 4)
when 4 then substring(c.v, 20, 4)
when 5 then right(c.v, 12)
end
from t
cross apply (select cast(t.myguid as char(36))) c(v)
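As a quick check of those offsets, here is a small sketch that applies the same substring positions to the GUID literal from the question, without needing the table. The variable names @guid and @s are only for illustration.

```sql
-- The GUID from the question, cast to its 36-character text form.
declare @guid uniqueidentifier = 'A7DDAA60-C33A-4D7A-A2D8-ABF20127C9AE';
declare @s char(36) = cast(@guid as char(36));

-- Parts sit at fixed offsets because the text form always has
-- the 8-4-4-4-12 layout with hyphens at positions 9, 14, 19 and 24.
select
    part2 = substring(@s, 10, 4),
    part4 = substring(@s, 20, 4);
-- part2 = C33A, part4 = A2D8
```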
You can use OPENJSON:
DECLARE @Guid UNIQUEIDENTIFIER = 'A7DDAA60-C33A-4D7A-A2D8-ABF20127C9AE',
@s varchar(100)
Select @s = replace(@Guid,'-','","')
Select * from
(
Select [key] + 1 as Position, Value as Part
FROM OPENJSON('["' + @s + '"]')
) Q
Where Position in (2,4)
Here is the fiddle.

SQL - STRING_SPLIT string position

I have a table with two columns of comma-separated strings. The way the data is formatted, the number of comma-separated items in both columns is equal, and the first value in colA is related to the first value in colB, and so on. (It's obviously not a very good data format, but it's what I'm working with.)
If I have the following row (PrimaryKeyID | column1 | column2):
1 | a,b,c | A,B,C
then in this data format, a & A are logically related, b & B, etc.
I want to use STRING_SPLIT to split these columns, but using it twice obviously crosses them with each other, resulting in a total of 9 rows.
1 | a | A
1 | b | A
1 | c | A
1 | a | B
1 | b | B
1 | c | B
1 | a | C
1 | b | C
1 | c | C
What I want is just the 3 "logically-related" rows
1 | a | A
1 | b | B
1 | c | C
However, STRING_SPLIT(myCol,',') doesn't appear to save the String Position anywhere.
I have done the following:
SELECT tbl.ID,
t1.Column1Value,
t2.Column2Value
FROM myTable tbl
INNER JOIN (
SELECT t.ID,
ss.value AS Column1Value,
ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.ID) as StringOrder
FROM myTable t
CROSS APPLY STRING_SPLIT(t.column1,',') ss
) t1 ON tbl.ID = t1.ID
INNER JOIN (
SELECT t.ID,
ss.value AS Column2Value,
ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.ID) as StringOrder
FROM myTable t
CROSS APPLY STRING_SPLIT(t.column2,',') ss
) t2 ON tbl.ID = t2.ID AND t1.StringOrder = t2.StringOrder
This appears to work on my small test set, but in my opinion there is no reason to expect it to work guaranteed every time. The ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) is obviously a meaningless ordering, but it appears that, in absence of any real ordering, STRING_SPLIT is returning the values in the "default" order that they were already in. Is this "expected" behaviour? Can I count on this? Is there any other way of accomplishing what I'm attempting to do?
Thanks.
======================
EDIT
I got what I wanted (I think) with the following UDF. However it's pretty slow. Any suggestions?
CREATE FUNCTION fn.f_StringSplit(@string VARCHAR(MAX),@delimiter VARCHAR(1))
RETURNS @r TABLE
(
Position INT,
String VARCHAR(255)
)
AS
BEGIN
DECLARE @current_position INT
SET @current_position = 1
WHILE CHARINDEX(@delimiter,@string) > 0 BEGIN
INSERT INTO @r (Position,String) VALUES (@current_position, SUBSTRING(@string,1,CHARINDEX(@delimiter,@string) - 1))
SET @current_position = @current_position + 1
SET @string = SUBSTRING(@string,CHARINDEX(@delimiter,@string) + 1, LEN(@string) - CHARINDEX(@delimiter,@string))
END
--add the last one
INSERT INTO @r (Position, String) VALUES(@current_position,@string)
RETURN
END
The only way I've discovered to explicitly preserve the order of the values returned by String_Split() is to use the Row_Number() function with a literal value in the "order by".
For example:
declare @Version nvarchar(128)
set @Version = '1.2.3';
with V as (select value v, Row_Number() over (order by (select 0)) n from String_Split(@Version, '.'))
select
(select v from V where n = 1) Major,
(select v from V where n = 2) Minor,
(select v from V where n = 3) Revision
Returns:
Major Minor Revision
----- ----- ---------
1 2 3
Update: if you are using a newer version of SQL Server, you can now provide an optional third bit argument which indicates that an ordinal column should also be included in the result. See my other answer here for more details.
Fortunately in newer SQL Server (Azure and 2022) an optional flag has been added to String_Split to include an "ordinal" column. If you are using a newer version of SQL Server, this finally provides a solution that is logically correct rather than implementation specific.
New definition:
String_Split(string, separator [, enable_ordinal])
e.g. String_Split('1.2.3', '.', 1)
Example:
with V as (select Value v, Ordinal n from String_Split('1.2.3', '.', 1))
select
(select v from V where n = 1) Major,
(select v from V where n = 2) Minor,
(select v from V where n = 3) Revision
Returns:
Major Minor Revision
----- ----- ---------
1 2 3
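Applied to the original question, the ordinal column makes the pairing straightforward. The following is a sketch only, reusing the table and column names from the question (myTable, ID, column1, column2) and assuming a version where enable_ordinal is available.

```sql
-- Requires STRING_SPLIT with enable_ordinal (SQL Server 2022 or Azure SQL).
-- Each column is split with its ordinal, then the two splits are paired
-- on matching ordinals, so a,b,c lines up with A,B,C row by row.
select
    t.ID,
    s1.value as Column1Value,
    s2.value as Column2Value
from myTable t
cross apply string_split(t.column1, ',', 1) s1
cross apply string_split(t.column2, ',', 1) s2
where s1.ordinal = s2.ordinal;
```

If one column can have fewer items than the other, the unmatched items are dropped by this form; an outer join on the ordinal would be needed to keep them.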
Your idea is fine, but your order by is not using a stable sort. I think it is safer to do:
SELECT tbl.ID, t1.Column1Value, t2.Column2Value
FROM myTable tbl INNER JOIN
(SELECT t.ID, ss.value AS Column1Value,
ROW_NUMBER() OVER (PARTITION BY t.ID
ORDER BY CHARINDEX(',' + ss.value + ',', ',' + t.column1 + ',')
) as StringOrder
FROM myTable t CROSS APPLY
STRING_SPLIT(t.column1,',') ss
) t1
ON tbl.ID = t1.ID INNER JOIN
(SELECT t.ID, ss.value AS Column2Value,
ROW_NUMBER() OVER (PARTITION BY t.ID
ORDER BY CHARINDEX(',' + ss.value + ',', ',' + t.column2 + ',')
) as StringOrder
FROM myTable t CROSS APPLY
STRING_SPLIT(t.column2, ',') ss
) t2
ON tbl.ID = t2.ID AND t1.StringOrder = t2.StringOrder;
Note: This may not work as desired if the strings have non-adjacent duplicates.
I'm a little late to this question, but I was just attempting the same thing with string_split since I've run into a performance problem of late. My experience with string splitters in T-SQL has led me to use recursive CTE's for most things containing fewer than 1,000 delimited values. Ideally, a CLR procedure would be used if you need ordinal in your string split.
That said, I've come to a similar conclusion as you on getting ordinal from string_split. You can see the queries and statistics below which, in order, are the bare string_split function, a CTE RowNumber of string_split, and then my personal string split CTE function I derived from this awesome write-up. The main difference between my CTE-based function and the one in the write-up is I made it an Inline-TVF instead of their implementation of a MultiStatement-TVF, which you can read about the differences here.
In my experiments I haven't seen a deviation using ROW_NUMBER on a constant returning the internal order of the delimited string, so I will be using it until such time as I find a problem with it, but if order is imperative in a business setting, I would probably recommend the Moden splitter featured in the first link above, which links to the author's article here since it is right in-line with the performance seen by the less safe string_split with RowNumber approach.
set nocount on;
declare
@iter int = 0,
@rowcount int,
@val varchar(max) = '';
while len(@val) < 1e6
select
@val += replicate(concat(@iter, ','), 8e3),
@iter += 1;
raiserror('Begin string_split Built-In', 0, 0) with nowait;
set statistics time, io on;
select
*
from
string_split(@val, ',')
where
[value] > '';
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End string_split Built-In | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin string_split Built-In with RowNumber', 0, 0) with nowait;
set statistics time, io on;
with cte
as (
select
*,
[group] = 1
from
string_split(@val, ',')
where
[value] > ''
),
cteCount
as (
select
*,
[id] = row_number() over (order by [group])
from
cte
)
select
*
from
cteCount;
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End string_split Built-In with RowNumber | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin Moden String Splitter', 0, 0) with nowait;
set statistics time, io on;
select
*
from
dbo.SplitStrings_Moden(@val, ',')
where
item > '';
select
@rowcount = @@rowcount;
set statistics time, io off;
print '';
raiserror('End Moden String Splitter | Return %d Rows', 0, 0, @rowcount) with nowait;
print '';
raiserror('Begin Recursive CTE String Splitter', 0, 0) with nowait;
set statistics time, io on;
select
*
from
dbo.fn_splitByDelim(@val, ',')
where
strValue > ''
option
(maxrecursion 0);
select
@rowcount = @@rowcount;
set statistics time, io off;
Statistics being
Begin string_split Built-In
SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 5325 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End string_split Built-In | Return 331940 Rows
Begin string_split Built-In with RowNumber
SQL Server Execution Times:
CPU time = 2094 ms, elapsed time = 8119 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End string_split Built-In with RowNumber | Return 331940 Rows
Begin Moden String Splitter
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 6 ms.
SQL Server Execution Times:
CPU time = 8734 ms, elapsed time = 9009 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End Moden String Splitter | Return 331940 Rows
Begin Recursive CTE String Splitter
Table 'Worktable'. Scan count 2, logical reads 1991648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 147188 ms, elapsed time = 147480 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
End Recursive CTE String Splitter | Return 331940 Rows
SELECT
PrimaryKeyID ,t2.items as column1, t1.items as column2 from [YourTableName]
cross Apply [dbo].[Split](column2) as t1
cross Apply [dbo].[Split](column1) as t2
Mark, here is a solution I would use. Assuming that [column 1] in your table has the "key" values that are more or less stable, and [column2] has corresponding "field" values that can sometimes be omitted or NULL:
There will be two extractions, one for [column 1] - which I assume is the Key, another for [column 2] - which I assume is the sort of "values" for the "key", they will be auto parsed then by STRING_SPLIT function.
These two INDEPENDENT result-sets will be then re-numbered based on the time of operation (which is always sequential). Take note, we renumber not by the field content or position of the comma etc, BUT by the timestamp.
Then they will get joined back together by LEFT OUTER JOIN; note not by INNER JOIN due to the fact that our "field values" could get omitted, while "keys" will always be there
Below is the TSQL code, as this is my first post to this site, hope it looks ok:
SELECT T1.ID, T1.KeyValue, T2.FieldValue
from (select t1.ID, row_number() OVER (PARTITION BY t1.ID ORDER BY current_timestamp) AS KeyRow, t2.value AS KeyValue
from myTable t1
CROSS APPLY STRING_SPLIT(t1.column1,',') as t2) T1
LEFT OUTER JOIN
(select t1.ID, row_number() OVER (PARTITION BY t1.ID ORDER BY current_timestamp) AS FieldRow, t3.value AS FieldValue
from myTable t1
CROSS APPLY STRING_SPLIT(t1.column2,',') as t3) T2 ON T1.ID = T2.ID AND T1.KeyRow = T2.FieldRow
This is very simple
CREATE TABLE #a(
id [INT] IDENTITY(1,1) NOT NULL,
OrgId INT )
INSERT INTO #a
(
OrgId
)
SELECT value FROM STRING_SPLIT('18,44,45,46,47,48,49,50,51,52,53', ',')
Select * from #a
Here is a t-sql function that uses string_split and adds the ordinal column:
drop function if exists [dbo].[varchar_split2];
go
create function [dbo].[varchar_split2]
(
@text varchar(max),
@delimiter char(1) = ','
)
returns @result table ([Ordinal] int not null identity(1, 1) primary key, [Value] varchar(128) not null)
as
begin
insert @result ([Value])
select
[Value]
from
string_split(@text, @delimiter)
where
0 != len([Value])
;
return;
end;
go

SQL script for removing extra characters

I've got an MSSQL 2012 database with some data issues in a certain column A which contains text.
There are many occurrences of an additional, unnecessary character after the </B> tag, for instance:
'<B>Something</B>g' where should stand '<B>Something</B>'
'<B>SomethingElse</B>e' where should stand '<B>SomethingElse</B>'
The values above are part of a larger text and can occur more than once. Column example:
'Some text is here <B>Something</B>g and also here <B>SomethingElse</B>e more text'
Those 'extra' characters are always the same as the last character between the <B></B> tags.
I would like to create an SQL script which will:
1. Remove the extra character after the </B> tag
2. Only if the extra character is the same as the last character between the <B></B> tags (as an additional check). EDIT: This is not absolutely necessary
I am assuming there is a way of calling the replace function, like in this pseudocode in which X represents any character.
replace(X</B>X, X</B>);
But I am not very good at SQL and I also don't know how to implement check 2.
Thank you for your help.
If your column has no other characters than just those strings, you could use this update statement on column a:
update mytable
set a = left(a, len(a)-1)
where left(right(a, 6), 5) = right(a, 1) + '</B>'
Here are some test cases in a fiddle.
To replace such occurrences in longer strings, where there might be multiple of them, then you can use this recursive query:
WITH recursive AS (
SELECT replace(a, '</B>', 'µ') as a
FROM mytable
UNION ALL
SELECT stuff(a, charindex('µ', a),
CASE WHEN substring(a, charindex('µ', a)-1, 1)
= substring(a, charindex('µ', a)+1, 1)
THEN 2
ELSE 1
END, '</B>')
FROM recursive
WHERE charindex('µ', a) > 0
)
SELECT *
FROM recursive
WHERE charindex('µ', a) = 0
The character µ that appears in several places should be a character that you do not expect to ever have in your data. Replace it by another character if necessary.
Here is a fiddle.
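To see what a single step of that recursion does, here is a minimal sketch on one literal value. The variable names and the sample text are made up for illustration; the placeholder and the stuff/charindex logic are the same as in the query above.

```sql
-- One sample value with a stray character after the closing tag.
declare @a varchar(200) = 'x <B>Something</B>g y';

-- Step 1: mark every closing tag with the placeholder character.
declare @p varchar(200) = replace(@a, '</B>', 'µ');

-- Step 2: find the first placeholder and compare its neighbours.
declare @i int = charindex('µ', @p);

-- If the character before the placeholder equals the one after it,
-- remove both the placeholder and the stray character (length 2);
-- otherwise remove only the placeholder (length 1). Then put the tag back.
select fixed = stuff(
    @p,
    @i,
    case when substring(@p, @i - 1, 1) = substring(@p, @i + 1, 1)
         then 2 else 1 end,
    '</B>');
-- fixed = x <B>Something</B> y
```

The recursive query simply repeats step 2 until no placeholder is left in the row.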
The above query turned into an update statement looks like below. It assumes that your table has a primary key id:
WITH recursive AS (
SELECT id,
replace(a, '</B>', 'µ') as a,
0 as modified
FROM mytable
UNION ALL
SELECT id,
stuff(a, charindex('µ', a),
CASE WHEN substring(a, charindex('µ', a)-1, 1)
= substring(a, charindex('µ', a)+1, 1)
THEN 2 ELSE 1 END, '</B>'),
1
FROM recursive
WHERE charindex('µ', a) > 0
)
UPDATE mytable
SET a = recursive.a
FROM recursive
INNER JOIN mytable
ON mytable.id = recursive.id
WHERE charindex('µ', recursive.a) = 0
AND recursive.modified = 1;
Here is the fiddle for that as well.
You can create a scalar function:
CREATE FUNCTION [dbo].[RemoveChars]
(
-- Add the parameters for the function here
@InputStr NVARCHAR(50)
)
RETURNS NVARCHAR(50)
AS
BEGIN
DECLARE @SearchStr NVARCHAR(4) = '</B>'
DECLARE @LastChar CHAR(1)
DECLARE @LastCharInStr CHAR(1)
DECLARE @Result NVARCHAR(50)
SET @LastChar = SUBSTRING(@InputStr,
CHARINDEX(@SearchStr, @InputStr) + LEN(@SearchStr), 1)
SET @LastCharInStr = SUBSTRING(@InputStr,
CHARINDEX(@SearchStr, @InputStr) - 1, 1)
IF (@LastCharInStr = @LastChar)
SET @Result = SUBSTRING(@InputStr, 0,
CHARINDEX(@SearchStr, @InputStr) + LEN(@SearchStr))
ELSE
SET @Result = @InputStr
RETURN @Result
END
END
And then call it:
UPDATE MyTable
Set A = dbo.RemoveChars(A)
Personally I would create a second function to only apply the updates to the values that have a difference between the last char in the string and the char after the </B> tag, but that's for you to decide.

Parsing / Indexing a Binary String in SQL Server

I have searched extensively for a relevant answer, but none quite satisfy what I need to be doing.
For our purposes I have a column with a 50 character binary string. In our database, it is actually hundreds of characters long.
There is one string for each unique item ID in our database. The location of each '1' flags a specific criteria being true, and a '0' false, so the indexed location of the ones and zeros are very important. Mostly, I care about where the 1's are.
I am not updating any databases, so I first decided to try and make a loop to look through each string and create a list of the 1's locations.
declare @binarystring varchar(50) = '10000010000110000001000000000000000000000000000001'
declare @position int = 0
declare @list varchar(200) = ''
while (@position <= len(@binarystring))
begin
set @position = charindex('1', @binarystring, @position)
set @list = @list + ', ' + convert(varchar(10),@position)
set @position = charindex('1', @binarystring, @position)+1
end
select right(@list, len(@list)-2)
This creates the following list:
1, 7, 12, 13, 20, 50
However, the loop will bomb if there is not a '1' at the end of the string, as I am searching through the string via occurrences of 1's rather than one character at a time. I am not sure how to satisfy the break criteria when the loop would normally reach the end of the string without there being a 1.
Is there a simple solution to my loop bombing, and should I even be looping in the first place?
I have tried other methods of parsing, union joining, indexing, etc, but given this very specific set of circumstances I couldn't find any combination that did quite what I needed. The above code is the best I've got so far.
I don't specifically need a comma delimited list as an output, but I need to know the location of all 1's within the string. The amount of 1's vary, but the string size is always the same.
This is my first time posting to stackoverflow, but I have used answers many times. I seek to give a clear question with relevant information. If there is anything I can do to help, I will try to fulfill any requests.
How about changing the while condition to this?
while (charindex('1', @binarystring, @position) > 0)
while (@position <= len(@binarystring))
begin
set @position = charindex('1', @binarystring, @position)
if @position != 0
begin
set @list = @list + ', ' + convert(varchar(10),@position)
set @position = charindex('1', @binarystring, @position)+1
end
end
else
begin
break
end;
end
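Putting the two suggestions together, a complete loop could look like the sketch below. It is only one way to arrange it: the test string is the one from the question with the final 1 changed to 0, so that the end-of-string case is exercised, and charindex is called once per iteration.

```sql
-- Same string as in the question, except the last character is 0.
declare @binarystring varchar(50) = '10000010000110000001000000000000000000000000000000';
declare @list varchar(200) = '';

-- charindex returns 0 when no further '1' exists, which ends the loop.
declare @position int = charindex('1', @binarystring);

while @position > 0
begin
    set @list = @list + ', ' + convert(varchar(10), @position);
    -- continue searching one character past the match just recorded
    set @position = charindex('1', @binarystring, @position + 1);
end

-- strip the leading ', ' (yields NULL when the string contains no '1')
select positions = stuff(@list, 1, 2, '');
-- positions = 1, 7, 12, 13, 20
```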
It's often useful to have a source of large ranges of sequential integers handy. I have a table, dbo.range that has a single column, id containing all the sequential integers from -500,000 to +500,000. That column is a clustered primary key so lookups against are fast. With such a table, solving your problem is easy.
Assuming your table has a schema something like
create table dbo.some_table_with_flags
(
id int not null primary key ,
flags varchar(1000) not null ,
)
The following query should do you:
select row_id = t.id ,
flag_position = r.id
from dbo.some_table_with_flags t
join dbo.range r on r.id between 1 and len(t.flags)
and substring(t.flags,r.id,1) = '1'
For each 1 value in the flags column, you'll get a row containing the ID from your source table's ID column, plus the position in which the 1 was found in flags.
There are a number of techniques for generating such sequences. This link shows several:
http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
For instance, you could use common table expressions (CTEs) to generate your sequences, like this:
WITH
s1(n) AS -- 10 (10^1)
( SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
) ,
s2(n) as ( select 1 from s1 a cross join s1 b ) , -- 10^2 100
s3(n) as ( select 1 FROM s1 a cross join s2 b ) , -- 10^3 1,000
s4(n) as ( select 1 from s1 a cross join s3 b ) , -- 10^4 10,000
s5(n) as ( select 1 from s1 a cross join s4 b ) , -- 10^5 100,000
s6(n) as ( select 1 from s1 a cross join s5 b ) , -- 10^6 1,000,000
seq(n) as ( select row_number() over ( order by n ) from s6 )
select *
from dbo.some_table_with_flags t
join seq s on s.n between 1 and len(t.flags)
and substring(t.flags,s.n,1) = '1'

Apply a Mask to Format a String in SQL Server Query/View

Is there a neat way to apply a mask to a string in a SQL Server query?
I have two tables, one with Phone number stored as varchar with no literals 0155567890 and a phone type, which has a mask for that phone number type: (##) #### ####
What is the best way to return a string (for a merge Document) so that the query returns the fully formatted phone number:
(01) 5556 7890
As noted in the comment, my original answer below will result in terrible performance if used in a large number of rows. i-one's answer is preferred if performance is a consideration.
I needed this also, and thanks to Sjuul's pseudocode, I was able to create a function to do this.
CREATE FUNCTION [dbo].[fx_FormatUsingMask]
(
-- Add the parameters for the function here
@input nvarchar(1000),
@mask nvarchar(1000)
)
RETURNS nvarchar(1000)
AS
BEGIN
-- Declare the return variable here
DECLARE @result nvarchar(1000) = ''
DECLARE @inputPos int = 1
DECLARE @maskPos int = 1
DECLARE @maskSign char(1) = ''
WHILE @maskPos <= Len(@mask)
BEGIN
set @maskSign = substring(@mask, @maskPos, 1)
IF @maskSign = '#'
BEGIN
set @result = @result + substring(@input, @inputPos, 1)
set @inputPos += 1
set @maskPos += 1
END
ELSE
BEGIN
set @result = @result + @maskSign
set @maskPos += 1
END
END
-- Return the result of the function
RETURN @result
END
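For reference, calling the function with the phone number and mask from the question would look like this. It assumes the function above has been created in the dbo schema under the name shown.

```sql
-- Each '#' in the mask consumes the next character of the input;
-- every other mask character is copied through unchanged.
select formatted = dbo.fx_FormatUsingMask('0155567890', '(##) #### ####');
-- formatted = (01) 5556 7890
```

Because the input stays a string throughout, the leading zero is kept.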
Just in case someone ever needs a table-valued function.
Approach 1 (see #2 for a faster version)
create function ftMaskPhone
(
@phone varchar(30),
@mask varchar(50)
)
returns table as
return
with ci(n, c, nn) as (
select
1,
case
when substring(@mask, 1, 1) = '#' then substring(@phone, 1, 1)
else substring(@mask, 1, 1)
end,
case when substring(@mask, 1, 1) = '#' then 1 else 0 end
union all
select
n + 1,
case
when substring(@mask, n + 1, 1) = '#' then substring(@phone, nn + 1, 1)
else substring(@mask, n + 1, 1)
end,
case when substring(@mask, n + 1, 1) = '#' then nn + 1 else nn end
from ci where n < len(@mask))
select (select c + '' from ci for xml path(''), type).value('text()[1]', 'varchar(50)') PhoneMasked
GO
Then apply it as
declare @mask varchar(50)
set @mask = '(##) #### ####'
select pm.PhoneMasked
from Phones p
outer apply ftMaskPhone(p.PhoneNum, @mask) pm
Approach 2
I'm going to leave the above version for historical purposes. However, this one has better performance.
CREATE FUNCTION dbo.ftMaskPhone
(
@phone varchar(30),
@mask varchar(50)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
WITH v1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
v2(N) AS (SELECT 1 FROM v1 a, v1 b),
v3(N) AS (SELECT TOP (ISNULL(LEN(@mask), 0)) ROW_NUMBER() OVER (ORDER BY @@SPID) FROM v2),
v4(N, C) AS (
SELECT N, ISNULL(SUBSTRING(@phone, CASE WHEN c.m = 1 THEN ROW_NUMBER() OVER (PARTITION BY c.m ORDER BY N) END, 1), SUBSTRING(@mask, v3.N, 1))
FROM v3
CROSS APPLY (SELECT CASE WHEN SUBSTRING(@mask, v3.N, 1) = '#' THEN 1 END m) c
)
SELECT MaskedValue = (
SELECT c + ''
FROM v4
ORDER BY N
FOR XML PATH(''), TYPE
).value('text()[1]', 'varchar(50)')
);
GO
Schema binding, in combination with this being a single-statement table-valued-function, makes this version eligible for inlining by the query optimizer. Implement the function using a CROSS APPLY as in the example above, or for single values, like this:
SELECT *
FROM dbo.ftMaskPhone('0012345678910', '### (###) ###-####')
Results look like:
MaskedValue
001 (234) 567-8910
This is just what came up in my head. I don't know whether it's the best solution but I think it should be workable.
Make a function with the name applyMask (or so)
Pseudocode:
WHILE currentPosition < Length(PhoneNr) AND safetyCounter < Length(Mask)
IF currentSign = "#"
result += Mid(PhoneNr, currentPosition, 1)
currentPosition++
ELSE
result += currentSign
safetyCounter++
END
END
Return result
As noted by @Sean, SQL Server 2012 and up supports the FORMAT function, which almost gives you what you need, with the following caveats:
It takes a number to format, rather than a VARCHAR. This could be worked around by using a CAST.
The mask as provided ((##) #### ####), coupled with a CAST would remove the leading zero, leaving you with (1) 5556 7890. You could update the mask to (0#) #### ####. Going on a limb that you're representing an Australian phone number, it seems that the leading 0 is always there anyways:
Within Australia, to access the "Number" of a landline telephone in an "Area" other than that in which the caller is located (including a caller using a "Mobile" 'phone), firstly it is necessary to dial the Australian "Trunk Access Code" of 0 plus the "Area" code, followed by the "Local" Number. Thus, the "Full National Number" (FNN) has ten digits: 0x xxxx xxxx.
But ultimately, I would argue that SQL Server is not the best place to handle representation/formatting of your data (as with dates, so with phone numbers). I would recommend doing this client-side using something like Google's libphonenumber. When a phone number is entered into the database, you could store the phone number itself and the country to which it belongs, which you could then use when displaying the phone number (or doing something like calling it or checking for validity).
There is the built in FORMAT function, which almost works. Unfortunately it takes an int as the first parameter, so it strips off the leading zero:
select format(0155567890 ,'(##) #### ####')
(1) 5556 7890
If you need to "mask" a string in the sense of hiding the real value behind another one, and then "unmask" it later, you can try this function, or extend it for that matter. :)
https://stackoverflow.com/a/22023329/2175524
I wanted to hide some information, so I used the RIGHT function. It shows only the last 4 characters of the string.
CONCAT('xxx-xx-', RIGHT('03466045896', 4))
Above code will show "xxx-xx-5896"