I have searched extensively for a relevant answer, but none quite does what I need.
For our purposes I have a column with a 50 character binary string. In our database, it is actually hundreds of characters long.
There is one string for each unique item ID in our database. The location of each '1' flags a specific criterion being true, and a '0' false, so the indexed locations of the ones and zeros are very important. Mostly, I care about where the 1's are.
I am not updating any databases, so I first decided to try and make a loop to look through each string and create a list of the 1's locations.
declare @binarystring varchar(50) = '10000010000110000001000000000000000000000000000001'
declare @position int = 0
declare @list varchar(200) = ''
while (@position <= len(@binarystring))
begin
    set @position = charindex('1', @binarystring, @position)
    set @list = @list + ', ' + convert(varchar(10),@position)
    set @position = charindex('1', @binarystring, @position)+1
end
select right(@list, len(@list)-2)
This creates the following list:
1, 7, 12, 13, 20, 50
However, the loop will bomb if there is not a '1' at the end of the string, as I am searching through the string via occurrences of 1's rather than one character at a time. I am not sure how to satisfy the break criteria when the loop would normally reach the end of the string without there being a 1.
Is there a simple solution to my loop bombing, and should I even be looping in the first place?
I have tried other methods of parsing, union joining, indexing, etc, but given this very specific set of circumstances I couldn't find any combination that did quite what I needed. The above code is the best I've got so far.
I don't specifically need a comma-delimited list as output, but I need to know the location of all 1's within the string. The number of 1's varies, but the string size is always the same.
This is my first time posting to stackoverflow, but I have used answers many times. I seek to give a clear question with relevant information. If there is anything I can do to help, I will try to fulfill any requests.
How about changing the while condition to this?
while (charindex('1', @binarystring, @position) > 0)
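Dropped into the loop from the question, that condition might look like the sketch below. The test string is the question's string with the trailing '1' changed to '0' (my own variation, to exercise the failing case), and using STUFF to strip the leading separator is my substitution for the RIGHT/LEN expression.

```sql
-- Same 50-character string as the question, but ending in '0' instead of '1'.
declare @binarystring varchar(50) = '10000010000110000001000000000000000000000000000000'
declare @position int = 0
declare @list varchar(200) = ''

-- charindex returns 0 once no further '1' exists, which ends the loop cleanly.
while (charindex('1', @binarystring, @position) > 0)
begin
    set @position = charindex('1', @binarystring, @position)
    set @list = @list + ', ' + convert(varchar(10), @position)
    set @position = @position + 1   -- resume the search just past this '1'
end

-- stuff() removes the leading ', ' (and yields NULL if no '1' was found at all).
select stuff(@list, 1, 2, '')   -- 1, 7, 12, 13, 20
```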
while (@position <= len(@binarystring))
begin
    set @position = charindex('1', @binarystring, @position)
    if @position != 0
    begin
        set @list = @list + ', ' + convert(varchar(10),@position)
        set @position = charindex('1', @binarystring, @position)+1
    end
    else
    begin
        break
    end;
end
It's often useful to have a source of large ranges of sequential integers handy. I have a table, dbo.range, that has a single column, id, containing all the sequential integers from -500,000 to +500,000. That column is a clustered primary key, so lookups against it are fast. With such a table, solving your problem is easy.
Assuming your table has a schema something like
create table dbo.some_table_with_flags
(
id int not null primary key ,
flags varchar(1000) not null ,
)
The following query should do you:
select row_id = t.id ,
       flag_position = r.id
from dbo.some_table_with_flags t
join dbo.range r on r.id between 1 and len(t.flags)
                and substring(t.flags,r.id,1) = '1'
For each 1 value in the flags column, you'll get a row containing the ID from your source table's ID column, plus the position in which the 1 was found in flags.
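To see the shape of that result without creating any permanent objects, here is a rough self-contained sketch. The table variable, its sample rows, and the inline 16-row number source are all stand-ins of mine for the real table and for dbo.range.

```sql
-- Hypothetical sample data; @flags stands in for the real table.
declare @flags table (id int not null primary key, flags varchar(1000) not null);
insert into @flags values (1, '1000001000011'), (2, '0100');

-- Inline stand-in for dbo.range: the integers 1..16 (4 x 4 cross join).
;with r(id) as (
    select row_number() over (order by (select null))
    from (values (1),(1),(1),(1)) a(x)
    cross join (values (1),(1),(1),(1)) b(x)
)
select row_id = t.id, flag_position = r.id
from @flags t
join r on r.id between 1 and len(t.flags)
      and substring(t.flags, r.id, 1) = '1'
order by t.id, r.id;
-- row_id 1 -> positions 1, 7, 12, 13
-- row_id 2 -> position 2
```

For the real data the number source has to cover the longest flags string, which is why a persistent numbers table or a larger generated sequence is the better fit.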
There are a number of techniques for generating such sequences. This link shows several:
http://sqlperformance.com/2013/01/t-sql-queries/generate-a-set-1
For instance, you could use common table expressions (CTEs) to generate your sequences, like this:
WITH
s1(n) AS -- 10 (10^1)
( SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
) ,
s2(n) as ( select 1 from s1 a cross join s1 b ) , -- 10^2 100
s3(n) as ( select 1 FROM s1 a cross join s2 b ) , -- 10^3 1,000
s4(n) as ( select 1 from s1 a cross join s3 b ) , -- 10^4 10,000
s5(n) as ( select 1 from s1 a cross join s4 b ) , -- 10^5 100,000
s6(n) as ( select 1 from s1 a cross join s5 b ) , -- 10^6 1,000,000
seq(n) as ( select row_number() over ( order by n ) from s6 )
select *
from dbo.some_table_with_flags t
join seq s on s.n between 1 and len(t.flags)
and substring(t.flags,s.n,1) = '1'
I have been trying to set up a SQL function to build descriptions with "tags". For example, I would want to start with a description:
"This is [length] ft. long and [height] ft. high"
And modify the description with data from a related table, to end up with:
"This is 75 ft. long and 20 ft. high"
I could do this easily with REPLACE functions if we had a set number of tags, but I want these tags to be user defined, and each description may or may not have specific tags in it. Would there be any better way to get this other than using a cursor to go through the string once for each available tag? Does SQL have any built in functionality to do a multiple replace? something like:
Replace(description,(select tag, replacement from tags))
I actually recommend doing this in application code. But, you can do it using a recursive CTE:
with t as (
select t.*, row_number() over (order by t.tag) as seqnum
from tags t
),
cte as (
select replace(@description, t.tag, t.replacement) as d, t.seqnum
from t
where seqnum = 1
union all
select replace(d, t.tag, t.replacement), t.seqnum
from cte join
t
on t.seqnum = cte.seqnum + 1
)
select top 1 cte.*
from cte
order by seqnum desc;
Try the query below:
SELECT REPLACE(DESCRIPTION, '[length]', (SELECT replacement FROM tags WHERE tag = '[length]'))
I agree with Gordon that this is best handled in your application code.
If for whatever reason that option is not available however, and if you don't want to use recursion as per Gordon's answer, you could use a tally table approach to swap out your values.
You will need to test the performance of the for xml being executed for each value though...
Assuming you have a table of Tag replacement values:
create table TagReplacementTable(Tag nvarchar(50), Replacement nvarchar(50));
insert into TagReplacementTable values('[test]',999)
,('[length]',75)
,('[height]',20)
,('[other length]',40)
,('[other height]',50);
You can create an inline table function that will work through your Descriptions and replace the necessary parts, using TagReplacementTable as the reference:
create function dbo.Tag_Replace(@str nvarchar(4000)
                               ,@tagstart nvarchar(1)
                               ,@tagend nvarchar(1)
                               )
returns table
as
return
(
with n(n) as (select n from (values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n(n))
-- Select the same number of rows as characters in @str as incremental row numbers.
-- Cross joins increase exponentially to a max possible 10,000 rows to cover largest @str length.
,t(t) as (select top (select len(@str) a) row_number() over (order by (select null)) from n n1,n n2,n n3,n n4)
-- Return the position of every value that starts or ends a part of the description.
-- This will be the first character (t='f'), the start of any tag (t='s') and the end of any tag (t='e').
,s(s,t) as (select 1, 'f'
            union all select t+1, 's' from t where substring(@str,t,1) = @tagstart
            union all select t+1, 'e' from t where substring(@str,t,1) = @tagend
           )
-- Return the start and length of every value, to use in the SUBSTRING function.
-- ISNULL/NULLIF combo handles the last value where there is no delimiter at the end of the string.
-- Using the t value we can determine which CHARINDEX to look for.
,l(t,s,l) as (select t,s,isnull(nullif(charindex(case t when 'f' then @tagstart when 's' then @tagend when 'e' then @tagstart end,@str,s),0)-s,4000) from s)
-- Each element of the string is returned in an ordered list along with its t value.
-- Where this t value is 's' this means the value is a tag, so append the start and end identifiers and join to the TagReplacementTable.
-- Where no replacement is found, simply return the part of the Description.
-- Finally, concatenate into one string value.
select (select isnull(r.Replacement,k.Item)
        from(select row_number() over(order by s) as ItemNumber
                   ,case when l.t = 's' then @tagstart else '' end
                    + substring(@str,s,l)
                    + case when l.t = 's' then @tagend else '' end as Item
                   ,t
             from l
            ) k
            left join TagReplacementTable r
                on(k.Item = r.Tag)
        order by k.ItemNumber
        for xml path('')
       ) as NewString
);
And then outer apply to the results of the function to do replacements on all your Description values:
declare @t table (Descr nvarchar(100));
insert into @t values('This is [length] ft. long and [height] ft. high'),('[test] This is [other length] ft. long and [other height] ft. high');
select *
from @t t
outer apply dbo.Tag_Replace(t.Descr,'[',']') r;
Output:
+--------------------------------------------------------------------+-----------------------------------------+
| Descr | NewString |
+--------------------------------------------------------------------+-----------------------------------------+
| This is [length] ft. long and [height] ft. high | This is 75 ft. long and 20 ft. high |
| [test] This is [other length] ft. long and [other height] ft. high | 999 This is 40 ft. long and 50 ft. high |
+--------------------------------------------------------------------+-----------------------------------------+
I would not iterate through an individual string, but instead run the update on the entire column of strings. I'm not sure if that was your intent but this would be much quicker than one string at a time.
Test Data:
Create TABLE #strs ( mystr VARCHAR(MAX) )
Create TABLE #rpls (i INT IDENTITY(1,1) NOT NULL, src VARCHAR(MAX) , Trg VARCHAR(MAX) )
INSERT INTO #strs
( mystr )
SELECT 'hello ##color## world'
UNION ALL SELECT 'see jack ##verboftheday##! ##verboftheday## Jack, ##verboftheday##!'
UNION ALL SELECT 'on ##Date##, the ##color## StockMarket was ##MarketDirection##!'
INSERT INTO #rpls ( src ,Trg )
SELECT '##Color##', 'Blue'
UNION SELECT ALL '##verboftheday##' , 'run'
UNION SELECT ALL '##Date##' , CONVERT(VARCHAR(MAX), GETDATE(), 9)
UNION SELECT ALL '##MarketDirection##' , 'UP'
then a loop like this:
DECLARE @i INTEGER = 0
DECLARE @count INTEGER
SELECT @count = COUNT(*)
FROM #rpls R
WHILE @i < @count
BEGIN
    SELECT @i += 1
    UPDATE #strs
    SET mystr = REPLACE(mystr, ( SELECT R.src
                                 FROM #rpls R
                                 WHERE i = @i ), ( SELECT R.Trg
                                                   FROM #rpls R
                                                   WHERE i = @i ))
END
SELECT *
FROM #strs S
Yielding the following
hello Blue world
see jack run! run Jack, run!
on May 19 2017 9:48:02:390AM, the Blue StockMarket was UP!
I found someone wanting to do something similar here with a set number of options:
SELECT @target = REPLACE(@target, invalidChar, '-')
FROM (VALUES ('~'),(''''),('!'),('@'),('#')) AS T(invalidChar)
I could modify it as such:
declare @target as varchar(max) = 'This is [length] ft. long and [height] ft. high'
select @target = REPLACE(@target,'[' + tag + ']',replacement)
from tags
It then runs the replace once for every record returned in the select statement.
(I originally added this to my question, but it sounds like it is better protocol to add it as an answer.)
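For anyone who wants to try this without a tags table, here is a small self-contained sketch. The table variable and its two rows are made up for illustration. One caveat worth knowing: assigning to a variable from a multi-row SELECT is widely used, but Microsoft's documentation for SELECT @local_variable warns that the result is not guaranteed when several rows are returned, so treat it as a convenience rather than a contract.

```sql
-- Hypothetical tag data; tag names are stored without the brackets.
declare @tags table (tag varchar(50), replacement varchar(50));
insert into @tags values ('length', '75'), ('height', '20');

declare @target varchar(max) = 'This is [length] ft. long and [height] ft. high';

-- The assignment is re-evaluated for each row returned, so the
-- replacements accumulate in @target (in practice; see the caveat above).
select @target = replace(@target, '[' + tag + ']', replacement)
from @tags;

select @target;   -- This is 75 ft. long and 20 ft. high
```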
How can I find the number of occurrences of 'e' at the end of a string only, in SQL?
For example:
abcdeee, occurrences: 3
aecdeae, occurrences: 1
Thanks.
The goal is going to be to avoid looping if at all possible, since SQL Server works much better with sets of data rather than processing things sequentially. With that in mind, I would generate a virtual table that gives you all of the counts that you might find. To be safe, it should be the same length as your column. In my example, I've stopped at 10 characters. I use a CTE to generate the virtual table. You can use a variable in there instead of the hard-coded 'e' of course. Also, the CAST()s are important to avoid data type mismatches with the recursive CTE, but you may need to adjust them, especially if you're using NVARCHAR.
;WITH CTE_Characters AS
(
SELECT
CAST('e' AS VARCHAR(10)) AS my_char, 1 AS cnt
UNION ALL
SELECT
CAST(my_char + 'e' AS VARCHAR(10)), cnt + 1
FROM
CTE_Characters
WHERE
cnt <= 9
)
SELECT
MT.my_string,
MAX(CTE.cnt) AS number_of_occurrences
FROM
My_Table MT
INNER JOIN CTE_Characters CTE ON REVERSE(MT.my_string) LIKE CTE.my_char + '%'
GROUP BY
MT.my_string
Here is a loop that will count the instances of 'e' on the end of an incoming string:
DECLARE @str nvarchar(50) = 'abcdeee'; --incoming string
DECLARE @ctr int = 0; --to count instances
DECLARE @rts nvarchar(50) = REVERSE(@str); --reverse the incoming string to start from the end
DECLARE @ind int = CHARINDEX('e',@rts,1); --find the first instance of e
WHILE @ind = 1 --only continue to count consecutive instances
BEGIN
    SET @ctr += 1;
    SET @rts = RIGHT(@rts,LEN(@rts)-1); --remove first character and re-run
    SET @ind = CHARINDEX('e',@rts,1); --find next e
END
SELECT @ctr AS ctr
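As a different technique from both answers above, the count can also be computed per row with no loop and no CTE: reverse the string and find the first character that is not 'e'. This is only a sketch; the table variable and sample rows are my own, and under a case-insensitive collation 'E' would be counted as well.

```sql
-- Hypothetical sample data.
declare @samples table (my_string varchar(50));
insert into @samples values ('abcdeee'), ('aecdeae'), ('abc');

-- reverse() puts the trailing characters first; patindex finds the first
-- character that is not 'e'. Appending '.' guarantees a match even when
-- the whole string is made of 'e' (or is empty).
select my_string,
       patindex('%[^e]%', reverse(my_string) + '.') - 1 as number_of_occurrences
from @samples;
-- abcdeee -> 3, aecdeae -> 1, abc -> 0
```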
I have a table which has already been truncated (Microsoft SQL Server 2008). I now have to populate it with sequential values for up to 50,000 records; the values themselves are arbitrary (it doesn't matter what they are), up to 7 characters long.
Can anyone help with the SQL statement I need to write to automatically populate the newly emptied table with A000001, A000002, A000003, etc., so that I can sort and number the records within the table?
I have approximately 50,000 records that need to be entered sequentially, and I really don't want to number the column by hand.
Thanks in advance.
I'd use excel to generate your unique ids using the following:
In A column:
=CONCATENATE($C2, TEXT($B2,"000000"))
In B column put a 1 in the first row and the following code in all subsequent rows:
=SUM($B4 + 1)
In C column:
The letter A
Then just import the excel csv as a table and you'll have all your ids ready to insert into your empty table.
The SQL below loads a table variable up. Just select from it and insert the data into the new table. Certainly not the model of efficiency, but it'll get the job done.
DECLARE @tmp TABLE(
    Value NVARCHAR(10)
)
DECLARE @Counter INT=0
DECLARE @Padding NVARCHAR(20)
WHILE @Counter<50000
BEGIN
    SET @Counter=@Counter+1
    SET @Padding=
        CASE LEN(CONVERT(NVARCHAR,@Counter))
            WHEN 1 THEN '00000'
            WHEN 2 THEN '0000'
            WHEN 3 THEN '000'
            WHEN 4 THEN '00'
            WHEN 5 THEN '0'
            ELSE ''
        END
    INSERT INTO @tmp SELECT 'A' + @Padding + CONVERT(NVARCHAR,@Counter)
END
select * from @tmp
Use a stacked CTE to generate sequential numbers:
;WITH e1(n) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e2 CROSS JOIN e2 AS b), -- 100*100
e4(n) AS (SELECT 1 FROM e3 CROSS JOIN (SELECT TOP 5 n FROM e1) AS b) -- 5*10000
SELECT n = 'A'+right('000000'+
convert(varchar(20),ROW_NUMBER() OVER (ORDER BY n)),7)
FROM e4 ORDER BY n;
Check here for more methods to generate sequential numbers with performance analysis
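If the goal is the 7-character form from the question (A000001 through A050000) written straight into a table, the same stacked-CTE idea could be sketched as below. The table variable and its Code column are placeholders of mine for the real, already truncated table.

```sql
-- @target is a hypothetical stand-in for the empty table.
declare @target table (Code varchar(7) not null);

;with e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(x)), -- 10 rows
e2(n) as (select 1 from e1 a cross join e1 b),       -- 100 rows
e4(n) as (select 1 from e2 a cross join e2 b),       -- 10,000 rows
e5(n) as (select 1 from e4 a cross join e1 b),       -- 100,000 rows
nums(n) as (select row_number() over (order by (select null)) from e5)
insert into @target (Code)
select 'A' + right('000000' + convert(varchar(10), n), 6)  -- 'A' plus 6 zero-padded digits
from nums
where n <= 50000;

select min(Code) as first_code, max(Code) as last_code, count(*) as row_count
from @target;
-- A000001, A050000, 50000
```

Filtering on the generated number, rather than using TOP without an ORDER BY, keeps the range of values well defined.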
Use a table with an identity column and populate it. Then update that table to set the alpha value you need as follows:
create table MyTable (
ID int not null identity(1,1),
Alpha varchar(30)
)
truncate table MyTable
begin tran -- makes it run much faster
declare @i int
select @i = 1
while @i < 1000000
begin
    insert into MyTable (Alpha) values ('')
    select @i = @i + 1
end
commit
update MyTable set Alpha = 'A' + replicate('0', 6 - len(cast(ID as varchar(30)))) + cast(ID as varchar(30))
Can somebody help me with this little task? What I need is a stored procedure that can find duplicate letters (in a row) in a string from a table "a" and after that make a new table "b" with just the id of the string that has a duplicate letter.
Something like this:
Table A
ID Name
1 Matt
2 Daave
3 Toom
4 Mike
5 Eddie
And from that table I can see that Daave, Toom, Eddie have duplicate letters in a row and I would like to make a new table and list their ID's only. Something like:
Table B
ID
2
3
5
Only 2,3,5 because that is the ID of the string that has duplicate letters in their names.
I hope this is understandable and would be very grateful for any help.
In your answer with stored procedure, you have 2 mistakes, one is missing space between column name and LIKE clause, second is missing single quotes around search parameter.
First, I create a user-defined scalar function which returns 1 if the string contains duplicate letters:
EDITED
CREATE FUNCTION FindDuplicateLetters
(
    @String NVARCHAR(50)
)
RETURNS BIT
AS
BEGIN
    DECLARE @Result BIT = 0
    DECLARE @Counter INT = 1
    WHILE (@Counter <= LEN(@String) - 1)
    BEGIN
        IF(ASCII((SELECT SUBSTRING(@String, @Counter, 1))) = ASCII((SELECT SUBSTRING(@String, @Counter + 1, 1))))
        BEGIN
            SET @Result = 1
            BREAK
        END
        SET @Counter = @Counter + 1
    END
    RETURN @Result
END
GO
After the function has been created, just call it from a simple SELECT query like the following:
SELECT
*
FROM
(SELECT
*,
dbo.FindDuplicateLetters(ColumnName) AS Duplicates
FROM TableName) AS a
WHERE a.Duplicates = 1
With this combination, you will get just the rows that have duplicate letters.
In any version of SQL, you can do this with a brute force approach:
select *
from t
where t.name like '%aa%' or
t.name like '%bb%' or
. . .
t.name like '%zz%'
If you have a case sensitive collation, then use:
where lower(t.name) like '%aa%' or
. . .
Here's one way.
First create a table of numbers
CREATE TABLE dbo.Numbers
(
number INT PRIMARY KEY
);
INSERT INTO dbo.Numbers
SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number > 0;
Then with that in place you can use
SELECT *
FROM TableA
WHERE EXISTS (SELECT *
FROM dbo.Numbers
WHERE number < LEN(Name)
AND SUBSTRING(Name, number, 1) = SUBSTRING(Name, number + 1, 1))
Though this is an old post, it's worth posting a solution that will be faster than a brute force approach or one that uses a scalar UDF (which generally drags down performance). Using NGrams8K this is rather simple.
--sample data
declare @table table (id int identity primary key, [name] varchar(20));
insert @table([name]) values ('Mattaa'),('Daave'),('Toom'),('Mike'),('Eddie');
-- solution #1
select id
from @table
cross apply dbo.NGrams8k([name],1)
where charindex(replicate(token,2), [name]) > 0
group by id;
-- solution #2 (SQL 2012+ solution using LAG)
select id
from
(
    select id, token, prevToken = lag(token,1) over (partition by id order by position)
    from @table
    cross apply dbo.NGrams8k([name],1)
) prep
where token = prevToken
group by id; -- optional, if you want to remove possible duplicates.
Another brute force way (note that the ~ regex operator is PostgreSQL syntax, not SQL Server):
select *
from t
where t.name ~ '(.)\1';
Is there a neat way to apply a mask to a string in a SQL Server query?
I have two tables, one with Phone number stored as varchar with no literals 0155567890 and a phone type, which has a mask for that phone number type: (##) #### ####
What is the best way to return a string (for a merge Document) so that the query returns the fully formatted phone number:
(01) 5556 7890
As noted in the comment, my original answer below will result in terrible performance if used in a large number of rows. i-one's answer is preferred if performance is a consideration.
I needed this also, and thanks to Sjuul's pseudocode, I was able to create a function to do this.
CREATE FUNCTION [dbo].[fx_FormatUsingMask]
(
    -- Add the parameters for the function here
    @input nvarchar(1000),
    @mask nvarchar(1000)
)
RETURNS nvarchar(1000)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @result nvarchar(1000) = ''
    DECLARE @inputPos int = 1
    DECLARE @maskPos int = 1
    DECLARE @maskSign char(1) = ''
    WHILE @maskPos <= Len(@mask)
    BEGIN
        set @maskSign = substring(@mask, @maskPos, 1)
        IF @maskSign = '#'
        BEGIN
            set @result = @result + substring(@input, @inputPos, 1)
            set @inputPos += 1
            set @maskPos += 1
        END
        ELSE
        BEGIN
            set @result = @result + @maskSign
            set @maskPos += 1
        END
    END
    -- Return the result of the function
    RETURN @result
END
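Once the function exists, calling it could look like the sketch below. The literal values come from the question; the Phones and PhoneTypes tables and their column names in the second query are placeholders of mine for the two tables the question describes.

```sql
-- Single value: each '#' in the mask consumes the next character of the input.
select dbo.fx_FormatUsingMask('0155567890', '(##) #### ####') as formatted;
-- formatted: (01) 5556 7890

-- Per row, with the mask taken from the phone type (hypothetical table
-- and column names; adjust to the real schema).
select p.PhoneNum,
       dbo.fx_FormatUsingMask(p.PhoneNum, pt.Mask) as formatted
from Phones p
join PhoneTypes pt on pt.PhoneTypeId = p.PhoneTypeId;
```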
Just in case someone ever needs a table-valued function.
Approach 1 (see #2 for a faster version)
create function ftMaskPhone
(
    @phone varchar(30),
    @mask varchar(50)
)
returns table as
return
with ci(n, c, nn) as (
    select
        1,
        case
            when substring(@mask, 1, 1) = '#' then substring(@phone, 1, 1)
            else substring(@mask, 1, 1)
        end,
        case when substring(@mask, 1, 1) = '#' then 1 else 0 end
    union all
    select
        n + 1,
        case
            when substring(@mask, n + 1, 1) = '#' then substring(@phone, nn + 1, 1)
            else substring(@mask, n + 1, 1)
        end,
        case when substring(@mask, n + 1, 1) = '#' then nn + 1 else nn end
    from ci where n < len(@mask))
select (select c + '' from ci for xml path(''), type).value('text()[1]', 'varchar(50)') PhoneMasked
GO
Then apply it as
declare @mask varchar(50)
set @mask = '(##) #### ####'
select pm.PhoneMasked
from Phones p
outer apply ftMaskPhone(p.PhoneNum, @mask) pm
Approach 2
I'm going to leave the above version for historical purposes. However, this one has better performance.
CREATE FUNCTION dbo.ftMaskPhone
(
    @phone varchar(30),
    @mask varchar(50)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    WITH v1(N) AS (
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
        UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ),
    v2(N) AS (SELECT 1 FROM v1 a, v1 b),
    v3(N) AS (SELECT TOP (ISNULL(LEN(@mask), 0)) ROW_NUMBER() OVER (ORDER BY @@SPID) FROM v2),
    v4(N, C) AS (
        SELECT N, ISNULL(SUBSTRING(@phone, CASE WHEN c.m = 1 THEN ROW_NUMBER() OVER (PARTITION BY c.m ORDER BY N) END, 1), SUBSTRING(@mask, v3.N, 1))
        FROM v3
        CROSS APPLY (SELECT CASE WHEN SUBSTRING(@mask, v3.N, 1) = '#' THEN 1 END m) c
    )
    SELECT MaskedValue = (
        SELECT c + ''
        FROM v4
        ORDER BY N
        FOR XML PATH(''), TYPE
    ).value('text()[1]', 'varchar(50)')
);
GO
Schema binding, in combination with this being a single-statement table-valued-function, makes this version eligible for inlining by the query optimizer. Implement the function using a CROSS APPLY as in the example above, or for single values, like this:
SELECT *
FROM dbo.ftMaskPhone('0012345678910', '### (###) ###-####')
Results look like:
MaskedValue
001 (234) 567-8910
This is just what came up in my head. I don't know whether it's the best solution but I think it should be workable.
Make a function with a name like applyMask (or so).
Pseudocode:
WHILE currentPosition < Length(PhoneNr) AND safetyCounter < Length(Mask)
    currentSign = Mid(Mask, safetyCounter, 1)
    IF currentSign = "#"
        result += Mid(PhoneNr, currentPosition, 1)
        currentPosition++
    ELSE
        result += currentSign
    END
    safetyCounter++
END
Return result
As noted by @Sean, SQL Server 2012 and up supports the FORMAT function, which almost gives you what you need, with the following caveats:
It takes a number to format, rather than a VARCHAR. This could be worked around by using a CAST.
The mask as provided ((##) #### ####), coupled with a CAST, would remove the leading zero, leaving you with (1) 5556 7890. You could update the mask to (0#) #### ####. Going out on a limb that you're representing an Australian phone number, it seems that the leading 0 is always there anyway:
Within Australia, to access the "Number" of a landline telephone in an "Area" other than that in which the caller is located (including a caller using a "Mobile" 'phone), firstly it is necessary to dial the Australian "Trunk Access Code" of 0 plus the "Area" code, followed by the "Local" Number. Thus, the "Full National Number" (FNN) has ten digits: 0x xxxx xxxx.
But ultimately, I would argue that SQL Server is not the best place to handle representation/formatting of your data (as with dates, so with phone numbers). I would recommend doing this client-side using something like Google's libphonenumber. When a phone number is entered into the database, you could store the phone number itself and the country to which it belongs, which you could then use when displaying the phone number (or doing something like calling it or checking for validity).
There is the built in FORMAT function, which almost works. Unfortunately it takes an int as the first parameter, so it strips off the leading zero:
select format(0155567890 ,'(##) #### ####')
(1) 5556 7890
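Combining this with the (0#) mask suggested in the other answer, a sketch for a number stored as varchar might look like the following. It assumes SQL Server 2012 or later and a value containing only digits; the CAST to bigint is my choice, to leave headroom for longer numbers.

```sql
-- Assumes SQL Server 2012+ (FORMAT) and a digits-only string.
declare @phone varchar(20) = '0155567890';

-- The '0' placeholder forces a digit in that position, which restores the
-- leading zero that the numeric cast dropped.
select format(cast(@phone as bigint), '(0#) #### ####') as formatted;
-- should return: (01) 5556 7890
```

This only works because the leading zero is known to be there; it cannot distinguish a number that genuinely starts with 0 from one that is simply a digit short.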
If you need to "mask", rather hide the real value with another, and then "unmask" a string you can try this function, or extend it for that matter. :)
https://stackoverflow.com/a/22023329/2175524
I wanted to hide some information, so I used the RIGHT function. It shows only the last 4 characters of the value.
CONCAT('xxx-xx-', RIGHT('03466045896', 4))
Above code will show "xxx-xx-5896"