Currently, I have a data set that is structured as follows:
CREATE TABLE notes (
date DATE NOT NULL,
author VARCHAR(100) NOT NULL,
type CHAR NOT NULL,
line_number INT NOT NULL,
note VARCHAR(4000) NOT NULL
);
Some sample data:
Date, Author, Type, Line Number, Note
2015-01-01, Abe, C, 1, First 4000 character string
2015-01-01, Abe, C, 2, Second 4000 character string
2015-01-01, Abe, C, 3, Third 4000 character string
2015-01-01, Bob, C, 1, First 4000 character string
2015-01-01, Bob, C, 2, Second 1000 character string
2015-01-01, Cal, C, 1, First 3568 character string
This data is to be migrated to a new SQL Server structure that is defined as:
CREATE TABLE notes (
date DATE NOT NULL,
author VARCHAR(100) NOT NULL,
type CHAR NOT NULL,
note VARCHAR(8000) NOT NULL
);
I would like to prefix the multi-part Notes (those with more than 8000 characters when combined) with "Date - Author - Part X of Y // ", and place a space between concatenated strings, so the data would end up like:
Date, Author, Type, Note
2015-01-01, Abe, C, 2015-01-01 - Abe - Part 1 of 2 // First 4000 character string First 3959 characters of the second 4000 character string
2015-01-01, Abe, C, 2015-01-01 - Abe - Part 2 of 2 // Remaining 41 characters of the second 4000 character string Third (up to) 4000 character string
2015-01-01, Bob, C, First 4000 character string Second 1000 character string
2015-01-01, Cal, C, First 3568 character string
I'm looking for ways to accomplish this transformation. Initially, I had an intermediate step to simply combine (coalesce) all the Note strings that share the same Date, Author, and Type, but I was not able to split the result.
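For reference, an intermediate combine step of that kind might look like the sketch below. It is only an illustration under assumptions: the table variable @notes and its three sample rows are made up, and the FOR XML PATH concatenation is one common technique rather than necessarily the one used originally.

```sql
-- Hypothetical sample data standing in for the real notes table.
declare @notes table ([date] date, author varchar(100), [type] char(1), line_number int, note varchar(4000));
insert into @notes values
 ('2015-01-01', 'Bob', 'C', 1, 'First')
,('2015-01-01', 'Bob', 'C', 2, 'Second')
,('2015-01-01', 'Cal', 'C', 1, 'Only');

-- One row per (date, author, type); the notes are joined with a single space in line_number order.
select g.[date], g.author, g.[type],
       stuff((select ' ' + n.note
              from @notes n
              where n.[date] = g.[date] and n.author = g.author and n.[type] = g.[type]
              order by n.line_number
              for xml path(''), type).value('.', 'varchar(max)'), 1, 1, '') as combined_note
from @notes g
group by g.[date], g.author, g.[type];
-- Bob's notes combine to 'First Second'; Cal's single note stays 'Only'.
```

The combined value is varchar(max), so it can exceed 8000 characters and still has to be split before it fits the target column.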
Okay, so, this was a bit of a challenge but I got there in the end. Has been a thoroughly enjoyable distraction from my regular work :D
The code assumes that you will never have a note that is longer than 72,000 total characters, because the logic that works out how much extra text the "Part x of y" prefix adds assumes that x and y are single-digit numbers. This could easily be remedied by padding any single digits with leading zeros, which would also ensure the ordering is correct.
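As a rough illustration of the zero-padding idea (a sketch only; the variables @part and @total are made up for the example), a fixed-width part number could be built like this:

```sql
-- Hypothetical values: part 3 of 12.
declare @part int = 3, @total int = 12;

-- right('00' + ..., 2) pads to two digits, so the prefix length no longer depends on the part number.
select ' - Part ' + right('00' + cast(@part  as varchar(2)), 2)
     + ' of '     + right('00' + cast(@total as varchar(2)), 2)
     + ' // ' as prefix_fragment;   -- returns ' - Part 03 of 12 // '
```

With two digits, the placeholder used in the length calculation would become ' - Part xx of xx //_', which allows up to 99 parts.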
If you need anything explained, the comments in the code should be sufficient:
-- Declare the test data:
declare @a table ([Date] date
                 ,author varchar(100)
                 ,type char
                 ,line_number int
                 ,note varchar(8000)
                 ,final_line int
                 ,new_lines int
                 )
insert into @a values
('2015-01-01','Abel','C',1,'This is a note that is 100 characters long----------------------------------------------------------' ,null,null)
,('2015-01-01','Abel','C',2,'This is a note that is 100 characters long----------------------------------------------------------' ,null,null)
,('2015-01-01','Abel','C',3,'This is a note that is 83 characters long------------------------------------------' ,null,null)
,('2015-01-01','Bob' ,'C',1,'This is a note that is 100 characters long----------------------------------------------------------' ,null,null)
,('2015-01-01','Bob' ,'C',2,'This is a note that is 43 characters long--' ,null,null)
,('2015-01-01','Cal' ,'C',1,'This is a note that is 50 characters long---------' ,null,null)
---------------------------------------
-- Start the actual data processing. --
---------------------------------------
declare @MaxFieldLen decimal(10,2) = 100 -- Set this to your 8000-character limit. I have used 100 so I didn't have to generate and work with really long text values.
-- Create Numbers table. This will perform better if created as a permanent table:
if object_id('tempdb..#Numbers') is not null
drop table #Numbers
;with e00(n) as (select 1 union all select 1)
,e02(n) as (select 1 from e00 a, e00 b)
,e04(n) as (select 1 from e02 a, e02 b)
,e08(n) as (select 1 from e04 a, e04 b)
,e16(n) as (select 1 from e08 a, e08 b)
,e32(n) as (select 1 from e16 a, e16 b)
,cte(n) as (select row_number() over (order by n) from e32)
select n-1 as Number
into #Numbers
from cte
where n <= 1000001
-- Calculate some useful figures to be used in chopping up the total note. This needs to be done across the whole table before doing anything else:
update a
set final_line = t.final_line
   ,new_lines = t.new_lines
from @a a
inner join (select Date
                  ,author
                  ,type
                  ,max(line_number) as final_line -- We only want the final line from the CTE later on, so we need a way of identifying that the line_number we are working with is the last one.
                  -- Calculate the total number of lines that will result from the additional text being added:
                  ,case when sum(len(note)) > @MaxFieldLen -- If the Note is long enough to be broken into two lines:
                        then ceiling( -- Find the next highest integer value for
                             sum(len(note)) -- the total length of all the notes
                             / (@MaxFieldLen - len(convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part x of x //_')) -- divided by the max note size allowed minus the length of the additional text.
                             )
                        else 1 -- Otherwise return 1.
                   end as new_lines
            from @a
            group by Date
                    ,author
                    ,type
           ) t
on a.Date = t.Date
and a.author = t.author
and a.type = t.type
-- Combine the Notes using a recursive cte:
;with cte as
(
select Date
      ,author
      ,type
      ,line_number
      ,final_line
      ,note
      ,new_lines
from @a
where line_number = 1
union all
select a.Date
      ,a.author
      ,a.type
      ,a.line_number
      ,a.final_line
      ,c.note + a.note
      ,a.new_lines
from cte c
join @a a
on c.Date = a.Date
and c.author = a.author
and c.type = a.type
and c.line_number+1 = a.line_number
)
select c1.Date
      ,c1.author
      ,c1.type
      ,c2.note
from cte c1
cross apply (select case when c1.new_lines > 1 -- If there is more than one line to be returned, build up the prefix:
                         then convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part ' + cast(Number+1 as nvarchar(10)) + ' of ' + cast(c1.new_lines as nvarchar(10)) + ' // '
                              + substring(c1.note -- and then append the next (Max note length - Generated prefix) number of characters in the note:
                                         ,1 + Number * (@MaxFieldLen - len(convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part x of x //_'))
                                         ,(@MaxFieldLen - len(convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part x of x //_'))-1
                                         )
                         else c1.note
                    end as note
             from #Numbers
             where Number >= 0
             and Number < case when c1.new_lines = 1
                               then 1
                               else len(c1.note) / (@MaxFieldLen - len(convert(nvarchar(10), Date, 121) + ' - ' + author + ' - Part x of x //_'))
                          end
            ) c2
where line_number = final_line
order by 1,2,3,4
Related
I would like to generate data from a given regex pattern in SQL Server. Is that possible? Say I have the patterns below and would like to generate data as follows:
The idea behind this is SQL Static Data Masking (a feature that has since been removed). Our client wants to mask the production data in the test database. Static Data Masking is no longer available to us, but we do have patterns for masking each column, so my thinking is that we can run an update query driven by these patterns.
SELECT "(\d){7}" AS RandonNumber, "(\W){5}" AS RandomString FROM tbl
Output Should be
+---------------+--------------+
| RandonNumber | RandomString |
+---------------+--------------+
| 7894562 | AHJIL |
+---------------+--------------+
| 9632587 | ZLOKP |
+---------------+--------------+
| 4561238 | UJIOK |
+---------------+--------------+
Apart from this regular pattern, I have some customized pattern like Test_Product_(\d){1,4}, which should give result as below:
Test_Product_012
Test_Product_143
Test_Product_8936
Complete Patterns which I am going to use for masking
Other Patterns Samples
(\l){30} ahukoklijfahukokponmahukoahuko
(\d){7} 7895623
(\W){5} ABCDEF
Test_Product_(\d){1,4} Test_Product_007
0\.(\d){2} 0.59
https://www\.(\l){10}\.com https://www.anything.com
Well, I can give you a solution that is not based on regular expressions, but on a set of parameters - but it contains a complete set of all your requirements.
I've based this solution on a user-defined function I've written to generate random strings (You can read my blog post about it here) - I've just changed it so that it could generate the mask you wanted based on the following conditions:
The mask has an optional prefix.
The mask has an optional suffix.
The mask has a variable-length random string.
The random string can contain either lower-case letters, upper-case letters, digits, or any combination of the above.
I decided on this set of rules based on your update to the question, which contains your desired masks:
(\d){7} 7895623
(\W){5} ABCDEF
Test_Product_(\d){1,4} Test_Product_007
0\.(\d){2} 0.59
https://www\.(\l){10}\.com https://www.anything.com
And now, for the code:
Since I'm using a user-defined function, I can't call the built-in NewId() function inside it, so we first need to create a view to generate the GUID for us:
CREATE VIEW GuidGenerator
AS
SELECT Newid() As NewGuid;
In the function, we're going to use that view to generate a NewID() as the base of all randomness.
The function itself is a lot more cumbersome than the random string generator I started from:
CREATE FUNCTION dbo.MaskGenerator
(
    -- use null or an empty string for no prefix
    @Prefix nvarchar(4000),
    -- use null or an empty string for no suffix
    @Suffix nvarchar(4000),
    -- the minimum length of the random part
    @MinLength int,
    -- the maximum length of the random part
    @MaxLength int,
    -- the maximum number of rows to return. Note: up to 1,000,000 rows
    @Count int,
    -- 1, 2 and 4 stand for lower-case, upper-case and digits.
    -- a bitwise combination of these values can be used to generate all possible combinations:
    -- 3: lower and upper, 5: lower and digits, 6: upper and digits, 7: lower, upper and digits
    @CharType tinyint
)
RETURNS TABLE
AS
RETURN
-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)), -- 10
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000
SELECT TOP(@Count)
    n As Number,
    CONCAT(@Prefix, (
        SELECT TOP (Length)
            -- choose what char combination to use for the random part
            CASE @CharType
                WHEN 1 THEN Lower
                WHEN 2 THEN Upper
                WHEN 3 THEN IIF(Rnd % 2 = 0, Lower, Upper)
                WHEN 4 THEN Digit
                WHEN 5 THEN IIF(Rnd % 2 = 0, Lower, Digit)
                WHEN 6 THEN IIF(Rnd % 2 = 0, Upper, Digit)
                WHEN 7 THEN
                    CASE Rnd % 3
                        WHEN 0 THEN Lower
                        WHEN 1 THEN Upper
                        ELSE Digit
                    END
            END
        FROM Tally As t0
        -- create a random number from the guid using the GuidGenerator view
        CROSS APPLY (SELECT Abs(Checksum(NewGuid)) As Rnd FROM GuidGenerator) As rand
        CROSS APPLY
        (
            -- generate a random lower-case char, upper-case char and digit
            SELECT CHAR(97 + Rnd % 26) As Lower, -- Random lower case letter
                   CHAR(65 + Rnd % 26) As Upper, -- Random upper case letter
                   CHAR(48 + Rnd % 10) As Digit -- Random digit
        ) As Chars
        WHERE t0.n <> -t1.n -- Needed for the subquery to get re-evaluated for each row
        FOR XML PATH('')
    ), @Suffix) As RandomString
FROM Tally As t1
CROSS APPLY
(
    -- Select a random length between @MinLength and @MaxLength (inclusive)
    SELECT TOP 1 n As Length
    FROM Tally As t2
    CROSS JOIN GuidGenerator
    WHERE t2.n >= @MinLength
    AND t2.n <= @MaxLength
    AND t2.n <> t1.n
    ORDER BY NewGuid
) As Lengths;
And finally, Test cases:
(\l){30} - ahukoklijfahukokponmahukoahuko
SELECT RandomString FROM dbo.MaskGenerator(null, null, 30, 30, 2, 1);
Results:
1, eyrutkzdugogyhxutcmcmplvzofser
2, juuyvtzsvmmcdkngnzipvsepviepsp
(\d){7} - 7895623
SELECT RandomString FROM dbo.MaskGenerator(null, null, 7, 7, 2, 4);
Results:
1, 8744412
2, 2275313
(\W){5} - ABCDE
SELECT RandomString FROM dbo.MaskGenerator(null, null, 5, 5, 2, 2);
Results:
1, RSYJE
2, MMFAA
Test_Product_(\d){1,4} - Test_Product_007
SELECT RandomString FROM dbo.MaskGenerator('Test_Product_', null, 1, 4, 2, 4);
Results:
1, Test_Product_933
2, Test_Product_7
0\.(\d){2} - 0.59
SELECT RandomString FROM dbo.MaskGenerator('0.', null, 2, 2, 2, 4);
Results:
1, 0.68
2, 0.70
https://www\.(\l){10}\.com - https://www.anything.com
SELECT RandomString FROM dbo.MaskGenerator('https://www.', '.com', 10, 10, 2, 1);
Results:
1, https://www.xayvkmkuci.com
2, https://www.asbfcvomax.com
Here's how you use it to mask the content of a table:
DECLARE @Count int = 10;
SELECT CAST(IntVal.RandomString As Int) As IntColumn,
       UpVal.RandomString as UpperCaseValue,
       LowVal.RandomString as LowerCaseValue,
       MixVal.RandomString as MixedValue,
       WithPrefix.RandomString As PrefixedValue
FROM dbo.MaskGenerator(null, null, 3, 7, @Count, 4) As IntVal
JOIN dbo.MaskGenerator(null, null, 10, 10, @Count, 1) As LowVal
    ON IntVal.Number = LowVal.Number
JOIN dbo.MaskGenerator(null, null, 5, 10, @Count, 2) As UpVal
    ON IntVal.Number = UpVal.Number
JOIN dbo.MaskGenerator(null, null, 10, 20, @Count, 7) As MixVal
    ON IntVal.Number = MixVal.Number
JOIN dbo.MaskGenerator('Test ', null, 1, 4, @Count, 4) As WithPrefix
    ON IntVal.Number = WithPrefix.Number
Results:
IntColumn UpperCaseValue LowerCaseValue MixedValue PrefixedValue
674 CCNVSDI esjyyesesv O2FAC7bfwg2Be5a91Q0 Test 4935
30732 UJKSL jktisddbnq 7o8B91Sg1qrIZSvG3AcL Test 0
4669472 HDLJNBWPJ qgtfkjdyku xUoLAZ4pAnpn Test 8
26347 DNAKERR vlehbnampb NBv08yJdKb75ybhaFqED Test 91
6084965 LJPMZMEU ccigzyfwnf MPxQ2t8jjmv0IT45yVcR Test 4
6619851 FEHKGHTUW wswuefehsp 40n7Ttg7H5YtVPF Test 848
781 LRWKVDUV bywoxqizju UxIp2O4Jb82Ts Test 6268
52237 XXNPBL beqxrgstdo Uf9j7tCB4W2 Test 43
876150 ZDRABW fvvinypvqa uo8zfRx07s6d0EP Test 7
Note that this is a fast process - generating 1000 rows with 5 columns took less than half a second on average in tests I've made.
I'm not convinced you need a regex for this. Why not just use a "scrub script" and take advantage of the newid() function to generate a bunch of random data? It looks like you'll need to write such a script anyway, regex or not, and this has the benefit of being very simple.
Let's say you start with the following data:
create table tbl (PersonalId int, Name varchar(max))
insert into tbl select 300300, 'Michael'
insert into tbl select 554455, 'Tim'
insert into tbl select 228899, 'John'
select * from tbl
Then run your script:
update tbl set PersonalId = cast(rand(checksum(newid())) * 1000000 as int)
update tbl set Name = left(convert(varchar(255), newid()), 6)
select * from tbl
Is it possible in SQL to generate a random alphanumeric character string?
The current strings are ‘Mxxxx’ and ‘Pxxxx’, where the xxxx is just a sequential number.
I need a format with an alphanumeric character in positions 1, 2 & 4 and special characters in positions 3 & 5. All will be random and unique.
The alphanumeric characters are A – Z, 1 – 9. The special characters are *, +, =, #, /, %, &, !, and ?.
Is it possible to generate a list of 400 using this format in SQL Server?
Thanks
Have a look at my blog post about methods to generate random strings using the pure TSQL:
"SQL: generate random character string". The second method doesn't have any string length limitations.
You can use something along the following lines.
-- The alphabet must hold all 26 letters to match the % 26 below.
DECLARE @AlLChars varchar(100) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
-- Digits 1 - 9, as requested in the question.
DECLARE @Numerics varchar(100) = '123456789'
DECLARE @SpecialChars varchar(10) = '*+=#/%&!?'
DECLARE @I INT = 1
-- Note: random draws can repeat; add a unique constraint or a NOT EXISTS check if uniqueness is required.
WHILE @I <= 400
BEGIN
    INSERT INTO tblStrings
    SELECT
        RIGHT( LEFT(@AlLChars,ABS(BINARY_CHECKSUM(NEWID())%26) + 1 ),1)
        +RIGHT( LEFT(@Numerics,ABS(BINARY_CHECKSUM(NEWID())%9) + 1 ),1)
        +RIGHT( LEFT(@SpecialChars,ABS(BINARY_CHECKSUM(NEWID())%9) + 1 ),1)
        +RIGHT( LEFT(@AlLChars,ABS(BINARY_CHECKSUM(NEWID())%26) + 1 ),1)
        +RIGHT( LEFT(@SpecialChars,ABS(BINARY_CHECKSUM(NEWID())%9) + 1 ),1)
    SET @I = @I + 1
END;
You can do something like this:
;WITH ALPHA AS (
SELECT DISTINCT CHAR(object_id) AS C FROM sys.columns SC
WHERE
object_id BETWEEN ASCII('A') AND ASCII('Z')
OR object_id BETWEEN ASCII('0') AND ASCII('9')
), SPC AS(
SELECT DISTINCT CHAR(object_id) AS S FROM sys.columns SC
WHERE
object_id IN (ASCII('*'), ASCII('+'), ASCII('='), ASCII('#'), ASCII('/'), ASCII('%'), ASCII('&'), ASCII('!'))
)
SELECT TOP 10 A1.C + A2.C + A3.C + S1.S + S2.S + S3.S
FROM
ALPHA A1, ALPHA A2, ALPHA A3, SPC S1, SPC S2, SPC S3
ORDER BY NEWID()
Try it on rextester: stackoverflow-54809150-auto-string-generator
Adjust quantity and position of symbols as you need.
Beware:
It is quite resource consuming!
If you need to run it a lot of times - prepare table(s) with computed values and select from them.
I'm building a Markov chain name generator. I'm trying to replace a while loop with a recursive CTE. Limitations in using top and order by in the recursive part of the CTE have led me down the following path.
The point of all of this is to generate names based on a model, which is just another word that I've chunked out into three-character segments, stored in three columns in the Markov_Model table. The next character in the sequence will be a character from the Markov_Model, such that the 1st and 2nd characters in the model match the penultimate and ultimate characters in the word being generated. Rather than generate a probability matrix for that third character, I'm using a scalar function that finds all the characters that fit the criteria and gets one of them randomly: order by newid().
The problem is that this formulation of the CTE gets the desired number of rows in the anchor segment, but the union that recursively calls the CTE only unions one row from the anchor. I've attached a sample of the desired output at the bottom.
The query:
;with names as
(
select top 5
cast('+' as nvarchar(50)) as char1,
cast('+' as nvarchar(50)) as char2,
cast(char3 as nvarchar(50)) as char3,
cast('++' + char3 as nvarchar(100)) as name_in_progress,
1 as iteration
from markov_Model
where char1 is null
and char2 is null
order by newid() -- Get some random starting characters
union all
select
n.char2 as char1,
n.char3 as char2,
cast(fnc.addition as nvarchar(50)) as char3,
cast(n.name_in_progress + fnc.addition as nvarchar(100)),
1 + n.iteration
from names n
cross apply (
-- This function takes the preceding two characters,
-- and gets a random character that follows the pattern
select isnull(dbo.[fn_markov_getNext] (n.char2, n.char3), ',') as addition
) fnc
)
select *
from names
option (maxrecursion 3) -- For debug
The trouble is the union only unions one row.
Example output:
char1 char2 char3 name_in_progress iteration
+ + F ++F 1
+ + N ++N 1
+ + K ++K 1
+ + S ++S 1
+ + B ++B 1
+ B a ++Ba 2
B a c ++Bac 3
a c h ++Bach 4
Note I'm using + and , as null replacers/delimiters.
What I want to see is the entirety of the previous recursion, with the addition of the new characters to the name_in_progress; each pass should modify the entirety of the previous pass.
My desired output would be:
Top 10 of the Markov_Model table:
Text of the function that gets the next character from the Markov_Model:
CREATE FUNCTION [dbo].[fn_markov_getNext]
(
    @char2 nvarchar(1),
    @char3 nvarchar(1)
)
RETURNS nvarchar(1)
AS
BEGIN
    DECLARE @newChar nvarchar(1)
    set @newChar = (
        select top 1
            isnull(char3, ',')
        from markov_Model mm
        where isnull(mm.char1, '+') = isnull(@char2, '+')
        and isnull(mm.char2, '+') = isnull(@char3, ',')
        order by (select new_id from vw_getGuid) -- A view that calls newid()
    )
    return @newChar
END
Hi, I have an interesting problem. I have about 1500 records in a table. The format of the column I need to sort on is:
String Number.number.(optional number) (optional string)
In reality this could look like this:
AB 2.10.19
AB 2.10.2
AB 2.10.20 (I)
ACA 1.1
ACA 1.9 (a) V
I need a way to sort these so that instead of
AB 2.10.19
AB 2.10.2
AB 2.10.20 (I)
I get this
AB 2.10.2
AB 2.10.19
AB 2.10.20 (I)
Because of the lack of standard formatting I'm at a loss as to how I can sort this via SQL.
I'm at the point of just manually identifying a new int column to denote the sorting value, unless anyone has any suggestion?
I'm using SQL Server 2008 R2
You would need to sort on the first text token, then on the second text token (which is not a number; it's a string comprising some numbers), then optionally on any remaining text.
To make the 2nd token sort correctly (like a version number I presume) you can use a hierarchyid:
with t(f) as (
select 'AB 2.10.19' union all
select 'AB 2.10.2' union all
select 'AB 2.10.20 (I)' union all
select 'AB 2.10.20 (a) Z' union all
select 'AB 2.10.21 (a)' union all
select 'ACA 1.1' union all
select 'ACA 1.9 (a) V' union all
select 'AB 4.1'
)
select * from t
order by
left(f, charindex(' ', f) - 1),
cast('/' + replace(substring(f, charindex(' ', f) + 1, patindex('%[0-9] %', f + ' ') - charindex(' ', f)) , '.', '/') + '/' as hierarchyid),
substring(f, patindex('%[0-9] %', f + ' ') + 1, len(f))
f
----------------
AB 2.10.2
AB 2.10.19
AB 2.10.20 (a) Z
AB 2.10.20 (I)
AB 2.10.21 (a)
AB 4.1
ACA 1.1
ACA 1.9 (a) V
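To see why the hierarchyid cast helps, here is a stripped-down sketch that uses only the dotted numeric part (the inline values are just sample data). Each dot-separated segment is compared as a number rather than as text:

```sql
select v
from (values ('2.10.19'), ('2.10.2'), ('2.10.20')) as t(v)
order by cast('/' + replace(v, '.', '/') + '/' as hierarchyid);
-- returns 2.10.2, 2.10.19, 2.10.20 (a plain "order by v" would put 2.10.19 first)
```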
Pad the text to the same length:
SELECT column
FROM table
ORDER BY left(column + replicate('*', 100500), 100500)
--get the start and end position of numeric in the string
with numformat as
(select val,patindex('%[0-9]%',val) strtnum,len(val)-patindex('%[0-9]%',reverse(val))+1 endnum
from t
where patindex('%[0-9]%',val) > 0) --where condition added to exclude records with no numeric part in them
--get the substring based on the previously calculated start and end positions
,substrng_to_sort_on as
(select val, substring(val,strtnum,endnum-strtnum+1) as sub from numformat)
--Final query to sort based on the 1st,2nd and the optional 3rd numbers in the string
select val
from substrng_to_sort_on
order by
cast(substring(sub,1,charindex('.',sub)-1) as numeric), --1st number in the string
cast(substring(sub,charindex('.',sub)+1,charindex('.',reverse(sub))) as numeric), --second number in the string
cast(reverse(substring(reverse(sub),1,charindex('.',reverse(sub))-1)) as numeric) --third number in the string
Sample demo
Try this:
SELECT column
FROM table
ORDER BY CASE WHEN SUBSTRING(column,LEN(column)-1,1) = '.'
THEN 0
ELSE 1
END, column
This will put any strings that have a . in the second to last position first in the ordering.
Edit:
On second thought, this won't work with the leading 'AB', 'ACA' etc. Try this instead:
SELECT column
FROM table
ORDER BY SUBSTRING(column,1,2), --This will deal with leading letters up to 2 chars
CASE WHEN SUBSTRING(column,LEN(column)-1,1) = '.'
THEN 0
ELSE 1
END,
Column
Edit2:
To also compensate for the second numeric set, use this:
SELECT column
FROM table
ORDER BY substring(column,1,2),
CASE WHEN substring(column,charindex('.',column) + 2,1) = '.' and substring(column,len(column)-1,1) = '.' THEN 0
WHEN substring(column,charindex('.',column) + 2,1) = '.' and substring(column,len(column)-1,1) <> '.' THEN 1
WHEN substring(column,charindex('.',column) + 2,1) <> '.' and substring(column,len(column)-1,1) = '.' THEN 2
ELSE 3 END, column
Basically, this is a manual way to force hierarchical ordering by accounting for each condition.
I have a text like: Sentence one. Sentence two. Sentence three.
I want it to be:
Sentence one.
Sentence two.
Sentence three.
I assume I can replace '.' with '.' + char(10) + char(13), but how can I go about bullets? '•' character works fine if printed manually I just do not know how to bullet every sentence including the first.
-- Initial string
declare @text varchar(100)
set @text = 'Sentence one. Sentence two. Sentence three.'
-- Setting up replacement text - new lines (assuming this works) and bullets ( char(149) )
declare @replacement varchar(100)
set @replacement = '.' + char(10) + char(13) + char(149)
-- Adding a bullet at the beginning and doing the replacement, but this will also add a trailing bullet
declare @processedText varchar(100)
set @processedText = char(149) + ' ' + replace(@text, '.', @replacement)
-- Figure out length of substring to select in the next step
declare @substringLength int
set @substringLength = LEN(@processedText) - CHARINDEX(char(149), REVERSE(@processedText))
-- Removes trailing bullet
select substring(@processedText, 0, @substringLength)
I've tested here - https://data.stackexchange.com/stackoverflow/qt/119364/
I should point out that doing this in T-SQL doesn't seem correct. T-SQL is meant to process data; any presentation-specific work should be done in the code that calls this T-SQL (C# or whatever you're using).
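If you do keep it in T-SQL, one way to avoid the trailing bullet altogether is to replace the period-plus-space between sentences rather than every period. This is only a sketch under a few assumptions: sentences are separated by exactly '. ', char(149) is the bullet in your code page, and the usual CR + LF order (char(13) + char(10)) is wanted for the line break.

```sql
declare @text varchar(100) = 'Sentence one. Sentence two. Sentence three.';
declare @crlf char(2) = char(13) + char(10);

-- The final period has no space after it, so it is left alone and no trailing bullet appears.
select char(149) + ' '
     + replace(@text, '. ', '.' + @crlf + char(149) + ' ') as bulleted;
```

Text with irregular spacing after periods would need the splitting approach instead.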
Here's my over-the-top approach, but I feel it's a fairly solid one. It combines two classic SQL problem-solving techniques: a numbers table for string splitting, and FOR XML for concatenating the split lines back together. The code is long, but the only place you'd need to actually edit is the SOURCE_DATA section.
No knock on @Jeremy Wiggins' approach, but I prefer mine as it lends itself well to a set-based approach in addition to being fairly efficient code.
-- This code will rip lines apart based on @delimiter
-- and put them back together based on @rebind
DECLARE
    @delimiter char(1)
,   @rebind varchar(10);
SELECT
    @delimiter = '.'
,   @rebind = char(10) + char(149) + ' ';
-- L0 to L5 simulate a numbers table
-- http://billfellows.blogspot.com/2009/11/fast-number-generator.html
WITH L0 AS
(
SELECT
0 AS C
UNION ALL
SELECT
0
)
, L1 AS
(
SELECT
0 AS c
FROM
L0 AS A
CROSS JOIN L0 AS B
)
, L2 AS
(
SELECT
0 AS c
FROM
L1 AS A
CROSS JOIN L1 AS B
)
, L3 AS
(
SELECT
0 AS c
FROM
L2 AS A
CROSS JOIN L2 AS B
)
, L4 AS
(
SELECT
0 AS c
FROM
L3 AS A
CROSS JOIN L3 AS B
)
, L5 AS
(
SELECT
0 AS c
FROM
L4 AS A
CROSS JOIN L4 AS B
)
, NUMS AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS number
FROM
L5
)
, SOURCE_DATA (ID, content) AS
(
-- This query simulates your input data
SELECT 1, 'Sentence one. Sentence two. Sentence three.'
UNION ALL SELECT 7, 'In seed time learn, in harvest teach, in winter enjoy.Drive your cart and your plow over the bones of the dead.The road of excess leads to the palace of wisdom.Prudence is a rich, ugly old maid courted by Incapacity.He who desires but acts not, breeds pestilence.'
)
, MAX_LENGTH AS
(
-- this query is rather important. The current NUMS query generates a
-- very large set of numbers but we only need 1 to the maximum length of our
-- source data. We can take advantage of a 2008 feature of letting
-- TOP take a dynamic value
SELECT TOP (SELECT MAX(LEN(SD.content)) AS max_length FROM SOURCE_DATA SD)
N.number
FROM
NUMS N
)
, MULTI_LINES AS
(
-- This query will make many lines out a single line based on the supplied delimiter
-- Need to retain the ID (or some unique value from original data to regroup it
-- http://www.sommarskog.se/arrays-in-sql-2005.html#tblnum
SELECT
SD.ID
, LTRIM(substring(SD.content, Number, charindex(@delimiter, SD.content + @delimiter, Number) - Number)) + @delimiter AS lines
FROM
MAX_LENGTH
CROSS APPLY
SOURCE_DATA SD
WHERE
Number <= len(SD.content)
AND substring(@delimiter + SD.content, Number, 1) = @delimiter
)
, RECONSITITUE (content, ID) AS
(
-- use classic concatenation to put it all back together
-- using CR/LF * (space) as delimiter
-- as a correlated sub query and joined back to our original table to preserve IDs
-- https://stackoverflow.com/questions/5196371/sql-query-concatenating-results-into-one-string
SELECT DISTINCT
STUFF
(
(
SELECT @rebind + M.lines
FROM MULTI_LINES M
WHERE M.ID = ML.ID
FOR XML PATH('')
)
, 1
, 1
, '')
, ML.ID
FROM
MULTI_LINES ML
)
SELECT
R.content
, R.ID
FROM
RECONSITITUE R
Results
content ID
----------------------------------------------------------- ---
• In seed time learn, in harvest teach, in winter enjoy.
• Drive your cart and your plow over the bones of the dead.
• The road of excess leads to the palace of wisdom.
• Prudence is a rich, ugly old maid courted by Incapacity.
• He who desires but acts not, breeds pestilence. 7
• Sentence one.
• Sentence two.
• Sentence three. 1
(2 row(s) affected)
References
Number table
Splitting strings via number table
SQL Query - Concatenating Results into One String
select '• On '+ cast(getdate() as varchar)+' I discovered how to do this '
Sample