Remove only leading or trailing carriage returns - sql

I'm dumbfounded that this question has not been asked meaningfully already. How does one go about creating an equivalent of LTRIM or RTRIM in SQL that removes carriage returns and line feeds ONLY at the start or end of a string?
Obviously REPLACE(REPLACE(@MyString,char(10),''),char(13),'') removes ALL carriage returns and line feeds, which is NOT what I'm looking for. I just want to remove the leading or trailing ones.

Find the first character that is not CHAR(13) or CHAR(10) and subtract its position from the string's length.
LTRIM()
SELECT RIGHT(@MyString,LEN(@MyString)-PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%',@MyString)+1)
RTRIM()
SELECT LEFT(@MyString,LEN(@MyString)-PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%',REVERSE(@MyString))+1)
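As a quick check of the two expressions above, here is a minimal sketch with a made-up test value (the contents of the variable are my own assumption, not taken from the question). DATALENGTH is used to show how many characters remain, since line breaks are invisible in grid output.

```sql
-- Hypothetical test value: CR/LF + 'line one' + CR/LF + 'line two' + CR/LF (22 characters)
DECLARE @MyString varchar(100) =
    CHAR(13) + CHAR(10) + 'line one' + CHAR(13) + CHAR(10) + 'line two' + CHAR(13) + CHAR(10);

SELECT
    DATALENGTH(@MyString) AS original_length,  -- 22
    -- Leading CR/LF removed; inner and trailing line breaks are kept
    DATALENGTH(RIGHT(@MyString, LEN(@MyString) - PATINDEX('%[^' + CHAR(13) + CHAR(10) + ']%', @MyString) + 1)) AS ltrim_length,  -- 20
    -- Trailing CR/LF removed; leading and inner line breaks are kept
    DATALENGTH(LEFT(@MyString, LEN(@MyString) - PATINDEX('%[^' + CHAR(13) + CHAR(10) + ']%', REVERSE(@MyString)) + 1)) AS rtrim_length;  -- 20
```

Two caveats are worth testing before relying on this: LEN ignores trailing spaces, so values that end in spaces can be cut in the wrong place, and PATINDEX returns 0 when the string contains nothing but CR/LF, in which case the string comes back unchanged.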

Following functions are enhanced types of trim functions you can use. Copied from sqlauthority.com
These functions remove trailing spaces, leading spaces, white space, tabs, carriage returns, line feeds etc.
Trim Left
CREATE FUNCTION dbo.LTrimX(@str VARCHAR(MAX)) RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @trimchars VARCHAR(10)
SET @trimchars = CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32)
IF @str LIKE '[' + @trimchars + ']%' SET @str = SUBSTRING(@str, PATINDEX('%[^' + @trimchars + ']%', @str), LEN(@str))
RETURN @str
END
Trim Right
CREATE FUNCTION dbo.RTrimX(@str VARCHAR(MAX)) RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @trimchars VARCHAR(10)
SET @trimchars = CHAR(9)+CHAR(10)+CHAR(13)+CHAR(32)
IF @str LIKE '%[' + @trimchars + ']'
SET @str = REVERSE(dbo.LTrimX(REVERSE(@str)))
RETURN @str
END
Trim both Left and Right
CREATE FUNCTION dbo.TrimX(@str VARCHAR(MAX)) RETURNS VARCHAR(MAX)
AS
BEGIN
RETURN dbo.LTrimX(dbo.RTrimX(@str))
END
Using function
SELECT dbo.TRIMX(@MyString)
If you do use these functions you might also consider changing from varchar to nvarchar to support more encodings.

In SQL Server 2017 you can use the TRIM function to remove specific characters from beginning and end, in one go:
WITH testdata(str) AS (
SELECT CHAR(13) + CHAR(10) + ' test ' + CHAR(13) + CHAR(10)
)
SELECT
str,
TRIM(CHAR(13) + CHAR(10) + CHAR(9) + ' ' FROM str) AS [trim cr/lf/tab/space],
TRIM(CHAR(13) + CHAR(10) FROM str) AS [trim cr/lf],
TRIM(' ' FROM str) AS [trim space]
FROM testdata
Result:
Note that the last example (trim space) does nothing as expected since the spaces are in the middle.
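Since the effect of each TRIM call is hard to see in a result grid, the following sketch measures it with DATALENGTH instead. It reuses the test value from the query above in a variable (the variable name and aliases are my own) and needs SQL Server 2017 or later for TRIM.

```sql
-- Same test value as above: CR/LF + ' test ' + CR/LF (10 characters)
DECLARE @str varchar(20) = CHAR(13) + CHAR(10) + ' test ' + CHAR(13) + CHAR(10);

SELECT
    DATALENGTH(@str) AS original_length,                                                 -- 10
    DATALENGTH(TRIM(CHAR(13) + CHAR(10) + CHAR(9) + ' ' FROM @str)) AS cr_lf_tab_space,  -- 4  ('test')
    DATALENGTH(TRIM(CHAR(13) + CHAR(10) FROM @str)) AS cr_lf_only,                       -- 6  (' test ')
    DATALENGTH(TRIM(' ' FROM @str)) AS space_only;                                       -- 10 (unchanged)
```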

Here's an example you may run:
I decided to cast the results as an Xml value, so when you click on it, you will be able to view the Carriage Returns.
DECLARE @CRLF Char(2) = (CHAR(0x0D) + CHAR(0x0A))
DECLARE @String VarChar(MAX) = @CRLF + @CRLF + ' Hello' + @CRLF + 'World ' + @CRLF + @CRLF
--Unmodified String:
SELECT CAST(@String as Xml)[Unmodified]
--Remove Trailing Whitespace (including Spaces).
SELECT CAST(LEFT(@String, LEN(REPLACE(@String, @CRLF, ' '))) as Xml)[RemoveTrailingWhitespace]
--Remove Leading Whitespace (including Spaces).
SELECT CAST(RIGHT(@String, LEN(REVERSE(REPLACE(@String, @CRLF, ' ')))) as Xml)[RemoveLeadingWhitespace]
--Remove Leading & Trailing Whitespace (including Spaces).
SELECT CAST(SUBSTRING(@String, LEN(REPLACE(@String, ' ', '_')) - LEN(REVERSE(REPLACE(@String, @CRLF, ' '))) + 1, LEN(LTRIM(RTRIM(REPLACE(@String, @CRLF, ' '))))) as Xml)[RemoveAllWhitespace]
--Remove Only Leading and Trailing CR/LF's (while still preserving all other Whitespace - including Spaces). - 04/06/2016 - MCR.
SELECT CAST(SUBSTRING(@String, PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%',@String), LEN(REPLACE(@String, ' ', '_')) - PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%',@String) + 1 - PATINDEX('%[^'+CHAR(13)+CHAR(10)+']%', REVERSE(@String)) + 1) as Xml)[RemoveLeadingAndTrailingCRLFsOnly]
Remember to remove the Cast-to-Xml, as this was done just as a Proof-of-Concept to show it works.
How is this better than the currently Accepted Answer?
At first glance this may appear to use more Functions than the Accepted Answer.
However, this is not the case.
If you combine both approaches listed in the Accepted Answer (to remove both Trailing and Leading whitespace), you will either have to make two passes updating the Record, or copy all of one Logic into the other (everywhere @String is listed), which would cause way more function calls and become even more difficult to read.

I was stuck using Microsoft SQL Server 2008 R2 and so, basing my functions on @sqluser's answer, I came up with the below. This will return an empty string if the string only contains the characters to be trimmed.
The bit that threw me was that the pattern for PATINDEX must be wrapped in % characters. These are the same wildcards used by LIKE: without the leading %, the pattern has to match from the first character of the string, so a search for something in the middle finds nothing.
CREATE FUNCTION [dbo].[ExtendedLTRIM](@string_to_trim VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @tab CHAR(1) = CHAR(9);
DECLARE @line_feed CHAR(1) = CHAR(10);
DECLARE @carriage_return CHAR(1) = CHAR(13);
DECLARE @space CHAR(1) = CHAR(32);
DECLARE @characters_to_trim VARCHAR(10)
SET @characters_to_trim = @tab + @line_feed + @carriage_return + @space
IF @string_to_trim LIKE '[' + @characters_to_trim + ']%'
BEGIN
DECLARE @first_non_trim_character INT = PATINDEX('%[^' + @characters_to_trim + ']%', @string_to_trim);
IF @first_non_trim_character = 0 RETURN '';
-- DATALENGTH instead of a fixed 8000 so that VARCHAR(MAX) values longer than 8000 characters are not cut short
RETURN SUBSTRING(@string_to_trim, @first_non_trim_character, DATALENGTH(@string_to_trim))
END
RETURN @string_to_trim
END
GO
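On the % characters discussed above, a short sketch may help. The literals are made up for illustration; the last column uses the same character list as the function.

```sql
SELECT
    PATINDEX('%b%', 'abc') AS anywhere,       -- 2: 'b' is found at position 2
    PATINDEX('b%',  'abc') AS must_start,     -- 0: the string does not start with 'b'
    PATINDEX('a%',  'abc') AS starts_with_a,  -- 1: the string starts with 'a'
    -- First character that is NOT tab, LF, CR or space in: tab + space + 'x'
    PATINDEX('%[^' + CHAR(9) + CHAR(10) + CHAR(13) + CHAR(32) + ']%', CHAR(9) + CHAR(32) + 'x') AS first_kept_char;  -- 3
```

The leading % lets the match begin anywhere in the string and the trailing % lets anything follow it, which is why the trim functions wrap their character class in both.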

To trim characters from a pre-defined list you'll want to create the following UDF (should work in 2008R2 and above).
It handles both sides in a single pass and doesn't care if it's a CRLF, LFCR (yep, seen that abomination more than once), bare LF or a bunch of spaces.
It is easy to extend, e.g. by adding parameters to do LTRIM/RTRIM only, or a full purge (that last bit is simpler to do in 2017 by incorporating STRING_AGG, but perfectly doable in 2008R2); as a matter of fact this is a simplified version of something I use to do all those things. If anybody is interested then let me know and I'll update:
CREATE FUNCTION fnTrimHarder
(
@String VARCHAR(MAX)
)
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE
@Start INT,
@Len INT,
@Chars CHAR(5) = CHAR(9) -- TAB
+ CHAR(10) -- LF
+ CHAR(13) -- CR
+ ' ', -- List of invalid characters (built with + because CONCAT needs SQL Server 2012 or later)
@Return VARCHAR(MAX) = '';
IF @String NOT LIKE '%[^' + @Chars + ']%' -- If string contains only invalid characters
OR COALESCE(@String, '') = '' -- Optional addition for NULL handling
RETURN @Return
ELSE
BEGIN -- Create a "table" of characters with ordinals, calculate the start of string and its length, then return the substring
-- Note: a recursive CTE stops with an error after 100 recursion levels by default, so longer strings need a MAXRECURSION hint
WITH CTE AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM CTE
WHERE n < LEN(@String)
)
SELECT
@Start = MIN(n),
@Len = 1 + MAX(n) - MIN(n)
FROM CTE
WHERE SUBSTRING(@String, n, 1) NOT LIKE '[' + @Chars + ']';
SET @Return = SUBSTRING(@String, @Start, @Len)
END
RETURN @Return
END
GO

Related

How to create a function to split date and time from a string in SQL?

How can I remove the value before the first '_' and show the date and time in one row in a T-SQL function?
Below is a sample:
Declare @inputstring as varchar(50) = 'Studio9_20230126_203052' ;
select value from STRING_SPLIT( @inputstring ,'_')
Output Required: 2023-01-26 20:30:52.000
If we can safely assume that the value is always in the format {Some String}_{yyyyMMdd}_{hhmmss} then you can use STUFF a few times, firstly to remove the leading string up to the first underscore (_) character (using CHARINDEX to find that character), and then to inject 2 colon (:) characters. Finally you can REPLACE the remaining underscore with a space ( ), and then use TRY_CONVERT to attempt to convert the value to a datetime2(0).
DECLARE @inputstring varchar(50) = 'Studio9_20230126_203052';
SELECT TRY_CONVERT(datetime2(0),REPLACE(STUFF(STUFF(STUFF(@inputstring,1,CHARINDEX('_',@inputstring),''),14,0,':'),12,0,':'),'_',' '));
Note that this doesn't give the value you state you want in your question (2023-01-26 20:05:52.000) , but I assume this is a typographical error, and that the 05 for minutes should be 30.
Creating function
CREATE FUNCTION [dbo].[convert_to_date] (@inputstring NVARCHAR(MAX))
RETURNS DATETIME AS
BEGIN
DECLARE @finalString varchar(50), @out varchar(100)
SET @finalString = REPLACE ( (SUBSTRING (@inputstring, CHARINDEX('_', @inputstring)+1 , LEN(@inputstring))), '_', ' ')
--SELECT @finalString
SET @out = LEFT (@finalString, 4) + '-'
+ SUBSTRING(@finalString, 5, 2) + '-'
+ SUBSTRING(@finalString, 7, 2) + ' '
+ SUBSTRING(@finalString, 10, 2) + ':'
+ SUBSTRING(@finalString, 12, 2) + ':'
+ SUBSTRING(@finalString, 14, 2) + '.000'
RETURN @out
END
Select Query
SELECT dbo.[convert_to_date] ('Studio54541659_20230126_203052')
Output
2023-01-26 20:30:52.000
This will tolerate "somestring" in the format of "somestring_YYYYMMDD_HHMISS" being variable in length.
Declare @inputstring as varchar(50) = 'Studio9_20230126_203052' ;
SELECT DateAndTime = CONVERT(DATETIME,STUFF(STUFF(STUFF(v2.DT,14,0,':'),12,0,':'),9,1,' '))
,Identifier = LEFT(@inputstring,v1.Pos1-1) --Included this because I know how people are :D --Comment out if not wanted.
,Original = @inputstring --Original string just for checking. Comment out when happy.
FROM (VALUES(CHARINDEX('_',@inputstring)))v1(Pos1) --Position of first Underscore
CROSS APPLY (VALUES(SUBSTRING(@inputstring,v1.Pos1+1,50)))v2(DT) --String after first Underscore
;
Output looks like this and you end up with a DATETIME datatype. Comment out what you don't want for columns in the return.
I'll let you have some of the fun by converting it into an iTVF (inline Table Valued Function). Remember that any function that contains a "BEGIN" is ultimately going to be a part of a performance issue so make sure it's an iTVF :D
EDIT: Crud... I've gotta remember to scroll down. @Lamu already posted the same thing, and it's probably better and faster if you just want the time and not the identifier I included.
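For readers who want a starting point for the iTVF mentioned above, here is one possible sketch. The function name is hypothetical, and it assumes the input always has the form somestring_YYYYMMDD_HHMISS; a value without an underscore would raise an error in LEFT.

```sql
-- Hypothetical name; an inline table-valued function has a single RETURN SELECT and no BEGIN/END
CREATE FUNCTION dbo.ParseIdentifierDateTime (@inputstring varchar(50))
RETURNS TABLE
AS
RETURN
    SELECT DateAndTime = CONVERT(DATETIME, STUFF(STUFF(STUFF(v2.DT,14,0,':'),12,0,':'),9,1,' '))
          ,Identifier  = LEFT(@inputstring, v1.Pos1 - 1)
    FROM (VALUES (CHARINDEX('_', @inputstring))) v1(Pos1)                     -- Position of first underscore
    CROSS APPLY (VALUES (SUBSTRING(@inputstring, v1.Pos1 + 1, 50))) v2(DT);  -- String after first underscore
GO

SELECT f.DateAndTime, f.Identifier
FROM dbo.ParseIdentifierDateTime('Studio9_20230126_203052') AS f;
-- DateAndTime: 2023-01-26 20:30:52.000, Identifier: Studio9
```

Against a table, the function would typically be applied per row with CROSS APPLY, passing the column in place of the literal.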

Function to replace all non alpha-numeric and multiple whitespace characters with a single space

I am trying to write an efficient function to use in a calculated field which has the following characteristics
Replace all non alpha numeric characters with space
Replace multiple white spaces with a space
Trim and lower the results
Example input
A B##%$$C &^%D
Example output
a b c d
A normal regex pattern would match like so
[\W_]+
The following works; however, I am not sure whether there is a more efficient approach than using 2 loops (O(n²) complexity at least) with PATINDEX and STUFF, then CHARINDEX and REPLACE.
Create Function [dbo].[Clean](@Temp nvarchar(1000))
Returns nvarchar(1000)
AS
Begin
Declare @Pattern as varchar(50) = '%[^a-z0-9 ]%'
-- Replace every character that is not a letter, digit or space with a space
While PatIndex(@Pattern, @Temp) > 0
Set @Temp = Stuff(@Temp, PatIndex(@Pattern, @Temp), 1, ' ')
-- Collapse runs of spaces: find two consecutive spaces and replace them with one
while charindex('  ',@Temp ) > 0
set @Temp = replace(@Temp, '  ', ' ')
Return LOWER(TRIM(@Temp))
End
Usage
Select dbo.Clean(' A B##%$$C &^%D ')
Result
a b c d
Is there potentially a single pass approach I can use, or a sneaky method I am not aware of?
I'm not able to test the performance, but the following approach (without loops and based on some string manipulations) is an additional option.
Note, that you'll need at least SQL Server 2017 (for the TRANSLATE() call).
-- Input text and patterns
DECLARE @text varchar(1000) = ' A B##%$$C &^%D'
DECLARE @alphanumericpattern varchar(36) = 'abcdefghijklmnopqrstuvwxyz0123456789'
DECLARE @notalphanumericpattern varchar(1000)
-- Trim and lower the input text
SELECT @text = RTRIM(LTRIM(LOWER(@text)))
-- Get not alpha-numeric characters
SELECT @notalphanumericpattern =
REPLACE(
TRANSLATE(@text, @alphanumericpattern, REPLICATE('a', LEN(@alphanumericpattern))),
'a',
''
)
-- Replace all not alpha-numeric characters with a space
SELECT @text =
REPLACE(
TRANSLATE(@text, @notalphanumericpattern, REPLICATE('$', LEN(@notalphanumericpattern))),
'$',
' '
)
-- Replace multiple spaces with a single space
SELECT @text =
REPLACE(
REPLACE(
REPLACE(
@text,
' ',
'<>'
),
'><',
''
),
'<>',
' '
)
Result:
a b c d
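The nested REPLACE at the end of the script above is easier to follow when traced step by step. A minimal sketch with a made-up input, assuming the text contains no < or > characters of its own (which holds in the script above, because every non-alphanumeric character has already been turned into a space by that point):

```sql
DECLARE @text varchar(100) = 'a   b  c d';  -- runs of 3, 2 and 1 spaces

SELECT
    REPLACE(@text, ' ', '<>') AS step1,                                         -- a<><><>b<><>c<>d
    REPLACE(REPLACE(@text, ' ', '<>'), '><', '') AS step2,                      -- a<>b<>c<>d
    REPLACE(REPLACE(REPLACE(@text, ' ', '<>'), '><', ''), '<>', ' ') AS step3;  -- a b c d
```

Each space becomes a <> pair, the >< seams between neighbouring pairs are deleted so every run shrinks to a single pair, and the remaining pairs are turned back into single spaces.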

SSMS replace all commas outside of quotation marks in string

I've written the following function in SSMS to replace any commas that are outside of quotation marks with ||||:
CREATE FUNCTION dbo.fixqualifier (@string nvarchar(max))
returns nvarchar(max)
as begin
DECLARE @STRINGTOPAD NVARCHAR(MAX)
DECLARE @position int = 1,@newstring nvarchar(max) ='',@QUOTATIONMODE INT = 0
WHILE(LEN(@string)>0)
BEGIN
SET @STRINGTOPAD = SUBSTRING(@string,0,IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)))
SET @newstring = @newstring + IIF(@QUOTATIONMODE = 1, REPLACE(@STRINGTOPAD,',','||||'),@STRINGTOPAD)
SET @QUOTATIONMODE = IIF(@QUOTATIONMODE = 1,0,1)
set @string = SUBSTRING(@string,1+IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)),LEN(@string))
END
return @newstring
end
The idea is for the function to find the first ", replace all ',' before that then switch to quotation mode 1 so it knows to not replace the , until it changes back to quotation mode 0 when it hits the 2nd " and so on.
so for example the string:
qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl
would become:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
It works as expected, but it's really inefficient when it comes to doing this for several thousand rows.
Is there a better way of doing this, or at least of speeding the function up?
A simple trick using modulus:
DECLARE @VAR VARCHAR(100) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl'
,@OUTPUT VARCHAR(100) = '';
SELECT @OUTPUT = @OUTPUT + CASE WHEN (LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', ''))) % 2 = 0
THEN REPLACE(VAL, ',', '||||') ELSE VAL END
FROM (
SELECT SUBSTRING(@VAR, NUMBER, 1) VAL
FROM master.dbo.spt_values
WHERE type = 'P'
AND NUMBER BETWEEN 1 AND LEN(@VAR)
) A
PRINT @OUTPUT
Result:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
The expression LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', '')) gives the number of " characters seen so far. Take that count modulo 2: if the result is zero the count is even, which means the current character is outside quotes and a comma can be replaced; otherwise the character is inside quotes and is kept.
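To see the parity check in isolation, here is a small sketch on a hypothetical prefix of the sample string, cut off inside the first quoted field (the variable name is my own):

```sql
-- Hypothetical prefix of the sample string, ending inside the first quoted field
DECLARE @prefix varchar(50) = 'qwer,tyu,io,asd,"edffs';

SELECT
    LEN(@prefix) - LEN(REPLACE(@prefix, '"', '')) AS quote_count,          -- 1
    (LEN(@prefix) - LEN(REPLACE(@prefix, '"', ''))) % 2 AS inside_quotes;  -- 1 = odd, so the next comma is kept
```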
This uses DelimitedSplit8k and completely avoids any RBAR methods (such as a WHILE or @Variable = @Variable +... (which is a hidden form of RBAR)).
It firstly splits on the quotation, and then on the commas, where the string isn't quoted. Finally it then puts the strings back together again, using the "old" STUFF and FOR XML PATH method:
USE Sandbox;
DECLARE @String varchar(8000) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl';
WITH Splits AS(
SELECT QS.ItemNumber AS QuoteNumber, CS.ItemNumber AS CommaNumber, ISNULL(CS.Item, '"' + QS.Item + '"') AS DelimitedItem
FROM dbo.DelimitedSplit8K(@string,'"') QS
OUTER APPLY (SELECT *
FROM dbo.DelimitedSplit8K(QS.Item,',')
WHERE QS.ItemNumber % 2 = 1) CS
WHERE QS.Item <> ',')
SELECT STUFF((SELECT '||||' + S.DelimitedItem
FROM Splits S
ORDER BY S.QuoteNumber, S.CommaNumber
FOR XML PATH('')),1,4,'') AS DelimitedList; -- remove all 4 characters of the leading '||||'
(Note, DelimitedSplit8K does not accept more than 8,000 characters. If you have more than that, SQL Server is really not the right tool. STRING_SPLIT does not provide the ordinal position, so you would be unable to guarantee the rebuild order with it.)

Using SQL Server to replace line breaks in columns with spaces

I have a large table of data where some of my columns contain line breaks. I would like to remove them and replace them with some spaces instead.
Can anybody tell me how to do this in SQL Server?
Thanks in advance
SELECT REPLACE(REPLACE(@str, CHAR(13), ''), CHAR(10), '')
This should work, depending on how the line breaks are encoded:
update t
set col = replace(col, '
', ' ')
where col like '%
%';
That is, in SQL Server, a string can contain a new line character.
@Gordon's answer should work, but in case you're not sure how your line breaks are encoded, you can use the ASCII function to return the character code of each character. For example:
declare @entry varchar(50) =
'Before break
after break'
declare @max int = len(@entry)
; with CTE as (
select 1 as id
, substring(@entry, 1, 1) as chrctr
, ascii(substring(@entry, 1, 1)) as code
union all
select id + 1
, substring(@entry, ID + 1, 1)
, ascii(substring(@entry, ID + 1, 1))
from CTE
where ID < @max) -- stop at the last character (<= would add one empty row past the end)
select chrctr, code from cte
print replace(replace(@entry, char(13) , ' '), char(10) , ' ')
Depending where your text is coming from, there are different encodings for a line break. In my test string I put the most common.
First I replace all CHAR(10) (line feed) with CHAR(13) (carriage return), then all doubled CRs with a single CR, and finally all CRs with the wanted replacement (you want a blank; I used a dot for better visibility):
Attention: Switch the output to "text", otherwise you won't see any line breaks...
DECLARE @text VARCHAR(100)='test single 10' + CHAR(10) + 'test 13 and 10' + CHAR(13) + CHAR(10) + 'test single 13' + CHAR(13) + 'end of test';
SELECT @text
DECLARE @ReplChar CHAR='.';
SELECT REPLACE(REPLACE(REPLACE(@text,CHAR(10),CHAR(13)),CHAR(13)+CHAR(13),CHAR(13)),CHAR(13),@ReplChar);
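The three nested REPLACE calls above can be checked on a shorter, made-up value that contains each kind of line break once. This sketch uses a space as the replacement, as the question asked:

```sql
-- Hypothetical sample containing LF, CR+LF and bare CR line breaks
DECLARE @text varchar(100) = 'a' + CHAR(10) + 'b' + CHAR(13) + CHAR(10) + 'c' + CHAR(13) + 'd';

-- Step 1: LF -> CR, step 2: CR CR -> CR, step 3: CR -> space
SELECT REPLACE(REPLACE(REPLACE(@text, CHAR(10), CHAR(13)), CHAR(13) + CHAR(13), CHAR(13)), CHAR(13), ' ') AS cleaned;
-- cleaned: a b c d
```

One thing to keep in mind: the middle REPLACE makes a single pass, so a run of several consecutive line breaks (for example two CR+LF pairs around a blank line) can still leave more than one space behind.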
I have the same issue: a column whose values contain line breaks. I use the query
update `your_table_name` set your_column_name = REPLACE(your_column_name,'\n','')
And this resolves my issue :)
Here '\n' is the line-break character produced by the Enter key. This is MySQL syntax (backtick-quoted names and backslash escapes); SQL Server treats '\n' as a literal backslash followed by n, so there you would use CHAR(10) instead. In this query I replace it with an empty string (which is what I wanted).

Want to know first character CharIndex

In the example below I want to know the CHARINDEX of the first character that is not a space, tab, newline, etc.
I am not able to do that because the CHARINDEX() function needs to be told which character to look for,
but in my string the first character varies.
Declare @str varchar(100)
set @str=' test String'
In the above case I want the CHARINDEX of 't' (the first character of the string).
set @str=' String test'
In the above case I want the CHARINDEX of 'S' (the first character of the string).
Anyone please suggest me the solution.
The best way would be to come up with some kind of a regex. You can also have carriage return + line feed (linebreak) and tab -characters, which won't show correctly unless you do something like this:
DECLARE @str VARCHAR(100)
SET @str=CHAR(9)+' '+CHAR(13)+CHAR(10)+'test String'
SELECT CHARINDEX(LTRIM(REPLACE(REPLACE(REPLACE(@str,CHAR(13),' '),CHAR(10),' '),CHAR(9),' ')), @str);
SELECT SUBSTRING(@str, CHARINDEX(LTRIM(REPLACE(REPLACE(REPLACE(@str,CHAR(13),' '),CHAR(10),' '),CHAR(9),' ')), @str), 1)
The characters are as follows:
CHAR(13) = carriage return
CHAR(10) = linefeed
CHAR(13) + CHAR(10) = standard newline characters
CHAR(9) = TAB
Use Collation for Case sensitive.
Select CHARINDEX ( 'S',@str COLLATE Latin1_General_CS_AS, 1 )
You could use PATINDEX to find the position of a character that is not one of a specific subset of excluded characters:
PATINDEX('%[^list of excluded characters]%', @str)
In your case, the excluded character list would consist of CHAR(32) (space), CHAR(9) (tab), CHAR(13) (carriage return), CHAR(10) (linefeed) and whatever else you mean by the etc.. Here is an example:
DECLARE @str varchar(100);
SET @str = '
test string';
SELECT PATINDEX('%[^' + CHAR(32) + CHAR(9) + CHAR(13) + CHAR(10) + ']%', @str);
The @str in the above example begins with a newline (CHAR(13) + CHAR(10)) followed by two spaces. Therefore, the output of the SELECT statement would be this:
----------
4
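Because a line break typed into a string literal may be stored as CR+LF or as a bare LF depending on the editor, a variant that builds the whitespace explicitly with CHAR() is easier to reason about. The sketch below reuses the tab/space/CR/LF test string from the earlier answer; the @pos variable is my own addition.

```sql
-- Tab, space, CR, LF, then the text: four whitespace characters in front
DECLARE @str varchar(100) = CHAR(9) + ' ' + CHAR(13) + CHAR(10) + 'test String';
DECLARE @pos int = PATINDEX('%[^' + CHAR(32) + CHAR(9) + CHAR(13) + CHAR(10) + ']%', @str);

SELECT @pos AS first_char_position,             -- 5
       SUBSTRING(@str, @pos, 1) AS first_char;  -- t
```

If the string consists only of excluded characters, PATINDEX returns 0, so it is worth checking for that before passing the position to SUBSTRING.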