select and concatenate everything before and after a certain character - sql

I've got a string like AAAA.BBB.CCCC.DDDD.01.A, and I'm looking to manipulate it so I end up with AAAA-BBB.
I've achieved this with this debatable piece of code:
declare @string varchar(100) = 'AAAA.BBB.CCCC.DDDD.01.A'
select replace(substring(@string,0,charindex('.',@string)) + substring(@string,charindex('.',@string,CHARINDEX('.',@string)),charindex('.',@string,CHARINDEX('.',@string)+1)-charindex('.',@string)),'.','-')
Is there any other way to achieve this that is more elegant and readable?
I was looking at some STRING_SPLIT operations, but can't wrap my head around it.

If you are open to some JSON transformations (JSON_VALUE() needs SQL Server 2016 or later), the following approach is an option. You need to transform the text into a valid JSON array (AAAA.BBB.CCCC.DDDD.01.A is transformed into ["AAAA","BBB","CCCC","DDDD","01","A"]) and get the required items from this array using JSON_VALUE():
Statement:
DECLARE @string varchar(100) = 'AAAA.BBB.CCCC.DDDD.01.A'
SET @string = CONCAT('["', REPLACE(@string, '.', '","'), '"]')
SELECT CONCAT(JSON_VALUE(@string, '$[0]'), '-', JSON_VALUE(@string, '$[1]'))
Result:
AAAA-BBB
Notes: With this approach you can easily access all parts from the input string by index (0-based).
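Outside the database, the same split-then-index idea is a one-liner. Here is a minimal Python sketch for comparison (the helper name `first_two_parts` is hypothetical and not part of any answer here):

```python
def first_two_parts(value: str, sep: str = ".", joiner: str = "-") -> str:
    """Split on sep and join the first two pieces, like JSON_VALUE '$[0]' and '$[1]'."""
    parts = value.split(sep)  # 'AAAA.BBB.CCCC' -> ['AAAA', 'BBB', 'CCCC']
    return joiner.join(parts[:2])  # pieces are addressed by 0-based index


print(first_two_parts("AAAA.BBB.CCCC.DDDD.01.A"))  # AAAA-BBB
```

Because `parts[:2]` is a slice, a string with fewer than two parts just yields whatever is there instead of raising an error.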

I think this is a little cleaner:
declare @string varchar(100) = 'AAAA.BBB.CCCC.DDDD.01.A'
select
replace(                                    -- (A) replace '.' with '-'
    substring(@string, 1                    -- in the substring of @string starting at 1
        , charindex('.', @string            -- (B) and ending 1 before the second '.',
            , charindex('.', @string) + 1)  --     i.e. the first '.' found after the first '.'
        - 1)                                -- (B)
    , '.', '-')                             -- (A)
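The commented expression is easier to follow once CHARINDEX's convention (1-based positions, 0 when not found) is spelled out. A Python sketch with a hypothetical `charindex` helper that imitates that convention, and a hypothetical `up_to_second_dot` that mirrors the query above:

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """T-SQL-style CHARINDEX: 1-based position at or after start, 0 when absent."""
    return haystack.find(needle, max(start, 1) - 1) + 1  # find() returns -1 if absent


def up_to_second_dot(s: str) -> str:
    """Mirror of replace(substring(s, 1, charindex('.', s, charindex('.', s) + 1) - 1), '.', '-')."""
    second = charindex(".", s, charindex(".", s) + 1)
    if second == 0:
        # T-SQL would fail here with a negative SUBSTRING length
        raise ValueError("expected at least two '.' characters")
    return s[: second - 1].replace(".", "-")  # SUBSTRING(s, 1, n) is s[:n]


print(up_to_second_dot("AAAA.BBB.CCCC.DDDD.01.A"))  # AAAA-BBB
```

For the sample string the first '.' is at position 5 and the second at position 9, so the substring is the first 8 characters, AAAA.BBB.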

Depending on what is in your string, you might be able to abuse PARSENAME into doing it. It is intended for breaking up names like adventureworks.dbo.mytable.mycolumn: it numbers the parts from the right and only handles up to four of them (with more, it returns NULL). It works like this:
DECLARE @x as VARCHAR(100) = 'aaaa.bbb.cccc.ddddd'
SELECT CONCAT( PARSENAME(@x,4), '-', PARSENAME(@x,3) )
You could also use STUFF to replace the first '.' with '-' and then take the LEFT of the result up to the next '.', but that is unlikely to be neater than this or Kevin's proposal.
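PARSENAME's rules (parts counted from the right, at most four of them) are the whole trick, so here is a simplified Python model of them. `parsename` is a hypothetical stand-in that ignores the [bracket] and "quote" handling of the real function:

```python
from typing import Optional


def parsename(name: str, part: int) -> Optional[str]:
    """Simplified model of T-SQL PARSENAME (no bracket or quote handling)."""
    pieces = name.split(".")
    if len(pieces) > 4 or not 1 <= part <= len(pieces):
        return None  # more than four pieces, or a piece that doesn't exist -> NULL
    return pieces[-part]  # part 1 is the RIGHTMOST piece


x = "aaaa.bbb.cccc.ddddd"
print(parsename(x, 4) + "-" + parsename(x, 3))  # aaaa-bbb
print(parsename("AAAA.BBB.CCCC.DDDD.01.A", 4))  # None: six pieces is too many
```

This is why the PARSENAME route cannot be applied directly to the six-part string in the question.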
Using STRING_SPLIT would likely be just as unwieldy:
SELECT CONCAT(MAX(CASE WHEN rn = 1 THEN v END), '-', MAX(CASE WHEN rn = 2 THEN v END))
FROM (
SELECT row_number () over (order by (select 0)) rn, value as v
FROM string_split(@x,'.')
) y WHERE rn IN (1,2)
This is because the string is split into rows, which then have to be numbered so you can filter and pick out the parts you want. It also relies on STRING_SPLIT returning the pieces in the order they appear in the original string, which Microsoft does not guarantee. (SQL Server 2022 adds an optional enable_ordinal argument that returns an ordinal column for exactly this purpose.)

Related

SQL Server 2012 string functions

I have a variable-length field in the format CxxRyyy, where x and y are digits, and I want to extract xx and yyy. For instance, if the field value is C1R12, I want to get 1 and 12. If I use SUBSTRING and CHARINDEX, I have to supply a length, but I would like to use a position, like
SUBSTRING(WPLocationNew, CHARINDEX('C',WPLocationNew,1)+1, CHARINDEX('R',WPLocationNew,1)-1)
or
SUBSTRING(WPLocationNew, CHARINDEX('C',WPLocationNew,1)+1, LEN(WPLocationNew) - CHARINDEX('R',WPLocationNew,1))
to get x, but I know that doesn't work. I feel like there is a fairly simple solution, but I am not coming up with it yet. Any suggestions?
If these are cell references and will always be in the form C{1-5 digits}R{1-5 digits}, you can do this:
DECLARE @t TABLE(Original varchar(32));
INSERT @t(Original) VALUES ('C14R4535'),('C1R12'),('C57R123');
;WITH src AS
(
SELECT Original, c = REPLACE(REPLACE(Original,'C',''),'R','.')
FROM @t
)
SELECT Original, C = PARSENAME(c,2), R = PARSENAME(c,1)
FROM src;
Output:
Original | C | R
C14R4535 | 14 | 4535
C1R12 | 1 | 12
C57R123 | 57 | 123
If you need to protect against other formats, you can add a filter to the FROM clause in the CTE:
FROM @t WHERE Original LIKE 'C%[0-9]%R%[0-9]%'
AND PATINDEX('%[^CR0-9]%', Original) = 0
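For comparison, here is the same validate-then-split logic in Python. This is a sketch: `split_cell_ref` and the regular expression are illustrative assumptions, and the regex is stricter than the LIKE/PATINDEX pair (for example, it rejects C1C2R3, which the SQL filter lets through).

```python
import re
from typing import Optional, Tuple

CELL_REF = re.compile(r"C([0-9]+)R([0-9]+)")  # stricter than the LIKE/PATINDEX filter


def split_cell_ref(ref: str) -> Optional[Tuple[str, str]]:
    """Return (column, row) digit strings, or None when ref isn't C<digits>R<digits>."""
    if CELL_REF.fullmatch(ref) is None:
        return None
    dotted = ref.replace("C", "").replace("R", ".")  # 'C14R4535' -> '14.4535'
    column, row = dotted.split(".")  # same role as PARSENAME(c, 2) and PARSENAME(c, 1)
    return column, row


for original in ("C14R4535", "C1R12", "C57R123", "X1Y2"):
    print(original, split_cell_ref(original))
```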
It appears that you are attempting to parse an Excel cell reference. Those are predictably structured; otherwise I wouldn't suggest such an embarrassing hack as this.
Basically, take advantage of the fact that TRY_CAST ignores leading and trailing spaces when converting strings to numbers.
declare @val as varchar(20) = 'C1R12'
declare @newval as varchar(20)
declare @c as smallint
declare @r as smallint
--replace the C with 5 spaces
set @newval = replace(@val,'C','     ')
--replace the R with 5 spaces
set @newval = replace(@newval,'R','     ')
--take a look at the intermediate result, which is '     1     12'
select @newval
--the first 11 characters hold only the column digits plus spaces
set @c = try_cast(left(@newval,11) as smallint)
--the last 6 characters hold only the row digits plus spaces
set @r = try_cast(right(@newval,6) as smallint)
--take a look at the results... two smallints, 1 and 12
select @c, @r
That can all be accomplished in one line for each element (a line for column and a line for row) but I wanted you to be able to understand what was happening so this example goes through the steps individually.
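The padding trick translates almost line for line into Python, where `int()` accepts surrounding whitespace just as TRY_CAST does here. A sketch (the helper `cell_ref_by_padding` is hypothetical). Unlike TRY_CAST, `int()` raises ValueError instead of returning NULL, and like the SQL it assumes at most five digits per part:

```python
def cell_ref_by_padding(ref: str) -> tuple:
    """Replace C and R with five spaces each, then cut fixed-width slices."""
    padded = ref.replace("C", " " * 5).replace("R", " " * 5)  # 'C1R12' -> '     1     12'
    column = int(padded[:11])  # LEFT(@newval, 11); int() ignores surrounding spaces
    row = int(padded[-6:])  # RIGHT(@newval, 6)
    return column, row


print(cell_ref_by_padding("C1R12"))  # (1, 12)
```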
Here's yet another way:
declare @val as varchar(20) = 'C12R345'
declare @c as varchar(5)
declare @r as varchar(5)
set @c = SUBSTRING(@val, patindex('C%', @val)+1,(patindex('%R%', @val)-1)-patindex('C%', @val) )
set @r = SUBSTRING(@val, patindex('%R%', @val)+1, LEN(@val) -patindex('%R%', @val))
select cast(@c as int) as 'C', cast(@r as int) as 'R'
There are lots of different ways to approach string parsing. Here's just one possible idea:
declare @s varchar(10) = 'C01R002';
select
rtrim( left(replace(stuff(@s, 1, 1, ''), 'R', '          '), 10)) as c,
ltrim(right(replace(substring(@s, 2, 10), 'R', '          '), 10)) as r
Strip out the 'C' and then replace the 'R' with enough spaces so that the left and right sides can be extracted using a fixed length and then easily trimmed back.
stuff() and substring() as used above are just different ways to accomplish exactly the same thing. One advantage here is that it uses fairly portable string functions, and it's conceivable that this is somewhat faster. It is also done inline, without multiple steps.

Edit string column in SQL - remove sections between separators

I have a string column in my table that contains 'Character-separated' data such as this:
"Value|Data|4|Z|11/06/2012"
This data is fed into a 'parser' and deserialised into a particular object. (The details of this aren't relevant and can't be changed)
The structure of my object has changed, and now I would like to get rid of some of the 'sections' of data.
So I want the previous value to turn into this:
"Value|Data|11/06/2012"
I was hoping I might be able to get some help on how I would go about doing this in T-SQL.
The data always has the same number of sections, 'n', and I will want to remove the same sections for all rows, 'n-x' and 'n-y'.
So far I know I need an update statement to update my column value.
I've found various ways of splitting a string but I'm struggling to apply it to my scenario.
In C# I would do:
using System.Collections.Generic;

string RemoveSections(string value)
{
    string[] bits = value.Split('|');
    List<string> wantedBits = new List<string>();
    for (var i = 0; i < bits.Length; i++)
    {
        if (i == 2 || i == 3) // positions of the sections I no longer want
        {
            continue;
        }
        wantedBits.Add(bits[i]);
    }
    return string.Join("|", wantedBits);
}
But I'm not sure where to start doing this in SQL. Any help here would be appreciated.
Thanks
PS: I need to run this SQL on SQL Server 2012.
Edit: It looks like parsing to XML in some manner could be a popular answer here; however, I can't guarantee my string won't contain characters such as '<' or '&'.
Using NGrams8K you can easily write a nasty fast customized splitter. The logic here is based on DelimitedSplit8K. This will likely outperform even the C# code you posted.
DECLARE @string VARCHAR(8000) = '"Value|Data|4|Z|11/06/2012"',
@delim CHAR(1) = '|';
SELECT newString =
(
SELECT SUBSTRING(
@string, split.pos+1,
ISNULL(NULLIF(CHARINDEX(@delim,@string,split.pos+1),0),8000)-split.pos)
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY d.Pos), d.Pos
FROM
(
SELECT 0 UNION ALL
SELECT ng.position
FROM samd.ngrams8k(@string,1) AS ng
WHERE ng.token = @delim
) AS d(Pos)
) AS split(ItemNumber,Pos)
WHERE split.ItemNumber IN (1,2,5)
ORDER BY split.ItemNumber
FOR XML PATH(''), TYPE -- TYPE + .value() keeps '&' and '<' from coming back as &amp; and &lt;
).value('.', 'varchar(8000)');
Returns:
newString
----------------------------
"Value|Data|11/06/2012"
Not the most elegant way, but it works:
DECLARE @str varchar(100) = 'Value|Data|4|Z|11/06/2012';
SELECT SUBSTRING(@str,1, CHARINDEX('|',@str,CHARINDEX('|',@str,1)+1)-1)
+ SUBSTRING(@str, CHARINDEX('|',@str,CHARINDEX('|',@str,CHARINDEX('|',@str,CHARINDEX('|',@str,1)+1)+1)+1), LEN(@str))
----------------------
Value|Data|11/06/2012
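The nested CHARINDEX calls are just "find the n-th delimiter" written out by hand. A Python sketch (the helpers `nth_index` and `drop_sections_3_and_4` are hypothetical) makes the positions explicit: in the sample, the 2nd and 4th pipes sit at 1-based positions 11 and 15, and the second SUBSTRING starts on the 4th pipe itself, which is why the '|' between Data and the date survives.

```python
def nth_index(s: str, delim: str, n: int) -> int:
    """1-based position of the n-th delim (0 if there isn't one), like nesting CHARINDEX n times."""
    pos = 0
    for _ in range(n):
        pos = s.find(delim, pos) + 1  # CHARINDEX(delim, s, pos + 1)
        if pos == 0:
            return 0
    return pos


def drop_sections_3_and_4(s: str, delim: str = "|") -> str:
    second = nth_index(s, delim, 2)
    fourth = nth_index(s, delim, 4)
    if second == 0 or fourth == 0:
        raise ValueError("expected at least four delimiters")
    # SUBSTRING(s, 1, second - 1) + SUBSTRING(s, fourth, LEN(s))
    return s[: second - 1] + s[fourth - 1 :]


print(drop_sections_3_and_4("Value|Data|4|Z|11/06/2012"))  # Value|Data|11/06/2012
```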
You might try some XQuery:
DECLARE @s VARCHAR(100)='Value|Data|4|Z|11/06/2012';
SELECT CAST('<x>' + REPLACE(@s,'|','</x><x>') + '</x>' AS XML)
.value('concat(/x[1],"|",/x[2],"|",/x[5])','nvarchar(max)');
In short: the value is transformed into XML by some string replacements. Then we use the XQuery concat() function to join the first, second and fifth elements back together.
This version is a bit less efficient but safe with forbidden characters:
SELECT CAST('<x>' + REPLACE((SELECT @s AS [*] FOR XML PATH('')),'|','</x><x>') + '</x>' AS XML)
.value('concat(/x[1],"|",/x[2],"|",/x[5])','nvarchar(max)')
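Why the escaping step matters can be shown in Python with the standard library. This is a sketch: `pick_parts_via_xml` is hypothetical, and `xml.sax.saxutils.escape` plays the role of the FOR XML PATH('') step. It also assumes the delimiter is not one of the characters `escape()` rewrites (&, <, >).

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def pick_parts_via_xml(s: str, wanted: list, delim: str = "|") -> str:
    """Wrap each delimited piece in <x>...</x> and pick pieces by 1-based position."""
    # escape() turns &, < and > into entities first; without it a value
    # such as 'a<b' would make the XML unparseable.
    body = "<x>" + escape(s).replace(delim, "</x><x>") + "</x>"
    root = ET.fromstring("<r>" + body + "</r>")  # ElementTree needs a single root element
    items = [x.text or "" for x in root.findall("x")]
    return delim.join(items[i - 1] for i in wanted)  # /x[1] is items[0]


print(pick_parts_via_xml("Value|Data|4|Z|11/06/2012", [1, 2, 5]))  # Value|Data|11/06/2012
print(pick_parts_via_xml("a<b|c&d|e", [1, 3]))  # a<b|e
```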
Just to add a non-xml option for fun:
Edit and Caveat - In case anyone tries this for a different solution and doesn't read the comments...
HABO rightly noted that this is easily broken if any of the columns have a period (".") in them. PARSENAME is dependent on a 4 part naming structure and will return NULL if that is exceeded. This solution will also break if any values ever contain another pipe ("|") or another delimited column is added - the substring in my answer is specifically there as a workaround for the dependency on the 4 part naming. If you are trying to use this solution on, say, a variable with 7 delimited columns, it would need to be reworked or scrapped in favor of one of the other answers here.
DECLARE
@a VARCHAR(100)= 'Value|Data|4|Z|11/06/2012'
SELECT
PARSENAME(REPLACE(SUBSTRING(@a,0,LEN(@a)-CHARINDEX('|',REVERSE(@a))+1),'|','.'),4)+'|'+
PARSENAME(REPLACE(SUBSTRING(@a,0,LEN(@a)-CHARINDEX('|',REVERSE(@a))+1),'|','.'),3)+'|'+
SUBSTRING(@a,LEN(@a)-CHARINDEX('|',REVERSE(@a))+2,LEN(@a))
Here is a quick way to do it.
CREATE FUNCTION [dbo].StringSplitXML
(
@String VARCHAR(MAX), @Separator CHAR(1)
)
RETURNS @RESULT TABLE(id int identity(1,1),Value VARCHAR(MAX))
AS
BEGIN
DECLARE @XML XML
SET @XML = CAST(
('<i>' + REPLACE(@String, @Separator, '</i><i>') + '</i>')
AS XML)
INSERT INTO @RESULT
SELECT t.i.value('.', 'VARCHAR(MAX)')
FROM @XML.nodes('i') AS t(i)
WHERE t.i.value('.', 'VARCHAR(MAX)') <> ''
RETURN
END
GO
SELECT * FROM dbo.StringSplitXML( 'Value|Data|4|Z|11/06/2012','|')
WHERE id not in (3,4)
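Note that using a multi-statement UDF will slow things down, so this solution should be considered only if you have a reasonably small data set to work with. Also note that the query returns the remaining sections as rows, which still have to be concatenated back into a single string.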

SSMS replace all commas outside of quotation marks in string

I've written the following function in SSMS to replace any commas that are outside of quotation marks with ||||:
CREATE FUNCTION dbo.fixqualifier (@string nvarchar(max))
returns nvarchar(max)
as begin
DECLARE @STRINGTOPAD NVARCHAR(MAX)
DECLARE @position int = 1,@newstring nvarchar(max) ='',@QUOTATIONMODE INT = 0
WHILE(LEN(@string)>0)
BEGIN
SET @STRINGTOPAD = SUBSTRING(@string,0,IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)))
SET @newstring = @newstring + IIF(@QUOTATIONMODE = 1, REPLACE(@STRINGTOPAD,',','||||'),@STRINGTOPAD)
SET @QUOTATIONMODE = IIF(@QUOTATIONMODE = 1,0,1)
set @string = SUBSTRING(@string,1+IIF(@STRING LIKE '%"%',CHARINDEX('"',@string),LEN(@STRING)),LEN(@string))
END
return @newstring
end
The idea is for the function to find the first ", replace all commas before it, then switch to quotation mode 1 so it knows not to replace commas until it switches back to quotation mode 0 when it hits the second ", and so on.
So, for example, the string:
qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl
would become:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
It works as expected, but it's really inefficient when it comes to doing this for several thousand rows.
Is there a better way of doing this, or at least a way to speed the function up?
A simple trick using the modulus operator:
DECLARE @VAR VARCHAR(100) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl'
,@OUTPUT VARCHAR(100) = '';
SELECT @OUTPUT = @OUTPUT + CASE WHEN (LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', ''))) % 2 = 0
THEN REPLACE(VAL, ',', '||||') ELSE VAL END
FROM (
SELECT SUBSTRING(@VAR, NUMBER, 1) VAL
FROM master.dbo.spt_values
WHERE type = 'P'
AND NUMBER BETWEEN 1 AND LEN(@VAR)
) A
PRINT @OUTPUT
Result:
qwer||||tyu||||io||||asd||||"edffs,asdfgh"||||"jjkzx"||||kl
The expression LEN(@OUTPUT) - LEN(REPLACE(@OUTPUT, '"', '')) gives the number of " characters output so far. If that count modulo 2 is zero (an even number), the current character is outside quotes and commas can be replaced; otherwise they are kept.
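The parity rule in a single pass, sketched in Python (the helper `replace_unquoted_commas` is hypothetical). One design difference: this keeps a running quote count, whereas the T-SQL recounts the quotes in @OUTPUT for every character, so its work grows with the square of the string length.

```python
def replace_unquoted_commas(s: str, replacement: str = "||||") -> str:
    """Replace a comma only when an even number of quotes has been seen before it."""
    out = []
    quotes_seen = 0
    for ch in s:
        if ch == "," and quotes_seen % 2 == 0:  # even count: we are outside quotes
            out.append(replacement)
        else:
            out.append(ch)
        if ch == '"':
            quotes_seen += 1  # running count instead of recounting the output
    return "".join(out)


print(replace_unquoted_commas('qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl'))
```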
This uses DelimitedSplit8K and completely avoids any RBAR (row-by-agonizing-row) methods, such as a WHILE loop or @Variable = @Variable + ... (which is a hidden form of RBAR).
It first splits on the quotation marks, and then splits on the commas wherever the string isn't quoted. Finally, it puts the strings back together again, using the "old" STUFF and FOR XML PATH method:
USE Sandbox;
DECLARE @String varchar(8000) = 'qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl';
WITH Splits AS(
SELECT QS.ItemNumber AS QuoteNumber, CS.ItemNumber AS CommaNumber, ISNULL(CS.Item, '"' + QS.Item + '"') AS DelimitedItem
FROM dbo.DelimitedSplit8K(@String,'"') QS
OUTER APPLY (SELECT *
FROM dbo.DelimitedSplit8K(QS.Item,',')
WHERE QS.ItemNumber % 2 = 1) CS -- only the odd (unquoted) pieces are split on commas
WHERE QS.Item <> ','
AND (CS.Item IS NULL OR CS.Item <> '')) -- skip empty pieces left where a comma meets a quote
SELECT STUFF((SELECT '||||' + S.DelimitedItem
FROM Splits S
ORDER BY S.QuoteNumber, S.CommaNumber
FOR XML PATH(''), TYPE).value('.', 'varchar(8000)'),1,4,'') AS DelimitedList; -- strip the leading '||||'
(Note: DelimitedSplit8K does not accept more than 8,000 characters. If you have more than that, SQL Server is really not the right tool. Before SQL Server 2022, STRING_SPLIT does not provide the ordinal position, so you would be unable to guarantee the rebuild order with it.)
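The split-on-quotes, then-split-on-commas idea, sketched in Python (the helper `replace_unquoted_commas_by_split` is hypothetical). Odd-numbered chunks lie outside quotes and even-numbered chunks inside. Empty pieces appear wherever a comma touches a quote; this sketch drops them, which also drops genuinely empty fields such as the middle of a,,b (an assumption about the data).

```python
def replace_unquoted_commas_by_split(s: str, replacement: str = "||||") -> str:
    """Split on quotes, split only the unquoted chunks on commas, then rejoin."""
    items = []
    for number, chunk in enumerate(s.split('"'), start=1):
        if number % 2 == 0:
            items.append('"' + chunk + '"')  # even chunks were inside quotes: keep whole
        else:
            # odd chunks are unquoted; a lone ',' between two quoted values
            # splits into two empty pieces, which are dropped here too
            items.extend(piece for piece in chunk.split(",") if piece != "")
    return replacement.join(items)


print(replace_unquoted_commas_by_split('qwer,tyu,io,asd,"edffs,asdfgh","jjkzx",kl'))
```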

SQL Server - Select column that contains query string and split values into anothers 'columns'

I need to do a select in a column that contains a query string like:
user_id=300&company_id=201503&status=WAITING OPERATION&count=1
I want to perform a select and break each value in a new column, something like:
user_id | company_id | status | count
300 | 201503 | WAITING OPERATION | 1
How can I do it in SQL Server without using stored procedures?
I've tried a function:
CREATE FUNCTION [xpto].[SplitGriswold]
(
@List NVARCHAR(MAX),
@Delim1 NCHAR(1),
@Delim2 NCHAR(1)
)
RETURNS TABLE
AS
RETURN
(
SELECT
Val1 = PARSENAME(Value,2),
Val2 = PARSENAME(Value,1)
FROM
(
SELECT REPLACE(Value, @Delim2, '&') FROM
(
SELECT LTRIM(RTRIM(SUBSTRING(@List, [Number],
CHARINDEX(@Delim1, @List + @Delim1, [Number]) - [Number])))
FROM (SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects) AS x
WHERE Number <= LEN(@List)
AND SUBSTRING(@Delim1 + @List, [Number], LEN(@Delim1)) = @Delim1
) AS y(Value)
) AS z(Value)
);
GO
Execution:
select QueryString
from User.Log
CROSS APPLY notifier.SplitGriswold(REPLACE(QueryString, ' ', N'ŏ'), N'ŏ', '&') AS t;
But it returns me only one column with all inside:
QueryString
user_id=300&company_id=201503&status=WAITING OPERATION&count=1
Thanks in advance.
I've had to do this many times before, and you're in luck! Since you only have 3 delimiters per string, and that number is fixed, you can use SQL Server's PARSENAME function to do it. That's far less ugly than the best alternative (using the XML parsing stuff). Try this (untested) query (replace TABLE_NAME and COLUMN_NAME with the appropriate names):
SELECT
PARSENAME(REPLACE(COLUMN_NAME,'&','.'),4) AS 'User', -- PARSENAME counts parts from the right
PARSENAME(REPLACE(COLUMN_NAME,'&','.'),3) AS 'Company_ID',
PARSENAME(REPLACE(COLUMN_NAME,'&','.'),2) AS 'Status',
PARSENAME(REPLACE(COLUMN_NAME,'&','.'),1) AS 'Count'
FROM TABLE_NAME
That'll get you the results in the form "user_id=300", which is far and away the hard part of what you want. I'll leave it to you to do the easy part (drop the stuff before the "=" sign).
NOTE: I can't remember if PARSENAME will freak out over the illegal name character (the "=" sign). If it does, simply nest another REPLACE in there to turn it into something else, like an underscore.
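PARSENAME yields each key=value pair; the remaining "easy part" is splitting each pair once on '='. Sketched in Python (the helper `split_query_string` is hypothetical; no URL decoding is done, which assumes the stored values are plain text):

```python
def split_query_string(qs: str) -> dict:
    """Split 'k1=v1&k2=v2' into {'k1': 'v1', ...}, one entry per '&'."""
    row = {}
    for pair in qs.split("&"):  # unlike PARSENAME, not limited to four parts
        key, _, value = pair.partition("=")  # drop everything up to the first '='
        row[key] = value
    return row


print(split_query_string("user_id=300&company_id=201503&status=WAITING OPERATION&count=1"))
```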
You need to use SQL SUBSTRING as part of your select statement. You would first need to build the first row, then use a UNION to return the second row.

How to sum numbers in a delimited string using SQL Server

I have a string containing numbers delimited by a pipe, like so: 23|12|12|32|43.
Using SQL I want to extract each number, add 10 and then sum to get a total.
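Here is another alternative, using dynamic SQL:
declare @str nvarchar(max) = '23|12|12|32|43';
-- builds 'select 23+10+12+10+12+10+32+10+43+10'; only safe when the string holds nothing but digits and pipes
set @str = 'select ' + replace(@str, '|', '+10+') + '+10';
exec(@str);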
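For checking any of these answers, the expected total for the sample is 23+12+12+32+43 = 122, plus 5 × 10 = 172. A quick Python cross-check (the helper `sum_plus_ten` is hypothetical), which also shows the identity the XML answer relies on, sum(x + 10) = sum(x) + 10 × count:

```python
def sum_plus_ten(s: str) -> int:
    """Add 10 to every pipe-delimited number, then total them."""
    numbers = [int(n) for n in s.split("|")]
    return sum(n + 10 for n in numbers)  # equals sum(numbers) + 10 * len(numbers)


print(sum_plus_ten("23|12|12|32|43"))  # 172
```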
The answer using a recursive common table expression:
WITH cte AS (
SELECT
'23|12|12|32|43' + '|' AS string
,0 AS total
UNION ALL
SELECT
RIGHT(string, LEN(string) - PATINDEX('%|%', string))
,CAST(LEFT(string, PATINDEX('%|%', string) - 1) AS INT) + 10
FROM cte
WHERE PATINDEX('%|%', string) > 0
)
SELECT SUM(total) AS total FROM cte
As the recursion terminator, I check whether any more pipes exist in the string. On its own that misses the last element, which I worked around by concatenating an extra pipe onto the end of the original string. I think there is probably a better way to express the WHERE clause.
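Mirrored in Python (hypothetical helpers), the recursion shows why the sentinel pipe matters: without it, the last number is never peeled off. The second function is one possible alternative terminator that stops on an empty remainder instead of on "no pipe left"; it is a sketch, not a translation of any answer here.

```python
def sum_plus_ten_recursive(s: str) -> int:
    """Peel off the number before the first pipe, like each step of the recursive CTE."""
    if "|" not in s:
        return 0  # terminator, like WHERE PATINDEX('%|%', string) > 0 failing
    head, rest = s.split("|", 1)
    return int(head) + 10 + sum_plus_ten_recursive(rest)


def sum_plus_ten_no_sentinel(s: str) -> int:
    """Same recursion, but stops on an empty remainder, so no trailing pipe is needed."""
    if s == "":
        return 0
    head, _, rest = s.partition("|")  # rest is '' once the last number is consumed
    return int(head) + 10 + sum_plus_ten_no_sentinel(rest)


print(sum_plus_ten_recursive("23|12|12|32|43|"))  # 172 with the extra trailing pipe
print(sum_plus_ten_recursive("23|12|12|32|43"))  # 119: the final 43 is never reached
print(sum_plus_ten_no_sentinel("23|12|12|32|43"))  # 172
```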
Here is another way of doing it:
DECLARE @s VARCHAR(1000) = '23|12|12|32|43'
SELECT CAST('<root><e>' + REPLACE(@s, '|', '</e><e>') + '</e></root>' AS XML)
.value('sum(/root/e) + count(/root/e) * 10', 'INT')
This casts the string to the XML data type and uses the XQuery functions sum() and count() that it provides.
I posted this just as an example; your approach has much better performance.