T-SQL Convert Arithmetic Formula into its Components - sql

Problem Statement:
I have a Formula column that contains arithmetic expressions. I want to extract the variable names from each formula, join them with commas, and return them in a new column, "FormulaComponents".
The variable names follow a particular pattern: '%[^A-Za-z,_0-9 ]%'
However, I also want to keep the square brackets if they appear in the formula.
To Illustrate,
Input Data:
ID | Formula
------|-------------------------------------------
1 | ([x1] + [x2]) / 100
2 | ([y1] - [x2]) * 100
3 | z1 - z3
4 | [z4] % z3
5 | ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100
Desired Output
ID | Formula | FormulaComponents
------|------------------------------------------ |-----------------
1 | ([x1] + [x2]) / 100 | [x1],[x2]
2 | ([y1] - [x2]) * 100 | [y1],[x2]
3 | z1 - z3 | [z1],[z3]
4 | [z4] % z3 | [z4],[z3]
5 | ((x1 * 2) + ((y1 + 2)/[x1])*[z3])/100 | [x1],[y1],[z3]
As you can see above:
Row 1. The Formula column contains two variables, so the formula components are [x1],[x2].
Row 5. Note that x1 appears twice in the formula, once as x1 and once as [x1]. In this case, I want to keep [x1] only once. [x1] could appear any number of times in the Formula column, but should appear only once in the FormulaComponents column.
P.S.: The order of the variables in the "FormulaComponents" column does not matter. For example, in Row 5 the order can be [y1],[z3],[x1] or [z3],[x1],[y1], and so on.
To summarize: I want to write a SELECT statement in T-SQL that creates this new column.

You can split the string using string_split() and then carefully reaggregate the results:
select *
from t cross apply
(select string_agg('[' + value + ']', ',') as components
from (select distinct replace(replace(value, '[', ''), ']', '') as value
from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
where value like '[[a-z]%'
) s
) s;
Here is a db<>fiddle.
This is made harder than necessary because your formulas do not have a canonical format. It would be simpler if all variables were surrounded by square brackets, or if all operators were surrounded by spaces.
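One way to cope with operators that are not surrounded by spaces is to turn every operator, bracket, and space into a single delimiter before splitting. The following is only a sketch under assumptions: it needs SQL Server 2017 or later for TRANSLATE() and STRING_AGG(), and the table variable @t, its sample rows, and the '|' delimiter are made up for illustration (the delimiter must not occur in any formula).

```sql
-- Hypothetical sample data; @t and its rows are assumptions for illustration.
DECLARE @t TABLE (ID int, Formula varchar(200));
INSERT INTO @t (ID, Formula) VALUES
    (1, '([x1]+[x2])/100'),
    (2, 'z1-z3'),
    (3, '((x1*2)+((y1+2)/[x1])*[z3])/100');

SELECT t.ID, t.Formula, c.FormulaComponents
FROM @t AS t
CROSS APPLY (
    -- Re-wrap each distinct name in square brackets and join with commas.
    SELECT STRING_AGG('[' + d.name + ']', ',') AS FormulaComponents
    FROM (
        SELECT DISTINCT s.value AS name
        -- Map space, parentheses, brackets and operators (10 characters)
        -- to '|' (10 characters), then split on '|'.
        FROM STRING_SPLIT(TRANSLATE(t.Formula, ' ()[]*/+-%', '||||||||||'), '|') AS s
        -- Keep only tokens that start with a letter or underscore;
        -- this drops numeric constants and empty tokens.
        WHERE s.value LIKE '[A-Za-z_]%'
    ) AS d
) AS c;
```

Because the brackets are stripped before DISTINCT is applied, x1 and [x1] collapse into one entry. The order of names within FormulaComponents is not guaranteed.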
EDIT:
SQL Server 2016 has string_split() but not string_agg(). That can be replaced with the XML "stuff" technique:
select *
from t cross apply
(select stuff( (select distinct ',[' + value + ']'
from (select distinct replace(replace(value, '[', ''), ']', '') as value
from string_split(replace(replace(replace(replace(t.formula, '(', ' '), ')', ' '), '*', ' '), '/', ' '), ' ') s
where value like '[[a-z]%'
) t
order by 1
for xml path ('')
), 1, 1, '') as components
) s;

Related

Query to update strings using string_split function

I am trying to update column in table where data is in below format:
Id | ColA
----------
1 Peter,John:Ryan,Jack:Evans,Chris
2 Peter,John:Ryan,Jack
3 Hank,Tom
4
5 Cruise,Tom
I need to split the string by ':', remove the ',', reverse each name, and then join the pieces back together separated by ':'. The final data should look like this:
Id | ColA
----------
1 John Peter:Jack Ryan:Chris Evans
2 John Peter:Jack Ryan
3 Tom Hank
4
5 Tom Cruise
Please let me know how this can be achieved.
I tried to use REPLACE and SUBSTRING, but how can it be done when some rows contain two colons and others only one?
Is there any way to detect this and produce the data in the format shown above?
Here is a solution for SQL Server 2008 onwards.
It is based on XML and XQuery.
Using XQuery's FLWOR expression makes it possible to tell odd-positioned XML elements from even-positioned ones. The rest is just a couple of REPLACE() calls to compose the desired output.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, tokens VARCHAR(1024));
INSERT INTO @tbl (tokens) VALUES
('Peter,John:Ryan,Jack:Evans,Chris'),
('Peter,John:Ryan,Jack'),
('Hank,Tom'),
(''),
('Cruise,Tom');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = ':'
, @comma CHAR(1) = ',';
SELECT ID, tokens
, REPLACE(REPLACE(c.query('
for $x in /root/r[position() mod 2 eq 0]
let $pos := count(root/r[. << $x])
return concat($x, sql:variable("@comma"), (/root/r[$pos])[1])
').value('text()[1]', 'VARCHAR(8000)')
, SPACE(1), @separator), @comma, SPACE(1)) AS result
FROM @tbl
CROSS APPLY (SELECT CAST('<root><r><![CDATA[' +
REPLACE(REPLACE(tokens,@comma,@separator), @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c)
ORDER BY ID;
Output
+----+----------------------------------+----------------------------------+
| ID | tokens | result |
+----+----------------------------------+----------------------------------+
| 1 | Peter,John:Ryan,Jack:Evans,Chris | John Peter:Jack Ryan:Chris Evans |
| 2 | Peter,John:Ryan,Jack | John Peter:Jack Ryan |
| 3 | Hank,Tom | Tom Hank |
| 4 | | NULL |
| 5 | Cruise,Tom | Tom Cruise |
+----+----------------------------------+----------------------------------+
SQL #2 (don't try it, it won't work)
Unfortunately, SQL Server does not fully support even the XQuery 1.0 standard, and XQuery 3.1 is the latest standard. The XQuery 1.0 functions fn:substring-after() and fn:substring-before() are badly missing.
In a dream world the solution would be much simpler, along the following lines:
SELECT *
, c.query('
for $x in /root/r
return concat(fn:substring-after($x, ","), ",", fn:substring-before($x, ","))
')
FROM @tbl
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(tokens, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Please up-vote the following suggestion to improve SQL Server:
SQL Server vNext (post 2019) and NoSQL functionality
It became one of the most popular requests for SQL Server.
The current voting tally is 590 and counting.
Something like this should work:
CREATE TABLE YourTableNameHere (
Id int NULL
,ColA varchar(1000) NULL
);
INSERT INTO YourTableNameHere (Id,ColA) VALUES
(1, 'Peter,John:Ryan,Jack:Evans,Chris')
,(2, 'Peter,John:Ryan,Jack')
,(3, 'Hank,Tom')
,(4, '')
,(5, 'Cruise,Tom');
SELECT
tbl.Id
,STUFF((SELECT
CONCAT(':'
,RIGHT(REPLACE(ss.value, ',', ' '), LEN(REPLACE(ss.value, ',', ' ')) - CHARINDEX(' ', REPLACE(ss.value, ',', ' '), 1)) /*first name*/
,' '
,CASE WHEN CHARINDEX(',', ss.value, 1) > 1 THEN LEFT(REPLACE(ss.value, ',', ' '), CHARINDEX(' ', REPLACE(ss.value, ',', ' '), 1) - 1) /*last name*/ ELSE '' END)
FROM
YourTableNameHere AS tbl_inner
CROSS APPLY string_split(tbl_inner.ColA, ':') AS ss
WHERE
tbl_inner.Id = tbl.Id
FOR XML PATH('')), 1, 1, '') AS ColA
FROM
YourTableNameHere AS tbl;
This uses the string_split function within a FOR XML clause to split the values in ColA by the : character, then replace the , with a space, parse to the left and right of the space, then recombine the parsed values delimited by a : character.
One thing to note here, per Microsoft the output of string_split is not guaranteed to be in the same order as the input:
Note
The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.
So to guarantee that the names are concatenated back in the same order in which they appeared in the input column, you would either need to implement your own function to split the string or come up with some criterion for combining them in a defined order. For example, you could recombine them in alphabetical order by adding ORDER BY ss.value to the inner query for ColA in the final result set. In my testing with your input, the final values came out in the same order as the input column, but that behaviour is not guaranteed; guaranteeing it takes more work.
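On newer versions there is a built-in way to keep the input order. The sketch below assumes SQL Server 2022 or later (or Azure SQL Database), where STRING_SPLIT() accepts a third enable_ordinal argument and returns an ordinal column, and it uses STRING_AGG() (SQL Server 2017 or later) instead of FOR XML PATH. The table variable @names and its rows are made up for illustration.

```sql
-- Hypothetical sample data; @names is an assumption for illustration.
DECLARE @names TABLE (Id int, ColA varchar(1000));
INSERT INTO @names (Id, ColA) VALUES
    (1, 'Peter,John:Ryan,Jack:Evans,Chris'),
    (3, 'Hank,Tom');

SELECT n.Id,
       STRING_AGG(
           CONCAT(
               -- Part after the comma (first name).
               SUBSTRING(ss.value, CHARINDEX(',', ss.value + ',') + 1, LEN(ss.value)),
               ' ',
               -- Part before the comma (last name). Appending ',' to the search
               -- string keeps CHARINDEX >= 1 even when a piece has no comma.
               LEFT(ss.value, CHARINDEX(',', ss.value + ',') - 1)),
           ':') WITHIN GROUP (ORDER BY ss.ordinal) AS ColA  -- input order
FROM @names AS n
CROSS APPLY STRING_SPLIT(n.ColA, ':', 1) AS ss              -- 1 = enable_ordinal
GROUP BY n.Id;
-- Id 1 -> John Peter:Jack Ryan:Chris Evans
-- Id 3 -> Tom Hank
```

Ordering the aggregate by ss.ordinal removes the reliance on the undocumented output order of STRING_SPLIT().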

Replace a value in a comma separated string in SQL Server database

I'm using a SQL Server 2014 database and I have a column that contains comma-separated values such as:
1,2,3
4,5
3,6,2
4,2,8
2
What I need to do is to replace the number 2 with the number 3 (string values) in each record and not duplicate the 3 if possible. I'm not sure that this can be done unless I use a function and I'm not sure how to do it in a function.
I think I need to split a string into a table and then loop the values and put it back together with the new value. Is there an easier way? Any help is appreciated.
The expected output would therefore be:
1,3
4,5
3,6
4,3,8
3
While it is possible, I do not encourage this:
DECLARE @old AS VARCHAR(3) = '2';
DECLARE @new AS VARCHAR(3) = '3';
WITH opdata(csv) AS (
SELECT '1,22,3' UNION ALL
SELECT '1,2,3' UNION ALL
SELECT '4,5' UNION ALL
SELECT '3,6,2' UNION ALL
SELECT '4,2,8' UNION ALL
SELECT '2'
), cte1 AS (
SELECT
csv,
CASE
WHEN ',' + csv + ',' LIKE '%,' + @old + ',%' THEN
CASE
WHEN ',' + csv + ',' LIKE '%,' + @new + ',%' THEN REPLACE(',' + csv + ',', ',' + @old + ',', ',') -- new already present so just delete old
ELSE REPLACE(',' + csv + ',', ',' + @old + ',', ',' + @new + ',') -- replace old with new
END
ELSE ',' + csv + ','
END AS tmp
FROM opdata
)
SELECT
csv,
STUFF(STUFF(tmp, 1, 1, ''), LEN(tmp) - 1, 1, '') AS res
FROM cte1
Result:
csv | res
-------+-------
1,22,3 | 1,22,3
1,2,3 | 1,3
4,5 | 4,5
3,6,2 | 3,6
4,2,8 | 4,3,8
2 | 3
Note that the plethora of ',...,' is required to avoid replacing values such as 22. If you are using SQL Server 2017 you can ditch the extra CTE + STUFF and use TRIM(',' FROM ...).
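The TRIM() variant mentioned above might look like the following. This is a sketch that assumes SQL Server 2017 or later; the inline VALUES list simply repeats the sample rows, and the alias v is made up for illustration.

```sql
DECLARE @old varchar(3) = '2',
        @new varchar(3) = '3';

SELECT v.csv,
       -- TRIM strips the helper commas added for whole-value matching.
       TRIM(',' FROM
            CASE
                -- old value absent: leave the list unchanged
                WHEN ',' + v.csv + ',' NOT LIKE '%,' + @old + ',%' THEN v.csv
                -- new value already present: just delete the old one
                WHEN ',' + v.csv + ',' LIKE '%,' + @new + ',%'
                    THEN REPLACE(',' + v.csv + ',', ',' + @old + ',', ',')
                -- otherwise swap old for new
                ELSE REPLACE(',' + v.csv + ',', ',' + @old + ',', ',' + @new + ',')
            END) AS res
FROM (VALUES ('1,22,3'), ('1,2,3'), ('4,5'), ('3,6,2'), ('4,2,8'), ('2')) AS v(csv);
-- res: 1,22,3 | 1,3 | 4,5 | 3,6 | 4,3,8 | 3
```

The matching logic is the same as in the CTE version; only the final comma clean-up changes.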
This isn't going to perform particularly well, however:
WITH CTE AS (
SELECT *
FROM (VALUES ('1,2,3'),
('4,5'),
('3,6,2'),
('4,2,8'),
('2')) V(DS))
SELECT CASE WHEN DS LIKE '%3%' THEN REPLACE(REPLACE(DS,'2,',''),',2','')
WHEN DS LIKE '%2%' THEN REPLACE(DS,'2','3')
ELSE DS
END
FROM CTE;
Maybe you are looking for something like this:
SELECT REPLACE(CASE WHEN CHARINDEX('2', '1,2,3') > 0 THEN REPLACE('1,2,3', '2','') ELSE REPLACE('1,2,3', '2','3') END, ',,',',')
I have used a hard-coded value for demonstration. You can replace '1,2,3' with the column name from your table.
To update:
DECLARE @was nvarchar(2) = 2,
@willbe nvarchar(2) = 3,
@d nvarchar(1) = ','
UPDATE strings
SET string = REVERSE(
STUFF(
REVERSE(
STUFF(
CASE WHEN CHARINDEX(@d+@willbe+@d,@d+string+@d) > 0
THEN REPLACE(@d+string+@d,@d+@was+@d,@d)
ELSE REPLACE(@d+string+@d,@d+@was+@d,@d+@willbe+@d)
END,1,1,'')
),1,1,''))
Output:
1,3
4,5
3,6
4,3,8
3

Parsing expression only with sql \ finding an order of expression's values

Is there a way to find an order of words/letters inside an expression found in the database?
To be more clear here is an example:
From table X I'm getting the names "a" and "b".
In other table there is the expression: "b + a",
The result I need is b,1 | a,2
Is there any way to do it using only SQL query?
P.S. I didn't find any reference to this subject...
Beautiful question! Take a look at this solution, which breaks the expression into a list of identifiers:
DECLARE @val varchar(MAX) = 'b * (c + a) / (b - c)';
WITH Split AS
(
SELECT 1 RowNumber, LEFT(@val, PATINDEX('%[^a-z]%', @val)-1) Val, STUFF(@val, 1, PATINDEX('%[^a-z]%', @val), '')+'$' Rest
UNION ALL
SELECT RowNumber+1 Rownumber, LEFT(Rest, PATINDEX('%[^a-z]%', Rest)-1) Val, STUFF(Rest, 1, PATINDEX('%[^a-z]%', Rest), '') Rest
FROM Split
WHERE PATINDEX('%[^a-z]%', Rest)<>0
)
SELECT Val, ROW_NUMBER() OVER (ORDER BY MIN(RowNumber)) RowNumber FROM Split
WHERE LEN(Val)<>0
GROUP BY Val
It yields the following results (first occurrences only):
b 1
c 2
a 3
If executed with DECLARE @val varchar(MAX) = 'as * (c + a) / (bike - car)' it returns:
as 1
c 2
a 3
bike 4
car 5
(From a similar question)
You can do it with CHARINDEX(), which searches for a substring within a larger string and returns the position of the match, or 0 if no match is found.
CHARINDEX(' a ',' ' + REPLACE(REPLACE(@mainString,'+',' '),'.',' ') + ' ')
Add more nested REPLACE() calls for any other punctuation that may occur.
For your question here is an example:
INSERT INTO t1 ([name], [index])
SELECT name, CHARINDEX(' ' + name + ' ',' ' + REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE('b * (c + a) / (b - c)','+',' '),'-',' '),'*',' '),'(',' '),')',' '),'/',' ') + ' ')
FROM t2
The result will be:
a, 10
b, 1
c, 6

SQL Sorting numeric and string

Hi, I have an interesting problem. I have about 1500 records in a table. The format of the column I need to sort on is:
String Number.number.(optional number) (optional string)
In reality this could look like this:
AB 2.10.19
AB 2.10.2
AB 2.10.20 (I)
ACA 1.1
ACA 1.9 (a) V
I need a way to sort these so that instead of
AB 2.10.19
AB 2.10.2
AB 2.10.20 (I)
I get this
AB 2.10.2
AB 2.10.19
AB 2.10.20 (I)
Because of the lack of standard formatting, I'm at a loss as to how to sort this via SQL.
I'm at the point of manually adding a new int column to hold the sort value, unless anyone has a suggestion.
I'm using SQL Server 2008 R2.
You would need to sort on the first text token, then on the second text token (which is not a number; it's a string made up of several numbers), then optionally on any remaining text.
To make the 2nd token sort correctly (like a version number I presume) you can use a hierarchyid:
with t(f) as (
select 'AB 2.10.19' union all
select 'AB 2.10.2' union all
select 'AB 2.10.20 (I)' union all
select 'AB 2.10.20 (a) Z' union all
select 'AB 2.10.21 (a)' union all
select 'ACA 1.1' union all
select 'ACA 1.9 (a) V' union all
select 'AB 4.1'
)
select * from t
order by
left(f, charindex(' ', f) - 1),
cast('/' + replace(substring(f, charindex(' ', f) + 1, patindex('%[0-9] %', f + ' ') - charindex(' ', f)) , '.', '/') + '/' as hierarchyid),
substring(f, patindex('%[0-9] %', f + ' ') + 1, len(f))
f
----------------
AB 2.10.2
AB 2.10.19
AB 2.10.20 (a) Z
AB 2.10.20 (I)
AB 2.10.21 (a)
AB 4.1
ACA 1.1
ACA 1.9 (a) V
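The hierarchyid cast is the step that makes the dotted numbers compare component by component instead of character by character. A minimal sketch isolating just that step (the inline values and the column name ver are assumptions for illustration):

```sql
-- '2.10.19' becomes the hierarchyid path '/2/10/19/'; paths are compared
-- numerically at each level, so 2 sorts before 19 and 19 before 20.
SELECT v.ver
FROM (VALUES ('2.10.19'), ('2.10.2'), ('2.10.20')) AS v(ver)
ORDER BY CAST('/' + REPLACE(v.ver, '.', '/') + '/' AS hierarchyid);
-- 2.10.2
-- 2.10.19
-- 2.10.20
```

The cast fails if the token contains anything other than digits and dots, which is why the full query above first cuts the numeric token out of the string.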
Pad the values to the same length:
SELECT column
FROM table
ORDER BY left(column + replicate('*', 100500), 100500)
--get the start and end position of numeric in the string
with numformat as
(select val,patindex('%[0-9]%',val) strtnum,len(val)-patindex('%[0-9]%',reverse(val))+1 endnum
from t
where patindex('%[0-9]%',val) > 0) --where condition added to exclude records with no numeric part in them
--get the substring based on the previously calculated start and end positions
,substrng_to_sort_on as
(select val, substring(val,strtnum,endnum-strtnum+1) as sub from numformat)
--Final query to sort based on the 1st,2nd and the optional 3rd numbers in the string
select val
from substrng_to_sort_on
order by
cast(substring(sub,1,charindex('.',sub)-1) as numeric), --1st number in the string
cast(substring(sub,charindex('.',sub)+1,charindex('.',reverse(sub))) as numeric), --second number in the string
cast(reverse(substring(reverse(sub),1,charindex('.',reverse(sub))-1)) as numeric) --third number in the string
Sample demo
Try this:
SELECT column
FROM table
ORDER BY CASE WHEN SUBSTRING(column,LEN(column)-1,1) = '.'
THEN 0
ELSE 1
END, column
This will put any strings that have a . in the second to last position first in the ordering.
Edit:
On second thought, this won't work with the leading 'AB', 'ACA' etc. Try this instead:
SELECT column
FROM table
ORDER BY SUBSTRING(column,1,2), --This will deal with leading letters up to 2 chars
CASE WHEN SUBSTRING(column,LEN(column)-1,1) = '.'
THEN 0
ELSE 1
END,
Column
Edit2:
To also compensate for the second numeric set, use this:
SELECT column
FROM table
ORDER BY substring(column,1,2),
CASE WHEN substring(column,charindex('.',column) + 2,1) = '.' and substring(column,len(column)-1,1) = '.' THEN 0
WHEN substring(column,charindex('.',column) + 2,1) = '.' and substring(column,len(column)-1,1) <> '.' THEN 1
WHEN substring(column,charindex('.',column) + 2,1) <> '.' and substring(column,len(column)-1,1) = '.' THEN 2
ELSE 3 END, column
Basically, this is a manual way to force hierarchical ordering by accounting for each condition.

Convert varchar to 3 (sometimes 4) chars in T-SQL

I select data from a database. The values are (field name is ADR_KOMP_VL) :
4 , 61A, 100, 12, 58, 123C, 6 A, 5
I need to convert these values to 3 digits (except when there is a letter then it is 4)
So the converted values should be:
004, 061A, 100, 012, 058, 123C, 006A, 005
The rules are:
Always 3 digits
No spaces
If the original value is less than three digits, put 0's in front of it.(The length is 3)
If the original value contains a letter, put 0's in front of it (but the length is 4)
For the "no space" part I have this:
select REPLACE(ADR_KOMP_VL, ' ','')
The solution I have so far is:
SELECT RIGHT('000' + CONVERT(VARCHAR(4),REPLACE(ADR_KOMP_VL, ' ','')), 3)
But this only gives me the right length, when there is no letter in the value. My problem is how to handle the values with a letter in them??
This only checks whether the last character is a letter. Additional logic will be required if the letter can appear elsewhere.
SELECT REPLICATE('0', CASE WHEN ISNUMERIC(RIGHT(ADR_KOMP_VL, 1)) = 0 THEN 4
ELSE 3
END - LEN(REPLACE(ADR_KOMP_VL, ' ', '')))
+ REPLACE(ADR_KOMP_VL, ' ', '')
FROM TX
EDIT - actually this might work better; it checks whether the whole ADR_KOMP_VL value (with spaces removed) is numeric:
SELECT REPLICATE('0', CASE WHEN ISNUMERIC(REPLACE(ADR_KOMP_VL, ' ', '')) = 0 THEN 4
ELSE 3
END - LEN(REPLACE(ADR_KOMP_VL, ' ', '')))
+ REPLACE(ADR_KOMP_VL, ' ', '')
FROM TX
SQLFiddle DEMO
You can use a CASE expression:
SELECT (case when ADR_KOMP_VL like '%[A-Z]%'
then RIGHT('0000' + CONVERT(VARCHAR(4),REPLACE(ADR_KOMP_VL, ' ','')), 4)
else RIGHT('000' + CONVERT(VARCHAR(4),REPLACE(ADR_KOMP_VL, ' ','')), 3)
end)
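The CASE approach can be checked against the sample values from the question. This is only a sketch: the inline VALUES list stands in for the real table, and the names v, raw, c, and cleaned are made up for illustration.

```sql
SELECT v.raw,
       -- Pad to 4 characters when the value contains a letter, otherwise to 3.
       RIGHT('0000' + c.cleaned,
             CASE WHEN c.cleaned LIKE '%[A-Za-z]%' THEN 4 ELSE 3 END) AS padded
FROM (VALUES ('4'), ('61A'), ('100'), ('12'), ('58'), ('123C'), ('6 A'), ('5')) AS v(raw)
-- Remove spaces once and reuse the result.
CROSS APPLY (SELECT REPLACE(v.raw, ' ', '') AS cleaned) AS c;
-- padded: 004, 061A, 100, 012, 058, 123C, 006A, 005
```

Computing the cleaned value once in CROSS APPLY avoids repeating the REPLACE() call in both the test and the padding expression.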