Select specific portion of string using Regex match - sql

Please consider the table below. I am trying to retrieve only the EUR amount within the Tax strings. The strings vary in length, but the float numbers are always present.
OrderID SKU Price Tax
**** **** **** [<TV<standard#21.0#false#21.36#EUR>VT>]
**** **** **** [<TV<standard#21.0#false#7.21#EUR>VT>]
**** **** **** [<TV<standard#17.0#false#5.17#EUR>VT>]
I wrote a regular expression that matches what I need: \d+\W\d+ returns both float values within the string. In Oracle SQL I can simply get the second occurrence with a query like:
SELECT REGEXP_SUBSTR(column, '\d+\W\d+',1,2) FROM table
Using the above approach I retrieve 21.36, 7.21 and 5.17 for those three records.
How can I achieve this with SQL Server?

Obviously regex would be the likely tool of choice here. But SQL Server does not have much native regex support. Here is a pure SQL Server solution making use of PATINDEX and CHARINDEX. It is a bit verbose, but gets the job done:
SELECT
SUBSTRING(Tax,
CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax) + 3) + 1,
CHARINDEX('#', Tax, CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax) + 3) + 1) -
CHARINDEX('#', Tax, PATINDEX('%[0-9]#%', Tax) + 3) - 1)
FROM yourTable;
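The same positions can also be computed one separator at a time, which some readers may find easier to follow. This is only a sketch of an alternative, not the query above: it assumes every Tax value contains at least four # separators and that the EUR amount is always the fourth field. The aliases p1 to p4 and EurAmount are made up for illustration.

```sql
-- Sample rows taken from the question; no table is required.
SELECT t.Tax,
       -- The amount sits between the 3rd and the 4th '#'.
       SUBSTRING(t.Tax, p3.pos + 1, p4.pos - p3.pos - 1) AS EurAmount
FROM (VALUES ('[<TV<standard#21.0#false#21.36#EUR>VT>]'),
             ('[<TV<standard#21.0#false#7.21#EUR>VT>]'),
             ('[<TV<standard#17.0#false#5.17#EUR>VT>]')) AS t(Tax)
-- Each CROSS APPLY finds the next '#' after the previous one.
CROSS APPLY (VALUES (CHARINDEX('#', t.Tax)))             AS p1(pos)
CROSS APPLY (VALUES (CHARINDEX('#', t.Tax, p1.pos + 1))) AS p2(pos)
CROSS APPLY (VALUES (CHARINDEX('#', t.Tax, p2.pos + 1))) AS p3(pos)
CROSS APPLY (VALUES (CHARINDEX('#', t.Tax, p3.pos + 1))) AS p4(pos);
```

Naming each position avoids repeating the nested CHARINDEX calls, at the cost of a few more lines.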

Please try the following solution.
The approach uses XML to tokenize the Tax column.
It produces XML like the following for each row:
<root>
<r>[<TV<standard</r>
<r>21.0</r>
<r>false</r>
<r>21.36</r>
<r>EUR>VT>]</r>
</root>
The 4th r element is the monetary value in question.
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, Tax VARCHAR(MAX));
INSERT INTO @tbl (Tax) VALUES
('[<TV<standard#21.0#false#21.36#EUR>VT>]'),
('[<TV<standard#21.0#false#7.21#EUR>VT>]'),
('[<TV<standard#17.0#false#5.17#EUR>VT>]');
-- DDL and sample data population, end
DECLARE @separator CHAR(1) = '#';
SELECT t.*
, c.value('(/root/r[4]/text())[1]', 'DECIMAL(10,2)') AS result
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' +
REPLACE(tax, @separator, ']]></r><r><![CDATA[') +
']]></r></root>' AS XML)) AS t1(c);
Output
+----+-----------------------------------------+--------+
| ID | Tax | result |
+----+-----------------------------------------+--------+
| 1 | [<TV<standard#21.0#false#21.36#EUR>VT>] | 21.36 |
| 2 | [<TV<standard#21.0#false#7.21#EUR>VT>] | 7.21 |
| 3 | [<TV<standard#17.0#false#5.17#EUR>VT>] | 5.17 |
+----+-----------------------------------------+--------+

Related

How to use the SQL REPLACE Function, so that it will replace some text between a certain range, rather than one specific value

I have a table called Product and I am trying to replace some of the values in the Product ID column pictured below:
ProductID
PIDLL0000074853
PIDLL000086752
PIDLL00000084276
I am familiar with the REPLACE function and have used this like so:
SELECT REPLACE(ProductID, 'LL00000', '/') AS 'Product Code'
FROM Product
Which returns:
Product Code
PID/74853
PIDLL000086752
PID/084276
The letter L always appears twice (LL) in the ProductID. However, the number of zeros ranges from 4 to 6. The LL and the zeros should be replaced with a /.
If anyone could suggest the best way to achieve this, it would be greatly appreciated. I'm using Microsoft SQL Server, so standard SQL syntax would be ideal.
Please try the following solution.
All credit goes to @JeroenMostert
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, ProductID VARCHAR(50));
INSERT INTO @tbl (ProductID) VALUES
('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PITLL0000084770');
-- DDL and sample data population, end
SELECT *
, CONCAT(LEFT(ProductID,3),'/', CONVERT(DECIMAL(38, 0), STUFF(ProductID, 1, 5, ''))) AS [After]
FROM @tbl;
Output
+----+------------------+-----------+
| ID | ProductID | After |
+----+------------------+-----------+
| 1 | PIDLL0000074853 | PID/74853 |
| 2 | PIDLL000086752 | PID/86752 |
| 3 | PIDLL00000084276 | PID/84276 |
| 4 | PITLL0000084770 | PIT/84770 |
+----+------------------+-----------+
This isn't particularly pretty in T-SQL, as it doesn't support regex or even pattern replacement. Therefore your method is to use things like CHARINDEX and PATINDEX to find the start and end positions and then replace (not with the REPLACE function) that part of the text.
This uses CHARINDEX to find the 'LL', and then PATINDEX to find the first non '0' character after that position. As PATINDEX doesn't support a start position I have to use STUFF to remove the first characters.
Then, finally, we can use STUFF (again) to replace the length of characters with a single '/':
SELECT STUFF(V.ProductID,CI.I+2,ISNULL(PI.I,0),'/')
FROM (VALUES('PIDLL0000074853'),
('PIDLL000086752'),
('PIDLL00000084276'),
('PIDLL3246954384276'))V(ProductID)
CROSS APPLY(VALUES(NULLIF(CHARINDEX('LL',V.ProductID),0)))CI(I)
CROSS APPLY(VALUES(NULLIF(PATINDEX('%[^0]%',STUFF(V.ProductID,1,CI.I+2,'')),1)))PI(I);
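If the goal is the PID/74853 form from the question, where the LL is removed together with the zeros, a variant along the following lines may help. This is a sketch, not the query above: it assumes every ProductID contains LL followed by at least one non-zero character, and the names Z, ZeroCount and ProductCode are hypothetical.

```sql
SELECT V.ProductID,
       -- Replace 'LL' plus the run of zeros after it with a single '/'.
       STUFF(V.ProductID, CI.I, 2 + Z.ZeroCount, '/') AS ProductCode
FROM (VALUES ('PIDLL0000074853'),
             ('PIDLL000086752'),
             ('PIDLL00000084276')) AS V(ProductID)
-- Position of the first 'L' of 'LL'.
CROSS APPLY (VALUES (CHARINDEX('LL', V.ProductID))) AS CI(I)
-- Number of zeros directly after 'LL': position of the first non-zero character, minus 1.
CROSS APPLY (VALUES (PATINDEX('%[^0]%',
                 SUBSTRING(V.ProductID, CI.I + 2, LEN(V.ProductID))) - 1)) AS Z(ZeroCount);
-- Returns PID/74853, PID/86752 and PID/84276 for the three sample rows.
```

Rows that break the assumptions (no LL, or nothing but zeros after it) make STUFF return NULL, so they would need separate handling.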
If you are always starting with "PIDLL", you can just remove the "PIDLL", cast the rest as an INT to lose the leading 0's, then prefix the result with "PID/". One line of code.
-- Sample Data
DECLARE @t TABLE (ProductID VARCHAR(40));
INSERT @t VALUES('PIDLL0000074853'),('PIDLL000086752'),('PIDLL00000084276');
-- Solution
SELECT t.ProductID, NewProdID = 'PID/'+LEFT(CAST(REPLACE(t.ProductID,'PIDLL','') AS INT),20)
FROM @t AS t;
Returns:
ProductID NewProdID
------------------ ----------------
PIDLL0000074853 PID/74853
PIDLL000086752 PID/86752
PIDLL00000084276 PID/84276

Extracting certain text between two characters

I have a nvarchar string from which I need to extract certain text from between characters.
Example: 1.abc.5,m001-1-Exit,822-FName-18001233321--2021-09-23 13:53:10 Thursday-m001-1-Exit-Swipe,Card NO: 822User ID: FNameName: 18001233321Dept: Read Date: 2021-09-23 13:53:10 ThursdayAddr: m001-1-ExitStatus: Swipe,07580ec2000002a52E917D0000000000372BA56E11010000
What I need:
| Name | Phone Number |
| -------- | -------------- |
| FName | 1800123321 |
My Attempt:
SELECT SUBSTRING(col, LEN(LEFT(col, CHARINDEX ('-', col))) + 1, LEN(col) - LEN(LEFT(col, CHARINDEX ('-', col))) - LEN(RIGHT(col, LEN(col) - CHARINDEX ('-', col))) - 1);
One way:
Use patindex to find "FName-"
Remove the start of the string up until and including "FName-"
Use patindex to find "--"
Remove the rest of the string from and including "--"
You can consolidate the query down to one line, but you'll find yourself repeating parts of the logic - which I like to avoid. And calculating one thing at a time makes it easier to debug.
select
A.Col
, B.StringStart
, C.NewString
, patindex('%--%',C.NewString) NewStringEnd
, substring(C.NewString,1,patindex('%--%',C.NewString)-1) -- <- Required Result
from (
values
(N'1.abc.5,m001-1-Exit,822-FName-18001233321--2021-09-23 13:53:10 Thursday-m001-1-Exit-Swipe,Card NO: 822User ID: FNameName: 18001233321Dept: Read Date: 2021-09-23 13:53:10 ThursdayAddr: m001-1-ExitStatus: Swipe,07580ec2000002a52E917D0000000000372BA56E11010000')
) A (Col)
cross apply (
values
(patindex('%FName-%',Col))
) B (StringStart)
cross apply (
values
(substring(A.Col,B.StringStart+6,len(A.Col)-B.StringStart-6))
) C (NewString);
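If the name is not known in advance, both fields can be located by position instead. The sketch below is an assumption-laden alternative: it relies on the ID-Name-Phone field always following the second comma and on the phone number always being terminated by "--". The variable @col and the aliases c2, h1, h2 and dd are hypothetical, and the sample string is a shortened copy of the one in the question.

```sql
DECLARE @col NVARCHAR(400) =
    N'1.abc.5,m001-1-Exit,822-FName-18001233321--2021-09-23 13:53:10 Thursday-m001-1-Exit-Swipe';

SELECT Name        = SUBSTRING(@col, h1.p + 1, h2.p - h1.p - 1),  -- between 1st and 2nd hyphen
       PhoneNumber = SUBSTRING(@col, h2.p + 1, dd.p - h2.p - 1)   -- between 2nd hyphen and '--'
FROM (VALUES (CHARINDEX(N',', @col, CHARINDEX(N',', @col) + 1))) AS c2(p)  -- second comma
CROSS APPLY (VALUES (CHARINDEX(N'-',  @col, c2.p + 1))) AS h1(p)  -- hyphen after the ID
CROSS APPLY (VALUES (CHARINDEX(N'-',  @col, h1.p + 1))) AS h2(p)  -- hyphen after the name
CROSS APPLY (VALUES (CHARINDEX(N'--', @col, h2.p + 1))) AS dd(p); -- double hyphen after the phone
```

As in the answer above, each position is computed once and reused, which keeps the individual steps easy to inspect when a row does not parse as expected.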

Add Trailing Zeroes After Decimal

I am working with sku numbers that have the following 9 character structure:
a. a 3 digit number,
b. a period,
c. a five digit number.
An example: 505.12345.
A considerable % of the sku's end in 0. Examples: 505.12340, 505.12300, 505.12000.
I had no trouble keeping the trailing zeroes in SQL Server by setting the datatype to varchar after the migration from S3 -> SQL Server. I used a new machine learning model in AWS Sagemaker that cut off the trailing zeroes prior to the migration to S3.
The example sku's above now look like: 505.1234, 505.123, 505.12
My question: what is the best way to add trailing zeroes to all sku's where LEN([sku]) < 9? I would prefer to keep the sku datatype as varchar.
If you have a string, you can right-pad it with 0s as follows:
left(sku + replicate('0', 9), 9)
Alternatively:
sku + replicate('0', 9 - len(sku))
Demo on DB Fiddle:
select sku,
left(sku + replicate('0', 9), 9) new_sku,
sku + replicate('0', 9 - len(sku)) new_sku2
from (values ('505.1234'), ('505.123'), ('505.12'), ('505.12345')) x(sku)
sku | new_sku | new_sku2
:-------- | :-------- | :--------
505.1234 | 505.12340 | 505.12340
505.123 | 505.12300 | 505.12300
505.12 | 505.12000 | 505.12000
505.12345 | 505.12345 | 505.12345
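To apply the padding to stored data, the same expression can be used in an UPDATE restricted to the short values. This is a minimal sketch; the temporary table #skus and its column definition are assumptions made for illustration.

```sql
-- Hypothetical table standing in for the real sku table.
CREATE TABLE #skus (sku VARCHAR(9) NOT NULL);
INSERT INTO #skus (sku)
VALUES ('505.1234'), ('505.123'), ('505.12'), ('505.12345');

-- Right-pad only the rows that lost their trailing zeroes.
UPDATE #skus
SET sku = LEFT(sku + REPLICATE('0', 9), 9)
WHERE LEN(sku) < 9;

SELECT sku FROM #skus;

DROP TABLE #skus;
```

The WHERE clause leaves values that are already 9 characters long untouched, so the statement can be rerun safely.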
One simple way would be to CAST back to DECIMAL(9, 5) and then CAST again to CHAR(9)
Data
drop table if exists #tTable;
go
create table #tTable(
dec_num varchar(20) not null);
Insert into #tTable values
('505.1234'),
('505.123'),
('505.12');
Query
select cast(cast(dec_num as decimal(9,5)) as char(9)) char_9
from #tTable;
Output
char_9
505.12340
505.12300
505.12000

find the end point of a pattern in SQL server

There is a comma separated string in a column which looks like
test=1,value=2.2,system=321
I want to extract value from the string. I can use select PatIndex('%value=%',columnName) and then use left, but this only finds the beginning of the pattern.
How can I identify the end of the pattern value=% so that I can extract the value?
Chain a few SUBSTRING with CHARINDEX and your PATINDEX.
DECLARE @text VARCHAR(100) = 'test=1,value=2.2,system=321'
SELECT
Original = @text,
Parsed = SUBSTRING( -- Get a portion of the original value
@text,
PATINDEX('%value=%',@text) + 6, -- ... starting from the 'value=' (without the 'value=')
-1 + CHARINDEX( -- ... and get as many characters until the first comma
',',
SUBSTRING( -- ... (find the comma starting from the 'value=' onwards)
@text,
PATINDEX('%value=%',@text) + 6,
100)))
Result:
Original Parsed
test=1,value=2.2,system=321 2.2
Note that CHARINDEX returns 0 if there is no comma after your value=, so the length passed to SUBSTRING becomes -1 and the query fails. You can filter such rows out with a WHERE.
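One way to cover the case where value= is the last entry is to search for the comma in the string with an extra comma appended. This is a sketch of that idea rather than part of the query above, and it still assumes that value= occurs somewhere in the string.

```sql
-- 'value=' is the last entry here, so no comma follows it.
DECLARE @text VARCHAR(100) = 'test=1,system=321,value=2.2';

SELECT Parsed = SUBSTRING(
    @text,
    PATINDEX('%value=%', @text) + 6,                 -- first character after 'value='
    -- Appending ',' guarantees that a terminating comma is always found.
    CHARINDEX(',', @text + ',', PATINDEX('%value=%', @text) + 6)
        - (PATINDEX('%value=%', @text) + 6));        -- length up to that comma
-- Returns 2.2
```

Because the sentinel comma is only added to the search string, the value itself is never altered.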
I strongly suggest storing your values already split in a proper table, so you won't have to deal with string nightmares like this.
You can use CHARINDEX with starting position to find the first comma after the pattern. CROSS APPLY is used to keep the query easier to read:
WITH tests(str) AS (
SELECT 'test=1,value=2.2,system=321'
)
SELECT str, substring(str, pos1, pos2 - pos1) AS match
FROM tests
CROSS APPLY (SELECT PATINDEX('%value=%', str) + 6) AS ca1(pos1)
CROSS APPLY (SELECT CHARINDEX(',', str, pos1 + 1)) AS ca2(pos2)
-- 2.2
First of all, don't store denormalized data in this way if you want to query it. SQL, the language, isn't good at string manipulation. Parsing and splitting strings can't take advantage of indexes either, which means any query that tries to find, e.g., all records that refer to system 321 has to scan and parse all rows.
SQL Server 2016 and JSON
SQL Server 2016 added support for JSON and the STRING_SPLIT function. Earlier versions already provided the XML type. It's better to store complex values as JSON or XML instead of trying to parse the string.
One option is to convert the string into a JSON object and retrieve the value contents, eg :
DECLARE @text VARCHAR(100) = 'test=1,value=2.2,system=321'
select json_value('{"' + replace(replace(@text,',','","'),'=','":"') + '"}','$.value')
This returns 2.2.
The replacements converted the original string into
{"test":"1","value":"2.2","system":"321"}
JSON_VALUE(@json,'$.value') will return the value property of that object
Earlier SQL Server versions
In earlier SQL Server versions, you can convert that string into an XML element the same way and use XQuery :
DECLARE @text VARCHAR(100) = 'test=1,value=2.2,system=321';
declare @xml varchar(100)='<r ' + replace(replace(@text,',','" '),'=',' ="') + '" />';
select @xml
select cast(@xml as xml).value('(/r[1]/@value)','varchar(20)')
In this case @xml contains :
<r test ="1" value ="2.2" system ="321" />
The query result is 2.2
You can try the following.
DECLARE @xml AS XML
SELECT @xml = Cast(( '<X>' + Replace(txt, ',', '</X><X>') + '</X>' ) AS XML)
FROM (VALUES ('test=1,value=2.2,system=321')) v(txt)
SELECT LEFT(value, Charindex('=', value) - 1) AS LeftPart,
RIGHT(value, Charindex('=', Reverse(value)) - 1) AS RightPart
FROM (SELECT n.value('.', 'varchar(100)') AS value
FROM @xml.nodes('X') AS T(n))T
Output
+----------+-----------+
| LeftPart | RightPart |
+----------+-----------+
| test | 1 |
+----------+-----------+
| value | 2.2 |
+----------+-----------+
| system | 321 |
+----------+-----------+
You can try the below query if you are using SQL Server (2016 or above)
SELECT RIGHT(Value,CHARINDEX('=',REVERSE(Value))-1) FROM YourTableName
CROSS APPLY STRING_SPLIT ( ColumnName , ',' )
WHERE Value Like 'Value=%'
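For readers who want to try the STRING_SPLIT approach without a table, a self-contained version might look like the sketch below. The sample rows and the alias ExtractedValue are made up for illustration, and STRING_SPLIT requires database compatibility level 130 or higher.

```sql
SELECT t.ColumnName,
       -- Everything after the '=' of the matching key/value pair.
       SUBSTRING(s.value, CHARINDEX('=', s.value) + 1, LEN(s.value)) AS ExtractedValue
FROM (VALUES ('test=1,value=2.2,system=321'),
             ('value=7,test=3')) AS t(ColumnName)
-- One output row per comma-separated pair.
CROSS APPLY STRING_SPLIT(t.ColumnName, ',') AS s
WHERE s.value LIKE 'value=%';
```

Because each pair is matched as a whole, the position of value= within the string does not matter, including when it is the last entry.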

Unique 8 digit incremental number

I'm trying to generate a number which will ultimately be stored as string(varchar). e.g.
First - ABC00000001
Second- ABC00000002
.........................
I am able to generate the character string as expected. Now the problem is the incremental number.
What I am trying to do is get the last number stored, e.g. ABC00000009, and generate the next number, that is ABC00000010. How can I do that?
If I extract the integer from this then I will get 1 or 10; how do I format it as 8 digits?
Any help would really be appreciated.
Of course if changing the table structure is not an option, you can try this:
DECLARE @lastValue VARCHAR(15) = 'ABC00000001'
SELECT CONCAT('ABC', RIGHT(100000000 + CAST(RIGHT(@lastValue, 8) AS INT) + 1, 8))
Result
-----------
ABC00000002
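The same expression also handles the carry from 9 to 10 that the question asks about. The sketch below runs it over a few illustrative inputs; the aliases v, LastValue and NextValue are hypothetical.

```sql
SELECT v.LastValue,
       -- Adding 100000000 forces at least 9 digits, so RIGHT(..., 8) keeps the zero padding.
       CONCAT('ABC', RIGHT(100000000 + CAST(RIGHT(v.LastValue, 8) AS INT) + 1, 8)) AS NextValue
FROM (VALUES ('ABC00000001'),
             ('ABC00000009'),
             ('ABC00000099')) AS v(LastValue);
-- NextValue: ABC00000002, ABC00000010, ABC00000100
```

Note that after ABC99999999 the expression wraps around to ABC00000000, and that reading the last value and inserting the next one in separate steps does not protect against concurrent inserts.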
I would suggest that you create an identity column. This will increment (usually by 1, but not always). Then create a computed column:
alter table t add generated_number as
('ABC' + right(replicate('0', 8) + cast(idcol as varchar(255)), 8));
Almost the same approach Gordon Linoff has taken, I just prefer to use math where possible instead of string concatenation. My answer is different only because I add id value to 100000000 instead of using replicate.
CREATE TABLE dbo.test (
id int IDENTITY(1, 1) PRIMARY KEY
, some_value sysname UNIQUE
, super_column AS 'ABC' + RIGHT(100000000 + id, 8));
GO
INSERT INTO dbo.test (some_value)
VALUES ('some_value_1'), ('some_value_2');
SELECT *
FROM dbo.test AS T;
Result:
+----+--------------+--------------+
| id | some_value | super_column |
+----+--------------+--------------+
| 1 | some_value_1 | ABC00000001 |
| 2 | some_value_2 | ABC00000002 |
+----+--------------+--------------+