Optimization of a substring query with charindex to trim the left part of a string - sql

I need to get a substring of xyzdf/1234 resulting in 1234 (i.e. trimming everything up to and including the slash /). I have used
substring('xyzdf/1234',charindex('/','xyzdf/1234')+1,len('xyzdf/1234')-charindex('/','xyzdf/1234'))
which works but it is repetitive...
then I have used this way:
stuff('xyzdf/1234',1,charindex('/','xyzdf/1234'),'') and it works too and it is more compact, but still repeats the same argument twice 'xyzdf/1234'.
I wonder what would be the fastest way to trim the left part. I need to clean the data in one column for a million records, and I'm not sure whether the STUFF command is fast enough (mind you, it is a bulk operation). Thanks!

You could select the string from a VALUES clause.
That way you can reuse the value without hardcoding it twice.
Then take the right part, which holds the number.
For example, using RIGHT, CHARINDEX, REVERSE and VALUES:
select right(val, charindex('/',reverse(val))-1) as nr
from (values ('xyzdf/1234')) q(val);
Or use SUBSTRING, CHARINDEX, LEN and VALUES:
select substring(val,charindex('/',val)+1,len(val)) as nr
from (values ('xyzdf/1234')) q(val);
Or abuse PARSENAME:
select parsename(replace('xyzdf/1234','/','.'),1) as nr;
Or use variables:
declare @value varchar(30) = 'xyzdf/1234';
declare @nr int = right(@value, charindex('/',reverse(@value))-1);
select @nr as nr;
But what if the intention is to update a column so that only the number remains?
Then the SUBSTRING method is probably still the safest,
because it leaves values without a / untouched and does not fail with an "Invalid length parameter passed" error.
Example:
declare @Table table (id int identity(1,1) primary key, col1 varchar(30));
insert into @Table (col1) values
('xyzdf/1234'),
('12345');
update @Table
set col1 = substring(col1,charindex('/',col1)+1,len(col1))
where col1 like '%/[0-9]%';
select * from @Table;
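To see why the SUBSTRING form is safe for values without a slash, the index arithmetic can be mirrored outside the database. This is only a sketch in Python, not T-SQL; the helpers `charindex` and `strip_left` are hypothetical names that imitate the 1-based behaviour of CHARINDEX (0 when nothing is found) and SUBSTRING.

```python
def charindex(needle: str, haystack: str) -> int:
    """Imitate T-SQL CHARINDEX: 1-based position, 0 when not found."""
    return haystack.find(needle) + 1


def strip_left(value: str, sep: str = "/") -> str:
    """Mirror substring(val, charindex(sep, val) + 1, len(val))."""
    start = charindex(sep, value) + 1  # becomes 1 when sep is absent
    return value[start - 1:]           # convert the 1-based start to a slice


print(strip_left("xyzdf/1234"))  # 1234
print(strip_left("12345"))       # 12345 (no slash, value left untouched)
```

Because a missing separator yields a start position of 1, the whole value is returned, which is the behaviour the UPDATE relies on.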

Related

T-SQL: Count Numbers of semicolons before expression

I got a table with strings that look like that:
'9;1;test;A;11002'
How would I count how many semicolons are there before the 'A'?
Cheers!
Using string functions
select len(left(str, charindex('A', str))) - len(replace(left(str, charindex('A', str)), ';', '')) as n
from tbl
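The counting trick here is to take everything up to the 'A' and compare its length with and without semicolons. A small Python sketch of the same idea, useful for checking the arithmetic outside SQL Server (the function name `semicolons_before` is made up for illustration):

```python
def semicolons_before(text: str, marker: str = "A") -> int:
    """Count ';' characters that occur before the first marker."""
    pos = text.find(marker)
    if pos < 0:
        return 0  # LEFT(str, 0) would be empty, so the count is 0
    prefix = text[:pos + 1]  # same as LEFT(str, CHARINDEX('A', str))
    # Length difference equals the number of removed semicolons.
    return len(prefix) - len(prefix.replace(";", ""))


print(semicolons_before("9;1;test;A;11002"))  # 3
```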
Hint1: The whole issue has some smell... You should not store your data as CSV string. But sometimes we have to work with what we have...
Hint2: The following needs SQL-Server v2016. With an older version we'd need to do something similar based on XML.
Try this:
--A declared table to mockup your issue
DECLARE @tbl TABLE(ID INT IDENTITY, YourCSVstring VARCHAR(100));
INSERT INTO @tbl(YourCSVstring)
VALUES('9;1;test;A;11002');
--the query
SELECT t.ID
,A.*
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'["',REPLACE(t.YourCSVstring,';','","'),N'"]')) A;
The idea in short:
We use some replacements to translate your CSV-string to a JSON array.
Now we can use OPENJSON() to read it.
The value is the array item, the key its zero-based index.
Proceed with this however you need it.
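The string-to-JSON translation can be tried out in isolation. The following Python sketch is only an illustration of the replacement step (it is not how OPENJSON works internally, and `csv_to_indexed_items` is a hypothetical helper); like the T-SQL version, it assumes the values contain no double quotes or backslashes.

```python
import json


def csv_to_indexed_items(csv: str, sep: str = ";") -> list[tuple[int, str]]:
    """Turn 'a;b;c' into a JSON array and return (index, value) pairs."""
    # Same replacements as CONCAT('["', REPLACE(csv, ';', '","'), '"]')
    json_text = '["' + csv.replace(sep, '","') + '"]'
    return list(enumerate(json.loads(json_text)))


for key, value in csv_to_indexed_items("9;1;test;A;11002"):
    print(key, value)
```

The zero-based index of the 'A' item (3 here) is also the number of semicolons in front of it.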
Just to give you some fun: You can easily read the CSV type-safe into columns by doubling the [[ and using WITH to specify your columns:
SELECT t.ID
,A.*
FROM @tbl t
CROSS APPLY OPENJSON(CONCAT(N'[["',REPLACE(t.YourCSVstring,';','","'),N'"]]'))
WITH(FirstNumber INT '$[0]'
,SecondNumber INT '$[1]'
,SomeText NVARCHAR(100) '$[2]'
,YourLetterA NVARCHAR(100) '$[3]'
,FinalNumber INT '$[4]')A
returns:
ID FirstNumber SecondNumber SomeText YourLetterA FinalNumber
1 9 1 test A 11002

String_Split inserts only the first value

I'm trying to insert comma separated Guids into a temp table, to later check for a value using IN in these Guids. The following query is inserting only the first value in the table twice.
DECLARE @campaignids nvarchar(max) = '1DEBD122-FF1B-4E87-8812-D427ABA5D54E,FBD06A2E-24D1-4C06-B71D-B4306D8EA3BD'
DECLARE @TempCampaignIds TABLE (CampaignId uniqueidentifier)
INSERT INTO @TempCampaignIds
SELECT CAST(@campaignids AS uniqueidentifier)
FROM STRING_SPLIT(@campaignids, ',')
SELECT CampaignId FROM @TempCampaignIds
--result
CampaignId
1DEBD122-FF1B-4E87-8812-D427ABA5D54E
1DEBD122-FF1B-4E87-8812-D427ABA5D54E
You need to use the value from the string:
INSERT INTO @TempCampaignIds (CampaignId)
SELECT CAST(s.value AS uniqueidentifier)
FROM STRING_SPLIT(@campaignids, ',') s;
I'm actually surprised that your code works at all, but SQL Server converts the whole comma-separated string without an error and ends up with the first value. That doesn't seem to happen for other data types. In fact, SQL Server appears to look at only the first 36 characters when converting a string to a unique identifier, which is why every row received the first GUID.
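The difference between casting the whole variable and casting each split value can be reproduced in a few lines. This Python sketch only imitates the behaviour; in particular, truncating to 36 characters in `cast_guid` (a hypothetical helper) is an assumption modelled on the observation above, not a call into SQL Server.

```python
import uuid

campaign_ids = ("1DEBD122-FF1B-4E87-8812-D427ABA5D54E,"
                "FBD06A2E-24D1-4C06-B71D-B4306D8EA3BD")


def cast_guid(text: str) -> uuid.UUID:
    # Assumption: only the first 36 characters are considered.
    return uuid.UUID(text[:36])


# Buggy: one row per split value, but the cast uses the whole string.
buggy = [cast_guid(campaign_ids) for _ in campaign_ids.split(",")]

# Fixed: cast the value produced by the split.
fixed = [cast_guid(value) for value in campaign_ids.split(",")]

print(buggy)  # the first GUID, twice
print(fixed)  # two distinct GUIDs
```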

'LIKE' issues with FLOAT: SQL query needed to find values >= 4 decimal places

I have a conundrum....
There is a table with one NVARCHAR(50) Float column that has many rows with many numbers of various decimal lengths:
'3304.063'
'3304.0625'
'39.53'
'39.2'
I need to write a query to find only numbers with decimal places >= 4
First the query I wrote was:
SELECT
Column
FROM Tablename
WHERE Column LIKE '%.[0-9][0-9]%'
The above code finds all numbers with decimal places >= 2:
'3304.063'
'3304.0625'
'39.53'
Perfect! Now, I just need to increase the [0-9] by 2...
SELECT
Column
FROM Tablename
WHERE Column LIKE '%.[0-9][0-9][0-9][0-9]%'
this returned nothing! What?
Does anyone have an explanation of what went wrong and/or a possible solution? I'm kind of stumped, and my hunch is that it is some sort of 'LIKE' limitation.
Any help would be appreciated!
Thanks.
After your edit, you stated you are using FLOAT, which is an approximate value stored as 4 or 8 bytes, with 7 or 15 digits of precision. The documentation explicitly states that not all values in the data type range can be represented exactly. It also notes that you can use the STR() function when converting it, which you'll need to get your formatting right. Here is how:
declare @table table (columnName float)
insert into @table
values
('3304.063'),
('3304.0625'),
('39.53'),
('39.2')
--see the conversion
select * , str(columnName,20,4)
from @table
--now use it in a where clause.
--Return all values where the last digit of the STR() conversion isn't 0
select *
from @table
where right(str(columnName,20,4),1) <> '0'
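The effect of formatting to four decimals and inspecting the last digit can be checked quickly outside the database. The sketch below uses Python's fixed-point formatting as a stand-in for STR(columnName, 20, 4); that equivalence is an assumption for illustration, and `has_fourth_decimal` is a hypothetical name. Note that this test finds values whose fourth decimal digit is non-zero.

```python
values = [3304.063, 3304.0625, 39.53, 39.2]


def has_fourth_decimal(x: float) -> bool:
    """True when the value, rounded to 4 decimals, ends in a non-zero digit."""
    return format(x, ".4f")[-1] != "0"


matches = [v for v in values if has_fourth_decimal(v)]
print(matches)  # [3304.0625]
```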
OLD ANSWER
Your LIKE statement would do it, and here is another way just to show they both work.
declare @table table (columnName varchar(64))
insert into @table
values
('3304.063'),
('3304.0625'),
('39.53'),
('39.2')
select *
from @table
where len(right(columnName,len(columnName) - charindex('.',columnName))) >= 4
select *
from @table
where columnName like '%.[0-9][0-9][0-9][0-9]%'
One thing that could be causing this is a space somewhere in the number. Since you said the column type was VARCHAR, this is a possibility, and it could be avoided by storing the value as DECIMAL.
declare @table table (columnName varchar(64))
insert into @table
values
('3304.063'),
('3304. 0625'), --notice the space here
('39.53'),
('39.2')
--this would return nothing
select *
from @table
where columnName like '%.[0-9][0-9][0-9][0-9]%'
How to find out if this is the case?
select *
from @table
where columnName like '% %'
Or, anything but numbers and decimals:
select *
from @table
where columnName like '%[^.0-9]%'
The following is working fine for me:
declare @tab table (val varchar(50))
insert into @tab
select '3304.063'
union select '3304.0625'
union select '39.53'
union select '39.2'
select * from @tab
where val like '%.[0-9][0-9][0-9][0-9]%'
Assuming your table only has numerical data, you can cast them to decimal and then compare:
SELECT COLUMN
FROM tablename
WHERE CAST(COLUMN AS DECIMAL(19,4)) <> CAST(COLUMN AS DECIMAL(19,3))
You'd want to test the performance of this against using the character data type solutions that others have already suggested.
You can use REVERSE:
declare @vals table ([Val] nvarchar(50))
insert into @vals values ('3304.063'), ('3304.0625'), ('39.53'), ('39.2')
select [Val]
from @vals
where charindex('.',reverse([Val]))>4

A query that will search for the highest numeric value in a table where the column has an alphanumeric sequence

I have a column (XID) that contains a varchar(20) sequence in the following format: xxxzzzzzz Where X is any letter or a dash and zzzzz is a number.
I want to write a query that will strip the xxx and evaluate and return which is the highest number in the table column.
For example:
aaa1234
bac8123
g-2391
After, I would get the result of 8123
Thanks!
A bit painful in SQL Server, but possible. Here is one method that assumes that only digits appear after the first digit (which you actually specify as being the case):
select max(cast(stuff(col, 1, patindex('%[0-9]%', col) - 1, '') as float))
from t;
Note: if the last four characters are always the number you are looking for, this is probably easier to do with right():
select max(right(col, 4))
Using Numbers table
declare @string varchar(max)
set @string='abc1234'
select top 1 substring(@string,n,len(@string))
from
numbers
where n<=len(@string)
and isnumeric(substring(@string,n,1))=1
order by n
Output:1234
Using PATINDEX you can achieve it, like this -
DECLARE @test table
(
id INT,
player varchar(100)
)
INSERT @test
VALUES (1,'aaa1234'),
(2,'bac8123'),
(3,'g-2391')
SELECT
MAX(CONVERT(INT, LTRIM(SUBSTRING(player, PATINDEX('%[0-9]%', player), LEN(player)))))
FROM @test
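The logic of this query (find the first digit, keep everything from there on, convert to a number, take the maximum) can be sketched in Python to verify the expected result. The function name `numeric_suffix` is hypothetical, and the regular expression stands in for the PATINDEX('%[0-9]%', ...) lookup.

```python
import re


def numeric_suffix(xid: str) -> int:
    """Return the number that starts at the first digit of xid."""
    match = re.search(r"[0-9]", xid)  # position of the first digit
    if match is None:
        raise ValueError(f"no digits in {xid!r}")
    return int(xid[match.start():])


xids = ["aaa1234", "bac8123", "g-2391"]
print(max(numeric_suffix(x) for x in xids))  # 8123
```

Locating the first digit rather than assuming a fixed prefix length matters for values such as 'g-2391', whose prefix is only two characters long.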
Try:
Select MAX(RIGHT(XID,17))
from table
You can also use this method
CREATE TABLE #Tmp
(
XID VARCHAR(20)
)
INSERT INTO #Tmp(XID)
VALUES ('aaa1234'), ('bac8123'), ('g-2391')
SELECT MAX(RIGHT(XID, LEN(XID) - 3))
FROM #Tmp

How to get the data between mth and nth occurrence in a string

I'm using a SQL Server query to fetch the column information, but I need the value that sits between the 3rd and 4th occurrence of a delimiter in that particular column.
Here is my sample data
[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]
[qwe34||asdf||qwe||CONNECTIVITY||ghj]
[ertgfy||fgv||yuhjj||ACCESS||rty]
[tyhuj||rtg||qwert||ACCESS||TMW]
I'm looking for the value between the 3rd and 4th occurrence of ||
Something like
Capacity_CPU
CONNECTIVITY
ACCESS
My source column has no fixed length; the values vary in length.
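Use PATINDEX:
build a wildcard pattern for the part you need (PATINDEX accepts LIKE-style patterns, not full regular expressions), then use SUBSTRING to extract the string that you want.
You can use a mixture of the SUBSTRING, CHARINDEX, LEFT and RIGHT functions. You will have to experiment with them to fit your data; the example below assumes the last field is always three characters long.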
Create table #t( Name varchar(200))
Insert into #t
values
('[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]'),
('[qwe34||asdf||qwe||CONNECTIVITY||ghj]'),
('[ertgfy||fgv||yuhjj||ACCESS||rty]'),
('[tyhuj||rtg||qwert||ACCESS||TMW]')
Select * from #t
Select
name,
Right(LEFT(name,len(name)-6),charindex('||',reverse(LEFT(name,len(name)-7))))
From #t
1) Instead of trying to do such operations with those strings you could normalize database by designing and adding a new table. In this case, you would need a simple SELECT:
SELECT Column4
FROM dbo.Table;
2) Otherwise, one solution is to convert those strings into XML and to use nodes and value XML methods:
DECLARE @Source NVARCHAR(MAX);
SET @Source =
N'[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]
[qwe34||asdf||qwe||CONNECTIVITY||ghj]
[ertgfy||fgv||yuhjj||ACCESS||rty]
[tyhuj||rtg||qwert||ACCESS||TMW]';
DECLARE @EncodedSource NVARCHAR(MAX);
SET @EncodedSource = (SELECT @Source FOR XML PATH(''));
DECLARE @x XML;
SET @x = REPLACE(REPLACE(REPLACE(@EncodedSource, N'[', N'<row> <col>'), N']', N'</col> </row>'), N'||', N'</col> <col>');
SELECT r.XmlContent.value('(col[1]/text())[1]', 'NVARCHAR(100)') AS Col1,
r.XmlContent.value('(col[4]/text())[1]', 'NVARCHAR(100)') AS Col4
FROM @x.nodes('/row') r(XmlContent);
Note: you need to replace NVARCHAR(length) with the proper data type and max. length.
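To check which field the XML approach should return, the same split can be sketched in Python. This is only an illustration of "the value between the m-th and (m+1)-th delimiter"; `field_between` is a hypothetical helper, and it assumes every row is wrapped in square brackets as in the sample data.

```python
def field_between(text: str, m: int, sep: str = "||") -> str:
    """Return the field that follows the m-th occurrence of sep."""
    parts = text.strip("[]").split(sep)  # drop the surrounding brackets
    return parts[m]                      # m separators precede parts[m]


rows = [
    "[xxxxxxx||gh||vbh||CAPACITY_CPU||aed]",
    "[qwe34||asdf||qwe||CONNECTIVITY||ghj]",
    "[ertgfy||fgv||yuhjj||ACCESS||rty]",
    "[tyhuj||rtg||qwert||ACCESS||TMW]",
]
for row in rows:
    print(field_between(row, 3))
```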