Can the Select list in a SQL Statement use Regular Expressions?

I have a SQL statement:
select ColumnName from Table
And I get this result:
Error 192.168.1.67 UserName 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing ....
So anyway, the field has a lot of stuff in it; I just want to get out the 'UserName'.
Can I use a regex for that?
I mean it would be something like this:
select SUBSTRING(ColumnName, 0, 5) from Table
Except the SUBSTRING would be replaced with a regex of some kind. I am comfortable with regex, but I am not sure how to apply it in this case, or even if you can.
If I could get this working it would be great, because I plan to pull the data into a temporary table and do some fairly complicated things, matching it with other tables etc. It would save me writing a C# app to do it.
Thanks.

No; out of the box, SQL Server doesn't support regular expressions.
You could retrofit those by means of a SQL-CLR assembly that you deploy into SQL Server.
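For reference, the pattern such a CLR function would wrap is short. Here is a minimal sketch in Python (used only to keep the example self-contained; it is not T-SQL or SQL-CLR code), assuming the username is always the third space-separated token, as in the sample row. The names `USERNAME_RE` and `extract_username` are hypothetical.

```python
import re

# Assumption: the username is the third whitespace-separated token,
# as in the sample row "Error 192.168.1.67 UserName <guid> ...".
USERNAME_RE = re.compile(r"^\S+\s+\S+\s+(\S+)")


def extract_username(value):
    """Return the third token of value, or None if there are fewer than three."""
    if value is None:
        return None  # behave like SQL NULL in, NULL out
    match = USERNAME_RE.match(value)
    return match.group(1) if match else None


row = "Error 192.168.1.67 UserName 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing ...."
print(extract_username(row))  # UserName
```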

I think you should use SUBSTRING anyway. Regular expressions are more flexible, but they also carry a large processing overhead, which becomes even worse if you have to process large recordsets.
You have to decide whether you need that flexibility in the first place.
If so, you should read about it here:
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
Using only T-SQL, it could look like this:
SELECT 'Error 192.168.1.67 XUserNameX 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing' expr
INTO log_table
GO
WITH
split1 (expr, cstart, cend)
AS (
SELECT
expr, 1, 0
FROM
log_table a
), split2 (expr, cstart, cend, div)
AS (
SELECT
a.expr, a.cend + 1, CHARINDEX(' ', a.expr, a.cend + 1), 1
FROM
split1 a
UNION ALL
SELECT
a.expr, a.cend + 1, CHARINDEX(' ', a.expr, a.cend + 1), div+1
FROM
split2 a
WHERE
a.cend > 1
), substrings(expr, div)
AS (
SELECT
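-- cend = 0 means there is no further space, so the last word runs to the end of the string
SUBSTRING(expr, cstart, CASE WHEN cend = 0 THEN LEN(expr) - cstart + 1 ELSE cend - cstart END), div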
FROM
split2
)
SELECT expr from
substrings a
where
a.div = 3
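The recursive CTE above walks the string one space at a time: each step starts just after the previous space (cstart = cend + 1), finds the next one with CHARINDEX, and div counts the words. Below is a rough Python model of that walk for strings that do not start with a space, not a replacement for the T-SQL; the helper names `charindex` and `split_words` are hypothetical, and `charindex` only imitates CHARINDEX's 1-based positions and its 0-for-not-found result.

```python
def charindex(needle, haystack, start=1):
    """Imitate T-SQL CHARINDEX: 1-based position of needle at or after start, 0 if absent."""
    return haystack.find(needle, start - 1) + 1


def split_words(expr):
    """Yield (div, word) pairs, walking from one space to the next like split2/substrings."""
    cend = 0  # split1 seeds the walk with cstart = 1, cend = 0
    div = 0
    while True:
        cstart = cend + 1
        cend = charindex(" ", expr, cstart)
        div += 1
        if cend == 0:
            # No further space: the last word runs to the end of the string.
            yield div, expr[cstart - 1:]
            return
        yield div, expr[cstart - 1:cend - 1]


s = "Error 192.168.1.67 XUserNameX 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing"
print(dict(split_words(s))[3])  # XUserNameX
```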

UPDATE
"we cannot tell where the start of the username is. Unless we can say 'find me the start character after the second space'"
That is fairly straightforward:
1. Filter out strings that have fewer than two spaces (equivalently, keep only strings with three or more words);
2. Find the position after the first space (i.e. the beginning of the second word);
3. Find the position after the second space, that is, after the first space following the beginning of the second word (i.e. the beginning of the third word);
4. Determine the length of the third word using the position of the next space (or the end of the string if there are only three words);
5. Use the above values with the SUBSTRING() function to return the third word.
Example:
WITH MyTable (ColumnName)
AS
(
SELECT NULL
UNION ALL
SELECT ''
UNION ALL
SELECT 'One.'
UNION ALL
SELECT 'Two words.'
UNION ALL
SELECT 'Three word sentence.'
UNION ALL
SELECT 'Sentence containing four words.'
UNION ALL
SELECT 'Five words in this sentence.'
UNION ALL
SELECT 'Sentence containing more than five words.'
),
AtLeastThreeWords (ColumnName, pos_word_2_start)
AS
(
SELECT M1.ColumnName, CHARINDEX(' ', M1.ColumnName) + LEN(' ') + 1 -- LEN(' ') is 0 because LEN ignores trailing spaces, so this is the first space's position + 1
FROM MyTable AS M1
WHERE LEN(M1.ColumnName) - LEN(REPLACE(M1.ColumnName, ' ', '')) >= 2
),
MyTable2 (ColumnName, pos_word_3_start)
AS
(
SELECT M1.ColumnName,
CHARINDEX(' ', M1.ColumnName, pos_word_2_start) + LEN(' ') + 1
FROM AtLeastThreeWords AS M1
),
MyTable3 (ColumnName, pos_word_3_start, pos_word_3_end)
AS
(
SELECT M1.ColumnName, M1.pos_word_3_start,
CHARINDEX(' ', M1.ColumnName, pos_word_3_start) + LEN(' ')
FROM MyTable2 AS M1
),
MyTable4 (ColumnName, pos_word_3_start, word_3_length)
AS
(
SELECT M1.ColumnName, M1.pos_word_3_start,
CASE
WHEN pos_word_3_start < pos_word_3_end
THEN pos_word_3_end - pos_word_3_start
ELSE LEN(M1.ColumnName) - pos_word_3_start + 1
END
FROM MyTable3 AS M1
)
SELECT M1.ColumnName,
SUBSTRING(M1.ColumnName, pos_word_3_start, word_3_length)
AS word_3
FROM MyTable4 AS M1;
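The position arithmetic in these CTEs can be checked outside SQL Server. This is a minimal Python model of the AtLeastThreeWords through MyTable4 steps, assuming strings without trailing spaces (T-SQL's LEN ignores them, Python's len does not); `charindex` and `third_word` are hypothetical helper names that mimic CHARINDEX and the final SUBSTRING.

```python
def charindex(needle, haystack, start=1):
    """Imitate T-SQL CHARINDEX: 1-based position, 0 if not found."""
    return haystack.find(needle, start - 1) + 1


def third_word(s):
    """Return the third space-separated word, or None, following the CTE steps."""
    # AtLeastThreeWords: NULL and strings with fewer than two spaces drop out.
    if s is None or s.count(" ") < 2:
        return None
    pos_word_2_start = charindex(" ", s) + 1
    pos_word_3_start = charindex(" ", s, pos_word_2_start) + 1
    pos_word_3_end = charindex(" ", s, pos_word_3_start)  # 0 when word 3 is the last word
    if pos_word_3_start < pos_word_3_end:
        word_3_length = pos_word_3_end - pos_word_3_start
    else:
        word_3_length = len(s) - pos_word_3_start + 1
    # SUBSTRING(s, start, length) with a 1-based start
    return s[pos_word_3_start - 1:pos_word_3_start - 1 + word_3_length]


for text in [None, "", "Two words.", "Three word sentence.", "Sentence containing four words."]:
    print(repr(text), "->", repr(third_word(text)))
```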
ORIGINAL ANSWER:
Is the problem that the position and/or length of the username value may not be constant in the data, but it always follows the string 'username '? If so, you can use CHARINDEX with SUBSTRING, e.g.:
WITH MyTable (ColumnName)
AS
(
SELECT 'Error 192.168.1.67 UserName 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing ....'
UNION ALL
SELECT 'Username onedaywhen is invalid'
),
MyTable1 (ColumnName, pos1)
AS
(
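SELECT M1.ColumnName, CHARINDEX('UserName ', M1.ColumnName) + LEN('UserName ') + 1 -- LEN ignores the trailing space (LEN('UserName ') is 8), so this is the first character after 'UserName '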
FROM MyTable AS M1
),
MyTable2 (ColumnName, pos1, pos2)
AS
(
SELECT M1.ColumnName, M1.pos1,
CHARINDEX(' ', M1.ColumnName, pos1) - M1.pos1
FROM MyTable1 AS M1
)
SELECT SUBSTRING(M1.ColumnName, M1.pos1, M1.pos2)
FROM MyTable2 AS M1;
...though you'd need to make it more robust, e.g. to handle a username value with no trailing space after it.
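As a sketch of that extra robustness, here is the same idea in Python: find the label case-insensitively (assuming a case-insensitive collation, which is why 'Username' matches 'UserName ' in the sample), then read up to the next space or, when there is none, to the end of the string. `value_after_label` is a hypothetical name, and ASCII text is assumed.

```python
def value_after_label(s, label="UserName "):
    """Return the token after label, or None when label does not occur.

    Assumptions: case-insensitive matching, and ASCII text so lower() keeps the length.
    """
    start = s.lower().find(label.lower())
    if start == -1:
        # In the T-SQL version CHARINDEX would return 0 and the start position would be meaningless.
        return None
    start += len(label)
    end = s.find(" ", start)
    # No trailing space: take everything up to the end instead of failing.
    return s[start:] if end == -1 else s[start:end]


print(value_after_label("Username onedaywhen is invalid"))  # onedaywhen
print(value_after_label("Username onedaywhen"))  # onedaywhen
```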

Related

SQL - Remove Duplicate value between two columns

I'm looking a simple way to remove an unwanted Duplicate value.
The Dupe is part of a reference to another column, and not within the column itself, but the column I want to remove the dupe value from is multi-delimited with other values.
Here is an example table:
ID,Thing
Dog,Cat;Dog;Bird
Snake,Horse;Fish;Snake
Car,Car;Bus;Bike
As you can see Dog,Snake,Car are the values I need to remove from the Thing column.
Output:
ID,Thing
Dog,Cat;Bird
Snake,Horse;Fish
Car,Bus;Bike
Is there a way to match within a multidelimited field and pull out the exact match?
I'm using SQL Server MGMT studio. Thanks.
WITH CTE AS
(
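SELECT ID, Thing, ROW_NUMBER() OVER (PARTITION BY Thing ORDER BY ID) AS rn
FROM tablename -- tablename is a placeholder for your table
)
DELETE
FROM CTE
WHERE rn > 1 -- removes rows that repeat an existing Thing value; values inside Thing are not edited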
I believe this will do it. Test first by running just the CTE part of the query so you can see what rn is.
Your question and sample data are not very clear. I think what you want is to remove anything from the second column that is in the first column, in which case you can try using replace:
select Id,
replace(replace(thing,id,''),';;',';') as thing -- caveat: leaves a leading or trailing ';' when id is the first or last item
from tablename
Storing multi-value elements in a column is never a good idea; it conflicts with the relational data model and pretty much always causes problems at some point.
What you can do is add a leading and a trailing ; to the value of Thing and then replace ';' + ID + ';' with a single ;.
Then remove the leading and trailing ;.
If your version of SQL Server is 2017+, you can use the function TRIM():
SELECT Id,
TRIM(';' FROM REPLACE(';' + Thing + ';', ';' + ID + ';', ';')) Thing
from tablename;
For previous versions use SUBSTRING():
SELECT Id,
SUBSTRING(
REPLACE(';' + Thing + ';', ';' + ID + ';', ';'),
2,
LEN(REPLACE(';' + Thing + ';', ';' + ID + ';', ';')) - 2
) Thing
from tablename;
If you want to update the table:
UPDATE tablename
SET Thing = TRIM(';' FROM REPLACE(';' + Thing + ';', ';' + ID + ';', ';'));
or:
UPDATE tablename
SET Thing = SUBSTRING(
REPLACE(';' + Thing + ';', ';' + ID + ';', ';'),
2,
LEN(REPLACE(';' + Thing + ';', ';' + ID + ';', ';')) - 2
);
See the demo.
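The wrap, replace, trim idea is easy to check in isolation. A minimal Python sketch of the same three steps (`remove_id` is a hypothetical name; note that Python's str.replace is case-sensitive, whereas T-SQL's REPLACE follows the column's collation):

```python
def remove_id(thing, id_):
    """Mirror TRIM(';' FROM REPLACE(';' + Thing + ';', ';' + ID + ';', ';'))."""
    wrapped = ";" + thing + ";"                       # e.g. ';Cat;Dog;Bird;'
    replaced = wrapped.replace(";" + id_ + ";", ";")  # e.g. ';Cat;Bird;'
    return replaced.strip(";")                        # e.g. 'Cat;Bird'


rows = [("Dog", "Cat;Dog;Bird"), ("Snake", "Horse;Fish;Snake"), ("Car", "Car;Bus;Bike")]
for id_, thing in rows:
    print(id_, remove_id(thing, id_))
```

Wrapping with ; first is what prevents partial matches: removing Cat from Catfish;Cat leaves Catfish intact.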
I don't really understand what "multi-delimited" means with respect to a string. In your context it seems to suggest that you might have different types of delimiters. It definitely does mean that you have a really poor data model. If you want to remove the id from the things column, then my first suggestion is to fix the delimiters.
In SQL Server, you could use:
select t.*,
(select string_agg(s.value, ';')
from string_split(replace(t.things, ',', ';'), ';') s
where s.value <> t.id
) as new_things
from t;
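The same split, filter, rejoin logic as a Python sketch (`remove_id_split` is a hypothetical name). Two differences to keep in mind: Python keeps the original order, whereas string_agg over string_split output does not guarantee it, and string_agg returns NULL when every item is filtered out, while this sketch returns an empty string.

```python
def remove_id_split(things, id_):
    """Normalize ',' to ';', split, drop items equal to id_, and rejoin with ';'."""
    items = things.replace(",", ";").split(";")
    return ";".join(item for item in items if item != id_)


print(remove_id_split("Cat;Dog;Bird", "Dog"))
print(remove_id_split("Horse,Fish;Snake", "Snake"))
```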
If the delimiters have some intrinsic meaning (did I mention that you should fix the data model?), then you can use a more brute-force approach. Here is one method:
select t.*,
(case when things = id then ''
when things like concat(id, '[,;]%')
then stuff(things, 1, len(id) + 1, '')
when things like concat('%[,;]', id)
then left(things, len(things) - len(id) - 1)
when things like concat('%[,;]', id, '[,;]%')
then stuff(things, patindex(concat('%[,;]', id, '[,;]%'), things), len(id) + 1, '')
else things
end)
from t;
Here is a db<>fiddle.
Your question is a good one. I used a simple CASE expression to get the answer. CHARINDEX finds where the value of id occurs in things, and depending on that position (at the start, in the middle, or at the end), the id and its adjacent ; are replaced with an empty string.
--Preparing the table
SELECT *
INTO t
FROM (VALUES
('Dog', 'Cat;Dog;Bird'),
('Snake', 'Horse;Fish;Snake'),
('Car', 'Car;Bus;Bike')
) v(id, things)
--Query
SELECT id
,CASE WHEN CHARINDEX(reverse(id), reverse(things), 1) = 1 THEN REPLACE(things,';'+id ,'')
WHEN CHARINDEX(id, things, 1) < LEN(things) AND CHARINDEX(id, things, 1) > 1 THEN REPLACE(things, id +';' ,'')
WHEN CHARINDEX(id, things, 1) = 1 THEN REPLACE(things, id +';' ,'')
ELSE things -- id not found: leave things unchanged
END as [things]
FROM t

SQL: select the last values before a space in a string

I have a set of strings like this:
CAP BCP0018 36
MFP ACZZ1BD 265
LZP FEI-12 3
I need to extract only the last value, i.e. what comes after the last space, like:
36
265
3
What would the SELECT statement look like? I tried the statement below, but it did not work:
select CHARINDEX(myField, ' ', -1)
FROM myTable;
Perhaps the simplest method in SQL Server is:
select t.*, v.value
from t cross apply
(select top (1) value
from string_split(t.col, ' ')
where t.col like concat('% ', value)
) v;
This is perhaps not the most performant method. You would probably use:
select right(t.col, charindex(' ', reverse(t.col)) - 1)
from t;
Note: if there are no spaces, use this instead to prevent an error:
select right(t.col, charindex(' ', reverse(t.col) + ' ') - 1)
from t;
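The RIGHT/REVERSE/CHARINDEX expression can be modeled in Python to see how the appended space covers the no-space case (`last_token` is a hypothetical name; in Python itself, col.rsplit(' ', 1)[-1] does the same job):

```python
def last_token(col):
    """Model of RIGHT(col, CHARINDEX(' ', REVERSE(col) + ' ') - 1)."""
    pos = (col[::-1] + " ").find(" ") + 1  # CHARINDEX is 1-based
    n = pos - 1                            # number of characters after the last space
    # RIGHT(col, 0) is '', but col[-0:] would be the whole string, hence the guard.
    return col[-n:] if n > 0 else ""


for col in ["CAP BCP0018 36", "MFP ACZZ1BD 265", "LZP FEI-12 3", "NOSPACES"]:
    print(last_token(col))
```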
Since you mentioned CHARINDEX() in your question, I am assuming you are using SQL Server.
Try the following:
declare @table table(col varchar(100))
insert into @table values('CAP BCP0018 36')
insert into @table values('MFP ACZZ1BD 265')
insert into @table values('LZP FE-12 3')
SELECT REVERSE(LEFT(REVERSE(col),CHARINDEX(' ',REVERSE(col)) - 1)) FROM @table
Functions used
CHARINDEX ( expressionToFind , expressionToSearch ) : returns the position of the FIRST occurrence of an expression inside another expression.
LEFT ( character_expression , integer_expression ) : Returns the left part of a character string with the specified number of characters.
REVERSE ( string_expression ) : Returns the reverse order of a string value

SQL Server: get the last 4 characters of the first column and the first letter of each word in the second column, ignoring non-alphabetic characters

Is it possible to do this in SQL? I am trying to derive a value for a new column from two existing columns, ID and Description:
For ID, simply retrieve the last 4 characters.
For Description, I would like to get the first letter of each word but ignore numbers and symbols.
SQL Server has lousy string processing capabilities. Even string_split() doesn't guarantee the order of the words that it returns.
One approach to this uses a recursive CTE to split the strings and accumulate the initials:
with t as (
select v.*
from (values (2004120, 'soccer field 2010'), (2004121, 'ruby field')) v(id, description)
),
cte as (
select id, description, convert(varchar(max), left(description, charindex(' ', description + ' '))) as word,
convert(varchar(max), stuff(description, 1, charindex(' ', description + ' ') , '')) as rest,
1 as lev,
(case when description like '[a-zA-Z]%' then convert(varchar(max), left(description, 1)) else '' end) as inits
from t
union all
select id, description, convert(varchar(max), left(rest, charindex(' ', rest + ' '))) as word,
convert(varchar(max), stuff(rest, 1, charindex(' ', rest + ' ') , '')) as rest,
lev + 1,
(case when rest like '[a-zA-Z]%' then convert(varchar(max), inits + left(rest, 1)) else inits end) as inits
from cte
where rest > ''
)
select id, description, inits + right(id, 4)
from (select cte.*, max(lev) over (partition by id) as max_lev
from cte
) cte
where lev = max_lev;
Here is a db<>fiddle.
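The per-word logic of the recursive CTE (take the first character of each word only when it is a letter, then append the last four characters of the id) looks like this as a Python sketch. `make_code` is a hypothetical name, and the letter test is limited to ASCII to approximate the '[a-zA-Z]%' pattern, whose exact T-SQL behavior depends on the collation.

```python
def make_code(id_, description):
    """Initials of the words that start with an ASCII letter, then the last 4 chars of id_."""
    initials = "".join(
        word[0]
        for word in description.split(" ")
        if word and word[0].isascii() and word[0].isalpha()  # like '[a-zA-Z]%'
    )
    return initials + str(id_)[-4:]  # like RIGHT(id, 4)


print(make_code(2004120, "soccer field 2010"))
print(make_code(2004121, "ruby field"))
```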
To get the last 4 digits of a numeric ID you could use the following (the result is a number, so leading zeros are dropped: 2000012 gives 12, not 0012):
SELECT Id%10000 as New_Id from Tablename;
To get the first letter you could use the following (call the result String2):
LEFT(Description,1)
This is equivalent to using SUBSTRING(Description,1,1).
Note that this returns only the first letter of the whole description, not of each word.
To concatenate both of them you could use the CONCAT function:
SELECT CONCAT(String2,New_Id)
See more on the CONCAT function here

SQL SERVER select string from right after a certain character

I have a bit of a problem with a SQL SELECT statement.
I have a column value that looks like this:
2>4>5 or
28>30>52 or
300>410>500 or
2>4>5>8
My question is: how can I get the value just before the last > character (the second value from the right), so that for the values above the SELECT statement returns:
4
30
410
5
Thanks in advance
If you need the second value from the right, then in MySQL you can try:
SELECT SUBSTRING_INDEX( SUBSTRING_INDEX(your_column, '>', -2), '>', 1);
EDIT
One solution for sql server:
DECLARE @str varchar(max);
set @str = '2>4>5>8';
SELECT reverse( substring(
substring( reverse(@str), charindex( '>', reverse(@str) )+1, len(@str) ), 0,
charindex( '>', substring( reverse(@str), charindex( '>', reverse(@str) )+1, len(@str) ) )
) );
This is similar to extracting the n-th element from a delimited string. The only difference is that in this case we want the n-th-to-last element. The change can be achieved with a double use of reverse. Assuming the table is MyTable and the field is MyColumn, here's one way:
SELECT
Reverse(
CAST('<x>' + REPLACE(Reverse(MyColumn),'>','</x><x>') + '</x>' AS XML).value('/x[2]', --x[2] because it's the second element in the reversed string
'varchar(5)' --Use something long enough to catch any number which might occur here
))
FROM
MyTable
With credit to @Shnugo for his efforts here: Using T-SQL, return nth delimited element from a string
You can't cast as an int where I've put varchar(5), since at that stage the strings are still reversed. If you need to convert to an integer, do that by wrapping a convert/cast around the outside.
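The double-reverse trick can be sketched in Python to make the idea concrete: reversing the whole string turns "n-th from the last" into plain "n-th", and the chosen element is reversed back afterwards. `nth_from_last` is a hypothetical name, and the approach assumes a single-character separator such as >.

```python
def nth_from_last(value, n=2, sep=">"):
    """Return the n-th element counting from the right, or None if there are fewer."""
    reversed_parts = value[::-1].split(sep)  # like REVERSE, then one <x> node per element
    if len(reversed_parts) < n:
        return None                          # like /x[n] matching nothing
    return reversed_parts[n - 1][::-1]       # pick x[n] and reverse it back


for value in ["2>4>5", "28>30>52", "300>410>500", "2>4>5>8"]:
    print(nth_from_last(value))
```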
;WITH cte1(Value)
AS
(
SELECT '2>4>5' Union all
SELECT '28>30>52' Union all
SELECT '300>410>500' Union all
SELECT '2>4>5>8'
)
SELECT
SUBSTRING(
(
REVERSE(SUBSTRING(((REVERSE((SUBSTRING(Value, RIGHT(CHARINDEX('>', Value), Len(Value)) + 1, Len(Value)))))),
CHARINDEX('>',((REVERSE((SUBSTRING(Value, RIGHT(CHARINDEX('>', Value), Len(Value)) + 1, Len(Value)))))))+1,LEN(Value)))
),CHARINDEX('>',(
REVERSE(SUBSTRING(((REVERSE((SUBSTRING(Value, RIGHT(CHARINDEX('>', Value), Len(Value)) + 1, Len(Value)))))),
CHARINDEX('>',((REVERSE((SUBSTRING(Value, RIGHT(CHARINDEX('>', Value), Len(Value)) + 1, Len(Value)))))))+1,LEN(Value)))
))+1,LEN(Value))
AS ExpectedValue
FROM cte1

How to get middle portion from Sql server table data?

I am trying to get the first name from the employee table. In the employee table, full_name looks like this: Dow, Mike P.
I tried to get the first name using the syntax below, but it includes the middle initial. How can I remove the middle initial from the first name, if there is one? Not all names contain a middle initial.
-- query--
select Employee_First_Name as full_name,
SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
len(Employee_First_Name)) AS FirstName
---> remove middle initial from right side from employee
-- result
Full_name     Firstname
Dow,Mike P.   Mike P.
-- a few examples of Full_name data --
smith,joe j. --->joe (need result as)
smith,alan ---->alan (need result as)
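Instead of specifying the len, you need to use charindex again, this time to find the first space after the comma. Note that the third argument of CHARINDEX is the position to start searching from, not an occurrence number, and that SUBSTRING takes a length, so the comma's position has to be subtracted:
select Employee_First_Name as full_name,
LTRIM(SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name) + 2) - CHARINDEX(',', Employee_First_Name) - 1)) AS FirstName
One thing to note: the second charindex returns 0 if there is no space after the first name (e.g. smith,alan), which makes the length negative and SUBSTRING raise an error. In that case, you would want to use something like the following:
select Employee_First_Name as full_name,
LTRIM(SUBSTRING(
Employee_First_Name,
CHARINDEX(',', Employee_First_Name) + 1,
IIF(CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name) + 2) = 0,
Len(Employee_First_Name),
CHARINDEX(' ', Employee_First_Name, CHARINDEX(',', Employee_First_Name) + 2) - CHARINDEX(',', Employee_First_Name) - 1))) AS FirstName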
This removes the portion before the comma, then takes that string and removes everything from the first space onward.
WITH cte AS (
SELECT *
FROM (VALUES('smith,joe j.'),('smith,alan'),('joe smith')) t(fullname)
)
SELECT
SUBSTRING(
LTRIM(SUBSTRING(fullname,CHARINDEX(',',fullname) + 1,LEN(fullname))),
0,
COALESCE(NULLIF(CHARINDEX(' ',LTRIM(SUBSTRING(fullname,CHARINDEX(',',fullname) + 1,LEN(fullname)))),0),LEN(fullname)))
FROM cte
output
------
joe
alan
joe
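The same three steps (drop everything up to the comma, trim leading spaces, cut at the first space) as a Python sketch, handy for checking edge cases; `first_name` is a hypothetical name, and find() returning -1 when there is no comma plays the role of CHARINDEX returning 0.

```python
def first_name(full_name):
    """Drop everything up to the first comma, trim leading spaces, cut at the next space."""
    # find() returns -1 when there is no comma, so the slice starts at 0 (whole string),
    # much as CHARINDEX returning 0 makes SUBSTRING start at position 1.
    after_comma = full_name[full_name.find(",") + 1:]
    trimmed = after_comma.lstrip(" ")  # like LTRIM
    return trimmed.split(" ", 1)[0]    # everything before the first space, if any


for name in ["smith,joe j.", "smith,alan", "joe smith", "Dow, Mike P."]:
    print(first_name(name))
```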
To be honest, this is most easily expressed using multiple levels of logic. One way is using outer apply:
select ttt.firstname
from t outer apply
(select substring(t.full_name, charindex(', ', t.full_name) + 2, len(t.full_name)) as firstmi
) tt outer apply
(select (case when tt.firstmi like '% %'
then left(tt.firstmi, charindex(' ', tt.firstmi) - 1)
else tt.firstmi
end) as firstname
) as ttt
If you want to put this all in one complicated statement, I would suggest a computed column:
alter table t
add firstname as (stuff((case when full_name like '%, % %.'
then left(full_name,
charindex(' ', full_name, charindex(', ', full_name) + 2) - 1
)
else full_name
end),
1,
charindex(', ', full_name) + 1,
''))
If the format of the full_name field is the same for all rows, you can use the power of the SQL Server full-text search (FTS) word breaker for this task:
SELECT N'Dow, Mike P.' AS full_name INTO #t
SELECT display_term FROM #t
CROSS APPLY sys.dm_fts_parser(N'"' + full_name + N'"', 1033, NULL, 1) p
WHERE occurrence = 2
DROP TABLE #t