regexReplace using select - sql

I want to use a regular expression to remove special characters (!, ", #, $, %, &, /, (, ), =, ?, |) from values in a table:
SELECT
'|R!$#&2-_D%2' as Original,
UPPER
(
REPLACE
(
( MDS_Demo.mdq.regexReplace
('|R!2- _D%2',
'[!|”#$%&/()=?»«;,:._]', '', 0
)
)
, ' ', ' '
)
) as Correct
The list of characters and words to remove is stored in a table, so I wanted to replace the hard-coded character list in the expression with a SELECT against the table that lists all the special characters to remove.
SELECT
'|R!$#&2-_D%2' as Original,
UPPER(REPLACE((MDS_Demo.mdq.regexReplace('|R!2- _D%2',
< SELECT SPECIAL_CHARACTERS FROM TABLE01 >
, '', 0)), ' ', ' ') ) as Correct
Any suggestions?

I believe you can replace any string expression with a parenthesized (SELECT ...), as long as the subquery returns a single value.
For example, SELECT ltrim( (SELECT ' trimmed') ) as test works here:
http://sqlfiddle.com/#!6/8222f/4
So where you have your < SELECT SPECIAL_CHARACTERS FROM TABLE01 >, just put the required SELECT inside parentheses and you should be good to go.
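A minimal sketch of the scalar-subquery idea, not the asker's actual schema: the table variable @Patterns and its single row are made up for illustration, and the built-in REPLACE stands in for mdq.regexReplace so the snippet runs without MDS installed. TOP (1) is there only to guarantee the subquery yields exactly one value.

```sql
-- Hypothetical stand-in for TABLE01: one row holding the text to strip.
DECLARE @Patterns TABLE (SPECIAL_CHARACTERS varchar(100));
INSERT INTO @Patterns (SPECIAL_CHARACTERS) VALUES ('!');

-- A scalar subquery in parentheses is accepted where a string expression is expected.
SELECT
    'R!2!D' AS Original,
    REPLACE('R!2!D',
            (SELECT TOP (1) SPECIAL_CHARACTERS FROM @Patterns),
            '') AS Correct;
```

The same parenthesized subquery should be usable as the pattern argument of the regexReplace call, assuming TABLE01 stores the complete pattern in a single row. If it stores one character per row, the rows would first have to be combined into one pattern string.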

Related

SQL: select the last values before a space in a string

I have a set of strings like this:
CAP BCP0018 36
MFP ACZZ1BD 265
LZP FEI-12 3
I need to extract only the last values from the right and before the space, like:
36
265
3
What would the SELECT statement look like? I tried the statement below, but it did not work.
select CHARINDEX(myField, ' ', -1)
FROM myTable;
Perhaps the simplest method in SQL Server is:
select t.*, v.value
from t cross apply
(select top (1) value
from string_split(t.col, ' ')
where t.col like concat('% ', value)
) v;
This is perhaps not the most performant method. You probably would use:
select right(t.col, charindex(' ', reverse(t.col)) - 1)
Note: If there are no spaces, then to prevent an error:
select right(t.col, charindex(' ', reverse(t.col) + ' ') - 1)
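To see how the guarded expression behaves, here is a small self-contained check. The inline VALUES rows are sample data made up for illustration (one taken from the question, one with no space at all); they are not the asker's table.

```sql
-- Appending ' ' to the reversed string guarantees CHARINDEX finds a space,
-- so a value without spaces is returned whole instead of raising an error.
SELECT v.col,
       RIGHT(v.col, CHARINDEX(' ', REVERSE(v.col) + ' ') - 1) AS last_token
FROM (VALUES
        ('CAP BCP0018 36'),   -- last_token: 36
        ('NOSPACES')          -- last_token: NOSPACES
     ) AS v(col);
```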
Since you have mentioned CHARINDEX() in question, I am assuming you are using SQL Server.
Try below
declare @table table(col varchar(100))
insert into @table values('CAP BCP0018 36')
insert into @table values('MFP ACZZ1BD 265')
insert into @table values('LZP FE-12 3')
SELECT REVERSE(LEFT(REVERSE(col),CHARINDEX(' ',REVERSE(col)) - 1)) FROM @table
Functions used
CHARINDEX ( expressionToFind , expressionToSearch ) : returns the position of the FIRST occurrence of an expression inside another expression.
LEFT ( character_expression , integer_expression ) : Returns the left part of a character string with the specified number of characters.
REVERSE ( string_expression ) : Returns the reverse order of a string value

Query SQL with similar values

I have to query a database using a string like 12345678 as the comparison value, but the stored value is formatted like 12.345.678. If I run the following query, it does not return anything.
SELECT * FROM TABLA WHERE CAMPO = '12345678'
Here CAMPO holds the value 12.345.678. If I replace = with LIKE, it does not return the data either:
SELECT * FROM TABLA WHERE CAMPO like '12345678%'
SELECT * FROM TABLA WHERE CAMPO like '%12345678'
SELECT * FROM TABLA WHERE CAMPO like '%12345678%'
None of the three queries above works for me. How can I write this query?
The value can have 7, 8 or 9 digits, and the dots always fall every three digits, counting from the end toward the beginning.
Use the REPLACE() function to remove all the dots '.', as in:
SELECT *
FROM(
VALUES ('12.345.678'),
('23.456.789')
) T(CAMPO)
WHERE REPLACE(CAMPO, '.', '') = '12345678';
Your query should be
SELECT * FROM TABLA WHERE REPLACE(CAMPO, '.', '') = '12345678';
You can compare the string without the dots to REPLACE(StringWithDots, '.', '').
I recommend converting the value to a numeric type,
so you can use the < and > operators and all the functions that require a number.
The best way to achieve this is to remove any unnecessary dots and convert the commas to dots, like this:
CONVERT(NUMERIC(10, 2),
REPLACE(
REPLACE('7.000,45', '.', ''),
',', '.'
)
)
I hope this will help you out.
A SARGABLE solution would be to write a function that takes your target value ('12345678') and inserts the separators ('.') every third character from right to left. The result ('12.345.678') can then be used in a where clause and benefit from an index on CAMPO.
The following code demonstrates an approach without creating a user-defined function (UDF). Instead, a recursive common table expression (CTE) is used to process the input string three characters at a time to build the dotted target string. The result is used in a query against a sample table.
To see the results from the recursive CTE replace the final select statement with the commented select immediately above it.
-- Sample data.
declare @Samples as Table ( SampleId Int Identity, DottedDigits VarChar(20) );
insert into @Samples ( DottedDigits ) values
( '1' ), ( '12' ), ( '123' ), ( '1.234' ), ( '12.345' ),
( '123.456' ), ( '1.234.567' ), ( '12.345.678' ), ( '123.456.789' );
select * from @Samples;
-- Query the data.
declare @Target as VarChar(15) = '12345678';
with
Target as (
-- Get the first group of up to three characters from the tail of the string ...
select
Cast( Right( @Target, 3 ) as VarChar(20) ) as TargetString,
Cast( Left( @Target, case when Len( @Target ) > 3 then Len( @Target ) - 3 else 0 end ) as VarChar(20) ) as Remainder
union all
-- ... and concatenate the next group with a dot in between.
select
Cast( Right( Remainder, 3 ) + '.' + TargetString as VarChar(20) ),
Cast( Left( Remainder, case when Len( Remainder ) > 3 then Len( Remainder ) - 3 else 0 end ) as VarChar(20) )
from Target
where Remainder != ''
)
-- To see the intermediate results replace the final select with the line commented out below:
--select TargetString from Target;
select SampleId, DottedDigits
from @Samples
where DottedDigits = ( select TargetString from Target where Remainder = '' );
An alternative approach would be to add an indexed computed column to the table that contains Replace( CAMPO, '.', '' ).
If the table containing IDs like 12.345.678 is big (contains many records), I would add a computed column that removes the dots, persist it, and put an index on it. If the ID never contains any characters other than digits and dots and has no significant leading zeros, you can also cast it to INT or BIGINT. That way you lose a little time when inserting a record, but queries run at maximum speed and therefore save processor power.
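A rough sketch of the computed-column idea follows. The table name dbo.TablaDemo, the column name CAMPO_DIGITS, the index name and the varchar(20) length are all assumptions chosen for illustration; only CAMPO comes from the question. The CAST keeps the computed column narrow enough to serve as an index key.

```sql
-- Hypothetical demo table; in practice the column would be added to the real table.
CREATE TABLE dbo.TablaDemo (
    CAMPO        varchar(20) NOT NULL,
    -- REPLACE is deterministic, so the result can be persisted and indexed.
    CAMPO_DIGITS AS CAST(REPLACE(CAMPO, '.', '') AS varchar(20)) PERSISTED
);

CREATE INDEX IX_TablaDemo_CampoDigits ON dbo.TablaDemo (CAMPO_DIGITS);

INSERT INTO dbo.TablaDemo (CAMPO) VALUES ('12.345.678'), ('23.456.789');

-- The predicate is on the indexed column, so the lookup can use an index seek.
SELECT CAMPO
FROM dbo.TablaDemo
WHERE CAMPO_DIGITS = '12345678';
```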

SQL Server extract integers from string using regular expression

I have a string (unc file path) that I need to extract some integers that will be embedded in the string in a semi-predictable way.
Example strings:
\\servername\folder1\FTP\folder2\512/862450_FileBundle.zip
--OR-- : \\servername\folder1\FTP\folder2\512\862450_FileBundle.zip
--OR-- : servername/folder1/FTP/folder2/512/862450_FileBundle.zip
The following regular expression will match any integer value that is bounded by a forward slash or backslash: (\/|\\)\d+(\/|\\)
So the REGEX above would match on "\512\", or "\512/", or "/512/" or even "/512\".
I have tried the following SQL and other variations without success:
DECLARE @testString varchar(100) = '\\servername\folder1\FTP\folder2\512/862450_FileBundle.zip'
SELECT PATINDEX('%(\/|\\)\d+(\/|\\)%', @testString)
I'm not terribly familiar with REGEX and SQL so I'm not even sure this is possible.
SQL Server's built-in pattern matching is not as powerful as regular expressions. You can search for the pattern:
[/\\][0-9]%[/\\]
That is, slash followed by a digit followed by any other string followed by a slash. This will match any characters after the first digit, but your examples have nothing of the form /1abc/.
If this is sufficient, then this does the trick:
select v.*,
left(v2.str2, patindex('%[/\\]%', v2.str2) - 1)
from (values ('\\servername\folder1\FTP\folder2\512/862450_FileBundle.zip')) v(str) cross apply
(values (stuff(v.str, 1, patindex('%[/\\][0-9]%[/\\]%', v.str), ''))) v2(str2)
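To get a feel for what that wildcard pattern does and does not match, a quick check like the following may help. The second and third rows are invented test strings, not paths from the question.

```sql
-- PATINDEX returns the 1-based position of the slash that starts the match,
-- or 0 when the pattern is not found.
SELECT v.str,
       PATINDEX('%[/\\][0-9]%[/\\]%', v.str) AS match_pos
FROM (VALUES
        ('\\servername\folder1\FTP\folder2\512/862450_FileBundle.zip'),
        ('/folder/1abc/file.zip'),       -- also matches: only the first character must be a digit
        ('servername/folder/file.zip')   -- no slash followed by a digit: returns 0
     ) AS v(str);
```

The second row shows the caveat mentioned above: anything may follow the first digit, so a stricter check on the extracted value would be needed if such paths can occur.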
Other than writing a UDF to loop through the characters, the only thing I can think of is brute force approach...
(The User Defined Function might be your least worst option.)
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=face1befe5e7c74f457846fc37eca649
SELECT
*,
SUBSTRING(test.unc_file_path, headMatch.pos+1, headMatch.chars)
FROM
test
OUTER APPLY
(
SELECT
MIN(pos), MIN(chars)
FROM
(
SELECT
PATINDEX('%' + head + body + tail + '%', test.unc_file_path) AS pos, chars
FROM
(
SELECT '\'
UNION ALL SELECT '/'
)
head(head)
CROSS JOIN
(
SELECT 1, '[0-9]'
UNION ALL SELECT 2, '[0-9][0-9]'
UNION ALL SELECT 3, '[0-9][0-9][0-9]'
UNION ALL SELECT 4, '[0-9][0-9][0-9][0-9]'
UNION ALL SELECT 5, '[0-9][0-9][0-9][0-9][0-9]'
)
body(chars, body)
CROSS JOIN
(
SELECT '\'
UNION ALL SELECT '/'
)
tail(tail)
)
match
WHERE
pos > 0
)
headMatch(pos, chars)

Possible to use select in an update clause

I am trying to update a table using a select like so. It does not work. Is this the correct method or do I have to put the result of the select into a temp table and update the table from that?
Update WaterRevPropInfo
Set StreetDir = Direction
where exists (SELECT StreetNum,
ISNULL
( LTRIM
( RIGHT
( RTRIM(StreetNum),
LEN
( StreetNum
) +
1 -
( PATINDEX --Identifies first instance of a numeric char
( '%[0-9]%',
StreetNum
) +
PATINDEX --Identifies first instance of a non-numeric char
( '%[^0-9]%',
SUBSTRING --that follows the first numeric char
( StreetNum,
PATINDEX
( '%[0-9]%',
StreetNum
),
LEN(StreetNum)
) + ' '
)
) +
1
)
),
' '
) AS 'Direction')
FROM WaterRevPropInfo
EXISTS returns TRUE if WaterRevPropInfo has at least one row, regardless of what you put in the SELECT list. I think you need to do something like this:
UPDATE WaterRevPropInfo
SET StreetDir = ISNULL(LTRIM(RIGHT(RTRIM(StreetNum), LEN(StreetNum) + 1 - (PATINDEX --Identifies first instance of a numeric char
('%[0-9]%', StreetNum) + PATINDEX --Identifies first instance of a non-numeric char
('%[^0-9]%', SUBSTRING --that follows the first numeric char
(StreetNum, PATINDEX ('%[0-9]%', StreetNum), LEN(StreetNum)) + ' ' ) ) + 1 ) ), StreetDir)
This assigns the result of your expression to StreetDir, unless that result is NULL, in which case StreetDir keeps its current value (it is reassigned to itself).
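The pattern (compute the value per row directly in SET, and fall back to the current value when nothing is found) can be tried on a small table variable. This is a simplified sketch: @Demo, its rows and the shorter parsing expression are assumptions for illustration and do not reproduce the full expression above.

```sql
DECLARE @Demo TABLE (StreetNum varchar(20), StreetDir varchar(10));
INSERT INTO @Demo (StreetNum, StreetDir) VALUES ('123 N', NULL), ('456', 'S');

-- Take whatever follows the leading digits; NULLIF turns an empty result into NULL
-- so that ISNULL keeps the existing StreetDir for that row.
UPDATE @Demo
SET StreetDir = ISNULL(
        NULLIF(LTRIM(SUBSTRING(StreetNum,
                               PATINDEX('%[^0-9]%', StreetNum + ' '),
                               LEN(StreetNum))), ''),
        StreetDir);

SELECT StreetNum, StreetDir FROM @Demo;   -- '123 N' gets N, '456' keeps S
```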

Can the Select list in a SQL Statement use Regular Expressions

I have a SQL statement,
select ColumnName from Table
And I get this result,
Error 192.168.1.67 UserName 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing ....
So anyway the field has a lot of stuff in it, I just want to get out the 'UserName'.
Can I use a regex for that?
I mean it would be kind of like this,
select SUBSTRING(ColumnName, 0, 5) from Table
Except the SUBSTRING would be replaced with a regex of some kind. I am comfortable with regex, but I am not sure how to apply it in this case, or even if you can.
If I could get this working it would be great because I plan to pull the data into a temporary table, and do some quite complicated things matching it with other tables etc. If I can get this all working it would save me writing a C# app to do it with.
Thanks.
No, out of the box, SQL Server doesn't support regexes.
You could retrofit them by means of a SQL-CLR assembly that you deploy into SQL Server.
I think you should use SUBSTRING anyway. Regular expressions are more flexible, but they also add processing overhead, which gets worse if you have to process large recordsets.
You have to decide whether you need that flexibility in the first place.
If so you should read about it here:
http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
A T-SQL-only approach can look like this:
SELECT 'Error 192.168.1.67 XUserNameX 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing' expr
INTO log_table
GO
WITH
split1 (expr, cstart, cend)
AS (
SELECT
expr, 1, 0
FROM
log_table a
), split2 (expr, cstart, cend, div)
AS (
SELECT
a.expr, a.cend + 1, CHARINDEX(' ', a.expr, a.cend + 1), 1
FROM
split1 a
UNION ALL
SELECT
a.expr, a.cend + 1, CHARINDEX(' ', a.expr, a.cend + 1), div+1
FROM
split2 a
WHERE
a.cend > 1
), substrings(expr, div)
AS (
SELECT
SUBSTRING(expr, cstart, cend - cstart), div
FROM
split2
)
SELECT expr from
substrings a
where
a.div = 3
UPDATE
we cannot tell where the start of the username is. Unless we can say 'find me the start character after the second space'
That is fairly straightforward:
Filter out strings that have fewer than two spaces (alternatively, keep only strings that have three or more words);
Find the position after the first space (alternatively, the beginning of the second word);
Find the position after the first space after the first space (alternatively, the beginning of the third word);
Determine the length of the third word using the position of the next space (or the end of the string if there are only three words);
Use the above values with the SUBSTRING() function to return the third word.
Example:
WITH MyTable (ColumnName)
AS
(
SELECT NULL
UNION ALL
SELECT ''
UNION ALL
SELECT 'One.'
UNION ALL
SELECT 'Two words.'
UNION ALL
SELECT 'Three word sentence.'
UNION ALL
SELECT 'Sentence containing four words.'
UNION ALL
SELECT 'Five words in this sentence.'
UNION ALL
SELECT 'Sentence containing more than five words.'
),
AtLeastThreeWords (ColumnName, pos_word_2_start)
AS
(
SELECT M1.ColumnName, CHARINDEX(' ', M1.ColumnName) + LEN(' ') + 1
FROM MyTable AS M1
WHERE LEN(M1.ColumnName) - LEN(REPLACE(M1.ColumnName, ' ', '')) >= 2
),
MyTable2 (ColumnName, pos_word_3_start)
AS
(
SELECT M1.ColumnName,
CHARINDEX(' ', M1.ColumnName, pos_word_2_start) + LEN(' ') + 1
FROM AtLeastThreeWords AS M1
),
MyTable3 (ColumnName, pos_word_3_start, pos_word_3_end)
AS
(
SELECT M1.ColumnName, M1.pos_word_3_start,
CHARINDEX(' ', M1.ColumnName, pos_word_3_start) + LEN(' ')
FROM MyTable2 AS M1
),
MyTable4 (ColumnName, pos_word_3_start, word_3_length)
AS
(
SELECT M1.ColumnName, M1.pos_word_3_start,
CASE
WHEN pos_word_3_start < pos_word_3_end
THEN pos_word_3_end - pos_word_3_start
ELSE LEN(M1.ColumnName) - pos_word_3_start + 1
END
FROM MyTable3 AS M1
)
SELECT M1.ColumnName,
SUBSTRING(M1.ColumnName, pos_word_3_start, word_3_length)
AS word_3
FROM MyTable4 AS M1;
ORIGINAL ANSWER:
Is the problem that the position and/or length of the username value may not be constant in the data but always follows the string 'username '? If so, you can use CHARINDEX with SUBSTRING e.g.
WITH MyTable (ColumnName)
AS
(
SELECT 'Error 192.168.1.67 UserName 0bce6c62-1efb-416d-bce5-71c3c8247b75 An existing ....'
UNION ALL
SELECT 'Username onedaywhen is invalid'
),
MyTable1 (ColumnName, pos1)
AS
(
SELECT M1.ColumnName, CHARINDEX('UserName ', M1.ColumnName) + LEN('UserName ') + 1
FROM MyTable AS M1
),
MyTable2 (ColumnName, pos1, pos2)
AS
(
SELECT M1.ColumnName, M1.pos1,
CHARINDEX(' ', M1.ColumnName, pos1) - M1.pos1
FROM MyTable1 AS M1
)
SELECT SUBSTRING(M1.ColumnName, M1.pos1, M1.pos2)
FROM MyTable2 AS M1;
...though you'd need to make it more robust e.g. when there is no trailing space after the username value etc.