Regex pattern for adding N before string value in sql statement - sql

I need help with a regex pattern that adds N before all string values in SQL statements.
For example:
Before: SELECT * FROM table WHERE column = '123';
After: SELECT * FROM table WHERE column = N'123';
In that example, I can use this pattern: '[^']+'. However, I need help with pattern for this example:
Before: SELECT * FROM table WHERE column1 = 'One''Two' AND column2 = 'abc';
After: SELECT * FROM table WHERE column1 = N'One''Two' AND column2 = N'abc';
If there's a doubled '' (an escaped quote inside a string), the pattern should skip it.
Background on my problem: in SQL Server, you must precede all Unicode string constants with the prefix N.

Well, you could use something like this to satisfy the above (provided whatever you're using to apply it supports a negative lookbehind):
(?<!')'[\w]+?'
It uses a negative lookbehind to exclude quoted runs whose opening ' is preceded by another '. You may have to extend the \w if you need to match spaces or other characters.
Edit (Updated answer to include your extra strings from the comments):
You could try:
(?<=\s)'.*?'
Which will capture zero or more characters occurring between ' characters (ungreedy), and uses a positive lookbehind to ensure the first ' is preceded by a space character, which should satisfy all the strings you've listed.
You could optionally add a negative lookahead on the end (?!\w) to skip over the ' in O'Connor.
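As a rough illustration, the second pattern can be tried out in Python, whose re module supports lookbehind. The function name add_n_prefix and the WHOLE_LITERAL variant are assumptions made for this sketch and are not part of the answer; the variant simply consumes a whole literal and treats '' as an escaped quote.

```python
import re

# Pattern from the answer: an opening quote preceded by whitespace,
# then a lazy run up to the next quote.
ANSWER_PATTERN = re.compile(r"(?<=\s)'.*?'")

# Hypothetical variant (not from the thread): match the whole literal,
# treating a doubled '' inside it as an escaped quote.
WHOLE_LITERAL = re.compile(r"'(?:[^']|'')*'")


def add_n_prefix(sql, pattern=ANSWER_PATTERN):
    """Insert N in front of every match of `pattern` in `sql`."""
    return pattern.sub(lambda m: "N" + m.group(0), sql)


if __name__ == "__main__":
    sql = "SELECT * FROM table WHERE column1 = 'One''Two' AND column2 = 'abc';"
    print(add_n_prefix(sql))
    print(add_n_prefix(sql, WHOLE_LITERAL))
```

Neither pattern checks whether a literal already carries an N prefix, so running the substitution twice on the same statement is not safe.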


SQL Server - Regex pattern match only alphanumeric characters

I have an nvarchar(50) column myCol with values like these 16-digit, alphanumeric values, starting with '0':
0b00d60b8d6cfb19, 0b00d60b8d6cfb05, 0b00d60b8d57a2b9
I am trying to delete rows with myCol values that don't match those 3 criteria.
By following this article, I was able to select the records starting with '0'. However, despite the [a-z0-9] part of the regex, it also keeps selecting myCol values containing special characters like 00-d#!b8-d6/f&#b. Below is my select query:
SELECT * from Table
WHERE myCol LIKE '[0][a-z0-9]%' AND LEN(myCol) = 16
How should the expression be changed to select only rows with myCol values that don't contain special characters?
If the value must only contain a-z and digits, and must start with a 0 you could use the following:
SELECT *
FROM (VALUES(N'0b00d60b8d6cfb19'),
(N'0b00d60b8d6cfb05'),
(N'0b00d60b8d57a2b9'),
(N'00-d#!b8-d6/f&#b'))V(myCol)
WHERE V.myCol LIKE '0%' --Checks starts with a 0
AND V.myCol NOT LIKE '%[^0-9A-Za-z]%' --Checks only contains alphanumerical characters
AND LEN(V.myCol) = 16;
The second clause works because the LIKE pattern matches any value that contains at least one non-alphanumeric character. The NOT then reverses that, so the expression only resolves to TRUE when the value contains nothing but alphanumeric characters.
Pattern matching in SQL Server is not awesome, and there is currently no real regex support.
The % in your pattern is what is including the special characters you show in your example. The [a-z0-9] is only matching a single character. If your character lengths are 16 and you're only interested in letters and numbers then you can include a pattern for each one:
SELECT *
FROM Table
WHERE myCol LIKE '[0][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9][a-z0-9]';
Note: you don't need the AND LEN(myCol) = 16 with this.
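To see why the original pattern lets the special characters through, here is a small Python analogue of the three filters discussed above. It is only a sketch: the function names are made up for illustration, the regexes are hand-translated from the LIKE patterns, and Python's re is case-sensitive, whereas LIKE in SQL Server follows the column's collation.

```python
import re

values = [
    "0b00d60b8d6cfb19",
    "0b00d60b8d6cfb05",
    "0b00d60b8d57a2b9",
    "00-d#!b8-d6/f&#b",
]


def original_filter(s):
    # Analogue of: LIKE '[0][a-z0-9]%' AND LEN(myCol) = 16
    # Only the first two characters are constrained; % accepts anything after.
    return re.fullmatch(r"0[a-z0-9].*", s) is not None and len(s) == 16


def negated_class_filter(s):
    # Analogue of: LIKE '0%' AND NOT LIKE '%[^0-9A-Za-z]%' AND LEN(myCol) = 16
    return s.startswith("0") and re.search(r"[^0-9A-Za-z]", s) is None and len(s) == 16


def fixed_width_filter(s):
    # Analogue of the pattern with one [a-z0-9] class per position.
    return re.fullmatch(r"0[a-z0-9]{15}", s) is not None


if __name__ == "__main__":
    for v in values:
        print(v, original_filter(v), negated_class_filter(v), fixed_width_filter(v))
```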

Regex not matching correct string

I am busy building a lookup table for specific names of merchants. I tried to make use of the following regex, but it's returning fewer results than the standard "like" function in Netezza SQL. Please refer to the queries below:
SQL Like function: where trim(upper(a.MRCH_NME)) like '%CNA %' -- returns 4622 matches
Regex function in Netezza SQL: where array_combine(regexp_extract_all(trim(upper(a.MRCH_NME)),'.*CNA\s','i'),'|') = 'CNA' -- returns 2226 matches
I looked at the two result sets and found that strings such as the following aren't matched:
!C CNA INT ARR
*CNA PLATZ 0400
015764 CNA CRAD
C#CNA PARK 0
I made use of the following regex expression: /.*CNA\s'/
Any idea why the above strings aren't being returned as matches?
Thank you.
You probably should be using regexp_like:
SELECT *
FROM yourTable
WHERE REGEXP_LIKE(MRCH_NME, 'CNA[ ]', 'i');
Apart from the case-insensitive flag, this would be logically equivalent to the following query using LIKE:
SELECT *
FROM yourTable
WHERE MRCH_NME LIKE '%CNA %';
It seems to me the problem is with your code rather than with the regex. Look: like '%CNA %' returns all entries that contain a CNA substring followed by a literal space anywhere inside the entry. The '.*CNA\s' regex matches any 0+ chars other than newline, followed by CNA and any whitespace char.
According to this reference, \s matches "a white space character. White space is defined as [\t\n\f\r\p{Z}]."
Thus, you should in fact just use
WHERE REGEXP_LIKE(MRCH_NME, 'CNA ', 'i')
or, better with a word boundary check:
WHERE REGEXP_LIKE(MRCH_NME, '\bCNA\b', 'i')
where \b marks a transition from a word to non-word and non-word to word character, thus ensuring a whole word search and justifying the regex usage.
If you do not need to match the merchant name as a whole word, use the regular LIKE with '%CNA %', it should be more efficient.
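The difference between extracting a match and testing for one can be sketched in Python. This is only an analogue: Python's re is not Netezza's regex engine, and the sample "MACNA STORE" is an invented value added to show what the word-boundary check changes.

```python
import re

# The four merchant names from the question that were not returned.
samples = [
    "!C CNA INT ARR",
    "*CNA PLATZ 0400",
    "015764 CNA CRAD",
    "C#CNA PARK 0",
]

# Extracting with '.*CNA\s' returns the whole prefix plus the whitespace,
# so comparing the extracted text to 'CNA' cannot succeed.
extracted = [re.findall(r".*CNA\s", s, re.I) for s in samples]

# A plain containment test, comparable to LIKE '%CNA %'.
contains_cna_space = [re.search(r"CNA ", s, re.I) is not None for s in samples]

# Whole-word test, comparable to REGEXP_LIKE(..., '\bCNA\b', 'i').
whole_word = [re.search(r"\bCNA\b", s, re.I) is not None for s in samples]

if __name__ == "__main__":
    print(extracted)
    print(contains_cna_space)
    print(whole_word)
    # Invented value: contains "CNA " but not as a whole word.
    print(re.search(r"CNA ", "MACNA STORE") is not None)
    print(re.search(r"\bCNA\b", "MACNA STORE") is not None)
```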

Escaping (, round brackets sybase SQL

I am working with Sybase SQL and want to exclude all entries that look like this:
(NOT PRESENT)
So I tried using:
SELECT col FROM table WHERE col NOT LIKE '(%)'
Do you guys know what is happening? I think I need to escape ( somehow, but I do not know how. The following returns an error:
SELECT col FROM table WHERE col NOT LIKE '\(%\)' ESCAPE '\'
Kind Regards
Try this:
SELECT col FROM table WHERE col NOT LIKE ('(%)')
You might find this helpful
Sybase Event Stream Processor 5.0 CCL Programmers Guide - String Functions
like()
Scalar. Determines whether a given string matches a specified pattern string.
Syntax
like ( string, pattern )
Parameters
string A string.
pattern A pattern of characters, as a string. Can contain wildcards.
Usage
Determines whether a string matches a pattern string. The function returns 1 if the string matches the pattern, and 0 otherwise. The pattern argument can contain wildcards: '_' matches a single arbitrary character, and '%' matches 0 or more arbitrary characters. The function takes in two strings as its arguments, and returns an integer.
Note: In SQL, the infix notation can also be used: sourceString like patternString.
Example
like ('MSFT', 'M%T') returns 1.
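The wildcard rules quoted above can be emulated in a few lines of Python. This is a sketch of the described semantics only, not Sybase's implementation; it ignores escape characters and collation, and the helper simply reuses the documented name like for readability. It also shows that parentheses are ordinary characters under these rules.

```python
import re


def like(string, pattern):
    """Return 1 if `string` matches the LIKE-style `pattern`, else 0.

    '%' matches zero or more arbitrary characters and '_' matches exactly
    one; every other character, including ( and ), is taken literally.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return 1 if re.fullmatch("".join(parts), string, re.S) else 0


if __name__ == "__main__":
    print(like("MSFT", "M%T"))
    print(like("(NOT PRESENT)", "(%)"))
```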

regexp after a word appear

I'm using regexp to find the text after a word appears.
Fiddle demo
The problem is that some addresses use different abbreviations for "big house"; some are followed by a space, some by a dot:
Quinta
QTA
Qta.
I want all the text after any of those appears, ignoring case.
I tried this one, but I'm not sure how to include multiple starting words:
SELECT
REGEXP_SUBSTR ("Address", '[^QUINTA]+') "REGEXPR_SUBSTR"
FROM Address;
Solution:
I believe this will match the abbreviations you want:
SELECT
REGEXP_REPLACE("Address", '^.*Q(UIN)?TA\.? *|^.*', '', 1, 1, 'i')
"REGEXPR_SUBSTR"
FROM Address;
Demo in SQL fiddle
Explanation:
It tries to match everything from the beginning of the string:
until it finds Q + UIN (optional) + TA + . (optional) + any number of spaces.
if it doesn't find it, then it matches the whole string with ^.*.
Since I'm using REGEXP_REPLACE, it replaces the match with an empty string, thus removing all characters until "QTA", any of its alternations, or the whole string.
Notice the last parameter passed to REGEXP_REPLACE: 'i'. That is a flag that sets a case-insensitive match (flags described here).
The part you were interested in making optional uses a ( pattern ) that is a group with the ? quantifier (which makes it optional). Therefore, Q(UIN)?TA matches either "QUINTA" or "QTA".
Alternatively, in the scope of your question, if you want different options, you need to use alternation with a |. For example, (pattern1|pattern2|etc) matches any one of the 3 options. Also, the regex (QUINTA|QTA) matches exactly the same as Q(UIN)?TA.
What was wrong with your pattern:
The construct you were trying ([^QUINTA]+) uses a character class, and it matches any character except Q, U, I, N, T or A, repeated 1 or more times. But it's applied to characters, not words. For example, [^QUINTA]+ matches the string "BCDEFGHJKLMOPRSVWXYZ" completely, and it fails to match "TIA".
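The behaviour described above can be reproduced with Python's re module. Treat it as an analogue only: Oracle's regex functions may differ from Python in details, the function name text_after_quinta is invented for this sketch, and the sample addresses are made up.

```python
import re

# Same pattern as in the REGEXP_REPLACE call above.
PATTERN = r"^.*Q(UIN)?TA\.? *|^.*"


def text_after_quinta(address):
    """Remove everything up to and including QUINTA / QTA / Qta. (any case)."""
    return re.sub(PATTERN, "", address, count=1, flags=re.I)


# A character class works on single characters, not on words.
class_matches_all = re.fullmatch(r"[^QUINTA]+", "BCDEFGHJKLMOPRSVWXYZ") is not None
class_fails_on_tia = re.search(r"[^QUINTA]+", "TIA") is None

if __name__ == "__main__":
    # Made-up sample addresses.
    for addr in ["Av. Principal Quinta Los Pinos", "Calle 5 Qta. Maria", "Calle 5 QTA Rosa"]:
        print(text_after_quinta(addr))
```

When none of the abbreviations is present, the second alternative ^.* matches the entire string, so the function returns an empty string.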

Like operator and Trailing spaces in SQL Server

This one matches column_name like 'CharEndsHere%'
and
This one doesn't column_name like 'CharEndsHere'
I know that like operator will consider even the trailing spaces, so I just copied the exact column value (with trailing spaces) and pasted it.
Something like column_name like 'CharEndsHere ', yet it doesn't match. Why?
I haven't used the '=' operator since the column's type is ntext.
Is there something I am missing here or shouldn't I use like operator in this way?
Edited: column_name like 'CharEndsHere__' (__ denotes the spaces). 'CharEndsHere ' is the exact value in that cell. Is using like in this way valid or not?
Edit :
This is the code I tried,
SELECT *
FROM [DBName].[dbo].[TableName]
WHERE [DBName].[dbo].[TableName].Address1 LIKE rtrim('4379 Susquehanna Trail S ')
I have also tried without using rtrim, with the same result.
Edit: According to Blindy's answer,
If a comparison in a query is to return all rows with the string LIKE 'abc' (abc
without a space), all rows that start with abc and have zero or more trailing
blanks are returned.
But in my case, I have queried, Like 'abc' and there is a cell containing 'abc '(with trailing spaces) which is not returned. That's my actual problem
This is a case of reading the documentation, it's very explicitly stated here: http://msdn.microsoft.com/en-us/library/ms179859.aspx
When you perform string comparisons by using LIKE, all characters in the pattern string are significant. This includes leading or trailing spaces. If a comparison in a query is to return all rows with a string LIKE 'abc ' (abc followed by a single space), a row in which the value of that column is abc (abc without a space) is not returned. However, trailing blanks, in the expression to which the pattern is matched, are ignored. If a comparison in a query is to return all rows with the string LIKE 'abc' (abc without a space), all rows that start with abc and have zero or more trailing blanks are returned.
Edit: According to your comments, you seem to be looking for a way to use like while ignoring trailing spaces. Use something like this: field like rtrim('abc '). It will still use indexes because rtrim() is a scalar operand and it's evaluated before the lookup phase.