RegEx for not matching items in a quote - sql

I am trying to figure out a regular expression and am having some issues. I want to find (match) all of the SQL parameters in a large script file, but NOT items inside single quotes (such as email addresses). For example:
INSERT INTO [User]
(
[UserGuid], [CompanyGuid], [Name], [EmailAddress]
) VALUES (
#UserGuid1, #CompanyGuid, 'Jason', 'jason#jason.com'
)
I want #UserGuid1 and #CompanyGuid to match, but not #jason. I have been using this regex:
(#+[\w]+)
But it matches the email address, so I tried a negative lookahead/lookbehind like this:
(?<!')[\W](#+[\w]+)[\W](?!')
but it is matching the '(' in the following example:
INSERT INTO [User] ([UserGuid]) VALUES (#UserGuid1)
Does anyone have an idea what I am missing here? Something that can say "anything that is NOT inside a pair of quotes"? Also, it is safe to assume the quotes are always balanced.
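A quick check of why the '(' shows up, using Python's re module as a stand-in for the asker's regex engine (an assumption; lookbehind, lookahead and \W behave the same way in most engines that support them). The [\W] on each side is an ordinary, consuming character class rather than a lookaround, so the characters it matches become part of the overall match, even though the capture group holds only the parameter:

```python
import re

# The asker's pattern: [\W] before and after the group consumes one character each.
asker = re.compile(r"(?<!')[\W](#+[\w]+)[\W](?!')")

sql = "INSERT INTO [User] ([UserGuid]) VALUES (#UserGuid1)"
m = asker.search(sql)

print(repr(m.group(0)))  # whole match, including the surrounding ( and )
print(repr(m.group(1)))  # capture group: just the parameter
```

Reading only group 1, or turning the [\W] parts into lookarounds as the answers do, avoids picking up the extra characters.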

Have you tried the following?
(?<=\W)(#\w+)
Basically, it makes sure that the captured value is preceded by a non-word character. You could add a lookahead too, but it's redundant because + is greedy and will match up to the next non-word character anyway.
The following will ensure that nothing is matched in INSERT INTO [User] ([UserGuid]) VALUES ('#UserGuid1'):
(?<![\w'])(#\w+)
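A minimal check of both patterns, assuming Python's re module behaves like the asker's engine for these constructs (findall returns the captured group of each match):

```python
import re

script = """INSERT INTO [User]
(
[UserGuid], [CompanyGuid], [Name], [EmailAddress]
) VALUES (
#UserGuid1, #CompanyGuid, 'Jason', 'jason#jason.com'
)"""
quoted = "INSERT INTO [User] ([UserGuid]) VALUES ('#UserGuid1')"

after_non_word = re.compile(r"(?<=\W)(#\w+)")      # first pattern
not_after_quote = re.compile(r"(?<![\w'])(#\w+)")  # second pattern

print(after_non_word.findall(script))   # both parameters, not the email
print(after_non_word.findall(quoted))   # still matches: a quote is a non-word character
print(not_after_quote.findall(script))  # both parameters, not the email
print(not_after_quote.findall(quoted))  # nothing
```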

(#+[^' ,]+) should help. The [^' ,] will match anything except a single quote, space or comma. You may need to add a few more characters, but that's the general idea. Note that on its own this still matches #jason.com inside the quoted email address.

Try this:
(?<=[^\w'])(#\w+)(?!')
This specifies that each match must be preceded by a non-word character (other than a single quote), then consist of a # sign and a word, and not be followed by a single quote.
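A quick check of this pattern in Python's re module (assumed to match the asker's engine here). It also shows a caveat: because \w+ can backtrack, (?!') only checks the character after wherever the match ends, so a parameter directly followed by a quote can still match partially. The stricter variant below, which adds \w to the lookahead class, is a hypothetical tweak and not one of the posted answers:

```python
import re

answer = re.compile(r"(?<=[^\w'])(#\w+)(?!')")
# Hypothetical variant: refusing a following word character stops \w+ from
# backtracking to a shorter match that the lookahead would accept.
stricter = re.compile(r"(?<=[^\w'])(#\w+)(?![\w'])")

values = "VALUES (#UserGuid1, #CompanyGuid, 'Jason', 'jason#jason.com')"
quoted = "VALUES ('#UserGuid1')"
edge = "VALUES (' #abc')"  # parameter-like text right before a closing quote

print(answer.findall(values))    # both parameters, not the email
print(answer.findall(quoted))    # nothing: preceded by a quote
print(answer.findall(edge))      # partial match '#ab' after backtracking
print(stricter.findall(edge))    # nothing
```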

Related

REGEXP_REPLACE URL BIGQUERY

I have two types of URLs which I need to clean; they look like this:
["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]
The outcome I want is:
SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"
I want to remove the brackets and everything up to SE. The URLs differ, so I want to remove:
First URL:
["//xxx.com/se/something?
Second URL:
["//www.xxx.com/se/car?p_color_car=White?
I can't get my head around it. I've tried .*\/ but it still keeps strings I don't want, such as:
URL 1: something?
URL 2: car?p_color_car=White?
You can use
regexp_replace(FinalUrls, r'.*\?|"\]$', '')
Details
.*\? - zero or more characters other than line break characters, as many as possible, followed by a ? character
| - or
"\]$ - a "] substring at the end of the string.
Mind the REGEXP_REPLACE syntax: you can't omit the replacement argument. See the reference:
REGEXP_REPLACE(value, regexp, replacement)
Returns a STRING where all substrings of value that match regular expression regexp are replaced with replacement.
You can use backslashed-escaped digits (\1 to \9) within the replacement argument to insert text matching the corresponding parenthesized group in the regexp pattern. Use \0 to refer to the entire matching text.
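A sketch of the same replacement using Python's re.sub, which, like REGEXP_REPLACE, replaces every non-overlapping match. The assumption is that BigQuery's RE2 engine and Python's re agree on this simple pattern (greedy .*, an escaped ?, alternation and a $ anchor):

```python
import re

urls = [
    '["//xxx.com/se/something?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]',
    '["//www.xxx.com/se/car?p_color_car=White?SE_{ifmobile:MB}{ifnotmobile:DT}_A_B_C_D_E_F_G_H"]',
]

# Greedy .*\? removes everything up to the LAST '?', and "\]$ removes the
# trailing "] at the end of the string.
pattern = re.compile(r'.*\?|"\]$')

cleaned = [pattern.sub("", u) for u in urls]
for c in cleaned:
    print(c)
```

Because .* is greedy, the second URL's extra ?p_color_car=White? part is swallowed as well, which is why one pattern handles both shapes.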

sql regexp string end with ".0"

I want to check whether a positive number string ends with ".0", so I wrote the following SQL:
select '12310' REGEXP '^[0-9]*\.0$'
However, the result is true. Why, since I used "\" before "." to escape it?
So I wrote another one, select '1231.0' REGEXP '^[0-9]\d*\.0$', but this time the result is false.
Could anyone tell me the right pattern?
Dot (.) in a regexp has a special meaning (any character) and requires escaping if you want a literal dot:
select '12310' REGEXP '^[0-9]*\\.0$';
Result:
false
Use a double backslash to escape special characters in Hive. The backslash itself has a special meaning in string literals and is used for sequences like \073 (semicolon), \n (newline), \t (tab), etc. This is why escaping needs a double backslash. Likewise, for the digit character class use \\d:
hive> select '12310.0' REGEXP '^\\d*?\\.0$';
OK
true
Also, characters inside square brackets do not need double-backslash escaping: [.] can be used instead of \\.
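To see why the asker got true, here is a sketch in Python's re module of the patterns the regex engine would receive, assuming (as the answer above describes) that Hive's string-literal parsing drops a lone backslash. Python raw strings are used so the patterns are passed through unchanged:

```python
import re

# Patterns as the regex engine would see them (an assumption modeled on the answer):
from_single_backslash = r"^[0-9]*.0$"   # from '^[0-9]*\.0$'  : '.' means any character
from_double_backslash = r"^[0-9]*\.0$"  # from '^[0-9]*\\.0$' : '\.' is a literal dot
bracket_form = r"^[0-9]*[.]0$"          # '[.]' is a literal dot without escaping

patterns = (from_single_backslash, from_double_backslash, bracket_form)
results = {
    value: tuple(bool(re.search(p, value)) for p in patterns)
    for value in ("12310", "12310.0")
}
print(results)
```

With the unescaped dot, '12310' matches because [0-9]* takes '123', the dot takes '1', and the final '0' matches literally.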
If you know it is a number string, why not just use:
select ( val like '%.0' )
You need regular expression if you want to validate that the string has digits everywhere else. But if you only need to check the last two characters, like is sufficient.
As for your question: . is a wildcard in regular expressions; it matches any character.
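A small sketch of the LIKE approach, using SQLite through Python's sqlite3 module as a stand-in for Hive (an assumption: in both, LIKE treats only % and _ as wildcards, so '.' is an ordinary character):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def ends_with_dot_zero(val):
    """Return True if val LIKE '%.0' (the dot is matched literally)."""
    (result,) = conn.execute("SELECT ? LIKE '%.0'", (val,)).fetchone()
    return result == 1

for v in ["12310", "12310.0", "1231.05"]:
    print(v, ends_with_dot_zero(v))
```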

escape in a select statement

In the following SQL, what is the use of escape?
select * from dual where dummy like 'funny&_' escape '&';
SQL*Plus asks for the value of _ whether or not escape is specified.
The purpose of the escape clause is to stop the wildcard characters (e.g. % or _) from being treated as wildcards, as per the documentation.
The reason why you're being prompted for the value of _ is because you're using &, which is also usually the character used to prompt for a substitution variable.
To stop the latter from happening, you could:
change to a different escape character
prior to running your statement, run set define off if you're using SQL*Plus (or running it as a script in a GUI, e.g. Toad), or turn off substitution variable prompting if you're using a GUI.
change the define character to something different by running set define <character>
The escape character is used to indicate that the underscore should be matched as an actual character, rather than as a single-character wildcard. This is explained in the documentation.
You can include the actual characters % or _ in the pattern by using the ESCAPE clause, which identifies the escape character. If the escape character precedes the character % or _ in the pattern, then Oracle interprets this character literally in the pattern rather than as a special pattern-matching character.
If you didn't have the escape clause then the underscore would match any single character, so where dummy like 'funny_' would match 'funnyA', 'funnyB', etc. and not just an actual underscore.
The escape character you've chosen is & which is the default SQL*Plus client substitution variable marker. It has nothing to do with the escape clause, and using that is causing the &_ part of the pattern to be interpreted as a substitution variable called _, hence your being prompted. As it isn't related, the escape clause has no effect on that.
The simplest thing is probably to choose a different escape character. If you want to use that specific escape character and not be prompted, disable or change the substitution character:
set define off
select * from dual where dummy like 'funny&_' escape '&';
set define on
That will then match rows where dummy contains exactly the string 'funny_'. (It's therefore equivalent to where dummy = 'funny_', as there are no unescaped wildcards, making the like pattern matching redundant). It will not match any that start with that pattern (it's sort of like using regexp_like with start and end anchors, and you might be expecting it to work as if you hadn't supplied anchors, but it doesn't). You would need to add a % wildcard for that:
set define off
select * from dual where dummy like 'funny&_%' escape '&';
set define on
And if you want to match any that don't start with funny_ but have it somewhere in the middle of the value, you would need to add another wildcard before it too:
set define off
select * from dual where dummy like '%funny&_%' escape '&';
set define on
You haven't shown any sample data or expected results, so it isn't clear which pattern you need.
SQL Fiddle doesn't have substitution variables but here's an example showing how those three patterns match various values.
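As an offline stand-in, here is a sketch using SQLite through Python's sqlite3 module. SQLite has no substitution variables, so & is just an ordinary escape character there. The sample values are made up for illustration, and SQLite's LIKE ... ESCAPE is assumed to behave like Oracle's for these simple patterns (SQLite's LIKE is case-insensitive for ASCII by default, which does not affect these values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (dummy TEXT)")
values = ["funny_", "funnyA", "funny_bone", "a funny_ thing"]  # illustrative data
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])

def matches(pattern, escape=None):
    """Return the dummy values matching the LIKE pattern, in insertion order."""
    if escape is None:
        sql = "SELECT dummy FROM t WHERE dummy LIKE ? ORDER BY rowid"
        params = (pattern,)
    else:
        sql = "SELECT dummy FROM t WHERE dummy LIKE ? ESCAPE ? ORDER BY rowid"
        params = (pattern, escape)
    return [row[0] for row in conn.execute(sql, params)]

print(matches("funny_"))          # unescaped _ matches any single character
print(matches("funny&_", "&"))    # exactly 'funny_'
print(matches("funny&_%", "&"))   # starts with 'funny_'
print(matches("%funny&_%", "&"))  # contains 'funny_' anywhere
```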
The syntax for the SQL LIKE Condition is:
expression LIKE pattern [ ESCAPE 'escape_character' ]
Parameters or Arguments
expression : A character expression such as a column or field.
pattern : A character expression that contains pattern matching. The patterns that you can choose from are:
Wildcard | Explanation
---------+-------------
% | Allows you to match any string of any length (including zero length)
_ | Allows you to match on a single character
escape_character: Optional. It allows you to test for literal instances of a wildcard character such as % or _.
Source: http://www.techonthenet.com/sql/like.php

Remove Special Characters from an Oracle String

From within an Oracle 11g database, using SQL, I need to remove the following special characters from a string:
~!##$%^&*()_+=\{}[]:”;’<,>./?
If any of these characters exist within a string, I would like them completely removed, except for these two characters, which I DO NOT want removed: "|" and "-".
For example:
From: 'ABC(D E+FGH?/IJK LMN~OP' To: 'ABCD EFGHIJK LMNOP' after removal of special characters.
I have tried this small test, which works for this sample:
select regexp_replace('abc+de)fg','\+|\)') from dual
but is there a better way, using Oracle SQL, to handle my list of special characters above without writing a pattern like '\+|\)' for every special character?
You can replace anything other than letters and spaces with an empty string:
[^a-zA-Z ]
As per the comments below:
I still need to keep the following two special characters within my string, i.e. "|" and "-".
Just exclude them as well (keeping the space):
[^a-zA-Z |-]
Note: the hyphen - should be placed at the start or end of the class, or escaped like \-, because inside a character class it otherwise defines a range.
For more info, read about Character Classes or Character Sets.
Consider using this regex replacement instead:
REGEXP_REPLACE('abc+de)fg', '[~!##$%^&*()_+=\\{}[\]:”;’<,>.\/?]', '')
The replacement will match any character from your list.
The regex to match your sequence of special characters is:
[]~!##$%^&*()_+=\{}[:”;’<,>./?]+
I feel you still haven't escaped all the regex-special characters.
To achieve that, go iteratively:
build a test string and build up your regex string character by character to see whether it removes what you expect to be removed.
If the latest character does not work, you have to escape it.
That should do the trick.
SELECT TRANSLATE('~!##$%sdv^&*()_+=\dsv{}[]:”;’<,>dsvsdd./?', '~!##$%^&*()_+=\{}[]:”;’<,>./?',' ')
FROM dual;
result:
TRANSLATE
-------------
sdvdsvdsvsdd
SQL> select translate('abc+de#fg-hq!m', 'a+-#!', etc.) from dual;
TRANSLATE(
----------
abcdefghqm
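The TRANSLATE trick relies on Oracle's rule that characters in the from-string with no counterpart in the to-string are removed. The to-string cannot simply be empty, because Oracle treats '' as NULL and TRANSLATE then returns NULL; hence a character that maps to itself is kept at the front. Below is a hypothetical Python emulation of those rules (oracle_translate is not a real library function), assuming the kept character is 'a' as the example output suggests:

```python
def oracle_translate(expr, from_string, to_string):
    """Hypothetical emulation of Oracle TRANSLATE(expr, from_string, to_string)."""
    # Oracle treats '' as NULL, and TRANSLATE returns NULL for a NULL argument.
    if not expr or not from_string or not to_string:
        return None
    mapping = {}
    for i, ch in enumerate(from_string):
        # Characters beyond the end of to_string map to None, i.e. are removed.
        # For a repeated character, keep the first mapping seen (an assumption).
        mapping.setdefault(ch, to_string[i] if i < len(to_string) else None)
    out = []
    for ch in expr:
        if ch not in mapping:
            out.append(ch)           # not listed: copied unchanged
        elif mapping[ch] is not None:
            out.append(mapping[ch])  # listed with a counterpart: replaced
    return "".join(out)

print(oracle_translate("abc+de#fg-hq!m", "a+-#!", "a"))
print(oracle_translate("abc+de#fg-hq!m", "+-#!", ""))
```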

SQL Server LIKE containing bracket characters

I am using SQL Server 2008. I have a table with the following column:
sampleData (nvarchar(max))
In some of these rows, the value for this column is a list formatted as follows:
["value1","value2","value3"]
I'm trying to write a simple query that will return all rows with lists formatted like this, by just detecting the opening bracket.
SELECT * from sampleTable where sampleData like '[%'
The above query doesn't work, because '[' is a special character. How can I escape the bracket so my query does what I want?
... like '[[]%'
You use [ ] to surround a special character (or range).
See the section "Using Wildcard Characters As Literals" in SQL Server LIKE.
Note: You don't need to escape the closing bracket...
Aside from gbn's answer, the other method is to use the ESCAPE option:
SELECT * from sampleTable where sampleData like '\[%' ESCAPE '\'
See the documentation for details.
Just a further note here...
If you want to include the bracket (or other specials) within a set of characters, you only have the option of using ESCAPE (since you are already using the brackets to indicate the set).
Also you must specify the ESCAPE clause, since there is no default escape character (it isn't backslash by default as I first thought, coming from a C background).
E.g., if I want to pull out rows where a column contains anything outside of a set of 'acceptable' characters, for the sake of argument let's say alphanumerics... we might start with this:
SELECT * FROM MyTest WHERE MyCol LIKE '%[^a-zA-Z0-9]%'
So we are returning anything that has any character not in the list (due to the leading caret ^ character).
If we then want to add special characters in this set of acceptable characters, we cannot nest the brackets, so we must use an escape character, like this...
SELECT * FROM MyTest WHERE MyCol LIKE '%[^a-zA-Z0-9\[\]]%' ESCAPE '\'
Preceding the brackets (individually) with a backslash and indicating that we are using backslash for the escape character allows us to escape them within the functioning brackets indicating the set of characters.
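To make the escaping rules above concrete, here is a hypothetical Python helper, like_to_regex, that translates a SQL Server-style LIKE pattern into a regex. It is an illustration only, not SQL Server's implementation: it supports %, _, [set], [^set] and an optional ESCAPE character, and ignores collation rules such as case-insensitivity:

```python
import re

def like_to_regex(pattern, escape=None):
    """Hypothetical: translate a SQL Server-style LIKE pattern to a compiled regex."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if escape is not None and ch == escape and i + 1 < n:
            out.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
        elif ch == "%":
            out.append(".*")  # any string, including empty
            i += 1
        elif ch == "_":
            out.append(".")  # exactly one character
            i += 1
        elif ch == "[":
            j = i + 1
            negate = j < n and pattern[j] == "^"
            if negate:
                j += 1
            members = []
            # Collect set members up to the first ']' that is not escaped.
            while j < n and pattern[j] != "]":
                if escape is not None and pattern[j] == escape and j + 1 < n:
                    members.append(re.escape(pattern[j + 1]))
                    j += 2
                elif pattern[j] == "-":
                    members.append("-")  # keep '-' as a range operator
                    j += 1
                else:
                    members.append(re.escape(pattern[j]))
                    j += 1
            if j < n and members:  # a closed, non-empty set
                out.append("[" + ("^" if negate else "") + "".join(members) + "]")
                i = j + 1
            else:  # unclosed or empty set: treat '[' as a literal character
                out.append(re.escape(ch))
                i += 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out), re.DOTALL)

def like(value, pattern, escape=None):
    """True if value matches the whole LIKE pattern."""
    return like_to_regex(pattern, escape).fullmatch(value) is not None

print(like('["value1","value2","value3"]', "[[]%"))           # bracket wrapped in a set
print(like('["value1"]', "\\[%", escape="\\"))                 # ESCAPE '\'
print(like("abc[1]", "%[^a-zA-Z0-9\\[\\]]%", escape="\\"))     # only acceptable chars
print(like("abc!", "%[^a-zA-Z0-9\\[\\]]%", escape="\\"))       # contains '!'
```

The last two calls mirror the MyTest query: inside the set, \[ and \] are literal brackets only because backslash was declared as the escape character.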