Check constraint for Emails in an Oracle Database - sql

I've searched everywhere for a decent and logical CHECK constraint to validate that an email is in the right format. So far I've found really long and unnecessary expressions like:
create table t (
email varchar2(320) check (
regexp_like(email, '[[:alnum:]]+@[[:alnum:]]+\.[[:alnum:]]')
)
);
and
create table stk_t (
email varchar2(320) check (
email LIKE '%@%.%' AND email NOT LIKE '@%' AND email NOT LIKE '%@%@%'
)
);
Surely there is a simpler way?
I'm using Oracle 11g database and SQL Developer IDE.
This is what I have:
constraint Emails_Check check (Emails LIKE '%_@%_._%')
Can someone please let me know if this is the most efficient way of validating emails?

You can try this
email varchar2(255) check (
email LIKE '%@%.%' AND email NOT LIKE '@%' AND email NOT LIKE '%@%@%' )

CREATE TABLE MYTABLE(
EMAIL VARCHAR2(30) CHECK(REGEXP_LIKE (EMAIL,'^[A-Za-z]+[A-Za-z0-9.]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$'))
)
Explanation of the regular expression (this walks through a closely related pattern whose character classes differ slightly from the one above)
^ # start of the line
[_A-Za-z0-9-]+ # must start with one or more (+) of the characters in the brackets [ ]
( # start of group #1
\.[_A-Za-z0-9-]+ # followed by a dot "." and one or more (+) of the characters in the brackets [ ]
)* # end of group #1; this group is optional (*)
@ # must contain an "@" symbol
[A-Za-z0-9]+ # followed by one or more (+) of the characters in the brackets [ ]
( # start of group #2 - first-level TLD checking
\.[A-Za-z0-9]+ # followed by a dot "." and one or more (+) of the characters in the brackets [ ]
)* # end of group #2; this group is optional (*)
( # start of group #3 - second-level TLD checking
\.[A-Za-z]{2,} # followed by a dot "." and at least two letters
) # end of group #3
$ # end of the line
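To get a feel for what the pattern in the CREATE TABLE statement accepts and rejects, here is a small sketch in Python rather than Oracle. It assumes the separator is the usual @ sign, and the function name passes_check is made up for illustration. Python's re module is close to, but not identical to, Oracle's REGEXP_LIKE, so treat it as a quick sanity check only.

```python
import re

# Same pattern as in the CREATE TABLE statement, assuming "@" as the separator.
EMAIL_RE = re.compile(r"^[A-Za-z]+[A-Za-z0-9.]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")


def passes_check(email: str) -> bool:
    """Return True if the whole address matches the pattern."""
    return EMAIL_RE.fullmatch(email) is not None


if __name__ == "__main__":
    candidates = [
        "john.doe@example.com",
        "a@b.co",               # local part needs at least two characters
        "1john@example.com",    # local part must start with a letter
        "john@example",         # no dot after the separator
        "john@example.museum",  # last label longer than four letters
    ]
    for candidate in candidates:
        print(candidate, passes_check(candidate))
```

Note how strict this is: one-character local parts and top-level domains longer than four letters are rejected, which may or may not be what you want from a constraint.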

Stumbled upon this answer while hunting for a simple solution on the internet:
ALTER TABLE YourTableName
ADD CONSTRAINT YourConstraintName CHECK(YourColumnName LIKE '%___@___%.__%')
All credit goes to @bhanu_nz.


Azure Kusto Query to trim multiple parts of a string

I'm using a KQL query in Azure to create a Sentinel alert.
I can't work out how to trim a string to show the data between the third instance of the " character and the first instance of (.
I've tried to use a trim_start/ trim_end and also a split command but keep getting regex problems.
An example of the string is [ "HOSTNAME", "Test User (t.user@example.com)" ]
I'd like to either extract Test User from the string, or split HOSTNAME, Test User and t.user@example.com into separate fields.
Any help or pointers in the right direction would be appreciated
You could use the parse operator.
for example:
print input = '[ "HOSTNAME", "Test User (t.user@example.com)" ]'
| parse input with * '"' host_name '"' * '"' user_name ' (' email_address ')' *
input | host_name | user_name | email_address
:- | :- | :- | :-
[ "HOSTNAME", "Test User (t.user@example.com)" ] | HOSTNAME | Test User | t.user@example.com
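If you want to check the extraction logic outside Kusto, the same three fields can be pulled out with an ordinary regular expression. The sketch below is Python, not KQL. The function name split_fields and the exact pattern are illustrative assumptions, and it assumes the email is written with the usual @ sign.

```python
import re

# A quoted host, a comma, then a quoted "Name (email)" pair.
PATTERN = re.compile(r'"([^"]*)"\s*,\s*"([^"(]*?)\s*\(([^)]*)\)"')


def split_fields(value: str):
    """Return (host_name, user_name, email_address), or None if the shape differs."""
    match = PATTERN.search(value)
    return match.groups() if match else None


if __name__ == "__main__":
    print(split_fields('[ "HOSTNAME", "Test User (t.user@example.com)" ]'))
```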
parse-where is good for this, too.
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/parsewhereoperator

Multiline regex match in MariaDB/Mediawiki

I am trying to match text (contained in a Mediawiki template) in multiple lines via the Replace Text extension in MW 1.31, server running MariaDB 10.3.22.
An example of the template is the following (other templates may exist on the same page):
{{WoodhouseENELnames
|Text=[[File:woodhouse_999.jpg|thumb|link={{filepath:woodhouse_999.jpg}}]]Αἰακός, ὁ, or say, son of Aegina.
<b class="b2">Of Aeacus</b>, adj.: Αἰάκειος.
<b class="b2">Descendant of Aeacus</b>: Αἰακίδης, -ου, ὁ.
}}
Above and below there could be other templates, with a varying number of line breaks, i.e.:
{{MyTemplatename
|Text=text, text, text
}}
{{WoodhouseENELnames
|Text=text, text, text
}}
{{OtherTemplatename
|Text= text, text, text
}}
There is a varying number of lines and/or line breaks within the template. I want to match the full template and delete it; that is, match from {{WoodhouseENELnames to the closing }} without matching any templates further down, i.e. stop matching if a further template's {{ is encountered.
The closest I got was using something like:
Find
({{WoodhouseENELnames\n\|Text=)(.*?)\n+(.*?)\n+(.*?)\n+(.*?)(\n+}})
and adding or removing (.*?)\n+ in the regex to match cases with more or fewer lines. The problem is that this expression might inadvertently match other templates following this one.
Is there a regex that would match all possible text/line breaks contained within the template (in a lazy way, as there may be other templates below and above on the same page)? The templates are delimited by an opening {{ and a closing }}.
Edited to clear up any confusion.
This is a recursion simulation for use on Java- and Python-style engines, which do not support function calls (recursion):
(?s)(?={{WoodhouseENELnames)(?:(?=.*?{{(?!.*?\1)(.*}}(?!.*\2).*))(?=.*?}}(?!.*?\2)(.*)).)+?.*?(?=\1)(?:(?!{{).)*(?=\2$)
Recursion Simulation demo
Just check the matches for the result.
This is real recursion, for use on Perl- and PCRE-style engines:
(?s){{WoodhouseENELnames((?:(?>(?:(?!{{|}}).)+)|{{(?1)}})*)}}
Recursion demo
Note that .NET handles this differently and is not covered here.
I can only think of a brute-force, iterative approach using a recursive query.
The idea is to walk through the string, starting at the first occurrence of the string part '{{WoodhouseENELnames'. From there on, we keep a counter that tracks how many opening and closing brackets were met. When the count reaches 0, we know the pattern is exhausted. The final step is to rebuild a string that retains the parts before and after the pattern.
For this to work, you need a unique column to identify each row. I assumed id.
with recursive cte as (
    select
        n_open n0,
        n_open n1,
        1 cnt,
        mycol,
        id
    from (select t.*, locate('{{WoodhouseENELnames', mycol) n_open from mytable t) x
    where n_open > 0
    union all
    select
        n0,
        n1 + 2 + case when n_open > 0 and n_open < n_close then n_open else n_close end,
        cnt + case when n_open > 0 and n_open < n_close then 1 else -1 end,
        mycol,
        id
    from (
        select
            c.*,
            locate('{{', substring(mycol, n1 + 2)) n_open,
            locate('}}', substring(mycol, n1 + 2)) n_close
        from cte c
    ) x
    where cnt > 0
)
select id, concat(substring(mycol, 1, min(n0) - 1), substring(mycol, max(n1) + 1)) mycol
from cte
group by id
Demo on DB Fiddle
Set-up - I added string parts before and after the pattern (including double brackets for extra fun):
create table mytable(id int, mycol varchar(2000));
insert into mytable values (
1,
'{{abcd{{WoodhouseENELnames
|Text=[[File:woodhouse_999.jpg|thumb|link={{filepath:woodhouse_999.jpg}}]]Αἰακός, ὁ, or say, son of Aegina.
<b class="b2">Of Aeacus</b>, adj.: Αἰάκειος.
<b class="b2">Descendant of Aeacus</b>: Αἰακίδης, -ου, ὁ.
}} efgh{{'
);
Results:
id | mycol
-: | :------------
1 | {{abcd efgh{{
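The counter-based walk described in this answer is easier to follow outside SQL. Below is a minimal sketch of the same idea in Python: find the opening tag, count {{ and }} pairs until the depth returns to zero, then stitch the remaining parts together. The function name strip_template is hypothetical, and this is an illustration of the technique rather than a translation of the query above.

```python
def strip_template(text: str, name: str = "WoodhouseENELnames") -> str:
    """Remove the first {{name ...}} template, honouring nested {{ }} pairs."""
    start = text.find("{{" + name)
    if start == -1:
        return text  # template not present

    depth = 0
    pos = start
    while pos < len(text):
        if text.startswith("{{", pos):
            depth += 1          # an opening pair, including the template's own
            pos += 2
        elif text.startswith("}}", pos):
            depth -= 1          # a closing pair
            pos += 2
            if depth == 0:      # the template's own closing pair was reached
                return text[:start] + text[pos:]
        else:
            pos += 1
    return text  # unbalanced brackets: leave the text untouched


if __name__ == "__main__":
    sample = (
        "{{abcd{{WoodhouseENELnames\n"
        "|Text=[[File:x.jpg|thumb|link={{filepath:x.jpg}}]]text\n"
        "}} efgh{{"
    )
    print(strip_template(sample))
```

Returning the text unchanged when the brackets never balance is a design choice in this sketch; you may prefer to raise an error instead.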
MariaDB uses the PCRE regex engine.
If you can ensure that
the opening tag of your template ({{WoodhouseENELnames) starts on a new line,
the closing tag of your template (}}) starts on a new line, and
no other closing tag (}}) in between starts on a new line,
the following regex will do:
(?ms)^{{WoodhouseENELnames.+?^}}
Description:
(?ms) tells the regex engine that ^ matches at the start of every line (m) and that . also matches newlines (s).
Then search for your opening tag at the start of a line.
Search for the shortest possible string including any character (also newlines) up to
a closing tag (}}) at the start of a line.
If you want to capture the match, enclose the regex within ( and ).
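The line-anchored pattern can be tried quickly outside MariaDB. The sketch below uses Python's re module, which supports the same (?ms) flags and lazy quantifier used here; the braces are escaped defensively and an optional trailing newline is consumed so that no blank line is left behind. The function name remove_template is made up for illustration, and the same three line-start assumptions apply.

```python
import re

# (?m): ^ matches at each line start; (?s): . also matches newlines.
TEMPLATE_RE = re.compile(r"(?ms)^\{\{WoodhouseENELnames.+?^\}\}\n?")


def remove_template(page: str) -> str:
    """Delete every WoodhouseENELnames template whose tags start a line."""
    return TEMPLATE_RE.sub("", page)


if __name__ == "__main__":
    page = (
        "{{MyTemplatename\n|Text=text, text, text\n}}\n"
        "{{WoodhouseENELnames\n"
        "|Text=[[File:a.jpg|link={{filepath:a.jpg}}]]text\nmore text\n}}\n"
        "{{OtherTemplatename\n|Text= text\n}}\n"
    )
    print(remove_template(page))
```

The nested {{filepath:...}} is skipped because its closing }} does not sit at the start of a line.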
EDIT:
As PCRE2 supports recursive patterns, the following, more complex regex will match regardless of the beginning-of-line constraints above:
(?msx)
({{WoodhouseENELnames # group 1: matching the whole template
( # group 2: matching the contents of the template, including subpatterns
[^{}]* # zero or more characters except { or }
{{ # the beginning of a subpattern
( # containing either:
[^{}]++ # one or more characters except { or } (possessive)
| (?2) # or the recursive pattern of group 2
)* # zero or more times
}} # the closing of the subpattern
[^{}]* # zero or more characters except { or }
)
}}
)
Caveat: this doesn't cater for a single { or } within the templates.
EDIT 2
I hate giving up before the job is done :-) This regex should work regardless of all the constraints above:
(?msx) # Note the additional 'x'-Option, allowing free spacing.
({{WoodhouseENELnames # Search group 1 - Top level template:
( # Search group 2 - top level template contents:
( # Search group 3 - Subtemplate contents:
[^{}]* # Zero or more characters except { or }
| {(?!{) # or a single { not followed by a {
| }(?!}) # or a single } not followed by a }
)* # Closing search group 3
{{ # Opening subtemplate tag
( # Search group 4:
(?3)* # Reusing search group 3, zero or more times
| (?2) # or Recurse search group 2 (of which, this is a part)
)* # Group 4 zero or more times
}} # Closing subtemplate tag
(?3)* # Reusing search group 3, zero or more times
) # Closing Search group 2 - Template contents
}} # Top-level Template closing tag
) # Closing Search group 1
The last two solutions are based on the PCRE2 documentation

Search for any of a list of strings inside another string

I need to identify records with valid addresses by comparing the address fields against a list of street-like words.
So the code would look something like:
set street_list = 'STREET', 'ROAD', 'AVENUE', 'DRIVE', 'WAY', 'PLACE' (etc.)
;
create table [new table] as
select *
from [source table]
where [address line 1] (contains any word from STREET_LIST) or
[address line 2] (contains any word from STREET_LIST) or
[address line 3] (contains any word from STREET_LIST)
;
Is this possible?
Using LostReality's regexp suggestion, I got as far as:
select *
from [source table]
where upper([address line 1]) regexp '.* STREET.*|.* ST.*|.* ROAD.*|.* RD.*|.* CLOSE.*|.* LANE.*|.* LA.*|.* AVENUE.*|.* AVE.*|.* DRIVE.*|.* DR.*|.* HOUSE.*|.* WAY.*|.* PLACE.*|.* SQUARE.*|.* WALK.*|.* GROVE.*|.* GREEN.*|.* PARK.*|.* PK.*|.* CRESCENT.*|.* TERRACE.*|.* PARADE.*|.* GARDEN.*|.* GARDENS.*|.* COURT.*|.* COTTAGES.*|.* COTTAGE.*|.* MEWS.*|.* ESTATE.*|.* RISE.*|.* FARM.*'
;
and it seems to work.
But I have two small problems with it:
1) How do I write the regexp on more than one line so it's easier to read?
2) Is there any way of putting that regexp into a macro variable? I want to check 5 address lines and I don't want 5 copies of the same expression.
Thanks
Solution for Hive: you can put the regexp pattern in a variable and also wrap the test in a macro. Here is your template, fixed:
set hivevar:street_list ='STREET|ST|ROAD|RD|CLOSE|LANE|LA|AVENUE|AVE|DRIVE|DR|HOUSE|WAY|PLACE|SQUARE|WALK|GROVE|GREEN|PARK|PK|CRESCENT|TERRACE|PARADE|GARDEN|GARDENS|COURT|COTTAGES|COTTAGE|MEWS|ESTATE|RISE|FARM';
--boolean macro for using in the WHERE
create temporary macro contains_word(s string) (upper(s) rlike ${hivevar:street_list} ) ;
with some_table as ( --use your table instead of this synthetic example
select stack(2,'some string containing STREET and WALK',
'some string containing something else') as str
) --use your table instead of this synthetic example
--use macro in your query
select str from some_table
where contains_word(str);
Result:
OK
some string containing STREET and WALK
Time taken: 0.229 seconds, Fetched: 1 row(s)
Use OR like in your question:
where contains_word(address_line_1) OR contains_word(address_line_2) ...
Hope you have got the idea
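The same idea, one word list defined once and one reusable predicate applied to every address line, can be sketched outside Hive. The Python below is only an illustration: the names STREET_WORDS, contains_street_word and has_valid_address are made up, and the list is shortened. Unlike a bare STREET|ST|... alternation, it wraps the words in \b word boundaries so that ST does not match inside FIRST.

```python
import re

# Shortened, hypothetical word list; extend it as needed.
STREET_WORDS = ["STREET", "ST", "ROAD", "RD", "AVENUE", "AVE",
                "DRIVE", "DR", "WAY", "PLACE"]

# One alternation, built once, matching whole words only.
STREET_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(word) for word in STREET_WORDS) + r")\b"
)


def contains_street_word(line) -> bool:
    """True if the line contains any listed word (case-insensitive)."""
    if not line:
        return False  # treat NULL/empty address lines as no match
    return STREET_RE.search(line.upper()) is not None


def has_valid_address(*address_lines) -> bool:
    """True if at least one of the address lines contains a street word."""
    return any(contains_street_word(line) for line in address_lines)


if __name__ == "__main__":
    print(has_valid_address(None, "Flat 2", "7 Mill Rd"))
    print(has_valid_address("First Floor", "Flat 2"))
```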

SQL Query to Substitute value in the scanned results and update that field of Table

I have a one-to-many Organization-to-Users relationship.
I want to fetch the usernames of all the User records of an Organization, capture a part of each username and append/substitute it with a new value.
Here is how I am doing it:
Form the raw SQL to Get the matching usernames and replace them with new value.
raw = "SELECT REGEXP_REPLACE($1::string[], '(^[a-z0-9]+)((\s[a-z0-9]+))*\#([a-z0-9]+)$', m.name[1] || '#' || $2) FROM (SELECT REGEXP_MATCHES($1::string[], '(^[a-z0-9]+)((\s[a-z0-9]+))*\#([a-z0-9]+)$') AS name) m"
Get the matching usernames and replace them with new value.
usernames: list of usernames retrieved from queryable
Repo.query(raw, [usernames, a_string])
Error I am getting
SELECT REGEXP_REPLACE($1::string[], '(^[a-z0-9]+)(( [a-z0-9]+))#([a-z0-9]+)$', m.name[1] || '#' || $2) FROM (SELECT REGEXP_MATCHES($1::string[], '(^[a-z0-9]+)(( [a-z0-9]+))#([a-z0-9]+)$') AS name) m [["tradeboox#trdbx18"], "trdbx17"]
{:error,
%Postgrex.Error{connection_id: 7222, message: nil,
postgres: %{code: :undefined_object, file: "parse_type.c", line: "257",
message: "type \"string[]\" does not exist", pg_code: "42704",
position: "137", routine: "typenameType", severity: "ERROR",
unknown: "ERROR"}}}
FYI: The username field of User model is of type citext
Once I get the replaced values, I want to update the User with something like
update([u], set: [username: new_values])
Any ideas on how to proceed with this?
There is no string type in PostgreSQL.
The function regexp_matches accepts only text as its first parameter; it can't be an array. So you first need to change the type to text[], then unnest the array with unnest($1::text[]) and apply the regexp to each resulting row.
raw = "SELECT REGEXP_REPLACE(m.item, '(^[a-z0-9]+)((\s[a-z0-9]+))*\#([a-z0-9]+)$', m.name[1] || '#' || $2)
FROM (
SELECT item, REGEXP_MATCHES(item, '(^[a-z0-9]+)((\s[a-z0-9]+))*\#([a-z0-9]+)$') AS name
FROM unnest($1::text[]) AS items(item)
) m"
If I understand correctly, you are trying to replace everything after # with a different string. If that is the case, your regexp will put anything after a space into the second element of the matches array. You would need this instead: ((^[a-z0-9]+)(\s[a-z0-9]+)*).
If above is true, then you can do all that much easier with this:
SELECT REGEXP_REPLACE(item, '((^[a-z0-9]+)(\s[a-z0-9]+)*)\#([a-z0-9]+)$', '\1#' || $2) AS name
FROM unnest($1::text[]) AS items(item)
The best practice, however, is to simply do the replacement in an UPDATE statement:
UPDATE "User" SET
name = concat(split_part(name, '#', 1), '#', $2)
WHERE organization_id = $3
AND name ~* '^[a-z0-9]+(\s[a-z0-9]+)*\#[a-z0-9]+$'
It will split name at #, take the first part, then append # and whatever is assigned to $2 (the domain name, I guess). It updates only rows whose organization_id matches the given id and whose name matches your regexp (you can omit the regexp if you want to change all names in the organization). Make sure the table is actually named User, case-sensitively, or remove the double quotes to get the case-insensitive version.
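To make the split-and-append logic concrete, here is a small sketch of it in Python, detached from both PostgreSQL and the ORM. It keeps # as the separator, as in the sample data, and only touches names that match the same shape as the regexp in the UPDATE. The function name replace_suffix is hypothetical.

```python
import re

# Same shape as the regexp in the UPDATE: words, a "#", then the part to replace.
USERNAME_RE = re.compile(r"[a-z0-9]+(\s[a-z0-9]+)*#[a-z0-9]+", re.IGNORECASE)


def replace_suffix(usernames, new_suffix):
    """Swap the text after '#' for new_suffix; leave non-matching names alone."""
    updated = []
    for name in usernames:
        if USERNAME_RE.fullmatch(name):
            head = name.split("#", 1)[0]  # like split_part(name, '#', 1)
            updated.append(f"{head}#{new_suffix}")
        else:
            updated.append(name)
    return updated


if __name__ == "__main__":
    print(replace_suffix(["tradeboox#trdbx18"], "trdbx17"))
```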
I sadly do not know how to do this in your ORM.

REGEXP_SUBSTR : extracting portion of string between [ ] including []

I am on Oracle 11gR2.
I am trying to extract the text between '[' and ']' including [].
ex:
select regexp_substr('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]','\[(.*)\]') from dual
Output:
[REQ.UID] and username=[REQD.VP.UNAME]
Output needed:
[REQ.UID][REQD.VP.UNAME]
Please let me know how to get the needed output.
Thanks & Regards,
Bishal
Assuming you are just going to have two occurrences of [] then the following should suffice. The ? in the .*? means that it is non-greedy so that it doesn't gobble up the last ].
select
regexp_replace('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]'
,'.*(\[.*?\]).*(\[.*?\]).*','\1\2')
from dual
;
I'm not an Oracle user, but from a quick perusal of the docs, I think this should be close:
REGEXP_REPLACE('select userid,username from tablename where user_id=[REQ.UID] and username=[REQD.VP.UNAME]',
'^[^[]*(\[[^]]*\])[^[]*(\[[^]]*\])$', '\1\2')
Which looks much nastier than it is. Note that Oracle follows the POSIX rules inside a bracket expression: a backslash there is a literal character, so a ] is excluded by placing it first in the set ([^]]) rather than by escaping it.
Pattern is:
^[^[]* Match all characters up to (but not including) the first [
(\[[^]]*\]) Capture into group 1 anything like [<not "]">]
[^[]* Match everything up to (but not including) the next [
(\[[^]]*\])$ Capture into group 2 anything like [<not "]">], at the end of the string
Then the replacement is simple, just <grp 1><grp 2>
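Both answers assume exactly two bracketed tokens. If the number can vary, the logic is to collect every [...] run and concatenate them. The sketch below shows that logic in Python rather than Oracle SQL, purely to illustrate the idea; the function name bracketed_tokens is made up.

```python
import re


def bracketed_tokens(sql_text: str) -> str:
    """Concatenate every [...] token, brackets included, in order of appearance."""
    return "".join(re.findall(r"\[[^\]]*\]", sql_text))


if __name__ == "__main__":
    query = ("select userid,username from tablename "
             "where user_id=[REQ.UID] and username=[REQD.VP.UNAME]")
    print(bracketed_tokens(query))
```

Note that Python, unlike Oracle's POSIX-style bracket expressions, does accept \] inside a character class.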