CASE
WHEN <in_data> LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]' THEN SUBSTR(<in_data>,1,3)
ELSE '000'
END
We're doing a migration project from Sybase to Teradata, and having a problem figuring this one out :) I'm still new to Teradata.
I would like to ask the equivalent TD code for this -
LIKE '[0-9][0-9][0-9][0-9][0-9][0-9]' to Teradata
Basically, it just checks whether the digits are numeric value.
Can someone give me a hint on this
You can also use REGEXP_SUBSTR to directly extract the three digits:
COALESCE(REGEXP_SUBSTR(in_data,'^[0-9]{3}(?=[0-9]{3}$)'), '000')
This looks for the first three digits and then does a lookahead for three following digits without adding them to the overall match.
^ indicates the begin of the string, '$' the end, so there are no other characters before or after the six digits. (?=...) is a so-called "lookahead", i.e. those three digits are checked, but ignored.
If there's no match the regex returns NULL which is changed to '000'.
You need to use regexp instead of like, since [0-9][0-9][0-9][0-9][0-9][0-9] is a regular expression.
To do an exact match, you need to add anchors. ie, to match the string which contains an exact 6 digit chars.
regexp '^[0-9]{6}$'
or
regexp '^[[:digit:]]{6}$'
Related
I am trying to split a field by delimiter in LookML. This field either follows the format of:
Managers (AE)
Managers (AE - MM)
I was able to split to first case using this
sql: case
when rlike (${user_role_name}, '^.*[\\(\\)].*$') then split_part(${user_role_name}, ' ', -1)
However, I haven't been able to get the 2nd case to do the same. It's in a case statement so I am going to add another when statement, but am not able to figure out the regex for parentheses that contains spaces.
Thanks in advance for the help!
By "split" the string, I think you mean you want to extract the part in parentheses, right?
I would do this using a regex substring method. You didn't mention what warehouse you're using, and the syntax will vary a little, but on snowflake that would look like:
regexp_substr(${user_role_name}, '\\([^)]*\\)')
So, for example, with the inputs you gave:
select regexp_substr('Managers (AE)', '\\([^)]*\\)')
union all
select regexp_substr('Managers (AE - MM)', '\\([^)]*\\)')
result
(AE)
(AE - MM)
I am using Snowflake SQL. I would like to remove characters from a string after a special character ~. How can I do that?
here is the whole scenario. Let me explain. I do have a string like 'CK#123456~fndkjfgdjkg'. Now, i want only the number after #.And not anything after ~. This is number length varies for that field value. It might be 1 or 5 or 3. And i want to add the condition in where class where this number is equal to check_num from other table after joining. I am trying REGEXP_SUBSTR(A.SRC_TXT, '(?<=CK#)(.+?\b)') = C.CHK_NUM in the where condition. I am getting the error as 'No repititive argument after ?'
You can use a regex for this
-- To remove just the character after a ~
select regexp_replace('fo~o bar','~.', '');
-- returns 'fo bar'
--If you want to keep the ~
select regexp_replace('fo~o bar','~.', '~');
-- returns 'fo~ bar'
--If you want to remove everything after the ~
select regexp_replace('fo~o bar','~.*', '');
--returns 'fo'
If you need to remove other specific character sets after a ~, you can probably do this with a slightly more complicated regex, but I'd need examples of your desired input/output to help with that.
EDIT for updated question
This regex replace should get what you need.
select regexp_replace('CK#123456~fndkjfgdjkg','CK#(\\d*)~.*', '\\1');
-- returns 123456
(\\d*) gets ANY number of digits in a row, and the \\1 causes it to replace the match with what was in the first set of parenthesis, which is your list of digits. the CK# and ~.* are there to make sure the whole string gets matched and replaced.
If the CK# can vary as well, you can use .*? like this.
select regexp_replace('ABCD123HI#123456~fndkjfgdjkg','.*?#(\\d*)~.*', '\\1')
-- returns 123456
I'd probably do something like the following, easy enough but not as cool as RegEx type of functions.
set my_string='fooo~12345';
set search_for_me = '~';
SELECT SUBSTR($my_string, 1, DECODE(position($search_for_me, $my_string), 0, length($my_string), position($search_for_me, $my_string)));
I hope this helps...Rich
It looks like lookahead and lookbehinds are not supported in REGEXP functions, they seem to work in the PATTERN clause of a LIST command. Snowflake documentation makes no mention either way of lookahead or lookbehinds.
In your example:
It seems that the query engine is looking for that repeating argument, where you are attempting a lookbehind
You have not specified what you wanted extracted. You have two capture groups, but in this scenario everything would be returned
Since you are looking to remove everything after ~ you have a delimiter, why not use it in your REGEXP_SUBSTR function?
Try the following:
SELECT $1,REGEXP_SUBSTR($1,'\\w+#(.+?)~',1,1,'is',1)
FROM VALUES
('CK#123456~fndkjfgdjkg')
,('QH#128fklj924~fndkjfgdjkg')
;
This looks for:
One or more word characters
Followed by #
Capturing one or more characters upto and not including ~
Returns the characters within the capture group
You can change the .+? to \\d+? to make sure the pattern is only digits. Backslashes must be escaped with a backslash.
The descriptions for each argument of the function can be found here:
https://docs.snowflake.net/manuals/sql-reference/functions/regexp_substr.html
You could check this!!
select substr('CK#123456~fndkjfgdjkg',4,6) from dual;
OUTPUT
123456
https://docs.snowflake.net/manuals/sql-reference/functions/substr.html
I need to find rows where the phone number field contains unexpected characters.
Most of the values in this field look like:
123456-7890
This is expected. However, we are also seeing character values in this field such as * and #.
I want to find all rows where these unexpected character values exist.
Expected:
Numbers are expected
Hyphen with numbers is expected (hyphen alone is not)
NULL is expected
Empty is expected
Tried this:
WHERE phone_num is not like ' %[0-9,-,' ' ]%
Still getting rows where phone has numbers.
from https://regexr.com/3c53v address you can edit regex to match your needs.
I am going to use example regex for this purpose
select * from Table1
Where NOT REGEXP_LIKE(PhoneNumberColumn, '^[+]*[(]{0,1}[0-9]{1,4}[)]{0,1}[-\s\./0-9]*$')
You can use translate()
...
WHERE translate(Phone_Number,'a1234567890-', 'a') is NOT NULL
This will strip out all valid characters leaving behind the invalid ones. If all the characters are valid, the result would be NULL. This does not validate the format, for that you'd need to use REGEXP_LIKE or something similar.
You can use regexp_like().
...
WHERE regexp_like(phone_num, '[^ 0123456789-]|^-|-$')
[^ 0123456789-] matches any character that is not a space nor a digit nor a hyphen. ^- matches a hyphen at the beginning and -$ on the end of the string. The pipes are "ors" i.e. a|b matches if pattern a matches of if pattern b matches.
Oracle has REGEXP_LIKE for regex compares:
WHERE REGEXP_LIKE(phone_num,'[^0-9''\-]')
If you're unfamiliar with regular expressions, there are plenty of good sites to help you build them. I like this one
I have a questions regarding below data.
You clearly can see each EMP_IDENTIFIER has connected with EMP_ID.
So I need to pull only identifier which is 10 characters that will insert another column.
How would I do that?
I did some traditional way, using INSTR, SUBSTR.
I just want to know is there any other way to do it but not using INSTR, SUBSTR.
EMP_ID(VARCHAR2)EMP_IDENTIFIER(VARCHAR2)
62049 62049-2162400111
6394 6394-1368000222
64473 64473-1814702333
61598 61598-0876000444
57452 57452-0336503555
5842 5842-0000070666
75778 75778-0955501777
76021 76021-0546004888
76274 76274-0000454999
73910 73910-0574500122
I am using Oracle 11g.
If you want the second part of the identifier and it is always 10 characters:
select t.*, substr(emp_identifier, -10) as secondpart
from t;
Here is one way:
REGEXP_SUBSTR (EMP_IDENTIFIER, '-(.{10})',1,1,null,1)
That will give the 1st 10 character string that follows a dash ("-") in your string. Thanks to mathguy for the improvement.
Beyond that, you'll have to provide more details on the exact logic for picking out the identifier you want.
Since apparently this is for learning purposes... let's say the assignment was more complicated. Let's say you had a longer input string, and it had several groups separated by -, and the groups could include letters and digits. You know there are at least two groups that are "digits only" and you need to grab the second such "purely numeric" group. Then something like this will work (and there will not be an instr/substr solution):
select regexp_substr(input_str, '(-|^)(\d+)(-|$)', 1, 2, null, 2) from ....
This searches the input string for one or more digits ( \d means any digit, + means one or more occurrences) between a - or the beginning of the string (^ means beginning of the string; (a|b) means match a OR b) and a - or the end of the string ($ means end of the string). It starts searching at the first character (the second argument of the function is 1); it looks for the second occurrence (the argument 2); it doesn't do any special matching such as ignore case (the argument "null" to the function), and when the match is found, return the fragment of the match pattern included in the second set of parentheses (the last argument, 2, to the regexp function). The second fragment is the \d+ - the sequence of digits, without the leading and/or trailing dash -.
This solution will work in your example too, it's just overkill. It will find the right "digits-only" group in something like AS23302-ATX-20032-33900293-CWV20-3499-RA; it will return the second numeric group, 33900293.
i have to validate the data contains numeric or not and if it is not numeric return 0 and if it is numeric or empty return 1.
Below is my query i tried in SQL.
SELECT dbo.Regex('^[0-9]','123') -- This is returning 1.
SELECT dbo.Regex('^[0-9]','') -- this is not returning 1 but i want to return as 1 and i try to put space in "pattern" also it is not working...
please can any one help....
Thanks in advance
Try:
SELECT dbo.Regex('^[0-9]*$','123')
or better yet:
SELECT dbo.Regex('^\d*$','123')
\d is standard shorthand for [0-9]. Note that when you use [0-9] it means "match exactly one digit. The * wildcard means match zero or more so \d* means match zero or more digits, which includes an empty result.
Your Regex is not quite right: "^[0-9]" checks only, if the first character in the string is a number. Your regex would return 1 on "3abcde".
Use
"^\d*$"
to check if the field contains only numbers (or nothing). If you want to allow whitespace, use
"^\s*\d*\s*$"
If you want to allow signs and decimals, try
"^\s*(+|-)?\d*\.?\d*\s*$"
Try ^[0-9]*$, which will match 0 or more digits only:
SELECT dbo.Regex('^[0-9]*$','')
should return 1, as should...
SELECT dbo.Regex('^[0-9]*$','123')