How can I make a regex that matches only if the string contains both letters and digits? If it contains only letters or only digits, it should return false.
E.g.:
123swift -> true
swift123 -> true
1231 -> false
swift -> false
My regex:
[a-z]|[0-9]
Use
^(?=.*?[A-Za-z])(?=.*?[0-9])[0-9A-Za-z]+$
Or, a presumably more efficient version:
^(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])[0-9A-Za-z]+$
Explanation:
NODE EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
[A-Za-z] any character of: 'A' to 'Z', 'a' to 'z'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
[0-9] any character of: '0' to '9'
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
[0-9A-Za-z]+ any character of: '0' to '9', 'A' to 'Z',
'a' to 'z' (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
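To try the pattern outside a regex tester, here is a minimal Python sketch. It uses the second pattern unchanged. The choice of Python and the helper name has_letter_and_digit are assumptions, not part of the answer:

```python
import re

# Lookahead 1: somewhere there is a letter. Lookahead 2: somewhere there is a digit.
# The consuming part then requires the whole string to be letters/digits only.
LETTER_AND_DIGIT = re.compile(r"^(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])[0-9A-Za-z]+$")


def has_letter_and_digit(s: str) -> bool:
    # fullmatch (rather than match) also rejects a trailing "\n",
    # which a bare "$" would otherwise allow.
    return LETTER_AND_DIGIT.fullmatch(s) is not None


for sample in ["123swift", "swift123", "1231", "swift"]:
    print(sample, "->", has_letter_and_digit(sample))
```

Using fullmatch is a deliberate choice here: as the table notes, $ also matches before a final newline, and fullmatch closes that gap.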
Related
I need to write a REGEXP_REPLACE query for a spark.sql() job.
Only if the value follows the pattern below should the words before the first hyphen be extracted and assigned to the target column 'name'; if the pattern doesn't match, the entire 'name' should be returned.
Pattern:
Values are hyphen-delimited. Any characters can appear before the first hyphen (digits, letters, special characters, or even spaces).
The first hyphen must be followed by exactly 2 words separated by a hyphen; each word may contain only digits, letters, or both (special characters and blanks are not allowed).
The two words must be followed by a hyphen, one or more digits, and another hyphen.
The last portion must consist of one or more digits only.
For example:
if name = abc45-dsg5-gfdvh6-9890-7685, output of REGEXP_REPLACE = abc45
if name = abc, output of REGEXP_REPLACE = abc
if name = abc-gf5-dfg5-asd5-98-00, output of REGEXP_REPLACE = abc-gf5-dfg5-asd5-98-00
I have tried:
spark.sql("SELECT REGEXP_REPLACE(name , '-[^-]+-\\w{2}-\\d+-\\d+$','',1,1,'i') AS name").show();
But it does not work.
Use
^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$
Replace with $1. If $1 does not work, use \1; if \1 does not work, use \\1.
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[^-]* any character except: '-' (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
( group and capture to \2 (2 times):
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
[a-zA-Z0-9]+ any character of: 'a' to 'z', 'A' to
'Z', '0' to '9' (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
){2} end of \2 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \2)
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
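To sanity-check the pattern against the three examples, here is a small Python sketch. Python is not part of the Spark job, and extract_name is a hypothetical helper; note that Python's re.sub writes the back-reference as \1:

```python
import re

# Group 1 captures everything before the first hyphen; the rest requires exactly
# two alphanumeric words, then a numeric part, then a final numeric part.
PATTERN = re.compile(r"^([^-]*)(-[a-zA-Z0-9]+){2}-[0-9]+-[0-9]+$")


def extract_name(name: str) -> str:
    # Python's re.sub spells the back-reference \1 rather than $1.
    # When the pattern does not match, re.sub returns the input unchanged.
    return PATTERN.sub(r"\1", name)


for value in ["abc45-dsg5-gfdvh6-9890-7685", "abc", "abc-gf5-dfg5-asd5-98-00"]:
    print(value, "->", extract_name(value))
```

The third value fails because after two words (gf5, dfg5) the next part, asd5, is not purely numeric, so the whole string is kept, as required.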
In an Oracle SQL query there is a field with the following content (example):
{"ID Card": 0.29333333333333333} or {"Speedtest": 0.8166666666666667}
Can I use RegEx, for example, to output the field in the query so that only the numbers and the period remain?
Example:
Select ID, CREATEDFORMAT, INSERT_TS, regexp_substr(SCORE, '[^0-9]') xSCORE FROM MYTABLE
But with [^0-9] I only get the digits, without the decimal point.
If you are using Oracle Database 12.1.0.2 or higher and the number you are trying to parse out is always in a JSON object, you can use the JSON_VALUE function to pull the information out.
Query
WITH
sample_data
AS
(SELECT '{"ID Card": 0.29333333333333333}' AS sample_val FROM DUAL
UNION ALL
SELECT '{"Speedtest": 0.8166666666666667}' FROM DUAL)
SELECT s.sample_val, json_value (s.sample_val, '$.*') AS number_val
FROM sample_data s;
Result
SAMPLE_VAL NUMBER_VAL
____________________________________ ______________________
{"ID Card": 0.29333333333333333} 0.29333333333333333
{"Speedtest": 0.8166666666666667} 0.8166666666666667
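For comparison outside the database, the same "parse it as JSON instead of using a regex" idea looks like this in Python. Python's json module is swapped in here and is not part of the Oracle solution; single_value is a hypothetical helper:

```python
import json
from decimal import Decimal

samples = ['{"ID Card": 0.29333333333333333}', '{"Speedtest": 0.8166666666666667}']


def single_value(doc: str) -> Decimal:
    # parse_float=Decimal keeps the digits exactly as written instead of
    # converting them to a binary float.
    obj = json.loads(doc, parse_float=Decimal)
    # Like the '$.*' path above, take the value of the object's only member;
    # this raises ValueError if the object does not have exactly one member.
    (value,) = obj.values()
    return value


for s in samples:
    print(s, single_value(s))
```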
Use
REGEXP_SUBSTR(SCORE, '[-+]?[0-9]*\.?[0-9]+')
Explanation
--------------------------------------------------------------------------------
[-+]? any character of: '-', '+' (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
[0-9]* any character of: '0' to '9' (0 or more
times (matching the most amount possible))
--------------------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
--------------------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount possible))
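A quick Python check of the same pattern, with Python's re.search standing in for REGEXP_SUBSTR (first_number is a hypothetical helper). The last case shows a caveat: the first numeric run wins, so digits inside a key would be returned instead of the value:

```python
import re

# Same pattern as the REGEXP_SUBSTR call: optional sign, optional integer part,
# optional decimal point, then at least one digit.
NUMBER = re.compile(r"[-+]?[0-9]*\.?[0-9]+")


def first_number(text: str):
    # re.search returns the leftmost match, i.e. the first occurrence,
    # which is what REGEXP_SUBSTR returns with its default arguments.
    m = NUMBER.search(text)
    return m.group() if m else None


for s in ['{"ID Card": 0.29333333333333333}', '{"texts": -65}', '{"ID2": 0.5}']:
    print(s, "->", first_number(s))
```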
Use: REGEXP_SUBSTR (s.sample_val, '[+-]?[0-9]+(\.[0-9]+)?')
See this demo: https://dbfiddle.uk/?rdbms=oracle_18&fiddle=76c2b3be1d7d266f217d6b0541478c17
Result:
SAMPLE_VAL NUMBER_VAL
---------------------------------- --------------------
{"ID Card": 0.29333333333333333} 0.29333333333333333
{"Speedtest": 0.8166666666666667} 0.8166666666666667
{"texts": 12.3456} 12.3456
{"texts": -65} -65
This is a modification of Ryszard Czech's post.
Aren't the regular expressions [[:blank:]] and \s the same?
The queries below give two different results.
select regexp_replace('Greg94/Eric99Chandler/Faulkner','/','')
from dual
where regexp_like(trim('Greg94/Eric99Chandler/Faulkner'),'[^[[:blank:]]]');
The above query returns no rows, whereas when I replace [^[[:blank:]]] with [^\s] it returns the row.
The problem is that you are nesting [[:blank:]] inside another bracket expression instead of using [:blank:] directly.
The regular expression [^[[:blank:]]] is evaluated as:
[^[[:blank:]]: any single character that is neither "[" nor a blank
]: a literal "]" character that must follow it
Since your string contains no "]", nothing matches. Either remove the last character "]", which is what keeps rows from being returned, or correct the expression:
[^[:blank:]]
[^\s] is correct.
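Python's re has no POSIX character classes, so the sketch below approximates [:blank:] with a plain space (an assumption for illustration only) to show how the broken and the corrected expressions are read:

```python
import re

# [^[[:blank:]]] is read as: one character that is neither '[' nor a blank,
# immediately followed by a literal ']'. [:blank:] is approximated by a space.
BROKEN = re.compile(r"[^\[ ]\]")
# [^[:blank:]]: any single character that is not a blank.
FIXED = re.compile(r"[^ ]")

s = "Greg94/Eric99Chandler/Faulkner"
print(BROKEN.search(s))               # None, because s contains no ']'
print(FIXED.search(s).group())        # G
print(BROKEN.search("abc]").group())  # c]
```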
That would be
SQL> SELECT regexp_replace('Greg94/Eric99Chandler/Faulkner','/','') as result
2 FROM dual
3 WHERE REGEXP_LIKE(TRIM('Greg94/Eric99Chandler/Faulkner'), '[^[:blank:]]');
RESULT
--------------------------------------------------
Greg94Eric99ChandlerFaulkner
SQL>
SQL> SELECT regexp_replace('Greg94/Eric99Chandler/Faulkner','/','') as result
2 FROM dual
3 WHERE NOT REGEXP_LIKE(TRIM('Greg94/Eric99Chandler/Faulkner'), '[[:blank:]]');
RESULT
--------------------------------------------------
Greg94Eric99ChandlerFaulkner
SQL>
SQL> SELECT regexp_replace('Greg94/Eric99Chandler/Faulkner','/','') as result
2 FROM dual
3 WHERE REGEXP_LIKE(TRIM('Greg94/Eric99Chandler/Faulkner'), '[^\s]');
RESULT
--------------------------------------------------
Greg94Eric99ChandlerFaulkner
SQL>
Pick the one you like the most. Besides, if you found what works, why don't you simply use it (and forget about the one that doesn't work)? (I guess I know why: because of "but WHY???".)
Perhaps a clearer test would be to generate some strings containing various whitespace characters and then use case expressions to see whether they match different regexes.
with demo (str) as
( select ':' from dual union all
select 'a' from dual union all
select 'b' from dual union all
select 'c' from dual union all
select 'contains'||chr(9)||'tabs' from dual union all
select 'contains'||chr(10)||chr(13)||'linebreaks' from dual union all
select 'contains some spaces' from dual
)
select str
, case when regexp_like(str,'[:blank:]') then 'Y' end as "[:blank:]"
, case when regexp_like(str,'[[:blank:]]') then 'Y' end as "[[:blank:]]"
, case when regexp_like(str,'[[:space:]]') then 'Y' end as "[[:space:]]"
, case when regexp_like(str,'\s') then 'Y' end as "\s"
from demo
order by 1;
STR                  [:blank:] [[:blank:]] [[:space:]] \s
-------------------- --------- ----------- ----------- --
:                    Y
a                    Y
b                    Y
c
contains tabs        Y                     Y           Y
contains             Y                     Y           Y
linebreaks
contains some spaces Y         Y           Y           Y
(I manually edited the result for the row with tabs to align the results, otherwise the tab messes it up and makes it harder to read.)
[:blank:] matches any of :, b, l, a, n, k, because a character class is only valid within a [] bracket expression.
[[:blank:]] only matches spaces.
[[:space:]] matches tab, newline, carriage return and space.
\s is the same as [[:space:]]
As for your example, it is not behaving as you expected in two different ways.
Firstly, [^[[:blank:]]] should be [^[:blank:]] - that is, the character class [:blank:] within a bracketed expression.
Secondly, the corrected syntax still returns a match when there are no blanks because it looks for any character that is not a space, for example the first character G is not a space so it matches the expression:
regexp_like('Greg94/Eric99Chandler/Faulkner','[^ ]');
To identify strings that do not contain any whitespace character, you should use:
not regexp_like(str,'\s')
or
not regexp_like(str, '[[:space:]]')
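The same "does this string contain any whitespace?" check can be sketched in Python. Python's \s also covers other Unicode whitespace, so treat this as an approximation of Oracle's behavior; both helper names are hypothetical:

```python
import re


def has_no_whitespace(s: str) -> bool:
    # Analogue of NOT REGEXP_LIKE(str, '\s'): true only when no whitespace
    # character occurs anywhere in the string.
    return re.search(r"\s", s) is None


def has_non_space(s: str) -> bool:
    # Analogue of REGEXP_LIKE(str, '[^ ]'): true as soon as ONE character
    # is not a space, which is why it matches almost every string.
    return re.search(r"[^ ]", s) is not None


for sample in ["Greg94/Eric99Chandler/Faulkner", "contains\ttabs", "contains some spaces"]:
    print(repr(sample), has_no_whitespace(sample), has_non_space(sample))
```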
Why does this fail for the first character in Oracle SQL?
select DECODE( TRANSLATE('1','123',' '), NULL, 'number','contains char') from dual
This works because 1 is the second digit
select DECODE( TRANSLATE('1','4123',' '), NULL, 'number','contains char') from dual
But this fails because 4 is the first digit
select DECODE( TRANSLATE('4','423',' '), NULL, 'number','contains char') from dual
First, let's look at the definition of the TRANSLATE function:
TRANSLATE(expr, from_string, to_string): TRANSLATE returns expr with all occurrences of each character in from_string replaced by its corresponding character in to_string. Characters in expr that are not in from_string are not replaced. If expr is a character string, then you must enclose it in single quotation marks. The argument from_string can contain more characters than to_string. In this case, the extra characters at the end of from_string have no corresponding characters in to_string. If these extra characters appear in char, then they are removed from the return value.
For example, in TRANSLATE(some_string,'123','abc'), 1 is replaced by a, 2 by b, and 3 by c. (From here on, I use an arrow -> instead of "replaced by".)
Now let's take a look at our examples:
TRANSLATE('1','123',' '): 1 -> " ", 2->nothing, 3->nothing.
(nothing means removed from the return value, see definition)
The result of the above function is a string consisting of a single space: " ".
TRANSLATE('1','4123',' '): 4 -> " ", 1->nothing, 2->nothing, 3->nothing
The result of the above function is the empty string "". Oracle Database interprets the empty string as null, and if this function has a null argument, then it returns null. Because DECODE treats two nulls as equal, the query returns 'number'.
TRANSLATE('4','423',' '): 4->" ", 2->nothing, 3->nothing
The result of the above function is a single space, as in the first example.
That is why you get "contains char" for the first and third queries, and "number" for the second one.
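To make the walkthrough concrete, here is a rough Python emulation of TRANSLATE and the DECODE check. oracle_translate and classify are hypothetical helpers written for this illustration (not an Oracle API), empty strings stand in for Oracle's NULL, and from_string is assumed to have no repeated characters:

```python
def oracle_translate(expr, from_string, to_string):
    """Approximate Oracle TRANSLATE; '' and None both stand in for NULL."""
    if not expr or not from_string or not to_string:
        return None  # a NULL argument gives a NULL result
    table = {}
    for i, ch in enumerate(from_string):
        # Characters with a counterpart in to_string are replaced; the extra
        # ones at the end of from_string are deleted (mapped to None).
        table[ord(ch)] = to_string[i] if i < len(to_string) else None
    result = expr.translate(table)
    return result or None  # Oracle treats '' as NULL


def classify(expr, from_string, to_string=" "):
    # DECODE(TRANSLATE(...), NULL, 'number', 'contains char')
    if oracle_translate(expr, from_string, to_string) is None:
        return "number"
    return "contains char"


for args in [("1", "123"), ("1", "4123"), ("4", "423")]:
    print(args, classify(*args))
```

Only the character at the first position of from_string is mapped to the space; every later one is deleted. That is why the outcome depends on where the tested digit sits in from_string.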
For a markup language I'm trying to parse, I decided to give parser generation a try with ANTLR. I'm new to the field, and I'm messing something up.
My grammar is
grammar Test;
DIGIT : ('0'..'9');
LETTER : ('A'..'Z');
SLASH : '/';
restriction
: ('E' ap)
| ('L' ap)
| 'N';
ap : LETTER LETTER LETTER;
car : LETTER LETTER;
fnum : DIGIT DIGIT DIGIT DIGIT? LETTER?;
flt : car fnum?;
message : 'A' (SLASH flt)? (SLASH restriction)?;
This does exactly what I want when I give it the input string A/KK543/EPOS. However, when I give it A/KL543/EPOS, it fails (MismatchedTokenException(9!=5)). It seems like some sort of conflict: it wants to generate a restriction on the first L, so I seem to be doing something wrong in the language definition, but I can't figure out what.
For the input "A/KK543/EPOS", the following tokens are created:
'A' 'A'
SLASH '/'
LETTER 'K'
LETTER 'K'
DIGIT '5'
DIGIT '4'
DIGIT '3'
SLASH '/'
'E' 'E'
LETTER 'P'
LETTER 'O'
LETTER 'S'
But for the input "A/KL543/EPOS", these are created:
'A' 'A'
SLASH '/'
LETTER 'K'
'L' 'L'
DIGIT '5'
DIGIT '4'
DIGIT '3'
SLASH '/'
'E' 'E'
LETTER 'P'
LETTER 'O'
LETTER 'S'
As you can see, the char 'L' does not get tokenized as a LETTER. For the literal tokens 'A', 'E', 'L' and 'N' inside your parser rules, ANTLR automatically creates separate lexer rules that are placed before all other lexer rules. This causes your lexer to look like this behind the scenes:
A : 'A';
E : 'E';
L : 'L';
N : 'N';
DIGIT : '0'..'9';
LETTER : 'A'..'Z';
SLASH : '/';
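To see why the order matters, here is a toy Python lexer. It is an illustration only, not ANTLR's implementation, and it assumes the usual tie-breaking: the longest match wins, and among equally long matches the rule defined first wins. The rule names mirror the listing above:

```python
import re

# Rules in the order shown above: the implicit literal tokens come first.
RULES = [
    ("A", re.compile(r"A")),
    ("E", re.compile(r"E")),
    ("L", re.compile(r"L")),
    ("N", re.compile(r"N")),
    ("DIGIT", re.compile(r"[0-9]")),
    ("LETTER", re.compile(r"[A-Z]")),
    ("SLASH", re.compile(r"/")),
]


def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept, because
            # best is only replaced by a strictly longer match.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches {text[pos]!r} at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


for name, text in tokenize("A/KL543/EPOS"):
    print(name, repr(text))
```

Running it on "A/KL543/EPOS" reproduces the token stream listed earlier: the 'L' comes out as L, not LETTER.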
Therefore, a single 'A', 'E', 'L' or 'N' will never become a LETTER token. This is simply how ANTLR works. If you want to match them as letters, you'll need to create a parser rule, letter, that also matches these tokens. Something like this:
grammar Test;

message
: A (SLASH flt)? (SLASH restriction)?
;
flt
: car fnum?
;
fnum
: DIGIT DIGIT DIGIT DIGIT? letter?
;
restriction
: E ap
| L ap
| N
;
ap
: letter letter letter
;
car
: letter letter
;
letter
: A
| E
| L
| N
| LETTER
;
A : 'A';
E : 'E';
L : 'L';
N : 'N';
DIGIT : '0'..'9';
LETTER : 'A'..'Z';
SLASH : '/';
which will parse the input "A/KL543/EPOS" like this: