I am trying to achieve the following in the context of NetSuite saved search results output.
1. Remove every character from the first hyphen (-) or colon (:) onward, including any space immediately before that character.
For example:
Input: test 123 - xyz : 123
Output: test 123 (the space right before the hyphen must be removed as well).
I tried the following two formulas:
SUBSTR({custitem123}, 0, INSTR({custitem123}, '-')-1)
SUBSTR({custitem123}, 0, INSTR({custitem123}, ':')-1)
Each works fine on its own, so I am trying to combine them into a single formula that looks for either character and removes everything from it onward. It should also remove any space immediately before the hyphen or colon. I am not sure how to achieve this.
2. Remove all non-alphabet characters & space before the alphabet characters (if any).
for e.g. Input: 1. Test XYZ
This should have Output as:
Test XYZ
I tried achieving this with the formula below:
REGEXP_REPLACE({class}, '[^A-Za-z ]', '')
The problem with this approach is it fails to replace the space character before the first alphabet of Test. I understand this is because I told it to skip replacing space characters. What I don't know is how do I tell it to only replace the space that it finds before the first alphabet character.
In short, how do I make sure the output is:
Test XYZ
And not
Test XYZ (that has a space before Test)
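You can use REGEXP_SUBSTR:
regexp_substr({custitem123}, '[^-]+') extracts everything before the first hyphen, i.e. test 123 followed by a trailing space, from the input test 123 - xyz : 123.
To stop at a colon as well, add it to the character class: '[^-:]+'.
If you also wrap the call in TRIM, the surrounding whitespace is removed,
e.g. trim(regexp_substr({custitem123}, '[^-]+')) gives test 123 as trimmed output.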
Use RTRIM instead of TRIM to remove only the trailing whitespace, like this:
RTRIM(regexp_substr({custitem123}, '[^-]+'))
test 123 - xyz : 123 resolves to test 123
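The same two clean-ups can be sketched outside NetSuite to check the logic. This is a minimal Python illustration, not a NetSuite formula; the function names `first_segment` and `letters_only` are made up for the example. The formula equivalents would presumably be RTRIM(REGEXP_SUBSTR({custitem123}, '[^-:]+')) and LTRIM(REGEXP_REPLACE({class}, '[^A-Za-z ]', '')), but those have not been tested in a saved search here.

```python
import re

def first_segment(value: str) -> str:
    """Keep the text before the first '-' or ':' and drop trailing whitespace."""
    # '[^-:]+' matches the longest leading run without a hyphen or colon.
    match = re.match(r"[^-:]+", value)
    return match.group(0).rstrip() if match else ""

def letters_only(value: str) -> str:
    """Drop everything except letters and spaces, then remove leading spaces."""
    return re.sub(r"[^A-Za-z ]", "", value).lstrip()

print(first_segment("test 123 - xyz : 123"))  # test 123
print(letters_only("1. Test XYZ"))            # Test XYZ
```

Stripping the leading spaces as a separate step avoids having to express "only the space before the first letter" inside the regex itself.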
Also thanks for asking this question helped me solve my own similar issue :D
Related
I'm trying to write a function that removes any occurrence of any of the 26 alphabet letters from a string.
In: 'AA123A' -> Out: '123'
In: 'AB-123-CD%' -> Out: '-123-%'
All I can find on Google is how to remove non-numeric characters, which all seem to be formed around defining the numbers you want to keep. But I want to keep any symbols too.
The 'simple' answer is 26 nested REPLACE for each letter, but I can't believe there isn't a better way to do it.
I could define a string of A-Z and loop through each character, calling the REPLACE 26 times - makes the code simpler but is the same functionally.
Does anyone have an elegant solution?
If I understand correctly, you can use TRANSLATE, e.g.:
SELECT REPLACE(TRANSLATE('AB-123- CDdcba%', 'ABCDabcd','        '), ' ', '');
SELECT REPLACE(TRANSLATE('AB-123- CDdcba%', 'ABCDabcd','AAAAAAAA'), 'A', '');
The first statement also removes any existing spaces;
the second one preserves them.
Just add the rest of the letters to the 'ABCDabcd' argument and keep the third argument (the eight spaces or 'AAAAAAAA') the same length as the second argument.
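For comparison, the same "map the unwanted characters away" idea can be sketched in Python, where `str.translate` accepts a deletion table directly. This is only an illustration of the technique, not SQL; `remove_letters` is a hypothetical helper name.

```python
import string

# Deletion table: every ASCII letter is dropped, everything else is untouched.
DROP_LETTERS = str.maketrans("", "", string.ascii_letters)

def remove_letters(value: str) -> str:
    """Remove all 26 letters (both cases) while keeping digits, symbols, and spaces."""
    return value.translate(DROP_LETTERS)

print(remove_letters("AA123A"))      # 123
print(remove_letters("AB-123-CD%"))  # -123-%
```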
We have a problem with a regular expression in Hive.
We need to exclude numbers that start with +37 or 0037 (for these, regexp_like should return false), and a valid number must not contain letters or spaces.
We're trying with this one:
regexp_like(tel_number,'^\+37|^0037+[a-zA-ZÀÈÌÒÙ ]')
but it doesn't work.
Edit: we want it to come out from the select as true (correct number) or false.
To exclude numbers which start with +01 or +001 or +0001 and consist only of digits, without spaces or letters:
... WHERE tel_number NOT rlike '^\\+0{1,3}1\\d+$'
Special characters like + and character classes like \d in Hive should be escaped using a double backslash: \\+ and \\d.
The general question is whether you want to describe a malformed telephone number in your regex and exclude everything that matches the pattern, or describe a well-formed telephone number and include everything that matches the pattern.
Which way to go depends on your scenario. From what I understand of your requirements, adding "not starting with 0037 or +37" as a condition to a well-formed telephone number could be a good approach.
The pattern would be like this:
Your number can start with either + or 00: ^(\+|00)
It cannot be followed by a 37 which in regex can be expressed by the following set of alternatives:
a. It is followed first by a 3 then by anything but 7: 3[0-689]
b. It is followed first by anything but 3 then by any number: [0-24-9]\d
After that there is a sequence of numbers of undefined length (at least one) until the end of the string: \d+$
Putting everything together:
^(\+|00)(3[0-689]|[0-24-9]\d)\d+$
You can play with this regex here and see if this fits your needs: https://regex101.com/r/KK5rjE/3
Note: as leftjoin has pointed out: To use this regex in hive you might need to additionally escape the backslashes \ in the pattern.
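To see how the assembled pattern behaves, here is a small Python sketch. The sample numbers and the `is_accepted` helper are invented for illustration, and Python's `re` module stands in for Hive's regex engine, so escaping rules differ from what you would type in a Hive query.

```python
import re

# Well-formed number: '+' or '00' prefix, a country code other than 37, then digits only.
WELL_FORMED = re.compile(r"^(\+|00)(3[0-689]|[0-24-9]\d)\d+$")

def is_accepted(tel_number: str) -> bool:
    return WELL_FORMED.match(tel_number) is not None

for sample in ["+3912345", "0039123", "+3712345", "003712345", "+39 123"]:
    print(sample, is_accepted(sample))
```

Note that this pattern requires a + or 00 prefix, so a number without any prefix is rejected as well.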
You can use
regexp_like(tel_number,'^(?!\\+37|0037)\\+?\\d+$')
Details:
^ - start of string
(?!\+37|0037) - a negative lookahead that fails the match if there is +37 or 0037 immediately to the right of the current location
\+? - an optional + sign
\d+ - one or more digits
$ - end of string.
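A quick way to check the lookahead version is to run it over a few inputs. The sketch below uses Python's `re` module as a stand-in for Hive (so single backslashes suffice); the sample numbers and the `is_valid` name are assumptions made for the example.

```python
import re

# Reject anything starting with +37 or 0037, then require an optional '+' and digits only.
NOT_37 = re.compile(r"^(?!\+37|0037)\+?\d+$")

def is_valid(tel_number: str) -> bool:
    return NOT_37.match(tel_number) is not None

for sample in ["+3912345", "12345", "+3712345", "003712345", "+39a12"]:
    print(sample, is_valid(sample))
```

Unlike a pattern that demands a + or 00 prefix, this one also accepts plain digit strings such as 12345.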
I'm trying to write a regex that matches the numbers 456725 to 456744 (Last 2 digits, 25-44), but can't seem to figure out a correct regex format. I've tried ^(4567[2-4][0-9]) but using this also matches 456745 which it shouldn't.
If you write it as ^(4567[2-4][0-9]), you allow any digit in the range [2-4] followed by any digit in the range [0-9], so everything from 456720 to 456749 matches, which is not what you wanted.
So you need to change it to something like:
^4567(?:2[5-9]|3[0-9]|4[0-4])
Explanation
^ asserts position at start of the string
4567 matches the characters 4567 literally
Non-capturing group (?:2[5-9]|3[0-9]|4[0-4])
  1st Alternative 2[5-9]
    2 matches the character 2 literally
    Match a single character present in the list [5-9]
  2nd Alternative 3[0-9]
    3 matches the character 3 literally
    Match a single character present in the list [0-9]
  3rd Alternative 4[0-4]
    4 matches the character 4 literally
    Match a single character present in the list [0-4]
You could use the page regex101 to learn more and read good explanations on the subject. Hope it helps.
If your variable is just an integer it is best to just compare it as such...
For the regex, though: the ^(4567 part is correct; your issue is that [2-4] and [0-9] are independent of each other. You need to put the pieces together so that only 25-29, 30-39, and 40-44 are allowed.
This should get you on the right track:
^(4567(?:2[5-9]|3[0-9]|4[0-4]))$
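One way to convince yourself that the alternation covers exactly 456725 to 456744 is to test it against every number from 456700 to 456799. The Python sketch below does that and also shows the plain integer comparison mentioned above; the helper name `in_range` is made up for the example.

```python
import re

IN_RANGE = re.compile(r"^4567(?:2[5-9]|3[0-9]|4[0-4])$")

# Collect every number in 456700..456799 that the regex accepts.
matched = [n for n in range(456700, 456800) if IN_RANGE.match(str(n))]
print(matched[0], matched[-1], len(matched))  # 456725 456744 20

def in_range(n: int) -> bool:
    """Integer comparison: simpler when the value is already a number."""
    return 456725 <= n <= 456744
```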
From within an Oracle 11g database, using SQL, I need to remove the following sequence of special characters from a string, i.e.
~!##$%^&*()_+=\{}[]:”;’<,>./?
If any of these characters exist within a string, I would like them completely removed. The only exceptions are these two characters, which I DO NOT want removed: "|" and "-".
For example:
From: 'ABC(D E+FGH?/IJK LMN~OP' To: 'ABCD EFGHIJK LMNOP' after removal of special characters.
I have tried this small test which works for this sample, i.e:
select regexp_replace('abc+de)fg','\+|\)') from dual
but is there a better means of using my sequence of special characters above without doing this string pattern of '\+|\)' for every special character using Oracle SQL?
You can replace anything other than letters and space with empty string
[^a-zA-Z ]
As per below comments
I still need to keep the following two special characters within my string, i.e. "|" and "-".
Just exclude more
[^a-zA-Z |-]
Note: hyphen - should be in the starting or ending or escaped like \- because it has special meaning in the Character class to define a range.
For more info read about Character Classes or Character Sets
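The whitelist idea can be checked quickly in Python. This sketch assumes the class keeps letters, spaces, "|" and "-" (and therefore also removes digits); the function name is hypothetical, and Python's `re` is used only to illustrate the character class, not Oracle's REGEXP_REPLACE.

```python
import re

def keep_letters_space_pipe_hyphen(value: str) -> str:
    # The hyphen sits at the end of the class, so it is literal rather than a range.
    return re.sub(r"[^a-zA-Z |-]", "", value)

print(keep_letters_space_pipe_hyphen("ABC(D E+FGH?/IJK LMN~OP"))  # ABCD EFGHIJK LMNOP
print(keep_letters_space_pipe_hyphen("A|B-C+D"))                  # A|B-CD
```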
Consider using this regex replacement instead:
REGEXP_REPLACE('abc+de)fg', '[~!##$%^&*()_+=\\{}[\]:”;’<,>.\/?]', '')
The replacement will match any character from your list.
The regex to match your sequence of special characters is:
[]~!##$%^&*()_+=\{}[:”;’<,>./?]+
I suspect you have not yet escaped all of the regex-special characters.
To get there, work iteratively:
build a test string and build up your regex character by character, checking each time that it removes what you expect to be removed.
If the latest character does not work, you have to escape it.
That should do the trick.
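Where the regex is built in a host language rather than typed by hand, the escaping can be delegated to a library. The Python sketch below uses `re.escape` to build a character class from the list of special characters exactly as written in the question; it illustrates the blacklist approach in Python only and says nothing about Oracle's own escaping rules. `SPECIALS` and `remove_specials` are names chosen for the example.

```python
import re

# The characters to remove, copied as written in the question ('|' and '-' are not in it).
SPECIALS = r"~!##$%^&*()_+=\{}[]:”;’<,>./?"

# re.escape backslash-escapes every regex-special character, so the class is safe to build.
PATTERN = re.compile("[" + re.escape(SPECIALS) + "]")

def remove_specials(value: str) -> str:
    return PATTERN.sub("", value)

print(remove_specials("ABC(D E+FGH?/IJK LMN~OP"))  # ABCD EFGHIJK LMNOP
print(remove_specials("a|b-c"))                    # a|b-c
```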
SELECT TRANSLATE('~!##$%sdv^&*()_+=\dsv{}[]:”;’<,>dsvsdd./?', '~!##$%^&*()_+=\{}[]:”;’<,>./?',' ')
FROM dual;
result:
TRANSLATE
-------------
sdvdsvdsvsdd
SQL> select translate('abc+de#fg-hq!m', 'a+-#!', 'a') from dual;
TRANSLATE(
----------
abcdefghqm
What is the regex for these cases:
29000.12345678900, expected result 29000.123456789
29000.000, expected result 29000
29000.00003400, expected result 29000.000034
In short, I want to remove the trailing zeros after the decimal point when no digit 1-9 follows them, and I also want to remove the dot (.) when the number can effectively be considered an integer.
I use this regex
(?:.0*$|0*$)
but it gives me this result:
29123.6 from 29123.6400, 4 is gone from there.
When I tested the regex separately, it works perfectly,
.0*$ gives me 29123 from 29123.0000
0*$ gives me 29123.6423 from 29123.642300
Am I missing something with the combined regex?
If you think regex is the best way of doing it, you can just use something like this:
\.?0+$
It works for both cases:
> '12300000.000001130000000'.replace(/\.?0+$/g, '')
"12300000.00000113"
> '12300000.000000000000'.replace(/\.?0+$/g, '')
"12300000"
You can use this regex
^\d+(\.\d*[1-9])?
^ marks the start of the string; it is necessary for the pattern to match.
(\.\d*[1-9])? matches only if the digits after the . end with [1-9].
That solves your problem.
You simply want this:
^\d*(\.?\d*[1-9])?
^\d* means zero or more digits at the start of the string, before the first group.
The () describes a matching group.
\.? means a single dot (.) can be there but is optional. eg. (.)
\d* means there can be zero or more digits. eg. (1234)
\.?\d* means there can be one dot and zero or more digits. eg. (.123)
[1-9] matches a single digit from 1 to 9, excluding 0, so the group must end in a non-zero digit. eg. (2344)
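Both patterns above work by matching the part of the number you want to keep, rather than replacing the part you want to drop. A minimal Python sketch of that usage follows; `significant_part` is a hypothetical helper, and Python's `re` is assumed to behave like the regex engine in question for these simple constructs.

```python
import re

PATTERNS = [r"^\d+(\.\d*[1-9])?", r"^\d*(\.?\d*[1-9])?"]

def significant_part(number: str, pattern: str = PATTERNS[0]) -> str:
    """Return the leading match: the integer part plus decimals up to the last non-zero digit."""
    match = re.match(pattern, number)
    return match.group(0) if match else number

print(significant_part("29000.12345678900"))  # 29000.123456789
print(significant_part("29000.000"))          # 29000
print(significant_part("29000.00003400"))     # 29000.000034
```

Because the decimal group is optional, a number whose decimals are all zeros falls back to just the integer part.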
I don't know whether Objective-C supports something like the following construct, but in Python you can do it completely without regular expressions using str.rstrip():
In [1]: def shorten_number(number):
   ...:     return number.rstrip('0').rstrip('.')
In [2]: shorten_number('29000.12345678900')
Out[2]: '29000.123456789'
In [3]: shorten_number('29000.000')
Out[3]: '29000'
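One caveat applies to both the rstrip approach and the \.?0+$ replacement suggested earlier: given an integer string with no decimal point, such as '29000', each would strip the integer's own zeros. The sketch below adds a guard for that case, which is an addition for illustration and not part of either answer. It also shows why the original combined pattern loses the 4: the unescaped . matches any character. The function names are made up for the example.

```python
import re

def shorten_regex(number: str) -> str:
    # Guard (an assumption added here): leave strings without a decimal point alone.
    if "." not in number:
        return number
    return re.sub(r"\.?0+$", "", number)

def shorten_rstrip(number: str) -> str:
    if "." not in number:
        return number
    return number.rstrip("0").rstrip(".")

for sample in ["29000.12345678900", "29000.000", "29000.00003400", "29000"]:
    print(sample, shorten_regex(sample), shorten_rstrip(sample))

# In the original pattern the unescaped '.' matches the '4', so '.0*$' consumes '400'.
print(re.sub(r"(?:.0*$|0*$)", "", "29123.6400"))  # 29123.6
```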