regex decimal with negative - vb.net

with the help of this thread I have tweak a regex for my use.
Decimal number regular expression, where digit after decimal is optional
So far I have this /^-?[1-9]$|^,\d+$|^0,\d$|^[1-9]\d*,\d*$
It works for
-12
0
1
It does not work for
-1,
-1,1
what I want is:
-1,1
may a kind soul explain what I am missing or doing wrong with this regex pls? Thank you very much!
I am tweaking it with : https://regexr.com/6sjkh but it's been 2 hours and I still don't know what I am missing. I don't know if language is important but I am in visual basic.

The thread you are quoting in your question has already some good answers. However if you do not need to also capture + and do not care about numbers without the leading zero before the comma and no thousand separator I would recommend the following:
/^-?\d+(,\d+)?/
The ^ indicates the beginning so that no other text in front is allowed
The -? is allowing a negative sign ( you need to escape this with \ as - can be a special character
\d+ expects 1 or more digits
The last part in parentheses covers the fractional part which as a whole can occure 0 or one time due to the question mark at the end
, is the decimal separator (if you want to have dots you need to put her ., as a dot is the special character for any character and also needs to be escaped if you look for a real dot)
\d+ is again one or more digits after the comma
I hope this helps a bit. Besides this I can also recommend using an online regular expression tool like https://regex101.com/

Related

Regexp - recognize trailing 0 for decimal numbers?

I need a regular expression which can find a trail of "0" in the decimal space.
F.e. following format should be recognized:
1.0
1.00
1.000
etc...
Is there somekind of "wildcard" for that?
Any idea?
Thanks,
KS
Please see the following regular expression that matches trailing zeros on the end of any decimal. Note that this does match trailing zeros in the case where there is a meaningful decimal value, but zeros occur after.
https://regex101.com/r/BJvLrO/1/
When working with regex, it is very valuable to always use a tool like https://regex101.com or https://regexr.com. Both of these tools will help you truly understand the Regular Expression. Try hovering your mouse over the different elements of the regex in my example and the tool will describe each part. You can also read the "Explanation" section to on the right side.

REGEX Extract Amount Without Currency

SELECT
ocr_text,
bucket,
REGEXP_EXTRACT('-?[0-9]+(\.[0-9]+)?', ocr_text)
FROM temp
I am trying to extract amounts from a string that will not have currency present. Any number that does not have decimals should not match. Commas should be allowed assuming they follow the correct rules (at hundreds marker)
56 no (missing decimals)
56.45 yes
120 no (missing decimals)
120.00 yes
1200.00 yes
1,200.00 yes
1,200 no (missing decimals)
1200 no (missing decimals)
134.5 no (decimal not followed by 2 digits)
23,00.00 no (invalid comma location)
I'm a noob to REGEX so I know my above statement already does not meet the criteria i've listed. However, i'm already stuck getting the error (INVALID_FUNCTION_ARGUMENT) premature end of char-class on my REGEX_EXTRACT line
Can someone point me in the right direction? How can I resolve my current issue? How can I modify to correctly incorporate the other criteria listed?
Here is a general regex pattern for a positive/negative number with two decimal places and optional thousands comma separators:
(?<!\S)(?:-?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})|-?[0-9]+(\.[0-9]{2}))(?!\S)
Demo
Your updated query:
SELECT
ocr_text,
bucket,
REGEXP_EXTRACT(ocr_text, '(?<!\S)(?:-?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})|-?[0-9]+(\.[0-9]{2}))(?!\S)')
FROM temp;
From the Presto docs I read, it supposedly supports Java's regex syntax. In the event that lookarounds are not working, you may try this version:
SELECT
ocr_text,
bucket,
REGEXP_EXTRACT(ocr_text, '(\s|^)(?:-?[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{2})|-?[0-9]+(\.[0-9]{2}))(\s|$)')
FROM temp;
REGEXP_EXTRACT('^[-]?(\d*.\d*)', ocr_text)
Pattern: ^[-]?(\d*\.\d*)
Explanation:
^ - Start of line
[-]? - With or without negative dash (-)
\d* - 0 or more digits
\. - a decimal (escaped, because in regex decimals are considered special characters)
\d* - 0 or more digits (the decimal part);
$ - End of the line.
Bonus tip: There are helpful tools online to test your regex!
The Below code works to extract the value like all numbers but it catches all, only specific to certain alphabets its not working well. Anyone, please suggest well.
-?\d+\.?\d*
I have done work on NLP using Regex.

How to include apostrophe in character set for REGEXP_SUBSTR()

The IBM i implementation of regex uses apostrophes (instead of e.g. slashes) to delimit a regex string, i.e.:
... where REGEXP_SUBSTR(MYFIELD,'myregex_expression')
If I try to use an apostrophe inside a [group] within the expression, it always errors - presumably thinking I am giving a closing quote. I have tried:
- escaping it: \'
- doubling it: '' (and tripling)
No joy. I cannot find anything relevant in the IBM SQL manual or by google search.
I really need this to, for instance, allow names like O'Leary.
Thanks to Wiktor Stribizew for the answer in his comment.
There are a couple of "gotchas" for anyone who might land on this question with the same problem. The first is that you have to give the (presumably Unicode) hex value rather than the EBCDIC value that you would use, e.g. in ordinary interactive SQL on the IBM i. So in this case it really is \x27 and not \x7D for an apostrophe. Presumably this is because the REGEXP_ ... functions are working through Unicode even for EBCDIC data.
The second thing is that it would seem that the hex value cannot be the last one in the set. So this works:
^[A-Z0-9_\+\x27-]+ ... etc.
But this doesn't
^[A-Z0-9_\+-\x27]+ ... etc.
I don't know how to highlight text within a code sample, so I draw your attention to the fact that the hyphen is last in the first sample and second-to-last in the second sample.
If anyone knows why it has to not be last, I'd be interested to know. [edit: see Wiktor's answer for the reason]
btw, using double quotes as the string delimiter with an apostrophe in the set didn't work in this context.
A single quote can be defined with the \x27 notation:
^[A-Z0-9_+\x27-]+
^^^^
Note that when you use a hyphen in the character class/bracket expression, when used in between some chars it forms a range between those symbols. When you used ^[A-Z0-9_\+-\x27]+ you defined a range between + and ', which is an invalid range as the + comes after ' in the Unicode table.

REGEXP_REPLACE explanation

Hi may i know what does the below query means?
REGEXP_REPLACE(number,'[^'' ''-/0-9:-#A-Z''[''-`a-z{-~]', 'xy') ext_number
part 1
In terms of explaining what the function function call is doing:
It is a function call to analyse an input string 'number' with a regex (2nd argument) and replace any parts of the string which match a specific string. As for the name after the parenthesis I am not sure, but the documentation for the function is here
part 2
Sorry to be writing a question within an answer here but I cannot respond in comments yet (not enough rep)
Does this regex work? Unless sql uses different syntax this would appear to be a non-functional regex. There are some red flags, e.g:
The entire regex is wrapped in square parenthesis, indicating a set of characters but seems to predominantly hold an expression
There is a range indicator between a single quote and a character (invalid range: if a dash was required in the match it should be escaped with a '\' (backslash))
One set of square brackets is never closed
After some minor tweaks this regex is valid syntax:
^'' ''\-\/0-9:-#A-Z''[''-a-z{-~]`, but does not match anything I can think of, it is important to know what string is being examined/what the context is for the program in order to identify what the regex might be attempting to do
It seems like it is meant to replaces all ASCII control characters in the column or variable number with xy.
[] encloses a class of characters. Any character in that class matches. [^] negates that, hence all characters match, that are not in the class.
- is a range operator, e.g. a-z means all characters from a to z, like abc...xyz.
It seams like characters enclosed in ' should be escaped (The second ' is to escape the ' in the string itself.) At least this would make some sense. (But for none of the DBMS I found having a regexp_replace() function (Postgres, Oracle, DB2, MariaDB, MySQL), I found something in the docs, that would indicate this escape mechanism. They all use \, but maybe I missed something? Unfortunately you didn't tag which DBMS you're actually using!)
Now if you take an ASCII table you'll see, that the ranges in the expression make up all printable characters (counting space as printable) in groups from space to /, 0 to 9, : to #, etc.. Actually it might have been shorter to express it as '' ''-~, space to ~.
Given the negation, all these don't match. The ones left are from NUL to US and DEL. These match and get replaced by xy one by one.

unwanted leading blank space on oracle number format

I need to pad numbers with leading zeros (total 8 digits) for display. I'm using oracle.
select to_char(1011,'00000000') OPE_NO from dual;
select length(to_char(1011,'00000000')) OPE_NO from dual;
Instead of '00001011' I get ' 00001011'.
Why do I get an extra leading blank space? What is the correct number formatting string to accomplish this?
P.S. I realise I can just use trim(), but I want to understand number formatting better.
#Eddie: I already read the documentation. And yet I still don't understand how to get rid of the leading whitespace.
#David: So does that mean there's no way but to use trim()?
Use FM (Fill Mode), e.g.
select to_char(1011,'FM00000000') OPE_NO from dual;
From that same documentation mentioned by EddieAwad:
Negative return values automatically
contain a leading negative sign and
positive values automatically contain
a leading space unless the format
model contains the MI, S, or PR format
element.
EDIT: The right way is to use the FM modifier, as answered by Steve Bosman. Read the section about Format Model Modifiers for more info.