Need a regular expression that will find numeric and alpha characters in a string

Here's what I'm trying to do. A user can type in a search string, which can include '*' or '?' wildcard characters. I'm finding this works with regular strings but not with ones including numeric characters.
e.g.:
414D512052524D2E535441524B2E4E45298B8751202AE908
1208
If I look for a section of that hex string, it returns false. If I look for "120" or "208" in the "1208" string, it fails.
Right now, my regular expression pattern ends up looking like this when a user enters, say "w?f": '\bw.?f\b'
I'm (obviously) not well-versed in regular expressions at the moment, but would appreciate any pointers someone may have to handle numeric characters in the way I need to - thanks!
Code in question:
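import java.util.regex.Pattern;
/**
 * Returns true if strToBeSearched contains a match for searchString,
 * which may contain the '*' and '?' wildcards.
 *
 * @param searchString    the user's search term, possibly with wildcards
 * @param strToBeSearched the text to search in
 * @return true if a match is found
 */
public boolean findString(String searchString, String strToBeSearched) {
    Pattern pattern = Pattern.compile(wildcardToRegex(searchString));
    return pattern.matcher(strToBeSearched).find();
}
// Converts a wildcard string to a regex: '*' becomes ".*", '?' becomes ".?",
// other characters are copied as-is, and the result is wrapped in \b.
private String wildcardToRegex(String wildcard) {
    StringBuilder s = new StringBuilder(wildcard.length());
    s.append("\\b");
    for (int i = 0, is = wildcard.length(); i < is; i++) {
        char c = wildcard.charAt(i);
        switch (c) {
            case '*':
                s.append(".*");
                break;
            case '?':
                s.append(".?");
                break;
            default:
                s.append(c);
                break;
        }
    }
    s.append("\\b");
    return s.toString();
}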

Let's assume your string to search in is
1208
The search "term" the user enters is
120
The pattern then is
\b120\b
The \b (word boundary) metacharacter matches at the beginning and end of "words".
In our example, this can't work: 1208 is a single word and there is no boundary between the 0 and the 8, so 120 never matches as a whole word.
The pattern has to be
\b.*120.*\b
where .* matches any number of characters (including none).
Solution:
either add the .*s to your wildcardToRegex(...) method to make this work out of the box,
or tell your users to search for *120*, because your * wildcard does exactly the same thing.
This is, in fact, my preference because the user can then define whether to search for entries starting with something (search for something*), including something (*something*), ending with something (*something), or exactly something (something).
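A minimal Java sketch of the first option, assuming substring matching should be opt-in: the hypothetical `substring` flag wraps the translated term in `.*` exactly as described above. One deliberate deviation from the question's code is that literal characters go through `Pattern.quote`, so a user typing `.` or `(` cannot break the pattern.

```java
import java.util.regex.Pattern;

public class WildcardDemo {
    // Translates '*' and '?' like the question's wildcardToRegex. When the
    // hypothetical substring flag is set, the term is also wrapped in ".*"
    // so it can match inside a longer "word" such as 1208.
    static String wildcardToRegex(String wildcard, boolean substring) {
        StringBuilder s = new StringBuilder("\\b");
        if (substring) s.append(".*");
        for (char c : wildcard.toCharArray()) {
            switch (c) {
                case '*': s.append(".*"); break;
                case '?': s.append(".?"); break;
                // Escape everything else so '.', '(' etc. are matched literally.
                default:  s.append(Pattern.quote(String.valueOf(c))); break;
            }
        }
        if (substring) s.append(".*");
        s.append("\\b");
        return s.toString();
    }

    static boolean findString(String searchString, String strToBeSearched, boolean substring) {
        return Pattern.compile(wildcardToRegex(searchString, substring))
                .matcher(strToBeSearched)
                .find();
    }

    public static void main(String[] args) {
        System.out.println(findString("120", "1208", false));   // false: no \b between 0 and 8
        System.out.println(findString("120", "1208", true));    // true
        System.out.println(findString("*120*", "1208", false)); // true: user-supplied wildcards
    }
}
```

The third call shows the second option: with no code change, the user's own `*120*` produces the same `\b.*120.*\b` shape.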

Related

How to reduce a string to 7-bit ASCII characters for indexing purposes?

I am working on an application which must index certain sentences. Currently using Java and PostgreSQL. The sentences may be in several languages like French and Spanish using accents and other non-ASCII symbols.
For each word I want to create an index-able equivalent so that a user can perform a search insensitive to accents (transliteration). For example, when the user searches "nacion" it must find it even if the original word stored by the application was "Nación".
What could be the best strategy for this? I am not necessarily restricted to PostgreSQL, nor does the internal indexed value need to have any similarity to the original word. Ideally, it should be a generic solution for converting any Unicode string into an ASCII string that is insensitive to case and accents.
So far I am using a custom function shown below which naively just replaces some letters with ASCII equivalents before storing the indexed value and does the same on query strings.
public String toIndexableASCII(String sStrIn) {
    if (sStrIn == null) return null;
    if (sStrIn.isEmpty()) return sStrIn;
    // toUpperCase() can change the length ("ß" becomes "SS"), so size the
    // buffer and run the loop over the uppercased string, not the input.
    String sStr = sStrIn.toUpperCase();
    int iLen = sStr.length();
    StringBuilder sStrBuff = new StringBuilder(iLen);
    for (int c = 0; c < iLen; c++) {
        switch (sStr.charAt(c)) {
            case 'Á': case 'À': case 'Ä': case 'Â': case 'Å': case 'Ã':
                sStrBuff.append('A');
                break;
            case 'É': case 'È': case 'Ë': case 'Ê':
                sStrBuff.append('E');
                break;
            case 'Í': case 'Ì': case 'Ï': case 'Î':
                sStrBuff.append('I');
                break;
            case 'Ó': case 'Ò': case 'Ö': case 'Ô': case 'Ø':
                sStrBuff.append('O');
                break;
            case 'Ú': case 'Ù': case 'Ü': case 'Û':
                sStrBuff.append('U');
                break;
            case 'Æ':
                sStrBuff.append('E');
                break;
            case 'Ñ':
                sStrBuff.append('N');
                break;
            case 'Ç':
                sStrBuff.append('C');
                break;
            case 'ß': // never reached: toUpperCase() has already turned 'ß' into "SS"
                sStrBuff.append('B');
                break;
            case (char) 255: // 'ÿ'; never reached either, toUpperCase() maps it to 'Ÿ' (U+0178)
                sStrBuff.append('_');
                break;
            default:
                sStrBuff.append(sStr.charAt(c));
        }
    }
    return sStrBuff.toString();
}
import java.text.Normalizer;
String s = "Nación";
String x = Normalizer.normalize(s, Normalizer.Form.NFD);
StringBuilder sb = new StringBuilder(s.length());
for (char c : x.toCharArray()) {
    // Keep everything except combining marks (general category Mn).
    if (Character.getType(c) != Character.NON_SPACING_MARK) {
        sb.append(c);
    }
}
System.out.println(s);             // Nación
System.out.println(sb.toString()); // Nacion
How this works:
It decomposes international characters using NFD (ó becomes o followed by a combining acute accent, o◌́), then strips the combining diacritical marks.
Character.NON_SPACING_MARK is the Unicode general category Mn (Nonspacing_Mark), which contains the combining diacritical marks.
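The same loop can be written more compactly in Java with a regex over the NFD form. This is a minimal sketch; the class and method names are illustrative. `\p{Mn}` is Java's regex syntax for the Mn general category, the same characters `Character.NON_SPACING_MARK` tests for.

```java
import java.text.Normalizer;

public class StripAccents {
    // Decompose to NFD, then delete every nonspacing mark (category Mn).
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{Mn}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Nación"));       // Nacion
        System.out.println(stripAccents("Crème brûlée")); // Creme brulee
    }
}
```

Letters with no canonical decomposition, such as Ø, Æ and ß, pass through unchanged, so an indexer may still need a small fallback table for them.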
The one obvious improvement to your current code: use a Map<Character, Character> that you prefill with your mappings.
Then simply check whether that Map has a mapping for each character; if so, use it; otherwise, use the original character.
And as Androbin explains, there are special maps that work with primitive types instead of objects, such as those in the Trove library. So, depending on your solution and requirements, you could look into that.
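A sketch combining both answers, assuming you want the NFD approach plus a small map for the letters it cannot handle. The map holds String values rather than the Character values suggested above, because Æ folds to two letters; the table contents and the `toIndexKey` name are hypothetical.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class AsciiFold {
    // Hypothetical fallback table for uppercase letters NFD does not decompose.
    private static final Map<Character, String> SPECIAL = new HashMap<>();
    static {
        SPECIAL.put('Ø', "O");
        SPECIAL.put('Æ', "AE");
        SPECIAL.put('Ł', "L");
    }

    static String toIndexKey(String in) {
        if (in == null) return null;
        // Uppercase first (Locale.ROOT avoids locale quirks); "ß" becomes "SS" here.
        String upper = in.toUpperCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(upper, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder(decomposed.length());
        for (char c : decomposed.toCharArray()) {
            if (Character.getType(c) == Character.NON_SPACING_MARK) continue; // drop accents
            String mapped = SPECIAL.get(c);
            if (mapped != null) {
                sb.append(mapped);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toIndexKey("Nación")); // NACION
        System.out.println(toIndexKey("Straße")); // STRASSE
        System.out.println(toIndexKey("Łódź"));   // LODZ
    }
}
```

Applying the same function to stored words and to query strings keeps both sides comparable.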

Convert a string into an int

I need some help here. I am currently making a game, but I got stuck. What I want is: if one label's text is higher than another label's text, then something should happen. I typed If Label26.Text > Label24.Text Then Label33.Visible = True, which does not seem to work. And yes, the labels' text is numbers.
The Text property of a label is a string. You can't do math with strings, because they are just sequences of characters: comparison operators like > compare them as text, character by character, so "10" > "9" is False even though 10 > 9.
Even if the string only contains a number, the computer still sees it as a sequence of characters and not a number ("5" is a string literal containing the character 5, while 5 is an integer that can be used in a mathematical expression).
As some of the other commenters mentioned, you need to convert the Text property to an Integer or Double (or some other numeric data type). To do so, you can use Int32.Parse to turn the strings into integers.
If Int32.Parse(Label26.Text) > Int32.Parse(Label24.Text) Then Label33.Visible = True
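The same pitfall, shown as a Java analog (the thread itself is VB/C#, so this illustrates the principle rather than being a drop-in fix; the method names are illustrative):

```java
public class CompareDemo {
    // Text comparison works character by character, so '1' < '9' decides it.
    static boolean greaterAsText(String a, String b) {
        return a.compareTo(b) > 0;
    }

    // Numeric comparison: parse first, then compare the values.
    static boolean greaterAsNumber(String a, String b) {
        return Integer.parseInt(a) > Integer.parseInt(b);
    }

    public static void main(String[] args) {
        System.out.println(greaterAsText("10", "9"));   // false
        System.out.println(greaterAsNumber("10", "9")); // true
    }
}
```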
You can use int.TryParse to check whether the content of a variable is a number. TryParse returns a boolean; see the example below:
int num1 = 0;
bool isNumber = int.TryParse(txt1.Text, out num1);
if (isNumber)
{
    // Is a number/integer
    // Do something
}
else
{
    // Not a valid integer
    // Do something else
}
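Java has no built-in TryParse, so a rough analog is a small helper that wraps `Integer.parseInt` and turns a `NumberFormatException` into an empty result. The `tryParseInt` helper and the label values below are hypothetical.

```java
import java.util.OptionalInt;

public class TryParse {
    // Returns the parsed value, or an empty OptionalInt if text is not an int.
    static OptionalInt tryParseInt(String text) {
        if (text == null) return OptionalInt.empty();
        try {
            return OptionalInt.of(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical label texts standing in for Label26.Text and Label24.Text.
        OptionalInt left = tryParseInt("26");
        OptionalInt right = tryParseInt("24");
        if (left.isPresent() && right.isPresent() && left.getAsInt() > right.getAsInt()) {
            System.out.println("show Label33");
        } else {
            System.out.println("not both numbers, or not greater");
        }
    }
}
```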

Regular expression to validate a decimal price

I really can't find a regular expression that validates a price entered as a decimal.
This is what I want to accept:
12345
12345.1
12345.12
12345.123
.123
0.123
I also want to restrict the number of digits.
I created one, but it does not validate as expected:
^([0-9]{1,5}|([0-9]{1,5}\.([0-9]{1,3})))$
I would also like to know how the expression above differs from this one, which works fine:
^([0-9]{1,5}|([0-9].([0-9]{1,3})))$
Can anyone give a good explanation?
I am using NSRegularExpression in Objective-C, if that helps to answer more precisely:
- (IBAction)btnTapped {
    NSError *error = nil;
    // The alternation is grouped so that ^ and $ apply to both alternatives.
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"^(?:\\d{1,5}([.]\\d{1,3})?|[.]\\d{1,3})$"
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];
    if ([regex numberOfMatchesInString:txtInput.text options:0 range:NSMakeRange(0, [txtInput.text length])])
        NSLog(@"Matched : %@", txtInput.text);
    else
        NSLog(@"Not Matched : %@", txtInput.text);
}
I am doing it in a button-tap method.
This simple one should suit your needs:
\d*[.]?\d+
"Digits (\d+) that can be preceded by a dot ([.]?), which can itself be preceded by digits (\d*)."
Since you're talking about prices, neither scientific notation nor negative numbers are necessary.
Just as a point of interest, here's the one I usually use, scientific notation and negative numbers included:
[-+]?\d*[.]?\d+(?:[eE][-+]?\d+)?
For the new requirements (cf. comments), you can't limit how many digits are allowed with the first regex I gave, because of the way it is built.
This one should suit your needs better:
\d{1,5}([.]\d{1,3})?|[.]\d{1,3}
"Max 5 digits (\d{1,5}) possibly followed ((...)?) by a dot itself followed by max 3 digits ([.]\d{1,3}), or (|) simply a dot followed by max 3 digits ([.]\d{1,3})".
Let's build this part by part:
Sign at the beginning: [+-]?
Fraction: \.\d+
Possible combinations (after sign):
Number: \d+
Fraction without zero \.\d+
And number with fraction: \d+\.\d+
So to join it all together <sign>(number|fraction without zero|number with fraction):
^[+-]?(\d+|\.\d+|\d+\.\d+)$
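The part-by-part construction above translates directly into code. A minimal Java sketch, with the class and constant names chosen for illustration, that assembles the pattern from named pieces:

```java
import java.util.regex.Pattern;

public class SignedDecimal {
    // The pieces from the answer, kept as named strings.
    static final String SIGN = "[+-]?";
    static final String NUMBER = "\\d+";
    static final String FRACTION = "\\.\\d+";
    static final String NUMBER_WITH_FRACTION = "\\d+\\.\\d+";

    // <sign>(number|fraction without zero|number with fraction)
    static final Pattern SIGNED_DECIMAL = Pattern.compile(
            "^" + SIGN + "(" + NUMBER + "|" + FRACTION + "|" + NUMBER_WITH_FRACTION + ")$");

    static boolean isSignedDecimal(String s) {
        return SIGNED_DECIMAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String in : new String[] {"+12.5", "-.5", "42", "12.", "."}) {
            System.out.println(in + " -> " + isSignedDecimal(in));
        }
    }
}
```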
If you're not restricting the lengths to 5 digits before the decimal point and 3 digits after, then you could use this:
^[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)$
If you are restricting it to at most 5 before and 3 after, then you'd need something like this:
^[+-]?(?:[0-9]{0,5}\.[0-9]{1,3}|[0-9]{1,5})$
As for the difference between your regexes: the first one limits the number of digits before the decimal point to 1-5, both with and without decimals. The second one allows only a single digit in front of the decimal point (and 1-5 digits if there is no decimal part); also, its unescaped . matches any character, not just a decimal point.
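A quick Java check of the restricted pattern against the question's sample inputs; the class name is illustrative. The `(?:...)` group matters with find-style APIs such as NSRegularExpression's numberOfMatchesInString:, because without it ^ binds only to the first alternative and $ only to the last. Java's `matches()` requires the whole string to match regardless.

```java
import java.util.regex.Pattern;

public class PriceRegex {
    // Optional sign, up to 5 digits, then either a point with 1-3 digits,
    // or (second alternative) 1-5 digits with no point at all.
    private static final Pattern PRICE =
            Pattern.compile("^[+-]?(?:[0-9]{0,5}\\.[0-9]{1,3}|[0-9]{1,5})$");

    static boolean isValidPrice(String s) {
        return PRICE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"12345", "12345.1", "12345.12", "12345.123", ".123", "0.123",
                           "123456", "1.2345", "12.", "abc"};
        for (String in : inputs) {
            System.out.println(in + " -> " + isValidPrice(in));
        }
    }
}
```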
How about this: ^([+-])?(\d+)?([.,])?(\d+)?$
string input = "bla";
if (!string.IsNullOrWhiteSpace(input))
{
    string pattern = @"^(\s+)?([-])?(\s+)?(\d+)?([,.])?(\d+)(\s+)?$";
    input = input.Replace("\'", ""); // Remove thousands separator
    bool isValid = System.Text.RegularExpressions.Regex.IsMatch(input, pattern);
    // if server culture = de then reverse the below replace
    input = input.Replace(',', '.');
}
Edit:
Oops, I just realized that this runs into a problem if an en-US user uses ',' as the thousands separator.
So here is a better one:
string input = "+123,456";
if (!string.IsNullOrWhiteSpace(input))
{
    string pattern = @"^(\s+)?([+-])?(\s+)?(\d+)?([.,])?(\d+)(\s+)?$";
    input = input.Replace(',', '.'); // Ensure no en-US thousands separator
    input = input.Replace("\'", ""); // Remove thousands separator
    input = System.Text.RegularExpressions.Regex.Replace(input, @"\s", ""); // Remove whitespace
    bool foo = System.Text.RegularExpressions.Regex.IsMatch(input, pattern);
    if (foo)
    {
        bool de = false;
        if (de) // if server culture = de
            input = input.Replace('.', ',');
        double d = 0;
        bool bar = double.TryParse(input, out d);
        System.Diagnostics.Debug.Assert(foo == bar);
        Console.WriteLine(foo);
        Console.WriteLine(input);
    }
    else
        throw new ArgumentException("input");
}
else
    throw new NullReferenceException("input");
Edit2:
Instead of going through the hassle of getting the server culture, just use the TryParse overload that takes a culture, and don't substitute the decimal separator back.
double.TryParse(input,
    System.Globalization.NumberStyles.Any,
    new System.Globalization.CultureInfo("en-US"),
    out d);
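A rough Java analog of Edit2, assuming `java.text.NumberFormat` plays the role of the culture-aware TryParse overload. The `tryParse` helper is hypothetical; because `NumberFormat.parse` stops at the first character it cannot use, the sketch also checks that the whole input was consumed.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class CultureParse {
    // Parse with an explicit locale instead of the server default;
    // returns null instead of throwing on bad input.
    static Double tryParse(String input, Locale locale) {
        if (input == null) return null;
        String text = input.trim();
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number n = format.parse(text, pos);
        // Reject trailing garbage such as the "abc" in "12abc".
        if (n == null || pos.getIndex() != text.length()) return null;
        return n.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(tryParse("123,456.78", Locale.US));
        System.out.println(tryParse("123.456,78", Locale.GERMANY));
        System.out.println(tryParse("12abc", Locale.US));
    }
}
```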

Char.IsSymbol("*") is false

I'm working on a password validation routine, and am surprised to find that VB does not consider '*' to be a symbol per the Char.IsSymbol() check.
Here is the output from the QuickWatch:
char.IsSymbol("*") False Boolean
The MS documentation does not specify what characters are matched by IsSymbol, but does imply that standard mathematical symbols are included here.
Does anyone have any good ideas for matching all standard US special characters?
Characters that are symbols in this context: UnicodeCategory.MathSymbol, UnicodeCategory.CurrencySymbol, UnicodeCategory.ModifierSymbol and UnicodeCategory.OtherSymbol from the System.Globalization namespace. These are the Unicode characters designated Sm, Sc, Sk and So, respectively. All other characters return False.
From the .NET source:
internal static bool CheckSymbol(UnicodeCategory uc)
{
    switch (uc)
    {
        case UnicodeCategory.MathSymbol:
        case UnicodeCategory.CurrencySymbol:
        case UnicodeCategory.ModifierSymbol:
        case UnicodeCategory.OtherSymbol:
            return true;
        default:
            return false;
    }
}
or, converted to VB.NET:
Friend Shared Function CheckSymbol(uc As UnicodeCategory) As Boolean
    Select Case uc
        Case UnicodeCategory.MathSymbol, UnicodeCategory.CurrencySymbol, UnicodeCategory.ModifierSymbol, UnicodeCategory.OtherSymbol
            Return True
        Case Else
            Return False
    End Select
End Function
CheckSymbol is called by IsSymbol with the Unicode category of the given char.
Since the * is in the category OtherPunctuation (you can check this with char.GetUnicodeCategory()), it is not considered a symbol, and the method correctly returns False.
To answer your question: use char.GetUnicodeCategory() to check which category the character falls in, and decide to include it or not in your own logic.
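For comparison, the same category-based check sketched in Java with `Character.getType` (an analog, not the .NET code; the class and method names are illustrative). It shows why `*` falls on the punctuation side: its general category is Po (OTHER_PUNCTUATION), not one of the four symbol categories.

```java
public class SymbolCheck {
    // Mirrors CheckSymbol: the four symbol categories Sm, Sc, Sk and So.
    static boolean isSymbol(char c) {
        switch (Character.getType(c)) {
            case Character.MATH_SYMBOL:
            case Character.CURRENCY_SYMBOL:
            case Character.MODIFIER_SYMBOL:
            case Character.OTHER_SYMBOL:
                return true;
            default:
                return false;
        }
    }

    // The seven Unicode punctuation categories (Pc, Pd, Ps, Pe, Pi, Pf, Po).
    static boolean isPunctuation(char c) {
        switch (Character.getType(c)) {
            case Character.CONNECTOR_PUNCTUATION:
            case Character.DASH_PUNCTUATION:
            case Character.START_PUNCTUATION:
            case Character.END_PUNCTUATION:
            case Character.INITIAL_QUOTE_PUNCTUATION:
            case Character.FINAL_QUOTE_PUNCTUATION:
            case Character.OTHER_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        for (char c : "*+$^!-_(".toCharArray()) {
            System.out.println(c + " symbol=" + isSymbol(c) + " punctuation=" + isPunctuation(c));
        }
    }
}
```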
If you simply need to know that a character is something other than a digit or letter,
just use
!char.IsLetterOrDigit(c)
preferably combined with
&& !char.IsControl(c)
Maybe you have Option Strict off, because with
Char.IsSymbol("*")
I get a compiler error:
BC30512: Option Strict On disallows implicit conversions from 'String' to 'Char'.
To define a Char literal in VB.NET, you must add a c suffix to the string literal, like this:
Char.IsSymbol("*"c)
IsPunctuation(x) is what you are looking for.
This worked for me in C#:
string Password = "";
ConsoleKeyInfo key;
do
{
    key = Console.ReadKey(true);
    // Ignore any key out of range.
    if (char.IsPunctuation(key.KeyChar) || char.IsLetterOrDigit(key.KeyChar) || char.IsSymbol(key.KeyChar))
    {
        // Append the character to the password.
        Password += key.KeyChar;
        Console.Write("*");
    }
    // Exit if Enter key is pressed.
} while (key.Key != ConsoleKey.Enter);

Converting uppercase string to title case in Objective-C

I created the following method, which starts by using the built-in capitalizedString method on NSString, but that really just capitalizes the first letter of each word. I see .NET has TextInfo.ToTitleCase, which attempts what I'd like to do in Objective-C but also falls short.
http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx
The method I wrote to start is below. How would you handle properly casing an uppercase string? Would a database of words to convert to all uppercase/lowercase help?
- (NSString *)convertStringToTitleCase:(NSString *)str {
    NSMutableString *convertedStr = [NSMutableString stringWithString:[str capitalizedString]];
    NSRange range = NSMakeRange(0, convertedStr.length);
    // a list of words to always make lowercase could be placed here
    [convertedStr replaceOccurrencesOfString:@" De "
                                  withString:@" de "
                                     options:NSLiteralSearch
                                       range:range];
    // a list of words to always make uppercase could be placed here
    [convertedStr replaceOccurrencesOfString:@" Tv "
                                  withString:@" TV "
                                     options:NSLiteralSearch
                                       range:range];
    return convertedStr;
}
As noted in comments, the .NET method you refer to doesn't do "proper" title case (that is, follow a list of exception words to be left in either all-caps or all-lowercase), so -[NSString capitalizedString] is as equivalent as you'll get. If you want exception words, you'll have to write your own method (or find someone else who did, as a google search for NSString "title case" might).
How "proper" your title casing gets depends on how many exception words you're willing to throw at it. How much of the English language do you want it to support? What about other languages? It'll also depend on how far you go in analyzing word boundaries -- you might want "TV" to stay all-caps regardless of whether it's in quotes, at the end of a sentence, etc., but you probably also don't want "you've" to come out "You'Ve".
If you want to process exception words, your plan of repeatedly running replaceOccurrencesOfString... will get slower the more exception words you have. (Also, using spaces in your search/replace strings means you aren't considering other word boundaries you might want to.)
It might be useful to consider NSRegularExpression, since regular expressions already have pretty robust notions of case and word boundaries. If that doesn't work well for you, using a scanner to read through the input string while producing a transformed output string would be more efficient than running multiple search/replace operations.
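The single-pass idea can be sketched in Java (requires Java 9+ for `Set.of` and `Matcher.appendReplacement(StringBuilder, String)`). One regex scan finds each word, and the exception lists are consulted per word instead of running one search/replace per exception. The word lists are hypothetical placeholders; inner apostrophes are kept in the word so "you've" does not become "You'Ve".

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleCase {
    // Hypothetical exception lists; a real one would be much longer.
    private static final Set<String> ALWAYS_LOWER = Set.of("a", "an", "and", "de", "of", "the", "to");
    private static final Set<String> ALWAYS_UPPER = Set.of("TV", "USA");
    // A "word" is a run of letters, optionally with inner apostrophes.
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:'\\p{L}+)*");

    static String titleCase(String input) {
        Matcher m = WORD.matcher(input);
        StringBuilder out = new StringBuilder();
        boolean first = true;
        while (m.find()) {
            String lower = m.group().toLowerCase(Locale.ROOT);
            String upper = m.group().toUpperCase(Locale.ROOT);
            String replacement;
            if (ALWAYS_UPPER.contains(upper)) {
                replacement = upper;
            } else if (!first && ALWAYS_LOWER.contains(lower)) {
                replacement = lower; // minor words stay lowercase except at the start
            } else {
                replacement = lower.substring(0, 1).toUpperCase(Locale.ROOT) + lower.substring(1);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            first = false;
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(titleCase("THE WORLD OF TV"));
        System.out.println(titleCase("YOU'VE GOT MAIL"));
    }
}
```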
A nice one-liner (not a general solution, and probably very inefficient on huge strings):
[[str lowercaseString] capitalizedString];
import Foundation
extension String {
    /// The title-case form of the string.
    var titleCase: String {
        return getTitleCaseString()
    }
    // MARK: Private methods.
    /// Get the title-case string.
    /// - returns: The title-case string, keeping the listed minor words lowercase.
    private func getTitleCaseString() -> String {
        struct Holder {
            static let lowercaseWords = ["a", "an", "and", "at", "but", "by", "else", "for",
                                         "from", "if", "in", "into", "is", "nor", "of", "off",
                                         "on", "or", "out", "the", "to", "via", "vs", "with"]
        }
        return replaceToLowercaseAllOccurrencesOfWords(Holder.lowercaseWords).capitalizeFirst
    }
    /// Replace with lowercase all occurrences of the given words.
    /// - parameter lowercaseWords: The words to force to lowercase.
    /// - returns: The capitalized string with all occurrences of those words lowercased.
    private func replaceToLowercaseAllOccurrencesOfWords(_ lowercaseWords: [String]) -> String {
        let capitalizedSelf = NSMutableString(string: self.capitalized)
        for word in lowercaseWords {
            if let lowercaseWordRegex = try? NSRegularExpression(pattern: "\\b\(word)\\b", options: .caseInsensitive) {
                _ = lowercaseWordRegex.replaceMatches(in: capitalizedSelf,
                                                      options: [],
                                                      range: NSRange(location: 0, length: capitalizedSelf.length),
                                                      withTemplate: word)
            }
        }
        return capitalizedSelf as String
    }
    /// Capitalize the first character.
    private var capitalizeFirst: String {
        guard let firstChar = self.first else { return "" }
        return String(firstChar).uppercased() + String(self.dropFirst())
    }
}