NSXMLNode textWithStringValue with entity - objective-c

Creating NSXMLNode with string:
NSXMLNode *node1 = [NSXMLNode textWithStringValue:@"<"];
NSLog(@"node1=%@",node1);
NSXMLNode *node2 = [NSXMLNode textWithStringValue:@">"];
NSLog(@"node2=%@",node2);
produces the following output:
node1=&lt;
node2=>
Why is the "<" character escaped (i.e. converted into "&lt;") while the ">" character is not?
Is this a bug?
Which node is handled correctly?

To quote the XML Spec:
The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. [...] The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
In short, there are circumstances in which > does not have to be escaped, such as if it appears in an attribute.
No.
Both are.
If you ask for the string in canonical format, both characters will be escaped:
NSXMLNode *node3 = [NSXMLNode textWithStringValue:@">"];
NSLog(@"node3=%@",[node3 canonicalXMLStringPreservingComments:NO]);
Output:
node3=&gt;
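The same rule can be checked outside of Cocoa. The following is a minimal Python sketch (not NSXMLNode itself), assuming only the standard library: it shows that a parser accepts a literal ">" in character data but rejects a literal "<", and that a general-purpose escaper may still choose to escape both.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# A literal '>' inside character data is well-formed and parses unchanged.
parsed_text = ET.fromstring("<a>1 > 0</a>").text

# A literal '<' inside character data is not well-formed.
try:
    ET.fromstring("<a>1 < 0</a>")
    literal_lt_accepted = True
except ET.ParseError:
    literal_lt_accepted = False

# escape() replaces '&', '<' and '>' alike, which is always safe.
escaped = escape("a < b > c & d")

print(parsed_text, literal_lt_accepted, escaped)
```

Escaping ">" everywhere is never wrong; leaving it literal is simply also allowed in most positions.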

Related

Removing a sequence inside of brackets from a string [duplicate]

I'm trying to handle a bunch of files, and I need to alter them to remove extraneous information in the filenames; notably, I'm trying to remove text inside parentheses. For example:
filename = "Example_file_(extra_descriptor).ext"
and I want to regex a whole bunch of files where the parenthetical expression might be in the middle or at the end, and of variable length.
What would the regex look like? Perl or Python syntax would be preferred.
s/\([^)]*\)//
So in Python, you'd do:
re.sub(r'\([^)]*\)', '', filename)
The pattern that matches substrings in parentheses having no other ( and ) characters in between (like (xyz 123) in Text (abc(xyz 123)) is
\([^()]*\)
Details:
\( - an opening round bracket (note that in POSIX BRE, ( should be used, see sed example below)
[^()]* - zero or more (due to the * Kleene star quantifier) characters other than those defined in the negated character class/POSIX bracket expression, that is, any chars other than ( and )
\) - a closing round bracket (no escaping in POSIX BRE allowed)
Removing code snippets:
JavaScript: string.replace(/\([^()]*\)/g, '')
PHP: preg_replace('~\([^()]*\)~', '', $string)
Perl: $s =~ s/\([^()]*\)//g
Python: re.sub(r'\([^()]*\)', '', s)
C#: Regex.Replace(str, @"\([^()]*\)", string.Empty)
VB.NET: Regex.Replace(str, "\([^()]*\)", "")
Java: s.replaceAll("\\([^()]*\\)", "")
Ruby: s.gsub(/\([^()]*\)/, '')
R: gsub("\\([^()]*\\)", "", x)
Lua: string.gsub(s, "%([^()]*%)", "")
Bash/sed: sed 's/([^()]*)//g'
Tcl: regsub -all {\([^()]*\)} $s "" result
C++ std::regex: std::regex_replace(s, std::regex(R"(\([^()]*\))"), "")
Objective-C: NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"\\([^()]*\\)" options:NSRegularExpressionCaseInsensitive error:&error]; NSString *modifiedString = [regex stringByReplacingMatchesInString:string options:0 range:NSMakeRange(0, [string length]) withTemplate:@""];
Swift: s.replacingOccurrences(of: "\\([^()]*\\)", with: "", options: [.regularExpression])
Google BigQuery: REGEXP_REPLACE(col, "\\([^()]*\\)" , "")
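Because \([^()]*\) only matches groups with no parentheses inside them, a single pass removes just the innermost groups. One way to strip nested groups as well is to repeat the substitution until nothing changes. A small Python sketch of that idea follows; the function names are hypothetical and chosen only for illustration.

```python
import re

# Matches a parenthesized group that contains no other parentheses.
INNERMOST = re.compile(r"\([^()]*\)")


def strip_innermost_once(text):
    """Remove only the innermost parenthesized groups (one pass)."""
    return INNERMOST.sub("", text)


def strip_all_groups(text):
    """Repeat the substitution until no balanced group is left."""
    while True:
        text, count = INNERMOST.subn("", text)
        if count == 0:
            return text


print(strip_innermost_once("Text (abc(xyz 123))"))
print(strip_all_groups("Text (abc(xyz 123))"))
```

Unbalanced leftovers such as a stray ")" are kept as they are, since the pattern never matches them.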
I would use:
\([^)]*\)
If you don't absolutely need to use a regex, consider using Perl's Text::Balanced to remove the parentheses.
use Text::Balanced qw(extract_bracketed);
my ($extracted, $remainder, $prefix) = extract_bracketed( $filename, '()', '[^(]*' );
{
    no warnings 'uninitialized';
    $filename = (defined $prefix or defined $remainder)
        ? $prefix . $remainder
        : $extracted;
}
You may be thinking, "Why do all this when a regex does the trick in one line?"
$filename =~ s/\([^)]*\)//;
Text::Balanced handles nested parentheses. So $filename = 'foo_(bar(baz)buz)).foo' will be extracted properly. The regex-based solutions offered here will fail on this string. One will stop at the first closing paren, and the other will eat them all.
$filename =~ s/\([^)]*\)//;
# returns 'foo_buz)).foo'
$filename =~ s/\(.*\)//;
# returns 'foo_.foo'
# text balanced example returns 'foo_).foo'
If either of the regex behaviors is acceptable, use a regex--but document the limitations and the assumptions being made.
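For readers who want to see the three behaviors side by side without Perl, here is a hedged Python sketch. The two regexes are the ones discussed above; remove_first_balanced is a hypothetical helper that imitates the balanced extraction with a depth counter and is not a port of Text::Balanced.

```python
import re

name = "foo_(bar(baz)buz)).foo"

# Stops at the first closing parenthesis.
stops_early = re.sub(r"\([^)]*\)", "", name, count=1)

# Greedy: runs from the first '(' to the last ')'.
eats_all = re.sub(r"\(.*\)", "", name, count=1)


def remove_first_balanced(text):
    """Remove the first balanced (...) group, tracking nesting depth."""
    start = text.find("(")
    if start == -1:
        return text
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return text[:start] + text[i + 1:]
    return text  # no balanced group found; leave the input alone


balanced = remove_first_balanced(name)
print(stops_early, eats_all, balanced)
```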
If a path may contain parentheses then the r'\(.*?\)' regex is not enough:
import os, re
def remove_parenthesized_chunks(path, safeext=True, safedir=True):
    dirpath, basename = os.path.split(path) if safedir else ('', path)
    name, ext = os.path.splitext(basename) if safeext else (basename, '')
    name = re.sub(r'\(.*?\)', '', name)
    return os.path.join(dirpath, name+ext)
By default the function preserves parenthesized chunks in the directory and extension parts of the path.
Example:
>>> f = remove_parenthesized_chunks
>>> f("Example_file_(extra_descriptor).ext")
'Example_file_.ext'
>>> path = r"c:\dir_(important)\example(extra).ext(untouchable)"
>>> f(path)
'c:\\dir_(important)\\example.ext(untouchable)'
>>> f(path, safeext=False)
'c:\\dir_(important)\\example.ext'
>>> f(path, safedir=False)
'c:\\dir_\\example.ext(untouchable)'
>>> f(path, False, False)
'c:\\dir_\\example.ext'
>>> f(r"c:\(extra)\example(extra).ext", safedir=False)
'c:\\\\example.ext'
For those who want to use Python, here's a simple routine that removes parenthesized substrings, including those with nested parentheses. Okay, it's not a regex, but it'll do the job!
def remove_nested_parens(input_str):
    """Returns a copy of 'input_str' with any parenthesized text removed. Nested parentheses are handled."""
    result = ''
    paren_level = 0
    for ch in input_str:
        if ch == '(':
            paren_level += 1
        elif (ch == ')') and paren_level:
            paren_level -= 1
        elif not paren_level:
            result += ch
    return result
remove_nested_parens('example_(extra(qualifier)_text)_test(more_parens).ext')
If you can stand to use sed (possibly executed from within your program), it'd be as simple as:
sed 's/(.*)//g'
>>> import re
>>> filename = "Example_file_(extra_descriptor).ext"
>>> p = re.compile(r'\([^)]*\)')
>>> re.sub(p, '', filename)
'Example_file_.ext'
Java code:
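Pattern pattern1 = Pattern.compile("(\\_\\(.*?\\))");
Matcher matcher1 = pattern1.matcher(fileName); // needs java.util.regex.Matcher
if (matcher1.find()) // group(1) is only valid after a successful find()
    System.out.println(fileName.replace(matcher1.group(1), ""));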

Format a string in Elm

I have a list of strings and generate HTML from it dynamically, one li tag per item. I want to use each value as the id attribute as well. The problem is that the strings contain special characters such as :, ', é, and so on. I want the output to include only digits (0-9) and letters (a-z).
// Input:
listStr = ["Pop & Suki", "PINK N' PROPER", "L'Oréal Paris"]
// Output:
result = ["pop_suki", "pink_n_proper", "loreal_paris"] ("loral_paris" is also good)
Currently, I've just lowercased the strings and replaced " " with _, but I don't know how to eliminate the special characters.
Many thanks!
Instead of thinking of it as eliminating special characters, consider the permitted characters – you want just lower-case alphanumeric characters.
Elm provides Char.isAlphaNum to test for alphanumeric characters, and Char.toLower to transform a character to lower case. It also provides the higher-order function String.foldl, which you can use to process a String one Char at a time.
So for each character:
check if it's alphanumeric
if it is, transform it to lower case
if not and it is a space, transform it to an underscore
else drop the character
Putting this together, we create a function that processes a character and appends it to the string processed so far, then apply that to all characters in the input string:
transformNextCharacter : Char -> String -> String
transformNextCharacter nextCharacter partialString =
    if Char.isAlphaNum nextCharacter then
        partialString ++ String.fromChar (Char.toLower nextCharacter)
    else if nextCharacter == ' ' then
        partialString ++ "_"
    else
        partialString
transformString : String -> String
transformString inputString =
    String.foldl transformNextCharacter "" inputString
Note: This answer simply drops special characters and thus produces "loral_paris" which is acceptable as per the OP.
The accepted answer is a lot more efficient than the code below, but I want to add it as an alternative.
If you want to convert accented characters to their plain equivalents, you can install the elm-community/string-extra package, which provides String.Extra.removeAccents.
The code below is inefficient because it calls several library functions on the same string, each of which walks the string one character at a time.
Also note that removing the & from the first item leaves a double underscore, which then has to be replaced with a single underscore.
import Html exposing (text)
import String
import List
import String.Extra
import Char
listStr = ["Pop & Suki", "PINK N' PROPER", "L'Oréal Paris"]
-- True if alpha or digit or space, otherwise, False.
isDigitAlphaSpace : Char -> Bool
isDigitAlphaSpace c =
    if Char.isAlpha c || Char.isDigit c || c == ' ' then
        True
    else
        False
main =
    List.map (\x -> String.Extra.removeAccents x  -- Remove accents first
                |> String.filter isDigitAlphaSpace  -- Remove anything that is not a digit, letter or space
                |> String.replace " " "_"  -- Replace space with _
                |> String.replace "__" "_"  -- Replace double __ with _
                |> String.toLower) listStr  -- Turn the string to lower case
        |> Debug.toString
        |> Html.text
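For comparison outside Elm, the same pipeline (strip accents, keep letters, digits and spaces, replace spaces, collapse underscores, lowercase) can be sketched in Python with the standard library. The function name to_id is hypothetical, and accent removal is done here with Unicode decomposition, which is an assumption about what "removing accents" should mean rather than a description of how the Elm package works.

```python
import re
import unicodedata


def to_id(label):
    # Decompose accented letters (e.g. é -> e + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", label)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Keep only ASCII letters, digits and spaces, lowercased.
    kept = "".join(
        ch for ch in unaccented.lower()
        if ch == " " or (ch.isascii() and ch.isalnum())
    )
    # Spaces become underscores; runs of underscores collapse to one.
    return re.sub(r"_+", "_", kept.replace(" ", "_"))


labels = ["Pop & Suki", "PINK N' PROPER", "L'Oréal Paris"]
print([to_id(label) for label in labels])
```

Collapsing with _+ also covers runs longer than two underscores, which a single replacement of "__" would leave behind.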

Regular Expression for validate price in decimal

I can't find a regular expression that validates a price entered as a decimal.
This is what I want to accept:
12345
12345.1
12345.12
12345.123
.123
0.123
I also want to restrict the number of digits.
I wrote one, but it does not validate as I expected:
^([0-9]{1,5}|([0-9]{1,5}\.([0-9]{1,3})))$
I'd also like to know how the expression above differs from this one, which works fine:
^([0-9]{1,5}|([0-9].([0-9]{1,3})))$
A good explanation would be appreciated.
I am using NSRegularExpression in Objective-C, if that helps to answer more precisely:
- (IBAction)btnTapped {
    NSRegularExpression * regex = [NSRegularExpression regularExpressionWithPattern:
        @"^\\d{1,5}([.]\\d{1,3})?|[.]\\d{1,3}$" options:NSRegularExpressionCaseInsensitive error:&error];
    if ([regex numberOfMatchesInString:txtInput.text options:0 range:NSMakeRange(0, [txtInput.text length])])
        NSLog(@"Matched : %@",txtInput.text);
    else
        NSLog(@"Not Matched : %@",txtInput.text);
}
"I am doing it in a buttonTap method".
This simple one should suit your needs:
\d*[.]?\d+
"Digits (\d+) that can be preceded by a dot ([.]?), which can itself be preceded by digits (\d*)."
Since you're talking about prices, neither scientific notation nor negative numbers are necessary.
Just as a point of interest, here's the one I usually use, scientific notation and negative numbers included:
[-+]?\d*[.]?\d+(?:[eE][-+]?\d+)?
For the new requirements (cf. comments), the first regex I gave can't limit the number of digits, because of the way it is built.
This one should suit your needs better:
\d{1,5}([.]\d{1,3})?|[.]\d{1,3}
"Max 5 digits (\d{1,5}) possibly followed ((...)?) by a dot itself followed by max 3 digits ([.]\d{1,3}), or (|) simply a dot followed by max 3 digits ([.]\d{1,3})".
Let's do this per-partes:
Sign in the beginning: [+-]?
Fraction number: \.\d+
Possible combinations (after sign):
Number: \d+
Fraction without zero \.\d+
And number with fraction: \d+\.\d+
So to join it all together <sign>(number|fraction without zero|number with fraction):
^[+-]?(\d+|\.\d+|\d+\.\d+)$
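The parentheses around the alternatives matter: without them, ^ binds only to the first alternative and $ only to the last, so partial matches slip through. A short Python sketch of the difference follows; the helper accepts and the test strings are made up for illustration.

```python
import re

# Anchors apply to the whole group of alternatives.
GROUPED = re.compile(r"^[+-]?(\d+|\.\d+|\d+\.\d+)$")

# Same alternatives without the group: '^' belongs to the first
# alternative only and '$' to the last one only.
UNGROUPED = re.compile(r"^[+-]?\d+|\.\d+|\d+\.\d+$")


def accepts(pattern, text):
    return pattern.search(text) is not None


for sample in ["12345", "12345.123", ".123", "0.123", "-7", "12abc", "abc.5"]:
    print(sample, accepts(GROUPED, sample), accepts(UNGROUPED, sample))
```

The grouped pattern rejects "12abc" and "abc.5", while the ungrouped one accepts both.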
If you're not restricting the lengths to 5 digits before the decimal and 3 digits after then you could use this:
^[+-]?(?:[0-9]*\.[0-9]+|[0-9]+)$
If you are restricting it to 5 before and 3 after max then you'd need something like this:
^[+-]?(?:[0-9]{0,5}\.[0-9]{1,3}|[0-9]{1,5})$
As far as the difference between your regexes goes, the first one limits the number of digits before the decimal point to 1-5, with or without a fractional part. The second one allows only a single digit in front of the decimal point, or 1-5 digits if there is no fractional part. Its dot is also unescaped, so it matches any character rather than only a literal ".".
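The difference is easy to observe by running both expressions from the question against a few inputs. This is only an illustrative Python sketch; the sample strings are assumptions chosen to expose each difference.

```python
import re

# The two expressions from the question, unchanged.
first = re.compile(r"^([0-9]{1,5}|([0-9]{1,5}\.([0-9]{1,3})))$")
second = re.compile(r"^([0-9]{1,5}|([0-9].([0-9]{1,3})))$")

samples = ["12345", "12345.12", "1.12", "1x12", ".123"]
results = {s: (bool(first.match(s)), bool(second.match(s))) for s in samples}

for sample, (in_first, in_second) in results.items():
    print(f"{sample!r}: first={in_first} second={in_second}")
```

Neither expression accepts ".123", which is one of the inputs the question wants to allow.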
How about this: ^([+-])?(\d+)?([.,])?(\d+)?$
string input = "bla";
if (!string.IsNullOrWhiteSpace(input))
{
    string pattern = @"^(\s+)?([-])?(\s+)?(\d+)?([,.])?(\d+)(\s+)?$";
    input = input.Replace("\'", ""); // Remove thousand's separator
    System.Text.RegularExpressions.Regex.IsMatch(input, pattern);
    // if server culture = de then reverse the below replace
    input = input.Replace(',', '.');
}
Edit:
Oh oh - just realized that's where we run into a little bit of a problem if an en-us user uses ',' as thousand's separator....
So here a better one:
string input = "+123,456";
if (!string.IsNullOrWhiteSpace(input))
{
    string pattern = @"^(\s+)?([+-])?(\s+)?(\d+)?([.,])?(\d+)(\s+)?$";
    input = input.Replace(',', '.'); // Ensure no en-us thousand's separator
    input = input.Replace("\'", ""); // Remove thousand's separator
    input = System.Text.RegularExpressions.Regex.Replace(input, @"\s", ""); // Remove whitespaces
    bool foo = System.Text.RegularExpressions.Regex.IsMatch(input, pattern);
    if (foo)
    {
        bool de = false;
        if (de) // if server-culture = de
            input = input.Replace('.', ',');
        double d = 0;
        bool bar = double.TryParse(input, out d);
        System.Diagnostics.Debug.Assert(foo == bar);
        Console.WriteLine(foo);
        Console.WriteLine(input);
    }
    else
        throw new ArgumentException("input");
}
else
    throw new NullReferenceException("input");
Edit2:
Instead of going through the hassle of getting the server culture, just use the tryparse overload with the culture and don't resubstitute the decimal separator.
double.TryParse(input
, System.Globalization.NumberStyles.Any
, new System.Globalization.CultureInfo("en-US")
, out d
);

How to convert \t into TAB character?

NSString* abc = @"\u003ca\\tb\\tc\u003e";
How can I convert it to <a b c>, with tab characters between the letters?
For a horizontal tab, use "\t". For the "<" and ">" characters, pass the hexadecimal code and format it with the character specifier (%c), as below. I tested this in Xcode and it worked.
NSString * requiredStrg = [NSString stringWithFormat:@"%c a\tb\tc %c",0x3c,0x3e];
NSLog(@"%@",requiredStrg);
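If the goal is to transform an existing string rather than build a new one, the conversion itself is a plain substring replacement: the two-character sequence backslash plus "t" is replaced with one tab character. A minimal Python sketch of that step follows; in Objective-C the analogous call would presumably be stringByReplacingOccurrencesOfString:withString:, which is mentioned here as an assumption and not shown.

```python
# "\u003c" and "\u003e" are resolved by the string literal itself to '<' and '>';
# "\\t" is a literal backslash followed by the letter t.
raw = "\u003ca\\tb\\tc\u003e"

# Replace the two-character sequence with a real tab character.
converted = raw.replace("\\t", "\t")

print(repr(raw))
print(repr(converted))
```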

Regular expression for separating words by uppercase letters and numbers

I was wondering if anyone might know what the regular expression would be to turn this:
West4thStreet
into this:
West 4th Street
I'm going to add the spaces to the string in Objective-C.
Thanks!
I don't know exactly where you want to put in spaces, but try something like [a-z.-][^a-z .-] and then put a space between the two characters in each match.
Something like this perl regex substitution would put a space before each group of capital letters or numbers. (You'd want to trim space before the string in this case also.) I assume you don't want it to break up eg: 45thStreet to 4 5th Street
Letters I'm less certain of.
s/([A-Z]+|[0-9]+)/ \1/g
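The substitution above, including the trim of the leading space, can be sketched in Python as follows. The function name space_out is hypothetical; the pattern is the one from the Perl expression.

```python
import re


def space_out(text):
    # Put a space before each run of capitals or digits,
    # then trim the space added before the first word.
    return re.sub(r"([A-Z]+|[0-9]+)", r" \1", text).strip()


print(space_out("West4thStreet"))
print(space_out("45thStreet"))
```

Because digits are matched as a run, "45th" stays together instead of being split into "4 5th".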
I created a pattern to not match the beginning of the line for my personal amusement:
s/([^\^])([A-Z]+|[0-9]+)/\1 \2/g
This should work, if all your strings truly match the format of your example:
([A-Z][a-z]+)(\d+[a-z]+)([A-Z][a-z]+)
You can then separate the groups with spaces.
Another option would be to not use RegExKit and instead loop through each character in the string in code, inserting a space before each capital letter and before the first digit of a number.
NSMutableString *myText2 = [[NSMutableString alloc] initWithString:@"The1stTest"];
bool isNumber=false;
for(int x=myText2.length-1;x>1;x--)
{
    bool isUpperCase = [[NSCharacterSet uppercaseLetterCharacterSet] characterIsMember:[myText2 characterAtIndex:x]];
    bool isLowerCase = [[NSCharacterSet lowercaseLetterCharacterSet] characterIsMember:[myText2 characterAtIndex:x]];
    if([[NSCharacterSet decimalDigitCharacterSet] characterIsMember:[myText2 characterAtIndex:x]])
        isNumber = true;
    if((isUpperCase || isLowerCase) && isNumber)
    {
        [myText2 insertString:@" " atIndex:x+1];
        isNumber=false;
    }
    if(isUpperCase)
        [myText2 insertString:@" " atIndex:x];
}
NSLog(@"%@",myText2); // Output: "The 1st Test"