Web regex validation on Objective-C - objective-c

I have some regular expressions for validating form fields.
I have a unit test that defines the expected results:
NSArray *suiteWebs = [NSArray arrayWithObjects:
    @"http://webapp.stackoverflow.net",
    @"http://webapp.stackoverflow.net/info.php",
    @"http://www.stackoverflow.net",
    @"http://www.stackoverflow.net/",
    @"https://webapp.stackoverflow.net",
    @"https://webapp.stackoverflow.net/info.php",
    @"https://www.stackoverflow.net",
    @"https://www.stackoverflow.net/",
    @"webapp.stackoverflow.net",
    @"webapp.stackoverflow.net/info.php",
    @"www.stackoverflow.net",
    @"www.stackoverflow.net/",
    @"www.stack-overflow.com",
    @"www.stackoverflow_.com",
    @"www.stackover_flow.com",
    nil];
NSArray *falseSuiteWebs = [NSArray arrayWithObjects:
    @"ftp://webapp.stackoverflow.net",
    @"http:/www.stackoverflow.net",
    @"ftps://webapp.stackoverflow.net",
    @"https:/www.stackoverflow.net",
    nil];
for (NSString *web in suiteWebs) {
    NSLog(@"Validating web %@", web);
    STAssertTrue([TSAddEntityForm validateWeb:web withPatter:currentRegex], [NSString stringWithFormat:@"currentRegex web %@", web]);
}
for (NSString *web in falseSuiteWebs) {
    NSLog(@"Validating web %@", web);
    STAssertFalse([TSAddEntityForm validateWeb:web withPatter:currentRegex], [NSString stringWithFormat:@"currentRegex web %@", web]);
}
My current regular expression is:
NSString *webRegex4 = @"((http|https)://){0,1}((\\w)*|([0-9]*)|([\\-|_])*)+([\\.|/]((\\w)*|([0-9]*)|([\\-|_])*))+";
My problem is with domains that contain a hyphen (-): my regular expression does not validate them. For example, the URL www.stack-overflow.com is rejected.
Any suggestions?
Thank you

Maybe this regexp would work better in your case (it's not ideal, but it works for the valid and invalid samples above):
(http(s)?://)?[\w-]+(\.[\w-]+)*\.\w{2,6}[/\w.-]*
(http(s)?://)? - it may begin with http:// or https://
[\w-]+(\.[\w-]+)*\.\w{2,6} - describes the domain
[/\w.-]* - folders and documents
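To try that pattern from Objective-C, something like the following sketch might do. TSValidateWeb and TSWebPattern are made-up names standing in for the question's validateWeb:withPatter: method, whose real implementation isn't shown. It assumes whole-string matching via NSPredicate's MATCHES operator.

```objective-c
#import <Foundation/Foundation.h>

// The pattern from the answer as an Objective-C literal: every backslash is doubled.
static NSString * const TSWebPattern =
    @"(http(s)?://)?[\\w-]+(\\.[\\w-]+)*\\.\\w{2,6}[/\\w.-]*";

// Hypothetical stand-in for the question's validateWeb:withPatter: method.
static BOOL TSValidateWeb(NSString *web, NSString *pattern) {
    // MATCHES succeeds only when the pattern covers the entire string,
    // so no explicit ^ and $ anchors are needed.
    NSPredicate *test = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", pattern];
    return [test evaluateWithObject:web];
}
```

With this, TSValidateWeb(@"www.stack-overflow.com", TSWebPattern) should be accepted, while @"ftp://webapp.stackoverflow.net" should be rejected because the optional scheme group only allows http and https.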

In general, complex regular expressions are fool's gold. Use multiple passes with multiple regular expressions. Validate components of the URLs independently.
Complex regular expressions can be very powerful, but they can also paint you into a fragile corner with something as open-ended as URLs.
Also, if you're using Objective-C, it is easy to break things down with some of the facilities provided by NSURL.
NSURL will also give you a good idea of what components of a URL you should look at.
By using NSURL methods to extract components of the URLs, you can apply your regular expressions more carefully to each component.
CFURL is equally powerful.
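A rough sketch of that component-by-component approach, assuming only http and https are acceptable and that scheme-less input such as www.stackoverflow.net should be treated as http. TSIsPlausibleWebAddress is a hypothetical helper name, and the host pattern is just an illustration, not a complete hostname grammar.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: checks scheme and host separately instead of using one big regex.
static BOOL TSIsPlausibleWebAddress(NSString *input) {
    if ([input length] == 0) return NO;

    // Without a scheme, NSURL treats the whole string as a path, so assume http for parsing.
    NSString *candidate = input;
    if ([input rangeOfString:@"://"].location == NSNotFound) {
        candidate = [@"http://" stringByAppendingString:input];
    }

    NSURL *url = [NSURL URLWithString:candidate];
    if (url == nil) return NO;

    // Component 1: the scheme must be http or https.
    NSString *scheme = [[url scheme] lowercaseString];
    if (![scheme isEqualToString:@"http"] && ![scheme isEqualToString:@"https"]) return NO;

    // Component 2: the host needs at least two dot-separated labels of word characters or hyphens.
    NSString *host = [url host];
    if ([host length] == 0) return NO;
    NSPredicate *hostTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@",
                             @"[\\w-]+(\\.[\\w-]+)+"];
    return [hostTest evaluateWithObject:host];
}
```

The path and query could be checked the same way with [url path] and [url query], each against its own small pattern.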

Related

What is the most efficient way to compare an NSString in this way

I have an app (Cocoa Touch, Web Browser), however I need to be able to compare an NSString with thousands of other strings. Here's the deal.
When a WebView loads, I get the URL. I need to compare this URL with literally thousands of results (27,847). Each of those numbers represents a line of text in a plain text file.
I would like to know the best way to go about getting the data from the text file, and comparing it with the NSString. I need to know if the URL that the WebView is loading contains any of these strings.
The app needs to be very fast, so I can't just parse through every line in the text file, turn it into an array, and then compare each and every result.
Please share your ideas. Thanks.
I think the cleanest solution is to:
Create a web service that can offload the work to a server and return a response. Since it sounds like you're building a web protection service, your database may grow to be quite substantial over time, and you can just scale your server up to increase its speed. Furthermore, you don't want to have to update your app every time the lookup data changes.
Other options are:
Use a local SQLite database. SQL databases should perform lookups relatively fast.
If you don't want to use any database, have you tried putting all the search strings into an NSDictionary or NSMutableDictionary object? This way, you would just check if the valueForKey: for the string you're searching for is nil.
Sample code for this:
NSDictionary *searchDictionary = [NSDictionary dictionaryWithObjectsAndKeys:
    [NSNumber numberWithBool:YES], @"google.com",
    [NSNumber numberWithBool:YES], @"yahoo.com",
    [NSNumber numberWithBool:YES], @"bing.com",
    nil];
NSString *searchString = @"bing.com";
if ([searchDictionary valueForKey:searchString]) {
    // search string found
} else {
    // search string not found
}
Note: if you want the NSDictionary to perform case-insensitive comparisons, pre-load all values lowercase, and make the search string lowercase when using valueForKey:.
How much memory this could take is a whole other story, but I don't see how this comparison could be made much faster locally. I strongly recommend the remote web service approach, though.
Create a string from the file and enumerate through the lines.
NSString *stringToCheck; // set this to the string you are looking for
NSData *bytesOfFile = [NSData dataWithContentsOfFile:@"/path/myfile.txt"];
NSString *fileString = [[NSString alloc] initWithData:bytesOfFile
                                             encoding:NSUTF8StringEncoding];
__block BOOL foundMatch = NO;
[fileString enumerateLinesUsingBlock:^(NSString *line, BOOL *stop){
    if ([stringToCheck isEqualToString:line]) {
        *stop = YES;
        foundMatch = YES;
    }
}];
This is a job for regular expressions. Take all of the substrings you're looking for/filtering against, escape them appropriately (escaping characters such as [, ], |, and \, among others, with \), and join them with a |. The resulting string is your regular expression, which you apply to each URL.
You could loop through an entire array full of substrings, doing rangeOfString:options: with each one, but that's the slow way. A good regular expression implementation is built for this sort of thing, and I would hope that Apple's implementation is suitable.
That said, profile the hell out of it. I've seen some regex implementations choke on the | operator, so you'll want to make sure that Apple's is not one of them.
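A minimal sketch of that escape-and-join idea, assuming NSRegularExpression is available (iOS 4 or later) and that matching should ignore case. TSBlocklistRegex and TSURLIsListed are made-up names. The regex would be built once, after reading the file, and reused for every page load.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: builds one regex that matches any of the given substrings.
static NSRegularExpression *TSBlocklistRegex(NSArray *substrings) {
    NSMutableArray *escaped = [NSMutableArray arrayWithCapacity:[substrings count]];
    for (NSString *substring in substrings) {
        if ([substring length] > 0) {
            // Escapes regex metacharacters so each entry is matched literally.
            [escaped addObject:[NSRegularExpression escapedPatternForString:substring]];
        }
    }
    if ([escaped count] == 0) return nil;

    NSString *pattern = [escaped componentsJoinedByString:@"|"];
    return [NSRegularExpression regularExpressionWithPattern:pattern
                                                     options:NSRegularExpressionCaseInsensitive
                                                       error:NULL];
}

// Returns YES if the URL string contains at least one of the substrings.
static BOOL TSURLIsListed(NSString *urlString, NSRegularExpression *regex) {
    if (regex == nil || urlString == nil) return NO;
    NSRange whole = NSMakeRange(0, [urlString length]);
    return [regex firstMatchInString:urlString options:0 range:whole] != nil;
}
```

Whether a 27,847-way alternation performs well is exactly the thing to profile.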
If you need to compare each string in your text file, you are going to have to compare it, no way around it.
What you can do however is do it on a background thread while showing some loading or something, and it won't feel as if the app got stuck.
I would suggest you try with NSDictionary first. You can load up all your URLs into this, and internally it will use some sort of hash table/map for very quick (O(1)) lookup.
You can then check the result of [dictionary objectForKey:userURL], and if it returns something then the URL matched one in the dictionary.
The only problem with this is that it requires an exact string match. If your dictionary contains http://server/foobar and the user enters http://server/FOOBAR (because it's a case-insensitive server), you are going to get a miss on your lookup. Similarly, adding ?foobar queries to the end of URLs will result in a miss. You could also add an explicit port with server:80, and with %XX character encoding you can create hundreds of variations of the same URL. You will have to account for this and canonicalize both the URLs in your dictionary, and the URL entered by the user prior to lookup.
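One possible shape for that canonicalization step, as a sketch only. It swaps the dictionary for an NSSet (same hashed lookup, no dummy values), and it assumes the scheme, port, query and case can all be ignored, which will not be right for every server. TSCanonicalKey and TSSetContainsURL are hypothetical names.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: reduces a URL to a lowercase "host/path" key.
// Scheme, port, query and fragment are dropped on purpose.
static NSString *TSCanonicalKey(NSString *urlString) {
    NSURL *url = [NSURL URLWithString:urlString];
    NSString *host = [[url host] lowercaseString];
    if ([host length] == 0) return nil;

    // -path is already percent-decoded, so %XX variations collapse here.
    NSString *path = [url path];
    if (path == nil) path = @"";
    return [host stringByAppendingString:[path lowercaseString]];
}

// The set must have been filled with keys produced by TSCanonicalKey as well.
static BOOL TSSetContainsURL(NSSet *canonicalKeys, NSString *urlString) {
    NSString *key = TSCanonicalKey(urlString);
    return key != nil && [canonicalKeys containsObject:key];
}
```

Note that this still only finds exact matches on the canonical key, not arbitrary substrings.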

NSXMLElement, without an NSXMLDocument

I have an NSXMLElement that is a copy from somewhere else in my code. After it's been used, it no longer is connected to an NSXMLDocument. It's (what I believe to be) not linked to anything.
If I NSLog the NSXMLElement, I am given the contents. I can see it.
But, if I try to use nodesForXPath, it returns nothing. It's blank. It cannot find anything (unless I just search for *, in which case it returns everything with /t/t/n/t/t).
Now, if I make that NSXMLElement the root of a NEW NSXMLDocument, and then search the new document, I can search the XPath perfectly!
I am trying to understand the logic there. I have been reading the documentation, but I haven't found anything that explains what is happening here (or at least, I have not understood it if I did find it in the documentation).
I would really appreciate someone helping me understand exactly why this is happening, and whether or not I need to use this NSXMLDocument.
Here is my code:
(element is an NSXMLElement coming from an NSArray of stored NSXMLElements. They are copies from a document used in another method)
NSArray* scene = [element nodesForXPath:@"scene[1]" error:nil];
NSLog(@"scene: %@", [[scene objectAtIndex:0] stringValue]);
But if I do this, I get a result:
NSXMLDocument* docElement = [[NSXMLDocument alloc] initWithRootElement:element];
NSArray* scene = [docElement nodesForXPath:@"scene[1]" error:nil];
NSLog(@"scene: %@", [[scene objectAtIndex:0] stringValue]);
It's probably because the xpath context used by libxml2 holds a reference to the document. If there is no document, it probably doesn't know how to operate. The behavior if you search for * may just be a special-case because it knows it has to return everything so it doesn't even try to search.

Objective-C String-Replace

I want to replace multiple elements in my string in Objective-C.
In PHP you can do this:
str_replace(array("itemtoreplace", "anotheritemtoreplace", "yetanotheritemtoreplace"), "replacedValue", $string);
However, in Objective-C the only method I know is NSString's stringByReplacingOccurrencesOfString:withString:. Is there any efficient way to replace multiple strings?
This is my current solution (very inefficient and.. well... long)
NSString *newTitle = [[[itemTitleField.text stringByReplacingOccurrencesOfString:@"'" withString:@""] stringByReplacingOccurrencesOfString:@" " withString:@"'"] stringByReplacingOccurrencesOfString:@"^" withString:@""];
See what I mean?
Thanks,
Christian Stewart
If this is something you're regularly going to do in this program or another program, maybe make a method or conditional loop to pass the original string, and multi-dimensional array to hold the strings to find / replace. Probably not the most efficient, but something like this:
// Original String
NSString *originalString = @"My^ mother^ told me not to go' outside' to' play today. Why did I not listen to her?";
// Method Start
// MutableArray of String-pairs Arrays
NSMutableArray *arrayOfStringsToReplace = [NSMutableArray arrayWithObjects:
    [NSArray arrayWithObjects:@"'", @"", nil],
    [NSArray arrayWithObjects:@" ", @"'", nil],
    [NSArray arrayWithObjects:@"^", @"", nil],
    nil];
// For or while loop to Find and Replace strings
while ([arrayOfStringsToReplace count] >= 1) {
    originalString = [originalString stringByReplacingOccurrencesOfString:[[arrayOfStringsToReplace objectAtIndex:0] objectAtIndex:0]
                                                               withString:[[arrayOfStringsToReplace objectAtIndex:0] objectAtIndex:1]];
    [arrayOfStringsToReplace removeObjectAtIndex:0];
}
// Method End
Output:
2010-08-29 19:03:15.127 StackOverflow[1214:a0f] My'mother'told'me'not'to'go'outside'to'play'today.'Why'did'I'not'listen'to'her?
There is no more compact way to write this with the Cocoa frameworks. It may appear inefficient from a code standpoint, but in practice this sort of thing is probably not going to come up that often, and unless your input is extremely large and you're doing this incredibly frequently, you will not suffer for it. Consider writing these on three separate lines for readability versus chaining them like that.
You can always write your own function if you're doing something performance-critical that requires batch replace like this. It would even be a fun interview question. :)
Considered writing your own method? Tokenize the string and iterate through all of them replacing one by one, there really is no faster way than O(n) to replace words in a string.
Would be a single for loop at most.
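If every target maps to the same replacement, as in the PHP str_replace call from the question, one pass with NSRegularExpression is another option. This is only a sketch: TSReplaceAll is a made-up helper, it needs iOS 4 or later, and unlike the chained calls it cannot give each target its own replacement.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: replaces every occurrence of any target with one replacement.
static NSString *TSReplaceAll(NSString *input, NSArray *targets, NSString *replacement) {
    if ([targets count] == 0) return input;

    // Escape each target so it is matched literally, then join with "|".
    NSMutableArray *escaped = [NSMutableArray arrayWithCapacity:[targets count]];
    for (NSString *target in targets) {
        [escaped addObject:[NSRegularExpression escapedPatternForString:target]];
    }
    NSString *pattern = [escaped componentsJoinedByString:@"|"];

    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:0
                                                                             error:NULL];
    if (regex == nil) return input;

    // Escape the replacement too, so "$" and "\" in it are not treated as template syntax.
    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, [input length])
                                      withTemplate:[NSRegularExpression escapedTemplateForString:replacement]];
}
```

For example, TSReplaceAll(title, [NSArray arrayWithObjects:@"'", @"^", nil], @"") strips both characters in a single scan of the string.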
Add the @ to the start of all the strings, as in
withString:@""
It's missing for a few.

Check that an email address is valid on iOS [duplicate]

Possible Duplicate:
Best practices for validating email address in Objective-C on iOS 2.0?
I am developing an iPhone application where I need the user to give his email address at login.
What is the best way to check if an email address is a valid email address?
Good cocoa function:
-(BOOL) NSStringIsValidEmail:(NSString *)checkString
{
    BOOL stricterFilter = NO; // Discussion http://blog.logichigh.com/2010/09/02/validating-an-e-mail-address/
    NSString *stricterFilterString = @"^[A-Z0-9a-z\\._%+-]+@([A-Za-z0-9-]+\\.)+[A-Za-z]{2,4}$";
    NSString *laxString = @"^.+@([A-Za-z0-9-]+\\.)+[A-Za-z]{2}[A-Za-z]*$";
    NSString *emailRegex = stricterFilter ? stricterFilterString : laxString;
    NSPredicate *emailTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", emailRegex];
    return [emailTest evaluateWithObject:checkString];
}
Discussion on Lax vs. Strict - http://blog.logichigh.com/2010/09/02/validating-an-e-mail-address/
And because categories are just better, you could also add an interface:
@interface NSString (emailValidation)
- (BOOL)isValidEmail;
@end
Implement
@implementation NSString (emailValidation)
-(BOOL)isValidEmail
{
    BOOL stricterFilter = NO; // Discussion http://blog.logichigh.com/2010/09/02/validating-an-e-mail-address/
    NSString *stricterFilterString = @"^[A-Z0-9a-z\\._%+-]+@([A-Za-z0-9-]+\\.)+[A-Za-z]{2,4}$";
    NSString *laxString = @"^.+@([A-Za-z0-9-]+\\.)+[A-Za-z]{2}[A-Za-z]*$";
    NSString *emailRegex = stricterFilter ? stricterFilterString : laxString;
    NSPredicate *emailTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", emailRegex];
    return [emailTest evaluateWithObject:self];
}
@end
And then utilize:
if([@"emailString@email.com" isValidEmail]) { /* True */ }
if([@"InvalidEmail@notreallyemailbecausenosuffix" isValidEmail]) { /* False */ }
To check if a string variable contains a valid email address, the easiest way is to test it against a regular expression. There is a good discussion of various regex's and their trade-offs at regular-expressions.info.
Here is a relatively simple one that leans on the side of allowing some invalid addresses through: ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$
How you can use regular expressions depends on the version of iOS you are using.
iOS 4.x and Later
You can use NSRegularExpression, which allows you to compile and test against a regular expression directly.
iOS 3.x
Does not include the NSRegularExpression class, but does include NSPredicate, which can match against regular expressions.
NSString *emailRegex = ...;
NSPredicate *emailTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", emailRegex];
BOOL isValid = [emailTest evaluateWithObject:checkString];
Read a full article about this approach at cocoawithlove.com.
iOS 2.x
Does not include any regular expression matching in the Cocoa libraries. However, you can easily include RegexKit Lite in your project, which gives you access to the C-level regex APIs included on iOS 2.0.
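For the NSRegularExpression route on iOS 4.x and later, a minimal sketch could look like this. TSLooksLikeEmail is a hypothetical name. The pattern is the simple one quoted above, and since it only lists uppercase letters, the sketch assumes it should be applied case-insensitively.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: structural check only, it says nothing about whether the mailbox exists.
static BOOL TSLooksLikeEmail(NSString *candidate) {
    if ([candidate length] == 0) return NO;

    NSRegularExpression *regex = [NSRegularExpression
        regularExpressionWithPattern:@"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}$"
                             options:NSRegularExpressionCaseInsensitive
                               error:NULL];
    if (regex == nil) return NO;

    // The pattern is anchored, so a valid address yields exactly one match.
    NSUInteger matches = [regex numberOfMatchesInString:candidate
                                                options:0
                                                  range:NSMakeRange(0, [candidate length])];
    return matches == 1;
}
```

In real code the compiled regex would normally be cached rather than rebuilt on every call.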
Here's a good one using NSRegularExpressionSearch that's working for me.
[text rangeOfString:@"^.+@.+\\..{2,}$" options:NSRegularExpressionSearch].location != NSNotFound;
You can insert whatever regex you want, but I like being able to do it in one line.
To validate the email string, you will need to write a regular expression that checks it is in the correct form. There are plenty on the web, but be careful, as some can exclude addresses that are actually legal.
Essentially it will look something like this:
^((?>[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(angle)>)$
Actually checking if the email exists and doesn't bounce would mean sending an email and seeing what the result was. i.e. it bounced or it didn't. However it might not bounce for several hours or not at all and still not be a "real" email address. There are a number of services out there which purport to do this for you and would probably be paid for by you and quite frankly why bother to see if it is real?
It is good to check that the user has not misspelt their email address; otherwise they could enter it incorrectly, not realise it, and then get hacked off with you for not replying. However, if someone wants to add a bum email address, there would be nothing to stop them creating one on Hotmail or Yahoo (or many other places) to the same end.
So do the regular expression and validate the structure but forget about validating against a service.

Regex to get value within tag

I have a sample set of XML returned back:
<rsp stat="ok">
<site>
<id>1234</id>
<name>testAddress</name>
<hostname>anotherName</hostname>
...
</site>
<site>
<id>56789</id>
<name>ba</name>
<hostname>alphatest</hostname>
...
</site>
</rsp>
I want to extract everything within <name></name> but not the tags themselves, and to have that only for the first instance (or based on some other test select which item).
Is this possible with regex?
<disclaimer>I don't use Objective-C</disclaimer>
You should be using an XML parser, not regexes. XML is not a regular language, hence not easily parseable by a regular expression. Don't do it.
Never use regular expressions or basic string parsing to process XML. Every language in common usage right now has perfectly good XML support. XML is a deceptively complex standard and it's unlikely your code will be correct in the sense that it will properly parse all well-formed XML input, and even if it does, you're wasting your time because (as just mentioned) every language in common usage has XML support. It is unprofessional to use regular expressions to parse XML.
You could use Expat, which has Objective-C bindings.
Apple's options are:
The CF xml parser
The tree based Cocoa parser (10.4 only)
Without knowing your language or environment, here are some perl expressions. Hopefully it will give you the right idea for your application.
Your regular expression to capture the text content of a tag would look something like this:
m/>([^<]*)</
This will capture the content in each tag. You will have to loop on the match to extract all content. Note that this does not account for self-terminated tags. You would need a regex engine with negative lookbehinds to accomplish that. Without knowing your environment, it's hard to say if it would be supported.
You could also just strip all tags from your source using something like:
s/<[^>]*>//g
Also depending on your environment, if you can use an XML-parsing library, it will make your life much easier. After all, by taking the regex approach, you lose everything that XML really offers you (structured data, context awareness, etc).
The best tool for this kind of task is XPath.
NSURL *rspURL = [NSURL fileURLWithPath:[@"~/rsp.xml" stringByExpandingTildeInPath]];
NSXMLDocument *document = [[[NSXMLDocument alloc] initWithContentsOfURL:rspURL options:NSXMLNodeOptionsNone error:NULL] autorelease];
NSArray *nodes = [document nodesForXPath:@"/rsp/site[1]/name" error:NULL];
NSString *name = [nodes count] > 0 ? [[nodes objectAtIndex:0] stringValue] : nil;
If you want the name of the site which has id 56789, use this XPath: /rsp/site[id='56789']/name instead. I suggest you read W3Schools XPath tutorial for a quick overview of the XPath syntax.
As others say, you should really be using NSXMLParser for this sort of thing.
HOWEVER, if you only need to extract the stuff in the name tags, then RegexKitLite can do it quite easily:
NSString * xmlString = ...;
NSArray * captures = [xmlString arrayOfCaptureComponentsMatchedByRegex:@"<name>(.*?)</name>"];
for (NSArray * captureGroup in captures) {
    NSLog(@"Name: %@", [captureGroup objectAtIndex:1]);
}
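For comparison, the NSXMLParser route mentioned at the start of this answer might look roughly like the sketch below. It assumes ARC, TSFirstNameFinder and TSFirstSiteName are made-up names, and it takes the first name element anywhere in the document rather than checking that it sits under rsp/site.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the text of the first <name> element.
@interface TSFirstNameFinder : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *buffer; // non-nil only while inside <name>
@property (nonatomic, copy) NSString *firstName;
@end

@implementation TSFirstNameFinder
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary *)attributeDict {
    if ([elementName isEqualToString:@"name"]) {
        self.buffer = [NSMutableString string];
    }
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // Text may arrive in several chunks; appending to a nil buffer is a no-op.
    [self.buffer appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    if ([elementName isEqualToString:@"name"] && self.buffer != nil) {
        self.firstName = self.buffer;
        self.buffer = nil;
        [parser abortParsing]; // only the first instance is wanted
    }
}
@end

static NSString *TSFirstSiteName(NSData *xmlData) {
    TSFirstNameFinder *finder = [[TSFirstNameFinder alloc] init];
    NSXMLParser *parser = [[NSXMLParser alloc] initWithData:xmlData];
    parser.delegate = finder;
    [parser parse]; // returns NO after abortParsing; firstName is still set
    return finder.firstName;
}
```

This is noticeably more code than the regex, which is part of why the XPath suggestions in the other answers are attractive where NSXMLDocument is available.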
Careful about namespaces:
<prefix:name xmlns:prefix="">testAddress</prefix:name>
is equivalent XML that will break regexp based code. For XML, use an XML parser. XPath is your friend for things like this. The XPath code below will return a sequence of strings with the info you want:
./rsp/site/name/text()
Cocoa has NSXML support for XPath.