Objective C: componentsSeperatedByRegex - objective-c

Probably some easy answer out there, but I wasn't able to find it. I am using RegexKitLite to manipulate some NSStrings in my iPhone app. I want to get the set of links from an entire HTML source from a webpage. I want it to return the array of links, not the rest.
My code:
NSString *regex = #"[0-9]{1,}.htm";
for (NSString *match in [sourcePage componentsSeperatedByRegex: regex]{
NSLog (#"%#", match); // Just to debug, will use data later.
}
The problem here is that the enire page is returned in small pieces, but exactly without the strings I want.
Any help is appreciated!

Use -componentsMatchedByRegex: instead.

Related

What is the most efficient way to compare an NSString in this way

I have an app (Cocoa Touch, Web Browser), however I need to be able to compare an NSString with thousands of other strings. Here's the deal.
When a WebView loads, I get the URL. I need to compare this URL with literally thousands of results (27,847). Each of those numbers represents a line of text in a plain text file.
I would like to know the best way to go about getting the data from the text file, and comparing it with the NSString. I need to know if the URL that the WebView is loading contains any of these strings.
The app needs to be very fast, so I can't just parse through every line in the text file, turn it into an array, and then compare each and every result.
Please share your ideas. Thanks.
I think the cleanest solution is to:
Create a web service that can offload the work to a server and return a response. Since it sounds like you're building a web protection service, your database may grow to be quite substantial over time, and you can just scale your server up to increase its speed. Furthermore, you don't want to have to update your app every time the lookup data changes.
Other options are:
Use a local SQLite database. SQL databases should perform lookups relatively fast.
If you don't want to use any database, have you tried putting all the search strings into an NSDictionary or NSMutableDictionary object? This way, you would just check if the valueForKey: for the string you're searching for is nil.
Sample code for this:
NSDictionary *searchDictionary = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithBool:YES], #"google.com",
[NSNumber numberWithBool:YES], #"yahoo.com",
[NSNumber numberWithBool:YES], #"bing.com",
nil];
NSString *searchString = #"bing.com";
if ([searchDictionary valueForKey:searchString]) {
// search string found
} else {
// search string not found
}
Note: if you want the NSDictionary to perform case-insensitive comparisons, pre-load all values lowercase, and make the search string lowercase when using valueForKey:.
How much memory this could take is a whole other story, but I don't see how this comparison could be made much faster locally. I strongly recommend the remove web service approach, though.
Create a string from the file and enumerate through the lines.
NSString *stringToCheck;
NSData *bytesOfFile = [NSData dataWithContentsOfFile:#"/path/myfile.txt"];
NSString *fileString = [[NSString alloc] initWithData:bytesOfFile
encoding:NSUTF8Encoding];
__block BOOL foundMatch = NO;
[fileString enumerateLinesUsingBlock:^(NSString *line, BOOL *stop){
if([stringToCheck isEqualToString:line]){
*stop = YES;
foundMatch = YES;
}
}];
This is a job for regular expressions. Take all of the substrings you're looking for/filtering against, escape them appropriately (escaping characters such as [, ], |, and \, among others, with \), and join them with a |. The resulting string is your regular expression, which you apply to each URL.
You could loop through an entire array full of substrings, doing rangeOfString:options: with each one, but that's the slow way. A good regular expression implementation is built for this sort of thing, and I would hope that Apple's implementation is suitable.
That said, profile the hell out of it. I've seen some regex implementations choke on the | operator, so you'll want to make sure that Apple's is not one of them.
If you need to compare each string in your text file, you are going to have to compare it, no way around it.
What you can do however is do it on a background thread while showing some loading or something, and it won't feel as if the app got stuck.
I would suggest you try with NSDictionary first. You can load up all your URLs into this, and internally it will use some sort of hash table/map for very quick (O(1)) lookup.
You can then check the result of [dictionary objectForKey:userURL], and if it returns something then the URL matched one in the dictionary.
The only problem with this is that it requires an exact string match. If your dictionary contains http://server/foobar and the user enters http://server/FOOBAR (because it's a case-insensitive server), you are going to get a miss on your lookup. Similarly, adding ?foobar queries to the end of URLs will result in a miss. You could also add an explicit port with server:80, and with %XX character encoding you can create hundreds of variations of the same URL. You will have to account for this and canonicalize both the URLs in your dictionary, and the URL entered by the user prior to lookup.

Using libqrencode library

I loaded libqrencode library in my cocoa project but I'm not sure how to use it exactly. I have a text field in which you type a text, and once done you click a button and I log that text with NSLog. Now I want to encode that text to be able to use it later and generate a QRcode out of it, so in the manual it's saying to use this format
QRcode* QRcode_encodeString (const char * string,
int version,
QRecLevel level,
QRencodeMode hint,
int casesensitive
)
I am not sure how to use that in my method to log the results as well
- (IBAction)GenerateCode:(id)sender {
NSString *urlText = [[NSString alloc] initWithFormat:#"%#", [_urlField stringValue]];
NSLog(#"The url is %#", urlText);
}
You need to get from an NSString instance to a const char *. This has been answered several times on SO, but here's one. Once you do that, you can call QRCode_encodeString() directly and pass whatever you desire for the arguments.
If you need more specifics, you'll have to try something, post your code, and describe how it's not working for you so we can help you more directly without just writing it for you.

NSXMLElement, without an NSXMLDocument

I have an NSXMLElement that is a copy from somewhere else in my code. After it's been used, it no longer is connected to an NSXMLDocument. It's (what I believe to be) not linked to anything.
If I NSLog the NSXMLElement, I am given the contents. I can see it.
But, if I try to use nodesForXPath, it returns nothing. It's blank. It cannot find anything (unless I just search for *, in which case it returns everything with /t/t/n/t/t).
Now, if I make that NSXMLElement the root of a NEW NSXMLDocument, and then search the new document, I can search the XPath perfectly!
I am trying to understand the logic there. I have been reading the documentation, but I haven't found anything that explains what is happening here (or at least, I have no understood it if I did find it in the documentation).
I would really appreciate someone helping me understand exactly why this is happening, and whether or not I need to use this NSXMLDocument.
Here is my code:
(element is an NSXMLElement coming from an NSArray of stored NSXMLElements. They are copies from a document used in another method)
NSArray* scene = [element nodesForXPath:#"scene[1]" error:nil];
NSLog(#"scene: %#", [[scene objectAtIndex:0] stringValue]);
But if I do this, I get a result:
NSXMLDocument* docElement = [[NSXMLDocument alloc] initWithRootElement:element];
NSArray* scene = [docElement nodesForXPath:#"scene[1]" error:nil];
NSLog(#"scene: %#", [[scene objectAtIndex:0] stringValue]);
It's probably because the xpath context used by libxml2 holds a reference to the document. If there is no document, it probably doesn't know how to operate. The behavior if you search for * may just be a special-case because it knows it has to return everything so it doesn't even try to search.

Manipulating HTML

I need to read a HTML file and search for some tags in it. Based on the results, some tags would need to be removed, other ones changed and maybe refining some attributes — to then write the file back.
Is NSXMLDocument the way to go? I don't think that a parser is really needed in this case, it could even mean more work. And I don't want to touch the entire file, all I need to do is to load the file in memory, change some things, and save it again.
Note that, I'll be dealing with HTML, and not XHTML. Could that be a problem for NSXMLDocument? Maybe some unmatched tags or un-closed ones could make it stop working.
NSXMLDocument is the way to go. That way you can use Xpath/Xquery to find the tags you want. Bad HTML might be a problem but you can set NSXMLDocumentTidyHTML and it should be OK unless it's really bad.
NSRange startRange = [string rangeOfString:#"<htmlTag>"];
NSRange endRange = [string rangeOfString:#"</htmlTag>"];
NSString *subStr = [string subStringWithRange:NSMakeRange(startRange.location+startRange.length, endRange.location-startRange.location-startRange.length)];
NSString *finalStr = [string stringByReplacingOccurencesOfString:substr];
and then write finalstr to the file.
This is what I would do, note that I don't exactly know what the advantages of using NSXMLDocument would be, this should do it perfectly.
NSXMLDocument will possibly fail, due to the fact that HTML pages are not well formed, but you can try with NSXMLDocumentTidyHTML/NSXMLDocumentTidyXML (you can use them both to improve results) as outlined here and also have a look a this for tan approach at modifying the HTML.

NSString writeToFile with URL encoded string

I have a Mac application that keeps it's own log file. It appends info to the file using NSString's writeToFile method. One of the things that it logs are URL's of web services that it is interacting with. To encode the URL, I'm doing this:
searchString = (NSString *)CFURLCreateStringByAddingPercentEscapes(NULL, (CFStringRef)searchString, NULL, (CFStringRef)#"!*'();:#&=+$,/?%#[]", kCFStringEncodingUTF8 );
The app then appends searchString to the rest of the URL and writes it to the log file. Now the problem is that after adding that URL encoding line, nothing seems to be getting written to the file. The program functions as expected otherwise however. Removing the line of code above results in all of the correct information being logged to the file (removing that line is not an option because searchString must be URL encoded).
Oh and I am using NSUTF8StringEncoding when writing the NSString to the file.
Thanks for any help.
EDIT: I know there's also a similar function to CFURLCreateStringByAddingPercentEscapes in NSString, but I've read that it doesn't always work. Can anyone shed some light on this if my original question cannot be answered? Thanks! (EDIT: same problem occurs when using stringByAddingPercentEscapesUsingEncoding:)
EDIT 2: Here's the code that I'm using to append messages to the log file.
+(void)logText:(NSString *)theString{
NSString *docsDirectory = [NSSearchPathForDirectoriesInDomains(NSApplicationSupportDirectory,NSUserDomainMask,YES) objectAtIndex:0];
NSString *path = [docsDirectory stringByAppendingPathComponent:#"Folder/File.log"];
NSString *fileContents = [[[NSString alloc] initWithContentsOfFile:path] autorelease];
if([fileContents lengthOfBytesUsingEncoding:NSUTF8StringEncoding] >= 204800){
fileContents = #"";
}
NSString *timeStamp = [[NSDate date] description];
timeStamp = [timeStamp stringByAppendingString:#": "];
timeStamp = [timeStamp stringByAppendingString:theString];
fileContents = [fileContents stringByAppendingString:timeStamp];
fileContents = [fileContents stringByAppendingString:#"\n"];
[fileContents writeToFile:path atomically:YES encoding:NSUTF8StringEncoding error:NULL];
}
Because after almost a whole day no one else has offered any answers, I'm going to post a wild guess here: you're not accidentally using the string you want to output (with percent characters in it) as a format string are you?
That is, making the mistake of doing:
NSLog(#"In format strings you can use %# as a placeholder for an object, and %i for a plain C integer.")
Instead of:
NSLog(#"%#", #"In format strings you can use %# as a placeholder for an object, and %i for a plain C integer.");
But I'm going to be surprised if this turns out to be the cause of your problem, as it usually causes random-looking output, rather than absolutely no output. And in some cases, Xcode also gives compiler warnings about it (when I tried NSLog(myString), I got "warning: format not a string literal and no format arguments").
So don't shoot me down if this answer doesn't help. It would be easier to answer your question if you could show us more of your logging code. As for the one line you provided, I can't detect anything wrong with it.
Edit: Oops, I kind of missed that you mentioned you're using writeToFile:atomically:encoding:error: to write the string to the file, so it's even more unlikely you're accidentally treating it as a format string somewhere. But I'm going to leave this answer up for now. Again, you should really show us more of your code though ...
Edit: Regarding your question on a method in NSString that has similar percent encoding functionality, that would be stringByAddingPercentEscapesUsingEncoding:. I'm not sure what kind of problems you're thinking of when you say you've heard it doesn't always work. But one thing is that CFURLCreateStringByAddingPercentEscapes allows you to specify extra characters that don't normally have to be escaped but which you still want to be escaped, while the method of NSString doesn't allow you to specify this.