How to use NSScanner to scan from a string - objective-c

I have a string which looks like:
#chat :hi there
And I'd like to scan all the text after the : into a string, so that I end up with hi there
I've tried
[[NSScanner scannerWithString:argument] scanUpToString:@":" intoString:&newarg];
But newarg contains only #chat. How can this be achieved?

Example String:
#chat :Hello World,
#chat :How are you doing?
Code:
NSString *theString = @"#chat :Hello World,\n"
                      @"#chat :How are you doing?";
NSScanner *theScanner = [NSScanner scannerWithString:theString];
NSCharacterSet *separator = [NSCharacterSet characterSetWithCharactersInString:@":"];
NSCharacterSet *newLine = [NSCharacterSet newlineCharacterSet];
NSString *theText = nil;
while ([theScanner isAtEnd] == NO) {
    // Skip everything up to the ":" separator.
    [theScanner scanUpToCharactersFromSet:separator intoString:NULL];
    // Step over the ":" itself; unlike bumping scanLocation by hand,
    // this cannot move past the end of the string.
    [theScanner scanString:@":" intoString:NULL];
    // Capture the rest of the line.
    [theScanner scanUpToCharactersFromSet:newLine intoString:&theText];
    NSLog(@"%@", theText);
}
Output:
Hello World,
How are you doing?

You've told the scanner to scan everything up to the ":" character, but it sounds like you actually want everything after it. Consider using a loop to step through the string: skip to the ":", step over it, then scan the rest of the line. Anne's answer provides an excellent example of this.
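For the single-line case in the question, the same idea can be sketched without a loop. TextAfterColon is a hypothetical helper name, not something from the question or from Foundation, and the sketch assumes the scanner's default behaviour of skipping leading whitespace:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text after the first ":" on the line,
// or nil when the string contains no ":" (or nothing follows it).
static NSString *TextAfterColon(NSString *input) {
    NSScanner *scanner = [NSScanner scannerWithString:input];
    // Move up to the first ":" and discard what precedes it.
    [scanner scanUpToString:@":" intoString:NULL];
    // Consume the ":" itself; this fails if the scan ran off the end.
    if (![scanner scanString:@":" intoString:NULL]) {
        return nil;
    }
    NSString *rest = nil;
    // Capture everything up to the end of the current line.
    [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet]
                            intoString:&rest];
    return rest;
}
```

Calling TextAfterColon(@"#chat :hi there") should give hi there, and a string without any ":" gives nil.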

Related

Parsing SRT file with Objective C

Text example:
1
00:00:00,000 --> 00:00:01,000
This is the first line
2
00:00:01,000 --> 00:00:02,000
This is the second line
3
00:00:02,000 --> 00:00:03,000
This is the last line
In JavaScript I would certainly parse this with a regular expression. I'm just wondering whether that is the best way to do this in Obj-C. I'm sure I could figure out a way to do this, but I want to do it the appropriate way.
I only need to know where to start and I'm happy to do the rest, but for understanding's sake, I'm going to end up with something like this (pseudocode):
NSDictionary
index -> [0-9]+
start -> hh:mm:ss,mmm
end -> hh:mm:ss,mmm
text -> one of the lines of text
In this case, I'd be parsing three entries into my dictionary.
Some background: I wrote a small app and created a file called stuff.srt containing your examples that resides in the bundle; hence, my means of accessing it.
This is just a quick and dirty thing, a proof-of-concept. Note that it doesn't check results. Real applications always check their results. As you can see, the work takes place in the -applicationDidFinishLaunching: method (I'm working in Mac OS X, not iOS).
EDIT:
It's been pointed out that the code as originally posted didn't handle multiple text lines correctly. To address this, I take advantage of the fact that SRT files use CRLF as their line breaks, and search for two occurrences of this sequence. I then change all occurrences of CRLF in the text string to spaces, based on what I observed here. This doesn't account for leading or trailing spaces in each line of the text.
I changed the contents of the stuff.srt file to this:
1
00:00:00,000 --> 00:00:01,000
This is the first line
and it has a secondary line
2
00:00:01,000 --> 00:00:02,000
This is the second line
3
00:00:02,000 --> 00:00:03,000
This is the last line
and it has a secondary line too
and the code has been revised as follows (I also put everything into an @autoreleasepool directive; there might be a lot of autoreleased objects generated in the course of parsing the file!):
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
    NSString *path = [[NSBundle mainBundle] pathForResource:@"stuff" ofType:@"srt"];
    NSString *string = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:NULL];
    NSScanner *scanner = [NSScanner scannerWithString:string];
    while (![scanner isAtEnd])
    {
        @autoreleasepool
        {
            NSString *indexString;
            (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&indexString];
            NSString *startString;
            (void) [scanner scanUpToString:@" --> " intoString:&startString];
            // My string constant doesn't begin with spaces because scanners
            // skip spaces and newlines by default.
            (void) [scanner scanString:@"-->" intoString:NULL];
            NSString *endString;
            (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&endString];
            NSString *textString;
            // (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&textString];
            // BEGIN EDIT
            (void) [scanner scanUpToString:@"\r\n\r\n" intoString:&textString];
            textString = [textString stringByReplacingOccurrencesOfString:@"\r\n" withString:@" "];
            // Addresses trailing space added if CRLF is on a line by itself at the end of the SRT file
            textString = [textString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
            // END EDIT
            NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:
                                        indexString, @"index",
                                        startString, @"start",
                                        endString , @"end",
                                        textString , @"text",
                                        nil];
            NSLog(@"%@", dictionary);
        }
    }
}
The revised output looks like this:
2013-02-09 16:10:17.727 SRTFileScan[4846:303] {
end = "00:00:01,000";
index = 1;
start = "00:00:00,000";
text = "This is the first line and it has a secondary line";
}
2013-02-09 16:10:17.729 SRTFileScan[4846:303] {
end = "00:00:02,000";
index = 2;
start = "00:00:01,000";
text = "This is the second line";
}
2013-02-09 16:10:17.730 SRTFileScan[4846:303] {
end = "00:00:03,000";
index = 3;
start = "00:00:02,000";
text = "This is the last line and it has a secondary line too";
}
One other thing I learned from what I've read today: The SRT file format originated in France, and the comma seen in the input is the decimal separator used there.
Apple has sample code that parses subtitle files. Check the relevant part here:
https://developer.apple.com/library/mac/samplecode/avsubtitleswriterOSX/Listings/avsubtitleswriter_SubtitlesTextReader_m.html#//apple_ref/doc/uid/DTS40013409-avsubtitleswriter_SubtitlesTextReader_m-DontLinkElementID_5
My suggestion is to use an NSDateFormatter to parse the second line. I would split that line into two strings (see componentsSeparatedByString: in the NSString class reference), and do this while reading the file line by line.
So the loop would be:
If the file still contains data, read the next line;
If the line number is a multiple of 4, allocate a new object. This object should be able to hold two dates, one integer and one string;
If the line number is not a multiple of 4, read the line and assign its value to the corresponding field.
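The timing-line step could be sketched roughly as follows. DatesFromTimingLine is a hypothetical helper name, the HH:mm:ss,SSS pattern is an assumption based on the sample file in the question, and the code assumes ARC:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits an SRT timing line such as
// "00:00:01,000 --> 00:00:02,000" into its start and end dates.
// Returns nil if the line does not contain exactly two parseable parts.
static NSArray<NSDate *> *DatesFromTimingLine(NSString *line) {
    NSArray<NSString *> *parts = [line componentsSeparatedByString:@" --> "];
    if (parts.count != 2) {
        return nil;
    }

    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    // A fixed locale and time zone keep parsing independent of user settings.
    formatter.locale = [NSLocale localeWithLocaleIdentifier:@"en_US_POSIX"];
    formatter.timeZone = [NSTimeZone timeZoneForSecondsFromGMT:0];
    // Assumed pattern: hours, minutes, seconds, then a comma and milliseconds.
    formatter.dateFormat = @"HH:mm:ss,SSS";

    // Trim stray spaces and a possible trailing "\r" from CRLF files.
    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSDate *start = [formatter dateFromString:[parts[0] stringByTrimmingCharactersInSet:blank]];
    NSDate *end = [formatter dateFromString:[parts[1] stringByTrimmingCharactersInSet:blank]];
    if (start == nil || end == nil) {
        return nil;
    }
    return @[start, end];
}
```

Because the strings carry no calendar date, the resulting NSDate values are mainly useful relative to each other, for example [dates[1] timeIntervalSinceDate:dates[0]] for the duration of an entry in seconds.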

Yet another NSScanner characterSetWithCharactersInString newb

Let's assume I have a string ("G00 X0.0000 Y0.0000") and I need to parse its contents. Here is my code:
NSCharacterSet *params = [NSCharacterSet characterSetWithCharactersInString:@"XY"];
//setup the scanner
NSScanner *scanner = [NSScanner scannerWithString:stringToBeScanned];
NSString *scanned = nil;
//scan the string
NSLog(@"%@", stringToBeScanned);
while ([scanner scanUpToCharactersFromSet:params intoString:&scanned]) {
    struct keypair code;
    code.key = [scanned characterAtIndex:0];
    code.value = [[scanned substringFromIndex:1] doubleValue];
    NSLog(@"--> %@ [%lu]= (%c, %.4f)", scanned, [scanner scanLocation], code.key, code.value);
}
And the output to NSLog:
G00 X0.0000 Y0.0000
--> G00 [4]= (G, 0.0000)
My characterSet includes both 'X' and 'Y' and I can't figure out why my NSScanner won't scan in the 'X0.0000 ' - it should find that Y and pull in everything from X up to Y according to my understanding.
I can see from the scanLocation that the scanner is stopping at index 4 (correctly), but the loop either doesn't continue or evaluates to false. Shouldn't the scanner keep looping and finding my delimiters (from the characterSet) and grabbing data?
scanUpToCharactersFromSet:intoString: scans up to the "X" and gives you the characters it scanned "G00 ".
Note that it does not scan the "X". When you call the method again, it looks at the next character (the "X"), notices that it is a character in the set, and stops scanning. As it scanned no characters, it then returns NO.
To scan the "X" (or "Y"), you will want to use scanCharactersFromSet:intoString: as well.
I solved this issue. Basically I receive a string with a list of "codes", each followed by a value associated with that command/parameter. There could be several different "commands" in each string, or none at all. The key was to use scanCharactersFromSet: and scanUpToCharactersFromSet: together in order to capture the right pairings and parse the entire string while staying very flexible. It's a little ugly, I know.
Here is my code:
//setup the scanner
NSScanner *scanner = [NSScanner scannerWithString:[self stringByAppendingString:@"!"]];
NSCharacterSet *codeset = [NSCharacterSet characterSetWithCharactersInString:@"GMTFIJKPRSXYZ!"];
NSString *scanned = nil;
char codechar = 0;
//perform the first scan
[scanner scanCharactersFromSet:codeset intoString:&scanned];
if (scanned)
    codechar = [scanned characterAtIndex:0];
//scan the string
while ([scanner scanUpToCharactersFromSet:codeset intoString:&scanned]) {
    struct keypair code;
    code.key = codechar;
    code.value = [scanned doubleValue];
    NSLog(@"--> (%c, %.4f)", code.key, code.value);
    //skip over the delimiter we encountered
    [scanner scanCharactersFromSet:codeset intoString:&scanned];
    if (scanned)
        codechar = [scanned characterAtIndex:0];
}

Split NSString into NSArray by blank lines

I am reading a *.srt subtitle file into a NSString. The content of this string looks like this:
1
00:00:20,000 --> 00:00:24,400
Altocumulus clouds occur between six thousand
2
00:00:24,600 --> 00:00:27,800
and twenty thousand feet above ground level.
I am looking for an elegant solution to split this string into an NSArray in which each element contains the information which is related to one particular subtitle-"frame", e.g. the zeroth element would look like this:
1
00:00:20,000 --> 00:00:24,400
Altocumulus clouds occur between six thousand
Any ideas how to accomplish this task in an elegant manner? I tried splitting the original string using the method
[string componentsSeparatedByString:@"\n\n"];
but this method fails to detect the blank lines.
Thanks for your help!
tobi
If [string componentsSeparatedByString:@"\n\n"] doesn't work, then there are two possibilities:
Your file contains MS-DOS-style line breaks, which are \r\n. So try splitting on @"\r\n\r\n".
Your supposedly blank lines contain spaces or tabs. You can check this from the shell using cat -e.
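If the line-break style is not known in advance, one option is to normalise it before splitting. The following is only a sketch: SubtitleBlocks is a hypothetical helper name, ARC is assumed, and it covers the first possibility only, so blank lines that contain spaces or tabs are not treated as separators:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits subtitle text into blocks separated by
// empty lines, accepting both "\r\n" and "\n" line breaks.
static NSArray<NSString *> *SubtitleBlocks(NSString *contents) {
    // Normalise CRLF (and any lone CR) to LF so one separator suffices.
    NSString *normalized = [contents stringByReplacingOccurrencesOfString:@"\r\n"
                                                               withString:@"\n"];
    normalized = [normalized stringByReplacingOccurrencesOfString:@"\r"
                                                       withString:@"\n"];

    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray<NSString *> *blocks = [NSMutableArray array];
    for (NSString *chunk in [normalized componentsSeparatedByString:@"\n\n"]) {
        // Drop stray whitespace and skip empty chunks (e.g. trailing breaks).
        NSString *trimmed = [chunk stringByTrimmingCharactersInSet:blank];
        if (trimmed.length > 0) {
            [blocks addObject:trimmed];
        }
    }
    return blocks;
}
```

Each element then holds the index, timing line and text of one subtitle, still separated by "\n".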
I'd suggest using NSScanner instead. It's more flexible and you don't have to worry about whether your line breaks are Windows or Unix style and whether the blank lines contain any spaces. Here's an example:
NSMutableArray *lines = [NSMutableArray array];
NSString *s = @"foo\n\nbar\r\n \t \r\nbaz"; //intentionally mixed line breaks
NSScanner *scanner = [NSScanner scannerWithString:s];
while (![scanner isAtEnd]) {
    [scanner scanCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:NULL];
    NSString *line = nil;
    [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&line];
    if (line) {
        [lines addObject:line];
    }
}
NSLog(@"%@", lines);
According to http://en.wikipedia.org/wiki/SubRip, the line breaks are a CRLF, which would be \r\n.

Using NSScanner to parse a string

I have a string with formatting tags in it, such as There are {adults} adults, and {children} children. I have a dictionary which has "adults" and "children" as keys, and I need to look up the value and replace the macros with that value. This is fully dynamic; the keys could be anything (so I can't hardcode a stringByReplacingString).
In the past, I've done similar things before just by looping through a mutable string, and searching for the characters; removing what I've already searched for from the source string as I go. It seems like this is exactly the type of thing NSScanner is designed for, so I tried this:
NSScanner *scanner = [NSScanner scannerWithString:format];
NSString *foundString;
scanner.charactersToBeSkipped = nil;
NSMutableString *formatedResponse = [NSMutableString string];
while ([scanner scanUpToString:@"{" intoString:&foundString]) {
    [formatedResponse appendString:[foundString stringByReplacingOccurrencesOfString:@"{" withString:@""]]; //Formatted string contains everything up to the {
    [scanner scanUpToString:@"}" intoString:&foundString];
    NSString *key = [foundString stringByReplacingOccurrencesOfString:@"}" withString:@""];
    [formatedResponse appendString:[data objectForKey:key]];
}
NSRange range = [format rangeOfString:@"}" options:NSBackwardsSearch];
if (range.location != NSNotFound) {
    [formatedResponse appendString:[format substringFromIndex:range.location + 1]];
}
The problem with this is that when my string starts with "{", then the scanner returns NO, instead of YES. (Which is what the documentation says should happen). So am I misusing NSScanner? The fact that scanUpToString doesn't include the string that was being searched for as part of its output seems to make it almost useless...
Can this be easily changed to do what I want, or do I need to re-write using a mutable string and searching for the characters manually?
Use isAtEnd to determine when to stop. Also, the { and } are not included in the result of scanUpToString:, so they will be at the beginning of the next string and have to be scanned past explicitly. The append after the loop is not necessary, since the scanner returns the scanned content even if the search string is not found.
// Prevent the scanner from ignoring whitespace between formats. Without this,
// "{a} {b}", "{a}{b}" and "{a}\n{b}" are all equivalent.
[scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@""]];
while (![scanner isAtEnd]) {
    if ([scanner scanUpToString:@"{" intoString:&foundString]) {
        [formattedResponse appendString:foundString];
    }
    if (![scanner isAtEnd]) {
        [scanner scanString:@"{" intoString:nil];
        foundString = @""; // scanUpToString doesn't modify foundString if no characters are scanned
        [scanner scanUpToString:@"}" intoString:&foundString];
        [formattedResponse appendString:[data objectForKey:foundString]];
        [scanner scanString:@"}" intoString:nil];
    }
}

scanUpToCharactersFromSet stops after one loop

I'm trying to get the contents of a CSV file into an array. When I've done this before I had one record per line, and used the newline character with scanUpToCharactersFromSet:intoString:, passing newlineCharacterSet as the character set:
while ([lineScanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet]
intoString:&line])
Now, I'm working with a file where many of the entries themselves contain newline characters. I've tried adding a unique character to the end of each record (a * character) but my loop only runs once. Is there something which is making the while loop break that I don't know about? Here's the code I'm using now:
NSError *error;
NSString *data = [[NSString alloc] initWithContentsOfFile:[[self delegate] filePath] encoding:NSUTF8StringEncoding error:&error];
NSScanner *lineScanner = [NSScanner scannerWithString:data];
NSString *line = nil;
// Start parsing the CSV file
while ([lineScanner scanUpToCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@"*"]
                                   intoString:&line]) {
    NSArray *elements = [line componentsSeparatedByString:@","];
    NSLog(@"Name: %@", [elements objectAtIndex:1]);
}
Edit: Thanks to Peter's answer below, I found that my scanner was stuck behind the * character. I added this line in the loop:
[lineScanner scanCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@"*"] intoString:NULL];
and now it's working like it should.
Let's go through one pass at a time:
First:
while ([lineScanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&line]) {
The scanner puts everything before the line break into line. It advances up to, but not past, the newline.
Second:
while ([lineScanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&line]) {
The scanner is already on a line break, so it scans no characters. As documented, since it scanned no characters, it returns NO. Your loop terminates.
The solution is to scan the line break at the end of the loop, to get the scanner past it. You can pass NULL for the output parameter, assuming you don't care what the line break was.
This is correct behavior: If you did/do care what the characters you scanned up to were, this lets you obtain them. That would be more difficult if NSScanner scanned past the characters automatically.
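Applied to the "*" delimiter from the question, the pattern of scanning up to the delimiter and then stepping over it might look like the sketch below. RecordsSeparatedByAsterisk is a hypothetical helper name and ARC is assumed:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the records of a string in which every
// record ends with "*". Records may contain newlines.
static NSArray<NSString *> *RecordsSeparatedByAsterisk(NSString *data) {
    NSCharacterSet *delimiter = [NSCharacterSet characterSetWithCharactersInString:@"*"];
    NSScanner *scanner = [NSScanner scannerWithString:data];
    NSMutableArray<NSString *> *records = [NSMutableArray array];

    while (![scanner isAtEnd]) {
        NSString *record = nil;
        // Scan up to, but not past, the next "*".
        if ([scanner scanUpToCharactersFromSet:delimiter intoString:&record]) {
            [records addObject:record];
        }
        // Step over the delimiter so the next pass has something to scan.
        [scanner scanCharactersFromSet:delimiter intoString:NULL];
    }
    return records;
}
```

Driving the loop with isAtEnd rather than with the return value of scanUpToCharactersFromSet:intoString: means an empty record between two delimiters does not end the loop early.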
I think the while condition is wrong. According to the String Programming Guide, it should be something like:
while ([lineScanner isAtEnd] == NO) {
    [lineScanner scanUpToCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@"*"] intoString:&line];
    // ...
}