Get certain values in a NSString - objective-c

I have this NSString below. I want to be able to get the name of each application and the location of it. Can someone help me or point me in the right direction?
Thanks in advance.
Applications:
Xcode:
Version: 4.5.2
Last Modified: 11/3/12 11:45 PM
Kind: Intel
64-Bit (Intel): Yes
App Store: Yes
Location: /Applications/Xcode.app
Terminal:
Version: 2.3
Last Modified: 6/21/12 12:01 AM
Kind: Intel
64-Bit (Intel): Yes
App Store: No
Location: /Applications/Utilities/Terminal.app
Google Chrome:
Version: 23.0.1271.64
Last Modified: 10/31/12 7:59 PM
Kind: Intel
64-Bit (Intel): No
App Store: No
Location: /Applications/Google Chrome.app
App Store:
Version: 1.2.1
Last Modified: 4/18/12 7:52 PM
Kind: Intel
64-Bit (Intel): Yes
App Store: No
Location: /Applications/App Store.app

Create an instance of NSScanner using your string. Since NSScanner skips whitespace and newlines by default, you need to disable this behavior before proceeding, like this:
[scanner setCharactersToBeSkipped:[[[NSCharacterSet alloc] init] autorelease]];
From here, you'd do something like this:
NSString *appName = nil;
[scanner scanString:@"Applications:" intoString:NULL];
NSCharacterSet *charset = [NSCharacterSet newlineCharacterSet];
[scanner scanCharactersFromSet:charset intoString:NULL];
while ([scanner isAtEnd] == NO)
{
    // Get your application name.
    [scanner scanUpToCharactersFromSet:charset intoString:&appName];
    [scanner scanCharactersFromSet:charset intoString:NULL];
    // You could do something with your application name here.
    // Skip over the other stuff.
    for (NSUInteger idx = 0; idx < 6; idx++)
    {
        [scanner scanUpToString:@":" intoString:NULL];
        [scanner scanString:@":" intoString:NULL];
        [scanner scanUpToCharactersFromSet:charset intoString:NULL];
        [scanner scanCharactersFromSet:charset intoString:NULL];
    }
}
Note that, despite appearances, I didn't give you a complete solution. Beyond checking for the end of the string, there's no error checking, and it goes without saying that Real Applications check for errors. (There should even have been a check for end-of-string at the very outset.) Also, this snippet relies on a rigid presentation of the data, such as you have provided.

I would use componentsSeparatedByString: if there are easy delimiters. If not, try a regex with NSRegularExpression to find the values.
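As a rough illustration of the componentsSeparatedByString: route, here is a minimal sketch. It assumes the report looks like the sample in the question: any line ending in ":" (other than "Applications:") starts a new application entry, and each entry has a "Location: " line. The function name ApplicationLocations is made up for this example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: maps each application name to its "Location:" path.
// Assumes a line ending in ":" (other than "Applications:") starts a new entry.
static NSDictionary<NSString *, NSString *> *ApplicationLocations(NSString *report)
{
    NSMutableDictionary<NSString *, NSString *> *result = [NSMutableDictionary dictionary];
    NSCharacterSet *blanks = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *currentName = nil;

    for (NSString *rawLine in [report componentsSeparatedByString:@"\n"]) {
        // Trim indentation and any stray carriage return.
        NSString *line = [rawLine stringByTrimmingCharactersInSet:blanks];
        if ([line length] == 0 || [line isEqualToString:@"Applications:"]) {
            continue;
        }
        if ([line hasSuffix:@":"]) {
            // "Xcode:" -> "Xcode"
            currentName = [line substringToIndex:[line length] - 1];
        } else if ([line hasPrefix:@"Location: "] && currentName != nil) {
            result[currentName] = [line substringFromIndex:[@"Location: " length]];
        }
    }
    return result;
}
```

Calling ApplicationLocations(yourString) should give a dictionary keyed by application name. Note that "App Store: Yes" is not mistaken for a header because it does not end in a colon.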

There are numerous ways to solve this. NSRegularExpression has been mentioned. NSScanner is another. Here's a potential solution (no guarantees about correctness - just a guide.)
#import <Foundation/Foundation.h>
int main(int argc, const char * argv[])
{
    @autoreleasepool {
        NSString *source = @"Applications:\n\nXcode:\n\n Version: 4.5.\n Last Modified: 11/3/12 11:45 PM\n Kind: Intel\n 64-Bit (Intel): Yes\n App Store: Yes\n Location: /Applications/Xcode.app";
        NSScanner *scanner = [NSScanner scannerWithString:source];
        [scanner scanUpToString:@":" intoString:NULL];
        [scanner scanString:@":" intoString:NULL];
        while( ![scanner isAtEnd] ) {
            NSString *appName = nil;
            NSString *appPath = nil;
            [scanner scanUpToCharactersFromSet:[NSCharacterSet alphanumericCharacterSet] intoString:NULL];
            [scanner scanCharactersFromSet:[NSCharacterSet alphanumericCharacterSet] intoString:&appName];
            NSString *junk = nil;
            [scanner scanUpToString:@"Location: " intoString:&junk];
            [scanner scanString:@"Location: " intoString:NULL];
            [scanner scanUpToString:@"\n" intoString:&appPath];
            [scanner scanString:@"\n" intoString:NULL];
            printf("app name = %s, path = %s\n",[appName UTF8String],[appPath UTF8String]);
        }
    }
    return 0;
}

Related

How can I modify this SRT file parser?

I found some good code for parsing .srt files on stackoverflow (Parsing SRT file with Objective C) shown below:
NSScanner *scanner = [NSScanner scannerWithString:[theTextView string]];
while (![scanner isAtEnd])
{
    @autoreleasepool
    {
        NSString *indexString;
        (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&indexString];
        NSString *startString;
        (void) [scanner scanUpToString:@" --> " intoString:&startString];
        (void) [scanner scanString:@"-->" intoString:NULL];
        NSString *endString;
        (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&endString];
        NSString *textString;
        (void) [scanner scanUpToString:@"\r\n\r\n" intoString:&textString];
        textString = [textString stringByReplacingOccurrencesOfString:@"\r\n" withString:@" "];
        textString = [textString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
        NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:
                                    indexString, @"index",
                                    startString, @"start",
                                    endString , @"end",
                                    textString , @"text",
                                    nil];
        NSLog(@"%@", dictionary);
    }
}
I have a number of .srt files from a TV series that contain a lot of ‘credit’ subs which kinda spoil the experience, so I removed them, leaving me with non-sequential indexes like this:
// deleted subtitles
3
00:00:11,070 --> 00:00:14,466
Screenwriter: Name here...
4
00:00:14,633 --> 00:00:17,466
Music: Name here...
5
00:00:17,686 --> 00:00:20,680
Narrator: Name here...
// deleted subtitle
7
00:01:17,966 --> 00:01:21,966
Episode 12
which chokes FCPX when I try to import the file. I’m completely new to NSScanner and tried everything I can think of without success. I'd appreciate any help in modifying the above just to skip the sub index line altogether (if possible?). I'm okay with adding them back in sequentially with separate code. Thanks!
UPDATE:
Thanks for your suggestion of indexing through the 'while' loop skaak, but the problem still seems to defy logic as it never increases beyond the very first pass (!!). The logs are shown below - firstly using an NSDictionary and then appending to an NSMutableString (probably more useful for my purposes). Note that in both cases the first sub does get changed to 1, but indices 4,5,7 remain unchanged rather than being renumbered 2,3,4.
2020-07-29 18:35:26.267 SRT Editor[12494:903]
{
end = "00:00:14,466";
index = 1;
start = "00:00:11,070";
text = "Screenwriter: Hashida Sugako\n\n4 00:00:14,633 --> 00:00:17,466 Music: Sakada Koichi\n\n5 00:00:17,686 --> 00:00:20,680 Narrator: Naraoka Tomoko\n\n7 00:01:28,633 --> 00:01:34,233 It was early spring in 1958...
}
2020-07-29 18:51:15.612 SRT Editor[12646:903]
1
00:00:11,07000:00:14,466Screenwriter: Hashida Sugako
4 00:00:14,633 --> 00:00:17,466 Music: Sakada Koichi
5 00:00:17,686 --> 00:00:20,680 Narrator: Naraoka Tomoko
7 00:01:28,633 --> 00:01:34,233 It was early spring in 1958...
Another puzzling observation is that if I put in a loopCounter++ it also suggests the 'while' loop only makes one pass through which baffles me, though I did mention being unfamiliar with NSScanner.
Try this
NSUInteger index = 1;
NSScanner *scanner = [NSScanner scannerWithString:[theTextView string]];
while (![scanner isAtEnd])
{
    @autoreleasepool
    {
        NSString *indexString;
        (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&indexString];
        NSString *startString;
        (void) [scanner scanUpToString:@" --> " intoString:&startString];
        (void) [scanner scanString:@"-->" intoString:NULL];
        NSString *endString;
        (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&endString];
        NSString *textString;
        (void) [scanner scanUpToString:@"\r\n\r\n" intoString:&textString];
        textString = [textString stringByReplacingOccurrencesOfString:@"\r\n" withString:@" "];
        textString = [textString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
        NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:
                                    // Use my own incremental index
                                    @( index ), @"index",
                                    startString, @"start",
                                    endString , @"end",
                                    textString , @"text",
                                    nil];
        NSLog(@"%@", dictionary);
        // Move to next index
        index ++;
    }
}
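One possible explanation for the single pass, inferred only from the log in the update: the text there shows "\n\n" between subtitles, so a scan up to "\r\n\r\n" would never find its stop string and would swallow the rest of the file in one go. The sketch below sidesteps the scanner and renumbers by splitting on blank lines after normalizing line endings. RenumberedSRT is a made-up name, and the code assumes each block is an index line followed by a timing line and text, with blocks separated by blank lines.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the SRT text with subtitle indices renumbered from 1.
static NSString *RenumberedSRT(NSString *srt)
{
    // Normalize Windows (\r\n) and bare \r line endings to \n.
    NSString *text = [srt stringByReplacingOccurrencesOfString:@"\r\n" withString:@"\n"];
    text = [text stringByReplacingOccurrencesOfString:@"\r" withString:@"\n"];

    NSMutableArray<NSString *> *blocks = [NSMutableArray array];
    NSUInteger index = 1;

    // Assumption: subtitle blocks are separated by blank lines.
    for (NSString *rawBlock in [text componentsSeparatedByString:@"\n\n"]) {
        NSString *block = [rawBlock stringByTrimmingCharactersInSet:
                              [NSCharacterSet whitespaceAndNewlineCharacterSet]];
        NSRange firstNewline = [block rangeOfString:@"\n"];
        if (firstNewline.location == NSNotFound) {
            continue; // empty or malformed block: nothing follows the index line
        }
        // Drop the original index line; keep the timing and text lines.
        NSString *rest = [block substringFromIndex:NSMaxRange(firstNewline)];
        [blocks addObject:[NSString stringWithFormat:@"%lu\n%@", (unsigned long)index, rest]];
        index++;
    }
    return [blocks componentsJoinedByString:@"\n\n"];
}
```

Something like RenumberedSRT([theTextView string]) would then give text with indices 1, 2, 3 and so on. Subtitle text that itself contains a blank line is not handled by this sketch.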

Remove contents between script and style tags in Objective-C

Alright, so I am working on a web crawler that can take webpages and convert them into passages of text. To remove the tags themselves, I found this on Stack Overflow:
- (NSString *) stripTags:(NSString *)str
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];
    NSString *s = nil;
    while (![scanner isAtEnd])
    {
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        s = nil;
    }
    return ms;
}
And it works, however, it only removes the tags, not the contents between script and style tags (as obviously I don't want the contents between all tags to be removed as that would result in an empty string).
Is there any way I can have specifically the script and style tags truncated?
Thanks a lot in advance.
EDIT:
I have tried changing my code to:
- (NSString *) stripTags:(NSString *)str
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];
    NSString *s = nil;
    while (![scanner isAtEnd])
    {
        [scanner scanUpToString:@"<script" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@"script>" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        s = nil;
    }
    return ms;
}
but the scripts and CSS are still being included.
You can edit the scanner code so that it checks each tag. If the tag is one you want to remove, scan to the closing tag and discard the string. If not, store or append the string.
Read up to the tag start (<), then read the tag so you can check what it is. Then read to the tag close and either drop it or save it.
Start with something like (typed inline and not tested in any way):
while (![scanner isAtEnd])
{
    [scanner scanUpToString:@"<" intoString:&s];
    if (s != nil)
        [ms appendString:s];
    [scanner scanUpToString:@">" intoString:&t];
    if ([t isEqualToString:@"tagToIgnore"]) {
        [scanner scanUpToString:@"<" intoString:NULL];
        [scanner setScanLocation:[scanner scanLocation]-1];
        s = nil;
        t = nil;
        continue;
    }
    if (![scanner isAtEnd])
        [scanner setScanLocation:[scanner scanLocation]+1];
    s = nil;
    t = nil;
}
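To make the idea above a bit more concrete, here is one way it might look as a standalone function. This is only a sketch: StripTagsAndScripts is a made-up name, the tag name is taken to be everything between "<" and the first whitespace, and the contents of script and style elements are skipped by scanning to the matching "</script" or "</style". It does not try to handle comments, CDATA, or malformed HTML.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: strips tags and also drops everything between
// <script>...</script> and <style>...</style>.
static NSString *StripTagsAndScripts(NSString *html)
{
    NSMutableString *output = [NSMutableString stringWithCapacity:[html length]];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    [scanner setCharactersToBeSkipped:nil];
    [scanner setCaseSensitive:NO]; // so </SCRIPT> also matches

    while (![scanner isAtEnd]) {
        NSString *text = nil;
        if ([scanner scanUpToString:@"<" intoString:&text]) {
            [output appendString:text]; // plain text before the next tag
        }
        if (![scanner scanString:@"<" intoString:NULL]) {
            break; // no more tags
        }
        NSString *tag = nil;
        [scanner scanUpToString:@">" intoString:&tag];
        [scanner scanString:@">" intoString:NULL];

        // Tag name = text up to the first whitespace, e.g. "script" in <script type="...">.
        NSString *tagName = [[[tag lowercaseString] componentsSeparatedByCharactersInSet:
                                 [NSCharacterSet whitespaceAndNewlineCharacterSet]] firstObject];
        if (tagName != nil && [@[@"script", @"style"] containsObject:tagName]) {
            // Discard the element's contents; the closing tag is consumed on the next pass.
            [scanner scanUpToString:[@"</" stringByAppendingString:tagName] intoString:NULL];
        }
    }
    return output;
}
```

Because the contents are skipped as a whole, a "<" or ">" inside a script (for example in a comparison) does not confuse the tag scanning.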

Parsing Class writes last item twice

I am helpless. I parse this text...
<parse>HELLO</parse>
<parse>World</parse>
<parse>digit</parse>
<parse>wow</parse>
<parse>hellonewitem</parse>
<parse>lastitem</parse>
with an instance of NSScanner:
-(NSMutableArray *)parseTest
{
    if (parserTest != NULL)
    {
        NSScanner *scanner = [[NSScanner alloc] initWithString:parserTest];
        NSString *test;
        NSMutableArray *someArray = [NSMutableArray array];
        while ([scanner isAtEnd]!=YES)
        {
            [scanner scanUpToString:@"<parse>" intoString:nil];
            [scanner scanString:@"<parse>" intoString:nil];
            [scanner scanUpToString:@"</parse>" intoString:&test];
            [scanner scanString:@"</parse>" intoString:nil];
            [someArray addObject:test];
            NSLog(@"%@",test);
        }
        return someArray;
    }
    return nil;
}
Can't get my head around why I am getting the last object twice here in the returned array. What am I missing? Is there something wrong with the:
[scanner isAtEnd]!=YES?
Thanks for any help!
Matthias
Check the count of someArray:
NSLog(@"%lu", (unsigned long)[someArray count]);
If it is 6, the problem is in how you are printing the values.
If it is 7, the extra item is being added during parsing, and that needs to be tracked down.
Hopefully the first case is true.
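If the count does turn out to be 7, one plausible cause (an assumption, since the exact contents of parserTest are not shown) is that something other than whitespace follows the last </parse>. In that case isAtEnd is still NO, none of the scans succeed on the extra pass, test keeps its previous value, and it gets added a second time. A minimal sketch that checks the scanner's return values instead; ParsedItems is a made-up name:

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the text between <parse> and </parse> pairs.
static NSArray<NSString *> *ParsedItems(NSString *source)
{
    NSMutableArray<NSString *> *items = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:source];

    while (![scanner isAtEnd]) {
        NSString *item = nil; // reset on every pass so a stale value cannot be reused
        [scanner scanUpToString:@"<parse>" intoString:NULL];
        if (![scanner scanString:@"<parse>" intoString:NULL]) {
            break; // no further opening tag: stop instead of re-adding the last item
        }
        // Only add the item when both the content and the closing tag were found.
        if ([scanner scanUpToString:@"</parse>" intoString:&item] &&
            [scanner scanString:@"</parse>" intoString:NULL]) {
            [items addObject:item];
        }
    }
    return items;
}
```

The scan methods return YES only when they actually consumed something, so using those results avoids acting on a variable that was never updated.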

Parsing an arbitrary textual data format

I am trying to parse a .rtf file that was created by someone else, and I don't have control over the file contents or format. There are several blocks to the file and each block has a set of information that I need to get. Each block is set up like this:
[Title]
[Type] ([sub type])
Level: [CSV list of levels]
Components: [CSV list of components]
Time: [proprietary time format]
Length: [length value]
Target: [target text]
Dwell: [dwell time in proprietary time format]
Saves: [yes/no]
Additional Information: [additional information]
[notes]
There may be from 50 to 100 blocks like the one above in each file. I have used the NSRegularExpression class to do some other parsing in my app, but I can't even think about how to accomplish this.
As far as I can tell, each block is separated by a double line.
Try using an NSScanner, like this:
NSString *input =
    @"[Title]\n"
    @"[Type] ([sub type])\n"
    @"Level: [CSV list of levels]\n"
    @"Components: [CSV list of components]\n"
    @"Time: [proprietary time format]\n"
    @"Length: [length value]\n"
    @"Target: [target text]\n"
    @"Dwell: [dwell time in proprietary time format]\n"
    @"Saves: [yes/no]\n"
    @"Additional Information: [additional information]\n"
    @"[notes]\n";
NSString *title, *type, *subType, *level, *components, *time, *length, *target, *dwell, *saves, *additional, *notes;
title = type = subType = level = components = time = length = target = dwell = saves = additional = notes = nil;
NSScanner *scanner = [NSScanner scannerWithString:input];
// read the first line into title...
[scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&title];
[scanner scanCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:nil];
// read the first part of the second line into type
[scanner scanUpToString:@" (" intoString:&type];
[scanner scanString:@"(" intoString:nil];
// read the next part of the second line into subType
[scanner scanUpToString:@")" intoString:&subType];
// read the end of the line
[scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:nil];
[scanner scanCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:nil];
// read in level
[scanner scanString:@"Level: " intoString:nil];
[scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&level];
[scanner scanCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:nil];
// read in components:
[scanner scanString:@"Components: " intoString:nil];
[scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&components];
[scanner scanCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:nil];
// read in time:
[scanner scanString:@"Time: " intoString:nil];
[scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&time];
[scanner scanCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:nil];
// read in length
[scanner scanString:@"Length: " intoString:nil];
[scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&length];
[scanner scanCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:nil];
// complete for all other metadata
NSLog(@"%@", title);
NSLog(@"%@ (%@)", type, subType);
NSLog(@"%@", level);
NSLog(@"%@", components);
NSLog(@"%@", time);
NSLog(@"%@", length);
NSLog(@"%@", target);
NSLog(@"%@", dwell);
NSLog(@"%@", saves);
NSLog(@"%@", additional);
NSLog(@"%@", notes);
This works for me; complete the process in the same way for all the other fields.
For the time being, I can just convert the .rtf files to regular text files. This allows me to process them more easily.
Thanks for the help! I am going to look into using the NSScanner to do this more elegantly.

NSScanner Remove Substring Quotation from NSString

I have NSStrings of the form Johnny likes "eating" apples. I want to remove the quoted parts from my strings so that
Johnny likes "eating" apples
becomes
Johnny likes apples
I've been playing with NSScanner to do the trick but I'm getting some crashes.
- (NSString*)clean:(NSString*) _string
{
    NSString *string = nil;
    NSScanner *scanner = [NSScanner scannerWithString:_string];
    while ([scanner isAtEnd] == NO)
    {
        [scanner scanUpToString:@"\"" intoString:&string];
        [scanner scanUpToString:@"\"" intoString:nil];
        [scanner scanUpToString:@"." intoString:&string]; // picked . because it's not in the string, really just want rest of string scanned
    }
    return string;
}
This code is hacky, but seems to produce the output you want.
It was not tested with unexpected inputs (string not in the form described, nil string...), but should get you started.
- (NSString *)stringByStrippingQuottedSubstring:(NSString *) stringToClean
{
    NSString *strippedString,
             *strippedString2;
    NSScanner *scanner = [NSScanner scannerWithString:stringToClean];
    [scanner scanUpToString:@"\"" intoString:&strippedString]; // Getting first part of the string, up to the first quote
    [scanner scanUpToString:@"\" " intoString:NULL]; // Scanning without caring about the quoted part of the string, up to the second quote
    strippedString2 = [[scanner string] substringFromIndex:[scanner scanLocation]]; // Getting remainder of the string
    // Having to trim the second part of the string
    // (Cf. doc: "If stopString is present in the receiver, then on return the scan location is set to the beginning of that string.")
    strippedString2 = [strippedString2 stringByTrimmingCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"\" "]];
    return [strippedString stringByAppendingString:strippedString2];
}
I will come back (much) later to clean it up, and to dig into the NSScanner documentation to figure out what I am missing and why I had to trim the string manually.
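For comparison, here is a sketch of a loop-based variant that avoids the manual trimming by turning off the scanner's whitespace skipping and consuming the quote characters explicitly. It is an illustration rather than a tested replacement: StringByRemovingQuotedText is a made-up name, and the space handling assumes a quoted word sits between two single spaces, as in the example.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: removes every "quoted" run and the extra space it leaves behind.
static NSString *StringByRemovingQuotedText(NSString *input)
{
    NSMutableString *result = [NSMutableString string];
    NSScanner *scanner = [NSScanner scannerWithString:input];
    [scanner setCharactersToBeSkipped:nil]; // keep spaces; they are handled below

    while (![scanner isAtEnd]) {
        NSString *chunk = nil;
        if ([scanner scanUpToString:@"\"" intoString:&chunk]) {
            [result appendString:chunk]; // text outside quotes
        }
        if ([scanner scanString:@"\"" intoString:NULL]) {   // opening quote
            [scanner scanUpToString:@"\"" intoString:NULL]; // quoted text, discarded
            [scanner scanString:@"\"" intoString:NULL];     // closing quote, if present
            // Avoid a doubled space: if the kept text already ends in a space,
            // drop the one that follows the closing quote.
            if ([result hasSuffix:@" "]) {
                [scanner scanString:@" " intoString:NULL];
            }
        }
    }
    return result;
}
```

The point from the quoted documentation still applies: scanUpToString: stops at the start of the stop string, so each quote has to be consumed with scanString: before scanning on.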