Objective-C: Get Substring between Double Quotes

What would be the best way to get every substring between double quotes and make it into an array?
For example, if the string (NSString) is:
@"abcd \"efgh\" ijklm \"no\" p \"qrst\" uvwx \"y\" z"
I want the result to be:
{@"efgh", @"no", @"qrst", @"y"}
as an NSArray.

This should get you started:
NSString *str = @"abcd \"efgh\" ijklm \"no\" p \"qrst\" uvwx \"y\" z";
NSMutableArray *target = [NSMutableArray array];
NSScanner *scanner = [NSScanner scannerWithString:str];
NSString *tmp = nil;
while ([scanner isAtEnd] == NO)
{
    // Skip to the opening quote and step past it.
    [scanner scanUpToString:@"\"" intoString:NULL];
    [scanner scanString:@"\"" intoString:NULL];
    // Read everything up to the closing quote.
    [scanner scanUpToString:@"\"" intoString:&tmp];
    // Keep it only if the scan stopped at a quote rather than at the end.
    if ([scanner isAtEnd] == NO)
        [target addObject:tmp];
    // Step past the closing quote.
    [scanner scanString:@"\"" intoString:NULL];
}
for (NSString *item in target)
{
    NSLog(@"%@", item);
}
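Two things to be aware of with the loop above: NSScanner skips whitespace and newlines by default, so leading spaces inside the quotes are dropped, and when a scan fails the output variable keeps its previous value. A minimal variant that addresses both, assuming you want to keep the whitespace and get an empty string for a "" pair (QuotedSubstringsByScanning is a hypothetical name):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: same scanning idea as above, packaged as a function.
// Clearing charactersToBeSkipped keeps spaces inside the quotes intact,
// and checking each scan's return value avoids reusing a stale string.
NSArray *QuotedSubstringsByScanning(NSString *str)
{
    NSMutableArray *target = [NSMutableArray array];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];
    while (![scanner isAtEnd])
    {
        NSString *tmp = nil;
        // Skip to and past the opening quote; stop if there is none.
        [scanner scanUpToString:@"\"" intoString:NULL];
        if (![scanner scanString:@"\"" intoString:NULL])
            break;
        // Read the quoted text; an empty pair "" yields @"".
        if (![scanner scanUpToString:@"\"" intoString:&tmp])
            tmp = @"";
        // Keep it only if the closing quote is really there.
        if ([scanner scanString:@"\"" intoString:NULL])
            [target addObject:tmp];
    }
    return target;
}
```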

One way would be to use componentsSeparatedByString: to split the string on the " character. If the quotes are balanced, this gives an odd number of components, and the quoted substrings are the ones at odd indices (1, 3, 5, ...). Collect those into an array and you have the result you want.
Alternatively, look at NSPredicate.
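The componentsSeparatedByString: approach can be sketched like this. QuotedSubstrings is a hypothetical helper name, and the loop stops before the final component so that an unmatched trailing quote is not mistaken for a quoted substring.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: split on the double-quote character and keep the
// components at odd indices, which are the ones between a pair of quotes.
NSArray *QuotedSubstrings(NSString *input)
{
    NSArray *parts = [input componentsSeparatedByString:@"\""];
    NSMutableArray *result = [NSMutableArray array];
    // Index 0 is the text before the first quote. Requiring i + 1 < count
    // means a final unmatched quote does not produce a bogus entry.
    for (NSUInteger i = 1; i + 1 < [parts count]; i += 2)
    {
        [result addObject:[parts objectAtIndex:i]];
    }
    return result;
}
```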

Related

How can I modify this SRT file parser?

I found some good code for parsing .srt files on Stack Overflow (Parsing SRT file with Objective C), shown below:
NSScanner *scanner = [NSScanner scannerWithString:[theTextView string]];
while (![scanner isAtEnd])
{
    @autoreleasepool
    {
        NSString *indexString;
        (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&indexString];
        NSString *startString;
        (void) [scanner scanUpToString:@" --> " intoString:&startString];
        (void) [scanner scanString:@"-->" intoString:NULL];
        NSString *endString;
        (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&endString];
        NSString *textString;
        (void) [scanner scanUpToString:@"\r\n\r\n" intoString:&textString];
        textString = [textString stringByReplacingOccurrencesOfString:@"\r\n" withString:@" "];
        textString = [textString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
        NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:
                                    indexString, @"index",
                                    startString, @"start",
                                    endString, @"end",
                                    textString, @"text",
                                    nil];
        NSLog(@"%@", dictionary);
    }
}
I have a number of .srt files from a TV series that contain a lot of 'credit' subs, which kinda spoil the experience, so I removed them, leaving me with non-sequential indexes like this:
// deleted subtitles
3
00:00:11,070 --> 00:00:14,466
Screenwriter: Name here...
4
00:00:14,633 --> 00:00:17,466
Music: Name here...
5
00:00:17,686 --> 00:00:20,680
Narrator: Name here...
// deleted subtitle
7
00:01:17,966 --> 00:01:21,966
Episode 12
which chokes FCPX when I try to import the file. I'm completely new to NSScanner and have tried everything I can think of without success. I'd appreciate any help modifying the above code just to skip the subtitle index line altogether (if possible). I'm okay with adding the indexes back in sequentially with separate code. Thanks!
UPDATE:
Thanks for your suggestion of incrementing an index inside the 'while' loop, skaak, but the problem still seems to defy logic, as the index never increases beyond the very first pass (!!). The logs are shown below: first using an NSDictionary, then appending to an NSMutableString (probably more useful for my purposes). Note that in both cases the first subtitle does get changed to 1, but indices 4, 5 and 7 remain unchanged rather than being renumbered 2, 3, 4.
2020-07-29 18:35:26.267 SRT Editor[12494:903]
{
end = "00:00:14,466";
index = 1;
start = "00:00:11,070";
text = "Screenwriter: Hashida Sugako\n\n4 00:00:14,633 --> 00:00:17,466 Music: Sakada Koichi\n\n5 00:00:17,686 --> 00:00:20,680 Narrator: Naraoka Tomoko\n\n7 00:01:28,633 --> 00:01:34,233 It was early spring in 1958...
}
2020-07-29 18:51:15.612 SRT Editor[12646:903]
1
00:00:11,07000:00:14,466Screenwriter: Hashida Sugako
4 00:00:14,633 --> 00:00:17,466 Music: Sakada Koichi
5 00:00:17,686 --> 00:00:20,680 Narrator: Naraoka Tomoko
7 00:01:28,633 --> 00:01:34,233 It was early spring in 1958...
Another puzzling observation: if I add a loopCounter++, it also suggests the 'while' loop only makes one pass, which baffles me, though as I mentioned I'm unfamiliar with NSScanner.
Try this:
NSUInteger index = 1;
NSScanner *scanner = [NSScanner scannerWithString:[theTextView string]];
while (![scanner isAtEnd])
{
    @autoreleasepool
    {
        NSString *indexString;
        (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&indexString];
        NSString *startString;
        (void) [scanner scanUpToString:@" --> " intoString:&startString];
        (void) [scanner scanString:@"-->" intoString:NULL];
        NSString *endString;
        (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&endString];
        NSString *textString;
        (void) [scanner scanUpToString:@"\r\n\r\n" intoString:&textString];
        textString = [textString stringByReplacingOccurrencesOfString:@"\r\n" withString:@" "];
        textString = [textString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
        NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:
                                    // Use my own incremental index
                                    @(index), @"index",
                                    startString, @"start",
                                    endString, @"end",
                                    textString, @"text",
                                    nil];
        NSLog(@"%@", dictionary);
        // Move to next index
        index++;
    }
}
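The log in the update shows the subtitle text containing "\n\n" between cues, which suggests the blank lines in the file are not the "\r\n\r\n" sequence the code scans for. In that case scanUpToString:@"\r\n\r\n" never finds a match and swallows the rest of the file on the first pass, which would explain why the loop (and the index) only advances once. Below is a minimal sketch, assuming each cue is an index line, a timing line and text, separated by one blank line, that normalises line endings first and then renumbers every cue. RenumberSRT is a hypothetical name.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: rewrites every cue's index as 1, 2, 3, ...
// Assumes each cue is "index line, timing line, text lines" and that cues
// are separated by one blank line.
NSString *RenumberSRT(NSString *srt)
{
    // Normalise CRLF and lone CR to LF so the separator is always "\n\n".
    NSString *text = [srt stringByReplacingOccurrencesOfString:@"\r\n" withString:@"\n"];
    text = [text stringByReplacingOccurrencesOfString:@"\r" withString:@"\n"];

    NSMutableString *output = [NSMutableString string];
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSUInteger index = 1;
    while (![scanner isAtEnd])
    {
        NSString *cue = nil;
        // Read and discard the original index line (blank lines before it are
        // skipped by the scanner's default charactersToBeSkipped).
        if (![scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:NULL])
            break;
        // Read the timing line and subtitle text up to the next blank line.
        if (![scanner scanUpToString:@"\n\n" intoString:&cue])
            break;
        cue = [cue stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
        [output appendFormat:@"%lu\n%@\n\n", (unsigned long)index, cue];
        index++;
    }
    return output;
}
```

Because the original index line is simply read and thrown away, gaps such as 3, 4, 5, 7 come out as 1, 2, 3, 4.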

Remove contents between script and style tags in Objective-C

Alright, so I am working on a web crawler that can take webpages and convert them into passages of text. To remove the tags themselves, I found this on Stack Overflow:
- (NSString *) stripTags:(NSString *)str
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];
    NSString *s = nil;
    while (![scanner isAtEnd])
    {
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        s = nil;
    }
    return ms;
}
It works; however, it only removes the tags, not the contents between script and style tags (obviously I don't want the contents between all tags to be removed, as that would result in an empty string).
Is there any way I can have specifically the script and style elements, including their contents, removed?
Thanks a lot in advance.
EDIT:
I have tried changing my code to:
- (NSString *) stripTags:(NSString *)str
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[str length]];
    NSScanner *scanner = [NSScanner scannerWithString:str];
    [scanner setCharactersToBeSkipped:nil];
    NSString *s = nil;
    while (![scanner isAtEnd])
    {
        [scanner scanUpToString:@"<script" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@"script>" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        [scanner scanUpToString:@"<" intoString:&s];
        if (s != nil)
            [ms appendString:s];
        [scanner scanUpToString:@">" intoString:NULL];
        if (![scanner isAtEnd])
            [scanner setScanLocation:[scanner scanLocation]+1];
        s = nil;
    }
    return ms;
}
but the scripts and CSS are still being included.
You can edit the scanner code so that it checks the tags. If the tag is one you want to remove, scan to the closing tag and just discard the string. If not, store/append the string as before.
Read up to the tag start (<), then read the tag so you can check what it is. Then read to the tag close and either drop it or save it.
Start with something like (typed inline and not tested in any way):
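NSString *s = nil;
NSString *t = nil;
while (![scanner isAtEnd])
{
    // Copy the text before the next tag.
    [scanner scanUpToString:@"<" intoString:&s];
    if (s != nil)
        [ms appendString:s];
    // Read the opening tag (including its leading '<') into t.
    [scanner scanUpToString:@">" intoString:&t];
    if ([t hasPrefix:@"<tagToIgnore"]) {
        // Skip everything up to and including the matching closing tag.
        [scanner scanUpToString:@"</tagToIgnore>" intoString:NULL];
        [scanner scanString:@"</tagToIgnore>" intoString:NULL];
        s = nil;
        t = nil;
        continue;
    }
    // Step past the '>' of an ordinary tag.
    if (![scanner isAtEnd])
        [scanner setScanLocation:[scanner scanLocation]+1];
    s = nil;
    t = nil;
}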
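For reference, a complete version of that idea, assuming you want to drop both <script> and <style> elements; StripTagsAndScripts is a hypothetical name. It is still only a sketch: it does not handle HTML comments, CDATA sections, or a > inside an attribute value.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: strips every tag and also discards everything between
// <script ...> and </script>, and between <style ...> and </style>.
// Not handled: HTML comments, CDATA sections, '>' inside attribute values.
NSString *StripTagsAndScripts(NSString *html)
{
    NSMutableString *ms = [NSMutableString stringWithCapacity:[html length]];
    NSScanner *scanner = [NSScanner scannerWithString:html];
    [scanner setCharactersToBeSkipped:nil];
    [scanner setCaseSensitive:NO]; // set explicitly so </SCRIPT> also matches
    while (![scanner isAtEnd])
    {
        NSString *text = nil;
        NSString *tag = nil;
        // Copy the text that precedes the next tag.
        if ([scanner scanUpToString:@"<" intoString:&text])
            [ms appendString:text];
        if ([scanner isAtEnd])
            break;
        // Read the tag (e.g. "<script type=...") and step past its '>'.
        [scanner scanUpToString:@">" intoString:&tag];
        [scanner scanString:@">" intoString:NULL];
        NSString *lower = [tag lowercaseString];
        NSString *closing = nil;
        if ([lower hasPrefix:@"<script"])
            closing = @"</script>";
        else if ([lower hasPrefix:@"<style"])
            closing = @"</style>";
        if (closing != nil)
        {
            // Discard the element's contents and its closing tag.
            [scanner scanUpToString:closing intoString:NULL];
            [scanner scanString:closing intoString:NULL];
        }
    }
    return ms;
}
```

Jumping straight to the closing tag also means a stray < inside the script body (as in `1 < 2`) is never mistaken for the start of a tag.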

Obj-C: Create Array From String Where items are in <>

I am trying to parse a string into an array where each item is between <>, for example <this is column 1><this is column 2>, etc.
Help would be much appreciated.
Thanks
Something to demonstrate:
NSString *string = @"<this is column 1><this is column 2>";
NSScanner *scanner = [NSScanner scannerWithString:string];
NSMutableArray *array = [NSMutableArray arrayWithCapacity:0];
NSString *temp;
while ([scanner isAtEnd] == NO)
{
    // Disregard the result of the scanner because it returns NO if the
    // "up to" string is the first one it encounters.
    // You should still have this in case there are other characters
    // between the right and left angle brackets.
    (void) [scanner scanUpToString:@"<" intoString:NULL];
    // Scan the left angle bracket to move the scanner location past it.
    (void) [scanner scanString:@"<" intoString:NULL];
    // Attempt to get the string.
    BOOL success = [scanner scanUpToString:@">" intoString:&temp];
    // Scan the right angle bracket to move the scanner location past it.
    (void) [scanner scanString:@">" intoString:NULL];
    if (success == YES)
    {
        [array addObject:temp];
    }
}
NSLog(@"%@", array);
NSString *input = @"<one><two><three>";
NSString *strippedInput = [input stringByReplacingOccurrencesOfString:@">" withString:@""]; // strips all > from the input string
NSArray *array = [strippedInput componentsSeparatedByString:@"<"];
Note that [array objectAtIndex:0] will be an empty string (""), and of course this doesn't work if one of the "actual" strings contains < or >.
One approach might be to use either componentsSeparatedByCharactersInSet: or componentsSeparatedByString: from NSString.
NSString *test = @"<one> <two> <three>";
NSArray *array1 = [test componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"<>"]];
NSArray *array2 = [test componentsSeparatedByString:@"<"];
You'll need to do some cleaning up afterward: either trimming the trailing "> " from each item (and dropping the leading empty string) in the case of array2, or removing the empty and whitespace-only strings in the case of array1.
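The cleanup for array1 could look like this (ItemsBetweenAngleBrackets is a hypothetical helper). Like the other splitting approaches, it assumes the items themselves contain no < or >, and any non-blank text outside the brackets would also be returned.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits on '<' and '>' and keeps only the
// non-blank pieces, i.e. the cleanup step described for array1.
NSArray *ItemsBetweenAngleBrackets(NSString *input)
{
    NSCharacterSet *brackets = [NSCharacterSet characterSetWithCharactersInString:@"<>"];
    NSCharacterSet *blank = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray *items = [NSMutableArray array];
    for (NSString *piece in [input componentsSeparatedByCharactersInSet:brackets])
    {
        // Skip the empty strings at the ends and the blanks between items.
        if ([[piece stringByTrimmingCharactersInSet:blank] length] > 0)
            [items addObject:piece];
    }
    return items;
}
```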

Parsing Class writes last item twice

I am helpless. I parse this text...
<parse>HELLO</parse>
<parse>World</parse>
<parse>digit</parse>
<parse>wow</parse>
<parse>hellonewitem</parse>
<parse>lastitem</parse>
with an instance of NSScanner:
-(NSMutableArray *)parseTest
{
    if (parserTest != NULL)
    {
        NSScanner *scanner = [[NSScanner alloc] initWithString:parserTest];
        NSString *test;
        NSMutableArray *someArray = [NSMutableArray array];
        while ([scanner isAtEnd]!=YES)
        {
            [scanner scanUpToString:@"<parse>" intoString:nil];
            [scanner scanString:@"<parse>" intoString:nil];
            [scanner scanUpToString:@"</parse>" intoString:&test];
            [scanner scanString:@"</parse>" intoString:nil];
            [someArray addObject:test];
            NSLog(@"%@",test);
        }
        return someArray;
    }
    return nil;
}
Can't get my head around why I am getting the last object twice here in the returned array. What am I missing? Is there something wrong with the:
[scanner isAtEnd]!=YES?
Thanks for any help!
Matthias
Check the count of someArray:
NSLog(@"%lu", (unsigned long)[someArray count]);
If it is 6, then you are doing something wrong in printing the values.
If it is 7, then something is going wrong somewhere in the parsing and needs to be sorted out.
Hopefully the first condition is true.
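One way this symptom can arise: when a scan fails, NSScanner leaves the output variable untouched. isAtEnd ignores the characters to be skipped (whitespace and newlines by default), but if anything else follows the last </parse>, the loop runs once more, scanUpToString:@"</parse>" finds nothing to scan, and test still holds "lastitem", which is then added a second time. A minimal sketch that only adds an item when both tags were actually found (ParseItems is a hypothetical name):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the text of every <parse>...</parse> element,
// adding an item only when both the opening and closing tags were found.
NSArray *ParseItems(NSString *source)
{
    NSScanner *scanner = [NSScanner scannerWithString:source];
    NSMutableArray *items = [NSMutableArray array];
    while (![scanner isAtEnd])
    {
        NSString *item = nil;   // reset each pass so no stale value is reused
        [scanner scanUpToString:@"<parse>" intoString:NULL];
        if (![scanner scanString:@"<parse>" intoString:NULL])
            break;              // no further opening tag: stop
        if ([scanner scanUpToString:@"</parse>" intoString:&item] &&
            [scanner scanString:@"</parse>" intoString:NULL])
            [items addObject:item];
    }
    return items;
}
```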

How to use NSScanner?

I've just read Apple documentation for NSScanner.
I'm trying to get the integer of this string:
@"user logged (3 attempts)"
I can't find any example of how to scan within parentheses. Any ideas?
Here's the code:
NSString *logString = @"user logged (3 attempts)";
NSScanner *aScanner = [NSScanner scannerWithString:logString];
[aScanner scanInteger:anInteger];
NSLog(@"Attempts: %i", anInteger);
Ziltoid's solution works, but it's more code than you need.
I wouldn't bother instantiating an NSScanner for the given situation. NSCharacterSet and NSString give you all you need:
NSString *logString = @"user logged (3 attempts)";
NSString *digits = [logString stringByTrimmingCharactersInSet:
                    [[NSCharacterSet decimalDigitCharacterSet] invertedSet]];
NSLog(@"Attempts: %i", [digits intValue]);
or in Swift:
import Foundation

let logString = "user logged (3 attempts)"
let nonDigits = CharacterSet.decimalDigits.inverted
let digits = logString.trimmingCharacters(in: nonDigits)
NSLog("Attempts: %d", (digits as NSString).intValue)
Here is what I do to get certain values out of a string.
First, I define this method:
- (NSString *)getDataBetweenFromString:(NSString *)data leftString:(NSString *)leftData rightString:(NSString *)rightData leftOffset:(NSInteger)leftPos
{
    NSInteger left, right;
    NSString *foundData;
    NSScanner *scanner = [NSScanner scannerWithString:data];
    // Move to the start of the left delimiter.
    [scanner scanUpToString:leftData intoString:nil];
    left = [scanner scanLocation];
    // Jump over the left delimiter itself.
    [scanner setScanLocation:left + leftPos];
    // Scan forward to the right delimiter.
    [scanner scanUpToString:rightData intoString:nil];
    right = [scanner scanLocation] + 1;
    left += leftPos;
    // Return everything between the two delimiters.
    foundData = [data substringWithRange:NSMakeRange(left, (right - left) - 1)];
    return foundData;
}
Then call it:
foundData = [self getDataBetweenFromString:data leftString:@"user logged (" rightString:@"attempts)" leftOffset:13];
leftOffset is the number of characters in the left string (13 for @"user logged (").
There could be an easier, cleaner way, but that was my solution.
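A variant of that helper that locates both delimiters with rangeOfString:, so no hand-counted offset is needed. StringBetween is a hypothetical name, and it returns nil if either delimiter is missing; passing @" attempts)" as the right delimiter also keeps the space out of the result.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text between the first occurrence of
// leftData and the next occurrence of rightData, or nil if either is missing.
NSString *StringBetween(NSString *data, NSString *leftData, NSString *rightData)
{
    NSRange left = [data rangeOfString:leftData];
    if (left.location == NSNotFound)
        return nil;
    // Start right after the left delimiter.
    NSUInteger start = NSMaxRange(left);
    NSRange searchRange = NSMakeRange(start, [data length] - start);
    NSRange right = [data rangeOfString:rightData options:0 range:searchRange];
    if (right.location == NSNotFound)
        return nil;
    return [data substringWithRange:NSMakeRange(start, right.location - start)];
}
```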
Here is a simple solution using NSScanner (yes, @NSResponder has a really neat solution!):
NSString *logString = @"user logged (3 attempts)";
NSString *numberString;
NSScanner *scanner = [NSScanner scannerWithString:logString];
[scanner scanUpToCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:nil];
[scanner scanCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:&numberString];
NSLog(@"Attempts: %i", [numberString intValue]);
NSLog output:
Attempts: 3
NSScanner is a linear scanner. You have to scan through the stuff you don't want to get to what you do want.
You could do [aScanner scanUpToCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:NULL] to jump past everything up to the first digit. Then you do [aScanner scanInteger:&anInteger] to scan the digits into an integer.
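Putting those two calls together (AttemptCount is a hypothetical helper; returning -1 when no number is found is an arbitrary choice for this sketch):

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: skips to the first decimal digit, then lets
// scanInteger: parse the number. Returns -1 if no number is found.
NSInteger AttemptCount(NSString *logString)
{
    NSScanner *scanner = [NSScanner scannerWithString:logString];
    NSInteger anInteger = 0;
    // Jump past everything that is not a digit ("user logged (").
    [scanner scanUpToCharactersFromSet:[NSCharacterSet decimalDigitCharacterSet] intoString:NULL];
    // Parse the digits; scanInteger: takes a pointer, hence &anInteger.
    if (![scanner scanInteger:&anInteger])
        return -1;
    return anInteger;
}
```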
Here is the regex version (stringByMatching:capture: is provided by the third-party RegexKitLite library, not Foundation):
NSString *logString = @"user logged (3 attempts)";
NSString *digits = [logString stringByMatching:@"([+\\-]?[0-9]+)" capture:1];
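With only Foundation, NSRegularExpression can run the same pattern. A minimal sketch, where FirstNumber is a hypothetical helper name that returns nil when the string contains no number:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the first (optionally signed) integer in the
// string using NSRegularExpression, or nil if there is none.
NSString *FirstNumber(NSString *logString)
{
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"([+\\-]?[0-9]+)"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match =
        [regex firstMatchInString:logString
                          options:0
                            range:NSMakeRange(0, [logString length])];
    if (match == nil)
        return nil;
    // rangeAtIndex:1 is the first capture group.
    return [logString substringWithRange:[match rangeAtIndex:1]];
}
```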