Split NSString into NSArray by blank lines - objective-c

I am reading a *.srt subtitle file into an NSString. The content of this string looks like this:
1
00:00:20,000 --> 00:00:24,400
Altocumulus clouds occur between six thousand

2
00:00:24,600 --> 00:00:27,800
and twenty thousand feet above ground level.
I am looking for an elegant solution to split this string into an NSArray in which each element contains the information which is related to one particular subtitle-"frame", e.g. the zeroth element would look like this:
1
00:00:20,000 --> 00:00:24,400
Altocumulus clouds occur between six thousand
Any ideas how to accomplish this task in an elegant manner? I tried splitting the original string using the method
[string componentsSeparatedByString:@"\n\n"];
but this method fails to detect the blank lines.
Thanks for your help!
tobi

If [string componentsSeparatedByString:@"\n\n"] doesn't work, then there are two possibilities:
Your file contains MS-DOS-style line breaks, which are \r\n. So try splitting on @"\r\n\r\n".
Your supposedly blank lines contain spaces or tabs. You can check this from the shell using cat -e.
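If you want to cover both possibilities at once, something along these lines might work. It is only a sketch: SplitIntoBlocks is a made-up helper name, and it assumes that a line holding nothing but spaces or tabs should count as blank.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): groups the lines of `text`
// into blocks that are separated by one or more blank lines.
static NSArray *SplitIntoBlocks(NSString *text) {
    // Turn CRLF into LF first so a single separator is enough.
    NSString *normalized = [text stringByReplacingOccurrencesOfString:@"\r\n" withString:@"\n"];
    NSCharacterSet *blank = [NSCharacterSet whitespaceCharacterSet];
    NSMutableArray *blocks = [NSMutableArray array];
    NSMutableArray *current = [NSMutableArray array];

    for (NSString *line in [normalized componentsSeparatedByString:@"\n"]) {
        if ([[line stringByTrimmingCharactersInSet:blank] length] == 0) {
            // A blank (or whitespace-only) line closes the current block.
            if ([current count] > 0) {
                [blocks addObject:[current componentsJoinedByString:@"\n"]];
                [current removeAllObjects];
            }
        } else {
            [current addObject:line];
        }
    }
    // Flush the last block if the text does not end with a blank line.
    if ([current count] > 0) {
        [blocks addObject:[current componentsJoinedByString:@"\n"]];
    }
    return blocks;
}
```

Each element of the returned array then holds one subtitle frame, with its lines joined by \n regardless of the line endings in the file.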

I'd suggest using NSScanner instead. It's more flexible and you don't have to worry about whether your line breaks are Windows or Unix style and whether the blank lines contain any spaces. Here's an example:
NSMutableArray *lines = [NSMutableArray array];
NSString *s = @"foo\n\nbar\r\n \t \r\nbaz"; // intentionally mixed line breaks
NSScanner *scanner = [NSScanner scannerWithString:s];
while (![scanner isAtEnd]) {
    [scanner scanCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:NULL];
    NSString *line = nil;
    [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&line];
    if (line) {
        [lines addObject:line];
    }
}
NSLog(@"%@", lines);

According to http://en.wikipedia.org/wiki/SubRip, the line breaks are a CRLF, which would be \r\n.

Related

Delete whiteSpaces in Objective C [duplicate]

I am trying to delete the extra white spaces in my string,
for example
NSString *mystring = @"   Alex   mona   ok  ";
so after deleting the extra white spaces mystring should look like this
// deleting the first spaces, middle spaces and the last spaces
"Alex mona ok"
Unfortunately, Cocoa's split method is not versatile enough to remove duplicate separators on its own, so you need to write quite a bit of code:
Split your string into words on whitespace
Remove empty entries created for adjacent separators
Join the array back on a single space
Here is the same thing coded in Objective-C:
NSString *mystring = @"   Alex   mona   ok  ";
NSMutableArray *words = [[mystring componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]] mutableCopy];
[words removeObject:@""];
NSString *res = [words componentsJoinedByString:@" "];
If you only need to remove a certain character like a space use this:
[mystring stringByReplacingOccurrencesOfString:@" " withString:@""];
If you need to remove tabs, spaces, etc. use:
NSArray *newstring = [mystring componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSString *nospacestring = [newstring componentsJoinedByString:@" "];
This splits the string at every whitespace or newline character and joins the pieces back together with a single space. Note that a run of several whitespace characters produces empty components, so remove those before joining if you want exactly one space between words.
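Another option, not used in the answers above, is a regular-expression replace followed by a trim. This is just a sketch; CollapseWhitespace is a name I made up, and it assumes you want every run of whitespace (including tabs and newlines) reduced to one space.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collapses runs of whitespace to a single space
// and strips leading and trailing whitespace.
static NSString *CollapseWhitespace(NSString *input) {
    // \s+ matches one or more whitespace characters in a row.
    NSString *collapsed = [input stringByReplacingOccurrencesOfString:@"\\s+"
                                                           withString:@" "
                                                              options:NSRegularExpressionSearch
                                                                range:NSMakeRange(0, [input length])];
    // At most one space is left at each end; trim it away.
    return [collapsed stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
```

This avoids the intermediate array entirely, at the cost of running a regular expression over the string.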

in objective-c is there any easy way to add backslash in front of special characters?

Note: Not sure why this is marked as duplicate as I clearly stated that I don't want to use stringByReplacingOccurrencesOfString over and over again.
I have a question regarding the special character filename.
I have implemented a program, so that when you open a file or multiple files, the program will read all these filenames and local path and store them into the NSMutableArray. This part works perfectly without a problem.
My program also needs to use NSTask to manipulate these files. However, the problem is that sometimes a filename will contain special characters, for example, /Users/josh/Desktop/Screen Shot 2013-03-19 at 2.05.06 PM.png.
I have to replace space with backslash and space
NSString *urlPath = [[self url] path];
urlPath = [urlPath stringByReplacingOccurrencesOfString:@"(" withString:@"\\("];
urlPath = [urlPath stringByReplacingOccurrencesOfString:@")" withString:@"\\)"];
urlPath = [urlPath stringByReplacingOccurrencesOfString:@" " withString:@"\\ "];
to: /Users/josh/Desktop/Screen\ Shot\ 2013-03-19\ at\ 2.05.06\ PM.png
so that I can manipulate the file properly.
Same for the ( and ). I also need to add a backslash before those.
But there are too many special characters, e.g.
/Users/josh/Desktop/~!@#$?:<,.>%^&*()_+`-={}[]\|'';.txt
I need to change to:
/Users/josh/Desktop/\~\!@\#\$\?\:\<\,.\>\%^\&\*\(\)_+\`-\=\{\}\[\]\\\|\'\'\;.txt
and not to mention other special characters (e.g. accented characters).
Is there any easy way to put a backslash in front of each special character? I don't want to keep calling stringByReplacingOccurrencesOfString over and over again.
As described in NSTask's documentation for the setArguments: method, there should be no need to do special quoting:
Discussion
The NSTask object converts both path and the strings in
arguments to appropriate C-style strings (using
fileSystemRepresentation) before passing them to the task via argv[].
The strings in arguments do not undergo shell expansion, so you do not
need to do special quoting, and shell variables, such as $PWD, are not
resolved.
If you feel it is necessary, can you please provide some examples of the commands you want to run in the NSTask?
[UPDATE]: I see in the comments that you indeed are using the NSTask to execute a bash shell with -c, which I had wondered about. I've generally used NSTask to execute the command directly rather than going through the shell, like this:
NSTask *task = [[NSTask alloc] init];
[task setLaunchPath:@"/bin/ls"];
[task setArguments:[NSArray arrayWithObjects:@"-l", self.url.path, nil]];
Can you give a more accurate example of the actual command you want to run? For example, are you piping a series of commands together? Perhaps there might be an alternate way to achieve the same results without the need for using the bash shell...
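To illustrate the point about not needing any quoting, here is a rough sketch that runs ls directly on a path and captures the output. ListPath is a hypothetical helper name, /bin/ls is assumed to exist, and the code assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: runs "/bin/ls -l <path>" without a shell and
// returns whatever the tool wrote to standard output.
static NSString *ListPath(NSString *path) {
    NSTask *task = [[NSTask alloc] init];
    [task setLaunchPath:@"/bin/ls"];
    // The path is passed as-is: spaces, parentheses and $ need no escaping
    // because no shell ever parses the argument.
    [task setArguments:[NSArray arrayWithObjects:@"-l", path, nil]];

    NSPipe *pipe = [NSPipe pipe];
    [task setStandardOutput:pipe];
    [task launch];

    // Read until the tool closes its end of the pipe, then reap the process.
    NSData *data = [[pipe fileHandleForReading] readDataToEndOfFile];
    [task waitUntilExit];
    return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}
```

Calling ListPath with something like /Users/josh/Desktop/Screen Shot 2013-03-19 at 2.05.06 PM.png should work with the unescaped path, since escaping only matters when a shell interprets the command line.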
I think you may be able to use an NSRegularExpressionSearch search.
It would look something like this
+ (NSString *)addBackslashes:(NSString *)string
{
    // First convert the name string to a pure ASCII string
    NSData *asciiData = [string dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
    NSString *asciiString = [[[NSString alloc] initWithData:asciiData encoding:NSASCIIStringEncoding] lowercaseString];
    // Define the characters that we will replace
    NSString *searchCharacters = @"PUT IN ALL OF YOUR SPECIAL CHARACTERS HERE";
    // example NSString *searchCharacters = @"!@#$%&*()";
    // Build a character class; ], ^, - and \ must themselves be escaped inside it
    NSString *regExPattern = [NSString stringWithFormat:@"[%@]", searchCharacters];
    // In the template, \\\\ yields a literal backslash and $0 is the matched character
    string = [asciiString stringByReplacingOccurrencesOfString:regExPattern withString:@"\\\\$0" options:NSRegularExpressionSearch range:NSMakeRange(0, asciiString.length)];
    return string;
}
You could maintain a set of characters that need to be escaped and use NSScanner to build the new string by iterating over the source string: each time a problematic character is found, you first add a backslash to the destination string and then continue copying the following characters.
NSString *sourceString = @"/Users/josh/Desktop/\"Screen Shot\" 2013-03-19 at 2\\05\\06 PM.png";
NSMutableString *destString = [@"" mutableCopy];
NSCharacterSet *escapeCharsSet = [NSCharacterSet characterSetWithCharactersInString:@" ()\\"];
NSScanner *scanner = [NSScanner scannerWithString:sourceString];
while (![scanner isAtEnd]) {
    NSString *tempString;
    [scanner scanUpToCharactersFromSet:escapeCharsSet intoString:&tempString];
    if ([scanner isAtEnd]) {
        [destString appendString:tempString];
    }
    else {
        [destString appendFormat:@"%@\\%@", tempString, [sourceString substringWithRange:NSMakeRange([scanner scanLocation], 1)]];
        [scanner setScanLocation:[scanner scanLocation]+1];
    }
}
NSLog(@"\n%@\n%@", sourceString, destString);
result:
/Users/josh/Desktop/Screen Shot 2013-03-19 at 2.05.06 PM.png
/Users/josh/Desktop/Screen\ Shot\ 2013-03-19\ at\ 2.05.06\ PM.png

Parsing SRT file with Objective C

Text example:
1
00:00:00,000 --> 00:00:01,000
This is the first line

2
00:00:01,000 --> 00:00:02,000
This is the second line

3
00:00:02,000 --> 00:00:03,000
This is the last line
In JavaScript I would parse this with a regular expression certainly. I'm just wondering, is that the best way to do this in Obj C? I'm sure I could figure out a way to do this, but I'm wanting to do it an appropriate way.
I only need to know where to start and I'm happy to do the rest, but for understanding sake I'm going to end up with something like this (pseudo code):
NSDictionary
index -> [0-9]+
start -> hh:mm:ss,mmm
end -> hh:mm:ss,mmm
text -> one of the lines of text
In this case, I'd be parsing three entries into my dictionary.
Some background: I wrote a small app and created a file called stuff.srt containing your examples that resides in the bundle; hence, my means of accessing it.
This is just a quick and dirty thing, a proof-of-concept. Note that it doesn't check results. Real applications always check their results. As you can see, the work takes place in the -applicationDidFinishLaunching: method (I'm working in Mac OS X, not iOS).
EDIT:
It's been pointed out that the code as originally posted didn't handle multiple text lines correctly. To address this, I take advantage of the fact that SRT files use CRLF as their line breaks, and search for two occurrences of this sequence. I then change all occurrences of CRLF in the text string to spaces, based on what I observed here. This doesn't account for leading or trailing spaces in each line of the text.
I changed the contents of the stuff.srt file to this:
1
00:00:00,000 --> 00:00:01,000
This is the first line
and it has a secondary line

2
00:00:01,000 --> 00:00:02,000
This is the second line

3
00:00:02,000 --> 00:00:03,000
This is the last line
and it has a secondary line too
and the code has been revised as follows (I also put everything into an @autoreleasepool directive; there might be a lot of autoreleased objects generated in the course of parsing the file!):
- (void)applicationDidFinishLaunching:(NSNotification *)aNotification
{
    NSString *path = [[NSBundle mainBundle] pathForResource:@"stuff" ofType:@"srt"];
    NSString *string = [NSString stringWithContentsOfFile:path encoding:NSUTF8StringEncoding error:NULL];
    NSScanner *scanner = [NSScanner scannerWithString:string];
    while (![scanner isAtEnd])
    {
        @autoreleasepool
        {
            NSString *indexString;
            (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&indexString];
            NSString *startString;
            (void) [scanner scanUpToString:@" --> " intoString:&startString];
            // My string constant doesn't begin with spaces because scanners
            // skip spaces and newlines by default.
            (void) [scanner scanString:@"-->" intoString:NULL];
            NSString *endString;
            (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&endString];
            NSString *textString;
            // (void) [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&textString];
            // BEGIN EDIT
            (void) [scanner scanUpToString:@"\r\n\r\n" intoString:&textString];
            textString = [textString stringByReplacingOccurrencesOfString:@"\r\n" withString:@" "];
            // Addresses trailing space added if CRLF is on a line by itself at the end of the SRT file
            textString = [textString stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
            // END EDIT
            NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:
                                        indexString, @"index",
                                        startString, @"start",
                                        endString  , @"end",
                                        textString , @"text",
                                        nil];
            NSLog(@"%@", dictionary);
        }
    }
}
The revised output looks like this:
2013-02-09 16:10:17.727 SRTFileScan[4846:303] {
end = "00:00:01,000";
index = 1;
start = "00:00:00,000";
text = "This is the first line and it has a secondary line";
}
2013-02-09 16:10:17.729 SRTFileScan[4846:303] {
end = "00:00:02,000";
index = 2;
start = "00:00:01,000";
text = "This is the second line";
}
2013-02-09 16:10:17.730 SRTFileScan[4846:303] {
end = "00:00:03,000";
index = 3;
start = "00:00:02,000";
text = "This is the last line and it has a secondary line too";
}
One other thing I learned from what I've read today: The SRT file format originated in France, and the comma seen in the input is the decimal separator used there.
Apple has a sample code to parse subtitle files. Check the relevant part here:
https://developer.apple.com/library/mac/samplecode/avsubtitleswriterOSX/Listings/avsubtitleswriter_SubtitlesTextReader_m.html#//apple_ref/doc/uid/DTS40013409-avsubtitleswriter_SubtitlesTextReader_m-DontLinkElementID_5
My suggestion is to use an NSDateFormatter to parse the second line of each entry. I would first split that line into two strings (see componentsSeparatedByString: in the NSString class reference), doing this while reading the file line by line.
So the loop would be (assuming each entry takes exactly four lines, counted from zero: index, timing, one line of text, and a blank line):
If the file still contains data, read the next line;
If the line number is a multiple of 4, allocate a new object. This object should be able to contain two dates, one integer and one string;
If the line number is not a multiple of 4, read the line and assign its value to the corresponding field.
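For the timing line, the NSDateFormatter idea could look roughly like the following. This is only a sketch under a few assumptions: the helper names are invented, the separator is exactly " --> ", and the times are turned into seconds from 00:00:00,000 rather than kept as dates.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: converts "hh:mm:ss,mmm" into seconds, or -1 on failure.
static NSTimeInterval SecondsFromSRTTimestamp(NSString *stamp) {
    NSDateFormatter *formatter = [[NSDateFormatter alloc] init];
    // A fixed locale and time zone keep parsing independent of user settings.
    [formatter setLocale:[[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"]];
    [formatter setTimeZone:[NSTimeZone timeZoneForSecondsFromGMT:0]];
    [formatter setDateFormat:@"HH:mm:ss,SSS"];

    NSString *trimmed = [stamp stringByTrimmingCharactersInSet:
                         [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    // Measure against the parsed zero time so the formatter's default day cancels out.
    NSDate *zero = [formatter dateFromString:@"00:00:00,000"];
    NSDate *date = [formatter dateFromString:trimmed];
    if (zero == nil || date == nil) {
        return -1;
    }
    return [date timeIntervalSinceDate:zero];
}

// Hypothetical helper: splits "start --> end" and parses both halves.
static BOOL ParseTimingLine(NSString *line, NSTimeInterval *start, NSTimeInterval *end) {
    NSArray *parts = [line componentsSeparatedByString:@" --> "];
    if ([parts count] != 2) {
        return NO;
    }
    *start = SecondsFromSRTTimestamp([parts objectAtIndex:0]);
    *end = SecondsFromSRTTimestamp([parts objectAtIndex:1]);
    return (*start >= 0 && *end >= 0);
}
```

Creating a formatter on every call is wasteful for large files; in real code you would probably create it once and reuse it.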

How to break a NSString into words with non-significant blank suppression?

I have several NSStrings with a format similar to the one below:
"Hello,      how  are   you?"
How can I break the string into an array of words? For example, for the above sentence I would expect an array consisting of "Hello,", "how", "are", "you?"
Usually I would break the string into words by using the function [NSString componentsSeparatedByCharactersInSet: NSCharacterSet set]
However this won't work in this situation because the spaces between the words are of unequal length. Note I will not be aware of the size of each word and the space between them.
How can I accomplish this? I am working on an app for OSX not iOS.
EDIT: My eventual goal is to retrieve the second word in the sentence. If there is an easier way to do this without breaking the string into an array please feel free to suggest it.
Try this:
NSMutableArray *parts = [NSMutableArray arrayWithArray:[str componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]]];
[parts removeObject:@""]; // compares with isEqual:, so every empty string is removed
NSString *res = [parts objectAtIndex:1]; // The second string
Well, you could actually write a loop to iterate through the characters and find the first non-blank after the first blank, then iterate further to find the ending blank (or end of line). Would probably be about 5x faster (with much fewer object allocations) than using one of the other methods, and could be done in about 10 lines.
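A loop like the one described might be sketched as follows. SecondWord is a hypothetical name, and unlike the description it also skips blanks at the very start of the string; I have not measured how it compares in speed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the second blank-separated word of `s`,
// or nil if there is no second word.
static NSString *SecondWord(NSString *s) {
    NSCharacterSet *blank = [NSCharacterSet whitespaceCharacterSet];
    NSUInteger length = [s length];
    NSUInteger i = 0;

    // Skip leading blanks, then the first word, then the blanks after it.
    while (i < length && [blank characterIsMember:[s characterAtIndex:i]]) i++;
    while (i < length && ![blank characterIsMember:[s characterAtIndex:i]]) i++;
    while (i < length && [blank characterIsMember:[s characterAtIndex:i]]) i++;

    // Everything up to the next blank (or the end) is the second word.
    NSUInteger start = i;
    while (i < length && ![blank characterIsMember:[s characterAtIndex:i]]) i++;
    if (i == start) {
        return nil;
    }
    return [s substringWithRange:NSMakeRange(start, i - start)];
}
```

Only one substring is created, no matter how many words or spaces the sentence contains.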
If you don't want to use a character set, try this to remove extra spaces:
NSString *string = @"word1,    word2   word3  word4";
BOOL done = NO;
do {
    // Replace every pair of spaces with a single space until nothing changes
    NSString *tempStr = [string stringByReplacingOccurrencesOfString:@"  " withString:@" "];
    done = [string isEqualToString:tempStr];
    string = tempStr;
} while (!done);
NSLog(@"%@", string);
This will output "word1, word2 word3 word4".

scanUpToCharactersFromSet stops after one loop

I'm trying to get the contents of a CSV file into an array. When I've done this before I had one record per line, and used the newline character with scanUpToCharactersFromSet:intoString:, passing newlineCharacterSet as the character set:
while ([lineScanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet]
intoString:&line])
Now, I'm working with a file where many of the entries themselves contain newline characters. I've tried adding a unique character to the end of each record (a * character) but my loop only runs once. Is there something which is making the while loop break that I don't know about? Here's the code I'm using now:
NSError *error;
NSString *data = [[NSString alloc] initWithContentsOfFile:[[self delegate] filePath] encoding:NSUTF8StringEncoding error:&error];
NSScanner *lineScanner = [NSScanner scannerWithString:data];
NSString *line = nil;
// Start parsing the CSV file
while ([lineScanner scanUpToCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@"*"]
                                   intoString:&line]) {
    NSArray *elements = [line componentsSeparatedByString:@","];
    NSLog(@"Name: %@", [elements objectAtIndex:1]);
}
Edit: Thanks to Peter's answer below, I found that my scanner was stuck behind the * character. I added this line in the loop:
[lineScanner scanCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@"*"] intoString:NULL];
and now it's working like it should.
Let's go through one pass at a time:
First:
while ([lineScanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&line]) {
The scanner puts everything before the line break into line. It advances up to the newline.
Second:
while ([lineScanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet] intoString:&line]) {
The scanner is already on a line break, so it scans no characters. As documented, since it scanned no characters, it returns NO. Your loop terminates.
The solution is to scan the line break at the end of the loop, to get the scanner past it. You can pass NULL for the output parameter, assuming you don't care what the line break was.
This is correct behavior: If you did/do care what the characters you scanned up to were, this lets you obtain them. That would be more difficult if NSScanner scanned past the characters automatically.
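Put together, a loop that steps past the delimiter after each record could look like this. It is a sketch only: RecordsSeparatedByStar is a made-up name, and it assumes * never occurs inside a record.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits `data` into records terminated by "*".
static NSArray *RecordsSeparatedByStar(NSString *data) {
    NSMutableArray *records = [NSMutableArray array];
    NSCharacterSet *delimiter = [NSCharacterSet characterSetWithCharactersInString:@"*"];
    NSScanner *scanner = [NSScanner scannerWithString:data];
    NSString *record = nil;

    // scanUpTo... stops in front of the delimiter and returns NO once
    // there is nothing left to scan.
    while ([scanner scanUpToCharactersFromSet:delimiter intoString:&record]) {
        [records addObject:record];
        // Consume the delimiter so the next pass starts on the next record.
        [scanner scanCharactersFromSet:delimiter intoString:NULL];
    }
    return records;
}
```

Newlines inside a record are kept, because the scanner only skips whitespace and newlines at the start of each scan. Note that several * in a row are swallowed in one go, so empty records are dropped.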
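I think the while condition is wrong. According to the String Programming Guide, it should be something like:
while ([lineScanner isAtEnd] == NO) {
    [lineScanner scanUpToCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@"*"] intoString:&line];
    // ...
    // Step past the "*" so the next pass starts on the following record
    [lineScanner scanCharactersFromSet:[NSCharacterSet characterSetWithCharactersInString:@"*"] intoString:NULL];
}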