Comparing NSString to NSTextView Range prior to Appending - objective-c

Coding in Objective-C, I'm appending text to a NSTextView object named subCap in my code like so:
[[[_subCAP textStorage] mutableString] appendString:[NSString stringWithFormat:@"%@", subcapLine]];
subcapLine will have two timecode values such as: "01:00:00:00 01:00:01:00" separated by a single space, then a newline (\n) character, then a string like "ONC314_001_001" followed by two newline chars (\n\n).
The end result will create a list similar to:
01:00:00:00 01:00:01:00
ONC314_001_001
01:00:01:00 01:00:02:00
ONC314_001_002
01:00:02:00 01:00:03:00
ONC314_001_003
etc, etc, etc.
It's a sub caption file for placing text (the ONC314 lines) at appropriate times in a video file, as indicated by the timecodes.
However, I've determined that there is an odd set of circumstances where a timecode pair could be the same as the previous timecode pair, and if that happens, I want to skip appending that line.
So, my question is, given that the timecodes are always 11 chars apiece, separated by a space, can anybody think of a way I can easily grab the prior TC pair and compare it to my current pair in the subcapLine I'm preparing to append? The problem is the text of the sub caption could be random lengths. In my example they're the same, but that isn't always the case.
If I need to check prior to compiling my subcapLine, I can do that too, but I just thought it might be more slick to use a range of some sort to grab the prior pair of TCs from the last-written line in the NSTextView object and compare (again, using a range?) against the TCs in the line I'm about to append?
Thoughts and suggestions much appreciated.
Chris Conlee

When you add a timecode, store the length of the text view's string just before you append, so that you have the offset of the timecode you are about to add.
Then, before adding the next timecode, use the offset you stored to extract the previous timecode pair as a substring and do a string comparison to see whether the timecodes are identical.
This should allow you to always have an offset to the previous timecode regardless of the length of the subtitles.
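A minimal sketch of that idea, assuming each subcapLine starts with the 23-character pair (two 11-character timecodes plus a space). The helper name AppendUnlessDuplicate and the constant are hypothetical, and a plain NSMutableString stands in for the text view's storage:

```objc
#import <Foundation/Foundation.h>

// Length of "HH:MM:SS:FF HH:MM:SS:FF": two 11-character timecodes plus one space.
static const NSUInteger kTimecodePairLength = 23;

// Hypothetical helper, not from the question. Appends `line` to `text` unless its
// timecode pair equals the pair that starts at *lastOffset. *lastOffset must be
// NSNotFound before the first call. Returns YES if the line was appended.
static BOOL AppendUnlessDuplicate(NSMutableString *text, NSString *line, NSUInteger *lastOffset) {
    if (line.length < kTimecodePairLength) {
        return NO; // too short to contain a timecode pair
    }
    NSString *newPair = [line substringToIndex:kTimecodePairLength];

    if (*lastOffset != NSNotFound) {
        NSRange previousRange = NSMakeRange(*lastOffset, kTimecodePairLength);
        NSString *previousPair = [text substringWithRange:previousRange];
        if ([previousPair isEqualToString:newPair]) {
            return NO; // same pair as the previous entry, so skip this line
        }
    }

    *lastOffset = text.length; // the new pair will start at the current end
    [text appendString:line];
    return YES;
}
```

With the text view, the first argument would be [[_subCAP textStorage] mutableString]. The stored offset is only valid as long as nothing else edits the text before it.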

Related

Is format ####0.000000 different to 0.000000?

I am working on some legacy code at the moment and have come across the following:
FooString = String.Format("{0:####0.000000}", FooDouble)
My question is, is the format string here, ####0.000000 any different from simply 0.000000?
I'm trying to generalize the return type of the function that sets FooDouble, and I want to make sure I don't break existing functionality, so I'm trying to work out what the # characters add here.
I've run a couple tests in a toy program and couldn't see how the result was any different but maybe there's something I'm missing?
From MSDN
The "#" custom format specifier serves as a digit-placeholder symbol.
If the value that is being formatted has a digit in the position where
the "#" symbol appears in the format string, that digit is copied to
the result string. Otherwise, nothing is stored in that position in
the result string.
Note that this specifier never displays a zero that
is not a significant digit, even if zero is the only digit in the
string. It will display zero only if it is a significant digit in the
number that is being displayed.
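Because the format has a 0 placeholder immediately before the decimal separator, at least one integer digit is always written, and the leading # placeholders add nothing further. Both formats should therefore return the same result.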

Filtering rows in Pentaho

I have a dataset with columns containing numbers. However, some of the rows in that column have missing data. Instead of numbers, a dash (-) is placed in the cell.
What I want to happen is to separate those rows with a dash and output them to a separate excel file. Those without the dash, should output to a csv file.
I tried the "filter rows" but it gives me an error:
Unexpected conversion error while converting value [constant String] to a Number
constant String : couldn't convert String to number
constant String : couldn't convert String to number : non-numeric character found at position 1 for value [-]
My condition is if
Column1 CONTAINS - (String)
You can try to convert the field to Number in the Select values step and handle the error: if a value cannot be converted to a number, that means it is a dash (-).
You can convert missing value indicators (like a dash or any other string) to null in Text-File-Input - see field option "Null if". That way you still can use the metadata detection feature and will not trip over a dash arriving in a Number field.
With CSV-File-Input you should stick to the String datatype until a Null-If step has cleansed the values, so you can change the datatype to Number in a Select-Values step.
If you must preserve the dash character, don't use metadata detection (as it suggests datatype Number) or use more rows to sample (so a field with a dash is encountered) or just revert the datatype to String again before saving and running the transformation.
My solution starts with a 'Replace in String' step. I replaced the dash with something numeric that can easily be distinguished from the rest of the numbers (I used 9999) and carried on with the rest of my process.
In filter rows, I had no problems anymore with the data type because both my variables and condition contained numbers, therefore, it no longer had to convert anything.
After filter rows, I added the 'Null-if' step to remove the arbitrary 9999 that I used just to have something to replace the dash.
After that, the separation was made just as I had hoped.
Thanks to @marabu for the Null-if idea.

CHCSVParsing an unusual csv file

I'm having difficulties as to how I should parse this kind of csv file.
For example:
06:16 PM,7,299,http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0026_heavy_rain_showers_night.png,Moderate rain at times,14,22,180,S,3.1,81,10,993,75
2014-01-31,9,48,3,38,22,35,176,S,119,http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0003_white_cloud.png,Cloudy,6.0
2014-02-01,7,45,3,37,19,30,220,SW,113,http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0001_sunny.png,Sunny,2.2
2014-02-02,9,47,3,37,17,27,236,SW,113,http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0001_sunny.png,Sunny,0.0
2014-02-03,8,46,3,37,21,34,152,SSE,116,http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0002_sunny_intervals.png,Partly Cloudy,1.8
2014-02-04,9,48,3,38,20,32,191,SSW,263,http://cdn.worldweatheronline.net/images/wsymbols01_png_64/wsymbol_0009_light_rain_showers.png,Patchy light drizzle,1.4
"London","United Kingdom","City Of London, Greater London",51.517,-0.106,7421228,http://www.worldweatheronline.com/London-weather/City-of-London-Greater-London/GB.aspx
For example, I need to get the first two values on the first line, all the values on the 2nd to 6th line, and the first value of the 7th line.
I currently have a model class with properties for all the values I need to get.
I'm not sure how to do it in this situation. So far, I know how to parse that csv (if I didn't need to get the first two values on the 1st line, and the first value of the 7th line)
What would be the logic to parse in this situation? Hope you guys can give me some idea how to do it.
Thanks
Split the string into lines (using a scanner or an array method, depending on how big the string is). Once you have your lines, take the special ones and pass them to appropriate methods to extract the required values. Do likewise for the main lines.
You can use your parser on each line individually or use a scanner or array method if that's easier depending on what content you need to extract and where it is.
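As a rough sketch of the array-method variant, using Foundation only: the function names and dictionary keys below are hypothetical, and the code assumes the first line is the "current conditions" line, the last line is the location line, and everything in between is a forecast line. The comma split is naive, so a line with quoted commas (like the location line) should go through your real CSV parser if you need more than its first field.

```objc
#import <Foundation/Foundation.h>

// Splits the raw CSV text into lines, dropping empty ones.
static NSArray<NSString *> *NonEmptyLines(NSString *csv) {
    NSMutableArray<NSString *> *lines = [NSMutableArray array];
    NSCharacterSet *newlines = [NSCharacterSet newlineCharacterSet];
    for (NSString *line in [csv componentsSeparatedByCharactersInSet:newlines]) {
        if (line.length > 0) {
            [lines addObject:line];
        }
    }
    return lines;
}

// Naive field split: only safe where no field contains a quoted comma.
static NSArray<NSString *> *Fields(NSString *line) {
    return [line componentsSeparatedByString:@","];
}

// Hypothetical entry point. Returns nil if the input has an unexpected shape.
static NSDictionary *ParseWeatherCSV(NSString *csv) {
    NSArray<NSString *> *lines = NonEmptyLines(csv);
    if (lines.count < 3) return nil;

    // Special line 1: keep only the first two values.
    NSArray<NSString *> *header = Fields(lines.firstObject);
    if (header.count < 2) return nil;
    NSArray<NSString *> *current = [header subarrayWithRange:NSMakeRange(0, 2)];

    // Main lines: keep every value.
    NSMutableArray<NSArray<NSString *> *> *forecast = [NSMutableArray array];
    for (NSUInteger i = 1; i < lines.count - 1; i++) {
        [forecast addObject:Fields(lines[i])];
    }

    // Special last line: keep only the first value, without its quotes.
    NSCharacterSet *quotes = [NSCharacterSet characterSetWithCharactersInString:@"\""];
    NSString *city = [Fields(lines.lastObject).firstObject stringByTrimmingCharactersInSet:quotes];

    return @{ @"current": current, @"forecast": forecast, @"city": city };
}
```

From the returned dictionary you can then populate the properties of your model class.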

User input text translation

I'm working on a translator that will take English language text (as user input into a UITextView) and (with a button press) replace specific words with alternatives. I have both the English words in scope plus their alternatives in separate Arrays (englishArray and alternativeArray), indexed correspondingly.
My challenge is finding an algorithm that will allow me to identify a word in the input text (a UITextView) ignoring characters like <",.()>, lookup the word in englishArray (case insensitive), locate the corresponding word in alternativeArray and then use that word in place of the original - writing it back to the UITextView.
Any help greatly appreciated.
NB. I have created a Category extending the NSArray functionality with an indexOfCaseInsensitiveString method that ignores case when doing an indexOfObject type lookup, if that helps.
Tony.
I think that using an NSScanner would be best to parse the string into separate words which you could then pass to your indexOfCaseInsensitiveString method. scanCharactersFromSet:intoString: using a set of all the characters you want to ignore, including whitespace and newline characters should get you to the start of a word, and then you could use scanUpToCharactersFromSet:intoString: using the same set to scan to the end of the word. Using scanLocation at the beginning and end of each scan should allow you to get the range of that word, so if you find a match in your array, you will know where in your string to make the replacement.
Thanks for your suggestion. It's working with one exception.
I want to capture all punctuation so I can recreate the original input but with the substituted words. Even though I have a 'space' in my Character Set, the scanner is not putting the spaces into the 'intoString'. Other characters I specify in the Character Set such as '(' and ';' are represented in the 'intoString'.
Net is that when I recreate the input, it's perfect except that I get individual words running into each other.
UPDATE: I fixed that issue by including:
[theScanner setCharactersToBeSkipped:nil];
Thanks again.
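For anyone following along, here is a minimal sketch of the scanner loop described above, with charactersToBeSkipped set to nil so that spaces survive. The function name Translate and the exact separator set are assumptions, and indexOfObjectPassingTest: stands in for the indexOfCaseInsensitiveString category method, which isn't shown in the question:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper. Returns `input` with every word found in `english`
// (ignoring case) replaced by the entry at the same index in `alternatives`.
static NSString *Translate(NSString *input,
                           NSArray<NSString *> *english,
                           NSArray<NSString *> *alternatives) {
    // Characters that separate words; extend this set as needed.
    NSMutableCharacterSet *separators =
        [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
    [separators addCharactersInString:@"\",.()<>;:!?"];

    NSScanner *scanner = [NSScanner scannerWithString:input];
    scanner.charactersToBeSkipped = nil; // keep spaces so the text can be rebuilt

    NSMutableString *output = [NSMutableString string];
    while (!scanner.isAtEnd) {
        // Copy any run of separators through unchanged.
        NSString *gap = nil;
        if ([scanner scanCharactersFromSet:separators intoString:&gap]) {
            [output appendString:gap];
        }
        // Then read one word and look it up.
        NSString *word = nil;
        if ([scanner scanUpToCharactersFromSet:separators intoString:&word]) {
            NSUInteger index = [english indexOfObjectPassingTest:
                ^BOOL(NSString *candidate, NSUInteger idx, BOOL *stop) {
                    return [candidate caseInsensitiveCompare:word] == NSOrderedSame;
                }];
            BOOL found = (index != NSNotFound && index < alternatives.count);
            [output appendString:(found ? alternatives[index] : word)];
        }
    }
    return output;
}
```

Building a new string avoids having to track ranges while the original text changes length; the result can be written back to the UITextView in one assignment.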

Using NSXMLParser with ISO-8859-1 truncates words with accents

I have the same exact problem that's in this question, but it didn't get any good answers.
I'm trying to parse an XML file with an ISO-8859-1 encoding, but every time there's a word with an accented character, it gets truncated and doesn't show properly.
Example:
Original Word: Interés
Word Shown: és
You're making the assumption that you only get one -parser:foundCharacters: delegate method for the text. In this case, that's wrong. You're getting two calls to -parser:foundCharacters:, the first being the text up to the accented character, and the second being the text after it. Your logs even demonstrate this.
Therefore, what you need to do is, when you start a new element, you should also initialize a new NSMutableString* instance. Then when you get -parser:foundCharacters: you append to this string instead of replacing it. When the tag closes, this string now contains all of the text in the tag, instead of just the last text block.
You must use an NSMutableString and append the characters to it in the foundCharacters method.
If you assign the string instead of appending, you keep only the last chunk, and that's why your string ends up truncated.
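A minimal sketch of the append approach, assuming a delegate that just collects the text of each leaf element. The class name TextCollector and its properties are hypothetical; adapt them to your own model:

```objc
#import <Foundation/Foundation.h>

// Hypothetical delegate that collects the complete text of each leaf element.
@interface TextCollector : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableString *currentText;
@property (nonatomic, strong) NSMutableArray<NSString *> *values;
@end

@implementation TextCollector

- (instancetype)init {
    if ((self = [super init])) {
        _values = [NSMutableArray array];
    }
    return self;
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName
    attributes:(NSDictionary<NSString *, NSString *> *)attributeDict {
    // Start with an empty buffer for every new element.
    self.currentText = [NSMutableString string];
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
    // May be called several times per element, so append instead of assigning.
    [self.currentText appendString:string];
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
  namespaceURI:(NSString *)namespaceURI qualifiedName:(NSString *)qName {
    // The buffer now holds all of the element's text, not just the last chunk.
    if (self.currentText != nil) {
        [self.values addObject:[self.currentText copy]];
    }
    self.currentText = nil;
}

@end
```

Set an instance as the parser's delegate before calling parse; after parsing, values holds one complete string per leaf element.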