componentsSeparatedByString for SFV files - objective-c

I have an NSString like this:
music that rocks.mp3 e99a65fb
The problem is that between the filename (music that rocks.mp3) and the CRC32 checksum (e99a65fb) there could be more than one space. How can I split the line into an array?
I thought of using componentsSeparatedByString, but my problem is that the filename can also contain spaces.
Thanks in advance.

If you're OK with regular expressions, you could do it that way. For example (using RegexKitLite):
NSString * fileName = [line stringByMatching:@"(.*)\\s" capture:1];
An explanation of the regex: (.*) matches as many characters as it can, and \s then requires one whitespace character after them. Because the capture is greedy, it extends up to the last whitespace character on the line (in a nutshell). If several spaces separate the filename from the checksum, all but the last of them end up in the capture, so trim the result.
Or you can use NSString methods to find the last occurrence of the space character and get a substring from the beginning of the line to the last space character.
Or you can split the string based on @" ", throw away the last object in the array, and then recombine the array with @" ".
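The "last occurrence of the space character" option can be sketched as below. This is only a minimal illustration: SplitSFVLine is a hypothetical helper name, and it assumes the checksum itself contains no whitespace and that trailing spaces in the filename can be discarded.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): splits an SFV line at
// its last whitespace character. Returns @[fileName, checksum], or nil when
// the line contains no whitespace at all.
static NSArray<NSString *> *SplitSFVLine(NSString *line) {
    NSCharacterSet *ws = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    // Trim first, so that the last whitespace is the one before the CRC32.
    NSString *trimmed = [line stringByTrimmingCharactersInSet:ws];
    NSRange lastSpace = [trimmed rangeOfCharacterFromSet:ws
                                                 options:NSBackwardsSearch];
    if (lastSpace.location == NSNotFound) {
        return nil;
    }
    NSString *checksum = [trimmed substringFromIndex:NSMaxRange(lastSpace)];
    // Assumption: any extra padding between name and checksum is not part of
    // the filename, so it is trimmed away here.
    NSString *fileName = [[trimmed substringToIndex:lastSpace.location]
                          stringByTrimmingCharactersInSet:ws];
    return @[fileName, checksum];
}
```

Called with @"music that rocks.mp3    e99a65fb", this yields the filename music that rocks.mp3 and the checksum e99a65fb.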

1. Use componentsSeparatedByString:.
2. Get the lastObject, which is the CRC32.
3. Use subarrayWithRange: to create a new array without the CRC32.
4. Use componentsJoinedByString: to reconstitute the filename.
You may want to do step 4 in a loop, repeatedly deleting the last object until either the filename exists in the target directory or you run out of empty strings at the end of the array. This handles the case of multiple spaces between filename and CRC32, as well as the case of a filename ending in spaces (pathological but possible).
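The steps above can be sketched roughly as follows. FileNameFromSFVLine is a hypothetical helper name, and the sketch is deliberately simplified: instead of checking whether the candidate filename exists in the target directory, it just drops every empty trailing component, and it assumes the line has no trailing whitespace after the checksum.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (simplified from the answer's steps). Returns the
// filename and hands the CRC32 back through outChecksum. Returns nil when
// the line has fewer than two space-separated components.
static NSString *FileNameFromSFVLine(NSString *line, NSString **outChecksum) {
    NSArray<NSString *> *parts = [line componentsSeparatedByString:@" "];
    if (parts.count < 2) {
        return nil;
    }
    // The last component is the CRC32 (assumes no trailing whitespace).
    if (outChecksum) {
        *outChecksum = parts.lastObject;
    }
    // Everything before the last component belongs to the filename.
    NSMutableArray<NSString *> *nameParts =
        [[parts subarrayWithRange:NSMakeRange(0, parts.count - 1)] mutableCopy];
    // Consecutive spaces produce empty components; drop the trailing ones.
    while (nameParts.count > 0 && nameParts.lastObject.length == 0) {
        [nameParts removeLastObject];
    }
    // Empty components inside the name survive, so inner runs of spaces
    // are reconstructed unchanged.
    return [nameParts componentsJoinedByString:@" "];
}
```

A filename that really does end in spaces would be cut short by this simplification; the existence check described in the answer is what handles that case.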

Related

Postgres - substring from the beginning to the second last occurrence of a char within a string

I need to retrieve the leading part (SEALS_LME_TRADES_MBL) of the string below. This value is in a column within my Postgres database table.
SEALS_LME_TRADES_MBL_20220919_00212.csv
I tried to use the functions substring, reverse and strpos, but they all have limitations. It seems like a regex is the best option, but I was not able to get it working.
Essentially I need to substring from beginning till the second last '_'. I do not want the date and sequence number along with the file extension at the end.
The closest regex I managed to get is: ^(([^]*){4})
https://regex101.com/
This looks a little wonky, but how about this?
select substring ('SEALS_LME_TRADES_MBL_20220919_00212.csv', '^(.+)_[^_]+_[^_]+')
Translation
^ from the beginning
(.+) any characters (capture and return this value), followed by
_ an underscore, followed by
[^_]+ one or more non-underscores, followed by
_ an underscore, followed by
[^_]+ one or more non-underscores
Regex greediness will cause any incidental underscores to be captured in the initial string.
Technically speaking the last portion (one or more non-underscores) can probably be omitted.

Regular expression to extract a number of steps

I have a localized string that looks something like this in English:
"
5 Mile(s)
5,252 Step(s)
"
My app is localized both in left-to-right and right-to-left languages so I don't want to make assumptions either about the ordering of the step(s) or about the formatting of the number (e.g. 5,252 can be 5.252 depending on user locale). So I need to account for possibilities that can include things like
Step(s) 5.252
as well as what's above.
A few other caveats
All I know is that if the Step(s) line is in there, it will be on its own line (hence in my regex I require \n at each end of the string)
No guarantee that the Mile(s) information will be in the string at all, let alone whether it will be before or after Step(s)
Here's my attempt at pattern extraction:
NSString *patternString = [NSString stringWithFormat:@"\\n(([0-9,\\.]*)\s*%@|%@\s*([0-9,\\.]*))\\n",
NSLocalizedString(@"Step(s)",nil), NSLocalizedString(@"Step(s)",nil)];
There appear to be two problems with this:
Xcode reports Unknown escape sequence '\s' for the second \s in the pattern string above
No matches are being found even for strings like the following:
0.2 Mile(s)
1,482 Step(s)
Ideally I would extract the 1,482 out of this string in a way that is localization friendly. How should I modify my regex?
As far as the regex goes, perhaps this approach might work: it simply matches (with named groups) each pair of numbers in sequence, on the assumption that the first is miles and the second is steps. A decimal part introduced by . or , is optional:
(?<miles>\d+(?:[.,]\d+)?).*?(?<steps>\d+(?:[.,]\d+)?)
(And I think it should be \\s.) I'm not an iOS guy, but if you can use a regex literal it would be way more readable.
regular expression demo
First I'd like to ask - Why is Mile(s) mentioned in the question at all?
And now to my two bits - you could simply use a positive look-ahead:
^(?=.*Step\(s\))[^\d]*(\d+(?:[.,]\d+)?)
It makes sure the expected word is present on the line, and then captures the number on it, allowing for a localized, optional decimal separator and decimals. This way it doesn't matter whether the number is before or after the "word".
It doesn't take localization of the "word" into account, but that you seem to have handled by yourself ;)
See it here at regex101.
Your regex is close, although in Obj-C you need to double-escape the \s and (s):
^(([0-9,.]*)\\s*%@|%@\\s*([0-9,.]*))$
In your NSLocalizedString you likely also need to escape the parentheses enclosing (s):
NSString *patternString = [NSString stringWithFormat:@"^(([\\d,.]+)\\s%@|%@\\s([\\d,.]+))$",
NSLocalizedString(@"Step\\(s\\)",nil), NSLocalizedString(@"Step\\(s\\)",nil)];
If you don't escape (s) then the regex engine is probably going to interpret it as a capture group.
Looking at NSLog you can see what the pattern actually reads like:
NSLog(@"patternString: %@", patternString);
Output:
patternString: ^(([\d,.]+)\sStep\(s\)|Step\(s\)\s([\d,.]+))$
Since you mentioned the Mile(s) part may not be in the string at all I'm assuming it isn't relevant to the regular expression. As I understand from the question, you just need to capture the number of steps and nothing else. On this basis, here's a modified version of your existing regex:
NSString *patternString =
[NSString stringWithFormat:@"^(?:([0-9,.]*)\\s*%@|%@\\s*([0-9,.]*))$",
NSLocalizedString(@"Step\\(s\\)",nil), NSLocalizedString(@"Step\\(s\\)",nil)];
Demo:
https://www.regex101.com/r/Q6ff1b/1
This is based on the following tips/modifications:
Use the m (= UREGEX_MULTILINE) flag option when creating the regex to specify that ^ and $ match the start and end of each line; in NSRegularExpression this is the NSRegularExpressionAnchorsMatchLines option. This is more robust than using \n, as it also handles the start and end of the string, where a newline might not be present.
Always use a double backslash (\\) for regex escaping in an Objective-C string literal; otherwise the compiler treats the single backslash as the start of a string escape sequence and converts (or rejects) it before it ever reaches the regex engine.
Literal parentheses need to be escaped - e.g. Step\\(s\\) instead of Step(s).
Most metacharacters inside a character class (i.e. within the [] square brackets) don't need to be escaped, so . is enough and \\. is unnecessary there.
If you are using (x|y|...) as a choice and don't need it to be a capturing group, use ?: after the first parenthesis to ensure it doesn't get captured - i.e. (?:x|y|...).
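Putting these tips together, a minimal sketch might look like the following. StepCountString is a hypothetical helper name; the sketch assumes the localized label is passed in unescaped and lets +[NSRegularExpression escapedPatternForString:] escape it, rather than storing backslashes in the localized string itself.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the number text found on the line that holds
// stepsLabel (e.g. the localized "Step(s)"), or nil when no line matches.
static NSString *StepCountString(NSString *text, NSString *stepsLabel) {
    // Escape the label so that "(s)" is matched literally, not as a group.
    NSString *label = [NSRegularExpression escapedPatternForString:stepsLabel];
    // Number before the label, or label before the number.
    NSString *pattern = [NSString stringWithFormat:
        @"^\\s*(?:([0-9.,]+)\\s*%@|%@\\s*([0-9.,]+))\\s*$", label, label];
    NSError *error = nil;
    // AnchorsMatchLines makes ^ and $ match at every line boundary.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionAnchorsMatchLines
                                                    error:&error];
    if (regex == nil) {
        return nil;
    }
    NSTextCheckingResult *match =
        [regex firstMatchInString:text
                          options:0
                            range:NSMakeRange(0, text.length)];
    if (match == nil) {
        return nil;
    }
    // Only one of the two capture groups takes part in any given match;
    // the other reports a location of NSNotFound.
    NSRange numberRange = [match rangeAtIndex:1];
    if (numberRange.location == NSNotFound) {
        numberRange = [match rangeAtIndex:2];
    }
    return [text substringWithRange:numberRange];
}
```

The returned text keeps its locale-specific separators (1,482 or 5.252); turning it into a number is a separate step, for example with an NSNumberFormatter configured for the user's locale.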

How to check that the whole string matches a pattern, instead of finding matching substrings, using NSRegularExpression? [duplicate]

I would like to write a regular expression that starts with the string "wp" and ends with the string "php" to locate a file in a directory. How do I do it?
Example file: wp-comments-post.php
This should do it for you: ^wp.*php$
Matches
wp-comments-post.php
wp.something.php
wp.php
Doesn't match
something-wp.php
wp.php.txt
^wp.*\.php$ Should do the trick.
The .* means "any character, repeated 0 or more times". The next . is escaped because it's a special character, and you want a literal period (".php"). Don't forget that if you're typing this in as a literal string in something like C#, Java, etc., you need to escape the backslash because it's a special character in many literal strings.
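With NSRegularExpression, the whole-string check could be sketched like this. IsWordPressPHPFile is a hypothetical helper name; comparing the match range with the full range is an extra safeguard on top of the ^ and $ anchors, not something the answers above require.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES only when the entire file name matches the
// pattern, not merely a substring of it.
static BOOL IsWordPressPHPFile(NSString *fileName) {
    // The backslash is doubled because this is an Objective-C string literal.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"^wp.*\\.php$"
                                                  options:0
                                                    error:NULL];
    NSRange whole = NSMakeRange(0, fileName.length);
    NSTextCheckingResult *match = [regex firstMatchInString:fileName
                                                    options:0
                                                      range:whole];
    // Requiring the match to span the whole string rejects partial matches.
    return match != nil && NSEqualRanges(match.range, whole);
}
```

This accepts wp-comments-post.php and wp.php, and rejects something-wp.php and wp.php.txt.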
Example:
ajshdjashdjashdlasdlhdlSTARTasdasdsdaasdENDaknsdklansdlknaldknaaklsdn
1) START\w*END
returns: STARTasdasdsdaasdEND (the word characters between START and END)
2) START\d*END
matches e.g. START12121212END (digits only between START and END; the example string above contains no such run)
3) START\d*_\d*END
matches e.g. START1212_1212END (digits with a single _ among them between START and END)

Objc-DataFile-Unreadable Substring-Unknown to any encoding

I have a data file, built by a subsidiary application. I need to locate some substrings contained in the data file. They are identifiable by the character symbols delimiting them, for instance: *!substringqSxt. The substrings vary from one project to another, so I need to locate the delimiting symbols in order to read the substring that follows. I also printed the file in different encodings to find which one was used and matched the original data file; it turned out to be NSMacOSRomanStringEncoding.
I use rangeOfString: to locate the delimiting symbols. Here is my code:
char *debutAudio ="jjbj";
char *finAudio ="qSxt";
NSString *debutAudioConverted = [[NSString alloc]
initWithCString: debutAudio
encoding:NSMacOSRomanStringEncoding];
NSString *finAudioConverted = [[NSString alloc]
initWithCString: finAudio
encoding:NSMacOSRomanStringEncoding];
NSRange debutaudioRange =[dataFileContent rangeOfString:debutAudioConverted];
NSRange finaudioRange =[dataFileContent rangeOfString:finAudioConverted];
NSLog(@"range is %@",NSStringFromRange(debutaudioRange));
NSLog(@"range is %@",NSStringFromRange(finaudioRange));
Both NSLog calls print range is {9223372036854775807, 0} (a location of NSNotFound), so the delimiting strings are not being found.
And if I ask to look for other strings contained in the file like "Settings" the rangeOfString will return the proper location and length.
I thought the file might contain multiple encodings, and tried converting with initWithCString: using every possible encoding, but nothing worked.
Also, if I open the file in TextEdit and use the "Find" function, it does not locate the delimiting string, but does locate other words. My gut tells me it's related. I don't know where to look for info. Could the file be protected? I am reading a copy of it, though.
I have found where the problem occurs. The proper encoding is still Mac OS Roman. The problem is the prefix string debutAudio ("jjbj"): there is actually a tiny space, like a quarter space, between each character. I have tried every Unicode space listed here: https://www.cs.tut.fi/~jkorpela/chars/spaces.html#adj
without any success. Now I will try to find some half or quarter space under Mac OS Roman and see if that works.
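One way to find out what the invisible character actually is, rather than guessing at space characters, is to print the UTF-16 code units of a short slice of the decoded string around the delimiter. The sketch below is only a diagnostic aid; CodeUnitDump is a hypothetical helper name, and it assumes the file content has already been decoded into an NSString.

```objc
#import <Foundation/Foundation.h>

// Hypothetical diagnostic helper: lists every UTF-16 code unit of a string
// in U+XXXX form, so that invisible characters become visible.
static NSString *CodeUnitDump(NSString *string) {
    NSMutableArray<NSString *> *units = [NSMutableArray array];
    for (NSUInteger i = 0; i < string.length; i++) {
        unsigned int unit = (unsigned int)[string characterAtIndex:i];
        [units addObject:[NSString stringWithFormat:@"U+%04X", unit]];
    }
    return [units componentsJoinedByString:@" "];
}
```

For example, CodeUnitDump(@"j j") gives U+006A U+0020 U+006A. Running it on a substring taken just before a string that is known to be found (such as "Settings") would show which code unit sits between the letters of the delimiter.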

How do I remove illegal characters from an NSString?

I am parsing a tab-separated list using an NSScanner, based upon each line and the tabs. However, for some reason the last field in the array (parsed from each row) contains a \r character.
How can I strip this from the NSString that represents the line (or the field)?
If the \r character is at the end (probably because the file being parsed is CRLF), you can just do something like [string stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]]. (You might want to use an explicitly created '\r' character set instead if you don't want to strip whitespace as well.)
Try using the +[NSCharacterSet newlineCharacterSet] method with NSScanner in your various scanning method calls.
Just FYI, The \r is part of the line ending for a file created in a windows environment.
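The "explicitly created '\r' character set" variant mentioned above could look like this minimal sketch. StripCarriageReturns is a hypothetical helper name; it removes \r only from the two ends of the field and leaves other whitespace alone.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: trims carriage returns from both ends of a field
// without touching spaces or tabs.
static NSString *StripCarriageReturns(NSString *field) {
    NSCharacterSet *carriageReturn =
        [NSCharacterSet characterSetWithCharactersInString:@"\r"];
    return [field stringByTrimmingCharactersInSet:carriageReturn];
}
```

Applied to @"last field \r", this returns @"last field " with the trailing space preserved.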