Trying to translate Object-C into Applescriptobjc for instagram post finder - objective-c

So I have this Objective-C code it does something that I had been trying to wrap my head around with plain Applescript, and also tried and failed with some python that I tried (and failed at). I'd post the Applescript I have already tried, but it is essentially worthless. So I am turning to the AppleScript/ASOBJC gurus here to help with a solution. The code is to reverse engineer an instagram media ID to a post ID (so if you have a photo that you know is from IG you can find the post ID for that photo).
-(NSString *) getInstagramPostId:(NSString *)mediaId {
NSString *postId = #"";
#try {
NSArray *myArray = [mediaId componentsSeparatedByString:#"_"];
NSString *longValue = [NSString stringWithFormat:#"%#",myArray[0]];
long itemId = [longValue longLongValue];
NSString *alphabet = #"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
while (itemId > 0) {
long remainder = (itemId % 64);
itemId = (itemId - remainder) / 64;
unsigned char charToUse = [alphabet characterAtIndex:(int)remainder];
postId = [NSString stringWithFormat:#"%c%#",charToUse , postId];
}
} #catch(NSException *exception) {
NSLog(#"%#",exception);
}
return postId;}
The code above comes from an answer on another SO question, which can be found here:
Link
I realize it is probably asking a lot but I suck at math so I don't really "get" this code, which is probably why I can't translate it to some form of Applescript myself! Hopefully I will learn something in this process.
Here is an example of the media ID the code is looking for:
45381714_262040461144618_1442077673155810739_n.jpg
And here is the post ID that the code above is supposed to translate into
BqvS62JHYH3
A lot of the research that went into these "calculators" is from this post from 5 years ago. It looks like the 18 digit to 10 digit ratio that they point out in the post is now an 11 to 19 ratio. I tried to test the code in Xcode but got an build error when I attempted to run it. Given that I am an Xcode n00b that is not surprising.
Thanks for your help with this!

Here's an (almost) "word-for-word" translation of your Objective-C code into ASObjC:
use framework "Foundation"
use scripting additions
on InstagramPostIdFromMediaId:mediaId
local mediaId
set postId to ""
set mediaId to my (NSString's stringWithString:mediaId)
set myArray to mediaId's componentsSeparatedByString:"_"
set longValue to my NSString's stringWithFormat_("%#", myArray's firstObject())
set itemId to longValue's longLongValue()
set alphabet to my (NSString's stringWithString:(("ABCDEFGHIJKLMNOPQRSTUVWXYZ" & ¬
"abcdefghijklmnopqrstuvwxyz0123456789-_")))
repeat while (itemId > 0)
set remainder to itemId mod 64
set itemId to itemId div 64
set unichar to (alphabet's characterAtIndex:remainder) as small integer
set postId to character id unichar & postId
end repeat
return postId
end InstagramPostIdFromMediaId:
By "almost", I mean that every Objective-C method utilised in the original script has been utilised by an equivalent call to the same Objective-C method by way of the ASObjC bridge, with two exceptions. I also made a trivial edit of a mathematical nature to one of the lines. Therefore, in total, I made three operational changes, two of these technically being functional changes but which end up to yielding identical results:
to replace (itemId - remainder) / 64 with itemId div 64
The AppleScript div command performs integer division, which is where a number given by regular division is truncated to remove everything after the decimal point. This is mathematically identical to what is being done when the remainder is subtracted from itemId before performing regular dividing.
to avoid the instance where stringWithFormat: is used to translate a unicode character index to a string representation
NSString objects store strings as a series of UTF-16 code points, and characterAtIndex: will retrieve a particular code point from a string, e.g. 0x0041, which refers to the character "A". stringWithFormat: uses the %c format specifier to translate an 8-bit unsigned integer (i.e. those in the range 0x0000 to 0x00FF) into its character value. AppleScript bungles this up, although I'm uncertain how or why this presents a problem. Unwrapping the value returned by charactertAtIndex: yields an opaque raw AppleScript data object that, for example, looks like «data ushr4100». This can happily be coerced into a small integer type, correctly returning the number 65 in denary. Therefore, whatever goes wrong is likely something stringWithFormat: is doing, so I used AppleScript's character id ... function to perform the same operation that stringWithFormat: was intended to do.
myArray[0] was replaced with myArray's firstObject()
Both of these are used in Objective-C to retrieve the first element in an array. myArray[0] is the very familiar C syntax that can happily be used in native Objective-C programming, but is not available to AppleScript. firstObject is an Objective-C method wrapping the underlying function and making it accessible for use in any Objective-C context, but also likely performs some additional checks to make it suitably safe to use without too much thought. As far as we're concerned in the AppleScript context, the result is identical.
With all that being said, supplying a mediaId of "45381714_262040461144618_1442077673155810739_n.jpg" to our new ASObjC handler gives this result:
"CtHhS"
rather than what you stated as the expected result, namely "BqvS62JHYH3". However, it's easy to see why. Both scripts are splitting the mediaId into components ("text items") at every occurrence of an underscore. Then only the first of these goes on to be used by either script to determine the postId. With the given mediaId above, the first text item is "45381714", which is far too short to be valid for our needs, hence the short length of the erroneous result above. The second text item is only 15 digits (characters) long so, too, is not viable. The third text item is 19 characters long, which is of the correct length.
Therefore, I replaced firstObject() in the script with item 3. As you can guess, instead of retrieving the first item from the array of text items (components) stored in myArray, it retrieves the third, namely "1442077673155810739". This produces the following result:
"BQDSgDW-VYA"
Similar, but not the identical to what you were expecting.
For now, I'll leave this with you. At this point, I would usually have compared this with your own previous attempts, but you said they were "worthless" so I'm assuming that this at least provides you with a piece of translated code that works in so far as it performs the same operations as its Objective-C counterpart. If you tell us what the nature of the actual hurdles you were facing are, that potentially lets me or someone else help further.
But since I can say with confidence that these two scripts are doing the same thing, then if the original is producing a different output with identical input, then that tells us that the data must be mutating at some point during its processing. Given that we are dealing with a number with an order of magnitude of 10¹⁹, I think it's very likely that the error is a result of floating-point precision. AppleScript stores any integers with absolute value up to and including 536870911 as type class integer, and anything exceeding this as type class real (floating point), so will be subject to floating-point errors.

Related

Linking Text to an Integer Objective C

The goal of this post is to find a more efficient way to create this method. Right now, as I start adding more and more values, I'm going to have a very messy and confusing app. Any help is appreciated!
I am making a workout app and assign an integer value to each workout. For example:
Where the number is exersiceInt:
01 is High Knees
02 is Jumping Jacks
03 is Jog in Place
etc.
I am making it so there is a feature to randomize the workout. To do this I am using this code:
-(IBAction) setWorkoutIntervals {
exerciseInt01 = 1 + (rand() %3);
exerciseInt02 = 1 + (rand() %3);
exerciseInt03 = 1 + (rand() %3);
}
So basically the workout intervals will first be a random workout (between high knees, jumping jacks, and jog in place). What I want to do is make a universal that defines the following so I don't have to continuously hard code everything.
Right now I have:
-(void) setLabelText {
if (exerciseInt01 == 1) {
exercise01Label.text = [NSString stringWithFormat:#"High Knees"];
}
if (exerciseInt01 == 2) {
exercise01Label.text = [NSString stringWithFormat:#"Jumping Jacks"];
}
if (exerciseInt01 == 3) {
exercise01Label.text = [NSString stringWithFormat:#"Jog in Place"];
}
}
I can already tell this about to get really messy once I start specifying images for each workout and start adding workouts. Additionally, my plan was to put the same code for exercise02Label, exercise03Label, etc. which would become extremely redundant and probably unnecessary.
What I'm thinking would be perfect if there would be someway to say
exercise01Label.text = exercise01Int; (I want to to say that the Label's text equals Jumping Jacks based on the current integer value)
How can I make it so I only have to state everything once and make the code less messy and less lengthy?
Three things for you to explore to make your code easier:
1. Count from zero
A number of things can be easier if you count from zero. A simple example is if your first exercise was numbered 0 then your random calculation would just be rand() % 3 (BTW look up uniform random number, there are much better ways to get a random number).
2. Learn about enumerations
An enumeration is a type with a set of named literal values. In (Objective-)C you can also think of them as just a collection of named integer values. For example you might declare:
typedef enum
{
HighKnees,
JumpingJacks,
JogInPlace,
ExerciseKindCount
} ExerciseCount;
Which declares ExerciseCount as a new type with 4 values. Each of these is equivalent to an integer, here HighKnees is equivalent to 0 and ExerciseKindCount to 3 - this should make you think of the first thing, count from zero...
3. Discover arrays
An array is an ordered collection of items where each item has an index - which is usually an integer or enumeration value. In (Objective-)C there are two basic kinds of arrays: C-style and object-style represented by NSArray and NSMutableArray. For example here is a simple C-style array:
NSString *gExerciseLabels[ExerciseKindCount] =
{ #"High Knees",
#"Jumping Jacks",
#"Jog in Place"
}
You've probably guessed by now, the first item of the above array has index 0, back to counting from zero...
Exploring these three things should quickly show you ways to simplify your code. Later you may wish to explore structures and objects.
HTH
A simple way to start is by putting the exercise names in an array. Then you can access the names by index. eg - exerciseNames[exerciseNumber]. You can also make the list of exercises in an array (of integers). So you would get; exerciseNames[exerciseTable[i]]; for example. Eventually you will want an object to define an exercise so that you can include images, videos, counts, durations etc.

Integer Precision and Conversion Errors

I have been programming Objective-C for only a few weeks. My experience in programming languages such as basic, visual basic, C++ and PHP is much more extensive starting back in 1987 and continuing forward to today. Although, for the last 5 years, I have exclusively coded PHP.
Today, I find myself confused by what I perceive to be bit conversion errors within the Objective-C language. I first noticed this the other day when trying to divide an integer (84) converted to a float by a float (10.0). This produced 8.399999, instead of the 8.400 I was hoping for. I coded a way around the issue and moved on.
Today, I am extracting an (int) 0 from an NSMutableDictionary. I store it first in an NSInteger and second in an int variable. The values should be 0 for both cases, but for both cases, I get the integer value 151229568. (See screenshot)
I remember from my early programming years that we had to worry about the size of the container, because pointing to block of memory with a 32-bit pointer to access a 4-bit value resulted in capturing all the data associated with other values and thus resulted in what appeared to be the wrong number being captured. With implicit memory management and type-conversions becoming the norm, I have not had to worry about this kind of issue for years, and now that I am confronted with it again, I need advice and clarification from programmers who are more familiar with this topic in todays programming environments.
Questions:
Is this a case of incorrect pointer sizing or something else?
What is happening on the back-end to produce this conversion from 0 to another number?
What can I do to get better precision and accuracy from my Objective-C calculations and variable assignments?
Code:
NSInteger hsibs = [keyData objectForKey:#"half_sibs"];
int hsibsi = [keyData objectForKey:#"half_sibs"];
//breakpoint and screen capture of variables in stack
I don't know Objective C all that well, but it looks like the method you use to obtain your data is returning a data type of id (see this reference), and not int
Looks like you either need to cast it or get the integer value in such a manner:
NSInteger hsibs = [[keyData objectForKey:#"half_sibs"] integerValue];
int hsibsi = [[keyData objectForKey:#"half_sibs"] intValue];
and then see if you get the expected results.

Conversion from float to int looks weird

I am having difficulty understanding why the following code is giving me the numbers below. Can anyone explain this conversion from float to int? (pCLocation is a CGPoint)
counter = 0;
pathCells[counter][0].x = pCLocation.x;
pathCells[counter][0].y = pCLocation.y;
cellCount[counter]++;
NSLog(#"%#",[NSString stringWithFormat:#"pCLocation at:
%f,%f",pCLocation.x,pCLocation.y]);
NSLog(#"%#",[NSString stringWithFormat:#"path cell 0: %i,%i",
pathCells[counter][cellCount[counter-1]].x,pathCells[counter][cellCount[counter]].y]);
2012-03-09 01:17:37.165 50LevelsBeta1[1704:207] pCLocation at: 47.000000,16.000000
2012-03-09 01:17:37.172 50LevelsBeta1[1704:207] path cell 0: 0,1078427648
Assuming your code is otherwise correct:
I think it would help you to understand how NSLog and other printf-style functions work. When you call NSLog(#"%c %f", a_char, a_float), your code pushes the format string and values onto the stack, then jumps to the start of that function's code. Since NSLog accepts a variable number of arguments, it doesn't know how much to pop off the stack yet. It knows at least there is a format string, so it pops that off and begins to scan it. When it finds a format specifier %c, it knows to pop one byte off the stack and print that value. Then it finds %f, so now it knows to pop another 32 bits and print that as a floating point value. Then it reaches the end of the format string, so it's done.
Now here's the kicker: if you lie to NSLog and tell it you are providing a int but actually provide a float, it has no way to know you are lying. It simply assumes you are telling the truth and prints whatever bits it finds in memory however you asked it to be printed.
That's why you are seeing weird values: you are printing a floating point value as though it were an int. If you really want an int value, you should either:
Apply a cast: NSLog(#"cell.x: %i", (int)cell.x);
Leave it a float but use the format string to hide the decimals: NSLog(#"cell.x: %.0f", cell.x);
(Alternate theory, still potentially useful.)
You might be printing out the contents of uninitialized memory.
In the code you've given, counter = 0 and is never changed. So you assign values to:
pathCells[0][0].x = pCLocation.x;
pathCells[0][0].y = pCLocation.y;
cellCount[0]++;
Then you print:
pathCells[0][cellCount[-1]].x
pathCells[0][cellCount[0]].y
I'm pretty sure that cellCount[-1] isn't what you want. C allows this because even though you think of it as working with an array of a specific size, foo[bar] really just means grab the value at memory address foo plus offset bar. So an index of -1 just means take one step back. That's why you don't get a warning or error, just junk data.
You should clarify what pathCells, cellCount, and counter are and how they relate to each other. I think you have a bug in how you are combining these things.

Trie Implementation Question

I'm implementing a trie for predictive text entry in VB.NET - basically autocompletion as far as the use of the trie is concerned. I've made my trie a recursive data structure based on the generic dictionary class.
It's basically:
class WordTree Inherits Dictionary(of Char, WordTree)
Each letter in a word (all upper cased) is used as a key to a new WordTrie. A null character on a leaf indicates the termination of a word. To find a word starting with a prefix I walk the trie as far as my prefix goes then collect all children words.
My question is basically on the implementation of the trie itself. I'm using the dictionary hash function to branch my tree. I could use a list and do a linear search over the list, or do something else. What's the smooth move here? Is this a reasonable way to do my branching?
Thanks.
Update:
Just to clarify, I'm basically asking if the dictionary branching approach is obviously inferior to some other alternative. The application in which I'm using this data structure only uses upper case letters, so maybe the array approach is the best. I might use the same data structure for a more complex typeahead situation in the future (more characters). In that case, it sounds like the dictionary is the right approach - up to the point where I need to use something more complex in general.
If it's just the 26 letters, as a 26 entry array. Then lookup is by index. It probably uses less space than the Dictionary if the bucket-list is longer than 26.
If you are worried about space, you can use bitmap compression on the valid byte transitions, assuming the 26char limit.
class State // could be struct or whatever
{
int valid; // can handle 32 transitions -- each bit set is valid
vector<State> transitions;
State getNextState( int ch )
{
int index;
int mask = ( 1 << ( toupper( ch ) - 'A' )) -1;
int bitsToCount = valid & mask;
for( index = 0; bitsToCount ; bitsToCount >>= 1)
{
index += bitsToCount & 1;
}
transitions.at( index );
}
};
There are other ways to do the bit counting Here, the index into the vector is the number of set bits in the valid bitset. the other alternative is the direct indexed array of states;
class State
{
State transitions[ 26 ]; // use the char as the index.
State getNextState( int ch )
{
return transitions[ ch ];
}
};
A good data structure that's efficient in space and potentially gives sub-linear prefix lookups is the ternary search tree. Peter Kankowski has a fantastic article about it. He uses C, but it's straightforward code once you understand the data structure. As he mentioned, this is the structure ispell uses for spelling correction.
I have done this (a trie implementation) in C with 8 bit chars, and simply used the array version (as alluded to by the "26 chars" answer).
HOWEVER, I am guessing that you want full unicode support (since a .NET char is unicode, among other reasons). Assuming you have to have support for unicode, the hash/map/dictionary lookup is probably your best bet, as a 64K entry array in each node won't really work very well.
About the only hack up I could think of on this is to store entire strings (suffixes or possibly "in-fixes") on branches that do not yet split, depending on how sparse the tree, er, trie, is. That adds a lot of logic to detect the multi-char strings, though, and to split them up when an alternate path is introduced.
What is the read vs update pattern?
---- update jul 2013 ---
If .NET strings have a function like java to get the bytes for a string (as UTF-8), then having an array in each node to represent the current position's byte value is probably a good way to go. You could even make the arrays variable size, with first/last bounds indicators in each node, since MANY nodes will have only lower case ASCII letters anyway, or only upper case letters or the digits 0-9 in some cases.
I've found burst trie's to be very space efficient. I wrote my own burst trie in Scala that also re-uses some ideas that I found in GWT's trie implementation. I used it in Stripe's Capture the Flag contest on a problem that was multi-node with a small amount of RAM.

Is there a practical limit to the size of bit masks?

There's a common way to store multiple values in one variable, by using a bitmask. For example, if a user has read, write and execute privileges on an item, that can be converted to a single number by saying read = 4 (2^2), write = 2 (2^1), execute = 1 (2^0) and then add them together to get 7.
I use this technique in several web applications, where I'd usually store the variable into a field and give it a type of MEDIUMINT or whatever, depending on the number of different values.
What I'm interested in, is whether or not there is a practical limit to the number of values you can store like this? For example, if the number was over 64, you couldn't use (64 bit) integers any more. If this was the case, what would you use? How would it affect your program logic (ie: could you still use bitwise comparisons)?
I know that once you start getting really large sets of values, a different method would be the optimal solution, but I'm interested in the boundaries of this method.
Off the top of my head, I'd write a set_bit and get_bit function that could take an array of bytes and a bit offset in the array, and use some bit-twiddling to set/get the appropriate bit in the array. Something like this (in C, but hopefully you get the idea):
// sets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// result is 0 on success, non-zero on failure (offset out-of-bounds)
int set_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
// make sure offset is valid
if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }
//set the right bit
bytes[offset >> 3] |= (1 << (offset & 0x7));
return 0; //success
}
//gets the n-th bit in |bytes|. num_bytes is the number of bytes in the array
// returns (-1) on error, 0 if bit is "off", positive number if "on"
int get_bit(char* bytes, unsigned long num_bytes, unsigned long offset)
{
// make sure offset is valid
if(offset < 0 || offset > (num_bytes<<3)-1) { return -1; }
//get the right bit
return (bytes[offset >> 3] & (1 << (offset & 0x7));
}
I've used bit masks in filesystem code where the bit mask is many times bigger than a machine word. think of it like an "array of booleans";
(journalling masks in flash memory if you want to know)
many compilers know how to do this for you. Adda bit of OO code to have types that operate senibly and then your code starts looking like it's intent, not some bit-banging.
My 2 cents.
With a 64-bit integer, you can store values up to 2^64-1, 64 is only 2^6. So yes, there is a limit, but if you need more than 64-its worth of flags, I'd be very interested to know what they were all doing :)
How many states so you need to potentially think about? If you have 64 potential states, the number of combinations they can exist in is the full size of a 64-bit integer.
If you need to worry about 128 flags, then a pair of bit vectors would suffice (2^64 * 2).
Addition: in Programming Pearls, there is an extended discussion of using a bit array of length 10^7, implemented in integers (for holding used 800 numbers) - it's very fast, and very appropriate for the task described in that chapter.
Some languages ( I believe perl does, not sure ) permit bitwise arithmetic on strings. Giving you a much greater effective range. ( (strlen * 8bit chars ) combinations )
However, I wouldn't use a single value for superimposition of more than one /type/ of data. The basic r/w/x triplet of 3-bit ints would probably be the upper "practical" limit, not for space efficiency reasons, but for practical development reasons.
( Php uses this system to control its error-messages, and I have already found that its a bit over-the-top when you have to define values where php's constants are not resident and you have to generate the integer by hand, and to be honest, if chmod didn't support the 'ugo+rwx' style syntax I'd never want to use it because i can never remember the magic numbers )
The instant you have to crack open a constants table to debug code you know you've gone too far.
Old thread, but it's worth mentioning that there are cases requiring bloated bit masks, e.g., molecular fingerprints, which are often generated as 1024-bit arrays which we have packed in 32 bigint fields (SQL Server not supporting UInt32). Bit wise operations work fine - until your table starts to grow and you realize the sluggishness of separate function calls. The binary data type would work, were it not for T-SQL's ban on bitwise operators having two binary operands.
For example .NET uses array of integers as an internal storage for their BitArray class.
Practically there's no other way around.
That being said, in SQL you will need more than one column (or use the BLOBS) to store all the states.
You tagged this question SQL, so I think you need to consult with the documentation for your database to find the size of an integer. Then subtract one bit for the sign, just to be safe.
Edit: Your comment says you're using MySQL. The documentation for MySQL 5.0 Numeric Types states that the maximum size of a NUMERIC is 64 or 65 digits. That's 212 bits for 64 digits.
Remember that your language of choice has to be able to work with those digits, so you may be limited to a 64-bit integer anyway.