What does having two asterisk ** in Objective-C mean? - objective-c

I understand having one asterisk * is a pointer, what does having two ** mean?
I stumble upon this from the documentation:
- (NSAppleEventDescriptor *)executeAndReturnError:(NSDictionary **)errorInfo

It's a pointer to a pointer, just like in C (which, despite its strange square-bracket syntax, Objective-C is based on):
char c;
char *pc = &c;
char **ppc = &pc;
char ***pppc = &ppc;
and so on, ad infinitum (or until you run out of variable space).
It's often used to pass a pointer to a function that must be able to change the pointer itself (such as re-allocating memory for a variable-sized object).
=====
Following your request for a sample that shows how to use it, here's some code I wrote for another post which illustrates it. It's an appendStr() function which manages its own allocations (you still have to free the final version). Initially you set the string (char *) to NULL and the function itself will allocate space as needed.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void appendToStr (int *sz, char **str, char *app) {
char *newstr;
int reqsz;
/* If no string yet, create it with a bit of space. */
if (*str == NULL) {
*sz = strlen (app) + 10;
if ((*str = malloc (*sz)) == NULL) {
*sz = 0;
return;
}
strcpy (*str, app);
return;
}
/* If not enough room in string, expand it. We could use realloc
but I've kept it as malloc/cpy/free to ensure the address
changes (for the program output). */
reqsz = strlen (*str) + strlen (app) + 1;
if (reqsz > *sz) {
*sz = reqsz + 10;
if ((newstr = malloc (*sz)) == NULL) {
free (*str);
*str = NULL;
*sz = 0;
return;
}
strcpy (newstr, *str);
free (*str);
*str = newstr;
}
/* Append the desired string to the (now) long-enough buffer. */
strcat (*str, app);
}
static void dump(int sz, char *x) {
if (x == NULL)
printf ("%8p [%2d] %3d [%s]\n", x, sz, 0, "");
else
printf ("%8p [%2d] %3d [%s]\n", x, sz, strlen (x), x);
}
static char *arr[] = {"Hello.", " My", " name", " is", " Pax",
" and"," I", " am", " old."};
int main (void) {
int i;
char *x = NULL;
int sz = 0;
printf (" Pointer Size Len Value\n");
printf (" ------- ---- --- -----\n");
dump (sz, x);
for (i = 0; i < sizeof (arr) / sizeof (arr[0]); i++) {
appendToStr (&sz, &x, arr[i]);
dump (sz, x);
}
}
The code outputs the following. You can see how the pointer changes when the currently allocated memory runs out of space for the expanded string (at the comments):
Pointer Size Len Value
------- ---- --- -----
# NULL pointer here since we've not yet put anything in.
0x0 [ 0] 0 []
# The first time we put in something, we allocate space (+10 chars).
0x6701b8 [16] 6 [Hello.]
0x6701b8 [16] 9 [Hello. My]
0x6701b8 [16] 14 [Hello. My name]
# Adding " is" takes length to 17 so we need more space.
0x6701d0 [28] 17 [Hello. My name is]
0x6701d0 [28] 21 [Hello. My name is Pax]
0x6701d0 [28] 25 [Hello. My name is Pax and]
0x6701d0 [28] 27 [Hello. My name is Pax and I]
# Ditto for adding " am".
0x6701f0 [41] 30 [Hello. My name is Pax and I am]
0x6701f0 [41] 35 [Hello. My name is Pax and I am old.]
In that case, you pass in **str since you need to be able to change the *str value.
=====
Or the following, which does an unrolled bubble sort (oh, the shame!) on strings that aren't in an array. It does this by directly exchanging the addresses of the strings.
#include <stdio.h>
static void sort (char **s1, char **s2, char **s3, char **s4, char **s5) {
char *t;
if (strcmp (*s1, *s2) > 0) { t = *s1; *s1 = *s2; *s2 = t; }
if (strcmp (*s2, *s3) > 0) { t = *s2; *s2 = *s3; *s3 = t; }
if (strcmp (*s3, *s4) > 0) { t = *s3; *s3 = *s4; *s4 = t; }
if (strcmp (*s4, *s5) > 0) { t = *s4; *s4 = *s5; *s5 = t; }
if (strcmp (*s1, *s2) > 0) { t = *s1; *s1 = *s2; *s2 = t; }
if (strcmp (*s2, *s3) > 0) { t = *s2; *s2 = *s3; *s3 = t; }
if (strcmp (*s3, *s4) > 0) { t = *s3; *s3 = *s4; *s4 = t; }
if (strcmp (*s1, *s2) > 0) { t = *s1; *s1 = *s2; *s2 = t; }
if (strcmp (*s2, *s3) > 0) { t = *s2; *s2 = *s3; *s3 = t; }
if (strcmp (*s1, *s2) > 0) { t = *s1; *s1 = *s2; *s2 = t; }
}
int main (int argCount, char *argVar[]) {
char *a = "77";
char *b = "55";
char *c = "99";
char *d = "88";
char *e = "66";
printf ("Unsorted: [%s] [%s] [%s] [%s] [%s]\n", a, b, c, d, e);
sort (&a,&b,&c,&d,&e);
printf (" Sorted: [%s] [%s] [%s] [%s] [%s]\n", a, b, c, d, e);
return 0;
}
which produces:
Unsorted: [77] [55] [99] [88] [66]
Sorted: [55] [66] [77] [88] [99]
Never mind the implementation of sort, just notice that the variables are passed as char ** so that they can be swapped easily. Any real sort would probably be acting on a true array of data rather than individual variables but that's not the point of the example.

Pointer to Pointer
The definition of "pointer" says that it's a special variable that stores the address of another variable (not the value). That other variable can very well be a pointer. This means that it's perfectly legal for a pointer to be pointing to another pointer.
Let's suppose we have a pointer p1 that points to yet another pointer p2 that points to a character c. In memory, the three variables can be visualized as :
So we can see that in memory, pointer p1 holds the address of pointer p2. Pointer p2 holds the address of character c.
So p2 is pointer to character c, while p1 is pointer to p2. Or we can also say that p2 is a pointer to a pointer to character c.
Now, in code p2 can be declared as :
char *p2 = &c;
But p1 is declared as :
char **p1 = &p2;
So we see that p1 is a double pointer (i.e. pointer to a pointer to a character) and hence the two *s in declaration.
Now,
p1 is the address of p2 i.e. 5000
*p1 is the value held by p2 i.e. 8000
**p1 is the value at 8000 i.e. c
I think that should pretty much clear the concept, lets take a small example :
Source: http://www.thegeekstuff.com/2012/01/advanced-c-pointers/
For some of its use cases:
This is usually used to pass a pointer to a function that must be able to change the pointer itself, some of its use cases are:
Such as handling errors, it allows the receiving method to control what the pointer is referencing to. See this question
For creating an opaque struct i.e. so that others won't be able to allocate space. See this question
In case of memory expansion mentioned in the other answers of this question.
feel free to edit/improve this answer as I am learning:]

A pointer to a pointer.

In C pointers and arrays can be treated the same, meaning e.g. char* is a string (array of chars). If you want to pass an array of arrays (e.g. many strings) to a function you can use char**.

(reference: more iOS 6 development)
In Objective-C methods, arguments, including object pointers, are
passed by value, which means that the called method gets its own copy
of the pointer that was passed in. So if the called method wants to
change the pointer, as opposed to the data the pointer points to, you
need another level of indirection. Thus, the pointer to the pointer.

Related

Determine types from a variadic function's arguments in C

I'd like a step by step explanation on how to parse the arguments of a variadic function
so that when calling va_arg(ap, TYPE); I pass the correct data TYPE of the argument being passed.
Currently I'm trying to code printf.
I am only looking for an explanation preferably with simple examples but not the solution to printf since I want to solve it myself.
Here are three examples which look like what I am looking for:
https://stackoverflow.com/a/1689228/3206885
https://stackoverflow.com/a/5551632/3206885
https://stackoverflow.com/a/1722238/3206885
I know the basics of what typedef, struct, enum and union do but can't figure out some practical application cases like the examples in the links.
What do they really mean? I can't wrap my brain around how they work.
How can I pass the data type from a union to va_arg like in the links examples? How does it match?
with a modifier like %d, %i ... or the data type of a parameter?
Here's what I've got so far:
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include "my.h"
typedef struct s_flist
{
char c;
(*f)();
} t_flist;
int my_printf(char *format, ...)
{
va_list ap;
int i;
int j;
int result;
int arg_count;
char *cur_arg = format;
char *types;
t_flist flist[] =
{
{ 's', &my_putstr },
{ 'i', &my_put_nbr },
{ 'd', &my_put_nbr }
};
i = 0;
result = 0;
types = (char*)malloc( sizeof(*format) * (my_strlen(format) / 2 + 1) );
fparser(types, format);
arg_count = my_strlen(types);
while (format[i])
{
if (format[i] == '%' && format[i + 1])
{
i++;
if (format[i] == '%')
result += my_putchar(format[i]);
else
{
j = 0;
va_start(ap, format);
while (flist[j].c)
{
if (format[i] == flist[j].c)
result += flist[i].f(va_arg(ap, flist[i].DATA_TYPE??));
j++;
}
}
}
result += my_putchar(format[i]);
i++;
}
va_end(ap);
return (result);
}
char *fparser(char *types, char *str)
{
int i;
int j;
i = 0;
j = 0;
while (str[i])
{
if (str[i] == '%' && str[i + 1] &&
str[i + 1] != '%' && str[i + 1] != ' ')
{
i++;
types[j] = str[i];
j++;
}
i++;
}
types[j] = '\0';
return (types);
}
You can't get actual type information from va_list. You can get what you're looking for from format. What it seems you're not expecting is: none of the arguments know what the actual types are, but format represents the caller's idea of what the types should be. (Perhaps a further hint: what would the actual printf do if a caller gave it format specifiers that didn't match the varargs passed in? Would it notice?)
Your code would have to parse the format string for "%" format specifiers, and use those specifiers to branch into reading the va_list with specific hardcoded types. For example, (pseudocode) if (fspec was "%s") { char* str = va_arg(ap, char*); print out str; }. Not giving more detail because you explicitly said you didn't want a complete solution.
You will never have a type as a piece of runtime data that you can pass to va_arg as a value. The second argument to va_arg must be a literal, hardcoded specification referring to a known type at compile time. (Note that va_arg is a macro that gets expanded at compile time, not a function that gets executed at runtime - you couldn't have a function taking a type as an argument.)
A couple of your links suggest keeping track of types via an enum, but this is only for the benefit of your own code being able to branch based on that information; it is still not something that can be passed to va_arg. You have to have separate pieces of code saying literally va_arg(ap, int) and va_arg(ap, char*) so there's no way to avoid a switch or a chain of ifs.
The solution you want to make, using the unions and structs, would start from something like this:
typedef union {
int i;
char *s;
} PRINTABLE_THING;
int print_integer(PRINTABLE_THING pt) {
// format and print pt.i
}
int print_string(PRINTABLE_THING pt) {
// format and print pt.s
}
The two specialized functions would work fine on their own by taking explicit int or char* params; the reason we make the union is to enable the functions to formally take the same type of parameter, so that they have the same signature, so that we can define a single type that means pointer to that kind of function:
typedef int (*print_printable_thing)(PRINTABLE_THING);
Now your code can have an array of function pointers of type print_printable_thing, or an array of structs that have print_printable_thing as one of the structs' fields:
typedef struct {
char format_char;
print_printable_thing printing_function;
} FORMAT_CHAR_AND_PRINTING_FUNCTION_PAIRING;
FORMAT_CHAR_AND_PRINTING_FUNCTION_PAIRING formatters[] = {
{ 'd', print_integer },
{ 's', print_string }
};
int formatter_count = sizeof(formatters) / sizeof(FORMAT_CHAR_AND_PRINTING_FUNCTION_PAIRING);
(Yes, the names are all intentionally super verbose. You'd probably want shorter ones in the real program, or even anonymous types where appropriate.)
Now you can use that array to select the correct formatter at runtime:
for (int i = 0; i < formatter_count; i++)
if (current_format_char == formatters[i].format_char)
result += formatters[i].printing_function(current_printable_thing);
But the process of getting the correct thing into current_printable_thing is still going to involve branching to get to a va_arg(ap, ...) with the correct hardcoded type. Once you've written it, you may find yourself deciding that you didn't actually need the union nor the array of structs.

Stange behavior with my C string reverse function

I'm just an amateur programmer...
And when reading, for the second time, and more than two years apart, kochan's "Programming in Objective-C", now the 6th ed., reaching the pointer chapter i tried to revive the old days when i started programming with C...
So, i tried to program a reverse C string function, using char pointers...
At the end i got the desired result, but... got also a very strange behavior, i cannot explain with my little programming experience...
First the code:
This is a .m file,
#import <Foundation/Foundation.h>
#import "*pathToFolder*/NSPrint.m"
int main(int argc, char const *argv[])
{
#autoreleasepool
{
char * reverseString(char * str);
char *ch;
if (argc < 2)
{
NSPrint(#"No word typed in the command line!");
return 1;
}
NSPrint(#"Reversing arguments:");
for (int i = 1; argv[i]; i++)
{
ch = reverseString(argv[i]);
printf("%s\n", ch);
//NSPrint(#"%s - %s", argv[i], ch);
}
}
return 0;
}
char * reverseString(char * str)
{
int size = 0;
for ( ; *(str + size) != '\0'; size++) ;
//printf("Size: %i\n", size);
char result[size + 1];
int i = 0;
for (size-- ; size >= 0; size--, i++)
{
result[i] = *(str + size);
//printf("%c, %c\n", result[i], *(str + size));
}
result[i] = '\0';
//printf("result location: %lu\n", result);
//printf("%s\n", result);
return result;
}
Second some notes:
This code is compiled in a MacBook Pro, with MAC OS X Maverick, with CLANG (clang -fobjc-arc $file_name -o $file_name_base)
That NSPrint is just a wrapper for printf to print a NSString constructed with stringWithFormat:arguments:
And third the strange behavior:
If I uncomment all those commented printf declarations, everything work just fine, i.e., all printf functions print what they have to print, including the last printf inside main function.
If I uncomment one, and just one, randomly chosen, of those comment printf functions, again everything work just fine, and I got the correct printf results, including the last printf inside main function.
If I leave all those commented printf functions as they are, I GOT ONLY BLANK LINES with the last printf inside main block, and one black line for each argument passed...
Worst, if I use that NSPrint function inside main, instead of the printf one, I get the desired result :!
Can anyone bring some light here please :)
You're returning a local array, that goes out of scope as the function exits. Dereferencing that memory causes undefined behavior.
You are returning a pointer to a local variable of the function that was called. When that function returns, the memory for the local variable becomes invalid, and the pointer returned is rubbish.

What does PKCS5_PBKDF2_HMAC_SHA1 return value mean?

I'm attempting to use OpenSSL's PKCS5_PBKDF2_HMAC_SHA1 method. I gather that it returns 0 if it succeeds, and some other value otherwise. My question is, what does a non-zero return value mean? Memory error? Usage error? How should my program handle it (retry, quit?)?
Edit: A corollary question is, is there any way to figure this out besides reverse-engineering the method itself?
is there any way to figure this out besides reverse-engineering the method itself?
PKCS5_PBKDF2_HMAC_SHA1 looks like one of those undocumented functions because I can't find it in the OpenSSL docs. OpenSSL has a lot of them, so you should be prepared to study the sources if you are going to use the library.
I gather that it returns 0 if it succeeds, and some other value otherwise.
Actually, its reversed. Here's how I know...
$ grep -R PKCS5_PBKDF2_HMAC_SHA1 *
crypto/evp/evp.h:int PKCS5_PBKDF2_HMAC_SHA1(const char *pass, int passlen,
crypto/evp/p5_crpt2.c:int PKCS5_PBKDF2_HMAC_SHA1(const char *pass, int passlen,
...
So, you find the function's implementation in crypto/evp/p5_crpt2.c:
int PKCS5_PBKDF2_HMAC_SHA1(const char *pass, int passlen,
const unsigned char *salt, int saltlen, int iter,
int keylen, unsigned char *out)
{
return PKCS5_PBKDF2_HMAC(pass, passlen, salt, saltlen, iter,
EVP_sha1(), keylen, out);
}
Following PKCS5_PBKDF2_HMAC:
$ grep -R PKCS5_PBKDF2_HMAC *
...
crypto/evp/evp.h:int PKCS5_PBKDF2_HMAC(const char *pass, int passlen,
crypto/evp/p5_crpt2.c:int PKCS5_PBKDF2_HMAC(const char *pass, int passlen,
...
And again, from crypto/evp/p5_crpt2.c:
int PKCS5_PBKDF2_HMAC(const char *pass, int passlen,
const unsigned char *salt, int saltlen, int iter,
const EVP_MD *digest,
int keylen, unsigned char *out)
{
unsigned char digtmp[EVP_MAX_MD_SIZE], *p, itmp[4];
int cplen, j, k, tkeylen, mdlen;
unsigned long i = 1;
HMAC_CTX hctx_tpl, hctx;
mdlen = EVP_MD_size(digest);
if (mdlen < 0)
return 0;
HMAC_CTX_init(&hctx_tpl);
p = out;
tkeylen = keylen;
if(!pass)
passlen = 0;
else if(passlen == -1)
passlen = strlen(pass);
if (!HMAC_Init_ex(&hctx_tpl, pass, passlen, digest, NULL))
{
HMAC_CTX_cleanup(&hctx_tpl);
return 0;
}
while(tkeylen)
{
if(tkeylen > mdlen)
cplen = mdlen;
else
cplen = tkeylen;
/* We are unlikely to ever use more than 256 blocks (5120 bits!)
* but just in case...
*/
itmp[0] = (unsigned char)((i >> 24) & 0xff);
itmp[1] = (unsigned char)((i >> 16) & 0xff);
itmp[2] = (unsigned char)((i >> 8) & 0xff);
itmp[3] = (unsigned char)(i & 0xff);
if (!HMAC_CTX_copy(&hctx, &hctx_tpl))
{
HMAC_CTX_cleanup(&hctx_tpl);
return 0;
}
if (!HMAC_Update(&hctx, salt, saltlen)
|| !HMAC_Update(&hctx, itmp, 4)
|| !HMAC_Final(&hctx, digtmp, NULL))
{
HMAC_CTX_cleanup(&hctx_tpl);
HMAC_CTX_cleanup(&hctx);
return 0;
}
HMAC_CTX_cleanup(&hctx);
memcpy(p, digtmp, cplen);
for(j = 1; j < iter; j++)
{
if (!HMAC_CTX_copy(&hctx, &hctx_tpl))
{
HMAC_CTX_cleanup(&hctx_tpl);
return 0;
}
if (!HMAC_Update(&hctx, digtmp, mdlen)
|| !HMAC_Final(&hctx, digtmp, NULL))
{
HMAC_CTX_cleanup(&hctx_tpl);
HMAC_CTX_cleanup(&hctx);
return 0;
}
HMAC_CTX_cleanup(&hctx);
for(k = 0; k < cplen; k++)
p[k] ^= digtmp[k];
}
tkeylen-= cplen;
i++;
p+= cplen;
}
HMAC_CTX_cleanup(&hctx_tpl);
return 1;
}
So it looks like 0 on failure, and 1 on success. You should not see other values. And if you get a 0, then all the OUT parameters are junk.
Memory error? Usage error?
Well, sometimes you can call ERR_get_error. If you call it and it makes sense, then the error code is good. If the error code makes no sense, then its probably not good.
Sadly, that's the way I handle it because the library is not consistent with setting error codes. For example, here's the library code to load the RDRAND engine.
Notice the code clears the error code on failure if its a 3rd generation Ivy Bridge (that's the capability being tested), and does not clear or set an error otherwise!!!
void ENGINE_load_rdrand (void)
{
extern unsigned int OPENSSL_ia32cap_P[];
if (OPENSSL_ia32cap_P[1] & (1<<(62-32)))
{
ENGINE *toadd = ENGINE_rdrand();
if(!toadd) return;
ENGINE_add(toadd);
ENGINE_free(toadd);
ERR_clear_error();
}
}
How should my program handle it (retry, quit?)?
It looks like a hard failure.
Finally, that's exactly how I navigate the sources in this situation. If you don't like grep you can try ctags or another source code browser.

Determine Position of Most Signifiacntly Set Bit in a Byte

I have a byte I am using to store bit flags. I need to compute the position of the most significant set bit in the byte.
Example Byte: 00101101 => 6 is the position of the most significant set bit
Compact Hex Mapping:
[0x00] => 0x00
[0x01] => 0x01
[0x02,0x03] => 0x02
[0x04,0x07] => 0x03
[0x08,0x0F] => 0x04
[0x10,0x1F] => 0x05
[0x20,0x3F] => 0x06
[0x40,0x7F] => 0x07
[0x80,0xFF] => 0x08
TestCase in C:
#include <stdio.h>
unsigned char check(unsigned char b) {
unsigned char c = 0x08;
unsigned char m = 0x80;
do {
if(m&b) { return c; }
else { c -= 0x01; }
} while(m>>=1);
return 0; //never reached
}
int main() {
unsigned char input[256] = {
0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,
0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1a,0x1b,0x1c,0x1d,0x1e,0x1f,
0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2a,0x2b,0x2c,0x2d,0x2e,0x2f,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3a,0x3b,0x3c,0x3d,0x3e,0x3f,
0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4a,0x4b,0x4c,0x4d,0x4e,0x4f,
0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5a,0x5b,0x5c,0x5d,0x5e,0x5f,
0x60,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6a,0x6b,0x6c,0x6d,0x6e,0x6f,
0x70,0x71,0x72,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x7b,0x7c,0x7d,0x7e,0x7f,
0x80,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x8a,0x8b,0x8c,0x8d,0x8e,0x8f,
0x90,0x91,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9a,0x9b,0x9c,0x9d,0x9e,0x9f,
0xa0,0xa1,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xaa,0xab,0xac,0xad,0xae,0xaf,
0xb0,0xb1,0xb2,0xb3,0xb4,0xb5,0xb6,0xb7,0xb8,0xb9,0xba,0xbb,0xbc,0xbd,0xbe,0xbf,
0xc0,0xc1,0xc2,0xc3,0xc4,0xc5,0xc6,0xc7,0xc8,0xc9,0xca,0xcb,0xcc,0xcd,0xce,0xcf,
0xd0,0xd1,0xd2,0xd3,0xd4,0xd5,0xd6,0xd7,0xd8,0xd9,0xda,0xdb,0xdc,0xdd,0xde,0xdf,
0xe0,0xe1,0xe2,0xe3,0xe4,0xe5,0xe6,0xe7,0xe8,0xe9,0xea,0xeb,0xec,0xed,0xee,0xef,
0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7,0xf8,0xf9,0xfa,0xfb,0xfc,0xfd,0xfe,0xff };
unsigned char truth[256] = {
0x00,0x01,0x02,0x02,0x03,0x03,0x03,0x03,0x04,0x04,0x04,0x04,0x04,0x04,0x04,0x04,
0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,
0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,
0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08};
int i,r;
int f = 0;
for(i=0; i<256; ++i) {
r=check(input[i]);
if(r !=(truth[i])) {
printf("failed %d : 0x%x : %d\n",i,0x000000FF & ((int)input[i]),r);
f += 1;
}
}
if(!f) { printf("passed all\n"); }
else { printf("failed %d\n",f); }
return 0;
}
I would like to simplify my check() function to not involve looping (or branching preferably). Is there a bit twiddling hack or hashed lookup table solution to compute the position of the most significant set bit in a byte?
Your question is about an efficient way to compute log2 of a value. And because you seem to want a solution that is not limited to the C language I have been slightly lazy and tweaked some C# code I have.
You want to compute log2(x) + 1 and for x = 0 (where log2 is undefined) you define the result as 0 (e.g. you create a special case where log2(0) = -1).
static readonly Byte[] multiplyDeBruijnBitPosition = new Byte[] {
7, 2, 3, 4,
6, 1, 5, 0
};
public static Byte Log2Plus1(Byte value) {
if (value == 0)
return 0;
var roundedValue = value;
roundedValue |= (Byte) (roundedValue >> 1);
roundedValue |= (Byte) (roundedValue >> 2);
roundedValue |= (Byte) (roundedValue >> 4);
var log2 = multiplyDeBruijnBitPosition[((Byte) (roundedValue*0xE3)) >> 5];
return (Byte) (log2 + 1);
}
This bit twiddling hack is taken from Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup where you can see the equivalent C source code for 32 bit values. This code has been adapted to work on 8 bit values.
However, you may be able to use an operation that gives you the result using a very efficient built-in function (on many CPU's a single instruction like the Bit Scan Reverse is used). An answer to the question Bit twiddling: which bit is set? has some information about this. A quote from the answer provides one possible reason why there is low level support for solving this problem:
Things like this are the core of many O(1) algorithms such as kernel schedulers which need to find the first non-empty queue signified by an array of bits.
That was a fun little challenge. I don't know if this one is completely portable since I only have VC++ to test with, and I certainly can't say for sure if it's more efficient than other approaches. This version was coded with a loop but it can be unrolled without too much effort.
static unsigned char check(unsigned char b)
{
unsigned char r = 8;
unsigned char sub = 1;
unsigned char s = 7;
for (char i = 0; i < 8; i++)
{
sub = sub & ((( b & (1 << s)) >> s--) - 1);
r -= sub;
}
return r;
}
I'm sure everyone else has long since moved on to other topics but there was something in the back of my mind suggesting that there had to be a more efficient branch-less solution to this than just unrolling the loop in my other posted solution. A quick trip to my copy of Warren put me on the right track: Binary search.
Here's my solution based on that idea:
Pseudo-code:
// see if there's a bit set in the upper half
if ((b >> 4) != 0)
{
offset = 4;
b >>= 4;
}
else
offset = 0;
// see if there's a bit set in the upper half of what's left
if ((b & 0x0C) != 0)
{
offset += 2;
b >>= 2;
}
// see if there's a bit set in the upper half of what's left
if > ((b & 0x02) != 0)
{
offset++;
b >>= 1;
}
return b + offset;
Branch-less C++ implementation:
static unsigned char check(unsigned char b)
{
unsigned char adj = 4 & ((((unsigned char) - (b >> 4) >> 7) ^ 1) - 1);
unsigned char offset = adj;
b >>= adj;
adj = 2 & (((((unsigned char) - (b & 0x0C)) >> 7) ^ 1) - 1);
offset += adj;
b >>= adj;
adj = 1 & (((((unsigned char) - (b & 0x02)) >> 7) ^ 1) - 1);
return (b >> adj) + offset + adj;
}
Yes, I know that this is all academic :)
It is not possible in plain C. The best I would suggest is the following implementation of check. Despite quite "ugly" I think it runs faster than the ckeck version in the question.
int check(unsigned char b)
{
if(b&128) return 8;
if(b&64) return 7;
if(b&32) return 6;
if(b&16) return 5;
if(b&8) return 4;
if(b&4) return 3;
if(b&2) return 2;
if(b&1) return 1;
return 0;
}
Edit: I found a link to the actual code: http://www.hackersdelight.org/hdcodetxt/nlz.c.txt
The algorithm below is named nlz8 in that file. You can choose your favorite hack.
/*
From last comment of: http://stackoverflow.com/a/671826/315052
> Hacker's Delight explains how to correct for the error in 32-bit floats
> in 5-3 Counting Leading 0's. Here's their code, which uses an anonymous
> union to overlap asFloat and asInt: k = k & ~(k >> 1); asFloat =
> (float)k + 0.5f; n = 158 - (asInt >> 23); (and yes, this relies on
> implementation-defined behavior) - Derrick Coetzee Jan 3 '12 at 8:35
*/
unsigned char check (unsigned char b) {
union {
float asFloat;
int asInt;
} u;
unsigned k = b & ~(b >> 1);
u.asFloat = (float)k + 0.5f;
return 32 - (158 - (u.asInt >> 23));
}
Edit -- not exactly sure what the asker means by language independent, but below is the equivalent code in python.
import ctypes
class Anon(ctypes.Union):
_fields_ = [
("asFloat", ctypes.c_float),
("asInt", ctypes.c_int)
]
def check(b):
k = int(b) & ~(int(b) >> 1)
a = Anon(asFloat=(float(k) + float(0.5)))
return 32 - (158 - (a.asInt >> 23))

how to change the input of this program? (C Language)

How can I replace the use of (FILE) and (fopen) with (scanf) to get the input values and send in these 2 functions?
I want to use this function in Objective-c code.
For more info you can see the whole code here link
static void stemfile(FILE * f)
{ while(TRUE)
{ int ch = getc(f);
if (ch == EOF) return;
if (LETTER(ch))
{ int i = 0;
while(TRUE)
{ if (i == i_max) increase_s();
ch = tolower(ch); /* forces lower case */
s[i] = ch; i++;
ch = getc(f);
if (!LETTER(ch)) { ungetc(ch,f); break; }
}
s[stem(s,0,i-1)+1] = 0;
/* the previous line calls the stemmer and uses its result to
zero-terminate the string in s */
printf("%s",s);
}
else putchar(ch);
}
}
int main(int argc, char * argv[])
{ int i;
s = (char *) malloc(i_max+1);
for (i = 1; i < argc; i++)
{ FILE * f = fopen(argv[i],"r");
if (f == 0) { fprintf(stderr,"File %s not found\n",argv[i]); exit(1); }
stemfile(f);
}
free(s);
return 0;
}
The scanf() function cannot be a direct replacement for the existing code. The existing code (which is not very well written IMO), splits up the input character stream into letters (defined by the LETTER() macro to be either uppercase or lowercase characters), and non-letters, and converts these letter sequences into lowercase before applying the stem() function to them.
The scanf() function, on the other hand extracts primitive types (int, char, double, etc.) and explicitly delimited strings from the input stream. The delimiters in the given code (i.e. anything that is not LETTER()) is too vague for scanf() (though not for a regular expression). scanf() needs a specific character on each end of a substring to look for. Also, scanf() cannot convert to lowercase automatically.
Assuming your input continues to be files, I think the easiest solution might be to leave the code as-is and use it, convoluted as it may be. There is nothing about it that shouldn't run as part of a larger Objective-C program. Objective-C, after all, still provides access to the C standard library, at least within the limits that the operating system sets (iOS is far more limiting than MacOS, if your are on an Apple platform).
The general problem here is that of tokenization: breaking an input sequence of unclassified symbols (like characters) into sequence of classified tokens (like words and spaces). A common approach to the problem is to use a finite state machine/automaton (FSA/FSM) to apply parsing logic to the input sequence and extract the tokens as they are encountered. An FSA can be a bit hard to set up, but it is very robust and general.
I'm still not sure why you would want to use scanf() in main(). It would presumably mean changing the interface of stemfile() (including the name since it would no longer be processing a file) to take a character string as input. And scanf() is going to make life difficult; it will read strings separated by blanks, which may be part of its attraction, but it will include any punctuation that is included in the 'word'.
As Randall noted, the code in the existing function is a little obsure; I think it could be written more simply as follows:
#include <stdio.h>
#include <ctype.h>
#define LETTER(x) isalpha(x)
extern int stem(char *s, int lo, int hi);
static void stemfile(FILE * f)
{
int ch;
while ((ch = getc(f)) != EOF)
{
if (LETTER(ch))
{
char s[1024];
int i = 0;
s[i++] = ch;
while ((ch = getc(f)) != EOF && LETTER(ch))
s[i++] = ch;
if (ch != EOF)
ungetc(ch, f);
s[i] = '\0';
s[stem(s, 0, i-1)+1] = 0;
/* the previous line calls the stemmer and uses its result to
zero-terminate the string in s */
printf("%s", s);
}
else
putchar(ch);
}
}
I've slightly simplified things by making s into a simple local variable (it appears to have been a global, as does imax), removing imax and the increase_s() function. Those are largely incidental to the operation of the function.
If you want this to process a (null-terminated) string instead, then:
static void stemstring(const char *src)
{
char ch;
while ((ch = *src++) != '\0')
{
if (LETTER(ch))
{
int i = 0;
char s[1024];
s[i++] = ch;
while ((ch = *src++) != '\0' && LETTER(ch))
s[i++] = ch;
if (ch != '\0')
src--;
s[i-1] = '\0';
s[stem(s,0,i-1)+1] = 0;
/* the previous line calls the stemmer and uses its result to
zero-terminate the string in s */
printf("%s",s);
}
else
putchar(ch);
}
}
This systematically changes getc(f) into *src++, EOF into \0, and ungetc() into src--. It also (safely) changes the type of ch from int (necessary for I/O) to char. If you are worried about buffer overflow, you have to work a bit harder in the function, but few words in practice will be even 1024 bytes (and you could use 4096 as easily as 1024, with correspondingly smaller - infinitesimal - chance of real data overflowing the buffer. You need to judge whether that is a 'real' risk for you.
The main program can become quite simply:
int main(void)
{
char string[1024];
while (scanf("%1023s", string) == 1)
stemstring(string);
return(0);
}
Clearly, because of the '1023' in the format, this will never overflow the inner buffer. (NB: Removed the . from "%.1023s" in first version of this answer; scanf() is not the same as printf()!).
Challenged: does this work?
Yes - this code below (adding a dummy stem() function and slightly modifying the printing) works reasonably well for me:
#include <stdio.h>
#include <ctype.h>
#include <assert.h>
#define LETTER(x) isalpha(x)
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
static int stem(const char *s, int begin, int end)
{
assert(s != 0);
return MAX(end - begin - 3, 3);
}
static void stemstring(const char *src)
{
char ch;
while ((ch = *src++) != '\0')
{
if (LETTER(ch))
{
int i = 0;
char s[1024];
s[i++] = ch;
while ((ch = *src++) != '\0' && LETTER(ch))
s[i++] = ch;
if (ch != '\0')
src--;
s[i-1] = '\0';
s[stem(s,0,i-1)+1] = 0;
/* the previous line calls the stemmer and uses its result to
zero-terminate the string in s */
printf("<<%s>>\n",s);
}
else
putchar(ch);
}
putchar('\n');
}
int main(void)
{
char string[1024];
while (scanf("%1023s", string) == 1)
stemstring(string);
return(0);
}
Example dialogue
H: assda23
C: <<assd>>
C: 23
H: 3423///asdrrrf12312
C: 3423///<<asdr>>
C: 12312
H: 12//as//12
C: 12//<<a>>
C: //12
The lines marked H: are human input (the H: was not part of the input); the lines marked C: are computer output.
Next attempt
The trouble with concentrating on grotesquely overlong words (1023-characters and more) is that you can overlook the simple. With scanf() reading data, you automatically get single 'words' with no spaces in them as input. Here's a debugged version of stemstring() with debugging printing code in place. The problem was two off-by-one errors. One was in the assignment s[i-1] = '\0'; where the -1 was not needed. The other was in the handling of the end of a string of letters; the while ((ch = *src++) != '\0') leftsrcone place too far, which led to interesting effects with short words entered after long words (when the difference in length was 2 or more). There's a fairly detailed trace of the test case I devised, using words such as 'great' and 'book' which you diagnosed (correctly) as being mishandled. Thestem()` function here simply prints its inputs and outputs, and returns the full length of the string (so there is no stemming occurring).
#include <stdio.h>
#include <ctype.h>
#include <assert.h>
#define LETTER(x) isalpha(x)
#define MAX(x, y) (((x) > (y)) ? (x) : (y))
static int stem(const char *s, int begin, int end)
{
int len = end - begin + 1;
assert(s != 0);
printf("ST (%d,%d) <<%*.*s>> RV %d\n", begin, end, len, len, s, len);
// return MAX(end - begin - 3, 3);
return len;
}
static void stemstring(const char *src)
{
char ch;
printf("-->> stemstring: <<%s>>\n", src);
while ((ch = *src++) != '\0')
{
if (ch != '\0')
printf("LP <<%c%s>>\n", ch, src);
if (LETTER(ch))
{
int i = 0;
char s[1024];
s[i++] = ch;
while ((ch = *src++) != '\0' && LETTER(ch))
s[i++] = ch;
src--;
s[i] = '\0';
printf("RD (%d) <<%s>>\n", i, s);
s[stem(s, 0, i-1)+1] = '\0';
/* the previous line calls the stemmer and uses its result to
zero-terminate the string in s */
printf("RS <<%s>>\n", s);
}
else
printf("NL <<%c>>\n", ch);
}
//putchar('\n');
printf("<<-- stemstring\n");
}
int main(void)
{
char string[1024];
while (scanf("%1023s", string) == 1)
stemstring(string);
return(0);
}
The debug-laden output is shown (the first line is the typed input; the rest is the output from the program):
what a great book this is! What.hast.thou.done?
-->> stemstring: <<what>>
LP <<what>>
RD (4) <<what>>
ST (0,3) <<what>> RV 4
RS <<what>>
<<-- stemstring
-->> stemstring: <<a>>
LP <<a>>
RD (1) <<a>>
ST (0,0) <<a>> RV 1
RS <<a>>
<<-- stemstring
-->> stemstring: <<great>>
LP <<great>>
RD (5) <<great>>
ST (0,4) <<great>> RV 5
RS <<great>>
<<-- stemstring
-->> stemstring: <<book>>
LP <<book>>
RD (4) <<book>>
ST (0,3) <<book>> RV 4
RS <<book>>
<<-- stemstring
-->> stemstring: <<this>>
LP <<this>>
RD (4) <<this>>
ST (0,3) <<this>> RV 4
RS <<this>>
<<-- stemstring
-->> stemstring: <<is!>>
LP <<is!>>
RD (2) <<is>>
ST (0,1) <<is>> RV 2
RS <<is>>
LP <<!>>
NL <<!>>
<<-- stemstring
-->> stemstring: <<What.hast.thou.done?>>
LP <<What.hast.thou.done?>>
RD (4) <<What>>
ST (0,3) <<What>> RV 4
RS <<What>>
LP <<.hast.thou.done?>>
NL <<.>>
LP <<hast.thou.done?>>
RD (4) <<hast>>
ST (0,3) <<hast>> RV 4
RS <<hast>>
LP <<.thou.done?>>
NL <<.>>
LP <<thou.done?>>
RD (4) <<thou>>
ST (0,3) <<thou>> RV 4
RS <<thou>>
LP <<.done?>>
NL <<.>>
LP <<done?>>
RD (4) <<done>>
ST (0,3) <<done>> RV 4
RS <<done>>
LP <<?>>
NL <<?>>
<<-- stemstring
The techniques shown - printing diagnostic information at key points in the program - is one way of debugging a program such as this. The alternative is stepping through the code with a source code debugger - gdb or its equivalent. I probably more often use print statements, but I'm an old fogey who finds IDE's too hard to use (because they don't behave like the command line I'm used to).
Granted, it isn't your code any more, but I do think you should have been able to do most of the debugging yourself. I'm grateful that you reported the trouble with my code. However, you also need to learn how to diagnose problems in other people's code; how to instrument it; how to characterize and locate the problems. You could then report the problem with precision - "you goofed with your end of word condition, and ...".