I came across a weird issue while testing my open-source project MHTextSearch. At line 169 of MHTextIndex.m:
uint64_t maxLength = [indexedString maximumLengthOfBytesUsingEncoding:encoding];
// some code
for (...) {
    [indexedString getBytes:keyPtr
                  maxLength:maxLength
                 usedLength:&usedLength
                   encoding:encoding
                    options:NSStringEncodingConversionAllowLossy
                      range:subRange
             remainingRange:NULL];
    // some more code
}
Nothing, anywhere else, modifies maxLength. On the second iteration, maxLength is equal to 0, no matter what its previous value was. If I set a watchpoint on it, I see it change inside -[NSString getBytes:maxLength:usedLength:encoding:options:range:remainingRange:], at
0x100038d19: movq -0x30(%rbp), %rdx
0x100038d1d: movq %rdx, (%rsi)
0x100038d20: testq %rcx, %rcx << this instruction
The very weird thing is that this only happens on the x86_64 architecture, and it can be fixed by changing the code like this:
uint64_t maxLength = [indexedString maximumLengthOfBytesUsingEncoding:encoding];
uint64_t strLength = maxLength;
// some code
for (...) {
    [indexedString getBytes:keyPtr
                  maxLength:strLength
                 usedLength:&usedLength
                   encoding:encoding
                    options:NSStringEncodingConversionAllowLossy
                      range:subRange
             remainingRange:NULL];
    // some more code
}
With this code, maxLength still gets changed to 0 at the same instruction, but strLength stays consistent, so the effect is removed.
How come?
usedLength has the wrong type. It's declared uint32_t, but it should be declared NSUInteger, which is 32 bits on 32-bit architectures and 64 bits on 64-bit architectures. On x86_64 the method therefore stores 8 bytes through the pointer (that is exactly the movq %rdx, (%rsi) you caught with the watchpoint), and the 4 bytes that don't fit in a uint32_t spill into the adjacent stack slot, which in your frame happens to hold maxLength.
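Here is a minimal C sketch of the mechanism, with a hypothetical struct standing in for the stack layout (the real placement of locals is up to the compiler):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the callee: like -getBytes:...usedLength:, it writes a
   full 64-bit value through the pointer it is given. */
static void report_used(uint64_t *usedLength)
{
    *usedLength = 12;   /* an 8-byte store, like the movq above */
}

int main(void)
{
    /* Hypothetical layout: the two locals happen to sit next to each other. */
    struct {
        uint32_t usedLength;   /* too small: the callee expects 8 bytes here */
        uint32_t maxLength;    /* receives the spilled high 4 bytes */
    } frame = { 0, 0xFFFF };

    report_used((uint64_t *)&frame.usedLength);       /* undefined behavior */
    printf("maxLength is now %u\n", frame.maxLength); /* prints 0, not 65535 */
    return 0;
}

Declaring usedLength as NSUInteger makes the pointee 8 bytes on 64-bit, so the store fits and nothing is clobbered.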
I have a trivial program:
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char sname[] = "xxx";
    sem_t *pSemaphor;

    if ((pSemaphor = sem_open(sname, O_CREAT, 0644, 0)) == SEM_FAILED) {
        perror("semaphore initialization");
        exit(1);
    }
    sem_unlink(sname);
    sem_close(pSemaphor);
}
When I run it under valgrind, I get the following error:
==12702== Syscall param write(buf) points to uninitialised byte(s)
==12702== at 0x4E457A0: __write_nocancel (syscall-template.S:81)
==12702== by 0x4E446FC: sem_open (sem_open.c:245)
==12702== by 0x4007D0: main (test.cpp:15)
==12702== Address 0xfff00023c is on thread 1's stack
==12702== in frame #1, created by sem_open (sem_open.c:139)
The code was extracted from a bigger project where it ran successfully for years, but now it is causing a segmentation fault.
The valgrind error from my example is the same as seen in the bigger project, but there it causes a crash, which my small example doesn't.
I see this with glibc 2.27-5 on Debian. In my case I only open the semaphores right at the start of a long-running program and it seems harmless so far - just annoying.
Looking at the code for sem_open.c which is available at:
https://code.woboq.org/userspace/glibc/nptl/sem_open.c.html
It seems that valgrind is complaining about the line (270 as I look now):
if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem.initsem, sizeof (sem_t)))
== sizeof (sem_t)
However, sem.initsem is properly initialised earlier, in a fairly baroque manner: first by explicitly setting fields of sem.newsem (part of the same union), and then, once that is done, by a call to memset (L226-228):
/* Initialize the remaining bytes as well. */
memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0',
sizeof (sem_t) - sizeof (struct new_sem));
I think these particular shenanigans are all quite optimal, but we need to make sure that all of the fields of new_sem have actually been initialised. We find the definition in https://code.woboq.org/userspace/glibc/sysdeps/nptl/internaltypes.h.html and it is this wonderful creation:
struct new_sem
{
#if __HAVE_64B_ATOMICS
  /* The data field holds both value (in the least-significant 32 bits) and
     nwaiters.  */
# if __BYTE_ORDER == __LITTLE_ENDIAN
#  define SEM_VALUE_OFFSET 0
# elif __BYTE_ORDER == __BIG_ENDIAN
#  define SEM_VALUE_OFFSET 1
# else
#  error Unsupported byte order.
# endif
# define SEM_NWAITERS_SHIFT 32
# define SEM_VALUE_MASK (~(unsigned int)0)
  uint64_t data;
  int private;
  int pad;
#else
# define SEM_VALUE_SHIFT 1
# define SEM_NWAITERS_MASK ((unsigned int)1)
  unsigned int value;
  int private;
  int pad;
  unsigned int nwaiters;
#endif
};
So if we __HAVE_64B_ATOMICS, the structure has a single data field that contains both the value and nwaiters; otherwise these are separate fields.
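To illustrate that packing, here is a minimal C sketch using the constants quoted above (just a demonstration of the layout, not glibc's actual accessor code):

#include <stdint.h>
#include <stdio.h>

#define SEM_NWAITERS_SHIFT 32
#define SEM_VALUE_MASK (~(unsigned int)0)

int main(void)
{
    uint64_t data = 5;                          /* value = 5, nwaiters = 0 */
    data += (uint64_t)1 << SEM_NWAITERS_SHIFT;  /* one waiter arrives */

    printf("value = %u, nwaiters = %u\n",
           (unsigned int)(data & SEM_VALUE_MASK),
           (unsigned int)(data >> SEM_NWAITERS_SHIFT));
    /* prints: value = 5, nwaiters = 1 */
    return 0;
}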
In the initialisation of sem.newsem we can see that these are initialised correctly, as follows:
#if __HAVE_64B_ATOMICS
  sem.newsem.data = value;
#else
  sem.newsem.value = value << SEM_VALUE_SHIFT;
  sem.newsem.nwaiters = 0;
#endif

  /* pad is used as a mutex on pre-v9 sparc and ignored otherwise.  */
  sem.newsem.pad = 0;

  /* This always is a shared semaphore.  */
  sem.newsem.private = FUTEX_SHARED;
I'm doing all of this on a 64-bit system, so I think valgrind is complaining about the initialisation of the 64-bit sem.newsem.data with a 32-bit value, since from:
value = va_arg (ap, unsigned int);
We can see that value is defined simply as an unsigned int, which is usually still 32 bits even on a 64-bit system (see What should be the sizeof(int) on a 64-bit machine?). But that should just be an implicit conversion to 64 bits when it is assigned, leaving no byte of data uninitialised.
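As a sanity check, here is a minimal C sketch of that conversion (hypothetical values): the assignment zero-extends, so all 64 bits of the destination are defined.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    unsigned int value = 42;   /* 32 bits on typical 64-bit platforms */
    uint64_t data = value;     /* implicit zero-extension: all 64 bits defined */

    printf("data = %llu\n", (unsigned long long)data);   /* data = 42 */
    return 0;
}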
So I think this is not a bug - just valgrind getting confused.
- (void)InitWithPwd:(char *)pPwd
{
    char szResult[17];

    // generate the MD5 checksum
    CC_MD5(pPwd, strlen(pPwd), &szResult[0]);
    szResult[16] = 0;
    m_csPasswordHash[0] = 0;
    for (int i = 0; i < 16; i++)
    {
        char sz[3] = {'\0'};
        // crash in the row below. The first pass is ok. The third pass crashes.
        // I can't understand it.
        sprintf(&sz[0], "%2.2x", szResult[i]);
        strcat(m_csPasswordHash, sz);
    }
    m_csPasswordHash[32] = 0;
    printf("pass:%s\n", m_csPasswordHash);
    m_ucPacketType = 1;
}
I want to get the MD5 of the password, but the above code crashes again and again and I can't understand why.
Your buffer (sz) is too small, causing sprintf() to generate a buffer overflow which leads to undefined behavior, in your case a crash.
Note that szResult[i] might be a negative value when viewed as an int (which is what a char-type value is promoted to when passed to sprintf()), and that can cause sprintf() to disregard your field width and precision directives in order to format the full value.
Here is an example showing this problem. The example code is written in C, but that shouldn't matter for this case.
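A minimal sketch, assuming char is signed (as it is on x86):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char c = (char)0xA1;   /* a byte that is negative when char is signed */
    char buf[16];          /* deliberately large, so the output can be observed safely */

    sprintf(buf, "%2.2x", c);   /* c is promoted to int, then read as unsigned */
    printf("\"%s\" (%zu chars)\n", buf, strlen(buf));
    /* prints: "ffffffa1" (8 chars) - far more than the 2 that sz[3] can hold */
    return 0;
}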
This solves the problem by making sure the incoming data is considered unsigned:
sprintf(sz, "%02x", (unsigned char) szResult[i]);
My Cocoa app runs perfectly fine when I start it from within Xcode. However, when I archive/release it, that version crashes. The error reporter that pops open says:
[...]
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Application Specific Information:
[9082] stack overflow
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff944a7212 __pthread_kill + 10
1 libsystem_c.dylib 0x00007fff9290caf4 pthread_kill + 90
2 libsystem_c.dylib 0x00007fff92950e9e __abort + 159
3 libsystem_c.dylib 0x00007fff92951d17 __stack_chk_fail + 195
[...]
While it also gives me the line of code where execution stopped, this really does not help me, as the exact same code path is executed in debug mode (but succeeds).
So I'm wondering: might it actually be that the stack sizes are different for a release and a debug version? And how big is the stack after all on a Mac (64 bit / Mountain Lion)? I'm not aware of putting insanely much data on the stack...
If I have too much data on the stack, what patterns would I need to avoid to reduce my stack load?
[Update]
OK, I got my app running by adding the -fno-stack-protector flag. (BTW: I'm compiling with LLVM.)
Before that, I stepped through the crashing code line by line and found the following behaviour, which I don't understand:
The method foo(x) calls bacon(x). x is 8 and is handed from foo to bacon without modification. Yet when I step into bacon(x), x suddenly is 4295939448 (each time). If I set -fno-stack-protector, the value is correct.
To my naive eyes, this looks as if the stack protector sets the magic value 4295939448 somewhere in the stack and makes it read-only. And while my functions put their parameters on the stack, at some point parameter x happens to be placed at that magic address and thus cannot be written (subsequent parameters seem to be written correctly). In my case x is a buffer-length parameter, which naturally leads to a buffer overflow and a crash.
Does somebody have a deeper understanding of the stack-protector? Why is this happening? And in what cases is it safe and legal to disable the stack-protector and in what cases is it dangerous?
[Update 2: Original Code]
This method calls the other Decrypt below. stIVLen at that point is 8
BOOL CEracomSE::Decrypt(
    PBYTE pMsg, size_t stLen,
    const CSK* pKey /* = NULL */,
    PBYTE pIV /* = NULL */, size_t stIVLen /* = 0 */,
    FBM fbm /* = FBM_CBC */,
    PADDING padding /* = NO_PADDING */
    )
{
    // stIVLen == 8
    return Decrypt( (uint64_t)0, pMsg, stLen, pKey, pIV, stIVLen, fbm, padding );
}
When this Decrypt is called, stIVLen is 4295939448; the other parameters are still correct.
BOOL CEracomSE::Decrypt(
    uint64_t qwOffset,
    PBYTE pMsg, size_t stLen,
    const CSK* pKey /* = NULL */,
    PBYTE pIV /* = NULL */, size_t stIVLen /* = 0 */,
    FBM fbm /* = FBM_CBC */,
    PADDING padding /* = NO_PADDING */
    )
{
    // stIVLen now is 4295939448
    BYTE a_iv[16] = {0};
    size_t a_iv_len;
    BYTE a_key[32] = {0};
    size_t a_key_len = 0;
    size_t nBytes;
    size_t nDataOffset;
    size_t nRemainingData = stLen;
    bool ret;
    //[...]
}
I recently faced this situation with my app. I know it's an old thread, but I'm responding anyway in the hope that someone else can benefit from the findings.
Passing -fno-stack-protector did solve the problem, as suggested in the question. However, digging a little deeper, we found that changing all occurrences of the array-literal syntax to the longer explicit form also solved the problem, without the compiler flag.
so changing all occurrences of
@[@(1), @(2)]
to
[NSArray arrayWithObjects:@(1), @(2), nil]
Maybe it's specific to our app only, but I hope it helps others as well.
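For context on why the crash lands in __stack_chk_fail: the stack protector places a canary value between a function's locals and its saved return address, and checks it in the epilogue. Here is a conceptual C sketch of what the instrumentation amounts to (hypothetical code with a made-up guard value; the real checks are emitted by the compiler, not written by hand):

/* Conceptual sketch of -fstack-protector instrumentation. */
static unsigned long guard_value = 0x2f3a91c4UL;   /* hypothetical; the real guard is randomised */

void protected_function(const char *input)
{
    unsigned long canary = guard_value;   /* placed just above the locals */
    char buffer[64];

    /* ... function body; an overflow of buffer would overwrite canary ... */
    (void)input;
    (void)buffer;

    if (canary != guard_value)
        __builtin_trap();   /* real code calls __stack_chk_fail(), which aborts */
}

int main(void)
{
    protected_function("hello");
    return 0;
}

Note that disabling the protector with -fno-stack-protector only removes the check; it does not fix whatever overwrote the canary in the first place.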
I have a very useful macro defined in .gdbinit
define rc
call (int)[$arg0 retainCount]
end
Is there any way to define the same macro for lldb?
You can do that with the following command definition in lldb:
command regex rc 's/(.+)/print (int)[%1 retainCount]/'
Example:
(lldb) rc indexPath
print (int)[indexPath retainCount]
(int) $2 = 2
You can put that into ~/.lldbinit (and restart Xcode).
One would think that something like
command alias rc print (int)[%1 retainCount]
should work, but as explained in "I can't get this simple LLDB alias to work", the %1 expansion does not work with expression, and command regex is the workaround.
Incidentally, on architectures where function arguments are passed in registers (x86_64, armv7), lldb defines a series of register aliases that map to the registers used to pass integral values: arg1, arg2, etc. For instance,
#include <stdio.h>

int main ()
{
    char *mytext = "hello world\n";
    puts (mytext);
    return 0;
}
and we can easily see what argument is passed in to puts without having to remember the ABI conventions,
4 char *mytext = "hello world\n";
-> 5 puts (mytext);
6 return 0;
7 }
(lldb) p mytext
(char *) $0 = 0x0000000100000f54 "hello world\n"
(lldb) br se -n puts
Breakpoint created: 2: name = 'puts', locations = 1, resolved = 1
(lldb) c
Process 2325 resuming
Process 2325 stopped
libsystem_c.dylib`puts:
-> 0x7fff99ce1d9a: pushq %rbp
0x7fff99ce1d9b: movq %rsp, %rbp
0x7fff99ce1d9e: pushq %rbx
0x7fff99ce1d9f: subq $56, %rsp
(lldb) p/x $arg1
(unsigned long) $2 = 0x0000000100000f54
(lldb)
x86_64 and armv7 both pass the first "few" integral values in registers - beyond that, they can be stored on the stack or in other places, and these aliases don't work. lldb doesn't currently provide similar convenience aliases for floating-point arguments. But for the most common cases, this covers what people need.
For this example I am working with Objective-C, but answers from the broader C/C++ community are welcome.
@interface BSWidget : NSObject {
    float tre[3];
}
@property(assign) float* tre;
- (void)assignToTre:(float*)triplet {
    tre[0] = triplet[0];
    tre[1] = triplet[1];
    tre[2] = triplet[2];
}
- (void)copyToTre:(float*)triplet {
    memcpy(tre, triplet, sizeof(tre));
}
So between these two approaches, and considering that these setter functions will generally only handle dimensions of 2, 3, or 4...
What would be the most efficient approach for this situation?
Will gcc generally reduce these to the same basic operations?
Thanks.
A quick test seems to show that the compiler, when optimising, replaces the memcpy call with the instructions to perform the assignment.
Disassembling the following code, compiled unoptimised and with -O2, shows that in the optimised case the testMemcpy function does not contain a call to memcpy.
#include <stdlib.h>
#include <string.h>

/* An 8-byte struct, matching the sizeof seen in the disassembly below. */
struct test {
    int a;
    char b;
};

struct test src = { .a = 1, .b = 'x' };

void testMemcpy(void)
{
    struct test *dest = malloc(sizeof(struct test));
    memcpy(dest, &src, sizeof(struct test));
}

void testAssign(void)
{
    struct test *dest = malloc(sizeof(struct test));
    *dest = src;
}
Unoptimised testMemcpy, with a memcpy call as expected
(gdb) disassemble testMemcpy
Dump of assembler code for function testMemcpy:
0x08048414 <+0>: push %ebp
0x08048415 <+1>: mov %esp,%ebp
0x08048417 <+3>: sub $0x28,%esp
0x0804841a <+6>: movl $0x8,(%esp)
0x08048421 <+13>: call 0x8048350 <malloc@plt>
0x08048426 <+18>: mov %eax,-0xc(%ebp)
0x08048429 <+21>: movl $0x8,0x8(%esp)
0x08048431 <+29>: movl $0x804a018,0x4(%esp)
0x08048439 <+37>: mov -0xc(%ebp),%eax
0x0804843c <+40>: mov %eax,(%esp)
0x0804843f <+43>: call 0x8048340 <memcpy@plt>
0x08048444 <+48>: leave
0x08048445 <+49>: ret
Optimised testAssign
(gdb) disassemble testAssign
Dump of assembler code for function testAssign:
0x080483f0 <+0>: push %ebp
0x080483f1 <+1>: mov %esp,%ebp
0x080483f3 <+3>: sub $0x18,%esp
0x080483f6 <+6>: movl $0x8,(%esp)
0x080483fd <+13>: call 0x804831c <malloc@plt>
0x08048402 <+18>: mov 0x804a014,%edx
0x08048408 <+24>: mov 0x804a018,%ecx
0x0804840e <+30>: mov %edx,(%eax)
0x08048410 <+32>: mov %ecx,0x4(%eax)
0x08048413 <+35>: leave
0x08048414 <+36>: ret
Optimised testMemcpy does not contain a memcpy call
(gdb) disassemble testMemcpy
Dump of assembler code for function testMemcpy:
0x08048420 <+0>: push %ebp
0x08048421 <+1>: mov %esp,%ebp
0x08048423 <+3>: sub $0x18,%esp
0x08048426 <+6>: movl $0x8,(%esp)
0x0804842d <+13>: call 0x804831c <malloc@plt>
0x08048432 <+18>: mov 0x804a014,%edx
0x08048438 <+24>: mov 0x804a018,%ecx
0x0804843e <+30>: mov %edx,(%eax)
0x08048440 <+32>: mov %ecx,0x4(%eax)
0x08048443 <+35>: leave
0x08048444 <+36>: ret
Speaking from a C background, I recommend using direct assignment. That version of the code makes your intent more obvious, and it is less error-prone if the array changes in the future and gains extra elements that your function doesn't need to copy, as the sketch below shows.
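Here is a minimal C sketch of that failure mode, using a hypothetical BSWidgetState struct whose array has grown from three elements to four:

#include <string.h>

typedef struct {
    float tre[4];   /* grew from 3 to 4 elements */
} BSWidgetState;

void assignToTre(BSWidgetState *w, const float *triplet)
{
    w->tre[0] = triplet[0];   /* intent is explicit: exactly three copies */
    w->tre[1] = triplet[1];
    w->tre[2] = triplet[2];
}

void copyToTre(BSWidgetState *w, const float *triplet)
{
    /* sizeof(w->tre) is now 4 floats, so this silently reads one element
       past the caller's three-element buffer. */
    memcpy(w->tre, triplet, sizeof(w->tre));
}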
The two are not strictly equivalent. memcpy is typically implemented as a loop that copies the data in fixed-size chunks (that may be smaller than a float), so the compiler probably won't generate the same code for the memcpy case. The only way to know for sure is to build it both ways and look at the emitted assembly in a debugger.
Even if the memcpy call is inlined, it will probably result in more code and slower execution time. The direct assignment case should be more efficient (unless your target platform requires special code to handle float datatypes). This is only an educated guess, however; the only way to know for sure is to try it both ways and profile the code.
memcpy:

1. Do the function prolog.
2. Initialize the counter and pointers.
3. Check whether there are bytes left to copy.
4. Copy memory.
5. Increment one pointer.
6. Increment the other pointer.
7. Increment the counter.
8. Repeat steps 3-7 another 3 or 11 times.
9. Do the function epilog.

Direct assignment:

1. Copy memory.
2. Copy memory.
3. Copy memory.

As you can see, direct assignment is much faster.