Valgrind(memcheck) not showing all contexts - valgrind

My last context/error I see in my valgrind output file is...
==3030== 1075 errors in context 61 of 540:
==3030== Syscall param ioctl(SIOCETHTOOL,ir) points to uninitialised byte(s)
==3030== at 0x7525248: ioctl (syscall-template.S:84)
==3030== by 0x686A2A7: ??? (in /lib/libpal.so)
==3030== Address 0x96cf958 is on thread 16's stack
==3030== Uninitialised value was created by a stack allocation
==3030== at 0x686A20C: ??? (in /lib/libpal.so)
...but I don't see error contexts 62 - 540. My first thought was maybe in closing the program, valgrind crashed, but after this context it printed the ERROR SUMMARY
ERROR SUMMARY: 9733 errors from 540 contexts (suppressed: 0 from 0)
I don't think it's because we came across a frame without debug info because I can see this exact same issue get hit the first time at the very beginning of my output file. Or maybe the printing of error contexts specifically, is halted when a stacktrace has missing debug info?
Any ideas? Need an additional command line argument for valgrind? I know in helgrind it'll quit after seeing 1000000 errors(something like that) but it explicitly tells you what it's doing.

So for my version of valgrind I also executed helgrind and saw all contexts(647) as expected. I think the problem above is simply a result of valgrind coming across a frame with no debug symbols and saying, "If there's no debug info, I'm moving on"
All of my logs I'm producing end with this same libpal frame at various context numbers 100-something, 200-something, etc.

Related

In u-boot compile, linker script's sdram start and length are set by CONFIG_SPL_BSS_START_ADDR and CONFIG_SPL_BSS_MAX_SIZE values, why?

I'm trying to build u-boot for our simple test board. (arm64)
After setting in include/configs/ab21m.h (our board),
#define CONFIG_SPL_BSS_START_ADDR 0x4f00000
#define CONFIG_SPL_BSS_MAX_SIZE SZ_32K
when I compile it, it gives me error while linking u-boot-spl. The error message is like this.
===================== WARNING ======================
This board does not use CONFIG_DM_ETH (Driver Model
for Ethernet drivers). Please update the board to use
CONFIG_DM_ETH before the v2020.07 release. Failure to
update by the deadline may result in board removal.
See doc/driver-model/migration.rst for more info.
====================================================
UPD include/generated/timestamp_autogenerated.h
CFGCHK u-boot.cfg
CC cmd/version.o
AR cmd/built-in.o
LD u-boot
CC spl/common/spl/spl.o
OBJCOPY u-boot.srec
OBJCOPY u-boot-nodtb.bin
SYM u-boot.sym
RELOC u-boot-nodtb.bin
COPY u-boot.bin
MKIMAGE u-boot.img
LD u-boot.elf
AR spl/common/spl/built-in.o
LD spl/u-boot-spl
aarch64-none-elf-ld.bfd: invalid length for memory region .sdram
make[1]: *** [scripts/Makefile.spl:509: spl/u-boot-spl] Error 1
make: *** [Makefile:1984: spl/u-boot-spl] Error 2
make: *** Waiting for unfinished jobs....
By the way, the linker script for spl starts like this after the build.
MEMORY { .sram : ORIGIN = 0x4000000,
LENGTH = (14*1024*1024) }
MEMORY { .sdram : ORIGIN = 0x4f00000,
LENGTH = SZ_32K }
OUTPUT_FORMAT("elf64-littleaarch64", "elf64-littleaarch64", "elf64-littleaarch64")
OUTPUT_ARCH(aarch64)
ENTRY(_start)
SECTIONS
{
This is strange because this 0x4f00000 and SZ_32K is what I gave for CONFIG_SPL_BSS_START_ADDR and CONFIG_SPL_BSS_MAX_SIZE. I placed this range in the on-chip RAM area some enough space above CONFIG_SPL_TEXT_BASE and below CONFIG_SPL_STACK with enough stack space. (I referenced imx8mm_evk board). What should I correct?
BTW, I found CONFIG_SYS_SDRAM_BASE, CONFIG_SYS_INIT_RAM_ADDR are all set to 0x40000000in imx8mm_evk, which is the DRAM start addrss. But CONFIG_SYS_INIT_RAM_SIZE is set to 0x200000 (2MB) where there is actually 3072MB DDR. Why is this value set to small size?
I asked two qeustions. Any help will be very much appreciated.
So, your first problem is a literal one. You have SZ_32K as the value for CONFIG_SPL_BSS_MAX_SIZE but since you're likely lacking #include <linux/sizes.h> in include/configs/ab21m.h you're not getting that constant evaluated.
As for what this is all doing, and why you should likely use something more like 2MB as seen on other platforms and place it in SDRAM rather than much smaller on-chip memory, if you look at arch/arm/cpu/armv8/u-boot-spl.lds you can see we're defining where the BSS should reside and that's likely larger than 32KB (and you'll get an overflow error when linking, if so).

valgrind give error when printing the second line to file

I'm using valgrind to find faults in my code. The command I use is
valgrind --leak-check=yes ./a.out
and I compile the code with -g code alone. I get many errors pointing to a single write line (The three printed values are initialized and well defined).
write (22,*) avlength, stdlength, avenergy
All with the Conditional jump or move depends on uninitialised value(s) error. The said line is the second line from a bunch of lines printing to a single file. At the end of the errors, I get two more, one pointing to the line opening the file
resStep = int(conf*100/iterate)
if (resStep.lt.10) then
write (resFile, "(A5,I1)") "res00",resStep
elseif (ResStep.lt.100) then
write (resFile, "(A4,I2)") "res0",resStep
else
write (resFile, "(A3,I1)") "res",resStep
endif
open (unit=22,file=trim(resFile),status='replace',
c action='write')
resStep is integer. The error is Syscall param write(buf) points to uninitialised byte(s). Finally, I get an error Address 0x52d83f4 is 212 bytes inside a block of size 8,344 alloc'd when I flush the file (before closing it).
I can't find any logic here. If the problem is with opening the file in a faulty way, wouldn't I get the error at the first line?
I use f95 to compile this and my gcc version is 4.1.2. I can't upgrade any of it.
Wild guess: check the data type of resFile. Is it a string or a unit number?
My Fortran 95 is beyond rusty but try moving the call to open() before the calls to write() and pass an integer resUnit instead of resFile as the first argument to write():
CHARACTER(LEN=20):: resFile
INTEGER(KIND=2) :: resUnit, resStep
resStep = 1
resFile = 'MY-results'
resUnit = 22
open (unit=resUnit, file=trim(resFile), status='replace', action='write')
write(resUnit, "(A5,I1)") "res00", resStep
END

What could possibly lead to the following assembly execution result

I was debugging under IBM AIX with dbx. I was seeing the following:
(dbx) print $r4
0x00000001614aa050
(dbx) print *((int64*)0x00000001614aa050)
-1
(dbx) print $r3
0x0000000165e08468
Then I "stepi" my 64bit program which executed the following instruction:
std r3,0x0(r4)
I then immediately checked the content of that memory:
(dbx) print *((int64*)0x00000001614aa050)
-1
Still -1? I was expecting the content in $r3 should be saved to that
memory. I then manually assigned the value to that address using my
variables:
(dbx) print &bmc._pLong
0x00000001614aa050
(dbx) assign bmc._pLong=(int64 *)0x0000000165e08468
(dbx) print *((int64*)0x00000001614aa050)
6004180072 (which is 0x0000000165e08468)
How could that happen?
I assume, somehow, this is "pilot" error. e.g. you did stepi and then the std instruction was displayed? which means that that is the instruction it is about to execute -- not the instruction it executed -- at least I think that's right.
I would do some stepi's before and after and make sure I am understanding what stepi is doing. And, of course, print out the iar, and the instruction at the iar, to verify that dbx is not fibbing on you.

mod_perl segmentation fault

HI,
I'm running an apache 2.2.3 on an Oracle64-bit (Red Hat clone) and I'm hitting a brick wall with an issue. I have a program which utilizes MIME::Lite to send mail through sendmail (I apologize, not sure what versions of sendmail or mod_perl I'm running, although I do believe the sendmail portion is irrelevant as you'll see in a moment)
On occasion, apache will segfault (11), and digging deep into the MIME::Lite module, I see it is on the following line:
open SENDMAIL, "|$sendmailcmd" or Carp::croak "open |$sendmailcmd: $!\n"; (this is in MIME::Lite)
Now, one would automatically suspect sendmail, but if I did the same line to use /bin/cat (as shown):
open SENDMAIL, "|/bin/cat"
apache still segfaults.
I attached an strace to the apache processes and see the following:
(when it does NOT crash)
12907 write(2, "SENDMAIL send_by_sendmail 1\n", 28) = 28
12907 write(2, "SENDMAIL /usr/lib/sendmail -t -o"..., 40) = 40
12907 pipe([24, 26]) = 0
12907 pipe([28, 29]) = 0
12907 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b4bcbbd75d0) = 13186
Note the "SENDMAIL sent_by_sendmail" are my comments. You can clearly see pipes opening. When it DOES crash, you'll see the following:
10805 write(2, "SENDMAIL send_by_sendmail (for y"..., 40) = 40
10805 --- SIGSEGV (Segmentation fault) # 0 (0) ---
Now notice it never pipes. I've tried GDB and it hasn't really shown me anything.
Finally, I wrote a simple program to run through mod_perl and regular cgi:
print header();
print "test";
open SENDMAIL, "|/bin/cat" or Carp::croak "open |sendmailcmd: $!\n";
print SENDMAIL "foodaddy";
close SENDMAIL;
print "test done <br/>";
Under mod_perl it has successfully crashed.
My analysis is telling me it has to do with it trying to open a file handle, the piping function returns either false or a corrupt file handle.
I also increased the file descriptor limit to 2048, no dice.
Does anyone have any thoughts as to where I should look? Any thoughts?
I appreciate the help
I just spent a long time tracking down a problem that started with identical symptoms. I eventually discovered that Test::More does not play well with mod_perl . Removing this module from my code appears to have solved the problem (so far!). I didn't follow this any deeper, but I suspect that the problem actually lies in Test::Builder.
I managed to treat perhaps only the symptoms, not the cause. I happened to have this issue when used global/package scope variables on the package level, used inside a perl object instance, as soon as I passed them as object properties instead, not as automatic default perl variables scoping, I stopped to experience perl segmentation fault suddenly.

Trying to understand crash log output

I'm trying to understand debug output from a crash log. I have the following line from the crashlog:
22 FG 0x00022b94 0x1000 + 138132
I understand how to use atos on 0x00022b94 to get the source code location.
What I would like to know is why the crash log helpfully splits that number into 0x1000 + 138132? I have googled and the googles failed me.
0x1000 is where the __TEXT segment of that binary file (your app or some dylib) is mapped to, and 138132
is the (decimal) offset from that origin. This separation allows program to find the error location in a position-independent way.