I am trying to profile a simple C program using Valgrind:
[zsun#nel6005001 ~]$ valgrind --tool=memcheck ./fl.out
==2238== Memcheck, a memory error detector
==2238== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==2238== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==2238== Command: ./fl.out
==2238==
==2238==
==2238== HEAP SUMMARY:
==2238== in use at exit: 1,168 bytes in 1 blocks
==2238== total heap usage: 1 allocs, 0 frees, 1,168 bytes allocated
==2238==
==2238== LEAK SUMMARY:
==2238== definitely lost: 0 bytes in 0 blocks
==2238== indirectly lost: 0 bytes in 0 blocks
==2238== possibly lost: 0 bytes in 0 blocks
==2238== still reachable: 1,168 bytes in 1 blocks
==2238== suppressed: 0 bytes in 0 blocks
==2238== Rerun with --leak-check=full to see details of leaked memory
==2238==
==2238== For counts of detected and suppressed errors, rerun with: -v
==2238== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 12 from 8)
Profiling timer expired
The C code I am trying to profile is the following:
void forloop(void){
    int fac=1;
    int count=5;
    int i,k;
    for (i = 1; i <= count; i++){
        for(k=1;k<=count;k++){
            fac = fac * i;
        }
    }
}
"Profiling timer expired" shows up, what does it mean? How to solve this problem? thx!
The problem is that you are using valgrind on a program compiled with -pg. You cannot use valgrind and gprof together. The valgrind manual suggests using OProfile if you are on Linux and need to profile the actual emulation of the program under valgrind.
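For example, assuming the program was built with something like the commands below (file names hypothetical), the fix is simply to rebuild without -pg before running valgrind:

gcc -pg -g -o fl.out fl.c           # -pg adds gprof instrumentation, which conflicts with valgrind
gcc -g -o fl.out fl.c               # rebuild without -pg
valgrind --tool=memcheck ./fl.out   # should no longer report "Profiling timer expired"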
By the way, this isn't computing factorial.
If you're really trying to find out where the time goes, you could try stackshots. I put an infinite loop around your code and took 10 of them. Here's the code:
 6: void forloop(void){
 7:     int fac=1;
 8:     int count=5;
 9:     int i,k;
10:
11:     for (i = 1; i <= count; i++){
12:         for(k=1;k<=count;k++){
13:             fac = fac * i;
14:         }
15:     }
16: }
17:
18: int main(int argc, char* argv[])
19: {
20:     int i;
21:     for (;;){
22:         forloop();
23:     }
24:     return 0;
25: }
And here are the stackshots, re-ordered with the most frequent at the top:
forloop() line 12
main() line 23
forloop() line 12 + 21 bytes
main() line 23
forloop() line 12 + 21 bytes
main() line 23
forloop() line 12 + 9 bytes
main() line 23
forloop() line 13 + 7 bytes
main() line 23
forloop() line 13 + 3 bytes
main() line 23
forloop() line 6 + 22 bytes
main() line 23
forloop() line 14
main() line 23
forloop() line 7
main() line 23
forloop() line 11 + 9 bytes
main() line 23
What does this tell you? It says that line 12 consumes about 40% of the time, and line 13 consumes about 20% of the time. It also tells you that line 23 consumes nearly 100% of the time.
That means unrolling the loop at line 12 could give you a speedup factor of roughly 100/(100-40) = 100/60 = 1.67x. Of course there are other ways to speed up this code as well, such as eliminating the inner loop entirely, if you're really trying to compute factorial.
I'm just pointing this out because it's a bone-simple way to do profiling.
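In case the term is unfamiliar: a "stackshot" is just a manually sampled stack trace. A minimal sketch of taking one by hand under gdb (any debugger that can interrupt and print a backtrace will do):

$ gdb ./fl.out
(gdb) run
^C              <- interrupt at a random moment
(gdb) bt        <- record the call stack, then continue; repeat ~10 times
(gdb) continue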
You are not going to be able to compute 10000! like that; you will need some sort of bignum implementation for computing factorials. This is because an int is "usually" 4 bytes long, which means it can "usually" hold at most 2^32 - 1 (2^31 - 1 for a signed int), and 13! is already more than that. Even if you used an unsigned long ("usually" 8 bytes) you'd overflow by the time you reached 21!.
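A minimal C sketch that makes the two overflow points visible:

#include <stdio.h>

int main(void)
{
    /* Multiply up factorials in a 64-bit unsigned integer.
       13! = 6227020800 already exceeds a signed 32-bit int (2^31 - 1 = 2147483647);
       at 21! the 64-bit value silently wraps modulo 2^64. */
    unsigned long long f = 1;
    int n;
    for (n = 1; n <= 21; n++) {
        f *= (unsigned long long)n;
        printf("%2d! = %llu\n", n, f);
    }
    /* The value printed for 21! is wrong: the true 21! (~5.1e19) is larger
       than 2^64 - 1 (~1.8e19), so the multiplication wrapped around. */
    return 0;
}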
As for what "Profiling timer expired" means: it means valgrind received the signal SIGPROF: http://en.wikipedia.org/wiki/SIGPROF (it probably means your program took too long).
For an iCE40 1K device, the following is a snippet from the output of the command "iceunpack -vv example.bin".
I could not understand why there are 332x144 bits.
My understanding from [1] is that CRAM BLOCK[0] starts at logic tile (1,1), and it should contain:
48 logic tiles, each 54x16,
14 IO tiles, each 18x16.
How is the "332 x 144" calculated?
Where are the IO tile and logic tile bits mapped in the CRAM BLOCK[0] bits?
e.g., which bits of CRAM BLOCK[0] indicate the bits for logic tile (1,1) and the bits for IO tile (0,1)?
Set bank to 0.
Next command at offset 26: 0x01 0x01
CRAM Data [0]: 332 x 144 bits = 47808 bits = 5976 bytes
Next command at offset 6006: 0x11 0x01
[1]. http://www.clifford.at/icestorm/format.html
Thanks.
height = 9 x 16 = 144 (1 I/O tile and 8 logic tiles).
width = 18 + 42 + 5 x 54 = 330 (1 I/O tile, 1 RAM tile and 5 logic tiles), plus "two zero bytes" = 332.
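Putting those together reproduces the numbers in the dump exactly:

width  = 18 + 42 + 5 x 54 + 2 = 332 columns
height = (1 + 8) x 16         = 144 rows
332 x 144 = 47808 bits; 47808 / 8 = 5976 bytes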
It's very difficult for me to understand the GDT (Global Descriptor Table) in JOS (xv6-rev7).
For example
.word (((lim) >> 12) & 0xffff), ((base) & 0xffff);
Why shift right 12? Why AND 0xffff?
What do these numbers mean?
What does the formula mean?
Can anyone give me some resources or tutorials or hints?
Here are the two snippets of code relevant to my problem.
1st Part
#define SEG_NULLASM \
        .word 0, 0; \
        .byte 0, 0, 0, 0

// The 0xC0 means the limit is in 4096-byte units
// and (for executable segments) 32-bit mode.
#define SEG_ASM(type,base,lim) \
        .word (((lim) >> 12) & 0xffff), ((base) & 0xffff); \
        .byte (((base) >> 16) & 0xff), (0x90 | (type)), \
              (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)

#define STA_X 0x8 // Executable segment
#define STA_E 0x4 // Expand down (non-executable segments)
#define STA_C 0x4 // Conforming code segment (executable only)
#define STA_W 0x2 // Writeable (non-executable segments)
#define STA_R 0x2 // Readable (executable segments)
#define STA_A 0x1 // Accessed
2nd Part
# Bootstrap GDT
.p2align 2                                # force 4 byte alignment
gdt:
  SEG_NULLASM                             # null seg
  SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff)   # code seg
  SEG_ASM(STA_W, 0x0, 0xffffffff)         # data seg

gdtdesc:
  .word (gdtdesc - gdt - 1)               # sizeof(gdt) - 1
  .long gdt                               # address gdt
The complete part: http://pdos.csail.mit.edu/6.828/2012/xv6/xv6-rev7.pdf
Well, it isn't a real formula at all. The limit is shifted twelve bits to the right, which is equivalent to dividing by 2^12 = 4096, and 4096 bytes is the granularity of a GDT entry's limit when the G bit is set (in your code the G bit is encoded in the constants used by the macro). Whenever an address is accessed through the corresponding selector, only its upper 20 bits are compared with the limit, and if they are greater, a #GP fault is thrown. Also note that standard pages are 4 KB in size, so any address that exceeds the limit by less than 4 KB still falls within the last page covered by the selector's limit. The ANDing is there partly to suppress assembler warnings about numeric overflow, since 0xFFFF is the maximum value of a single word (16 bits).
The same applies to the other shifts and ANDs, where in the other expressions the numbers are shifted further to extract the other parts of the descriptor.
The structure of the GDT descriptor can be seen above.
((lim) >> 12) & 0xffff corresponds to Segment Limit (bits 0-15). The right shift means the minimal unit is 2^12 bytes (the granularity of the GDT entry's limit); & 0xffff keeps the lower 16 bits of (lim) >> 12, which fit into the lowest 16 bits of the GDT descriptor.
The rest of the 'formula' works the same way.
Here is some good material for learning about the GDT descriptor.
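To make the bit-slicing concrete, here is a small C sketch (not part of xv6, just an illustration) that evaluates each of SEG_ASM's expressions for the bootstrap code segment SEG_ASM(STA_X|STA_R, 0x0, 0xffffffff):

#include <stdio.h>

int main(void)
{
    unsigned int base = 0x0, lim = 0xffffffff;
    unsigned int type = 0x8 | 0x2;   /* STA_X | STA_R */

    printf("limit 15:0   = 0x%04x\n", (lim >> 12) & 0xffff);       /* 0xffff */
    printf("base  15:0   = 0x%04x\n", base & 0xffff);              /* 0x0000 */
    printf("base  23:16  = 0x%02x\n", (base >> 16) & 0xff);        /* 0x00 */
    printf("access byte  = 0x%02x\n", 0x90 | type);                /* 0x9a: present, code, readable */
    printf("flags|lim hi = 0x%02x\n", 0xC0 | ((lim >> 28) & 0xf)); /* 0xcf: G=1, 32-bit, limit 19:16 */
    printf("base  31:24  = 0x%02x\n", (base >> 24) & 0xff);        /* 0x00 */
    return 0;
}

The resulting descriptor bytes are ff ff 00 00 00 9a cf 00: with G = 1 and a 20-bit limit of 0xFFFFF, the segment covers the full 4 GB address space, which is exactly what a flat bootstrap GDT wants.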
I came across a strange error today, and I still don't understand it:
long long N = 2000;
long long N2 = N*N;
long long *s = malloc(sizeof(long long)*N2); // create array
// populate it
for (long long k = 1; k <= 55; k++) {
doesn't produce any errors, but
long long N = 2000;
long long N2 = N*N;
long long s[4000000]; // create array
// populate it
for (long long k = 1; k <= 55; k++) {
gives me a code=2 EXC_BAD_ACCESS on the for line, before 1 is even assigned to k (according to the debugger), as if there were no space left to allocate another 8-byte variable. This code is at the beginning of a method; no other variables have been assigned or allocated. I'm guessing that I simply can't allocate a 4,000,000-element long long array on the stack, but somehow I can allocate it on the dynamic heap. Could someone please explain what's going on, what the limits are, etc.? This is Objective-C on a Mac running Mountain Lion with 2 GB of RAM. A long long is 8 bytes wide, so the array should only be 32 MB; I can't see why this should be an issue.
Thank you!
(By the way, if the details look familiar, it's because this is the beginning of my solver for Project Euler's Problem 149. I've avoided mentioning any details of the solution here, as I've solved the problem already.)
Your first example allocates memory from the heap, which is in the “data segment”, and your second allocates memory on the stack, which is in the “stack segment”. Each of these has a different size limit.
The default stack segment size limit, according to Technical Q&A QA1419, is 8 MiB. You can double-check this by running ulimit -a in a terminal:
:; ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 256
pipe size (512 bytes, -p) 1
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 709
virtual memory (kbytes, -v) unlimited
As you can see, the stack size is limited to 8192 KiB = 8 MiB.
The Technical Q&A I linked above describes some ways to increase the stack size limit. The maximum to which you can increase it without running as root is 64 MiB.
If you create threads, each thread gets its own stack. According to the Q&A, you can set a thread's stack size up to 1 GiB if you use the NSThread API.
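For instance, here is a minimal sketch of the thread route using the plain pthreads API (the Q&A describes the NSThread equivalent); the 64 MiB figure is just an illustration:

#include <pthread.h>
#include <stdio.h>

/* This 32 MB local would overflow the default 8 MB main-thread stack,
   but fits comfortably in the 64 MiB stack requested below. */
static void *worker(void *arg)
{
    long long s[4000000];
    (void)arg;
    s[0] = 1;
    s[3999999] = 2;
    printf("sum = %lld\n", s[0] + s[3999999]);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_attr_t attr;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 64UL * 1024 * 1024); /* 64 MiB stack */

    if (pthread_create(&tid, &attr, worker, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return 0;
}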
Auto locals are allocated on the stack; according to this technical note, the default stack size for an OS X process's main thread is 8 MB, and less for additional threads. You can try the linker option or setrlimit solutions given in the note, but the C tradition is to use the heap for any large allocation.
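As an illustration of the heap route, mirroring the array from the question:

#include <stdlib.h>

int main(void)
{
    long long N = 2000;
    long long N2 = N * N;

    /* 2000 * 2000 * 8 bytes = 32 MB: too big for the 8 MB default stack,
       but fine on the heap. */
    long long *s = malloc(sizeof(long long) * N2);
    if (s == NULL)
        return 1;   /* heap allocation can fail; check it */

    /* ... populate and use s as before ... */

    free(s);
    return 0;
}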
I am using valgrind on a program which runs an infinite loop.
Memcheck displays memory leaks after the program ends, but since my program has an infinite loop, it will never end.
So is there any way I can force valgrind to dump the leak data from time to time?
Thanks
Have a look at the client requests feature of memcheck. You can probably use VALGRIND_DO_LEAK_CHECK or similar.
EDIT:
In response to the statement above that this doesn't work, here is an example program which loops forever:
#include <valgrind/memcheck.h>
#include <unistd.h>
#include <cstdlib>

int main(int argc, char* argv[])
{
    while (true) {
        char* leaked = new char[1];  // deliberately leak one byte per iteration
        VALGRIND_DO_LEAK_CHECK;      // client request: run a leak check right now
        sleep(1);
    }
    return EXIT_SUCCESS;
}
When I run this in valgrind, I get an endless output of new leaks:
$ valgrind ./a.out
==16082== Memcheck, a memory error detector
==16082== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==16082== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==16082== Command: ./a.out
==16082==
==16082== LEAK SUMMARY:
==16082== definitely lost: 0 bytes in 0 blocks
==16082== indirectly lost: 0 bytes in 0 blocks
==16082== possibly lost: 0 bytes in 0 blocks
==16082== still reachable: 1 bytes in 1 blocks
==16082== suppressed: 0 bytes in 0 blocks
==16082== Reachable blocks (those to which a pointer was found) are not shown.
==16082== To see them, rerun with: --leak-check=full --show-reachable=yes
==16082==
==16082== 1 bytes in 1 blocks are definitely lost in loss record 2 of 2
==16082== at 0x4C2BF77: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16082== by 0x4007EE: main (testme.cc:9)
==16082==
==16082== LEAK SUMMARY:
==16082== definitely lost: 1 bytes in 1 blocks
==16082== indirectly lost: 0 bytes in 0 blocks
==16082== possibly lost: 0 bytes in 0 blocks
==16082== still reachable: 1 bytes in 1 blocks
==16082== suppressed: 0 bytes in 0 blocks
==16082== Reachable blocks (those to which a pointer was found) are not shown.
==16082== To see them, rerun with: --leak-check=full --show-reachable=yes
==16082==
==16082== 2 bytes in 2 blocks are definitely lost in loss record 2 of 2
==16082== at 0x4C2BF77: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16082== by 0x4007EE: main (testme.cc:9)
==16082==
==16082== LEAK SUMMARY:
==16082== definitely lost: 2 bytes in 2 blocks
==16082== indirectly lost: 0 bytes in 0 blocks
==16082== possibly lost: 0 bytes in 0 blocks
==16082== still reachable: 1 bytes in 1 blocks
==16082== suppressed: 0 bytes in 0 blocks
==16082== Reachable blocks (those to which a pointer was found) are not shown.
==16082== To see them, rerun with: --leak-check=full --show-reachable=yes
The program does not terminate.
With valgrind 3.7.0, you can trigger (among other things) a leak search from the shell, using vgdb.
See e.g. http://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.monitor-commands
(you can issue these monitor commands from gdb or from a shell command line, using vgdb).
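For example, with the program already running under valgrind, a leak search can be requested from another shell with something like (the pid is the one valgrind prints, e.g. 16082 in the output above):

$ vgdb --pid=16082 leak_check full reachable any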
Use of VALGRIND_DO_LEAK_CHECK (acm's answer) works for me.
Remarks:
- The program has to be launched with valgrind (valgrind myProg ...)
- The valgrind-devel package has to be installed (to have <valgrind/memcheck.h>)
I'm using the command:
valgrind --tool=memcheck --leak-check=yes ./prog
When this runs with a test script, I get no inline error messages or warnings; I just get a heap summary and a leak summary.
Am I missing a flag or something?
==31420== HEAP SUMMARY:
==31420== in use at exit: 1,580 bytes in 10 blocks
==31420== total heap usage: 47 allocs, 37 frees, 7,132 bytes allocated
==31420==
==31420== 1,580 (1,440 direct, 140 indirect) bytes in 5 blocks are definitely lost in loss record 2 of 2
==31420== at 0x4C274A8: malloc (vg_replace_malloc.c:236)
==31420== by 0x400FD4: main (lab1.c:51)
==31420==
==31420== LEAK SUMMARY:
==31420== definitely lost: 1,440 bytes in 5 blocks
==31420== indirectly lost: 140 bytes in 5 blocks
==31420== possibly lost: 0 bytes in 0 blocks
==31420== still reachable: 0 bytes in 0 blocks
==31420== suppressed: 0 bytes in 0 blocks
==31420==
==31420== For counts of detected and suppressed errors, rerun with: -v
==31420== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
The last time I used valgrind (a few days ago) it would print out error messages as they occurred, in addition to the heap and leak summaries.
EDIT:
I tried --leak-check=full; same result.
The line mentioned in the heap summary (lab1.c:51) is:
temp_record = malloc(sizeof(struct server_record));
And I use this pointer pretty often in my code. That is what was so helpful about the valgrind error messages before: they would show me when I lost my pointer to this malloc, among other problems.