Can't find uninitialised values causing valgrind issues

Okay, so I'm trying to work on this hangman program, and when I check it with valgrind I keep getting "Conditional jump or move depends on uninitialised value(s)" with my strcpy and
Here is the header source file:
#include "randword.h"
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static char **list;
void InitDictionary(char* name){
int size=0, ln;
FILE *fp = fopen(name, "r");
list = (char**)malloc(50*sizeof(char*));
*list = (char*)malloc(50*sizeof(char));
while(fgets(*(list+size), 49, fp)){
for(ln=0;*(*(list+size)+ln)!='\0';ln++){}
ln=ln-1;
if (*(*(list+size)+ln) == '\n')
*(*(list+size)+ln) = '\0';
size++;
*(list+size) = (char*)malloc(50*sizeof(char));
}
fclose(fp);
}
void ChooseRandomWord(char** word){
int randIndex,num;
for(num=0;*(list+num)!=NULL;num++){}
srand(time(NULL));
randIndex = rand() % (num-1);
strcpy(*word, *(list+randIndex));
for(;num>=0;num--)
free(*(list+num));
free(list);
}
and here is the main file itself:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "randword.h"
int main(){
char *word, *reveal, guess;
int guesses = 8,i,x,index,correct=0,istrue;
reveal = (char*)malloc(50*sizeof(char));
word = (char*)malloc(50*sizeof(char));
InitDictionary("words.txt");
ChooseRandomWord(&word);
for(x=0;*(word+x)!='\0';x++){
*(reveal+x)='-';
}
printf("Welcome to Hangman!\nI will guess a secret word. On each turn, you guess\na letter. If the letter is in the secret word, I\nwill show you where it appears; if not, a part of\nyour body gets strung up on the scaffold. The\nobject is to guess the word before you are hung.\n");
while(guesses>0){
istrue=0;
printf("Your word now looks like this: %s\nYou have %d guesses left.\n", reveal, guesses);
printf("Guess a letter: ");
scanf("%c", &guess);
getchar();
guess = toupper(guess);
for(index=0;*(word+index)!='\0';index++){
if(*(word+index)==guess){
if(*(word+index)==*(reveal+index)){
istrue=2;
break;
}
else{
istrue=1;
*(reveal+index)=*(word+index);
correct++;
}
}
}
if(istrue==2){
printf("You have already guessed %c.\n",guess);
}
else if(istrue){
printf("That is correct.\n");
}
else{
printf("There are no %c's in the word.\n", guess);
guesses--;
}
if(*(word+correct)=='\0'){
printf("You guessed the word: %s.\n", reveal);
free(reveal);
free(word);
return 0;
}
}
printf("You have run out of guesses and lost.\n");
free(word);
free(reveal);
return 0;
}
here is the valgrind output:
==5319== Conditional jump or move depends on uninitialised value(s)
==5319== at 0x80487F1: ChooseRandomWord (in /home/ulr146/GOZA_hw4/randword)
==5319== by 0x80488CB: main (in /home/ulr146/GOZA_hw4/randword)
==5319== Uninitialised value was created by a heap allocation
==5319== at 0x4026FDE: malloc (vg_replace_malloc.c:207)
==5319== by 0x80486E3: InitDictionary (in /home/ulr146/GOZA_hw4/randword)
==5319== by 0x80488C0: main (in /home/ulr146/GOZA_hw4/randword)
==5319==
==5319== Conditional jump or move depends on uninitialised value(s)
==5319== at 0x4025DBD: free (vg_replace_malloc.c:323)
==5319== by 0x804885E: ChooseRandomWord (in /home/ulr146/GOZA_hw4/randword)
==5319== by 0x80488CB: main (in /home/ulr146/GOZA_hw4/randword)
==5319== Uninitialised value was created by a heap allocation
==5319== at 0x4026FDE: malloc (vg_replace_malloc.c:207)
==5319== by 0x80486E3: InitDictionary (in /home/ulr146/GOZA_hw4/randword)
==5319== by 0x80488C0: main (in /home/ulr146/GOZA_hw4/randword)
Welcome to Hangman!
I will guess a secret word. On each turn, you guess
a letter. If the letter is in the secret word, I
will show you where it appears; if not, a part of
your body gets strung up on the scaffold. The
object is to guess the word before you are hung.
==5319==
==5319== Conditional jump or move depends on uninitialised value(s)
==5319== at 0x40276F7: strlen (mc_replace_strmem.c:242)
==5319== by 0x406F6D7: vfprintf (vfprintf.c:1581)
==5319== by 0x4075B5F: printf (printf.c:35)
==5319== by 0x8048923: main (in /home/ulr146/GOZA_hw4/randword)
==5319== Uninitialised value was created by a heap allocation
==5319== at 0x4026FDE: malloc (vg_replace_malloc.c:207)
==5319== by 0x80488A2: main (in /home/ulr146/GOZA_hw4/randword)
Your word now looks like this: ----
You have 8 guesses left.
Guess a letter: d
That is correct.
Your word now looks like this: D---
You have 8 guesses left.
Guess a letter: a
That is correct.
Your word now looks like this: DA-A
You have 8 guesses left.
Guess a letter: t
That is correct.
==5319==
==5319== Conditional jump or move depends on uninitialised value(s)
==5319== at 0x40276F7: strlen (mc_replace_strmem.c:242)
==5319== by 0x406F6D7: vfprintf (vfprintf.c:1581)
==5319== by 0x4075B5F: printf (printf.c:35)
==5319== by 0x8048A3E: main (in /home/ulr146/GOZA_hw4/randword)
==5319== Uninitialised value was created by a heap allocation
==5319== at 0x4026FDE: malloc (vg_replace_malloc.c:207)
==5319== by 0x80488A2: main (in /home/ulr146/GOZA_hw4/randword)
You guessed the word: DATA.
==5319==
==5319== ERROR SUMMARY: 6 errors from 4 contexts (suppressed: 13 from 1)
==5319== malloc/free: in use at exit: 0 bytes in 0 blocks.
==5319== malloc/free: 10 allocs, 10 frees, 952 bytes allocated.
==5319== For counts of detected errors, rerun with: -v
==5319== All heap blocks were freed -- no leaks are possible.
I can't quite figure out how to fix the uninitialized value issues because I don't know what I did that made it uninitialized.

InitDictionary() allocates one more element of list than the number of lines read from the file, and the remaining pointers in list are never initialised. So when ChooseRandomWord() scans list for the first NULL pointer, it walks past the last word into the uninitialised part of the array.
IMHO, I would set up list as follows:
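int size = 0, ln;
char buf[50];
list = malloc(50*sizeof(char*));
/* stop at 49 words so there is room for the terminating NULL pointer */
while(size < 49 && fgets(buf, sizeof buf, fp)){
list[size] = malloc(50 * sizeof(char));
for(ln=0;buf[ln]!='\0';ln++){
list[size][ln] = buf[ln];
}
list[size][ln] = '\0';
--ln;
if (ln >= 0 && list[size][ln] == '\n')
list[size][ln] = '\0';
size++;
}
list[size] = NULL; /* gives the NULL scan in ChooseRandomWord() a defined end */
That is, read each line into a local array and allocate a new entry only after a line has actually been read. Terminating list with a NULL pointer means the scan in ChooseRandomWord() never reads an uninitialised pointer.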
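The two strlen warnings inside printf point at the other heap block, the one allocated in main. That looks like reveal: the loop fills it with '-' but never writes a terminating '\0'. Here is a minimal sketch of building the mask with its terminator. The helper name make_mask is hypothetical and not part of the original program.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd string of '-' characters with
   the same length as word, terminated with '\0'.
   The caller frees the result. Returns NULL if malloc fails. */
char *make_mask(const char *word)
{
    size_t len;
    char *mask;

    assert(word != NULL);
    len = strlen(word);
    mask = malloc(len + 1);
    if (mask == NULL)
        return NULL;
    memset(mask, '-', len);
    mask[len] = '\0';   /* without this, printf("%s") reads uninitialised bytes */
    return mask;
}
```

In main this would replace the separate malloc and fill loop for reveal, and must be called after the word has been chosen.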

Related

Why the bitfield's least significant bit is promoted to MSb during typecasting in the below program?

Why do we get this value as output: ffffffff
#include <stdio.h>
struct bitfield {
signed char bitflag:1;
};
int main()
{
unsigned char i = 1;
struct bitfield *var = (struct bitfield*)&i;
printf("\n %x \n", var->bitflag);
return 0;
}
I know that when a block of memory is interpreted as a signed data type, its first bit indicates whether the value is positive (0) or negative (1). But I still can't figure out why -1 (ffffffff) is printed. With only one bit set in the struct, I was expecting the value 1 when the field gets promoted to a 1-byte char, because my machine is little-endian and I expected that one bit in the field to be interpreted as the LSb of the 1-byte character.
Can someone please explain? I'm really confused.
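A small sketch that may help with the reasoning: a signed bit-field that is one bit wide has only a sign bit, so in two's complement it can hold just 0 and -1. Reading it promotes the value to int with sign extension, which prints as ffffffff when int is 32 bits wide (an assumption here). The names flag1, read_set_bit and format_hex are made up for this example, and it stores -1 directly rather than casting a pointer as the question does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct flag1 {
    signed int bit : 1;   /* one bit wide: that single bit is the sign bit */
};

/* Returns the value read from a 1-bit signed field whose bit is set. */
int read_set_bit(void)
{
    struct flag1 f;
    f.bit = -1;           /* the only nonzero value a 1-bit signed field holds */
    return f.bit;         /* promoted to int, sign-extended */
}

/* Formats v the way printf("%x") shows it, into out (n bytes). */
void format_hex(int v, char *out, size_t n)
{
    assert(out != NULL && n > 0);
    snprintf(out, n, "%x", (unsigned int)v);
}
```

Declaring the field as unsigned would make the set bit read back as 1.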

How should GMP/MPFR limbs be interpreted?

The arbitrary precision libraries GMP and MPFR use heap-allocated arrays of machine word-sized integers to store the limbs that make up the high precision number/mantissa.
How should this array of limbs be interpreted to recover the arbitrary precision integer number? In other words: for N limbs holding B bits each, how should I interpret them to recover the N*B bit number?
Does the limb size really affect the in-memory representation (see below)? If so, what is the rationale behind this?
Background:
I wrote a small program to look inside the representation, but I was confused by what I saw. The limbs seem to be ordered in most significant digit order, whereas the limbs themselves are in native least significant digit format. When representing the 64-bit word 0xAAAABBBBCCCCDDDD using 32-bit words and precision fixed to 128 bits, I see:
% c++ limbs.cpp -lgmp -lmpfr -o limbs && ./limbs
ccccdddd|aaaabbbb|00000000|00000000
00000000|00000000|ccccdddd|aaaabbbb
This seems to imply that the in-memory representation cannot be read back as a string of bits to recover the arbitrary precision number (e.g., if I loaded it into a register on a machine that supported N*B-sized words). Furthermore, this also seems to suggest that the limb size changes the representation, so that I would not be able to deserialize a number without knowing which limb size was used to serialize it.
Here's my test program (uses 32-bit limbs with the __GMP_SHORT_LIMB macro):
#define __GMP_SHORT_LIMB
#include <gmp.h>
#include <mpfr.h>
#include <iomanip>
#include <iostream>
constexpr int PRECISION = 128;
void PrintLimbs(mp_limb_t const *const limbs) {
std::cout << std::hex;
constexpr int NUM_LIMBS = PRECISION / (8 * sizeof(mp_limb_t));
for (int i = 0; i < NUM_LIMBS; ++i) {
std::cout << std::setfill('0') << std::setw(2 * sizeof(mp_limb_t)) << limbs[i];
if (i < NUM_LIMBS - 1) {
std::cout << "|";
}
}
std::cout << "\n";
}
int main() {
{ // GMP
mpz_t num;
mpz_init2(num, PRECISION);
mpz_set_ui(num, 0xAAAABBBBCCCCDDDD);
PrintLimbs(num->_mp_d);
mpz_clear(num);
}
{ // MPFR
mpfr_t num;
mpfr_init2(num, PRECISION);
mpfr_set_ui(num, 0xAAAABBBBCCCCDDDD, MPFR_RNDN);
PrintLimbs(num->_mpfr_d);
mpfr_clear(num);
}
return 0;
}

3 things that matter for the byte representation:
The limb size depends on your machine and the chosen ABI. The real size is also affected by the optional presence of nails (an experimental feature, thus it is unlikely that limbs have nails). MPFR does not support the presence of nails.
The limb representation in memory follows the endianness of the machine.
Limbs are stored least significant limb first (a.k.a. little endian).
Note that from the last two points, on the same big-endian machine, the byte representation of the array will depend on the limb size.
Concerning the size of the array of limbs, it depends on the type. For instance, with the mpn layer of GMP, it is entirely handled by the user.
For MPFR, the size is deduced from the precision of the mpfr_t object; and if the precision is not a multiple of the limb bitsize, the trailing bits are always set to 0. Note also that more memory may be allocated than is actually used, and this must not be confused with the size of the array; you can ignore this fact, as the unused data always comes after the actual array of limbs.
EDIT concerning the rationale: Manipulating limbs instead of bytes is for speed reasons. I suppose that little endian has been chosen to represent the array of limbs for two reasons. First, it makes the basic operations (addition, subtraction, multiplication) easier to implement and potentially faster. Second, it is much better suited to implementing arithmetic modulo 2^K, in particular when K may change.
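To make the "least significant limb first" rule concrete, here is a minimal sketch in plain C that does not call GMP at all. It rebuilds a 64-bit value from 32-bit limbs as the sum of limbs[i] * 2^(32*i). The function name combine_limbs32 is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combines up to two 32-bit limbs, least significant limb first,
   into one 64-bit value: the sum of limbs[i] * 2^(32*i). */
uint64_t combine_limbs32(const uint32_t *limbs, size_t n)
{
    uint64_t value = 0;
    size_t i;

    assert(limbs != NULL && n <= 2);
    for (i = 0; i < n; ++i)
        value |= (uint64_t)limbs[i] << (32 * i);
    return value;
}
```

With the GMP limbs printed in the question, ccccdddd followed by aaaabbbb, this yields 0xAAAABBBBCCCCDDDD again.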

It finally clicked for me. On a little-endian machine such as x86, the limb size does not affect the in-memory representation.
The data in GMP/MPFR is stored consistently in little-endian format, even when interpreted as a string of bytes across limbs. But a word that has been loaded from memory is printed most significant digit first.
The inconsistent outcome when printing the limbs comes from how words are interpreted when read back from memory. Each word is stored least significant byte first but displayed most significant digit first, so the word size used for reading decides how the bytes are grouped and reordered.
I've modified the example below to show how it is in fact the word size with which we reinterpret memory that affects how the content is printed, as the output will be the same no matter if 32-bit or 64-bit limbs are used:
#define __GMP_SHORT_LIMB
#include <gmp.h>
#include <mpfr.h>
#include <iomanip>
#include <iostream>
#include <cstdint>
constexpr int PRECISION = 128;
template <typename InterpretAs>
void PrintLimbs(mp_limb_t const *const limbs) {
constexpr int LIMB_BITS = 8 * sizeof(InterpretAs);
constexpr int NUM_LIMBS = PRECISION / LIMB_BITS;
std::cout << LIMB_BITS << "-bit: ";
for (int i = 0; i < NUM_LIMBS; ++i) {
const auto limb = reinterpret_cast<InterpretAs const *>(limbs)[i];
for (int b = 0; b < LIMB_BITS; ++b) {
if (b > 0 && b % 16 == 0) {
std::cout << " ";
}
uint64_t bit = (limb >> (LIMB_BITS - 1 - b)) & 0x1;
std::cout << bit;
}
if (i < NUM_LIMBS - 1) {
std::cout << "|";
}
}
std::cout << "\n";
}
int main() {
uint64_t literal = 0b1111000000000000000000000000000000000000000000000000000000001001;
{ // GMP
mpz_t num;
mpz_init2(num, PRECISION);
mpz_set_ui(num, literal);
std::cout << "GMP where limbs are interpreted as:\n";
PrintLimbs<uint64_t>(num->_mp_d);
PrintLimbs<uint32_t>(num->_mp_d);
PrintLimbs<uint16_t>(num->_mp_d);
mpz_clear(num);
}
{ // MPFR
mpfr_t num;
mpfr_init2(num, PRECISION);
mpfr_set_ui(num, literal, MPFR_RNDN);
std::cout << "MPFR where limbs are interpreted as:\n";
PrintLimbs<uint64_t>(num->_mpfr_d);
PrintLimbs<uint32_t>(num->_mpfr_d);
PrintLimbs<uint16_t>(num->_mpfr_d);
mpfr_clear(num);
}
return 0;
}
This prints (regardless of limb size):
GMP where limbs are interpreted as:
64-bit: 1111000000000000 0000000000000000 0000000000000000 0000000000001001|0000000000000000 0000000000000000 0000000000000000 0000000000000000
32-bit: 0000000000000000 0000000000001001|1111000000000000 0000000000000000|0000000000000000 0000000000000000|0000000000000000 0000000000000000
16-bit: 0000000000001001|0000000000000000|0000000000000000|1111000000000000|0000000000000000|0000000000000000|0000000000000000|0000000000000000
MPFR where limbs are interpreted as:
64-bit: 0000000000000000 0000000000000000 0000000000000000 0000000000000000|1111000000000000 0000000000000000 0000000000000000 0000000000001001
32-bit: 0000000000000000 0000000000000000|0000000000000000 0000000000000000|0000000000000000 0000000000001001|1111000000000000 0000000000000000
16-bit: 0000000000000000|0000000000000000|0000000000000000|0000000000000000|0000000000001001|0000000000000000|0000000000000000|1111000000000000
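The same grouping effect can be reproduced without GMP or MPFR. The sketch below writes a 64-bit value as eight bytes, least significant byte first, and then reads the same bytes back as words of different widths. The helper names store_le64 and load_le are hypothetical, and the byte order is written out explicitly so that the result does not depend on the host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes v into out[0..7], least significant byte first. */
void store_le64(uint64_t v, unsigned char out[8])
{
    int i;

    assert(out != NULL);
    for (i = 0; i < 8; ++i)
        out[i] = (unsigned char)(v >> (8 * i));
}

/* Reads a little-endian word of width bytes (1 to 8) starting at p. */
uint64_t load_le(const unsigned char *p, size_t width)
{
    uint64_t v = 0;
    size_t i;

    assert(p != NULL && width >= 1 && width <= 8);
    for (i = 0; i < width; ++i)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}
```

For the literal used above, the first 32-bit word read this way is 0x00000009 and the second is 0xF0000000, which matches the GMP 32-bit line of the output.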

sem_open - valgrind complains about uninitialised bytes

I have a trivial program:
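#include <fcntl.h>      /* O_CREAT */
#include <semaphore.h>  /* sem_open, sem_unlink, sem_close */
#include <stdio.h>      /* perror */
#include <stdlib.h>     /* exit */
int main(void)
{
const char sname[]="xxx";
sem_t *pSemaphor;
if ((pSemaphor = sem_open(sname, O_CREAT, 0644, 0)) == SEM_FAILED) {
perror("semaphore initialization");
exit(1);
}
sem_unlink(sname);
sem_close(pSemaphor);
return 0;
}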
When I run it under valgrind, I get the following error:
==12702== Syscall param write(buf) points to uninitialised byte(s)
==12702== at 0x4E457A0: __write_nocancel (syscall-template.S:81)
==12702== by 0x4E446FC: sem_open (sem_open.c:245)
==12702== by 0x4007D0: main (test.cpp:15)
==12702== Address 0xfff00023c is on thread 1's stack
==12702== in frame #1, created by sem_open (sem_open.c:139)
The code was extracted from a bigger project where it ran successfully for years, but now it is causing a segmentation fault.
The valgrind error from my example is the same as the one seen in the bigger project, but there it causes a crash, which my small example doesn't.

I see this with glibc 2.27-5 on Debian. In my case I only open the semaphores right at the start of a long-running program and it seems harmless so far - just annoying.
Looking at the code for sem_open.c which is available at:
https://code.woboq.org/userspace/glibc/nptl/sem_open.c.html
It seems that valgrind is complaining about the line (270 as I look now):
if (TEMP_FAILURE_RETRY (__libc_write (fd, &sem.initsem, sizeof (sem_t)))
== sizeof (sem_t)
However sem.initsem is properly initialised earlier in a fairly baroque manner, firstly by explicitly setting fields in the sem.newsem (part of the union), and then once that is done by a call to memset (L226-228):
/* Initialize the remaining bytes as well. */
memset ((char *) &sem.initsem + sizeof (struct new_sem), '\0',
sizeof (sem_t) - sizeof (struct new_sem));
I think that these particular shenanigans are all quite optimal, but we need to make sure that all of the fields of new_sem have actually been initialised. We find the definition in https://code.woboq.org/userspace/glibc/sysdeps/nptl/internaltypes.h.html and it is this wonderful creation:
struct new_sem
{
#if __HAVE_64B_ATOMICS
/* The data field holds both value (in the least-significant 32 bytes) and
nwaiters. */
# if __BYTE_ORDER == __LITTLE_ENDIAN
# define SEM_VALUE_OFFSET 0
# elif __BYTE_ORDER == __BIG_ENDIAN
# define SEM_VALUE_OFFSET 1
# else
# error Unsupported byte order.
# endif
# define SEM_NWAITERS_SHIFT 32
# define SEM_VALUE_MASK (~(unsigned int)0)
uint64_t data;
int private;
int pad;
#else
# define SEM_VALUE_SHIFT 1
# define SEM_NWAITERS_MASK ((unsigned int)1)
unsigned int value;
int private;
int pad;
unsigned int nwaiters;
#endif
};
So if __HAVE_64B_ATOMICS is set, the structure has a data field which contains both the value and nwaiters; otherwise these are separate fields.
In the initialisation of sem.newsem we can see that these are initialised correctly, as follows:
#if __HAVE_64B_ATOMICS
sem.newsem.data = value;
#else
sem.newsem.value = value << SEM_VALUE_SHIFT;
sem.newsem.nwaiters = 0;
#endif
/* pad is used as a mutex on pre-v9 sparc and ignored otherwise. */
sem.newsem.pad = 0;
/* This always is a shared semaphore. */
sem.newsem.private = FUTEX_SHARED;
I'm doing all of this on a 64-bit system, so I think that valgrind is complaining about the initialisation of the 64-bit sem.newsem.data with a 32-bit value since from:
value = va_arg (ap, unsigned int);
We can see that value is defined simply as an unsigned int, which will usually still be 32 bits even on a 64-bit system (see What should be the sizeof(int) on a 64-bit machine?), but that should just be an implicit conversion to 64 bits when it is assigned.
So I think this is not a bug - just valgrind getting confused.
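The initialisation pattern described above (set the named fields, then memset everything after them) can be sketched in isolation. This is a generic illustration and not glibc's code: small_rec, full_rec and full_rec_init are hypothetical stand-ins for struct new_sem, sem_t and the body of sem_open, and the 32-byte size is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the part with named fields. */
struct small_rec {
    unsigned int value;
    int shared;
};

/* Hypothetical stand-in for the full object that gets written out. */
union full_rec {
    struct small_rec rec;
    unsigned char raw[32];
};

/* Sets the named fields, then zeroes every byte after them, so that
   all sizeof(union full_rec) bytes are defined before a write(). */
void full_rec_init(union full_rec *u, unsigned int value)
{
    assert(u != NULL);
    u->rec.value = value;
    u->rec.shared = 1;
    memset((char *)u + sizeof(struct small_rec), 0,
           sizeof *u - sizeof(struct small_rec));
}
```

If any byte of such an object were left unset, passing it to write() is the kind of thing valgrind reports as "Syscall param write(buf) points to uninitialised byte(s)".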

Why does lseek return 0?

lseek() is supposed to return the position of the file descriptor.
The documentation says:
Upon successful completion, lseek()
returns the resulting offset location
as measured in bytes from the
beginning of the file. Otherwise, a value of -1 is returned
and errno is set to indicate the
error.
Trouble is, not even this works:
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
printf("size off_t: %i\n", sizeof(off_t));
off_t pos;
pos = lseek(file, (off_t)0, SEEK_CUR);
printf("pos: %lli\n", pos);
// same result for SEEK_SET and SEEK_END
pos = lseek(file, (off_t)2352, SEEK_CUR);
printf("pos: %lli\n", pos);
This gives me:
size off_t: 8
pos: 0
pos: 0
Why is this? Is there an alternative to find the current offset, using the raw I/O functions? (read, open, lseek, …)
Edit 1:
I tried to make the example simpler.

Try adding #include <unistd.h> to the top.
See: http://forums.macosxhints.com/archive/index.php/t-35508.html
Basically, since you didn't #include <unistd.h>, the compiler has no declaration for lseek() and assumes that it returns an int.
An int is probably 4 bytes long, and since PPC uses big-endian byte order, you're getting the "top" 4 bytes, which are all zero.
Including unistd.h lets the compiler know that lseek() returns an off_t, so you don't lose half the bytes.
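A minimal sketch of the fix, assuming a POSIX system. The helper names current_offset and print_offset are made up for this example. The off_t value is converted to long long before printing so that the format string matches the argument on every platform.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* declares lseek() with its off_t return type */

/* Returns the current offset of fd, or (off_t)-1 on error. */
off_t current_offset(int fd)
{
    assert(fd >= 0);
    return lseek(fd, (off_t)0, SEEK_CUR);
}

/* Prints an off_t by converting it to long long first. */
void print_offset(off_t pos)
{
    printf("pos: %lld\n", (long long)pos);
}
```

On a regular file, calling lseek(fd, (off_t)2352, SEEK_CUR) from offset 0 and then current_offset(fd) should both give 2352.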

Something else is up, probably something silly. I tried your code, as here:
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
int main(int argc, char *argv[]){
off_t pos;
int file ;
if((file = open("/Users/chasrmartin/.bash_history",O_RDONLY)) == -1){
perror(argv[0]);
exit(1);
}
printf("size off_t: %i\n", sizeof(off_t));
pos = lseek(file, (off_t)0, SEEK_CUR);
printf("pos: %lli\n", pos);
// same result for SEEK_SET and SEEK_END
pos = lseek(file, (off_t)2352, SEEK_CUR);
printf("pos: %lli\n", pos);
exit(0);
}
and get this result:
bash $ gcc foo.c
bash $ ./a.out
size off_t: 8
pos: 0
pos: 2352
(Just to be definite, this is on Mac OS/X 10.5.6 on Intel.)
Update.
Or maybe it's not silly. I just tried it on a PPC G5, and get the results you do.
Update 2
Okay, here's the result on a PPC:
$ gcc foo.c
$ ./a.out
size off_t: 8
pos: 0
pos: 0

What kind of file is it? Is it a pipe by any chance? Because if it's anything but a regular file, chances are it doesn't support seeking:
The behavior of lseek() on devices which are incapable of seeking is implementation-defined. The value of the file offset associated with such a device is undefined.

I'm not sure I understand your question, but here are a few thoughts which might help.
Offset 0 is valid; it means you are at the beginning of the file
Depending on your platform, off_t may well be limited to 32 bits.
Are you intending to seek relative to your current position?
-- MarkusQ

You might want to change the test to:
if ( (pos = lseek(file, (off_t)i, SEEK_CUR)) != -1 ) {
You are probably hitting a -1 somewhere, but you're testing for 0 here.

Valgrind Error Involving Uninitialised String: False Flag?

When running valgrind to check for errors in a program written in C89/90, it reports "Uninitialised value was created by a heap allocation" for a strToUpper() function I wrote, despite the string being initialised.
I'm using this function to compare strings ignoring case. Unfortunately, C89 doesn't seem to include the strcasecmp() function in <string.h>, so I've written my own, which calls the strToUpper() and strcmp() functions.
CODE
char* strToUpper(char* inStr)
{
int i;
char *upperStr;
size_t strLen = strlen(inStr);
upperStr = (char*)malloc(sizeof(char) * (strLen + 1));
/* Does this for loop not initialise upperStr? */
for (i = 0; i < strLen; i++)
upperStr[i] = toupper(inStr[i]);
return upperStr;
}
VALGRIND ERROR
==27== Conditional jump or move depends on uninitialised value(s)
==27== at 0x4C31FAA: strcmp (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27== by 0x406649: strcasecmp (stringPlus.c:178)
==27== ...
==27== by 0x400FEB: main (main.c:58)
==27== Uninitialised value was created by a heap allocation
==27== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==27== by 0x4062E4: strToUpper (stringPlus.c:58)
==27== by 0x406622: strcasecmp (stringPlus.c:175)
==27== ...
==27== by 0x400FEB: main (main.c:58)
Any ideas?

You are not terminating your copied string.
Add something like
upperStr[i] = '\0';
after your for loop.
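For reference, here is a sketch of the whole function with the terminator in place. Beyond the fix itself, a few changes are my own assumptions rather than part of the question: the parameter is const, the loop index is a size_t, malloc failure is checked, and the argument to toupper() is cast to unsigned char.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd upper-case copy of inStr, or NULL if malloc fails.
   The caller frees the result. */
char *strToUpper(const char *inStr)
{
    size_t i;
    size_t len;
    char *upperStr;

    assert(inStr != NULL);
    len = strlen(inStr);
    upperStr = malloc(len + 1);
    if (upperStr == NULL)
        return NULL;
    for (i = 0; i < len; i++)
        upperStr[i] = (char)toupper((unsigned char)inStr[i]);
    upperStr[len] = '\0';   /* the byte strcmp() was reading uninitialised */
    return upperStr;
}
```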

Instead of using malloc(blah) I always use calloc(1,blah). The latter sets all allocated memory to zero.
