Is there any problem with closing a popen stream after fdopen? (valgrind)

I'm seeing a strange issue. Sample code is included below.
When this code is run under valgrind, it complains that memory allocated by popen is still reachable. Should I worry about this warning? If so, what is a possible solution?
Func1()
FILE *fp = NULL;
int fd = 0;
fp = popen(g_cmd, "r");
fd = fileno(fp); // store fd for later processing.
...
Func2(fd)
FILE *popen_fp = NULL;
popen_fp = fdopen(fd, "r"); // Convert fd to File pointer.
if (popen_fp) pclose(popen_fp);
==11748== 256 bytes in 1 blocks are still reachable in loss record 1 of 1
==11748== at 0x4C29F73: malloc
==11748== by 0x5542627: popen@@GLIBC_2.2.5
LEAK SUMMARY:
==11748== definitely lost: 0 bytes in 0 blocks
==11748== indirectly lost: 0 bytes in 0 blocks
==11748== possibly lost: 0 bytes in 0 blocks
==11748== still reachable: 256 bytes in 1 blocks


Why didn't Valgrind detect that an uninitialised value was used?

#include <stdlib.h>
int* matvec(int A[][3], int* x, int n) {
int i, j;
int* y = (int*)malloc(n * sizeof(int));
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
y[i] += A[i][j] * x[j];
}
}
free(y);
}
void main() {
int a[3][3] = {
{0, 1, 2},
{2, 3, 4},
{4, 5, 6}
};
int x[3] = {1, 2, 3};
matvec(a, x, 3);
}
I check for memory problems with the following command:
valgrind --tool=memcheck --leak-check=full --track-origins=yes ./a.out
It gives this output:
==37060== Memcheck, a memory error detector
==37060== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==37060== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==37060== Command: ./a.out
==37060==
==37060==
==37060== HEAP SUMMARY:
==37060== in use at exit: 0 bytes in 0 blocks
==37060== total heap usage: 1 allocs, 1 frees, 12 bytes allocated
==37060==
==37060== All heap blocks were freed -- no leaks are possible
==37060==
==37060== For lists of detected and suppressed errors, rerun with: -s
==37060== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
The question is: why didn't Valgrind detect that the y array is used uninitialised?
From the Valgrind User Manual §4.2.2, Use of uninitialised values:
A complaint is issued only when your program attempts to make use of uninitialised data in a way that might affect your program's externally-visible behaviour.
You are starting with some uninitialized memory, keeping it uninitialized, and then freeing it. You are not using it in an externally-visible way, such as by using it in an if condition or passing it to a system call.

Eigen on STM32 works only until a certain size

I am trying to use the Eigen C++ library on an STM32F4 Discovery embedded board to perform matrix operations, specifically Kalman filtering on sensor data.
I tried linking against the standard c++ library and even tried to compile the program using g++ arm compiler.
typedef Eigen::Matrix<float, 10, 10> Matrix10d;
Matrix10d mat1 = Matrix10d::Constant(10, 10, 1);
Matrix10d mat2 = Matrix10d::Constant(10, 10, 2);
Matrix10d result;
result = mat1 * mat2;
I can compile the same code if the matrix size is set to 7. If I go above that, the code won't compile and Eigen gives me this warning:
warning: argument 1 value '4294967295' exceeds maximum object size 2147483647
These are the partial error messages I am getting:
In function 'throw_std_bad_alloc',
inlined from 'check_size_for_overflow' at bla/bla/Eigen/src/Core/util/Memory.h:289:24
Here is the memory allocation in Linker script I am using
/*
* STM32F407xG memory setup.
* Note: Use of ram1 and ram2 is mutually exclusive with use of ram0.
*/
MEMORY
{
flash0 : org = 0x08000000, len = 1M
flash1 : org = 0x00000000, len = 0
flash2 : org = 0x00000000, len = 0
flash3 : org = 0x00000000, len = 0
flash4 : org = 0x00000000, len = 0
flash5 : org = 0x00000000, len = 0
flash6 : org = 0x00000000, len = 0
flash7 : org = 0x00000000, len = 0
ram0 : org = 0x20000000, len = 128k /* SRAM1 + SRAM2 */
ram1 : org = 0x20000000, len = 112k /* SRAM1 */
ram2 : org = 0x2001C000, len = 16k /* SRAM2 */
ram3 : org = 0x00000000, len = 0
ram4 : org = 0x10000000, len = 64k /* CCM SRAM */
ram5 : org = 0x40024000, len = 4k /* BCKP SRAM */
ram6 : org = 0x00000000, len = 0
ram7 : org = 0x00000000, len = 0
}
I am just running the STM32F4 Discovery board with an unchanged ChibiOS configuration:
# Stack size to be allocated to the Cortex-M process stack. This stack is
# the stack used by the main() thread.
ifeq ($(USE_PROCESS_STACKSIZE),)
USE_PROCESS_STACKSIZE = 0x400
endif
Update
I was not able to reproduce this error anymore. The sad thing is that I didn't do anything to solve the issue.
arm-none-eabi-gcc -c -mcpu=cortex-m4 -O3 -Os -ggdb -fomit-frame-pointer -falign-functions=16 -ffunction-sections -fdata-sections -fno-common -flto -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant -Wall -Wextra -Wundef -Wstrict-prototypes -Wa,-alms=build/lst/ -DCORTEX_USE_FPU=TRUE -DCHPRINTF_USE_FLOAT=TRUE -DTHUMB_PRESENT -mno-thumb-interwork -DTHUMB_NO_INTERWORKING -MD -MP -MF .dep/build.d -I.
The above are the compiler options that I am using if anyone is interested.
Now I can multiply even 20x20 matrices without any problem.
Matrix20d mat1 = Matrix20d::Constant(20, 20, 2);
// Multiply the matrix with a vector.
Vector20d vec = Vector20d::Constant(20, 1, 2);
Vector20d result;
systime_t startTime = chVTGetSystemTimeX();
result = mat1 * vec;
// Calculate the timedifference
systime_t endTime = chVTGetSystemTimeX();
systime_t timeDifference = chTimeDiffX(startTime, endTime);
chprintf(chp,"Time taken for the multiplication in milliseconds : %d\n", (int)timeDifference);
chprintf(chp, "System time : %d \n", startTime);
chprintf(chp, "Systime end : %d \n", endTime);
chprintf(chp, "Values in the vector : \n [");
for(Eigen::Index i=0; i < result.size();i++)
{
chprintf(chp, "%0.3f, ", result(i));
}
chprintf(chp, "] \n");
chThdSleepMilliseconds(1000);
It took about 1 ms to do the above computation.
I thought there might be a problem with my compiler, so I tried two compiler versions.
Version 1
arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Version 2
arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Performance of Fortran versus MPI files writing/reading

I am confused about the speed of native Fortran file writing and reading compared with MPI I/O, for both small and large files.
I wrote the following simple dummy program to test this (just writing dummy values to files):
PROGRAM test
!
IMPLICIT NONE
!
#if defined (__MPI)
!
! Include file for MPI
!
#if defined (__MPI_MODULE)
USE mpi
#else
INCLUDE 'mpif.h'
#endif
#else
! dummy world and null communicator
INTEGER, PARAMETER :: MPI_COMM_WORLD = 0
INTEGER, PARAMETER :: MPI_COMM_NULL = -1
INTEGER, PARAMETER :: MPI_COMM_SELF = -2
#endif
INTEGER (kind=MPI_OFFSET_KIND) :: lsize, pos, pos2
INTEGER, PARAMETER :: DP = 8
REAL(kind=DP), ALLOCATABLE, DIMENSION(:) :: trans_prob, array_cpu
INTEGER :: ierr, i, error, my_pool_id, world_comm
INTEGER (kind=DP) :: fil
REAL :: start, finish
INTEGER :: iunepmat, npool, arr_size, loop, pos3, j
real(dp):: dummy
integer*8 :: unf_recl
integer :: ios, direct_io_factor, recl
iunepmat = 10000
arr_size = 102400
loop = 500
! Initialize MPI
CALL MPI_INIT(ierr)
call MPI_COMM_DUP(MPI_COMM_WORLD, world_comm, ierr)
call MPI_COMM_RANK(world_comm,my_pool_id,error)
ALLOCATE(trans_prob(arr_size))
trans_prob(:) = 1.5d0
!Write using Fortran
CALL MPI_BARRIER(world_comm,error)
!
CALL cpu_time(start)
!
DO i=1, loop
! This writes also info on the record length using a real with 4 bytes.
OPEN(unit=10+my_pool_id, form='unformatted', position='append', action='write')
WRITE(10+my_pool_id ) trans_prob(:)
CLOSE(unit=10+my_pool_id)
ENDDO
CALL MPI_COMM_SIZE(world_comm, npool, error)
! Master collect and write
IF (my_pool_id==0) THEN
INQUIRE (IOLENGTH=direct_io_factor) dummy
unf_recl = direct_io_factor * int(arr_size * loop, kind=kind(unf_recl))
ALLOCATE (array_cpu( arr_size * loop ))
array_cpu(:) = 0.0d0
OPEN(unit=100,file='merged.dat',form='unformatted', status='new', position='append', action='write')
DO i=0, npool - 1
OPEN(unit=10+i,form='unformatted', status ='old', access='direct', recl = unf_recl )
READ(unit=10+i, rec=1) array_cpu(:)
CLOSE(unit=10+i)
WRITE(unit=100) array_cpu(:)
ENDDO
CLOSE(unit=100)
DEALLOCATE (array_cpu)
ENDIF
call cpu_time(finish)
!Print time
CALL MPI_BARRIER(world_comm,error)
IF (my_pool_id==0) print*, ' Fortran time', finish-start
!Write using MPI
CALL MPI_BARRIER(world_comm,error)
!
CALL cpu_time(start)
!
lsize = INT( arr_size , kind = MPI_OFFSET_KIND)
pos = 0
pos2 = 0
CALL MPI_FILE_OPEN(world_comm, 'MPI.dat',MPI_MODE_WRONLY + MPI_MODE_CREATE,MPI_INFO_NULL,iunepmat,ierr)
DO i=1, loop
pos = pos2 + INT( arr_size * (my_pool_id), kind = MPI_OFFSET_KIND ) * 8_MPI_OFFSET_KIND
CALL MPI_FILE_SEEK(iunepmat, pos, MPI_SEEK_SET, ierr)
CALL MPI_FILE_WRITE(iunepmat, trans_prob, lsize, MPI_DOUBLE_PRECISION,MPI_STATUS_IGNORE,ierr)
pos2 = pos2 + INT( arr_size * (npool -1), kind = MPI_OFFSET_KIND ) * 8_MPI_OFFSET_KIND
ENDDO
!
CALL MPI_FILE_CLOSE(iunepmat,ierr)
CALL cpu_time(finish)
CALL MPI_BARRIER(world_comm,error)
IF (my_pool_id==0) print*, ' MPI time', finish-start
DEALLOCATE (trans_prob)
END PROGRAM
The compilation is made with:
mpif90 -O3 -x f95-cpp-input -D__FFTW -D__MPI -D__SCALAPACK test_mpi2.f90 -o a.x
and then run in parallel with 4 cores:
mpirun -np 4 ./a.x
I get the following results:
Loop size 1
array size 10,240,000
File size: 313 Mb
Fortran time 0.237030014 sec
MPI time 0.164155006 sec
Loop size 10
array size 1,024,000
File size: 313 Mb
Fortran time 0.242821991 sec
MPI time 0.172048002 sec
Loop size 100
array size 102,400
File size: 313 Mb
Fortran time 0.235879987 sec
MPI time 9.78289992E-02 sec
Loop size 50
array size 1,024,000
File size: 1.6G
Fortran time 1.60272002 sec
MPI time 3.40623116 sec
Loop size 500
array size 102,400
File size: 1.6G
Fortran time 1.44547606 sec
MPI time 3.38340592 sec
As you can see, MPI performance degrades significantly for larger files. Is it possible to improve MPI performance for large files?
Is this behavior expected?

How can I use iCE40 4K block RAM in 512x8 read mode with IceStorm?

I am trying to figure out how to use the block RAM on my iCE40HX-8K Breakout Board. I would like to access it in a 512x8 configuration, which as far as I can tell from the documentation is supported by project IceStorm, but I haven't been able to get it to work like I expected.
If I understand correctly, initializing an SB_RAM40_4K primitive with the READ_MODE parameter set to 1 should set the block up in 512x8 read mode, which uses a 9 bit read address, and reads 8 bits of data at each address.
Here is the simplest example I could think of. It sets up an SB_RAM40_4K with some pre-initialized memory and reads straight to the pins of the on-board LEDs.
hx8kboard.pcf
set_io leds[0] B5
set_io leds[1] B4
set_io leds[2] A2
set_io leds[3] A1
set_io leds[4] C5
set_io leds[5] C4
set_io leds[6] B3
set_io leds[7] C3
set_io clk J3
top.v
module top (
output [7:0] leds,
input clk
);
//reg [8:0] raddr = 8'd0;
reg [8:0] raddr = 8'd1;
SB_RAM40_4K #(
.INIT_0(256'h00000000000000000000000000000000000000000000000000000000_44_33_22_11),
.WRITE_MODE(1),
.READ_MODE(1)
) ram40_4k_512x8 (
.RDATA(leds),
.RADDR(raddr),
.RCLK(clk),
.RCLKE(1'b1),
.RE(1'b1),
.WADDR(8'b0),
.WCLK(1'b0),
.WCLKE(1'b0),
.WDATA(8'b0),
.WE(1'b0)
);
endmodule
LED output when raddr == 0
\|/ \|/
O O O O O O O O
LED output when raddr == 1
\|/ \|/ \|/ \|/
O O O O O O O O
I would think that address 1 in 512x8 mode would be the second 8 bits from RAM, which is 8'h22 or 8'b00100010. Instead I get 8'h33 or 8'b00110011. After a little experimentation, this seems to be the lower 8 bits of a 16-bit read.
I'm not sure where I went wrong. Any help understanding what's going on here would be appreciated. Thanks!
This question is not really about Yosys or Project IceStorm. The format used for the SB_RAM40_4K INIT_* parameters is the same for the IceStorm flow and the Lattice iCEcube2 flow. However, Lattice has done a very very bad job at documenting this format. Otherwise I'd just point you to the right Lattice document.. :)
You are interested in the 512x8 mode. First you need to know that in 512x8 mode only the even bits of .RDATA() and .WDATA() are used (not the 8 LSB bits, as your code suggests!).
The data in .INIT_* is stored as 16 16-bit words per parameter. The lowest 16-bit word in .INIT_0() contains the 8-bit word at addr 0 in its even bits and the 8-bit word at addr 256 in its odd bits.
The next 16-bit word in .INIT_0() contains words 1 and 257. The lowest 16-bits in .INIT_1() contain words 16 and 272, and so forth.
The easiest way to investigate this kind of stuff is probably to either read the SB_RAM40_4K simulation model in /usr/local/share/yosys/ice40/cells_sim.v, or simply let Yosys infer the memory and observe what yosys does. For example the following design:
module test(input clk, wen, input [8:0] addr, input [7:0] wdata, output reg [7:0] rdata);
reg [7:0] mem [0:511];
initial mem[0] = 255;
always @(posedge clk) begin
if (wen) mem[addr] <= wdata;
rdata <= mem[addr];
end
endmodule
Will produce the following output when run through yosys -p 'synth_ice40; write_verilog' test.v:
(* top = 1 *)
(* src = "test.v:1" *)
module test(clk, wen, addr, wdata, rdata);
(* src = "/usr/local/bin/../share/yosys/ice40/brams_map.v:255" *)
(* unused_bits = "0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15" *)
wire [15:0] _0_;
(* src = "test.v:1" *)
input [8:0] addr;
(* src = "test.v:1" *)
input clk;
(* src = "test.v:1" *)
output [7:0] rdata;
(* src = "test.v:1" *)
input [7:0] wdata;
(* src = "test.v:1" *)
input wen;
(* src = "/usr/local/bin/../share/yosys/ice40/brams_map.v:277|/usr/local/bin/../share/yosys/ice40/brams_map.v:35" *)
SB_RAM40_4K #(
.INIT_0(256'bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1x1x1x1x1x1x1x1),
.INIT_1(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_2(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_3(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_4(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_5(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_6(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_7(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_8(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_9(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_A(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_B(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_C(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_D(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_E(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.INIT_F(256'hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx),
.READ_MODE(32'sd1),
.WRITE_MODE(32'sd1)
) \mem.0.0.0 (
.MASK(16'hxxxx),
.RADDR({ 2'h0, addr }),
.RCLK(clk),
.RCLKE(1'h1),
.RDATA({ _0_[15], rdata[7], _0_[13], rdata[6], _0_[11], rdata[5], _0_[9], rdata[4], _0_[7], rdata[3], _0_[5], rdata[2], _0_[3], rdata[1], _0_[1], rdata[0] }),
.RE(1'h1),
.WADDR({ 2'h0, addr }),
.WCLK(clk),
.WCLKE(wen),
.WDATA({ 1'hx, wdata[7], 1'hx, wdata[6], 1'hx, wdata[5], 1'hx, wdata[4], 1'hx, wdata[3], 1'hx, wdata[2], 1'hx, wdata[1], 1'hx, wdata[0] }),
.WE(1'h1)
);
endmodule
(Scroll all the way to the right to see the initialization pattern generated for the mem[0] = 255 initialization.)

Reading TIFF file BitsPerSample and ImageWidth

I'm writing an Objective-C application that works with TIFF images much faster than NSImage does (merging images with NSImage, for example, costs a lot of memory), so I'm starting with a TIFF reader/writer using a combination of Objective-C and C for best performance.
By reading the Adobe documentation for TIFF files, I've been able to read all the metadata of my TIFF images, with two exceptions: BitsPerSample and ImageWidth. The other metadata returned plausible values, but the BitsPerSample hexadecimal value/offset is fffffffffffffffe (i.e. -2). The field has 3 values of size 2 (short), which means there should be an offset in that place. Since the beginning of the TIFF file is offset 0 and the file is 5,846,655 bytes long, that offset would be invalid even if it were unsigned (18446744073709551614).
In the same way, ImageWidth returned value was 944, while the image width is 1200. Since I can detect that the file is a TIFF file by obtaining the short value from index 2 with length 2 and comparing it to 42, I assume my shortIntegerFromBytesAtRangeWithEndian function is working.
unsigned short shortIntegerFromBytesAtRangeWithEndian(char* bytes, unsigned long start, unsigned long length, int endian) {
unsigned short returnedInt = 0;
BOOL isBigEndian = endian == TIFF_IMAGE_ENDIAN_BIG;
for (unsigned long index = isBigEndian ? 0 : length-1 ; index < length; index += endian){
returnedInt = (returnedInt << BYTE_SIZE) + bytes[index + start];
}
return returnedInt;
}
endian is -1 for little endian and 1 for big endian.
Are these variables read in a different way than other values? These are the values of the image in hexadecimal and decimal, with the size in bytes (I've omitted the offset values since some of them are too big):
? (-2): 0 (0) - Size: 4
ImageWidth (256): 3b0 (944) - Size: 2
ImageLength (257): 320 (800) - Size: 2
Compression (259): 1 (1) - Size: 2
PhotometricInterpretation (262): 2 (2) - Size: 2
StripOffsets (273): 5a6c (23148) - Size: 4
Orientation (274): 1 (1) - Size: 2
SamplesPerPixel (277): 3 (3) - Size: 2
RowsPerStrip (278): 320 (800) - Size: 2
StripByteCounts (279): 56e400 (5694464) - Size: 4
PlanarConfiguration (284): 1 (1) - Size: 2
ResolutionUnit (296): 2 (2) - Size: 2
? (34665): 583e6c (5783148) - Size: 4