For learning purposes, how can this Python example be coded using the TensorFlow C API?
import tensorflow as tf
hello = tf.constant("hello TensorFlow!")
sess=tf.Session()
print(sess.run(hello))
I have tried it this way:
#include <string.h>
#include <iostream.h>
#include "c_api.h"
int main( int argc, char ** argv )
{
TF_Graph * graph = TF_NewGraph();
TF_SessionOptions * options = TF_NewSessionOptions();
TF_Status * status = TF_NewStatus();
TF_Session * session = TF_NewSession( graph, options, status );
char hello[] = "Hello TensorFlow!";
TF_Tensor * tensor = TF_AllocateTensor( TF_STRING, 0, 0, 8 + TF_StringEncodedSize( strlen( hello ) ) );
TF_OperationDescription * operationDescription = TF_NewOperation( graph, "Const", "hello" );
TF_Operation * operation;
struct TF_Output * output;
TF_StringEncode( hello, strlen( hello ), 8 + ( char * ) TF_TensorData( tensor ), TF_StringEncodedSize( strlen( hello ) ), status );
TF_SetAttrTensor( operationDescription, "value", tensor, status );
TF_SetAttrType( operationDescription, "dtype", TF_TensorType( tensor ) );
operation = TF_FinishOperation( operationDescription, status );
output->oper = operation;
output->index = 0;
TF_SessionRun( session, 0,
0, 0, 0, // Inputs
output, &tensor, 1, // Outputs
&operation, 1, // Operations
0, status );
printf( "%i", TF_GetCode( status ) );
TF_CloseSession( session, status );
TF_DeleteSession( session, status );
TF_DeleteStatus( status );
TF_DeleteSessionOptions( options );
return 0;
}
I am testing it on Windows using the TensorFlow.dll from:
http://ci.tensorflow.org/view/Nightly/job/nightly-libtensorflow-windows/lastSuccessfulBuild/artifact/lib_package/libtensorflow-cpu-windows-x86_64.zip
The above code causes a general protection fault (GPF) on the TF_SessionRun() call. Once that is solved, how do I retrieve the output? Should a different tensor be used for the output? The above code reuses the same tensor for both the output and the operation.
Many thanks.
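There was a second bug to fix besides the offset initialization: output was declared as an uninitialized pointer (struct TF_Output * output) and then dereferenced. This version seems to work fine: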
#include <stdio.h>
#include <string.h>
#include "c_api.h"
int main( int argc, char ** argv )
{
TF_Graph * graph = TF_NewGraph();
TF_SessionOptions * options = TF_NewSessionOptions();
TF_Status * status = TF_NewStatus();
TF_Session * session = TF_NewSession( graph, options, status );
char hello[] = "Hello TensorFlow!";
TF_Tensor * tensor = TF_AllocateTensor( TF_STRING, 0, 0, 8 + TF_StringEncodedSize( strlen( hello ) ) );
TF_Tensor * tensorOutput;
TF_OperationDescription * operationDescription = TF_NewOperation( graph, "Const", "hello" );
TF_Operation * operation;
struct TF_Output output;
TF_StringEncode( hello, strlen( hello ), 8 + ( char * ) TF_TensorData( tensor ), TF_StringEncodedSize( strlen( hello ) ), status );
memset( TF_TensorData( tensor ), 0, 8 );
TF_SetAttrTensor( operationDescription, "value", tensor, status );
TF_SetAttrType( operationDescription, "dtype", TF_TensorType( tensor ) );
operation = TF_FinishOperation( operationDescription, status );
output.oper = operation;
output.index = 0;
TF_SessionRun( session, 0,
0, 0, 0, // Inputs
&output, &tensorOutput, 1, // Outputs
&operation, 1, // Operations
0, status );
printf( "status code: %i\n", TF_GetCode( status ) );
printf( "%s\n", ( ( char * ) TF_TensorData( tensorOutput ) ) + 9 );
TF_CloseSession( session, status );
TF_DeleteSession( session, status );
TF_DeleteStatus( status );
TF_DeleteSessionOptions( options );
return 0;
}
Do we have to delete tensorOutput? I am not sure why we have to add 9 (instead of 8) to get the beginning of the string.
TF_STRING tensors are encoded using the format described here. In your code, you accounted for space (8 bytes) to encode the one offset, but didn't actually initialize it. To do that, you'd want to add something like:
memset(TF_TensorData(tensor), 0, 8);
Do this before the call to TF_SetAttrTensor, as it sets the "offset" of the string element to 0 (which is where you're encoding the one string value).
To your second question: You aren't actually re-using the same tensor pointer. The comments for TF_SessionRun suggest that TF_SessionRun is allocating a new TF_Tensor object that the caller takes ownership of. So, in your code snippet, the tensor variable is being overwritten to point to a newly allocated tensor.
Hope that helps.
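The TF_STRING layout discussed in this answer can be sketched without the TensorFlow library. This is a minimal sketch, assuming the legacy layout the code in this thread targets: a table of 8-byte offsets (one per string), followed by each string stored as a varint length prefix plus its bytes. The helpers write_varint and encode_scalar_string are hypothetical names for illustration, not TensorFlow C API functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a TensorFlow function): writes len as a
   base-128 varint and returns the number of bytes written. */
static size_t write_varint(uint8_t *dst, uint64_t len) {
    size_t n = 0;
    while (len >= 0x80) {
        dst[n++] = (uint8_t)(len | 0x80); /* low 7 bits, continuation bit set */
        len >>= 7;
    }
    dst[n++] = (uint8_t)len;
    return n;
}

/* Hypothetical helper: builds the payload of a scalar string tensor as
   [8-byte offset = 0][varint length][string bytes].
   Returns the byte position at which the string bytes start. */
static size_t encode_scalar_string(uint8_t *buf, const char *s) {
    size_t len = strlen(s);
    memset(buf, 0, 8);                           /* offset table: one entry, value 0 */
    size_t prefix = write_varint(buf + 8, len);  /* length prefix after the table */
    memcpy(buf + 8 + prefix, s, len);            /* raw bytes, no terminating NUL */
    return 8 + prefix;
}
```

Under these assumptions, a string shorter than 128 bytes gets a one-byte length prefix, so the characters of "Hello TensorFlow!" start at byte 8 + 1 = 9, which would explain the + 9 in the working version. The payload carries no terminating NUL, so printing it with %s relies on whatever byte happens to follow it.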
The example also works on Linux with the TensorFlow C library installed as described at https://www.tensorflow.org/install/lang_c, once the include directives are changed to:
#include <stdio.h>
#include <string.h>
#include <tensorflow/c/c_api.h>
To build it, it is enough to run
gcc hello_tf.c -ltensorflow -o hello_tf
as described in the TensorFlow docs.
Related
I am running Ubuntu 18.04 in Oracle VirtualBox on an HP machine. I tried to install OpenCL and run some OpenCL code, but I got the errors shown below. The code just adds the values of sin^2(i) and cos^2(i) and takes the average of all of them, so the answer should be 1.000, but because of some problem with the installation or the machine I get a bunch of errors and an answer of 0.
I have tried adding and removing beignet. It did not resolve the issue.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <CL/opencl.h>
// OpenCL kernel. Each work item takes care of one element of c
const char *kernelSource = "\n" \
"#pragma OPENCL EXTENSION cl_khr_fp64 : enable \n" \
"__kernel void vecAdd( __global double *a, \n" \
" __global double *b, \n" \
" __global double *c, \n" \
" const unsigned int n) \n" \
"{ \n" \
" //Get our global thread ID \n" \
" int id = get_global_id(0); \n" \
" \n" \
" //Make sure we do not go out of bounds \n" \
" if (id < n) \n" \
" c[id] = a[id] + b[id]; \n" \
"} \n" \
"\n" ;
int main( int argc, char* argv[] )
{
// Length of vectors
unsigned int n = 100000;
// Host input vectors
double *h_a;
double *h_b;
// Host output vector
double *h_c;
// Device input buffers
cl_mem d_a;
cl_mem d_b;
// Device output buffer
cl_mem d_c;
cl_platform_id cpPlatform; // OpenCL platform
cl_device_id device_id; // device ID
cl_context context; // context
cl_command_queue queue; // command queue
cl_program program; // program
cl_kernel kernel; // kernel
// Size, in bytes, of each vector
size_t bytes = n*sizeof(double);
// Allocate memory for each vector on host
h_a = (double*)malloc(bytes);
h_b = (double*)malloc(bytes);
h_c = (double*)malloc(bytes);
// Initialize vectors on host
int i;
for( i = 0; i < n; i++ )
{
h_a[i] = sinf(i)*sinf(i);
h_b[i] = cosf(i)*cosf(i);
}
size_t globalSize, localSize;
cl_int err;
// Number of work items in each local work group
localSize = 64;
// Number of total work items - localSize must be a divisor
globalSize = ceil(n/(float)localSize)*localSize;
// Bind to platform
err = clGetPlatformIDs(1, &cpPlatform, NULL);
// Get ID for the device
err = clGetDeviceIDs(cpPlatform, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
// Create a context
context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
// Create a command queue
queue = clCreateCommandQueue(context, device_id, 0, &err);
// Create the compute program from the source buffer
program = clCreateProgramWithSource(context, 1,
(const char **) & kernelSource, NULL, &err);
// Build the program executable
clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
// Create the compute kernel in the program we wish to run
kernel = clCreateKernel(program, "vecAdd", &err);
// Create the input and output arrays in device memory for our calculation
d_a = clCreateBuffer(context, CL_MEM_READ_ONLY, bytes, NULL, NULL);
d_b = clCreateBuffer(context, CL_MEM_READ_ONLY, bytes, NULL, NULL);
d_c = clCreateBuffer(context, CL_MEM_WRITE_ONLY, bytes, NULL, NULL);
// Write our data set into the input array in device memory
err = clEnqueueWriteBuffer(queue, d_a, CL_TRUE, 0,
bytes, h_a, 0, NULL, NULL);
err |= clEnqueueWriteBuffer(queue, d_b, CL_TRUE, 0,
bytes, h_b, 0, NULL, NULL);
// Set the arguments to our compute kernel
err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_a);
err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_b);
err |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &d_c);
err |= clSetKernelArg(kernel, 3, sizeof(unsigned int), &n);
// Execute the kernel over the entire range of the data set
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, &localSize,
0, NULL, NULL);
// Wait for the command queue to get serviced before reading back results
clFinish(queue);
// Read the results from the device
clEnqueueReadBuffer(queue, d_c, CL_TRUE, 0,
bytes, h_c, 0, NULL, NULL );
//Sum up vector c and print result divided by n, this should equal 1 within error
double sum = 0;
for(i=0; i<n; i++)
sum += h_c[i];
printf("final result: %f\n", sum/n);
// release OpenCL resources
clReleaseMemObject(d_a);
clReleaseMemObject(d_b);
clReleaseMemObject(d_c);
clReleaseProgram(program);
clReleaseKernel(kernel);
clReleaseCommandQueue(queue);
clReleaseContext(context);
//release host memory
free(h_a);
free(h_b);
free(h_c);
return 0;
}
These are the error messages I got:
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)
final result: 0.000000
I am using an LPCXpresso1549 to generate a chirp signal with a frequency between 35000 Hz and 45000 Hz. First, I generate the DAC chirp samples in MATLAB and store them in const uint16_t chirpData[]. The sampling frequency is 96000 Hz, which gives 96001 samples. Then I set a timer to send the samples out one by one every 1/96000 second. However, the signal I get has a frequency between 3200 Hz and 44000 Hz. Is that because the timer is slow?
const uint16_t chirpData[NUM_SAMPLES] = { 2048, ...., 1728, 2048 }; // 96001 samples
#include "mbed.h"
#include "chirp.h"
Serial pc(USBTX, USBRX);
Timer t;
AnalogOut aout(P0_12);
int main()
{
int i = 0;
while(true) {
// Write the sample to the analog
t.start(); //start timer
if(t.read() >= 0.00001){ // 1/samplef = 0.00001
aout.write_u16(chirpData[i]);
i++;
t.reset(); // reset timer to zero
if(i > 96000) {
i = 0;
}
}
}
}
In this case I recommend using a periodic RTOS task (the API below is the FreeRTOS one), created with:
#define xTaskCreate( pvTaskCode, pcName, usStackDepth, pvParameters, uxPriority, pxCreatedTask ) xTaskGenericCreate( ( pvTaskCode ), ( pcName ), ( usStackDepth ), ( pvParameters ), ( uxPriority ), ( pxCreatedTask ), ( NULL ), ( NULL ) )
xTaskHandle taskHandle;
xTaskCreate(..); //Check Task.h
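Then you can set your task cycle with:
void your_task( void * pvParameters ) {
unsigned task_cycle_ms = 1000/freq; // period in ms; integer division, so this is 0 when freq > 1000 Hz
portTickType xLastWakeTime = xTaskGetTickCount(); // vTaskDelayUntil needs the current tick count as its starting point
portTickType xFrequency = task_cycle_ms/portTICK_RATE_MS;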
for(;;) {
vTaskDelayUntil( &xLastWakeTime, xFrequency );
//your code to execute every cycle here
}
}
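Whether a tick-based delay can reach the sample rate in the question comes down to simple arithmetic, sketched below. The helper names are hypothetical (they are not FreeRTOS functions), and the 1000 Hz tick rate used in the note afterwards is an assumption, since the thread does not state one.

```c
#include <stdint.h>

/* Hypothetical helper (not a FreeRTOS function): whole RTOS ticks in one
   sample period, for a sample rate and a tick rate both given in Hz.
   Integer division truncates, so the result is 0 whenever the sample
   rate is higher than the tick rate. */
static uint32_t period_in_ticks(uint32_t sample_rate_hz, uint32_t tick_rate_hz) {
    return tick_rate_hz / sample_rate_hz;
}

/* Hypothetical helper: sample period in nanoseconds, rounded to nearest. */
static uint32_t period_in_ns(uint32_t sample_rate_hz) {
    return (1000000000u + sample_rate_hz / 2) / sample_rate_hz;
}
```

With an assumed 1000 Hz tick, period_in_ticks(96000, 1000) is 0, so a delay counted in ticks cannot space samples at 96 kHz; a hardware timer interrupt is the more usual tool at that rate. period_in_ns(96000) is 10417, whereas the 0.00001 s threshold in the question corresponds to 10000 ns, so even a perfectly punctual loop would run slightly faster than 96 kHz.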
I need to add the FreeType library to a Keil uVision 4 project to handle TTF font files.
I followed the steps in the Simple Glyph Loading tutorial.
I am trying to compile the code below, called example1.c. I tried the tutorial in an Ubuntu terminal, with the help of the question Undefined reference to 'FT_Init_FreeType', and it compiled without error.
Unfortunately, I don't know how to link the library in Keil.
It shows "Error: L6218E: Undefined symbol FT_Init_FreeType (referred from example1.o)."
Can anyone help me?
example1.c:
/* example1.c */
/* */
/* This small program shows how to print a rotated string with the */
/* FreeType 2 library. */
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <ft2build.h>
#include FT_FREETYPE_H
#define WIDTH 640
#define HEIGHT 480
/* origin is the upper left corner */
unsigned char image[HEIGHT][WIDTH];
/* Replace this function with something useful. */
void
draw_bitmap( FT_Bitmap* bitmap,
FT_Int x,
FT_Int y)
{
FT_Int i, j, p, q;
FT_Int x_max = x + bitmap->width;
FT_Int y_max = y + bitmap->rows;
for ( i = x, p = 0; i < x_max; i++, p++ )
{
for ( j = y, q = 0; j < y_max; j++, q++ )
{
if ( i < 0 || j < 0 ||
i >= WIDTH || j >= HEIGHT )
continue;
image[j][i] |= bitmap->buffer[q * bitmap->width + p];
}
}
}
void
show_image( void )
{
int i, j;
for ( i = 0; i < HEIGHT; i++ )
{
for ( j = 0; j < WIDTH; j++ )
putchar( image[i][j] == 0 ? ' '
: image[i][j] < 128 ? '+'
: '*' );
putchar( '\n' );
}
}
int
main( int argc,
char** argv )
{
FT_Library library;
FT_Face face;
FT_GlyphSlot slot;
FT_Matrix matrix; /* transformation matrix */
FT_Vector pen; /* untransformed origin */
FT_Error error;
char* filename;
char* text;
double angle;
int target_height;
int n, num_chars;
if ( argc != 3 )
{
fprintf ( stderr, "usage: %s font sample-text\n", argv[0] );
exit( 1 );
}
filename = argv[1]; /* first argument */
text = argv[2]; /* second argument */
num_chars = strlen( text );
angle = ( 25.0 / 360 ) * 3.14159 * 2; /* use 25 degrees */
target_height = HEIGHT;
error = FT_Init_FreeType( &library ); /* initialize library */
/* error handling omitted */
error = FT_New_Face( library, filename, 0, &face );/* create face object */
/* error handling omitted */
/* use 50pt at 100dpi */
error = FT_Set_Char_Size( face, 50 * 64, 0,
100, 0 ); /* set character size */
/* error handling omitted */
slot = face->glyph;
/* set up matrix */
matrix.xx = (FT_Fixed)( cos( angle ) * 0x10000L );
matrix.xy = (FT_Fixed)(-sin( angle ) * 0x10000L );
matrix.yx = (FT_Fixed)( sin( angle ) * 0x10000L );
matrix.yy = (FT_Fixed)( cos( angle ) * 0x10000L );
/* the pen position in 26.6 cartesian space coordinates; */
/* start at (300,200) relative to the upper left corner */
pen.x = 300 * 64;
pen.y = ( target_height - 200 ) * 64;
for ( n = 0; n < num_chars; n++ )
{
/* set transformation */
FT_Set_Transform( face, &matrix, &pen );
/* load glyph image into the slot (erase previous one) */
error = FT_Load_Char( face, text[n], FT_LOAD_RENDER );
if ( error )
continue; /* ignore errors */
/* now, draw to our target surface (convert position) */
draw_bitmap( &slot->bitmap,
slot->bitmap_left,
target_height - slot->bitmap_top );
/* increment pen position */
pen.x += slot->advance.x;
pen.y += slot->advance.y;
}
show_image();
FT_Done_Face ( face );
FT_Done_FreeType( library );
return 0;
}
Create a new project "freetype". In the project settings change the "Output" to a static library.
Add the freetype sources to the project, and build. Do not use your "amalgamated" source file - that will destroy the library granularity and lead to excessively large code.
Add the resulting freetype.lib file to your application project. The linker will select only those modules from the library that are necessary to resolve references in your application thus keeping size to a minimum.
You may get smaller code size from including the freetype source directly in your application and using cross-module optimisation (this will work regardless of the use of separate compilation or the amalgamated file); however the build time may be excessive as it requires repeated full-builds to fully optimise. Note that unlike compiler-optimisation, cross-module optimisation does not affect the debugging experience - you can use the debugger normally even with it enabled.
EDIT :
The cross-module optimisation feature may not apply when using the GNU toolchain; it refers to the use of Keil MDK-ARM which uses ARM's RealView toolchain. Other aspects of this answer may also be applicable only to MDK-ARM.
After a long search I found an alternative solution to the problem: the freetype amalgamate project, which is exactly what I needed.
It amalgamates all the source files into two files, one ".c" file and one ".h" file, so it can easily be integrated into any other project.
Here is the link for freetype amalgamate.
Thank you.
When I overload operator[] in C++ to create a minimal matrix class that supports m[i][j] indexing:
template <typename T>
class matrix {
private:
vector<T> elems_;
size_t nrows_;
size_t ncols_;
public:
T const* operator[] ( size_t const r ) const { return &elems_[r * ncols_]; }
T* operator[] ( size_t const r ) { return &elems_[r * ncols_]; }
matrix ();
matrix ( size_t const nr, size_t const nc )
: elems_( nr * nc ), nrows_( nr ), ncols_( nc )
{ }
matrix ( size_t const nr, size_t const nc, T const *data)
: elems_( nr * nc ), nrows_( nr ), ncols_( nc )
{ size_t ptr=0;
for (int i=0;i<nr;i++)
for (int j=0;j<nc;j++)
elems_[ptr] = data[ptr++];
}
};
g++ emits the warning operation on ‘ptr’ may be undefined [-Wsequence-point]. A previous post, Why I got "operation may be undefined" in Statement Expression in C++?, explains that the last thing in the compound statement should be an expression followed by a semicolon, but not what purpose that serves. Does it mean that g++ will always emit this warning for compound statements that end with a for loop?
If so, can anyone explain why this is useful? I cannot think of any reason why anyone should be advised against ending a compound statement with a for loop.
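In elems_[ptr] = data[ptr++], the read of ptr on the left-hand side and the increment of ptr on the right-hand side are unsequenced relative to each other, because = does not introduce a sequence point. Before C++17 that makes the behavior undefined.
Depending on the order the compiler picks, elems_[ptr] = data[ptr++] yields different results, hence the warning. It has nothing to do with the for loop ending the compound statement. Incrementing ptr in a separate statement after the assignment removes the ambiguity.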
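The same rule applies in C, so the well-defined form of the copy loop can be sketched there. This is a minimal sketch rather than the asker's class: fill_row_major is a hypothetical free function standing in for the constructor body, and double stands in for T.

```c
#include <stddef.h>

/* Hypothetical stand-in for the constructor body: copies nr * nc elements.
   The index is used in one statement and incremented in the next, so the
   read and the modification of ptr are separated by a sequence point. */
static void fill_row_major(double *elems, const double *data,
                           size_t nr, size_t nc) {
    size_t ptr = 0;
    for (size_t i = 0; i < nr; i++) {
        for (size_t j = 0; j < nc; j++) {
            elems[ptr] = data[ptr]; /* both sides use the same index value */
            ptr++;                  /* modified only after the assignment */
        }
    }
}
```

Splitting the statement this way leaves no expression that both reads and modifies ptr without sequencing, which is the situation -Wsequence-point reports.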
I'm having trouble figuring out why a BLAS call is throwing an error. The problem call is the last BLAS call. The code compiles without issue and runs fine up until this call, then fails with the following message.
** ACML error: on entry to DGEMV parameter number 6 had an illegal value
As far as I can tell, all the input types are correct and array a has
I would really appreciate any insight into the problem.
Thanks
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "cblas.h"
#include "array_alloc.h"
int main( void )
{
double **a, **A;
double *b, *B, *C;
int *ipiv;
int n, nrhs;
int info;
int i, j;
printf( "How big a matrix?\n" );
fscanf( stdin, "%i", &n );
/* Allocate the matrix and set it to random values but
with a big value on the diagonal. This makes sure we don't
accidentally get a singular matrix */
a = alloc_2d_double( n, n );
A= alloc_2d_double( n, n );
for( i = 0; i < n; i++ ){
for( j = 0; j < n; j++ ){
a[ i ][ j ] = ( ( double ) rand() ) / RAND_MAX;
}
a[ i ][ i ] = a[ i ][ i ] + n;
}
memcpy(A[0],a[0],n*n*sizeof(double)+1);
/* Allocate and initalise b */
b = alloc_1d_double( n );
B = alloc_1d_double( n );
C = alloc_1d_double( n );
for( i = 0; i < n; i++ ){
b[ i ] = 1;
}
cblas_dcopy(n,b,1,B,1);
/* the pivot array */
ipiv = alloc_1d_int( n );
/* Note we MUST pass pointers, so have to use a temporary var */
nrhs = 1;
/* Call the Fortran. We need one underscore on our system*/
dgesv_( &n, &nrhs, a[ 0 ], &n, ipiv, b, &n, &info );
/* Tell the world the results */
printf( "info = %i\n", info );
for( i = 0; i < n; i++ ){
printf( "%4i ", i );
printf( "%12.8f", b[ i ] );
printf( "\n" );
}
/* Want to check my lapack result with blas */
cblas_dgemv(CblasRowMajor,CblasTrans,n,n,1.0,A[0],1,B,1,0.0,C,1);
return 0;
}
Parameter 6 of DGEMV is the leading dimension (LDA), which needs to be at least as large as the number of columns (n) for a RowMajor matrix. You're passing an LDA of 1; pass n instead.
Separately, I’m slightly suspicious of your matrix types; without seeing how alloc_2d_double is implemented there’s no way to be sure if you’re laying out the matrix correctly or not. Generally speaking, intermixing pointer-to-pointer-style “matrices” with BLAS-style matrices (contiguous arrays with row or column stride) is something of a code smell. (However, it is possible to do correctly, and you may well be handling it properly; it’s just not possible to tell if this is the case from the code you posted).
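The role of the leading dimension described at the start of this answer can be sketched with a plain reference routine. This is not the CBLAS implementation: gemv_row_major is a hypothetical name, it computes the non-transposed product y = A * x for simplicity (the question's call uses CblasTrans), and its -1 return only mimics the kind of argument check that rejected the call.

```c
#include <stddef.h>

/* Hypothetical reference routine (not CBLAS): y = A * x for a row-major
   m-by-n matrix stored in one contiguous array, where consecutive rows
   start lda elements apart. Returns -1 when lda < n, because rows would
   then overlap, and 0 on success. */
static int gemv_row_major(size_t m, size_t n, const double *a, size_t lda,
                          const double *x, double *y) {
    if (lda < n) {
        return -1;
    }
    for (size_t i = 0; i < m; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < n; j++) {
            sum += a[i * lda + j] * x[j]; /* element (i, j) lives at i * lda + j */
        }
        y[i] = sum;
    }
    return 0;
}
```

For a tightly packed n-by-n matrix, lda equals n. A larger lda is only needed when the matrix is a view into a wider array, and a value of 1 with n > 1 can never describe a valid row-major layout.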