How to convert std::vector<float> to a tensor without a copy in TensorFlow in C++?

In C++, a multidimensional matrix is stored in a std::vector<float>. I need to use it in TensorFlow, which works with tensors. The conversion from a std::vector to a tensor is not obvious. There is a C API (c_api) that converts a vector to a TF_Tensor rather than a Tensor. std::copy also works, but I want to perform the conversion without a copy.

TensorFlow now has a way to do this in the C++ API: provide your own tensorflow::TensorBuffer and use the following constructor:
#include <tensorflow/core/framework/tensor.h>
#include <tensorflow/core/framework/types.pb.h>
...
tensorflow::Tensor(tensorflow::DataType type, const TensorShape & shape, TensorBuffer *buf)
Since tensorflow::TensorBuffer is an abstract class, you'll need to subclass it and implement a few methods yourself (that said, it's fairly easy to do). One thing to note: OwnsMemory() returns false in the example below. If you want to use manual memory management (malloc/free or new/delete), you can have it return true and release the memory in your own destructor. That said, since you're using a vector I'd just leave it returning false and take care that the vector does not go out of scope (or reallocate) while the tensor is still in use. When the vector is destroyed, it frees its own internal memory anyway.
e.g.:
class MyBuffer : public tensorflow::TensorBuffer {
  std::size_t len_;
 public:
  // The base class is initialized before the members, so it is listed first.
  MyBuffer(void* data, std::size_t len) : tensorflow::TensorBuffer(data), len_(len) {}
  // Returns how many bytes we have in our buffer.
  std::size_t size() const override { return len_; }
  // Needed so TF knows this isn't a child of some other buffer.
  TensorBuffer* root_buffer() override { return this; }
  // Not actually sure why we need this, but it lets TF know where the memory for this tensor came from.
  void FillAllocationDescription(tensorflow::AllocationDescription* proto) const override {}
  // A value of false indicates this TensorBuffer does not own the underlying data.
  bool OwnsMemory() const override { return false; }
};
Then you just need to provide the correct tensorflow::DataType (e.g. tensorflow::DT_FLOAT) and a tensorflow::TensorShape (you can just instantiate it and add each dimension using <TensorShape>.AddDim(<the dimension>)). You could modify the above by storing the std::vector inside and exposing its contents via .data() and a void* cast, giving MyBuffer a constructor that takes a vector. Or you could just do that yourself outside of MyBuffer.
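Because the buffer length is given in bytes and the shape in elements, it may be worth checking that the two agree before wrapping the vector. The following is a minimal sketch in plain C++ (no TensorFlow headers needed); the helper names NumElements, ByteLength and ShapeMatches are made up for this illustration and are not part of the TensorFlow API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of elements implied by a list of dimensions.
std::int64_t NumElements(const std::vector<std::int64_t>& dims) {
  std::int64_t n = 1;
  for (std::int64_t d : dims) n *= d;
  return n;
}

// Hypothetical helper: the byte count that would be passed as `len` to MyBuffer.
std::size_t ByteLength(const std::vector<float>& v) {
  return v.size() * sizeof(float);
}

// True when a vector of this size can back a tensor with these dimensions.
bool ShapeMatches(const std::vector<float>& v,
                  const std::vector<std::int64_t>& dims) {
  return NumElements(dims) == static_cast<std::int64_t>(v.size());
}
```

With these, the arguments for the buffer would be static_cast<void*>(v.data()) and ByteLength(v), and each entry of dims would go to AddDim in order.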

Related

Is that an in or in/out parameter? Doxygen, C++

If a pointer is passed to a function for read only, then this pointer is an IN parameter.
If a pointer is passed to a function for read only, but this function makes a copy of the pointer to have access to it in module related functions for read only operations, this pointer is still IN.
If the function still uses the pointer as read only, but the other module related functions use the pointer for write operations, what does that make the pointer?
An IN parameter, but without const? An in/out parameter?
Example of what I mean:
class SteeringWheel {
public: float rotation;
public: SteeringWheel(void) {
this->rotation = 0.f;
}
};
class Car {
private: SteeringWheel *steeringWheel;
public:
/**
* @param[?] steeringWheel Is the steering wheel in or in/out?
*/
Car (SteeringWheel *steeringWheel) {
this->steeringWheel = steeringWheel;
}
/**
* @param[in] degree Steering amount in degrees.
*/
void steer(float degree)
{
this->steeringWheel->rotation += degree;
}
};
int main(int argc, char **argv)
{
SteeringWheel steeringWheel; /* no parentheses: "steeringWheel()" would declare a function */
/* car() uses steeringWheel as read only. */
Car car(&steeringWheel);
/* steer() uses steeringWheel from car() to write. */
car.steer(50.f);
return 0;
}
I believe that the in and out specifiers do not exactly mean what you think. From the doxygen documentation of the param tag:
The \param command has an optional attribute, (dir), specifying the
direction of the parameter. Possible values are "[in]", "[in,out]",
and "[out]", note the [square] brackets in this description. When a
parameter is both input and output, [in,out] is used as attribute.
The direction of the parameter usually means the following:
in: The parameter is injected into the function as input, but not written to.
out: The parameter is injected into the function, but not as input. Rather, it is written to by the function.
in, out: The parameter is injected into the function as input and is eventually written to by the function.
In your example:
/**
* @param[?] steeringWheel Is the steering wheel in or in/out?
*/
Car (SteeringWheel *steeringWheel) {
this->steeringWheel = steeringWheel;
}
I think the steeringWheel parameter is in because you inject it and use it in your method. However, you never write to it (i.e. to the parameter itself), so it is not out. In other words, you only use your method to inject an address into your function, nothing else. The same applies to your second method, where you inject the degree parameter but never write to it.
To clarify a bit more on the meaning of in and out, here is an example of an out parameter:
/**
* @param[out] p_param We write to the parameter!
*/
void makeFour(int * p_param)
{
*p_param = 4; // Out because of this line!
}
Notice that we write a new value directly into the parameter. This is the meaning of out: information comes out of the method through the parameter. You can now write:
#include <iostream>
int main()
{
int myInt = 0;
std::cout << myInt; // prints 0.
makeFour(&myInt); // p_param == &myInt here.
std::cout << myInt; // prints 4: the method wrote directly
// in the parameter (out)!
return 0;
}
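For completeness, an in,out parameter would be one that the function both reads and writes. A minimal sketch, using a made-up function doubleInPlace that is not part of the question's code:

```cpp
#include <cassert>

/**
 * @param[in,out] p_value Read as input, then overwritten with twice its value.
 */
void doubleInPlace(int* p_value)
{
    assert(p_value != nullptr);   // the caller must pass a valid address
    *p_value = *p_value * 2;      // reads through the pointer (in) and writes back (out)
}
```

Here the value the caller passes in matters to the result, and the caller's variable has changed once the call returns, which is what distinguishes it from the pure out case of makeFour.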
Hope this helps!
It is not easy to decide, but I would still mark your parameter as in,out (or out), as it is a pointer to a non-const object, and you may change the state of that outside object directly or indirectly later - as in your example.
Marking it in hides the detail that the pointed SteeringWheel object may change later upon usage of Car.
Also, it can puzzle users why an input only pointer parameter is not marked const.
Marking it in,out may not be completely accurate, but it is surely less error prone.
An alternative could be something like the following (a note regarding the lifetime of the SteeringWheel would come in handy here anyway):
/**
* @param[in] steeringWheel Pointer to the SteeringWheel object.
* @warning The memory address of the pointed-to object is saved.
* It must outlive this object, and can change upon usage of this object.
*/
Car (SteeringWheel *steeringWheel) {
this->steeringWheel = steeringWheel;
}
But I would probably just stick with marking it in,out.
Specifying the direction of parameters in C++ may be complicated, and frankly speaking, I am not much in favor of it, as the tokens for pointers and references, together with the const keyword, already provide enough information in the signature on how a parameter may be used. Thus, marking it in the DoxyPress documentation is a bit redundant, not expressive enough (as your example shows), and may get out of sync with the implementation. Documenting parameter directions may play a bigger role in other languages that lack these additional constructs in function signatures.
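To illustrate the point about signatures carrying the direction, here is a small sketch. The function names countSpaces and trimTrailingSpaces are invented for this example; the only claim is that const and the reference type already tell the reader which way data flows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// const reference: the signature alone says "input only".
std::size_t countSpaces(const std::string& text)
{
    std::size_t n = 0;
    for (char c : text) {
        if (c == ' ') ++n;
    }
    return n;  // the result leaves through the return value, not a parameter
}

// Non-const reference: the caller has to expect the argument to be modified.
void trimTrailingSpaces(std::string& text)
{
    while (!text.empty() && text.back() == ' ') {
        text.pop_back();
    }
}
```

What a signature cannot express is the case from the question, where a pointer is stored and written through later; that is where a lifetime note in the documentation still earns its place.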

How to dismember structure data and operators?

I want to build an algebraic system, so I need a carrier, which is basically some data type, and a bunch of operators over that type. It is natural for algebras to differ in signature, meaning the same type might have different sets of operators with the same notation.
Say I have a vector type. Normally I would use the Euclidean metric and norm for it, so I import vector, euclidean, where vector contains the data declaration for the vector type, but all the overloaded operators for the same vector go to euclidean. Then when I want to work with a Riemannian space I simply import vector, riemanian and get a completely different algebra with the same interface.
I know this can be achieved in the object-oriented paradigm via inheritance, but maybe it is possible to do it with plain modules? All I need is to declare the data in one module and the operators in another, all for the same structure.
Two possibilities come to mind. One is using UFCS, defining named functions (it won't work for the operator overloads) in other modules that take the type as the first parameter, then are callable with dot syntax (forgive me if I mess up the math here):
module myvector;
struct vector {
float x;
float y;
}
module myvectormath;
import myvector;
vector add(vector lhs, vector rhs) {
// inside, it is just a regular function
vector result;
result.x = lhs.x + rhs.x;
result.y = lhs.y + rhs.y;
return result;
}
usage:
import myvector;
import myvectormath;
// but it can be called with dot notation
vector a = vector(0,0).add(vector(5, 5));
Another possible way is to put the data in a struct or a mixin template, then make the math by putting that in another struct with the needed functions:
// data definition
module myvector;
// the data will be an external named type, so we can pass it on more easily - will help interop
struct VectorData {
float x;
float y;
}
// and this provides the stuff to get our other types started
mixin template vector_payload() {
// constructors for easy initialization
this(float x, float y) {
_data.x = x;
_data.y = y;
}
this(VectorData d) {
_data = d;
}
// storing our data
VectorData _data;
// alias this is a feature that provides a bit of controlled implicit casting..
alias _data this;
}
// math module #1
module myvectormath;
import myvector;
struct vector {
// mixin all the stuff from above, so we get those ctors, the data, etc.
mixin vector_payload!();
// and add our methods, including full operator overloading
vector opBinary(string op:"+")(vector rhs) {
vector result;
result.x = this.x + rhs.x;
result.y = this.y + rhs.y;
return result;
}
}
// math module #2
module myvectormath2;
import myvector;
struct vector {
// again, mix it in
mixin vector_payload!();
// and add our methods
vector opBinary(string op:"+")(vector rhs) {
vector result;
// this one has horribly broken math lol
result.x = this.x - rhs.x;
result.y = this.y - rhs.y;
return result;
}
}
// usage
import myvectormath;
// OR
//import myvectormath2;
void main() {
vector a = vector(0, 0) + vector(5, 5);
import std.stdio;
writeln(a);
}
In the usage module, if you just replace imports, the rest of the code remains unmodified. What happens, though, if you want to use both modules at once and intermix them? That's where the external struct VectorData, the constructor taking it, and the alias this magic come in. First, we'll import both and see what happens:
test32.d(23): Error: myvectormath.vector at test324.d(4) conflicts with myvectormath2.vector at test322.d(4)
So, first, we want to disambiguate the name. There's all kinds of ways to do this, you can learn more in the import section of the D docs: http://dlang.org/module.html#Import
For now, I'm going to just use the fully qualified name.
// usage
import myvectormath;
import myvectormath2;
void main() {
// specify the kind we want to use here...
myvectormath.vector a = myvectormath.vector(0, 0) + myvectormath.vector(5, 5);
import std.stdio;
writeln(a); // and we get a result of 5, 5, so it used the addition version correctly
}
How can we easily move them around internally? Let's make a function that uses version #2:
void somethingWithMath2(myvectormath2.vector vec) {
// whatever
}
It will complain if you pass the variable "a" to it because it is myvectormath.vector, and this is myvectormath2.
test32.d(27): Error: function test32.somethingWithMath2 (vector a) is not callable using argument types (vector)
But, we can pretty easily convert them thanks to the external data struct, the ctor, and alias this in the mixin template:
somethingWithMath2(myvectormath2.vector(a));
Compiles! The way that works under the hood is myvectormath2.vector has two constructors: (float, float) and (VectorData). Neither of them match the type of a, so next it tries a's alias this... which is VectorData. So it implicitly converts and then matches the VectorData ctor.
You could also just pass the data around:
import myvector;
import myvectormath2; // needed so the fully qualified name below resolves
void somethingWithMath2(VectorData a_in) {
// to do math on it, we construct the kind of vectormath we're interested in:
auto a = myvectormath2.vector(a_in);
// and use it
}
And then call it this way:
// will implicitly convert any of the sub vectormath types to the base data so this just works
somethingWithMath2(a);
Passing around the data would probably be most nice, since then the caller doesn't need to know what kind of stuff you'll be doing with it.
The constructor it uses here is trivial by the way, and shouldn't incur significant runtime loss (possibly none at all if the compiler switch is set to inline it; this is basically just a reinterpret_cast; the data representation is identical).
Note that it will not let you add myvectormath2.vector + myvectormath.vector, that will be a type mismatch. But if you do want to allow that, all you have to do is change the overloaded operator to accept VectorData instead of one of the math types! Then it will implicitly convert and you have the same data to work on. Think of VectorData as being a base class in OOP terms.
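For readers more at home in C++, the same idea can be sketched there as a rough analogue only; it is not a translation of the D code. The namespaces euclid and alt are made-up names standing in for the two math modules, and inheritance from VectorData plays the role of alias this. Each operator accepts the shared data type, so vectors from either algebra can appear on the right-hand side.

```cpp
#include <cassert>

// Shared carrier: plain data with no operators of its own.
struct VectorData {
    float x;
    float y;
};

namespace euclid {  // hypothetical "math module #1"
struct Vector : VectorData {
    Vector(float x, float y) : VectorData{x, y} {}
    explicit Vector(const VectorData& d) : VectorData(d) {}
    // Taking the base type lets any algebra's vector be the right operand.
    Vector operator+(const VectorData& rhs) const {
        return Vector(x + rhs.x, y + rhs.y);
    }
};
}  // namespace euclid

namespace alt {  // hypothetical "math module #2" with deliberately different math
struct Vector : VectorData {
    Vector(float x, float y) : VectorData{x, y} {}
    explicit Vector(const VectorData& d) : VectorData(d) {}
    Vector operator+(const VectorData& rhs) const {
        return Vector(x - rhs.x, y - rhs.y);
    }
};
}  // namespace alt
```

The left operand decides which algebra is used, and converting between the two types is an explicit construction from the shared data, much like myvectormath2.vector(a) above.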
I think that covers the bases, let me know if you have any further questions.

What kind of pointer is returned if I use "&" to get the address of a value type in C++/CLI?

Suppose I write the following code:
public ref class Data
{
public:
Data()
{
}
Int32 Age;
Int32 Year;
};
void Test()
{
int age = 30;
Int32 year = 2010;
int* pAge = &age;
int* pYear = &year;
Data^ data = gcnew Data();
int* pDataYear = &data->Year; // &data->Year is an interior pointer, so the compiler reports an error
}
If you compile the program, the compiler reports an error:
error C2440: 'initializing' : cannot convert from 'cli::interior_ptr' to 'int *'
So I learned that "&data->Year" is an interior pointer.
UPDATES: I tried to use "&(data->Year)", same error.
But how about pAge and pYear?
Are they native pointers, interior pointers or pinned pointers??
If I want to use them in the following native function:
void ChangeNumber(int* pNum);
Will it be safe to pass either pAge or pYear?
They (pAge and pYear) are native pointers, and passing them to a native function is safe. Stack variables (locals with automatic storage lifetime) are not subject to being rearranged by the garbage collector, so pinning is not necessary.
Copying managed data to the stack, then passing it to native functions, solves the gc-moving-managed-data-around problem in many cases (of course, don't use it in conjunction with callbacks that expect the original variable to be updated before your wrapper has a chance to copy the value back).
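The copy-in, copy-out pattern described above can be sketched as follows. This is plain C++ so that it stands alone: Data is an ordinary struct standing in for the ref class, the body of ChangeNumber is invented, and UpdateYear is a hypothetical wrapper. In real C++/CLI code the parameter would be a handle (Data^) and the field access would use ->, but the three steps would be the same.

```cpp
#include <cassert>

// Native function that expects an ordinary pointer (body is made up here).
void ChangeNumber(int* pNum)
{
    *pNum += 1;
}

// Stand-in for the managed class from the question.
struct Data {
    int Age = 0;
    int Year = 0;
};

void UpdateYear(Data& data)
{
    int year = data.Year;   // 1. copy the field into a local on the stack
    ChangeNumber(&year);    // 2. the address of a local is a native pointer
    data.Year = year;       // 3. copy the result back into the object
}
```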
To get a native pointer to managed data, you have to use a pinning pointer. This can be slower than the method of copying the value to the stack, so use it for large values or when you really need the function to operate directly on the same variable (e.g. the variable is used in callbacks or multi-threading).
Something like:
pin_ptr<int> p = &mgd_obj.field;
See also the MSDN documentation

Creating a global "null" struct for re-use in C program?

Not sure what I'm doing wrong here. I have a struct that is used heavily through my program.
typedef struct _MyStruct {
// ... handful of non-trivial fields ...
} MyStruct;
I expect (read, intend) for lots of parts of the program to return one of these structs, but many of them should be able to return a "null" struct, which is a singleton/global. The exact use case is for the implementing function to say "I can't find what you asked me to return".
I assumed this would be a simple case of defining a variable in a header file, and initializing it in the .c file.
// MyStruct.h
// ... Snip ...
MyStruct NotFoundStruct;
-
// MyStruct.c
NotFoundStruct.x = 0;
NotFoundStruct.y = 0;
// etc etc
But the compiler complains that the initialization is not constant.
Since I don't care about what this global actually references in memory, I only care that everything uses the same global, I tried just removing the initialization and simply leaving the definition in the header.
But when I do this:
MyStruct thing = give_me_a_struct(some_input);
if (thing == NotFoundStruct) {
// ... do something special
}
The compiler complains that the operands to the binary operator "==" (or "!=") are invalid.
How does one define such a globally reusable (always the same memory address) struct?
This doesn't directly answer your question, but it won't fit in a comment...
If you have a function that may need to return something or return nothing, there are several options that are better than returning a "null struct" or "sentinel struct," especially since structs are not equality comparable in C.
One option is to return a pointer, so that you can actually return NULL to indicate that you are really returning nothing; this has the disadvantage of having significant memory management implications, namely who owns the pointer? and do you have to create an object on the heap that doesn't already exist on the heap to do this?
A better option is to take a pointer to a struct as an "out" parameter, use that pointer to store the actual result, then return an int status code indicating success or failure (or a bool if you have a C99 compiler). This would look something like:
int give_me_a_struct(MyStruct*);
MyStruct result;
if (give_me_a_struct(&result)) {
// yay! we got a result!
}
else {
// boo! we didn't get a result!
}
If give_me_a_struct returns zero, it indicates that it did not find the result and the result object was not populated. If it returns nonzero, it indicates that it did find the result and the result object was populated.
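A fuller sketch of this pattern, with invented details: the two int fields, the lookup table kTable and the extra wanted_x argument are assumptions made only so the example is complete. It is written in C style but compiled as C++ here; in C99 you would include <stdbool.h> for bool.

```cpp
#include <cassert>
#include <cstddef>

typedef struct MyStruct {
    int x;  /* hypothetical fields, for illustration only */
    int y;
} MyStruct;

/* Hypothetical table standing in for wherever the real data lives. */
static const MyStruct kTable[] = { {1, 10}, {2, 20}, {3, 30} };

/* Returns true and fills *out when an entry with the given x exists;
   returns false and leaves *out untouched otherwise. */
bool give_me_a_struct(int wanted_x, MyStruct* out)
{
    assert(out != NULL);
    for (std::size_t i = 0; i < sizeof kTable / sizeof kTable[0]; ++i) {
        if (kTable[i].x == wanted_x) {
            *out = kTable[i];
            return true;
        }
    }
    return false;
}
```

The caller owns the storage for the result, so no heap allocation and no sentinel object are needed.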
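C doesn't allow assignment statements at file scope (outside of any function); a global can only be given a constant initializer in its definition. So you must do the assignments in a function: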
void init() {
NotFoundStruct.x = 0;
NotFoundStruct.y = 0;
}
As for the comparison, C doesn't know how to apply a == operator to a struct. You can overload (redefine) the operator in C++, but not in C.
So to see if a return value is empty, your options are to:
1. Have each function return a boolean value to indicate found or not, and return the struct's values via pointers through the argument list. (e.g. bool found = give_me_a_struct(some_input, &thing);)
2. Return a pointer to a struct, which can be NULL if nothing exists. (e.g. MyStruct* thing = give_me_a_struct(some_input);)
3. Add an additional field to the struct that indicates whether the object is valid.
The third option is the most generic for other cases, but requires more data to be stored. The best bet for your specific question is the first option.
// MyStruct.h
typedef struct _MyStruct {
// fields
} MyStruct;
extern MyStruct NotFoundStruct;
// MyStruct.c
#include "MyStruct.h"
MyStruct NotFoundStruct = {0};
But since you can't use the == operator, you will have to find another way to distinguish it. One (not ideal) way is to have a bool flag reserved to indicate validity. That way, only that must be checked to determine if it's a valid instance.
But I think you should consider James's proposed solution instead
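If the functions return a pointer to the struct rather than a copy, the shared global can still act as the "not found" marker, because addresses can be compared even though structs cannot. This is a sketch under assumptions: the fields, the table kItems and the names find_struct and is_not_found are invented for the example, and everything is kept in one file for brevity.

```cpp
#include <cassert>
#include <cstddef>

typedef struct MyStruct {
    int x;  /* hypothetical fields, for illustration only */
    int y;
} MyStruct;

/* Defined once; in a multi-file program the header would declare it extern. */
const MyStruct NotFoundStruct = {0, 0};

/* Hypothetical data to search. */
static const MyStruct kItems[] = { {1, 10}, {2, 20} };

/* Returns a pointer to the matching item, or to the sentinel when there is none. */
const MyStruct* find_struct(int wanted_x)
{
    for (std::size_t i = 0; i < sizeof kItems / sizeof kItems[0]; ++i) {
        if (kItems[i].x == wanted_x) {
            return &kItems[i];
        }
    }
    return &NotFoundStruct;
}

/* Identity check: compares addresses, never field values. */
bool is_not_found(const MyStruct* s)
{
    return s == &NotFoundStruct;
}
```

Because the check is by address, a real item whose fields happen to be all zero is not mistaken for the sentinel.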
In the header:
// Structure definition then
extern MyStruct myStruct;
In the .c that contains global data
MyStruct myStruct =
{
/* initializer for field 1 */,
/* initializer for field 2 */,
// etc...
};

AutoPtr in C++/CLI mixed mode

I have a C++/CLI wrapper around native .lib and .h files. I use the AutoPtr class pretty extensively in the wrapper class to manage the unmanaged objects I create for wrapping. I have hit a roadblock with the copy constructor/assignment operator.
Using the AutoPtr class from Mr. Kerr: http://weblogs.asp.net/kennykerr/archive/2007/03/26/AutoPtr.aspx
He suggests the following (in the comments) to recreate the behavior of the assignment operator:
SomeManagedClass->NativePointer.Reset(new NativeType);
Which I believe is true. But when I compile my code:
ByteMessageWrap (const ByteMessageWrap% rhs)
{
AutoPtr<ByteMessage> m_NativeByteMessage(rhs.m_NativeByteMessage.GetPointer());
};
ByteMessageWrap% operator=(const ByteMessageWrap% rhs)
{
//SomeManagedClass->NativePointer.Reset(new NativeType);
if (this == %rhs) // prevent assignment to self
return *this;
this->m_NativeByteMessage.Reset(rhs.m_NativeByteMessage.GetPointer());
return *this;
};
-- I get the following errors:
error C2662:
'WrapTest::AutoPtr::GetPointer' :
cannot convert 'this' pointer from
'const WrapTest::AutoPtr' to
'WrapTest::AutoPtr %'
Has anyone experienced similar issues?
For further background on the answer, I removed the "const" keyword from the signature. I know that is not smiled upon in terms of code correctness for a copy ctor, but the CLR doesn't like it at all -- sort of belies the CLR at its core with memory management.
I wonder if it's possible to leave the const in the signature and then use GCHandle or pin_ptr to make sure memory doesn't move on you while performing the copy?
Looking at Kenny Kerr's AutoPtr, it transfers ownership in its constructor -- essentially a "move" constructor rather than a copy constructor. This is analogous with std::auto_ptr.
If you really want to transfer ownership from rhs to this (i.e. leave rhs without its NativeByteMessage), you need to change your copy ctor into a move ctor.
Also, you need to use member-initializer syntax:
// warning - code below doesn't work
ByteMessageWrap (ByteMessageWrap% rhs)
: m_NativeByteMessage(rhs.m_NativeByteMessage) // take ownership
{
}
ByteMessageWrap% operator=(ByteMessageWrap% rhs)
{
//SomeManagedClass->NativePointer.Reset(new NativeType);
if (this == %rhs) // prevent assignment to self
return *this;
m_NativeByteMessage.Reset(rhs.m_NativeByteMessage.Release());
return *this;
}
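The ownership transfer itself can be illustrated in standard C++ with std::unique_ptr, the modern replacement for std::auto_ptr. This is only an analogy to the C++/CLI code above: ByteMessage here is an empty stand-in for the native type, and ByteMessageHolder is a made-up wrapper, not Kenny Kerr's AutoPtr.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct ByteMessage {      // hypothetical stand-in for the native type
    int length = 0;
};

class ByteMessageHolder {  // hypothetical wrapper, standard C++ only
public:
    explicit ByteMessageHolder(ByteMessage* p) : msg_(p) {}

    // "Move" construction: takes ownership and leaves rhs empty.
    ByteMessageHolder(ByteMessageHolder&& rhs) noexcept
        : msg_(std::move(rhs.msg_)) {}

    // Same shape as Reset(rhs.Release()) in the answer above.
    ByteMessageHolder& operator=(ByteMessageHolder&& rhs) noexcept {
        if (this != &rhs) {            // prevent assignment to self
            msg_.reset(rhs.msg_.release());
        }
        return *this;
    }

    bool empty() const { return msg_ == nullptr; }

private:
    std::unique_ptr<ByteMessage> msg_;
};
```

Note that the source object is modified in both operations, which is why a const rhs cannot work for this kind of transfer.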