Send heavy data through protobuf. Custom field - numpy

I'm developing an API for an application using protobuf and gRPC.
I need to send data of arbitrary size. Sometimes it is small, sometimes huge, for example a Numpy array. If the size is small I want to send it through protobuf; if the size is huge I want to dump the data into a file and send the path to that file through protobuf.
To do so I've created the following .proto messages:
message NumpyTroughProtobuf {
  repeated int32 shape = 1;
  repeated float array = 2;
}

message NumpyTroughfile {
  string filepath = 1;
}

message NumpyTrough {
  google.protobuf.Any data = 1;
}
The logic is simple: if the size is big I put a NumpyTroughfile into data; if it is small, a NumpyTroughProtobuf.
Problem (what I want to avoid):
The data transformation mechanism is part of my app.
In the current approach I have to check and convert the data before I create the NumpyTrough message, so I have to add logic to my application that takes care of checking and casting the data. I have to do the same in every language I use (for example if I send messages from Python to C++).
What I want to do:
The data transformation mechanism is part of a customized protobuf.
I want to hide the data transformation: my app should put a plain Numpy array into the NumpyTrough.data field, and all data transformation should be hidden.
So the data transformation logic should be part of a custom Protobuf field, not part of my application.
This means I would like to create a custom field type. I would implement the behavior of this field (marshal/unmarshal) once for each language I use. Then I can send Numpy data directly into this custom field, and the field decides how to proceed: turn the data into a file or use some other method, send it through Protobuf and restore it on the receiver side.
Something like this: https://github.com/gogo/protobuf/blob/master/custom_types.md but it seems this is not part of the protobuf ecosystem.

Protobuf only defines the schema.
You can't add logic to a protobuf definition.
Protobuf Any represents arbitrary binary data, so somewhere you'll need to explain to your users what it represents so that they can ship data in the correct format to your service.
You get to decide how to distribute the processing of the data:
Either partly client-side functionality that preprocesses the data and ships the output (either as structured data using non-Any types or, if still necessary, as Any).
Or partly server-side functionality that receives entirely unprocessed client-side data shipped through Any.
Or some combination of the two.
NOTE You may want to consider shipping the data as file references regardless of size, to simplify your implementation. You're correct to bias protobuf toward smaller message sizes but, depending on the file size distribution, does it make sense to complicate your implementation with two paths?
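For illustration, here is a minimal Python sketch of the client-side helper this kind of split could live in. It assumes the .proto above has been compiled to a module named numpy_trough_pb2; the module name, the helper names and the size cutoff are all assumptions, not something the answer prescribes.

# Hypothetical helpers: hide the inline-vs-file decision from the application.
import numpy as np
import numpy_trough_pb2 as pb  # assumed name of the generated module

INLINE_LIMIT_BYTES = 1 << 20   # assumed 1 MiB cutoff; tune to your deployment

def pack_numpy(array: np.ndarray, spill_path: str) -> pb.NumpyTrough:
    """Wrap a numpy array either inline or as a file reference."""
    msg = pb.NumpyTrough()
    if array.nbytes <= INLINE_LIMIT_BYTES:
        inner = pb.NumpyTroughProtobuf()
        inner.shape.extend(array.shape)
        inner.array.extend(array.astype(np.float32).ravel().tolist())
    else:
        with open(spill_path, "wb") as f:   # dump to disk, ship only the path
            np.save(f, array)
        inner = pb.NumpyTroughfile(filepath=spill_path)
    msg.data.Pack(inner)
    return msg

def unpack_numpy(msg: pb.NumpyTrough) -> np.ndarray:
    """Receiver-side counterpart of pack_numpy."""
    if msg.data.Is(pb.NumpyTroughProtobuf.DESCRIPTOR):
        inner = pb.NumpyTroughProtobuf()
        msg.data.Unpack(inner)
        return np.array(inner.array, dtype=np.float32).reshape(tuple(inner.shape))
    inner = pb.NumpyTroughfile()
    msg.data.Unpack(inner)
    return np.load(inner.filepath)

With helpers like these the application hands over a plain Numpy array and never sees the size check, which is roughly the "hidden transformation" the question asks for, only living in client code rather than inside protobuf itself.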

Related

GNU Radio block with variable number of inputs/outputs

I am currently trying to do the signal processing of multiple channels in parallel using a custom source block. So far I have created an OOT source block which streams data for only one channel into one output perfectly fine.
Now I am searching for a way to expand this module so that it can support a higher number of channels (= outputs of the source block; up to 64) in parallel. Because the protocol used to pull the samples pulls them all at once, it is not possible to use multiple instances of the same source block.
Things I have found so far:
A PDF which seems to explain exactly what to do, but it appears to be outdated and this functionality no longer seems to be supported in GNU Radio.
The description of a feature which should be implemented in the future.
Is there a known solution or workaround for this problem?
Look at the add block: it has a configurable number of inputs!
Now, the trick here is threefold:
define an io_signature as input and output that allows for adjustable numbers. If you use gr_modtool add to create a new block, your io_signatures will be filled with <+MIN_IN+>, <+MAX_IN+>, <+MIN_OUT+> and <+MAX_OUT+>. Adjust these to reflect your actual minimum and maximum in- and output port numbers. If you want to have 1 to infinity inputs, use 1, -1.
in your (general_)work method, check for the number of inputs by doing something like ninputs = input_items.size(), and for the number of outputs by doing noutputs = output_items.size().
(optionally, if you want to use GRC) modify the <sink>/<source> definitions in your block GRC XML:
<sink>
  <name>in</name>
  <type>complex</type>
  <nports>$num_inputs</nports>
</sink>
num_inputs could be a block parameter; compare the add_XX block source code.
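If you end up writing the block in Python rather than C++, the same idea applies: the output signature is just a list whose length you pick at construction time. A rough, hypothetical sketch (the block name and parameter below are made up, not taken from the add block):

import numpy as np
from gnuradio import gr

class multi_channel_source(gr.sync_block):
    """Source block with a construction-time configurable number of output ports."""

    def __init__(self, num_outputs=4):
        gr.sync_block.__init__(
            self,
            name="multi_channel_source",
            in_sig=None,                           # pure source: no inputs
            out_sig=[np.complex64] * num_outputs,  # one signature entry per output port
        )

    def work(self, input_items, output_items):
        noutputs = len(output_items)   # number of output ports on this instance
        for ch in range(noutputs):
            output_items[ch][:] = 0    # write this channel's samples here
        return len(output_items[0])

In GRC, a num_outputs block parameter would then drive the <nports> entry shown above.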

How to write a structure in kernel space to a file in user space using kernel module?

struct stud
{
    char name[10];
    int rno;
} s[10];
I want to send the data of the structure array s from a kernel module to a file in user space. One way is to combine the data into a string and send it through copy_to_user(), but that would then require tokenization to separate the data again in user space.
Please suggest an effective method.
@Gaurav, go through the link below and decide which mechanism to use to transfer data between kernel space and user space.
http://people.ee.ethz.ch/~arkeller/linux/kernel_user_space_howto.html
Hope the link helps solve your problem!

How to create a lazy-evaluated range from a file?

The File I/O API in Phobos is relatively easy to use, but right now I feel like it's not very well integrated with D's range interface.
I could create a range delimiting the full contents by reading the entire file into an array:
import std.file;
auto mydata = cast(ubyte[]) read("filename");
processData(mydata); // takes a range of ubytes
But this eager evaluation of the data might be undesired if I only want to retrieve a file's header, for example. The upTo parameter doesn't solve this issue if the file's format assumes a variable-length header or any other element we wish to retrieve. It could even be in the middle of the file, and read forces me to read all of the file up to that point.
But indeed, there are alternatives. readf, readln, byLine and most particularly byChunk let me retrieve pieces of data until I reach the end of the file, or just when I want to stop reading the file.
import std.stdio;
File file("filename");
auto chunkRange = file.byChunk(1000); // a range of ubyte[]s
processData(chunkRange); // oops! not expecting chunks!
But now I have introduced the complexity of dealing with fixed size chunks of data, rather than a continuous range of bytes.
So how can I create a simple input range of bytes from a file that is lazily evaluated, either by characters or by small chunks (to reduce the number of reads)? Can the range in the second example be seamlessly encapsulated so that the data can be processed like in the first example?
You can use std.algorithm.joiner:
auto r = File("test.txt").byChunk(4096).joiner();
Note that byChunk reuses the same buffer for each chunk, so you may need to add .map!(chunk => chunk.idup) to lazily copy the chunks to the heap.

Erlang binary protocol serialization

I'm currently using Erlang for a big project but I have a question regarding the proper way to proceed.
I receive bytes over a TCP socket. The bytes follow a fixed protocol; the sender is a Python client. The Python client uses class inheritance to create the bytes from its objects.
Now I would like to (in Erlang) take the bytes and convert them to their equivalent messages; they all have a common message header.
How can I do this as generically as possible in Erlang?
Pattern matching/binary header consumption using Erlang's binary syntax. But you will need to know either exactly what bytes or bits you are expecting to receive, or the field sizes in bytes or bits.
For example, let's say that you are expecting a string of bytes that will either begin with the equivalent of the ASCII strings "PUSH" or "PULL", followed by some other data you will place somewhere. You can create a function head that matches those, and captures the rest to pass on to a function that does "push()" or "pull()" based on the byte header:
operation_type(<<"PUSH", Rest/binary>>) -> push(Rest);
operation_type(<<"PULL", Rest/binary>>) -> pull(Rest).
The bytes after the first four will now be in Rest, leaving you free to interpret whatever subsequent headers or data remain in turn. You could also match on the whole binary:
operation_type(Bin = <<"PUSH", _/binary>>) -> push(Bin);
operation_type(Bin = <<"PULL", _/binary>>) -> pull(Bin).
In this case the "_" variable works like it always does -- you're just checking for the lead, essentially peeking the buffer and passing the whole thing on based on the initial contents.
You could also skip around in it. Say you knew you were going to receive a binary with 4 bytes of fluff at the front, 6 bytes of type data, and then the rest you want to pass on:
filter_thingy(<<_:4/binary, Type:6/binary, Rest/binary>>) ->
    %% do stuff with Rest based on Type; here we simply return both parts
    {Type, Rest}.
It becomes very natural to split binaries in function headers (whether the data equates to character strings or not), letting the "Rest" fall through to appropriate functions as you go along. If you are receiving Python pickle data or something similar, you would want to write the parsing routine in a recursive way, so that the conclusion of each data type returns you to the top to determine the next type, with an accumulated tree that represents the data read so far.
I only covered 8-bit bytes above, but there is also a pure bitstring syntax, which lets you go as far into the weeds with bits and bytes as you need with the same ease of syntax. Matching is a real lifesaver here.
Hopefully this informed more than confused. Binary syntax in Erlang makes this the most pleasant binary parsing environment in a general programming language I've yet encountered.
http://www.erlang.org/doc/programming_examples/bit_syntax.html

Can mrjob tasks output sets?

I tried outputting a Python set from a mapper in mrjob. I changed the function signatures of my combiners and reducers accordingly.
However, I get this error:
Counters From Step 1
Unencodable output:
TypeError: 172804
When I change the sets to lists, this error disappears. Are there certain Python types that cannot be output by mappers in mrjob?
Values are moved between stages of the MapReduce using Protocols, generally Raw, JSON or Pickle.
You must make sure that the values being moved around can be properly handled by the Protocol you pick. I would imagine that there's no default JSON representation of a set, and perhaps there's no raw representation either?
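A quick interpreter check confirms the JSON part of that guess (plain Python, nothing mrjob-specific):

import json, pickle

try:
    json.dumps({1, 2, 3})            # what a JSON-based protocol would have to do
except TypeError as err:
    print("JSON cannot encode a set:", err)

print(len(pickle.dumps({1, 2, 3})))  # pickle handles sets without complaint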
Try setting the INTERNAL_PROTOCOL to Pickle, like so:
from mrjob.job import MRJob
from mrjob.protocol import PickleProtocol

class yourMR(MRJob):
    INTERNAL_PROTOCOL = PickleProtocol

    def mapper(self, key, value):
        ...  # emit (key, set) pairs here

    def reducer(self, key, values):
        ...  # sets arrive intact because they were pickled between steps
Note: MRJob will handle pickling and unpickling for you, so don't worry about that aspect. You can also set the INPUT and OUTPUT protocols if necessary (for example, when there are multiple stages, or when the reducer itself emits sets).