I'm currently using Erlang for a big project but i have a question regarding a proper proceeding.
I receive bytes over a tcp socket. The bytes are according to a fixed protocol, the sender is a pyton client. The python client uses class inheritance to create bytes from the objects.
Now i would like to (in Erlang) take the bytes and convert these to their equivelant messages, they all have a common message header.
How can i do this as generic as possible in Erlang?
Kind Regards,
Me
Pattern matching/binary header consumption using Erlang's binary syntax. But you will need to know either exactly what bytes or bits your are expecting to receive, or the field sizes in bytes or bits.
For example, let's say that you are expecting a string of bytes that will either begin with the equivalent of the ASCII strings "PUSH" or "PULL", followed by some other data you will place somewhere. You can create a function head that matches those, and captures the rest to pass on to a function that does "push()" or "pull()" based on the byte header:
operation_type(<<"PUSH", Rest/binary>>) -> push(Rest);
operation_type(<<"PULL", Rest/binary>>) -> pull(Rest).
The bytes after the first four will now be in Rest, leaving you free to interpret whatever subsequent headers or data remain in turn. You could also match on the whole binary:
operation_type(Bin = <<"PUSH", _/binary>>) -> push(Bin);
operation_type(Bin = <<"PULL", _/binary>>) -> pull(Bin).
In this case the "_" variable works like it always does -- you're just checking for the lead, essentially peeking the buffer and passing the whole thing on based on the initial contents.
You could also skip around in it. Say you knew you were going to receive a binary with 4 bytes of fluff at the front, 6 bytes of type data, and then the rest you want to pass on:
filter_thingy(<<_:4/binary, Type:6/binary, Rest/binary>>) ->
% Do stuff with Rest based on Type...
It becomes very natural to split binaries in function headers (whether the data equates to character strings or not), letting the "Rest" fall through to appropriate functions as you go along. If you are receiving Python pickle data or something similar, you would want to write the parsing routine in a recursive way, so that the conclusion of each data type returns you to the top to determine the next type, with an accumulated tree that represents the data read so far.
I only covered 8-bit bytes above, but there is also a pure bitstring syntax, which lets you go as far into the weeds with bits and bytes as you need with the same ease of syntax. Matching is a real lifesaver here.
Hopefully this informed more than confused. Binary syntax in Erlang makes this the most pleasant binary parsing environment in a general programming language I've yet encountered.
http://www.erlang.org/doc/programming_examples/bit_syntax.html
I'm looking at the stratum protocol and I'm having a problem with the nbits value of the mining.notify method. I have trouble calculating it, I assume it's the currency difficulty.
I pull a notify from a dogecoin pool and it returned 1b3cc366 and at the time the difficulty was 1078.52975077.
I'm assuming here that 1b3cc366 should give me 1078.52975077 when converted. But I can't seem to do the conversion right.
I've looked here, here and also tried the .NET function BitConverter.Int64BitsToDouble.
Can someone help me understand what the nbits value signify?
You are right, nbits is current network difficulty.
Difficulty encoding is throughly described here.
Hexadecimal representation like 0x1b3cc366 consists of two parts:
0x1b -- number of bytes in a target
0x3cc366 -- target prefix
This means that valid hash should be less than 0x3cc366000000000000000000000000000000000000000000000000 (it is exactly 0x1b = 27 bytes long).
Floating point representation of difficulty shows how much current target is harder than the one used in the genesis block.
Satoshi decided to use 0x1d00ffff as a difficulty for the genesis block, so the target was
0x00ffff0000000000000000000000000000000000000000000000000000.
And 1078.52975077 is how much current target is greater than the initial one:
$ echo 'ibase=16;FFFF0000000000000000000000000000000000000000000000000000 / 3CC366000000000000000000000000000000000000000000000000' | bc -l
1078.52975077482646448605
I'm using CocoaAsyncSocket for an iOS project. I'm trying to read VarInts through an asynchronous interface. The problem is unlike something else like a String, where I can prefix a length, I don't know the length of a varint beforehand. It needs to be processed one byte at a time, but since each read operation is asynchronous other read calls may have been queued in between.
I considered reading into a buffer then processing it, say reading 5 bytes (the max length for a varint-32), and pushing extra bytes back, but that may hang unnecessarily if the varint is only 4 bytes and I'm waiting for a 5th byte to be available.
How can I do this? Also, I cannot change the protocol on the other end, to use fixed size ints.
Here's a snippet of code as Josh requested
- (void)readByte:(void (^)(int8_t))onComplete {
NSUInteger size = 1;
int32_t tag = OSAtomicAdd32(1, &_nextTag);
dispatch_async(self.dispatchQueue, ^{
[self.onCompleteHandlers setObject:(^void (NSData* data) {
int8_t x = 0;
[data getBytes:&x length:size];
onComplete(x);
}) forKey:[NSNumber numberWithInteger:((NSInteger) tag)]];
[self.socket readDataToLength:size withTimeout:-1 tag:tag];
});
}
A callback is saved in a dictionary, which is used in the delegate method socket: didReadData: withTag.
Suppose I'm reading a VarInt byte by byte:
execute read first byte for varint
don't know if we need to read another byte for a varint or not; that depends on the result of the first read
(possible) read another byte for something else
read second byte for varint, but now it's actually the 3rd byte being read
I can imagine using a flag to indicate whether or not I'm in a multipart-read, and a queue to hold reads that should be executed after the multipart-read, and I've started writing it but it's quite messy. Just wondering if there is a standard/recommended/better way to approach this problem.
in short there are 4 ways to know how much to read from a socket...
read some format that you can infer the length from like the Content-Length header... only works if the whole request can be put together before the body is sent.
read until some pattern: like \r\n\r\n at the end of the headers
read until some timeout... after you get no bytes after n seconds you flush the buffers and close the connection.
read until the server closes the connection... actually used to be pretty common.
these each have problems and I would probably lean in your case from using some existing protocol.
of course there is overhead to doing it that way, and you may find that you don't want to use any of that application level stuff and your requests may be like:
client>"doMath(2+5)\0"
server>"(7)\0"
but it is hard to answer your general question specifically.
edit:
So I looked into the varint base-128 issue a little more and I think really only a timeout or the server closing the connection will work, if you are writing these right at the TCP level which is horrible...
I have scales equipped with RS232 serial port and a Bluetooth transmitter. I made a program in VBA to receive data from the scales. However, lets say out of 10 incoming strings I get 3 distorted. My regular strings look like: "+001500./3 G S". This means 1500.3 grams above zero and the output is stable. But sometimes I get strings like separated like "+" or "001500./3" or "G S". When I plug serial cable I have no distortions.
Serial ports are just byte streams. You can never make assumptions about how many of the bytes will show up on each read operation. It's only coincidence that when you use a real cable you read the whole string at once. You have to do the string splitting yourself, and continue reading when you only get a partial result.
I'm trying to write some reasonably generic networking code. I have several kinds of packets, each represented by a different struct. The function where all my sending occurs looks like:
- (void)sendUpdatePacket:(MyPacketType)packet{
for(NSNetService *service in _services)
for(NSData *address in [service addresses])
sendto(_socket, &packet, sizeof(packet), 0, [address bytes], [address length]);
}
I would really like to be able to send this function ANY kind of packet, not just MyPacketType packets.
I thought maybe if the function def was:
- (void)sendUpdatePacket:(void*)packetRef
I could pass in anykind of pointer to packet. But, without knowing the type of packet, I can't dereference the pointer.
How do I write a function to accept any kind of primitive/struct as its argument?
What you are trying to achieve is polymorphism, which is an OO concept.
So while this would be quite easy to implement in C++ (or other OO languages), it's a bit more challenging in C.
One way you could get around is it to create a generic "packet" structure such as this:
typedef struct {
void* messageHandler;
int messageLength;
int* messageData;
} packet;
Where the messageHandler member is a function pointer to a callback routine which can process the message type, and the messageLength and messageData members are fairly self-explanatory.
The idea is that the method which you pass the packetStruct to would use the Tell, Don't Ask principle to invoke the specific message handler pointer to by messageHandler, passing in the messageLength and messageData without interpreting it.
The dispatch function (pointed to by messageHandler) would be message-specific and will be able to cast the messageData to the appropriate meaningful type, and then the meaningful fields can be extracted from it and processed, etc.
Of course, this is all much easier and more elegant in C++ with inheritance, virtual methods and the like.
Edit:
In response to the comment:
I'm a little unclear how "able to cast
the messageData to the appropriate
meaningful type, and then the
meaningful fields can be extracted
from it and processed, etc." would be
accomplished.
You would implement a handler for a specific message type, and set the messageHandler member to be a function pointer to this handler. For example:
void messageAlphaHandler(int messageLength, int* messageData)
{
MessageAlpha* myMessage = (MessageAlpha*)messageData;
// Can now use MessageAlpha members...
int messageField = myMessage->field1;
// etc...
}
You would define messageAlphaHandler() in such a way to allow any class to get a function pointer to it easily. You could do this on startup of the application so that the message handlers are registered from the beginning.
Note that for this system to work, all message handlers would need to share the same function signature (i.e. return type and parameters).
Or for that matter, how messageData
would be created in the first place
from my struct.
How are you getting you packet data? Are you creating it manually, reading it off a socket? Either way, you need to encode it somewhere as a string of bytes. The int* member (messageData) is merely a pointer to the start of the encoded data. The messageLength member is the length of this encoded data.
In your message handler callback, you don't want probably don't want to continue to manipulate the data as raw binary/hex data, but instead interpret the information in a meaningful fashion according to the message type.
Casting it to a struct essentially maps the raw binary information on to a meaningful set of attributes matching to the protocol of the message you are processing.
The key is that you must realize that everything in a computer is just an array of bytes (or, words, or double words).
ZEN MASTER MUSTARD is sitting at his desk staring at his monitor staring at a complex pattern of seemingly random characters. A STUDENT approaches.
Student: Master? May I interrupt?
Zen Master Mustard: You have answered your own inquiry, my son.
S: What?
ZMM: By asking your question about interrupting me, you have interrupted me.
S: Oh, sorry. I have a question about moving structures of varying size from place to place.
ZMM: If that it true, then you should consult a master who excels at such things. I suggest, you pay a visit to Master DotPuft, who has great knowledge in moving large metal structures, such as tracking radars, from place to place. Master DotPuft can also cause the slightest elements of a feather-weight strain gage to move with the force of a dove's breath. Turn right, then turn left when you reach the door of the hi-bay. There dwells Master DotPuft.
S: No, I mean moving large structures of varying sizes from place to place in the memory of a computer.
ZMM: I may assist you in that endeavor, if you wish. Describe your problem.
S: Specifically, I have a c function that I want to accept several different types of structs (they will be representing different type of packets). So my struct packets will be passed to my function as void*. But without knowing the type, I can't cast them, or really do much of anything. I know this is a solvable problem, because sento() from socket.h does exactly that:
ssize_t sendto(int socket, const void *message, size_t length, int flags, const struct sockaddr *dest_addr,socklen_t dest_len);
where sendto would be called like:
sendto(socketAddress, &myPacket, sizeof(myPacket), Other args....);
ZMM: Did you describe your problem to Zen Master MANTAR! ?
S: Yeah, he said, "It's just a pointer. Everything in C is a pointer." When I asked him to explain, he said, "Bok, bok, get the hell out of my office."
ZMM: Truly, you have spoken to the master. Did this not help you?
S: Um, er, no. Then I asked Zen Master Max.
ZMM: Wise is he. What was his advice to you useful?
S: No. When I asked him about sendto(), he just swirled his fists in the air. It's just an array of bytes."
ZMM: Indeed, Zen Master Max has tau.
S: Yeah, he has tau, but how do I deal with function arguments of type void*?
ZMM: To learn, you must first unlearn. The key is that you must realize that everything in a computer is just an array of bytes (or, words, or double words). Once you have a pointer to the beginning of a buffer, and the length of the buffer, you can sent it anywhere without a need to know the type of data placed in the buffer.
S: OK.
ZMM: Consider a string of man-readable text. "You plan a tower that will pierce the clouds? Lay first the foundation of humility." It is 82 bytes long. Or, perhaps, 164 if the evil Unicode is used. Guard yourself against the lies of Unicode! I can submit this text to sendto() by providing a pointer to the beginning of the buffer that contains the string, and the length of the buffer, like so:
char characterBuffer[300]; // 300 bytes
strcpy(characterBuffer, "You plan a tower that will pierce the clouds? Lay first the foundation of humility.");
// note that sizeof(characterBuffer) evaluates to 300 bytes.
sendto(socketAddress, &characterBuffer, sizeof(characterBuffer));
ZMM: Note well that the number of bytes of the character buffer is automatically calculated by the compiler. The number of bytes occupied by any variable type is of a type called "size_t". It is likely equivalent to the type "long" or "unsinged int", but it is compiler dependent.
S: Well, what if I want to send a struct?
ZMM: Let us send a struct, then.
struct
{
int integerField; // 4 bytes
char characterField[300]; // 300 bytes
float floatField; // 4 bytes
} myStruct;
myStruct.integerField = 8765309;
strcpy(myStruct.characterField, "Jenny, I got your number.");
myStruct.floatField = 876.5309;
// sizeof(myStruct) evaluates to 4 + 300 + 4 = 308 bytes
sendto(socketAddress, &myStruct, sizeof(myStruct);
S: Yeah, that's great at transmitting things over TCP/IP sockets. But what about the poor receiving function? How can it tell if I am sending a character array or a struct?
ZMM: One way is to enumerate the different types of data that may be sent, and then send the type of data along with the data. Zen Masters refer to this as "metadata", that is to say, "data about the data". Your receiving function must examine the metadata to determine what kind of data (struct, float, character array) is being sent, and then use this information to cast the data back into its original type. First, consider the transmitting function:
enum
{
INTEGER_IN_THE_PACKET =0 ,
STRING_IN_THE_PACKET =1,
STRUCT_IN_THE_PACKET=2
} typeBeingSent;
struct
{
typeBeingSent dataType;
char data[4096];
} Packet_struct;
Packet_struct myPacket;
myPacket.dataType = STRING_IN_THE_PACKET;
strcpy(myPacket.data, "Nothing great is ever achieved without much enduring.");
sendto(socketAddress, myPacket, sizeof(Packet_struct);
myPacket.dataType = STRUCT_IN_THE_PACKET;
memcpy(myPacket.data, (void*)&myStruct, sizeof(myStruct);
sendto(socketAddress, myPacket, sizeof(Packet_struct);
S: All right.
ZMM: Now, just us walk along with the receiving function. It must query the type of the data that was sent and the copy the data into a variable declared of that type. Forgive me, but I forget the exact for of the recvfrom() function.
char[300] receivedString;
struct myStruct receivedStruct;
recvfrom(socketDescriptor, myPacket, sizeof(myPacket);
switch(myPacket.dataType)
{
case STRING_IN_THE_PACKET:
// note the cast of the void* data into type "character pointer"
&receivedString[0] = (char*)&myPacket.data;
printf("The string in the packet was \"%s\".\n", receivedString);
break;
case STRUCT_IN_THE_PACKET:
// note the case of the void* into type "pointer to myStruct"
memcpy(receivedStruct, (struct myStruct *)&myPacket.data, sizeof(receivedStruct));
break;
}
ZMM: Have you achieved enlightenment? First, one asks the compiler for the size of the data (a.k.a. the number of bytes) to be submitted to sendto(). You send the type of the original data is sent along as well. The receiver then queries for the type of the original data, and uses it to call the correct cast from "pointer to void" (a generic pointer), over to the type of the original data (int, char[], a struct, etc.)
S: Well, I'll give it a try.
ZMM: Go in peace.