I am using the TRX library to process the ISO8583 Message. I am receiving a Raw data EOF character. But the last byte is not removed from the buffer as it's not defined in the packager and it's causing an issue in parsing the next transaction. How to manage this?
And while sending a response back how to add EOF character?
Normally, this kind of stuff (protocol characters) is removed before decoding ISO8583 data.
For example, you read 100 bytes from the socket. And it's ISO data + EOF character. You would remove EOF character and process 99 bytes of ISO data through a decoder.
And reverse is true when you send data. You encode your data first then add EOF character. The resulting byte array goes into the socket.
Sorry, I don't know anything about TRX library, but, hopefully, general advice helps you somewhat.
Related
So I have a function I'm using to read data from a file. It works fine if the file is plain text, but when I try to read a binary file, like a png, it returns a different text (diff confirms that). I opened a hex editor to see what was wrong and found out it is putting some c2 bytes along with the file (I don't know if the position is random or if there are other bytes except this c2 one).
This is my function. I just want it to read and save to a variable.
proc read_file {path} {
set channel [open $path r]
fconfigure $channel -translation binary
set return_string "[read $channel]"
close $channel
return "$return_string"
}
To actually print, I'm doing this:
puts -nonewline [read_file file.png]
When you open a file, it defaults to being in text mode . In text mode (which is really a combination of options) the IO layer translates characters from whatever encoding they are in into Tcl's internal encoding, and does the reverse operation on output. The default encoding scheme is platform specific, but in your case it sounds like it is UTF-8. (Tcl uses a complex internal system of encodings; it doesn't expose those to the outside world.)
By contrast, when you put the channel into binary mode, the bytes on the outside are directly mapped to characters in the range 0-255 (and vice versa on output). You get a perfect copy, provided you put both input and output channels in binary mode. (There are other optimisations for binary mode, but they don't matter here.)
When you only put one of the channels in binary mode, you get what looks like corruption. It isn't random though. In particular, when the input is binary but the output is UTF-8, input bytes in the range 128-255 get converted into multiple output bytes, where the first of those bytes is in the sort of range you observed. There are other combinations that mess things up; the whole range of problems is collectively known as mojibake.
tl;dr Don't mix up binary and text data unless you're very careful. The results of getting it wrong are "surprising".
I store a byte array with 5 bytes in a Redis entry. Writing and reading using a client library works and expected, but when I try to read the value in a Redis console I get something I don't know how to interpret:
>get keyHere
"\x02\x8e\x8b\x0cb"
There is something I clearly don't understand because \x0cb is not a hex value for a byte and there are only 4 \x (and I expected 5 for 5 bytes).
Confused, I decided to performed an experiment. I educated myself about how to set raw bytes; I set an entry's value to "\x01\x07" and read it back. I expected "\x01\x07" but the read value is shown as "\x01\a".
>set "3" "\x01\x07"
OK
>get 3
"\x01\a"
How should I read entries in a Redis cache in the Redis console to see raw bytes?
If the byte is not printable, redis-cli prints the hex format, otherwise, it prints the c-escpaped sequence.
because \x0cb is not a hex value for a byte and there are only 4 \x (and I expected 5 for 5 bytes)
The first 4 bytes are not printable, so they are printed as hex format. The last byte is the b, which is printable.
I expected "\x01\x07" but the read value is shown as "\x01\a".
\x07's c-escaped sequence is \a, and is printable.
How should I read entries in a Redis cache in the Redis console to see raw bytes?
If you need the raw bytes (which might not be printable), you can specify the --raw option when running redis-cli.
I'm rewriting my P5 socket server in P6 using IO::Socket::Async, but the data received got truncated 1 character at the end and that 1 character is received on the next connection. Someone from Perl6 Facebook group (Jonathan Worthington) pointed that this might be due to the nature of strings and bytes are handled very differently in P6. Quoted:
In Perl 6, strings and bytes are handled very differently. Of note, strings work at grapheme level. When receiving Unicode data, it's not only possible that a multi-byte sequence will be split over packets, but also a multi-codepoint sequence. For example, one packet might have the letter "a" at the end, and the next one would be a combining acute accent. Therefore, it can't safely pass on the "a" until it's seen how the next packet starts.
My P6 is running on MoarVM
https://pastebin.com/Vr8wqyVu
use Data::Dump;
use experimental :pack;
my $socket = IO::Socket::Async.listen('0.0.0.0', 7000);
react {
whenever $socket -> $conn {
my $line = '';
whenever $conn {
say "Received --> "~$_;
$conn.print: &translate($_) if $_.chars ge 100;
$conn.close;
}
}
CATCH {
default {
say .^name, ': ', .Str;
say "handled in $?LINE";
}
}
}
sub translate($raw) {
my $rawdata = $raw;
$raw ~~ s/^\s+|\s+$//; # remove heading/trailing whitespace
my $minus_checksum = substr($raw, 0, *-2);
my $our_checksum = generateChecksum($minus_checksum);
my $data_checksum = ($raw, *-2);
# say $our_checksum;
return $our_checksum;
}
sub generateChecksum($minus_checksum) {
# turn string into Blob
my Blob $blob = $minus_checksum.encode('utf-8');
# unpack Blob into ascii list
my #array = $blob.unpack("C*");
# perform bitwise operation for each ascii in the list
my $dec +^= $_ for $blob.unpack("C*");
# only take 2 digits
$dec = sprintf("%02d", $dec) if $dec ~~ /^\d$/;
$dec = '0'.$dec if $dec ~~ /^[a..fA..F]$/;
$dec = uc $dec;
# convert it to hex
my $hex = sprintf '%02x', $dec;
return uc $hex;
}
Result
Received --> $$0116AA861013034151986|10001000181123062657411200000000000010235444112500000000.600000000345.4335N10058.8249E00015
Received --> 0
Received --> $$0116AA861013037849727|1080100018112114435541120000000000000FBA00D5122500000000.600000000623.9080N10007.8627E00075
Received --> D
Received --> $$0108AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $$0108AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $$0108AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
Received --> $$0108AA863835028447675|18804000181121183810421100002A300000100900000000.700000000314.8717N10125.6499E00022
Received --> 7
First of all, TCP connections are streams, so there's no promises that the "messages" that are sent will be received as equivalent "messages" on the receiving end. Things that are sent can be split up or merged as part of normal TCP behavior, even before Perl 6 behavior is considered. Anything that wants a "messages" abstraction needs to build it on top of the TCP stream (for example, by sending data as lines, or by sending a size in bytes, followed by the data).
In Perl 6, the data arriving over the socket is exposed as a Supply. A whenever $conn { } is short for whenever $conn.Supply { } (the whenever will coerce whatever it is given into a Supply). The default Supply is a character one, decoded as UTF-8 into a stream of Perl 6 Str. As noted in the answer you already received, strings in Perl 6 work at grapheme level, so it will keep back a character in case the next thing that arrives over the network is a combining character. This is the "truncation" that you are experiencing. (There are some things which can never be combined. For example, \n can never have a combining character placed on it. This means that line-oriented protocols won't encounter this kind of behavior, and can be implemented as simply whenever $conn.Supply.lines { }.)
There are a couple of options available:
Do whenever $conn.Supply(:bin) { }, which will deliver binary Blob objects, which will correspond to what the OS passed to the VM. That can then be .decode'd as wanted. This is probably your best bet.
Specify an encoding that does not support combining characters, for example whenever $conn.Supply(:enc('latin-1')) { }. (However, note that since \r\n is 1 grapheme, then if the message were to end in \r then that would be held back in case the next packet came along with a \n).
In both cases, it's still possible for messages to be split up during transmission, but these will (entirely and mostly, respectively) avoid the keep-one-back requirement that grapheme normalization entails.
Why do messages going through websockets always start with \x00 and end with \xff, as in \x00Your message\xff?
This documentation might help...
Excerpt from section 1.2:-
Data is sent in the form of UTF-8 text. Each frame of data starts
with a 0x00 byte and ends with a 0xFF byte, with the UTF-8 text in
between.
The WebSocket protocol uses this framing so that specifications that
use the WebSocket protocol can expose such connections using an
event-based mechanism instead of requiring users of those
specifications to implement buffering and piecing together of
messages manually.
To close the connection cleanly, a frame consisting of just a 0xFF
byte followed by a 0x00 byte is sent from one peer to ask that the
other peer close the connection.
The protocol is designed to support other frame types in future.
Instead of the 0x00 and 0xFF bytes, other bytes might in future be
defined. Frames denoted by bytes that do not have the high bit set
(0x00 to 0x7F) are treated as a stream of bytes terminated by 0xFF.
Frames denoted by bytes that have the high bit set (0x80 to 0xFF)
have a leading length indicator, which is encoded as a series of
7-bit bytes stored in octets with the 8th bit being set for all but
the last byte. The remainder of the frame is then as much data as
was specified. (The closing handshake contains no data and therefore
has a length byte of 0x00.)
The working spec has changed and no longer uses 0x00 and 0xFF as start and end bytes
http://tools.ietf.org/id/draft-ietf-hybi-thewebsocketprotocol-04.html
I am not 100% sure about this but my guess would be to signify the start and end of the message. Since x00 is a single byte representation of 0 and xFF is a single byte representation of 255
I am trying to create NMEA-compatible proprietary sentences, which may contain arbitrary strings.
The usual format for an NMEA sentence with checksum is:
$GPxxx,val1,val2,...,valn*ck<cr><lf>
where * marks the start of a 2-digit checksum.
My question is: Can any of the value fields contain a * character themselves?
It would seem possible for a parser to wait for the final <cr><lf>, then to look back at the previous 3 characters to find the checksum if present (rather than just waiting for the first * in the sentence). However I don't know if the standard allows it.
Are there other characters which may cause problems?
The two ASCII characters to be careful with are $, which has to be at the start, and * which precedes the checksum. Anyone else parsing your custom NMEA wouldn't expect to find either of those characters anywhere else. Some parsers, when they hit a $ assume that a new line has started. With serial port communication sometimes characters get lost in transit, and that's why there's a $ start of sentence marker.
If you're going to make your own NMEA commands it is customary to start them with P followed by a 3 character code indicating the manufacturer or company creating the proprietary message, so you could use $PSQU. Note that although it is recommended that NMEA commands are 5 characters long, there are proprietary messages out there by various hardware and software manufacturers that are anywhere from 4 characters to 7 characters long.
Obviously if you're writing your own parser you can do what you like.
This website is rather useful:
http://www.gpsinformation.org/dale/nmea.htm
If you're extending the protocol yourself (based on "proprietary") - then sure, you can put in anything you like. I would stick to ASCII, but go wild within those bounds. (Obviously, you need to come up with your own $GPxxx so as not to clash with existing messages. Perhaps a new header $SQUEL, ...)
By definition, a proprietary message will not be NMEA-compatible.
A standard parser listening to an NMEA stream should ignore anything that doesn't match what it thinks is 'good' data. That means a checksum error, or any massively corrupted message like it would think your new message is with some random *s thrown in.
If you are merely writing an existing message, then a * doesn't make sense, and should be ignored, but you run the risk of major issues if the checksum is correct, and the parser doesn't understand the payload.