Can NMEA values contain '*' (asterisks)? - gps

I am trying to create NMEA-compatible proprietary sentences, which may contain arbitrary strings.
The usual format for an NMEA sentence with checksum is:
$GPxxx,val1,val2,...,valn*ck<cr><lf>
where * marks the start of a 2-digit checksum.
My question is: Can any of the value fields contain a * character themselves?
It would seem possible for a parser to wait for the final <cr><lf>, then to look back at the previous 3 characters to find the checksum if present (rather than just waiting for the first * in the sentence). However I don't know if the standard allows it.
Are there other characters which may cause problems?

The two ASCII characters to be careful with are $, which has to be at the start, and * which precedes the checksum. Anyone else parsing your custom NMEA wouldn't expect to find either of those characters anywhere else. Some parsers, when they hit a $ assume that a new line has started. With serial port communication sometimes characters get lost in transit, and that's why there's a $ start of sentence marker.
If you're going to make your own NMEA commands it is customary to start them with P followed by a 3 character code indicating the manufacturer or company creating the proprietary message, so you could use $PSQU. Note that although it is recommended that NMEA commands are 5 characters long, there are proprietary messages out there by various hardware and software manufacturers that are anywhere from 4 characters to 7 characters long.
Obviously if you're writing your own parser you can do what you like.
This website is rather useful:
http://www.gpsinformation.org/dale/nmea.htm

If you're extending the protocol yourself (based on "proprietary") - then sure, you can put in anything you like. I would stick to ASCII, but go wild within those bounds. (Obviously, you need to come up with your own $GPxxx so as not to clash with existing messages. Perhaps a new header $SQUEL, ...)
By definition, a proprietary message will not be NMEA-compatible.
A standard parser listening to an NMEA stream should ignore anything that doesn't match what it thinks is 'good' data. That means a checksum error, or any massively corrupted message like it would think your new message is with some random *s thrown in.
If you are merely writing an existing message, then a * doesn't make sense, and should be ignored, but you run the risk of major issues if the checksum is correct, and the parser doesn't understand the payload.

Related

What do the ASCII characters preceding a carriage return represent in a PDF page?

This is probably a rather basic question, but I'm having a bit of trouble figuring it out, and it might be useful for future visitors.
I want to get at the raw data inside a PDF file, and I've managed to decode a page using the Python library PyPDF2 with the following commands:
import PyPDF2
with open('My PDF.pdf', 'rb') as infile:
mypdf = PyPDF2.PdfFileReader(infile)
raw_data = mypdf.getPage(1).getContents().getData()
print(raw_data)
Looking at the raw data provided, I have began to suspect that ASCII characters preceding carriage returns are significant: every carriage return that I've seen is preceded with one. It seems like they might be some kind of token identifier. I've already figured out that /RelativeColorimetric is associated with the sequence ri\r. I'm currently looking through the PDF 1.7 standard Adobe provides, and I know an explanation is in there somewhere, but I haven't been able to find it yet in that 756 page behemoth of a document
The defining thing here is not that \r – it is just inserted instead of a regular space for readability – but the fact that ri is an operator.
A PDF content stream uses a stack based Polish notation syntax: value1 value2 ... valuen operator
The full syntax of your ri, for example, is explained in Table 57 on p.127:
intent ri (PDF 1.1) Set the colour rendering intent in the graphics state (see 8.6.5.8, "Rendering Intents").
and the idea is that this indeed appears in this order inside a content stream. (... I tried to find an appropriate example of your ri in use but cannot find one; not even any in the ISO PDF itself that you referred to.)
A random stream snippet from elsewhere:
q
/CS0 cs
1 1 1 scn
1.5 i
/GS1 gs
0 -85.0500031 -14.7640076 0 287.0200043 344.026001 cm
BX
/Sh0 sh
EX
Q
(the indentation comes courtesy of my own PDF reader) shows operands (/CS0, 1 1 1, 1.5 etc.), with the operators (cs, scn, i etc.) at the end of each line for clarity.
This is explained in 7.8.2 Content Streams:
...
A content stream, after decoding with any specified filters, shall be interpreted according to the PDF syntax rules described in 7.2, "Lexical Conventions." It consists of PDF objects denoting operands and operators. The operands needed by an operator shall precede it in the stream. See EXAMPLE 4 in 7.4, "Filters," for an example of a content stream.
(my emphasis)
7.2.2 Character Set specifies that inside a content stream, whitespace characters such as tab, newline, and carriage return, are just that: separators, and may occur anywhere and in any number (>= 1) between operands and operators. It mentions
NOTE The examples in this standard use a convention that arranges tokens into lines. However, the examples’ use of white space for indentation is purely for clarity of exposition and need not be included in practical use.
– to which I can add that most PDF creating software indeed attempts to delimit 'lines' consisting of an operands-operator sequence with returns.

Flat File Schema lines longer then expected

Hello there Stackoverflow, I've been tasked with making a flat file schema as well as a map, however, our specifications are that there are 3 fields,
----------
Name       Length
----------
TIdentity     2
OIdentity     17
Result        2
However, the file that we receive is 500(ish) characters long, is there a way to make it ignore the remaning empty characters??
Thanks for any help you guys might be able to supply
You should definitely ensure the spec and sample files are correct (particularly that the spec contains any whitespace requirements/options), but assuming they are and you're just supposed to ignore the whitespace, you can create node to stuff the whitespace into and just ignore it.
Without knowing a bit more about your requirements, it's hard to say exactly how this should work. If the whitespace is always a fixed length, make a node that expects that many characters. If it's not always a fixed length, you may have to make a repeating node that's one character long but not the record terminator (presumably CR/LF or something of the like). If the whitespace itself is the delimiter, you might be able to do something with the ignore_trailing_delimiter on the record.
Worst case scenario (whitespace is variable, you can't control the partner who sends it to you, and you can't get the FFDASM to sensibly deal with it), write a custom Decode component to preprocess the file and remove the extraneous whitespace.

How to write a custom assembly compiler (sort of) in VB.NET

I've been trying to write a simple script compiler for a custom language used by the Game Boy Advance's Z80 processor.
All I want it to do is look at a human-readable command, take it and its arguments and convert it into a hexadecimal value into a ROM file. That's it. Each command is a byte, and each may take a different number of arguments - arguments can be either 8, 16, or 32 bits and each command has a specific number of arguments that it takes.
All of this sort of code is handled by the game and converted into workable machine code within the game's memory, so I'm not writing a full-on assembly compiler if you will. The game automatically knows how many args a command has, what each command does, exactly how to execute it as it is, etc.
For instance, you have command 0x4E, which takes in one 8-bit argument and another 32-bit argument. In hex that would obviously be 4E XX YY YY YY YY. I want my compiler to read it from text as foo 0xXX 0xYYYYYYYY and directly write it into a file as the former.
My question is, how would I do that in VB.NET? I know it's probably a very simple answer, but I see a lot of different options to write it to a file--some work and most don't for me. Could you give me some sample code as to how I would do this?
Writing an assembly compiler as I understand it is not so simple. I recomed you to use one already written see: Software Development Tools for Z80 Family
If you are still interested in writing it here are instructions:
Write the text you want to translate to some file (or memory stream)
Read it line by line
Parse the line either splitting it to an array or with regular
expressions
Identify command and arguments (as far as I remember it some commands
does not have arguments)
Translate the command to Hex (with a collection or dictionary of
commands)
Write results to an array remembering the references for jump
addresses
When everything is translated resolve addresses and write them to
right places.
I think that the most tricky part is to deal with symbolic addressees.
If you are still interested write a first piece of code (or ask how to do it) and continue with next ones.
This sounds like an assembler, even if it for a 'custom language'.
Start by parsing the command lines. use string.split method to convert the string to an array of strings. the first element in the array is your foo, you can then look that up and output 4E, then convert the subsequent elements to bytes.

How to identify binary and text files using Smalltalk

I want to verify that a given file in a path is of type text file, i.e. not binary, i.e. readable by a human. I guess reading first characters and check each character with :
isAlphaNumeric
isSpecial
isSeparator
isOctetCharacter ???
but joining all those testing methods with and: [ ... and: [ ... and: [ ] ] ] seems not to be very smalltalkish. Any suggestion for a more elegant way?
(There is a Python version here How to identify binary and text files using Python? which could be useful but syntax and implementation looks like C.)
only heuristics; you can never be really certain...
For ascii, the following may do:
|isPlausibleAscii numChecked|
isPlausibleAscii :=
[:char |
((char codePoint between:32 and:127)
or:[ char isSeparator ])
].
numChecked := text size min: 1024.
isPossiblyText := text from:1 to:numChecked conform: isPlausibleAscii.
For unicode (UTF8 ?) things become more difficult; you could then try to convert. If there is a conversion error, assume binary.
PS: if you don't have from:to:conform:, replace by (copyFrom:to:) conform:
PPS: if you don't have conform: , try allSatisfy:
All text contains more space than you'd expect to see in a binary file, and some encodings (UTF16/32) will contain lots of 0's for common languages.
A smalltalky solution would be to hide the gory details in method on Standard/MultiByte-FileStream, #isProbablyText would probably be a good choice.
It would essentially do the following:
- store current state if you intend to use it later, reset to start (Set Latin1 converter if you use a MultiByteStream)
Iterate over N next characters (where N is an appropriate number)
Encounter a non-printable ascii char? It's probably binary, so return false. (not a special selector, use a map, implement a new method on Character or something)
Increase 2 counters if appropriate, one for space characters, and another for zero characters.
If loop finishes, return whether either of the counters have been read a statistically significant amount
TLDR; Use a method to hide the gory details, otherwise it's pretty much the same.

Why is there a convention of 1-based line numbers but 0-based char numbers?

According to TkDocs:
The "1.0" here represents where to insert the text, and can be read as "line 1, character 0". This refers to the first character of the first line; for historical conventions related to how programmers normally refer to lines and characters, line numbers are 1-based, and character numbers are 0-based.
I hadn't heard of this convention before, and I can't find anything relevant on Google. Can anyone explain this to me please?
I think you're referring to Tk's text widget. The man page says:
Lines are numbered from 1 for consistency with other UNIX programs that use this numbering scheme.
Although, I'm not sure which Unix tools it's talking about.
Update:
As mentioned in the comments, it looks like a lot of unix text manipulation tool starts line numbering at 1. And tcl/tk having a unix origin, it makes sense to be as compatible as possible with the underlying OS environment.
It really is nothing more than convention, but here is a suggestion.
Character positions are generally thought of in the same way as a Java iterator, which is a "pointer" to a position between two characters. Thus the first character is the one after index position 0. Substrings are taken between two inter-character positions, for instance.
Line positions on the other hand are generally thought of more in the way of a .NET enumerator, which is a "pointer" to the item itself, not to a position in between. Thus the first line is the line at position 1.