protocol-buffers: string or byte sequence of the exact length - serialization

Looking at https://developers.google.com/protocol-buffers/docs/proto3#scalar it appears that the string and bytes types don't have a length limit. Does that mean we're expected to specify the length of a transmitted string in a separate field, e.g.:
message Person {
  string name = 1;
  int32 name_len = 2;
  int32 user_id = 3;
  ...
}

The wire type used for string/bytes is Length-delimited. This means that the message includes the string's length. How this is made available to you will depend upon the language you are using - for example, the table says that in C++ a string type is used, so you can call name.length() to retrieve the length.
So there is no need to specify the length in a separate field.
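To see how the length travels on the wire, here is a minimal Go sketch (hand-rolled for illustration, not generated protobuf code; it assumes Go 1.19+ for binary.AppendUvarint) that encodes field 1 of the Person message above: a tag byte, a varint length, then the raw bytes.
package main

import (
    "encoding/binary"
    "fmt"
)

func main() {
    name := "Alice"

    // Field 1, wire type 2 (length-delimited): tag = (field_number << 3) | 2.
    buf := []byte{(1 << 3) | 2}

    // The varint length prefix is what tells the decoder how many
    // bytes of string data follow - no separate length field needed.
    buf = binary.AppendUvarint(buf, uint64(len(name)))
    buf = append(buf, name...)

    fmt.Printf("% x\n", buf) // 0a 05 41 6c 69 63 65
}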

One of the things I wish GPB did was allow the schema to set constraints on things such as list/array length or numerical value ranges. The best you can do is have a comment in the .proto file and hope that programmers pay attention to it!
Other serialisation technologies do support this, such as XSD (though the tools are often poor), ASN.1 and JSON Schema. It's very useful. If GPB added these (it wouldn't change the wire format), GPB would be pretty well "complete".
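In practice that means the constraint checks have to live in hand-written code next to the comment. A hypothetical Go sketch (the Person struct stands in for the protoc-generated type, and the limits are made-up examples):
import "fmt"

// Person is a hypothetical stand-in for the protoc-generated struct.
type Person struct {
    Name   string
    UserId int32
}

// validatePerson enforces what the .proto comment can only ask for;
// the 64-byte and positive-id limits are invented for illustration.
func validatePerson(p *Person) error {
    if len(p.Name) > 64 {
        return fmt.Errorf("name too long: %d bytes (max 64)", len(p.Name))
    }
    if p.UserId <= 0 {
        return fmt.Errorf("user_id must be positive, got %d", p.UserId)
    }
    return nil
}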

Related

Generating Random String of Numbers and Letters Using Go's "testing/quick" Package

I've been breaking my head over this for a few days now and can't seem to figure it out. Perhaps it's glaringly obvious, but I don't seem to be able to spot it. I've read up on all the basics of unicode, UTF-8, UTF-16, normalisation, etc., but to no avail. Hopefully somebody's able to help me out here...
I'm using Go's Value function from the testing/quick package to generate random values for the fields in my data structs, in order to implement the Generator interface for the structs in question. Specifically, given a Metadata struct, I've defined the implementation as follows:
func (m *Metadata) Generate(r *rand.Rand, size int) (value reflect.Value) {
    value = reflect.ValueOf(m).Elem()
    for i := 0; i < value.NumField(); i++ {
        if t, ok := quick.Value(value.Field(i).Type(), r); ok {
            value.Field(i).Set(t)
        }
    }
    return
}
Now, in doing so, I'll end up with both the receiver and the return value being set with randomly generated values of the appropriate type (strings, ints, etc. in the receiver and reflect.Value in the returned reflect.Value).
Now, the implementation for the Value function states that it will return something of type []rune converted to type string. As far as I know, this should allow me to then use the functions in the runes, unicode and norm packages to define a filter which filters out everything which is not part of 'Latin', 'Letter' or 'Number'. I defined the following filter which uses a transform to filter out letters which are not in those character rangetables (as defined in the unicode package):
func runefilter(in reflect.Value) (out reflect.Value) {
    out = in // Make sure you return something
    if in.Kind() == reflect.String {
        instr := in.String()
        t := transform.Chain(norm.NFD, runes.Remove(runes.NotIn(rangetable.Merge(unicode.Letter, unicode.Latin, unicode.Number))), norm.NFC)
        outstr, _, _ := transform.String(t, instr)
        out = reflect.ValueOf(outstr)
    }
    return
}
Now, I think I've tried just about everything, but I keep ending up with a series of strings which are far from the Latin range, e.g.:
𥗉똿穊
𢷽嚶
秓䝏小𪖹䮋
𪿝ท솲
𡉪䂾
ʋ𥅮ᦸ
堮𡹯憨𥗼𧵕ꥆ
𢝌𐑮𧍛併怃𥊇
鯮
𣏲𝐒
⓿ꐠ槹𬠂黟
𢼭踁퓺𪇖
俇𣄃𔘧
𢝶
𝖸쩈𤫐𢬿詢𬄙
𫱘𨆟𑊙
欓
So, can anybody explain what I'm overlooking here and how I could instead define a transformer which removes/replaces non-letter/number/latin characters so that I can use the Value function as intended (but with a smaller subset of 'random' characters)?
Thanks!
Confusingly, the quick.Generator interface needs the Generate method defined on the type, not on a pointer to the type. You want your signature to look like:
func (m Metadata) Generate(r *rand.Rand, size int) (value reflect.Value)
You can play with this here. Note: the most important thing to do in that playground is to switch the receiver of the Generate method from m Metadata to m *Metadata and see that Hi Mom! never prints.
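Putting that together, a sketch of the original Generate rewritten with a value receiver might look like this (it builds a fresh, settable value with reflect.New instead of mutating the receiver; imports are math/rand, reflect and testing/quick as before):
func (m Metadata) Generate(r *rand.Rand, size int) reflect.Value {
    // reflect.New gives an addressable Metadata whose fields can be Set.
    v := reflect.New(reflect.TypeOf(m)).Elem()
    for i := 0; i < v.NumField(); i++ {
        if t, ok := quick.Value(v.Field(i).Type(), r); ok {
            v.Field(i).Set(t)
        }
    }
    return v
}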
In addition, I think you would be better served using your own type and writing a generate method for that type using a list of all of the characters you want to use. For example:
type LatinString string
const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
and then use the generator
func (l LatinString) Generate(rand *rand.Rand, size int) reflect.Value {
    var buffer bytes.Buffer
    for i := 0; i < size; i++ {
        buffer.WriteString(string(latin[rand.Intn(len(latin))]))
    }
    s := LatinString(buffer.String())
    return reflect.ValueOf(s)
}
playground
Edit: also this library is pretty cool, thanks for showing it to me
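A hypothetical usage sketch (imports testing, testing/quick and unicode/utf8; the Person struct is made up): with the field declared as LatinString, quick.Check picks up its Generate method for that field automatically.
type Person struct {
    Name LatinString
}

func TestNameIsLatin(t *testing.T) {
    // quick.Check generates random Person values; the Name field is
    // produced by LatinString.Generate, so it stays in the latin set.
    f := func(p Person) bool {
        return utf8.ValidString(string(p.Name))
    }
    if err := quick.Check(f, nil); err != nil {
        t.Error(err)
    }
}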
The answer to my own question is, it seems, a combination of the answers provided in the comments by @nj_ and @jimb and the answer provided by @benjaminkadish.
In short, the answer boils down to:
"Not such a great idea as you thought it was", or "Bit of an ill-posed question"
"You were using the union of 'Letter', 'Latin' and 'Number' (Letter || Number || Latin), instead of the intersection of 'Latin' with the union of 'Letter' and 'Number' ((Letter || Number) && Latin))
Now for the longer version...
The idea behind me using the testing/quick package is that I wanted random data for (fuzzy) testing of my code. In the past, I've always written the code for doing things like that myself, again and again. This meant a lot of the same code across different projects. Now, I could of course have written my own package for it, but it turns out that, even better than that, there's actually a standard package which does just about exactly what I want.
It turns out the package does this very well. The codepoints in the strings which it generates are actually random and not just restricted to what we're accustomed to using in everyday life. This is of course exactly what you want when doing fuzzy testing, in order to test the code with values outside the usual assumptions.
In practice, that means I'm running into two problems:
There are some limits on what I would consider reasonable input for a string. Meaning that, in testing the processing of a Name field or a URL field, I can reasonably assume there's not going to be a value like 'James Mc⌢' (let alone 'James Mc🙁') or 'www.🕸site.com', but just 'James McFrown' and 'www.website.com'. Hence, I can't expect a reasonable system to be able to support it. Of course, things shouldn't completely break down, but it also can't be expected to handle the former examples without any problems.
When I filter the generated string for values which one might consider reasonable, the chance of ending up with a valid string is very small. The set of possible characters used by testing/quick is just so large (codepoints up to 0x10FFFF) and the set of reasonable characters so small that you end up with empty strings most of the time.
So, what do we need to take away from this?
So, whilst I hoped to use the standard testing/quick package to replace my often repeated code to generate random data for fuzzy testing, it does this so well that it provides data outside the range of what I would consider reasonable for the code to be able to handle. It seems that the choice, in the end, is to:
Either be able to actually handle all fuzzy options, meaning that if somebody's name is 'Arnold 💰💰' ('Arnold Moneybags'), it shouldn't go arse over end. Or...
Use custom/derived types with their own Generator. This means you're going to have to use the derived type instead of the basic type throughout the code. (Comparable to defining a string as wchar_t instead of char in C++ and working with those by default.). Or...
Don't use testing/quick for fuzzy testing, because as soon as you run into a generated string value, you can (and should) expect a very random string.
As always, further comments are of course welcome, as it's quite possible I overlooked something.

Enumerating Strings as bytes?

I was looking for a way to enumerate String types in (vb).NET, but .NET enums only accept numeric type values.
The first alternative I came across was to create a dictionary of my enum values and the string I want to return. This worked, but was hard to maintain because if you changed the enum you would have to remember to also change the dictionary.
The second alternative was to set field attributes on each enum member, and retrieve them using reflection. Sure enough this worked as well and also solved the maintenance problem, but it uses reflection, and I've always read that using reflection should be a last resort.
So I started thinking and I came up with this: every ASCII character can be represented as a hexadecimal value, and you can assign hexadecimal values to enum members.
You could get rid of the attributes and assign the hexadecimal values to the enum members. Then, when you need the text value, convert the value to a byte array and use System.Text.Encoding.ASCII.GetString(enumMemberBytes) to get the string value.
Now, speaking from experience, anything I come up with is usually either flawed or just plain wrong. What do you guys think about this approach? Is there any reason not to do it like that?
Thanks.
EDIT
As pointed out by David W, enum member values are limited in length, depending on the underlying type (integer by default). So yes, I believe my method works but you are limited to characters in the ASCII table, with a maximum length of 4 or 8 characters using integers or longs respectively.
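To see that limit concretely, here is a hypothetical sketch of the same trick in Go (the language used earlier on this page): pack up to four ASCII characters into a 32-bit value, one byte per character, and decode them back.
package main

import (
    "encoding/binary"
    "fmt"
)

// Each "enum member" packs up to four ASCII characters into a uint32,
// one byte per character: 'O'=0x4F, 'P'=0x50, 'E'=0x45, 'N'=0x4E.
const (
    Open uint32 = 0x4F50454E // "OPEN"
    Shut uint32 = 0x53485554 // "SHUT"
)

func decode(v uint32) string {
    var b [4]byte
    binary.BigEndian.PutUint32(b[:], v)
    return string(b[:])
}

func main() {
    fmt.Println(decode(Open), decode(Shut)) // OPEN SHUT
}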
The easiest way I have found to dynamically parse a String representation of an Enumeration into the actual Enumeration type was to do the following:
Private Enum EnumObject
    [Undefined]
    ValueA
    ValueB
End Enum
Dim enumVal As EnumObject = DirectCast([Enum].Parse(GetType(EnumObject), "ValueA"), EnumObject)
This removes the need to maintain a dictionary and allows you to just handle strings instead of converting to an Int or a Long. This does use reflection, but I have not come across any issues as long as you catch and handle any exceptions with the String Parse.
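For comparison with the Go used earlier on this page, a hypothetical sketch of the usual Go equivalent (Go has no [Enum].Parse, so a name-to-value map plays that role; all identifiers are made up):
// EnumObject mirrors the VB enum above.
type EnumObject int

const (
    Undefined EnumObject = iota
    ValueA
    ValueB
)

// enumByName maps string representations back to enum values.
var enumByName = map[string]EnumObject{
    "Undefined": Undefined,
    "ValueA":    ValueA,
    "ValueB":    ValueB,
}

func ParseEnumObject(s string) (EnumObject, bool) {
    v, ok := enumByName[s]
    return v, ok
}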

Use of byte arrays and hex values in Cryptography

When we use cryptography, we always see byte arrays being used instead of String values. But when we look at the techniques of most cryptographic algorithms, they appear to use hex values for their operations. E.g. in AES, MixColumns, SubBytes and the other techniques all (I suppose) use hex values for those operations.
Can you explain how these byte arrays are used in these operations as hex values?
I have an assignment to develop an encryption algorithm, so any related sample code would be much appreciated.
Every four binary digits make one hexadecimal digit, so you can convert back and forth quite easily (see: http://en.wikipedia.org/wiki/Hexadecimal#Binary_conversion).
I don't think I fully understand what you're asking, though.
The most important thing to understand about hexadecimal is that it is a system for representing numeric values, just like binary or decimal. It is nothing more than notation. As you may know, many computer languages allow you to specify numeric literals in a few different ways:
int a = 42;
int a = 0x2A;
These store the same value into the variable 'a', and a compiler should generate identical code for them. The difference between these two lines will be lost very early in the compilation process, because the compiler cares about the value you specified, and not so much about the representation you used to encode it in your source file.
Main takeaway: there is no such thing as "hex values" - there are just hex representations of values.
That all said, you also talk about string values. Obviously the strings "42" and "2A" are not the same thing as the numeric value 42 (which is 0x2A in hex). If you have a string, you'll need to parse it to a numeric value before you do any computation with it.
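For instance, the parsing step looks like this in Go (a trivial sketch):
package main

import (
    "fmt"
    "strconv"
)

func main() {
    // The string "2A" is just text; parsing it with base 16
    // recovers the numeric value 42.
    v, err := strconv.ParseUint("2A", 16, 8)
    if err != nil {
        panic(err)
    }
    fmt.Println(v) // 42
}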
Bytes, byte arrays and/or memory areas are normally displayed within an IDE (integrated development environment) and debugger as hexadecimals. This is because it is the most efficient and clear representation of a byte. It is pretty easy for an experienced programmer to convert them into bits in their head. You can clearly see how XOR and shift work as well, for instance. Those (and addition) are the most common operations when doing symmetric encryption/hashing.
So it's unlikely that the program performs this kind of conversion, it's probably the environment you are in. That, and source code (which is converted to bytes at compile time) probably uses a lot of literals in hexadecimal notation as well.
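As a tiny Go illustration of why hex is convenient for those operations (each hex digit is exactly one nibble):
package main

import "fmt"

func main() {
    a, b := byte(0x5A), byte(0x0F)
    // XOR, left shift and right shift, all easy to follow digit by digit.
    fmt.Printf("%02x %02x %02x\n", a^b, a<<1, a>>4) // 55 b4 05
}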
Cryptography in general (hash functions excepted) is a method of converting data from one format to another, the result mostly referred to as cipher text, using a secret key. The secret key can be applied to the cipher text to get back the original data, also referred to as plain text. In this process, data is handled at the byte level, though it can be at the bit level as well. The point here is that the text or strings we refer to occupy a limited range of byte values - ASCII, for example, defines its characters within a certain range of byte values. In practice, when a crypto operation is performed, each character is converted to its equivalent byte and the process is carried out using the key. The resulting byte or bytes will most probably fall outside the range of human-readable, defined text such as ASCII. For this reason, any data to which a crypto function is to be applied is converted to a byte array first. For example, say the text to be enciphered is "Hello how are you doing?". The following steps would be followed:
1. byte[] data = "Hello how are you doing?".getBytes()
2. Encipher data using the key, which is also a byte[]
3. The output blob is referred to as cipherTextBytes[]
4. Encryption is complete
5. Using the key, a process is performed over cipherTextBytes[] which returns the data bytes
6. A simple new String(data) will return the string value "Hello how are you doing?"
This is simple info which might help you understand reference code and manuals better. I am in no way trying to explain the core of cryptography here.
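As a concrete illustration of those steps, here is a minimal round-trip sketch in Go using AES-GCM from the standard library (key handling is deliberately simplistic; this is not production code):
package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
    "io"
)

func main() {
    key := make([]byte, 32) // 256-bit key; real code would derive/store it securely
    if _, err := io.ReadFull(rand.Reader, key); err != nil {
        panic(err)
    }

    block, err := aes.NewCipher(key)
    if err != nil {
        panic(err)
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        panic(err)
    }

    // Step 1: the plain text string becomes a byte array.
    data := []byte("Hello how are you doing?")

    // Steps 2-4: encipher the bytes; the result is not printable text.
    nonce := make([]byte, gcm.NonceSize())
    if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
        panic(err)
    }
    cipherTextBytes := gcm.Seal(nil, nonce, data, nil)
    fmt.Printf("cipher text (hex): %x\n", cipherTextBytes)

    // Steps 5-6: decrypt and convert the bytes back into a string.
    plain, err := gcm.Open(nil, nonce, cipherTextBytes, nil)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(plain))
}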

Equivalent of "Dim As String * 1" VB6 to VB.NET

I have some VB6 code that needs to be migrated to VB.NET, and I wanted to inquire about this line of code, and see if there is a way to implement it in .NET
Dim strChar1 As String * 1
Intellisense keeps telling me that an end of statement is expected.
That's known as a "fixed-length" string. There isn't an exact equivalent in VB.NET.
Edit: Well, OK, there's VBFixedStringAttribute, but I'm pretty sure that exists solely so that automated migration tools can more easily convert VB6 code to VB.NET for you, and it's not really the ".NET way" to do things. Also see the warnings in the article for details on why this still isn't exactly the same thing as a fixed-length string in VB6.
Generally, fixed-length strings are only used in VB6 if you are reading fixed-size records from a file or over the network (i.e. parsing headers in a protocol frame).
For example, you might have a file that contains a set of fixed-length records that all have the format (integer, 1-character-string, double), which you could represent in VB6 as a user-defined type:
Public Type Record
    anInteger As Integer
    aSingleCharacter As String * 1
    aDouble As Double
End Type
This way, VB6 code that reads from the file containing records in this format can read each fixed-sized record stored in the file, and in particular, it will only read 1 byte for aSingleCharacter. Without the * 1, VB6 would have no idea how many characters to read from the file, since a String can normally have any number of characters.
In VB.NET, you can do one of the following, depending on your needs:
If the length matters (i.e. you need to read exactly one byte from some data source, for example), consider using an array instead, such as:
Dim aSingleByteArray(0) As Byte ' upper bound 0 = exactly one element
Alternatively, you could use one of the Stream classes. In particular, if you are reading data from a network socket or a file, consider using NetworkStream or FileStream, respectively. A Stream is meant for byte-for-byte access (i.e. raw binary access). StreamReader is a related class that simplifies reading data when it is text-based, so that might be good if you are reading a text file, for example. Otherwise (if dealing with binary data), stick with one of the Stream classes.
If the length doesn't matter, you could just use a "normal" String. That is to say:
Dim aNormalString As String
Which answer is "correct" really depends on why it was declared that way in the original VB6 code.
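To make the fixed-size-record idea concrete, here's a hedged sketch of the same Record layout in Go (the language used earlier on this page) using encoding/binary; it assumes a packed little-endian layout, and the sample bytes are made up:
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

// Record mirrors the VB6 type: a 2-byte integer, exactly one
// character, and an 8-byte double. The fixed-width fields are what
// let binary.Read know how many bytes to consume for each record.
type Record struct {
    AnInteger        int16
    ASingleCharacter [1]byte
    ADouble          float64
}

func main() {
    // 11 bytes: int16 42, the character 'X', then float64 10.0.
    raw := []byte{0x2A, 0x00, 'X', 0, 0, 0, 0, 0, 0, 0x24, 0x40}
    var r Record
    if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &r); err != nil {
        panic(err)
    }
    fmt.Println(r.AnInteger, string(r.ASingleCharacter[:]), r.ADouble) // 42 X 10
}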
Fixed-length strings have been deprecated in VB.NET because there are several better options.
Since your fixed length string is just one character long, you can use the Char type in this case, as Mark suggested.
Dim strChar1 As Char
Seeing as you're doing a VB6 migration, I'd definitely consider VBFixedStringAttribute as well as the other options listed by Mike Spross, but, because it is a single character, a Char may be an option here too.
As mentioned elsewhere, VBFixedString is only acknowledged by the Get and Put VB I/O APIs. So the best solution (other than rewriting your code that references the "fixed length string") is to write your own equivalent of Microsoft.VisualBasic.Compatibility.VB6.FixedLengthString. See this answer for more details.
VBFixedStringAttribute Class
<VBFixedString(1)> Dim strChar1 As String
Although this question was asked ages ago, VB.NET actually has a native fixed-length string: <VbFixedArray(9)> Public fxdString As Char() 'declare 10-char fixed array. Doing this with scalars actually creates VB6-style static arrays.

How Do I Convert a Byte Stream to a Text String?

I'm working on a licensing system for my application. I'd like to put all licensing information (licensee name, expiration date, and enabled features) into an object, encrypt that object with a private key, then represent the encrypted data as a single text string which I can send via email to my customers.
I've managed to get the encrypted data into a byte stream, but I don't know how to convert that byte stream into a text value -- something that contains no control characters or whitespace. Can anyone offer advice on how to do that? I've been researching the Encoding class, but I can't find a text-only encoding.
I'm using Net 2.0 -- mostly VB, but I can do C# also.
Use a Base64Encoder to convert it to a text string that can be decoded with a Base64Decoder. It is great for representing arbitrary binary data in a text-friendly manner, using only the upper- and lower-case letters A-Z, the digits 0-9 and a couple of punctuation characters ('+' and '/', with '=' for padding).
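For illustration, here's that round trip sketched in Go's encoding/base64 (the byte values are a made-up stand-in for the encrypted blob):
package main

import (
    "encoding/base64"
    "fmt"
)

func main() {
    encrypted := []byte{0x9f, 0x00, 0x1c, 0x7b, 0xe4} // stand-in for the encrypted data

    // Encode: arbitrary bytes become a printable, email-safe string.
    s := base64.StdEncoding.EncodeToString(encrypted)
    fmt.Println(s)

    // Decode: the customer-side code recovers the exact original bytes.
    decoded, err := base64.StdEncoding.DecodeString(s)
    if err != nil {
        panic(err)
    }
    fmt.Printf("% x\n", decoded)
}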
BinHex is an example of one way to do that. It may not be exactly what you want -- for example, you might want to encode your data such that it's impossible to inadvertently spell words in your string, and you may or may not care about maximizing the density of information. But it's an example that may help you come up with your own encoding.
I've found Base32 useful for license keys before. There are some C# implementations linked from this answer. My own license code is based on this implementation, which avoids ambiguous characters to make it easier to retype the keys.