After overviewing the AMF0 specification I find that I cannot understand the proper way to encode the StrictArray type.
Here is the most immediate section of the specification:
array-count = U32
strict-array-type = array-count *(value-type)
which describes the StrictArray type with Augmented Backus-Naur Form (ABNF) syntax (See RFC2234)
Does the StrictArray type have ordinal indices or simply encoded objects (without ordinal keys) in order of their appearance in the StrictArray object graph?
Also, as an additional question, does the serialization table (from which object reference IDs are generated) contain all objects in the object graph, or only objects which can be potentially encoded via reference (ECMAArray,StrictArray,TypedObject,AnonymousObject)?
See https://github.com/silexlabs/amfphp-2.0/blob/master/Amfphp/Core/Amf/Serializer.php line 329 to 336.
you write the number of objects, then each object.
additional question: same code, look for Amf0StoredObjects.
references ids are only for referencable objects. These vary for AMF0 and AMF3 though.
Related
I'm trying to understand whether indirect objects (declared with the obj/endobj keywords) can reside inside e.g. array, dictionary entries or other indirect objects.
For example
[ 3 0 0 obj (something) end ] would parse an array of [3, <indirect object>] if this was allowed.
From what I can see all indirect objects are always at the top level in a PDF, and the fact that object streams exist suggests me that this can't be possible, but I can't find a definite answer in the ISO standard.
EDIT:
It turns out that the ISO standard was not that clear, but the latest spec from Adobe is a bit clearer:
Note: In the data structures that make up a PDF document, certain
values are required to be specified as indirect object references. Except where this is explicitly called out, any object (other than a stream) may
be specified either directly or as an indirect object reference;
the semantics are entirely equivalent
Even if above it says
Any object in a PDF file may be labeled as an indirect object.
So I'm still not 100% sure.
I am looking into using Google Protobuffers for delta messaging. Meaning I only want to send out the changed values of my domain object.
But that exposes a problem with the protocol for this purpose. I can easily just omit the properties that have not changed, and that will present us with a compact message.
But what about properties that change value from _something_ to null? There is no way to distinguish between these two scenarios in a protocol buffer.
What have others done here? I am looking at a few different solutions:
Add a meta property to all objects, that is an array of int. In case any of the properties should change to null, include the field number in this array. If no properties change, then the meta property is just omitted and doesn't take up bandwidth in the message.
Add a meta property that is a bit mask, but works like the array mentioned in option 1. This might be harder for clients to understand though.
Use a standard way that I haven't been able to find yet.
BR Jay
Protobuf 3 isn't very well suited for this. But in protobuf 2, you can have a field that is present but has value of null.
Because protobuf 2 isn't going to disappear any time soon, I'd suggest just use that for this kind of purposes.
I just wanted to post a follow-up on this and explain what I did.
As #jpa correctly pointed out protobuffers are not made for delta-compression.
So the way I solved it was to use some meta properties and rely on that convention. I have a close partnership with the people consuming the data, so conventions can be agreed upon.
Values that are set specifically to null
I have added an int array to the messages. This int array is empty most of the time and does not have an impact on the message size. When a property is set to null, I will add the property tag to this array and that way indicate that it has specifically been set to null in that message update.
Arrays that are emptied
This works in the same way as the nulls array. I have added an int array to the messages. This int array is empty most of the time and does not have an impact on the message size. When an array is emptied, I will add the property tag to this array and that way indicate that it has specifically been emptied that message update.
Objects that are deleted
To indicate that an object has been deleted, I have added a boolean property indicating that the object has been deleted. When the object is deleted I will set this value to true, and otherwise null, so it doesn't take up space in the message. The resulting message is the key identifier for that object and the boolean indicating that it is deleted.
It requires that the convention is understood by the clients but otherwise it works pretty well.
I'm looking for a quick way to serialize custom structures consisting of basic value types and strings.
Using C++CLI to pin the pointer of the structure instance and destination array and then memcpy the data over is working quite well for all the value types. However, if I include any reference types such as string then all I get is the reference address.
Expected as much since otherwise it would be impossible for the structure to have a fixed.. structure. I figured that maybe, if I make the string fixed size, it might place it inside the structure though. Adding < VBFixedString(256) > to the string declaration did not achieve that.
Is there anything else that would place the actual data inside the structure?
Pinning a managed object and memcpy-ing the content will never give you what you want. Any managed object, be it String, a character array, or anything else will show up as a reference, and you'll just get a memory location.
If I read between the lines, it sounds like you need to call some C or C++ (not C++/CLI) code, and pass it a C struct that looks similar to this:
struct UnmanagedFoo
{
int a_number;
char a_string[256];
};
If that's the case, then I'd solve this by setting up the automatic marshaling to handle this for you. Here's how you'd define that struct so that it marshals properly. (I'm using C# syntax here, but it should be an easy conversion to VB.net syntax.)
[StructLayout(LayoutKind.Sequential, CharSet=CharSet.Ansi)]
public struct ManagedFoo
{
public int a_number;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst=256)]
public string a_string;
}
Explanation:
StructLayout(LayoutKind.Sequential) specifies that the fields should be in the declared order. The default LayoutKind, Auto, allows the fields to be re-ordered if the compiler wants.
CharSet=CharSet.Ansi specifies the type of strings to marshal. You can specify CharSet.Ansi to get char strings on the C++ side, or CharSet.Unicode to get wchar_t strings in C++.
MarshalAs(UnmanagedType.ByValTStr) specifies a string inline to the struct, which is what you were asking about. There are several other string types, with different semantics, see the UnmanagedType page on MSDN for descriptions.
SizeConst=256 specifies the size of the character array. Note that this specifies the number of characters (or when doing arrays, number of array elements), not the number of bytes.
Now, these marshal attributes are an instruction to the built-in marshaler in .Net, which you can call directly from your VB.Net code. To use it, call Marshal.StructureToPtr to go from the .Net object to unmanaged memory, and Marshal.PtrToStructure to go from unmanaged memory to a .Net object. MSDN has some good examples of calling those two methods, take a look at the linked pages.
Wait, what about C++/CLI? Yes, you could use C++/CLI to marshal from the .Net object to a C struct. If your structs get too complex to represent with the MarshalAs attribute, it's highly appropriate to do that. In that case, here's what you do: Declare your .Net struct like I listed above, without the MarshalAs or StructLayout. Also declare the C struct, plain and ordinary, also as listed above. When you need to switch from one to the other, copy things field by field, not a big memcpy. Yes, all the fields that are basic types (integers, doubles, etc.) will be a repetitive output.a_number = input.a_number, but that's the proper way to do it.
I need to serialize a (possibly complex *) object so that I can calculate the object's MAC**.
If your messages are strings you can simply do tag := MAC(key, string) and with very high probability if s1 != s2 then MAC(key, s1) != MAC(key, s2), moreover it is computationally hard to find s1,s2 such that MAC(k,s1) == MAC(k,s2).
Now my question is, what happens if instead of strings you need do MAC a very complex object that can contain arrays of objects and nested objects:
JSON
Initially I though that just using JSON serialization could do the trick but it turns out that JSON serializers do not care about order so for example: {b:2,a:1} can be serialized to both {"b":2,"a":1} or {"a":2,"b":1}.
URL Params
You can convert the object to a list of url query params after sorting the keys, so for example {c:1,b:2} can be serialized to b=2&c=1. The problem is that as the object gets more complex the serialization becomes difficult to understand. Example: {c:1, b:{d:2}}
1. First we serialize the nested object:{c:1, b:{d=2}}
2. Then url encode the = sign: {c:1, b:{d%3D2}}
3. Final serialization is: b=d%3D2&c=1
As you can see, the serialization quickly becomes unreadable and though I have not proved it yet I also have the feeling that it is not very secure (i.e. it is possible to find two messages that MAC to the same value)
Can anyone show me a good secure*** algorithm for serializing objects?
[*]: The object can have nested objects and nested arrays of objects. No circular references allowed. Example:
{a:'a', b:'b', c:{d:{e:{f:[1,2,3,4,5]}}, g:[{h:'h'},{i:'i'}]}}
[**]: This MAC will then be sent over the wire. I cannot know what languages/frameworks are supported by the servers so language specific solutions like Java Object Serialization are not possible.
[***]: Secure in this context means that given messages a,b: serialize(a) = serialize(b) implies that a = b
EDIT: I just found out about the SignedObject through this link. Is there a language agnostic equivalent?
What you are looking for is a canonical representation, either for the data storage itself, or for pre-processing before applying the MAC algorithm. One rather known format is the canonicalization used for XML-signature. It seems like the draft 2.0 version of XML signature is also including HMAC. Be warned that creating a secure verification of XML signatures is fraught with dangers - don't let yourself be tricked into trusting the signed document itself.
As for JSON, there seems to be a canonical JSON draft, but I cannot see the status of it or if there are any compliant implementations. Here is a Q/A where the same issue comes up (for hashing instead of a MAC). So there does not seem to be a fully standardized way of doing it.
In binary there's ASN.1 DER encoding, but you may not want to go into that as it is highly complex.
Of course you can always define your own binary or textual representation, as long as there is one representation for data sets that are semantically identical. In the case of an textual representation, you will still need to define a specific character encoding (UTF-8 is recommended) to convert the representation to bytes, as HMAC takes binary input only.
I use the graph version of OrientDB. Now I created a schema-less class, where I want to index a variable. This variable needs to become a property first. But when I try to create this property - of type string (or binary, or whatever) - it responds:
com.orientechnologies.orient.core.exception.OSchemaException: The database contains some schema-less data in the property 'clazz.clazz_name' that is not compatible with the type STRING. Fix those records and change the schema again [ONetworkProtocolHttpDb]
So I need to fix something, but what? What characters are illegal for a variable to become a property so that it can be indexed? (BTW, lists are also not an option)
There was indeed a problem I created.
I created a super-class where the property had to be created. One of the sub-classes inserted a List instead of a String. So when querying all vertices of sub-type
final Iterable<Vertex> iterable = this.graph.getVerticesOfClass("clazz");
I printed all types of clazz_name by vertex.getProperty("clazz_name").getClass().getName() where I saw OLinkedList. Reinserting those vertices fixed my problem.