Can indirect objects inside a PDF be nested?

I'm trying to understand whether indirect objects (declared with the obj/endobj keywords) can reside inside, e.g., arrays, dictionary entries, or other indirect objects.
For example,
[ 3 0 0 obj (something) endobj ] would parse as an array of [3, <indirect object>] if this were allowed.
From what I can see, all indirect objects are always at the top level in a PDF, and the fact that object streams exist suggests to me that nesting isn't possible, but I can't find a definitive answer in the ISO standard.
EDIT:
It turns out that the ISO standard was not that clear, but the latest spec from Adobe is a bit clearer:
Note: In the data structures that make up a PDF document, certain
values are required to be specified as indirect object references. Except where this is explicitly called out, any object (other than a stream) may
be specified either directly or as an indirect object reference;
the semantics are entirely equivalent
Even though, just above that, it says:
Any object in a PDF file may be labeled as an indirect object.
So I'm still not 100% sure.
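The quoted note can be illustrated with a small Python sketch. It assumes the reading the question arrives at: definitions (obj/endobj) live only at the top level, while arrays and dictionaries hold either direct values or references such as 3 0 R. The Ref class, the objects table, and resolve() are hypothetical names for illustration, not part of any PDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as `3 0 R` (object number, generation)."""
    num: int
    gen: int


def resolve(value, objects):
    """Replace every Ref inside `value` with the object it points to."""
    if isinstance(value, Ref):
        return resolve(objects[(value.num, value.gen)], objects)
    if isinstance(value, list):
        return [resolve(item, objects) for item in value]
    if isinstance(value, dict):
        return {key: resolve(item, objects) for key, item in value.items()}
    return value


# Top-level table: what `3 0 obj (something) endobj` contributes to the file.
objects = {(3, 0): "something"}

direct = [3, "something"]    # [ 3 (something) ]
indirect = [3, Ref(3, 0)]    # [ 3 3 0 R ]

# Both spellings resolve to the same array, which is the claimed equivalence.
print(resolve(direct, objects) == resolve(indirect, objects))
```

In this model the array never contains a definition, only a reference to one; that matches the "directly or as an indirect object reference" wording in the note.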

Related

Using Google Protocol Buffers for delta messages

I am looking into using Google Protocol Buffers for delta messaging, meaning that I only want to send out the changed values of my domain object.
But that exposes a problem with the protocol for this purpose. I can easily omit the properties that have not changed, and that gives us a compact message.
But what about properties that change value from _something_ to null? In a protocol buffer there is no way to distinguish a property that was omitted because it did not change from one that was omitted because it is now null.
What have others done here? I am looking at a few different solutions:
1. Add a meta property to all objects that is an array of int. If any property changes to null, include its field number in this array. If no properties change to null, the meta property is simply omitted and doesn't take up bandwidth in the message.
2. Add a meta property that is a bit mask but works like the array mentioned in option 1. This might be harder for clients to understand, though.
3. Use a standard way that I haven't been able to find yet.
BR Jay
Protobuf 3 isn't very well suited for this. But in protobuf 2, you can have a field that is present but has a value of null.
Because protobuf 2 isn't going to disappear any time soon, I'd suggest just using that for this kind of purpose.
I just wanted to post a follow-up on this and explain what I did.
As @jpa correctly pointed out, Protocol Buffers are not made for delta compression.
So the way I solved it was to use some meta properties and rely on that convention. I have a close partnership with the people consuming the data, so conventions can be agreed upon.
Values that are set specifically to null
I have added an int array to the messages. This int array is empty most of the time and does not have an impact on the message size. When a property is set to null, I will add the property tag to this array and that way indicate that it has specifically been set to null in that message update.
Arrays that are emptied
This works in the same way as the nulls array. I have added an int array to the messages. This int array is empty most of the time and does not have an impact on the message size. When an array is emptied, I will add the property tag to this array and that way indicate that it has specifically been emptied in that message update.
Objects that are deleted
To indicate that an object has been deleted, I have added a boolean property indicating that the object has been deleted. When the object is deleted I will set this value to true, and otherwise null, so it doesn't take up space in the message. The resulting message is the key identifier for that object and the boolean indicating that it is deleted.
It requires that the convention is understood by the clients but otherwise it works pretty well.
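On the receiving side, the convention above could be applied roughly as follows. This is a minimal Python sketch that uses plain dicts instead of generated protobuf classes; the key names values, nulled, emptied, and deleted are assumptions made for illustration, not names from the original messages.

```python
def apply_delta(state, delta):
    """Return a new state dict with one delta message applied.

    `state` maps property tags to values. `delta` is a dict standing in
    for the decoded message (all key names are hypothetical).
    """
    if delta.get("deleted"):
        return None  # the whole object has been deleted

    new_state = dict(state)
    # Properties present in the delta carry their new values.
    for tag, value in delta.get("values", {}).items():
        new_state[tag] = value
    # Tags in the nulls array were explicitly set to null.
    for tag in delta.get("nulled", []):
        new_state[tag] = None
    # Tags in the emptied array are arrays that were explicitly cleared.
    for tag in delta.get("emptied", []):
        new_state[tag] = []
    return new_state


state = {1: "Jay", 2: 42, 3: ["a", "b"]}
delta = {"values": {1: "Bob"}, "nulled": [2], "emptied": [3]}
print(apply_delta(state, delta))  # {1: 'Bob', 2: None, 3: []}
```

A delta that omits a tag everywhere leaves that property untouched, which is what separates "unchanged" from "set to null".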

Can I have a string object store its data within the structure?

I'm looking for a quick way to serialize custom structures consisting of basic value types and strings.
Using C++/CLI to pin the pointer of the structure instance and destination array and then memcpy the data over is working quite well for all the value types. However, if I include any reference types such as string, then all I get is the reference address.
I expected as much, since otherwise it would be impossible for the structure to have a fixed... structure. I figured that maybe, if I made the string fixed size, it might be placed inside the structure. Adding <VBFixedString(256)> to the string declaration did not achieve that.
Is there anything else that would place the actual data inside the structure?
Pinning a managed object and memcpy-ing the content will never give you what you want. Any managed object, be it a String, a character array, or anything else, will show up as a reference, and you'll just get a memory location.
If I read between the lines, it sounds like you need to call some C or C++ (not C++/CLI) code, and pass it a C struct that looks similar to this:
struct UnmanagedFoo
{
    int a_number;
    char a_string[256];
};
If that's the case, then I'd solve this by setting up the automatic marshaling to handle this for you. Here's how you'd define that struct so that it marshals properly. (I'm using C# syntax here, but it should be an easy conversion to VB.net syntax.)
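using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet=CharSet.Ansi)]
public struct ManagedFoo
{
    public int a_number;
    // Marshaled inline as a fixed-size character array, not as a pointer.
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst=256)]
    public string a_string;
}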
Explanation:
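StructLayout(LayoutKind.Sequential) specifies that the fields should be laid out in the declared order. The C# and VB compilers already apply Sequential to structs by default (classes default to Auto, which allows the runtime to re-order the fields), but stating it explicitly makes the intent clear.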
CharSet=CharSet.Ansi specifies the type of strings to marshal. You can specify CharSet.Ansi to get char strings on the C++ side, or CharSet.Unicode to get wchar_t strings in C++.
MarshalAs(UnmanagedType.ByValTStr) specifies a string stored inline in the struct, which is what you were asking about. There are several other string types with different semantics; see the UnmanagedType page on MSDN for descriptions.
SizeConst=256 specifies the size of the character array. Note that this specifies the number of characters (or when doing arrays, number of array elements), not the number of bytes.
Now, these marshal attributes are an instruction to the built-in marshaler in .Net, which you can call directly from your VB.Net code. To use it, call Marshal.StructureToPtr to go from the .Net object to unmanaged memory, and Marshal.PtrToStructure to go from unmanaged memory to a .Net object. MSDN has some good examples of calling those two methods, take a look at the linked pages.
Wait, what about C++/CLI? Yes, you could use C++/CLI to marshal from the .Net object to a C struct. If your structs get too complex to represent with the MarshalAs attribute, it's highly appropriate to do that. In that case, here's what you do: Declare your .Net struct like I listed above, without the MarshalAs or StructLayout. Also declare the C struct, plain and ordinary, also as listed above. When you need to switch from one to the other, copy things field by field, not a big memcpy. Yes, all the fields that are basic types (integers, doubles, etc.) will be a repetitive output.a_number = input.a_number, but that's the proper way to do it.
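To see what "inline" means for the unmanaged side, here is a small Python ctypes sketch of the same layout as UnmanagedFoo. It is not .NET code and says nothing about how the .NET marshaler is implemented. It only shows that the string bytes sit inside the struct's own memory rather than behind a pointer.

```python
import ctypes


class UnmanagedFoo(ctypes.Structure):
    # Mirrors the C struct: an int followed by a fixed 256-byte char array.
    _fields_ = [
        ("a_number", ctypes.c_int),
        ("a_string", ctypes.c_char * 256),
    ]


foo = UnmanagedFoo(a_number=7, a_string=b"hello")

# The struct's size is the int plus the whole character array.
print(ctypes.sizeof(foo) == ctypes.sizeof(ctypes.c_int) + 256)

# Copying the raw memory (what a memcpy would see) includes the characters.
raw = bytes(foo)
start = UnmanagedFoo.a_string.offset
print(raw[start:start + 5])
```

This is the shape that memcpy can copy safely. A managed string field only has this shape after the marshaler has produced the unmanaged copy.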

What is the proper way to encode an AMF0 StrictArray

After reading through the AMF0 specification, I find that I cannot understand the proper way to encode the StrictArray type.
Here is the most immediate section of the specification:
array-count = U32
strict-array-type = array-count *(value-type)
which describes the StrictArray type with Augmented Backus-Naur Form (ABNF) syntax (See RFC2234)
Does the StrictArray type have ordinal indices or simply encoded objects (without ordinal keys) in order of their appearance in the StrictArray object graph?
Also, as an additional question, does the serialization table (from which object reference IDs are generated) contain all objects in the object graph, or only objects which can be potentially encoded via reference (ECMAArray,StrictArray,TypedObject,AnonymousObject)?
See https://github.com/silexlabs/amfphp-2.0/blob/master/Amfphp/Core/Amf/Serializer.php line 329 to 336.
You write the number of objects, then each object.
Additional question: same code, look for Amf0StoredObjects.
Reference IDs are only for referenceable objects. These vary for AMF0 and AMF3, though.
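"The number of objects, then each object" can be sketched in Python as below. The sketch handles only numbers, strings, and nested strict arrays, and it ignores the reference table entirely; the function names are made up for illustration. The marker bytes (0x00 number, 0x02 string, 0x0A strict array) are the ones given in the AMF0 specification.

```python
import struct

NUMBER_MARKER = 0x00
STRING_MARKER = 0x02
STRICT_ARRAY_MARKER = 0x0A


def encode_value(value):
    """Encode one AMF0 value (only a few types are covered in this sketch)."""
    if isinstance(value, bool):
        raise TypeError("booleans are not covered by this sketch")
    if isinstance(value, (int, float)):
        # Number: marker, then an 8-byte big-endian IEEE 754 double.
        return struct.pack(">Bd", NUMBER_MARKER, float(value))
    if isinstance(value, str):
        # String: marker, U16 byte length, then UTF-8 bytes.
        data = value.encode("utf-8")
        return struct.pack(">BH", STRING_MARKER, len(data)) + data
    if isinstance(value, list):
        return encode_strict_array(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_strict_array(items):
    """Marker, U32 element count, then each element in order (no keys)."""
    out = struct.pack(">BI", STRICT_ARRAY_MARKER, len(items))
    for item in items:
        out += encode_value(item)
    return out


print(encode_strict_array([1.0, "hi"]).hex())
```

Note that no ordinal index is written anywhere: an element's position in the byte stream is its index.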

Behaviour of arguments/parameters and variables inside Shared Functions in VB.NET

I am new to VB.NET and am converting a legacy system to the .NET world. Recently I have been reviewing the existing code, since I joined the project team quite late.
I find that there are many Shared functions (not shared classes) inside many classes. I suspect this may cause problems if two requests arrive at the same Shared method at once (that is, two different HTTP requests, as it's a WCF application; the exposed methods are not Shared, but the methods they call internally are). Each call may pass different arguments, and I worry that the calls could overwrite each other's arguments.
In short, if a Shared method takes a list of arguments to process, is there any chance of inconsistencies when two HTTP requests reach that method concurrently?
I would appreciate each and every response to the thread.
Thanks,
JJ
No.
Parameters are local to the method call and will not interact across threads.
However, if you use Shared fields or variables, you will have issues.
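The distinction can be sketched in Python, which behaves the same way in this respect (static methods standing in for Shared functions, a class attribute standing in for a Shared field). The Counter class and its members are hypothetical names for illustration.

```python
import threading


class Counter:
    total = 0                  # class-level state, comparable to a Shared field
    lock = threading.Lock()    # guards `total`

    @staticmethod
    def scale(value, factor):
        # Parameters and locals belong to this call only; no other thread
        # can see or overwrite them.
        result = value * factor
        return result

    @staticmethod
    def add(amount):
        # Shared state is visible to every thread, so updates are serialized.
        with Counter.lock:
            Counter.total += amount


results = {}


def worker(n):
    results[n] = Counter.scale(n, 10)
    for _ in range(1000):
        Counter.add(1)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results == {n: n * 10 for n in range(8)})  # True
print(Counter.total)                             # 8000
```

In VB.NET the lock would be a SyncLock block around the access to the Shared field; the scale-style method needs no protection at all.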
It is true that parameters are local to the method call; however, that will not necessarily limit their reach. Class variables sent as ByVal parameters can still result in interactions across threads. You may want to read up on the SyncLock keyword. The use of the Shared keyword will not affect (i.e. reduce) the chances of such interactions.
The main issue about multithreaded applications is when the very same range of memory gets referenced by more than one thread at a time, particularly when any one of those threads may make memory writes.
Some things to think about:
(1) Visual Basic (and C#) dichotomizes variables (and data types) into two species: the "Value" (or "Structure"), and the "Reference" (or "Class").
(2) The "Value" data type means that a direct reference is made to an actual collection of bits that represents an Integer, or a Boolean, or even a bitmap, or some other kind of object. In old school parlance, this is the "image" of an instantiation of an object. It is the state space of the object. It is what makes an object itself versus being some other object, independent of where in memory it may be.
(3) The "Reference" data type means that this is a very special Structure which somehow indicates the data type of the object and where in memory it resides. The computer will interpret a "Reference" to obtain the actual image of the object.
(4) When a "Value" parameter is passed ByVal, that means a new object is created that is in the identical image of the original expression being passed, and it is upon this copy that the function or method operates. The image of the original "Value" cannot be affected.
(5) When a "Value" parameter is passed ByRef, that means a new "Reference" variable is created, and that "Reference" variable will contain the information that will interpret back to the image of the original "Value". Now the image of the original "Value" can be changed.
(6) When a "Reference" parameter is passed ByVal, the very special Structure, which gets interpreted back to the actual image of the object, gets copied. It is upon this copy of the very special Structure that the function or method operates. This copy still points to the actual image of the object, which means that an object of a Reference variable that is passed ByVal can still have its image (i.e. its "Value") changed. However, the very special Structure of the original "Reference" itself cannot be changed.
(7) Note that the String type is an odd duck: It will behave as if it were a "Value" parameter even though it is in fact a "Reference" type. Hence a String passed ByVal will not be affected in the same way any other class would. Actually, String is an example of an immutable type - which means that steps are taken to prevent changes to the image of its "Value". (See http://msdn.microsoft.com/en-us/library/bb383979.aspx and http://codebetter.com/patricksmacchia/2008/01/13/immutable-types-understand-them-and-use-them/ for more details.)
(8) When a "Reference" parameter is passed ByRef, one now has created a new "Reference" object that points to the original "Reference" object (that, in turn, points to the "Value" of some other object). The use of ByRef on a "Reference" allows one to modify (or create anew) the very special Structure of the original "Reference" object being passed as a parameter. A function or method that performs a swap operation will use ByRef on "Reference" parameters.
(9) Some people say that a "Reference" is the same as a memory address. While in particular cases this may in fact be true, technically it is not. The very special Structure does not have to be a memory address in whatever image would be valid for the CPU, although ultimately the computer will translate it into a valid memory address at some point.
(10) The keyword Me is an automatic "Reference" to the object that is currently executing the class member. Under the hood, it exists as a parameter too, one that is sent unseen. EXCEPT in the case of a Shared member - in which case Me is unavailable.
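Points (6) and (7) can be tried out in Python, whose argument passing matches "Reference passed ByVal". Python has no ByRef, so points (5) and (8) are not covered. The function names are made up for illustration.

```python
def mutate(items):
    # Operates through the copied reference on the caller's object,
    # so the caller sees the change (point 6).
    items.append("changed")


def rebind(items):
    # Only the local copy of the reference is redirected; the caller's
    # variable still refers to the original list (point 6).
    items = ["replaced"]
    return items


def shout(text):
    # Strings are immutable, so this builds a new string and leaves the
    # caller's string untouched (point 7).
    text = text + "!"
    return text


shared = ["original"]
mutate(shared)
rebind(shared)
print(shared)   # ['original', 'changed']

name = "abc"
shout(name)
print(name)     # abc
```

If two threads hold references to the same list, the mutate-style call is the one that needs synchronization; the rebind-style and string cases touch nothing the other thread can see.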

How to get differing value types out of a concrete implementation if only the interface / abstract class is known?

what I am using:
VB.NET, NET 3.5, OpenXML SDK 2.0
what I want to do:
I am creating an xlsx reader / writer for my application (based on OpenXML SDK 2.0). I want to read xlsx files and store the data contained in each row in a DTO/PONO. Further I want to read the xlsx file and then modify it and save it.
my thoughts:
Now my problem is not with the OpenXML SDK, I can do what I need to do.
My problem is on how to structure my components. Specifically I have problems with the polymorphism at the lowest level of a Spreadsheet, the cell.
A cell in Excel/OpenXML can have different types of data associated with it, such as a Time, Date, Number, Text or Formula. These different types need to be handled differently when read from or written to a spreadsheet.
I decided to have a common interface for all subtypes like TextCell, NumberCell, DateCell etc.
Now when I read the cell from the spreadsheet the Method/Factory can decide which type of cell to create.
Because the cell interface abstracts away the real implementation, the code holding a cell does not know, and does not need to know, what type it is. For writing / modifying the cell I solve this problem by calling .write(ICellWriter) on the cell I want to persist. As the cell itself knows what type of data it contains, it knows which method of ICellWriter it needs to call (static polymorphism).
My problem:
Writing to the xlsx file is no problem. My problem is: how do I get the data out of my cell into my DTO/PONO without resorting to type checking (If TypeOf variable Is ClassX Then doSomething End If)? Methods and properties have to have different signatures, and differentiating only by return type is not allowed.
Edit:
The holder (a collection, in this case a row of a table/spreadsheet) of the objects (referring to the cells) does not know the concrete implementations. So for writing a cell I pass it a Cellwriter. This Cellwriter has overloaded methods like Write(num as Integer), Write(text as String), Write(datum as Date). The cell object that receives the writer then calls the Write() method with the data type it holds. This works, as no return value is passed back.
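The write path described above can be sketched in Python. Python has no method overloading, so the overloaded Write(...) methods become write_number and write_text here; the class names and the RecordingWriter are hypothetical stand-ins, not the actual OpenXML-based classes.

```python
class NumberCell:
    def __init__(self, value):
        self.value = value

    def write(self, writer):
        # The cell knows its own type, so it picks the matching method.
        writer.write_number(self.value)


class TextCell:
    def __init__(self, value):
        self.value = value

    def write(self, writer):
        writer.write_text(self.value)


class RecordingWriter:
    """Stand-in for the Cellwriter: records which method each cell called."""

    def __init__(self):
        self.calls = []

    def write_number(self, num):
        self.calls.append(("number", num))

    def write_text(self, text):
        self.calls.append(("text", text))


# The row only knows that every cell has a write() method.
row = [NumberCell(3), TextCell("abc"), NumberCell(4)]
writer = RecordingWriter()
for cell in row:
    cell.write(writer)

print(writer.calls)  # [('number', 3), ('text', 'abc'), ('number', 4)]
```

The row never inspects a cell's type; the typed value only ever flows from the cell into the writer, never back to the caller.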
But what do I do when I need the concrete data type returned?
Any ideas, advice, insight?
Thanks
Edit:
Glossary:
DTO: Data Transfer Object
PONO: Plain Old .Net Object
xlsx: referring to file ending of excel workbook files
Edit:
The Cell "subtypes" implement a common interface and do not inherit from a common superclass.
Edit:
After some thinking about the problem I came to realize that it’s not possible without reflection or knowledge of what type of cell I am expecting. Basically I was trying to recreate a spreadsheet or something with similar functionality and way too abstract/configurable for my needs. Thanks for your time & effort put in to writing the answer. I accepted the answer that was closest to what I realized.
I don't think you can.
If I'm understanding correctly, you have different types of cells (StringCell, IntCell) and each of those concrete classes returns an object of type 'Object'. When you are using the base class 'Cell' and getting its value, it's of type Object.
To work with it as a String, or Integer, or Date, etc., I think you need to inspect the type of that object, one way or another. You can use TypeOf like you demonstrated; I've also seen things like '.GetValueAsString()/.GetValueAsInteger()' on the base class. But you still need knowledge enough to say 'Dim myInt as Integer = myCell.GetValueAsInteger()'
Generally speaking, at least if you subscribe to the SOLID principles, you shouldn't care.
The Liskov substitution principle (the "L" in SOLID) states that, in a computer program, if S is a subtype of T, then objects of type T may be replaced with objects of type S (i.e., objects of type S may be substitutes for objects of type T), without altering any of the desirable properties of that program (correctness, task performed, etc.)
http://en.wikipedia.org/wiki/Liskov_substitution_principle
If you have subtypes of cells, but you can't use them interchangeably, it's a good candidate for not using inheritance.
I don't know what you are intending to do with the values in the cells that would require you to have the concrete class instead of using the base, but it might be possible to expose that functionality in the base itself. For example, if you need to add two cells, you can perhaps accomplish that by treating them as generic cells (at least provided they are of compatible types) without knowing what subtype they are. You should be able to return the base class in your DTO, regardless.
At least, that's my understanding. I'd certainly wait for more people to chime in before listening to me.