I have an application that I'm converting to Symfony/Doctrine. It currently stores a serialized array as a JSON object in the database. I see that Doctrine has an array column type that does similar serialization. Is there any performance benefit to the array column versus having custom json_decode/encode getters and setters? What serialization format is Doctrine using?
I believe Doctrine uses its own serialisation format, which looks similar enough to the output of PHP's serialize()/unserialize() to be recognisable, but not similar enough to be compatible.
The performance benefit will probably be negligible as both are essentially implode()-like operations and PHP is pretty quick with string manipulation. There is however the overhead of Doctrine itself and the actual database transaction too, and these are much bigger factors when considering any optimisation.
If you're hydrating Doctrine objects from the serialized string, then Doctrine provides a method for its own serialised format which is perhaps quicker, although again this is not a major computational bottleneck.
I'm currently struggling with the experimental KXS-properties serialization backend, mainly for two reasons:
I can't find any documentation for it (I think there is none)
KXS-properties only includes a serializer / deserializer, but no encoder / decoder
The endpoint provided by the framework is essentially Map<String, Any>, but the map is flat and the keys already use the usual dot-separated properties syntax. So the step I have to take is to encode the map to a single string that is printable to a .properties file AND decode a single string from a .properties file back into the map. I'm generally following the Properties Format Spec from https://docs.oracle.com/javase/10/docs/api/java/util/Properties.html#load(java.io.Reader); it's not as easy as one might think.
The problem is that I can't use java.util.Properties right away, because KXS is multiplatform and restricting it to the JVM just so I can use java.util.Properties would kind of defeat the purpose. If I were to use it, the solution would be pretty simple, like this: https://gist.github.com/RaphaelTarita/748e02c06574b20c25ab96c87235096d
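For illustration, a sketch of that JVM-only shortcut could look something like this (assuming the kotlinx-serialization-properties version at hand provides encodeToStringMap / decodeFromStringMap; the Server class is made up for the example):

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.properties.Properties
import kotlinx.serialization.properties.encodeToStringMap
import kotlinx.serialization.properties.decodeFromStringMap
import java.io.StringReader
import java.io.StringWriter

// Hypothetical model class, just for the example
@Serializable
data class Server(val host: String, val port: Int)

@OptIn(ExperimentalSerializationApi::class)
fun main() {
    // KXS-properties: object -> flat map with dot-separated keys
    val map: Map<String, String> = Properties.encodeToStringMap(Server("localhost", 8080))

    // JVM-only part: let java.util.Properties handle escaping and the .properties syntax
    val javaProps = java.util.Properties()
    map.forEach { (k, v) -> javaProps.setProperty(k, v) }
    val text = StringWriter().also { javaProps.store(it, null) }.toString()
    println(text)

    // And back again: .properties text -> flat string map -> object
    val loaded = java.util.Properties().apply { load(StringReader(text)) }
    val restored: Server = Properties.decodeFromStringMap(
        loaded.stringPropertyNames().associateWith { loaded.getProperty(it) }
    )
    println(restored)
}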
So I'm trying to implement my own encoder / decoder, following the rough structure of kotlinx.serialization.json.Json.kt. Although it's pretty tedious, it has gone well so far, but now I've stumbled upon a new problem:
As far as I know (I am not sure because there is no documentation), the map only contains primitives (or primitive-equivalents, as Kotlin does not really have primitives). I suspect this because when you write your own KSerializers for the KXS frontend, you can encode to any primitive by invoking the encodeXXX() functions of the Encoder interface. Now the problem is: when I try to decode into the map that should contain primitives, how do I even know which primitives are expected by the model class?
I've once written my own serializer / deserializer in Java to learn about the topic, but in that implementation, the backend was a lot more tightly coupled to the frontend, so that I could query the expected primitive type from the model class in the backend. But in my situation, I don't have access to the model class and I have no clue how to retrieve the expected types.
As you can see, I've tried multiple approaches, but none of them worked right away. If you can help me get any of these to work, that would be very much appreciated.
Thank you!
The way it works in kotlinx.serialization is that there are serializers that describe classes and structures, and that contain the code which writes/reads the individual properties as well as the struct itself. It is then the job of the format to map those operations to/from a data format.
The intended purpose of kotlinx.serialization.Properties is to support serializing a Kotlin class to/from a java.util.Properties-like structure. It is fairly simple in its setup, in that every nested property is serialized by prepending the parent property's name to its own name (the dotted properties syntax).
Unfortunately it is indeed the case that deserializing from this format requires knowing the expected types; it doesn't just read them from strings. However, it is possible to determine the structure: you can use the descriptor property of the serializer to introspect the expectations.
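For example, a rough sketch (with invented Config and Nested classes) that walks the descriptor to find the expected primitive kind for every dotted key:

import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.Serializable
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.descriptors.StructureKind
import kotlinx.serialization.serializer

@Serializable
data class Nested(val enabled: Boolean)

@Serializable
data class Config(val name: String, val retries: Int, val nested: Nested)

// Print the expected kind for every dotted key, recursing into nested classes
@OptIn(ExperimentalSerializationApi::class)
fun dumpExpectedTypes(descriptor: SerialDescriptor, prefix: String = "") {
    for (i in 0 until descriptor.elementsCount) {
        val name = if (prefix.isEmpty()) descriptor.getElementName(i)
                   else "$prefix.${descriptor.getElementName(i)}"
        val child = descriptor.getElementDescriptor(i)
        when (child.kind) {
            is PrimitiveKind -> println("$name -> ${child.kind}")   // e.g. "retries -> INT"
            StructureKind.CLASS -> dumpExpectedTypes(child, name)   // recurse into nested structures
            else -> println("$name -> ${child.kind} (lists, maps etc. need their own handling)")
        }
    }
}

fun main() = dumpExpectedTypes(serializer<Config>().descriptor)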
From my perspective this format is a bit simpler than it should be, though it is a good example of a custom format. A key distinction between formats is whether they are intended to just provide a storage format, or whether the output is intended to (be able to) represent a well-designed API. The latter need to be more complex.
I'm trying to understand the problem with ORM, and the wikipedia article on ORM mismatch starts with:
Object-oriented programs are designed with techniques that result in encapsulated objects whose representation is hidden. Mapping such private object representation to database tables makes such databases fragile according to OOP (object-oriented programming) philosophy, since there are significantly fewer constraints for design of encapsulated private representation of objects compared to a database's use of public data, which must be amenable to upgrade, inspection and queries.
Could anybody expand on this fragility? What are these constraints?
The operations seem to me to be about reading rows (inspections and queries) to objects and writing objects (upgrades) to rows. What's the big deal?
Why do we need to use serialization?
If we want to send an object or piece of data through a network we can use streams of bytes. If we want to save some data to the disk, we can again use the binary mode along with the byte streams and save it.
So what's the advantage of using serialization?
Technically, at the low level, your serialized object will also end up as a stream of bytes on your cable or your filesystem...
So you can also think of it as a standardized and already available way of converting your objects to a stream of bytes. Storing/transferring objects is a very common requirement, and there is little point in reinventing this wheel in every application.
As others have mentioned, you also know that this object->stream_of_bytes implementation is quite robust, tested, and generally architecture-independent.
This does not mean it is the only acceptable way to save or transfer an object: in some cases, you'll have to implement your own methods, for example to avoid saving unnecessary/private members (for security or performance reasons, say). But if you are in a simple case, you can make your life easier by using the serialization/deserialization of your framework, language or VM instead of having to implement it yourself.
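As a trivial illustration of "use what the language/VM already gives you" (sketched here in Kotlin on the JVM; the Person class is made up), the built-in object serialization turns an object graph into bytes and back without any hand-written field-by-field code:

import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

// Made-up example class; implementing Serializable is all the JVM needs
data class Person(val name: String, val age: Int) : Serializable

fun main() {
    val original = Person("Ada", 36)

    // Object -> stream of bytes, courtesy of the VM
    val buffer = ByteArrayOutputStream()
    ObjectOutputStream(buffer).use { it.writeObject(original) }
    val bytes: ByteArray = buffer.toByteArray()   // ready for a file or a socket

    // Stream of bytes -> object again
    val restored = ObjectInputStream(ByteArrayInputStream(bytes)).use { it.readObject() } as Person
    println(restored == original)   // true: same state, different instance
}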
Hope this helps.
Quoting from the Designing Data-Intensive Applications book:
Programs usually work with data in (at least) two different representations:

In memory, data is kept in objects, structs, lists, arrays, hash tables, trees, and so on. These data structures are optimized for efficient access and manipulation by the CPU (typically using pointers).

When you want to write data to a file or send it over the network, you have to encode it as some kind of self-contained sequence of bytes (for example, a JSON document). Since a pointer wouldn’t make sense to any other process, this sequence-of-bytes representation looks quite different from the data structures that are normally used in memory.

Thus, we need some kind of translation between the two representations. The translation from the in-memory representation to a byte sequence is called encoding (also known as serialization or marshalling), and the reverse is called decoding (parsing, deserialization, unmarshalling).
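To make the quote concrete, here is a small sketch (Kotlin with kotlinx.serialization; the User class is invented for the example) of the translation between the in-memory representation and a self-contained sequence of bytes such as a JSON document:

import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

// Invented example type: the in-memory representation (objects and references)
@Serializable
data class User(val name: String, val friends: List<String>)

fun main() {
    val inMemory = User("Alice", listOf("Bob", "Carol"))

    // Encoding: in-memory structure -> self-contained character/byte sequence (here: JSON)
    val encoded: String = Json.encodeToString(inMemory)
    println(encoded)   // {"name":"Alice","friends":["Bob","Carol"]}

    // Decoding: character/byte sequence -> back to an in-memory structure
    val decoded: User = Json.decodeFromString(encoded)
    println(decoded)
}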
Among other reasons: to be compatible across architectures. An integer doesn't have the same number of bytes from one architecture to another, and sometimes not even from one compiler to another.
Plus, what you're talking about is still serialization: binary serialization. You're putting all the bytes of your object together in order to store them and be able to convert them back into an object later. This is serializing.
More info on wikipedia
Serialization is the process of converting an object into a stream so that it can be saved in a physical file (like XML) or in a database. The main purpose of serialization in C# is to persist an object and save it in any specified storage medium, such as a stream, a physical file, or a database.
In general, serialization is a method to persist an object's state, but I suggest you read this wiki page; it is pretty detailed and correct in my opinion:
http://en.wikipedia.org/wiki/Serialization
In serialization, the point is not turning an object into bits and bytes; objects ARE bits and bytes already. Serialization is the process of making the object's "state" persistent. Notice the word "state", which means the values of the instance variables of the entire object graph (the target object and all the objects it references either directly or indirectly) WITHOUT methods and other extra runtime stuff stuck to them (plus, of course, a little more info that the JVM needs to restore these objects, such as their class types).
So this is the main reason it is necessary: storing the whole raw bytes of objects would be expensive and, for all intents and purposes, unnecessary.
I'm considering switching from Fluent NHibernate to SubSonic, as NHib just seems to have a MASSIVE memory footprint which I'm really not enjoying, but I just want to check how SubSonic (probably the simple repository) would cope with:
* adding extra fields to a database: at the moment I can map a dictionary value to a field in the database which is VERY cool, is this possible in SubSonic? (or anything similar?)
  FWIW: DynamicComponent(x => x.PropertyBag, GetDynamicComponentPart); where PropertyBag is a Dictionary.
* many to many relationships
* cascading saves/deletes
* mapping a complex object to an xml or varchar(max) column (serialize it to xml obviously)
* adding extra fields to a database: at the moment I can map a dictionary value to a field in the database which is VERY cool, is this possible in SubSonic? (or anything similar?)
  FWIW: DynamicComponent(x => x.PropertyBag, GetDynamicComponentPart); where PropertyBag is a Dictionary.
Adding fields is fairly simple. Just add the field to the table, then re-generate the classes from the T4 template.
You won't get any mapping beyond basic primitive types, though. Certainly not a dictionary in a field.
* many to many relationships
You will have to make custom modifications to the T4 template to get any sort of support for many-to-many tables. SubSonic just treats them like any other table.
I have made such modifications and they are of limited usefulness.
* cascading saves/deletes
Only on the RDBMS side. That is, if you set up foreign key relationships with cascades. SubSonic doesn't do any of this.
* mapping a complex object to an xml or varchar(max) column (serialize it to xml obviously)
Nope. You get no support like this. There are no extensibility hooks to insert your own type converters.
SubSonic is a completely different kind of tool from NHibernate. I would call NHib an ORM, but I would not call SubSonic that. Rob Conery, the author of SubSonic, would call it a query tool.
It is very simplistic, which is its goal and strength (as well as weakness). It assists with querying and modifications in a strongly typed way. It lacks a huge amount of features and configurability/extensibility compared to NHib or even Entity Framework.
I would caution against moving from NHib to SS, especially if you have any amount of functionality implemented in NHibernate already. Not that SS is a bad tool, but it has a lot of restrictions.
I have complex objects with collection fields which need to be stored in Hadoop. I don't want to go through the whole object tree and explicitly store each field. So I'm thinking about serializing the complex fields and storing them as one big piece, and then deserializing them when reading the object back. What is the best way to do it? I thought about using some kind of serialization for that, but I hope Hadoop has means to handle this situation.
Sample object's class to store:
class ComplexClass {
    // <simple fields>
    List<AnotherComplexClassWithCollectionFields> collection;
}
HBase only deals with byte arrays, so you can serialize your object in any way you see fit.
The standard Hadoop way of serializing objects is to implement the org.apache.hadoop.io.Writable interface. Then you can serialize your object into a byte array using org.apache.hadoop.io.WritableUtils.toByteArray(Writable ... writable).
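As a rough sketch of that Writable approach (written in Kotlin for brevity; the Item class and the length-prefix layout for the list are my own illustrative choices, not anything Hadoop prescribes), it could look like this for a class similar to the one above:

import org.apache.hadoop.io.Writable
import org.apache.hadoop.io.WritableUtils
import java.io.DataInput
import java.io.DataOutput

// Hypothetical element type of the collection
class Item(var label: String = "") : Writable {
    override fun write(out: DataOutput) = WritableUtils.writeString(out, label)
    override fun readFields(input: DataInput) { label = WritableUtils.readString(input) }
}

class ComplexClass(
    var name: String = "",                               // stands in for the "simple fields"
    var collection: MutableList<Item> = mutableListOf()
) : Writable {
    override fun write(out: DataOutput) {
        WritableUtils.writeString(out, name)
        out.writeInt(collection.size)                    // length prefix so readFields knows how many items follow
        collection.forEach { it.write(out) }
    }

    override fun readFields(input: DataInput) {
        name = WritableUtils.readString(input)
        collection = MutableList(input.readInt()) { Item().apply { readFields(input) } }
    }
}

// The whole object graph can then be flattened to a byte array, e.g. for an HBase Put:
// val bytes: ByteArray = WritableUtils.toByteArray(complexInstance)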
Also, there are other serialization frameworks that people in the Hadoop community use, like Avro, Protocol Buffers, and Thrift. All have their specific use cases, so do your research. If you're doing something simple, implementing Hadoop's Writable should be good enough.