How to serialize and load an object in SBCL/Common Lisp

I have an object o that is an instance of a class X in SBCL.
I want a function write-X-object that serializes o to a file in such a way that when that file is read back in with read-X-object, the resulting object is equivalent to o.
;; writing the object
(write-X-object o "~/tmp/o.serialized")
;; reading the object, much later,
;; after sbcl has been exited and restarted
(setq v (read-X-object "~/tmp/o.serialized"))
o might be about a gigabyte in size (or an array of several million smaller objects), with a complex structure, so the idea is for the reading and writing to be as quick as possible.

There are three main ways to do this:
Use the built-in facilities of print/read: you can define a method on print-object for your class that serialises it (perhaps you would have it depend on some special variable so you don't print gigabytes to the REPL), and then define a reader macro corresponding to whatever syntax you use to print your object. To save your object you would do (with-open-file (x "/tmp/foo" :direction :output) (print my-object x)) and to get it back you would do (with-open-file (x "/tmp/foo") (read x)). The pros are that this is simple. The cons are that it is slow and not space-efficient.
You could take advantage of a third-party serialisation library like conspack which a commenter suggested. Pros: reasonably fast to read or write. Cons: you can't incrementally read an object larger than memory.
You could restructure your class (using the MOP) so that the object can be stored in memory and on disk in the exact same format and then use mmap to read/write it. An example library for this is manardb. This system also allows lots of different objects to be stored. Pros: no overhead to read/write to/from disk, and it can handle objects larger than main memory automatically (that is, it handles swapping in/out of RAM for you). Cons: there may be a small overhead for accessing fields of the object. Note that with this method one can normally avoid most of the cons by using an implementation that can properly optimise access through FFI pointers and by using specialised data structures (e.g. a float array being much better than a general array).

Just as you can define a method on print-object, you can also specialise the generic function MAKE-LOAD-FORM with defmethod.
make-load-form
and
make-load-form-saving-slots are defined in the HyperSpec.
They return one or two forms (a creation form and optionally an initialization form), which you can write to a file as text, although that may take more disk space than a purpose-built binary format.
CLiki also has a page about serialization; some of the third-party packages listed there should still work today, I think.

Related

Slow Deletion of Handle Object in MATLAB

I used MATLAB to write a simulation engine for the simulation of product flows in a production environment. I inherited all used classes from handle and used these handles (quite excessively, I guess) to link between e.g. products and work systems, orders, etc.
Now, to run multiple instances of my model, I create a simulation object that contains all other objects and their relations, run the model and free the simulation variable.
Creating and running the model takes ~50 seconds (this including the generation of all objects, their relations and of course the calculation over the course of the simulation run). Freeing the variable before the next run, currently takes ~3-4 minutes!
I tried clear, delete and plain overwriting of the old simulation object, without noticing significant differences in performance.
Is there a way to improve the performance without rewriting the code?
It is hard to say anything particular about your code without seeing it, or at least some high level design.
Some short advice before optimizing the OO aspects:
Are you sure that the bottleneck is in the objects creation? Verify it with the profiler.
If the OO is indeed the bottleneck, here are some guesses:
You have used circular references. MATLAB does not use a garbage collector, but rather a smart reference-counting mechanism, which can be quite slow in this case. Change the references between the objects to be tree-like instead.
You have created an enormous number of objects. MATLAB has a significant per-object overhead, much more than traditional languages (C++, Java). Redesign the system to use a smaller number of objects.
Do you happen to use cell arrays to store other handle objects from within a handle object? This can cause serious slowdowns prior to Matlab R2011A. See http://www.mathworks.com/support/solutions/en/data/1-6VVMS0/index.html?product=ML
A workaround is to use a temporary local variable to manipulate the cell array, then assign this temp variable back to your handle object property. I saw a ~100x improvement in performance after doing this in one case.

Why do we use serialization?

Why do we need to use serialization?
If we want to send an object or piece of data through a network we can use streams of bytes. If we want to save some data to the disk, we can again use the binary mode along with the byte streams and save it.
So what's the advantage of using serialization?
Technically, at the low level, your serialized object will also end up as a stream of bytes on your cable or your filesystem...
So you can also think of it as a standardized and already available way of converting your objects to a stream of bytes. Storing/transferring objects is a very common requirement, and there is little point in reinventing this wheel in every application.
As others have mentioned, you also know that this object->stream_of_bytes implementation is quite robust, tested, and generally architecture-independent.
This does not mean it is the only acceptable way to save or transfer an object: in some cases, you'll have to implement your own methods, for example to avoid saving unnecessary or private members (say, for security or performance reasons). But if you are in a simple case, you can make your life easier by using the serialization/deserialization of your framework, language or VM instead of having to implement it yourself.
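As a concrete sketch, this is roughly what using the built-in mechanism looks like in Java (the Point class and the file name here are made up for illustration): the runtime handles the object-to-bytes conversion for you.

import java.io.*;

// Illustrative class; implementing Serializable opts it into Java's built-in mechanism.
class Point implements Serializable {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class SerializeDemo {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Point p = new Point(3, 4);
        // Write the object graph to a file as a stream of bytes.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("point.ser"))) {
            out.writeObject(p);
        }
        // Read it back; an equivalent object is reconstructed.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("point.ser"))) {
            Point restored = (Point) in.readObject();
            System.out.println(restored.x + "," + restored.y); // prints 3,4
        }
    }
}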
Hope this helps.
Quoting from Designing Data Intensive Applications book:
Programs usually work with data in (at least) two different representations:
In memory, data is kept in objects, structs, lists, arrays, hash tables, trees, and so on. These data structures are optimized for efficient access and manipulation by the CPU (typically using pointers).
When you want to write data to a file or send it over the network, you have to encode it as some kind of self-contained sequence of bytes (for example, a JSON document). Since a pointer wouldn't make sense to any other process, this sequence-of-bytes representation looks quite different from the data structures that are normally used in memory.
Thus, we need some kind of translation between the two representations. The translation from the in-memory representation to a byte sequence is called encoding (also known as serialization or marshalling), and the reverse is called decoding (parsing, deserialization, unmarshalling).
Among other reasons, to be compatible across architectures. An integer doesn't have the same number of bytes from one architecture to another, and sometimes from one compiler to another.
Plus what you're talking about is still serialization. Binary Serialization. You're putting all the bytes of your object together in order to store them and be able to reconvert them as an object later. This is serializing.
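For instance, here is a minimal Java sketch of that idea (the file name is illustrative): DataOutputStream writes an int as exactly four bytes in a fixed big-endian order, so the same bytes can be read back on any architecture.

import java.io.*;

public class BinaryDemo {
    public static void main(String[] args) throws IOException {
        // Write an int in a fixed 4-byte, big-endian layout, regardless of platform.
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream("value.bin"))) {
            out.writeInt(123456);
        }
        // Any JVM on any architecture reads back the same value.
        try (DataInputStream in = new DataInputStream(new FileInputStream("value.bin"))) {
            System.out.println(in.readInt()); // prints 123456
        }
    }
}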
More info on wikipedia
Serialization is the process of converting an object into a stream so that it can be saved in a physical file (such as XML) or in a database. The main purpose of serialization in C# is to persist an object and save it in any specified storage medium like a stream, a physical file or a database.
In general, serialization is a method to persist an object's state, but I suggest you read this wiki page; it is pretty detailed and correct in my opinion:
http://en.wikipedia.org/wiki/Serialization
In serialization, the point is not turning an object into bits and bytes; objects ARE bits and bytes already. Serialization is the process of making the object's "state" persistent. Notice the word "state", which means the values of the instance variables of the entire object graph (the target object and all the objects it references either directly or indirectly) WITHOUT methods and other extra runtime stuff stuck to them (plus, of course, a little more info that the JVM needs for restoring these objects, such as their class types).
So this is the main reason for its necessity: storing the whole bytes of objects would be expensive, and for all intents and purposes, unnecessary.

Advantages and disadvantages of encoding objects with NSCoding or simply writing data to files

I'm curious what the advantages of encoding objects in objective c with NSCoding and writing them to disk may be over simply writing a persistence object to disk. Is there a performance increase in terms of I/O or disk space usage?
Well, most NSCoding implementations will handle object graphs correctly; i.e. if you code a member object that's already been coded to the coder, it won't code it again. Decoding will restore the object graph correctly (so the decoded target object has multiple inbound references). You also get all the built in helper coding functions (for primitive types, and objects).
Other than that, NSCoders are just persistence object generators, so you end up doing similar work, only without the annoyances and common cases handled by Apple. What persistence generator could you write that wouldn't duplicate tonnes of NSCoder functionality?

VB.NET: is using Structures considered nasty?

I used to use Structures quite a lot in the VB6 days, and I try to avoid them now with .NET. I'm just wondering if using structures in 2010 instead of a Class is considered nasty?
Thanks for the help.
Choosing a Structure takes consideration instead of being inherently "nasty". There are reasons why a Structure can be nasty; however there are also reasons a Class can be nasty in its own way...
Basically when you decide between these two object oriented kinds of containers, you're deciding how memory will be used.
There are different semantics associated with Structure and Class in VB.NET and they represent different memory usage patterns.
By creating a Class you're creating a reference type.
good for large data
memory contains a reference to the object's location on the heap (like the concept of pointing to an object), though this happens transparently to the VB.NET programmer because you're in "managed mode".
By creating a Structure you're creating a value type.
good for small data
memory allocated contains the actual value
be judicious because these are apt to get pushed on the stack area of memory (i.e. for local vars, but not class fields) - too large and you could run into stack issues.
Also some good video resources on YouTube if you're an audio learner.
There are many articles on the Internet, like these MSDN articles, that teach the basics and details:
Value Types and Reference Types
7.1 Types - Reference and Value
MSDN Type Fundamentals - subheading: Reference and Value Types
Example
Structures exist because in some scenarios they make more sense than classes. They are particularly useful for representing small abstract data types such as 3D points, latitude-longitude pairs, rational numbers, etc.
The basic motivation for using structs is to avoid GC pressure. Since structs live inline (on the stack or inside whatever container you put them in) rather than on the heap, they typically result in far fewer small allocations, which makes a huge difference if you need to hold an array of one million points or rationals.
A key issue to watch out for is that structs are value types, and thus are generally passed around by value (the obvious exception being ref and out parameters). This has important implications. For instance:
Point3D[] points = ...;
points[9].Move(0, 0, 5);
The above code works fine, and increases the z coordinate of the 10th point by 5. The following code, however:
List<Point3D> points = ...;
points[9].Move(0, 0, 5);
Will still compile and run, but you will find that the z coordinate of the 10th point remains unchanged. This is because the List's index operator returns a copy of the point, and it is the copy that you are calling Move on.
The solution is quite simple. Always make structs immutable by marking all fields readonly. If you still need to Move points around, define + on the Point3D type and use assignment:
points[9] = points[9] + new Point3D(0, 0, 5);
It's considered pretty bad to use anything without understanding the implications.
Structures are value types, not reference types - and as such, they behave slightly differently. When you pass a value type, modifications are on a copy, not on the original. When you assign a value type to an object reference (say, in a non-generic list), boxing occurs. Make sure you read up on the full effect of choosing one over the other.
Read this for some understanding of the benefits of structures vs. classes and vice versa.
A structure can be preferable when:
You have a small amount of data and simply want the equivalent of the UDT (user-defined type) of previous versions of Visual Basic
You perform a large number of operations on each instance and would incur performance degradation with heap management
You have no need to inherit the structure or to specialize functionality among its instances
You do not box and unbox the structure
You are passing blittable data across a managed/unmanaged boundary
A class is preferable when:
You need to use inheritance and polymorphism
You need to initialize one or more members at creation time
You need to supply an unparameterized constructor
You need unlimited event handling support
To answer your question directly, there is nothing inherently wrong with using a structure in VB.NET. As with any design decision, you do need to consider the consequences of that decision.
It's important that you're aware of the difference between a class and a structure so that you can make an educated decision about which is appropriate. As stated by Alex et al, one of the key differences between a structure and a class is that a structure is considered a value type and a class is considered a reference type.
Reference types use copy-by-reference semantics: this means that when an object is created or copied, only a pointer to the actual object is allocated on the stack; the actual object data is allocated on the heap.
In contrast, value types have copy-by-value semantics, which means that each time a value type (e.g. a structure) is copied, the entire object is copied to a new location on the stack.
For objects with a small amount of data, this isn't really a problem, but if you have a large amount of data then using a reference type will likely be less expensive in terms of stack allocations, because only a pointer will be copied to the stack.
Microsoft has guidelines on the use of structures that more precisely describe the differences between classes and structures and the consequences of choosing one over the other.
From a behavioral standpoint, there are three types of 'things' in .NET:
Mutable reference types
Value types which can be mutated without being entirely replaced
Immutable reference and value types
Eric Lippert really dislikes group #2 above, since .net isn't terribly good at handling them, and sometimes treats them as though they're in group #1 or #3. Nonetheless, there are times when mutable value types make more sense semantically than would anything else.
Suppose, for example, that one has a rectangle and one wants to make another rectangle which is like the first one, but twice as tall. It is IMHO cleaner to say:
Rect2 = Rect1 ' Makes a new Rectangle that's just like Rect1
Rect2.Height = Rect2.Height*2
than to say either
Rect2 = Rect1.Clone ' Would be necessary if Rect1 were a class
Rect2.Height = Rect2.Height*2
or
Rect2 = New Rectangle(Rect1.Left, Rect1.Top, Rect1.Width, Rect1.Height*2)
When using classes, if one wants an object that's slightly different from an existing object, one must consider before mutating the object whether anyone else might want to use the original; if so, one must make a copy of it and then make the desired changes to the copy. With structs, there's no such restriction.
A simple way to think of value types is to regard every assignment operation as making a clone of the original, but in a way that's considerably cheaper than cloning a class type. If one would end up cloning a lot of objects as often as one would assign references without cloning, that's a substantial argument in favor of structs.

Mutable vs immutable objects

I'm trying to get my head around mutable vs immutable objects. Using mutable objects gets a lot of bad press (e.g. returning an array of strings from a method) but I'm having trouble understanding what the negative impacts are of this. What are the best practices around using mutable objects? Should you avoid them whenever possible?
Well, there are a few aspects to this.
Mutable objects without reference-identity can cause bugs at odd times. For example, consider a Person bean with a value-based equals method:
Map<Person, String> map = ...
Person p = new Person();
map.put(p, "Hey, there!");
p.setName("Daniel");
map.get(p); // => null
The Person instance gets "lost" in the map when used as a key because its hashCode and equality were based upon mutable values. Those values changed outside the map and all of the hashing became obsolete. Theorists like to harp on this point, but in practice I haven't found it to be too much of an issue.
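For reference, here is a minimal sketch of the kind of Person bean assumed above (the class and field are made up): equality and the hash code are based on a mutable field, which is exactly what makes it unsafe as a map key.

import java.util.Objects;

class Person {
    private String name; // mutable field that equals/hashCode depend on

    public void setName(String name) { this.name = name; }
    public String getName() { return name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Person && Objects.equals(name, ((Person) o).name);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(name); // changes whenever the name changes
    }
}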
Another aspect is the logical "reasonability" of your code. This is a hard term to define, encompassing everything from readability to flow. Generically, you should be able to look at a piece of code and easily understand what it does. But more important than that, you should be able to convince yourself that it does what it does correctly. When objects can change independently across different code "domains", it sometimes becomes difficult to keep track of what is where and why ("spooky action at a distance"). This is a more difficult concept to exemplify, but it's something that is often faced in larger, more complex architectures.
Finally, mutable objects are killer in concurrent situations. Whenever you access a mutable object from separate threads, you have to deal with locking. This reduces throughput and makes your code dramatically more difficult to maintain. A sufficiently complicated system blows this problem so far out of proportion that it becomes nearly impossible to maintain (even for concurrency experts).
Immutable objects (and more particularly, immutable collections) avoid all of these problems. Once you get your mind around how they work, your code will develop into something which is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways. Immutable objects are even easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce. In short, they're good practice all around!
With that said, I'm hardly a zealot in this matter. Some problems just don't model nicely when everything is immutable. But I do think that you should try to push as much of your code in that direction as possible, assuming of course that you're using a language which makes this a tenable opinion (C/C++ makes this very difficult, as does Java). In short: the advantages depend somewhat on your problem, but I would tend to prefer immutability.
Immutable Objects vs. Immutable Collections
One of the finer points in the debate over mutable vs. immutable objects is the possibility of extending the concept of immutability to collections. An immutable object is an object that often represents a single logical structure of data (for example an immutable string). When you have a reference to an immutable object, the contents of the object will not change.
An immutable collection is a collection that never changes.
When I perform an operation on a mutable collection, then I change the collection in place, and all entities that have references to the collection will see the change.
When I perform an operation on an immutable collection, a reference is returned to a new collection reflecting the change. All entities that have references to previous versions of the collection will not see the change.
Clever implementations do not necessarily need to copy (clone) the entire collection in order to provide that immutability. The simplest example is the stack implemented as a singly linked list and the push/pop operations. You can reuse all of the nodes from the previous collection in the new collection, adding only a single node for the push, and cloning no nodes for the pop. The push_tail operation on a singly linked list, on the other hand, is not so simple or efficient.
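As a rough Java sketch of that idea (the names are made up), a persistent stack can share every existing node between versions; push allocates exactly one node and pop allocates none.

// Minimal persistent (immutable) stack backed by a singly linked list.
final class ImmutableStack<T> {
    private final T head;                 // top value, null in the empty stack
    private final ImmutableStack<T> tail; // rest of the stack, shared between versions

    private ImmutableStack(T head, ImmutableStack<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    public static <T> ImmutableStack<T> empty() {
        return new ImmutableStack<>(null, null);
    }

    public ImmutableStack<T> push(T value) {
        return new ImmutableStack<>(value, this); // one new node, all others reused
    }

    public T peek() { return head; }

    public ImmutableStack<T> pop() { return tail; } // no copying at all
}

Older references, such as the stack you held before a push, still see their own unchanged version, even though the nodes are shared.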
Immutable vs. Mutable variables/references
Some functional languages take the concept of immutability to object references themselves, allowing only a single reference assignment.
In Erlang this is true for all "variables". I can only assign objects to a reference once. If I were to operate on a collection, I would not be able to reassign the new collection to the old reference (variable name).
Scala also builds this into the language with all references being declared with var or val, vals only being single assignment and promoting a functional style, but vars allowing a more C-like or Java-like program structure.
The var/val declaration is required, while many traditional languages use optional modifiers such as final in Java and const in C.
Ease of Development vs. Performance
Almost always the reason to use an immutable object is to promote side effect free programming and simple reasoning about the code (especially in a highly concurrent/parallel environment). You don't have to worry about the underlying data being changed by another entity if the object is immutable.
The main drawback is performance. Here is a write-up on a simple test I did in Java comparing some immutable vs. mutable objects in a toy problem.
The performance issues are moot in many applications, but not all, which is why many large numerical packages, such as the NumPy array class in Python, allow in-place updates of large arrays. This is important for application areas that make heavy use of large matrix and vector operations. Such large data-parallel and computationally intensive problems achieve a great speed-up by operating in place.
Immutable objects are a very powerful concept. They take away a lot of the burden of trying to keep objects/variables consistent for all clients.
You can use them for low level, non-polymorphic objects - like a CPoint class - that are used mostly with value semantics.
Or you can use them for high level, polymorphic interfaces - like an IFunction representing a mathematical function - that is used exclusively with object semantics.
Greatest advantage: immutability + object semantics + smart pointers make object ownership a non-issue, all clients of the object have their own private copy by default. Implicitly this also means deterministic behavior in the presence of concurrency.
Disadvantage: when used with objects containing lots of data, memory consumption can become an issue. A solution to this could be to keep operations on an object symbolic and do a lazy evaluation. However, this can then lead to chains of symbolic calculations, that may negatively influence performance if the interface is not designed to accommodate symbolic operations. Something to definitely avoid in this case is returning huge chunks of memory from a method. In combination with chained symbolic operations, this could lead to massive memory consumption and performance degradation.
So immutable objects are definitely my primary way of thinking about object-oriented design, but they are not a dogma.
They solve a lot of problems for clients of objects, but also create many, especially for the implementers.
Check this blog post: http://www.yegor256.com/2014/06/09/objects-should-be-immutable.html. It explains why immutable objects are better than mutable. In short:
immutable objects are simpler to construct, test, and use
truly immutable objects are always thread-safe
they help to avoid temporal coupling
their usage is side-effect free (no defensive copies)
identity mutability problem is avoided
they always have failure atomicity
they are much easier to cache
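To make the list above concrete, here is a minimal immutable class in Java (a made-up Money value type): all fields are final, there are no setters, and any "modification" returns a new instance, which is what gives you the thread safety and easy testing mentioned above.

// Made-up immutable value type: final class, final fields, no setters.
public final class Money {
    private final String currency;
    private final long cents;

    public Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    public String currency() { return currency; }
    public long cents() { return cents; }

    // "Modifying" operations return a new instance instead of changing this one.
    public Money plus(long moreCents) {
        return new Money(currency, cents + moreCents);
    }
}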
You should specify what language you're talking about. For low-level languages like C or C++, I prefer to use mutable objects to conserve space and reduce memory churn. In higher-level languages, immutable objects make it easier to reason about the behavior of the code (especially multi-threaded code) because there's no "spooky action at a distance".
A mutable object is simply an object that can be modified after it's created/instantiated, versus an immutable object that cannot be modified (see the Wikipedia page on the subject). An example of this in a programming language is Python's lists and tuples. Lists can be modified (e.g., new items can be added after creation) whereas tuples cannot.
I don't really think there's a clearcut answer as to which one is better for all situations. They both have their places.
In short:
Mutable instance is passed by reference.
Immutable instance is passed by value.
An abstract example. Let's suppose that there exists a file named txtfile on my HDD. Now, when you ask me to give you the txtfile file, I can do it in one of the following two ways:
I can create a shortcut to the txtfile and pass shortcut to you, or
I can do a full copy of the txtfile file and pass copied file to you.
In the first mode, the returned file represents a mutable file, because any change into the shortcut file will be reflected into the original one as well, and vice versa.
In the second mode, the returned file represents an immutable file, because any change into the copied file will not be reflected into the original one, and vice versa.
If a class type is mutable, a variable of that class type can have a number of different meanings. For example, suppose an object foo has a field int[] arr, and it holds a reference to a int[3] holding the numbers {5, 7, 9}. Even though the type of the field is known, there are at least four different things it can represent:
A potentially-shared reference, all of whose holders care only that it encapsulates the values 5, 7, and 9. If foo wants arr to encapsulate different values, it must replace it with a different array that contains the desired values. If one wants to make a copy of foo, one may give the copy either a reference to arr or a new array holding the values {5, 7, 9}, whichever is more convenient.
The only reference, anywhere in the universe, to an array which encapsulates the values 5, 7, and 9, i.e. a set of three storage locations which at the moment hold the values 5, 7, and 9; if foo wants it to encapsulate the values 5, 8, and 9, it may either change the second item in that array or create a new array holding the values 5, 8, and 9 and abandon the old one. Note that if one wanted to make a copy of foo, one must in the copy replace arr with a reference to a new array in order for foo.arr to remain the only reference to that array anywhere in the universe.
A reference to an array which is owned by some other object that has exposed it to foo for some reason (e.g. perhaps it wants foo to store some data there). In this scenario, arr doesn't encapsulate the contents of the array, but rather its identity. Because replacing arr with a reference to a new array would totally change its meaning, a copy of foo should hold a reference to the same array.
A reference to an array of which foo is the sole owner, but to which references are held by some other object for some reason (e.g. it wants to have the other object store data there, the flip side of the previous case). In this scenario, arr encapsulates both the identity of the array and its contents. Replacing arr with a reference to a new array would totally change its meaning, but having a clone's arr refer to foo.arr would violate the assumption that foo is the sole owner. There is thus no way to copy foo.
In theory, int[] should be a nice simple well-defined type, but it has four very different meanings. By contrast, a reference to an immutable object (e.g. String) generally only has one meaning. Much of the "power" of immutable objects stems from that fact.
Mutable collections are in general faster than their immutable counterparts when used for in-place operations.
However, mutability comes at a cost: you need to be much more careful sharing them between different parts of your program. It is easy to create bugs where a shared mutable collection is updated unexpectedly, forcing you to hunt down which line in a large codebase is performing the unwanted update.
A common approach is to use mutable collections locally within a function, or privately within a class, where there is a performance bottleneck, but to use immutable collections elsewhere where speed is less of a concern. That gives you the high performance of mutable collections where it matters most, while not sacrificing the safety that immutable collections give you throughout the bulk of your application logic.
If you return references to an array or string, then the outside world can modify the contents of that object, which effectively makes it a mutable (modifiable) object.
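A small Java sketch of that point (the class and field names are made up): returning the internal array directly leaks mutability, while returning a copy keeps the object's state protected.

import java.util.Arrays;

class Scores {
    private final int[] values;

    Scores(int[] values) {
        this.values = values.clone(); // defensive copy on the way in
    }

    // Leaky: callers can do getRaw()[0] = 99 and change our internal state.
    int[] getRaw() {
        return values;
    }

    // Safe: callers only ever receive a copy of the current state.
    int[] getCopy() {
        return Arrays.copyOf(values, values.length);
    }
}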
Immutable means can't be changed, and mutable means you can change.
Objects are different than primitives in Java. Primitives are built in types (boolean, int, etc) and objects (classes) are user created types.
Primitives and objects can be mutable or immutable when defined as member variables within the implementation of a class.
A lot of people think that primitives and object variables with a final modifier in front of them are immutable; however, this isn't exactly true. So final almost doesn't mean immutable for variables. See the example here:
http://www.siteconsortium.com/h/D0000F.php.
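A short Java sketch of that point: final only prevents reassigning the variable, not mutating the object it refers to.

import java.util.ArrayList;
import java.util.List;

public class FinalIsNotImmutable {
    public static void main(String[] args) {
        final int[] numbers = {1, 2, 3};
        numbers[0] = 42;               // allowed: the array contents are still mutable
        // numbers = new int[3];       // compile error: the reference itself is final

        final List<String> names = new ArrayList<>();
        names.add("Alice");            // allowed: the list is still mutable
        // names = new ArrayList<>();  // compile error: cannot reassign a final variable

        System.out.println(numbers[0] + " " + names); // prints: 42 [Alice]
    }
}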
General Mutable vs Immutable
Unmodifiable - a wrapper around a modifiable collection. It guarantees that it cannot be changed directly (but it may still change via the backing object).
Immutable - its state cannot be changed after creation. An object is immutable when all its fields are immutable. It is the next step beyond an unmodifiable object.
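A short Java illustration of the difference (using the standard Collections.unmodifiableList wrapper): the wrapper rejects direct changes but still reflects changes made through the backing list, whereas a truly immutable list cannot change at all.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UnmodifiableVsImmutable {
    public static void main(String[] args) {
        List<String> backing = new ArrayList<>();
        backing.add("a");

        // Unmodifiable: a read-only view of the backing list.
        List<String> view = Collections.unmodifiableList(backing);
        // view.add("b");             // would throw UnsupportedOperationException

        backing.add("b");             // but the backing list can still change...
        System.out.println(view);     // ...and the change shows through: [a, b]

        List<String> immutable = List.of("a", "b"); // truly immutable (Java 9+)
        // immutable.add("c");        // would throw UnsupportedOperationException
    }
}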
Thread safe
The main advantage of an immutable object is that it is a natural fit for a concurrent environment. The biggest problem in concurrency is a shared resource that can be changed by any thread. But if an object is immutable, it is read-only, which is a thread-safe operation. Any modification of an original immutable object returns a copy.
source of truth, side-effects free
As a developer, you can be completely sure that an immutable object's state cannot be changed from anywhere (on purpose or not). For example, a consumer of an immutable object can always keep using the original object safely.
compile optimisation
The compiler can apply optimisations that improve performance.
Disadvantage:
Copying an object is a heavier operation than changing a mutable object, which is why it has some performance footprint.
To create an immutable object you should use:
1. Language level
Each language contains tools to help you with it. For example:
Java has final and primitives
Swift has let and struct[About].
The language also defines the kinds of variables available. For example:
Java has primitive and reference type,
Swift has value and reference type[About].
For immutable objects, primitives and value types are more convenient because they are copied by default. Reference types are more difficult (because you are able to change the object's state from outside), but still possible. For example, you can use the clone pattern at the developer level to make a deep (instead of shallow) copy, as sketched below.
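As a sketch of that last point in Java (the class is made up, and a copy constructor is used here as one common way to apply the clone pattern): copying the mutable field itself, rather than just the reference to it, keeps the copy independent of the original.

import java.util.ArrayList;
import java.util.List;

class Playlist {
    private final List<String> tracks;

    Playlist(List<String> tracks) {
        this.tracks = new ArrayList<>(tracks); // copy the incoming list, not just the reference
    }

    // Copy constructor: copies the mutable list so the two objects do not share state
    // (the String elements are themselves immutable, so copying the list is enough).
    Playlist(Playlist other) {
        this.tracks = new ArrayList<>(other.tracks);
    }

    void add(String track) { tracks.add(track); }

    List<String> tracks() { return new ArrayList<>(tracks); } // defensive copy on the way out
}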
2. Developer level
As a developer you should not provide an interface for changing state
[Swift] and [Java] immutable collection