Is the format of the data held in kotlin.Metadata documented anywhere? - kotlin

I'm interested to know what data is held in the Metadata annotation added to each Kotlin class.
However, the documentation for most fields gives no more detail than:
"Metadata in a custom format. The format may be different (or even absent) for different kinds."
https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/jvm/runtime/kotlin/Metadata.kt
Is there a reference somewhere that explains how to interpret this data?

kotlin.Metadata contains information about Kotlin symbols, such as their names, signatures, relations between types, etc. Some of this information is already present in the JVM signatures in the class files, but a lot is not, since there are quite a few Kotlin-specific things which JVM class files cannot represent properly: type nullability, mutable/read-only collection interfaces, declaration-site variance, and others.
No specific actions were taken to make the schema of the data encoded in this annotation public, because for most users such data is needed to introspect a program at runtime, and the Kotlin reflection library provides a nice API for that.
If you need to inspect Kotlin-specific stuff which is not exposed via the reflection API, or you're just generally curious what else is stored in that annotation, you can take a look at the implementation of kotlinx.reflect.lite. It's a light-weight library, the core of which is the protobuf-generated schema parser. There's not much supported there at the moment, but there are schemas available
which you can use to read any other data you need.
UPD (August 2018): since this was answered, we've published a new (experimental and unstable) library, which is designed to be the intended way for reading and modifying the metadata: https://discuss.kotlinlang.org/t/announcing-kotlinx-metadata-jvm-library-for-reading-modifying-metadata-of-kotlin-jvm-class-files/7980
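To see where this data physically lives, the raw annotation can be read with plain Java reflection; nothing is decoded here, so data1 stays an opaque protobuf payload of the kind kotlinx-metadata-jvm is meant to parse. The class name User and the helper rawMetadata are illustrative only, and the sketch assumes Kotlin 1.3+ (where kotlin.Metadata is public).

```kotlin
// An ordinary Kotlin class; the compiler attaches @kotlin.Metadata to its class file.
class User(val name: String, val nickname: String?)

// Hypothetical helper: reads the raw annotation via Java reflection (kotlin-reflect not needed).
fun rawMetadata(type: Class<*>): Metadata? = type.getAnnotation(Metadata::class.java)

fun main() {
    val meta = rawMetadata(User::class.java) ?: error("User has no kotlin.Metadata")
    // kind 1 = class; 2 = file facade, 3 = synthetic class, 4/5 = multi-file facade/part.
    println("kind = ${meta.kind}")
    println("metadata version = ${meta.metadataVersion.joinToString(".")}")
    // data1: protobuf-encoded declarations, packed into strings (opaque without a parser).
    // data2: the string table that data1 refers to (names, JVM signatures, ...).
    println("data1 chunks = ${meta.data1.size}")
    println("data2 = ${meta.data2.joinToString()}")
}
```

Interpreting data1 (for example, recovering that nickname is nullable) is exactly the part that needs the protobuf schema or the kotlinx-metadata-jvm library mentioned above.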

Related

Decoding and encoding strings for kotlinx.serialization.properties

I'm currently struggling with the experimental KXS-properties serialization backend, mainly because of two reasons:
I can't find any documentation for it (I think there is none)
KXS-properties only includes a serializer / deserializer, but no encoder / decoder
The endpoint provided by the framework is essentially a Map<String, Any>, but the map is flat and the keys already use the usual dot-separated properties syntax. So the step I have to take is to encode the map into a single string that can be written to a .properties file, AND to decode a single string from a .properties file into the map. I'm generally following the properties format specification at https://docs.oracle.com/javase/10/docs/api/java/util/Properties.html#load(java.io.Reader), but it's not as easy as one might think.
The problem is that I can't use java.util.Properties directly, because KXS is multiplatform, and restricting it to the JVM by using java.util.Properties would somewhat defeat its purpose. If I were to use it, the solution would be pretty simple, like this: https://gist.github.com/RaphaelTarita/748e02c06574b20c25ab96c87235096d
So I'm trying to implement my own encoder / decoder, following the rough structure of kotlinx.serialization.json.Json.kt. Although it's pretty tedious, it has gone well so far, but now I've stumbled upon a new problem:
As far as I know (I am not sure because there is no documentation), the map only contains primitives (or primitive-equivalents, as Kotlin does not really have primitives). I suspect this because when you write your own KSerializers for the KXS frontend, you can specify to encode to any primitive by invoking the encodeXXX() functions of the Encoder interface. Now the problem is: When I try to decode to the map that should contain primitives, how do I even know which primitives are expected by the model class?
I've once written my own serializer / deserializer in Java to learn about the topic, but in that implementation, the backend was a lot more tightly coupled to the frontend, so that I could query the expected primitive type from the model class in the backend. But in my situation, I don't have access to the model class and I have no clue how to retrieve the expected types.
As you can see, I've tried multiple approaches, but none of them worked right away. If you can help me get any of these to work, that would be very much appreciated.
Thank you!
The way it works in kotlinx.serialization is that there are serializers that describe classes, structures, etc., along with the code that writes/reads the individual properties as well as the enclosing structure. It is then the job of the format to map those operations to/from a data format.
The intended purpose of kotlinx.serialization.Properties is to support serializing a Kotlin class to/from a java.util.Properties-like structure. Its setup is fairly simple: every nested property is serialized by prefixing its name with the name of the enclosing property (the dotted properties syntax).
Unfortunately, deserializing from this format does require knowing the expected types; it doesn't simply read everything as strings. However, it is possible to determine the structure: you can use the descriptor property of the serializer to introspect what is expected.
From my perspective this format is a bit simpler than it should be. It is a good example of a custom format, though. A key distinction between formats is whether they are intended to just provide a storage format, or whether the output is intended to be able to represent a well-designed API. The latter need to be more complex.
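For the string half of the problem in the question (flat map to .properties text and back, without java.util.Properties), a minimal multiplatform-friendly sketch could look like the following. It assumes UTF-8 text read through a Reader (so non-ASCII is not escaped on output), supports only a subset of the spec (no line continuations), and the function names are hypothetical. Note that decoding yields strings only; as the answer says, turning them back into the expected primitives needs the serializer's descriptor, which is not shown here.

```kotlin
// Whitespace characters that terminate keys in the .properties format.
private const val WHITESPACE = " \t\u000C"

// Escapes a key or value so that a .properties reader restores it unchanged.
fun escape(text: String, isKey: Boolean): String = buildString {
    text.forEachIndexed { i, c ->
        when (c) {
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            '\u000C' -> append("\\f")
            '=', ':', '#', '!' -> append('\\').append(c)
            // Keys end at the first space; leading spaces of values are stripped by readers.
            ' ' -> if (isKey || i == 0) append("\\ ") else append(' ')
            else -> append(c)
        }
    }
}

// Reverses escape(); also understands \uXXXX (a malformed one throws).
fun unescape(text: String): String = buildString {
    var i = 0
    while (i < text.length) {
        val c = text[i]
        if (c != '\\') { append(c); i++; continue }
        if (i + 1 >= text.length) break // dangling backslash (line continuation): unsupported
        when (val next = text[i + 1]) {
            'n' -> append('\n')
            'r' -> append('\r')
            't' -> append('\t')
            'f' -> append('\u000C')
            'u' -> {
                append(Char(text.substring(i + 2, i + 6).toInt(16)))
                i += 4 // skip the four hex digits; the += 2 below skips the "\u" itself
            }
            else -> append(next)
        }
        i += 2
    }
}

// Writes a flat map one "key=value" entry per line.
fun encodeProperties(map: Map<String, Any>): String =
    map.entries.joinToString(separator = "") { (key, value) ->
        escape(key, isKey = true) + "=" + escape(value.toString(), isKey = false) + "\n"
    }

// Parses .properties text into a flat map of strings (comments and blank lines skipped).
fun decodeProperties(text: String): Map<String, String> {
    val result = LinkedHashMap<String, String>()
    for (rawLine in text.lines()) {
        val line = rawLine.trimStart(' ', '\t', '\u000C')
        if (line.isEmpty() || line[0] == '#' || line[0] == '!') continue
        // The key ends at the first unescaped '=', ':' or whitespace character.
        var end = 0
        while (end < line.length) {
            val c = line[end]
            if (c == '\\') { end += 2; continue }
            if (c == '=' || c == ':' || c in WHITESPACE) break
            end++
        }
        end = minOf(end, line.length)
        // Skip whitespace, at most one '=' or ':', then whitespace again.
        var start = end
        while (start < line.length && line[start] in WHITESPACE) start++
        if (start < line.length && (line[start] == '=' || line[start] == ':')) start++
        while (start < line.length && line[start] in WHITESPACE) start++
        result[unescape(line.substring(0, end))] = unescape(line.substring(start))
    }
    return result
}
```

Escaping '#' and '!' everywhere (not just at line start) is redundant but harmless, and keeps the encoder simple.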

Why do we use only [List, Map, Set] collections in Kotlin?

I've been learning Kotlin and have come across its Collections API. Before Kotlin I'd been learning Java, and I know that in Java there are a lot of different collection types. For example, instead of the general List, Map, Queue and Set we use ArrayList, HashMap, LinkedList, LinkedHashMap, etc. In Kotlin, though, we mostly use general types like Map, List and Set, although we can also use HashMap etc. So, what's going on there? Can you help me figure it out?
While Kotlin's original and primary target is the JVM, there is a huge push by JetBrains to make it multiplatform, and support JS and Native as well.
If you're using Kotlin on the JVM, the implementations of any collections you're using will still be the original JDK classes, e.g. java.util.ArrayList or java.util.HashSet. These are not reimplemented by the Kotlin standard library, which has some great benefits:
These are well-tested implementations, which are maintained anyway.
Using the exact same classes makes interop with Java a breeze, as you can pass them back and forth without having to perform conversions or mapping of any kind.
What Kotlin does do is introduce its own collection semantics over these existing implementations, in the form of the standard library interfaces such as List, Map, MutableList, MutableMap and so on. A small bit of compiler magic makes it so that these interfaces are implemented by the existing JDK classes as well.
If you don't need a specific implementation of a certain type of collection, you can use your collections via these interfaces plus the respective factory methods of the standard library (listOf, mapOf, mutableListOf, mutableMapOf, etc.). This keeps your code more generic, and independent of the concrete underlying implementations. You don't know what specific class the standard library mutableListOf function will create for you, only that it will be an object that satisfies the contract of the MutableList interface.
You should basically use these interfaces by default in your code, especially in public API:
In the case of function parameters, this lets clients provide the function with whatever implementation of the collection they wish to give you. If your function can operate on anything that's a List, you should ask for just that interface - no reason to require an ArrayList or LinkedList specifically.
If this is a return type, using these interfaces lets you change the specific implementation that you create internally in the future, without breaking client code. You can promise to just return a MutableList of things, and what implementation backs that list is not exposed to your clients.
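A short sketch of both points, with hypothetical function names: the parameter asks only for List, so a listOf result and a java.util.LinkedList are equally acceptable, and the return type promises only MutableList, leaving the backing class up to mutableListOf. The LinkedList call is JVM-only and is there just to show a Java collection being accepted.

```kotlin
// Accepts any read-only List: listOf(...), ArrayList, LinkedList, ...
fun average(values: List<Int>): Double =
    if (values.isEmpty()) 0.0 else values.sum().toDouble() / values.size

// Promises only the MutableList contract; the backing class stays an implementation detail.
fun evensUpTo(limit: Int): MutableList<Int> {
    val result = mutableListOf<Int>()
    for (n in 0..limit step 2) result.add(n)
    return result
}

fun main() {
    println(average(listOf(1, 2, 3)))                      // 2.0
    println(average(java.util.LinkedList(listOf(4, 6))))   // 5.0 (JVM-only call)
    val evens = evensUpTo(6)
    evens.add(8)
    println(evens)                                          // [0, 2, 4, 6, 8]
}
```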
If you look at all the collection handling functions of the Kotlin standard library, you'll see that on the surface, they almost exclusively operate on these interfaces. If you dig down deep enough, you'll find ArrayList instances being created, but this is not exposed to the client code, as it doesn't have to care about the concrete implementation most of the time.
Going back to the multiplatform point once more, if you write your code in a way such that it only relies on Kotlin standard library defined types, that code will be easily usable for non-JVM targets. If you reference kotlin.collections.MutableList in your imports, that can immediately compile to JS code, because there's a Kotlin standard library implementation of that interface on each platform. Whether that maps to an existing class directly, wraps an existing class somehow, or is implemented for Kotlin from scratch, again, doesn't have to concern you. But if you refer to java.util.TreeSet in your code, that won't fly for the JS target, as the Java platform classes are not available there.
Can you still use classes such as java.util.ArrayList directly? Of course.
If you don't see your code going multiplatform at some point, using Java collections directly is perfectly okay.
If you need a specific implementation for a List or a Set for performance reasons, sometimes you'll have to use the Java classes directly.
Interestingly, in recent releases of Kotlin, these specific types of implementations (such as an array based list) are wrapped under standard library typealiases too, so that they're platform independent by default: see kotlin.collections.ArrayList or kotlin.collections.HashSet for examples of this. These Kotlin-defined types will usually show up first in IntelliJ completion, so you'll find yourself being pushed towards using them wherever possible. Same thing goes for most exceptions, e.g. IllegalArgumentException.
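A small sketch of the typealias point (function names are hypothetical): without any import, ArrayList and HashSet below resolve to kotlin.collections, so the code is platform-independent; the java.util check in main only exists to show that, on the JVM, they are the JDK classes.

```kotlin
// No imports needed: ArrayList and HashSet resolve to the kotlin.collections typealiases,
// which on the JVM point at java.util.ArrayList and java.util.HashSet.
fun dedupe(items: List<String>): HashSet<String> = HashSet(items)

fun newBuffer(): ArrayList<String> = ArrayList()

fun main() {
    val buffer = newBuffer()
    buffer.add("b"); buffer.add("a"); buffer.add("b")
    println(dedupe(buffer).size)                // 2
    println(buffer is java.util.ArrayList<*>)   // true, but this check only compiles on the JVM
}
```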
TL;DR: You can use either Kotlin collection types or Java types in Kotlin, but you should probably use the former whenever you can.

Why are the message classes generated by the protocol buffer compiler all immutable?

The message classes generated by the protocol buffer compiler are immutable. The message classes contain the appropriate getter methods but no setter methods. This constraint does not apply to other serialization technologies like Java binary serialization, XML, JSON, etc.
As per my understanding, immutability is of use while doing concurrent programming. Immutability could be of help in achieving thread-safety. But, I assume, that is not the reason in case of protocol buffer.
What could be the reason for making message classes immutable?
After reading the protocol buffer documentation, it seems that the above applies only to Java (at least), and not to C++ and other supported platforms/languages.
Note: This question is only to satisfy my curiosity.
Thanks.
Google's implementation indeed uses a builder pattern, i.e. a mutable (but not very usable as an entity) builder that creates an immutable object instance. This is not a requirement; indeed, there are alternative implementations for several platforms that do not use this design pattern. But frankly, it simply isn't an issue, because if there is any friction (and what you describe is friction), then you should simply avoid using your DTO types (i.e. the objects used for serialization) as your primary domain entity types. As soon as you do that, it becomes a non-issue: you write your own domain entity types with whatever pattern you like (including any domain logic etc.), and then map to/from the DTO types as and when you need to; then the choice of design pattern used by the DTO tier is a mere uninteresting implementation detail.
But again: for your chosen platform, take a look to see if any alternative implementations might suit your requirements more closely.
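The separation described above can be sketched without protoc. PersonMessage below is a hand-written, hypothetical stand-in that only mimics the shape of generated Java messages (immutable message, mutable Builder reached via newBuilder()/toBuilder(), finished with build()); Person is a separate mutable domain type, and two extension functions do the mapping.

```kotlin
// Hypothetical stand-in for a generated message: immutable, created through a mutable Builder.
class PersonMessage private constructor(val name: String, val email: String) {
    fun toBuilder(): Builder = Builder().setName(name).setEmail(email)

    class Builder {
        private var name = ""
        private var email = ""
        fun setName(value: String) = apply { name = value }
        fun setEmail(value: String) = apply { email = value }
        fun build() = PersonMessage(name, email)
    }

    companion object {
        fun newBuilder() = Builder()
    }
}

// Domain entity: mutable, carries domain logic, knows nothing about the wire format.
class Person(var name: String, var email: String) {
    fun changeEmail(newEmail: String) {
        require('@' in newEmail) { "invalid email: $newEmail" }
        email = newEmail
    }
}

// Mapping layer between the DTO tier and the domain tier.
fun PersonMessage.toDomain() = Person(name, email)

fun Person.toMessage(): PersonMessage =
    PersonMessage.newBuilder().setName(name).setEmail(email).build()
```

Only the mapping functions know about both types, so swapping in a protobuf implementation with a different design pattern would touch nothing else.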

How to serialize a complex interface with unexported fields?

I need to serialize a complex interface (template.Template). It has many unexported fields, and gob doesn't want to work with them. Any suggestions?
P.S. Actually, I'm trying to put a parsed template into memcache on App Engine.
The short answer is that there's usually a reason for unexported fields--template.Template, for instance, contains information which changes during parsing--so I'd advise against serializing them yourself with reflect. However, the GobEncoder and GobDecoder interfaces were recently added to gob; if you need to serialize a complex struct with unexported fields, encourage the author of the package to implement these interfaces. Even better, implement them yourself (shouldn't be hard for template.Template) and contribute your patch.
If the type is from another package (such as template) this can't be done with any of the current serialization libs for Go (gob, json, bson, etc.). Nor should it be done, because the fields are unexported.
However, if you really need to, you can write your own serializer using package reflect, specifically Value.Field() and friends to get the unexported fields. Then you just need to store them in a way that you can decode later.

Serialization of Objects

How does serialization of objects work? How does an object get deserialized, and an instance get created from serialized data, without a call to any constructor?
I've kept this answer language agnostic since a language wasn't given.
When the object is serialized, all the information required to rebuild it is encoded in a way that can be retrieved later. This typically includes the type of the object, as well as the values of all its instance variables.
When the object is deserialized, an area in memory of the correct size is allocated and is populated using the serialized information such that the new object is identical to the serialized one.
The running program can then refer to this new object in memory without having to actually call the constructor.
There are lots of little details which this doesn't explain, but this is the general idea of serialization/deserialization.
Are you talking about Java? If so, serialization is an extralinguistic object creation mechanism. It's a backdoor that uses native code to create the object without calling any constructors. Therefore, when designing a class for serializability, you need to make sure that an object created through deserialization maintains the same invariants (key fields being initialized) as one created through the constructor path. A third way to create objects in Java is through cloning, and similar issues apply.
Cloning and serialization don't interact well with the use of final fields if you need to set the value of that field to something different than what is returned by clone or the deserialization process.
Josh Bloch's "Effective Java" has some chapters that explain these issues in more depth.
(this answer may apply to other languages too, but I've only used serialization in Java)
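A small Kotlin-on-the-JVM sketch with java.io serialization makes the point observable: the init block of the (hypothetical) Point class runs once, for the original object, but not for the deserialized copy, and a @Transient field comes back as null because its initializer never runs. The counter and class are illustrative only.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.io.Serializable

// Counts how many times the Point constructor (including its init block) actually runs.
var constructorCalls = 0

class Point(val x: Int, val y: Int) : Serializable {
    // Transient fields are not written to the stream; after deserialization they keep
    // the JVM default value (null here) because no initializer runs.
    @Transient
    val label: String? = "($x, $y)"

    init {
        constructorCalls++
    }
}

// Serializes a value to bytes and reads it back as a new object.
@Suppress("UNCHECKED_CAST")
fun <T : Serializable> roundTrip(value: T): T {
    val bytes = ByteArrayOutputStream().also { buffer ->
        ObjectOutputStream(buffer).use { it.writeObject(value) }
    }.toByteArray()
    ObjectInputStream(ByteArrayInputStream(bytes)).use { input ->
        return input.readObject() as T
    }
}

fun main() {
    val copy = roundTrip(Point(1, 2))
    println("x=${copy.x}, y=${copy.y}")              // x=1, y=2
    println("constructor calls: $constructorCalls")  // 1
    println("label after round trip: ${copy.label}") // null
}
```

The null label is exactly the kind of broken invariant the answer warns about; a readObject method (or readResolve) is the usual Java hook for restoring such fields.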
Regarding .NET: this isn't a definitive or textbook answer, and I might be all-out wrong...
.NET serialization needs to be separated into binary vs. others (typically XML or an XML derivative). Binary serialization is mostly a black box to me, but it allows objects to be serialized and restored in their current state. XML serialization typically only serializes the public fields/properties of an object, unless overridden by a custom IXmlSerializable implementation.
In the case of XML serialization, I believe .NET uses Reflection to determine which fields and properties get converted to their equivalent elements. The default behavior can be adjusted by applying attributes at the field level (such as [XmlAttribute]).
The metadata (which Reflection depends on) stores all the object members as well as their attributes and addresses, which allows the serializer to determine how it should build the output.