I noticed that the hashcode of Char-values is exactly the ID they have in ASCII, for example:
println('a'.hashCode()) //is 97
Is this true by contract and where can I see the implementation for this? The class Any.kt doesn't contain the implementation and Char.kt does neither.
I noticed that the hashcode of Char-values is exactly the ID they have in ASCII […]
That is impossible. ASCII only has 128 values, but Kotlin Char has 65536, so clearly, a Char cannot have their ASCII value as their hashcode, because 99.8% of them don't have an ASCII value.
Is this true by contract
No, it is not. The contract for kotlin.Char.hashCode() is:
fun hashCode(): Int
Returns a hash code value for the object. The general contract of hashCode is:
Whenever it is invoked on the same object more than once, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
If two objects are equal according to the equals() method, then calling the hashCode method on each of the two objects must produce the same integer result.
That is the whole contract. There is nothing about a relationship with ASCII.
and where can I see the implementation for this? The class Any.kt doesn't contain the implementation and Char.kt does neither.
I am assuming types like kotlin.Char or kotlin.Int are not actually implemented as Kotlin objects but as compiler intrinsics for performance reasons. For example, I would expect 42 to be a JVM int on the JVM platform and an ECMAScript number on the ECMAScript platform, and not implemented as a full-blown object with object header, instance variable table, class pointer, etc.
As it so happens, Kotlin's contract for hashCode() matches the contract for pretty much every other language as well, so I would expect that they as much as possible re-use the underlying platform's implementation. (In fact, I would suspect that is precisely the reason for designing the contract this way.)
Even for Kotlin/Native, it makes sense to map kotlin.Int to a native machine integer int_fast32_t or int32_t type.
I'm looking for a quick way to serialize custom structures consisting of basic value types and strings.
Using C++CLI to pin the pointer of the structure instance and destination array and then memcpy the data over is working quite well for all the value types. However, if I include any reference types such as string then all I get is the reference address.
Expected as much since otherwise it would be impossible for the structure to have a fixed.. structure. I figured that maybe, if I make the string fixed size, it might place it inside the structure though. Adding < VBFixedString(256) > to the string declaration did not achieve that.
Is there anything else that would place the actual data inside the structure?
Pinning a managed object and memcpy-ing the content will never give you what you want. Any managed object, be it String, a character array, or anything else will show up as a reference, and you'll just get a memory location.
If I read between the lines, it sounds like you need to call some C or C++ (not C++/CLI) code, and pass it a C struct that looks similar to this:
struct UnmanagedFoo
{
int a_number;
char a_string[256];
};
If that's the case, then I'd solve this by setting up the automatic marshaling to handle this for you. Here's how you'd define that struct so that it marshals properly. (I'm using C# syntax here, but it should be an easy conversion to VB.net syntax.)
[StructLayout(LayoutKind.Sequential, CharSet=CharSet.Ansi)]
public struct ManagedFoo
{
public int a_number;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst=256)]
public string a_string;
}
Explanation:
StructLayout(LayoutKind.Sequential) specifies that the fields should be in the declared order. The default LayoutKind, Auto, allows the fields to be re-ordered if the compiler wants.
CharSet=CharSet.Ansi specifies the type of strings to marshal. You can specify CharSet.Ansi to get char strings on the C++ side, or CharSet.Unicode to get wchar_t strings in C++.
MarshalAs(UnmanagedType.ByValTStr) specifies a string inline to the struct, which is what you were asking about. There are several other string types, with different semantics, see the UnmanagedType page on MSDN for descriptions.
SizeConst=256 specifies the size of the character array. Note that this specifies the number of characters (or when doing arrays, number of array elements), not the number of bytes.
Now, these marshal attributes are an instruction to the built-in marshaler in .Net, which you can call directly from your VB.Net code. To use it, call Marshal.StructureToPtr to go from the .Net object to unmanaged memory, and Marshal.PtrToStructure to go from unmanaged memory to a .Net object. MSDN has some good examples of calling those two methods, take a look at the linked pages.
Wait, what about C++/CLI? Yes, you could use C++/CLI to marshal from the .Net object to a C struct. If your structs get too complex to represent with the MarshalAs attribute, it's highly appropriate to do that. In that case, here's what you do: Declare your .Net struct like I listed above, without the MarshalAs or StructLayout. Also declare the C struct, plain and ordinary, also as listed above. When you need to switch from one to the other, copy things field by field, not a big memcpy. Yes, all the fields that are basic types (integers, doubles, etc.) will be a repetitive output.a_number = input.a_number, but that's the proper way to do it.
I'm writing a multithreaded program in the D programming language, but am pretty new to the language. There is a restriction on types passed between threads using the Tid.send() and receive[Only]() APIs in the std.concurrency package that they must be value types or must be constant to avoid race conditions between the sender and receiver threads. I have a simple struct Message type that I have been passing by value:
enum MessageType {
PrepareRequest,
PrepareResponse,
AcceptRequest,
Accepted
}
struct Message {
MessageType type;
SysTime timestamp;
uint node;
ulong value;
}
However, some MessageTypes don't have all the fields, and it's annoying to use a switch statement and remember which types have which fields when I could use polymorphism to do this work automatically. Is using an immutable class hierarchy recommended here, or is the approach I'm already using the best way to go, and why?
Edit
Also, if I should use immutable classes, what's the recommended way to create immutable objects of a user-defined class? A static method on the class they come from that casts the return value to immutable?
As a rule of a thumb, if you have a polymorphic type hierarchy, classes are the tool to use. And if mutation is out of the question by design, immutable classes should do the trick efficiently.
Great presentation from DConf2013 by Ali has been published recently : http://youtu.be/mPr2UspS0fE . It goes through topic of usage of const and immutable in D in great detail. Among the other good stuff it suggests to use
auto var = new immutable(ClassType)(...); syntax for creating immutable classes. All initialization goes to constructor then and no special hacks are needed.
i know this question has been already asked, but i didnt get it quite right, i would like to know, which is the base one, class or the type. I have few questions, please clear those for me,
Is type the base of a programing data type?
type is hard coded into the language itself. Class is something we can define ourselves?
What is untyped languages, please give some examples
type is not something that fall in to the oop concepts, I mean it is not restricted to oop world
Please clear this for me, thanks.
I didn't work with many languages. Maybe, my questions are correct in terms of : Java, C#, Objective-C
1/ I think type is actually data type in some way people talk about it.
2/ No. Both type and class we can define it. An object of Class A has type A. For example if we define String s = "123"; then s has a type String, belong to class String. But the vice versa is not correct.
For example:
class B {}
class A extends B {}
B b = new A();
then you can say b has type B and belong to both class A and B. But b doesn't have type A.
3/ untyped language is a language that allows you to change the type of the variable, like in javascript.
var s = "123"; // type string
s = 123; // then type integer
4/ I don't know much but I think it is not restricted to oop. It can be procedural programming as well
It may well depend on the language. I treat types and classes as the same thing in OO, only making a distinction between class (the definition of a family of objects) and instance (or object), specific concrete occurrences of a class.
I come originally from a C world where there was no real difference between language-defined types like int and types that you made yourself with typedef or struct.
Likewise, in C++, there's little difference (probably none) between std::string and any class you put together yourself, other than the fact that std::string will almost certainly be bug-free by now. The same isn't always necessary in our own code :-)
I've heard people suggest that types are classes without methods but I don't believe that distinction (again because of my C/C++ background).
There is a fundamental difference in some languages between integral (in the sense of integrated rather than integer) types and class types. Classes can be extended but int and float (examples for C++) cannot.
In OOP languages, a class specifies the definition of an object. In many cases, that object can serve as a type for things like parameter matching in a function.
So, for an example, when you define a function, you specify the type of data that should be passed to the function and the type of data that is returned:
int AddOne(int value) { return value+1; } uses int types for the return value and the parameter being passed in.
In languages that have both, the concepts of type and class/object can almost become interchangeable. However, there are many languages that do not have both. For instance, I believe that standard C has no support for custom-defined objects, but it certainly does still have types. On the otherhand, both PHP and Javascript are examples of languages where type is very loosely defined (basically, types are either single item, collection/array/object, or undefined [js only]), but they have full support for classes/objects.
Another key difference: you can have methods and custom-functions associated with a class/object, but not with a standard data-type.
Hopefully that clarified some. To answer your specific questions:
In some ways, type could be considered a base concept of programming, yes.
Yes, with the exception that classes can be treated as types in functions, as in the example above.
An untyped language is one that lets you use any type of variable interchangeably. Meaning that you can handle a string with the same code that handles an int, for instance. In practice most 'untyped' languages actually implement a concept called duck-typing, so named because they say that 'if it acts like a duck, it should be treated like a duck' and attempt to use any variable as the type that makes sense for the code encountered. Again, php and javascript are two languages which do this.
Very true, type is applicable outside of the OOP world.
I know that in terms of several distributed techniques (such as RPC), the term "Marshaling" is used but don't understand how it differs from Serialization. Aren't they both transforming objects into series of bits?
Related:
What is Serialization?
What is Object Marshalling?
Marshaling and serialization are loosely synonymous in the context of remote procedure call, but semantically different as a matter of intent.
In particular, marshaling is about getting parameters from here to there, while serialization is about copying structured data to or from a primitive form such as a byte stream. In this sense, serialization is one means to perform marshaling, usually implementing pass-by-value semantics.
It is also possible for an object to be marshaled by reference, in which case the data "on the wire" is simply location information for the original object. However, such an object may still be amenable to value serialization.
As #Bill mentions, there may be additional metadata such as code base location or even object implementation code.
Both do one thing in common - that is serializing an Object. Serialization is used to transfer objects or to store them. But:
Serialization: When you serialize an object, only the member data within that object is written to the byte stream; not the code that
actually implements the object.
Marshalling: Term Marshalling is used when we talk about passing Object to remote objects(RMI). In Marshalling Object is serialized(member data is serialized) + Codebase is attached.
So Serialization is a part of Marshalling.
CodeBase is information that tells the receiver of Object where the implementation of this object can be found. Any program that thinks it might ever pass an object to another program that may not have seen it before must set the codebase, so that the receiver can know where to download the code from, if it doesn't have the code available locally. The receiver will, upon deserializing the object, fetch the codebase from it and load the code from that location.
From the Marshalling (computer science) Wikipedia article:
The term "marshal" is considered to be synonymous with "serialize" in the Python standard library1, but the terms are not synonymous in the Java-related RFC 2713:
To "marshal" an object means to record its state and codebase(s) in such a way that when the marshalled object is "unmarshalled", a copy of the original object is obtained, possibly by automatically loading the class definitions of the object. You can marshal any object that is serializable or remote. Marshalling is like serialization, except marshalling also records codebases. Marshalling is different from serialization in that marshalling treats remote objects specially. (RFC 2713)
To "serialize" an object means to convert its state into a byte stream in such a way that the byte stream can be converted back into a copy of the object.
So, marshalling also saves the codebase of an object in the byte stream in addition to its state.
Basics First
Byte Stream - Stream is a sequence of data. Input stream - reads data from source. Output stream - writes data to destination.
Java Byte Streams are used to perform input/output byte by byte (8 bits at a time). A byte stream is suitable for processing raw data like binary files.
Java Character Streams are used to perform input/output 2 bytes at a time, because Characters are stored using Unicode conventions in Java with 2 bytes for each character. Character stream is useful when we process (read/write) text files.
RMI (Remote Method Invocation) - an API that provides a mechanism to create distributed application in java. The RMI allows an object to invoke methods on an object running in another JVM.
Both Serialization and Marshalling are loosely used as synonyms. Here are few differences.
Serialization - Data members of an object is written to binary form or Byte Stream (and then can be written in file/memory/database etc). No information about data-types can be retained once object data members are written to binary form.
Marshalling - Object is serialized (to byte stream in binary format) with data-type + Codebase attached and then passed Remote Object (RMI). Marshalling will transform the data-type into a predetermined naming convention so that it can be reconstructed with respect to the initial data-type.
So Serialization is a part of Marshalling.
CodeBase is information that tells the receiver of Object where the implementation of this object can be found. Any program that thinks it might ever pass an object to another program that may not have seen it before must set the codebase, so that the receiver can know where to download the code from, if it doesn't have the code available locally. The receiver will, upon deserializing the object, fetch the codebase from it and load the code from that location. (Copied from #Nasir answer)
Serialization is almost like a stupid memory-dump of the memory used by the object(s), while Marshalling stores information about custom data-types.
In a way, Serialization performs marshalling with implementation of pass-by-value because no information of data-type is passed, just the primitive form is passed to byte stream.
Serialization may have some issues related to big-endian, small-endian if the stream is going from one OS to another if the different OS have different means of representing the same data. On the other hand, marshalling is perfectly fine to migrate between OS because the result is a higher-level representation.
Marshaling refers to converting the signature and parameters of a function into a single byte array.
Specifically for the purpose of RPC.
Serialization more often refers to converting an entire object / object tree into a byte array
Marshaling will serialize object parameters in order to add them to the message and pass it across the network.
*Serialization can also be used for storage to disk.*
I think that the main difference is that Marshalling supposedly also involves the codebase. In other words, you would not be able to marshal and unmarshal an object into a state-equivalent instance of a different class.
Serialization just means that you can store the object and reobtain an equivalent state, even if it is an instance of another class.
That being said, they are typically synonyms.
Marshalling is the rule to tell compiler how the data will be represented on another environment/system;
For example;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
as you can see two different string values represented as different value types.
Serialization will only convert object content, not representation (will stay same) and obey rules of serialization, (what to export or no). For example, private values will not be serialized, public values yes and object structure will stay same.
Here's more specific examples of both:
Serialization Example:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
typedef struct {
char value[11];
} SerializedInt32;
SerializedInt32 SerializeInt32(int32_t x)
{
SerializedInt32 result;
itoa(x, result.value, 10);
return result;
}
int32_t DeserializeInt32(SerializedInt32 x)
{
int32_t result;
result = atoi(x.value);
return result;
}
int main(int argc, char **argv)
{
int x;
SerializedInt32 data;
int32_t result;
x = -268435455;
data = SerializeInt32(x);
result = DeserializeInt32(data);
printf("x = %s.\n", data.value);
return result;
}
In serialization, data is flattened in a way that can be stored and unflattened later.
Marshalling Demo:
(MarshalDemoLib.cpp)
#include <iostream>
#include <string>
extern "C"
__declspec(dllexport)
void *StdCoutStdString(void *s)
{
std::string *str = (std::string *)s;
std::cout << *str;
}
extern "C"
__declspec(dllexport)
void *MarshalCStringToStdString(char *s)
{
std::string *str(new std::string(s));
std::cout << "string was successfully constructed.\n";
return str;
}
extern "C"
__declspec(dllexport)
void DestroyStdString(void *s)
{
std::string *str((std::string *)s);
delete str;
std::cout << "string was successfully destroyed.\n";
}
(MarshalDemo.c)
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(int argc, char **argv)
{
void *myStdString;
LoadLibrary("MarshalDemoLib");
myStdString = ((void *(*)(char *))GetProcAddress (
GetModuleHandleA("MarshalDemoLib"),
"MarshalCStringToStdString"
))("Hello, World!\n");
((void (*)(void *))GetProcAddress (
GetModuleHandleA("MarshalDemoLib"),
"StdCoutStdString"
))(myStdString);
((void (*)(void *))GetProcAddress (
GetModuleHandleA("MarshalDemoLib"),
"DestroyStdString"
))(myStdString);
}
In marshaling, data does not necessarily need to be flattened, but it needs to be transformed to another alternative representation. all casting is marshaling, but not all marshaling is casting.
Marshaling doesn't require dynamic allocation to be involved, it can also just be transformation between structs. For example, you might have a pair, but the function expects the pair's first and second elements to be other way around; you casting/memcpy one pair to another won't do the job because fst and snd will get flipped.
#include <stdio.h>
typedef struct {
int fst;
int snd;
} pair1;
typedef struct {
int snd;
int fst;
} pair2;
void pair2_dump(pair2 p)
{
printf("%d %d\n", p.fst, p.snd);
}
pair2 marshal_pair1_to_pair2(pair1 p)
{
pair2 result;
result.fst = p.fst;
result.snd = p.snd;
return result;
}
pair1 given = {3, 7};
int main(int argc, char **argv)
{
pair2_dump(marshal_pair1_to_pair2(given));
return 0;
}
The concept of marshaling becomes especially important when you start dealing with tagged unions of many types. For example, you might find it difficult to get a JavaScript engine to print a "c string" for you, but you can ask it to print a wrapped c string for you. Or if you want to print a string from JavaScript runtime in a Lua or Python runtime. They are all strings, but often won't get along without marshaling.
An annoyance I had recently was that JScript arrays marshal to C# as "__ComObject", and has no documented way to play with this object. I can find the address of where it is, but I really don't know anything else about it, so the only way to really figure it out is to poke at it in any way possible and hopefully find useful information about it. So it becomes easier to create a new object with a friendlier interface like Scripting.Dictionary, copy the data from the JScript array object into it, and pass that object to C# instead of JScript's default array.
(test.js)
var x = new ActiveXObject('Dmitry.YetAnotherTestObject.YetAnotherTestObject');
x.send([1, 2, 3, 4]);
(YetAnotherTestObject.cs)
using System;
using System.Runtime.InteropServices;
namespace Dmitry.YetAnotherTestObject
{
[Guid("C612BD9B-74E0-4176-AAB8-C53EB24C2B29"), ComVisible(true)]
public class YetAnotherTestObject
{
public void send(object x)
{
System.Console.WriteLine(x.GetType().Name);
}
}
}
above prints "__ComObject", which is somewhat of a black box from the point of view of C#.
Another interesting concept is that you might have the understanding how to write code, and a computer that knows how to execute instructions, so as a programmer, you are effectively marshaling the concept of what you want the computer to do from your brain to the program image. If we had good enough marshallers, we could just think of what we want to do/change, and the program would change that way without typing on the keyboard. So, if you could have a way to store all the physical changes in your brain for the few seconds where you really want to write a semicolon, you could marshal that data into a signal to print a semicolon, but that's an extreme.
Marshalling is usually between relatively closely associated processes; serialization does not necessarily have that expectation. So when marshalling data between processes, for example, you may wish to merely send a REFERENCE to potentially expensive data to recover, whereas with serialization, you would wish to save it all, to properly recreate the object(s) when deserialized.
My understanding of marshalling is different to the other answers.
Serialization:
To Produce or rehydrate a wire-format version of an object graph utilizing a convention.
Marshalling:
To Produce or rehydrate a wire-format version of an object graph by utilizing a mapping file, so that the results can be customized. The tool may start by adhering to a convention, but the important difference is the ability to customize results.
Contract First Development:
Marshalling is important within the context of contract first development.
Its possible to make changes to an internal object graph, while keeping the external interface stable over time. This way all of the service subscribers won't have to be modified for every trivial change.
Its possible to map the results across different languages. For example from the property name convention of one language ('property_name') to another ('propertyName').
Marshaling uses Serialization process actually but the major difference is that it in Serialization only data members and object itself get serialized not signatures but in Marshalling Object + code base(its implementation) will also get transformed into bytes.
Marshalling is the process to convert java object to xml objects using JAXB so that it can be used in web services.
Serialisation vs Marshalling
Problem: Object belongs to some process(VM) and it's lifetime is the same
Serialisation - transform object state into stream of bytes(JSON, XML...) for saving, sharing, transforming...
Marshalling - contains Serialisation + codebase. Usually it used by Remote procedure call(RPC) -> Java Remote Method Invocation(Java RMI) where you are able to invoke a object's method which is hosted on remote Java processes.
codebase - is a place or URL to class definition where it can be downloaded by ClassLoader. CLASSPATH[About] is as a local codebase
JVM -> Class Loader -> load class definition
java -Djava.rmi.server.codebase="<some_URL>" -jar <some.jar>
Very simple diagram for RMI
Serialisation - state
Marshalling - state + class definition
Official doc
Think of them as synonyms, both have a producer that sends stuff over to a consumer... In the end fields of instances are written into a byte stream and the other end foes the reverse ands up with the same instances.
NB - java RMI also contains support for transporting classes that are missing from the recipient...