Message with a large List of simple objects - serialization

I am considering the pros and cons of how to work correctly with Akka.Net.
I added the Akka tag, because the underlying OS probably doesn't matter.
Suppose I have a list of 10,000 to 100,000 objects of a relatively simple type.
Each object has a string, 2 integers and 10 doubles.
My estimate is that each object is 100 bytes.
So the complete list would be approximately 1 to 10 MB.
I would prefer to send the list in one message, but I am reading that this is wrong: messages in Akka should typically be small.
What is the correct approach in Akka?
Should I really send 10,000 to 100,000 messages of 100 bytes each?
Should I send messages of 100 objects each?

Database persistence is a separate issue and IMHO belongs in a question of its own.
As for the list of items you want to send, you can either batch them into small chunks or send them individually (the latter being the preferred option).
The Akka.NET blog has a good article with rationale on why messages should not be "fat": https://petabridge.com/blog/large-messages-and-sockets-in-akkadotnet/
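As a rough illustration of the chunked approach (this sketch is not from the linked article; the Item, ItemBatch and targetActor names are assumptions made up for the example), the list could be split into small batches and each batch sent as its own message:

```csharp
using System.Collections.Generic;
using System.Linq;
using Akka.Actor;

// Hypothetical payload type matching the question: a string, 2 ints, 10 doubles.
public sealed record Item(string Name, int A, int B, double[] Values);

// Hypothetical message type wrapping one chunk of the list.
public sealed record ItemBatch(IReadOnlyList<Item> Items);

public static class ListSender
{
    // Split the full list into chunks of e.g. 100 items and send each chunk
    // as its own small message instead of one multi-megabyte message.
    public static void SendInChunks(IActorRef targetActor, IReadOnlyList<Item> allItems, int chunkSize = 100)
    {
        for (var i = 0; i < allItems.Count; i += chunkSize)
        {
            var chunk = allItems.Skip(i).Take(chunkSize).ToList();
            targetActor.Tell(new ItemBatch(chunk));
        }
    }
}
```

The receiving actor then handles one small ItemBatch at a time (for example with Receive&lt;ItemBatch&gt;(...) in a ReceiveActor), which keeps each message well below the fat-message sizes the article warns about.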

Related

Break 1 API into multiple calls with Dell Boomi?

I am trying to call a system which does not have its API bulkified. So basically I have 1 parent record and, for example, 1000 child records. In order to send this info to the other system I am currently required to make 1000 API calls. Can we use the Dell Boomi middleware to do this for me?
In short, I call only one Dell Boomi API with all 1000 records, and Dell Boomi breaks this into 1000 such calls and sends them to the other system.
Is this scenario even possible? Any suggestion in the right direction would be helpful.
user2272821 makes the assumption that you already have Boomi...
A Dell Boomi process (program) treats your workflow like a flowchart.
The first step would read your parent record (or params) from a source (like a file).
The second step would use a connection object to access your 1000 records.
The third step would use a Data Process Shape to split your 'document' into 1000 bits.
The next step would be to process each 'bit' in some way - or many ways...
(There is built-in functionality to handle REST and SOAP calls.)
So, yes, it should be able to help you.
Note that once you have Boomi, you'll find that you can add filtering, divert records with different values down different paths, send rejected records to a log, and all sorts of other cool stuff.
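Boomi does all of this through configuration rather than code, but purely to illustrate the fan-out pattern those shapes implement, here is a hedged sketch of the equivalent logic in C# (the ChildRecord type and the target URL parameter are invented for the example):

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical child record; in Boomi the split and the per-record call are
// configured as shapes rather than written by hand.
public sealed record ChildRecord(string Id, string Payload);

public static class FanOutCaller
{
    private static readonly HttpClient Http = new HttpClient();

    // Take the single bulk payload (one parent plus ~1000 children) and make
    // one call per child record against the non-bulkified target API.
    public static async Task SendIndividuallyAsync(string targetUrl, IEnumerable<ChildRecord> children)
    {
        foreach (var child in children)
        {
            var json = JsonSerializer.Serialize(child);
            using var content = new StringContent(json, Encoding.UTF8, "application/json");
            var response = await Http.PostAsync(targetUrl, content);
            response.EnsureSuccessStatusCode();
        }
    }
}
```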

What is the relation between Serialization and streaming?

Whenever I find articles or videos talking about streams, are they necessarily talking about serialization?
What is the relation between the two? Or, to be specific:
Could we say that a data stream always needs serialization, or could we find some data stream without serialization?
Firstly, it is useful to have a reminder of serial vs parallel communication: if we take the simple example of transmitting a byte, in the parallel case all 8 bits are sent at the same time, and in the serial case the 8 bits are sent one by one and the byte is built again on the receiving side.
For your video domain example, if you imagine a frame of a video as being a large collection of bytes, let's say 720 by 1280 pixels with each pixel represented by a byte, then we need 921,600 bytes to represent the frame.
If you are streaming the video you need to send each frame (plus overhead which we'll ignore here for simplicity) from the server to the client device, hence you need to send the 921,600 bytes for each frame.
If you had a very (very!) large parallel connection that could transmit 921,600 bytes in parallel between the server and the client in a single communication, then this would be easy to understand.
However, this is almost always not the case, even for much smaller data structures, so serialisation is the name generally given to the process of taking the 921,600 bytes and breaking them down into the size which you can transmit - and that size is often one bit at a time.
Generally a video will be broken down into packets and the packets transmitted to the client. The packets themselves are just collections of bytes also and if the connection allows only a single bit of information to be transmitted at a time, then the packet needs to be broken down and sent 'serially' one bit at a time.
To complicate things, as is commonly the case in computer science and communications, the terms can mean different things in different contexts.
For example you may see it mentioned that you can either stream or 'serialise an object' in some client-server communication. What this generally means is that you can either send the raw data 'stream' and let the client be responsible for how to interpret it, or you can use a framework or underlying mechanism which will take an object, convert it into a format that can be transmitted serially, and then reconstruct it on the other end and give it to the client. In fact the actual communication is serial in both cases (if it is using a serial communication channel), so the terms are being used in a different way here.
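As a rough sketch of that second meaning (nothing here is from the answer above; the Frame type, the chunk size and the use of JSON are assumptions for illustration), serialising an object into bytes and then pushing those bytes through a stream in small pieces might look like this:

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical object to transmit.
public sealed record Frame(int Width, int Height, byte[] Pixels);

public static class StreamingDemo
{
    public static void Main()
    {
        var frame = new Frame(1280, 720, new byte[1280 * 720]);

        // Serialization: turn the object into a flat sequence of bytes.
        byte[] serialized = JsonSerializer.SerializeToUtf8Bytes(frame);

        // "Streaming": push those bytes through a stream in small pieces,
        // much as a serial connection would transmit them bit by bit.
        using var channel = new MemoryStream();
        const int pieceSize = 1024;
        for (var offset = 0; offset < serialized.Length; offset += pieceSize)
        {
            var count = Math.Min(pieceSize, serialized.Length - offset);
            channel.Write(serialized, offset, count);
        }

        // On the receiving side, the bytes are collected and deserialized
        // back into the original object.
        var received = JsonSerializer.Deserialize<Frame>(channel.ToArray());
        Console.WriteLine($"Reconstructed frame: {received!.Width}x{received.Height}");
    }
}
```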

What are the limits of messages, queues and exchanges?

What are the allowed types of messages (strings, bytes, integers, etc.)?
What is the maximum size of a message?
What is the maximum number of queues and exchanges?
Theoretically anything can be stored/sent as a message. You actually don't want to store anything on the queues. The system works most efficiently if the queues are empty most of the time. You can send anything you want to the queue with two preconditions:
The thing you are sending can be converted to and from a bytestring
The consumer knows exactly what it is getting and how to convert it to the original object
Strings are pretty easy; they have a built-in method for converting to and from bytes. If you know it is a string then you know how to convert it back. The best option is to use a text markup format like XML, JSON, or YAML. This way you can convert objects to strings and back again to the original objects; these formats work across programming languages, so your consumer can be written in a different language from your producer as long as it knows how to understand the object.
I work in Java. I want to send complex messages with sub objects in the fields. I use my own message object. The message object has two additional methods toBytes and fromBytes that convert to and from the bytestream. I use routing keys that leave no doubt as to what type of message the consumer is receiving. The message is Serializable. This works fine, but is limiting as I can only use it with other Java programs.
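For comparison, here is a minimal, hedged sketch (not taken from the answer above) of the language-neutral approach in C#, where the message body is just JSON bytes that a consumer written in any language could parse; the OrderMessage type is invented for the example:

```csharp
using System.Text.Json;

// Hypothetical message type; any fields work as long as producer and consumer agree on them.
public sealed record OrderMessage(string OrderId, int Quantity, double Price);

public static class MessageCodec
{
    // Producer side: convert the object to a bytestring for the broker.
    public static byte[] ToBytes(OrderMessage message) =>
        JsonSerializer.SerializeToUtf8Bytes(message);

    // Consumer side: the consumer knows (e.g. from the routing key) that the
    // bytes represent an OrderMessage and converts them back.
    public static OrderMessage? FromBytes(byte[] body) =>
        JsonSerializer.Deserialize<OrderMessage>(body);
}
```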
The size of the message is limited by the memory on the server and, if the message is persistent, also by the free HDD space. You probably do not want to send messages that are too big; it might be better to send a reference to a file or DB.
You might also want to read up on their performance measures:
http://www.rabbitmq.com/blog/2012/04/17/rabbitmq-performance-measurements-part-1/
http://www.rabbitmq.com/blog/2012/04/25/rabbitmq-performance-measurements-part-2/
Queues are pretty lightweight; you will most likely be limited by the number of connections you have. It will depend on the server most likely. Here is some info on a similar question:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-February/003042.html
What is the maximum size of a message?
It used to be 2 GiB before version 3.8.0:
%% Trying to send a term across a cluster larger than 2^31 bytes will
%% cause the VM to exit with "Absurdly large distribution output data
%% buffer". So we limit the max message size to 2^31 - 10^6 bytes (1MB
%% to allow plenty of leeway for the #basic_message{} and #content{}
%% wrapping the message body).
-define(MAX_MSG_SIZE, 2147383648).
Reference: https://github.com/rabbitmq/rabbitmq-common/blob/v3.7.21/include/rabbit.hrl#L279
It has been 512 MiB since version 3.8.0:
%% Max message size is hard limited to 512 MiB.
%% If user configures a greater rabbit.max_message_size,
%% this value is used instead.
-define(MAX_MSG_SIZE, 536870912).
Reference: https://github.com/rabbitmq/rabbitmq-common/blob/v3.8.0/include/rabbit.hrl#L238
1. See robthewolf's answer.
2. The max message size is 2 GB; however, performance tuning for messages of this size is not effective. Max Message Size
3. There is no hard limit imposed by the RabbitMQ server software on the number of queues; however, the hardware the server is running on may well impact this limit.
3a. There is no queue length limit imposed by the server by default. You can, however, limit this through server-side policy (configuration) or client-side policy. Max Queue Length
There is more information and links on a related post.
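For the queue-length limit mentioned in 3a, a hedged sketch of the client-side option using the classic (pre-7.x) RabbitMQ .NET client might look like the following; the queue name and the limit are made up, and a server-side policy (e.g. via rabbitmqctl set_policy) achieves the same effect without touching client code:

```csharp
using System.Collections.Generic;
using RabbitMQ.Client;

public static class QueueSetup
{
    // Declare a queue whose length is capped via the x-max-length argument;
    // once the limit is reached, messages at the head are dropped (or
    // dead-lettered if so configured). Names and the limit are illustrative.
    public static void DeclareBoundedQueue()
    {
        var factory = new ConnectionFactory { HostName = "localhost" };
        using var connection = factory.CreateConnection();
        using var channel = connection.CreateModel();

        var arguments = new Dictionary<string, object>
        {
            ["x-max-length"] = 10000
        };

        channel.QueueDeclare(
            queue: "work-items",
            durable: true,
            exclusive: false,
            autoDelete: false,
            arguments: arguments);
    }
}
```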

WCF best practices regarding MaxItemsInObjectGraph

I have run into the exception below a few times in the past and each time I just change the configuration to allow a bigger object graph.
"Maximum number of items that can be serialized or deserialized in an object graph is '65536'. Change the object graph or increase the MaxItemsInObjectGraph quota."
However, I was speaking to a colleague and he said that WCF should not be used to send large amounts of data; instead the data should be bite-sized.
So what is the general consensus about large amounts of data being returned?
In my experience using synchronous web service operations to transmit large data sets or files leads to many different problems.
Firstly, you have performance-related issues: serialization time at the service boundary. Then you have availability issues. Incoming requests can time out waiting for a response, or may be rejected because there is no dispatcher thread to service the request.
It is much better to delegate large data transfer and processing to some offline asynchronous process.
For example, in your situation, you send a request and the service returns a URI to the eventual resource you want. You may have to wait for the resource to become available, but you can code your consumer appropriately.
I haven't got any concrete examples but this article seems to point to WCF being used for large data sets, and I am aware of people using it for images.
Personally, I have always had to increase this property for any real world data.
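For reference, increasing the quota on the client side is usually done by adjusting the DataContractSerializerOperationBehavior for each operation on the endpoint; this is a general WCF pattern rather than something prescribed by the answers above, and the quota value is only an example:

```csharp
using System.ServiceModel.Description;

public static class QuotaSetup
{
    // Raise MaxItemsInObjectGraph for every operation on an endpoint
    // (pass in client.Endpoint from your generated proxy).
    public static void RaiseObjectGraphQuota(ServiceEndpoint endpoint, int maxItems = 1_000_000)
    {
        foreach (OperationDescription operation in endpoint.Contract.Operations)
        {
            var behavior = operation.Behaviors.Find<DataContractSerializerOperationBehavior>();
            if (behavior != null)
            {
                behavior.MaxItemsInObjectGraph = maxItems;
            }
        }
    }
}
```

The same quota can also be raised in configuration, which is what the error message itself suggests.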

Is this possible in WCF?

I have a WCF service which returns a list of many objects, e.g. 100,000.
I get an error when calling this function because the maximum size I am allowed to pass back from WCF has been exceeded.
Is there a built-in way I could return this in smaller chunks, e.g. 20,000 at a time?
I can increase the size allowed back from WCF, but I was wondering what the alternatives were.
Thanks
Without knowing your requirements, I'd take a look at two other possible options:
Paging: If your 100,000 objects are coming from a database, then use paging to reduce the amount of data and invoke the service in batches with a page number. If the objects are not coming from a database, then you'd need to look at how that data will be stored server-side during invocations.
Streaming: Return the data to the caller as a stream instead.
With the streaming option, you'd have to do some more work in terms of managing the serialization of the objects, but it would allow the client to 'pull' the objects from the service at its own pace. Streaming is supported in most, if not all, of the standard bindings (including HTTP).
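As a hedged sketch of the paging option (the contract and type names are invented for illustration and are not part of the answer above), the service contract might simply take a page number and page size:

```csharp
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel;

[DataContract]
public sealed class Widget
{
    [DataMember] public int Id { get; set; }
    [DataMember] public string Name { get; set; } = "";
}

[ServiceContract]
public interface IWidgetService
{
    // The client asks for one page at a time instead of all 100,000 objects,
    // so each response stays well under the configured message size limits.
    [OperationContract]
    List<Widget> GetWidgets(int pageNumber, int pageSize);
}
```

The client would then loop, calling GetWidgets(0, 20000), GetWidgets(1, 20000), and so on, until a call returns fewer items than the requested page size.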