WCF BufferManager ReturnBuffer vs. Clear - wcf

I am using BufferManager in my WCF service. I created my own class to wrap around the BufferManager which implements IDisposable. Right now my Dispose method looks like this:
public void Dispose()
{
this.bufferManager.Clear();
}
My question: does this accomplish the same thing as calling ReturnBuffer on all of the buffers that have been taken by the manager?
Just for some background: I am using the BufferManager in the following method:
public byte[] ReadAllBufferedBytes(string filePath)
{
using (var fileStream =
new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
byte[] buffer = this.bufferManager.TakeBuffer((int)fileStream.Length);
fileStream.Read(buffer, 0, buffer.Length);
return buffer;
}
}
The reason I am doing this is because I kept getting OutOfMemory exceptions which would tear down the service.
The host server has 3 GB of memory. The service is in InstanceContextMode.Single mode, so images are processed one at a time. The images are received as byte arrays - the biggest might be 100MB, but are typically much smaller - converted, and then returned as a byte array. A lot ends up on the Large Object Heap, and image sizes vary quite a bit.
I am wondering if the issue is heap fragmentation.
As each page of a document gets converted, it is appended to a temp file on disk. After the conversion, I read the entire converted file from disk into a byte array and return it to the client.
The standard File.ReadAllBytes method creates a new byte array when it reads from the file, which inevitably ends up on the LOH due to the image sizes I'm working with (I assume this is what happens). I created the ReadAllBufferedBytes method to do the same thing, but to buffer the byte array and let the BufferManager return the buffer when it is disposed.
Another question is: do I even need to do all this?

The BufferManager is normally used in scenarios where you must prevent GC pressure: where there are a lot of small byte[] allocations, such as when receiving or sending data at a very low level (e.g. sockets). The emphasis here is on a lot; otherwise the GC should be able to handle the memory allocations just fine.
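On the Clear vs. ReturnBuffer part of the question: the documentation describes Clear as releasing the buffers currently cached in the manager, which is not the same as returning buffers that callers still hold. The toy pool below is only a sketch of that distinction in Python. The class and method names (ToyBufferPool, take, give_back, clear) are made up for illustration and are not WCF's implementation.

```python
class ToyBufferPool:
    """Toy pool with hypothetical names; not WCF's BufferManager."""

    def __init__(self):
        self._cached = []       # buffers handed back to the pool, ready for reuse
        self.allocations = 0    # how many fresh buffers had to be created

    def take(self, size):
        # Reuse a cached buffer that is large enough, else allocate a new one.
        for i, buf in enumerate(self._cached):
            if len(buf) >= size:
                return self._cached.pop(i)
        self.allocations += 1
        return bytearray(size)

    def give_back(self, buf):
        # Only a buffer that was handed back becomes available for reuse.
        self._cached.append(buf)

    def clear(self):
        # Drops cached buffers; buffers still held by callers are unaffected.
        self._cached.clear()

    @property
    def cached_count(self):
        return len(self._cached)


pool = ToyBufferPool()
first = pool.take(1024)
pool.give_back(first)
second = pool.take(512)   # reuses the 1024-byte buffer, no new allocation
pool.clear()              # nothing is cached right now; `second` is untouched
print(pool.allocations, len(second), pool.cached_count)
```

Note that the reused buffer is larger than the 512 bytes requested. TakeBuffer is likewise documented as returning a buffer of at least the requested size, so code that relies on buffer.Length being exactly the file length may read or return extra bytes.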
To avoid loading the entire converted document file into memory, you should use the FileStream directly (without reading its entire content into a byte[]) in combination with the streamed (response) TransferMode, if possible.
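The idea behind streaming is language-independent: hand the consumer one bounded chunk at a time instead of one large array. A minimal Python sketch of that idea follows. It is not WCF code, and the function name iter_file_chunks and the 64 KB chunk size are assumptions chosen for the example.

```python
import os
import tempfile


def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file's bytes in pieces of at most chunk_size."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Demo: write a 200,000-byte temp file, then stream it back in chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 200_000)
    demo_path = tmp.name

sizes = [len(c) for c in iter_file_chunks(demo_path)]
os.remove(demo_path)
print(sizes)
```

At no point does the reader hold more than one chunk, so nothing close to the full file size has to be allocated as a single large object.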

Related

Monitor buffer size of Kotlin Flow

I have a complex chain of operators on a Kotlin Flow, and many of them are run in groups in different contexts using flowOn like this:
flowOf(1, 2, 3)
.map { /*do some stuff*/ }
.flowOn(context1)
.map { /*do some different stuff*/ }
.flowOn(context2)
According to documentation, each flowOn introduces a channel buffer with default size 64 (configurable).
In addition to this, I have a MutableSharedFlow with a fixed buffer size configured by the extraBufferCapacity parameter to which I'm emitting items.
I would like to monitor the current buffer sizes; however, the buffers are private properties and there seems to be no method to retrieve the buffer reference or its current size. Is there any way to retrieve it, or is it intended solely for internal Flow purposes?

How to convert IDataReader into Stream in C#

In my WCF service, I am retrieving data from SQL Server using the Command.ExecuteReader() method. The data is very large (1+ GB), and I am transferring it to the client over a netTcp binding.
I am planning to use streamed mode instead of buffered mode in WCF. Can anyone point me to an article or document on how to do this?
In simple words, my objective is to convert the IDataReader to a stream object that is transferred to the client; on the client side, I want to convert this stream back into a DataSet/DataTable or anything else that can be bound to a grid.
I cannot convert the IDataReader to an IEnumerable, as the data comes from a stored procedure and the number of columns in the output set keeps changing (I don't want to hard-code a limit on the number of columns).
Ultimately, the final communication from the WCF service to the client app will be done with a DataSet. A solution that converts the DataSet to a stream, sends it to the client, and converts the stream back to a DataSet on the client would also solve my problem.
You should not try to convert the IDataReader to a stream, but let your data access method return an IEnumerable of a type representing a single row of the query result like this:
public IEnumerable<Order> GetOrders()
{
IDbCommand cmd = ... <<build your command here>> ...
using(var rdr = cmd.ExecuteReader())
{
while(rdr.Read())
{
Order order = new Order {Id=rdr.GetDecimal(1), Name=rdr.GetString(2)};
yield return order;
}
}
}
Next you can serialize the result of this method to a stream (as shown by @Mohamed, for example). This way you can send a list of objects to the client without needing the complete result set to be loaded in memory. And you are still sure the data reader is disposed when the reader has reached the end of the result.
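The same row-at-a-time pattern can be sketched in Python with a generator, serializing each row as it arrives. This is only an illustration of the idea and not the C# code above: the in-memory SQLite table orders, the helper names iter_orders and write_json_lines, and the JSON-lines output format are all assumptions made for the example.

```python
import io
import json
import sqlite3


def iter_orders(conn):
    """Yield one row at a time; the cursor is closed when iteration ends."""
    cur = conn.execute("SELECT id, name FROM orders ORDER BY id")
    try:
        for order_id, name in cur:
            yield {"Id": order_id, "Name": name}
    finally:
        cur.close()


def write_json_lines(rows, stream):
    """Serialize each row as it arrives instead of building a full list."""
    count = 0
    for row in rows:
        stream.write(json.dumps(row) + "\n")
        count += 1
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "first"), (2, "second")])

out = io.StringIO()
written = write_json_lines(iter_orders(conn), out)
conn.close()
print(written, out.getvalue().splitlines()[0])
```

Only one row is materialized at a time, and the try/finally plays the role of the using block: the cursor is released once the consumer has finished iterating.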
You can convert anything to a stream like this:
var stream = new MemoryStream(Encoding.UTF8.GetBytes(Newtonsoft.Json.JsonConvert.SerializeObject(datareader)));
But it's not a best practice. You should create an array of objects from your datareader and then return that.

Unmarshalling parcelables from an InputStream

I have an InputStream (serialized bytes) containing marshalled Parcels with Parcelables inside them. I need to unmarshall and inflate the Parcel back up again so I can retrieve the Parcelable. How can I do this, when I don't know the length of the Parcelable? Is there a way to do this automatically just continuously reading from the bytestream?

WCF Function won't return stream

So I have a function which returns a pdf stream. The calling procedure gets the stream and writes it to disk (thanks ServiceGuy!). Normal VB code is as follows:
Public Function GetPDF() As System.IO.FileStream
GetPDF = File.OpenRead("C:\Documents and Settings\jspel903\Desktop\3211LD.pdf")
End Function
Now, for whatever reason, when I put that code in my WCF service (see below) and build/install it, it won't return anything. The calling procedure gets nothing back. My WCF service is hosted in a Windows service (I can't get it to debug or step into). So I'm wondering if, perhaps, WCF services won't return a FileStream?
Here's the IService:
<ServiceContract()> _
Public Interface IService1
<OperationContract()> _
Function GetPDF() As System.IO.FileStream
'Should return a .pdf file as a stream
End Interface
Here's the svc:
Imports System.IO
Public Class Service1
Implements IService1
Public Function GetPDF() As System.IO.FileStream Implements IService1.GetPDF
GetPDF = File.OpenRead("C:\Documents and Settings\jspel903\Desktop\3211LD.pdf")
End Function
End Class
Seems to me like it should work. Any ideas?
You probably want to convert your FileStream to a byte array and return that. WCF is generally limited to returning serializable types.
// fs is your FileStream
byte[] Data = new byte[fs.Length];
fs.Read(Data, 0, (int)fs.Length); // Read takes an int count; fs.Length is a long
It's not going to work... Think about what it would mean if it did. Your WCF client would have access to a file stream on your server device. It'd have to be able to perform all of the operations that you could do locally, over a remote connection.
WCF is a transport mechanism for data, it doesn't actually send object references to the server's objects. The client gets a copy of the object, that has been serialized by the server, then deserialized at the client side. If you could send a stream object, the file handle / memory reference etc wouldn't mean anything to the receiving client.
You need to read the data from the stream on the server and then convert it into an appropriate data object to transmit back to the client.
EDIT:
Apparently you can use streaming with WCF, but you're not going to be able to return a System.IO.FileStream; you can return a Stream.
Take a look at Large Data And Streaming for a description and some of the restrictions you'll need to be aware of if you take that approach.

What is the difference between Serialization and Marshaling?

I know that in terms of several distributed techniques (such as RPC), the term "Marshaling" is used but don't understand how it differs from Serialization. Aren't they both transforming objects into series of bits?
Related:
What is Serialization?
What is Object Marshalling?
Marshaling and serialization are loosely synonymous in the context of remote procedure call, but semantically different as a matter of intent.
In particular, marshaling is about getting parameters from here to there, while serialization is about copying structured data to or from a primitive form such as a byte stream. In this sense, serialization is one means to perform marshaling, usually implementing pass-by-value semantics.
It is also possible for an object to be marshaled by reference, in which case the data "on the wire" is simply location information for the original object. However, such an object may still be amenable to value serialization.
As @Bill mentions, there may be additional metadata such as code base location or even object implementation code.
Both do one thing in common - that is serializing an Object. Serialization is used to transfer objects or to store them. But:
Serialization: When you serialize an object, only the member data within that object is written to the byte stream, not the code that actually implements the object.
Marshalling: The term marshalling is used when we talk about passing an object to remote objects (RMI). In marshalling, the object is serialized (member data is serialized) and the codebase is attached.
So Serialization is a part of Marshalling.
CodeBase is information that tells the receiver of Object where the implementation of this object can be found. Any program that thinks it might ever pass an object to another program that may not have seen it before must set the codebase, so that the receiver can know where to download the code from, if it doesn't have the code available locally. The receiver will, upon deserializing the object, fetch the codebase from it and load the code from that location.
From the Marshalling (computer science) Wikipedia article:
The term "marshal" is considered to be synonymous with "serialize" in the Python standard library, but the terms are not synonymous in the Java-related RFC 2713:
To "marshal" an object means to record its state and codebase(s) in such a way that when the marshalled object is "unmarshalled", a copy of the original object is obtained, possibly by automatically loading the class definitions of the object. You can marshal any object that is serializable or remote. Marshalling is like serialization, except marshalling also records codebases. Marshalling is different from serialization in that marshalling treats remote objects specially. (RFC 2713)
To "serialize" an object means to convert its state into a byte stream in such a way that the byte stream can be converted back into a copy of the object.
So, marshalling also saves the codebase of an object in the byte stream in addition to its state.
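To make the RFC 2713 distinction concrete, here is a small Python sketch. It is an analogy and not Java RMI: the Point class, the JSON envelope layout, and the codebase URL are all hypothetical and were invented for the example.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def serialize(obj):
    """State only: the field values, nothing about where the class lives."""
    return json.dumps(vars(obj), sort_keys=True)


def marshal(obj, codebase):
    """State plus a class name and a codebase location (hypothetical envelope)."""
    return json.dumps(
        {"class": type(obj).__name__, "codebase": codebase, "state": vars(obj)},
        sort_keys=True,
    )


p = Point(3, 7)
state_only = serialize(p)
envelope = marshal(p, "http://example.invalid/classes/")
print(state_only)
print(envelope)
```

A receiver of state_only must already know which class to rebuild. A receiver of envelope is also told the class name and where its definition could be fetched from, which is the extra piece the RFC calls the codebase.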
Basics First
Byte Stream - Stream is a sequence of data. Input stream - reads data from source. Output stream - writes data to destination.
Java Byte Streams are used to perform input/output byte by byte (8 bits at a time). A byte stream is suitable for processing raw data like binary files.
Java Character Streams are used to perform input/output 2 bytes at a time, because Characters are stored using Unicode conventions in Java with 2 bytes for each character. Character stream is useful when we process (read/write) text files.
RMI (Remote Method Invocation) - an API that provides a mechanism to create distributed application in java. The RMI allows an object to invoke methods on an object running in another JVM.
Serialization and marshalling are loosely used as synonyms. Here are a few differences.
Serialization - Data members of an object are written in binary form to a byte stream (which can then be written to a file, memory, a database, etc.). No information about data types is retained once the object's data members are written in binary form.
Marshalling - The object is serialized (to a byte stream in binary format) with its data type and codebase attached, and then passed to a remote object (RMI). Marshalling transforms the data type into a predetermined naming convention so that it can be reconstructed with respect to the initial data type.
So Serialization is a part of Marshalling.
CodeBase is information that tells the receiver of Object where the implementation of this object can be found. Any program that thinks it might ever pass an object to another program that may not have seen it before must set the codebase, so that the receiver can know where to download the code from, if it doesn't have the code available locally. The receiver will, upon deserializing the object, fetch the codebase from it and load the code from that location. (Copied from @Nasir's answer)
Serialization is almost like a stupid memory-dump of the memory used by the object(s), while Marshalling stores information about custom data-types.
In a way, Serialization performs marshalling with implementation of pass-by-value because no information of data-type is passed, just the primitive form is passed to byte stream.
Serialization may run into big-endian/little-endian issues if the stream goes from one OS to another and the two represent the same data differently. On the other hand, marshalling is perfectly fine to migrate between OSes because the result is a higher-level representation.
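The byte-order problem with raw binary dumps is easy to reproduce. The sketch below uses Python's struct module with an explicit byte order in the format string. It only illustrates the endianness point and does not model any particular serialization framework.

```python
import struct

value = 1
little = struct.pack("<i", value)   # least significant byte first
big = struct.pack(">i", value)      # most significant byte first

# Reading little-endian bytes as if they were big-endian gives a different number.
misread = struct.unpack(">i", little)[0]

# Fixing the byte order in the format makes the round trip host-independent.
roundtrip = struct.unpack(">i", big)[0]
print(little.hex(), big.hex(), misread, roundtrip)
```

The same four bytes decode to 1 or to 16777216 depending on the assumed order, which is why portable wire formats pin the byte order (or use a text representation) instead of dumping memory as-is.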
Marshaling refers to converting the signature and parameters of a function into a single byte array.
Specifically for the purpose of RPC.
Serialization more often refers to converting an entire object / object tree into a byte array
Marshaling will serialize object parameters in order to add them to the message and pass it across the network.
*Serialization can also be used for storage to disk.*
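As a rough sketch of that description, the Python snippet below packs a method name and its arguments into one byte string and unpacks it on the other side. The JSON message layout and the names marshal_call, dispatch and handlers are assumptions for illustration; real RPC systems define their own wire formats.

```python
import json


def marshal_call(method, *args):
    """Pack a method name and its arguments into a single byte string."""
    return json.dumps({"method": method, "args": list(args)}).encode("utf-8")


def dispatch(message, handlers):
    """Unpack the byte string and invoke the named handler with the arguments."""
    call = json.loads(message.decode("utf-8"))
    return handlers[call["method"]](*call["args"])


handlers = {"add": lambda a, b: a + b}
wire_bytes = marshal_call("add", 2, 3)
result = dispatch(wire_bytes, handlers)
print(type(wire_bytes).__name__, result)
```

Here the arguments are serialized as part of marshaling the call, which matches the view that serialization is one step inside marshaling.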
I think that the main difference is that Marshalling supposedly also involves the codebase. In other words, you would not be able to marshal and unmarshal an object into a state-equivalent instance of a different class.
Serialization just means that you can store the object and reobtain an equivalent state, even if it is an instance of another class.
That being said, they are typically synonyms.
Marshalling is the rule that tells the compiler how the data will be represented in another environment/system.
For example:
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
public string cFileName;
[MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
public string cAlternateFileName;
As you can see, two different string values are represented as different value types.
Serialization only converts the object's content, not its representation (which stays the same), and obeys the rules of serialization (what to export and what not). For example, private values will not be serialized, public values will, and the object structure stays the same.
Here are more specific examples of both:
Serialization Example:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
typedef struct {
    char value[12]; /* "-2147483648" is 11 characters plus the terminating NUL */
} SerializedInt32;
SerializedInt32 SerializeInt32(int32_t x)
{
    SerializedInt32 result;
    /* snprintf is standard C; itoa is a non-standard extension */
    snprintf(result.value, sizeof result.value, "%" PRId32, x);
    return result;
}
int32_t DeserializeInt32(SerializedInt32 x)
{
    return (int32_t)strtol(x.value, NULL, 10);
}
int main(int argc, char **argv)
{
    int32_t x;
    SerializedInt32 data;
    int32_t result;
    x = -268435455;
    data = SerializeInt32(x);
    result = DeserializeInt32(data);
    printf("serialized = %s, deserialized = %" PRId32 ".\n", data.value, result);
    return 0;
}
In serialization, data is flattened in a way that can be stored and unflattened later.
Marshalling Demo:
(MarshalDemoLib.cpp)
#include <iostream>
#include <string>
extern "C"
__declspec(dllexport)
void StdCoutStdString(void *s)
{
std::string *str = (std::string *)s;
std::cout << *str;
}
extern "C"
__declspec(dllexport)
void *MarshalCStringToStdString(char *s)
{
std::string *str(new std::string(s));
std::cout << "string was successfully constructed.\n";
return str;
}
extern "C"
__declspec(dllexport)
void DestroyStdString(void *s)
{
std::string *str((std::string *)s);
delete str;
std::cout << "string was successfully destroyed.\n";
}
(MarshalDemo.c)
#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
int main(int argc, char **argv)
{
void *myStdString;
LoadLibrary("MarshalDemoLib");
myStdString = ((void *(*)(char *))GetProcAddress (
GetModuleHandleA("MarshalDemoLib"),
"MarshalCStringToStdString"
))("Hello, World!\n");
((void (*)(void *))GetProcAddress (
GetModuleHandleA("MarshalDemoLib"),
"StdCoutStdString"
))(myStdString);
((void (*)(void *))GetProcAddress (
GetModuleHandleA("MarshalDemoLib"),
"DestroyStdString"
))(myStdString);
}
In marshaling, data does not necessarily need to be flattened, but it needs to be transformed into an alternative representation. All casting is marshaling, but not all marshaling is casting.
Marshaling doesn't require dynamic allocation; it can also just be a transformation between structs. For example, you might have a pair, but the function expects the pair's first and second elements to be the other way around. Casting or memcpy-ing one pair to the other won't do the job, because fst and snd will get flipped.
#include <stdio.h>
typedef struct {
int fst;
int snd;
} pair1;
typedef struct {
int snd;
int fst;
} pair2;
void pair2_dump(pair2 p)
{
printf("%d %d\n", p.fst, p.snd);
}
pair2 marshal_pair1_to_pair2(pair1 p)
{
pair2 result;
result.fst = p.fst;
result.snd = p.snd;
return result;
}
pair1 given = {3, 7};
int main(int argc, char **argv)
{
pair2_dump(marshal_pair1_to_pair2(given));
return 0;
}
The concept of marshaling becomes especially important when you start dealing with tagged unions of many types. For example, you might find it difficult to get a JavaScript engine to print a "c string" for you, but you can ask it to print a wrapped c string. The same applies if you want to print a string from a JavaScript runtime in a Lua or Python runtime. They are all strings, but often won't get along without marshaling.
An annoyance I had recently was that JScript arrays marshal to C# as "__ComObject", and there is no documented way to work with this object. I can find the address of where it is, but I really don't know anything else about it, so the only way to figure it out is to poke at it in any way possible and hope to find useful information about it. So it becomes easier to create a new object with a friendlier interface like Scripting.Dictionary, copy the data from the JScript array object into it, and pass that object to C# instead of JScript's default array.
(test.js)
var x = new ActiveXObject('Dmitry.YetAnotherTestObject.YetAnotherTestObject');
x.send([1, 2, 3, 4]);
(YetAnotherTestObject.cs)
using System;
using System.Runtime.InteropServices;
namespace Dmitry.YetAnotherTestObject
{
[Guid("C612BD9B-74E0-4176-AAB8-C53EB24C2B29"), ComVisible(true)]
public class YetAnotherTestObject
{
public void send(object x)
{
System.Console.WriteLine(x.GetType().Name);
}
}
}
The above prints "__ComObject", which is somewhat of a black box from the point of view of C#.
Another interesting concept is that you might have the understanding how to write code, and a computer that knows how to execute instructions, so as a programmer, you are effectively marshaling the concept of what you want the computer to do from your brain to the program image. If we had good enough marshallers, we could just think of what we want to do/change, and the program would change that way without typing on the keyboard. So, if you could have a way to store all the physical changes in your brain for the few seconds where you really want to write a semicolon, you could marshal that data into a signal to print a semicolon, but that's an extreme.
Marshalling is usually between relatively closely associated processes; serialization does not necessarily have that expectation. So when marshalling data between processes, for example, you may wish to merely send a REFERENCE to potentially expensive data to recover, whereas with serialization, you would wish to save it all, to properly recreate the object(s) when deserialized.
My understanding of marshalling is different to the other answers.
Serialization:
To Produce or rehydrate a wire-format version of an object graph utilizing a convention.
Marshalling:
To Produce or rehydrate a wire-format version of an object graph by utilizing a mapping file, so that the results can be customized. The tool may start by adhering to a convention, but the important difference is the ability to customize results.
Contract First Development:
Marshalling is important within the context of contract first development.
It's possible to make changes to an internal object graph while keeping the external interface stable over time. This way, the service subscribers won't all have to be modified for every trivial change.
It's possible to map the results across different languages. For example, from the property name convention of one language ('property_name') to another ('propertyName').
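A minimal Python sketch of such a mapping step follows. The function names snake_to_camel and marshal_fields and the overrides dictionary (standing in for a mapping file) are hypothetical, chosen to show a convention that can be customized per field.

```python
def snake_to_camel(name):
    """Convert 'property_name' to 'propertyName'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def marshal_fields(internal, overrides=None):
    """Rename keys for the external contract; overrides win over the convention."""
    overrides = overrides or {}
    return {overrides.get(k, snake_to_camel(k)): v for k, v in internal.items()}


internal = {"property_name": 1, "legacy_id": 2}
external = marshal_fields(internal, overrides={"legacy_id": "id"})
print(external)
```

The internal names can change freely as long as the mapping keeps producing the same external names, which is what keeps the contract stable for subscribers.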
Marshalling actually uses the serialization process, but the major difference is that in serialization only the data members and the object itself get serialized, not signatures, whereas in marshalling the object plus its codebase (its implementation) is transformed into bytes.
Marshalling is the process of converting Java objects to XML using JAXB so that they can be used in web services.
Serialisation vs Marshalling
Problem: an object belongs to some process (VM), and its lifetime is the same as that process's.
Serialisation - transforms object state into a stream of bytes (JSON, XML...) for saving, sharing, transforming...
Marshalling - contains serialisation + codebase. It is usually used by Remote Procedure Call (RPC) -> Java Remote Method Invocation (Java RMI), where you are able to invoke a method of an object that is hosted in a remote Java process.
codebase - a place or URL to a class definition from which it can be downloaded by a ClassLoader. CLASSPATH acts as a local codebase.
JVM -> Class Loader -> load class definition
java -Djava.rmi.server.codebase="<some_URL>" -jar <some.jar>
Very simple diagram for RMI
Serialisation - state
Marshalling - state + class definition
Think of them as synonyms: both have a producer that sends stuff over to a consumer... In the end, fields of instances are written into a byte stream, and the other end does the reverse and ends up with the same instances.
NB - Java RMI also contains support for transporting classes that are missing from the recipient...