Apache HTTPClient5 - How to Prevent Connection/Stream Refused - apache-httpclient-5.x

Problem Statement
Context
I'm a Software Engineer in Test running order permutations of Restaurant Menu Items to confirm that order placement succeeds w/ the POS
In short, this POSTs a JSON payload to an endpoint, which then validates the order w/ a POS to determine success/fail/other
The POS, and therefore Transactions per Second (TPS), may vary, but each Back End uses the same core handling
This can be as high as ~22,000 permutations per item, each of easily manageable JSON size, that need to be handled as quickly as possible
The Network can vary wildly depending upon the Restaurant and/or Region one is testing
E.g. some have a much higher latency than others
Therefore, the HTTPClient should be able to handle the same content & endpoint intelligently regardless of this
Direct Problem
I'm using Apache's HTTP Client 5 w/ PoolingAsyncClientConnectionManager to execute both the GET for the Menu contents, and the POST to check if the order succeeds
This works out of the box, but sometimes loses connections w/ Stream Refused, specifically:
org.apache.hc.core5.http2.H2StreamResetException: Stream refused
No individual tuning that I can find seems to work across all network contexts w/ variable latency
Following the stacktrace indicates that the stream had already closed, so the client needs a way to keep it open, or to avoid executing on an already-closed connection:
if (connState == ConnectionHandshake.GRACEFUL_SHUTDOWN) {
    throw new H2StreamResetException(H2Error.PROTOCOL_ERROR, "Stream refused");
}
Some Attempts to Fix Problem
Tried to use Search Engines to find answers but there are few hits for HTTPClient5
Tried to use official documentation but this is sparse
Changed max connections per route to a reduced number, shifted inactivity validations, and adjusted connection time to live
Where the inactivity checks may fix the POST, but stall the GET for some transactions
And tuning for one region/restaurant may work for one, then break for another, w/ only the Network as a variable
PoolingAsyncClientConnectionManagerBuilder builder = PoolingAsyncClientConnectionManagerBuilder
        .create()
        .setTlsStrategy(getTlsStrategy())
        .setMaxConnPerRoute(12)
        .setMaxConnTotal(12)
        .setValidateAfterInactivity(TimeValue.ofMilliseconds(1000))
        .setConnectionTimeToLive(TimeValue.ofMinutes(2))
        .build();
Shifting to a custom RequestConfig w/ different timeouts
private HttpClientContext getHttpClientContext() {
    RequestConfig requestConfig = RequestConfig.custom()
            .setConnectTimeout(Timeout.of(10, TimeUnit.SECONDS))
            .setResponseTimeout(Timeout.of(10, TimeUnit.SECONDS))
            .build();
    HttpClientContext httpContext = HttpClientContext.create();
    httpContext.setRequestConfig(requestConfig);
    return httpContext;
}
Initial Code Segments for Analysis
(In addition to the above segments w/ change attempts)
Wrapper handling to init and get response
public SimpleHttpResponse getFullResponse(String url, PoolingAsyncClientConnectionManager manager, SimpleHttpRequest req) {
    try (CloseableHttpAsyncClient httpclient = getHTTPClientInstance(manager)) {
        httpclient.start();
        CountDownLatch latch = new CountDownLatch(1);
        long startTime = System.currentTimeMillis();
        Future<SimpleHttpResponse> future = getHTTPResponse(url, httpclient, latch, startTime, req);
        latch.await();
        return future.get();
    } catch (IOException | InterruptedException | ExecutionException e) {
        e.printStackTrace();
        return new SimpleHttpResponse(999, CommonUtils.getExceptionAsMap(e).toString());
    }
}
With actual handler and probing code
private Future<SimpleHttpResponse> getHTTPResponse(String url, CloseableHttpAsyncClient httpclient, CountDownLatch latch, long startTime, SimpleHttpRequest req) {
    return httpclient.execute(req, getHttpContext(), new FutureCallback<SimpleHttpResponse>() {
        @Override
        public void completed(SimpleHttpResponse response) {
            latch.countDown();
            logger.info("[{}][{}ms] - {}", response.getCode(), getTotalTime(startTime), url);
        }

        @Override
        public void failed(Exception e) {
            latch.countDown();
            logger.error("[{}ms] - {} - {}", getTotalTime(startTime), url, e);
        }

        @Override
        public void cancelled() {
            latch.countDown();
            logger.error("[{}ms] - request cancelled for {}", getTotalTime(startTime), url);
        }
    });
}
Direct Question
Is there a way to configure the client such that it can handle these variances on its own, without explicitly modifying the configuration for each endpoint context?

Fixed w/ a Combination of the below to Assure the Connection is Live/Ready
(Or at least it is stable)
Forcing HTTP/1.1
CloseableHttpAsyncClient httpclient = HttpAsyncClients.custom()
        .setConnectionManager(manager)
        .setRetryStrategy(getRetryStrategy())
        .setVersionPolicy(HttpVersionPolicy.FORCE_HTTP_1)
        .setConnectionManagerShared(true)
        .build();
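For reference, getRetryStrategy() is referenced above but its body is not shown; here is a minimal sketch of what it might return, assuming the stock DefaultHttpRequestRetryStrategy is acceptable (the retry count and interval are illustrative, not taken from the original code):
import org.apache.hc.client5.http.HttpRequestRetryStrategy;
import org.apache.hc.client5.http.impl.DefaultHttpRequestRetryStrategy;
import org.apache.hc.core5.util.TimeValue;

private HttpRequestRetryStrategy getRetryStrategy() {
    // Retry up to 3 times with a 1 second interval between attempts.
    // Values are illustrative; tune them for your own latency profile.
    return new DefaultHttpRequestRetryStrategy(3, TimeValue.ofSeconds(1));
}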
Setting Effective Headers for POST
Specifically the close header
req.setHeader("Connection", "close, TE");
Note: the inactivity check helps, but refusals still sometimes occur w/o this header
Setting Inactivity Checks by Type
Set POSTs to validate immediately after inactivity
Note: Using 1000 for both caused a high drop rate for some systems
PoolingAsyncClientConnectionManagerBuilder
        .create()
        .setValidateAfterInactivity(TimeValue.ofMilliseconds(0))
Set GET to validate after 1s
PoolingAsyncClientConnectionManagerBuilder
        .create()
        .setValidateAfterInactivity(TimeValue.ofMilliseconds(1000))
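Putting these together, a minimal sketch of keeping one connection manager per request type, reusing the settings shown earlier (the helper method and field names here are illustrative, not from the original code):
import org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager;
import org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManagerBuilder;
import org.apache.hc.core5.util.TimeValue;

// One manager per request type, differing only in the inactivity check.
private PoolingAsyncClientConnectionManager buildManager(TimeValue validateAfterInactivity) {
    return PoolingAsyncClientConnectionManagerBuilder.create()
            .setTlsStrategy(getTlsStrategy())
            .setMaxConnPerRoute(12)
            .setMaxConnTotal(12)
            .setValidateAfterInactivity(validateAfterInactivity)
            .setConnectionTimeToLive(TimeValue.ofMinutes(2))
            .build();
}

// POSTs validate the pooled connection on every lease; GETs tolerate 1s of inactivity.
private final PoolingAsyncClientConnectionManager postManager = buildManager(TimeValue.ofMilliseconds(0));
private final PoolingAsyncClientConnectionManager getManager = buildManager(TimeValue.ofMilliseconds(1000));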
Given the Error Context
Tracing the connection problem through the stacktrace to AbstractH2StreamMultiplexer
Shows ConnectionHandshake.GRACEFUL_SHUTDOWN as triggering the stream refusal
if (connState == ConnectionHandshake.GRACEFUL_SHUTDOWN) {
    throw new H2StreamResetException(H2Error.PROTOCOL_ERROR, "Stream refused");
}
Which corresponds to
connState = streamMap.isEmpty() ? ConnectionHandshake.SHUTDOWN : ConnectionHandshake.GRACEFUL_SHUTDOWN;
Reasoning
If I'm understanding correctly:
The connections were being closed, intentionally or not
However, they were not being confirmed ready before executing again
Which caused it to fail because the stream was not viable
Therefore the fix works because (it seems)
Forcing HTTP/1.1 leaves a single protocol context to manage
Where HttpVersionPolicy NEGOTIATE/FORCE_HTTP_2 had greater or equivalent failures across the spectrum of regions/menus
And it assures that all connections are valid before use
And POSTs are always closed due to the Connection: close header, which is not allowed in HTTP/2
Therefore
GET is checked for validity w/ reasonable periodicity
POST is checked every time, and since it is forcibly closed, it is re-acquired before execution
Which leaves no room for unexpected closures
And it removes the possibility that the client was incorrectly switching to HTTP/2
Will accept this until a better answer comes along, as this is stable but sub-optimal.

Related

Handling bad messages using Kafka's Streams API

I have a basic stream processing flow which looks like
master topic -> my processing in a mapper/filter -> output topics
and I am wondering about the best way to handle "bad messages". This could potentially be things like messages that I can't deserialize properly, or perhaps the processing/filtering logic fails in some unexpected way (I have no external dependencies so there should be no transient errors of that sort).
I was considering wrapping all my processing/filtering code in a try catch and if an exception was raised then routing to an "error topic". Then I can study the message and modify it or fix my code as appropriate and then replay it on to master. If I let any exceptions propagate, the stream seems to get jammed and no more messages are picked up.
Is this approach considered best practice?
Is there a convenient Kafka streams way to handle this? I don't think there is a concept of a DLQ...
What are the alternative ways to stop Kafka jamming on a "bad message"?
What alternative error handling approaches are there?
For completeness here is my code (pseudo-ish):
class Document {
    // Fields
}

class AnalysedDocument {
    Document document;
    String rawValue;
    Exception exception;
    Analysis analysis;

    // All being well
    AnalysedDocument(Document document, Analysis analysis) {...}
    // Analysis failed
    AnalysedDocument(Document document, Exception exception) {...}
    // Deserialisation failed
    AnalysedDocument(String rawValue, Exception exception) {...}
}
KStreamBuilder builder = new KStreamBuilder();
KStream<String, AnalysedDocument> analysedDocumentStream = builder
    .stream(Serdes.String(), Serdes.String(), "master")
    .mapValues(new ValueMapper<String, AnalysedDocument>() {
        @Override
        public AnalysedDocument apply(String rawValue) {
            Document document;
            try {
                // Deserialise
                document = ...
            } catch (Exception e) {
                return new AnalysedDocument(rawValue, e);
            }
            try {
                // Perform analysis
                Analysis analysis = ...
                return new AnalysedDocument(document, analysis);
            } catch (Exception e) {
                return new AnalysedDocument(document, e);
            }
        }
    });
// Branch based on whether analysis mapping failed to produce errorStream and successStream
errorStream.to(Serdes.String(), customPojoSerde(), "error");
successStream.to(Serdes.String(), customPojoSerde(), "analysed");
KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();
Any help greatly appreciated.
Right now, Kafka Streams offers only limited error handling capabilities. There is work in progress to simplify this. For now, your overall approach seems to be a good way to go.
One comment about handling de/serialization errors: handling those errors manually requires you to do de/serialization "manually". This means you need to configure ByteArraySerdes for key and value for the input/output topic of your Streams app and add a map() that does the de/serialization (i.e., KStream<byte[],byte[]> -> map() -> KStream<keyType,valueType> -- or the other way round if you also want to catch serialization exceptions). Otherwise, you cannot try-catch deserialization exceptions.
With your current approach, you "only" validate that the given string represents a valid document -- but it could be the case that the message itself is corrupted and cannot be converted into a String in the source operator in the first place. Thus, you don't actually cover deserialization exceptions with your code. However, if you are sure a deserialization exception can never happen, your approach would be sufficient, too.
Update
This issue is tackled via KIP-161 and will be included in the next release, 1.0.0. It allows you to register a callback via the parameter default.deserialization.exception.handler. The handler will be invoked every time an exception occurs during deserialization and allows you to return a DeserializationResponse (CONTINUE -> drop the record and move on, or FAIL, which is the default).
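For example, a minimal configuration sketch (assuming Kafka 1.0+; the application id and bootstrap servers are placeholders) that uses the built-in LogAndContinueExceptionHandler to log and skip records that cannot be deserialized:
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
// Log and skip records that cannot be deserialized instead of failing the stream.
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
        LogAndContinueExceptionHandler.class);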
Update 2
With KIP-210 (will be part of Kafka 1.1) it is also possible to handle errors on the producer side, similar to the consumer part, by registering a ProductionExceptionHandler via the config default.production.exception.handler that can return CONTINUE.
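A minimal sketch of such a producer-side handler (the class name LogAndContinueProductionHandler is illustrative, not part of Kafka):
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

// Illustrative handler: log the failed record and keep the stream running.
public class LogAndContinueProductionHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(ProducerRecord<byte[], byte[]> record, Exception exception) {
        System.err.println("Failed to produce record to " + record.topic() + ": " + exception.getMessage());
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        // no-op
    }
}

// Registered via:
// props.put("default.production.exception.handler", LogAndContinueProductionHandler.class);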
Update Mar 23, 2018: Kafka 1.0 provides much better and easier handling for bad messages ("poison pills") via KIP-161 than what I described below. See default.deserialization.exception.handler in the Kafka 1.0 docs.
This could potentially be things like messages that I can't deserialize properly [...]
Ok, my answer here focuses on the (de)serialization issues as this might be the most tricky scenario to handle for most users.
[...] or perhaps the processing/filtering logic fails in some unexpected way (I have no external dependencies so there should be no transient errors of that sort).
The same thinking (for deserialization) can also be applied to failures in the processing logic. Here, most people tend to gravitate towards option 2 below (minus the deserialization part), but YMMV.
I was considering wrapping all my processing/filtering code in a try catch and if an exception was raised then routing to an "error topic". Then I can study the message and modify it or fix my code as appropriate and then replay it on to master. If I let any exceptions propagate, the stream seems to get jammed and no more messages are picked up.
Is this approach considered best practice?
Yes, at the moment this is the way to go. Essentially, the two most common patterns are (1) skipping corrupted messages or (2) sending corrupted records to a quarantine topic aka a dead letter queue.
Is there a convenient Kafka streams way to handle this? I don't think there is a concept of a DLQ...
Yes, there is a way to handle this, including the use of a dead letter queue. However, it's (at least IMHO) not that convenient yet. If you have any feedback on how the API should allow you to handle this -- e.g. via a new or updated method, a configuration setting ("if serialization/deserialization fails send the problematic record to THIS quarantine topic") -- please let us know. :-)
What are the alternative ways to stop Kafka jamming on a "bad message"?
What alternative error handling approaches are there?
See my examples below.
FWIW, the Kafka community is also discussing the addition of a new CLI tool that allows you to skip over corrupted messages. However, as a user of the Kafka Streams API, I think ideally you want to handle such scenarios directly in your code, and fallback to CLI utilities only as a last resort.
Here are some patterns for the Kafka Streams DSL to handle corrupted records/messages aka "poison pills". This is taken from http://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-messages
Option 1: Skip corrupted records with flatMap
This is arguably what most users would like to do.
We use flatMap because it allows you to output zero, one, or more output records per input record. In the case of a corrupted record we output nothing (zero records), thereby ignoring/skipping the corrupted record.
Benefit of this approach compared to the other ones listed here: We need to manually deserialize a record only once!
Drawback of this approach: flatMap "marks" the input stream for potential data re-partitioning, i.e. if you perform a key-based operation such as groupings (groupBy/groupByKey) or joins afterwards, your data will be re-partitioned behind the scenes. Since this might be a costly step we don't want that to happen unnecessarily. If you KNOW that the record keys are always valid OR that you don't need to operate on the keys (thus keeping them as "raw" keys in byte[] format), you can change from flatMap to flatMapValues, which will not result in data re-partitioning even if you join/group/aggregate the stream later.
Code example:
Serde<byte[]> bytesSerde = Serdes.ByteArray();
Serde<String> stringSerde = Serdes.String();
Serde<Long> longSerde = Serdes.Long();

// Input topic, which might contain corrupted messages
KStream<byte[], byte[]> input = builder.stream(bytesSerde, bytesSerde, inputTopic);

// Note how the returned stream is of type KStream<String, Long>,
// rather than KStream<byte[], byte[]>.
KStream<String, Long> doubled = input.flatMap(
    (k, v) -> {
        try {
            // Attempt deserialization
            String key = stringSerde.deserializer().deserialize(inputTopic, k);
            long value = longSerde.deserializer().deserialize(inputTopic, v);

            // Ok, the record is valid (not corrupted). Let's take the
            // opportunity to also process the record in some way so that
            // we haven't paid the deserialization cost just for "poison pill"
            // checking.
            return Collections.singletonList(KeyValue.pair(key, 2 * value));
        }
        catch (SerializationException e) {
            // log + ignore/skip the corrupted message
            System.err.println("Could not deserialize record: " + e.getMessage());
        }
        return Collections.emptyList();
    }
);
Option 2: dead letter queue with branch
Compared to option 1 (which ignores corrupted records) option 2 retains corrupted messages by filtering them out of the "main" input stream and writing them to a quarantine topic (think: dead letter queue). The drawback is that, for valid records, we must pay the manual deserialization cost twice.
KStream<byte[], byte[]> input = ...;

KStream<byte[], byte[]>[] partitioned = input.branch(
    (k, v) -> {
        boolean isValidRecord = false;
        try {
            stringSerde.deserializer().deserialize(inputTopic, k);
            longSerde.deserializer().deserialize(inputTopic, v);
            isValidRecord = true;
        }
        catch (SerializationException ignored) {}
        return isValidRecord;
    },
    (k, v) -> true
);

// partitioned[0] is the KStream<byte[], byte[]> that contains
// only valid records. partitioned[1] contains only corrupted
// records and thus acts as a "dead letter queue".
KStream<String, Long> doubled = partitioned[0].map(
    (key, value) -> KeyValue.pair(
        // Must deserialize a second time unfortunately.
        stringSerde.deserializer().deserialize(inputTopic, key),
        2 * longSerde.deserializer().deserialize(inputTopic, value)));

// Don't forget to actually write the dead letter queue back to Kafka!
partitioned[1].to(Serdes.ByteArray(), Serdes.ByteArray(), "quarantine-topic");
Option 3: Skip corrupted records with filter
I only mention this for completeness. This option looks like a mix of options 1 and 2, but is worse than either of them. Compared to option 1, you must pay the manual deserialization cost for valid records twice (bad!). Compared to option 2, you lose the ability to retain corrupted records in a dead letter queue.
KStream<byte[], byte[]> validRecordsOnly = input.filter(
    (k, v) -> {
        boolean isValidRecord = false;
        try {
            stringSerde.deserializer().deserialize(inputTopic, k);
            longSerde.deserializer().deserialize(inputTopic, v);
            isValidRecord = true;
        }
        catch (SerializationException e) {
            // log + ignore/skip the corrupted message
            System.err.println("Could not deserialize record: " + e.getMessage());
        }
        return isValidRecord;
    }
);

KStream<String, Long> doubled = validRecordsOnly.map(
    (key, value) -> KeyValue.pair(
        // Must deserialize a second time unfortunately.
        stringSerde.deserializer().deserialize(inputTopic, key),
        2 * longSerde.deserializer().deserialize(inputTopic, value)));
Any help greatly appreciated.
I hope I could help. If yes, I'd appreciate your feedback on how we could improve the Kafka Streams API to handle failures/exceptions in a better/more convenient way than today. :-)
For the processing logic you could take this approach:
someKStream
    .mapValues(inputValue -> {
        // for each execution the below "return" could provide a different class than the previous run!
        // e.g. "return isFailedProcessing ? failValue : successValue;"
        // where failValue and successValue have no related classes
        return someObject; // someObject's class can vary at runtime depending on your business logic
    })  // here you'll have KStream<whateverKeyClass, Object> -> yes, Object for the value!
    // you could have a different logic for choosing
    // the target topic, below is just an example
    .to((k, v, recordContext) -> v instanceof failValueClass ?
            "dead-letter-topic" : "success-topic",
        // you could completely ignore the "Produced" part
        // and rely on spring-boot properties only, e.g.
        // spring.kafka.streams.properties.default.key.serde=yourKeySerde
        // spring.kafka.streams.properties.default.value.serde=org.springframework.kafka.support.serializer.JsonSerde
        Produced.with(yourKeySerde,
            // JsonSerde could be an instance configured as you need
            // (with type mappings or headers setting disabled, etc)
            new JsonSerde<>()));
Your classes, though different and landing in different topics, will serialize as expected.
When not using to(), but instead continuing with other processing, you could use branch() and split the logic based on the class of the Kafka value; the trick with branch() is to return KStream<keyClass, ?>[] so that the individual array items can then be cast to the appropriate class, as in the sketch below.
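A minimal sketch of that branch() idea, following the example above (FailValue, SuccessValue, and process() are illustrative stand-ins for your own classes and logic):
// Split on the runtime class of the value, then cast each branch.
KStream<String, ?>[] branches = someKStream
        .mapValues(inputValue -> (Object) process(inputValue))
        .branch(
                (k, v) -> v instanceof FailValue,   // branch 0: failed processing
                (k, v) -> true);                    // branch 1: everything else

@SuppressWarnings("unchecked")
KStream<String, FailValue> failures = (KStream<String, FailValue>) branches[0];
@SuppressWarnings("unchecked")
KStream<String, SuccessValue> successes = (KStream<String, SuccessValue>) branches[1];
// ...continue separate processing on each typed branch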
If you want to send an exception (custom exception) to another topic (ERROR_TOPIC_NAME):
@Bean
public KStream<String, ?> kafkaStreamInput(StreamsBuilder kStreamBuilder) {
    KStream<String, InputModel> input = kStreamBuilder.stream(INPUT_TOPIC_NAME);
    return service.messageHandler(input);
}

public KStream<String, ?> messageHandler(KStream<String, InputModel> inputTopic) {
    KStream<String, Object> output;
    output = inputTopic.mapValues(v -> {
        try {
            // return InputModel
            return normalMethod(v);
        } catch (Exception e) {
            // return ErrorModel
            return errorHandler(e);
        }
    });
    output.filter((k, v) -> (v instanceof ErrorModel)).to(KafkaStreamsConfig.ERROR_TOPIC_NAME);
    output.filter((k, v) -> (v instanceof InputModel)).to(KafkaStreamsConfig.OUTPUT_TOPIC_NAME);
    return output;
}
If you want to handle Kafka exceptions and skip it:
@Autowired
public ConsumerErrorHandler(
        KafkaProducer<String, ErrorModel> dlqProducer) {
    this.dlqProducer = dlqProducer;
}

@Bean
ConcurrentKafkaListenerContainerFactory<?, ?> kafkaListenerContainerFactory(
        ConcurrentKafkaListenerContainerFactoryConfigurer configurer,
        ObjectProvider<ConsumerFactory<Object, Object>> kafkaConsumerFactory) {
    ConcurrentKafkaListenerContainerFactory<Object, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    configurer.configure(factory, kafkaConsumerFactory.getIfAvailable());
    factory.setErrorHandler(((exception, data) -> {
        ErrorModel errorModel = ErrorModel.builder().message(exception.getMessage())
                .status("500").build();
        assert data != null;
        dlqProducer.send(new ProducerRecord<>(DLQ_TOPIC, data.key().toString(), errorModel));
    }));
    return factory;
}
All the above answers, although valid and useful, assume that your streams topology is stateless. For example, going back to the original example,
master topic -> my processing in a mapper/filter -> output topics
"my processing in a mapper/filter" should be stateless. I.e., not re-partitioning (aka writing to a persistent re-partition topic) or doing a toTable() (aka writing to a changelog topic). If the processing fails further down the topology and you commit the transaction (by following any of the 3 options mentioned above - flatMap, branch or filter), then you have to cater for manually or programmatically, eventually, deleting that inconsistent state. That would mean writing extra custom code to automate this.
I would personally expect Streams to also give you a LogAndSkip option for any unhandled runtime exception, not only for deserialization and production ones.
Has anyone any ideas on this?
I don't believe these examples work at all when working with Avro.
When the schema can't be resolved (i.e., there is a bad/non-Avro message corrupting the topic, for example) there is no key or value to deserialize in the first place, because by the time the DSL .branch() code is called, the exception has already been thrown (or handled).
Can anyone confirm if this is indeed the case? The very fluent approach you refer to here isn't possible when working with Avro?
KIP-161 does explain how to use a handler, however, it's much more fluent to see it as part of the topology.

Netty - `Future.operationComplete` never called when running in SSL mode in idle timeout handler

We have implemented the following channelIdle handler:
public void channelIdle(ChannelHandlerContext ctx, IdleStateEvent e)
        throws Exception {
    Response response = business.getResponse();
    final Channel channel = e.getChannel();
    ChannelFuture channelFuture
        = Channels.write(
            channel,
            ChannelBuffers.wrappedBuffer(response.getXML().getBytes())
        );
    if (response.shouldDisconnect()) { // returns true and listener _is_ added.
        channelFuture.addListener(new ChannelFutureListener() {
            @Override
            public void operationComplete(ChannelFuture future) throws Exception {
                channel.close(); // never gets called :(
            }
        });
    }
}
When running in non-SSL mode this works as expected.
However, when running with SSL enabled the operationComplete method never gets called. We've verified this a few times on various machines. The idle timeout happens many times but the operationComplete isn't called. We don't see any exceptions being thrown.
I've tried tracing through the code to see where operationComplete should get called but it is complex and I've not quite figured it out.
There is a call to future = succeededFuture(channel); in SslHandler.wrap() but I don't know if that means anything. The future returned from wrap is never used elsewhere in the SslHandler code.
This sounds like a bug. Would it be possible to write a simple "test-case" that shows the problem and open an issue at our GitHub issue tracker [1]?
Be sure to explain if it happens all the time or only sometimes, etc.
[1] https://github.com/netty/netty/issues

Memory leak using WCF GetCallbackChannel over named pipe

We have a simple WPF application that connects to a service running on the local machine. We use a named pipe for the connection and then register a callback so that later the service can send updates to the client.
The problem is that with each call of the callback we get a build up of memory in the client application.
This is how the client connects to the service.
const string url = "net.pipe://localhost/radal";
_channelFactory = new DuplexChannelFactory<IRadalService>(this, new NetNamedPipeBinding(), url);
and then in a threadpool thread we loop doing the following until we are connected
var service = _channelFactory.CreateChannel();
service.Register();
service.Register looks like this on the server side
public void Register()
{
    _callback = OperationContext.Current.GetCallbackChannel<IRadalCallback>();
    OperationContext.Current.Channel.Faulted += (sender, args) => Dispose();
    OperationContext.Current.Channel.Closed += (sender, args) => Dispose();
}
This callback is stored and when new data arrives we invoke the following on the server side.
void Sensors_OnSensorReading(object sender, SensorReadingEventArgs e)
{
    _callback.OnReadingReceived(e.SensorId, e.Count);
}
Where the parameters are an int and a double. On the client this is handled as follows.
public void OnReadingReceived(int sensorId, double count)
{
    _events.Publish(new SensorReadingEvent(sensorId, count));
}
But we have found that commenting out _events.Publish... makes no difference to the memory usage. Does anyone see any logical reason why this might be leaking memory? We have used a profiler to track the problem to this point but cannot find what type of object is building up.
Well, I can partially answer this now. The problem is partially caused by us trying to be clever: opening the connection on another thread and then passing it back to the main GUI thread. The solution was to not use a thread but instead use a dispatch timer. It does have the downside that the initial data load is now on the GUI thread, but we are not loading all that much anyway.
However, this was not the entire solution (actually we don't have an entire solution). Once we moved over to a better profiler we found out that the objects building up were timeout handlers, so we disabled that feature. That's OK for us as we are always running against localhost, but I can imagine it would be an issue for people working with remote services.

WCF Proxy Client taking time to create, any cache or singleton solution for it

We have more than a dozen WCF services being called using TCP binding. There are a lot of calls to the same WCF service at various places in the code.
AdminServiceClient client = FactoryS.AdminServiceClient(); // this takes significant time
client.GetSomeThing(param1);
client.Close();
I want to cache the client or produce it from a singleton, so that I can save some time. Is it possible?
Thx
Yes, this is possible. You can make the proxy object visible to the entire application, or wrap it in a singleton class for neatness (my preferred option). However, if you are going to reuse a proxy for a service, you will have to handle channel faults.
First create your singleton class / cache / global variable that holds an instance of the proxy (or proxies) that you want to reuse.
When you create the proxy, you need to subscribe to the Faulted event on the inner channel
proxyInstance.InnerChannel.Faulted += new EventHandler(ProxyFaulted);
and then put some reconnect code inside the ProxyFaulted event handler. The Faulted event will fire if the service drops, or the connection times out because it was idle. The faulted event will only fire if you have reliableSession enabled on your binding in the config file (if unspecified this defaults to enabled on the netTcpBinding).
Edit: If you don't want to keep your proxy channel open all the time, you will have to test the state of the channel before every time you use it, and recreate the proxy if it is faulted. Once the channel has faulted there is no option but to create a new one.
Edit2: The only real difference in load between keeping the channel open and closing it every time is a keep-alive packet being sent to the service and acknowledged every so often (which is what is behind your channel fault event). With 100 users I don't think this will be a problem.
The other option is to put your proxy creation inside a using block where it will be closed / disposed at the end of the block (which is considered bad practice). Closing the channel after a call may result in your application hanging because the service is not yet finished processing. In fact, even if your call to the service was async or the service contract for the method was one-way, the channel close code will block until the service is finished.
Here is a simple singleton class that should have the bare bones of what you need:
public static class SingletonProxy
{
    // Members must be static because the containing class is static.
    private static CupidClientServiceClient proxyInstance = null;
    // Dedicated lock object: locking on proxyInstance would fail while it is null.
    private static readonly object connectLock = new object();

    public static CupidClientServiceClient ProxyInstance
    {
        get
        {
            if (proxyInstance == null)
            {
                AttemptToConnect();
            }
            return proxyInstance;
        }
    }

    private static void ProxyChannelFaulted(object sender, EventArgs e)
    {
        bool connected = false;
        while (!connected)
        {
            // you may want to put timer code around this, or
            // other code to limit the number of retrys if
            // the connection keeps failing
            connected = AttemptToConnect();
        }
    }

    public static bool AttemptToConnect()
    {
        // this whole process needs to be thread safe
        lock (connectLock)
        {
            try
            {
                if (proxyInstance != null)
                {
                    // deregister the event handler from the old instance
                    proxyInstance.InnerChannel.Faulted -= new EventHandler(ProxyChannelFaulted);
                }
                // (re)create the instance
                proxyInstance = new CupidClientServiceClient();
                // always open the connection
                proxyInstance.Open();
                // add the event handler for the new instance
                // the Faulted handler needs to be attached here (after the Open)
                // because we don't want the new instance to start throwing the Faulted event
                // as soon as Open is called
                proxyInstance.InnerChannel.Faulted += new EventHandler(ProxyChannelFaulted);
                return true;
            }
            catch (EndpointNotFoundException)
            {
                // do something here (log, show user message etc.)
                return false;
            }
            catch (TimeoutException)
            {
                // do something here (log, show user message etc.)
                return false;
            }
        }
    }
}
I hope that helps :)
In my experience, creating/closing the channel on a per call basis adds very little overhead. Take a look at this Stackoverflow question. It's not a Singleton question per se, but related to your issue. Typically you don't want to leave the channel open once you're finished with it.
I would encourage you to use a reusable ChannelFactory implementation if you're not already and see if you still are having performance problems.

What is the proper life-cycle of a WCF service client proxy in Silverlight 3?

I'm finding mixed answers to my question out in the web. To elaborate on the question:
Should I instantiate a service client proxy once per asynchronous invocation, or once per Silverlight app?
Should I close the service client proxy explicitly (as I do in my ASP.NET MVC application calling WCF services synchronously)?
I've found plenty of bloggers and forum posters out contradicting each other. Can anyone point to any definitive sources or evidence to answer this once and for all?
I've been using Silverlight with WCF since V2 (working with V4 now), and here's what I've found. In general, it works very well to open one client and just use that one client for all communications. And if you're not using the DuplexHttpBinding, it also works fine to do just the opposite, to open a new connection each time and then close it when you're done. And because of how Microsoft has architected the WCF client in Silverlight, you're not going to see much performance difference between keeping one client open all the time vs. creating a new client with each request. (But if you're creating a new client with each request, make darned sure you're closing it as well.)
Now, if you're using the DuplexHttpBinding, i.e., if you want to call methods on the client from the server, it's of course important that you don't close the client with each request. That's just common sense. However, what none of the documentation tells you, but which I've found to be absolutely critical, is that if you're using the DuplexHttpBinding, you should only ever have one instance of the client open at once. Otherwise, you're going to run into all sorts of nasty timeout problems that are going to be really, really hard to troubleshoot. Your life will be dramatically easier if you just have one connection.
The way that I've enforced this in my own code is to run all my connections through a single static DataConnectionManager class that throws an Assert if I try to open a second connection before closing the first. A few snippets from that class:
private static int clientsOpen;
public static int ClientsOpen
{
    get
    {
        return clientsOpen;
    }
    set
    {
        clientsOpen = value;
        Debug.Assert(clientsOpen <= 1, "Bad things seem to happen when there's more than one open client.");
    }
}

public static RoomServiceClient GetRoomServiceClient()
{
    ClientsCreated++;
    ClientsOpen++;
    Logger.LogDebugMessage("Clients created: {0}; Clients open: {1}", ClientsCreated, ClientsOpen);
    return new RoomServiceClient(GetDuplexHttpBinding(), GetDuplexHttpEndpoint());
}
public static void TryClientClose(RoomServiceClient client, bool waitForPendingCalls, Action<Exception> callback)
{
    if (client != null && client.State != CommunicationState.Closed)
    {
        client.CloseCompleted += (sender, e) =>
        {
            ClientsClosed++;
            ClientsOpen--;
            Logger.LogDebugMessage("Clients closed: {0}; Clients open: {1}", ClientsClosed, ClientsOpen);
            if (e.Error != null)
            {
                Logger.LogDebugMessage(e.Error.Message);
                client.Abort();
            }
            closingIntentionally = false;
            if (callback != null)
            {
                callback(e.Error);
            }
        };
        closingIntentionally = true;
        if (waitForPendingCalls)
        {
            WaitForPendingCalls(() => client.CloseAsync());
        }
        else
        {
            client.CloseAsync();
        }
    }
    else
    {
        if (callback != null)
        {
            callback(null);
        }
    }
}
The annoying part, of course, is that if you only have one connection, you need to trap for when that connection closes unintentionally and try to reopen it. And then you need to reinitialize all the callbacks that your different classes were registered to handle. It's not really all that difficult, but it's annoying to make sure it's done right. And of course, automated testing of that part is difficult if not impossible.
You should open your client per call and close it immediately after. If you're in doubt, browse to an SVC file using IE and look at the example they have there.
WCF has configuration settings that tell it how long it should wait for a call to return; my thinking is that when the call does not complete in the allowed time, the AsyncClose will close it. Therefore, call client.AsyncClose().