How to get started with WCF performance profiling

I'm trying to figure out how to profile a WCF service so I can identify any bottlenecks.
I have found a bit of information online, but nothing that assumes no prior knowledge, which is where I'm at.
What are some recommended FREE tools?
- Visual Studio tools
- CLR Profiler
Here is information I found on using vsperfcmd.exe to profile a WCF service, and according to it the process is very simple, but I need to fill in the gaps on where to start. My assumption is that I copy VsPerfCLREnv and VsPerfCmd to the server that hosts my WCF service and perform some configuration steps that I'm not quite sure of. I'm also not quite sure how I would be able to see the call stack to evaluate the performance of each call.
CLR Profiler seems a bit simpler. I assume I would copy CLRProfiler.exe to the server, choose File -> Profile Service, and add the name and start/stop commands. (Is this a friendly name, a filename, or the service display name?) I assume I would then run my tests against the service and could see the call stack in CLR Profiler. Does that sound correct?
[edit]
I'm not so interested in testing the network, since this is on a test server. This is also a large WCF project with multiple devs on it, and I am unable to make changes to the project for the sole purpose of monitoring performance. I want to focus on the performance of the actual methods within it.
Any assistance on getting started is greatly appreciated.

For WCF it is not enough to profile your code alone, as a bunch of things happen on the channel stack (security, deserialization, formatting, etc.). A good way to visualise that is to enable WCF tracing at verbose level and then use the Service Trace Viewer to see how long each step of message processing is taking. Read up on how to configure and use WCF tracing. This is the single thing that has helped me most in diagnosing WCF issues.
Of course all the other approaches (code profiling, DB profiling, etc.) are valid as well. You may even use a tool like SoapUI to test your network communication and client-side performance overhead for a more end-to-end benchmark.

Some things I've learned that someone might find helpful:
You cannot remotely profile a service, even over your local network. The profiler must be running on the same machine as the service. (This actually took me quite a while to figure out. It may be obvious to you, but it was never spelled out, so I kept trying to do it.)
Visual Studio didn't work for me for profiling my WCF service. I got a bit of help from the VS profiler team, but never came out of it with a working solution.
VS was slow to connect and disconnect the profiler, and it often instrumented my binaries and left them in a corrupted state.
.NET binaries shouldn't need to be instrumented, since they already contain the metadata of the methods, which makes it odd that Visual Studio kept hosing my binaries trying to instrument them.
I also tried the VS standalone profiler, but it is very complex to use and requires reboots of my server.
I ended up getting an internal profiler to work (after getting a private build from the team), so I'm not sure how many profilers out there are designed to work with a WCF service.
I actually set the profiler to watch the WAS service and then added my additional binaries to the profiler.
Process Explorer is useful when troubleshooting whether the profiler is connected or not. Use it to look at the inetinfo.exe environment.

Can you run it under a debugger?
Can you stand a simple, old-fashioned method that just works? Here's one.

In addition to Mike's comments, you can use the built-in WCF performance counters to see a number of performance-related metrics, and you can also see call times in a WCF trace. Once you know which operations are "slow", it's usually easier to add some custom timing/logging code to those operations than to use a general-purpose profiler for something like this. This is coming from someone who used to work on commercial profilers.
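To make the custom-timing suggestion concrete, here is a minimal sketch using WCF's parameter-inspector extension point. OperationTimer and TimingBehavior are names of my own invention; IParameterInspector and IServiceBehavior are the standard WCF interfaces.

```csharp
// Minimal sketch: time every operation on a service host.
// OperationTimer/TimingBehavior are hypothetical names; the interfaces are WCF's.
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Linq;
using System.ServiceModel;
using System.ServiceModel.Channels;
using System.ServiceModel.Description;
using System.ServiceModel.Dispatcher;

class OperationTimer : IParameterInspector
{
    // Runs before the operation; whatever we return comes back in AfterCall.
    public object BeforeCall(string operationName, object[] inputs)
    {
        return Stopwatch.StartNew();
    }

    public void AfterCall(string operationName, object[] outputs,
                          object returnValue, object correlationState)
    {
        var watch = (Stopwatch)correlationState;
        watch.Stop();
        Trace.WriteLine(operationName + " took " + watch.ElapsedMilliseconds + " ms");
    }
}

class TimingBehavior : IServiceBehavior
{
    // Attach the timer to every dispatch operation on every endpoint.
    public void ApplyDispatchBehavior(ServiceDescription serviceDescription,
                                      ServiceHostBase serviceHostBase)
    {
        foreach (var dispatcher in serviceHostBase.ChannelDispatchers.OfType<ChannelDispatcher>())
            foreach (var endpoint in dispatcher.Endpoints)
                foreach (var operation in endpoint.DispatchRuntime.Operations)
                    operation.ParameterInspectors.Add(new OperationTimer());
    }

    public void AddBindingParameters(ServiceDescription serviceDescription,
        ServiceHostBase serviceHostBase, Collection<ServiceEndpoint> endpoints,
        BindingParameterCollection bindingParameters) { }

    public void Validate(ServiceDescription serviceDescription,
                         ServiceHostBase serviceHostBase) { }
}

// Usage: host.Description.Behaviors.Add(new TimingBehavior()); before host.Open().
```

Because the behavior is attached at the host, it times every operation without touching the service methods themselves, which fits the "can't change the project" constraint from the question.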

Tools you should look into: SvcTraceViewer (and turn on tracing in both your service and your clients), SoapUI for simulating load (and doing analysis), and Fiddler, an excellent HTTP sniffer/diagnostics tool.

Related

Best ESB/Message Queue for AppHarbor

I'm currently trying to find the best message queue solution for an AppHarbor application. Most of the ones I've looked at assume you have a Windows environment with MSMQ and DTC installed, which I don't believe the AppHarbor environment provides.
I would like something that works well with RavenDB, as that is the database we are using. Something whose only dependency is Raven would be ideal, especially if it integrates with our existing unit of work, i.e., when SaveChanges is called in our controller action, the messages are saved in the same transaction.
It would also need a host that works in a console application for background processing.
Ideally I would like something that "just works" in a development environment as well. With Raven, for example, we use the embedded mode while developing, and I would like something that doesn't require installation.
I've looked at NServiceBus, which seems to fail these conditions because it needs a transport (MSMQ, SQL, etc.), and much of its documentation is out of date.
I also looked at Rhino Service Bus, but there is a distinct lack of documentation and community. I'm also not sure whether it can depend entirely on RavenDB.
The others I looked at all seemed quite heavyweight and required installation and configuration to run in a development environment.
Edit: the other option is to implement our own.
First of all, congratulations on being the 1000th NServiceBus question on StackOverflow!
Second, if you were to use SQL for persisting your business data, then you could run NServiceBus on top of that same SQL where all the messages go through tables (instead of queues) and then you wouldn't need the DTC.
Third, if you did want to go with RavenDB as your transport for NServiceBus, you would have to implement the ISendMessages and IReceiveMessages interfaces on top of it, but I believe somebody in the community has already started working on that, so you could possibly join forces with them.
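For reference, this is roughly the shape of those seams as I remember them from the NServiceBus 3.x era; treat the namespaces and member signatures as assumptions rather than authoritative, and the RavenDB-backed classes as hypothetical stubs.

```csharp
// Approximate shape of the NServiceBus 3.x-era transport extension points,
// written from memory: signatures and namespaces may differ by version, and
// the RavenDB-backed classes are hypothetical stubs.
using NServiceBus;                  // Address, TransportMessage (location varies by version)
using NServiceBus.Unicast.Queuing;  // ISendMessages, IReceiveMessages

public class RavenDbMessageSender : ISendMessages
{
    public void Send(TransportMessage message, Address address)
    {
        // Store the message as a document in a "queue" collection keyed by address.
    }
}

public class RavenDbMessageReceiver : IReceiveMessages
{
    public void Init(Address address, bool transactional)
    {
        // Open the document store and remember which queue to poll.
    }

    public bool HasMessage()
    {
        // Query the queue collection for a pending document.
        return false;
    }

    public TransportMessage Receive()
    {
        // Load and delete the oldest pending document atomically.
        return null;
    }
}
```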
Finally, I wouldn't recommend writing your own ESB these days - not when there are so many good choices already out there. You mentioned the issues of community and documentation - those tend to be handled the worst when writing your own infrastructure.

Self-diagnostic test for software layers?

With the increasing effort put into dividing software pieces into independent layers and decoupling them using dynamic discovery and dependency injection, it has become harder to determine which "layer" in the system is contributing to a system-wide failure of the "application".
Unit tests help by making sure that all the "modules" that make up a layer work as expected. But unit tests are written in such a way that each "module" is isolated, using techniques such as stubbing and mocking.
Consider the simplistic example below:
L1. Database -> L2. Database Layer -> L3. Windows Service -> L4. Client Application
For instance, if the database engine is down, then the system will not operate normally. It will be hard to tell whether the database engine is really down or whether there is a bug in the database layer (L2) code. To check, you would have to launch some kind of database management tool to see if the database engine is running.
What we are trying to achieve is a developer tool that can be launched whenever "there is something wrong" with the system; the tool will "query" each layer for its "integrity" or "diagnostic" data. The tool will provide the list of software layers and their "integrity status". It will then be able to say, right off the bat, that layer X is the cause of the issue (e.g., the database engine is down).
Of course, each layer will be responsible for providing its own "diagnostic method" that can be queried by the tool.
I guess what we're trying to achieve here is some kind of "integration test" framework or something similar that can be used at run time (not at compile/build time like a unit test). The inspiration came from physical devices that have their own on-board diagnostics, like cars. A good example in the software world is the power-on self-test that runs every time the computer is turned on.
Has anyone seen or heard of something like this? Any suggestions or pointers will surely help a lot!
You could have a common interface that each layer would implement as a WCF service. This way you could connect to every layer and diagnose it. Having these diagnostics available might be useful, but if you want to implement it everywhere (every layer) it becomes another thing that may fail, and how would you diagnose that? Your system would swarm with WCF services, which is not good: it requires a lot of maintenance and makes the system less stable. Plus it requires a lot of work to implement.
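As a rough illustration, the shared contract could be as small as this (IDiagnostics and LayerHealth are hypothetical names):

```csharp
// Hypothetical shared diagnostics contract each layer would expose as a WCF service.
using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract]
public interface IDiagnostics
{
    [OperationContract]
    LayerHealth CheckHealth();
}

[DataContract]
public class LayerHealth
{
    [DataMember] public string LayerName { get; set; }
    [DataMember] public bool IsHealthy { get; set; }
    [DataMember] public string Detail { get; set; }  // e.g. "cannot reach database engine"
}
```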
The alternative I would suggest is to have a good logging system in place. The minimum would be to have every module log an error in every catch block, but I suggest more than that, especially for debugging purposes. I recommend log4net, which is free and very flexible. It is cheap when a message's level is below the configured threshold, meaning you can leave detailed logging in place and it does not impact performance, even in production code. You can change the logging level at runtime by changing a setting in a config file. I use log4net a lot and it works well.
Once you have your code logging things, you can configure log4net so that all logs go into a central database. You would then have one place where you could relatively easily diagnose what's going on, what has failed, where, and what the exception or message was. You can even set up email messages to be sent when something goes wrong.
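A minimal usage sketch, assuming the appenders (such as the AdoNetAppender feeding the central database) are declared in the application's config file:

```csharp
// Minimal log4net usage sketch; assumes an appender (e.g. the AdoNetAppender
// for a central database) is set up in the config file, and that
// log4net.Config.XmlConfigurator.Configure() runs at application startup.
using System;
using log4net;

public class DatabaseLayer
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(DatabaseLayer));

    public void Connect()
    {
        Log.Debug("Opening connection...");  // nearly free when the level is higher
        try
        {
            // ... open the connection ...
        }
        catch (Exception ex)
        {
            Log.Error("Could not reach the database engine", ex);
            throw;
        }
    }
}
```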

Thoughts and Experiences with Workflow Foundation 4

Now that Microsoft has revamped their workflow framework in Windows Workflow Foundation 4, what are your thoughts and experiences with this new framework?
I have been working with WF4 for a few months now, and I have run into a few pitfalls:
There's no way to enforce an interface with Workflow Services (.xamlx).
When an error occurs in the workflow, whether in communication, correlation, persistence, or some unhandled exception in the workflow itself, it's almost impossible to tell what went wrong, because the trace logs tell you nothing relevant. For example, I had an Entity Framework object as a workflow variable, and workflow persistence had trouble serializing it. Unfortunately, none of the errors in the trace files indicated that this was the problem. It took many hours of trial and error before I figured out what went wrong.
Some of the provided activities are insufficient. For example, I had to extend the Send activity to support dynamic endpoints. Unfortunately, I wasn't able to make it completely dynamic; for example, the interface name cannot be dynamic.
If a workflow gets too big, the designer becomes very slow. One workflow that's over 100 KB in size took more than a minute to load! And forget about debugging a workflow of that size.
No persistence provider for Oracle.
Despite the pitfalls, I'm very impressed with the persistence capabilities to the database, the ease of snapping activities together in the designer, and the ease of setting up WCF services as workflow services.
I'm curious about the experiences of the other developers using Workflow Foundation 4.
Edit:
I was able to solve the problem of the extremely slow designer for large workflows. It turned out that there were unresolvable Imports, which apparently cause the designer a lot of stress.
I posted on the MSDN forums about this issue.
Update
There are a slew of problems that we are facing with AppFabric, now that we're running in production. It is clear to me that AppFabric Workflow Services, as of now, are not ready for use. I would stay away from this until new versions are released.
I think you did a pretty good summary of the WF4 issues.
My main pain point is the inability to change the definition of in-process workflows. That is being fixed in the next version, but for now it's a big problem.
I also had difficulties with exceptions in workflows - mostly determining why they occurred, the source, and a description or message. I got better at this as I gained more experience, and if I began another workflow project I'd be able to debug it far more efficiently. It's just a different paradigm and so can't be approached the same way as straight code.
Another issue I had with WF 4.0 was unit testing with WorkflowInvoker; the specifics escape me, but mocking dependencies and parent/child workflows was a real headache.
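For what it's worth, the basic WorkflowInvoker pattern itself is simple; here is a hedged sketch with a hypothetical AddNumbers activity defined inline so the example is self-contained:

```csharp
// Minimal WorkflowInvoker test sketch. AddNumbers is a hypothetical activity;
// the assertion style is up to your test framework.
using System.Activities;
using System.Collections.Generic;

public sealed class AddNumbers : CodeActivity
{
    public InArgument<int> X { get; set; }
    public InArgument<int> Y { get; set; }
    public OutArgument<int> Sum { get; set; }

    protected override void Execute(CodeActivityContext context)
    {
        Sum.Set(context, X.Get(context) + Y.Get(context));
    }
}

public class AddNumbersTests
{
    public void Adds_Its_Inputs()
    {
        var inputs = new Dictionary<string, object> { { "X", 2 }, { "Y", 3 } };

        // Runs synchronously on the calling thread: no WorkflowApplication,
        // no persistence, which keeps unit tests fast and deterministic.
        IDictionary<string, object> outputs =
            WorkflowInvoker.Invoke(new AddNumbers(), inputs);

        System.Diagnostics.Debug.Assert((int)outputs["Sum"] == 5);
    }
}
```

The headaches start once the activity under test has dependencies or child workflows baked in, which is where the mocking trouble mentioned above comes from.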
Generally I really like WF 4.0, a massive improvement over 3.5. Running in debug mode can be very slow, debugging in the designer is more trouble than it's worth, but the framework is great and very usable.

Testing fault tolerant code

I'm currently working on a server application where we have agreed to try to maintain a certain level of service. The level of service we want to guarantee is: if a request is accepted by the server and the server sends an acknowledgement to the client, we want to guarantee that the request will happen, even if the server crashes. As requests can be long-running and the acknowledgement time needs to be short, we implement this by persisting the request, then sending an acknowledgement to the client, then carrying out the various actions to fulfil the request. As actions are carried out they too are persisted, so the server knows the state of a request on start-up, and there are also various reconciliation mechanisms with external systems to check the accuracy of our logs.
This all seems to work fairly well, but we have difficulty saying so with any conviction, as we find it very difficult to test our fault-tolerant code. So far we've come up with two strategies, but neither is entirely satisfactory:
Have an external process watch the server code and then try to kill it at what the external process thinks is an appropriate point in the test.
Add code to the application that will cause it to crash at certain known critical points.
My problem with the first strategy is that the external process cannot know the exact state of the application, so we cannot be sure we're hitting the most problematic points in the code. My problem with the second strategy, although it gives more control over where the fault takes place, is that I do not like having fault-injection code within my application, even with conditional compilation etc. I fear it would be too easy to overlook a fault-injection point and have it slip into a production environment.
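To make the protocol under test concrete for discussion, here is a hedged sketch of the accept path; every type and member name in it is a hypothetical stand-in:

```csharp
// Hedged sketch of the persist-then-acknowledge protocol described above.
// All names here are hypothetical stand-ins.
using System;
using System.Collections.Generic;

public interface IRequestStore
{
    void SaveRequest(Guid id, string payload);  // must be durable before the ack
    void MarkStepDone(Guid id, int step);       // checkpoint used for replay on restart
}

public class RequestHandler
{
    private readonly IRequestStore store;
    public RequestHandler(IRequestStore store) { this.store = store; }

    public void Accept(Guid id, string payload, Action sendAck, IList<Action> steps)
    {
        store.SaveRequest(id, payload);  // 1. persist the request
        sendAck();                       // 2. only now is the guarantee promised
        for (int i = 0; i < steps.Count; i++)
        {
            steps[i]();                  // 3. carry out each action
            store.MarkStepDone(id, i);   // 4. persist progress; a crash resumes here
        }
    }
}
```

The hard part, and the point of the question, is proving that a crash between any two of those numbered lines is recovered correctly.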
I think there are three ways to deal with this. First, I would suggest a comprehensive set of integration tests for these various pieces of code, using dependency injection or factory objects to produce broken actions during the tests.
Secondly, running the application with random kill -9s and disabling network interfaces may be a good way to test these things.
I would also suggest testing file system failure. How you would do that depends on your OS; on Solaris or FreeBSD I would create a ZFS file system in a file and then rm the file while the application is running.
If you are using database code, then I would suggest testing failure of the database as well.
Another alternative to dependency injection, and probably the solution I would use, is interceptors. You can enable crash-test interceptors in your code; these would know the state of the application and introduce the failures listed above (or any others you may want to create) at the correct time. It would not require changes to your existing code, just some additional code to wrap it, as in the sketch below.
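A hedged sketch of such an interceptor as a plain decorator; IAction and all other names are hypothetical, and in production the wrapper is simply never registered with the container:

```csharp
// Crash-test wrapper that knows when to die. In production the decorator is
// never registered, so no fault-injection code ships.
using System;

public interface IAction { void Execute(); }

public class CrashTestAction : IAction
{
    private readonly IAction inner;
    private readonly Func<bool> shouldCrashNow;  // encodes "the correct time"

    public CrashTestAction(IAction inner, Func<bool> shouldCrashNow)
    {
        this.inner = inner;
        this.shouldCrashNow = shouldCrashNow;
    }

    public void Execute()
    {
        if (shouldCrashNow())
            Environment.FailFast("injected crash");  // hard death, no cleanup runs
        inner.Execute();
    }
}
```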
A possible answer to the first point is to multiply the experiments with your external process so that the probability of hitting problematic parts of the code is increased. Then you can analyse the core dump file to determine where the code actually crashed.
Another way is to increase observability and/or controllability by stubbing library or kernel calls, i.e., without modifying your application code.
You can find some resources on the Fault Injection page of Wikipedia, in particular in the Software Implemented Fault Injection section.
Your concern about fault injection is not a fundamental one. You merely need a foolproof way to prevent such code from ending up in a deployment. One way to do so is to design your fault injector as a debugger, i.e., the faults are injected by a process external to your process. This already provides a level of isolation. Furthermore, most OSes provide some kind of access control which prevents debugging unless specifically enabled. In the most primitive form, it's limited to root; on other operating systems it requires a specific "debug privilege". Naturally, nobody will have that on production, and thus your fault injector cannot even run there.
Practically, the fault injector can set breakpoints at specific addresses, i.e., a function or even a line of code. You can then react to that, e.g., by terminating the process after a certain breakpoint has been hit three times.
I was just about to write the same as Justin :)
The component I would suggest replacing during testing is the logging component (if you have one; if not, I'd strongly suggest implementing one...). It's relatively easy to replace it with code that generates errors, and the logger usually gets enough information to know the current application state.
It also seems feasible to make sure that the testing code doesn't go into production. I would discourage conditional compilation, though, and would rather go with a configuration file to select the logging component.
Using "random" kills might help to detect errors but is not well suited to systematic testing because of its non-determinism. Therefore I wouldn't use it for automated tests.

WCF in the enterprise, any pointers from your experience?

Looking to hear from people who are using WCF in an enterprise environment.
What were the major hurdles with the roll out?
Performance issues?
Any and all tips appreciated!
Please provide some general statistics and server configs if you can!
WCF can be configuration hell. Be sure to familiarize yourself with its diagnostics and SvcTraceViewer, lest you get maddening, cryptic, useless exceptions. And watch out for the generated client's broken implementation of the disposable pattern.
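The usual workaround for the Dispose issue, sketched here with a hypothetical generated proxy called MyServiceClient:

```csharp
// The generated proxy's Dispose calls Close(), which can itself throw and
// leave the channel faulted, so don't rely on a using block.
// MyServiceClient/DoWork are hypothetical generated members.
using System;
using System.ServiceModel;

static class ServiceCaller
{
    public static void CallService()
    {
        var client = new MyServiceClient();
        try
        {
            client.DoWork();
            client.Close();    // graceful shutdown of the channel
        }
        catch (CommunicationException)
        {
            client.Abort();    // tear the channel down without throwing
        }
        catch (TimeoutException)
        {
            client.Abort();
        }
    }
}
```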
I've recently been hired by a company that previously handled its client/server communication with traditional ASP.NET web services, passing DataSets back and forth.
I rewrote the core so that there is now a Net.Tcp "connected" client... and everything is done through it. It was a week's worth of "in-production discoveries"... but well worth it.
The pain points we had to find out late in the game was:
1) The default throttling blocked the 11th user onward (it defaults to allowing only 10 concurrent sessions).
2) The default maxBufferSize was 65,536 bytes, so the first bitmap that needed to be downloaded crashed the server :)
3) Other default configurations (max concurrent connections, max concurrent calls, etc.); see the sketch after this list for raising these in code.
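A hedged sketch of raising those defaults in code rather than in config; MyService and IMyService are hypothetical, and the numbers are illustrative only:

```csharp
// Raising the defaults mentioned above programmatically.
// MyService/IMyService are hypothetical; pick limits for your own load.
using System;
using System.ServiceModel;
using System.ServiceModel.Description;

static class HostProgram
{
    static void Main()
    {
        var binding = new NetTcpBinding
        {
            MaxReceivedMessageSize = 10 * 1024 * 1024,  // default is 65,536 bytes
            MaxBufferSize = 10 * 1024 * 1024            // must match it for buffered transfers
        };

        var host = new ServiceHost(typeof(MyService));
        host.AddServiceEndpoint(typeof(IMyService), binding,
                                "net.tcp://localhost:9000/svc");

        host.Description.Behaviors.Add(new ServiceThrottlingBehavior
        {
            MaxConcurrentSessions = 200,  // the "11th user" limit: default was 10
            MaxConcurrentCalls = 64,
            MaxConcurrentInstances = 200
        });

        host.Open();
        Console.ReadLine();
        host.Close();
    }
}
```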
All in all, it was absolutely worth it... the app is a lot faster just from changing its infrastructure, and now that we have "connected" users, the server can send messages down to the clients.
Another beautiful gain is that, since we know 100% who is connected, we can actually enforce our licensing policy at the application level. Until now (and before I was hired), my company had to simply log usage and then, at the end of the month, bill clients extra for connecting too many times.
As already stated, it can be a configuration nightmare and the exceptions can be cryptic. You can enable tracing and use the trace log viewer to troubleshoot a problem in general, but it's definitely a shift of gears to troubleshoot a WCF service, especially once you've deployed it and you are experiencing problems before your code even executes.
For communication between components within my organization I ended up using [NetDataContract] on my services and proxies, which is generally recommended against (you can't integrate with platforms outside .NET, and to integrate you need the assembly that contains the contracts), though I found the performance to be stellar and my overall development time reduced by using it. For us it was the right solution.
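The usual implementation behind that approach (a commonly blogged pattern, not a built-in WCF attribute) swaps the serializer in a custom operation behavior:

```csharp
// Sketch of the pattern behind a [NetDataContract]-style attribute: replace
// the default serializer with NetDataContractSerializer, which embeds CLR
// type information, hence the shared-contract-assembly requirement mentioned
// above. Wiring the behavior onto operations is omitted here.
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;
using System.ServiceModel.Description;

public class NetDataContractOperationBehavior : DataContractSerializerOperationBehavior
{
    public NetDataContractOperationBehavior(OperationDescription operation)
        : base(operation) { }

    // There is a second overload taking XmlDictionaryString; a complete
    // implementation overrides that one the same way.
    public override XmlObjectSerializer CreateSerializer(
        Type type, string name, string ns, IList<Type> knownTypes)
    {
        return new NetDataContractSerializer(name, ns);
    }
}
```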
WCF is definitely great for enterprise stuff, as it is designed with scalability, extensibility, security, etc. in mind.
As maxidad said, it can be very hard though, as exceptions often tell you nearly nothing, and if you use security (obviously a must for enterprise scenarios) you have to deal with certificates, meaningless MessageSecurityExceptions, and so on.
Dealing with WCF services is definitely harder than with old ASMX services, but it's worth the effort once you're in.
Supplying server configs will not be useful to you, as they have to fit your scenario. Using the right bindings is very important, as are security and concurrency settings. There is no single way to go when using WCF; just think about your requirements. Do you need callbacks? Who are your users? What kind of security do you need?
However, WCF will definitely be the right technology for enterprise-scale applications.