Monitor Data Transfer on Windows Azure? - wcf

How can I monitor traffic going out of a WCF service (self-hosted) on Windows Azure? The amount of data going into my stress-test app doesn't seem to add up to what I'm seeing on the pricing page (which doesn't seem to be updated live anyway). The service uses HTTPS and the messages are pretty small. Is the SSL handshake traffic negligible? I also have a data-miner worker role that continuously downloads data from the internet, but from what I've read, inbound traffic is free, so it shouldn't count toward the OUT traffic.
How can I get a reliable traffic monitor?

The billing page is usually updated once a day (once in a 24-hour period), so you have to wait a while before the results of your stress test show up on the billing page for your account.
One place where you can monitor this (among other KPIs for your application) is the MONITOR tab in the Management Portal. Navigate to the Cloud Service under test, click the MONITOR menu item, click Add Metric at the bottom, and choose Network Out. This monitoring dashboard collects data every 5 minutes, so it should reflect the network usage you are talking about.
Here is a screenshot of how to achieve this:
Another option is to use a network performance counter such as Network Interface: Bytes Sent/sec. You have to configure Windows Azure Diagnostics to collect that specific performance counter. You can then set a scheduled transfer period of 1 minute and dig into the table the diagnostics agent creates for the data.
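Because Bytes Sent/sec is a rate, each sample read from the diagnostics table has to be multiplied by its sampling interval to estimate the total bytes sent. The sketch below assumes each sample represents the average rate over one sample interval; the 60-second interval and the sample values are illustrative assumptions, not Azure defaults. The counter also sees everything that leaves the network interface, so it may include traffic that is not billed.

```python
# Sketch: turn periodic "Bytes Sent/sec" samples into an estimate of total outbound bytes.
# Assumption: each sample is the average rate over one fixed sampling interval.


def total_bytes_sent(samples_bytes_per_sec, interval_seconds=60):
    """Approximate total bytes by treating each sample as constant over its interval."""
    return sum(rate * interval_seconds for rate in samples_bytes_per_sec)


def to_gigabytes(num_bytes):
    """Convert bytes to GB (10**9 bytes); check which unit your invoice actually uses."""
    return num_bytes / 1_000_000_000


if __name__ == "__main__":
    # Hypothetical samples, one per minute, in bytes/second.
    samples = [12_000, 15_500, 9_800, 20_000]
    sent = total_bytes_sent(samples, interval_seconds=60)
    print(f"{sent} bytes ~= {to_gigabytes(sent):.6f} GB")
```

Summing these estimates over a day gives a figure you can compare with the billing page once it refreshes.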
P.S. And yes, you are correct - INBOUND data for Azure is FREE.

Related

How to monitor NServiceBus queue length

We use NServiceBus for a few applications and monitor endpoint heartbeats and failed messages through ServicePulse.
Most of the time messages are processed within minutes, but occasionally there is a spike in traffic and clients will ask if there is a problem. I would like to know the length of an endpoint queue so that I can respond and provide estimates.
We use SQL as the transport layer and subscription store. I cannot view the database remotely.
What is the best approach to surface this data?
I could expose an SSRS report on top of the database, add code to ServiceControl and ServicePulse since they are both open source, or add a custom check through ServicePulse...
How about running a job (at a configured interval on the SQL Server) against the queue tables that writes the number of messages to a table you can query?
You can then use this table to run your monitoring tool and generate alerts, or indeed write a custom check so you get alerts in ServicePulse...
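The job described above can be sketched as follows. This is a minimal illustration, assuming each queue is a table whose row count equals the number of waiting messages; the table names `Sales` and `Billing`, the `QueueLengthHistory` table, and the helper functions are hypothetical. On SQL Server this would typically be a scheduled T-SQL job; Python with an in-memory `sqlite3` database is only a stand-in so the logic can run anywhere. The drain estimate assumes you already know a typical processing rate for the endpoint.

```python
import sqlite3
import time


def snapshot_queue_lengths(conn, queue_tables, now=None):
    """Count rows in each queue table and append the counts to a history table."""
    now = time.time() if now is None else now
    conn.execute(
        "CREATE TABLE IF NOT EXISTS QueueLengthHistory "
        "(QueueName TEXT, Length INTEGER, CapturedAt REAL)"
    )
    lengths = {}
    for table in queue_tables:
        # Table names cannot be bound as parameters; only pass known, trusted queue names.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
        lengths[table] = count
        conn.execute(
            "INSERT INTO QueueLengthHistory (QueueName, Length, CapturedAt) VALUES (?, ?, ?)",
            (table, count, now),
        )
    conn.commit()
    return lengths


def estimated_drain_seconds(queue_length, messages_per_second):
    """Rough time to empty a queue at the observed processing rate (no new arrivals)."""
    if messages_per_second <= 0:
        return float("inf")
    return queue_length / messages_per_second


def make_demo_connection():
    """In-memory stand-in with two hypothetical queue tables holding 3 and 1 messages."""
    conn = sqlite3.connect(":memory:")
    for table, rows in (("Sales", 3), ("Billing", 1)):
        conn.execute(f'CREATE TABLE "{table}" (Id INTEGER PRIMARY KEY, Body TEXT)')
        conn.executemany(f'INSERT INTO "{table}" (Body) VALUES (?)', [("msg",)] * rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = make_demo_connection()
    print(snapshot_queue_lengths(demo, ["Sales", "Billing"]))
    print(estimated_drain_seconds(120, 4), "seconds to drain 120 messages at 4 msg/s")
```

A custom check or monitoring tool would then read `QueueLengthHistory` instead of touching the queue tables directly.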
This is a temporary solution; we are working on filling that gap. Take a look at this announcement: https://groups.google.com/d/msg/particularsoftware/zRJ18bxeY2Y/zrLu9WOIAQAJ
We've been working on enhancing the Particular Service Platform to close the existing gap and provide a means of monitoring your NServiceBus-related system more easily.
The initial offering will focus on identifying key metrics (one of them is queue length) for assessing the health of a system, and then presenting these metrics to you in a manner that's easy to visualize and consume.
In the weeks ahead we will share more information about our monitoring philosophy and how we are looking to ease the pain of implementing it. So follow our blog to get notified of updates.
In the meantime, you are welcome to join the live webinar on the monitoring theme, Wednesday, June 28 at 12:00 EDT (17:00 BST).
Also, my colleague William Brander and I will show the metrics you should consider when monitoring microservices.
Link: https://particular.net/what-to-consider-when-monitoring-microservices
Hope this helps,
If I can help, please feel free to email support at particular.net

Scalability of Azure Cloud Queue

In our current project we use 8 worker role machines side by side, which work a little differently than Azure may expect.
Short outline of the system:
each worker starts up to 8 processes that connect to the cloud queue and process messages
each process accesses three different cloud queues to collect messages for different purposes (delta recognition, backup, metadata)
each message leads to a WCF call to an ERP system to gather information, and finally the retrieved response is added to a Redis cache
This approach was chosen over many smaller machines because of cost and performance. While 24 one-core machines manage about 400 calls/s to the ERP system, 8 four-core machines with 8 processes each do over 800 calls/s.
Now to the question: when we increased the number of machines further to reach 1,200 calls/s, we experienced Cloud Queue outages. At the same moment, 80% of the machines' processes stopped processing messages.
Here we have two problems:
Remote debugging is not possible for these processes, but we were able to use dile to get some information out.
We use the GetMessages method of Cloud Queue to get up to 4 messages from the queue. Cloud Queue always answers with 0 messages. Reconnecting to the cloud queue does not help.
Restarting the workers does help, but the same problem recurs shortly afterwards.
Are we hitting the natural end of scalability of Cloud Queue and should switch to Service Bus?
Update:
I have not been able to fully understand the problem; I described it in "natural borders of Cloud Queue".
To summarize:
The number of TCP connections was impressive. Actually, too impressive (several hundred).
Going back to the original memory size let the system operate normally again.
In my experience, I have been able to get better raw performance out of Azure Cloud Queues than out of Service Bus, but Service Bus has better enterprise features (reliability, topics, etc.). Azure Cloud Queue should process up to 2,000 messages per second per queue.
https://azure.microsoft.com/en-us/documentation/articles/storage-scalability-targets/
You can also try partitioning to multiple queues if there is some natural partition key.
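Partitioning by key can be as simple as hashing the key to pick one of N queues, so every producer and consumer agrees on where a given item's messages go. A minimal sketch, assuming a string partition key; the queue names are hypothetical. `zlib.crc32` is used instead of Python's built-in `hash()`, because string hashing is randomized per process and separate worker processes must compute the same mapping. Keep in mind that the storage account as a whole has its own scalability target, so splitting queues within one account raises the per-queue ceiling but not the account's.

```python
import zlib
from typing import Sequence

# Hypothetical queue names; in practice these would be real Azure queue names.
EXAMPLE_QUEUES = ["erp-delta-0", "erp-delta-1", "erp-delta-2"]


def queue_for_key(partition_key: str, queue_names: Sequence[str]) -> str:
    """Deterministically map a partition key to one queue.

    crc32 is stable across processes and machines, unlike hash() on str,
    so all producers and consumers route the same key to the same queue.
    """
    if not queue_names:
        raise ValueError("need at least one queue")
    index = zlib.crc32(partition_key.encode("utf-8")) % len(queue_names)
    return queue_names[index]


if __name__ == "__main__":
    for key in ("customer-1", "customer-2", "customer-1"):
        print(key, "->", queue_for_key(key, EXAMPLE_QUEUES))
```

Changing the number of queues changes the mapping, so in-flight messages should be drained before resizing.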
Make sure that your process doesn't have some sort of thread deadlock that is the real culprit. You can test this by connecting to the queue when it appears hung and trying to pull messages from it. If that works, the problem is your process, not the queue.
Also take a look at this to setup some other monitors:
https://azure.microsoft.com/en-us/documentation/articles/storage-monitor-storage-account/
It took some time to solve this issue:
First, a summary of how the storage account was used:
We used the blob storage heavily once a day.
The "normal" diagnostics that Azure provides out of the box also used the same storage account.
Some controlling processes used small tables to store and read information once an hour for about 20 minutes.
There may be up to 800 calls/s incrementing a counter that counts calls to the ERP system.
Once we recognized that the storage account was under heavy load, we split it up.
Now there are three physical storage accounts having 2 queues.
The original one still handles up to 800 calls/s for incrementing counters.
Diagnostics are still on the original one.
Controlling information has been also moved
The system has now been running for 2 weeks, working like a charm. There are several things we learned from this:
No, the infrastructure is "not just there" and it doesn't scale endlessly.
Even though we thought we didn't use it "that much", in total we used it quite heavily and in an uncontrolled way.
There are no "best practices" anywhere on the net that tell the complete story. Especially when starting to work with the storage account, a guide from MS would be quite helpful.
Exception handling in storage is quite bad. If the storage account is overused, I would expect some kind of exception, not just a response of zero messages without any surrounding information.
Read the complete story here: natural borders of cloud storage scalability
UPDATE:
Scalability is affected by many factors. You may be interested in "Azure Service Bus: Massive count of listeners and senders" to learn about some more pitfalls.

Running load tests from home network

I need to perform a load test using LoadRunner to simulate load generated from an external network (my home network) on servers located in an organization in the same region.
The application under test is a website (not a heavy one) that users can log into to get personal information.
I am very concerned that my home network bandwidth won't be enough to generate the following load:
I need to simulate 250 concurrent web users who will perform about 30,000 transactions in an hour.
My home network specs and statistics:
Download - 75M - 7.5 Megabyte/sec
Upload - 3.5 M - 350Kbyte / sec
From your experience, would this be enough to generate the desired load? If not, what can be done to simulate load from an external network?
One load generator is never enough from a process perspective. Consider at least three: two for primary load and one for a control set. So, right off the bat, you are likely to have issues.
As mentioned previously, go to the cloud: Amazon, Azure, GoDaddy, Rackspace, 1&1, etc. all have virtual machines that you can use as performance-testing hosts running load generator software. More locations are better, as this minimizes the influence of any one host network if you are looking for representative experiences. Odds are your site will be on one backbone, and some of your load generators may have to peer over from another backbone. This is not bad, as it provides a more realistic view of your end users' experiences from different locations.
Check the end-user agreement for your home connection. Unless you have a business-class agreement, such traffic may look like a DDoS event and set off alarms at your service provider. Don't be surprised if you find yourself suddenly cut off from the internet without warning. I have seen this happen before with people attempting to generate load against a site from their homes.
As you can see in the comments, the amount of load you can generate is affected not only by the network bandwidth but also by the script itself and the LG machine specifications. What I mean is that there is no definitive answer to your question without taking all the parameters into account.
What you should do is create an account with one of the popular cloud providers (Amazon, Azure, HP) and create a machine with the exact specifications you need based on the parameters as you know them. Most of these services allow you to increase the machine size and the bandwidth if needed, for an extra fee.
Good luck!

How to identify process that generates data transfer out in EC2?

I am hosting a small web-based application with Apache Web Server on EC2. On my monthly bill I usually see ~40 GB of data transfer out, which costs about $5 a month.
Although this is not much money, I am curious about how this data transfer out is generated. I am sure that at midnight nobody is actually visiting the web application, and yet there is some data transfer out at ~50 MB per hour (as I can see from the detailed report from Amazon).
Is there any way to figure out which process actually generates this data transfer out (even at midnight, when no one uses the web application)?
thanks!
J.
Have you looked at Boundary? Maybe they can help. They can monitor data going in and out of your EC2 instance (networking), and you can see details like which ports the packets come from and where they are going.
You have to install an agent on your machine and sign up for a trial.
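Independently of a monitoring agent, a quick first check is whether Apache itself accounts for the overnight traffic: sum the response sizes in its access log per hour and compare them with the ~50 MB/hour in the Amazon report. A minimal sketch, assuming the common or combined log format; the log path varies by distribution and is an assumption. The logged size is the response body only (headers are excluded), so this slightly undercounts; if the overnight Apache totals are far below the billed amount, the traffic is coming from some other process.

```python
import re
from collections import defaultdict

# Matches the start of Apache's common/combined log format:
# host ident user [day/Mon/year:HH:MM:SS zone] "request" status bytes
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):\d{2}:\d{2} [^\]]+\] '
    r'"[^"]*" \d{3} (?P<bytes>\d+|-)'
)


def bytes_by_hour(lines):
    """Sum logged response-body bytes per (day, hour)."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines in other formats
        if match.group("bytes") == "-":
            continue  # "-" means no response body was sent
        key = (match.group("day"), int(match.group("hour")))
        totals[key] += int(match.group("bytes"))
    return dict(totals)


if __name__ == "__main__":
    # Assumed path; Amazon Linux often uses /var/log/httpd/access_log instead.
    with open("/var/log/apache2/access.log", encoding="utf-8", errors="replace") as log:
        for (day, hour), total in sorted(bytes_by_hour(log).items()):
            print(f"{day} {hour:02d}:00  {total / 1_000_000:.1f} MB")
```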

Real time application on Microsoft Azure

I'm working on a real-time application and building it on Azure.
The idea is that every user reports something about himself and all the other users should see it immediately (they poll the service every second or so for new info).
My approach so far has been to use a Web Role hosting a WCF REST service that does all the writing to the DB (SQL Azure), without a Worker Role, so that data is written immediately.
I've come to think that using a Worker Role and a Queue to do the writing might be much more scalable, but it might interfere with the real-time side of the service (the worker role might not take the job from the queue immediately).
Is that true? How should I go about this issue?
Thanks
While it's true that the queue will add a bit of latency, you'll be able to scale out the number of Worker Role instances to handle the sheer volume of messages.
You can also optimize queue reading by getting more than one message at a time (up to 32 per request). Since a single queue has a scalability target of 500 TPS, this lets you go well beyond 500 messages per second on reads.
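The arithmetic behind batching is that the scalability target counts requests, not messages, so each batched read moves up to a full batch for the cost of one transaction. A minimal sketch of that calculation; the 500 TPS and batch size of 32 come from the answer above, and the bound assumes every request returns a full batch. Deleting a processed message is still a separate request per message, so this improves the read side only.

```python
import math


def transactions_to_drain(message_count, batch_size):
    """Read requests needed if every request returns a full batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(message_count / batch_size)


def max_read_rate(transactions_per_second, batch_size):
    """Upper bound on messages read per second when each request returns batch_size messages."""
    return transactions_per_second * batch_size


if __name__ == "__main__":
    print(transactions_to_drain(1_000, 1), "requests one at a time")
    print(transactions_to_drain(1_000, 32), "requests in batches of 32")
    print(max_read_rate(500, 32), "messages/s upper bound at 500 TPS")
```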
You might look into a Cache for buffering the latest user updates, so when polling occurs, your service reads from cache instead of SQL Azure. That might help as the volume of information increases.
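A cache for this pattern only needs the newest update per user plus a version counter, so each poll can ask for "everything newer than what I last saw" instead of querying SQL Azure. A minimal in-process sketch; the class and method names are hypothetical, and a shared distributed cache would be needed once several Web Role instances serve the polls.

```python
import threading


class LatestUpdateCache:
    """Keeps only the newest update per user, tagged with an increasing version number."""

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._latest = {}  # user_id -> (version, payload)

    def put(self, user_id, payload):
        """Record a user's newest update and return the version assigned to it."""
        with self._lock:
            self._version += 1
            self._latest[user_id] = (self._version, payload)
            return self._version

    def changes_since(self, since_version):
        """Return (current_version, {user_id: payload}) for updates newer than since_version."""
        with self._lock:
            changed = {
                user_id: payload
                for user_id, (version, payload) in self._latest.items()
                if version > since_version
            }
            return self._version, changed


if __name__ == "__main__":
    cache = LatestUpdateCache()
    cache.put("alice", "online")
    cache.put("bob", "busy")
    seen, updates = cache.changes_since(0)
    print(seen, updates)
    cache.put("alice", "away")
    print(cache.changes_since(seen))  # only alice's newer update
```

A polling client stores the returned version and sends it with its next poll, so each response contains only what changed in between.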
You could have a look at SignalR. It does not support farm scenarios out of the box, but it should be possible to make it work by using internal endpoint calls to update every instance, the Azure Service Bus, or the AppFabric Cache. This way you get a push scenario rather than a pull scenario, so you don't have to poll your endpoints for potential updates.