Load testing through WiFi/LAN + VPN

I really could not find an answer to my question on the web. I am currently running load tests against a web service, for example: how will the service handle 15 threads in 1 second? For that I use JMeter. I always get different average response times for 15 threads. When I'm on my company's internal network I get wonderful results, but when I am at home, using LAN/WiFi + VPN to reach the web service, I get horrible results. When I test it through the VPN, the web service cannot handle 30 threads in 1 second and the average response time is around 13 seconds, whereas from the company's network the average response time is 4-5 seconds. That web service will also be called from a system that uses the VPN.
My question is: what is the correct result and the correct way to test it, from the company's network or through the VPN?

Response time consists of the following metrics:
Connect time
Latency (also known as Time To First Byte)
Time to last byte
So my expectation is that it's not really a high response time; it's more about the bandwidth of your ISP and VPN connections. Theoretically you can subtract the connect time and the time for the packets to travel back and forth and get the "real" response time. However, a better idea would be setting up a remote JMeter slave that is "local" to the system under test and orchestrating it from your "remote" JMeter master host; this way you will be able to obtain "clean" results without these network-related slowdowns.
More information: Apache JMeter Glossary
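To get a feel for how much of the total is spent on the network path alone, here is a minimal Java sketch (the host and port are placeholders, not something from the question) that times the three components listed above separately:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Rough illustration of the three response time components above
// (connect time, time to first byte, time to last byte) for a plain
// HTTP GET. This is not JMeter itself, just a timing sketch.
public class ResponseTimeBreakdown {
    public static void main(String[] args) throws Exception {
        String host = "your-web-service.example.com"; // placeholder
        int port = 80;                                // placeholder

        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 10_000);
            long connected = System.nanoTime();

            OutputStream out = socket.getOutputStream();
            out.write(("GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();

            InputStream in = socket.getInputStream();
            int firstByteValue = in.read();           // blocks until the first response byte arrives
            long firstByte = System.nanoTime();
            if (firstByteValue == -1) {
                System.out.println("connection closed before any data arrived");
                return;
            }

            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) { /* drain the rest of the response */ }
            long lastByte = System.nanoTime();

            System.out.printf("connect time:       %d ms%n", (connected - start) / 1_000_000);
            System.out.printf("time to first byte: %d ms%n", (firstByte - start) / 1_000_000);
            System.out.printf("time to last byte:  %d ms%n", (lastByte - start) / 1_000_000);
        }
    }
}
```

Run once from the office network and once over the VPN, the connect time and time to first byte should show where the extra seconds come from.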

Arguably, the correct way to test it should be the way your users are accessing your web service.
If the majority of users are accessing it through a VPN from outside, then test it that way; if it is the other way around, test it from the company's network.
In the case of mixed access, you might want to test both at the same time.

Related

TURN servers: always or never needed for a given network, or needed unpredictably?

I am currently just using a STUN server and am wondering whether TURN is necessary for an MVP. The number of users accessing the website from a workplace with super secure firewalls should be near-zero.
Let's say 4 people are testing WebRTC connection reliability. Sometimes they all successfully connect and can see/hear one another, but other times they cannot see/hear someone and refresh the page to try again.
Does the fact that they can sometimes all see/hear each other rule out whether a TURN server would make a difference?
In other words, is it possible for a STUN server to identify my IP so I can connect one second, but fail if I try again a few seconds later? Or is it just network-based, so that if STUN doesn't work for me on my network now, it 100% will not work in 5 minutes either?
If the latter (and a TURN server is either always or never needed for a given network), I guess that tells me the problem in my code is elsewhere...

ASP.NET Web API 2 OWIN self-hosted, high CPU: what is the average compute load I should expect?

I built a set of 3 APIs using ASP.NET Web API 2, self-hosted with OWIN in an Azure Cloud Service worker role.
The Worker Role is exposed to the internet with a custom domain.
Each API has a single controller, doing some normal dictionary operations, table calls and Azure Redis calls. One request out of two just does a single Redis call and returns in around 10 ms.
The average call when going through all the API code is 150ms.
The response is a JSON object of around 10 KB in size.
Everything works fine, but I have a problem.
I'm seeing around 25 peak connections per second and no more than 2 million requests per day, and I can barely get the CPU below 40% with 3 Azure D2_v2 (2 cores, 8 GB RAM) instances running.
I'm in trouble because I'm spending almost $1.5k a month for an API serving just 15-25 calls per second.
If I remove or scale down an instance, the CPU goes up to 55-60%, Redis and Azure Table calls slow down a lot, and an API request takes 3-5 seconds to come back.
I tried everything to the best of my abilities. I thought it could be some bots or a DDoS attack, so I installed the NuGet package WebApiThrottle and set a maximum of 1 request per IP per second.
Nothing changed.
I reviewed all the code and configuration to cut unoptimized parts, but one call in two just calls Redis and returns, and the others are very clean and simple C#, returning in 150 ms with 2 Azure Table calls + 1 Azure Queue set.
The API Controllers are async, everything is async.
I enabled profiling; the CPU is high in the main Azure process and in the Redis Get method, nothing else relevant there, no bottlenecks.
I enabled Diagnostics, no errors.
I installed Application Insights, and here I see something strange that I cannot tell is normal or not.
I see this IP, 13.88.23.0, making thousands of requests to the APIs with query string values generally used in normal requests. A lot of them fail.
This IP belongs to Azure itself; why is it calling the API?
Some of these requests are stuck for minutes; I can see that from the Application Insights panel, and it's always the same IP.
Then I see the remaining logs, dependencies, etc.; nothing relevant.
Apart from that, what could I do to understand the problem?
I can't believe it is normal to consume so much CPU for an API with just 2 million calls a day, or is it?
Is there an additional profiling technique I could use?
Based on your experience, how many API calls should I expect to serve with 3 dual-core, 8 GB RAM servers in normal conditions (assuming there is something wrong with my configuration)?
Thanks
UPDATE
I separated the APIs into two cloud services, 2 in one and 1 in the other.
I still see in Application Insights calls from another IP belonging to Microsoft.
I suppose this is normal; probably Application Insights cannot detect the real IP of the client, since it is a Worker Role, and shows the internal one instead.
But the problem of having to use so much power for so few calls remains.
Any thoughts on that?

Domain name re-resolution issue in Firefox

I have four identical servers hosted on AWS EC2, divided into two groups, with each group located in a different AWS region. There is one ELB in front of each group. In AWS Route 53 I configured two weighted alias records (not latency-based) pointing to the ELB of each group.
Each server has a simple Apache2 server installed which displays a simple page with different words, to distinguish them from each other. I started a browser client (built with the Selenium library) that frequently reloads the page at the URL which is the domain name of these servers (pausing for 1 second), but I found that the browser (Firefox) always returns pages from servers in one group, instead of returning pages from both groups 50% of the time as weighted round robin should.
I also found that if I pause for a relatively long time, pages from the other group do get returned, but as long as I refresh the page frequently it never changes: requests always hit one group and not the other.
I also wrote a simple Java program to keep querying the domain name from AWS Route 53, and the address I get back does change between the two groups, but the browser seems stuck on the connection with the group it first connected to (as long as I refresh frequently).
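For reference, such a re-resolution check can be as small as the minimal sketch below (the hostname is a placeholder; note that the JVM itself caches successful lookups for a short time, which can hide some of the rotation):

```java
import java.net.InetAddress;
import java.util.Arrays;

// Repeatedly resolves a hostname and prints the addresses returned, to
// confirm whether DNS itself rotates between the two ELBs. The hostname
// is a placeholder for the Route 53 record. The JVM caches successful
// lookups for a short period (java.security property
// "networkaddress.cache.ttl"), so consecutive iterations may show the
// same answer even when Route 53 would have rotated.
public class DnsWatcher {
    public static void main(String[] args) throws Exception {
        String hostname = "www.example.com"; // placeholder

        for (int i = 0; i < 60; i++) {
            InetAddress[] addresses = InetAddress.getAllByName(hostname);
            System.out.println(i + ": " + Arrays.toString(addresses));
            Thread.sleep(1000); // pause for 1 second, like the browser test
        }
    }
}
```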
I suspect it is a problem of the TCP connection staying alive, but I am not sure. BTW, I have already disabled the browser cache, and I am using Mac OS X 10.9. (This happens on Ubuntu as well.)
Any ideas would be really appreciated. This issue is really important to my work, the deadline for which is approaching. Many thanks in advance.
Unfortunately, that's normal.
Many, perhaps most, browsers cache the DNS response they get from the OS, and the lifetime of this cache is unrelated to the DNS TTL; it's at the discretion of the browser developers.
For Firefox, the time by default appears to be 60 seconds, so it is not likely to be directly related to keep-alives, though there's certainly some potential for that too, albeit over a shorter time interval in some cases, since many servers will tear down an idle kept-alive connection well before 60 seconds; a connection idle that long is a potentially expensive waste of a resource.
Firefox: http://kb.mozillazine.org/Network.dnsCacheExpiration
For a discussion of the issue and observations made of the behavior of different browsers, see also: http://dyn.com/blog/web-browser-dns-caching-bad-thing/

How to scale a WCF service in such a scenario

I have an app which can track vehicles. Vehicles can change location, appear or disappear at any time. In order to always be up to date, every 3 seconds the app sends the server the region that is currently visible on the map, and the server responds with a list of vehicles in that area.
Problem: What happens when I have a database of 1000 vehicles and 10000 requests being sent to the server every 3 seconds? How would you solve this scalability issue with WCF?
There are a couple of things to do.
On the client side
As Joachim said, try to limit requests from the client side. I am not sure that a vehicle will move significantly every 3 seconds. If possible, try to combine positions and other information in a batch.
On the server side
Problem: What happens when I have a database of 1000 vehicles and 10000 requests being sent to the server every 3 seconds? How would you solve this scalability issue with WCF?
The best way to answer this question is to do a load test. The results depend heavily on your service implementation. If a request takes more than 1 second, you will certainly have performance problems.
You can also add a queue behind your service for handling requests, and even deploy your service on many servers in order to dispatch requests between them.
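To illustrate the queue idea only: the pattern is language-agnostic, and the sketch below is in Java purely as an illustration; RegionRequest, the queue size and the worker count are made-up stand-ins, and a real WCF deployment would more likely put something like MSMQ or a cloud queue in front of the workers. Requests are accepted onto a bounded queue and a fixed pool of workers drains it at the pace the backend can sustain.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of "queue in front of the work": the endpoint only enqueues,
// and a fixed pool of workers processes requests at its own pace.
public class QueuedRequestHandler {
    record RegionRequest(String region) {}

    private final BlockingQueue<RegionRequest> queue = new LinkedBlockingQueue<>(10_000);
    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    public QueuedRequestHandler() {
        for (int i = 0; i < 8; i++) {
            workers.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        RegionRequest req = queue.take();
                        // Stand-in for the real work: query vehicles in req.region()
                        System.out.println("querying vehicles in " + req.region());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    // Called by the service endpoint: returns false ("try again later")
    // instead of blocking when the system is already saturated.
    public boolean accept(RegionRequest request) {
        return queue.offer(request);
    }

    public static void main(String[] args) throws InterruptedException {
        QueuedRequestHandler handler = new QueuedRequestHandler();
        System.out.println("accepted: " + handler.accept(new RegionRequest("region-1")));
        Thread.sleep(500);             // give a worker time to pick it up
        handler.workers.shutdownNow(); // stop the demo
    }
}
```

The important property is that the endpoint stays cheap and the backend is never asked to do more work than the worker pool allows, which is what makes dispatching across several servers straightforward later.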

Interpreting and evaluating JMeter output

I am using JMeter to carry out load and stress tests on a RESTful web service for a university project. I have used JMeter to successfully produce results, but I am not sure what constitutes an acceptable level of performance. What level of throughput, etc., should I be looking to achieve with my web service?
Aim for less than 50 µs (microseconds) if you can. Use other 'good' sites' response times as your yardstick. Check out google.com and bing.com load times. For 'heavier' sites, see cnn.com or nytimes.com for examples. Depending on how heavy your site is in terms of calculation or database reads, response times in the milliseconds are still OK. Be unhappy if it takes more than 300 ms, because that is an insanely long CPU time these days.
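One way to put concrete numbers on "acceptable" is to post-process the JMeter results file yourself. The sketch below is a minimal example under a couple of assumptions: the results were saved as CSV (.jtl) with JMeter's default column order (timeStamp first, elapsed second) and the file is called results.jtl. It prints the average, the 90th percentile and the overall throughput.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Summarizes a JMeter CSV results file: average and 90th percentile
// response time plus throughput. Assumes the default header row and
// column order (timeStamp,elapsed,...) and that fields contain no commas;
// adjust the indexes if your configuration differs.
public class JtlSummary {
    public static void main(String[] args) throws Exception {
        List<String[]> rows = Files.lines(Path.of("results.jtl")) // placeholder path
                .skip(1)                                          // skip the header row
                .map(line -> line.split(","))                     // naive CSV split
                .collect(Collectors.toList());

        List<Long> elapsed = rows.stream()
                .map(cols -> Long.parseLong(cols[1]))             // "elapsed" is the second column by default
                .sorted()
                .collect(Collectors.toList());
        if (elapsed.isEmpty()) {
            System.out.println("no samples found");
            return;
        }

        List<Long> timestamps = rows.stream()
                .map(cols -> Long.parseLong(cols[0]))             // "timeStamp" is the first column by default
                .collect(Collectors.toList());

        double avg = elapsed.stream().mapToLong(Long::longValue).average().orElse(0);
        long p90 = elapsed.get((int) Math.ceil(elapsed.size() * 0.9) - 1);
        long durationMs = Collections.max(timestamps) - Collections.min(timestamps);
        double throughput = durationMs > 0 ? elapsed.size() / (durationMs / 1000.0) : 0;

        System.out.printf("samples: %d, avg: %.1f ms, 90th pct: %d ms, throughput: %.1f req/s%n",
                elapsed.size(), avg, p90, throughput);
    }
}
```

Comparing the 90th percentile and throughput against your own target (for example the 300 ms ceiling mentioned above) is usually more informative than the average alone.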