What is squeeze testing?

In the talk "Beyond DevOps: How Netflix Bridges the Gap," around 29:10 Josh Evans mentions squeeze testing as something that can help them understand system drift. What is squeeze testing and how is it implemented?

It seems to be a term used by the folks at Netflix.
The idea is to run tests/benchmarks that show how performance has changed and where the application's breaking point is, then use that to judge whether the latest change made it less efficient, or to determine the recommended auto-scaling parameters before deploying it.
There is a little more information here and here:
One practice which isn’t yet widely adopted but is used consistently
by our edge teams (who push most frequently) is automated squeeze
testing. Once the canary has passed the functional and ACA analysis
phases the production traffic is differentially steered at an
increased rate against the canary, increasing in well-defined steps.
As the request rate goes up key metrics are evaluated to determine
effective carrying capacity; automatically determining if that
capacity has decreased as part of the push.

As someone who helped with the development of squeeze testing at Netflix: it uses the large volume of stateless requests from actual production traffic to test the system. One approach is to put inordinately more load on one instance of a service until it breaks, monitor the key performance metrics of that instance, and use that information to decide how to configure auto-scaling policies. It eliminates the problem of fake traffic not stressing the system in the right way.
The reasons it might not work for everyone:
you need more traffic than any one instance can handle.
the requests need to be somewhat uniform in the demand they place on the service under test.
clients and protocol need to be resilient to errors should things go wrong.
The way it is set up: a proxy is put in front of the service, configured to send a specific RPS to one instance. I used Bresenham's line algorithm to spread the fluctuations in incoming traffic evenly over time into a precise outgoing RPS. Turn up the dial on the RPS, watch it burn.
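For illustration only (a hypothetical sketch of the idea, not Netflix's actual proxy code; the class and pool names are made up): a Bresenham-style error accumulator lets the proxy steer a precise target RPS at the instance under test out of a fluctuating incoming stream, spreading the steered requests evenly rather than in bursts.

# Hypothetical sketch of Bresenham-style traffic steering for a squeeze test.
class SqueezeSteering:
    def __init__(self, target_rps, expected_incoming_rps):
        self.target = target_rps            # requests/sec to steer at the squeezed instance
        self.incoming = expected_incoming_rps  # must be >= target for the steering to make sense
        self.error = 0                      # Bresenham-style error accumulator

    def route(self, request):
        # For each incoming request decide: squeezed instance or the normal pool.
        self.error += self.target
        if self.error >= self.incoming:
            self.error -= self.incoming
            return "squeeze-instance"       # evenly spaced picks add up to ~target_rps
        return "normal-pool"

# Ramp the dial in well-defined steps and evaluate key metrics at each step.
steering = SqueezeSteering(target_rps=50, expected_incoming_rps=1000)
# for step in (100, 200, 400, 800): steering.target = step; ...observe metrics...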

Related

WebRTC Broadcasting - Make a Peer Act as a Streaming Server

Can we implement mid-scale broadcasting (maybe up to a few hundred viewers) over WebRTC by using a star topology?
Here a peer can take the role of a streaming server (and even be put in a server-like setup from where more bandwidth is accessible).
Won't this kind of setup scale pretty well (as the central peer can take advantage of server infrastructure, if needed), say for 100-200 users or even more?
Can we consider it a viable option compared to going with a dedicated MCU solution? Or, if you know, can you point out its limitations?
Can somebody point me to any implementation code for this?
That would be quite difficult. In theory you could pass a stream from one peer connection to multiple other ones, but even if it works, I really doubt it will work at any reasonable scale.
The best way is to use the MCU. You'll get a real star topology.
I can only suggest using our services - AddLive - http://www.addlive.com (usual disclaimer - I work there).

Image Resizer - Best Practice for security

We are currently testing out the ImageResizer library, and one question is: how do we avoid malicious attacks on the site if someone programmatically sends thousands of resize requests for images with arbitrary sizes, overloading the server's CPU/RAM and potentially causing disk space to run out due to the tremendous number of cached files?
Is there any way of whitelisting certain dimensions? Or what is the best practice to avoid this scenario?
Thanks!
Stephen
Neither CPU nor RAM can generally be overloaded during a (D)DOS attack against ImageResizer. Memory allocation is contiguous, meaning an image cannot be processed unless there is around 15-30% free RAM remaining. Under the default pipeline, only 2 cores are used for image processing, so a regular server will not see CPU saturation either.
In general, there are far more effective ways to attack an ASP.NET website than through ImageResizer. Any database-heavy page is more likely to be a weak point, as the memory allocations are smaller and it is easier to saturate the server with them.
Disk space starvation can be mitigated by enabling autoClean="true".
If you're a high-profile site with lots of determined ill-wishers, you can also consider the following:
Use request signing - only URLs generated by your server will be accepted (see the sketch below for the general idea).
Use the Presets plugin to white-list defined permitted command combinations.
Both of these reduce development agility and limit your options for responsive web design, so unless you have actually been attacked in the past, I don't suggest them.
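To illustrate the general idea of request signing (a generic sketch, not ImageResizer's actual plugin API; the key and query string are hypothetical): the server appends an HMAC of the resize parameters to each URL it generates, and any request whose signature does not verify is rejected before processing.

import hmac, hashlib

SECRET = b"replace-with-a-private-key"     # hypothetical shared secret, kept server-side

def sign(query):                            # e.g. "image.jpg?width=300&height=200"
    mac = hmac.new(SECRET, query.encode(), hashlib.sha256).hexdigest()
    return query + "&sig=" + mac

def verify(signed_query):
    # Split off the signature and recompute it over the remaining query.
    query, _, sig = signed_query.rpartition("&sig=")
    expected = hmac.new(SECRET, query.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

signed = sign("image.jpg?width=300&height=200")
assert verify(signed)                       # a tampered width/height fails verification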
In practice, (D)DOS attacks against dynamic imaging software are rarely useful at bringing down anything except uncached images, and only temporarily, even when running under the same application pool. Since visited images tend to be cached, the actual effect is rather laughable.

Services similar to S3/EC2

Does any other provider offer a cloud computing + storage layer like S3/EC2, with free data transfer between the two layers?
I have looked at:
Softlayer CloudLayer Storage -- no free transfer between the cloudlayer storage and cloudlayer computing instances.
Rackspace CloudFiles - quite a bit of marketing mumbo-jumbo and something about Cloud Connect; I gave up on the site once the Live Chat CSS popup started following me around.
Does anyone know of any others?
I'm looking to store some large (non-random access) files for constant re-processing on a storage solution, and process it nearby, without paying transfer costs daily (looking to store in the 500-2000GB range, re-processing it all daily).
Re-processing requires a (Linux) server with a "decent" (weasel word alert) configuration.
Thanks!
'Cloud computing' is a bit of a myth.
They're all just, essentially, virtual private servers. 'Cloud' instances tend to have the flexibility of being billed by the hour rather than monthly, but they're still just a VPS.
Persistent storage is a useful feature offered by a very limited number of VPS providers, but one that can easily be emulated by having two or more VPSes in the same data centre (Linode are an excellent VPS provider with free local data transfer; sadly they're rather limited on capacity). I don't know of any other VPS/cloud providers who offer their own persistent storage solution.
It is something you can easily achieve yourself. VPS servers tend to be a little restrictive on hard drive capacity if you're looking for 500-2000GB, so perhaps you could consider a dedicated server and handle storage and processing on the same machine... you can't get data more local than the same machine!
First, the short version: stop looking for “free”.
Now, in more detail: you're looking to consume some somewhat-non-trivial computing, data storage and networking resources. Presumably you've got a good reason for doing this; if you truly have, you'll have the ability to also purchase the resources required for what you want to do. There are a few options on this front, none of which are free:
Buy and host your own hardware.
Buy the hardware and host it in a colocation facility.
Hire the hardware
Long term hire
Short term hire
All Amazon are doing is short-term, easy-to-set-up hiring of resources. Their prices are quite keen (if some other option is cheaper, it's because it is missing something significant that Amazon do; maybe it's something you don't need, but that's up to you to figure out). You can host the core of the Amazon API quite easily on whatever resources you've hired (see Eucalyptus), but be aware that going from having the software and the API to having everything work smoothly is a really big step; the more I work with Eucalyptus installations, the more impressed with Amazon I become. And that's despite also being pretty impressed with Eucalyptus itself.
But none of this is free. It takes real resources to provide (e.g., electricity to power the machines and keep them cool, and a building to house them in) and ultimately that has to be paid for somewhere. To expect otherwise is to believe that others should have to pay for things for you; it's pretty rare that that happens, and the more you need to consume, the rarer it is (especially if the economy isn't doing too well). So stop thinking in terms of how you can get it for nothing ("freeload") and instead take a good look at what it really costs to provide through various routes, and seek to minimize your costs. If you can't afford even that, your #1 problem isn't hosting but funding; fix that first.
Rest assured you're not alone in this matter. This is what lots of other people worldwide have to do to make their projects into reality. Good luck!
GoGrid has an external storage offering with free transfer and access over typical protocols like SMB, NFS, rsync and FTP. The first two allow mounting it as a normal drive.
Note also that many providers will let you create cloud servers with 2 TB of instance storage. I certainly can't name all of them, but you can find some with cloudorado.com .

Webservice wcf performance counters for queue

I am trying to performance test a WCF web service which should get a lot of traffic. Which performance counters are sensible to use, and for what purpose? Naturally I am looking at CPU and RAM, but I would like to know when IIS is queueing and when it's having trouble...
Any advice on sensible performance counters gratefully received...
Cheers, Alex
MSDN has an entire section on WCF administration and diagnostics, and specifically, for performance counters in WCF.
There are also specific sections for performance counters for hosted service calls, as well as for the endpoint and for operations.
I would suggest looking through those first, as there is a good amount of valuable information there.
Analysing performance counters is complicated and takes a lot of practice, which is my way of saying that I am not experienced enough to give a complete list.
You are going to look for some specific things to start with.
First off is of course how long it takes to return the webservice calls. This tells you if you even have a performance issue at that load.
Next, everyone looks at CPU. This really does not tell you a lot, however.
RAM is good, but you want to know how often your app is paging to disk, so check Page Faults/sec.
Check your logical and physical disks for Current Disk Queue Length. If your physical disk is queuing at all, you are reading/writing too much to the disk.
Beyond that you would normally be trying to find a specific and likely obscure problem.
I usually take performance testing in stages. Do a first test with the basics, and if a particular page is having a problem, look at the load it is causing.
If the whole production server is not performing adequately, it is easier to add more hardware, but I prefer looking at the code that is running and making that better.
Before you run your performance monitors, you want to add a registry key:
HKLM\SYSTEM\CurrentControlSet\Services\ServiceModelService 4.0.0.0\Performance
Under that key, add a DWORD value named FileMappingSize.
The size for that will be number of services exposed * 33 * 350.
In your config you would then add
<system.serviceModel>
  <diagnostics performanceCounters="ServiceOnly" />
</system.serviceModel>
You can watch the following counters:
CPU, RAM (for memory leaks), and for each service: Calls, Call Duration, and Calls Outstanding.
CPU will show you how heavily you are saturating your server.
RAM will show whether you have memory leaks, if it continues to grow and grow and grow.
Calls will show the cumulative number of calls you have received.
Calls Per Second will give you the volume you're handling.
Calls Outstanding is the number of clients that are waiting because your services could not handle the volume.
If you find some questionable numbers in these groupings then start looking at other elements like Calls Faulted or Calls Failed. (not sure of the difference between a failure and a fault)
It is rare that you would need to dig further into the issues than what the service only numbers will provide. When you get into the other two sets of counters your shared memory utilization gets really high.
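As a hedged illustration (assumes Windows, Python, and the "ServiceModelService 4.0.0.0" counter category enabled above; the sampling interval and output file name are arbitrary), you could sample the service-only counters from a script with typeperf:

import subprocess

counters = [
    r"\ServiceModelService 4.0.0.0(*)\Calls Per Second",
    r"\ServiceModelService 4.0.0.0(*)\Calls Outstanding",
    r"\ServiceModelService 4.0.0.0(*)\Calls Failed",
]
# Sample every 5 seconds, 12 samples (one minute), write CSV for later analysis.
subprocess.run(
    ["typeperf", *counters, "-si", "5", "-sc", "12", "-f", "CSV", "-o", "wcf_counters.csv"],
    check=True,
)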

Spread vs MPI vs zeromq?

In one of the answers to Broadcast like UDP with the Reliability of TCP, a user mentions the Spread messaging API. I've also run across one called ØMQ. I also have some familiarity with MPI.
So, my main question is: why would I choose one over the other? More specifically, why would I choose to use Spread or ØMQ when there are mature implementations of MPI to be had?
MPI was designed for tightly-coupled compute clusters with fast, reliable networks. Spread and ØMQ are designed for large distributed systems. If you're designing a parallel scientific application, go with MPI, but if you are designing a persistent distributed system that needs to be resilient to faults and network instability, use one of the others.
MPI has very limited facilities for fault tolerance; the default error-handling behavior in most implementations is a system-wide failure. Also, the semantics of MPI require that all messages sent eventually be consumed. This makes a lot of sense for simulations on a cluster, but not for a distributed application.
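A small sketch of that point (assumes mpi4py and a launch under mpiexec; the out-of-range destination rank is just a way to provoke an error): by default an MPI error aborts every rank, and about the most you can do is switch the communicator to "errors return" and handle the exception yourself.

from mpi4py import MPI

comm = MPI.COMM_WORLD
comm.Set_errhandler(MPI.ERRORS_RETURN)    # the default is MPI.ERRORS_ARE_FATAL

try:
    # Sending to a rank that does not exist raises instead of aborting the whole job.
    comm.send("ping", dest=comm.Get_size())
except MPI.Exception as err:
    print("rank %d caught MPI error: %s" % (comm.Get_rank(), err))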
I have not used any of these libraries, but I may be able to give some hints.
MPI is a specification (a standard with multiple implementations), while Spread and ØMQ are actual implementations.
MPI comes from "parallel" programming while Spread comes from "distributed" programming.
So, it really depends on whether you are trying to build a parallel system or distributed system. They are related to each other, but the implied connotations/goals are different. Parallel programming deals with increasing computational power by using multiple computers simultaneously. Distributed programming deals with reliable (consistent, fault-tolerant and highly available) group of computers.
The concept of "reliability" here is slightly different from that of TCP. TCP's reliability is "deliver this packet to the end program no matter what." Distributed programming's reliability is "even if some machines die, the system as a whole continues to work in a consistent manner." To really guarantee that all participants got a message, one would need something like two-phase commit or one of the faster alternatives.
You're addressing very different APIs here, with different notions about the kind of services provided and infrastructure for each of them. I don't know enough about MPI and Spread to answer for them, but I can help a little more with ZeroMQ.
ZeroMQ is a simple messaging library. It does nothing more than send messages to different peers (including local ones) based on a restricted set of common messaging patterns (PUSH/PULL, REQUEST/REPLY, PUB/SUB, etc.). It handles client connection and reconnection and basic congestion, strictly based on those patterns; you have to do the rest yourself.
Although it appears very restricted, this simple behavior is mostly what you need for the communication layer of your application. It lets you scale very quickly from a simple prototype, all in memory, to more complex distributed applications in various environments, using simple proxies and gateways between nodes. However, don't expect it to do node deployment, network discovery, or server monitoring; you will have to do that yourself.
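For a flavor of those patterns, here is a minimal REQUEST/REPLY sketch using the pyzmq binding (the port number and messages are arbitrary; run server() and client() in separate processes):

import zmq

def server():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind("tcp://*:5555")
    while True:
        msg = sock.recv_string()            # wait for a request
        sock.send_string("echo: " + msg)    # reply to the same peer

def client():
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect("tcp://localhost:5555")
    sock.send_string("hello")
    print(sock.recv_string())               # -> "echo: hello"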
Briefly, use ZeroMQ if you have an application that you want to scale from a simple multithreaded process to a distributed and variable environment, or that you want to experiment with and prototype quickly when no existing solution seems to fit your model. Expect, however, to have to put some effort into the deployment and monitoring of your network if you want to scale to a very large cluster.