I want to simulate our whole checkout process under load. This essentially involves running a number of POSTs in sequence, where the client is storing a unique cookie for each sequence that allows the session to be preserved. Can anyone recommend a software or service that meets these conditions?
This sort of thing could be very easily, effectively and freely accomplished using Apache JMeter. You can either record the journey using JMeter's proxy or simply add the requests manually.
To handle cookies, add an HTTP Cookie Manager to the test plan. For any other tokens or session IDs that need to be correlated, you can use a Regular Expression Extractor.
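For comparison, the same idea (a sequence of POSTs, with one private cookie jar per simulated shopper) can be sketched in plain Python using only the standard library. This is an illustration of the session handling, not a replacement for JMeter. The function names and the `session_factory` parameter are my own, and the paths and form fields in the usage note are hypothetical.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def new_session():
    """Return (opener, jar): an opener with its own private cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def run_checkout(base_url, steps, session_factory=new_session, timeout=10):
    """POST each (path, form_fields) step in order inside one session.

    Cookies set by earlier responses are sent back on later requests
    because every step goes through the same opener.
    """
    opener, _jar = session_factory()
    statuses = []
    for path, fields in steps:
        body = urllib.parse.urlencode(fields).encode("ascii")
        with opener.open(base_url + path, data=body, timeout=timeout) as resp:
            statuses.append(resp.status)
    return statuses


def run_load(base_url, steps, users, session_factory=new_session):
    """Run `users` independent checkout sequences, one thread per user."""
    results = [None] * users

    def worker(i):
        try:
            results[i] = run_checkout(base_url, steps, session_factory)
        except Exception as exc:  # store the failure instead of losing it
            results[i] = exc

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A call might look like `run_load("https://staging.example.test", [("/cart", {"sku": "42"}), ("/pay", {"method": "card"})], users=50)`, pointed at a test environment rather than production. One thread per user is fine for small numbers; for serious load a dedicated tool is still the better choice.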
There are lots of options for this kind of test. Free/open-source tools cost nothing but will require a bit more work on your part. Tools like ours (Load Tester 5) will get the job done much quicker, but there is a cost for the software. If your organization does not have much experience with load testing and is on a tight schedule, you might want to bring in outside help to meet your deadline and learn the process (we offer services as well!).
My goal is to synchronize a web-application with an internal database. The web-application has a public API, but in order to fully synchronize the two sources I would need to make around 2000 separate API calls every time. My instinct tells me that this is excessive and possibly irresponsible, but I lack the experience to know for sure.
In this particular case the web-application is Asana, but I've encountered similar situations before with other services. Is there any way to know if you're abusing a service through excessive API calls? I know I'm not going to DOS a company like Asana, but I can't shake the feeling that there must be a better way than making ~150k requests per day.
The only other option I can think of is to update the web-service only when I know there's been a change in the database, but I'll lose a lot of capability that way.
I apologize for the subjectivity of this question, but I'm really hoping that someone can explain if there's any kind of etiquette that's expected when using public APIs.
(I work at Asana)
This is an excellent question, or rather set of questions.
You are designing a system that will repeatedly make requests for every object. What will happen as the number of objects grows? Even if your initial request rate were reasonable, this would suffer problems with scalability. A more scalable solution is one that scales with the number of changes in the system. This will also grow over time, but much more slowly - the number of changes a single user can make per day is relatively constant, but the total number of objects they've created over time grows and grows. So my first piece of advice would be to avoid doing things this way, and instead find a way to detect changes and just act on those. It would be interesting to know why you feel you'll lose capability by taking this approach.
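A minimal sketch of the "act only on changes" idea, assuming the changes can be detected on your own database side: keep a fingerprint of each record as it was last synced, and only make API calls for records whose fingerprint differs. The record shape and function names here are hypothetical, and deletions are not handled.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's content (key order does not matter)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def changed_ids(records, last_synced):
    """Return (ids that need an API call, fingerprints to store for next time).

    records:     {record_id: dict} as currently held in the local database
    last_synced: {record_id: fingerprint} saved after the previous sync
    """
    current = {rid: fingerprint(rec) for rid, rec in records.items()}
    changed = sorted(rid for rid, fp in current.items() if last_synced.get(rid) != fp)
    return changed, current
```

With this shape, the number of API calls per sync tracks the number of edited records rather than the total number of records.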
Now, I happen to know that the Asana API does not currently provide you with any friendly mechanism to just detect changes in the system. This is a commonly requested feature and we are looking into it, though I unfortunately cannot promise a delivery date. So you might be left with no choice but to poll our system for now.
As for being polite to the API, many service providers set limits on their API usage to prevent accidental or malicious use of the API from impacting the service to their other customers -- Asana is no exception. Sometimes these limits are published, other times not, and there is no standard limit: it all depends on the service. But it is very thoughtful of you to be curious about service limitations.
That said, 150k requests per day is, for the Asana API, kind of a lot. If all of our API users gave us that much traffic, we might be serving more requests per day than Google Web Search, and we're not quite that scalable yet. :) Technically, sometimes, we might handle requests at that volume from a single user.
If you must poll, try to poll on intervals like 15 minutes. But please do not poll your entire workspace on this time period; it's likely to be too much traffic/data. We're working on trying to provide you with a better solution.
If you do happen to make too many requests of the Asana API, you will get back HTTP status code 429 instead of your desired response; you can read more about that here (https://asana.com/developers/documentation/getting-started/errors).
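A client can treat 429 as a signal to slow down rather than as a failure. The sketch below retries only on 429, waits for the number of seconds in a `Retry-After` header if the response carries one (an assumption; check the linked documentation for what the API actually sends), and otherwise backs off exponentially. `do_request` is a hypothetical callable that you supply; it performs one HTTP call and returns `(status, headers, body)`.

```python
import time


def call_with_backoff(do_request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call do_request() until it returns something other than HTTP 429.

    do_request: callable returning (status_code, headers_dict, body)
    sleep:      injectable so the waiting can be replaced in tests
    """
    for attempt in range(max_attempts):
        status, headers, body = do_request()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # out of attempts, do not wait again
        try:
            # Use the server's hint when it is a plain number of seconds.
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # No usable hint: 1s, 2s, 4s, ... with the default base_delay.
            delay = base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```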
I am implementing a website on which recruited MTurk workers will perform tasks. I plan to recruit workers through MTurk tasks, which will redirect them to an external website for the actual work. I have the following questions about this plan.
Are there any foreseeable problems with this approach to running HITs? If so, how can we mitigate them?
How should I implement the authentication procedure on my external site? For example, how can I make sure the people who come to the website to perform a specific task are indeed the same people recruited earlier for this particular task on MTurk?
When the workers finish the task, how should I integrate the payment procedure with MTurk based on their performance? For example, if a worker is owed $3 after finishing the task on my external site, is it possible for me to tell MTurk to pay him/her this amount programmatically?
The external site will be built using Python, if such detail matters.
Any suggestions and comments based on your experiences and insights in using MTurk would be much appreciated!
I am thinking through this for a similar project of mine, and I've experimented as a worker myself. Here is my plan; I hope it is of use to you. (I have not implemented it yet. It is based on an academic HIT I participated in as a worker.) Here goes:
A. Create a template that has language something like:
1. Please open this web site in a new browser window:
http://your-url.xyz.blah/tasks/${token}
2. Read and follow the instructions there.
3. After completing the task, you will receive a confirmation code. Paste
it here: [________]
B. Create some random tokens for your Mechanical Turk data file:
1A1B43B327015141
09F49F2D47823E0C
B5C49A18B3DB56F4
4E93BB63B0938728
CCE7FA60BFEB3198
...
(Generate these tokens from your app; it needs to cross-reference them.)
C. Your app extracts the token from the URL, looks up the task, and does whatever it needs to do. I personally don't worry about people stumbling onto a URL, since it is a one-time use token.
D. After a user completes the task on the external web site, the external app gives a confirmation code. The confirmation code should be random and opaque. Only your application will know if any particular code corresponds to a correct or incorrect answer. In fact, if you want, the correctness may not even be determined in real time -- it could be the result of an aggregation and/or comparison across multiple submissions.
E. Write some code to interact programmatically. Take the token and confirmation code supplied from the MTurk result and make sure they match with your external app. If they don't match, reject the HIT. If they match, check the correctness in your external app and approve or reject. You might consider a bonus pay structure.
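Steps B through E can be sketched on the external-app side roughly as follows. This is a minimal in-memory illustration, not a tested implementation: the `TaskStore` class and its method names are hypothetical, a real app would keep this in a database, and the actual approve/reject call to Mechanical Turk is left out.

```python
import hmac
import secrets


class TaskStore:
    """Tracks one-time tokens and the confirmation codes issued for them."""

    def __init__(self):
        self._tasks = {}  # token -> confirmation code (None until completed)

    def new_token(self):
        """Step B: create a random 16-hex-digit token for the MTurk data file."""
        token = secrets.token_hex(8).upper()
        self._tasks[token] = None
        return token

    def complete(self, token):
        """Steps C/D: mark the task done and hand out an opaque code, once only."""
        if token not in self._tasks:
            raise KeyError("unknown token")
        if self._tasks[token] is not None:
            raise ValueError("token already used")
        code = secrets.token_hex(8).upper()
        self._tasks[token] = code
        return code

    def matches(self, token, submitted_code):
        """Step E: does the code pasted into the HIT belong to this token?"""
        expected = self._tasks.get(token)
        if expected is None:
            return False  # unknown token, or task never completed
        submitted = submitted_code.strip().upper()
        # Constant-time comparison; bytes avoid errors on non-ASCII input.
        return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```

The result of `matches` only says whether the token and code pair is genuine; whether the work itself was correct is a separate check, which can happen later as described in (D).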
So, to answer your particular questions:
I don't anticipate problems with the approach I described. That said, Mechanical Turk is both an art and a science. Perhaps more art. Writing good questions and paying Turkers appropriately is something you have to figure out with a combination of common sense, market research, and experimentation.
See (C) above. A token is designed to only be used once. Use long enough tokens and the probability of collision becomes very low.
See (E) above. The Mechanical Turk Developer Guide is a good place to start.
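To put a rough number on "long enough" for the tokens, the standard birthday approximation gives the chance that any two of n random tokens collide as about 1 - exp(-n(n-1)/(2N)), where N is the number of possible tokens. A small sketch, assuming the tokens are uniformly random (the function name is my own):

```python
import math


def collision_probability(num_tokens, bits):
    """Approximate chance that at least two of num_tokens random tokens match.

    bits: token length in bits (each hex digit contributes 4 bits)
    """
    space = 2 ** bits
    exponent = -num_tokens * (num_tokens - 1) / (2 * space)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)
```

For 10,000 tokens of 16 hex digits (64 bits), `collision_probability(10_000, 64)` comes out on the order of 10^-12, whereas 300 tokens of only 4 hex digits (16 bits) would collide roughly half the time.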
Please share your results back. Or have the Turkers send StackOverflow hundreds of postcards. :)
Notes:
I'm currently exploring qualification tests. I suspect they can be very useful.
I want to get a Turker's Worker ID in my external application, but I haven't figured that part out yet. I'm reading up on it; for example: Getting workerId by assignmentId
I am thinking about using the ExternalQuestion feature from the API: "... you can host the questions on your own web site using an "external" question. ... A HIT with an external question displays a web page from your web site in a frame in the Worker's web browser. Your web page displays a form for the Worker to fill out and submit. The Worker submits results using your form, and your form submits the results back to Mechanical Turk. Using your web site to display the form gives your web site control over how the question appears and how answers are collected."
You might also find PsiTurk to be useful: "PsiTurk is an open platform for conducting custom behavioral experiments on Amazon's Mechanical Turk. ... It is intended to provide most of the backend machinery necessary to run your experiment. It uses AMT's External Question HIT type, meaning that you can collect data using any website. As long as you can turn your experiment into a website, you can run it with PsiTurk!"
Just got LR 11 Vugen licence and tried TruClient, looks great and the firefox based script recording works really nice. However, I have not found answers to the following:
1)Is TruClient running limited the same way as QuickTest Pro virtual users scripts (1 user per OS)?
2)It is called Ajax TruClient, does it mean it supports only javascript based web pages or all (standard php/html) including javascript etc.?
Here are a few answers for ya:
1) TruClient is not limited like a GUI Vuser (WinRunner or now QTP) to a single GUI session on a Load Generator. You can run multiple AJAX TruClient Virtual Users on a single Load Generator and they will run "invisibly" like a virtual user. You might find that the driver is much heavier (takes more memory and CPU), so you can't run as many vusers as the Web HTTP/HTML vuser.
2) TruClient is not just for AJAX-based web pages - it can work on any web page that will render in a browser.
In addition to what Mark said, it's purely event driven: if a user clicks on a link, that is what gets rendered, consumed as a resource, and subsequently displayed. Traditional headless implementations do not work this way, but in return they use fewer system resources.
This is one of the main caveats with TruClient (from experience): depending on the complexity of your script or workflow, a single simulated user can take lots of resources, mainly memory in my case.
This is because for every Virtual user that gets emulated, an instance of Gecko Web Engine is being spawned, in order to replay the script, and this has its cost.
However, the level of realism reaches very close to typical user session and experience, as you can, for example, set the typewriting speed, decide whether to simulate caching mechanisms or not, make additional corrections of pattern and images recognition, etc.
Overall, a mostly positive experience, though it comes at a certain price. Talk to your HP sales rep (disclaimer: I don't work for HP, this is just my experience).
A little more ...
TC is a big win in some respects, as you can avoid a ton of nasty correlation. But it also has some downsides: the memory/CPU footprint can be huge, and the sync issues can be tricky.
Hi,
I would like to test the access time for my website (or a certain page, or query) WHEN there are 5000 concurrent connections. I want to test it for a high-traffic website.
Is it possible to simulate 5000 concurrent connections? If not, how do people test such a situation?
If this question can't be answered, what keyword should I use to start searching?
We used httperf for this before. This tool also gives you some metrics like throughput. There is a website here which has a bunch of open source performance tools listed, most of them related to web performance testing.
There are a few load testing packages out there. HP has a tool called LoadRunner; if you click on the datasheet, it has more information. There is also an open source tool called OpenSTA. I just found that with a Google search, so I can't tell you much about how that one works.
Finally, I've found a service that allows you to test:
up to 10,000 cc/sec for free.
up to 100,000 cc/sec for $100/month.
https://loader.io/
Disclaimer: I have no affiliation with this service. I'm posting it here in case it helps someone.
You could use something like JMeter. We use this for load testing. It allows you to simulate all sorts of user activity as test cases, run concurrent connections, submit forms, even perform logged-in actions.
The learning curve can be steep if what you need to do is complicated, but that's because it's so feature rich!
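For a rough first measurement before setting up one of the tools mentioned above, the basic idea (many requests in flight at once, each one timed) can be sketched with Python's standard library. This is a minimal illustration under my own naming, not a substitute for a real load tester: thread pools become impractical well before 5000 truly simultaneous connections from one machine, and `fetch` is a callable you supply.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure(fetch, concurrency, total_requests):
    """Call fetch() total_requests times with up to `concurrency` in flight.

    Returns latency statistics in seconds. Any exception raised by
    fetch() propagates to the caller.
    """
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")

    def timed(_):
        start = time.perf_counter()
        fetch()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(total_requests)))

    p95_index = min(len(latencies) - 1, int(len(latencies) * 0.95))
    return {
        "count": len(latencies),
        "median": latencies[len(latencies) // 2],
        "p95": latencies[p95_index],
        "max": latencies[-1],
    }
```

A `fetch` for a real page could be as simple as `lambda: urllib.request.urlopen(url, timeout=10).read()` (after `import urllib.request`), with `url` pointing at a site you are allowed to load test.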
I'm building a utility that will hopefully keep my wife in tune with how much money we have available.
I need a simple secure way of logging into my bank account and retrieving the balance.
Something like mechanize is the only method I can think of. I'm not even sure if that would work given the properly authenticated https that banks use.
Any ideas?
Write a perl script using LWP::UserAgent. It supports HTTPS connections. The only issue might be if the site requires javascript.
Web Client Programming with Perl has a few examples to get you started if you're not too familiar with perl.
If you really want to go there, get these extensions for Firefox: Live HTTP Headers, Firebug, FireCookie, and HttpFox. Also download cURL and a scripting language that can run cURL command-line tasks (or a scripting language like PHP or Perl that has access to cURL libraries directly).
I've started down this road for some idempotent GET tasks like getting PDFs of the S&P reports (of the stocks I track) from my online brokerage, and downloading the check images for my bank account. Both tasks are repetitive and slow ways of downloading data to my computer, and the financial institutions don't provide any easier way of doing them.
Here's why you shouldn't: (as a shortcut I'm going to call the archetypal large bank, brokerage, or other financial institution "BloatBank")
BloatBank is not likely to make public their API for accessing this kind of information. So it can change any time and all your hard work will be for naught. Whenever they change their mechanism, you'll have to adapt.
If BloatBank finds out you've been using automatic scripting to try to access your account information, they may ban you because you've violated their terms of service.
You might screw up, and the interaction between the hodgepodge of scripts on BloatBank's server, and your scripts that access your account, might cause a Bad Thing like closing your account. Testing this kind of script is tremendously difficult because you don't have any documentation about how their online service works, and you don't have a test account you can mess with.
(a variant of the above) You think you're safe because you're issuing GET requests. But BloatBank is just a crazy bank that doesn't know anything about REST, so there are some GET requests that can mess up your account.
If someone else does use your script to maliciously sniff your online password or mess with your account, any liability coverage from BloatBank may disappear because you've opened a security hole.
Why don't you teach your wife how to log in to the bank herself? Or use Quicken (or Mint, etc.) and teach her how to use the auto-download feature?
Have you checked out Watir? It is fantastic for automating web-browser actions. And since it's written in Ruby, you can take the results and store them in a DB (or email them to yourself) if needed.
If you are open to AIR, I'd say build an AIR app. I have worked with mechanize and I think it's cool. AIR gives you similar features with a richer GUI (see HTMLLoader and DOM manipulation of webpage).
If I were you, I'd simply pull the page and manipulate the DOM to suit my visual needs.
Please, if you find this easy to do for your bank please post your bank's name. If I have the same one I'll be closing my account.
More to your question: the process of loading a web page inside your code rather than in a browser can be a black art, especially if there is any JavaScript involved. Your best bet would probably be embedding the IE Web Browser control in your app and then simulating keystrokes and mouse clicks to arrive at your balance page. Then scrape the HTML for the balance.
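The final scraping step, once the balance page's HTML is in hand by whatever means, might look like the sketch below. It uses only Python's standard library and assumes, purely for illustration, that the balance sits as plain text inside an element with `id="balance"`; a real bank page will differ and can change without notice.

```python
import re
from decimal import Decimal
from html.parser import HTMLParser


class _TextById(HTMLParser):
    """Collects the text of the element with a given id.

    Assumes that element contains only text (no nested tags).
    """

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self._inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == self.element_id:
            self._inside = True

    def handle_endtag(self, tag):
        self._inside = False  # any closing tag ends the capture

    def handle_data(self, data):
        if self._inside:
            self.text += data


def extract_balance(html, element_id="balance"):
    """Return the balance as a Decimal, or None if no amount was found."""
    parser = _TextById(element_id)
    parser.feed(html)
    match = re.search(r"-?[\d,]+\.\d{2}", parser.text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))
```

Returning `None` instead of guessing matters here: if the page layout changes, it is better for the utility to show nothing than a wrong number.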
I could try paying for Quicken and letting it do the balance downloading. Then I'd just need to find a way to get the number out of the software automatically.
This way I'm not violating any terms of service and I'm also reducing security risk since all "hacking" goes on locally.