How do I calculate the number of concurrent users to use in a load test? - testing

We’ve come across this question fairly often at Load Impact, so I thought I’d add it to the Stack Overflow community to make it easier to find.
How do I calculate the number of concurrent users (VUs) that I need to simulate during a load test, in order to stress my system with the same kind of traffic that it will normally see in the course of a month, week or day?

Running a load test requires that you specify how many concurrent users should be simulated during testing. In other words, how many simulated users will be active, loading things or interacting with your site/app at the same time. Unfortunately, when looking at Google Analytics for example, we only see how many visits a website has per day or per month. A site can have a million visits per month, but still only ever experience max 100 concurrent visitors.
To convert the "visits per X" metric from Google Analytics, or some other analytics system, into a "concurrent users" metric that you can use for load testing, you can use the following method.
First, find out two things:
You need the total number of visits for a short time period when your site/app is at peak traffic levels. This can easily be found via e.g. Google Analytics by seeing what the highest number of visits was for a single hour in the course of e.g. a month. Look at the day that has seen the highest number of visits, and drill down to see what hour of that day was the busiest and how many visits you had during that hour. Note this value down. I will call this value "peak_hourly_visits" in this text.
You need to know the average time a user spends interacting with your site/app. In Google Analytics this is called "Average session duration" and I will call it that in this text also, but sometimes it is called "Average time on site". If this value changes a lot for your site/app depending on which time period you look at you might want to use one of the larger values you find, to be on the safe side. We want all times in seconds, so if e.g. Google Analytics tells you "00:03:19" (3 minutes, 19 seconds) you should note down 199 as the average session duration.
When you have those two values you use this formula to calculate the number of concurrent users to use in your load test:
concurrent_users = (peak_hourly_visits * average_session_duration) / 3600
Provided that each simulated user (VU) in your load test behaves realistically (i.e. simulates a real user well), you will now be able to stress your site/app with the same kind of traffic that it normally only sees during peak traffic hours.
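As a rough sketch, the formula above can be written in Python. The function names and the figure of 2,000 peak hourly visits are assumptions made for illustration; only the "00:03:19" session duration comes from the text.

```python
import math


def session_duration_seconds(hms: str) -> int:
    """Convert an "HH:MM:SS" string, as shown in analytics tools, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def concurrent_users(peak_hourly_visits: int, average_session_duration: int) -> float:
    """concurrent_users = (peak_hourly_visits * average_session_duration) / 3600"""
    return peak_hourly_visits * average_session_duration / 3600


# Hypothetical peak hour with 2,000 visits and a 3 min 19 s average session.
duration = session_duration_seconds("00:03:19")  # 199 seconds
vus = math.ceil(concurrent_users(2000, duration))  # round up to whole users
print(vus)
```

Rounding up is a choice made here to stay on the safe side; the formula itself yields a fractional value.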

Related

Storing vast amounts of "uptime" data for a website monitoring service

This is more of a general discussion than a code question.
I have a website monitoring platform where users enter their website URL and we check it every X minutes, based on the customer's chosen interval. At each interval, an entry is stored as an UptimeCheck model in the Laravel 8 project, with a status of either down or up.
If a customer has 20 monitors, and each checks every minute, then over a 30-day period that one customer would accumulate roughly 864,000 rows (20 monitors × 1,440 checks per day × 30 days), which is close to a million.
My question is really: do I need to keep this number of rows?
The reason this number of rows is kept is so that we can present a graph showing the average website uptime.
My thinking is that if I created some kind of SVG programmatically for each day and stored it in the table, then I wouldn't need to store as many entries. My concern is how I would merge the SVG models into one to present a daily graph.
What kind of libraries could I use and how else might I approach this?
Unlike performance, the trick for storing uptime data is simple. You don't store it. ;)
You need to store DOWNTIME data instead. Register only unavailability events and extrapolate uptime when displaying reports.
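A minimal sketch of that idea in Python, assuming each downtime event is stored as a (start, end) pair and that events do not overlap. The function name and the sample data are hypothetical; in the Laravel project this would be a query over a downtime table instead of an in-memory list.

```python
from datetime import datetime


def uptime_percentage(downtimes, window_start, window_end):
    """Derive uptime for a reporting window from recorded downtime intervals.

    downtimes: iterable of (start, end) datetime pairs, assumed non-overlapping.
    """
    window_seconds = (window_end - window_start).total_seconds()
    down_seconds = 0.0
    for start, end in downtimes:
        # Count only the part of each outage that falls inside the window.
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            down_seconds += (overlap_end - overlap_start).total_seconds()
    return 100.0 * (window_seconds - down_seconds) / window_seconds


# Hypothetical day with two outages totalling 36 minutes.
day_start = datetime(2021, 3, 1)
day_end = datetime(2021, 3, 2)
outages = [
    (datetime(2021, 3, 1, 4, 0), datetime(2021, 3, 1, 4, 30)),
    (datetime(2021, 3, 1, 18, 0), datetime(2021, 3, 1, 18, 6)),
]
print(uptime_percentage(outages, day_start, day_end))
```

With this layout, a monitor that never goes down stores no rows at all, and the graph is computed at display time.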

How to increase Google Sheets v4 API quota limitations

The new Google Sheets API v4 currently has an unlimited read/write quota per day (which is fantastic), but it is restricted to 500 reads/writes per account per 100 seconds, and 100 reads/writes per key per 100 seconds (or, I have found, across multiple keys coming from the same IP). This is probably plenty for most use cases, but I have an edge case that requires bringing a frequently updated Google Sheet with 70 tabs down to a node.js server that distributes these to users' clients every ~30-60 seconds or so (users are data annotators who are student research assistants). This wasn't so bad early in the project when there were only 20-30 tabs, but now that the data is large the server is blowing through the 100 quota and returning errors every 10-15 minutes.
The problem is such that:
Frequent data updates: Only data on 1-5 of the 70 tabs is likely to be updated on any given minute, but which tabs have new data is random (so I am pulling down the whole sheet of 70 = 70 reads).
Update interval: The need for updates happens randomly at about 30 second to 5-minute intervals (so some within the quota, some about 3-5x the quota).
Throttling: I have tried throttling the update to be within the 100 calls/100 seconds (my previous solution), but this introduces large usability issues, significantly decreasing usability/productivity/work quality.
Quota increase: The sheets API does not currently appear to include a way to pay to increase the quota. It does allow filling out a form to request an increase in the quota, but I'm not sure what the mean response time is on this (my request is only a few days old).
Multiple service accounts: I have tried using multiple service accounts to get the full 500 requests/100 seconds quota (rather than the per-user quota), since this is a server, but Google Sheets appears to rate-limit to 100 requests/100 seconds from a given IP.
Alternatives: I have considered that this project may have just grown beyond the size that Sheets is easily able to handle, but there do not appear to be any good, usable, self-hosted, collaborative spreadsheets with easy-to-interface-to APIs out there.
Are there settings/methods suggested to achieve the full 500 calls/100 seconds for a server?
You can request a quota increase in Google Cloud Platform, and it will be increased to 2500 per account and 500 per user. (This relates to your point 4.)
You can use spreadsheets.get to read the entire spreadsheet in a single call, rather than one call per tab. Alternatively, you can use spreadsheets.values.batchGet to read multiple different ranges in a single call, if all you need are the values.
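To illustrate how batching collapses many reads into one request, here is a small Python sketch that only builds the REST URL for spreadsheets.values.batchGet. The helper name, the spreadsheet ID and the tab names are hypothetical, and authentication (API key or OAuth token) is deliberately left out.

```python
from urllib.parse import urlencode

SHEETS_BASE_URL = "https://sheets.googleapis.com/v4/spreadsheets"


def build_batch_get_url(spreadsheet_id: str, ranges: list[str]) -> str:
    """Build one values:batchGet URL covering every requested range.

    Each range is sent as a repeated "ranges" query parameter, so 70 tabs
    become a single HTTP request instead of 70.
    """
    query = urlencode({"ranges": ranges}, doseq=True)
    return f"{SHEETS_BASE_URL}/{spreadsheet_id}/values:batchGet?{query}"


# Hypothetical spreadsheet ID and tab names.
url = build_batch_get_url("abc", ["Tab1", "Tab2"])
print(url)
```

In the node.js server from the question, the official client library would issue the same call; the point of the sketch is only that all ranges travel in one request.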
The Drive API offers "push notifications", so you can get notified when changes occur and react to those, instead of polling for them. The latency of the notifications is a little on the slow side, but it gets the job done.

Statistical Analysis and Smoothing based on user ages

I have a set of users in my system, many of whom are young users with regard to the age of their account.
I have then looked at each user's events per day over the lifetime of the account (i.e. the event can occur on the 2nd day of the account existing, then again on the 10th, and so on, all the way up to the lifetime of the account).
What I am trying to do is look at the average occurrence of this event on a daily basis over all accounts, but I need to take into consideration that I actually have many more young users (i.e. users who have not been around a long time).
I have tried a couple of statistical tricks, but am not 100% sure the best method to go about doing this.
Any point in the right direction would help.

Finding an application's scalability point using JMeter

I am trying to find an application's scalability point using JMeter. I define the scalability point as "the minimum number of concurrent users from which any increase no longer increases the throughput per second".
I am using the following technique. I schedule my load test to run for an hour, starting a new thread that sends SOAP/XML-RPC requests every 30 seconds. I do this by setting my number of threads to 120 and my ramp-up period to 3600 seconds.
I then look at the Throughput value of the TOTAL row in my Summary Report listener. A new row (thread) is added every 30 seconds, and the total throughput rises until it plateaus at about 123 requests per second after 80 of the threads are active, in my case. The throughput then slowly drops to 120 per second as the last 20 threads are added. I then conclude that my application's scalability point is 123 requests per second with 80 active users.
My question: is this a valid way to find an application's scalability point, or is there a different technique that I should be trying?
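The arithmetic behind this procedure can be sketched in Python. The function names are hypothetical, and the sample list is made up apart from the three figures quoted in the question (123 requests per second at 80 threads, 120 at 120 threads, one thread per 30 seconds).

```python
def ramp_up_interval(threads: int, ramp_up_seconds: float) -> float:
    """Seconds between thread starts when JMeter spreads them evenly over the ramp-up."""
    return ramp_up_seconds / threads


def scalability_point(samples):
    """Return (users, throughput) for the smallest user count reaching peak throughput.

    samples: list of (active_users, requests_per_second), sorted by active_users.
    """
    best_users, best_throughput = samples[0]
    for users, throughput in samples[1:]:
        # Strict comparison keeps the earliest sample on a plateau.
        if throughput > best_throughput:
            best_users, best_throughput = users, throughput
    return best_users, best_throughput


# Illustrative readings; intermediate values are invented for the example.
readings = [(20, 40), (40, 80), (60, 110), (80, 123), (100, 121), (120, 120)]
print(ramp_up_interval(120, 3600))
print(scalability_point(readings))
```

This only locates the peak in one recorded run; it says nothing about whether the requests themselves are representative.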
From a technical perspective what you're doing does answer your question regarding one specific user scenario, though I think you might be missing the big picture.
First of all, keep in mind that the actual HTTP requests you're sending and the ramp-up times can often affect what you call a scalability point. Are your requests hitting a cache? Are they not random enough? Are they too random? Do they represent real-world requests? Is 30 seconds going to give you the same results as 20 seconds or 10 seconds?
From my personal experience, it's MUCH easier and more intuitive to look at graphs when trying to analyze app performance. It's not just a question of raw numbers but also of looking at trends and rates of change.
For example, here is a test of the ghost.org blogging platform using JMeter, with an interactive JMeter results graph.
http://blazemeter.com/blog/ghost-performance-benchmark

Bloomberg API request limit

Is there any way to determine how many requests or how much data you have left in your request limit for the Bloomberg API?
From the Bloomberg Help Desk in April 2014 (this is valid for a basic desktop client):
We have 3 kinds of limits.
You can have no more than 3500 real-time fields open at the same time. If you exceed this limit you will see "NA Limit" as the error message, and you just need to delete some securities/fields in order for the error message to disappear and to see the values.
We also have a daily limit. The daily API limit is 500,000 hits per day. A "hit" is defined as one request for a single security/field pairing. Therefore, if you request static data for 5 fields and 10 securities, that will translate into a total of 50 hits. So try to refresh just the portion of the spreadsheet that really needs to be refreshed, and avoid refreshing it all or reopening it many times a day.
The last limit is a monthly limit. Our monthly limit comes from a proprietary model. Only about 0.4% of our user database ever goes over this limit. This limit is based on unique securities and depends on the type of data being downloaded. For example, some of the data on the system, such as intra-day, is valued a little bit higher than historical end of day for any given list of securities. We do not recommend more than 5000 to 7000 unique identifiers per month, and the limit upgrade will only allow you to get data to complete your project. Once a security is used once in a month, then if you use it again it will not count again towards the monthly limit.
We normally grant 2 resets per month in case you exceed your daily limit, and if you exceed your monthly limit we grant 1 extension per month (10% more). If you breach the limit again you will then need to wait until midnight for the daily limit to be reset automatically, or until the end of the month for the monthly limit to be reset.
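Since there is no programmatic way to query the remaining allowance, one option is to keep your own tally. The following Python sketch applies the "hit" definition quoted above (one hit per security/field pair). The function names are hypothetical, and it assumes every request counts in full, which may not match how Bloomberg actually weights different data types.

```python
DAILY_HIT_LIMIT = 500_000  # daily figure quoted by the help desk


def hits(num_securities: int, num_fields: int) -> int:
    """One hit per security/field pair in a single request."""
    return num_securities * num_fields


def remaining_daily_hits(requests, limit: int = DAILY_HIT_LIMIT) -> int:
    """Estimate what is left today after a list of (securities, fields) requests."""
    used = sum(hits(securities, fields) for securities, fields in requests)
    return limit - used


# The example from the quote: 5 fields for 10 securities is 50 hits.
print(hits(10, 5))
# Refreshing that same request twice is counted twice in this estimate.
print(remaining_daily_hits([(10, 5), (10, 5)]))
```

This is a local estimate only; the help desk remains the authority on how much of the limit has actually been consumed.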
Bloomberg do not state what the explicit limits are, and there is no programmatic way of finding out what the limits are or what proportion of your limits you have used.
The best information from Bloomberg that I have found is on the WAPI page (in the terminal). On the menus on the LHS, go to WAPI Home > API Resources > API Data Limits. There are two pages, 'Extended Rules and Usage Limits' and 'Managing Your API Data Limits' that shed some further light on the matter.
Broadly speaking... there is a daily limit of individual data requests (i.e. security/field pairs - but duplicates are counted for each request). However, your limit for subscriptions is based on the number of securities you are subscribing to concurrently - i.e. if you expect to be requesting the price of a security every 5 mins, you are much better off subscribing to that security's price. Then there is a monthly limit that is based on the number of unique securities that you are making requests for.
There is an upper limit on the Bloomberg API of 500,000 hits per day.
-- information from Bloomberg Help Help
The daily limit is clearly stated - it is the monthly limit that is not to my knowledge disclosed in writing. I have been told the following in the context of discussions about Data Licence, which is one Bloomberg product for bulk data subscription. The monthly limit is expressed as a budget in $, and it is the equivalent price for your requests, priced under the Data Licence schema, which clearly is not secret if you enquire about that product. So why the secrecy about the budget? The reason it is commercially sensitive is that this budget is many times the monthly cost of the Terminal Licence, so clearly if you (a) know what it is and (b) either have access via API to the budget spent (nope) OR write software to 'count the cost' (not hard), then you could pony up a couple of terminals and vastly reduce your Data Licence spend. Bloomberg naturally frown on this sort of activity because it represents an arbitrage opportunity in their pricing model and it is not really 'playing nice'. They likewise do not like if you hit 'the wrong kind of data' too often or the monthly limit at all often, and these activities may prompt them to investigate your business model to be sure you are in compliance with all the T&C of the Data Addendum. Out of courtesy to Bloomberg I am not posting that budget number here, but you should be able to get it from your salesperson and confirm the validity of what I have said, because it may change at any time as it is not part of any contract.
I don't believe this is possible programmatically, however if you speak to the Bloomberg helpdesk they will be able to tell you whether you are near the limit, and reset it for you if necessary. Obviously they will only do that a certain number of times. I have not managed to get a definitive answer as to what the limit is, but it's designed to be large enough that you would not hit it just running spreadsheets, which have a limit of 3500 Bloomberg real-time formulas.
If you feel the download limit is not breached but you still get the error message, you can run the following steps to solve the issue:
Close Excel completely.
From the Windows "Start" menu, select All Programs > Bloomberg > Stop API Process. A command prompt window appears.
Press <Enter> to close the window.
From the Windows Start menu, select All Programs > Bloomberg > API Environment Diagnostics.
Click the Start button.
When the test is complete, if there are any red errors, click the "Repair" button.
Re-open Excel and test a formula.
500,000 data points is the approximate daily limit; however, remember that different types of data use up varying amounts. It is not 1 for 1. Typically, requests for esoteric securities and fields will use up more data per request than, for example, PX_LAST for AAPL US. There are also different types of request, such as reference or historic, which will consume your limit differently.
If you are requesting intra-day realtime data, these fields are typically not charged to your usage limit. Rather you have limits on how many times the realtime 'pipe' can be opened.
Bloomberg are typically very helpful at resetting your monthly data usage limit should you exceed it on an ad hoc basis. This is not written company policy, but seems to be part of their customer care. If you are persistently breaching limits each month, they are likely to stop resetting your limits and try to move you to B-PIPE. Otherwise, in my experience, they are flexible.