Simultaneous queries in Solr - lucene

Hej,
I am deploying a Solr server containg more than 30m docs. Currently, I am testing the searching performance and the results are very dependant of the number of simultaneous queries I execute:
1 simultaneous query: 2516ms
2 simultaneous queries: 4250,4469 ms
3 simultaneous queries: 5781, 6219, 6219 ms
4 simultaneous queries: 6484, 7203, 7719, 7781 ms
...
Jetty threadpool is configured as default:
New class="org.mortbay.thread.BoundedThreadPool"
Set name="minThreads" 10
Set name="lowThreads" 50
Set name="maxThreads" 10000
I would like to know if there is any factor I can set for decreasing the impact of the simultaneous requests in response times.
Solrconfig is configured also as default but without cache for measuring worst cases and mergeFactor=5 (searching will be more requested than updating).
Thanks in advance

Why are you trying to do this with caching turned off? What exactly are you trying to measure?
You have effectively forced Solr (Lucene) to perform every search from the disk. What you are actually measuring is concurrency of Java itself combined with your OS and disk throughput. This has nothing to do with Jetty or Solr.
Caches are your friend. You really should be using them in any sort of a production capacity. In my opinion, you should be measuring your throughput under load while varying the caches to see what the tradeoff is between cache size and throughput.

Please check out this IBM Tutorial for Solr
I got a great help from this.
Hope you will find your answer. :-)

Related

Singlestore (MemSQL)

I have a Singlestore (previously MemSQL) cloud database set up.
My software is running in the background, constantly writing to a table.
When I try to query this table, it takes 10+ seconds. When the software is shut off, the query takes milliseconds.
What would be the reason for this? And is there anything that can be done to mitigate against this?
From a high level, cluster resources are much more utilized while the background software constantly writes to the table. The same resources that handle the constant writes are concurrently trying to serve the query, so it makes sense its faster when there is no writing.
A 'knob to turn' WRT database ingest performance is partition count - you can try creating a test DB w/ more partitions that the current DB (say 2x more). Then try querying from the test DB, both while the background software is running and while it is not - compare this to the DB w/ fewer partitions.
For general guidance on troubleshooting query performance, see this section of the docs: https://docs.singlestore.com/managed-service/en/query-data/query-procedures/troubleshooting-poorly-performing-queries.html
If you're an active customer, you can file a support ticket for the issue for some additional analysis of the backend workings

Multi-threaded performance testing MS SQL server DB

Let's assume the following situation:
I have a database server that uses 4 core CPU;
My machine has 2 core CPU;
Assume they are of equal speed in terms of GHZ;
Systems are connected over a network (two lines 200mb/s each);
Test tool that I use provides # of threads parameter and will issue commands in parallel to the server.
QUESTIONS:
How would you test parallel reads/writes via stored procedure? Please brainstorm as any advice is appreciated;
How can I prove that many threads are executing the queries on the server (or should I not pay attention to this as this servers and DB's responsibility)?
What controls how many threads are executed at any time primarily in case of SQL server? I checked the "server properties" > processors > # of processors and threads section - waht more should I check?
How can I check that my application truly executes on all my machine cores - in other words - uses real threads instead of virtual ones? Or should I pay attention only to the virtual ones?
Should I pay attention to the network bandwidth? Can it be a bottleneck (I dont' send any big data, only commands with variables).
1.) not sure perhaps someone else can answer
2.) SQL Sentry allows you to monitor your SQL activity (use the free trial and buy if you like it)
3.) Max Dop controls the number of processors & also the cost threshold will affect parrallelism
4.) Same as 2 perhaps, i'm not sure i understand
5.) Depends on what you are doing are where you see aproblem SQL sentry will show wait stats that may help

How to circumvent BigQuery's 20 concurrent queries limitation?

Wondering if anyone knows... or have ran into this.. there's a 20 concurrent queries limitation for BigQuery.
https://developers.google.com/bigquery/quota-policy#queries
Is there a way to disable the limit? Our MapReduce tasks needs many concurrent queries in order to complete within a reasonable amount of time.
We have a similar problem. There is no way how to change this from your side. Also "upgrading your plan" as #Dominik suggest won't help.
You have to contact Google directly, explain your problem (business case) and if it is valid they can increase your quota limits (for certain Google Cloud project)

What is "Excessive resource usage" in SQL Azure?

I searched online for awhile about what is "Excessive resource usage" on SQL Azure, still cannot get an idea.
Some articles suggest query takes too long, too much memory etc will cause "Excessive resource usage". But If I use simple query, simple data structure, what will happen?
For example: I get a 1G SQL Azure as session state. Since session is a very small string, and save/delete all the time, I don't think it will grow to 1G for millions of session simultaneously. You can calculate, for 1 million session, 20 char each, only take 20M space, consider 20 minutes expire etc. Cannot even close to 1G. But the queries, should be lots and lots. Each query will be very simple and fast by index.
I wanna know, if this use will be consider as "Excessive resource usage"? Is there any hard number to limit you on the usage?
Btw, as example above, if all happen in same datacenter, so all cost is 1G database which is $10 a month, right?
Unfortunately the answer is 'it depends'. I think that probably the best reference (with guidance) on the SQL Azure Query Throttle is here: TechNet Article on SQL Azure Perormance This will povide details about the metrics that are monitored and the mechanism of the throttle.
The reason that I say it depends is that the throttle is non-deterministic for any given user. This is because the throttle will be activated based on the total load on the node (physical SQL Server in Azure DC). While the subscribers who will get throttled are the subscribers delivering the greatest load the level at which the throttle kicks in will depend on the total load on the node. SO if you are on a quiet node (where other tenant DBs are relatively inactive) then you will be able to put through a bunch more throughput than if you are on a busy node.
It is very appealing to use 1GB SQL Azure DBs for session state storage; you've identified the cost benefits. You are taking a risk though. One way to mitigate this risk is to partition across at least two SQL Azure 1GB DBs and adjust the load yourself based on whether one of the DBs starts hitting the throttle.
Another option if you want determinism for throughput is to use the WIndows Azure Cache to back your sesion state store. The Cache has hard pre-defined limits for query throughput so you can plan for it more easily Azure Caching FAQ including Limits. The Cache approach is probably a bit more expensive but with a lower risk of problems.

How to control number of running worker processes for MongoDB?

Well, as the question simply explains itself, let me clear it up little more.
I am running MongoDB primarily for read-only purposes at back-end. My crons do the writes and they don't really push it hard when they are triggered. Some updates, some new documents etc.
The thing is requests usually do not even hit the application level because of entire page caching handled within MemCached by Nginx. So the application doesn't query database for another hour per page.
But so far as I can see in my process list, there are 21 MongoDB worker processes that are using none of the CPU but reasonably large amount of memory because of the previous queries.
I checked the configuration settings and googled around but couldn't find any answer.. So, is there any way to limit those processes or at least to tell MongoDB reduce/empty its memory usage after a while?
Workers are using for talking to config server and other replica as well apart from just serving user request. This is documented in here.
we can limit net.maxIncomingConnections as par recommendation on this page to limit the number of workers processing user request. But this should be used with precaution as setting this number too low and then sending more concurrent calls will result in some calls being queued.