Improving Solr performance - lucene

I have deployed a 5-sharded infrastructure where:
shard1 has 3124422 docs
shard2 has 920414 docs
shard3 has 602772 docs
shard4 has 2083492 docs
shard5 has 11915639 docs
Indexes total size: 100GB
The OS is Linux x86_64 (Fedora release 8) with vMem equal to 7872420 and I run the server using Jetty (from Solr example download) with:
java -Xmx3024M -Dsolr.solr.home=multicore -jar start.jar
The response time for a single query is around 2-3 seconds. However, if I execute several queries at the same time, performance degrades immediately:
1 simultaneous query: 2516ms
2 simultaneous queries: 4250, 4469 ms
3 simultaneous queries: 5781, 6219, 6219 ms
4 simultaneous queries: 6484, 7203, 7719, 7781 ms...
Using JConsole to monitor the server's Java process, I checked that heap memory and CPU usage don't reach their upper limits, so the server shouldn't be overloaded. Can anyone suggest how I should tune the instance so that it is not so heavily dependent on the number of simultaneous queries?
Thanks in advance

You may want to consider creating slaves for each shard so that you can support more reads (see http://wiki.apache.org/solr/SolrReplication). However, the performance you're getting isn't reasonable to begin with.
With the response times you're seeing, it feels like your disk must be the bottleneck. It might be cheaper for you just to load up each shard with enough memory to hold the full index (20GB each?). You could look at disk access using the 'sar' utility from the sysstat package. If you're consistently getting over 30% disk utilization on any platter while searches are ongoing, that's a good sign that you need to add some memory and let the OS cache the index.
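If sar isn't installed, a rough equivalent of the utilization check can be sketched in a few lines. This is a minimal sketch, assuming a Linux host where /proc/diskstats is available and where the 13th whitespace-separated column of each line is the cumulative time in milliseconds the device spent doing I/O. The function names are made up for illustration, and the result only approximates what sar reports as %util.

```python
def io_busy_ms(diskstats_text: str) -> dict[str, int]:
    """Map device name -> cumulative milliseconds spent doing I/O.

    Expects the text of /proc/diskstats (assumption: column 13 of each
    line is the "time spent doing I/Os" counter, in milliseconds).
    """
    busy = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 13:
            busy[fields[2]] = int(fields[12])
    return busy


def utilization_percent(before: dict[str, int], after: dict[str, int],
                        interval_ms: float) -> dict[str, float]:
    """Percent of the sampling interval each device was busy with I/O."""
    return {
        dev: 100.0 * (after[dev] - before[dev]) / interval_ms
        for dev in after
        if dev in before
    }
```

To use it, read /proc/diskstats twice a few seconds apart while searches are running, pass each read through io_busy_ms, and give both results plus the elapsed milliseconds to utilization_percent. Devices that stay above roughly 30% are the ones to worry about.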
Has it been a while since you've run an optimize? Perhaps part of the long lookup times is a result of a heavily fragmented index spread all over the platter.
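The timings in the question are consistent with a single saturated resource. As a rough check (a sketch, not a proper load test), you can estimate throughput as concurrency divided by mean response time. The numbers below are the ones posted in the question; the function name is only for illustration.

```python
# Response times (ms) from the question, keyed by number of simultaneous queries.
timings_ms = {
    1: [2516],
    2: [4250, 4469],
    3: [5781, 6219, 6219],
    4: [6484, 7203, 7719, 7781],
}


def throughput_qps(concurrency, latencies_ms):
    """Approximate queries per second as concurrency / mean latency."""
    mean_s = sum(latencies_ms) / len(latencies_ms) / 1000.0
    return concurrency / mean_s


for n, latencies in timings_ms.items():
    mean_ms = sum(latencies) / len(latencies)
    print(f"{n} concurrent: mean {mean_ms:7.1f} ms, ~{throughput_qps(n, latencies):.2f} queries/s")
```

Going from 1 to 4 simultaneous queries only moves the estimate from about 0.40 to about 0.55 queries per second, so the extra queries are mostly waiting in line rather than running in parallel. With heap and CPU not maxed out, that points at disk.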

As I stated on the Solr mailing list, where you asked the same question 3 days ago, Solr/Lucene benefits tremendously from SSDs. While sharding across more machines or adding boatloads of RAM will work for I/O, the SSD option is comparatively cheap and extremely easy.
Buy an Intel X25 G2 ($409 at NewEgg for 160GB) or one of the new SandForce-based SSDs. Put your existing 100GB of indexes on it and see what happens. That's half a day's work, tops. If it bombs, scavenge the drive for your workstation. You'll be very happy with the performance boost it gives you.

Related

How to estimate the maximum number of reads and writes per second a RDBMS server can handle?

Before spinning up an actual (MySQL, Postgres, etc) database, are there ways to estimate how many reads & writes per second the database can handle?
I'm assuming this is dependent on the CPU and memory (+ network if we're sharding), but is there a good best practice on how to put these variables together?
This is useful for estimating cost and understanding how much of a traffic spike the db can handle.
You can learn from others to gauge transactions per second you'll get from certain instances. For example, https://aiven.io/blog/postgresql-12-gcp-aws-performance gives you a good idea of how PostgreSQL 12 performs.
Percona has blogged about performance benchmarks also: https://www.percona.com/blog/2017/01/06/millions-queries-per-second-postgresql-and-mysql-peaceful-battle-at-modern-demanding-workloads/
Here's another benchmark with useful information: http://dimitrik.free.fr/blog/posts/mysql-performance-80-and-sysbench-oltp_rw-updatenokey.html about MySQL 8.0 and links to 5.7 performance.
There are several blogs about SQL Server performance such as https://storagehub.vmware.com/t/microsoft-sql-server-2017-database-on-vmware-vsan-tm-6-7-using-vmware-cloud-foundation-tm/performance-test-results/ that can also help you recognize the workloads these databases can handle.
Under 10K tps shouldn't be much of a problem with modern hardware. You can start with one of the most common configurations on the cloud or a standard-sized server in your own environment. Use SSDs. Optimize your server settings to gain more speed and be ready to add more resources gradually. As Gordon mentions, benchmark your database after you have installed it. As a rule of thumb, I'd start with 32GB of memory, 8 cores and SSDs to pull 10K tps, and adjust from there.
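Benchmarking after install can start very small. Below is a minimal sketch of a harness that runs one transaction in a loop and reports transactions per second. Everything in it is an assumption for illustration: sqlite3 in memory stands in for the real server, and the accounts table and read_write_transaction are made up. The figure it prints says nothing about MySQL or Postgres; swap in a connection to the database you actually installed and a transaction that looks like your workload.

```python
import sqlite3
import time


def measure_tps(run_transaction, duration_s=1.0):
    """Call run_transaction(i) repeatedly for duration_s seconds; return transactions/second."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        run_transaction(count)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed


# Stand-in database (assumption): replace with a connection to your real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, 0)", [(i,) for i in range(1000)])
conn.commit()


def read_write_transaction(i):
    """One small read-then-update transaction, committed on its own."""
    key = i % 1000
    conn.execute("SELECT balance FROM accounts WHERE id = ?", (key,)).fetchone()
    conn.execute("UPDATE accounts SET balance = balance + 1 WHERE id = ?", (key,))
    conn.commit()


print(f"{measure_tps(read_write_transaction, duration_s=1.0):.0f} tps")
```

For numbers you can compare with the published benchmarks, a dedicated tool such as sysbench or pgbench run from several client threads is a better fit than a single-threaded loop like this one.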
As you assumed, a lot depends on the number and type of CPUs, memory and SSDs, your workload, how you structure data, latency between your app and database, reporting happening against the database, master/slave configuration, types of transactions, storage engines, etc.

Are there any in-memory (persistent) solutions faster than Aerospike for a single-node?

I am working on a cloud application that requires low latency and a very high number of reads/writes per second. I will only have around 1 million records stored persistently, but this may fluctuate widely as the application runs.
After benchmarking Aerospike and Redis with YCSB, I found that Aerospike beats both Redis and MongoDB in single-node performance for a 60/40 read/write mix.
Some points to note:
Fetching all my data using a single 32-bit integer key (no advanced queries)
Running on a single machine with 8 GB RAM and an SSD (small number of records)
Multiple clients need access to the database at once (via LAN)
I'm also assuming that key-value stores will outperform document stores and are the best fit considering I do not need advanced queries.
Before committing myself to Aerospike, are there any other solutions which may be more fit for my scenario considering that I am only running a single node with a small-ish amount of records?
Not that I'm aware of. I think Aerospike is the fastest.
However, for some use cases you can consider Tarantool.
Here's one of the benchmarks: https://medium.com/@rvncerr/tarantool-vs-competitors-racing-in-microsoft-azure-ebde9c5d619

Optimizing write performance of a 3 Node 8 Core/16G Cassandra cluster

We have set up a 3-node performance cluster with 16GB of RAM and 8 cores per node. Our use case is to write 1 million rows to a single table with 101 columns, which currently takes 57-58 minutes. What should be our first steps towards optimizing the write performance on our cluster?
The first thing I would do is look at the application that is performing the writes:
What language is the application written in and what driver is it using? Some drivers offer better inherent performance than others, e.g. the Python, Ruby, and Node.js drivers may only make use of one thread, so running multiple instances of your application (1 per core) may be something to consider. Your question is tagged 'spark-cassandra-connector', which possibly indicates that you are using that. It uses the DataStax Java driver, which should perform well as a single instance.
Are your writes asynchronous or are you writing data one row at a time? How many writes does it execute concurrently? Too many concurrent writes could cause pressure in Cassandra, but too few concurrent writes could reduce throughput. If you are using the Spark connector, are you using saveToCassandra/saveAsCassandraTable or something else?
Are you using batching? If you are, how many rows are you inserting/updating per batch? Too many rows could put a lot of pressure on Cassandra. Additionally, are all of the inserts/updates within a batch going to the same partition? If they aren't, you should consider grouping them so that each batch targets a single partition.
Spark Connector specific: you can tune the write settings, like batch size, batch level (i.e. by partition or by replica set), write throughput in MB per core, etc. You can see all these settings here.
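If the application issues asynchronous writes itself, the "not too many, not too few" point about concurrency usually comes down to capping the number of writes in flight. Here is a minimal sketch of that idea using asyncio. It does not use any real Cassandra driver: fake_write is a hypothetical stand-in for a driver's asynchronous insert, and max_in_flight=8 is an arbitrary value you would tune by measuring.

```python
import asyncio


async def write_all(rows, write_row, max_in_flight=64):
    """Write every row with at most max_in_flight writes outstanding at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(row):
        async with gate:
            await write_row(row)

    await asyncio.gather(*(guarded(row) for row in rows))


# Hypothetical stand-in for a driver's asynchronous insert. It only records
# how many writes overlap so that the cap can be observed.
in_flight = 0
peak_in_flight = 0
written = 0


async def fake_write(row):
    global in_flight, peak_in_flight, written
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0)  # yield control, as a network round trip would
    in_flight -= 1
    written += 1


asyncio.run(write_all(range(1000), fake_write, max_in_flight=8))
print(written, peak_in_flight)
```

Raising the cap until throughput stops improving, or until the nodes start showing GC or disk pressure, is one way to find a reasonable setting.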
The second thing I would look at is the metrics on the Cassandra side, on each individual node.
What do the garbage collection metrics look like? You can enable GC logs by uncommenting lines in conf/cassandra-env.sh (as shown here); see also "Are Your Garbage Collection Logs Speaking to You?". You may need to tune your GC settings, although if you are using an 8GB heap the defaults are usually pretty good.
Do your CPU and disk utilization indicate that your systems are under heavy load? Your hardware or configuration could be constraining your capability (see "Selecting hardware for enterprise implementations").
Commands like nodetool cfhistograms and nodetool proxyhistograms will help you understand how long your requests are taking. Comparing proxyhistograms with cfhistograms (latencies in particular) could give you insight into any disparities between how long it takes to process a request and how long the mutation itself takes.

SQL Server query performance slows over time

I've seen this question asked in many ways all over the Internet, but despite implementing the abundance of advice (and some voodoo), I'm still struggling. I have a 100GB+ database that is constantly inserting and updating records in very large transactions (200+ statements per transaction). After a system restart, the performance is amazing (data is written to a large SATA III SSD connected via USB 3.0). The SQL Server instance is running on a VM under VMware Workstation. The host is set to hold the entire VM in memory. The VM itself has a paging cache of 5000 MB. The SQL Server user is set to 'hold pages in memory'. I have 5 GB of RAM allocated to the VM, and the max memory of the SQL Server instance is set to half a gigabyte.
I have played with every single one of these parameters to attempt to maintain consistent performance, but slowly and surely the performance degrades to the point where it begins to time out. Here's the kicker though: if I stop the application that's loading the database and then execute the stored proc in Management Studio, it runs like lightning, clearly indicating it's not an issue with the query, and probably nothing to do with memory management or paging. If I then restart the loader app, it still crawls. If I reboot the VM, however, the app once again runs like lightning...for a while...
Does anybody have any other suggestions based upon the symptoms presented?
Depending on how large your hot set is, 5GB of memory may simply be too little for a 100GB+ database.
Check your indices and query plans; we cannot help you without them. I bet you are missing some indices, which is the standard performance issue people have.
Otherwise, once you have done your homework, head over to dba.stackexchange.com and ask there.
In general, consider that 200 statements per transaction may simply indicate seriously sub-optimal programming. For example, you could bulk-load the data into a temp table and then merge it into the final one.
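To make the temp-table idea concrete, here is a minimal sketch of the pattern: load all incoming rows into a staging table in one call, then apply them to the final table with two set-based statements. It is only an illustration under assumptions: sqlite3 stands in for SQL Server, and the readings table and its columns are made up. On SQL Server you would more likely use a bulk insert into a #temp table followed by MERGE (or an UPDATE plus an INSERT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO readings VALUES (1, 'old'), (2, 'old')")

incoming = [(2, "new"), (3, "new"), (4, "new")]

with conn:  # one transaction for the whole load
    conn.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, value TEXT)")
    # Bulk-load step: one executemany call instead of one statement per row.
    conn.executemany("INSERT INTO staging VALUES (?, ?)", incoming)
    # Merge step 1: update rows that already exist in the final table.
    conn.execute(
        "UPDATE readings "
        "SET value = (SELECT s.value FROM staging AS s WHERE s.id = readings.id) "
        "WHERE id IN (SELECT id FROM staging)"
    )
    # Merge step 2: insert rows that are new.
    conn.execute(
        "INSERT INTO readings (id, value) "
        "SELECT id, value FROM staging "
        "WHERE id NOT IN (SELECT id FROM readings)"
    )

print(conn.execute("SELECT id, value FROM readings ORDER BY id").fetchall())
# prints [(1, 'old'), (2, 'new'), (3, 'new'), (4, 'new')]
```

The point is that the number of statements per transaction stays constant no matter how many rows arrive, instead of growing to 200+.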
Actually, I may have a working theory. What I did was add some logic to the app so that when it times out, it sits for two minutes and then tries again, and voila! Back to full speed. I rubber-ducked it with my co-worker and came up with the idea that my perceived SSD write speeds were actually the write speeds to the VMware host's virtual USB 3 buffer, and that the actual SSD write speeds were slower. I'm probably hitting the limit of the host's buffer size, and by forcing the app to wait 2 minutes, the host has a chance to dump its back-buffered data to the SSD. Elementary, Watson :)
If this approach also fails to be sustainable, I'll report in.
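For anyone wanting to try the same workaround, the wait-and-retry logic can be sketched roughly like this. It is a minimal sketch, not the code from the loader app: the function name is made up, Python's built-in TimeoutError stands in for whatever timeout exception your database driver raises, and max_attempts=5 is an arbitrary choice.

```python
import time


def run_with_cooldown(operation, cooldown_s=120.0, max_attempts=5, sleep=time.sleep):
    """Run operation(); on TimeoutError, wait cooldown_s seconds and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and surface the last timeout
            sleep(cooldown_s)
```

The sleep parameter exists only so the pause can be replaced in tests. In the app you would wrap the call that executes the stored procedure, for example run_with_cooldown(lambda: load_batch(batch)), where load_batch is your own function.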
Try executing this to determine your problem queries:
SELECT TOP 20
    qs.sql_handle,
    qs.execution_count,
    qs.total_worker_time AS Total_CPU,
    total_CPU_inSeconds = -- converted from microseconds
        qs.total_worker_time / 1000000,
    average_CPU_inSeconds = -- converted from microseconds
        (qs.total_worker_time / 1000000) / qs.execution_count,
    qs.total_elapsed_time,
    total_elapsed_time_inSeconds = -- converted from microseconds
        qs.total_elapsed_time / 1000000,
    st.text,
    qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_worker_time DESC
Then check the estimated and actual execution plans for the queries this command helps you pinpoint.
Source: "How do I find out what is hammering my SQL Server?" and the bottom of the page at http://technet.microsoft.com/en-us/magazine/2007.11.sqlquery.aspx
Beyond the excellent indexing suggestions already given, be sure to read up on parameter sniffing. That could be the cause of the problem.
SQL Server - parameter sniffing
http://www.sommarskog.se/query-plan-mysteries.html#compileps
As a result, you could have a bad query plan being re-used, or SQL Server's buffer could be getting full and writing pages out of memory to disk (maybe that's other allocated memory in your case).
You could run DBCC FREEPROCCACHE and DBCC FREESYSTEMCACHE ('ALL') to empty the caches and see if you get a performance boost.
You should give SQL Server more memory too: as much as you can while leaving room for other critical programs and the OS. You might have 5GB of RAM on the VM, but SQL Server is only getting to play with half a gigabyte, which seems REALLY small for what you're describing.
If those things don't move you in the right direction, install the SQL Management Data Warehouse so you can see exactly what is happening when your slowdown begins. Running it takes up additional memory, but you will give the DBAs more to go on.
In the end, what I did was a combination of two things: putting in logic to recover when timeouts occurred, and setting the VM's core count to reflect only the host's physical cores, not its logical cores. For example, the host has 2 hyper-threaded cores. When I set my VM to use 4 cores, it occasionally gets hung in some infinite loop, but when I set it to 2 cores, it runs without fail. Still, aberrant behavior like this is difficult to mitigate reliably.

SQL HW to performance ratio

I am seeking a way to find bottlenecks in SQL Server, and it seems that more than 32GB of RAM and more than 32 spindles on 8 cores are not enough. Are there any metrics, best practices or HW comparisons (e.g. transactions per sec)? Our daily closure takes hours and I want it in minutes, or in real time if possible. I was not able to merge more than 12k rows/sec. For now, I have had to split the traffic across more than one server, but is that a proper solution for a ~50GB database?
The merge is enclosed in a stored procedure and kept as simple as it can be: deduplicate input, insert new rows, update existing rows. I found that the more rows we put into a single merge, the more rows per sec we get. The application server runs multiple threads and uses all the memory and processor on its dedicated server.
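For reference, the client-side part of this (deduplicating the input and handing it to the merge in large batches) looks roughly like the sketch below. It is simplified and the names are only illustrative, not the actual application code: rows are tuples whose first element is the key, the last row seen for a key wins, and the batch size is something to tune by measurement.

```python
from itertools import islice


def deduplicate(rows, key):
    """Keep the last row seen for each key, in order of each key's first appearance."""
    latest = {}
    for row in rows:
        latest[key(row)] = row
    return list(latest.values())


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


rows = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
for batch in batches(deduplicate(rows, key=lambda r: r[0]), size=2):
    print(batch)  # each batch would be passed to the merge procedure in one call
```

With larger batch sizes there are fewer calls to the stored procedure for the same number of rows, which matches the observation that bigger merges give more rows per second.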
Follow a methodology like Waits and Queues to identify the bottlenecks. That's exactly what it is designed for. Once you have identified the bottleneck, you can also judge whether it is a hardware provisioning and calibration issue (and if so, which hardware is the bottleneck), or whether it is something else.
The basic idea is to avoid having to do random access to a disk, both reading and writing. Without doing any analysis, a 50GB database needs at least 50GB of RAM. Then you have to make sure indexes are on a separate spindle from the data and the transaction logs, you write as late as possible, and critical tables are split over multiple spindles. Are you doing all that?