IndexWriter takes a long time on close/commit and flush - Lucene

I am using an NFS share for Lucene indexes, where close/commit and flush take a very long time. Is there a way to make them run a bit faster?

Related

Lucene.Net 3.0.5 - Near Real Time Search, reopening reader performance

I'm indexing a sequence of documents with the IndexWriter and committing the changes at the end of the iteration.
In order to access the uncommitted changes I'm using NRTS (near-real-time search) as described here.
Imagine that I'm indexing 1000 documents and iterating through them to check whether there are any I can reuse/update (some specific requirements I have).
I'm reopening the reader at each iteration:
using (var indexReader = writer.GetReader())
using (var searcher = new IndexSearcher(indexReader))
How slow should reopening the reader be? Once the index gets to around 300K documents, indexing 1000 documents can occasionally take around 60 seconds (and the documents don't contain much text).
Am I taking the wrong approach? Please advise.
To improve your performance, you need to not optimize so often.
I use a separate timer for optimization. Every 40 minutes it enables optimization down to five segments (a good value according to "Lucene in Action"), which then occurs if the indexer is running (there's no need to optimize if the indexer is shut down). Then, once a day, it enables optimization down to one segment at a very low-usage time of day. I usually see about 5 minutes for the one-segment optimization. Feel free to borrow my strategy, but in any case, don't optimize so often: your optimization is hurting your overall index rate, especially given that your documents are small, so the 1000-document iteration loop must be happening frequently.
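If it helps, here is a minimal sketch of that kind of scheduling against Lucene.Net 3.x. The class name, the 40-minute interval and the five-segment target are just the values from my setup, so treat them as placeholders:

using System;
using System.Threading;
using Lucene.Net.Index;

// Hypothetical scheduler: the timer only flags that an optimize is due;
// the indexing loop performs it between batches, so it never competes
// with an in-flight AddDocument call.
public class OptimizeScheduler
{
    private readonly IndexWriter _writer;
    private readonly Timer _timer;
    private volatile bool _optimizePending;

    public OptimizeScheduler(IndexWriter writer)
    {
        _writer = writer;
        _timer = new Timer(_ => _optimizePending = true, null,
                           TimeSpan.FromMinutes(40), TimeSpan.FromMinutes(40));
    }

    // Call this from the indexing loop between batches.
    public void OptimizeIfDue()
    {
        if (!_optimizePending) return;
        _optimizePending = false;
        _writer.Optimize(5); // partial optimize: merge down to five segments
    }
}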
You could also put in some temporary logging code at the various stages to see where your indexer is spending its time so you can tweak iteration size, settling time between loops (if you're paranoid like me), optimization frequency, etc.
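For the logging piece, something as simple as a Stopwatch around each stage is usually enough. This is just a sketch; the batch source and how you surface the numbers are up to you:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class IndexTiming
{
    // Hypothetical helper: times the add and commit stages of one batch
    // so the slow step shows up in the console or log.
    public static void IndexBatchWithTiming(IndexWriter writer, IEnumerable<Document> batch)
    {
        var addTime = TimeSpan.Zero;
        foreach (var doc in batch)
        {
            var sw = Stopwatch.StartNew();
            writer.AddDocument(doc);
            addTime += sw.Elapsed;
        }

        var commitWatch = Stopwatch.StartNew();
        writer.Commit();

        Console.WriteLine("AddDocument total: {0}, Commit: {1}", addTime, commitWatch.Elapsed);
    }
}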

Partial index creation maxes out the CPU

I am trying to create an index on a table with a couple of million entries. Unfortunately, whenever I try, CPU utilization climbs and I have to kill the operation at around 90% CPU, because otherwise it would harm production.
What can I do to create the index, then? It's a partial index. I have already set maintenance_work_mem to 2GB. I can't really change checkpoint_segments while the database is running, and CREATE INDEX CONCURRENTLY would just take the database down more quickly.
So what else could I do?
Index creation certainly can't hit 90% CPU on a modern multicore system (mainly because it uses just one core). What's more likely is that you're blocking all the queries against the table. Please try building the index CONCURRENTLY (see the manual).
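For example (the table, column and predicate below are made up; substitute your own partial-index definition). Keep in mind that CONCURRENTLY cannot run inside a transaction block, and if the build fails it leaves an invalid index behind that you have to drop manually:

-- Builds the partial index without the write-blocking lock that a plain
-- CREATE INDEX holds for the whole build, so inserts/updates/deletes continue.
CREATE INDEX CONCURRENTLY idx_orders_pending
    ON orders (customer_id)
    WHERE status = 'pending';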

Why does autovacuum: VACUUM ANALYZE (to prevent wraparound) run?

I have an autovacuum VACUUM ANALYZE query running on a table, and it always takes many hours, even a couple of days, to finish. I know Postgres runs autovacuum jobs occasionally to perform cleanup and maintenance tasks, and that they're necessary. However, most tables simply get a VACUUM, not a VACUUM ANALYZE.
Why does this specific table require a vacuum analyze, and how can I resolve the issue of it taking so long?
On a separate note, I did not notice this vacuum analyze query running until a few days ago. That was when I was attempting to create an index, and it failed prematurely saying it ran out of open files (or something like that). Could this contribute to the vacuum analyze running for so long?
Upgrading from PG 9.1 to PG 9.5 forced a situation where a number of tables reached their XID freeze limit. As a result, the system is now running autovacuum processes on a number of tables, many of them flagged '(to prevent wraparound)'. This has been a very busy database up to this point, so I am not surprised.
Since I can't force autovacuum not to carry this out, and since it would be a bad idea anyway, I reconfigured the otherwise idle database to run autovacuum at a high rate of activity so that it will (hopefully) complete faster and we can get back to business.
I set the following temporarily in my postgresql.conf and it seems to be working quite well; it really gets the I/O cranking. I am leaving out the additional settings that tune WAL size and transaction behavior, since those are highly system dependent:
# TEMPORARY -- aggressive autovacuum
autovacuum_max_workers = 16 # max number of autovacuum subprocesses
autovacuum_vacuum_cost_delay = 4ms # default vacuum cost delay for
# autovacuum, in milliseconds;
autovacuum_vacuum_cost_limit = 10000 # default vacuum cost limit for autovacuum
I stop and start the db server and then monitor the transactions occurring using a shell call like so:
watch -d -n 300 psql -c "select query from pg_stat_activity;"
I think the VACUUM ANALYZE is a red herring. The table came due for both a VACUUM and an ANALYZE at the same time, so it is doing a VACUUM ANALYZE, but I really doubt that the ANALYZE is contributing to the problem at all.
I wonder whether the "VACUUM (to prevent wraparound)" is ever finishing, or whether it is getting interrupted partway through and therefore restarting without ever making real progress. A careful inspection of your log files should help clarify this (as well as exactly what that error about running out of open files was about).
Also, based on the size of the table and your settings for cost-based vacuuming, you should be able to estimate how long the vacuum should take and compare that to how long it is actually taking.
Also, the transaction throughput on your system is very relevant to wrap-around issues. Wraparound vacuums should be very rare, unless your database is extraordinarily active.
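If you want to see which tables are closest to the freeze limit (and roughly how big they are, which is what drives how long the wraparound vacuum takes), a catalog query along these lines works on 9.5:

-- Tables with the oldest transaction IDs; autovacuum forces a wraparound
-- vacuum once age(relfrozenxid) passes autovacuum_freeze_max_age.
SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM   pg_class c
WHERE  c.relkind = 'r'
ORDER  BY age(c.relfrozenxid) DESC
LIMIT  10;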

Same SQL on Local and Production, Different Run Times

I have the same SQL query on my local and production servers. When I test the SQL locally it takes about 2 seconds to run, and when I run the same thing on the production server it takes about 7 seconds.
Why so much difference?
The primary factor responsible for variation in SQL response time (especially when running the same query a few times in a row) is caching. In fact, several caching effects may be at play at the same time:
Code caching (the next time you issue the same query you won't have to do the hard parse, which saves time and resources)
Data caching:
a) first of all, database-level caching (use of the buffer cache)
b) but also OS-level caching
c) or even hardware-level caching
You can determine what exactly is going on by enabling autotrace and analyzing its output. If you see a lot of recursive calls the first time and none (or far fewer) subsequently, that points to code caching (cursor sharing saves you from hard-parsing every time). If you see a lot of physical reads the first time but far fewer subsequently, then it's the database buffer cache at play. If the number of physical reads stays the same but the elapsed time changes, then it could be lower-level data caching (OS or hardware).
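Assuming an Oracle environment (which is where autotrace, hard parses and the buffer cache terminology come from), the quickest way to see this in SQL*Plus is to run the statement twice with statistics on. The query below is only a stand-in for yours:

SET AUTOTRACE TRACEONLY STATISTICS

-- first run: expect recursive calls (hard parse) and physical reads
SELECT COUNT(*) FROM orders WHERE order_date > SYSDATE - 1;

-- second run: recursive calls and physical reads should drop if caching is at work
SELECT COUNT(*) FROM orders WHERE order_date > SYSDATE - 1;

SET AUTOTRACE OFF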
There are, of course, other factors that may affect elapsed time -- such as database workload -- but if you are observing this over a short period of time, then it's probably not them.
There are two more things you could try to fix this: have the procedure read from pre-populated cache tables, which will make it faster, or simply add the appropriate indexes.

Lucene experts: how best to run diagnostics against an IndexWriter to resolve performance issues?

I've got an index that currently occupies about 1GB of space and holds about 2.5 million documents. The index is stored on a solid-state drive for speed. I'm adding 2500 documents at a time and committing after each batch has been added. The index is a "live" index and needs to be kept up to date throughout the day and night, so keeping write times to a minimum is very important. I'm using a merge factor of 10 and am never calling Optimize(), instead letting the index merge segments as needed based on the merge factor.
I need to commit the documents after each batch has been added because I record this fact so that if the app crashes or restarts, it can pick up where it left off. If I didn't commit, the stored state would be inconsistent with what's in the index. I'm assuming my additions, deletions and updates are lost if the writer is destroyed without committing.
Anyway, I've noticed that after an arbitrary period of time, which could be anywhere from two minutes to two hours, and some variable number of previous commits, the indexer seems to stall on the IndexWriter.AddDocument(doc) method, and I can't for the life of me figure out why it's stalling or how to fix it. The block can stay in place for upwards of two hours, which seems strange for an index taking up less than 2GB, in the low millions of documents, with an SSD to work with.
What could cause AddDocument to block? Are there any Lucene diagnostic utilities that could help me? What else could I look for to track down the problem?
You can use IndexWriter.SetInfoStream() to redirect diagnostics output to a stream that might give you a hint of what's wrong.
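A rough sketch of wiring that up in Lucene.Net 3.x follows; the paths and field names are invented, and the exact parameter type SetInfoStream expects (StreamWriter vs. TextWriter) varies slightly between versions, so check yours:

using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

class InfoStreamExample
{
    static void Main()
    {
        var dir = FSDirectory.Open(new DirectoryInfo(@"C:\indexes\live"));
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        using (var writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        using (var infoLog = new StreamWriter("indexwriter-info.log") { AutoFlush = true })
        {
            // Lucene logs its flush/merge activity to this stream, which is
            // usually enough to see what AddDocument is waiting on.
            writer.SetInfoStream(infoLog);

            var doc = new Document();
            doc.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("body", "sample text", Field.Store.NO, Field.Index.ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();
        }
    }
}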