I have 4 Lucene indexes with Hibernate Search, each containing 2 million documents. Recently I needed to add @Facet fields, but rebuilding the whole index takes too long.
No, you have to rebuild the index.
The process of rebuilding the index does indeed take some time, but with some tuning you can speed it up significantly.
Since there are many situations in which you'll need to rebuild the index, it's worth spending a bit of time investigating how to make it fast enough to be acceptable: you will also need this in case of disaster recovery.
I have an Azure SQL Database that has proved pretty successful so far. It's about 20 months old, no maintenance done... but it has handled a lot. Some tables have millions of rows, and when querying on columns that are indexed, query response times are acceptable when using the web application that talks to it.
However, I read conflicting advice on rebuilding indexes.
This guy says there is no point in doing it: http://beyondrelational.com/modules/2/blogs/76/posts/15290/index-fragmentation-in-sql-azure.aspx
This guy says to go ahead and rebuild:
https://alexandrebrisebois.wordpress.com/2013/02/06/dont-forget-about-index-maintenance-on-windows-azure-sql-database/
I have run some rebuild index statements on some of the smaller tables, which store a few thousand rows. Some of the fragmentation would drop by about half... then if I run it a second time, it might go down by about
These rebuilds ran in about 2-10 seconds, depending on the size of the table...
Then I rebuilt the indexes on a table that has about 2 million rows; their fragmentation was:
PK__tmp_ms_x__CDEC17C03A4CDB46 55.2335782060565
PK__tmp_ms_x__CDEC17C03A4CDB46 0
IX_this_is_my_fk_index 15.7538877620014
It took 33 minutes.
The result was
PK__tmp_ms_x__CDEC17C03A4CDB46 0.01
PK__tmp_ms_x__CDEC17C03A4CDB46 0
IX_this_is_my_fk_index 0
Questions:
Query speeds have not really changed since doing the above. Is this normal?
Given that there are many things I have no control over in SQL Azure, does it even make sense to rebuild indexes?
BTW: I am not and never have been a DBA... just a developer
Rebuilding indexes will matter if the indexes are actually being used. If an index isn't used by the query you're running, you won't see a difference. If it's only lightly used, you'll see a minor difference if you update statistics. If it's heavily used, you should see a good performance increase most of the time. The other thing to note with Microsoft SQL Server is that index fragmentation is sometimes irrelevant. When I'm deciding whether or not to rebuild an index, I look at the page count combined with the fragmentation. If I'm having performance issues with a query that uses the index, and the index has more than 16,000 pages and is more than 50% fragmented, I'll rebuild it. If the table is small, or if I can use the ONLINE option, I'll just go ahead and rebuild all of them at the same time.
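For reference, the page count and fragmentation figures mentioned above can be checked with `sys.dm_db_index_physical_stats`. This is a generic diagnostic query, not taken from the answer itself; the 1000-page floor is just a common rule of thumb for ignoring noise:

```sql
-- List every index in the current database with its fragmentation and size.
-- 'LIMITED' scans only the parent-level pages, which is cheap enough to run anytime.
SELECT OBJECT_NAME(ips.object_id)       AS table_name,
       i.name                           AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.page_count > 1000   -- fragmentation on tiny indexes is usually irrelevant
ORDER BY ips.avg_fragmentation_in_percent DESC;
```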
Specifically for Azure, my opinion is that if you are trying to improve performance, it's still a good step to take because it's so easy, even if you can't be sure of the results. Whether or not it's a shared service, and whether or not you can control the hardware layer, reviewing index fragmentation and rebuilding indexes are things you have access to, so why not make use of them?
So I guess the short answer is yes, in certain situations.
What I would suggest, rather than manually reviewing and rebuilding indexes, is to set up a nightly or weekly job that runs when your database is least active. Have it go through all the tables and rebuild their indexes. You can also give it a fixed running time if you have lots of tables, and make it "stateful" (you can use a table to retain progress info) so it remembers where it left off and resumes at the next scheduled run.
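A minimal sketch of what such a job body might look like. The 5%/30% reorganize/rebuild thresholds and the 1000-page floor are common rules of thumb, not part of the answer above, and the statefulness/time-limit logic is left out:

```sql
-- Build and execute an ALTER INDEX statement for each fragmented index.
DECLARE @sql nvarchar(max);
DECLARE cur CURSOR FOR
    SELECT N'ALTER INDEX ' + QUOTENAME(i.name)
         + N' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(ips.object_id))
         + N'.' + QUOTENAME(OBJECT_NAME(ips.object_id))
         + CASE WHEN ips.avg_fragmentation_in_percent > 30
                THEN N' REBUILD'      -- heavy fragmentation: full rebuild
                ELSE N' REORGANIZE'   -- light fragmentation: cheaper, always online
           END
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 5
      AND ips.page_count > 1000
      AND i.name IS NOT NULL;         -- skip heaps
OPEN cur;
FETCH NEXT FROM cur INTO @sql;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC sp_executesql @sql;
    FETCH NEXT FROM cur INTO @sql;
END
CLOSE cur;
DEALLOCATE cur;
```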
I have a table that is recreated each night, with about 10 columns and 1,000,000 rows. The data is completely deleted and re-inserted.
I have full-text search turned on for this table, and I rebuild the full-text index each night.
Recently I noticed that my indexes get extremely fragmented by doing this. Is rebuilding my indexes after each insert the solution to this?
I'm trying to speed up my search and filtering of the data, should I just rebuild the indexes each night as well as the full text index?
Is the fragmentation really slowing me down that much?
I'm also open to other ways to improve performance on this table on a nightly basis if you have any suggestions.
We currently have a SQL Agent job that runs once a week to identify highly fragmented indexes and rebuild them. For certain large indexes on large tables, this ends up causing the system to time out, as the index is unavailable during the rebuild.
We have identified a strategy that should significantly reduce the fragmentation that occurs, but that won't be implemented for some time, and it doesn't cover everything.
We looked into upgrading to the Enterprise edition, which allows online index rebuilds. However, the cost is prohibitive for us at this point.
The indexes don't really change that much, so we can assume that they are static, at least for the most part.
I did envision a way that we could perhaps simulate online index rebuilding. It could work as follows:
For each of the large indexes identified, run a script to:
Check the fragmentation and proceed if it exceeds a certain threshold.
Create a new index, named CurrentIndex_TEMP.
Initiate a rebuild on the index.
Remove the temporary index.
It seems that once the temporary index has been built, it would be possible to rebuild the original index without causing any downtime, since SQL Server would have another index available to serve queries that would otherwise have used the one being rebuilt.
Iterating through this for each index would hopefully minimize the increase in overall index size, as each temporary index would be removed before any other temporary indexes were created.
This strategy would also retain the historical data on the indexes. I had originally considered a strategy of first renaming the current index, then creating it again with the original name, and then removing the index that had been renamed. This, however, would result in a loss of history.
So, my question...
Is this a feasible strategy? Are there any significant problems I may run into? I understand that this will take some manual oversight from time to time, but I'm willing to accept that at this point.
Thanks for the help.
Any offline index rebuild will lock the table, so you don't gain anything by creating a duplicate index.
With great effort you can simulate online index rebuilds, but you have to rebuild all indexes on the table at once:
Create a copy of the table T with identical schema ("T_new")
Rename T to T_old
Create a view T defined as select * from T_old and set up INSTEAD OF DML triggers which perform all DML on both T_old and T_new
In a background job copy over batches from T_old to T_new using the MERGE statement
Finally, after the copy is completed, perform some renaming and dropping to make T_new the new T
This requires insanely high effort and good testing, but it lets you perform pretty much arbitrary schema changes online.
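A minimal T-SQL sketch of the steps above. The names (T, T_old, T_new) come from the description; the column lists, the UPDATE/DELETE triggers, and the MERGE-based background copy are elided, and in practice you would need explicit column lists rather than SELECT *:

```sql
-- Step 1: copy the schema of T (indexes/constraints must be scripted separately;
-- SELECT INTO copies only the column definitions).
SELECT * INTO T_new FROM T WHERE 1 = 0;

-- Step 2: move the original table out of the way.
EXEC sp_rename 'T', 'T_old';
GO

-- Step 3: a view takes over the original name...
CREATE VIEW T AS SELECT * FROM T_old;
GO

-- ...and INSTEAD OF triggers apply new DML to both tables while the
-- background copy runs. (UPDATE and DELETE triggers would be analogous.)
CREATE TRIGGER trg_T_insert ON T INSTEAD OF INSERT AS
BEGIN
    INSERT INTO T_old SELECT * FROM inserted;
    INSERT INTO T_new SELECT * FROM inserted;
END
GO
```

Once the MERGE-driven copy has caught up, the final renaming swap makes T_new the new T.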
On Oracle 10gR2, I have several SQL queries whose performance I am comparing. After the first run, the execution plan is cached (it shows up in v$sql), so one of the queries goes from 28 seconds on the first run to 0.5 seconds afterwards.
I've tried
ALTER SYSTEM FLUSH BUFFER_CACHE;
After running this, the query consistently runs in 5 seconds, which I do not believe is accurate.
I thought maybe I could delete the entry itself from the cache:
delete from v$sql where sql_text like 'select * from....
but I get an error about not being able to delete from a view.
Peter gave you the answer to the question you asked.
alter system flush shared_pool;
That is the statement you would use to "delete prepared statements from the cache".
(Prepared statements aren't the only objects flushed from the shared pool, the statement does more than that.)
As I indicated in my earlier comment (on your question), v$sql is not a table. It's a dynamic performance view, a convenient table-like representation of Oracle's internal memory structures. You only have SELECT privilege on the dynamic performance views, you can't delete rows from them.
flush the shared pool and buffer cache?
The following doesn't answer your question directly. Instead, it answers a fundamentally different (and maybe more important) question:
Should we normally flush the shared pool and/or the buffer cache to measure the performance of a query?
In short, the answer is no.
I think Tom Kyte addresses this pretty well:
http://www.oracle.com/technology/oramag/oracle/03-jul/o43asktom.html
http://www.oracle.com/technetwork/issue-archive/o43asktom-094944.html
<excerpt>
Actually, it is important that a tuning tool not do that. It is important to run the test, ignore the results, and then run it two or three times and average out those results. In the real world, the buffer cache will never be devoid of results. Never. When you tune, your goal is to reduce the logical I/O (LIO), because then the physical I/O (PIO) will take care of itself.
Consider this: Flushing the shared pool and buffer cache is even more artificial than not flushing them. Most people seem skeptical of this, I suspect, because it flies in the face of conventional wisdom. I'll show you how to do this, but not so you can use it for testing. Rather, I'll use it to demonstrate why it is an exercise in futility and totally artificial (and therefore leads to wrong assumptions). I've just started my PC, and I've run this query against a big table. I "flush" the buffer cache and run it again:
</excerpt>
I think Tom Kyte is exactly right. In terms of addressing the performance issue, I don't think that "clearing the Oracle execution plan cache" is normally a step for reliable benchmarking.
Let's address the concern about performance.
You tell us that you've observed that the first execution of a query takes significantly longer (~28 seconds) compared to subsequent executions (~5 seconds), even when flushing (all of the index and data blocks from) the buffer cache.
To me, that suggests that the hard parse is doing some heavy lifting. It's either a lot of work, or it's encountering a lot of waits. This can be investigated and tuned.
I'm wondering if perhaps statistics are non-existent, and the optimizer is spending a lot of time gathering statistics before it prepares a query plan. That's one of the first things I would check: that statistics are collected on all of the referenced tables, indexes, and indexed columns.
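If statistics turn out to be missing or stale, they can be inspected and gathered with `DBMS_STATS`. The schema and table names below are placeholders:

```sql
-- Check when statistics were last gathered for a schema's tables.
SELECT table_name, last_analyzed, num_rows
FROM   dba_tables
WHERE  owner = 'MY_SCHEMA';

-- Gather statistics for one table; cascade => TRUE also covers its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'MY_SCHEMA',
    tabname => 'MY_TABLE',
    cascade => TRUE);
END;
/
```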
If your query joins a large number of tables, the CBO may be considering a huge number of permutations for join order.
A discussion of Oracle tracing is beyond the scope of this answer, but it's the next step.
I'm thinking you are probably going to want to trace events 10053 and 10046.
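As a sketch, both events can be enabled for the current session like this (level 12 for event 10046 includes bind and wait information; the trace file ends up in the user dump directory):

```sql
-- Enable the optimizer trace (10053) and extended SQL trace (10046).
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';

-- ... run the problem query here ...

-- Disable both events again.
ALTER SESSION SET EVENTS '10053 trace name context off';
ALTER SESSION SET EVENTS '10046 trace name context off';

-- Find where the trace file was written (10g parameter name).
SELECT value FROM v$parameter WHERE name = 'user_dump_dest';
```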
Here's a link to an "event 10053" discussion by Tom Kyte you may find useful:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:63445044804318
Tangentially related anecdotal story re: hard parse performance
A few years back, I did see one query that had elapsed times measured in minutes on first execution, and in seconds on subsequent executions. What we found was that the vast majority of the first execution time was spent on the hard parse.
This problem query was written by a CrystalReports developer who innocently (naively?) joined two humongous reporting views.
One of the views was a join of 62 tables, the other view was a join of 42 tables.
The query used the cost-based optimizer. Tracing revealed that it wasn't wait time; it was all CPU time spent evaluating possible join paths.
Each of the vendor supplied "reporting" views wasn't too bad by itself, but when two of them were joined, it was agonizingly slow. I believe the problem was the vast number of join permutations that the optimizer was considering. There is an instance parameter that limits the number of permutations considered by the optimizer, but our fix was to re-write the query. The improved query only joined the dozen or so tables that were actually needed by the query.
(The initial short-term "band aid" fix was to schedule a run of the query earlier in the morning, before the report generation task ran. That made report generation "faster", because it reused the already-prepared statement in the shared pool, avoiding the hard parse.)
The band aid fix wasn't a real solution; it just moved the problem to a preliminary execution of the query, where the long execution time wasn't noticed.
Our next step would have probably been to implement a "stored outline" for the query, to get a stable query plan.
Of course, statement reuse (avoiding the hard parse by using bind variables) is the normative pattern in Oracle. It improves performance, scalability, yada, yada, yada.
This anecdotal incident may be entirely different than the problem you are observing.
HTH
It's been a while since I worked with Oracle, but I believe execution plans are cached in the shared pool. Try this:
alter system flush shared_pool;
The buffer cache is where Oracle stores recently used data in order to minimize disk io.
We've been doing a lot of work lately with performance tuning queries, and one culprit for inconsistent query performance is the file system cache that Oracle is sitting on.
It's possible that while you're flushing the Oracle cache the file system still has the data your query is asking for meaning that the query will still return fast.
Unfortunately I don't know how to clear the file system cache - I just use a very helpful script from our very helpful sysadmins.
FIND ADDRESS AND HASH_VALUE OF SQL_ID
select address,hash_value,inst_id,users_executing,sql_text from gv$sqlarea where sql_id ='7hu3x8buhhn18';
PURGE THE PLAN FROM SHARED POOL
exec sys.dbms_shared_pool.purge('0000002E052A6990,4110962728','c');
VERIFY
select address,hash_value,inst_id,users_executing,sql_text from gv$sqlarea where sql_id ='7hu3x8buhhn18';