I have a MySQL table with an index covering two fields. If I insert new rows into the table, will my index still work?
Yes, the index is automatically updated; therefore indices (aka indexes) make inserts, updates and deletes slower.
Yes, the index is updated every time there is an insert/update/delete executed for that table.
Yep, it will still work. Indexes would be pretty useless if you couldn't use them after you had inserted new data. Be aware that the more data you add to your tables, the bigger your indexes will grow, so you want to be sure you create them properly and limit them to what you need them for.
Yes. Could you explain more why you thought it might not?
Note that if you also have individual indexes on one or both of the fields, MySQL (at least some 5.0 versions I've used) won't necessarily use the combined index by default; when in doubt, use EXPLAIN to verify that your indexes are used as you expect.
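For example, a minimal sketch of that check, assuming a hypothetical orders table with a combined index on (customer_id, order_date):

EXPLAIN SELECT *
FROM orders
WHERE customer_id = 42
  AND order_date >= '2009-01-01';
-- look at the "key" column of the EXPLAIN output to confirm that the
-- combined index, rather than a single-column index, was actually chosen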
As others have stated, indices are updated with each insert. It's part of the performance penalty for inserts and updates.
Actually this isn't as odd a question as some answerers allude. In the 1980s I engineered a system using an RDBMS which deferred updates "for performance" (according to the documentation). Ironically it was named RDM, originally for Realtime Data Manager (or something very close to that), but later backronymed Responsive Data Manager.
RDM seemed intended for human-speed updates, so deferring database writes until several rows were complete was probably acceptable. The system I engineered used the database backend wrapped by a database server connected to automatic data sources. AFAIK, this was the first application of RDM expecting rapid row inserts. One of the problem solutions was engineering a timestamp field so all the nodes in the system could say "update me with everything which has changed since yyyy-mm-dd hh:mm:ss". Turns out that was a worst-case index update/refresh scenario, and hell to diagnose. Once undocumented calls meaning "trust me, I really want the data flushed and the indexes updated now" were added, it worked pretty well.
Related
Given two scenarios on SQL Server 2008/2005 -
1) Table has 5 rows
2) Table has 1 million rows
If we need to update a few rows, which is more efficient, and why?
1) UPDATE the required columns
2) DELETE the row and INSERT new row with the updated information
You should not be asking this question. You are asking "Is it better to do it the right way, or the wrong way, in the name of some nebulous idea of 'faster'?"
Do you have an application that is somehow too slow? Do you for some reason think that the problem is because your UPDATEs are taking too long? Have you done any measurement and benchmarking of the performance of your database interactions?
What you are doing is premature optimization of the worst kind, and you are doing your application a disservice by doing so. You are making wild guesses about how to speed up your code, with absolutely nothing to base it on.
Write your code right. Then try to find where you have a performance problem. Do you even HAVE a performance problem, or are you asking this question simply because you think it's something you should be asking about? You shouldn't.
Even if you specifically DID have a problem with your UPDATEs being too slow, we can't answer the question of "Is X faster than Y" because you have not given us nearly enough information, such as:
What database you are using
The table layouts
What indexes are on the database
How you're interfacing with the database
Please, write your code correctly, and then come back with specifics about what is too slow, rather than guessing at micro-optimizations.
Usually updating a single row will be faster. The reason is that deleting the existing row and inserting a new one both impact the clustered index, whereas updating a row also impacts various indices but not the clustered index. I have no data points to support my claim, but logically DB engines should behave this way.
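For illustration, the two options from the question would look roughly like this (table and column names are hypothetical):

-- Option 1: update only the columns that changed
UPDATE customers
SET email = 'new@example.com'
WHERE customer_id = 42;

-- Option 2: delete the row and re-insert it with the new values
DELETE FROM customers WHERE customer_id = 42;
INSERT INTO customers (customer_id, name, email)
VALUES (42, 'Jane Doe', 'new@example.com');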
A similar question has been asked, but since it always depends, I'm asking for my specific situation separately.
I have a web-site page that shows some data that comes from a database, and to generate the data from that database I have to do some fairly complex multiple joins queries.
The data is being updated once a day (nightly).
I would like to pre-generate the data for the said view to speed up the page access.
For that I am creating a table that contains exact data I need.
Question: for my situation, is it reasonable to do a complete table wipe followed by an insert, or should I do an update/insert?
SQL-wise it seems like DELETE + INSERT will be easier (the INSERT part is a single SQL statement).
EDIT: RDBMS: MS SQL Server 2008 Ent
TRUNCATE will be faster than DELETE, so if you need to empty a table, do that instead.
You didn't specify your RDBMS vendor, but some of them also have MERGE/UPSERT commands. These enable you to update the table if the data exists and insert it if it doesn't.
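For example, a rough sketch of a SQL Server 2008 MERGE doing exactly that (table and column names are hypothetical):

-- upsert the pre-generated data from a staging table into the target table
MERGE ReportData AS target
USING ReportDataStaging AS source
    ON target.Id = source.Id
WHEN MATCHED THEN
    UPDATE SET target.Value = source.Value
WHEN NOT MATCHED THEN
    INSERT (Id, Value) VALUES (source.Id, source.Value);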
It partly depends on how the data is accessed. If you have a period of time with no (or very few) users accessing it, then there won't be much impact on the data disappearing (between the DELETE and the completion of the INSERT) for a short while.
Have you considered using a materialized view (MSSQL calls them indexed views) instead of doing it manually? This could also have other performance benefits, as an indexed view gives the query optimizer more choices when it's constructing execution plans for other queries that reference the table(s) in the view.
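As a sketch of what that looks like in SQL Server (view, table and column names are hypothetical; indexed views also come with restrictions such as SCHEMABINDING and needing COUNT_BIG(*) when you GROUP BY):

-- a schema-bound view over the base table
CREATE VIEW dbo.vReportSummary
WITH SCHEMABINDING
AS
SELECT ProductId, COUNT_BIG(*) AS RowCnt, SUM(Quantity) AS TotalQty  -- Quantity assumed NOT NULL
FROM dbo.OrderLines
GROUP BY ProductId;
GO
-- the unique clustered index is what actually materializes the view
CREATE UNIQUE CLUSTERED INDEX ix_vReportSummary ON dbo.vReportSummary (ProductId);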
It depends on the size of the table and the recovery model of the database. If you are deleting many hundreds of thousands of records and re-inserting them, versus updating a small batch of a few hundred and inserting tens of rows, it will add unnecessary size to your transaction log. However, you could use TRUNCATE to get around this, as it is only minimally logged.
Do you have the option of a MERGE/UPSERT? If you're using MS-SQL you can use CROSS APPLY to do something similar if you don't.
One approach to handling this type of problem is to insert into a new table, then do a table rename. This will ensure that all the new data is present at the same time.
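A rough sketch of that pattern on SQL Server (table names are hypothetical):

-- load the fresh data into ReportData_New, then swap it in
EXEC sp_rename 'ReportData', 'ReportData_Old';
EXEC sp_rename 'ReportData_New', 'ReportData';
-- keep the old copy around until you're happy, then drop it
DROP TABLE ReportData_Old;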
What if some data that was present yesterday isn't there anymore? DELETE may be safer, or you could end up having to delete some records anyway.
And in the end it doesn't really matter which way you go.
Unless it's the case #kevinw mentioned.
Although I fully agree with SQLMenace's answer, I would like to point out that MERGE does NOT remove unneeded records! If you're sure that your new data will be a superset of the existing data, then MERGE is great; otherwise you'll either need to make sure that you delete any superfluous records later on, or use the TRUNCATE + INSERT method ...
(Personally I'm still a fan of the latter as it usually is quite fast, just make sure to drop all indexes/unique constraints upfront and rebuild them one by one. This has the benefit of the INSERT transaction being smaller and the index-adding being done in (smaller) transactions again later on). (**)
(**: yes, this might be tricky on a live system, but then again he already mentioned this was done during some kind of overnight process anyway; I'm extrapolating that there is no user access at that time)
I am creating a new DB and was wondering if there was any downside to adding numerous indexes to tables that I think may require one.
If I create an index but end up not utilizing it will it cause any issues?
Indexes make it faster to search tables, but longer to write to. Having unused indexes will end up causing some unnecessary slowdown.
Each index:
Takes up some space, on disk and in RAM
Takes some time to update, each time you insert/update/delete a row
Which means you should only define indexes that are useful for your application's needs : too many indexes will slow down the writes more than you'll gain from the reads.
Yes.
You should only add those indexes, that are necessary.
An index requires extra space, and, when inserting / updating / deleting records, the DBMS needs to update those indexes as well. So, this means that it takes more time to update/add/delete a record, since the DBMS has to do some extra administration.
adding numerous indexes to tables that I think may require one.
You should only add those indexes for which you're sure that they're necessary.
To determine the columns you could put indexes on, you could (see the examples after this list):
add indexes to columns that are foreign keys
add indexes to columns that are often used in WHERE clauses
add indexes to columns that are used in ORDER BY clauses
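A few hypothetical examples of each of those cases:

CREATE INDEX ix_orders_customer_id ON orders (customer_id);  -- foreign key column
CREATE INDEX ix_orders_status ON orders (status);            -- column often used in WHERE clauses
CREATE INDEX ix_orders_created_at ON orders (created_at);    -- column used in ORDER BY clauses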
Another - and perhaps better - approach is to use SQL Profiler:
use SQL Profiler to trace your application / database for a while
save the trace results
use the trace results in the Index Tuning Wizard, which will tell you which indexes you should create, what columns should be in each index, and it will also tell you the order of those columns for the index.
Indexes cause an increase in database size and the amount of time to insert/update/delete records. Try not to add them unless you know you will use them.
Having an index means INSERTs and UPDATEs take a bit longer. If you have too many indexes, then the benefit of faster search times can become not worth the extra INSERT and UPDATE time.
Yes; having an index makes selects faster but potentially makes inserts slower, as the indexes must be updated for each insert. If your application does a lot of writing and not much reading (e.g. an audit log) you should avoid extra indexing.
updates and inserts cost more, as indexes need to be updated as well
more space is used
Don't create any extra indices at the beginning. Wait until you've at least partly developed the system so you can have an idea about the usage of the table. Generate query plans to see what gets queried (and how, and the performance costs) and THEN add new indices as needed.
Do not index blindly! Take a look at your data to see which columns are actually being used in SELECT predicates and index those.
Also consider that indexes take room. Sometimes a lot of room. I have seen databases where the indexing data far outweighed the raw data in sheer volume.
Extra space and extra time to insert, like everyone has said.
Also, you should be certain of your indexes and your design, because sometimes indexes can actually slow down queries if the query optimizer chooses the wrong index. This is uncommon, but it can happen when optimizing an index for one particular join causes another join to actually become slower. I don't know about SQL Server, but you'll find lots of tricks around for hinting the MySQL optimizer to build queries in specific ways to get around this.
DBAs get paid a lot of money to know about weird gotchas like this with indexes (among other things), so yes, there are downsides to adding lots of indexes; be careful. Lean heavily on your query profiler and don't just throw indexes blindly at problems.
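For what it's worth, a minimal sketch of such a hint in MySQL (table and index names are hypothetical):

SELECT o.*
FROM orders AS o USE INDEX (ix_orders_customer_id)
WHERE o.customer_id = 42;
-- USE INDEX nudges the optimizer toward that index; FORCE INDEX is the stronger variant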
Take a look at the columns used in your WHERE clauses, and look at the columns used in joins, if any.
Generally that's the very simplest rule of thumb.
Extraneous indexes, as pointed out before, will slow down your DML statements and are generally not recommended. Ideally, what I have done is finish the entire module and, during the unit testing phase, do load analysis on the module, check whether or not you are seeing slowdowns, and after analyzing where the slowdowns are, add indexes.
I think it's been answered already, but basically indexes slow down inserts/updates as the index is updated when a new record is inserted (or an existing one updated).
Space is also a consideration, both memory and disk.
So for databases where a large amount of transactions are occurring, this will definitely have a noticeable performance impact (which is why performance tuning includes adding and removing indexes to optimize certain activities performed against the database).
Good luck
I'm performing some MySQL table maintenance that will mean removing some redundant columns and adding some new ones.
Some of the columns to drop are of the same type as ones to add. Would the procedure be faster if I took advantage of this and reused some of the existing columns?
My rationale is that changing column names should be a simple table metadata change, whereas removing and adding columns means either finding room at the end of the file (fragmenting data) or rebuilding every row with the correct columns so that they're at the same place on the disk.
The engine in question is MyISAM and I'm not up to scratch on how exactly it'll treat this so I'd like to hear from anyone who has been in the same situation before!
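For reference, the two alternatives being compared look roughly like this in MySQL (table and column names are hypothetical):

-- reuse an existing column of the same type by renaming it
-- (CHANGE requires restating the full column definition)
ALTER TABLE my_table CHANGE old_col new_col INT NOT NULL;

-- drop the redundant column and add a fresh one
ALTER TABLE my_table DROP COLUMN old_col, ADD COLUMN new_col INT NOT NULL;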
Unless you have a serious issue with performance, I wouldn't take the renaming approach - because of all the dirty data you're going to leave lying around.
Also, by dropping the table, you will cause any indexes to get re-built - which is a good idea every once in a while...
Martin
I would drop the columns. You will have fragmentation either way. That should be handled in your regular maintenance plans. You could accelerate those after a large number of modification operations.
In case you don't know: with a MyISAM table, every ALTER TABLE operation makes a copy of the entire table, so the table will be locked for however long your server needs to copy it.
I've used that same logic, and got stung because even with changes that are supposed to not require rewriting the table (i.e. a table rename), a MySQL bug caused it to think it was a change that required rewriting the table.
If the fields you are dealing with are date, datetime or timestamp fields, you are likely to be hit by this, which means that you should just assume it has to do a full rewrite and plan that way.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." (Donald Knuth). My SQL tables are unlikely to contain more than a few thousand rows each (and those are the big ones!). SQL Server Database Engine Tuning Advisor dismisses the amount of data as irrelevant. So I shouldn't even think about putting explicit indexes on these tables. Correct?
The value of indexes is in speeding reads. For instance, if you are doing lots of SELECTs based on a range of dates in a date column, it makes sense to put an index on that column. And of course, generally you add indexes on any column you're going to be JOINing on with any significant frequency. The efficiency gain is also related to the ratio of the size of your typical recordsets to the number of records (i.e. grabbing 20/2000 records benefits more from indexing than grabbing 90/100 records). A lookup on an unindexed column is essentially a linear search.
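For instance, a minimal hypothetical example of the date-range case:

CREATE INDEX ix_orders_order_date ON orders (order_date);

SELECT *
FROM orders
WHERE order_date >= '2009-01-01'
  AND order_date <  '2009-02-01';  -- this range predicate can use the index instead of a full scan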
The cost of indexes comes on writes, because every INSERT also requires an internal insert to each column index.
So, the answer depends entirely on your application -- if it's something like a dynamic website where the number of reads can be 100x or 1000x the writes, and you're doing frequent, disparate lookups based on data columns, indexing may well be beneficial. But if writes greatly outnumber reads, then your tuning should focus on speeding those queries.
It takes very little time to identify and benchmark a handful of your app's most frequent operations both with and without indexes on the JOIN/WHERE columns; I suggest you do that. It's also smart to monitor your production app and identify the most expensive and most frequent queries, and focus your optimization efforts on the intersection of those two sets of queries (which could mean indexes or something totally different, like allocating more or less memory for query or join caches).
Knuth's wise words are not applicable to the creation (or not) of indexes, since by adding indexes you are not optimising anything directly: you are providing an index that the DBMS's optimiser may use to optimise some queries. In fact, you could better argue that deciding not to index a small table is premature optimisation, as by doing so you restrict the DBMS optimiser's options!
Different DBMSs will have different guidelines for choosing whether or not to index columns based on various factors including table size, and it is these that should be considered.
What is an example of premature optimisation in databases: "denormalising for performance" before any benchmarking has indicated that the normalised database actually has any performance issues.
Primary key columns will be indexed for the unique constraint. I would still index all foreign key columns. The optimizer can choose to ignore your index if it is irrelevant.
If you only have a little bit of data then the extra cost for insert/update should not be significant either.
Absolutely incorrect. 100% incorrect. Don't put a million pointless indexes, but you do want a Primary Key (in most cases), and you do want it CLUSTERED correctly.
Here's why:
SELECT * FROM MySmallTable  -- No worries... Index won't help

SELECT *
FROM MyBigTable
INNER JOIN MySmallTable ON ...  -- Ahh, now I'm glad I have my index.
Here's a good rule to go by.
"Since I have a TABLE, I'm likely going to want to query it at some time... If I'm going to query it, I'm likely going to do so in a consistent way..." <-- That's how you should index the table.
EDIT: I'm adding this line: If you have a concrete example in mind, I'll show you how to index it, and how much of a savings you'll get from doing so. Please supply a table, and an example of how you plan in using that table.
I suggest that you follow the usual rules about indexing, which approximately means "create indexes on those columns that you use in your queries".
This might sound unnecessary with such a small database. As others have already said: as long as your database stays as small as you have described, the queries will be fast enough anyway, and the indexes aren't really needed. They can even slow down insertions and updates, but unless you have very specific requirements there, it doesn't matter with such a small database.
But, if the database grows (which databases sometimes have a tendency to do) you don't have to remember to add indexes to that old database that you've probably forgotten about by then. Maybe it has even been installed at one of your customers, and you can't modify it!
I guess what I'm saying is this: indexes should be such a natural part of your database design, that it is the lack of indexes that is the optimization, premature or not.
It depends. Is the table a reference table?
There are tables of a thousand rows where the absence of an index, and the resulting table scans can make the difference between a fairly simple operation delaying the user by 5 minutes instead of 5 seconds. I have seen exactly this problem, using a DBMS other than SQL Server.
Generally, if the table is a reference table, updates on it will be relatively rare. This means that the performance hit for updating the index will also be relatively rare. If the optimizer passes over the index, the performance hit on the optimizer will be negligible. The space needed to store the index will also be negligible.
If you declare a primary key, you should get an automatic index on that key. That automatic index will almost always do you enough good to justify its cost. Leave it in there. If you create a reference table without a primary key, there are other problems in your design methodology.
If you do frequent searches or frequent joins on some set of columns other than the primary key, an additional index might pay for itself. Don't fix that problem unless it is a problem.
Here's the general rule of thumb: go with the default behavior of the DBMS, unless you find a reason not to. Anything else is a premature preoccupation with optimization on your part.
If the rows have narrow width, and a few thousand rows fit on say 10-20 8K pages, it is unlikely that the SQL optimiser would elect to use an index even if you create one.
Put indexes ONLY if you have to :)
There are times when putting indexes can actually hurt performance, depending on what the table is used for...
So, in other words, you would think about putting indexes on tables when it is necessary as determined by profiling the application.
Indexes are often created implicitly when using UNIQUE constraints. I wouldn't try to avoid their use in that case!
As a general rule of thumb, it's good to avoid indexes on smaller tables, as they typically won't be used.
But sometimes they can provide a huge boost as I outlined here.
I guess there is automatic indexing on the primary key of the table, which should be sufficient when querying a table with little data.
So yes, explicit indexes can be avoided when there is only a small data set to be worked upon.
Even if you have an index, SQL Server might not even use it, depending on the statistics for that table. And if you plan to put in an index for a report that will run at most a couple times a year, bear in mind that the INSERT/UPDATE penalties for adding the index will be in effect ALL THE TIME. Before adding an index, ask yourself if it is worth the performance penalty.
You have to understand that, depending on the query, two lookups may be done: one into the index to get the pointer to the row, and a second to the row itself. If the data being queried is in the index columns, that extra step may not be necessary.
It is entirely possible that double dipping for data may be slower even if the optimizer goes after the index. Whether or not we care is up to application profiling and eventual explain plans.
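A small hypothetical example of a covering index that avoids the second lookup:

-- both the filtered column and the selected column live in the index
CREATE INDEX ix_users_lastname_email ON users (last_name, email);

SELECT email
FROM users
WHERE last_name = 'Smith';  -- can be answered from the index alone, no row lookup needed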