For what practical purposes would I potentially need to add an index to columns in my table? What are indexes typically needed for?
Indexes are database structures that improve the speed of retrieving data from the columns they are applied to. The Wikipedia article on the subject gives a pretty good overview without going into too much implementation-specific detail.
Basic indexes have two common uses.
They speed up queries.
They implement unique constraints (and hence help define primary keys).
In addition, specialized index types enable extra functionality in some databases, in particular full-text search and GIS queries.
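A quick sketch of both basic uses, with hypothetical table and column names (generic SQL, not taken from the question):

```sql
-- Speeding up lookups: an ordinary (non-unique) index on a frequently searched column.
CREATE INDEX ix_customers_last_name ON customers (last_name);

-- Enforcing uniqueness: a unique index, which is also what backs a
-- UNIQUE or PRIMARY KEY constraint.
CREATE UNIQUE INDEX ux_customers_email ON customers (email);
```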
Indexing columns speeds up queries on tables with many rows.
Indexes allow your database to locate the desired row using search algorithms such as binary search.
This only helps once the table has a reasonable number of rows, say 16 or more (that threshold is borrowed from common quicksort implementations, which switch to insertion sort for 16 or fewer items). Below that, the gain over a plain linear scan is negligible.
If a table has 100 rows and you want to find the 80th one, a query without an index might have to examine up to 80 rows. With an index that supports something like binary search, the same row can be found in roughly log2(100) ≈ 7 comparisons.
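If you want to check whether a query actually uses an index, most databases can show the execution plan. A rough sketch with hypothetical names, using the EXPLAIN statement available in e.g. PostgreSQL and MySQL:

```sql
-- Hypothetical table and index; without ix_orders_order_no the plan shows a
-- sequential/full table scan, with it an index lookup.
CREATE INDEX ix_orders_order_no ON orders (order_no);

-- EXPLAIN shows the chosen plan without running the full query.
EXPLAIN
SELECT *
FROM orders
WHERE order_no = 80;
```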
I understand that it is not advisable to create indexes on tables that will be frequently updated. However, does the same hold true for other DML operations? Is it recommended to create an index on a table that will have frequent INSERT and DELETE operations performed on it?
Indexing overhead depends heavily on table size, index complexity, and the number and size of the various INSERT/UPDATE/DELETE operations.
Sometimes it's faster to drop the indexes, perform the bulk operations, and then recreate the indexes than it is to perform the operations with the indexes intact.
Other times it's slower.
You also need to weigh this against the impact on any SELECT operations that would be going on at the same time.
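A rough sketch of the drop-then-recreate pattern, with hypothetical table and index names; whether it pays off has to be measured for your workload:

```sql
-- Drop the secondary index before the bulk work ...
DROP INDEX ix_events_created_at;           -- some systems use DROP INDEX ... ON events

-- ... perform the large INSERT/DELETE operations ...
INSERT INTO events (event_type, created_at)
SELECT event_type, created_at
FROM   staging_events;

-- ... then rebuild the index once at the end.
CREATE INDEX ix_events_created_at ON events (created_at);
```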
"Premature optimization is the root of all evil (or at least most of
it) in programming" (Knuth, 1974 Turing Award Lecture).
Until you're faced with actual performance problems that can't be fixed by improving your queries, I'd ignore last-ditch options like "not having indexes". Having the right indexes is almost always a performance improvement in normal operation.
Theoretical:
I've been tasked with building a database to contain guidelines for manufacturing purposes. The guidelines to be returned are based on 42 input values and are all specific to the particular combination of inputs.
I plan on indexing all of these columns, and I realize that it will be resource-intensive if I have to rebuild or re-index.
What design approaches have I not considered? What potential problems exist with the approach of creating a unique constraint on 42 columns? Does anyone have any experience with this sort of a design or any insights?
Thanks for any help!
A good reason for not doing it is that SQL Server doesn't support it:
Up to 32 columns can be combined into a single composite index key.
(documentation here).
It seems unlikely that you really need a single composite index with 42 columns. But you can't have one even if that were desirable.
Put indexes only on the columns that will be searched or sorted on.
Add a simple auto-increment key.
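A minimal sketch of that suggestion, assuming SQL Server (as in the other answer) and hypothetical table and column names:

```sql
-- Hypothetical table: a surrogate auto-increment key instead of a unique
-- constraint spanning all 42 input columns (SQL Server syntax).
CREATE TABLE guidelines (
    guideline_id   INT IDENTITY(1,1) PRIMARY KEY,
    input_01       INT NOT NULL,
    input_02       INT NOT NULL,
    -- ... the remaining input columns ...
    guideline_text VARCHAR(4000) NOT NULL
);

-- Index only the columns that queries actually filter or sort on.
CREATE INDEX ix_guidelines_input_01_02
    ON guidelines (input_01, input_02);
```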
I'm currently in disagreement with my colleague regarding the best design of our database.
We need to access the total user balance from our database of transactions frequently; we will potentially need to access this information several times a second.
He says that SQL is fast and all we need to do is SUM() the transactions. I, on the other hand, believe that eventually, with enough users and a large database, our server will spend most of its time summing the same records over and over. My solution is to have a separate table that keeps a record of the totals.
Which one of us is right?
That is an example of database denormalization. It makes the code more complex and introduces potential for inconsistencies, but the query will be faster. Whether that's worth it depends on how much you need the performance boost.
The sum could also be quite fast (i.e. fast enough) if it can be indexed properly.
A third way would be to use cached aggregates that are periodically recalculated. This works best if you don't need real-time data (for example, account activity up until yesterday, which you can augment with real-time data computed from the much smaller set of today's transactions).
Again, the tradeoff is between making things fast and keeping things simple (don't forget that complexity also tends to introduce bugs and increase maintenance costs). It's not a matter of one approach being "right" for all situations.
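As a rough illustration of the "indexed properly" point, a composite index over hypothetical user_id and amount columns lets the database compute the sum from an index range scan instead of a full table scan:

```sql
-- Hypothetical schema; the composite index covers both the filter column
-- and the summed column.
CREATE INDEX ix_transactions_user_amount
    ON transactions (user_id, amount);

SELECT SUM(amount) AS balance
FROM   transactions
WHERE  user_id = 42;
```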
I don't think that one solution fits all.
You can go very far with a good set of indexes and well-written queries. I would start by querying in real time until you can't, and then jump to the next solution.
From there, you can move to storing aggregates for all non-changing data (for example, from the beginning of time up to the prior month) and only query the sum for the data that changes in the current month.
You can save aggregated tables, but how many different kinds of aggregates are you going to save? At some point you have to look into some kind of multidimensional structure.
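A minimal sketch of that split, with hypothetical table and column names: precomputed totals for closed months plus a live sum over just the current month.

```sql
-- Hypothetical aggregate table, refreshed once per month for closed months.
CREATE TABLE monthly_balances (
    user_id     INT            NOT NULL,
    month_start DATE           NOT NULL,
    total       DECIMAL(18, 2) NOT NULL,
    PRIMARY KEY (user_id, month_start)
);

-- Balance = stored totals for closed months + live sum over the current month
-- (DATE_TRUNC is PostgreSQL-flavored; other databases use different date functions).
SELECT
    (SELECT COALESCE(SUM(total), 0)
       FROM monthly_balances
      WHERE user_id = 42)
  + (SELECT COALESCE(SUM(amount), 0)
       FROM transactions
      WHERE user_id = 42
        AND txn_date >= DATE_TRUNC('month', CURRENT_DATE)) AS balance;
```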
I have a relational database that I am going to be restructuring and one of the tables has over 100 columns in it. To me that seems a little much, but I don't know the best practices. This table is queried a lot and needs to be fast and about half the time all columns are needed. My question is should I split this table into smaller tables that can be joined together when needed or keep it as is?
Any advice would be appreciated.
I believe it'd be better to leave it as one big table. Since the queries typically return all columns, I presume inserts also write to all columns. Splitting it into multiple tables would potentially make inserts a nightmare. Plus, having one big table saves the DB from having to do a lot of work to put a bunch of small tables back together on every query.
Your mileage will vary depending on your specific conditions and could vary according to your specific DBMS. Try both ways and measure.
If I needed all the columns most of the time, I would guess that keeping everything in one table would be more efficient than joining multiple tables. I have had cases where I de-normalized data to decrease the number of tables, which made the queries much faster.
I would like to speed up our SQL queries. I have started reading a book on data warehousing, where you have a separate database with the data arranged in different tables, etc. The problem is that I do not want to create a separate reporting database for each of our clients, for a few reasons:
We have over 200, and maintenance on these databases is already enough work
Reporting data must be available immediately
I was wondering if I could simply denormalize the tables that I report on, as currently there are a lot of JOINs and I believe these are expensive (about 20,000,000 rows in the tables). If I copied the data into multiple tables, would this increase performance by a fair bit? I know there are issues with data being copied all over the place, but this could also be good from a history point of view.
Denormalization is no guarantee of an improvement in performance.
Have you considered tuning your application's queries? Take a look at what reports are running and identify places where you can add indexes and partitioning. Perhaps most reports only look at the last month of data - you could partition the data by month, so only a small portion of the table needs to be read when queried. JOINs are not necessarily expensive if the alternative is a large denormalized table that requires a huge full table scan instead of a few index scans...
Your question is much too general - talk with your DBA about doing some traces on the report queries (and look at the plans) to see what you can do to help improve report performance.
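A rough sketch of the partition-by-month idea, assuming PostgreSQL-style declarative partitioning and hypothetical table and column names:

```sql
-- Hypothetical fact table partitioned by month (PostgreSQL syntax).
CREATE TABLE report_facts (
    fact_id   BIGINT        NOT NULL,
    fact_date DATE          NOT NULL,
    amount    NUMERIC(18,2) NOT NULL
) PARTITION BY RANGE (fact_date);

CREATE TABLE report_facts_2024_01 PARTITION OF report_facts
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- A report restricted to one month only has to read that month's partition.
SELECT SUM(amount)
FROM   report_facts
WHERE  fact_date >= DATE '2024-01-01'
  AND  fact_date <  DATE '2024-02-01';
```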
The question is very general. It is hard to answer whether denormalization will increase performance.
Basically, it CAN. But personally, I wouldn't consider denormalizing a solution for reporting issues. In my experience, business people love to build huge reports that kill the OLTP DB at the least appropriate time. I would continue reading about data warehousing :)
Yes, for an OLAP application your performance will improve with denormalization, but if you use the same denormalized tables for your OLTP application you will see a performance bottleneck there. I suggest you create new denormalized tables or a materialized view for your reporting purposes; you can also incrementally fast-refresh your MV so that reporting data is available immediately.
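A minimal Oracle-flavored sketch of that materialized-view suggestion; the table and column names are hypothetical and the exact fast-refresh requirements vary by version:

```sql
-- Materialized view log needed for fast (incremental) refresh; exact options
-- depend on the Oracle version. Table and column names are hypothetical.
CREATE MATERIALIZED VIEW LOG ON transactions
    WITH SEQUENCE, ROWID (client_id, amount) INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW mv_client_totals
    REFRESH FAST ON COMMIT
AS
SELECT client_id,
       COUNT(*)      AS txn_count,      -- COUNT(*) is required for fast refresh
       COUNT(amount) AS amount_count,   -- needed when summing a nullable column
       SUM(amount)   AS total_amount
FROM   transactions
GROUP BY client_id;

-- Reports then read the small aggregate instead of re-summing millions of rows.
SELECT total_amount FROM mv_client_totals WHERE client_id = 42;
```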