I have 60 columns in a table.
1). I want to add one more column to that table. Will there be any impact on performance?
2). How many columns can I add?
3). Any idea how to avoid the recursion? [I have no idea what is meant here - annakata]
Yes, but one more column is less of a problem than the fact that you already have 60.
I bet most of them are nullable?
With very wide tables (many columns) it becomes harder to write maintainable SQL, and you are forced to handle lots of special cases due to the NULLs.
See also this post which asks how many fields is too many?
Will there be any impact on performance?
If you are adding a "Notes"-type TEXT column, or a BLOB storing a user's image, and most of your queries are
SELECT * FROM MyTable
then you will definitely be creating a performance issue.
If you always explicitly only name the columns your query needs, like:
SELECT Col1, ColX, ColN
FROM MyTable
then adding a new column will have little to no impact on performance. Wider rows do mean fewer records per data page, so there is SOME impact, and if you add an index on the new column then that index has to be maintained too; but if your application needs the column, that is a necessary "cost".
We have plenty of tables with > 60 columns. However, I would like to think that that is By Design, rather than because the table has just grown willy-nilly.
If I were you I would be less concerned with the fact that you have to add one more column, and more concerned with deciding whether 60 columns is appropriate.
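As an aside, on SQL Server adding a nullable column with no default is normally a metadata-only change, so the ALTER itself is cheap even on a big table. A quick sketch (the column name and type here are just an example, not from the question):

-- Hypothetical example: add a nullable notes column to MyTable.
-- With no DEFAULT, SQL Server records this as a metadata change
-- and does not rewrite the existing rows.
ALTER TABLE MyTable ADD Notes VARCHAR(1000) NULL;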
Related
I have a table with 49 columns and I need to add two more. I could add another table related to that one with those new columns to avoid making the table bigger and bigger.
However, I would like to know how much having 2 more columns in that table would affect performance if they are not used in joins.
Is there really a performance difference when joining table A with table B depending on whether A has 4 columns or 100, if you only use 3 of them?
Also, the table is not highly populated; it doesn't even have 500 rows. But the DBA doesn't like the idea, and I would like to understand his point of view.
Thanks.
EDIT:
To clarify: my only change to this table is to add 2 more columns to the existing 49 it currently holds, and they will be bit columns. That's why I wanted to know whether widening the table would impact performance at all, assuming nobody ever does a select * when joining with that table.
I think the best question to ask is: will these new columns be empty most of the time for your rows?
Yes: maybe you can add these columns to the main table. It depends on whether you need them most of the time you select rows from this table.
No: create a new table and join (a sketch follows below). An empty column on every row is wasted disk space.
NB: 50 columns seems horrible anyway...
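A sketch of that "new table and join" option, with made-up names since the question doesn't give any:

-- Hypothetical: the two bit flags live in a side table keyed by
-- the main table's primary key.
CREATE TABLE MainTableFlags (
    MainTableId INT NOT NULL PRIMARY KEY,
    IsArchived  BIT NOT NULL,
    IsVerified  BIT NOT NULL,
    FOREIGN KEY (MainTableId) REFERENCES MainTable (Id)
);

-- Only the queries that need the flags pay for the join:
SELECT m.Id, f.IsArchived, f.IsVerified
FROM MainTable m
JOIN MainTableFlags f ON f.MainTableId = m.Id;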
Adding these two columns to your table should not significantly impact performance, particularly as your table stores fewer than 500 rows. That said, your DBA does not like it because it doesn't follow best practice for table design, in particular if many column values are NULL/empty, and such a design will not scale well. However, unless you anticipate that this table is going to grow in size rapidly, adding two columns should not pose a performance problem.
If you add another table then I assume you will have to use joins to access that data properly. That could easily end up costing more than the two new attributes would if added to the single table.
If your table could be refactored then that would be the best option, but if not then you would only lose efficiency by attempting it. Don't create a second table simply to stay under 50 attributes. Two columns added to 49 is not going to be an unworkable load by any measure, but there could be other reasons to redesign your table. If you have a bunch of empty or NULL cells then you are wasting resources and giving your system more work to do; finding a way to eliminate those would undoubtedly have a greater effect on performance than adding a column or two.
This question already has answers here: Why is SELECT * considered harmful?
I was wondering which is best practice. Let's say I have a table with 10+ columns and I want to select data from it.
I've heard that 'select *' is better, since selecting specific columns makes the database search for those columns before selecting, while selecting all just grabs everything. On the other hand, what if the table has a lot of columns in it?
Is that true?
Thanks
It is best practice to explicitly name the columns you want to select.
As Mitch just said, the performance isn't different. I have even heard that looking up the actual column names when using * is slower.
But the advantage of naming your columns is that when your table changes, your select does not change.
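One concrete illustration of that point, with hypothetical table names: code that relies on * can silently break when the schema changes, while an explicit column list keeps working.

-- This breaks (or silently misaligns columns) as soon as a new
-- column is added to Orders, because the column lists no longer
-- line up.
INSERT INTO OrdersArchive SELECT * FROM Orders;

-- Naming the columns keeps it working after schema changes:
INSERT INTO OrdersArchive (OrderId, CustomerId, Total)
SELECT OrderId, CustomerId, Total FROM Orders;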
I think these two questions here and here have satisfactory answers.
One reason select * is not better is that it is actually slower. In addition, according to OMG Ponies, select * is an anti-pattern. See the questions in the links for details.
Selecting specific columns is better, as it raises the probability that SQL Server can serve the data from indexes rather than reading the table data.
It also requires fewer changes later, since any code that consumes the data will keep getting the same data structure regardless of changes you make to the table schema in the future.
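A minimal sketch of that index effect, using SQL Server's INCLUDE syntax (the table and column names are assumed):

-- Hypothetical covering index: the first query below can be
-- answered from the index alone, without touching the table's
-- data pages.
CREATE INDEX IX_Orders_CustomerId
ON Orders (CustomerId)
INCLUDE (OrderDate, Total);

-- Served entirely from the index:
SELECT CustomerId, OrderDate, Total
FROM Orders
WHERE CustomerId = 42;

-- SELECT * forces a lookup of every remaining column:
SELECT * FROM Orders WHERE CustomerId = 42;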
Definitely not. Try running a SELECT * on a table which has millions of rows and tens of columns.
The performance with SELECT * will be worse.
It depends on what you're about to do with the result. Selecting unnecessary data is not a good practice either. You wouldn't create a bunch of variables with values you would never use. So selecting many columns you don't need is not a good idea either.
It depends.
Selecting all columns can make the query slower because all columns must be read from disk; if there are a lot of string columns (which are not in an index) then it can have a huge impact on query (IO) performance. And in my experience, you rarely need all the columns.
On the other hand, for a small database with a few users and good enough hardware, it's much easier to just select all columns, especially if the schema changes often.
However, I would always recommend explicitly selecting columns, to make sure it doesn't hurt performance.
I have the following table structure:
EVENT_ID (INT), EVENT_NAME (VARCHAR), EVENT_DATE (DATETIME), EVENT_OWNER (INT)
I need to add the field EVENT_COMMENTS which should be a text field or a very big VARCHAR.
I have 2 places where I query this table: one is a page that lists all the events (on that page I do not need to display the event_comments field),
and another page that loads all the details for a specific event, where I will need to display the event_comments field.
Should I create an extra table with the event_id and the event_comments for that event? Or should I just add that field on the current table?
In other words, what I'm asking is, if I have a text field in my table, but I don't SELECT it, will it affect the performance of the queries to my table?
Adding a field to your table makes it larger.
This means that:
Table scans will take more time
Fewer records will fit into a page and hence into the cache, thus increasing the risk of cache misses
Keeping the field in a separate table and selecting it with a join, however, would also take more time.
So adding this field to this table will make the queries which don't select it run slower, and those which do select it run faster.
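For reference, a sketch of the split-table alternative being weighed here (the main table name EVENTS is assumed; the question doesn't give one):

-- Hypothetical split: comments live in their own table and are
-- joined only on the detail page.
CREATE TABLE EVENT_COMMENTS (
    EVENT_ID INT PRIMARY KEY,
    COMMENTS TEXT
);

-- Listing page: narrow rows, no join needed.
SELECT EVENT_ID, EVENT_NAME, EVENT_DATE FROM EVENTS;

-- Detail page: one join pulls in the comments.
SELECT e.EVENT_ID, e.EVENT_NAME, e.EVENT_DATE, c.COMMENTS
FROM EVENTS e
LEFT JOIN EVENT_COMMENTS c ON c.EVENT_ID = e.EVENT_ID
WHERE e.EVENT_ID = 123;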
Yes, it affects the performance, at least according to this article published yesterday.
According to it, if you don't want to suffer performance issues, it's better to put such columns in a separate table and JOIN them when needed.
This is the relevant section:
Try to limit the number of columns in a table. Too many columns in a table can make the scan time for queries much longer than if there are just a few columns. In addition, if you have a table with many columns that aren't typically used, you are also wasting disk space with NULL value fields. This is also true with variable size fields, such as text or blob, where the table size can grow much larger than needed. In this case, you should consider splitting off the additional columns into a different table, joining them together on the primary key of the records.
You should put it in the same table.
Yes, it probably will affect other queries on the same table, and you should probably do it anyway, as you probably don't care.
Depending on the engine, blobs are either stored inline (MyISAM), partially off-page (InnoDB) or entirely off-page (InnoDB Plugin, in some cases).
These have the potential to decrease the number of rows per page, and therefore increase the number of IO operations to satisfy some query.
However, it is extremely unlikely that you care, so you should just do it anyway. How many rows does this table have? 10^9 ? How many of them have non-null values for the blob?
It shouldn't be too much of a hit, but if you're worried about performance, you should always run a few benchmarks and run EXPLAINs on your queries to see the true effect.
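For example, comparing plans on MySQL is as simple as the following (the table name EVENTS and the owner filter are made up):

-- Compare the plan and row estimates with and without the TEXT column:
EXPLAIN SELECT EVENT_ID, EVENT_NAME FROM EVENTS WHERE EVENT_OWNER = 7;
EXPLAIN SELECT * FROM EVENTS WHERE EVENT_OWNER = 7;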
How many events are you expecting to have?
Chances are that if you don't have hundreds of thousands of events, your performance will be good in any case.
I was wondering if anyone ever had a chance to measure how 100 joined tables would perform.
Each table would have an ID column with a primary index, and all tables are 1:1 related.
It is a common problem within many data entry applications where we need to collect 1000+ data points. One solution would be to have one big table with 1000+ columns and the alternative would be to split them into multiple tables and join them when it is necessary.
So perhaps the more realistic question is how 30 tables (30 columns each) would behave in a multi-table join.
500K-1M rows should be the expected size of the tables.
Cheers
As a rule of thumb, any more than 25 joins might be a performance problem. I try to keep joins below 10-15. It depends on the database activity, the number of concurrent users, and the ratio of reads to writes.
I suggest you look at indexed views.
With any well-tuned database, 'good' indexes for the query workload are the key.
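A minimal sketch of an indexed view (SQL Server syntax; table and column names are assumed). The unique clustered index materializes the join result, so queries don't have to re-join the underlying tables each time:

-- Hypothetical: pre-join two of the 1:1 partition tables.
CREATE VIEW dbo.DataJoined
WITH SCHEMABINDING
AS
SELECT p1.Id, p1.ColA, p2.ColB
FROM dbo.Part1 AS p1
JOIN dbo.Part2 AS p2 ON p2.Id = p1.Id;
GO
-- Materialize the view:
CREATE UNIQUE CLUSTERED INDEX IX_DataJoined ON dbo.DataJoined (Id);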
They'd most likely perform terribly, unless you had a very small number of rows per table.
Go for a wider table, but normalize it properly. My guess is that if you normalize your data properly, you will have a slightly more sane design.
What you describe is similar to the implementation of a column-oriented database (wikipedia). The data is stored in "column major" format, which slows down adding each row but is much faster for querying when a where clause restricts the returned rowset.
Why is it that you would rather split the rows? Is it that you measure the data elements for each row at different times? Or is it that the query result of a row would be very large?
Since first posting this, you answered me below that your reason for desiring a split of the table is that you usually only work with a subset of the data.
In that case, splitting the table can help your performance (amount of runtime consumed by the query) some amount. This may be an important factor in your wanting to work with less data -- in the case where your database engine runs slowly with large rows.
If the performance gain is not critical, then rather than splitting the table and using SQL JOINs, it might serve you to explicitly list the columns you wish to retrieve in each query. For example, if you only wish to retrieve width, height, and length for a row, you could use:
SELECT width, height, length FROM datatable;
rather than
SELECT * FROM datatable;
and accomplish the same improvement of getting less data returned. The SQL statements used would probably be shorter than the alternative join statements we were considering.
There's no way to better organise the tables? For example a "DataPointTypes" and "DataPointValues" table?
For example (and I don't know your particular circumstances) if all of your tables are like "WebsiteDataPoints (WebsitePage, Day, Visits)" "StoreDataPoints (Branch, Week, Sales)" etc. you could instead have
DataPointSources(Name)
(with data: Website,Store)
DataPointTypes(SourceId, ColumnName)
(with data: (Website, WebsitePage), (Website, Day), (Store, Branch), (Store, Sales) etc.)
DataPointEntry(Id, Timestamp)
DataPointValues (EntryId, TypeId, Value (as varchar, probably))
(with data: (1, Website-WebsitePage, 'pages.php'), (2, Store-Branch, 'MainStore'), (1, Website-Day, '12/03/1980'), (2, Store-Sales '35') etc.)
In this way each table becomes a source, each column becomes a type, each row becomes an entry, and each cell becomes a value.
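A rough DDL sketch of that layout (the names follow the answer above; the types and sizes are assumptions):

CREATE TABLE DataPointSources (
    Id   INT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE DataPointTypes (
    Id         INT PRIMARY KEY,
    SourceId   INT NOT NULL REFERENCES DataPointSources (Id),
    ColumnName VARCHAR(100) NOT NULL
);

CREATE TABLE DataPointEntry (
    Id             INT PRIMARY KEY,
    EntryTimestamp DATETIME NOT NULL
);

-- One row per cell; Value stored as a string, as suggested above.
CREATE TABLE DataPointValues (
    EntryId INT NOT NULL REFERENCES DataPointEntry (Id),
    TypeId  INT NOT NULL REFERENCES DataPointTypes (Id),
    Value   VARCHAR(255),
    PRIMARY KEY (EntryId, TypeId)
);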
I have the following SQL query:
SELECT * FROM table WHERE field_1 <> field_2
Which is the best index structure to use, to keep this query efficient: two indexes on field_1 and field_2 or a single index which includes both fields?
EDIT: The database is MySQL
If you have an enormous table, it may be better to denormalize it and store the result of field_1 <> field_2 in a separate column, updated on every insert/update of the corresponding row.
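One way to do that without hand-written update logic is a stored generated column, available in MySQL 5.7+ (a sketch; the table name t is assumed):

-- Hypothetical: MySQL keeps the flag in sync automatically, and
-- the index makes the inequality filter cheap.
ALTER TABLE t
    ADD COLUMN fields_differ TINYINT(1) AS (field_1 <> field_2) STORED,
    ADD INDEX idx_fields_differ (fields_differ);

SELECT * FROM t WHERE fields_differ = 1;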
I imagine this may depend on which platform you are using, but on MS SQL Server definitely one index!
Indexes are not going to help you.
The database must do a table scan, as it is comparing two fields in the same row.
It depends on your database engine, but generally it's best to assume that a query will only use one index per table. This would imply that a single index across both columns is likely to be best.
However, the only way to find out is to populate a table with dummy data and try it out. Make sure that the dummy data is representative in terms of how it is distributed as, for example, if 99% of field2 values are identical to each other then it may reduce the value of having an index.
To be sure, I'd try all three options, but remember you are writing to each index with every insert/update (so indexing both fields has to be beneficial by enough of a margin to compensate for the effect on write performance). Remember, it doesn't have to be perfect; it just has to be good enough to handle the system throughput without creating unacceptable UI latencies.
What I'd try first is a single index on the field that has the most distinct values, i.e. if field_1 has 1000 different values in it and field_2 only has 20, then put the index on field_1.
Here's a nice article about indexes and inequality matches:
http://sqlinthewild.co.za/index.php/2009/02/06/index-columns-selectivity-and-inequality-predicates/
Alternatively, if your data is vast, you might consider using a trigger to maintain another column with a bit indicating whether the columns match, and then search on that column. It all depends on your situation, of course.
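A sketch of that trigger approach in MySQL (the table name t is assumed; the column names match the question):

-- Hypothetical flag column maintained by triggers.
ALTER TABLE t ADD COLUMN fields_match TINYINT(1);

CREATE TRIGGER t_bi BEFORE INSERT ON t
FOR EACH ROW SET NEW.fields_match = (NEW.field_1 = NEW.field_2);

CREATE TRIGGER t_bu BEFORE UPDATE ON t
FOR EACH ROW SET NEW.fields_match = (NEW.field_1 = NEW.field_2);

-- Then index and query the flag instead of comparing per row:
CREATE INDEX idx_fields_match ON t (fields_match);
SELECT * FROM t WHERE fields_match = 0;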