So a pretty big problem just came up with a few databases we're using. For some reason, people's databases have grown to a ridiculous file size without any change in the tables' data. While I have no idea what is causing this all of a sudden, right now I'm more concerned with clearing the disk space it's using. I've run sp_spaceused and tracked the main culprit down to one of two tables (depending on the database). In each database, one of these tables has over half a GB of reserved space while the data is only something like 50 MB, and the index_size is around 113 MB. The table has no clustered index and about 15 columns, all of relatively small length except for two nvarchar(255) columns (which are usually null or empty).
I've tried running DBCC SHRINKDATABASE and TRUNCATE TABLE, but neither did anything. I've researched this a bit and others have had the same problem, but when SHRINKDATABASE didn't fix it for them, no other solution was found either.
Let me know if there's anything else you need to know about the table or database setup. I don't know what else to try, and this is a significant issue for us since people's databases are suddenly taking up ten times the space they did before.
EDIT:
After running DBCC DBREINDEX and trying to change to a clustered index through Enterprise Manager, I get an error message saying:
Could not allocate new page for database 'DB'. There are no more pages available in filegroup PRIMARY. Space can be created by dropping objects, adding additional files, or allowing file growth.
I've tried deleting rows from this table too, and it has no effect on the table's size. The log file grows, as expected, but that's the only change in size.
These are heaps (no clustered indexes)? Why?
If the users have done a lot of updates or deletes, I would try rebuilding the tables. SHRINKDATABASE is not the way to do this: it will not fix fragmentation, space left unrecovered by deletes or by altering a column's width, or space wasted on forwarding pointers. I bet the tables would fare much better with a rebuild and/or a clustered index. In SQL Server 2000 you would do this by either:
(a) adding a clustered index. (I can't give you exact syntax for adding a clustered index (or changing one of your existing indexes, or the non-clustered primary key to be clustered), because I don't have enough information about your table.)
(b) DBCC DBREINDEX('dbo.tablename');
This will rebuild the non-clustered indexes on your heaps, but that may do you no good depending on where the wasted space is happening. I still suggest you should create a clustered index. If you can share some details about your table (e.g. the indexes that exist, what kind of queries are typically run, how you currently uniquely identify a row and the nature of data changes/additions to this table) we might be able to give more detailed advice.
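For concreteness, here's a minimal sketch of both options, assuming a hypothetical ID column that uniquely identifies a row in dbo.tablename; swap in your real table and key column(s):
-- (a) Add a clustered index on whatever uniquely identifies a row ("ID" is an assumption here).
CREATE CLUSTERED INDEX IX_tablename_ID ON dbo.tablename (ID);
-- (b) Rebuild all indexes on the table (SQL Server 2000 syntax).
DBCC DBREINDEX('dbo.tablename');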
I have a table of about 60 GB and I'm trying to create an index,
and it's very slow (almost a day, and still running!).
I see most of the time is spent on disk I/O (4 MB/s), and it doesn't use much memory or CPU.
I tried running 'pragma cache_size = 10000' and 'pragma page_size = 4000'
(after I created the table), and it still doesn't help.
How can I make the 'create index' run in a reasonable amount of time?
Creating an index on a database table is a one-time operation, and it can be expensive depending on many factors: how many fields of what types are included in the index, the size of the table being indexed, the hardware the database is running on, and possibly more.
To give a reasonable answer on speeding things up, we would need to know the schema of the table, the definition of the index you are creating, whether the data is actually unique if the index enforces uniqueness, the hardware specs of your server, your disk speeds, how much space is free on the disks, whether you are using a RAID array and at what level, and how much RAM you have and how it is utilized, etc.
Now, all that said, the following might be faster, but I have not tested it:
Make a structurally identical duplicate of the table you wish to index.
Add the index to the new, empty table.
Copy the data from the old table to the new table in chunks.
Drop the old table.
My theory is that it will be less expensive to index the data as it is added than to dig through the data that is already there and add the index after the fact.
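A minimal sketch of that approach in SQLite, with a hypothetical table big_table(id, value) and an index on value; the names, chunk boundaries, and index definition are placeholders to adapt to your schema:
CREATE TABLE big_table_new (id INTEGER PRIMARY KEY, value TEXT);
CREATE INDEX idx_big_table_new_value ON big_table_new (value);
-- Copy in chunks so no single transaction gets huge; repeat for each id range.
INSERT INTO big_table_new SELECT id, value FROM big_table WHERE id BETWEEN 1 AND 1000000;
-- ... further ranges ...
DROP TABLE big_table;
ALTER TABLE big_table_new RENAME TO big_table;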
You should create the index when you create the table. PS: make sure the index is defined appropriately; you shouldn't need to create the index at runtime.
I'm working with table partitioning on an extremely large fact table in a warehouse. I have executed the script a few different ways, with and without non-clustered indexes. With the indexes it appears to dramatically expand the log file, while without the non-clustered indexes it doesn't expand the log file as much but takes more time to run due to the rebuilding of the indexes.
What I am looking for is any links or information as to what is happening behind the scenes, specifically to the log file, when you split a table partition.
I think it isn't too hard to theorize what is going on (to a certain extent). Behind the scenes each partition is given a different HoBT, which in plain language means each partition is in effect sitting in its own hidden table.
So, theoretically, splitting a partition (assuming data is moving) would involve:
inserting the data into the new table
removing data from the old table
The behaviour of the non-clustered indexes can be figured out too, but the details change depending on whether there is a clustered index and whether the indexes are partition-aligned.
Given a bit more information on the table (clustered index or heap), we could theorize this further.
If the partition function is used by a partitioned table and SPLIT results in partitions where both will contain data, SQL Server will move the data to the new partition. This data movement will cause transaction log growth due to inserts and deletes.
This is from Microsoft's article on Partitioned Table and Index Strategies.
So it looks like it's doing a delete from the old partition and an insert into the new partition, which could explain the growth in the transaction log.
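For reference, a minimal sketch of the kind of statement that triggers this movement, using hypothetical names (psOrders, pfOrders) and a made-up boundary value:
ALTER PARTITION SCHEME psOrders NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfOrders() SPLIT RANGE ('2011-01-01');
-- If rows already exist on both sides of the new boundary, SQL Server moves them
-- into the new partition, and those deletes/inserts are written to the transaction log.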
I have a database of around 20 GB. I need to delete 5 tables and drop a few columns in 3 other tables.
Dropping the 5 tables will free some 3 GB, and dropping the columns in the other tables should free another 8 GB.
How do I reclaim this space from MySQL?
I've read that dumping the database and restoring it is one solution, but I'm not really sure how that works; I'm not even sure whether it only works for the entire database or can be applied to just parts of it.
Please suggest how to go about this. Thanks.
From the comments, it sounds like you're using InnoDB without the file per table option.
Reclaiming space from the InnoDB tablespace is not generally possible in this mode. Your only course of action is to dump the whole database, turn on file-per-table mode, and reload it (into a completely clean MySQL instance). This is going to take a long time with a large database; the mk-parallel-dump and restore tools might be a bit quicker, but it will still take a while. Be sure to test this process on a non-production server first.
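If you want to confirm which mode you're in before committing to the dump/reload (the setting itself lives in my.cnf), a quick check from the SQL prompt:
SHOW VARIABLES LIKE 'innodb_file_per_table';
-- OFF: all InnoDB data lives in the shared ibdata files, which never shrink.
-- ON: each table gets its own .ibd file, whose space can be reclaimed individually.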
EDIT: Doesn't apply without file_per_table, Mark is right there.
What's going on is that once MySQL takes space, it won't give it back. This is so that if you delete 500 rows and then immediately insert 500, it doesn't have to give that space back to the file system and then request it back. It's an optimization to avoid filesystem overhead, and it works well when you delete little bits.
If you delete a large amount, it will take a long time to end up using all that space again, which can be annoying. This can be fixed in two ways: dropping the table and reloading the contents, or optimizing the table (which, I believe, basically reloads the table internally).
All you have to do to get space back from a table is:
OPTIMIZE TABLE my_big_table;
Note that this can take a while; it's not a near-instant operation. Basically, plan for some downtime. If your tables are just a few gigs, it shouldn't take too long (probably a few minutes). This also rebuilds the indexes and does some other housekeeping.
You can read more about OPTIMIZE on the MySQL site. Here is its advice:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file.
I understand that indexes should get updated automatically, but when that does not happen we need to reindex.
My questions are: (1) Why does this automatic update fail, or why does an index go bad?
(2) How do I programmatically know which table/index needs reindexing at a given point in time?
Indexes' statistics may be updated automatically. I do not believe that the indexes themselves would be rebuilt automatically when needed (although there may be some administrative feature that allows such a thing to take place).
Indexes on tables that receive a lot of changes (new rows, updated rows and deleted rows) can become fragmented and less efficient. Rebuilding the index then "repacks" it into a contiguous section of storage space, a bit akin to the way defragmenting the file system makes file access faster.
Furthermore, indexes (in several DBMSs) have a FILL_FACTOR parameter, which determines how much extra space should be left in each node for growth. For example, if you expect a given table to grow by 20% next year, declaring a fill factor of around 80% should keep index fragmentation minimal during the first year (there may be some if that 20% of growth is not evenly distributed).
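As an illustration, this is how that parameter looks when creating an index in SQL Server; the table and column names here are made up:
CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId) WITH (FILLFACTOR = 80);
-- Each leaf page is left roughly 20% empty, leaving room for inserts before
-- page splits (and therefore fragmentation) start to occur.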
In SQL Server, it is possible to query properties of an index that indicate its level of fragmentation, and hence its possible need for maintenance. This can be done through the interactive management console. It is also possible to do it programmatically, by way of sys.dm_db_index_physical_stats in SQL Server 2005 and above (earlier versions have DBCC SHOWCONTIG instead).
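A minimal sketch of such a query for SQL Server 2005 and later; the 30% threshold is just a common rule of thumb, adjust to taste:
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 30;
-- Indexes returned here are candidates for a REORGANIZE (light fragmentation)
-- or a REBUILD (heavy fragmentation).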
I have a big table in SQL Server 2005 that's taking about 3.5 GB of space (according to sp_spaceused). It has 10 million records, and several indexes.
I just dropped a bunch of columns from it, such that the record length was reduced by half, and to my surprise it took zero time to do that. Obviously, sp_spaceused was still reporting the same space used: SQL Server hadn't really done anything when dropping the columns other than marking them as "dropped".
So I moved all the data from this table into another new table, truncated it, and moved all the data back, so that it'd get all reconstructed.
Now, after that, data is taking 2.8 GB, which IS less than before, but I expected a bigger drop.
Is it possible that the fact that this table originally had these columns is still leaving something there?
Was truncating it not enough? Should I drop it and create it again with the smaller column set?
Or is the data really taking 2.8 GB?
Thanks!
You will need to rebuild the clustered index (assuming you have one - by default, your primary key is the clustered key).
ALTER INDEX (your clustered index) ON (your table) REBUILD
The data really is the leaf level of your clustered index - once you rebuild it, it will be "compacted" and the rows should be stored on far fewer data pages, reducing your database size, too.
If that doesn't help at all, you might also need to run DBCC SHRINKDATABASE on your database to actually release the space back to the operating system. These two steps together should get you a noticeably smaller database file!
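Put together, with hypothetical names (PK_BigTable for the clustered index, dbo.BigTable for the table, MyDb for the database):
ALTER INDEX PK_BigTable ON dbo.BigTable REBUILD;  -- compacts the leaf level of the table
DBCC SHRINKDATABASE (MyDb);                       -- releases the freed pages back to the OS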
Marc
How did you calculate the drop you expected? Note that the data is stored in 8 KB pages, which means that even if individual rows are smaller, you don't always need fewer pages to store them.
For example (an extreme case), if your rows used to be 7.5 KB each, only one row fit per page. You drop some columns and your row is now 5 KB, but it is still one row per page.
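If you'd rather measure than guess, here's a sketch for SQL Server 2005 that shows how full the pages actually are; dbo.BigTable is a hypothetical name:
SELECT index_id, page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BigTable'), NULL, NULL, 'DETAILED');
-- A value near 100 means the 2.8 GB really is data; a low value means the pages
-- are mostly empty and a rebuild would shrink the table further.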