SQL Server 2005: disk space taken by dropped columns

I have a big table in SQL Server 2005 that's taking about 3.5 GB of space (according to sp_spaceused). It has 10 million records, and several indexes.
I just dropped a bunch of columns from it, such that the record length was cut roughly in half, and to my surprise the operation took zero time. Obviously, sp_spaceused was still reporting the same space usage: SQL Server hadn't actually done anything when dropping the columns, other than marking them as "dropped".
So I moved all the data from this table into a new table, truncated the original, and moved it all back, so that it would get fully reconstructed.
Now, after that, data is taking 2.8 GB, which IS less than before, but I expected a bigger drop.
Is it possible that the fact that this table originally had these columns is still leaving something there?
Was truncating it not enough? Should I drop it and create it again with the smaller column set?
Or is the data really taking 2.8 GB?
Thanks!

You will need to rebuild the clustered index (assuming you have one - by default, your primary key is the clustered key).
ALTER INDEX (your clustered index) ON (your table) REBUILD
The data really is the leaf level of your clustered index - once you rebuild it, it will be "compacted" and the rows should be stored on far fewer data pages, reducing your database size, too.
If that doesn't help at all, you might also need to run a DBCC SHRINKDATABASE on your database to really reclaim the space. These two steps together should really get you some smaller database file!
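A minimal sketch of the two steps together, assuming a clustered index named PK_BigTable on a table dbo.BigTable (both names are placeholders for your own):

-- Rebuild the clustered index; this rewrites and compacts the leaf-level data pages
ALTER INDEX PK_BigTable ON dbo.BigTable REBUILD;
-- Then, if you also want the file itself to shrink, release space to the OS,
-- leaving 10 percent free space in the database
DBCC SHRINKDATABASE (MyDatabase, 10);

Keep in mind the rebuild itself needs free working space in the data file, so run it before the shrink, not after.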
Marc

How did you calculate the drop you expected? Note that data is stored in 8 KB pages, which means that even if individual rows are smaller, you don't always need fewer pages to store them.
For example (an extreme one): if your rows used to be 7.5 KB each, only one row fit per page. You drop some columns and your row shrinks to 5 KB, but it is still one row per page.
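If you want to verify where the space is going, SQL Server 2005's sys.dm_db_index_physical_stats DMV reports the average record size and how full the pages are (the table name below is a placeholder):

SELECT index_id, page_count, avg_record_size_in_bytes, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.BigTable'), NULL, NULL, 'DETAILED');

If avg_page_space_used_in_percent is high and avg_record_size_in_bytes matches your new row size, the 2.8 GB is simply what the data takes.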

Related

What happens when I drop a column in SQL Server

If my understanding is right, the number of rows stored on one page in SQL Server is determined by the number of columns in the table and their datatypes. One I/O operation reads one page, so the more rows fit on one page, the more rows one I/O operation can return, and the faster your queries run.
I wonder what happens when you drop a column. Does SQL Server go back and "re-store" the data, i.e. what if I drop enough columns for more rows to fit on one page? And if SQL Server does not do that automatically, can I force the process?
I'm removing a lot of text columns and a few IDs from a heavily used table, and I'm hoping that I/O will improve after I drop the columns.
Dropping a column is a logical operation, not a physical one. No data gets modified. The column metadata gets marked as 'deleted' and will be ignored. The record size is unchanged. Read SQL Server table columns under the hood for a more detailed explanation and clear examples demonstrating this.
As Stefan said, you have to rebuild the table (heap or clustered index) to 'reclaim' the space.
I'm removing a lot of text columns and a few IDs from a heavily used table, and I'm hoping that I/O will improve after I drop the columns.
Use indexes to reduce IO. For unindexable ad-hoc analytical workloads, use columnstores. Read How to analyse SQL Server performance.
You can rebuild the clustered index to force SQL Server to reuse the free space. Otherwise you have "holes" in the pages.
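A minimal sketch of that rebuild on SQL Server 2005 (the table name is a placeholder); ALL rebuilds the non-clustered indexes along with the clustered one:

-- Rewrites the data pages (the clustered index leaf level) and all other indexes
ALTER INDEX ALL ON dbo.BusyTable REBUILD;

If the table is a heap, there is no index whose rebuild rewrites the data pages; the usual trick on 2005 is to create a clustered index and, if you really want a heap, drop it again.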

SQL Server 2000, table allocates too much space

So a pretty big problem just came up with a few databases we're using. For some reason, people's databases have grown to a ridiculous file size without any change in the tables' data. While I have no idea what is causing this all of a sudden, I'm more concerned right now with clearing the disk space it's using. I've run sp_spaceused and have tracked the main culprit down to 1 of 2 tables (depending on the database). For each database, one of these tables is allocating over half a GB to reserved space while the data is only something like 50 MB, and it shows an index_size of ~113 MB. The table has no clustered index and has about 15 columns, all of relatively small length except for 2 columns of type nvarchar(255) (these are usually null or empty in the table).
I've tried running DBCC SHRINKDATABASE and TRUNCATE TABLE, but neither did anything. I've researched this a bit and some others have had this problem, but where SHRINKDATABASE didn't fix it, no solution was found for them either.
Let me know if there's anything else you guys need to know about the table or database set up. I don't know what else to try and this is a significant issue for us as people's databases are all of a sudden taking up 10 times the space they were before.
EDIT:
After trying to run DBCC DBREINDEX and trying to change to a clustered index through Enterprise Manager, I get an error message saying:
Could not allocate new page for database 'DB'. There are no more pages available in filegroup PRIMARY. Space can be created by dropping objects, adding additional files, or allowing file growth.
I've tried deleting rows from this table too and it has no effect on the table's size. The log file increases, as expected, but that's the only change to the table's size.
These are heaps (no clustered indexes)? Why?
If the users have done a lot of updates or deletes, I would try rebuilding the tables. SHRINKDATABASE is not the way to do this: it will not fix fragmentation, space left unrecovered after deletes or column-width changes, or space wasted by forwarding pointers. I bet the tables would fare much better with a rebuild and/or a clustered index. In SQL Server 2000 you would do this by either:
(a) adding a clustered index. (I can't give you exact syntax for adding a clustered index (or changing one of your existing indexes, or the non-clustered primary key to be clustered), because I don't have enough information about your table.)
(b) DBCC DBREINDEX('dbo.tablename');
This will rebuild the non-clustered indexes on your heaps, but that may do you no good depending on where the wasted space is happening. I still suggest you should create a clustered index. If you can share some details about your table (e.g. the indexes that exist, what kind of queries are typically run, how you currently uniquely identify a row and the nature of data changes/additions to this table) we might be able to give more detailed advice.
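For illustration only, a sketch that assumes the table has an identity column named ID that uniquely identifies rows (both names are hypothetical; adjust to your schema):

-- Building a clustered index rewrites the heap into an ordered structure,
-- reclaiming space from deletes and eliminating forwarding pointers
CREATE UNIQUE CLUSTERED INDEX IX_MyTable_ID ON dbo.MyTable (ID);

Note the "could not allocate new page" error in your edit: the rebuild needs free working space, so you may have to allow file growth or add a file to the PRIMARY filegroup before this will succeed.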

SQL Server Table Partitioning, what is happening behind the scenes?

I'm working with table partitioning on an extremely large fact table in a warehouse. I have executed the script a few different ways, with and without non-clustered indexes. With the indexes it appears to dramatically expand the log file, while without them it expands the log file less but takes more time to run due to rebuilding the indexes.
What I am looking for is any links or information as to what is happening behind the scenes, specifically to the log file, when you split a table partition.
I think it isn't too hard to theorize what is going on (to a certain extent). Behind the scenes each partition is given a different HoBT, which in plain language means each partition is in effect sitting in its own hidden table.
So theorizing the splitting of a partition (assuming data is moving) would involve:
inserting the data into the new table
removing data from the old table
The NC indexes can be reasoned about similarly, but the theorizing changes depending on whether there is a clustered index, and on whether the index is partition-aligned.
Given a bit more information on the table (clustered or heap), we could theorize this further.
If the partition function is used by a partitioned table and SPLIT results in partitions where both will contain data, SQL Server will move the data to the new partition. This data movement will cause transaction log growth due to inserts and deletes.
This is from an article by Microsoft on Partitioned Table and Index Strategies
So it looks like it's doing a delete from the old partition and an insert into the new partition. This would explain the growth in the transaction log.
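For reference, a sketch of the statements involved in a split; the partition function, scheme, and boundary value below are hypothetical:

-- Tell the partition scheme which filegroup the next new partition will use
ALTER PARTITION SCHEME ps_FactSales NEXT USED [PRIMARY];
-- Split an existing range in two; if both resulting partitions end up holding data,
-- SQL Server physically moves rows, and those inserts and deletes are fully logged
ALTER PARTITION FUNCTION pf_FactSales() SPLIT RANGE (20120101);

This is why the common advice is to always split an empty partition at the leading edge of the table, so no data has to move.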

Free space in MySQL after deleting tables & columns?

I have a database of around 20 GB. I need to delete 5 tables and drop a few columns in 3 other tables.
Dropping the 5 tables will free some 3 GB, and dropping the columns in the other tables should free another 8 GB.
How do I reclaim this space from MySQL?
I've read about dumping the database and restoring it as one solution, but I'm not really sure how that works. I'm not even sure whether it only works for restoring an entire database, or whether it can be used for just parts of it?
Please suggest how to go about this. Thanks.
From the comments, it sounds like you're using InnoDB without the file per table option.
Reclaiming space from the InnoDB system tablespace is not generally possible in this mode. Your only course of action is to dump the whole database, turn on file-per-table mode, and reload it (into a completely clean MySQL instance). This is going to take a long time with a large database; the mk-parallel-dump and restore tools might be a bit quicker, but it will still take a while. Be sure to test this process on a non-production server first.
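You can confirm which mode the server is in before committing to a full dump and reload; a quick check (the setting itself is real, the comments describe the two modes):

SHOW VARIABLES LIKE 'innodb_file_per_table';
-- OFF: all InnoDB data lives in the shared ibdata files, which never shrink
-- ON: each table gets its own .ibd file, which OPTIMIZE TABLE can rebuild and shrink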
EDIT: Doesn't apply without file_per_table, Mark is right there.
What's going on is that once MySQL takes space, it won't give it back. This is so that if you delete 500 rows and then immediately insert 500, it doesn't have to give that space back to the file system and then request it back. It's an optimization to avoid filesystem overhead, and it works well when you delete little bits.
If you delete a large amount, it will take a long time to end up using all that space again, which can be annoying. This can be fixed two ways: dropping the table and reloading the contents, or optimizing the table (which I believe basically reloads the table internally).
All you have to do to get space back from a table is:
OPTIMIZE TABLE my_big_table;
Note that this can take a while; it's not a near-instant operation. Basically, plan for some downtime. If your tables are just a few gigs, it shouldn't be too long (probably a few minutes). This also rebuilds the indexes and does some other housekeeping.
You can see more about OPTIMIZE on the MySQL site. Here is its advice:
OPTIMIZE TABLE should be used if you have deleted a large part of a table or if you have made many changes to a table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file.
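To gauge how much reclaimable space a table is carrying, before and after the rebuild, the Data_free column of SHOW TABLE STATUS gives a rough indication (the table name is a placeholder):

SHOW TABLE STATUS LIKE 'my_big_table';
-- compare Data_length (used) with Data_free (allocated but unused)

Note that without innodb_file_per_table, Data_free reports free space in the shared tablespace rather than in the individual table.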

SQL Server Indexes - Initial slow performance after creation

Using SQL Server 2005. This is something I've noticed while doing some performance analysis.
I have a large table with about 100 million rows. I'm comparing the performance of different indexes on the table, to see which is most optimal for my test scenario, which does about 10,000 inserts on that table, among other things on other tables. While my test is running, I capture a SQL Profiler trace, which I load into a SQL table when the test has finished so I can analyse the stats.
The first test run after recreating a different set of indexes on the table is very noticeably slower than subsequent runs - typically about 10-15 times slower for the inserts on this table on the first run after the index creation.
Each time, I clear the data and execution plan cache before the test.
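(For anyone reproducing this: on SQL Server 2005, clearing those two caches is presumably done with something like the following.)

CHECKPOINT;                 -- flush dirty pages so clean buffers can be dropped
DBCC DROPCLEANBUFFERS;      -- clear the data (buffer) cache
DBCC FREEPROCCACHE;         -- clear the execution plan cache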
What I want to know, is the reason for this initial poorer performance with a newly created set of indexes?
Is there a way I can monitor what is happening to cause this for the first run?
One possibility is that the default fill factor of zero is coming into play.
This means there is 'no room' in the index to accommodate your inserts. When you insert, a page split is needed: roughly half the rows move to a newly allocated page, creating empty space to store the new index information. As you carry out more inserts, more space is created in the index. After a while the rate of splitting goes down, because your inserts hit pages that are not fully filled, so splits are not needed. An insert that requires a page split is more expensive than one that doesn't.
You can set the fill factor when you create the index. It's a classic trade-off between space used and the performance of different operations.
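A sketch of setting it at creation time; the index and column names are hypothetical, and FILLFACTOR = 80 leaves 20 percent of each leaf page empty for future inserts:

CREATE NONCLUSTERED INDEX IX_BigTable_SomeCol
    ON dbo.BigTable (SomeCol)
    WITH (FILLFACTOR = 80);

With free space already on the pages, the first batch of inserts causes far fewer page splits, at the cost of a larger index on disk.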
I'm going to include a link to some Sybase ASE docs, 'cos they are nicely written and mostly applicable to SQL Server too.
Just to clarify:
1) You build an index on a table with 100m pre-existing rows.
2) You insert 10k rows into the table
3) You insert another 10k rows into the table
Step 3 is 10x faster than step 2?
What kind of index is the new index? Not clustered, right? Because inserts on a clustered index will cause very different behavior. In addition, is there any significant difference in the profile of the two insert runs? Depending on the clustered index, they will behave differently. Typically, the table should either have no clustered index or be clustered on an increasing key.