How to find out how much space a SQL Server table uses?

Is it possible to get the amount of space on disk that a particular table uses? Let's say I have a million users stored in my table and I want to know how much space is required to store all of them and/or just one.
Update:
I'm planning to use Redis to cache some fields from one particular table in memory so the needed data can be retrieved quickly. So I need to estimate roughly how much space it will take and whether it will fit in memory or not. Of course this depends on the data types I use in my table, but if a table consists of several dozen fields it would take too much time to work this out field by field.
There is exactly such an answer for MySQL, though it's not suitable for SQL Server: How can you determine how much disk space a particular MySQL table is taking up? You can check it to see what I mean.

If you have SSMS, you can right-click the table in Object Explorer, go to Properties, and then look at the Storage page. The Data space field shows the size of the data in that table, but it probably does not include some of the table's overhead.
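If you'd rather not click through SSMS, the same numbers are available from T-SQL via sp_spaceused. A minimal sketch, assuming the table is called dbo.Users (the name is just an illustration):

-- Rows, reserved space, data size, index size and unused space for one table (sizes are in KB)
EXEC sp_spaceused 'dbo.Users';

-- Run it with no argument to get the size of the whole database instead
EXEC sp_spaceused;

The reserved/data/index_size columns are reported in KB, so you can compare them directly against the memory you plan to give Redis.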

This is really an extended comment, because it does not directly answer the question.
For most purposes, you just take the sizes of the columns, add them together, and multiply by the number of rows. This lowballs the estimate, but it is reasonable. And (depending on how you handle the types) it might be a reasonable estimate of the size of the exported data.
That said, the storage of tables is a difficult matter. Here are some of the factors you need to take into account:
The size of individual fields. This is made slightly more difficult because some types have varying sizes, so those are entirely data dependent.
The number of pages occupied by the table (or, equivalently, how full each data page is). Note that this can vary over time as rows are inserted, updated, and deleted.
The number of pages occupied by "overflow" data types, such as varchar(max).
Whether or not the data pages are compressed or encrypted.
The indexes for the table.
How full each index page is.
And, no doubt, I've left out a bunch of other relevant internal details (here is a place to start on page layouts).
In other words, there isn't a simple answer. Equivalent tables on two different systems could occupy very different amounts of space. This is true of the "same" table on the same system at different times.
The general answer when working with databases is that you need a lot more space than number of rows * row size -- I seem to recall using a factor of 3 at one point in time. In general, storage is pretty cheap, so this is not usually the limiting factor when using a database.
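That said, if you want a number that already reflects most of the factors above, you can read the allocation counts straight from the DMVs instead of estimating. A sketch, assuming SQL Server 2005 or later, permission to read sys.dm_db_partition_stats, and a hypothetical dbo.Users table:

-- Pages reserved/used by the table and all of its indexes, converted to KB (pages are 8 KB)
SELECT
    OBJECT_NAME(ps.object_id) AS table_name,
    SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_count,
    SUM(ps.reserved_page_count) * 8 AS reserved_kb,
    SUM(ps.used_page_count) * 8     AS used_kb
FROM sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('dbo.Users')
GROUP BY ps.object_id;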

We would need to see your full database schema, with tables, columns, and each field's data type. Without that information it's just a lucky guess. Here is a helpful cheat sheet of the sizes of each data type: https://www.connectionstrings.com/sql-server-2012-data-types-reference/
Then you just have to do the math and calculate the space needed for X rows, where X is your number of records.
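If the goal is only a rough Redis sizing for a handful of fields, you can also let the server do the arithmetic with DATALENGTH instead of working through the cheat sheet by hand. A sketch with made-up column names (an int id plus two variable-length fields; adjust for your actual columns):

-- Average and total in-row bytes for just the columns you intend to cache
SELECT
    COUNT(*) AS user_count,
    AVG(4 + ISNULL(DATALENGTH(username), 0) + ISNULL(DATALENGTH(email), 0)) AS avg_bytes_per_user,
    SUM(CAST(4 + ISNULL(DATALENGTH(username), 0) + ISNULL(DATALENGTH(email), 0) AS bigint)) AS total_bytes
FROM dbo.Users;

Add some headroom on top of this for Redis's own per-key overhead.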

Related

Which is more space efficient, multiple columns or multiple rows? [closed]

Suppose I have a table A with 100 columns of the same data type and 100 rows.
Table B has 2 columns and 5000 rows of the same data type as table A's columns.
Which table takes more disk space to store, and which is more efficient?
Either a table has 2 columns or it has 100. You would not convert one into the other, or you would be doing something very wrong.
A product table may have 100 columns (item number, description, supplier number, material, list price, actual price ...). How would you make this a two column table? A key-value table? A very bad idea.
A country table may have 2 columns (iso code and name). How would you make this a 100-columns table? By having columns usa_name, usa_code, germany_name, germany_code, ...? An even worse idea.
So: The question is out of question :-) There is nothing to decide between.
The real answer here is... it depends.
Oracle stores its data in "data blocks", which are stored in "extents", which are stored in "segments", which make up the "tablespace". See here.
A data block is much like a block used to store data for the operating system. In fact, an Oracle data block should be specified in multiples of the operating system's blocks so there isn't unnecessary I/O overhead.
A data block is split into 5 chunks:
The Header - Which has information about the block
The Table Directory - Tells Oracle that this block contains info about whatever table it is storing data for.
Row Directory - The portion of the block that stores information about the rows in the block, such as their addresses.
Row data - The meat and potatoes of the block, where the row data is stored. Keep in mind that rows can span blocks.
Free space - This is the middle of the bingo board and you don't have to actually put your chip here.
So, for this question, the two important parts of Oracle data storage in its data blocks are the Row Data and the Row Directory (and, to some extent, the Free Space).
In your first table you have very large rows, but fewer of them. This would suggest a smaller row directory (unless it spans multiple blocks because of the size of the rows, in which case it would be Rows*Blocks-Necessary-To-Store-Them). In your second table you have more rows, which would suggest a larger row directory than the first table.
I believe that a row directory entry is two bytes. It describes the offset in bytes from the start of the block where the row data can be found. If the data types for the two columns in your second table are TINYINT, then each row would be 2 bytes as well. In effect, you have more rows, so the directory here is as big as your data: roughly datasize*2, which means more total storage for this table.
The other gotcha here is that data stored in the row directory of a block is not deleted when the row is deleted. The header that contains the row directory in the block is only reused when a new insert comes along that needs the space.
Also, every block keeps some free space for storing more rows and header information, as well as for holding transaction entries (see the link above for that).
At any rate, it's unlikely that the row directory in a given block would be larger than your row data, and even then Oracle may be holding onto free space in the block that trumps both, depending on the size of the table, how often it's accessed, and whether Oracle is automatically managing free space for you or you're managing it manually (does anyone do that?).
Also, if you toss an index on either of these tables, you'll change the stats all around anyway. Indexes are stored like tables, and they have their own segments, extents, and blocks.
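If you'd rather measure than reason about blocks, Oracle exposes the allocated size of each segment directly. A small sketch, assuming you have access to USER_SEGMENTS and that the tables really are named A and B as in the question:

-- Blocks and bytes allocated to each table's segment (indexes appear as their own segments)
SELECT segment_name,
       segment_type,
       blocks,
       ROUND(bytes / 1024 / 1024, 2) AS size_mb
FROM   user_segments
WHERE  segment_name IN ('A', 'B');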
In the end, your best bet is to not worry too much about the blocks and whatnot (after all, storage is cheap) instead:
Define appropriate field types for your data. Don't store boolean values in a CHAR(100), for instance.
Define your indexes wisely. Don't add an index just to be sure. Make good decisions when you're tuning.
Design your schema for your end user's needs. Is this a reporting database? In that case, shoot for denormalized pre-aggregated data to keep reads fast. Try to keep down the number of joins a user needs to get at their result set.
Focus on cutting CPU and I/O requirements based on the queries that will come through for the schema you have created. Storage is cheap; CPU and I/O aren't, and your end user isn't going to give a rat's ass about how many hard drives (or how much RAM, if it's in-memory) you needed to cram into your box. They are going to care about how quickly the application reads and writes.
P.S. Forgive me if I misrepresented anything here. Logical database storage is complicated stuff and I don't deal much with Oracle, so I may be missing a piece of the puzzle, but the overall gist is the same. There is the actual data you store, and then there is the metadata for that data. It's unlikely that the metadata will trump the data itself in size, but given the right circumstances it's possible (especially with indexing figured in). And, in the end, don't worry too much about it anyway. Focus on the needs of the end user/application in designing your schema. The end user will balk a hell of a lot more than your box.
Efficiency is a nebulous concept, and it depends on what you are measuring. If you have to jump through hoops to extract data that is poorly indexed (or requires function-based indexes) because disk space was deemed more important than proper design, then I'd say you will wind up with a far less efficient app from a data-retrieval standpoint, not to mention having to deal with the code complexity implemented to try and overcome the poor design.
Considering that each column has to store some meta data, I'm guessing that table B might be more space efficient, since the size of your actual data is constant and equal in both cases.
In terms of memory, I think it depends on the type of data (images, videos, int, varchar... etc.) stored in the tables (assuming you do not mean that both tables contain the same data, as I do not see how you'd change the columns into rows).
In terms of efficiency, I hope I am right if I say that table B is more efficient, as indexing 2 columns is easier than indexing 100, so data can be retrieved more easily in any way possible compared to a table with 100 columns, where some kinds of query might take longer.

SQL - split a large table according to how frequently its fields are accessed?

I have a table that has 50 fields:
10 Fields that are almost always needed.
40 Fields that are very rarely needed.
I would roughly say that the fields in (1) need to be accessed 1000 times more frequently than the fields in (2).
Should I split them to two tables with one-to-one relation, or keep all in the same table?
The process that you are describing is sometimes referred to as "vertical partitioning". Taken to an extreme (one column per vertical partition), this is how columnar databases store data. Unfortunately (to the best of my knowledge), Postgres does not currently have direct support for vertical partitioning.
Your idea of splitting the data into two tables is fine. I would note the following:
You will need to modify queries that use the extra columns to use the second table. (You can wrap the join into a view which you use when you want the extra columns; see the sketch after this list.)
If both tables have a clustered primary key that connects them, then the join should be really fast.
If you are inserting/updating/deleting data, then you need to be careful about synchronization. I think you can handle this with an INSTEAD OF trigger on a view combining the tables.
If some records do not have extra columns, this can be a big win on the space side.
If all records and all columns are going to be loaded into the cache, then this probably is not a big win.
This can be a big performance win, under some circumstances. But there is additional manual work to keep the tables synchronized.
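To make the idea concrete, here is a minimal sketch of the split in T-SQL; the table and column names are purely illustrative:

-- "Hot" table: the ~10 frequently accessed fields
CREATE TABLE dbo.Orders_Hot (
    OrderId int           NOT NULL PRIMARY KEY,   -- clustered PK shared by both tables
    Status  varchar(20)   NOT NULL,
    Total   decimal(10,2) NOT NULL
    -- ...the rest of the frequently used columns
);

-- "Cold" table: the ~40 rarely accessed fields, one-to-one with the hot table
CREATE TABLE dbo.Orders_Cold (
    OrderId int NOT NULL PRIMARY KEY
        REFERENCES dbo.Orders_Hot (OrderId),
    Notes   nvarchar(max) NULL
    -- ...the rest of the rarely used columns
);
GO

-- A view that hides the split from queries that still want all columns
CREATE VIEW dbo.Orders_All AS
SELECT h.OrderId, h.Status, h.Total, c.Notes
FROM dbo.Orders_Hot AS h
LEFT JOIN dbo.Orders_Cold AS c ON c.OrderId = h.OrderId;

Queries that only touch the hot columns never pay for the wide, rarely used ones; queries that need everything go through the view and pay one primary-key join.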
There's really not nearly enough information here to estimate (never mind actually quantify) what the benefits might be, but the costs are very clear -- more complex code, a more complex schema, probably greater overall space usage, and a performance overhead when adding and removing rows.
A performance improvement might come from scanning a smaller amount of data when performing a full table scan, or from an increased likelihood in finding data blocks in memory when required, and an overall smaller memory footprint, but without specific information on the types of operation commonly performed, and whether the server is under memory pressure, no reliable advice can be given.
Be very wary of making your system more complex as a side-effect of uncertain performance gains.

Is altering the Page Size in SQL Server the best option for handling "Wide" Tables?

I have multiple tables in my application that are both very wide and very tall. The width comes from sometimes 10-20 columns with a variety of data types: varchar/nvarchar as well as char/bigint/int/decimal. My understanding is that the default page size in SQL is 8k, but can be manually changed. Also, that varchar/nvarchar columns are exempt from this restriction and are often (always?) moved to a separate location, a process called row-overflow. Even so, MS documentation states that row-overflow data will degrade performance: "querying and performing other select operations, such as sorts or joins on large records that contain row-overflow data slows processing time, because these records are processed synchronously instead of asynchronously"
They recommend moving large columns into joinable metadata tables. "This can then be queried in an asynchronous JOIN operation".
My question is, is it worth enlarging the page size to accommodate the wide columns, and are there other performance problems that'd come up? If I didn't do that and instead partitioned the table into 1 or more metadata tables, and the tables got "big", like in the 100MM-record range, wouldn't joining the partitioned tables far outweigh the benefits? Also, if the SQL Server is on a single-core machine (or on SQL Azure), my understanding is that parallelism is disabled, so would that also eliminate the benefit of moving the tables into partitions, given that the join would no longer be asynchronous? Any other strategies that you'd recommend?
EDIT: Per the great comments below and some additional reading (that I should've done originally), you cannot manually alter the SQL Server page size. Also, related SO post: How do we change the page size of SQL Server?. There is an additional great answer there from @remus-rusanu.
You cannot change the page size.
varchar(x) and (MAX) are moved off-row when necessary - that is, there isn't enough space on the page itself. If you have lots of large values it may indeed be more effective to move them into other tables and then join them onto the base table - especially if you're not always querying for that data.
There is no concept of synchronously and asynchronously reading that off-row data. When you execute a query, it's run synchronously. You may have parallelization but that's a completely different thing, and it's not affected in this case.
Edit: To give you more practical advice you'll need to show us your schema and some realistic data characteristics.
"My understanding is that the default page size in SQL is 8k, but can be manually changed"
The 'large pages' setting refers to memory allocations, not to changing the database page size. See SQL Server and Large Pages Explained. I'm afraid your understanding is a little off.
As general non-specific advice, for wide fixed-length columns the best strategy is to deploy row compression. For nvarchar, Unicode compression can help a lot. For specific advice, you need to measure. What is the exact performance problem you encountered? How did you measure it? Did you use a methodology like Waits and Queues to identify the bottlenecks, and are you positive that row size and off-row storage are the issue? It seems to me that you used the other 'methodology'...
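For reference, row compression is a one-line rebuild on SQL Server 2008 and later (Unicode compression for nvarchar came with 2008 R2 and kicks in automatically once row or page compression is enabled); the table name below is hypothetical:

-- Estimate the savings first (arguments: schema, table, index_id, partition_number, compression type)
EXEC sp_estimate_data_compression_savings 'dbo', 'WideTable', NULL, NULL, 'ROW';

-- Then rebuild the table with row compression (indexes can be rebuilt the same way)
ALTER TABLE dbo.WideTable REBUILD WITH (DATA_COMPRESSION = ROW);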
you can't change the default 8k page size
varchar and nvarchar are treated like any other field unless they are (max), in which case they are stored a little differently, because their values can exceed the size of a page, which you can't do with any other data type.
For example, if you try to execute this statement:
create table test_varchars(
    a varchar(8000),
    b varchar(8001),
    c nvarchar(4000),
    d nvarchar(4001)
)
Columns a and c are fine because both of them are at most 8000 bytes in length.
But, you would get the following errors on columns b and d:
The size (8001) given to the column 'b' exceeds the maximum allowed for any data type (8000).
The size (4001) given to the parameter 'd' exceeds the maximum allowed (4000).
because both of them exceed the 8000-byte limit. (Remember that the n in front of varchar or char means Unicode and occupies double the space.)

mysql - Creating rows vs. columns performance

I built an analytics engine that pulls 50-100 rows of raw data from my database (let's call it raw_table), runs a bunch of statistical measurements on it in PHP, and then comes up with exactly 140 datapoints that I then need to store in another table (let's call it results_table). All of these data points are very small numbers ("40", "2.23", "-1024" are good examples of the types of data).
I know the maximum number of columns for MySQL is quite high (4000+), but there appears to be a lot of grey area as to when performance really starts to degrade.
So a few questions here on best performance practices:
1) The 140 datapoints could, if fewer columns is better, be broken up into 20 rows of 7 data points, all with the same 'experiment_id'. HOWEVER, I would always need to pull ALL 20 rows (with 7 columns each, plus id, etc.), so I wouldn't think this would perform better than pulling 1 row of 140 columns. So the question: is it better to store 20 rows of 7-9 columns (that would all need to be pulled at once) or 1 row of 140-143 columns?
2) Given my data examples ("40","2.23","-1024" are good examples of what will be stored) I'm thinking smallint for the structure type. Any feedback there, performance-wise or otherwise?
3) Any other feedback on mysql performance issues or tips is welcome.
Thanks in advance for your input.
I think the advantage of storing more rows (i.e. normalized) depends on design and maintenance considerations in the face of change.
Also, consider whether the 140 columns all have the same meaning or whether it differs per experiment, and model the data properly according to normalization rules, i.e. how the data relates to a candidate key.
As far as performance, if all the columns are used it makes very little difference. Sometimes a pivot/unpivot operation can be expensive over a large amount of data, but it makes little difference on a single key access pattern. Sometimes a pivot in the database can make your frontend code a lot simpler and backend code more flexible in the face of change.
If you have a lot of NULLs, it might be possible to eliminate rows in a normalized design and this would save space. I don't know if MySQL has support for a sparse table concept, which could come into play there.
You have 140 data items to return every time, each of type double.
It makes no practical difference whether this is 1x140, 20x7, 7x20, 4x35, etc. It could be infinitesimally quicker for one shape, of course, but then have you considered the extra complexity in the PHP code to deal with a different shape?
Do you have a verified bottleneck, or is this just random premature optimisation?
You've made no suggestion that you intend to store big data in the database, but for the purposes of this argument, I will assume that you have 1 billion (10^9) data points.
If you store them in 140 columns, you'll have a mere 7 million rows; however, if you want to retrieve a single data point from lots of experiments, then it will have to fetch a large number of very wide rows.
These very wide rows will take up more space in your innodb_buffer_pool, hence you won't be able to cache so many; this will potentially slow you down when you access them again.
If you store one datapoint per row, in a table with very few columns (experiment_id, datapoint_id, value) then you'll need to pull out the same number of smaller rows.
However, the size of rows makes little difference to the number of IO operations required. If we assume that your 1 billion datapoints don't fit in RAM (which is NOT a safe assumption nowadays), maybe the resulting performance will be approximately the same.
Using fewer columns is probably the better database design, but using lots of columns will use less disk space and will perhaps be faster to populate.
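For reference, the narrow "one datapoint per row" layout mentioned above would look something like this; the table and column names are assumptions, and DECIMAL(10,2) is just one plausible type for values like "2.23":

CREATE TABLE experiment_results (
    experiment_id   INT UNSIGNED      NOT NULL,
    datapoint_id    SMALLINT UNSIGNED NOT NULL,  -- 1..140
    datapoint_value DECIMAL(10,2)     NOT NULL,
    PRIMARY KEY (experiment_id, datapoint_id)    -- fetching one experiment stays a short range scan
) ENGINE=InnoDB;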

Is there any harm choosing a large value for varchar in MySQL?

I'm about to add a new column to my table with 500,000 existing rows. Is there any harm in choosing a large value for the varchar? How exactly are varchars allocated for existing rows? Does it take up a lot of disk space? How about memory effects during run time?
I'm looking for MySQL-specific behavior details, not general software design advice.
There's no harm in choosing a large value for a varchar field. Only the actual data will be stored, and MySQL doesn't allocate the full specified length for each record. It stores the actual length of the data along with the field, so it doesn't need to store any padding or allocate unused memory.
Depends on what you're doing. See the relevant documentation page for some of the details:
http://dev.mysql.com/doc/refman/5.0/en/char.html
The penalty in disk space isn't really any different than what you have for e.g. TEXT types, and from a performance perspective it MAY actually be faster.
The primary problem is the maximum row size. Note that the exact implications of this differ between storage engines. Consult the MySQL docs for your storage engine of choice for maximum row size information.
I should also add that there can be performance benefits to minimizing row size, but it really depends on your workload, indexing, and just how big the rows are, whether or not it will be meaningful for you.
MySQL VARCHAR fields store the contents plus a length prefix of 1 or 2 bytes (1 byte when the column's maximum length is 255 bytes or less, 2 bytes otherwise). So even empty VARCHAR fields use up a little space to mark their lengths.
Also, if this is the only VARCHAR field in your table, and your storage engine is MyISAM, it would force dynamic row format which may yield a performance hit (testing will confirm).
http://dev.mysql.com/doc/refman/5.0/en/column-count-limit.html
http://dev.mysql.com/doc/refman/5.0/en/dynamic-format.html
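If you want to check whether adding the column actually pushed a MyISAM table into dynamic row format, the Row_format column of SHOW TABLE STATUS will tell you; the table name here is hypothetical:

-- Look at the Row_format column of the output (Fixed vs Dynamic)
SHOW TABLE STATUS LIKE 'users';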