How is data stored in SQL Server? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
How is data stored in SQL Server?

This Wikipedia article describes it rather well.
Here is a subset of it, relating to Data Storage:
Data storage
The main unit of data storage is a database, which is a collection of tables with typed columns. SQL Server supports different data types, including primary types such as Integer, Float, Decimal, Char (fixed-length character strings), Varchar (variable-length character strings), Binary (for unstructured blobs of data), and Text (for textual data), among others. It also allows user-defined composite types (UDTs) to be defined and used. SQL Server also makes server statistics available as virtual tables and views (called Dynamic Management Views or DMVs). A database can also contain other objects including views, stored procedures, indexes and constraints, in addition to tables, along with a transaction log. A SQL Server database can contain a maximum of 2^31 objects, and can span multiple OS-level files with a maximum file size of 2^20 TB. The data in the database is stored in primary data files with an .mdf extension. Secondary data files, identified with an .ndf extension, allow the data of a database to be spread across additional files. Log files are identified with the .ldf extension.
Storage space allocated to a database is divided into sequentially numbered pages, each 8 KB in size. A page is the basic unit of I/O for SQL Server operations. A page is marked with a 96-byte header which stores metadata about the page, including the page number, page type, free space on the page and the ID of the object that owns it. The page type defines the data contained in the page: data stored in the database, an index, an allocation map (which holds information about how pages are allocated to tables and indexes), a change map (which holds information about the changes made to other pages since the last backup or logging), or large data types such as image or text. While the page is the basic unit of an I/O operation, space is actually managed in terms of an extent, which consists of 8 pages. A database object can either span all 8 pages in an extent ("uniform extent") or share an extent with up to 7 more objects ("mixed extent"). A row in a database table cannot span more than one page, so it is limited to 8 KB in size. However, if the data exceeds 8 KB and the row contains Varchar or Varbinary data, the data in those columns is moved to a new page (or possibly a sequence of pages, called an allocation unit) and replaced with a pointer to the data.
For physical storage of a table, its rows are divided into a series of partitions (numbered 1 to n). The partition size is user-defined; by default all rows are in a single partition. A table is split into multiple partitions in order to spread a database over a cluster. Rows in each partition are stored in either a B-tree or a heap structure. If the table has an associated index to allow fast retrieval of rows, the rows are stored in order according to their index values, with a B-tree providing the index. The actual data is held in the leaf nodes of the B-tree, with the other nodes storing the index values that locate the leaf data reachable from them. If the index is non-clustered, the rows are not sorted according to the index keys. An indexed view has the same storage structure as an indexed table. A table without an index is stored in an unordered heap structure. Both heaps and B-trees can span multiple allocation units.
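If you want to see how many of these 8 KB pages a given table occupies, and in which allocation units, here is a quick sketch against the documented DMV sys.dm_db_partition_stats (the table name dbo.Orders is just a placeholder):
SELECT partition_number,
       in_row_used_page_count,        -- 8 KB pages holding regular row data
       row_overflow_used_page_count,  -- pages for varchar/varbinary values pushed off-row
       lob_used_page_count            -- pages for large types such as text, image, varbinary(max)
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.Orders');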

SQL Server data is stored in data files that, by default, have an .MDF extension. The log files (.LDF) are sequential files used by SQL Server to log transactions executed against the databases on the SQL Server instance (more on instances in a moment). Log files are truncated automatically when using the SIMPLE recovery model, but not when using the BULK_LOGGED or FULL recovery models.
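For example (the database name MyDb is a placeholder), the recovery model is a per-database setting:
ALTER DATABASE MyDb SET RECOVERY SIMPLE;   -- or FULL, or BULK_LOGGED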
Instances allow for more than one installation of SQL Server on a single machine. If an instance is nameless, it is the default instance; named instances are possible as well. For example:
MACHINENAME <-- the default instance is just the machine name
MACHINENAME\Test <-- this is the "Test" instance on this machine
You can use tools like SQL Server Management Studio (as of SQL Server 2005) or Enterprise Manager (SQL Server 2000 and before) to interact with the instance & the databases under the instance.
All instances (as of SQL Server 2005) will have a hidden resource database, as well as the master, model, msdb, and tempdb databases. These are the "system" databases.
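For illustration (machine and instance names are placeholders), you connect to each instance separately, e.g. with sqlcmd, and can list its databases once connected:
sqlcmd -S MACHINENAME        <-- connects to the default instance
sqlcmd -S MACHINENAME\Test   <-- connects to the "Test" named instance
-- then, inside the session:
SELECT name FROM sys.databases;   -- master, model, msdb, tempdb plus any user databases (resource stays hidden)
GO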
Not sure what else you're looking for. Hope that helps.
EDIT:
Oh yeah, physically, data in the "data files" (.MDF files, by default) is structured in what are known as "pages" in SQL Server. Data in the log files (.LDF files) is stored sequentially. In the enterprise, the data and log files are sometimes split on different physical hard drives for better disk I/O. Or hardware RAID is used for this purpose.
EDIT2:
Forgot to mention filegroups. Using filegroups, you can design your logical database schema such that elements of that schema are physically separated, typically to distribute the physical database across different hard drives. For example, you could have a data filegroup, an indexes filegroup, and an images filegroup (for binary images).
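A minimal sketch of that idea, assuming a database named MyDb and made-up file paths (adjust the names to your environment):
ALTER DATABASE MyDb ADD FILEGROUP IndexData;
ALTER DATABASE MyDb ADD FILE
    (NAME = MyDb_IndexData, FILENAME = 'E:\SQLData\MyDb_IndexData.ndf', SIZE = 100MB)
    TO FILEGROUP IndexData;
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    ON IndexData;   -- the index pages are physically stored in the IndexData filegroup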

I recommend the book 'Microsoft SQL Server 2008 Internals' -- in fact anything by Kalen Delaney on internals is good, IMO.

SQL Server is a Relational Database Management System:
A Relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model as introduced by E. F. Codd. Most popular commercial and open source databases currently in use are based on the relational model. A short definition of an RDBMS may be a DBMS in which data is stored in the form of tables and the relationship among the data is also stored in the form of tables.

You can take this just about as deep as you want to go, but for SQL Server 2008, Files and Filegroups Architecture on MSDN is a good overview of basic database architecture.
The MSDN site will be a valuable resource if you need even more detailed specifics on how SQL Server 2008 stores data.

What is RDBMS?
RDBMS stands for Relational Database Management System. RDBMS data is structured in database tables, fields and records. Each RDBMS table consists of database table rows, and each database table row consists of one or more database table fields.
An RDBMS stores the data in a collection of tables, which might be related by common fields (database table columns). An RDBMS also provides relational operators to manipulate the data stored in the database tables. Most RDBMSs use SQL as the database query language.
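A toy example of that idea (table and column names are made up): both the data and the relationship between tables are expressed as tables with a common field:
CREATE TABLE Customers (CustomerID int PRIMARY KEY, Name nvarchar(100));
CREATE TABLE Orders (
    OrderID    int PRIMARY KEY,
    CustomerID int REFERENCES Customers (CustomerID),  -- the relationship: a common field plus a constraint
    OrderDate  date
);
-- a relational operator (a join) follows the common field
SELECT c.Name, o.OrderID, o.OrderDate
FROM Customers AS c
JOIN Orders AS o ON o.CustomerID = c.CustomerID;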
Edgar Codd introduced the relational database model. Many modern DBMSs do not conform to Codd's definition of an RDBMS, but they are nonetheless still considered RDBMSs.
The most popular RDBMSs are MS SQL Server, DB2, Oracle and MySQL.
Source

Related

SQL Server row length calculator

I am looking for an up to date tool to accurately calculate the total row size and page-density of any SQL table definition for SQL Server 2005+.
Please note that there are plenty of resources concerning calculating sizes of rows in existing tables, estimating techniques for sizing, etc... However, I am designing tables and have some options about column size which I am trying to balance with efficient data access - meaning that I can relocate less-frequently accessed long text into dedicated tables to allow the most frequent access of these new tables to operate at optimum speed.
Ideally there would be an online facility where a create statement can be cut and pasted, or a sproc I can run on a dev db.
The answer is a simple one until you start making proper table design and balancing that against joins, FK data and disk access.
I'd have a look and see how many data pages you are using, and remember that one reads an extent (8 data pages) from disk, not only the data page you are looking for. Then there is the option of data compression in your table, as well as sparse columns, out-of-row data storage and variable-length characters.
It's not about how much data is in a column, it's really about how many data reads and how much CPU you need to get it. This you can test by executing a query and looking at the ACTUAL QUERY PLAN.
As for space used, you can use the stored procedure sp_spaceused. Here is a source you can use to see how one could use it in dbforms.
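For example (dbo.Orders is a placeholder table), checking space used and the actual read counts of a query looks like this:
EXEC sp_spaceused 'dbo.Orders';   -- rows, reserved, data, index_size, unused
SET STATISTICS IO ON;             -- report logical/physical page reads per table
SELECT * FROM dbo.Orders WHERE OrderID = 42;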
Hope it helps
Walter

Difference between Partition and Page in SQL Server?

What is the difference between a partition and a page in SQL Server? Are they available by default or do we need to create them explicitly?
Page is the most basic element of storage in SQL Server.
In SQL Server, the page size is 8 KB. This means SQL Server databases have 128 pages per megabyte. Each page begins with a 96-byte header that is used to store system information about the page. This information includes the page number, page type, the amount of free space on the page, and the allocation unit ID of the object that owns the page.
Partition: Partitioning allows a table, index, or index-organized table to be subdivided into smaller pieces, where each piece of such a database object is called a partition. Each partition has its own name, and may optionally have its own storage characteristics. The data of partitioned tables and indexes is divided into units that can be spread across more than one filegroup in a database. The data is partitioned horizontally, so that groups of rows are mapped into individual partitions.
SQL Server 2012 supports up to 15,000 partitions by default. In earlier versions, the number of partitions was limited to 1,000 by default. On x86-based systems, creating a table or index with more than 1,000 partitions is possible, but is not supported.
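Pages exist automatically; extra partitions have to be created explicitly via a partition function and scheme. A minimal sketch (object names and boundary dates are hypothetical):
CREATE PARTITION FUNCTION pfOrderYear (date)
    AS RANGE RIGHT FOR VALUES ('2011-01-01', '2012-01-01');  -- 3 partitions
CREATE PARTITION SCHEME psOrderYear
    AS PARTITION pfOrderYear ALL TO ([PRIMARY]);             -- all partitions on one filegroup
CREATE TABLE dbo.Orders
(
    OrderID   int  NOT NULL,
    OrderDate date NOT NULL
) ON psOrderYear (OrderDate);                                -- rows mapped to partitions by OrderDate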

Store file on file system or as varbinary(MAX) in SQL Server

I understand that there is a lot of controversy over whether it is bad practice to store files as blob's in a database, but I just want to understand whether this would make sense in my case.
I am creating an ASP.NET application, used internally at a large company, where users need to be able to attach files to a 'job' in the system. These files are generally PDFs or Word documents, probably never exceeding a couple of MB.
I am creating a new table like so:
ID (int)
JobID (int)
FileDescription (nvarchar(250))
FileData (varbinary(MAX))
Is the use of varbinary(MAX) here ideal, or should I be storing the path to the file and simply storing the file on the file system somewhere?
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or documents are typically below 256 KB in size, storing them in a database VARBINARY column is more efficient
if your pictures or documents are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep it in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.
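A rough sketch of that separation (table and column names are hypothetical):
CREATE TABLE dbo.Employee (
    EmployeeID int IDENTITY(1,1) PRIMARY KEY,
    Name       nvarchar(100) NOT NULL
);
CREATE TABLE dbo.EmployeePhoto (
    EmployeeID int PRIMARY KEY REFERENCES dbo.Employee (EmployeeID),
    Photo      varbinary(max) NOT NULL
);
-- everyday queries touch only dbo.Employee; the blob is joined in only when actually needed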
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it LARGE_DATA.
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(
    -- define the fields here; these two columns are just an illustration
    ID INT NOT NULL PRIMARY KEY,
    FileData VARBINARY(MAX)
)
ON [Data]                 -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON [LARGE_DATA] -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!

Is it sensible to store long, unique text strings in OLAP cubes for drillthrough retrieval (especially in SSAS)?

I'm motivated to store some long text strings in an OLAP cube, long on the order of 1,000s or 10,000s of characters -- but I'm wondering if this will lead me astray. (I'm also curious to learn a little more about how OLAP engines handle strings.) The particular use case I have in mind is that I have a unique, pre-existing "record description" for each of my OLAP facts, and I want to put those descriptions in the cube so that I have the option to get them back when I do a DRILLTHROUGH operation. In contrast, I don't need the record descriptions to appear when doing normal pivot table / aggregate type operations. (The descriptions are too long to display sensibly in a pivot table, plus each fact has a unique description, meaning it doesn't make sense to aggregate over descriptions.) My current dataset has around 700,000 facts, though I'm also curious if the answer would change for larger datasets.
My hope was that an OLAP server could do something sensible if I put these long strings in a cube. In the Sql Server / SSAS case in particular, I thought perhaps I'd put them in a dimension marked as ROLAP, to save memory usage, and use a degenerate dimension (aka a "fact dimension", in SSAS terminology), to avoid needless ETL complexities. But I'm curious if this would be regarded as a horrible practice for some reason, or if there are any hidden gotchas.
Update: My example use case is where you have a string associated with each OLAP fact. But it might also be instructive to consider the case where the strings are instead associated with each particular value of a particular dimension. (e.g. Suppose you had a Company dimension and each company had a somewhat lengthy Company Description string.)
Here's what I've been able to uncover about the implications of storing such strings in SSAS, especially SSAS 2008. Where I consider data structures, it's exclusively focused on MOLAP storage, which is what I've been experimenting with.
First, standard MS ETL (extract/transform/load, i.e. data import) tools like Business Intelligence Development Studio may try to prevent you from importing large textfields, especially varchar(max) fields, but there is a workaround, and it's proven effective for me. (For BIDS it involves manually setting the DataSize element in an XML file, potentially to the magic size of 163315555 bytes. Props to Matija Lah for figuring this out.)
Second, as far as I can tell, storing lots of long, unique strings shouldn't wreak havoc on the on-disk data structures used by SSAS. Also, the size of the string data on disk should be of the same order of magnitude as the string data in your data source. Here's some rough info on how SSAS handles strings:
The core OLAP data structures (e.g. for the attributes of a dimension, or for the facts of a measure group) don't directly contain strings; instead they contain offsets into "string store" files (extensions .ksstore, .asstore, .bsstore, or .string.data), which contain the actual string data.
Within a given string store, each string is represented only once. If several rows in your source data tables contain duplicate strings, then at the SSAS/MOLAP level, that will translate into duplicated file offsets, rather than duplicated string values.
If your source string has length n, then the corresponding data structure in the string store has 8-ish bytes of overhead, plus 2*n bytes for the character data. (Strings are inherently stored in 2-byte Unicode format in SSAS.)
For some fantastic detail about this stuff, I suggest the book Microsoft SQL Server 2008 Analysis Services Unleashed, in particular chapter 20, "The Physical Data Model".
At least in my experiments, string store files do not seem to be compressed -- at least they're not notably smaller than an uncompressed string store would be.
I've verified experimentally that text data takes the same order of magnitude of bytes whether stored in SSAS MOLAP or in a SQL table. In particular, I did a "select sum(len(myfield)) from mytable" on one of my dimension tables, and then compared it to the size of the corresponding attribute's files in my SSAS data directory. Size was 172 MB in SQL and 304 MB in SSAS. (The SQL size was 147 MB if I summed all unique strings, rather than all strings.) In my case the size difference was mostly explained by character encoding; my source SQL data is stored with one byte per character, whereas SSAS stores all strings with two bytes per character. I found that the .ksstore file totally dominated all the other files associated with this attribute in size, regardless of whether or not I optimized the attribute via AttributeHierarchyOptimizedState=FullyOptimized.
Third, there is a 4 GB cap on the size of string store files, which limits the amount of unique text that can be associated, say, with a particular dimension/attribute. In my case I'm less than 10% of the way to the limit, but this might affect some people. (Quick order-of-magnitude calculation for the original post: 1M facts * 10,000 bytes per fact = 10 GB-ish worth of text.) If you do hit this limit, you'll apparently hit it at cube "processing" time. Apparently it applies even to ROLAP dimensions. There may be some hacks to work around this. See here. Note that SQL Server 2012 may remove this 4 GB limitation.
Fourth, it seems that if long unique strings create a problem in SSAS, they do so at the level of the in-memory representation. One potential problem (that I haven't looked into in detail) is that having these extra strings cached in memory will keep SSAS from keeping other important data structures in memory, and thus degrade performance. Another problem, suggested by the book The Microsoft Data Warehouse Toolkit (though I haven't yet found this claim elsewhere), is that SSAS does some expansive string padding on its in-memory data structures:
"The relational database stores variable length string columns ... However, other parts of the SQL Server toolset will fill these columns out to their full width. Notable, Integration Services and Analysis Services pad string columns with spaces as they are loaded into memory. Both Integration Services and Analysis Services love physical memory, so there's a cost to declaring string columns that are far wider than they need to be."
To conclude, so far storing my long string data in the cube seems convenient, and I haven't uncovered any reasons to expect disaster, so I'm giving it a try. I'll try to provide an update if things don't work out.
You could store the values in a table relationally and then create an integer surrogate key.
Add the integer surrogate key to your UDM and create an SSRS drillthrough action
http://msdn.microsoft.com/en-US/library/ms174526(v=SQL.90).aspx
that looks up the text field by the key value.
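A minimal sketch of that approach (names are hypothetical): the long text lives in an ordinary relational table, and only the integer key travels into the cube, so the drillthrough action can look the text up at report time:
CREATE TABLE dbo.FactDescription (
    DescriptionKey  int IDENTITY(1,1) PRIMARY KEY,  -- the surrogate key added to the UDM
    DescriptionText nvarchar(max) NOT NULL          -- the long record description stays here
);
-- the report behind the drillthrough action fetches the text by key
SELECT DescriptionText FROM dbo.FactDescription WHERE DescriptionKey = 42;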
I would use a degenerate dimension, but hide it via SSAS until requested via a Drillthrough Action.
I can't guide you on the internal storage of strings for the AS engine, but as for storing them in SQL, I would make sure your varchar(MAX) column is at the end of your columns, to speed up the SQL engine's scanning of those rows.
At 700,000 rows, with enough memory and disk I/O, you aren't taxing SQL much.
I haven't worked through all the possibilities described in and linked to from it yet, but this thread from 2007 is on the same topic and seems pretty relevant:
http://www.sqldev.org/sql-server-analysis-services/discussion-about-how-to-create-a-fact-drillthrough-dimension-the-best-way-34857.shtml
One new possibility raised here is that, rather than treating text stored in the fact table as a degenerate dimension, you could potentially treat it as a text-valued (vs numeric-valued) measure. Initial googling suggests that SSAS might support this but there are some tricks to getting this right, e.g. you probably want to disable aggregation for that measure, you might need to do something non-standard to get the field to appear in a drillthrough, and it might require SSAS enterprise edition.

SQL 2005 DB Partitioning for SharePoint

Background
I have a massive DB for a SharePoint site collection. It is 130 GB and growing at 10 GB per month. 100 GB of the 130 GB is in one site collection, and 30 GB is the version table. There is only one site collection - this is by design.
Question
Am I able to partition a database (SharePoint) using SQL 2005's data partitioning features (creating multiple data files)?
Is it possible to partition a database that is already created?
Has anyone partitioned a SharePoint DB? Will I encounter any issues?
You would have to create a partition set and rebuild the table on that partition set. SQL 2005 can only partition on a single column, so you would have to have a column in the DB that:
Behaves fairly predictably so you don't get a large skew in the amount of data in each partition
IIRC the column has to be a numeric or datetime value
In practice it's easiest if it's monotonically increasing - you can create a series of partitions (automatically or manually) and the system will fill them up as it gets to the range definitions.
A date (perhaps the date the document was entered) would be ideal. However, you may or may not have a useful column on the large table. M.S. tech support would be the best source of advice for this.
The partitioning should be transparent to the application (again, you need a column with appropriate behaviour to use as a partition key).
Unless you are lucky enough to have a partition key column that is also used as a search predicate in the most common queries, you may not get much query performance benefit from the partitioning. An example of a column that works well is a date column on a data warehouse. However, your SharePoint application may not make extensive use of this sort of query.
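As a rough sketch of "rebuild the table on that partition set" (object names are hypothetical, and it assumes a partition function and scheme on the date column already exist), rebuilding the clustered index moves the data onto the partition scheme:
CREATE CLUSTERED INDEX CIX_Docs_CreatedDate
    ON dbo.Docs (CreatedDate, DocID)
    WITH (DROP_EXISTING = ON)        -- rebuild the existing clustered index...
    ON psDocsByDate (CreatedDate);   -- ...onto the partition scheme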
Mauro,
Is there no way you can segment the data at the SharePoint level?
i.e. you may have multiple "sites" using a single (SQL) content database.
You could migrate site data to a new content database, which would allow you to reduce the data in that large content database and then shrink the data files.
It will also assist you in managing your obvious continued growth.
James.