Indexing for a uniqueidentifier column in a table - SQL Server

I have created a table that uses a uniqueidentifier (GUID) as its primary key. Now I need to create an index on this table, and I am not sure which one will be best for me. I am going to use this table for error logging.
Following is my table structure
CREATE TABLE [dbo].[errors](
[error_id] [uniqueidentifier] NOT NULL,
[assembly_name] [varchar](50) NULL,
[method_name] [varchar](50) NULL,
[person_id] [int] NULL,
[timestamp] [datetime] NULL,
[description] [varchar](max) NULL,
[parameter_list] [varchar](max) NULL,
[exception_text] [nvarchar](max) NULL)
So which column should I use as the primary key and index?
Thanks in advance.

You can use it as the PK, but it's not good if you use it as the clustered index. In that case the GUID is copied into all the nonclustered index keys, which makes them much wider and can cause performance issues. It can also cause page splits, which is no good, and wide indexes mean more space is used. If you used a GUID to avoid last-page contention issues, try to use some sort of hashing technique to make sure the data goes onto different pages, but in that case you have to use the same hashing when selecting from the table by the PK.
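If you do keep the GUID as the primary key, one common pattern (a sketch only; the extra identity column is an assumption, not part of your table) is to make the primary key non-clustered and cluster on a narrow, ever-increasing column instead, so inserts append to the end of the table and the nonclustered index keys stay small:
CREATE TABLE [dbo].[errors](
[error_seq] [bigint] IDENTITY(1,1) NOT NULL, -- assumed clustering column
[error_id] [uniqueidentifier] NOT NULL CONSTRAINT [DF_errors_error_id] DEFAULT NEWID(),
[assembly_name] [varchar](50) NULL,
[method_name] [varchar](50) NULL,
[person_id] [int] NULL,
[timestamp] [datetime] NULL,
[description] [varchar](max) NULL,
[parameter_list] [varchar](max) NULL,
[exception_text] [nvarchar](max) NULL,
CONSTRAINT [PK_errors] PRIMARY KEY NONCLUSTERED ([error_id]))
GO
-- The clustered index goes on the narrow, sequential column.
CREATE UNIQUE CLUSTERED INDEX [CIX_errors_error_seq] ON [dbo].[errors]([error_seq])
GO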

Related

Does dropping a non-clustered index remove existing full-text indexing on a SQL Server table?

CREATE TABLE [dbo].[TicketTasks]
(
[TicketTaskId] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
[TicketTaskTypeId] [char](2) NOT NULL,
[TicketId] [int] NOT NULL,
[CreatedUtc] [datetime] NOT NULL,
[DeletedUtc] [datetime] NULL,
[DepartmentId] [int] NOT NULL,
[TaskAction] [nvarchar](max) NULL,
[TaskResult] [nvarchar](max) NULL,
[TaskPrivateNote] [nvarchar](max) NULL,
CONSTRAINT [PK_TicketTasks]
PRIMARY KEY CLUSTERED ([TicketTaskId] ASC)
)
Hi, this is my table structure.
TicketTaskId is the clustered primary key.
TicketId has a non-clustered index with [CreatedUtc], [DeletedUtc], [DepartmentId].
TaskAction, TaskResult and TaskPrivateNote are included in the full-text index.
I have around 50 million records in this table. Currently I have to drop and recreate the non-clustered index on TicketId to sort it in descending order.
Does this dropping of non clustered index affect any other indexes?
Will it remove the full text indexing done for the table?
Ordinary non-clustered indexes are independent of each other. The only problem you might run into is if you tamper with the clustered index (usually the one backing the PRIMARY KEY), since rebuilding it physically reorganizes the data and all other indexes contain a pointer to it.
However, full-text indexes are attached to an index that enforces uniqueness of values (a UNIQUE index). This is specified with the KEY INDEX option when creating the full-text index:
CREATE FULLTEXT INDEX
ON [dbo].[TicketTasks](TaskAction)
KEY INDEX <UniqueIndexName>
So as long as you don't change this referenced unique index, you can drop the other non-clustered indexes.
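As a side note, if the only goal is to change the sort order of the TicketId index, you can usually rebuild it in place rather than dropping and recreating it in two steps. A sketch, assuming the index is named IX_TicketTasks_TicketId and the other three columns are part of its key:
-- Rebuild the existing non-clustered index with TicketId descending.
-- DROP_EXISTING = ON replaces the index in one operation and does not
-- touch the clustered index or the full-text index.
CREATE NONCLUSTERED INDEX [IX_TicketTasks_TicketId]
ON [dbo].[TicketTasks] ([TicketId] DESC, [CreatedUtc], [DeletedUtc], [DepartmentId])
WITH (DROP_EXISTING = ON)
GO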

Save records in rows or columns in SQL Server

I have to save three document types in a table. The number of document types is fixed and will not change. There are more than 1 million records, and in the future there could be more than 100 million, so performance is very important in my program. I don't know which design will give better database performance: row-based or column-based?
Row-Based:
CREATE TABLE [Person].[Document]
(
[Id] [uniqueidentifier] NOT NULL,
[PersonId] [uniqueidentifier] NOT NULL,
[Document] [varbinary](max) NULL,
[DocType] [int] NOT NULL
)
Column-based:
CREATE TABLE [Person].[Document]
(
[Id] [uniqueidentifier] NOT NULL,
[PersonId] [uniqueidentifier] NOT NULL,
[Document_Page1] [varbinary](max) NULL,
[Document_Page2] [varbinary](max) NULL,
[Document_Page3] [varbinary](max) NULL
)
The normalized (or as you called it - row based) solution is more flexible.
It allows you to change the number of documents saved for each person without changing the database structure, and usually is the preferred solution.
A million rows is a small table for SQL Server.
I've seen database tables with 50 million rows that performs very well.
It's a question of correct indexing.
I do suggest that if you want better performance you use an int identity column for your primary key instead of a uniqueidentifier, since it's very lightweight and much easier for the database to index because it isn't randomly ordered to begin with.
I would go with the normalized solution.
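As a sketch of that suggestion (the constraint and index names are assumptions), the row-based table with an int identity primary key and an index for finding a person's documents of a given type could look like:
CREATE TABLE [Person].[Document]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[PersonId] [uniqueidentifier] NOT NULL,
[DocType] [int] NOT NULL,
[Document] [varbinary](max) NULL,
CONSTRAINT [PK_Document] PRIMARY KEY CLUSTERED ([Id] ASC)
)
GO
-- Supports "all documents of a given type for a person" without a scan.
CREATE NONCLUSTERED INDEX [IX_Document_PersonId_DocType]
ON [Person].[Document]([PersonId], [DocType])
GO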

Implementing custom fields in a database for large numbers of records

I'm developing an app which requires user-defined custom fields on a contacts table. This contacts table can contain many millions of contacts.
We're looking at using a secondary metadata table which stores information about the fields, along with a tertiary value table which stores the actual data.
Here's the rough schema:
CREATE TABLE [dbo].[Contact](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NULL,
[MiddleName] [nvarchar](max) NULL,
[LastName] [nvarchar](max) NULL,
[Email] [nvarchar](max) NULL
)
CREATE TABLE [dbo].[CustomField](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FieldName] [nvarchar](50) NULL,
[Type] [varchar](50) NULL
)
CREATE TABLE [dbo].[ContactAndCustomField](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ContactID] [int] NULL,
[FieldID] [int] NULL,
[FieldValue] [nvarchar](max) NULL
)
However, this approach introduces a lot of complexity, particularly with regard to importing CSV files with multiple custom fields. At the moment this requires an update/join statement plus a separate insert statement for every individual custom field. Joins would also be required to return custom field data for multiple rows at once, as in the sketch below.
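To illustrate that last point, here is a sketch of the kind of pivoting join the value-table design needs just to return two custom fields per contact (the field names 'Birthday' and 'Company' are hypothetical):
SELECT c.ID,
c.FirstName,
MAX(CASE WHEN f.FieldName = 'Birthday' THEN v.FieldValue END) AS Birthday,
MAX(CASE WHEN f.FieldName = 'Company' THEN v.FieldValue END) AS Company
FROM dbo.Contact c
LEFT JOIN dbo.ContactAndCustomField v ON v.ContactID = c.ID
LEFT JOIN dbo.CustomField f ON f.ID = v.FieldID
GROUP BY c.ID, c.FirstName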
I've argued for this structure instead:
CREATE TABLE [dbo].[Contact](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NULL,
[MiddleName] [nvarchar](max) NULL,
[LastName] [nvarchar](max) NULL,
[Email] [nvarchar](max) NULL,
[CustomField1] [nvarchar](max) NULL,
[CustomField2] [nvarchar](max) NULL,
[CustomField3] [nvarchar](max) NULL /* etc, adding lots of empty fields */
)
CREATE TABLE [dbo].[ContactCustomField](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FieldIndex] [int] NULL,
[FieldName] [nvarchar](50) NULL,
[Type] [varchar](50) NULL
)
The downside of this second approach is that there is a finite number of custom fields that must be specified when the contacts table is created. I don't think that's a major hurdle given the performance benefits it will surely have when importing large CSV files, and returning result sets.
What approach is the most efficient for large numbers of rows? Are there any downsides to the second technique that I'm not seeing?
Microsoft introduced sparse columns exactly for this type of problem. The point is that in a "classic" design you end up with a large number of columns, most of them NULL for any particular row. The same is true with sparse columns, except that the NULLs don't require any storage. Moreover, you can group sparse columns into column sets and modify the sets as XML.
Performance- and storage-wise, sparse columns are the winner.
http://technet.microsoft.com/en-us/library/cc280604.aspx
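A minimal sketch of a sparse-column version of the contact table (the custom column names and types are assumptions, and the column set is optional):
CREATE TABLE [dbo].[Contact](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NULL,
[LastName] [nvarchar](max) NULL,
[Email] [nvarchar](max) NULL,
-- Sparse columns take no storage when NULL, so a wide set of them is cheap.
[CustomField1] [nvarchar](200) SPARSE NULL,
[CustomField2] [nvarchar](200) SPARSE NULL,
[CustomField3] [int] SPARSE NULL,
-- Optional column set: exposes all sparse columns as a single XML value.
[CustomFields] XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO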
Query performance for any "property bag table" approach is comically slow. But if you need flexibility, you either have a dynamic table that is changed via an editor, or you have a property bag table; when you need it, you need it.
Just expect the performance to be slow.
The best approach would likely be a ContactCustomFields table whose fields are determined by an editor.

When is a SQL Server foreign key table too much?

When I have the two tables below, would the StatusTypes table be considered overkill? I.e. is there more benefit to using it than not?
In this situation I don't expect to have to load these statuses in an admin backend in order to add, change or delete them, but on the other hand I don't often like not using foreign keys.
I'm looking for reasons for and against separating out the status type or keeping it in the Audit table.
Any help would be appreciated.
-- i.e. NEW, SUBMITTED, UPDATED
CREATE TABLE [dbo].[StatusTypes](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [nvarchar](250) NOT NULL,
CONSTRAINT [PK_StatusTypes] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Audits](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Description] [nvarchar](500) NULL,
[Country_Fkey] [int] NOT NULL,
[User_Fkey] [int] NOT NULL,
[CreatedDate] [date] NOT NULL,
[LastAmendedDate] [date] NULL,
[Status_Fkey] [int] NOT NULL,
CONSTRAINT [PK_Audits] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO
In this situation I like to keep the lookup table to enforce the status being one of a set of types. Some databases have an enum type, or can use check constraints, but this is the most portable method IMO.
However, I make the lookup table contain only a single string column holding the type's name. That way you don't have to actually join to the lookup table, and your ORM (assuming you use one) can be completely unaware of it.
In this case the schema would look like:
CREATE TABLE [dbo].[StatusTypes](
[ID] [nvarchar](250) NOT NULL,
CONSTRAINT [PK_StatusTypes] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Audits](
[ID] [int] IDENTITY(1,1) NOT NULL,
...
[Status] [nvarchar](250) NOT NULL,
CONSTRAINT [PK_Audits] PRIMARY KEY CLUSTERED ([ID] ASC),
CONSTRAINT [FK_Audit_Status] FOREIGN KEY (Status) REFERENCES StatusTypes(ID)
) ON [PRIMARY]
GO
And a query for audit items of a particular type would be:
SELECT ...
FROM Audits
WHERE Status = 'ACTIVE'
So referential integrity is still enforced but queries don't need an extra join.
I'll offer a counter-argument: use your development time where it is most useful. Maybe you don't need this runtime check that much, and your time is better spent on some other check that is more useful.
Is it even likely that an invalid status value will be set? Your application surely uses a set of constants or an enum, so it is unlikely that some rogue value slips in.
That said, there is a lot of value in ensuring integrity. I like to cover all my "enum" columns with a BETWEEN check constraint, which is quick to write and even faster at runtime.
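A sketch of that kind of constraint against the Audits table from the question, assuming the numeric codes 1-3 map to NEW, SUBMITTED and UPDATED (the mapping and constraint name are illustrative):
-- Rejects any status code outside the known range; the range has to be
-- widened whenever a new status is introduced.
ALTER TABLE [dbo].[Audits]
ADD CONSTRAINT [CK_Audits_Status] CHECK ([Status_Fkey] BETWEEN 1 AND 3)
GO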

adding multiple values to sql

Say I have a column defined as Address. I also have a record, let's call it Rudy's. Now Rudy's has multiple addresses, so I need to store multiple addresses in such a way that they are all searchable. What is the best way to approach a solution in SQL?
You should add a child table with an Address column. You will have a one-to-many relation where the addresses are stored in the child table, and you can add as many addresses per person as you want. You can also add extra info such as an address type (home, work, or primary, secondary etc.).
I wouldn't go for one column holding the whole address. If this is a postal address it is better to have separate columns such as street, town, house number etc. Then you get the advantage of being able to index those columns.
You could try something like this:
CREATE TABLE [dbo].[Person](
[PersonId] [int] IDENTITY(1,1) NOT NULL,
[FullName] [varchar](50) NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED ([PersonId] ASC))
GO
CREATE TABLE [dbo].[Addresses](
[AddressId] [int] IDENTITY(1,1) NOT NULL,
[PersonId] [int] NOT NULL,
[AddressLine1] [varchar](50) NULL,
[AddressLine2] [varchar](50) NULL,
[City] [varchar](50) NULL,
[State] [varchar](4) NULL,
[Country] [varchar](50) NULL,
CONSTRAINT [PK_Addresses] PRIMARY KEY CLUSTERED ([AddressId] ASC))
GO
ALTER TABLE [dbo].[Addresses]
WITH CHECK ADD CONSTRAINT [FK_Addresses_Person] FOREIGN KEY([PersonId])
REFERENCES [dbo].[Person] ([PersonId])
GO
ALTER TABLE [dbo].[Addresses] CHECK CONSTRAINT [FK_Addresses_Person]
GO
Of course you can have it as complex as you want and follow the previous advice for storing address types etc.
It might help to download MS examples from http://sqlserversamples.codeplex.com/ and follow their best practices.