Need to Develop SQL Server 2005 Database to Store Insurance Rates - sql-server-2005

Good Evening All,
A client has asked that I develop a web application, as part of his existing ASP.NET 3.5 site, that will enable his brokers to generate quotes for potential client groups. The quotes will need to be derived from rate tables stored in a SQL Server 2005 database.
The table structure is as follows:
CREATE TABLE [dbo].[PlanRates](
[AgeCategory] [int] NULL,
[IndustryType] [int] NULL,
[CoverageType] [int] NULL,
[PlanDeductible] [float] NULL,
[InpatientBenefit] [float] NULL,
[Rate] [float] NULL,
[OPMD15Copay] [float] NULL,
[OPMD25Copay] [float] NULL
)
Question: Assuming I use page validation in the web application to verify input against the business logic, do you anticipate any issues with the web application generating a quotation from the above table layout? If so, can you suggest a better way to structure my table?
Bonus goes to anyone who has programmed web-based insurance quoting systems.
Thanks much for your help and guidance.

I would definitely add a surrogate primary key, e.g. PlanRatesID INT IDENTITY(1,1) to make each entry uniquely identifiable.
Secondly, the fields "PlanDeductible", "InpatientBenefit", and "Rate" look like money values, so I would definitely make them of type DECIMAL, not FLOAT. FLOAT is an approximate type and can lead to rounding errors. For DECIMAL you also need to specify the precision (total number of digits) and the scale (digits after the decimal point), e.g. DECIMAL(12,3) or something like that.
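Putting both suggestions together, a sketch of how the table could look with a surrogate key and DECIMAL columns (the precision/scale of 12,3 is just an example, pick what fits your rates):

CREATE TABLE [dbo].[PlanRates](
[PlanRatesID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
[AgeCategory] [int] NULL,
[IndustryType] [int] NULL,
[CoverageType] [int] NULL,
[PlanDeductible] [decimal](12,3) NULL,
[InpatientBenefit] [decimal](12,3) NULL,
[Rate] [decimal](12,3) NULL,
[OPMD15Copay] [decimal](12,3) NULL,
[OPMD25Copay] [decimal](12,3) NULL
)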
That's about it! :)

I would suggest using parameterized queries to save and retrieve your data, to protect against SQL injection.
EDIT
It looks like
[AgeCategory] [int] NULL,
[IndustryType] [int] NULL,
[CoverageType] [int] NULL,
are probably foreign keys; if so, you may not want to make them nullable.
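For the parameterized access, one option is to wrap the rate lookup in a stored procedure and call it from ASP.NET with SqlParameters. A minimal sketch (the procedure name and the exact column list are just assumptions):

CREATE PROCEDURE dbo.GetPlanRate
    @AgeCategory int,
    @IndustryType int,
    @CoverageType int
AS
BEGIN
    SET NOCOUNT ON;
    -- returns the matching rate row; the application supplies the parameters
    SELECT Rate, PlanDeductible, InpatientBenefit, OPMD15Copay, OPMD25Copay
    FROM dbo.PlanRates
    WHERE AgeCategory = @AgeCategory
      AND IndustryType = @IndustryType
      AND CoverageType = @CoverageType;
END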

NULLable category, types and rates?
Category and type columns are lookup fields to other tables storing additional information for each type.
You should check which of the columns is really nullable, and define how to deal with NULL values.
As rates can change, you should also consider adding a date range to the table (ValidFrom, ValidTo DATETIME), or keeping a history table associated with the table above, so that past rate calculations can be repeated. (This might be a legal/financial requirement.)
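A sketch of what that could look like if you add the range to the existing table rather than a separate history table (the default constraint and the example key values are assumptions):

ALTER TABLE dbo.PlanRates ADD
    ValidFrom datetime NOT NULL CONSTRAINT DF_PlanRates_ValidFrom DEFAULT (GETDATE()),
    ValidTo datetime NULL; -- NULL = still in effect

-- quoting as of a given date picks the rate row that was valid then
DECLARE @QuoteDate datetime = GETDATE();
SELECT Rate
FROM dbo.PlanRates
WHERE AgeCategory = 3 AND IndustryType = 1 AND CoverageType = 2
  AND ValidFrom <= @QuoteDate
  AND (ValidTo IS NULL OR ValidTo > @QuoteDate);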

Related

SQL - Storing undefined fields in a table

So, I have a few clients that want to store user data, but they want to store it without telling me what it is. What I mean is that they will likely store things like username, first name, last name, and email, but then there are a myriad of other things that are not defined and that can change.
So with that in mind, I want to create a table that can handle that data. I was thinking of setting up a table like this:
CREATE TABLE [dbo].[Details](
[Id] [int] NOT NULL,
[ClientId] [nvarchar](128) NOT NULL,
[ColumnName] [nvarchar](255) NOT NULL,
[ColumnValue] [nvarchar](255) NOT NULL,
CONSTRAINT [PK_Details] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
)
Is this the best way to store this data, or am I making a mistake?
Please help me before I go ahead and create this table! :D
Just make clear to your clients that fields the database knows about, like a user number, name, and address, can be searched quickly, checked for consistency, and even used for program control (such as some users being allowed to see or do what others cannot), whereas "undefined data" cannot.
Then find out whether they really need individual fields. If they do, the name/value approach is exactly the way to go. Maybe, however, it suffices to store one note per user, where they enter all their comments, memos, etc., i.e. just one text field to hold all the "undefined data". If that is possible, it would be the better approach.
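If you do go with the name/value table, a minimal sketch of reading a client's values back (the @ClientId value is a placeholder); note that the database cannot type-check or efficiently search ColumnValue the way it can an ordinary column:

DECLARE @ClientId nvarchar(128) = N'client-123';

SELECT ColumnName, ColumnValue
FROM dbo.Details
WHERE ClientId = @ClientId;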

SQL Indexing Strategy on Link Tables

I often find myself creating 'link tables'. For example, the following table maps a user record to an event record.
CREATE TABLE [dbo].[EventLog](
[EventId] [int] NOT NULL,
[UserId] [int] NOT NULL,
[Time] [datetime] NOT NULL,
[Timestamp] [timestamp] NOT NULL
)
For the purposes of this question, please assume the combination of EventId plus UserId is unique and that the database in question is a MS SQL Server 2008 installation.
The problem I have is that I am never sure as to how these tables should be indexed. For example, I might want to list all users for a particular event, or I might want to list all events for a particular user or, perhaps, retrieve a particular EventId/UserId record. Indexing options I have considered include:
Creating a compound primary key on EventId and UserId (but I understand the index won't be useful when accessing by UserId on its own).
Creating a compound primary key on EventId and UserId and adding a supplemental index on UserId.
Creating a primary key on EventId and a supplemental index on UserId.
Any advice would be appreciated.
Indexes are designed to solve performance problems. If you don't yet have such a problem and can't tell where you are going to run into one, you shouldn't create indexes up front. Indexes are quite expensive: they not only take up disk space but also add overhead to every write or modification. So you need to clearly understand which specific performance problem you are solving by creating an index, so you can judge whether it is really needed.
The answer to your question depends on several aspects.
It depends on the DBMS you are going to use. Some prefer single-column indexes (like PostgreSQL), some can take more advantage of multi-column indexes (like Oracle). Some can answer a query entirely from a covering index (like SQLite), others cannot and eventually have to read the pages of the actual table (again, like PostgreSQL).
It depends on the queries you want to answer. For example, do you navigate in both directions, i.e., do you join on both of your Id columns?
It depends on your space and processing-time requirements for data modification, too. Keep in mind that indexes are often bigger than the actual table they index, and that updating indexes is often more expensive than just updating the underlying table.
EDIT:
When your conceptual model has a many-to-many relationship R between two entities E1 and E2, i.e., the logical semantics of R is either "related" or "not related", then I would always declare the combined primary key for R. That will create a unique index. The primary motivation, however, is data consistency, not query optimization, i.e.:
CREATE TABLE [dbo].[EventLog](
[EventId] [int] NOT NULL,
[UserId] [int] NOT NULL,
[Time] [datetime] NOT NULL,
[Timestamp] [timestamp] NOT NULL,
PRIMARY KEY([EventId],[UserId])
)
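If you also navigate from a user to their events (option 2 in the question), a supplemental nonclustered index covers that direction; the index name below is just an example:

CREATE NONCLUSTERED INDEX IX_EventLog_UserId
    ON [dbo].[EventLog] ([UserId]);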

Map documents to different entities

I have a problem with designing my database.
I have a table which contains documents with the following table-structure:
[Documents]
Id [int]
FileName [varchar]
FileFormat [varchar]
FileContent [image]
In my program, each document can either be standalone (without any relationship to an entity) or have a relation to an object of type Customer or Employee (more types will probably be added soon).
Each entity has an Id in the database. For example the Employee-Table looks like:
[Employee]
Id [int]
Fk_NameId [int]
Fk_AddressId [int]
Fk_ContactId [int]
My idea is to create a table for the connection of an entity and an document. I thought about something like:
[DocumentConnection]
DocumentId [int]
EntityId [int]
Entity [varchar]
The Entity column in the DocumentConnection table contains the table name of the related entity.
For an entity of type Employee, for example, this column would contain "Employee".
In my application I then build the SELECT statement for the document by reading the entity string from the database.
I'm not sure if this is a good way to do this.
I think it would be a much better design to have an EmployeeDocument table, CustomerDocument table, etc.
That will allow you to use foreign keys to the entity tables, which would not be possible in your proposed design. In your design you could put anything into the Entity and EntityId columns, and no foreign key relationship could enforce that they actually refer to an existing entity.
The only reason I can see for using your DocumentConnection table would be if your application needed to dynamically create new types of relationships. I assume that isn't the case since you said each type of entity will have its own table.
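A sketch of what the per-entity link table could look like, assuming Documents.Id and Employee.Id are the primary keys of those tables (constraint names are assumptions):

CREATE TABLE [dbo].[EmployeeDocument](
[EmployeeId] [int] NOT NULL,
[DocumentId] [int] NOT NULL,
CONSTRAINT [PK_EmployeeDocument] PRIMARY KEY ([EmployeeId], [DocumentId]),
CONSTRAINT [FK_EmployeeDocument_Employee] FOREIGN KEY ([EmployeeId]) REFERENCES [dbo].[Employee]([Id]),
CONSTRAINT [FK_EmployeeDocument_Documents] FOREIGN KEY ([DocumentId]) REFERENCES [dbo].[Documents]([Id])
)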

How to make bulk insert work with multiple tables

How with SQL server bulk insert can I insert into multiple tables when there is a foreign key relationship?
What I mean is that the tables look like this:
CREATE TABLE [dbo].[UndergroundFacilityShape]
([FacilityID] [int] IDENTITY(1,1) NOT NULL,
[FacilityTypeID] [int] NOT NULL,
[FacilitySpatialData] [geometry] NOT NULL)
CREATE TABLE [dbo].[UndergroundFacilityDetail]
([FacilityDetailID] [int] IDENTITY(1,1) NOT NULL,
[FacilityID] [int] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Value] [nvarchar](255) NOT NULL)
So each UndergroundFacilityShape can have multiple UndergroundFacilityDetail rows. The problem is that the FacilityID is not defined until the insert is done, because it is an identity column. If I bulk insert the data into the Shape table, then I cannot match it back up to the Detail data I have in my C# application.
I am guessing the solution is to run a SQL statement to find out what the next identity value is, populate the values myself, and turn off the identity column for the bulk insert? Bear in mind that only one person is going to be running this application to insert data, and it will be done infrequently, so we don't have to worry about identity values clashing or anything like that.
I am trying to import thousands of records, which takes about 3 minutes using standard inserts, but bulk insert will take a matter of seconds.
In the future I am expecting to import data that is much bigger than 'thousands' of records.
Turns out that this is quite simple. Get the current identity values on each of the tables, and populate the values in the DataTable myself, incrementing them as I use them. I also have to make sure that the correct values are used to maintain the relationship. That's it. It doesn't seem to matter whether I turn off the identity columns or not.
I have been using this tool on live data for a while now and it works fine.
It was well worth it, though: the import now takes no longer than 3 seconds (rather than 3 minutes), and I expect to receive larger datasets at some point.
So what if more than one person uses the tool at the same time? Well, yes, I'd expect issues, but that is never going to be the case for us.
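For the identity-seeding step described above, the current identity values can be read with IDENT_CURRENT before the application assigns its own IDs in the DataTable (a sketch; how you handle the identity columns during the bulk copy itself is up to you):

SELECT IDENT_CURRENT('dbo.UndergroundFacilityShape')  AS LastShapeId,
       IDENT_CURRENT('dbo.UndergroundFacilityDetail') AS LastDetailId;
-- the application then uses LastShapeId + 1, + 2, ... for the new Shape rows
-- and copies the same values into UndergroundFacilityDetail.FacilityID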
Peter, you mentioned that you already have a solution with straight INSERTs.
If the destination table does not have a clustered index (or has a clustered index and is empty), just using the TABLOCK query hint will make it a minimally-logged transaction, resulting in a considerable speed-up.
If the destination table has a clustered index and is not empty, you can also enable trace flag 610 in addition to the TABLOCK query hint to make it a minimally-logged transaction.
Check the "Using INSERT INTO…SELECT to Bulk Import Data with Minimal Logging" section on the INSERT MSDN page.

Managing website 'event' database

How should I manage tables that record site 'events', i.e. certain activities a user has performed on a website, which I use for tracking? I want to be able to do all kinds of data mining and correlation between the different activities of users and what they have done.
Today alone I added 107,000 rows to my SiteEvent table. I don't think this is sustainable!
The database is SQL Server. I'm mainly asking about best practices for managing large amounts of data.
For instance :
Should I keep these tables in a database of their own? If I need to join with other tables this could be a problem. Currently I just have one database with everything in it.
How should I purge old records? I want to ensure my database file doesn't keep growing.
Best practices for backing up and truncating logs.
Will adding additional indexes dramatically increase the size of the DB with so many records?
Anything else I need to do in SQL Server that might come back to bite me later?
FYI: these are the tables
CREATE TABLE [dbo].[SiteEvent](
[SiteEventId] [int] IDENTITY(1,1) NOT NULL,
[SiteEventTypeId] [int] NOT NULL,
[SiteVisitId] [int] NOT NULL,
[SiteId] [int] NOT NULL,
[Date] [datetime] NULL,
[Data] [varchar](255) NULL,
[Data2] [varchar](255) NULL,
[Duration] [int] NULL,
[StageSize] [varchar](10) NULL
)
and
CREATE TABLE [dbo].[SiteVisit](
[SiteVisitId] [int] IDENTITY(1,1) NOT NULL,
[SiteUserId] [int] NULL,
[ClientGUID] [uniqueidentifier] ROWGUIDCOL NULL CONSTRAINT [DF_SiteVisit_ClientGUID] DEFAULT (newid()),
[ServerGUID] [uniqueidentifier] NULL,
[UserGUID] [uniqueidentifier] NULL,
[SiteId] [int] NOT NULL,
[EntryURL] [varchar](100) NULL,
[CampaignId] [varchar](50) NULL,
[Date] [datetime] NOT NULL,
[Cookie] [varchar](50) NULL,
[UserAgent] [varchar](255) NULL,
[Platform] [int] NULL,
[Referer] [varchar](255) NULL,
[RegisteredReferer] [int] NULL,
[FlashVersion] [varchar](20) NULL,
[SiteURL] [varchar](100) NULL,
[Email] [varchar](50) NULL,
[FlexSWZVersion] [varchar](20) NULL,
[HostAddress] [varchar](20) NULL,
[HostName] [varchar](100) NULL,
[InitialStageSize] [varchar](20) NULL,
[OrderId] [varchar](50) NULL,
[ScreenResolution] [varchar](50) NULL,
[TotalTimeOnSite] [int] NULL,
[CumulativeVisitCount] [int] NULL CONSTRAINT [DF_SiteVisit_CumulativeVisitCount] DEFAULT ((0)),
[ContentActivatedTime] [int] NULL CONSTRAINT [DF_SiteVisit_ContentActivatedTime] DEFAULT ((0)),
[ContentCompleteTime] [int] NULL,
[MasterVersion] [int] NULL CONSTRAINT [DF_SiteVisit_MasterVersion] DEFAULT ((0))
)
You said two things that are in conflict with each other.
I want to be able to do all kinds of datamining and correlation between different activities of users and what they have done.
I want to ensure my db file doesnt keep growing.
I am also a big fan of data mining, but you need data to mine. In my mind, create a scalable database design and plan for it to grow TREMENDOUSLY. Then, go grab all the data you can. Then, finally, you will be able to do all the cool data mining you are dreaming about.
Personally I would absolutely keep the log records outside the main database. The performance of your application would take a huge hit from having to constantly do these writes.
I think the way to go is to create a secondary database on a different machine, publish a SOAP API that is independent of the underlying DB schema, and have the application report to that. I'd also suggest that maybe-write semantics (don't wait for a confirmation response) could work for you, if you can risk losing some of this information.
On the secondary DB you can have your API calls trigger some sort of database pruning or detach/backup/recreate maintenance procedure. If you need a log then you shouldn't give up on the possibility of it being useful in the future.
If you need some sort of analysis service on top of that, SQL Server is the best way to go. Otherwise MySQL or PostgreSQL will do the job much more cheaply.
Re-thinking the problem might be just what the doctor ordered. Can 100k records per day really be that useful? Seems like information overload to me. Maybe start by reducing the granularity of your usage tracking?
In terms of re-thinking the problem, you might explore one of the many web statistics packages out there. There are only a few fields in your sample table that are not part of an out-of-the-box implementation of WebTrends or Google Analytics or many others. The other items in your table can be set up as well, but take a bit more thought and some research into which package will meet all of your needs. Most of the off the shelf stuff can deal with campaign tracking, etc. these days.
One other option would be to offload the common stuff to a standard web-stats package and then parse this back into SQL Server with your custom data out-of-band.
I don't know how much other data you have, but if the 107K+ records a day represent the bulk of it, you might end up spending your time keeping your web stats working rather than on your application's actual functionality.
I would keep them in the same database, unless you can safely purge / store old records for OLAP querying and then keep the primary database for OLTP purposes.
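If you do purge, here is a sketch of deleting old events in batches so you don't hold one huge lock or blow out the log (the 90-day cutoff and the batch size are assumptions):

DECLARE @Cutoff datetime = DATEADD(DAY, -90, GETDATE());
WHILE 1 = 1
BEGIN
    -- delete in small batches to keep locking and log growth under control
    DELETE TOP (10000) FROM dbo.SiteEvent WHERE [Date] < @Cutoff;
    IF @@ROWCOUNT = 0 BREAK;
END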
Make sure you set a large initial size for the database and set a large autogrow value, and ensure you don't run out of disk space. 107k records a day is going to take up space no matter how you store it.
As for backups, that's completely dependent on your requirements. A weekly full, daily diff and one/two hour diff should work fine as long as the IO subsystem can cope with it.
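As a sketch of both points (database, file, and path names are assumptions):

-- pre-size the data file and use a generous autogrow increment
ALTER DATABASE SiteStats
    MODIFY FILE (NAME = SiteStats_Data, SIZE = 20GB, FILEGROWTH = 2GB);

BACKUP DATABASE SiteStats TO DISK = N'D:\Backups\SiteStats_Full.bak';                    -- weekly full
BACKUP DATABASE SiteStats TO DISK = N'D:\Backups\SiteStats_Diff.bak' WITH DIFFERENTIAL;  -- daily diff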
Additional indexes will take up space, but again, it depends on which columns you add. If you have 10^6 rows and add a nonclustered index on an int column, it will take up roughly 10^6 * 4 * 2 bytes: 4 bytes per row for the indexed column itself, plus another 4 bytes per index entry for the clustering key (the int primary key). So for every 1 million records, a nonclustered index on an int column will take up roughly 8 MB.
When the table grows, you can add servers and do horizontal partitioning on the table so you spread out the data on multiple servers.
As for the IO, which is probably going to be the largest hurdle, make sure you have enough spindles to handle the load, preferably with indexes being on their own diskset/LUN and the actual data on their own set of disks/LUN.
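One way to get that separation inside SQL Server is a dedicated filegroup for indexes placed on its own disks. A sketch, with database, file, and path names being assumptions:

ALTER DATABASE SiteStats ADD FILEGROUP IndexFG;
ALTER DATABASE SiteStats
    ADD FILE (NAME = SiteStats_Idx, FILENAME = N'E:\SQLIndexes\SiteStats_Idx.ndf', SIZE = 5GB)
    TO FILEGROUP IndexFG;

-- e.g. a nonclustered index on the column you join or filter on most often
CREATE NONCLUSTERED INDEX IX_SiteEvent_SiteVisitId
    ON dbo.SiteEvent (SiteVisitId)
    ON IndexFG;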