How can I optimise this SQL query?

I'm writing a piece of software that is meant to identify files that have been put onto the web server (CMS) but are no longer needed and should/could be deleted.
To start with I'm trying to reproduce all required steps manually.
I'm using a batch script executed in the webroot to identify all (relevant) files on the server. Then, I'm importing the list to SQL Server and the table looks like this:
id filename
1 filename1.docx
2 files/file.pdf
3 files/filename2.docx
4 files/filename3.docx
5 files/file1.pdf
6 file2.pdf
7 file4.pdf
I also have a CMS database (Alterian/Immediacy CMC 6.X) which has 2 tables storing page content: page_data and PageXMLArchive.
I would like to scan the database to see if the files from the first table are referenced anywhere in the content of the site - p_content column from page_data table and PageXML column from PageXMLArchive table.
So I have a loop which takes each filename and checks whether it's referenced in either of those tables; if it is, it skips it, and if it isn't, it adds it to a temporary table.
At the end of the query the temporary table is displayed.
Query below:
DECLARE @t AS TABLE (_fileName nvarchar(255))
DECLARE @row AS int
DECLARE @result AS nvarchar(255)
SET @row = 1
WHILE (@row <= (SELECT COUNT(*) FROM ListFileReport))
BEGIN
    SET @result = (SELECT [FileName] FROM ListFileReport WHERE id = @row)
    IF ((SELECT TOP(1) p_content FROM page_data WHERE p_content LIKE '%' + LTRIM(RTRIM(@result)) + '%') IS NULL)
        OR ((SELECT TOP(1) PageXML FROM PageXMLArchive WHERE PageXML LIKE '%' + LTRIM(RTRIM(@result)) + '%') IS NULL)
    BEGIN
        INSERT INTO @t (_fileName) VALUES (@result)
    END
    SET @row = @row + 1
END
SELECT * FROM @t
Unfortunately, due to my poor SQL skills, the query takes over 2 hours to execute and times out.
How can I improve that query, or change it to achieve a similar result without having to run thousands of WHERE x LIKE statements on ntext fields? I can't make any changes to the database; it has to stay untouched (or it won't be supported - a big deal for our customers).
Thanks
EDIT:
Currently I'm working around the issue by batching the results a few hundred at a time. It works, but it takes forever.
EDIT:
Can I possibly utilise Full-Text search to achieve this? I am willing to take a snapshot of the database and work on the copy if there is a way of changing the schema to achieve the desired results.
page_data table:
USE [TD-VMB-01-STG]
GO
/****** Object: Table [dbo].[page_data] Script Date: 12/13/2011 13:19:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[page_data](
[p_page_id] [int] NOT NULL,
[p_title] [nvarchar](120) NULL,
[p_link] [nvarchar](250) NULL,
[p_content] [ntext] NULL,
[p_parent_id] [int] NULL,
[p_top_id] [int] NULL,
[p_stylesheet] [nvarchar](50) NULL,
[p_author] [nvarchar](50) NULL,
[p_last_update] [datetime] NULL,
[p_order] [smallint] NULL,
[p_window] [nvarchar](10) NULL,
[p_meta_keywords] [nvarchar](1000) NULL,
[p_meta_desc] [nvarchar](2000) NULL,
[p_type] [nvarchar](1) NULL,
[p_confirmed] [int] NOT NULL,
[p_changed] [int] NOT NULL,
[p_access] [int] NULL,
[p_errorlink] [nvarchar](255) NULL,
[p_noshow] [int] NOT NULL,
[p_edit_parent] [int] NULL,
[p_hidemenu] [int] NOT NULL,
[p_subscribe] [int] NOT NULL,
[p_StartDate] [datetime] NULL,
[p_EndDate] [datetime] NULL,
[p_pageEnSDate] [int] NOT NULL,
[p_pageEnEDate] [int] NOT NULL,
[p_hideexpiredPage] [int] NOT NULL,
[p_version] [float] NULL,
[p_edit_order] [float] NULL,
[p_order_change] [datetime] NOT NULL,
[p_created_date] [datetime] NOT NULL,
[p_short_title] [nvarchar](30) NULL,
[p_authentication] [tinyint] NOT NULL,
CONSTRAINT [aaaaapage_data_PK] PRIMARY KEY NONCLUSTERED
(
[p_page_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_order] DEFAULT (0) FOR [p_order]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_con__1CF15040] DEFAULT (0) FOR [p_confirmed]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_cha__1DE57479] DEFAULT (0) FOR [p_changed]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_acc__1ED998B2] DEFAULT (1) FOR [p_access]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_nos__1FCDBCEB] DEFAULT (0) FOR [p_noshow]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_edi__20C1E124] DEFAULT (0) FOR [p_edit_parent]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF__Temporary__p_hid__21B6055D] DEFAULT (0) FOR [p_hidemenu]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_subscribe] DEFAULT (0) FOR [p_subscribe]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_pageEnSDate] DEFAULT (0) FOR [p_pageEnSDate]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_pageEnEDate] DEFAULT (0) FOR [p_pageEnEDate]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_hideexpiredPage] DEFAULT (1) FOR [p_hideexpiredPage]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_version] DEFAULT (0) FOR [p_version]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_edit_order] DEFAULT (0) FOR [p_edit_order]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_order_change] DEFAULT (getdate()) FOR [p_order_change]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_created_date] DEFAULT (getdate()) FOR [p_created_date]
GO
ALTER TABLE [dbo].[page_data] ADD CONSTRAINT [DF_page_data_p_authentication] DEFAULT ((0)) FOR [p_authentication]
GO
PageXMLArchive table:
USE [TD-VMB-01-STG]
GO
/****** Object: Table [dbo].[PageXMLArchive] Script Date: 12/13/2011 13:20:00 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PageXMLArchive](
[ArchiveID] [bigint] IDENTITY(1,1) NOT NULL,
[P_Page_ID] [int] NOT NULL,
[p_author] [nvarchar](100) NULL,
[p_title] [nvarchar](400) NULL,
[Version] [int] NOT NULL,
[PageXML] [ntext] NULL,
[ArchiveDate] [datetime] NOT NULL,
CONSTRAINT [PK_PageXMLArchive] PRIMARY KEY CLUSTERED
(
[ArchiveID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[PageXMLArchive] ADD CONSTRAINT [DF_PageXMLArchive_ArchiveDate] DEFAULT (getdate()) FOR [ArchiveDate]
GO

You can avoid the loop in many ways, and here is an example...
SELECT *
FROM ListFileReport
WHERE NOT EXISTS (
    SELECT *
    FROM page_data
    WHERE p_content LIKE '%' + LTRIM(RTRIM(ListFileReport.FileName)) + '%'
)
AND NOT EXISTS (
    SELECT *
    FROM PageXMLArchive
    WHERE PageXML LIKE '%' + LTRIM(RTRIM(ListFileReport.FileName)) + '%'
)
Note: This removes the loop, and will yield a massive improvement because of that. But it still has to scan the whole of both lookup tables for every entry in ListFileReport, without any clever algorithmics, since there can be no useful indexing. So it will still be slow as a dog; it'll just have one broken leg instead of two.
The only way to avoid using LIKE is to parse all of the fields in the page_data and PageXMLArchive tables and create a list of referenced files. As HTML and XML are very structured, this can be done, but I'd look for a library or something to do it for you.
Then you can create another table with all of the files, without duplication and with an appropriate index. Querying against that instead of using LIKE will be massively faster - I have no doubt at all. But writing or finding the code will be a chore.
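As a rough sketch of that parsing approach (the regex and the href/src convention are assumptions about how the CMS content references files; a real HTML/XML parser would be more robust):

```python
import re

# Collect every path mentioned in href/src attributes across all content blobs,
# then report which files from the report list are never referenced.
REF_PATTERN = re.compile(r'(?:href|src)\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

def unreferenced_files(file_list, content_blobs):
    referenced = set()
    for blob in content_blobs:
        for path in REF_PATTERN.findall(blob):
            referenced.add(path.strip().lower())
    # A file is a deletion candidate only if no content blob references it.
    return [f for f in file_list if f.strip().lower() not in referenced]

files = ["files/file.pdf", "files/old.docx"]
pages = ['<a href="files/file.pdf">download</a>']
print(unreferenced_files(files, pages))  # ['files/old.docx']
```

The set of referenced paths is built once, so each file lookup is a hash check rather than a scan of every page.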

A stored procedure that has a loop with a SELECT and an INSERT mixed together will definitely slow down the query.
Ideally, if you can write INSERT INTO #table SELECT a, b FROM table, it will be a million times faster than inserting each row separately.
For your example, you could do something like:
INSERT INTO @t (_fileName) SELECT ... FROM page_data JOIN ... ON ... WHERE sth LIKE '%sth%'
Let me know if it is not applicable.
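To illustrate the set-based idea, here is a small sketch using SQLite (via Python's sqlite3) as a stand-in for SQL Server; the table names match the question, but the sample data is made up:

```python
import sqlite3

# One INSERT ... SELECT replaces the row-by-row WHILE loop entirely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ListFileReport (id INTEGER, FileName TEXT)")
conn.execute("CREATE TABLE page_data (p_content TEXT)")
conn.executemany("INSERT INTO ListFileReport VALUES (?, ?)",
                 [(1, "a.pdf"), (2, "b.pdf")])
conn.execute("INSERT INTO page_data VALUES ('see <a href=\"a.pdf\">link</a>')")
conn.execute("CREATE TEMP TABLE t (_fileName TEXT)")

# Single set-based statement instead of a loop with per-row SELECT/INSERT.
conn.execute("""
    INSERT INTO t (_fileName)
    SELECT FileName FROM ListFileReport r
    WHERE NOT EXISTS (
        SELECT 1 FROM page_data
        WHERE p_content LIKE '%' || TRIM(r.FileName) || '%')
""")
print([row[0] for row in conn.execute("SELECT _fileName FROM t")])  # ['b.pdf']
```

The engine can then plan the whole operation at once instead of being driven one row at a time from procedural code.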

Related

Bigint column throwing numeric value out of range error in stored procedure

One of my customers is facing a weird issue on the database side. There is one stored procedure which contains only one INSERT statement:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[COL_INSERT_LOOP_DATA](
    @LoopDataID AS bigint OUTPUT,
    @LoopID AS bigint,
    @NumSegments AS int,
    @Type AS char(1),
    @LoopData AS text,
    @TransmissionID AS bigint,
    @SubmissionDate AS datetime
) AS
BEGIN
    SET NOCOUNT ON
    INSERT INTO LoopData (
        LoopID,
        NumSegments,
        Type,
        LoopData,
        TransmissionID,
        SubmissionDate
    ) VALUES (
        @LoopID,
        @NumSegments,
        @Type,
        @LoopData,
        @TransmissionID,
        @SubmissionDate
    )
    SELECT @LoopDataID = SCOPE_IDENTITY()
END
This stored procedure has been throwing a "numeric value out of range" error for the last few weeks. We have found that it happens when we supply a value of more than 10 digits for the LoopDataID column. But when we run the same INSERT statement explicitly as a separate query (without the stored procedure), it works.
Below is the LoopData table definition:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[LoopData](
[LoopDataID] [bigint] IDENTITY(1,1) NOT NULL,
[LoopID] [bigint] NOT NULL,
[TransmissionID] [bigint] NOT NULL,
[SubmissionDate] [datetime] NOT NULL,
[LoopData] [text] NOT NULL,
[Type] [char](1) NOT NULL,
[NumSegments] [int] NOT NULL,
[BackupLoopDataID] [bigint] NULL,
CONSTRAINT [PK_LoopData] PRIMARY KEY NONCLUSTERED
(
[LoopDataID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[LoopData] ADD CONSTRAINT [DF_LoopData_NumSegments] DEFAULT ((1)) FOR [NumSegments]
GO
ALTER TABLE [dbo].[LoopData] WITH CHECK ADD CONSTRAINT [FK_LoopData_Loop] FOREIGN KEY([LoopID])
REFERENCES [dbo].[Loop] ([LoopID])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[LoopData] CHECK CONSTRAINT [FK_LoopData_Loop]
GO
ALTER TABLE [dbo].[LoopData] WITH CHECK ADD CONSTRAINT [FK_LoopData_LoopData] FOREIGN KEY([BackupLoopDataID])
REFERENCES [dbo].[LoopData] ([LoopDataID])
GO
ALTER TABLE [dbo].[LoopData] CHECK CONSTRAINT [FK_LoopData_LoopData]
GO
ALTER TABLE [dbo].[LoopData] WITH CHECK ADD CONSTRAINT [FK_LoopData_Transmission] FOREIGN KEY([TransmissionID])
REFERENCES [dbo].[Transmission] ([TransmissionID])
GO
ALTER TABLE [dbo].[LoopData] CHECK CONSTRAINT [FK_LoopData_Transmission]
GO
The customer has been using the database for years without any issues, though they sometimes do partition pruning for performance. For a bigint column, a 12-digit number should not be a problem.
Please help.
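One arithmetic detail worth checking (an observation, not a confirmed diagnosis): the "more than 10 digits" failure threshold sits right at the 32-bit int boundary, which would suggest that some layer - driver, parameter binding, or similar - is treating the value as int rather than bigint:

```python
# SQL Server's int tops out at 2,147,483,647 (a 10-digit number), while
# bigint goes up to 19 digits. A 12-digit id fits bigint but overflows int.
INT_MAX = 2**31 - 1      # SQL Server int upper bound
BIGINT_MAX = 2**63 - 1   # SQL Server bigint upper bound

value = 123_456_789_012  # a 12-digit id like the ones that fail
print(value <= BIGINT_MAX)  # True  - fits comfortably in bigint
print(value <= INT_MAX)     # False - overflows a 32-bit int
```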

SQL Server Computed Column as Primary Key

I've created a table with a computed column as the primary key.
The table is created fine, and here is the script:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ARITHABORT ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [planning.A062].[RMAllocation](
[Id] [int] IDENTITY(100,1) NOT NULL,
[RMAllocatonId] AS ('RMA_'+CONVERT([nvarchar](100),[Id])) PERSISTED NOT NULL,
[RequsitionNo] [nvarchar](100) NULL,
[RMDemandId] [nvarchar](104) NULL,
[HierarchyId] [nvarchar](102) NULL,
[Season] [nvarchar](50) NULL,
[VendorSupplierNo] [nvarchar](100) NULL,
[Year] [int] NULL,
[Month] [int] NULL,
[Week] [int] NULL,
[Day] [int] NULL,
[PlannedQty] [int] NULL,
[ConfirmedQty] [int] NULL,
[Status] [int] NULL,
[CreatedBy] [int] NULL,
[SyncId] [nvarchar](100) NULL,
[CreatedOn] [datetime2](7) NULL,
[UpdatedBy] [int] NULL,
[UpdatedOn] [datetime2](7) NULL,
[IsActive] [bit] NULL,
[RecordDateTime] [datetime2](7) NULL,
CONSTRAINT [PK_RMAllocation] PRIMARY KEY CLUSTERED
(
[RMAllocatonId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
The problem is that when I change this table (add/edit a column) using the Designer view, it gives me the following error:
Unable to create index 'PK_RMAllocation'.
Cannot define PRIMARY KEY constraint on nullable column in table 'RMAllocation'.
Could not create constraint. See previous errors.
When I use a script to do the modifications, it works - and I have declared the computed column as NOT NULL. How does this happen?
Something is wrong with the designer. SQL Server is quite clear in the documentation that computed columns can be used for primary keys (for instance, here).
My guess is that the designer is dropping all constraints on the table and adding them back in. It ends up adding them in the wrong order, so the primary key is assigned before the not null on the computed column. I have no idea if there is any work-around other than the obvious one of not using the designer.
According to the documentation (emphasis mine)
A computed column cannot be used as a DEFAULT or FOREIGN KEY
constraint definition or with a NOT NULL constraint definition.
So it may be somewhat surprising that it works at all even in TSQL.
When the designer implements the change by recreating the table, it loses the NOT NULL on the column definition.
[Id] [int] IDENTITY(100,1) NOT NULL,
[RMAllocatonId] AS ('RMA_'+CONVERT([nvarchar](100),[Id])) PERSISTED,
[RequsitionNo] [nvarchar](100) NULL,
Semantically this concatenation of a NOT NULL constant and a NOT NULL column can never be NULL anyway.
Another way you can persuade SQL Server that the column will be NOT NULL-able even in the absence of a NOT NULL is by wrapping the definition in an ISNULL.
The following works fine with the designer
[RMAllocatonId] AS (ISNULL('RMA_'+CONVERT([nvarchar](100),[Id]),'')) PERSISTED
At insert time the system doesn't know the new [Id] value. You need a trigger that updates the value later.
When two Id values are same, corresponding RMAllocatonId values will be same. When two Id values are different, corresponding RMAllocatonId values will be different. So making the Id unique is equivalent to making RMAllocatonId unique.
If you ask me, just put the PRIMARY KEY on Id where it belongs and be done with it...
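The one-to-one argument above can be sanity-checked with a quick sketch (illustrative only):

```python
# Distinct Id values always yield distinct RMAllocatonId values, since the
# mapping just prefixes a constant string. So uniqueness of Id implies
# uniqueness of the computed column.
ids = range(100, 1100)
computed = {f"RMA_{i}" for i in ids}
print(len(computed) == len(set(ids)))  # True: the mapping is one-to-one
```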

Error creating In-Memory Tables

I am trying to create an in-memory table, but it keeps giving me errors like:
Msg 10794, Level 16, State 85, Line 11 The index option 'pad_index' is
not supported with indexes on memory optimized tables.
and another one saying the same thing about Primary Key Clustered.
Those two things above (pad_index and PK Clustered) are needed, unless there is another way to get it to work...
USE [DEMO-Training1]
GO
/****** Object: Table [dbo].[wtAssetChildren] ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[wtAssetChildren] (
[wtAssetChildrenID] [int] IDENTITY(1,1) NOT NULL,
[wtGUID] [uniqueidentifier] NOT NULL,
[CallingAssetID] [int] NOT NULL,
[AssetID] [int] NOT NULL,
[Processed] [bit] NOT NULL,
CONSTRAINT [PK_wtAssetChildren] PRIMARY KEY CLUSTERED ([wtAssetChildrenID] ASC)
WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[wtAssetChildren]
ADD CONSTRAINT [DF_wtAssetChildren_Processed] DEFAULT ((0)) FOR [Processed]
GO
In-Memory OLTP Tables have a bunch of restrictions relative to the rest of what generally ships with MS SQL. Check out http://msdn.microsoft.com/en-us/library/dn246937.aspx for a full list.
Clustered PKs are not supported, for example. You'll have to use nonclustered primary key indices instead.
Below is an example of how you can create this table as a memory-optimized table in SQL Server 2014. Note that just because you can doesn't mean you should: you mention you are a beginner, and memory-optimized tables are more of an advanced option.
CREATE DATABASE MemoryOptimizedTest;
GO
ALTER DATABASE MemoryOptimizedTest
ADD FILEGROUP MemoryOptimizedTest_MemoryOptimized
CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE MemoryOptimizedTest
ADD FILE (
NAME = 'MemoryOptimizedTest__MemoryOptimized1'
, FILENAME = 'D:\SqlDataFiles\MemoryOptimizedTest__MemoryOptimized1')
TO FILEGROUP MemoryOptimizedTest_MemoryOptimized;
GO
ALTER DATABASE MemoryOptimizedTest
SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT=ON;
GO
USE MemoryOptimizedTest;
GO
CREATE TABLE [dbo].[wtAssetChildren](
[wtAssetChildrenID] [int] IDENTITY(1,1) NOT NULL
, [wtGUID] [uniqueidentifier] NOT NULL
, [CallingAssetID] [int] NOT NULL
, [AssetID] [int] NOT NULL
, [Processed] [bit] NOT NULL
CONSTRAINT [DF_wtAssetChildren_Processed] DEFAULT ((0))
, CONSTRAINT [PK_wtAssetChildren] PRIMARY KEY NONCLUSTERED( [wtAssetChildrenID] ASC )
)
WITH (MEMORY_OPTIMIZED=ON, DURABILITY=SCHEMA_AND_DATA);

How to come up with a script for modifying an existing table in the database without losing the existing information?

I have an application with a database; they are working well and everything is fine.
Now, I just need to modify one table in the database by adding more columns to it. I am using SQL Server and the database administrator asked me to provide him with the script of modifying that table.
So how do I do that?
I am using SQL Server Management Studio; when I right-click on the table, I can choose to script it as CREATE, and Management Studio will give me the script. Now, this table contains data, so I think I should not use a CREATE script for it.
The new columns should allow null values.
So what should I use?
For your information, here is the script for creating the table:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Test](
[ID] [numeric](18, 0) IDENTITY(1,1) NOT NULL,
[No] [char](20) NOT NULL,
[Date] [smalldatetime] NULL,
[ProjectType] [char](500) NULL,
[ProjectPhase] [char](300) NULL,
[rejected_reason] [varchar](max) NULL,
[archived_reason] [varchar](max) NULL,
[forward_to] [varchar](max) NULL,
[forward_type] [varchar](max) NULL,
[forward_concern] [varchar](max) NULL,
CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [DF_LLATestB_No] DEFAULT ('*') FOR [No]
GO
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [DF_LLATestB_Status] DEFAULT ((2)) FOR [Status]
GO
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [DF_LLATestB_AID] DEFAULT ((0)) FOR [AID]
GO
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [DF_LLATestB_Hit] DEFAULT ((0)) FOR [Hit]
GO
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [DF_LLATestB_Sent] DEFAULT ((0)) FOR [Sent]
GO
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [DF_LLATestB_SentTo] DEFAULT ((0)) FOR [SentTo]
GO
ALTER TABLE [dbo].[Test] ADD CONSTRAINT [DF_LLa_Added_To_Cart] DEFAULT ((0)) FOR [Added_To_Cart]
GO
Examples of the new columns that should be added to this table are:
[rejected_reason] [varchar](max) NULL,
[archived_reason] [varchar](max) NULL,
[forward_to] [varchar](max) NULL,
[forward_type] [varchar](max) NULL,
When you've studied SQL Server Books Online on MSDN, it should be a piece of cake - use ALTER TABLE:
ALTER TABLE dbo.LLa
ADD [rejected_reason] [varchar](max) NULL
ALTER TABLE dbo.LLa
ADD [archived_reason] [varchar](max) NULL
ALTER TABLE dbo.LLa
ADD [forward_to] [varchar](max) NULL
ALTER TABLE dbo.LLa
ADD [forward_type] [varchar](max) NULL
The question is more: do you REALLY need up to 2 GB of data (VARCHAR(MAX)) for each of your columns here? That seems like a bit of overkill!
You should always use the appropriate data type - if a column will only ever hold up to 100 characters, why not use VARCHAR(100) instead of the max datatype?

Considerations for updating a 'SiteVisit' row many times over a session

SQL Server 2005:
I have a SiteVisit row which contains information about a user's visit, for instance HttpRefer, whether or not they placed an order, the browser, etc.
Currently, for reporting, I am joining this table with SiteEvent, which contains information about each 'section' visited. This produces a view which shows statistics about how many sections each user visited. Obviously this is not a sustainable way to do this, and now I'm doing some refactoring.
I'd like to move the SectionsVisited column from my view to an actual column in SiteVisit. I'd then update it every time a user visited a section during that session.
Now my actual question:
What kind of considerations do I need to take into account when updating a row many times per session? Obviously I have an index on the table (currently keyed by a GUID to prevent most malicious tampering).
I just want to know what non-obvious things I should do (if any). Are there any specific things I should do to optimize the table, or will SQL Server 2005 pretty much take care of itself?
NB: it is a Flash site, so please don't just recommend a tracking tool. I want to do some 'crazy' data mining and have developed the tracking as such. This is primarily intended to be a database question, not a question about 'how to track'.
Requested table definition:
USE [RazorSite]
GO
/****** Object: Table [dbo].[SiteVisit] Script Date: 10/29/2008 14:35:56 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[SiteVisit](
[SiteVisitId] [int] IDENTITY(1,1) NOT NULL,
[SiteUserId] [int] NULL,
[ClientGUID] [uniqueidentifier] ROWGUIDCOL NULL CONSTRAINT [DF_SiteVisit_ClientGUID] DEFAULT (newid()),
[ServerGUID] [uniqueidentifier] NULL,
[UserGUID] [uniqueidentifier] NULL,
[SiteId] [int] NOT NULL,
[EntryURL] [varchar](100) NULL,
[CampaignId] [varchar](50) NULL,
[Date] [datetime] NOT NULL,
[Cookie] [varchar](50) NULL,
[UserAgent] [varchar](255) NULL,
[Platform] [int] NULL,
[Referer] [varchar](255) NULL,
[RegisteredReferer] [int] NULL,
[FlashVersion] [varchar](20) NULL,
[SiteURL] [varchar](100) NULL,
[Email] [varchar](50) NULL,
[FlexSWZVersion] [varchar](20) NULL,
[HostAddress] [varchar](20) NULL,
[HostName] [varchar](100) NULL,
[InitialStageSize] [varchar](20) NULL,
[OrderId] [varchar](50) NULL,
[ScreenResolution] [varchar](50) NULL,
[TotalTimeOnSite] [int] NULL,
[CumulativeVisitCount] [int] NULL CONSTRAINT [DF_SiteVisit_CumulativeVisitCount] DEFAULT ((0)),
[ContentActivatedTime] [int] NULL CONSTRAINT [DF_SiteVisit_ContentActivatedTime] DEFAULT ((0)),
[ContentCompleteTime] [int] NULL,
[MasterVersion] [int] NULL CONSTRAINT [DF_SiteVisit_MasterVersion] DEFAULT ((0)),
CONSTRAINT [PK_SiteVisit] PRIMARY KEY CLUSTERED
(
[SiteVisitId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [dbo].[SiteVisit] WITH CHECK ADD CONSTRAINT [FK_SiteVisit_Platform] FOREIGN KEY([Platform])
REFERENCES [dbo].[Platform] ([PlatformId])
GO
ALTER TABLE [dbo].[SiteVisit] CHECK CONSTRAINT [FK_SiteVisit_Platform]
GO
ALTER TABLE [dbo].[SiteVisit] WITH CHECK ADD CONSTRAINT [FK_SiteVisit_Site] FOREIGN KEY([SiteId])
REFERENCES [dbo].[Site] ([SiteId])
GO
ALTER TABLE [dbo].[SiteVisit] CHECK CONSTRAINT [FK_SiteVisit_Site]
GO
ALTER TABLE [dbo].[SiteVisit] WITH CHECK ADD CONSTRAINT [FK_SiteVisit_SiteUser] FOREIGN KEY([SiteUserId])
REFERENCES [dbo].[SiteUser] ([SiteUserId])
GO
ALTER TABLE [dbo].[SiteVisit] CHECK CONSTRAINT [FK_SiteVisit_SiteUser]
Current indexes:
IX_CampaignId - non unique, non clustered
IX_ClientGUID - Unique, non clustered <-- this is how a user is identified for updates
IX_UserGUID - non unique, non clustered
PK_SiteVisit - (SiteVisitId column) - clustered
The best suggestion that I can give is: keep the table small.
How? Have one table that contains all "live" data, i.e. active sessions. When a session expires, move the data out to an "archive" table, or even to another database server, to do your mining.
Have only very few indexes on the "live" table (session id). You can have all the indexes you want on the "archive" table for faster data retrieval.
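A minimal sketch of that live/archive split, using SQLite (via Python's sqlite3) as a stand-in and an assumed one-hour session timeout:

```python
import sqlite3
import time

# Expired sessions are moved out of the small, lightly indexed "live" table
# into a heavier "archive" table that is used for mining.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE live_visit (session_id TEXT PRIMARY KEY, last_seen REAL)")
conn.execute("CREATE TABLE archive_visit (session_id TEXT, last_seen REAL)")

now = time.time()
conn.executemany("INSERT INTO live_visit VALUES (?, ?)",
                 [("s1", now - 7200), ("s2", now)])  # s1 expired, s2 active

TIMEOUT = 3600  # assumed one-hour session timeout
cutoff = now - TIMEOUT
# Move expired rows: copy to archive, then delete from live, in one transaction.
with conn:
    conn.execute("INSERT INTO archive_visit SELECT * FROM live_visit WHERE last_seen < ?", (cutoff,))
    conn.execute("DELETE FROM live_visit WHERE last_seen < ?", (cutoff,))

print(conn.execute("SELECT COUNT(*) FROM live_visit").fetchone()[0])     # 1
print(conn.execute("SELECT COUNT(*) FROM archive_visit").fetchone()[0])  # 1
```

Keeping the hot table this small means the per-request UPDATEs touch a compact, well-cached structure, while heavy reporting indexes live only on the archive side.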