Creating a fulltext index on a view in SQL Server 2005

I am having trouble creating a fulltext index on a view in SQL Server 2005. I have reviewed the documentation but have not found the problem. The error message I receive is: "'Id' is not a valid index to enforce a full-text search key. A full-text search key must be a unique, non-nullable, single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed column, and has maximum size of 900 bytes. Choose another index for the full-text key."
I have been able to verify every requirement in the error string except the "offline" requirement, because I don't really know what that means. I'm pretty darn sure it's not offline, though.
The script to create the target table, view, and index is below. I do not really need a view in this sample; it is simplified as I try to isolate the issue.
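IF OBJECT_ID(N'dbo.ProductSearchView', N'V') IS NOT NULL
DROP VIEW [dbo].[ProductSearchView]
IF OBJECT_ID(N'dbo.Product2', N'U') IS NOT NULL
DROP TABLE [dbo].[Product2]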
GO
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO
CREATE TABLE [dbo].[Product2](
[Id] [bigint] NOT NULL,
[Description] [nvarchar](max) NULL,
CONSTRAINT [PK_Product2] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE VIEW [dbo].[ProductSearchView] WITH SCHEMABINDING
AS
SELECT P.Id AS Id,
P.Description AS Field
FROM [dbo].Product2 AS P
GO
-- the unique clustered index is required: only an indexed view can carry a fulltext index
CREATE UNIQUE CLUSTERED INDEX PK_ProductSearchView ON [dbo].[ProductSearchView](Id)
GO
-- This is the command that fails
CREATE FULLTEXT INDEX ON [dbo].[ProductSearchView](Id, Field)
KEY INDEX Id
ON FullText WITH CHANGE_TRACKING AUTO;
GO

You need to specify the name of the index instead of the column name when creating the fulltext index:
CREATE FULLTEXT INDEX ON [dbo].[ProductSearchView](Id, Field)
KEY INDEX PK_ProductSearchView
ON FullText WITH CHANGE_TRACKING AUTO;
GO
This will fix the error you are getting, but it will then give you another error, because Id is a bigint and only character-based columns (plus xml and binary document columns such as varbinary(max)) can be included in a full-text index. You may want to index only a character column, such as Field, instead.
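Putting both fixes together, a sketch of the full-text step for the sample above might look like this. It assumes the catalog is named FullText, as in the question, and creates it only if it is missing; only the character column Field is indexed.

```sql
-- Sketch only: assumes the full-text catalog is called FullText, as in the question.
IF NOT EXISTS (SELECT * FROM sys.fulltext_catalogs WHERE name = N'FullText')
    CREATE FULLTEXT CATALOG [FullText];
GO
-- KEY INDEX takes the name of the view's unique clustered index, not a column name,
-- and only the character column Field is included in the full-text index.
CREATE FULLTEXT INDEX ON [dbo].[ProductSearchView](Field)
KEY INDEX PK_ProductSearchView
ON [FullText] WITH CHANGE_TRACKING AUTO;
GO
```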

Related

Optimizing slow query with multiple withs

I have the following query:
with matched_urls as
(
select l.Url, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.Url
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by l.Url
),
all_urls as
(
select l.Url, 5 as Filter
from landingpages l
where 'iylsuqnzukwv0milinztea' in (select domainid
from domainlandingpages dlp
where l.Url = dlp.landingpageid)
and l.Url not in (select Url from matched_urls)
union
select * from matched_urls
)
select l.*
from landingpages l
join all_urls u on l.Url = u.Url
order by u.Filter asc
offset 0 rows fetch next 30 rows only
This is the DDL for the tables used in the query:
CREATE TABLE [dbo].[LandingPages]
(
[Url] [nvarchar](448) NOT NULL,
[LastUpdated] [datetime2](7) NOT NULL,
CONSTRAINT [PK_LandingPages]
PRIMARY KEY CLUSTERED ([Url] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[DomainLandingPages]
(
[LandingPageId] [nvarchar](448) NOT NULL,
[DomainId] [nvarchar](128) NOT NULL,
CONSTRAINT [PK_DomainLandingPages]
PRIMARY KEY CLUSTERED ([DomainId] ASC, [LandingPageId] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[DomainLandingPages] WITH CHECK
ADD CONSTRAINT [FK_DomainLandingPages_Domains_DomainId]
FOREIGN KEY([DomainId]) REFERENCES [dbo].[Domains] ([Id])
GO
ALTER TABLE [dbo].[DomainLandingPages] CHECK CONSTRAINT [FK_DomainLandingPages_Domains_DomainId]
GO
ALTER TABLE [dbo].[DomainLandingPages] WITH CHECK
ADD CONSTRAINT [FK_DomainLandingPages_LandingPages_LandingPageId]
FOREIGN KEY([LandingPageId]) REFERENCES [dbo].[LandingPages] ([Url])
GO
ALTER TABLE [dbo].[DomainLandingPages] CHECK CONSTRAINT [FK_DomainLandingPages_LandingPages_LandingPageId]
GO
CREATE TABLE [dbo].[UrlFilters]
(
[Id] [int] IDENTITY(1,1) NOT NULL,
[GroupId] [int] NOT NULL,
[UrlFilterType] [int] NOT NULL,
[Filter] [nvarchar](max) NOT NULL,
CONSTRAINT [PK_UrlFilters]
PRIMARY KEY CLUSTERED ([Id] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[UrlFilters] WITH CHECK
ADD CONSTRAINT [FK_UrlFilters_Groups_GroupId]
FOREIGN KEY([GroupId]) REFERENCES [dbo].[Groups] ([Id])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[UrlFilters] CHECK CONSTRAINT [FK_UrlFilters_Groups_GroupId]
GO
CREATE TABLE [dbo].[LandingPageUrlFilters]
(
[LandingPageId] [nvarchar](448) NOT NULL,
[UrlFilterId] [int] NOT NULL,
CONSTRAINT [PK_LandingPageUrlFilters]
PRIMARY KEY CLUSTERED ([LandingPageId] ASC, [UrlFilterId] ASC)
WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] WITH CHECK
ADD CONSTRAINT [FK_LandingPageUrlFilters_LandingPages_LandingPageId]
FOREIGN KEY([LandingPageId]) REFERENCES [dbo].[LandingPages] ([Url])
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] CHECK CONSTRAINT [FK_LandingPageUrlFilters_LandingPages_LandingPageId]
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] WITH CHECK
ADD CONSTRAINT [FK_LandingPageUrlFilters_UrlFilters_UrlFilterId]
FOREIGN KEY([UrlFilterId]) REFERENCES [dbo].[UrlFilters] ([Id])
GO
ALTER TABLE [dbo].[LandingPageUrlFilters] CHECK CONSTRAINT [FK_LandingPageUrlFilters_UrlFilters_UrlFilterId]
GO
Here is the execution plan:
https://www.brentozar.com/pastetheplan/?id=H1tHt5pvP
The query pulls all URLs for a given domain, which are then supposed to be ordered by UrlFilterType. However, not all landing pages have a match, hence the two WITH clauses.
As far as I can see from the execution plan, it's mainly doing index seeks, so I think I have the right indexes. However, the query takes a very long time to execute, so I hope there might be a smarter way of doing this.
Any input will be greatly appreciated!

First up, your statistics look wildly out of line: estimated 3,700 rows, actual 219,000. That suggests that, at the very least, a statistics update may change the choices the optimizer is making. Because of those row estimates, the optimizer is choosing a seek and a nested loops operation where, based on the data distribution, you're reading a third of the table (200k rows of a 600k-row table). A scan here, probably with a hash join, would be more efficient.
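A minimal sketch of refreshing the statistics on the tables this query touches, assuming you can afford a full scan of each table (on very large tables a sampled update may be the more practical choice):

```sql
-- Refresh statistics on every table used by the query, reading all rows.
UPDATE STATISTICS dbo.LandingPages WITH FULLSCAN;
UPDATE STATISTICS dbo.LandingPageUrlFilters WITH FULLSCAN;
UPDATE STATISTICS dbo.UrlFilters WITH FULLSCAN;
UPDATE STATISTICS dbo.DomainLandingPages WITH FULLSCAN;
```

Re-run the query afterwards and compare estimated and actual row counts in the new plan.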
The query itself doesn't give much to filter the [aarhus-cluster-onesearch-staging].[dbo].[LandingPages].[PK_LandingPages] table on, so it pulls 200k rows in order to filter them down to 30. If you can find a way to add more filtering there, you should see a performance improvement.

OK, the big hit is at the bottom of that query plan, where it reads from LandingPageUrlFilters using the UrlFilterId values that come from UrlFilters.
Its estimates are completely wrong (off by about 70x), and it then sorts your URLs, which takes 30 seconds or so.
If you run the first CTE on its own, I think it will take a long time. That's what you need to optimise.
select l.Url, min(f.urlfiltertype) as Filter
from landingpages l
join landingpageurlfilters lpf on lpf.landingpageid = l.Url
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by l.Url
Suggestions
The first thing to try is to also add a nonclustered index on LandingPageUrlFilters in the opposite order to your clustered index (e.g., CREATE NONCLUSTERED INDEX myindex ON LandingPageUrlFilters ([UrlFilterId] ASC, [LandingPageId] ASC)). Note that this will make a full copy of that table, which may be rather large. It appears you already have a nonclustered index like this (based on the fact that the plan refers to IX_LandingPageUrlFilters_UrlFilterId).
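Because every nonclustered index also carries the clustered key (LandingPageId, UrlFilterId), an existing index whose leading key is UrlFilterId already gives you this access path. A sketch that creates the reversed index only when no such index exists; the index name here is hypothetical:

```sql
-- Create the reversed-key index only if no index on the table already
-- starts with UrlFilterId (the plan suggests IX_LandingPageUrlFilters_UrlFilterId does).
IF NOT EXISTS (
    SELECT *
    FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
      ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    WHERE i.object_id = OBJECT_ID(N'dbo.LandingPageUrlFilters')
      AND ic.key_ordinal = 1
      AND COL_NAME(ic.object_id, ic.column_id) = N'UrlFilterId')
BEGIN
    -- Hypothetical index name.
    CREATE NONCLUSTERED INDEX IX_LandingPageUrlFilters_UrlFilterId_LandingPageId
        ON dbo.LandingPageUrlFilters (UrlFilterId ASC, LandingPageId ASC);
END
```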
The sort on the nvarchar(448) string field (up to 896 bytes per row, since nvarchar stores 2 bytes per character) needs a much bigger memory grant. Consider adding int ID values as primary keys: they require less memory, so the sort is less likely to spill to disk.
Consider, instead of a CTE, creating a temporary table (with an appropriate PK) holding the result of the part that joins LandingPageUrlFilters and UrlFilters. However, you will still need to do a sort when inserting into it, which is likely to take just as long.
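A rough sketch of the temporary-table variant. The temp table name #matched_urls is hypothetical, the first statement mirrors the original matched_urls CTE, and the rest of the query is unchanged apart from reading from the temp table:

```sql
-- Hypothetical temp table holding the matched_urls result. COLLATE DATABASE_DEFAULT
-- avoids a collation conflict if tempdb's default collation differs from this database's.
-- nvarchar(448) is 896 bytes, just under the 900-byte index key limit.
CREATE TABLE #matched_urls
(
    Url    nvarchar(448) COLLATE DATABASE_DEFAULT NOT NULL PRIMARY KEY,
    Filter int NOT NULL
);

INSERT INTO #matched_urls (Url, Filter)
SELECT l.Url, MIN(f.UrlFilterType)
FROM dbo.LandingPages AS l
JOIN dbo.LandingPageUrlFilters AS lpf ON lpf.LandingPageId = l.Url
JOIN dbo.UrlFilters AS f ON lpf.UrlFilterId = f.Id
WHERE f.GroupId = 3062
GROUP BY l.Url;

-- The rest of the original query, now reading from the temp table.
WITH all_urls AS
(
    SELECT l.Url, 5 AS Filter
    FROM dbo.LandingPages AS l
    WHERE 'iylsuqnzukwv0milinztea' IN (SELECT dlp.DomainId
                                      FROM dbo.DomainLandingPages AS dlp
                                      WHERE l.Url = dlp.LandingPageId)
      AND l.Url NOT IN (SELECT Url FROM #matched_urls)
    UNION
    SELECT Url, Filter FROM #matched_urls
)
SELECT l.*
FROM dbo.LandingPages AS l
JOIN all_urls AS u ON l.Url = u.Url
ORDER BY u.Filter ASC
OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY;
```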
Part of the problem (another 10 seconds or so) is a nested loop join to LandingPages. It was expecting fewer than 4,000 rows (for which a nested loop is fine) but had to do 220,000 loops. If necessary, consider a join hint (e.g., INNER HASH JOIN rather than INNER JOIN). However, it appears that LandingPages isn't actually required in that query: just remove LandingPages from the CTE and use LandingPageUrlFilters.LandingPageId instead.
e.g.,
select lpf.landingpageid AS [Url], min(f.urlfiltertype) as Filter
from landingpageurlfilters lpf
join urlfilters f on lpf.urlfilterid = f.id
where f.groupid = 3062
group by lpf.landingpageid
I think that gives the same results as the CTE I copied above.
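The rewrite should be equivalent because the trusted foreign key FK_LandingPageUrlFilters_LandingPages_LandingPageId guarantees that every LandingPageId exists in LandingPages (whose key is Url), so the join can neither drop nor duplicate rows. A quick sketch to confirm this on your data; it should return no rows, and swapping the two halves checks the other direction:

```sql
-- Rows produced by the original CTE but not by the rewrite (expect none).
SELECT l.Url, MIN(f.UrlFilterType) AS Filter
FROM dbo.LandingPages AS l
JOIN dbo.LandingPageUrlFilters AS lpf ON lpf.LandingPageId = l.Url
JOIN dbo.UrlFilters AS f ON lpf.UrlFilterId = f.Id
WHERE f.GroupId = 3062
GROUP BY l.Url
EXCEPT
SELECT lpf.LandingPageId, MIN(f.UrlFilterType)
FROM dbo.LandingPageUrlFilters AS lpf
JOIN dbo.UrlFilters AS f ON lpf.UrlFilterId = f.Id
WHERE f.GroupId = 3062
GROUP BY lpf.LandingPageId;
```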

Creating a partitioned view in SQL Server 2008 R2 Enterprise

I'm extending some legacy software that splits data up into multiple schemas by company, for example CP1.ACCOUNTS, CP2.ACCOUNTS, CPN.ACCOUNTS. I'm attempting to create an updatable view of these tables using partitioning, but I'm getting the typical "not updatable because a partitioning column was not found" error. The column I'm trying to partition on is the primary key, and as far as I can tell, it doesn't break any of the rules for partitioning columns.
So, with table definitions like so:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [CP1].[ACCOUNTS](
[ACCOUNTID] [char](10) NOT NULL,
[LASTNAME] [varchar](60) NOT NULL,
[FIRSTNAME] [varchar](35) NOT NULL,
[MIDDLE] [varchar](26) NULL,
[SUFFIX] [varchar](10) NULL,
[ADDRESS1] [varchar](55) NULL,
[ADDRESS2] [varchar](55) NULL,
[SOME_FLAG] [tinyint] NULL,
CONSTRAINT [ARM_CODE_KEY] PRIMARY KEY CLUSTERED
(
[ACCOUNTID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
ALTER TABLE [CP1].[ACCOUNTS] WITH CHECK ADD CONSTRAINT [CK__ACCOUNTS__CODE___4DD705FF] CHECK ((left([ACCOUNTID],(3))='CP1'))
GO
ALTER TABLE [CP1].[ACCOUNTS] CHECK CONSTRAINT [CK__ACCOUNTS__CODE___4DD705FF]
GO
ALTER TABLE [CP1].[ACCOUNTS] ADD DEFAULT ((0)) FOR [SOME_FLAG]
GO
The rest of the tables are defined exactly as above, following the CP2, CP3, CPN pattern, and the view definition is simply:
CREATE VIEW [ALL].[ACCOUNTS] AS
SELECT * FROM CP1.ACCOUNTS
UNION ALL
SELECT * FROM CP2.ACCOUNTS
--UNION ALL etc...
Inserts look like this:
INSERT INTO [ALL].[ACCOUNTS]
([ACCOUNTID]
,[LASTNAME]
,[FIRSTNAME]
,[MIDDLE]
,[SUFFIX]
,[ADDRESS1]
,[ADDRESS2]
,[SOME_FLAG])
VALUES
('CP1XYZ0001',
'SMITH',
'JOHN',
'Q',
'',
'123 Fake St',
'Apt 2',
0)
GO
generates an error like:
Msg 4436, level 16, State 12, Line 1
UNION ALL view 'ALL.ACCOUNTS' is not updatable because a partitioning column was not found.
Am I missing something simple? Am I just way out in left field here?

You need a constraint that defines which column is used as a partitioning column. As the error suggests, you don't have one defined. As described in the documentation:
To perform updates on a partitioned view, the partitioning column must be a part of the primary key of the base table. If a view is not updatable, you can create an INSTEAD OF trigger on the view that allows updates. You should design error handling into the trigger to make sure that no duplicate rows are inserted. For an example of an INSTEAD OF trigger designed on a view, see Designing INSTEAD OF Triggers.
In other words, SQL Server needs to be able to figure out which table gets the update.
You might be able to alter the tables to contain a company name column, which is then used as part of the primary key. Something like this might work:
create table . . .
CompanyName as 'CompanyA',
primary key (AccountId, CompanyName)
. . .
The alternative is to use an instead of trigger, as suggested in the documentation.
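Following the company-column suggestion, a rough sketch for one member table, using a hypothetical ordinary (non-computed) column named COMPANY; the same steps would be repeated for every other CPn schema with its own literal. It is untested against the real schema and assumes nothing else references the ARM_CODE_KEY constraint.

```sql
-- 1. Add the hypothetical partitioning column and fill it for existing rows.
ALTER TABLE [CP1].[ACCOUNTS] ADD [COMPANY] [char](3) NULL;
GO
UPDATE [CP1].[ACCOUNTS] SET [COMPANY] = 'CP1';
ALTER TABLE [CP1].[ACCOUNTS] ALTER COLUMN [COMPANY] [char](3) NOT NULL;
GO
-- 2. Constrain it with a plain comparison (no functions such as LEFT()).
ALTER TABLE [CP1].[ACCOUNTS] WITH CHECK
    ADD CONSTRAINT [CK_CP1_ACCOUNTS_COMPANY] CHECK ([COMPANY] = 'CP1');
-- 3. Make it part of the primary key.
ALTER TABLE [CP1].[ACCOUNTS] DROP CONSTRAINT [ARM_CODE_KEY];
ALTER TABLE [CP1].[ACCOUNTS] ADD CONSTRAINT [ARM_CODE_KEY]
    PRIMARY KEY CLUSTERED ([ACCOUNTID], [COMPANY]);
GO
-- 4. After altering every member table, refresh the SELECT * view's metadata.
EXEC sp_refreshview N'[ALL].[ACCOUNTS]';
```

Inserts through the view would then also have to supply COMPANY (for example 'CP1' for the sample row).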

In case someone comes upon this, you can use a computed column for partitioning, just make sure to make it a persisted computed column.
In this case, the computed column should be left([ACCOUNTID],(3)) and the partition constraint would be <computed column> = 'CP1'. Note: using left() in the constraint will cause it to still scan all partitions. The CHECK constraints can only use these operators: BETWEEN, AND, OR, <, <=, >, >=, =.
Also, since the question referenced enterprise edition, you'd get better performance using a partitioned table instead of a partitioned view.

Problems creating a full text index on a view

I have a view which has been created like this:
CREATE VIEW [dbo].[vwData] WITH SCHEMABINDING
AS
SELECT [DataField1] ,
[DataField2] ,
[DataField3]
FROM dbo.tblData
When I try to create a full text index on it, like this:
CREATE FULLTEXT INDEX ON [dbo].[vwData](
[DataField] LANGUAGE [English])
KEY INDEX [idx_DataField] ON ([ft_cat_Server], FILEGROUP [PRIMARY])
WITH (CHANGE_TRACKING = AUTO, STOPLIST = SYSTEM)
I get this error:
View 'dbo.vwData' is not an indexed view.
Full-text index is not allowed to be created on it.
Any idea why?

First you need to create a unique clustered index on the view (which makes it an indexed view) before you can create a fulltext index on it.
Suppose you have a table:
CREATE TABLE [dbo].[tblData](
[DataField1] [varchar](10) NOT NULL,
[DataField2] [varchar](10) NULL,
[DataField3] [varchar](10) NULL
)
And as you already did, you have a view:
CREATE VIEW [dbo].[vwData]
WITH SCHEMABINDING
AS
SELECT [DataField1] ,
[DataField2] ,
[DataField3]
FROM dbo.tblData
GO
Now create a unique clustered index on the view:
CREATE UNIQUE CLUSTERED INDEX idx_DataField
ON [dbo].[vwData] (DataField1);
GO
Once the unique clustered index exists, and since you already have the fulltext catalog ft_cat_Server, you can create the fulltext index:
CREATE FULLTEXT INDEX ON [dbo].[vwData](
[DataField1] LANGUAGE [English])
KEY INDEX [idx_DataField] ON ([ft_cat_Server], FILEGROUP [PRIMARY])
WITH (CHANGE_TRACKING = AUTO, STOPLIST = SYSTEM)
Hope this helps :)

You have to make your view an indexed view by creating a unique clustered index on it:
create unique clustered index ix_vwData on vwData(<unique columns>)
After that, the index you name in KEY INDEX (idx_DataField) must be a unique, non-nullable, single-column index.
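A small sketch for checking both conditions before creating the fulltext index, using the view name from the question:

```sql
-- Returns 1 once the view has at least one index, i.e. it has become an indexed view.
SELECT OBJECTPROPERTY(OBJECT_ID(N'dbo.vwData'), 'IsIndexed') AS IsIndexed;

-- Lists the view's indexes with their key-column counts; the fulltext KEY INDEX
-- must be unique with exactly one key column.
SELECT i.name, i.type_desc, i.is_unique,
       (SELECT COUNT(*)
        FROM sys.index_columns AS ic
        WHERE ic.object_id = i.object_id
          AND ic.index_id = i.index_id
          AND ic.key_ordinal > 0) AS key_column_count
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.vwData');
```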

SQL Azure not recognizing my clustered Index

I get the following error when I try to insert a row into a SQL Azure table.
Tables without a clustered index are not supported in this version of SQL Server. Please create a clustered index and try again.
My problem is I do have a clustered index on that table. I used SQL Azure MW to generate the Azure SQL Script.
Here's what I'm using:
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tblPasswordReset]') AND type in (N'U'))
DROP TABLE [dbo].[tblPasswordReset]
GO
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
IF NOT EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tblPasswordReset]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[tblPasswordReset](
[PasswordResetID] [int] IDENTITY(1,1) NOT NULL,
[PasswordResetGUID] [uniqueidentifier] NULL,
[MemberID] [int] NULL,
[RequestDate] [datetime] NULL,
CONSTRAINT [PK_tblPasswordReset] PRIMARY KEY CLUSTERED
(
[PasswordResetID] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF)
)
END
GO
Why doesn't SQL Azure recognize my clustered Key? Is my script wrong?

Your script only creates the table if it did not exist yet. Perhaps there still is an old version of the table without a clustered index? You can check with:
select * from sys.indexes where object_id = object_id('tblPasswordReset')
If the table exists without the clustered index, you can add one like:
alter table tblPasswordReset add constraint
PK_tblPasswordReset primary key clustered (PasswordResetID)
As far as I can see, your statement does conform to the Azure create table spec.
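Combining the check and the fix, a sketch that adds the clustered primary key only when the table has no clustered index yet (it assumes the table has no other primary key already):

```sql
-- Add the clustered PK only if the table currently has no clustered index.
IF NOT EXISTS (SELECT * FROM sys.indexes
               WHERE object_id = OBJECT_ID(N'dbo.tblPasswordReset')
                 AND type_desc = N'CLUSTERED')
BEGIN
    ALTER TABLE dbo.tblPasswordReset
        ADD CONSTRAINT PK_tblPasswordReset PRIMARY KEY CLUSTERED (PasswordResetID);
END
```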

Be careful if you're using SSIS. I ran into the same problem myself, but I was using SSIS instead of inserting the data manually. By default SSIS will drop and recreate the table, so even though I had it properly defined with a clustered index, my SSIS script failed. On the "Edit Mappings" step of the SSIS wizard you can define the table creation script manually. I just deleted the table generation script there and my import worked.
(I'd leave this as a comment but my post count is too anemic)

Increasing performance on a logging table in SQL Server 2005

I have a "history" table where I log each request into a Web Handler on our web site. Here is the table definition:
/****** Object: Table [dbo].[HistoryRequest] Script Date: 10/09/2009 17:18:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[HistoryRequest](
[HistoryRequestID] [uniqueidentifier] NOT NULL,
[CampaignID] [int] NOT NULL,
[UrlReferrer] [nvarchar](512) NOT NULL,
[UserAgent] [nvarchar](512) NOT NULL,
[UserHostAddress] [nvarchar](15) NOT NULL,
[UserHostName] [nvarchar](512) NOT NULL,
[HttpBrowserCapabilities] [xml] NOT NULL,
[Created] [datetime] NOT NULL,
[CreatedBy] [nvarchar](100) NOT NULL,
[Updated] [datetime] NULL,
[UpdatedBy] [nvarchar](100) NULL,
CONSTRAINT [PK_HistoryRequest] PRIMARY KEY CLUSTERED
(
[HistoryRequestID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[HistoryRequest] WITH CHECK ADD CONSTRAINT [FK_HistoryRequest_Campaign] FOREIGN KEY([CampaignID])
REFERENCES [dbo].[Campaign] ([CampaignId])
GO
ALTER TABLE [dbo].[HistoryRequest] CHECK CONSTRAINT [FK_HistoryRequest_Campaign]
GO
This statement takes 37 seconds to return 1050 rows:
SELECT *
FROM HistoryRequest AS hr
WHERE Created > '10/9/2009'
ORDER BY Created DESC
Does anyone have any suggestions for speeding this up? I have a clustered index on the PK and a regular index on the Created column. I tried a unique index and it barfed, complaining that there is a duplicate entry somewhere, which is to be expected.
Any insights are welcome!

You are requesting all columns (*) over a non-covering index (Created). On a large data set you will hit the index tipping point, where a clustered index scan is more efficient than a nonclustered index range seek plus bookmark lookups.
Do you need * always? If yes, and if the typical access pattern is like this, then you must organize the table accordingly and make Created the leftmost clustered key.
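A sketch of moving the clustered index to Created while keeping HistoryRequestID unique as a nonclustered primary key. It assumes no foreign keys reference PK_HistoryRequest, and the index name is hypothetical; your existing nonclustered index on Created would become redundant afterwards.

```sql
-- Drop the current clustered PK (fails if other tables reference it).
ALTER TABLE dbo.HistoryRequest DROP CONSTRAINT PK_HistoryRequest;
GO
-- Cluster on Created; it need not be unique, SQL Server adds a hidden uniquifier.
CREATE CLUSTERED INDEX CIX_HistoryRequest_Created
    ON dbo.HistoryRequest (Created);
GO
-- Keep HistoryRequestID unique, but as a nonclustered primary key.
ALTER TABLE dbo.HistoryRequest
    ADD CONSTRAINT PK_HistoryRequest PRIMARY KEY NONCLUSTERED (HistoryRequestID);
GO
```

Creating the clustered index before the nonclustered PK avoids building the PK on a heap and then rebuilding it.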
If not, then consider changing your query to a coverable query, e.g., select only HistoryRequestID and Created, which are covered by the nonclustered index. If more fields are needed, add them as included columns to the nonclustered index, but take into account that this adds extra storage space as well as I/O and log write time.
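A sketch of the included-columns approach, assuming the clustered PK on HistoryRequestID stays as it is. The index name is hypothetical, and CampaignID and UserHostAddress merely stand in for whichever extra columns the query really needs. The date is written as '20091009' (unambiguous yyyymmdd), assuming the question's '10/9/2009' means 9 October 2009.

```sql
-- Created is the key; the INCLUDE columns are stored only at the leaf level.
-- HistoryRequestID, the clustered key, is carried in every nonclustered index.
CREATE NONCLUSTERED INDEX IX_HistoryRequest_Created_Covering
    ON dbo.HistoryRequest (Created)
    INCLUDE (CampaignID, UserHostAddress);
GO
-- This query can be answered from the index alone, without key lookups.
SELECT HistoryRequestID, Created, CampaignID, UserHostAddress
FROM dbo.HistoryRequest
WHERE Created > '20091009'
ORDER BY Created DESC;
```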

Hey, I've seen some odd behavior when pulling XML columns in large sets. Try putting your index on Created back, then specify the columns in your select statement; but omit the XML. See how that affects the return time for results.
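For this table, that means listing every column except HttpBrowserCapabilities. A sketch, assuming the question's '10/9/2009' means 9 October 2009 (written here in the unambiguous yyyymmdd form):

```sql
-- All columns except the xml column HttpBrowserCapabilities.
SELECT HistoryRequestID, CampaignID, UrlReferrer, UserAgent, UserHostAddress,
       UserHostName, Created, CreatedBy, Updated, UpdatedBy
FROM dbo.HistoryRequest AS hr
WHERE Created > '20091009'
ORDER BY Created DESC;
```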

For a log table, you probably don't need a uniqueidentifier column. You're not likely to query on it either, so it's not a good candidate for a clustered index. Your sample query is on "Created", yet there's no index on it. If you query frequently on ranges of "Created" values then it would be a good candidate for clustering even though it's not necessarily unique.
On the other hand, the foreign key suggests frequent querying by Campaign, in which case clustering on that column could make sense. It would also probably do a better job of scattering the inserted keys in the indexes: both the surrogate key and the timestamp would add records in sequential order, which is net more work over time for insertions because the node sectors are filled less randomly.
If it's just a log table, why does it have update audit columns? It would normally be write-only.

Rebuild indexes. Use the WITH (NOLOCK) table hint after the table names where appropriate; this mainly applies if you want to run long(ish)-running queries against tables that are heavily used in a live environment (such as a log table). It means your query might miss some of the very latest records (or read uncommitted ones), but you also aren't holding a lock open on the table, which creates additional overhead.
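Both suggestions applied to the HistoryRequest table, as a sketch; the column list and the date literal (the question's 9 October 2009 in yyyymmdd form) are illustrative:

```sql
-- Rebuild every index on the table, which also updates their statistics.
ALTER INDEX ALL ON dbo.HistoryRequest REBUILD;
GO
-- Read without taking shared locks; the results may include uncommitted rows.
SELECT HistoryRequestID, Created
FROM dbo.HistoryRequest AS hr WITH (NOLOCK)
WHERE Created > '20091009'
ORDER BY Created DESC;
```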
