We have electronic forms that filers fill out online and we store the data in an SQL Server. We want to provide a search feature that allows us to search inside each electronic filing for matching keywords. We don’t need to know what word matched or where in the form it matches, we just need a ranked list of forms that match our keywords. We think SQL Full-Text Searching would be our best option because we are already using SQL server 2016. We just started with implementing a solution but would like some guidance since this is new territory for us.
Here is an example of how our tables are structured.
Filing is our top-level table for all electronic forms. We have sub tables that are all related through the FilingId. The Form Six Published Filings table has child tables to store information like Assets. The Form One Published Filings table has child tables to store information like Liabilities.
CREATE SCHEMA [Forms]
GO
CREATE SCHEMA [Form6]
GO
CREATE SCHEMA [Form1]
GO
CREATE TABLE [Forms].[Filing](
[FilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Forms_Filing_FilingId] PRIMARY KEY CLUSTERED,
[FilerUserId] [int] NOT NULL,
[FormYear] [int] NOT NULL,
[FormTypeId] [int] NOT NULL,
[FilingStatusId] [int] NOT NULL,
[FilerSignatureId] INT NULL,
[SubmissionDate] DATETIME2(0) NULL,
[IsScannedForm] BIT NOT NULL
CONSTRAINT [DF_Forms_Filing_IsScannedForm] DEFAULT(0)
)
GO
CREATE TABLE [Form6].[FormSixPublishedFilings](
[FormSixPublishedFilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form6_FormSixPublishedFilings_FormSixPublishedFilingId] PRIMARY KEY CLUSTERED,
[FilingId] INT NOT NULL
CONSTRAINT [FK_Form6_FormSixPublishedFilings_Filings] FOREIGN KEY ([FilingId]) REFERENCES [Forms].[Filing] ([FilingId]),
[LastDateOfEmployment] DATE NULL,
[NetWorthDate] DATE NULL,
[NetWorth] MONEY NULL
)
GO
CREATE TABLE [Form6].[FormSixPublishedAssets](
[FormSixPublishedAssetId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form6_FormSixPublishedAssets_FormSixPublishedAssetId] PRIMARY KEY CLUSTERED,
[FormSixPublishedFilingId] INT NOT NULL
CONSTRAINT [FK_Form6_FormSixPublishedAssets_FormSixPublishedFilings] FOREIGN KEY ([FormSixPublishedFilingId]) REFERENCES [Form6].[FormSixPublishedFilings] ([FormSixPublishedFilingId]),
[Name] VARCHAR(8000) NOT NULL,
[Amount] MONEY NOT NULL
)
GO
CREATE TABLE [Form1].[FormOnePublishedFilings]
(
[FormOnePublishedFilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form1_FormOnePublishedFilings_FormOnePublishedFilingId] PRIMARY KEY CLUSTERED,
[FilingId] INT NOT NULL,
CONSTRAINT [FK_Form1_FormOnePublishedFilings_Filing] FOREIGN KEY ([FilingId]) REFERENCES [Forms].[Filing] ([FilingId]),
[HasServedAsAgent] BIT NULL,
[LastDateOfEmployment] DATE NULL,
[AmendmentReason] VARCHAR(1024) NULL,
)
GO
CREATE TABLE [Form1].[FormOnePublishedLiabilities]
(
[FormOnePublishedLiabilityId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form1_FormOnePublishedLiabilities_FormOnePublishedLiabilityId] PRIMARY KEY CLUSTERED,
[FormOnePublishedFilingId] INT NOT NULL,
CONSTRAINT [FK_Form1_FormOnePublishedLiabilities_FormOnePublishedFilings] FOREIGN KEY ([FormOnePublishedFilingId]) REFERENCES [Form1].[FormOnePublishedFilings] ([FormOnePublishedFilingId]),
[NameOfCreditor] VARCHAR(8000) NOT NULL,
[AddressOfCreditor] VARCHAR(8000) NOT NULL
)
GO
In order to be able to search through all the forms, I think we need to create a view that just has two columns. One for the FilingId and the other column would be an XML data type which would be an XML representation of all the data in each electronic filing. This XML column is what we will be using to set up our full-text index. I think we will be using the FreeTextTable search because we would like to have the results ranked and also the search terms will be entered by end-users.
create view ViewForFullTextSearching with schemabinding as
select f.FilingId,
(select
filing.FilingId
,filing.FormYear
,filing.FormTypeId
,filing.FilingStatusId
,filing.FilerSignatureId
,filing.SubmissionDate
,filing.IsScannedForm
,form6Filing.LastDateOfEmployment 'Form6LastDateOfEmployment'
,form6Filing.NetWorthDate
,form6Filing.NetWorth
,form6Asset.Name
,form6Asset.Amount
,form1Filing.HasServedAsAgent
,form1Filing.LastDateOfEmployment 'Form1LastDateOfEmployment'
,form1Filing.AmendmentReason
,form1Liability.NameOfCreditor
,form1Liability.AddressOfCreditor
from Forms.Filing filing
left join Form6.FormSixPublishedFilings form6Filing on filing.FilingId = form6Filing.FilingId
left join Form6.FormSixPublishedAssets form6Asset on form6Filing.FormSixPublishedFilingId = form6Asset.FormSixPublishedFilingId
left join Form1.FormOnePublishedFilings form1Filing on filing.FilingId = form1Filing.FilingId
left join Form1.FormOnePublishedLiabilities form1Liability on form1Liability.FormOnePublishedFilingId = form1Filing.FormOnePublishedFilingId
where filing.FilingId = f.FilingId
for xml auto, type
) as 'Filing'
from Forms.Filing f
GO
create unique clustered index [IX_ViewForFullTextSearching_FilingId] ON [Forms].[ViewForFullTextSearching] ([FilingId])
GO
The above SQL does not actually work because I get this error.
Cannot create an index on view "EthicsFdms.Forms.ViewForFullTextSearching" because it contains one or more subqueries. Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.
So, I’m a bit lost on how to create a view with XML to search over if I’m not allowed to create a materialized view that has subqueries.
This view results look like this:
Next we setup our Full Text Catalog and Index on this view:
CREATE FULLTEXT CATALOG [FtcFilings];
GO
CREATE FULLTEXT INDEX ON [Forms].[ViewForFullTextSearching] ([Filing] language 1033) key index [IX_ViewForFullTextSearching_FilingId] on [FtcFilings];
GO
Then I was hoping we could search the filings like so:
select ftt.*
from [Forms].[Filing] filing
inner join freetextable(Forms.ViewForFullTextSearching, Filing, 'APPLE') as ftt on filing.FilingId = ftt.[KEY]
order by rank desc
Right now my challenges are, is it possible to create a materialized view like this? Seems like I can’t because materialized views can’t have subqueries. I’m not sure how to build the XML field w/out subqueries.
If I’m not able to create a materialized view then how else can I create a full-text index that can search electronic Forms?
You cannot create an indexed view (which is a synchronous materialized view in SQL Server) only if there is a mathematical surjection and all scalar computation is deterministic and precise. By the way OUTER JOIN, SUBQUERIES and set operators (UNION, EXCEPT, INTERSECT) cannot be used...
The best ways to design your systeme is to do it in the reverse way...
Create a persistent computed column using the CONCAT function of all the columns you want to fulltext index.
Create fulltext indexes on the computed columns
Create an UDF that search in the fulltext index on each tables and concatenate the result by UNION, and then aggregate results to compute the rank.
Let me know if you want more assistance to do so...
If these form filling data are seldom changed once created and it makes sense in business to store data of form1 and form6 together with its Filling, you may consider to go with document oriented design.
SQL server has good json support now. You can save all the Filling and form info in json, against which you can do full text search, and create views to simulate your current design if needed.
Here is an example -
create table tst.form (
form_id int not null identity primary key
,content_json nvarchar(max)
)
-- inside content_json, the json may look like -
{
"filler_user_id": 111,
"filler_type_id": 1,
"is_scanned_form": 1,
"form1": [
{
"form1_filling_id": 101,
"has_served_as_agent":0,
"liabilities": [{"name_of_creditor": "abc"}]
}
]
}
I only modelled form1 related info. You can add form6 related info as needed.
Then you can do full text search against this content_json column.
Then create views to simulate your current design if needed -
create or alter view tst.form_base WITH SCHEMABINDING as
select form_id
,convert(int, JSON_VALUE(content_json, '$.filler_user_id')) filler_user_id
,convert(int, JSON_VALUE(content_json, '$.filler_type_id')) filler_type_id
,convert(bit, JSON_VALUE(content_json, '$.is_scanned_form')) is_scanned_form
,JSON_QUERY(content_json, '$.form1') form1_json
from tst.form
create unique clustered index idx_form_base_form_id on tst.form_base(form_id);
-- you can create index as needed
create index idx_form_base_filler_user_id on tst.form_base(filler_user_id);
create or alter view tst.form1 as
select form_id
,a.form1_filling_id
,a.has_served_as_agent
,a.liabilities liabilities_json
from tst.form_base cross apply OPENJSON(form1_json) WITH (
form1_filling_id int '$.form1_filling_id',
has_served_as_agent int '$.has_served_as_agent',
liabilities nvarchar(max) '$.liabilities' as json) a
create or alter view tst.form1_liabilities as
select form_id
,form1_filling_id
,a.name_of_creditor
from tst.form1 cross apply OPENJSON(liabilities_json) WITH (
name_of_creditor nvarchar(max) '$.name_of_creditor') a
Then create some test data -
insert into tst.form (content_json) values ('{
"filler_user_id": 111,
"filler_type_id": 1,
"is_scanned_form": 1,
"form1": [
{
"form1_filling_id": 101,
"has_served_as_agent":0,
"liabilities": [{"name_of_creditor": "abc"}]
}
]
}');
insert into tst.form (content_json) values ('{
"filler_user_id": 222,
"filler_type_id": 1,
"is_scanned_form": 0,
"form1": [
{
"form1_filling_id": 102,
"has_served_as_agent":1,
"liabilities": [{"name_of_creditor": "def"}]
}
]
}');
Try it -
select *
from tst.form1_liabilities
I've started working with Fulltext indexing, and I've ran into a problem that I can't find a solution for.
Ive created a catalog with
create FULLTEXT CATALOG [ClaimDbCatalog] AS DEFAULT
Then my table looks like ...
create table Claim(
Id int identity(1,1) not null ,
DateTimeCreated dateTime not null default getDate(),
ScriptNumber varchar(20) not null,
IsResolved bit not null default 0,
ResolvedDateTime datetime,
PracticeId int not null references dbo.Practice(Id),
CreatedById int not null references dbo.SystemUser(Id)
CONSTRAINT [PK_Claim_Id] PRIMARY KEY CLUSTERED ([Id] ASC));
Created my index with :
create fulltext index idxClaimonIdFulltext Claim(ScriptNumber) KEY INDEX [PK_Claim_Id] ON ClaimDbCatalog
Then, looking at my test data ..
Finally, I attempt the fulltext search with
SELECT * from CONTAINSTABLE([dbo].[Claim], Scriptnumber, 'PR1234567890')
But this yields no results. I've tried using part of the text, but still no results.
What am I doing wrong?
The issue was due to me doing my tests in a TRAN. The moment I committed the data, the fulltext kicked in and worked.
I am creating an index on a table and I want to include a covering column: messageText nvarchar(1024)
After insertion, the messageText is never updated, so it's an ideal candidate to include in a covering index to speed up lookups.
But what happens if I update other columns in same index?
Will the entire row in the index need reallocating or will just that data from the updated column be updated in the index?
Simple Example
Imaging the following table:
CREATE TABLE [Messages](
[messageID] [int] IDENTITY(1,1) NOT NULL,
[mbrIDTo] [int] NOT NULL,
[isRead] [bit] NOT NULL,
[messageText] [nvarchar](1024) NOT NULL
)
And the following Index:
CREATE NONCLUSTERED INDEX [IX_messages] ON [Messages] ( [mbrIDTo] ASC, [messageID] ASC )
INCLUDE ( [isRead], [messageText])
When we update the table:
UPDATE Messages
SET isRead = 1
WHERE (mbrIDTo = 6546)
The query plan shows that the index IX_messages is utilized and will also be updated becuase the column isRead is part of the index.
Therefore does including large text fields (such as messageText in the above) as part of a covering column in an index, impact performance when other values, in that same index, are updated?
When a row is updated in SQL Server, the entire row is deleted and a new row with the updated records is inserted. Therefore, even if the messageText field is not changing, it will still have to be re-written to the disk.
Here is a blog post from Paul Randall with a good example: http://www.sqlskills.com/blogs/paul/do-changes-to-index-keys-really-do-in-place-updates/
I get the following error when I try to insert a row into a SQL Azure table.
Tables without a clustered index are not supported in this version of
SQL Server. Please create a clustered index and try again.
My problem is I do have a clustered index on that table. I used SQL Azure MW to generate the Azure SQL Script.
Here's what I'm using:
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tblPasswordReset]') AND type in (N'U'))
DROP TABLE [dbo].[tblPasswordReset]
GO
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
IF NOT EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[tblPasswordReset]') AND type in (N'U'))
BEGIN
CREATE TABLE [dbo].[tblPasswordReset](
[PasswordResetID] [int] IDENTITY(1,1) NOT NULL,
[PasswordResetGUID] [uniqueidentifier] NULL,
[MemberID] [int] NULL,
[RequestDate] [datetime] NULL,
CONSTRAINT [PK_tblPasswordReset] PRIMARY KEY CLUSTERED
(
[PasswordResetID] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF)
)
END
GO
Why doesn't SQL Azure recognize my clustered Key? Is my script wrong?
Your script only creates the table if it did not exist yet. Perhaps there still is an old version of the table without a clustered index? You can check with:
select * from sys.indexes where object_id = object_id('tblPasswordReset')
If the table exists without the clustered index, you can add one like:
alter table tblPasswordReset add constraint
PK_tblPasswordReset primary key clustered
As far as I can see, your statement does conform to the Azure create table spec.
Be careful if you're using SSIS. I ran into this same problem, myself, but was using SSIS instead of manually inserting the data. By default SSIS will drop and recreate the table, so even though I had it properly defined with a clustered index, my SSIS script failed. On the "Edit Mappings" step in the SSIS wizard you can manually define the table creation script. I just deleted the table gen script there and my import worked.
(I'd leave this as a comment but my post count is too anemic)
I am having troubles creating a fulltext index on a view in SQL Server 2005. Reviewing the documentation I have not found the problem. The error message I receive is: "'Id' is not a valid index to enforce a full-text search key. A full-text search key must be a unique, non-nullable, single-column index which is not offline, is not defined on a non-deterministic or imprecise nonpersisted computed column, and has maximum size of 900 bytes. Choose another index for the full-text key."
I have been able to verify every requirement in the errorstring except the "offline" requirement, where I don't really know what that means. I'm pretty darn sure its not offline though.
I have the script to create the target table, view, and index below. I do not really need a view in the sample below, it is simplified as I try to isolate the issue.
DROP VIEW [dbo].[ProductSearchView]
DROP TABLE [dbo].[Product2]
GO
SET NUMERIC_ROUNDABORT OFF;
SET ANSI_PADDING, ANSI_WARNINGS, CONCAT_NULL_YIELDS_NULL, ARITHABORT,
QUOTED_IDENTIFIER, ANSI_NULLS ON;
GO
CREATE TABLE [dbo].[Product2](
[Id] [bigint] NOT NULL,
[Description] [nvarchar](max) NULL,
CONSTRAINT [PK_Product2] PRIMARY KEY CLUSTERED
(
[Id] ASC
)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE VIEW [dbo].[ProductSearchView] WITH SCHEMABINDING
AS
SELECT P.Id AS Id,
P.Description AS Field
FROM [dbo].Product2 AS P
GO
-- this index may be overkill given the PK is set...
CREATE UNIQUE CLUSTERED INDEX PK_ProductSearchView ON [dbo].[ProductSearchView](Id)
GO
-- This is the command that fails
CREATE FULLTEXT INDEX ON [dbo].[ProductSearchView](Id, Field)
KEY INDEX Id
ON FullText WITH CHANGE_TRACKING AUTO;
GO
You need to specify the name of the index instead of the column name when creating the fulltext index:
CREATE FULLTEXT INDEX ON [dbo].[ProductSearchView](Id, Field)
KEY INDEX PK_ProductSearchView
ON FullText WITH CHANGE_TRACKING AUTO;
GO
This will remedy the error you are getting, but it will give you another error because you are trying to include a non-character based column in your text search. You may want to choose another indexed character column to use in your full text catalog instead.