Fulltext search not yielding results - sql

I've started working with full-text indexing, and I've run into a problem that I can't find a solution for.
I've created a catalog with:
create FULLTEXT CATALOG [ClaimDbCatalog] AS DEFAULT
My table looks like this:
create table Claim(
Id int identity(1,1) not null ,
DateTimeCreated dateTime not null default getDate(),
ScriptNumber varchar(20) not null,
IsResolved bit not null default 0,
ResolvedDateTime datetime,
PracticeId int not null references dbo.Practice(Id),
CreatedById int not null references dbo.SystemUser(Id),
CONSTRAINT [PK_Claim_Id] PRIMARY KEY CLUSTERED ([Id] ASC));
I created the full-text index with:
create fulltext index on dbo.Claim(ScriptNumber) KEY INDEX [PK_Claim_Id] ON ClaimDbCatalog
Then I checked my test data (screenshot not included).
Finally, I attempted the full-text search with:
SELECT * from CONTAINSTABLE([dbo].[Claim], Scriptnumber, 'PR1234567890')
But this returns no rows. I've also tried searching for part of the text, with the same result.
What am I doing wrong?

The issue was that I was running my tests inside a transaction (BEGIN TRAN). As soon as I committed the data, the full-text index picked up the rows and the search worked.
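That fix comes down to two properties of SQL Server full-text indexing: the indexer only picks up committed rows, and population runs in the background after the commit. A minimal sketch of both, using a hypothetical dbo.ClaimDemo table and ClaimDemoCatalog catalog and assuming the Full-Text Search feature is installed; the polling loop is a simple approach, not the only way to wait for population.

```sql
-- Hypothetical demo table and catalog; assumes Full-Text Search is installed.
CREATE TABLE dbo.ClaimDemo (
    Id int IDENTITY(1,1) NOT NULL CONSTRAINT PK_ClaimDemo PRIMARY KEY CLUSTERED,
    ScriptNumber varchar(20) NOT NULL
);
GO
CREATE FULLTEXT CATALOG ClaimDemoCatalog;
GO
CREATE FULLTEXT INDEX ON dbo.ClaimDemo (ScriptNumber)
    KEY INDEX PK_ClaimDemo ON ClaimDemoCatalog
    WITH CHANGE_TRACKING = AUTO;  -- AUTO is the default
GO
BEGIN TRAN;
INSERT dbo.ClaimDemo (ScriptNumber) VALUES ('PR1234567890');
-- Still inside the transaction: the row is uncommitted, so it is not
-- in the full-text index yet and this should return no rows.
SELECT * FROM CONTAINSTABLE(dbo.ClaimDemo, ScriptNumber, '"PR1234567890"');
COMMIT;

-- Population runs in the background after the commit; poll until the
-- table has no pending full-text changes left.
WHILE CAST(OBJECTPROPERTYEX(OBJECT_ID(N'dbo.ClaimDemo'),
                            'TableFulltextPendingChanges') AS int) > 0
    WAITFOR DELAY '00:00:01';

-- Now the committed row should be found.
SELECT * FROM CONTAINSTABLE(dbo.ClaimDemo, ScriptNumber, '"PR1234567890"');

-- "Part of the text" only works as a prefix term; full-text search
-- has no suffix or infix matching.
SELECT * FROM CONTAINSTABLE(dbo.ClaimDemo, ScriptNumber, '"PR123*"');
```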

Related

SQL Full Text Index on multiple tables and columns

We have electronic forms that filers fill out online, and we store the data in a SQL Server database. We want to provide a search feature that lets us search inside each electronic filing for matching keywords. We don't need to know which word matched or where in the form it matched; we just need a ranked list of forms that match our keywords. We think SQL Server Full-Text Search is our best option, because we are already using SQL Server 2016. We have just started implementing a solution, but would like some guidance, since this is new territory for us.
Here is an example of how our tables are structured.
Filing is our top-level table for all electronic forms. We have subtables that are all related through FilingId. The Form Six Published Filings table has child tables to store information like Assets. The Form One Published Filings table has child tables to store information like Liabilities.
CREATE SCHEMA [Forms]
GO
CREATE SCHEMA [Form6]
GO
CREATE SCHEMA [Form1]
GO
CREATE TABLE [Forms].[Filing](
[FilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Forms_Filing_FilingId] PRIMARY KEY CLUSTERED,
[FilerUserId] [int] NOT NULL,
[FormYear] [int] NOT NULL,
[FormTypeId] [int] NOT NULL,
[FilingStatusId] [int] NOT NULL,
[FilerSignatureId] INT NULL,
[SubmissionDate] DATETIME2(0) NULL,
[IsScannedForm] BIT NOT NULL
CONSTRAINT [DF_Forms_Filing_IsScannedForm] DEFAULT(0)
)
GO
CREATE TABLE [Form6].[FormSixPublishedFilings](
[FormSixPublishedFilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form6_FormSixPublishedFilings_FormSixPublishedFilingId] PRIMARY KEY CLUSTERED,
[FilingId] INT NOT NULL,
CONSTRAINT [FK_Form6_FormSixPublishedFilings_Filings] FOREIGN KEY ([FilingId]) REFERENCES [Forms].[Filing] ([FilingId]),
[LastDateOfEmployment] DATE NULL,
[NetWorthDate] DATE NULL,
[NetWorth] MONEY NULL
)
GO
CREATE TABLE [Form6].[FormSixPublishedAssets](
[FormSixPublishedAssetId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form6_FormSixPublishedAssets_FormSixPublishedAssetId] PRIMARY KEY CLUSTERED,
[FormSixPublishedFilingId] INT NOT NULL,
CONSTRAINT [FK_Form6_FormSixPublishedAssets_FormSixPublishedFilings] FOREIGN KEY ([FormSixPublishedFilingId]) REFERENCES [Form6].[FormSixPublishedFilings] ([FormSixPublishedFilingId]),
[Name] VARCHAR(8000) NOT NULL,
[Amount] MONEY NOT NULL
)
GO
CREATE TABLE [Form1].[FormOnePublishedFilings]
(
[FormOnePublishedFilingId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form1_FormOnePublishedFilings_FormOnePublishedFilingId] PRIMARY KEY CLUSTERED,
[FilingId] INT NOT NULL,
CONSTRAINT [FK_Form1_FormOnePublishedFilings_Filing] FOREIGN KEY ([FilingId]) REFERENCES [Forms].[Filing] ([FilingId]),
[HasServedAsAgent] BIT NULL,
[LastDateOfEmployment] DATE NULL,
[AmendmentReason] VARCHAR(1024) NULL
)
GO
CREATE TABLE [Form1].[FormOnePublishedLiabilities]
(
[FormOnePublishedLiabilityId] INT NOT NULL IDENTITY(1,1)
CONSTRAINT [PK_Form1_FormOnePublishedLiabilities_FormOnePublishedLiabilityId] PRIMARY KEY CLUSTERED,
[FormOnePublishedFilingId] INT NOT NULL,
CONSTRAINT [FK_Form1_FormOnePublishedLiabilities_FormOnePublishedFilings] FOREIGN KEY ([FormOnePublishedFilingId]) REFERENCES [Form1].[FormOnePublishedFilings] ([FormOnePublishedFilingId]),
[NameOfCreditor] VARCHAR(8000) NOT NULL,
[AddressOfCreditor] VARCHAR(8000) NOT NULL
)
GO
To be able to search across all the forms, I think we need to create a view with just two columns: one for the FilingId, and an XML column holding an XML representation of all the data in each electronic filing. This XML column is what we will put the full-text index on. We plan to use FREETEXTTABLE, because we want ranked results and the search terms will be entered by end users.
create view [Forms].[ViewForFullTextSearching] with schemabinding as
select f.FilingId,
(select
filing.FilingId
,filing.FormYear
,filing.FormTypeId
,filing.FilingStatusId
,filing.FilerSignatureId
,filing.SubmissionDate
,filing.IsScannedForm
,form6Filing.LastDateOfEmployment 'Form6LastDateOfEmployment'
,form6Filing.NetWorthDate
,form6Filing.NetWorth
,form6Asset.Name
,form6Asset.Amount
,form1Filing.HasServedAsAgent
,form1Filing.LastDateOfEmployment 'Form1LastDateOfEmployment'
,form1Filing.AmendmentReason
,form1Liability.NameOfCreditor
,form1Liability.AddressOfCreditor
from Forms.Filing filing
left join Form6.FormSixPublishedFilings form6Filing on filing.FilingId = form6Filing.FilingId
left join Form6.FormSixPublishedAssets form6Asset on form6Filing.FormSixPublishedFilingId = form6Asset.FormSixPublishedFilingId
left join Form1.FormOnePublishedFilings form1Filing on filing.FilingId = form1Filing.FilingId
left join Form1.FormOnePublishedLiabilities form1Liability on form1Liability.FormOnePublishedFilingId = form1Filing.FormOnePublishedFilingId
where filing.FilingId = f.FilingId
for xml auto, type
) as 'Filing'
from Forms.Filing f
GO
create unique clustered index [IX_ViewForFullTextSearching_FilingId] ON [Forms].[ViewForFullTextSearching] ([FilingId])
GO
The SQL above doesn't work: creating the index fails with this error:
Cannot create an index on view "EthicsFdms.Forms.ViewForFullTextSearching" because it contains one or more subqueries. Consider changing the view to use only joins instead of subqueries. Alternatively, consider not indexing this view.
So, I’m a bit lost on how to create a view with XML to search over if I’m not allowed to create a materialized view that has subqueries.
The view's results look like this (screenshot not included).
Next, we set up our full-text catalog and index on this view:
CREATE FULLTEXT CATALOG [FtcFilings];
GO
CREATE FULLTEXT INDEX ON [Forms].[ViewForFullTextSearching] ([Filing] language 1033) key index [IX_ViewForFullTextSearching_FilingId] on [FtcFilings];
GO
Then I was hoping we could search the filings like so:
select ftt.*
from [Forms].[Filing] filing
inner join freetexttable(Forms.ViewForFullTextSearching, Filing, 'APPLE') as ftt on filing.FilingId = ftt.[KEY]
order by ftt.[RANK] desc
My questions are: is it possible to create a materialized view like this? It seems I can't, because materialized (indexed) views can't contain subqueries, and I'm not sure how to build the XML column without them.
If I can't create a materialized view, how else can I create a full-text index that searches across the electronic forms?
You can create an indexed view (which is a synchronous materialized view in SQL Server) only if there is a mathematical surjection and all scalar computations are deterministic and precise. Also, OUTER JOINs, subqueries, and set operators (UNION, EXCEPT, INTERSECT) cannot be used.
The best way to design your system is to do it the other way around:
Create a persisted computed column that uses CONCAT to combine all the columns you want to full-text index.
Create full-text indexes on the computed columns.
Create a UDF that searches the full-text index on each table, combines the results with UNION, and then aggregates them to compute the rank.
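A minimal sketch of the last two steps, assuming the tables from the question and its FtcFilings catalog. To keep it short, the Liabilities index simply lists both text columns (a full-text index can span several columns) instead of going through the persisted CONCAT column; the function name Forms.SearchFilings is hypothetical. It uses UNION ALL rather than UNION so that a filing matching in both tables keeps both ranks before they are summed; RANK values from different indexes aren't strictly comparable, so the sum is only a heuristic.

```sql
-- One full-text index per text-bearing child table (catalog from the question).
CREATE FULLTEXT INDEX ON Form1.FormOnePublishedLiabilities
    (NameOfCreditor LANGUAGE 1033, AddressOfCreditor LANGUAGE 1033)
    KEY INDEX PK_Form1_FormOnePublishedLiabilities_FormOnePublishedLiabilityId
    ON FtcFilings;
CREATE FULLTEXT INDEX ON Form6.FormSixPublishedAssets ([Name] LANGUAGE 1033)
    KEY INDEX PK_Form6_FormSixPublishedAssets_FormSixPublishedAssetId
    ON FtcFilings;
GO
-- Hypothetical inline TVF: search each index, map hits back to FilingId,
-- then aggregate the ranks per filing.
CREATE FUNCTION Forms.SearchFilings (@terms nvarchar(4000))
RETURNS TABLE
AS
RETURN
    SELECT hits.FilingId, SUM(hits.[RANK]) AS TotalRank
    FROM (
        SELECT f1.FilingId, ft.[RANK]
        FROM FREETEXTTABLE(Form1.FormOnePublishedLiabilities, *, @terms) AS ft
        JOIN Form1.FormOnePublishedLiabilities AS l
          ON l.FormOnePublishedLiabilityId = ft.[KEY]
        JOIN Form1.FormOnePublishedFilings AS f1
          ON f1.FormOnePublishedFilingId = l.FormOnePublishedFilingId
        UNION ALL
        SELECT f6.FilingId, ft.[RANK]
        FROM FREETEXTTABLE(Form6.FormSixPublishedAssets, [Name], @terms) AS ft
        JOIN Form6.FormSixPublishedAssets AS a
          ON a.FormSixPublishedAssetId = ft.[KEY]
        JOIN Form6.FormSixPublishedFilings AS f6
          ON f6.FormSixPublishedFilingId = a.FormSixPublishedFilingId
    ) AS hits
    GROUP BY hits.FilingId;
GO
-- Usage: ranked list of filings matching the user's terms.
SELECT s.FilingId, s.TotalRank
FROM Forms.SearchFilings(N'APPLE') AS s
ORDER BY s.TotalRank DESC;
```

Adding another form type means adding its full-text index and one more branch to the UNION ALL.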
Let me know if you need more help with this.
If the form data rarely changes once it's created, and it makes business sense to store the form1 and form6 data together with its filing, you could consider a document-oriented design.
SQL Server now has good JSON support. You can store all the filing and form information as JSON, run full-text searches against it, and create views to simulate your current design if needed.
Here is an example:
create schema tst;
GO
create table tst.form (
form_id int not null identity primary key
,content_json nvarchar(max)
)
-- inside content_json, the json may look like -
{
"filler_user_id": 111,
"filler_type_id": 1,
"is_scanned_form": 1,
"form1": [
{
"form1_filling_id": 101,
"has_served_as_agent":0,
"liabilities": [{"name_of_creditor": "abc"}]
}
]
}
I only modelled the form1-related info; you can add the form6-related info as needed.
You can then run full-text searches against the content_json column.
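A sketch of that full-text setup for tst.form, with hypothetical catalog and index names, assuming the Full-Text Search feature is installed. A full-text KEY INDEX has to be named and must be a unique, single-column, non-nullable index; the primary key above is unnamed, so the sketch adds a named unique index on form_id.

```sql
CREATE FULLTEXT CATALOG FtcForms;  -- hypothetical catalog name
GO
-- Named unique index to serve as the full-text key (hypothetical name).
CREATE UNIQUE INDEX UX_form_form_id ON tst.form (form_id);
CREATE FULLTEXT INDEX ON tst.form (content_json LANGUAGE 1033)
    KEY INDEX UX_form_form_id ON FtcForms;
GO
-- Ranked search over the whole JSON document.
SELECT f.form_id, ft.[RANK]
FROM FREETEXTTABLE(tst.form, content_json, N'abc') AS ft
JOIN tst.form AS f ON f.form_id = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```

Population is asynchronous, so rows inserted just before the query may take a moment to show up. Because the whole document is indexed, the JSON property names end up in the index along with the values.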
Then create views to simulate your current design if needed:
create or alter view tst.form_base WITH SCHEMABINDING as
select form_id
,convert(int, JSON_VALUE(content_json, '$.filler_user_id')) filler_user_id
,convert(int, JSON_VALUE(content_json, '$.filler_type_id')) filler_type_id
,convert(bit, JSON_VALUE(content_json, '$.is_scanned_form')) is_scanned_form
,JSON_QUERY(content_json, '$.form1') form1_json
from tst.form;
GO
create unique clustered index idx_form_base_form_id on tst.form_base(form_id);
-- add other indexes as needed
create index idx_form_base_filler_user_id on tst.form_base(filler_user_id);
GO
create or alter view tst.form1 as
select form_id
,a.form1_filling_id
,a.has_served_as_agent
,a.liabilities liabilities_json
from tst.form_base cross apply OPENJSON(form1_json) WITH (
form1_filling_id int '$.form1_filling_id',
has_served_as_agent int '$.has_served_as_agent',
liabilities nvarchar(max) '$.liabilities' as json) a;
GO
create or alter view tst.form1_liabilities as
select form_id
,form1_filling_id
,a.name_of_creditor
from tst.form1 cross apply OPENJSON(liabilities_json) WITH (
name_of_creditor nvarchar(max) '$.name_of_creditor') a;
GO
Then create some test data:
insert into tst.form (content_json) values ('{
"filler_user_id": 111,
"filler_type_id": 1,
"is_scanned_form": 1,
"form1": [
{
"form1_filling_id": 101,
"has_served_as_agent":0,
"liabilities": [{"name_of_creditor": "abc"}]
}
]
}');
insert into tst.form (content_json) values ('{
"filler_user_id": 222,
"filler_type_id": 1,
"is_scanned_form": 0,
"form1": [
{
"form1_filling_id": 102,
"has_served_as_agent":1,
"liabilities": [{"name_of_creditor": "def"}]
}
]
}');
Try it:
select *
from tst.form1_liabilities

Suggested indexing for a table with 50 million rows that is queried by its CREATED_DATE and USER_TYPE columns

Table Users:
ID PK INT
USER_TYPE VARCHAR(50) NOT NULL
CREATED_DATE DATETIME2(7) NOT NULL
I have this table with 50 million rows, and it is queried using the following WHERE clause:
WHERE
u.USER_TYPE= 'manager'
AND u.CREATED_DATE >= @StartDate
AND u.CREATED_DATE < @EndDate
What would be a good starting point for an index on this table to optimize for the above query where clause?
For that query, the index you want is a composite index on two columns: (user_type, created_date). The order matters: you want user_type first because it is compared for equality, so the engine can seek to 'manager' and then read one contiguous range of created_date values.
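A sketch of that index against the table described in the question. The guarded CREATE TABLE is only a stand-in so the snippet runs on its own; the index name and date values are hypothetical.

```sql
-- Stand-in for the question's table, created only if it doesn't exist.
IF OBJECT_ID(N'dbo.Users', N'U') IS NULL
    CREATE TABLE dbo.Users (
        ID int NOT NULL CONSTRAINT PK_Users PRIMARY KEY,
        USER_TYPE varchar(50) NOT NULL,
        CREATED_DATE datetime2(7) NOT NULL
    );
GO
-- Equality column first, range column second.
CREATE NONCLUSTERED INDEX IX_Users_UserType_CreatedDate
    ON dbo.Users (USER_TYPE, CREATED_DATE);
GO
DECLARE @StartDate datetime2(7) = '2020-01-01',  -- hypothetical range
        @EndDate   datetime2(7) = '2020-02-01';
SELECT u.ID, u.CREATED_DATE
FROM dbo.Users AS u
WHERE u.USER_TYPE = 'manager'
  AND u.CREATED_DATE >= @StartDate
  AND u.CREATED_DATE < @EndDate;
```

This SELECT is covered by the index, because the clustered key (ID) is carried in every nonclustered index; selecting other columns would need key lookups or an INCLUDE list.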
You'll be well served by creating a table with user types having an arbitrary INT ID and referring to the manager type by ID, instead of having the manager type directly in the users table. This will narrow the table data as well as any index referring to the user type.
CREATE TABLE user_type (
id INT NOT NULL IDENTITY(1,1),
description NVARCHAR(128) NOT NULL,
CONSTRAINT pk_user_type PRIMARY KEY CLUSTERED(id)
);
CREATE TABLE users (
id INT NOT NULL IDENTITY(1,1),
user_type_id INT NOT NULL,
created_date DATETIME2(7) NOT NULL,
CONSTRAINT pk_users PRIMARY KEY CLUSTERED(id),
CONSTRAINT fk_users_user_type FOREIGN KEY(user_type_id) REFERENCES user_type(id)
);
CREATE NONCLUSTERED INDEX
ix_users_type_created
ON
users (
user_type_id,
created_date
);
Of course, you would then query by the user_type ID rather than by the type's text.
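A sketch of that lookup, assuming the users and user_type tables from this answer and hypothetical date values. It resolves the manager ID once, so the main predicate is a plain equality on user_type_id, the leading column of ix_users_type_created.

```sql
DECLARE @StartDate datetime2(7) = '2020-01-01',  -- hypothetical range
        @EndDate   datetime2(7) = '2020-02-01';

-- Resolve the type's ID once (assumes description values are unique).
DECLARE @ManagerTypeId int =
    (SELECT id FROM user_type WHERE description = N'manager');

-- Equality on user_type_id plus a range on created_date matches the
-- (user_type_id, created_date) index, so the optimizer can use one seek.
SELECT u.id, u.created_date
FROM users AS u
WHERE u.user_type_id = @ManagerTypeId
  AND u.created_date >= @StartDate
  AND u.created_date < @EndDate;
```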
For any query, run it in SSMS with "Include Actual Execution Plan" turned on. SSMS will suggest an index if the optimizer thinks a suitable one doesn't exist.
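The same missing-index feature behind that SSMS hint also records its suggestions in dynamic management views, which lets you review them across many queries at once. A sketch for the current database; the ordering by seeks times impact is just one reasonable heuristic, and suggestions should be reviewed rather than created blindly.

```sql
-- Missing-index suggestions recorded since the last restart.
SELECT
    d.[statement]     AS table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.user_seeks,
    s.avg_user_impact -- estimated % reduction in query cost
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
  ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
  ON s.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY s.user_seeks * s.avg_user_impact DESC;
```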

I have a GUID Clustered primary key - Is there a way I can optimize or unfragment a table that might be fragmented?

Here's the code I have. The table actually has 20 more columns but I am just showing the first few:
CREATE TABLE [dbo].[Phrase]
(
[PhraseId] [uniqueidentifier] NOT NULL,
[PhraseNum] [int] NULL,
[English] [nvarchar](250) NOT NULL,
PRIMARY KEY CLUSTERED ([PhraseId] ASC)
) ON [PRIMARY]
GO
From what I remember, I read (Fragmentation and GUID clustered key) that it was good to have a GUID for the primary key, but now it's been suggested that it's not a good idea, because data has to be reordered on each insert, causing fragmentation.
Can anyone comment on this? Now that my table has already been created, is there a way to defragment it? Also, how can I stop the problem from getting worse? Can I modify the existing table to add NEWSEQUENTIALID?
That's true: NEWSEQUENTIALID helps fill data and index pages completely.
But a uniqueidentifier is 16 bytes, four times the size of an int (4 bytes), so more pages are needed than with an int key.
declare @t table(col int
,col2 uniqueidentifier DEFAULT NEWSEQUENTIALID())
insert into @t (col) values(1),(2)
select DATALENGTH(col2),DATALENGTH(col) from @t -- returns 16 and 4
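Suppose x data pages are needed to hold 100 rows with an int key. If the rows held nothing but the key, a NEWSEQUENTIALID key would need about 4x data pages for the same 100 rows. Real rows have other columns, so the gap is smaller, but every nonclustered index also carries the clustering key, so the extra 12 bytes per row show up there too.
Therefore queries will read more pages to fetch the same number of records.
So, if you can alter the table, you can add an int identity column and make it the clustered primary key. You can keep or drop the uniqueidentifier column, depending on your requirements.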
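The answers here focus on the key type; the question's other two parts (defragmenting the existing table and switching to NEWSEQUENTIALID) can be sketched as follows. The guarded CREATE TABLE is a trimmed stand-in for dbo.Phrase so the snippet runs on its own, and the constraint name DF_Phrase_PhraseId is hypothetical.

```sql
-- Trimmed stand-in for the question's table, created only if missing.
IF OBJECT_ID(N'dbo.Phrase', N'U') IS NULL
    CREATE TABLE dbo.Phrase (
        PhraseId uniqueidentifier NOT NULL PRIMARY KEY CLUSTERED,
        PhraseNum int NULL,
        English nvarchar(250) NOT NULL
    );
GO
-- 1. Measure fragmentation of the table's indexes.
SELECT i.name, ps.index_type_desc, ps.avg_fragmentation_in_percent, ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Phrase'),
                                    NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

-- 2. Rebuild every index, including the clustered one, to remove
--    the fragmentation that already exists.
ALTER INDEX ALL ON dbo.Phrase REBUILD;

-- 3. Let the server generate increasing GUIDs for new rows.
IF OBJECT_ID(N'DF_Phrase_PhraseId', N'D') IS NULL
    ALTER TABLE dbo.Phrase
        ADD CONSTRAINT DF_Phrase_PhraseId DEFAULT NEWSEQUENTIALID() FOR PhraseId;
```

The default only helps if INSERT statements leave PhraseId out; rows that keep supplying NEWID() or application-generated GUIDs will still land in random places in the clustered index.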
This looks like a duplicate of:
INT vs Unique-Identifier for ID field in database
But here's a rehash for your issue:
Rather than a GUID, int or bigint (depending on how many rows the table will hold) would be a better choice, from both a storage and an optimization standpoint. You might also consider defining the column as "int identity not null" to make populating it easier.
GUIDs have a considerable storage impact, due to their length.
CREATE TABLE [dbo].[Phrase]
(
[PhraseId] [int] identity NOT NULL
CONSTRAINT [PK_Phrase_PhraseId] PRIMARY KEY,
[PhraseNum] [int] NULL,
[English] [nvarchar](250) NOT NULL,
....
) ON [PRIMARY]
GO

In H2 Database, add index while table creation in single query

I am trying to create a table with several indexes in a single statement, but H2 gives an error. For example:
create table tbl_Cust
(
id int primary key auto_increment not null,
fid int,
c_name varchar(50),
INDEX (fid)
);
This gives the error:
Unknown data type: "("; SQL statement:
[Error Code: 50004]
[SQL State: HY004]
Because of this, I have to run two separate statements: the first creates the table, and the second adds the index:
create INDEX c_fid on tbl_Cust(fid);
Is there something wrong with my query, or does H2 simply not support creating a table with an index in a single statement?
Interesting question. The solution is even more interesting, as it involves MySQL compatibility mode.
It's actually possible to run exactly the command you wrote, without any modification, provided you add MySQL mode to your JDBC URL.
For example: jdbc:h2:mem:;mode=mysql
SQL remains:
create table tbl_Cust
(
id int primary key auto_increment not null,
fid int,
c_name varchar(50),
INDEX (fid)
);
Update count: 0
(15 ms)
Too bad I did not see this question earlier... Hopefully the solution might become handy one day to someone :-)
I was able to resolve the problem. According to
http://www.h2database.com/html/grammar.html#create_index
I modified the query to create the index in a separate CREATE INDEX statement. It works fine on my H2 server.
CREATE TABLE subscription_validator (
application_id int(11) NOT NULL,
api_id int(11) NOT NULL,
validator_id int(11) NOT NULL,
PRIMARY KEY (application_id,api_id),
CONSTRAINT subscription_validator_ibfk_1 FOREIGN KEY (validator_id) REFERENCES validator (id) ON UPDATE CASCADE
);
CREATE INDEX validator_id ON subscription_validator(validator_id);

Ensuring uniqueness of multiple large URL fields in MS SQL

I have a table with the following definition:
CREATE TABLE url_tracker (
id int not null identity(1, 1),
active bit not null,
install_date int not null,
partner_url nvarchar(512) not null,
local_url nvarchar(512) not null,
public_url nvarchar(512) not null,
primary key(id)
);
And I have a requirement that these three URLs always be unique - any individual URL can appear many times, but the combination of the three must be unique (for a given day).
Initially I thought I could do this:
CREATE UNIQUE INDEX uniques ON url_tracker
(install_date, partner_url, local_url, public_url);
However this gives me back the warning:
Warning! The maximum key length is 900 bytes. The index 'uniques' has maximum
length of 3076 bytes. For some combination of large values, the insert/update
operation will fail.
Digging around I learned about the INCLUDE argument to CREATE INDEX, but according to this question converting the command to use INCLUDE will not enforce uniqueness on the URLs.
CREATE UNIQUE INDEX uniques ON url_tracker (install_date)
INCLUDE (partner_url, local_url, public_url);
How can I enforce uniqueness on several relatively large nvarchar fields?
Resolution
Based on the comments, the answers, and more research, I've concluded I can do this:
CREATE TABLE url_tracker (
id int not null identity(1, 1),
active bit not null,
install_date int not null,
partner_url nvarchar(512) not null,
local_url nvarchar(512) not null,
public_url nvarchar(512) not null,
uniquehash AS HashBytes('SHA1',partner_url+local_url+public_url) PERSISTED,
primary key(id)
);
CREATE UNIQUE INDEX uniques ON url_tracker (install_date,uniquehash);
Thoughts?
I would make a computed column with the hash of the URLs, then make a unique index/constraint on that. Consider making the hash a persisted computed column. It shouldn't have to be recalculated after insertion.
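A sketch of that approach with two adjustments to the question's version, under a hypothetical table name. First, a separator between the URLs, so that ('ab', 'c') and ('a', 'bc') don't hash to the same value; this assumes '|' doesn't appear in the URLs. Second, a CAST to binary(32), because HASHBYTES returns varbinary(8000), and an uncast column would trigger the same 900-byte key-length warning. It also uses SHA2_256 (SQL Server 2012 or later), since SHA1 in HASHBYTES is deprecated from SQL Server 2016.

```sql
CREATE TABLE dbo.url_tracker_demo (  -- hypothetical name
    id int NOT NULL IDENTITY(1,1) PRIMARY KEY,
    active bit NOT NULL,
    install_date int NOT NULL,
    partner_url nvarchar(512) NOT NULL,
    local_url nvarchar(512) NOT NULL,
    public_url nvarchar(512) NOT NULL,
    -- Delimited input avoids concatenation ambiguity; the fixed-size
    -- binary(32) keeps the index key well under 900 bytes.
    url_hash AS CAST(HASHBYTES('SHA2_256',
        partner_url + N'|' + local_url + N'|' + public_url) AS binary(32)) PERSISTED
);
CREATE UNIQUE INDEX ux_url_tracker_demo
    ON dbo.url_tracker_demo (install_date, url_hash);
```

A duplicate combination for the same install_date is then rejected by the unique index (error 2601).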
Following the ideas from the conversation in the comments: assuming you can change the data type of the URL to VARCHAR(900) (or NVARCHAR(450) if you really think you need Unicode URLs) and can live with that limit on URL length, this solution could work. It also assumes SQL Server 2008 or later. Please always specify which version you're working with; the sql-server tag alone is not specific enough, since solutions can vary greatly between versions.
Setup:
USE tempdb;
GO
CREATE TABLE dbo.urls
(
id INT IDENTITY(1,1) PRIMARY KEY,
url VARCHAR(900) NOT NULL UNIQUE
);
CREATE TABLE dbo.url_tracker
(
id INT IDENTITY(1,1) PRIMARY KEY,
active BIT NOT NULL DEFAULT 1,
install_date DATE NOT NULL DEFAULT CURRENT_TIMESTAMP,
partner_url_id INT NOT NULL REFERENCES dbo.urls(id),
local_url_id INT NOT NULL REFERENCES dbo.urls(id),
public_url_id INT NOT NULL REFERENCES dbo.urls(id),
CONSTRAINT unique_urls UNIQUE
(
install_date,partner_url_id, local_url_id, public_url_id
)
);
Insert some URLs:
INSERT dbo.urls(url) VALUES
('http://msn.com/'),
('http://aol.com/'),
('http://yahoo.com/'),
('http://google.com/'),
('http://gmail.com/'),
('http://stackoverflow.com/');
Now let's insert some data:
-- succeeds:
INSERT dbo.url_tracker(partner_url_id, local_url_id, public_url_id)
VALUES (1,2,3), (2,3,4), (3,4,5), (4,5,6);
-- fails:
INSERT dbo.url_tracker(partner_url_id, local_url_id, public_url_id)
VALUES(1,2,3);
GO
/*
Msg 2627, Level 14, State 1, Line 3
Violation of UNIQUE KEY constraint 'unique_urls'. Cannot insert duplicate key
in object 'dbo.url_tracker'. The duplicate key value is (2011-09-15, 1, 2, 3).
The statement has been terminated.
*/
-- succeeds, since it's for a different day:
INSERT dbo.url_tracker(install_date, partner_url_id, local_url_id, public_url_id)
VALUES('2011-09-01',1,2,3);
Cleanup:
DROP TABLE dbo.url_tracker, dbo.urls;
Now, if 900 bytes is not enough, you could change the URL table slightly:
CREATE TABLE dbo.urls
(
id INT IDENTITY(1,1) PRIMARY KEY,
url VARCHAR(2048) NOT NULL,
url_hash AS CONVERT(VARBINARY(32), HASHBYTES('SHA1', url)) PERSISTED,
CONSTRAINT unique_url UNIQUE(url_hash)
);
The rest doesn't have to change. And if you try to insert the same URL twice, you get a similar violation, e.g.
INSERT dbo.urls(url) SELECT 'http://www.google.com/';
GO
INSERT dbo.urls(url) SELECT 'http://www.google.com/';
GO
/*
Msg 2627, Level 14, State 1, Line 1
Violation of UNIQUE KEY constraint 'unique_url'. Cannot insert duplicate key
in object 'dbo.urls'. The duplicate key value is
(0xd111175e022c19f447895ad6b72ff259552d1b38).
The statement has been terminated.
*/