Query SQL Server 2005 Full Text Search noise/stop words - sql-server-2005

Is it possible to get the list of Full Text Search noise/stop words from SQL Server 2005 by querying the database?
I am aware that the noise words are in a text file ~/FTData/noiseEng.txt but this file is not accessible to our application.
I've looked at the sys.fulltext_* tables but these don't seem to have the words.

It appears that this is not possible in SQL 2005 but is in SQL Server 2008.
Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists
This next query gets a list of all of the stopwords that ship with SQL Server 2008. This is a nice improvement, you can not do this in SQL Server 2005.
Stopwords and Stoplists - SQL Server 2008
SQL Server 2005 noise words have been replaced by stopwords. When a database is upgraded to SQL Server 2008 from a previous release, the noise-word files are no longer used in SQL Server 2008. However, the noise-word files are stored in the FTDATA\FTNoiseThesaurusBak folder, and you can use them later when updating or building the corresponding SQL Server 2008 stoplists. For information about upgrading noise-word files to stoplists, see Full-Text Search Upgrade.
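For reference, here is a minimal sketch of the kind of query the first quote refers to, assuming SQL Server 2008 or later. It reads the sys.fulltext_system_stopwords catalog view; the filter on language_id 1033 (U.S. English) is only an example.

```sql
-- List the system stopwords for one language (SQL Server 2008 and later only).
-- 1033 is the LCID for U.S. English; change or drop the filter as needed.
SELECT stopword, language_id
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033
ORDER BY stopword;
```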

I just copy the noise words file from \Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData into my app, and use it to strip noise words.
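Imports System.Text.RegularExpressions

Public Function StripNoiseWords(ByVal s As String) As String
    ' ReadFile is the application's own helper; it returns the text of the copied noise-word file.
    Dim NoiseWords As String = ReadFile("/Standard/Core/Config/noiseENU.txt").Trim
    ' Escape each word so regex metacharacters in the file are matched literally.
    Dim Escaped As String() = Array.ConvertAll(Of String, String)(Regex.Split(NoiseWords, "\s+"), AddressOf Regex.Escape)
    Dim NoiseWordsRegex As String = String.Join("|", Escaped) ' about|after|all|also etc.
    NoiseWordsRegex = String.Format("\s?\b(?:{0})\b\s?", NoiseWordsRegex)
    Dim Result As String = Regex.Replace(s, NoiseWordsRegex, " ", RegexOptions.IgnoreCase) ' replace each noise word with a space
    Result = Regex.Replace(Result, "\s+", " ") ' eliminate any multiple spaces
    Return Result
End Function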

Related

UTF-8 conversion into SQL Windows Server 2008 R2 from 2003

We have been using SQL Server on Microsoft Windows Server 2003 SP2, and are now attempting to transfer across to a new server, running 2008 R2. One of our clients has a separate jobs database which creates text files that are uploaded via FTP to a folder on our server 3 times daily, to then be imported into a corresponding series of tables in our database. Here is the old code for the import:
Delete
From Client.dbo.jobs
Go
BULK INSERT CarltonRR.dbo.jobs
FROM 'D:\folder\clientDatabaseUpload\jobs.txt'
WITH
(
DATAFILETYPE='char',
CODEPAGE = '65001',
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\|\n'
)
Go
After the initial errors and some searching, I removed the CODEPAGE = '65001', line because of the issue mentioned in this documentation: 2008 R2 does not support UTF-8, but the database would automatically convert to UTF-16. This resulted in problems displaying some characters (£, for example) that the old system handles fine. The data type for the field(s) that are not displaying properly is varchar(50).
Is there a change that needs to be made to the SQL queries from 2003 to 2008 R2 that would allow the special characters in the .txt files to be displayed in the database?
Edit: The Data Type for the field(s) in question is nvarchar(50), not varchar(50)
Edit 2: If it helps, the character shown in place of the ' £ ' sign is ' ┬ú '
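The ' ┬ú ' output is what the two UTF-8 bytes of ' £ ' (0xC2 0xA3) look like when they are read one byte at a time in an OEM code page such as 437 or 850. That suggests the file is still UTF-8 but is no longer being decoded as UTF-8. One possible workaround, sketched here under assumptions, is to convert the file to UTF-16 (little-endian) before the import and load it with DATAFILETYPE = 'widechar' into the nvarchar columns. The converted file name jobs_utf16.txt is hypothetical, the conversion step itself is not shown, and the terminators are copied unchanged from the original statement.

```sql
-- Assumes jobs_utf16.txt (hypothetical name) is a UTF-16 LE copy of jobs.txt
-- produced by a separate conversion step before this script runs.
BULK INSERT CarltonRR.dbo.jobs
FROM 'D:\folder\clientDatabaseUpload\jobs_utf16.txt'
WITH
(
    DATAFILETYPE = 'widechar',  -- read the file as Unicode (UTF-16)
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\|\n'      -- unchanged from the original import
);
```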

Restoring SQL Server 2000 database on a 2008 R2 is creating a new logical file

I have a database on SQL Server 2000. There are only two logical files in the PRIMARY file group: the data file and the log file. However, when restoring the database to SQL Server 2008 R2, there is now a new logical file named ftrow_Table1Field1 with a file name ftrow_Table1Field1{GUID}.ndf. (I've replaced the actual table, field name, and GUID for simplicity.) The path to the .ndf file is MSSQL10_50.MSSQLSERVER\MSSQL\FTData\.
I did not create this logical file, nor did I enable full-text search on the database. Field1 was originally a TEXT data type in SQL Server 2000, which I've changed via T-SQL to a VARCHAR(MAX) column. This is also not the only column I've converted from TEXT to VARCHAR(MAX).
Can anyone shed some light on what is going on here?
EDIT: I did another restore without running my massive T-SQL scripts for the next software release. Direct from the SQL Server 2000 backup, it creates this file. Looking at the Properties of the field in SSMS, it says Full Text is False. The data type is TEXT. This is not the only TEXT field in the database.
Okay. I figured it out. The SQL 2000 database thought there was a full-text index enabled on the field, but it wasn't really enabled. This carried over to SQL 2008 R2 during the restore, because R2 restored in SQL 2000 compatibility mode and preserved the presumed .NDF. I just removed that file from the file group, and it's good. Also, R2 will create full-text indexes in the .MDF itself, as opposed to creating an .NDF.
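As a rough illustration of the fix described above, the leftover file can be located and removed with statements along these lines. MyDatabase is a placeholder name, and ftrow_Table1Field1 is the stand-in logical name used in the question. REMOVE FILE only succeeds if the file is empty.

```sql
-- List the files that belong to the database, including the restored .ndf.
USE MyDatabase;  -- placeholder database name
SELECT name, type_desc, physical_name
FROM sys.database_files;

-- Remove the unused full-text file by its logical name (the file must be empty).
ALTER DATABASE MyDatabase
REMOVE FILE ftrow_Table1Field1;
```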

Generating TPC-H database for SqlServer 2008

Is there any way to populate a TPC-H database for Microsoft SQL Server 2008?
TPC-H provides a DBGEN tool that can create huge tables according to a schema. By default, it generates text files (one per table) with tuples represented in lines and '|' separating the columns in a tuple and new line for the tuple end.
I need that huge table to be imported in SQL Server 2008.
This is the method.
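Since the method the answer points to is not reproduced here, the following is only a sketch of one common approach: loading each DBGEN output file with BULK INSERT. It assumes the target table already exists with the matching TPC-H column layout; the database name tpch and the file path are hypothetical. DBGEN output normally ends every line with a trailing '|', so the row terminator includes it.

```sql
-- Load one DBGEN output file; repeat for each of the generated tables.
BULK INSERT tpch.dbo.lineitem          -- hypothetical database name
FROM 'C:\tpch\lineitem.tbl'            -- hypothetical path to the DBGEN output
WITH
(
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '|\n',  -- trailing '|' plus newline; adjust to the file's line endings
    TABLOCK                 -- table lock for faster loading of large files
);
```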

Fulltext Search in SQL Server 2008 Step by step

How to get started with Fulltext Search in SQL Server 2008?
Read these links:
SQL SERVER – 2008 – Creating Full Text Catalog and Full Text Search
Using Full Text Search in SQL Server 2008
Setting Up Full Text Search: A Step-by-step Guide
Full-Text Search (SQL Server)
SQL Server 2008 Full Text Search Best Practices from the SQL CAT Team
I would add these links from Simple Talk's web site:
Understanding Full-Text Indexing in SQL Server
SQL Server Full Text Search Language Features
SQL Server Full Text Search Language Features - Part 2

Custom StopWord List In SQL Server 2005 Full-Text-Search

Is there any way to add some custom stop words to SQL Server 2005?
I found the answer:
On SQL Server 2005:
On SQL 2005 they have the concept of "noise word lists". These are essentially the same thing, but they're stored as text files in the file system. These files have names like "noiseenu.txt" (U.S. English noise word text file) and are located in a subdirectory of your SQL Server instance directory (C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData\, for instance). You can edit it with any text editor and save again. I don't recall whether or not you need to bounce the service afterwards on 2005 (don't recall if the noiseword list is cached in memory, but you may as well bounce it to be sure). Then you have to rebuild your full-text indexes.
On SQL Server 2008:
You can create a custom stopword list on SQL 2008. The server will remove the stopwords at index time and when it parses your full-text search queries. All you have to do is specify that your full-text index use the custom stoplist.
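A minimal sketch of the SQL Server 2008 approach, assuming a full-text index already exists on the table. The names MyStoplist and dbo.MyTable and the word 'widget' are placeholders.

```sql
-- Create a custom stoplist, starting from a copy of the system stoplist.
CREATE FULLTEXT STOPLIST MyStoplist FROM SYSTEM STOPLIST;

-- Add a custom stopword for English (LCID 1033).
ALTER FULLTEXT STOPLIST MyStoplist ADD 'widget' LANGUAGE 1033;

-- Tell the existing full-text index on dbo.MyTable to use the custom stoplist.
ALTER FULLTEXT INDEX ON dbo.MyTable SET STOPLIST MyStoplist;
```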