OK this is the situation..
I am enabling fulltext search on a table but it only works on some fields..
CREATE FULLTEXT CATALOG [defaultcatalog]
CREATE UNIQUE INDEX ui_staticid on static(id)
CREATE FULLTEXT INDEX ON static(title_gr LANGUAGE 19,title_en,description_gr LANGUAGE 19,description_en) KEY INDEX staticid ON [defaultcatalog] WITH CHANGE_TRACKING AUTO
now why the following will bring results
Select * from static where freetext(description_en, N'str')
and this not (while the both have text with str in it ..)
Select * from static where freetext(description_gr, N'str')
(i have tried it also without the language specification - greek in this case)
(the collation is of the database is Greek_CI_AS)
btw
Select * from static where description_gr like N'%str%'
will work just fine ..
all fields are nvarchar type and the _gr fields hold english and greek text..(should not matter)
All help will be greatly appreciated
Just trying to figure out what's going on: what do you get with this query here?
SELECT * FROM static WHERE FREETEXT(*, N'str')
If you're not explicitly specifying any column to search in - does it give you the expected results?
Another point: I think you have a wrong language ID in your statement. According to SQL Server Books Online:
When specified as a string,
language_term corresponds to the alias
column value in the syslanguages
system table. The string must be
enclosed in single quotation marks, as
in 'language_term'. When specified
as an integer, language_term is the
actual LCID that identifies the
language.
and from what I found on the internet searching around, the LCID for Greek is 1032 - not 19. Can you try with 1032 instead of 19? Does that make a difference?
Marc
Related
I have a following (simplified) database structure:
[places]
name NVARCHAR(255)
description TEXT (usually quite a lot of text)
region_id INT FK
[regions]
id INT PK
name NVARCHAR(255)
[regions_translations]
lang_code NVARCHAR(5) FK
label NVARCHAR(255)
region_id INT FK
In real db I have few more fields in [places] table to search in, and [countries] table with similar structure to [regions].
My requirements are:
Search using name, description and region label, using the same behaviour as name LIKE '%text%' OR description LIKE '%text5' OR regions_translations.label LIKE '%text%'
Ignore all special characters like Ą, Ć, Ó, Š, Ö, Ü, etc. so for example, when someone search for
PO ZVAIGZDEM I return a place with name PO ŽVAIGŽDĖM - but of course also return this record, when user uses proper characters with accents.
Quite fast. ;)
I had a few approaches to solve this issue.
Create new column 'searchable_content', normalize text (so replace Ą -> A, Ö -> O and so on) and just do simple SELECT ... FROM places WHERE searchable_content LIKE '%text%' but it was slow
Add fulltext search index to table places and regions_translations - it was faster, but I could not find a way to ignore special characters (characters are from various of languages, so specyfying index language will not work)
Create new column as in first attempt, and addfulltext index only on that column - it was faster then attempt 1 (probably because I do not need to join tables) and I could manually normalize the content, but I feel like it's not a great solution.
Question is - what is the best approach here?
My top priority is to ignore special characters.
EDIT:
ALTER FULLTEXT CATALOG [catalog_name] REBUILD WITH ACCENT_SENSITIVITY = OFF
Probably is a solution to my issue with special characters (need to test it a bit more) - I query too fast, and index did not rebuild, that's why I did not get any records.
You can use the COLLATE clause on a column to specify a sql collation that will treat these special characters as their non-accented counterparts. Think of it as essentially casting one data type as another, except you're casting é as e (for example). You can use the same tool to return case sensitive or case insensitive results.
The documentation talks a little more about it, and you can do a search to find exactly which collation works best for you.
https://learn.microsoft.com/en-us/sql/t-sql/statements/collations?view=sql-server-ver16
When using Microsoft SQL Server Management Studio 14.0.17254.0 and it's query editor, I noticed that the word configuration is always syntax highlighted as a blue keyword. Even though it doesn't appear to be in the list of Reserved Keywords. Is this a keyword in some other SQL standard or has this been a keyword before? Is there any reason this should be highlighted with the same color as SELECT or WHERE?
I found similar question asking about other keywords, and they all seemed to have logical reasons, but couldn't find anything for this word.
Primary reason for this question is that I want to know if using this as a column name in queries without the the brace encapsulation [configuration] is completely safe.
There are both Reserved Keywords, and Keywords in SQL Server. CONFIGURATION is a keyword, but it isn't reserved. Just like int (for the datatype int), and other words, are a keyword , but you could still use it in a statement unquoted:
CREATE TABLE sample (int int,
date date,
char varchar(10),
last decimal(12,2),
first numeric(12,2),
system char(2));
Every single word there is a keyword of some kind (including sample), but only CREATE and TABLE are reserved. char is even a datatype and a function and isn't reserved, so you could have a nonsensical statement like CONVERT(char,CHAR(Char)), and every reference to "char" has a different meaning.
CONFIGURATION isn't a future keyword either, so at this time, you would seem fine to use it.
I have a Stored Procedure, which uses full-text search for my nvarchar fields. And I'm stuck when I realized, that Full-Text Search can't find field if I type only numeric values of this field.
For example, I have field Name in my table with value 'Request_121'
If I type Запрос_120 or Request - it's okay
If I type 120 - nothing is found
What is going on?
Screenshots:
No results found: https://gyazo.com/9e9e061ce68432c368db7e9162909771
Results found: https://gyazo.com/e4cb9a06da5bf8b9f4d702c55e7f181e
You cannot find 121 word part in your full-indexed column because SQL Server treats Request_121 as a single term. You can verify this by running the fts parser manually:
select * from sys.dm_fts_parser('"Request_121"', 1033, 0, 0)
Returns:
while running:
select * from sys.dm_fts_parser('"Request 121"', 1033, 0, 0)
Returns:
Note, in the second example 121 was picked as separate search term.
What you could do is to try using wildcards in your FTS query like:
FROM dbo.CardSearchIndexes idx WHERE CONTAINS(idx.Name, '"121*"');
However, again I doubt it will pick 121 being inside a non-breakable word part, only if you have 121 as standalone word. Play with sys.dm_fts_parser to see how SQL FTS engine breaks up your input and adjust your query accordingly.
UPDATE: I've noticed that you use Cyrillic search terms together with English. Notice, when running FTS queries it's also important to know what Language was specified when FTS index was created for Name column. If the FTS language locale is Cyrillic then it will not find English term Request in the Name column.
Note, in my dm_fts_parser examples above I have used 1033 (English) language id. Examine the LANGUAGE language_term operator in your CREATE FULLTEXT INDEX statement to check what language was used for FTS index.
I have field Name in my table with value 'Request_121'
Your query is wrong, you have a typo, write 121 instead of 120
FROM dbo.CardSearchIndexes idx WHERE CONTAINS(idx.Name, '121');
I have a table called TestTable with the below schema.
---------------------------------------------------------
ID(Integer) | Text(nvarchar(450)) | LanguageCode(Integer)
---------------------------------------------------------
Where ID is a primary key and Text column contains text strings in multiple languages.
I would like to create a full text index on the above table.
CREATE FULLTEXT INDEX ON TestTable
(
Text
Language <Should get language code from Language column>,
)
KEY INDEX PK_TestTableID;
GO
How can I achieve this?
Please help.
It's impossible to do it on a single table at the moment. With SQL Server you can only index a column with a single language.
What Microsoft suggest is to use a neutral word breaker (http://technet.microsoft.com/en-us/library/ms142507.aspx).
You're query would be as follows:
SELECT Description
FROM Asset
WHERE FREETEXT(Description, 'Cats', 'de-DE');
The problem with this is that you don't get the obvious benefits of breaking in the language of that text.
What you could do is have a view for each culture of the table and index with that cultures specific word breaker:
e.g TestTableGermanView
I stumbled upon this in the SQL docs. I haven't dug much deeper, but it looks interesting.
"For plain text content - When your content is plain text, you can convert it to the xml data type and add language tags that indicate the language corresponding to each specific document or document section. For this to work, however, you need to know the language before full-text indexing."
https://technet.microsoft.com/en-us/library/ms142507.aspx
I'm looking to perform case-insensitive search in a Firebird database, without modifying actual queries. In other words, I'd like all my existing "SELECT/WHERE/LIKE" statements to retrieve BOB, Bob, and bob. Does Firebird configuration allow to modify this behavior?
Try using something like:
Imagine you have a table of persons like this one:
CREATE TABLE PERSONS (
PERS_ID INTEGER NOT NULL PRIMARY KEY,
LAST_NAME VARCHAR(50),
FIRST_NAME VARCHAR(50)
);
Now there is an application, which allows the user to search for persons by last name and/or first name. So the user inputs the last name of the person he is searching for.
We want this search to be case insensitive, i.e. no matter if the user enters "Presley", "presley", "PRESLEY", or even "PrESley", we always want to find the King.
Ah yes, and we want that search to be fast, please. So there must be an index speeding it up.
A simple way to do case insensitive comparisons is to uppercase both strings and then compare the uppercased versions of both strings.
Uppercasing has limitations, because some letters cannot be uppercased. Note also that there are languages/scripts where there is no such thing as case. So the technique described in this article will work best for European languages.
In order to get really perfect results one would need a case insensitive (CI) and/or accent insensitive (AI) collation. However, at the time of this writing (July 2006) there are only two Czech AI/CI collations for Firebird 2.0. The situation will hopefully improve over time.
(You should know the concepts of Character Sets and Collations in order to understand what comes next. I use the DE_DE collation in my examples, this is the collation for German/Germany in the ISO8859_1 character set.)
In order to get correct results from the UPPER() function that is built into Firebird, you must specify a collation. This can be in the DDL definition of the table:
CREATE TABLE PERSONS (
PERS_ID INTEGER NOT NULL PRIMARY KEY,
LAST_NAME VARCHAR(50) COLLATE DE_DE,
FIRST_NAME VARCHAR(50) COLLATE DE_DE
);
or it can be done when calling the UPPER() function:
SELECT UPPER (LAST_NAME COLLATE DE_DE) FROM PERSONS;
http://www.destructor.de/firebird/caseinsensitivesearch.htm
or you can edit your queries and add the lower() function
LOWER()
Available in: DSQL, ESQL, PSQL
Added in: 2.0
Description: Returns the lower-case equivalent of the input string. This function also correctly lowercases non-ASCII characters, even if the default (binary) collation is used. The character set must be appropriate though: with ASCII or NONE for instance, only ASCII characters are lowercased; with OCTETS, the entire string is returned unchanged.
Result type: VAR(CHAR)
Syntax:
LOWER (str)
Important
If the external function LOWER is declared in your database, it will obfuscate the internal function. To make the internal function available, DROP or ALTER the external function (UDF).
Example:
select field from table
where lower(Name) = 'bob'
http://www.firebirdsql.org/refdocs/langrefupd21-intfunc-lower.html
Necromancing.
Instead of a lowercase-field, you can specify the collation:
SELECT field FROM table
WHERE Name = 'bob' COLLATE UNICODE_CI
See
https://firebirdsql.org/refdocs/langrefupd21-collations.html
Maybe you can also specify it as the default character-set for the database, see
http://www.destructor.de/firebird/charsets.htm
Eventually I went with creating shadow columns containing lower-cased versions of required fields.