I have a problem in SQL Server 2000 when searching for a term in farsi - sql-server-2000

I have a problem in SQL Server 2000 with Farsi search.
I have a table with nvarchar fields holding Unicode (Farsi) values, and I need to search their content with Unicode (Farsi) text.
I am using:
select * from table1
where fieldname like '%[farsi word]%'
The Farsi word exists in the table, but the query returns 0 rows.
What can I do?
thanks all.

If you're using NVARCHAR fields, you should also use Unicode when searching! You do this by prepending an N to your search term:
select * from table1
where fieldname like N'%[farsi word]%'
Also: be aware that if your search term begins with a % wildcard, you've basically disabled the use of any indexes there might be to speed up your search. Using LIKE '%...%' for searching will always result in a pretty slow table scan.
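To see why the N prefix matters: without it, SQL Server treats the literal as non-Unicode text in the database's code page, and characters that code page cannot represent are replaced with question marks before the comparison runs. The Python sketch below only imitates that lossy conversion. The code page (cp1252) and the sample Farsi word are assumptions made for this illustration, not details from the question.

```python
# Assumption: the database's default code page is Windows-1252 (cp1252).
farsi_word = "سلام"  # sample word chosen for this sketch, not from the question


def to_code_page(text, code_page="cp1252"):
    """Round-trip text through a single-byte code page, replacing what does not fit."""
    return text.encode(code_page, errors="replace").decode(code_page)


non_unicode_literal = to_code_page(farsi_word)
print(non_unicode_literal)                # ????
print(non_unicode_literal == farsi_word)  # False: the search term no longer matches
```

A pattern made of question marks cannot match the stored Farsi text, which is consistent with the query returning 0 rows.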

Related

full-text search does not work with MATCH() and AGAINST()

I tried a full-text search in PostgreSQL with this code:
SELECT * FROM test_table WHERE MATCH (discription) AGAINST ('remote controller');
The column name is "discription".
The keyword is "remote controller".
The error message is:
ERROR: syntax error at or near "AGAINST"
1: ...LECT * FROM test_table WHERE MATCH (discription) AGAINST ('...
I cannot figure out what's wrong.
As Gordon Linoff said, you are using MySQL syntax in Postgres. The equivalent query for PostgreSQL is the following:
SELECT *
FROM test_table
WHERE to_tsvector(discription) @@ plainto_tsquery('remote controller');
In a nutshell, the tsvector data type stores preprocessed documents: a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word. The tsquery type represents processed queries. plainto_tsquery is used here because to_tsquery expects an operator such as & between words and rejects 'remote controller' as written.
You can find more information in the following links:
https://www.postgresql.org/docs/12/textsearch-tables.html#TEXTSEARCH-TABLES-SEARCH
https://www.postgresql.org/docs/12/textsearch.html
Note: to improve search performance, you can also create an index for this:
https://www.postgresql.org/docs/12/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX
This query will work (difference() is provided by the fuzzystrmatch extension, which has to be installed first):
SELECT * FROM test_table WHERE
difference(discription, 'remote controller') > 2;
The Soundex system is a method of matching similar-sounding names by converting them to the same code.
The soundex function converts a string to its Soundex code. The difference function converts two strings to their Soundex codes and then reports the number of matching code positions. Since Soundex codes have four characters, the result ranges from zero to four, with zero being no match and four being an exact match.
Note: despite its name, the difference function reports how many positions of the two Soundex codes match, so a higher value means the strings sound more alike.
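To make the scoring concrete, here is a small Python sketch of the classic American Soundex rules with a difference-style score built on top. It was written for illustration only; it is not PostgreSQL's implementation, and edge cases may behave differently there.

```python
import string

# Letter-to-digit table of classic American Soundex.
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        _CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code of name ('' if it has no ASCII letters)."""
    letters = [ch for ch in name.upper() if ch in string.ascii_uppercase]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":          # H and W do not separate letters with the same code
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code             # vowels reset prev, so a repeated code counts again
    return (result + "000")[:4]


def difference(a, b):
    """Count the positions at which the two Soundex codes agree (0 to 4)."""
    return sum(1 for x, y in zip(soundex(a), soundex(b)) if x == y)


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(difference("Anne", "Ann"))             # 4
print(difference("Anne", "Andrew"))          # 2
```

Because the score only compares sound codes, it suits names and misspellings better than multi-word keyword search.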

T-SQL CONTAINS with numbers and dots (.)

Let's consider User.Note = 'Version:3.7.21.1'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7.2*"')
=> returns something
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns nothing
If User.Note = 'Version:3.7.21'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns something
If User.Note = 'Version:3.72.21'
SELECT * FROM [USER] WHERE CONTAINS(NOTE, '"3.7*"')
=> returns nothing
I can't figure out how it works. It should always return something when I search for "3.7*".
Do you know the logic behind this?
PS: if I replace the numbers by letters, there's no problem.
I think your problem is being caused by the unpredictability of the word breaker interacting with the punctuation marks within the data. Full text search is based on the concept of strings of characters, not including spaces and punctuation. When the engine is building the index it sees the periods and breaks the word in weird ways.
As an example, I made a small table with the three values you provided...
VALUES (1,'3.7.21.1'),(2,'3.7.21'),(3,'3.72.21')
Now when I do your selects, I get results on all four... not the results I expect, though.
For me, this returns all three values
SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7.2*"')
and this returns only 3.7.21
SELECT * FROM containstext WHERE CONTAINS(secondid, '"3.7*"')
So let's run this and take a look at the contents of the full text index
SELECT * FROM sys.dm_fts_index_keywords(db_id('{databasename}'), object_id('{tablename}'))
For my results (yours are quite probably different) I've got the following display_term values
display_term document_count
21 3
3 3
3.7.21 1
7 2
72 1
So let's look at the first search criterion '"3.7.2*"'
If I shove that into sys.dm_fts_parser...
select * from sys.dm_fts_parser('"3.7.2*"', 1033, NULL, 0)
...it's showing me that it's breaking with matches on
3
7
2
But if I do...
select * from sys.dm_fts_parser('"3.7*"', 1033, NULL, 0)
I'm getting a single exact match on the term 3.7 and sys.dm_fts_index_keywords told me earlier that I only have one document/row that contains 3.7
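One way to picture that last result is to treat the full-text index as the list of display_term values shown above and apply the prefix test to whole terms. The Python sketch below is a deliberate simplification written for illustration; the real engine also deals with phrases, stopwords, and language-specific word breaking.

```python
# display_term values copied from the sys.dm_fts_index_keywords output above
index_terms = ["21", "3", "3.7.21", "7", "72"]


def prefix_matches(prefix, terms):
    """Return the indexed terms that start with the given prefix."""
    return [term for term in terms if term.startswith(prefix)]


print(prefix_matches("3.7", index_terms))  # ['3.7.21']
```

Only the term 3.7.21 survives the prefix test, which fits the observation that a single row comes back for "3.7*".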
You might also experience additional weirdness because numbers 0-9 are usually in the system stopwords and can be left out of an index because they're considered to be useless. This might be why it works when you change to letters.
Also, I know you've decided to replace LIKE, but Microsoft has suggested that you only use alphanumeric characters in your full text indexes and, if you need to use non-alphanumeric characters in search criteria, you should use LIKE. Perhaps changing the periods to some alphanumeric replacement that won't be used in normal values?
CONTAINS will only work if the column is in a full-text index. If it is not indexed, you will need to use LIKE:
SELECT * FROM [USER] WHERE NOTE LIKE '3.7%' -- or '%3.7%'
Do you want to use CONTAINS because you think it will be faster? (It generally is.)
The Microsoft document lists all the ways you can format and use CONTAINS (11 examples).
Here is the Microsoft doc on CONTAINS

How to search using similar characters in random positions in Microsoft SQL?

I'm looking for a query that would allow the user to use a variation of characters while searching for a result. The character positions are completely random. We use the special characters È, Š, Ć, Č, Ž and Đ, so all of the variations have to match, because most users do not know the correct spelling.
Example:
MISIC
MISIĆ
MISIČ
MIŠIC
MIŠIĆ
MIŠIČ
You can search for it by using COLLATE:
SELECT *
FROM TableName
WHERE
columnName LIKE '%MISIC%' COLLATE Latin1_General_CI_AI
Latin1 makes the server treat strings using the Latin-1 character set, basically ASCII.
CI specifies case-insensitive, so "ABC" equals "abc".
AI specifies accent-insensitive, so 'ü' equals 'u'.
For more information on collation, see the COLLATE documentation.
Reference: @JINO SHAJI
As per @Adephx's comment, this works as expected with a few modifications:
SELECT * FROM [TABLE] WHERE [COLUMN] LIKE '%NAME%' COLLATE Latin1_general_CI_AI
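As a rough picture of what a case-insensitive, accent-insensitive comparison does, the Python sketch below folds each string to a bare form before comparing. It is an analogy written for this note, not how SQL Server implements collations. One known limit of the sketch: Đ has no Unicode decomposition, so it is left unchanged here.

```python
import unicodedata


def fold_accents(text):
    """Strip combining accent marks and ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


variants = ["MISIC", "MISIĆ", "MISIČ", "MIŠIC", "MIŠIĆ", "MIŠIČ"]
print({fold_accents(v) for v in variants})  # {'misic'}
```

All six spellings collapse to the same folded value, which is the effect the CI_AI collation is being used for in the query above.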
Applying COLLATION is a great practice, especially if we want to get rid of all Accent-marks, however, if we need more granular control over individual accent-characters (È,Š,Ć,Č,Ž), we can do something like below to selectively compare individual accent-characters.
Most DBMSs provide string-comparison functionality based on how the words sound (pronounced). SQL Server provides two built-in functions for this: SOUNDEX() and DIFFERENCE(). In this scenario we can do this:
IF (DIFFERENCE('MISIC', 'MISIĆ')>=4)
AND (DIFFERENCE('MISIC', 'MISIČ')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIC')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIĆ')>=4)
AND (DIFFERENCE('MISIC', 'MIŠIČ')>=4)
PRINT 'Same word'
ELSE
PRINT 'Different word.'
Actually, in many languages 'Š' sounds quite different than 'S', therefore SQL Server considers them as less-compatible, but here is a workaround to impose equivalence:
WITH words AS (SELECT value FROM STRING_SPLIT(N'MISIĆ,MISIČ,MIŠIC,MIŠIĆ,MIŠIČ', ','))
SELECT
value,
CASE WHEN (DIFFERENCE('MISIC', replace(value,'Š','S'))>=4)
THEN 'Same word'
ELSE 'Not same'
END AS 'Comparison'
FROM words
Output:
value comparison
----- ----------
MISIĆ Same word
MISIČ Same word
MIŠIC Same word
MIŠIĆ Same word
MIŠIČ Same word
The example above works in SQL Server 2016 or later. Note that STRING_SPLIT() is only used here to iterate over the list of words; this function is not available in SQL Server 2014 or earlier.
Hope this helps.

Regular Expression Pattern for Search in SQL

I want to search a table for file names of the form {Numerical Pattern String}.PDF.
Example: 1.PDF, 12.PDF, 123.PDF, 1234.PDF, etc.
select * from web_pub_subfile where file_name like '[0-9]%[^a-z].pdf'
But the query above also returns files like these:
1801350 Ortho.pdf
699413.processing2.pdf
15-NOE-301.pdf
Could anyone help me see what I am missing here?
One way to do it is getting the substring before the file extension and checking if it is numeric. This solution only works well if there is only one . character in the file name.
select * from web_pub_subfile
where isnumeric(left(file_name,charindex('.',file_name)-1)) = 1
Note:
ISNUMERIC returns 1 for some characters that are not numbers, such as plus (+), minus (-), and valid currency symbols such as the dollar sign ($).
To handle file names with multiple . characters, provided there is always a .filetype extension, use
select * from web_pub_subfile
where isnumeric(left(file_name,len(file_name)-charindex('.',reverse(file_name)))) = 1
and charindex('.',file_name) > 0
Sample demo
As suggested by @Blorgbeard in the comments, to avoid the use of isnumeric, use
select * from web_pub_subfile
where left(file_name,len(file_name)-charindex('.',reverse(file_name))) NOT LIKE '%[^0-9]%'
and len(left(file_name,len(file_name)-charindex('.',reverse(file_name)))) > 0
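The logic of that last query (take everything before the final dot, require it to be non-empty and to contain no non-digit character) can be sketched in Python as follows. The function name is made up for this illustration.

```python
def is_numeric_file_name(file_name):
    """True when the part before the last '.' is non-empty and made only of digits 0-9."""
    base, dot, _extension = file_name.rpartition(".")
    if not dot or not base:
        return False
    return all(ch in "0123456789" for ch in base)


for name in ["1.pdf", "1234.pdf", "1801350 Ortho.pdf",
             "699413.processing2.pdf", "15-NOE-301.pdf"]:
    print(name, is_numeric_file_name(name))
```

Only 1.pdf and 1234.pdf pass the check. Like the SQL version, this sketch tests the base name only and does not verify that the extension is actually pdf.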
You can't really do what you are trying to do using plain out-of-the-box SQL. The reason you are seeing those results is that the % character matches any character, any number of times. It's not like * in a regex, which matches the previous character 0 or more times.
Your best option would probably be to create some CLR functions that implement regex functionality on the SQL Server side. You can take a look at this link to find a good place to start.
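The point about % can be seen by writing the LIKE pattern as a regular expression, where % becomes .* rather than a repetition of the preceding character class. The Python sketch below is only an approximate translation of '[0-9]%[^a-z].pdf', made for illustration; collation details of SQL Server are not modelled.

```python
import re

# Rough regex analogue of LIKE '[0-9]%[^a-z].pdf': % turns into ".*".
loose = re.compile(r"[0-9].*[^a-z]\.pdf", re.IGNORECASE)
# What was actually wanted: one or more digits, then ".pdf".
strict = re.compile(r"[0-9]+\.pdf", re.IGNORECASE)

for name in ["699413.processing2.pdf", "15-NOE-301.pdf", "1234.pdf"]:
    print(name, bool(loose.fullmatch(name)), bool(strict.fullmatch(name)))
```

The first two names satisfy the loose pattern but not the strict one, while 1234.pdf satisfies both.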
If you are on SQL Server 2012 or later, you could use Try_Convert():
select * from web_pub_subfile where Try_Convert(int,replace(file_name,'.pdf',''))>0
-- Demo with a table variable holding the sample file names
Declare @web_pub_subfile table (file_name varchar(100))
Insert Into @web_pub_subfile values
('1801350 Ortho.pdf'),
('699413.processing2.pdf'),
('15-NOE-301.pdf'),
('1.pdf'),
('1234.pdf')
select * from @web_pub_subfile where Try_Convert(int,replace(file_name,'.pdf',''))>0
Returns
file_name
1.pdf
1234.pdf

Simple SQL result zero rows

I have a really simple SQL query which, however, returns 0 rows:
SQL:
SELECT `article_id`, `article_title`, `article_url`, `article_text`,
`article_img`, `article_shares`, `article_likes`, `article_date`,
(SELECT COUNT(comment_id) FROM ci_comments WHERE comment_aid=article_id)
AS commentCount
FROM (`ci_articles`)
WHERE `article_url` = 'Jednym-slovom2'
SQL Table:
Any help is appreciated
Is your column article_url a padded column? If it always has 15 characters, then values shorter than 15 characters might have whitespace at the end, i.e. 'Jednym-slovom2' might actually be 'Jednym-slovom2 '. The MS SQL Server datatype NCHAR will have columns like this. To work around this, use wildcards in your search, like '%Jednym-slovom2%'.
Try copying the text out of the table and into your where clause. That will carry over any spaces and unusual characters.
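If the copied value still looks identical to what you typed, it can help to list the characters that are invisible on screen. The Python sketch below does that for a string pasted out of the table. The helper name and the sample values are made up for this illustration.

```python
import unicodedata


def hidden_chars(text):
    """Return (position, name) for every whitespace or non-printable character."""
    found = []
    for index, ch in enumerate(text):
        if ch.isspace() or not ch.isprintable():
            name = unicodedata.name(ch, "U+%04X" % ord(ch))
            found.append((index, name))
    return found


print(hidden_chars("Jednym-slovom2"))        # []
print(hidden_chars("Jednym-slovom2 "))       # [(14, 'SPACE')]
print(hidden_chars("Jednym-slovom2\u00a0"))  # [(14, 'NO-BREAK SPACE')]
```

An empty list means the value has no hidden characters, so the cause of the mismatch lies elsewhere.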