Sql LIKE in Arabic? - sql

Consider this sample:
CREATE TABLE #tempTable
(name nvarchar(MAX))
INSERT INTO #tempTable VALUES (N'إِبْرَاهِيمُ'), (N'إبراهيم')
SELECT * FROM #tempTable WHERE name = N'إبراهيم'
SELECT * FROM #tempTable WHERE name LIKE N'%إبراهيم%'
Both selects only return إبراهيم but not إِبْرَاهِيمُ. How can I make it ignore these non-alphabetical characters in search? In other words, I want to get all similar words, including those with non-alpha characters.

You do not do it. Simple. There is NOTHING specific to Arabic here; you have the same problem in English.
How can I make it ignore these non-alphabetical characters in search?
Like numbers? NOT AT ALL. Not with "standard SQL Syntax".
If you can, put a full text index on the field. And use the full text search syntax in your query. This is what it is for.

There is a thread over at DBA Stack Exchange that has a workaround for this issue.
https://dba.stackexchange.com/questions/14153/treating-certain-arabic-characters-as-identical

Unfortunately, Arabic has no notion of case sensitivity, and of course both select statements will return only 'إبراهيم' because that is exactly what they were asked to match.
This is a problem we have been suffering from for a very long time: people always search for 'احمد' when it is written 'أحمد', and they won't find it.

This solution works 100% (PHP, using the intl extension's Transliterator to remove the nonspacing marks before comparing):
$yourChaine = \Transliterator::create('NFC; [:Nonspacing Mark:] Remove; NFC')
->transliterate($yourChaine);
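For readers who are not on PHP, the same steps (normalize to NFC, drop nonspacing marks, normalize again) can be sketched in Python with the standard unicodedata module. The function name strip_marks is made up for this illustration, and the two sample strings are the names from the question written as escapes. Storing the stripped form in an extra column and searching that column is only one possible way to use it, not something the answer above prescribes.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Drop nonspacing marks (Unicode category Mn), e.g. Arabic harakat."""
    composed = unicodedata.normalize("NFC", text)
    kept = "".join(ch for ch in composed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)

# The two spellings from the question, written with escapes so the marks are visible.
vocalized = "\u0625\u0650\u0628\u0652\u0631\u064e\u0627\u0647\u0650\u064a\u0645\u064f"
plain = "\u0625\u0628\u0631\u0627\u0647\u064a\u0645"

print(strip_marks(vocalized) == plain)   # compare after stripping
print(plain in strip_marks(vocalized))   # substring search, like LIKE '%...%'
```

The sketch normalizes to NFC rather than NFD on purpose: under NFD the letter إ is split into a plain alef plus a combining hamza, and that hamza is itself a nonspacing mark, so it would be stripped too. That may or may not be what you want for searches such as 'احمد' versus 'أحمد'.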

Related

SQL code for retrieving values of column starting using wildcard

I want to look for values in a variable/column which start with 'S' and have 'gg' somewhere after that.
For instance, Staggered is a word which starts with the letter S and has gg inside the word.
So what SQL query should I write to get that result?
Due to the fact that you did not provide much meta information (which database?), I'll just show the following:
SELECT * FROM <table>
WHERE <columnname> LIKE 'S%gg%';
Good luck :)
As the target database is not mentioned, I will answer with Oracle syntax:
select *
from TABLE_NAME
where COL_NAME like 'S%gg%'
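To see the pattern at work without a database server, here is a small sketch that runs the same LIKE 'S%gg%' against an in-memory SQLite database from Python. The table name words and its sample rows are invented for the illustration. One caveat: SQLite's LIKE ignores ASCII case by default, while other databases (Oracle, for example) compare case-sensitively, so treat the result as specific to this setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")  # hypothetical sample table
conn.executemany(
    "INSERT INTO words (word) VALUES (?)",
    [("Staggered",), ("Soggy",), ("Sample",), ("Egg",)],
)

# 'S' must be the first character; 'gg' may appear anywhere after it.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT word FROM words WHERE word LIKE 'S%gg%' ORDER BY rowid"
    )
]
print(matches)
```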

SQL Contains Question

Can someone explain this to me? I have two queries below with their results.
query:
select * from tbl where contains([name], '"*he*" AND "*ca*"')
result-set:
Hertz Car Rental
Hemingyway's Cantina
query:
select * from tbl where contains([name], '"*he*" AND "*ar*"')
result-set:
nothing
The first query is what I would expect, however I would expect the second query to return "Hertz Car Rental". Am I fundamentally misunderstanding how '*' works in full-text searching?
Thanks!
I think SQL Server is interpreting your strings as prefix_terms. The asterisk is not a plain old wildcard specifier. Fulltext and Contains are word oriented. For what you are trying to do, you would be better off using plain old LIKE instead of CONTAINS.
http://msdn.microsoft.com/en-us/library/ms187787.aspx
"*" only works as a suffix. If you use it as a prefix, the table needs to be scanned no matter what and the index is useless. At that point, you might as well do
Select * From Table Where (Name Like '%he%') And (Name Like '%ar%')
I would try replacing * with % to see how it goes.
select * from tbl where contains([name], '"%he%" AND "%ar%"')
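The word-oriented behaviour described in the answers above can be illustrated with a toy model in Python. This is not SQL Server's full-text engine; it only assumes, as the first answer suggests, that "*he*" ends up being treated as the prefix term he*, i.e. "some word starts with he". The helper names matches_prefix_terms and matches_substrings are hypothetical.

```python
import re

def matches_prefix_terms(text, prefixes):
    """True if, for every prefix, some word in text starts with it."""
    words = re.findall(r"\w+", text.lower())
    return all(any(w.startswith(p.lower()) for w in words) for p in prefixes)

def matches_substrings(text, parts):
    """True if every part occurs anywhere in text (like LIKE '%part%')."""
    lowered = text.lower()
    return all(p.lower() in lowered for p in parts)

names = ["Hertz Car Rental", "Hemingyway's Cantina"]

he_ca = [n for n in names if matches_prefix_terms(n, ["he", "ca"])]
he_ar = [n for n in names if matches_prefix_terms(n, ["he", "ar"])]
like_he_ar = [n for n in names if matches_substrings(n, ["he", "ar"])]

print(he_ca)       # both names: a word starts with "he" and one with "ca"
print(he_ar)       # empty: no word starts with "ar" ("Car" only contains it)
print(like_he_ar)  # substring matching finds "ar" inside "Car"
```

Under this model the two result sets in the question are exactly what you would expect, and the LIKE-based query is the one that finds "Hertz Car Rental".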

SQL: Is there a way to get the average number of characters for a field?

Is there a simple sql query that can help me to determine the average number of characters that a (text) database field has?
For instance my field is called "message". Ideally I would love to do something like this...
select average(characterlength(message)) from mydatabasetable
is this even possible through sql?
Thanks!
Edit
Original bad phrasing: In SQL Server, LEN is for varchar fields. For Text fields, try DATALENGTH
Correction, because @gbn is right: LEN will not work with the Text or NText datatypes. For TEXT, try DATALENGTH.
End Edit
SELECT AVG(DATALENGTH(yourtextfield)) AS TEXTFieldSize
Edit - added
The above is for the TEXT datatype. For NTEXT, divide by 2.
select avg(length(fieldname)) from table
Though the answer could potentially differ depending on your RDBMS.
For MySQL:
SELECT AVG(CHAR_LENGTH(<column>)) AS avgLength FROM <table>
Retrieved from:
http://forums.devshed.com/database-management-46/average-length-field-38905.html
select avg(length(textfield)) from mytable;
Yes, it is possible. In SqlServer for example it would be:
SELECT AVG(LEN(Name)) FROM MyTable
select sum(len(theTextColumn)) / count(*) from theTable;
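The difference between counting characters and counting bytes, which is what the LEN versus DATALENGTH discussion above is about, can be tried out with a small sketch against an in-memory SQLite database from Python. The table and column names are borrowed from the question; the sample messages are invented. SQLite's LENGTH counts characters for text and bytes for a BLOB, so casting to BLOB stands in for a byte-length function here; other databases spell these functions differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydatabasetable (message TEXT)")
conn.executemany(
    "INSERT INTO mydatabasetable (message) VALUES (?)",
    [("hi",), ("hello",), ("h\u00e9llo!",)],  # last one has a 2-byte character
)

# Average length in characters, and in bytes of the stored (UTF-8) encoding.
avg_chars, avg_bytes = conn.execute(
    "SELECT AVG(LENGTH(message)), AVG(LENGTH(CAST(message AS BLOB))) "
    "FROM mydatabasetable"
).fetchone()

print(avg_chars)  # (2 + 5 + 6) / 3
print(avg_bytes)  # (2 + 5 + 7) / 3
```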

Use like in T-SQl to search for words separated by an unknown number of spaces

I have this query:
select * from table where column like '%firstword[something]secondword[something]thirdword%'
What do I replace [something] with to match an unknown number of spaces?
Edited to add: % will not work as it matches any character, not just spaces.
Perhaps somewhat optimistically assuming "unknown number" includes zero.
select *
from table
where REPLACE(column_name, ' ', '') like '%firstwordsecondwordthirdword%'
The following may help, as it describes using regular expressions in SQL queries in SQL Server 2005:
http://blogs.msdn.com/b/sqlclr/archive/2005/06/29/regex.aspx
I would definitely suggest cleaning the input data instead, but this example may work when you call it as a function from the SELECT statement. Note that this will potentially be very expensive.
http://www.bigresource.com/MS_SQL-Replacing-multiple-spaces-with-a-single-space-9llmmF81.html
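To compare the two ideas above (stripping spaces with REPLACE versus a regular expression), here is a small sketch using Python with an in-memory SQLite database. The table t and its sample rows are invented; the regular expression is evaluated in Python rather than inside the database, so it only illustrates the matching rule, not how you would wire regex support into SQL Server.

```python
import re
import sqlite3

rows = [
    "firstword secondword   thirdword",   # one or more spaces between words
    "firstwordsecondwordthirdword",       # zero spaces
    "first word secondword thirdword",    # a space inside "firstword"
    "firstword, secondword thirdword",    # a comma between the words
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column_name TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [(r,) for r in rows])

# REPLACE approach: remove every space, then look for the joined words.
replace_hits = [r[0] for r in conn.execute(
    "SELECT column_name FROM t "
    "WHERE REPLACE(column_name, ' ', '') LIKE '%firstwordsecondwordthirdword%' "
    "ORDER BY rowid")]

# Regex approach: require at least one space between the words.
pattern = re.compile(r"firstword +secondword +thirdword")
regex_hits = [r for r in rows if pattern.search(r)]

print(replace_hits)
print(regex_hits)
```

The REPLACE version also accepts zero spaces and spaces inside a word, since all spaces are removed before matching; the regex version accepts only the first row. Which behaviour is right depends on what "unknown number of spaces" is meant to include.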

Make an SQL request more efficient and tidy?

I have the following SQL query:
SELECT Phrases.*
FROM Phrases
WHERE (((Phrases.phrase) Like "*ing aids*")
AND ((Phrases.phrase) Not Like "*getting*")
AND ((Phrases.phrase) Not Like "*contracting*"))
AND ((Phrases.phrase) Not Like "*preventing*"); //(etc.)
Now, if I were using RegEx, I might bunch all the Nots into one big (getting|contracting|preventing), but I'm not sure how to do this in SQL.
Is there a way to render this query more legibly/elegantly?
Just by removing redundant stuff and using a consistent naming convention your SQL looks way cooler:
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND phrase NOT LIKE '%getting%'
AND phrase NOT LIKE '%contracting%'
AND phrase NOT LIKE '%preventing%'
You talk about regular expressions. Some DBMS do have it: MySQL, Oracle... However, the choice of either syntax should take into account the execution plan of the query: "how quick it is" rather than "how nice it looks".
With MySQL, you're able to use regular expression where-clause parameters:
SELECT something FROM table WHERE column REGEXP 'regexp'
So if that's what you're using, you could write a regular expression string that is possibly a bit more compact than your 4 LIKE criteria. It may not be as easy for other people to see what the query is doing, however.
It looks like SQL Server offers a similar feature.
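As a rough illustration of the "one big alternation" idea from the question, here is a sketch in plain Python using the re module. The sample phrases are invented, and the filtering happens in Python rather than in the database, so it shows only the matching logic: keep phrases containing "ing aids", drop any that match getting|contracting|preventing.

```python
import re

phrases = [
    "hearing aids",
    "getting hearing aids",
    "contracting aids",
    "preventing aids",
    "selling aids",
    "band aids",
]

# One alternation replaces the three separate NOT LIKE conditions.
exclude = re.compile(r"getting|contracting|preventing")

kept = [p for p in phrases if "ing aids" in p and not exclude.search(p)]
print(kept)
```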
Since it sounds like you're building this as you go to mine your data, here's something that you could consider:
CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL)
CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL)
INSERT INTO Includes VALUES ('%ing aids%')
INSERT INTO Excludes VALUES ('%getting%')
INSERT INTO Excludes VALUES ('%contracting%')
INSERT INTO Excludes VALUES ('%preventing%')
SELECT
    *
FROM
    Phrases P
WHERE
    EXISTS (SELECT * FROM Includes I WHERE P.phrase LIKE I.phrase) AND
    NOT EXISTS (SELECT * FROM Excludes E WHERE P.phrase LIKE E.phrase)
You are then always just running the same query and you can simply change what's in the Includes and Excludes tables to refine your searches.
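The table-driven approach above can be tried end to end with an in-memory SQLite database from Python. The sample phrases are invented, and SQLite stands in for whatever database you actually use, so details such as case sensitivity of LIKE may differ. The point of the sketch is that the query text stays fixed while the rows in Includes and Excludes change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Phrases  (phrase VARCHAR(50) NOT NULL);
    CREATE TABLE Includes (phrase VARCHAR(50) NOT NULL);
    CREATE TABLE Excludes (phrase VARCHAR(50) NOT NULL);
    INSERT INTO Includes VALUES ('%ing aids%');
    INSERT INTO Excludes VALUES ('%getting%'), ('%contracting%'), ('%preventing%');
""")
conn.executemany(
    "INSERT INTO Phrases VALUES (?)",
    [("hearing aids",), ("getting hearing aids",), ("contracting aids",),
     ("preventing aids",), ("selling aids",), ("band aids",)],
)

QUERY = """
    SELECT P.phrase
    FROM Phrases P
    WHERE EXISTS (SELECT 1 FROM Includes I WHERE P.phrase LIKE I.phrase)
      AND NOT EXISTS (SELECT 1 FROM Excludes E WHERE P.phrase LIKE E.phrase)
    ORDER BY P.rowid
"""

before = [row[0] for row in conn.execute(QUERY)]

# Refine the search by adding a row, without touching the query itself.
conn.execute("INSERT INTO Excludes VALUES ('%selling%')")
after = [row[0] for row in conn.execute(QUERY)]

print(before)
print(after)
```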
Depending on which database server you are using, it may support regex itself. For example, Google searches show that SQL Server, Oracle, and MySQL all support regex.
You could push all your negative criteria into a short-circuiting CASE expression (works in SQL Server; not sure about MS Access).
SELECT *
FROM phrases
WHERE phrase LIKE '%ing aids%'
AND CASE
        WHEN phrase LIKE '%getting%' THEN 2
        WHEN phrase LIKE '%contracting%' THEN 2
        WHEN phrase LIKE '%preventing%' THEN 2
        ELSE 1
    END = 1
On the "more efficient" side, you need to find some criterion that allows you to avoid reading the entire Phrases column. A double-sided wildcard criterion (LIKE '%x%') is bad because it generally cannot use an index to narrow the search. A right-sided wildcard criterion (LIKE 'x%') is good.