Fulltext search in SQL Server - sql

I'm trying to create a simple search page on my site but finding it difficult to get full text search working as I would expect it to word.
Example search:
select *
from Movies
where contains(Name, '"Movie*" and "2*"')
Example data:
Id Name
------------------
1 MyMovie
2 MyMovie Part 2
3 MyMovie Part 3
4 MyMovie X2
5 Other Movie
Searches like "movie*" return no results since the term is in the middle of a work.
Searches like "MyMovie*" and "2*" only return MyMovie Part 2 and not MyMovie Part X2
It seems like I could just hack together a dynamic SQL query that will just
and name like '%movie%' and name like '%x2%' and it would work better than full text search which seems odd since it's a large part of SQL but doesn't seem to work as good as a simple like usage.
I've turned off my stop list so the number and single character results appear but it just doesn't seem to work well for what I'm doing which seems rather basic.

select
*
from Movies
where
name like ('%Movie%')
or name like ('%2%')
;

select * from Movies where freetext(Name, ' "Movie*" or "2*" ')

Related

Basic SQL Script to find a special character, but only when present more than once

I am relearning MS-SQL for a project.
I have a table with a field where the data includes the special character |.
Most times the field does not have it, sometimes once, sometimes 4 times.
I have been able to get it filtered to when present, but I would like to try to show only the times it appears more than once.
This is what I have come up so far:
SELECT UID, OBJ_UID, DESCRIPTION
FROM SPECIFICS
WHERE (NAMED LIKE '%[|]%')
Is there an easy way?
You can replace | with blank and compare length of strings
SELECT
UID, OBJ_UID, DESCRIPTION
FROM
SPECIFICS
WHERE
LEN(NAMED) - LEN(REPLACE(NAMED, '|', '')) > 1
Query returns rows where | appears more than one time

Using Lucene Fuzzy search with a word that has no aliases

I wish do searches using fuzzy search. Using Luke to help me, if I search for a word that has aliases (eg similar words) it all works as expected:
However if I enter a search term that doesn't have any similar words (eg a serial code), the search fails and I get no results, even though it should be valid:
Do I need to structure my search in a different way? Why don't I get the same in the second search as the first, but with only one "term"?
You have not specified Lucene version so I would assume you are using 6.x.x.
The behavior that you are seeing is a correct behavior of Lucene Fuzzy Search.
Refer this and I quote ,
At most, this query will match terms up to 2 edits.
Which roughly but not very accurately means that two texts varying with maximum of two characters at any positions would be a returned as match if using FuzzyQuery.
Below is a sample output from one of my simple Java programs that I illustrate here,
Lets say three Indexed Docs have a field with values like -
"123456787" , "123456788" , "123456789" ( Appended 7 , 8 and 9 to
– 12345678 )
Results :
No Hits Found for search string -> 123456 ( Edit distance = 3 , last
3 digits are missing)
3 Docs found !! for Search String -> 1234567 ( Edit distance = 2 )
3 Docs found !! for Search String -> 12345678 ( Edit distance = 1 )
1 Docs found !! for Search String -> 1236787 ( Edit distance = 2 for
found one, missing 4 , 5 and last digit for remaining two documents)
No Hits Found for search string -> 123678789 ( Edit distance = 4 ,
missing 4 , 5 and last two digits)
So you should read more about Edit Distance.
If your requirement is to match N-Continuous characters without worrying about edit distance , then N-Gram Indexing using NGramTokenizer is the way to go.
See this too for more about N-Gram

SQL Server 2014 equivalent to mysql's find_in_set()

I'm working with a database that has a locations table such as:
locationID | locationHierarchy
1 | 0
2 | 1
3 | 1,2
4 | 1
5 | 1,4
6 | 1,4,5
which makes a tree like this
1
--2
----3
--4
----5
------6
where locationHierarchy is a csv string of the locationIDs of all its ancesters (think of a hierarchy tree). This makes it easy to determine the hierarchy when working toward the top of the tree given a starting locationID.
Now I need to write code to start with an ancestor and recursively find all descendants. MySQL has a function called 'find_in_set' which easily parses a csv string to look for a value. It's nice because I can just say "find in set the value 4" which would give all locations that are descendants of locationID of 4 (including 4 itself).
Unfortunately this is being developed on SQL Server 2014 and it has no such function. The CSV string is a variable length (virtually unlimited levels allowed) and I need a way to find all ancestors of a location.
A lot of what I've found on the internet to mimic the find_in_set function into SQL Server assumes a fixed depth of hierarchy such as 4 levels maximum) which wouldn't work for me.
Does anyone have a stored procedure or anything that I could integrate into a query? I'd really rather not have to pull all records from this table to use code to individually parse the CSV string.
I would imagine searching the locationHierarchy string for locationID% or %,{locationid},% would work but be pretty slow.
I think you want like -- in either database. Something like this:
select l.*
from locations l
where l.locationHierarchy like #LocationHierarchy + ',%';
If you want the original location included, then one method is:
select l.*
from locations l
where l.locationHierarchy + ',' like #LocationHierarchy + ',%';
I should also note that SQL Server has proper support for recursive queries, so it has other options for hierarchies apart from hierarchy trees (which are still a very reasonable solution).
Finally It worked for me..
SELECT * FROM locations WHERE locationHierarchy like CONCAT(#param,',%%') OR
o.unitnumber like CONCAT('%%,',#param,',%%') OR
o.unitnumber like CONCAT('%%,',#param)

sql full text search not searching exact phrase

I have a database of around 10,000 records. Each record have a text of aprx 40 pages each. I need to implement full-text search in my database coz like query is taking lot of time.
I created the index, following these instructions and tried searching using full-text search.Although it increased the speed of showing results. But I am not able to search phrases in my table.
I am using following query for my exact phrase search
select * from ptcsoftcitation
WHERE CONTAINS(Judgment,'"said contention raised by the counsel"');
It's giving all those results which contains all the words but not exact phrase. It's behaving like ' "said" and "contention" and "raised" and "counsel" '
Please help me..
Assume I have 3 record:
bc
abc
bcd
Search exactly:
= 'bc' will return bc
Search closely:
LIKE 'bc' will return bc
Search contains:
LIKE '%bc' will return bc and abc
LIKE 'bc%' will return bc and bcd
LIKE '%bc%' will return bc, abc and bcd
I think your code should:
SELECT * FROM ptcsoftcitation
WHERE Judgment LIKE '%said contention raised by the counsel%'

Postgres text search on multiple rows

I have a table called 'exclude' that contains hashtags:
-------------
id | tag
-------------
1 #oxford
2 #uk
3 #england
-------------
I have another table called 'post':
-----------------------------------------------
id | tags | text
1 #oxford #funtimes Sitting in the sun
2 #oz Beach fun
3 #england Milk was a bad choice
-----------------------------------------------
In order to do a text search on the posts tags I've been running a query like follows:
SELECT * FROM post WHERE to_tsvector(tags) ## plainto_tsquery('mysearchterm')
However, I now want to be able to exclude all posts where some or all of the tags are in my exclude table. Is there any easy way to do this in SQL/Postgres?
I tried converting the tags row into one column, and using this term within the plainto_tsquery function but it doesn't work (I don't know how to do a text search 'not equal' to either, hence the logic is actual wrong, albeit on the right lines in my mind):
select * from post where to_tsvector(tags) ## plainto_tsquery(
select array_to_string(array(select RTRIM(value) from exclude where key = 'tag'), ' ')
)
What version of PostgreSQL are you on? And how flexible is your schema design? In other words, can you change it at will? Or is this out of your control?
Two things immediately popped to mind when I read your questions. One is you should be able to use array and the the #> (contains) or <# (is contains by) operators.
Here is documentation
Second, you might be able to utilize an hstore and do a similar operation.
to:
hstore #> hstore
It's not a true hstore, because you are not using a real key=>value pair. But, I guess you could do {tagname}=True or {tagname}=NULL. Might be a bit hackish.
You can see the documentation (for PostgreSQL 9.1) hstore and how to use it here