RediSearch full-text search in Redis: finding partial text (*text*)

Following the example provided:
# add data to hash set
HSET movies:11002 title "Star Wars: Episode V - The Empire Strikes Back" plot "Luke Skywalker begins Jedi training with Yoda." release_year 1980 genre "Action" rating 8.7 votes 1127635
HSET movies:11003 title "The Godfather" plot "The aging patriarch of an organized crime dynasty transfers control of his empire to his son." release_year 1972 genre "Drama" rating 9.2 votes 1563839
# create index
FT.CREATE idx:movies ON hash PREFIX 1 "movies:" SCHEMA title TEXT SORTABLE release_year NUMERIC SORTABLE rating NUMERIC SORTABLE genre TAG SORTABLE
# search movies
FT.SEARCH idx:movies * SORTBY release_year ASC RETURN 2 title release_year
# search action movies
FT.SEARCH idx:movies "star @genre:{action}" RETURN 2 title release_year
# search movies whose title starts with god
FT.SEARCH idx:movies @title:god* SORTBY release_year ASC RETURN 2 title release_year
# search movies whose title ends with father (DOESN'T WORK)
FT.SEARCH idx:movies @title:*father SORTBY release_year ASC RETURN 2 title release_year
# search movies whose title contains fath (DOESN'T WORK)
FT.SEARCH idx:movies @title:*fath* SORTBY release_year ASC RETURN 2 title release_year
How do I make the contains string work?
Someone mentioned aggregation (FT.AGGREGATE), but I can't get this to return any results:
FT.AGGREGATE idx:movies "*" FILTER "contains('title', 'father')"
redis version: 6.2.7
# Modules
module:name=timeseries,ver=10611,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors]
module:name=graph,ver=20813,api=1,filters=0,usedby=[],using=[ReJSON],options=[]
module:name=search,ver=20408,api=1,filters=0,usedby=[],using=[ReJSON],options=[handle-io-errors]
module:name=ReJSON,ver=20009,api=1,filters=0,usedby=[search|graph],using=[],options=[handle-io-errors]
module:name=bf,ver=20215,api=1,filters=0,usedby=[],using=[],options=[]

There are really two questions here, but I'm happy to answer both:
How do I make the contains string work?
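In the search module version you are running (2.4.8), RediSearch supports prefix queries, where the wildcard comes at the end of the term (god*), but not leading or infix wildcards (*father, *fath*). So this is not possible with FT.SEARCH.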
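Why the position of the wildcard matters can be illustrated with a toy term dictionary in Python. This is only a sketch under an assumption: the terms are kept in a sorted list, whereas RediSearch's own term dictionary is a trie. The effect is similar, though: a trailing wildcard (god*) walks one contiguous run of terms, while a leading or infix wildcard (*fath*) has no ordering to exploit and has to test every term.

```python
import bisect

# Hand-written toy list of lowercased words from the two example titles.
TERMS = sorted({"star", "wars", "episode", "v", "empire",
                "strikes", "back", "godfather"})


def prefix_terms(prefix: str) -> list[str]:
    """Expand god*: all matching terms sit in one contiguous sorted run."""
    start = bisect.bisect_left(TERMS, prefix)
    out = []
    for term in TERMS[start:]:
        if not term.startswith(prefix):
            break  # we have walked past the run of matches, stop early
        out.append(term)
    return out


def infix_terms(fragment: str) -> list[str]:
    """Expand *fath*: with no suffix structure, every term must be checked."""
    return [term for term in TERMS if fragment in term]


print(prefix_terms("god"))  # ['godfather']
print(infix_terms("fath"))  # ['godfather']
```

Both return the same answer on two documents; the difference is that prefix_terms touches only the matching run, while infix_terms scans the whole dictionary, which grows with the corpus.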
Someone mentioned AGGREGATION, but I don't know how to make this return results.
Your call to the contains function is not passing the field name properly. Field names are almost always prefixed with an @. It's an easy mistake to make, and I've made it many times myself. Try this:
FT.AGGREGATE idx:movies "*" FILTER "contains(@title, 'father')"
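To see what the FILTER step does, here is a pure-Python emulation over the two example hashes. It assumes contains(s, p) returns the number of occurrences of p in s and that FILTER keeps a row when the expression is non-zero; it also ignores any case normalization RediSearch may apply to SORTABLE text fields. Note that, unlike an FT.SEARCH term lookup, this is a linear scan over every row the "*" query returns.

```python
# The two example documents, keyed like the HSET commands in the question.
movies = {
    "movies:11002": {"title": "Star Wars: Episode V - The Empire Strikes Back"},
    "movies:11003": {"title": "The Godfather"},
}


def contains(s: str, p: str) -> int:
    """Assumed semantics of contains(): number of occurrences of p in s."""
    return s.count(p)


def aggregate_filter(docs: dict, field: str, needle: str) -> list:
    """Keep the keys whose field makes contains() non-zero, mimicking
    FT.AGGREGATE idx:movies "*" FILTER "contains(@title, 'father')"."""
    return [key for key, doc in docs.items() if contains(doc[field], needle) > 0]


print(aggregate_filter(movies, "title", "father"))  # ['movies:11003']
```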
Hope that helps!

Related

Search in all vertices by one specific field value

I would like to know if it's possible to search all vertices by one specific field value, without naming each vertex explicitly 🤔
If you do not specify a label, it is possible to query all nodes by property.
Say I have two labels Actors(properties: ActorId and Name) and Movies(properties: tconst and primaryTitle) in a database called IMDB and I want to search for either movies or actors named Kevin Bacon.
I can query across both node labels. However, if the property names are different, this makes little sense and will not use the indices.
> GRAPH.QUERY IMDB "MATCH (a{Name: 'Kevin Bacon'}) RETURN a limit 1"
1) 1) 1) "a.ActorId"
2) "a.Name"
3) "a.tconst"
4) "a.primaryTitle"
2) 1) "nm0000102"
2) "Kevin Bacon"
3) "NULL"
4) "NULL"
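A toy Python model of this behaviour, assuming nodes are plain dicts (the Movies node's tconst and title values are hypothetical): an unlabeled match on one property considers every node, but only finds nodes that actually use that property name, so a movie stored under primaryTitle is never found by a Name lookup. Matching both names needs an explicit OR over the two properties.

```python
from typing import Optional

# Toy graph modelled on the IMDB example: one Actors node, one Movies node.
# The Movies node's property values are hypothetical.
nodes = [
    {"labels": {"Actors"}, "props": {"ActorId": "nm0000102", "Name": "Kevin Bacon"}},
    {"labels": {"Movies"}, "props": {"tconst": "tt-hypothetical", "primaryTitle": "Footloose"}},
]


def match(prop: str, value: str, label: Optional[str] = None) -> list:
    """Emulate MATCH (a:Label {prop: value}); label=None makes every node a candidate."""
    candidates = nodes if label is None else [n for n in nodes if label in n["labels"]]
    return [n for n in candidates if n["props"].get(prop) == value]


def match_any(value: str, props: list) -> list:
    """Emulate MATCH (a) WHERE a.p1 = value OR a.p2 = value across all labels."""
    return [n for n in nodes if any(n["props"].get(p) == value for p in props)]


print(len(match("Name", "Kevin Bacon")))                      # 1: the Actors node
print(len(match("Name", "Footloose")))                        # 0: movies use primaryTitle
print(len(match_any("Footloose", ["Name", "primaryTitle"])))  # 1: the Movies node
```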

Does Cypher have functions like GROUP BY?

I am new to Neo4j and wondering if Cypher has functions like GROUP BY in SQL.
Here is my code:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN m.title AS movie, p.name AS actor
Here is my result from above query:
movie actor
"The Matrix" "Emil Eifrem"
"The Matrix" "Carrie-Anne Moss"
"The Matrix" "Keanu Reeves"
"The Matrix Reloaded" "Hugo Weaving"
"The Matrix Reloaded" "Laurence Fishburne"
"The Matrix Revolutions" "Hugo Weaving"
"The Matrix Revolutions" "Laurence Fishburne"
Here is the result I want to have:
movie actor num_of_actors
"The Matrix" "Emil Eifrem" 3
"The Matrix" "Carrie-Anne Moss" 3
"The Matrix" "Keanu Reeves" 3
"The Matrix Reloaded" "Hugo Weaving" 2
"The Matrix Reloaded" "Laurence Fishburne" 2
"The Matrix Revolutions" "Hugo Weaving" 2
"The Matrix Revolutions" "Laurence Fishburne" 2
Basically, I would like to have the number of actors who played in each movie together with the original results.
Thanks in advance
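You'll want to review the aggregation functions, which you can use within a WITH clause to do grouping. Cypher has no explicit GROUP BY: when a WITH or RETURN clause contains an aggregation function, the non-aggregated expressions in that clause act as the grouping keys.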
For example, if you wanted to group the actor names with each movie, you could do this:
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WITH m, collect(p.name) as actors
RETURN m.title AS movie, actors
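The implicit grouping can be emulated in plain Python on the rows from the question: the non-aggregated column (the movie) is the key, and collect() gathers the actor names per key. This is only an illustration of the semantics; Cypher does not guarantee the order of collected values unless you sort the rows first.

```python
from collections import defaultdict

# The (movie, actor) rows returned by the question's first query.
rows = [
    ("The Matrix", "Emil Eifrem"),
    ("The Matrix", "Carrie-Anne Moss"),
    ("The Matrix", "Keanu Reeves"),
    ("The Matrix Reloaded", "Hugo Weaving"),
    ("The Matrix Reloaded", "Laurence Fishburne"),
    ("The Matrix Revolutions", "Hugo Weaving"),
    ("The Matrix Revolutions", "Laurence Fishburne"),
]


def group_collect(pairs):
    """Emulate WITH m, collect(p.name) AS actors: one list of actors per movie."""
    groups = defaultdict(list)
    for movie, actor in pairs:
        groups[movie].append(actor)
    return dict(groups)


for movie, actors in group_collect(rows).items():
    print(movie, actors)
```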
That said, there are some shortcuts we can do here since you're asking about the total number of actors per movie (see our knowledge base article on using degree counts from a node instead of doing expansions).
If you want to keep a separate row per actor but also have the number of actors, you can use the degree of incoming :ACTED_IN relationships on each :Movie node as the count, since we know an :ACTED_IN relationship will never connect the same actor to a movie more than once. For best performance, get the degree before you expand out to the actors:
MATCH (m:Movie)
WITH m, m.title as title, size((m)<-[:ACTED_IN]-()) as num_of_actors
MATCH (p:Person)-[:ACTED_IN]->(m)
RETURN title, p.name as actor, num_of_actors
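In Python terms, the query above computes one count per movie and attaches it to every actor row, which produces exactly the table the question asks for. This sketch assumes each (movie, actor) pair appears once, mirroring the point about :ACTED_IN; here the count is derived from the rows themselves, whereas the Cypher version reads the node's relationship degree without expanding.

```python
from collections import Counter

# The (movie, actor) rows from the question.
rows = [
    ("The Matrix", "Emil Eifrem"),
    ("The Matrix", "Carrie-Anne Moss"),
    ("The Matrix", "Keanu Reeves"),
    ("The Matrix Reloaded", "Hugo Weaving"),
    ("The Matrix Reloaded", "Laurence Fishburne"),
    ("The Matrix Revolutions", "Hugo Weaving"),
    ("The Matrix Revolutions", "Laurence Fishburne"),
]


def with_actor_counts(pairs):
    """Count each movie once (stand-in for size((m)<-[:ACTED_IN]-())),
    then attach that count to every (movie, actor) row."""
    degree = Counter(movie for movie, _ in pairs)
    return [(movie, actor, degree[movie]) for movie, actor in pairs]


for row in with_actor_counts(rows):
    print(*row)
```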

How to get position of regexp match in string in PostgreSQL?

I have a table with book titles, and I want to select the books whose titles match a regexp and order the results by the position of the match in the title.
This is easy for single-word searches, e.g.:
TABLE book
id title
1 The Sun
2 The Dead Sun
3 Sun Kissed
I'm going to put .* between the words of the client's search term before sending the query to the DB, so I'll write the SQL with the prepared regexps here.
SELECT book.id, book.title FROM book
WHERE book.title ~* '.*sun.*'
ORDER BY COALESCE(NULLIF(position('sun' in lower(book.title)), 0), 999999) ASC;
RESULT
id title
3 Sun Kissed
1 The Sun
2 The Dead Sun
But if the search term has more than one word, I want to match titles that contain all of the words, with anything between them, and sort by position as before. For that I need a function that returns the position of a regexp match, and I didn't find an appropriate one in the official PostgreSQL docs.
TABLE book
id title
4 Deep Space Endeavor
5 Star Trek: Deep Space Nine: The Never Ending Sacrifice
6 Deep Black: Space Espionage and National Security
SELECT book.id, book.title FROM book
WHERE book.title ~* '.*deep.*space.*'
ORDER BY ???REGEXP_POSITION_FUNCTION???('.*deep.*space.*' in book.title);
DESIRED RESULT
id title
4 Deep Space Endeavor
6 Deep Black: Space Espionage and National Security
5 Star Trek: Deep Space Nine: The Never Ending Sacrifice
I didn't find any function similar to ???REGEXP_POSITION_FUNCTION???, do you have any ideas?
One way (of many) to do this: Remove the rest of the string beginning at the match and measure the length of the truncated string:
SELECT id, title
FROM book
WHERE title ILIKE '%deep%space%'
ORDER BY length(regexp_replace(title, 'deep.*space.*', '','i'));
I use ILIKE in the WHERE clause, since it is typically faster (and does the same thing here).
Also note the fourth parameter to the regexp_replace() function ('i'), to make it case insensitive.
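The same idea in Python, as a way to check the logic against the desired result: delete everything from the first case-insensitive match onward and use the remaining length as the sort key. Python's re module stands in for PostgreSQL's regex engine here; the two flavors agree on this simple pattern but are not identical in general.

```python
import re

# The rows from the question's second example.
titles = {
    4: "Deep Space Endeavor",
    5: "Star Trek: Deep Space Nine: The Never Ending Sacrifice",
    6: "Deep Black: Space Espionage and National Security",
}

PATTERN = re.compile(r"deep.*space.*", re.IGNORECASE)


def match_position(title: str) -> int:
    """0-based start of the match: the length left after deleting the first
    match onward (mirrors length(regexp_replace(title, 'deep.*space.*', '', 'i')))."""
    return len(PATTERN.sub("", title, count=1))


matching = [i for i, t in titles.items() if PATTERN.search(t)]
ordered = sorted(matching, key=lambda i: match_position(titles[i]))
print(ordered)  # [4, 6, 5]
```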
Alternatives
As requested in the comments.
This also demonstrates how to sort matches first (with NULLS LAST).
SELECT id, title
,substring(title FROM '(?i)(^.*)deep.*space.*') AS sub1
,length(substring(title FROM '(?i)(^.*)deep.*space.*')) AS pos1
,substring(title FROM '(?i)^.*(?=deep.*space.*)') AS sub2
,length(substring(title FROM '(?i)^.*(?=deep.*space.*)')) AS pos2
,substring(title FROM '(?i)^.*(deep.*space.*)') AS sub3
,position((substring(title FROM '(?i)^.*(deep.*space.*)')) IN title) AS p3
,regexp_replace(title, 'deep.*space.*', '','i') AS reg4
,length(regexp_replace(title, 'deep.*space.*', '','i')) AS pos4
FROM book
ORDER BY title ILIKE '%deep%space%' DESC NULLS LAST
,length(regexp_replace(title, 'deep.*space.*', '','i'));
You can find documentation for all of the above in the manual here and here.
-> SQLfiddle demonstrating all.
Another way to do this would be to first get the literal match for the pattern, then find the position of the literal match:
strpos(input, (regexp_match(input, pattern, 'i'))[1]);
Or in this case:
SELECT id, title
FROM book
ORDER BY strpos(book.title, (regexp_match(book.title, 'deep.*space.*', 'i'))[1]);
However, there are a few caveats:
This is not very efficient, as it scans the input string twice.
It ignores lookaround (lookbehind, lookahead) constraints, because the literal match can also appear earlier in the string than the pattern match.
For example, for the input 'aba' and the pattern '(?<=b)a', strpos returns 1 (the first 'a'), although the actual position is 3 (the second 'a').
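A small Python sketch reproduces the lookaround caveat. It mimics the strpos/regexp_match combination for patterns without capture groups; Python's re stands in for PostgreSQL's engine, and None stands in for SQL NULL when nothing matches.

```python
import re
from typing import Optional


def strpos_of_literal_match(text: str, pattern: str) -> Optional[int]:
    """Mimic strpos(text, (regexp_match(text, pattern, 'i'))[1]): take the
    matched text, then look it up as a plain substring (1-based)."""
    m = re.search(pattern, text, re.IGNORECASE)
    if m is None:
        return None
    return text.find(m.group(0)) + 1


def true_match_position(text: str, pattern: str) -> Optional[int]:
    """1-based start of the regex match itself."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.start() + 1 if m else None


print(strpos_of_literal_match("aba", r"(?<=b)a"))  # 1: the first literal 'a'
print(true_match_position("aba", r"(?<=b)a"))      # 3: the 'a' after 'b'
```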
BTW, you should probably use a non-greedy (lazy) quantifier and narrow your character class as much as you can instead of .* to improve performance (e.g. 'deep [\w\s]*? space').
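The difference between the two quantifiers is easiest to see on a string with two candidate matches. This Python sketch shows only what each one matches, not performance, and uses Python's re as a stand-in; PostgreSQL's engine has its own rules when greedy and non-greedy quantifiers are mixed in one pattern.

```python
import re

# Two possible "deep ... space" spans in one string.
text = "deep blue space and deep black space"

greedy = re.search(r"deep [\w\s]* space", text)  # [\w\s]* takes as much as it can
lazy = re.search(r"deep [\w\s]*? space", text)   # [\w\s]*? takes as little as it can

print(greedy.group(0))  # deep blue space and deep black space
print(lazy.group(0))    # deep blue space
```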

SQL :: How to find records with the most tags in common with a search string (Tags field)

We have tbl_Articles:
id title tags
=================================
1 article1 science;
2 article2 art;
3 article3 sports;art;
I am looking for a query that returns the records from tbl_Articles that have the most tags in common with a given tags string (e.g. politics;art;):
EX: Select (something from tbl_articles) where Tags has common in "politics;art;"
Result:
tbl_Articles
id title tags
=================================
2 article2 art;
3 article3 sports;art;
Are you looking for this?
select a.*
from tbl_Articles a
where ';'+tags+';' like '%;politics;%' or
      ';'+tags+';' like '%;art;%'
Notice that I add the separator at the beginning and end of tags, so that a search for "art" does not also match a tag such as "smart".
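A Python sketch of the same delimiter trick, extended under an assumption about what "most common" means: it counts how many search tags each row shares and sorts by that count. It uses the rows from the question and treats the search string as ;-separated tags.

```python
# Rows from tbl_Articles in the question: (id, title, tags).
articles = [
    (1, "article1", "science;"),
    (2, "article2", "art;"),
    (3, "article3", "sports;art;"),
]


def shared_tag_count(tags: str, search: str) -> int:
    """Count search tags found in tags, wrapping tags in ';' like ';'+tags+';'
    so that 'art' only matches a whole tag, never the inside of 'smart'."""
    wrapped = ";" + tags + ";"
    wanted = [t for t in search.split(";") if t]  # 'politics;art;' -> ['politics', 'art']
    return sum(1 for t in wanted if ";" + t + ";" in wrapped)


def rank(rows, search: str) -> list:
    """Rows sharing at least one tag with search, most shared tags first.
    sorted() is stable, so ties keep their original (id) order."""
    scored = [(shared_tag_count(row[2], search), row) for row in rows]
    hits = [(n, row) for n, row in scored if n > 0]
    return [row for n, row in sorted(hits, key=lambda pair: pair[0], reverse=True)]


print([row[0] for row in rank(articles, "politics;art;")])  # [2, 3]
```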
You can accomplish this task using the LIKE predicate, but note that it only matches character patterns.
I think you are looking for a full-text query. FREETEXT returns not only exact matches but also words with similar meanings.
For more details, look up the following:
FREETEXT
FREETEXTTABLE
CONTAINS
CONTAINSTABLE
The performance benefit of full-text search is best realized when querying large amounts of unstructured text data. A LIKE query (for example, '%microsoft%') against millions of rows of text data can take minutes to return, whereas a full-text query (for 'microsoft') can take only seconds or less against the same data, depending on the number of rows returned.

How can I use Lucene for personal name (first name, last name) search?

I'm writing a search feature for a database of NFL players.
The user enters a search string like "Jason Campbell" or "Campbell" or "Jason".
I'm having trouble getting the appropriate results.
Which Analyzer should I use when indexing? Which Query when querying? Should I distinguish between first name and last name or just index the full name string?
I'd like the following behavior:
Query: "Jason Campbell" -> Result: exact match for 1 player, Jason Campbell
Query: "Campbell" -> Result: all players with Campbell in their name
Query: "Jason" -> Result: all players with Jason in their name
Query: "Cambel" [misspelled] -> Result: all players with Campbell in their name
StandardAnalyzer should work fine for all of the above queries. Your first query should be enclosed in double quotes (a phrase query) for an exact match, and your last query requires a fuzzy query. For example, Cambel~0.5 can match Campbell, with the number after the tilde controlling the fuzziness (in Lucene 4 and later, the classic query parser treats an integer there as the maximum edit distance, from 0 to 2, e.g. Cambel~2).
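The four behaviours from the question can be modelled in a few lines of Python to make the query types concrete. This is a toy, not Lucene: analyze() only approximates StandardAnalyzer (word tokens, lowercased), the player list is sample data, and the fuzzy check uses plain Levenshtein distance with at most 2 edits (Lucene's fuzzy query by default also counts a transposition as one edit).

```python
import re

# Sample names used only to exercise the queries.
PLAYERS = ["Jason Campbell", "Jason Witten", "Calais Campbell"]


def analyze(text: str) -> list:
    """Rough stand-in for StandardAnalyzer: split into word tokens, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def term_query(term: str) -> list:
    """Players whose name contains the analyzed term as a token ("Campbell")."""
    t = analyze(term)[0]
    return [p for p in PLAYERS if t in analyze(p)]


def phrase_query(phrase: str) -> list:
    """Players whose tokens contain the phrase's tokens consecutively."""
    terms = analyze(phrase)
    n = len(terms)

    def has_phrase(tokens: list) -> bool:
        return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

    return [p for p in PLAYERS if has_phrase(analyze(p))]


def fuzzy_query(term: str, max_edits: int = 2) -> list:
    """Players with a token within max_edits of the term (like Cambel~2)."""
    t = analyze(term)[0]
    return [p for p in PLAYERS
            if any(edit_distance(t, tok) <= max_edits for tok in analyze(p))]


print(phrase_query('"Jason Campbell"'))  # ['Jason Campbell']
print(term_query("Campbell"))            # ['Jason Campbell', 'Calais Campbell']
print(term_query("Jason"))               # ['Jason Campbell', 'Jason Witten']
print(fuzzy_query("Cambel"))             # ['Jason Campbell', 'Calais Campbell']
```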
BTW, I would suggest using Solr, which provides spell-check and auto-suggest features, so you wouldn't have to reinvent the wheel. This is similar to Google's "did you mean...".