What does the Liferay documentation mean by "without using the indexer"?

In the Liferay documentation, many *LocalServiceUtil classes have search methods with the following documentation:
Returns an ordered range of all the [...] matching the parameters without using the indexer, including keyword parameters for [...].
What does the "without using the indexer" part of the sentence mean?
In particular, does it mean that it does not use any database indexes? Does it mean that for instance JournalArticleLocalServiceUtil.search can be expected to run much slower than the equivalent JournalArticleLocalServiceUtil.getArticles? Or is it a different meaning?
Or does this indexer refer to the indexes in the result set in the same method's documentation, maybe?

The indexer refers to search engine indexers, such as those backed by Lucene, Solr, Elasticsearch, or similar implementations.
The search and getArticles operations both query the database. If you do a keyword search, your database might not use a (DB) index, because content and title are not part of an index by default. Therefore, when there is a large number of articles, a keyword query against the search engine might give a better response time.
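To illustrate the difference described above, here is a minimal sketch in plain Java. It is not Liferay code, and every class and method name in it is hypothetical. It contrasts a keyword lookup that inspects every row (roughly what a database does for a LIKE '%keyword%' on a column without a usable index) with a lookup through an inverted index of the kind a search engine maintains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from Liferay.
final class KeywordLookupSketch {

    // Database-style lookup without a usable index: every title is inspected.
    static List<Integer> scan(List<String> titles, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < titles.size(); id++) {
            if (titles.get(id).toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Search-engine-style preparation: map each word to the ids of the titles containing it.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> titles) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int id = 0; id < titles.size(); id++) {
            for (String word : titles.get(id).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) {
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    // Once the index exists, a whole-word lookup is a single map access.
    static List<Integer> lookup(Map<String, SortedSet<Integer>> index, String keyword) {
        SortedSet<Integer> ids = index.get(keyword.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Liferay search basics", "Database tuning", "Search engine indexing");
        System.out.println(scan(titles, "search"));               // [0, 2]
        System.out.println(lookup(buildIndex(titles), "search")); // [0, 2]
    }
}
```

Note that the two lookups are not equivalent in general: the scan matches substrings, while this simple index matches whole words only. The cost of the scan grows with the number of titles, whereas the index pays that cost once, when it is built.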

Related

Neo4j index for full text search

I am working with Neo4j database version 2.0. I have the following requirements:
Case 1. I want to fetch all records where name contains some string. For example, if I am searching for Neo4j, then all records having the name Neo4j Data, Neo4j Database, Neo4jDatabase, etc. should be returned.
Case 2. I want to fire a field-less query: if any of a set of properties has a matching value, then those records should be returned. This may also be at a global level instead of label level.
Case sensitivity is also a point.
I have read multiple things about LIKE, indexes, full text search, legacy indexes, etc., so what will be the best fit for my case? Or do I have to use Elasticsearch or similar?
I am using spring-data-neo4j in my application, so please provide some configuration for SDN.
Annotate your name with the @Indexed annotation:
@Indexed(indexName = "whateverIndexName", indexType = IndexType.FULLTEXT)
private String name;
Then query for it the following way (example for a method in an SDN repository; you can use something similar anywhere else you use Cypher):
@Query("START n=node:whateverIndexName({query}) return n")
Set<Topic> findByName(@Param("query") String query);
Neo4j uses lucene as backend for indexing so the query value must be a valid lucene query, e.g. "name:neo4j" or "name:neo4j*".
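Since the query value has to be a valid Lucene query, user input usually needs some preparation before it is passed in. The following is a rough sketch of a hypothetical helper (not part of SDN or Neo4j) that turns a single search word into a prefix query such as name:neo4j*. It assumes the fulltext index lower-cases its terms and that the input is one word without spaces. The escaped characters are the ones Lucene's classic query syntax treats as special.

```java
import java.util.Locale;

// Hypothetical helper; not part of Spring Data Neo4j, Neo4j, or Lucene.
final class LuceneQuerySketch {

    // Characters with a special meaning in the classic Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Builds "field:word*"; the word is lower-cased on the assumption
    // that the index stores lower-cased terms.
    static String prefixQuery(String field, String word) {
        return field + ":" + escape(word.toLowerCase(Locale.ROOT)) + "*";
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("name", "Neo4j")); // name:neo4j*
    }
}
```

The resulting string would be passed as the query parameter of the repository method. A prefix query like this would find Neo4j Data and Neo4jDatabase, but it does not find names where the word appears only in the middle of a longer token.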
There is an article that explains the confusion around various Neo4j indexes http://nigelsmall.com/neo4j/index-confusion.
I don't think you need to use Elasticsearch; you can use the legacy indexes or the Lucene indexes to do full text searches.
Check out Michael Hunger's blog: jexp.de/blog
This post specifically: http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/

RESTful API Design OR Predicates

I'm designing a RESTful API and I'm trying to work out how I could represent a predicate with an OR operator when querying for a resource.
For example, if I had a resource Foo with a property Name, how would you search for all Foo resources with a name matching "Name1" OR "Name2"?
This is straightforward when it's an AND operator, as I could do the following:
http://www.website.com/Foo?Name=Name1&Age=19
The other approach I've seen is to post the search in the body.
You will need to pick your own approach, but I can name a few that seem pretty logical (although not without disadvantages):
Option 1.: Using | operator:
http://www.website.com/Foo?Name=Name1|Name2
Option 2.: Using modified query param to allow selection by one of the values from the set (list of possible comma-separated values):
http://www.website.com/Foo?Name_in=Name1,Name2
Option 3.: Using PHP-like notation to provide list instead of single string:
http://www.website.com/Foo?Name[]=Name1&Name[]=Name2
All of the above mentioned options have one huge advantage: they do not interfere with other query params.
But as I mentioned, pick your own approach and be consistent about it across your API.
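As an illustration of how Option 2 might be handled on the server, here is a small sketch in plain Java. The class, the method names, and the in-memory list of names are all hypothetical, no web framework is involved, and the query string is assumed to be URL-decoded already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical server-side handling of Option 2; not tied to any framework.
final class NameInFilterSketch {

    // Collects the comma-separated values of one parameter, e.g. Name_in=Name1,Name2.
    // Assumes the query string has already been URL-decoded.
    static Set<String> valuesOf(String query, String param) {
        Set<String> values = new LinkedHashSet<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(param)) {
                for (String value : pair.substring(eq + 1).split(",")) {
                    if (!value.isEmpty()) {
                        values.add(value);
                    }
                }
            }
        }
        return values;
    }

    // OR semantics: a name is kept if it equals any of the allowed values.
    static List<String> filterByName(List<String> names, Set<String> allowed) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (allowed.contains(name)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> allowed = valuesOf("Name_in=Name1,Name2&Age=19", "Name_in");
        System.out.println(filterByName(Arrays.asList("Name1", "Name3", "Name2"), allowed));
        // prints [Name1, Name2]
    }
}
```

Because the OR is confined to the values of a single parameter, the other parameters (Age in this example) keep their usual AND meaning.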
Well, one quick way to fix that is to add an additional parameter that identifies the relationship between your parameters, i.e. whether they are combined with AND or with OR. For example:
http://www.website.com/Foo?Name=Name1&Age=19&or=true
Or, for much more complex queries, keep a single parameter and put your whole query in it, using your own little query language. On the server side you would then parse the whole string and extract the information and the statement.

Gremlin + Neo4j Lucene search

Does this Gremlin script (executed via the REST API of Neo4j) execute the sorting on the Lucene index? Or are the nodes sorted in Neo4j?
g.idx('myIndex').get('name', 'aaa').sort{it.name}
Two additional questions:
1. How do I set the ordering (ASC/DESC)?
2. How do I perform a fulltext search (LIKE)? I already tried * and %, but nothing worked.
sort is a Groovy method. To reverse the order, use reverse:
g.idx('myIndex').get('name', 'aaa').sort{it.name}.reverse()
See:
http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html
http://groovy.codehaus.org/groovy-jdk/java/util/List.html
Besides doing what espeed suggested, which is using Gremlin's facilities to sort etc, you may also be interested in passing additional instructions down to Lucene itself. This can be done by prefixing the second argument into get with a magic string %query%. Like so:
... .get(null, "%query% _start_node_id_:15815486")
The key argument can be null if you don't need to use it.

SQL with Regular Expressions vs Indexes with Logical Merging Functions

I am trying to develop a complex textual search engine.
I have thousands of textual pages from many books.
I need to search for pages that match specified complex logical criteria.
These criteria can contain virtually any combination of the following:
A: Full words.
B: Word roots (similar to stems; i.e. all words with certain key letters).
C: Word templates (in some languages roots are filled into certain templates to form various parts of speech such as adjectives, past/present verbs...).
D: Logical connectives: AND/OR/XOR/NOT/IF/IFF and parentheses to state priorities.
Now, would it be faster to have the pages' full text in the database (not indexed) and search through them all using SQL and regular expressions?
Or would it be better to construct indexes of word/root/template-page-location tuples?
Hence, we can boost searching for individual words/roots/templates.
However, it gets tricky as we introduce logical connectives into our queries.
I thought of doing the following steps in such cases:
1: Separately search for each individual word/root/template in the specified query.
2: On a priority basis, merge two result lists (from step 1) at a time, depending on the logical connective.
For example, if we are searching for "he AND (is OR was)":
1: We shall search for "he", "is" and "was" separately and get result lists for each word.
2: Merge the result lists of "is" and "was" using the merging function OR-MERGE.
3: Merge the merged result list from the OR-MERGE function with the one of "he" using the merging function AND-MERGE.
The result of step 3 is then returned as the result of the specified query.
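The merging steps described above can be sketched as follows. This is only an illustration under assumptions that the question does not state: each result list is taken to be a sorted, duplicate-free list of page ids, and the names orMerge and andMerge, as well as the sample ids, are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; assumes each result list is sorted and free of duplicates.
final class PostingListMergeSketch {

    // OR-MERGE: union of two sorted lists, produced in one linear pass.
    static List<Integer> orMerge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) { out.add(x); i++; j++; }
            else if (x < y) { out.add(x); i++; }
            else { out.add(y); j++; }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // AND-MERGE: intersection of two sorted lists, also in one linear pass.
    static List<Integer> andMerge(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i), y = b.get(j);
            if (x == y) { out.add(x); i++; j++; }
            else if (x < y) { i++; }
            else { j++; }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up page ids for the query: he AND (is OR was)
        List<Integer> he = Arrays.asList(1, 2, 4, 7);
        List<Integer> is = Arrays.asList(2, 3);
        List<Integer> was = Arrays.asList(4, 5, 9);
        List<Integer> isOrWas = orMerge(is, was);  // [2, 3, 4, 5, 9]
        System.out.println(andMerge(he, isOrWas)); // [2, 4]
    }
}
```

Both merges run in time proportional to the combined length of their inputs, which is why keeping the per-word result lists sorted by page id matters. NOT and XOR could be written in the same style, given the full set of page ids.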
What do you think, gurus? Which is faster? Any better ideas?
Thank you all in advance.
There are plenty of off-the-shelf solutions to this kind of problem. I would strongly recommend you use one of those instead of developing your own.
You don't say what database solution you're using. If it's Microsoft SQL Server, you could use its Full Text Search features. If it's MySQL, take a look at its Full-Text Search Functions. I'm sure Oracle, DB2 and any other major DBMS will have similar functionality.
Alternatively, take a look at Apache's Lucene for Java or Lucene for .NET. This will allow you to index documents without needing to use a DBMS.

Case-insensitive search using Hibernate

I'm using Hibernate for ORM of my Java app to an Oracle database (not that the database vendor matters, we may switch to another database one day), and I want to retrieve objects from the database according to user-provided strings. For example, when searching for people, if the user is looking for people who live in 'fran', I want to be able to give her people in San Francisco.
SQL is not my strong suit, and I prefer Hibernate's Criteria building code to hard-coded strings as it is. Can anyone point me in the right direction about how to do this in code, and if impossible, how the hard-coded SQL should look like?
Thanks,
Yuval =8-)
For the simple case you describe, look at Restrictions.ilike(), which does a case-insensitive search.
Criteria crit = session.createCriteria(Person.class);
crit.add(Restrictions.ilike("town", "%fran%"));
List results = crit.list();
Criteria crit = session.createCriteria(Person.class);
crit.add(Restrictions.ilike("town", "fran", MatchMode.ANYWHERE));
List results = crit.list();
If you use Spring's HibernateTemplate to interact with Hibernate, here is how you would do a case insensitive search on a user's email address:
getHibernateTemplate().find("from User where upper(email)=?", emailAddr.toUpperCase());
You also do not have to put in the '%' wildcards. You can pass a MatchMode to tell the search how to behave. The options are START, ANYWHERE, EXACT, and END.
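To make the four options concrete, here is a small stand-alone sketch of how each mode corresponds to a LIKE pattern. The enum and method below are hypothetical stand-ins written for illustration; they are not Hibernate's MatchMode class, which does this conversion for you.

```java
// Hypothetical stand-in for illustration; Hibernate's own MatchMode does this internally.
final class LikePatternSketch {

    enum Mode { EXACT, START, END, ANYWHERE }

    // Wraps the search term in '%' wildcards according to the chosen mode.
    static String toPattern(String term, Mode mode) {
        switch (mode) {
            case START:    return term + "%";       // value begins with the term
            case END:      return "%" + term;       // value ends with the term
            case ANYWHERE: return "%" + term + "%"; // term occurs anywhere in the value
            default:       return term;             // EXACT: no wildcards added
        }
    }

    public static void main(String[] args) {
        System.out.println(toPattern("fran", Mode.ANYWHERE)); // %fran%
    }
}
```

With ANYWHERE, the term fran yields the same %fran% pattern as the hand-written wildcard version.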
The usual approach to ignoring case is to convert both the database values and the input value to upper or lower case. The resulting SQL would look something like:
select f.name from f where UPPER(f.name) like '%FRAN%'
In Hibernate criteria: Restrictions.like(...).ignoreCase()
I'm more familiar with NHibernate, so the syntax might not be 100% accurate.
For some more info, see the Pro Hibernate 3 extract and the Hibernate docs, section 15.2, "Narrowing the result set".
This can also be done using the criterion Example, in the org.hibernate.criterion package.
public List findLike(Object entity, MatchMode matchMode) {
    Example example = Example.create(entity);
    example.enableLike(matchMode);
    example.ignoreCase();
    return getSession().createCriteria(entity.getClass()).add(example).list();
}
Just another way that I find useful to accomplish the above.
Since Hibernate 5.2, session.createCriteria is deprecated. Below is a solution using the JPA 2 CriteriaBuilder. It uses like and upper:
CriteriaBuilder builder = session.getCriteriaBuilder();
CriteriaQuery<Person> criteria = builder.createQuery(Person.class);
Root<Person> root = criteria.from(Person.class);
Expression<String> upper = builder.upper(root.get("town"));
criteria.where(builder.like(upper, "%FRAN%"));
session.createQuery(criteria.select(root)).getResultList();
Most default database collations are not case-sensitive, but in the SQL Server world it can be set at the instance, the database, and the column level.
You could look at using Compass, a wrapper on top of Lucene.
http://www.compass-project.org/
By adding a few annotations to your domain objects, you can achieve this kind of thing.
Compass provides a simple API for working with Lucene. If you know how to use an ORM, then you will feel right at home with Compass with simple operations for save, and delete & query.
From the site itself.
"Building on top of Lucene, Compass simplifies common usage patterns of Lucene such as google-style search, index updates as well as more advanced concepts such as caching and index sharding (sub indexes). Compass also uses built in optimizations for concurrent commits and merges."
I have used this in the past and I find it great.