How to get elasticsearch to perform similar to SQL 'LIKE' - sql

When you query data with a SQL 'LIKE' statement, it returns data even if it's only partially matched. For instance, if I'm searching for food and there's an item in my db called "raisins", the SQL query would return "raisins" even if my search only contained "rai". In elasticsearch, the query won't return a record unless the entire name (in this case "raisins") is specified. How can I get elasticsearch to behave like the SQL statement? I'm using Rails 3.1.1 and PostgreSQL. Thanks!

When you create the index for your model in elasticsearch, define a tokenizer on the index that fulfils your requirement. For example:
tokenizer: {
:name_tokenizer => {type: "edgeNGram", max_gram: 100, min_gram: 3, side: "front"}
}
This creates tokens of length 3 to 100 from your fields and, because side is set to front, the tokens are taken from the start of the text. You can get more details here: http://www.slideshare.net/clintongormley/terms-of-endearment-the-elasticsearch-query-dsl-explained
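To make the tokenization concrete, here is a rough sketch in Python of what a front edge n-gram tokenizer with min_gram 3 and max_gram 100 would emit for a single term. The edge_ngrams function is a hypothetical helper written only for illustration; it is not part of elasticsearch, where the real tokenizer runs inside the index.

```python
def edge_ngrams(term, min_gram=3, max_gram=100):
    """Return the prefixes of `term` whose length lies between min_gram and max_gram."""
    # A term shorter than max_gram simply stops at its own length.
    longest = min(len(term), max_gram)
    return [term[:n] for n in range(min_gram, longest + 1)]


tokens = edge_ngrams("raisins")
print(tokens)  # ['rai', 'rais', 'raisi', 'raisin', 'raisins']
```

Since "rai" is among the indexed tokens, a search for "rai" can match the "raisins" record, which is the LIKE 'rai%' behaviour the question asks for. A term shorter than min_gram produces no tokens at all.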

Related

How to keep SQL data and Elasticsearch in-sync, and which to search from?

I've seen two solutions mentioned, and was wondering what most people do.
Use logstash
Code your application to make writes to Elasticsearch alongside SQL. For example,
public void saveRecord() {
    saveToElasticsearch();
    saveToSQL();
}
Another question is how to handle actually searching the entity? Do you ONLY use Elasticsearch?
If not, I would assume you fetch from Elasticsearch based on keywords and use the IDs returned to filter your SQL query. My question then, is how do you handle pagination? For example let's say you only want results 50 to 100. First you query Elasticsearch which returns 50-100. Then the SQL query reduces that to 20 results - the other 30 results are in what would've been the next Elasticsearch query (100 - 150 for example). Do you keep going back and forth?
As for your first question check here
As for the second question, if you plan to use elasticsearch as your search layer, it's better to use it for all the searchable/filterable fields. As you've described, the alternative gets messy very quickly. Use elasticsearch for all your searches and filters, and even aggregations if it suits your needs. Use the SQL database as your source of truth and just get the full payload from there.
In general, if you need to paginate, your search should live in one place; otherwise it gets ugly.
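The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: search_page stands in for an elasticsearch query that does all filtering, ordering and pagination, load_payloads stands in for a SQL lookup by primary key, and the data is made up.

```python
def search_page(index, keyword, offset, limit):
    """Stand-in for the search layer: filter, order and paginate in one place."""
    matching_ids = sorted(doc_id for doc_id, text in index.items() if keyword in text)
    return matching_ids[offset:offset + limit]


def load_payloads(database, ids):
    """Stand-in for the SQL lookup: fetch full records by id, keeping the search order."""
    return [database[doc_id] for doc_id in ids if doc_id in database]


# Hypothetical data: the database holds full records, the index only searchable text.
database = {
    1: {"id": 1, "name": "raisins", "price": 3},
    2: {"id": 2, "name": "rice", "price": 2},
    3: {"id": 3, "name": "raisin bread", "price": 4},
    4: {"id": 4, "name": "brown rice", "price": 5},
}
index = {doc_id: record["name"] for doc_id, record in database.items()}

# Page 2 with a page size of 1: the page boundaries are decided by the search layer only.
second_page = load_payloads(database, search_page(index, "rai", offset=1, limit=1))
print(second_page)
```

Because the database is never asked to filter, a page of N ids from the search layer yields a page of N records, and there is no going back and forth between the two stores.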

How to chain SQL, Text and scan queries in Apache Ignite

We have a clustered Ignite cache in which we plan to store a huge amount of data (in excess of 100 million records). We are currently using SQL queries to search for records using indices. But we have a requirement for some free-text searches, and we were planning to evaluate how Text Queries can work. The free-text search will be used in conjunction with some SQL constraints so that the result data set is not huge. I was hoping to find a way to use the Text Search, and maybe a scan search, on the result of a SQL search (which I think could give a lot more flexibility and power to the query framework of Ignite). Is there a way to achieve this? We use native persistence and a replicated cache in our system.
All query kinds - Scan, SQL and Text - are independent from each other. You can't use SQL on top of Text query result directly.
You can try to execute local Text queries on all nodes, and then filter the results manually (not using SQL, just Java code). E.g.
Collection<List<Cache.Entry<Object, Object>>> results = ignite.compute().broadcast(() -> {
    IgniteCache<Object, Object> cache = Ignition.localIgnite().<Object, Object>cache("foo");
    TextQuery<Object, Object> qry = new TextQuery<Object, Object>(Value.class, "str").setLocal(true);
    try (QueryCursor<Cache.Entry<Object, Object>> cursor = cache.query(qry)) {
        return StreamSupport.stream(cursor.spliterator(), false)
            .filter(e -> needToReturnEntry(e))
            .collect(Collectors.toList());
    }
});
List<Cache.Entry<Object, Object>> combinedResults = results.stream()
    .flatMap(Collection::stream)
    .collect(Collectors.toList());
needToReturnEntry(e) here needs to be implemented to do the same filtering as SQL constraints would.
Another way is to retrieve a list of primary keys from the Text query, and then add that to the SQL query. This will work if the number of keys isn't too big.
select * from TABLE where pKey in (<keys from Text Query>) and <other constraints>
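As a rough illustration of that second approach, here is a sketch that builds the IN clause with bind parameters rather than pasting the keys into the SQL string. It uses Python's sqlite3 module purely as a stand-in for Ignite SQL; the product table, its columns, the min_price constraint and the query_by_keys helper are all made up for the example.

```python
import sqlite3


def query_by_keys(conn, keys, min_price):
    """Run a SQL query restricted to the given primary keys plus one extra constraint."""
    if not keys:
        return []  # an empty IN () list is not valid SQL, so short-circuit
    placeholders = ", ".join("?" for _ in keys)
    sql = (
        "SELECT id, name, price FROM product "
        f"WHERE id IN ({placeholders}) AND price >= ? ORDER BY id"
    )
    return conn.execute(sql, [*keys, min_price]).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "pirate hat", 5), (2, "pirate ship", 50), (3, "teapot", 20)],
)

keys_from_text_query = [1, 2]  # pretend these came back from the Text query
rows = query_by_keys(conn, keys_from_text_query, min_price=10)
print(rows)  # [(2, 'pirate ship', 50)]
```

The number of bind parameters grows with the number of keys, which is one reason this only suits a reasonably small key set.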

Query fast without search, slow with search, but with search fast in SSMS

I have this function that takes data from the database and also supports search. The problem is that when I search with Entity Framework it's slow, but if I take the same query from the log and run it in SSMS it's fast. I must also say that there are a lot of movies: 388262. I also tried adding an index on title in Movie, but it didn't help.
Query I use in SSMS:
SELECT *
FROM Movie
WHERE title LIKE '%pirate%'
ORDER BY @@ROWCOUNT
OFFSET 0 ROWS FETCH NEXT 30 ROWS ONLY
Entity Framework code (_movieRepository.GetAll() returns an IQueryable, not all movies):
public IActionResult Index(MovieIndexViewModel vm) {
IQueryable<Movie> query = _movieRepository.GetAll().AsNoTracking();
if (!string.IsNullOrWhiteSpace(vm.Search)) {
query = query.Where(m => m.title.ToLower().Contains(vm.Search.ToLower()));
}
vm.TotalItemCount = query.Count();
vm.Movies = query.Skip(_pageSize * (vm.Page - 1)).Take(_pageSize);
vm.PageSize = _pageSize;
return View(vm);
}
Caveat: I don't have much experience with the Entity framework.
However, you might find useful debugging tips in the Entity Framework performance article from Simple Talk. Looking at what you've posted, you might be able to improve your query performance by:
Choosing only the specific columns you're interested in (it sounds like you're only interested in querying the 'Title' column).
Paying special attention to your data types. You might want to convert your NVARCHAR variables to VARCHAR(40) (or some appropriate character limit).
Removing all of the ToLower() calls:
if (!string.IsNullOrWhiteSpace(vm.Search)) {
query = query.Where(m => m.title.Contains(vm.Search));
}
SQL Server (unlike C#) is not case sensitive by default (though you can configure it to be). Your query forces SQL Server to lowercase every record in the table before doing the comparison.

Using OrientDB 2.0.2, a LUCENE query does not seem to respect the "LIMIT n" operator

Using LUCENE inside of OrientDB seems to work fine, but there are very many LUCENE-specific query parameters that I would ordinarily pass directly to LUCENE (normally through Solr). The first one I need to pass is the result limiter such as SELECT * FROM V WHERE field LUCENE "Value" LIMIT 10.
If I use a value that only returns a few rows, I get the performance I expect, but if it matches a lot of rows, I need the limiter to get the result to return quickly. Otherwise I get a message in the console stating: The query would return more than 50000 records. Please consider using an index.
How do I pass additional LUCENE query parameters?
There's a known issue with the query parser, which is in the process of being fixed. Until then, the following workaround should help:
SELECT FROM (
SELECT * FROM V WHERE Field LUCENE 'Value'
) LIMIT 10
Alternatively, depending on which client libraries you're using you may be able to set a limit using the out-of-band query settings.

Neo4j index for full text search

I am working with neo4j database version 2.0. I have the following requirements:
Case 1. I want to fetch all records where name contains some string. For example, if I am searching for Neo4j, then all records with the name Neo4j Data, Neo4j Database, Neo4jDatabase, etc. should be returned.
Case 2. I want to fire a field-less query: if any of a set of properties has a matching value, those records should be returned. It may also be at the global level instead of the label level.
Case sensitivity is also a consideration.
I have read multiple things about LIKE, indexes, full text search, legacy indexes, etc. What would be the best fit for my case, or do I have to use elastic search or similar?
I am using spring-data-neo4j in my application, so please provide some configuration for SDN.
Annotate your name field with the @Indexed annotation:
@Indexed(indexName = "whateverIndexName", indexType = IndexType.FULLTEXT)
private String name;
Then query for it the following way (an example for a method in an SDN repository; you can use something similar anywhere else you use cypher):
@Query("START n=node:whateverIndexName({query}) return n")
Set<Topic> findByName(@Param("query") String query);
Neo4j uses Lucene as the backend for indexing, so the query value must be a valid Lucene query, e.g. "name:neo4j" or "name:neo4j*".
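If the search text comes from user input, the query string probably needs escaping before it is passed as the {query} parameter. Below is a small sketch in Python of building such a prefix query. Both helpers are hypothetical and written only for illustration; the escaped character set follows Lucene's classic query syntax. Lowercasing the term is an assumption about how the fulltext index analyzes text, and the sketch assumes a single-word term.

```python
import re

# Characters that carry special meaning in Lucene's classic query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_lucene(text):
    """Backslash-escape every Lucene special character in `text`."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


def prefix_query(field, term):
    """Build a 'field:term*' query for a single-word term (assumed lowercase index)."""
    return f"{field}:{escape_lucene(term.lower())}*"


print(prefix_query("name", "Neo4j"))  # name:neo4j*
```

The resulting string is what you would hand to findByName; terms containing whitespace would need extra handling that this sketch does not attempt.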
There is an article that explains the confusion around various Neo4j indexes http://nigelsmall.com/neo4j/index-confusion.
I don't think you need elastic search; you can use the legacy indexes or the Lucene indexes to do full text searches.
Check out Michael Hunger's blog: jexp.de/blog
This post specifically: http://jexp.de/blog/2014/03/full-text-indexing-fts-in-neo4j-2-0/