How to keep SQL data and Elasticsearch in sync, and which to search from?

I've seen two solutions mentioned, and was wondering what most people do.
1) Use Logstash.
2) Code your application to write to Elasticsearch alongside SQL. For example:
public void saveRecord() {
    saveToElasticsearch();
    saveToSQL();
}
Another question is how to actually search the entity. Do you ONLY use Elasticsearch?
If not, I would assume you fetch from Elasticsearch based on keywords and use the IDs returned to filter your SQL query. My question, then, is how do you handle pagination? For example, let's say you only want results 50 to 100. First you query Elasticsearch, which returns results 50-100. Then the SQL query reduces that to 20 results, and the other 30 results are in what would have been the next Elasticsearch query (100-150, for example). Do you keep going back and forth?

As for your first question check here
As for the second question, if you plan to use Elasticsearch as your search layer, then use it for all the searchable/filterable fields. As you've described, the alternative gets very messy very quickly. Use Elasticsearch for all your searches and filters, and even aggregations if it suits your needs. Use the SQL database as your source of truth and just fetch the full payload from there.
In general, if you need to paginate, keep the search in one place; otherwise it will get ugly.
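The flow described above can be sketched roughly as follows. This is only an illustration: both stores are in-memory stand-ins rather than real Elasticsearch or JDBC clients, and the names searchIds, loadRows and page are made up for the example. The point is that the search layer alone decides which IDs are on the page and in what order, so pagination never has to go back and forth.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: in-memory stand-ins, not real Elasticsearch or SQL clients.
final class SearchThenHydrate {

    // Hypothetical stand-in for the search index: returns one page of already-ranked IDs.
    static List<Integer> searchIds(List<Integer> rankedHits, int from, int size) {
        int start = Math.min(from, rankedHits.size());
        int end = Math.min(start + size, rankedHits.size());
        return new ArrayList<>(rankedHits.subList(start, end));
    }

    // Hypothetical stand-in for "SELECT ... WHERE id IN (...)"; row order is not guaranteed.
    static Map<Integer, String> loadRows(Map<Integer, String> table, List<Integer> ids) {
        Map<Integer, String> rows = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : table.entrySet()) {
            if (ids.contains(e.getKey())) rows.put(e.getKey(), e.getValue());
        }
        return rows;
    }

    // One page: the index decides membership and order, SQL only supplies the payload.
    static List<String> page(List<Integer> rankedHits, Map<Integer, String> table, int from, int size) {
        List<Integer> ids = searchIds(rankedHits, from, size);
        Map<Integer, String> rows = loadRows(table, ids);
        List<String> result = new ArrayList<>();
        for (Integer id : ids) {
            String row = rows.get(id);
            if (row != null) result.add(row); // skip IDs that are indexed but missing in SQL
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new LinkedHashMap<>();
        table.put(1, "alpha");
        table.put(2, "beta");
        table.put(3, "gamma");
        table.put(4, "delta");
        List<Integer> rankedHits = List.of(3, 1, 4, 2);
        System.out.println(page(rankedHits, table, 1, 2)); // [alpha, delta]
    }
}
```

Because every filterable field lives in the index, the SQL lookup never shrinks the page; at worst it drops an ID that was deleted from SQL but not yet removed from the index.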

Related

How to get all the records matching a regex in Aerospike?

I have millions of records in a set. I would like to retrieve all the records that match the same pattern.
For example I may have :
id=4444?mode=mode1?fieldA=abc
id=4444?mode=mode1?fieldA=azerty
id=4444?mode=mode1?fieldA=qwerty
id=4444?mode=mode1?fieldA=foo
id=4444?mode=mode1?fieldA=bar
Is it possible to make a query to get all the above records without knowing in advance the value of the fieldA ? Something like this in regex :
id=4444?mode=mode1?fieldA=[\w]*
Thanks for your time.
Yes, it can be done. You would first query by a secondary index to narrow the result set to a manageable size, then apply a filter written in Lua that discards the records you don't want. This filter could take the regex you want to match against (passed in dynamically) and return only those records that match.
Whilst this would work, it would not be as performant as the key-value operations in Aerospike. You would definitely want to benchmark such a solution before putting it into production.
Predicate filtering was added in release 3.12 on March 15. You can use the stringRegex method of the PredExp class of the Java client to build complex filters such as the one you mentioned. It also currently exists for the C, C# and Go clients.
There's a similar example in the Aerospike Java client:
Statement stmt = new Statement();
stmt.setNamespace(params.namespace);
stmt.setSetName(params.set);
stmt.setFilter(Filter.range(binName, begin, end));
stmt.setPredExp(
    PredExp.stringBin("bin3"),
    PredExp.stringValue("prefix.*suffix"),
    PredExp.stringRegex(RegexFlag.ICASE | RegexFlag.NEWLINE)
);
The RegexFlag class in com.aerospike.client.query defines the flags that control how the regular expression is matched (for example, ICASE for case-insensitive matching).
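One detail worth checking before passing the pattern from the question to any regex engine: '?' is a quantifier, so the literal '?' separators have to be escaped. The sketch below uses plain java.util.regex on a local list purely to illustrate that; it does not call Aerospike, the class and method names are made up, and the server's own regex dialect may not support shorthand such as \w, so treat the exact expression as an assumption to verify against your server version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: local java.util.regex matching, not an Aerospike query.
final class KeyPatternFilter {

    // The literal '?' separators are escaped; unescaped they would make the previous character optional.
    static final Pattern FIELD_A = Pattern.compile("id=4444\\?mode=mode1\\?fieldA=\\w*");

    static List<String> matching(List<String> values) {
        List<String> out = new ArrayList<>();
        for (String v : values) {
            if (FIELD_A.matcher(v).matches()) out.add(v); // whole-string match
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = List.of(
            "id=4444?mode=mode1?fieldA=abc",
            "id=4444?mode=mode1?fieldA=foo",
            "id=4444?mode=mode2?fieldA=abc",
            "id=5555?mode=mode1?fieldA=bar");
        System.out.println(matching(values).size()); // 2
    }
}
```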

Advanced search REST API

My requirement is to implement an advanced search REST API for searching phones. The URI for the search API is http://myservice/api/v1/phones/search?q=${query_expression}
where q is the complex query expression. I have the following questions:
1) Since advanced search involves a lengthy query expression, the URI will not fit in a GET call. Is it alright to implement the search API via POST request and still maintain the RESTfulness?
2) I have come across the following implementations for the advanced search:
1st approach - Send the complete infix expression for the query expression.
eg.
PHONENAME STARTSWITH 'AR' AND ( PHONETYPE = '4G' OR PHONECOLOR = 'RED')
2nd approach - Constructing entire query expression in the form of a json.
eg.
{"criteria":[
{"index":1,"field":"PHONENAME","value":"AR","comparator":"STARTSWITH"},
{"index":2,"field":"PHONETYPE","value":"4G","comparator":"EQUALS"},
{"index":3,"field":"PHONECOLOR","value":"RED","comparator":"EQUALS"}
],"criteria":"( 1 AND (2 OR 3) )"}
3rd approach - Alternative way to implement the query expression as a json.
eg.
{"and":[
{"field":"PHONENAME","value":"AR","comparator":"STARTSWITH"},
{"or":[
{"field":"PHONETYPE","value":"4G","comparator":"EQUALS"},
{"field":"PHONECOLOR","value":"RED","comparator":"EQUALS"}]}
]}
Which approach would be considered more RESTful out of the three? Suggestions for any other approaches are welcome :)
You could follow the approach taken by Elasticsearch, which is closest to the third of your examples.
See https://www.elastic.co/guide/en/elasticsearch/reference/current/search.html
The third approach is also easier to understand and easier to maintain.
For example, if you later need to add a "fuzzy" query operator with a completely different model, that would be easy to do.
Yes, POST is a catch-all. It's preferable to use it for resource creation, but according to the spec it may be used in this way also. However, you should consider changing the endpoint to be /search-results. This gives you the flexibility to start storing search results later, and you can return a Location header pointing to the results of a particular complex query. Another alternative is to let users POST their search criteria, and then do a GET /search-results?criteria={id}.
Don't do the second one. It's hard to read and more complex than it should be. Either the first or the third are fine. The first is more compact but probably harder to handle on the back end. For the third, you really don't need the index.
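To show why the nested form is easy to handle on the back end, here is a minimal sketch of how the third approach could be modelled and evaluated once the JSON has been parsed. The class names (Criterion, Condition, And, Or) are hypothetical, JSON parsing is left out, and only the two comparators from the question are supported.

```java
import java.util.List;
import java.util.Map;

// Sketch only: a hypothetical in-memory model of the nested "and"/"or" criteria.
final class CriteriaDemo {

    interface Criterion {
        boolean matches(Map<String, String> phone);
    }

    // Leaf node: one field/comparator/value triple.
    static final class Condition implements Criterion {
        final String field;
        final String comparator;
        final String value;

        Condition(String field, String comparator, String value) {
            this.field = field;
            this.comparator = comparator;
            this.value = value;
        }

        public boolean matches(Map<String, String> phone) {
            String actual = phone.get(field);
            if (actual == null) return false;
            switch (comparator) {
                case "EQUALS": return actual.equals(value);
                case "STARTSWITH": return actual.startsWith(value);
                default: throw new IllegalArgumentException("Unknown comparator: " + comparator);
            }
        }
    }

    // Every child must match.
    static final class And implements Criterion {
        final List<Criterion> parts;
        And(Criterion... parts) { this.parts = List.of(parts); }
        public boolean matches(Map<String, String> phone) {
            for (Criterion c : parts) if (!c.matches(phone)) return false;
            return true;
        }
    }

    // At least one child must match.
    static final class Or implements Criterion {
        final List<Criterion> parts;
        Or(Criterion... parts) { this.parts = List.of(parts); }
        public boolean matches(Map<String, String> phone) {
            for (Criterion c : parts) if (c.matches(phone)) return true;
            return false;
        }
    }

    // PHONENAME STARTSWITH 'AR' AND (PHONETYPE = '4G' OR PHONECOLOR = 'RED')
    static Criterion example() {
        return new And(
            new Condition("PHONENAME", "STARTSWITH", "AR"),
            new Or(
                new Condition("PHONETYPE", "EQUALS", "4G"),
                new Condition("PHONECOLOR", "EQUALS", "RED")));
    }

    public static void main(String[] args) {
        Map<String, String> phone = Map.of(
            "PHONENAME", "ARROW", "PHONETYPE", "3G", "PHONECOLOR", "RED");
        System.out.println(example().matches(phone)); // true
    }
}
```

Adding a new operator then means adding one more Criterion implementation, without touching the existing ones.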

SQL Server Efficient Search for LIKE '%str%'

In SQL Server, I have a table containing 46 million rows.
I want to search the "Title" column of the table. The search string may appear at any position in the field value.
For example:
Value in table: BROTHERS COMPANY
Search string: ROTHER
I want this search to match the given record. This is exactly what LIKE '%ROTHER%' does. However, LIKE '%...%' should not be used on large tables because of performance issues. How can I achieve this?
Though I don't know your requirements, your best approach may be to challenge them. Middle-of-the-string searches are usually not very practical. If you can get your users to perform prefix searches (broth%) then you can easily use Full Text's wildcard search (CONTAINS(*, '"broth*"')). Full Text can also handle suffix searches (%rothers) with a little extra work.
But when it comes to middle-of-the-string searches with SQL Server, you're stuck using LIKE. However you may be able to improve performance of LIKE by using a binary collation as explained in this article. (I hate to post a link without including its content but it is way too long of an article to post here and I don't understand the approach enough to sum it up.)
If that doesn't help and if middle-of-the-string searches are that important of a requirement then you should consider using a different search solution like Lucene.
Add a Full-Text index if you want.
You can then search the table using CONTAINS:
SELECT *
FROM YourTable
WHERE CONTAINS(TableColumnName, 'SearchItem')

Hibernate Search - possible to get new Lucene query after facets applied?

A Lucene Query is generated as so:
Query luceneQuery = builder.all().createQuery();
Then facets are applied.
I'm not sure whether, when facets are applied, the luceneQuery is ANDed and ORed with other Queries, resulting in a new Lucene Query. Alternatively, perhaps a bunch of BitSets are applied to the original Query to refine the results. (I don't know.)
If a new query is generated I'd like to retrieve it. If not, I need a rethink. That's the crux of the question.
Why:
I'm applying a faceted search on a field with multiple possible values.
E.g. TMovie.class many-to-many TTag.class (multiple-value-facet)
I'm filtering on TMovie where TTag is some value.
Anyway, the filtering works but there is a known problem whereby the Facet-counts returned are incorrect.
Detailed here: Add faceting over multivalued to application using Hibernate Search and https://forum.hibernate.org/viewtopic.php?f=9&t=1010472
I'm using this solution:
http://sujitpal.blogspot.ie/2007/04/lucene-search-within-search-with.html (see comment on new API under article)
The BitSet solution (in this example at least) generates counts based on the original Lucene Query. This works perfectly. However.....
If alternate (different, not TTags) facets are applied to the original query some complications arise.
The Bitset solution calculates on the original Lucene query. It does not calculate on the lucene query now reduced by the application of alternate Facets (a different FacetSelection) (or even TTag Facets themselves for that matter). I.e. the count calculations are irrespective of any other FacetSelection Facets applied.
So...
A. can I get the new Lucene query after facets are applied? The BitSet solution applied to this would be correct.
B. Any other alternative suggestions?
Thanks so much.. All comments welcome.
John
Regarding your first question, applying a facet does not modify the original query; it uses a custom Collector called FacetCollector - see https://github.com/hibernate/hibernate-search/blob/master/engine/src/main/java/org/hibernate/search/query/collector/impl/FacetCollector.java. Under the hood the collector uses a Lucene FieldCache to do the facet count. That is also the root of the limitation for multi-value faceting: FieldCache does not support multiple values per field.
Anyway, no additional queries are run during faceting and the original query is left unmodified. The benefit, of course, is performance. The solution you are pointing to probably works as well, but relies on running multiple queries. However, it might be a valid workaround for your use case.

Hibernate and use of criteria method setFirstResult

I'm trying to learn the hibernate criteria API but I'm puzzled by the criteria method setFirstResult.
I don't understand why I would want to use it except in the rarest of circumstances. It seems to me that when I retrieve information from a database, I'm only interested in establishing some criteria and then executing the query against those criteria. Why do I care from which index number in the database the results should be read? It is not something I normally do when I write SQL queries, yet I see this method all over the Hibernate literature. Is this method something I always have to invoke when writing Hibernate queries, or can I safely ignore it?
Thank you,
Elliott
This is typically used when displaying paginated results of a query. The first page goes from 0 to 19, the second page from 20 to 39, etc.
Well, I use it in a bunch of places. It's either unfortunate or outright lucky that you haven't run into a case where you needed to page your results; in that case you generally write queries that pick rows from one index to another. Consider the case where you want to display the audit log of an app, which stores an entry for every write action. In that case you show 20 results at a time, based on which page the user is on and which field the audit log is sorted by.
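The page arithmetic behind this can be sketched as below. It runs against an in-memory list rather than a Hibernate session, and the helper names firstResult and slice are made up for the example; with a real Criteria you would pass the same two numbers to setFirstResult and setMaxResults.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: emulates paging on an in-memory list instead of a Hibernate Criteria.
final class Paging {

    // Zero-based index of the first row on a zero-based page.
    static int firstResult(int page, int pageSize) {
        return page * pageSize;
    }

    // In-memory equivalent of criteria.setFirstResult(first).setMaxResults(max).list()
    static <T> List<T> slice(List<T> sortedRows, int first, int max) {
        int start = Math.min(first, sortedRows.size());
        int end = Math.min(start + max, sortedRows.size());
        return new ArrayList<>(sortedRows.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 45; i++) rows.add(i);

        // Third page (index 2) of 20 rows each: only rows 40..44 remain.
        List<Integer> page2 = slice(rows, firstResult(2, 20), 20);
        System.out.println(page2.size()); // 5
        System.out.println(page2.get(0)); // 40
    }
}
```

If you never page, you can ignore setFirstResult entirely; the default is to start from the first row.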