How to set up ElasticSearch to do SQL LIKE "%" for email addresses? - lucene

In SQL, I can search email addresses pretty well with SQL LIKE.
With an email "stack@domain.com", searching "stack", "@domain.com", "domain.com", or "domain" would get me back the desired email address.
How can I get the same result with ElasticSearch?
I played with nGram, edgeNGram, uax_url_email, etc., and the search results have been pretty bad. Please correct me if I'm wrong, but it sounds like I have to do the following:
for index_analyzer
use the "keyword", "whitespace", or "uax_url_email" tokenizer so the email doesn't get tokenized
but wildcard queries don't seem to work (with tire at least)
use "nGram" or "edgeNGram" for the filter
I always get way too many unwanted results, like getting "first@domain.com" when searching "first-second".
for search_analyzer
don't do nGram
One experiment:
tire.settings :number_of_shards => 1,
              :number_of_replicas => 1,
              :analysis => {
                :filter => {
                  :db_ngram => {
                    "type" => "nGram",
                    "max_gram" => 255,
                    "min_gram" => 3 }
                },
                :analyzer => {
                  :string_analyzer => {
                    "tokenizer" => "standard",
                    "filter" => ["standard", "lowercase", "asciifolding", "db_ngram"],
                    "type" => "custom" },
                  :index_name_analyzer => {
                    "tokenizer" => "standard",
                    "filter" => ["standard", "lowercase", "asciifolding"],
                    "type" => "custom" },
                  :search_name_analyzer => {
                    "tokenizer" => "whitespace",
                    "filter" => ["lowercase", "db_ngram"],
                    "type" => "custom" },
                  :index_email_analyzer => {
                    "tokenizer" => "whitespace",
                    "filter" => ["lowercase"],
                    "type" => "custom" }
                }
              } do
  mapping do
    indexes :id, :index => :not_analyzed
    indexes :name, :index_analyzer => 'index_name_analyzer', :search_analyzer => 'search_name_analyzer'
    indexes :email, :index_analyzer => 'index_email_analyzer', :search_analyzer => 'search_email_analyzer'
  end
end
Specific cases that don't work well:
emails with a hyphen (e.g. email-hyphen@domain.com)
query string '@' at the beginning or end
exact matches
searching with wildcard like '#' gets very unexpected results.
Suppose I have "aaa@email.com", "aaa_0@email.com", and "aaa-0@email.com". Searching "aaa" gives me "aaa@a.com" and "aaa-0@email.com". Searching "aaa*" gives me everything, but "aaa-*" gives me nothing. So, how should I do exact-match wildcard queries? For these types of queries, I get pretty much the same results with different tokenizers/analyzers.
I do these after each mapping change:
Model.tire.index.delete
Model.tire.create_elasticsearch_index
Model.tire.index.import Model.all
References:
Configure ElasticSearch to use ngram by default. - SQL LIKE %% behavior
http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/

Considering what you are trying to accomplish, KeywordAnalyzer might be a reasonable choice of analyzer, though I don't see anything that would cause problems with a WhitespaceAnalyzer.
I suspect you are running into problems with query parsing and analysis, although you haven't really described how you are querying. The simplest approach would be to use term or prefix queries.
It does seem a bit like StandardAnalyzer would serve your purpose here, mostly (differentiating between "aaa_0" and "aaa-0" would be a problem), as long as it is applied consistently, and your query is correct.
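To make the analysis point concrete, here is a rough in-memory sketch in plain Ruby. It is not Elasticsearch code: the splitting rule is an assumption that only approximates a "standard"-style tokenizer, and all method names are made up for illustration. It compares n-gramming split words on both sides with keeping the address as one token and n-gramming at index time only.

```ruby
# Rough, in-memory imitation of analysis. This is NOT Elasticsearch code;
# the splitting rule below is an assumption that only approximates a
# "standard"-style tokenizer.

def split_tokens(text)
  text.downcase.split(/[^a-z0-9_]+/).reject(&:empty?)
end

# All substrings of token with a length between min and max.
def ngrams(token, min = 3, max = 255)
  upper = [max, token.length].min
  (min..upper).flat_map do |n|
    (0..token.length - n).map { |i| token[i, n] }
  end
end

# Setup A: split into words, n-gram at index time AND at search time.
def split_and_ngram(text)
  split_tokens(text).flat_map { |t| ngrams(t) }.uniq
end

# Setup B: keep the address as one token, n-gram at index time only.
def whole_token_ngrams(text)
  ngrams(text.downcase).uniq
end

# In setup B the search term is looked up as-is (only lowercased).
def substring_match?(index_grams, term)
  index_grams.include?(term.downcase)
end

email = "first@domain.com"

shared = split_and_ngram(email) & split_and_ngram("first-second")
puts "A shares #{shared.length} grams with 'first-second'"

grams = whole_token_ngrams(email)
["first", "@domain.com", "domain", "first-second"].each do |term|
  puts "B #{term.inspect}: #{substring_match?(grams, term)}"
end
```

In setup A any shared gram (here the grams of "first") is enough to produce a hit, which would explain the unwanted results. In setup B only true substrings match, but terms shorter than min_gram never match.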

Related

Wordpress: Wp_Query - find posts with meta_query OR tax_query

I want to find posts which have a specific post meta key/value or posts which are in a specific category. Here is the query. I want to combine the tax_query and the meta_query with OR, but AFAICS there's no way to do this.
$args = [
    'post_type' => 'my-event',
    'post_status' => 'publish',
    'posts_per_page' => -1,
    'orderby' => 'title',
    'order' => 'ASC',
    'cat' => 'home',
    'tax_query' => [
        [
            'taxonomy' => 'event-cat',
            'terms' => [
                'open',
            ],
            'field' => 'slug',
            'operator' => 'IN',
            'include_children' => true,
        ],
    ],
    'meta_query' => [
        [
            'key' => 'event_author',
            'value' => [1,7,11,15],
            'compare' => 'IN',
        ],
    ],
];
$loop = new WP_Query($args);
Result should be:
All posts (events) which are in the category 'open' (no matter who the author is) AND all posts which are from one of the specified authors (no matter in which category the event is).
On SO I found a few similar questions and I think I have to create a SQL query to find a solution but I don't know how to do it. Hope that someone can help me here.
Thanks and best regards.
To make custom, unconventional, and somewhat complicated queries like this, you have to learn some basic SQL syntax. Trust me, SQL basics are not so hard to learn, and they are really worth it if you consider how many projects in many different programming languages need some SQL knowledge.
WordPress gives you the option to make custom queries with the wpdb::get_results() function. If you already know SQL syntax, try exploring the WordPress database a little, and you will find which table columns you need to extract to get the result you want.

Ruby integration with mailchimp-api gem

I'm trying to subscribe a single email to multiple lists with RoR and the official mailchimp-api gem. It works, but the last four values (double_optin, update_existing, replace_interests, and send_welcome) are not updating and I get an error that the email "already exists" even though I'm trying to pass update_existing as true. I've written to Mailchimp several times and they feel they've reached the end of their assistance. They have said they are not experts in the wrapper--even if it is the "official" gem--and cannot help me further. My code looks like this:
responses << mailchimp_lists.each do |ml|
  mailchimp.lists.subscribe(
    ml,
    { "email" => order.customer_email,
      "euid" => order.customer_id,
      "leid" => ""
    },
    { "FNAME" => order.customer_first_name,
      "LNAME" => order.customer_last_name,
      "COMPANY" => order.company_name,
      "ADDRESS1" => order.billing_address_1,
      "ADDRESS2" => order.billing_address_2,
      "CITY" => order.billing_city,
      "STATE" => order.billing_state,
      "POSTALCODE" => order.billing_zip,
      "SALUTATION" => ""
    },
    "html",
    false,
    true,
    false,
    false
  )
end
I've tried sending the last four params in several different ways such as:
"email_type" => "html",
"double_optin" => false,
Or:
{"email_type" => "html"},
{"double_optin" => false}
At times, Mailchimp can see the params arrive in such a way that it seems it should not be triggering an "email already exists" error, but it just won't work. Any help is appreciated.
The mailchimp-api gem's documentation describes the subscribe method as:
#subscribe(id, email, merge_vars = nil, email_type = 'html', double_optin = true, update_existing = false, replace_interests = true, send_welcome = false)
While the batch_subscribe shows:
#batch_subscribe(id, batch, double_optin = true, update_existing = false, replace_interests = true)
Note that the batch method does not include a "send_welcome" param. When I removed it from my list of params for the subscribe method (essentially sending three booleans instead of the four the documentation suggests), update_existing worked perfectly. This seems like an error in the documentation here: http://www.rubydoc.info/gems/mailchimp-api/2.0.4/Mailchimp/Lists#subscribe-instance_method
Hopefully this helps someone else!
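Since a row of bare booleans is easy to misorder, one option is a small helper that builds the positional list from named options. This is only a sketch: subscribe_args and SUBSCRIBE_DEFAULTS are hypothetical names, not part of the mailchimp-api gem. The order and defaults are copied from the documented signature quoted above, and the include_send_welcome switch exists only because of the behaviour reported in this answer.

```ruby
# Hypothetical helper, NOT part of the mailchimp-api gem.
# Order and defaults follow the documented #subscribe signature.
SUBSCRIBE_DEFAULTS = {
  email_type: "html",
  double_optin: true,
  update_existing: false,
  replace_interests: true,
  send_welcome: false
}.freeze

def subscribe_args(list_id, email, merge_vars, include_send_welcome: true, **options)
  unknown = options.keys - SUBSCRIBE_DEFAULTS.keys
  raise ArgumentError, "unknown options: #{unknown.inspect}" unless unknown.empty?

  opts = SUBSCRIBE_DEFAULTS.merge(options)
  args = [list_id, email, merge_vars,
          opts[:email_type], opts[:double_optin],
          opts[:update_existing], opts[:replace_interests]]
  # Drop the trailing boolean when reproducing the workaround from this answer.
  args << opts[:send_welcome] if include_send_welcome
  args
end

args = subscribe_args(
  "list-id",                          # placeholder list id
  { "email" => "someone@example.com" },
  { "FNAME" => "Some", "LNAME" => "One" },
  include_send_welcome: false,
  double_optin: false,
  update_existing: true,
  replace_interests: false
)
p args
# Intended use (not executed here): mailchimp.lists.subscribe(*args)
```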

ElasticSearch Tire two field conditional filter

I'm doing a multi-index query with Tire and Rails 3, and I want to filter out Venues that have approved => false, so I need some sort of combo filter.
Here is the query
query = params[:q]
from = params.delete(:from)
size = params[:size] || 25
Tire.search(
  [Venue.index_name,
   Performer.index_name, User.index_name], load: true) do |s|
  s.query do
    string(query, fields: [:_all, :name, :title], use_dis_max: true)
  end
  s.from from if from
  s.size size if size
end.results.to_a
This line removes all Performers and Users because they don't have an :approved field.
s.filter(:term, :approved => true )
And this line obviously removes all non-venues which is no good.
s.filter(:term, { :approved => true, :index_name => 'venues'} )
Any ideas besides adding an approved: true field to all Users and Performers? I think something like this is what I want conceptually:
s.filter(:term, :approved => true, :if => {:index_name => 'venues'} )
EDIT: Thanks to Mallox I was able to find the should construct, but I'm still struggling to implement it in Tire. It seems like the code below should work, but it returns no results on any query. I also removed the "{:terms => { :index_name => ["performers", "users"]}}," line to make sure it wasn't my use of index name or multiple lines of query that was the problem, and still no luck. Can anybody shed some light on how to do this in Tire?
s.filter(:bool, :should => [
{:terms => { :index_name => ["performers", "users"]}},
{:term => { :approved => true}},
] )
I have little knowledge of Ruby and Tire, but the ElasticSearch query that you want to build would be based on a bool filter that contains some "should" entries (which translate into an inclusive OR).
So in your case something along the lines of:
"filter" : {
"bool" : {
"should" : [
{
"terms" : { "_type" : ["Performers","Users"] }
},
{
"term" : { "approved" : true }
}
]
}
}
Take a look at the documentation here, maybe that'll help:
http://www.elasticsearch.org/guide/reference/query-dsl/bool-filter/
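For the Ruby side, here is a minimal sketch that builds the same filter as a plain hash and prints the JSON, so it can be compared with what actually reaches ElasticSearch. The method name and the type names passed in are placeholders (assumptions); use whatever _type values your documents really have. Whether Tire passes such a hash through unchanged is not something this sketch verifies.

```ruby
require "json"

# Hypothetical helper: documents match when they are of one of the
# other types OR when they have approved == true.
def approved_or_other_types_filter(other_types)
  {
    bool: {
      should: [
        { terms: { _type: other_types } },
        { term: { approved: true } }
      ]
    }
  }
end

# "performer" and "user" are placeholder type names (assumption).
filter = approved_or_other_types_filter(%w[performer user])
puts JSON.pretty_generate({ filter: filter })
```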

LookbackAPI: When did user stories become unblocked?

I'm running the following query to the lookback API to find stories in a date range that were unblocked, but I'm getting no results. Am I missing something obvious? No errors, warnings or results returned.
Below is the Generated Query I get back from the lookback API:
'GeneratedQuery' => {
    'fields' => 'true',
    'skip' => 0,
    'limit' => 100,
    'find' => {
        '_PreviousValues.Blocked' => 'true',
        '_TypeHierarchy' => -51038,
        'Blocked' => 'false',
        '_ValidFrom' => {
            '$lte' => '2012-11-02T04:00:00.000Z',
            '$gte' => '2012-07-01T04:00:00.000Z'
        }
    }
},
When you pass in Boolean values, you need to make sure that they are bare true or false. If you pass them in as Strings, it will not behave as expected. Similarly for values of type Number. They should not have quotes around them.
Ok, the problem was related to "true" and "false" and the fact that I'm using Perl.
I'm using the Perl JSON library, and I didn't realize that you need to pass in JSON::true() and JSON::false() for true and false, not the literals 'true' and 'false'. So, in effect Larry was right: it was passing "true" instead of true.
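The same string-versus-boolean distinction can be seen with any JSON encoder. The thread itself uses Perl; the sketch below uses Ruby's standard json library purely as an illustration, with the field names taken from the query above.

```ruby
require "json"

# String values are serialized with quotes, so the server sees text.
as_strings  = { "_PreviousValues.Blocked" => "true", "Blocked" => "false" }
# Real booleans are serialized as bare true/false.
as_booleans = { "_PreviousValues.Blocked" => true, "Blocked" => false }

puts JSON.generate(as_strings)   # {"_PreviousValues.Blocked":"true","Blocked":"false"}
puts JSON.generate(as_booleans)  # {"_PreviousValues.Blocked":true,"Blocked":false}
```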

How do I make DBIx::Class join tables using other operators than `=`?

Summary
I've got a table of items that go in pairs. I'd like to self-join it so I can retrieve both sides of the pair in a single query. It's valid SQL (I think), the SQLite engine actually does accept it, but I'm having trouble getting DBIx::Class to bite the bullet.
Minimal example
package Schema::Half;
use parent 'DBIx::Class';
__PACKAGE__->load_components('Core');
__PACKAGE__->table('half');
__PACKAGE__->add_columns(
    whole_id => { data_type => 'INTEGER' },
    half_id  => { data_type => 'CHAR' },
    data     => { data_type => 'TEXT' },
);
__PACKAGE__->has_one(dual => 'Schema::Half', {
    'foreign.whole_id' => 'self.whole_id',
    'foreign.half_id'  => 'self.half_id',
    # previous line results in a '='
    # I'd like a '<>'
});

package Schema;
use parent 'DBIx::Class::Schema';
__PACKAGE__->register_class( 'Half', 'Schema::Half' );

package main;
unlink 'join.db';
my $s = Schema->connect('dbi:SQLite:join.db');
$s->deploy;
my $h = $s->resultset('Half');
$h->populate([
    [qw/whole_id half_id data /],
    [qw/1 L Bonnie/],
    [qw/1 R Clyde /],
    [qw/2 L Tom /],
    [qw/2 R Jerry /],
    [qw/3 L Batman/],
    [qw/3 R Robin /],
]);
$h->search({ 'me.whole_id' => 42 }, { join => 'dual' })->first;
The last line generates the following SQL:
SELECT me.whole_id, me.half_id, me.data
FROM half me
JOIN half dual ON ( dual.half_id = me.half_id AND dual.whole_id = me.whole_id )
WHERE ( me.whole_id = ? )
I'm trying to use DBIx::Class join syntax to get a <> operator between dual.half_id and me.half_id, but haven't managed to so far.
Things I've tried
The documentation hints towards SQL::Abstract-like syntax.
I tried writing the has_one relationship as such:
__PACKAGE__->has_one(dual => 'Schema::Half', {
    'foreign.whole_id' => 'self.whole_id',
    'foreign.half_id'  => { '<>' => 'self.half_id' },
});
# Invalid rel cond val HASH(0x959cc28)
Straight SQL behind a stringref doesn't make it either:
__PACKAGE__->has_one(dual => 'Schema::Half', {
    'foreign.whole_id' => 'self.whole_id',
    'foreign.half_id'  => \'<> self.half_id',
});
# Invalid rel cond val SCALAR(0x96c10b8)
Workarounds and why they're insufficient to me
I could get the correct SQL to be generated with a complex search() invocation, and no defined relationship. It's quite ugly, with (too) much hardcoded SQL. It has to be imitated in a non-factorable way for each specific case where the relationship is traversed.
I could work around the problem by adding an other_half_id column and joining with = on that. It's obviously redundant data.
I even tried to evade said redundancy by adding it through a dedicated view (CREATE VIEW AS SELECT *, opposite_of(side) AS dual FROM half...) Instead of the database schema it's the code that got redundant and ugly, moreso than the search()-based workaround. In the end I wasn't brave enough to get it working.
Wished SQL
Here's the kind of SQL I'm looking for. Please note it's only an example: I really want it done through a relationship so I can use it as a Half ResultSet accessor too in addition to a search()'s join clause.
sqlite> SELECT *
FROM half l
JOIN half r ON l.whole_id=r.whole_id AND l.half_id<>r.half_id
WHERE l.half_id='L';
1|L|Bonnie|1|R|Clyde
2|L|Tom|2|R|Jerry
3|L|Batman|3|R|Robin
Side notes
I really am joining to self in my full expanded case too, but I'm pretty sure it's not the problem. I kept it this way for the reduced case here because it also helps keeping the code size small.
I'm persisting on the join/relationship path instead of a complex search() because I've got multiple uses for the association, and I didn't find any "one size fits all" search expression.
Late update
Answering my own question two years later: this used to be missing functionality, but it has since been implemented.
For those still interested, it is available as of 0.08192 or earlier. (I'm on 0.08192 currently.)
One correct syntax would be:
__PACKAGE__->has_one(dual => 'Schema::Half', sub {
    my $args = shift;
    # hash slice: pull both aliases out of the argument hashref
    my ($foreign, $self) = @$args{qw(foreign_alias self_alias)};
    return {
        "$foreign.whole_id" => { -ident => "$self.whole_id" },
        "$foreign.half_id"  => { '<>' => { -ident => "$self.half_id" } },
    }
});
Trackback: DBIx::Class Extended Relationships on fREW Schmidt's blog where I got to first read about it.
I think that you could do it by creating a new type of relationship extending DBIx::Class::Relationship::Base, but it doesn't seem incredibly well documented. Have you considered the possibility of just adding a convenience method on the resultset for Half that does a ->search({}, { join => ... }) and returns the resultset from that to you? It's not introspectable like a relationship, but other than that it works pretty much as well. It uses DBIC's ability to chain queries to your advantage.
JB, notice that instead of:
SELECT *
FROM half l
JOIN half r ON l.whole_id=r.whole_id AND l.half_id<>r.half_id
WHERE l.half_id='L';
You can write the same query using:
SELECT *
FROM half l
JOIN half r ON l.whole_id=r.whole_id
WHERE l.half_id<>r.half_id AND l.half_id='L';
Which will return the same data and is definitely easier to express using DBIx::Class.
Of course, this doesn't answer the question "How do I make DBIx::Class join tables using other operators than =?", but the example you showed doesn't justify such need.
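As a sanity check on the claim that the two queries return the same rows, here is a small in-memory sketch in plain Ruby (no database, no DBIx::Class). It pairs the sample rows from the question with the <> test placed either in the "ON" step or in the "WHERE" step. The method names are made up for illustration, and the equivalence shown only holds for inner joins.

```ruby
ROWS = [
  [1, "L", "Bonnie"], [1, "R", "Clyde"],
  [2, "L", "Tom"],    [2, "R", "Jerry"],
  [3, "L", "Batman"], [3, "R", "Robin"]
].freeze

# <> applied together with the join condition (the "ON" form).
def pairs_with_condition_in_on(rows)
  rows.product(rows)
      .select { |l, r| l[0] == r[0] && l[1] != r[1] } # ON
      .select { |l, _| l[1] == "L" }                  # WHERE
end

# <> applied after the join (the "WHERE" form).
def pairs_with_condition_in_where(rows)
  rows.product(rows)
      .select { |l, r| l[0] == r[0] }                 # ON
      .select { |l, r| l[1] != r[1] && l[1] == "L" }  # WHERE
end

pairs_with_condition_in_on(ROWS).each { |l, r| puts (l + r).join("|") }
puts pairs_with_condition_in_on(ROWS) == pairs_with_condition_in_where(ROWS)
```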
Have you tried:
__PACKAGE__->has_one(dual => 'Schema::Half', {
    'foreign.whole_id' => 'self.whole_id',
    'foreign.half_id'  => {'<>' => 'self.half_id'},
});
I believe the matching criteria in the relationship definition is the same used for searches.
Here is how to do it:
...
field => 1, # =
otherfield => { '>' => 2 }, # >
...
'foreign.half_id' => \'<> self.half_id'