Trie/Decision Tree for Conditions/Queries - conditional-statements

I have a set of queries. I receive one object at a time, and I want to figure out which of the queries the object satisfies.
For example, the queries look like:
1) Customers with Age > 25 and NetWorth >= 100000
2) Customers with Address Contains "USA"
3) Customers with Age > 30 and Address contains "USA"
Below is the structure of the Customer class:
class Customer {
    public string Name;
    public int Age;
    public int NetWorth;
    public string Address;
}
Is there a way to build a trie-like structure so that I do not need to match the Customer against each query one by one? I can have a very large number of queries. I want to support the >, >=, <, <=, =, !=, contains, !contains, empty, and !empty operators.
I searched the internet but could not find a specific algorithm I can use.
Before I build something, I want to know whether a known solution already exists.
I want to do this in C#.
Update
I thought more about it, and it seems I could also build a decision tree, but given a large set of queries I am not sure how to build a decision tree of minimal length.
Thanks,
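One commonly used way to avoid testing every query in full is to index the atomic conditions rather than the queries: each distinct condition is evaluated once per object, and a query matches when all of its conditions were satisfied. The sketch below is a minimal illustration of that counting idea, written in Python for brevity (the same structure ports to C#). The QueryIndex class, its method names, and the dictionary-based customer are assumptions made for the example, and it only handles queries that are plain AND combinations of conditions.

```python
import operator
from collections import defaultdict

# Operator table for the conditions listed in the question.
OPS = {
    ">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le,
    "=": operator.eq, "!=": operator.ne,
    "contains": lambda actual, value: value in actual,
    "!contains": lambda actual, value: value not in actual,
    "empty": lambda actual, _: not actual,
    "!empty": lambda actual, _: bool(actual),
}

class QueryIndex:
    """Hypothetical index: each query is an AND of (field, op, value) conditions."""

    def __init__(self):
        self.by_field = defaultdict(dict)  # field -> {(op, value): [query ids]}
        self.size = {}                     # query id -> number of distinct conditions

    def add(self, query_id, conditions):
        unique = set(conditions)
        self.size[query_id] = len(unique)
        for field, op, value in unique:
            self.by_field[field].setdefault((op, value), []).append(query_id)

    def match(self, obj):
        hits = defaultdict(int)
        for field, conditions in self.by_field.items():
            actual = obj[field]
            for (op, value), query_ids in conditions.items():
                # Each distinct condition is evaluated once, however many queries share it.
                if OPS[op](actual, value):
                    for query_id in query_ids:
                        hits[query_id] += 1
        # A query matches when every one of its conditions was satisfied.
        return sorted(q for q, n in hits.items() if n == self.size[q])

index = QueryIndex()
index.add(1, [("Age", ">", 25), ("NetWorth", ">=", 100000)])
index.add(2, [("Address", "contains", "USA")])
index.add(3, [("Age", ">", 30), ("Address", "contains", "USA")])

customer = {"Name": "A", "Age": 28, "NetWorth": 150000, "Address": "Seattle, USA"}
print(index.match(customer))  # [1, 2]
```

With many queries, the per-field condition lists could be refined further, for example by keeping numeric thresholds sorted so that a single binary search resolves all > and >= conditions on a field.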

Related

Lucene search getting strange facet results for a text field

I'm using Sitecore with Lucene, and I'm trying to facet on an integer field, so that I can get all of the existing values for that field. I have the following search result class with a definition for the field:
public class ContentTypeSearchResultItem : Sitecore.ContentSearch.SearchTypes.SearchResultItem
{
[Sitecore.ContentSearch.IndexField("crop_heat_units")]
public int CropHeatUnits { get; set; }
}
In my query, I have:
query = query.FacetOn(x => x.CropHeatUnits)
I have a number of other facets of type ID or IEnumerable<Guid> and these work as I expect, but the crop_heat_units string facet is giving me weird results, such as chufacet.Values[0].Name = \u0001\0\0\0\0\0\0\0\u000e\b. Some of the other values are #\b\0\0\0\0 and 8\u0010\0\0\0\0\0.
In Sitecore, the values of the Crop Heat Units field are things like "2075" and "2200".
Each numeric value is indexed as a trie structure within Lucene where each term is logically assigned to larger and larger predefined brackets which are simply lower precision representations of the value.
So, the simple solution is to change int to string for your CropHeatUnits field definition and remove it from the fieldmap. Then your queries and facets should work as expected. If you want to use the CropHeatUnits values as integers then you will need to cast their string values to integer after their retrieval from Lucene.
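To make the "lower precision brackets" idea more concrete, here is a simplified Python sketch. It is not Lucene's actual term encoding (Lucene stores prefix-coded bytes, which is why the facet names look like binary garbage); it only shows how one integer yields several terms of decreasing precision. The function name and the precision_step value of 8 are assumptions chosen for the illustration.

```python
def trie_terms(value, precision_step=8, bits=32):
    """Return (shift, bracket) pairs: the value at lower and lower precision."""
    terms = []
    for shift in range(0, bits, precision_step):
        # Dropping the lowest `shift` bits puts nearby values into the same bracket.
        terms.append((shift, value >> shift))
    return terms

print(trie_terms(2075))  # [(0, 2075), (8, 8), (16, 0), (24, 0)]
print(trie_terms(2200))  # [(0, 2200), (8, 8), (16, 0), (24, 0)]
```

Both values fall into the same bracket from the second level onward, which is what makes range queries fast. A facet over such a field returns these encoded terms rather than the readable strings "2075" and "2200".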

get the pattern of unknown strings using sql?

I have a database with thousands of unknown strings. They may be emails or phone numbers,
but to me they are not emails or cell numbers, only strings, and I want to find their common patterns. Here are some strings for example purposes:
link to example click here
What I want is output like this file whenever a pattern matches 3 times. Here is what I am doing:
DECLARE @strs2 nvarchar(255)
DECLARE @patternTable table(
id int ,
order by p.pat
But my example returns this:
485-2889
485-2889
) 485-2889
) 485-2889
.aol.com/aol/search?
.aol.com/aol/search?
gmail.com
gmail.com
But I want to add this to the pattern:
[a-zA-Z 0-9] [a-zA-Z 0-9] [a-zA-Z 0-9] - 485-2889
and for gmail:
[a-zA-Z 0-9] [a-zA-Z 0-9]@ gmail.com
First of all, this is much more work than it might seem.
As far as I can tell, it is going to be a method with heavy processing, and probably not something you want to do with a cursor in SQL (cursors are rather inefficient).
You have to define a way for your code to identify a pattern. You will also have to work out priorities for cases where a set of strings matches multiple patterns. For instance, if you implement the following pattern criteria (from your example):
BK-M18B-48
BK-M18B-52
BK-M82B-44
BK-M82S-38
BK-M82S-44
BK-R50B-58
BK-R50B-62
.....
should generate BK-[A-Z][0-9][0-9][A-Z]-[0-9][0-9]
Then the next set can produce multiple patterns as a result:
fedexcarepackage@outlook.com (example added for explanation)
fedexcarepackage@office.com
fedexcourierexpress@pisem.net
fedexcouriers@gmail.com (another example added for explanation)
.....
Can generate:
fedexc%@%.% (as you said)
fedexc%@% (depending on processing)
fedexc[A-Z][A-Z]....%@%[A-Z]....[A-Z].[A-Z][A-Z][A-Z] (alphanumerics, with '%' to compensate for length differences)
In addition, if you take fedexcarepackage@outlook.com away from the string list, you get 1 additional pattern that you probably don't want to have:
fedexc%@%i%.% (because they all have an 'i' somewhere between the '@' and the '.')
Anyway, that is something you will have to consider with your design.
I'll give you some basic logic you can work with:
Create functions to identify each distinct pattern (1 pattern per function). For instance, 1 function to check for static pieces of string (and attach wildcards); another to detect [A-Z], [0-9] patterns that match your conditions for this pattern to be valid; more if needed for different patterns.
Create a function to test a string against your pattern. Say you have 4 strings and you find a pattern when comparing the first 2 of them. You then use this function to test whether the pattern applies to the 3rd and 4th strings.
Create a function to test whether 2 patterns are mutually exclusive. For instance, the patterns 'PersonA@yahoo.%' and 'PersonA@%.net' are not mutually exclusive if both were tested to be true. 'Person%@yahoo.com' and 'PersonB@yahoo.com' are mutually exclusive (both patterns cannot be true, so 1 is redundant).
Create a function to combine patterns that are NOT mutually exclusive (this probably uses the functions from the 2nd and 3rd points). So 'PersonA@yahoo.%' and 'PersonA@%.net' can be combined into 'PersonA@%.%'.
Once you have that set up, loop through each text line and compare the current line to the next against each pattern criterion. Record any patterns you find in a variable dedicated to that criterion (don't mix them just yet).
Next comes the hardest part. The safest way is to compare each pattern you find against each of the strings, to rule out the ones that don't apply to all strings. However, you could probably work out a way to combine patterns (in the same category) without cross-checking.
Finally, after you have narrowed down your pattern list to 1 pattern per pattern type, combine them into 1 or eliminate the ones
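As a rough illustration of the "test a string against your pattern" function from the list above, here is a minimal Python sketch. It assumes that '%' is the only wildcard and means "any run of characters", as in SQL LIKE. The function names are made up for the example.

```python
import re

def like_to_regex(pattern):
    """Translate a '%'-wildcard pattern into a compiled regular expression."""
    parts = (".*" if ch == "%" else re.escape(ch) for ch in pattern)
    return re.compile("".join(parts), re.DOTALL)

def matches_all(pattern, strings):
    """True when the pattern applies to every string in the list."""
    regex = like_to_regex(pattern)
    return all(regex.fullmatch(s) for s in strings)

emails = [
    "fedexcarepackage@outlook.com",
    "fedexcarepackage@office.com",
    "fedexcourierexpress@pisem.net",
    "fedexcouriers@gmail.com",
]
print(matches_all("fedexc%@%.%", emails))        # True
print(matches_all("fedexc%@%i%.%", emails))      # False: outlook.com has no 'i'
print(matches_all("fedexc%@%i%.%", emails[1:]))  # True once outlook.com is removed
```

The last two calls reproduce the unwanted extra pattern mentioned earlier: it only becomes valid when the outlook.com address is taken out of the list.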
Keep in mind that in your pattern detection functions, you'll probably have to test each line multiple times and combine patterns. Some pseudo code to demonstrate:
Function CompareForStringMatches(String s1, String s2) { -- should return a possible pattern found
    -- shorterString and longerString are s1 and s2, ordered by length
    List pattern;
    int patternsFound = 0;
    For (i = 0 to length of shorterString - 1) {
        For (x = i + 1 to length of shorterString) {
            if (longerString.Contains(shorterString.Substring(from i, to x))) {
                -- record the pattern somewhere: pattern = longerString with the substring replaced by a '%' sign
                pattern[patternsFound] = Replace(longerString, shorterString.Substring(from i, to x), '%');
                patternsFound = patternsFound + 1;
            }
        }
    }
    -- After the loops, make another loop to check (partial) patterns against each other and eliminate patterns that are part of a larger pattern.
    -- For instance, when comparing 'random@asd.com' and 'sundom@asd.com', the patterns below should be found:
    --- compare '%andom@asd.com' and '%ndom@asd.com' and eliminate the first pattern, because both are valid, but the second pattern includes the first one.
    -- You will have a lot of similar matches, but if you do this, you should end up with only a few patterns.
    -- After the first cycle of checks, do another one to combine patterns where possible (for instance, if you compare 'random@asd.com' and 'sundom@asd.net' you will end up with these 2 patterns: '%ndom@asd.com' and 'random@asd.%').
    -- Since these patterns are true (because they were found during a comparison), you can combine them into '%ndom@asd.%'.
    -- When you have combined/eliminated all patterns, you should only have 1 left.
    return pattern[only pattern left];
}
PS: You can do things much more efficiently, but if you have no idea where to start, you probably need to do it the long way and work on improvements from the first working prototypes.
Edit/Update
I suggest you make a wildcard detection method and then apply the other pattern checks you implement before it.
Wildcard detection for comparison of 2 strings (pseudo code), heavy processing version :
Compare 2 strings, check if every possible segment of shorter string is within longer:
for (int i = 0; i < shorterString.Length; i++) {
    for (int x = i + 1; x <= shorterString.Length; x++) {
        if (longerString.Contains(shorterString.Substring(i, x))) { -- from i to x
            possiblePattern.Add(longerString.Replace(shorterString.Substring(i, x), '*')); -- add to pattern list
        }
    }
}
-- Next, compare partial matches and eliminate the ones that are part of a larger pattern.
-- So comparing '*a@gmail.com' and '*na@gmail.com' should eliminate '*na@gmail.com', because if the shorter pattern (with more symbols removed) is valid, then a similar one with an extra symbol is part of it.
-- When that is done, combine the remaining matches if there is more than 1 left.
-- Remember, all patterns are valid if your first loop was correct, so '*@gmail.com' and 'personA@*.com' can be combined into '*@*.com'.
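For comparison, here is a compact Python sketch of wildcard detection between 2 strings. Instead of the brute-force loops above, it swaps in the standard library's difflib.SequenceMatcher to find the shared segments, keeps those segments, and replaces everything that differs with a single wildcard, which is the form of the example patterns such as '*ndom@asd.*'. The function name and the min_len threshold are assumptions for the illustration.

```python
from difflib import SequenceMatcher

def wildcard_pattern(a, b, wildcard="*", min_len=2):
    """Keep segments shared by a and b (in order); replace the rest with a wildcard."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    parts = []
    pos_a = pos_b = 0
    gap = False  # True when unmatched text precedes the next kept segment
    for block in matcher.get_matching_blocks():
        if block.a > pos_a or block.b > pos_b:
            gap = True
        if block.size >= min_len:
            if gap:
                parts.append(wildcard)
                gap = False
            parts.append(a[block.a:block.a + block.size])
        elif block.size > 0:
            gap = True  # very short matches are treated as noise
        pos_a = block.a + block.size
        pos_b = block.b + block.size
    if gap:
        parts.append(wildcard)
    return "".join(parts)

print(wildcard_pattern("random@asd.com", "sundom@asd.com"))  # *ndom@asd.com
print(wildcard_pattern("random@asd.com", "sundom@asd.net"))  # *ndom@asd.*
```

Patterns produced for different pairs of strings can then be fed through the same function again to combine them.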
As for the alphanumeric detection, I would suggest you start by checking the length of all strings. If they are the same, run the wildcard pattern detection method (for all of them). When done, ONLY look for pattern matches within the wildcards.
So, you'll get a pattern like BK-*-* from the wildcard detection run. On the second iteration, take 2 strings and extract only the substrings that are represented by wildcard characters (use an array or an equivalent to store the substrings, and make sure not to combine both wildcards of a single string into 1 string).
So if you compare the following against the pattern found above (BK-*-*):
BK-M18B-48
BK-M18B-52
You should get the following string sets to process after eliminating the static characters:
Set 1:M18B and 48
Set 2:M18B and 52
Compare each character to the character in the same position of the opposite string and check whether both match your category (like if String1[0].isaLetter AND String2[0].isaLetter). If they do, add that 1 character to the pattern. If not, either:
Add a wildcard character (this will lead to a pattern like BK-[A-Z]*[0-9][0-9]-[0-9][0-9]). If you do this, combine adjacent wildcard characters into 1.
Treat the pattern as false and abort the check, returning no patterns.
Use this basic logic to loop through the strings, and create (and store!) patterns for each set of 2 strings. Then loop through the patterns with wildcard detection (possibly a lighter version) to combine/eliminate patterns. So if you get patterns like '@yahoo.com' and '@gmail.com' from different sets of strings, you should combine them into '@*.com'.
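The position-by-position comparison can be sketched in a few lines of Python. This is only an illustration under some assumptions: all strings have the same length, the categories are limited to [A-Z], [a-z] and [0-9], and it takes the first of the 2 options above (insert a wildcard and merge adjacent wildcards). The function names are made up for the example.

```python
def char_class(ch):
    """Map a character to its category, or None when it has no category."""
    if ch.isdigit():
        return "[0-9]"
    if ch.isalpha():
        return "[A-Z]" if ch.isupper() else "[a-z]"
    return None

def generalize(strings, wildcard="*"):
    """Build one pattern for equal-length strings, column by column."""
    if len({len(s) for s in strings}) != 1:
        return None  # lengths differ: run the wildcard detection first
    out = []
    for column in zip(*strings):
        if len(set(column)) == 1:
            out.append(column[0])  # static character
            continue
        classes = {char_class(ch) for ch in column}
        if len(classes) == 1 and None not in classes:
            out.append(classes.pop())  # every string has the same category here
        elif not out or out[-1] != wildcard:
            out.append(wildcard)  # mixed categories: one wildcard, merged with neighbours
    return "".join(out)

codes = ["BK-M18B-48", "BK-M18B-52", "BK-M82B-44", "BK-M82S-38",
         "BK-M82S-44", "BK-R50B-58", "BK-R50B-62"]
print(generalize(codes))      # BK-[A-Z][0-9][0-9][A-Z]-[0-9][0-9]
print(generalize(codes[:2]))  # BK-M18B-[0-9][0-9]
```

The second call shows why comparing only 2 strings is not enough: the more strings take part, the more general (and more useful) the pattern becomes.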
Keep in mind there's lots of room for optimization here.

Population of a selection box base on the Other Selection Box

Good day! I have a problem: I am trying to populate a selection box based on the data selected in another selection box. Here is my code:
.py
licensetype = fields.Many2one('hr.licensetype', 'License Type')
license = fields.Many2one('hr.license', 'License')

@api.one
@api.onchange('licensetype')
def getlicense(self):
    if len(self.licensetype) > 0:
        mdlLicense = self.env['hr.license'].search([('license_name', '=', int(self.licensetype[0]))])
        # raise Warning(mdlLicense.ids)
        self.license = mdlLicense.ids
But it still populates all licenses. I want to populate License based on the selected License Type. This is in Odoo 8.
Domains
A domain is a list of criteria, each criterion being a triple (either a list or a tuple) of (field_name, operator, value).
Here,
field_name : A string. It must be a field of the current model, or a relational traversal through a Many2one/One2many field using the dot (.) operator.
operator : Used to compare the field's value with the passed value.
Valid operators: >, >=, <, <=, =, !=, =?, like, ilike, =like, =ilike, not like, not ilike, child_of, in, not in
value : The value to compare with the field's value.
Multiple criteria can be joined with three logical operators: logical AND, logical OR, and logical NOT.
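To show what a list of criteria means in practice, here is a plain-Python sketch that filters dictionaries with a domain-like list. It is not Odoo's implementation and uses no Odoo API. It only covers a few comparison operators and the default case where all criteria are joined with AND, and the record fields are invented for the example.

```python
import operator

# A small subset of the operators listed above.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "in": lambda field_value, value: field_value in value,
    "not in": lambda field_value, value: field_value not in value,
}

def filter_records(records, domain):
    """Keep the records that satisfy every (field_name, operator, value) triple."""
    return [
        record for record in records
        if all(OPERATORS[op](record[field], value) for field, op, value in domain)
    ]

# Hypothetical license records; 'licensetype' holds the id of the related type.
licenses = [
    {"name": "Class A", "licensetype": 1},
    {"name": "Class B", "licensetype": 1},
    {"name": "Pilot", "licensetype": 2},
]
selected_type = 1
print(filter_records(licenses, [("licensetype", "=", selected_type)]))
```

Only the two licenses whose licensetype equals the selected type remain, which is the narrowing effect wanted for the second selection box.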
Read more about domain
You can easily achieve this by defining domain for that field, no need to write any extra code.
Just put domain in your xml code.
<field name="licensetype" />
<field name="license" domain="[('licensetype','=',licensetype)]" />
Note :
Remember there must be relation between hr.license and hr.licensetype. licensetype must be Many2one in hr.license.
It will give the same effect as you want.

Elasticsearch: match every position only once

In my Elasticsearch index I have documents that have multiple tokens at the same position.
I want to get a document back when I match at least one token at every position.
The order of the tokens is not important.
How can I accomplish that? I use Elasticsearch 0.90.5.
Example:
I index a document like this.
{
  "field":"red car"
}
I use a synonym token filter that adds synonyms at the same positions as the original token.
So now in the field, there are 2 positions:
Position 1: "red"
Position 2: "car", "automobile"
My solution for now:
To be able to ensure that all positions match, I index the maximum position as well.
{
  "field":"red car",
  "max_position": 2
}
I have a custom similarity that extends DefaultSimilarity and returns 1 for tf(), idf() and lengthNorm(). The resulting score is the number of matching terms in the field.
Query:
{
  "custom_score": {
    "query": {
      "match": {
        "field": "a car is an automobile"
      }
    },
    "_script": "_score*100/doc[\"max_position\"]+_score"
  },
  "min_score":"100"
}
Problem with my solution:
The above search should not match the document, because there is no token "red" in the query string. But it matches, because Elasticsearch counts the matches for car and automobile as two matches and that gives a score of 2 which leads to a script score of 102, which satisfies the "min_score".
If you needed to guarantee 100% matches against the query terms you could use minimum_should_match. This is the more common case.
Unfortunately, in your case, you wish to provide 100% matches of the indexed terms. To do this, you'll have to drop down to the Lucene level and write a custom (java - here's boilerplate you can fork) Similarity class, because you need access to low-level index information that is not exposed to the Query DSL:
Per document/field scanned in the query scorer:
Number of analyzed terms matched (overlap in Lucene terminology; it is used in the coord() method of the DefaultSimilarity class)
Number of total analyzed terms in the field: Look at this thread for a couple different ways to get this information: How to count the number of terms for each document in lucene index?
Then your custom similarity (you can probably even extend DefaultSimilarity) will need to detect queries where terms matched < total terms and multiply their score by zero.
Since query and index-time analysis have already happened at this level of scoring, the total number of indexed terms will already be expanded to include synonyms, as should the query terms, avoiding the false-positive "a car is an automobile" issue above.
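The difference between counting matched terms and requiring a match at every position can be seen in a small plain-Python sketch. It does not use Elasticsearch or Lucene. The document is modelled as a list of token sets, one per position, and the function names are assumptions for the illustration.

```python
def count_matching_terms(positions, query_tokens):
    """What the term-counting score measures: matched tokens, regardless of position."""
    query = set(query_tokens)
    return sum(len(tokens & query) for tokens in positions)

def matches_every_position(positions, query_tokens):
    """The desired rule: at least one matching token at every position."""
    query = set(query_tokens)
    return all(tokens & query for tokens in positions)

# "red car" after the synonym filter: 2 positions, the second holding 2 tokens.
document = [{"red"}, {"car", "automobile"}]

query = "a car is an automobile".split()
print(count_matching_terms(document, query))    # 2, equal to max_position
print(matches_every_position(document, query))  # False, "red" is missing

print(matches_every_position(document, "red automobile".split()))  # True
```

The first query reaches a count of 2 without covering position 1, which is exactly the false positive described in the question.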

SQL from where like with or statement

I have a SQL statement to search for an application by name.
def apps = Application.findAll("from Application as app where lower(app.name) like '%${params.query.toLowerCase()}%' ")
I want to be able to search not just by the application's name, but also by other properties such as type and language. How would I add OR statements to allow me to do this? Thanks!
It's not clear (to me) what framework you're using, or what specific RDBMS your database is on, but the overall thrust of SQL conditions is usually pretty simple.
Pretty much, like most other computer languages, you use and/or/not statements. Of course, in SQL you use the words, instead of (what are usually) symbols:
AND instead of &&
OR instead of ||
NOT instead of !
along with the standard comparison operators:
<, <=, =, <>, >, >=
Note that 'equals' is only a single = sign, and 'not equals' is usually opposed angle brackets <>.
So, given your current statement, if you wanted to only get those applications in the user's language, and exclude apps that are 'disabled', you could do something like the following:
def apps = Application.findAll("from Application as app
where lower(app.name) like '%${params.query.toLowerCase()}%'
and app.language = userLanguageParameter
and app.status <> inactiveAppStatus ");
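To illustrate the OR case from the question (matching the search text against name, type, or language), here is a self-contained Python sketch using the standard sqlite3 module rather than the framework from the question. The table name, column names, and sample rows are assumptions. It also passes the search text as a bound parameter instead of interpolating it into the statement, which avoids SQL injection.

```python
import sqlite3

def make_demo_db():
    """Build a small in-memory table with made-up applications."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE application (name TEXT, type TEXT, language TEXT)")
    conn.executemany(
        "INSERT INTO application VALUES (?, ?, ?)",
        [
            ("PhotoFix", "Editor", "English"),
            ("Traductor", "Utility", "Spanish"),
            ("NoteEditor", "Productivity", "German"),
        ],
    )
    return conn

def search_apps(conn, query):
    """Return names of applications whose name, type, or language contains the query."""
    like = "%" + query.lower() + "%"
    sql = (
        "SELECT name FROM application "
        "WHERE lower(name) LIKE ? OR lower(type) LIKE ? OR lower(language) LIKE ? "
        "ORDER BY name"
    )
    return [row[0] for row in conn.execute(sql, (like, like, like))]

conn = make_demo_db()
print(search_apps(conn, "editor"))   # ['NoteEditor', 'PhotoFix']
print(search_apps(conn, "Spanish"))  # ['Traductor']
```

When OR is mixed with AND conditions such as the language and status checks above, the OR group should be wrapped in parentheses, because AND binds more tightly than OR.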