CVC4 minimize/maximize model optimization

Does CVC4 have an option to maximize or minimize the resulting model for bitvectors, as Z3 does?
Thanks.

Unfortunately, CVC4 does not (yet) support optimization. For bitvectors, you can always do it yourself using multiple queries and binary search, but it's not built-in.
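A minimal sketch of that multi-query binary search, written with the z3py bindings purely for illustration (the same loop works against CVC4 through its SMT-LIB interface); the variable x, the 8-bit width, and the example constraint are all invented for the example:

```python
# Minimize an 8-bit bitvector x subject to some constraints by
# binary-searching on its value with repeated incremental queries.
from z3 import BitVec, Solver, ULE, UGE, sat

x = BitVec('x', 8)
s = Solver()
s.add(UGE(x, 10))            # example constraint: x >= 10 (unsigned)

lo, hi = 0, 2**8 - 1         # unsigned range of an 8-bit value
best = None
while lo <= hi:
    mid = (lo + hi) // 2
    s.push()
    s.add(ULE(x, mid))       # try to force x <= mid
    if s.check() == sat:
        best = s.model()[x].as_long()
        hi = mid - 1         # a model exists; tighten the upper bound
    else:
        lo = mid + 1         # no model with x <= mid; relax the bound
    s.pop()

print("minimum value of x:", best)
```

The loop needs at most 8 queries for an 8-bit variable (one per bit), so the cost stays manageable even without built-in optimization support.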

Related

idea behind xgboost/lightgbm/catboost in comparison

I'm trying to decide which of the following I will use in practice for regression tasks: xgboost, lightgbm, or catboost (Python 3).
So, what is the general idea behind each of them? Why should I choose one over another?
I'm not interested in very slight differences in accuracy score like 0.781 vs 0.782. The results should be tenable, and the tool should be robust and convenient to use. A workhorse.
As I understand these methods, they differ mainly in how they are implemented; otherwise they all implement gradient boosting (GBM).
So you should just try to do some hyperparameter tuning.
Also, it's a good idea to read this comparison:
catboost-vs-light-gbm-vs-xgboost
You cannot determine a priori which tree algorithm (or any algorithm) will automatically be the best. This is because of the No Free Lunch theorem: https://en.wikipedia.org/wiki/No_free_lunch_theorem
It's best to try them all out. You should also throw in Random Forest (RF) as another one to try.
I will say that http://CatBoost.ai (CB) does have one advantage over the others: if you have categorical variables, CB will most likely beat the others because it can handle them directly, without one-hot encoding.
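For example, a quick sketch of passing categorical columns straight to CatBoost via its cat_features argument (the column names and toy data below are made up for illustration):

```python
# CatBoost consumes categorical columns as-is through cat_features,
# so no one-hot encoding step is needed beforehand.
import pandas as pd
from catboost import CatBoostRegressor

df = pd.DataFrame({
    "city":  ["paris", "london", "paris", "berlin"],  # categorical, left as strings
    "rooms": [2, 3, 4, 1],
    "price": [300, 450, 520, 200],
})

model = CatBoostRegressor(iterations=100, verbose=False)
model.fit(df[["city", "rooms"]], df["price"], cat_features=["city"])
print(model.predict(df[["city", "rooms"]]))
```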
You might try http://H2O.ai 's grid search, which supports several algorithms (RF, XGBoost, GBM, linear regression) with hyperparameter tuning, to see which one works best. You can run this overnight. (CB is not included in H2O's grid search.)

How does "minimize" work in Z3

I'm using the minimize function in Z3 a lot, and I'm worried about scalability as the number of variables I'm minimizing grows. What is the underlying algorithm behind "minimize", and is there a general way to speed things up?
See this paper for details on the optimization algorithms used in Z3. Regarding your question about a "general way to speed things up": it's impossible to tell without seeing exactly what you're trying to do and how you are encoding it. Posting a concrete example where things don't "scale" might be helpful.
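For reference, a minimal example of the Optimize engine in z3py (the variables and constraints here are invented); posting something like this, scaled up to the point where it slows down, would make the question concrete:

```python
# Minimal use of Z3's Optimize engine: minimize x + y under simple constraints.
from z3 import Ints, Optimize, sat

x, y = Ints('x y')
opt = Optimize()
opt.add(x >= 3, y >= 2, x + y >= 10)
h = opt.minimize(x + y)                # handle for the objective

if opt.check() == sat:
    print("objective:", opt.lower(h))  # optimal value of x + y
    print("model:", opt.model())
```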

Why are there multiple options for the same SMT solver

In the Leon verifier, why are there different options that use the same solver, even when inductive reasoning happens within Leon? E.g. all three options fairz3, smt-z3, and unrollz3 seem to use a Z3 solver and perform inductive reasoning in Leon.
All three options are doing the same thing in principle, but differ slightly in implementation (leading to different performance/reliability).
The fairz3 option uses the native Z3 API (via the ScalaZ3 library), while smt-z3 communicates with a Z3 process over its standard input (using the SMT-LIB standard via the Scala SMT-LIB library). In order to use smt-z3, you will need to make sure a z3 command is in your PATH.
With fairz3, Leon and Z3 are running in the same process, which means that a crash in Z3 would bring down the whole process, and there is nothing that can be done in Leon to prevent it. When using smt-z3, we run Z3 as a separate process, and we can run Leon in isolation from that process. The process can be killed at any point if it becomes unresponsive (or if Leon decides to time out the solver).
The fair name is due to historical reasons. The original implementation of Leon was based on the native API of Z3 (apparently for performance reasons: it is faster to build formula trees directly in Z3 than to build them in Leon and then translate them for Z3). The solver in Leon ended up being named FairZ3Solver, with Fair as in fair unrolling of the functions. All the unrolling logic was mixed with the Z3 communication.
There is a second (newer) implementation of the inductive unrolling in Leon (known as UnrollingSolver) that is independent of the underlying solver (Z3, CVC4, RandomSolver). That unrolling is just as "fair" as the one provided by fairz3. When you use unrollz3 you are using this UnrollingSolver (which is also used with smt-z3), with the underlying solver being the native interface of Z3 (not the SMT-LIB text interface). The main difference from FairZ3Solver is that, besides being more general, the unrolling is done on Leon trees. This slightly impacts performance.

How do you find the most discriminant terms in binary document classification?

I want to use feature selection to find the terms in a document that are most useful for a binary classification task.
I've been looking around:
This mentions Mutual Information and the chi-squared test metric
http://nlp.stanford.edu/IR-book/html/htmledition/feature-selection-1.html
MATLAB has a number of functions as well:
http://www.mathworks.com/help/toolbox/stats/brj0qbu.html
Feature Selection in MATLAB
Of the above, relieff and rankfeatures look promising.
I do not know if my data follows a normal distribution. Any thoughts on which technique performs the best? Are there any newer methods you would suggest? The focus is to increase classification accuracy.
Thank you!
Since the answer is highly dependent on the nature of your data, I'd suggest playing with several options, possibly using a hold-out set for verification.
The easiest path would probably be to use Weka or RapidMiner for experimenting. Choosing from the plethora of options provided by them, you'll probably get acquainted with several other methods.
Having said that, I have found Mutual Information/Infogain to be useful on a large variety of problems.
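As a concrete starting point, here is a hedged sketch that ranks terms by their mutual information with a binary label using scikit-learn; the toy documents and labels are invented for illustration (swapping in chi2 via SelectKBest follows the same pattern):

```python
# Rank terms by mutual information with a binary class label.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

docs = ["cheap pills buy now", "meeting agenda attached",
        "buy cheap watches", "project status meeting"]   # toy corpus
labels = [1, 0, 1, 0]                                     # 1 = spam, 0 = ham

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

scores = mutual_info_classif(X, labels, discrete_features=True)
for term, score in sorted(zip(terms, scores), key=lambda t: t[1], reverse=True)[:5]:
    print(f"{term}: {score:.3f}")
```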

precision and recall in lucene

Hello all,
I was wondering: if I want to measure precision and recall in Lucene, what's the best way to do it? Is there any sample source code that I can use?
A little background: I am using Lucene to create a kind of search engine for my thesis, and I also want to analyze how well that search engine performs. The only way to do that, I think, is to compute the precision and recall metrics. Any suggestions would be helpful.
Thanks.
You can try these email threads. Alternatively, you can use MRR (mean reciprocal rank).
See also Search Application Relevance Issues.
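If you end up computing the metrics yourself, they are straightforward once you have the set of documents a query returned and the set judged relevant; a small sketch with made-up document IDs:

```python
# Precision = |retrieved ∩ relevant| / |retrieved|
# Recall    = |retrieved ∩ relevant| / |relevant|
def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: IDs returned by a Lucene query vs. the judged-relevant IDs.
p, r = precision_recall(retrieved=[1, 2, 3, 5], relevant=[2, 3, 4])
print(f"precision={p:.2f}, recall={r:.2f}")   # precision=0.50, recall=0.67
```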