Strength reduction techniques with Bison - optimization

I have a parser built with Bison and I'm looking for ways to optimize the code it produces. Would techniques such as using bit shifts for multiplication and division, or checking for x*2 and rewriting it as x+x, make it faster? Or is Bison already optimized for this sort of thing?

You should do it yourself. You are the one declaring what the semantics of the operators are; Bison has no idea what they could be. What if you choose to implement + as printf("error") and * as exp1.erase(exp2)? Is x+x equal to 2*x in your language? Probably not.
Will the optimization make it faster? It depends on your target language. You can measure, estimate, and compare the cost of the two methods, and decide whether that optimization is needed.
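To make that concrete, here is a minimal C sketch of the kind of check you would write yourself, in your own semantic actions or code generator (the function names are mine, purely for illustration; none of this comes from Bison):

    /* Sketch: strength-reduce integer multiplication by a constant.
       This is the kind of logic you put in your own semantic actions
       or code generator; Bison itself never does it for you. */
    #include <stdint.h>

    /* Returns k if n == 2^k (n > 0), otherwise -1. */
    static int log2_exact(uint64_t n)
    {
        if (n == 0 || (n & (n - 1)) != 0)
            return -1;                 /* not a power of two */
        int k = 0;
        while (n >>= 1)
            ++k;
        return k;
    }

    /* Evaluate x * c, using a shift when c is a power of two.
       In an interpreter action this computes a value; in a compiler
       you would emit a shift instruction instead. */
    uint64_t mul_strength_reduced(uint64_t x, uint64_t c)
    {
        int k = log2_exact(c);
        if (k >= 0)
            return x << k;             /* x * 2^k  ->  x << k */
        return x * c;
    }

In a Bison grammar you would call something like mul_strength_reduced (or emit the corresponding shift) from the action of your expr '*' expr rule, but only after convincing yourself that the rewrite is actually valid under your language's semantics.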

Related

Support of trigonometric functions (e.g., cos, tan) in Z3

I would like to use Z3 to optimize a set of equations. The problem is that my equations are non-linear and, most importantly, they contain trigonometric functions. Is there a way to deal with that in Z3?
I am using the z3py API.
Transcendental numbers and trigonometric functions are usually not supported by SMT solvers.
As Christopher pointed out (thanks!), Z3 does have support for trigonometric functions and transcendentals, but the support is quite limited. (In practice, this means you shouldn't expect Z3 to decide every formula you throw at it; in complicated cases, it will most likely simply return unknown.)
See https://link.springer.com/chapter/10.1007%2F978-3-642-38574-2_12 for the related publication. There are some examples in the following discussion thread that can get you started: https://github.com/Z3Prover/z3/issues/680
Also, note that the optimizing solver of Z3 doesn't handle nonlinear equations, so you wouldn't be able to optimize them. For this sort of optimization problem, traditional SMT solvers are simply not the right choice.
However, if you're happy with δ-satisfiability (allowing a certain error factor), then check out dReal, which can deal with trigonometric functions: http://dreal.github.io/ Note, though, that as far as I can tell it doesn't perform optimization either.

Can variance be replaced by absolute value in this objective function?

Initially I modeled my objective function as follows:
argmin var(f(x), g(x)) + var(c(x), d(x))
where f, g, c, d are linear functions.
In order to be able to use linear solvers, I remodeled the problem as follows:
argmin abs(f(x), g(x)) + abs(c(x), d(x))
Is it correct to change variance to absolute value in this context? I'm pretty sure they imply the same meaning, namely having the least difference between the two functions.
You haven't given enough context to answer the question. Even though your question doesn't seem to be about regression, in many ways it is similar to the question of choosing between least squares and least absolute deviations approaches to regression. If that term in your objective function is in any sense an error term then the most appropriate way to model the error depends on the nature of the error distribution. Least squares is better if there is normally distributed noise. Least absolute deviations is better in the nonparametric setting and is less sensitive to outliers. If the problem has nothing to do with probability at all then other criteria need to be brought in to decide between the two options.
Having said all this, the two ways of measuring distance are broadly similar. One will be fairly small if and only if the other is -- though they won't be equally small. If they are similar enough for your purposes, then the fact that absolute values can be linearized could be a good motivation to use them. On the other hand -- if the variance-based objective is really a better expression of what you are interested in, then the fact that you can't use LP isn't sufficient justification to adopt absolute values. After all -- quadratic programming is not all that much harder than LP, at least below a certain scale.
To sum up -- they don't imply the same meaning, but they do imply similar meanings; and, whether or not they are similar enough depends upon your purposes.
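For what it's worth, if abs(f(x), g(x)) is meant as |f(x) - g(x)| (an assumption on my part, based on your "least difference" phrasing), the standard LP trick is to introduce one auxiliary variable per absolute-value term:

    % Assuming abs(f(x), g(x)) means |f(x) - g(x)|: introduce t_1, t_2
    \begin{align*}
    \min_{x,\, t_1,\, t_2} \quad & t_1 + t_2 \\
    \text{s.t.} \quad & t_1 \ge f(x) - g(x), \qquad t_1 \ge g(x) - f(x), \\
                      & t_2 \ge c(x) - d(x), \qquad t_2 \ge d(x) - c(x).
    \end{align*}

At any optimum t_1 = |f(x) - g(x)| and t_2 = |c(x) - d(x)|, and every constraint is linear because f, g, c, d are -- which is exactly why the absolute-value version fits an LP solver while the variance-based one does not.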

GCC optimization flags for matrix/vector operations

I am performing matrix operations using C. I would like to know which compiler optimization flags can improve the speed of execution of these matrix operations for double and int64 data -- like multiplication, inverse, etc. I am not looking for hand-optimized code; I just want to make the native code faster using compiler flags and learn more about these flags.
The flags that I have found so far which improve matrix code are:
-O3/O4
-funroll-loops
-ffast-math
First of all, I don't recommend using -ffast-math for the following reasons:
1. It has been shown that performance actually degrades when using this option in most (if not all) cases. So "fast math" is not actually that fast.
2. This option breaks strict IEEE compliance on floating-point operations, which ultimately results in the accumulation of computational errors of an unpredictable nature.
3. You may well get different results in different environments, and the difference may be substantial. The term "environment" here means the combination of hardware, OS, and compiler, which means that the number of situations in which you can get unexpected results grows combinatorially.
4. Another sad consequence is that programs which link against a library built with this option might expect correct (IEEE-compliant) floating-point math, and this is where their expectations break, but it will be very tough to figure out why.
Finally, have a look at this article.
For the same reasons, you should avoid -Ofast (as it includes the evil -ffast-math). An extract from the GCC documentation:
-Ofast
Disregard strict standards compliance. -Ofast enables all -O3 optimizations. It also enables optimizations that are not valid for all standard-compliant programs. It turns on -ffast-math and the Fortran-specific -fno-protect-parens and -fstack-arrays.
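To make the IEEE-compliance point concrete, here is a small demo of the kind of code whose behavior may change under -ffast-math; it relies on the documented fact that -ffast-math implies -ffinite-math-only, which lets GCC assume NaNs never occur:

    /* fastmath_demo.c -- compile twice and compare:
         gcc -O2 fastmath_demo.c && ./a.out
         gcc -O2 -ffast-math fastmath_demo.c && ./a.out
       Under -ffast-math GCC may fold the x != x test to false,
       because it is allowed to assume x is never NaN. */
    #include <stdio.h>

    int main(void)
    {
        volatile double zero = 0.0;  /* volatile blocks compile-time folding */
        double x = zero / zero;      /* NaN under IEEE 754 */

        if (x != x)
            printf("x is NaN (strict IEEE behavior)\n");
        else
            printf("the NaN test was optimized away\n");

        return 0;
    }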
There is no such flag as -O4. At least I'm not aware of one, and there is no trace of it in the official GCC documentation. So the maximum in this regard is -O3, and you should definitely be using it, not only to optimize math but in release builds in general.
-funroll-loops is a very good choice for math routines, especially involving vector/matrix operations where the size of the loop can be deduced at compile-time (and as a result unrolled by the compiler).
I can recommend two more flags: -march=native and -mfpmath=sse. Similarly to -O3, -march=native is good in general for release builds of any software, not only math-intensive ones. -mfpmath=sse enables the use of XMM registers in floating-point instructions (instead of the x87 floating-point stack).
Furthermore, I'd like to say that it's a pity you don't want to modify your code to get better performance, as that is the main source of speedup for vector/matrix routines. Thanks to SIMD, SSE intrinsics, and vectorization, heavy linear-algebra code can be orders of magnitude faster than without them. However, proper application of these techniques requires in-depth knowledge of their internals and quite some time/effort to modify (actually, rewrite) the code.
Nevertheless, there is one option that could be suitable in your case. GCC offers auto-vectorization, which can be enabled by -ftree-vectorize; since you are using -O3 there is no need to pass it explicitly (because -O3 already includes -ftree-vectorize). The point is that you should still help GCC a little to understand which code can be auto-vectorized. The modifications are usually minor (if needed at all), but you have to familiarize yourself with them. So see the Vectorizable Loops section in the link above.
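As an illustration (my own sketch, not from the GCC docs), this is the loop shape the auto-vectorizer handles well: unit stride, no aliasing, no data-dependent control flow:

    /* Compile with, e.g.:
         gcc -O3 -march=native -fopt-info-vec -c vec_add.c
       and GCC should report this loop as vectorized.  The 'restrict'
       qualifiers promise that the arrays do not overlap, which is often
       the one hint the auto-vectorizer is missing. */
    void vec_add(int n, const double *restrict a,
                 const double *restrict b, double *restrict c)
    {
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

Note that floating-point reductions (dot products, row sums) are a different story: without -ffast-math or -fassociative-math, GCC must preserve the order of additions and will usually leave them scalar.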
Finally, I recommend you look into Eigen, a C++ template-based library with highly efficient implementations of the most common linear algebra routines. It utilizes all the techniques mentioned so far in a very clever way. The interface is purely object-oriented, neat, and pleasant to use. The object-oriented approach is a natural fit for linear algebra, which manipulates objects such as matrices, vectors, quaternions, rotations, and filters. As a result, when programming with Eigen you never have to deal with low-level concepts (such as SSE or vectorization) yourself; you can just enjoy solving your specific problem.

OpenMP with matrices and vectors

What is the best way to utilize OpenMP with a matrix-vector product? Would the for directive suffice (and if so, where should I place it? I assume the outer loop would be more efficient), or would I need schedule, etc.?
Also, how would I take advantage of different algorithms to attempt this m-v product most efficiently?
Thanks
The first step you should take is the obvious one: wrap the outermost loop in a parallel for directive, as you assume. It's always worth experimenting a bit to get some evidence to support your (and my) assumptions, but if you were only allowed to make one change, that would be the one to make.
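For concreteness, here is a minimal sketch of that one change on a dense matrix-vector product (row-major storage and the function name are my own assumptions):

    /* y = A*x for a dense n x n row-major matrix; compile with -fopenmp.
       Each thread gets a block of rows, and each y[i] is written by
       exactly one thread, so no synchronization or reduction is needed. */
    void matvec_omp(int n, const double *A, const double *x, double *y)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;              /* private to this iteration */
            for (int j = 0; j < n; ++j)
                sum += A[i * n + j] * x[j];
            y[i] = sum;
        }
    }

The default (static) schedule is fine here because every row costs the same; a schedule clause only starts to pay off when iterations have uneven cost.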
I don't know much about cache-oblivious algorithms, but I understand that they generally work by recursively dividing a problem into sub-problems. This doesn't seem to fit with the application of parallel for directives. You could probably implement such an algorithm with OpenMP's tasks, but I suspect the overhead of doing so would outweigh any execution improvements on any m-v product of reasonable dimensions.
(If you demonstrate the falsity of this argument on m-v products of size N I will retort 'N's not a reasonable dimension'. As ever with these performance questions, evidence trumps argument every time.)
Finally, depending on your compiler and the availability of libraries, you may not need OpenMP for m-v calculations at all: you might find that auto-parallelisation works efficiently, or you may already have a library implementation that multi-threads this sort of computation.

Do lex and yacc provide optimized code?

Do Lex and Yacc provide optimized code or is it required that we write our own code manually for higher performance?
The code you write has a substantial effect on the speed, especially on the lexer side. Some versions of Flex come with half a dozen (or so) different word counters, most written with Flex, and a few written by hand -- the code gives a pretty good idea of how to optimize scanning speed (and a fairly reasonable comparison of what you can expect in hand-written vs. machine generated lexers).
On the parser side, you're generally a bit more constrained -- you can't make as many changes without affecting the semantics. Here, a great deal depends on the parser generator you use -- for example, some algorithms for less constrained grammars consistently produce relatively slow parsers, while other algorithms only slow down for the less constrained constructs, and run relatively fast as long as the input doesn't use the more complex constructs.
Yacc produces a table-driven parser, which can never be as fast as a well-written hand-coded one. I don't know whether the same applies to Lex.