LLVM ScalarEvolution Pass Cannot Compute Exit Count for Loop Vectorizer

I'm trying to figure out how to run LLVM's built-in loop vectorizer. I have a small program containing an extremely simple loop (I had some output at one point which is why stdio.h is still being included despite never being used):
#include <stdio.h>

unsigned NUM_ELS = 10000;

int main() {
    int A[NUM_ELS];

    #pragma clang loop vectorize(enable)
    for (int i = 0; i < NUM_ELS; ++i) {
        A[i] = i*2;
    }

    return 0;
}
As you can see, it does nothing at all useful; I just need the for loop to be vectorizable. I'm compiling it to LLVM bitcode with
clang -emit-llvm -O0 -c loop1.c -o loop1.bc
llvm-dis -f loop1.bc
Then I'm applying the vectorizer with
opt -loop-vectorize -force-vector-width=4 -S -debug loop1.ll
However, the debug output gives me this:
LV: Checking a loop in "main" from loop1.bc
LV: Loop hints: force=? width=4 unroll=0
LV: Found a loop: for.cond
LV: SCEV could not compute the loop exit count.
LV: Not vectorizing: Cannot prove legality.
I've dug around in the LLVM source a bit, and it looks like SCEV comes from the ScalarEvolution pass, which has the task of (among other things) counting the number of back edges back to the loop condition, which in this case (if I'm not mistaken) should be the trip count minus the first trip (so 9,999 in this case). I've run this pass on a much larger benchmark and it gives me the exact same error at every loop, so I'm guessing it isn't the loop itself, but that I'm not giving it enough information.
I've spent quite a bit of time combing through the documentation and Google results to find an example of a full opt command using this transformation, but have been unsuccessful so far; I'd appreciate any hints as to what I may be missing (I'm new to vectorizing code so it could be something very obvious).
Thank you,
Stephen

Vectorization depends on a number of other optimizations that need to run first. None of them run at -O0, so you cannot expect your code to be vectorized as-is.
Adding -O2 before -loop-vectorize on the opt command line should help here (make sure your 'A' array is external or otherwise used; otherwise everything will be optimized away).


What is the perl6 equivalent of @INC, please?

I go
export PERL6LIB="/GitHub/perl6-Units/lib"
and then
echo $PERL6LIB
/GitHub/perl6-Units/lib
But when I run perl6 t/01-basic.t
use v6;
use Test;
plan 3;
lives-ok {
use Units <m>;
ok @Units::UNITS.elems > 0;
ok (0m).defined;
}
done-testing;
I still get an error
===SORRY!===
Could not find Units at line 8 in:
/Users/--me--/.perl6
/usr/local/Cellar/rakudo-star/2018.01/share/perl6/site
/usr/local/Cellar/rakudo-star/2018.01/share/perl6/vendor
/usr/local/Cellar/rakudo-star/2018.01/share/perl6
CompUnit::Repository::AbsolutePath<140707489084448>
CompUnit::Repository::NQP<140707463117264>
CompUnit::Repository::Perl5<140707463117304>
In Perl 5 I would have used print "@INC"; to see what paths are searched for the lib before the error is thrown. Using say flat $*REPO.repo-chain.map(*.loaded); runs either before the module loads or after the exception is thrown.
Any help would be much appreciated - or maybe a hint on what to put in ~/.perl6 as I can't get a symlink to work either.
The error message itself is telling you what the library paths available are. You are failing to print them because you are expecting a run time action ( say ) to take place before a compile time error -- you could print out $*REPO at compile time, but again the exception is already showing you what you wanted.
$ PERL6LIB="/GitHub/perl6-Units/lib" perl6 -e 'BEGIN say $*REPO.repo-chain; use Foo;'
(file#/GitHub/perl6-Units/lib inst#/Users/ugexe/.perl6 inst#/Users/ugexe/.rakudobrew/moar-2018.08/install/share/perl6/site inst#/Users/ugexe/.rakudobrew/moar-2018.08/install/share/perl6/vendor inst#/Users/ugexe/.rakudobrew/moar-2018.08/install/share/perl6 ap# nqp# perl5#)
===SORRY!===
Could not find Foo at line 1 in:
/GitHub/perl6-Units/lib
/Users/ugexe/.perl6
/Users/ugexe/.rakudobrew/moar-2018.08/install/share/perl6/site
/Users/ugexe/.rakudobrew/moar-2018.08/install/share/perl6/vendor
/Users/ugexe/.rakudobrew/moar-2018.08/install/share/perl6
CompUnit::Repository::AbsolutePath<140337382425072>
CompUnit::Repository::NQP<140337350057496>
CompUnit::Repository::Perl5<140337350057536>
You can see /GitHub/perl6-Units/lib is showing up in the available paths, which is unlike your example. I'd question whether your shell/env is actually set up correctly.

Yosys logic loop falsely detected

I've been testing yosys for some use cases.
Version: Yosys 0.7+200 (git sha1 155a80d, gcc-6.3 6.3.0 -fPIC -Os)
I wrote a simple block which converts gray code to binary:
module gray2bin (gray, bin);
parameter WDT = 3;
input [WDT-1:0] gray;
output [WDT-1:0] bin;
assign bin = {gray[WDT-1], bin[WDT-1:1]^gray[WDT-2:0]};
endmodule
This is acceptable, valid Verilog, and there is no loop in it.
It passes compilation and synthesis without any warnings in other tools.
But when I run the following commands in yosys:
read_verilog gray2bin.v
scc
I get that a logic loop was found:
Found an SCC: $xor$gray2bin.v:11$1
Found 1 SCCs in module gray2bin.
Found 1 SCCs.
The following code, which is equivalent, passes the check:
module gray2bin2 (
gray,
bin
);
parameter WDT = 3;
input [WDT-1:0] gray;
output [WDT-1:0] bin;
assign bin[WDT-1] = gray[WDT-1];
genvar i;
generate
for (i = WDT-2; i>=0; i=i-1) begin : gen_serial_xor
assign bin[i] = bin[i+1]^gray[i];
end
endgenerate
endmodule
Am I missing a flag or synthesis option of some kind?
Using word-wide operators, this circuit clearly has a loop, as the schematic generated with yosys -p 'prep; show' gray2bin.v shows.
You have to synthesize the circuit to a gate-level representation to get a loop-free version (schematic generated with yosys -p 'synth; splitnets -ports; show' gray2bin.v; the call to splitnets is just there for better visualization).
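To see why the loop disappears at the bit level, here is a minimal Python sketch of the same conversion (not Yosys code; the function name gray_to_bin and the default width are assumptions for illustration). Each output bit is computed only from the bit just above it, so the bit-level dependencies form a chain, even though the word-level expression feeds bin back into itself.

```python
def gray_to_bin(gray: int, width: int = 3) -> int:
    """Convert a Gray-coded value to binary, one bit at a time."""
    # The MSB passes straight through: bin[width-1] = gray[width-1].
    bit = (gray >> (width - 1)) & 1
    result = bit << (width - 1)
    # Each lower bit is bin[i] = bin[i+1] ^ gray[i]. The bit above is
    # already known when bin[i] is computed, so nothing depends on itself.
    for i in range(width - 2, -1, -1):
        bit ^= (gray >> i) & 1
        result |= bit << i
    return result


# Round-trip check against the usual binary-to-Gray mapping b ^ (b >> 1).
for b in range(8):
    assert gray_to_bin(b ^ (b >> 1)) == b
```

The loop order mirrors the generate block in the second Verilog module, which is the form that scc accepts without synthesis.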
The answer given by CliffordVienna does give a solution, but I also want to clarify that it is not suitable for all purposes.
My analysis was done for the purpose of formal verification. Because I replaced prep with synth to get rid of the falsely identified logic loops, my formal code got optimized. Wires I had created that were driven only by the assume property pragma were removed, which made many assertions redundant.
It's not correct to reduce any logic for the purpose of behavioral verification.
Therefore, if the purpose is to prepare a verification database, I suggest not to use the synth command, but to use a subset of commands the synth command executes.
You can find those commands under:
http://www.clifford.at/yosys/cmd_synth.html
In general, I've used all the commands specified in the above link that do not optimize logic:
hierarchy -check
proc
check
wreduce
alumacc
fsm
memory -nomap
memory_map
techmap
abc -fast
hierarchy -check
stat
check
And everything works as expected.

Why are inline if statements an average of at least one-third slower than other types of if?

Consider the following Perl 6 script skeleton:
my regex perlish { .*[ea]?[ui]? rl $ }
my Str @words = '/usr/share/dict/words'.IO.lines;
for @words -> $word {
...
}
The base idea for the code in this question comes from the Perl 6 website's examples.
My /usr/share/dict/words is an indirect symbolic link to /usr/share/dict/american-english. It's 99,171 lines long, with one word/line.
For comparison's sake, Python 3 does 100 loops of the below in a total of 32 seconds: that's just 0.32942s / loop. [1]
Here are the things I've tried putting in place of the stub code, with their benchmark times as noted:
"Inline" if — 100 loops, average 9.74219s / loop, totalling 16 min 14.219s
say "$word probably rhymes with Perl" if $word ~~ /<perlish>/;
say "$word is a palindrome" if $word eq $word.flip && $word.chars > 1;
Short Circuit (not ternary) — 10 loops, average 6.1925s / loop, normalised to totalling +/- 10.3 min
$word eq $word.flip && $word.chars > 1 && say "$word is a palindrome";
$word ~~ /<perlish>/ && say "$word probably rhymes with Perl";
given/when (switch/case) — 100 loops, average 6.18568s / loop totalling 10 min 18.568s
given $word {
when /<perlish>/
{ say "$word probably rhymes with Perl"; proceed; }
when $word eq $word.flip && $word.chars > 1
{ say "$word is a palindrome"; proceed; }
}
"normal" if block — 100 loops, average 6.0588s / loop totalling 10 min 5.880s
if $word eq $word.flip && $word.chars > 1 { say "$word is a palindrome"; }
if $word ~~ /<perlish>/ { say "$word probably rhymes with Perl"; }
Somewhat unsurprisingly, the normal if block is fastest. But, why is the inline if (what the website uses for an example) so much slower?
[1] I'm not saying Perl 6 is slow... but I thought Python was slow and... wow. Perl 6 is slow... ignoring multithreading, parallelism and concurrency, all of which are built into Perl 6 and where Python leaves much to be desired.
Specs: Rakudo version 2015.12-219-gd67cb03 on MoarVM version 2015.12-29-g8079ca5 implementing Perl 6.c on a
2.2GHz QuadCore Intel Mobile i7 with 6GB of RAM.
I ran the tests like time for i in `seq 0 100`; do perl6 --optimize=3 words.pl6; done.
(This page became the p6doc Performance page.)
Dealing with Perl 6 speed issues
I don't know why the statement modifier form of if is slower. But I can share things that can help folk deal with Perl 6 speed issues in general so I'll write about those, listed easiest first. (I mean easiest things for users and potential users to do, not easiest for compiler devs.)
Why does the speed of your code matter?
I recommend you share your answer to these higher level questions:
How much faster would your code need to run to make a worthwhile difference? Could the full speed up wait another month? Another year?
Are you exploring Perl 6 for fun, assessing its potential long term professional relevance to you, and/or using it in your $dayjob?
Wait for Rakudo to speed up
5 years ago Rakudo was 1,000 times slower or more for some operations. It's been significantly speeding up every year for years even though speeding it up was explicitly not the #1 dev priority. (The mantra has been "make it work, make it work right, make it fast". 2016 is the first year in which the "make it work fast" aspect is truly in the spotlight.)
So, imo, one sensible option if the Rakudo Perl 6 compiler is really too slow for what you want to do, is to wait for others to make it faster for you. It could make sense to wait for the next official release (there's at least several each year) or wait a year or three depending on what you're looking for.
Visit the freenode IRC channel #perl6
Compiler devs, the folk who best know how to speed up Perl 6 code, aren't answering SO questions. But they are generally responsive on #perl6.
If you don't get all the details or results you want from here then your best bet is to join the freenode IRC channel #perl6 and post your code and timings. (See next two headings for how best to do that.)
Profile code snippets
Rakudo on MoarVM has a built in profiler:
$ perl6 --profile -e 'say 1'
1
Writing profiler output to profile-1453879610.91951.html
The --profile option is currently only for micro-analysis -- the output from anything beyond a tiny bit of code will bring your browser to its knees. But it could be used to compare profiles of simple snippets using if conventionally vs as a statement modifier. (Your regex using examples are almost certainly too complex for the current profiler.)
Profiling results may well mean little to you without help and/or may point to confusing internal stuff. If so, please visit #perl6.
Write faster Perl 6 code, line by line
Your immediate focus seems to be the question of why one way of writing a line of code is slower than another way. But the flipside of this "academic" question is the practical one of writing faster lines of code.
But if someone's a Perl 6 newbie, how are they going to know how? Asking here is one way but the recommended approach is visiting #perl6 and letting folk know what you want.
#perl6 has on-channel evalbots that help you and others investigate your issue together. To try code snippets out publicly enter m: your code goes here. To do so privately write /msg camelia m: your code goes here.
For simple timing use variations on the idiom now - INIT now. You can also generate and share --profile results easily using a #perl6 evalbot. Just join the channel and enter prof-m: your code goes here.
Write faster Perl 6 code by refactoring
Use better algorithms, especially parallel/concurrent ones.
Use native arrays (eg Array[int8] for an array of 8 bit integers) for compact, faster number crunching.
For more info about doing this, visit #perl6.
Use (faster) foreign code
Use NativeCall wrappers for C libs such as Gumbo or for C++ libs (experimental). NativeCall itself is currently poorly optimized but that's set to change in 2016 and for many applications the NativeCall overhead is a small part of performance anyway.
Inline::Perl5 builds on NativeCall to enable use of Perl 5 in Perl 6 (and vice-versa) including arbitrary Perl 5 code and high-performance Perl 5 XS modules. This interop allows passing integers, strings, arrays, hashes, code references, file handles and objects between Perl 5 and Perl 6; calling methods on Perl 5 objects from Perl 6 and calling methods on Perl 6 objects from Perl 5; and subclassing Perl 5 classes in Perl 6.
(There are similar but less mature or even alpha variants for other langs like Inline::Python, Inline::Lua and Inline::Ruby.)
Review benchmarks
The best relevant benchmarking tool I know of is perl6-bench which compares various versions of Perl with each other including various versions of both Perl 5 and Perl 6.
There may already be benchmarks contrasting a regular if statement and a statement modifier form if statement but I doubt it. (And if not, you would be making a nice contribution to Perl 6 if you wrote an extremely simple pair of snippets and got them added to perl6-bench.)
Help speed Rakudo up
The Rakudo Perl 6 compiler is largely written in Perl 6. So if you can write Perl 6, then you can hack on the compiler, including optimizing any of the large body of existing high-level code that impacts the speed of your code.
Most of the rest of the compiler is written in a small language called NQP that's almost just a subset of Perl 6. So if you can write Perl 6 you can fairly easily learn to use and improve the middle-level NQP code too.
Finally, if low-level C hacking is your idea of fun, check out MoarVM.
I had a different answer before, which was based on a piece of code I accidentally left in between benchmark runs.
Given this benchmark code:
my regex perlish { [ea?|u|i] rl $ }
my Str @words = '/usr/share/dict/words'.IO.lines;
multi sub MAIN('postfixif') {
    for @words -> $word {
        say "$word probably rhymes with Perl" if $word ~~ / [ea?|u|i] rl $ /;
        say "$word is a palindrome" if $word eq $word.flip && $word.chars > 1;
    }
}
multi sub MAIN('prefixif') {
    for @words -> $word {
        if $word ~~ /[ea?|u|i] rl $ / { say "$word probably rhymes with Perl" };
        if $word eq $word.flip && $word.chars > 1 { say "$word is a palindrome" };
    }
}
multi sub MAIN('postfixif_indirect') {
    for @words -> $word {
        say "$word probably rhymes with Perl" if $word ~~ / <perlish> /;
        say "$word is a palindrome" if $word eq $word.flip && $word.chars > 1;
    }
}
multi sub MAIN('prefixif_indirect') {
    for @words -> $word {
        if $word ~~ / <perlish> / { say "$word probably rhymes with Perl" };
        if $word eq $word.flip && $word.chars > 1 { say "$word is a palindrome" };
    }
}
multi sub MAIN('shortcut') {
    for @words -> $word {
        if $word.ends-with('rl') && $word ~~ / [ea?|u|i] rl $ / { say "$word probably rhymes with Perl" };
        if $word eq $word.flip && $word.chars > 1 { say "$word is a palindrome" };
    }
}
I get the following results:
3x postfixif_indirect: real 1m20.470s
3x prefixif_indirect: real 1m21.970s
3x postfixif: real 0m50.242s
3x prefixif: real 0m49.946s
3x shortcut: real 0m8.077s
The postfixif_indirect code corresponds to your "Inline" if, the prefixif_indirect code corresponds to your "normal" if block. The ones without "_indirect" just have the regex itself in the if statement rather than indirectly called as <perlish>.
As you can see, the speed difference between regular if blocks and postfix if is barely measurable on my machine. But also, I was measuring against a different file from yours. Mine has 479,828 lines, so you can't directly compare the timings anyway.
However, a quick glance over the profile output from perl6 --profile pointed out that 83% of total time was spent in ACCEPTS (which is the method that implements the smart match operator ~~) or in things called by it.
What tipped me off to the fact that the indirect call to perlish may be expensive was that the time spent inside perlish was only 60%. So about 23% of time was spent doing some sort of setup work before perlish could even start matching against the string. Pretty bad, I admit. Surely, this'll be a good target for optimization.
But the biggest gain was adding a short-circuiting check just to see if the string ends in "rl". This gets our code down to 10% of what it used to take.
Our regex engine surely deserves a whole lot more optimization. Potentially, if a regex can be statically known to only ever match if the target string starts with or ends in a specific substring, it could have a check emitted up-front so that none of the setup work has to be done in the "failure to match" case.
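Since the question compares against Python, here is a minimal Python sketch of the same short-circuit idea. The regex is my translation of / [ea?|u|i] rl $ / (assuming the Perl 6 square brackets map to a non-capturing group), and the function names and sample words are made up for illustration. The cheap suffix test runs first, so the regex is only tried on the few words that already end in "rl".

```python
import re

# Assumed Python equivalent of the Perl 6 regex / [ea?|u|i] rl $ /.
PERLISH = re.compile(r"(?:ea?|u|i)rl$")


def rhymes_with_perl(word: str) -> bool:
    # Cheap suffix check first; "and" short-circuits, so the regex
    # is skipped entirely for words that do not end in "rl".
    return word.endswith("rl") and PERLISH.search(word) is not None


def is_palindrome(word: str) -> bool:
    # Same test as the Perl 6 code: more than one char and equal to its flip.
    return len(word) > 1 and word == word[::-1]


# Hypothetical sample words standing in for /usr/share/dict/words.
for word in ["pearl", "curl", "whorl", "level", "a"]:
    if rhymes_with_perl(word):
        print(f"{word} probably rhymes with Perl")
    if is_palindrome(word):
        print(f"{word} is a palindrome")
```

How much this guard helps depends on how expensive the regex setup is relative to the suffix test, which is exactly what the profile above suggests is lopsided in Rakudo at the moment.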
We'll definitely see what 2016 will bring. I'm already excited for sure!
EDIT: Even though I used "for i in seq 0 100", that only executes things three times on my machine. I have no clue what's up with that, but I corrected the timing lines to say 3x instead of 100x.

while [[ condition ]] stalls on loop exit

I have a problem with ksh in that a while loop is failing to obey the "while" condition. I should add now that this is ksh88 on my client's Solaris box. (That's a separate problem that can't be addressed in this forum. ;) I have seen Lance's question and some similar but none that I have found seem to address this. (Disclaimer: NO I haven't looked at every ksh question in this forum)
Here's a very cut down piece of code that replicates the problem:
#!/usr/bin/ksh
#
go=1
set -x
tail -0f loop-test.txt | while [[ $go -eq 1 ]]
do
    read lbuff
    set $lbuff
    nwords=$#
    printf "Line has %d words <%s>\n" $nwords "${lbuff}"
    if [[ "${lbuff}" = "0" ]]
    then
        printf "Line consists of %s; time to absquatulate\n" $lbuff
        go=0 # Violate the WHILE condition to get out of loop
    fi
done
printf "\nLooks like I've fallen out of the loop\n"
exit 0
The way I test this is:
Run loop-test.sh in background mode
In a different window I run commands like "echo some nonsense >>loop-test.txt" (w/o the quotes, of course)
When I wish to exit, I type "echo 0 >>loop-test.txt"
What happens? It indeed sets go=0 and displays the line:
Line consists of 0; time to absquatulate
but does not exit the loop. To break out I append one more line to the txt file. The loop does NOT process that line and just falls out of the loop, issuing that "fallen out" message before exiting.
What's going on with this? I don't want to use "break" because in the actual script, the loop is monitoring the log of a database engine and the flag is set when it sees messages that the engine is shutting down. The actual script must still process those final lines before exiting.
Open to ideas, anyone?
Thanks much!
-- J.
OK, that flopped pretty quick. After reading a few other posts, I found an answer given by dogbane that sidesteps my entire pipe-to-while scheme. His is the second answer to a question (from 2013) where I see neeraj is using the same scheme I'm using.
What was wrong? The pipe-to-while has always worked for input that will end, like a file or a command with a distinct end to its output. However, from a tail command, there is no distinct EOF. Hence, the while-in-a-subshell doesn't know when to terminate.
Dogbane's solution: Don't use a pipe. Applying his logic to my situation, the basic loop is:
while read line
do
# put loop body here
done < <(tail -0f ${logfile})
No subshell, no problem.
Caveat about that syntax: There must be a space between the two < operators; otherwise it looks like a here-document with bad syntax.
Er, one more catch: The syntax did not work in ksh, not even in the mksh (under cygwin) which emulates ksh93. But it did work in bash. So my boss is gonna have a good laugh at me, 'cause he knows I dislike bash.
So thanks MUCH, dogbane.
-- J
After articulating the problem and sleeping on it, the reason for the described behavior came to me: After setting go=0, the control flow of the loop still depends on another line of data coming in from STDIN via that pipe.
And now that I have realized the cause of the weirdness, I can speculate on an alternative way of reading from the stream. For the moment I am thinking of the following solution:
Open the input file as STDIN (Need to research the exec syntax for that)
When the condition occurs, close STDIN (Again, need to research the syntax for that)
It should then be safe to use the more intuitive "while read lbuff" at the top of the loop.
I'll test this out today and post the result. I hope someone else benefits from the method (if it works).
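For comparison, the control flow I am aiming for can be sketched in a few lines of Python (not ksh; the function name process_until_sentinel and the in-memory input are assumptions for illustration). The read is the loop condition, and the exit check happens only after the current line has been handled, so the final line is still processed before the loop ends.

```python
import io
from typing import Iterable, List


def process_until_sentinel(lines: Iterable[str], sentinel: str = "0") -> List[str]:
    """Handle each line, stopping after the sentinel line has been handled."""
    processed = []
    for raw in lines:  # reading the next line is the loop condition
        line = raw.rstrip("\n")
        nwords = len(line.split())
        processed.append(f"Line has {nwords} words <{line}>")
        if line == sentinel:
            # The sentinel line has already been processed above,
            # so leaving here does not lose it.
            break
    return processed


# In-memory stand-in for the log file; a real follower would poll the file.
demo = io.StringIO("some nonsense\n0\nnever read\n")
for message in process_until_sentinel(demo):
    print(message)
```

This only shows the loop structure; it does not address how a tail -f style reader gets shut down, which is the part that differs between shells.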

Print only nonzero results using AMPL + Neos server

I'm building a relatively large optimization model. The final model will use 15 time steps, but while testing I am only using 4. Even with 11 fewer time steps than intended, the model still prints 22,000 rows of variables, of which perhaps a hundred differ from 0.
Does anyone see a way past this? I.e. a way using NEOS server to only print the variable name and corresponding value if it is higher than 0.
What I've tested is:
solve;
option omit_zero_rows 0; (also tried 1;)
display _varname, _var;
Using either omit_zero_rows 0; or omit_zero_rows 1; still prints every result, not only those higher than 0.
I've also tried:
solve;
if _var > 0 then {
display _varname, _var;
}
but it gave me a syntax error. Both (or really, all three) variants were tested in the .run file I use for NEOS server.
I'm posting a solution to this issue, as I believe that this is an issue more people will stumble upon. Basically, in order to print only non-zero values using NEOS Server write your command file (.run file) as:
solve;
display {j in 1.._nvars: _var[j] > 0} (_varname[j], _var[j]);
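The filter in that display statement can be pictured with a small Python sketch (not AMPL; the function names and the sample values are made up for illustration). Note that the condition _var[j] > 0 keeps only strictly positive values, so a variable with a negative value would be dropped as well; the second function shows a nonzero test for models where that matters.

```python
from typing import Dict, List, Tuple


def positive_vars(values: Dict[str, float]) -> List[Tuple[str, float]]:
    """Mirror of the '> 0' condition: keep strictly positive values only."""
    return [(name, value) for name, value in values.items() if value > 0]


def nonzero_vars(values: Dict[str, float], tol: float = 0.0) -> List[Tuple[str, float]]:
    """Keep every value whose magnitude exceeds tol, including negatives."""
    return [(name, value) for name, value in values.items() if abs(value) > tol]


# Made-up solver output standing in for _varname / _var.
solution = {"x[1]": 0.0, "x[2]": 3.5, "y[1]": -2.0, "y[2]": 0.0}
print(positive_vars(solution))
print(nonzero_vars(solution))
```

If all variables in the model are constrained to be nonnegative, the two filters give the same result.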