How to prevent common sub-expression elimination (CSE) with GHC - optimization

Given the program:
import Debug.Trace
main = print $ trace "hit" 1 + trace "hit" 1
If I compile with ghc -O (7.0.1 or higher) I get the output:
hit
2
i.e. GHC has used common sub-expression elimination (CSE) to rewrite my program as:
main = print $ let x = trace "hit" 1 in x + x
If I compile with -fno-cse then I see hit appearing twice.
Is it possible to avoid CSE by modifying the program? Is there any sub-expression e for which I can guarantee e + e will not be CSE'd? I know about lazy, but can't find anything designed to inhibit CSE.
The background of this question is the cmdargs library, where CSE breaks the library (due to impurity in the library). One solution is to ask users of the library to specify -fno-cse, but I'd prefer to modify the library.

How about removing the source of the trouble -- the implicit effect -- by using a sequencing monad that makes that effect explicit? E.g. the strict identity monad with tracing:
import Debug.Trace (trace)

data Eval a = Done a
            | Trace String a

instance Monad Eval where
    return x        = Done x
    Done x    >>= k = k x
    Trace s a >>= k = trace s (k a)

runEval :: Eval a -> a
runEval (Done x) = x

track = Trace
now we can write stuff with a guaranteed ordering of the trace calls:
main = print $ runEval $ do
    t1 <- track "hit" 1
    t2 <- track "hit" 1
    return (t1 + t2)
while still being pure code, and GHC won't try to get too clever, even with -O2:
$ ./A
hit
hit
2
So we introduce just the computation effect (tracing) sufficient to teach GHC the semantics we want.
This is extremely robust to compiler optimizations. So much so that GHC optimizes the math to 2 at compile time, yet still retains the ordering of the trace statements.
As evidence of how robust this approach is, here's the core with -O2 and aggressive inlining:
main2 =
  case Debug.Trace.trace string trace2 of
    Done x -> case x of
                I# i# -> $wshowSignedInt 0 i# []
    Trace _ _ -> err

trace2 = Debug.Trace.trace string d

d :: Eval Int
d = Done n

n :: Int
n = I# 2

string :: [Char]
string = unpackCString# "hit"
So GHC has done everything it could to optimize the code -- including computing the math statically -- while still retaining the correct tracing.
References: the useful Eval monad for sequencing was introduced by Simon Marlow.
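If you try this on GHC 7.10 or later, the Monad instance also needs Functor and Applicative instances; a minimal sketch that keeps the behaviour above (both are defined in terms of the bind, so the trace ordering is unchanged):
import Control.Monad (ap, liftM)

instance Functor Eval where
    fmap = liftM

instance Applicative Eval where
    pure  = Done
    (<*>) = ap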

Reading the GHC source code, the only expressions that aren't eligible for CSE are those which fail the exprIsBig test. Currently that means the Expr values Note, Let and Case, and expressions which contain those.
Therefore, an answer to the above question would be:
import Debug.Trace

unit = reverse "" `seq` ()

main = print $ trace "hit" (case unit of () -> 1) +
               trace "hit" (case unit of () -> 1)
Here we create a value unit which resolves to (), but whose value GHC can't determine (we use a recursive function GHC can't optimise away; reverse is just a simple one to hand). This means GHC can't CSE the trace function and its two arguments, and we get hit printed twice. This works with both GHC 6.12.4 and 7.0.3 at -O2.
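If you want to convince yourself whether CSE fired, one way (just standard GHC flags; the file name here is a placeholder) is to dump the simplified Core and check whether both applications of trace are still present:
ghc -O2 -fforce-recomp -ddump-simpl Main.hs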

I think you can specify the -fno-cse option in the source file, i.e. by putting a pragma
{-# OPTIONS_GHC -fno-cse #-}
at the top of the file.
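For the cmdargs situation described in the question, this would go at the top of the affected library module; an OPTIONS_GHC pragma in a file header applies only to that module, so users of the library would not have to pass any flags themselves. A minimal sketch (module name hypothetical):
{-# OPTIONS_GHC -fno-cse #-}
-- hypothetical library module; the pragma disables CSE for this module only
module MyImpureLib where
Note that this disables CSE for the whole module, which may cost some optimization elsewhere in that module.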
Another method to avoid common subexpression elimination or let floating in general is to introduce dummy arguments. For example, you can try
let x () = trace "hi" 1 in x () + x ()
This particular example won't necessarily work; ideally, you should specify a data dependency via dummy arguments. For instance, the following is likely to work:
let
  x dummy = trace "hi" $ dummy `seq` 1
  x1 = x ()
  x2 = x x1
in x1 + x2
The result of x now "depends" on the argument dummy and there is no longer a common subexpression.
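Put back into the original example, the dummy-argument trick might look like the following complete program; this is only a sketch, and whether GHC keeps the two calls separate can still depend on the compiler version and flags:
import Debug.Trace

main :: IO ()
main = print $
    let x dummy = trace "hit" (dummy `seq` (1 :: Int))
        x1 = x ()
        x2 = x x1
    in x1 + x2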

I'm a bit unsure about Don's sequencing monad (posting this as an answer because the site doesn't let me add comments). Modifying the example a bit:
main :: IO ()
main = print $ runEval $ do
    t1 <- track "hit 1" (trace "really hit 1" 1)
    t2 <- track "hit 2" 2
    return (t1 + t2)
This gives us the following output:
hit 1
hit 2
really hit 1
That is, the first trace fires when the t1 <- ... statement is executed, not when t1 is actually evaluated in return (t1 + t2). If we define the monadic bind operator as
Done x >>= k = k x
Trace s a >>= k = k (trace s a)
instead, the output will reflect the actual evaluation order:
hit 1
really hit 1
hit 2
That is, the traces will fire when the (t1 + t2) statement is executed, which is (IMO) what we really want. For example, if we change (t1 + t2) to (t2 + t1), this solution produces the following output:
hit 2
hit 1
really hit 1
The output of the original version remains unchanged, and we don't see when our terms are really evaluated:
hit 1
hit 2
really hit 1
Like the original solution, this also works with -O3 (tested on GHC 7.0.3).
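For completeness, here is one way the modified variant could be assembled into a single file. This is a sketch based on Don's definitions with only the bind changed (and runEval extended to handle a trailing Trace), so treat it as illustrative rather than as the exact code behind the outputs above:
import Debug.Trace (trace)

data Eval a = Done a
            | Trace String a

-- On GHC 7.10 and later you would also need the Functor and Applicative
-- instances sketched earlier in this thread.
instance Monad Eval where
    return x        = Done x
    Done x    >>= k = k x
    Trace s a >>= k = k (trace s a)   -- attach the trace to the value itself

runEval :: Eval a -> a
runEval (Done x)    = x
runEval (Trace s x) = trace s x       -- also handle a trailing Trace

track :: String -> a -> Eval a
track = Trace

main :: IO ()
main = print $ runEval $ do
    t1 <- track "hit 1" (trace "really hit 1" (1 :: Int))
    t2 <- track "hit 2" 2
    return (t1 + t2)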

Related

Purpose of anonymous modules in Agda

Going to the root of the Agda standard library and issuing the following command:
grep -r "module _" . | wc -l
yields the following result:
843
Whenever I encounter such anonymous modules (I assume that's what they are called), I cannot quite figure out what their purpose is, despite their apparent ubiquity, nor how to use them, because by definition I can't access their content using their name. I assume this should be possible, otherwise there would be no point in even allowing them to be defined.
The wiki page:
https://agda.readthedocs.io/en/v2.6.1/language/module-system.html#anonymous-modules
has a section called "anonymous modules" which is in fact empty.
Could somebody explain what the purpose of anonymous modules is?
If possible, an example emphasizing why defining such modules is useful, as well as how to use their content, would be very much appreciated.
Here are the possible ideas I've come up with, but none of them seems completely satisfying:
They are a way to group thematically related definitions inside an Agda file.
Their name is somehow inferred by Agda when using the functions they provide.
Their content is only meant to be visible / used inside their enclosing module (a bit like a private block).
Anonymous modules can be used to simplify a group of definitions which share some arguments. Example:
open import Data.Empty
open import Data.Nat

<⇒¬≥ : ∀ {n m} → n < m → n ≥ m → ⊥
<⇒¬≥ = {!!}

<⇒> : ∀ {n m} → n < m → m > n
<⇒> = {!!}

module _ {n m} (p : n < m) where
  <⇒¬≥′ : n ≥ m → ⊥
  <⇒¬≥′ = {!!}

  <⇒>′ : m > n
  <⇒>′ = {!!}
Afaik this is the only use of anonymous modules. When the module _ scope is closed, you can't refer to the module anymore, but you can refer to its definitions as if they hadn't been defined in a module at all (but with extra arguments instead).

Why syntax error in this very simple print command

I am trying to run the following very simple code:
open Str
print (Str.first_chars "testing" 0)
However, it gives the following error:
$ ocaml testing2.ml
File "testing2.ml", line 2, characters 0-5:
Error: Syntax error
There are no further details in the error message.
The same error occurs with print_endline too, and even if no print command is there at all. Hence, the error is in the part: Str.first_chars "testing" 0
Documentation about the above function from here is as follows:
val first_chars : string -> int -> string
first_chars s n returns the first n characters of s. This is the same
function as Str.string_before.
Adding ; or ;; at the end of the second statement does not make any difference.
What is the correct syntax for the above code?
Edit:
With the following code, as suggested by @EvgeniiLepikhin:
open Str
let () =
  print_endline (Str.first_chars "testing" 0)
Error is:
File "testing2.ml", line 1:
Error: Reference to undefined global `Str'
And with this code:
open Str;;
print_endline (Str.first_chars "testing" 0)
Error is:
File "testing2.ml", line 1:
Error: Reference to undefined global `Str'
With just the print command (instead of print_endline) in the above code, the error is:
File "testing2.ml", line 2, characters 0-5:
Error: Unbound value print
Note, my OCaml version is:
$ ocaml -version
The OCaml toplevel, version 4.02.3
I think Str should be built-in, since opam is not finding it:
$ opam install Str
[ERROR] No package named Str found.
I also tried the following code, as suggested in comments by @glennsl:
#use "topfind"
#require "str"
print (Str.first_chars "testing" 0)
But this also gives the same syntax error.
An OCaml program is a list of definitions, which are evaluated in order. You can define values, modules, classes, and exceptions, as well as types, module types, and class types. But let's focus on values for now.
In OCaml, there are no statements, commands, or instructions. It is a functional programming language, where everything is an expression, and when an expression is evaluated it produces a value. The value could be bound to a variable so that it could be referenced later.
The print_endline function takes a value of type string, outputs it to the standard output channel and returns a value of type unit. Type unit has only one value called unit, which could be constructed using the () expression. For example, print_endline "hello, world" is an expression that produces this value. We can't just throw an expression in a file and hope that it will be compiled, as an expression is not a definition. The definition syntax is simple,
let <pattern> = <expr>
where <pattern> is either a variable or a data constructor, which will be matched against the structure of the value produced by <expr>, possibly binding the variables that occur in the pattern, e.g., the following are definitions
let x = 7 * 8
let 4 = 2 * 2
let [x; y; z] = [1; 2; 3]
let (hello, world) = "hello", "world"
let () = print_endline "hello, world"
You may notice that the result of the print_endline "hello, world" expression is not bound to any variable, but instead is matched with the unit value (), which could be seen as (and indeed looks like) an empty tuple. You can also write
let x = print_endline "hello, world"
or even
let _ = print_endline "hello, world"
But it is always better to be explicit on the left-hand side of a definition about what you're expecting.
So now our well-formed program should look like this:
open Str
let () =
  print_endline (Str.first_chars "testing" 0)
We will use ocamlbuild to compile and run our program. The str module is not a part of the standard library so we have to tell ocamlbuild that we're going to use it. We need to create a new folder and put our program into a file named example.ml, then we can compile it using the following command
ocamlbuild -pkg str example.native --
The ocamlbuild tool will infer from the native suffix what your goal is (in this case, building a native-code application). The -- means run the built application as soon as it is compiled. The above program will print nothing, of course; here is an example of a program that prints a greeting message before printing the first zero characters of the testing string,
open Str
let () =
  print_endline "The first 0 chars of 'testing' are:";
  print_endline (Str.first_chars "testing" 0)
and here is how it works
$ ocamlbuild -package str example.native --
Finished, 4 targets (4 cached) in 00:00:00.
The first 0 chars of 'testing' are:
Also, instead of compiling your program and running the resulting application, you can interpret the example.ml file directly using the ocaml toplevel tool, which provides an interactive interpreter. You still need to load the str library into the toplevel, as it is not part of the standard library that is pre-linked into it; here is the correct invocation:
ocaml str.cma example.ml
You should add ;; after "open Str":
open Str;;
print (Str.first_chars "testing" 0)
Another option is to declare a code block:
open Str
let () =
  print (Str.first_chars "testing" 0)

How to enable hints and warnings in the online REPL

I figured that I can do it on the command line REPL like so:
java -jar frege-repl-1.0.3-SNAPSHOT.jar -hints -warnings
But how can I do the same in http://try.frege-lang.org
Hints and warnings are already enabled by default. For example,
frege> f x = f x
function f :: α -> β
3: application of f will diverge.
Perhaps we can make it better by explicitly labelling each message as a warning or a hint (instead of distinguishing them by colour alone), something like:
[Warning] 3: application of f will diverge.
and providing an option to turn them on/off.
Update:
There was indeed an issue (thanks Ingo for pointing that out!) with showing warnings that are generated in a later phase of compilation. This issue has been fixed, and the following examples now correctly display warnings in the REPL:
frege> h x = 0; h false = 42
function h :: Bool -> Int
4: equation or case alternative cannot be reached.
frege> f false = 6
function f :: Bool -> Int
5: function pattern is refutable, consider
adding a case for true

Forth as an interactive C program tester

I'm willing to use an interactive language to test some C code from a legacy project. I know a little Forth, but I haven't ever used it in a real world project. I'm looking at pForth right now.
Is it reasonable to use an interactive Forth interpreter to test the behavior of some function in a C program? This C code has lots of structs, pointers to structs, handles and other common structures found in C.
I suppose I'll have to write some glue code to handle the parameter passing and maybe some struct allocation on the Forth side. I want an estimate from someone with experience in this field. Is it worth it?
If you want interactive testing and are targeting embedded platforms, then Forth is definitely a good candidate. You'll always find a Forth implementation that runs on your target platform, and writing one yourself is not even hard if need be.
Instead of writing glue code specific to your immediate needs, go for a general-purpose Forth-to-C interface. I use gforth's generic C interface, which is very easy to use. For structure handling in Forth, I use an MPE-style implementation, which is very flexible when it comes to interfacing with C (watch out for proper alignment though; see gforth's %align / %allot / nalign).
Defining general-purpose structure-handling words takes about 20 lines of Forth code, and the same goes for singly linked lists or hash tables.
If you cannot use gforth (it is POSIX only), write an extension module for your Forth of choice that implements a similar C interface. Just make sure that your Forth and your C interface module use the same malloc() and free() as the C code you want to test.
With such an interface, you can do everything in Forth by just defining stub words (i.e. map Forth words to C functions and structures).
Here's a sample test session where I call libc's gettimeofday using gforth's C interface.
s" structs.fs" included also structs   \ load structure handling code

clear-libs
s" libc" add-lib                       \ load libc.so. Not really needed for this particular library

c-library libc                         \ stubs for C functions
  \c #include <sys/time.h>
  c-function gettimeofday gettimeofday a a -- n ( struct timeval *, struct timezone * -- int )
end-c-library

struct timeval                         \ stub for struct timeval
  8 field: ->tv_sec                    \ sizeof(time_t) == 8 bytes on my 64bits system
  8 field: ->tv_usec
end-struct

timeval buffer: tv

\ now call it (the 0 is for passing NULL for struct timezone *)
tv 0 gettimeofday .                    \ Return value on the stack. output : 0
tv ->tv_sec @ .                        \ output : 1369841953
Note that tv ->tv_sec is in fact the equivalent of (void *)&tv + offsetof(struct timeval, tv_sec) in C, so it gives you the address of the structure member; you then have to fetch the value with @. Another issue here: since I use a 64-bit Forth where the cell size is 8 bytes, storing/fetching an 8-byte long is straightforward, but fetching/storing a 4-byte int will require some special handling. Anyhow, Forth makes this easy: just define special-purpose int@ and int! words for that.
As you can see, with a good general-purpose C interface you do not need to write any glue code in C; only the Forth stubs for your C functions and structures are needed, and those are really straightforward to write (most of them could even be generated automatically from your C headers).
Once you're happy with your interactive tests, you can move on to automated tests:
Copy/paste the whole input/output from your interactive test session to a file named testXYZ.log
Strip the output (keeping only the input) from your session log and write this to a file named testXYZ.fs
To run the test, pipe testXYZ.fs to your forth interpreter, capture the output and diff it with testXYZ.log.
Since removing output from an interactive session log can be somewhat tedious, you could also start by writing the test script testXYZ.fs then run it and capture the output testXYZ.log, but I prefer starting from an interactive session log.
Et voilà !
For reference, here's the structure handling code that I used in the above example:
\ *****************************************************************************
\ structures handling
\ *****************************************************************************
\ Simple structure definition words. Structure instances are zero initialized.
\
\ usage :
\ struct foo
\ int: ->refCount
\ int: ->value
\ end-struct
\ struct bar
\ int: ->id
\ foo struct: ->foo
\ 16 chars: ->name
\ end-struct
\
\ bar buffer: myBar
\ foo buffer: myFoo
\ 42 myBar ->id !
\ myFoo myBar ->foo !
\ myBar ->name count type
\ 1 myBar ->foo @ ->refCount +! \ accessing members of members could use a helper word
: struct ( "name" -- addr 0 ; named structure header )
  create here 0 , 0
  does> @ ;

\ <field-size> FIELD <field-name>
\ Given a field size on the stack, compiles a word <field-name> that adds the
\ field size to the number on the stack.
: field: ( u1 u2 "name" -- u1+u2 ; u -- u+u2 )
  over >r                       \ save current struct size
  : r> ?dup if
    postpone literal postpone +
  then
  postpone ;
  +                             \ add field size to struct size
; immediate

: end-struct ( addr u -- ; end of structure definition )
  swap ! ;

: naligned ( addr1 u -- addr2 ; aligns addr1 to alignment u )
  1- tuck + swap invert and ;

\ Typed field helpers
: int:     cell naligned cell postpone field: ; immediate
: struct:  >r cell naligned r> postpone field: ; immediate
: chars:   >r cell naligned r> postpone field: ; immediate

\ with C style alignment
4 constant C_INT_ALIGN
8 constant C_PTR_ALIGN
4 constant C_INT_SIZE
: cint:    C_INT_ALIGN naligned C_INT_SIZE postpone field: ; immediate
: cstruct: >r C_PTR_ALIGN naligned r> postpone field: ; immediate
: cchars:  >r C_INT_ALIGN naligned r> postpone field: ; immediate

: buffer: ( u -- ; creates a zero-ed buffer of size u )
  create here over erase allot ;

How to list directories faster?

I have a few situations where I need to list files recursively, but my implementations have been slow. I have a directory structure with 92784 files. find lists the files in less than 0.5 seconds, but my Haskell implementation is a lot slower.
My first implementation took a bit over 9 seconds to complete, next version a bit over 5 seconds and I'm currently down to a bit less than two seconds.
listFilesR :: FilePath -> IO [FilePath]
listFilesR path =
  let
    isDODD "."  = False
    isDODD ".." = False
    isDODD _    = True
  in do
    allfiles <- getDirectoryContents path
    dirs <- forM allfiles $ \d ->
      if isDODD d
        then do
          let p = path </> d
          isDir <- doesDirectoryExist p
          if isDir then listFilesR p else return [d]
        else return []
    return $ concat dirs
The test takes about 100 megabytes of memory (+RTS -s), and the program spends around 40% in GC.
I was thinking of doing the listing in a WriterT monad with Sequence as the monoid to prevent the concats and list creation. Is it likely this helps? What else should I do?
Edit: I have edited the function to use readDirStream, and it helps keep memory usage down. There's still some allocation happening, but productivity is >95% now and it runs in less than a second.
This is the current version:
list path = do
    de <- openDirStream path
    readDirStream de >>= go de
    closeDirStream de
  where
    go d []   = return ()
    go d "."  = readDirStream d >>= go d
    go d ".." = readDirStream d >>= go d
    go d x    = let newpath = path </> x
                in do
                  e <- doesDirectoryExist newpath
                  if e
                    then list newpath     >> readDirStream d >>= go d
                    else putStrLn newpath >> readDirStream d >>= go d
I think that System.Directory.getDirectoryContents constructs the whole list at once and therefore uses a lot of memory. How about using System.Posix.Directory? System.Posix.Directory.readDirStream returns entries one by one.
Also, the FileManip library might be useful, although I have never used it.
Profiling your code shows that most of the CPU time goes into getDirectoryContents, doesDirectoryExist and </>. This means that changing only the data structure won't help very much. If you want to match the performance of find, you should use lower-level functions for accessing the filesystem, probably the ones which Tsuyoshi pointed out.
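As a rough, untested sketch of what a list-returning traversal on top of those lower-level functions could look like (the name listFilesP is made up here, and it still uses doesDirectoryExist for the directory test, so it is not as fast as find can be):
import System.Posix.Directory (openDirStream, readDirStream, closeDirStream)
import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import Control.Exception (bracket)

listFilesP :: FilePath -> IO [FilePath]
listFilesP path =
    bracket (openDirStream path) closeDirStream (go [])
  where
    -- readDirStream returns "" once the directory stream is exhausted
    go acc ds = do
      entry <- readDirStream ds
      case entry of
        ""   -> return acc
        "."  -> go acc ds
        ".." -> go acc ds
        _    -> do
          let full = path </> entry
          isDir <- doesDirectoryExist full
          if isDir
            then do
              sub <- listFilesP full
              go (sub ++ acc) ds
            else go (full : acc) ds
The results come back in no particular order here; for the original use case of simply listing everything, that should not matter.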
One problem is that it has to construct the entire list of directory contents before the program can do anything with them. Lazy IO is usually frowned upon, but using unsafeInterleaveIO here cut memory use significantly.
-- needs: import System.IO.Unsafe (unsafeInterleaveIO)
listFilesR :: FilePath -> IO [FilePath]
listFilesR path =
  let
    isDODD "."  = False
    isDODD ".." = False
    isDODD _    = True
  in unsafeInterleaveIO $ do
    allfiles <- getDirectoryContents path
    dirs <- forM allfiles $ \d ->
      if isDODD d
        then do
          let p = path </> d
          isDir <- doesDirectoryExist p
          if isDir then listFilesR p else return [d]
        else return []
    return $ concat dirs
Would it be an option to use some sort of cache system combined with the read? I was thinking of an async indexing service/thread that keeps this cache up to date in the background; perhaps you could implement the cache as a simple SQL DB, which would then give you some nice performance when querying it?
Can you elaborate on your project/idea so we can come up with an alternative?
I wouldn't go for a "full index" myself, as I mostly build web-based services and response time is critical to me. On the other hand, if it's a one-off cost when starting up a new server, I am sure the customers wouldn't mind waiting that first time. I would just store the result in the DB for later lookups.