V10/cmd/sml/doc/optimize
How to make Standard ML of New Jersey run faster
1. Each compilation unit is compiled separately. None of the
optimizations take place across compilation-unit boundaries.
Example:
    fun f(x) = (x,x);
    fun g 0 = nil | g i = f i :: g(i-1);
These are two compilation units, whether typed at top level or loaded
from a file, because the function f is compiled at the first semicolon
and g is compiled at the next. The function g will run significantly
faster if any of the following is used instead:
    fun f(x) = (x,x)
    fun g 0 = nil | g i = f i :: g(i-1);

    local fun f(x) = (x,x);
    in fun g 0 = nil | g i = f i :: g(i-1)
    end;

    structure S = struct
      fun f(x) = (x,x);
      fun g 0 = nil | g i = f i :: g(i-1);
    end;
In either of these last two, of course, the semicolons are optional.
Moral of the story: use small compilation units while typing to
the interactive system and seeing how things work. Use larger
compilation units when compiling large programs. I recommend the
use of the module system, or of "let" and "local" declarations,
to bind things together in a well-structured way.
Using signature constraints to minimize the number of things exported
from structures will reduce memory usage, and is clean style in any case.
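As a minimal sketch of the signature-constraint advice, the example
above could be packaged so that only g is exported. The structure name
Pairs is hypothetical; f stays private and is compiled in the same
compilation unit as g:

```sml
(* Hypothetical structure name; f and g are the functions from the
   example above. *)
structure Pairs : sig
  val g : int -> (int * int) list    (* only g is exported *)
end = struct
  fun f x = (x, x)                   (* hidden by the signature constraint *)
  fun g 0 = nil
    | g i = f i :: g (i - 1)
end;
```

Callers then write Pairs.g 3; Pairs.f is not visible outside the
structure.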
2. For the fanatic (these tricks are not guaranteed to work forever):
The initial environment (i.e. the List, Array, Ref, etc. structures)
is normally in a separate module from the user program. If you
would like a copy of this stuff in your program so that calls to the
pervasive functions will have less overhead, textually insert
src/boot/fastlib.sml near the beginning of your own structure.
This only helps, of course, if fastlib.sml is put into the same
compilation unit as the functions calling it, using the module
system as described above.
You can nest structures. To get better performance, after you
have developed your program, nest the whole thing in one huge
structure, e.g.
    structure Whole : sig end = struct
      your program
    end
You can even put signatures and functors at top level inside such a
structure, although this is not "Standard" ML.
3. You can increase the level of optimization, if you want to wait
a bit longer for compiles. To make things compile more slowly
but run faster, execute this before compiling your program:
    System.Control.CG.reducemore := 0;
    System.Control.CG.rounds := 10;
    System.Control.CG.bodysize := 20;
To make things compile faster but run slower, try this:
    System.Control.CG.reducemore := 10000;
    System.Control.CG.rounds := 0;
    System.Control.CG.bodysize := ~100;
    System.Control.CG.reduce := false;
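Because these settings are ordinary references, they stay in effect
until they are changed again. The following is only a sketch: it assumes
the three flags are int refs, as the assignments above suggest, and that
the program lives in a hypothetical file myprog.sml loaded with use. It
raises the optimization level for one load and then restores the
previous values:

```sml
(* Remember the current settings (assumed to be int refs). *)
val old_reducemore = !System.Control.CG.reducemore;
val old_rounds     = !System.Control.CG.rounds;
val old_bodysize   = !System.Control.CG.bodysize;

(* Slower compile, faster code: the values given above. *)
System.Control.CG.reducemore := 0;
System.Control.CG.rounds := 10;
System.Control.CG.bodysize := 20;

use "myprog.sml";    (* hypothetical file name *)

(* Put the earlier settings back for interactive work. *)
System.Control.CG.reducemore := old_reducemore;
System.Control.CG.rounds := old_rounds;
System.Control.CG.bodysize := old_bodysize;
```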
4. You can measure the execution time of your programs using the
functions in System.Timer.
    (* in the initial environment,
       signature TIMER =
         sig
           datatype time = TIME of {sec : int, usec : int}
           type timer
           val start_timer : unit -> timer
           val check_timer : timer -> time
           val check_timer_gc: timer -> time
           val makestring : time -> string
           val add_time : time * time -> time
         end
       structure System.Timer : TIMER
    *)
    let val t = System.Timer.start_timer()
        val _ = run_my_program()
        val non_gc_time = System.Timer.check_timer t
        val gc_time = System.Timer.check_timer_gc t
        val total_time = System.Timer.add_time(non_gc_time,gc_time)
     in print(System.Timer.makestring total_time)
    end
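The same pattern can be wrapped in a small helper so that it need not
be retyped for each measurement. This is a sketch that uses only the
System.Timer functions listed above; the name time_it is hypothetical:

```sml
(* time_it f runs f () and returns its result together with the
   total (non-GC plus GC) time as a System.Timer.time value. *)
fun time_it f =
  let val t = System.Timer.start_timer ()
      val result = f ()
      val non_gc_time = System.Timer.check_timer t
      val gc_time = System.Timer.check_timer_gc t
   in (result, System.Timer.add_time (non_gc_time, gc_time))
  end;
```

For example, val (_, t) = time_it run_my_program binds t to the total
time, which can then be printed with System.Timer.makestring.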
5. You can also use the execution profiler, described in doc/profiling.