I ran GCC's profile-guided optimization (PGO, also called "profile feedback directed optimization") on the program that generates the point-growth tree, which has already been through a few iterations of manual optimization.
Baseline speed (-O3): 20.98 points per second
With -march=native: 21.83 points per second (4.0% improvement over baseline)
With -march=native -fprofile-use (using profile data from an earlier -fprofile-generate run): 25.95 points per second (23.7% improvement over baseline). PGO is amazing!
One "gotcha": the -fprofile-generate build must exit normally (not via Ctrl-C) for the profile data to be written out. My program runs in an infinite loop, so I had to modify it to periodically check the filesystem for a killswitch file and exit cleanly when the file appears.
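The killswitch check can be sketched roughly as below. This is a minimal illustration in C, not the actual code from my program: the file name `killswitch`, the check interval, and the names `killswitch_present`, `do_work`, and `run_until_killswitch` are all hypothetical, and `access()` assumes a POSIX system.

```c
#include <unistd.h>   /* access(), F_OK (POSIX) */

/* Hypothetical killswitch path; any file name would do. */
#define KILLSWITCH_PATH "killswitch"

/* Assumed value: how many work units to run between filesystem checks,
 * so the check itself does not distort the profile much. */
#define CHECK_INTERVAL 1000

/* Returns nonzero if the killswitch file currently exists. */
static int killswitch_present(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Stand-in for one unit of real work (e.g. adding one point to the tree). */
static void do_work(unsigned long *counter)
{
    ++*counter;
}

/* Replaces the infinite loop: works until the killswitch file shows up,
 * then returns the number of work units completed. */
unsigned long run_until_killswitch(const char *path)
{
    unsigned long iterations = 0;

    for (;;) {
        do_work(&iterations);

        /* Only touch the filesystem every CHECK_INTERVAL iterations. */
        if (iterations % CHECK_INTERVAL == 0 && killswitch_present(path))
            break;
    }
    return iterations;
}
```

Calling `run_until_killswitch(KILLSWITCH_PATH)` from `main` and then returning normally lets the instrumented binary write its profile data on exit. Creating the file from another shell (for example with `touch killswitch`) ends the run.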
gcc 4.5.3-1 (Debian Wheezy 32-bit)