(out-of-band) performance with gcc3
Konstantin Popov
kost at sics.se
Wed Jan 21 11:39:28 CET 2004
Thanks to Denys and Marc-Antoine for responses:
> > The good news is that I managed to find the "right" option(s) to boost
> > Mozart's performance with gcc3: one has to compile the emulator
> > with '-fno-guess-branch-probability' (which brings 'PC' back into a
> > [hardware] register), and "arc profiling" (-fprofile-arcs,
> > -fbranch-probabilities) helps too. We're still slower than 2.95.3, BUT
> > one won't notice unless one looks for it specifically.
>
> I guess things changed because iirc that was not the case when I tried
> it last. I also guess that you are talking only about emulate.cc. I
yes.
> also reported improvements with "arc profiling", but this is a bit
> difficult to set up to use right off CVS and the gains were not
> sufficient to justify the bother (at the time, at least).
I know ;-)
>
> > Don't ask me why exactly '-fno-guess-branch-probability' causes the
> > effect (my crazy guess was that wrong branch prediction can cause
> > havoc for register allocation, hmm, if it means anything ;-(). The bad
> > news is that it can pop up again (once we modify something).
>
> I haven't checked again, but the last time I fixed it it was a problem
> of inlining heuristics causing register allocation to go nuts. For
> what you observe, it is possible that branch prediction also interacts
> with inlining decisions and thus with the register allocation madness.
maybe.
This time I didn't change "inlining" limits.
>
> I don't know if this is any use here, but we might consider using
> __builtin_expect to influence branch prediction where we know what's
> right.
yes, of course!
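A minimal sketch of what that could look like, assuming gcc or a
compatible compiler. `__builtin_expect` is the real gcc builtin; the
`likely`/`unlikely` macros and the function `sum_nonneg` are made up
for illustration and are not taken from emulate.cc:

```cpp
// Hypothetical helper macros around the gcc builtin.
// The !!(x) normalises any truthy value to 0 or 1.
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

// Illustrative function: sums the elements of v, treating a negative
// element as a rare error and returning -1 as soon as one is seen.
long sum_nonneg(const int *v, int n) {
  long acc = 0;
  for (int i = 0; i < n; i++) {
    if (unlikely(v[i] < 0))   // hint: this branch is almost never taken
      return -1;
    acc += v[i];
  }
  return acc;
}
```

The hint only affects code layout and the compiler's static branch
estimates, not the result. Whether it helps in the emulator would have
to be measured.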
>
> > Now, on x86 the emulator still suffers from gcc 3.3.2's inability
> > to compile the "goto *PC" statement efficiently (it generates two
> > instructions instead of one). Should I contact the gcc folks, or does
> > someone else out there have better connections with them??
>
> well, after a quick check, I don't seem to see that. Could you point
> to an example in emulate.{cc,s}?
just check the MOVEXX: it should (and does with gcc 2.95.3) look like
MOVEXX:
movl 4(%ebp), %esi
movl 8(%ebp), %ebx
addl $12, %ebp # PC
movl (%esi), %eax
movl %eax, (%ebx)
jmp *(%ebp) # * PC
but right now it looks like
MOVEXX:
movl 4(%ebp), %esi
movl 8(%ebp), %ebx
addl $12, %ebp # PC
movl (%esi), %eax
movl %eax, (%ebx)
movl (%ebp), %eax # * PC, <anonymous>
jmp *%eax # <anonymous>
I'd say a primitive peephole optimization is missing, or cannot fire
because the compiler is not sure whether %eax is dead after the jump.
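
For anyone who wants to reproduce this outside the emulator, here is a
small stand-alone sketch of the same dispatch pattern, using gcc's
"labels as values" extension. It is not code from emulate.cc: the
function `run_move` and the two-instruction program are invented for
illustration, and only the shape (fetch two operands, advance PC by
three words, "goto *PC") mirrors MOVEXX above.

```cpp
// Direct-threaded dispatch: every instruction word holds the address of
// its handler, followed by its operands. Requires gcc or clang.
long run_move(long value) {
  long x = value, y = 0;

  // "Bytecode": MOVEXX src dst, then HALT.
  void *code[] = { &&MOVEXX, &x, &y, &&HALT };
  void **PC = code;            // plays the role of the emulator's PC

  goto **PC;                   // initial dispatch

MOVEXX: {
    long *src = static_cast<long *>(PC[1]);
    long *dst = static_cast<long *>(PC[2]);
    PC += 3;                   // opcode word + two operand words
    *dst = *src;
    goto **PC;                 // the "goto *PC" discussed above
  }

HALT:
  return y;
}
```

Compiling this with -S and looking at the end of the MOVEXX handler
should show whether a given compiler emits a single indirect jump
through memory or a load followed by "jmp *%reg".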
>
> > What's worse, my attempt to supply an assembler snippet for that
> > 'DISPATCH' macro revealed the PROBLEM: "asm" snippets CANNOT contain
> > any instructions that change the control flow [outside the snippet]
> > while the likes of 'INLINEMINUS' etc. actually do!! So, we've got to
> > get rid of that: right now it works "accidentally", until some next
> > clever gcc optimization ;-[
>
> The asm stuff in emulate.cc is not used currently. It is only
oh, I didn't know that ;-) That's good!
> selected when FASTARITH is switched on, which it currently is not (by
> default). Christian disabled it a long time ago because these asm
> macros were not working. They've never been fixed and I don't know if
> they can be.
.. so the answer is simple - in their current incarnation they must
not be switched on.
Are there any plans to "reanimate" this stuff?
[ ]
> >> The good news is that I managed to find the "right" option(s) to boost
> >> Mozart's performance with gcc3: one has to compile the emulator
> >> with '-fno-guess-branch-probability' (which brings 'PC' back into a
> >> [hardware] register),
>
> I was curious re what the effect would be on Darwin-ppc.
> I am using CVS head, and I introduced that flag
> (-fno-guess-branch-probability) in the Makefile.
> FYI, I am using gcc version 3.3 20030304 (Apple Computer, Inc. build
> 1495)
> and the other flags are -O3 -fstrict-aliasing -fschedule-insns2
> -fomit-frame-pointer
> (adding -finline-limit=500 on the emulator)
>
> The results (using Denys's benchmark) are... mixed, mostly negative
> actually:
> (But the variance on the results is scary, to be honest)
>
> With -fno-guess-branch-probability
> Bridge: 3940
> Compiler: 3110
> Diff: 3650
> FD: 5650
> Knights: 2730
> Nrev: 5010
> Port: 4930
> Rec: 5240
> Tak: 3750
> Tak Thread: 2160
>
> Without -fno-guess-branch-probability
> Bridge: 3720
> Compiler: 2920
> Diff: 3310
> FD: 5490
> Knights: 2700
> Nrev: 5170
> Port: 4820
> Rec: 5240
> Tak: 3710
> Tak Thread: 2720
Well, I'd say the keyword here is "gcc version 3.3 20030304 (Apple
Computer, Inc. build 1495)", as opposed to "vanilla" gcc 3.3.2 as it
comes from gcc.gnu.org ;-[
Could you tell us the flags that are in effect (can be seen in
emulate.s)?
It would be immensely interesting to know how 2.95.3 behaves on your
platform, if you would take the trouble of obtaining that gcc.
Also, what do you mean by "variance" here - variance between your
consecutive runs, or the difference between -fguess-branch-probability
and -fno-guess-branch-probability?
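
To make the first reading concrete: a rough sketch of how the spread
between consecutive runs of one benchmark could be quantified. The
function names `mean` and `sample_stddev` are mine, and no particular
timing data is assumed.

```cpp
#include <cmath>
#include <vector>

// Mean of the timings from repeated runs of one benchmark.
double mean(const std::vector<double> &t) {
  double s = 0;
  for (double v : t) s += v;
  return s / t.size();
}

// Sample standard deviation (n - 1 in the denominator).
// Needs at least two runs to be meaningful.
double sample_stddev(const std::vector<double> &t) {
  double m = mean(t), ss = 0;
  for (double v : t) ss += (v - m) * (v - m);
  return std::sqrt(ss / (t.size() - 1));
}
```

If the difference between the two flag settings is not clearly larger
than this run-to-run spread, the numbers probably cannot tell the two
settings apart.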
[ ]
Cheers,
--- Kostja.
-
Please send submissions to hackers at mozart-oz.org
and administrivia mail to hackers-request at mozart-oz.org.
The Mozart Oz web site is at http://www.mozart-oz.org/.