Update: Conditional Moves vs. Branches – What Compilers Really Do

Recently, there’s been a lot of chatter online about compilers “ruining” hand-tuned branchless code. Developers expect a ternary expression or manual cmove trick to result in branchless assembly. Instead, the compiler happily emits a branch. What gives?

I was confused by this myself; see my original article.

Branch Prediction vs. Conditional Moves (cmove) – A Nuanced Trade-off

The reality is that cmove is not free. It avoids mispredictions, but it makes the result data-dependent on the condition and on both inputs, which lengthens dependency chains. A correctly predicted branch carries no such dependency: a modern out-of-order core speculates past it and keeps executing. So if the branch is predictable (Agner Fog suggests a prediction rate of roughly 75% or better, which is often the case in real-world workloads), a branch can outperform a conditional move.

To recap:

  • cmove avoids branch mispredictions (good for unpredictable data)
  • cmove extends dependency chains (bad for high ILP code)
  • Branches with high predictability (roughly 75% or better) are often faster than cmove
  • Unpredictable branches are expensive, because every misprediction flushes the pipeline and discards speculative work
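
To make the recap concrete, here is a minimal sketch of the same loop written in a branchy and a branchless style. The function names (sum_above_branchy, sum_above_branchless) and the threshold parameter are hypothetical, chosen only for illustration. The two functions return the same result; which one is faster depends on how predictable the comparison is for your data, and the optimizer may well vectorize or rewrite either form, so treat this as something to benchmark rather than a guarantee of what the compiler emits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Branchy style: the source contains an if, so the compiler may emit a
// conditional jump. On predictable (e.g. sorted) data this tends to be cheap.
int64_t sum_above_branchy(const std::vector<int32_t>& data, int32_t threshold) {
    int64_t sum = 0;
    for (int32_t v : data) {
        if (v >= threshold) {
            sum += v;
        }
    }
    return sum;
}

// Branchless style: mask is all ones when v >= threshold and zero otherwise,
// so every iteration performs the same work regardless of the data.
int64_t sum_above_branchless(const std::vector<int32_t>& data, int32_t threshold) {
    int64_t sum = 0;
    for (int32_t v : data) {
        const int64_t mask = -static_cast<int64_t>(v >= threshold);
        sum += static_cast<int64_t>(v) & mask;
    }
    return sum;
}
```

Timing both versions on sorted and on shuffled input is a quick way to see the trade-off on your own hardware.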

The Compiler’s View of the Ternary Operator

Many developers assume that writing:

r = cond ? a : b;

forces the compiler to generate a cmov or equivalent branchless code. It does not. Unless both arms are trivially cheap and side-effect-free, GCC and Clang typically lower this to IR using a branch, not a cmove. Why? Because a and b could have side effects, and the C++ language requires that:

  • Only a executes if cond is true.
  • Only b executes if cond is false.

This makes the ternary operator semantically equivalent to:

if (cond)
    r = a;
else
    r = b;

As a result, the optimizer usually sees a branch in the IR, not a branchless cmove, and later passes can turn it into a cmove only when they can prove both sides are side-effect-free, safe to evaluate unconditionally, and cheap. Even then, the cost model may still prefer a branch.
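
The evaluation rule above can be observed directly. In this small sketch, the counters and the helpers make_a, make_b, pick, and load_or_zero are hypothetical names introduced only for illustration. The first function shows that only the selected arm runs; the second shows an arm that must not be evaluated unconditionally, since dereferencing a null pointer would be undefined behavior.

```cpp
#include <cassert>

// Hypothetical counters that make the side effects observable.
int calls_a = 0;
int calls_b = 0;

int make_a() { ++calls_a; return 42; }
int make_b() { ++calls_b; return 24; }

// Only the selected arm is evaluated, so exactly one counter changes per call.
int pick(bool cond) {
    return cond ? make_a() : make_b();
}

// The load *p may only happen when p is non-null, so the compiler cannot
// simply compute both arms and select between the results.
int load_or_zero(const int* p) {
    return p ? *p : 0;
}
```

Calling pick(true) increments calls_a and leaves calls_b untouched, which is why the compiler has to treat the two arms as separate control-flow paths until it can prove otherwise.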

When to Force cmove (and When Not To)

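If you really want branchless code, you can push the compiler toward it by writing low-level code that eliminates ambiguity. None of this is a guarantee, since the optimizer may still rewrite it:

r = cond * a + (1 - cond) * b; // classic trick, but awkward; typically compiles to multiplies, not a cmove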

Or, with intrinsics:

r = _mm_blendv_epi8(b, a, mask); // for SIMD; requires SSE4.1, selects per byte from the high bit of each mask byte
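
Another common scalar idiom, not covered in the original text, is a bitmask select that avoids the multiplies of the arithmetic trick. This is a sketch under the assumption that the operands are unsigned 32-bit values; select_mask is a hypothetical helper name. As with the other tricks, the optimizer is free to recognize the pattern and emit a cmov, a branch, or plain bitwise operations.

```cpp
#include <cassert>
#include <cstdint>

// Returns a when cond is true and b otherwise, with no if or ternary in the source.
inline uint32_t select_mask(bool cond, uint32_t a, uint32_t b) {
    // mask is 0xFFFFFFFF when cond is true and 0 when it is false.
    const uint32_t mask = 0u - static_cast<uint32_t>(cond);
    // (a ^ b) & mask is either a ^ b or 0, so the result is either a or b.
    return b ^ ((a ^ b) & mask);
}
```

Both operands are always evaluated here, so this only applies when a and b are cheap and free of side effects.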

Here is a complete example for Compiler Explorer (compile with optimizations and SSE4.1 enabled, for example -O2 -msse4.1):

#include <immintrin.h>
#include <cstdint>

// Example inputs
bool cond = true;
int32_t a = 42;
int32_t b = 24;

// Case 1: Plain ternary (compiler decides)
int32_t ternary_example() {
    return cond ? a : b;
}

// Case 2: Classic arithmetic trick (branchless, but uses multiplies rather than cmov)
int32_t arithmetic_trick() {
    return cond * a + (1 - cond) * b;
}

// Case 3: SIMD blend (branchless byte-wise select; requires SSE4.1)
int32_t simd_blend() {
    __m128i va = _mm_set1_epi32(a);
    __m128i vb = _mm_set1_epi32(b);
    __m128i mask = _mm_set1_epi32(cond ? -1 : 0);  // -1 = all bits set (true)
    __m128i result = _mm_blendv_epi8(vb, va, mask);
    return _mm_extract_epi32(result, 0);  // extract first int
}

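The generated assembly (x86-64, Intel syntax) differs for each case. Note that in this output the plain ternary already becomes a cmovne, because both arms are side-effect-free loads, while the arithmetic trick compiles to multiplies rather than a cmove: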

ternary_example():
        lea     rax, [rip + a]
        lea     rcx, [rip + b]
        cmp     byte ptr [rip + cond], 0
        cmovne  rcx, rax
        mov     eax, dword ptr [rcx]
        ret

arithmetic_trick():
        movzx   eax, byte ptr [rip + cond]
        mov     ecx, dword ptr [rip + a]
        imul    ecx, eax
        xor     eax, 1
        imul    eax, dword ptr [rip + b]
        add     eax, ecx
        ret

simd_blend():
        movzx   eax, byte ptr [rip + cond]
        neg     eax
        movd    xmm1, dword ptr [rip + b]
        movd    xmm2, dword ptr [rip + a]
        movd    xmm0, eax
        pblendvb        xmm1, xmm2, xmm0
        movd    eax, xmm1
        ret

cond:
        .byte   1

a:
        .long   42

b:
        .long   24

Summary

Case                          Best Option
Highly predictable (>=75%)    Branch (compiler default)
Highly unpredictable          cmove (forced if necessary)
Side-effectful expressions    Branch (required by language)

Final Word

Compilers aren’t out to sabotage your performance. They’re making reasonable trade-offs based on heuristics and analysis. If in doubt, profile real prediction rates and measure performance before jumping to conclusions.


This section is part of the ongoing updates to Branch Prediction – The Definitive Guide for High-Performance C++.

