Performance

Open-methods are almost as fast as ordinary virtual member functions when compiled with optimization.

clang compiles the following code:

void call_poke_via_ref(std::ostream& os, Animal& a) {
    poke(os, a);
}

…​to this on the x64 architecture (variable names have been shortened for readability):

mov	    rax, qword ptr [rsi]
mov	    rdx, qword ptr [rip + hash_mult]
imul	rdx, qword ptr [rax - 8]
movzx	ecx, byte ptr [rip + hash_shift]
shr	    rdx, cl
mov	    rax, qword ptr [rip + vptrs]
mov	    rax, qword ptr [rax + 8*rdx]
mov	    rcx, qword ptr [rip + poke::slots_strides]
mov	    rax, qword ptr [rax + 8*rcx]
jmp	    rax

llvm-mca estimates a throughput of 4 cycles per dispatch. Comparatively, calling a native virtual functions takes one cycle. However, the difference is amortized by the time spent passing the arguments and returning from the function; plus, of course, executing the body of the function.

Micro benchmarks suggest that dispatching an open-methods with a single virtual argument is between 30% and 50% slower than calling the equivalent virtual function, with an empty body and no other arguments.

However, call_poke does two things: it constructs a virtual_ptr<Animal> from an Animal&; and then it calls the method. The construction of the virtual_ptr is the costly part, as it involves a hash table lookup. Once that price has been paid, the virtual_ptr can be used multiple times. It is passed to the overrider, which can make further method calls through it. It can be stored in variables in place of plain pointers.

Let’s look at another example: an AST for an arithmetic calculator:

#include <iostream>

#include <boost/openmethod.hpp>
#include <boost/openmethod/compiler.hpp>

using boost::openmethod::virtual_ptr;

struct Node {
    virtual ~Node() {}
};

struct Literal : Node {
    explicit Literal(int value) : value(value) {}

    int value;
};

struct Plus : Node {
    Plus(virtual_ptr<Node> left, virtual_ptr<Node> right)
        : left(left), right(right) {}

    virtual_ptr<Node> left, right;
};

struct Negate : Node {
    explicit Negate(virtual_ptr<Node> node) : child(node) {}

    virtual_ptr<Node> child;
};

BOOST_OPENMETHOD(value, (virtual_ptr<Node>), int);

BOOST_OPENMETHOD_OVERRIDE(value, (virtual_ptr<Literal> node), int) {
    return node->value;
}

BOOST_OPENMETHOD_OVERRIDE(value, (virtual_ptr<Plus> node), int) {
    return value(node->left) + value(node->right);
}

BOOST_OPENMETHOD_OVERRIDE(value, (virtual_ptr<Negate> node), int) {
    return -value(node->child);
}

BOOST_OPENMETHOD_CLASSES(Node, Literal, Plus, Negate);

auto main() -> int {
    boost::openmethod::initialize();

    Literal one(1), two(2);
    Plus sum(one, two);
    Negate neg(sum);

    std::cout << value(neg) << "\n"; // -3

    return 0;
}

The Negate overrider compiles to:

mov	rdi,    qword ptr [rsi + 8]
mov	rsi,    qword ptr [rsi + 16]

mov	rax,    qword ptr [rip + value::slots_strides]
call	    qword ptr [rdi + 8*rax]

neg	        eax
pop	        rcx

The first two instructions read the virtual_ptr from this - placing its content in registers rdi and rsi.

The next two instructions are the method call proper. According to llvm-mca, they take one cycle - the same as a native virtual function call.

When we create the Plus and Negate nodes, we call the conversion constructors of virtual_ptr<Node>, which occur the cost of hash table lookups. However, in this example, we know the exact types of the objects. In that case, we can use final_virtual_ptr to construct the virtual_ptr using a single instruction. For example:

Literal one(1);
Negate neg(boost::openmethod::final_virtual_ptr(one));

…​compiles to:

;; construct Literal
lea	rax, [rip + vtable for Literal+16]
mov	qword ptr [rsp], rax
mov	dword ptr [rsp+8], 1

;; construct Negate
mov	rax, qword ptr [rip+static_vptr<Literal>] ; address of openmethod v-table
lea	rcx, [rip+vtable for Negate+16]           ; address of native v-table
mov	qword ptr [rsp+16], rcx                   ; set native v-table
mov	qword ptr [rsp+24], rax                   ; set openmethod v-table
mov	rax, rsp                                  ; address of 'one'
mov	qword ptr [rsp+32], rax                   ; set vptr object pointer to 'one'

final_virtual_ptr does not require its argument to have a polymorphic type.