Performance
Open-methods are almost as fast as ordinary virtual member functions when compiled with optimization.
clang compiles the following code:
void call_poke_via_ref(std::ostream& os, Animal& a) {
poke(os, a);
}
…to this on the x64 architecture (variable names have been shortened for readability):
mov rax, qword ptr [rsi]
mov rdx, qword ptr [rip + hash_mult]
imul rdx, qword ptr [rax - 8]
movzx ecx, byte ptr [rip + hash_shift]
shr rdx, cl
mov rax, qword ptr [rip + vptrs]
mov rax, qword ptr [rax + 8*rdx]
mov rcx, qword ptr [rip + poke::slots_strides]
mov rax, qword ptr [rax + 8*rcx]
jmp rax
llvm-mca estimates a throughput of 4 cycles per dispatch. Comparatively, calling a native virtual functions takes one cycle. However, the difference is amortized by the time spent passing the arguments and returning from the function; plus, of course, executing the body of the function.
Micro benchmarks suggest that dispatching an open-methods with a single virtual argument is between 30% and 50% slower than calling the equivalent virtual function, with an empty body and no other arguments.
However, call_poke
does two things: it constructs a virtual_ptr<Animal>
from
an Animal&
; and then it calls the method. The construction of the
virtual_ptr
is the costly part, as it involves a hash table lookup. Once that
price has been paid, the virtual_ptr
can be used multiple times. It is passed
to the overrider, which can make further method calls through it. It can be
stored in variables in place of plain pointers.
Let’s look at another example: an AST for an arithmetic calculator:
#include <iostream>
#include <boost/openmethod.hpp>
#include <boost/openmethod/compiler.hpp>
using boost::openmethod::virtual_ptr;
struct Node {
virtual ~Node() {}
};
struct Literal : Node {
explicit Literal(int value) : value(value) {}
int value;
};
struct Plus : Node {
Plus(virtual_ptr<Node> left, virtual_ptr<Node> right)
: left(left), right(right) {}
virtual_ptr<Node> left, right;
};
struct Negate : Node {
explicit Negate(virtual_ptr<Node> node) : child(node) {}
virtual_ptr<Node> child;
};
BOOST_OPENMETHOD(value, (virtual_ptr<Node>), int);
BOOST_OPENMETHOD_OVERRIDE(value, (virtual_ptr<Literal> node), int) {
return node->value;
}
BOOST_OPENMETHOD_OVERRIDE(value, (virtual_ptr<Plus> node), int) {
return value(node->left) + value(node->right);
}
BOOST_OPENMETHOD_OVERRIDE(value, (virtual_ptr<Negate> node), int) {
return -value(node->child);
}
BOOST_OPENMETHOD_CLASSES(Node, Literal, Plus, Negate);
auto main() -> int {
boost::openmethod::initialize();
Literal one(1), two(2);
Plus sum(one, two);
Negate neg(sum);
std::cout << value(neg) << "\n"; // -3
return 0;
}
The Negate
overrider compiles to:
mov rdi, qword ptr [rsi + 8]
mov rsi, qword ptr [rsi + 16]
mov rax, qword ptr [rip + value::slots_strides]
call qword ptr [rdi + 8*rax]
neg eax
pop rcx
The first two instructions read the virtual_ptr
from this
- placing its
content in registers rdi
and rsi
.
The next two instructions are the method call proper. According to llvm-mca, they take one cycle - the same as a native virtual function call.
When we create the Plus
and Negate
nodes, we call the conversion
constructors of virtual_ptr<Node>
, which occur the cost of hash table lookups.
However, in this example, we know the exact types of the objects. In that case,
we can use final_virtual_ptr
to construct the virtual_ptr
using a single
instruction. For example:
Literal one(1);
Negate neg(boost::openmethod::final_virtual_ptr(one));
…compiles to:
;; construct Literal
lea rax, [rip + vtable for Literal+16]
mov qword ptr [rsp], rax
mov dword ptr [rsp+8], 1
;; construct Negate
mov rax, qword ptr [rip+static_vptr<Literal>] ; address of openmethod v-table
lea rcx, [rip+vtable for Negate+16] ; address of native v-table
mov qword ptr [rsp+16], rcx ; set native v-table
mov qword ptr [rsp+24], rax ; set openmethod v-table
mov rax, rsp ; address of 'one'
mov qword ptr [rsp+32], rax ; set vptr object pointer to 'one'
final_virtual_ptr
does not require its argument to have a polymorphic type.