Arch Dispatching
xsimd provides a generic way to dispatch a function call based on the architecture the code was compiled for and the architectures available at runtime.
The xsimd::dispatch() function takes a functor whose call operator accepts an architecture parameter as its first argument, followed by any number of arguments Args..., and turns it into a dispatching functor that takes Args... as arguments.
template<class ArchList = supported_architectures, class F>
inline detail::dispatcher<F, ArchList> xsimd::dispatch(F &&f) noexcept
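To make the mechanism concrete, here is a minimal, self-contained sketch of what such a dispatcher can look like. It is a toy model and not xsimd's implementation: the names toy_avx2, toy_sse2, toy_arch_list, toy_walk, toy_dispatcher, toy_dispatch, which_arch and toy_example are all hypothetical, the runtime availability checks are hard-coded, and C++14 return type deduction is assumed. The last entry of the list is treated as a baseline that is used when no earlier entry is available.

```cpp
#include <cassert>

// Hypothetical architecture tags. `available()` stands in for a runtime CPU
// feature query; the answers are hard-coded for the sake of the example.
struct toy_avx2 { static bool available() { return false; } };
struct toy_sse2 { static bool available() { return true; } };

// Ordered list of candidate architectures, preferred one first.
template <class... Archs>
struct toy_arch_list {};

// Last candidate: treated as the baseline and called without further choice.
template <class F, class Arch, class... Args>
auto toy_walk(toy_arch_list<Arch>, F& f, Args&... args)
{
    assert(Arch::available() && "baseline architecture must be available");
    return f(Arch{}, args...);
}

// At least two candidates: call the first one if it is available at runtime,
// otherwise continue with the rest of the list.
template <class F, class Arch, class Next, class... Rest, class... Args>
auto toy_walk(toy_arch_list<Arch, Next, Rest...>, F& f, Args&... args)
{
    if (Arch::available())
        return f(Arch{}, args...);
    return toy_walk(toy_arch_list<Next, Rest...>{}, f, args...);
}

// Wraps a functor taking (Arch, Args...) into a functor taking (Args...).
template <class F, class ArchList>
struct toy_dispatcher
{
    F f;

    template <class... Args>
    auto operator()(Args... args)
    {
        return toy_walk(ArchList{}, f, args...);
    }
};

template <class ArchList, class F>
toy_dispatcher<F, ArchList> toy_dispatch(F f)
{
    return toy_dispatcher<F, ArchList>{f};
}

// Example functor: the returned value reveals which overload was selected.
struct which_arch
{
    int operator()(toy_avx2, int x) const { return 200 + x; }
    int operator()(toy_sse2, int x) const { return 100 + x; }
};

// Usage: toy_avx2 is reported as unavailable, so the call lands on toy_sse2.
inline int toy_example()
{
    auto dispatched = toy_dispatch<toy_arch_list<toy_avx2, toy_sse2>>(which_arch{});
    return dispatched(1); // 101 with the hard-coded availability above
}
```

Note that every overload named in the list must exist at compile time, even if it is never selected at runtime. This is why each architecture-specific implementation has to be compiled somewhere.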
The following code showcases a usage of the xsimd::dispatch() function:
#include "sum.hpp"
// Create the dispatching function, specifying the architectures we want to
// target.
auto dispatched = xsimd::dispatch<xsimd::arch_list<xsimd::avx2, xsimd::sse2>>(sum{});
// Call the appropriate implementation based on runtime information.
float res = dispatched(data, 17);
This code does not require any architecture-specific flags. The architecture-specific details follow.
The sum.hpp header contains the function that is actually called, written in an architecture-agnostic way:
#ifndef SUM_HPP
#define SUM_HPP
#include "xsimd/xsimd.hpp"
// functor with a call method that depends on `Arch`
struct sum
{
    // It's critical not to use an in-class definition here.
    // In-class and inline definitions bypass the extern template mechanism.
    template <class Arch, class T>
    T operator()(Arch, T const* data, unsigned size);
};

template <class Arch, class T>
T sum::operator()(Arch, T const* data, unsigned size)
{
    using batch = xsimd::batch<T, Arch>;
    batch acc(static_cast<T>(0));
    // Largest multiple of the batch size that fits in `size`.
    const unsigned n = size / batch::size * batch::size;
    // Vectorized main loop over whole batches.
    for (unsigned i = 0; i != n; i += batch::size)
        acc += batch::load_unaligned(data + i);
    T star_acc = xsimd::reduce_add(acc);
    // Scalar loop over the remaining elements.
    for (unsigned i = n; i < size; ++i)
        star_acc += data[i];
    return star_acc;
}
// Inform the compiler that the sse2 and avx2 implementations are to be found in other compilation units.
extern template float sum::operator()<xsimd::avx2, float>(xsimd::avx2, float const*, unsigned);
extern template float sum::operator()<xsimd::sse2, float>(xsimd::sse2, float const*, unsigned);
#endif
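The loop structure of sum::operator() above, a main loop over whole batches followed by a scalar tail, can be modelled without any SIMD types. The sketch below is such a scalar model under stated assumptions: chunked_sum and its Width parameter are hypothetical names, and a std::array of per-lane accumulators stands in for xsimd::batch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a batched reduction. `Width` plays the role of batch::size.
template <std::size_t Width, class T>
T chunked_sum(T const* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    std::array<T, Width> lanes{}; // per-lane accumulators, zero-initialized

    // Largest multiple of Width that is not greater than size.
    const std::size_t n = size / Width * Width;

    // Main loop: consume Width elements per iteration, one per lane.
    for (std::size_t i = 0; i != n; i += Width)
        for (std::size_t lane = 0; lane != Width; ++lane)
            lanes[lane] += data[i + lane];

    // Horizontal reduction of the lanes, the counterpart of reduce_add.
    T total = T(0);
    for (T v : lanes)
        total += v;

    // Scalar tail: the size % Width elements the main loop did not cover.
    for (std::size_t i = n; i < size; ++i)
        total += data[i];

    return total;
}
```

For 17 elements and a width of 8, the main loop covers the first 16 elements in two iterations and the tail loop handles the last one.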
The SSE2 and AVX2 versions need to be provided in other compilation units, compiled with the appropriate flags, for instance:
// compile with -mavx2
#include "sum.hpp"
template float sum::operator()<xsimd::avx2, float>(xsimd::avx2, float const*, unsigned);
// compile with -msse2
#include "sum.hpp"
template float sum::operator()<xsimd::sse2, float>(xsimd::sse2, float const*, unsigned);
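The split between the extern template declarations in the header and the explicit instantiations in the flagged compilation units can be reproduced in a single file. In the sketch below, toy_tag, twice and twice_example are hypothetical names. In a real project the explicit instantiation definition at the end would live in its own source file, compiled with the architecture-specific flags.

```cpp
#include <cassert>

// Hypothetical stand-in for an architecture tag such as xsimd::avx2.
struct toy_tag {};

struct twice
{
    // Declared in-class, defined out of class, as in sum.hpp.
    template <class Tag, class T>
    T operator()(Tag, T x);
};

template <class Tag, class T>
T twice::operator()(Tag, T x)
{
    return x + x;
}

// Explicit instantiation declaration: tells the compiler not to instantiate
// this specialization implicitly, because a definition is provided elsewhere.
extern template float twice::operator()<toy_tag, float>(toy_tag, float);

// Explicit instantiation definition: normally placed in a separate source
// file. Within one translation unit it must come after the declaration.
template float twice::operator()<toy_tag, float>(toy_tag, float);

// Usage: this call resolves to the explicit instantiation above.
inline float twice_example()
{
    float r = twice{}(toy_tag{}, 1.5f);
    assert(r == 3.0f);
    return r;
}
```

Keeping the definition out of the class body matters here: a member function defined inside the class is implicitly inline, and the compiler may then instantiate it in the calling translation unit with that unit's flags instead of using the separately compiled version.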