Arithmetic Operations

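Binary operations:

`add()`	per slot addition
`sub()`	per slot subtraction
`mul()`	per slot multiplication
`mul_lo()`	per slot low half of the full-width product
`mul_hi()`	per slot high half of the full-width product
`mul_hilo()`	per slot full-width product, as a {hi, lo} pair
`div()`	per slot division
`mod()`	per slot modulo
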
Unary operations:

`neg()`	per slot negate
`pos()`	per slot positive
`reciprocal()`	per slot reciprocal
`decr()`	per slot decrement
`decr_if()`	per slot decrement, based on a mask
`incr()`	per slot increment
`incr_if()`	per slot increment, based on a mask

Saturated arithmetic:

`sadd()`	per slot saturated addition
`ssub()`	per slot saturated subtraction

Fused operations:

`fma()`	fused multiply add
`fms()`	fused multiply subtract
`fnma()`	fused negated multiply add
`fnms()`	fused negated multiply subtract

Average computation:

`avg()`	per slot average
`avgr()`	per slot rounded average

template<class T, class A> inline batch<T, A> add(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the sum of the batches x and y.

Parameters:

x – batch or scalar involved in the addition.
y – batch or scalar involved in the addition.

Returns:

the sum of x and y

template<class T, class A> inline batch<T, A> decr(batch<T, A> const &x) noexcept

Subtracts 1 from each element of the batch x.

Parameters:

x – batch involved in the decrement.

Returns:

the difference of x and 1.

template<class T, class A, class Mask> inline batch<T, A> decr_if(batch<T, A> const &x, Mask const &mask) noexcept

Subtracts 1 from each element of the batch x where mask is true.

Parameters:

x – batch involved in the decrement.
mask – whether to perform the decrement or not. Can be a batch_bool or a batch_bool_constant.

Returns:

the difference of x and 1 where mask is true, x elsewhere.
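
The per-element behaviour described above can be sketched with a scalar reference model. This is only an illustration of the semantics, not the library's implementation; `decr_if_model` is a hypothetical name, and `std::array` stands in for both the batch and the mask.

```cpp
#include <array>
#include <cstddef>

// Scalar reference model of a masked decrement over N lanes.
// Hypothetical helper: std::array<T, N> plays the role of batch<T, A>,
// std::array<bool, N> plays the role of the mask.
template <class T, std::size_t N>
std::array<T, N> decr_if_model(std::array<T, N> const& x,
                               std::array<bool, N> const& mask)
{
    std::array<T, N> out = x; // lanes that are not selected keep their value
    for (std::size_t i = 0; i < N; ++i)
    {
        if (mask[i])
        {
            out[i] = static_cast<T>(x[i] - 1); // selected lane: subtract 1
        }
    }
    return out;
}
```

With x = {10, 20, 30, 40} and mask = {true, false, true, false}, the model yields {9, 20, 29, 40}.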

template<class T, class A> inline batch<T, A> div(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the division of the batch x by the batch y.

Parameters:

x – scalar or batch of scalars
y – scalar or batch of scalars

Returns:

the result of the division.

template<class T, class A> inline batch<T, A> fma(batch<T, A> const &x, batch<T, A> const &y, batch<T, A> const &z) noexcept

Computes (x*y) + z in a single instruction when possible.

Parameters:

x – a batch of integer or floating point values.
y – a batch of integer or floating point values.
z – a batch of integer or floating point values.

Returns:

the result of the fused multiply-add operation.

template<class T, class A> inline batch<T, A> fms(batch<T, A> const &x, batch<T, A> const &y, batch<T, A> const &z) noexcept

Computes (x*y) - z in a single instruction when possible.

Parameters:

x – a batch of integer or floating point values.
y – a batch of integer or floating point values.
z – a batch of integer or floating point values.

Returns:

the result of the fused multiply-sub operation.

template<class T, class A> inline batch<T, A> fnma(batch<T, A> const &x, batch<T, A> const &y, batch<T, A> const &z) noexcept

Computes -(x*y) + z in a single instruction when possible.

Parameters:

x – a batch of integer or floating point values.
y – a batch of integer or floating point values.
z – a batch of integer or floating point values.

Returns:

the result of the fused negated multiply-add operation.

template<class T, class A> inline batch<T, A> fnms(batch<T, A> const &x, batch<T, A> const &y, batch<T, A> const &z) noexcept

Computes -(x*y) - z in a single instruction when possible.

Parameters:

x – a batch of integer or floating point values.
y – a batch of integer or floating point values.
z – a batch of integer or floating point values.

Returns:

the result of the fused negated multiply-sub operation.
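
As a rough guide to how the four fused variants relate to each other, the sketch below models each one for a single scalar lane in terms of the standard `std::fma`. The `*_model` names are hypothetical and the code only mirrors the formulas stated above; it says nothing about which instruction a given architecture selects.

```cpp
#include <cmath>

// Scalar models of the four fused variants for one lane.
// Each call performs a single rounding, as std::fma does.
inline double fma_model(double x, double y, double z)
{
    return std::fma(x, y, z); //  (x*y) + z
}

inline double fms_model(double x, double y, double z)
{
    return std::fma(x, y, -z); //  (x*y) - z
}

inline double fnma_model(double x, double y, double z)
{
    return std::fma(-x, y, z); // -(x*y) + z
}

inline double fnms_model(double x, double y, double z)
{
    return std::fma(-x, y, -z); // -(x*y) - z
}
```

The single rounding matters when the product is close to z. For x = 1 + 2^-30 and y = 1 - 2^-30, the exact product is 1 - 2^-60, which would round to 1.0 if stored as a double on its own; `fms_model(x, y, 1.0)` keeps the residual and returns -2^-60.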

template<class T, class A> inline batch<T, A> fmas(batch<T, A> const &x, batch<T, A> const &y, batch<T, A> const &z) noexcept

Computes (x*y) - z for even-indexed elements and (x*y) + z for odd-indexed elements, in a single instruction when possible.

Parameters:

x – a batch of integer or floating point values.
y – a batch of integer or floating point values.
z – a batch of integer or floating point values.

Returns:

a batch where each even-indexed element is computed as x * y - z and each odd-indexed element as x * y + z
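
The alternating pattern in the return value can be made concrete with a scalar reference model. `fmas_model` is a hypothetical name, `std::array` stands in for the batch, and the model does not reproduce the single rounding of a fused instruction; it only shows which lanes subtract and which add.

```cpp
#include <array>
#include <cstddef>

// Scalar reference model of the alternating multiply-subtract/add:
// even-indexed lanes compute x*y - z, odd-indexed lanes compute x*y + z.
template <class T, std::size_t N>
std::array<T, N> fmas_model(std::array<T, N> const& x,
                            std::array<T, N> const& y,
                            std::array<T, N> const& z)
{
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
    {
        T const product = x[i] * y[i];
        out[i] = (i % 2 == 0) ? product - z[i]  // even lane: subtract
                              : product + z[i]; // odd lane: add
    }
    return out;
}
```

For x = {1, 2, 3, 4}, y = {5, 6, 7, 8} and z = {1, 1, 1, 1}, the model yields {4, 13, 20, 33}.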

template<class T, class A> inline batch<T, A> incr(batch<T, A> const &x) noexcept

Adds 1 to each element of the batch x.

Parameters:

x – batch involved in the increment.

Returns:

the sum of x and 1.

template<class T, class A, class Mask> inline batch<T, A> incr_if(batch<T, A> const &x, Mask const &mask) noexcept

Adds 1 to each element of the batch x where mask is true.

Parameters:

x – batch involved in the increment.
mask – whether to perform the increment or not. Can be a batch_bool or a batch_bool_constant.

Returns:

the sum of x and 1 where mask is true, x elsewhere.
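
One plausible use of a masked increment is per-lane counting: each lane keeps a counter that is bumped only when a comparison holds. The sketch below is a scalar model of that idea under assumed names (`lanes`, `count_above`) and an assumed width of 4; the inner loop plays the role of a single masked increment of the counters by the comparison result.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t lanes = 4; // hypothetical batch width

// Lane i counts how many of the elements at positions i, i + lanes, ...
// are greater than `threshold`. Trailing elements that do not fill a
// whole group of `lanes` are ignored in this sketch.
inline std::array<int, lanes> count_above(std::vector<float> const& data,
                                          float threshold)
{
    std::array<int, lanes> counters{};
    for (std::size_t base = 0; base + lanes <= data.size(); base += lanes)
    {
        // Scalar equivalent of one masked increment of `counters`.
        for (std::size_t i = 0; i < lanes; ++i)
        {
            bool const mask = data[base + i] > threshold;
            if (mask)
            {
                ++counters[i];
            }
        }
    }
    return counters;
}
```

Summing the per-lane counters at the end gives the total number of matching elements.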

template<class T, class A> inline batch<T, A> mod(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the integer modulo of the batch x by the batch y.

Parameters:

x – batch involved in the modulo.
y – batch involved in the modulo.

Returns:

the result of the modulo.

template<class T, class A> inline batch<T, A> mul(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the product of the batches x and y.

Template Parameters:

T – the element type of the batch.
A – the architecture tag of the batch.

Parameters:

x – batch involved in the product.
y – batch involved in the product.

Returns:

the result of the product.

template<class T, class A, class = std::enable_if_t<std::is_integral<T>::value>> inline batch<T, A> mul_lo(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the low N bits of the 2N-bit lane-wise product of x and y.

Equivalent to mul(x, y); the low half is identical for signed and unsigned.

Parameters:

x – batch involved in the product.
y – batch involved in the product.

Returns:

the low N bits of the product, lane-wise.

template<class T, class A, class = std::enable_if_t<std::is_integral<T>::value>> inline batch<T, A> mul_hi(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the high N bits of the 2N-bit lane-wise product of x and y.

The signedness of T selects the signed or unsigned high half.

Parameters:

x – batch involved in the product.
y – batch involved in the product.

Returns:

the high N bits of the product, lane-wise.

template<class T, class A, class = std::enable_if_t<std::is_integral<T>::value>> inline std::pair<batch<T, A>, batch<T, A>> mul_hilo(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the full 2N-bit lane-wise product of x and y as {hi, lo}.

Parameters:

x – batch involved in the product.
y – batch involved in the product.

Returns:

pair of batches {hi, lo}.
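
To illustrate what the {hi, lo} split means for one 32-bit lane, the sketch below widens to 64 bits, multiplies, and splits the result. The helper names are hypothetical and this is a scalar model of the documented semantics, not the library's implementation. The signed version relies on two's complement conversion, which is guaranteed from C++20 and is what mainstream compilers do in earlier standards.

```cpp
#include <cstdint>
#include <utility>

// Unsigned lane: the 64-bit product is split into {high 32 bits, low 32 bits}.
inline std::pair<std::uint32_t, std::uint32_t>
mul_hilo_u32(std::uint32_t x, std::uint32_t y)
{
    std::uint64_t const p = static_cast<std::uint64_t>(x) * y;
    return { static_cast<std::uint32_t>(p >> 32),
             static_cast<std::uint32_t>(p) };
}

// Signed lane: same split, but the product is sign-extended to 64 bits,
// so the high half differs from the unsigned case.
inline std::pair<std::int32_t, std::int32_t>
mul_hilo_i32(std::int32_t x, std::int32_t y)
{
    std::int64_t const p = static_cast<std::int64_t>(x) * y;
    std::uint64_t const bits = static_cast<std::uint64_t>(p);
    return { static_cast<std::int32_t>(static_cast<std::uint32_t>(bits >> 32)),
             static_cast<std::int32_t>(static_cast<std::uint32_t>(bits)) };
}
```

The inputs -2 and 0xFFFFFFFE share the same bit pattern. Multiplied by 3, both versions produce the low half 0xFFFFFFFA (that is, -6 when read as signed), while the high half is -1 for the signed product and 2 for the unsigned one.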

template<class T, class A> inline batch<T, A> neg(batch<T, A> const &x) noexcept

Computes the negation of the batch x.

Parameters:

x – batch involved in the operation.

Returns:

the negation of x.

template<class T, class A> inline batch<T, A> pos(batch<T, A> const &x) noexcept

No-op on x.

Parameters:

x – batch involved in the operation.

Returns:

x.

template<class T, class A, class = std::enable_if_t<std::is_floating_point<T>::value>> inline batch<T, A> reciprocal(batch<T, A> const &x) noexcept

Computes the approximate reciprocal of the batch x.

The maximum relative error for this approximation is less than 1.5*2^-12.

Parameters:

x – batch of floating point numbers.

Returns:

the reciprocal.
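
When the stated error bound is too loose, a common remedy is one Newton-Raphson step on the approximation. This technique is not part of the documentation above; the sketch shows it for a single scalar lane with a hypothetical helper name, taking the approximate reciprocal as an input rather than calling the library.

```cpp
// One Newton-Raphson step for 1/x, starting from an approximation r0:
//     r1 = r0 * (2 - x * r0)
// If r0 = (1 + e) / x, then r1 = (1 - e*e) / x, so the relative error
// goes from e to about e squared (before floating point rounding).
inline float refine_reciprocal(float x, float r0)
{
    return r0 * (2.0f - x * r0);
}
```

Starting from a relative error below 1.5*2^-12, one step brings the error down to roughly 1.3e-7, which is close to single-precision rounding error.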

template<class T, class A> inline batch<T, A> sadd(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the saturated sum of the batch x and the batch y.

Template Parameters:

T – the element type of the batch.
A – the architecture tag of the batch.

Parameters:

x – batch involved in the saturated addition.
y – batch involved in the saturated addition.

Returns:

the result of the saturated addition.

template<class T, class A> inline batch<T, A> ssub(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the saturated difference of the batch x and the batch y.

Template Parameters:

T – the element type of the batch.
A – the architecture tag of the batch.

Parameters:

x – batch involved in the saturated difference.
y – batch involved in the saturated difference.

Returns:

the result of the saturated difference.
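
Saturation means that a result outside the range of the element type is clamped to the nearest representable bound instead of wrapping around. The sketch below models this for one lane of an 8-bit or 16-bit integer type by computing in `int` and clamping. The names are hypothetical and the model covers only integer types narrower than `int`.

```cpp
#include <cstdint>
#include <limits>

// Clamp a wide intermediate result into the range of T.
template <class T>
T saturate(int wide)
{
    static_assert(sizeof(T) < sizeof(int), "model only covers narrow integer types");
    constexpr int lo = std::numeric_limits<T>::min();
    constexpr int hi = std::numeric_limits<T>::max();
    return static_cast<T>(wide < lo ? lo : (wide > hi ? hi : wide));
}

// Scalar model of a saturated addition for one lane.
template <class T>
T sadd_model(T x, T y)
{
    return saturate<T>(static_cast<int>(x) + static_cast<int>(y));
}

// Scalar model of a saturated subtraction for one lane.
template <class T>
T ssub_model(T x, T y)
{
    return saturate<T>(static_cast<int>(x) - static_cast<int>(y));
}
```

For example, with `std::int8_t` lanes 100 + 100 saturates to 127 and -100 - 100 saturates to -128, while with `std::uint8_t` lanes 200 + 100 saturates to 255 and 10 - 20 saturates to 0.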

template<class T, class A> inline batch<T, A> sub(batch<T, A> const &x, batch<T, A> const &y) noexcept

Computes the difference between x and y.

Template Parameters:

T – the element type of the batch.
A – the architecture tag of the batch.

Parameters:

x – scalar or batch of scalars
y – scalar or batch of scalars

Returns:

the difference between x and y