What's the expected range of alpha?
Using floating point division
float xf = q_to_float(x,31);
float yf = 1.0f/(1.0f+xf);
int y = float_to_q(yf,31);
Measured: 24 clock cycles; the floating-point divide alone takes 14 of them.
But, interesting note:
Integer-only instructions following VDIV or VSQRT instructions complete out-of-order. VDIV and VSQRT instructions take one cycle if no further floating-point instructions are executed.
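To try the floating-point variant off-target, here is a minimal sketch. `q_to_float_sketch` and `float_to_q_sketch` are hypothetical portable stand-ins for the `q_to_float` and `float_to_q` helpers used above, whose real definitions are not shown here. The explicit clamp is an assumption that mimics the saturation at the float-to-integer conversion.

```c
#include <stdint.h>

/* Hypothetical stand-in for q_to_float: Q(frac_bits) integer -> float. */
static float q_to_float_sketch(int32_t q, int frac_bits) {
    return (float)q / (float)((uint32_t)1 << frac_bits);
}

/* Hypothetical stand-in for float_to_q: float -> Q(frac_bits) integer.
   Clamps to the int32_t range (assumed saturating behaviour). */
static int32_t float_to_q_sketch(float f, int frac_bits) {
    float scaled = f * (float)((uint32_t)1 << frac_bits);
    if (scaled >= 2147483648.0f)  return INT32_MAX;
    if (scaled <= -2147483648.0f) return INT32_MIN;
    return (int32_t)scaled;
}

/* y = 1/(1+x) with Q31 input and output, via a float divide. */
static int32_t recip_1px_float(int32_t x) {
    float xf = q_to_float_sketch(x, 31);
    float yf = 1.0f / (1.0f + xf);
    return float_to_q_sketch(yf, 31);
}
```

With these stand-ins, x = 0 gives 1.0, which does not fit in Q31 and clamps to 0x7FFFFFFF. Any negative x gives a result above 1.0 and clamps the same way.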
Using integer division
int32_t q = 0x10000+(x>>15);
y = (0x7FFFFFFF/q)<<16;
Only the top 16 bits of the output are used, and the 15 LSBs of the input are ignored.
Measured: 4 clock cycles.
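The fixed-point formats in the integer version can be followed in this small sketch. The function name `recip_1px_int` is made up for illustration. The final shift is done on an unsigned value so the sketch stays well-defined in portable C; on the target the plain `<<16` from above does the same thing.

```c
#include <stdint.h>

/* y = 1/(1+x), Q31 in, Q31 out, using one integer divide.
   Intended for x >= 0 only: for negative x the quotient exceeds
   15 bits and the shift wraps, and x = INT32_MIN makes q zero. */
static int32_t recip_1px_int(int32_t x) {
    /* 0x10000 is 1.0 in Q16; x>>15 converts Q31 to Q16
       (dropping the 15 LSBs), so q = 1+x in Q16. */
    int32_t q = 0x10000 + (x >> 15);
    /* Q31 / Q16 = Q15 quotient; shifting left by 16 returns to Q31,
       leaving the low 16 bits zero. */
    int32_t quot = 0x7FFFFFFF / q;
    return (int32_t)((uint32_t)quot << 16);
}
```

For example, x = 0x40000000 (0.5) gives q = 0x18000 and a result of 0x55550000, against an exact 2/3 of about 0x55555555.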
Using a Taylor series approximation
1/(1-x) = 1 + x + x^2 + x^3 + x^4 + ...
but this converges only for |x| < 1, and only slowly when |x| > 0.5
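To put a number on "slowly": the remainder of the geometric series after the x^N term has a closed form, written out below as a worked equation, with N as the highest power kept.

```latex
\frac{1}{1-x} - \sum_{k=0}^{N} x^{k} = \frac{x^{N+1}}{1-x}
```

Keeping terms up to x^5 (N = 5), x = -0.5 leaves an error of (1/64)/1.5, about 0.0104, while x = -0.9 leaves 0.9^6/1.9, about 0.28.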
int32_t mx = -alpha;
y = mx + 0x7FFFFFFF;
int32_t mxx = ___SMMUL(mx,mx)<<1;
y += mxx;
int32_t mxxx = ___SMMUL(mxx,mx)<<1;
y += mxxx;
int32_t mxxxx = ___SMMUL(mxxx,mx)<<1;
y += mxxxx;
int32_t mxxxxx = ___SMMUL(mxxxx,mx)<<1;
y += mxxxxx;
Measured: 14 clock cycles.
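The Taylor version above can be checked on a host with the sketch below. `smmul_sketch` is a hypothetical portable stand-in for `___SMMUL`, assumed here to return the top 32 bits of the signed 64-bit product as the SMMUL instruction does. The `<<1` is written as `* 2` because left-shifting a negative value is undefined in portable C.

```c
#include <stdint.h>

/* Hypothetical stand-in for ___SMMUL: high word of the 64-bit product.
   Relies on arithmetic right shift of negative values (gcc/clang). */
static int32_t smmul_sketch(int32_t a, int32_t b) {
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* y = 1/(1+alpha) in Q31 from 1 + mx + mx^2 + ... + mx^5, mx = -alpha.
   Intended for alpha >= 0 only; negative alpha overflows the sums. */
static int32_t recip_1px_taylor(int32_t alpha) {
    int32_t mx = -alpha;
    int32_t y = mx + 0x7FFFFFFF;      /* 1 + mx, with 1.0 ~ 0x7FFFFFFF */
    int32_t term = mx;
    for (int k = 2; k <= 5; k++) {
        /* Q31 * Q31 >> 32 is Q30; doubling restores Q31. */
        term = smmul_sketch(term, mx) * 2;
        y += term;
    }
    return y;
}
```

For alpha = 0x40000000 (0.5) this returns 0x53FFFFFF, about 0.656, against an exact 2/3 of about 0x55555555, which matches the slow convergence noted for |x| near 0.5.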
This can further be accelerated by replacing
___SMMUL(....,mx<<1) but that overflows when
All these implementations assume q31 input and output. The first saturates at the conversion from float to integer; the second and third overflow.
And probably other approximations are possible too...