# How to approximate a float function with integer arithmetic?

Dan Richter on 20 Dec 2022
Commented: Walter Roberson on 29 Dec 2022
I need to spare some space in a 2 kB Flash MCU to finish the program that controls servos, where x = 0.0° to 90.0°.
How can I approximate the function `float y = x / 90 * 1250 + 3750` with an integer function, preferably using `uint16_t`?
The integer divisor should be a power of 2.
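Because the mapping is linear, an integer version only needs the slope $1250/90 = 125/9$ written as an integer over a power of two. In the sketch below, $k$ (the shift) and $N$ (the integer slope) are symbols introduced here for illustration, with $x$ in degrees as in the question:

$$
y = 3750 + \frac{1250}{90}\,x = 3750 + \frac{125}{9}\,x
  \approx 3750 + \frac{N\,x}{2^{k}},
\qquad N = \operatorname{round}\!\left(\frac{125}{9}\cdot 2^{k}\right)
$$

For example, $k = 3$ gives $N = 111$ ($111/8 = 13.875$) and $k = 11$ gives $N = 28444$ ($28444/2048 \approx 13.88867$), against $125/9 \approx 13.88889$. Dividing by $2^k$ becomes a right shift by $k$ bits. A larger $k$ tracks the slope more closely, but the product $N x$ must still fit in the integer type used, and with 0.1° resolution $x$ first has to be scaled to an integer (for instance tenths of a degree, which divides the slope by 10).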

Les Beckham on 20 Dec 2022
I'm assuming you want to replace the floating point calculation with an integer one? I'm not sure how that is going to "spare some space" unless you can't do floating point calculations and you don't have room for floating point emulation in software.
There are a lot of possible ways to do that depending on details of your application which you didn't provide.
This example comes pretty close.
```matlab
x = 0:0.1:90;
y = x./90*1250 + 3750;
x2 = uint16(0:90 * 111);  % scale your x values by 111 (x2 runs 0..9990)
y2 = x2/8 + 3750;         % replace /8 with a right shift by 3 places (uint16 "/" rounds, a shift truncates)
plot(x, y, double(x2)/111, y2)  % convert to double first so x2/111 is not rounded to whole degrees
xlabel 'x'
ylabel 'y'
legend('Float calculation', 'Integer calculation', 'Location', 'southeast')
grid on
```
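On the MCU itself, the same idea could look roughly like this in C. It is a sketch, assuming the caller already supplies the angle multiplied by 111 as an integer (`x2` from 0 to 9990 for 0° to 90°); `servo_from_scaled` is a hypothetical name. Unlike MATLAB's `uint16` division, which rounds, the C right shift truncates.

```c
#include <stdint.h>

/* Hypothetical helper: x2 is the angle pre-scaled by 111 (0..9990 for 0..90 deg).
 * The divide by 8 is done as a right shift by 3, which truncates.
 * Everything fits in 16 bits: the largest result is 3750 + 1248 = 4998. */
static uint16_t servo_from_scaled(uint16_t x2)
{
    return (uint16_t)(3750u + (x2 >> 3u));
}
```

At `x2 = 9990` this returns 4998 rather than the 5000 given by the float formula: 111/8 = 13.875 is slightly below 1250/90 ≈ 13.889, and the shift also drops the fractional 0.75.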
Les Beckham on 29 Dec 2022
You are quite welcome.

Walter Roberson on 29 Dec 2022
General exact process:
This can be done more compactly using bitand() and bitget() and similar, but sometimes it is easier to think in terms of streams of bits.
```matlab
format long g
R = (1250/90)                                       % R = 13.8888888888889
U64 = typecast(double(R), 'uint64');
B64 = dec2bin(U64, 64);   % B64(1) = sign, B64(2:12) = 11-bit exponent, B64(13:64) = 52-bit fraction
numerator = int64(bin2dec(['1', B64(end-51:end)]))  % restore the implicit leading 1; 7818749353073778 (int64)
denominator = 2^(1023 + 52 - bin2dec(B64(2:12)))    % exponent bias is 1023; 562949953421312
sign = 2 * (B64(1)=='0') - 1                        % 1
reconstructed = sign * double(numerator) / double(denominator)  % 13.8888888888889
R - reconstructed                                   % ans = 0
```
Approximating with a 16 bit denominator would take more work. Or perhaps less...
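The exact split above can also be done in C without going through strings. This is a sketch, assuming IEEE 754 binary64 doubles (1 sign bit, 11 exponent bits with bias 1023, 52 fraction bits) and a finite, normal input; zero, subnormals, infinities, and NaN are not handled, and `exact_ratio` and `decompose_double` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* r == sign * numerator / 2^shift, exactly, for finite normal doubles. */
typedef struct {
    int sign;            /* +1 or -1 */
    uint64_t numerator;  /* 53-bit significand including the implicit leading 1 */
    int shift;           /* denominator is 2^shift; negative when |r| >= 2^53 */
} exact_ratio;

static exact_ratio decompose_double(double r)
{
    uint64_t bits;
    memcpy(&bits, &r, sizeof bits);   /* reinterpret the bytes, like typecast(..., 'uint64') */

    exact_ratio out;
    out.sign = (bits >> 63) ? -1 : 1;
    int exponent_field = (int)((bits >> 52) & 0x7FFu);          /* 11-bit biased exponent */
    out.numerator = (bits & ((UINT64_C(1) << 52) - 1))          /* 52 fraction bits */
                  | (UINT64_C(1) << 52);                        /* implicit leading 1 */
    out.shift = 1023 + 52 - exponent_field;
    return out;
}
```

For `1250.0 / 90.0` this gives sign 1, numerator 7818749353073778, and shift 49 (denominator 562949953421312), the same values as the MATLAB session above.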
Walter Roberson on 29 Dec 2022
```matlab
format long g
R = (1250/90)                                    % R = 13.8888888888889
D = 15 - ceil(log2(abs(R)));                     % keep abs(R) * 2^D below 2^15 so it fits int16
denominator = uint16(2^D)                        % 2048 (uint16)
sign = 1; if R < 0; sign = -1; end
% double(denominator) keeps the product unrounded so floor() takes effect
numerator = sign * int16(floor(abs(R) * double(denominator)))  % 28444 (int16)
reconstructed = double(numerator) / double(denominator)        % 13.888671875
R - reconstructed                                % ans = 0.000217013888889284
```
Looking at these results, it might at first seem plausible to gain another bit of accuracy by doubling the denominator to 4096 and using a numerator about twice the current one, 56888 ± 1. But 56888 is larger than the int16 maximum of 32767, so it would only fit in a uint16, and the numerator would lose its room to be negative.
This code will not work properly for values of R less than 1.
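The same 16-bit recipe can be run once on a PC and the resulting constants hard-coded, so the MCU only does integer work. This is a sketch: `q15_ratio`, `ceil_log2`, `to_q15`, and `servo_from_tenths` are hypothetical names, `to_q15` assumes 1 <= r < 32768, and `servo_from_tenths` assumes the angle arrives as an integer in tenths of a degree (0 to 900), which matches the question's 0.0° to 90.0° range at 0.1° resolution.

```c
#include <stdint.h>

typedef struct {
    int16_t numerator;  /* ratio ~= numerator / 2^shift */
    int shift;          /* D in the recipe above */
} q15_ratio;

/* Smallest integer e with 2^e >= r, for r > 0 (ceil(log2(r)) without libm). */
static int ceil_log2(double r)
{
    int e = 0;
    double p = 1.0;
    while (p < r) { p *= 2.0; e++; }
    while (p / 2.0 >= r) { p /= 2.0; e--; }
    return e;
}

/* Host-side (PC) helper: D = 15 - ceil(log2(r)), numerator = floor(r * 2^D).
 * Assumes 1 <= r < 32768 so D stays between 0 and 15. An exact power of two
 * gives r * 2^D == 32768, which is clamped to 32767 much like int16 saturation. */
static q15_ratio to_q15(double r)
{
    q15_ratio out;
    out.shift = 15 - ceil_log2(r);
    double scaled = r;
    for (int i = 0; i < out.shift; i++)
        scaled *= 2.0;                   /* exact: doubling only changes the exponent */
    if (scaled > 32767.0)
        scaled = 32767.0;
    out.numerator = (int16_t)scaled;     /* truncation == floor for positive values */
    return out;
}

/* MCU-side sketch: x_tenths is the angle in tenths of a degree (0..900).
 * to_q15(1250.0 / 900.0) gives numerator 22755 and shift 14.
 * The product needs 32 bits (900 * 22755 = 20479500); adding 2^13 before
 * the shift rounds to nearest instead of truncating. */
static uint16_t servo_from_tenths(uint16_t x_tenths)
{
    uint32_t scaled = (uint32_t)x_tenths * 22755u + 8192u;
    return (uint16_t)(3750u + (scaled >> 14));
}
```

With these constants `servo_from_tenths` returns 3750, 4375, and 5000 at 0°, 45°, and 90°, matching the float formula at those points. Keeping the final result in `uint16_t` while using a 32-bit intermediate is a design choice here: a pure 16-bit product would force a much smaller numerator and a coarser slope.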
