# Bisection Algorithm to Calculate Square Root of an Unsigned Fixed-Point Number

This example shows how to generate HDL code from MATLAB® design implementing a bisection algorithm to calculate the square root of a number in fixed point notation.

Same implementation, originally using n-multipliers in HDL code, for wordlength `n`

, under sharing and streaming optimizations, can generate HDL code with only one multiplier demonstrating the power of MATLAB® HDL Coder™ optimizations.

The design of the square-root algorithm shows the pipelining concepts to achieve a fast clock rate in resulting RTL design. Since this design is already in fixed point, you don't need to run fixed-point conversion.

### MATLAB Design

% Design Sqrt design_name = 'mlhdlc_sqrt'; % Test Bench for Sqrt testbench_name = 'mlhdlc_sqrt_tb';

Lets look at the Sqrt Design

dbtype(design_name)

1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 2 % MATLAB design: Pipelined Bisection Square root algorithm 3 % 4 % Introduction: 5 % 6 % Implement SQRT by the bisection algorithm in a pipeline, for unsigned fixed 7 % point numbers (also why you don't need to run fixed-point conversion for this design). 8 % The demo illustrates the usage of a pipelined implementation for numerical algorithms. 9 % 10 % Key Design pattern covered in this example: 11 % (1) State of the bisection algorithm is maintained with persistent variables 12 % (2) Stages of the bisection algorithm are implemented in a pipeline 13 % (3) Code is written in a parameterized fashion, i.e. word-length independent, to work for any size fi-type 14 % 15 % Ref. 1. R. W. Hamming, "Numerical Methods for Scientists and Engineers," 2nd, Ed, pp 67-69. ISBN-13: 978-0486652412. 16 % 2. Bisection method, http://en.wikipedia.org/wiki/Bisection_method, (accessed 02/18/13). 17 % 18 19 % Copyright 2013-2015 The MathWorks, Inc. 20 21 %#codegen 22 function [y,z] = mlhdlc_sqrt( x ) 23 persistent sqrt_pipe 24 persistent in_pipe 25 if isempty(sqrt_pipe) 26 sqrt_pipe = fi(zeros(1,x.WordLength),numerictype(x)); 27 in_pipe = fi(zeros(1,x.WordLength),numerictype(x)); 28 end 29 30 % Extract the outputs from pipeline 31 y = sqrt_pipe(x.WordLength); 32 z = in_pipe(x.WordLength); 33 34 % for analysis purposes you can calculate the error between the fixed-point bisection routine and the floating point result. 35 %Q = [double(y).^2, double(z)]; 36 %[Q, diff(Q)] 37 38 % work the pipeline 39 for itr = x.WordLength-1:-1:1 40 % move pipeline forward 41 in_pipe(itr+1) = in_pipe(itr); 42 % guess the bits of the square-root solution from MSB to the LSB of word length 43 sqrt_pipe(itr+1) = guess_and_update( sqrt_pipe(itr), in_pipe(itr+1), itr ); 44 end 45 46 %% Prime the pipeline 47 % with new input and the guess 48 in_pipe(1) = x; 49 sqrt_pipe(1) = guess_and_update( fi(0,numerictype(x)), x, 1 ); 50 51 %% optionally print state of the pipeline 52 %disp('************** State of Pipeline **********************') 53 %double([in_pipe; sqrt_pipe]) 54 55 return 56 end 57 58 % Guess the bits of the square-root solution from MSB to the LSB in 59 % a binary search-fashion. 60 function update = guess_and_update( prev_guess, x, stage ) 61 % Key step of the bisection algorithm is to set the bits 62 guess = bitset( prev_guess, x.WordLength - stage + 1); 63 % compare if the set bit is a candidate solution to retain or clear it 64 if ( guess*guess <= x ) 65 update = guess; 66 else 67 update = prev_guess; 68 end 69 return 70 end

### Simulate the Design

It is always a good practice to simulate the design with the testbench prior to code generation to make sure there are no runtime errors.

mlhdlc_sqrt_tb

Iter = 01| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 02| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 03| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 04| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 05| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 06| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 07| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 08| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 09| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 10| Input = 0.000| Output = 0000000000 (0.00) | actual = 0.000000 | abserror = 0.000000 Iter = 11| Input = 4.625| Output = 0000010000 (2.00) | actual = 2.150581 | abserror = 0.150581 Iter = 12| Input = 12.500| Output = 0000011100 (3.50) | actual = 3.535534 | abserror = 0.035534 Iter = 13| Input = 16.250| Output = 0000100000 (4.00) | actual = 4.031129 | abserror = 0.031129 Iter = 14| Input = 18.125| Output = 0000100010 (4.25) | actual = 4.257347 | abserror = 0.007347 Iter = 15| Input = 20.125| Output = 0000100010 (4.25) | actual = 4.486090 | abserror = 0.236090 Iter = 16| Input = 21.875| Output = 0000100100 (4.50) | actual = 4.677072 | abserror = 0.177072 Iter = 17| Input = 35.625| Output = 0000101110 (5.75) | actual = 5.968668 | abserror = 0.218668 Iter = 18| Input = 50.250| Output = 0000111000 (7.00) | actual = 7.088723 | abserror = 0.088723 Iter = 19| Input = 54.000| Output = 0000111010 (7.25) | actual = 7.348469 | abserror = 0.098469 Iter = 20| Input = 62.125| Output = 0000111110 (7.75) | actual = 7.881941 | abserror = 0.131941 Iter = 21| Input = 70.000| Output = 0001000010 (8.25) | actual = 8.366600 | abserror = 0.116600 Iter = 22| Input = 81.000| Output = 0001001000 (9.00) | actual = 9.000000 | abserror = 0.000000 Iter = 23| Input = 83.875| Output = 0001001000 (9.00) | actual = 9.158330 | abserror = 0.158330 Iter = 24| Input = 83.875| Output = 0001001000 (9.00) | actual = 9.158330 | abserror = 0.158330 Iter = 25| Input = 86.875| Output = 0001001010 (9.25) | actual = 9.320676 | abserror = 0.070676 Iter = 26| Input = 95.125| Output = 0001001110 (9.75) | actual = 9.753205 | abserror = 0.003205 Iter = 27| Input = 97.000| Output = 0001001110 (9.75) | actual = 9.848858 | abserror = 0.098858 Iter = 28| Input = 101.375| Output = 0001010000 (10.00) | actual = 10.068515 | abserror = 0.068515 Iter = 29| Input = 102.375| Output = 0001010000 (10.00) | actual = 10.118053 | abserror = 0.118053 Iter = 30| Input = 104.250| Output = 0001010000 (10.00) | actual = 10.210289 | abserror = 0.210289

### Create a New HDL Coder Project

coder -hdlcoder -new mlhdlc_sqrt_prj

Next, add the file `mlhdlc_sqrt.m`

to the project as the MATLAB Function and `mlhdlc_sqrt_tb.m`

as the MATLAB Test Bench.

For a more complete tutorial on creating and populating MATLAB HDL Coder projects, see Get Started with MATLAB to HDL Workflow.

### Run HDL Code Generation

This design is already in fixed point and suitable for HDL code generation. It is not desirable to run floating point to fixed point advisor on this design.

Launch the Workflow Advisor.

Under

**Define Input Types**, choose`Keep original types`

for the option**Fixed-point conversion**.Under the

**Optimizations**tab in the**RAM Mapping**box, uncheck**MAP persistent variables to RAMs**. Unchecking this option prevents the pipeline from being inferred as RAM.Optionally, under the

**Optimizations > Area Optimizations**tab, you might want to set**Resource sharing factor**equal to the`wordlength`

(10 here) and select**Stream Loops**under the**Loop Optimizations**tab. Make sure to check**Distributed Pipelining**when you enable the optimizations.Click on the

**Code Generation**step and click**Run**.

Examine the generated HDL code by clicking on the hyperlinks in the Code Generation Log window.

### Examine the Synthesis Results

Run the logic synthesis step with the following default options if you have ISE installed on your machine.

In the synthesis report, note the clock frequency reported by the synthesis tool without any optimization options enabled.

Typically the timing performance of this design, using Xilinx ISE synthesis tool for the

`Virtex7`

chip family, device`xc7v285t`

, speed grade`-3`

, is around`229MHz`

with a maximum combinatorial path delay of`0.406ns`

.Optimizations for this design (loop streaming and multiplier sharing) work to reduce resource usage, with a moderate trade-off on timing. For the particular word-length size in test bench you see a reduction of

`n`

multipliers to one.