Floating Point Numbers


Floating Point Number Representation

Floating point numbers are how computers approximate real numbers. Because a real number can have many digits, it is stored in binary scientific notation: a fixed number of significant bits multiplied by a power of 2.

You probably already know that \(976,000,000\) can be represented as \(9.76 \times 10^{8}\) in scientific notation.

Similarly, in binary, \(1010000\) can be represented as \(1.01 \times 2^{6}\) in scientific notation.

The standard way to represent a floating point number using scientific notation is as follows:

\(1 . \mathrm{m} \times 2^{\mathrm{e}}\)

  • \(\mathrm{m}\): mantissa.

    • More bits in the mantissa give more precision.

  • \(\mathrm{e}\): exponent.

    • More bits in the exponent give a wider range of representable values.
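To see the \(1 . \mathrm{m} \times 2^{\mathrm{e}}\) form numerically, here is a minimal Python sketch. The helper name `normalize` is hypothetical; it relies only on the standard `math.frexp`, which returns a fraction in \([0.5, 1)\), so the sketch doubles that fraction and lowers the exponent by one to get a significand in \([1, 2)\).

```python
import math

def normalize(x: float) -> tuple[float, int]:
    """Return (significand, exponent) with x == significand * 2**exponent
    and 1 <= significand < 2, i.e. the 1.m x 2^e form (x must be positive)."""
    if x <= 0:
        raise ValueError("this sketch only handles positive numbers")
    # math.frexp returns (f, k) with x == f * 2**k and 0.5 <= f < 1,
    # so doubling f and decrementing k gives the 1.m form.
    f, k = math.frexp(x)
    return f * 2, k - 1

sig, exp = normalize(80.0)   # 80 = 1010000 in binary
print(sig, exp)              # 1.25 6  (1.25 = 1.01 in binary)
```

This reproduces the earlier example: \(1010000_2 = 1.01_2 \times 2^{6}\), since \(1.01_2 = 1.25\).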

A float (a single-precision floating point number) represents a real number using 32 bits, divided into 3 fields.

$$\begin{array}{|l|l|l|l|}\hline \text { Field } & {\text { Sign }} & {\text { Exponent }} & {\text { Mantissa }} \\ \hline \# \text { of bits } & {1} & {8} & {23} \\ \hline \text { Positions of bits } & {31} & {30-23} & {22-0} \\ \hline\end{array}$$

  • Sign: 0 if a number is positive, 1 if a number is negative

  • Exponent: stores the \(\mathrm{e}\) field

  • Mantissa: stores the \(\mathrm{m}\) field, i.e. the bits after the binary point. The leading 1 is always assumed, so it is not stored.
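The three fields in the table can be pulled out of a real 32-bit float with bit shifts and masks. This Python sketch assumes the standard `struct` module, packing the value as a big-endian float (`">f"`) and reading the same 4 bytes back as an unsigned integer (`">I"`). The function name `float_fields` is hypothetical.

```python
import struct

def float_fields(x: float) -> tuple[int, int, int]:
    """Split x, rounded to a 32-bit float, into (sign, exponent, mantissa) fields."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned 32-bit integer to get at the raw bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 30-23 (8 bits)
    mantissa = bits & 0x7FFFFF       # bits 22-0 (23 bits)
    return sign, exponent, mantissa

s, e, m = float_fields(-9.75)
print(s, format(e, "08b"), format(m, "023b"))
# 1 10000010 00111000000000000000000
```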

One complication is that the exponent field is stored as an unsigned number, but the exponent itself must be able to be positive or negative, e.g. \(1.01 \times 2^{6}\) and \(1.01 \times 2^{-6}\). Since the exponent field has 8 bits, it can hold values from 0 to 255. To split this range roughly in half, 127 (called the bias) is subtracted from the stored value, so stored values 0 to 255 stand for exponents \(-127\) to \(128\).

Thus, writing \(\mathrm{e}\) for the unsigned value stored in the exponent field, a floating point number actually represents:

\(1 . \mathrm{m} \times 2^{\mathrm{e}-127}\)
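The bias of 127 is just an offset between the true exponent and the stored field. A small Python sketch (the names `BIAS`, `encode_exponent`, and `decode_exponent` are hypothetical) shows the stored fields for the exponents \(6\) and \(-6\) from the example above:

```python
BIAS = 127  # bias for the 8-bit exponent field of a 32-bit float

def encode_exponent(e: int) -> int:
    """Stored exponent field for a true exponent e."""
    return e + BIAS

def decode_exponent(field: int) -> int:
    """True exponent represented by a stored exponent field."""
    return field - BIAS

for e in (6, -6):
    print(e, format(encode_exponent(e), "08b"))
# 6 10000101
# -6 01111001
```

Decoding the extreme fields gives `decode_exponent(0) == -127` and `decode_exponent(255) == 128`, the range described above.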

Decimal -> Float

  1. Find the binary representation of the decimal number.

  2. Put the binary number into the form \(1 . \mathrm{m} \times 2^{\mathrm{e}}\).

  3. The exponent field stores \(e+127\), written as an 8-bit binary number.

  4. The mantissa field stores the bits of \(\mathrm{m}\). If there are fewer than 23 bits, add 0’s to the end until there are 23 bits.

  5. The sign bit is 0 if the number is positive and 1 if it is negative.
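The five steps above can be traced in code. This Python sketch (the function name `decimal_to_float_bits` is hypothetical) assumes the input is nonzero and fits exactly in 23 mantissa bits; it does no rounding and rejects any other input instead of guessing. It uses the standard `math.frexp` to find the exponent.

```python
import math

def decimal_to_float_bits(x: float) -> str:
    """Follow steps 1-5 for a nonzero x that fits exactly in 23 mantissa bits.

    Sketch only: no rounding, and no handling of 0, subnormals,
    infinities, or NaN.
    """
    if x == 0 or not math.isfinite(x):
        raise ValueError("0, infinities, and NaN are not handled here")
    # Step 5: sign bit.
    sign = 1 if x < 0 else 0
    x = abs(x)
    # Steps 1-2: write |x| as 1.m x 2^e (frexp gives f x 2^k with 0.5 <= f < 1).
    f, k = math.frexp(x)
    e = k - 1
    frac = f * 2 - 1                      # the .m part, 0 <= frac < 1
    # Step 3: biased exponent field.
    exponent_field = e + 127
    # Step 4: mantissa bits, padded to 23 with trailing zeros.
    mantissa_field = frac * 2**23
    if not mantissa_field.is_integer() or not 0 < exponent_field < 255:
        raise ValueError("x is not exactly representable as a normal float")
    return f"{sign}{exponent_field:08b}{int(mantissa_field):023b}"

print(decimal_to_float_bits(9.75))
# 01000001000111000000000000000000
```

A value such as 0.1, whose binary expansion never terminates, raises `ValueError` here; real hardware would round it to the nearest float instead.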

Example: Convert \(9.75\) into its floating point representation.

  1. \(9.75_{10}=1001.11_{2}\).

  2. \(1001.11=1.00111 \times 2^{3}\).

  3. Exponent field \(=3+127=130=10000010_{2}\)

  4. Mantissa field \(=00111\) followed by eighteen 0’s.

  5. Sign field \(=0\).

$$\begin{array}{|l|l|l|}\hline \text { Sign } & {\text { Exponent }} & {\text { Mantissa }} \\ \hline 0 & {10000010} & {00111 \text { followed by eighteen } 0 \text { 's }} \\ \hline\end{array}$$
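As a cross-check of this example, the following Python sketch asks the standard `struct` module for the actual bits of the 32-bit float 9.75 and splits them into the three fields:

```python
import struct

# Reinterpret the 4 bytes of the 32-bit float 9.75 as an unsigned integer.
(bits,) = struct.unpack(">I", struct.pack(">f", 9.75))
bit_string = format(bits, "032b")
print(bit_string[0], bit_string[1:9], bit_string[9:])
# 0 10000010 00111000000000000000000
print(hex(bits))
# 0x411c0000
```

The three printed fields match the sign, exponent, and mantissa in the table.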

Float -> Decimal

  1. Extract sign, exponent, and mantissa fields from a float.

  2. Represent the float as \(\pm 1 . \mathrm{m} \times 2^{\mathrm{e}-127}\)

  3. Convert to decimal.
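These three steps translate directly into a short decoder. This Python sketch (the function name `float_bits_to_decimal` is hypothetical) assumes a normal number, so it rejects exponent fields of 0 and 255, which are reserved for special values.

```python
def float_bits_to_decimal(bits: str) -> float:
    """Decode a 32-character bit string (sign | exponent | mantissa).

    Sketch: assumes a normal number, i.e. exponent field not 0 or 255.
    """
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")
    # Step 1: extract the three fields.
    sign = int(bits[0], 2)
    exponent_field = int(bits[1:9], 2)
    mantissa_field = int(bits[9:], 2)
    if exponent_field in (0, 255):
        raise ValueError("special value; not handled in this sketch")
    # Step 2: 1.m x 2^(e - 127); the 23 mantissa bits are the fraction after the point.
    significand = 1 + mantissa_field / 2**23
    value = significand * 2.0 ** (exponent_field - 127)
    # Step 3: apply the sign to get the decimal value.
    return -value if sign else value

print(float_bits_to_decimal("01000000101000000000000000000000"))  # 5.0
```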

Example: Find the decimal value of the float with bits \(01000000101000000000000000000000\).

  1. Extract the sign, exponent, and mantissa fields:

$$\begin{array}{|l|l|l|}\hline \text { Sign } & {\text { Exponent }} & {\text { Mantissa }} \\ \hline 0 & {10000001_{2}=129_{10}} & {01 \text { followed by twenty-one } 0 \text { 's }} \\ \hline\end{array}$$

  2. \(+1.01 \times 2^{129-127}=1.01 \times 2^{2}=101_{2}\)

  3. \(101_{2}=5_{10}\)
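The same answer can be confirmed by letting the machine interpret the bits. This Python sketch assumes the standard `struct` module: it turns the bit string into 4 big-endian bytes and unpacks them as a 32-bit float.

```python
import struct

bits = "01000000101000000000000000000000"
# Turn the bit string into 4 big-endian bytes and read them back as a 32-bit float.
value = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]
print(value)  # 5.0
```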

Special Values

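The form \(1 . \mathrm{m} \times 2^{\mathrm{e}-127}\) always has a leading 1, so it cannot represent \(0\). For this reason an exponent field of all 0’s is reserved: when both the exponent and mantissa fields are 0, the float represents \(0\) (with a sign bit of 1, it represents \(-0\)).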
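A quick way to see how zero is stored is to print the raw bits of \(0\) and \(-0\). This Python sketch assumes the standard `struct` module, which preserves the sign of \(-0.0\) when packing a 32-bit float:

```python
import struct

for x in (0.0, -0.0):
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    print(x, format(bits, "032b"))
# 0.0 00000000000000000000000000000000
# -0.0 10000000000000000000000000000000
```

Both values compare equal (`0.0 == -0.0` is `True` in Python); they differ only in the sign bit.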