
Note on Floating Point Representation

Floating-point representation can handle values across a very wide range of magnitudes. This note examines floating-point representation in computer architecture in detail.

To represent all forms of information inside digital computers, the binary number system is used. Binary bits (i.e., 0 and 1) are used to represent alphanumeric characters. Digital representations are easier to develop, store, and process, and they offer greater accuracy and precision.

Several number systems exist for digital number representation, such as the binary, octal, decimal, and hexadecimal systems. However, in a digital computer system, the binary number system is the most relevant and common way to represent numbers.

In modern computing, there are two basic techniques for storing real numbers (i.e., numbers with a fractional component): fixed-point notation and floating-point notation. In fixed-point notation, the number of digits after the radix point is fixed, whereas in floating-point notation it is variable.
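The difference can be sketched in Python: a fixed-point value is just an integer with an agreed scale factor, while a float carries its own exponent. (The scale factor 2**16 here is an arbitrary choice for illustration.)

```python
import struct

# Fixed point: scale by a constant factor (here 2**16) and store an integer,
# so the number of fractional bits is fixed at 16.
SCALE = 2 ** 16
x = 3.14159
fixed = round(x * SCALE)      # stored as a plain integer
recovered = fixed / SCALE     # value implied by the agreed scale

# Floating point: the exponent is stored alongside the number, so the
# binary point "floats" to wherever the value needs it.
bits = struct.pack('>f', x)   # 32-bit IEEE 754 single precision

print(fixed, recovered)       # integer form and its implied value
print(bits.hex())             # the 4 stored bytes
```

Both schemes approximate 3.14159, but only the fixed-point version needs the scale factor to be agreed on out of band.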

Floating-Point Representation

This format does not assign a fixed number of bits to the integer and fractional parts. Instead, it reserves a certain number of bits for the number itself (known as the mantissa or significand) and another set of bits to record where the radix point lies within that number (called the exponent).

The first part of a floating-point representation is the mantissa. The exponent is the second part, which designates the position of the decimal (or binary) point. The fixed-point mantissa may be a fraction or an integer. A floating-point number is always interpreted in the form M × r^e.

Only the mantissa m and the exponent e are actually stored in the register. A floating-point binary number uses a base-2 exponent. The floating-point number is said to be normalised if the most significant digit of the mantissa is 1 (i.e., non-zero).

So the true number is (−1)^s × (1 + m) × 2^(e − Bias), where s is the sign bit, m denotes the mantissa, e denotes the stored exponent value, and Bias denotes the bias number.
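As a small illustration of this formula, the single-precision encoding of 6.5 = +1.625 × 2^2 can be evaluated directly. (The field values below are worked out by hand for this one number.)

```python
# Evaluate the value formula (-1)^s * (1 + m) * 2^(e - bias)
# for a single-precision example: 6.5 = +1.625 x 2^2.
s = 0          # sign bit: positive
m = 0.625      # fractional part of the mantissa (the hidden 1 is implied)
e = 129        # stored exponent field: true exponent 2 plus the bias 127
BIAS = 127

value = (-1) ** s * (1 + m) * 2 ** (e - BIAS)
print(value)   # 6.5
```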

Sign-magnitude representation, one's complement representation, and two's complement representation are all used to express signed mantissas and exponents.

Floating-point representation allows for greater flexibility: any non-zero number x can be expressed in the normalised form ±(1.b₁b₂b₃ …)₂ × 2^n.

Let’s understand floating-point representation with an example!

Example − Assume a 32-bit number: 1 bit for the sign, 8 bits for the signed exponent, and 23 bits for the fractional part. The leading 1, which is always present for a normalised number, is not stored and is called the "hidden bit."

The value −53.5 is then normalised as −53.5 = (−110101.1)₂ = (−1.101011)₂ × 2^5, which is represented as follows.

The 8-bit binary value of the exponent +5 is 00000101. (With the IEEE 754 single-precision bias of 127, the stored exponent field becomes 5 + 127 = 132 = 10000100.)

Note that the 8-bit exponent field can store integer exponents in the range −126 ≤ n ≤ 127.
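The bit layout of −53.5 can be checked with Python's standard struct module, which exposes the raw IEEE 754 single-precision encoding:

```python
import struct

# Pack -53.5 as a 32-bit IEEE 754 single and pull the fields back apart.
(u,) = struct.unpack('>I', struct.pack('>f', -53.5))

sign = u >> 31                  # 1   -> negative
exponent = (u >> 23) & 0xFF     # 132 -> true exponent 132 - 127 = 5
mantissa = u & 0x7FFFFF         # fraction bits of 1.101011 (hidden 1 dropped)

print(sign, exponent - 127, bin(mantissa))
```

The printed mantissa bits begin 101011, matching the normalised form (−1.101011)₂ × 2^5.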

The smallest positive normalised number that fits into 32 bits is (1.00000000000000000000000)₂ × 2^−126 = 2^−126 ≈ 1.18 × 10^−38, and the largest normalised positive number that fits into 32 bits is (1.11111111111111111111111)₂ × 2^127 = (2^24 − 1) × 2^104 ≈ 3.40 × 10^38.
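These extremes can be reconstructed from their bit patterns as a sanity check, again using the standard struct module:

```python
import struct

# Rebuild the single-precision extremes from their raw bit patterns.
smallest_normal = struct.unpack('>f', bytes.fromhex('00800000'))[0]  # exp field 1, mantissa 0
largest_normal  = struct.unpack('>f', bytes.fromhex('7f7fffff'))[0]  # exp field 254, mantissa all 1s

print(smallest_normal)   # 2**-126, about 1.18e-38
print(largest_normal)    # (2 - 2**-23) * 2**127, about 3.40e+38
```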

The precision of a floating-point format is the number of places reserved for binary digits plus one (for the hidden bit). In the case considered here, the precision is 23 + 1 = 24 bits.

For the preceding case, the gap between 1 and the next larger representable number is (1 + 2^−23) − 1 = 2^−23. Because of the non-uniform spacing, however, this is not the same as the smallest positive floating-point value, unlike in the fixed-point scenario.
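The 2^−23 gap at 1 can be observed directly by stepping the bit pattern of 1.0 up by one unit in the last place:

```python
import struct

# The gap between 1.0 and the next larger single-precision number is 2**-23.
one_bits = struct.unpack('>I', struct.pack('>f', 1.0))[0]
next_up = struct.unpack('>f', struct.pack('>I', one_bits + 1))[0]

gap = next_up - 1.0
print(gap)   # 2**-23, about 1.19e-07 -- far larger than the smallest value 2**-126
```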

Numbers with non-terminating binary expansions, such as 1/3, cannot be represented exactly in floating-point format; e.g., 1/3 = (0.010101 …)₂ is not a floating-point number.
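Python's fractions module makes this rounding visible: the double closest to 1/3 is a rational number with a power-of-two denominator, not 1/3 itself.

```python
from fractions import Fraction

# 1/3 has a non-terminating binary expansion, so the stored double is only
# the nearest representable value, not 1/3 itself.
x = 1 / 3
exact = Fraction(x)   # the rational number the bits actually encode

print(exact == Fraction(1, 3))   # False
print(exact.denominator)         # a power of two, as for every binary float
```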

IEEE Floating-Point Number Representation

The IEEE (Institute of Electrical and Electronics Engineers) has established a standard for floating-point representation, known as IEEE 754.

The represented number is (−1)^s × (1 + m) × 2^(e − Bias), where s stands for the sign bit, m for the mantissa, e for the stored exponent value, and Bias for the bias number. The sign bit is 0 for positive numbers and 1 for negative numbers. Exponents are stored in biased (excess) form rather than two's complement.
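The same formula can be applied to a 64-bit double (bias 1023, 11-bit exponent, 52-bit mantissa) by unpacking the fields and rebuilding the value:

```python
import struct

# Decode a 64-bit double into its IEEE 754 fields and rebuild the value
# with (-1)^s * (1 + m) * 2^(e - bias), where the double bias is 1023.
(u,) = struct.unpack('>Q', struct.pack('>d', 10.25))

s = u >> 63                  # sign bit
e = (u >> 52) & 0x7FF        # 11-bit biased exponent field
frac = u & ((1 << 52) - 1)   # 52 fraction bits
m = frac / 2 ** 52           # fractional mantissa in [0, 1)

value = (-1) ** s * (1 + m) * 2 ** (e - 1023)
print(value)                 # 10.25
```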

The floating-point number is expressed in the following ways, according to the IEEE 754 standard:

  • 16-bit half-precision: 1 sign bit, 5-bit exponent, and 10-bit mantissa
  • 32-bit single-precision: 1 sign bit, 8-bit exponent, and 23-bit mantissa
  • 64-bit double-precision: 1 sign bit, 11-bit exponent, and 52-bit mantissa
  • 128-bit quadruple-precision: 1 sign bit, 15-bit exponent, and 112-bit mantissa
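Three of these widths can be produced directly with Python's struct format codes ('e' = half, 'f' = single, 'd' = double); quadruple precision has no standard struct code in CPython.

```python
import struct

# Pack 1.0 at three IEEE 754 widths and show the storage size of each.
for code in ('e', 'f', 'd'):
    print(code, len(struct.pack('>' + code, 1.0)))   # 2, 4, and 8 bytes
```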

Conclusion

The great dynamic range of floating-point representation in computer architecture simplifies the design and programming of numerical operations. Compared with fixed-point representations of the same word size, however, it limits the available precision, and its operations are slower and more complex to implement. Furthermore, although it avoids the need for explicit scaling methods, it can produce surprising results for uninformed users.


Frequently asked questions

Get answers to the most common queries related to the NTA Examination Preparation.

What causes floating points to be inaccurate?

Ans. The binary representation of floating-point decimal values is usually not exact. This is a result of the CPU's representation of floa…

What can be expressed with floating-point numbers?

Ans. A floating-point representation of data may represent values of different orders of magnitude with a set number of digits: for example, the di…

What causes rounding off problems in floats?

Ans. Because floating-point numbers have a finite number of digits, they cannot precisely represent all real numbers: if there are more digits tha…

What is the method for storing floating-point numbers in memory?

Ans. The significand and the exponent are stored in floating-point values (along with a sign bit). The high-order bit, like signed integer types, d…

How do you represent a floating-point number?

Ans. The basic idea behind floating-point encoding is to normalise a binary number by shifting bits left or right un…