What Is Half Precision?
This video introduces the concept of half precision, or float16, a relatively new floating-point data type. It can be used to cut memory usage in half and has become very popular for accelerating deep learning training and inference. We also look at its benefits and tradeoffs relative to the traditional 32-bit single-precision and 64-bit double-precision data types for traditional control applications.
Half precision, or float16, is a relatively new floating-point data type that uses 16 bits, unlike the traditional 32-bit single-precision and 64-bit double-precision data types.
So, when you declare a variable as half in MATLAB, say the number pi, you may notice some loss of precision compared to the single or double representation, as we see here.
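As a rough sketch, here is how that comparison might look in MATLAB (the half type is provided by Fixed-Point Designer, and the exact display depends on your format setting):

```matlab
% Compare the same value stored in double, single, and half precision.
% Requires Fixed-Point Designer for the half type.
format long
double(pi)   % 3.141592653589793
single(pi)   % approximately 3.1415927
half(pi)     % approximately 3.1406 -- noticeably less precise
```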
The difference comes from the limited number of bits used by half precision. We have only 10 bits for the fraction and 5 bits for the exponent, as opposed to 23 fraction bits and 8 exponent bits in single. Hence the eps is much larger and the dynamic range is limited.
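For a sense of scale, a small sketch along these lines (again assuming the half type from Fixed-Point Designer; 65504 is the largest finite float16 value):

```matlab
% Machine epsilon shrinks as the number of fraction bits grows.
eps(half(1))    % 2^-10, about 9.8e-4
eps('single')   % 2^-23, about 1.2e-7
eps('double')   % 2^-52, about 2.2e-16

% The 5-bit exponent also limits the range: the largest finite
% half-precision value is 65504, versus realmax('single') ~ 3.4e38.
```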
So why is it important? Half precision has recently become popular because of its usefulness in accelerating deep learning training and inference, mainly on NVIDIA GPUs, as highlighted in the articles here. In addition, Intel and ARM platforms also support half to accelerate computations.
The obvious benefit of using half precision is that it reduces memory usage and data bandwidth by 50%, as we see here for ResNet-50. In addition, hardware vendors provide hardware acceleration for computations in half, such as the CUDA intrinsics in the case of NVIDIA GPUs.
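As a back-of-the-envelope illustration (the ~25.6 million parameter count used for ResNet-50 here is an assumption, not a figure from the video):

```matlab
% Storage needed for the network weights in single vs. half precision.
numWeights = 25.6e6;                 % assumed ResNet-50 parameter count
singleMB = numWeights * 4 / 2^20;    % 4 bytes per single value
halfMB   = numWeights * 2 / 2^20;    % 2 bytes per half value
fprintf('single: %.0f MB, half: %.0f MB (%.0f%% reduction)\n', ...
    singleMB, halfMB, 100*(1 - halfMB/singleMB));
```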
We are seeing traditional applications such as powertrain control systems do the same, where you may have data in the form of lookup tables, as shown in a simple illustration here. By using half as the storage type instead of double, you can reduce the memory footprint of this 2D lookup table by 4x.
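A minimal sketch of that idea, with made-up breakpoints and table data standing in for the lookup table shown in the video:

```matlab
% Hypothetical 2-D lookup table (illustrative breakpoints and data only).
rpm      = 0:100:6000;                       % speed breakpoints
loadFrac = 0:0.05:1;                         % normalized load breakpoints
[R, F]   = meshgrid(rpm, loadFrac);
table64  = 200 * F .* exp(-((R - 3000)/2500).^2);  % double by default

table16  = half(table64);                    % store the table as half instead

fprintf('double: %d bytes, half: %d bytes (4x smaller)\n', ...
    8*numel(table64), 2*numel(table16));
```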
However, it is important to understand the tradeoff of the limited precision and range of half precision. For instance, in the case of the deep learning network, the quantization error was on the order of 10^-4, and one has to analyze how this affects the overall accuracy of the network.
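One way to get a feel for that error, sketched with random values standing in for a layer's weights:

```matlab
% Quantization error introduced by storing values in half precision.
w        = 0.1 * randn(1000, 1, 'single');   % stand-in for layer weights
wHalf    = half(w);                          % quantize to float16
quantErr = max(abs(single(wHalf) - w));
fprintf('max quantization error: %g\n', quantErr)   % typically ~1e-4
```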
This was a short introduction to half precision. Please refer to the links below to learn more about how to simulate and generate C/C++ or CUDA code from half precision in MATLAB and Simulink.