Inspur Releases TensorFlow-Supported FPGA Compute Acceleration Engine TF2

Author: Inspur Electronic Information Industry Co., Ltd / 2023-07-24 13:26 / Source: Inspur Electronic Information Industry Co., Ltd

FREMONT, Calif., Aug. 25, 2018 -- On August 23, at KDD 2018 in London -- a premier global conference focused on artificial intelligence -- Inspur released TF2, an FPGA compute acceleration engine that supports TensorFlow. TF2 helps AI customers quickly deploy inference on FPGAs for deep neural network (DNN) models trained with mainstream AI training software. It delivers high performance and low latency for AI applications through the world's first DNN shift computing technology on FPGAs.

At present, using FPGA technology to achieve customizable, low-latency, high-performance AI inference with a high performance-to-power ratio has become the technical route adopted by many AI companies. However, before FPGA technology can enter large-scale AI business deployment, several challenges remain, such as a high barrier to software development, limited performance optimization, and difficult power control. The goal of Inspur's TF2 compute acceleration engine is to solve these challenges for customers.

The TF2 compute acceleration engine consists of two parts. The first is the model optimization and conversion tool, TF2 Transform Kit, which optimizes and converts deep neural network model data trained with frameworks such as TensorFlow. It greatly reduces the size of the model data file: it can compress 32-bit floating-point model data into a 4-bit integer data model, making the model data file less than 1/8 of its original size while largely preserving the regular storage layout of the original model data.

The second part is the FPGA runtime engine, TF2 Runtime Engine, which automatically converts the optimized model file into a target file that runs on the FPGA. To eliminate the dependence of deep neural networks such as CNNs on FPGA floating-point computing power, Inspur designed a shift computing technology that quantizes 32-bit floating-point data into 8-bit integer data. Combined with the 4-bit integer model data described above, the floating-point multiplications in convolution operations are converted into shift operations on 8-bit integers, which greatly improves FPGA inference performance and effectively reduces actual operating power consumption. This is also the world's first implementation of DNN shift operations on an FPGA that maintains the accuracy of the original model.
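The release does not describe how TF2 encodes its 4-bit weights, so the following is only a minimal sketch of the general idea of replacing multiplication with shifts, under a loudly stated assumption: each weight is rounded to a signed power of two and stored as one sign bit plus a 3-bit shift amount. The names `encode_weight`, `decode_weight`, `shift_dot`, `pack_codes`, and `FRAC_BITS` are hypothetical and are not part of TF2.

```python
import math

FRAC_BITS = 7  # assumed fixed-point scale: accumulator holds value * 2**7


def encode_weight(w: float) -> int:
    """Round a float weight to +/- 2**-s (s in 0..7) and return a 4-bit code.

    Assumed layout: bit 3 = sign, bits 0-2 = shift amount s.
    Zero and very small weights are clamped to the smallest magnitude.
    """
    sign = 1 if w < 0 else 0
    mag = abs(w)
    shift = 7 if mag == 0.0 else max(0, min(7, round(-math.log2(mag))))
    return (sign << 3) | shift


def decode_weight(code: int) -> float:
    """Recover the float value that a 4-bit code represents."""
    sign = -1.0 if code & 0b1000 else 1.0
    return sign * 2.0 ** -(code & 0b0111)


def shift_dot(activations, codes) -> int:
    """Dot product of int8 activations and 4-bit weight codes using only shifts.

    Multiplying by 2**-s in a 2**FRAC_BITS fixed-point format is a left
    shift by (FRAC_BITS - s), so no multiplier is needed.
    """
    acc = 0
    for a, c in zip(activations, codes):
        term = a << (FRAC_BITS - (c & 0b0111))
        acc += -term if c & 0b1000 else term
    return acc  # divide by 2**FRAC_BITS to get the real-valued result


def pack_codes(codes) -> bytes:
    """Pack two 4-bit codes per byte (pads with a zero code if the count is odd)."""
    padded = list(codes) + [0] * (len(codes) % 2)
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.125]
    activations = [10, -20, 64]  # int8-range values
    codes = [encode_weight(w) for w in weights]
    print(shift_dot(activations, codes) / (1 << FRAC_BITS))
    print(len(pack_codes([0] * 8)), "bytes for 8 weights vs", 8 * 4, "bytes as FP32")
```

With this layout, eight weights occupy 4 bytes instead of 32, which matches the 4-bit versus 32-bit ratio of 1/8 mentioned above. A production scheme would also need a strategy for exact zeros and for per-layer scaling, which this sketch omits.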

Results for the SqueezeNet model on the Inspur F10A FPGA card demonstrate the computational performance of the TF2 compute acceleration engine. The F10A is the world's first half-height, half-length FPGA accelerator card to support the Arria 10 chip. SqueezeNet is a typical convolutional neural network architecture: a compact model whose accuracy is comparable to that of AlexNet, which makes it especially suitable for image-based AI applications with strict real-time requirements. Running the SqueezeNet model optimized by the TF2 engine on the F10A, the computation time for a single image is 0.674 ms while maintaining the original accuracy. At INT8 precision, this is slightly better than the widely used P4 GPU accelerator card in both accuracy and latency.

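| Device | Peak Power | Data Type | Top-1 | Top-5 | FPS (images/s) |
|--------|------------|-----------|-------|-------|----------------|
| F10A   | 45 W       | INT8      | 57.62% | 79.98% | 1484 |
| GPU    | 75 W       | FP32      | 58.14% | 80.79% | 1323 |
| GPU    | 75 W       | INT8      | 56.79% | 79.76% | 1456 |

TF2 with F10A vs. GPU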
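As a quick consistency check on the figures above, the short sketch below converts the reported single-image latency into throughput and divides each throughput by the listed peak power. The function names are hypothetical, "GPU" is used as a placeholder label for the two 75 W rows, and peak power is only a rough proxy for measured power draw, so the ratios are indicative rather than a benchmark result.

```python
# (device, peak power in watts, data type, throughput in images/s),
# copied from the comparison table; "GPU" is a placeholder label.
RESULTS = [
    ("F10A", 45, "INT8", 1484),
    ("GPU", 75, "FP32", 1323),
    ("GPU", 75, "INT8", 1456),
]


def latency_ms_to_fps(latency_ms: float) -> float:
    """Throughput in images/s for one image processed every latency_ms."""
    return 1000.0 / latency_ms


def fps_per_peak_watt(fps: float, peak_watts: float) -> float:
    """Throughput divided by peak power (not measured power)."""
    return fps / peak_watts


if __name__ == "__main__":
    # 0.674 ms per image corresponds to roughly 1484 images/s.
    print(f"0.674 ms -> {latency_ms_to_fps(0.674):.0f} images/s")
    for device, watts, dtype, fps in RESULTS:
        ratio = fps_per_peak_watt(fps, watts)
        print(f"{device:5s} {dtype}: {ratio:.2f} images/s per peak watt")
```

The 0.674 ms latency and the 1484 images/s throughput for the F10A agree with each other, and on these numbers the F10A delivers about 33 images/s per peak watt compared with roughly 18 to 19 for the 75 W rows.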

The Inspur TF2 compute acceleration engine improves AI computing performance on FPGAs through technical innovations such as shift computing and model optimization, and lowers the barrier to implementing AI software on FPGAs. It enables FPGAs to be used widely in the AI ecosystem and to support more AI applications. Inspur plans to open TF2 to its AI customers and will continue to develop optimization technologies that support more models, the latest deep neural network models, and FPGA accelerator cards built on the latest chips. The next-generation high-performance FPGA accelerator card is expected to deliver three times the performance of the F10A.

Inspur is the world's leading AI computing platform provider, offering a four-layer AI stack of computing hardware, management suite, framework optimization, and application acceleration to build an agile, efficient, and optimized AI infrastructure. Inspur has become the most important AI server supplier for Baidu, Alibaba, and Tencent, and maintains close collaboration in systems and applications with leading AI companies such as Iflytek, SenseTime, Face++, Toutiao, and Didi. Inspur strives to help AI customers achieve the greatest possible application performance improvements in voice, image, video, search, and networking. According to IDC's 2017 China AI Infrastructure Market Research Report, Inspur's AI server market share reached 57% in 2017.