NVIDIA Releases TensorRT 8.0 With Big Performance Improvements

Written by Michael Larabel in NVIDIA on 20 July 2021 at 09:00 AM EDT.
NVIDIA is today making available a much faster version of TensorRT, its SDK for optimized deep learning inference on NVIDIA GPUs.

With TensorRT 8, which is being made public today, NVIDIA is reporting "2x performance" relative to the existing TensorRT 7 release. That 2x figure comes from transformer optimizations, and NVIDIA is also claiming 2x the accuracy of TensorRT 7 when using INT8 with quantization-aware training.
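For readers unfamiliar with the term, quantization-aware training generally works by simulating INT8 rounding during training so the network learns to tolerate it. The snippet below is only a rough sketch of that "fake quantization" step, assuming symmetric per-tensor scaling; the function name and sample weights are made up for illustration and it is not NVIDIA's or TensorRT's implementation.

```python
def fake_quantize_int8(values):
    """Round floats onto a symmetric INT8 grid, then map them back to floats.

    Hypothetical helper for illustration only; returns (approximations, scale).
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        # Nothing to scale; every value is already exactly representable.
        return list(values), 1.0

    # One scale for the whole tensor: the largest magnitude maps to +/-127.
    scale = max_abs / 127.0

    dequantized = []
    for v in values:
        q = round(v / scale)          # nearest integer step
        q = max(-127, min(127, q))    # clamp to the symmetric INT8 range
        dequantized.append(q * scale) # back to float, now carrying INT8 error
    return dequantized, scale


weights = [0.02, -0.5, 0.731, 1.27]
approx, scale = fake_quantize_int8(weights)
for original, rounded in zip(weights, approx):
    print(f"{original:+.4f} -> {rounded:+.4f} (error {abs(original - rounded):.5f})")
```

Each value lands within half a quantization step of the original, which is the error a model trained this way has already seen before it is deployed with INT8 precision.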


TensorRT 8 brings the BERT-Large inference time down to 1.2 ms on a V100, which is 2.5x faster than TensorRT 7. The release also adds sparsity support for Ampere GPUs, among other improvements.
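As background the announcement does not spell out, the sparsity that Ampere's Tensor Cores accelerate is a 2:4 structured pattern: in every group of four consecutive weights, at most two are non-zero. The following is a minimal sketch of magnitude-based pruning to that pattern, assuming a flat list of weights; the function name and data are hypothetical and this is not how TensorRT or NVIDIA's tooling implements it.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in each group of four.

    Hypothetical helper for illustration only.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")

    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; on ties the earlier index wins
        # because Python's sort is stable.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.1]
print(prune_2_of_4(row))
# -> [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Because the zeros follow a fixed pattern, the hardware can skip them, which is where the speedup from sparse models comes from; in practice a pruned network is usually fine-tuned afterwards to recover accuracy.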


TensorRT 8.0 should be available shortly via developer.nvidia.com.