NVIDIA sets records for the world’s fastest BERT training time and largest transformer-based model

Written on: August 13, 2019
The company’s DGX SuperPOD trains BERT-Large in a record-breaking 53 minutes and trains GPT-2 8B, which at 8.3 billion parameters is the world’s largest transformer-based network.
