Meta unveils Llama API said to deliver record-breaking inference speeds
Tweet At LlamaCon, Meta launched the Llama API in a limited free preview, aiming to increase developer access to its …
Tweet Cerebras Systems launched Cerebras Inference, the world's fastest AI inference solution. It's 20x faster than NVIDIA's solutions and offers …
Tweet Providing over twice the precision and inference speed compared to the last generation, Nvidia’s new TensorRT 8 deep learning …
Tweet XNNPack and TensorFlow Lite now support efficient inference of sparse networks. Researchers demonstrated substantial speedups in inference times on …
Tweet Mipsology's FPGA-based Zebra AI inference accelerator outperformed Nvidia A100, V100, Tesla T4, AWS Inferentia, Google TPUv3, and others, on …