NVIDIA: Llama 3.1 Nemotron 70B Instruct

nvidia/llama-3.1-nemotron-70b-instruct

For $1, you can send approximately:

~59.5 messages

How do we get this number?

One message = ~7,000 input tokens + ~7,000 output tokens

Input cost per message: 7,000 x $1.20/M = $0.008400

Output cost per message: 7,000 x $1.20/M = $0.008400

Total cost per message: $0.016800

Messages for $1: 59.52
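The arithmetic above can be reproduced with a short script. This is a minimal sketch: the function names `cost_per_message` and `messages_per_dollar` are hypothetical, and the 7,000-token message size is simply the assumption this page states, not a property of the model.

```python
def cost_per_message(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one message, with prices given per million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


def messages_per_dollar(input_tokens: int, output_tokens: int,
                        input_price_per_m: float, output_price_per_m: float,
                        budget: float = 1.0) -> float:
    """Number of messages the given budget (in dollars) covers."""
    return budget / cost_per_message(
        input_tokens, output_tokens, input_price_per_m, output_price_per_m
    )


if __name__ == "__main__":
    # Values taken from the page: 7,000 tokens each way at $1.20 per million.
    cost = cost_per_message(7_000, 7_000, 1.20, 1.20)
    count = messages_per_dollar(7_000, 7_000, 1.20, 1.20)
    print(f"Cost per message: ${cost:.6f}")  # $0.016800
    print(f"Messages for $1: {count:.2f}")   # 59.52
```

Shorter messages stretch the budget proportionally: halving both token counts doubles the number of messages per dollar.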

Context window: 131k tokens

Max response: 16k tokens

Input price: $1.20 per million tokens

Output price: $1.20 per million tokens

Modalities

Input: text

Output: text

Description

NVIDIA's Llama 3.1 Nemotron 70B is a language model designed to generate precise and useful responses. Leveraging the [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels...