NVIDIA: Llama 3.1 Nemotron 70B Instruct
nvidia/llama-3.1-nemotron-70b-instruct
For $1, you can send approximately 59.5 messages.
How do we get this number?
One message = ~7,000 input tokens + ~7,000 output tokens
Input cost per message: 7,000 x $1.20/M = $0.008400
Output cost per message: 7,000 x $1.20/M = $0.008400
Total cost per message: $0.016800
Messages for $1: 59.52
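The arithmetic above can be reproduced with a short Python sketch. The function name `messages_per_dollar` and its parameters are illustrative assumptions, not part of any provider API. The token counts and prices are the ones quoted on this page.

```python
def messages_per_dollar(input_tokens, output_tokens,
                        input_price_per_m, output_price_per_m,
                        budget=1.0):
    """Estimate how many messages a budget covers.

    Prices are in dollars per million tokens. This is a hypothetical
    helper that mirrors the calculation shown on the page.
    """
    # Cost of the input and output halves of a single message.
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    total_cost = input_cost + output_cost
    return budget / total_cost


# Figures from this page: ~7,000 tokens each way at $1.20 per million.
estimate = messages_per_dollar(7_000, 7_000, 1.20, 1.20)
print(f"{estimate:.2f}")  # 59.52
```

Real conversations rarely use exactly 7,000 tokens in each direction, so treat the result as a rough estimate for the assumed message size.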
Context window: 131k tokens
Max response: 16k tokens
Input price: $1.20 per million tokens
Output price: $1.20 per million tokens
Modalities
Input: text
Output: text
Description
NVIDIA's Llama 3.1 Nemotron 70B is a language model designed for generating precise and useful responses. Leveraging [Llama 3.1 70B](/models/meta-llama/llama-3.1-70b-instruct) architecture and Reinforcement Learning from Human Feedback (RLHF), it excels...