nvidia
NVIDIA: Llama 3.1 Nemotron Ultra 253B v1
nvidia/llama-3.1-nemotron-ultra-253b-v1
For $1, you can send approximately 59.5 messages.
How do we get this number?
One message = ~7,000 input tokens + ~7,000 output tokens
Input cost per message: 7,000 × $0.60/M = $0.004200
Output cost per message: 7,000 × $1.80/M = $0.012600
Total cost per message: $0.016800
Messages for $1: $1 / $0.016800 ≈ 59.52
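The arithmetic above can be reproduced with a small script. This is a minimal sketch, assuming every message uses the same ~7,000 input and ~7,000 output tokens and is billed at the listed per-million-token prices. The function and constant names are hypothetical and are not part of any provider API.

```python
# Sketch of the per-message cost arithmetic for this model's listed prices.
# Assumption: each message is billed only on its input and output token counts.

INPUT_PRICE_PER_M = 0.60   # USD per 1M input tokens (listed price)
OUTPUT_PRICE_PER_M = 1.80  # USD per 1M output tokens (listed price)


def cost_per_message(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one message at the listed prices."""
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


def messages_per_budget(budget_usd: float, input_tokens: int, output_tokens: int) -> float:
    """Return how many messages of the given size a budget covers."""
    return budget_usd / cost_per_message(input_tokens, output_tokens)


if __name__ == "__main__":
    per_msg = cost_per_message(7_000, 7_000)
    print(f"Cost per message: ${per_msg:.6f}")
    print(f"Messages for $1: {messages_per_budget(1.0, 7_000, 7_000):.2f}")
```

Changing the token counts passed to `cost_per_message` shows how quickly the message count drops for longer prompts or responses. Because output tokens cost three times as much as input tokens here, long responses dominate the total.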
Context window: 131k tokens
Max response: 0 tokens
Input price: $0.60 per million tokens
Output price: $1.80 per million tokens
Modalities: text input, text output
Description
Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural...