NVIDIA: Llama 3.1 Nemotron Ultra 253B v1

nvidia/llama-3.1-nemotron-ultra-253b-v1

For $1, you can send approximately:

~59.5 messages

How do we get this number?

One message = ~7,000 input tokens + ~7,000 output tokens

Input cost per message: 7,000 x $0.60/M = $0.004200

Output cost per message: 7,000 x $1.80/M = $0.012600

Total cost per message: $0.016800

Messages for $1: 59.52
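
The arithmetic above can be reproduced with a short Python sketch. The prices and the 7,000-token message size come from this listing; the function names `cost_per_message` and `messages_for_budget` are hypothetical helpers introduced only for illustration, not part of any provider API.

```python
# Prices as listed on this page, in USD per million tokens.
INPUT_PRICE_PER_M = 0.60
OUTPUT_PRICE_PER_M = 1.80

# The page assumes one "message" is ~7,000 input and ~7,000 output tokens.
TOKENS_IN = 7_000
TOKENS_OUT = 7_000


def cost_per_message(tokens_in=TOKENS_IN, tokens_out=TOKENS_OUT,
                     input_price=INPUT_PRICE_PER_M,
                     output_price=OUTPUT_PRICE_PER_M):
    """Return the USD cost of one message (hypothetical helper)."""
    # Sum the token costs first, then convert from per-million pricing.
    return (tokens_in * input_price + tokens_out * output_price) / 1_000_000


def messages_for_budget(budget_usd, **kwargs):
    """Return how many messages a budget covers (hypothetical helper)."""
    return budget_usd / cost_per_message(**kwargs)


if __name__ == "__main__":
    print(f"Cost per message: ${cost_per_message():.6f}")
    print(f"Messages for $1: {messages_for_budget(1.0):.2f}")
```

Passing different token counts, for example `messages_for_budget(1.0, tokens_in=1_000, tokens_out=500)`, gives an estimate for shorter exchanges. Actual usage depends on how the provider counts tokens.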

Context window: 131k tokens

Max response

tokens

Input price: $0.60 per million tokens

Output price: $1.80 per million tokens

Modalities

Input: text

Output: text

Description

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural...