NVIDIA: Llama 3.3 Nemotron Super 49B V1.5

nvidia/llama-3.3-nemotron-super-49b-v1.5

For $1, you can send approximately:

~286 messages

How do we get this number?

One message = ~7,000 input tokens + ~7,000 output tokens

Input cost per message: 7,000 x $0.10/M = $0.000700

Output cost per message: 7,000 x $0.40/M = $0.002800

Total cost per message: $0.003500

Messages for $1: 285.71
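
The arithmetic above can be reproduced with a short script. This is a minimal sketch, not the page's actual calculator: the function names are hypothetical, and the 7,000-token message size and the per-million prices are taken from the figures listed here. It uses Decimal so that small currency amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def cost_per_message(input_tokens, output_tokens, input_price, output_price):
    """Return the dollar cost of one message.

    Prices are given in dollars per million tokens (hypothetical helper).
    """
    input_cost = Decimal(input_tokens) * Decimal(input_price) / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * Decimal(output_price) / TOKENS_PER_MILLION
    return input_cost + output_cost


def messages_for_budget(budget, per_message_cost):
    """Return how many messages a dollar budget covers (may be fractional)."""
    return Decimal(budget) / per_message_cost


# Figures from this page: ~7,000 input + ~7,000 output tokens per message,
# $0.10 per million input tokens, $0.40 per million output tokens.
per_message = cost_per_message(7_000, 7_000, "0.10", "0.40")
messages = messages_for_budget("1", per_message)

print(f"Total cost per message: ${per_message:.6f}")
print(f"Messages for $1: {messages:.2f}")
```

Changing the token counts in the call to cost_per_message shows how strongly the estimate depends on the assumed message size: shorter messages yield proportionally more messages per dollar.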

Context window: 131k tokens

Max response

tokens

Input price: $0.10 per million tokens

Output price: $0.40 per million tokens

Modalities

Input: text

Output: text

Description

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...