Explaining NVIDIA NVFP4, the DGX Spark's Secret Weapon

Developed by NVIDIA, NVFP4 helps larger AI models run faster on a wider range of hardware.

Matthew Smith • Jan 20, 2026

Photo: Jacob Bobo

The NVIDIA DGX Spark is among the most interesting pieces of computer hardware released in the past few years. It’s effectively a desktop PC, but one built entirely around NVIDIA hardware and targeted at AI workloads instead of gaming or general use. It serves up an NVIDIA GPU and CPU alongside 128GB of memory and 4TB of storage.

Such beefy specifications tell you that the DGX Spark is a performer, and it is, but focusing only on the hardware leaves out one important part of the equation: NVIDIA’s NVFP4. Introduced in June 2025, NVFP4 is a 4-bit floating point format designed to help larger models run on desktop hardware with less of a penalty to accuracy and intelligence.

Using NVFP4 takes some effort and intention on your part, however, both to understand the format and to put it to work.

Quantization: A must-have for AI on the PC

NVFP4 is a form of quantization. You’ll need to know about quantization to understand NVFP4, so here’s a crash course.

An LLM contains billions, and sometimes trillions, of numbers (called parameters) that represent patterns learned from training data. A single number doesn’t require much computer memory, but when you have billions of them, well, they add up. Most LLMs weigh in at a dozen to several dozen gigabytes, and larger models run into the hundreds, all of which must be loaded into memory for the model to function.

Most modern computer programs use 32-bit math as the default, which means each number is represented by a string of 32 binary digits (each a 1 or a 0). The full-fat versions of LLMs often use 16-bit or sometimes 8-bit math. However, it’s always possible to convert a model so that its numbers are represented with fewer bits.

That’s quantization.
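The arithmetic behind that is simple enough to sketch. The helper below is illustrative only: weight_memory_gb is a made-up name, the 70-billion-parameter model is hypothetical, and the estimate counts the weights alone, ignoring the extra memory a running model needs.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 70-billion-parameter model at several precisions.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: about {weight_memory_gb(70e9, bits):.0f} GB")
```

By this estimate, the 16-bit version (about 140 GB) would not fit in the DGX Spark’s 128GB of memory, while a 4-bit version (about 35 GB) would fit with room to spare.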

NVFP4 is a new way to quantize

Quantization makes it possible to run an LLM containing a given number of parameters with less memory. But that reduction in precision can also reduce the model’s intelligence. Representing the numbers in a model with fewer bits means some of those numbers can only be stored approximately.

But it turns out there are many ways to quantize. For example, a paper released in 2023 tested a range of methods used to quantize Meta’s Llama-7B and Llama-13B. It found the best 4-bit quantization methods reduced benchmark performance by less than 10%, while the worst reduced benchmark performance by half.
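To see where that error comes from, here is a minimal sketch that snaps values to evenly spaced levels. It is deliberately simpler than any real 4-bit format (including NVFP4), the function names are made up, and the "weights" are just stand-in numbers.

```python
import math


def quantize_uniform(value: float, bits: int) -> float:
    """Snap a value in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1
    index = round((value + 1) / 2 * steps)   # integer code that would be stored
    return index / steps * 2 - 1             # value recovered from that code


def max_error(values, bits: int) -> float:
    """Largest gap between an original value and its quantized version."""
    return max(abs(v - quantize_uniform(v, bits)) for v in values)


# Stand-in "weights": 100 values between -1 and 1.
weights = [math.sin(i) for i in range(100)]

for bits in (8, 4):
    print(f"{bits}-bit: worst-case error {max_error(weights, bits):.4f}")
```

With only 16 levels to choose from at 4 bits, the nearest level is often much further from the original value than it is with 256 levels at 8 bits.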

NVFP4 is designed to land at the better end of that range. However, independent tests seem to indicate that speed is the real story of NVFP4 (for desktop and laptop PC users, at least). AI/ML research scientist Benjamin Marie found that models using NVFP4 were two to three times quicker, as measured by tokens output per second, than models quantized with other 4-bit formats. That could make larger models quantized to 4 bits feel a lot more usable on desktop or laptop PC hardware.

That’s important for devices with relatively modest hardware, including NVIDIA’s DGX Spark. Powerful though it may be, it’s still a small device using LPDDR5x RAM (which has limited memory bandwidth compared to VRAM) and a 240-watt internal power supply. You’re going to want NVFP4’s improved inference performance for best results.
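A back-of-envelope way to see why memory bandwidth matters: when a dense model generates text, it reads roughly all of its weights from memory for every token, so bandwidth divided by model size gives a rough ceiling on tokens per second. The sketch below is an estimate under stated assumptions, not a benchmark. The function name is made up, the 273 GB/s bandwidth is an assumed value you should check against the spec sheet, and the model sizes are hypothetical. It also only captures the benefit of a smaller model, not any extra speed from hardware that handles NVFP4 natively.

```python
def decode_ceiling_tokens_per_s(model_gb: float, bandwidth_gb_per_s: float) -> float:
    """Rough upper bound: each generated token reads every weight once."""
    return bandwidth_gb_per_s / model_gb


ASSUMED_BANDWIDTH_GB_PER_S = 273.0   # assumption, not a measured figure

# Hypothetical 70B dense model: 140 GB at 16-bit, 35 GB at 4-bit.
for label, size_gb in (("16-bit", 140.0), ("4-bit", 35.0)):
    ceiling = decode_ceiling_tokens_per_s(size_gb, ASSUMED_BANDWIDTH_GB_PER_S)
    print(f"{label}: at most about {ceiling:.1f} tokens per second")
```

Whatever the real bandwidth figure is, the ratio holds: a model that is a quarter of the size can be read four times as often in the same second.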

A quick deep dive into NVFP4

So, NVFP4 can be used to quantize a model so that it uses less memory and delivers better inference performance on given hardware. But it’s not the only 4-bit quantization method. So, what does it do differently?

First, NVFP4 splits a model’s values into small "micro-blocks" of 16 that share a scaling factor, compared to the 32-value blocks used by MXFP4 (another 4-bit format, used by OpenAI’s GPT-OSS). According to research from NVIDIA and Intel, the smaller blocks make it easier to account for local variations in values.
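Here is a minimal sketch of why block size matters. It assumes the standard 4-bit floating point value grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6) and, to isolate the effect of block size, uses an unrestricted scale rather than either format’s real scale encoding. The function names and the 32 sample weights are made up for illustration.

```python
import math

FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # FP4 (E2M1) magnitudes


def quantize_block(block):
    """Scale a block so its largest magnitude maps to 6.0, round, scale back."""
    amax = max(abs(v) for v in block)
    if amax == 0:
        return list(block)
    scale = amax / 6.0  # one shared scale for the whole block
    result = []
    for v in block:
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        result.append(math.copysign(mag * scale, v))
    return result


def mean_abs_error(values, block_size):
    """Average error after quantizing the values in blocks of block_size."""
    total = 0.0
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        total += sum(abs(a - b) for a, b in zip(block, quantize_block(block)))
    return total / len(values)


# 32 hypothetical weights: all small, except one outlier in the second half.
weights = [0.01 * ((i % 7) - 3) for i in range(32)]
weights[20] = 2.0

for size in (32, 16):
    print(f"block size {size}: mean error {mean_abs_error(weights, size):.5f}")
```

With one 32-value block, the outlier sets the scale for everything and the small weights all round to zero. With two 16-value blocks, only the block containing the outlier pays that price.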

[Figure: a fractional scale graph. Source: NVIDIA]

NVFP4 also has more precise scaling factors. MXFP4 can only use a power-of-two scale (meaning numbers like 1, 2, 4, 8, and so on). NVFP4 uses 8-bit floating point (FP8) scales with higher precision, which allows for more accurate quantization.
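A small sketch can show the difference. It assumes the FP8 scale uses the E4M3 layout (3 mantissa bits, maximum 448) and, as a simplification rather than the exact MXFP4 rule, rounds the power-of-two scale up so nothing gets clipped. The function names are made up, and the block is hand-picked so that its ideal scale (0.75) falls between two powers of two.

```python
import math

FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # FP4 (E2M1) magnitudes


def round_to_e4m3(x: float) -> float:
    """Round a positive number to the nearest FP8 E4M3 value."""
    if x <= 0:
        return 0.0
    _, exp = math.frexp(x)        # x = m * 2**exp with 0.5 <= m < 1
    exp = max(exp, -5)            # below 2**-6 the spacing stops shrinking
    step = 2.0 ** (exp - 4)       # 3 mantissa bits: 8 steps per power of two
    return min(round(x / step) * step, 448.0)


def power_of_two_scale(ideal: float) -> float:
    """Smallest power of two that is >= the ideal scale (a simplification)."""
    return 2.0 ** math.ceil(math.log2(ideal))


def mean_abs_error(block, scale: float) -> float:
    """Average error after rounding block / scale to the FP4 grid."""
    total = 0.0
    for v in block:
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) / scale - g))
        total += abs(v - math.copysign(mag * scale, v))
    return total / len(block)


# Hand-picked 16-value block whose ideal scale (largest value / 6) is 0.75.
block = [0.375, -0.75, 1.125, 1.5, -2.25, 3.0, 4.5, -4.5] * 2
ideal = max(abs(v) for v in block) / 6.0

fp8_scale = round_to_e4m3(ideal)         # 0.75 is representable in E4M3
pow2_scale = power_of_two_scale(ideal)   # has to jump up to 1.0

print("FP8 scale:", fp8_scale, "error:", mean_abs_error(block, fp8_scale))
print("power-of-two scale:", pow2_scale, "error:", mean_abs_error(block, pow2_scale))
```

In this hand-picked case the FP8 scale reproduces the block exactly while the power-of-two scale does not. Real weights are messier, so expect a smaller gap in practice.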

And NVFP4 uses two levels of scaling. The FP8 scale is for each 16-value block, but a broader FP32 scale is available across an entire tensor. This helps NVFP4 handle variations in values both within and across blocks.
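One way to picture the two levels: an FP8 number can only cover a limited range, so a tensor full of very small (or very large) values could push the per-block scales out of reach. The sketch below shows one plausible arrangement, assuming E4M3 block scales (maximum 448) and a tensor-wide scale chosen so the largest block scale lands at that maximum. The function names and the sample tensor are made up, and this is not presented as NVIDIA’s exact recipe.

```python
import math

FP4_MAX = 6.0      # largest FP4 (E2M1) magnitude
E4M3_MAX = 448.0   # largest FP8 (E4M3) value


def round_to_e4m3(x: float) -> float:
    """Round a positive number to the nearest FP8 E4M3 value."""
    if x <= 0:
        return 0.0
    _, exp = math.frexp(x)        # x = m * 2**exp with 0.5 <= m < 1
    exp = max(exp, -5)            # below 2**-6 the spacing stops shrinking
    step = 2.0 ** (exp - 4)       # 3 mantissa bits: 8 steps per power of two
    return min(round(x / step) * step, E4M3_MAX)


def effective_block_scales(values, block_size=16, use_tensor_scale=True):
    """Return the scale each block ends up with after FP8 rounding."""
    tensor_amax = max(abs(v) for v in values)
    if use_tensor_scale and tensor_amax > 0:
        tensor_scale = tensor_amax / (FP4_MAX * E4M3_MAX)  # kept in FP32
    else:
        tensor_scale = 1.0
    scales = []
    for i in range(0, len(values), block_size):
        block_amax = max(abs(v) for v in values[i:i + block_size])
        stored = round_to_e4m3(block_amax / FP4_MAX / tensor_scale)
        scales.append(stored * tensor_scale)
    return scales


# Hypothetical tensor of very small weights (1e-5 to 4e-5).
tiny = [1e-5 * (1 + i % 4) for i in range(32)]

print("FP8 block scales only:", effective_block_scales(tiny, use_tensor_scale=False))
print("with tensor-wide scale:", effective_block_scales(tiny, use_tensor_scale=True))
```

Without the tensor-wide scale, the block scales here are too small for FP8 and round to zero, which would wipe out every value. With it, the block scales stay usable. Storage-wise, an 8-bit scale shared by 16 four-bit values works out to 4 + 8/16 = 4.5 bits per value, ignoring the single FP32 scale per tensor.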

If that’s confusing, you’re in good company. It makes my head spin, too. If you want to dive a bit deeper, I recommend this video from Julia Turc, an ex-Google AI researcher. It explains 4-bit quantization generally and includes details on various 4-bit quantization methods including NVFP4.

Feature	NVIDIA NVFP4	MXFP4 (OCP Standard)
Micro-block Size	16 values per block	32 values per block
Scaling Precision	8-bit (FP8) high-precision	Power-of-two (integer)
Hardware Support	Blackwell Architecture (Native)	Broad (Hopper/Blackwell/FP8-capable)
Target Environment	Local AI / Edge Workstations	Large-scale Data Center
Performance Benefit	~2x throughput vs. 4-bit baseline	Standard Efficiency
Accuracy Loss	< 1% on 70B+ parameter models	Variable based on block size

What you need to run NVFP4

Models that use NVFP4 look like a great option for local AI inference performance. You can get a nice performance improvement without much if any noticeable reduction in model quality. However, a lot needs to align to actually use NVFP4.

First up, you need a GPU that uses a high-end variant of NVIDIA’s Blackwell architecture (or newer, if you’re reading this article in the future). That’s because Blackwell was designed at an architecture level with features that accelerate NVFP4. For desktop and laptop PCs, that means you’ll ideally want a DGX Spark or DGX Station, or a GPU from the NVIDIA RTX PRO 6000 line.

What about the RTX 50-series? It seems it technically should work, since RTX 50-series GPUs use the Blackwell architecture, but there’s not much documentation on it at this point. Rather, most developer chat about NVFP4 on the RTX 50-series is about bugs discovered in the attempt. I also looked into running NVFP4 through LM Studio and Ollama on my own RTX 50-series laptop, but I haven’t been able to find a way to do it yet.

You’ll also need a model that was trained or quantized for NVFP4. While there are quite a few now available, the selection is still a bit limited compared to the tens of thousands of other models that exist.

The future of NVFP4

The current state of NVFP4 is very much a “building the rails in front of the train” situation. NVIDIA only announced it a few months ago and papers describing its capabilities for training and quantization are even more recent.

Still, NVFP4 is important to NVIDIA. Just look at the last graph of NVIDIA’s NVFP4 announcement.

[Figure: graph from NVIDIA’s NVFP4 announcement. Source: NVIDIA]

This graph promises that the Blackwell architecture can provide a 50x power efficiency improvement over Hopper when NVFP4 is in use. That’s a big deal for leading-edge AI labs and home users alike.

While it’s not a perfect analogy, I’m reminded of the early days of RTX ray tracing. It was promising from the start and solved a real problem developers faced, but it took a few years before it became widespread. I think the same could prove true for NVFP4.


Matthew S. Smith is a prolific tech journalist, critic, product reviewer, and influencer from Portland, Oregon. Over 16 years covering tech he has reviewed thousands of PC laptops, desktops, monitors, and other consumer gadgets. Matthew also hosts Computer Gaming Yesterday, a YouTube channel dedicated to retro PC gaming, and covers the latest artificial intelligence research for IEEE Spectrum.
