Nvidia used its CES keynote to issue a blunt warning: AI’s next breakthroughs will be limited by compute, not ambition.
Onstage in Las Vegas, CEO Jensen Huang said soaring demand for training, inference, and reasoning is pushing infrastructure to its limits, reshaping how AI systems are built and deployed.
In the keynote, Huang laid out Nvidia’s move beyond chips into full AI platforms, tying its work on agents, robotics, and data center systems back to the limits of available compute.
Compute is the new ceiling
Huang said the move toward AI-driven software has changed where pressure shows up in computing systems. Modern AI workloads strain entire data centers, including memory, networking, power delivery, and cooling, rather than just individual chips. “You don’t run it on CPUs, you run it on GPUs,” he said, citing accelerated computing as the base layer for how AI systems now operate.
That strain is increasing as reasoning models move into real-world use. Huang pointed to test-time scaling, in which systems spend more compute working through problems step by step, and to long-context inference as the main drivers of rising demand. Each AI interaction now produces more token generation, more data movement, and more energy use.
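As a rough illustration of why test-time scaling multiplies cost, here is a minimal Python sketch, not Nvidia’s implementation; the `generate` function is a hypothetical stand-in for any model call, and the parameter values are arbitrary. The point is that tokens per query grow with candidates times steps, not just answer length:

```python
# Hypothetical sketch of test-time scaling: the model "thinks" by sampling
# several multi-step reasoning traces per query, so tokens generated per
# answer scale with candidates * steps rather than a single reply.

def generate(prompt: str, max_tokens: int) -> str:
    """Stand-in for an LLM call; a real system would hit a model server."""
    return "reasoning step..."  # placeholder output

def answer_with_test_time_scaling(question: str,
                                  candidates: int = 8,
                                  steps_per_candidate: int = 16,
                                  tokens_per_step: int = 256) -> tuple[str, int]:
    tokens_used = 0
    traces = []
    for _ in range(candidates):               # sample several reasoning traces
        trace = question
        for _ in range(steps_per_candidate):  # work through the problem step by step
            trace += generate(trace, tokens_per_step)
            tokens_used += tokens_per_step
        traces.append(trace)
    best = max(traces, key=len)               # real systems score or vote; length is a toy proxy
    return best, tokens_used

# A one-shot answer might cost ~256 tokens; this loop costs 8 * 16 * 256 = 32,768.
_, cost = answer_with_test_time_scaling("Why is the sky blue?")
print(cost)  # 32768
```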
From chips to systems
According to Huang, Nvidia’s response to rising AI demand is to move beyond selling individual chips and deliver tightly integrated systems designed for modern workloads.
Rather than treating GPUs, CPUs, networking, memory, and cooling as separate layers, Nvidia is building them together so they operate as a single unit. The goal, he said, is to extract more performance and efficiency from every watt of power and every square foot of data center space.
Smarter AI means heavier inference
The rise of reasoning and agentic systems is changing how AI runs in real-world use. Instead of producing a single response, these models plan steps, search for information, call tools, and generate outputs iteratively. “You think in real time,” Huang said, describing test-time scaling as a shift that turns inference into an ongoing process rather than a one-off calculation.
The cost shows up quickly. Huang noted that reasoning-driven agents generate far more tokens, move more data through memory and networks, and keep systems active longer per task. As these agents move into enterprise software and replace traditional interfaces, inference becomes a sustained workload, adding to the infrastructure strain already facing data centers running AI at scale.
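To make that “ongoing process” concrete, here is a hedged sketch of a generic agent loop; the `llm` function and the `search` and `calculator` tools are hypothetical placeholders, not any vendor’s API. Each pass through the loop is another full inference call over a growing context, which is why agents keep accelerators busy far longer than a single chat reply:

```python
# Minimal agent loop sketch: plan -> act -> observe, repeated until done.
# Every iteration is a full model inference over a growing context, so
# tokens processed and data moved scale with the number of steps taken.

def llm(context: str) -> str:
    """Stand-in for a model call returning the next action or a final answer."""
    return "FINAL: done"  # placeholder so the sketch terminates

TOOLS = {
    "search": lambda q: f"results for {q!r}",    # hypothetical search tool
    "calculator": lambda expr: str(eval(expr)),  # toy calculator (sketch only)
}

def run_agent(task: str, max_steps: int = 10) -> str:
    context = f"Task: {task}"
    for _ in range(max_steps):
        action = llm(context)                    # one full inference pass
        if action.startswith("FINAL:"):
            return action.removeprefix("FINAL:").strip()
        tool, _, arg = action.partition(" ")     # e.g. "search quarterly revenue"
        observation = TOOLS.get(tool, lambda a: "unknown tool")(arg)
        context += f"\n{action}\n{observation}"  # context, and cost, grow each step
    return "step budget exhausted"

print(run_agent("summarize Q3 revenue drivers"))
```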
Physical AI adds new pressure
Physical AI emerged as a major source of new compute demand, extending AI workloads beyond data centers into vehicles, robots, and industrial systems. Huang described physical AI as requiring systems to understand how the real world behaves, a task that depends on large-scale simulation, continuous inference, and repeated training across many scenarios.
To support that, Nvidia relies on simulation and synthetic data to generate training inputs at scale. Huang said this approach allows the company to “turn compute into data,” using simulated environments to create the rare and complex situations physical AI must handle.
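A minimal sketch of the “turn compute into data” idea, assuming a generic simulator rather than Nvidia’s actual tooling; the scenario fields and event list are invented for illustration. The point is that rare situations can be manufactured, and labeled automatically from simulator state, at whatever rate compute allows:

```python
# Domain-randomized synthetic data sketch: sample rare driving scenarios
# from a simulator instead of waiting to observe them on real roads.
# The simulator is the labeler, so every sample arrives with ground truth.

import random

RARE_EVENTS = ["jaywalking pedestrian", "debris on road", "sensor glare", "sudden cut-in"]

def simulate_scenario(seed: int) -> dict:
    """Stand-in for a physics/rendering engine rollout."""
    rng = random.Random(seed)
    return {
        "event": rng.choice(RARE_EVENTS),
        "weather": rng.choice(["rain", "fog", "night", "clear"]),
        "frames": rng.randint(100, 600),  # simulated sensor frames
    }

def generate_dataset(n: int) -> list[dict]:
    dataset = []
    for seed in range(n):
        scenario = simulate_scenario(seed)
        # Ground-truth labels come for free from the simulator state.
        scenario["labels"] = {"event_type": scenario["event"]}
        dataset.append(scenario)
    return dataset

data = generate_dataset(10_000)  # more compute -> more (and rarer) training data
print(len(data), data[0]["event"])
```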
Autonomous vehicles and robots add further pressure, with inference running continuously as systems process sensor data and respond in real time, increasing infrastructure demand beyond traditional AI workloads.
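As a rough sketch of why this workload never finishes, consider a hypothetical fixed-rate perception loop, with the sensor read and model call as placeholders: inference has to complete within every frame’s deadline for as long as the vehicle or robot is running, making it a sustained load rather than a batch job:

```python
# Fixed-rate inference loop sketch: a robot or vehicle runs perception
# continuously, so inference carries a hard per-frame deadline instead of
# running as a one-off batch job.

import time

FRAME_HZ = 30
DEADLINE = 1.0 / FRAME_HZ  # ~33 ms per frame at 30 Hz

def read_sensors() -> bytes:
    """Stand-in for camera/lidar capture."""
    return b"frame"

def run_model(frame: bytes) -> str:
    """Stand-in for on-device neural network inference."""
    return "brake" if frame else "cruise"

def control_loop(frames: int = 90) -> None:
    for _ in range(frames):  # in production this loop never exits
        start = time.monotonic()
        action = run_model(read_sensors())  # inference on every single frame
        elapsed = time.monotonic() - start
        if elapsed > DEADLINE:
            print(f"missed {DEADLINE*1000:.0f} ms deadline: {elapsed*1000:.1f} ms")
        time.sleep(max(0.0, DEADLINE - elapsed))  # hold the frame rate

control_loop()
```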
Building infrastructure for AI’s long haul
Huang closed by stressing that sustaining AI’s growth will depend on infrastructure built for continuous scale rather than short-term gains. As models grow more demanding, he emphasized the importance of power efficiency, memory capacity, and high-speed networking in keeping systems viable over time.
The message was consistent with the rest of the keynote. Future AI progress will be shaped by the ability to design and operate systems capable of handling heavy, long-running workloads at scale.
Also read: Nvidia’s system-level ambitions align with recent moves to shore up supply, including a $5 billion investment in Intel’s next-generation fabrication effort.