From Silicon to Intelligence: Understanding the Hardware Behind AI

A short video about NPUs and TPUs led to a deeper look at the physical side of AI, from the Neural Engine in your iPhone to the massive processors powering data-centre models.


It began with a short video: TiffinTech explaining the difference between TPUs and NPUs, a brief clip that suddenly made the invisible world of AI hardware tangible.

Tiffany Janzen (@tiffintech) on Threads
What is the difference between NPUs and TPUs?! Here is a simple explanation! You’re going to start hearing about NPUs everywhere so it is good to understand why tech companies have become so obsessed with them 💡 #tech #stem #techexplained

That curiosity sent me down a path connecting the chip in my phone to the massive processors that train models like ChatGPT.

Quick takeaways

  • CPUs, GPUs, TPUs and NPUs form a spectrum of specialisation: from flexible generalists to highly efficient AI specialists.
  • The iPhone’s Neural Engine is Apple’s name for its NPU, a miniature AI processor for local tasks.
  • FLOPS and TOPS measure different kinds of computing power: high-precision arithmetic versus high-volume, lower-precision operations.
  • Export limits on chips such as NVIDIA’s H100 show how computing power has become a geopolitical factor.

The spectrum of specialisation

Artificial intelligence may feel abstract, but it’s built on physical hardware — billions of transistors arranged for specific kinds of work.

At one end stands the CPU, a flexible all-rounder that handles logic and control. Then come GPUs, vast grids of simple cores designed for parallel maths. Beyond those lie TPUs and NPUs, processors made specifically for neural networks.

You can picture it as a line:

CPU → GPU → TPU / NPU
As you move right, flexibility decreases, but efficiency for AI tasks rises sharply.

Where a CPU handles general tasks, a GPU multiplies matrices, a TPU accelerates training in Google’s data centres, and an NPU performs small-scale AI tasks efficiently on your device.
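
To make that “parallel maths” concrete, here is a minimal Python sketch (illustrative only; the matrix sizes are arbitrary). The same matrix multiplication is written as plain loops and as a single vectorised call, and the gap between the two is exactly what GPUs, TPUs, and NPUs widen further in silicon.

```python
# Illustrative only: the matrix multiply at the heart of neural networks.
# The nested loops do one multiply-add at a time; the vectorised call hands
# the whole job to optimised parallel code, the same work that GPUs, TPUs
# and NPUs scale up in hardware.
import time
import numpy as np

n = 128
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

def matmul_loops(a, b):
    out = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i, k] * b[k, j]
            out[i, j] = total
    return out

t0 = time.perf_counter()
slow = matmul_loops(a, b)
t1 = time.perf_counter()
fast = a @ b                      # same result, computed in parallel
t2 = time.perf_counter()

print(f"loops:      {t1 - t0:.3f} s")
print(f"vectorised: {t2 - t1:.6f} s")
print("max difference:", float(np.abs(slow - fast).max()))
```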

The chip in your pocket

Apple’s A17 Pro chip, introduced with the iPhone 15 Pro, combines three types of processors:

a CPU for everyday applications, a GPU for graphics, and a Neural Engine for machine learning.

This Neural Engine performs around 35 trillion operations per second, powering on-device features such as transcription, photo recognition, and real-time translation. It consumes only a few watts, roughly a hundred times less power than a single data-centre GPU, yet it is fast enough for personal AI.

The A17 Pro carries the Neural Engine, Apple’s NPU.
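
As a rough back-of-envelope illustration of what 35 trillion operations per second means (the model size and the two-operations-per-parameter rule of thumb below are my assumptions, not Apple’s figures):

```python
# Back-of-envelope only: what a ~35-TOPS budget could mean for on-device AI.
# Assumes roughly two operations per parameter per generated token, a common
# rule of thumb, and a hypothetical 3-billion-parameter model.
npu_ops_per_second = 35e12      # ~35 trillion operations per second
model_parameters = 3e9          # hypothetical small on-device model
ops_per_token = 2 * model_parameters

tokens_per_second = npu_ops_per_second / ops_per_token
print(f"theoretical peak: ~{tokens_per_second:,.0f} tokens per second")
# Real-world throughput is far lower (memory bandwidth, precision, scheduling),
# but the scale shows why a few watts is enough for personal AI features.
```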

FLOPS and TOPS: the language of compute

FLOPS (floating-point operations per second) measure the ability to handle precise arithmetic, the kind needed for training large models.

TOPS (tera-operations per second) describe simpler, lower-precision calculations, ideal for running those models efficiently.

Training requires floating-point accuracy and immense power; inference, which happens on your phone, can use integer maths to save energy.

In short: GPUs and TPUs are measured in FLOPS, NPUs in TOPS.
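
Here is a minimal sketch of that float-to-integer trade-off (illustrative NumPy only, not any vendor’s actual quantisation scheme):

```python
# Illustrative only: quantising float32 weights to int8 for inference.
# Training keeps full floating-point precision; inference can trade a little
# accuracy for integer maths that is far cheaper in silicon.
import numpy as np

weights_fp32 = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights_fp32).max() / 127.0          # map the range onto int8
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantise to see how much precision the round trip cost.
recovered = weights_int8.astype(np.float32) * scale
print("max quantisation error:", float(np.abs(weights_fp32 - recovered).max()))
print("storage:", weights_fp32.nbytes, "bytes ->", weights_int8.nbytes, "bytes")
```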

TPU vs GPU: same idea, different philosophy

A GPU is a programmable engine for parallel work: originally built for graphics, later adopted for AI.

A TPU is Google’s own design: a tensor processor built from the ground up for machine learning.

It’s not a GPU, but it draws on the same principle: performing many operations in parallel.

While GPUs remain flexible, TPUs are hard-wired for the algebra behind neural networks, making them faster and more efficient for that single purpose.
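
A small sketch of that shared principle, using JAX as an example framework (my choice of library, not something the chips require): the same Python function is compiled for whichever backend happens to be available.

```python
# Illustrative JAX sketch: one routine, compiled by XLA for whatever backend
# is present. The Python code stays the same whether it runs on a CPU, a GPU,
# or a TPU; only the hardware doing the parallel algebra changes.
import jax
import jax.numpy as jnp

@jax.jit
def dense_layer(x, w):
    return jnp.maximum(x @ w, 0.0)   # a matrix multiply plus ReLU

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (512, 1024))
w = jax.random.normal(key, (1024, 1024))

print("running on:", jax.devices()[0].platform)   # 'cpu', 'gpu' or 'tpu'
print("output shape:", dense_layer(x, w).shape)
```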

💡
Coming soon: Microsoft’s Maia, the company’s own AI accelerator, optimised for transformer workloads. Functionally it resembles Google’s TPU family, but it has its own architecture, software stack, and integration into Azure.

The far end of the spectrum

In data centres, processors such as NVIDIA’s H100 or B100 dominate.
Each consumes hundreds of watts and delivers several petaflops of performance. These chips now sit at the centre of export restrictions, because such computing capacity determines who can train the next generation of large models.

To comply with U.S. limits, NVIDIA built slower versions (A800, H800) for the Chinese market. It is the same hardware, with reduced interconnect speed.
The boundaries of computing power have become geopolitical borders.
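
A rough calculation shows why this class of hardware is worth restricting. The model size, token count, and utilisation below are illustrative assumptions, and the “6 × parameters × tokens” rule is only a common estimate:

```python
# Back-of-envelope only: why fleets of data-centre GPUs decide who can train
# frontier models. Uses the common "6 x parameters x tokens" estimate of the
# total floating-point operations needed for training.
params = 70e9           # hypothetical 70-billion-parameter model
tokens = 2e12           # hypothetical 2 trillion training tokens
total_flops = 6 * params * tokens

gpu_peak_flops = 1e15   # ~1 petaflop per data-centre GPU (assumed)
utilisation = 0.4       # realistic fraction of peak actually sustained
num_gpus = 1024

seconds = total_flops / (gpu_peak_flops * utilisation * num_gpus)
print(f"~{seconds / 86400:.0f} days of continuous training")
# Halve the fleet, or throttle each chip's interconnect and throughput, and
# the timeline stretches accordingly; that is what export limits target.
```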

💡
The NVIDIA H100 and Google’s TPU both power today’s AI revolution, but they aren’t one-to-one rivals. The H100 is a flexible, general-purpose GPU evolved for deep learning; the TPU is a purpose-built tensor processor optimised for Google’s own ecosystem. They meet at the same goal, accelerating neural computation, from opposite ends of the design spectrum.

From abstraction to atoms

Once you see AI through its hardware, it feels less ethereal.
Every neural network, from the model in your phone to the ones shaping global research, depends on physical constraints: heat, energy, and silicon.

Understanding this spectrum, from the Neural Engine in your pocket to the Tensor Processing Units in Google’s data halls, brings AI down to earth.

It reminds us that intelligence, however artificial, still runs on very real machinery.

