---
title: "From Silicon to Intelligence: Understanding the Hardware Behind AI"
description: "A short video about NPUs and TPUs led to a deeper look at the physical side of AI. From the Neural Engine in your iPhone to the massive processors powering data-centre models."
url: "https://hoeijmakers.net/cpu-gpu-tpu-npu-explained/"
date: 2025-11-12
updated: 2026-04-07
author: "Rob Hoeijmakers"
site: "hoeijmakers.net"
language: "en"
tags: ["AI"]
---

# From Silicon to Intelligence: Understanding the Hardware Behind AI

It began with a short video. [TiffInTech](https://www.threads.com/@tiffintech) explained the difference between TPUs and NPUs in a brief clip that suddenly made the invisible world of AI hardware tangible.

That curiosity sent me down a path connecting the chip in my phone to the massive processors that train models like ChatGPT.

### Quick takeaways

- CPUs, GPUs, TPUs and NPUs form a spectrum of *specialisation*: from flexible generalists to highly efficient AI specialists.
- The iPhone’s **Neural Engine** is Apple’s name for its NPU, a miniature AI processor for local tasks.
- **FLOPS** and **TOPS** measure different kinds of computing power: precision versus speed.
- Export limits on chips such as NVIDIA’s H100 show how computing power has become a geopolitical factor.

## The spectrum of specialisation

Artificial intelligence may feel abstract, but it’s built on physical hardware — billions of transistors arranged for specific kinds of work.

At one end stands the **CPU**, a flexible all-rounder that handles logic and control. Then come **GPUs**, vast grids of simple cores designed for parallel maths. Beyond those lie **TPUs** and **NPUs**, processors made specifically for neural networks.

You can picture it as a line:

> CPU → GPU → TPU / NPU

As you move right, flexibility decreases, but efficiency for AI tasks rises sharply.

A CPU handles general tasks, a GPU multiplies matrices in parallel, a TPU accelerates training in Google’s data centres, and an NPU runs small-scale AI tasks efficiently on your device.
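To make "a GPU multiplies matrices" concrete, here is a minimal Python sketch of the operation all of these chips accelerate. The function name `matmul` and the toy numbers are illustrative assumptions, not taken from any library; real hardware runs the same multiply-adds across thousands of units at once instead of one at a time.

```python
def matmul(a, b):
    """Multiply two matrices (lists of rows) and count the multiply-adds used."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    out = [[0.0] * n for _ in range(m)]
    macs = 0  # multiply-accumulate operations performed
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
                macs += 1
            out[i][j] = acc
    return out, macs


# A toy neural-network layer: one input sample with three features,
# mapped to two outputs by a 3x2 weight matrix (values are made up).
inputs = [[1.0, 2.0, 3.0]]
weights = [[0.5, -1.0],
           [0.25, 0.0],
           [1.0, 2.0]]

activations, macs = matmul(inputs, weights)
print(activations, macs)
```

For a layer with `k` inputs and `n` outputs, one sample costs `k × n` multiply-adds. Scale that up to billions of weights and it becomes clear why accelerators are rated in trillions of operations per second.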

## The chip in your pocket

Apple’s **A17 Pro** chip, used in the iPhone 15 Pro and 15 Pro Max, combines three types of processors: a CPU for everyday applications, a GPU for graphics, and a **Neural Engine** for machine learning.

This Neural Engine performs around **35 trillion operations per second**, powering on-device features such as transcription, photo recognition, and real-time translation. It consumes only a few watts, on the order of a hundred times less power than a data-centre GPU drawing several hundred watts, yet it is fast enough for personal AI.

## FLOPS and TOPS: the language of compute

**FLOPS** (*floating-point operations per second*) measure the ability to handle precise arithmetic, which is needed for **training** large models.

**TOPS** (*tera-operations per second*, that is, trillions of operations per second) describe simpler, lower-precision calculations, typically on 8-bit integers, which are ideal for **running** those models efficiently.

Training requires floating-point accuracy and immense power; inference, which happens on your phone, can use integer maths to save energy.
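The trade-off between floating-point and integer maths can be sketched in a few lines of Python. This is a simplified illustration of symmetric 8-bit quantization, assuming one scale factor per vector; the helper names and sample values are hypothetical, and production toolchains use more elaborate schemes.

```python
def scale_for(values):
    """Pick a scale so the largest magnitude maps to 127 (the int8 limit used here)."""
    return max(abs(v) for v in values) / 127


def quantize(values, scale):
    """Convert floats to integers in [-127, 127] by dividing by the scale and rounding."""
    return [max(-127, min(127, round(v / scale))) for v in values]


# Made-up weights and inputs for a single neuron.
weights = [0.12, -0.53, 0.98, -0.07]
inputs = [1.4, -2.0, 0.25, 3.0]

w_scale = scale_for(weights)
x_scale = scale_for(inputs)
w_q = quantize(weights, w_scale)
x_q = quantize(inputs, x_scale)

# Reference result using floating-point arithmetic.
float_dot = sum(w * x for w, x in zip(weights, inputs))

# The inner loop uses only integer multiplies and adds,
# which is the kind of work an NPU is built for.
int_acc = sum(wq * xq for wq, xq in zip(w_q, x_q))

# One floating-point rescale at the end recovers an approximate answer.
approx_dot = int_acc * w_scale * x_scale

print(f"float: {float_dot:.4f}  int8 approximation: {approx_dot:.4f}")
```

The integer result is slightly off from the floating-point one. That small loss of precision is usually acceptable for inference, and it is the price paid for cheaper, lower-power arithmetic.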

In short: GPUs and TPUs are usually rated in FLOPS, NPUs in TOPS.

## TPU vs GPU: same idea, different philosophy

A **GPU** is a programmable engine for parallel work, originally built for graphics and later adopted for AI.

A **TPU** is Google’s own design: a *tensor processor* built from the ground up for machine learning.

It’s not a GPU, but it draws on the same principle: performing many operations in parallel.

While GPUs remain flexible, TPUs are hard-wired for the algebra behind neural networks, making them faster and more efficient for that single purpose.

Coming soon: Microsoft’s Maia. Maia is Microsoft’s own AI accelerator, optimised for transformer workloads. Functionally it resembles Google’s TPU family, but it has its own architecture, software stack, and integration into Azure.

## The far end of the spectrum

In data centres, processors such as NVIDIA’s **H100** or **B100** dominate. Each consumes hundreds of watts and delivers up to several **petaflops** of low-precision performance. These chips now sit at the centre of export restrictions, because such computing capacity determines who can train the next generation of large models.
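To put those units side by side, a small Python sketch converts the prefixed figures into plain operations per second. The 35 TOPS figure is the Neural Engine number quoted earlier; treating "several petaflops" as 2 petaflops is an assumption made only for illustration. The two figures are also measured at different precisions, so the ratio gives a rough sense of scale rather than a benchmark.

```python
# SI prefixes used in compute ratings.
PREFIXES = {"giga": 1e9, "tera": 1e12, "peta": 1e15, "exa": 1e18}


def ops_per_second(value, prefix):
    """Convert a prefixed rating (e.g. 35 tera) into raw operations per second."""
    return value * PREFIXES[prefix]


phone_npu = ops_per_second(35, "tera")       # 35 TOPS, as quoted for the Neural Engine
datacentre_gpu = ops_per_second(2, "peta")   # assumption: "several petaflops" taken as 2

ratio = datacentre_gpu / phone_npu
print(f"One data-centre chip is roughly {ratio:.0f}x the phone's NPU in raw throughput")
```

The gap per chip is large but not astronomical. The real difference comes from scale, since data centres link thousands of such chips together.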

To comply with U.S. limits, NVIDIA built slower versions (A800, H800) for the Chinese market. It is the same hardware, with reduced interconnect speed. The boundaries of computing power have become geopolitical borders.

The NVIDIA H100 and Google’s TPU both power today’s AI revolution, but they aren’t one-to-one rivals. The H100 is a flexible, general-purpose GPU evolved for deep learning; the TPU is a purpose-built tensor processor optimised for Google’s own ecosystem. They meet at the same goal, **accelerating neural computation**, from opposite ends of the design spectrum.

## From abstraction to atoms

Once you see AI through its hardware, it feels less ethereal. Every neural network, from the model in your phone to the ones shaping global research, depends on physical constraints: heat, energy, and silicon.

Understanding this spectrum, *from the Neural Engine in your pocket to the Tensor Processor in Google’s data halls*, brings AI down to earth.

It reminds us that intelligence, however artificial, still runs on very real machinery.

---