Running a Local LLM on Your iPhone

I explored how far mobile AI has come by running LLMs directly on my iPhone. No cloud, no upload. Here’s what I learned from testing Haplo AI.

Portable and private LLM on the iPhone.

The ability to run large language models (LLMs) locally on a mobile device is no longer just theoretical. I recently experimented with this on my iPhone, driven by a simple question: can I privately summarise my July 2024 diary without relying on cloud-based AI?

Why run a model locally?

Privacy and portability were my key motivations. While cloud-based models like ChatGPT offer impressive capabilities — memory, nuance, philosophical breadth — they also come with trade-offs. For sensitive or personal use cases, such as summarising a diary or analysing local notes, running a model locally can be both practical and secure.


Testing Haplo AI

I started with an app called Haplo AI, which offers a curated selection of downloadable models, including:

  • Gemma
  • Qwen
  • Mistral
  • LLaMA
  • Phi-3

I downloaded them all and began testing.

Current models in Haplo in detail.

First use case: summarising a diary

My test case was a single file: my July 2024 diary summary, exported from HTML to a 188 KB PDF. Initially I tried uploading the PDF directly, but it was either too large or too structurally complex for the app to handle. Markdown wasn't supported either.

Eventually I reduced the input to plain text via copy and paste, which proved more manageable, but even then several models couldn't handle the context window. Only Gemma and Qwen could process the input at all, and of those two, only Gemma produced a genuinely useful summary.
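If you'd rather script that conversion step than copy and paste by hand, here is a minimal Python sketch of the same idea: extract the PDF's text layer and trim it to a budget that fits a small context window. It assumes the pypdf library, a hypothetical file name, and a 4,000-character budget I picked arbitrarily; none of this is a documented Haplo limit.

    # A minimal sketch of the manual step above: pull plain text out of
    # the exported PDF and trim it to fit a small context window.
    # Assumes pypdf; the 4,000-character budget is my guess, not a
    # documented limit of any of these models or of Haplo.
    from pypdf import PdfReader

    MAX_CHARS = 4_000  # rough stand-in for a small model's context window

    def pdf_to_trimmed_text(path: str, max_chars: int = MAX_CHARS) -> str:
        reader = PdfReader(path)
        # Concatenate the text layer of every page.
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        # Collapse runs of whitespace to save precious context space.
        text = " ".join(text.split())
        return text[:max_chars]

    if __name__ == "__main__":
        # "diary-2024-07.pdf" is a hypothetical file name for illustration.
        print(pdf_to_trimmed_text("diary-2024-07.pdf"))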

Haplo, in screenshots.

Observations and limitations

While it’s impressive that this works at all, there are limitations:

  • No memory: each session is context-bound.
  • Smaller models: they lack the factual depth, coherence, and nuance of full-scale LLMs.
  • Interface quirks: Markdown isn't accepted, for instance, and larger texts require manual trimming (one possible workaround is sketched after this list).
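The manual-trimming quirk has a well-known workaround: map-reduce summarisation. Split the text into chunks that each fit the window, summarise every chunk, then summarise the concatenated summaries. Here is a rough sketch of the idea; summarise is a placeholder for whatever call your local model exposes (Haplo itself has no scripting API that I know of), and the 3,000-character chunk size is my assumption.

    # A hedged sketch of map-reduce summarisation for small context
    # windows. `summarise` is a placeholder, not a real Haplo API;
    # wire it to whichever local model you actually run.
    def summarise(text: str) -> str:
        raise NotImplementedError("connect this to your local model")

    def chunk(text: str, size: int = 3_000) -> list[str]:
        # Naive fixed-size chunks; a real version would split on
        # paragraph boundaries to keep diary entries intact.
        return [text[i:i + size] for i in range(0, len(text), size)]

    def map_reduce_summary(text: str) -> str:
        # Map: summarise each chunk independently.
        partials = [summarise(c) for c in chunk(text)]
        # Reduce: summarise the concatenated partial summaries.
        return summarise("\n".join(partials))

The trade-off is that references spanning chunks get lost, which is tolerable for a month of diary entries but less so for text with a single running argument.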

Still, the ability to do this privately, offline, and entirely on a phone is remarkable. It opens up use cases where privacy matters — whether that’s handling journal entries, summarising confidential notes, or experimenting with local automation.

See it in action

I recorded a short demo to show how it works in practice — not as a comparison to GPT-4, but as a peek into what’s already possible on-device:

What’s next?

I’ll likely try other apps, such as Private LLM, and explore additional use cases that require more autonomy and discretion.

It’s early days, but even these modest experiments point to a future where lightweight, private AI models are part of everyday workflows.

