How OpenAI o3 Became a Geospy and Found My Photo’s Location
I asked OpenAI’s o3 model to play geospy—like a human OSINT analyst. It zoomed, searched, reasoned, and nailed the photo’s location. Watch it unfold.
I wanted to test something I thought only humans were good at: being a visual detective. Think OSINT, think “geo spy” work—where you inspect a photo in detail and try to figure out where it was taken. You look at signage, prices, language, colours, and clues in the background.
I gave this challenge to OpenAI’s new o3 model, an official release and a distinct step forward beyond GPT-4, available to all ChatGPT users, with advanced features reserved for Plus and higher-tier plans.
What followed surprised me.
Watch the Video
Here’s a short, narrated demonstration. I gave ChatGPT’s o3 model a photo I took in winter: nothing spectacular, seemingly generic, and not instantly recognisable. Then I asked it to find out where it was taken.
In the video I walk through how it works and the process o3 follows to find the photo’s location.
What Happened
The OpenAI o3 model took 5 minutes and 42 seconds to analyse the image and narrate its internal thought process. Here’s a breakdown of how it tackled the task:
1. Initial Hints
It spotted a chestnut vendor marked Maroni, associated with Switzerland or Italy. Prices were listed in Swiss francs, and the Coop logo pointed clearly toward Switzerland.
2. Visual Forensics
The model zoomed into tiny elements:
- A Swiss-looking licence plate
- The windows of a bus or trolley
- A restaurant sign, possibly referencing “Orient”
- A vertical LED sign, possibly from Credit Suisse, BCV, or UBS
3. Online Agent Behaviour
Then the model decided to go online. It searched terms like “John Gut Maroni” and found a vendor near Biel’s Bahnhofplatz. It cross-referenced this with:
- Google Images
- Historical price lists
- Car park signage and LED display clues
- The bilingual signage and Bernese number plates

4. Conclusion
It triangulated the evidence and correctly identified the location as Bahnhofplatz in Biel/Bienne, in the Swiss canton of Bern.
Why This Matters
This isn’t just about geolocation. The real takeaway is the agentic behaviour:
- The model zooms, interprets, compares, and decides.
- It acts like an assistant or detective, rather than a passive respondent.
- It combines multimodal input, web browsing, and reasoning over time.
This opens up exciting use cases—from autonomous research to visual diagnostics and interactive agents capable of complex tasks.
Try It Yourself
This kind of exploration is not just entertaining—it’s a preview of what’s becoming possible with language-image agents. If you want to experiment:
- Use a complex image with a mix of visual cues.
- Let the model think out loud.
- Encourage it to use external tools or web search.
It’s like watching a mind unfold.
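
If you’d rather script this kind of experiment than run it in the ChatGPT interface, the sketch below shows roughly what the image-plus-prompt call looks like with the OpenAI Python SDK. The model name, file name, and prompt wording are my own placeholders, and note that the zooming and web searching shown in the video are ChatGPT tool behaviours; a plain API call like this only sends the image and the instruction.

```python
# Rough sketch, assuming the "openai" package (v1.x) is installed,
# OPENAI_API_KEY is set, and "winter_photo.jpg" stands in for your own image.
import base64
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# Vision input is passed as a base64-encoded data URL.
with open("winter_photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="o3",  # assumption: any vision-capable model you have access to works
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "Act like an OSINT analyst. Examine this photo closely, "
                        "think out loud about signage, prices, language, vehicles "
                        "and architecture, and tell me where it was most likely taken."
                    ),
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

From there you can iterate on the prompt, ask follow-up questions about specific details, or feed in higher-resolution crops yourself.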
A final test: I made up a proverb and asked GPT-4 to explain it. GPT-4 confidently gave a detailed explanation, despite the proverb being completely invented. When I repeated the experiment with OpenAI’s o3 model, it actively researched online, realised the proverb didn’t exist, and clearly explained why. In short: GPT-4 fluently creates plausible narratives; o3 carefully verifies whether they’re true.

