Last year, I visited my grandmother's house for the first time since the pandemic and came across a cupboard full of old photos. There were 1,351 of them, spanning everything from my grandmother's wedding and my mom as a baby to me in middle school, right around when we got a smartphone, after which every photo has been backed up online.
Everything was all over the place, so I spent some time going through all of them individually and organizing them into groups. Some of the initial groups were based on similar aspect ratios or film stock: there was a group of black-and-white 32mm square pictures taken around the time my grandfather was in his mid-20s.
Once I was done grouping them, I could see flashes of stories in my head, but they were ephemeral and fragile. For example, one group of photos looked like it was taken during my grandparents' wedding, but I didn't know the chronological order they were taken in because EXIF metadata didn't exist back then.
So I sat down with my grandmother and asked her to reorder the photos and tell me everything she could remember about her wedding. Her face lit up as she narrated the backstory behind the occasion, going from photo to photo, resurfacing details that had been dormant for decades. I wrote everything down and recorded the names of the people in some of the photos, some of whom I recognized as younger versions of my uncles and aunts.
After the interview, I had multiple pages of notes connecting the photos to events from 50 years ago. Since the account was historical, I wanted, as an inside joke, to see if I could clean it up and present it as a Wikipedia page that I could print and give to her.
So I cloned MediaWiki, spun up a local instance, and began my editorial work. Using the 2011 Royal Wedding article as a reference, I drafted a page starting with the classic infobox and the lead paragraph.
I split the rest of the content into sections and filled them with everything I could verify: dates, names, places, who sat where. I scanned all the photos and spent some time figuring out what to place where, following up each photo placement with a descriptive caption.
Whenever I mentioned a person, I linked them to an empty stub page. After I found out I could also link to the real Wikipedia, I linked out to pages that provided wider context on venues, rituals, and the political climate of the time, such as a legal amendment that was relevant to the wedding ceremony.
In two evenings, I had documented a full backstory for the photos in a neat article. Those two evenings also made me realize just how useful encyclopedia software is for recording and preserving media and knowledge that would otherwise have been lost over time.
This was so much fun that I spent the following months writing pages to account for all the photos that needed to be stitched together.
I asked r/genealogy how to approach recording oral history and was given resources on conducting better interviews (shoutout to u/stemmatis). I would get on calls with my grandmother and people in the family, ask them a couple of questions, and then write. It was also around this time that I started using transcription APIs and language models to simplify the editorial process.
Over time, I managed to write a lot of pages connecting people to different life events. Something about the encyclopedia format made it easy to connect the dots between families, migration patterns, religious affiliations, and education in a way that made me understand my own family much better.
After finding all the stories behind the physical photos, I started to work on digital photos and videos that I had stored on Google Photos. The wonderful thing about digital photos is that they come with EXIF metadata that can reveal extra information like date, time, and sometimes geographical coordinates.
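To make that concrete: EXIF stores GPS coordinates as degree/minute/second rationals plus a hemisphere reference, and a library like Pillow or exiftool can read the raw tags. Converting them to the decimal degrees you would put on a map is just arithmetic. A minimal Python sketch, with made-up coordinates:

```python
from fractions import Fraction

def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert EXIF-style degree/minute/second rationals to decimal
    degrees. `ref` is the hemisphere tag: 'N'/'E' positive, 'S'/'W'
    negative."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    return -value if ref in ("S", "W") else value

# Hypothetical GPSLatitude/GPSLongitude rationals, as a phone might record them
lat = dms_to_decimal(Fraction(19), Fraction(25), Fraction(5751, 100), "N")
lon = dms_to_decimal(Fraction(99), Fraction(7), Fraction(5964, 100), "W")
# lat ≈ 19.4326, lon ≈ -99.1332
```

The date and time fields are plain strings by comparison; the coordinates are the only part that needs decoding.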
This time, without any interviews, I wanted to see if I could use a language model to create a page based on just browsing through the photos. As my first experiment, I created a folder with 624 photos of a family trip to Coorg back in 2012.
I pointed Claude Code at the directory and asked it to draft a wiki page by browsing through the images. I hinted that it could use ImageMagick to create contact sheets, which would help it browse multiple photos at once.
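For anyone curious what that looks like in practice: ImageMagick's `montage` tool can tile a batch of photos into a single labeled sheet. Here's a sketch of how the batching might work; the tile size and filenames are made up, and the commands are only built, not executed (running them requires ImageMagick to be installed):

```python
from pathlib import Path

def contact_sheet_commands(photos, out_dir, per_sheet=36, tile="6x6"):
    """Batch photos and build one ImageMagick `montage` command per
    batch, labeling each thumbnail with its filename (%f) so the model
    can refer back to a specific photo from the sheet."""
    commands = []
    for i in range(0, len(photos), per_sheet):
        batch = photos[i:i + per_sheet]
        out = str(Path(out_dir) / f"sheet_{i // per_sheet:03d}.jpg")
        commands.append(["montage", "-label", "%f", *map(str, batch),
                         "-tile", tile, "-geometry", "320x320+4+4", out])
    return commands

cmds = contact_sheet_commands([f"IMG_{n:04d}.jpg" for n in range(100)], "sheets")
# Each command can then be run with subprocess.run(cmd, check=True).
```

One sheet per few dozen photos means the model reads a trip in a handful of images instead of hundreds.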
A few minutes and a couple of tokens later, it had created a compelling draft with a detailed account of everything we did during the trip, broken down by time of day. The model had no location data to work with, just timestamps and visual content, but it identified the places from the photos alone, including ones I had since forgotten. It even picked up the modes of transportation we used to get between places just from what it could see.
Once I clarified who some of the people in the pictures were, it went on to identify them in the captions automatically. All I had to do was fill in a few gaps that existed only in my memory, and the page was complete.
For my next page, I wanted to see how the model would do when I gave it richer data. So I created a directory for my trip to Mexico City in 2022. I had taken 291 photos and 343 videos with an iPhone 12 Pro that included geographical coordinates as part of the EXIF metadata.
On top of that, I exported my location timeline from Google Maps, my Uber trips, my bank transactions, and my Shazam history. I asked Claude Code to start with the photos, then gradually gave it access to the other data exports.
Here are some of the things it did across the many runs that I found particularly interesting:
- It cross-referenced my bank transactions with location data to ascertain the restaurants I went to.
- Some of the photos and videos showed me at a soccer match, but it was unknown which teams were playing. The model looked through my bank transactions and found a Ticketmaster invoice with the teams and the name of the tournament.
- It looked up my Uber trips to figure out commute times and the exact pickup and drop-off locations.
- It used my Shazam tracks to write about the kinds of songs that were playing at a place, like Cuban songs at a Cuban restaurant.
- In a follow-up, I mentioned that I remembered an evening dinner where a guitarist was playing in the background. It filtered all media captured in the evening, found a frame in a video that showed the guitarist, uploaded it, and referenced the moment in the page.
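The transaction-and-location cross-referencing in the first bullet is essentially a nearest-timestamp join between two time series. A minimal sketch, with hypothetical GPS fixes and a tolerance I picked arbitrarily:

```python
from datetime import datetime, timedelta

def nearest_fix(txn_time, gps_fixes, max_gap=timedelta(minutes=30)):
    """Given a transaction timestamp and a list of (timestamp, lat, lon)
    GPS fixes, return the fix closest in time, or None if nothing falls
    within `max_gap`. This is the core of matching a card charge to the
    place you were standing when you made it."""
    best = min(gps_fixes, key=lambda fix: abs(fix[0] - txn_time))
    return best if abs(best[0] - txn_time) <= max_gap else None

fixes = [
    (datetime(2022, 6, 4, 13, 2), 19.4326, -99.1332),   # hypothetical points
    (datetime(2022, 6, 4, 20, 45), 19.4204, -99.1625),
]
match = nearest_fix(datetime(2022, 6, 4, 20, 50), fixes)  # the 20:45 fix
```

From the matched coordinates, a reverse-geocoding lookup (or, in the model's case, browsing the map) turns a charge into a restaurant name.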
The MediaWiki architecture handled the edits well: for every new data source, the model made amendments the way a real Wikipedia contributor would. I leaned heavily on features that already existed: talk pages to clarify missing gaps and consolidate research notes, categories to group pages by theme, revision history to track how a page evolved as new data came in. I didn't have to build any of this; it was all just there.
What started as me helping the model fill in gaps from my memory gradually inverted. The model was now surfacing things I had completely forgotten, cross-referencing details across data sources in ways I never would have done manually.
After this realization, I paused writing and started pointing Claude Code at different kinds of data exports to see what else it could write about. Using my Facebook, Instagram, and WhatsApp data archives, I wrote pages of stories involving my best friend, with whom I had shared around 100k messages and a couple thousand voice notes over a decade.
The model traced the arc of our friendship through the messages, pulled out the life episodes we had talked each other through, and wove them into multiple pages that read like they were written by someone who knew us both.
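As an aside on what the model actually chews through here: a WhatsApp chat export is a plain-text file of timestamped lines. The exact format varies by platform and locale, so this Python sketch only handles an Android-style layout and is not a robust parser:

```python
import re

# Matches Android-style export lines: "DD/MM/YY, HH:MM - Sender: text".
# iOS exports and other locales use different layouts; this is a sketch.
LINE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}), (\d{1,2}:\d{2}) - ([^:]+): (.*)$")

def parse_chat(text):
    """Return (date, time, sender, message) tuples; lines that don't
    match the pattern are continuations of a multi-line message."""
    messages = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            messages.append(list(m.groups()))
        elif messages:
            messages[-1][3] += "\n" + line
    return [tuple(msg) for msg in messages]

sample = ("04/06/22, 20:45 - Ana: did you see the photos?\n"
          "04/06/22, 20:46 - Me: uploading them\nnow")
msgs = parse_chat(sample)  # two messages, the second spanning two lines
```

Once the export is structured like this, the model can sort, filter by date range, and cross-reference senders against the encyclopedia's people pages.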
I also gave it a backup of my first Minecraft multiplayer server from 2013 and it was able to use the chat logs and world data to reconstruct our month-long journey from being noobs to beating the ender dragon.
This is when I realized I was no longer working on a family history project. What I had been building, page by page, was a personal encyclopedia. A structured, browsable, interconnected account of my life compiled from the data I already had lying around.
Here's what I think is going on. Most of our digital lives exist as scattered exports and backups sitting in folders we rarely open. Photos in one place, messages in another, transactions in a third. Individually, they're just data. But cross-referenced and narrated into encyclopedia pages, the computer can finally know you and reflect your life back as a coherent narrative.
I've been working on this as whoami.wiki. It uses MediaWiki as its foundation, which turns out to be a great fit because language models already understand Wikipedia conventions deeply from their training data. You bring your data exports, and AI agents do the editorial work of browsing, cross-referencing, and drafting pages that you can then review and refine.
Wikipedia's structure of infoboxes, citations, linked pages, and talk pages is a proven system for organizing knowledge that happens to work just as well for personal history as it does for world history. A page about your grandmother's wedding works the same way as a page about a royal wedding. A page about your best friend works the same way as a page about a public figure.
Beyond the model's ability to write these pages, I'm also excited about its ability to browse the encyclopedia when it needs context about you. It gives the model a structured account of your life that it can reference to actually be useful to you, not in a vague "preferences and memories" way, but with real knowledge.
There's also something I didn't expect: it's genuinely fun! Building your encyclopedia feels less like data management and more like the early days of the Facebook timeline, when I actually enjoyed curating my own history. Browsing through finished pages, following links between people and events, stumbling on a detail you had forgotten: it turns your data into something you want to spend time with.
Today I'm releasing whoami.wiki as an open source project. The encyclopedia is yours, it runs on your machine, your data stays with you, and any model can read it. The project is early and I'm still figuring a lot of it out, but if this sounds interesting, I'd love for you to try it out and tell me what you think!