Case study · Wavelength

Building an AI matchmaker: psychological embeddings and custom memory at scale

How I designed a 66-dimensional persona-preference system and a custom memory layer to power relationship matching — without relying on categorical signals.

During a short engagement with Wavelength, I designed, architected, and built their AI matchmaker's profile analysis system and custom memory layer. What started as an LLM orchestration project quickly pulled me into territory I hadn't worked in before: vector spaces, statistical confidence modelling, and memory architecture. This is a writeup of what I built and why.

Why dating app matching is fundamentally different

Most dating apps — Tinder, Hinge, Bumble — and content recommendation systems like TikTok and Instagram rely on categorical signals: like, dislike, skip. They maintain large libraries of annotated content as anchors and calculate similarity between a user's signal history and new items.

This works well for content. But relationships are not content. Two people can be highly similar — same interests, same lifestyle, same values — and still be deeply incompatible. Compatibility isn't a similarity score; it's a function of how two distinct psychological profiles complement and interact with each other. The matching problem required a richer representation of who someone is and what they actually need in a partner.

The 66-dimensional persona-preference system

I built an analysis agent that processes user conversations on a regular cadence and produces a 66-dimensional weighted vector space representing each user's psychological profile. The 66 dimensions were selected based on established psychological frameworks [cite frameworks here].

For each user, the system maintains two parallel vector spaces:

- A persona vector: who the user is, scored across the 66 dimensions.
- A preference vector: what the user needs in a partner, scored across the same 66 dimensions.

Each dimension in the preference vector carries an additional magnitude value representing tolerance — how flexible the user is on that particular axis. Someone might score high on valuing ambition in a partner but have a wide tolerance, meaning the absence of it isn't a dealbreaker. Someone else might have the same score but a narrow tolerance, making it one.
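As a rough sketch of how tolerance might modulate a single dimension's mismatch penalty (the names, the linear penalty, and the scoring range are illustrative assumptions, not the production formula):

```python
from dataclasses import dataclass

@dataclass
class PreferenceDim:
    """One axis of the preference vector (illustrative schema)."""
    score: float      # desired value on this axis, in [0, 1]
    tolerance: float  # flexibility: wide = forgiving, narrow = dealbreaker

def dimension_penalty(pref: PreferenceDim, partner_value: float) -> float:
    """Mismatch penalty grows with the gap, scaled inversely by tolerance.

    A wide tolerance flattens the penalty; a narrow tolerance turns the
    same gap into a near-dealbreaker.
    """
    gap = abs(pref.score - partner_value)
    return gap / max(pref.tolerance, 1e-6)

# Same score on "values ambition", different tolerance:
flexible = PreferenceDim(score=0.9, tolerance=0.8)
strict = PreferenceDim(score=0.9, tolerance=0.1)
print(dimension_penalty(flexible, 0.3))  # mild penalty
print(dimension_penalty(strict, 0.3))    # heavy penalty: a dealbreaker
```

With identical scores, only the narrow-tolerance user treats the mismatch as disqualifying, which is exactly the behaviour described above.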

Every dimension across both vectors holds a weight and a confidence score that update as new conversational signals are gathered. Early in a user's journey, confidence scores are low and weights are kept conservative. As more data accumulates, the system becomes more opinionated.
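A minimal sketch of how such a confidence-gated update could behave (the blend rule, the fixed signal strength, and the saturation curve are assumptions for illustration, not the system's actual formula):

```python
def update_dimension(weight: float, confidence: float,
                     observed: float, signal_strength: float = 0.3):
    """Blend a new conversational signal into an existing dimension.

    Low confidence lets a new observation move the weight a lot; high
    confidence makes the profile resist change. Confidence itself grows
    with each signal but saturates below 1.0.
    """
    step = signal_strength * (1.0 - confidence)
    new_weight = weight + step * (observed - weight)
    new_confidence = confidence + (1.0 - confidence) * signal_strength
    return new_weight, new_confidence

w, c = 0.5, 0.0  # conservative prior, no confidence yet
for signal in (0.9, 0.85, 0.92):  # repeated consistent signals
    w, c = update_dimension(w, c, signal)
print(round(w, 2), round(c, 2))
```

Early updates move the weight quickly; as confidence accumulates, each new signal shifts it less, matching the conservative-then-opinionated trajectory described above.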

Why not just use embeddings? Off-the-shelf sentence embeddings capture semantic similarity, not psychological structure. A user saying "I love spontaneous travel" and "I hate planning ahead" would land in similar embedding space, but mean very different things on a stability dimension. Structured dimensions with explicit semantics were necessary to make the matching logic interpretable and controllable.

Calibrating LLMs for predictable vector output

Using an LLM to populate and update a structured vector space is non-trivial. Out of the box, LLMs produce inconsistent outputs — the same conversational input can yield meaningfully different dimension values across runs. Without calibration, the vectors would be noisy and unreliable.

To solve this, I designed a calibration layer using two constructs:

Beyond calibration, I implemented three mechanisms to ensure vector updates remained accurate and stable over time:

Custom memory

A profile analysis system that starts fresh every session is impractical. The agent needed to remember relevant facts about users across conversations, consolidate redundant information, and discard details that were no longer relevant — all while keeping context windows short enough for efficient inference.

I built a custom memory system inspired by the two-layer architecture described in mem0's research. The goal was deliberately not to build a large RAG pipeline. Retrieval-augmented generation gets expensive and noisy at scale; a fat vector store stuffed with raw conversation fragments isn't memory, it's a log.

The custom layer instead focuses on three operations:

- Extraction: distilling relevant facts about a user from each conversation.
- Consolidation: merging new facts into existing memories and collapsing redundant entries.
- Pruning: discarding details that are no longer relevant, keeping context windows short.

Getting this balance right — knowing what to keep versus what to let go — was the trickiest part of the architecture. Retaining too much bloats context. Forgetting too aggressively loses important signal. The system uses confidence scores and recency weighting from the vector layer to inform these decisions, keeping the two systems tightly coupled.
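One way to picture that coupling is a retention score that multiplies the vector layer's confidence by an exponential recency decay (the half-life, the threshold, and the multiplicative form are illustrative assumptions, not the production heuristic):

```python
import time

def retention_score(confidence: float, last_seen: float,
                    now: float, half_life_days: float = 30.0) -> float:
    """Score a memory for retention: confident, recent facts score high.

    Recency decays exponentially with a configurable half-life; the
    confidence input comes from the vector layer. Memories below a
    threshold become candidates for consolidation or deletion.
    """
    age_days = (now - last_seen) / 86400.0
    recency = 0.5 ** (age_days / half_life_days)
    return confidence * recency

now = time.time()
fresh_confident = retention_score(0.9, now - 2 * 86400, now)
stale_uncertain = retention_score(0.3, now - 120 * 86400, now)
KEEP_THRESHOLD = 0.2  # hypothetical cutoff
print(fresh_confident > KEEP_THRESHOLD)  # keep
print(stale_uncertain > KEEP_THRESHOLD)  # prune
```

A scheme like this lets a high-confidence fact survive longer between mentions, while low-confidence details quietly age out, which is the keep-versus-let-go trade-off described above.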

Scale

Despite being a fully LLM-powered system rather than a categorical one, the architecture was designed with scale in mind. By the end of the engagement, the system could support matchmaking across hundreds of thousands of users on a regular cadence — weekly re-matching runs included — without prohibitive inference costs. The structured vector representation means the heavy LLM work (analysis and memory updates) happens asynchronously per user, while the matching computation itself is a fast vector operation.
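To make the cost asymmetry concrete, here is a scaled-down sketch of that fast matching step: once the asynchronous LLM work has produced the vectors, scoring one user against a candidate pool is a single vectorized pass with no inference calls (the array shapes, the random data, and the fit formula are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 10_000, 66  # users x psychological dimensions (scaled down)

personas = rng.random((N, D))             # who each user is
preferences = rng.random((N, D))          # what each user wants in a partner
tolerances = rng.uniform(0.1, 1.0, (N, D))

def compatibility(user_idx: int, candidates: np.ndarray) -> np.ndarray:
    """How well candidates' personas fit one user's preferences.

    Tolerance widens the acceptable per-dimension gap; the result is a
    mean fit in [0, 1]. One direction only: a full match would score
    both directions and combine them.
    """
    gap = np.abs(personas[candidates] - preferences[user_idx])
    fit = np.clip(1.0 - gap / tolerances[user_idx], 0.0, 1.0)
    return fit.mean(axis=1)

scores = compatibility(0, np.arange(1, N))  # user 0 vs. everyone else
top = np.argsort(scores)[-5:] + 1           # five best candidates
```

Because the per-pair work is a handful of array operations, weekly re-matching runs scale with memory bandwidth rather than with LLM token costs.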

Reflection

My prior work had been in LLM orchestration. This project pushed me into applied ML and mathematics I hadn't worked with directly before — vector space design, statistical confidence modelling, decay functions, memory architecture. Shipping a working and scalable system within a month, in a domain where the problem remains genuinely unsolved for most teams building in this space, is the part I'm most proud of.