Sven De Waerhert

1.5 Years of Talking to Claude

An analysis of 888 conversations with Claude over 18 months. Perform your own analysis 100% offline.

Since July 2024, I've had 888 conversations with Claude. Some lasted minutes, others days. Some were about debugging obscure ESP32 issues, others were deemed "Niche Geographic Queries" by the cluster labeler.


How this works

Embeddings

Think of an embedding as a numeric representation of a word or text. You can view it as a list of numbers, or as a point in a multidimensional space. The cool thing is that two texts that are semantically close to each other will also be represented as points that are close to each other in that space.

Creating an embedding from a piece of text is fairly straightforward. OpenAI offers an easy API for this, and producing embeddings locally can be done with the Transformers.js library.
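For example, a minimal sketch of producing an embedding locally with Transformers.js could look like this (the specific model and pooling options are just one common choice, not necessarily what I used):

import { pipeline } from '@xenova/transformers';

// Load a small sentence-embedding model (any feature-extraction model works here)
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Mean-pool the token vectors and normalize to get a single vector per text
const output = await extractor('Debugging an obscure ESP32 I2C issue', {
  pooling: 'mean',
  normalize: true,
});

const embedding = Array.from(output.data); // e.g. 384 numbers for this model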

Use cases for embeddings

Because embeddings turn semantic meaning into numbers, so to speak, they are a building block for many features you find in today's applications:

  • semantic search
  • personalized recommendations
  • duplicate detection
  • clustering and categorization
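All of these boil down to the same primitive: measuring how close two embedding vectors are. Cosine similarity is the usual choice; a minimal sketch (the example inputs are purely illustrative):

// Cosine similarity: close to 1 for near-identical meaning, close to 0 for unrelated texts
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embeddings of "ESP32 won't connect to WiFi" and "debugging Arduino networking"
// should score higher against each other than against an unrelated travel question.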

Dimension reduction and clustering

At first I tried running DBSCAN on the high-dimensional embeddings, but that didn't work well: distance-based clustering tends to break down in high dimensions, where pairwise distances all start to look the same. I found I had to reduce the dimensionality to something like 15 for DBSCAN to produce good results.

Reducing the dimensions is done with PCA:

import { PCA } from 'ml-pca';

// Fit PCA on the full embedding matrix and project onto the first 15 components;
// to2DArray() turns the resulting Matrix back into a plain array of points
const pca = new PCA(embeddings);
const reduced = pca.predict(embeddings, { nComponents: 15 }).to2DArray();

With the reduced embeddings, I run DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to group similar conversations together.

import { DBSCAN } from 'density-clustering';

// Each cluster is an array of point indices; unclustered points end up in dbscan.noise
const dbscan = new DBSCAN();
const clusters = dbscan.run(reduced, epsilon, minPoints);

With DBSCAN we also have to finagle some parameters like epsilon and minPoints. Getting these wrong will result in zero or one cluster being detected, or 47. I run DBSCAN a few times with different epsilon values to find a value that balances noise (i.e. conversations not deemed part of any cluster) against a sensible number of clusters.
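A rough sketch of that sweep (the candidate epsilon values and the accept/reject thresholds below are made up for illustration):

// Try a range of epsilon values and keep the run with a sensible cluster
// count and the least noise (the thresholds here are arbitrary)
function findBestClustering(points, minPoints) {
  let best = null;
  for (const epsilon of [0.5, 1, 1.5, 2, 2.5, 3]) {
    const dbscan = new DBSCAN();
    const clusters = dbscan.run(points, epsilon, minPoints);
    const noiseRatio = dbscan.noise.length / points.length;
    if (clusters.length < 3 || clusters.length > 30 || noiseRatio > 0.5) continue;
    if (!best || noiseRatio < best.noiseRatio) {
      best = { epsilon, clusters, noiseRatio };
    }
  }
  return best;
}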

For the 2D scatter plot visualization, I take the same 15D embeddings and run them through UMAP (Uniform Manifold Approximation and Projection). Without the UMAP step, the points on the 2D scatter plot would resemble one big randomly distributed point cloud with no definition between clusters.

import { UMAP } from 'umap-js';
const umap = new UMAP({ nComponents: 2, nNeighbors: 15, minDist: 0.1 });
const projection = umap.fit(reduced); // [[x, y], [x, y], ...]

Labeling

There are a few ways you can label the clusters. One is to simply extract some keywords from the conversations in the cluster and concatenate them into a string. Alternatively, you can pick the title of the most central conversation in a cluster. But what I found works best was to gather some samples from the conversations in the cluster and ask an LLM to summarize the topic in 1 to 3 words.

While I was figuring this out in Node.js, I used a Mastra agent for interacting with the LLM and getting a structured response back. But for the web version here I decided to reduce my bundle size by using plain OpenAI API calls.
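A minimal sketch of that labeling call with the OpenAI client (the model name, prompt wording, and sample format are assumptions, not necessarily what I ended up with):

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// samples: a handful of conversation titles or snippets drawn from one cluster
async function labelCluster(samples) {
  const response = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'user',
        content: `Summarize the shared topic of these conversations in 1 to 3 words:\n\n${samples.join('\n')}`,
      },
    ],
  });
  return response.choices[0].message.content.trim();
}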

Published: February 1, 2026