1.5 Years of Talking to Claude
An analysis of 888 conversations with Claude over 18 months. Perform your own analysis 100% offline.
Since July 2024, I've had 888 conversations with Claude. Some lasted minutes, others days. Some were about debugging obscure ESP32 issues, others were deemed "Niche Geographic Queries" by the cluster labeler.
How this works
Embeddings
Think of an embedding as a numeric representation of a word or text. You can view it as a list of numbers, or as a point in a multidimensional space. The cool thing is that two texts that are semantically close to each other are also represented by points that lie close to each other in that space.
Creating an embedding from a piece of text is fairly straightforward. OpenAI offers an easy API for this, and producing embeddings locally can be done with the Transformers.js library.
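To make "close to each other" concrete, here is a minimal sketch that compares vectors with cosine similarity, a common closeness measure for embeddings. The three-dimensional vectors and their names are made up for illustration; real embeddings have hundreds of dimensions and come from a model.

```javascript
// Cosine similarity: close to 1 means the vectors point the same way,
// close to 0 means they are unrelated.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical toy "embeddings" (assumption: hand-picked values, not model output)
const esp32Debugging = [0.9, 0.1, 0.0];
const arduinoWiring = [0.8, 0.2, 0.1];
const breadRecipe = [0.0, 0.1, 0.9];

const related = cosineSimilarity(esp32Debugging, arduinoWiring);
const unrelated = cosineSimilarity(esp32Debugging, breadRecipe);
console.log(related > unrelated); // true
```

The two hardware-flavored vectors score much higher against each other than either does against the recipe vector, which is the property the rest of the pipeline relies on.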
Use cases for embeddings
Because embeddings turn semantic meaning into numbers, so to speak, they are a building block for many features you find in today's applications:
- semantic search
- personalized recommendations
- duplicate detection
- clustering and categorization
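As an illustration of the first item, semantic search can be sketched as ranking stored vectors by their similarity to a query vector. The titles, vectors, and the `search` helper below are hypothetical; in a real application the vectors would come from an embedding model.

```javascript
// Scale a vector to unit length so that a dot product equals cosine similarity.
function normalize(v) {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / norm);
}

// Rank documents by similarity to the query and keep the best k.
function search(queryVector, documents, k) {
  const q = normalize(queryVector);
  return documents
    .map((doc) => {
      const d = normalize(doc.vector);
      const score = d.reduce((sum, x, i) => sum + x * q[i], 0);
      return { title: doc.title, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Hypothetical toy data (assumption: hand-picked 3D vectors)
const documents = [
  { title: 'ESP32 deep sleep wakes too early', vector: [0.9, 0.1, 0.0] },
  { title: 'Sourdough hydration ratios', vector: [0.0, 0.2, 0.9] },
  { title: 'I2C bus errors on a microcontroller', vector: [0.8, 0.3, 0.1] },
];
const queryVector = [1.0, 0.2, 0.0];

const results = search(queryVector, documents, 2);
console.log(results.map((r) => r.title));
```

The same ranking, with a similarity threshold instead of a top-k cut, is one way to approach duplicate detection.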
Dimension reduction and clustering
At first I tried running DBSCAN on the high-dimensional embeddings, but this didn't work well. I found out I had to reduce the number of dimensions to something like 15 for DBSCAN to produce good results.
Reducing the dimensions is done with PCA:
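import { PCA } from 'ml-pca';
// Fit PCA on the embeddings (one row per conversation)
const pca = new PCA(embeddings);
// predict() returns an ml-matrix Matrix; convert it to a plain number[][] for the next steps
const reduced = pca.predict(embeddings, { nComponents: 15 }).to2DArray();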
With the reduced embeddings, I run DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to group similar conversations together.
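import clustering from 'density-clustering';
// The package exports an object that holds the DBSCAN constructor
const dbscan = new clustering.DBSCAN();
// Each cluster is an array of indices into `reduced`; dbscan.noise lists the unclustered indices
const clusters = dbscan.run(reduced, epsilon, minPoints);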
With DBSCAN we also have to finagle some parameters like epsilon and minPoints. Getting these wrong will result in zero clusters or a single cluster being detected, or 47. I run DBSCAN a few times with different epsilon values to find a value that balances noise (i.e. conversations not deemed part of any cluster) against a sensible number of clusters.
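The epsilon sweep can be sketched as follows. To keep the example runnable without dependencies, it uses a small hand-written DBSCAN as a stand-in for the library, and the `sweepEpsilon` helper, the toy points, and the candidate epsilon values are all assumptions for illustration rather than the values used for the real data.

```javascript
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Minimal DBSCAN stand-in. Returns one label per point:
// a cluster index (0, 1, ...) or -1 for noise.
function dbscan(points, epsilon, minPoints) {
  const UNVISITED = -2;
  const NOISE = -1;
  const labels = new Array(points.length).fill(UNVISITED);
  const neighborsOf = (i) =>
    points.map((_, j) => j).filter((j) => euclidean(points[i], points[j]) <= epsilon);

  let cluster = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== UNVISITED) continue;
    const seeds = neighborsOf(i); // includes i itself
    if (seeds.length < minPoints) {
      labels[i] = NOISE; // may be claimed later as a border point
      continue;
    }
    labels[i] = cluster;
    for (let s = 0; s < seeds.length; s++) {
      const j = seeds[s];
      if (labels[j] === NOISE) labels[j] = cluster; // border point joins the cluster
      if (labels[j] !== UNVISITED) continue;
      labels[j] = cluster;
      const more = neighborsOf(j);
      if (more.length >= minPoints) seeds.push(...more); // j is a core point: keep expanding
    }
    cluster++;
  }
  return labels;
}

// Run DBSCAN once per candidate epsilon and summarize each run.
function sweepEpsilon(points, epsilons, minPoints) {
  return epsilons.map((epsilon) => {
    const labels = dbscan(points, epsilon, minPoints);
    const clusters = new Set(labels.filter((l) => l !== -1)).size;
    const noiseRatio = labels.filter((l) => l === -1).length / labels.length;
    return { epsilon, clusters, noiseRatio };
  });
}

// Toy 2D data: two tight groups plus one outlier
const points = [
  [0, 0], [0, 1], [1, 0], [1, 1],
  [10, 10], [10, 11], [11, 10], [11, 11],
  [50, 50],
];
const sweep = sweepEpsilon(points, [0.5, 1.5, 100], 3);
console.table(sweep);
```

On this toy data the smallest epsilon marks everything as noise, the largest merges everything into one cluster, and the middle value finds the two groups and leaves the outlier as noise. Picking an epsilon then comes down to choosing the row with an acceptable cluster count and noise ratio.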
For the 2D scatter plot visualization, I take the same 15D embeddings and run them through UMAP (Uniform Manifold Approximation and Projection). Without the UMAP step, the points on the 2D scatter plot would resemble one big randomly distributed point cloud with no definition between clusters.
import { UMAP } from 'umap-js';
const umap = new UMAP({ nComponents: 2, nNeighbors: 15, minDist: 0.1 });
const projection = umap.fit(reduced); // [[x, y], [x, y], ...]
Labeling
There are a few ways you can label the clusters. One is to simply extract some keywords from the conversations in the cluster and concatenate them into a string. Alternatively, you can just pick the title of the most central conversation in a cluster. But what worked best, I found, was to gather some samples from the conversations in the cluster and ask an LLM to summarize the topic in 1 to 3 words. While I was figuring this out in Node.js, I used a Mastra agent for interacting with the LLM and getting a structured response back. For the web version here, I decided to reduce my bundle size by using plain OpenAI API calls.
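The "most central conversation" option can be sketched like this: compute the centroid of the cluster's vectors and take the title of the member nearest to it. This is one possible reading of "most central", and the `centralTitle` helper, the titles, and the 2D vectors are hypothetical.

```javascript
// Component-wise mean of a list of equal-length vectors.
function centroid(vectors) {
  const sum = new Array(vectors[0].length).fill(0);
  for (const v of vectors) v.forEach((x, i) => (sum[i] += x));
  return sum.map((x) => x / vectors.length);
}

function squaredDistance(a, b) {
  return a.reduce((acc, x, i) => acc + (x - b[i]) ** 2, 0);
}

// Label a cluster with the title of the member closest to the centroid.
function centralTitle(conversations) {
  const center = centroid(conversations.map((c) => c.vector));
  let best = conversations[0];
  for (const c of conversations) {
    if (squaredDistance(c.vector, center) < squaredDistance(best.vector, center)) {
      best = c;
    }
  }
  return best.title;
}

// Hypothetical cluster members (assumption: toy 2D vectors)
const cluster = [
  { title: 'ESP32 brownout on WiFi start', vector: [1.0, 0.0] },
  { title: 'ESP32 deep sleep current draw', vector: [0.6, 0.4] },
  { title: 'E-paper refresh ghosting', vector: [0.2, 0.9] },
];
const label = centralTitle(cluster);
console.log(label);
```

This needs no LLM call, but the label is only as good as the titles of the conversations in the cluster.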