Nearby peer discovery without GPS using environmental fingerprints

Quick overview

After scanning nearby WiFi networks, we calculate an epoch (time rounded to e.g., the nearest 5 minutes) that serves as a salt for hashing and enforces time-based expiry. We then perform MinHash on the set of WiFi networks (salted with the epoch), producing a signature of k hash values.

Next, we apply LSH (locality-sensitive hashing) to divide this signature into b bands, where each band produces a preImage. Each preImage is then hashed again using SHA-256 to create a publicTag.

For each publicTag, we encrypt our peer information using the corresponding preImage as the encryption key, then submit this encrypted data to a rendezvous server indexed by the publicTag. Any nearby peer observing a similar environment will derive the same preImages and publicTags, allowing them to decrypt our peer information, while the rendezvous server learns nothing.

This system is not limited to WiFI networks. It can also be used to discover peers by a list of shared interests, or any other information that can be deduced from the environment.

A fully working implementation of this idea called 'Shimmer' can be found in my GitHub repo.

waerhert/shimmer

Interactive demo

Step 1: MinHash - Creating Similarity Fingerprints

MinHash is a technique that creates a compact "fingerprint" of a set. Similar sets produce similar fingerprints, which is exactly what we need for proximity detection.

Select at least 3 WiFi networks that your device might observe:

💡 Note: In real applications, you'd combine WiFi SSID, BSSID (MAC address), and bucketed signal strengths for more robust fingerprinting. This demo uses only SSIDs for simplicity.

Select at least 3 networks to calculate MinHash

Step 2: LSH - Creating Collision Buckets

Locality-Sensitive Hashing (LSH) divides the MinHash signature into bands. Each band is hashed to create a "bucket" where similar signatures are likely to collide, meaning they'll produce the same hash for at least one band.

⬆ Complete the MinHash demo above first to see LSH in action.

Step 3: Encryption and Announcement

Now we encrypt our peer information using each preImage as the encryption key and announce to the rendezvous server using the publicTags as indexes.

⬆ Complete the LSH demo above first to see encryption and announcement.

Step 4: Peer Discovery and Decryption

Finally, let's simulate another peer (Bob) discovering Alice's announcement. If Bob observes a similar WiFi environment, they'll generate matching publicTags and be able to decrypt Alice's peer information.

⬆ Complete the announcement step above to see peer discovery in action.

You may notice that a significant overlap is required to produce the same LSH tags. Sometimes it works with two networks overlapping, and sometimes it doesn't. This is due to the probabilistic nature of LSH. Tweaking the k-value of MinHash and the b-value (bands) of LSH can potentially improve the overlap detection. I have to do some more testing to find good values for each modality.

Features of Shimmer

Private Set Intersection

While LSH tags are probabilistic, peers can perform Private Set Intersection (PSI) after discovery to get exact similarity scores without disclosing their complete sets.

Multiple Modalities

The system works with any categorical data, not just WiFi networks: shared interests, Bluetooth networks, cell towers, or any modality you define. Each can have its own configuration (k-value, bands, epoch interval).

Epoch-Based Expiry

Sketches expire automatically based on configurable intervals (5 minutes, 10 minutes, etc.). Tags are withdrawn from the rendezvous server on expiry, preventing stale data and limiting tracking windows.

Rendezvous Options

Three implementations available: in-memory (testing), HTTP server (encrypted), or DHT-based (fully decentralized, but lacking some other features (see below)).

libp2p Integration

Shimmer plugs into libp2p as a service:

import { createLibp2p } from 'libp2p';
import { shimmer, httpRendezvous } from 'shimmer';

const node = await createLibp2p({
  services: {
    shimmer: shimmer({
      rendezvous: httpRendezvous('https://rendezvous.example.com'),
      sketcherConfig: {
        wifi: { k: 128, bands: 32, epochInterval: '5m' },
        interests: { k: 64, bands: 16, epochInterval: '1h' }
      }
    })
  }
});

// Sketch and discover
await node.services.shimmer.sketch('wifi', ['CafeGuest_WiFi', 'HomeNetwork_5G']);
const peers = await node.services.shimmer.discover('wifi');

The implementation provides automatic connection establishment, RTT measurement and libp2p peerStore integration.

Security considerations

I wanted to build something geolocation and P2P related for a while, just for fun. The goal was to make it privacy preserving and hard to spoof. There are many things to consider here, and ultimately it really boils down to 'What's the threat model?'. I don't have waterproof solutions for every security aspect here. If you answer the previous question, there may not even be any issue. The least I can do here is expand on some security considerations I've come across.

One issue that comes up a lot in geolocation based systems is that geolocation is easily spoofed. To prove that a device is really present at a location or nearby I came up with the idea of using environmental observations instead of a geolocation.

However, this also comes with a weakness: once an environment has been seen, it's easy for an attacker to keep pretending to be at in that environment (as long as the environment remains unchanged). For WiFi networks this is often the case. While building this I found out about a Google project "Eddystone-EID" that uses a beacon with rotating identifiers. This would cause an environmental change at every epoch and would make old knowledge of the environment useless, preventing such a spoofing attack. At least that's the gist as far as I understood.

The rendezvous server is another big one. Yes, the server does not learn the contents of the encrypted records. However... the server does learn which IP announces a tag, and which IP queries for it. This alone could allow any rendezvous server to learn about the proximity of IP addresses. Here are some potential mitigations:

Use Tor to talk to any rendezvous server
Oblivious HTTP (OHTTP) uses two proxies to separate IP knowledge from content knowledge
A DHT as a rendezvous: no single node knows the full set of announced tags, but this comes with a whole other set of DHT related issues: Sybill attacks, high mobile energy use, ...
Cryptographically Generated Addresses (IPv6): Instead of having a rendezvous server, have no rendezvous server! Addresses would be generated based on the LSH tags. The issue with this is that the network prefix still somehow has to be communicated. So not really a fully viable option.
Direct communication within the environment: Shocker! I know. The whole point however was to avoid this by design.

libp2p DHT

Shimmer includes a DHT-based rendezvous implementation using libp2p's kadDHT. It converts each publicTag to a CID and announces it through the Content Provider API.

Current limitations:

Records are neither encrypted nor signed (see issue)
kadDHT nodes retain Provider records for 24-48 hours, which doesn't align well with configurable epoch expiry windows

By contrast, the HTTP rendezvous implementation uses encrypted records (decryption requires the preImage) but doesn't use signatures. Signed records wouldn't add much value here I think: we could use them to prevent spam, but keypair generation costs nothing. For peer authentication, successfully decrypting with the correct preImage already proves the peer shares our environment. 🤷

Geospatial indexing

When spoofing a geolocation is not a concern, and you just want to rendezvous based on a geographical location, there are existing hierarchical geospatial indexing systems:

H3 - Uber's hexagonal hierarchical geospatial indexing system. Divides the world into hexagonal cells at multiple resolutions, allowing efficient proximity queries. Example cell
Geohash - Encodes geographic coordinates into short alphanumeric strings. Locations with shared prefixes are geographically close. Example: u15

Conclusion

Shimmer is an experiment in proximity detection: proving you're nearby without GPS coordinates or disclosing your environment. While there are still security considerations to address depending on your threat model, the core technique works and can be extended to any categorical similarity detection beyond just location.

Feel free to try it out, break it, or contribute improvements!