OSINT Investigation Infrastructure
Bellingcat
Entity resolution, CV geolocation, and multilingual monitoring pipelines for open-source investigations
The Opportunity
Bellingcat is the gold standard in open-source intelligence, using publicly available data to investigate war crimes, human rights abuses, and disinformation. Their investigators work across 150+ tools: Sherlock and Maltego for identity research, Sentinel Hub and Planet Labs for satellite imagery, eight different Telegram analysis tools for conflict monitoring, and many more. Each of these tools is used manually and in isolation, with no connection to the others. The right ML infrastructure wouldn't replace these tools; it would connect them into a unified investigation layer that turns weeks of manual cross-referencing into minutes.
The Problem Today
A single Bellingcat investigation might touch a dozen tools: Sherlock to trace a username, OpenCorporates to find the company, OpenSanctions to check for matches, FlightAware to track a jet, MarineTraffic to follow a ship, Telegram channels for ground truth, and Google Earth for satellite verification. Each tool lives in its own tab, produces its own output format, and has no awareness of the others. Investigators manually copy-paste between tools, maintain sprawling spreadsheets of cross-references, and use Maltego or Gephi to try to visualize connections after the fact.
The bottleneck isn't the tools — Bellingcat has excellent taste in tooling. The bottleneck is the connective tissue between them.
Before
- × 150+ investigation tools used in isolation, with no cross-linking
- × Manual geolocation across 45+ mapping services, hours per image
- × Telegram, Twitter, and social platforms monitored separately per language
After
- ✓ Unified entity graph connecting identities, companies, sanctions, and transport data
- ✓ CV pipeline matching images against satellite and street-level databases in seconds
- ✓ Real-time multilingual monitoring across Telegram, social media, and messaging platforms
What We'd Build
Entity Resolution & Link Analysis
The centerpiece. A graph-based engine that automatically connects data across Bellingcat's investigation tools — linking a username found via Sherlock to a corporate record in OpenCorporates, to a sanctions hit in OpenSanctions, to a flight in FlightAware, to a vessel in MarineTraffic, to a Telegram channel. When an investigator identifies a person of interest, the system surfaces every known connection across all data sources and previous investigations. This replaces days of manual cross-referencing with a single query.
The system would ingest data from:
- Identity tools: Sherlock, Maigret, Blackbird, WhatsMyName
- Corporate databases: EDGAR, OpenCorporates, LittleSis
- Sanctions lists: OpenSanctions, EU Sanctions Map, RuPEP
- Transport tracking: FlightAware, Flightradar24, MarineTraffic, VesselFinder
- Data breach indices: DeHashed, Have I Been Pwned, Intelx.io
- Previous Bellingcat investigations: internal case data
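A minimal sketch of the resolution step, in Python, assuming each source has already been parsed into records carrying normalized identifiers (usernames, emails, names, company names, IMO numbers). Records that share any identifier are merged into one entity with a union-find structure; explicit relationships such as officer-of or owns-vessel become edges, walked with a breadth-first search from the person of interest. The `Record` type, the `normalize`, `resolve`, and `connections` helpers, and any sample data are hypothetical, and exact string matching stands in for the fuzzy matching and human review a real system would need.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


# Hypothetical record shape: one row from any source (Sherlock, OpenCorporates, MarineTraffic, ...).
@dataclass(frozen=True)
class Record:
    source: str
    record_id: str
    identifiers: frozenset  # normalized keys like "username:ivanp" or "imo:9999999"

    @property
    def key(self):
        return (self.source, self.record_id)


def normalize(kind, value):
    """Exact-match key; a real system would add transliteration and fuzzy matching."""
    return f"{kind}:{value.strip().lower()}"


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(records):
    """Group records that share at least one identifier into entities (sets of record keys)."""
    uf = UnionFind()
    first_seen = {}  # identifier -> key of the first record that carried it
    for rec in records:
        uf.find(rec.key)
        for ident in rec.identifiers:
            if ident in first_seen:
                uf.union(first_seen[ident], rec.key)
            else:
                first_seen[ident] = rec.key
    clusters = defaultdict(set)
    for rec in records:
        clusters[uf.find(rec.key)].add(rec.key)
    return list(clusters.values())


def connections(records, links, seed, max_hops=2):
    """Every record key reachable from `seed` within `max_hops` relationship edges.

    `links` are explicit (key_a, key_b) relationships, e.g. officer-of or owns-vessel.
    """
    clusters = resolve(records)
    entity_of = {key: i for i, members in enumerate(clusters) for key in members}
    graph = defaultdict(set)
    for a, b in links:
        graph[entity_of[a]].add(entity_of[b])
        graph[entity_of[b]].add(entity_of[a])
    start = entity_of[seed]
    depth = {start: 0}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if depth[entity] == max_hops:
            continue
        for nxt in graph[entity]:
            if nxt not in depth:
                depth[nxt] = depth[entity] + 1
                queue.append(nxt)
    return {key for entity in depth for key in clusters[entity]}
```

Merging only on exact shared identifiers keeps every link explainable, since each merge traces back to a concrete key, which matters when findings must be defended. The trade-off is that spelling variants and transliterations are missed until a fuzzier matcher is added.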
CV Geolocation Pipeline
Bellingcat investigators currently use 45+ geolocation tools manually — Google Earth, Mapillary, SunCalc, ShadeMap, Shadow Finder, Sentinel Hub, Planet Labs, and dozens more. A single geolocation task means opening multiple mapping services, comparing terrain features, analyzing shadows for time-of-day, and cross-referencing infrastructure details from GeoHints.
The CV pipeline automates this: take an image, extract visual features (terrain, vegetation, infrastructure, signage, shadows), and match them simultaneously against satellite imagery, street-level databases, and terrain models. Shadow analysis is automated with SunCalc-style sun-position calculations: for each candidate location and the image's estimated capture time, the expected shadow length and direction are computed and compared with the shadows visible in the image. The system returns ranked candidate locations with confidence scores, turning hours of manual comparison into seconds.
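The shadow check can be sketched with the NOAA general solar position equations, used here in place of SunCalc's own formulas: they give the sun's elevation for a latitude, longitude, and UTC time, and a vertical object's shadow length divided by its height is then 1 / tan(elevation). The Python sketch below scores each candidate location by mixing visual similarity (cosine similarity between an embedding of the query image and one of a reference tile) with that shadow consistency. How the embeddings are produced, the 0.7 weighting, the 25% tolerance, and the `Candidate` and `rank_candidates` names are assumptions for illustration; the approximation also ignores atmospheric refraction, terrain slope, and clock error in image metadata.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timezone


def solar_elevation_deg(lat, lon, when):
    """Approximate solar elevation in degrees (NOAA general solar position equations).

    `when` must be timezone-aware; `lon` is degrees east. Refraction is ignored.
    """
    t = when.astimezone(timezone.utc)
    day = t.timetuple().tm_yday
    hour = t.hour + t.minute / 60 + t.second / 3600
    g = 2 * math.pi / 365 * (day - 1 + (hour - 12) / 24)  # fractional year, radians
    eqtime = 229.18 * (0.000075 + 0.001868 * math.cos(g) - 0.032077 * math.sin(g)
                       - 0.014615 * math.cos(2 * g) - 0.040849 * math.sin(2 * g))
    decl = (0.006918 - 0.399912 * math.cos(g) + 0.070257 * math.sin(g)
            - 0.006758 * math.cos(2 * g) + 0.000907 * math.sin(2 * g)
            - 0.002697 * math.cos(3 * g) + 0.00148 * math.sin(3 * g))
    solar_minutes = hour * 60 + eqtime + 4 * lon  # true solar time in minutes
    hour_angle = math.radians(solar_minutes / 4 - 180)
    phi = math.radians(lat)
    cos_zenith = (math.sin(phi) * math.sin(decl)
                  + math.cos(phi) * math.cos(decl) * math.cos(hour_angle))
    return 90 - math.degrees(math.acos(max(-1.0, min(1.0, cos_zenith))))


def expected_shadow_ratio(elevation_deg):
    """Shadow length / height for a vertical object on flat ground."""
    if elevation_deg <= 0:
        return math.inf  # sun below the horizon: no direct shadow
    return 1 / math.tan(math.radians(elevation_deg))


def shadow_consistency(measured_ratio, lat, lon, when, tolerance=0.25):
    """1.0 for a perfect match, falling linearly to 0 at `tolerance` relative error."""
    expected = expected_shadow_ratio(solar_elevation_deg(lat, lon, when))
    if math.isinf(expected):
        return 0.0
    return max(0.0, 1 - abs(measured_ratio - expected) / expected / tolerance)


@dataclass
class Candidate:
    name: str
    lat: float
    lon: float
    embedding: list  # features of a satellite/street-level reference tile (model assumed)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query_embedding, measured_ratio, when, candidates, w_visual=0.7):
    """Return (score, name) pairs, best first."""
    scored = []
    for c in candidates:
        visual = max(0.0, cosine(query_embedding, c.embedding))
        shadow = shadow_consistency(measured_ratio, c.lat, c.lon, when)
        scored.append((w_visual * visual + (1 - w_visual) * shadow, c.name))
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    noon = datetime(2024, 3, 20, 12, 0, tzinfo=timezone.utc)
    print(round(solar_elevation_deg(51.5, 0.0, noon), 1))  # London, around the March equinox
```

Keeping the shadow term separate from the visual term means a candidate that looks right but is inconsistent with the sun position is demoted rather than discarded, which suits a tool that hands ranked leads to a human investigator.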
Multilingual OSINT Monitoring
Bellingcat operates across dozens of languages and monitors Telegram, Twitter/X, Bluesky, Discord, and VK for breaking events and investigation leads. They currently rely on eight separate Telegram tools (Telegago, TelegramDB, Telemetrio, Telepathy, TGStat, and more) plus individual per-platform search tools, all operated manually.
A unified NLP monitoring system would ingest streams from all platforms simultaneously, with real-time translation and classification. Telegram is the priority — it's the primary information channel in conflict zones like Ukraine and Syria. The system would detect emerging narratives, flag potential evidence, and route relevant content to active investigations automatically.
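A minimal sketch of the routing step in Python, assuming messages arrive already tagged with a language code and that the caller supplies a machine-translation function (no specific translation service is assumed; the demo uses a hard-coded lookup). Each active investigation keeps a watchlist of single-word, lower-case English terms, and a message is routed to every investigation whose terms appear in its translation. `emerging_terms` is a deliberately naive burst detector that flags words whose message count jumps between two time windows. All names are hypothetical, and a production system would use trained classifiers rather than keyword overlap.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# (text, source_language) -> English text. Supplied by the caller; no MT service is assumed.
Translator = Callable[[str, str], str]


@dataclass
class Message:
    platform: str  # e.g. "telegram", "x", "vk"
    channel: str
    lang: str      # ISO 639-1 code from the source or a language detector
    text: str


def tokens(text: str) -> Set[str]:
    """Lower-cased word set; Python's Unicode-aware regex keeps Cyrillic and Arabic words."""
    return set(re.findall(r"\w+", text.lower()))


def route(messages: List[Message], watchlists: Dict[str, Set[str]],
          translate: Translator) -> Dict[str, List[Message]]:
    """Send each message to every investigation whose (lower-case) watch terms it mentions."""
    routed: Dict[str, List[Message]] = {case: [] for case in watchlists}
    for msg in messages:
        english = msg.text if msg.lang == "en" else translate(msg.text, msg.lang)
        words = tokens(english)
        for case, terms in watchlists.items():
            if words & terms:
                routed[case].append(msg)
    return routed


def emerging_terms(previous: List[str], current: List[str],
                   min_count: int = 3, ratio: float = 3.0) -> List[str]:
    """Naive burst detection: words whose message count jumps between two windows."""
    before = Counter(w for text in previous for w in tokens(text))
    after = Counter(w for text in current for w in tokens(text))
    return sorted(w for w, n in after.items()
                  if n >= min_count and n >= ratio * (before[w] + 1))


if __name__ == "__main__":
    # Stub translator for illustration only: a hard-coded lookup, not a real MT service.
    demo = {"Вибух біля мосту": "Explosion near the bridge"}
    msgs = [Message("telegram", "example_channel", "uk", "Вибух біля мосту")]
    print(route(msgs, {"case-bridge": {"bridge"}}, lambda t, lang: demo.get(t, t)))
```

Matching on the translation lets one English watchlist cover every source language, at the cost of depending on translation quality for names and place names; keeping the original text on each `Message` leaves room to match transliterated terms directly as well.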
Investigation Data Infrastructure
The foundation layer underneath everything. Right now, investigation data lives in Obsidian notes, Logseq pages, Maltego graphs, and spreadsheets. There's no unified data model, no programmatic access to previous investigation artifacts, and no way to run queries across the full corpus of Bellingcat's work.
This is the data lake and API layer: a structured investigation database that all tools feed into and all investigators query from. Evidence gets preserved with legal-grade chain-of-custody metadata (critical for ICC prosecutions). Every entity, location, document, and relationship is indexed and searchable. New investigations automatically surface connections to old ones.
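Chain-of-custody metadata can be sketched as a hash chain, assuming one log per preserved artifact: every custody event stores the SHA-256 of the artifact plus the digest of the previous event, so editing, reordering, or removing an earlier event breaks verification. This Python sketch uses hypothetical names (`CustodyEvent`, `CustodyLog`) and is not a statement of what any court requires; the latest event's digest would also need to be anchored outside the log (for example with a trusted timestamp) to protect the final entry.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class CustodyEvent:
    evidence_sha256: str  # hash of the preserved artifact's bytes
    action: str           # e.g. "captured", "archived", "exported"
    actor: str
    timestamp: str        # ISO 8601, UTC
    prev_hash: str        # digest of the previous event; "" for the first

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so the same event always hashes the same way.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class CustodyLog:
    """Append-only custody chain for a single artifact (hypothetical design)."""

    def __init__(self) -> None:
        self.events: List[CustodyEvent] = []

    def record(self, artifact: bytes, action: str, actor: str,
               when: Optional[datetime] = None) -> CustodyEvent:
        when = when or datetime.now(timezone.utc)
        prev = self.events[-1].digest() if self.events else ""
        event = CustodyEvent(hashlib.sha256(artifact).hexdigest(), action, actor,
                             when.astimezone(timezone.utc).isoformat(), prev)
        self.events.append(event)
        return event

    def verify(self, artifact: bytes) -> bool:
        """True if the artifact is unchanged and every event links to the one before it."""
        artifact_hash = hashlib.sha256(artifact).hexdigest()
        prev = ""
        for event in self.events:
            if event.prev_hash != prev or event.evidence_sha256 != artifact_hash:
                return False
            prev = event.digest()
        return True

    def head(self) -> str:
        """Digest to anchor externally so tampering with the last event is detectable too."""
        return self.events[-1].digest() if self.events else ""
```

Hashing the artifact bytes rather than storing a file path means the chain still verifies after evidence is copied between storage systems, as long as the bytes are identical.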