Provenance research at machine scale
Roughly 100,000 of the ~600,000 artworks looted in Europe between 1933 and 1945 are still missing. The evidence to find many of them already exists — seizure cards, depot inventories, auction catalogs, restitution files — but it is scattered across dozens of archives, in different languages and numbering systems. No person can hold it all at once. A machine can.
The Provenance Project reads these records, extracts who-owned-what-when as structured claims, and assembles them into a single knowledge graph — then reasons across it to surface evidence-cited leads: a work documented as seized, with no documented return, that may match something hanging in a museum today.
Each step is a claim backed by a specific document; the dashed span is the open question the record leaves unanswered. See the full case below.
It is a restitution-research instrument. It proposes matches between documented seizures and present-day locations, and every link in the chain cites the original scanned record. Researchers, families, archives, and institutions can follow each trail and judge it themselves.
The framing is deliberate: this is the careful, evidence-first work of provenance research, accelerated — not treasure hunting.
It is not a verdict machine and not a rumor generator. The AI proposes, ranks, and cites; people verify. A lead with no document trail is not a lead.
Falsified provenance was a deliberate wartime and post-war practice. So when sources contradict each other, the system treats the contradiction as a signal worth a human's attention — not noise to smooth over.
Method
Most databases store facts and overwrite conflicts. Provenance can't work that way — the historical record is full of deliberate lies, gaps, and disagreements. So the graph stores claims: each one carries the document it came from, a confidence score, and a citation back to the scan. Competing claims coexist. Identities are never hard-merged; "this is the same object as that" is itself a scored, reversible edge. A false merge would manufacture a false lead — the discipline that prevents it is what separates a research tool from a generator of plausible fiction.
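A minimal sketch of what a claim-based store could look like, assuming a simple in-memory model. The class names (`Claim`, `SameAsEdge`, `ClaimGraph`), the field names, and the identifiers in the sample data are all hypothetical and chosen for illustration; the project's actual schema is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str       # object identifier (hypothetical scheme)
    predicate: str     # e.g. "held_by"
    value: str
    year: int
    source_doc: str    # document the claim was extracted from
    scan_ref: str      # citation back to the scan
    confidence: float  # score in [0, 1]


@dataclass
class SameAsEdge:
    a: str
    b: str
    score: float
    active: bool = True  # retracting flips this flag; nothing is deleted


class ClaimGraph:
    def __init__(self):
        self.claims = []
        self.same_as = []

    def add_claim(self, claim):
        # Append only: a new claim never overwrites an earlier one.
        self.claims.append(claim)

    def link(self, a, b, score):
        # "Same object" is recorded as a scored edge, not a merge of records.
        edge = SameAsEdge(a, b, score)
        self.same_as.append(edge)
        return edge

    def contradictions(self):
        # Claims about the same subject, predicate and year that disagree on value.
        groups = defaultdict(list)
        for c in self.claims:
            groups[(c.subject, c.predicate, c.year)].append(c)
        return {k: v for k, v in groups.items() if len({c.value for c in v}) > 1}


# Invented placeholder data, not drawn from any archive.
graph = ClaimGraph()
graph.add_claim(Claim("obj:1", "held_by", "dealer:X", 1938, "doc:A", "scan:A/3", 0.9))
graph.add_claim(Claim("obj:1", "held_by", "collector:Y", 1938, "doc:B", "scan:B/7", 0.6))
edge = graph.link("obj:1", "obj:2", 0.72)
edge.active = False  # reverse the identity link; both records remain intact

for key, claims in graph.contradictions().items():
    print(key, [(c.value, c.source_doc, c.confidence) for c in claims])
```

Both competing claims stay in the store with their own sources, and undoing the identity link leaves the two object records untouched.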
Graph analytics do the heavy lifting that no reader could: collective entity resolution that links the same painting across archives despite different titles, languages, and inventory numbers; custody-gap detection, which flags every work documented as seized with no documented return; laundering-motif matching borrowed from anti-money-laundering analysis and pointed at the 1940s art market; and dealer-network centrality, which doubles as a map of which archive to digitise next.
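Custody-gap detection, as defined above, can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: events are taken to be plain tuples with year-level dates, and the function name `custody_gaps` and the event labels `"seized"` and `"returned"` are hypothetical.

```python
def custody_gaps(events):
    """Return (object_id, seizure_year, source_doc) for every object with a
    documented seizure and no documented return in the same year or later.

    events: iterable of (object_id, event_type, year, source_doc) tuples.
    """
    seized = {}    # object_id -> (year, source_doc) of the earliest seizure
    returned = {}  # object_id -> list of years with a documented return
    for obj, kind, year, doc in events:
        if kind == "seized":
            if obj not in seized or year < seized[obj][0]:
                seized[obj] = (year, doc)
        elif kind == "returned":
            returned.setdefault(obj, []).append(year)

    gaps = []
    for obj, (year, doc) in seized.items():
        # A return dated before the seizure does not close the gap.
        if not any(r >= year for r in returned.get(obj, [])):
            gaps.append((obj, year, doc))
    return sorted(gaps)


# Invented placeholder events, not drawn from any archive.
sample_events = [
    ("w1", "seized", 1941, "d1"),
    ("w1", "returned", 1947, "d2"),
    ("w2", "seized", 1942, "d3"),
    ("w3", "sold", 1938, "d4"),
    ("w4", "returned", 1930, "d5"),
    ("w4", "seized", 1940, "d6"),
]
print(custody_gaps(sample_events))
```

Each result carries the source document of the seizure, so a flagged work always points back to the record that put it on the list.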
Worked example · verified against the primary source
The French state holds La rue Saint-Rustique à Montmartre, a Maurice Utrillo street scene, in trust as an MNR work — recovered to France after the war, its pre-war history never fully established. The official record jumps from Paris to a Cologne art society in January 1944 with no explanation of how the painting entered Germany.
Cross-referencing the Getty Provenance Index against that gap, the engine found the painting offered for sale at Commeter, Hamburg, on 20 November 1937 (lot 288, consignor withheld), and again a year later. To check the match, we pulled plate I of the digitised 1937 catalogue from Heidelberg University's IIIF service: it is captioned "M. Utrillo. Nr. 288" and matches the museum's own photograph of the painting element for element — the buildings, the figures, the signature placement.
That auction appearance is a concrete, previously unconnected step toward the painting's wartime path — sourced entirely to public records, and checkable by anyone. It is presented as a documented lead, not a conclusion: who consigned it, and how it reached Cologne, remain open.
Private collection, Paris: to 1937.
Commeter, Hamburg, 1937 and 1938: "aus Privatbesitz" (from private ownership). Getty PI record + Heidelberg catalogue plate.
Cologne, 1944 → repatriated to France, 1949 → assigned MNR custody. Held today at the Centre Pompidou.
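The chain above can be treated as a list of dated steps, with the open question falling out as the span no step covers. The sketch below is an assumption-laden illustration: dates are reduced to whole years, the Cologne and repatriation entries are modelled as one documented segment, and the `Step` type and `undocumented_spans` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    place: str
    start: Optional[int]  # None when the record gives no start year
    end: int
    source: str


def undocumented_spans(steps):
    """Return (from_year, to_year, from_place, to_place) for each span
    between consecutive steps that no step covers."""
    ordered = sorted(steps, key=lambda s: s.end)
    spans = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start is not None and nxt.start > prev.end:
            spans.append((prev.end, nxt.start, prev.place, nxt.place))
    return spans


# Years and places as given in the worked example; year granularity is an assumption.
chain = [
    Step("Private collection, Paris", None, 1937, "source not given in this section"),
    Step("Commeter, Hamburg", 1937, 1938, "Getty PI record + Heidelberg catalogue plate"),
    Step("Cologne, then repatriated to France", 1944, 1949, "official record"),
]
print(undocumented_spans(chain))
```

With these inputs the only span reported is the one between the 1938 Hamburg sale and the 1944 Cologne record, which is the part of the path the example leaves open.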
The graph is built from openly licensed records: bulk datasets and public archives, with no scraping behind logins. Today it spans about 3.5 million records across seven sources, and is growing.
| Source | What it provides | Access |
|---|---|---|
| Getty Provenance Index | 1.8M art-market sale records, incl. the German-speaking market 1900–1945 | CC0 |
| French MNR / Rose Valland (POP) | 2,456 never-restituted works recovered to France | open |
| Joconde (musées de France) | 1M museum objects; 612k with former-ownership data | open |
| Art Institute of Chicago | 134k works incl. 15.7k published provenance narratives | CC0 |
| The Met | 485k works with credit-line / accession data | CC0 |
| Wikidata | ~70k paintings as a cross-archive identity hub | CC0 |
| Getty Knoedler stock books | 40k dealer records (US market, to 1971) | CC0 |
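
The "about 3.5 million" figure can be checked against the table with a quick sum. The counts below are the rounded values from the table (the Wikidata figure is itself approximate), so the total is an estimate rather than an exact record count.

```python
# Record counts as listed in the table above (rounded figures).
records = {
    "Getty Provenance Index": 1_800_000,
    "French MNR / Rose Valland (POP)": 2_456,
    "Joconde (musées de France)": 1_000_000,
    "Art Institute of Chicago": 134_000,
    "The Met": 485_000,
    "Wikidata": 70_000,
    "Getty Knoedler stock books": 40_000,
}

total = sum(records.values())
print(f"{len(records)} sources, {total:,} records")  # 7 sources, 3,531,456 records
```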
Next, through partnership and public APIs: US National Archives recovery records (Munich Central Collecting Point property cards, ALIU reports), the German Lost Art Foundation registry, and Dutch and Arolsen holdings.
Early results, stated carefully.
Every published claim is reproducible from its cited sources. Identities are never silently merged. Contradictions are shown, not hidden. Calibrated language throughout: "offered at auction in 1937" is a fact with a scan behind it; "looted" is a legal conclusion we do not draw.
A name in a provenance line is a research signal, not proof of wrongdoing — many dealers were legitimate, and the despoiled were victims. AI extraction and matching make errors; that is why sources are always shown and people make the final call. We do not speculate about present-day private owners.