Provenance research at machine scale
Roughly 100,000 of the ~600,000 artworks looted in Europe between 1933 and 1945 are still missing. The evidence to find many of them already exists — seizure cards, depot inventories, auction catalogs, restitution files — but it is scattered across dozens of archives, in different languages and numbering systems. No person can hold it all at once. A machine can.
The Provenance Project reads these records, extracts who-owned-what-when as structured claims, and assembles them into a single knowledge graph. It then reasons across that graph to surface evidence-cited leads: for example, a work documented as seized, with no documented return, that may correspond to a work in a public collection today. Each lead is for researchers to verify against the original records; it is never a verdict.
Each step is a claim backed by a specific document; the dashed span is the open question the record leaves unanswered. See the full case below.
It is a restitution-research instrument. It proposes matches between documented seizures and present-day locations, and every link in the chain cites the original scanned record. Researchers, families, archives, and institutions can follow each trail and judge it themselves.
The framing is deliberate: this is the careful, evidence-first work of provenance research, accelerated — not treasure hunting.
It is not a verdict machine and not a rumor generator. The AI proposes, ranks, and cites; people verify. A lead with no document trail is not a lead.
Falsified provenance was a deliberate wartime and post-war practice. So when sources contradict each other, the system treats the contradiction as a signal worth a human's attention — not noise to smooth over.
Credit where it's due
The Provenance Project did not create this evidence, and it does not stand apart from the field — it depends on it entirely. Archivists, provenance researchers, restitution commissions, and museums assembled these records over decades of patient, expert work. Our role is narrow: to cross-reference their open records at a scale no individual can hold in their head at once, and to link you straight back to the original. Every trail we draw ends at someone else's authoritative document — never at ours.
If you maintain one of these resources: we cite and link to you, we don't re-host or replace you, and we would genuinely welcome your corrections at info@theprovenanceproject.com.
Method
Most databases store facts and overwrite conflicts. Provenance can't work that way — the historical record is full of deliberate lies, gaps, and disagreements. So the graph stores claims: each one carries the document it came from, a confidence score, and a citation back to the scan. Competing claims coexist. Identities are never hard-merged; "this is the same object as that" is itself a scored, reversible edge. A false merge would manufacture a false lead — the discipline that prevents it is what separates a research tool from a generator of plausible fiction.
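The claim model above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual schema: the class names (`Claim`, `SameAsEdge`, `ClaimGraph`), the field names, the 0.8 default threshold, and the placeholder identifiers are all assumptions made for the example. It shows the three properties the paragraph describes: claims are appended rather than overwritten, competing claims coexist, and "same object" is a scored edge that can be retracted without deleting anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str       # object identifier within one archive (hypothetical format)
    predicate: str     # e.g. "owned_by", "offered_at"
    value: str
    source_doc: str    # citation back to the scanned record
    confidence: float  # 0.0 to 1.0


@dataclass
class SameAsEdge:
    a: str
    b: str
    score: float
    evidence: str
    active: bool = True  # retraction flips this flag; the edge is kept for audit


class ClaimGraph:
    def __init__(self):
        self.claims = []
        self.same_as = []

    def add_claim(self, claim):
        # Append only: a conflicting claim never replaces an earlier one.
        self.claims.append(claim)

    def link(self, a, b, score, evidence):
        edge = SameAsEdge(a, b, score, evidence)
        self.same_as.append(edge)
        return edge

    def retract(self, edge):
        edge.active = False

    def linked_ids(self, object_id, min_score):
        # Follow active same-as edges at or above the threshold, transitively.
        ids = {object_id}
        changed = True
        while changed:
            changed = False
            for e in self.same_as:
                if e.active and e.score >= min_score and (e.a in ids) != (e.b in ids):
                    ids |= {e.a, e.b}
                    changed = True
        return ids

    def claims_about(self, object_id, min_score=0.8):
        ids = self.linked_ids(object_id, min_score)
        return [c for c in self.claims if c.subject in ids]
```

Because identities are resolved at query time from the edges, raising `min_score` or retracting an edge changes what a reader sees without rewriting any stored claim.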
Graph analytics do the heavy lifting that no reader could: collective entity resolution that links the same painting across archives despite different titles, languages, and inventory numbers; custody-gap detection, which flags every work documented as seized with no documented return; laundering-motif matching, borrowed from anti-money-laundering analysis and pointed at the 1940s art market; and dealer-network centrality, which doubles as a map of which archive to digitise next.
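Of the analytics listed above, custody-gap detection is the simplest to illustrate. The sketch below assumes each documented event is a dictionary with hypothetical keys `object`, `type`, `year`, and `source`, and that event types are the strings `"seized"` and `"returned"`; the real graph will be richer than this. It flags every object with a documented seizure and no documented return dated on or after the earliest seizure.

```python
from collections import defaultdict


def custody_gaps(events):
    """Return objects documented as seized with no documented return."""
    by_object = defaultdict(list)
    for ev in events:
        by_object[ev["object"]].append(ev)

    gaps = []
    for obj, evs in by_object.items():
        seizures = [e for e in evs if e["type"] == "seized"]
        if not seizures:
            continue  # never documented as seized, so not a candidate
        first_seized = min(e["year"] for e in seizures)
        returned = any(
            e["type"] == "returned" and e["year"] >= first_seized for e in evs
        )
        if not returned:
            gaps.append(
                {
                    "object": obj,
                    "seized_year": first_seized,
                    # Keep the citations so the lead stays traceable.
                    "sources": [e["source"] for e in seizures],
                }
            )
    return gaps
```

A gap found this way is only an absence in the records that were read. A return documented in an archive not yet ingested would close it, which is why the output carries its sources and is treated as a lead to check.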
Worked example · verified against the primary source
The French state holds La rue Saint-Rustique à Montmartre, a Maurice Utrillo street scene, in trust as an MNR work — recovered to France after the war, its pre-war history never fully established. The official record jumps from Paris to a Cologne art society in January 1944 with no explanation of how the painting entered Germany.
Cross-referencing the Getty Provenance Index against that gap, the engine found the painting offered for sale at Commeter, Hamburg, on 20 November 1937 (lot 288, consignor withheld), and again a year later. To check the match, we pulled plate I of the digitised 1937 catalogue from Heidelberg University's IIIF service: it is captioned "M. Utrillo. Nr. 288" and matches the museum's own photograph of the painting element for element — the buildings, the figures, the signature placement.
That auction appearance is a concrete, previously unconnected step toward the painting's wartime path — sourced entirely to public records, and checkable by anyone. It is presented as a documented lead, not a conclusion: who consigned it, and how it reached Cologne, remain open.
Private collection, Paris — to 1937. Held today at the Centre Pompidou.
Commeter, Hamburg, 1937 & 1938 — "aus Privatbesitz". Getty PI record + Heidelberg catalogue plate.
Cologne, 1944 → repatriated to France, 1949 → assigned MNR custody.
The graph is built from openly licensed records: bulk datasets and public APIs, with no scraping behind logins. Today it spans more than 3.5 million records across eleven sources and joins the three sides of the problem: the loss (Nazi seizure records), the market (wartime sales), and the present day (museum collections).
| Source | What it provides | Access |
|---|---|---|
| ERR / Jeu de Paume (JDCRP) | The loss side — 36,585 objects looted by the Einsatzstab Reichsleiter Rosenberg, each with the despoiled owner, ERR code, and often the Nazis' own photograph | open API |
| Getty Provenance Index | 1.8M art-market sale records, incl. the German-speaking market 1900–1945 | CC0 |
| French MNR / Rose Valland (POP) | 2,456 never-restituted works recovered to France | open |
| Joconde (musées de France) | French national museums; former-ownership data on hundreds of thousands of objects | open |
| National Gallery of Art, DC | Paintings with full published provenance and IIIF images | CC0 |
| Cleveland Museum of Art | Paintings with published provenance and open-access images | CC0 |
| Art Institute of Chicago | Works including published provenance narratives | CC0 |
| The Met | Works with credit-line and accession data | CC0 |
| Wikidata | ~70k paintings as a cross-archive identity hub | CC0 |
The graph also includes the Getty Knoedler and Goupil dealer stock books and the U.S. Nazi-Era Provenance Internet Portal (NEPIP). Planned next, through partnership and public archives: U.S. National Archives recovery records (Munich Central Collecting Point property cards, ALIU reports), the German Lost Art Foundation registry, and the Dutch NK / Origins-Unknown restitution collection.
Early results, stated carefully.
Every published claim is reproducible from its cited sources. Identities are never silently merged. Contradictions are shown, not hidden. Calibrated language throughout: "offered at auction in 1937" is a fact with a scan behind it; "looted" is a legal conclusion we do not draw.
A name in a provenance line is a research signal, not proof of wrongdoing — many dealers were legitimate, and the despoiled were victims. AI extraction and matching make errors; that is why sources are always shown and people make the final call. We do not speculate about present-day private owners.