The Provenance Project.
Cross-correlating the world's archives to trace art looted 1933–1945 — and never returned.

Provenance research at machine scale

Roughly 100,000 of the ~600,000 artworks looted in Europe between 1933 and 1945 are still missing. The evidence to find many of them already exists — seizure cards, depot inventories, auction catalogs, restitution files — but it is scattered across dozens of archives, in different languages and numbering systems. No person can hold it all at once. A machine can.

The Provenance Project reads these records, extracts who-owned-what-when as structured claims, and assembles them into a single knowledge graph — then reasons across it to surface evidence-cited leads: a work documented as seized, with no documented return, that may match something hanging in a museum today.

A real reconstruction. This is how the engine traced one work, Maurice Utrillo's La rue Saint-Rustique à Montmartre, from open records alone.

Each step is a claim backed by a specific document; the dashed span is the open question the record leaves unanswered. See the full case below.

What this is — and what it isn't

It is a restitution-research instrument. It proposes matches between documented seizures and present-day locations, and every link in the chain cites the original scanned record. Researchers, families, archives, and institutions can follow each trail and judge it themselves.

The framing is deliberate: this is the careful, evidence-first work of provenance research, accelerated — not treasure hunting.

It is not a verdict machine and not a rumor generator. The AI proposes, ranks, and cites; people verify. A lead with no document trail is not a lead.

Falsified provenance was a deliberate wartime and post-war practice. So when sources contradict each other, the system treats the contradiction as a signal worth a human's attention — not noise to smooth over.

Method

A knowledge graph built from claims, not facts

Most databases store facts and overwrite conflicts. Provenance can't work that way — the historical record is full of deliberate lies, gaps, and disagreements. So the graph stores claims: each one carries the document it came from, a confidence score, and a citation back to the scan. Competing claims coexist. Identities are never hard-merged; "this is the same object as that" is itself a scored, reversible edge. A false merge would manufacture a false lead — the discipline that prevents it is what separates a research tool from a generator of plausible fiction.
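A minimal sketch of what such a claims store could look like, in Python. The class names, field names, and the 0.8 score threshold are illustrative assumptions, not the project's actual schema. It shows the three properties described above: claims are appended and never overwritten, every claim carries its source and a confidence, and "same object" links are scored edges that can be retracted without touching the underlying claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str       # object identifier within one archive (hypothetical format)
    predicate: str     # e.g. "owned_by", "offered_at"
    value: str
    source_doc: str    # citation back to the scanned record
    confidence: float  # 0.0 to 1.0, assigned by extraction


@dataclass
class SameAsEdge:
    a: str
    b: str
    score: float       # how strongly the evidence supports "a is b"
    evidence: str
    retracted: bool = False  # reversible: flip this instead of merging records


class ClaimsGraph:
    def __init__(self) -> None:
        self.claims: list[Claim] = []
        self.same_as: list[SameAsEdge] = []

    def add_claim(self, claim: Claim) -> None:
        # Append only: a conflicting claim never replaces an earlier one.
        self.claims.append(claim)

    def link(self, a: str, b: str, score: float, evidence: str) -> SameAsEdge:
        edge = SameAsEdge(a, b, score, evidence)
        self.same_as.append(edge)
        return edge

    def claims_about(self, obj_id: str, min_score: float = 0.8) -> list[Claim]:
        # Follow live same-as edges one hop; identities stay separate in storage.
        ids = {obj_id}
        for edge in self.same_as:
            if edge.retracted or edge.score < min_score:
                continue
            if edge.a == obj_id:
                ids.add(edge.b)
            elif edge.b == obj_id:
                ids.add(edge.a)
        return [c for c in self.claims if c.subject in ids]
```

Because the link is data rather than a merge, setting `retracted = True` on an edge restores the two records to independent objects, and both sets of claims remain intact for review.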

Acquire (archives) → Extract claims (AI reads records) → Claims graph (never hard-merge) → Resolve entities (across archives) → Analytics (gaps · motifs) → Human review (people decide)

Graph analytics do the heavy lifting that no reader could:

  • Collective entity resolution, which links the same painting across archives despite different titles, languages, and inventory numbers.
  • Custody-gap detection: every work documented as seized with no documented return.
  • Laundering-motif matching, borrowed from anti-money-laundering analysis and pointed at the 1940s art market.
  • Dealer-network centrality, which doubles as a map of which archive to digitise next.
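Custody-gap detection is the simplest of these to illustrate. The Python sketch below assumes a flat list of dated events per object; the `Event` type, its field names, and the rule "a return in or after the latest seizure year closes the gap" are assumptions made for illustration, not the project's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    object_id: str
    kind: str        # "seizure", "return", "sale", ...
    year: int
    source_doc: str  # every event keeps its citation


def custody_gaps(events: list[Event]) -> list[str]:
    """Ids of objects whose latest documented seizure has no later documented return."""
    last_seizure: dict[str, int] = {}
    last_return: dict[str, int] = {}
    for e in events:
        if e.kind == "seizure":
            last_seizure[e.object_id] = max(e.year, last_seizure.get(e.object_id, e.year))
        elif e.kind == "return":
            last_return[e.object_id] = max(e.year, last_return.get(e.object_id, e.year))
    # A missing return is treated as earlier than any seizure.
    return sorted(
        obj for obj, year in last_seizure.items()
        if last_return.get(obj, -1) < year
    )
```

The output is only a candidate list. An object appears here because no return is documented in the ingested sources, which is not the same as no return having taken place.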

Worked example · verified against the primary source

One painting, one gap, one match

The French state holds La rue Saint-Rustique à Montmartre, a Maurice Utrillo street scene, in trust as an MNR work — recovered to France after the war, its pre-war history never fully established. The official record jumps from Paris to a Cologne art society in January 1944 with no explanation of how the painting entered Germany.

Cross-referencing the Getty Provenance Index against that gap, the engine found the painting offered for sale at Commeter, Hamburg, on 20 November 1937 (lot 288, consignor withheld), and again a year later. To check the match, we pulled plate I of the digitised 1937 catalogue from Heidelberg University's IIIF service: it is captioned "M. Utrillo. Nr. 288" and matches the museum's own photograph of the painting element for element — the buildings, the figures, the signature placement.

That auction appearance is a concrete, previously unconnected step toward the painting's wartime path — sourced entirely to public records, and checkable by anyone. It is presented as a documented lead, not a conclusion: who consigned it, and how it reached Cologne, remain open.

Owner: Private collection, Paris, to 1937. Held today at the Centre Pompidou.

Market: Commeter, Hamburg, 1937 & 1938, "aus Privatbesitz". Getty PI record + Heidelberg catalogue plate.

Recovered: Cologne, 1944 → repatriated to France, 1949 → assigned MNR custody.

The data

The graph is built from openly licensed records: bulk datasets and public archives, with no scraping behind logins. Today it spans about 3.5 million records across seven sources, and is growing.

3.5M records ingested
1.9M artworks in the graph
3.4M relationships mapped

Source | What it provides | Access
Getty Provenance Index | 1.8M art-market sale records, incl. the German-speaking market 1900–1945 | CC0
French MNR / Rose Valland (POP) | 2,456 never-restituted works recovered to France | open
Joconde (musées de France) | 1M museum objects; 612k with former-ownership data | open
Art Institute of Chicago | 134k works incl. 15.7k published provenance narratives | CC0
The Met | 485k works with credit-line / accession data | CC0
Wikidata | ~70k paintings as a cross-archive identity hub | CC0
Getty Knoedler stock books | 40k dealer records (US market, to 1971) | CC0

Next, through partnership and public APIs: US National Archives recovery records (Munich Central Collecting Point property cards, ALIU reports), the German Lost Art Foundation registry, and Dutch and Arolsen holdings.

What we've found so far

Early results, stated carefully.

  • The method finds the needles. Pointed at museum provenance texts, it independently surfaced documented looted-art cases with no prior knowledge — including the Gutmann family's Degas Landscape with Smokestacks, the subject of a landmark U.S. restitution case. Rediscovering known cases is exactly how a method earns trust before it is turned on the unexamined long tail.
  • History, reproduced from open data. Measuring consignor anonymity in German auction catalogues, the engine traces the arc of the forced-sale market on its own: from roughly 40% anonymous in the late-Weimar years to about 76% at the 1942–44 looting peak.
  • A standing question list. Of the French MNR works, 1,726 carry an officially undetermined spoliation status — a concrete, bounded set of open cases the engine is working through, match by cited match.
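The consignor-anonymity measurement mentioned above comes down to a per-year ratio. The Python sketch below shows the arithmetic under stated assumptions: the marker phrases, the `(year, consignor)` input shape, and the rule that an empty consignor field counts as anonymous are all illustrative choices, not the engine's actual criteria.

```python
from collections import defaultdict
from typing import Optional

# Assumed marker phrases; a real catalogue corpus would need a longer, reviewed list.
ANONYMOUS_MARKERS = ("aus privatbesitz", "consignor withheld")


def is_anonymous(consignor: Optional[str]) -> bool:
    # Missing or blank consignor fields are counted as anonymous (an assumption).
    if consignor is None or not consignor.strip():
        return True
    text = consignor.strip().lower()
    return any(marker in text for marker in ANONYMOUS_MARKERS)


def anonymity_share_by_year(lots: list[tuple[int, Optional[str]]]) -> dict[int, float]:
    """Fraction of lots per sale year whose consignor is not named."""
    totals: dict[int, int] = defaultdict(int)
    anonymous: dict[int, int] = defaultdict(int)
    for year, consignor in lots:
        totals[year] += 1
        if is_anonymous(consignor):
            anonymous[year] += 1
    return {year: anonymous[year] / totals[year] for year in sorted(totals)}
```

The classification rule carries most of the weight here: changing which phrases count as anonymous shifts every yearly share, so the marker list should be published alongside any figure derived from it.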

Principles & limitations

What we hold ourselves to

Every published claim is reproducible from its cited sources. Identities are never silently merged. Contradictions are shown, not hidden. Calibrated language throughout: "offered at auction in 1937" is a fact with a scan behind it; "looted" is a legal conclusion we do not draw.

What this can't do

A name in a provenance line is a research signal, not proof of wrongdoing — many dealers were legitimate, and the despoiled were victims. AI extraction and matching make errors; that is why sources are always shown and people make the final call. We do not speculate about present-day private owners.