The Provenance Project

Provenance research at machine scale

Roughly 100,000 of the ~600,000 artworks looted in Europe between 1933 and 1945 are still missing. The evidence to find many of them already exists — seizure cards, depot inventories, auction catalogs, restitution files — but it is scattered across dozens of archives, in different languages and numbering systems. No person can hold it all at once. A machine can.

The Provenance Project reads these records, extracts who-owned-what-when as structured claims, and assembles them into a single knowledge graph. It then reasons across the graph to surface evidence-cited leads: a work documented as seized, with no documented return, that may correspond to a work in a public collection today. Each lead is for researchers to verify against the original records, never a verdict.

[Interactive provenance graph. Legend: owner, art-market event, undocumented gap, present-day holding.]

A real reconstruction. This is how the engine traced one work, Maurice Utrillo's La rue Saint-Rustique à Montmartre, from open records alone.

Each step is a claim backed by a specific document; the dashed span is the open question the record leaves unanswered. See the full case below.

What this is — and what it isn't

It is a restitution-research instrument. It proposes matches between documented seizures and present-day locations, and every link in the chain cites the original scanned record. Researchers, families, archives, and institutions can follow each trail and judge it themselves.

The framing is deliberate: this is the careful, evidence-first work of provenance research, accelerated — not treasure hunting.

It is not a verdict machine and not a rumor generator. The AI proposes, ranks, and cites; people verify. A lead with no document trail is not a lead.

Falsified provenance was a deliberate wartime and post-war practice. So when sources contradict each other, the system treats the contradiction as a signal worth a human's attention — not noise to smooth over.

Credit where it's due

We build on decades of other people's work

The Provenance Project did not create this evidence, and it does not stand apart from the field — it depends on it entirely. Archivists, provenance researchers, restitution commissions, and museums assembled these records over decades of patient, expert work. Our role is narrow: to cross-reference their open records at a scale no individual can hold in their head at once, and to link you straight back to the original. Every trail we draw ends at someone else's authoritative document — never at ours.

The ERR Project & JDCRP: the database of objects looted through the Jeu de Paume, with the Nazis' own registration cards and photographs.
The German Lost Art Foundation: the Lost Art Database and Proveana.
The U.S. National Archives: the Allied recovery and art-looting investigation records.
The French Ministry of Culture: the MNR / Rose-Valland (POP) and Joconde databases.
The Getty: the Provenance Index of art-market records.
The museums: the National Gallery of Art, Art Institute of Chicago, Cleveland Museum of Art, the Met, and others who publish their provenance openly.

If you maintain one of these resources: we cite and link to you, we don't re-host or replace you, and we would genuinely welcome your corrections at info@theprovenanceproject.com.

Method

A knowledge graph built from claims, not assertions

Most databases store facts and overwrite conflicts. Provenance can't work that way — the historical record is full of deliberate lies, gaps, and disagreements. So the graph stores claims: each one carries the document it came from, a confidence score, and a citation back to the scan. Competing claims coexist. Identities are never hard-merged; "this is the same object as that" is itself a scored, reversible edge. A false merge would manufacture a false lead — the discipline that prevents it is what separates a research tool from a generator of plausible fiction.
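The claim model described above can be pictured with a minimal sketch. This is an illustration under assumptions, not the project's actual schema: the names `Claim`, `ClaimStore`, the field names, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """One statement extracted from one document (hypothetical field names)."""
    subject: str        # object identifier within its source archive
    predicate: str      # e.g. "owned_by", "sold_at", "same_as"
    value: str
    source_doc: str     # citation back to the scanned record
    confidence: float   # 0.0 to 1.0


@dataclass
class ClaimStore:
    claims: list[Claim] = field(default_factory=list)

    def add(self, claim: Claim) -> None:
        # Append only: a new claim never overwrites a conflicting one.
        self.claims.append(claim)

    def about(self, subject: str, predicate: str) -> list[Claim]:
        # Return every competing claim, highest confidence first.
        hits = [c for c in self.claims
                if c.subject == subject and c.predicate == predicate]
        return sorted(hits, key=lambda c: c.confidence, reverse=True)

    def retract(self, claim: Claim) -> None:
        # Reversible: dropping a "same_as" edge leaves both objects intact.
        self.claims.remove(claim)


# Made-up sample data: two conflicting ownership claims and one identity link.
store = ClaimStore()
store.add(Claim("archive_a:obj1", "owned_by", "Owner X", "scan-001", 0.8))
store.add(Claim("archive_a:obj1", "owned_by", "Owner Y", "scan-002", 0.6))
link = Claim("archive_a:obj1", "same_as", "archive_b:obj9", "match-run-1", 0.7)
store.add(link)

owners = [c.value for c in store.about("archive_a:obj1", "owned_by")]
store.retract(link)
remaining_links = store.about("archive_a:obj1", "same_as")
```

In this sketch the identity link is just another claim, so withdrawing it touches neither object's own history. That is one simple way to keep a merge reversible; a production graph would likely need more than this.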

Graph analytics do the heavy lifting that no reader could: collective entity resolution, which links the same painting across archives despite different titles, languages, and inventory numbers; custody-gap detection, which flags every work documented as seized with no documented return; laundering-motif matching, borrowed from anti-money-laundering analysis and pointed at the 1940s art market; and dealer-network centrality, which doubles as a map of which archive to digitise next.
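Of these analytics, custody-gap detection is the simplest to sketch. The example below assumes a flat list of dated events per work. The tuple layout, the event-type labels, and the function name `custody_gaps` are hypothetical, and the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical event records: (work_id, event_type, year, source_doc).
events = [
    ("work-1", "seized", 1941, "scan-101"),
    ("work-1", "returned", 1947, "scan-102"),
    ("work-2", "seized", 1942, "scan-103"),
    ("work-3", "sold", 1938, "scan-104"),
]


def custody_gaps(events):
    """Return ids of works with a seizure on record and no later return."""
    seized = {}
    returned = defaultdict(list)
    for work_id, event_type, year, _source in events:
        if event_type == "seized":
            # Keep the earliest documented seizure year per work.
            seized[work_id] = min(year, seized.get(work_id, year))
        elif event_type == "returned":
            returned[work_id].append(year)
    return sorted(
        work_id for work_id, seized_year in seized.items()
        if not any(y >= seized_year for y in returned.get(work_id, []))
    )


gaps = custody_gaps(events)
```

A work that appears in the result is only a candidate for review. The absence of a return record in the graph may mean that the record exists and has not been digitised yet.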

Worked example · verified against the primary source

One painting, one gap, one match

The French state holds La rue Saint-Rustique à Montmartre, a Maurice Utrillo street scene, in trust as an MNR work — recovered to France after the war, its pre-war history never fully established. The official record jumps from Paris to a Cologne art society in January 1944 with no explanation of how the painting entered Germany.

Cross-referencing the Getty Provenance Index against that gap, the engine found the painting offered for sale at Commeter, Hamburg, on 20 November 1937 (lot 288, consignor withheld), and again a year later. To check the match, we pulled plate I of the digitised 1937 catalogue from Heidelberg University's IIIF service: it is captioned "M. Utrillo. Nr. 288" and matches the museum's own photograph of the painting element for element — the buildings, the figures, the signature placement.

That auction appearance is a concrete, previously unconnected step toward the painting's wartime path — sourced entirely to public records, and checkable by anyone. It is presented as a documented lead, not a conclusion: who consigned it, and how it reached Cologne, remain open.
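The cross-referencing step in this case amounts to asking which sale records for the artist fall inside the undocumented span. The sketch below is a simplified illustration, not the engine's matching logic. `SaleRecord` and `sales_in_gap` are hypothetical names, the gap bounds (1937 to January 1944) are one reading of the case as described here, the catalogue title is left as a placeholder, and the two "Example Artist" and 1950 rows are invented to show the filter.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleRecord:
    artist: str
    title: str
    house: str
    sale_date: date
    lot: str


def sales_in_gap(records, artist, gap_start, gap_end):
    """Sale records for one artist dated inside an undocumented span."""
    wanted = artist.casefold()
    hits = [r for r in records
            if wanted in r.artist.casefold()
            and gap_start <= r.sale_date <= gap_end]
    return sorted(hits, key=lambda r: r.sale_date)


records = [
    # From the case above: Commeter, Hamburg, 20 November 1937, lot 288.
    SaleRecord("Maurice Utrillo", "(title as catalogued)",
               "Commeter, Hamburg", date(1937, 11, 20), "288"),
    # Invented rows, included only to show what the filter rejects.
    SaleRecord("Example Artist", "Example title",
               "Example house", date(1938, 5, 1), "12"),
    SaleRecord("Maurice Utrillo", "Example title",
               "Example house", date(1950, 3, 1), "7"),
]

hits = sales_in_gap(records, "utrillo", date(1937, 1, 1), date(1944, 1, 31))
```

A hit from a filter like this says only that the artist's work was on the market during the gap. Tying it to a specific painting still takes the image comparison against the catalogue plate described above.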

Owner: Private collection, Paris, to 1937. Held today at the Centre Pompidou.
Market: Commeter, Hamburg, 1937 & 1938, "aus Privatbesitz". Getty PI record + Heidelberg catalogue plate.
Recovered: Cologne, 1944 → repatriated to France, 1949 → assigned MNR custody.

The data

The graph is built from openly licensed records: bulk datasets and public APIs, with no scraping behind logins. Today it spans more than 3.5 million records across eleven sources, and now joins the three sides of the problem: the loss (Nazi seizure records), the market (wartime sales), and the present day (museum collections).

36,585 documented seizures, most with the ERR's own photograph
2.05M artworks in the graph
3.7M relationships mapped

Source	What it provides	Access
ERR / Jeu de Paume (JDCRP)	The loss side — 36,585 objects looted by the Einsatzstab Reichsleiter Rosenberg, each with the despoiled owner, ERR code, and often the Nazis' own photograph	open API
Getty Provenance Index	1.8M art-market sale records, incl. the German-speaking market 1900–1945	CC0
French MNR / Rose Valland (POP)	2,456 never-restituted works recovered to France	open
Joconde (musées de France)	French national museums; former-ownership data on hundreds of thousands of objects	open
National Gallery of Art, DC	Paintings with full published provenance and IIIF images	CC0
Cleveland Museum of Art	Paintings with published provenance and open-access images	CC0
Art Institute of Chicago	Works including published provenance narratives	CC0
The Met	Works with credit-line and accession data	CC0
Wikidata	~70k paintings as a cross-archive identity hub	CC0

The graph also includes the Getty Knoedler and Goupil dealer stock books and the U.S. NEPIP provenance index. Planned next, through partnership and public archives: U.S. National Archives recovery records (Munich Central Collecting Point property cards, ALIU reports), the German Lost Art Foundation registry, and the Dutch NK / Origins-Unknown restitution collection.

What we've found so far

Early results, stated carefully.

The three sides now connect. With the loss records in the graph, a documented ERR seizure can be cross-referenced against present-day collections — surfacing candidate correspondences, each carrying the Nazis' own photograph of the object so a researcher can confirm or reject it by eye. These are leads to verify against the original records, never assertions that any institution holds looted property.
The method finds the needles. Pointed at museum provenance texts, it independently surfaced documented looted-art cases without being told about them in advance, including the Gutmann family's Degas Landscape with Smokestacks, the subject of a landmark U.S. restitution case. Rediscovering known cases is exactly how a method earns trust before it is turned on the unexamined long tail.
History, reproduced from open data. Measuring consignor anonymity in German auction catalogues, the engine traces the arc of the forced-sale market on its own: from roughly 40% anonymous in the late-Weimar years to about 76% at the 1942–44 looting peak.
A standing question list. Of the French MNR works, 1,726 carry an officially undetermined spoliation status — a concrete, bounded set of open cases the engine is working through, match by cited match.
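The consignor-anonymity measurement mentioned above reduces to a share per period: lots with no named consignor divided by all lots. The sketch below shows that arithmetic on toy data. The row layout, the period boundaries, and the function name `anonymity_share` are assumptions, and the sample values are made up rather than taken from the project's results.

```python
from collections import Counter

# Hypothetical rows: (sale_year, consignor_named). Real input would be
# catalogue lots from the auction data; these values are made up.
lots = [
    (1930, True), (1930, False), (1931, True), (1931, True), (1931, False),
    (1942, False), (1943, False), (1943, True), (1944, False),
]

# Period boundaries are illustrative assumptions, not the project's bins.
PERIODS = {"1930-1932": (1930, 1932), "1942-1944": (1942, 1944)}


def anonymity_share(lots, periods):
    """Fraction of lots per period with no named consignor."""
    total, anonymous = Counter(), Counter()
    for year, named in lots:
        for label, (start, end) in periods.items():
            if start <= year <= end:
                total[label] += 1
                if not named:
                    anonymous[label] += 1
    # Skip periods with no lots to avoid dividing by zero.
    return {label: anonymous[label] / total[label]
            for label in periods if total[label]}


shares = anonymity_share(lots, PERIODS)
```

How "anonymous" is decided for a real catalogue entry (for example, a phrase such as "aus Privatbesitz" in place of a name) is a judgement the sketch leaves out.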

Principles & limitations

What we hold ourselves to

Every published claim is reproducible from its cited sources. Identities are never silently merged. Contradictions are shown, not hidden. Calibrated language throughout: "offered at auction in 1937" is a fact with a scan behind it; "looted" is a legal conclusion we do not draw.

What this can't do

A name in a provenance line is a research signal, not proof of wrongdoing — many dealers were legitimate, and the despoiled were victims. AI extraction and matching make errors; that is why sources are always shown and people make the final call. We do not speculate about present-day private owners.