More than 340,000 duplicate image files are sitting inside the digital storage systems managed by the Brussels-Capital Region's public institutions, according to internal audits circulated among archivists and IT managers this spring. The problem is not new. But the scale, now being quantified for the first time across multiple agencies, is proving far larger than administrators had previously acknowledged.
The issue has landed on desks at a particularly awkward moment. The Region is midway through a €4.2 million digitisation push — the BruDigital programme, launched in January 2025 — designed to make municipal records, heritage photographs and planning documents searchable online. Pouring more content into systems already bloated with redundant files is, in the words of one briefing document circulated by the Brussels Informatics Centre (CIRB), the equivalent of restocking a warehouse without first clearing the existing inventory.
What the Numbers Actually Show
The audit figures break down unevenly across institutions. The Brussels Urban Development directorate, which manages planning archives stretching back to the 1960s, accounts for roughly 118,000 of the duplicate files — about 35 percent of the regional total. The Royal Library of Belgium on the Mont des Arts, while not a regional body, shared comparable findings at a February conference hosted by the Flemish Institute for Archiving: its own photographic collections held a duplication rate of approximately 22 percent across scanned materials ingested before 2022.
Storage costs matter here. The CIRB estimates that each terabyte of managed cloud storage used by regional institutions costs around €180 per year in licensing and maintenance. Duplicate images, many of them high-resolution TIFF files exceeding 50 megabytes each, collectively consume an estimated 14 terabytes of unnecessary space. That works out to roughly €2,500 per year in pure storage waste — a modest figure on its own, but one compounded by the staff time spent tagging, re-uploading and attempting to retrieve mislabelled files. CIRB's own working paper, dated March 2026, put the labour cost at closer to €90,000 annually across the institutions surveyed.
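The arithmetic behind those figures can be checked in a few lines. The sketch below uses only the numbers quoted above; adding the storage and labour estimates into a single total is an assumption made here for illustration, not something the CIRB paper is reported to do.

```python
# Figures quoted in the article (CIRB estimates).
EUR_PER_TB_PER_YEAR = 180      # licensing and maintenance per terabyte
WASTED_TB = 14                 # space taken up by duplicate images
LABOUR_EUR_PER_YEAR = 90_000   # staff time, per the March 2026 working paper


def annual_storage_waste(wasted_tb: float, eur_per_tb: float) -> float:
    """Yearly cost of storing the duplicates."""
    return wasted_tb * eur_per_tb


storage_eur = annual_storage_waste(WASTED_TB, EUR_PER_TB_PER_YEAR)

# Assumption for illustration: the two estimates do not overlap and can be summed.
total_eur = storage_eur + LABOUR_EUR_PER_YEAR
labour_share = LABOUR_EUR_PER_YEAR / total_eur

print(f"Storage waste: EUR {storage_eur:,.0f} per year")
print(f"Labour share of combined cost: {labour_share:.1%}")
```

On these numbers the storage bill comes to €2,520 a year, and staff time accounts for almost all of the combined cost.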
The duplication problem is partly a legacy of how digitisation was done cheaply and quickly during the 2010s. Scanners at the Mundaneum archive in Mons and at the former Maison des Arts in Schaerbeek both fed into centralised regional servers without deduplication protocols in place. Files were scanned once by one department, then again by another working from physical copies of the same photograph, and neither team knew the other had done it.
What Comes Next for Brussels' Archives
The CIRB is piloting an automated deduplication tool on a subset of 60,000 files held by the Brussels Environment agency, using perceptual hashing, a technique that identifies visually identical or near-identical images even when file names differ. Early results from the pilot, which began in April 2026, show the test dataset reduced by 19 percent, or roughly 11,400 files, within six weeks. If that rate holds across the full regional inventory, administrators expect to recover more than 60,000 files' worth of clean, usable archive space by the end of 2026.
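For readers curious how perceptual hashing works in principle, here is a minimal sketch of one common variant, the "average hash". It is not the CIRB's tool, whose internals have not been described; the function names, the 8x8 grid and the distance threshold of 5 are all assumptions chosen for illustration. Images are represented as plain grids of greyscale values so the example runs without any imaging library.

```python
from typing import List

Image = List[List[int]]  # greyscale pixels (0-255), one inner list per row


def shrink(pixels: Image, size: int = 8) -> List[List[float]]:
    """Downscale to size x size by averaging rectangular blocks.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        row = []
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels: Image, size: int = 8) -> int:
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Image, b: Image, max_distance: int = 5) -> bool:
    """Hypothetical rule: treat images as duplicates if few bits differ."""
    return hamming(average_hash(a), average_hash(b)) <= max_distance


# Synthetic stand-ins for scans: a left-to-right gradient, a slightly brighter
# rescan of it at double the resolution, and an unrelated top-to-bottom gradient.
original = [[x * 8 for x in range(32)] for _ in range(32)]
rescan = [[min(255, (x // 2) * 8 + 3) for x in range(64)] for _ in range(64)]
unrelated = [[y * 8 for _ in range(32)] for y in range(32)]

print(is_near_duplicate(original, rescan))     # True
print(is_near_duplicate(original, unrelated))  # False
```

Because the fingerprint depends on the broad pattern of light and dark rather than on exact bytes, the rescan matches the original despite its different size and brightness. A production tool would also need a review step for borderline matches, since a threshold that is too loose risks merging distinct photographs.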
The Urban Development directorate is scheduled to begin its own deduplication pass in September, targeting the Ixelles and Saint-Gilles planning files first. In those two municipalities, heritage building applications have generated the heaviest photographic documentation over the past two decades. Archivists at the directorate have been told to flag any file where metadata is missing or contradictory, which in practice means flagging about one in four images currently held.
For Brussels residents and researchers trying to use public archives — whether through the Be.Brussels open data portal or in person at the Archives of the City of Brussels on the Rue des Tanneurs — the practical impact of the cleanup should be a faster, more reliable search experience. The CIRB plans to publish updated access statistics each quarter from January 2027, which will give the public the first transparent look at whether the investment is actually working. Until those numbers appear, the Region is, for the moment, still counting the cost of its own disorder.