2026 0630 Minor contagious metadata corruption in exiv2

I found some contagious metadata corruption in exiv2: reading a file whose XMP data contains corrupt XML namespace values poisons every subsequent metadata write with the same corrupt namespace, even writes to other files.

I don’t think it is very serious, at least compared to other data corruption bugs. All the software I tried seems to read the files just fine before and after, and the corruption is limited to XML namespace attributes. But it was interesting!

I made a proof of concept that demonstrates it.

How I got here

I never would have found this if I hadn’t been testing git-annex for storing my darktable photo library. I keep the originals in git-annex but the XMP sidecar files, which contain edits and metadata, in regular git. After setting up git-annex, I edited a few photos and found weird changes in git diff:

  1. Replacing http with (ttp

    -    xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
    +    xmlns:xmpMM="(ttp://ns.adobe.com/xap/1.0/mm/"
    
  2. Replacing adobe with abobe (two Bs)

    -    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    +    xmlns:xmp="http://ns.abobe.com/xap/1.0/"
    
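When reviewing diffs like these, it can help to pull out just the namespace declarations and compare them prefix by prefix. Here is a minimal Python sketch, assuming a simple regex over double-quoted `xmlns:prefix="uri"` attributes is good enough (a real tool should use an XML parser). The function names are hypothetical, not from any existing tool.

```python
import re

# Matches attributes of the form xmlns:prefix="uri".
# Assumption for this sketch: values are double-quoted.
XMLNS_RE = re.compile(r'xmlns:([A-Za-z_][\w.-]*)="([^"]*)"')


def namespaces(xmp_text: str) -> dict[str, str]:
    """Return a prefix -> namespace URI map for the xmlns declarations in xmp_text."""
    return dict(XMLNS_RE.findall(xmp_text))


def changed_namespaces(old_text: str, new_text: str) -> dict[str, tuple[str, str]]:
    """Report prefixes declared in both versions whose namespace URI changed."""
    old, new = namespaces(old_text), namespaces(new_text)
    return {
        prefix: (old[prefix], new[prefix])
        for prefix in old.keys() & new.keys()
        if old[prefix] != new[prefix]
    }


if __name__ == "__main__":
    before = '<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmp:Rating="3"/>'
    after = '<rdf:Description xmlns:xmp="http://ns.abobe.com/xap/1.0/" xmp:Rating="3"/>'
    for prefix, (was, now) in changed_namespaces(before, after).items():
        print(f"{prefix}: {was} -> {now}")
```

Comparing per prefix, rather than diffing raw lines, means a changed URI stands out even if the serializer also reorders attributes.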

What is going on here?

At first, I assumed darktable was the source of the corruption, and opened darktable-org/darktable#21192. It was weird, though; not every edit had this problem. And I couldn’t reproduce it in a fresh library with test files. The darktable team couldn’t reproduce it either. I threw some tokens at the problem, but Claude and Codex also got nowhere. I was stuck.

Worried about this corruption, I felt I had to start looking at Lightroom instead.

But if you’ve done this before, you know that moving metadata between darktable and Lightroom isn’t easy. Edits are straight up incompatible, and I was willing to accept that. I still wanted to keep the ratings I had assigned in darktable, though, and the two programs expect different XMP filenames: darktable uses basename.ext.xmp, while Lightroom uses basename.xmp. So I had to copy the ratings from darktable’s sidecar files into Lightroom’s. I copiloted a script that copied the xmp:Rating attribute from each darktable XMP file to the corresponding Lightroom XMP file, and when I reviewed the changes it was making, I discovered that, wait a minute, some of the Lightroom files already had the bad metadata.
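For the curious, a rating-copying script along those lines could look roughly like this. It is a minimal Python sketch, not the script I actually ran: it assumes both sidecars store the rating as an `xmp:Rating="N"` attribute (if Lightroom writes it differently, the regex needs adjusting), and it edits text with a regular expression instead of an XMP library. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the rating is stored as an attribute like xmp:Rating="3"
# (darktable uses -1 for rejected images, so allow a minus sign).
RATING_RE = re.compile(r'xmp:Rating="(-?\d+)"')


def lightroom_sidecar_for(darktable_sidecar: Path) -> Path:
    """Map basename.ext.xmp (darktable) to basename.xmp (Lightroom)."""
    return darktable_sidecar.with_suffix("").with_suffix(".xmp")


def copy_rating(darktable_xmp: str, lightroom_xmp: str) -> str:
    """Return lightroom_xmp with its xmp:Rating set to darktable_xmp's rating.

    Only the first Rating attribute is touched; everything else, including
    the xmlns declarations, is left exactly as it was.
    """
    source = RATING_RE.search(darktable_xmp)
    if source is None:
        return lightroom_xmp  # nothing to copy
    new_attr = f'xmp:Rating="{source.group(1)}"'
    updated, count = RATING_RE.subn(new_attr, lightroom_xmp, count=1)
    if count == 0:
        # Inserting a brand-new attribute is out of scope for this sketch.
        raise ValueError("Lightroom sidecar has no xmp:Rating attribute to update")
    return updated


def sync_file(darktable_sidecar: Path) -> None:
    """Copy the rating from one darktable sidecar into its Lightroom counterpart."""
    target = lightroom_sidecar_for(darktable_sidecar)
    updated = copy_rating(darktable_sidecar.read_text(), target.read_text())
    target.write_text(updated)
```

Touching only the one attribute keeps the diff small, which is exactly what makes unrelated changes (like a mangled namespace already present in the target file) easy to spot during review.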

In particular, images from my wedding photographer seemed to have the bad data baked in.

The smoking gun

I redownloaded a single image from the photographer’s gallery, and it was fine. Then I redownloaded the entire .zip archive. The gallery software seems to generate the archive on demand and then cache it for a while: the first time you ask for a zip file, it takes a minute or two before it presents you with a link. When I downloaded and extracted that archive, most of the files had the bad metadata, including the image I had just downloaded on its own. In fact, when I downloaded that single image once more after getting the zip, the new copy contained corrupt metadata!

Now that was something I could probably reproduce in darktable. What if I import an image with corrupted namespace attributes first, then modify a known-good image and save it? Would it propagate the corrupted namespace attribute to the good image?

Yep. I had my repro. I added a comment to the darktable bug with definitive steps to reproduce.

With some help from Claude, I pinpointed the problem to a few calls into the exiv2 library. I wrote a small C++ program that replicates the behavior with just that library, no darktable required, and opened Exiv2/exiv2#9323 over there as well. I was impressed with how quickly they merged a fix.

I also did some more testing with my photographer’s gallery software, which comes from a company called Pic-Time, and reproduced the problem there as well. I’m not sure it helped, but I opened a ticket with them explaining what was going on and how to reproduce it.

Contagion

What I find interesting is that the corruption is infectious. I don’t know which image was patient zero! I’d love to know whether it originated with my photographer, with an image someone sent them that they opened to edit, with Pic-Time, or somewhere else entirely.

In theory there could be millions of images out there similarly infected. I wish I had access to a corpus of images from the web to check. If you work at Flickr or Google or something, see if you can find more of these out there!
