Teach a man to fish…

(Third installment of my “Movie Repair Guide”, where you’ll learn how a movie can be extracted from a larger file)

When you repair movies, you never know what you can find.
Yesterday I had an old shoe moment while looking for some mountain footage in a corrupt file.
An obscure Windows AVI file containing the infamous “Download” animation surfaced when I least expected it.

Today I’ll teach you to fish: in other words, I’ll explain how to rescue movies from inside larger files. It only works if certain conditions are met, but it’s an exciting experience… that can end with an old shoe. You’ve been warned.

-=-=-

Container Structure Correction is a repair technique that acts upon the container structure data, leaving the media data, the index and the tables untouched.
Unlike reindexing or other techniques that need to act on hundreds of audio or video frames, a structure correction is usually a matter of correcting a few bytes or moving a misplaced block of data. For this reason, it can be done manually.

If the same correction has to be applied to a collection of files, then it may be worth spending time automating the structure correction.

The most common kinds of structural damage and how to fix them:

  • Embedded movie
  • The full movie data, including its container, is embedded in a larger file, for example after a data recovery.
    In this case, the data can be extracted and saved with the container's file suffix, and the result usually plays.

    In this example, we have examined the file with a hex editor and found an intact RIFF structure in the middle of it. Note that the length of the structure, 0x49e4, is encoded just after RIFF as a 32-bit little-endian integer (the bytes e4 49 00 00).
    A quick check shows that 0x49e4+8 bytes after the R of RIFF (the extra 8 bytes being the RIFF tag and the length field themselves), at offset 0x02a837c, the data suddenly stops and only zeros follow, confirming that we are onto something.

    02a3970: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    02a3980: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    02a3990: 5249 4646 e449 0000 4156 4920 4c49 5354 RIFF.I..AVI LIST
    02a39a0: 2004 0000 6864 726c 6176 6968 3800 0000 ...hdrlavih8...
    02a39b0: 8545 0100 f816 0000 0000 0000 1008 0000 .E..............
    02a39c0: 0500 0000 0000 0000 0100 0000 7c0b 0000 ............|...
    02a39d0: 3c00 0000 3100 0000 0000 0000 0000 0000 <...1...........
    02a39e0: 0000 0000 0000 0000 4c49 5354 d403 0000 ........LIST....
    ..........
    02a8310: b9b9 b9b9 b9b9 b9b9 b9b9 b9b9 b9b9 b9b9 ................
    02a8320: b9b9 b9b9 6964 7831 5000 0000 3030 6462 ....idx1P...00db
    02a8330: 1000 0000 0400 0000 7c0b 0000 3030 6462 ........|...00db
    02a8340: 1000 0000 880b 0000 7c0b 0000 3030 6462 ........|...00db
    02a8350: 1000 0000 0c17 0000 7c0b 0000 3030 6462 ........|...00db
    02a8360: 1000 0000 9022 0000 7c0b 0000 3030 6462 ....."..|...00db
    02a8370: 1000 0000 142e 0000 7c0b 0000 0000 0000 ........|.......
    02a8380: 0000 0000 0000 0000 0000 0000 0000 0000 ................

    By copying this data to a new file and saving it with an .avi suffix, we now have a valid movie.
    Make sure you copy exactly the required bytes, starting from the R in RIFF and ending after the idx1 structure, otherwise it won't work.
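
For a batch of files, the manual extraction above could be sketched in Python roughly as follows. This is only a minimal sketch, not a robust carver: the function names `find_riff_chunks` and `extract_first_riff` are made up for illustration, and it assumes the whole file fits in memory and that the RIFF length field itself has not been damaged.

```python
import struct


def find_riff_chunks(data: bytes):
    """Yield (offset, total_length) for each complete RIFF structure in data.

    total_length is the 32-bit little-endian length stored after the
    "RIFF" tag, plus 8 bytes for the tag and the length field themselves.
    """
    pos = data.find(b"RIFF")
    while pos != -1:
        if pos + 8 <= len(data):
            (payload_size,) = struct.unpack_from("<I", data, pos + 4)
            total = payload_size + 8
            # Keep only structures that fit entirely inside the file.
            if pos + total <= len(data):
                yield pos, total
        pos = data.find(b"RIFF", pos + 4)


def extract_first_riff(data: bytes) -> bytes:
    """Return the bytes of the first complete RIFF structure found."""
    for offset, total in find_riff_chunks(data):
        return data[offset:offset + total]
    raise ValueError("no complete RIFF structure found")
```

Reading the damaged file in binary mode, passing its contents to `extract_first_riff` and writing the result to a new file with an .avi suffix reproduces the copy-and-paste done in the hex editor. A match on the four letters RIFF can also be a coincidence, so the result still needs to be checked in a player.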

  • Multiple moov atoms found, but last one is corrupt
  • When you edit a movie in place (i.e. without rewriting the whole file), some editing programs simply append a new moov container at the end of the file, without bothering to remove the older ones. If the file has become corrupt, those old containers can still work.
    The trick consists in finding the moov structures and redirecting the file towards an older one.
    You'll need a hex editor and a good knowledge of the QuickTime file format to do that.
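
The first half of that trick, locating the moov structures, can be sketched in Python. A QuickTime atom starts with a 32-bit big-endian size (header included) followed by a four-character type, so the sketch searches for the letters moov and reads the four bytes before them. The function name `find_moov_candidates` is hypothetical, 64-bit extended sizes are ignored, and the plausibility test is only a guess at what is worth looking at in the hex editor.

```python
import struct


def find_moov_candidates(data: bytes):
    """Return (atom_offset, atom_size) for every plausible 'moov' atom header."""
    candidates = []
    pos = data.find(b"moov")
    while pos != -1:
        if pos >= 4:
            atom_start = pos - 4
            # Atom sizes are 32-bit big-endian and include the 8-byte header.
            (size,) = struct.unpack_from(">I", data, atom_start)
            # Discard sizes that are too small or run past the end of the file.
            if 8 <= size <= len(data) - atom_start:
                candidates.append((atom_start, size))
        pos = data.find(b"moov", pos + 1)
    return candidates
```

The list comes back in file order, so when several entries are returned the earlier ones are the older containers. A truncated moov at the very end of the file fails the size test and is left out, which is usually what you want when hunting for an intact one.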

  • Mispositioned blocks of data
  • Files from a data recovery can present defects: contents not matching the file name, truncated files, mashed-up contents coming from several files, moved or missing blocks.
    As data recovery acts at the lower filesystem level, the data always comes in fixed-size blocks, for example exactly 2048 bytes.
    A movie can fail to open because some blocks have been moved, duplicated or lost. The fact that blocks always have a length of 2048 bytes helps to detect and correct those problems. It's like solving a puzzle and is usually complex.
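
One piece of that puzzle, spotting duplicated blocks, lends itself to a small Python sketch. It cuts the file into fixed-size blocks, hashes each one and reports the block numbers that share the same contents. The 2048-byte size is just the example value used above, the name `find_duplicate_blocks` is made up, and skipping all-zero blocks is an assumption that suits files padded with zeros; moved or missing blocks are not detected by this.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 2048  # example value from the text; the real size depends on the recovery


def find_duplicate_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Return lists of block indices whose contents are identical.

    Only whole blocks are examined; a shorter trailing piece is ignored.
    """
    seen = defaultdict(list)
    for index in range(len(data) // block_size):
        block = data[index * block_size:(index + 1) * block_size]
        if not any(block):
            continue  # all-zero blocks repeat naturally, so skip them
        seen[hashlib.sha1(block).digest()].append(index)
    return sorted(indices for indices in seen.values() if len(indices) > 1)
```

Multiplying a reported block number by the block size gives the offset to jump to in the hex editor.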