What is MOV?
MOV is the extension used for QuickTime files, a multimedia file format created by Apple in 1991 and used by the production and broadcasting industries.
MOV is not a video or audio format, but a container format:
- MOV organizes the media content inside the file in several tracks
- Each track can be encoded in a different video or audio format
MOV versus MP4
The MP4 container format is directly based on the QuickTime File Format (QTFF):
- MOV and MP4 extensions are mostly interchangeable in a QuickTime-only environment
- While MP4, being an international standard, has more support through software and hardware vendors, MOV has more features.
- MOV is more oriented towards production, while MP4 is more of a distribution format and is widely used in the consumer video industry.
MOV has several strong points:
- Future-proof: MOV files are codec agnostic, have metadata support, and have unlimited size and duration.
- Workflow-friendly: QuickTime file format has abstract data references and track edit lists, which makes it suited for editing. Thanks to "Reference movies", MOV can do editing without copying media data, which other container formats can't do.
- Interoperability: MOV can be read and written by a variety of software, including Avid Media Composer, Adobe Premiere, Sony Vegas, Grass Valley EDIUS and Final Cut.

MOV also has some drawbacks:
- Fragility: The QuickTime file format was not designed for resilience. Missing or bad data at critical locations will make the file unplayable.
- Feature bloat: Over 25 years, the QuickTime file format has accumulated many byzantine features that are only supported in QuickTime-centric environments. Interoperability with third-party software and devices only happens when the MOV file limits itself to the core features, those also present in the MP4 standard.
Structure of QuickTime MOV files

MOV files are structured in atoms that can be nested into many sub-levels, like nested boxes.
The QuickTime file format allows for very diverse and complex structures, but the simplest and most common one consists of three top-level atoms:
- ftyp - File Type
- mdat - Media Data
- moov - Movie
The ftyp atom contains information about the standards that the file conforms to. It is a simple structure of only a few dozen bytes.
The mdat atom contains the media data, both video and audio, and occupies almost 100% of the file size. During live recording, video and audio data is written in bulk into this section of the QuickTime file.
The moov atom is the "brain" of the MOV file. We will study its structure in detail later.
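The top-level layout described above can be listed with a few lines of code. The sketch below assumes the usual atom header: a 4-byte big-endian size followed by a 4-byte type, where a size of 1 announces a 64-bit extended size and a size of 0 means "up to the end of the file". The names `list_top_level_atoms` and `make_atom`, and the sample bytes, are made up for illustration.

```python
import struct

def list_top_level_atoms(data: bytes):
    """Return (type, offset, size) for each top-level atom found in `data`."""
    atoms = []
    pos = 0
    while pos + 8 <= len(data):
        size, kind = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            # A 64-bit extended size follows the type field.
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif size == 0:
            # The atom extends to the end of the file.
            size = len(data) - pos
        if size < header:
            break  # corrupt size: stop instead of looping forever
        atoms.append((kind.decode("latin-1"), pos, size))
        pos += size
    return atoms

def make_atom(kind: bytes, payload: bytes) -> bytes:
    """Build a toy atom (hypothetical helper, used only for the sample)."""
    return struct.pack(">I4s", 8 + len(payload), kind) + payload

# Toy file: ftyp (brand, version, one compatible brand), mdat, empty moov.
sample = (
    make_atom(b"ftyp", b"qt  " + b"\x00" * 4 + b"qt  ")
    + make_atom(b"mdat", b"\x00" * 32)
    + make_atom(b"moov", b"")
)

for kind, offset, size in list_top_level_atoms(sample):
    print(kind, offset, size)
```

On a real file, the same loop would be fed the file's bytes (or, for large files, would seek from header to header instead of loading everything into memory).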
mdat: Media data in bulk
Media data is stored in bulk: video frames and audio blocks are recorded one after another, interleaved and without any header that would make individual frames easy to identify.
For example, a MOV with two text tracks, one in French and one in English, with words counting as frames, would have an mdat atom like this:
LesReprésentantsduPeupleFrançais,WhenintheCourseofhumanevents,constituésenAssembléeNationale,itbecomesnecessaryforonepeopletodissolvethepoliticalbandswhichhaveconnectedthemwithanother,considérantquel'ignorance,l'oubliouleméprisdesdroitsdel'Hommeandtoassumeamongthepowersoftheearth,sontlesseulescausesdesmalheurspublicsetdelacorruptiondesGouvernements,ontrésolud'exposer,dansuneDéclarationsolennelle,lesdroitsnaturels,inaliénablesetsacrésdel'Homme,theseparateandequalstationtowhichtheLawsofNatureandofNature'sGodentitlethem,afinquecetteDéclaration,constammentprésenteadecentrespecttotheopinionsofmankindrequiresthattheyshoulddeclarethecauseswhichimpelthemtotheseparation.àtouslesMembresducorpssocial,leurrappellesanscesseleursdroitsetleursdevoirs;afinquelesactesdupouvoirlégislatif,etceuxdupouvoirexécutif,pouvantêtreàchaqueinstantcomparésaveclebutdetouteinstitutionpolitique,ensoientplusrespectés;afinquelesréclamationsdescitoyens,fondéesdésormaissurdesprincipessimplesetincontestables,tournenttoujoursaumaintiendelaConstitutionetaubonheurdetous.
The role of the moov atom is to organize this mess and make it readable.
If you know that you should expect French and English words, you can start separating words and with some effort it becomes readable.
Information about what each track contains is stored deep inside the moov section, in atoms called stsd (Sample Descriptions).
Les Représentants du Peuple Français, When in the Course of human events, constitués en Assemblée Nationale, it becomes necessary for one people to dissolve the political bands which have connected them with another, considérant que l'ignorance, l'oubli ou le mépris des droits de l'Homme and to assume among the powers of the earth, sont les seules causes des malheurs publics et de la corruption des Gouvernements, ont résolu d'exposer, dans une Déclaration solennelle, les droits naturels, inaliénables et sacrés de l'Homme, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, afin que cette Déclaration, constamment présente a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. à tous les Membres du corps social, leur rappelle sans cesse leurs droits et leurs devoirs ; afin que les actes du pouvoir législatif, et ceux du pouvoir exécutif, pouvant être à chaque instant comparés avec le but de toute institution politique, en soient plus respectés ; afin que les réclamations des citoyens, fondées désormais sur des principes simples et incontestables, tournent toujours au maintien de la Constitution et au bonheur de tous.
Even armed with French and English dictionaries, separating the words is quite challenging.
In some cases several solutions are possible: take humanevents for example.
You could split it into human events or the more picturesque humane vents; both consist of English words.
Therefore reading (in computer jargon we say parsing) media data is very hard: it requires heuristic methods and will not always give the intended result.
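The ambiguity can be reproduced with a tiny dictionary-based splitter. This is only an illustration of the words analogy, not a media parser; the function name `segmentations` and the four-word vocabulary are assumptions made for the example.

```python
def segmentations(text, words):
    """Return every way to split `text` into entries of `words`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in words:
            # Keep this word and try to split whatever is left.
            for rest in segmentations(text[end:], words):
                results.append([head] + rest)
    return results

vocabulary = {"human", "humane", "events", "vents"}
for split in segmentations("humanevents", vocabulary):
    print(" ".join(split))
```

Both splits are valid as far as the dictionary is concerned, so a parser without extra context has no way to pick the intended one.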
This is why the moov section, in addition to track descriptions, includes an index that tells where each frame (word) starts and ends.
The moov index consists of several tables holding the address, size, time, duration, and display order of each frame of each track. Using this information, the media data starts to make sense.
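To stay with the words analogy, here is a minimal sketch of what such an index buys us. The index is a hypothetical, simplified one: it keeps only a track label, an offset and a size per word, counted in characters rather than bytes, and leaves out time, duration and display order.

```python
bulk = "LesReprésentantsduPeupleWhenintheCourse"

# Hypothetical index entries: (track, offset, size), in characters.
index = [
    ("fr", 0, 3), ("fr", 3, 13), ("fr", 16, 2), ("fr", 18, 6),
    ("en", 24, 4), ("en", 28, 2), ("en", 30, 3), ("en", 33, 6),
]

def read_tracks(bulk, index):
    """Cut the bulk data into per-track word lists using the index."""
    tracks = {}
    for track, offset, size in index:
        tracks.setdefault(track, []).append(bulk[offset:offset + size])
    return tracks

tracks = read_tracks(bulk, index)
print(" ".join(tracks["fr"]))
print(" ".join(tracks["en"]))
```

With the index, no dictionary and no guessing is needed: every word is recovered by a plain slice.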
moov: MOV "Header"
The moov atom is often referred to as the "header", but that's misleading:
The ftyp atom is the true header: a short, predictable piece of data that tells what the file is. Technically, the moov atom is a database, not a header.

The QuickTime file format allows the moov atom to be anywhere inside the file, the most common locations being at the beginning, just after the ftyp (File Type) atom, or at the very end, just after the mdat atom containing the media data.
In the example on the left, we can see the top level atoms ftyp, wide, mdat, moov and udta.
The moov structure has nested atoms, up to 6 levels deep.
This MOV file contains 2 tracks (video and audio) and the critical parts are the media tables called stsd, stts, stsc, stsz and stco.
If any information in the moov atom is missing or incorrect, the MOV file becomes corrupt and unplayable.
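As a rough illustration of how three of these tables work together, the sketch below locates every sample of one track from stco (the file offset of each chunk), stsc (how many samples each chunk holds) and stsz (the size of each sample). It assumes the tables have already been decoded into Python lists and that stsc is sorted by first chunk; the real stsc entries also carry a sample description ID, omitted here. The function name and the numbers are invented for the example.

```python
def sample_locations(stco, stsc, stsz):
    """Return (offset, size) for every sample of one track.

    stco: file offset of each chunk, in order
    stsc: (first_chunk, samples_per_chunk) runs; first_chunk is 1-based
    stsz: size of each sample, in order
    """
    locations = []
    sample = 0
    for chunk_number, chunk_offset in enumerate(stco, start=1):
        # The run that applies is the last one starting at or before this chunk.
        per_chunk = 0
        for first_chunk, count in stsc:
            if first_chunk <= chunk_number:
                per_chunk = count
        offset = chunk_offset
        for _ in range(per_chunk):
            if sample >= len(stsz):
                return locations  # tables disagree: stop at the last known sample
            locations.append((offset, stsz[sample]))
            offset += stsz[sample]
            sample += 1
    return locations

stco = [48, 400, 900]             # three chunks
stsc = [(1, 2), (3, 1)]           # chunks 1-2 hold 2 samples, chunk 3 holds 1
stsz = [100, 120, 90, 110, 80]    # five samples

print(sample_locations(stco, stsc, stsz))
```

If a single offset or size in these tables is wrong, every sample that depends on it is read from the wrong place, which is one way a file with a moov atom can still fail to play.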
Tools to inspect MOV files
There are many free tools to inspect MOV and MP4 files: AtomicParsley, Mp4 Explorer, Dumpster...
This example comes from Atom Inspector, available on the Apple Developer website. Note that these tools won't go very far if the MOV file is damaged or one of the atoms is corrupt.
How to Fix Corrupt MOV Files
How do MOV files become corrupt?
- If audio/video recording ends abruptly, the MOV file will lack the moov atom that makes sense of the video and audio data stored in bulk in the file. Reindexing will be needed (see below).
- If the MOV file has been undeleted or recovered from a formatted card, it probably won't play and will need reindexing as well.
- If the MOV contains a moov atom but doesn't play, there are probably some inconsistencies in the database. If we are lucky, this can be fixed by container structure correction.
Reindexing
To make the corrupt MOV file playable, the moov database has to be recreated.
Media data stored in bulk inside the file has to be parsed. As seen in the "French and English words" example above, this is not a deterministic task. A heuristic algorithm specific to the type of media (French words, English words) has to be developed. It has to be context-aware and use probabilities to determine whether a piece of data is more likely to be video or audio, and exactly where it starts and ends.
One minute of video contains around 4000 frames of audio and video. It means that to repair one minute of video, the tool has to correctly identify 4000 elements stored in bulk inside the file. Taking into account that encoded video and audio don't contain patterns that can easily be identified (that would go against compression performance), the task is complex.
A repair tool with a failure rate of 1/1000 will cause an average of 4 glitches per minute of video repaired. That would be unacceptable.
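The arithmetic behind this figure can be checked in a few lines. The 4000 frames per minute and the 1/1000 failure rate come from the text; treating each frame as an independent trial is an added simplifying assumption, used only to estimate how often a whole minute would come out clean.

```python
frames_per_minute = 4000      # audio and video frames, figure from the text
failure_rate = 1 / 1000       # misidentified frames, figure from the text

# Average number of misidentified frames per repaired minute.
expected_glitches = frames_per_minute * failure_rate

# Assumption: failures are independent from one frame to the next.
clean_minute_probability = (1 - failure_rate) ** frames_per_minute

print(f"expected glitches per minute: {expected_glitches:.1f}")
print(f"chance of a glitch-free minute: {clean_minute_probability:.1%}")
```

Under that independence assumption, fewer than 2 minutes out of 100 would be repaired without a single glitch.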
This is why Aero Quartet develops unique tools for every request it receives. Our tools are configured to identify exactly one type of video and one type of audio, and this gives them a level of quality that generic repair tools can't achieve.