DivX ;-) Repair

Written by Candela
Official Site - please refer to this site for the latest version
Version 1.0, August 2000

 

 

1. Introduction

 

After many hours of tedious downloading, when you finally sat down to watch your movie, some of you may have noticed one of the following things:

 

·        Suddenly the image froze but the sound kept playing. (example video)

·        Misplaced or uniformly coloured blocks of pixels distorted the image for a short time. (example screenshot)

 

You probably didn't pay too much attention to the latter, but in the first case you had to get out of your lazy chair and fast-forward a bit to make the movie play again. If it happened more than once, you probably deleted the file while wishing the ripper of the movie a one-way ticket to hell. But were these 'bad frames', as they are generally called, really the ripper's fault? The answer, in most cases, is quite simply NO. They may in fact be your own fault. In this document I'm going to talk about the cause of these errors and how to solve them. It is based on my own experience and therefore may not be 100% accurate, but I'm always open to suggestions and improvements. I also do not claim to have invented the method for repairing. It is in fact an obvious thing to do, but since I have yet to find someone already using it or a specific program for the task, the idea apparently hasn't crossed anyone's mind. Finally, I would like to mention that even though I'm going to focus on movies, the method will actually work on any type of file.

 

You can contact me at the following email address: divx_repair@hotmail.com. You can also try to find me on IRC (EFNET or DALNET), where I'm known as Candela.

 

2. The cause

 

2.1. Some history

 

When I first saw these errors, I thought something had gone wrong during encoding and that the ripper hadn't bothered to check the movie before releasing it on the Internet. I didn't really give it any more attention until one day I was talking to someone on IRC. It turned out that he had the same movie as me but without the image freezing. On top of that, the filesize of his movie was exactly the same as mine, which proved that it was in fact the same rip and not a different version made by someone else. After playing my version on different systems to make sure my computer wasn't to blame, there was only one possibility left: the file I had was corrupt. After some further investigation this turned out to be true, and I was able to replace the corrupt data (only 2046 bytes to be exact) and successfully repair my movie.

 

2.2. Who's to blame?

 

Why or how does data in the movies get corrupt? In an ideal world it shouldn't happen, but then again in an ideal world I would have lots of money, buy all movies on DVD and wouldn't even bother to write this document. When downloading from the Internet, and especially when resuming broken downloads, things apparently can go wrong. Here is an extract from the GetRight help:

 

Rollback XX K on resumed connections: Because some data may have been corrupted when GetRight was disconnected, this allows you to backup a little bit and reget a small amount of data to be sure that no errors are in the file on your computer. It is suggested that this be the about number of kilobytes (K) downloaded in 2 seconds, which will depend on the speed of your Internet connection. For most modems, the default of 4K is fine.

 

As far as I know, GetRight is the only program that performs this kind of 'rollback' and I have yet to encounter an IRC or FTP client with the same functionality. This means that every other program you resume with can corrupt your files. And there are probably other sources of errors too. Don't despair though, these errors are rare and there are good ways to prevent them. But the errors in the movies show that they do happen.

 

2.3. Prevention

 

The following is already common practice and everyone should be doing it to transfer large files over the Internet. But as said before, the world is not perfect and some people never learn.

 

·        Firstly, do NOT download huge files like movies in one part. Split them into smaller parts first. That way, you don't have to download the entire file again when something goes wrong. Splitting can be done by a multitude of programs (e.g. WinRAR).

·        Secondly, always verify the integrity of the files and download corrupted ones again. The verification can either be done internally by the splitter itself (e.g. WinRAR), by a CRC checker (e.g. WIN-SFV32) or by both to be absolutely sure.

If you are already familiar with all of this you will have no trouble understanding how to repair the movies as it is very similar.
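
For readers who prefer to see the integrity check spelled out, here is a minimal sketch in Python (my choice of language for the examples, not something the tools above use). It computes the CRC-32 of a whole file with the standard `zlib` module; the function name `file_crc32` and the block size are illustrative assumptions.

```python
import zlib

def file_crc32(path, block_size=1 << 20):
    """Return the CRC-32 of a file as an 8-digit uppercase hex string."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            # Feed the running CRC back in so the file is never held in memory.
            crc = zlib.crc32(block, crc)
    return f"{crc & 0xFFFFFFFF:08X}"
```

If the value you compute for a downloaded part differs from the one published with it, that part should be downloaded again.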

 

3. The cure

 

3.1. The basics

 

Let us first get some things straight. This is NOT a 'press one button and your movie is fixed' type of solution. It requires some effort, spare time and a bit of common sense. Even if you think you do not meet these simple requirements, I advise you to read on and decide afterwards.

 

The method is based on the following assumptions (if these are not correct for you don't even bother sending any comments):

 

·        The error is caused by corrupt data and good versions of the movie are available, hence encoding errors cannot be fixed. Also, files where data has not been altered but inserted or removed (see note1) cannot be repaired, as this method is unable to resynchronise after an error.

·        You do not want to download the entire movie again to fix the errors, only the small amount of data that causes the error.

·        You want to restore the movie to its original state, i.e. the state it was in when it was encoded. There are other and easier possibilities to remove the errors (e.g. cutting out the bad frames in an editor like VirtualDub) but these 'mutilate' the file and I find this bad practice. Therefore I will not explain how to do it and neither will I answer questions about it. And it would be best if you did not redistribute this kind of 'fixed' movie.

 

Since you want to replace the corrupt data it's obvious you have to get hold of the good data. So the first thing you have to do is find someone who has the same movie without the error. When you have found someone the only thing left is ask him to send the good data. But how do you know where the file is corrupt and which bytes to copy? Most of you probably know you can find the differences between 2 files by comparing them with programs like a hex editor. Unfortunately this requires both files to be on your hard disk which makes this useless here, as you don't have the error-free movie.
Luckily, there exists such a thing as a CRC, which stands for Cyclic Redundancy Check. The theoretical background of this is not important here. All you need to know is that this number (often 32-bit) is calculated based on the content of a file. If a single byte in the file changes, the CRC will also change. In order to find out if a file is different from another one you only need to compare the CRC (see note2) of both files. This only allows you to see if the file is different though, not what the differences are or where they are located. To circumvent this, you can split (see note3) the files into smaller pieces and compare the CRCs of these pieces. Then you will have a good approximation (see note4) of the location of the error (e.g. in part 5 of 20). Only the bad parts have to be downloaded and replaced. Finally you can merge the pieces back together and your file is repaired.

 

note1: Until now I have only encountered corrupt files where data was altered, so inserted or deleted data probably almost never occurs.

 

note2: There is a slight possibility 2 different files of equal size will have the same CRC but the odds are practically zero and can be neglected.

 

note3: Even though they are perfectly suitable for error prevention, compression programs like RAR cannot be used as a file splitter here (even in store mode). This is because they alter the data of the file they split depending on which options you set. A program that merely copies the data into different pieces is needed here.

 

note4: The only way to find the error exactly is to do a byte by byte comparison of both files which, as I said before, is not possible.
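
The idea of locating an error by comparing per-piece CRCs can be sketched in a few lines of Python. Note that this is a variation on the method described here, not the method itself: instead of writing the pieces to disk and checking them with a separate program, it computes the CRC of each piece in place. The names `piece_crcs` and `differing_pieces` are hypothetical, and each person would run `piece_crcs` on their own copy and exchange only the resulting list of numbers.

```python
import zlib

def piece_crcs(path, piece_size=2_000_000):
    """CRC-32 of each consecutive piece_size-byte piece of the file."""
    crcs = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            crcs.append(zlib.crc32(piece) & 0xFFFFFFFF)
    return crcs

def differing_pieces(mine, theirs):
    """1-based numbers of the pieces whose CRCs differ."""
    if len(mine) != len(theirs):
        # Different piece counts mean different file sizes: not the same rip,
        # or data was inserted/removed, which this method cannot handle.
        raise ValueError("piece counts differ")
    return [n for n, (a, b) in enumerate(zip(mine, theirs), start=1) if a != b]
```

A result such as `[5]` would mean that only piece 5 needs to be transferred. It says nothing about which of the two copies holds the good data.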

 

3.2. Tools of the trade

 

Assuming you already have a player for your movies, you are going to need 2 other Windows (see note1) programs that can be downloaded from the Internet: Topsplit and WIN-SFV32. They are free, small in size, require no installation after unzipping and are very easy to use. You are not obliged to use these specific programs. There are other programs which do exactly the same thing (and are 'compatible'), but these are the best in my opinion. Topsplit, as the name suggests, splits large files into smaller pieces. WIN-SFV32 is a CRC calculator and validator. It uses .SFV files, which are plain text, to store the CRC values.

 

note1: I apologise to users of other OS. I'm sure you can find similar programs yourselves.

 

3.3. Step by step, day by d… (oops almost got carried away there for a moment)

 

Before going any further I would like to emphasise the importance of having a BACKUP copy of your movie. I don't want to be held responsible if you delete or mess up your movie. If you follow my directions closely and read section 4 attentively nothing should go wrong. But you never know because life just isn't fair. Also read the entire document before even thinking about trying anything and never ever do something you don't fully understand.

 

Step 1.

 

The first part is also the hardest. In order to know where your movie is corrupt and to get hold of replacement data, you need access to an error-free copy of the movie. This means you have to search for someone who has it and is willing to help you. IRC is a good place to start. Once you have found a nice person (like myself ;) make sure you don't have different versions of the movie. The best way to check is to compare the filesizes in BYTES (see note1). If they are exactly the same, you can be almost 100% sure it's the same rip. Additional checks include the resolution of the picture, framerate, bitrate, sound quality, etc. These can also be used when comparing incompletely downloaded (see note2) movies, where you obviously are not able to compare filesizes. All this information can easily be obtained in Windows Explorer (Select file · Right click · Properties · General & Details tab).

 

 

 

note1: Do not compare sizes expressed in KB or MB! These are only approximations of the real size (converted and rounded).
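
To illustrate why rounded sizes are not good enough, here is a small Python sketch. The two sizes are made-up example values, and the helper names are assumptions for illustration only.

```python
import os

def size_in_bytes(path):
    """Exact file size in bytes, which is what should be compared."""
    return os.path.getsize(path)

def size_in_mb(size):
    """Size as a rounded MB figure, the kind that hides small differences."""
    return round(size / (1024 * 1024))

# Two hypothetical sizes that differ by 512 bytes but show the same MB figure.
a, b = 681_574_400, 681_574_912
print(size_in_mb(a), size_in_mb(b), a == b)  # prints: 650 650 False
```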

 

note2: You can also use this method as a safe way to finish incompletely downloaded movies because resuming might corrupt your file. And here's a nice thing to know: you can play incomplete movies in VirtualDub or by looking at the Preview tab visible in the above screenshots.

 

Step 2.

 

Next you both have to split the movie into smaller parts. The first thing you need is enough free hard disk space to hold the movie. Then create a directory where you will put the files. Now start up Topsplit and select the movie (Source File Information · Select Source File) and the output directory (Output File Information · Select Output Folder). Decide on a split size to use (I recommend 2.000.000 bytes, read section 4 for further information) and configure Topsplit accordingly (Split Size · Change · By Size). Make sure the output filename is exactly the same for both of you to avoid some minor inconvenience later. Change it if necessary (Miscellaneous · Miscell · Change Split Name).

 

 

Split the movie (Start Process) and it will finish in a couple of minutes depending on your PC configuration (650MB on CD takes about 12 minutes on my lousy P133 with 6x CD-drive).

 

 

When it's done and everything went ok you'll find a lot of files in the output directory named (see note1) .001, .002, .003, … with a size of 2.000.000 bytes each (the last one will usually be smaller).

 

note1: You may also find a .BAT file in the same dir. You can just ignore it or you can prevent it from being created (Miscellaneous · Setting · Split Setting · Batch File Setting · Automatically Generate Merge Batch File). It is used to join the parts again (with the DOS copy command) when you do not have Topsplit installed.
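
For users without Topsplit, a plain splitter is easy to sketch in Python. This is only an illustration of the principle "copy the bytes unchanged into numbered pieces"; the function name `split_file` and the `name.001` naming pattern are my assumptions and may not match Topsplit's output exactly, so both people should use the same tool.

```python
import os

def split_file(source, out_dir, split_size=2_000_000, name=None):
    """Copy source into out_dir as name.001, name.002, ... without altering any bytes."""
    name = name or os.path.basename(source)
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    with open(source, "rb") as src:
        number = 1
        while True:
            data = src.read(split_size)
            if not data:
                break
            part = os.path.join(out_dir, f"{name}.{number:03d}")
            with open(part, "wb") as dst:
                dst.write(data)
            parts.append(part)
            number += 1
    return parts
```

Passing the same `name` on both sides avoids the filename mismatch mentioned above.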

 

Step 3.

 

Next you have to find out where the corrupt data is. One of you will have to create an SFV table. Start WIN-SFV32, select the directory with the split files and press Next.

 

 

Then select the files, choose Create table and press Next again.

 

 

 

Wait for it to finish (about 5 minutes here) and send the .SFV file that was created to the other person. He will use this .SFV file to compare his files with yours. To do this, start WIN-SFV32, select the correct directory and press Next. Point to the .SFV file, select the files and choose Verify files instead of Create table (make sure Delete failed is unchecked if you have the good movie).

 

 

It will start comparing and files that are good will get a green square but when the CRC is different (i.e. the file contains corrupt data) the square will turn red.

 

 

note: I mentioned the filename of both movies had to be the same. If they are not, all files will get a blue square during verifying because they cannot be found. You can edit the .SFV file with a text editor and do a search & replace on the filenames to match yours. Another possibility is to rename all your files or to split again with the correct name.
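
The create and verify steps can also be sketched in Python. The sketch assumes the common plain-text .SFV layout of one `filename CRC` pair per line, with lines starting with `;` treated as comments; the function names and the default `parts.sfv` are hypothetical. The three results correspond to the green, red and blue squares described above.

```python
import os
import zlib

def crc32_hex(path):
    """CRC-32 of a file as 8 uppercase hex digits."""
    crc = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            crc = zlib.crc32(block, crc)
    return f"{crc & 0xFFFFFFFF:08X}"

def create_sfv(directory, sfv_name="parts.sfv"):
    """Write one 'filename CRC' line for every file in directory."""
    names = sorted(
        n for n in os.listdir(directory)
        if n != sfv_name and os.path.isfile(os.path.join(directory, n))
    )
    sfv_path = os.path.join(directory, sfv_name)
    with open(sfv_path, "w") as sfv:
        sfv.write("; generated by an example script, not by WIN-SFV32\n")
        for n in names:
            sfv.write(f"{n} {crc32_hex(os.path.join(directory, n))}\n")
    return sfv_path

def verify_sfv(sfv_path, directory):
    """Return {filename: 'ok' | 'bad' | 'missing'} for every entry in the table."""
    results = {}
    with open(sfv_path) as sfv:
        for line in sfv:
            line = line.strip()
            if not line or line.startswith(";"):
                continue
            # Split on the last whitespace so filenames may contain spaces.
            name, expected = line.rsplit(None, 1)
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                results[name] = "missing"
            elif crc32_hex(path) == expected.upper():
                results[name] = "ok"
            else:
                results[name] = "bad"
    return results
```

Only the entries reported as `bad` need to be requested from the other person.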

 

Step 4.

 

As you have probably guessed by now, you will only have to download the files where the CRC check failed (red square). Often the errors are small and you'll need only a couple of files. Once you've downloaded the good files, replace your own with them. Now it's time to join (see note1) the parts (Merge tab), so fire up Topsplit again and select the first of the split files (Source File Information · Select Source File). Also select an output folder (Output File Information · Select Output Folder) and an output filename (Merge File List · Miscell · Change Merge Name).

 

 

When it's finished you'll find a completely repaired movie on your hard disk. That wasn't so hard was it? ENJOY!

 

note1: You can let Topsplit delete the files as they are joined together (Setting · Merge Setting · Delete Each split file after merge). This is very useful if you do not have enough free space to hold yet another copy of the movie. However, if the files have their read-only attribute set (e.g. when splitting from CD) Topsplit can't delete them. You can turn this attribute off in Windows Explorer (Select files · Right click · Properties · General tab · Attributes · Read-only).
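
Joining is the reverse of splitting: the parts are concatenated in numerical order. Below is a minimal Python sketch, assuming parts named `something.001`, `something.002`, and so on; the function name `join_parts` is hypothetical and this is not how Topsplit itself is implemented.

```python
import os
import shutil

def join_parts(first_part, output_path):
    """Concatenate first_part (ending in .001) and its numbered successors.

    Returns the number of parts that were joined.
    """
    base, ext = os.path.splitext(first_part)
    if ext != ".001":
        raise ValueError("expected the first part to end in .001")
    number = 1
    with open(output_path, "wb") as out:
        while True:
            part = f"{base}.{number:03d}"
            if not os.path.exists(part):
                break  # stops at the first missing number
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
            number += 1
    return number - 1
```

Because the loop stops at the first missing number, a part missing from the middle would silently give a truncated file. Compare the size of the result in bytes with the size of the original before deleting anything.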

 

4. Caveats

 

I have decided to put some warnings and remarks in a new chapter instead of including them in the previous one. This is a very important section in my opinion so be sure to read it very thoroughly. It deals with the rare occasion where both people have a corrupt movie, but the errors are at different locations in the file. As you may have guessed, this allows you to fix both movies but it could also be the reason why your movie is still corrupt after repairing, albeit in a different place. I will also give some comments about the splitsize. I'll say it again: read this section thoroughly! Don't come crying to me when you screw up your movie because you didn't think it was necessary to listen.

 

Up until now you didn't have to pay any special attention to the time where the errors occurred in the movie. You only needed it to check if the other movie was ok at that particular moment. When you replaced all your files that had a different CRC your movie was fixed.

But let's suppose the following situation: movie A is corrupt around time [0:15:10] (hours:minutes:seconds) and movie B has a problem around [1:10:05]. If you repaired movie A with movie B by merely replacing the files with a different CRC, you would get an exact copy of movie B with an error around time [1:10:05]. This is of course not what you want. In this case it is still easy to solve because there is almost 1 hour between the errors. If the CRCs of files .057 and .265 don't match, you can be sure that .057 contains the error at [0:15:10] and that file .265 represents the error at [1:10:05]. To fix movie A you copy file .057 from movie B and to fix movie B you copy file .265 from movie A.

But if the errors are too close to each other (a few seconds apart) they might be in the same file .057. That means that part .057 would contain both good AND bad data in each movie and you would be unable to use it to fix either one. One possibility is to use a smaller splitsize so the errors get separated into different files. Or you could split up file .057 into smaller files and repeat the process. It could get even more complicated, so save yourself some trouble in these cases and find someone who has an error-free version.

Another problem arises with non-visible errors. If only a few bytes in the file are corrupt it is very likely these errors don't show up when watching the movie. However, they will result in a CRC mismatch, and since you cannot see the errors, you will be unable to determine which one of you has the correct data. In that case, don't fix anything unless your movie has other visible errors. Then it is more likely that your movie is the one that is corrupt in several places, but you can never be sure of course.

The main problem in cases where errors are hard to spot is that you are unable to link the time in the movie to the byte position in the file because the video uses a variable bitrate. If you get a CRC mismatch of file .075 you cannot determine at what time in the movie the error should be (or the other way around). So you can't go watch the movie meticulously at a particular time to find out if there are any distortions in the image. I have not found a program that gives me this information. The only thing possible at the moment is to make a rough guess. The best I think you can do is the following. Suppose a movie is 650MB and lasts 1,5 hours. Let S be the size of the file in bytes and L the length of the movie in seconds. The error is at time Te and byte offset Oe.

 

S=681.574.400 bytes

L=5.400 seconds

 

The rate in bytes/second is then: R=S/L=126.217 bytes/second

 

Let's look at 2 cases where either the time or the byte offset is known:

Case 1, estimation of the byte offset:

Te=1800 seconds

Oe=Te*R=227.191.467 bytes

Case 2, estimation of the time:

Oe=200.000.000 bytes

Te=Oe/R=1.585 seconds

 

Remember these are only rough estimations and they can be quite different from the real values, but they should give you an idea where to look.
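
The same arithmetic can be written as a small Python sketch, using the example values for S and L from above. The function names are illustrative, and `part_number` assumes parts are numbered from .001 with the recommended split size of 2.000.000 bytes. The results are only as good as the constant-bitrate assumption behind them.

```python
S = 681_574_400   # file size in bytes (650 MB)
L = 5_400         # running time in seconds (1,5 hours)
R = S / L         # average rate in bytes per second

def estimate_offset(time_s):
    """Rough byte offset for a given time in the movie."""
    return round(time_s * R)

def estimate_time(offset):
    """Rough time in seconds for a given byte offset."""
    return round(offset / R)

def part_number(offset, split_size=2_000_000):
    """Number of the split file (.001, .002, ...) that contains the offset."""
    return offset // split_size + 1

print(round(R))                        # 126217
print(estimate_offset(1800))           # 227191467
print(estimate_time(200_000_000))      # 1585
print(part_number(estimate_offset(1800)))  # 114
```

So an error seen around the 30 minute mark would be expected somewhere near part .114, give or take quite a few parts.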

 

I'm going to conclude by saying something about the splitsize. The size of 2.000.000 bytes has been chosen as a compromise between several factors. Firstly, files of this size can be downloaded fairly quickly, even at slower speeds. Using the above approximations they contain about 15 seconds of video, so errors can be close in time and still be separated in the files. You get around 340 files per CD, which is an acceptable amount because directories with many files are slow to handle. And finally, since errors are usually only a few KB in size, the bigger the files the more superfluous data you have to download. Therefore I have decided to use this size, but you are of course free to use another.
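
To see how a different splitsize changes the picture, the two figures quoted above can be recomputed with a short Python sketch. It assumes the same example movie (650 MB, 1,5 hours) and a constant bitrate; `split_stats` is a hypothetical helper name.

```python
import math

def split_stats(file_size, length_s, split_size):
    """Return (number of parts, approximate seconds of video per part)."""
    parts = math.ceil(file_size / split_size)
    seconds_per_part = split_size / (file_size / length_s)
    return parts, seconds_per_part

for size in (500_000, 2_000_000, 10_000_000):
    parts, seconds = split_stats(681_574_400, 5_400, size)
    print(f"{size:>10} bytes: {parts:>4} parts, about {seconds:.1f} s each")
```

Smaller parts separate nearby errors better and mean less data to transfer, at the cost of many more files to handle.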

 

5. Conclusions

 

5.1. Why repair movies?

 

The motivation for this is mainly personal. Maybe you don't mind the errors or you feel it is too much work to repair and would rather download the movie again on your super-fast T3 connection. This is fine by me, but let me give you some reasons why some people would prefer to repair the movies:

 

·        You don't want to waste a CDR on a corrupt movie. Sometimes there is more than one error and they can last several seconds (I even had a movie with so much corrupt data it refused to play).

·        Downloading 650MB takes time no matter how fast your connection is. Often, repairing will take less time, especially if you have a 56K modem and spent 3 days of non-stop downloading to get your movie :-)).

·        If your provider only allows you to generate a certain amount of traffic (e.g. 2GB/week), you probably do not want to sacrifice 650MB to fix a small error.

·        You got the movie from a ratio server (or traded for it) and you have to upload another 650MB to download it again.

·        Most people are more likely to send you 2MB than 650MB.

·        The movie was already corrupt on the server, which means downloading will have no effect.

·        You risk getting new errors by downloading again.

·        You might want to send the movie to other people and don't want to give them a corrupt movie.

·        It's a bad habit to cause unnecessary traffic on the Internet, just like in real life. Think about other people for a change :-p.

·        Etc.

 

5.2. The reason for this document

 

Very simple, to become rich and famous. No seriously, up until now I have had to fix 10% of the movies I have downloaded. This is quite a lot and I hope you have been more fortunate than me. Almost all of them were already corrupt on the server, so it wasn't my own fault, which means there are a lot of people out there experiencing the same problems.

Now I thought the time was right to share my findings with the rest of the world. Mainly because I don't think the method can be optimised even further without someone writing a program (any volunteers?).

Another reason is to get those corrupt movies out of circulation. I'm sick of them and I know I am not alone.

And finally, I wanted to make people aware of current problems and show them how they can be avoided in the future (see section 2.3.). Even though this document describes a way to fix the errors, it's always better to prevent a disease than to find a cure afterwards.

 

5.3. What's next?

 

Should you decide to go out and try to fix your corrupt movies you will soon discover that finding someone with a good version is hard and takes a lot of time. The repairing itself can be finished in as little as half an hour. But it would be a whole lot easier if people decided to work together. I was thinking of maybe creating an IRC channel or ICQ Activelist where you could come and ask for help. Keeping an archive of .SFV files for error-free movies would also come in handy. People would just have to download it, verify their movie and ask for the parts they need. If anyone has any other good ideas or would like to help, please contact me.

 

5.4. Credits

 

I would like to thank all the people on IRC who took the time to help me fix my movies. You know who you are.

 

Congratulations to all the authors of some of my favourite programs mentioned and/or linked to in this text. Keep up the good work (and sorry but I can't afford to register :).

 

BIG thanks of course to all the guys of the Microsoft Corporation involved in the development of their wonderful MPEG-4 codec, and also to the hackers who made DivX ;-) possible.

 

To all the people that laughed and tried to make me look like an idiot when I proposed my method on Usenet, I would like to say FUCK YOU! You know who you are too.

 

I would also like to thank the people who proofread this document for their comments and suggestions.

 

And finally I'd like to say hello to all my friends both on- and offline.