Understanding patches

A patch is a plain text file that describes the differences (and nothing else) between two files. Patches are a very convenient way to provide modifications to programs - you modify the sources, generate a patch and submit it back to the upstream author - as they describe exactly what you changed. The original author can easily check your modifications and decide whether they are wrong or not.

Patches are very easy to understand, assuming they are in unified format (the only one I'm going to explain here). Let's see an example to make things clear. We start with two files, called file1 and file2. file1 contains the following:

This is the first line.
A line of text.
This is the last line.

And file2 contains:

This is the first line.
A modified line of text.
This is the last line.

To generate the patch between these two files, just do: diff -u file1 file2 (note that the original file goes first, while the modified goes last). The output of this command will be something like:

--- file1       2004-07-23 22:59:00.000000000 +0200
+++ file2       2004-07-23 22:59:05.000000000 +0200
@@ -1,3 +1,3 @@
 This is the first line.
-A line of text.
+A modified line of text.
 This is the last line.

The first two lines are the header. The first - the one that starts with three dashes - shows the name of the original file and its timestamp. The second - starting with three plus signs - has the same information but for the destination file. The second line is especially important, as the patch(1) utility will use the name in it to look for the file to be modified. More on this below.
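
If you'd rather experiment from a script than from the shell, the same kind of output can be produced with Python's standard difflib module. This is only a sketch: Python is my choice for illustration, make_patch is a made-up helper name, and the header it prints carries no timestamps because none are passed in.

```python
import difflib

file1 = [
    "This is the first line.",
    "A line of text.",
    "This is the last line.",
]
file2 = [
    "This is the first line.",
    "A modified line of text.",
    "This is the last line.",
]


def make_patch(old_lines, new_lines, old_name, new_name):
    """Return a unified diff as a list of lines (hypothetical helper)."""
    # lineterm="" because the input lines carry no trailing newlines.
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


patch = make_patch(file1, file2, "file1", "file2")
print("\n".join(patch))
```

As with diff -u, the original goes first and the modified version second.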

After the header come one or more chunks (often called hunks), each introduced by a line starting and ending with @@. That line gives the position where the chunk has to be applied: in @@ -1,3 +1,3 @@, the -1,3 means the chunk covers 3 lines of the original file starting at line 1, and the +1,3 means it covers 3 lines of the modified file starting at line 1.
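
To make the numbers in a chunk header concrete, here is a small Python sketch that pulls them out of a line such as @@ -1,3 +1,3 @@. The names HUNK_HEADER and parse_hunk_header are mine, not part of any tool.

```python
import re

# "@@ -START[,COUNT] +START[,COUNT] @@"; a missing COUNT means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a chunk header."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError("not a chunk header: %r" % line)
    old_start, old_count, new_start, new_count = match.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


print(parse_hunk_header("@@ -1,3 +1,3 @@"))  # -> (1, 3, 1, 3)
print(parse_hunk_header("@@ -5 +5,2 @@"))    # -> (5, 1, 5, 2)
```

The first pair refers to the original file and the second pair to the modified one.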

Inside each chunk there is a block of text (the last four lines of my example). Lines starting with a dash are lines removed from the original file (file1). Lines starting with a plus sign are lines added in the modified file (file2). Finally, lines starting with a single space describe the context of the changes: they are used by the patch utility to locate the exact place where the chunk applies (i.e., those lines are searched for around the position specified in the chunk's header).
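
The meaning of the three prefixes can be shown with a few lines of Python. This is just an illustration (split_chunk is a hypothetical helper): it rebuilds the "before" and "after" text covered by a chunk from the chunk's body alone.

```python
def split_chunk(body_lines):
    """Return (old_lines, new_lines) described by the body of one chunk."""
    old_lines, new_lines = [], []
    for line in body_lines:
        marker, text = line[:1], line[1:]
        if marker == " ":      # context: present in both files
            old_lines.append(text)
            new_lines.append(text)
        elif marker == "-":    # removed: only in the original file
            old_lines.append(text)
        elif marker == "+":    # added: only in the modified file
            new_lines.append(text)
        else:
            raise ValueError("unexpected line in chunk: %r" % line)
    return old_lines, new_lines


body = [
    " This is the first line.",
    "-A line of text.",
    "+A modified line of text.",
    " This is the last line.",
]
old, new = split_chunk(body)
print(old)
print(new)
```

The first list matches file1 and the second matches file2, because the example chunk happens to cover both files entirely.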

When you have two completely unrelated files, "removed" and "added" don't have any real meaning (but in that case, why would you want to make a patch? ;-). But when you are modifying an existing file, they do.

As you can see, the patch just includes the lines that differ (and a few others that surround the change). So, if the file is large, all the parts of it that haven't been touched are left out of the patch, which saves a lot of space.
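
A quick way to convince yourself of the space saving is to diff a large file with a single change. The sketch below uses Python's difflib with made-up contents (1000 numbered lines, one of them edited) and the usual three lines of context.

```python
import difflib

# Hypothetical data: a 1000-line file where only line 500 changes.
original = ["line %d" % i for i in range(1, 1001)]
modified = list(original)
modified[499] = "line 500, modified"

patch_lines = list(
    difflib.unified_diff(original, modified, "file1", "file2", lineterm="")
)

print(len(original))   # -> 1000
print(len(patch_lines))  # -> 11
print(patch_lines[2])  # -> @@ -497,7 +497,7 @@
```

The 11 lines are the two header lines, the chunk header, three context lines on each side of the change, one removed line and one added line.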

Finally, suppose you have file1 on your computer and I send you the patch in a file named patch.diff so that you can get file2. How do you apply these changes? Easy. Start by renaming (or copying) file1 to file2; patch(1) will look for the latter as it appears in the second line of the header (remember I said the name on the second line was important? this is why). Then, just apply the patch: patch -p0 < patch.diff. If both file1 and file2 exist, some versions of patch may pick file1 instead; in that case, name the target explicitly: patch file2 < patch.diff. Depending on the version and options (GNU patch, for instance, wants -b for this), the program will back up the original file as file2.orig, and the new copy of file2 will have all the changes applied. Nice, eh? ;-)
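
To see roughly what happens when a patch is applied, here is a deliberately simplified Python sketch. It is not how patch(1) is implemented: it handles a single file, trusts the line numbers in the chunk headers, and gives up on any mismatch instead of searching nearby as the real tool does. The name apply_unified_patch is made up.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_unified_patch(original_lines, patch_lines):
    """Apply a single-file unified patch to a list of lines (no fuzz, no offsets)."""
    result = []
    pos = 0           # index of the next unread line in original_lines
    in_chunk = False  # header lines come before the first chunk
    for line in patch_lines:
        match = HUNK_HEADER.match(line)
        marker = line[:1]
        if match:
            start = int(match.group(1))
            count = int(match.group(2)) if match.group(2) is not None else 1
            if count:
                start -= 1  # 1-based line number to 0-based index
            # Copy the untouched lines that precede this chunk.
            result.extend(original_lines[pos:start])
            pos = start
            in_chunk = True
        elif not in_chunk:
            continue  # "---" / "+++" header lines
        elif marker in (" ", "-"):
            # Context and removed lines must match the original exactly.
            if pos >= len(original_lines) or original_lines[pos] != line[1:]:
                raise ValueError("patch does not match at line %d" % (pos + 1))
            if marker == " ":
                result.append(line[1:])
            pos += 1
        elif marker == "+":
            result.append(line[1:])
        elif marker == "\\":
            continue  # "\ No newline at end of file" note
        else:
            raise ValueError("unexpected line in patch: %r" % line)
    # Copy whatever follows the last chunk.
    result.extend(original_lines[pos:])
    return result


file1 = ["This is the first line.", "A line of text.", "This is the last line."]
patch = [
    "--- file1",
    "+++ file2",
    "@@ -1,3 +1,3 @@",
    " This is the first line.",
    "-A line of text.",
    "+A modified line of text.",
    " This is the last line.",
]
print("\n".join(apply_unified_patch(file1, patch)))
```

Feeding it a file whose contents do not match the context lines raises an error, which is a crude version of the rejection you get from patch when a chunk cannot be applied.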
