Wednesday 13 October 2010

Comparing two files under LINUX [Basic Linux Command - Part II]

In this post, I am going to introduce you to the methods of comparing two files in Linux. This is the second article on basic Linux commands, following the previous post.

Comparing two files can be treated as a two-step task. The first step is to count the number of lines, words and characters in each file. The second step is to check which lines actually differ.

We use the wc command to print the newline count (that is, the number of lines), the word count and the byte count of the specified file (or of standard input if no file is specified).

Let me take the following test.c file as an example and run the wc command on it.

---------test.c----------

#include <stdio.h>
int main (int argc, char *argv[])
{
    printf("SAMAR\n");
    return 0;
}

samar@samar-laptop:~/Desktop$ wc test.c
6 13 88 test.c

So this file contains 6 lines, 13 words and 88 bytes. By default wc counts bytes rather than characters, but the two are the same for plain ASCII text like this.
You can also pass multiple files to this command at once:

samar@samar-laptop:~/Desktop$ wc test.c test.cpp
6 13 88 test.c
9 18 112 test.cpp
15 31 200 total
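If you only need one of the counts, wc can print it on its own with the -l (lines), -w (words) and -c (bytes) options. Below is a minimal sketch of that; the temporary directory and the sample.txt file are made up for illustration and are not part of the examples above.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical sample file: 2 lines, 3 words, 14 bytes.
printf 'one two\nthree\n' > "$workdir/sample.txt"

# Reading from standard input (<) keeps the file name out of the output.
# Wrapping the result in $(( )) strips any padding some wc versions add.
lines=$(( $(wc -l < "$workdir/sample.txt") ))
words=$(( $(wc -w < "$workdir/sample.txt") ))
bytes=$(( $(wc -c < "$workdir/sample.txt") ))

echo "lines=$lines words=$words bytes=$bytes"

# Clean up the throwaway directory.
rm -r "$workdir"
```

Storing each count in a variable like this is handy when you want to compare the numbers for two files inside a script instead of reading them by eye.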

This way, you can calculate and compare the line, word and character counts of two files. Now, we will look at how to compare files line by line and see exactly which lines differ.

The diff command allows us to display the line-by-line difference between two files.

Let's view the contents of two files, 1.txt and 2.txt, and then run the diff command to see how they differ line by line.

samar@samar-laptop:~/Desktop$ cat 1.txt
www.techgaun.blogspot.com
samar dhwoj acharya

samar@samar-laptop:~/Desktop$ cat 2.txt
www.techgaun.blogspot.com
saurya dhwoj acharya
my brother

samar@samar-laptop:~/Desktop$ diff 1.txt 2.txt
2c2,3
< samar dhwoj acharya
---
> saurya dhwoj acharya
> my brother

In the output, lines from the first file are marked with a less-than sign (<) and lines from the second file are marked with a greater-than sign (>). The first line of the output, 2c2,3, means that line 2 of the first file has to be changed to match lines 2 to 3 of the second file. This is how you can see the lines where the two files differ from each other. Further, we have the sdiff command, which shows the two files side by side so that the differences are easier to spot. I'll let you explore this command on your own. Also, if you want a case-insensitive comparison with the diff command, you can use the -i option.
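Here is a small sketch you can use as a starting point for trying out diff -i and sdiff. It recreates 1.txt and 2.txt from above in a temporary directory; the upper.txt file is a hypothetical extra that I made up so that the only difference from 1.txt is the letter case.

```shell
#!/bin/sh
# Recreate the two files from this post in a throwaway directory.
workdir=$(mktemp -d)
printf 'www.techgaun.blogspot.com\nsamar dhwoj acharya\n' > "$workdir/1.txt"
printf 'www.techgaun.blogspot.com\nsaurya dhwoj acharya\nmy brother\n' > "$workdir/2.txt"

# Hypothetical file: same text as 1.txt, but with different letter case.
printf 'WWW.TECHGAUN.BLOGSPOT.COM\nSamar Dhwoj Acharya\n' > "$workdir/upper.txt"

# diff exits with status 0 when the files match and 1 when they differ.
diff "$workdir/1.txt" "$workdir/upper.txt" > /dev/null
plain_status=$?

# With -i, differences in letter case are ignored.
diff -i "$workdir/1.txt" "$workdir/upper.txt" > /dev/null
nocase_status=$?

echo "plain diff: $plain_status, diff -i: $nocase_status"

# sdiff prints both files side by side (skipped if sdiff is not installed).
if command -v sdiff > /dev/null 2>&1; then
    sdiff "$workdir/1.txt" "$workdir/2.txt"
fi

# Clean up the throwaway directory.
rm -r "$workdir"
```

The exit status is what makes diff useful in scripts: you can test it in an if statement without having to read the output at all.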

Also, I would be more than happy if you tried to learn other commands, such as cmp, on your own.
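To get you started with cmp, here is a rough sketch that again recreates 1.txt and 2.txt from above in a temporary directory (the directory and the result variable are my own additions for illustration). Unlike diff, cmp compares the files byte by byte.

```shell
#!/bin/sh
# Recreate the two files from this post in a throwaway directory.
workdir=$(mktemp -d)
printf 'www.techgaun.blogspot.com\nsamar dhwoj acharya\n' > "$workdir/1.txt"
printf 'www.techgaun.blogspot.com\nsaurya dhwoj acharya\nmy brother\n' > "$workdir/2.txt"

# cmp reports the position of the first byte that differs.
# For these files that is byte 29, on line 2 (the 'm' in samar vs the 'u' in saurya).
cmp "$workdir/1.txt" "$workdir/2.txt"

# With -s, cmp prints nothing and only sets the exit status:
# 0 if the files are identical, 1 if they differ.
if cmp -s "$workdir/1.txt" "$workdir/2.txt"; then
    result="identical"
else
    result="different"
fi
echo "The files are $result"

# Clean up the throwaway directory.
rm -r "$workdir"
```

Since cmp stops at the first difference, it is a quick way to check whether two files are the same before you bother with a full diff.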

I hope you learned something from this. Have fun and Happy Vijaya Dashami. :)