Wdiff compare – Good!


[update Dec 2010: click here for my attempt at an online comparison service.]

(see also: Real World Test, Docdiff post, OOo compare post)

I previously commented (adversely) on the performance of OpenOffice.org’s compare function.   Tridge suggested that I might try wdiff – and I’m pleasantly surprised.  When I ran wdiff on the two text files I used in that post it gave me this:

[-1.1Contractor-]{+1.1[Vendor]+} may invoice [-[Customer]-] {+Customer the Fees for each Service in accordance with the Payment Terms and, where no relevant time is set out in the Payment Terms, [Vendor] may invoice the Customer+} for ongoing fees quarterly in [-arrears-] {+advance+} and other fees [-monthly-] in arrears.  [-Contractor must-]  {+[Vendor] will+} provide [-[Customer]-] {+Customer+} with a tax invoice in respect of all GST charged.  [-[Customer] is not liable to pay any amount in respect of GST except as set out in a valid tax invoice.-]

Which, if you can decipher the inline formatting instructions, is a better performance than OOo (not that that would be difficult, mind…). In fact, if I’m reading it correctly, it seems to be only slightly worse than the output of Word’s compare: it marks a change within a “word” as a change of the whole word, so the [-[Customer]-] {+Customer+} is not a minimal change set (it should be [-[-] and [-]-]). I will try to find some time to test it on longer clauses or whole agreements. A bonus is that its output format is cut-and-paste compatible, albeit a little visually unappealing. It shouldn’t be too much trouble to pipe it through something to pretty it up (the -p option apparently prints overstrikes, but this doesn’t work on my tty; ah, -t is what I need for more visual output on a tty). A second bonus is that -s gives change statistics (I don’t recall ever seeing this in Word).
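As a rough illustration of "piping it through something to pretty it up", here is a minimal Python sketch that turns wdiff's default [-…-] and {+…+} markers into ANSI terminal colours. The function name colourise and the choice of red/green backgrounds are my own assumptions for illustration, not anything wdiff provides, and the regular expressions will be fooled by text that itself contains "-]" or "+}".

```python
import re

# wdiff's default markers: [-deleted text-] and {+inserted text+}.
# Non-greedy matching stops at the first closing marker, so a deleted
# word such as "[Customer]" (which contains "]" but not "-]") is fine.
DELETED = re.compile(r"\[-(.*?)-\]", re.DOTALL)
INSERTED = re.compile(r"\{\+(.*?)\+\}", re.DOTALL)

# Standard ANSI SGR escape sequences (an assumed colour scheme).
RED_BG = "\033[41m"
GREEN_BG = "\033[42m"
RESET = "\033[0m"


def colourise(text):
    """Replace wdiff markers with coloured backgrounds."""
    text = DELETED.sub(lambda m: RED_BG + m.group(1) + RESET, text)
    text = INSERTED.sub(lambda m: GREEN_BG + m.group(1) + RESET, text)
    return text


if __name__ == "__main__":
    sample = "fees quarterly in [-arrears-] {+advance+} and other fees [-monthly-] in arrears."
    print(colourise(sample))
```

To use it as a filter, the last two lines could instead read from standard input (for example with sys.stdin.read()) and be fed the output of wdiff run on the two files.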

The other nice thing about wdiff is that it was already installed on my machine!  About two (maybe three??) years ago I googled around for something that would do diffs on text files, but the searches only showed diff, kdiff etc.  Wdiff apparently dates back to the early 90s so my search terms maybe weren’t the best – or too many other people are using and linking to diff!  Anyway, wdiff seems to be part of the standard install for OpenSuSE and I’d had it all along.

And thanks Tridge!

Update:

Tridge says:

There are probably quite a few other similar utilities. Minimal diff
algorithms are a nice little computer science task. I'm sure heaps of
people have written variants. I expect quite a few CS courses use this
sort of thing as an exercise for 2nd year programming students...

Improving the algorithms in OOo would be a nice Google summer of code
project.

Which I think would be a great idea and a valuable addition to OOo.  If anyone’s game to try (either as a student or a supervisor) please drop me an email or lodge a comment and I’ll try to connect you.
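To give a feel for the kind of exercise Tridge means, here is a minimal sketch of a word-level diff built on a longest common subsequence table, emitting wdiff-style markup. It is not wdiff's own code and not what OOo does; the function names (lcs_table, word_diff, to_markup) are hypothetical, words are simply whatever str.split() yields, and the original whitespace is not preserved.

```python
import itertools


def lcs_table(a, b):
    """table[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def word_diff(old, new):
    """Return a list of (tag, word) pairs; tag is '=', '-' or '+'."""
    a, b = old.split(), new.split()
    table = lcs_table(a, b)
    ops = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append(("=", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping the old word loses nothing from the common subsequence.
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    ops.extend(("-", w) for w in a[i:])  # leftover old words are deletions
    ops.extend(("+", w) for w in b[j:])  # leftover new words are insertions
    return ops


def to_markup(ops):
    """Group consecutive words with the same tag into wdiff-style markers."""
    out = []
    for tag, group in itertools.groupby(ops, key=lambda op: op[0]):
        words = " ".join(word for _, word in group)
        if tag == "-":
            out.append("[-" + words + "-]")
        elif tag == "+":
            out.append("{+" + words + "+}")
        else:
            out.append(words)
    return " ".join(out)


if __name__ == "__main__":
    print(to_markup(word_diff("a b c", "a x c")))
```

The table costs time and memory proportional to the product of the two word counts, which is fine for a clause but would need a smarter algorithm for whole agreements.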

Update 2:

Following someone’s suggestion, I have tried meld – am impressed by the interface, but not by the markup.  Run on the same two sample text files it marked:

with a tax invoice in respect of all GST charged.

as unchanged, with the balance as deletion/addition.  Sadly, this is a better performance than OOo (but is still inadequate).


12 Responses to “Wdiff compare – Good!”


  1. 1 IanM 20 March 2009 at 1:13 pm

    I pipe the output of wdiff through a short C program I wrote which turns wdiff’s markup into the ANSI terminal codes for colour. So the “old” text is marked with a red background and the “new” text with a green background.

    Looks great in a bash terminal.

  2. 2 Pmartino 22 March 2009 at 12:42 am

    I never tried it, but this extension looks interesting even if it is not open source

    http://extensions.services.openoffice.org/project/DeltaXMLODTCompare

  3. 3 jan 22 March 2009 at 4:36 am

    The program wdiff is a front end to diff for comparing files on a word per word basis. A word is anything between whitespace. This is useful for comparing two texts in which a few words have been changed and for which paragraphs have been refilled. It works by creating two temporary files, one word per line, and then executes diff on these files. It collects the diff output and uses it to produce a nicer display of word differences between the original files.

    This is the first paragraph of the first link in your article. As you can see, wdiff is designed to work on a word by word basis. It doesn’t try to find a minimal change set on a per character (or per bit) basis. Besides, I’m not sure whether this would actually be useful.

  4. 4 Jo 22 March 2009 at 6:56 am

    If you’re just comparing plain text files, have a look at ‘meld’ (http://meld.sourceforge.net/). This is a graphical diff and editor utility in one. Works line by line though, I’m afraid.

    Jo

  5. 5 Uwe Brauer 23 March 2009 at 9:13 pm

    docdiff!

    I think docdiff is what you are looking for.

    It produces (using html markup) a diff file in which word by
    word the differences are highlighted.

    Apart from this, it is part of debian/ubuntu

    Uwe Brauer

  6. 6 akshay 27 March 2009 at 5:02 am

    Hey…
    I am a student interested in doing this GSoC project… I did some research behind wdiff and found the actual diff program and then the actual algorithm this stuff is based upon… the Longest Common Subsequence algo… I would really love to do this stuff, so please try to *connect* me…
    thnx…

  7. 7 brendanscott 27 March 2009 at 8:40 am

    Hi Akshay

    The first place to look is the Go-OO Summer of Code Page:

    http://freedesktop.org/wiki/Software/ooo-build/SummerOfCode/2009

    I will see if I can dig up some other info for you.

    Regards

    Brendan

  8. 8 Uwe Brauer 4 March 2013 at 9:17 pm

    (X)emacs has the ediff utility, which at least for not too big files, allows
    wordwise comparison and navigation between these differences.


  1. 1 OOo Compare: Inadequate « Brendan Scott’s Weblog Trackback on 20 March 2009 at 11:57 am
  2. 2 Boycott Novell » Links 21/03/2009: GNU/Linux Advances; Free Software Prioritised in Germany Trackback on 22 March 2009 at 11:24 am
  3. 3 Docdiff compare - Good! + Roundup « Brendan Scott’s Weblog Trackback on 24 March 2009 at 10:10 pm
  4. 4 Real World Test – Wdiff Best « Brendan Scott’s Weblog Trackback on 15 July 2009 at 4:03 pm
