Archive for March, 2009

Microsoft Anti-Open Source Patent Suit Settles

From eweek.com:

According to Microsoft:

The agreement includes patent coverage for Microsoft’s three file management systems patents provided in a manner that is fully compliant with TomTom’s obligations under the General Public License Version 2 (GPLv2).


Docdiff compare – Good! + Roundup

(see also: Real World Test)

Docdiff compare of sample clauses

This is part 3 in my (unfortunately, apparently neverending) search for an adequate markup facility that doesn’t rely on MS Word.  The earlier two installments related to OOo and wdiff.

Uwe (thanks Uwe!) posted a comment to my Wdiff post suggesting I try docdiff. Docdiff is a Ruby script which has a number of output formatting options (and looks prettier than wdiff). Running docdiff on the sample files gives:

1.1Contractor 1.1[Vendor] may invoice [Customer] Customer the Fees for each Service in accordance with the Payment Terms and, where no relevant time is set out in the Payment Terms, [Vendor] may invoice the Customer for ongoing fees quarterly in arrears advance and other fees monthly in arrears. Contractor must [Vendor] will provide [Customer] Customer with a tax invoice in respect of all GST charged.  [Customer] is not liable to pay any amount in respect of GST except as set out in a valid tax invoice. charged.

The cut and paste here has removed the colored highlighting from each add/remove – see the screenshot above for a general indication. I could do without the different colors for different types of addition/deletion. This output looks very similar to the markup provided by wdiff, including the marking of “[Customer]” being replaced by “Customer”. It has also caught “charged” as a deletion/addition when it shouldn’t have; wdiff didn’t make this mistake. However, docdiff has another trick up its sleeve: it can work as a character-based engine with the --char switch. With this on, the output is:

1.1Co[Ventractdor] may invoice [Customer] the Fees for each Service in accordance with the Payment Terms and, where no relevant time is set out in the Payment Terms, [Vendor] may invoice the Customer for ongoing fees quarterly in arredvarsnce and other fees monthly in arrears. Co[Ventractdor] mustwill provide [Customer] with a tax invoice in respect of all GST charged. [Customer] is not liable to pay any amount in respect of GST except as set out in a valid tax invoice.

The markup is more minimal – and it hasn’t incorrectly caught “charged”.  However, it’s come up with a strange result I’ve never seen in a mark up before – a letter by letter change of Contractor to Vendor.  It’s a bit unnerving to see something like  Co[Ventractdor] appear in a markup.  Conceptually, you probably do want the whole word marked as a change – to indicate you’ve stopped calling one of the parties one thing, and are now calling it another.
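The difference between the two resolutions can be reproduced in a rough way with Python's standard difflib module. This is only a sketch: difflib is not docdiff's engine, and the helper name `mark_changes` and the shortened sample strings are my own assumptions for illustration. It shows why a word-level compare reports one clean replacement while a character-level compare breaks the same change into fragments.

```python
from difflib import SequenceMatcher


def mark_changes(old, new):
    """Return (tag, old_piece, new_piece) for every span that is not unchanged.

    `old` and `new` can be lists of words or plain strings, so the same
    function works at word resolution and at character resolution.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Shortened versions of the sample clauses (hypothetical test strings).
old = "Contractor must provide"
new = "[Vendor] will provide"

# Word resolution: the party name and the verb change as whole words.
word_changes = mark_changes(old.split(), new.split())

# Character resolution: shared letters such as "or" are kept as unchanged,
# so the rename is split into several small fragments.
char_changes = mark_changes(old, new)

for change in word_changes:
    print("word:", change)
for change in char_changes:
    print("char:", change)
```

At word resolution the whole of "Contractor must" is reported as replaced by "[Vendor] will", which is closer to what a reader of a contract markup expects.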

The Wash Up So Far

Docdiff provides very readable output, albeit subject to some inaccuracy at the word level of resolution. I suspect that it would not be too hard to coax wdiff to produce similar output (using the --start-delete=STRING etc. switches).[1] That said, docdiff also provides a variety of different output formats and the ability to use the (slightly weird) --char option. It’s hard to say which is better.

In an update to my wdiff post, I also tested meld – which performed poorly, albeit better than OOo (sad, but true).  A test of meld is unfair though as this is outside its intended domain of application.  Everybody loves a ranking, so my current rankings (out of 10) of them are:

Word 8.5 (markup known to go feral at times)

docdiff 7.5 (idiosyncrasies in the operation of the engine are offset by its output and flexibility – revised down a little based on some more testing of the engine)

wdiff 7.5 (I am assuming I can script something relatively easily to produce more visually appealing markup)

meld 3

OOo 2

diff 2

This is all subject to the caveat that I haven’t tried these tools (other than Word) on long documents.  I may find that they go feral too for long files.  The take away point though is that there are free software solutions which give acceptable markup performance, at least on .txt files.  I have never had the need to compare formatting, so a .txt limitation is no great shakes for me.

I have had a search in YaST for diff. It’s hard to tell, but the things it throws up seem to be line based (like diff). If anything looks interesting there I will investigate further.

Finally, improving compare performance has been added as a potential project for Go-OO in Google’s Summer of Code.  If you’re interested, here’s your chance to make your mark! (up… errr …. so to speak)

Notes:

[1]  The answer is:

wdiff --start-delete=\<del\> --end-delete=\<\/del\> --start-insert=\<ins\> --end-insert=\<\/ins\>

or, prettier:

wdiff --start-delete=\<del\ style=\"color:red\"\> --end-delete=\<\/del\> --start-insert=\<ins\ style=\"color:blue\"\> --end-insert=\<\/ins\>

Although it needs more smarts for multi-line text, because browsers need <p> tags.
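One possible way to add those smarts is a small post-processing step on the wdiff output. The following is a minimal sketch, assuming the marked-up text uses blank lines between paragraphs; the function name `wrap_paragraphs` is hypothetical and nothing here is part of wdiff itself.

```python
import re


def wrap_paragraphs(marked_up):
    """Wrap each blank-line-separated block of marked-up text in <p>...</p>.

    Line breaks inside a block are collapsed to single spaces, since a
    browser ignores them anyway.
    """
    blocks = re.split(r"\n\s*\n", marked_up.strip())
    return "\n".join(
        "<p>" + " ".join(block.split()) + "</p>"
        for block in blocks
        if block.strip()
    )


sample = "fees in <del>arrears</del>\n<ins>advance</ins>\n\nsecond clause\n"
print(wrap_paragraphs(sample))
```

Two limitations of this sketch: an insertion or deletion that spans a paragraph break would leave a <del> or <ins> open across two <p> elements, and any literal < or & in the source text would need escaping before the markup is added.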

NZ Gets a Clue on 92

The National Business Review reports that NZ is looking to rework the three strikes law (variously referred to as s 92, 92a and s 92A) (see also Michael Geist). Pity other governments don’t demonstrate a similar commitment to the protection of citizens’ rights. That said, who knows whether the revamped version will be better?

Wdiff compare – Good!

[update Dec 2010: click here for my attempt at an online comparison service.]

(see also: Real World Test, Docdiff post, OOo compare post)

I previously commented (adversely) on the performance of OpenOffice.org’s compare function.   Tridge suggested that I might try wdiff – and I’m pleasantly surprised.  When I ran wdiff on the two text files I used in that post it gave me this:

[-1.1Contractor-]{+1.1[Vendor]+} may invoice [-[Customer]-] {+Customer the Fees for each Service in accordance with the Payment Terms and, where no relevant time is set out in the Payment Terms, [Vendor] may invoice the Customer+} for ongoing fees quarterly in [-arrears-] {+advance+} and other fees [-monthly-] in arrears.  [-Contractor must-]  {+[Vendor] will+} provide [-[Customer]-] {+Customer+} with a tax invoice in respect of all GST charged.  [-[Customer] is not liable to pay any amount in respect of GST except as set out in a valid tax invoice.-]

Which, if you can decipher the inline formatting instructions, is a better performance than OOo (not that that would be difficult, mind…). In fact, if I’m reading it correctly, it seems to be only slightly worse than the output of Word’s compare (it marks changes within “words” as a change of a word – so the [-[Customer]-] {+Customer+} is not a minimal change set; it should be [-[-] and [-]-]). I will try to find some time to test it on longer clauses or whole agreements. A bonus is that its output format is cut-and-paste compatible, albeit a little visually unappealing. It shouldn’t be too much trouble to pipe it through something to pretty it up (the -p option apparently prints overstrikes, but this doesn’t work on my tty – ah, -t is what I need for more visual output on a tty). A second bonus is that -s gives change statistics (I don’t recall ever seeing this in Word).
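As an illustration of "piping it through something", here is a rough Python sketch that turns wdiff's default [-deleted-] and {+inserted+} markers into HTML and keeps a simple word count along the way. The function name `wdiff_to_html` is my own assumption, and the counts are not the same figures that wdiff -s reports; they are just words inside the markers.

```python
import html
import re

# Matches either a [-deleted-] span or a {+inserted+} span.
MARKER = re.compile(r"\[-(.*?)-\]|\{\+(.*?)\+\}", re.DOTALL)


def wdiff_to_html(text):
    """Convert default wdiff markers to <del>/<ins> tags.

    Returns (html_text, deleted_word_count, inserted_word_count).
    """
    parts = []
    deleted_words = 0
    inserted_words = 0
    pos = 0
    for match in MARKER.finditer(text):
        # Unchanged text between markers is escaped and copied through.
        parts.append(html.escape(text[pos:match.start()]))
        deleted, inserted = match.group(1), match.group(2)
        if deleted is not None:
            parts.append("<del>" + html.escape(deleted) + "</del>")
            deleted_words += len(deleted.split())
        else:
            parts.append("<ins>" + html.escape(inserted) + "</ins>")
            inserted_words += len(inserted.split())
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts), deleted_words, inserted_words


page, removed, added = wdiff_to_html("provide [-[Customer]-] {+Customer+} with")
print(page, removed, added)
```

The sketch would be confused by source text that itself contains the sequences -] or +}, which is one reason to prefer wdiff's own --start-delete style switches when they are enough.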

The other nice thing about wdiff is that it was already installed on my machine!  About two (maybe three??) years ago I googled around for something that would do diffs on text files, but the searches only showed diff, kdiff etc.  Wdiff apparently dates back to the early 90s so my search terms maybe weren’t the best – or too many other people are using and linking to diff!  Anyway, wdiff seems to be part of the standard install for OpenSuSE and I’d had it all along.

And thanks Tridge!

Update:

Tridge says:

There are probably quite a few other similar utilities. Minimal diff
algorithms are a nice little computer science task. I'm sure heaps of
people have written variants. I expect quite a few CS courses use this
sort of thing as an exercise for 2nd year programming students...

Improving the algorithms in OOo would be a nice Google summer of code
project.

Which I think would be a great idea and a valuable addition to OOo.  If anyone’s game to try (either as a student or a supervisor) please drop me an email or lodge a comment and I’ll try to connect you.
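For anyone curious what that sort of exercise looks like, here is a minimal sketch of a word-level compare built on the textbook longest-common-subsequence table. It is my own illustration (the name `lcs_word_diff` is hypothetical) and is not the algorithm used by wdiff, docdiff, Word or OOo; it also uses memory proportional to the product of the two lengths, so it is only suitable for short clauses.

```python
def lcs_word_diff(old_words, new_words):
    """Return a list of (op, word) pairs, where op is "=", "-" or "+".

    "=" marks a word common to both versions, "-" a deleted word and
    "+" an inserted word.
    """
    n, m = len(old_words), len(new_words)
    # table[i][j] holds the LCS length of old_words[i:] and new_words[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old_words[i] == new_words[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start, emitting one operation per step.
    result = []
    i = j = 0
    while i < n and j < m:
        if old_words[i] == new_words[j]:
            result.append(("=", old_words[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            result.append(("-", old_words[i]))
            i += 1
        else:
            result.append(("+", new_words[j]))
            j += 1
    # Whatever is left over on either side is a pure deletion or insertion.
    result.extend(("-", word) for word in old_words[i:])
    result.extend(("+", word) for word in new_words[j:])
    return result


old = "fees quarterly in arrears".split()
new = "fees quarterly in advance".split()
for op, word in lcs_word_diff(old, new):
    print(op, word)
```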

Update 2:

Following someone’s suggestion, I have tried meld – am impressed by the interface, but not by the markup.  Run on the same two sample text files it marked:

with a tax invoice in respect of all GST charged.

as unchanged, with the balance as deletion/addition.  Sadly, this is a better performance than OOo (but is still inadequate).

AU Blacklist Leaked

Apparently, the AU blacklist has now also made it onto Wikileaks… and Wikileaks is timing out in my browser, and does not respond to ping requests. The Age is reporting that a number of innocent people have had their websites blocked, including a dentist in Queensland, a tuck shop and a website designer in Sydney. Clearly the Howard Government, when it set up this scheme, was comfortable with the ethical position of causing actual harm to innocents in order to prevent the possibility of harm to other innocents.

If Wikileaks is not responding because it has been blocked, you have the same problem of whether the blocking is actually sanctioned by the BSA. Update: someone from the US has said they can’t reach Wikileaks either, so maybe it’s not blocked. I’m sure we’ll find out soon.

Update 2: As of the morning of 20 March, Wikileaks has been available, so it looks like it was a transient problem with the Wikileaks server rather than government blocking. I have not seen any explanation for its unavailability.

AU Censorship of Wikileaks

ACMA, the Australian Communications and Media Authority, has added Wikileaks to its list of banned sites according to a couple of sources (Age, Wikipedia, Whirlpool). I’m sorry, but I can’t provide a link to Wikileaks – apparently it’s illegal.

The site (it may just be a particular URL; I can’t find the announcement on the ACMA site, and apparently it’s a secret list available only to IIA members?) was added in response to the appearance on Wikileaks of the list of URLs banned by the Danish government.

Is this just a mistake? It is hard to understand how a list of URLs would meet the definition of prohibited content (or potential prohibited content) under Schedule 7 of the Broadcasting Services Act.  The relevant definition is below.  I can’t see how a list of URLs will be RC or X18+/R18+ [1] – this excludes (a) and (b) of the definition.  The list is on Wikileaks, so it is not being provided for profit – excluding (c), and it’s not being provided through a premium mobile service – excluding (d).

Update: apparently there are provisions elsewhere which catch certain categories of indirections, so even if the URLs are not prohibited content a URL directing to the content may be caught.   That still doesn’t answer whether  a bare list (ie not linked) would be caught.

Prohibited content is:

20 Prohibited content
Content other than eligible electronic publications
(1) For the purposes of this Schedule, content (other than content that consists of an eligible electronic publication) is prohibited content if:
(a) the content has been classified RC or X 18+ by the Classification Board; or
(b) both:

(i) the content has been classified R 18+ by the Classification Board; and
(ii) access to the content is not subject to a restricted access system; or

(c) all of the following conditions are satisfied:

(i) the content has been classified MA 15+ by the Classification Board;
(ii) access to the content is not subject to a restricted access system;
(iii) the content does not consist of text and/or one or more still visual images;
(iv) access to the content is provided by means of a content service (other than a news service or a current affairs service) that is operated for profit or as part of a profit-making enterprise;
(v) the content service is provided on payment of a fee (whether periodical or otherwise);
(vi) the content service is not an ancillary subscription television content service; or

(d) all of the following conditions are satisfied:

(i) the content has been classified MA 15+ by the Classification Board;
(ii) access to the content is not subject to a restricted access system;
(iii) access to the content is provided by means of a mobile premium service.

Notes:

The National Classification Code is here.

To be classified “RC” a “publication” must “(a) describe, depict, express or otherwise deal with” various matters “in such a way that they offend against the standards of morality, decency and propriety generally accepted by reasonable adults to the extent that they should not be classified;” or “(b) describe or depict in a way that is likely to cause offence to a reasonable adult, a person who is, or appears to be, a child under 18 (whether the person is engaged in sexual activity or not); or (c) promote, incite or instruct in matters of crime or violence”.

The X18+/R18+ references are to sexually explicit material (Category 1 and Category 2 in the Code), so unless the list of urls is some form of ASCII art it is hard to see how this would be caught either.

OOo Compare: Inadequate

Update 4: see also my wdiff post and (update 5) my docdiff post and (update 6) my Real World Test post.

OpenOffice.org’s compare function has, historically, performed very poorly for me.  Being able to more or less accurately compare two documents for changes is an essential function for any law practice which does any sort of transactional work.   Without it, OpenOffice will never find a place in legal firms.  Moreover, this functionality is of use to anyone who wants to have changes between versions readily identifiable without the need to have track changes on all of the time.

OpenOffice’s compare performance is inadequate.  By way of a test I took two similar clauses and saved them to separate text files.  I then used OpenOffice’s compare function to show the changes.  This was the output:

Comparison of clause by OpenOffice.org showing markup

Despite there being some commonality between the two clauses – eg “may invoice” are the second and third words of both clauses, OpenOffice simply marked the whole thing as a change.  The clauses are each less than 60 words – a markup should not be difficult.  The output is unhelpful and this sort of output from compare is not unusual for OpenOffice.  It should come as no surprise that things do not improve with longer documents.

By way of example, Word’s compare function gave this output:

Word's compare function operating on the same two sample clauses

This markup shows pretty much an accurate record of the changes which were made.   Text which has not changed is shown as unchanged.   Changed text is shown as a change.  This is not to say that Word does not go markup haywire from time to time, but its track record is vastly superior to OOo’s in my experience.  That said, OOo’s mark up has been so poor for me that the number of times I have experienced it is not that great.

Exactly who uses OOo’s compare function?  What is its intended domain of application?

Update: Richard [thanks Richard] has posted a comment linking to an issue dating to 2005 in the OOo bug tracker.  If you think this should be improved please go vote for the issue.

Update 2: Oh, and based on my experience – any compare based on diff will also be inadequate.  Don’t even consider it.  FWIW I ran them through diff (the output is at least more reader friendly):

< 1.1[Vendor] may invoice Customer the Fees for each Service in accordance with the Payment Terms and, where no relevant time is set out in the Payment Terms, [Vendor] may invoice the Customer for ongoing fees quarterly in advance and other fees in arrears.  [Vendor] will provide Customer with a tax invoice in respect of all GST charged.
---
> 1.1Contractor may invoice [Customer] for ongoing fees quarterly in arrears and other fees monthly in arrears.  Contractor must provide [Customer] with a tax invoice in respect of all GST charged.  [Customer] is not liable to pay any amount in respect of GST except as set out in a valid tax invoice.

Update 3: One of the comments on a linking site points out this code (which I haven’t tried): http://www.plagiarism.phys.virginia.edu/copyfind.cpp

Update 4: Wdiff seems to be very promising.

Notes:

OOo compare results were the same for both versions of OOo I tried (2.3.something and 3.0.1).


