My Git Book Writing Blog

Using Git for the Book (Part 2): antiword

One of the benefits of Git is that you can obtain the differences between commits. So how does this feature work with Microsoft Word? The short answer: it’s somewhat OK.

Git for Windows (http://msysgit.github.io/) came with a piece of software called antiword. This program takes an MS Word document (with the “.doc” extension) and extracts all of its text. Git Bash is configured so that whenever git diff is run on a Word document, each version is first converted to text by antiword, and Git then takes the difference between the two extracted texts. This nifty piece of configuration prevents Git from saying that it cannot obtain the difference between two binary files. Unfortunately, the resulting diff is for reading only: Git cannot apply it back to the binary Word document.
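To make the "convert first, then diff" idea concrete, here is a minimal Python sketch of the same two steps. It is not how Git implements the feature (Git does this through a textconv diff driver in its own configuration); the function names antiword_text and text_diff are made up for illustration, and the sketch assumes antiword is installed and on the PATH.

```python
import difflib
import subprocess


def antiword_text(path):
    """Extract plain text from a .doc file by running antiword.

    Assumption: the antiword executable is on the PATH and its output
    can be decoded with the system's default text encoding.
    """
    result = subprocess.run(
        ["antiword", path], capture_output=True, text=True, check=True
    )
    return result.stdout


def text_diff(old_path, new_path, to_text=antiword_text):
    """Convert both documents to text, then diff the text line by line."""
    old_lines = to_text(old_path).splitlines()
    new_lines = to_text(new_path).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_path, tofile=new_path, lineterm=""
        )
    )
```

Calling text_diff("chapter1_old.doc", "chapter1_new.doc") (hypothetical file names) would return the diff lines for printing. As with git diff, the result describes the extracted text only, so it cannot be used to patch the Word files themselves.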

I found the configuration interesting, but over the course of writing the book, I found myself not needing it. If I need to compare files, I resort to Word’s “Compare Documents” tool, which lets me compare any two Word documents. Further, Word’s venerable “Track Changes” feature records each change to the document. These changes can then be inspected by anyone else. It’s the equivalent of an editor’s red pen!

antiword does have its uses, however. For one thing, with the extracted text I can use command line tools to do analysis (word count and word frequency). Also, having the chapters as text makes it very easy to search across all my files with the grep command line tool (each chapter is a separate Word document, so globally searching the book isn’t possible with Word). Finally, it’s faster to open a text file than a Word document, so if I just want to reread something, I usually check the text file.
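For anyone who prefers a script to a chain of command line tools, here is a rough Python sketch of the same two jobs: counting word frequency and searching every chapter at once. It assumes each chapter has already been extracted to a UTF-8 “.txt” file in a single folder; the function names and the folder layout are my own invention for this example.

```python
import re
from collections import Counter
from pathlib import Path


def word_frequencies(text):
    """Count how often each word appears, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def search_chapters(folder, pattern):
    """Return (file name, line number, line) for every line matching pattern.

    This is a grep-like search over all .txt files in one folder.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(folder).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                hits.append((path.name, number, line))
    return hits
```

The total word count is sum(word_frequencies(text).values()), and most_common(10) on the returned counter lists the ten most frequent words. search_chapters("chapters", "antiword") would search a hypothetical “chapters” folder for every mention of antiword.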

There won’t be a blog post next week because of the Thanksgiving holiday (in the US). Thank you, everyone, for reading!