Wednesday, September 30, 2015

Hoyle Bibliography: technology update (part 3)

For background to this short post, please see part 1 and part 2 of the technology update. I've pretty much finished the work of producing MS Word documents from my XML files, but my approach turned out to be quite different from the one I expected.

I thought the model would be XML->HTML for the web and XML->MS Word for the print version. It turns out to be easier, much easier, to go XML->HTML->MS Word!
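For readers curious about what the HTML->MS Word step involves, here is a minimal sketch in Python rather than XSLT. It assumes the Word target is WordprocessingML (the XML inside a .docx), where formatting is attached to flat runs (w:r) through run properties (w:rPr). The post doesn't say which Word format is actually produced. The "smallcaps" class name, the function names and the sample sentence are all hypothetical. The idea it shows: HTML inline markup nests (an italic word can contain a superscript) while Word runs do not, so the conversion flattens the nesting and carries inherited properties down to each piece of text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical mapping from XHTML inline markup to WordprocessingML run
# properties. The "smallcaps" class name is an assumption for illustration.
RPR_XML = {
    "i": "<w:i/>",
    "smallCaps": "<w:smallCaps/>",
    "superscript": '<w:vertAlign w:val="superscript"/>',
}
# Children of w:rPr follow a fixed schema order: i, then smallCaps, then vertAlign.
RPR_ORDER = ["i", "smallCaps", "superscript"]


def props_for(elem):
    """Run properties contributed by a single XHTML element."""
    if elem.tag == "i":
        return {"i"}
    if elem.tag == "sup":
        return {"superscript"}
    if elem.tag == "span" and "smallcaps" in elem.get("class", "").split():
        return {"smallCaps"}
    return set()


def runs(elem, inherited=frozenset()):
    """Flatten nested inline markup into (text, properties) runs."""
    current = inherited | props_for(elem)
    out = []
    if elem.text:
        out.append((elem.text, current))
    for child in elem:
        out.extend(runs(child, current))
        if child.tail:  # text after a child still belongs to this element
            out.append((child.tail, current))
    return out


def to_wordml(run_list):
    """Serialize runs as w:r fragments (namespace declarations omitted)."""
    parts = []
    for text, props in run_list:
        rpr = "".join(RPR_XML[p] for p in RPR_ORDER if p in props)
        if rpr:
            rpr = "<w:rPr>" + rpr + "</w:rPr>"
        parts.append(f'<w:r>{rpr}<w:t xml:space="preserve">{escape(text)}</w:t></w:r>')
    return "".join(parts)


# Made-up sample paragraph, not taken from the bibliography files.
html = '<p>Rules of <i>Whist</i> by <span class="smallcaps">Hoyle</span>, 3<sup>rd</sup> ed.</p>'
print(to_wordml(runs(ET.fromstring(html))))
```

The output is a string of w:r fragments without namespace declarations, so it would still have to be wrapped in a w:p inside a complete document.xml before Word could open it.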

If you'd like to look at the sample file, the HTML version of Whist.3 is here. Below is its translation to MS Word:

[Screenshot: Whist.3 as translated to MS Word]
Now, you'll notice there isn't much in the way of formatting: no borders on the table, no nice margins or spacing, no bold table headers, etc. That's deliberate. Styling can always be added later, but it can be quite hard to remove if there's too much of it. What I have done is get all the text rendered correctly: small caps, italics, superscripts, etc. I've also reproduced the crazy table, with its rows and columns that span multiple cells. [Aside: As you can learn here, spanning columns is simple; spanning rows is much more difficult.]
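A small sketch of why row spans are the harder case, again assuming a WordprocessingML target. A column span is a single cell carrying w:gridSpan. A row span, by contrast, needs a cell marked w:vMerge="restart" at the top plus an extra cell marked <w:vMerge/> (continue) in every later row it covers. HTML omits those lower cells entirely, so the converter has to remember which grid columns are still occupied and insert placeholders. The Cell class and word_rows function are hypothetical, and this Python model stands in for whatever the actual XSLT does.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Cell:
    """An HTML table cell (hypothetical input model)."""
    text: str
    rowspan: int = 1
    colspan: int = 1


def word_rows(html_rows):
    """Lay out HTML rows as Word-style cells: (text, grid_span, v_merge).

    v_merge is None, "restart" (top of a vertical span) or "continue"
    (a placeholder cell that the HTML row does not contain).
    """
    covered = {}  # grid column -> [rows still to cover, colspan]
    result = []
    for row in html_rows:
        cells, out, col = deque(row), [], 0
        while True:
            if col in covered:
                # A cell from an earlier row spans down into this one:
                # emit a continuation cell of the same width.
                remaining, span = covered[col]
                out.append(("", span, "continue"))
                if remaining == 1:
                    del covered[col]
                else:
                    covered[col][0] = remaining - 1
                col += span
            elif cells:
                cell = cells.popleft()
                merge = None
                if cell.rowspan > 1:
                    merge = "restart"
                    covered[col] = [cell.rowspan - 1, cell.colspan]
                out.append((cell.text, cell.colspan, merge))
                col += cell.colspan
            else:
                break
        result.append(out)
    return result


table = [
    [Cell("A", rowspan=2), Cell("B", colspan=2)],
    [Cell("C"), Cell("D")],
]
for r in word_rows(table):
    print(r)
# [('A', 1, 'restart'), ('B', 2, None)]
# [('', 1, 'continue'), ('C', 1, None), ('D', 1, None)]
```

Column spans only change how far the column counter jumps; row spans are what force the bookkeeping across rows. The sketch assumes well-formed tables whose spans add up.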

Other than spanning rows, the hardest thing was managing whitespace. There is a whole section in my XSLT/XPath book on whitespace including a subsection "Solving Whitespace Problems" with subsections "Too Much Whitespace" and "Too Little Whitespace". I had problems with both. It was necessary:
  • to have the XML->HTML transformation use the stricter <xsl:output method="xml"/> rather than method="html"
  • to have the HTML->MS Word transformation use <xsl:strip-space elements="*"/>
  • to write a function to "normalize" all text data, that is, to collapse each run of consecutive whitespace into a single space while still allowing one leading and one trailing space.
 Okay, TMI, I know. But I wanted to write it all down so I wouldn't lose it.
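The two whitespace fixes in the second stage can be sketched with Python's standard library instead of XSLT. (On the first fix: in XSLT 1.0 the html output method writes empty elements such as <br> without an end tag and defaults to indent="yes", so its output may be neither well-formed XML for the next stylesheet nor free of added whitespace. That is one plausible reading of why method="xml" worked better.) strip_space below approximates <xsl:strip-space elements="*"/>, and normalize_text matches the description of the normalizing function. In XSLT 2.0 the same idea can be written as replace(., '\s+', ' '). All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# XML whitespace is exactly space, tab, CR and LF. Python's \s would also
# match characters such as U+00A0 (no-break space), so spell the class out.
XML_WS = " \t\r\n"
WS_RUN = re.compile(r"[ \t\r\n]+")


def strip_space(elem):
    """Remove whitespace-only text nodes, roughly like <xsl:strip-space elements="*"/>.

    Ignores xml:space="preserve" and <xsl:preserve-space>, which a real
    XSLT processor would honor.
    """
    if elem.text is not None and not elem.text.strip(XML_WS):
        elem.text = None
    for child in elem:
        strip_space(child)
        if child.tail is not None and not child.tail.strip(XML_WS):
            child.tail = None
    return elem


def normalize_text(s):
    """Collapse each whitespace run to one space, keeping one leading/trailing space."""
    return WS_RUN.sub(" ", s)


def normalize_tree(elem):
    """Apply normalize_text to every text and tail in the tree."""
    for node in elem.iter():
        if node.text:
            node.text = normalize_text(node.text)
        if node.tail:
            node.tail = normalize_text(node.tail)
    return elem


doc = ET.fromstring("<p>Rules  of\n  <i>Whist</i>\n</p>")
print(ET.tostring(normalize_tree(strip_space(doc)), encoding="unicode"))
# prints <p>Rules of <i>Whist</i></p>
```

Keeping the single trailing space in "Rules of " is what stops the next word from running into the italicized title. Note that strip_space also deletes a lone space between two inline elements, so <i>A</i> <i>B</i> comes out as <i>A</i><i>B</i>: a "too little whitespace" problem to watch for.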

There may well be better ways to do this. I frequently found myself at the limits of my knowledge. But with a lot of Googling and reading, I found that many others have been down this path and come up with similar solutions.

OK, enough technology. Back to bibliography!
