Sunday, September 27, 2015

Hoyle Bibliography: technology update (part 2)

Another techie update...

In my last essay, I gave an overview of the technology I am using for the Hoyle bibliography. One of the claims I made is that storing the descriptions in a highly-structured format would allow me to render them both on the web and in a word processing document. If truth be told, until quite recently, I had never tested that claim, except on the most trivial data. But now I'm ready to declare success!

To review the acronyms briefly, I am storing each bibliographical description in an XML file. I use another language, XSLT (Extensible Stylesheet Language Transformations), to translate the data into HTML for display on the web. I've always assumed that I could modify the XSLT to translate the data into an MS Word file, but I had tested that only for unformatted text. It remained to deal with the annoyances of superscripts, subscripts, italics, tables, etc.

Well, I'm quite relieved to be able to report that everything works! In the last essay, I showed the XML for the collation formula for Whist.3, which is displayed as:

12ᵒ: A–D¹² E⁴ [$½ (-A2,B2) signed; missigning B4 as B5]; 52 leaves, pp. [8] [1] 2–96

You can see the full bibliographical description on my website, rendered as HTML. I wrote a new XSLT program that reads the same XML and plops the collation formula into a file that MS Word can read. More on that program in a moment. Here is the output, readable by MS Word:

<?xml version="1.0" encoding="utf-8"?><?mso-application progid="Word.Document"?>
<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
   <w:body>
      <w:p>
         <w:r>
            <w:t>
               <w:rPr>
                  <w:i w:val="on"/>
               </w:rPr>A Short Treatise on the Game of Whist<w:rPr>
                  <w:i w:val="off"/>
               </w:rPr>, printed for F. Cogan, third London edition, 1743.<w:p/>
               <w:p/>
               <w:t>Collation: 12<w:rPr>
                     <w:vertAlign w:val="superscript"/>
                  </w:rPr>o<w:rPr>
                     <w:vertAlign w:val="baseline"/>
                  </w:rPr>: A–D<w:rPr>
                     <w:vertAlign w:val="superscript"/>
                  </w:rPr>12<w:rPr>
                     <w:vertAlign w:val="baseline"/>
                  </w:rPr> E<w:rPr>
                     <w:vertAlign w:val="superscript"/>
                  </w:rPr>4<w:rPr>
                     <w:vertAlign w:val="baseline"/>
                  </w:rPr> [$½ (-A2,B2) signed; missigning B4 as B5]; 52 leaves, pp. [<w:rPr>
                     <w:i w:val="on"/>
                  </w:rPr>8<w:rPr>
                     <w:i w:val="off"/>
                  </w:rPr>] [1] 2–96 </w:t>
            </w:t>
         </w:r>
      </w:p>
   </w:body>
</w:wordDocument>


All those impenetrable tags beginning <w:....> are the incantations that MS Word needs for formatting.
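
To give a flavor of how XSLT can produce such tags, here is a minimal sketch, not the actual stylesheet. It assumes the source XML marks formatting with elements named collation, superscript, and italic; those names are hypothetical stand-ins. It also uses one conventional WordprocessingML layout, in which each differently formatted piece of text gets its own run (w:r) with its properties (w:rPr) ahead of the text (w:t), so its output is arranged a little differently from the listing above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch only. The source element names "collation", "superscript" and
     "italic" are hypothetical, not the names used in the Hoyle XML. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">

  <xsl:output method="xml" encoding="utf-8" indent="no"/>

  <!-- Wrap the whole formula in a Word document with a single paragraph. -->
  <xsl:template match="/">
    <xsl:processing-instruction name="mso-application">progid="Word.Document"</xsl:processing-instruction>
    <w:wordDocument>
      <w:body>
        <w:p>
          <xsl:apply-templates select="collation/node()"/>
        </w:p>
      </w:body>
    </w:wordDocument>
  </xsl:template>

  <!-- Plain text becomes an unformatted run. -->
  <xsl:template match="text()">
    <w:r>
      <w:t><xsl:value-of select="."/></w:t>
    </w:r>
  </xsl:template>

  <!-- A superscript becomes a run whose properties raise the text. -->
  <xsl:template match="superscript">
    <w:r>
      <w:rPr><w:vertAlign w:val="superscript"/></w:rPr>
      <w:t><xsl:value-of select="."/></w:t>
    </w:r>
  </xsl:template>

  <!-- Italics work the same way, with a different run property. -->
  <xsl:template match="italic">
    <w:r>
      <w:rPr><w:i w:val="on"/></w:rPr>
      <w:t><xsl:value-of select="."/></w:t>
    </w:r>
  </xsl:template>

</xsl:stylesheet>
```

Because every run carries its own properties, nothing has to be switched back "off" afterwards; the next run simply starts unformatted.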

If you are ambitious, you can copy that text into a file and save it as Whist3.xml or some such. Note that the file extension must be .xml. Then launch MS Word and open the file. You should get something that looks like this:


Notice that I've dealt with paragraph breaks, superscripts, italics, and more. Success!

Not shown in this example are other things I'll need to do: tables, headers, etc. Fortunately, I've solved those items as well. 

Back to the program. The really good news is that there is about an 80% overlap between the XSLT used to translate to HTML and to MS Word. Now that I am learning which parts of the XSLT are the same and which must be customized, I can recode the XSLT a bit more intelligently so that the common 80% is in one file, and the two 20% specializations are in other files.
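
One way such a split could be arranged in XSLT is with xsl:import, sketched below under stated assumptions. The file names common.xsl and html.xsl, the template names collation-body and plain, and the source elements collation and superscript are all hypothetical. The listing shows two separate files one after the other, and the page-level wrappers are left out to keep it short.

```xml
<!-- ===== common.xsl (hypothetical): format-neutral logic ===== -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The label and the order of the parts are the same for every format. -->
  <xsl:template name="collation-body">
    <xsl:call-template name="plain">
      <xsl:with-param name="s" select="'Collation: '"/>
    </xsl:call-template>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Unformatted text is handed to whichever "plain" template the
       importing file defines, so this file is not meant to run alone. -->
  <xsl:template match="text()">
    <xsl:call-template name="plain">
      <xsl:with-param name="s" select="."/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>

<!-- ===== html.xsl (hypothetical): the HTML-only specialization ===== -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- xsl:import must come before any other top-level element. -->
  <xsl:import href="common.xsl"/>
  <xsl:output method="html"/>

  <!-- How a paragraph is wrapped in HTML. -->
  <xsl:template match="collation">
    <p><xsl:call-template name="collation-body"/></p>
  </xsl:template>

  <!-- How plain text is emitted in HTML: as is. -->
  <xsl:template name="plain">
    <xsl:param name="s"/>
    <xsl:value-of select="$s"/>
  </xsl:template>

  <!-- How a superscript is emitted in HTML. -->
  <xsl:template match="superscript">
    <sup><xsl:value-of select="."/></sup>
  </xsl:template>

</xsl:stylesheet>
```

A Word counterpart would import the same common.xsl and define the same three templates, emitting w:p and w:r markup instead of p and sup. Templates in the importing file take precedence over imported ones, which is what lets each format override only the parts that differ.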

I can't say I was ever worried about getting my descriptions into MS Word, but it's awfully nice to know it works!
