XML: The cornerstone of a Digital First strategy for publishers

By Arash Hejazi

If you search for XML on the web, you will find thousands of sites offering information about it. The headache begins with the definition of XML itself:

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

Just ignore the definition for a moment. We will come back to it later.

When editors worked on paper, they usually had two or three pens: a black or blue pen, a green pen, and a red pen. They used the pens to ‘mark up’ the content. They differentiated between the main headings and subheadings, special inline styles, captions, footnotes, quotes, references, etc., so the typesetter could decide which style to apply to the text, and where. The typesetter would then set the text, using ‘styles’ based on the editor’s markup.

XML is the same concept. It’s the language in which electronic documents can speak to each other, to software and to humans. It is easier to understand when you see it in practice:

Let’s say I am working on an encyclopaedia of biographies:

William Shakespeare

Birth: 26 April 1564
Death: 23 April 1616

Life

William Shakespeare was an English poet and playwright, widely regarded as the greatest writer in the English language and the world’s pre-eminent dramatist.[1] He is often called England’s national poet and the “Bard of Avon”. His plays have been translated into every major living language and are performed more often than those of any other playwright. His most renowned works are: Hamlet, Romeo and Juliet and Macbeth.

Footnotes:

1. Dates follow the Julian calendar, used in England throughout Shakespeare’s lifespan, but with the start of year adjusted to 1 January (see Old Style and New Style dates). Under the Gregorian calendar, adopted in Catholic countries in 1582, Shakespeare died on 3 May (Schoenbaum 1987, xv).

I need to mark up this document so that anyone, and anything, knows exactly how it should be displayed. I want the main entry to be differentiated from the body, the titles of books to be in italic, and some parts of the body text to be more prominent than the rest. In XML, each markup construct (a tag) starts with < and ends with >, and a closing tag such as </booktitle> marks where the marked-up text ends.

So, I come up with my own markup guide:

Main entry: <Heading1>
Subheads: <Heading2>
Footnotes: <footnote>
Footnote references: <footref>
Prominent parts of text: <strong>
Name of the books: <booktitle>

etc.

As you can see, I am not setting the style here. I am telling whoever may be concerned, human or machine, what needs to be different from what. Styling comes later, when you decide what you want to do with the text.

So, when I’ve finished the markup of the text above, it will look something like:

<Heading1>William Shakespeare</Heading1><line-break /><Details>Birth: 26 April 1564<line-break />Death: 23 April 1616</Details><line-break /><Heading2>Life</Heading2><line-break /><body><strong>William Shakespeare</strong> was an English poet and playwright, widely regarded as the greatest writer in the English language and the world’s pre-eminent dramatist.<footref>[1]</footref> He is often called England’s national poet and the <emphasis>Bard of Avon</emphasis>. His plays have been translated into every major living language and are performed more often than those of any other playwright. His most renowned works are: <booktitle>Hamlet</booktitle>, <booktitle>Romeo and Juliet</booktitle> and <booktitle>Macbeth</booktitle>.</body>

<Heading2>Footnotes:</Heading2>

<footnote>1. Dates follow the Julian calendar, used in England throughout Shakespeare’s lifespan, but with the start of year adjusted to 1 January (see Old Style and New Style dates). Under the Gregorian calendar, adopted in Catholic countries in 1582, Shakespeare died on 3 May (Schoenbaum 1987, xv).</footnote>
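To show what "machine-readable" means in practice, here is a minimal Python sketch that reads a shortened version of the entry using the standard library's `xml.etree.ElementTree` module. The `<entry>` wrapper is an assumption added for the example: a well-formed XML document needs a single root element, which the fragment above does not have.

```python
import xml.etree.ElementTree as ET

# A shortened version of the entry above. The <entry> root element is an
# assumption added for this sketch; it is not part of my markup guide.
SAMPLE = """<entry>
<Heading1>William Shakespeare</Heading1>
<Details>Birth: 26 April 1564<line-break />Death: 23 April 1616</Details>
<Heading2>Life</Heading2>
<body><strong>William Shakespeare</strong> was an English poet and playwright.<footref>[1]</footref> He is often called the <emphasis>Bard of Avon</emphasis>. His most renowned works are: <booktitle>Hamlet</booktitle>, <booktitle>Romeo and Juliet</booktitle> and <booktitle>Macbeth</booktitle>.</body>
<footnote>1. Dates follow the Julian calendar.</footnote>
</entry>"""

root = ET.fromstring(SAMPLE)

# Because the titles are marked up, software can find them without guessing.
heading = root.findtext("Heading1")
book_titles = [element.text for element in root.iter("booktitle")]

print(heading)      # William Shakespeare
print(book_titles)  # ['Hamlet', 'Romeo and Juliet', 'Macbeth']
```

The same parser would refuse a document whose opening and closing tags do not match, which is one reason consistent markup matters.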

Now I have a text that’s marked up in XML, the universal language for documents. It no longer matters whether I want to set the book for print or PDF, build a web page, or produce an ebook. All I need to do is tell the publishing solution what to do with each markup. In this case, I want to publish the text on a web page, with Heading1 big and bold, Heading2 a bit smaller and in blue, <emphasis> underlined, the footnote references smaller and superscript, the footnotes smaller, and book titles in italic. I create a style guide for that purpose and define the styles only once (usually in a CSS file, if you are publishing to the web or to an ebook).
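As a rough illustration of "telling the publishing solution what to do with the markups", the Python sketch below converts my markup to HTML and keeps the styles in one separate CSS string. The tag mapping, the class names and the `to_html` function are all hypothetical choices made for this example, not a standard; a real workflow would more likely use a dedicated tool such as XSLT.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from my markup to an HTML tag and a CSS class.
TAG_MAP = {
    "Heading1": ("h1", None),
    "Heading2": ("h2", None),
    "Details": ("p", "details"),
    "body": ("p", None),
    "strong": ("strong", None),
    "emphasis": ("span", "emphasis"),
    "footref": ("sup", "footref"),
    "footnote": ("p", "footnote"),
    "booktitle": ("cite", "booktitle"),
}

# The styles are defined once, separately from the text.
CSS = """
h1 { font-weight: bold; font-size: 2em; }
h2 { font-size: 1.4em; color: blue; }
.emphasis { text-decoration: underline; }
.footref { font-size: smaller; }
.footnote { font-size: smaller; }
.booktitle { font-style: italic; }
"""


def to_html(element):
    """Convert one XML element, and everything inside it, to an HTML string."""
    if element.tag == "line-break":
        html = "<br>"
    else:
        # Unknown elements fall back to a plain <div>.
        tag, css_class = TAG_MAP.get(element.tag, ("div", None))
        attr = f' class="{css_class}"' if css_class else ""
        inner = escape(element.text or "", quote=False)
        inner += "".join(to_html(child) for child in element)
        html = f"<{tag}{attr}>{inner}</{tag}>"
    # Text that follows the closing tag belongs to the parent's flow.
    return html + escape(element.tail or "", quote=False)


sample = "<body>He wrote <booktitle>Hamlet</booktitle>.<footref>[1]</footref></body>"
result = to_html(ET.fromstring(sample))
print(result)
# <p>He wrote <cite class="booktitle">Hamlet</cite>.<sup class="footref">[1]</sup></p>
```

Changing the look of every book title then means editing one line of `CSS`, not the text itself.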

Then my text will look like this:

 

William Shakespeare

Birth: 26 April 1564
Death: 23 April 1616

Life

William Shakespeare was an English poet and playwright, widely regarded as the greatest writer in the English language and the world’s pre-eminent dramatist.[1] He is often called England’s national poet and the Bard of Avon. His plays have been translated into every major living language and are performed more often than those of any other playwright. His most renowned works are: Hamlet, Romeo and Juliet and Macbeth.

Footnotes:

1. Dates follow the Julian calendar, used in England throughout Shakespeare’s lifespan, but with the start of year adjusted to 1 January (see Old Style and New Style dates). Under the Gregorian calendar, adopted in Catholic countries in 1582, Shakespeare died on 3 May (Schoenbaum 1987, xv).

That’s it. Of course, there’s much more to it, and you can get into the details of creating an XML file using various resources available online (I have listed some of them at the end of this post). As a publisher, you need to understand the concept of an XML file and make sure that your editors, copy-editors and typesetters know how to produce XML files. You also need a set of markup declarations for your publishing house, called a Document Type Definition (DTD). A DTD is basically a list of definitions for your markup, and it lays the ground for consistency across your XML files. You can either define your own bespoke DTD or adopt one of the standard DTDs out there (e.g. NLM DTD, DocBook DTD, DITA, S1000D or other public domain DTDs). You can find a list of DTDs and a comparison here.
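To give a feel for what a DTD contributes, here is a simplified stand-in written in Python. It is not a DTD and it does not perform real DTD validation (Python's standard library has no DTD validator); it only checks that each element is known and appears inside a permitted parent. The `ALLOWED_CHILDREN` rules, the `<entry>` root element and the `find_problems` function are assumptions made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical house rules: each element and the child elements it may contain.
ALLOWED_CHILDREN = {
    "entry": {"Heading1", "Heading2", "Details", "body", "footnote"},
    "Details": {"line-break"},
    "body": {"strong", "emphasis", "footref", "booktitle"},
    "footnote": {"booktitle"},
    "Heading1": set(),
    "Heading2": set(),
    "strong": set(),
    "emphasis": set(),
    "footref": set(),
    "booktitle": set(),
    "line-break": set(),
}


def find_problems(element):
    """Return a list of messages for markup that breaks the house rules."""
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return [f"unknown element <{element.tag}>"]
    problems = []
    for child in element:
        # A known element in the wrong place is reported here;
        # an unknown element is reported by the recursive call.
        if child.tag in ALLOWED_CHILDREN and child.tag not in allowed:
            problems.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        problems.extend(find_problems(child))
    return problems


good = ET.fromstring("<entry><Heading1>W. S.</Heading1><body>See <booktitle>Hamlet</booktitle>.</body></entry>")
bad = ET.fromstring("<entry><Heading1>W. S.</Heading1><body>See <book>Hamlet</book>.</body></entry>")

print(find_problems(good))  # []
print(find_problems(bad))   # ['unknown element <book>']
```

An editor who invents `<book>` where the house style says `<booktitle>` is caught immediately, which is the kind of consistency a shared DTD is meant to enforce.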

This is the first step you need to take to re-model your editorial production workflow into a Digital First model. Once your books are available as XML, the sky is the limit and your opportunities multiply. You can create multiple products out of a single XML document, and you (or your users) can customise the content according to specific needs.
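As a small sketch of "multiple products out of a single XML document", the Python example below derives two outputs from one marked-up entry: a plain-text version and a JSON record that a catalogue or search index could use. The `<entry>` root element and the field names `name` and `works` are assumptions chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# One source document; the <entry> root element is an assumption of this sketch.
SAMPLE = (
    "<entry><Heading1>William Shakespeare</Heading1>"
    "<body>His most renowned works are: <booktitle>Hamlet</booktitle>, "
    "<booktitle>Romeo and Juliet</booktitle> and <booktitle>Macbeth</booktitle>.</body>"
    "</entry>"
)

root = ET.fromstring(SAMPLE)

# Product 1: a plain-text version, one line per top-level block.
plain_text = "\n".join("".join(block.itertext()) for block in root)

# Product 2: a structured record built from the same markup.
record = json.dumps(
    {
        "name": root.findtext("Heading1"),
        "works": [element.text for element in root.iter("booktitle")],
    }
)

print(plain_text)
print(record)
```

Neither output required touching the source text again; each is just a different reading of the same markup.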

Resources
