Good vs Valid XML: Cheap is Dear

For many years, before it was the norm, I preached the merits of XML-first and XML-early workflows. Now my platform is "good vs. valid XML."

Any service provider can deliver XML.

Indeed, XML is now a standard automated output of most systems that have anything to do with publishing. It's been 13 years since Microsoft Office introduced XML formats for Word and Excel files.

Yet when I hit the road and speak with publishers about their challenges, much of what I hear falls into this bucket of "good vs. valid XML." There is a distinction between an XML file that is merely valid and one that is both valid and good. A file can validate against its DTD or schema and still fail to express what the content actually is. Too often, budgets dictate, or conversion teams choose, whatever is easiest (i.e., cheapest) instead of doing the extra work needed to create good XML.

Let's look at some examples.

Glossary Example

Following is the rendered text and image:


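Following are examples of what I call "good" XML and "valid" XML. Note the tagging structure. The good example uses a <dl> (definition list) element, with each term in a <dt> and its definition in a <dd>, so the markup itself identifies the content as a glossary and carries inherent semantic meaning. The valid example tags each term as bold text inside an ordinary paragraph. The valid example is also missing alternative text: its alt attribute is empty. Without alternative text, publishers miss out on improving SEO and, more importantly, fail at content accessibility.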

Good XML

Definition term with class to differentiate languages

<dl>
  <dt class="english"><strong>amphibian</strong></dt>
  <dd>(am fib&#x00B4; &#x0113; &#x0259;n) An animal that lives part of its life in water and part of its life on land. My pet frog is an <strong>amphibian.</strong></dd>
  <dt class="spanish"><strong>anfibio</strong></dt>
  <dd>Animal que pasa parte de su vida en el agua y parte en tierra. Mi ranita es un <strong>anfibio.</strong>
    <imggroup>
      <img id="pEM002-001" src="./images/U99C99/pEM002-001.jpg" alt="A red-eyed tree frog wrapped around a green branch."/>
    </imggroup>
  </dd>
</dl>

Valid XML

Definition term in paragraph element with strong element

<p><strong>amphibian</strong> (am fib&#x00B4; &#x0113; &#x0259;n) An animal that lives part of its life in water and part of its life on land. My pet frog is an <strong>amphibian.</strong></p>
<p><strong>anfibio</strong> Animal que pasa parte de su vida en el agua y parte en tierra. Mi ranita es un <strong>anfibio.</strong>
  <imggroup>
    <img id="pEM002-001" src="./images/U99C99/pEM002-001.jpg" alt=""/>
  </imggroup>
</p>
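Part of telling good XML from merely valid XML can be automated, with a person making the final call. As a rough sketch, assuming the content is available as well-formed XML and using Python's standard xml.etree.ElementTree module, the following flags paragraphs that open with a bold term and no preceding text, the pattern the valid glossary example uses. The function name and the <root> wrapper element are hypothetical, and the heuristic will also catch legitimate bold run-in headings, so its output is a review list rather than a verdict.

```python
import xml.etree.ElementTree as ET


def find_pseudo_glossary_entries(xml_text):
    """Heuristic (hypothetical helper): return the bold lead-in of every <p>
    that starts with a <strong> child and has no text before it. In glossary
    content this often means a term was tagged as a plain paragraph instead
    of <dt>/<dd> inside a <dl>. Matches are candidates for human review, not
    errors: a bold run-in heading produces the same pattern."""
    root = ET.fromstring(xml_text)
    suspects = []
    for p in root.iter("p"):
        children = list(p)
        leading_text = (p.text or "").strip()
        if children and children[0].tag == "strong" and not leading_text:
            suspects.append((children[0].text or "").strip())
    return suspects


if __name__ == "__main__":
    # Shortened copy of the "valid" glossary markup, wrapped in a
    # hypothetical <root> element so it parses as a single document.
    sample = (
        "<root>"
        "<p><strong>amphibian</strong> An animal that lives part of its life "
        "in water and part of its life on land.</p>"
        "<p><strong>anfibio</strong> Animal que pasa parte de su vida en el "
        "agua y parte en tierra.</p>"
        "</root>"
    )
    print(find_pseudo_glossary_entries(sample))  # ['amphibian', 'anfibio']
```

Run against the good example, which has no <p> elements, the same function returns an empty list.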

Annotated Text Example

Following is the rendered text and image:

The good XML pairs the image with annotated text: a prodnote that transcribes the text appearing in the image. The valid XML provides just the image. Note the alternative text in the valid example: "regifting website" is virtually useless to a visually impaired reader.

Good XML

<sidebar render="required" id="fig_chap03_004"> <hd><strong>Figure 3-4</strong></hd> <br/>A purpose statement explains a website&#8217;s overall goals and the specific objectives that will be used to achieve those goals. <imggroup> <img id="p075-001" src="./images/U00C03/p075-001.jpg" alt="A page from a book that shows a purpose statement example with goals and objectives."/> <caption imgref="p075-001">&#x00A9; 2015 Publisher Name</caption> <prodnote render="required" imgref="p075-001"> <p>primary goal</p> <p>secondary goals</p> <p>objectives</p> <p><strong>Regifting Website</strong></p> <p><span class="underline">Purpose Statement:</span> </p> <p>The goal of the reusable and …</p> <list type="ul" depth="1"> <li>Promote an online …</li> …… </list> </prodnote> <prodnote render="required"/> </imggroup>
 

Valid XML

<sidebar render="required" id="fig_chap03_004"> <hd><strong>Figure 3-4</strong></hd> <br/>A purpose statement explains a website&#8217;s overall goals and the specific objectives that will be used to achieve those goals. <imggroup> <img id="p075-001" src="./images/U00C03/p075-001.jpg" alt="regifting website"/> <caption imgref="p075-001">&#x00A9; 2015 Publisher Name</caption> <prodnote render="required"/> <prodnote render="required"/> </imggroup> </sidebar>

Alt Text Example


This example compares an image with descriptive alt text (good) to the same image with an empty alt attribute (valid). Alt text improves discoverability and supports accessibility.

Good XML

<sidebar render="required" class="quote">
  <q>I bet the folks at home would like to know what we&#8217;re going to do this year!</q>
  <imggroup>
    <img id="piii-001" src="./images/U00/piii-001.jpg" alt="A teenage boy in jeans and sneakers smiling with hands folded in front of him."/>
    <prodnote render="required"/>
    <prodnote render="required"/>
  </imggroup>
</sidebar>

Valid XML

<sidebar render="required">
  <imggroup>
    <img id="piii-001" src="./images/U00/piii-001.jpg" alt=""/>
    <prodnote render="required"/>
    <prodnote render="required"/>
  </imggroup>
</sidebar>
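Alt text can be screened the same way. In this sketch (the function name and the four-word threshold are assumptions for illustration, not a standard), images with a missing or empty alt attribute are flagged, as are those whose alt text is very short, such as "regifting website." An empty alt is not always wrong: HTML deliberately uses alt="" for purely decorative images, so each finding still needs a human decision.

```python
import xml.etree.ElementTree as ET

# Assumption: alt text with fewer words than this is flagged as possibly too
# terse. The threshold is illustrative, not a standard.
MIN_ALT_WORDS = 4


def audit_alt_text(xml_text, min_words=MIN_ALT_WORDS):
    """Hypothetical helper: return (img id, finding) pairs for images whose
    alt attribute is missing, empty, or shorter than min_words words."""
    root = ET.fromstring(xml_text)
    findings = []
    for img in root.iter("img"):
        img_id = img.get("id", "(no id)")
        alt = (img.get("alt") or "").strip()
        if not alt:
            # May be intentional for a decorative image; a reviewer decides.
            findings.append((img_id, "missing or empty alt"))
        elif len(alt.split()) < min_words:
            findings.append((img_id, f"alt may be too terse: {alt!r}"))
    return findings


if __name__ == "__main__":
    good = (
        '<sidebar render="required" class="quote"><imggroup>'
        '<img id="piii-001" src="piii-001.jpg" alt="A teenage boy in jeans '
        'and sneakers smiling with hands folded in front of him."/>'
        "</imggroup></sidebar>"
    )
    valid = (
        '<sidebar render="required"><imggroup>'
        '<img id="piii-001" src="piii-001.jpg" alt=""/>'
        "</imggroup></sidebar>"
    )
    terse = '<imggroup><img id="p075-001" src="p075-001.jpg" alt="regifting website"/></imggroup>'
    for label, sample in (("good", good), ("valid", valid), ("terse", terse)):
        print(label, audit_alt_text(sample))
```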

Takeaways

Talk to your vendor about the quality of the XML they produce. The proliferation of offshore vendors has driven prices down, and quality has suffered. Price matters, and low-cost XML is attractive, but publishers are finding that thoughtfulness and editorial quality have been slipping away. With so much technology integrated into publishers' workflows, it is easy to forget that human QA is what ensures premium editorial and production services.

  • Good XML is critical for accessibility

  • Good XML improves downstream discoverability

  • Good XML combines automation with human intervention, and that combination equals quality



The XML sample file was excellent. I went through it tag by tag, attribute by attribute, entity by entity, and I was very impressed by the level of attention to detail shown. You and your team deserve credit. Over the last 20 years or so I have seen sample files from both sides of the fence (both supplying them and receiving them), and these were the best I have ever seen!




Mike Groth

Michael Groth is Director of Marketing at Cenveo Publisher Services, where he oversees all aspects of marketing strategy and implementation across digital, social, conference, advertising and PR channels. Mike has spent over 20 years in marketing for scholarly publishing, previously at Emerald, Ingenta, Publishers Communication Group, the New England Journal of Medicine and Wolters Kluwer. He has made the rounds at information industry events, organized conference sessions, presented at SSP, ALA, ER&L and Charleston, and blogged on topics ranging from market trends, library budgets and research impact to emerging markets and online communities. Twitter handle: @mikegroth72