HTML5 Is As Good As It Gets for Publisher Digital Content

2014 has seen some strange and negative commentary about publishing with HTML5. The comments appear to be focussed on HTML for trade fiction books. The requirements of publishing genres beyond simple narratives seem to be ignored.

An entity with infinite properties such as HTML5 cannot be understood without considering all viewpoints. In publishing production that means all book and document genres. To address this imbalance in the commentary, and to show the full range of digital content production possibilities, the other factors are set out below.

Of course there are many dialogues going on in digital content production. It sometimes resembles the story of the blind men and the elephant, except that the publishers sit in separate cubicles, each reading a different book genre as they describe digital content:

  • One person is looking at a novel on a Kindle and says that digital content needs to focus on design, flow and typography for readability.
  • Another is looking at academic papers and journals on a workstation and says how important searching, linking, indexes and references are for understanding.
  • Yet another looks at K-12 textbooks in an ePub3 reader and says how the most important thing is interactivity, engagement and dynamic feedback.
  • Then from another cubicle a person looking at corporate and government documentation says it has to be multi-author enabled, dynamically reviewable and available instantly in all formats.
How to Use HTML for Digital Publishing

    (X)HTML(5), CSS(3) and JavaScript provide all of the digital content production, engagement and delivery requirements for all publisher genres right now. All of the components are in place, relatively mature and well understood. This stack will continue to be the distribution heart of digital content for the foreseeable future.

    In the past all genres had just one presentation method. Print books are edited and produced in a production machine that has spent numerous years learning how to do XY content presentation on paper; and there are still lousy-looking print books produced. The digital content space in 2014 has many more challenges.

    HTML5, semantics and metadata are not show-stopping problems. The common tools used to create them are the show-stopping problems. HTML production issues with valuable digital content are detailed rather than complex. HTML has been around and working for a long time; and it has all the tools needed for the most sophisticated 2014 publishing.

    Publishers need to keep their managed content structurally complete and always ready for business. You can then consistently apply values to the structural components using the class, data-*, title, lang, translate and dir attributes for semantic, processing, presentation, layout and any other required qualifiers. While this is relatively straightforward for fiction and even academic text, it is game-changing for more complex, interactive, reusable, extensible and dynamic content.
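As a minimal sketch of that idea, the Python below builds one structural unit and qualifies it with the six attributes named above, using only the standard library. The class names, the data-toc key and the sample text are hypothetical; the article does not publish its own vocabulary.

```python
import xml.etree.ElementTree as ET

def build_chapter():
    """Build one structural unit carrying all six qualifier attributes."""
    chapter = ET.Element("div", {
        "class": "chapter",        # semantic role (hypothetical name)
        "data-toc": "include",     # processing hint (hypothetical key)
        "title": "Chapter 1",      # human-readable label
        "lang": "en",              # content language
        "dir": "ltr",              # text direction
    })
    heading = ET.SubElement(chapter, "h1", {"class": "chapter-title"})
    heading.text = "Getting Started"
    para = ET.SubElement(chapter, "p", {"class": "body"})
    para.text = "Produced with "
    # translate="no" marks a product name that translators should leave alone.
    product = ET.SubElement(para, "span", {"class": "product", "translate": "no"})
    product.text = "Cross Platform Publisher"
    product.tail = "."
    return chapter

# Serialise the unit to well-formed markup.
markup = ET.tostring(build_chapter(), encoding="unicode")
print(markup)
```

The structure (div, h1, p, span) stays fixed; everything a processor or stylesheet needs to know rides on the attributes.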

    Use the HTML(5) elements you choose carefully and consistently

    The hardcore historical elements of HTML are <div>, <h1> to <h6>, <p>, <ul>, <ol>, <li>, <img>, <table>, <span>, <a> and a few others. Between them these define the no-fail structure of any document. They set the no-CSS fallback content pattern so core accessibility is addressed up front and not as an afterthought.
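One way to see what "no-fail structure" means is to recover a document outline from the element names alone, ignoring every class and style. The sketch below assumes well-formed markup and uses hypothetical sample content; it is an illustration, not part of any production tool described here.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def outline(xhtml):
    """Return an indented outline built from heading elements alone."""
    root = ET.fromstring(xhtml)
    lines = []
    for el in root.iter():  # document order, root included
        if el.tag in HEADINGS:
            level = int(el.tag[1])
            text = "".join(el.itertext()).strip()
            lines.append("  " * (level - 1) + text)
    return lines

# Hypothetical sample content.
SAMPLE = """<div class="book">
  <h1>Field Guide</h1>
  <div class="chapter"><h2>Birds</h2><p>Text.</p>
    <div class="section"><h3>Owls</h3><p>More text.</p></div>
  </div>
</div>"""

print("\n".join(outline(SAMPLE)))
```

Because the outline depends only on h1 to h6, it survives when CSS, scripts and attributes are all unavailable.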

    The simple structural heart of HTML is its strength. The other attributes deliver all the components required for the most sophisticated digital content publishing. HTML/5 in all its versions was designed for the world wide web.

    In production we use well-formed (X)HTML tagging patterns with HTML5 declarations. In particular, we never use epub:type in production content. This gives us the flexibility of MathML and SVG in the HTML without namespace declarations, and the content is well formed and ready for XSL processors. The HTML tagging is not done for the Internet, and especially not for any particular e-book format. The content tagging is done to address:

    • Complete and accurate structure, semantics, processing, readiness, presentation rules, styling and metadata.
    • A digital expression of the current commercial value of the content.
    • The processing and generation of all output formats including print PDF, all e-books, static sites, LMS packages and anything else required.
    • Preservation of the value of the content for the future.
    • Readiness to process the content for new requirements.

    We call this the "tag up, process down" approach. The (X)HTML5 applied to the content stored in Cross Platform Publisher is rich, semantic, processor ready and complete. If any generated format doesn't need or can't use any particular elements or attributes they are processed out or replaced with suitable simplifications at format generation time.
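A rough sketch of "tag up, process down" follows. It assumes a richly tagged master fragment (hypothetical class and data-* values, with inline MathML and no namespace declarations) and a made-up target format that cannot use data-* attributes or MathML. The function name process_down and its parameters are illustrative only and do not describe how Cross Platform Publisher works internally.

```python
import copy
import xml.etree.ElementTree as ET

# Richly tagged master content. It is well formed, so a standard XML parser
# reads it directly. All class and data-* values are hypothetical.
MASTER = """<div class="chapter" data-review="approved" lang="en">
  <h1 class="chapter-title">Sums</h1>
  <p class="body" data-author="editor-2">The total is <math><mi>x</mi><mo>+</mo><mi>y</mi></math>.</p>
</div>"""

def process_down(root, drop_prefixes=("data-",), flatten_tags=("math",)):
    """Return a simplified copy; the master tree is left untouched."""
    out = copy.deepcopy(root)
    # Take a snapshot of the elements so the tree can be edited safely.
    for el in list(out.iter()):
        for name in list(el.attrib):
            if name.startswith(drop_prefixes):
                del el.attrib[name]  # process the attribute out
        if el.tag in flatten_tags:
            # Replace the element with a plain-text span fallback.
            text = "".join(el.itertext())
            tail = el.tail
            el.clear()  # removes children, attributes, text and tail
            el.tag = "span"
            el.set("class", "math-fallback")
            el.text = text
            el.tail = tail
    return out

master = ET.fromstring(MASTER)
simple = process_down(master)
print(ET.tostring(simple, encoding="unicode"))
```

The master keeps every attribute and the MathML; only the generated copy is simplified, so a later format that can use the richer tagging loses nothing.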

      Tagging Patterns

      In XPP:FoundationXHTML, tagging patterns replace the concept of XML validity. If the content tools consistently apply the correct patterns, the content is always ready for processing.

      Tagging patterns are superior to XML DTDs and Schemas because they are complete and can be very easily modified and extended. XML Schemas and custom extensions require constant and expensive maintenance. At some stage the changes result in content processing failing.
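The article does not publish the FoundationXHTML patterns, so the following is only a guess at the general shape of the idea: patterns held as plain data, checked against content, and extended by adding an entry. The pattern table, its (tag, class) keys and the check_patterns function are all hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET

# Hypothetical pattern table: (tag, class) -> tags allowed as direct children.
PATTERNS = {
    ("div", "chapter"): {"h1", "p", "div"},
    ("div", "sidebar"): {"h2", "p"},
}

def check_patterns(root, patterns=PATTERNS):
    """Return messages for elements whose children break a known pattern."""
    problems = []
    for el in root.iter():
        allowed = patterns.get((el.tag, el.get("class")))
        if allowed is None:
            continue  # no pattern registered for this element
        for child in el:
            if child.tag not in allowed:
                problems.append(
                    f"<{child.tag}> not allowed in {el.tag}.{el.get('class')}"
                )
    return problems

DOC = ET.fromstring(
    '<div class="chapter"><h1>One</h1>'
    '<div class="sidebar"><h2>Note</h2><ul><li>Item</li></ul></div></div>'
)

print(check_patterns(DOC))

# Extending a pattern is a one-line data change rather than a schema revision.
EXTENDED = {**PATTERNS, ("div", "sidebar"): {"h2", "p", "ul"}}
print(check_patterns(DOC, EXTENDED))
```

In this sketch the first check reports the list inside the sidebar, and the extended table accepts it without touching the checking code.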

      As Cross Platform Publisher is a publishing content production and management application, it addresses publisher content directly and has hundreds of tagging patterns for dozens of document genres built in. That means publisher-oriented semantic tagging is automatic and easy.

      Every Publisher Needs to Address Their Content

      Publishers do not need to, and should not, tag to some content standard agreed in a specification. For a start, that specification can never be written given the infinite variety and complexity of the content. Publishers need to maintain their own content tagging so that they control their content. That means a very different approach from the XML Schema camp.

      Our customers have been using XPP:FoundationXHTML since 2007. That is hundreds of publishers and tens of thousands of trade, academic and textbooks; plus magazines, academic articles, exam papers, government reports and many other document genres. In that time the following e-book formats have been introduced: Kindle KF-7, Kindle KF-8, ePub2, ePub3, Google Search inside PDF, WebApps and Apps.

      The same (X)HTML tagging patterns have seamlessly generated all of these formats as they emerged, including specific platform quirk treatments.

      The job is to build on a well-designed (X)HTML(5) foundation that makes everything just work, rather than complain because you have the wrong tools or approach. A foundation that is applied consistently can be maintained, extended and optimised into and for the future.

      It is absolutely clear that publishers cannot afford to build a valuable digital content strategy on a format package like ePub or on proprietary formats. These are delivery formats only and have no valid or valuable properties for long-term digital content ownership. They encapsulate the thinking and technologies of yesterday in a world that is moving fast into tomorrow.

      It is a fact that standards and specification bodies fiddle with their specifications. They constantly make mistakes and wrong decisions. Even worse, they make unforgivable compromises. No publisher should ever buy into those limitations.

      Get on the HTML5 Wagon

      As publishers and publisher solution providers, we have to accept that content made ready today for print PDF, e-book or other forms of digital content distribution must be ready instantly for any format package, online use, WebApp, App, or export and processing by any other system.

      By using HTML5 with well-formed tagging patterns as the production method, "small things" like serialisation into JSON for secure delivery are just part of the pattern.
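To illustrate why well-formed markup makes this a small step, the sketch below walks a parsed fragment and emits JSON. The {"tag", "attrs", "children"} shape is an assumption made for this example, not a format defined by the article, and only the serialisation is shown; securing the delivery channel is out of scope.

```python
import json
import xml.etree.ElementTree as ET

def to_node(el):
    """Convert an element to a JSON-friendly dict, keeping mixed-content order."""
    children = []
    if el.text and el.text.strip():
        children.append(el.text)
    for child in el:
        children.append(to_node(child))
        # Text following a child element belongs to the parent's content.
        if child.tail and child.tail.strip():
            children.append(child.tail)
    return {"tag": el.tag, "attrs": dict(el.attrib), "children": children}

def xhtml_to_json(xhtml):
    """Parse a well-formed fragment and serialise it as a JSON string."""
    return json.dumps(to_node(ET.fromstring(xhtml)), ensure_ascii=False)

# Hypothetical sample fragment.
FRAGMENT = '<p class="body" lang="en">Hello <span class="term">world</span>!</p>'
print(xhtml_to_json(FRAGMENT))
```

The conversion needs no knowledge of the content's genre: the same few lines apply to any fragment that parses as XML.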

      There has to be a significant change in the way publishers think about the tools they use for digital content production and the value of the content that is generated by those tools.
