Power XHTML e-Indexing

We challenged ourselves to build interactive indexing for continuous XHTML content (tagged in Cross Platform Publisher), producing print indexes for multiple print book page-size editions and powerful back-linked e-indexes for digital content.
Cross Platform Publisher now has an amazing, easy-to-use indexing application and a range of processors available, making it an even more powerful tool.
It has been built on real-world recommendations from our academic clients, plus our own drive to create something better for the digital content launch pad on which we all stand.
Indexing is tough. But when done appropriately it adds significant value to print books. What has yet to be seen is the index as the most significant exploration and navigation tool for relevant content in 2013.
Requirements
Traditional indexing is difficult. While there are plenty of guidelines around, they do not actually say how an indexing application should work. Axis12 had to understand, as far as possible, what is happening from an indexer's point of view. So we developed a list of requirements:

  1. Easy to use. This is quite an obvious first requirement for a job as challenging as indexing. The indexer wants to read and think about the content, not struggle with an unfriendly interface.
  2. Interactively built. The index should always be available to view and edit as it is being built. Every change should be seen immediately.
  3. Index while authoring/editing. Rather than limit indexing to when documents are complete, indexing can be a part of the authoring and editing process. An indexer can also work on a document while it is being edited and correct the beginner index of an ambitious author or editor.
  4. Navigate and inspect. There are no page numbers while the index is being built. If you want to check an index item in the copy, just click the relevant button.
  5. Editing fidelity. Must work when sections are reordered, or even if an index term is moved in an editing cut-and-paste operation.
  6. Works for all formats. Must work for print and e-books, be processable to other XML schemas and strange formats like ePub3, and work online.
  7. Support multiple Cross Platform Publisher Design Profiles. This means the index must generate correct page numbers for any print format such as Paperback, Large Print and RGB PDF, with full index interactivity even with significant repagination. The days of a single print edition being the master cited reference may be over.
  8. Production convergence. Handle backlist digitization and frontlist index generation in the same manner, with only the index term resolution being the difference. That is all about the quality of the Cross Platform Publisher XHTML strategy.
  9. Multiple Index Generation. Allow processing of a primary index to multiple specialist sub-indexes, e.g. name, place and date indexes. Dream feature!
  10. Index Term to format. Allow index terms to be associated with format (design profiles) and not be limited by print page count requirements, i.e. you can have 20,000 index references in your e-book even if your print edition is restricted to 10 pages.
  11. Remixable. Must work with the Cross Platform Publisher REMIX feature, allowing assembly of disparate sections into a new book with easy reorganization of the newly assembled index references and items. A feature regarded as impossible by some (but not with the secret way we do it). We haven't actually implemented this yet, but it is on the list of things to do.
  12. e-Index Ready.

With that "little" list of requirements, the problem was attacked with gusto using nothing more than jQuery, JavaScript, XHTML and CSS, with Cross Platform Publisher XHTML as the rock on which to build and a decade or so of experience getting things wrong until we get them right!
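
To make requirements 2, 4 and 5 a little more concrete, here is a minimal sketch of one way an index could be held while it is being built: a tree of terms whose entries point at marker ids in the content rather than at page numbers. It is not the Cross Platform Publisher implementation (which is built with jQuery and JavaScript); the class name IndexTerm, the helper functions and the ids such as "ix-001" are all hypothetical and are written in Python purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexTerm:
    """One index term. Hypothetical structure, not the product's schema."""
    label: str
    # Ids of markers placed in the content. Because a marker travels with its
    # text, a cut-and-paste or section reorder does not invalidate the entry.
    anchors: list[str] = field(default_factory=list)
    children: dict[str, "IndexTerm"] = field(default_factory=dict)

    def child(self, label: str) -> "IndexTerm":
        """Return the sub-term with this label, creating it if needed."""
        return self.children.setdefault(label, IndexTerm(label))


def add_reference(root: IndexTerm, path: list[str], anchor_id: str) -> None:
    """Attach an anchor id to the term reached by walking `path` from the root."""
    node = root
    for label in path:
        node = node.child(label)
    node.anchors.append(anchor_id)


def render(node: IndexTerm, depth: int = 0) -> list[str]:
    """Flatten the tree to display lines, sorted case-insensitively per level."""
    lines = []
    for term in sorted(node.children.values(), key=lambda t: t.label.casefold()):
        refs = ", ".join(term.anchors)
        lines.append("  " * depth + term.label + (f" [{refs}]" if refs else ""))
        lines.extend(render(term, depth + 1))
    return lines


root = IndexTerm("")
add_reference(root, ["indexing"], "ix-001")
add_reference(root, ["indexing", "print"], "ix-002")
add_reference(root, ["ePub3"], "ix-003")
print("\n".join(render(root)))
```

Re-running render after every change is the simplest way to get the "every change seen immediately" behaviour; the sort happens on display, so an edited label falls into its new position without any extra bookkeeping.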

The Tools
The objective was to keep the tools as simple and direct as possible so the indexer can focus on the content and not the tools. Because Cross Platform Publisher is a Web App and not a desktop application, things have to be done a little differently.
Because the indexer gets to see the content and the index being built side by side, there are a number of different strategies available.

  1. Manual Index Terms. Highlight any term and click on the Index list. It is automatically and instantly inserted into the Index list. Click on a root term and it is inserted as a sub-term. Click on a sub-term and it is inserted as a sub-sub-term, etc.
  2. Manual Index Range Terms. Click in a paragraph to set the term range start point. Click in a lower paragraph to set the term range end point. Click on the index as with the Manual Index Terms and the range is instantly set.
  3. Edit terms. What is in the book is not always what should be in the index. You can edit a term to "Index lingo" and it will immediately re-sort itself.
  4. Italic Style Terms. Italicize where you need to.
  5. Remove a Term Entry. If an entry term is not required, click the delete option.
  6. Remove a Term. If a full term is not required, click the delete option and all entry terms are also deleted.
  7. Click Save. Your Cross Platform Publisher XHTML index lists are immediately generated and available for inspection on the Cross Platform Publisher Writer page and ready for print PDF or e-book format generation.
  8. Key-Term generation. Provide a keyword list and the application will process the file and add all occurrences of the Index Keywords to the Index. This gives a robust start for proper names, events, dates and other significant content. It also gives an over-population of terms, so both terms and entries had to be able to be inspected and deleted interactively.
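
The Key-Term generation step in the list above can be sketched roughly as follows. This is an illustration only, assuming the content is available as paragraphs keyed by an id; the function name find_keyword_hits, the paragraph ids and the sample text are all made up, and the real application works on Cross Platform Publisher XHTML rather than plain strings.

```python
import re


def find_keyword_hits(paragraphs: dict[str, str],
                      keywords: list[str]) -> dict[str, list[str]]:
    """Map each keyword to the ids of paragraphs containing it as a whole word."""
    hits: dict[str, list[str]] = {}
    for keyword in keywords:
        # Lookarounds keep "Beagle" from matching inside "Beagles".
        pattern = re.compile(
            r"(?<!\w)" + re.escape(keyword) + r"(?!\w)", re.IGNORECASE
        )
        for para_id, text in paragraphs.items():
            if pattern.search(text):
                hits.setdefault(keyword, []).append(para_id)
    return hits


# Hypothetical sample content and keyword list.
paragraphs = {
    "p1": "Darwin sailed on the Beagle in 1831.",
    "p2": "The Beagle returned in 1836.",
    "p3": "Beagles are also dogs.",
}
hits = find_keyword_hits(paragraphs, ["Darwin", "Beagle", "1836"])
for keyword, para_ids in hits.items():
    print(keyword, para_ids)
```

Even with whole-word matching, a pass like this will flag every mention, relevant or not, which is why the generated terms and entries still need an indexer to inspect and prune them.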

Well, that's the big picture. There are of course a million details, which are for user documentation, not a major feature announcement.
The Outcome
Generate an edition in A5, B5, 6in X 9in or any other size using Design Profiles and your index is automatically generated with the correct page numbers for each index item. No fuss.
The e-book edition can have print-page numbers using ePub3 page links, sequence number links or anonymous links. More importantly DP lets you reverse process your indexes for e-Books.
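
As a rough illustration of why per-edition page numbers can be generated automatically, the sketch below resolves the same index against two page sizes. It assumes index entries point at content blocks by id and that a page simply holds a fixed number of lines; real pagination is far more involved. The function names, the block line counts and the lines-per-page figures for A5 and B5 are invented for the example.

```python
def paginate(blocks: list[tuple[str, int]], lines_per_page: int) -> dict[str, int]:
    """Return the 1-based page on which each block starts."""
    pages = {}
    lines_used = 0
    for block_id, line_count in blocks:
        pages[block_id] = lines_used // lines_per_page + 1
        lines_used += line_count
    return pages


def resolve_index(index: dict[str, list[str]],
                  pages: dict[str, int]) -> dict[str, list[int]]:
    """Turn block ids into sorted, de-duplicated page numbers for one edition."""
    return {
        term: sorted({pages[block_id] for block_id in block_ids})
        for term, block_ids in index.items()
    }


# Hypothetical content: (block id, number of typeset lines).
blocks = [("p1", 30), ("p2", 25), ("p3", 40), ("p4", 10)]
index = {"indexing": ["p2", "p4"], "ePub3": ["p3"]}

# Assumed lines per page for each profile; the numbers are made up.
profiles = {"A5": 30, "B5": 45}
editions = {
    name: resolve_index(index, paginate(blocks, lines))
    for name, lines in profiles.items()
}
for name, resolved in editions.items():
    print(name, resolved)
```

In this toy example "indexing" lands on pages 2 and 4 in the A5 edition and pages 1 and 3 in B5, without anyone touching the index itself.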
Next - Multiple Indexes
Multiple indexes are a reality with some content, but too expensive to produce with print in many cases. You have a great master index, now you want an index of places, people, events, food or any other term group.
To make the indexer's life that little bit more difficult, we are thinking of adding an "Index type" classification value so that one master index can be processed into all the other indexes, i.e. classification metadata on index terms.
This makes it easier to create index-rich and valuable content. But that may be breaking the print index methods a little too early in this new digital content frontier we inhabit.
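
A minimal sketch of the "Index type" idea, assuming each master index term carries an optional classification value. Nothing here reflects an actual Cross Platform Publisher data format: the tuple layout, the type names "people" and "places" and the sample terms are hypothetical.

```python
from collections import defaultdict


def split_by_type(master):
    """Group classified master index terms into one sub-index per type.

    `master` is a list of (term, index_type, anchor_ids) tuples. Terms with
    no type (None) stay in the master index only.
    """
    sub_indexes = defaultdict(list)
    for term, index_type, anchor_ids in master:
        if index_type is not None:
            sub_indexes[index_type].append((term, anchor_ids))
    # Sort each sub-index alphabetically, ignoring case.
    return {
        index_type: sorted(entries, key=lambda entry: entry[0].casefold())
        for index_type, entries in sub_indexes.items()
    }


# Hypothetical master index.
master = [
    ("FitzRoy, Robert", "people", ["ix-004"]),
    ("Galápagos Islands", "places", ["ix-002"]),
    ("natural selection", None, ["ix-003"]),
    ("Darwin, Charles", "people", ["ix-001"]),
]
sub_indexes = split_by_type(master)
for index_type, entries in sub_indexes.items():
    print(index_type, [term for term, _ in entries])
```

The indexer classifies each term once in the master index, and every specialist index falls out of the same data.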
New Interactive Indexes
It needs to be easy to create an Index Term hover (or tap) interaction that allows horizontal exploration through the content of an Index term and its sub-terms.
Wrapping it up
Indexes are a print invention and a powerful content engagement tool. However, print index thinking is constrained by things such as page counts in certain types of publishing, plus often the need for economy and a certain vagueness and specialist paper-saving vocabulary construction. Will a digital content transition occur, and if so, how soon? Sigh! A million questions...
Our index master-plan is to be able to combine all Indexes from all books together within the reading system and make them a valid search target. For most appropriate content this will result in more valuable search and discovery results. We are now going to start experimenting with our shiny new e-Index tool for Web Indexes on static sites.
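
One simple way to picture the master-plan is to merge the indexes of several books into a single lookup keyed by term, with each reference remembering which book it came from. This is only a sketch under that assumption; the function names, book ids and anchor ids are hypothetical, and a real reading system would need proper term normalisation and ranking.

```python
def merge_indexes(books: dict[str, dict[str, list[str]]]) -> dict[str, list[tuple[str, str]]]:
    """Combine per-book indexes into one map of term -> (book id, anchor id) pairs."""
    merged: dict[str, list[tuple[str, str]]] = {}
    for book_id, index in books.items():
        for term, anchor_ids in index.items():
            key = term.casefold()  # treat "Indexing" and "indexing" as one term
            merged.setdefault(key, []).extend(
                (book_id, anchor_id) for anchor_id in anchor_ids
            )
    return merged


def search(merged: dict[str, list[tuple[str, str]]], query: str) -> dict[str, list[tuple[str, str]]]:
    """Return every merged term that contains the query text."""
    needle = query.casefold()
    return {term: refs for term, refs in merged.items() if needle in term}


# Hypothetical library of two indexed books.
books = {
    "book-a": {"Indexing": ["ix-1"], "ePub3": ["ix-2"]},
    "book-b": {"indexing": ["ix-9"]},
}
merged = merge_indexes(books)
results = search(merged, "index")
print(results)
```

Because each hit is a term an indexer chose, not just a string found in the text, results of this kind should be more relevant than a raw full-text match.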
If e-Index creation is easy and delivers the goods, indexes become significant content engagement power-up tools for the future, not just a continued imitation of print indexes.
Cross Platform Publisher e-Index is ready to go and will be updated on all licensee installations on the next major release update.
It is up to you whether you create yesterday's print indexes or start redefining indexes and indexing as a vital component in significant digital content into the future.
All feedback gratefully received. We are sure this toolset will be a prime candidate for a lot of improvement.

About Us

Axis12 specialise in building, hosting and supporting high traffic, content heavy web applications for both the public and private sector that help them achieve their Digital First aspirations. We recently implemented Cross Platform Publisher for the National Health Service (NHS) in the United Kingdom, which has transformed the way online reporting and publishing is carried out.