FINAL
DSSSL Survey
and
Assessment Report

for the

DOD CALS IDE PROJECT

An MVP Joint Venture

March 1997

Submitted by
ManTech Advanced Technology Systems
West Virginia Technology Applications Operations Center
1000 Technology Drive, Suite 3310
Fairmont, West Virginia 26554

In support of
Contract DAAB10-94-D-0503-0048
and in compliance with
CDRL Sequence Number A009



______________________
Robert S. Kidwell
Technical Director
DoD CALS IDE Project

______________________
Jack G. Richman
Executive Director
DoD CALS IDE Project



TABLE OF CONTENTS

   

LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1.0  DOCUMENT STRUCTURE
2.0  HISTORY
3.0  OVERVIEW OF SPECIFICATIONS
    3.1  MIL-PRF-28001C Output Specification
    3.2  Document Style Semantics and Specification Language (DSSSL)
        3.2.1  DSSSL-Online
4.0  DSSSL VS 28001 OS CHARACTERISTICS EVALUATION
    4.1  Summary of Output Specification Characteristics
        4.1.1  Resource Description
        4.1.2  Security Description
        4.1.3  Page Models and Pagination Characteristics
        4.1.4  Style Description
        4.1.5  Graphics Description
        4.1.6  Table Description
        4.1.7  Footnote Description
    4.2  Document Style Semantics and Specification Language Characteristics
    4.3  DSSSL-Online Characteristics
    4.4  Comparison Matrix
    4.5  Comments on Matrix
    4.6  Internet DART
    4.7  MIL-M-38784C Formatting Requirements of Technical Manuals
5.0  SUMMARY SECTION
APPENDIX A:  MIL-PRF-28001C / OUTPUT SPECIFICATION CHARACTERISTICS
APPENDIX B:  DSSSL CHARACTERISTICS TABLE
APPENDIX C:  MIL-M-38784C REQUIREMENTS ADDRESSED BY DSSSL AND OS
APPENDIX D:  REFERENCES
APPENDIX E:  ABBREVIATION AND ACRONYM LIST

LIST OF FIGURES


      

Figure 2.0-1  Procedural Markup
Figure 2.0-2  Descriptive Markup
Figure 3.1-1  FOSI Context Diagram
Figure 3.2-1  DSSSL's Transformation Language Flow Diagram
Figure 3.2-2  DSSSL's Style Language Flow Diagram
Figure 4.0-1  Venn Diagram Characteristics Evaluation
Figure 4.1.3-1  Page Parameter Areas
Figure 4.2-1  Inline and Display Area
Figure 4.6-1  MOREplus System Architecture
Figure 4.6-2  Sample DART Page

LIST OF TABLES


      

Table 4.4-1  DSSSL Output Specification Comparison Matrix

ABSTRACT


      

The Document Style Semantics and Specification Language (DSSSL), an emerging International Organization for Standardization (ISO) standard, is expected by many to be the long-awaited solution for providing an internationally standardized method of specifying formatting style and applying transformation steps to Standard Generalized Markup Language (SGML; ISO 8879) documents. In the Continuous Acquisition and Life-Cycle Support (CALS) domain, where the use of SGML is defined by MIL-PRF-28001C, the Output Specification (OS) was created to provide users with the ability to format their SGML documents in a standardized and interchangeable manner, first for printed output and second for electronic presentation and delivery.

This report discusses both specifications and attempts to correlate the features they offer on a functional basis. It assesses the capabilities, benefits, and limitations of the emerging DSSSL standard for the display of technical information in a platform-independent manner and compares them with those of the current OS in MIL-PRF-28001C. A key part of the report is to identify the technical issues associated with the OS and the DSSSL standard and then to assess the gaps in both with respect to the formatting of technical manual data. Functionality and characteristics matrices are established for each standard. The matrix method allows an objective and impartial analysis and identifies the overlapping and isolated features of the two specifications. With this information, the user will have an independent, universal set of formatting requirements for technical manuals against which to identify the features not addressed by either specification.

Active participation from the vendor community is vital for the development and progress of either of the specifications. In our research, we have identified only a limited number of vendors supporting the Output Specification targeted to specific program implementations. Furthermore, it was discovered that the different vendors have not interpreted all aspects of the OS in the same way, resulting in Formatting Output Specification Instances (FOSIs) that are not directly interchangeable. With the recent release for publication of the DSSSL standard, products conforming to DSSSL are still in their infancy and are expected to establish themselves in early 1997. The report will introduce some of the vendor plans for DSSSL implementations and their time schedules. In addition to the DSSSL standard, a DSSSL-Online Specification, whose objective is to specify a subset of DSSSL as the basis for electronic delivery of documents, is being developed. The DSSSL-Online subset is also included in the scope of this report and is assessed with the OS.

Technical manuals are publications that contain instructions for the installation, operation, maintenance, training, and support for weapon systems, weapon systems components, and support equipment. MIL-M-38784C is a military specification titled Manuals, Technical: General Style and Format Requirements. The specification covers the general style and format requirements for the preparation of technical manuals and changes to standard technical manuals. This report takes the formatting and style requirements for technical manuals from that specification and presents a matrix marking which of them are addressed by the MIL-PRF-28001C OS and by DSSSL.

1.0  DOCUMENT STRUCTURE


      

This report first gives a brief history of the requirement for the formatting of SGML information and the techniques currently available for doing so. It then provides detailed introductions to the Output Specification in MIL-PRF-28001C and to the Document Style Semantics and Specification Language, including its uses and versatility. A methodology for the assessment of the features and characteristics is provided as the underlying approach to the task of comparing the formatting features addressed by the two specifications.

To arrive upon a set of formatting requirements for the comparison of the MIL-PRF-28001 Output Specification and DSSSL, it is necessary to carefully study the characteristics of each of these standards as well as those characteristics not covered by each of these specifications and develop a matrix of comparison criteria. This generic set of criteria must be able to expose the features that both of the specifications address and those they do not. This falls in line with our development of an Independent Universal Set of formatting requirements for technical manuals. This analysis between the two specifications results in the formation of a comparison matrix within the scope of this report.

The second matrix, given in the appendix of this report, concentrates on the formatting requirements specified in MIL-M-38784C, Manuals, Technical: General Style and Format Requirements. Using the criteria specified in this military specification, the formatting requirements for technical manuals are marked against the features addressed by the Output Specification and DSSSL. This matrix will identify the features not offered by the two specifications and could be a starting point for possible future enhancement of either specification. We anticipate that this matrix will help identify a possible migration path from implementations using the Output Specification towards the DSSSL specification.

A summary and conclusion section will discuss the findings in this comparison study. Detailed matrices consisting of the MIL-PRF-28001 Output Specification and DSSSL characteristics will be given in the appendices.

2.0  HISTORY


      

SGML is an international standard for the description of marked-up electronic text. More exactly, SGML is a metalanguage, that is, a means of formally describing a language, in this case, a markup language. Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out. Examples include wavy underlining to indicate boldface, special symbols for passages to be omitted or printed in a particular font, and so forth. As the formatting and printing of texts was automated, the term was extended to cover all sorts of special markup codes inserted into electronic texts to govern formatting, printing, or other processing.

By markup language we mean a set of markup conventions used together for encoding texts. At a banal level, all printed texts are encoded in this sense: punctuation marks, use of capitalization, disposition of letters around the page, even the spaces between words, might be regarded as a kind of markup, the function of which is to help the human reader determine where one word ends and another begins, or how to identify gross structural features such as headings or simple syntactic units such as dependent clauses or sentences. A markup language must specify what markup is allowed, what markup is required, and how markup is to be distinguished from text. SGML provides the means to satisfy these requirements.

Markup is everything in a document that is not content. Markup originally referred to the handwritten notations that a designer would add to typewritten text; these notations contained instructions to a typesetter about how to lay out the copy and what typeface to use. This kind of markup is known as "procedural markup." Most electronic publishing systems today, such as word processing software and desktop publishing software, use procedural markup. Procedural markup is typically unique to a specific software package such as Microsoft Word or QuarkXPress. Each has its own set of markup codes that make sense only to itself. This markup usually takes the form of formatting codes that are mixed in with the text of the document. Procedural markup codes apply to a single way of presenting the information, such as a printed page, and provide no capability to define appearance for other media, such as Compact Disc Read-Only Memory (CD-ROM) or publishing across the Internet.

Descriptive markup, also known as "generic markup," describes the purpose of the text in a document, rather than its physical appearance on the page. The basic concept of descriptive markup is that the content of a document should remain separate from its style. Descriptive markup is based on the structure of a document and identifies elements within that structure--such as a chapter, a section, or a table of contents--using notations that describe what the element is, not how it appears. By separating presentation information (i.e., style) from the structure, descriptive markup allows for multiple presentations of the same information. For example, you can publish on paper, on-line, on CD-ROM, and on the World Wide Web (Internet), all from one set of source files.


Figure 2.0-1  Procedural Markup


Figure 2.0-2  Descriptive Markup

A typical document consists of three layers: structure, content, and style. SGML separates these three aspects, but deals mainly with the relationship between structure and content. At the heart of an SGML application is a file called the Document Type Definition (DTD). The DTD describes the structure of a document, much like a database schema describes the types of information it handles and the relationships between the fields. A DTD provides a framework for the elements (such as chapters and chapter headings, sections, and topics) that constitute a document. A DTD also specifies rules for the relationships between elements; for example, "a chapter heading must be the first element after the start of a chapter." These rules, which the DTD defines, help ensure that documents have a consistent, logical structure. A DTD accompanies a document wherever it goes. A "document instance" is a document whose content has been tagged in conformance with a particular DTD. Content is the information itself: content includes titles, paragraphs, lists, tables, graphics, and audio. The method for identifying the content's position within the DTD structure is called "tagging." Creating an SGML document involves inserting tags around content. These tags mark the beginning and end of each part of the structure. The structure of a particular document is revealed by the nesting of tags. Fortunately, human beings usually do not have to deal with manually typing in tags and checking to make sure all the tags are there. Some SGML-based authoring software programs make it easy to enter tags by clicking on pull-down menus that list only those tags that are valid at the cursor's current position in the document. These programs rely on a software module called a "parser" that verifies that the document follows the rules of the DTD. (The parser also verifies that the DTD itself is structurally correct.)
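The role of the DTD rules and the parser described above can be illustrated with a small sketch. It is not an SGML parser: it assumes fully tagged, well-formed input (parsed here with Python's standard XML library as a stand-in for SGML), and the `CONTENT_MODELS` table and the element names `chapter`, `heading`, `section`, and `para` are hypothetical, chosen only to mirror the rule that a chapter heading must be the first element of a chapter.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models standing in for DTD element declarations.
# Each model is a regular expression over the space-terminated names of
# an element's children, in document order.
CONTENT_MODELS = {
    "chapter": r"(heading )(section )+",  # heading first, then 1+ sections
    "section": r"(heading )?(para )+",    # optional heading, then 1+ paras
    "heading": r"",                       # text only, no child elements
    "para": r"",                          # text only, no child elements
}

def validate(element):
    """Return a list of content-model violations found at or below element."""
    errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        errors.append(f"undeclared element <{element.tag}>")
    else:
        children = "".join(child.tag + " " for child in element)
        if not re.fullmatch(model, children):
            found = children.strip() or "(no child elements)"
            errors.append(f"<{element.tag}> has invalid content: {found}")
    for child in element:
        errors.extend(validate(child))
    return errors

# Illustrative instances: the second puts the heading after the section.
GOOD = ("<chapter><heading>Engine</heading>"
        "<section><para>Remove the cover.</para></section></chapter>")
BAD = ("<chapter><section><para>Remove the cover.</para></section>"
       "<heading>Engine</heading></chapter>")

good_errors = validate(ET.fromstring(GOOD))
bad_errors = validate(ET.fromstring(BAD))
```

Under these assumptions the first instance yields no errors, while the second is rejected because the chapter's children do not appear in the order the content model requires. A real validating parser also handles features this sketch ignores, such as omitted tags, attributes, and entities.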

Three characteristics of SGML distinguish it from other markup languages: its emphasis on descriptive rather than procedural markup; its document type concept; and its independence of any one system for representing the script in which a text is written. A descriptive markup system uses markup codes that simply provide names to categorize parts of a document. By contrast, a procedural markup system defines what processing is to be carried out at particular points in a document. In SGML, the instructions needed to process a document for some particular purpose (for example, to format it) are sharply distinguished from the descriptive markup that occurs within the document. Usually, they are collected outside the document in separate procedures or programs.

With descriptive instead of procedural markup, the same document can readily be processed by many different pieces of software. Each of these can apply different processing instructions to those parts of it which are considered relevant. For example, a content analysis program might disregard entirely the footnotes embedded in an annotated text, while a formatting program might extract and collect them all together for printing at the end of each chapter. Different sorts of processing instructions can be associated with the same parts of the file. For example, one program might extract names of persons and places from a document to create an index or database, while another, operating on the same text, might print names of persons and places in a distinctive typeface. Secondly, SGML introduces the notion of a document type, and hence a DTD. Documents are regarded as having types, just as other objects processed by computers do. The type of a document is formally defined by its constituent parts and their structure. If documents are of known types, a special purpose program (called a parser) can be used to process a document claiming to be of a particular type and check that all the elements required for that document type are indeed present and correctly ordered. More significantly, different documents of the same type can be processed in a uniform way. Programs can be written to take advantage of the knowledge encapsulated in the document structure information, and can thus behave in a more intelligent fashion.
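The idea that different programs can attach different processing to the same descriptive markup can be sketched as follows. The sample text, the `name` element, and both function names are made up for illustration, and well-formed XML is used as a stand-in for an SGML instance.

```python
import xml.etree.ElementTree as ET

# Illustrative source text; the <name> element marks what the text is,
# not how it should look.
SOURCE = (
    "<report><para>The unit was inspected by "
    "<name type='person'>J. Smith</name> "
    "at <name type='place'>Fairmont</name>.</para></report>"
)

def build_index(root):
    """Program 1: collect every tagged name, sorted, as for an index."""
    return sorted(el.text for el in root.iter("name"))

def render_emphasis(element):
    """Program 2: flatten to plain text, marking names with asterisks."""
    parts = [element.text or ""]
    for child in element:
        text = render_emphasis(child)
        parts.append(f"*{text}*" if child.tag == "name" else text)
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(SOURCE)
index = build_index(root)          # one use of the markup
rendered = render_emphasis(root)   # a different use of the same markup
```

Neither function changes the source: the indexing program and the rendering program each read the same tags and apply their own processing, which is the separation the paragraph above describes.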

It should be stressed that the SGML standard is entirely unconcerned with the semantics of textual elements. The semantic considerations are application dependent. Standardized markup using SGML is the most powerful and flexible means of encoding the structure of textual information that exists today. Markup defined using SGML can be efficient and extremely portable. SGML-encoded text is versatile and can be interpreted into typeset printed products or electronic applications. SGML was developed to address the information exchange and management challenges associated with electronic printed output and the exchange of textual information. ISO participated in the development of the SGML standard and adopted it in 1986 as ISO 8879. This standard completely defines the terms and syntax to specify the structure and content of a document.

Because ISO 8879 does not provide functionality for specifying appearance, formatting requirements, or standards for style, most systems still rely on proprietary methods. With SGML being independent from presentation, some means of describing presentation to a document composition system was needed. Unfortunately, the language that the international standards community was developing to satisfy this requirement, called the Document Style Semantics and Specification Language (DSSSL), was first published in draft form in late 1994 and was published as an official ISO standard in late 1995, eight years after the CALS requirement was identified. During that time, the Department of Defense (DoD) elected to establish an interim capability for CALS, based upon SGML, that addressed composition. The first version of the DoD's MIL-M-28001 Output Specification (March 25, 1987) applied this in a limited manner by focusing on paper output.

The OS takes the form of a particular DTD that allows the user to create a FOSI that is well suited to printed output. A FOSI is essentially a powerful style sheet that specifies the formatting for each tag in a DTD. A FOSI, the document instance, and the DTD constitute a complete interchange package for printed documents. In a DSSSL domain, a complete interchange package would likewise consist of the document instance, the DTD, and the DSSSL specification.

3.0  OVERVIEW OF SPECIFICATIONS


      

This section gives a broad overview of the Output Specification and the DSSSL. The introductions are followed by general discussion of the methodology that we employed in the comparison of both of these specifications as well as an in-depth look at their features and characteristics.

3.1  MIL-PRF-28001C Output Specification

The DoD application of SGML is specified in MIL-PRF-28001C. ISO 8879 defines SGML as a meta-language. Many different SGML languages satisfying the grammar and syntax specified in ISO 8879 can be obtained by choosing from among different features and options afforded by ISO 8879. MIL-PRF-28001C specifies the DoD CALS SGML implementation by choosing certain SGML features and setting certain parameters. MIL-PRF-28001C also uses SGML to provide a capability for specifying the formatting of CALS SGML documents.

An ISO 8879 SGML document contains three parts: the SGML declaration, the Document Type Definition (DTD), and the Document Instance. A CALS SGML document contains these parts and may also contain a Formatting Output Specification Instance (FOSI) for paper delivery or a Formatting Presentation Specification Instance (FPSI) for electronic delivery (Revision C). The SGML declaration defines what characters will be allowed in the rest of the document and how they will be encoded. The SGML declaration also specifies how certain characters are to be interpreted along with other special rules and definitions. The DTD defines the content and structure of a document. It refers to how the information in a class of documents such as technical manuals, books, memos, etc., is related, and defines any dependencies. Markup of a document provides an unambiguous definition of its structure and content, allowing automated data processing software to process the document in a predictable manner. Also defined in the DTD are entity declarations. The document instance is the information content of the document, marked up in accordance with a DTD. This markup may include declarations, tags, and entity references. The FOSI specifies the desired appearance of the information content of the document. This output formatting description capability is not contained in ISO 8879, but is found in MIL-PRF-28001C. MIL-PRF-28001C contains an Output Specification (OS) DTD that defines how FOSIs are to be developed and interpreted.

The goal of the OS is to allow for the interchange of style and formatting information between all types of publishing systems. It describes a method for interchanging formatting requirements for documents whose source files are tagged according to DTDs. The OS DTD describes the rules that must be followed to develop a FOSI. The FOSI specifies layout characteristics for page models, and style characteristics for graphics, tables, and all other elements. The categories of format and style characteristics are represented in the OS DTD as elements. Individual characteristics are represented as attributes on those elements. A FOSI is the set of characteristics and values chosen from the OS DTD to represent the formatting requirements for a particular type of document.
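The arrangement described above, with categories of characteristics attached to element types, can be modeled loosely in code. The sketch below is an assumption-laden illustration and does not reproduce OS DTD or FOSI syntax: the `StyleEntry` class, the `lookup` function, the category names (`font`, `indent`), and all values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class StyleEntry:
    """Characteristics for one element type, grouped by category."""
    element: str
    context: str = ""  # required parent element; "" means any parent
    characteristics: dict = field(default_factory=dict)

# Hypothetical entries; category and characteristic names are illustrative
# and are not taken from the OS DTD.
STYLE_SHEET = [
    StyleEntry("title", "", {"font": {"size": "12pt", "weight": "bold"}}),
    StyleEntry("title", "chapter", {"font": {"size": "18pt", "weight": "bold"}}),
    StyleEntry("para", "", {"font": {"size": "10pt"},
                            "indent": {"first-line": "1pc"}}),
]

def lookup(element, parent):
    """Return the characteristics for element, preferring a context match."""
    fallback = {}
    for entry in STYLE_SHEET:
        if entry.element != element:
            continue
        if entry.context == parent:
            return entry.characteristics   # entry written for this parent
        if entry.context == "":
            fallback = entry.characteristics
    return fallback

chapter_title = lookup("title", "chapter")
section_title = lookup("title", "section")
```

The entries describe how a title or paragraph should look rather than issuing commands to a formatter. How the values are realized is left to whatever system consumes the table, which matches the declarative intent attributed to the FOSI.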

Characteristics are descriptions of the format of a document, rather than commands that tell a formatting system what to do. A FOSI is intended to specify how a particular group of documents should be formatted. The FOSI for that class of documents should be used any time data need to be presented. If the same information needs to be used for another purpose (it falls under a different class of documents), a different FOSI would be used to present the data in the appropriate format. A FOSI author must have a background in typographic design and a working knowledge of SGML. The most important qualification, however, is an intimate familiarity with the requirements of the formatting and style specifications for the class of documents that are to be represented through the FOSI.

The figure below outlines the context of a FOSI, which is analogous to that of the DTD and the SGML document instance. In an SGML system, both the DTD and the SGML document (an instance of the DTD) can be parsed by an SGML validating parser. Similarly, the Output Specification DTD and the FOSI (an instance of the OS DTD) can be parsed. A formatting engine would take the FOSI and the SGML instance to produce the required formatted document.


Figure 3.1-1  FOSI Context Diagram

From the diagram above, it can be seen that the FOSI is also an SGML document, and specifies how another SGML file is to be formatted. In certain contexts FOSIs provide a device-independent way to specify semantic information. The FOSIs can be read by a typesetter creating paper or a browser rendering characters on a graphical device. It is up to the application utilizing the FOSI to provide the system-specific information required to render the font and style information. Although the FOSI provides the important service of separating the formatting information from a specific output device, it is limited in some aspects. It is sometimes necessary to insert certain pre-processing steps in front of the SGML data file that utilizes the FOSI. Some of these matters are discussed further after the characteristics and comparison sections later in this report.

3.2  Document Style Semantics and Specification Language (DSSSL)

As mentioned earlier, in any generalized markup scheme, there must be a method for associating processing specifications with the SGML markup. This method of association allows the information to be attached to specific instances of elements as well as to general classes of element types. The primary goal of DSSSL is to provide a standardized framework and methods for associating processing information with the markup of SGML documents or portions of documents. DSSSL is intended for use with documents structured as a hierarchy of elements. The main objective of this international standard is to provide a language for expressing formatting and other document processing specifications in a formal and rigorous manner so that these specifications may be processed by a broad range of formatters, either natively or using a translation mechanism. The Document Style Semantics and Specification Language (DSSSL) is used to specify the formatting and transformation of SGML documents. The initial focus of DSSSL is on formatting for both paper and electronic media and on the transformation of SGML documents marked up according to different DTDs. DSSSL may be used with any SGML documents without requiring modifications or constraining the document type definitions.

DSSSL enables formatting and other processing specifications to be associated with elements in a source SGML file to produce a formatted document for presentation. DSSSL consists of two main components: a transformation language and a style language. The transformation language is used to specify structural transformations on SGML source files. For example, all the references quoted in a book's chapter may be transformed, using a transformation specification, to appear at the end of the chapter rather than in footnotes on each page. The transformation language also can be used to specify the merging of two or more documents, the generation of indices and tables of contents, and other such applications. While the transformation language is a powerful tool for gaining the maximum use from document bases, the focus of early vendor implementations, as well as the focus of our work, will be upon the style language component. The figure shown below represents the steps involved in the transformation process. The output of the transformation process is a transformed SGML file. During the DSSSL transformation process, formatting information may be added to the result of the transformation. This information may be represented as SGML attributes. These, in turn, may be used by the style language to create formatting characteristics with specific values.



Figure 3.2-1  DSSSL's Transformation Language Flow Diagram

Figure 3.2-1 shown above illustrates the processes involved in the transformation process. In the Grove Building process, the input SGML document or subdocument is parsed and is represented by a collection of nodes called a grove. A grove is similar to an element tree, but may include other subtrees, for example, a subtree of attribute values. Relationships in a grove are expressed in terms of properties. The input to the transformation process includes the SGML document as created during the grove building step and the transformation-specification. The transformation-specification consists of a collection of associations. Each association specifies the transformation of like objects in the source document into objects in the result grove. Key to this transformation is that not only can each object be mapped to an explicit location in the result grove, but also it can be mapped to a location using the result of transforming some other source object as a reference point. The output of the transformation process is the result grove. The transformation process may operate on multiple SGML documents as input to the process, and likewise may transform them into multiple SGML documents. The transformation process produces a grove that must be converted to an SGML document for interchange, validation, and input to the formatting process. The SGML generator is used for this purpose. The output of the SGML generator shall be a valid SGML document.
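The three steps described above (grove building, transformation, and SGML generation) can be sketched in miniature. This is not the DSSSL transformation language: the `Node` class, the function names, and the single hard-coded association (footnotes move to an `endnotes` element at the end of their chapter) are hypothetical, and well-formed XML stands in for the SGML input.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Node:
    """One grove node: an element name, named properties, and children."""
    gi: str
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def build_grove(element):
    """Grove building: turn a parsed element into a tree of Node objects.

    Attribute values become properties and leading character data is kept
    in a 'text' property. Text that follows a child element is ignored to
    keep the sketch short.
    """
    props = dict(element.attrib)
    props["text"] = (element.text or "").strip()
    return Node(element.tag, props, [build_grove(c) for c in element])

def transform(node, collected=None):
    """Copy node into a result grove, moving footnotes to the chapter end."""
    if node.gi == "chapter":
        collected = []  # each chapter gathers its own notes
    result = Node(node.gi, dict(node.properties))
    for child in node.children:
        if child.gi == "footnote" and collected is not None:
            collected.append(transform(child, collected))
        else:
            result.children.append(transform(child, collected))
    if node.gi == "chapter" and collected:
        result.children.append(Node("endnotes", {}, collected))
    return result

def generate(node):
    """Generator stand-in: serialize a grove to markup (attributes omitted)."""
    inner = node.properties.get("text", "")
    inner += "".join(generate(child) for child in node.children)
    return f"<{node.gi}>{inner}</{node.gi}>"

SOURCE = ("<chapter><para>Torque the bolts.<footnote>See table 2.</footnote>"
          "</para><para>Replace the cover.</para></chapter>")

source_grove = build_grove(ET.fromstring(SOURCE))
result_grove = transform(source_grove)
output = generate(result_grove)
```

The source grove is left unchanged and a separate result grove is produced, which is then serialized. In this sketch the footnote's new location is defined relative to the transformed chapter, a simplified version of mapping an object to a location that uses another transformed object as a reference point.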

The style language is the second area standardized by DSSSL; it is used to specify how formatting characteristics are applied to an SGML document. The process that applies formatting and other formatting-related processing characteristics to an SGML document is called the formatting process. Here the term "formatting" covers the application of presentation styles to source document content and the determination of its position on the presentation medium; the selection and reordering of content in the result document with respect to its position in the input document; the inclusion of material not explicitly present in the input document, such as generated material; and the exclusion of input material from the result document. The objective of DSSSL is to provide a formal and rigorous means of expressing the range of document production specifications, including the high-quality typography required by the graphic arts industry. It supports SGML applications by providing a standardized architecture for formatting specifications, because many users require a standardized approach for interchanging formatting and other processing information. DSSSL also can be used for the extraction of information from SGML documents for loading into databases. In other words, DSSSL defines a standard way of describing the external appearance of SGML documents.

This process is controlled by the style-language-specification. It is important to note that for the DSSSL style language and the associated formatting process, DSSSL does not standardize the process itself, but merely standardizes the form and semantics of the style language controlling a portion of the process. The remaining formatting functions, such as line-breaking, column-breaking, page-breaking, and other aspects of whitespace distribution, are not standardized and are under control of the formatter. Figure 3.2-2 shown below outlines the formatting specified by the style specification.


Figure 3.2-2  DSSSL's Style Language Flow Diagram

DSSSL defines the visual appearance of a formatted document in terms of formatting characteristics attached to an intermediate tree called the flow object tree. DSSSL allows enough flexibility in the specification that it is not tied to the composition or formatting algorithms (i.e., line-breaking, page-breaking, or whitespace distribution algorithms) used by any particular formatting system. These aspects of the layout process are specific to individual implementations. In this international standard, line-breaking and page-breaking rules may be expressed in terms of constraints and other formatting characteristics that govern the formatting process. The output of the formatter, which is not defined in this international standard, is a formatted document suitable for printing or imaging.

The conceptual processes that constitute the formatting process are the building of a grove from the SGML document; the application of construction rules to the objects in the source grove to create the flow object tree; the definition of page and column geometry; and the composition and layout of the content based on the rules specified by the semantics of the flow object classes and the values of the characteristics associated with those objects.

The formatting process uses the same grove building step as the transformation process to convert the SGML document into a grove of hierarchically structured objects. The grove is then further processed, using the construction rules, to create a flow object tree consisting of flow objects with the appropriate formatting and page-layout characteristics. Each flow object (except an atomic flow object) has one or more sequences of flow object children. Each sequence of flow object children is attached to a point of a flow object called a port. Either the port is the principal port of the flow object, or it may be named. A flow object class defines a set of formatting characteristics that apply to some category of flow objects. Each flow object class also defines a set of port names. The class of a child flow object shall be compatible with the class and port name of the port to which it is attached. In constructing the flow object tree, the Standard Document Query Language (SDQL) may be used to identify portions of the SGML document that have specific formatting characteristics as well as those that can be treated together for purposes of flowing onto the same column or page. The content that is flowed together is placed as a sequence of flow objects in a port of the parent in the flow tree. The flow object classes and the characteristics that apply to them define the formatting appearance and behavior of the contents of the document.
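The relationship between flow objects, ports, and flow object classes can be made concrete with a small model. The sketch below is illustrative only: the class names `page-sequence`, `paragraph`, and `character`, the port name `header`, and the characteristic names are assumptions made for the example and are not taken from the DSSSL standard.

```python
from dataclasses import dataclass, field

# Hypothetical flow object classes. For each class: the ports it defines
# and the child classes each port accepts. "principal" stands for the
# principal port; any other key is a named port.
FLOW_CLASSES = {
    "page-sequence": {"principal": {"paragraph"}, "header": {"paragraph"}},
    "paragraph": {"principal": {"character"}},
    "character": {},  # atomic flow object: no ports, so no children
}

@dataclass
class FlowObject:
    cls: str
    characteristics: dict = field(default_factory=dict)
    ports: dict = field(default_factory=dict)  # port name -> child list

    def attach(self, child, port="principal"):
        """Attach child to a port after checking class compatibility."""
        allowed = FLOW_CLASSES[self.cls].get(port)
        if allowed is None:
            raise ValueError(f"{self.cls} has no port named {port!r}")
        if child.cls not in allowed:
            raise ValueError(f"{child.cls} not allowed in {self.cls}.{port}")
        self.ports.setdefault(port, []).append(child)
        return self

def text_paragraph(text, **characteristics):
    """Construction-rule helper: a paragraph of one flow object per character."""
    para = FlowObject("paragraph", characteristics)
    for ch in text:
        para.attach(FlowObject("character", {"char": ch}))
    return para

# Build a tiny flow object tree: body text flows into the principal port,
# a running head into the named port.
page = FlowObject("page-sequence", {"page-width": "8.5in"})
page.attach(text_paragraph("Chapter 1", font_size="8pt"), port="header")
page.attach(text_paragraph("Remove the cover.", font_size="10pt"))
```

Attaching a `character` directly to the `page-sequence` raises an error in this model, which corresponds to the requirement that a child's class be compatible with the class and port to which it is attached. Line-breaking and page-breaking are deliberately absent, since the text leaves those to the formatter.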

A complete package for publishing using DSSSL would contain the SGML source, corresponding to the structure of the source DTD, the DTD corresponding to the virtual intermediate document that is the result of the general language transformation, and the DSSSL specification, which associates semantics to each logical element according to the DSSSL document architecture.

As is the case with SGML, software is required to process the information contained in DSSSL documents. A DSSSL formatter will be able to generate specifications automatically and to read and apply them to conforming SGML documents. The resulting "page" layout instance can serve as input for a text formatter, be output in PostScript, or be generated as a Standard Page Description Language (SPDL) instance.

3.2.1  DSSSL-Online

DSSSL-Online is an application profile of DSSSL designed for the formatting specification requirements of on-line SGML browsers and editors. DSSSL-Online supports the basic features needed to provide publisher-oriented formatting control of on-line displays and a minimum set of page-oriented features needed to provide utility printouts from browsers and editors.

Within the style language, it is possible to identify a number of capabilities that for one reason or another should be considered optional for early implementations. Recognizing this, the designers of DSSSL designated certain features of the style language as optional to make limited implementations possible. The standard itself does not define any particular subset of the style language; this has been left to industry organizations and standards bodies. DSSSL-Online is the first such application profile.

This report also addresses issues pertaining to DSSSL-Online in the analysis and in the comparison matrices with the OS and DSSSL.

4.0  DSSSL VS 28001 OS CHARACTERISTICS EVALUATION


      

The main goal of this study is to independently assess the capabilities of the DSSSL standard and to compare the formatting features offered by DSSSL with those of the Output Specification in MIL-PRF-28001C for Department of Defense needs. An initial survey found that the two specifications have a great deal in common but differ definitely both in the approach taken to formatting and in the way they are structured to perform the formatting function. For these reasons, it was decided to perform an independent characteristics evaluation of each specification. This exercise would yield a set of formatting features and attributes that both specifications contain. This study and evaluation will restrict itself to a discussion of the formatting features outlined in the style language component of DSSSL.

The technique employed in this report generates a global/universal set of formatting requirements. The set consists of the formatting features necessary for the publishing of technical manuals. The components of this set have been derived by considering all the formatting features offered by five different WYSIWYG composition systems. Also included in the formatting requirements are the features addressed by the MIL-PRF-28001C OS and DSSSL.

To arrive at a set of formatting requirements for comparing the MIL-PRF-28001 Output Specification and DSSSL, we must carefully study the characteristics of each of these standards and develop a matrix of comparison criteria. This generic set of criteria must be able to expose the features that both of the specifications address. This falls in line with our development of an independent universal set of formatting requirements for technical manuals. The analysis results in a comparison matrix between the two specifications. The Venn diagram in Figure 4.0-1 graphically illustrates the methodology employed in our comparison matrix. The matrix will yield the different regions of the Venn diagram and possibly lend ideas for the implementation of a migration plan between the two specifications. For the sake of brevity in the flowing text of this report, detailed discussions of the individual attributes are omitted. Only the main sections and their descriptions are used. A comprehensive list of the characteristics and the attributes is given in Appendix A for the Output Specification and in Appendix B for DSSSL.


Figure 4.0-1  Venn Diagram Characteristics Evaluation
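
The way a comparison matrix yields the regions of the Venn diagram can be sketched with ordinary set operations. The following Python fragment is a minimal illustration under stated assumptions: the function name venn_regions and the feature labels f1 through f5 are placeholders invented for this example and do not represent findings of this report.

```python
def venn_regions(universal, os_features, dsssl_features):
    """Split a universal feature set into the four regions of the diagram."""
    return {
        "both": os_features & dsssl_features,
        "os_only": os_features - dsssl_features,
        "dsssl_only": dsssl_features - os_features,
        "neither": universal - (os_features | dsssl_features),
    }


# Placeholder feature labels only; these are not findings of this report.
universal = {"f1", "f2", "f3", "f4", "f5"}
regions = venn_regions(universal, {"f1", "f2", "f3"}, {"f2", "f3", "f4"})
```

Features falling in the "os_only" region are the ones a migration plan from the OS to DSSSL would need to examine most closely.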


4.1  Summary of Output Specification Characteristics

The goal of the OS is to allow for the interchange of style and formatting information between all types of publishing systems. It describes a method for interchanging formatting requirements for documents whose source files are tagged according to DTDs. A FOSI is the set of characteristics and values chosen from the OS DTD to represent the formatting requirements for a particular type of document. The information about the content/structure of the document should already be rigorously described in a DTD. The DTD defines the element types, the possible content/structure the document can have using these element types, and the attributes that can be associated with each element type. Formatting information may be contained in a single functional specification or may appear in a combination of specifications or other documents. A FOSI is intended to specify, in general, how a particular class of documents should be formatted. It does not specify with precise fidelity how any particular document was actually formatted; this level of precision is not required and is beyond the purpose of a FOSI.

Characteristics are, in general, descriptions of the format of a document, rather than commands that tell a formatting system what to do. For example, if a FOSI has a value of 10 for the font size (size) characteristic for element type A, this should be interpreted as follows: however your formatting system works, make sure that font size 10 is used to process element A. The FOSI should not be interpreted as saying: when you see a start-tag for element A, call a command that changes the font size and give it a value of 10. This is a subtle, but extremely important, distinction. The first interpretation allows any system (including a human) to create the desired end result, while the second interpretation allows only systems with a specific command language to easily create the desired result. In this way, the OS does not presume to direct how a formatting system should behave in order to accomplish the desired result.
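
The distinction between a description and a command can be illustrated with a short sketch. In the Python fragment below, the fosi dictionary is a hypothetical, greatly simplified stand-in for FOSI content, and the two formatter functions (including the ".fontsize" command syntax) are invented for this example. The point is only that two unlike systems can satisfy the same description in their own ways.

```python
# Hypothetical, simplified stand-in for FOSI content:
# element type -> required characteristics.
fosi = {"A": {"size": 10}}


def describe(element_type):
    """Return the required characteristics; says nothing about how to apply them."""
    return dict(fosi.get(element_type, {}))


def batch_formatter(element_type):
    """One imagined system: emits a command in its own (invented) command language."""
    return f".fontsize {describe(element_type)['size']}"


def wysiwyg_formatter(element_type):
    """Another imagined system: records a property on a style object instead."""
    return {"style": {"font_size": describe(element_type)["size"]}}
```

Neither formatter is told which command to call; each reads the same characteristic and decides for itself how to produce font size 10 for element A.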

The basic unit of data within the source document identified within a FOSI is an element (qualified by its context and occurrence). Additionally, attribute values associated with the element can be identified. Once identified, an element is treated as a whole. The characteristics associated with the element through the FOSI apply to all the content of that element. Every relevant element and attribute in the source DTD should have an entry in the FOSI describing how it is to be formatted. In processing a FOSI, there should be no assumptions made about the source data.

This section discusses the characteristics offered by MIL-PRF-28001. To completely specify all the formatting features offered by the OS, we must look at all the characteristics, elements, attributes, and the values those attributes can assume from the Output Specification Document Type Definition. This section details the features offered by each of the main subsection descriptions of the OS DTD. These are the resource, security, page, style, graphics, table, and footnote descriptions, covered in Sections 4.1.1 through 4.1.7.

4.1.1  Resource Description

The resource description (rsrcdesc) gives document-wide hyphenation rules (hyphrule), as well as descriptions of character fills (charfill), counters (counter), strings (stringdecl), and floats (floatloc) that will be used throughout the FOSI. The attributes and the values associated with them are listed in Appendix A. The hyphenation rule category provides for setting various parameters for the hyphenation process that will be used throughout the document. The character fill category provides for describing literals that can be used to fill a space horizontally or vertically. The counter construct is used in the resource description to specify the properties of a counter that will be associated with one or more Elements In Context (e-i-cs) using the enumeration (enumerat) category. The string construct is used in the resource description to specify the properties of a text variable that will be associated with one or more e-i-cs using the savetext or usetext characteristic. The float construct is used in the resource description to specify that an e-i-c's content should float to some place in the output instance other than the next available location in the Flowing Text Area.

4.1.2  Security Description

The security description is used to specify security text in a document. In the security description (secdesc), the strings are established to be automatically generated for the security text identified in the header and footer. To set up these strings, the user must know the possible values for the security attribute in the source DTD. Style and positioning characteristics are specified for the string through the sectext portion of the header and footer specification.

4.1.3  Page Models and Pagination Characteristics

A page description (pagedesc) specifies how the pages are to look (the page model) and can be set up independently of the content that goes on them. It describes the placement and relationship of the areas in which the content is to be placed. Development of the FOSI requires an analysis of the formatting specifications for page layout to determine the sizes of these areas and specification of the characteristics that control how and when these areas are created on the page. The formatting requirements and their characteristics/features are listed below.

Page sets (pageset) provide the means to specify automatic relationships between recto, verso, recto with blank back, verso with blank front, and automatically generated blank pages.

Page parameters provide layout areas within the page such as the width and depth of the page, the left and right margin widths, the width of the flow text (though the flow text width is not explicitly specified), the different numbers of columns that are possible in the flow text (flowtext) of a given page specification (pagespec), the individual column widths for that number of columns, the gutter area width, the top and bottom margins, the nominal and maximum depths for headers and footers, and finally the change markings (chgmark) area. Figure 4.1.3-1 illustrates the various parameters that can be set for a given page.

Page references are a shortcut for specifying a page model when the only difference from another page model is the header and footer information. Each page model can be assigned a unique identifier through the page id (pgid) attribute. A page reference then refers to this page model through the page id reference (pgidref) attribute, and any header and footer information additionally supplied overrides the header and footer information in the referenced page model.
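
The override behavior of a page reference can be sketched as follows. This Python fragment is illustrative only: the function resolve_page_reference, the dictionary layout, and the sample values are assumptions made for this example and do not reflect actual FOSI syntax. Only the pgid/pgidref relationship and the header/footer override rule come from the description above.

```python
def resolve_page_reference(page_models, pgidref, overrides=None):
    """Return the referenced page model with header/footer overrides applied."""
    resolved = dict(page_models[pgidref])  # copy so the base model is untouched
    for key in ("header", "footer"):
        if overrides and key in overrides:
            resolved[key] = overrides[key]
    return resolved


# Hypothetical page model keyed by its pgid.
page_models = {
    "body": {"width": "8.5in", "columns": 2,
             "header": "TM title", "footer": "page no."},
}
appendix = resolve_page_reference(page_models, "body", {"header": "Appendix title"})
```

Here the appendix pages reuse every layout parameter of the "body" model and replace only the header.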

Headers and Footers typically contain text that is dependent on document content, such as the technical manual identification number and the chapter title. The headers and footers may be specified to allow variable depth.

Positioning and Style of text placed in the headers and footers are determined by the subcharacteristics (subchar) specified for puttext and usetext and the vertical quadding (vquad) additionally specified within the header or footer. For simple cases, where the text is anticipated to be a single line, the quadding values of right, left, center, in, and out are used for horizontal positioning, and the vertical quadding values of top, middle, and bottom are used for vertical positioning. Because in and out resolve to left or right depending on the bind edge, this allows for nine positions relative to the header or footer area.
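
The count of nine positions follows from three physical horizontal values combined with three vertical values. The Python sketch below enumerates them and shows one way in and out might be resolved. The function resolve_quad is hypothetical, and the reading that "in" means toward the bind edge is an assumption of this example rather than a statement from the OS.

```python
from itertools import product

HORIZONTAL = ["left", "center", "right"]
VERTICAL = ["top", "middle", "bottom"]


def resolve_quad(quad, bind_edge):
    """Map 'in'/'out' to a physical side; other values pass through unchanged.

    Assumption: 'in' means toward the bind edge and 'out' away from it.
    """
    if quad == "in":
        return bind_edge
    if quad == "out":
        return "right" if bind_edge == "left" else "left"
    return quad


# Every (horizontal, vertical) pair available in a header or footer area.
positions = list(product(HORIZONTAL, VERTICAL))
```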

For more precise positioning and handling of multiple-line text, a box is formed using prespace (presp) to specify the distance from the top edge of the area, postspace (postsp) to specify the distance from the bottom edge of the area, left indent (leftind) to specify the distance from the left edge of the area, and right indent (rightind) to specify the distance from the right edge of the area. Quadding can then be used to specify how the lines are positioned within the box.

Security Classifications may be necessary in headers and footers, which may vary depending on the content of the page or sheet. This text is identified within the headers and footers as security text (sectext). This text is special because it is automatically generated based on the specifications within the security description (secdesc).


Figure 4.1.3-1  Page Parameter Areas

Page Numbers are generated using the enumerate specification in the page resource (pageres).

Footnote Areas can appear at the bottom of each column or span the Flowing Text Area. This part of the page model characteristics can specify the maximum amount of space that the footnotes can take up within the column, the fixed amount of space that should always appear between the text and the footnotes, length and thickness, whether footnotes can break across pages, and whether footnotes stay attached to the footer or the flowing text when a floating figure or table appears at the bottom of the page.

Floating Elements, such as tables and figures, are so called because they may appear in the resulting formatted documents in a different position than where they occurred in the source documents. Proper control of floating elements greatly improves the usability of information. The following are a few of the situations that may arise for floating elements, all of which the OS is capable of handling. Although this is by no means an exhaustive list, it may be considered representative.

  1. Floating figure - title at top of figure.
  2. Floating figure - title at bottom of figure.
  3. Multi-sheet figure - title at top of figure.
  4. Inline figure - title at bottom of figure.

    Note: An inline figure or table appears in the output in the same relative location as it is in the source document instance. It cannot be floated past other material to a place that might be more convenient in facilitating dense page layout.

  5. Figure on same page with associated text - title at top of figure.
  6. Figure on facing page with associated text - title at bottom of figure.
  7. Figure on same sheet with associated text - title at bottom of figure.
  8. All figures placed in a separate section.

4.1.4  Style Description

The Style Description is used to specify formatting characteristics for every element that may appear in the document. A common approach is to look at the formatting specification and determine the overall general requirements of formatting. These requirements can be specified for the document description, thereby providing document-wide defaults. Common sets of requirements also can be specified for certain named environments, for example, for front, body, and rear matter. The style description also allows unique requirements to be specified for element types in specific contexts, through an e-i-c for each.

The style description in the OS is very detailed and is governed by various rules and formatting attributes. The purpose of this study is to bring out the formatting requirements that the OS caters to and compare them on an equal footing with those offered by DSSSL.

The following are the major categories of the style description that are presented in MIL-PRF-28001.

Document Defaults are used to specify formatting characteristics for every element that may appear in the document. One approach is to look at the formatting specification and determine the overall general requirements. These requirements can be specified for the document description, providing document-wide defaults.

Environments are useful when some set of characteristics is common to many elements. Environments can be referred to by any e-i-c, and then only the differing characteristics for that e-i-c need to be specified.
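
The layering of document defaults, environments, and e-i-c values can be sketched as a lookup chain. In the Python fragment below, the function effective_characteristics, the environment name "frontmatter", and all sample values are hypothetical; the only behavior taken from the text is that an e-i-c need specify just the characteristics that differ from its environment.

```python
from collections import ChainMap

# Hypothetical document-wide defaults and one named environment.
doc_defaults = {"font": "serif", "size": 10, "quad": "left"}
environments = {"frontmatter": {"quad": "center"}}


def effective_characteristics(eic_values, envname=None):
    """Resolve characteristics: e-i-c values first, then environment, then defaults."""
    env = environments.get(envname, {}) if envname else {}
    return dict(ChainMap(eic_values, env, doc_defaults))


# An e-i-c that refers to the environment and changes only the size.
title = effective_characteristics({"size": 14}, "frontmatter")
```

The e-i-c states only the size; the centered quadding comes from the environment and the font from the document defaults.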

Charsubsets are used if a group of characteristics (e.g., font, leading, wordspace) is used together often. One might want to define a characteristic list subset (charsubset) with the appropriate values and then refer to it by name in a charlist. This subset would merge with the rest of the charlist.

Font is used to specify the style for text. It is possible to specify the general style of the font. The font styles are proportionally-spaced serif (serif), proportionally-spaced sans serif (sanserif), monospaced serif (monoser), and monospaced sans serif (monosans). In addition, it is possible to specify the name of an actual font (famname) that the formatting system can optionally select. Characteristics of the font include posture, weight, and proportionate width, with values such as upright, medium, and regular, respectively. Another characteristic is a value for size, typically in points. How the formatting system actually chooses a font is based on the algorithms within the system. While the specification describes font as a style modified by characteristics, it is common for font libraries to include different actual fonts for upright and italic versions. A detailed list of the characteristics and attributes is given in Appendix A.

Leading is directly related to the font size. In this specification, leading is measured from text baseline to text baseline. Therefore, the value for leading should be at least as large as the value for the font size, and is typically slightly larger.

Hyphenation must be specified for each e-i-c. If it applies, the hyphenation characteristics specified in the document description apply. The only characteristic that can be overridden is the hyphenation zone.

Word spacing and Letter spacing are generally set up for the document description and are rarely changed for a particular element. Typically, word and letter spacing values are specified with em (unit of size/distance) spaces. This allows word and letter spacing to vary with the font in use.

Kerning is to be used by the composition system to specify letter spacing. Pair kerning specifies an approach whereby kerning pairs are looked up in a kerning table. Track kerning is a methodology for placing the same amount of space between characters in a line. Sector kerning is an algorithmic approach for placing space between characters depending on the characters in the line.

Indents establish margins against which text can be positioned. Specify the indents relative to the column area boundaries, or specify indents relative to the text margins of the parent, for example, in the case of nested paragraphs.

Quadding values of left, right, center, and justify represent typical typographic positioning techniques. The values in and out work exactly the same as left and right but leave the actual determination of which side to the formatter based on the bind edge. In and out are most commonly used in headers and footers.

Highlighting attributes like scoring, score weight (scorewt), and score offset (scoreoff) are used for underlining, overbars, and strike-throughs. Reverse, color, and screens are used for special effects.

Change marks specify that either a bar or literal string appears in the change mark area to denote changed text. Font and highlight characteristics can be specified for the string; there is no leading. The change mark literal fits on one line with no line wrapping.

Prespace and postspace specify the space before and after an element, respectively. Note that consecutive prespaces and/or postspaces generally combine in such a fashion that only the highest priority spacing (or, in the case of equal priorities, the largest dimension) in a series takes effect. Some planning for prespace and postspace values will ensure that the FOSI reflects the formatting requirements.
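
The combining rule for consecutive spaces can be expressed compactly. The Python sketch below assumes each space is represented as a (priority, dimension) pair and that a larger number means a higher priority; both the representation and the function name combine_spaces are assumptions of this example, not OS syntax.

```python
def combine_spaces(spaces):
    """Return the (priority, dimension) pair that takes effect in a series.

    The highest priority wins; among equal priorities, the largest
    dimension wins. Assumes a larger number means a higher priority.
    """
    return max(spaces, key=lambda s: (s[0], s[1]))


# A postspace followed by two prespaces, as (priority, points).
effective = combine_spaces([(1, 12.0), (2, 6.0), (2, 8.0)])
```

In this series the 12-point space is discarded despite being the largest, because its priority is lower; the two remaining spaces tie on priority, so the 8-point value takes effect.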

Keeps should be used with discretion because it is very easy to specify unrealistic expectations for the formatter. Because the content of the document is unknown, such a requirement may be impossible to fulfill. Keep is turned on for elements that fit on a single page. A more common formatting requirement is to keep some pieces of adjoining elements together on a page, for example, to keep a title with the first two lines of the following paragraph. This type of requirement can be specified with keep next, keep previous (prev), widow count (widowct), and orphan count (orphanct).

Vertical justification parameters are used in performing vertical justification where the formatter may need to make some adjustments in order to balance the text in the columns. In general, a formatter should attempt to honor all specifications for an element's prespace, postspace, and keeps.

Text breaks (textbrk) typically include a formatting specification that explicitly states requirements for starting text on new columns or pages, such as starting chapters on a new page. This can be specified with start column (startcol) and start page (startpg). In addition, the page model (e.g., foldout pages) to be used when starting on a new page can be specified. The OS makes no assumptions about whether or not elements start on a new line. This information must be explicitly specified for each element that starts on a new line.

Spans are used to specify that text normally placed within a single column in the Flowing Text Area should span all the columns. Note that tables, footnotes, and graphics have special characteristics for specifying their width.

Borders that always appear on a particular type of page should be specified in the page specification (pagespec). In addition, the occurrence of certain elements on a page may trigger the appearance of a border as, for example, with emergency information. Border patterns are specified with a name, which is described in the declaration subset of the FOSI.

Rules are used for inline rules within paragraph text, for example, as in a signature line. Multiple rules can be specified on a single e-i-c; each specification draws one rule.

Character fills are used mainly for leader dots. Character fill patterns are specified in the resource description to set up how the fill string looks and to assign the string a unique name.

Automatic numbering is typically specified for structural elements such as chapters, sections, paragraphs, and steps, as well as tables, figures, and footnotes. In a FOSI, counters are set up that can be referenced by an e-i-c such that the actual value is maintained by the formatter. The counter element in the resource description (rsrcdesc) is used to set up each counter.

Suppressing text is used for text that is marked up in the source document that is intended to be used for some purpose other than in the normal text flow. Typically, the text is saved with a savetext so that it can be used elsewhere.

Puttext is used when the formatting specification requires the generation of a standard piece of text with each occurrence of an element as, for example, with the note heading for a note. This text does not appear in the source document itself; however, by specifying it in the FOSI, the user can be assured that it is consistent throughout the document.

Putgraph is used when graphics appear in the document that are not part of figures, such as the DoD seal. Putgraph allows identification of these graphics and specifies how they appear in running text.

Savetext allows for saving the content of an element for use elsewhere in the document, for example in the header, footer, or table of contents. The content also still appears in the Flowing Text Area in its normal sequence (unless inhibited by use of the suppress category). Savetext can be used to save combinations of other saved text, saved counters, pseudo-elements, and literals. Usetext is used to retrieve the saved text and specify how it is used.
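
The save-then-retrieve pattern can be sketched with a small named-string store. The TextStore class below is a toy written for this example; its method names echo the savetext and usetext categories, but its interface, the string name "chaptitle", and the sample values are assumptions and not part of the OS.

```python
class TextStore:
    """Toy store for named strings, standing in for FOSI string resources."""

    def __init__(self):
        self._saved = {}

    def savetext(self, name, *parts):
        """Save the concatenation of literals, counter values, or saved text."""
        self._saved[name] = "".join(str(p) for p in parts)

    def usetext(self, name):
        """Retrieve saved text, or an empty string if nothing was saved."""
        return self._saved.get(name, "")


store = TextStore()
# Combine a literal, a counter value, and element content into one string.
store.savetext("chaptitle", "Chapter ", 3, ": ", "Maintenance")
header_text = store.usetext("chaptitle")
```

A header specification could then retrieve the saved string by name on every page of the chapter.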

String construct is used to specify the properties of a string that will be associated with one or more e-i-cs using the savetext or usetext category. All variables that are time independent (i.e., have the same value regardless of when they are used) must be specified with their time status attribute set to "1". The scope attribute can be used to indicate an element name after which all uses of the string should be resolved.

4.1.5  Graphics Description

The graphics description is used to specify various characteristics and attributes for the formatting of graphic elements in an SGML instance. The various characteristics can be grouped as shown below:

Reproduction area dimensions,
Graphic sizing,
Text block, and
Placement.

Reproduction area dimensions give information about the size of the reproduction area (the area on the presentation media) in which the graphic is to be placed. The characteristics are width and depth.

Graphic sizing gives information concerning constraints on how to modify the size or view of graphics to be placed in the reproduction area. The following characteristics apply: Graphic Name, Horizontal Scaling, Vertical Scaling, Scale to Fit, Lower Left Coordinates, and Upper Right Coordinates.

Text block gives information concerning the size and reference point of a text block (textblock). The attributes are Text Block Width, Text Block Depth, Horizontal Reference Point, and Vertical Reference Point.

Placement specifies information concerning constraints on where and how to place graphics or text blocks with respect to the reproduction area. The attributes are Horizontal Placement, Vertical Placement, Start Coordinates, End Coordinates, and Rotation.

4.1.6  Table Description

A table is a rectangular, two-dimensional grid. Its horizontal and vertical dimensions may or may not have uniform measures; they may be determined from the source document instance, the FOSI, or both. The objects within a table are table subset groups, table subsets, columns, rows, and cells. A table is the entire rectangle that takes up space in the Flowing Text Area. Characteristics of the table control the frame. A table subset group (tgroup) is a set of table subsets containing an optional heading subset, an optional footing subset, and one or more body subsets. A table subset is a set of contiguous rows within a table such as the header, footer, or body of a tgroup. There is no space between table subsets in a table. There are three types of table subsets - heading, footing, and body subsets.

Tables present unique formatting characteristics. Tables are important in technical manuals because they contain and present a large amount of data in a format that shows relationships among the data. The characteristics for tables allow for robust and discretionary access and manipulation of data contained within tables, and facilitate exchange with, and use within, databases. The table description is used to describe the organization and formatting of an actual table, that is, data organized into a two-dimensional grid. Any associated information, such as title, is described in the style description.

A cell is the intersection of a column and row and forms the basic area into which table content is placed. Characteristics of the cell control the column and row separators, margins, and alignment.

Two aspects of specifying the style of a table are geometric and text composition. The geometric aspect includes the number of columns, rulings, and margins. The text composition aspect includes font, positioning of text within cells, and generally those characteristics that can be applied to text. Both of these kinds of characteristics can be specified in the FOSI. Special table characteristics are provided to control the style of the table itself. Composition characteristics are used to specify the style of the content.

In general, there is a unique set of characteristics for each table object. In addition, there is a set of standard cell characteristics (stdcellatts) that control the characteristics of a cell but can be specified on any table object. These characteristics include column and row separators, margins, and alignment.

Table characteristics, listed below, are unique to tables, and may be used in conjunction with composition characteristics to fully specify the output of tables. Characteristics that apply specifically to the table element are as follows:

Table Style,
Width Type,
Specific Width,
Relative Width,
Frame List,
Frame Thickness, and
Frame Style.

Cell characteristics apply to cells. They may be specified on any table object and apply to the cells within the scope of that object.

Column Separator On,
Row Separator On,
Column Separator Width,
Row Separator Width,
Column Separator Style,
Row Separator Style,
Left Margin,
Right Margin,
Top Margin,
Bottom Margin,
Horizontal Alignment,
Vertical Alignment,
Alignment Character,
Alignment Character Offset,
Reverse,
Background Color,
Shading,
Rotation, and
Text Width.

Table subset characteristics apply specifically to table subsets.

Number of Columns,
Keep,
Boundary, and
Subset Type.

Column Characteristics apply specifically to table columns.

Column Width,
Column Number,
Column Name,
Span Name,
Start Column Name, and
End Column Name.

Row characteristics apply specifically to table rows.

Break Row.

4.1.7  Footnote Description

In the footnote description, the e-i-cs that describe the elements (or pseudo-elements) will cause their contents to be placed in the footnote area of the page on which these elements are used. Associated with each e-i-c in the ftndesc is a set of footnote attributes that contains a charlist (minus keeps and span) that specifies the formatting of the footnote content itself. More precisely, the contents (e.g., the charlist) of the e-i-c in the ftndesc determine the characteristics for what gets placed in the Flowing Text Area when this e-i-c instance is encountered.

4.2  Document Style Semantics and Specification Language Characteristics

DSSSL enables formatting and other processing specifications to be associated with the elements of an SGML document to produce a formatted document for presentation. The style language is one of the areas of standardization offered by DSSSL and is used for specifying the application of formatting characteristics to an SGML document. The process that applies formatting and other formatting-related processing characteristics to an SGML document is called the formatting process. This process is controlled by the style-language-specification. It is important to note that for the DSSSL style language and the associated formatting process, DSSSL does not standardize the process itself, but merely standardizes the form and semantics of the style language controlling a portion of the process.

DSSSL defines the visual appearance of a formatted document in terms of formatting characteristics attached to an intermediate tree called the flow object tree. DSSSL allows enough flexibility in the specification so that it is not tied to a set of composition or formatting algorithms (i.e., line-breaking, page-breaking, or whitespace distribution algorithms) used by any particular formatting system. The conceptual processes that constitute the formatting process are building a grove from an input SGML document, application of construction rules to the objects in the source grove to create the flow object tree, a definition of page and column geometry, and finally the composition and layout of the content based on the rules specified by the semantics of the flow object classes and the values of the characteristics associated with those objects. Each flow object (an instance of a flow object class) is finally formatted to produce a sequence of areas having explicit dimensions and positioned by a parent in the flow object tree.

The concept of an area is used to give semantics to flow objects. The result of formatting a flow object other than the root flow object is a sequence of areas. The nature of these areas is not fully specified by this international standard. An area is a rectangular box with a fixed width and height. An area is also a specification of a set of marks that can be imaged on a presentation medium. An area may contain other areas. In particular, an area may contain a glyph. Information may be attached to areas depending on the flow object that produced the area and the context in which it is to be used. Areas are of two types: display areas and inline areas. Display areas are areas that are not directly parts of lines. A display area has an inherent absolute orientation. Inline areas are areas that are parts of lines. Each type of area is placed in a different way. For an illustration of the concept of displayed and inlined areas, please see Figure 4.2-1.

Figure 4.2-1  Inline and Display Area
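The distinction between display areas and inline areas can also be pictured as a small data structure. The following Python sketch is illustrative only: the Area type and the line_area helper are hypothetical names, dimensions are in arbitrary units, and treating a line as a display area that contains inline areas is an assumption of this sketch rather than a statement taken from the standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Area:
    """A rectangular area with fixed width and height; it may contain other areas."""
    kind: str  # "display" or "inline"
    width: float
    height: float
    children: List["Area"] = field(default_factory=list)


def line_area(inline_areas: List[Area]) -> Area:
    """Place inline areas side by side to form one line (assumed to be a display area)."""
    if any(a.kind != "inline" for a in inline_areas):
        raise ValueError("only inline areas can be parts of a line")
    return Area(
        kind="display",
        width=sum(a.width for a in inline_areas),
        height=max((a.height for a in inline_areas), default=0),
        children=list(inline_areas),
    )


glyphs = [Area("inline", 6, 10), Area("inline", 5, 12)]
line = line_area(glyphs)
print(line.kind, line.width, line.height)
```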

A flow object class defines a set of formatting characteristics that apply to some category of flow objects. Each flow object class also defines a set of port names. The class of a child flow object shall be compatible with the class and port name of the port to which it is attached. The flow object classes and the characteristics that apply to them define the formatting appearance and behavior of the contents of the document. An in-depth look at the required and optional flow object classes defined by DSSSL will yield a matrix of the formatting capabilities offered by this standard. The characteristics associated with each flow object class clearly define the options and functionality that each flow object offers for formatting.
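The compatibility rule between a child flow object and the port to which it is attached can be sketched as a lookup. The Python below is a hedged illustration, not the standard's definition: the table names and the can_attach function are hypothetical, and the few entries shown are drawn from the class descriptions in this section (display-group children are displayed, paragraph content may be inlined or displayed, line-field content is inline, and paragraph-break is allowed only in a paragraph).

```python
# Hypothetical table: (parent class, port name) -> acceptable child categories.
PORT_ACCEPTS = {
    ("display-group", "principal"): {"displayed"},
    ("paragraph", "principal"): {"inlined", "displayed"},
    ("line-field", "principal"): {"inlined"},
}

# Child classes that may appear only inside particular parent classes.
PARENT_RESTRICTED = {"paragraph-break": {"paragraph"}}


def can_attach(parent_class, port, child_class, child_category=None):
    """Return True if the child flow object may be attached to the given port."""
    allowed_parents = PARENT_RESTRICTED.get(child_class)
    if allowed_parents is not None:
        # For restricted classes, this sketch checks only the parent class.
        return parent_class in allowed_parents
    accepted = PORT_ACCEPTS.get((parent_class, port))
    return accepted is not None and child_category in accepted


print(can_attach("display-group", "principal", "paragraph", "displayed"))
print(can_attach("display-group", "principal", "character", "inlined"))
print(can_attach("display-group", "principal", "paragraph-break"))
```

A table-driven check of this kind keeps the compatibility rules as data, so a fuller version could be extended class by class without changing the function.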

To study the formatting capabilities offered by DSSSL and to stay within the scope of the comparison with the Output Specification, we must look at how DSSSL handles various types of formatting. The most effective way of achieving this is to restrict the scope of formatting requirements in DSSSL to the style language process and extract all the flow object classes that DSSSL addresses. These can be used to form the comparison matrix with the MIL-PRF-28001 Output Specification.

The following are the flow object classes defined by DSSSL along with their descriptions.

Sequence flow object class is formatted to produce the concatenation of the areas produced by each of its children.

Display-group flow object class is formatted to produce the concatenation of the display areas produced by each of its children. It has a single principal port. Its children shall all be displayed, and it is itself displayed.

Simple-page-sequence flow object class is formatted to produce a sequence of page areas. The simple-page-sequence flow object is intended for systems that wish to provide a very simple page layout facility. More complex page layouts can be obtained with the page-sequence and column-set-sequence flow object classes. A simple-page-sequence may have a single-line header and footer containing text that is constant except for a page number. A document can contain multiple simple-page-sequences.

Page-sequence flow object class is formatted to produce a sequence of page areas. The structure and positioning of the page areas shall be controlled by page-models. A few of the types of pages that DSSSL is capable of formatting are shown in the figures below.

Column-set-sequence flow object class is formatted to produce a sequence of column-set areas. A column-set area is a display area. A column-set area is produced by creating and filling an area container. A column-set area contains a set of parallel columns. Typically, column-set areas may be used to fill page-regions; however, column-set areas may also be used to fill other column-set areas. The structure and positioning of each column-set area shall be controlled by the column-set-model to which it conforms.

Paragraph flow object class represents a paragraph. It has a single principal port. The contents of this port may be either inlined or displayed. Inline flow objects are formatted to produce line areas. Displayed flow objects implicitly specify a break, and their areas shall be added to the resulting sequence of areas. A paragraph flow object may only be displayed.

Paragraph-break flow object class can be used to make a paragraph flow object represent a sequence of paragraphs. The paragraphs are separated by paragraph-break flow objects, which are atomic. Paragraph-break flow objects are allowed only in paragraph flow objects. The characteristics of a paragraph-break flow object determine how the portion of the content of the paragraph flow object following that paragraph-break flow object up to the next paragraph-break flow object, if any, is formatted.

Line-field flow object class is inlined and has inline content. It produces a single inline area. The width of this area is equal to the value of the field-width: characteristic. If the content of a line-field area cannot fit in this width, then the area grows to accommodate the content, and if the line-field occurs in a paragraph, there shall be a break after the line-field.

Sideline flow object class is used to contain flow objects that have an attachment area consisting of a line parallel to the placement direction. A sideline flow object has a single principal port that can contain both inlined and displayed flow objects. For each display area produced by its content, the sideline flow object adds an attachment. For each inline area produced by its content, the sideline flow object annotates that area so as to cause the paragraph in which the flow object occurs to add an attachment area to the line in which that inline area occurs.

Anchor flow object class is used within a flow object that is to be synchronized.

Character flow object class comprises character flow objects, which are atomic. Flow objects of this class can only be inlined.

Leader flow object class has a single principal port containing the inline flow objects to be repeated.

Embedded-text flow object class is used for embedding right-to-left text within left-to-right text or vice-versa.

Rule flow object class is used to specify a straight line. Rules may be inlined or displayed.

External-graphic flow object class is used for graphics contained in an external entity. Flow objects of this class may be inlined or displayed.

Included-container-area flow object class results in a sequence of one or more areas, each of which is specified as an area container. The size of the container shall be fixed in the direction perpendicular to the area container's filling-direction.

Score flow object class consists of a single port whose contents are scored.

Box flow object class may be used to put a box around a sequence of flow objects. The box flow object is either displayed or inlined depending on the value of the display?: characteristic. If the box is displayed, then the port shall accept any displayed flow objects. If the box is inlined, then the port shall accept any inlined flow objects.

Side-by-side flow object class requires the side-by-side feature. A side-by-side-item flow object is always displayed. It has a single principal port whose contents are displayed. The display-size of the content is the same as the display-size of the side-by-side.

Glyph-annotation flow object class is mainly used for characters, words, or phrases that have an associated description of their meaning or pronunciation. The annotation is placed on the before side in the line-progression direction of the annotated glyphs. A glyph-annotation flow object that has more than one annotated glyph shall not be broken between lines.

Alignment-point flow object class specifies an explicit alignment point for paragraphs whose first-line-align: characteristic is true.

Aligned-column flow object class is used for grouping together externally aligned paragraphs. An aligned-column is displayed. It has a single principal port that may contain any displayed flow objects. Displayed flow objects in the port that are not externally aligned paragraphs shall be formatted normally.

Multi-line-inline-note flow object class is used for placing a note inline. A multi-line-inline-note consists of an open parenthesis in approximately the same size as the glyphs before the note, two lines placed one before the other in the line-progression direction with the contents in a smaller size than the surrounding glyphs, and a close parenthesis in the same size as the open parenthesis.

Emphasizing-mark flow object class is used for emphasizing characters, words, or phrases. Each emphasizing-mark shall be placed on a path that is perpendicular to the line-progression direction and that lies before the placement path in the line-progression direction. This path is called the emphasizing-mark placement path.

Flow object classes for mathematical formulae are math-sequence, unmath, subscript, superscript, script, mark, fence, fraction, radical, math-operator, and grid. These flow objects may also be used for "linear" chemical formulae. Character flow objects are used for characters in mathematical formulae; there is no special flow object class for this. Characteristics such as font-size: or font-posture: are determined in the usual way by the characteristics of the character flow object.

Flow object classes for tables are used for specifications for tabular formatting. They make use of the following flow object classes: table, table-part, table-column, table-row, table-cell and table-border.

Flow object classes for on-line display are used for the specification of formatting in electronic delivery.
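Several of the class descriptions above state small, precise rules. As one example, the width rule given for the line-field flow object class can be written out as a short function. This Python sketch is an illustration only: the function name, its parameters, and the use of plain numbers for widths are assumptions, and the function models nothing beyond the two sentences of that description.

```python
def line_field_area(field_width, content_width, in_paragraph):
    """Return (area width, break_after) for a line-field flow object.

    The area takes the width given by field-width: unless the content
    cannot fit, in which case it grows to the content width. A break
    follows the line-field only when it overflowed inside a paragraph.
    """
    overflows = content_width > field_width
    width = content_width if overflows else field_width
    return width, (overflows and in_paragraph)


print(line_field_area(72, 50, True))
print(line_field_area(72, 90, True))
print(line_field_area(72, 90, False))
```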

4.3  DSSSL-Online Characteristics

DSSSL-Online is an application profile of DSSSL designed for the formatting specification requirements of on-line SGML browsers and editors. DSSSL-Online supports the basic features needed to provide publisher-oriented formatting control of on-line displays and a minimum set of page-oriented features needed to provide utility printouts from browsers and editors.

This section summarizes the features and flow object classes that must be supported by a minimally conformant DSSSL-Online application. Because DSSSL-Online is an application profile and because many flow object classes are optional in DSSSL, the comparison matrix is constructed under certain assumptions. DSSSL-Online is considered to contain only the Core DSSSL components and those options that are deemed necessary for on-line electronic delivery. A discussion of the flow object classes and their characteristics is omitted here because they have already been discussed in the DSSSL section and are detailed in the Appendix.

Basic Flow Object Classes

DSSSL Options Required in DSSSL-Online

DSSSL Options Not Required in DSSSL-Online



4.4  Comparison Matrix

As mentioned earlier, the construction of a comparison matrix requires a set of comparison criteria against which the features offered by DSSSL and the Output Specification will be evaluated. Using the information presented in the preceding sections and the detailed set of characteristics in Appendices A and B, we have derived the following set of criteria for evaluation. The criteria are deliberately kept general so that the matrix is not cluttered with detailed attributes and their values. The matrix also attempts to highlight the areas covered by one specification but not the other. Key formatting requirements are discussed in a section following the matrix.

Table 4.4-1  DSSSL Output Specification Comparison Matrix
Format Requirement Area
Output Specification
DSSSL
DSSSL-Online
General Document Wide Formatting
Hyphenation
X
X
X
Counters
X
X
X
Floating Locations
X
X
X
Security Text
Partial
Word and Letter Spacing
X
X
X
Kerning
X
X
X
Indents
X
X
X
Quadding
X
X
X
Highlighting
X
X
X
Keeps
X
X
X
Spanning
X
X
X
Borders
X
X
X
AutoNumbering
X
X
X
String
X
X
X
Change Package Functions
Minimal
Security Classifications
Partial
Page Layout Formatting
Page sizes
X
X
Top and Bottom Margins
X
X
Left and Right Margins
X
X
Columns
X
X
Page Numbers
X
X
Headers
X
X
Footers
X
X
Title Pages
X
X
Justification
X
X
Orientation
X
Blank/Recto/Verso
X
X
Change Markings
Partial
Security Text
X
Flowing Text
X
X
X
Gutter Area
X
X
Border Area
X
X
Binding Edges
X
X
Paragraph Formatting
Named Styles
X
X
X
Top and Bottom Margins
X
X
X
Left and Right Margins
X
X
X
Indentation (left, right, in, and out)
X
X
X
Default Character Style
X
X
X
Line Spacing
X
X
X
Align (left, right, and center)
X
X
X
Page Breaks
X
X
X
Word Spacing Control
X
X
X
Tabs
X
X
X
AutoNumbering
X
Line Numbering
X
X
Shading
X
X
Hyphenation Control
X
X
X
Quadding
X
X
X
Prespace/Postspace
X
X
X
Line Wrapping Control
X
X
X
Line Truncate Control
X
X
X
Widow-Count
X
X
Orphan-Count
X
X
Language
X
Country
X
Writing-Mode
X
Paragraph Breaks
X
X
Character Formatting
Named Character Styles
X
X
X
Font Name
X
X
X
Font Size
X
X
X
Bold/Underline/Strike
X
X
X
Bold Weights
X
X
X
Italic/Other Angles
X
X
X
Dotted Underline
X
X
Overbar
X
X
Revision Marking
X
Pair/Track Kerning
X
X
X
Color
X
X
X
All Capitals
X
X
X
Shadow
X
X
X
Math-Font-Posture
X
X
Hyphenation Control
X
X
White Spaces
X
X
Ligatures
X
X
Math-Classes
X
Language
X
Country
X
Table Formatting
Named Table Styles
X
Page Breaks (before, after, and with)
X
X
X
Force Page/Column Break
X
X
X
Column Width
X
X
X
Top and Bottom Margins
X
X
X
Left and Right Margins
X
X
X
Align (left, right, and center)
X
X
X
Widow-Orphan Control
X
X
Header/ Footer Rows
X
X
X
Table Border Rules
X
X
X
Border Style
X
X
X
Table Auto Width
X
X
Border Rounded Corners
X
X
Table Part Formatting
X
X
Table Row Formatting
Page Break
X
X
X
Force Page/Column Break
X
X
X
Table Column Formatting
Column Number
X
X
Spanned Columns
X
X
X
Alignment
X
X
X
Indents
X
X
X
Table Cell Formatting
Ruling (left, right, top, and bottom)
X
X
X
Straddling
X
X
X
Margins in Cells
X
X
X
Shading
X
X
X
Orientation of Text
X
X
Background Color
X
X
X
Line Patterns
X
X
X
Line Thickness
X
X
X
Float Out Marginalia
X
X
Graphics Formatting
Displayed/Inlined
X
X
X
Dimensions
X
X
X
Horizontal Graphic Scaling
X
X
X
Vertical Graphic Scaling
X
X
X
Scale to Fit
X
X
X
Coordinates
X
X
X
Text Block Width
X
X
X
Text Block Depth
X
X
X
Horizontal Placement
X
X
X
Vertical Placement
X
X
X
Security Text
Headers
X
Footers
X
Embedded Text
Direction
X
Line Breaks
X
Box Formatting
Display/Inlined
X
X
Color
X
X
X
Corners Rounded
X
X
Line Types
X
X
X
Side By Side Items
X
Glyph Annotation
Placement
X
Style
X
Notes
X
X
X
Layout Driven Generated Text
X
X
X
Asian Language Options
X
Footnotes
Single Column
X
X
Multiple Column
X
X
For Each Text Column
X
X
Anchoring
X
X
Tables in footnotes
X
X
Mathematical Formulae
Superscript
X
Subscript
X
Alignment
X
Mark
X
Fence
X
Fraction
X
Radical
X
Math Operator
X
Grid
X
Grid Cell
X
Interactive Electronic Delivery
Vertical Scroll
X
X
Filling Direction
X
X
Multimode Presentation
X
X
Link
X
X
Marginalia
X
X

4.5  Comments on Matrix

One of the salient points of difference between the two specifications lies in the support for mathematical elements. The MIL-PRF-28001 OS and Presentation Specification (PS) do not support the specification of formatting characteristics for mathematical elements. When creating an SGML-tagged source file for a specific document or contract, mathematical elements must be handled in one of four ways:

For additional information on tagging of mathematical notation, see MIL-HDBK-28001.

Other salient features not addressed completely by the Output Specification include

In the August 1995 meeting of the Output Specification Committee of the CALS Industry Steering Group (ISG) Electronic Publishing Committee (EPC), discussions were held on the direction of the Output Specification. There is a possibility of the development of four separate OSs. They would be the current OS, an OS to support Electronic Technical Manuals (ETMs) and Interactive Electronic Technical Manuals (IETMs), an OS to support change processing, and one for a series of simple subsets. It is our understanding, however, that these subsets are not being added in the current version, the draft MIL-PRF-28001C. Some members of the committee feel that the OS should retain its current form with security enhancements and ETM (without IETM capability) and move ahead. The justification for this is that a FOSI is capable of doing 80% of the processing needed for a technical manual, which is more than most users expect out of a formatting mechanism.

While on the same topic, it is necessary to point out that neither DSSSL nor DSSSL-Online addresses the problem of change processing as applied to technical manuals. Our opinion is that the issue is one of formatting and is not really relevant to a structured formatting specification such as DSSSL.

Support for other languages is one aspect that the Output Specification does not address. DSSSL, on the other hand, addresses this issue quite well and supports features such as right-to-left writing mode and Asian languages. Such a capability is probably not felt to be of great importance in the DoD realm, but it is a feature of DSSSL not reflected in the OS.

The following features of DSSSL are not required in DSSSL-Online: complex typographic features, bidi options, Asian language options, complex SGML options, and mathematical options.
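Because DSSSL-Online is a profile of DSSSL, a style specification written for full DSSSL may rely on features that a minimal DSSSL-Online application need not support. The Python sketch below shows one way such a check could look. It is illustrative only: the feature labels are simply the group names listed in this section, the function name is hypothetical, and "tables" in the usage line is an invented label for a feature outside that list.

```python
# Feature groups that this section lists as not required in DSSSL-Online.
NOT_REQUIRED_ONLINE = {
    "complex typographic features",
    "bidi options",
    "Asian language options",
    "complex SGML options",
    "mathematical options",
}


def features_at_risk(features_used):
    """Return, sorted, the used features a minimal DSSSL-Online application need not support."""
    return sorted(set(features_used) & NOT_REQUIRED_ONLINE)


print(features_at_risk(["tables", "mathematical options", "bidi options"]))
```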

4.6  Internet DART

DART is an acronym for DSSSL Assessment Repository Tool. DART uses the above matrix (Table 4.4-1) organized into the specific characteristics of the OS and flow object classes of DSSSL. Under each characteristic or flow object class subset, a concise set of information known as metadata is displayed. This metadata includes a hyperlink that displays information about the particular feature or flow object class characteristic in the other specification. The information repository is built on a Hypertext Transfer Protocol (HTTP) server so that it may be accessed by anyone over the Internet. We have used a product called MOREplus for the design and implementation of this repository.

MOREplus is an information management tool consisting of a set of user interface executables that operate in conjunction with hypertext servers to provide access to a relational database. It can best be described as the core component of corporate or project information systems. The MOREplus user interface, from browsing and searching to presenting and organizing stored information, is provided by World Wide Web browser clients. The system administration functionality necessary to add and update information, users, and other administrators is also accessed through common browsers. Look and feel can be customized to include company or division logos through a TCL (Tool Command Language) layer that is manipulated without recompilation. Information can be organized to meet a variety of classification needs. Access to data is achieved through hypertext links that take the user to database records or to sites maintained on the Web for information not physically stored within the database. Access to proprietary or sensitive data can be restricted to designated groups of users and administrators. MOREplus offers both natural language and pattern mat