Guide to OmniMark 5

control structure   do sgml-parse    

Syntax

  do sgml-parse document (with id-checking Boolean-expression)?  (with utf-8 Boolean-expression)? (creating sgml-dtds key keyname)?
     scan (input source | (input input-function-call))
     action+
  done

  do sgml-parse subdocument  (with id-checking Boolean-expression)?  (creating sgml-dtds key keyname)?
     scan (input source | (input input-function-call))
     action+
  done

  do sgml-parse instance (with document-element element-name)?
     with (sgml-dtds key key | current sgml-dtd)
     (with id-checking Boolean-value)?

     scan (input source | (input input-function-call))
     action+
  done


Purpose

You can invoke the SGML parser with do sgml-parse. (To invoke the XML parser, use do xml-parse.) do sgml-parse begins a code block, terminated by done, in which you must do the following:

  1. Identify the type of data to be processed: document, subdocument, or instance.
  2. Identify the source of this data: a stream, a file, or an input function.
  3. Perform any processing that should take place at the start of the data.
  4. Perform exactly one parse continuation operator (%c or suppress) to initiate processing of the data by markup rules.
  5. Perform any processing that should take place at the end of the data.

The simplest use of do sgml-parse is to process a complete SGML document:

  do sgml-parse document scan file "my-sgml.sgm"
     output "%c"
  done

This assumes that the file "my-sgml.sgm" contains a complete SGML document (DTD and instance). You will often find that the DTD and the instance you want to process are in two different files. The simplest way to handle this is:

  do sgml-parse document scan file "my-dtd.dtd" || file "my-sgml.sgm"
     output "%c"
  done
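
The examples so far perform only step 4 of the list above. As a minimal sketch of steps 3 and 5, the following variation emits text before and after the parse continuation operator. The output strings are purely illustrative and are not taken from any particular application:

```omnimark
  do sgml-parse document scan file "my-dtd.dtd" || file "my-sgml.sgm"
     ; step 3: processing at the start of the data
     output "parse starting%n"
     ; step 4: exactly one parse continuation operator
     output "%c"
     ; step 5: processing at the end of the data
     output "%nparse finished%n"
  done
```

Everything the markup rules write appears between the two illustrative strings, because the rules fire while "%c" is being processed.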

But suppose you have 20 instances to process, all of which use the same DTD. It is wasteful to parse the same DTD 20 times. To avoid doing this you can pre-compile the DTD and place it on the built-in shelf sgml-dtds:

  do sgml-parse document
     creating sgml-dtds key "my-dtd"
     scan file "my-dtd.dtd"
     suppress
  done

You can then process each instance in turn. The following code assumes you have placed the file names of the instances on a shelf called "my-instances":

  repeat over my-instances
     do sgml-parse instance
        with sgml-dtds key "my-dtd"
        scan file my-instances
        output "%c"
     done
  again

In some cases you may wish to parse a partial instance, that is, a piece of data whose outermost element is declared in a DTD but is not the doctype element of that DTD. In this case you can specify the element to be used as the effective doctype for parsing the data:

  do sgml-parse instance
     with document-element "lamb"
     with sgml-dtds key "my-dtd"
     scan file "partinst.sgm"
     output "%c"
  done

The element's start and end tags can be present, or they can be omitted if the element's declaration allows it. SGML comments, processing instructions, and even marked sections can precede the element's start tag and follow its end tag, but anything else (particularly other elements, data, entity references, or USEMAP declarations) is an error.

You can also use do sgml-parse to parse an SGML subdocument. Subdocument processing can only occur in the middle of parsing another SGML document that includes the subdocument reference. The concrete syntax defined by the document currently being processed is used to parse the subdocument. In accordance with the SGML standard, the subdocument's text must not contain an SGML declaration.

The following example shows how to make references to SGML subdocument entities trigger parsing of those entities. The text of each subdocument entity is assumed to be in a file whose name is taken from the system identifier (given in the entity declaration or provided by a library rule), from the "public text description" part of the public identifier, or from the name of the entity (uppercased and with the ".ent" file extension appended).

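  external-data-entity #implied when entity is subdoc-entity
     local stream file-name
     ; warn about deep nesting; parsing still continues
     output "subdoc depth exceeded!%n"
            when number of current subdocuments > 100
     ; choose the file name from the best identifier available
     do when entity is system
        set file-name to "%eq"
     else when entity is in-library
        set file-name to "%epq"
     else when entity is public
        ; extract the public text description from the public identifier
        do scan "%pq"
        match (["+-"] "//")? ((lookahead ! "//") any)* "//"
              [ \ " "]* " " "-//"?
              ((lookahead ! "//") any)* => public-text-description
           set file-name to public-text-description
        done
     else
        ; fall back to the uppercased entity name plus ".ent"
        set file-name to "%uq.ent"
     done
     ; parse the subdocument, firing markup rules on its content
     do sgml-parse subdocument scan file file-name
        output "%c"
     done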

Processing a subdocument increments the integer value returned by number of current subdocuments (and decrements it when the action has finished). However, OmniMark does not issue an error message when the subdocument nesting level exceeds that allowed by the concrete syntax, or when "subdoc no" is specified by the concrete syntax.

By default, OmniMark checks all SGML IDREF attributes to make sure they reference valid IDs. This checking may not be appropriate in processing a partial instance. It also takes time. You can turn this checking on and off using with id-checking followed by a Boolean expression. The following code will parse the specified document without checking IDREFs:

  do sgml-parse document with id-checking false scan file "my-sgml.sgm"
     output "%c"
  done
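
Since the paragraph above singles out partial instances, here is a sketch of the same modifier applied to an instance parse. It assumes the DTD has already been compiled onto sgml-dtds under the key "my-dtd", and it reuses the illustrative element name "lamb" and file name "partinst.sgm" from the earlier partial-instance example. The order of the modifiers follows the syntax summary at the top of this page:

```omnimark
  do sgml-parse instance
     with document-element "lamb"
     with sgml-dtds key "my-dtd"
     ; skip IDREF validation: the IDs referred to may lie outside this fragment
     with id-checking false
     scan file "partinst.sgm"
     output "%c"
  done
```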

When parsing a document, markup rules are fired as follows (if specified in your code):

When parsing a subdocument, markup rules are fired as follows (if specified in your code):

When parsing an instance, only general markup rules are fired.

As with subdocument, instance saves and resets the integer value returned by the number of current subdocuments and restores the saved value when the action is finished.

do sgml-parse saves the current setting of sgml-in and sgml-out and restores them at the end of the action.

If there are errors in the SGML declaration or prolog (DTD), processing of the content of the do sgml-parse action terminates, and execution resumes with the actions that follow the parse continuation operator in the body of the do sgml-parse. However, the amount of input read in this situation is undefined: OmniMark may consume the entire input source, stop reading the input immediately, or do something in between.

SGML is an ASCII-based language. This means that numeric character references greater than 127 (for example, &#239;) have no predefined encoding. By default, the OmniMark parser outputs character references between 128 and 255 as the equivalent single byte values, and character references greater than 255 cause a markup error.

If the document you are processing contains numerical character references greater than 127, you can instruct the parser to output them as UTF-8 byte sequences. This will allow character references above 255 to be output as UTF-8 byte encodings. This is appropriate if, and only if, your output will be encoded and interpreted as a UTF-8 document.

To turn on UTF-8 output of character references, use the with utf-8 modifier with a Boolean expression that evaluates to true:

  process
      do sgml-parse document with utf-8 true
          scan file "myfile.sgm"
          output "%c"
      done

Note that actual UTF-8 encoded characters in your input data are unaffected by this setting.

Note that with utf-8 can only be used with a full document and not with a subdocument or instance parse. Subdocument processing inherits the UTF-8 setting of the parent parse.

Related Syntax
   creating
   sgml-dtds
   do xml-parse

Related Concepts
   Input
   Input functions
   SGML DTDs: creating
   XML/SGML parsing: built-in shelves
 
Generated: August 11, 2000 at 3:07:10 pm
If you have any comments about this section of the documentation, send email to docerrors@omnimark.com

Copyright © OmniMark Technologies Corporation, 1988-2000.