Guide to OmniMark 5

Scopes

Scopes play a particularly important role in the design and execution of OmniMark programs. OmniMark has five kinds of scopes:

Lexical scopes

A lexical scope is a scope in the written structure of the program. For instance, a rule is a lexical scope -- it is written as a series of lines one after another. A function is also a lexical scope. Within a rule or function, a repeat loop or a do...done block is also a lexical scope.

Lexical scopes define the visibility of variables. You can declare a local variable in any lexical scope and it will be visible only to code within that scope. Where one lexical scope is nested inside another, variables declared in the outer scope are visible in the inner scope, unless a variable of the same name is declared in the inner scope. In this case, the variable in the outer scope is hidden within the inner scope, but it still exists in the outer scope.

  process
  local stream foo
  local stream bar
      set foo to "A"
      set bar to "Z"
      output foo || bar
      do
          local stream foo
          set foo to "B"
          set bar to "Y"
          output foo || bar
      done
      output foo || bar

In this program the process rule is one lexical scope. The do...done block is another lexical scope nested inside the lexical scope of the rule. The program outputs "AZBYAY". The variable "bar", declared in the outer scope, is visible in the inner scope, so when its value is changed in the inner scope, the original variable is changed. The variable "foo", on the other hand, is a different variable inside the do...done block from the one declared in the rule. Changing the value of foo in the do...done block does not change the value of foo in the outer scope.

Execution scopes

An execution scope is a lexical scope that is actually being executed at a particular point in a program. Just as lexical scopes can be nested in each other lexically, as shown above, execution scopes can be nested in each other in the course of program execution. The most straightforward case of execution nesting is a function call.

  define integer function sum
      (value integer foo,
       value integer bar
      )
      as
      return foo + bar

  process
  local stream foo
  local stream bar
      set foo to "A"
      set bar to "Z"
      output foo || bar
      do
          local stream foo
          set foo to "B"
          set bar to "Y"
          output foo || bar
          output "d" % sum (2, 4)
      done
      output foo || bar

Here the function "sum" is an entirely separate lexical scope. The variable names "foo" and "bar" used in the function have nothing to do with the variable names "foo" and "bar" in the process rule. But as the program is executed, the execution scope of the function is nested inside the execution scope of the process rule.

A more common case, in OmniMark, is the nested execution scoping that occurs when a find rule fires as a result of a submit in a rule:

  process
      output "<rhyme>"
      submit "Mary had a little lamb"
      output "</rhyme>"

  find ("Mary" | "lamb") => person
      output "<person>" || person || "</person>" 

This program outputs "<rhyme><person>Mary</person> had a little <person>lamb</person></rhyme>". In this program, the execution of the find rule is nested inside the execution of the process rule. The submit initiates the scanning of the input data and invokes the find rules. It is this execution scoping that ensures that the "<rhyme>" and "</rhyme>" tags get wrapped around the material output as a result of the submit.

The find rule and the process rule are independent lexical scopes but nested execution scopes. Note, however, that unlike the previous example in which the nested execution scope of the function was directly invoked by the function call, in this case it is the data that determines if and when a find rule will be executed in the execution scope established by the process rule. The fact that the data drives program execution in this way is what makes OmniMark such a powerful text processing tool.

While local variables are never visible outside their lexical scope, they are still instantiated for as long as their lexical scope is in execution scope, and they may well be active. Consider the following program:

  process
      local stream foo
      open foo as file "foo.txt"
      using output as foo
      do
          output "<rhyme>"
          submit "Mary had a little lamb"
          output "</rhyme>"
      done

  find ("Mary" | "lamb") => person
      output "<person>" || person || "</person>" 

In this case the local stream variable "foo" created in the process rule is the current output stream for the block bounded by using output as foo, do, and done. It is not lexically in scope in the find rule, so no code in the find rule can address or manipulate it, but it is still very much active: it is the stream that receives the output when the find rule says "output".

Output scopes

What we saw above was in fact the establishment of an output scope. In most languages, "output" or its equivalent takes the form of an assignment statement, and the variable the assignment is made to must be in lexical scope. In OmniMark, the question of where output goes to is separated from the act of creating output, meaning that the stream that receives output does not have to be lexically in scope for you to output to it. Instead, a stream can be placed in an output scope. Once a stream is in the current output scope, all output will go to it, no matter what lexical scope the output statement occurs in.

You can use the keywords using output as to create an output scope and to place a stream variable into that output scope. Like any other kind of scope, output scopes can be nested:

  process
      local stream foo
      open foo as file "foo.txt"
      using output as foo
      do
          output "<rhyme>"
          submit "Mary had a little lamb"
          output "</rhyme>"
      done

  find ("Mary" | "lamb") => person
      local stream foo
      reopen foo as file "foo2.txt"
      using output as foo
        output "<person>" || person || "</person>"    

Here a new output scope is established in the find rule, causing the material output in the find rule to be sent to a different destination. This output scope is nested inside the output scope created in the process rule, which becomes the current output scope again as soon as the find rule exits.

You can also place a stream into the current output scope, without creating a new output scope, using output-to:

  global stream foo
  global stream bar

  process
      open foo as buffer
      open bar as buffer
      using output as foo
          submit "Mary had a little lamb"
      close foo
      close bar
      output "Foo contains: " || foo
      output "%nBar contains: " || bar

  find " a "
      output-to bar

This program outputs the following:

  Foo contains: Mary had
  Bar contains: little lamb

The output-to in the find rule resets the destination of the output scope established by the using output as in the process rule. Thus the rest of the text goes to the new destination.

In general, you should use using output as rather than output-to, but output-to is useful in certain situations, especially when the destination of data is determined by examining the data itself.

Consider a piece of XML that might be used to send files across a network. It encapsulates the name of the file and its contents in "name" and "data" elements inside a "file" element:

  <file>
  <name>myfile.txt</name>
  <data>The content of the file.</data>
  </file>

We can process this with the following program:

  global stream file-data
  process
      do xml-parse document
          scan file "files.xml"
          suppress
      done

  element file
      suppress
      close file-data

  element name
      open file-data as file "%c"
      output-to file-data

  element data
      output "%c"

Here the entire processing of the XML is done in a single output scope, but every time we find a filename in the input we change the destination of the current output scope. It would be difficult to do the same thing with using output as, because the "data" element is not nested inside the "name" element, so an output scope established in the "name" element would have expired before the "data" element was processed.

(By the way, the example shows poor XML language design. It would have been better to make the filename an attribute of the "file" element. But you can't always control the format of the data you have to process.)
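
As a rough sketch of that alternative design, suppose the filename were carried in a hypothetical "name" attribute of the "file" element. The "data" element would then be nested inside the element that knows the filename, so using output as would be sufficient. The attribute name, the reshaped input, and the use of the "%v(name)" format item to read the attribute value are assumptions made for illustration, not part of the original example.

```omnimark
; Assumed input, with the filename moved to a hypothetical "name" attribute:
;   <file name="myfile.txt">
;   <data>The content of the file.</data>
;   </file>

process
    do xml-parse document
        scan file "files.xml"
        output "%c"
    done

element file
    ; The stream is local: it is needed only while this element is processed.
    local stream file-data
    open file-data as file "%v(name)"
    ; Everything output while the content of "file" is parsed goes to file-data.
    using output as file-data
        output "%c"
    close file-data

element data
    output "%c"
```

Because the output scope is tied to the "file" element, it ends when that element ends, and the local stream never outlives the output scope it was placed in.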

If you use output-to, note that placing a stream into an output scope does not exempt it from the rules of lexical scoping when it comes to the life span of variables. Local variables are created when their lexical scope enters execution scope and destroyed when their lexical scope leaves execution scope. It is an error to allow a local variable to go out of execution scope while it is still in an output scope. You will avoid this problem if you create output scopes only with using output as.

Input scopes

We have already seen several examples of an input scope. Every example above that uses a submit or do xml-parse is creating a new input scope. Input scopes are the flip side of output scopes. Just as output scopes determine where output goes, so input scopes determine where input comes from. Just as we never have to say where output goes to in an output statement, we never have to say where the input comes from when we write a find rule. Output goes to the current output scope. Input comes from the current input scope.

A new input scope is created by every submit, every variant of scan, and every matches test. They establish the current input for the execution scopes contained within them. They also initiate scanning of that source. You can change the current input scope without initiating scanning with using input as.

Once an input scope is established, it is in effect for the execution scope of the submit, scan, or using input as that established it. Within that scope, you can initiate a new scan of the current source using #current-input. This allows you to perform efficient one-pass scanning of nested structures by initiating a new scan for each level of nesting, without the need to capture the whole structure and re-scan it.

The following code demonstrates this with the function "sum-of-csv", which calculates the sum of a series of comma-separated values found in the current input. This function could be called anywhere there is a current input scope, and it will consume a series of comma-separated numeric values from the current input scope and return the sum. It will exit as soon as it encounters data that does not fit the pattern it is looking for, leaving the current input scope intact, but with the comma-separated-value data consumed.

  define integer function sum-of-csv
      as
      local integer sum initial {0}
      repeat scan #current-input
          match white-space*
                digit+ => number
                white-space*
                ","?
              set sum to sum + number
      again
      return sum

  process
      repeat scan "Results: (12,34,65, 92 , 75 )"
          match "Results:" white-space* "("
              output "Total: " || "d" % sum-of-csv
          match ")"
      again    

Note the difference between this code and the more common programming practice represented by the following program:

  define integer function sum
      read-only integer numbers
      as
      local integer total initial {0}
      repeat over numbers
          set total to total + numbers
      again
      return total

  process
      local integer numbers variable
      repeat scan "Results: (12,34,65, 92 , 75 )"
      match "Results:" white-space* "(" [digit or space or ","]* => csv ")"
          repeat scan csv
          match digit+ => num
              set new numbers to num
          match any
          again
          output "Total: " || "d" % sum numbers
      again

These two pieces of code differ in two ways. First, in the second, more conventional program, the outer level of code is responsible for identifying the whole nested structure. This has a kind of symmetry about it, but the symmetry is misleading. The task of recognizing the beginning of a nested structure takes place outside the nested structure. (You find the door marked "IN" when you are outside; you find the door marked "OUT" when you are inside.) The task of recognizing the end of a nested structure should take place inside the nested structure. In our first example, the function that handles the comma-separated values is responsible for figuring out when the comma-separated values end. It does this very easily by exiting the repeat scan as soon as it sees a character that does not fit the pattern it is looking for.

The second difference between the two programs is that the second program has to scan the csv data twice -- once when it is trying to find it in the data stream, and again when it is analyzed in the second repeat scan. The first program processes the csv data and finds the end of the structure in one pass.
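
Since "sum-of-csv" reads only from #current-input, it should work equally well in an input scope that was established without any scanning. The following is a minimal sketch of that idea: the function is repeated from the first example so that the program stands alone, and the input string is made up for illustration. It assumes that a string literal is acceptable as the source in using input as.

```omnimark
; Repeated unchanged from the first example above.
define integer function sum-of-csv
    as
    local integer sum initial {0}
    repeat scan #current-input
        match white-space*
              digit+ => number
              white-space*
              ","?
            set sum to sum + number
    again
    return sum

process
    ; using input as establishes the input scope but does not start scanning.
    ; Scanning begins only when sum-of-csv scans #current-input.
    using input as "12,34,65"
    do
        output "Total: " || "d" % sum-of-csv
        output "%n"
    done
```

The function does not need to know how its input scope came into being. The caller decides where the input comes from, and the function decides how much of it to consume.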

Referent scopes

Referents also exist in scopes. The default referent scope is established at the start of a program and is resolved when the program ends. You can use using nested-referents to establish a nested referent scope. A nested referent scope is in effect for the duration of the execution scope with which it is established. Nested referent scopes have three advantages:

  1. In a server program, they allow you to use referents and have them resolved without having to shut down the program. This is the only effective way to use referents in a server program.
  2. They allow you to resolve referents as soon as is needed so that you can use the resulting output in your program.
  3. They allow you to use referents in processing nested structures. For instance, if you use referents to write code to process tables, and if your data tables can contain other tables, using nested referent scopes lets you call your table code recursively without corrupting the referents for the parent table.
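
As a minimal sketch of the second advantage, the following program writes a referent before its value is known and has it resolved when the nested referent scope ends, rather than when the program ends. The referent name "entry-count" and the output text are hypothetical. The sketch assumes that the keyword is spelled using nested-referents and that referents may be written to the default output.

```omnimark
process
    using nested-referents
    do
        ; The referent is written before its value is known.
        output "Entries: "
        output referent "entry-count"
        output "%n"
        output "first%nsecond%n"
        ; The value is supplied later, inside the same referent scope.
        set referent "entry-count" to "2"
    done
    ; The nested referent scope has ended, so "entry-count" has been resolved.
    output "Done.%n"
```

The same pattern applies to the recursive case in the third point: each level of nesting gets its own referent scope, so a referent set for an inner structure cannot overwrite one of the same name belonging to the outer structure.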

       
Generated: August 11, 2000 at 3:06:28 pm
If you have any comments about this section of the documentation, send email to docerrors@omnimark.com

Copyright © OmniMark Technologies Corporation, 1988-2000.