CGI programming with OmniMark

The Common Gateway Interface (CGI) is a protocol that allows web servers to invoke and communicate with other programs. Through CGI, a web server can call a program, send input to that program, and receive the program's output, which it then relays to a web browser.

Writing OmniMark CGI programs is similar to writing other programs in OmniMark, with a few differences in how your program has to handle input and output data.

Before you can begin using OmniMark CGI programs, you may have to make some minor changes to your computer and web server setup. Information about configuring your system should be available in your operating system or web server documentation.

Running an OmniMark CGI program involves three basic steps:

Invoking the program
Receiving input data
Doing the main processing

The third step is, of course, the heart of your program. In OmniMark CGI programs, this third step is exactly the same as in any other type of OmniMark program. You can do anything with OmniMark in a CGI program that you can do in any other OmniMark program, including processing markup, using external function libraries, or interacting with databases.

Invoking an OmniMark CGI program

Before an OmniMark CGI program can receive input data from a web server, the web server has to be able to successfully invoke the OmniMark program. To do this, the web server has to know where it can find the OmniMark executable.

Some web servers (for example, Apache) require that all CGI programs begin with a #! (hash-bang) directive that tells the web server where to find the software with which to run your CGI program. In an OmniMark CGI program, this line must include the full path and name of your OmniMark executable, followed by a command-line option:

     #!/usr/bin/omnimark/bin/omnimark -sb

The "-sb" command-line option is new to OmniMark 5, and is the functional equivalent of a "-s" combined with "-brief".

The command-line option in the #! line might look a little strange to people who are used to the OmniMark command line, since the -sb option isn't followed by a filename. This is because the #! line implicitly refers to the file in which it occurs. For example, assuming that a program beginning with the line above is saved as helloworld.xom, your operating system will interpret the #! directive in this program as:

     #!/usr/bin/omnimark/bin/omnimark -sb helloworld.xom
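
The expansion described above can be sketched in a few lines. The sketch is written in Python purely for illustration. It assumes Linux-style handling, in which everything after the interpreter path is passed as one single argument. The function name expand_shebang is hypothetical and is not part of OmniMark or of any operating system interface.

```python
def expand_shebang(first_line: str, script_path: str) -> list[str]:
    """Return the argument list a Unix-like system would build for a script.

    Assumption: Linux-style handling, where the text after the interpreter
    path is treated as a single argument, and the script's own path is
    appended at the end.
    """
    if not first_line.startswith("#!"):
        raise ValueError("not a #! line")
    # Split into the interpreter path and an optional single argument.
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("no interpreter given")
    argv = [parts[0]]
    if len(parts) == 2:
        argv.append(parts[1].strip())
    # The path of the file containing the #! line always comes last.
    argv.append(script_path)
    return argv


print(expand_shebang("#!/usr/bin/omnimark/bin/omnimark -sb", "helloworld.xom"))
# ['/usr/bin/omnimark/bin/omnimark', '-sb', 'helloworld.xom']
```

In practice the system appends the path by which the script was invoked, which may be a full path rather than a bare file name.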

Note that you can use a #! line in an arguments file as well:

     #!/usr/bin/omnimark/bin/omnimark -f

     -sb helloworld.xom
     -x /usr/bin/omnimark/lib/=L.so
     -i /usr/bin/omnimark/xin/

The -f in the arguments file is interpreted the same way as the -sb in the program: the operating system interprets the -f as being followed by the name of the arguments file itself.

Using a #! line in OmniMark CGI programs does not reduce the portability of your code: OmniMark ignores the #! line as if it were a comment, as do web servers that don't require it. Note, however, that if you use a #! line, it must be the first line in the program. Anything preceding the #! line will produce an error.

Web servers that don't use the #! line must be specifically configured to recognize OmniMark CGI programs and to find the OmniMark executable. This is usually done by creating file associations so that the system uses OmniMark to execute .xom and .xar files. For example:

  ".xom" = "d:\programs\omnimark -sb %s"
  ".xar" = "d:\programs\omnimark -f %s"

Receiving input data

CGI programs receive their input data from the web server in two ways: through environment variables and through standard input. The means used to retrieve the main input data depends upon the method used to send that data, which will usually be either GET or POST.

When you specify the GET method, the web server puts the main input data into an environment variable called QUERY_STRING. When you specify the POST method, the web server sends the main input data to the CGI program through standard input.

The method your OmniMark program uses when retrieving GET data will differ from the method used when retrieving POST data. Using the OmniMark CGI library, however, you can easily create an OmniMark CGI program that will successfully retrieve data sent by either of these methods.
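
To make the difference between the two methods concrete, here is a minimal sketch of the retrieval logic, written in Python for illustration only. It is not the implementation of the OmniMark CGI library. The function name read_cgi_input and its parameters are hypothetical; the environment variable names are the standard CGI ones listed at the end of this section.

```python
import io
import os
import sys


def read_cgi_input(environ=None, stdin=None) -> str:
    """Return the raw, still-encoded input data for a CGI request.

    GET:  the data is taken from the QUERY_STRING environment variable.
    POST: exactly CONTENT_LENGTH bytes are read from standard input.
    """
    environ = os.environ if environ is None else environ
    stdin = sys.stdin.buffer if stdin is None else stdin

    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # Read only the announced number of bytes; the server is not
        # required to close standard input after sending the data.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii", errors="replace")
    return environ.get("QUERY_STRING", "")


# Simulated requests (the values are made up for the example).
print(read_cgi_input({"REQUEST_METHOD": "GET", "QUERY_STRING": "a=1&b=2"},
                     io.BytesIO(b"")))
print(read_cgi_input({"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "7"},
                     io.BytesIO(b"a=1&b=2")))
```

Reading a fixed number of bytes rather than waiting for end-of-file is the reason input buffering matters for POST data.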

Using the OmniMark CGI library

The OmniMark CGI library contains two functions and a macro:

cgiGetQuery function
cgiGetEnv function
crlf macro

The cgiGetQuery function retrieves the data that the web server sends to your OmniMark CGI program, parses it, and puts the data on a keyed shelf of name/value pairs. The cgiGetEnv function retrieves the values of a variety of CGI-related environment variables and puts the data on a keyed shelf of name/value pairs. The crlf macro lets you write a carriage return and line feed pair (%13#%10#) instead of %n to insert a new line in your program output. Using %13#%10# rather than %n ensures the portability of your code.

Because the OmniMark CGI function library uses functions in the OmniMark System Utilities library ("omutil"), you must declare and include the System Utilities library before including the "omcgi.xin" file.

Here's an example of the cgiGetQuery function in action:

     declare #process-input has unbuffered
     declare #process-output has binary-mode

     include "omutil.xin"
     include "omcgi.xin"

     process
        local stream input-data variable initial-size 0

        cgiGetQuery into input-data

        output "Content-type: text/plain"
            || crlf
            || crlf

        repeat over input-data
           output key of input-data || " - " || input-data || crlf
        again

When called by a web server, the above program retrieves the query string from the QUERY_STRING environment variable if the GET method was used to send the data, or from #process-input if the POST method was used. The program parses the query string and puts the name/value pairs on the input-data shelf. The program then outputs a minimal HTTP header (Content-type: text/plain), and repeats over the input-data shelf, outputting the name/value pairs.
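
The parsing step can be illustrated independently of OmniMark. The following Python sketch decodes a URL-encoded query string into name/value pairs and prints them in the same "name - value" form as the program above. It is an illustration of the encoding only, not of how cgiGetQuery is implemented. The function name parse_query is hypothetical, and letting the last value win when a name occurs more than once is a simplification chosen for this sketch.

```python
from urllib.parse import parse_qsl


def parse_query(query: str) -> dict[str, str]:
    """Decode a URL-encoded query string into a name/value mapping.

    "+" becomes a space and "%XX" escapes are decoded. If a name occurs
    more than once, the last value wins (a simplification for this sketch).
    """
    pairs = parse_qsl(query, keep_blank_values=True)
    return dict(pairs)


for name, value in parse_query("name=Jane+Doe&city=Ottawa%2C+ON").items():
    print(f"{name} - {value}")
# name - Jane Doe
# city - Ottawa, ON
```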

Note that in the program, #process-input is declared as unbuffered. Under normal circumstances, all OmniMark streams are buffered. When doing CGI programming, however, this buffering causes problems when you read your input data: if #process-input is buffered, your OmniMark program will never be able to get all the data it's waiting for. Therefore, you always have to tell your program to unbuffer #process-input. Do this with a declaration at the beginning of the program:

     declare #process-input has unbuffered

The cgiGetQuery example shown above is an extremely simple CGI program. It does, however, do all the essential things that any OmniMark CGI program must do: it retrieves and parses the input data sent by the web server, and it sends a minimal HTTP header to the web server prior to sending the main bulk of the output.

Sending output data

Data written by your OmniMark CGI program is sent to standard output (which is the default #process-output stream) and is then relayed to the web browser. For the web browser to properly format the output, however, your program must output a minimal HTTP header before outputting the data to be displayed.

Setting #process-output to binary-mode ensures that the system running your CGI program will properly interpret the %13#%10# of the CRLF macro. This ensures that your code is portable among systems.

Declare the #process-output stream as binary-mode with the following declaration:

  declare #process-output has binary-mode

Formatting output data

Here's a simple OmniMark CGI program:

     ; declarations and inclusions
     declare #process-input has unbuffered
     declare #process-output has binary-mode

     include "omutil.xin"
     include "omcgi.xin"

     process
        output "Content-type: text/plain"
            || crlf
            || crlf
            || "Hello World!"
            || crlf

Assuming that this program is saved as "helloworld.xom" and has an accompanying arguments file saved as "helloworld.xar", you can invoke the program by using the path and name of the arguments file in a URL. For example:

     http://localhost/cgi-bin/helloworld.xar

If the web server is properly configured, it will receive the request for the helloworld.xar file, which then runs the helloworld.xom program. The program will execute, and the output (an HTTP header followed by "Hello World!") will be sent to the browser. The browser, in turn, will display that output as plain ASCII text.

Before you send the main content of the page to the browser, you have to send an HTTP header. In most cases, a very minimal HTTP header will suffice, so long as it contains the content-type information for the page you are sending. The two most common types of page content are plaintext and HTML, the minimal HTTP headers for which are:

      "Content-type: text/plain" || CRLF || CRLF
      "Content-type: text/html"  || CRLF || CRLF

Notice the two new lines (|| crlf || crlf) appended to the end of each of the HTTP header lines above. These new lines are required, because the blank line following the HTTP header indicates to the web browser that the HTTP header is complete, and that everything that follows is part of the page content. If you forget these new lines when sending output to the browser, the web browser will attempt to interpret all of the output as part of the header, which will result in an error.
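
The byte layout that results from this rule can be shown with a short Python sketch, given here for illustration only. The names CRLF, build_response, and write_response are hypothetical. Writing bytes to the binary standard output stream plays the same role as declaring #process-output as binary-mode: the carriage return and line feed are sent exactly as written.

```python
import sys

CRLF = b"\r\n"  # character 13 followed by character 10


def build_response(body: str, content_type: str = "text/plain") -> bytes:
    """Return a minimal CGI response: one header line, a blank line, the body."""
    header = b"Content-type: " + content_type.encode("ascii") + CRLF
    # The second CRLF produces the blank line that ends the header.
    return header + CRLF + body.encode("ascii", errors="replace")


def write_response(response: bytes, out=None) -> None:
    """Write the response as raw bytes so no newline translation occurs."""
    out = sys.stdout.buffer if out is None else out
    out.write(response)
    out.flush()


print(build_response("Hello World!"))
# b'Content-type: text/plain\r\n\r\nHello World!'
```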

Other than sending a minimal HTTP header followed by two new line characters, there is nothing special about the output of an OmniMark CGI program. Anything your program sends to standard output (#process-output, which is the default output destination) will be sent to the web browser.

Unbuffering #process-output

While you don't have to declare #process-output as unbuffered in your OmniMark CGI programs, it can sometimes be a good idea to do so. If #process-output is unbuffered, users can see responses from your CGI program a little more quickly than if #process-output is buffered. In most cases the change in response time is negligible, because OmniMark CGI programs tend to be extremely fast. If your CGI program executes a large number of database queries or some particularly time-consuming processing, however, unbuffering #process-output can reduce the perceived "wait time" for the user, and your CGI program will seem more responsive.

You can unbuffer #process-output with the following declaration:

  declare #process-output has unbuffered

Error message handling

Handling the error messages that OmniMark CGI programs generate is a bit more complicated than handling those of other OmniMark programs.

When you execute a regular OmniMark program on the command line, any errors that program generates are sent to standard error and displayed in the console window. Since OmniMark CGI programs are executed by the web server rather than through the command line, the error messages are sent back to the web server and usually end up being written in the web server error log file.

If your OmniMark CGI program has errors, the HTTP header that the web browser is expecting doesn't get sent. Instead, the web server receives one or more OmniMark error messages. Since an OmniMark error message does not qualify as a valid HTTP header, the "header" the web server receives (which is actually the OmniMark error message) gets recorded in the server error log as part of a "malformed header" error. The web browser in this case will usually display an HTTP 500 error, indicating an internal server error. To see the error messages, you'll have to open and read the server error log.
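
When debugging, it can help to run the program from the command line, capture its output, and check whether that output begins with something a web server would accept as a header. The following Python sketch shows one such check. It is a simplified test written for illustration, not the validation that any particular web server performs, and the function name starts_with_cgi_header is hypothetical.

```python
def starts_with_cgi_header(output: bytes) -> bool:
    """Roughly check that output begins with a header block and a blank line.

    Simplified rule: every line before the first blank line must look like
    "Name: value", with no spaces in the name.
    """
    # Accept both CRLF and bare LF line endings.
    text = output.replace(b"\r\n", b"\n")
    head, blank, _body = text.partition(b"\n\n")
    if not blank:
        return False  # no blank line, so the header never ends
    for line in head.split(b"\n"):
        name, colon, _value = line.partition(b":")
        if not colon or not name or b" " in name:
            return False
    return True


good = b"Content-type: text/plain\r\n\r\nHello World!"
bad = b"an error message with no header\r\n"
print(starts_with_cgi_header(good))  # True
print(starts_with_cgi_header(bad))   # False
```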

If you don't want to tackle the web server error log or if the OmniMark error messages aren't being written to it, you can create an error log for your OmniMark CGI program using the -log or -alog option in the arguments file:

     #!/usr/bin/omnimark/bin/omnimark -f
     -sb helloworld.xom
     -alog helloworld.log

All error messages that OmniMark generates will be recorded in the file specified after the -log or -alog option. If errors occur in your program, your web browser will display a CGI error message stating that your CGI program returned an incomplete set of HTTP headers. This occurs because the web server didn't actually receive anything from the program; the program output (the error messages in this case) was sent to the log file instead.

Note that the -log and -alog command-line options should be used to create a log file only for debugging purposes. If you use these options in your CGI program when it is running in a production environment, your program could encounter concurrency problems if two instances of the CGI program are trying to write to the same log file simultaneously. To avoid these problems, stop using the -log or -alog option after you have finished debugging your program.

CGI-related environment variables

When a web server receives a request for a CGI program, it also stores other CGI-related information in environment variables. You can access these environment variables using the UTIL_GetEnv function in the OmniMark System Utilities library ("omutil"). Not all web servers will set all environment variables. You can use the cgiGetEnv function to retrieve all of the following environment variable values into a keyed shelf of name/value pairs:

AUTH_TYPE: The authentication protocol currently being used. This variable is defined only if the server supports, and if access to the CGI program requires, authentication.
CONTENT_LENGTH: The length, in bytes, of the information that the web server sends to the CGI program as input. This variable is used most often when the CGI program will be processing input being sent from an HTML form using the POST method.
CONTENT_TYPE: The type of content that the web server sends to the CGI program as input.
DOCUMENT_ROOT: This variable is set to the value of the DocumentRoot directive of the accessed website.
GATEWAY_INTERFACE: The version of the Common Gateway Interface that the web server supports.
HTTP_ACCEPT: A comma-separated list of MIME types that the browser software accepts.
HTTP_ACCEPT_CHARSET: The character set that the client will accept.
HTTP_ACCEPT_LANGUAGE: The language that the client will accept.
HTTP_CONNECTION: The type of connection that the client and server use. For example, "HTTP_CONNECTION = Keep-Alive".
HTTP_HOST: The IP address or host name of the accessed machine.
HTTP_REFERER: The URI that forwarded the request to the called CGI program.
HTTP_USER_AGENT: The browser software and operating system that the client system is running.
PATH_INFO: Extra path information from the request.
PATH_TRANSLATED: The physical path that results from applying the server's virtual-to-physical mapping to the extra path information in PATH_INFO.
QUERY_STRING: Contains the encoded data from a form submission when that form is submitted using the GET method. If a form is submitted using the POST method, the form data is not placed in this environment variable; it is passed to the CGI program through standard input (in OmniMark terms, through #process-input).
REMOTE_ADDR: The IP address of the client machine.
REMOTE_HOST: The host name of the client machine.
REMOTE_IDENT: Stores the user identification information returned by the remote identd (identification daemon). Few systems run this type of daemon process, however, so this environment variable is rarely set.
REMOTE_PORT: The port number the client uses to originate the connection to request the CGI program.
REMOTE_USER: The authenticated user ID of the user requesting the CGI program. This variable is defined only if the server supports, and if access to the program requires, authentication.
REQUEST_METHOD: The method by which the CGI program was called (usually "GET" or "POST").
REQUEST_URI: The URI of the request.
SCRIPT_FILENAME: The full file-system path of the requested CGI program.
SCRIPT_NAME: The virtual path to the program.
SERVER_ADMIN: The email address of the web server administrator, as set by the ServerAdmin directive (if one is used) in the web server's configuration file.
SERVER_NAME: The configured host name for the server (usually www.something.com).
SERVER_PORT: The number of the port on which the server software is listening for requests (usually 80, the default web server port).
SERVER_PROTOCOL: The version of the web protocol that the server uses (for example, HTTP/1.0).
SERVER_SOFTWARE: The web server software and version number.
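
The idea of gathering these variables into name/value pairs can be sketched as follows. The sketch is in Python for illustration only and does not describe how cgiGetEnv is implemented. The names CGI_VARIABLES and collect_cgi_env are hypothetical. Because not all web servers set all variables, only the ones actually present are returned.

```python
import os

# The CGI-related variable names listed above.
CGI_VARIABLES = (
    "AUTH_TYPE", "CONTENT_LENGTH", "CONTENT_TYPE", "DOCUMENT_ROOT",
    "GATEWAY_INTERFACE", "HTTP_ACCEPT", "HTTP_ACCEPT_CHARSET",
    "HTTP_ACCEPT_LANGUAGE", "HTTP_CONNECTION", "HTTP_HOST", "HTTP_REFERER",
    "HTTP_USER_AGENT", "PATH_INFO", "PATH_TRANSLATED", "QUERY_STRING",
    "REMOTE_ADDR", "REMOTE_HOST", "REMOTE_IDENT", "REMOTE_PORT",
    "REMOTE_USER", "REQUEST_METHOD", "REQUEST_URI", "SCRIPT_FILENAME",
    "SCRIPT_NAME", "SERVER_ADMIN", "SERVER_NAME", "SERVER_PORT",
    "SERVER_PROTOCOL", "SERVER_SOFTWARE",
)


def collect_cgi_env(environ=None) -> dict[str, str]:
    """Return the CGI-related variables that are set, as name/value pairs."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in CGI_VARIABLES if name in environ}


# Simulated environment (the values are made up for the example).
sample = {"REQUEST_METHOD": "GET", "SERVER_PORT": "80", "HOME": "/home/user"}
for name, value in collect_cgi_env(sample).items():
    print(f"{name} - {value}")
```

Variables that are not CGI-related, such as HOME in the sample, are left out of the result.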

Generated: August 11, 2000 at 3:06:17 pm
If you have any comments about this section of the documentation, send email to docerrors@omnimark.com

Copyright © OmniMark Technologies Corporation, 1988-2000.