Floating point data type
The floating point data type allows you to store numbers in floating point form. This means that the numbers behave like numbers written in scientific notation -- they consist not only of a string of significant digits, but also of a base and an exponent.
Floating point numbers are limited in precision.
Floating point numbers are particularly appropriate for physics and astronomical calculations -- calculations where the result is either very small or very large.
For most applications, BCD numbers are superior to floating point numbers. There are three principal differences between the two types:
Base 2 versus base 10
Floating point numbers are represented internally as binary (base 2) numbers. They provide precise representation of fractional numbers that are powers of 2 (1/2, 1/4, 1/8, 1/16, and so forth), but they do not provide precise representation of fractions that are powers of 10 (1/10, 1/100, 1/1000). Any fraction that can be precisely represented in base 2 can be precisely represented in base 10, but not vice versa. (There are, of course, many fractions that cannot be precisely represented in either base 2 or base 10 -- 1/3 for example.)
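The difference between the two bases can be seen in a small sketch. It is written in Python rather than OmniMark, on the assumption that Python's binary float and its decimal type behave, for this purpose, like the floating point and BCD types described here; the variable names are illustrative only.

```python
from decimal import Decimal
from fractions import Fraction

# Fractions that are powers of 2 are stored exactly in a binary float.
exact_binary = (0.5 + 0.25 == 0.75)

# 1/10 has no finite binary expansion, so the stored value is only close to 0.1.
stored_tenth = Fraction(0.1)  # the exact value the float actually holds
tenth_is_exact = (stored_tenth == Fraction(1, 10))

# The small representation errors become visible after arithmetic.
binary_sum_matches = (0.1 + 0.2 == 0.3)

# A base 10 representation holds the same fractions exactly.
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(exact_binary, tenth_is_exact, binary_sum_matches, decimal_sum_matches)
# prints: True False False True
```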
Limited size versus unlimited size
Floating point numbers are of a limited size and are represented by a fixed number of bytes of memory. BCD numbers, as implemented by the OmniMark BCD library, are of unlimited size.
Floating point versus fixed point
Floating point numbers, as their name implies, have a floating decimal point. That is, floating point numbers have a fixed number of significant bits which are distributed between the whole number portion and the fractional portion of the number. The larger the whole number portion of the number, the fewer bits are available for the fractional part.
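The trade-off between the whole number portion and the fractional portion can be sketched as follows. The example is in Python and assumes a 64-bit binary float with 53 significant bits; the size of the OmniMark float type is not stated here and may differ.

```python
# Assumption: Python floats are 64-bit binary floats with 53 significant bits.
small = 1.0
large = float(2 ** 53)  # 9007199254740992.0

# Near 1.0, almost all bits are available for the fraction,
# so a very small fractional step is still representable.
small_step_kept = (small + 2.0 ** -52 != small)

# Near 2**53, every significant bit is used by the whole number portion:
# adding 0.5, or even 1.0, is lost to rounding.
half_lost = (large + 0.5 == large)
one_lost = (large + 1.0 == large)

print(small_step_kept, half_lost, one_lost)
# prints: True True True
```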
Mixing floating point and integer values
You can mix integer variables and floating point variables in mathematical expressions. Thus, you can write:
include "omfloat.xin"

process
   local float price initial {6.37 * float 10 ** 3}
   local float total
   local integer quantity initial {3}

   set total to quantity * price
   output "Total = " || "d" % total || "%n"
   ;Output: "Total = 19110"
Note that if you perform an operation on two integers and assign the result to a floating point number, the operation will be done as an integer operation and the result will be coerced to a float. Thus the following code will produce an incorrect result, even though a float can hold the result of 1000000 * 2000000:
include "omfloat.xin"

process
   local integer large initial {1000000}
   local integer larger initial {2000000}
   local float largest

   set largest to float(large * larger)
   output "Largest = " || "d" % largest || "%n"
   ;Output: "Largest = -1454759936" (This is incorrect.)
In this case, the result of the integer operation large * larger will overflow before the coercion to a floating point number. The correct way to code this operation is to force one of the operands to float before the operation is performed. This causes the operation to be performed as a floating point operation, returning a floating point value:
include "omfloat.xin"

process
   local integer large initial {1000000}
   local integer larger initial {2000000}
   local float largest

   set largest to float large * larger
   output "Largest = " || "d" % largest || "%n"
   ;Output: "Largest = 2000000000000" (This is correct.)
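The incorrect value -1454759936 is what a signed 32-bit integer multiplication yields when the product wraps around. The Python sketch below reproduces both results under that assumption; wrap_int32 is a hypothetical helper written for this illustration, not part of any OmniMark library.

```python
def wrap_int32(value):
    """Reduce an integer to a signed 32-bit two's complement value (assumed width)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

large = 1000000
larger = 2000000

# Integer multiplication first, then conversion: the product wraps before coercion.
wrapped_then_float = float(wrap_int32(large * larger))

# One operand converted first: the multiplication is done in floating point.
float_then_multiply = float(large) * larger

print(wrapped_then_float)   # -1454759936.0
print(float_then_multiply)  # 2000000000000.0
```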
Supported operators
You can use the following operators with floating point numbers:
Handling floating point errors
In the event of an error in a calculation, the Floating Point library will return NaN. NaN means "Not a Number".
include "omfloat.xin"
include "builtins.xin"

process
   local float total initial {2.2}
   local stream foo initial {"foo"}

   set total to total + foo
   output "Total = " || "d" % total || "%n"
   ; Output: "Total = NaN"
   ; Note: "NaN" means "Not a Number"