Author: Aleksandr Gofen
Dynamic Arrays overview
Answer:
The Long and Winding Way of the Dynamic Array
Dynamic arrays were introduced to Object Pascal in Delphi 4. It was, however, not
the first attempt of the Pascal/Delphi team to evolve the static array of Wirth's
original Pascal.
Before going any further, let's first clarify some terminology. The terms "static"
and "dynamic" are now applied at least four ways:
For arrays whose boundaries may vary (dynamic), versus arrays with constant
boundaries (static).
For the methods of assigning memory to the variables: Either their relative
addresses are known at compile time (static), or the addresses are assigned by the
system at run time (dynamic). Correspondingly, there are two methods of memory
allocation: on the stack (static), or on the heap (dynamic).
There are two methods referring to variables: either directly by their names
(static), with one-to-one correspondence between a variable and its instance in
memory; or indirectly via pointers (dynamic) where a variable and its instance are
not the same.
Methods in the class declaration may be either static or virtual/dynamic.
This article follows the evolution of the "dynamic" concept with regard to arrays
only. We will analyze its particularities for all array types appearing in
Borland's Object Pascal. In addition to the standard arrays (static), there are now
three more types in the family of arrays: open, variant, and dynamic. Why so many?
Although it's beyond the scope of this article, we should also add to this list all
the different types of strings in Delphi 4, which are a family of special character
arrays whose diversity is even larger.
The Origin
The idea of dynamic arrays has a long history, beginning as early as ALGOL-60. In
ALGOL-60, the syntax of array type looks similar to that of Pascal. For example, a
declaration:
Real array A[M1: N1, M2: N2, M3: N3]
defines a 3D array of Real numbers. Only constants (integer numbers) are allowed as
the array boundaries in the outermost block - just as in Pascal; in this case, the
array is called static. But in the inner blocks of ALGOL-60, unlike Pascal, the
boundaries may be variables (see Figure 1).
1 begin
2 Real array A[1: 100, 0..10]; { Static array in ALGOL-60. }
3 Integer M1, N1, M2, N2; { Variables. }
4 { M1, N1, M2, N2 have to be defined. }
5 ...
6 begin
7 Real array B[M1: N1, M2: N2]; { Dynamic array in ALGOL-60. }
8 Real array C[1: 3, 1: 4]; { "Static" array in ALGOL-60. }
9 ...
10 end;
11 ...
12 end
Figure 1: Variable boundaries in the inner blocks of ALGOL-60.
In the inner blocks, the array boundaries could be any arithmetic expression with
the only requirement that numeric values were assigned to all variables in the
boundaries before entering the block. Therefore, beginning with ALGOL-60, the
concept of a dynamic array means that its boundaries may vary during run time,
while for static arrays only explicit numbers are allowed in the boundary
expressions.
For arrays that are local variables (like arrays B and C in Figure 1), the memory
was allocated when entering the block and released when leaving it. Because
declarations of the variables local in a block could appear only in the beginning
of that block, no complications with redefining dynamic arrays occurred, and
obviously such arrays could not be kept till the next entrance in this block.
Nevertheless, in ALGOL-60, one could use the specifier own for any local variable
including dynamic arrays, which meant that, although the visibility of the
concerned variables obeyed the rule of scope, their values were kept available
after re-entering the block. According to the Revised Report on ALGOL-60, this
persistency in case of dynamic arrays ought to be followed also for the subset of
indexes that are valid for the current and previous versions of the own array,
although specific compilers could limit or simplify that behavior.
Note: Regarding compile-time versus run-time assignment of memory to the variables,
it was typically run-time assignment in ALGOL-60 versus compile time in Pascal,
although both use the stack.
The Long and Winding Road
In Wirth's Pascal, programmers could derive an unlimited number of new types from
the basic types, but, for some reasons, the basic array type strictly required only
constant boundaries and, therefore, was allowed to deal only with static arrays.
Wirth's motivation probably was to have very fast and efficient one-pass compilers,
for which all but indirect variables (i.e. except instances of pointer types) are
compile-time variables providing the most efficient access. As a drawback, there
was no way to overcome the static nature of the arrays, e.g. to develop procedures
that deal with vectors, matrixes, and other structures of arbitrary sizes. We had
to hard-code all array sizes as the maximum possible numeric values in the const
section at least once.
For one-dimensional arrays, with a constant low boundary Low1 and variable boundary
High1, we could resort to a trick involving indirect variables (pointers):
13 type
14 TMaxArray = array[Low1..MaxInteger] of AnyType;
15 PArray = ^TMaxArray;
and later allocate memory to the instance of PArray according to the actual value
of High1:
16
17 GetMem(PArray, (High1 - Low1 + 1) * SizeOf(AnyType));
Then, we can use index expressions like PArray^[k]; we can omit ^ because of the
undocumented syntax feature of Delphi. Unfortunately, for multi-dimensional arrays,
this approach with indirect variables doesn't work as simply with just one pointer.
Another drawback is that we have to deal with pointers instead of direct variables
(as we are responsible for allocating and de-allocating memory and other possible
confusions connected with indirect access).
Open Arrays
The open array, introduced in Delphi 1, was the first extension of Pascal's concept
of static arrays, but it wasn't really a type like the others that could be used to
declare variables. Instead, it was an intrinsic type, applicable only to formal
parameters in procedures and functions. If a formal parameter looks like this:
FormalArr: array of TSomeType;
then the actual parameter may be either a one-dimensional array of type TSomeType,
just one variable, or the so-called open array constructor [Expr1, Expr2, ...,
ExprN] - all of type TSomeType. The latter is a nice feature not available in
ALGOL-60, otherwise the behavior of the open array as a formal parameter suffers
two serious drawbacks. First, only one-dimensional arrays as actual parameters are
allowed. Second, whatever the low and high boundaries of the actual array are,
they're always mapped to the zero-based formal open array. The same is true for the
open array constructor. The expressions are numbered starting with 0, which looks a
little confusing. This zero-based indexing of the open arrays isn't consistent with
the more convenient Pascal array boundaries of low..high type, but, for some
reason, Borland still adheres to this principle.
The open arrays enabled us to overcome a serious limitation of static arrays, and
allowed us to deal comfortably with one-dimensional arrays. Thus, procedures for
simple vectors of any size like scalar product, average, maximum, minimum, etc.
were no longer a problem.
The variant open array formal parameter looks like:
FormalVarArr: array of const ;
and is intended only to transfer the open array constructors containing expressions
of different types as an actual parameter - an extension of the similar feature for
open arrays and the predecessor of the more general idea of the type variant.
Variant Arrays
The Variant type, introduced by Delphi 2, was a very powerful extension of Pascal
intended for different purposes. We're going to discuss it here only with regard to
the concept of dynamic arrays. A variable declared as Variant may represent a
multi-dimensional dynamic array, but you need a special non-Pascal statement to
specify the dimensions and the type of elements:
18 var
19 vArr: Variant;
20 {...}
21 begin
22 {...}
23 vArr := VarArrayCreate([Low1, High1, ..., LowN, HighN],
24 ElementType);
where the boundaries may be variables and ElementType belongs to the fixed list of
basic Pascal types denoted by the identifiers of the format varXXXX, for example
varInteger, varDouble. After that we can consider vArr as an N-dimensional index
variable vArr[i1,i2,...,iN]. This implementation is the closest to the notion of
dynamic arrays as it appeared in ALGOL-60; it really made possible the
multi-dimensional rectangular arrays with the variable boundaries of low..high
type. Unfortunately, access to elements of variant arrays is at least 10 times
slower than to static arrays; a simple benchmark that transposes a big matrix (e.g.
A[i,j]:=A[j,i], i=1,...,N; j = 1,...,i-1) demonstrates it well. Also, in terms of
memory consumption, any variant variable requires a 16-byte overhead. Although it
doesn't seem too much if one variable represents a big array, it's something to
keep in mind in case of many non-array variants.
The re-dimensioning of variant arrays is possible within the same block, but only
for the last (right-most) dimension; the special function, varArrayRedim, does the
job.
As to the efficiency of access to the elements, it may be improved to the level as
fast as that of static arrays via the special procedure varArrayLock. It returns a
pointer that is assignment compatible with pointers to any static array, but
meaningful only if that static type corresponds exactly to the dimensions specified
in the varArrayCreate. For example, for vArr, the corresponding static array type
must be:
TvArrStat = array[LoN..HiN, ..., Lo1..Hi1] of ElementType;
with the dimensions specified in the order inverse to that in the varArrayCreate
(why?!) and all LoN, HiN, ... Lo1, Hi1 being constants equal to the current values
of the corresponding variable boundaries. Then, providing the declaration:
25 var
26 vArrLock: ^TvArrStat;
the statement:
vArrLock := varArrayLock(vArr)
allows us to use the index variable vArrLock^[iN,...,i1] (or simply
vArrLock[iN,...,i1]) with the access speed as quick as static arrays. We increased
speed, but, to deal with variable boundaries, we must explicitly declare as many
different static array types as we are going to have in run time. For example, we
may need to prepare in advance several type declarations:
27 type
28 T200x200 = array[1..200; 1..200] of Real;
29 T150x150 = array[1..150; 1..150] of Real;
30 {...}
and then specify the variable dimensions in the varArrayCreate according to one of
these types - not a very convenient technique.
The interesting feature of variant arrays is that the ElementType may be variant,
too:
31
32 vArr := VarArrayCreate([Low1, High1, varVariant)
In particular, it means that individual elements vArr[k] may be defined as a
variant array again with any number of dimensions of any size:
vArr[k] := VarArrayCreate([kLow, kHigh, varDouble)
This creates an illusion as though we can treat vArr as a two-dimensional,
non-rectangular array. (The examples of non-rectangular arrays of two dimensions
are triangle matrixes, or matrixes with just a few stripes. For three dimensions,
it may be an integer grid of points inside a pyramid.) Unfortunately, the variant
array of variant arrays doesn't work like the similar construction of the standard
Pascal arrays. Providing the declaration of vArr given previously, the code
compiles for the index variable like vArr[i,j], but stops with a run-time error
(because vArr is created as one-dimensional). Surprisingly, vArr[i][j] - that
should be the synonym in Pascal - shows different behavior: It even produces a
syntax error if it appears in the left side of the assignment statement; a :=
vArr[i][j] compiles and runs correctly, while vArr[i][j] := b doesn't, resulting in
a syntax error.
So we see that although the variant array type allows some functionality of the
ALGOL-60's dynamic arrays, the variant arrays are far more complex, slow, and not
consistent with the Pascal's concept of arrays both in syntax and semantics.
Dynamic Arrays of Delphi 4
Finally, here is the latest attempt to implement the dynamic arrays (covered only
on two pages of the Object Pascal Language Guide!). The declaration of
one-dimensional dynamic arrays looks like this:
type
TDynArray1 = array of baseType;
boundaries [...] must be omitted, where the baseType may be also a static array
type or a dynamic array type again. This allows the declaration of the
multi-dimensional "mixture" as well as "purely" dynamic arrays. For example, the
declaration for three dimensions takes the following form:
type
TDynArray3 = array of array of array of baseType;
This syntax allows us to declare a certain number of dimensions, but not their
sizes (which require special consideration). Thus, if the baseType is of the
non-array type, a variable:
var
A, B: TDynArray3
may be used with up to three indexes, e.g. A[i,j,k]. Otherwise, if the baseType =
array[1..100;1..200] of Double, this variable may appear with up to five indexes
A[k1,k2,k3,k4,k5].
After a dynamic array variable is declared, it still cannot be used unless the
special statement SetLength specifies the sizes of all dimensions and allocates the
required memory. This shows the important difference between the static and dynamic
arrays. The latter are - but only partially behave like - hidden pointers, i.e. a
dynamic array variable is not strictly associated with its memory image, the
instance, but rather separates from it.
Thus, the above-mentioned variable A (without indexes), or A[i], or A[i,j] (with
number of indexes less than the declared number) are all hidden pointers. As such,
at certain moments they may point nowhere, or more than one hidden pointer may
point to the same instance. For example, after the assignment A := B, both A and B
point to the same instance of B, so that any change to the elements of A affects B;
this contradicts the usual meaning of the assignment statement. While the instance
of A (if it exists) seems to be lost because it's pointed to by nothing, it doesn't
cause a memory leak, which is prevented by the so-called reference count technique
implemented for the dynamic arrays. For that reason, two consecutive statements -
SetLength(A, ...) and SetLength(A, ...) - don't cause the loss of the piece of
memory allocated in the first statement (leak) - unlike the similar situation, say,
for classes. The sequence:
33 X := TAnyClass.Create;
34 X := TAnyClass.Create;
35
36 //is a mistake. Also, the assignment:
37
38 A := nil
actually signals to the system that the memory (instance) must be freed, which is
never the case for classes or pointers.
And even if:
A[i1, i2, i3] = B[i1, i2, i3]
for all indexes, it never means that the conditions:
A = B or A[i1] = B[i1] or A[i1, i2] = B[i1, i2]
are True, because these partially-indexed variables point to different locations.
In terms of persistency, while leaving and re-entering a block, the dynamic array
variables behave like all other local (static) variables: leaving the block, the
variables and their instances are freed automatically. For local variables of the
types class and pointer it's wrong to leave the block without freeing all the
instances of all such variables - the reason why such variables must be declared,
for example, global. The user should nil a dynamic array only if it's important to
free the memory before leaving the block. Hence, dynamic arrays are much safer than
classes and pointers.
Thus, both the syntax and semantics of dynamic arrays differ from those of static
arrays. Two system procedures, SetLength and Copy, previously intended to deal with
strings, are applied now also for dynamic arrays. To define the sizes of dimensions
- and the allowed index space - we must use the system procedure SetLength(A,
Length1,...) with a non-fixed number of the integer parameters Length1,...,
LengthN. At least one of them is always required to specify the size of the
left-most dimension. If the number of dimensions is more than 1, the remaining
sizes may be specified either in the same SetLength statement, or later in other
such statements individually for each sub-array element. The former method defines
rectangular arrays, similar to those known in ALGOL-60 or standard Pascal (but with
mandatory zero low indexes), while the latter enables the so-called non-rectangular
arrays.
For example, providing:
39 var
40 A: TDynArray3
41
42 //the single statement:
43
44 SetLength(A, N1, N2, N3)
defines the rectangular array with the index field [0..N1-1; 0..N2-1; 0..N3-1],
while the statement:
SetLength(A, N1)
defines the size for the first dimension as N1 and correspondingly the index field
for the first index as [0.. N1-1]. This postpones the definition of the 2 other
dimensions for each A[k] individually. Figure 2 defines two types of triangle
matrixes.
45 var
46 A, B: array of array of Double;
47 N, i: Integer;
48 begin
49 { Defining N. }
50 SetLength(A, N);
51 SetLength(B, N);
52 for i := 0 to N - 1 do
53 begin
54 { Lower-left triangle matrix; index field 0<=i<=N-1, 0<=j<=i }
55 SetLength(A[i], i + 1);
56 { Upper-left triangle matrix; index field 0<=i<=N-1, 0<=j<=N-i-1 }
57 SetLength(B[i], N - i);
58 end;
59 { ...}
60 end;
Figure 2: Two types of triangle matrixes.
Unfortunately, because of the limitation imposed by zero-based indexing, dynamic
arrays don't allow us to define the lower- and upper-right triangle matrixes,
matrixes with several diagonal stripes this way. Figure 3 shows some examples of
three-dimensional dynamic arrays of a 3- and 4-lateral pyramid-type.
61 var
62 C, D: array of array of array of Double;
63 N, i, j: Integer;
64 begin
65 { Defining N }
66 SetLength(C, N);
67 SetLength(D, N);
68 for i := 0 to N - 1 do
69 begin
70 { 4-lateral pyramid; index field 0<=i<=N-1, 0<=j,k<=i }
71 SetLength(C[i], i + 1, i + 1);
72 { 3-lateral pyramid }
73 SetLength(D[i], i + 1);
74 for j := 0 to i do
75 { index field 0<=i<=N-1, 0<=j<=i, 0<=k <=j }
76 SetLength(D[i, j], j + 1)
77 end;
78 {...}
79 end;
Figure 3: 3D dynamic arrays of a 3- and 4-lateral pyramid type.
As to the speed of access to elements of dynamic arrays, it's almost as high as for
static arrays, at least one- and two-dimensional ones, as the benchmark with matrix
transposing proves. For static arrays, the memory location of each element in a
multi-dimensional array is known as soon as the index expression computes. Thus,
for a static element, such as A[k1,k2,k3], the relative location may look like this:
N2N3 * k1 + N3 * k2 + k3
Instead, for dynamic arrays to access an element of K-dimensional array, the code
must sequentially de-reference K pointers to the respective one-dimensional
sub-arrays. Both approaches seem compatible.
Language Barrier
The evolution of dynamic arrays in Borland Pascal/Delphi wasn't straightforward.
With regard to the functionality, the multi-dimensional rectangular variant arrays
are the closest to the concept of dynamic arrays as they first appeared in
ALGOL-60, but variant arrays are 10 times slower than static ones, and they differ
in syntax. In addition, for the concept of a variant array of variant arrays, both
syntax and semantics remain not quite clear.
The dynamic arrays in Delphi 4 exceed the arrays of ALGOL-60, at least in that they
can be non-rectangular and still be as fast as static arrays in Pascal.
Unfortunately, this notion reveals several language inconsistencies:
Why the mandatory zero-based indexing when the static and variant arrays don't
require it? The low..high indexing in standard Pascal is an important feature, and
can be very helpful in many applications. Also, the most natural and safe method of
numbering in structures like vectors and matrixes is to number the elements in a
vector corresponding to the index field: 1..N, not 0..N-1.
Why the special syntax in the declaration that omits the boundaries [ ]? As a
result, the separate statement, SetLength, is later always required. True,
SetLength allows to re-define the dimensions several times within a block, if it's
really needed, but for the more typical case when the dimensions and the sizes are
declared once at the beginning of the block, the standard form array[low..high] is
better, because it is consistent with the syntax. The compiler knows by itself if
the boundary expression is constant or variable, therefore it could implement the
static or dynamic models according to the situation.
The assignment statements with incomplete indexes for dynamic array variables such
as A := B have a quite unusual meaning: Any change in an element A[k] affects also
B[k] because A and B refer to the same instance. This behavior contradicts the
standard meaning of the assignment statement and is dangerous, so that incomplete
indexes in assignment statements for dynamic arrays shouldn't be allowed.
The behavior of dynamic arrays as hidden pointers is better and safer than the
behavior of classes (also hidden pointers) or pointers. Why not improve the
behavior of classes and pointers to the level of dynamic arrays so that all
indirect reference mechanisms in the language follow uniform rules?
Logically, the Delphi type class must be simply an indirect reference version of
the type object introduced in Turbo-Pascal 5.5. The only difference should be in
the method of memory allocation: for the class on the heap, and for the object on
the stack. This is safer and doesn't involve the dangerous separation of variables
from their instances. Delphi 4 and all previous versions support the type object
for backward compatibility, but there are still certain differences between syntax
and semantics of both types.
Conclusion
The fact that now there are as many as four different array types with quite
different and inconsistent syntax and semantics in Borland Pascal/Delphi doesn't
seem to be a good thing. Too many - and not always good - new features have been
added to standard Pascal, which makes it cluttered, hectic, and less safe. It looks
like the language doesn't evolve according to a well-developed fundamental plan;
rather, it's trying to cope somehow with all different and inconsistent features of
the very complex operating environment.
Delphi is still an unparalleled software development tool, but it's getting more
and more complex, while its documentation and Help system lag behind. Even the
Object Pascal Language Guide, the fundamental document of the language, is neither
complete nor clear, or strict and formal enough as one should expect from a
document of this type. This bears no resemblance to that high standard of
documentation that Borland was proud of in the era of Borland Turbo Pascal. Back to
the future?
I am very thankful to Dr Manfred Mackeben for his patience in reading and improving
this text and for many valuable notes.
|