Author: Danny Thorpe
Virtual Methods, Inside Out
Polymorphism is perhaps the cornerstone of object-oriented programming (OOP).
Without it, OOP would have only encapsulation and inheritance - data buckets and
hierarchical families of data buckets - but no way to uniformly manipulate related
objects.
Polymorphism is the key to leveraging your programming investments to enable a
relatively small amount of code to drive a wide variety of behaviors, without
requiring carnal knowledge of the implementation details of those behaviors.
However, before you can extend existing Delphi components, or design new,
extensible component classes, you must have a firm understanding of how
polymorphism works and the opportunities it provides.
True to its name, polymorphism allows objects to have "many forms" in Delphi, and a
component writer typically uses a mix of all these forms to implement a new
component. In this article, we'll closely review the implementation and use of one
of Delphi's polymorphism providers, the virtual method, and some of its more
peculiar sand traps and exotic applications, e.g. its part in making .EXEs smaller.
(Dynamic methods, message methods, and class reference types are Delphi's other
polymorphism providers, but are outside the scope of this article.)
This article assumes you are familiar with Delphi class declaration syntax and
general OOP principles. If you're a bit rusty with these concepts, you should first
refer to the Delphi Language Reference. Also note that this article uses the word
virtual in two senses: as a general term that applies to all forms of virtual
methods (i.e. methods declared with virtual, dynamic, or override), and as a
specific term that refers only to methods declared with the virtual directive.
Most polymorphism concepts and issues apply to all virtual methods, but a few
noteworthy items apply only to methods declared with the virtual directive.
Review: Syntax of Virtual Methods
Here's a review of the two kinds of virtual methods and four language directives
used to declare them:
Virtual methods come in two flavors: virtual and dynamic. The only difference
between them is their internal implementations; that is, they use different
techniques to achieve the same results.
Calls to virtual methods are dispatched more quickly than calls to dynamic methods.
Seldom-overridden virtual methods require much more storage space for their
compiler-generated tables than dynamic methods.
The virtual and dynamic directives always introduce a new method name into a class's
name space.
The override directive redefines the implementation of an existing virtual method
(virtual or dynamic) that a class inherits from an ancestor.
The override method uses the same dispatch mechanism (virtual or dynamic) as the
inherited virtual method it replaces.
The abstract directive indicates that no method body is associated with that
virtual method declaration. Abstract declarations are useful for defining a purely
conceptual interface, which is in turn useful for maintaining absolute separation
between the user of a class and its implementation.
The abstract directive can only be used in the declaration of new virtual (virtual
or dynamic) methods; you can't make an implemented method abstract after the fact.
A class type that contains one or more abstract methods is an abstract class.
A class type that contains nothing but abstract methods (no static methods, no
implemented virtual methods, no data fields) is called an abstract interface (or,
in C++ circles, a pure virtual interface).
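The directives reviewed above can be seen together in a minimal sketch. The class names TShape and TTriangle and their methods are hypothetical, invented only for illustration; the Free Pascal mode switch is included on the assumption that some readers will compile outside Delphi.

```pascal
program DirectiveSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Hypothetical classes, used only to illustrate the four directives }
  TShape = class
    function Describe: string; virtual; abstract; { new virtual method, no body }
    function Sides: Integer; dynamic;             { new dynamic method with a default }
  end;

  TTriangle = class(TShape)
    function Describe: string; override; { same dispatch as the inherited virtual }
    function Sides: Integer; override;   { same dispatch as the inherited dynamic }
  end;

function TShape.Sides: Integer;
begin
  Result := 0;
end;

function TTriangle.Describe: string;
begin
  Result := 'triangle';
end;

function TTriangle.Sides: Integer;
begin
  Result := 3;
end;

var
  S: TShape;
begin
  { The variable is of the ancestor type; the instance is a descendant }
  S := TTriangle.Create;
  try
    Writeln(S.Describe, ' has ', S.Sides, ' sides');
  finally
    S.Free;
  end;
end.
```

Note that override is the only directive repeated in the descendant; whether the call goes through the virtual or the dynamic mechanism is decided by the ancestor's declaration.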
Polymorphism in Action
What do virtual methods do? In general, they allow a method call to be directed, at
run time, to the piece of code appropriate for the type of the object
instance used to make the call. For this to be interesting, you must have more than
one class type, and the class types must be related by inheritance from a common
ancestor.
Figure 1 shows three classes we'll use to explore the execution characteristics of
polymorphism: a simple base class named TBaseGadget that defines a static method
named NotVirtual and a virtual method, ThisIsVirtual; and two descendant classes,
TKitchenGadget and TOfficeGadget, that override the ThisIsVirtual method they
inherit from TBaseGadget. TOfficeGadget also introduces a new static method named
NotVirtual and a new virtual method named NewMethod.
type
  TBaseGadget = class
    procedure NotVirtual(X: Integer);
    procedure ThisIsVirtual(Y: Integer); virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure ThisIsVirtual(Y: Integer); override;
  end;

  TOfficeGadget = class(TBaseGadget)
    function NewMethod: Longint; virtual;
    procedure NotVirtual(X, Y, Z: Integer);
    procedure ThisIsVirtual(Y: Integer); override;
  end;
Figure 1: Three classes to explore polymorphism.
Identical names in different classes aren't related. Declaring a static method in a
descendant that happens to have the same name as a static method in an ancestor is
not a true override. Other than same-name similarity, no relationship exists
between static methods declared in a descendant and static methods declared in an
ancestor class. Your brain makes an association, but the compiler does not. For
instance, TBaseGadget has a NotVirtual method, and TOfficeGadget has a disparate
method, also named NotVirtual.
If we start with a variable P of type TBaseGadget, we can assign to it an instance
of a TBaseGadget; or an instance of one of its descendants, such as a
TKitchenGadget or TOfficeGadget. Recall that Delphi object instance variables are
pointers to the instance data allocated from the global heap, and that pointers of
a class type are type compatible with all descendants of that type. We can then
call methods using the instance variable P:
var
  P: TBaseGadget;
begin
  P := TBaseGadget.Create;
  P.NotVirtual(10);     { Call TBaseGadget.NotVirtual }
  P.ThisIsVirtual(5);   { Call TBaseGadget.ThisIsVirtual }
  P.Free;
end;
(In the interest of brevity, I'll fold the execution traces into comments in the
source code. You can step through the sample code to verify the execution trace.)
If P refers to an instance of TKitchenGadget, the execution trace would resemble
the code in Figure 2. Nothing remarkable here; we have one call to a static method
going to the version defined in the ancestor type, and one call to a virtual method
going to the version of the method associated with the object instance type.
var
  P: TBaseGadget;
begin
  P := TKitchenGadget.Create;
  P.NotVirtual(10);     { Call TBaseGadget.NotVirtual }
  P.ThisIsVirtual(5);   { Call TKitchenGadget.ThisIsVirtual }
  P.Free;
end;
Figure 2: Execution with an instance of TKitchenGadget.
You may deduce that the inherited static method, NotVirtual, is called because
TKitchenGadget doesn't override it. This observation is correct, but the
explanation is flawed, as Figure 3 shows. If P refers to an instance of
TOfficeGadget, you may be a little puzzled by the result.
var
  P: TBaseGadget;
begin
  P := TOfficeGadget.Create;
  P.NotVirtual(10);     { Call TBaseGadget.NotVirtual }
  { The compiler will not allow the following two lines:
  P.NotVirtual(1,2,3);  "Too many parameters"
  P.NewMethod;          "Method identifier expected" }
  P.ThisIsVirtual(5);   { Call TOfficeGadget.ThisIsVirtual }
  P.Free;
end;
Figure 3: Execution with an instance of TOfficeGadget.
Static method calls are resolved by variable type. Although TOfficeGadget has its
own NotVirtual method, and P refers to an instance of TOfficeGadget, why does
TBaseGadget.NotVirtual get called instead? This occurs because static (non-virtual)
method calls are resolved at compile time according to the type of the variable
used to make the call. For static methods, what the variable refers to is
immaterial. In this case, P's type is TBaseGadget, meaning the NotVirtual method
associated with P's declared type is TBaseGadget.NotVirtual.
Notice that NewMethod defined in TOfficeGadget is out of reach of a TBaseGadget
variable. P can only access fields and methods defined in its TBaseGadget object
type.
New names obscure inherited names. Let's say P is declared as a variable of type
TOfficeGadget. The following method call would be allowed:
  P.NotVirtual(1, 2, 3)
However, this method call:
  P.NotVirtual(1)
would not be allowed, because TOfficeGadget.NotVirtual requires three parameters.
TOfficeGadget.NotVirtual obscures the TBaseGadget.NotVirtual method name in all
instances and descendants of TOfficeGadget. The inherited method is still a part of
TOfficeGadget (proven by the code in Figure 3); you just can't get to it directly
from TOfficeGadget and descendant types.
To get past this, you must typecast the instance variable:
TBaseGadget(P).NotVirtual(1)
If P were declared as a TOfficeGadget variable, P.NewMethod would also be allowed,
because the compiler can "see" NewMethod in a TOfficeGadget variable.
Descendant >= ancestor. An instance of a descendant type could be greater than its
ancestor type in both services and data. However, the descendant-type instance can
never be less than what its ancestors define. This makes it possible for you to use
a variable of an ancestral type (e.g. TBaseGadget) to refer to an instance of a
descendant type without loss of information.
Inheritance is a one-way street. With a variable of a particular class type, you
can access any public symbol (field, property, or method) defined in any of that
class' ancestors. You can assign an instance of a descendant class into that
variable, but cannot access any new fields or methods defined by the descendant
class. The fields of the descendant class are certainly in the instance data that
the variable refers to, yet the compiler has no way of knowing that run-time
situation at compile time.
There are two ways around this "nearsightedness" of ancestral class types:
Typecasting - The programmer assumes a lot and forces the compiler to treat the
variable as a descendant type.
Virtual methods - The magic of virtual will call the method appropriate to the type
of the associated instance, determined at run time.
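Both routes can be compared side by side in a short sketch that reuses the article's gadget classes. The method bodies, the return value of NewMethod, and the use of the is operator to guard the typecast are assumptions added for illustration.

```pascal
program NearsightedSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBaseGadget = class
    procedure ThisIsVirtual(Y: Integer); virtual;
  end;

  TOfficeGadget = class(TBaseGadget)
    function NewMethod: Longint; virtual;
    procedure ThisIsVirtual(Y: Integer); override;
  end;

procedure TBaseGadget.ThisIsVirtual(Y: Integer);
begin
  Writeln('TBaseGadget.ThisIsVirtual(', Y, ')');
end;

function TOfficeGadget.NewMethod: Longint;
begin
  Result := 42; { arbitrary value, for illustration only }
end;

procedure TOfficeGadget.ThisIsVirtual(Y: Integer);
begin
  Writeln('TOfficeGadget.ThisIsVirtual(', Y, ')');
end;

var
  P: TBaseGadget;
begin
  P := TOfficeGadget.Create;
  try
    { Virtual method: the instance type decides, at run time }
    P.ThisIsVirtual(5);

    { Typecast: the programmer vouches for the instance type.
      The "is" test keeps the cast from being a blind assumption. }
    if P is TOfficeGadget then
      Writeln('NewMethod returns ', TOfficeGadget(P).NewMethod);
  finally
    P.Free;
  end;
end.
```

The virtual call needs no knowledge of TOfficeGadget at the call site, while the typecast hard-codes the descendant's name into the calling code.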
Ancestors set the standard. Why do we care about the nearsightedness of ancestral
classes? Why not simply use the matching variable type when you create or
manipulate an object instance? Sometimes this is the simplest thing to do. However,
this "simplest" solution falls apart when you begin talking about manipulating
multiple classes that do almost the same things.
Ancestral class types set the minimum interface standard through which we can
access a set of related objects. Polymorphism is the use of virtual methods to make
one verb (method name) produce one of many possible actions depending on the
context (the instance). To have multiple, possible actions, you must have multiple
class types (e.g. TKitchenGadget and TOfficeGadget) each potentially defining a
different implementation of a particular method.
To be able to make one call that could cover those multiple class types, the method
must be defined in a class from which all the multiple class types descend - in an
ancestral class such as TBaseGadget. The ancestral class, then, is the least common
denominator for behavior across a set of related classes.
For polymorphism to work, all the actions common to the group of classes need to at
least be named in a common ancestor. If every descendant is required to override
the ancestor's method, the ancestral method doesn't need to do anything at all; it
can be declared abstract.
If there is a behavior that is common to most of the classes in the group, the
ancestor class can pick up that default behavior and leave the descendants to
override the defaults only when necessary. This consolidates code higher in the
class hierarchy, for greater code reuse and smaller total code size. However,
providing default behaviors in an ancestor class can also complicate the design
issues of creating flexible, extensible classes, since what is done by ancestors
usually cannot be entirely undone.
Polymorphism lets ancestors reach into descendants. Another aspect of polymorphism
doesn't appear to involve instance pointer types at all - at least not explicitly.
Consider the code fragment in Figure 4. The TBaseGadget.NotVirtual method contains
an unqualified call to ThisIsVirtual. When P refers to an instance of
TKitchenGadget, P.NotVirtual will call TBaseGadget.NotVirtual. Nothing new, so far.
However, when that code calls ThisIsVirtual, it will execute
TKitchenGadget.ThisIsVirtual. Surprise! Even from the depths of
TBaseGadget.NotVirtual, a non-virtual method, a virtual method call is directed to
the appropriate code.
procedure TBaseGadget.NotVirtual(X: Integer);
begin
  ThisIsVirtual(17);
end;

var
  P: TBaseGadget;

begin
  P := TKitchenGadget.Create;
  P.NotVirtual(10);     { Call TBaseGadget.NotVirtual }
  P.Free;
end.
Figure 4: Polymorphism allows ancestors to call into descendants.
How can this be? The resolution of virtual method calls depends on the object
instance associated with the call. A pointer to the object instance is secretly
passed into all method calls, surfacing inside methods as the Self identifier.
Inside TBaseGadget.NotVirtual, a call to ThisIsVirtual is actually a call to
Self.ThisIsVirtual. Self, in this context, operates like a variable of type TBaseGadget
that refers to an instance of type TKitchenGadget. Thus, when the instance type is
TKitchenGadget, the virtual method call resolves, at run time, to
TKitchenGadget.ThisIsVirtual.
How is this useful? An ancestral method - virtual or not - can call a sequence of
virtual methods. The descendants can determine the specific behavior of one or more
of those virtual methods. The ancestor determines the sequence in which the methods
are called, plus miscellaneous setup and cleanup code. The ancestor, however, does
not completely determine the final behavior of the descendants. The descendants
inherit the sequence logic from the ancestor, and can override one or more of the
steps in that sequence. But, the descendants don't have to reproduce the entire
sequence logic. This is one of the ways OOP promotes code reuse.
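A minimal sketch of this arrangement follows. TReport, TSalesReport, and their methods are hypothetical names chosen for illustration; the ancestor fixes the order of the steps in a static method, and the descendant replaces only one step.

```pascal
program SequenceSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Hypothetical ancestor: owns the sequence, not every step }
  TReport = class
    procedure Run;                          { static method that fixes the order }
    procedure WriteHeader; virtual;         { default step }
    procedure WriteBody; virtual; abstract; { each descendant must supply this }
    procedure WriteFooter; virtual;         { default step }
  end;

  TSalesReport = class(TReport)
    procedure WriteBody; override;
  end;

procedure TReport.Run;
begin
  { Each unqualified call below is really a call through Self }
  WriteHeader;
  WriteBody;
  WriteFooter;
end;

procedure TReport.WriteHeader;
begin
  Writeln('--- report ---');
end;

procedure TReport.WriteFooter;
begin
  Writeln('--- end ---');
end;

procedure TSalesReport.WriteBody;
begin
  Writeln('sales figures go here');
end;

var
  R: TReport;
begin
  R := TSalesReport.Create;
  try
    R.Run; { header and footer come from TReport, body from TSalesReport }
  finally
    R.Free;
  end;
end.
```

TSalesReport never mentions Run, WriteHeader, or WriteFooter, yet it inherits all three along with the order in which they execute.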
Fully-qualified method calls are reduced to static calls. As a footnote, consider
what happens if TBaseGadget.NotVirtual contains a qualified call to
TBaseGadget.ThisIsVirtual:
procedure TBaseGadget.NotVirtual(X: Integer);
begin
  TBaseGadget.ThisIsVirtual(17);
end;
Although ThisIsVirtual is a virtual method, a fully-qualified method call will
compile down to a regular static method call. You've specified that you want only
the TBaseGadget.ThisIsVirtual method called, so the compiler does exactly what you
tell it to do. Dispatching this as a virtual method call may call some other
version of that method, which would violate your explicit instructions. Except in
special circumstances, you don't want this in your code because it defeats the
whole purpose of making ThisIsVirtual virtual.
The Virtual Method Table
A Virtual Method Table (VMT) is an array of pointers to all the virtual methods
defined in a class and all the virtual methods the class inherits from its
ancestors. A VMT is created by the compiler for every class type, because all
classes descend from TObject and TObject has a virtual destructor named Destroy. In
Delphi, VMTs are stored in the program's code space. Only one VMT exists per class
type; multiple instances of the same class type refer to the same VMT. At run time,
the VMT is a read-only lookup table.
Structure of the VMT. In 32-bit Delphi, the first four bytes of data in an object instance are a
pointer to that class type's VMT. The VMT pointer points to the first entry in the
VMT's list of four-byte pointers to the entry points of the class' virtual methods.
Since methods can never be deleted in descendant classes, the location of a virtual
method in the VMT is the same throughout all descendant classes. Thus, the compiler
can view a virtual method simply as a unique entry in the class' VMT. As we'll see
shortly, this is exactly how virtual method calls are dispatched. Thinking of
virtual methods as indexes into an array of code pointers will also help us
visualize how method name conflicts are resolved by the compiler.
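The claim that the VMT pointer sits at the start of the instance data, and that every instance of a class shares one VMT, can be checked with a small sketch. It assumes the object layout described above, and that the class reference returned by ClassType is the same address as the VMT pointer; the empty method bodies are placeholders.

```pascal
program VmtPointerSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBaseGadget = class
    procedure ThisIsVirtual(Y: Integer); virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure ThisIsVirtual(Y: Integer); override;
  end;

procedure TBaseGadget.ThisIsVirtual(Y: Integer);
begin
end;

procedure TKitchenGadget.ThisIsVirtual(Y: Integer);
begin
end;

var
  A, B, C: TBaseGadget;
begin
  A := TKitchenGadget.Create;
  B := TKitchenGadget.Create;
  C := TBaseGadget.Create;
  try
    { An object variable is a pointer to the instance data, and the
      first pointer-sized slot of that data holds the VMT pointer. }
    Writeln('A and B share a VMT: ', PPointer(A)^ = PPointer(B)^);
    Writeln('A and C share a VMT: ', PPointer(A)^ = PPointer(C)^);
    Writeln('VMT pointer = class reference: ',
      PPointer(A)^ = Pointer(A.ClassType));
  finally
    C.Free;
    B.Free;
    A.Free;
  end;
end.
```

Under those assumptions the first and third comparisons should hold and the second should not, since TBaseGadget and TKitchenGadget each get their own table.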
The VMT does not contain information indicating how many virtual methods are stored
in it or where the VMT ends. The VMT is constructed by the compiler and accessed by
compiler-generated code, so it doesn't need to make notes to itself about size or
number of entries. (This does, however, make it difficult for BASM code to call
virtual methods.)
Optimization note. A descendant of a class with virtual methods gets a new copy of
the ancestor's VMT. The descendant can then add new virtual methods or
override inherited virtual methods without affecting the ancestor's VMT. For
example, if the ancestor has a 12-entry VMT, the descendant has at least a 12-entry
VMT. Every descendant class type of that ancestor, and all descendants of those
descendants, will have at least 12 entries in their individual VMTs.
All these VMTs occupy memory. For most programs, this won't be a problem, but
extraordinarily large class types with thousands of virtual methods and/or
thousands of descendants could consume quite a bit of memory, both in RAM and .EXE
file size; dynamic methods are much more space efficient, but incur a slight
execution speed penalty.
Now let's examine the mechanics behind the magic of virtual method calls.
Inside a virtual method call. When the compiler is compiling your source code and
encounters a call to a virtual method identifier, it generates a special sequence
of machine instructions that will unravel the appropriate call destination at run
time. The following machine code snippets assume compiler optimizations are
enabled, and stack frames are disabled:
  { Machine code for the statement P.SomeVirtualMethod; }

  { Move instance data address (P^) into EAX }
  MOV EAX, [EBP + 4]
  { Move instance's VMT address into ECX }
  MOV ECX, [EAX]
  { Call address stored at VMT index 2 (zero-based, byte offset 8) }
  CALL [ECX + 08]
The VMT pointer is always stored at offset 0 (zero) in the instance data. In this
example, the method being called is the third virtual method of a class, including
inherited virtual methods. The first virtual method is at offset 0, the second at
offset 4, and the third at offset 8.
Conclusion
That's it - all the magic of virtual methods and polymorphism boils down to this:
the indicator of which virtual method to invoke on the instance data is stored in
the instance data itself.
In Part II, we'll conclude our series with a discussion of abstract interfaces and how virtual methods can defeat and enhance "smart linking." See you then.