Delphi: How to use Virtual Methods and Polymorphism Part 1

Date: 31-Oct-03
Category: Algorithm
Language: Delphi 2.x
Publisher: DSP, Administrator
Reference URL: DKB
Author: Danny Thorpe

Virtual Methods, Inside Out

Polymorphism is perhaps the cornerstone of object-oriented programming (OOP). 
Without it, OOP would have only encapsulation and inheritance - data buckets and 
hierarchical families of data buckets - but no way to uniformly manipulate related 
objects. 

Polymorphism is the key to leveraging your programming investments to enable a 
relatively small amount of code to drive a wide variety of behaviors, without 
requiring carnal knowledge of the implementation details of those behaviors. 
However, before you can extend existing Delphi components, or design new, 
extensible component classes, you must have a firm understanding of how 
polymorphism works and the opportunities it provides. 

True to its name, polymorphism allows objects to have "many forms" in Delphi, and a 
component writer typically uses a mix of all these forms to implement a new 
component. In this article, we'll closely review the implementation and use of one 
of Delphi's polymorphism providers, the virtual method, and some of its more 
peculiar sand traps and exotic applications, e.g. its part in making .EXEs smaller. 
(Dynamic methods, message methods, and class reference types are Delphi's other 
polymorphism providers, but are outside the scope of this article.) 

This article assumes you are familiar with Delphi class declaration syntax and 
general OOP principles. If you're a bit rusty with these concepts, you should first 
refer to the Delphi Language Reference. Also note that this article uses "virtual" 
in two senses: as the general term that applies to all forms of virtual methods 
(i.e. methods declared with virtual, dynamic, or override), and as the specific 
term that refers only to methods declared with the virtual directive; context 
indicates which is meant. For example, most polymorphism concepts and issues apply 
to all virtual methods in the general sense, but a few noteworthy items apply only 
to methods declared with the virtual directive. 

Review: Syntax of Virtual Methods

Here's a review of the two kinds of virtual methods and four language directives 
used to declare them: 

Virtual methods come in two flavors: virtual and dynamic. The only difference 
between them is their internal implementations; that is, they use different 
techniques to achieve the same results. 

Calls to virtual methods are dispatched more quickly than calls to dynamic methods. 

Seldom-overridden virtual methods require much more storage space for their 
compiler-generated tables than dynamic methods. 

The keywords, virtual and dynamic, always introduce a new method name into a class' 
name space. 

The override directive redefines the implementation of an existing virtual method 
(virtual or dynamic) that a class inherits from an ancestor. 

The override method uses the same dispatch mechanism (virtual or dynamic) as the 
inherited virtual method it replaces. 

The abstract directive indicates that no method body is associated with that 
virtual method declaration. Abstract declarations are useful for defining a purely 
conceptual interface, which is in turn useful for maintaining absolute separation 
between the user of a class and its implementation. 

The abstract directive can only be used in the declaration of new virtual (virtual 
or dynamic) methods; you can't make an implemented method abstract after the fact. 

A class type that contains one or more abstract methods is an abstract class. 

A class type that contains nothing but abstract methods (no static methods, no 
virtual methods, no data fields) is called an abstract interface (or, in C++ 
circles, a pure virtual interface). 
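
As a minimal sketch of these directives working together, consider the small console program below. TShape and TSquare are hypothetical classes invented for this illustration; they are not part of the article's gadget examples. The conditional at the top only matters if you compile with Free Pascal instead of Delphi.

```pascal
program DirectiveReview;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Hypothetical abstract class: it names Area but supplies no body for it }
  TShape = class
    function Area: Double; virtual; abstract;  { new virtual method, no body }
    procedure Describe; dynamic;               { new dynamic method }
  end;

  TSquare = class(TShape)
    Side: Double;
    function Area: Double; override;           { replaces the abstract method }
    procedure Describe; override;              { keeps the inherited dispatch kind }
  end;

procedure TShape.Describe;
begin
  { Area is resolved at run time by the instance's actual class }
  WriteLn('A shape with area ', Area:0:2);
end;

function TSquare.Area: Double;
begin
  Result := Side * Side;
end;

procedure TSquare.Describe;
begin
  Write('Square: ');
  inherited Describe;
end;

var
  S: TShape;
begin
  S := TSquare.Create;
  TSquare(S).Side := 3;
  S.Describe;   { prints "Square: A shape with area 9.00" }
  S.Free;
end.
```

Note that override is used for both methods; the descendant never repeats virtual or dynamic, because the dispatch mechanism is inherited from the original declaration.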

Polymorphism in Action 

What do virtual methods do? In general, they allow a method call to be directed, at 
run time, to the piece of code appropriate for the type of the object instance 
used to make the call. For this to be interesting, you must have more than 
one class type, and the class types must be related by inheritance from a common 
ancestor. 

Figure 1 shows three classes we'll use to explore the execution characteristics of 
polymorphism: a simple base class named TBaseGadget that defines a static method 
named NotVirtual and a virtual method, ThisIsVirtual; and two descendant classes, 
TKitchenGadget and TOfficeGadget, that override the ThisIsVirtual method they 
inherit from TBaseGadget. TOfficeGadget also introduces a new static method named 
NotVirtual and a new virtual method named NewMethod. 

type
  TBaseGadget = class
    procedure NotVirtual(X: Integer);
    procedure ThisIsVirtual(Y: Integer); virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure ThisIsVirtual(Y: Integer); override;
  end;

  TOfficeGadget = class(TBaseGadget)
    function NewMethod: Longint; virtual;
    procedure NotVirtual(X, Y, Z: Integer);
    procedure ThisIsVirtual(Y: Integer); override;
  end;

Figure 1: Three classes to explore polymorphism. 

Identical names in different classes aren't related. Declaring a static method in a 
descendant that happens to have the same name as a static method in an ancestor is 
not a true override. Other than same-name similarity, no relationship exists 
between static methods declared in a descendant and static methods declared in an 
ancestor class. Your brain makes an association, but the compiler does not. For 
instance, TBaseGadget has a NotVirtual method, and TOfficeGadget has a disparate 
method, also named NotVirtual.

If we start with a variable P of type TBaseGadget, we can assign to it an instance 
of a TBaseGadget; or an instance of one of its descendants, such as a 
TKitchenGadget or TOfficeGadget. Recall that Delphi object instance variables are 
pointers to the instance data allocated from the global heap, and that pointers of 
a class type are type compatible with all descendants of that type. We can then 
call methods using the instance variable P:

var
  P: TBaseGadget;
begin
  P := TBaseGadget.Create;
  P.NotVirtual(10); { Call TBaseGadget.NotVirtual }
  P.ThisIsVirtual(5); { Call TBaseGadget.ThisIsVirtual }
  P.Free;
end;


(In the interest of brevity, I'll fold the execution traces into comments in the 
source code. You can step through the sample code to verify the execution trace.) 

If P refers to an instance of TKitchenGadget, the execution trace would resemble 
the code in Figure 2. Nothing remarkable here; we have one call to a static method 
going to the version defined in the ancestor type, and one call to a virtual method 
going to the version of the method associated with the object instance type. 

var
  P: TBaseGadget;
begin
  P := TKitchenGadget.Create;
  P.NotVirtual(10); { Call TBaseGadget.NotVirtual }
  P.ThisIsVirtual(5); { Call TKitchenGadget.ThisIsVirtual }
  P.Free;
end;

Figure 2: Execution with an instance of TKitchenGadget.

You may deduce that the inherited static method, NotVirtual, is called because 
TKitchenGadget doesn't override it. This observation is correct, but the 
explanation is flawed, as Figure 3 shows. If P refers to an instance of 
TOfficeGadget, you may be a little puzzled by the result. 

var
  P: TBaseGadget;
begin
  P := TOfficeGadget.Create;
  P.NotVirtual(10); { Call TBaseGadget.NotVirtual }
  { The compiler will not allow the following two lines:
    P.NotVirtual(1,2,3);   "Too many parameters"
    P.NewMethod;           "Method identifier expected" }
  P.ThisIsVirtual(5); { Call TOfficeGadget.ThisIsVirtual }
  P.Free;
end;

Figure 3: Execution with an instance of TOfficeGadget.

Static method calls are resolved by variable type. Although TOfficeGadget has its 
own NotVirtual method, and P refers to an instance of TOfficeGadget, why does 
TBaseGadget.NotVirtual get called instead? This occurs because static (non-virtual) 
method calls are resolved at compile time according to the type of the variable 
used to make the call. For static methods, what the variable refers to is 
immaterial. In this case, P's type is TBaseGadget, meaning the NotVirtual method 
associated with P's declared type is TBaseGadget.NotVirtual.

Notice that NewMethod defined in TOfficeGadget is out of reach of a TBaseGadget 
variable. P can only access fields and methods defined in its TBaseGadget object 
type. 
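
The three execution traces above can be reproduced with a small console program. The sketch below reuses the class declarations from Figure 1; the method bodies (which only print their own names) and the helper procedure Exercise are assumptions added for this illustration.

```pascal
program GadgetDispatch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBaseGadget = class
    procedure NotVirtual(X: Integer);
    procedure ThisIsVirtual(Y: Integer); virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure ThisIsVirtual(Y: Integer); override;
  end;

  TOfficeGadget = class(TBaseGadget)
    function NewMethod: Longint; virtual;
    procedure NotVirtual(X, Y, Z: Integer);
    procedure ThisIsVirtual(Y: Integer); override;
  end;

procedure TBaseGadget.NotVirtual(X: Integer);
begin
  WriteLn('  TBaseGadget.NotVirtual');
end;

procedure TBaseGadget.ThisIsVirtual(Y: Integer);
begin
  WriteLn('  TBaseGadget.ThisIsVirtual');
end;

procedure TKitchenGadget.ThisIsVirtual(Y: Integer);
begin
  WriteLn('  TKitchenGadget.ThisIsVirtual');
end;

function TOfficeGadget.NewMethod: Longint;
begin
  Result := 0;
end;

procedure TOfficeGadget.NotVirtual(X, Y, Z: Integer);
begin
  WriteLn('  TOfficeGadget.NotVirtual');
end;

procedure TOfficeGadget.ThisIsVirtual(Y: Integer);
begin
  WriteLn('  TOfficeGadget.ThisIsVirtual');
end;

{ Hypothetical helper: P is always seen through the TBaseGadget type }
procedure Exercise(P: TBaseGadget);
begin
  WriteLn(P.ClassName, ':');
  P.NotVirtual(10);    { static: chosen at compile time from P's declared type }
  P.ThisIsVirtual(5);  { virtual: chosen at run time from the instance }
  P.Free;
end;

begin
  Exercise(TBaseGadget.Create);
  Exercise(TKitchenGadget.Create);
  Exercise(TOfficeGadget.Create);
end.
```

Every instance reports TBaseGadget.NotVirtual for the static call, including the TOfficeGadget instance, while the ThisIsVirtual line changes with the class of the instance that was created.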

New names obscure inherited names. Let's say P is declared as a variable of type 
TOfficeGadget. The following method call would be allowed: 

P.NotVirtual(1, 2, 3)

However, this method call: 

P.NotVirtual(1)

would not be allowed, because TOfficeGadget.NotVirtual requires three parameters. 

TOfficeGadget.NotVirtual obscures the TBaseGadget.NotVirtual method name in all 
instances and descendants of TOfficeGadget. The inherited method is still a part of 
TOfficeGadget (proven by the code in Figure 3); you just can't get to it directly 
from TOfficeGadget and descendant types. 

To get past this, you must typecast the instance variable: 

TBaseGadget(P).NotVirtual(1)

If P were declared as a TOfficeGadget variable, P.NewMethod would also be allowed, 
because the compiler can "see" NewMethod in a TOfficeGadget variable. 

Descendant >= ancestor. An instance of a descendant type could be greater than its 
ancestor type in both services and data. However, the descendant-type instance can 
never be less than what its ancestors define. This makes it possible for you to use 
a variable of an ancestral type (e.g. TBaseGadget) to refer to an instance of a 
descendant type without loss of information. 

Inheritance is a one-way street. With a variable of a particular class type, you 
can access any public symbol (field, property, or method) defined in any of that 
class' ancestors. You can assign an instance of a descendant class into that 
variable, but cannot access any new fields or methods defined by the descendant 
class. The fields of the descendant class are certainly in the instance data that 
the variable refers to, yet the compiler has no way of knowing that run-time 
situation at compile time. 

There are two ways around this "nearsightedness" of ancestral class types: 

Typecasting - The programmer assumes a lot and forces the compiler to treat the 
variable as a descendant type. 
Virtual methods - The magic of virtual will call the method appropriate to the type 
of the associated instance, determined at run time. 
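
To make the two options concrete, here is a short sketch that uses both on the same variable. The classes are trimmed versions of the Figure 1 declarations; the method bodies, the return value 42, and the use of the is operator to guard the cast are assumptions chosen for this illustration.

```pascal
program CastVersusVirtual;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBaseGadget = class
    procedure ThisIsVirtual(Y: Integer); virtual;
  end;

  TOfficeGadget = class(TBaseGadget)
    function NewMethod: Longint; virtual;
    procedure ThisIsVirtual(Y: Integer); override;
  end;

procedure TBaseGadget.ThisIsVirtual(Y: Integer);
begin
  WriteLn('TBaseGadget.ThisIsVirtual(', Y, ')');
end;

function TOfficeGadget.NewMethod: Longint;
begin
  Result := 42;  { arbitrary value for the demonstration }
end;

procedure TOfficeGadget.ThisIsVirtual(Y: Integer);
begin
  WriteLn('TOfficeGadget.ThisIsVirtual(', Y, ')');
end;

var
  P: TBaseGadget;
begin
  P := TOfficeGadget.Create;

  { Virtual method: no cast needed, the instance decides at run time }
  P.ThisIsVirtual(5);

  { Typecast: the programmer vouches for the type; "is" verifies it first }
  if P is TOfficeGadget then
    WriteLn('NewMethod returned ', TOfficeGadget(P).NewMethod);

  P.Free;
end.
```

The unguarded cast TOfficeGadget(P) performs no run-time check, so an incorrect assumption about the instance goes unnoticed by the compiler. Testing with is before casting is one way to keep that assumption honest.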

Ancestors set the standard. Why do we care about the nearsightedness of ancestral 
classes? Why not simply use the matching variable type when you create or 
manipulate an object instance? Sometimes this is the simplest thing to do. However, 
this "simplest" solution falls apart when you begin talking about manipulating 
multiple classes that do almost the same things. 

Ancestral class types set the minimum interface standard through which we can 
access a set of related objects. Polymorphism is the use of virtual methods to make 
one verb (method name) produce one of many possible actions depending on the 
context (the instance). To have multiple, possible actions, you must have multiple 
class types (e.g. TKitchenGadget and TOfficeGadget) each potentially defining a 
different implementation of a particular method. 

To be able to make one call that could cover those multiple class types, the method 
must be defined in a class from which all the multiple class types descend - in an 
ancestral class such as TBaseGadget. The ancestral class, then, is the least common 
denominator for behavior across a set of related classes. 

For polymorphism to work, all the actions common to the group of classes need to at 
least be named in a common ancestor. If every descendant is required to override 
the ancestor's method, the ancestral method doesn't need to do anything at all; it 
can be declared abstract.

If there is a behavior that is common to most of the classes in the group, the 
ancestor class can pick up that default behavior and leave the descendants to 
override the defaults only when necessary. This consolidates code higher in the 
class hierarchy, for greater code reuse and smaller total code size. However, 
providing default behaviors in an ancestor class can also complicate the design 
issues of creating flexible, extensible classes, since what is done by ancestors 
usually cannot be entirely undone. 
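
A minimal sketch of both ideas follows: one method is only named in the ancestor (abstract), and another carries a default that descendants may override. TGadget, TToaster, TStapler, and their methods are hypothetical names invented for this example.

```pascal
program AncestorStandard;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Hypothetical common ancestor: the minimum interface for all gadgets }
  TGadget = class
    function Title: string; virtual; abstract;  { every descendant must supply it }
    function PowerSource: string; virtual;      { default shared by most gadgets }
  end;

  TToaster = class(TGadget)
    function Title: string; override;           { keeps the default PowerSource }
  end;

  TStapler = class(TGadget)
    function Title: string; override;
    function PowerSource: string; override;     { replaces the default }
  end;

function TGadget.PowerSource: string;
begin
  Result := 'mains electricity';
end;

function TToaster.Title: string;
begin
  Result := 'Toaster';
end;

function TStapler.Title: string;
begin
  Result := 'Stapler';
end;

function TStapler.PowerSource: string;
begin
  Result := 'hand power';
end;

var
  Gadgets: array[0..1] of TGadget;
  I: Integer;
begin
  Gadgets[0] := TToaster.Create;
  Gadgets[1] := TStapler.Create;
  { One loop, written against the ancestor type, drives both classes }
  for I := Low(Gadgets) to High(Gadgets) do
  begin
    WriteLn(Gadgets[I].Title, ' runs on ', Gadgets[I].PowerSource);
    Gadgets[I].Free;
  end;
end.
```

The loop prints "Toaster runs on mains electricity" and then "Stapler runs on hand power", even though it never mentions either descendant class by name.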

Polymorphism lets ancestors reach into descendants. Another aspect of polymorphism 
doesn't appear to involve instance pointer types at all - at least not explicitly. 

Consider the code fragment in Figure 4. The TBaseGadget.NotVirtual method contains 
an unqualified call to ThisIsVirtual. When P refers to an instance of 
TKitchenGadget, P.NotVirtual will call TBaseGadget.NotVirtual. Nothing new, so far. 
However, when that code calls ThisIsVirtual, it will execute 
TKitchenGadget.ThisIsVirtual. Surprise! Even within the depths of 
TBaseGadget.NotVirtual, a non-virtual method, a virtual method call is directed to 
the appropriate code. 

procedure TBaseGadget.NotVirtual(X: Integer);
begin
  ThisIsVirtual(17); { Unqualified call: dispatched through Self }
end;

var
  P: TBaseGadget;

begin
  P := TKitchenGadget.Create;
  P.NotVirtual(10); { Call TBaseGadget.NotVirtual }
  P.Free;
end.


Figure 4: Polymorphism allows ancestors to call into descendants. 

How can this be? The resolution of virtual method calls depends on the object 
instance associated with the call. A pointer to the object instance is secretly 
passed into all method calls, surfacing inside methods as the Self identifier. 
Inside TBaseGadget.NotVirtual, a call to ThisIsVirtual is actually a call to 
Self.ThisIsVirtual. Self, in this context, operates like a variable of type 
TBaseGadget that refers to an instance of type TKitchenGadget. Thus, when the 
instance type is 
TKitchenGadget, the virtual method call resolves, at run time, to 
TKitchenGadget.ThisIsVirtual.

How is this useful? An ancestral method - virtual or not - can call a sequence of 
virtual methods. The descendants can determine the specific behavior of one or more 
of those virtual methods. The ancestor determines the sequence in which the methods 
are called, plus miscellaneous setup and cleanup code. The ancestor, however, does 
not completely determine the final behavior of the descendants. The descendants 
inherit the sequence logic from the ancestor, and can override one or more of the 
steps in that sequence. But, the descendants don't have to reproduce the entire 
sequence logic. This is one of the ways OOP promotes code reuse. 
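
This division of labor can be sketched as follows. TReport, TSalesReport, and the three Print methods are hypothetical names invented for this illustration: the ancestor's static Run method fixes the sequence, and the descendant overrides only one step.

```pascal
program SequenceDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  { Hypothetical ancestor that owns the sequence logic }
  TReport = class
    procedure Run;                           { static: the order never changes }
    procedure PrintHeader; virtual;          { default step }
    procedure PrintBody; virtual; abstract;  { step every descendant supplies }
    procedure PrintFooter; virtual;          { default step }
  end;

  TSalesReport = class(TReport)
    procedure PrintBody; override;           { overrides a single step }
  end;

procedure TReport.Run;
begin
  { Each unqualified call goes through Self, so it reaches the descendant }
  PrintHeader;
  PrintBody;
  PrintFooter;
end;

procedure TReport.PrintHeader;
begin
  WriteLn('=== Report ===');
end;

procedure TReport.PrintFooter;
begin
  WriteLn('=== End ===');
end;

procedure TSalesReport.PrintBody;
begin
  WriteLn('Sales: 3 widgets');
end;

var
  R: TReport;
begin
  R := TSalesReport.Create;
  R.Run;   { header, then the descendant's body, then footer }
  R.Free;
end.
```

TSalesReport contains no sequencing code of its own; it inherits the header, the footer, and the order of the calls, and contributes only the body.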

Fully-qualified method calls are reduced to static calls. As a footnote, consider 
what happens if TBaseGadget.NotVirtual contains a qualified call to 
TBaseGadget.ThisIsVirtual:

procedure TBaseGadget.NotVirtual;
begin
  TBaseGadget.ThisIsVirtual(17);
end;


Although ThisIsVirtual is a virtual method, a fully-qualified method call will 
compile down to a regular static method call. You've specified that you want only 
the TBaseGadget.ThisIsVirtual method called, so the compiler does exactly what you 
tell it to do. Dispatching this as a virtual method call may call some other 
version of that method, which would violate your explicit instructions. Except in 
special circumstances, you don't want this in your code because it defeats the 
whole purpose of making ThisIsVirtual virtual. 

The Virtual Method Table

A Virtual Method Table (VMT) is an array of pointers to all the virtual methods 
defined in a class and all the virtual methods the class inherits from its 
ancestors. A VMT is created by the compiler for every class type, because all 
classes descend from TObject and TObject has a virtual destructor named Destroy. In 
Delphi, VMTs are stored in the program's code space. Only one VMT exists per class 
type; multiple instances of the same class type refer to the same VMT. At run time, 
the VMT is a read-only lookup table. 

Structure of the VMT. The first four bytes of data in an object instance are a 
pointer to that class type's VMT. The VMT pointer points to the first entry in the 
VMT's list of four-byte pointers to the entry points of the class' virtual methods. 
Since methods can never be deleted in descendant classes, the location of a virtual 
method in the VMT is the same throughout all descendant classes. Thus, the compiler 
can view a virtual method simply as a unique entry in the class' VMT. As we'll see 
shortly, this is exactly how virtual method calls are dispatched. Thinking of 
virtual methods as indexes into an array of code pointers will also help us 
visualize how method name conflicts are resolved by the compiler. 

The VMT does not contain information indicating how many virtual methods are stored 
in it or where the VMT ends. The VMT is constructed by the compiler and accessed by 
compiler-generated code, so it doesn't need to make notes to itself about size or 
number of entries. (This does, however, make it difficult for BASM code to call 
virtual methods.) 

Optimization note. A descendant of a class with virtual methods gets a new copy of 
the ancestor's VMT. The descendant can then add new virtual methods or 
override inherited virtual methods without affecting the ancestor's VMT. For 
example, if the ancestor has a 12-entry VMT, the descendant has at least a 12-entry 
VMT. Every descendant class type of that ancestor, and all descendants of those 
descendants, will have at least 12 entries in their individual VMTs. 

All these VMTs occupy memory. For most programs, this won't be a problem, but 
extraordinarily large class types with thousands of virtual methods and/or 
thousands of descendants could consume quite a bit of memory, both in RAM and .EXE 
file size; dynamic methods are much more space efficient, but incur a slight 
execution speed penalty. 

Now let's examine the mechanics behind the magic of virtual method calls. 

Inside a virtual method call. When the compiler is compiling your source code and 
encounters a call to a virtual method identifier, it generates a special sequence 
of machine instructions that will unravel the appropriate call destination at run 
time. The following machine code snippets assume compiler optimizations are 
enabled, and stack frames are disabled: 

// Machine code for statement P.SomeVirtualMethod;

{ Move instance data address (P^) into EAX }
MOV EAX, [EBP + 4]
{ Move instance's VMT address into ECX }
MOV ECX, [EAX]
{ Call address stored at VMT index 2 }
CALL [ECX + 08]


The VMT pointer is always stored at offset 0 (zero) in the instance data. In this 
example, the method being called is the third virtual method of a class, including 
inherited virtual methods. The first virtual method is at offset 0, the second at 
offset 4, and the third at offset 8. 
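
The same lookup can be imitated by hand in Pascal, which may help in visualizing the table. The sketch below is a simulation only, not the compiler's actual data layout: the record TInstance, the table type TVmt, and the Speak and Reset routines are all hypothetical names invented for this illustration.

```pascal
program ManualVmt;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  PInstance = ^TInstance;
  TMethodProc = procedure(Inst: PInstance);  { Inst plays the role of Self }
  TVmt = array[0..1] of TMethodProc;         { one slot per "virtual method" }
  PVmt = ^TVmt;
  TInstance = record
    Vmt: PVmt;       { first field: pointer to the class's table }
    Value: Integer;  { ordinary instance data follows }
  end;

procedure BaseSpeak(Inst: PInstance);
begin
  WriteLn('Base speaks, value = ', Inst^.Value);
end;

procedure BaseReset(Inst: PInstance);
begin
  Inst^.Value := 0;
end;

procedure KitchenSpeak(Inst: PInstance);
begin
  WriteLn('Kitchen speaks, value = ', Inst^.Value);
end;

var
  BaseVmt, KitchenVmt: TVmt;
  A, B: TInstance;
begin
  { "Ancestor" table: slot 0 = Speak, slot 1 = Reset }
  BaseVmt[0] := BaseSpeak;
  BaseVmt[1] := BaseReset;

  { "Descendant" table: starts as a copy, then overrides slot 0 only }
  KitchenVmt := BaseVmt;
  KitchenVmt[0] := KitchenSpeak;

  A.Vmt := @BaseVmt;
  A.Value := 1;
  B.Vmt := @KitchenVmt;
  B.Value := 2;

  { The call site is identical for both: fetch the table, index it, call }
  A.Vmt^[0](@A);   { reaches BaseSpeak }
  B.Vmt^[0](@B);   { reaches KitchenSpeak }
end.
```

Both calls use slot 0 and differ only in which table the instance points to, which mirrors the MOV, MOV, CALL sequence shown above. Slot 1 is the same routine in both tables, just as an inherited virtual method that is not overridden keeps the ancestor's entry.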

Conclusion

That's it - all the magic of virtual methods and polymorphism boils down to this: 
the indicator of which virtual method to invoke on the instance data is stored in 
the instance data itself. 

In Part II, we'll conclude our series with a discussion of abstract interfaces and how virtual methods can defeat and enhance "smart linking." See you then.