Author: Danny Thorpe
Smart Linking
In Part I we explored the magic of polymorphism and its Object Pascal
implementation, the virtual method. We discovered that the indicator of which
virtual method to invoke on the instance data is stored in the instance data
itself.
In this installment, we conclude our exploration with a discussion of abstract
interfaces and how virtual methods can defeat and enhance "smart linking."
Abstract Interfaces
An abstract interface is a class type that contains no implementation and no data -
only abstract virtual methods. Abstract interfaces allow you to completely separate
the user of the interface from the implementation of the interface.
And I do mean completely separate; with abstract interfaces, you can have an object
implemented in a DLL and used by routines in an .EXE, just as if the object were
implemented in the .EXE itself. Abstract interfaces can bridge:
- conceptual barriers within an application,
- logistical barriers between an application and a DLL,
- language barriers between applications written in different programming
  languages, and
- address space barriers that separate Win32 processes.
In all cases, the client application uses the interface class just as it would any
class it implemented itself.
Let's now take a closer look at how an abstract interface class can bridge the gap
between an application and a DLL. (By the way, abstract interfaces are the
foundation of OLE programming.)
Importing Objects from DLLs: The Hard Way. If you want an application to use a
function in a DLL, you must create a "fake" function declaration that tells the
compiler what it needs to know about the parameter list and result type of the
function. Instead of a method body, this fake function declaration contains a
reference to a DLL and function name. The compiler sees these and knows what code
to generate to call the proper address in the DLL at run time.
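As a minimal sketch of such a declaration, the program below imports the Win32 routine GetTickCount from kernel32.dll. That routine is only a convenient stand-in for a function exported by your own DLL, and the program name is hypothetical.

```pascal
program ImportDemo;
{$APPTYPE CONSOLE}

{ No body here: the "external" directive names the DLL and the exported
  routine, and the parameter list and result type tell the compiler how
  to call it. }
function GetTickCount: Cardinal; stdcall;
  external 'kernel32.dll' name 'GetTickCount';

begin
  Writeln('Milliseconds since Windows started: ', GetTickCount);
end.
```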
To have an application use an object that's implemented in a DLL, you could do
essentially the same thing, declaring a separate function for each object method in
the DLL. As the number of methods in the DLL object increases, however, keeping
track of all those functions will become a chore. To make things a little easier to
manage, you could set up the DLL to give you (the client application) an array of
function pointers that you would use to call any of the DLL functions associated
with a particular DLL class type.
You can see where this is headed. A Virtual Method Table is precisely an array of
function pointers (we discussed the VMT last month). Why do things the hard way
when the compiler can do the dirty work for you?
Importing Objects from DLLs: The Smart Way. The client module (the application)
requires a class declaration that will make the compiler "visualize" a VMT that
matches the desired DLL's array of function pointers. Enter the abstract interface
class. The class contains a horde of virtual; abstract; method declarations in the
same order as the functions in the DLL's array of function pointers. Of course, the
abstract method declarations need parameter lists that match the DLL's functions
exactly.
Now you can fetch the array of function pointers from the DLL and typecast a
pointer to that array into your application's abstract interface class type. (Okay;
it actually needs to be a pointer to a pointer to an array of function addresses.
The first pointer simulates the object instance, the second pointer simulates the
VMT pointer embedded in the instance data, but who's counting?)
With this typecast in place, the compiler will think you have an instance of that
class type. When the compiler sees a method call on that typecast pointer, it will
generate code to push the parameters on the stack, then look up the nth virtual
method address in the "instance's VMT" (the pointer to the function table provided
by the DLL), and call that address. Voilà! Your application is using an "object"
that lives in a DLL as easily as one of its own classes.
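The typecast trick can be tried inside a single program before any DLL is involved. This is a sketch under stated assumptions, not a recommended practice. TGadgetIntf, FakeWhirr, and FakeAdd are hypothetical names, and the table is built by hand where a DLL would normally supply it. It also relies on Delphi's VMT layout (the first virtual method declared in a class derived directly from TObject occupies slot 0) and on the default register calling convention, in which Self is passed as a hidden first parameter.

```pascal
program FakeVmtDemo;
{$APPTYPE CONSOLE}

type
  { Abstract interface: no data, no implementation. }
  TGadgetIntf = class
    procedure Whirr; virtual; abstract;                      { slot 0 }
    function Add(A, B: Integer): Integer; virtual; abstract; { slot 1 }
  end;

{ Plain routines standing in for the DLL's functions. The explicit Self
  parameter mirrors the hidden one that a method call passes. }
procedure FakeWhirr(Self: Pointer);
begin
  Writeln('Whirr called through the hand-built table');
end;

function FakeAdd(Self: Pointer; A, B: Integer): Integer;
begin
  Result := A + B;
end;

const
  { The "array of function pointers", in the same order as the
    abstract method declarations above. }
  FunctionTable: array[0..1] of Pointer = (@FakeWhirr, @FakeAdd);

var
  FakeInstance: Pointer;  { simulates the VMT pointer in the instance data }
  Gadget: TGadgetIntf;
begin
  FakeInstance := @FunctionTable;
  { Pointer to a pointer to the table: the compiler now treats it as an
    instance of TGadgetIntf. }
  Gadget := TGadgetIntf(@FakeInstance);

  Gadget.Whirr;
  Writeln('Add(2, 3) = ', Gadget.Add(2, 3));
end.
```

Only the declared virtual methods may be called on such a pointer; anything that expects a genuine TObject (Free, ClassName, and so on) would read VMT entries that the hand-built table does not have.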
Exporting Objects from DLLs. Now for the flip side. Where does the DLL get that
array of function pointers? From the compiler, of course! On the DLL side, create a
class type with virtual methods with the same order and parameter lists as defined
by the "red-herring" array of function pointers, and implement those methods to
perform the tasks of that class. Then implement and export a simple function from
the DLL that creates an instance of the DLL's class and returns a pointer to it.
Again, Voilà! Your DLL is exporting an object that can be used by any application
that can handle pointers to arrays of function addresses. Also known as objects!
Abstract Interfaces Link User and Implementor. Here's the clincher. How do you
guarantee that the order and parameter lists of the methods in the application's
abstract interface class exactly match the methods implemented in the DLL?
Simple. Declare the DLL class as a descendant of the abstract interface class used
by the application, and override all the abstract virtual methods. The abstract
interface is shared between the application and the DLL; the implementation is
contained entirely within the DLL.
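A rough sketch of that arrangement follows, shown as three source files in one listing. All names (GadgetIntf, TGadgetIntf, TOfficeGadgetImpl, CreateGadget, Gadgets.dll) are hypothetical, and the Release method is an addition of this sketch rather than something the text above prescribes.

```pascal
{ ---- File GadgetIntf.pas: shared by the application and the DLL ---- }
unit GadgetIntf;

interface

type
  TGadgetIntf = class
  public
    procedure Whirr; virtual; abstract;
    function Describe: PChar; virtual; abstract;
    procedure Release; virtual; abstract;
  end;

implementation

end.

{ ---- File Gadgets.dpr: the DLL, which owns the implementation ---- }
library Gadgets;

uses
  GadgetIntf;

type
  TOfficeGadgetImpl = class(TGadgetIntf)
  public
    procedure Whirr; override;
    function Describe: PChar; override;
    procedure Release; override;
  end;

procedure TOfficeGadgetImpl.Whirr;
begin
  { The real work of the class would go here. }
end;

function TOfficeGadgetImpl.Describe: PChar;
begin
  Result := 'office gadget implemented in the DLL';
end;

procedure TOfficeGadgetImpl.Release;
begin
  Free;  { destroy the instance in the module that created it }
end;

{ The one exported routine: construct an instance and hand it out. }
function CreateGadget: TGadgetIntf;
begin
  Result := TOfficeGadgetImpl.Create;
end;

exports
  CreateGadget;

begin
end.

{ ---- File Client.dpr: the application ---- }
program Client;
{$APPTYPE CONSOLE}

uses
  GadgetIntf;

function CreateGadget: TGadgetIntf; external 'Gadgets.dll';

var
  Gadget: TGadgetIntf;
begin
  Gadget := CreateGadget;
  try
    Gadget.Whirr;
    Writeln(Gadget.Describe);
  finally
    Gadget.Release;
  end;
end.
```

The client never sees TOfficeGadgetImpl; it compiles against the abstract class alone. Release is routed through the interface so that the object is freed by the same module (and memory manager) that allocated it.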
Abstract Interfaces Cross Language Boundaries. This can also be done between
modules written in different languages. The Microsoft Component Object Model (COM)
is a language-independent specification that allows different programming languages
to share objects as just described. At its core, COM is simply a specification for
how an array of function pointers should be arranged and used. COM is the
foundation of OLE.
Since Delphi's native class type implementation conforms to COM specifications,
there is no conversion required for Delphi applications to use COM objects, nor any
conversion required for Delphi applications to expose COM objects for other modules
to use.
Of course, when dealing with multiple languages, you won't have the luxury of
sharing the abstract interface class between the modules. You'll have to translate
the abstract interface class into each language, but this is a small price to pay
for the ability to share the implementation.
The Delphi IDE is built entirely upon abstract interfaces, allowing the IDE main
module to communicate with the editor and debugger kernel DLLs (implemented in
BC++), and with the multitude of component design-time tools that live in the
component library (CMPLIB32.DCL) and installable expert modules.
Virtuals Defeat Smart Linking
When the Delphi compiler/linker produces an .EXE, the procedures, variables, and
static methods that are not referenced by "live" code (code that is actually used)
will be left out of the .EXE file. This process is called smart linking, and is a
great improvement over normal linkers that merely copy all code into the .EXE
regardless of whether it's actually needed. The result of smart linking is a
smaller .EXE on disk that requires less memory to run.
Smart Linking Rule for Virtuals. If the type information of a class is touched (for
example, by constructing an instance) by live code, all the virtual methods of the
class and its ancestors will be linked into the .EXE, regardless of whether the
program actually uses the virtual methods.
For the compiler, keeping track of whether an individual procedure is ever used in
a program is relatively simple; figuring out whether a virtual method is used
requires a great deal more analysis of the descendants and ancestors of the class.
It's not impossible to devise a scheme to determine if a particular virtual method
is never used in any descendants of a class type, but such a scheme would certainly
require a lot more CPU cycles than normal smart linking, and the resulting
reduction in code size would rarely be dramatic. For these reasons (lots of work,
greatly reduced compile/link speed, and diminishing returns), adding smart linking
of virtual methods to the Delphi linker has not been a high priority for Borland.
If your class has a number of utility methods that you don't expect to use all the
time, leaving them static will allow the smart linker to omit them from the final
.EXE if they are not used by your program.
Note that including virtual methods involves more than just the bytes of code in
the method bodies. Anything that a virtual method uses or calls (including static
methods) must also be linked into the .EXE, as well as anything those routines use,
etc. Through this cascade effect, one method could potentially drag hundreds of
other routines into the .EXE, sometimes at a cost of hundreds of thousands of bytes
of additional code and data. If most of these support routines are used only by
your unused virtual method, you have a lot of deadwood in your .EXE.
The best general strategy for keeping unused virtual methods - and their associated
deadwood - under control is to declare virtual methods sparingly. It's easier to
promote an existing static method to virtual when a clear need arises, rather than
trying to demote virtual methods down to statics at some late stage of your
development cycle.
Virtuals Enhance Smart Linking
Smart linking of virtuals is a two-edged sword: What is so often cursed for
bloating executables with unused code can also be exploited to greatly reduce the
amount of code in an executable in certain circumstances - even beyond what smart
linking could normally achieve with ordinary static methods and procedures. The key
is to turn the smart linking rule for virtuals inside out:
Inverse Smart Linking Rule for Virtuals. If the type information of a class is not
touched by live code, then none of that class' virtual methods will be linked into
the executable. Even if those virtual methods are called polymorphically by live
code!
In a virtual method call, the compiler emits machine code to grab the VMT pointer
from the instance data, and to call an address stored at a particular offset in the
VMT. The compiler can't know exactly which method body will be called at run time,
so the act of calling a virtual method does not cause the smart linker to pull any
method bodies corresponding to that virtual method identifier into the final
executable.
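To make the mechanism concrete, the sketch below performs by hand roughly what the compiler emits for a virtual call. It is an illustration only, assuming Delphi's VMT layout (Whirr, the first declared virtual method, sits in slot 0) and the default register calling convention. The helper types TWhirrProc, TSlots, PSlots, and PPSlots are hypothetical. The call site names only an instance and a slot number; no particular method body is mentioned.

```pascal
program ManualDispatch;
{$APPTYPE CONSOLE}

type
  TBaseGadget = class
    procedure Whirr; virtual;
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure Whirr; override;
  end;

  { Assumed shape of a parameterless method: Self arrives as a hidden
    first argument. }
  TWhirrProc = procedure(Self: TObject);

  TSlots = array[0..0] of Pointer;
  PSlots = ^TSlots;
  PPSlots = ^PSlots;

procedure TBaseGadget.Whirr;
begin
  Writeln('TBaseGadget whirrs');
end;

procedure TKitchenGadget.Whirr;
begin
  Writeln('TKitchenGadget whirrs');
end;

var
  X: TBaseGadget;
  Slots: PSlots;
begin
  X := TKitchenGadget.Create;
  try
    { Step 1: grab the VMT pointer stored at the start of the instance. }
    Slots := PPSlots(X)^;
    { Step 2: call whatever address is stored in slot 0. }
    TWhirrProc(Slots^[0])(X);
    { The ordinary virtual call, for comparison. }
    X.Whirr;
  finally
    X.Free;
  end;
end.
```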
The same is true for dynamic methods. The act of constructing an instance of the
class is what cues the linker to pull in the virtual methods of that particular
class and its ancestors. This saves the program from the painful death that would
surely result from calling virtual methods that were not linked into the program.
After all, how could you possibly call a virtual method of an object instance
defined and implemented in your program if you did not first construct said
instance? The answer is: you can't. If you obtained the object instance from some
external source, e.g. a DLL, then the virtual methods of that instance are in the
DLL, not your program.
So, if you have code that calls virtual methods of a class that is never
constructed by routines used in the current project, none of the code associated
with those virtual methods will be linked into the final executable.
The code in Figure 1 will cause the linker to pull in all the virtual methods of
TKitchenGadget and TOfficeManager, because those classes are constructed in live
code (the main program block), and all the virtual methods of TBaseGadget, because
it's the ancestor of TKitchenGadget.
type
  TBaseGadget = class
    constructor Create;
    procedure Whirr; virtual;          { Linked in: YES }
  end;

  TOfficeGadget = class(TBaseGadget)
    procedure Whirr; override;         { Linked in: NO }
    procedure Buzz;                    { Linked in: NO }
    procedure Pop; virtual;            { Linked in: NO }
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure Whirr; override;         { Linked in: YES }
  end;

  TOfficeManager = class
  private
    FOfficeGadget: TOfficeGadget;
  public
    procedure InstantiateGadget;       { Linked in: NO }
    { Linked in: YES }
    procedure Operate(AGadget: TOfficeGadget); virtual;
  end;

{ ... Non-essential code omitted ... }

procedure TOfficeManager.InstantiateGadget;
begin  { Dead code, never called }
  FOfficeGadget := TOfficeGadget.Create;
end;

procedure TOfficeManager.Operate(AGadget: TOfficeGadget);
{ Live code, virtual method of a constructed class }
begin
  AGadget.Whirr;
end;

var
  X: TBaseGadget;
  M: TOfficeManager;
begin
  X := TKitchenGadget.Create;
  M := TOfficeManager.Create;

  X.Free;
  M.Free;
end.
Figure 1: Inverse virtual smart linking: TOfficeGadget.Whirr will not be linked
into this program, although Whirr is touched by the live method
TOfficeManager.Operate.
Because TOfficeManager.Operate is virtual, its method body is all live code (even
though Operate is never called). Therefore, the call to AGadget.Whirr is a live
reference to the virtual method Whirr. However, TOfficeGadget is not constructed in
live code in this example - TOfficeManager.InstantiateGadget is never used. Nothing
of TOfficeGadget will be linked into this program, even though a live routine
contains a call to Whirr through a variable of type TOfficeGadget.
Variations on a Theme. Let's see how the scenario changes with a few slight code
modifications. The code in Figure 2 adds a call to AGadget.Buzz in the
TOfficeManager.Operate method. Notice that the body of TOfficeGadget.Buzz is now
linked in, but TOfficeGadget.Whirr is still not. Buzz is a static method, so any
live reference to it will link in the corresponding code, even if the class is
never constructed.
type
  TBaseGadget = class
    constructor Create;
    procedure Whirr; virtual;          { Linked in: YES }
  end;

  TOfficeGadget = class(TBaseGadget)
    procedure Whirr; override;         { Linked in: NO }
    procedure Buzz;                    { Linked in: YES }
    procedure Pop; virtual;            { Linked in: NO }
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure Whirr; override;         { Linked in: YES }
  end;

  TOfficeManager = class
  private
    FOfficeGadget: TOfficeGadget;
  public
    procedure InstantiateGadget;       { Linked in: NO }
    { Linked in: YES }
    procedure Operate(AGadget: TOfficeGadget); virtual;
  end;

{ ... Non-essential code omitted ... }

procedure TOfficeManager.InstantiateGadget;
begin  { Dead code, never called }
  FOfficeGadget := TOfficeGadget.Create;
end;

procedure TOfficeManager.Operate(AGadget: TOfficeGadget);
{ Live code, virtual method of a constructed class }
begin
  AGadget.Whirr;
  AGadget.Buzz;  { This touches the static method body }
end;

var
  X: TBaseGadget;
  M: TOfficeManager;
begin
  X := TKitchenGadget.Create;
  M := TOfficeManager.Create;

  X.Free;
  M.Free;
end.
Figure 2: Notice how the addition of a call to the static Buzz method affects its
linked-in status. TOfficeGadget.Whirr is still not included.
The code in Figure 3 adds a call to the static method
TOfficeManager.InstantiateGadget. This brings the construction of the TOfficeGadget
class into the live code of the program, which brings in all the virtual methods of
TOfficeGadget, including TOfficeGadget.Whirr (which is called by live code) and
TOfficeGadget.Pop (which isn't). If you deleted the call to AGadget.Buzz, the
TOfficeGadget.Buzz method would become dead code again. Static methods are linked
in only if they are used in live code, regardless of whether their class type is
used.
type
  TBaseGadget = class
    constructor Create;
    procedure Whirr; virtual;          { Linked in: YES }
  end;

  TOfficeGadget = class(TBaseGadget)
    procedure Whirr; override;         { Linked in: YES }
    procedure Buzz;                    { Linked in: YES }
    procedure Pop; virtual;            { Linked in: YES }
  end;

  TKitchenGadget = class(TBaseGadget)
    procedure Whirr; override;         { Linked in: YES }
  end;

  TOfficeManager = class
  private
    FOfficeGadget: TOfficeGadget;
  public
    procedure InstantiateGadget;       { Linked in: YES }
    { Linked in: YES }
    procedure Operate(AGadget: TOfficeGadget); virtual;
  end;

{ ... Non-essential code omitted ... }

procedure TOfficeManager.InstantiateGadget;
begin  { Live code }
  FOfficeGadget := TOfficeGadget.Create;
end;

procedure TOfficeManager.Operate(AGadget: TOfficeGadget);
{ Live code, virtual method of a constructed class }
begin
  AGadget.Whirr;
  AGadget.Buzz;  { This touches the static method body }
end;

var
  X: TBaseGadget;
  M: TOfficeManager;
begin
  X := TKitchenGadget.Create;
  M := TOfficeManager.Create;

  M.InstantiateGadget;

  X.Free;
  M.Free;
end.
Figure 3: With a call to InstantiateGadget, the construction of TOfficeGadget
becomes live and all of TOfficeGadget's virtual methods are linked.
Life in the Real World. Let's examine a slightly more complex (and more
interesting) example of this virtual smart linking technique inside the VCL.
The Delphi streaming system has two parts: TReader and TWriter, which descend from
a common ancestor, TFiler:
- TReader contains all the code needed to load components from a stream.
- TWriter contains everything needed to write components to a stream.
These classes were split because many Delphi applications never need to write
components to a stream - most applications only read forms from resource streams at
program start up. If the streaming system were implemented in one class, all your
applications would wind up carrying around all the stream output code, although
many don't need it.
So, splitting the streaming system into two classes improved smart linking. End of
story? Not quite.
In a careful examination of the code linked into a typical Delphi application, the
Delphi R&D team noticed that bits of TWriter were being linked into the .EXE. This
seemed odd, because TWriter was definitely never instantiated in the test program.
Some of those TWriter bits touched a lot of other bits that piled up rather quickly
into a lot of unused code. Let's backtrack a little to see how this code got into
the .EXE, and then look at the surprising solution.
Delphi's TComponent class defines virtual methods that are responsible for reading
and writing the component's state in a stream, using TReader and TWriter classes.
Because TComponent is the ancestor of just about everything of importance in
Delphi, TComponent is almost always linked into your Delphi programs, along with
all the virtual methods of TComponent.
Some of TComponent's virtual methods use TWriter methods to write the component's
properties to a stream. Those TWriter methods were static methods.
Therefore, TComponent virtual methods are always included in Delphi form-based
applications, and some of those virtual methods (e.g. TComponent.WriteState) call
static methods of TWriter (e.g. TWriter.WriteData). Thus, those static method
bodies of TWriter were being linked into the .EXE. TWriter.WriteData is the kingpin
method that drives the entire stream output system, so when it is linked in, almost
all the rest of TWriter tags along (everything, ironically, except TWriter.Create).
The solution to this code bloat (caused indirectly by the TComponent.WriteState
virtual method) may throw you for a loop: To eliminate the unneeded TWriter code,
make more methods of TWriter (e.g. WriteData) virtual!
The all-or-none clumping of virtual methods that we curse for working against the
smart linker can be used to our advantage, so that TWriter methods that must be
called by live code are not actually included unless TWriter itself is instantiated
in the program. Because methods such as TWriter.WriteData are always used when you
use a TWriter, and TWriter is a mule class (no descendants), there is no
appreciable cost to making TWriter.WriteData virtual.
The benefits, however, are appreciable: Making TWriter.WriteData virtual shaved
nearly 10KB off the size of a typical Delphi 2 .EXE. Thanks to this and other code
trimming tricks, Delphi 2 packs more standard features (e.g. form inheritance and
form linking) into smaller .EXEs than Delphi 1.
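The shape of that fix can be reduced to a small sketch. TSketchComponent and TSketchWriter are hypothetical stand-ins for TComponent and TWriter, not the VCL declarations. The component class is constructed, so its virtual WriteState is live and makes a virtual call to WriteData, but no writer is ever constructed. Under the inverse rule described earlier, the body of TSketchWriter.WriteData should then be left out of the .EXE; a detailed map file is the way to confirm what your own compiler version actually does.

```pascal
program WriterSketch;
{$APPTYPE CONSOLE}

type
  TSketchWriter = class
  public
    { Virtual on purpose: a call to it does not reference this body. }
    procedure WriteData(const S: string); virtual;
  end;

  TSketchComponent = class
  public
    { Virtual method of a constructed class, so always linked in. }
    procedure WriteState(Writer: TSketchWriter); virtual;
  end;

procedure TSketchWriter.WriteData(const S: string);
begin
  Writeln('writing: ', S);
end;

procedure TSketchComponent.WriteState(Writer: TSketchWriter);
begin
  Writer.WriteData(ClassName);  { polymorphic call through the VMT }
end;

var
  C: TSketchComponent;
begin
  C := TSketchComponent.Create;
  try
    Writeln(C.ClassName, ' constructed; no writer is ever constructed');
  finally
    C.Free;
  end;
end.
```

Removing the virtual directive from WriteData turns the call in WriteState into a direct reference to the method body, which is the situation the R&D team originally found.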
What's Really in Your Executables? The simplest way to find out if a particular
routine is linked into a particular project is to set a breakpoint in the body of
that routine and run the program in the debugger. If the routine is not linked into
the .EXE, the debugger will complain that you have set an invalid breakpoint.
To get a complete picture of what's in your .EXE or DLL, configure the linker
options to emit a detailed map file. From Delphi's main menu, select Project |
Options to display the Project Options dialog box. Select the Linker tab. In the
Map File group box, select Detailed. Now recompile your project. The map file will
contain a list of the names of all the routines (from units compiled with $D+
debug information) that were linked into the .EXE.
Because the 32-bit Delphi Compiled Unit (.DCU) file has none of the capacity
limitations associated with earlier, 16-bit versions of the Borland Pascal product
line, there is little reason to ever turn off debug symbol information storage in
the .DCU. Leave the $D, $L, and $Y compiler switches enabled at all times so the
information is available when you need it in the integrated debugger, map file, or
object browser. (If hard disk space is a problem, collect the loose change beneath
the cushions of your sofa and buy a new 1GB hard drive.)
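If you prefer to state the switches in source rather than rely on the project options, they can be written as directives at the top of a unit. The unit name below is hypothetical; the three directives are the ones named above.

```pascal
unit GadgetUtils;  { hypothetical unit name }

{ $D+ debug information, $L+ local symbols, $Y+ symbol reference info }
{$D+,L+,Y+}

interface

implementation

end.
```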
Novelty of Inverse Virtual Smart Linking. This technique of using virtual methods
to improve smart linking is not unique to Delphi, but because Delphi's smart linker
has a much finer granularity than other compiler products, this technique is much
more effective in Delphi than in other products.
Most compilers produce intermediate code and limited symbol information in an .OBJ
format, and most linkers' atom of granularity for smart linking is the .OBJ file.
If you touch something inside a library of routines stored in one .OBJ module, the
entire .OBJ module is linked into the .EXE. Thus, C and C++ libraries are often
broken into swarms of little .OBJ modules in the hope of minimizing dead code in
the .EXE.
Delphi's linker granularity is much finer - down to individual variables,
procedures, and classes. If you touch one routine in a Delphi unit that contains
lots of routines, only the thing you touch (and whatever it uses) is linked into
the .EXE. Thus, there is no penalty for creating large libraries of
topically-related routines in one Delphi unit. What you don't use will be left out
of the .EXE.
Developing clever techniques to avoid touching individual routines or classes is
generally more rewarding in Delphi than in most other compiled languages. In other
products, the routines you so carefully avoided will probably be linked into the
.EXE anyway because you are still using one of the other routines in the same
module. Measuring with a micrometer is futile when your only cutting tool is a
chainsaw.
Conclusion
Virtual methods are often maligned for bloating applications with unnecessary code.
While it's true that virtuals can drag in code that your application doesn't need,
this series has shown that careful and controlled use of virtual methods can
achieve greater smart linking efficiency than would be possible with static methods
alone.