Author: Ezra Hoch
You've probably used classes & interfaces more than once in your Delphi programs.
Did you ever stop to think how Delphi implements these creatures?
Answer:
A few words before we start :
First, I want to start this article by saying that all of the knowledge in this
paper is derived from viewing the disassembler of Delphi 5. Hence everything written
here is valid only for Delphi 5 and might change with any upgrade / different version.
Second, in order to fully understand what is written in this article, you'll have to
dive into some assembler code. I'll explain what the assembler code does, but be
prepared, it might get messy.
And now to the real stuff. In Delphi a class instance is a simple pointer. That
might seem odd to some people, since you've used instances in Delphi many a time,
and never had to treat them like pointers. That is correct, but only because
Borland was kind enough to wrap these pointers up nicely.
These pointers actually point to a complicated structure in memory, which we'll try
to understand. First we'll look at a simple class definition:
type
  TBoo1 = class
    FDataA, FDataB: Integer;
  end;

var
  Boo1: TBoo1;
begin
  Boo1 := TBoo1.Create;
end;
Now let's look at what Boo1 points to (Boo1 is a pointer, remember ?) :
(Boo1 points to the following values, each 4 bytes long)
a Pointer to TBoo1's VMT
FDataA
FDataB
Now let's examine a descendant of TBoo1:
type
  TBoo2 = class(TBoo1)
    FDataC, FDataD: Integer;
  end;

var
  Boo2: TBoo2;
begin
  Boo2 := TBoo2.Create;
end;
Boo2 will point to the following values in memory :
a Pointer to TBoo2's VMT
FDataA
FDataB
FDataC
FDataD
Notice that the values Boo2 points to include the same fields that Boo1 points to
(only the VMT pointer is different). That's very easy to explain: TBoo2 inherits
from TBoo1, therefore it must include all of the fields that TBoo1 has.
As a general case, we could state that each class instance points to the following
values :
a pointer to the Class' VMT
a list of the Class' parent's fields
a list of the Class' fields
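The general layout above can be checked from code. What follows is only a minimal sketch and not part of the original investigation: the helper names (PPtr, PInt, VmtOf, IntAt, LayoutMatches) are made up for illustration, and it assumes the fields sit in declaration order with no padding, as described above for Delphi 5.

```pascal
program InstanceLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TBoo1 = class
    FDataA, FDataB: Integer;
  end;

  TBoo2 = class(TBoo1)
    FDataC, FDataD: Integer;
  end;

  PPtr = ^Pointer;  // hypothetical helper: pointer to a pointer
  PInt = ^Integer;  // hypothetical helper: pointer to an Integer

// Returns the first pointer-sized value of the instance (the VMT pointer).
function VmtOf(Obj: TObject): Pointer;
begin
  Result := PPtr(Pointer(Obj))^;
end;

// Reads the Integer stored Offset bytes after the start of the instance.
function IntAt(Obj: TObject; Offset: Integer): Integer;
begin
  Result := PInt(PAnsiChar(Pointer(Obj)) + Offset)^;
end;

// Checks the layout: VMT pointer first, then parent fields, then own fields.
function LayoutMatches: Boolean;
var
  Boo2: TBoo2;
  Base: Integer;
begin
  Boo2 := TBoo2.Create;
  try
    Boo2.FDataA := 1;
    Boo2.FDataB := 2;
    Boo2.FDataC := 3;
    Boo2.FDataD := 4;
    Base := SizeOf(Pointer);  // assumption: fields start right after the VMT pointer
    Result := (VmtOf(Boo2) = Pointer(TBoo2))
      and (IntAt(Boo2, Base) = 1)
      and (IntAt(Boo2, Base + 1 * SizeOf(Integer)) = 2)
      and (IntAt(Boo2, Base + 2 * SizeOf(Integer)) = 3)
      and (IntAt(Boo2, Base + 3 * SizeOf(Integer)) = 4);
  finally
    Boo2.Free;
  end;
end;

begin
  Writeln('Layout as described: ', LayoutMatches);
end.
```

SizeOf(Pointer) is used instead of a literal 4 so the sketch does not depend on the 4-byte pointers of Delphi 5.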
Now it's time to investigate interfaces. Before we can fully understand interfaces
we must understand the way Delphi makes a method call on a class instance. What
Delphi actually does is call a function with one more parameter than was declared,
and that parameter is the instance itself. Let's look at an example:
type
  TMoo = class
    FData: Integer;
    procedure Act(Value: Integer);
  end;

procedure TMoo.Act(Value: Integer);
begin
  if FData = Value then
    FData := FData + 1
  else
    FData := Value;
end;

var
  Moo: TMoo;
begin
  Moo := TMoo.Create;
  Moo.Act(15);
end;
How does Delphi implement this? Simple: 'TMoo.Act' is actually compiled into a
procedure that accepts two(!) parameters. One is the declared parameter, 'Value' of
type Integer. The other is an instance of class TMoo. Every time Delphi calls
'Moo.Act' it does some preprocessing beforehand, that is, it passes the instance
of TMoo on which the call is made. Basically you could say that any call to a method
of an object is translated to a regular call to a function / procedure that accepts
the object making the call as a parameter.
In the previous example, 'TMoo.Act' is actually compiled to something like this:

procedure TMoo_Act(Self: TMoo; Value: Integer);
begin
  if Self.FData = Value then
    Self.FData := Self.FData + 1
  else
    Self.FData := Value;
end;
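One way to see the hidden parameter at work is to take a method pointer apart and call the code address as a plain procedure, passing the instance by hand. This is only a sketch under assumptions: TActMethod, TActProc and CallWithExplicitSelf are hypothetical names, and the trick relies on the default calling convention passing Self as the first parameter, which is not guaranteed for every compiler or platform.

```pascal
program HiddenSelf;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;  // TMethod is declared in SysUtils or System, depending on the version

type
  TMoo = class
    FData: Integer;
    procedure Act(Value: Integer);
  end;

  // A method pointer: a code address plus the instance to pass as Self.
  TActMethod = procedure(Value: Integer) of object;

  // Hypothetical plain-procedure view of TMoo.Act with Self spelled out.
  TActProc = procedure(Self: TMoo; Value: Integer);

procedure TMoo.Act(Value: Integer);
begin
  if FData = Value then
    FData := FData + 1
  else
    FData := Value;
end;

// Calls TMoo.Act twice through a plain procedure pointer and returns FData.
function CallWithExplicitSelf: Integer;
var
  Moo: TMoo;
  Method: TActMethod;
  Raw: TMethod;
  Proc: TActProc;
begin
  Moo := TMoo.Create;  // FData starts at 0
  try
    Method := Moo.Act;
    Raw := TMethod(Method);      // Raw.Code = the code, Raw.Data = the instance
    Proc := TActProc(Raw.Code);
    Proc(TMoo(Raw.Data), 15);    // FData <> 15, so FData becomes 15
    Proc(TMoo(Raw.Data), 15);    // FData = 15, so FData becomes 16
    Result := Moo.FData;
  finally
    Moo.Free;
  end;
end;

begin
  Writeln(CallWithExplicitSelf);
end.
```

The two calls through Proc never mention a method at all, yet they change Moo.FData exactly as 'Moo.Act(15)' would.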
It's time to go back to interfaces. Consider the following code:
type
  IKoo = interface
    function Calculate(Value: Integer): Double;
  end;

function Evaluate(Koo: IKoo; Value: Integer): Double;
begin
  Result := Koo.Calculate(Value);
end;

type
  TKooA = class(TInterfacedObject, IKoo)
    function Calculate(Value: Integer): Double;
  end;

  TKooB = class(TInterfacedObject, IKoo)
    procedure DoNothing;
    function Calculate(Value: Integer): Double;
  end;
An instance of any class that supports IKoo can be passed as a parameter to the
function 'Evaluate'. When we pass an instance of TKooA to 'Evaluate' we need to call
the first method of TKooA, but when we pass an instance of TKooB, we need to call the
second method of TKooB! How will Delphi know which function to call each time?!
In order to understand the answer, we must review what an interface really is (and
how it is implemented in Delphi). An interface is simply a list of methods that a
class declares that it implements. That is, each method in the interface is
implemented in the class. The way Delphi implements this is thus:
Each interface a class supports is actually a list of pointers to methods.
Therefore, each time a method call is made through an interface, the interface
actually diverts that call to one of its pointers to methods, thus giving the object
that really implements it the chance to act. I'll explain that via the 'Koo' example
above:
Each time the function 'Evaluate' gets a parameter of type IKoo, it really gets a
list of pointers to methods (with 4 items, since IKoo inherits three methods from
IUnknown). If it got an IKoo interface that was implemented by TKooA, then the 4th
item in the pointer-to-method list would point to 'TKooA.Calculate'. Otherwise it
would point to 'TKooB.Calculate'. Therefore, when a call is made to 'IKoo.Calculate',
what is actually called is whatever that 4th item points to (either
'TKooA.Calculate' or 'TKooB.Calculate'). Thus Delphi implements interfaces.
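The 'Koo' example can be turned into a small runnable program. This is a sketch with invented details: the bodies of both 'Calculate' methods are arbitrary, and PPtr and MethodListOf are hypothetical helpers that simply read the address of the pointer-to-method list an interface reference leads to.

```pascal
program KooDispatch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  IKoo = interface
    function Calculate(Value: Integer): Double;
  end;

  TKooA = class(TInterfacedObject, IKoo)
    function Calculate(Value: Integer): Double;
  end;

  TKooB = class(TInterfacedObject, IKoo)
    procedure DoNothing;
    function Calculate(Value: Integer): Double;
  end;

  PPtr = ^Pointer;  // hypothetical helper: pointer to a pointer

function TKooA.Calculate(Value: Integer): Double;
begin
  Result := Value * 2;    // arbitrary body, chosen only for the demo
end;

procedure TKooB.DoNothing;
begin
end;

function TKooB.Calculate(Value: Integer): Double;
begin
  Result := Value + 0.5;  // arbitrary body, chosen only for the demo
end;

function Evaluate(Koo: IKoo; Value: Integer): Double;
begin
  Result := Koo.Calculate(Value);
end;

// An interface reference points at a slot that holds the address of the
// pointer-to-method list; this returns that address.
function MethodListOf(const Koo: IKoo): Pointer;
begin
  Result := PPtr(Pointer(Koo))^;
end;

var
  A, B: IKoo;
begin
  A := TKooA.Create;
  B := TKooB.Create;
  Writeln(Evaluate(A, 3):0:1);
  Writeln(Evaluate(B, 3):0:1);
  Writeln('Separate method lists: ', MethodListOf(A) <> MethodListOf(B));
end.
```

'Evaluate' is compiled once and never learns which class it was given; the only thing that differs between the two calls is the list the interface reference leads to.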
And now to how Delphi stores interfaces in memory. For each instance of a class
that supports 'N' interfaces, we need 'N' different lists of pointers to methods
(one list for each interface). But these lists are the same for every instance of
a given class, therefore in order to save memory, each instance only holds 'N'
pointers to these lists (instead of the lists themselves).
Consider the following code:
type
  ILooA = interface
  end;

  ILooB = interface
  end;

  TLoo = class(TInterfacedObject, ILooA, ILooB)
    FLooA, FLooB: Integer;
  end;
This is how an instance of TLoo would look in memory :
a pointer to TLoo's VMT
FRefCount (inherited from TInterfacedObject)
IUnknown (inherited from TInterfacedObject)
FLooA
FLooB
ILooB
ILooA
In general, any class instance would look like this:
a pointer to the class' VMT
the class' parent's structure (except for the pointer to the VMT)
first data member of the class
.
.
last data member of the class
last interface in the class' interface list
.
.
first interface in the class' interface list
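The claim that the interface slots live inside the instance can be checked by measuring how far each interface reference is from the start of the object. This is a rough sketch: SlotOffset and SlotsLiveInsideInstance are hypothetical helpers, and the offsets printed depend on the compiler version, so none are promised here.

```pascal
program LooLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  ILooA = interface
  end;

  ILooB = interface
  end;

  TLoo = class(TInterfacedObject, ILooA, ILooB)
    FLooA, FLooB: Integer;
  end;

// Distance in bytes from the start of the instance to the interface slot.
function SlotOffset(Obj: TObject; IntfPtr: Pointer): Integer;
begin
  Result := PAnsiChar(IntfPtr) - PAnsiChar(Pointer(Obj));
end;

// True when both interface references point inside the instance itself.
function SlotsLiveInsideInstance: Boolean;
var
  Loo: TLoo;
  LooA: ILooA;
  LooB: ILooB;
  OffA, OffB: Integer;
begin
  Loo := TLoo.Create;
  LooA := Loo;  // from here on the reference count keeps the object alive
  LooB := Loo;
  OffA := SlotOffset(Loo, Pointer(LooA));
  OffB := SlotOffset(Loo, Pointer(LooB));
  Writeln('ILooA slot at offset ', OffA);
  Writeln('ILooB slot at offset ', OffB);
  Result := (OffA > 0) and (OffA < Loo.InstanceSize)
    and (OffB > 0) and (OffB < Loo.InstanceSize);
  // LooA and LooB go out of scope here, which frees the object.
end;

begin
  Writeln('Slots inside the instance: ', SlotsLiveInsideInstance);
end.
```

An interface variable, then, is nothing more exotic than the address of one of the slots listed above.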
As I said at the beginning of this article, in order to really grasp the way Delphi
implements classes & interfaces we must look at the assembler code Delphi produces.
First we'll learn a bit of assembler in order to understand the code that will
follow. In assembler there is a thing called a 'register'. A register is a place on
the CPU that can hold a 32 bit value. On a Pentium CPU there are 8 main registers
(EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP). Most actions that are done in assembler
are done on registers. Here are a few commands in assembler:
(Moves the value into the register)
MOV Register, Value
(Moves the value in Register2 into Register1)
MOV Register1, Register2
(Moves the value that Register2 points to into Register1. This is the same as the
following code: 'Register1 := Register2^;')
MOV Register1, [Register2]
(Moves the value that Register2 + Value points to into Register1. The same as:
'Register1 := Pointer(Integer(Register2) + Value)^;')
MOV Register1, [Register2 + Value]
Examples:
MOV EAX, 10
MOV EBX, EAX
MOV EAX, [EBX + 6]
EBX will hold the value 10 and EAX will hold the value stored at address 16
(10 + 6), which is $10 in hexadecimal.
Just in order to make sure that you understood this part, I'll give an example of
how Delphi assigns a value to an instance's data member.
type
  TGoo = class
    FDataA, FDataB: Integer;
  end;

var
  Goo: TGoo;
begin
  Goo := TGoo.Create;
  Goo.FDataA := 5;
  Goo.FDataB := 7;
end;

If you open Delphi's disassembler you'll see the following code:

// Goo.FDataA := 5;
mov eax, [ebp - $08]
mov [eax + $04], $00000005
// Goo.FDataB := 7;
mov eax, [ebp - $08]
mov [eax + $08], $00000007
Why move the value pointed to by 'ebp-$08'? Simple, that's where the variable Goo is
stored. Notice that accessing FDataA is the same as accessing the address at 'eax +
$04' and that accessing FDataB is the same as accessing the address at 'eax + $08'.
That's because the address in 'eax' is where the pointer to the VMT of TGoo is
stored, and (as I mentioned before) the following values in memory are the data
members of TGoo.
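The same stores can be written in Delphi itself with a little pointer arithmetic. This is a minimal sketch: PInt, StoreAt and WriteThroughOffsets are hypothetical names, and the offsets assume that the fields follow the VMT pointer directly, as in the disassembly above.

```pascal
program FieldWrite;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TGoo = class
    FDataA, FDataB: Integer;
  end;

  PInt = ^Integer;  // hypothetical helper: pointer to an Integer

// Stores Value at Offset bytes from the start of the instance, which is what
// 'mov [eax + Offset], Value' does once eax holds the instance pointer.
procedure StoreAt(Obj: TObject; Offset, Value: Integer);
begin
  PInt(PAnsiChar(Pointer(Obj)) + Offset)^ := Value;
end;

// Writes both fields through raw offsets, then reads them back by name.
function WriteThroughOffsets: Boolean;
var
  Goo: TGoo;
begin
  Goo := TGoo.Create;
  try
    StoreAt(Goo, SizeOf(Pointer), 5);                    // $04 with 4-byte pointers
    StoreAt(Goo, SizeOf(Pointer) + SizeOf(Integer), 7);  // $08 with 4-byte pointers
    Result := (Goo.FDataA = 5) and (Goo.FDataB = 7);
  finally
    Goo.Free;
  end;
end;

begin
  Writeln('Fields written through offsets: ', WriteThroughOffsets);
end.
```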
Let's go back to interfaces. Look at the following code :
type
  IRoo = interface
  end;

  TRoo = class(TInterfacedObject, IRoo)
  end;

var
  Roo: TRoo;
  RooIntf: IRoo;
begin
  Roo := TRoo.Create;
  RooIntf := Roo;
  RooIntf._AddRef;
end;
The following assembler code isn't exactly what delphi produces but it serves the
same point :
// RooIntf := Roo;
// eax holds the value returned by TRoo.Create, that is, the variable Roo
// ecx holds the value that should later be assigned to RooIntf
mov ecx, eax
// This is the same as: 'ecx := ecx + $0C;'
add ecx, $0C
// RooIntf._AddRef
// Push 'ecx' onto the CPU's stack
push ecx
mov ecx, [ecx]
// 'call' tells the CPU to jump to the address stored as a value in 'ecx'
call ecx

Let's look at the code that 'call ecx' brought us to:

// Pop the value we pushed onto the stack into 'ecx'
pop ecx
// Same as: 'ecx := ecx - $0C;'
sub ecx, $0C
// Call the method '_AddRef' with 'ecx' as a parameter.
call TInterfacedObject._AddRef(ecx)
A little explanation is due. Why did Delphi add '$0C' to 'ecx'? Remember how Roo
is stored in memory (a pointer to the VMT, FRefCount (of TInterfacedObject), IUnknown
(of TInterfacedObject), IRoo). IRoo is the fourth value in the list that 'ecx'
points to. Each value is 4 bytes long, so IRoo is 12 (3*4) bytes after 'ecx', and
'$0C' is 12 in hexadecimal notation. So basically, adding '$0C' to 'ecx' just made
'ecx' point to the right value, that is, point to IRoo of Roo (an instance of
TRoo).
Why do we push 'ecx' onto the stack? That's because we'll need to use it later, when
calling the real '_AddRef' method. Remember, 'ecx' is the value pointing to Roo +
12.
After that, we move into 'ecx' the value that 'ecx' pointed to. Remember when I said
that instead of holding the lists of pointers to methods, Delphi stores only the
pointers to them (to save memory)? That's why 'ecx' was actually a pointer, but
now it holds the value it pointed to before. (Strictly speaking that value is the
address of the list, and the real code calls the '_AddRef' entry inside it; the
simplified listing skips that step.)
The next command is to call the method that 'ecx' holds. Now we'll look at that
method. It's very short. The only thing it does is modify the value of 'ecx' (after
popping it from the stack) so it is equal to the value of Roo (that is, it points to
the same instance as the variable Roo). Then the method 'TInterfacedObject._AddRef'
is called with 'ecx' (Roo) as a parameter. This is the same as when I wrote that
Delphi actually compiles a class' method into a regular function / procedure that
accepts one extra parameter: the instance of the class.
What was that good for? We added a value to a pointer, then did this jumping
around in memory, then subtracted the same value from the pointer and called the
function the pointer points to! Why bother? We could simply call the function
without adding and subtracting values!
This is where the power of indirection comes into the game. Notice that the call to
'RooIntf._AddRef' didn't know that RooIntf actually belonged to an instance of TRoo.
It just called the method that was there to call. The implementation of this method
is where the pointer was adjusted back. That is, only the implementation that
RooIntf points to (IRoo of TRoo) knew how much was added to or subtracted from the
pointer pushed onto the stack. If we had another variable of type TRoo2, which also
implemented IRoo, and we made the assignment 'RooIntf := variable of type TRoo2' and
then called 'RooIntf._AddRef', a different value would be subtracted from the value
on the stack, thus making the method call go to the right place in the TRoo2 class.
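To close, here is a sketch of the TRoo / TRoo2 situation. The article never shows TRoo2, so the class below (including its two extra fields) is invented purely for illustration, as are the helpers SlotOffset, AddRefReachesObject and BothReachTheirObjects. The printed offsets depend on the compiler, so no particular numbers are claimed.

```pascal
program RooAdjust;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  IRoo = interface
  end;

  TRoo = class(TInterfacedObject, IRoo)
  end;

  // Hypothetical second implementer; the extra fields are only there so that
  // its IRoo slot may land at a different offset.
  TRoo2 = class(TInterfacedObject, IRoo)
    FExtraA, FExtraB: Integer;
  end;

// Distance in bytes from the start of the instance to its IRoo slot.
function SlotOffset(Obj: TObject; const Intf: IRoo): Integer;
begin
  Result := PAnsiChar(Pointer(Intf)) - PAnsiChar(Pointer(Obj));
end;

// Calls _AddRef through the interface and reports whether the object's own
// reference count went up, i.e. whether the call reached the right instance.
function AddRefReachesObject(Obj: TInterfacedObject; const Intf: IRoo): Boolean;
var
  Before: Integer;
begin
  Before := Obj.RefCount;
  Intf._AddRef;
  Result := Obj.RefCount = Before + 1;
  Intf._Release;  // undo the extra reference
end;

function BothReachTheirObjects: Boolean;
var
  Roo: TRoo;
  Roo2: TRoo2;
  IntfA, IntfB: IRoo;
begin
  Roo := TRoo.Create;
  IntfA := Roo;
  Roo2 := TRoo2.Create;
  IntfB := Roo2;
  Writeln('IRoo offset in TRoo:  ', SlotOffset(Roo, IntfA));
  Writeln('IRoo offset in TRoo2: ', SlotOffset(Roo2, IntfB));
  Result := AddRefReachesObject(Roo, IntfA)
    and AddRefReachesObject(Roo2, IntfB);
  // IntfA and IntfB go out of scope here, which frees both objects.
end;

begin
  Writeln('Both calls reached their objects: ', BothReachTheirObjects);
end.
```

The caller in AddRefReachesObject is identical for both classes; whatever adjustment is needed to get from the interface slot back to the object is hidden behind the pointer it calls through.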