Know Your C++: Virtual Functions
Tuesday, August 15, 2006
Polymorphism is one of three programming paradigms that give a language the ability to support object-oriented programming. In C++ this is implemented via virtual functions. Before we can understand virtual functions, we must first learn about function call binding.
When writing code, we frequently implement function declarations and definitions. In order for the code to compile, the compiler must match up the function call with the address of the associated function body. This process is known as binding. When the compiler and linker are able to successfully match the two before code execution, it is called early binding. Because polymorphism requires binding at runtime, C++ must provide an additional mechanism in order to support this. Matching the function call with its associated body at runtime is called late binding. C++ uses virtual functions to support late binding. When a function has the keyword virtual prepended to its declaration, the C++ compiler knows that function call binding will occur at runtime. Since compiler and linker know that a function body address will not be matched with an associated call at compile/link time, it creates a unique VTABLE for each object type that contains virtual functions. This table contains the addresses of each virtual function body for a particular object type. Along with the VTABLE, a vpointer (e.g. VPTR) is created/added to each object that is instantiated. The vpointer points to the appropriate VTABLE for that particular object type. So, for each object you create (that contains virtual functions), a vpointer is created, which knows the correct VTABLE to which it must point. And, this VTABLE contains the addresses of all the function bodies for that object type. So at runtime, your are guaranteed that your code will always have an associated function body for each function call. Still with me? Let's look at an example to clarify the concept.
class Base { public: Base(); ~Base(); void PrintStuff(); // Virtual function declaration virtual long MorphMePlease(); }; // Virtual function definition long Base::MorphMePlease() { return 0; } class Derived : public Base(); { public: Derived(); ~Derived(); // Derived implements its own version of the // virtual function by using the same function // signature and removing the keyword virtual long MorphMePlease(); }; // Derived function definition that overrides the Base class' // virtual function. long Derived::MorphMePlease() { return 1; } // A non-class member function. long SomeNonMemberFunction(Base * ptrObj) { // If a Base pointer is passed to this function, // Base::MorphMePlease() is called. If a Derived // pointer is passed to this function, Derived::MorphMePlease() // is called. return (ptrObj->MorphMePlease()); }
In the above example you see that our Base
class implements a virtual function called MorphMePlease()
and that our Derived
class overrides that function with its own implementation. Depending on what you pass into the non-member function SomeNonMemberFunction()
either the Base
or the Derived
class' function can be called. How is that? When a pointer to a Derived
object is passed in, it is implicitly upcast to a Base
pointer. Even though the cast is performed, the object's vpointer is unchanged, and therefore still points to the VTABLE of Derived
function addresses. This type of functionality demonstrates the ability to change functional behavior based on object type at runtime. This is the polymorphism we mentioned earlier.
So what happens with pure virtual functions? If by definition, they don't have a default definition, how does the compiler treat them? Well, as we mentioned earlier, the compiler must guarantee that every function call be associated with a corresponding function body address. If this were not guaranteed, we'd have all kinds of runtime errors when a function call was invoked that had no body (imagine the madness...). When the compiler sees that an object type has a pure virtual function, it reserves a row in the VTABLE for the associated function body, but doesn't actually insert the address (b/c there isn't one). Of course, this makes the VTABLE for that object type incomplete. Since the table is incomplete, the compiler cannot guarantee safe execution of that object's code. Because of this, the compiler will throw an error if an object of that type is instantiated anywhere in the compiled code. Because you cannot actually instantiate this object type, it becomes what is known as an abstract class type. Let's take a look at our previous example using pure virtual functions instead.
class Base { public: Base(); ~Base(); void PrintStuff(); // Pure virtual functions require that a // derived class define the function body. virtual long MorphMePlease() = 0; }; class Derived : public Base(); { public: Derived(); ~Derived(); // Derived class defines the function body // and is therefore allowed to be instantiated. long MorphMePlease(); }; // Derived function definition that defines the // Base class' pure virtual function long Derived::MorphMePlease() { return 1; }
In our updated example above, we notice the Base::MorphMePlease()
function declaration is assigned to zero. That assignment signals to the compiler that a function is pure virtual (and therefore makes the class abstract). You will also notice that the Base::MorphMePlease()
function definition is not provided. If you tried to provide one the compiler would barf and say you need to make up your mind b/w virtual and pure virtual functions (paraphrasing of course *grin*). If you were to remove the declaration and definition of the Derived::MorphMePlease()
function, the Derived
class would also become abstract and therefore not instantiatable. This is because the Derived
object type's VTABLE would be incomplete as well.
Now that virtual functions and call binding have been explained, there are a few more points that need mentioning. First, the initialization of the VTABLE and vpointer structures is performed during object construction. The compiler provides this hidden code for you in your constructor and therefore you don't have to write it. Second, be careful about how you cast objects up/down the inheritance tree. Casting down or casting from a base object to a derived object is not guaranteed to work and has additional performance overhead. This is because there is no runtime information which guarantees the downcast to succeed. Therefore, an explicit cast is required. It is safest to use the dynamic_cast()
function to cast a base object to its derived type. dynamic_cast()
performs a runtime check to verify the cast will succeed and then performs the operation. This additional runtime type verification is the performance overhead I hinted at earlier. If you know more about your object hierarchy during downcasting you might be able to save the performance overhead of dynamic_cast()
and substitute it with static_cast()
. static_cast()
doesn't perform the runtime check and there isn't guaranteed to be a safe operation. The benefit is that it is faster than dynamic_cast()
. If the downcast fails, it returns a NULL pointer to the calling code. You just need to add a check for NULL before using the newly cast pointer. Casting up, on the other hand, is always safe. A derived class object is always a Base class object. This casting is performed implicitly for you and doesn't require the use of code>dynamic_cast or any other explicit cast code.
Virtual functions allow C++ to implement late binding and therefore polymorphic object behavior. They can sometimes be a bit tricky to grasp, but once understood, they become a fundamental principle that C++ programmers rely on. Until next time...
- Gilemonster
Labels: C++
posted by Gilemonster @ 10:19 AM, , links to this post
Know Your Data Structures: Queues and Stacks
Tuesday, August 08, 2006
Stacks
A stack is a dynamic linear data structure where insertion and deletion of items is already predetermined. This basically means that we already know where items are added and deleted before we perform the operation itself. Here's how it works. Stacks only allow you to insert and delete items from the top (or tail of the stack). This logic flow is called LIFO ("Last In First Out") and means that the deletion of an item from the stack is always the item that was most recently added. A good analogy that I've read is a stack of dishes in a restaurant's buffet line. New plates are added at the top of stack and when a customer grabs one, the top-most plate is removed. Insert and delete operations are called push and pop respectively and both are performed in constant time O(1).
When implementing a stack data structure, there are two main boundary conditions you need to account for: underflow and overflow. An underflow occurs when an empty stack is "popped." This means that a delete operation was attempted on an empty stack. An overflow occurs when a full stack is "pushed." This means that an insert operation was attempted on a full stack. Below, I have provided some pseudocode showing how a simple stack structure works:
void Push(int newValue) { if (Stack isn't full (e.g. overflow)) { - Add item to top of stack - Increment your top index value } } void Pop(int & newValue) { if (Stack isn't empty (e.g. underflow)) { - Remove item from top of stack and assign to the newValue parameter - Decrement your top index value } }Queues
A queue is also a dynamic linear data structure with predetermined insertion and deletion operations. The main difference between a stack and queue is that a queue practices FIFO ("First In First Out"). This means that new items are added at the top of the queue (or tail) and are removed from the bottom (or head). A good analogy is your standard waiting line to get into the movie theater. You get in line at the end and wait until those customers before you have purchased tickets and walked away. Then you get to go next while the people who entered the line after you await their turn. Insert and delete operations are called enqueue and dequeue respectively and both are performed in constant time O(1) just like stacks.
Queues require slightly more work because we are required to keep up with both the beginning (the "head") and the ending (the "tail") of the structure in our implementations. With stacks, all we have to do is just make sure that the top-most element is within boundary limits. With queues, we are required to make sure that when an element is added, it goes in the tail's current location and when an item is removed, we have to make sure that the element in the head's current location is deleted. Also, the same boundary conditions in stacks exist for queues (e.g. underflows and overflows). Below, I have provided some pseudocode showing how a simple queue structure works:
void Enqueue(int newVal) { if (Queue isn't full (e.g. overflow)) { - Add your value to the tail of the queue - Increment your tail index (or wrap it) } } void Dequeue(int & newVal) { if (Queue isn't empty (e.g. underflow)) { - Remove item from queue and assign to the newVal parameter - Increment your head index (or wrap it) } }
Stacks and queues are simple and easy to use for certain programming projects (e.g. postfix/infix evaluations). The easiest implementation technique used for both is to use an array as the internal structure (my pseudocode examples use an array). There are more complex implementations but arrays are used more frequently because of the simplicity. As always, post comments if something isn't clear and I'll try to help out. If you'd like the full queue and stack class files (.h & .cpp) let me know and I'll provide a link to them. Until next time...
-Gilemonster
Labels: General
posted by Gilemonster @ 8:38 AM, , links to this post