A place where programmers can discuss various programming topics and experiences.



Bitten by an STL Gotcha...again

I was recently using an STL map container and hit one of those "STL gotchas" that I had forgotten about.  I figured I would pass along what I found and the changes I made.  As everyone already knows, an STL map is a sorted associative container that stores key-value pairs. The existing code I had, which used a std::map, only ever contained unique key-value pairs. What I mean is that each pair was inserted into the map exactly once and no pair was ever updated by inserting the same key twice. When I went to change this code I realized I would need to change the values of existing items in the map. Here's some sample code to explain what I was doing:


typedef std::map<std::wstring, std::wstring> MyMap;
typedef std::pair<std::wstring, std::wstring> MyMapEntry;

// Insert a value with key "MyKey".
MyMap theMap;
theMap.insert(MyMapEntry(L"MyKey", L"MyValue1"));

// Update the value associated with key "MyKey."
theMap.insert(MyMapEntry(L"MyKey", L"MyValue2"));

When I ran this code and iterated through my map, I noticed that the value associated with key "MyKey" was still "MyValue1"!!! What the heck is going on? Well, the insert() routine is a little tricky here (evil if you ask me). If you are inserting a new item into the map, everything works fine. If you are attempting to overwrite an existing item, insert() sees that the key is already there and leaves the existing value alone. Nice... So I tried operator[] to update my map, and this worked. Here's what the second iteration (no pun intended) of the code looked like:


typedef std::map<std::wstring, std::wstring> MyMap;
typedef std::pair<std::wstring, std::wstring> MyMapEntry;

// Insert a value with key "MyKey".
MyMap theMap;
theMap[L"MyKey"] = L"MyValue1";

// Update the value associated with key "MyKey."
theMap[L"MyKey"] = L"MyValue2";

So, now that it is working I should just let it go, right? Wrong. I remembered a Scott Meyers article from a while back where he mentioned how each of the mechanisms for inserting/updating items in a map affects efficiency depending on how you use it. Here's the basic gist of things:

  • insert should only be used to insert new items into a map.
  • operator[] should only be used to update existing items.

But this is obviously a royal pain, because how do you know whether an item is already in a map without searching for it first? Surely it can't be more efficient to search for an item first and then decide how to insert/update it? This is where lower_bound comes into play. This routine returns an iterator to the first element in the container whose key is equal to or greater than a given key. Once you have an iterator to that point in the map, you can determine whether the key really exists by invoking key_comp. If the key matches, you know you've found your existing item and you can update it through the iterator. If the key doesn't match, you now have an iterator to where that item SHOULD be inserted into the map, so the "search" operation is not wasted after all: you can use this location "hint" to insert your new item. Meyers also mentions that insert runs in amortized constant time when given a correct hint. So here's how you would write the earlier example given what we know now:


typedef std::map<std::wstring, std::wstring> MyMap;
typedef std::pair<std::wstring, std::wstring> MyMapEntry;

// Insert a value with key "MyKey".
MyMap theMap;
theMap.insert(MyMapEntry(L"MyKey", L"MyValue1"));

// Assume you don't know whether an item has already been inserted
// after this point *grin*
MyMap::iterator iter = theMap.lower_bound(L"MyKey");
if ((theMap.end() != iter) && 
     !(theMap.key_comp()(L"MyKey", iter->first)))
{
    iter->second = L"MyValue2";
}
else
{
    theMap.insert(iter, MyMapEntry(L"MyKey", L"MyValue2"));
}

Now the example above is pretty simple, but when you have code that performs a number of insertions in numerous locations, it is easy to lose track of whether an item already exists in a map. You could write a template function to encapsulate the check (which is what I did). Until next time...

- Gilemonster

posted by Gilemonster @ 2:02 PM




ATL 7.0 String Conversion Classes

If you've ever written Win32 code that is compiled for both ANSI and Unicode, you've probably used the ATL 3.0 string conversion classes and their macros (e.g. W2A, A2W, T2A, A2T, etc.). They have been very useful but unfortunately have problems. Microsoft has alleviated a number of these issues in ATL version 7.0. This article gives a brief overview of those fixes and of how the use of these classes has improved in version 7.0.

The ATL 3.0 string conversion macros suffered from a handful of problems, all of which are addressed in version 7.0.

The main problem in ATL 3.0 relates to where converted strings are stored and when they are freed. All converted strings were stored on the stack, and they were not freed until the calling function returned. This means that if you had a routine that never returned (e.g. a separate "watch-dog" thread that runs until your application stops), your converted strings were never freed. This could put tremendous strain on a thread's stack, depending on how large the strings were and how often they were allocated. In version 7.0, ATL destructs the string when the conversion object goes out of scope. It also checks the size of the string, and if it is too large for the stack it stores the string on the heap. So small strings are stored on the stack, but large ones are allocated on the heap. Because the strings are destructed when they go out of scope, it is now safe to use the classes in loops: you know that when a loop iteration completes, the string will be destructed. This also makes them safe for use in exception handling code (e.g. catch(MyException &e)). Another nice improvement is the ability to leave that pesky USES_CONVERSION definition out of your code. It always annoyed me and I'm glad to see it go. :-)

Now that we've seen a quick overview of how the new classes are safer, let's look at how to use them because it is drastically different and if used like the older macro code you will get undefined results. If you want to use the new macros, you'll need to change your code. Below is the form of the macros that I stole from the MSDN:

CSourceType2[C]DestinationType[EX]

where:

  • SourceType and DestinationType are each one of: A (an ANSI string, char *), W (a Unicode string, wchar_t *), T (a generic string, TCHAR *), or OLE (an OLE string, equivalent to W).
  • C is present when the destination type must be constant (e.g. CW2CA yields a const char *).
  • EX denotes the underlying class template, which takes the size of its fixed internal buffer as a template argument; the forms without EX are typedefs that use the default buffer size.

Here are some simple examples of how to use the new macros. Note: I hate LPCSTR and LPCWSTR, so you'll always see me use char * and wchar_t * whenever I can (probably not a good practice though). :-)

// Figure 1:
// Convert a UNICODE string to ANSI.
void 
ConvertUnicodeToAnsi(wchar_t * pszWStr)
{
   // Create a local instance of the CW2AEX class and construct
   // it using a wchar_t *.
   // Note:  Here you will notice that I am using CW2A which is 
   // a typedef macro of the CW2AEX class.
   CW2A pszAStr(pszWStr);

   // Note: pszAStr will become invalid when it goes out of 
   // scope.  In this example, that is when the function
   // returns.
}
// Figure 2:
// How to use a temporary instance of the CA2WEX class.
void
UseTempConvertedString(char * pszAStr)
{
   // Create a temporary instance of the CA2WEX class
   // and use it as a parameter in a function call.
   SomeSampleFunction(CA2W(pszAStr));

   // Note the temporary instance created in the
   // above call is only valid in the SomeSampleFunction
   // body.  Once the function returns, the temporary
   // string is destructed and no longer valid.
}
// Figure 3:
// How NOT to use the conversion macros and classes.  This
// example uses the new classes but applied using the old
// programming style.
void
BadFunction(wchar_t * pszWStr)
{
   // Create a temporary instance of CW2A, save a 
   // pointer to it and then use it.
   char * pszAStr = CW2A(pszWStr);

   // The pszAStr variable in the following line is an invalid pointer,
   // as the instance of CW2A has gone out of scope.
   ExampleFunctionA(pszAStr);
}

Figures 1 and 2 are pretty straightforward, but Figure 3 deserves further discussion. In ATL 3.0, this is how we used the conversion classes. It should be noted that this code structure is no longer valid and will produce undefined results. Because of the new scoping of the conversion classes, the result of a CW2AEX constructor invocation cannot be saved through a raw pointer and used later as we might expect: the temporary object is destructed as soon as the statement completes. Figure 3 shows that the pszAStr variable does not contain a valid pointer even though it appears it should. If you need a converted string throughout the scope of a function, declare a local instance of the CW2AEX class on the stack and use the appropriate parameters during object construction (e.g. CW2A pszAStr(pszWStr);).

Specify a Custom Buffer Size
The default buffer size for the new ATL classes is 128 characters. If you need to change the default buffer size for certain types of conversions, use the EX macros and specify a new buffer size. This is defined as a C++ template. Here is an example:

// Figure 4:
// Specify a new buffer size with C++ template syntax.
void
UseCustomBufferSize(wchar_t * pszWStr)
{
   // Use a 16-character buffer.
   SomeFunction(CW2CAEX< 16 >(pszWStr));
}

The new ATL 7.0 string conversion classes are a much-needed improvement over their 3.0 siblings. Of course you don't have to change all your code to use them if you don't want to, but if you are concerned about application performance then you should consider updating your code. You will be able to use the classes in a number of places previously unavailable, which is pretty convenient, and you can remove your old "work-around" code because of the safety of the new classes. I plan on looking at my own code and estimating how much work it will take to upgrade my ATL usage to version 7.0. I might not be able to make the full change, but I am at least going to look at what the cost/benefit ratio is. And for new code I'll only use the new 7.0 classes. You should at least consider the same. Until next time...

- Gilemonster

posted by Gilemonster @ 12:10 PM




Exception Handling Warnings in Visual Studio

Have you ever seen this cryptic little C++ compiler warning before in Visual Studio 7.x (.NET 2003)?

cl: Command line warning D4025: overriding '/EHs' with '/EHa'

I've seen it a number of times but always ignored it because other articles have said that "the compiler knows better." Well, the compiler might know better but I want to know why it knows better. So let's dissect this warning.

In Visual Studio, you can specify the exception handling model in the Code Generation property page of a C++ project. What you are selecting is whether the compiler uses an asynchronous or a synchronous exception handling model. The asynchronous model is the older mechanism, in which the compiler assumes any instruction can generate an exception (e.g. a hardware exception). This significantly increases overall code size, because the compiler must emit mechanics for tracking the lifetime of every unwindable object. The synchronous model is newer and tells the compiler that exceptions can only be thrown by an explicit throw statement. Because the compiler can then assume that exceptions arise only at a throw statement or at a function call, much of that extra object lifetime tracking code can be omitted. The MSDN states that hardware exceptions can still be caught using the synchronous model, "... However, some of the unwindable objects in the function where the exception occurs may not get unwound, if the compiler judges their lifetime tracking mechanics to be unnecessary for the synchronous model."

So how in the world did you get the aforementioned compiler warning in the first place? The most common culprit is structured exception handling mixed into a C++ project.

Note: Structured exception translators (for the uninitiated) let C structured exceptions (which are always raised with type unsigned int, as opposed to C++ exceptions, which can be raised with any type) be wrapped in a C++ exception class and thereby given a type. This allows a matching C++ catch handler to catch a structured exception. If your C++ project mixes in C code that uses structured exception translators, you will have to use the asynchronous exception model, or the compiler will emit the above warning.

In VS 7.x, the inability to select the asynchronous exception model ('/EHa') in the project properties dialog was a documented bug. To work around it in 7.x, select "No" for the "Enable C++ Exceptions" property, then go to the "Command Line" property page and add "/EHa" to the "Additional options" section. This enables C++ exceptions (with the asynchronous model) in your project. The bug was fixed in version 8, where you can now set "Enable C++ Exceptions" to "Yes With SEH Exceptions (/EHa)."

For my particular project, I had some C code that used structured exception translators, so that was the reason for my warning. I therefore disabled the option above and added "/EHa" as an additional command line option. In general, the exception model you enable in your VS projects should always be considered thoroughly (it's required in my opinion), and you should never disregard warnings like this one (as I did). In this particular case the compiler was smart enough to fix the setting at build time, but that is not always the case. Know and understand your project settings and fix all warnings if possible. Until next time...

- Gilemonster

posted by Gilemonster @ 11:08 PM




Quick Observation on STL string Comparisons

I was looking through some code this week and I noticed a few lines of code that caught my eye. The lines of code were doing basic STL string comparison using the non-member overloaded operator== function. I remembered that the STL also provides an overloaded set of compare functions for strings and wondered why you would use the overloaded operator functions instead of compare. I did a bit of reading and found that there are reasons for using one over the other.

The easiest of the comparison mechanisms is obviously operator==. You can use it to compare string objects, quoted string literals, and traditional C-style string pointers. The nice thing is that it (and the other operator routines) doesn't have to create temporary string objects. Here's a simple example:

#include <string>
using namespace std;

std::wstring strFirstName = L"Billy";

if (L"Billy" == strFirstName)
{
    // Do something
}
else if (L"BILLY" == strFirstName)
{
    // Do something else
}
else if (L"BillyBoy" == strFirstName)
{
    // Do something else
}

The above sample shows the simplicity of using operator==. It should also show that its use is best limited to comparing simple strings or single characters. A value of true is only returned when the two strings match identically with respect to length and character codes (meaning "A" != "a").

The compare routines provide a mechanism for performing lexical comparisons and other more precise string comparisons. Instead of returning true or false, compare returns one of three kinds of value:

  • 0: the operand string is lexically equal to the parameter string.
  • a positive value: the operand string is greater than the parameter string.
  • a negative value: the operand string is less than the parameter string.

You can also use the other overloaded compare routines to further specify what your lexical queries compare. Here are some examples.

#include <string>
using namespace std;

wstring strOperand = L"Billy Bob",
        strParameter = L"Billy Bob's Barbecue";

// Standard use of compare
int intComparisonVal = strOperand.compare(strParameter);

// Further specification examples
std::wstring::size_type iOperandStartIndex = 0,
                        iNumOperandCharsToCompare = 5;

// The second example specifies the starting index of the operand
// string and how many characters in the operand string to compare.  
// So you can compare part of the operand string with the parameter
// string.
intComparisonVal = strOperand.compare(iOperandStartIndex,
                                      iNumOperandCharsToCompare,
                                      strParameter);

// The third example goes further by also specifying the starting 
// index of the parameter string and how many characters to compare 
// in the parameter string.  So you can specify part of the operand
// string with part of the parameter string.
std::wstring::size_type iParamStartIndex = 0,
                        iNumParamCharsToCompare = 5;
intComparisonVal = strOperand.compare(iOperandStartIndex,
                                      iNumOperandCharsToCompare,
                                      strParameter,
                                      iParamStartIndex,
                                      iNumParamCharsToCompare);

// My last example shows how to compare the operand string against
// a standard C-style string.
const wchar_t * strParameterCStyle = L"Billy Bob's Barbecue";
intComparisonVal = strOperand.compare(strParameterCStyle);

There are other examples that further demonstrate the remaining overloaded compare routines which I have left out for simplicity. And you can see that they give you the ability to lexically compare STL strings and C-style strings with greater precision than using the non-member operator==. But, when do you choose one over the other and what are the details? Here's what I have learned as a set of general guidelines concerning STL string comparisons:

  • When doing simple string comparisons where you want to exclude any unneeded overhead, use the operator routines. They first compare the operand and parameter string lengths, and only if those are equal do they perform a character-by-character comparison.
  • When you need more precise comparisons, use the compare routines as desired. You can compare substrings which is more useful than an "all or nothing" string comparison where applicable.

So, after reading all this mess I went back and examined the code mentioned at the beginning of this article. The code was written correctly and the string comparison was being used correctly. Anyway, I thought this was a quick and simple topic. Until next time...

- Gilemonster

posted by Gilemonster @ 1:18 PM




Know Your C++: Virtual Functions

Polymorphism is one of the three principles (along with encapsulation and inheritance) that give a language the ability to support object-oriented programming. In C++ it is implemented via virtual functions. Before we can understand virtual functions, we must first learn about function call binding.

When writing code, we frequently write function declarations and definitions. In order for the code to build, the compiler and linker must match each function call with the address of the associated function body. This process is known as binding. When the compiler and linker are able to match the two before code execution, it is called early binding. Because polymorphism requires binding at runtime, C++ must provide an additional mechanism to support it. Matching a function call with its associated body at runtime is called late binding, and C++ uses virtual functions to support it.

When a function has the keyword virtual prepended to its declaration, the C++ compiler knows that binding for that function will occur at runtime. Since the compiler and linker know that a function body address will not necessarily be matched with an associated call at compile/link time, the compiler creates a unique VTABLE for each object type that contains virtual functions. This table contains the addresses of each virtual function body for that type. Along with the VTABLE, a vpointer (VPTR) is added to each object that is instantiated; it points to the appropriate VTABLE for that object's type. So, for each object you create (that contains virtual functions), a vpointer is created that points to the correct VTABLE, and that VTABLE contains the addresses of all the virtual function bodies for that type. At runtime, you are therefore guaranteed that your code will always have an associated function body for each function call. Still with me? Let's look at an example to clarify the concept.

class Base
{
public:
    Base();
    virtual ~Base();  // virtual so deleting through a Base * is safe

    void PrintStuff();

    // Virtual function declaration
    virtual long MorphMePlease();
};

// Virtual function definition
long
Base::MorphMePlease()
{
    return 0;
}

class Derived : public Base
{
public:
    Derived();
    ~Derived();

    // Derived implements its own version of the
    // virtual function by using the same function
    // signature; repeating the keyword virtual is optional.
    long MorphMePlease();
};

// Derived function definition that overrides the Base class'
// virtual function.
long
Derived::MorphMePlease()
{
    return 1;
}

// A non-class member function.
long 
SomeNonMemberFunction(Base * ptrObj)
{
    // If a Base pointer is passed to this function,
    // Base::MorphMePlease() is called.  If a Derived
    // pointer is passed to this function, Derived::MorphMePlease()
    // is called.
    return (ptrObj->MorphMePlease());
}

In the above example you see that our Base class implements a virtual function called MorphMePlease() and that our Derived class overrides that function with its own implementation. Depending on what you pass into the non-member function SomeNonMemberFunction() either the Base or the Derived class' function can be called. How is that? When a pointer to a Derived object is passed in, it is implicitly upcast to a Base pointer. Even though the cast is performed, the object's vpointer is unchanged, and therefore still points to the VTABLE of Derived function addresses. This type of functionality demonstrates the ability to change functional behavior based on object type at runtime. This is the polymorphism we mentioned earlier.

So what happens with pure virtual functions? If, by definition, they have no default implementation, how does the compiler treat them? Well, as mentioned earlier, the compiler must guarantee that every function call is associated with a corresponding function body address. If this were not guaranteed, we'd have all kinds of runtime errors whenever a function call was invoked that had no body (imagine the madness...). When the compiler sees that an object type has a pure virtual function, it reserves a row in the VTABLE for the associated function body but doesn't actually insert an address (because there isn't one). Of course, this makes the VTABLE for that object type incomplete. Since the table is incomplete, the compiler cannot guarantee safe execution of that object's code, so it will emit an error if an object of that type is instantiated anywhere in the compiled code. Because you cannot actually instantiate this object type, it becomes what is known as an abstract class type. Let's take a look at our previous example using pure virtual functions instead.

class Base
{
public:
    Base();
    virtual ~Base();  // virtual so deleting through a Base * is safe

    void PrintStuff();

    // Pure virtual functions require that a 
    // derived class define the function body.
    virtual long MorphMePlease() = 0;
};

class Derived : public Base
{
public:
    Derived();
    ~Derived();

    // Derived class defines the function body
    // and is therefore allowed to be instantiated.
    long MorphMePlease();
};

// Derived function definition that defines the 
// Base class' pure virtual function
long
Derived::MorphMePlease()
{
    return 1;
}

In our updated example above, notice that the Base::MorphMePlease() declaration is assigned zero. That "= 0" signals to the compiler that the function is pure virtual (and therefore makes the class abstract). You will also notice that no Base::MorphMePlease() definition is provided. If you tried to provide one inline alongside the "= 0", the compiler would barf and say you need to make up your mind between virtual and pure virtual functions (paraphrasing, of course *grin*). (Strictly speaking, C++ does permit a separate out-of-line definition for a pure virtual function, which derived classes can invoke explicitly, but that's a corner case.) If you were to remove the declaration and definition of Derived::MorphMePlease(), the Derived class would also become abstract and therefore not instantiable, because the Derived type's VTABLE would be incomplete as well.

Now that virtual functions and call binding have been explained, there are a few more points worth mentioning. First, the initialization of the VTABLE and vpointer structures is performed during object construction. The compiler provides this hidden code for you in your constructor, so you don't have to write it. Second, be careful about how you cast objects up and down the inheritance tree. Casting down, i.e. casting from a base object to a derived object, is not guaranteed to work and carries additional performance overhead, because there is no compile-time information which guarantees the downcast will succeed; an explicit cast is required. It is safest to use dynamic_cast to cast a base object to its derived type. dynamic_cast performs a runtime check to verify the cast will succeed and then performs the operation. This additional runtime type verification is the performance overhead I hinted at earlier. If you know more about your object hierarchy during downcasting, you may be able to avoid that overhead by substituting static_cast. static_cast doesn't perform the runtime check and is therefore not guaranteed to be a safe operation, but the benefit is that it is faster than dynamic_cast. If a dynamic_cast on a pointer fails, it returns a null pointer (a dynamic_cast on a reference throws a std::bad_cast exception instead); if a static_cast is wrong, you simply get undefined behavior.