A place where programmers can discuss various programming topics and experiences.



ATL 7.0 String Conversion Classes

If you've ever written Win32 code that is compiled for both ANSI and Unicode, you've probably used the ATL 3.0 string conversion macros (e.g. W2A, A2W, T2A, A2T, etc.). They have been very useful but unfortunately have problems. Microsoft has addressed a number of these issues in ATL version 7.0. This article gives a brief overview of those fixes and of how string conversion has improved in version 7.0.

ATL 3.0 string conversion had several problems that are fixed in version 7.0.

The main reason ATL 3.0 had issues relates to where strings are stored and when they are freed. All converted strings were stored on the stack and were not freed until the calling function returned. This means that if you had a routine that never returned (e.g. a separate "watch-dog" thread that runs until your application stops), your converted strings were never freed. Depending on how large the strings were and how often they were converted, this could put tremendous strain on a thread's stack. In version 7.0, the converted string is destroyed when the conversion object goes out of scope. The class also checks the size of the string, and if it is too large for the object's internal buffer it stores the string on the heap. So small strings are stored on the stack and large ones are allocated on the heap. Because the strings are destroyed when they go out of scope, it is now safe to use the classes in loops: when a loop iteration completes, the string from that iteration is destroyed. This also makes them safe for use in exception handling code (e.g. catch(MyException &e)). Another nice improvement is the ability to leave that pesky USES_CONVERSION definition out of your code. It always annoyed me and I'm glad to see it go. :-)
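To make the loop point concrete, here is a minimal sketch in portable C++. Everything in it is an assumption for illustration: SketchW2A is a made-up stand-in and not the real CW2A, it always allocates on the heap, its "conversion" is a toy that only keeps ASCII, and the live-instance counter exists only so the cleanup can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical stand-in for an ATL 7.0 style conversion class.
// This is NOT the real CW2A; it only mimics the scope-bound cleanup.
class SketchW2A {
public:
    explicit SketchW2A(const wchar_t* src) : m_psz(nullptr) {
        assert(src != nullptr);
        const std::size_t count = std::wcslen(src) + 1;  // include the terminator
        m_psz = new char[count];
        for (std::size_t i = 0; i < count; ++i) {
            // Toy conversion: keep ASCII, replace anything else with '?'.
            const unsigned long c = static_cast<unsigned long>(src[i]);
            m_psz[i] = (c < 128) ? static_cast<char>(c) : '?';
        }
        ++s_live;
    }
    ~SketchW2A() {
        delete[] m_psz;  // the buffer is released as soon as the object dies
        --s_live;
    }
    SketchW2A(const SketchW2A&) = delete;
    SketchW2A& operator=(const SketchW2A&) = delete;

    operator char*() const { return m_psz; }
    static int live() { return s_live; }

private:
    char* m_psz;
    static int s_live;  // instrumentation only, not part of any real API
};

int SketchW2A::s_live = 0;

// Converts a string once per iteration and reports the largest number of
// conversion buffers that were alive at the same time.
int peakLiveBuffersInLoop(int iterations) {
    int peak = 0;
    for (int i = 0; i < iterations; ++i) {
        SketchW2A text(L"watch-dog tick");  // constructed every iteration
        if (SketchW2A::live() > peak) {
            peak = SketchW2A::live();
        }
    }  // destructor runs here at the end of every iteration
    return peak;
}
```

Calling peakLiveBuffersInLoop with any positive iteration count should never see more than one live buffer, which is the behavior that makes a never-returning watch-dog loop safe. With the stack-allocating 3.0 macros, each iteration would instead have added another string to the stack.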

Now that we've seen a quick overview of how the new classes are safer, let's look at how to use them. The usage is drastically different, and if you use the new classes the way you used the older macros you will get undefined results, so you'll need to change your code. Below is the form of the class names that I stole from MSDN:

CSourceType2[C]DestinationType[EX]

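where:

SourceType/DestinationType is one of:
   A - ANSI character string
   W - Unicode character string
   T - Generic character string (equivalent to W when _UNICODE is defined, equivalent to A otherwise)
   OLE - OLE character string (equivalent to W)
[C] is present when DestinationType must be constant.
[EX] is present when the initial size of the buffer must be specified as a template argument.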

Here are some simple examples of how to use the new classes. Note: I hate LPCSTR and LPCWSTR so you'll always see me use char * and wchar_t * whenever I can (probably not a good practice though). :-)

// Figure 1:
// Convert a UNICODE string to ANSI.
void 
ConvertUnicodeToAnsi(wchar_t * pszWStr)
{
   // Create a local instance of the CW2AEX class and construct
   // it using a wchar_t *.
   // Note:  Here you will notice that I am using CW2A, which is 
   // a typedef of the CW2AEX class with its default buffer size.
   CW2A pszAStr(pszWStr);

   // Note: pszAStr will become invalid when it goes out of 
   // scope.  In this example, that is when the function
   // returns.
}

// Figure 2:
// How to use a temporary instance of the CA2WEX class.
void
UseTempConvertedString(char * pszAStr)
{
   // Create a temporary instance of the CA2WEX class
   // and use it as a parameter in a function call.
   SomeSampleFunction(CA2W(pszAStr));

   // Note the temporary instance created in the
   // above call is only valid in the SomeSampleFunction
   // body.  Once the function returns, the temporary
   // string is destructed and no longer valid.
}

// Figure 3:
// How NOT to use the conversion macros and classes.  This
// example uses the new classes but applied using the old
// programming style.
void
BadFunction(wchar_t * pszWStr)
{
   // Create a temporary instance of CW2A, save a 
   // pointer to it and then use it.
   char * pszAStr = CW2A(pszWStr);

   // The pszAStr variable in the following line is an invalid pointer,
   // as the temporary instance of CW2A was destroyed at the end of
   // the previous statement.
   ExampleFunctionA(pszAStr);
}

Figures 1 and 2 are pretty straightforward, but Figure 3 should be discussed further. This is how we used the conversion macros in ATL 3.0. With the new classes this code structure is no longer valid and will produce undefined results. CW2A(pszWStr) creates a temporary object that is destroyed at the end of the statement, so the pointer saved in pszAStr no longer points to a valid string by the time it is passed to ExampleFunctionA, even though it appears it should. If you need a converted string throughout the scope of a function, declare a local instance of the CW2AEX class on the stack and pass the source string to its constructor (e.g. CW2A pszAStr(pszWStr);).
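The three usage patterns can be compared safely without ever touching a dangling pointer. The sketch below is portable C++ built on assumptions: SketchW2A is a made-up stand-in for CW2A (heap-only, toy ASCII conversion), and the helper function names and the live-instance counter are hypothetical instrumentation, not part of ATL.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical stand-in for CW2A; NOT the real ATL class.
class SketchW2A {
public:
    explicit SketchW2A(const wchar_t* src) : m_psz(nullptr) {
        assert(src != nullptr);
        const std::size_t count = std::wcslen(src) + 1;  // include the terminator
        m_psz = new char[count];
        for (std::size_t i = 0; i < count; ++i) {
            // Toy conversion: keep ASCII, replace anything else with '?'.
            const unsigned long c = static_cast<unsigned long>(src[i]);
            m_psz[i] = (c < 128) ? static_cast<char>(c) : '?';
        }
        ++s_live;
    }
    ~SketchW2A() {
        delete[] m_psz;
        --s_live;
    }
    SketchW2A(const SketchW2A&) = delete;
    SketchW2A& operator=(const SketchW2A&) = delete;

    operator char*() const { return m_psz; }
    static int live() { return s_live; }

private:
    char* m_psz;
    static int s_live;  // instrumentation only
};

int SketchW2A::s_live = 0;

// Reports how many converter objects are alive while the callee runs.
int liveDuringCall(const char* text) {
    return (text != nullptr) ? SketchW2A::live() : -1;
}

// Figure 1 style: a named local keeps the string alive for the whole scope.
int liveWithNamedObject() {
    SketchW2A named(L"abc");
    return liveDuringCall(named);
}

// Figure 2 style: a temporary argument lives until the call has returned.
int liveWithTemporaryArgument() {
    return liveDuringCall(SketchW2A(L"abc"));
}

// Figure 3 style: the temporary is destroyed at the ';', so nothing owns
// the buffer any more by the time the pointer would be used.
int liveAfterBadPattern() {
    const char* dangling = SketchW2A(L"abc");
    (void)dangling;  // never dereferenced: that would be undefined behavior
    return SketchW2A::live();
}
```

In this sketch the first two helpers report one live converter while the callee runs, and the third reports zero, which is exactly why the saved pointer in Figure 3 cannot be trusted.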

Specify a Custom Buffer Size
The default buffer size for the new ATL classes is 128 characters. If you need a different buffer size for certain types of conversions, use the EX version of the class and specify the new size as a C++ template argument. Here is an example:

// Figure 4:
// Specify a new buffer size with C++ template syntax.
void
UseCustomBufferSize(wchar_t * pszWStr)
{
   // Use a 16-character buffer.
   SomeFunction(CW2CAEX< 16 >(pszWStr));
}
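To show what the template argument actually controls, here is a rough sketch of a fixed internal buffer with a heap fallback, in portable C++. It is an assumption-laden illustration rather than ATL's implementation: SketchW2AEX and usesHeap are made-up names, the conversion is a toy that only keeps ASCII, and the only thing borrowed from the article is the default size of 128.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical stand-in for CW2AEX<N>; NOT the real ATL class.
template <std::size_t BufferLength = 128>
class SketchW2AEX {
public:
    explicit SketchW2AEX(const wchar_t* src) : m_psz(m_fixed) {
        assert(src != nullptr);
        const std::size_t needed = std::wcslen(src) + 1;  // include the terminator
        if (needed > BufferLength) {
            m_psz = new char[needed];  // too big for the internal buffer
        }
        for (std::size_t i = 0; i < needed; ++i) {
            // Toy conversion: keep ASCII, replace anything else with '?'.
            const unsigned long c = static_cast<unsigned long>(src[i]);
            m_psz[i] = (c < 128) ? static_cast<char>(c) : '?';
        }
    }
    ~SketchW2AEX() {
        if (m_psz != m_fixed) {
            delete[] m_psz;  // only heap storage needs an explicit release
        }
    }
    SketchW2AEX(const SketchW2AEX&) = delete;
    SketchW2AEX& operator=(const SketchW2AEX&) = delete;

    operator char*() const { return m_psz; }

    // Instrumentation for this sketch only: did the string spill to the heap?
    bool usesHeap() const { return m_psz != m_fixed; }

private:
    char* m_psz;                 // points at m_fixed or at a heap block
    char m_fixed[BufferLength];  // lives inside the object, so on the stack for locals
};
```

With this sketch, SketchW2AEX<16>(L"short") stays in the internal buffer, while a string longer than 15 characters spills to the heap unless a larger size (or the default) is used. Picking a size close to your typical string length is the trade-off: too small and you pay for heap allocations, too large and every conversion object takes more stack.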

The new ATL 7.0 string conversion classes are a much needed improvement over their 3.0 siblings. Of course you don't have to change all your code to use them if you don't want to. If you are concerned about application performance then you should consider updating your code. You will be able to use the classes in a number of places where the old macros were unsafe (loops and catch blocks, for example), and that is pretty convenient. You can also remove your old "work-around" code because of the safety of the new classes. I plan on looking at my own code and estimating how much it will take to upgrade my ATL usage to version 7.0. I might not be able to make the full change but I am at least going to look at what the cost/benefit ratio is. And for new code I'll only use the new 7.0 classes. You should at least consider the same. Until next time...

- Gilemonster

posted by Gilemonster @ 12:10 PM