ATL 7.0 String Conversion Classes
Sunday, December 17, 2006
If you've ever written Win32 code that is compiled for both ANSI and Unicode, you've probably used the ATL 3.0 string conversion macros (e.g. W2A, A2W, T2A, A2T). They have been very useful but unfortunately have problems. Microsoft has addressed a number of these issues in ATL version 7.0. This article gives a brief overview of those fixes and of how string conversion has improved in version 7.0.
ATL 3.0 string conversion had the following problems, which are fixed in version 7.0:
- Usage not safe in loops.
- Usage not safe in exception catch blocks.
- Requires USES_CONVERSION to be defined.
- Large strings stored on the stack, where space is limited.
The main reason ATL 3.0 had these issues relates to where strings are stored and when they are freed. All converted strings were stored on the stack, and they were not freed until the calling function returned. This means that if you had a routine that never returned (e.g. a separate "watch-dog" thread that runs until your application stops), your converted strings were never freed. Depending on how large the strings are and how often they are converted, this could put tremendous strain on a thread's stack.
In version 7.0, the conversion object destructs the string when it goes out of scope. It also checks the size of the string, and if it is too large for the object's fixed-size internal buffer, it stores the string on the heap. So small strings are stored on the stack, but large ones are allocated on the heap. Because the strings are destructed when they go out of scope, it is now safe to use the classes in loops: when a loop iteration completes, the string is destructed. This also makes them safe for use in exception handling code (e.g. a catch(MyException &e) block). Another nice improvement is the ability to leave that pesky USES_CONVERSION definition out of your code. It always annoyed me and I'm glad to see it go. :-)
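To make the stack-versus-heap behavior concrete, here is a minimal, portable sketch of the idea. It is not ATL's code: NarrowSketch, usesHeap, and totalLength are hypothetical names invented for illustration, and the "conversion" simply truncates each wchar_t, which is only correct for ASCII input. The real classes perform a proper code page conversion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <memory>

// Hypothetical stand-in for an ATL 7.0-style conversion class (not ATL code).
// Narrows by truncating each wchar_t, so it is only correct for ASCII input.
template <std::size_t N = 128>
class NarrowSketch {
public:
    explicit NarrowSketch(const wchar_t* src) {
        assert(src != nullptr);
        const std::size_t len = std::wcslen(src) + 1;  // include the terminator
        if (len > N) {
            // Too large for the in-object buffer: fall back to the heap.
            heap_.reset(new char[len]);
            ptr_ = heap_.get();
        }
        for (std::size_t i = 0; i < len; ++i)
            ptr_[i] = static_cast<char>(src[i]);
    }
    NarrowSketch(const NarrowSketch&) = delete;
    NarrowSketch& operator=(const NarrowSketch&) = delete;

    // Lets the object be passed wherever a const char* is expected.
    operator const char*() const { return ptr_; }
    bool usesHeap() const { return heap_ != nullptr; }

private:
    char buffer_[N];                 // lives inside the object, so on the stack for locals
    std::unique_ptr<char[]> heap_;   // released automatically by the destructor
    char* ptr_ = buffer_;
};

// Each iteration constructs and destroys its own converter, so no storage
// accumulates across iterations.
std::size_t totalLength(const wchar_t* const* items, std::size_t count) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        NarrowSketch<> narrow(items[i]);
        total += std::strlen(narrow);
    }  // narrow is destroyed here, once per iteration
    return total;
}
```

The point of the design is that cleanup is tied to the object's scope rather than to the enclosing function's return, which is what makes loops and catch blocks safe.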
Now that we've seen a quick overview of how the new classes are safer, let's look at how to use them. The usage is drastically different, and if you use them like the older macros you will get undefined results, so if you want to use the new classes you'll need to change your code. Below is the form of the class names, which I stole from MSDN:
    CSourceType2[C]DestinationType[EX]
where:
- [C] is present when the destination type must be constant.
- [EX] is present when the initial size of the buffer must be specified as a template argument.
- SourceType/DestinationType can be the following:
  - A: ANSI character string
  - W: UNICODE character string
  - T: Generic character string (determined at compile time)
  - OLE: OLE character string (equivalent to W)
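As a quick way to read the naming pattern above, the following sketch assembles a class name from its parts. conversionClassName is a hypothetical helper written only for this illustration; it is not part of ATL, and it does not check that the resulting name actually exists in the library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a name from the pattern
// C<SourceType>2[C]<DestinationType>[EX]. Purely illustrative.
std::string conversionClassName(const std::string& source,
                                const std::string& destination,
                                bool constDestination = false,
                                bool explicitBufferSize = false) {
    assert(!source.empty() && !destination.empty());
    std::string name = "C" + source + "2";
    if (constDestination) name += "C";     // [C]: destination must be constant
    name += destination;
    if (explicitBufferSize) name += "EX";  // [EX]: buffer size is a template argument
    return name;
}
```

For example, a source of W and a destination of A gives CW2A, and asking for a constant destination with an explicit buffer size gives CW2CAEX.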
Here are some simple examples of how to use the new classes. Note: I hate LPCSTR and LPCWSTR, so you'll always see me use char * and wchar_t * whenever I can (probably not a good practice though). :-)
    // Figure 1:
    // Convert a UNICODE string to ANSI.
    void ConvertUnicodeToAnsi(wchar_t * pszWStr)
    {
        // Create a local instance of the CW2AEX class and construct
        // it using a wchar_t *.
        // Note: Here you will notice that I am using CW2A, which is
        // a typedef for the CW2AEX class with the default buffer size.
        CW2A pszAStr(pszWStr);

        // Note: pszAStr will become invalid when it goes out of
        // scope. In this example, that is when the function
        // returns.
    }
    // Figure 2:
    // How to use a temporary instance of the CA2WEX class.
    void UseTempConvertedString(char * pszAStr)
    {
        // Create a temporary instance of the CA2WEX class
        // and use it as a parameter in a function call.
        SomeSampleFunction(CA2W(pszAStr));

        // Note the temporary instance created in the
        // above call is only valid in the SomeSampleFunction
        // body. Once the function returns, the temporary
        // string is destructed and no longer valid.
    }
    // Figure 3:
    // How NOT to use the conversion macros and classes. This
    // example uses the new classes but applied using the old
    // programming style.
    void BadFunction(wchar_t * pszWStr)
    {
        // Create a temporary instance of CW2A, save a
        // pointer to its string and then use it.
        char * pszAStr = CW2A(pszWStr);

        // The pszAStr variable in the following line is an invalid pointer,
        // as the temporary CW2A instance was destructed at the end of the
        // statement above.
        ExampleFunctionA(pszAStr);
    }
Figures 1 and 2 are pretty straightforward, but Figure 3 should be discussed further. In ATL 3.0, this is how we used the conversion macros. This code structure is no longer valid and will produce undefined results. The expression CW2A(pszWStr) creates a temporary object, and that temporary is destructed at the end of the statement that created it, taking its string buffer with it. So in Figure 3 the pszAStr variable does not contain a valid pointer even though it appears it should. If you need a converted string throughout the scope of a function, declare a local instance of the CW2AEX class on the stack and pass the source string to its constructor (e.g. CW2A pszAStr(pszWStr);).
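The lifetime problem in Figure 3 can be observed with a small portable sketch. TrackedText and the three functions below are hypothetical names made up for this illustration (nothing here is ATL); the class converts nothing and only counts how many instances are alive, standing in for a conversion object that owns its buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a conversion class; it only tracks lifetime.
class TrackedText {
public:
    explicit TrackedText(const char* text) : text_((assert(text != nullptr), text)) {
        ++liveCount;
    }
    ~TrackedText() { --liveCount; }
    TrackedText(const TrackedText&) = delete;
    TrackedText& operator=(const TrackedText&) = delete;

    operator const char*() const { return text_.c_str(); }

    static int liveCount;  // number of instances currently alive

private:
    std::string text_;
};
int TrackedText::liveCount = 0;

// Mirrors Figure 3: the temporary is gone as soon as the declaration ends,
// so the saved pointer must never be dereferenced.
int liveAfterSavingPointer() {
    const char* dangling = TrackedText("hello");
    (void)dangling;
    return TrackedText::liveCount;  // no instance is alive at this point
}

// Safe: the copy is made while the temporary still exists.
std::string copyWhileAlive() {
    return std::string(static_cast<const char*>(TrackedText("hello")));
}

// Safe: a named local lives until the end of the enclosing scope.
int liveWithNamedLocal() {
    TrackedText named("hello");
    return TrackedText::liveCount;  // the named instance is still alive here
}
```

The two safe variants correspond to the options discussed above: either keep a named instance for as long as the string is needed, or copy the converted text into storage you own before the temporary goes away.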
Specify a Custom Buffer Size
The default buffer size for the new ATL classes is 128 characters. If you need to change the default buffer size for certain types of conversions, use the EX versions of the classes and specify a new buffer size. The size is passed as a C++ template argument. Here is an example:
    // Figure 4:
    // Specify a new buffer size with C++ template syntax.
    void UseCustomBufferSize(wchar_t * pszWStr)
    {
        // Use a 16-character buffer.
        SomeFunction(CW2CAEX< 16 >(pszWStr));
    }
The new ATL 7.0 string conversion classes are a much needed improvement over their 3.0 siblings. Of course, you don't have to change all your code to use them if you don't want to. If you are concerned about application performance, then you should consider updating your code. You will be able to use the classes in a number of places where the old macros were unsafe, and that is pretty convenient. You can also remove your old "work-around" code because of the safety of the new classes. I plan on looking at my own code and estimating how much work it will take to upgrade my ATL usage to version 7.0. I might not be able to make the full change, but I am at least going to look at the cost/benefit ratio. And for new code I'll only use the new 7.0 classes. You should at least consider the same. Until next time...
- Gilemonster
posted by Gilemonster @ 12:10 PM,