A place where programmers can discuss various programming topics and experiences.



.NET Generics and Native Code Sharing

Sorry for the break in articles, but work has been extremely time consuming. Hopefully I can find some time to write more. Anyway, now for the goodies. If you are like me, you have been playing ".NET catch-up", trying to read as many version 2.0/3.0 .NET articles as quickly as possible. I have recently been reading up on Generics in MSDN's C# Programming Reference, and they are a very exciting addition to the .NET Framework. Specifically, I want to talk about how Generics reduce code bloat through native code sharing. I'll start with a quick overview of Generics, then cover how they are compiled, and finally how code sharing is provided at runtime.

Generics who?
Generics is a language extension that allows programmers to parameterize the use of types. It allows classes, interfaces, structs, delegates, and methods to parameterize the data they manipulate and store (if you've ever used C++ templates the syntax will appear familiar). When declaring the constructs just mentioned, you simply replace the type of object being manipulated with what is called a type parameter. You are also allowed to specify more than one type parameter if needed. Here is a simple example of a generic class in C#:

using System;
using System.Text;
using System.Collections.Generic;

namespace WriteBetterCode
{
    public class MySimpleStack<T>
    {
        private int m_StackSize = 0;
        private int m_StackTopIndex = 0;
        private T[] m_StackItems;

        public MySimpleStack(int stackSize)
        {
            // The stack must be able to hold at least one item.
            if (stackSize > 0)
            {
                m_StackSize = stackSize;
                m_StackItems = new T[m_StackSize];
            }
            else
            {
                throw new InvalidOperationException(
                    "MySimpleStack must have a size greater " +
                    "than zero!!!");
            }
        }

        public T Pop()
        {
            m_StackTopIndex--;
            if(m_StackTopIndex >= 0)
            {
                return m_StackItems[m_StackTopIndex];
            }
            else
            {
                m_StackTopIndex = 0;
                throw new InvalidOperationException(
                    "You popped an empty MySimpleStack!!!");
            }
        }

        public void Push(T newStackItem)
        {
            if(m_StackTopIndex >= m_StackSize)
            {
                throw new InvalidOperationException(
                    "The MySimpleStack is full!!!");
            }
            
            m_StackItems[m_StackTopIndex] = newStackItem;
            m_StackTopIndex++;
        }
    }

    public class TestMySimpleStack
    {
        static void Main()
        {
            try
            {
                MySimpleStack<int> stackOfInts = 
                    new MySimpleStack<int>(5);

                stackOfInts.Push(10);
                stackOfInts.Push(15);
                int intValue = stackOfInts.Pop();
            }
            catch (InvalidOperationException e)
            {
                Console.WriteLine(e.Message);
            }
        }
    }
}
// Figure 1 : MySimpleStack<T> generic class declaration 
//            and instantiation
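
Classes are not the only construct that can be parameterized. As a minimal sketch of the other cases mentioned above, here is a struct with two type parameters, a generic delegate, and a generic method. All of the names (MyPair, MyConverter, GenericSamples) are made up for illustration and are not part of the framework:

```csharp
using System;

namespace WriteBetterCode
{
    // Hypothetical struct with two type parameters.
    public struct MyPair<TFirst, TSecond>
    {
        public TFirst First;
        public TSecond Second;

        public MyPair(TFirst first, TSecond second)
        {
            First = first;
            Second = second;
        }
    }

    // Hypothetical generic delegate: converts a TInput into a TResult.
    public delegate TResult MyConverter<TInput, TResult>(TInput input);

    public static class GenericSamples
    {
        // Generic method: T is normally inferred from the arguments.
        public static void Swap<T>(ref T left, ref T right)
        {
            T temp = left;
            left = right;
            right = temp;
        }

        // Generic method with two type parameters; returns the pair reversed.
        public static MyPair<TSecond, TFirst> Flip<TFirst, TSecond>(
            MyPair<TFirst, TSecond> pair)
        {
            return new MyPair<TSecond, TFirst>(pair.Second, pair.First);
        }

        // Uses the generic delegate closed over int and string.
        public static string Describe(int value)
        {
            MyConverter<int, string> toText =
                delegate(int i) { return "#" + i; };
            return toText(value);
        }
    }

    public class TestGenericSamples
    {
        static void Main()
        {
            int a = 1, b = 2;
            GenericSamples.Swap(ref a, ref b);

            MyPair<string, int> flipped =
                GenericSamples.Flip(new MyPair<int, string>(7, "seven"));

            Console.WriteLine("{0} {1}", a, b);
            Console.WriteLine("{0} {1}", flipped.First, flipped.Second);
            Console.WriteLine(GenericSamples.Describe(5));
        }
    }
}
```

Note that the type arguments for Swap and Flip are never written out at the call sites. The compiler infers them from the arguments.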

As you can see in the above example, the MySimpleStack class uses the type parameter T to specify that it is a generic type. In order to instantiate the generic type, you must replace that identifier with a type argument (e.g. int in the above example). The resulting type is called a constructed type (e.g. MySimpleStack<int>), of which there are two types:

  • Closed constructed type: a constructed type that does not contain any type parameters (e.g. MySimpleStack<int> stackOfInts).
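  • Open constructed type: a constructed type that contains at least one type parameter (e.g. MyOpenType<int, U> stackOfOpenTypes, where U is a type parameter of the enclosing generic declaration). Open constructed types can only appear inside generic code, since that is the only place a type parameter is in scope.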
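
The open/closed distinction can also be inspected at runtime through reflection. The following is only a sketch: MyOpenTypeHolder<U> is a hypothetical class that exists purely to get hold of an open constructed type, while Type.ContainsGenericParameters and Type.IsGenericTypeDefinition are the real reflection properties being queried:

```csharp
using System;
using System.Collections.Generic;

namespace WriteBetterCode
{
    // Hypothetical generic class used only to obtain an open constructed type.
    public class MyOpenTypeHolder<U>
    {
        // Dictionary<int, U> is open: U is still an unbound type parameter.
        public Dictionary<int, U> Items;
    }

    public static class ConstructedTypeDemo
    {
        // True when the type still contains at least one type parameter.
        public static bool IsOpen(Type type)
        {
            return type.ContainsGenericParameters;
        }

        // Returns the open constructed type Dictionary<int, U>.
        public static Type OpenFieldType()
        {
            return typeof(MyOpenTypeHolder<>).GetField("Items").FieldType;
        }
    }

    public class TestConstructedTypes
    {
        static void Main()
        {
            Type closed = typeof(Dictionary<int, string>);
            Type open = ConstructedTypeDemo.OpenFieldType();

            Console.WriteLine(ConstructedTypeDemo.IsOpen(closed)); // False
            Console.WriteLine(ConstructedTypeDemo.IsOpen(open));   // True

            // An open constructed type is not the same thing as the
            // unconstructed generic type definition Dictionary<,>.
            Console.WriteLine(open.IsGenericTypeDefinition);       // False
        }
    }
}
```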

The .NET Framework also provides a number of built-in generic types for you. You can use those types in your code by referencing the System.Collections.Generic namespace (the MSDN documentation for that namespace has the full list of generic types provided by the framework). One of the built-in generic types is the Queue<T> class. Here is an example of how to use it:

using System;
using System.Text;
using System.Collections.Generic;

namespace WriteBetterCode 
{
    public class TestQueue
    {
        static void Main()
        {
            Queue<int> myQueue = new Queue<int>();
            myQueue.Enqueue(5);
            int queuedVal = myQueue.Dequeue();
        }
    }
}

So how is generic code compiled? At compile time, the generic type is converted into IL and metadata, which contain additional information specifying that a type parameter is being used (this was a required update to the CLR in version 2.0 of the framework so that Generics could be supported). Other than that, a generic type is compiled just like any other type.
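
Because the type parameter survives compilation as metadata, you can see it through reflection. This is a small sketch (the GenericMetadataDemo class is hypothetical) that inspects the framework's own Queue<T>:

```csharp
using System;
using System.Collections.Generic;

namespace WriteBetterCode
{
    public static class GenericMetadataDemo
    {
        // Name of the type parameter recorded in the metadata of Queue<T>.
        public static string TypeParameterName()
        {
            Type[] typeArgs = typeof(Queue<>).GetGenericArguments();
            return typeArgs[0].Name;
        }

        // On the generic type definition, T is still a parameter,
        // not a concrete type.
        public static bool DefinitionHasUnboundParameter()
        {
            return typeof(Queue<>).GetGenericArguments()[0].IsGenericParameter;
        }

        // A closed constructed type points back at the one shared definition.
        public static bool ClosedTypeUsesSameDefinition()
        {
            return typeof(Queue<int>).GetGenericTypeDefinition()
                == typeof(Queue<>);
        }
    }

    public class TestGenericMetadata
    {
        static void Main()
        {
            Console.WriteLine(GenericMetadataDemo.TypeParameterName());
            Console.WriteLine(GenericMetadataDemo.DefinitionHasUnboundParameter());
            Console.WriteLine(GenericMetadataDemo.ClosedTypeUsesSameDefinition());
        }
    }
}
```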

Native Code Sharing
Now we can get into the interesting stuff: native code sharing. This is when two or more “compatible” method instantiations point to the same x86 code. What is shared is the method code (e.g. Queue<int>.Enqueue(int)) that operates on a particular type, not the instances themselves (e.g. myQueue). The instances are still independent, as expected.

Native code sharing is provided at runtime by the JIT compiler, and it behaves differently for value type and reference type arguments. When the JIT is fed a type argument that is a value type, it replaces the type parameters in the IL with the actual value type and then generates the native code. So your runtime-generated type becomes a collection where the values themselves are stored directly in the collection, with no boxing. The first time the JIT compiler is fed a type argument that is a reference type, it replaces the type parameters in the IL with Object and then compiles that into native code. Native code sharing comes into play when the JIT is given a type argument for which it has already generated native code.

When the JIT is fed a value type argument for which it has already generated native code (an exact value type match), a reference to that native code is returned. It can do this because the JIT compiler keeps track of previously generated value-type-specific code. Note that native code sharing does not apply across value types in general: different value types do not share a single implementation. Reference types, on the other hand, do share a single implementation of the JIT-generated native code. Since Object replaces the type argument in the IL code, this code can be reused for any further reference type arguments (because all references are the same size).
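
The sharing itself is not something you can observe directly from C#, so the sketch below only illustrates the flip side: even when reference type instantiations share native method code, each constructed type keeps its own identity and its own static state. InstanceCounter<T> and SharingDemo are hypothetical names made up for this example:

```csharp
using System;

namespace WriteBetterCode
{
    // Hypothetical generic class with a static field.
    public class InstanceCounter<T>
    {
        // Statics belong to the constructed type, so there is one
        // counter per type argument, not one for all of them.
        public static int Created = 0;

        public InstanceCounter()
        {
            Created++;
        }
    }

    public static class SharingDemo
    {
        // Creates two string counters, one object counter and one int counter.
        public static void CreateSome()
        {
            new InstanceCounter<string>();
            new InstanceCounter<string>();
            new InstanceCounter<object>();
            new InstanceCounter<int>();
        }
    }

    public class TestSharing
    {
        static void Main()
        {
            SharingDemo.CreateSome();

            Console.WriteLine(InstanceCounter<string>.Created);
            Console.WriteLine(InstanceCounter<object>.Created);
            Console.WriteLine(InstanceCounter<int>.Created);

            // The two reference type instantiations are still distinct types.
            Console.WriteLine(
                typeof(InstanceCounter<string>) == typeof(InstanceCounter<object>));
        }
    }
}
```

In other words, any sharing that the JIT does is an implementation detail. The program still sees InstanceCounter<string> and InstanceCounter<object> as unrelated types with separate counters.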

So you might be wondering why a single implementation is shared for reference types, but not for value types. The reasoning for the reference type implementation is straightforward: the size of a reference is always the same for any object, and therefore the generated native code (that operates on the reference) can be shared. Value types don't have the luxury of always being the same size. Even if two value types are the same size, you still can't share native code because the operations on different value types are not always the same. The only way to make all types share a single implementation would be to box/unbox all value types at runtime. The performance hit from that boxing would undermine one of the main benefits of generic types, so that approach was intentionally left out of the .NET Framework 2.0 implementation. But if you have two constructed types whose type arguments are an exact value type match, they do share their method implementations at the native code level. Here's a quick example:

using System;
using System.Text;
using System.Collections.Generic;

namespace WriteBetterCode 
{
    public class TestQueue
    {
        static void Main()
        {
            Queue<int> myQueue1 = new Queue<int>();
            Queue<int> myQueue2 = new Queue<int>();
            
            myQueue1.Enqueue(5);
            myQueue2.Enqueue(10);
            int queuedVal = myQueue1.Dequeue();
        }
    }
}

The two Queue<int> variables above share the method implementations generated by the JIT compiler (e.g. Enqueue(int)).
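
To make the boxing cost mentioned earlier a little more concrete, here is a rough sketch that contrasts the non-generic ArrayList (which stores every int as a boxed Object) with List<int> (which stores the ints directly). BoxingDemo is a hypothetical class, and the sketch only shows where boxing happens, not how expensive it is:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

namespace WriteBetterCode
{
    public static class BoxingDemo
    {
        // Non-generic collection: each int is boxed on the way in
        // and must be unboxed with a cast on the way out.
        public static int SumBoxed(int count)
        {
            ArrayList items = new ArrayList();
            for (int i = 1; i <= count; i++)
            {
                items.Add(i);          // boxes i into a new object
            }

            int sum = 0;
            foreach (object item in items)
            {
                sum += (int)item;      // unboxes
            }
            return sum;
        }

        // Generic collection: ints are stored as ints, no boxing or casts.
        public static int SumGeneric(int count)
        {
            List<int> items = new List<int>();
            for (int i = 1; i <= count; i++)
            {
                items.Add(i);
            }

            int sum = 0;
            foreach (int item in items)
            {
                sum += item;
            }
            return sum;
        }

        // Boxing the same value twice produces two separate heap objects.
        public static bool BoxesAreDistinctObjects()
        {
            int value = 5;
            object first = value;
            object second = value;
            return !object.ReferenceEquals(first, second);
        }
    }

    public class TestBoxing
    {
        static void Main()
        {
            Console.WriteLine(BoxingDemo.SumBoxed(4));               // 10
            Console.WriteLine(BoxingDemo.SumGeneric(4));             // 10
            Console.WriteLine(BoxingDemo.BoxesAreDistinctObjects()); // True
        }
    }
}
```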

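C++ templates do not get this kind of sharing from the language itself. Templates are expanded at compile time, so every distinct instantiation gets its own copy of the method code. For example, vector<Foo*> and vector<Bar*> are compiled separately even though both only store pointers (some toolchains can fold identical functions afterwards, but that is an optimization, not a guarantee). Two vector<int> objects do share the vector<int> code, just as the two Queue<int> variables above share theirs. The per-type copies are the code bloat that I was talking about earlier. I'm sure you can imagine a scenario where your own STL applications could have a much smaller memory footprint if native code sharing was supported.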

- Gilemonster

posted by Gilemonster @ 6:25 PM