As Raymond Chen so aptly explained, the purpose of garbage collection is to simulate a computer with an infinite amount of memory. This is another piece of the puzzle that gives application developers freedom through abstraction. Freedom to focus on the logic and structure of their code without being unnecessarily concerned about the constraints of the system it is running on, including managing physical and virtual memory. Unfortunately, this seems to result in some gross misuse of system resources.
So that you do not fall into the same fallacies and abuse, let’s learn some more about memory management in .NET. This article is a high-level overview of Garbage Collection (GC) in version 4.5 of the .NET Framework. More articles will follow that go into more depth on some of the different parts of the .NET Garbage Collector.
Memory Management?
The Garbage Collector (GC) in .NET can be thought of as the subsystem that handles allocating and de-allocating managed memory. Ever managed memory by hand in C?
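    #include <stdlib.h>

    int main(void) {
        char *name = malloc(50);  /* allocate 50 bytes by hand */
        doSomething(name);        /* placeholder for real work */
        free(name);               /* ...and release them by hand */
        return 0;
    }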
Or in C++?
    int main() {
        int* values = new int[10];  // allocate with new[]
        doSomething(values);        // placeholder for real work
        delete[] values;            // must be paired with delete[]
        return 0;
    }
With GC, now we don’t need to worry about freeing the memory we use.
    static void Main()
    {
        var items = new List<string>();
        doSomething(items);
        // no need to "free" or "delete"
    }
So, that is GC. Use whatever variables you want, then forget about them. The GC will take care of it, right? Something doesn’t sound right about that… let’s keep going.
When Memory Is Allocated
Unless noted otherwise, when I talk about “memory”, I am referring to virtual memory. If I am referring to any other kind of memory, I’ll say so.
When a .NET application starts, the CLR initializes a managed heap for the application. This heap is shared throughout the entire process. Segments of memory are added to the heap by the CLR as needed, when virtual memory is requested that is larger than what is already available on the heap.
Here is a step-by-step view of how virtual memory is allocated in the managed heap:
- The application starts, and the CLR initializes a managed heap that consists of one segment.
- As the application runs, memory is allocated for more and more objects and variables.
- When a request is made for memory and there is not enough space in the current heap segments, the CLR reserves another one.
This continues as long as memory is being allocated for the application. If that is where it stopped, it would be hugely wasteful. Most of those variables are no longer reachable by the code and will never be used again, and the system is on a one-way track to running out of memory. Keep reading…
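To watch the managed heap grow as described above, here is a minimal sketch using GC.GetTotalMemory. The class and method names (HeapGrowthDemo, MeasureGrowth) are my own for illustration, and the number it reports is only an approximation that will vary by machine and runtime.

```csharp
using System;
using System.Collections.Generic;

public static class HeapGrowthDemo
{
    // Allocates `count` arrays of `size` bytes, keeps them reachable, and
    // returns approximately how many bytes the managed heap grew by.
    public static long MeasureGrowth(int count, int size)
    {
        long before = GC.GetTotalMemory(false); // false: do not force a collection first
        var kept = new List<byte[]>(count);
        for (int i = 0; i < count; i++)
            kept.Add(new byte[size]);
        long after = GC.GetTotalMemory(false);
        GC.KeepAlive(kept); // the arrays stay reachable until after the measurement
        return after - before;
    }

    public static void Main()
    {
        long grown = MeasureGrowth(10000, 1024);
        Console.WriteLine("Managed heap grew by roughly " + grown + " bytes");
    }
}
```

Because the list holds on to every array, none of that memory can be reclaimed while the method runs, which is exactly the situation where the CLR has to keep finding more room.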
The Triggers for Garbage Collection
So, in a managed application, you don’t release any memory manually. As you are reading this you should be asking yourself, “so when does the GC run?” Good question! I’m glad you asked!
Many people and articles will tell you that the garbage collector is always running. This is not strictly true. While your application is running and its threads are allocating memory, the CLR is always watching for certain conditions that will trigger a garbage collection, but memory is not being collected constantly. When any of these conditions is true, garbage collection will occur:
- The host system is low on physical memory
- Allocated memory on the managed heap exceeds an “acceptable” threshold
- GC.Collect() is called
So, how often does garbage collection actually happen? I dunno. Laugh if you want, but that’s the real answer. When you hear about garbage collection being non-deterministic, this is what people are referring to: the application writer cannot determine when garbage collection will run. Sure, you can call GC.Collect() to force it to run, but it can also run on its own without your knowledge.
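You can at least observe collections after the fact. The sketch below uses GC.CollectionCount to compare how many generation 0 collections happened on their own during a burst of allocations with what happens after an explicit GC.Collect(). The class and method names are hypothetical, and the first number depends entirely on your machine and runtime settings, which is the point.

```csharp
using System;

public static class GcTriggerDemo
{
    // Allocates many short-lived arrays and returns how many generation 0
    // collections the runtime decided to perform while doing so.
    public static int CountAutomaticCollections(int allocations)
    {
        int before = GC.CollectionCount(0);
        for (int i = 0; i < allocations; i++)
        {
            byte[] garbage = new byte[1024]; // unreachable after each iteration
            garbage[0] = 1;
        }
        return GC.CollectionCount(0) - before;
    }

    // Forces a collection and returns how far the generation 0 count moved.
    public static int CountAfterForcedCollection()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Automatic gen 0 collections: " + CountAutomaticCollections(1000000));
        Console.WriteLine("Collections after GC.Collect(): " + CountAfterForcedCollection());
    }
}
```

Run it a few times: the forced call always registers at least one collection, while the automatic count is whatever the CLR decided was appropriate on that run.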
The Collection Process
When GC is triggered, all user threads are suspended (read up on server, workstation, background and non-concurrent GC to learn when this may not be true), and then the garbage collector is given the reins. Once the garbage collector has control, the process consists of three phases: Marking, Relocating and Compacting.
Marking Phase
The marking phase begins by collecting the list of all “garbage collection roots”. These are the objects and references that are directly referenced by the process and app domains: static items, globals, the finalizer queue, and the local variables and parameters on the call stack.
From there, the GC walks all references from all of the GC roots. Every item it finds on the managed heap gets added to a list of “live” objects. Once an object has been processed for references, it is skipped every time it is encountered again. This means circular references between objects are allowed, and they will not trap or slow down the GC.
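The walk described above can be sketched as a simple graph traversal. This is a toy model, not the CLR's actual implementation: Node, MarkSketch and Mark are hypothetical names, and a real collector works on raw memory rather than on a class like this.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a heap object: a name plus outgoing references.
public sealed class Node
{
    public readonly string Name;
    public readonly List<Node> References = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class MarkSketch
{
    // Walks every reference reachable from the roots and returns the "live" set.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var live = new HashSet<Node>();
        var pending = new Stack<Node>(roots);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            if (!live.Add(current))
                continue; // already marked: skipping it is what makes cycles safe
            foreach (Node child in current.References)
                pending.Push(child);
        }
        return live;
    }

    public static void Main()
    {
        var a = new Node("a");
        var b = new Node("b");
        var orphan = new Node("orphan");
        a.References.Add(b);
        b.References.Add(a);      // circular reference between a and b
        orphan.References.Add(a); // orphan points at a, but nothing points at orphan

        HashSet<Node> live = Mark(new[] { a });
        Console.WriteLine(live.Contains(b));      // True
        Console.WriteLine(live.Contains(orphan)); // False
    }
}
```

Note that the cycle between a and b terminates cleanly, and that orphan is not live even though it holds a reference to a live object. Only references leading from a root count.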
Relocating and Compacting Phases
In order to simplify this post and hopefully reduce some confusion, I am going to describe the relocating and compacting phases together. For my purposes here, there is no need to discuss them separately.
The purpose of these phases is to reclaim memory and maintain (or improve) the performance of the garbage collector. They do this by scanning the heap for unused memory (based on the new graph of “live” objects), freeing unused memory, moving live objects closer together, and moving survivors towards or into older memory segments.
Since I am a visual thinker, here is the basic process with pictures…
All of the “dead” objects are freed.
All remaining objects are moved together to maximize memory availability.
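The two steps above can be sketched with a toy heap, where each array slot holds one object and null means free space. This is an illustration under my own assumptions (CompactSketch and Compact are hypothetical names), and it ignores the real work of updating every reference to a moved object.

```csharp
using System;
using System.Collections.Generic;

public static class CompactSketch
{
    // heap: one entry per slot, with null meaning free space.
    // live: names of the objects that survived marking.
    // Returns the number of survivors, which end up packed at the front.
    public static int Compact(string[] heap, ISet<string> live)
    {
        int next = 0; // next free slot at the front of the heap
        for (int i = 0; i < heap.Length; i++)
        {
            string obj = heap[i];
            if (obj == null || !live.Contains(obj))
                continue;       // free slot or dead object: nothing to keep
            heap[next++] = obj; // slide the survivor toward the front
        }
        for (int i = next; i < heap.Length; i++)
            heap[i] = null;     // everything past the survivors is now free
        return next;
    }

    public static void Main()
    {
        string[] heap = { "A", "B", "C", "D", "E" };
        var live = new HashSet<string> { "A", "C", "E" };
        int used = Compact(heap, live);
        // Prints: A,C,E | free slots: 2
        Console.WriteLine(string.Join(",", heap, 0, used) + " | free slots: " + (heap.Length - used));
    }
}
```

After compaction all of the free space is contiguous at the end, so the next allocation is just a matter of bumping a pointer rather than searching for a gap.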
Now is the time to note that the Large Object Heap does not get compacted. Large objects tend to be longer-lived, which means there is less value in compacting them. They are also more expensive to move, so compacting them would adversely affect the performance of the GC.
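If you want to see where an allocation lands, GC.GetGeneration is a quick check. By default, objects of 85,000 bytes or more are allocated on the Large Object Heap, which is reported as generation 2. The sketch below uses hypothetical names (LohDemo, GenerationOfNewArray) and assumes that default threshold.

```csharp
using System;

public static class LohDemo
{
    // Allocates a byte array of the given size and reports which generation
    // the GC says it lives in immediately after allocation.
    public static int GenerationOfNewArray(int sizeInBytes)
    {
        byte[] buffer = new byte[sizeInBytes];
        return GC.GetGeneration(buffer);
    }

    public static void Main()
    {
        Console.WriteLine("100 bytes: generation " + GenerationOfNewArray(100));
        Console.WriteLine("100,000 bytes: generation " + GenerationOfNewArray(100000));
    }
}
```

The small array normally starts out in generation 0, while the large one is treated as generation 2 from the moment it is created, so it is only examined during full collections.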
Next Steps
This article is just an overview of the .NET garbage collection process. This will essentially serve as a springboard for my next series of posts, where I will dive into more depth on various aspects of GC. My ultimate goal is to help you produce applications that are responsible consumers of system resources.
Very informative and thought-provoking article.
Thanks for sharing with us.
But I still have a question. When does the GC run? How many times a day does it run? Can this be administered using any tool?
There’s no set time that GC will run, and there’s no tool to control it the way you are asking.
While you can’t (and normally shouldn’t) try to control when GC runs, there are some minimal configuration options you can set that affect its behavior:
gcAllowVeryLargeObjects: http://msdn.microsoft.com/en-us/library/hh285054(v=vs.110).aspx
gcConcurrent: http://msdn.microsoft.com/en-us/library/yhwwzef8(v=vs.110).aspx
GCCpuGroup: http://msdn.microsoft.com/en-us/library/hh925566(v=vs.110).aspx
gcServer: http://msdn.microsoft.com/en-us/library/ms229357(v=vs.110).aspx
There are very few situations where you will want to change the normal behavior of the GC. Why do you need to control when it runs?
Thanks Patrick for your reply.
The reason I need to control GC is that on one of our production servers, the environment hangs 4 to 5 times a day, and it stays hung for around 2 minutes. None of the users can do anything. They just have to wait for 2 minutes and log in again.
That may or may not be a GC problem. Here are some troubleshooting tips and tools that might help:
ASP.NET Performance Troubleshooting: http://msdn.microsoft.com/en-us/library/bb398859(v=vs.100).ASPX
Garbage Collection Notifications: http://msdn.microsoft.com/en-us/library/cc713687.aspx
more: http://www.abhisheksur.com/2010/08/garbage-collection-notifications-in-net.html
An older blog post about GC blocking threads: http://blogs.msdn.com/b/abhinaba/archive/2009/09/02/netcf-gc-and-thread-blocking.aspx