A Curmudgeon in Redmond

Using and abusing software since 1966

System.WeakReference: Solution in Search of a Problem

.NET has a WeakReference class, whose semantics appear to have been borrowed from Java. As nearly as I can tell, they're both useless.

A WeakReference is an object that holds a reference to another object. However, it holds a special type of reference, that doesn't prevent the target from being reclaimed by the garbage collector. At first blush, this seems perfect for implementing a cache -- as long as the system doesn't need the memory, we've got the object; should the system need the memory, the object goes away and we transparently re-create it next time we need it. Unfortunately, this doesn't quite work.

First, a cache is only interesting to create in the first place if two things are true:

  • The objects are big enough that we can't keep them in memory indefinitely, and
  • They're sufficiently expensive to create that we can't re-create them every time we need them.

We probably implement our cache using some sort of a dictionary or hash table that maps from the object identity to the cached value. In C# terms, a Dictionary<string,WeakReference>. Now, when a garbage collection occurs, the target object goes away, but the entries in the dictionary remain. Over time, we'll end up with lots and lots of entries in the dictionary, each holding a WeakReference whose target is gone. This leads to the next constraint:

  • Caching using a dictionary of WeakReference's is useful only when the total population of the data subject to caching is small.

Otherwise, the dictionary will explode with useless entries.

The next problem is that not only does a WeakReference not prevent an object from being garbage-collected, it doesn't even suggest that the object shouldn't be. So the next time a garbage collection comes around, all of your otherwise-unreferenced objects are reclaimed. So your cache doesn't really hold objects until the memory is needed; it only holds objects until the next garbage-collection. This implies:

  • Caching using WeakReference's isn't appropriate unless the objects are cheap enough to be thrown away (and later re-created) on every garbage collection.

I would assert that the set of problems that meet all of the above criteria is empty. After all, we just said:

  • The objects are big,
  • They're expensive to create,
  • There aren't very many of them, and
  • They're cheap to create.

Now, it's possible to get around the "not many of them" restriction with some very ugly code involving finalizers that remove entries from the hash table. (I doubt that it would be performant, but it is possible.) But I don't see how you can possibly reconcile "expensive to create" and "cheap to create".

So, has anyone out there ever successfully used Weak Reference's? If you used it for a cache, do you have measurements of how much it helped performance? If you used it for something else, what was it?

Comments

Paul said:

I believe WeakReference's do have some real benefits (at least in Java). Java has the concept of ReferenceQueue's which you can poll to determine which references have been reclaimed by the collector. This allows you to easily clean the Map much more efficiently than possible with finalizers. In fact java has a built in WeakHashMap which does basically what you describe (it maps keys to WeakReferences of the values). Whenever an  item is added to the Map, the ReferenceQueue is polled and any reclaimed references are removed from the map.

Personally, i've used WeakHashMap-like structures for caching objects that may be accessed from many threads, but may be somewhat expensive to construct (such as parsing bytes or strings). In this case, the first thread to access it occurs some expense to construct the object and populate the cache. Further threads just get the pre-parsed object from the cache.

I don't have absolute numbers for performance, but we have done extensive profiling of our application, and we don't see significant overhead from the ReferenceQueue.

Java also has "phantom" references which can be useful for tracking object lifetimes (to do things like ensuring resources have been released)

# August 12, 2008 1:27 AM

jim said:

Paul,

It does sound like Java managed to solve the cleanup problem, effectively eliminating the "not very many" constraint.

If you ever get around to measuring it, I'd love to know whether and to what extent the cache actually helped.

-- jimbo

# August 12, 2008 10:00 PM

Erik said:

"So the next time a garbage collection comes around, all of your otherwise-unreferenced objects are reclaimed."

- From the C# documentation: "A weak reference allows the garbage collector to collect an object while still allowing an application to access the object."

Unless the VM is implemented poorly, a garbage collector shouldn't collect weak references at each collection but only when the allocated memory becomes full or when it needs to free more memory for other objects.

- "They're sufficiently expensive to create that we can't re-create them every time we need them." If you cache large graphs of objects, then the cache becomes quickly worth it. In all the apps I worked, this was very common

# August 13, 2008 8:04 PM

jim said:

Erik,

"Unless the VM is implemented poory, a garbage collector shouldn't weak references at each collection..."

Every GC I've ever encountered decides what to collect, without considering whether there are any weak references. By your definition, I guess it means that they're all poorly implemented.

-- jimbo

# August 13, 2008 9:44 PM