Strategies for Distributed Garbage Collection
The design of garbage collectors must trade programming
simplicity with features as:
- Overhead in network traffic.
- Fault tolerance.
- Support of object migration.
- Liveness vs. efficiency.
Reference counting
- Like Birrel's DGC, but concrete objects keep a counter of the remote
processes referencing that object.
- Dirty calls increments the counter and clean calls decrement it.
- When the counter reaches zero, the object can be recycled (if not
referenced locally).
- Generates lot of packet traffic on the network.
- Not fault tolerant: if a process dies, the object will never
be collected.
Weighted reference count
- Use weights associated with remote references.
- The initial weight for the concrete objects is a constant value N.
- Sending a reference from P to Q divides the reference weight stored
for P and Q.
- The sum of all weights is N.
- The clean call return the weight of the deleted reference to the
concrete object.
- The concrete object may be recycled when its weight is N.
- No dirty calls.
Advantage: less network traffic.
Disadvantages:
- What if the weight of a reference reaches 1?
- Not fault tolerant: what if a process dies?
Indirect Reference Count (Piquer)
- Like reference counting but the stubs (object descriptors) keeps
also a counter.
- Each stub keeps a field indicating from wich process was received
the reference the first time (its father).
- Each time the owner of o sends a reference to Q, P increments
the counter stored in o.
- Each time a process Q keeping a stub r for o, sends a reference to
R, Q increments the counter stored in r.
- When a process Q' detects that a stub r is no more reachable locally
and its counter is 0, it sends a clean call to its father and
recycles the stub.
- When the owner detects that the counter is 0, it may recycle the
object.
Advantages:
- No dirty calls: less network traffic.
- No weight underflow.
- Supports object migration.
Disadvantages:
Reference Sets (Shapiro)
- Like IRC but instead of keeping counters, keeps reference sets
of spaces.
- Processes in the reference sets can be contacted to ask them
if they still keep the reference.
- If a process seems to be crashed, it is removed from the client set.
- When a process P, has executed a remote invocation r.M(), the owner
of the object referenced by r becomes the father of r. Q sends a clean
call to its old father.
Advantage: fault tolerant.
Birrels Garbage Collector
- Shapiro's DGC, but simplifies programming by eliminating the tree
structure.
- Reintroduces dirty calls (synchronous).
- Clean calls are asynchronous.
Disadvantage:
- A process Q receiving an array of remote references will
send dirty calls sequentially, delaying a remote invocations by
several times the latency of the network.
Distributed Mark & Sweep
- Based on the concurrent mark & sweep of Dijkstra.
- The number of computers that can be implied in the marking phase is not
bound (can expands to all the computers in the Internet).
- Termination detection of the marking phase is hard.
- Distributed garbage collection is very tied with the local gargabe
collector.
Conclusions
- Indirect garbage collection is more efficient avoiding the
latency of the network.
- Reference sets are mandatory to implement fault tolerance.
- The simplicity of Birrel's DGC explain its success.
- You don't need to implement complete new languages to proof
new ideas.
- Future systems could mix (i) reference sets for quick collection
of non cyclic data structures, and (ii) unfrequent mark & sweep
to collect cycles.
- New programming languages should consider distributed objects
from their origins.