Every object has a unique OID, so I just borrowed the concept of a UUID (Universally Unique ID) from a draft RFC Microsoft submitted to the IETF a year or so ago (yes, I did just use the terms "RFC" and "Microsoft" in the same phrase) -- though I modified Goo's UUID/OID to remove Windows-centric timestamp behavior and make some of the math easier for me. An OID is roughly based on the machine's ethernet address, the current time in milliseconds, and a counter.
Since we're not using proper UUID's, there's no reason not to redefine the OID completely and scrap the use of UUIDs. Therefore Greg suggested creating a GWID (Goo World ID) which is unique to each machine running Goo. This would likely be a 32 or 64-bit hash of the ethernet address, current time, and some other machine-dependent info. It's important that the GWID is unique for each machine.
The OID can then be redefined to be composed of the GWID, timestamp, and counter. With a proper table of (hostname, GWID) pairs, you could then always identify the creator of an object: extract the GWID from the OID, and lookup the hostname for that GWID. The OID would of course still be unique.
However, NPCs that wander over the entire gameboard, and objects carried by players who are wandering across machines -- these things will end up "living" on a machine that didn't create them. For example, an NPC that enters a portal and moves from Bob's gameboard to Jim's gameboard will end up being serialized, and transferred from Bob's goo engine to Jim's goo engine.
When this happens, the "creator" machine for the object will record the "forwarding address" (GWID) -- it will be responsible for keeping track of where its objects have wandered off to. When we want to find an object, we can extract the GWID of the creator from the OID of the object and contact that host. The creator host will then direct us to the current owner, and we can talk to the owner.
We should keep a rather long-life cache of this information, so that we're not always bugging the creator host when it's not currently owning an object -- if the object moves, and we talk to the old owner, it will just shrug and say "I don't own that object". In that case, we clear the cache entry, and go back to the creator host and ask for the owner once again. Naturally, when an object is moved from one machine to another, the creator host must be updated about who's holding it.
If the creator machine is down, but some other machine currently owns it, the object is reachable only to people who have the owner information cached. For everyone else, the object has effectively popped out of existence. Attempts to reach the object will timeout (or instantly fail). In this case, the goo engine should cache the object in a "blacklist" with exponential backoff. For example, after the first failed attempt, blacklist the object for 5 minutes. If the object is accessed by goo code after 5 minutes, try again, and if it fails again, blacklist for 10 minutes, and so on. Eventually the object will return or the blacklist timeout will reach some magic length (a week?) where we are effectively not even checking anymore -- the object is dead.
When goo code tries to access an object that can't be reached in this way, we call it AWOL (from US Army term: Absent Without Leave). After the timeout, the goo engine will trigger an exception (ObjectAWOL) on the calling code. Smart code can catch this exception and either continue normally, or take evasive action. If the exception isn't caught, the goo code snippet will exit and the player will see a rude error message.
If the creator machine is alive and well, but the owner machine is down, a similar situation occurs. We get the forwarding address from the creator, but have trouble contacting the owner. Pretty much everything else happens as described above. The difference here is that if the owner machine is down, nobody can access the object. If only the creator machine is down, some people can access the object while it's still in their cache, but most people can't. Eventually the GWID of the creator will disappear from the (hostname, GWID) table, and the owner machine should do a periodic scan of the database to extinguish any orphaned objects like that -- this check wouldn't need to be done very often, and you might be able to get away with forgetting to do it at all, though you'd have a chance of collecting unusable cruft objects.
Classes are a weird case. Whenever an object is owned by a machine, that machine will need to have a copy of the class. Possibly it can just make a copy of the class, and then periodically check the creator machine to see if it needs to get an update of the class definition. The authoritative class definition should always remain on the creator machine for that class, with other machines just using temporary copies -- that is, class definitions don't move around like objects. If the creator machine goes down, each machine that owns an object of that class will still have a copy of the class definition, so things should be able to hobble along. (Class definitions have an OID just like objects, and that OID will of course have the GWID in it also.)
Greg had the idea that you could migrate players, and then just recursively migrate any objects that the player had references to -- but you run into issues of when to stop. For example, Robey and Scott might have walkie-talkies that they use to communicate with each other. When Robey walks to a different host, his walkie-talkie goes with him. But his walkie-talkie has a reference to Scott's, and moving Scott's walkie-talkie doesn't make much sense. Other cases get even more complex in an ever-downward spiral.
Greg's II proposal is "lazy migration". As players and NPCs move around, no objects migrate at first. But as I keep using my flashlight, it eventually behooves the flashlight's owner to migrate it over to my host. The easiest threshold I could think of is N consecutive RPC calls to the object (easy for goo to track, since any RPC call from another host [or the local host] would reset the count). Note that this means some objects might never migrate, either because the player/NPC isn't using it much, or because other pieces of the game are using it just as much. For example, my walkie-talkie might never migrate if Scott talks to me about as often as I talk to him, since Scott's host will be referring to my walkie-talkie whenever Scott talks, and my host will be referring to it whenever I talk.
Object migration should definitely be initiated by the game (not by the goo engine), so the game can use a little bit of "smarts" or "sanity checking" on object migration. It could, at the very least, make sure that players and rooms never migrated unless it wanted to, and it could make sure objects in a player's possession (or a room's possession) were migrated to that player or room (if we think that's a good idea). The ambiguous cases could be handled through the lazy migration scheme.
Since the game should be able to track down all local-host objects by traversing its map of rooms and player list, a traversal-type garbage collector should mostly work. It will easily detect objects that are obviously still in use, and we can set those aside as "in use" and concentrate on the rest -- the "questionable" pile.
The "questionable" pile is questionable because of the corkscrew that a distributed game adds: Two objects might appear to be a closed loop, but they may be referenced by (living) objects on another host. So we need another phase to figure this out. I think this can be done with a quadratic algorithm that sort of linearly searches each item in the "questionable" pile and determines how many inter-references there are. For example, if an object has a refcount of 3, but there are only 2 objects in the "questionable" pile that have references to it, then one reference is from off-site. We mark these objects as "nimby" (explained below) and traverse their references, moving those chains over to the "in use" pile. Anything left over is a local closed-loop (all refcounts are accounted for locally) and can be discarded.
"Nimby" objects are thus objects that are "in use" by some other host, but not used by anything locally. Therefore the local host has no need for them, and wants to get rid of them as soon as possible. It has no way of knowing which remote host holds a reference to this object chain, but the next time an RPC comes in for one of these objects, goo will see that it's a nimby object, and it will proceed to evict the object chain (ie, that object, and a recusive descent of all references), and migrate it to the remote host that sent the RPC. That's why they're called "nimby" -- they're marked for eviction as soon as we can find a victim to migrate them to, because we definitely don't want them.
[ In the 80s and 90s in America, garbage disposal became a big problem. Dumps and landfills (and toxic waste dumps) were rapidly filling up, but it was hard to open new ones, because nobody wanted a bunch of trash dumped in their city/state/etc -- "Not In My BackYard" was used to refer to this game of trash hot potato, later abbreviated to NIMBY. ]A possible loophole is that you could have a closed loop (objects that only refer to each other, but aren't referred to by anything active in the game) that spans more than one host. Maybe an object on A that has a reference to (and is referred to by) an object on B. They each have a refcount of 1, with no local references, meaning they appear to be "in use" remotely, so they're each marked nimby -- but they'll never come into play. In this case, the garbage collector should do one final step. If it's doing garbage collection, and it finds a nimby object which is already marked as a nimby then it should decide (based on a random number) whether to do a no-op RPC on any objects that the nimby refers to. If it decides to, that RPC may trigger remote nimby objects to be evicted and migrated over to the local host, and on the next round of garbage collection, the whole thing will appear as a local closed loop, and get thrown away.