Persistence design details

Congrats on the Spritely Goblins v0.13.0 release! 🙂

After reading the blog post and accompanying docs, I’m still a bit unsure about the design of the persistence system.

The blog post says:

Goblins’ persistence system takes a different approach by using a technique which qualifies as “manual persistence”, but in reality Goblins has also provided tooling which makes most actors “just work”

Goblins’ persistence system was also designed so that only the information which needs to be saved is saved.

Parts of this wording (the second quote in particular) make it sound like the persistence system is somehow trying to detect the minimal set of object state needed to efficiently save … but the docs and use of the word “manual” make it seem more like the object ultimately defines its own portrait, and that’s the data that is saved on every state change.

Thanks for everyone’s work on this project! If there are more detailed docs I should consult somewhere, please do pass them along.


Hello, welcome to the forum @jryans!

Those are great questions. You’re right that, taken together, that language is confusing, so I’ve updated it. It is indeed the author of the object’s definition who determines how it is stored to disk. The reason that Goblins’ persistence mostly “just works” is that define-actor’s default persistence approach is to store the arguments which are passed in at construction time. But the blogpost language wasn’t sufficiently clear about this! define-actor is a convenience, and even it can define different self-portraits with the #:portrait keyword or a different restoration procedure with the #:restore keyword. An object might need less than what was given at construction time (e.g. certain arguments can be modified, used once, and/or possibly discarded during construction), but it may also need more if, for instance, the object constructs new data during construction beyond what it was given. (Notably, this latter pattern is used heavily in Terminal Phase and is as yet under-documented… a blogpost to come soon will show off this and other patterns we have discovered too.)
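To make the "less than" and "more than" cases concrete, here is a rough conceptual sketch in Python. It is not the Goblins API: the `Counter` and `Session` classes and the `self_portrait`/`restore` method names are hypothetical stand-ins for what define-actor and its #:portrait and #:restore keywords arrange.

```python
# Conceptual sketch only: a Python stand-in, not the Goblins API.
# All class and method names here are hypothetical.


class Counter:
    """Default style: the portrait is simply the constructor arguments."""

    def __init__(self, count):
        self.count = count

    def self_portrait(self):
        # Save exactly what the constructor was given.
        return (self.count,)

    @classmethod
    def restore(cls, *portrait):
        # Rebuild by calling the constructor with the saved arguments.
        return cls(*portrait)


class Session:
    """Custom style: the constructor derives data it was not given."""

    def __init__(self, seed, token=None):
        # `seed` is used once to derive the token and then discarded,
        # so the portrait must carry the derived token instead.
        self.token = token if token is not None else f"tok-{seed * 7}"

    def self_portrait(self):
        return (self.token,)

    @classmethod
    def restore(cls, token):
        # Bypass the derivation step and reuse the saved token.
        return cls(seed=None, token=token)
```

In the `Session` case the saved data differs from the constructor arguments, which is why a custom portrait and a custom restoration step go together.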

I hope that helps answer your question!


The blog post didn’t say, but I assume the macro calls self-portrait at the end of every turn. That choice dramatically simplifies recovery from the failure of one node of a distributed application, because the most recent checkpoint of any node is guaranteed to be part of a globally consistent set. (See the paper by Chandy and Lamport to see why.)

I did some Waterken programming, another system with persistence that works as I understand Goblins does. (Waterken implemented orthogonal persistence, and upgrades were possible in all but extreme cases.) I learned that there is a trade-off. You would like long turns to amortize the cost of the checkpoint, but that adds latency to messages to other vats. As a result, you find yourself spending time tuning the amount of work you do during each turn.

You might consider an option for not taking a checkpoint on turns that don’t send any messages. I don’t know how much that will help on a modern machine because I did my checkpoints to spinning disk. Still, it might be a useful experiment.


You might consider an option for not taking a checkpoint on turns that don’t send any messages. I don’t know how much that will help on a modern machine because I did my checkpoints to spinning disk. Still, it might be a useful experiment.

If I understand what you’re suggesting, I think that is more or less how Goblins works and what Christine was alluding to when she said:

Goblins’ persistence system was also designed so that only the information which needs to be saved is saved.

The define-actor macro isn’t responsible for calling self-portrait, and objects do not decide for themselves when to persist; instead, that’s left to Goblins. The define-actor macro just attaches a self-portrait function to the object and makes it available to Goblins. By default, this self-portrait function just reflects what’s given to the object’s constructor.

Vats (as opposed to actormaps, which are driven manually) by default persist objects that have changed on churns (when the vat has reached quiescence). When objects change their state or behavior with bcom, they are committed to a transaction; at the end of the churn, this transaction contains all the objects which have changed. Goblins is then able to go through all the objects within the transaction and persist just those objects, as it knows the others haven’t changed.
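As a rough illustration of that "persist only what changed" idea, here is a small Python sketch. It is not how Goblins is implemented: `Store`, `Cell`, `Vat`, `become`, and `end_churn` are all hypothetical names, with `become` standing in loosely for a bcom-style state change.

```python
# Conceptual sketch only: not the Goblins implementation.
# Every name below is hypothetical.


class Store:
    """Stand-in for on-disk storage; counts writes for illustration."""

    def __init__(self):
        self.saved = {}
        self.writes = 0

    def write(self, key, portrait):
        self.saved[key] = portrait
        self.writes += 1


class Cell:
    """A trivial object whose portrait is its current value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def self_portrait(self):
        return (self.value,)


class Vat:
    def __init__(self, store):
        self.store = store
        self.changed = {}  # the "transaction": objects changed this churn

    def become(self, cell, value):
        # A state change records the object in the current transaction.
        cell.value = value
        self.changed[cell.name] = cell

    def end_churn(self):
        # Persist only the objects recorded in the transaction.
        for name, obj in self.changed.items():
            self.store.write(name, obj.self_portrait())
        count = len(self.changed)
        self.changed.clear()
        return count
```

Note that an object changed several times within one churn is written once, with its final state, and untouched objects are never written at all.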


Chandy and Lamport showed that recovering from the failure of one node of a distributed application requires coordination with the other nodes. When you hold messages until the turn has been committed, you guarantee that the most recent checkpoint is part of a globally consistent state. That means you can recover a failed vat from its most recent checkpoint without coordinating.

Note that the key factor is whether or not a message has been sent, not whether any objects were changed. If a turn doesn’t send a message, then the last checkpoint in which a message was sent is guaranteed to be part of a globally consistent state. As a result, you don’t always have to checkpoint turns that don’t send messages. You only have to checkpoint such turns to avoid lost work. Whether skipping those checkpoints is worth doing is likely to be context dependent.
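A minimal Python sketch of that rule, under simplifying assumptions (a single vat, in-memory "checkpoints", and hypothetical names such as `run_turn` and `crash_and_recover`), might look like this. It describes neither Waterken nor Goblins; it only illustrates holding messages until the checkpoint is committed and skipping the checkpoint on turns that send nothing.

```python
# Conceptual sketch only: single vat, in-memory checkpoint.
# All names are hypothetical.


class Vat:
    def __init__(self):
        self.state = {}
        self.checkpoint = {}
        self.checkpoints_taken = 0
        self.delivered = []  # messages released to other vats

    def run_turn(self, updates, outgoing):
        """Apply a turn's updates; return True if a checkpoint was taken."""
        self.state.update(updates)
        if not outgoing:
            # Nothing becomes visible to other vats, so the previous
            # checkpoint is still consistent with what they have seen.
            return False
        # Commit the checkpoint first, then release the held messages.
        self.checkpoint = dict(self.state)
        self.checkpoints_taken += 1
        self.delivered.extend(outgoing)
        return True

    def crash_and_recover(self):
        # Recover from the latest checkpoint without coordination.
        self.state = dict(self.checkpoint)
```

For example, if one turn sets `x` to 1 and sends a message, and a later turn sets `x` to 2 without sending anything, a crash rolls `x` back to 1. That work is lost, but no other vat ever observed it, which is the trade-off described above.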