[Someone please supply a link/text describing the exact form of this conundrum, which I will happily replace this text with.]
Message ordering is an unsolvable problem on existing networks for all intents and purposes. In my previous company we used to say that switches are DReDDful. They drop, reorder, delay, and duplicate packets. Getting exactly once delivery of messages is virtually impossible. Guaranteeing in-order delivery requires unbounded buffer space.
TCP does a wonderful job of hiding all the network problems, until it doesn’t. When it doesn’t, e.g., when a connection is dropped, an unknowable amount of data is lost. The result is that applications are expected to deal with any and all network issues, the so-called end-to-end principle. The result is a lot of code (we were told some 50,000 lines in FoundationDB) that mostly works most of the time. Making applications resilient to the failures is going to be important.
Perhaps you meant to include a in that? We are all transmitting/receiving millions of ordered messages every day.
Yes, this is the real problem. What does ordered messaging look like in a universe of promise pipelining? And what do we mean by resilience in our world…
I probably should have, but there’s a problem that happens about 0.01% of the time on Facebook, which is probably a million times a day. (I can’t find the reference to the paper reporting it.) The paper said that the problem was so rare and the consequences so minor that they were not going to attempt a fix.
- Alice goes into the hospital.
- Bob gets the information and says, “That’s too bad.”
- Alice gets out of the hospital.
- Bob gets the information and says, “That’s great.”
- Carol gets the first notice and says, “That’s too bad.”
Everybody is mad at Carol because she’s not happy that Alice got out of the hospital.
A similar problem has happened with my email. One of the messages in a thread ended up in my spam folder, leaving me very confused about what was going on.
I expect these “rare” situations to be more important in virtual worlds than in enterprise applications, but I think it’s something we need to consider. Simply reporting a lost or delayed message may avoid a lot of confusion.
Romeo and Juliet got their messages misordered. Tragedy.
There’s enough insiders to the conversation here that it’s worth bringing up the E-Order description.
- Having message order be right on a single session between two parties is relatively easy given contemporary abstractions. (Yes, I know that @alanhkarp has stared deeply into this abyss and found it staring back into him, but in general “ordering problems over TCP+TLS” isn’t something I find myself stressing about.)
- However, message order amongst three or more parties is a fraught topic. We have E-Order in Goblins but really I think even that is misleading: if any of those objects are proxies (perhaps even as promises!) for other objects, E-Order can fall apart. More useful conversation on this cap-talk thread.
And this is all assuming a captp-like distributed opaque mutually suspicious object system. When you need a single machine with a single order of events, eg to solve the double spending problem, that’s where people tend to introduce blockchains. But people tend to over-introduce blockchains where captp-like systems might do as well or better.
Come to think of it, that needs its own thread…
For any of the message orders being considered for these distributed ocap systems, promise pipelining can be introduced without hurting the ordering. But the ordering sometimes does constrain the pipelining, and often does around the edges. Decide what ordering you’d like, and then just make sure you don’t break it when you add pipelining.
Lately I’ve come to favor Tyler’s point-to-point fifo ordering over e-order, given appropriate affordances and conventions to recover e-order, or something e-order-like, at the user level.
Prior to the “Lost Resolution Bug”, E-Order appears to be something
delivered “for free”, falling out of the implementation naturally. We
can jump up and down and say “look at this thing we got at no extra
This is indeed one of the considerations leading me to retreat to Tyler’s Waterken point-to-point fifo.
Even after that retreat, there’s a related problem we’ve internally at Agoric been calling the “Auxiliary Data Problem” for which we have several adequate but complex paper designs.
I have in progress a cheap approximation at fix: cheap auxdata by erights · Pull Request #6355 · Agoric/agoric-sdk · GitHub that I owe everyone a writeup about. Might as well mention it here so I’ll owe it to y’all as well
Okay, interesting to know that you’ve ended up there also!
I would love a writeup about the solution, but even more importantly, the problem! What does the “Auxillary Data Problem” mean? Perhaps it deserves its own thread!