Start with some rough design diagrams:
Sketch a rough system design and sequence diagram to clarify roles and interactions as we had done above.
They don’t need to be precise or complete - this reduces cognitive load and helps you focus on the modeling process.
Modelling
Identify the roles:
As shown in the sequence diagram, there are two roles - Coordinator and Participant.
Identify the state for each role:
Each Coordinator and Participant has its own status variable to track the state of the transaction independently.
Define and initialize the system:
Create instances of Coordinator and Participant roles to represent the transaction system.
Identify the RPCs for each role:
The Participant exposes two RPCs - Prepare (Phase 1) and Finalize (Phase 2).
RPCs are functions called between different roles.
In this case, Prepare and Finalize are RPCs because the Coordinator calls them on the Participant.
Identify the actions for each role:
Actions are similar to functions but do not specify a caller. They represent external triggers such as user requests, timers, or client API calls.
The Coordinator has an action Commit, which starts the two-phase commit protocol.
Implement the RPCs and actions.
This is the core of the design specification-the pseudo-code part of the spec. The implementation is written in a Python-compatible Starlark-like language, defining the logic for each role.
You can introduce helper functions, enforce preconditions, and model non-determinism where needed.
Identify durable/ephemeral states:
In this implementation, the status variables of both Coordinator and Participant are durable. Since durability is the default, no additional handling is needed.
Validation
8 Identify the safety properties:
The critical safety property is: if any `Participant` commits the transaction, no other `Participant` must have aborted (and vice versa).
Identify the liveness properties:
In the basic implementation, there are no liveness guarantees. However, we will later extend the protocol to ensure that no transaction remains indefinitely unresolved; every started transaction must eventually be committed or aborted.
Modelling
Roles
If you looked at the sequence diagram, you can see there are two roles or microservices or process types - Coordinator and Participant.
Roles are defined using the role keyword and the syntax is similar to a class declaration in python.
roleCoordinator:# ...roleParticipant:# ...
States
The state variables for each role is defined within Init action. To reference the state variable status, we will
use python like syntax self.status.
Run the following code in the playgroud.
“Run in playground” will open the snippet in the playground.
Then, click “Enable whiteboard” checkbox and click Run.
The screenshot of the playground highlighting the Enable whiteboard and Run steps
---
# Disable deadlock detection for now. We will address it later.deadlock_detection: false---
role Coordinator:
action Init:
self.status ="init"role Participant:
action Init:
self.status ="init"NUM_PARTICIPANTS=2action Init:
coordinator= Coordinator()participants=[]for i in range(NUM_PARTICIPANTS):
participants.append(Participant())
When you run it, you’ll see an Explorer link. Click you’ll see an explorer.
The screenshot of the playground highlighting the Explorer link to open the Explorer view
At present, it will only show the init state.
As we start filling in the details, you’ll see more interactive exploration.
RPCs are just functions, with the only difference being these will typically be called from another role. That is,
inter-role function calls are treated as RPCs, whereas intra-role function calls are treated as just helper functions.
Functions are defined using func keyword (As opposed to def in python and the self parameter is implicit)
roleParticipant:actionInit:self.status="init"funcPrepare():pass# To be filled in laterfuncFinalize():pass# To be filled in later
Actions
Actions are defined using action keyword.
roleCoordinator:actionInit:self.status="init"actionCommit:pass# To be filled in later
Implement RPCs and Actions
Participant
The Participant exposes two RPCs:
Prepare (Phase 1): The participant decides whether it can commit the transaction.
Finalize (Phase 2): The participant records the final decision.
Prepare Phase
On Prepare, a participant can either prepare to commit or abort the transaction. While real-world implementations have specific reasons for aborting, our model only needs to capture whether the decision is aborted or prepared, not why.
To express this, we use non-determinism with the any keyword:
vote=any["prepared","aborted"]
This means the vote can be either "prepared" or "aborted", and the model checker will explore both cases to ensure correctness.
Once the vote is determined, the participant stores the decision in self.status and returns the vote.
Finalize Phase
The Finalize function takes a single parameter - the final transaction decision. This decision is persisted in self.status to reflect whether the transaction was ultimately committed or aborted.
The Coordinator has a Commit action that runs the Two-Phase Commit protocol.
To ensure the protocol starts only once, we add a precondition to trigger the action only when the status is "init".
When triggered, the status is set to "working".
action Commit:
- pass # To be filled in later
+ require(self.status == "init")
+ self.status = "working"
Next, the coordinator iterates over the participants’ list. For each participant, it calls Prepare.
If any participant votes "aborted", the coordinator sets the status to "aborted", triggers the second phase, and exits.
If all participants accept, it sets the status to "committed".
We now have a working description of what we want to design. Let us explore the design in the UI.
Run the following code in the playgroud.
“Run in playground” will open the snippet in the playground.
Then, click “Enable whiteboard” checkbox and click Run.
The screenshot of the playground highlighting the Enable whiteboard and Run steps
---
# Disable deadlock detection for now. We will address it later.deadlock_detection: false---
role Coordinator:
action Init:
self.status ="init" action Commit:
require(self.status =="init") self.status ="working"for p in participants:
vote= p.Prepare()ifvote=="aborted":
self.finalize("aborted")return self.finalize("committed") func finalize(decision):
self.status = decision
for p in participants:
p.Finalize(decision)role Participant:
action Init:
self.state ="init" func Prepare():
vote= any ["prepared", "aborted"] self.status = vote
return vote
func Finalize(decision):
self.status = decision
NUM_PARTICIPANTS=2action Init:
coordinator= Coordinator()participants=[]for i in range(NUM_PARTICIPANTS):
participants.append(Participant())
You can see the state diagram and explore how the state space can change. Additionally, you may notice how faults, like a thread crash, are simulated.
FizzBee automatically simulates various types of faults to help model your system under these scenarios.
State graph
In the state diagram, you’ll notice that when a thread crashes, the coordinator remains in the “working” state. However, there is no outgoing transition from this state, which implies a potential deadlock.
To allow for exploration of more interesting scenarios, we have temporarily disabled deadlock detection in this specification.
State diagram illustrating thread crash modeling and resulting deadlock
Now, open the explorer. On the left, you’ll see the button labeled "Coordinator#0.Commit", corresponding to the only action defined in the model.
Explorer action button for Coordinator#0.Commit
Clicking it will change the Coordinator’s state to "working", which is the first statement in the Commit action.
Explorer view showing Coordinator#0.state = working
On the left, you’ll see two buttons.
Explorer showing action buttons: ‘action-1’ or ‘crash’
Clicking “action-1” will proceed with the thread, initiating the Prepare RPC to Participant#0.
Explorer view showing Participant#0 status options
The first step for the Participant is choosing to either prepare or abort the transaction. Selecting "Any:prepared" shows the prepared path, changing Participant#0.status to "prepared" and returning.
Explorer view showing Participant#0.status = prepared
You can explore the next steps and generate various sequence diagrams.
All Committed
Explorer view showing all states committed
Aborted
Explorer view showing transaction aborted
Crashed
Explorer view showing transaction crashed after 1st prepare
In this state, you’ll see the participants remain stuck in "prepared" state.
Explorer view showing transaction crashed after 1st prepare
Validation
While you can use FizzBee to model your system, explore and simulate various scenarios, and look at how the state variables change,
it would be nicer to add some assertions to the state.
This is called model checking.
Safety invariant
The critical invariant is, to confirm there is no state where one participant committed but another participant aborted.
While the safety property shows ParticipantsConsistent, there is no state where one participant committed but another one aborted.
However, this could be easily achieved with not having any actions (for example, comment out the Commit action) or just
trivially aborting all transactions. One thing we would want to verify is, is there at least one state were every participant committed.
Note, this is not an LTL property. This is part of the CTL logic equivalent to EF p where p is the predicate
Extension: Timeout
In the current system, if the process crashes in between, the participants that prepared the transaction would be
stuck in that state forever. Ideally, we would want all the participants to either commit or abort.
Since commit cannot be guaranteed, one way to force convergence is to use a timeout.
role Coordinator:
action Init:
self.status ="init" action Commit:
require(self.status =="init") self.status ="working"for p in participants:
vote= p.Prepare()ifvote=="aborted":
self.finalize("aborted")return self.finalize("committed") func finalize(decision):
self.status = decision
for p in participants:
p.Finalize(decision) action Timeout:
self.finalize("aborted")role Participant:
action Init:
self.status ="init" func Prepare():
vote= any ["prepared", "aborted"] self.status = vote
return vote
func Finalize(decision):
self.status = decision
NUM_PARTICIPANTS=2action Init:
coordinator= Coordinator()participants=[]for i in range(NUM_PARTICIPANTS):
participants.append(Participant())always assertion ParticipantsConsistent:
for p1 in participants:
for p2 in participants:
if p1.status =='committed' and p2.status =='aborted':
return False
return True
exists assertion AllParticipantsCommitted:
for p1 in participants:
if p1.status !='committed':
return False
return True
eventually always assertion CoordinatorTerminated:
return coordinator.status !='working'eventually always assertion AllParticipantsTerminated:
for p1 in participants:
if p1.status =='prepared':
return False
return True
You will see the following error trace.
Error trace showing a timeout occurring after all participants have committed
The issue here is that even after all participants have committed, the timeout action can still be triggered, causing the transaction to start aborting.
You can see this behavior in the error details view:
Error details showing how a timeout is incorrectly triggered after commit
And it is also reflected in the sequence diagram:
Sequence diagram illustrating the timeout incorrectly triggering after commit
Timeout attempt 2: Abort transaction if not committed
Add a precondition that only transaction that is not committed must be aborted.
actionTimeout:require(self.status!="committed")# <--- This is the only changed line hereself.finalize("aborted")
While the previous error is fixed, this scenario reveals a different issue.
Error trace showing a timeout occurring before Phase 2, leading to inconsistency
Once both participants have prepared but before the coordinator receives the last participant’s response, if the timeout handler is triggered, it will decide to abort the transaction. Then, if the context switches back to the commit thread, it will proceed with Phase 2 and commit the transaction.
Once that is done, the timeout handler can still continue executing the abort path, leading to an inconsistent state. The error details view clearly highlights this context switching issue.
Error details highlighting how a timeout before Phase 2 leads to an inconsistent state
Timeout attempt 3: Finalize only in working status
funcfinalize(decision):require(self.status=="working")# <-- This is the only changed lineself.status=decisionforpinparticipants:p.Finalize(decision)
When you run it, it will show a new error.
Error details highlighting how a timeout before Phase 2 leads to an inconsistent state
This time it is a liveness error caused by stuttering. The fix is simply
to mark the timeout action to be fair.
@@ -25,7 +25,7 @@
for p in participants:
p.Finalize(decision)
- action Timeout:
+ fair action Timeout:
require(self.status != "committed")
self.finalize("aborted")
This will now show a different error.
Error details highlighting how a timeout before Phase 2 leads to an inconsistent state
Timeout attempt 4: Allow timeout to retry
Allow the finalize if the decision is the same as the previously decided value.
@@ -20,7 +20,7 @@
self.finalize("committed")
func finalize(decision):
- require(self.status == "working")
+ require(self.status in ["working", decision])
self.status = decision
for p in participants:
p.Finalize(decision)
This will fix that, but show the scenario where the thread crashed after coordinator decided to commit.
Error details highlighting how a timeout before Phase 2 leads to an inconsistent state
Timeout attempt 5: Allow timeout handler to retry commit as well
If you notice, the FizzBee code is more concise and closer to pseudocode.
In addition to being concise, FizzBee does exhaustive model checking. Whereas, P does not.
It explores various paths heuristically. That means, P cannot verify the correctness of the system, but FizzBee can.
Take a look at the readability of the FizzBee spec vs the TLA+ spec, making FizzBee a viable
alternative to TLA+ and the easiest formal methods language so far.