Probabilistic Modeling
In this tutorial, we will introduce the basics of probabilistic modeling with FizzBee.
When building distributed systems, behavioral correctness is only one part of the story. Performance, reliability, and security are also important. Probabilistic modeling is a powerful tool for reasoning about these properties. It allows us to reason about the likelihood of certain events happening, such as a message being lost or a node failing or error rate of an api call and so on.
Also, probabilistic modeling is a prerequisite for performance modeling like expected latency, throughput, availability SLAs and so on, that we will learn in the next tutorial.
If you need to reason about performance of a system, you need to reason about probability. In this tutorial, we will see probabilistic modeling, and in the next one, we will see how to model performance.
This tutorial assumes, you have gone through at least the FizzBee getting started tutorial.
PRISM model checker is the state of the art for probabilistic model checking. Unfortunately, the language is hard to use, and it is not suitable for large state spaces. Unlike PRISM, FizzBee uses Python-like language that is incredibly easy for anyone to quickly get started, making it a powerful yet easier to use alternative to PRISM.
In this page, we will point to PRISM’s examples to show how FizzBee is easier to use.
As of now, the online playground only supports the basic behavioral model checking. To use the probabilistic modeling, you need to use the CLI.
See: https://github.com/fizzbee-io/fizzbee for installation instructions.
Create a folder for this example. Let’s call it simple_coin
.
samples/simple-coin/CoinToss.fizz
:
action Toss:
oneof:
return "head"
return "tail"
Since we just want to limit to a single coin toss, the easiest way to do is, say that in fizz.yaml file.
samples/simple-coin/fizz.yaml
:
options:
maxActions: 1
Now, run the model checker. Assuming, you have fizzbee installed, and fizz is in your path, you can run the following command.
fizz samples/simple-coin/CoinToss.fizz
You should see something like.
Model checking samples/simple-coin/CoinToss.json
configFileName: samples/simple-coin/fizz.yaml
StateSpaceOptions: options:{max_actions:1 max_concurrent_actions:1}
Nodes: 4, elapsed: 78.25µs
Time taken for model checking: 83.209µs
Writen graph dotfile: samples/simple-coin/out/run_2024-05-07_23-46-17/graph.dot
To generate svg, run:
dot -Tsvg samples/simple-coin/out/run_2024-05-07_23-46-17/graph.dot -o graph.svg && open graph.svg
Max Depth 2
PASSED: Model checker completed successfully
Writen 1 node files and 1 link files to dir fizzbee/simple-coin/out/run_2024-05-07_23-46-17
What the code says, is Toss action can return either “head” or “tail”. At present, we haven’t specified the probability of each outcome. If not specified, fizzbee assumes equal probability for each branch.
The CLI takes three parameters, we will use the third one later.
–states is the directory where the states are stored. This will be printed when running fizz
command.
–source is the source file, but use the json path, that is also generated by the fizz
command.
bazel-bin/performance/performance_bin \
--states samples/simple-coin/out/run_2024-05-07_23-46-17/ \
--source samples/simple-coin/CoinToss.json
##
## Ignore the debug logs
##
2: 0.50000000 state: {} / returns: {"Toss":"\"head\""}
3: 0.50000000 state: {} / returns: {"Toss":"\"tail\""}
Starting from the root node, if you follow the graph, taking each branch with equal probability, the these are the final states, and they both have equal probability.
To define the probability, we need a way to identify the choice. For this, we FizzBee has labels. The labels are specified with a backtick (`).
action Toss:
oneof:
`head` return "head"
`tail` return "tail"
Rerun the model checker,
./fizz samples/simple-coin/CoinToss.fizz
Model checking samples/simple-coin/CoinToss.json
configFileName: samples/simple-coin/fizz.yaml
StateSpaceOptions: options:{max_actions:1 max_concurrent_actions:1}
Nodes: 4, elapsed: 454µs
Time taken for model checking: 467.5µs
Writen graph dotfile: samples/simple-coin/out/run_2024-05-08_00-05-55/graph.dot
To generate svg, run:
dot -Tsvg samples/simple-coin/out/run_2024-05-08_00-05-55/graph.dot -o graph.svg && open graph.svg
Max Depth 2
PASSED: Model checker completed successfully
Writen 1 node files and 1 link files to dir samples/simple-coin/out/run_2024-05-08_00-05-55
open the graph.svg file, and you will see the labels.
Now, that we have the labels, we can specify the probabilities. For that, create another yaml file.
samples/simple-coin/perf_model_unbiased.yaml
:
configs:
Toss.head:
probability: 0.5
Toss.tail:
probability: 0.5
Note: The labels are namespaced to the action/function names. So, to refer to head, use Toss.head
.
Now, run the performance model checker with the new yaml file.
bazel-bin/performance/performance_bin \
--states samples/simple-coin/out/run_2024-05-08_00-05-55/ \
--source samples/simple-coin/CoinToss.json \
--perf samples/simple-coin/perf_model_unbiased.yaml
##
## Ignore the debug logs
##
2: 0.50000000 state: {} / returns: {"Toss":"\"head\""}
3: 0.50000000 state: {} / returns: {"Toss":"\"tail\""}
Of course, nothing exciting yet. Now, lets specify a biased coin.
samples/simple-coin/perf_model_biased.yaml
:
configs:
Toss.head:
probability: 0.9
Toss.tail:
probability: 0.1
Now, run the performance model checker with the new yaml file.
bazel-bin/performance/performance_bin \
--states samples/simple-coin/out/run_2024-05-08_00-05-55/ \
--source samples/simple-coin/CoinToss.json \
--perf samples/simple-coin/perf_model_unbiased.yaml
##
## Ignore the debug logs
##
2: 0.90000000 state: {} / returns: {"Toss":"\"head\""}
3: 0.10000000 state: {} / returns: {"Toss":"\"tail\""}
So far, nothing much exciting. But, let us build a couple of bigger examples.
This is a classic first example in PRISM as well. https://www.prismmodelchecker.org/tutorial/die.php
The algorithm is simple. Toss the coins three times. Assuming {0,1}, with 3 tosses, you get 8 options. {000, 001, 010, 011, 100, 101, 110, 111} corresponding to 0-7. So, ignore 000 and 111, and map the rest to 1-6.
samples/die/Die.fizz
:
atomic func Toss():
oneof:
`head` return 0
`tail` return 1
atomic action Roll:
toss0 = Toss()
while True:
toss1 = Toss()
toss2 = Toss()
if (toss0 != toss1 or toss0 != toss2):
return 4 * toss0 + 2 * toss1 + toss2
Here also we are simulating a single roll, so limit the actions to 1.
samples/die/fizz.yaml
:
options:
maxActions: 1
Now, run the model checker.
fizz samples/die/Die.fizz
Model checking samples/die/Die.json
configFileName: samples/die/fizz.yaml
StateSpaceOptions: options:{max_actions:1 max_concurrent_actions:1}
Nodes: 14, elapsed: 883.875µs
Time taken for model checking: 894.292µs
Writen graph dotfile: samples/die/out/run_2024-05-08_00-29-07/graph.dot
To generate svg, run:
dot -Tsvg samples/die/out/run_2024-05-08_00-29-07/graph.dot -o graph.svg && open graph.svg
Max Depth 4
PASSED: Model checker completed successfully
Writen 1 node files and 1 link files to dir samples/die/out/run_2024-05-08_00-29-07
Open, the graph.svg file to visualize how the algorithm works.
Now, run the probabilistic model checker. Since, we are using a fair coin, no need to specify the custom probabilities.
bazel-bin/performance/performance_bin --states samples/die/out/run_2024-05-08_00-29-07/ --source samples/die/Die.json
##
## Ignore the debug logs
##
8: 0.16666667 state: {} / returns: {"Roll":"1"}
9: 0.16666667 state: {} / returns: {"Roll":"2"}
10: 0.16666667 state: {} / returns: {"Roll":"3"}
11: 0.16666667 state: {} / returns: {"Roll":"4"}
12: 0.16666667 state: {} / returns: {"Roll":"5"}
13: 0.16666667 state: {} / returns: {"Roll":"6"}
As you can see, the die roll will return with equal probability 1/6.
Once this problem is modeled, the next immediate question would be, how many tosses are needed the result. That relates to performance modeling, that we will see in the next tutorial.
@@ -5,8 +5,9 @@
atomic action Roll:
- toss0 = Toss()
+
while True:
+ toss0 = Toss()
toss1 = Toss()
toss2 = Toss()
The result will still produce 1-6 with equal probability. But, the number of tosses will be more.
8: 0.16666667 state: {} / returns: {"Roll":"1"}
9: 0.16666667 state: {} / returns: {"Roll":"2"}
10: 0.16666667 state: {} / returns: {"Roll":"3"}
11: 0.16666667 state: {} / returns: {"Roll":"4"}
12: 0.16666667 state: {} / returns: {"Roll":"5"}
13: 0.16666667 state: {} / returns: {"Roll":"6"}
@@ -5,8 +5,8 @@
atomic action Roll:
toss0 = Toss()
+ toss1 = Toss()
while True:
- toss1 = Toss()
toss2 = Toss()
if (toss0 != toss1 or toss0 != toss2):
Oops, the probabilities are not equal.
8: 0.25000000 state: {} / returns: {"Roll":"1"}
9: 0.12500000 state: {} / returns: {"Roll":"2"}
10: 0.12500000 state: {} / returns: {"Roll":"3"}
11: 0.12500000 state: {} / returns: {"Roll":"4"}
12: 0.12500000 state: {} / returns: {"Roll":"5"}
13: 0.25000000 state: {} / returns: {"Roll":"6"}
Do it as an exercise :)
This is another classic, smaller but more interesting example. Given a coin of unknown bias, how do you simulate a fair coin. This solution was proposed by von Neumann. Toss the coin twice. If the outcomes are different, return the first outcome.
samples/fair-coin/FairCoin.fizz
:
atomic func UnfairToss():
oneof:
`head` return "head"
`tail` return "tail"
atomic action FairToss:
while True:
toss1 = UnfairToss()
toss2 = UnfairToss()
if toss1 != toss2:
return toss1
Set the max_actions to 1 in fizz.yaml
as before. Also, create a perf_model_unbiased.yaml
and perf_model_biased.yaml
file as before and set the UnfairToss.head
and UnfairToss.tail
probabilities.
Now, run the model checker.
./fizz samples/fair-coin/FairCoin.fizz
Model checking samples/fair-coin/FairCoin.json
configFileName: samples/fair-coin/fizz.yaml
StateSpaceOptions: options:{max_actions:1 max_concurrent_actions:1}
Nodes: 6, elapsed: 645.875µs
Time taken for model checking: 662.792µs
Writen graph dotfile: samples/fair-coin/out/run_2024-05-08_01-26-14/graph.dot
To generate svg, run:
dot -Tsvg samples/fair-coin/out/run_2024-05-08_01-26-14/graph.dot -o graph.svg && open graph.svg
Max Depth 3
PASSED: Model checker completed successfully
Writen 1 node files and 1 link files to dir samples/fair-coin/out/run_2024-05-08_01-26-14
Take a look at the graph to visualize how the algorithm works.
Now, run the probabilistic model checker with unbiased model.
bazel-bin/performance/performance_bin \
--states samples/fair-coin/out/run_2024-05-08_01-26-14/ \
--source samples/fair-coin/FairCoin.json \
--perf samples/fair-coin/perf_model_unbiased.yaml
4: 0.50000000 state: {} / returns: {"FairToss":"\"head\""}
5: 0.50000000 state: {} / returns: {"FairToss":"\"tail\""}
Repeat the same with the biased coin model.
bazel-bin/performance/performance_bin \
--states samples/fair-coin/out/run_2024-05-08_01-26-14/ \
--source samples/fair-coin/FairCoin.json \
--perf samples/fair-coin/perf_model_biased.yaml
4: 0.50000000 state: {} / returns: {"FairToss":"\"head\""}
5: 0.50000000 state: {} / returns: {"FairToss":"\"tail\""}
As you can see, the algorithm works as expected. Irrespective of the bias of the coin, the result is fair.
This is actually an important concept in cryptography. If you have a source of randomness with unknown bias, you can use this algorithm to generate a fair random number generator (as long as the source is memory-less, to be precise. That is, even if the coin is biased, each trial is independent of each other).
Probabilistic modeling is important beyond cryptography even to distributed developers. For example, when you have a number of components with varying error rates, and retries, what will be your system’s overall error rate? Can you have an API that is 99.999% reliable? If you add more retries, will it lead to timeout at other components, making the retries less useful and actually detrimental as it contributes to additional load?
We will see more real world examples, in later tutorials after basic performance modeling.
In this tutorial, we have seen the basics of probabilistic modeling with FizzBee. Unlike PRISM model checker, FizzBee is incredibly easy to use, and the language is mostly Python.
In the next tutorial, we will see how to model performance of a system.