---
title: "Summary of TTK4145"
description: "A lot of theory and discussion, some formulas, spring 2021."
date: 2021-05-04
math: true
---

## Fault tolerance

Faults are hard to catch.

### Bugs

* About 1 bug per 50 lines of code before testing
* About 1 bug per 500 lines at release
* About 1 bug per 550 lines after a year, where the rate stays constant

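As a rough worked example, assume a hypothetical program of 10 000 lines (the size is made up for illustration) and take the rates above at face value:

$$
\frac{10\,000}{50} = 200 \text{ bugs before testing}, \qquad
\frac{10\,000}{500} = 20 \text{ bugs at release}, \qquad
\frac{10\,000}{550} \approx 18 \text{ bugs after a year}.
$$

Under these assumptions testing removes about nine out of ten bugs, while a further year in use removes only about two more.
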
1. Make the program work within specs.
2. Run and test the program.
3. Errors happen.
4. Locate errors:
    * Incomplete spec
    * Missing handling of some situation
5. Fix code.

### Traditional error handling

{% highlight c %}
#include <errno.h>
#include <stdio.h>

/* Opens the config file; returns NULL if it could not be opened. */
FILE *
openConfigFile(void){
    FILE * f = fopen("/path/to/config.conf", "r");
    if (f == NULL) {
        /* On POSIX systems fopen() sets errno; each cause needs its own case. */
        switch(errno){
            case ENOMEM: {
                /* ... */
                break;
            }
            case ENOTDIR: {
                /* ... */
                break;
            }
            // Do this for all errors
        }
    }
    return f;
}
{% endhighlight %}

### Causes of errors

* Incomplete specification
* Software bugs
* Hardware problems
* Communication problems

### Fault tolerance in real-time systems

The problem with traditional error handling is that errors can happen at any time.
This is extremely hard to test.

Error handling in real-time programming has to deal with the following:

* Handling of unexpected errors
* Multiple threads handle errors
* Conventional testing is not enough:
    * It can only show the existence of errors
    * It cannot find errors in the specification
    * It cannot find race conditions

The fault path is shown below.

![Fault tolerance](figures/fault-path.svg)

With fault tolerance, the path looks more like the figure below.

![Fault tolerance](figures/fault-tolarance.svg)

### Error handling

Keep it simple!

The error modes are part of the module interface.

One way is to handle all errors the same way.
Handle each one as if it were the worst error.
Crash and start again.

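A minimal sketch of "crash and start again", assuming a POSIX system. The names `worker_fn`, `run_with_restart` and `flaky_worker` are made up for illustration and are not from the course material. The supervisor does not care why the worker failed; every failure gets the same treatment, a restart from scratch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker type: returns 0 on success, non-zero on any error. */
typedef int (*worker_fn)(int attempt);

/* Runs `work` in a child process and restarts it whenever it fails,
 * whatever the cause. Returns the number of restarts that were needed,
 * or -1 if the worker still fails after max_restarts restarts. */
int run_with_restart(worker_fn work, int max_restarts) {
    for (int attempt = 0; attempt <= max_restarts; attempt++) {
        pid_t pid = fork();
        if (pid < 0) {
            return -1; /* could not even start the worker */
        }
        if (pid == 0) {
            /* Child: do the work; any failure simply ends the process. */
            _exit(work(attempt) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) {
            return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS) {
            return attempt;
        }
        /* Crash, abort or error code: all are handled the same way. */
        fprintf(stderr, "worker failed on attempt %d, restarting\n", attempt);
    }
    return -1;
}

/* Example worker (illustration only): crashes on its first two attempts. */
int flaky_worker(int attempt) {
    if (attempt < 2) {
        abort(); /* simulate an unexpected fault */
    }
    return 0;
}
```

Calling `run_with_restart(flaky_worker, 5)` restarts the worker twice before it succeeds. Running the worker in its own process is one design choice among several; it means a crash cannot corrupt the supervisor's state.
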
A different approach is to check that everything is OK, rather than looking for each specific error.

To test how the system responds to an unknown error, insert a failed acceptance test (a "not OK" signal).

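The two ideas above can be sketched together: an acceptance test that only asks whether the result is plausible, and a switch that forces the test to fail. Everything here is hypothetical (the setpoint computation, the 0 to 100 range, the `inject_fault` flag and the safe fallback value); it only illustrates the pattern.

```c
#include <stdbool.h>

/* Hypothetical fault-injection switch: when set, the acceptance test
 * reports "not OK" even for a good result, to exercise the recovery path. */
bool inject_fault = false;

/* Assumed safe value to fall back to when the result is rejected. */
#define SAFE_SETPOINT 0

/* Hypothetical computation under test: scales a sensor reading. */
static int compute_setpoint(int reading) {
    return reading * 2;
}

/* Acceptance test: checks that the result is plausible, without knowing
 * or caring which error might have produced a bad one. */
static bool setpoint_is_ok(int setpoint) {
    if (inject_fault) {
        return false; /* simulated unknown error */
    }
    return setpoint >= 0 && setpoint <= 100;
}

/* Returns the computed setpoint if it passes the acceptance test,
 * otherwise the safe fallback. */
int checked_setpoint(int reading) {
    int setpoint = compute_setpoint(reading);
    if (!setpoint_is_ok(setpoint)) {
        return SAFE_SETPOINT;
    }
    return setpoint;
}
```

With `inject_fault` set, `checked_setpoint(20)` returns the fallback even though the computed value is valid, which shows what the system does when something unforeseen goes wrong.
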
### Redundancy

* If I have $N$ copies of my data, I can tolerate one of them being destroyed.
* Sending $N$ messages, or trying $N$ times.

**Static redundancy**

* $N$ active copies. Sending $N$ messages whether it is necessary or not.
* Detecting errors is not important.
* Easily handles faults caused by cosmic rays.

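One common way to use $N$ active copies is majority voting. The notes do not name a specific scheme, so treat this as an assumed example. The function `majority_vote` is hypothetical; it masks a corrupted copy without ever finding out which copy was wrong, which is why error detection matters less here.

```c
#include <stddef.h>

/* Writes the value held by a strict majority of the n copies to *out and
 * returns 0. Returns -1 if no value has a strict majority. */
int majority_vote(const int *copies, size_t n, int *out) {
    for (size_t i = 0; i < n; i++) {
        size_t agree = 0;
        for (size_t j = 0; j < n; j++) {
            if (copies[j] == copies[i]) {
                agree++;
            }
        }
        if (agree * 2 > n) {
            *out = copies[i];
            return 0;
        }
    }
    return -1;
}
```

With three copies where one has been flipped, for example `{42, 42, 7}`, the vote still yields 42. If all three disagree, the function reports failure instead of guessing.
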
**Dynamic redundancy**

* Relies on detecting the error and recovering
* Resend on timeout if no "ack" has been received
* Go with a default if no messages have been received
* The acceptance test must be good.

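The resend rule can be sketched as a retry loop. The transport is hidden behind a hypothetical callback, `send_and_wait_fn`, so the example runs without a network; a real version might wrap a socket send plus a receive with a timeout. `send_reliably` and `lossy_link` are likewise made-up names.

```c
#include <stdbool.h>

/* Hypothetical transport hook: sends `msg` and waits up to timeout_ms for
 * an "ack". Returns true if the ack arrived in time. */
typedef bool (*send_and_wait_fn)(int msg, int timeout_ms, void *ctx);

/* Resends until acknowledged or until max_tries attempts are used up.
 * Returns the number of attempts used, or -1 if no ack was ever received. */
int send_reliably(send_and_wait_fn send_and_wait, void *ctx,
                  int msg, int timeout_ms, int max_tries) {
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (send_and_wait(msg, timeout_ms, ctx)) {
            return attempt;
        }
        /* Timeout without ack: the error is detected, recover by resending. */
    }
    return -1;
}

/* Simulated lossy link (illustration only): drops the first *ctx messages. */
bool lossy_link(int msg, int timeout_ms, void *ctx) {
    int *drops_left = ctx;
    (void)msg;
    (void)timeout_ms;
    if (*drops_left > 0) {
        (*drops_left)--;
        return false;
    }
    return true;
}
```

If the link drops the first two messages, `send_reliably` succeeds on the third attempt. When it returns -1, the caller can go with a default value, as in the third bullet above.
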