Risk Analysis¶
Purpose¶
This risk analysis focuses on displaying the benefits of using light markup text, and specifically RST, for documentation of (software) projects.
This file also
- attempts a quantitative estimate of the advantages and disadvantages: Risk analysis for documenting with RST
- motivates the Requirements on Documentation
This whole documentation is an example of documentation written in RST. Specifically, the original of this file is a template file that generates some of its parts automatically.
Only a part of the rstdoc documentation deals with the provided Python code.
Qualitative Analysis¶
Productivity¶
To achieve more progress with less effort, one must switch to tools that offer better productivity.
The objective is to find a documentation format that is more productive than MS Office or Libre Office for technical documentation.
DOCX or ODT cannot be well integrated into a software project. The reason does not lie in the writing itself, but in the organization of information and the further development and handling of text. They
- have low accessibility: san, stq
- have low traceability: s9v, s0t
- produce too much redundancy: sgt, s8c
- are no good for automation: sgt, s8c
As a result:
- They lead to low productivity (sa7).
- The quality of the content suffers.
Code is written in a text editor. Documentation must be written with the same text editor. Accessing information with two different tools adds overhead.
Formatting vs Content¶
r9g: | s45 |
---|
The purpose of DOCX or ODT, and of the WYSIWYG idea in general, is to provide easy formatting. The information encoded in a human language is surrounded by layers of formatting.
- The DOCX XML files are zipped, which makes them binary.
- The XML-based format is so full of formatting markup that it is not very readable. This also applies to non-zipped formats like DocBook.
But formatting should have no importance in development.
There is a less obtrusive alternative for formatting than via XML, HTML or even TeX:
The content is important in documentation, not formatting.
Every bit of information needs a location. This location cannot be in a DOCX or ODT file, because it is not easily accessible there (Accessibility).
Certain content can be stored in a text database and reused in other documents via templating (r8d).
Data can be integrated better into text than into DOCX or ODT.
From an immediate but naive perspective it may seem easier to compose documents using DOCX or ODT. Given the complexity of bringing a big project to a consistent final state, it is not. More detailed reasons are the topic of this documentation.
Final Version¶
rio: | scf |
---|
This proposal targets the development phase.
After the project is over, documents are
- archived or
- placed on a web server
The formats usually used are:
In case the final version is a printout, DOCX or ODT allow final formatting correction before printing.
Parallelism¶
Parallel processing is faster than serial processing. The productivity of a team increases if the team members can work in parallel.
For a developer to work independently, he needs to be allowed to make his own decisions.
The decisions get their input from the documentation done by others and the information generated by the developer himself.
Every developer needs to understand how the product will be used.
The external requirements are kept
- minimal
- mostly soft, i.e. modifiable
- with good rationale, especially for hard requirements
If a developer has an idea, a conflict or an issue, he can adapt the source code, the documentation and the tests, including those of others, to resolve the issue by himself.
The chief developer only
- does initial coordination
- observes, i.e. reviews the changes, as other developers do, too.
The format of the documentation matters regarding independence:
- changes can be traced in the VCS (Traceability)
- information can more easily be found via grep-like tools over all files (Accessibility)
- a final document file can be decomposed into separate source files for developers
Traceability¶
Trace changes¶
rnn: | s9v |
---|
Documentation is the description of the system in a human language. It is meant for humans. Nevertheless it is not a novel, but more like code.
- It defines variables and values (concepts) like code.
- It undergoes the same changes as code.
- It has dependencies and a hierarchical structure like code.
Team members need to be able to follow changes. A version control system (VCS) like SVN or Git is needed to trace the changes in documentation.
Trace dependencies¶
rw9: | s0t, san |
---|
Code uses identifiers for its items (variables, functions, classes, …).
The documentation can use IDs to mark an item (paragraph, figure, table, …).
An ID can be used to reference an item from somewhere else, which gives an m-n relation. A special case is 1-n, e.g. the ID of a header comprises all IDs of the paragraphs below it.
Flat addressing: Relations are not reflected in the names and especially not in the IDs. In particular, the IDs have no order. Flat and unordered IDs are more flexible, because they are independent of changes in structure and order.
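Because the IDs are flat, the m-n relation between items can be collected by a small script. The following is a minimal sketch, assuming the items are marked in a notation modelled on the markers in this file (`<id>: | <referenced ids> |`); the regular expression and the names `ITEM`, `trace` and `SAMPLE` are hypothetical and not part of rstdoc.

```python
import re
from collections import defaultdict

# Assumed notation: "<id>: | <referenced id>, <referenced id> |".
# Lines without a closing "|" carry no references and are skipped.
ITEM = re.compile(r"^(\w+):\s*\|\s*([\w,\s]*?)\s*\|\s*$")


def trace(lines):
    """Map each referenced ID to the sorted list of IDs that reference it."""
    referenced_by = defaultdict(list)
    for line in lines:
        match = ITEM.match(line)
        if not match:
            continue
        source, targets = match.groups()
        for target in targets.split(","):
            target = target.strip()
            if target:
                referenced_by[target].append(source)
    return {target: sorted(sources) for target, sources in referenced_by.items()}


# Sample markers in the assumed notation.
SAMPLE = [
    "rnn: | s9v |",
    "rw9: | s0t, san |",
    "r33: | san |",
    "r8d: |",
]

if __name__ == "__main__":
    for target, sources in trace(SAMPLE).items():
        print(f"{target} <- {', '.join(sources)}")
```

Since the IDs carry no structure or order, the script does not need to know where an item sits in the document.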
Accessibility¶
Hypertext¶
r33: | san |
---|
Productivity depends heavily on how fast information can be found.
Access time: The time to access stored information.
The access time is fastest for information stored in the brain. The brain of most humans is very slow to memorize, though. And the brain forgets. Normally one can expect only the current topic to be present for immediate processing.
Related information can be looked up quickly if the documentation contains references that can be jumped to immediately (hypertext). The importance of this can be seen from the immense success of hypertext on the internet.
To allow hypertext, referenced items must have a unique resource ID (URI).
Search¶
re4: | stq |
---|
Another way to find information is search.
- For small to medium-sized systems, normal text search like grep suffices.
- A larger text corpus needs indexing to speed up search.
Since source code and documentation describe the same system, the same concepts and IDs are likely to occur in both. Source code describes the details and does not rephrase documentation items, though. The concept names and IDs of the documentation are therefore expected in source code comments rather than in identifiers.
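A grep-like search over documentation and source files together can be sketched in a few lines of Python. This is only an illustration: the function names `find_id` and `demo`, the list of file suffixes and the sample files are assumptions, not part of rstdoc.

```python
import re
import tempfile
from pathlib import Path


def find_id(root, item_id, suffixes=(".rst", ".py", ".c", ".h")):
    """Return (file name, line number, line) for each whole-word match.

    Hypothetical helper: walks all files below root that have one of the
    given suffixes and reports every line mentioning item_id.
    """
    pattern = re.compile(r"\b" + re.escape(item_id) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits


def demo():
    """Search a throw-away project with one doc file and one source file."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "doc.rst").write_text(
            "s9v: changes are traced in the VCS\n", encoding="utf-8")
        Path(tmp, "code.py").write_text(
            "x = 1\n# implements s9v\n", encoding="utf-8")
        return find_id(tmp, "s9v")


if __name__ == "__main__":
    for name, lineno, line in demo():
        print(f"{name}:{lineno}: {line}")
```

The same search would not reach text inside a zipped DOCX or ODT file without first unpacking and parsing it.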
Community¶
rj4: | s9o |
---|
Community spreads the effort for tooling to more people.
- The commercial model makes more people dependent on one company. In the case of DOCX, there is no alternative to MS Word that renders documents in the same way. This gives Microsoft a monopoly, which leads to over-pricing.
- The open source model is a decentralized community effort: Sharing software costs no effort and therefore causes no loss, while one gains from the effort of others.
The open source model is preferred, because one has more control.
- one can add a feature if needed
- one can fix a bug immediately
The total effort is less than for the commercial model.
Sustainability¶
ref: | sed |
---|
The information shall be accessible
- over a long time
- by many people
But if the format is only readable by one of many commercial tools,
- at some point one may not want or be able to pay the license
- some people might use a different tool
Changing the tool is then not possible without substantial costs (vendor lock-in).
Because of the sustainability argument, a DOCX document needs to be converted to PDF, e.g. before sending to someone else or maybe even when checking into a VCS.
Redundancy¶
r90: | sgt, s8c |
---|
Redundancy: When the same information needs to be maintained in more than one place.
Less redundancy means higher productivity.
Redundancy
- needs more resources
- more pages, more memory
- more time to read
- more time to write
- more effort when changing something
- leads to inconsistencies
The reasons for redundancy are
- barrier between formats: DOCX and text, computer language and human language
- inability to link to information: no hypertext
- inability to exploit functional dependencies: no automation available
- normative boilerplate texts: no automation available
Automation¶
Scripting¶
r1p: | sgt, s8c |
---|
Why don’t we write code in MS Word or LibreOffice? Because it would be hard to parse away all the formatting.
It does not help to have a library like Office-XML-SDK or DOM, because the additional complexity through formatting elements still needs to be dealt with when parsing or creating documentation parts.
A format where formatting is less important and less obtrusive can be handled more easily via scripts.
Templates¶
r8d: |
---|
Many internet sites are generated with a mixture of text and a scripting language (PHP, JS, Python, …). Such templates make it possible
- to mix text and data or
- to generate text from data.
Text files can easily be generated from template files.
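Generating text from data can be sketched with the `string.Template` class of the Python standard library. It is swapped in here for illustration only and is not necessarily the template engine used for this file; the limit values and the names `LIMITS`, `DOC_TEMPLATE` and `render_doc` are hypothetical.

```python
from string import Template

# Hypothetical limit values that would otherwise be copied by hand
# into both the code and the documentation.
LIMITS = {"max_temp": 85, "min_temp": -20}

# RST text with placeholders; the data is filled in at generation time.
DOC_TEMPLATE = Template(
    "Operating range\n"
    "===============\n"
    "\n"
    "The device operates between $min_temp and $max_temp degrees Celsius.\n"
)


def render_doc(values):
    """Generate the RST text from the template and the data."""
    return DOC_TEMPLATE.substitute(values)


if __name__ == "__main__":
    print(render_doc(LIMITS))
```

With the values kept in one place, the documentation cannot drift away from the code, which removes one source of redundancy.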
Quantitative Analysis¶
Introduction to risk analysis¶
Risk analysis is basically a simulation.
Event: | A possible and recurring configuration of values of variables. |
---|---|
Frequency: | \(f\). How often an event is observed per time interval. |
Probability: | \(p\). Compares the frequency of mutually exclusive values of one variable. At least one value must occur (exhaustiveness). |
Rating: | \(v\). Judge an event by associating a value expressing harm/benefit, loss/profit or advantage/disadvantage. |
Risk: | \(r\). The risk is frequency * rating: \(r = f \cdot v\) |
The total risk \(R\) sums over all events \(e\): \(R = \sum_e r_e = \sum_e f_e v_e\)
Events can depend on other events functionally or statistically. One can start with the probability for an event once a day and then follow conditional probability chains to other events.
The risk analysis tries to analyse these dependencies to get to a more precise estimation of the frequency.
It is hard to get good estimates of frequencies in a complex real world, because there are
- many variables
- many dependencies
- unknown probabilities
Because the frequencies will be inaccurate, one can replace numbers with coarser but more realistic categories, which need to be defined for the specific domain (Table 1).
Table 1: Occurrence values for a medical device
Number | Category | Explanation |
---|---|---|
1 | Unimaginable | Never occurs in the lifetime of the device |
2 | Improbable | Occurs once in the lifetime of the device |
3 | Remote imaginable | Occurs once in 100 applications |
4 | Sometimes | Occurs once in 10 applications |
5 | Probable | Occurs once per application |
6 | Frequent | Occurs multiple times per application |
The rating depends on the
- area (health sector, finance, …) and the
- circumstances (war or peace, rich or poor, …)
Table 2: Severity rating in the health sector.
Number | Category | Explanation |
---|---|---|
1 | Non-essential | Minor injury not needing medical intervention |
2 | Minor | Small to moderate injury |
3 | Critical | Severe injury or death |
4 | Catastrophic | Multiple deaths |
In this discrete description, the risk value could be
ac: | acceptable |
---|---|
alarp: | as low as reasonably practicable |
nac: | not acceptable |
The risk function is defined by a table. The total risk can
- count each risk value occurrence
- count each cell occurrence in the risk table (Table 3)
Table 3: Occurrence/Severity matrix. AC, NAC, ALARP are counts of events in the respective cell.
Risk R: occurrence (rows) \ severity (columns) | 1 | 2 | 3 | 4 |
---|---|---|---|---|
6 | ALARP | NAC | NAC | NAC |
5 | ALARP | ALARP | NAC | NAC |
4 | ALARP | ALARP | ALARP | NAC |
3 | AC | ALARP | ALARP | ALARP |
2 | AC | AC | ALARP | ALARP |
1 | AC | AC | AC | ALARP |
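The table lookup and the counting of risk values can be sketched as follows. The dictionary reproduces Table 3 with occurrence as rows and severity as columns; the names `RISK_TABLE`, `risk_value` and `total_risk` and the sample events are assumptions for illustration.

```python
from collections import Counter

# Table 3: key is the occurrence (6 down to 1),
# the tuple holds the risk values for severity 1 to 4.
RISK_TABLE = {
    6: ("ALARP", "NAC", "NAC", "NAC"),
    5: ("ALARP", "ALARP", "NAC", "NAC"),
    4: ("ALARP", "ALARP", "ALARP", "NAC"),
    3: ("AC", "ALARP", "ALARP", "ALARP"),
    2: ("AC", "AC", "ALARP", "ALARP"),
    1: ("AC", "AC", "AC", "ALARP"),
}


def risk_value(occurrence, severity):
    """Look up the risk value of one event in the table."""
    return RISK_TABLE[occurrence][severity - 1]


def total_risk(events):
    """Count how many events fall into each risk value."""
    return Counter(risk_value(o, s) for o, s in events)


# Hypothetical events as (occurrence, severity) pairs.
EVENTS = [(6, 3), (3, 1), (2, 3), (1, 4), (2, 2)]

if __name__ == "__main__":
    for value, count in sorted(total_risk(EVENTS).items()):
        print(f"{value}: {count}")
```

Counting per cell instead of per risk value only requires counting the `(occurrence, severity)` pairs themselves.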
Countermeasures¶
r2m: |
---|
The purpose of the risk analysis is not to make a yes/no decision for a project, but to derive countermeasures that reduce the risk or prevent harm or financial loss.
The countermeasures change the probability of the events by changing the causal dependencies between events.
The rating probably will not change, unless circumstances change.
In the Occurrence/Severity example, in Table 3:
- before the measures: events are possibly in the upper right corner
- after the measures: events are ideally only in the lower left corner
- the events in the top/left to right/bottom diagonal have a trade-off and should be kept “as low as reasonably practicable”
Risk analysis for documenting with RST¶
rp5: |
---|
This risk analysis maps onto the above introduction to risk analysis as follows:
- An event is a task a developer performs
- The time consumed per event corresponds to severity (per developer)
- Occurrence is counted per developer
Instead of the discrete values, numbers are used for time and occurrence. The numbers are rough estimates, because they depend heavily on
- the developer
- the tools he uses (editor and plugins)
- how well he knows his tools
- which phase the development is in
- how long the project takes
- how much documentation there is
The risk is the effort per developer.
- \(e\): event to perform a task
- \(v_e\): time consumed for task
- \(f_e\): how often per day the task \(e\) occurs
- \(R\): total effort per developer
The countermeasures taken lead to:
- RST for documentation instead of MS Word or Libre Office
Events¶
Each of the following events has
- a one-line description
- an occurrence \(f\) per day
- a countermeasure
- the effort \(v_1\) [min] before the countermeasure
- the effort \(v_2\) [min] after the countermeasure
As a check for the estimate, \(\sum f v_1\) should give about \(1d = 8h = 480\text{min}\).
The estimates assume a project that
- takes about a year
- has 5 team members
- needs to be consistently documented
description | occurrences | measure | time1/min | time2/min |
---|---|---|---|---|
Include documentation in the build system | 1/5/365 | sxr | 0 | 10 |
Create separate version of documentation file (e.g. doc_1.1.docx) | 1/5/100 | s10 | 10 | 0 |
Look for file and open in editor then open another file in another tool (office application) | 20 | sed | 1 | 1/10 |
Plan the design of a software component and document it | 1 | s8c | 40 | 30 |
Review the changes in a documentation file | 1 | s9v | 20 | 1 |
Look up an ID in a documentation file | 10 | san | 1 | 1/60 |
Solve an implementation detail or a bug report | 2 | | 100 | 100 |
Discuss an interface with other team member consulting documentation | 1 | san | 10 | 9 |
Describe an implementation detail or how a bug was fixed in the documentation | 2 | san | 30 | 20 |
Merge contributions to a documentation file from more developers | 1/30 | sxr | 30 | 1 |
Start a printout of the documentation (excluding printing time) | 1/5/100 | scf | 5 | 10 |
Create a traceability file that shows how documentation items are linked | 1/5/100 | s0t | 3*480 | 1 |
Search for all occurrences of a name in all project files | 10 | stq | 4 | 1 |
Replace all occurrences of a name in all project files | 5 | stq | 4 | 1 |
Refactor and re-describe parts of code and update documentation | 1 | s8c, san | 30 | 20 |
Fix a formatting issue | 10 | s45 | 1 | 1/2 |
Check for consistency of limit values between code and documentation | 1 | s8c | 2 | 0 |
Make the documentation of automatic tests or a test report of a test run | 1 | sgt | 20 | 10 |
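The check \(\sum f v_1 \approx 480\text{min}\) and the effort after the countermeasure can be recomputed from the table with a short script. This is a minimal sketch and not the generator used for this file: the tuple layout and the names `EVENTS` and `total_effort` are assumptions, and an occurrence such as 1/5/365 is read as left-to-right division, i.e. 1/(5*365).

```python
from fractions import Fraction as F

# One tuple per table row, in table order:
# (occurrences per day, minutes before, minutes after the countermeasure).
# Assumption: "1/5/365" means left-to-right division, i.e. 1/(5*365).
EVENTS = [
    (F(1, 5 * 365), 0, 10),       # include documentation in the build system
    (F(1, 5 * 100), 10, 0),       # create separate version of a file
    (20, 1, F(1, 10)),            # look for file and open it
    (1, 40, 30),                  # plan and document a design
    (1, 20, 1),                   # review changes
    (10, 1, F(1, 60)),            # look up an ID
    (2, 100, 100),                # solve an implementation detail or bug
    (1, 10, 9),                   # discuss an interface
    (2, 30, 20),                  # describe an implementation detail
    (F(1, 30), 30, 1),            # merge contributions
    (F(1, 5 * 100), 5, 10),       # start a printout
    (F(1, 5 * 100), 3 * 480, 1),  # create a traceability file
    (10, 4, 1),                   # search for a name
    (5, 4, 1),                    # replace a name
    (1, 30, 20),                  # refactor and update documentation
    (10, 1, F(1, 2)),             # fix a formatting issue
    (1, 2, 0),                    # check limit values
    (1, 20, 10),                  # document tests or a test run
]


def total_effort(events, column):
    """Sum frequency * time over all events; column 1 is before, 2 is after."""
    return sum(event[0] * event[column] for event in events)


if __name__ == "__main__":
    before = total_effort(EVENTS, 1)
    after = total_effort(EVENTS, 2)
    print(f"without RST: {round(before)} min")
    print(f"with RST:    {round(after)} min")
    print(f"difference:  {round(after - before)} min")
```

Exact fractions are used so that the rare events with tiny frequencies do not introduce rounding noise before the final rounding to minutes.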
Result¶
With RST, the assumed 1 year project with 5 developers would take only about 0.7 years.
- Effort without RST: 486min=1.0day
- Effort with RST: 332min=0.7day
- Less effort (sa7): -154min=-0.3day
The benefit is not so much due to using a text editor instead of an office application to write documentation. It comes from making good use of all the possibilities that plain text opens up (Requirements on Documentation and Requirements on Project).