Risk Analysis

Purpose

This risk analysis aims to show the benefits of using light markup text, specifically RST, for documenting (software) projects.

This whole documentation is itself an example of documentation written in RST. Specifically, the original of this file is a template file that generates some of its parts automatically.

Only part of the rstdoc documentation deals with the provided Python code.

Qualitative Analysis

Productivity

To achieve more progress with less effort, one must switch to tools that improve productivity.

The objective is to find a documentation format that is more productive than MS Office or Libre Office for technical documentation.

DOCX or ODT cannot be well integrated into a software project. The reason does not lie in the writing itself, but in the organization of information and the further development and handling of text. They

  • have low accessibility: san, stq
  • have low traceability: s9v, s0t
  • produce too much redundancy: sgt, s8c
  • are no good for automation: sgt, s8c

As a result:

  • They lead to low productivity (sa7).
  • The quality of the content suffers.

Code is written in a text editor. Documentation should be written with the same text editor, because accessing information with two different tools adds overhead.

Formatting vs Content

r9g:s45

The purpose of DOCX or ODT, in general the WYSIWYG idea, is about providing easy formatting. The information coded in a human language is surrounded by layers of formatting.

  • The DOCX XML files are zipped, which makes them binary.
  • The XML is so full of formatting markup that it is hardly readable. This also applies to non-zipped formats like DocBook.

But formatting should have no importance in development.

There is a less obtrusive alternative for formatting than XML, HTML or even TeX: light markup text such as RST.

The content is important in documentation, not formatting.

Every bit of information needs a location. This location cannot be in a DOCX or ODT, because there it is not well accessible (Accessibility).

Certain content can be stored in a text database and reused in other documents via templating (r8d).

Data can be integrated better into text than into DOCX or ODT.

From an immediate but naive perspective it may seem easier to compose documents using DOCX or ODT. It is not, because bringing a big project to a consistent final state is a complex task. More detailed reasons are the topic of this documentation.

Final Version

rio:scf

This proposal targets development time.

After the project is over, documents are

  1. archived or
  2. placed on a web server

The formats usually used are:

  1. PDF
  2. HTML
  3. DOCX or ODT

In case the final version is a printout, DOCX or ODT allow final formatting correction before printing.

Parallelism

Parallel processing is faster than serial processing. The productivity of a team increases if the team members can work in parallel.

In order for the developer to work independently he needs to be allowed to make his own decisions.

The decisions get their input from the documentation done by others and the information generated by the developer himself.

Every developer needs to understand how the product will be used.

The external requirements are kept

  • minimal
  • mostly soft, i.e. modifiable
  • with good rationale, especially for hard requirements

If a developer has an idea, a conflict or an issue, he can adapt the source code and the documentation and the tests, also of others, to resolve the issue by himself.

The chief developer only

  • does initial coordination
  • observes, i.e. reviews the changes, as other developers do, too.

The format of the documentation matters regarding independence:

  • changes can be traced in the VCS (Traceability)
  • information can more easily be found via grep-like tools over all files (Accessibility)
  • a final document file can be decomposed into separate source files for developers

Traceability

Trace changes

rnn:s9v

Documentation is the description of the system in a human language. It is meant for humans. Nevertheless it is not a novel, but more like code.

  • It defines variables and values (concepts) like code.
  • It undergoes the same changes as code.
  • It has dependencies and a hierarchical structure like code.

Team members need to be able to follow changes. A version control system (VCS) like SVN or GIT is needed to trace the changes in documentation.
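As a rough illustration of why plain text is easy to trace, the following sketch produces the kind of line-based diff a VCS shows for a text file. It uses Python's standard `difflib`; the two paragraph versions and the labels `doc.rst@1` and `doc.rst@2` are made up for the example.

```python
import difflib

# Two hypothetical versions of the same documentation paragraph.
old = ["The pump shall stop", "when the pressure exceeds 5 bar."]
new = ["The pump shall stop", "when the pressure exceeds 6 bar."]

# A unified diff marks removed lines with "-" and added lines with "+".
diff = list(difflib.unified_diff(old, new, "doc.rst@1", "doc.rst@2", lineterm=""))
print("\n".join(diff))
```

A reviewer sees exactly which line changed. A zipped DOCX file would only show up as a changed binary.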

Trace dependencies

rw9:s0t, san

Code uses identifiers for its items (variables, functions, classes, …).

The documentation can use IDs to mark an item (paragraph, figure, table, …).

The ID can be used to reference an item from somewhere else, which gives an m-n relation between items. A special case is 1-n, e.g. the ID of a header comprises all IDs of the paragraphs below it.

Flat addressing: Relations are not reflected in the names and especially not in the IDs. In particular, the IDs have no order. Flat and unordered IDs are more flexible, because they are independent of changes in structure and order.
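A minimal sketch of how flat IDs could be traced by a script. The three-character ID pattern and the `source: target, target` line convention are assumptions modeled on markers such as `rw9:s0t, san` in this document. The function name `build_links` and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention: a line "rw9:s0t, san" says item rw9 references s0t and san.
LINK = re.compile(r"^(\w{3}):\s*((?:\w{3})(?:\s*,\s*\w{3})*)?\s*$")


def build_links(text):
    """Return (forward, backward) maps of the m-n relation between IDs."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for line in text.splitlines():
        match = LINK.match(line.strip())
        if not match:
            continue  # ordinary prose line, no link marker
        source, targets = match.group(1), match.group(2) or ""
        for target in filter(None, re.split(r"\s*,\s*", targets)):
            forward[source].add(target)
            backward[target].add(source)
    return forward, backward


sample = """\
rnn:s9v
Documentation is the description of the system.
rw9:s0t, san
r33:san
"""
forward, backward = build_links(sample)
print(sorted(forward["rw9"]))   # IDs referenced by rw9
print(sorted(backward["san"]))  # IDs that reference san
```

Because the IDs are flat, the script needs no knowledge of the document structure. Moving a paragraph to another file does not change the result.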

Accessibility

Hypertext

r33:san

Productivity depends largely on how fast information can be found.

Access time: The time to access stored information.

The access time is fastest for information stored in the brain. The brain of most humans is very slow to memorize, though. And the brain forgets. Normally one can expect only the current topic to be present for immediate processing.

Related information can be looked up quickly if the documentation contains references that can be jumped to immediately (hypertext). The importance of this can be seen in the immense success of hypertext on the internet.

To allow hypertext, referenced items must have a unique ID (a URI, uniform resource identifier).

Community

rj4:s9o

Community spreads the effort for tooling to more people.

  • The commercial model makes more people dependent on one company. In the case of DOCX, no alternative to MS Word renders documents in the same way. This gives Microsoft a monopoly, which leads to over-pricing.
  • The open source model is a decentralized community effort: Sharing software costs no effort, so nothing is lost by sharing, and one gains the effort of others.

The open source model is preferred, because one has more control.

  • one can add a feature if needed
  • one can fix a bug immediately

The total effort is less than for the commercial model.

Sustainability

ref:sed

The information shall be accessible

  • over a long time
  • by many people

But if the format is only readable by one of many commercial tools,

  • at some point one may not want or be able to pay the license
  • some people might use a different tool

If one would like to change the tool, one cannot do so without substantial costs (vendor lock-in).

Because of the sustainability argument, a DOCX document needs to be converted to PDF, e.g. before sending to someone else or maybe even when checking into a VCS.

Redundancy

r90:sgt, s8c

Redundancy: When the same information needs to be maintained in more than one place.

Less redundancy means higher productivity.

Redundancy

  • needs more resources
    • more pages, more memory
    • more time to read
    • more time to write
    • more effort when changing something
  • leads to inconsistencies

The reasons for redundancy are

  • barrier between formats: DOCX and text, computer language and human language
  • inability to link to information: no hypertext
  • inability to exploit functional dependencies: no automation available
  • normative boilerplate texts: no automation available

Automation

Scripting

r1p:sgt, s8c

Why don’t we write code in MS Word or LibreOffice? Because it would be hard to parse away all the formatting.

It does not help to have a library like Office-XML-SDK or DOM, because the additional complexity through formatting elements still needs to be dealt with when parsing or creating documentation parts.

A format where formatting is less important and less obtrusive can be handled more easily via scripts.
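To illustrate how little code a script needs when the documentation is plain text, here is a hedged sketch of a grep-like search over all documentation files. The function name `find_name`, the `*.rst` file pattern and the two sample files are assumptions for the example. Only the Python standard library is used.

```python
import tempfile
from pathlib import Path


def find_name(root, name, glob="*.rst"):
    """Yield (file name, line number, line) for every line containing name."""
    for path in sorted(Path(root).rglob(glob)):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, 1):
            if name in line:
                yield path.name, number, line.strip()


# Demo on a throwaway directory with two made-up documentation files.
with tempfile.TemporaryDirectory() as root:
    Path(root, "design.rst").write_text(
        "Limit\n=====\nmax_speed is 120.\n", encoding="utf-8")
    Path(root, "test.rst").write_text(
        "The test checks max_speed.\n", encoding="utf-8")
    hits = list(find_name(root, "max_speed"))

for hit in hits:
    print(hit)
```

The same search over DOCX files would first have to unzip each file and strip the formatting markup.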

Templates

r8d:

Many internet sites are generated with a mixture of text and a scripting language (PHP, JS, Python, …). Such templates allow

  • to mix text and data or
  • to generate text from data.

Text files can easily be generated from template files.
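A small sketch of the idea, using `string.Template` from the Python standard library as a stand-in for a full template engine. The limit values and the RST fragment are hypothetical. In a real project the data would come from the code or a configuration file.

```python
from string import Template

# Hypothetical data that might come from code or a configuration file.
limits = {"max_speed": 120, "min_speed": 5}

# Hypothetical RST fragment; $name placeholders are filled from the data.
template = Template(
    "Speed limits\n"
    "============\n"
    "\n"
    "The speed shall stay between $min_speed and $max_speed km/h.\n"
)

text = template.substitute(limits)
print(text)
```

Because the values are generated, the documentation cannot drift away from the data it describes.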

Quantitative Analysis

Introduction to risk analysis

Risk analysis is basically a simulation.

Event: A possible and recurring configuration of values of variables.
Frequency: \(f\). How often an event is observed per time interval.
Probability: \(p\). Compares the frequency of mutually exclusive values of one variable. At least one value must occur (exhaustiveness).
Rating: \(v\). Judge an event by associating a value expressing harm/benefit, loss/profit or advantage/disadvantage.
Risk: \(r\). The risk is frequency * rating:

\[r_{e} = f_{e} v_{e}\]

The total risk \(R\) sums over all events:

\[R = \sum_e r_e\]
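The two formulas can be sketched in a few lines of Python. The event names and numbers are purely hypothetical and only serve to show the arithmetic.

```python
# Hypothetical events: name -> (frequency f_e, rating v_e).
events = {
    "event A": (2.0, 10.0),
    "event B": (0.5, 40.0),
}

# r_e = f_e * v_e for every event, R = sum over all r_e.
risks = {name: f * v for name, (f, v) in events.items()}
total_risk = sum(risks.values())

print(risks)       # {'event A': 20.0, 'event B': 20.0}
print(total_risk)  # 40.0
```

A frequent event with a low rating can thus contribute as much risk as a rare event with a high rating.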

Events can depend on other events functionally or statistically. One can start with the probability for an event once a day and then follow conditional probability chains to other events.

The risk analysis tries to analyse these dependencies to get to a more precise estimation of the frequency.

It is hard to get good estimates of frequencies in a complex real world, because there are

  • many variables
  • many dependencies
  • unknown probabilities

Because the frequencies will be inaccurate, one can replace numbers with coarser but realistic categories, which need to be defined for the specific area (Table 1).

Table 1: Occurrence values for a medical device

Number  Category           Explanation
------  -----------------  ------------------------------------------
1       Unimaginable       Never occurs in the lifetime of the device
2       Improbable         Occurs once in the lifetime of the device
3       Remote imaginable  Occurs once in 100 applications
4       Sometimes          Occurs once in 10 applications
5       Probable           Occurs once per application
6       Frequent           Occurs multiple times per application

The rating depends on the

  • area (health sector, finance, …) and the
  • circumstances (war or peace, rich or poor, …)

Table 2: Severity rating in the health sector.

Number  Category       Explanation
------  -------------  ---------------------------------------------
1       Non-essential  Minor injury not needing medical intervention
2       Minor          Small to moderate injury
3       Critical       Severe injury or death
4       Catastrophic   Multiple deaths

In this discrete description, the risk value is one of:

AC: acceptable
ALARP: as low as reasonably practicable
NAC: not acceptable

The risk function is defined by a table. The total risk can

  • count each risk value occurrence
  • count each cell occurrence in the risk table (Table 3)

Table 3: Occurrence/Severity matrix. AC, NAC, ALARP are counts of events in the respective cell.

Risk R (rows: occurrence O, columns: severity S)

O\S  1      2      3      4
---  -----  -----  -----  -----
6    ALARP  NAC    NAC    NAC
5    ALARP  ALARP  NAC    NAC
4    ALARP  ALARP  ALARP  NAC
3    AC     ALARP  ALARP  ALARP
2    AC     AC     ALARP  ALARP
1    AC     AC     AC     ALARP
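The matrix in Table 3 can be read as a lookup function. The following sketch encodes it as a dictionary and counts the risk values for a list of events. The function name `risk_class` and the example events are hypothetical.

```python
from collections import Counter

# Rows: occurrence 6..1, columns: severity 1..4, as in Table 3.
MATRIX = {
    6: ("ALARP", "NAC", "NAC", "NAC"),
    5: ("ALARP", "ALARP", "NAC", "NAC"),
    4: ("ALARP", "ALARP", "ALARP", "NAC"),
    3: ("AC", "ALARP", "ALARP", "ALARP"),
    2: ("AC", "AC", "ALARP", "ALARP"),
    1: ("AC", "AC", "AC", "ALARP"),
}


def risk_class(occurrence, severity):
    """Look up the risk value for an occurrence (1-6) and a severity (1-4)."""
    return MATRIX[occurrence][severity - 1]


# Hypothetical events as (occurrence, severity) pairs.
events = [(5, 3), (2, 1), (3, 3), (1, 2)]
counts = Counter(risk_class(o, s) for o, s in events)
print(counts)
```

Counting per risk value gives the first kind of total risk mentioned above. Counting per `(occurrence, severity)` pair would give the second.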

Countermeasures

r2m:

The purpose of the risk analysis is not to make a yes/no decision for a project, but to derive countermeasures that reduce the risk or prevent harm or financial loss.

The countermeasures change the probability of the events, by changing the causal dependencies between events.

The rating probably will not change, unless circumstances change.

In the Occurrence/Severity example, in Table 3:

  • before the measures: events are possibly in the upper right corner
  • after the measures: events are ideally only in the lower left corner
  • the events in the top/left to right/bottom diagonal have a trade-off and should be kept “as low as reasonably practicable”

Risk analysis for documenting with RST

rp5:

This risk analysis maps onto the above introduction to risk analysis as follows:

  • An event is a task a developer performs.
  • The time consumed per event (per developer) corresponds to severity.
  • The number of times the task occurs (per developer) corresponds to occurrence.

Numbers are used for time and occurrence instead of the discrete values. The numbers are rough estimates, because they depend heavily on

  • the developer
  • the tools he uses (editor and plugins)
  • how well he knows his tools
  • which phase the development is in
  • how long the project takes
  • how much documentation there is

The risk is the effort per developer.

\[R = \sum_{e} v_e f_e \tag{1}\]

Math 1:

  • \(e\): event to perform a task
  • \(v_e\): time consumed for task
  • \(f_e\): how often per day the task \(e\) occurs
  • \(R\): total effort per developer

The countermeasures taken lead to:

  • RST for documentation instead of MS Word or Libre Office

Events

The following events have a

  • one-line description
  • occurrence \(f\)
  • countermeasure
  • the effort \(v_1\) [min] before countermeasure
  • the effort \(v_2\) [min] after countermeasure

As a check of the estimates, \(\sum f v_1\) should give about one working day: \(1\text{d} = 8\text{h} = 480\text{min}\).

The estimates assume a project that

  • takes about a year
  • has 5 team members
  • needs to be consistently documented

occurrences/day  measure   time1/min  time2/min  description
---------------  --------  ---------  ---------  -----------
1/5/365          sxr       0          10         Include documentation in the build system
1/5/100          s10       10         0          Create separate version of documentation file (e.g. doc_1.1.docx)
20               sed       1          1/10       Look for file and open in editor, then open another file in another tool (office application)
1                s8c       40         30         Plan the design of a software component and document it
1                s9v       20         1          Review the changes in a documentation file
10               san       1          1/60       Look up an ID in a documentation file
2                -         100        100        Solve an implementation detail or a bug report
1                san       10         9          Discuss an interface with another team member, consulting the documentation
2                san       30         20         Describe an implementation detail or how a bug was fixed in the documentation
1/30             sxr       30         1          Merge contributions to a documentation file from several developers
1/5/100          scf       5          10         Start a printout of the documentation (without printing time)
1/5/100          s0t       3*480      1          Create a traceability file that shows how documentation items are linked
10               stq       4          1          Search for all occurrences of a name in all project files
5                stq       4          1          Replace all occurrences of a name in all project files
1                s8c, san  30         20         Refactor and re-describe parts of code and update documentation
10               s45       1          1/2        Fix a formatting issue
1                s8c       2          0          Check for consistency of limit values between code and documentation
1                sgt       20         10         Make the documentation of automatic tests or a test report of a test run

Result

The assumed 1 year project with 5 developers would take only 0.7 years.

  • Effort without RST: 486min=1.0day
  • Effort with RST: 332min=0.7day
  • Less effort (sa7): -154min=-0.3day
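These totals can be reproduced from the events table with a short script. The sketch below assumes that an occurrence such as 1/5/365 is read left to right, i.e. as 1/(5*365) per day, and lists the rows in table order as (occurrences per day, minutes before, minutes after). The names `EVENTS`, `before` and `after` are chosen for the example.

```python
from fractions import Fraction as F

# (occurrences per day, minutes before, minutes after), rows in table order.
# Assumption: "1/5/365" is read left to right, i.e. 1 / (5 * 365).
EVENTS = [
    (F(1, 5 * 365), 0, 10),
    (F(1, 5 * 100), 10, 0),
    (20, 1, F(1, 10)),
    (1, 40, 30),
    (1, 20, 1),
    (10, 1, F(1, 60)),
    (2, 100, 100),
    (1, 10, 9),
    (2, 30, 20),
    (F(1, 30), 30, 1),
    (F(1, 5 * 100), 5, 10),
    (F(1, 5 * 100), 3 * 480, 1),
    (10, 4, 1),
    (5, 4, 1),
    (1, 30, 20),
    (10, 1, F(1, 2)),
    (1, 2, 0),
    (1, 20, 10),
]

# R = sum over all events of f_e * v_e, once per effort column.
before = sum(f * v1 for f, v1, _ in EVENTS)
after = sum(f * v2 for f, _, v2 in EVENTS)

print(round(before), round(after), round(before - after))  # 486 332 154
print(round(float(after / before), 1))                     # 0.7
```

The first sum also serves as the plausibility check: 486 min is close to the 480 min of one working day.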

The benefit is not so much due to using a text editor instead of an office application to write documentation. It is due to a good exploitation of all the possibilities opened by pure text (Requirements on Documentation and Requirements on Project).