Risk Analysis¶

Purpose¶

This risk analysis focuses on displaying the benefits of using light markup text, and specifically RST, for documentation of (software) projects.

This file also

attempts a quantitative estimate of the advantages and disadvantages: Risk analysis for documenting with RST
motivates the Requirements on Documentation

This whole documentation is an example of documentation written in RST. Specifically, the original of this file is a template that generates some of its parts automatically.

Only a part of the rstdoc documentation deals with the provided Python code.

Qualitative Analysis¶

Productivity¶

To achieve more progress with less effort, one must switch to tools that improve productivity.

The objective is to find a documentation format that is more productive than MS Office or LibreOffice for technical documentation.

DOCX or ODT cannot be well integrated into a software project. The reason does not lie in the writing itself, but in the organization of information and the further development and handling of text. They

have low accessibility: san, stq
have low traceability: s9v, s0t
produce too much redundancy: sgt, s8c
are no good for automation: sgt, s8c

As a result:

They lead to low productivity (sa7).
The quality of the content suffers.

Code is written in a text editor. Documentation must be written with the same text editor, because accessing information with two different tools creates overhead.

Formatting vs Content¶

r9g:	s45

The purpose of DOCX or ODT, and of the WYSIWYG idea in general, is to make formatting easy. The information coded in a human language is surrounded by layers of formatting.

The DOCX XML files are zipped, which makes them binary.
The XML is so full of formatting markup that it is not very readable. This also applies to non-zipped formats like DocBook.

But formatting should have no importance in development.

There is a less obtrusive alternative for formatting than via XML, HTML or even TeX:

light markup.

The content is important in documentation, not formatting.

Every bit of information needs a location. This location cannot be in a DOCX or ODT, because there it is not well accessible (Accessibility).

Certain content can be stored in a text database and reused in other documents via templating (r8d).

Data can be integrated better into text than into DOCX or ODT.

From an immediate but naive perspective it may seem easier to compose documents using DOCX or ODT. Given the complex task of bringing a big project to a consistent final state, it is not. More detailed reasons are the topic of this documentation.

Final Version¶

rio:	scf

This proposal targets the development phase.

After the project is over, documents are

archived or
placed on a web server

The formats usually used are:

PDF
HTML
DOCX or ODT

In case the final version is a printout, DOCX or ODT allow final formatting correction before printing.

Parallelism¶

Parallel processing is faster than serial processing. The productivity of a team increases if the team members can work in parallel.

In order for the developer to work independently he needs to be allowed to make his own decisions.

The decisions get their input from the documentation done by others and the information generated by the developer himself.

Every developer needs to understand how the product will be used.

The external requirements are kept

minimal
mostly soft, i.e. modifiable
with good rationale, especially for hard requirements

If a developer has an idea, a conflict or an issue, he can adapt the source code and the documentation and the tests, also of others, to resolve the issue by himself.

The chief developer only

does initial coordination
observes, i.e. reviews the changes, as other developers do, too.

The format of the documentation matters regarding independence:

changes can be traced in the VCS (Traceability)
information can more easily be found via grep-like tools over all files (Accessibility)
a final document file can be decomposed into separate source files for developers

Traceability¶

Trace changes¶

rnn:	s9v

Documentation is the description of the system in a human language. It is meant for humans. Nevertheless it is not a novel, but more like code.

It defines variables and values (concepts) like code.
It undergoes the same changes as code.
It has dependencies and a hierarchical structure like code.

Team members need to be able to follow changes. A version control system (VCS) like SVN or Git is needed to trace the changes in documentation.

Trace dependencies¶

rw9:	s0t, san

Code uses identifiers for its items (variables, functions, classes, …).

The documentation can use IDs to mark an item (paragraph, figure, table, …).

The ID can be used to reference an item from elsewhere, giving an m-n relation. A special case is 1-n, e.g. the ID of a header comprises all IDs of the paragraphs below.

Flat addressing: Relations are not reflected in the names, and in particular not in the IDs. The IDs also have no order. Flat and unordered IDs are more flexible, because they are independent of changes in structure and order.
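As a minimal sketch of how flat IDs can be traced by a script: the code below assumes link lines of the form "id: target, target" as they appear on this page (e.g. "rw9: s0t, san") and assumes that every ID is one lowercase letter followed by two alphanumeric characters. The function name trace_links and the sample lines are illustrative only and not part of rstdoc.

```python
import re
from collections import defaultdict

# Assumption: an ID is a lowercase letter plus two lowercase alphanumerics.
ID = re.compile(r"[a-z][a-z0-9]{2}")
# Assumption: a link line looks like "<id>:<whitespace><id>, <id>, ...".
LINK = re.compile(r"^(?P<src>[a-z][a-z0-9]{2}):\s*(?P<targets>.*)$")


def trace_links(lines):
    """Map each target ID to the set of IDs that reference it."""
    referenced_by = defaultdict(set)
    for line in lines:
        match = LINK.match(line.strip())
        if not match:
            continue
        for target in match["targets"].split(","):
            target = target.strip()
            if ID.fullmatch(target):  # ignore anything that is not an ID
                referenced_by[target].add(match["src"])
    return dict(referenced_by)


sample = ["rnn:\ts9v", "rw9:\ts0t, san", "r33:\tsan", "Plain text line."]
links = trace_links(sample)
for target in sorted(links):
    print(target, "<-", sorted(links[target]))
```

Because the IDs carry no structure or order, the map stays valid when items are moved between files or sections.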

Accessibility¶

Hypertext¶

r33:	san

The productivity depends much on how fast information can be found.

Access time: The time to access stored information.

The access time is fastest for information stored in the brain. The brain of most humans is very slow to memorize, though. And the brain forgets. Normally one can expect only the current topic to be present for immediate processing.

Related information can be looked up quickly if the documentation contains references that can be jumped to immediately (hypertext). The importance of this is shown by the immense success of hypertext on the internet.

To allow hypertext, referenced items must have a unique ID, comparable to a uniform resource identifier (URI).

Search¶

re4:	stq

Another way to find information is search.

For small to medium-sized systems, plain text search like grep suffices.
A larger text corpus needs indexing to speed up search.

Since source code and documentation describe the same system, the same concepts and IDs are likely to occur in both. Source code describes the details, though, and does not rephrase documentation items. The concept names and IDs of the documentation are therefore expected in source code comments rather than in identifiers.
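The grep-like search over code and documentation can be sketched in a few lines of Python. This is an illustration under assumptions, not a replacement for grep: the function name find_id, the file suffixes and the sample files are all hypothetical.

```python
import tempfile
from pathlib import Path


def find_id(root, needle, suffixes=(".rst", ".py")):
    """Return (file name, line number, line) for every text line containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8")
            for number, line in enumerate(text.splitlines(), 1):
                if needle in line:
                    hits.append((path.name, number, line.strip()))
    return hits


# Hypothetical project: the same ID occurs in documentation and in a code comment.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "design.rst").write_text("Intro\nThe limit s8c applies.\n", encoding="utf-8")
    Path(tmp, "code.py").write_text("# implements s8c\nLIMIT = 5\n", encoding="utf-8")
    Path(tmp, "notes.docx").write_bytes(b"PK\x03\x04 s8c")  # binary, skipped by suffix
    hits = find_id(tmp, "s8c")

for hit in hits:
    print(hit)
```

The binary office file is skipped here only because of its suffix; searching inside it would first require unpacking and parsing it.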

Community¶

rj4:	s9o

Community spreads the effort for tooling to more people.

The commercial model makes many people dependent on one company. In the case of DOCX, no alternative to MS Word renders documents in exactly the same way. This gives Microsoft a monopoly position, which leads to over-pricing.
The open source model is a decentralized community effort: sharing software costs nothing, so nothing is lost by sharing, and one gains from the effort of others.

The open source model is preferred, because one has more control.

one can add a feature if needed
one can fix a bug immediately

The total effort is less than for the commercial model.

Sustainability¶

ref:	sed

The information shall be accessible

over a long time
by many people

But if the format is only readable by one of many commercial tools,

at some point one may not want or be able to pay the license
some people might use a different tool

If one would like to change the tool, one cannot do so without substantial costs (vendor lock-in).

Because of the sustainability argument, a DOCX document needs to be converted to PDF, e.g. before sending to someone else or maybe even when checking into a VCS.

Redundancy¶

r90:	sgt, s8c

Redundancy: When the same information needs to be maintained in more than one place.

Less redundancy means higher productivity.

Redundancy

needs more resources
- more pages, more memory
- more time to read
- more time to write
- more effort when changing something
leads to inconsistencies

The reasons for redundancy are

barrier between formats: DOCX and text, computer language and human language
inability to link to information: no hypertext
inability to exploit functional dependencies: no automation available
normative boilerplate texts: no automation available

Automation¶

Scripting¶

r1p:	sgt, s8c

Why don’t we write code in MS Word or LibreOffice? Because it would be hard to parse away all the formatting.

It does not help to have a library like Office-XML-SDK or DOM, because the additional complexity through formatting elements still needs to be dealt with when parsing or creating documentation parts.

A format where formatting is less important and less obtrusive can be handled more easily via scripts.

Templates¶

r8d:

Many internet sites are generated with a mixture of text and a scripting language (PHP, JS, Python, …). Such templates allow

to mix text and data or
to generate text from data.

Text files can easily be generated from template files.
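A minimal sketch of generating text from data, using only string.Template from the Python standard library. The limit names and values are hypothetical, and this is not the template mechanism that rstdoc itself uses.

```python
from string import Template

# Hypothetical data that would otherwise be copied by hand into the text.
limits = {"max_speed": 120, "min_temp": -20}

item = Template("- $name: $value")
page = Template("Limits\n======\n\n$items\n")

# Generate one list item per data entry, then place the list in the page.
items = "\n".join(
    item.substitute(name=name, value=value)
    for name, value in sorted(limits.items())
)
text = page.substitute(items=items)
print(text)
```

If the same data is also read by the code, a limit value exists in one place only, and code and documentation cannot drift apart.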

Quantitative Analysis¶

Introduction to risk analysis¶

Risk analysis is basically a simulation.

Event:	A possible and recurring configuration of values of variables.
Frequency:	\(f\). How often an event is observed per time interval.
Probability:	\(p\). Compares the frequency of mutually exclusive values of one variable. At least one value must occur (exhaustiveness).
Rating:	\(v\). Judge an event by associating a value expressing harm/benefit, loss/profit or advantage/disadvantage.
Risk:	\(r\). The risk is frequency * rating:

\[r_{e} = f_{e} v_{e}\]

The total risk \(R\) sums over all events:

\[R = \sum_e r_e\]

Events can depend on other events functionally or statistically. One can start with the probability for an event once a day and then follow conditional probability chains to other events.

The risk analysis tries to analyse these dependencies to get to a more precise estimation of the frequency.

It is hard to get good estimates of frequencies in a complex real world, because there are

many variables
many dependencies
unknown probabilities

Because the frequencies will be inaccurate, one can replace numbers with coarser but more realistic categories, which need to be defined for the specific area (Table 1).

Table 1: Occurrence values for a medical device

Number	Category	Explanation
1	Unimaginable	Never occurs in the lifetime of device
2	Improbable	Occurs once in the lifetime of device
3	Remotely imaginable	Occurs once in 100 applications
4	Sometimes	Occurs once in 10 applications
5	Probable	Occurs once per application
6	Frequent	Occurs multiple times per application

The rating depends on the

area (health sector, finance, …) and the
circumstances (war or peace, rich or poor, …)

Table 2: Severity rating in the health sector.

Number	Category	Explanation
1	Non-essential	Minor injury not needing medical intervention
2	Minor	Small to moderate injury
3	Critical	Severe injury or death
4	Catastrophic	Multiple deaths

In this discrete description, the risk value can be

ac:	acceptable
alarp:	as low as reasonably practicable
nac:	not acceptable

The risk function is defined by a table. The total risk can

count each risk value occurrence
count each cell occurrence in the risk table (Table 3)

Table 3: Occurrence/Severity matrix. AC, NAC, ALARP are counts of events in the respective cell.

Risk R (rows: occurrence O, columns: severity S)
O \ S	1	2	3	4
6	ALARP	NAC	NAC	NAC
5	ALARP	ALARP	NAC	NAC
4	ALARP	ALARP	ALARP	NAC
3	AC	ALARP	ALARP	ALARP
2	AC	AC	ALARP	ALARP
1	AC	AC	AC	ALARP
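The table-defined risk function and the counting of risk values can be sketched as follows. The matrix is copied from Table 3; the function names and the list of events are hypothetical and serve only as illustration.

```python
from collections import Counter

# Rows: occurrence 6..1, columns: severity 1..4 (copied from Table 3).
RISK_TABLE = {
    6: ("ALARP", "NAC", "NAC", "NAC"),
    5: ("ALARP", "ALARP", "NAC", "NAC"),
    4: ("ALARP", "ALARP", "ALARP", "NAC"),
    3: ("AC", "ALARP", "ALARP", "ALARP"),
    2: ("AC", "AC", "ALARP", "ALARP"),
    1: ("AC", "AC", "AC", "ALARP"),
}


def risk_value(occurrence, severity):
    """Look up the risk value for one event."""
    return RISK_TABLE[occurrence][severity - 1]


def total_risk(events):
    """Count how often each risk value occurs over all events."""
    return Counter(risk_value(o, s) for o, s in events)


# Hypothetical events as (occurrence, severity) pairs.
events = [(6, 4), (3, 1), (2, 3), (1, 1)]
counts = total_risk(events)
print(dict(counts))
```

For these four sample events the count is one NAC, one ALARP and two AC.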

Countermeasures¶

r2m:

The purpose of the risk analysis is not to make a yes/no decision for a project, but to derive countermeasures that reduce the risk or prevent harm or financial loss.

The countermeasures change the probability of the events, by changing the causal dependencies between events.

The rating probably will not change, unless circumstances change.

In the Occurrence/Severity example, in Table 3:

before the measures: events are possibly in the upper right corner
after the measures: events are ideally only in the lower left corner
the events in the top/left to right/bottom diagonal have a trade-off and should be kept “as low as reasonably practicable”

Risk analysis for documenting with RST¶

rp5:

This risk analysis maps onto the above introduction to risk analysis as follows:

An event is a task a developer performs
The time consumed per event corresponds to severity (per developer)
The occurrence is counted per developer

Instead of the discrete values, numbers are used for time and occurrence. The numbers are rough estimates, because they depend heavily:

on the developer
on the tools he uses (editor and plugins)
how well he knows his tools
which phase the development is in
how long the project takes
how much documentation there is

The risk is the effort per developer.

\[R = \sum_{e} v_e f_e \tag{1}\]

Math 1:

\(e\): event to perform a task
\(v_e\): time consumed for task
\(f_e\): how often per day the task \(e\) occurs
\(R\): total effort per developer

The countermeasures taken lead to:

RST for documentation instead of MS Word or Libre Office

Events¶

The following events have a

one-line description
occurrence \(f\)
countermeasure
the effort \(v_1\) [min] before countermeasure
the effort \(v_2\) [min] after countermeasure

As a check of the estimate, \(\sum f v_1\) should give roughly one working day: \(1\text{d} = 8\text{h} = 480\text{min}\).

The estimates assume a project that takes

about a year
has 5 team members
needs to be consistently documented

description	occurrences/day	measure	time1/min	time2/min
Include documentation in the build system	1/5/365	sxr	0	10
Create separate version of documentation file (e.g. doc_1.1.docx)	1/5/100	s10	10	0
Look for file and open in editor then open another file in another tool (office application)	20	sed	1	1/10
Plan the design of a software component and document it	1	s8c	40	30
Review the changes in a documentation file	1	s9v	20	1
Look up an ID in a documentation file	10	san	1	1/60
Solve an implementation detail or a bug report	2		100	100
Discuss an interface with other team member consulting documentation	1	san	10	9
Describe an implementation detail or how a bug was fixed in the documentation	2	san	30	20
Merge contributions to a documentation file from more developers	1/30	sxr	30	1
A printout of the documentation shall be started (without printing time)	1/5/100	scf	5	10
Create a traceability file that shows how documentation items are linked	1/5/100	s0t	3*480	1
Search for all occurrences of a name in all project files	10	stq	4	1
Replace all occurrences of a name in all project files	5	stq	4	1
Refactor and re-describe parts of code and update documentation	1	s8c, san	30	20
Fix a formatting issue	10	s45	1	1/2
Check for consistency of limit values between code and documentation	1	s8c	2	0
Make the documentation of automatic tests or a test report of a test run	1	sgt	20	10

Result¶

The assumed 1 year project with 5 developers would take only 0.7 years.

Effort without RST: 486 min = 1.0 day
Effort with RST: 332 min = 0.7 day
Less effort (sa7): -154 min = -0.3 day
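The totals above can be reproduced from the event table with formula (1). The sketch below copies the occurrence and time columns in table order and assumes that an occurrence such as 1/5/365 means 1/(5*365) per day, i.e. once per team of 5 in 365 days; this reading is an assumption, although it does reproduce the stated totals.

```python
from fractions import Fraction as F

# (occurrences per day f, minutes before v1, minutes after v2), in table order.
EVENTS = [
    (F(1, 5 * 365), 0, 10),
    (F(1, 5 * 100), 10, 0),
    (20, 1, F(1, 10)),
    (1, 40, 30),
    (1, 20, 1),
    (10, 1, F(1, 60)),
    (2, 100, 100),
    (1, 10, 9),
    (2, 30, 20),
    (F(1, 30), 30, 1),
    (F(1, 5 * 100), 5, 10),
    (F(1, 5 * 100), 3 * 480, 1),
    (10, 4, 1),
    (5, 4, 1),
    (1, 30, 20),
    (10, 1, F(1, 2)),
    (1, 2, 0),
    (1, 20, 10),
]

# R = sum over events of v_e * f_e, once for each time column.
before = sum(f * v1 for f, v1, _ in EVENTS)
after = sum(f * v2 for f, _, v2 in EVENTS)

print(round(before), "min without RST")      # 486
print(round(after), "min with RST")          # 332
print(round(float(after / before), 1), "of the original effort")  # 0.7
```

Exact fractions avoid rounding noise in the small occurrence values; the results are rounded only for display.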

The benefit is not so much due to using a text editor instead of an office application to write documentation. It is due to a good exploitation of all the possibilities opened by pure text (Requirements on Documentation and Requirements on Project).