Real Random Numbers Without Additional Hardware

Maurice Castro
Software Engineering Research Centre
Email: maurice@serc.citri.edu.au

Abstract

Random numbers are used in many areas of computing, ranging from games and simulation through to key generation in cryptography. Many applications can be satisfied with pseudo-random numbers. However, some applications, such as generating keys for use in cryptography, require genuine random numbers.

The paper describes a method of extracting a string of random bits from the behaviour of the operating system scheduler. To reduce the load imposed on the system by the random number generator and to improve the properties of the random numbers generated, the random bits gathered from the scheduler are used to perturb a pseudo-random number generator.

The result is a stream of random numbers with a computational cost approaching that of a sequence of pseudo-random numbers, but with the properties of a sequence of real random numbers.

1 Introduction

Random numbers are used in many areas of computing, ranging from games and simulation through to key generation in cryptography. Many applications of sequences of random numbers are satisfied by using computational methods to produce a pseudo-random number sequence - a sequence of values that has the appearance of being random when only a small segment of the sequence is known and the algorithm for selecting the next number is unknown. Some applications, such as generating passwords for a password capability-based operating system (Anderson 1986, Castro 1995) and generating keys for use in cryptography, require genuine random numbers. Genuine random numbers have traditionally been produced through the use of measurements of a physically random process.

Other approaches which use hardware found in an existing system (Davis 1994) typically require modifications to the operating system or access to the hardware at a privileged level. The method proposed here operates at the user level.

The paper addresses a method of extracting a string of random bits from the behaviour of the operating system scheduler. To reduce the load imposed on the system by the random number generator and to improve the properties of the random numbers generated, the random bits gathered from the scheduler are used to perturb a pseudo-random number generator which has the property that a long series of values is required before it is possible to determine the next value (Wallace 1990).

The implementation and testing is being carried out on a number of Unix-based platforms.

2 Scheduling

Multiprogrammed multi-user operating systems typically employ a pre-emptive scheduler. Periodically, processes are interrupted and suspended to allow other processes to be scheduled. The algorithm employed by the scheduler aims to: In general, scheduling algorithms can adjust the length of the quanta - the time a process is allowed to execute before it is next interrupted - and the order in which processes are scheduled.

There are several varieties of Unix and a number of different scheduling algorithms have been employed. Two common varieties are discussed.

The term ‘priority’ is used in the following discussions in the normal sense of the word in English: the highest priority is the most important. This is distinct from the concept of priority level used in the UNIX operating system where the process with the smallest priority level value is the next process to run.

2.1 4.3BSD

This section briefly identifies the features of the short-term scheduling algorithm used in 4.3BSD (Leffler 1990). The scheduler employs multi-level feedback queues. The scheduler has several run queues and selects jobs from the highest priority non-empty queue. This queue is operated on under a ‘round robin’ regime. Blocked processes are removed from the run queues. The priority of a process determines the movement of a process between queues. When the priority of a process falls sufficiently, it is moved to a lower priority queue; conversely, when the priority of a process increases sufficiently, it is moved to higher priority queue. The scheduler is biased towards interactive programs. It characterises interactive programs as programs which tend to not use their complete time slice. Programs that use their full-time slices are decreased in priority, while processes which have been blocked for a significant length of time increase their priority.

2.2 SYSVR4

Process scheduling under UNIX System V Release 4 differs from earlier versions of the operating system (Goodheart 1994). In earlier versions, a single scheduling queue was used and the process with the highest priority would be executed. SYSVR4 introduced the concept of classes of processes. Two classes of processes are currently supported: time-sharing processes, and real-time processes. Time-sharing processes are managed using a ‘round robin’ mechanism similar to the process-scheduling mechanism used by earlier versions of SYSV. Real-time processes are given higher precedence than all time-sharing processes; furthermore, real-time processes can control the length of their time quanta and their priority within the real-time process class.

3 Principles

Although computers are constructed from components which have deterministic behaviours, the system constructed from those components displays complex and sometimes non-deterministic behaviour. Non-deterministic behaviour is undesirable because it reduces an observer's understanding of the system. Computer system designers attempt to reduce the amount of non-deterministic behaviour exhibited by their systems; however, most systems have failed to eliminate this type of behaviour. Real random numbers are derivable from non-deterministic behaviour. This section identifies a source of non-deterministic behaviour and describes a method of generating random numbers through the interaction of the scheduler, the real-time clock and the file system.

3.1 A source of random behaviour

A cheap source of random behaviour is found in many low cost electronics kits for electronic dice and ‘executive decision makers’ (Dick Smith Electronics 1980). These devices use a counter and a high speed clock. A person presses a button and the counter is connected to the clock. When the button is released, the value of the counter is displayed. These devices work by sampling a rapidly changing stream of events with the sampling rate driven by a much slower and less accurate clock.

The scheduler of a pre-emptive multi-tasking operating system acts as a rapidly changing stream of events when viewed as a process operating on a machine. We exploit this property to produce a sequence of random numbers. Furthermore, we illustrate that the order of events is hard to predict in the short-term, and impossible to predict in the long-term, thus guaranteeing true random behaviour.

3.1.1 Long-term unpredictability

A well-known problem in designing computer hardware is the tendency for two synchronised clocks to diverge.

  figure36

Figure 1: Relationship of two clocks of similar period

When two clocks are sampled, the result can be classified into three classes (illustrated in Figure 1): lagging, synchronised and leading. If the clocks are sampled infrequently enough and the clocks are derived from two different time sources, the result of the sampling process will have an indeterminate behaviour.

This principle can be used directly by locating two independent clocks within a computer system and performing sampling. Classifying the relationship between the two clocks into lagging, leading or synchronised can be used to provide an indeterminate pattern which can be represented as a bitstream. Two candidate clocks for this type of operation are the real-time clock - although many systems derive ‘ticks’ from the system clock, the real-time is often corrected by periodic references to an independent source of wall-clock time - and the system clock. The requirement that sufficient time elapse between sampling the clocks makes the method impractical for generating large volumes of random data, although it ensures that long-term behaviour is truly random.

3.1.2 Short-term unpredictability

The tendency for clocks to diverge can be exploited in a way that provides short-term variation. Two synchronised clocks can be run in a tournament. The first clock to complete a cycle is declared the winner. Due to variations in the clocks, the results of the tournaments can vary. This provides a degree of short-term variation. It is important to note that the results are likely to be biased towards one of the two possible results. To provide a useful stream of random bits, it is necessary to run many tournaments and remove the bias.

This approach is implemented using the scheduler and a set of identical processes. The processes are instructed to wakeup at the same time beginning the tournament. The processes perform some equivalent tasks and then perform an operation which employs mutual exclusion of the other process. By monitoring the result of the operation, the outcome of the tournament can be determined. On a multiprocessor, competitions run in parallel. On a uniprocessor, the competitions are run in sequence against the time quantum allocated to each process.

 

figure46

Figure 2: Two Processes Competing to Enter a Mutex Region

Figure 2 illustrates the operation of a competition employing two processes on a multiprocessor and on a uniprocessor system. In the illustration, two identical processes - pa and pb - are set in operation. Each process performs a computational task and, at the end of the task, each process attempts to perform a mutual exclusion operation and record its name. When a process ends, it releases the mutual exclusion, allowing the other process to record its name. In the multiprocessor system, the two processes can compete directly against each other. On a uniprocessor system, one of the processes is scheduled first. At the end of the first process's time quanta, another process is scheduled. After completing its computational task, one of the processes enters the mutual exclusion region and writes its name. The lagging process records its name before terminating. Two random elements exist in the uniprocessor example - the selection of the first process to run, and the competition to enter the mutual exclusion region before a process's time quanta ends.

4 Implementation

This section describes the mechanism used to recover random numbers from the scheduling of processes. The algorithm operates on both uniprocessors and multiprocessors. The method is divided into several parts: Each of these components is described in the remainder of this section.

4.1 Idle operations

Optimising compilers make the task of generating an operation, which does nothing except absorb a fixed number of clock cycles, difficult. Traditionally, the generation of an idle operation is relegated to a small piece of hand-written assembly code similar to that found in Figure 3.

 

figure61

Figure 3: An Idle Operation Written In i486 Assembly Code

This method never exposes the code to optimisation, allowing the idle operation to be performed exactly as the programmer specified.

A requirement that the code be easily portable across systems prevented the use of assembly code; assembly code is platform-specific and hence, not portable.

The approach employed exploited the C programming language's support for separate compilation. Separate compilation allows functions in other translation units (in C, a translation unit typically equates to a file) and libraries of precompiled code to be called. The linker is responsible for satisfying these external references at link time. To support separate compilation, a compiler must generate a call to a function in another translation unit. The optimiser cannot remove this call since the compiler is unaware of the action of code in other translation units on the process state. This ensures that at least the time taken by setting up, performing and returning from a function call is consumed by the idle operation code. Figure 4 illustrates code fragments used to implement the idle operation.

 

figure70

Figure 4: An Idle Operation Written In C

A system call is performed before the idle operation is commenced to increase the chance of synchronising the idle operations with a quanta boundary.

4.2 Estimating idle operations per quanta

Calculating the number of idle operations required to fill a quanta is complicated by the fact that Unix does not provide a mechanism for synchronising operations with a quanta boundary; furthermore, there is variation in the length of a quanta. The mechanism described here does not return a highly accurate result, but the result is sufficient for the purpose of generating random numbers.

The beginning of a quanta is generated by one of two events: a process resuming after it has been pre-empted and returning from a system call which causes the scheduler to reschedule the process. The sleep and the select system calls can cause processes to be rescheduled. This implementation employs select and makes use of the sysconf system call - provided by POSIX - to return the maximum length of a quanta in microseconds.

The length of a quanta in microseconds is used as a goal time. The algorithm increases the number of times the idle operation is called while measuring the number of microseconds that elapse. When the elapsed time is greater than the goal time, the last increment is subtracted and the increment size halved. This process is repeated until the increment has a value of zero. Figure 5 shows a plot of the number of iterations per trial and the size of the increment against the trial number.

  figure87

Figure 5: Determining the number of idle iterations per quanta

4.3 Conducting Tournaments

The operation of a tournament is described in two parts: the parent process and the child processes.

The parent conducts a tournament by forking a number of child processes, creating the result file, waiting until the child processes have completed and reading the results file for the tournament. The result is returned to the caller of the parent.

When a child process is started, it locates a unique number to use as its name by consulting a counter in the copied state of the parent process. Each child will see a different number since the state of the parent process is copied after the counter has been incremented. The child processes suspend themselves until a fixed period of time after a reference time initially determined by the parent. After the child process awakes, it attempts to gain an exclusive lock on the results file. If the process succeeds, it writes its name into the file and surrenders the remainder of its quanta. If the process fails, it performs a sufficient number of idle operations to nearly fill out its quanta, surrenders anything remaining of its quanta and then retries accessing the file. After a child process has written its name into the result file a specified number of times, the child terminates. The number of times a child writes its name into the result file is known as the number of repetitions.

The result of the parent process's operations and the child processes' operations is a pattern which is returned to the caller.

Figure 6 and Figure 7 show relevant code fragments to implement the method of generation of random numbers described above.

 

figure98

Figure 6: Code Fragments for the Parent Process

 

figure105

Figure 7: Code Fragments for the Child Process

4.4 Enumerating tournament results

The list of all possible tournament results is easily computed using a recursive algorithm. Initially, a set of the number of repetition copies of all the possible processes is generated. Patterns are constructed by selecting previously unused elements from the set of names and prepending it to a string. Figure 8 shows a partial example of computing the set of results.

 

figure114

Figure 8: Generating Potential Result String

After a complete set of patterns is generated, all the duplicate patterns are removed and the patterns are numbered continuously from zero.

When a result is recovered from the random number generator, the list is searched and the number allocated to the pattern is returned. This allows a single integer to be returned to represent the pattern.

The cost of generating the list is fairly high; however, this cost is amortised over all random numbers generated with the same number of repetitions and number of child processes.

5 Results

The mechanism is currently in the early stages of investigation. Initial experiments have resulted in correlated results. However, methods for improving the spread of output values and hence, reducing the correlation of the raw results, are being investigated. This section details the results of a series of experiments carried out on a uniprocessor FreeBSD-based system and a two-processor Solaris system. The results are encouraging as the spread of results is far more even than expected. Furthermore, the code and algorithm have proven to be successfully portable over different computer architectures.

Four experiments are reported on each of the two platforms. The parameters for each experiment are listed in Table 1.

 
Name # Processes # Repeats # Results
3.1.100 3 1 100
3.2.500 3 2 500
4.1.5000 4 1 5000
5.1.10000 5 1 10000

The ‘# Processes’ column indicates the number of child processes created. The ‘# Repeats’ column refers to the number of times a child process writes its name into the results file. The ‘# Results’ column refers to the number of random numbers generated.

Table 1: Experiment Parameters

The number of results generated was increased for experiments with a larger number of potential outcomes. The relative number of results gives a simple measure of the coverage of the results generated over the set of possible results. A first-order estimate of the repetition of patterns is found by counting the number of times a given value appears in a row.

 
Experiment Platform # Possible Patterns # Different Patterns # Different Lengths
3.1.100 FreeBSD 6 6 6
3.2.500 FreeBSD 120 9 5
4.1.5000 FreeBSD 24 18 18
5.1.10000 FreeBSD 120 45 13
3.1.100 Solaris 6 6 5
3.2.500 Solaris 120 15 8
4.1.5000 Solaris 24 24 10
5.1.10000 Solaris 120 117 6

Table 2: Gross Features of Result Sets

  figure177

Figure 9: Distribution of values produced by random number generator

  figure186

Figure 10: Distribution of Lengths of Strings of Repeated Values

  figure195

Figure 11: Distribution of values produced by random number generator

  figure204

Figure 12: Distribution of Lengths of Strings of Repeated Values

Table 2 and Figures 9, 10, 11 and 12 summarise the results generated in the experiments. The graphs of values show that some values are highly preferred by the random number generator; however, a wide spread of results are produced even if in only small numbers. The graphs of the length of repeating values are particularly encouraging as they tend toward short string lengths. This indicates that although some values are highly favoured, the favoured values tend not to occur in long continuous sequences.

An interesting result can be seen from the 3.2.500 series and the 5.1.10000 series. The graphs and other experiments indicate that increasing the number of repetitions appears to be less beneficial for improving the spread of results than increasing the number of processes.

The multiprocessor results are generally of higher quality, in that the number of patterns produced is higher and the lengths of repeating strings tend to be shorter. This result is expected as there is direct contention between processes for access to files, in addition to the indirect method available on uniprocessor systems.

6 Conclusion

This paper presents a mechanism for generating random numbers by exploiting the skew between a single process's consumption of a quanta of CPU time and the quanta of CPU time delivered by the scheduler. The method employs no special hardware devices, can be implemented at user level, and has been shown to be effective on both single-processor and multiprocessor systems running Unix.

The results presented showed that although the random numbers were not of high quality (the values tended to be poorly distributed), the behaviour was random.

Future work will focus on optimising the quality of the raw random number stream. Currently, the raw random number stream could be used to perturb a pseudo-random number generator yielding real random numbers with a uniform distribution. Using a perturbed random number generator would allow random numbers with a cost approaching that of pseudo-random numbers to be generated, but with the added advantage that the series produced would have a truly random behaviour.

Bibliography

1
Anderson, M., Pose, R.D. & Wallace, C.S. (1986) A Password-Capability system. The Computer Journal January, 29(1):1-8

2
Castro, M. (1995) The Walnut Kernel: User level programmer's guide. Technical Report 95/222, Department of Computer Science, Monash University, Clayton, Victoria

3
Davis, D., Ihaka, R., & Fenestermacher, P. (1994) Cryptographic randomness from air turbulance in disk drives in Desmedt, G. (ed.) Advances in Cryptology - CRYPTO'94 August, 114-120. Springer-Verlag.

4
Dick Smith Electronics (1980) Fun Way Into Electronics: Volume 2, 2nd edn, Dick Smith Electronics Pty Ltd, North Ryde, Sydney.

5
Goodheart, B. & Cox, J. (1994) The magic garden explained: the internals of {UNIX System V} release 4, an open-systems design, Prentice Hall, New York

6
Leffler, S. J, McKusick, M. K., Karels, M. J. & Quarterman, J. S. (1990) The Design and Implementation of the 4.3BSD UNIX Operating System, Addison-Wesley, Reading, Massachusetts

7
Tanenbaum, A. S. (1992) Modern Operating Systems, Prentice-Hall, Englewood Cliffs, New Jersey

8
Wallace, C. S. (1990) Physically random generator. Computer Systems Science & Engineering April, 5(2):82-88


Organised by: AUUG'96 & CSU Return to Conference Proceedings