The initial application we developed to test the architecture was an MPEG audio system which allows terminals to play CD quality audio from a general purpose file server. To compensate for jittering and delay, prefetching techniques were used within the terminal software.
The system has three different interfaces: a far interface with a cordless mouse for non-keyboard interaction, a near interface which is used when working in front of the computer, and a programming interface allowing programs, for example a clock and an alarm, to interact with the audio system.
For our far interface we developed a new mouse paradigm which reduces track-ball movement and cursor dependence making cordless mice easier to use.
The next section presents the architecture of the MPEG audio system and its main components. In section 4, we discuss MPEG audio compression and our audio collection. Modifications to the Plan 9 file server to cope with MPEG storage and play-back requirements are discussed in section 5. In section 6, we present prefetching techniques used by the terminal, as well as implementation details of standard audio operations. Sections 7 and 8 explain our user interfaces, the programming interface and a disk-jockey program. We finish with some conclusions in section 10.
The prefetching software pre-loads a few kilobytes from the file server to compensate for network and file server jittering and delay. It has a pool of buffers and a process which monitors the number of free buffers in the pool, requesting data from the file server to keep the pool as full as possible. We found it easier to implement standard operations such as CHANGE, STOP and SEEK, in the prefetching program rather than in the interfaces, because the choice of data that should be prefetched depends on the current mode, for example PLAY, FAST-FORWARD or REWIND. The prefetcher exports a file system with two files: a ctl file to issue operations and a status file to read administrative information.
The main commands we can issue to the ctl file are:
Play a track of a particular CD
Pause/Resume the current track
Stop the current track
Seek to the nth second of the current track
Reading the status file returns track name, current position within the track and total length of the track.
Having a file system interface to the low-level software, prefetcher and decoder, allows applications to be constructed using different systems and programming languages, from textual interfaces based on shell scripts to graphical ones using high-level languages, and promotes re-use of the low-level software which is often more complex and time-consuming to develop than the user interface.
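As an illustration of this language independence, here is a minimal Python sketch of a client. It assumes the prefetcher's files appear at /dev/fetcher/ctl and /dev/fetcher/status, that a command is a single line of text such as "play" followed by a track path, and that the status file contains 'sec/len' as two integers, as in the dj script later in this paper. The function names are hypothetical and are not part of the system.

```python
def send_command(command, ctl_path="/dev/fetcher/ctl"):
    """Write one command line to the prefetcher's ctl file."""
    with open(ctl_path, "w") as ctl:
        ctl.write(command + "\n")


def read_status(status_path="/dev/fetcher/status"):
    """Return (position, length) parsed from an assumed 'sec/len' status line."""
    with open(status_path) as status:
        sec, length = status.read().strip().split("/")
    return int(sec), int(length)
```

A caller would issue, for example, send_command("play /n/cod/cd/3/5") and then poll read_status() until the two values are equal.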
The output of the prefetcher is connected to an MPEG decoder, which could be software based or hardware assisted. We use a decoding program called maplay (Bading), which decodes MPEG audio layer II in real-time on a 100MHz Pentium PC. The prefetcher and the decoder are connected through a pipe.
The main functions of our user interfaces are to provide ways to select CDs and tracks to be played, present status information and apply standard commands. User interfaces translate user actions into the corresponding strings to be written to the ctl file and present the status information read from the status file. In section 6, we describe two different user interfaces to the system.
After some investigation, we chose 128Kbit/s as our compression rate which gives CD quality audio and storage requirement of 960Kbytes per minute.
A 128Kbit per second stream is composed of sequences of twenty-three 418-byte frames plus one 417-byte frame, and twenty-four 418-byte frames plus one 417-byte frame. Using this information, it is easy to translate an offset in seconds within a track to a byte-offset within a track file.
In a 128Kbit/s fixed-rate compressed stream, the seconds to offset formula is 128/8*1024*seconds. However, we have to round this offset to the next frame boundary to keep maplay synchronised.
It is also possible to convert from a byte-offset within the track to a number of seconds or milliseconds since the beginning of the track.
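The two conversions can be sketched in Python as follows. The sketch uses the seconds-to-offset formula exactly as stated above and assumes that every track file starts on a frame boundary and repeats the 49-frame pattern in the order described (23 x 418, 1 x 417, 24 x 418, 1 x 417). The function names are hypothetical and this is not the prefetcher's actual code.

```python
import bisect

# Seconds-to-offset factor as stated in the text: 128/8*1024 bytes per second.
BYTES_PER_SECOND = 128 * 1024 // 8

# One assumed cycle of 49 frames: 23 x 418, 1 x 417, 24 x 418, 1 x 417.
FRAME_SIZES = [418] * 23 + [417] + [418] * 24 + [417]
CYCLE_BYTES = sum(FRAME_SIZES)

# Start offset of every frame within a cycle, plus the end of the cycle.
FRAME_STARTS = [0]
for size in FRAME_SIZES:
    FRAME_STARTS.append(FRAME_STARTS[-1] + size)


def next_frame_boundary(offset):
    """Round a byte offset up to the next frame boundary."""
    cycles, rest = divmod(offset, CYCLE_BYTES)
    index = bisect.bisect_left(FRAME_STARTS, rest)
    return cycles * CYCLE_BYTES + FRAME_STARTS[index]


def seconds_to_offset(seconds):
    """Byte offset for a seek to the given second, aligned to a frame."""
    return next_frame_boundary(BYTES_PER_SECOND * seconds)


def offset_to_seconds(offset):
    """Inverse conversion: seconds elapsed at a given byte offset."""
    return offset / BYTES_PER_SECOND
```

Precomputing the frame start offsets for one cycle keeps the rounding step to a single binary search, whatever the length of the track.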
Currently, our audio database averages 3.8Mbytes per track and 48MBytes per CD. Our design goal for the home multimedia system is to service 5 video and 5 audio streams simultaneously. The system is currently capable of servicing 8 simultaneous audio streams. The number of users has been limited by the lack of CPU power of our Plan 9 terminal computers to decode MPEG in real time. Table 1 shows the decoding capacity of the available machines. The M-No is the number of MPEG streams that a machine can decode in real-time. For example, a SPARC ELC is not quite able to decode one MPEG stream, while a Magnum R4400 can decode three MPEG streams in real-time.
| M-No. | CPU | Operating System |
| 0.41 | 486DX /33 | Linux 1.2.9 i48 |
| 0.75 | 486DX2/66 | Linux 1.3.3 i486 |
| 0.81 | 486DX4/100 | Linux 1.2.13 i486 |
| 0.93 | SPARC ELC | Solaris 2.4 |
| 1.65 | Pentium 60/66 | Deskpro XE560 |
| 1.73 | Pentium 60/66 | Linux 1.3.56, 60MHz |
| 2.54 | Pentium 90/100 | Linux 1.2.13 i586 |
| 3.06 | SPARC 50 | Solaris 2.4 |
| 3.24 | R4400 | ULTRIX 4.3 1 RISC |
Table 1: Decoding Capacity
Table 2 displays the relationship between compression time and playing time. For example, the machine ciml can compress 1 minute of audio data in 3 minutes of real time, while barossa can compress 1 minute of audio data in 33 minutes of real-time.
| Name | Speed | System | MHz | Operating System |
| cimlr | 2:1 | DEC alpha (DEC3000-M500) | 275 | OSF/1 V3.0 |
| ciml | 3:1 | DEC alpha (DEC3000-M500) | 150 | OSF/1 V3.0 |
| uuscss | 5:1 | i586 PC clone | 90/100 | Linux 1.2.13 |
| ml2 | 6:1 | DECstation 5000 MIPS R4400 | 60 | Ultrix 4.4 |
| spring | 7:1 | Sun SPARCstation-5 | 110 | Solaris 2.4 |
| staff | 8:1 | Sun SPARCserver-1000 | 50/100 | Solaris 2.4 |
| karl | 9:1 | MIPS R4000 | 50/100 | RISC/os 5.01 |
| smallpox | 18:1 | i486 PC clone | DX4/100 | Linux 1.2.13 |
| joyce | 18:1 | MIPS R3000 (Magnum 3230) | 25 | RISC/os 4.52 |
| hunter | 24:1 | Sun SPARC Classic | 50 | Solaris 2.1 |
| barossa | 33:1 | Sun SPARC 4/65 | 25 | SunOS 4.1.3 |
| anthrax | 34:1 | i486 PC clone | 33 | Linux 1.2.9 |
Table 2: Compression time versus playing time
We have developed a distributed compression system in which we extract all the tracks first and then send them to different machines for compression, reducing compression time to approximately two hours.
Our file server is a modified version of the Plan 9 file server (Thompson 1995) in which we increased the block size from 8K to 64K and use a more aggressive read-ahead strategy.
Using 64K-blocks we only need one block every 4 seconds to serve a 128Kbps stream, reducing the number of seek operations per second. The maximum number of streams we can serve using a SCSI disk rotating at 4500RPM, with 20ms advertised average seek time, 4MB transfer rate, and controller overhead of 2ms, is:
However, we still have to send the data through an Ethernet to reach our terminals. We observed a 130ms average delay to read 64K from our file server using MIPS based Magnum 3000 machines, which reduces the maximum theoretical number of users to 4*1000ms/130ms, approximately 30.
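The disk and network bounds discussed above can be approximated with a short calculation. The Python sketch below rests on assumptions that are not stated in the text: each 64K block costs one average seek, half a rotation of latency, the transfer itself and the controller overhead, and the 4MB transfer rate is read as 4 Mbytes per second. The function names are hypothetical, and the disk estimate is not necessarily the figure produced by the original formula.

```python
BLOCK_BYTES = 64 * 1024
STREAM_BYTES_PER_S = 128 * 1024 // 8

# A stream consumes one block every BLOCK_PERIOD_MS milliseconds (4 seconds).
BLOCK_PERIOD_MS = 1000 * BLOCK_BYTES / STREAM_BYTES_PER_S


def disk_bound(rpm=4500, seek_ms=20.0, controller_ms=2.0,
               transfer_bytes_per_s=4 * 1024 * 1024):
    """Streams one disk can feed if each block costs a full seek (assumed model)."""
    rotational_ms = 0.5 * 60_000 / rpm  # half a rotation on average
    transfer_ms = 1000 * BLOCK_BYTES / transfer_bytes_per_s
    service_ms = seek_ms + rotational_ms + transfer_ms + controller_ms
    return int(BLOCK_PERIOD_MS // service_ms)


def network_bound(read_ms=130.0):
    """Streams the server can feed given the observed delay per 64K read."""
    return int(BLOCK_PERIOD_MS // read_ms)
```

Under these assumptions disk_bound() evaluates to 90 and network_bound() to 30, the latter matching the figure quoted in the text.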
Initially, the project used a magneto-optical juke-box as the main file server and a magnetic disk as a cache. The juke-box was a superseded HP Series 6300 with 20G capacity, enough space for 400 CDs. The juke-box had one CD drive and stored the data on 32 platters that were placed in the drive when needed.
To reduce arm movement we implemented a whole-file read-ahead strategy in which we read ahead all the blocks of the file after the first read operation. In this way, when reading a CD-track from the juke-box, we extract all the blocks associated with the track at once. The advantage of this strategy can be illustrated if we imagine two non-cached tracks being read from the juke-box at the same time using a one-block read-ahead strategy. In this case, the juke-box will probably swap disks once per read operation, which takes 6-8 seconds, making it impossible to read one block every 4 seconds. The whole-file strategy works quite well in our server because almost all our files are immutable and we sequentially sorted the free list at formatting time, obtaining a high degree of spatial locality.
However, using this whole-file strategy we sometimes would have to wait for a track to be played while other tracks are being moved to magnetic disk. A simplistic formula for the waiting time is:
where Tracklength is the average length of a track in seconds, Readtime is the average time to read a CD track from the juke-box and n is the maximum number of users.
Using our juke-box's parameters in the preceding formula, we have an average delay of 8 seconds to start playing a track. However, the formula does not account for caching, so we rarely experienced such delays. With a 2GB cache disk, our hit ratio was near 70%, so only about 30% of requests incur the juke-box delay, which leads to an average delay of 0.3 x 8 = 2.4 seconds.
Unfortunately, our juke-box was unreliable and lack of support forced us to move to a more traditional file system, composed of two 4G byte magnetic disks, having a maximum installed capacity of 170 CDs.
The audio data is stored in a simple directory structure with one directory per CD, containing one file per track and an index file with administrative information about the CD and a JPEG file with the cover of the CD for use in the user interface.
The fetcher monitors the pool, requesting more data from the file server whenever possible. The sender extracts the next chunk of data from the pool and passes it to the MPEG decoder. Another process, called commander, serves the ctl and status files and translates those operations into pool actions or system calls.
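The interplay between the fetcher and the sender is a bounded producer/consumer arrangement, which can be sketched in Python as follows. The pool is modelled as a bounded queue, the chunk size and pool size are arbitrary assumed values, and the commander process is omitted. This is an illustration, not the actual implementation.

```python
import queue
import threading

CHUNK_BYTES = 4096  # assumed chunk size
POOL_SIZE = 8       # assumed number of buffers in the pool


def fetcher(source, pool):
    """Fill the pool from the file server; put() blocks while no buffer is free."""
    while True:
        chunk = source.read(CHUNK_BYTES)
        pool.put(chunk)  # an empty chunk marks the end of the track
        if not chunk:
            return


def sender(pool, decoder):
    """Take the next chunk from the pool and pass it to the decoder."""
    while True:
        chunk = pool.get()
        if not chunk:
            return
        decoder.write(chunk)


def play(source, decoder):
    """Run a fetcher thread alongside the sender until the track ends."""
    pool = queue.Queue(maxsize=POOL_SIZE)
    worker = threading.Thread(target=fetcher, args=(source, pool))
    worker.start()
    sender(pool, decoder)
    worker.join()
```

Because the queue is bounded, the fetcher runs ahead of the sender by at most POOL_SIZE chunks, which is the amount of data available to absorb network and file server jitter.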
Following are the basic commands which we can write to the ctl file and how they affect the pool:
Fast-forward and rewind could be implemented by taking a subset of the stream's frames. For example, we could take 10% of the frames of a stream and send those frames to the decoder. Unfortunately, this mode would increase MPEG requirements by a factor of 10. We implemented neither fast-forward nor rewind in our system, because we did not feel they were important for audio and because our seek command covers most of these kinds of interactions.
Figure 2 shows the screen layout used by our far interface. The left window is a browser of our on-line CD collection. The right window is the control panel, which allows for track selection, status report and basic operations. The right bottom window is a mixer application to control volume and other audio attributes.
Figure 3 shows our cordless mouse, which consists of three buttons and a track-ball. After using this mouse for a few weeks with traditional interfaces, as for example Web browsers and media players, we found that:
To reduce track-ball movement, the need to press a button while moving the track-ball, and cursor dependence, we developed a technique to assign functions and behaviour to the mouse that we feel is better for far interfaces. When using the cordless mouse:
To scroll through the CD list we press the left button to go down and the right button to go up. The up or down movement accelerates if the button is held down. This is simpler than having to press one button and then move the track-ball up and down to control a scrollbar. Button-based scrolling can be done with one hand and without having to look at the cursor on the screen, or at the screen at all.
We found that by moving away from the traditional select-and-drag model, we can exploit the capabilities of cordless mice and construct comfortable interfaces. However, it means that new graphical libraries have to be developed. We modified the Panel library (Duff 1995) in such a way that most of its objects support our new paradigm.
Font size was one of the main problems we experienced with our far interface. Using 16x16-bit fonts (the biggest Plan 9 font) with a 1028x1024 screen, the interface is comfortable to use in the range 1 to 2 metres, but becomes difficult to use at 3 metres and almost impossible at 4. We are working to improve this interface using larger fonts and a better layout.
Here is a simple shell script, called dj, which selects music at random from the database and plays it.
    #!/bin/rc
    # NCD is the number of CDs in the database
    while() {
        #
        # select cd and track to play
        #
        cd=`{rand $NCD}
        ntrack=`{ls /n/cod/cd/$cd/*.mp2 | wc -l}
        track=`{rand $ntrack}
        #
        # Send the play cmd to the prefetcher
        #
        echo play /n/cod/cd/$cd/$track > /dev/fetcher/ctl
        #
        # Wait for the track to finish, polling
        # the status file. The format is 'sec/len'.
        #
        off=0
        len=100
        while(! ~ $off $len) {
            sleep 1
            off=`{awk -F'/' '{print $1}' < /dev/fetcher/status}
            len=`{awk -F'/' '{print $2}' < /dev/fetcher/status}
        }
    }
This small shell script illustrates the ease with which the audio system can be controlled. More sophisticated versions of this script are being developed that deliver a personalised 'mix' of tracks.
A Web-based interface to the audio system has also been built. It uses HTML pages and CGI scripts to allow the user to browse the CD database, select a CD or set of tracks from a given CD and initiate playing. This interface, while more portable than the Plan 9 interface, does not offer the same level of interaction.
For the audio system, we plan to:
By moving away from the traditional select-and-drag model, we can better exploit the capabilities of cordless mice and construct comfortable far interfaces.