An MPEG Audio System

Alberto Nava, Bruce Janson & Bob Kummerfeld
Basser Department of Computer Science,
University of Sydney
Email: bob@cs.su.oz.au

Abstract

The design and implementation of a music-on-demand system using a flexible MPEG audio player for Plan 9 terminals is described. The player can play real-time MPEG audio streams from a general purpose file server using prefetching techniques. The system architecture and two interfaces to the system are described.

1 Introduction

We are developing a Home Area Multimedia System (Kay et al. 1994), which provides delivery of multimedia materials within the home. The system consists of three components: a multimedia file server, multimedia terminals and a delivery protocol.

The initial application we developed to test the architecture was an MPEG audio system which allows terminals to play CD quality audio from a general purpose file server. To compensate for jittering and delay, prefetching techniques were used within the terminal software.

The system has three different interfaces: a far interface, which uses a cordless mouse for non-keyboard interaction; a near interface, which is used when working in front of the computer; and a programming interface, which allows programs such as a clock or alarm to interact with the audio system.

For our far interface we developed a new mouse paradigm which reduces track-ball movement and cursor dependence making cordless mice easier to use.

The next section presents the architecture of the MPEG audio system and its main components. In sections 3 and 4, we discuss MPEG audio compression and our audio collection. Modifications to the Plan 9 file server to cope with MPEG storage and play-back requirements are discussed in section 5. In section 6, we present prefetching techniques used by the terminal, as well as implementation details of standard audio operations. Sections 7 and 8 explain our user interfaces, the programming interface and a disk-jockey program. We finish with some conclusions in section 10.

2 System Architecture

Figure 1 shows the architecture of the system. Its main components are a file server, which stores the CD database, and multimedia terminals, which fetch, decode and play MPEG compressed data. The components are linked using 9P (Pike et al. 1995), the Plan 9 file protocol, a stateful file protocol carried over the IL transport protocol (Winterbottom & Presotto 1995). Our file server provides neither real-time services nor quality of service guarantees, although it has been tuned for MPEG audio. The terminal software is composed of three modules: a prefetching and control program, an MPEG decoder and interfaces to the system.


Figure 1: MPEG Audio Player Architecture

The prefetching software pre-loads a few kilobytes from the file server to compensate for network and file server jittering and delay. It has a pool of buffers and a process which monitors the number of free buffers in the pool, requesting data from the file server to keep the pool as full as possible. We found it easier to implement standard operations such as CHANGE, STOP and SEEK, in the prefetching program rather than in the interfaces, because the choice of data that should be prefetched depends on the current mode, for example PLAY, FAST-FORWARD or REWIND. The prefetcher exports a file system with two files: a ctl file to issue operations and a status file to read administrative information.

The main commands we can issue to the ctl file are:

Play a track of a particular CD

Pause/Resume the current track

Stop the current track

Seek to the nth second of the current track

Reading the status file returns track name, current position within the track and total length of the track.

Having a file system interface to the low-level software, prefetcher and decoder, allows applications to be constructed using different systems and programming languages, from textual interfaces based on shell scripts to graphical ones using high-level languages, and promotes re-use of the low-level software which is often more complex and time-consuming to develop than the user interface.
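As an illustration of how little a client needs, here is a minimal Python sketch of two helpers that talk to the prefetcher through its files. It assumes the 'play <path>' command and the 'sec/len' status format that the dj script in section 8 relies on; the helper names are hypothetical, and the exact syntax of the other commands is not given in the text.

```python
def send_command(ctl_path, command):
    """Write one command line to the prefetcher's ctl file."""
    with open(ctl_path, "w") as ctl:
        ctl.write(command + "\n")


def read_status(status_path):
    """Parse a 'sec/len' status line into (seconds_played, track_length).

    The 'sec/len' layout is an assumption taken from the dj script.
    """
    with open(status_path) as status:
        sec, length = status.read().strip().split("/")
    return int(sec), int(length)
```

On a terminal with the prefetcher mounted, a call such as send_command("/dev/fetcher/ctl", "play /n/cod/cd/3/1") would start a track, and read_status("/dev/fetcher/status") could then be polled for progress.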

The output of the prefetcher is connected to an MPEG decoder, which could be software based or hardware assisted. We use a decoding program called maplay (Bading n.d.), which decodes MPEG audio layer II in real-time on a 100MHz Pentium PC. The prefetcher and the decoder are connected through a pipe.

The main functions of our user interfaces are to provide ways to select CDs and tracks to be played, present status information and apply standard commands. User interfaces translate user actions into the corresponding strings to be written to the ctl file and present the status information read from the status file. In sections 7 and 8, we describe two different user interfaces to the system.

3 MPEG Audio

The MPEG audio compression algorithm (ISO 1991) produces fixed-rate streams with bandwidth ranging from 8Kbit to 256Kbit per second, composed of self-contained frames of almost identical size. Each frame contains its own header and data. This makes it possible to seek to a particular frame within the stream without reading the preceding data, and to implement common operations such as fast-forward and rewind by playing only a subset of the stream's frames.

After some investigation, we chose 128Kbit/s as our compression rate, which gives CD quality audio and a storage requirement of 960Kbytes per minute.

A 128Kbit per second stream is composed of two kinds of sequence: twenty-three 418-byte frames plus one 417-byte frame, and twenty-four 418-byte frames plus one 417-byte frame. Using this information, it is easy to translate an offset in seconds within a track to a byte-offset within a track file.

In a 128Kbit/s fixed-rate compressed stream, the seconds to offset formula is 128/8*1024*seconds. However, we have to round this offset to the next frame boundary to keep maplay synchronised.
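The rounding step can be sketched in Python as follows. This is only an illustration under two assumptions that the text does not state: that a track file begins on a frame boundary, and that the two frame sequences described above strictly alternate, starting with the twenty-three frame run. One such cycle is 49 frames and 20480 bytes long. The function name is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Frame sizes in one repeating cycle (assumed order):
# 23 x 418 bytes, 1 x 417, 24 x 418, 1 x 417.
CYCLE = [418] * 23 + [417] + [418] * 24 + [417]
# Offsets, relative to the start of a cycle, at which each frame ends.
FRAME_ENDS = list(accumulate(CYCLE))
CYCLE_BYTES = FRAME_ENDS[-1]


def next_frame_boundary(offset):
    """Round a byte offset up to the nearest frame boundary.

    An offset that already lies on a boundary is returned unchanged.
    """
    cycles, rest = divmod(offset, CYCLE_BYTES)
    if rest == 0:
        return offset
    # First frame end at or after the position inside the cycle.
    index = bisect_left(FRAME_ENDS, rest)
    return cycles * CYCLE_BYTES + FRAME_ENDS[index]
```

A table of frame ends keeps the lookup cheap, since the prefetcher only needs one binary search per seek rather than a scan of the track file for frame headers.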

It is also possible to convert from a byte-offset within the track to a number of seconds or milliseconds since the beginning of the track.

4 Audio Database

Audio data for the prototype system was extracted from CD-DA formatted CDs using several different CD-ROM drives and extracting programs. Some difficulty was encountered with certain CD-ROM drives, which were unable to extract data at the required rate. We are now able to read audio CDs without noticeable defects and almost in real-time using a SPARCstation 5 workstation equipped with a TOSHIBA XM4104 CD-ROM. The data is then compressed into the MPEG form.

Currently, our audio database averages 3.8Mbytes per track and 48MBytes per CD. Our design goal for the home multimedia system is to service 5 video and 5 audio streams simultaneously. The system is currently capable of servicing 8 simultaneous audio streams. The number of users has been limited by the lack of CPU power of our Plan 9 terminal computers to decode MPEG in real time. Table 1 shows the decoding capacity of the available machines. The M-No is the number of MPEG streams that a machine can decode in real-time. For example, a SPARC ELC is not quite able to decode one MPEG stream, while a Magnum R4400 can decode three MPEG streams in real-time.

M-No.   CPU              Operating System
0.41    486DX/33         Linux 1.2.9 i48
0.75    486DX2/66        Linux 1.3.3 i486
0.81    486DX4/100       Linux 1.2.13 i486
0.93    SPARC ELC        Solaris 2.4
1.65    Pentium 60/66    Deskpro XE560
1.73    Pentium 60/66    Linux 1.3.56, 60MHz
2.54    Pentium 90/100   Linux 1.2.13 i586
3.06    SPARC 50         Solaris 2.4
3.24    R4400            ULTRIX 4.3 1 RISC

Table 1: Decoding Capacity

Table 2 displays the relationship between compression time and playing time. For example, the machine ciml can compress 1 minute of audio data in 3 minutes of real time, while barossa can compress 1 minute of audio data in 33 minutes of real-time.

Name      Speed  System                        MHz      Operating System
cimlr     2:1    DEC alpha (DEC3000-M500)      275      OSF/1 V3.0
ciml      3:1    DEC alpha (DEC3000-M500)      150      OSF/1 V3.0
uuscss    5:1    i586 PC clone                 90/100   Linux 1.2.13
ml2       6:1    DECstation 5000 MIPS R4400    60       Ultrix 4.4
spring    7:1    Sun SPARCstation-5            110      Solaris 2.4
staff     8:1    Sun SPARCserver-1000          50/100   Solaris 2.4
karl      9:1    MIPS R4000                    50/100   RISC/os 5.01
smallpox  18:1   i486 PC clone                 DX4/100  Linux 1.2.13
joyce     18:1   MIPS R3000 (Magnum 3230)      25       RISC/os 4.52
hunter    24:1   Sun SPARC Classic             50       Solaris 2.1
barossa   33:1   Sun SPARC 4/65                25       SunOS 4.1.3
anthrax   34:1   i486 PC clone                 33       Linux 1.2.9

Table 2: Compression time versus playing time

We have developed a distributed compression system in which we extract all the tracks first and then send them to different machines for compression, reducing compression time to approximately two hours.

5 Audio File Server

To deal with the real-time nature of audio, we explored an approach in which we put most of the control in the clients and not in the file server. Clients prefetch enough data to compensate for delay or jittering. Fortunately, MPEG audio requirements are small enough that not a great deal of work needs to be done to achieve the required quality of service.

Our file server is a modified version of the Plan 9 file server (Thompson 1995) in which we increased the block size from 8K to 64K and use a more aggressive read-ahead strategy.

Using 64K-blocks we only need one block every 4 seconds to serve a 128Kbps stream, reducing the number of seek operations per second. The maximum number of streams we can serve using a SCSI disk rotating at 4500RPM, with 20ms advertised average seek time, 4MB transfer rate, and controller overhead of 2ms, is:

However, we still have to send the data over an Ethernet to reach our terminals. We observed a 130ms average delay to read 64K from our file server using MIPS-based Magnum 3000 machines, which reduces the maximum theoretical number of users to 4*1000ms/130ms, approximately 30.
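The two bounds discussed above can be checked with a short calculation. The Python sketch below uses the usual disk service-time model (seek, plus half a rotation of latency, plus transfer, plus controller overhead). That model is an assumption made here for illustration and may differ from the formula the authors used, so the disk figure it produces (roughly 90 streams) is not a result from the paper. The network figure reproduces the value of about 30 given in the text.

```python
def disk_service_time_ms(seek_ms, rpm, block_bytes, bytes_per_s, overhead_ms):
    """Estimated time to read one block from disk, in milliseconds."""
    rotation_ms = 60_000 / rpm
    latency_ms = rotation_ms / 2                  # average rotational latency
    transfer_ms = 1000 * block_bytes / bytes_per_s
    return seek_ms + latency_ms + transfer_ms + overhead_ms


BLOCK = 64 * 1024                # 64K file server block
STREAM_BYTES_PER_S = 16 * 1024   # 128Kbit/s taken as 16K bytes per second

# Each stream needs one block every 4000 ms.
period_ms = 1000 * BLOCK / STREAM_BYTES_PER_S

disk_ms = disk_service_time_ms(20, 4500, BLOCK, 4 * 1024 * 1024, 2)
disk_streams = int(period_ms // disk_ms)      # bound set by the disk alone
network_streams = int(period_ms // 130)       # bound set by the 130ms read delay

print(round(disk_ms, 1), disk_streams, network_streams)   # prints: 44.3 90 30
```

Under these assumptions the observed 130ms end-to-end read delay, not the disk, is the tighter limit.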

Initially, the project used a magneto-optical juke-box as the main file server and a magnetic disk as a cache. The juke-box was a superseded HP Series 6300 with 20G capacity, enough space for 400 CDs. The juke-box had one CD drive and stored the data on 32 platters that were placed in the drive when needed.

To reduce arm movement we implemented a whole-file read-ahead strategy in which we read ahead all the blocks of the file after the first read operation. In this way, when reading a CD track from the juke-box, we extract all the blocks associated with the track at once. The advantage of this strategy can be illustrated by imagining two non-cached tracks being read from the juke-box at the same time using a one-block read-ahead strategy. In this case, the juke-box will probably swap disks once per read operation, which takes 6-8 seconds, making it impossible to read one block every 4 seconds. The whole-file strategy works quite well in our server because almost all our files are immutable and we sequentially sorted the free list at formatting time, obtaining a high degree of spatial locality.

However, using this whole-file strategy we sometimes would have to wait for a track to be played while other tracks are being moved to magnetic disk. A simplistic formula for the waiting time is:

where Tracklength is the average length of a track in seconds, Readtime is the average time to read a CD track from the juke-box and n is the maximum number of users.

Using our juke-box's parameters in the preceding formula, we have an average delay of 8 seconds to start playing a track. However, the formula does not take caching into account, so we rarely experienced such delays. With a 2GB cache disk, our hit ratio was near 70%, so only about 30% of requests incur the delay, which gives an average delay of 8 * 0.3 = 2.4 seconds.

Unfortunately, our juke-box was unreliable, and the lack of support forced us to move to a more traditional file system composed of two 4GB magnetic disks, with a maximum installed capacity of 170 CDs.

The audio data is stored in a simple directory structure with one directory per CD. Each directory contains one file per track, an index file with administrative information about the CD, and a JPEG file with the cover of the CD for use in the user interface.

6 Prefetcher and Basic Operations

The prefetching module consists of a buffer pool, in which to put prefetched chunks of data from the current track, and monitoring processes which handle different activities within the module.

The fetcher monitors the pool, requesting more data from the file server whenever possible. The sender extracts the next chunk of data from the pool and passes it to the MPEG decoder. Another process, called commander, serves the ctl and status files and translates those operations into pool actions or system calls.

Following are the basic commands which we can write to the ctl file and how they affect the pool:

play
When a play command is issued the fetcher starts to read data from the new track until the pool is full. After that the sender starts to write data to the decoder's pipe and the fetcher monitors the pool, requesting more data when possible.
pause/resume
To pause or resume the player we send stop/start messages to maplay's control file, stopping or restarting the decoder, which will eventually block or restart the sender process.
stop
To stop the player we stop sending compressed data down the pipe and flush the pool. (We also keep writing data until we encounter a frame boundary; see Advanced operations below.)
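The interplay of the fetcher, the sender and the buffer pool described in this section can be sketched in Python with a bounded queue standing in for the pool. This is an illustration only, not the authors' implementation: the chunk size, the pool size, the use of threads and all names are assumptions, and the commander process is omitted.

```python
import queue
import threading

CHUNK = 4096     # bytes per buffer (illustrative value, not from the paper)
POOL_SIZE = 8    # buffers in the pool (illustrative value, not from the paper)


def fetcher(source, pool, ready):
    """Read chunks from the file server and keep the pool as full as possible."""
    while True:
        chunk = source.read(CHUNK)
        if not chunk:
            ready.set()        # short track: let the sender start anyway
            pool.put(None)     # end-of-track marker
            return
        pool.put(chunk)        # blocks while no buffer is free
        if pool.full():
            ready.set()        # pool is full: playback may begin


def sender(pool, decoder, ready):
    """Wait for the initial fill, then pass chunks to the decoder in order."""
    ready.wait()
    while True:
        chunk = pool.get()
        if chunk is None:
            return
        decoder.write(chunk)


def play(source, decoder):
    """Run one fetcher and one sender until the whole track has been sent."""
    pool = queue.Queue(maxsize=POOL_SIZE)
    ready = threading.Event()
    workers = [threading.Thread(target=fetcher, args=(source, pool, ready)),
               threading.Thread(target=sender, args=(pool, decoder, ready))]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

Here source and decoder are any file-like objects, for example an open track file and the write end of the decoder's pipe. The blocking put gives the "request more data when possible" behaviour for free: the fetcher sleeps whenever the pool is full and resumes as soon as the sender frees a buffer.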

6.1 Advanced operations

Seek operations are difficult for two reasons: data may already have been written to the MPEG decoder but not yet decoded or played, and the frame-based structure of MPEG requires the stream to be stopped at frame boundaries. To seek within a track of a CD we have to:

  1. Write data to the pipe until a frame boundary is reached
  2. Flush the buffer pool
  3. Translate the seek value to an offset within the file
  4. Round this offset to the next frame boundary
  5. Change the prefetcher offset to the new value

Fast-forward and rewind could be implemented by taking a subset of the stream's frames. For example, we could take 10% of the frames of a stream and send those frames to the decoder. Unfortunately, this mode would increase the MPEG bandwidth requirements by a factor of 10. We did not implement fast-forward or rewind in our system, because we did not feel they were important for audio and because our seek command covers most of these kinds of interaction.

7 User Interfaces

The previous layers provide the base for developing interfaces to our audio system, reducing their complexity and development time. We have constructed several interfaces to the system. In this section we describe two of them: a far interface and an off-line one.

7.1 Far Interface

The far interface is part of a Home Area Multimedia System (Nava et al. 1995) in which we use a cordless mouse to control video, audio and other applications.

Figure 2 shows the screen layout used by our far interface. The left window is a browser of our on-line CD collection. The right window is the control panel, which allows for track selection, status report and basic operations. The right bottom window is a mixer application to control volume and other audio attributes.


Figure 2: The Far Interface

Figure 3 shows our cordless mouse, which consists of three buttons and a track-ball. After using this mouse for a few weeks with traditional interfaces, as for example Web browsers and media players, we found that:


Figure 3: The Cordless mouse

To reduce track-ball movement, the need to press a button while moving the track-ball, and cursor dependence, we developed a technique to assign functions and behaviour to the mouse that we feel is better for far interfaces. When using the cordless mouse:

To scroll through the CD list we have to press the left button to go down and the right button to go up. The up or down movement will accelerate if the button is held down. This is simpler than having to press one button and then move the track ball up and down to control a scrollbar. The former can be done with one hand and without having to look at the cursor on the screen or at the screen at all.

We found that by moving away from the traditional select-and-drag model, we can exploit the capabilities of cordless mice and construct comfortable interfaces. However, this means that new graphical libraries have to be developed. We modified the Panel library (Duff 1995) in such a way that most of its objects support our new paradigm.

Font size was one of the main problems we experienced with our far interface. Using 16x16-bit fonts (the biggest Plan 9 font) on a 1028x1024 screen, the interface is comfortable to use at 1 to 2 metres, but becomes difficult to use at 3 metres and almost impossible at 4. We are working to improve this interface using larger fonts and a better layout.

8 Off-line interface: The Disk-Jockey program

During the day-to-day use of the system we found it inconvenient to have to be looking at the screen to select the CD to play next. For this reason we developed a series of programs which select and play music without having to interact with the users.

Here is a simple shell script, called dj, which selects music at random from the database and plays it.

#!/bin/rc

# NCD is the number of CDs in the database

while() {
    #
    # select cd and track to play
    #
    cd=`{rand $NCD}
    ntrack = `{ls /n/cod/cd/$cd/*.mp2 | wc -l}
    track=`{rand $ntrack}
    #
    # Send the play cmd to the prefetcher
    # 
    echo play  /n/cod/cd/$cd/$track > /dev/fetcher/ctl
    #
    # Wait for the track to finish, polling
    # the status file. The format is 'sec/len'.
    #
    off=0
    len=100
    while(! ~ $off $len) {
        sleep 1
        off=`{awk -F'/' '{print $1}' < /dev/fetcher/status}
        len=`{awk -F'/' '{print $2}' < /dev/fetcher/status}
    }
}

This small shell script illustrates the ease with which the audio system can be controlled. More sophisticated versions of this script are being developed that deliver a personalised 'mix' of tracks.

9 Related Work

The audio system described in this paper is part of a larger project to build a prototype of the Home Multimedia System of the future. Several components of the complete system are being built in parallel: a multicast delivery system for multimedia objects, a system for selection, filtering and customisation of multimedia objects, and a user modelling toolkit. These components will allow an adaptive user interface to be built that will present new music to users that have expressed a preference for that type of music.

A Web-based interface has also been built to the audio system. This uses HTML pages and CGI scripts to allow the user to browse the CD database, select a CD or set of tracks from a given CD and initiate playing. This interface, while more portable than the Plan 9 interface, does not offer the same level of interaction.

For the audio system, we plan to:

10 Conclusion

The design and implementation of a flexible MPEG audio player for Plan 9 terminals was described. The player can play real-time MPEG audio streams from a general purpose file server using prefetching techniques.

By moving away from the traditional select-and-drag model, we can better exploit the capabilities of cordless mice and construct comfortable far interfaces.

Acknowledgements

Several other people contributed to this project: Dave Hogan did the driver for the magneto-optical juke-box we used in our initial file server, Tobias Bading implemented the MPEG audio decoder, and Amila Fernando and Michael Mikalauskas did the initial implementation of the web based interface and helped during the encoding phase.

Copyright

All of the audio data used in our prototype system is copyright material. Some of the CDs used have been purchased specifically for the project and others have been donated. The audio data is not available outside the Department of Computer Science and is only used for teaching and research purposes.

Plan 9 File Systems

In Plan 9, the file system namespace is the interface to services. Most services, including graphics and network, are implemented with a program accepting requests when data is read from or written to file names that have been bound to the program.

Bibliography

1
Bading, T. (nd) MPEG Audio Player maplay 1.2

2
Duff, T. (1995) A Quick Introduction to the Panel Library in Plan 9, AT&T.

3
Coded Representation of Picture, Audio and Multimedia/Hypermedia Information (1991), Committee Draft of Standard ISO/IEC 11172, ISO.

4
Kay, J. & Kummerfeld, R. J. (1994) Customization and Delivery of Multimedia Information in Proceedings Multicomm 94, Vancouver.
URL: http://www.cs.su.oz.au/~bob/CandD.html

5
Nava, A. & Kummerfeld, R. J. (1995) Architecture of a Home Area Multimedia System in Proceedings AUUG95 and Asia-Pacific WWW95, Sydney.

6
Pike, R., Presotto, D., Dorward, S., Flandrena, B., Thompson, K., Trickey, H. & Winterbottom, P. (1995) Plan 9 from Bell Labs, AT&T.

7
Winterbottom, P. & Presotto, D. (1995) The IL Protocol - Plan 9, AT&T.

8
Thompson, K. (1995) The Plan 9 File Server, AT&T.


Organised by: AUUG'96 & CSU