Doom PC speaker format

Back when Doom was released in 1993, many computers didn't even have sound cards capable of playing digital sound effects, something we now take for granted. Doom therefore included support for PC speaker sound effects, which took advantage of the buzzer found in almost all PCs so that those without sound cards could make crude beeps and at least add some degree of audio to the game.

By the time of the Doom source release, all new computers were shipping with sound cards, and this feature had become obsolete and long forgotten. Because the source shipped without the DMX sound library, the format for these sound effects was not even known. This bothered me slightly, as almost all of the contents of Doom's IWAD files were understood and the PC speaker format was one of the last mysteries left.

Working with Andrew Apted on the Doomworld forums, I developed a modified version of DOSBox and used it to reverse engineer the format. I wrote up the details as a text file specification, which at the time I imagined as an addendum to the Unofficial Doom Specs that I had spent much time reading as a teenager. The text file contents are below.

I also implemented PC speaker playback support in Chocolate Doom, a project I had recently started. Chocolate Doom became the first source port to include PC speaker playback support.


=============================================================================

    Doom PC speaker file format description

       by Simon Howard (fraggle) and Andrew Apted (ajapted)

    Version 1.0

    27 February, 2007

=============================================================================

----------------
COPYRIGHT NOTICE
----------------

Copyright (C) 2007 Simon Howard.  Verbatim copying is allowed provided
this notice is kept attached.

---------
CONTENTS:
---------

 [1] Introduction
 [2] File format
    [2-1] Lump naming
    [2-2] Header
    [2-3] Data format
    [2-4] Frequency table
 [3] Method of research


[1]: Introduction
=================

When Doom was released in 1993, digital sound cards were not as ubiquitous 
as they have now become.  As a result, a large percentage of PCs did not
have the ability to play digital audio.  

To work around this problem, when Id wrote Doom they included support for
two different sound formats.  The main one is the digital sound format,
which supports 8-bit mono sound effects; this is used for the Sound Blaster
and all other sound cards supported by Doom.  The second is the PC speaker
format, which uses the speaker built into every PC to make basic beeping
sound effects.

PC speaker sound effects can be enabled by setting snd_sfxdevice to 1
in the Doom configuration file (usually named default.cfg).

While the basic structure of PC speaker sound effects has been known 
since the early days of Doom (see the Unofficial Doom Specs), its 
complete format has not been properly documented before now.  The 
reasons for this are mainly those of indifference: the ubiquity of 
digital sound cards means that there is little interest in playing PC 
speaker sound effects.

Determining the format of PC speaker sound effects has been made more 
difficult as the code to play them is not present in the released Doom
source code: either it depended on the proprietary DMX sound library or
was simply removed with the other sound code before the source release.


[2]: File format
================

[2-1]: Lump naming
==================

The PC speaker sound data is stored in the main IWAD file, along with other
resources used by Doom.  The IWAD contains a lump resource for each sound
effect, each of which has a name up to 8 characters long.  The first
two characters of each PC speaker lump name are "DP"; the remaining six
characters identify the sound.  This is the same format as is used for
digital sound effects, which use the prefix "DS".  The six character
names match those used for the digital effects.

As an example, the digital sound effect for firing the pistol is named 
"DSPISTOL"; the corresponding PC speaker sound effect is "DPPISTOL".


[2-2]: Header
=============

PC speaker effects have a four byte header.  The first two bytes are
always zero.  The third and fourth bytes are a 16-bit little endian value 
specifying the length of the sound effect in samples.  The sample rate
is 140 samples per second; for example, a sound effect 70 samples long
will take half a second to play.
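
The header layout above can be sketched in Python as follows.  This is
only an illustration of the description in this section: the function
names (parse_header, duration_seconds) are made up for the example and
do not come from Doom or the DMX library.

```python
import struct

SAMPLE_RATE = 140  # samples per second, as stated in this section


def parse_header(lump: bytes) -> int:
    """Return the number of samples given in the four byte header."""
    if len(lump) < 4:
        raise ValueError("lump is too short to contain a header")
    # Two 16-bit little endian fields: the zero field, then the length.
    zero, length = struct.unpack_from("<HH", lump, 0)
    if zero != 0:
        raise ValueError("first two header bytes are expected to be zero")
    return length


def duration_seconds(num_samples: int) -> float:
    """Playback time for a given number of samples at 140 samples/sec."""
    return num_samples / SAMPLE_RATE


# A hand-made lump: header for 70 samples, followed by 70 silent samples.
example = bytes([0, 0, 70, 0]) + bytes(70)
count = parse_header(example)
print(count, duration_seconds(count))  # 70 0.5
```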


[2-3]: Data format
==================

Immediately following the header is the sound sample data; the number of
samples is specified in the header.  Each sample is a single byte.

The sample values map to frequencies: higher sample values correspond to
higher frequencies.  The scale used is a musical one: the samples 
correspond to frequencies of notes in the western musical scale.  As
such, the frequency of the notes increases exponentially.  There are
24 values per octave, whereas the western musical scale has 12 semitones 
per octave.  The sample values are in fact a superset of the western 
musical scale, with an extra value inserted between each semitone.

As a consequence of the above, adding 24 to the value of a sample will
double the frequency.  The values start at 175 Hz (sample value 1), which
is a low "F".

A table of the frequencies is given in section 2-4.

A value of zero indicates silence.
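
The relationship described above (24 steps per octave, starting at 175 Hz
for value 1) can be written as a formula.  The sketch below is an
approximation derived only from that description, not the table Doom
uses: the measured frequencies in section 2-4 differ from it by a
fraction of a percent and remain the authoritative values.  The name
approx_frequency is invented for this example.

```python
from typing import Optional

BASE_FREQUENCY_HZ = 175.0  # frequency of sample value 1, per this section
VALUES_PER_OCTAVE = 24     # adding 24 to a value doubles the frequency


def approx_frequency(value: int) -> Optional[float]:
    """Approximate frequency in Hz for a sample value; None means silence."""
    if value < 0:
        raise ValueError("sample values are unsigned bytes")
    if value == 0:
        return None
    # Exponential scale: each step multiplies the frequency by 2**(1/24).
    return BASE_FREQUENCY_HZ * 2.0 ** ((value - 1) / VALUES_PER_OCTAVE)


for v in (0, 1, 25, 49):
    print(v, approx_frequency(v))
```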


[2-4]: Frequency table
======================

This table specifies the frequency for values in PC speaker sound effects.
The value is given along with the corresponding output frequency and the
precise counter value which should be programmed into the 8253 timer chip
found in PCs to give this frequency.  

This table is only given for the range 0..95, which is the range of values
used by the Doom PC speaker sound effects.  It is not known if higher
values are supported.

In some cases, the frequency is off by one or two Hz.  This is due to
loss of precision in calculations, and is an inherent limitation of
the 8253 chip.

Value   Counter    Frequency(Hz)   Note
-------------------------------------------------------------
0       -          -               Silence
1       6818       175.00          F-3
2       6628       180.02
3       6449       185.01          F#3
4       6279       190.02
5       6087       196.02          G-3
6       5906       202.02
7       5736       208.01          G#3
8       5575       214.02
9       5423       220.02          A-3
10      5279       226.02
11      5120       233.04          A#3
12      4971       240.02
13      4830       247.03          B-3
14      4697       254.03
15      4554       262.00          C-4
16      4435       269.03
17      4307       277.03          C#4
18      4186       285.04
19      4058       294.03          D-4
20      3950       302.07
21      3836       311.04          D#4
22      3728       320.05
23      3615       330.06          E-4
24      3519       339.06
25      3418       349.08          F-4
26      3323       359.06
27      3224       370.09          F#4
28      3131       381.08
29      3043       392.10          G-4
30      2960       403.10
31      2875       415.01          G#4
32      2794       427.05
33      2711       440.12          A-4
34      2633       453.16
35      2560       466.08          A#4
36      2485       480.15
37      2415       494.07          B-4
38      2348       508.16
39      2281       523.09          C-5
40      2213       539.16
41      2153       554.19          C#5
42      2089       571.17
43      2032       587.19          D-5
44      1975       604.14
45      1918       622.09          D#5
46      1864       640.11
47      1810       659.21          E-5
48      1757       679.10
49      1709       698.17          F-5
50      1659       719.21
51      1612       740.18          F#5
52      1565       762.41
53      1521       784.47          G-5
54      1478       807.29
55      1435       831.48          G#5
56      1395       855.32
57      1355       880.57          A-5
58      1316       906.67
59      1280       932.17          A#5
60      1242       960.69
61      1207       988.55          B-5
62      1173       1017.20
63      1140       1046.64         C-6
64      1107       1077.85
65      1075       1109.93         C#6
66      1045       1141.79
67      1015       1175.54         D-6
68      986        1210.12
69      959        1244.19         D#6
70      931        1281.61
71      905        1318.43         E-6
72      879        1357.42
73      854        1397.16         F-6
74      829        1439.30
75      806        1480.37         F#6
76      783        1523.85
77      760        1569.97         G-6
78      739        1614.58
79      718        1661.81         G#6
80      697        1711.87
81      677        1762.45         A-6
82      658        1813.34
83      640        1864.34         A#6
84      621        1921.38
85      604        1975.46         B-6
86      586        2036.14
87      570        2093.29         C-7
88      553        2157.64
89      538        2217.80         C#7
90      522        2285.78
91      507        2353.41         D-7
92      493        2420.24
93      479        2490.98         D#7
94      465        2565.97
95      452        2639.77         E-7
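
The counter and frequency columns are related by the input clock of the
timer chip: the output frequency is the clock rate divided by the counter
value.  The sketch below assumes the nominal PC timer clock of
1193182 Hz; the exact constant used to produce the table is not stated
here, so the results agree with the listed frequencies only to within a
few hundredths of a Hz.  The name counter_to_frequency is invented for
this example.

```python
PIT_CLOCK_HZ = 1193182  # nominal 8253 input clock (assumption, see above)


def counter_to_frequency(counter: int) -> float:
    """Output frequency in Hz for a given 8253 counter value."""
    if counter <= 0:
        raise ValueError("counter must be a positive integer")
    return PIT_CLOCK_HZ / counter


# (sample value, counter, frequency listed in the table)
rows = [(1, 6818, 175.00), (33, 2711, 440.12), (95, 452, 2639.77)]
for value, counter, listed in rows:
    computed = counter_to_frequency(counter)
    print(value, counter, round(computed, 2), listed)
```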


[3]: Method of research
=======================

Although the basic structure of the file format is relatively trivial to
reverse engineer, the mapping of sample values to actual frequencies has 
not been known until now.

The frequencies were found by running DOS Doom inside a modified version
of DOSBox.  This version of DOSBox was instrumented to output the timer
values written to the emulated 8253 timer chip (PCSPEAKER_SetCounter in
dosbox/src/hardware/pcspeaker.cpp).  Doom was then run using a WAD
containing a replacement DPPISTOL lump, crafted to cycle through all 
of the sample values from 1..95.  The output from DOSBox then gave the
timer countdown value for each sample.  These were put into a spreadsheet
and used to calculate the corresponding frequency.
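
A replacement lump of the kind described above could be generated along
the following lines.  This is a sketch, not the tool that was actually
used: the function name and the number of times each value is repeated
are hypothetical, and packaging the lump into a WAD under the name
DPPISTOL is not shown.

```python
import struct


def build_sweep_lump(repeat: int = 14) -> bytes:
    """Build a PC speaker lump that steps through sample values 1..95.

    Each value is held for `repeat` samples (a hypothetical choice) so
    that each step lasts repeat/140 seconds.
    """
    samples = bytes(v for v in range(1, 96) for _ in range(repeat))
    # Four byte header: two zero bytes, then the 16-bit sample count.
    header = struct.pack("<HH", 0, len(samples))
    return header + samples


lump = build_sweep_lump()
print(len(lump))  # 1334 (4 header bytes + 95 * 14 samples)
```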