The machine that powered numerous Number Stations
A few years ago, I first stumbled across some recordings of number stations and started reading up on their history. For most of them, we have pretty good guesses as to who ran them and for what purpose. One example is the Gongs and Chimes Station, known for its melody that got increasingly creepy over the years as the tape wore down. What always seemed mysterious to me was the kind of device used to actually generate the spoken messages. Were they similar to the electromechanical devices used for time announcements over the telephone, with multiple play heads or complicated tape mechanisms?
Then, I stumbled over this video. And... it's the actual voice used by the Gongs and Chimes Station? What surprised me was how small, polished and complex the machine seemed. There definitely was a microcontroller in there, it had an LED display, and in general it just seemed like a rather fancy device.
Sprach-Morse-Generator by Mfs-sammler via Wikimedia Commons
The most important thing, however, was that I finally knew what to look for. Searching for "Sprach-Morse-Generator" (Speech-Morse-Generator) quickly got me to a page about it at the Crypto Museum, which also mentions its official name: "Gerät 32620" (Device 32620). The page also has some great information about the history of the device and the voice behind it.
What excited me the most: there are actually links to technical documentation about the speech generator and its companion device, Gerät 32621, which could be used for digitizing voice recordings and writing them to EPROM cartridges to use with Gerät 32620.
I don't want to go over everything mentioned in these guides; if you speak German, I'd recommend giving them a read yourself. But I do want to mention some things I found notable.
As input, either punched paper tape, the keyboard, or a serial interface (RS-232) can be used. The status of the last one is a bit unclear: it is mentioned in the beginning, but later parts of the document describe it as a possible future addition. Either way, the messages are transferred to RAM, from where they can later be transmitted. The RAM can hold up to 3791 characters.
The language samples are loaded from an exchangeable cartridge which can hold up to 96 kByte, where they are stored as uncompressed PCM. The documentation proudly mentions that the cartridges can be swapped in 10 seconds.
The number of words per minute and pitch of the voice can be varied, which explains why some recordings of number stations with the same voice sound a bit different.
Technical drawing of the display and keyboard. The display can show which message, group and digit is played right now.
What I find quite surprising: the keyboard and output of the device mostly use English words. A bit unexpected for a device developed in East Germany for use in the Soviet Union; I would rather have expected Russian there.
There are some pretty charming charts in the documentation showing some interactions with the device.
Bus chart of the whole system. Interestingly, it shows a second language module, although the final device only supports one language module at a time.
On the back are a speaker, output to the transmitter and some status outputs and remote control inputs. These were probably used so the chimes and speech wouldn't overlap.
There are also pretty detailed schematic drawings and lists of components at the end of the document.
The Gerät 32621 is the companion device used for digitizing voices and writing them to cartridges for the 32620. It's quite astounding how complicated this device really is: it can either record words separately or all of them in one go, it automatically recognizes the beginning and end of the words, it can duplicate cartridges, and it even has a UV chamber for erasing the EPROMs on a cartridge so it can be reused.
What's interesting: there are no photos of this device. It's unclear how many were produced; there may have been just one or two. The only things we know about it are from this document.
In its architecture, it is quite similar to the 32620: it has a very similar keyboard and probably the same display.
One interesting difference: It uses German labels on the keyboard. Also, all of the text it shows on the display is in German.
The documentation mentions recording up to 20 words. This also came up in the 32620's documentation, although only 13 words were ever used: digits 0-9, "Achtung" (Attention), "Trennung" (Separation) and "Ende" (End).
It has controls for adjusting the 0-level of the recordings and amplification. LEDs show whether it senses an audio signal and whether it's clipping. The amplification for single words can also be adjusted later, from a value of 0 to 7. It seems like this doesn't actually change the recorded samples but controls the amplification level in the digital to analogue converter. This feature is intended for making all the digits similar in volume.
I find this a bit surprising, since all of this could be done with other equipment, especially as only two different voices for these machines are known: German and Spanish.
In order to check the recording, either all recorded digits or just combinations of them can be played back.
When programming to an EPROM, it transfers the start address and amplification for each word and the samples from RAM to the ROM and checks whether the values it reads back after programming are as expected.
Numbering of the EPROMs on the cartridge. If the read-back values don't match, the operator can switch out the offending EPROM chip.
There is a separate mode for erasing cartridges, where the UV exposure time can be typed in via the keyboard. If the ROM cartridge is double-sided, it has to be turned around to erase the chips on the other side.
For all of the different modes of the 32621, flow charts for the interface are provided.
The front also features various other indicators, e.g. for the different voltages in the device.
This manual also includes schematics for the device.
The Crypto Museum also provides dumps of the EPROMs for both the German and Spanish cartridges.
Having read the manuals, decoding them is actually not that hard: the data is stored as PCM, and the 32621 guide mentions that the time between samples is 125 μs, which corresponds to a sample rate of 8000 Hz. Loading the ROMs into Audacity with File>Import>Raw Data, you can actually already listen to them!
At the very beginning of the first EPROM, there are 64 bytes of header information, containing the start address for each word and its amplification factor. For 20 words, that leaves just over 3 bytes per word, although they seem to just stick to 3.
Decoding the rest of the header took a lot longer than anticipated. I found the 3 bits for the amplification value pretty quickly in the last byte for each word. The rest somehow had to be the start address or some offset, and possibly the length. For accessing the values in each of the 8K chips, 13 bits for the address are needed. To select one of the up to 12 EPROMs, another 4 bits are required. This adds up to 17 bits, so one more than fits in two bytes.
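If you'd rather not click through Audacity's import dialog, the same thing can be sketched in a few lines of Python. This is only a rough sketch: the 125 μs sample period is from the 32621 guide, but the sample format (one unsigned byte per sample, mono) is my assumption, and the function names are made up for this example.

```python
import io
import wave

SAMPLE_PERIOD_US = 125                          # from the 32621 guide
SAMPLE_RATE_HZ = 1_000_000 // SAMPLE_PERIOD_US  # 1 s / 125 us = 8000 samples per second


def raw_to_wav_bytes(raw: bytes) -> bytes:
    """Wrap raw EPROM contents in a WAV header so any audio player can open them."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)               # mono
        wav.setsampwidth(1)               # ASSUMPTION: 1 byte per sample (WAV reads this as unsigned)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(raw)
    return buf.getvalue()


def duration_seconds(raw: bytes) -> float:
    """Playing time of a dump at the nominal sample rate, one byte per sample."""
    return len(raw) / SAMPLE_RATE_HZ
```

Writing the result of `raw_to_wav_bytes` to a `.wav` file gives something every player understands. Under the same assumption, a full 8K chip holds roughly one second of audio.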
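The bit counting is easy to double-check. The snippet below just redoes the arithmetic from the paragraph above (8K chips, up to 12 of them); the constant names are mine.

```python
CHIP_BYTES = 8 * 1024   # one 8K EPROM
MAX_CHIPS = 12          # maximum number of EPROMs on a cartridge

# Bits needed to address every byte inside one chip (highest address is 8191).
address_bits = (CHIP_BYTES - 1).bit_length()

# Bits needed to tell 12 chips apart (highest index is 11).
chip_select_bits = (MAX_CHIPS - 1).bit_length()

total_bits = address_bits + chip_select_bits
cartridge_bytes = CHIP_BYTES * MAX_CHIPS

print(address_bits, chip_select_bits, total_bits)   # 13 4 17
print(cartridge_bytes // 1024, "kByte")             # 96 kByte
```

The total capacity also matches the 96 kByte the documentation gives for a full cartridge.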
To get around this, there are two obvious solutions: Stuff the one extra bit into the byte which holds the amplification level, or throw away the least significant bit for the start address, so just every other sample can be used as the start sample.
Two things made this really hard to pin down: the Z80 is a little-endian machine, meaning that for 16-bit values, the less significant byte is stored before the more significant byte, but this doesn't mean they would have to store the values in this order. Also, the binary chip select value for the first EPROM isn't 0, but 2. I only stumbled upon that when looking at the schematic for the EPROM cartridge (p. 38 in the 32620 manual). There's also a really cute demultiplexer built from discrete AND and NAND gates for translating the chip address to the actual chip select lines!
Isn't it gorgeous? Note how it only suggests the presence of the fourth address line for the chip select with that unconnected NAND.
My best guess as to why this is the case is that the chip select values 0 and 1 are already used to select some other components. But, at last, I think I found out how it works:
For each word, the second byte contains the 3 least significant bits for the chip select and 5 of the address bits. The first byte contains the remaining 8 address bits. The end of each word is the beginning address for the next word. That's why there are two extra bytes after the last word. This would have worked for up to twenty words, as there would still be 4 spare bytes to contain the end address of the last word. The third byte contains the remaining chip select bit, some mysterious bits, and 3 bits for the amplification level.
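To make this a bit more concrete, here is a rough sketch of how I'd decode the header in Python. It only implements my reading of the format as described above, so treat it as a guess rather than a specification. The class and function names are made up, the bit I call `extra_bit` is the top bit of the third byte (the one I suspect to be the remaining chip select bit), and the test data is the German header from the dump at the end of this post.

```python
from typing import NamedTuple

ENTRY_SIZE = 3          # bytes per word in the header
FIRST_CHIP_SELECT = 2   # the first EPROM answers to chip select value 2, not 0


class WordEntry(NamedTuple):
    chip_select: int     # 3 least significant chip select bits (byte 2, top 3 bits)
    address: int         # 13-bit start address inside the chip
    extra_bit: int       # top bit of byte 3; GUESS: the remaining chip select bit
    amplification: int   # 0..7 (byte 3, lowest 3 bits)

    @property
    def eprom_number(self) -> int:
        """1-based EPROM number, ignoring the not yet understood extra bit."""
        return self.chip_select - FIRST_CHIP_SELECT + 1


def decode_entry(entry: bytes) -> WordEntry:
    lo, mid, last = entry   # exactly three bytes
    return WordEntry(
        chip_select=mid >> 5,
        address=((mid & 0x1F) << 8) | lo,
        extra_bit=last >> 7,
        amplification=last & 0x07,
    )


def decode_header(header: bytes, word_count: int) -> list[WordEntry]:
    return [
        decode_entry(header[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE])
        for i in range(word_count)
    ]


# First 41 bytes of the German cartridge: 13 entries plus the 2-byte end address.
GERMAN_HEADER = bytes.fromhex(
    "00 40 43 b7 4a 43 c3 58 43 d9 68 43 fb 75 43 26"
    "84 43 b6 95 43 35 a4 43 4d b1 43 33 c1 43 42 d0"
    "43 84 e3 43 05 f4 43 eb fe"
)

for number, word in enumerate(decode_header(GERMAN_HEADER, 13)):
    print(number, word.eprom_number, hex(word.address), word.amplification)
```

For the German cartridge this reproduces the chip select sequence 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6 from the dump. For the last Spanish words the extra bit comes into play, and `eprom_number` as written here would not give the right chip.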
The remaining chip select bit is still a bit confusing to me: The German recordings don't use it, since they fit on just 6 EPROMs. The Spanish one has to use it for the last two words, but there's something odd happening with the values in the one before the last two words.
Another thing worth noting: The German words all use the default amplification level of 3, whereas the Spanish ones use lower and higher values for some words. To be really authentic, you'd have to take that into account when playing the samples.
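If you want to play around with this, something like the following could be a starting point. Big caveat: I don't know how the eight levels map to actual gain in the hardware, so `placeholder_gain` is a purely hypothetical linear mapping that only guarantees that the default level 3 leaves the samples untouched. The sample format (unsigned bytes centred on 128) is an assumption as well.

```python
DEFAULT_LEVEL = 3   # the level every German word uses


def placeholder_gain(level: int) -> float:
    """HYPOTHETICAL mapping: linear steps, with level 3 giving unity gain."""
    if not 0 <= level <= 7:
        raise ValueError("amplification level must be between 0 and 7")
    return (level + 1) / (DEFAULT_LEVEL + 1)


def apply_level(samples: bytes, level: int, gain=placeholder_gain,
                midpoint: int = 128) -> bytes:
    """Scale samples around the midpoint and clip to the 8-bit range.

    ASSUMPTION: samples are unsigned bytes with silence at `midpoint`.
    """
    factor = gain(level)
    out = bytearray()
    for sample in samples:
        scaled = round(midpoint + (sample - midpoint) * factor)
        out.append(max(0, min(255, scaled)))
    return bytes(out)
```

The gain function is passed in as a parameter on purpose, so it can be swapped out once someone works out the real steps from the schematic.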
One last thing I was interested in: the Spanish voice uses the same word for "Trennung" and "Ende", namely "final", and they sure sound the same. So, is it a digital copy? Or was the same section of magnetic tape digitized twice? Looking at the sample values, it's quite easy to tell the second option is the case; the sample values don't match up.
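A check like this is quick to script. The sketch below is one way to do it, not necessarily how I did it by hand: it treats two clips as a digital copy if the shorter one appears byte for byte inside the longer one, which also covers the case where the two were trimmed slightly differently. The function name is made up.

```python
def is_digital_copy(a: bytes, b: bytes) -> bool:
    """True if the shorter clip is contained byte for byte in the longer one."""
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) > 0 and shorter in longer
```

Two separate digitizations of the same tape will practically never pass this test, since noise and timing differences change the individual sample values even when the recordings sound identical.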
Wow, I got a lot deeper into this than I had planned. But, let me tell you, trying to reverse engineer a binary format without the actual machine that can read it is quite a lot of fun, but also pretty hard.
Number stations are an interesting and weird bit of both the world's history and the history of computers, but I think the mysteries that still surround these machines are what make them so particularly fascinating: it's unclear how many of them existed, and the 32621 seems to be completely lost to history. It's also unclear whether any other voices existed; judging from the documentation, it would have actually been fairly easy to produce more of them, but I've not found any information that would suggest so.
german
00 40 43 | b7 4a 43 | c3 58 43 | d9 68 43 | fb 75 43 | 26
84 43 | b6 95 43 | 35 a4 43 | 4d b1 43 | 33 c1 43 | 42 d0
43 | 84 e3 43 | 05 f4 43 | eb fe ff ff ff ff ff ff ff
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

CHIP SELECT sequence: 1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6

german, binary:
00000000 01000000 01000011 | 10110111 01001010 01000011 | 11000011 01011000
^^^^^^^^ ^^^^^^^^ ^^^^^^^^
LIIIIIII LIILIIII I????LII
I        I  I     I    +-- Amplification (Default: 3)
I        I  I     +--------- Remaining chip select line? (but not quite?)
I        I  +----------- Address (HI Byte)
I        +---------------- Chip Select lines A15, A14, A13
+-------------------- Address (LO Byte)
01000011 | 11011001 01101000 01000011 | 11111011 01110101 01000011 | 00100110
10000100 01000011 | 10110110 10010101 01000011 | 00110101 10100100 01000011 |
01001101 10110001 01000011 | 00110011 11000001 01000011 | 01000010 11010000
01000011 | 10000100 11100011 01000011 | 00000101 11110100 01000011 | 11101011
11111110 11111111 11111111 11111111 11111111 11111111 11111111 11111111

spanish
40 40 42 | 19 50 43 | 18 5c 45 | ca 70 43 | 26 81 43 | 49
95 43 | 7a a9 43 | ff bd 43 | d2 d0 43 | f7 e3 43 | 70 f9
c3 | 13 57 83 | 6d 6b 83 | ba 7f ff | ff ff ff | ff ff ff |
ff ff ff | ff ff ff | ff ff ff | ff ff ff | ff ff ff ff

CS Sequence: 1, 1, 1, 2, 3, 3, 4, 4, 5, 6, 6, 7, 8
                                              _  _

spanish, binary
01000000 01000000 01000010 | 00011001 01010000 01000011 | 00011000 01011010
"0"                          "1"                          "2"
01000101 | 11001010 01110000 01000011 | 00100110 10000001 01000011 | 01001001
           "3"                          "4"                          "5"
10010101 01000011 | 01111010 10101001 01000011 | 11111111 10111101 01000011 |
                    "6"                          "7"
11010010 11010000 01000011 | 11110111 11100011 01000011 | 01110000 11111001
"8"                          "9"                          "Atencion"
11000011 | 00010011 01010111 10000011 | 01101101 01101011 10000011 | 10111010
           "Final"                      "Final"
01111111 11111111