It's been a while since we've run a technical post here on the blog. I recently ran across a problem that seems to be quite common amongst iPhone game developers, so I thought I'd do a quick post on it: using IMA4 (ADPCM) audio encoding on the iPhone/iPod Touch.
IMA-ADPCM is a compression standard defined by the Interactive Multimedia Association that gets you 4:1 compression on 16-bit audio files. It's supported natively by the iPhone — but only using certain APIs.
In particular, it's not supported by OpenAL, which is a shame, because OpenAL is the easiest way to get fast multitrack positional audio on the iPhone. If you're writing games with sound on the iPhone, you're probably using OpenAL to do it.
Well, I wanted to compress the audio in Hexterity, to make it a smaller, quicker download, and to take up less space on people's iPhones. The lack of OpenAL support for compressed audio stopped me, but I've been revisiting the topic this week — both because I may add it to a future Hexterity update, and because I need it for another project.
Note: I'm going to assume in this article that you already have OpenAL audio playback working on the iPhone with uncompressed PCM data. Perhaps you've written your own, or perhaps you're using the SoundEngine code from the CrashLanding demo. If you're using SoundEngine, I should warn you that last time I checked, it was full of memory leaks and other bugs, and I wouldn't recommend it for use in production.
The format tends to be known as ADPCM in the Windows world, and that's what we knew it as back at Mucky Foot, when Tom put support for it into Startopia. In the Mac world, IMA4 seems to be the preferred name, and it's what I'll call it for the rest of this article. But it's the same stuff either way, although some of the "magic numbers" may vary slightly from platform to platform.
The system relies on the fact that most audio data is somewhat predictable, being to some extent a smooth wave shape. Instead of storing a series of samples, it predicts what the next sample should be, finds the difference between that and the real value, then uses lookup tables to compress that delta to just 4 bits for each 16-bit signed integer sample, getting us our 4:1 compression.
In practice, the compression isn't quite 4:1 because the data is split into packets, where each packet represents 64 samples (128 bytes) of PCM audio, and is stored as 32 data bytes plus 2 header bytes. Still, 3.76:1 is good enough for me :)
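The arithmetic behind that figure, for a single mono packet:

```latex
\begin{aligned}
\text{PCM packet}  &= 64 \times 2\ \text{bytes} = 128\ \text{bytes} \\
\text{IMA4 packet} &= \tfrac{64 \times 4\ \text{bits}}{8} + 2\ \text{bytes} = 32 + 2 = 34\ \text{bytes} \\
\text{ratio}       &= 128 / 34 \approx 3.76
\end{aligned}
```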
Compressing the audio is the easiest bit; your Mac ships with a tool to do it, afconvert. Using it to compress a wave file to IMA4 is pretty straightforward:
afconvert -f caff -d ima4 audiofile.wav
...will write out audiofile.caf (Core Audio File) compressed to IMA4.
At some point, your audio framework is going to receive a path or URL to an audio file and call AudioFileOpenURL() on it. Assuming it opens OK, you can call AudioFileGetProperty() to learn about its structure. The two most important properties for us at this point are kAudioFilePropertyDataFormat and kAudioFilePropertyAudioDataByteCount.
kAudioFilePropertyDataFormat will give you an AudioStreamBasicDescription structure containing sample rate, number of channels, etc. All you need to do is check the mFormatID member, and see if it's the four character code 'ima4'.
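In real iPhone code you'd simply compare mFormatID against kAudioFormatAppleIMA4 from CoreAudioTypes.h, which is defined as 'ima4'. Multi-character literals like 'ima4' are implementation-defined in standard C, though, so here's a portable sketch of how those four characters map onto the 32-bit format ID. make_fourcc and is_ima4 are hypothetical helpers, not Core Audio API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: build a four-character code with the first
   character in the most significant byte, matching how Core Audio
   format IDs are laid out. */
static uint32_t make_fourcc(char a, char b, char c, char d)
{
    return ((uint32_t)(unsigned char)a << 24) |
           ((uint32_t)(unsigned char)b << 16) |
           ((uint32_t)(unsigned char)c << 8)  |
            (uint32_t)(unsigned char)d;
}

/* Hypothetical helper: pass in the mFormatID member of the
   AudioStreamBasicDescription returned for kAudioFilePropertyDataFormat. */
static bool is_ima4(uint32_t format_id)
{
    return format_id == make_fourcc('i', 'm', 'a', '4');
}
```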
Most likely, your code currently fetches kAudioFilePropertyAudioDataByteCount, reserves a buffer of that size, then uses AudioFileReadBytes() or AudioFileReadPackets() to read the data into the buffer. This buffer is then handed off to OpenAL using alBufferDataStaticProc().
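To minimise disruption to your code, I recommend using AudioFileReadPackets() whether or not you're dealing with IMA4 files. That way, the only changes you need to make for IMA4 are to allocate a second, larger buffer for the unpacked 16-bit PCM data, decode the packets you've read into it, and hand that buffer to OpenAL instead of the packed one.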
In theory, you could just multiply the packed data size by 128 then divide by 34, but I don't recommend it: all the relevant data is stored in the .caf file itself, and if you use that instead of hardcoding, your code won't explode if you get an unusual file. The AudioStreamBasicDescription contains what you need: mBytesPerPacket (usually 34 per channel) is the packed size of each packet, and mFramesPerPacket (usually 64) multiplied by the channel count and sizeof(SInt16) gives its unpacked size. If you're loading stereo files, each packet holds one block per channel, one after the other. That doesn't affect the decoding algorithm, since each block is independent, but you'll need to interleave the decoded samples for OpenAL.
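A sketch of the size calculations, reading the values from the file's format rather than hardcoding them. The ima4_format struct is a hypothetical stand-in for the three AudioStreamBasicDescription fields involved (mBytesPerPacket, mFramesPerPacket, mChannelsPerFrame), and packet_count is what you'd get from the kAudioFilePropertyAudioDataPacketCount property. It assumes a stereo IMA4 file reports 68 bytes per packet, i.e. one 34-byte block per channel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the AudioStreamBasicDescription fields we need;
   in real code, read these straight from the struct Core Audio gives you. */
typedef struct {
    uint32_t bytes_per_packet;    /* mBytesPerPacket: 34 per channel for IMA4 */
    uint32_t frames_per_packet;   /* mFramesPerPacket: usually 64 */
    uint32_t channels_per_frame;  /* mChannelsPerFrame: 1 = mono, 2 = stereo */
} ima4_format;

/* Bytes of compressed data to read for a given number of packets. */
static size_t packed_size(const ima4_format *fmt, size_t packet_count)
{
    return packet_count * fmt->bytes_per_packet;
}

/* Bytes of 16-bit PCM needed to hold the same packets once decoded. */
static size_t unpacked_size(const ima4_format *fmt, size_t packet_count)
{
    return packet_count * fmt->frames_per_packet
                        * fmt->channels_per_frame * sizeof(int16_t);
}
```

For a mono file this works out to the same 128/34 ratio as the hardcoded shortcut, but it also copes with stereo and with any file whose packet layout differs from the usual values.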
IMA4 requires two tables of magic numbers, the Index Table and the Step Table, which can be found on Multimedia Wiki along with more details.
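The decoding process starts from scratch for each packet, by initialising three values: predictor, step_index and step. The first two are encoded into the header, which is a big-endian 16-bit value: the top 9 bits are the top 9 bits of the initial predictor (its low 7 bits start at zero), and the bottom 7 bits are step_index. step is then just ima_step_table[step_index].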
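A minimal sketch of unpacking that header, assuming the Apple IMA4 layout described on Multimedia Wiki (predictor in the top 9 bits, step_index in the bottom 7, stored big-endian). ima4_read_header is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: unpack the 2-byte header at the start of a block. */
static void ima4_read_header(const uint8_t *block, int *predictor, int *step_index)
{
    int header = (block[0] << 8) | block[1];    /* big-endian 16-bit word */

    int p = header & 0xFF80;                    /* top 9 bits, low 7 zeroed */
    if (p & 0x8000) p -= 0x10000;               /* sign-extend to a 16-bit sample */
    *predictor = p;

    int s = header & 0x7F;                      /* bottom 7 bits */
    *step_index = s > 88 ? 88 : s;              /* stay inside the 89-entry step table */
}
```

The sign extension is done by hand rather than by casting to int16_t, because converting an out-of-range value to a signed type is implementation-defined in C.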
Treat the rest of the packet as a stream of 64 nibbles, one per sample (low nibble of each byte first, then high). Each nibble is passed through this algorithm:
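step_index += ima_index_table[nibble];          /* nibble is 0..15 */
int diff = ((nibble & 7) + 0.5f) * step / 4;    /* bits 0-2: magnitude */
if (nibble & 8) diff = -diff;                   /* bit 3: sign */
predictor += diff;
step = ima_step_table[step_index];              /* clamp step_index first */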
Pretty noddy, right? At the end of each pass, predictor is the new 16-bit sample ready to write into your PCM audio buffer. Note that although predictor is a signed 16-bit value, the algorithm can push it out of range, so you should use a 32-bit int and clamp it to -32768..32767 yourself. Likewise, step_index needs to be clamped to 0..88 so that it doesn't run off either end of the 89-entry Step Table.
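Putting the pieces together, here's a minimal sketch that decodes one 34-byte block into 64 samples using the per-nibble algorithm above, with bit 3 of each nibble as the sign and both clamps applied. It assumes the Apple IMA4 block layout (2-byte header with predictor in the top 9 bits and step_index in the bottom 7, then 32 data bytes read low nibble first) and the standard IMA ADPCM tables as listed on Multimedia Wiki. ima4_decode_block and clamp_int are hypothetical names. For stereo, you'd call it once per channel block with stride 2, offsetting out by the channel number, so the output comes out interleaved.

```c
#include <stddef.h>
#include <stdint.h>

static const int ima_index_table[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

static const int ima_step_table[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Decode one block (one channel, 34 bytes) into 64 samples written to
   out[0], out[stride], out[2 * stride], ... */
static void ima4_decode_block(const uint8_t block[34], int16_t *out, size_t stride)
{
    /* Header: big-endian, predictor in top 9 bits, step_index in bottom 7. */
    int header = (block[0] << 8) | block[1];
    int predictor = header & 0xFF80;
    if (predictor & 0x8000) predictor -= 0x10000;       /* sign-extend */
    int step_index = clamp_int(header & 0x7F, 0, 88);
    int step = ima_step_table[step_index];

    for (int i = 0; i < 64; i++) {
        uint8_t byte = block[2 + i / 2];
        int nibble = (i & 1) ? (byte >> 4) : (byte & 0x0F);  /* low nibble first */

        step_index = clamp_int(step_index + ima_index_table[nibble], 0, 88);
        int diff = (int)(((nibble & 7) + 0.5f) * step / 4);  /* uses the old step */
        if (nibble & 8) diff = -diff;
        predictor = clamp_int(predictor + diff, -32768, 32767);
        step = ima_step_table[step_index];

        out[i * stride] = (int16_t)predictor;
    }
}
```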
Also, the line that calculates diff? Don't actually use floating-point there; throw some bit-twiddling at it instead and it'll be much faster. The Multimedia Wiki link has some suggestions for you.
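One integer-only replacement, as a sketch: since (m + 0.5) * step / 4 equals m * step / 4 + step / 8, you can test each of the three magnitude bits and add a shifted copy of step. This shift-and-add form is the one many IMA ADPCM decoders use. ima_diff is a hypothetical name. Because each shifted term is truncated separately, the result can come out up to 2 lower than the truncated float expression, so don't expect bit-identical output between the two versions.

```c
/* Hypothetical helper: integer replacement for
   ((nibble & 7) + 0.5f) * step / 4, returning the signed delta
   to add to predictor. */
static int ima_diff(int nibble, int step)
{
    int diff = step >> 3;               /* the "+ 0.5" term: step / 8 */
    if (nibble & 4) diff += step;       /* magnitude bit 2: step */
    if (nibble & 2) diff += step >> 1;  /* magnitude bit 1: step / 2 */
    if (nibble & 1) diff += step >> 2;  /* magnitude bit 0: step / 4 */
    return (nibble & 8) ? -diff : diff; /* bit 3: sign */
}
```

If you'd rather match the float version exactly while still avoiding floating-point, ((2 * (nibble & 7) + 1) * step) >> 3 gives the same truncated magnitude, at the cost of a multiply.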
And that's it: you should have everything you need to load IMA4 audio, unpack it, and pass the unpacked audio to OpenAL for playback. Free your fans from the tyranny of oversized downloads!