Indexing Media

Last summer I was tasked with playing sounds from a large library in response to both user- and system-generated events.  The recording artist delivered the product as a single raw session containing all of the relevant phrases, spoken at a slow cadence.  Having no experience with audio processing, I first looked for a way to chop the audio into roughly 100 separate audio files.  That immediately became a problem: a large number of very small files to load onto a mobile device.

I went back to the recording session and noticed something I had missed the first time through.  The recording artist had a natural cadence of almost exactly one second between the start of each phrase.  Moreover, every phrase was short enough to fit within a one-second clip.  Being a novice coder, I decided to work with the environment I was given: I loaded the sound file into memory as a whole, used a form of indexing to play only the sounds I was interested in, and then closed the file.

First I used Soundbooth to do a little trimming and thresholding so that each sound began precisely on an even 1000 ms boundary.  I also researched the audio system of the framework I was coding under, Corona (not using its OpenAL library), and confirmed that I was not shooting myself in the foot by loading an audio stream of that size.  At that point I created a partitioned index space, breaking the audio into categories with a few chosen key index values, then referencing the exact sound of interest by adding an offset.
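The index scheme itself is just arithmetic.  Here is a minimal sketch of the idea (the category names and base values are hypothetical, not from my actual project): each category gets a base index, each phrase within it gets an offset, and the seek position in milliseconds is their sum times 1000.

-- Hypothetical base indices for each category of phrases.
-- Each index counts one-second slots from the start of the stream.
local SOUND_BASE = {
    greetings = 0,   -- slots 0..19
    numbers   = 20,  -- slots 20..39
    warnings  = 40,  -- slots 40..59
}

-- Convert a (category, offset) pair into a seek position in milliseconds.
local function soundIndex(category, offset)
    return (SOUND_BASE[category] + offset) * 1000
end

-- e.g. the third warning phrase starts at (40 + 2) * 1000 = 42000 ms
print(soundIndex("warnings", 2))

Because every phrase was trimmed to start on an even 1000 ms boundary, this one multiplication is all the lookup logic the player ever needs.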

-- Enqueue a request: which stream, where to seek (ms), and how long to play (ms).
table.insert(mainView.sound.queue, {mainView.sound.recording, soundIndex, 1000})

-- Later, when servicing a queued request obj:
local stream = obj[1]
audio.seek(obj[2], stream)  -- jump to the phrase's start time in the stream
audio.play(stream, {channel = 2, duration = obj[3], onComplete = playSound})

The first table entry (1-indexed?  really?  Thanks, Lua) selects which of the main recording streams I am interested in.  The key is the second table entry, which provides the position to seek to within the stream (here in milliseconds).  This gave me a very easy way of selecting sounds and playing them back without having to worry about unloading and reloading the next cut.  Furthermore, since this was an active stream, I was able to enqueue sound requests and play them back as fast as it could receive the command to seek to the new location and play.  That turned out to be a very important feature of this implementation: playing back a set of audio clips without any unnatural pauses in between.
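To make the enqueue-and-chain idea concrete, here is a sketch of how a playSound handler might drain the queue.  The queue entries match the snippet above, but this particular helper is my reconstruction under Corona's audio API, not the original code:

-- Sketch of an onComplete-driven queue drain (hypothetical reconstruction).
local function playSound()
    local obj = table.remove(mainView.sound.queue, 1)  -- pop the oldest request
    if obj == nil then return end                      -- queue empty; stop chaining

    local stream = obj[1]
    audio.seek(obj[2], stream)  -- jump straight to the next phrase's start (ms)
    audio.play(stream, {channel = 2, duration = obj[3], onComplete = playSound})
end

Because each clip's onComplete immediately fires the next seek-and-play, consecutive phrases run back to back with no load or unload step between them.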

This was a cool little trick I was able to get away with here because the loaded audio file fit comfortably within my memory constraints.  It allowed very fast switching between sets of sounds, and it gave me a fast and rather elegant way to select the sounds to be played, using simple key-defined base values and offsets for indexing.

Kevin Andrea

