Audio Processing and Indexing Project Page

<Towards Real Time Audio Mosaicing>

(Last updated: Jan-22 2018)

Project Members

Christiaan Lamers

Goals

The goal of this project is to see whether the audio mosaicing method of Driedger et al. can be modified so that it can be run in real time.

Abstract

This project is inspired by the audio mosaicing done by Driedger et al. Audio mosaicing is the process of approximating a target sound sample using slices of a source sound. Driedger et al. demonstrated this by picking "Let It Be" by the Beatles as the target sound and the sound of buzzing bees as the source sound. The result is bees buzzing the song "Let It Be". The result can be heard on their website along with more examples.
The method of Driedger et al. is designed to work on prerecorded audio. For this project we wondered if it would be possible to alter the method in order to make it functional in a real time setting. This way the method could potentially be used as a live musical effect.
The goals of this project are to recreate the method of Driedger et al., to alter the method to make it suitable for a real time environment and to try to run the method in real time.

Introduction

Driedger et al. perform audio mosaicing using the method depicted in the following figure:

Overview

In this figure we can see a target sound, in this case "Let It Be" by the Beatles, and a source sound, in this case the buzzing bees. The source sound and the target sound are each transformed using a Fast Fourier Transform in order to produce matrices containing spectrograms. The goal is to learn an activation matrix such that the matrix product of the source sound matrix and the activation matrix is similar to the target sound matrix. This product is the resulting audio mosaic. It is important to note that, while the activation matrix is being learned, the source sound matrix and the target sound matrix contain only real values, derived by taking the magnitude of their complex counterparts. The activation matrix consists of positive real values. The final audio mosaic is made by multiplying the complex values of the source sound with the real values of the activation matrix. The resulting matrix is then converted back to audio using the inverse Fast Fourier Transform.
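The pipeline described above can be sketched in a few lines of numpy. This is a minimal illustration under stated assumptions, not the project's code: the multiplicative update is the standard Kullback-Leibler rule for non-negative matrix factorization with a fixed source matrix, which we assume corresponds to the classic method; the helper names (stft, istft, learn_activations), the window and hop sizes, and the synthetic sine and noise signals are all hypothetical. The project itself uses librosa for the transform.

```python
import numpy as np

def stft(signal, n_fft=256, hop=64):
    """Minimal Hann-windowed STFT; returns a (n_fft // 2 + 1, frames) complex matrix."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

def istft(spec, n_fft=256, hop=64):
    """Overlap-add inverse of stft(), normalised by the summed squared window."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[1]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=n_fft, axis=0)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[:, i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def learn_activations(V, W, n_iter=20, eps=1e-9, seed=0):
    """Learn H >= 0 so that W @ H approximates V, keeping W (the source) fixed.

    Assumption: standard KL-divergence multiplicative update for NMF.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / col_sums
    return H

# Synthetic stand-ins for the target and source recordings (assumed, not project data).
sr = 8000
target = np.sin(2 * np.pi * 440 * np.arange(4000) / sr)
source = np.random.default_rng(1).standard_normal(2000)

target_spec = stft(target)
source_spec = stft(source)
V = np.abs(target_spec)            # real-valued target magnitudes
W = np.abs(source_spec)            # real-valued source magnitudes
H = learn_activations(V, W)        # activation matrix
mosaic_spec = source_spec @ H      # complex source values times real activations
mosaic_audio = istft(mosaic_spec)  # back to the time domain
```

Each column of H says how strongly every source frame contributes to one target frame, so the mosaic keeps the phase and timbre of the source while following the energy pattern of the target.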

Design

The project first recreates the method of Driedger et al. This recreation is then modified with a sliding window implementation so that it can accept streams of audio.
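One possible reading of the sliding window idea is sketched below. It is an assumption about how such a design could look, not a description of the project's code: incoming audio chunks are buffered, and as soon as enough samples are available for a fixed number of frames, activations are learned for that window only and emitted. The names stream_activations and frames_per_window, the sizes, and the synthetic signals are hypothetical.

```python
import numpy as np

def frame_magnitudes(samples, n_fft, hop):
    """Magnitude spectrogram of a short block of samples (Hann window)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack(
        [samples[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def stream_activations(chunks, W, n_fft=256, hop=64, frames_per_window=8,
                       n_iter=10, eps=1e-9):
    """Yield one activation matrix per window of target frames as chunks arrive."""
    rng = np.random.default_rng(0)
    buffer = np.zeros(0)
    # Number of samples that covers exactly frames_per_window overlapping frames.
    needed = n_fft + hop * (frames_per_window - 1)
    col_sums = W.sum(axis=0)[:, None] + eps
    for chunk in chunks:
        buffer = np.concatenate([buffer, chunk])
        while len(buffer) >= needed:
            V = frame_magnitudes(buffer[:needed], n_fft, hop)
            H = rng.random((W.shape[1], V.shape[1])) + eps
            for _ in range(n_iter):
                H *= (W.T @ (V / (W @ H + eps))) / col_sums
            yield H
            # Slide forward by one window of frames; the remaining samples
            # are kept because the next frames overlap with them.
            buffer = buffer[hop * frames_per_window:]

# Synthetic stand-ins (assumed): a sine as the target stream, noise as the source.
target = np.sin(2 * np.pi * 440 * np.arange(4000) / 8000)
source = np.random.default_rng(1).standard_normal(2000)
W = frame_magnitudes(source, 256, 64)

chunks = np.array_split(target, 8)  # pretend the audio arrives in 8 chunks
windows = list(stream_activations(chunks, W))
```

In this sketch the latency is bounded by the window length, while the cost per window depends on the size of the source matrix and the number of update iterations.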

Implementation

After recreating the method of Driedger et al., we ran into glitches in the audio mosaic. Because of these glitches we did not get the same results as Driedger et al., whose output recreated the texture of the source sound very well. The implementation of Driedger et al. also produced mosaics in which the melody of the target sound was clearly recognizable. With our implementation this was not the case, although rhythmic patterns are recognizable. We think this is because our implementation does not pitch shift the source sound multiple times.
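The pitch shifting explanation could be tested by extending the source matrix with pitch-shifted copies of the source sound, so that more pitches are available to match the melody of the target. The sketch below is hypothetical and was not part of our implementation: it shifts pitch by naive resampling with linear interpolation, which also changes the duration, and the function names and the set of semitone shifts are assumptions.

```python
import numpy as np

def resample_pitch_shift(samples, semitones):
    """Shift pitch by resampling; note that this also changes the duration."""
    rate = 2.0 ** (semitones / 12.0)
    n_out = int(len(samples) / rate)
    positions = np.arange(n_out) * rate
    return np.interp(positions, np.arange(len(samples)), samples)

def magnitude_spectrogram(samples, n_fft=256, hop=64):
    """Magnitude spectrogram with a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack(
        [samples[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def pitch_shifted_dictionary(source, shifts=(-2, -1, 0, 1, 2)):
    """Concatenate the spectrogram columns of several pitch-shifted source copies."""
    return np.concatenate(
        [magnitude_spectrogram(resample_pitch_shift(source, s)) for s in shifts],
        axis=1,
    )

# Synthetic stand-in for the source sound (assumed): a 440 Hz sine at 8 kHz.
source = np.sin(2 * np.pi * 440 * np.arange(2000) / 8000)
W_plain = magnitude_spectrogram(source)
W_extended = pitch_shifted_dictionary(source)

# Shifting up by 12 semitones should double the dominant frequency.
octave_up = magnitude_spectrogram(resample_pitch_shift(source, 12))
peak_plain = int(np.argmax(W_plain.mean(axis=1)))
peak_octave = int(np.argmax(octave_up.mean(axis=1)))
```

The extended matrix has more columns than the plain one, so the activation matrix grows with it and the runtime increases accordingly.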
Another problem we ran into was the long runtime of the algorithm. While we made it suitable for a streaming environment, it does not yet run in real time. However, with optimization it might be able to run in real time.

Experimentation

Driedger et al. proposed a classic method and a method with extra update rules. We implemented both of them and compared the output. We experimented with our method by measuring the runtime with different FFT window sizes. We also judged the quality of the produced audio mosaics by ear.
Classic vs. Extra update rules (our implementation):

elvis riverside (target)
crickets (source)
elvis crickets classic method (mosaic)
elvis crickets extra update rules (mosaic)

In our paper we mention an experiment where we vary the window size of the FFT and measure the effect on the runtime. This experiment uses the "back in black clean" sample as the target and the "sheep" sample as the source.

back in black clean (target)
sheep (source)
black_sheep_65000 (mosaic)
black_sheep_6500 (mosaic)
black_sheep_650 (mosaic)
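A runtime measurement of this kind can be sketched as follows. This is an illustration under assumptions and not the script used for the experiment: the signals are synthetic, the window sizes are small placeholders chosen so the sketch finishes quickly, the hop size of a quarter window is assumed, and the helper names are hypothetical.

```python
import time
import numpy as np

def magnitude_spectrogram(samples, n_fft):
    """Magnitude spectrogram with a Hann window and a hop of n_fft // 4 (assumed)."""
    hop = n_fft // 4
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack(
        [samples[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))

def time_mosaic(target, source, n_fft, n_iter=10, eps=1e-9):
    """Return (elapsed seconds, activation shape) for one FFT window size."""
    start = time.perf_counter()
    V = magnitude_spectrogram(target, n_fft)
    W = magnitude_spectrogram(source, n_fft)
    H = np.random.default_rng(0).random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / col_sums
    return time.perf_counter() - start, H.shape

# Synthetic stand-ins for the target and source samples (assumed).
target = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
source = np.random.default_rng(1).standard_normal(4000)

runtimes = {n_fft: time_mosaic(target, source, n_fft) for n_fft in (128, 256, 512)}
for n_fft, (seconds, shape) in runtimes.items():
    print(f"n_fft={n_fft}: {seconds:.4f} s, activation matrix {shape}")
```

A smaller window yields more frames for both the target and the source, so the activation matrix grows in both dimensions, which is one reason the window size can have a large effect on the runtime.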


Software Requirements

This project is implemented in Python 3. We use librosa to do the FFT, scipy to import wav files, and numpy for matrix operations.

Hardware Requirements

The experiments were run on an Intel dual-core i7 clocked at 1.7 GHz (3.2 GHz boosted), with 8 GB of DDR3 RAM clocked at 1600 MHz.

Work Plan

  1. Study and understand method of Driedger et al.

  2. Implement methods of Driedger et al.

  3. Test implementation

  4. Modify implementation in order for it to accept streams of audio

  5. Test implementation

  6. Run experiments and report them

Deliverables

References

Project Links