
Introduction to Audio Synthesis with Machine Learning

Babycastles Academy Online
 


Led by Allan Pichardo

January 17, 2021

In this workshop, we will cover the basic intuition behind the variational autoencoder, a basic generative model. We will use a variety of synth samples as training data, then use the final trained decoder to generate a range of random WAV files of unique sounds that can be used in a sampler or DAW.

In this workshop, students will learn:

  • Basic concepts behind variational autoencoders
  • Basic concepts of digital audio and spectrograms
  • How to build and train a generative model
  • How to use the trained model to generate and export unique sounds

Prerequisite / Background knowledge

No prior background in machine learning is necessary; however, it is recommended that participants feel comfortable coding in Python. All datasets, tools, and environments will be provided in the form of a Google Colab notebook, so only a web browser is required.

Schedule / Suggested Duration

  • Variational Autoencoders: Basic Intuition ~10 min
  • Spectrograms: Digital Audio Basics ~10 min
  • Our Plan: An Overview ~1 min
  • Coding Example ~1 hour
  • Training the Model ~15 min
  • Generating Sounds ~30 min
  • Questions ~15 min

Assessment

What is a variational autoencoder?

What is a convolutional neural network?

What is a latent vector?

What property of a variational autoencoder enables the generation of new data?

Glossary

Convolutional neural network - In deep learning, a convolutional neural network is a class of deep neural networks, most commonly applied to analyzing visual imagery.

Short-time Fourier Transform - STFTs, along with standard Fourier transforms and other tools, are frequently used to analyze music. The resulting spectrogram can, for example, show frequency on the horizontal axis, with the lowest frequencies at the left and the highest at the right. The height of each bar (augmented by color) represents the amplitude of the frequencies within that band, and the depth dimension represents time, where each new bar is a separate, distinct transform.

Procedure

The introduction will introduce students to the basic concepts behind machine learning. This will stay at a high level and will not delve into the calculus. We will discuss linear functions as a basic example and how the composition of functions such as f(g(x)) can lead to more complex functions; this composition of simple functions is the fundamental idea behind machine learning.
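
To make that idea concrete, here is a minimal sketch (not part of the workshop materials) showing that composing two linear functions stays linear, while adding a small nonlinearity between them, as neural network layers do, produces more complex behavior:

    import numpy as np

    # Two simple linear functions.
    def f(x):
        return 2.0 * x + 1.0

    def g(x):
        return -0.5 * x + 3.0

    # Composing two linear functions, f(g(x)), is still just a linear function.
    def composed(x):
        return f(g(x))

    # Inserting a simple nonlinearity between them lets the composition
    # express more complex (here, piecewise-linear) shapes.
    def relu(x):
        return np.maximum(0.0, x)

    def tiny_network(x):
        return f(relu(g(x)))

    xs = np.linspace(-5.0, 5.0, 5)
    print(composed(xs))      # a straight line
    print(tiny_network(xs))  # bends where the relu switches on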

Next, we will discuss digital audio and how it is represented in a computer, and then look at spectrograms and how they represent audio waves as images.
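
As a rough illustration, the snippet below computes a mel spectrogram with librosa. The file name, sample rate, and STFT settings (n_fft=1024, hop_length=256, 128 mel bands) are placeholder assumptions; the workshop notebook may use different parameters or plain STFT magnitudes.

    import numpy as np
    import librosa

    # Load a short synth sample (the file name is a placeholder).
    y, sr = librosa.load("synth_sample.wav", sr=22050, mono=True)

    # Mel spectrogram: short-time Fourier transform frames grouped into
    # 128 mel frequency bands, giving a 2-D array of (bands x time frames).
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=128)

    # Convert power to decibels so the values behave like pixel intensities.
    mel_db = librosa.power_to_db(mel, ref=np.max)
    print(mel_db.shape)  # (128, number_of_frames)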

Finally, the variational autoencoder will be introduced at a high level. We will focus on the “bottleneck” shape of the network and how it compresses complex information into simpler low-dimensional vectors.
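
The sketch below shows one possible Keras version of that bottleneck. The latent size, input shape (128 x 128 spectrogram "images"), and layer choices are illustrative assumptions, not the exact architecture used in the workshop notebook.

    import tensorflow as tf
    from tensorflow.keras import layers

    LATENT_DIM = 16                 # size of the latent "bottleneck" vector (assumption)
    INPUT_SHAPE = (128, 128, 1)     # mel bands x time frames x channels (assumption)

    class Sampling(layers.Layer):
        """Reparameterization trick: draw z from N(mean, exp(log_var))."""
        def call(self, inputs):
            z_mean, z_log_var = inputs
            # The KL divergence term keeps the latent space close to a standard
            # normal distribution, which is what later lets us sample new points.
            kl = -0.5 * tf.reduce_mean(
                1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))
            self.add_loss(kl)
            eps = tf.random.normal(tf.shape(z_mean))
            return z_mean + tf.exp(0.5 * z_log_var) * eps

    # Encoder: convolutions squeeze the spectrogram down to a small vector.
    enc_in = layers.Input(shape=INPUT_SHAPE)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(enc_in)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    z_mean = layers.Dense(LATENT_DIM)(x)
    z_log_var = layers.Dense(LATENT_DIM)(x)
    z = Sampling()([z_mean, z_log_var])
    encoder = tf.keras.Model(enc_in, z, name="encoder")

    # Decoder: the mirror image, expanding a latent vector back to a spectrogram.
    dec_in = layers.Input(shape=(LATENT_DIM,))
    x = layers.Dense(32 * 32 * 64, activation="relu")(dec_in)
    x = layers.Reshape((32, 32, 64))(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    dec_out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(x)
    decoder = tf.keras.Model(dec_in, dec_out, name="decoder")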

Following the introduction, we will do a coding exercise in which we build and train a very basic autoencoder. Students will use the dataset provided to train the model and run TensorBoard to watch the network's progress during training.
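
Continuing the sketch above (and reusing its enc_in, z, and decoder), training with a TensorBoard callback could look roughly like this. Here spectrograms is an assumed float32 NumPy array of shape (N, 128, 128, 1) scaled to [0, 1]; the actual notebook's data pipeline and loss weighting may differ.

    import tensorflow as tf

    # End-to-end model: encode each spectrogram, then decode it again.
    vae = tf.keras.Model(enc_in, decoder(z), name="vae")

    # Mean squared reconstruction error; the KL term was already attached
    # by the Sampling layer, so Keras adds the two losses together.
    vae.compile(optimizer="adam", loss="mse")

    # In an autoencoder the input doubles as the target. `spectrograms` is an
    # assumed array of training data prepared as described above.
    tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir="logs")
    vae.fit(spectrograms, spectrograms,
            epochs=50, batch_size=32, callbacks=[tensorboard_cb])

    # In Colab, the training curves can then be viewed with:
    #   %load_ext tensorboard
    #   %tensorboard --logdir logs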

At the end of the exercise, students will use the trained network to export and listen to random audio samples.
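
One way that final step might look, reusing decoder and LATENT_DIM from the sketches above: sample a random latent vector, decode it into a spectrogram, invert it back to a waveform, and write a WAV file. The dB rescaling and Griffin-Lim inversion here are assumptions about the preprocessing; the workshop notebook may handle this differently.

    import numpy as np
    import librosa
    import soundfile as sf

    # Draw a random point in the latent space and decode it into a spectrogram.
    z_random = np.random.normal(size=(1, LATENT_DIM)).astype("float32")
    generated = decoder.predict(z_random)[0, :, :, 0]   # (mel bands, time frames)

    # Undo the assumed [0, 1] scaling back to decibels, then to power.
    mel_power = librosa.db_to_power(generated * 80.0 - 80.0)

    # Approximate a waveform from the mel spectrogram (Griffin-Lim under the hood).
    waveform = librosa.feature.inverse.mel_to_audio(
        mel_power, sr=22050, n_fft=1024, hop_length=256)

    # Export a WAV file that can be dropped straight into a sampler or DAW.
    sf.write("generated_sound.wav", waveform, 22050)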

Student Reflections, Takeaways & Next Steps

Autoencoding Neural Networks as Musical Audio Synthesizers

Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras

Thanks to Lauren Gardner and EYEBEAM for past lesson plans

Allan Pichardo

Instagram: @mylovemhz

Twitter: @allanpichardo

Course materials in Google Colab notebook

All teaching resources are available in the GitHub repo

 

This program is supported, in part, by public funds from the New York City Department of Cultural Affairs in partnership with the City Council, and by Babycastles members. Thank you.

 