AI DJ that creates a mix from the individual components of 2 songs, as the final project of the Le Wagon Data Science bootcamp

On June 18th, 2021, I finished an intensive 9-week Data Science bootcamp covering data analytics in Python, real-world prediction problems with Machine Learning, and presenting those analyses and predictions in easy-to-use data projects. The final two weeks of the bootcamp were devoted to building a data science project from beginning to end. My team, consisting of Willem Sickinghe, Amine Aboufirass and myself, decided to create an Artificial Intelligence (AI) DJ.

So why an AI DJ? Well, we were in the midst of the Corona lockdown when the bootcamp started, which meant there was little relief from daytime life, since nightlife hardly existed. There were no house parties or get-togethers with friends, let alone nights at a club to see your favorite DJ. We therefore decided to take matters into our own hands and allow everybody to be a DJ, without any prior knowledge of music composition.

Sounds nice, right? But how does it work?

The user picks a song they would like to mix with and gives it to the AI DJ as input. To keep this as simple as possible, the user only needs to provide a YouTube link to the song; the AI DJ does the rest.

As this project was part of the Le Wagon Data Science Bootcamp, we gave a presentation on the project on Demo Day explaining the process. 

The AI DJ was written in Python, and we used the NumPy and pandas libraries throughout the project. Some steps, however, required dedicated libraries:

  • Extracting the audio from YouTube: youtube_dl
  • Splitting the audio: spleeter
  • Tracking the beat: madmom and librosa
  • Extracting the audio features: librosa, scipy, pyACA
  • Mixing the 4 new stems in the correct BPM and beat: librosa
  • Rating the mix: sklearn
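
To make the "correct BPM and beat" step concrete, here is a minimal sketch of the arithmetic behind tempo matching and beat alignment. It is not the project's actual code: the function names (stretch_rate, stretch_beats, alignment_offset) and the example BPM and beat values are assumptions made up for illustration, and it uses plain Python only so that it runs without any audio libraries.

```python
def stretch_rate(source_bpm: float, target_bpm: float) -> float:
    """Speed factor that brings source_bpm to target_bpm (a value > 1 speeds up)."""
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return target_bpm / source_bpm


def stretch_beats(beat_times, rate):
    """Beat timestamps (in seconds) after the audio is time-stretched by `rate`."""
    return [t / rate for t in beat_times]


def alignment_offset(beats_a, beats_b):
    """Delay (in seconds) for track B so that its first beat lands on track A's first beat."""
    return beats_a[0] - beats_b[0]


# Hypothetical example: song A at 120 BPM, song B at 96 BPM.
beats_a = [0.5, 1.0, 1.5]      # one beat every 0.5 s
beats_b = [0.25, 0.875, 1.5]   # one beat every 0.625 s

rate = stretch_rate(96.0, 120.0)              # factor by which to speed up song B
beats_b_matched = stretch_beats(beats_b, rate)
offset = alignment_offset(beats_a, beats_b_matched)
```

In a real pipeline the beat times would come from a beat tracker, and the computed factor could be passed to a time-stretching routine such as librosa.effects.time_stretch, which uses the same convention (a rate above 1 speeds the signal up). Whether the project did it exactly this way is an assumption.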

To predict the rating of the new mix, we used a Linear Regression model. This is certainly not the optimal model; had we had more than two weeks, we would have opted for a more complex one.
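
As a rough illustration of what such a rating model could look like, the sketch below fits sklearn's LinearRegression on a handful of mixes. Everything specific here is an assumption: the two features (tempo difference and key distance), the training values, the 1 to 10 rating scale and the helper predict_rating are invented for the example and do not come from the project's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training set: one row per mix.
# Columns: tempo difference between the two songs (BPM), key distance (semitones).
X = np.array([
    [0.0, 0.0],
    [5.0, 1.0],
    [10.0, 2.0],
    [20.0, 5.0],
    [30.0, 6.0],
])
# Hypothetical ratings on a 1-10 scale.
y = np.array([9.0, 8.0, 6.5, 4.0, 2.0])

# Ordinary least squares fit of rating ~ tempo difference + key distance.
model = LinearRegression().fit(X, y)


def predict_rating(tempo_diff: float, key_distance: float) -> float:
    """Predict a rating for a new mix, clipped to the assumed 1-10 scale."""
    raw = model.predict(np.array([[tempo_diff, key_distance]]))[0]
    return float(np.clip(raw, 1.0, 10.0))


good_mix = predict_rating(2.0, 0.0)
poor_mix = predict_rating(28.0, 6.0)
```

The clipping step is a design choice for this sketch: a linear model can extrapolate outside the rating scale, so the prediction is bounded before it is shown to the user.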

User interaction was handled through Heroku and Streamlit, and we stored the data on Google Cloud Platform.