Fetching and converting the music data
First, we need to choose a collection of audio files to work with. We decided to use the GTZAN dataset, which is often used as a benchmark for music genre classification. It can be fetched here.
This collection includes 10 folders, one for each of 10 music genres ranging from classical to metal. Each genre is represented by 100 audio files, each containing just the first 30 seconds of a piece of music. Since the original files in the dataset are in the Au format, we had to convert them into WAV files, simply because the Python libraries we relied on (namely scipy) can read only this format.
We therefore wrote a simple shell script to perform this dull task.
.au to .wav converter
#!/bin/sh
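# Convert every .au file in the given directory to .wav using ffmpeg.
INPUT_DIR=$1
# Use a glob instead of parsing ls, so file names with spaces are handled.
for file in "$INPUT_DIR"/*.au
do
    # Skip the unexpanded pattern when the directory has no .au files.
    [ -e "$file" ] || continue
    # Strip the .au extension and append .wav.
    new_file="${file%.*}.wav"
    ffmpeg -i "$file" "$new_file"
done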
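Because the dataset is split into one folder per genre, the converter above has to be run once for each folder. A minimal sketch of a wrapper that walks all genre folders in one pass might look like the following. The function names wav_path and convert_tree, and the idea of passing the dataset root as the first argument, are hypothetical choices made for illustration and are not part of the original script.

```shell
#!/bin/sh
# Sketch: convert every .au file under a dataset root that holds one
# sub-folder per genre. Function names are illustrative assumptions.

# Print the .wav path that corresponds to a given .au path.
wav_path() {
    printf '%s\n' "${1%.*}.wav"
}

# Walk each genre folder under the root ($1) and convert its .au files.
convert_tree() {
    for au_file in "$1"/*/*.au
    do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$au_file" ] || continue
        wav_file=$(wav_path "$au_file")
        # Leave files that were already converted untouched.
        [ -e "$wav_file" ] && continue
        ffmpeg -loglevel error -i "$au_file" "$wav_file"
    done
}

# Run only when a dataset root is passed on the command line.
if [ "$#" -gt 0 ]; then
    convert_tree "$1"
fi
```

Saved under a name such as convert_all.sh (again, a hypothetical name), it would be invoked as: sh convert_all.sh path/to/genres. Skipping files that already have a .wav counterpart makes the script safe to re-run after an interrupted conversion.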