Fast Fourier Transform as a feature extractor
Extracting meaningful features from audio files is a non-trivial task. Fortunately, the Fast Fourier Transform (FFT) algorithm helps with this. A thorough analysis of this popular algorithm is beyond the scope of this section, so we limit ourselves to saying that it extracts the intensity of individual frequencies from the raw sample readings, which yields a sort of fingerprint for a piece of music. Our data set therefore consists of these features, with the genre associated with each audio file as the output label. The transform is computed with the fft routine of the scipy module, which is documented to "return the Discrete Fourier Transform of real or complex sequences". Before the FFT is computed, each file is read with the scipy.io.wavfile.read routine, which is documented to "return the sample rate (in samples/sec) and data from a WAV file".
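As a minimal sketch of what "extracting individual frequency intensities" means, the snippet below builds a synthetic one-second signal from two sine waves and checks which frequencies carry the largest FFT magnitudes. The signal, the sample rate and the helper name dominant_frequencies are assumptions made for illustration and are not part of our pipeline; NumPy's FFT routines are used here instead of scipy's only to keep the sketch light on dependencies.

```python
import numpy as np

def dominant_frequencies(signal, sample_rate, count):
    """Return the `count` strongest frequencies (Hz) of a real signal, ascending."""
    # Magnitude of each non-negative frequency component
    magnitudes = np.abs(np.fft.rfft(signal))
    # Frequency in Hz that each component corresponds to
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    strongest = np.argsort(magnitudes)[-count:]
    return sorted(freqs[strongest].tolist())

sample_rate = 8000                          # hypothetical rate, in samples/sec
t = np.arange(sample_rate) / sample_rate    # one second of time stamps
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

peaks = dominant_frequencies(signal, sample_rate, 2)
print(peaks)
```

The two reported peaks should sit at roughly 440 Hz and 1000 Hz, the frequencies that were mixed into the signal.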
After some careful experiments, we found that using all the FFT components of each WAV file was unnecessary and required too much computation time. Restricting the feature vector to the first 1000 components gave a remarkable speedup without jeopardising effectiveness.
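To make the effect of this truncation concrete: for a clip of N samples recorded at a rate of f_s samples/sec, FFT component k corresponds to the frequency k * f_s / N, so how much of the spectrum the first 1000 components cover depends on the clip length. The sketch below computes the highest frequency kept. The sample rate of 22050 Hz, the clip length of 30 seconds and the function name are illustrative assumptions, not values stated in this section.

```python
def highest_frequency_kept(n_components, sample_rate, n_samples):
    """Frequency (Hz) of the last FFT component kept after truncation."""
    # Component k of an n_samples-point FFT sits at k * sample_rate / n_samples
    return (n_components - 1) * sample_rate / n_samples

sample_rate = 22050              # assumed rate, in samples/sec
n_samples = sample_rate * 30     # assumed clip length: 30 seconds
top = highest_frequency_kept(1000, sample_rate, n_samples)
print(round(top, 2))
```

Under these assumptions the retained components stay in the lowest part of the spectrum; a shorter clip would spread the same 1000 components over a wider frequency range.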
Even though a deep understanding of this process would demand a solid background in Signal Processing, we list some of the computed features:
- root-mean-square (RMS) level;
- spectral centroid;
- bandwidth;
- spectral roll-off frequency;
- band energy ratio;
- delta spectrum magnitude;
- pitch;
- pitch strength.
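Two of the quantities listed above have short textbook definitions, sketched here on a synthetic tone. This is an illustration under common definitions (RMS as the square root of the mean squared sample, spectral centroid as the magnitude-weighted mean frequency); it is not taken from our extraction code, and the function names and test signal are hypothetical.

```python
import numpy as np

def rms_level(signal):
    """Root-mean-square level of a signal."""
    samples = np.asarray(signal, dtype=float)  # avoid integer overflow
    return float(np.sqrt(np.mean(np.square(samples))))

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of a real signal."""
    samples = np.asarray(signal, dtype=float)
    magnitudes = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

sample_rate = 8000                          # hypothetical rate, in samples/sec
t = np.arange(sample_rate) / sample_rate    # one second of time stamps
tone = np.sin(2 * np.pi * 440 * t)          # pure 440 Hz tone

rms = rms_level(tone)
centroid = spectral_centroid(tone, sample_rate)
print(rms, centroid)
```

For a pure tone of unit amplitude, the RMS level should be close to 1/sqrt(2) and the centroid close to the tone's own frequency.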
Finally, the extracted feature set was saved to disk in the .npy format, NumPy's standard binary file format for storing a single arbitrary array.
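The following sketch shows a save and load round trip in the .npy format. The feature vector is a small stand-in and the file name example.fft is hypothetical; the point is that np.save appends the .npy extension when the given name does not already end with it, and that np.load restores the array unchanged.

```python
import os
import tempfile
import numpy as np

# Hypothetical stand-in for a feature vector
features = np.abs(np.fft.fft(np.arange(8.0)))

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "example.fft")
    np.save(target, features)        # np.save appends ".npy" to the name
    saved_path = target + ".npy"
    file_exists = os.path.exists(saved_path)
    restored = np.load(saved_path)   # read the array back into memory

print(file_exists, np.array_equal(features, restored))
```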
Feature extraction and file writing
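import os
import numpy as np
import scipy.fft
import scipy.io.wavfile

def create_fft(filename):
    s_rate, data = scipy.io.wavfile.read(filename)
    # Magnitudes of the first 1000 FFT components
    fft_feat = np.abs(scipy.fft.fft(data)[:1000])
    base_filename, ext = os.path.splitext(filename)
    # np.save appends ".npy": the file is written as <base>.fft.npy
    np.save(base_filename + ".fft", fft_feat)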