Mel Frequency Cepstral Coefficients
Mel Frequency Cepstral Coefficients (MFCC) are an alternative, somewhat more advanced approach to feature extraction that differs from the previous one in several respects. They are commonly used in research areas such as Speech Recognition and Music Information Retrieval. Let's glance at the main concepts behind the algorithm. The Mel Frequency Cepstrum ("cepstrum" being "spectrum" with its first four characters reversed) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The mel scale is a perceptual pitch scale on which equal steps are heard as equally spaced in pitch. A Mel Frequency Cepstrum is made up of a set of coefficients (the Mel Frequency Cepstral Coefficients themselves), which will serve as features for a later classification task. To work out the coefficients, the algorithm goes through the steps listed below:
- take the Fourier Transform of (a windowed excerpt of) the given signal;
- map the powers of the spectrum obtained above onto the mel scale;
- take the logarithm of the powers at each mel frequency;
- take the Discrete Cosine Transform of the list of the aforementioned logarithms;
- the amplitudes of the resulting spectrum are the coefficients we were looking for.
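The steps above can be sketched for a single frame using NumPy alone. This is only a minimal illustration, not the Talkbox implementation: the function names (`hz_to_mel`, `mel_filterbank`, `dct2`, `mfcc_frame`), the Hamming window, the mel formula 2595·log10(1 + f/700), and the defaults of 26 filters and 13 coefficients are all assumptions chosen for the example.

```python
import numpy as np


def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centres are evenly spaced on the mel scale.
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2.0), n_filters + 2))
    bank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct2(x):
    # Unnormalised DCT-II written out as a matrix product.
    n = len(x)
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * j + 1) / (2.0 * n)) @ x


def mfcc_frame(frame, sample_rate, n_filters=26, n_ceps=13):
    windowed = frame * np.hamming(len(frame))
    # Step 1: power spectrum of the windowed frame.
    power = np.abs(np.fft.rfft(windowed)) ** 2
    # Step 2: map the powers onto the mel scale.
    mel_energies = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    # Step 3: logarithm of each mel-band energy (small offset avoids log(0)).
    log_energies = np.log(mel_energies + 1e-10)
    # Steps 4 and 5: DCT of the log energies; keep the first coefficients.
    return dct2(log_energies)[:n_ceps]


# Usage: one 512-sample frame of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr)
```

A real extractor repeats this over many overlapping frames and usually adds refinements such as pre-emphasis, which are left out here for brevity.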
Fortunately, we were able to find a Python library, the Talkbox SciKit (scikits.talkbox), that implements the MFCC algorithm and offers a handy programming interface.
As with the previous approach, the extracted features are stored in NumPy's .npy format.
Feature extraction and file writing
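import os
import numpy as np
import scipy.io.wavfile
from scikits.talkbox.features import mfcc

def write_ceps(ceps, filename):
    # Store the coefficients next to the source file, e.g. song.wav -> song.ceps.npy
    base_filename, ext = os.path.splitext(filename)
    data_filename = base_filename + ".ceps"
    np.save(data_filename, ceps)  # np.save appends the ".npy" extension
    print("Written %s.npy" % data_filename)

def create_ceps(fn):
    # Read the WAV file, compute its MFCCs and write them to disk
    s_rate, X = scipy.io.wavfile.read(fn)
    ceps, mspec, spec = mfcc(X)
    write_ceps(ceps, fn)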
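
Reading the coefficients back later takes a single `np.load` call. The sketch below uses a random array as a stand-in for real MFCC output, an assumption made so that it runs without Talkbox or an audio file; the file name `track.ceps` and the 100 x 13 shape are hypothetical. It also shows that `np.save` appends `.npy` to a name that does not already end with it.

```python
import os
import tempfile

import numpy as np

# Stand-in for real MFCC output: 100 frames x 13 coefficients (hypothetical data).
ceps = np.random.RandomState(0).randn(100, 13)

tmp_dir = tempfile.mkdtemp()
data_filename = os.path.join(tmp_dir, "track.ceps")

# np.save appends ".npy" when the name does not already end with it,
# so the file on disk is "track.ceps.npy".
np.save(data_filename, ceps)
saved_path = data_filename + ".npy"

# Load the array back for the later classification step.
loaded = np.load(saved_path)
```

Any code that collects the feature files therefore needs to look for names ending in `.ceps.npy`, not `.ceps`.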