A Method for Rapid Pitch-based Speaker Segmentation

Authors

Abstract

Speaker diarization is the process of partitioning an input audio stream into homogeneous segments according to speaker identity. Voice activity detection (VAD), speaker segmentation, and speaker clustering are the main components of a speaker diarization system. Several methods exist for speaker segmentation, but most speaker diarization systems use segmentation based on the Bayesian Information Criterion (BIC). The main goal of this paper is to propose a speaker segmentation method that is faster than current methods, such as BIC-based segmentation, while keeping acceptable accuracy. The proposed method is based on the pitch frequency of speech. Its accuracy is similar to that of common speaker segmentation methods, but its computational cost is much lower. We show that our method is about 2.4 times faster than BIC-based speaker segmentation, and that its accuracy is 71%, about one percentage point higher than that of the BIC-based method.
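The abstract does not spell out the segmentation algorithm, so the following is only a hypothetical sketch of what pitch-based speaker change detection can look like. It is not the authors' method. The sketch estimates one pitch value per frame with plain autocorrelation, compares the mean voiced pitch of two adjacent windows of frames, and reports a speaker change wherever the difference exceeds a threshold. All function names, the 60 to 400 Hz search range, the voicing test, the 10-frame window, and the 50 Hz threshold are illustrative assumptions. Pure-tone "speakers" stand in for real speech.

```python
import math

# Hypothetical sketch, NOT the paper's algorithm: all names, ranges, and
# thresholds below are illustrative assumptions.


def frame_signal(signal, frame_len, hop):
    """Split a list of samples into overlapping frames."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def estimate_pitch(frame, sample_rate, fmin=60.0, fmax=400.0,
                   energy_floor=1e-4, voicing_threshold=0.3):
    """Autocorrelation pitch estimate in Hz; 0.0 means silent or unvoiced."""
    n = len(frame)
    r0 = sum(x * x for x in frame)
    if r0 / n < energy_floor:
        return 0.0  # too quiet: treat as silence
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r / r0 < voicing_threshold:
        return 0.0  # weak periodicity: treat as unvoiced
    return sample_rate / best_lag


def voiced_mean(values):
    """Mean of the voiced (non-zero) pitch values, or None if there are none."""
    voiced = [v for v in values if v > 0.0]
    return sum(voiced) / len(voiced) if voiced else None


def detect_pitch_changes(pitches, win=10, threshold_hz=50.0):
    """Frame indices where mean voiced pitch jumps by more than threshold_hz."""
    scores = {}
    for t in range(win, len(pitches) - win + 1):
        left = voiced_mean(pitches[t - win:t])
        right = voiced_mean(pitches[t:t + win])
        if left is not None and right is not None:
            scores[t] = abs(right - left)
    # Greedy non-maximum suppression: keep the strongest candidates
    # that lie at least `win` frames apart.
    changes = []
    for t in sorted(scores, key=scores.get, reverse=True):
        if scores[t] <= threshold_hz:
            break
        if all(abs(t - c) >= win for c in changes):
            changes.append(t)
    return sorted(changes)


if __name__ == "__main__":
    sr, hop = 8000, 160
    # Synthetic "speakers": 0.5 s of a 120 Hz tone, then 0.5 s of a 220 Hz tone.
    signal = [math.sin(2 * math.pi * (120 if n < 4000 else 220) * n / sr)
              for n in range(8000)]
    pitches = [estimate_pitch(f, sr) for f in frame_signal(signal, 320, hop)]
    # Convert detected frame indices to change times in seconds.
    print([t * hop / sr for t in detect_pitch_changes(pitches)])
```

The design choice this illustrates is that each frame is reduced to a single scalar, so comparing adjacent windows costs only a pair of averages. BIC-based segmentation, by contrast, fits full-covariance Gaussian models to the feature vectors on each side of every candidate boundary. Real speech would need a more robust pitch tracker and tuned thresholds.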

Keywords