Snips is looking for a Speaker Recognition Research Intern to join our team in Paris.
This internship offer fits within the scope of a collaboration between Snips and the Multispeech team of Inria Nancy - Grand Est (https://team.inria.fr/multispeech/). You will be based in our office in Paris and will have the opportunity to spend some time at Inria. This internship can be followed by a CIFRE PhD, co-supervised by Snips and Inria, on a closely-related subject.
English is the official language of the company, as we have over 12 different nationalities, so don't worry if you don't speak French!
On top of a competitive salary, we offer many perks, including relocation assistance and visa sponsorship, laptops, maker kits, language classes, sports classes, full health insurance, a transport card, and free lunches!
State-of-the-art speaker recognition systems [1] rely on speaker embedding methods such as i-vectors [2], x-vectors [3], and d-vectors [4], which achieve good recognition performance under controlled conditions. However, speaker recognition remains challenging under real-world conditions: when the system is trained on clean speech but recognition must be performed on speech corrupted by background noise, accuracy drops severely. Developing speaker recognition methods that reliably authenticate users in adverse acoustic environments is therefore a strong requirement of the emerging smart assistant sector. Existing DNN-based speaker recognition methods use short-term magnitude information but ignore the phase, which has proven useful in other speech processing applications such as speech recognition [5], speech enhancement [6], and speech separation [7].
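To make the magnitude/phase distinction above concrete, here is a minimal NumPy sketch (illustrative only, not part of any existing codebase) that splits a signal into the two parts; conventional embedding front-ends keep only the first and discard the second:

```python
import numpy as np

def stft_mag_phase(x, frame_len=512, hop=128):
    """Split a signal into per-frame magnitude and phase spectra.

    Conventional speaker-embedding front-ends (e.g. MFCC or filterbank
    features) are derived from the magnitude only; the phase is dropped.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)   # complex short-term spectrum
    return np.abs(spec), np.angle(spec)  # magnitude, phase
```

The complex spectrum factorises as `magnitude * exp(j * phase)`, so the phase carries exactly the information a magnitude-only system throws away; a phase-aware system keeps some representation of it.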
The goal of this Master's internship is to design and implement a phase-aware, DNN-based, noise-robust speaker recognition system and to evaluate it for practical applications. The intern will be responsible for (a) investigating the reliability of existing handcrafted phase representations such as the modified group delay, the relative phase, and the all-pole group delay function, and (b) developing an end-to-end DNN architecture that accounts for the phase. This work will involve both using the existing speaker recognition system in Kaldi and developing additional software in Python using the publicly available PyTorch machine learning library.
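As an indication of what a handcrafted phase representation looks like, the following sketch computes a simplified modified group delay feature for one windowed frame. It follows the classic formulation tau(w) = (X_R*Y_R + X_I*Y_I) / S(w)^(2*rho), compressed as sign(tau)*|tau|^gamma, where X is the DFT of x(n), Y the DFT of n*x(n), and S a smoothed magnitude spectrum. The moving-average smoothing and the `rho`/`gamma` defaults here are illustrative simplifications (published systems typically use cepstral smoothing and tuned exponents):

```python
import numpy as np

def modified_group_delay(frame, rho=0.4, gamma=0.9, smooth=8):
    """Simplified modified group delay feature for one windowed frame.

    tau(w) = (X_R*Y_R + X_I*Y_I) / S(w)^(2*rho), then sign(tau)*|tau|^gamma.
    S is a moving-average-smoothed |X|; real systems use cepstral smoothing.
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)          # spectrum of x(n)
    Y = np.fft.rfft(n * frame)      # spectrum of n*x(n)
    S = np.convolve(np.abs(X), np.ones(smooth) / smooth, mode="same")
    tau = (X.real * Y.real + X.imag * Y.imag) / np.maximum(S, 1e-8) ** (2 * rho)
    return np.sign(tau) * np.abs(tau) ** gamma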
The experiments will be conducted on the state-of-the-art SRE-18 speaker recognition corpus. Since this corpus does not include recordings with background noise, the intern will first simulate noisy conditions by mixing the speech data with noise signals from the publicly available MUSAN noise corpus.
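The mixing step amounts to scaling the noise so that the speech-to-noise power ratio hits a target SNR. A minimal sketch (an illustrative helper, not the actual corpus-preparation pipeline, which would also handle resampling, level normalisation, and noise-segment selection):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech at a target SNR in dB.

    The noise is scaled so that 10*log10(P_speech / P_noise) == snr_db.
    """
    noise = noise[:len(speech)]          # assume noise is at least as long
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Sweeping `snr_db` over a range (e.g. 0 to 20 dB) then yields matched clean/noisy pairs for measuring how much recognition accuracy degrades with noise.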
[1] Hansen, J. H. and Hasan, T., 2015. "Speaker recognition by machines and humans: A tutorial review." IEEE Signal Processing Magazine, 32(6), pp. 74-99. (Overview of speaker recognition)
[2] Dehak, N., Kenny, P. J., Dehak, R., Dumouchel, P. and Ouellet, P., 2011. "Front-end factor analysis for speaker verification." IEEE Transactions on Audio, Speech, and Language Processing, 19(4), pp. 788-798.
[3] Snyder, D., Garcia-Romero, D., Sell, G., Povey, D. and Khudanpur, S., 2018. "X-vectors: Robust DNN embeddings for speaker recognition." Proc. of ICASSP 2018.
[4] Variani, E., Lei, X., McDermott, E., Moreno, I. L. and Gonzalez-Dominguez, J., 2014. "Deep neural networks for small footprint text-dependent speaker verification." Proc. of ICASSP 2014.
[5] Kim, C., Sainath, T., Narayanan, A., Misra, A., Nongpiur, R. and Bacchiani, M., 2018. "Spectral distortion model for training phase-sensitive deep-neural networks for far-field speech recognition." Proc. of ICASSP 2018.
[6] Zheng, N. and Zhang, X., 2019. "Phase-aware speech enhancement based on deep neural networks." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(1), pp. 63-76.
[7] Erdogan, H., Hershey, J. R., Watanabe, S. and Le Roux, J., 2015. "Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks." Proc. of ICASSP 2015.