Speaker: Björn Schuller, DI Dr.
Institute for Human-Machine Communication, Technische Universität München, D-80333 München, GERMANY, +49-(0)89-289-28548
Intelligent Acoustic Solutions Group, DIGITAL - Institute for Information and Communication Technologies, JOANNEUM RESEARCH, Forschungsgesellschaft mbH, Steyrergasse 17, 8010 Graz, AUSTRIA, +43-316-876-5012
The combination of advanced Machine Intelligence and Signal Processing techniques holds promise far beyond today's computer audition systems. Besides Automatic Speech Recognition, an increasing number of further speech and speaker characterisation tasks have recently been pursued, aimed at giving technical systems social competence. In addition, the younger field of Music Information Retrieval is growing, and there is emerging interest in the computationally 'intelligent' analysis of general sound events. Fields of application include audio coding, editing, interaction, search, and surveillance, as well as coaching and entertainment. This talk first advocates a unified view of the resulting multiplicity of tasks. It then gives a broad overview of the field, enriched by extensive research and project results from the presenter's latest work. The focus lies on realistic conditions and on standardisation through open-source software implementations and comparative evaluations. Robustness is advanced by recent, innovative methods such as automated data acquisition by active and semi-supervised learning, signal enhancement by non-negative matrix factorisation, analytical feature brute-forcing, and memory-enhanced learning, for example in combination with tailored graphical model structures. Machine-based recognition of speech, non-linguistic vocalisations, and paralinguistic speaker states and traits serves as an example of applied speech processing. As for music processing, examples include blind separation of instruments; determination of tempo, metre, and ballroom dance style; analysis of musical key, chord progression, and structure; and estimation of music mood and singer traits. Finally, these examples are complemented by the recognition of general sound events along with their emotional connotation. The outlook shows avenues towards universal evolutionary computer audition.
Björn W. Schuller received his diploma in 1999, his doctoral degree for his study on Automatic Speech and Emotion Recognition in 2006, and his habilitation for his work on Intelligent Audio Analysis in 2012, all in electrical engineering and information technology from TUM (Munich University of Technology). At present, he is with JOANNEUM RESEARCH, Institute for Information and Communication Technologies, in Graz, Austria, working in the Research Group for Intelligent Acoustic Solutions. He is also tenured as Senior Lecturer in Signal Processing and Machine Intelligence and has headed the Intelligent Audio Analysis Group at TUM's Institute for Human-Machine Communication since 2006. From 2009 to 2010 he lived in Paris, France, and was with the CNRS-LIMSI Spoken Language Processing Group in Orsay, France. In 2010 he was also a visiting scientist at Imperial College London's Department of Computing in London, UK. In 2011 he was a guest lecturer at the Università Politecnica delle Marche in Ancona, Italy, and a visiting researcher at NICTA in Sydney, Australia.
Dr. Schuller is president-elect of the HUMAINE Association and a member of the ACM, IEEE, and ISCA. He has (co-)authored 4 books and more than 300 publications in peer-reviewed books (23), journals (45), and conference proceedings in the field, leading to more than 3,000 citations (h-index = 28). He serves as co-founding member and secretary of the steering committee, associate editor, and guest editor of the IEEE Transactions on Affective Computing; associate and repeated guest editor for Computer Speech and Language; associate editor for the IEEE Transactions on Systems, Man and Cybernetics: Part B Cybernetics and the IEEE Transactions on Neural Networks and Learning Systems; guest editor for the IEEE Intelligent Systems Magazine, Speech Communication, Image and Vision Computing, Cognitive Computation, and the EURASIP Journal on Advances in Signal Processing; reviewer for more than 50 leading journals and 30 conferences in the field; workshop and challenge organiser, including the first-of-their-kind INTERSPEECH 2009 Emotion, 2010 Paralinguistic, 2011 Speaker State, and 2012 Speaker Trait Challenges and the 2011 and 2012 Audio/Visual Emotion Challenge and Workshop; and programme committee member of 50 international workshops and conferences. His steering of and involvement in current and past research projects include the European Community funded ASC-Inclusion STREP project as coordinator, the award-winning SEMAINE project, and projects funded by the German Research Foundation (DFG) and by companies such as BMW, Continental, Daimler, HUAWEI, Siemens, Toyota, and VDO.