Publications
Dereverberation for Active Human-Robot Communication Robust to Speaker's Face Orientation
Abstract
Reverberation poses a problem for active robot audition systems. A change in the speaker's face orientation relative to the robot perturbs the room acoustics and alters the reverberation condition at runtime, which degrades automatic speech recognition (ASR) performance. In this paper, we present a method to mitigate this problem in the context of ASR. First, filter coefficients are derived to correct the Room Transfer Function (RTF) for each change in face orientation; we treat the change in face orientation as a filtering mechanism that captures the room acoustics. The joint dynamics between this filter and the observed reverberant speech are then investigated in conjunction with the ASR system. Second, we introduce a gain correction scheme to compensate for the change in power as a function of face orientation. This scheme is also linked to the ASR system, with gain parameters derived via the Viterbi algorithm. Experimental results using a Hidden Markov Model-Deep Neural Network (HMM-DNN) ASR system in a reverberant robot environment show that the proposed method is robust to changes in face orientation and outperforms state-of-the-art dereverberation techniques.
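As a rough illustration of the core idea, the sketch below applies an orientation-specific correction filter and scalar gain to observed reverberant speech. The filter taps, gain table, and function names are made-up placeholders; in the paper, the filters are derived from measured RTFs and the gains via Viterbi alignment against the ASR models.

```python
import numpy as np

def correct_observation(x, correction_filter, gain):
    """Apply an orientation-specific correction filter and gain to the
    observed reverberant speech x (1-D array of samples)."""
    y = np.convolve(x, correction_filter, mode="same")
    return gain * y

# Placeholder correction filters and gains, indexed by estimated face
# orientation in degrees (illustrative values only).
correction_filters = {0: np.array([1.0]), 30: np.array([0.9, 0.08, 0.02])}
gains = {0: 1.0, 30: 1.15}

x = np.random.randn(16000)   # stand-in for one second of speech at 16 kHz
orientation = 30             # estimated face orientation in degrees
y = correct_observation(x, correction_filters[orientation], gains[orientation])
```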
Citation
Gomez, R., Ivanchuk, L., Nakamura, K., Mizumoto, T., & Nakadai, K. (2015). Dereverberation for active human-robot communication robust to speaker's face orientation. In Sixteenth Annual Conference of the International Speech Communication Association.
Utilizing Visual Cues in Robot Audition for Sound Source Discrimination in Speech-based Human-Robot Communication
Abstract
Through simple listening, human beings can easily discern whether an observed acoustic signal is direct speech, reflected speech, or noise; acoustic cues alone are enough for humans to discriminate between these kinds of sound sources, but this is not straightforward for machines. A robot equipped with a current robot audition mechanism will, in most cases, fail to differentiate direct speech from other sound sources because acoustic information alone is insufficient for effective discrimination. Robot audition is an important topic in speech-based human-robot communication: it enables the robot to associate the incoming speech signal with the user. In challenging environments, this task becomes difficult due to reflections of the direct speech signal and background noise sources. To counter this problem, a robot needs a minimum amount of prior information to discriminate the valid speech signal (direct speech) from the contaminants (i.e., speech reflections and background noise sources). Failure to do so leads to false speech-to-speaker association in robot audition and gravely impacts the human-robot communication experience. In this paper, we propose using visual cues to augment traditional robot audition, which relies solely on acoustic information. The proposed method significantly improves the accuracy of speech-to-speaker association and machine understanding performance in real environments. Experimental results show that our expanded system is robust in discriminating direct speech from speech reflections and background noise sources.
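The fusion idea can be sketched in a few lines: a localized sound source is accepted as direct speech only when its direction of arrival (DOA) coincides with the direction of a visually detected face; otherwise it is treated as a reflection or background noise. The threshold and helper names below are illustrative assumptions, not values from the paper.

```python
# Hypothetical DOA tolerance (degrees) for matching a sound source
# against a detected face; the paper derives its own criteria.
DOA_TOLERANCE_DEG = 10.0

def classify_source(source_doa_deg, face_directions_deg):
    """Label a sound source given its DOA and the directions of faces
    currently detected by the robot's camera."""
    for face_doa in face_directions_deg:
        if abs(source_doa_deg - face_doa) <= DOA_TOLERANCE_DEG:
            return "direct_speech"
    return "reflection_or_noise"

print(classify_source(42.0, [40.0, -15.0]))   # -> direct_speech
print(classify_source(175.0, [40.0, -15.0]))  # -> reflection_or_noise
```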
Citation
R. Gomez, L. Ivanchuk, K. Nakamura, T. Mizumoto and K. Nakadai, "Utilizing visual cues in robot audition for sound source discrimination in speech-based human-robot communication," 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, 2015, pp. 4216-4222.
doi: 10.1109/IROS.2015.7353974
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7353974&isnumber=7353104
SmartColor: Real-Time Color Correction and Contrast for Optical See-Through Head-Mounted Displays
Abstract
Users of optical see-through head-mounted displays (OHMDs) perceive color as a blend of the display color and the background. Color blending is a major usability challenge, as it leads to loss of color encodings and poor text legibility. Color correction aims at mitigating color blending by producing an alternative color which, when blended with the background, more closely approaches the color originally intended. To date, approaches to color correction either do not yield optimal results or do not work in real time. This paper makes two contributions. First, we present QuickCorrection, a real-time color correction algorithm based on display profiles. We describe the algorithm, measure its accuracy, and analyze two implementations for the OpenGL graphics pipeline. Second, we present SmartColor, a middleware for color management of user-interface components in OHMDs. SmartColor uses color correction to provide three management strategies: correction, contrast, and show-upon-contrast. Correction determines the alternative color which best preserves the original color. Contrast determines the color which best guarantees text legibility while preserving as much of the original hue as possible. Show-upon-contrast makes a component visible when a related component does not have enough contrast to be legible. We describe SmartColor's architecture and illustrate the color strategies for various types of display content.
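A minimal sketch of the underlying problem, assuming a purely additive see-through display where the perceived color is roughly display + background: a naive correction subtracts the background from the desired color and clamps to the displayable range. The actual QuickCorrection algorithm instead searches a measured display profile; this simplified model is only for intuition.

```python
def naive_correction(desired_rgb, background_rgb):
    """Return the display color whose additive blend with the background
    best approximates the desired color (componentwise, clamped to [0, 1])."""
    return tuple(min(1.0, max(0.0, d - b))
                 for d, b in zip(desired_rgb, background_rgb))

desired = (0.2, 0.4, 0.9)      # intended UI color
background = (0.3, 0.3, 0.1)   # real-world scene color behind the component
print(naive_correction(desired, background))  # approx. (0.0, 0.1, 0.8)
```

Note how the red channel clamps to 0.0 because the background is already brighter than the desired value: an additive display cannot darken the scene, which is exactly the loss-of-encoding problem the abstract describes.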
Citations
J. David Hincapié-Ramos, L. Ivanchuk, S. K. Sridharan and P. Irani, "SmartColor: Real-time color correction and contrast for optical see-through head-mounted displays," 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, 2014, pp. 187-194.
doi: 10.1109/ISMAR.2014.6948426
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6948426&isnumber=6948385
J. David Hincapié-Ramos, L. Ivanchuk, S. K. Sridharan and P. P. Irani, "SmartColor: Real-Time Color and Contrast Correction for Optical See-Through Head-Mounted Displays," in IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 12, pp. 1336-1348, Dec. 1 2015.
doi: 10.1109/TVCG.2015.2450745
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7138644&isnumber=7299343