Haru Research and Experiments in Ukraine by Levko Ivanchuk

About Haru

Haru, first introduced in 2018, is a robot designed as a platform to investigate social presence and emotional and empathetic engagement in long-term human-robot interaction. Envisioned as a multimodal communicative agent, Haru interacts through nonverbal sounds (paralanguage), eye, face, and body movements (kinesics), and voice (language). While some of Haru's features connect it to a long lineage of social robots, others distinguish it and suggest new opportunities for human-robot interaction.

The first Haru prototype is a relatively simple communication device, an open-ended platform through which researchers can explore the mechanisms of multimodal human-robot communication. Researchers deploy various design techniques, technologies, and interaction theories to develop Haru's basic skillset for conveying information, personality, and affect. Currently, researchers are developing Haru's interaction capabilities to create a new form of telepresence embodiment that breaks away from the traditional notion of a tablet on wheels, while also exploring ways for the robot to communicate with people more directly.

Inclusive Practice and Accessibility

Since Haru is primarily a social robot, it is a natural fit for environments such as hospitals and schools, where it can realise its potential most fully. In a hospital, Haru can support patients who are going through difficult times and need empathy and encouragement; in a school, it can act as an encouraging mediator, facilitating productive communication between children as they learn and interact with each other.

In Ukraine, our goal is to deploy Haru in schools in exactly this mediating role, enabling children to interact with peers from other countries with Haru as the encouraging mediator. We also plan to run a pilot of the Haru4Kids platform (in proceedings) as a long-term cohabitation study in families.

Global Reach and Scalability

Haru is jointly developed by a multidisciplinary, multi-stakeholder team scattered across the globe. Many research labs, universities, and private companies, including us, have joined forces to develop Haru and to enable previously unseen interactions across a variety of new domains and settings.

To broaden our outreach in Ukraine and to facilitate experiments, we have partnered with two local schools in the city of Lviv. Школа вільних та небайдужих (School of the Free and the Caring) and IT Step School Lviv have graciously agreed to involve their students in studies and experiments with Haru.

Benefits

We believe that conducting this research in Ukraine is particularly beneficial, especially in light of recent events. Although Ukraine has recently been a leader in some areas of digitalisation and digital government, and its IT sector is growing rapidly, it is still extremely rare for children to have the opportunity to interact with a robot, especially a social one. Such a study would be well received at any school by children, parents, and teachers alike.

Children are among those most affected by the Russian aggression against Ukraine and the ongoing war, both physically and mentally. Many children at the schools we have partnered with are refugees from the occupied parts of the country and have gone through a difficult process of relocation and readjustment. Because of frequent air raids and sirens, children are also often forced to spend their lessons sheltering in basements. All of this takes a toll on their mental wellbeing.

We believe that by bringing Haru to these children and letting them interact with it, we will contribute positively to their wellbeing and, we hope, spark their interest in computer science and robotics.

Responsible Practices

In all our research, we place particular emphasis on data privacy. Whenever possible, data is anonymised, processed, and stored on-device or on the local network. We avoid external services as much as possible and immediately delete all non-essential data. Both children and their parents are informed about, and asked to consent to, any data collection during the experiment, and they are told how the data will be used once the experiment is over.
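As one illustration of what keeping such a pipeline local can look like, the sketch below pseudonymises participant identifiers with a salted hash before any record is written to disk. The record fields, salt handling, and output path are hypothetical; this is a minimal example, not our actual data-handling code.

    # Minimal sketch (hypothetical fields and paths) of pseudonymising
    # participant identifiers before storing session records locally.
    import hashlib
    import json
    import os

    def pseudonym(participant_name, salt):
        """Replace a real name with a salted, irreversible identifier."""
        digest = hashlib.sha256(salt + participant_name.encode("utf-8"))
        return digest.hexdigest()[:12]

    def store_session_locally(record, salt, path="sessions.jsonl"):
        """Strip the direct identifier and append the record to a local file."""
        safe = dict(record)
        safe["participant"] = pseudonym(safe.pop("name"), salt)
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(safe, ensure_ascii=False) + "\n")

    salt = os.urandom(16)  # in practice, a persistent secret kept on the device
    store_session_locally({"name": "Example Child", "duration_s": 310}, salt)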

Utilizing visual cues in robot audition for sound source discrimination by Levko Ivanchuk


The goal of this project was simple: what kind of visual information can we use to improve verbal human-robot communication? Humans have a remarkable ability to tell noise from actual speech and to identify the direction a sound comes from. For robots and other systems that rely on speech input, this task is much harder than it first seems, as audio information alone is often not sufficient to separate noise from speech.

I therefore implemented a system that used a Microsoft Kinect V2 to determine the locations of humans around the robot. This information was then made available to the sound localization and separation system, which automatically ignored noise and sound reflections coming from directions other than the position of an actual human.
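The gating step itself is conceptually simple. The sketch below illustrates it under assumptions of my own (azimuth angles in the microphone-array frame and a 15-degree tolerance); it is not the code that ran on the robot.

    # Minimal sketch of gating localized sound sources by the directions of
    # people tracked with a depth camera. Angles are azimuths in degrees in
    # the microphone-array frame; the tolerance is an illustrative value.
    import math

    def angular_difference(a, b):
        """Smallest absolute difference between two azimuths, in degrees."""
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)

    def filter_sources_by_people(source_azimuths, person_azimuths, tolerance_deg=15.0):
        """Keep only sound-source directions that coincide with a tracked person."""
        kept = []
        for src in source_azimuths:
            if any(angular_difference(src, person) <= tolerance_deg
                   for person in person_azimuths):
                kept.append(src)
        return kept

    # Example: people tracked at -30 and 45 degrees; sources detected at 44 and
    # 170 degrees. The 170-degree source (e.g. a reflection) is discarded.
    print(filter_sources_by_people([44.0, 170.0], [-30.0, 45.0]))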

In addition, the speaker's face and mouth were tracked. With this information, we were able to ignore any noise coming from behind the speaker, and because we knew when the speaker was moving his or her mouth, we processed only those sounds emitted during the movement. Although mouth tracking requires a visual line of sight between the camera and the human, it significantly improves the quality of noise filtering during sound source localization.
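A much-simplified version of that mouth-movement gate could look like the following; the window length and movement threshold are illustrative values chosen for the sketch, not parameters of the actual system.

    # Sketch of gating audio by mouth movement: audio captured at a given time
    # is forwarded only if the mouth-openness signal varied enough over a short
    # recent window of face-tracker samples.
    from collections import deque

    class MouthMovementGate:
        def __init__(self, window=10, variation_threshold=0.02):
            # window: number of recent mouth-openness samples to consider
            # variation_threshold: minimum openness range that counts as movement
            self.history = deque(maxlen=window)
            self.threshold = variation_threshold

        def update(self, mouth_openness):
            """mouth_openness: normalized lip distance from the face tracker."""
            self.history.append(mouth_openness)

        def is_speaking(self):
            if len(self.history) < self.history.maxlen:
                return False
            return (max(self.history) - min(self.history)) >= self.threshold

    # Simulated openness trace: still at first, then the mouth starts moving.
    gate = MouthMovementGate()
    trace = [0.00, 0.00, 0.01, 0.05, 0.08, 0.03, 0.07, 0.02, 0.06, 0.01, 0.05]
    for t, openness in enumerate(trace):
        gate.update(openness)
        print(t, gate.is_speaking())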

Using Face tracking to assist Human-Robot Communication by Levko Ivanchuk

Overall system structure


Any active robot audition system, or indeed any automatic speech recognition (ASR) system, suffers from acoustic reverberation. Unless recording conditions are perfect, an ASR system has to cope with the same sound source being picked up by the microphones multiple times as the sound reflects off various room surfaces. The problem is compounded when the room acoustics are perturbed by a change in the speaker's face orientation. The research team at HRI Japan, which I was part of at the time, saw an opportunity to improve the results of our dereverberation techniques by taking the speaker's face orientation into account, something no other method had attempted before.

My contribution to the dereverberation approach of R. Gomez et al. was to provide the face orientation vector for the correct selection of the Room Transfer Function (RTF), as well as the correct inputs for a more accurate gain correction scheme. According to our findings, this combination of visual and audio information outperforms state-of-the-art dereverberation techniques.
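To make the role of that orientation vector concrete, here is a deliberately simplified sketch of selecting a precomputed RTF by face orientation. The discrete orientation bins and the nearest-neighbour lookup are my own simplifications for illustration; the published RTF selection and gain correction are considerably more involved.

    # Sketch: pick the precomputed Room Transfer Function whose measurement
    # orientation is closest to the speaker's current face orientation.
    def select_rtf(face_orientation_deg, rtf_bank):
        """rtf_bank maps a face-orientation angle (degrees) to a measured RTF."""
        nearest = min(rtf_bank, key=lambda angle: abs(angle - face_orientation_deg))
        return rtf_bank[nearest]

    # Hypothetical bank: RTFs measured with the speaker facing 0, +/-45, +/-90 degrees.
    rtf_bank = {-90: "rtf_m90", -45: "rtf_m45", 0: "rtf_0", 45: "rtf_p45", 90: "rtf_p90"}
    print(select_rtf(30.0, rtf_bank))   # -> "rtf_p45"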

In my work, I used Microsoft's Kinect V2 sensor to obtain the speaker's location in the room. Knowing the positions of the camera, the microphone, and the speaker, I was able to calculate not only the speaker's position but also the speaker's face rotation relative to the microphone.
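The geometry behind that calculation is straightforward: given the speaker's position, the microphone's position, and a face-forward vector from the head-pose estimate, the face orientation relative to the microphone is the angle between the face-forward vector and the direction from the speaker to the microphone. The sketch below illustrates this with assumed coordinates; it is not the original implementation.

    # Sketch: angle (degrees) between where the speaker faces and the direction
    # to the microphone. All positions share one 3D frame; face_forward is a
    # unit vector from the head-pose estimate.
    import math

    def face_angle_to_microphone(speaker_pos, mic_pos, face_forward):
        to_mic = [m - s for m, s in zip(mic_pos, speaker_pos)]
        norm_mic = math.sqrt(sum(c * c for c in to_mic))
        norm_face = math.sqrt(sum(c * c for c in face_forward))
        cos_angle = sum(a * b for a, b in zip(to_mic, face_forward)) / (norm_mic * norm_face)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

    # Speaker 2 m in front of the microphone, facing 45 degrees away from it.
    print(face_angle_to_microphone((0.0, 0.0, 2.0), (0.0, 0.0, 0.0),
                                   (math.sin(math.radians(45)), 0.0, -math.cos(math.radians(45)))))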

While this was sufficient information for the dereverberation method, I went further and improved noise filtering by tracking mouth movements. Intuitively, sound coming from the speaker should only be considered when the mouth is moving. Hence, any sound coming from the speaker's direction while no significant, continuous mouth movement was detected by the sensor was ignored and never sent to the ASR system at all.

Our work was published at Interspeech 2015; the citation is given below.

Citation: 

Gomez, R., Ivanchuk, L., Nakamura, K., Mizumoto, T., & Nakadai, K. (2015). Dereverberation for active human-robot communication robust to speaker's face orientation. In Sixteenth Annual Conference of the International Speech Communication Association.


SmartColor by Levko Ivanchuk

A video demonstrating SmartColor in action in a few real-life scenarios.

During my internship at the University of Manitoba Human-Computer Interaction Lab, I worked primarily with Juan David Hincapié-Ramos on a project that aimed to improve color reproduction on optical see-through head-mounted displays (OHMDs).

Users of optical see-through head-mounted displays (OHMD) perceive color as a blend of the display color and the background. Color blending is a major usability challenge, as it leads to loss of color encodings and poor text legibility. Color correction aims to mitigate color blending by producing an alternative color which, when blended with the background, more closely approaches the color originally intended. To date, approaches to color correction either do not yield optimal results or do not work in real time.

Our paper makes two contributions. First, we present QuickCorrection, a real-time color correction algorithm based on display profiles. We describe the algorithm, measure its accuracy, and analyze two implementations for the OpenGL graphics pipeline. Second, we present SmartColor, a middleware for color management of user-interface components in OHMDs. SmartColor uses color correction to provide three management strategies: correction, contrast, and show-upon-contrast. Correction determines the alternative color which best preserves the original color. Contrast determines the color which best guarantees text legibility while preserving as much of the original hue as possible. Show-upon-contrast makes a component visible when a related component does not have enough contrast to be legible. We describe SmartColor's architecture and illustrate the color strategies for various types of display content.
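To give a feel for the underlying problem, here is a toy sketch of color blending and correction. It assumes a naive additive RGB blend and a coarse brute-force search, which is emphatically not the QuickCorrection algorithm (QuickCorrection works on measured display profiles and runs in real time); it only illustrates why an alternative display color can bring the perceived blend closer to the intended color.

    # Toy model: on an optical see-through display, the perceived color is
    # roughly the emitted display color plus the background light. Correction
    # searches for the display color whose blend best matches the target.
    def blend(display_rgb, background_rgb):
        """Naive additive blend, clipped to the displayable range [0, 255]."""
        return tuple(min(255, d + b) for d, b in zip(display_rgb, background_rgb))

    def correct(target_rgb, background_rgb, step=8):
        """Grid-search the display color whose blend is closest to target_rgb."""
        best, best_err = None, float("inf")
        for r in range(0, 256, step):
            for g in range(0, 256, step):
                for b in range(0, 256, step):
                    perceived = blend((r, g, b), background_rgb)
                    err = sum((p - t) ** 2 for p, t in zip(perceived, target_rgb))
                    if err < best_err:
                        best, best_err = (r, g, b), err
        return best

    # Intended mid-red UI color seen over a bright bluish background:
    print(correct((200, 40, 40), (60, 60, 120)))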

Citations: 

J. David Hincapié-Ramos, L. Ivanchuk, S. K. Sridharan and P. Irani, "SmartColor: Real-time color correction and contrast for optical see-through head-mounted displays," 2014 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Munich, 2014, pp. 187-194.
doi: 10.1109/ISMAR.2014.6948426
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6948426&isnumber=6948385

J. David Hincapié-Ramos, L. Ivanchuk, S. K. Sridharan and P. P. Irani, "SmartColor: Real-Time Color and Contrast Correction for Optical See-Through Head-Mounted Displays," in IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 12, pp. 1336-1348, Dec. 1 2015.
doi: 10.1109/TVCG.2015.2450745
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7138644&isnumber=7299343

Here is an example of SmartColor in action: 

From top left to bottom right: 1) not corrected, 2) corrected using fragment shader correction, 3) corrected using vertex shader correction, 4) corrected using vertex shader correction plus the voting mechanism.
