Ruei-Che Chang

rueiche@umich.edu
Twitter
Google Scholar
Curriculum Vitae

I am a Ph.D. candidate in the Department of Computer Science at the University of Michigan, advised by Anhong Guo.

My research focuses on building Human-AI systems that enable blind or visually impaired people to access the real world, specifically with audio and captioning interfaces.

During my Ph.D. studies, I interned at Meta Reality Labs. Before that, I earned a Master's degree in Computer Science from Dartmouth College and a Bachelor's degree in Electrical Engineering from National Cheng Kung University in Taiwan.

News

Oct 03, 2024	WorldScribe received the Best Paper Award 🏆 at UIST’24!
Jul 04, 2024	WorldScribe is conditionally accepted to UIST’24
Jul 04, 2024	EditScribe and CustomAD are conditionally accepted to ASSETS’24

Selected Publications

WorldScribe: Towards Context-Aware Live Visual Descriptions

Ruei-Che Chang, Yuxuan Liu, and Anhong Guo

In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology (UIST ’24), Pittsburgh, PA, USA, 2024

🏆 Best Paper Award

Automated live visual descriptions can aid blind people in understanding their surroundings with autonomy and independence. However, providing descriptions that are rich, contextual, and just-in-time has been a long-standing challenge in accessibility. In this work, we develop WorldScribe, a system that generates automated live real-world visual descriptions that are customizable and adaptive to users’ contexts: (i) WorldScribe’s descriptions are tailored to users’ intents and prioritized based on semantic relevance. (ii) WorldScribe is adaptive to visual contexts, e.g., providing consecutively succinct descriptions for dynamic scenes, while presenting longer and detailed ones for stable settings. (iii) WorldScribe is adaptive to sound contexts, e.g., increasing volume in noisy environments, or pausing when conversations start. Powered by a suite of vision, language, and sound recognition models, WorldScribe introduces a description generation pipeline that balances the tradeoff between description richness and latency to support real-time use. The design of WorldScribe is informed by prior work on providing visual descriptions and a formative study with blind participants. Our user study and subsequent pipeline evaluation show that WorldScribe can provide real-time and fairly accurate visual descriptions to facilitate environment understanding that is adaptive and customized to users’ contexts. Finally, we discuss the implications and further steps toward making live visual descriptions more context-aware and humanized.

EditScribe: Non-Visual Image Editing with Natural Language Verification Loops

Ruei-Che Chang, Yuxuan Liu, Lotus Zhang, and Anhong Guo

In Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS ’24), St. John’s, Newfoundland and Labrador, Canada, 2024

Image editing is an iterative process that requires precise visual evaluation and manipulation for the output to match the editing intent. However, current image editing tools provide neither accessible interaction nor sufficient feedback for blind and low vision individuals to achieve this level of control. To address this, we developed EditScribe, a prototype system that makes image editing accessible using natural language verification loops powered by large multimodal models. Using EditScribe, the user first comprehends the image content through initial general and object descriptions, then specifies edit actions using open-ended natural language prompts. EditScribe performs the image edit, and provides four types of verification feedback for the user to verify the performed edit, including a summary of visual changes, AI judgement, and updated general and object descriptions. The user can ask follow-up questions to clarify and probe into the edits or verification feedback, before performing another edit. In a study with ten blind or low-vision users, we found that EditScribe supported participants in performing and verifying image edit actions non-visually. We observed different prompting strategies from participants, and their perceptions of the various types of verification feedback. Finally, we discuss the implications of leveraging natural language verification loops to make visual authoring non-visually accessible.

SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness

Ruei-Che Chang, Chia-Sheng Hung, Bing-Yu Chen, Dhruv Jain, and Anhong Guo

In Proceedings of the 2024 ACM Designing Interactive Systems Conference (DIS ’24), IT University of Copenhagen, Denmark, 2024

Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum posts within the blind community, identifying prevailing challenges, needs, and desired solutions. We synthesized the results and propose SoundShift for increasing MR sound awareness, which includes six sound manipulations: Transparency Shift, Envelope Shift, Position Shift, Style Shift, Time Shift, and Sound Append. To evaluate the effectiveness of SoundShift, we conducted a user study with 18 blind participants across three simulated MR scenarios, where participants identified specific sounds within intricate soundscapes. We found that SoundShift increased MR sound awareness and minimized cognitive load. Finally, we developed three real-world example applications to demonstrate the practicality of SoundShift.

OmniScribe: Authoring Immersive Audio Descriptions for 360° Videos

Ruei-Che Chang, Chao-Hsien Ting, Chia-Sheng Hung, Wan-Chen Lee, Liang-Jin Chen, Yu-Tzu Chao, Bing-Yu Chen, and Anhong Guo

In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (UIST ’22), Bend, OR, USA, 2022

Blind people typically access videos via audio descriptions (AD) crafted by sighted describers who comprehend, select, and describe crucial visual content in the videos. 360° video is an emerging storytelling medium that enables immersive experiences that people might not otherwise have in everyday life. However, the omnidirectional nature of 360° videos makes it challenging for describers to perceive the holistic visual content and interpret spatial information that is essential to create immersive ADs for blind people. Through a formative study with a professional describer, we identified key challenges in describing 360° videos and iteratively designed OmniScribe, a system that supports the authoring of immersive ADs for 360° videos. OmniScribe uses AI-generated content-awareness overlays for describers to better grasp 360° video content. Furthermore, OmniScribe enables describers to author spatial AD and immersive labels for blind users to consume the videos immersively with our mobile prototype. In a study with 11 professional and novice describers, we demonstrated the value of OmniScribe in the authoring workflow, and a study with 8 blind participants revealed the promise of immersive AD over standard AD for 360° videos. Finally, we discuss the implications of promoting 360° video accessibility.