TY - GEN
T1 - Voice and Touch Based Error-tolerant Multimodal Text Editing and Correction for Smartphones
AU - Zhao, Maozheng
AU - Cui, Wenzhe
AU - Ramakrishnan, I. V.
AU - Zhai, Shumin
AU - Bi, Xiaojun
N1 - Publisher Copyright: © 2021 Owner/Author.
PY - 2021/10/10
Y1 - 2021/10/10
AB - Editing operations such as cut, copy, and paste, and correcting errors in typed text, are often tedious and challenging to perform on smartphones. In this paper, we present VT, a voice and touch-based multimodal text editing and correction method for smartphones. To edit text with VT, the user glides a finger over a text fragment and dictates a command, such as "bold" to change the format of the fragment, or taps inside a text area and speaks a command such as "highlight this paragraph" to edit the text. To correct text, the user taps approximately on the area of an erroneous text fragment and dictates the new content for substitution or insertion. VT combines touch and voice inputs with language context, such as a language model and phrase similarity, to infer the user's editing intention, which allows it to handle ambiguities and noisy input signals. This is a major advantage over existing error correction methods (e.g., iOS's Voice Control), which require precise cursor control or text selection. Our evaluation shows that VT significantly improves the efficiency of text editing and text correction on smartphones over both the touch-only method and iOS's Voice Control. Our user studies showed that VT reduced text editing time by 30.80% and text correcting time by 29.97% compared with the touch-only method, and reduced text editing time by 30.81% and text correcting time by 47.96% compared with iOS's Voice Control.
KW - Multimodal interaction
KW - smartphones
KW - text correction
KW - text editing
KW - touch input
UR - https://www.scopus.com/pages/publications/85118222059
U2 - 10.1145/3472749.3474742
DO - 10.1145/3472749.3474742
M3 - Conference contribution
AN - SCOPUS:85118222059
T3 - UIST 2021 - Proceedings of the 34th Annual ACM Symposium on User Interface Software and Technology
SP - 162
EP - 178
BT - UIST 2021 - Proceedings of the 34th Annual ACM Symposium on User Interface Software and Technology
PB - Association for Computing Machinery, Inc
T2 - 34th Annual ACM Symposium on User Interface Software and Technology, UIST 2021
Y2 - 10 October 2021 through 14 October 2021
ER -