Igor Jablokov interview on multimodal search

Last Monday night I recorded a podcast with Igor Jablokov, an IBM program director working on new methods of multimodal search built on open standards. Multimodal search adds voice commands to a visual display, giving easy access to a long list of commands and contextual information. The technology is currently used in web browsers, mobile phones, and automobile computing systems. I also recorded a presentation by Igor on mobile search at Mobile Monday in April.

IBM is one of the contributors to the XHTML+Voice (X+V) proposed standard, which builds on VoiceXML. Opera and Motorola are also active contributors. IBM promotes voice-enabled pages built by combining XHTML, VoiceXML, and XML Events. The open software works across many server and client platforms, including an Eclipse-based environment for creating voice-enabled content.
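To make the combination of XHTML, VoiceXML, and XML Events more concrete, here is a minimal Python sketch that assembles an X+V-style page with the standard library's ElementTree. It follows the commonly circulated X+V "hello world" pattern as I understand it: a VoiceXML form sits in the XHTML head, and XML Events attributes on the body run that form when the page loads. The helper names `build_xv_page` and `q` and the default form id are hypothetical; only the three namespace URIs are the standard ones. This is an illustration of the markup structure, not IBM's tooling.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for the three languages combined in X+V.
XHTML_NS = "http://www.w3.org/1999/xhtml"
VXML_NS = "http://www.w3.org/2001/vxml"
EV_NS = "http://www.w3.org/2001/xml-events"

# Serialize XHTML as the default namespace and use the usual prefixes for the others.
ET.register_namespace("", XHTML_NS)
ET.register_namespace("vxml", VXML_NS)
ET.register_namespace("ev", EV_NS)


def q(ns, tag):
    """Return an ElementTree qualified name such as '{uri}tag'."""
    return f"{{{ns}}}{tag}"


def build_xv_page(title, prompt, form_id="sayHello"):
    """Build a minimal X+V-style page (hypothetical helper) and return it as a string."""
    html = ET.Element(q(XHTML_NS, "html"))
    head = ET.SubElement(html, q(XHTML_NS, "head"))
    ET.SubElement(head, q(XHTML_NS, "title")).text = title

    # The voice dialog: a VoiceXML form placed in the XHTML head.
    form = ET.SubElement(head, q(VXML_NS, "form"), {"id": form_id})
    ET.SubElement(form, q(VXML_NS, "block")).text = prompt

    # XML Events attributes: on the body's load event, run the form with this id.
    body = ET.SubElement(
        html,
        q(XHTML_NS, "body"),
        {q(EV_NS, "event"): "load", q(EV_NS, "handler"): f"#{form_id}"},
    )
    # The visual counterpart of the spoken prompt.
    ET.SubElement(body, q(XHTML_NS, "p")).text = prompt
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(build_xv_page("Hello", "Hello World!"))
```

The design point is that the visual markup and the voice dialog live in one document and are wired together declaratively through XML Events, rather than through separate pages for each mode.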

Igor showed off a Samsung phone running Windows Mobile with a prototype of the WebSphere multimodal browser. The browser accepts search queries for Yahoo! Local and returns voice-enabled results using Yahoo!’s web service APIs.

We discussed dynamic grammars, a new development in mobile search in which the speech grammar is generated on the fly from a returned data set. If you are in your car waiting for an urgent e-mail, you can ask the car to retrieve all new e-mails with an urgent status; the system then builds a grammar based on the senders in the returned data set.
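A dynamic grammar like the one in the e-mail example could be generated along these lines. This is a minimal Python sketch, not IBM's implementation: it assumes a hypothetical inbox represented as a list of dicts with `"sender"` and `"urgent"` keys, and it emits a JSGF (Java Speech Grammar Format) grammar whose single public rule matches the name of any urgent sender. The function name `build_sender_grammar` and the grammar name `urgentSenders` are illustrative only.

```python
import re


def build_sender_grammar(messages, grammar_name="urgentSenders"):
    """Return a JSGF grammar matching the sender names of urgent messages.

    `messages` is a hypothetical inbox: a list of dicts with "sender" and
    "urgent" keys. Returns None when no urgent message is found, because a
    JSGF rule needs at least one alternative.
    """
    senders = []
    seen = set()
    for msg in messages:
        if not msg.get("urgent"):
            continue
        # Keep only letters, digits and spaces so each name is a plain
        # sequence of JSGF tokens, then collapse repeated whitespace.
        name = re.sub(r"[^A-Za-z0-9 ]", "", msg["sender"])
        name = " ".join(name.split())
        # Skip empty names and case-insensitive duplicates.
        if name and name.lower() not in seen:
            seen.add(name.lower())
            senders.append(name)
    if not senders:
        return None
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <sender> = {' | '.join(senders)};\n"
    )


if __name__ == "__main__":
    inbox = [
        {"sender": "Ana Lopez", "urgent": True},
        {"sender": "Ben O'Neil", "urgent": True},
        {"sender": "Carl Diaz", "urgent": False},
    ]
    print(build_sender_grammar(inbox))
    # #JSGF V1.0;
    # grammar urgentSenders;
    # public <sender> = Ana Lopez | Ben ONeil;
```

Because the grammar contains only the names in the current result set, the recognizer has a handful of alternatives to choose from instead of an open vocabulary. Stripping punctuation is a simplification (O'Neil becomes ONeil); a production system would need proper pronunciation handling for such names.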

Igor is tasked with building for the future. Many of the technologies we discussed are not expected to be mainstream until 2008 or 2010. Companies involved in creating these voice-enabled interfaces are already planning for 2015.

Thanks to Igor for requesting this interview and to Text100 for making all the arrangements.

My audio interview with Igor Jablokov is available in MP3 format. The 28-minute interview is a 12.9 MB download.

Interview questions

What are some of the biggest obstacles in mobile search today?
What is the XHTML+Voice proposal?
What devices and software support the service today?
What companies are outputting content in this format?
What is IBM’s involvement? What other companies are involved?
How is it being used in the car?
How can you accommodate a variety of accents and dialects? A thick Irish accent is supposed to be very difficult to recognize.
You brought a new mobile prototype with you today. What’s exciting about this advancement?
Tell me about mixed initiatives. What are the current use cases and implementations?
I’ve used voice software in the past and felt the need to slow down and enunciate. How has voice recognition improved?
Tell me about JSGF.
How can you create dynamically generated grammars?
Why should I, as a small company, be interested in X+V? Where is the ROI?
What are some ways we can voice-enable our site? What changes do we need to make?
What are some of the largest grammar implementations right now and what sort of hardware is needed to deal with that?
What are some competing standards and implementations? Microsoft Speech?
What are some of the tools I need to get started?
What’s coming next? How can I build an application for the next generation of devices and standards?

Tags: ibm, voicexml, xmlevents