Let's take an in-depth look at each of the three major parts of a Web Integrated Telephony Architecture (WITA). WITA contains three basic architecture blocks: a voice scripting front end, a Ruby on Rails back end, and a collection of web services to implement telephony applications. Today, I want to talk about the scripting front end.
The scripting front end provides the user interface. User interfaces may be implemented through traditional web applications or desktop clients, but WITA applications typically use a voice scripting language such as VoiceXML as their primary interface. The VoiceXML script acts much like a web site, but instead of typing into forms, you use your voice to fill in forms that are posted to the web site. Instead of you reading web pages produced by the site, the VoiceXML script reads the results back to you. VoiceXML scripts can be localized just as web pages can, and can speak the native language of the caller. Different views into the application can be presented to the user either through a voice menu choice (analogous to web site navigation) or through different dial-in numbers or SIP addresses (analogous to using different URLs). In general, the scripting front end doesn't provide much in the way of call control, as the telephony features are implemented using the web services back end.
- Interfaces to the user through any telephony device, whether from the PSTN or over SIP. Can use natural language recognition, DTMF detection, or some combination of both.
- Interfaces to the Web server back end (typically implemented using Ruby on Rails). Fetches VoiceXML forms based on inbound dialing information. Presents data to user as provided by Web server back end. Collects data from user as a form and uses HTTP POST to save on server or to cause some action.
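To make the fetch-and-post cycle a little more concrete, here is a minimal sketch of what the back end might hand to the scripting platform. It is written in Python purely for illustration (a WITA back end would typically be Ruby on Rails), and the routing table, phone number, SIP address, field name and POST URL are all made-up assumptions. The VoiceXML elements themselves (form, field, prompt, filled, submit) are standard VoiceXML 2.0.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical routing table: each dial-in number or SIP address selects a
# view, much as a URL selects a page. All values are illustrative only.
VIEWS = {
    "+15550100": {
        "form_id": "order_status",
        "lang": "en-US",
        "prompt": "Please say or key in your order number.",
    },
    "sip:pedidos@example.com": {
        "form_id": "order_status",
        "lang": "es-MX",
        "prompt": "Diga o marque su número de pedido.",
    },
}


def render_vxml(dialed, post_url="http://example.com/orders/lookup"):
    """Return a VoiceXML 2.0 document for the dialed number or SIP address."""
    view = VIEWS[dialed]
    vxml = ET.Element(
        "vxml",
        {"version": "2.0", "xmlns": VXML_NS, "xml:lang": view["lang"]},
    )
    form = ET.SubElement(vxml, "form", {"id": view["form_id"]})
    # The built-in "digits" type accepts spoken digits or DTMF key presses.
    field = ET.SubElement(form, "field", {"name": "order_number", "type": "digits"})
    ET.SubElement(field, "prompt").text = view["prompt"]
    # Once the caller has filled the field, post it back to the web server,
    # just as a browser would post a completed web form.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(
        filled,
        "submit",
        {"next": post_url, "method": "post", "namelist": "order_number"},
    )
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(render_vxml("+15550100"))
```

The server's response to that POST would be another VoiceXML document, which the platform reads back to the caller.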
From a human to scripting standpoint, the scripting front end scales linearly with use. You need as many ports of inbound scripting as you have simultaneous inbound callers. It is unimportant whether each of the scripting ports comes from a common source, as the scripts have no interaction with each other. Thus, you can use an arbitrarily large number of scripting engines (either hosted or platform) without any architectural impact on the application. Voice is not typically passed from the VoiceXML platform into the larger Internet, so quality of service issues are not typically important at this level. From a scripting to Web server standpoint, common and well-known Internet scaling techniques such as load balancing and server replication through DNS are fully available to the system engineer. WITA places no additional or unique requirements on the Web server farm.
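As a rough illustration of that linear scaling, the sketch below estimates how many scripting ports a given call volume would keep busy on average. The function name, the headroom parameter and the sample traffic figures are my own assumptions, and the result is an average rather than a busy-hour peak, so a real deployment would want extra margin (or a proper Erlang B calculation).

```python
import math


def ports_needed(calls_per_hour, avg_call_seconds, headroom=0.0):
    """Estimate scripting ports from average concurrency (Little's law).

    headroom is an optional safety margin, e.g. 0.25 for 25% extra ports.
    """
    concurrent_calls = calls_per_hour * avg_call_seconds / 3600.0
    return math.ceil(concurrent_calls * (1.0 + headroom))


if __name__ == "__main__":
    # Hypothetical traffic: 1,200 calls an hour, 3 minutes each.
    print(ports_needed(1200, 180))        # 60
    print(ports_needed(2400, 180))        # 120: double the calls, double the ports
    print(ports_needed(1200, 180, 0.25))  # 75
```

Because the ports don't interact, those 60 ports could just as well be 6 engines of 10 ports each, from one supplier or several.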
In terms of business scaling, the WITA front end may be completely functional using a hosted provider (see below) at only a dime or so a minute of use. These hosted providers will be happy to scale with your application, or you can choose to deploy with a platform solution. Platform solutions typically make sense only for large or sensitive installations.
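For a back-of-the-envelope feel for the hosted cost, here is a small sketch. The ten-cents-a-minute default comes from the "dime or so" figure above; rounding each call up to a whole billable minute is my assumption, not any particular provider's billing rule, and the traffic numbers are invented.

```python
import math


def hosted_cost_cents(calls, avg_call_seconds, cents_per_minute=10):
    """Estimate hosted usage cost in cents.

    Assumes (hypothetically) that each call is billed in whole minutes,
    rounded up.
    """
    billable_minutes_per_call = math.ceil(avg_call_seconds / 60)
    return calls * billable_minutes_per_call * cents_per_minute


if __name__ == "__main__":
    # Hypothetical month: 10,000 calls averaging two and a half minutes.
    cents = hosted_cost_cents(10_000, 150)
    print(f"${cents / 100:,.2f}")  # $3,000.00
```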
There are many hosted suppliers of voice scripting. My favorite happens to be Voxeo, because they have excellent service, they are well adopted by larger corporations in the US, they are price competitive, and I happen to like those guys. There are a few other good choices as well, including BeVocal, TellMe and Angel.com.
For large corporations, there are a number of good options for platform solutions, such as Convedia (now RadiSys), Snowshore (now Cantata) and Voxeo (another reason I like them). In large measure, VoiceXML scripts are pretty portable, and only small code changes should be required when moving from hosted solutions to platform solutions.