VIXML - Video Interactive XML

VIXML, or Video Interactive XML, is a scripting language designed for IVR (interactive voice response) sessions over 3G video calls. It also works with standard voice calls.

A VIXML page is specified within an XML document by a list of objects between <vixml> and </vixml> tags. The VIXML interpreter works its way sequentially through the objects, rendering graphics and playing sound as necessary.

<vixml>
  objects ...
</vixml>

To ensure correct syntax, VIXML documents can also be created using the XML schema (http://www.vivatel.com.au/VIXMLSchema.xsd).

A video IVR session has an audio and a video channel, while a voice session only has an audio channel. Some VIXML objects attempt to lock these channels, sometimes instantaneously, like when an image is specified, or for a period of time, like when text is spoken by the speech synthesizer. The VIXML interpreter will stop at any object that is unable to obtain a lock, then proceed once the channel becomes available.
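
The locking behaviour can be illustrated with a short page. This is only a sketch based on the description above and on the lock columns of the object table; the image file name is hypothetical.

```xml
<vixml>
  <!-- Locks the video channel only for the instant it takes to draw the image. -->
  <image src="welcome.jpg" />
  <!-- Locks the audio channel for as long as the sentence takes to speak. -->
  <speech>Welcome to the service</speech>
  <!-- Needs only the video channel, so it should be drawn while the speech is still playing. -->
  <text x="0" y="16">Welcome</text>
  <!-- Needs the audio channel, so the interpreter should wait here until the first speech ends. -->
  <speech>Press 1 to continue</speech>
</vixml>
```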

Each object has optional attributes, and some may include a text value. For example, the following XML tag sets the background colour to red.

<defaults bgcolour="red" />

The following object says "hello my friend" in a male Scottish accent.

<speech voice="cmu_us_awb_arctic_hts">hello my friend</speech>

Note that whenever a voice is set for the speech synthesizer, it will persist for the duration of the call. This is one of the few attributes that is not reset when a new page is loaded.

URLs within VIXML objects may be either absolute (e.g. http://www.darkside.com/test/image.jpg) or relative (image.jpg or /test/image.jpg). However, for relative addresses to work, the URL of the referring page must contain a trailing slash if it's the default file for its directory (e.g. http://www.darkside.com/test/).
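
As a sketch of how these forms resolve, assume the page below was loaded from http://www.darkside.com/test/ (note the trailing slash). The file names next.xml and page.xml are hypothetical.

```xml
<vixml>
  <!-- Relative to the page's directory: http://www.darkside.com/test/image.jpg -->
  <image src="image.jpg" />
  <!-- Relative to the server root: http://www.darkside.com/test/next.xml -->
  <link keys="1" url="/test/next.xml" />
  <!-- Absolute: used exactly as written. -->
  <link keys="2" url="http://www.darkside.com/other/page.xml" />
</vixml>
```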

There are also special URLs that can be referenced from link and timeout objects.

  • system://hangup This causes the call to terminate immediately.
  • system://signal?dtmf=NNN When connected to another party via the call command, this will send them a sequence of one or more DTMF digits. This is useful when the other party is an IVR.
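
A minimal sketch of both special URLs in use. The key assignments and the digit sequence 123 are arbitrary choices for illustration, and system://signal only has an effect while another party is connected via the call object.

```xml
<vixml>
  <!-- Pressing * ends the call immediately. -->
  <link keys="*" url="system://hangup" />
  <!-- Pressing 1 sends the DTMF digits 1, 2, 3 to the connected party. -->
  <link keys="1" url="system://signal?dtmf=123" />
  <!-- If nothing else happens within 30 seconds, hang up. -->
  <timeout delay="30000" url="system://hangup" />
</vixml>
```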

All pages loaded by VIXML will have a duration variable appended to their URL. This contains the duration of the call so far, in milliseconds. It can be useful in situations where a call has to be terminated after a certain length of time.

In addition, all pages will have a session variable appended. This is an integer, unique to every call.


Start page

The start page of a VIXML session will be passed up to three HTTP GET variables: type, localPhone, and, if it is known, remotePhone. The value of type will be either video or voice. If caller-ID is suppressed, then remotePhone will not be specified. The phone numbers will be in E.164 format where possible (e.g. 61410402221), but for numbers that aren't accessible internationally the local format will be used (e.g. 1800555236).

The duration and session variables will also be set for the start page. Note, however, that the duration on the start page will always be zero.

Since the start page will probably need to be a dynamic script to handle these variables, it is unlikely to have a .xml suffix. To be safe, the generated page should have a Content-Type of text/xml or application/xml, although this isn't enforced.
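
As an illustration only, here is what a start script might return for a voice call. The host, script name, parameter order, session value, and linked file names are all made up; only the variable names come from the description above.

```xml
<!-- Hypothetical request:
     http://www.example.com/start.php?type=voice&localPhone=1800555236&remotePhone=61410402221&duration=0&session=12345 -->
<vixml>
  <!-- type is voice, so the page relies on audio only. -->
  <link keys="1" url="balance.xml" />
  <link keys="2" url="topup.xml" />
  <speech>Welcome. Press 1 for your balance, or 2 to top up.</speech>
</vixml>
```

For a video call (type=video), the same script could add image and text objects to the page.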


VIXML objects

The complete list of VIXML objects is as follows.

| Name | Attributes | Text value | Lock video channel | Lock audio channel |
| --- | --- | --- | --- | --- |
| defaults | bgcolour, wordWrap, font, fontSize, fontColour, beepDuration, beepFrequency, beepSRC, voice, hangupURL | No | If bgcolour is set | No |
| image | src, x, y, width, height, animated, windowed | No | Yes | No |
| text | wordWrap, font, fontSize, fontColour, x, y, width, height, halign, valign | Yes | Yes | No |
| speech | voice | Yes | No | Yes |
| link | keys, url | No | No | No |
| timeout | delay, url, background | No | No | No |
| input | name, validKeys, deleteKeys, clearKeys, acceptLength, acceptKeys, acceptURL, cancelKeys, cancelURL, timeoutDelay, timeoutURL, visible, wordWrap, font, fontSize, fontColour, x, y, width, height, halign, valign, passwordChar | No | If visible is true | No |
| stream | asrc, vsrc, avsrc, bytesPerSecond | No | If vsrc or avsrc is set | If asrc or avsrc is set |
| capture | url, contentType, realtime, audioDirection, videoDirection | No | No | No |
| call | phone, conference, phoneConference, conferenceType, type, fromPhone, successURL, connectedURL, failedURL, engagedURL, rejectedURL, preambleURL | No | If type is video | Yes |
| application | type, src | No | Yes | Yes |



defaults

The defaults tag is used to set the background colour and default attributes for the objects that follow.

| Attribute | Default value | Description |
| --- | --- | --- |
| bgcolour | white | Fills the screen with the specified colour. Colours can be specified by name or in the hexadecimal #rrggbb format; in other words, all HTML colours are valid. For example, white could also be specified as #ffffff. Note that this colour will overwrite anything that was previously on the screen. |
| wordWrap | false | If true, text will break at whitespace where possible when wrapping to the next row. Otherwise it breaks at the last character. |
| font | | The default font for all rendered text. A blank value means a regular (non-bold, non-italic, proportional) Lucida font. Other valid fonts are bold, italic, bold-italic, fixed, fixed-bold, fixed-italic, and fixed-bold-italic. Note that fixed means a fixed-width font. |
| fontSize | 16 | The height of the font, in pixels. Strictly, this is the spacing between rows of text; the characters themselves are slightly smaller. |
| fontColour | black | The colour of the font. As with the background colour, this can be a name or a hexadecimal value. |
| beepDuration | 0 | The duration of the audio beep when a DTMF key is pressed, in milliseconds. A value of zero disables the beep. A value of 250 sounds good. |
| beepFrequency | 440 | The frequency of the DTMF beep in Hertz. |
| beepSRC | | The URL of an audio file to play whenever a DTMF key is pressed. This overrides the beepDuration value. Valid file formats are wav, mp3 and raw audio. See the stream object for details. |
| voice | cmu_us_awb_arctic_hts | The default voice for synthesized speech. See the speech object for more details. |
| hangupURL | | This URL will be notified when the call ends. A duration variable is appended, containing the duration of the call in milliseconds. To indicate successful notification, the output of the URL should be OK. Note that this setting will persist for the duration of the call. |
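
A minimal sketch of a defaults object that sets several of these attributes at once. The colour choices and the hangupURL address are arbitrary, hypothetical values.

```xml
<vixml>
  <!-- Navy background, bold white 20-pixel text that wraps at whitespace,
       and a 250 ms key beep at 880 Hz. The hangup script is notified when the call ends. -->
  <defaults bgcolour="#000080" fontColour="white" font="bold" fontSize="20"
            wordWrap="true" beepDuration="250" beepFrequency="880"
            hangupURL="http://www.example.com/hangup.php" />
  <!-- Inherits the font, size, colour, and wrapping set above. -->
  <text x="0" y="20">Main menu</text>
</vixml>
```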



image

The image tag is used to load an image from a URL and display it on the screen at the specified position at the specified size. Images that are not the same resolution as the specified size will be scaled to fit. Supported image formats are JPEG, GIF, PBM, PGM, and PPM. For images from remote servers, JPEG format is preferred because it uses less bandwidth. For images hosted locally, PPM is preferred because it requires less processing.

| Attribute | Default value | Description |
| --- | --- | --- |
| src | | The URL of the image to be loaded. |
| x | 0 | The x position, in pixels, of the left edge of the image. |
| y | 0 | The y position, in pixels, of the top edge of the image. |
| width | 176 | The width of the image, in pixels. |
| height | 144 | The height of the image, in pixels. |
| animated | false | Only relevant if the image is an animated GIF. Note that the video channel will remain locked while the animation runs. |
| windowed | false | When this is set to true, the user can zoom/pan/tilt through the image with DTMF keys. The image is initially shown at the smallest zoom that fills the screen while preserving its aspect ratio. 5 zooms in, 0 zooms out, 4 and 6 pan left/right, 2 and 8 move up/down, and 1, 3, 7, and 9 move diagonally. Note that these override any DTMF links, so to exit this page, links using the # or * keys must be specified. Also, once a windowed image is loaded, the video channel will remain permanently locked. |
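
A sketch of a windowed image page. Because the digit keys are taken over by zoom and pan, the only way out is a link on # or *; the file names here are hypothetical.

```xml
<vixml>
  <!-- Exit links are declared first so that they are active as soon as the page loads. -->
  <link keys="#" url="menu.xml" />
  <link keys="*" url="system://hangup" />
  <!-- The user can now zoom with 5 and 0 and pan with the other digit keys. -->
  <image src="map.jpg" windowed="true" />
</vixml>
```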



text

The text tag displays text on the screen. The text is contained within a bounding box, and when the width of the text exceeds the width of the bounding box it wraps to the next row.

Note that if the text contains the special characters &, <, or >, or a character value greater than 127, then those characters must be replaced with escape sequences. The sequences for the first three are &amp;, &lt;, and &gt; respectively; characters above 127 are written as numeric references such as &#128;.

| Attribute | Default value | Description |
| --- | --- | --- |
| wordWrap | default word wrap | The text wrapping policy. See the defaults object for more details. |
| font | default font | The font for the text. See the defaults object for more details. |
| fontSize | default font size | The font size. See the defaults object for more details. |
| fontColour | default font colour | The font colour. See the defaults object for more details. |
| x | 0 | The x position, in pixels, of the left edge of the text bounding box. |
| y | 0 | The y position, in pixels, of the top edge of the text bounding box. Position zero is at the top of the screen, while 144 is at the bottom. Note that the y position determines the baseline for the text, so for top-aligned text at position zero only the descenders will be visible. |
| width | 176 | The width of the bounding box, in pixels. |
| height | 144 | The height of the bounding box, in pixels. |
| halign | left | The horizontal alignment of text within the bounding box. Valid alignments are left, centre, and right. |
| valign | top | The vertical alignment of text within the bounding box. Valid alignments are top, centre, and bottom. |
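
A small sketch combining escaping with the positioning attributes. The wording and coordinates are arbitrary; y is set to 16 rather than 0 because it marks the text baseline.

```xml
<vixml>
  <defaults bgcolour="white" />
  <!-- The ampersand must be written as &amp; -->
  <text x="0" y="16" width="176" height="32" halign="centre" wordWrap="true">Terms &amp; conditions</text>
  <!-- The less-than sign must be written as &lt; -->
  <text x="0" y="64" fontColour="red">Balance &lt; $10</text>
</vixml>
```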



speech

The speech tag causes the text to be spoken in a synthesized voice.

| Attribute | Default value | Description |
| --- | --- | --- |
| voice | default voice | Supported voices are kal_diphone (American English male), ked_diphone (American English male), cmu_us_awb_arctic_hts (Scottish English male), cmu_us_jmk_arctic_hts (Canadian English male), cmu_us_rms_arctic_hts (American English male), cmu_us_bdl_arctic_hts (American English male), cmu_us_slt_arctic_hts (American English female), cmu_us_clb_arctic_hts (American English female). Note that this setting will persist for the duration of the call. |
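
Because the voice persists, it only needs to be set once. A minimal sketch:

```xml
<vixml>
  <!-- Switches the synthesizer to the American English female voice. -->
  <speech voice="cmu_us_slt_arctic_hts">Welcome back</speech>
  <!-- No voice attribute: the voice chosen above stays in effect,
       here and on later pages of the same call. -->
  <speech>Please choose an option</speech>
</vixml>
```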



link

The link tag is used to specify links to other VIXML documents. These are triggered by DTMF key presses. Note that the VIXML interpreter goes through the list of objects sequentially, so links should appear at the start of the script if they are to be active the whole time.

| Attribute | Default value | Description |
| --- | --- | --- |
| keys | | One or more DTMF keys that will activate this link. |
| url | | The URL that the VIXML interpreter will go to when the link is activated. |
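
A sketch of a simple menu page. The links come first so that they are already active while the prompt is being spoken; the target file names are hypothetical.

```xml
<vixml>
  <link keys="1" url="balance.xml" />
  <link keys="2" url="topup.xml" />
  <!-- Either * or # activates this link. -->
  <link keys="*#" url="system://hangup" />
  <text x="0" y="16" wordWrap="true">1 Balance  2 Top up</text>
  <speech>Press 1 for your balance, or 2 to top up</speech>
</vixml>
```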



timeout

The timeout tag will cause the interpreter to pause for the specified number of milliseconds. If there is a URL specified, it will then attempt to jump to that page.

If the background attribute is set to true, the interpreter will immediately jump to the next command and run the timeout in the background. When the timeout expires, the url (if specified) will be called in background mode. VIXML scripts called in background mode do not interfere with the main application, but run independently. They are useful for playing periodic beeps and hanging up calls at a pre-scheduled time. Note that background mode commands that try to display visual information are ignored. If a timeout command is executed during background mode with its background attribute set to false, its URL will take control from the main application.

| Attribute | Default value | Description |
| --- | --- | --- |
| delay | 0 | The delay in milliseconds. Must be greater than zero. |
| url | | The URL that the VIXML interpreter jumps to when the timeout expires. |
| background | false | When this is set to true, the url will be run in the background. |
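
A sketch showing both modes on one page. The file names limit.xml and menu.xml are hypothetical; limit.xml is imagined as a short page that ends with a non-background timeout pointing at system://hangup, which, per the rule above, would take control from the main application.

```xml
<vixml>
  <!-- Background: the interpreter moves on at once; limit.xml runs in
       background mode after five minutes. -->
  <timeout delay="300000" url="limit.xml" background="true" />
  <speech>Please listen to the following announcement</speech>
  <!-- Foreground: pause for ten seconds, then jump to the menu. -->
  <timeout delay="10000" url="menu.xml" />
</vixml>
```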



input

The input tag is used to collect a sequence of DTMF key presses that will be passed to another VIXML script via HTTP GET. Note that the destination URLs can already contain variables (e.g. http://www.vivatel.com.au/scripts/test.php?user=mkwan), in which case the input variable and its value will be appended to the URL.

| Attribute | Default value | Description |
| --- | --- | --- |
| name | | The name of the variable. |
| validKeys | 0123456789 | The DTMF keys that are accepted as valid input. Note that these keys can be overridden by other key attributes below. |
| deleteKeys | | When these DTMF keys are pressed, the last input key value will be deleted. Conceptually, these are backspace keys. |
| clearKeys | | When these DTMF keys are pressed, all input entered so far will be erased. |
| acceptLength | 0 | When this value is non-zero, the input will be automatically accepted when it reaches the specified length. For example, a value of 4 could be used when requesting a credit card expiry date. |
| acceptKeys | | When these keys are pressed, the input will be accepted. |
| acceptURL | | When the input is accepted, the interpreter will jump to this URL, passing through the input variable and value as an HTTP GET parameter. |
| cancelKeys | | When these keys are pressed, the input will be cancelled. |
| cancelURL | | When the input is cancelled, the interpreter will jump to this URL. Note that the variable and value entered will still be passed through. |
| timeoutDelay | 0 | The timeout delay in milliseconds. The counter is reset whenever a DTMF key is pressed. |
| timeoutURL | | When the timeout counter expires, the interpreter will jump to this URL. Note that the variable and value entered will still be passed through. |
| visible | false | Whether the entered DTMF values are displayed on the screen. If true, the attributes below determine how they will be displayed. |
| wordWrap | default word wrap | The text wrapping policy. See the defaults object for more details. |
| font | default font | The font for the visible text. See the defaults object for more details. |
| fontSize | default font size | The font size. See the defaults object for more details. |
| fontColour | default font colour | The font colour. See the defaults object for more details. |
| x | 0 | The x position of the text bounding box. See the text object for more details. |
| y | 0 | The y position of the text bounding box. See the text object for more details. |
| width | 176 | The width of the text bounding box. See the text object for more details. |
| height | 144 | The height of the text bounding box. See the text object for more details. |
| halign | left | The horizontal alignment of the text. See the text object for more details. |
| valign | top | The vertical alignment of the text. See the text object for more details. |
| passwordChar | | If set, display this character instead of the keys entered by the user. This can be used to conceal passwords. |
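
A sketch of a four-digit PIN prompt that masks the digits on screen. The host and script names are hypothetical. Since acceptURL already carries a user variable, the request made on acceptance should carry both user and pin.

```xml
<vixml>
  <text x="0" y="16" halign="centre">Enter your PIN</text>
  <speech>Please enter your four digit PIN</speech>
  <!-- Accepted automatically after four digits; # deletes the last digit;
       each digit is shown on screen as an asterisk. -->
  <input name="pin" acceptLength="4"
         acceptURL="http://www.example.com/login.php?user=mkwan"
         deleteKeys="#" timeoutDelay="15000"
         timeoutURL="http://www.example.com/login.php?user=mkwan"
         visible="true" passwordChar="*" y="64" halign="centre" />
</vixml>
```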



stream

The stream tag is used to play an audio stream, a video stream, both an audio stream and a video stream, or an audio-video stream. Note that streams will be buffered for a quarter second before playing. If both an audio and video stream are specified, they will be synchronized.

Audio streams can be read from wav files (MIME type audio/x-wav), mp3 files (MIME type audio/mpeg), and raw format au files (MIME type audio/basic). Note that the raw format is 16-bit signed 16kHz samples in little-endian order, and is specific to this application. This is not the Sun Microsystems 8kHz u-law format.

| Attribute | Default value | Description |
| --- | --- | --- |
| asrc | | The URL of the audio stream. Supported formats are wav, mp3, and raw audio. |
| vsrc | | The URL of the video stream. Supported format is QCIF mpeg4. |
| avsrc | | The URL of the audio-video stream. |
| bytesPerSecond | | The playback speed of the video stream. Only applies to QCIF mpeg4 streams. |
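
A minimal audio-only sketch; the URL is hypothetical. Since both objects need the audio channel, the speech should not start until the stream has released it.

```xml
<vixml>
  <!-- Plays an mp3 file over the audio channel. -->
  <stream asrc="http://www.example.com/audio/holdmusic.mp3" />
  <!-- Also needs the audio channel, so it waits for the stream to finish. -->
  <speech>Thank you for holding</speech>
</vixml>
```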



capture

The capture tag is used to capture the stream of audio/video being sent to/from the remote phone. It is usually used to capture audio and video being sent from the phone, but can also record what the phone is receiving, or the audio in both directions. The only limitation is that transcoded video being sent to the phone cannot be captured because it is never available as decoded frames within the session.

When the capture tag is encountered, the address of the remote capture script specified in the url attribute is called with the following HTTP GET parameters.

| Parameter | Description |
| --- | --- |
| url | The URL from which to read the captured inbound stream. |
| contentType | The content type of the captured inbound stream. |

When called, the remote capture script should immediately start reading from the address passed through in the url parameter. If it fails to connect to the URL within 60 seconds the capture will be aborted.

By default the captured inbound stream is encoded as MPEG video, but different formats, including audio-only, can be requested with the contentType attribute.

Normally when capture begins, all input is buffered then sent as fast as the connection will allow once the remote capture script connects and starts downloading the stream. This is ideal for recorded-message applications where all input must be captured. However, for applications where the stream must be processed in real time, setting the realtime attribute to true will disable buffering and ensure that stream data will be sent as soon as it is received from the remote handset. Note that this will result in the loss of some data if the capture script is slow to connect or the connection is too slow to support the stream.

Capture ends when another capture tag is encountered with a different URL or the call ends. If the new capture URL is not an empty string, then a new capture will begin immediately to that URL.

Note that the captured data does not always strictly conform to the format of the specified content type. Many audio and video formats have headers that specify the length of the clip, but because the data is streamed the header must be sent before the length is known, so dummy values are used. Some players can cope with this, others can't. There may be some trial-and-error involved in finding a format that works.

| Attribute | Default value | Description |
| --- | --- | --- |
| url | | The URL of the remote capture script. |
| contentType | video/mpeg | The format of the captured stream. Supported formats are video/mpeg, video/x-msvideo, video/x-ms-wmv, audio/amr, audio/mpeg, audio/x-wav, and audio/x-pn-realaudio. |
| realtime | false | If true, buffering is disabled and stream data is sent as soon as it is received from the handset. |
| audioDirection | fromHandset | Which audio stream to capture. Valid options are fromHandset, toHandset, both, and none. When the both option is selected, both sides of the conversation are combined and captured. |
| videoDirection | fromHandset | Which video stream to capture. Valid options are fromHandset, toHandset, and none. |
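
A sketch of a simple voice-message recorder. The capture script address and saved.xml are hypothetical; saved.xml is imagined to contain a capture object with an empty url, which, per the rule above, ends the capture without starting a new one.

```xml
<vixml>
  <!-- The caller presses # when finished, or recording stops after 30 seconds. -->
  <link keys="#" url="saved.xml" />
  <speech>Please leave your message, then press hash</speech>
  <!-- Capture only the audio coming from the handset, as a wav stream.
       capture locks no channels, so it may begin while the prompt is still playing. -->
  <capture url="http://www.example.com/record.php" contentType="audio/x-wav"
           audioDirection="fromHandset" videoDirection="none" />
  <timeout delay="30000" url="saved.xml" />
</vixml>
```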



call

The call tag is used to link a call to another handset, a SIP-capable softphone, or a streaming media source such as a webcam. A number of URL callbacks are provided to handle the various outcomes.

When the phone parameter is specified, an outbound call is made to that phone number. This call is charged at the usual outbound rate. When that call is answered, the two phone calls are tied together like a regular person-to-person call.

Alternatively, the conference parameter can be used to tie two or more existing calls together. The parameter can be any string, and when two calls specify the same value they will be linked together. Note that the engagedURL and rejectedURL parameters are not used in this case.

A third option is to specify the phoneConference parameter. This will initiate an outbound call to the phone number, unless a call to that number is already in progress. In that case, it will connect immediately to that call in conference mode.

Note that when a remote party is called using the phoneConference parameter they will always be the first party in the resulting conference. This differs from the phone parameter, where the caller is always the first party. This is important when the conference type is Broadcast or OneToMany.

Five different conference types are possible, which affect the behaviour of video in a conference with three or more parties.

  • Normal. The first party to connect will send video to all the other parties. The second party to connect will send to the first.
  • OneToMany. Video works the same as Normal mode.
  • FollowSpeaker. The current speaker will send video to all the other parties. The previous speaker will send to the current one.
  • SplitScreen. The screen is divided up to show all other parties simultaneously, up to a maximum of 31.
  • Broadcast. The first party to connect will send video to all the other parties. Nothing will be sent to the first party. This is the recommended conference type when the first party is a streaming media URL.

In Broadcast mode, audio is sent from the first party to all other parties. No audio is sent to the first party or between the other parties. OneToMany mode is similar, except that audio from the second party is sent to the first party. In all other modes audio will be shared among participants. They will hear audio from all parties except themselves.

When a participant leaves the conference, all parties move up one place unless there is only one participant left, in which case the conference ends. The conference also ends if the type is OneToMany or Broadcast and the first party leaves.

A call ends when another call tag is encountered with a different phone number or conference, or the conference ends. If the new phone number or conference is not an empty string, then a new call will begin immediately.
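
A sketch of joining a named conference. The conference name and callback file names are arbitrary; every call that runs a page like this with the same conference value should end up linked together.

```xml
<vixml>
  <speech>Joining the conference now</speech>
  <!-- FollowSpeaker: the current speaker's video is sent to the other parties. -->
  <call conference="support-room" conferenceType="FollowSpeaker" type="video"
        connectedURL="inconference.xml" failedURL="failed.xml"
        successURL="goodbye.xml" />
</vixml>
```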

Most of the URL callbacks are self-explanatory, except for preambleURL. This is called when the call connects and, unlike other VIXML scripts, is played to the other party, not the caller. It plays until it reaches the end, then connectedURL is called and the call proceeds as usual. This callback is designed for situations when the called party needs to be briefed about the caller before speaking to them.

Note that the VIXML preamble script should only play audio and display static images. Pre-encoded mpeg4 video clips will not work, and other video formats will not play reliably.

| Attribute | Default value | Description |
| --- | --- | --- |
| phone | | The phone number to call. This may be a phone number (e.g. 0401234567), a SIP URL (e.g. sip:123@voip.com), or a streaming media URL (e.g. http://www.whatever.com/webcam.asf). |
| phoneConference | | The phone number to call, or to join in an existing conference. |
| conference | | The name of the conference to connect to. |
| conferenceType | Normal | The conference type. Valid values are Normal, OneToMany, FollowSpeaker, SplitScreen, and Broadcast. |
| type | video | The call type. Valid values are video, voice, and silent. The silent option should be used when calling a streaming media source that contains no audio. |
| fromPhone | | The phone number to call from. By default, calls are made with caller-ID suppressed, but this parameter allows the calling number to be explicitly set. When a phone number is being called, valid values are in the range 03991093xx. When a SIP address is being called, the calling address will be sip:fromPhone@210.11.56.34. |
| successURL | | The URL to jump to when the call has successfully completed. |
| connectedURL | | The URL to jump to when the call has connected. |
| engagedURL | | The URL to jump to if the call is engaged. |
| failedURL | | The URL to jump to if the phone number could not be dialled. Possible reasons include a non-existent or unreachable phone number. |
| rejectedURL | | The URL to jump to if the call is rejected. Calls may be rejected by the recipient because they don't want to answer, or by the network because the phone is switched off or the recipient is not able to accept video calls. |
| preambleURL | | If specified, this URL is called when the call connects. The contents of the URL are played only to the other party, and are not visible to the caller. Once the script ends, connectedURL is called and the call proceeds as normal. |
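
A sketch of an outbound voice call with a callback for each outcome. The phone number is the example value from the table, and the callback file names are hypothetical.

```xml
<vixml>
  <!-- call needs the audio channel, so dialling should begin once the speech has finished. -->
  <speech>Connecting you now</speech>
  <call phone="0401234567" type="voice"
        preambleURL="preamble.xml" connectedURL="connected.xml"
        engagedURL="busy.xml" rejectedURL="busy.xml"
        failedURL="failed.xml" successURL="thanks.xml" />
</vixml>
```

Here preamble.xml would be played to the called party only, so it should stick to speech and static images.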



application

The application tag is used to run applications written in third-party scripting languages. Currently supported formats are Java and Flash.

| Attribute | Default value | Description |
| --- | --- | --- |
| type | | The application type. Valid values are Java and Flash. |
| src | | The URL of the application's source. |



Examples

Here is a simple script that requests a 16-digit credit card number.

<vixml>
  <defaults bgcolour="#e0e0e0" beepDuration="250" fontColour="green" />
  <text font="bold" x="0" y="16" halign="centre" wordWrap="true">
    Enter your credit card number
  </text>
  <text x="0" y="56" halign="centre" wordWrap="true">Press # to delete, * to cancel</text>
  <speech voice="cmu_us_slt_arctic_hts">Please type in your credit card number</speech>
  <input name="ccnumber" acceptLength="16" acceptURL="http://www.mysite.com/cc.php"
        deleteKeys="#" cancelKeys="*" cancelURL="http://www.mysite.com/ccCancel.php?reason=user"
        timeoutDelay="20000" timeoutURL="http://www.mysite.com/ccCancel.php?reason=timeout"
        visible="true" y="112" halign="left" />
</vixml>

Here is the same script generated using the VIXML schema.

<?xml version="1.0" encoding="UTF-8"?>
<tns:vixml xmlns:tns="http://www.vivatel.com.au/VIXMLSchema"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.vivatel.com.au/VIXMLSchema VIXMLSchema.xsd ">
  <tns:defaults bgcolour="#e0e0e0" beepDuration="250" fontColour="green" />
  <tns:text font="bold" x="0" y="16" halign="centre" wordWrap="true">
    Enter your credit card number
  </tns:text>
  <tns:text x="0" y="56" halign="centre" wordWrap="true">Press # to delete, * to cancel</tns:text>
  <tns:speech voice="cmu_us_slt_arctic_hts">Please type in your credit card number</tns:speech>
  <tns:input name="ccnumber" acceptLength="16" acceptURL="http://www.mysite.com/cc.php"
        deleteKeys="#" cancelKeys="*" cancelURL="http://www.mysite.com/ccCancel.php?reason=user"
        timeoutDelay="20000" timeoutURL="http://www.mysite.com/ccCancel.php?reason=timeout"
        visible="true" y="112" halign="left" />
</tns:vixml>