SmartBody : VHvanilla mobile app and the VHmobile platform

Ari Shapiro, Ph.D.

Andrew Feng, Ph.D.

Character Animation and Simulation Group


Anton Leuski, Ph.D.

Natural Language Dialogue Group


USC Institute for Creative Technologies

last modified 1/7/15

The design goal of the VHmobile platform is to provide a self-contained mobile architecture that can be easily scripted to generate chat-based virtual human applications. This is in contrast to traditional approaches to mobile app development, which involve either directly coding in a native language using the mobile API (e.g., Java on Android) or using a build tool (such as a game engine that can run on mobile architectures). Basic control mechanisms for a virtual human, such as lip syncing to speech and automated nonverbal behavior, are provided automatically.


There are two ways to use VHMobile:

1) using an app called VHVanilla that can be modified by changing the Python control scripts. VHVanilla contains all the code needed to run the application (including on the Google Cardboard platform) without using a separate development environment.

2) as an embedded library within another Android application. The Android application would be constructed using traditional mobile app tools, using the VHMobile library.

A build for iOS does not currently exist, although we expect that such a build could be created using much of the same approach, substituting the appropriate iOS service for each Android service (Apple ASR instead of Google ASR, Objective-C instead of Java, etc.)

Quick Start

Download the VHvanilla Android app from the Google Play store (approximately 500 MB).

You can also obtain a copy by contacting Ari Shapiro at

The app will install the supporting files into /sdcard/vhdata. There are several sample applications located in the /sdcard/vhdata folder.

VHVanilla will run the contents of the startup script in /sdcard/vhdata/, which contains the instructions for the application.

There are several example scripts that use a 3D scene; each can be tested by replacing the startup script with one of the following sample scripts:

Scripts for 3D

init_chatbot.py - Example of using a chatbot. Uses speech recognition and TTS.

init_TextToSpeechDemo.py - Creates buttons that, when pushed, cause the virtual human to speak with text-to-speech (TTS) synthesis.

PhoneA/PhoneB scripts - Example of using the networking capabilities (VHMSG) to communicate between two mobile devices. Place the PhoneA and PhoneB files on different mobile devices. Pressing the button on one will cause the virtual human to speak on the other, and vice versa.

init_SpeechRecognitionDemo.py - Example of using speech recognition. The character will echo what the user inputs.

init_DialogueNPCDemo.py - Example of using the dialogue classifier to achieve a question/answer interaction.

init_SensorDemo.py - Example of using sensor data to modify a virtual human's reactions (pick up or put down the mobile device).

One of these sample scripts has also been copied to the startup script and will be run by default.



(above) Example of an application that uses natural voice.

(above) Example of using a chatbot virtual character in VHVanilla.

To switch between examples, copy the desired .py file over the startup script. To modify or change an application, simply modify the contents of that script (and any other needed scripts) and restart the application.

In addition to showing a 3D scene,  VHVanilla can operate in 'video' mode, which means that instead of displaying a 3D scene, it can instead display a set of videos.

Scripts for Video

init_PlayVideo.py - Example of playing a video.

To switch to this mode, change the contents of the startup script by commenting out the other two modes and uncommenting the following mode:

scene.createStringAttribute("VHMobileRenderer",  "Video", True, "Display", 400, False, False, False, "VH Mobile Renderer Type")


(above) Example of using video playback in the VHVanilla app. Videos are played back in response to the user's speech: the classifier returns the proper video id, and the corresponding video is then played.


In addition to showing a 3D scene and playing videos, VHVanilla can operate in 'Google Cardboard' mode, suitable for a Cardboard viewer, in which the app displays the 3D scene rendered in two views (one per eye):

Scripts for Google Cardboard

init_CardboardDemo.py - Example of using VHmobile with Google Cardboard.

To put the app into Google Cardboard mode, uncomment only the following line in the startup script and comment out the other lines:

scene.createStringAttribute("VHMobileRenderer",  "Cardboard", True, "Display", 400, False, False, False, "VH Mobile Renderer Type")


(above) Example of using the Google Cardboard interface in the VHVanilla app.



VHvanilla and VHmobile software 

Obtaining a License

The Software is made available for academic or research purposes only. The license is for a copy of the executable program for an unlimited term. Individuals requesting a license for commercial use must pay for a commercial license.

USC Stevens Institute for Innovation
University of Southern California
1150 S. Olive Street, Suite 2300

Los Angeles, CA 90015, USA
ATTN: Accounting


For commercial license pricing and annual commercial update and support pricing, please contact:

Taylor Philips
USC Stevens Institute for Innovation
University of Southern California
1150 S. Olive Street, Suite 2300

Los Angeles, CA 90015, USA
Tel: +1 213.821.0943

Fax: +1 213-821-5001
Email: taylorp@stevens,


What is it?

VHmobile is a mobile platform library that makes the creation of chat-based virtual human characters easy. A virtual character can be created and made to speak using TTS voices and automated nonverbal behavior with only a few lines of Python code. In addition, it offers easy access to networking, sensors and voice recognition. VHvanilla is a mobile application that uses the VHmobile platform that includes a simplified widget (button) layout and a set of example scripts, as well as support for video playback and Google Cardboard VR viewing. 
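
For example, a character can be created and made to speak with roughly the following lines (a sketch based on the setupCharacter() helper, addBehavior(), and the BML speech command described later in this document; the character name is a placeholder):

setupCharacter("mycharacter", "ChrAmity", "", "")   # create a character from one of the built-in models
addBehavior("mycharacter", "FemaleGestures")        # give the character a default gesture set
bml.execXML("mycharacter", "<speech>Hello, nice to meet you.</speech>")   # speak with TTS and automatic lip sync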

What are the capabilities of the platform?

The platform includes an animation system (SmartBody), a dialogue management/classification system (NPC Editor), a nonverbal behavior generator (Cerebella, light version), text-to-speech (Cereproc), a networking system (VHMSG), a set of 3D characters, 3D behaviors, and a Python-based API that allows easy scripting of application control and virtual human behavior. Rendering is done in three ways: 3D rendering is done through SmartBody, video rendering is done via the Android platform, and Google Cardboard rendering is done through the SmartBody renderer and the Google Cardboard API. Using VHvanilla, there are no limits on the extent to which the application can be programmed through SmartBody and Python.

Is there an iOS version of VHmobile/VHvanilla as well?

Not yet, although it is the intention of the authors to generate one as well. 

Why did you call it VHvanilla?

VH = virtual human, and 'vanilla' refers to the basic (although still delicious..) contents of the mobile app. You download a mobile app which gives you the generic/vanilla capability, and it is up to you to 'flavor' the app to your liking.

Which characters can be used in the application?

There are six characters that can be used, each of which can be set up with only one line of Python code. Each character has the ability to talk, gesture, and emote.

What voices can be used for each character? Can I use different voices?

There are two Cereproc-based voices included: Star and Heather. Additional voices can be purchased from Cereproc and used in the application.

Can I use recorded speech instead of text-to-speech?
Yes, you can use prerecorded speech by placing a .wav (sound), a .bml (lip sync), and an .xml (behavior) file in the /vhdata/sounds folder, then calling the appropriate BML command. Each sound (.wav) file needs to be processed by a phoneme scheduler to produce the lip sync file, then packaged in a BML description (.xml file) and put in the /vhdata/sounds folder. The character then needs to be switched from the 'remote' voice to the 'audiofile' voice (by setting the "voice" and "voiceCode" attributes on the character; see the SmartBody manual for details).

The VHvanilla app is rather big (500 MB). How can I make a smaller application for distribution?

The VHvanilla app is intended to include all the necessary assets and capabilities for the VHmobile platform. As such, it is likely that not all the assets will be used. For example, you might use only one character in your application, even though there are six characters that could be used. To make a smaller app, you can use the VHvanilla source code and remove the assets that are not needed. The VHmobile library itself is only 12 MB, so any application that uses it could be relatively small.

How do I control the widgets on the user interface in VHvanilla?

The VHvanilla app has a set format for widgets, which can be controlled from the Python scripts. The widgets have a set placement in the application, and the scripting can show or hide them, as well as respond to button presses, as sketched below. If you want to create your own set of widgets or controls, you can either modify the layout in the VHvanilla app (you will need the VHvanilla source code for that) or create your own app by using VHmobile as a library.
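
A minimal sketch of this follows; the widget name 'button1' and the character name 'mycharacter' are placeholders, and it is assumed that 2D Interaction API calls such as setWidgetProperty() can be invoked on the engine instance:

class MyWidgetEngine(VHEngine):
	def eventInitUI(self):
		# show the button and set its label once the UI is ready
		self.setWidgetProperty('button1', 1, 'Press To Speak')

	def eventButtonClick(self, buttonName):
		# respond to the button press
		if buttonName == 'button1':
			bml.execXML("mycharacter", "<speech>You pressed the button</speech>")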

How do I change the lighting, camera angles, and other 3D features?

There are some convenience scripts in the /vhdata/scripts folder, including light.py, which details the lighting configuration, and camera.py, which details the camera positioning and settings. The built-in renderer is capable of using both normal and specular maps on the characters. Other 3D features can be programmed using the standard SmartBody commands.
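
For instance, a lighting or camera setup can be changed by editing those scripts and re-running them. The sketch below assumes the standard SmartBody scene.run() call, which executes a Python script from the configured script paths:

scene.run("light.py")    # apply the lighting configuration
scene.run("camera.py")   # apply the camera positioning and settings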

How can I see the debug information from the application?

Using Android Studio, you can connect a USB cable to your device and see the output in the console. Look for messages with the log tag 'SBM' to filter out other Android messages that would otherwise make reading the console output difficult.

Can I use this in a commercial application?

You will need a separate commercial license. The software is for noncommercial and academic/research purposes only.

Where can I get the VHmobile libraries?

Please contact Ari Shapiro at if you are interested in the VHmobile library.


Where can I ask questions/get support/report a bug?


You can use the SmartBody forums at :

Setting up a character in 3D

VHMobile includes the SmartBody animation system, which allows you to set up and control a 3D character with various conversational capabilities, such as lip sync to speech, automated nonverbal behavior, lighting control, and so forth. Thus the application developer can access the entire SmartBody API using Python, as described in the SmartBody documentation.
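
As a small illustration (assuming a character named "mycharacter" has already been created as described below), standard SmartBody Python and BML calls can be issued directly from the control scripts:

character = scene.getCharacter("mycharacter")      # access the character object in the SmartBody scene
bml.execBML("mycharacter", '<head type="NOD"/>')   # make the character nod via a BML command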


Assets and Data

VHMobile requires a set of data, including characters, animations, and control scripts. The following folders describe the data included in VHvanilla under /sdcard/vhdata:

behaviorsets/ - Character behaviors
cerevoice/ - Cereproc voices
classifier/ - Data needed for the classifier (NPC Editor)
mesh/ - 3D model assets and textures
motions/ - 3D animations and skeletons
parser/ - Data needed for the Charniak parser
scripts/ - Convenience scripts for SmartBody
sounds/ - Folder for prerecorded speech
pythonlibs/ - Supporting Python libraries
aiml/ - AIML Python library for use with chatbots
alice/ - ALICE chatbot knowledge scripts


In addition, there are numerous helper scripts that make setting up characters and the scene simpler. These scripts are located in the /sdcard/vhdata/scripts folder:

setupCharacter - Sets up characters with default behaviors: lip synching, gaze, gestures, locomotion.
init-diphoneDefault - Sets up the lip syncing data set for English.
light.py - Default lighting.
camera.py - Default cameras.
nonverbalbehavior.py - Default nonverbal behavior (head and face movements, gestures, gaze), automatically generated when an utterance is processed.
zebra2-map.py - Mapping file to convert characters from the zebra2 format to the SmartBody standard format.



Many different types of characters can be created including the following:

ChrAmity, ChrAlyssa, ChrHarrison, ChrJin, ChrJohn, ChrLindsay, ChrTessa

To create a character, use the following command:

setupCharacter(name, characterType, "", voiceType)

where name is the name of the character, characterType is one of the valid characters listed above (such as ChrAlyssa), and voiceType is the Cereproc voice.


setupCharacter("mycharacter", "ChrAmity", "", "")
setupCharacter("mycharacter", "ChrHarrison", "", "")


Character TTS Voices

Note that currently two voices are available: Katherine (female) and Star (male). All female characters use the Katherine voice, and all male characters use the Star voice. Additional voices can be purchased from Cereproc and placed in the /vhdata/cereproc/voices folder.


To make a character speak, instruct the character using the following BML command:

bml.execXML("mycharacter", "<speech>peas porridge hot, peas porridge cold, peas porridge in the pot nine days old</speech>") 

If you want the character to speak using automated nonverbal behavior, run the following commands, which will run the utterance through the nonverbal behavior processor and return a more elaborate behavior:

bmlMsg = myengine.getNonverbalBehavior("peas porridge hot, peas porridge cold, peas porridge in the pot nine days old")
bml.execXML("mycharacter", bmlMsg) 


Character Prerecorded Voices

Characters can use prerecorded voices instead of TTS voices. Prerecorded speech requires a sound file (.wav), a lip sync file (.bml), and a nonverbal behavior file to play while speaking (.xml).

 To configure a character to use prerecorded speech, run the following commands:

scene.addAssetPath("audio", "/vhdata/sounds")
character = scene.getCharacter("mycharacter")
character.setStringAttribute("voice", "audiofile")
character.setStringAttribute("voiceCode", ".")

This sets up the location where SmartBody will look for sound files (/vhdata/sounds); the character will then use the .wav, .bml, and .xml files located in the specified subdirectory (/vhdata/sounds/.). Please consult the SmartBody manual for information on how to use recorded speech. Playing recorded speech is similar to playing TTS speech, but instead of specifying text, you specify an id that indicates the location of the sound and behavior files:

bml.execXML("mycharacter", "<speech id="peas"/>") 

This assumes that peas.wav, peas.bml, and peas.xml exist.

Character Behaviors

Characters can be easily configured with SmartBody behaviors by calling the addBehavior() function. Currently supported behaviors are: Gestures (male), FemaleGestures (female), and locomotion.

addBehavior("mycharacter", "Gestures") # for male characters
addBehavior("mycharacter", "FemaleGestures") # for female characters

Character Posture

Gestures are designed to start from an initial posture and return to that same posture. SmartBody includes mechanisms to coarticulate gestures (keeping the hands and arms in gesture space as two gestures are played back-to-back). To make sure that the character is in the proper posture for gesturing, the following posture should be set via BML:

bml.execXML('mycharacter', '<body posture="ChrBrad@Idle01"/>') 

Note that any posture can be set for the characters. However, only gestures associated with that posture will be able to generate automatic nonverbal behavior.

Other Character Capabilities

The characters have the full functionality of other SmartBody characters, including gazing, breathing, saccadic eye movements, reaching, touching and so forth. Please see the SmartBody manual for more details.

Support for Video playback

The VHVanilla app includes support for playing videos in place of showing a 3D scene. To activate it, make sure that the startup script contains a line as follows:

scene.createStringAttribute("VHMobileRenderer",  "Video", True, "Display", 400, False, False, False, "VH Mobile Renderer Type")

This will instruct the VHVanilla program to use the video setup, allowing the playback of videos. 
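
Once video mode is active, videos can be driven from the Python script through the Video API described later in this document. A minimal sketch follows; the view name 'myviewer' and the video path are placeholders, and it is assumed the Video API calls can be invoked on the engine instance:

class MyVideoEngine(VHEngine):
	def eventInitUI(self):
		# start playing a video (without looping) once the UI is ready
		self.playVideo("myviewer", "/sdcard/myvideo.mp4", False)

	def eventVideoCompletion(self, videoViewName):
		# when the video finishes, play it again (or pass True for isLooping above)
		self.playVideo(videoViewName, "/sdcard/myvideo.mp4", False)

myengine = MyVideoEngine()
setVHEngine(myengine)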

Support for Google Cardboard

The VHVanilla app includes support for Google Cardboard. To activate it, make sure that the startup script contains a line as follows:

scene.createStringAttribute("VHMobileRenderer",  "Cardboard", True, "Display", 400, False, False, False, "VH Mobile Renderer Type")

This will instruct the VHVanilla program to use the Google Cardboard setup, and all rendering will be done within the Google Cardboard views. The button press in Cardboard will be exposed to the eventButtonTouch() function as follows:

def eventButtonTouch(self, buttonName, action):
	if buttonName == 'cardboard_button':
		if action == 0:      # ACTION_DOWN: the button has been pressed
			pass
		elif action == 1:    # ACTION_UP: the button has been released
			pass

VHMobile API Usage

The main control is run by creating an instance of the VHEngine class and responding to updates and callbacks during execution. The VHEngine contains callbacks for events, such as an event that is called on every simulation step, when the automated speech recognition recognizes words spoken by the user, or when a button or the screen is touched. The application can respond to such events by overriding methods in the VHEngine class. To create an application, extend the VHEngine class, then create an instance of it in Python as follows:

class MyVHEngine(VHEngine):
	def eventInit(self):
		# called when the application is started
		pass

	def eventInitUI(self):
		# called when the application is activated
		pass

	def eventStep(self):
		# called on every step of the simulation
		pass

	def eventVoiceRecognition(self, recognitionText, isSuccess):
		# called when the voice recognizer returns a response. If isSuccess is True,
		# recognitionText contains the words spoken; if isSuccess is False, no words were recognized.
		pass

	def eventScreenTouch(self, action, x, y):
		# called when the screen is touched. action: 0 = pressed, 1 = released, 2 = moved,
		# 3 = cancelled, 4 = none; x and y are the screen coordinates.
		pass

	def eventButtonTouch(self, buttonName, action):
		# called when a button is touched. buttonName is the name of the button;
		# action is 0 when the button is pressed (ACTION_DOWN) and 1 when it is released (ACTION_UP).
		pass

	def eventButtonClick(self, buttonName):
		# called when a button is clicked. buttonName is the name of the button.
		pass

	def eventVideoCompletion(self, videoViewName):
		# called when a video stops playing. videoViewName is the name of the video viewer.
		pass

	def eventWord(self, timing, word, emphasis):
		# called while an utterance is being processed. timing is the BML timing (T1, T14, etc.) of the word,
		# word is the word text, and emphasis is True/False to indicate whether emphasis was placed on the word.
		pass

	def eventPartOfSpeech(self, timing, partOfSpeech):
		# called while an utterance is being processed. timing is the BML timing (T1, T14, etc.),
		# and partOfSpeech is the syntactic part of the utterance, such as a noun phrase (NP) or verb phrase (VP).
		pass

myengine = MyVHEngine()
setVHEngine(myengine)

The built-in Python modules (such as math and sys) are available for use. Other Python libraries can be included by specifying their location in the script; for example, assuming the libraries are placed under the pythonlibs folder listed above:

import sys
sys.path.append("/sdcard/vhdata/pythonlibs")   # make the bundled Python libraries importable

In addition, there are many methods on the VHEngine class that implement various types of behaviors, as follows:

Application Control API

exitApp() - Exits the application. Example: exitApp()

Voice Recognition API

Starts the voice recognition.
Stops the voice recognition.
Once the voice recognition system completes, eventVoiceRecognition() will be called.

Voice Generation API

Initializes the text-to-speech engine. ttsType can be any valid text-to-speech engine name; currently, only 'cereproc' is supported.

Nonverbal Behavior API

result = getNonverbalBehavior(utterance) - Returns XML for executing a behavior based on an utterance. The utterance will be processed by the nonverbal behavior processor and can include word emphasis (using a '+' sign) or deemphasis (using a '-' sign). For each word processed, the eventWord() function will be called. For each syntactic structure found, the eventPartOfSpeech() function will be called. Example: result = getNonverbalBehavior("hello, my name is John, and I +really like sushi.")

result = getBehaviorNames() - Returns a list of named BML behaviors. Example: result = getBehaviorNames()

result = getBehavior(behaviorName, start, end) - Returns BML for a behavior name given a start and end time. Example: result = getBehavior("big_smile", 1, 6)

Video API

playVideo(viewName, videoFile, isLooping) - Plays a video in the viewer area. isLooping determines whether the video will loop. When the video finishes, eventVideoCompletion() will be called. Example: playVideo("myviewer", "/sdcard/myvideo.mp4", False)

stopVideo(viewName) - Stops a video from playing in the viewer area. Example: stopVideo("myviewer")

Sensor API

enableSensor(sensorName, isEnable) - Enables or disables a sensor. sensorName can be 'accelerometer' or 'gyroscope'. Examples: enableSensor("accelerometer", True), enableSensor("gyroscope", False)

vec = getAccelerometerValues() - Returns the values of the accelerometer in x/y/z. The returned object is of type SrVec. Example: vec = getAccelerometerValues()

vec = getGyroscopValues() - Returns the values of the gyroscope in x/y/z. The returned object is of type SrVec. Example: vec = getGyroscopValues()

Dialogue API

Initializes the NPC Editor classifier. The filename should be a .csxml file.

answer = classify(state, question) - Returns an answer to a question stored in the classifier. state is the character name or domain of the questions. Example: answer = classify("John", "What's your name?")

addQuestionAnswer(state, question, answer) - Adds a question/answer pair to the classifier. Example: addQuestionAnswer("John", "How old are you?", "I'm 32.")

Updates the classifier with the new set of questions/answers. tempFileName is the name of a file location on disk that will be used for temporary storage.

Network API

ret = isConnected() - Determines whether the app is connected to an ActiveMQ server. Example: ret = isConnected()

ret = connect(host) - Connects the app to an ActiveMQ server host. Example: ret = connect("")

disconnect() - Disconnects the app from an ActiveMQ server host. Example: disconnect()

send(message) - Sends a message to the ActiveMQ host. Example: send("sb bml.execBML('*', '<head type=\"NOD\"/>')")

2D Interaction API

createDialogBox(dialogName, title, message, hasTextInput) - Creates a dialog box. dialogName is the name of the dialog, title is the title text of the dialog, message is the contents of the dialog, and hasTextInput determines whether the dialog accepts text input. Example: createDialogBox('exitBox', 'Exit Program?', 'Are you sure you want to exit the program?')

setWidgetProperty(widgetName, visibility, text) - Shows or hides a widget. widgetName is the name of the widget, visibility is 1 if visible or -1 if hidden, and text is the text of the widget. Examples: setWidgetProperty('button1', 1, 'Press To Speak'), setWidgetProperty('exit_button', 1, 'Exit App')

3D Interaction API

setBackgroundImage(...) - Sets the background image. Example: setBackgroundImage('/sdcard/vhdata/office1.png')
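
As an end-to-end sketch of how these calls can be combined, the fragment below drives a simple question/answer interaction from voice recognition. The classifier state "John" and the character name "mycharacter" are placeholders, the classifier and TTS are assumed to have been initialized as described above, and the Dialogue and Nonverbal Behavior API calls are assumed to be invokable on the engine instance:

class MyQAEngine(VHEngine):
	def eventVoiceRecognition(self, recognitionText, isSuccess):
		# called when the speech recognizer returns a result
		if isSuccess:
			# look up an answer to the recognized question in the classifier
			answer = self.classify("John", recognitionText)
			# generate speech plus nonverbal behavior for the answer and play it
			bmlMsg = self.getNonverbalBehavior(answer)
			bml.execXML("mycharacter", bmlMsg)

myengine = MyQAEngine()
setVHEngine(myengine)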


Relationship To USC ICT Virtual Human Technologies

The VHmobile platform is a self-contained Android-based mobile architecture that includes the following virtual human components:


Question/answer classifier - NPC Editor
Nonverbal behavior - Cerebella-like
Voice recognition - Google ASR
Sensor access - gyroscope, accelerometer, GPS via Android APIs


At a high level, the difference is that VHmobile/VHvanilla is designed for fast prototyping of virtual human chat applications, whereas the Virtual Human Toolkit is capable of much more complex applications, along with a higher barrier to entry. In addition:

1) Components are embedded within the application; for example, the TTS engine, the NPC Editor, and Cerebella are embedded within the platform and do not need any external process to run.

2) The Virtual Human Toolkit uses the Unity game engine to construct the application and program flow, whereas VHmobile and VHvanilla use their own rendering, and program flow is controlled with Python.

3) The VHmobile library can be embedded within another mobile/Android application, whereas a Unity application must be generated from the Unity editor. The VHvanilla app can be downloaded, then reconfigured by changing Python files directly.

4) The Virtual Human Toolkit is capable of much more complex and visually interesting 3D scenes than VHmobile/VHvanilla. Rendering in the Virtual Human Toolkit is controlled by Unity, whereas rendering in VHmobile/VHvanilla uses a basic SmartBody renderer.

