Ari Shapiro, Ph.D. firstname.lastname@example.org
Andrew Feng, Ph.D. email@example.com
Anton Leuski, Ph.D. firstname.lastname@example.org
Natural Language Dialogue Group
USC Institute for Creative Technologies
last modified 1/7/15
The design goal of the VHmobile platform is to provide a self-contained mobile architecture that can be easily scripted to generate chat-based virtual human applications. This is in contrast to traditional approaches to mobile app development, which involve either direct coding in a native language using the mobile API (e.g., Java on Android) or the use of a build tool (such as a game engine that can run on mobile architectures). The basic control mechanisms for a virtual human, such as lip syncing to speech and automated nonverbal behavior, are provided automatically.
There are two ways to use VHMobile:
1) using an app called VHVanilla that can be modified by changing the Python control scripts. VHVanilla contains all the code needed to run the application (including on the Google Cardboard platform) without using a separate development environment.
2) as an embedded library within another Android application. The Android application would be constructed using traditional mobile app tools, using the VHMobile library.
A build for iOS does not currently exist, although we expect that such a build could be done using much of the same approach, substituting the appropriate iOS service for each Android service (Apple ASR instead of Google ASR, Objective-C instead of Java, etc.).
Download the VHvanilla Android app from the Google Play store (approximately 500 MB).
You can also obtain a copy by contacting Ari Shapiro at email@example.com
The app will install the supporting files into /sdcard/vhdata. There are several sample applications located in the /sdcard/vhdata directory.
VHVanilla will run the contents of the /sdcard/vhdata/init.py file, which contains the instructions for the application.
There are several examples of usage of a 3D scene that can all be tested by replacing the init.py file with one of the following sample scripts:
|Script for 3D||Description|
|init_chatbot.py||Example of using a chatbot. Uses speech recognition and TTS.|
|init_TextToSpeechDemo.py||Creates buttons that, when pushed, cause the virtual human to speak with text-to-speech (TTS) synthesis.|
|PhoneA / PhoneB scripts||Example of using the networking capabilities (VHMSG) to communicate between two mobile devices. Place the PhoneA and PhoneB files on different mobile devices. Pressing the button on one will cause the virtual human to speak on the other, and vice versa.|
|init_SpeechRecognitionDemo.py||Example of using speech recognition. The character will echo what the user inputs.|
|init_DialogueNPCDemo.py||Example of using the dialogue classifier to achieve question/answer interaction.|
|init_SensorDemo.py||Example of using sensor data to modify a virtual human's reactions (pick up or put down the mobile device).|
The init_naturalvoice.py script has also been copied to the init.py file and will be run by default.
(above) Example of an application that uses natural voice.
(above) Example of using a chatbot virtual character in VHVanilla.
To switch between examples, copy the desired .py file over the init.py file. To modify or change an application, simply edit the contents of init.py (and any other needed scripts) and restart the application.
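The copy step can also be scripted. The helper below is a hypothetical convenience function (not part of the platform), assuming the file layout described above:

```python
import os
import shutil

VHDATA = '/sdcard/vhdata'  # data folder installed by VHVanilla

def use_example(sample_name, vhdata=VHDATA):
    """Copy a sample script (e.g. 'init_chatbot.py') over init.py.

    Restart the app afterwards so the new init.py takes effect.
    """
    src = os.path.join(vhdata, sample_name)
    dst = os.path.join(vhdata, 'init.py')
    shutil.copyfile(src, dst)
    return dst
```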
In addition to showing a 3D scene, VHVanilla can operate in 'video' mode, which means that instead of displaying a 3D scene, it displays a set of videos.
|Scripts for Video||Description|
|init_PlayVideo.py||Example of playing a video.|
To switch to this mode, change the contents of the setup.py file by commenting out the other two modes and uncommenting the video mode:
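A hypothetical sketch of what the mode selection in setup.py might look like; the actual variable names and values in your copy of setup.py may differ, so edit the file itself rather than copying this verbatim:

```python
# Mode selection in setup.py (names here are illustrative only):
# exactly one mode should be left uncommented.
#mode = 'scene'       # 3D scene mode (default)
mode = 'video'        # video playback mode
#mode = 'cardboard'   # Google Cardboard mode
```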
(above) Example of using video playback in the VHvanilla app. Videos are played back in response to user speech: the classifier returns the proper video id, and the corresponding video is played.
In addition to showing a 3D scene and playing videos, VHVanilla can operate in 'Google Cardboard' mode suitable for a Cardboard viewer, which means that the app will display a 3D scene rendered in two views, one per eye:
|Scripts for Google Cardboard||Description|
|init_CardboardDemo.py||Example of using vhmobile with Google Cardboard.|
To put the app into Google Cardboard mode, uncomment only the following line in setup.py and comment out the other lines:
(above) Example of using the Google Cardboard interface in the VHvanilla app.
VHvanilla and VHmobile software
Obtaining a License
The Software is made available for academic or research purposes only. The license is for a copy of the executable program for an unlimited term. Individuals requesting a license for commercial use must pay for a commercial license.
USC Stevens Institute for Innovation
University of Southern California
1150 S. Olive Street, Suite 2300
Los Angeles, CA 90015, USA
For commercial license pricing and annual commercial update and support pricing, please contact:
USC Stevens Institute for Innovation
University of Southern California
1150 S. Olive Street, Suite 2300
Los Angeles, CA 90015, USA
Tel: +1 213-821-0943
Fax: +1 213-821-5001
What is it?
VHmobile is a mobile platform library that makes the creation of chat-based virtual human characters easy. A virtual character can be created and made to speak, using TTS voices and automated nonverbal behavior, with only a few lines of Python code. In addition, the library offers easy access to networking, sensors and voice recognition. VHvanilla is a mobile application built on the VHmobile platform; it includes a simplified widget (button) layout and a set of example scripts, as well as support for video playback and Google Cardboard VR viewing.
What are the capabilities of the platform?
The platform includes an animation system (SmartBody), a dialogue management/classification system (NPC Editor), a nonverbal behavior generator (Cerebella, light version), text-to-speech (Cereproc), a networking system (VHMSG), a set of 3D characters, 3D behaviors, and a Python-based API that allows the easy scripting of application control and virtual human behavior. Rendering is done in three ways: 3D rendering is done through SmartBody, video rendering is done via the Android platform, and Google Cardboard rendering is done through the SmartBody renderer and the Google Cardboard API. Using VHvanilla, the application can be programmed to the full extent of what SmartBody and Python allow.
Is there an iOS version of VHmobile/VHvanilla as well?
Not yet, although it is the intention of the authors to generate one as well.
Why did you call it VHvanilla?
VH = virtual human, and 'vanilla' refers to the basic (although still delicious) contents of the mobile app. You download a mobile app which gives you the generic/vanilla capability, and it is up to you to 'flavor' the app to your liking.
Which characters can be used in the application?
There are a few characters (six) that can be used and set up with only one line of Python code. Each character has the ability to talk, gesture and emote.
What voices can be used for each character? Can I use different voices?
There are two Cereproc-based voices included: Star and Heather. Additional voices can be purchased from Cereproc and used in the application.
Can I use recorded speech instead of text-to-speech?
Yes, you can use prerecorded speech by placing a .wav (sound), a .bml (lip sync) and an .xml (behavior) file in the /vhdata/sounds folder, then calling the appropriate BML command. Each sound (.wav) file needs to be processed by a phoneme scheduler to produce the lip sync file, then packaged in a BML description (.xml file) and put in the /vhdata/sounds folder. The character then needs to be switched from the 'remote' voice to the 'audiofile' voice (by setting the "voice" and "voiceCode" attributes on the character; see the SmartBody manual for details).
The VHvanilla app is rather big (500 MB). How can I make a smaller application for distribution?
The VHvanilla app is intended to include all the necessary assets and capability for the VHmobile platform. As such, it is likely that not all the assets will be used. For example, you might use only one character in your application, even though there are 6 characters that could be used. To make a smaller app, you can use the VHvanilla source code and remove the assets that are not needed. The VHmobile library itself is only 12 MB, so any application that uses it can be relatively small.
How do I control the widgets on the user interface in VHvanilla?
The VHvanilla app has a set format for widgets, which can be programmed in Python with the scripts. The widgets have a set placement in the application, and the scripting can show or hide them, as well as respond to button presses. If you want to create your own set of widgets or controls, you can either modify the layout in the VHvanilla app (you will need the VHvanilla source code for that) or you can create your own app by using VHmobile as a library.
How do I change the lighting, camera angles, and other 3D features?
There are some convenience functions in the /vhdata/scripts folder, including lights.py, which details the lighting configuration, and camera.py, which details the camera positioning and settings. The built-in renderer is capable of using both normal and specular maps on the characters. Other 3D features can be programmed using the standard SmartBody commands.
How can I see the debug information from the application?
Using Android Studio (http://developer.android.com/sdk/index.html), connect a USB cable to your device and view the output in the console. Filter for messages using the log tag 'SBM' to eliminate other Android messages that would otherwise make the console output difficult to read.
Can I use this in a commercial application?
You will need a separate commercial license. The software is for noncommercial and academic/research purposes only.
Where can I get the VHmobile libraries?
Please contact Ari Shapiro at firstname.lastname@example.org if you are interested in the VHmobile library.
Where can I ask questions/get support/report a bug?
You can use the SmartBody forums at : http://smartbody.ict.usc.edu/forum
Setting up a character in 3D
VHMobile includes the SmartBody animation system, which allows you to set up and control a 3D character with various conversational capabilities, such as lip sync to speech, automated nonverbal behavior, lighting control, and so forth. The application developer can thus access the entire SmartBody API using Python as described here:
Assets and Data
VHMobile requires a set of data, including characters, animations and control scripts. The following folders under /sdcard/vhdata contain the data included in VHvanilla:
|classifier/||Data needed for the classifier (NPC editor)|
|mesh/||3D model assets and textures|
|motions/||3D animations and skeletons|
|parser/||Data needed for the Charniak parser|
|scripts/||Convenience scripts for SmartBody|
|sounds/||Folder for prerecorded speech|
|pythonlibs/||supporting Python libraries|
|aiml/||AIML python library for use with chatbots|
|alice/||ALICE chatbot knowledge scripts|
In addition, there are numerous helper scripts that make setting up a character simpler. These scripts are located in the /sdcard/vhdata/scripts folder:
|setupCharacter||Sets up characters with default behaviors: lip synching, gaze, gestures, locomotion|
|init-diphoneDefault||Sets up the lip syncing data set for English.|
|nonverbalbehavior.py||Default nonverbal behavior (head and face movements, gestures, gaze) automatically generated when an utterance is processed.|
|zebra2-map.py||Mapping file to convert characters from zebra2 format to SmartBody standard format.|
Many different types of characters can be created including the following:
ChrAmity, ChrAlyssa, ChrHarrison, ChrJin, ChrJohn, ChrLindsay, ChrTessa
In order to create a character, use the following command:
setupCharacter(name, characterType, "", voiceType)
where name is the name of the character, characterType is one of the valid character types listed above (such as ChrAlyssa), and voiceType is the Cereproc voice.
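A hedged example of the call, with a stand-in stub so the snippet is self-contained; in the app, setupCharacter() is already defined by the script in /sdcard/vhdata/scripts, and the character name 'Rachel' is a made-up example:

```python
def setupCharacter(name, characterType, skeleton, voiceType):
    # Stub for illustration only; the real implementation is provided by
    # /sdcard/vhdata/scripts and builds the SmartBody character.
    return {'name': name, 'type': characterType, 'voice': voiceType}

# Create a character named "Rachel" using the ChrAlyssa model and the
# Katherine Cereproc voice (see the Character TTS Voices section).
rachel = setupCharacter('Rachel', 'ChrAlyssa', '', 'Katherine')
```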
Character TTS Voices
Note that currently two voices are available: Katherine (female) and Star (male). All female characters use the Katherine voice, and all male characters use the Star voice. Additional voices can be purchased from www.cereproc.com, and placed in the /vhdata/cereproc/voices folder.
To make a character speak, instruct the character using the following BML command:
If you want the character to speak using automated nonverbal behavior, run the following command which will return a more complicated behavior after running the utterance through the nonverbal behavior processor as follows:
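A hedged sketch of both forms, assuming a hypothetical character named 'Rachel'; bml.execBML() is SmartBody's Python entry point for executing BML, and the exact markup should be checked against the SmartBody manual:

```python
# Plain TTS speech via a BML <speech> block:
utterance = '<speech type="text/plain">Hello, nice to meet you.</speech>'
# In the app this would be executed with:
#   bml.execBML('Rachel', utterance)

# TTS speech plus automated nonverbal behavior: run the text through the
# nonverbal behavior processor first, then execute the result:
#   behavior = getNonverbalBehavior('Hello, nice to meet you.')
#   bml.execBML('Rachel', behavior)
```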
Character Prerecorded Voices
Characters can use prerecorded voices instead of TTS voices. Prerecorded speech requires a sound file (.wav), a lip sync file (.bml), and a nonverbal behavior file to play while speaking (.xml).
To configure a character to use prerecorded speech, run the following commands:
This sets the location where SmartBody will look for the sound files (/vhdata/sounds); your character then uses the .wav, .bml and .xml files located in a particular subdirectory of /vhdata/sounds. Please consult the SmartBody manual for information on how to use recorded speech. Playing recorded speech is similar to playing TTS speech, but instead of specifying text, you specify an id that indicates the location of the sound and behavior files:
This assumes that peas.wav, peas.bml and peas.xml exist.
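A hedged sketch of such a command, using the id "peas" from above; the ref-style <speech> element is how SmartBody's audiofile voice refers to a prerecorded utterance (verify the exact form against the SmartBody manual):

```python
# Play the prerecorded utterance with id "peas"; SmartBody resolves the
# id to peas.wav / peas.bml / peas.xml in the configured sounds folder.
speech_bml = '<speech ref="peas" type="application/ssml+xml"/>'
# In the app:
#   bml.execBML('Rachel', speech_bml)  # 'Rachel' is a hypothetical character
```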
Characters can be easily configured with SmartBody behaviors by calling the addBehavior() function. Currently supported behaviors are: Gestures (male), FemaleGestures (female) and locomotion.
Gestures are designed to start from an initial posture and return to that same posture. SmartBody includes mechanisms to coarticulate gestures (keep hands and arms in gesture space as two gestures are played back-to-back). To make sure that the character is in the proper posture for gesturing, the following postures should be set via BML:
Note that any posture can be set for the characters. However, only gestures associated with that posture will be able to generate automatic nonverbal behavior.
Other Character Capabilities
The characters have the full functionality of other SmartBody characters, including gazing, breathing, saccadic eye movements, reaching, touching and so forth. Please see the SmartBody manual for more details.
Automated Nonverbal Behavior
VHmobile has support for automated nonverbal behavior. To generate it, an utterance is sent through two processes: one that configures behavior per word, and one that configures behavior based on a syntactic analysis (such as the presence of noun or verb phrases). To start this process, run the getNonverbalBehavior() function:
In turn, eventOnWord() and eventPartOfSpeech() will be called during the processing, for example:
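A hedged sketch of such callbacks, using a stand-in VHEngine stub so the snippet is self-contained; the method signatures here are assumptions and should be checked against the API tables later in this document:

```python
class VHEngine(object):
    # Stub for illustration only; the real class is provided by VHmobile.
    pass

class MyEngine(VHEngine):
    def eventOnWord(self, word):
        # Called once per word as the utterance is processed
        # (the signature here is an assumption).
        print('word: ' + word)

    def eventPartOfSpeech(self, tag):
        # Called once per syntactic structure found, e.g. a noun phrase
        # (the signature here is an assumption).
        print('part of speech: ' + tag)

engine = MyEngine()
engine.eventOnWord('hello')  # prints "word: hello"
# In the app, these callbacks fire during:
#   result = getNonverbalBehavior('hello, my name is John')
```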
There is already an extensive nonverbal behavior response as part of the nonverbalbehavior.py script, which can be invoked as follows:
There is a set of sample behaviors that can be accessed. The list of behaviors is as follows:
To invoke any of these behaviors, call the getBehavior method:
The getBehavior() method returns a set of BML commands that denote that behavior. Note that the purpose of the behaviors is to provide an easy description of a behavior; raw BML can be substituted for such behaviors if needed. To get a list of these behaviors:
Support for Video playback
The VHVanilla app includes support for playing videos in place of showing a 3D scene. To activate it, make sure that the file setup.py contains a line as follows:
This will instruct the VHVanilla program to use the video setup, allowing the playback of videos.
Support for Google Cardboard
The VHVanilla app includes support for Google Cardboard. To activate it, make sure that the file setup.py contains a line as follows:
This will instruct the VHVanilla program to use the Google Cardboard setup, and all rendering will be done within the Google Cardboard views. The button press in Cardboard will be exposed to the eventButtonTouch() function as follows:
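A hedged sketch of handling the Cardboard button press; the VHEngine stub and the callback's argument names are assumptions made so the snippet is self-contained:

```python
class VHEngine(object):
    # Stub for illustration only; the real class is provided by VHmobile.
    pass

class CardboardEngine(VHEngine):
    def eventButtonTouch(self, buttonName, motionType):
        # Respond to the Cardboard button press; the argument names
        # here are assumptions.
        return 'touched: ' + buttonName

engine = CardboardEngine()
```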
VHMobile API Usage
The main control is run by creating an instance of the VHEngine class and responding to updates and callbacks during execution. The VHEngine class contains callbacks for events, such as an event called on every simulation step, when automated speech recognition recognizes words spoken by the user, or when a button or the screen is touched. The application can respond to such events by overriding methods of the VHEngine class. To create an application, extend the VHEngine class, then create an instance of it in Python as follows:
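A hedged minimal sketch of that pattern; the stub and callback names below follow the events described in this manual (initialization, voice recognition), but the exact names and signatures are assumptions:

```python
class VHEngine(object):
    # Stub for illustration only; the real class is provided by VHmobile
    # and drives the simulation loop and callbacks.
    pass

class MyApp(VHEngine):
    def eventInit(self):
        # Called once at startup (name is an assumption): set up the
        # scene and characters here.
        self.ready = True

    def eventVoiceRecognition(self, utterance):
        # Called when ASR finishes recognizing user speech.
        print('user said: ' + utterance)

# Create the instance that VHmobile will drive:
app = MyApp()
app.eventInit()
```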
The built-in Python modules are available for use (math, sys). Other Python libraries can be included by specifying the location of the Python libraries in the script:
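For example, to make the bundled libraries in /sdcard/vhdata/pythonlibs (see the Assets and Data section) importable:

```python
import sys

# Add the bundled Python libraries (e.g. the AIML library used by the
# chatbot examples) to the module search path:
sys.path.append('/sdcard/vhdata/pythonlibs')
```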
In addition, there are many methods that can be called on the VHEngine class that implement various types of behaviors, as follows:
|Application Control API||Description||Example|
|exitApp()||Exits the application.||exitApp()|
|Voice Recognition API||Description||Example|
|Starts the voice recognition.|
Stops the voice recognition.
Once the voice recognition system completes, eventVoiceRecognition() will be called.
|Voice Generation API||Description||Example|
Initializes the text-to-speech engine.
ttsType can be any valid text-to-speech engine name. Currently, only 'cereproc' is supported.
|Nonverbal Behavior API||Description||Example|
|result = getNonverbalBehavior(utterance)||Returns XML for executing a behavior based on an utterance. The utterance will be processed by the nonverbal behavior processor and can include word emphasis (using a '+' sign) or deemphasis (using a '-' sign). For each word processed, the eventOnWord() function will be called. For each syntactic structure found, the function eventPartOfSpeech() will be called.||result = getNonverbalBehavior("hello, my name is John, and I +really like sushi.")|
|result = getBehaviorNames()||Returns a list of named BML behaviors.||result = getBehaviorNames()|
|result = getBehavior(behaviorName, start, end)||Returns BML for a behavior name given a start and end time.||result = getBehavior("big_smile", 1, 6)|
|playVideo(viewName, videoFile, isLooping)||Plays a video in the viewer area. isLooping determines whether the video will loop. When the video finishes, eventVideoCompletion() will be called.||playVideo("myviewer", "/sdcard/myvideo.mp4", False)|
|stopVideo(viewName)||Stops a video from playing in the viewer area.||stopVideo("myviewer")|
|Sensor API||Description||Example|
Enables the sensors. sensorName can be 'accelerometer' or 'gyroscope'.
|vec = getAccelerometerValues()||Returns the values of the accelerometer in x/y/z. The returned object is of type SrVec.||vec = getAccelerometerValues()|
|vec = getGyroscopeValues()||Returns the values of the gyroscope in x/y/z. The returned object is of type SrVec.||vec = getGyroscopeValues()|
|Classifier API||Description||Example|
Initializes the NPC editor classifier. The filename should be a .csxml file.
|answer = classify(state, question)||Returns an answer to a question stored in the classifier. state is the character name or domain of the questions.||answer = classify("John", "What's your name?")|
|addQuestionAnswer(state, question, answer)||Adds a question/answer pair to the classifier.||addQuestionAnswer("John", "How old are you?", "I'm 32.")|
Updates the classifier with the new set of questions/answers. tempFileName is the name of a file location on disk that will be used for temporary storage.
|Networking API||Description||Example|
|ret = isConnected()||Determines if the app is connected to an ActiveMQ server.||ret = isConnected()|
|ret = connect(host)||Connects the app to an ActiveMQ server host.||ret = connect("192.168.0.25")|
|disconnect()||Disconnects the app from an ActiveMQ server host.||disconnect()|
|send(message)||Sends a message to the ActiveMQ host.||send("sb bml.execBML('*', '<head type=\"NOD\"/>')")|
|2D Interaction API||Description||Example|
|createDialogBox(dialogName, title, message, hasTextInput)||Creates a dialog box. dialogName is the name of the dialog, title is the title text of the dialog, message is the contents of the dialog, and hasTextInput determines whether the dialog includes a text input field.||createDialogBox('exitBox', 'Exit Program?', 'Are you sure you want to exit the program?')|
|setWidgetProperty(widgetName, visibility, text)||Shows or hides a widget. widgetName is the name of the widget; visibility is 1 if visible, -1 if hidden; text is the text of the widget.||setWidgetProperty('button1', 1, 'Press To Speak')
setWidgetProperty('exit_button', 1, 'Exit App')|
|3D Interaction API||Description||Example|
|setBackgroundImage(imagePath)||Sets the background image.||setBackgroundImage('/sdcard/vhdata/office1.png')|
Relationship To USC ICT Virtual Human Technologies
The VHmobile platform is a self-contained Android-based mobile architecture that includes the following virtual human components:
|Question/answer classifier||NPC Editor|
|Voice recognition||Google ASR|
|Sensor access||gyroscope, accelerometer, GPS via Android APIs|
At a high level, the difference is that VHmobile/VHvanilla is designed for fast prototyping of virtual human chat applications, whereas the Virtual Human Toolkit is capable of much more complex applications but has a higher barrier to entry. In addition:
1) Components are embedded within the application; for example, the TTS engine, the NPC Editor and Cerebella are all embedded within the platform and do not need any external process to run.
2) The Virtual Human Toolkit uses the Unity game engine to construct the application and program flow, whereas VHmobile and VHvanilla use their own rendering, with program flow controlled by Python.
3) The VHmobile library can be embedded within another mobile/Android application, whereas a Unity application must be generated from the Unity editor. The VHvanilla app can be downloaded, then reconfigured by changing Python files directly.
4) The Virtual Human Toolkit is capable of much more complex 3D and visually interesting scenes than VHmobile/VHvanilla. Rendering in the Virtual Human Toolkit is controlled by Unity, whereas rendering in VHmobile/VHvanilla uses a basic SmartBody renderer.