new lip syncing algorithm | General SmartBody Discussion | Forum

September 29, 2012, 10:35 pm | Admin

All,

We've added a new lip syncing algorithm to SmartBody. http://smartbody.ict.usc.edu/H.....IG2013.mp4

The method uses approximately 400 artist-designed diphone clips stitched together. It will work on any character that has a compatible set of visemes, so it is very portable.

To make this work, you need a slightly different set of facial poses. For compatibility, we use the same poses that FaceFX uses (the link is here: http://www.facefx.com/document.....n/2009/W76). Then you will need to make a few tweaks to the character to use this new method. I will post the details for this soon.

Unfortunately, the current characters in SmartBody don't include these visemes. My intention is to include different (updated) characters that will be able to use the new lip syncing method.

Ari

October 1, 2012, 6:42 pm | Member

What's a viseme? A facial expression corresponding to a phoneme?

October 1, 2012, 7:13 pm | Admin

Yes. I'm using 'viseme' and 'facial pose' interchangeably here; 'facial pose' is more accurate.

Ari

October 27, 2012, 1:37 am | Member

About those details coming soon... still coming?

Has the discussion of lip syncing in SmartBodyManual.pdf been updated for the new algorithm? If not, how far off is it? I.e., should I read it, or will it just lead me astray?

October 31, 2012, 8:13 pm | Member

Hello? Any clues?

October 31, 2012, 10:35 pm | Admin

I've just checked in an initial version of bradrachel.py which shows the setup (although I don't yet have the meshes and textures).
The first relevant part is the facial pose setup:

rachelFace = scene.createFaceDefinition("ChrRachel")
rachelFace.setFaceNeutral("ChrRachel@face_neutral")
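# map each FaceFX-style viseme name to the matching ChrRachel facial pose asset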
rachelFace.setViseme("open", "ChrRachel@open")
rachelFace.setViseme("W", "ChrRachel@W")
rachelFace.setViseme("ShCh", "ChrRachel@ShCh")
rachelFace.setViseme("PBM", "ChrRachel@PBM")
rachelFace.setViseme("FV", "ChrRachel@FV")
rachelFace.setViseme("wide", "ChrRachel@wide")
rachelFace.setViseme("tBack", "ChrRachel@tBack")
rachelFace.setViseme("tRoof", "ChrRachel@tRoof")
rachelFace.setViseme("tTeeth", "ChrRachel@tTeeth")

then loading the diphone animation set:
scene.run("init-diphoneDefault.py")

then connecting the character to that particular set:
rachel.setStringAttribute("diphoneSetName", "default")
rachel.setBoolAttribute("useDiphone", True)

then you can set her voice:

rachel.setVoice("remote")
rachel.setVoiceCode("MicrosoftAnna")
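# ('remote' means the audio is synthesized by the external TTS relay process;
#  'MicrosoftAnna' is a Windows TTS voice name, so substitute whichever voice you have installed)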

Then you should be able to run the TTS engine (TtsRelayGui) and the lip syncing should work. Most of this is documented in SmartBodyManual.pdf.
I'll be adding the mesh and textures so that you can see the full effect soon.

Ari

November 2, 2012, 9:21 pm | Admin

I've added a script called bradrachel.py in sbm-common/scripts which you can use to try out the lip syncing algorithm.
Steps
-------
1) run a TTS engine:
run smartbody/bin/TtsRelay/bin/x86/Release/TtsRelayGui.exe
(on Linux or OS X, run bin/FestivalRelay/bin/bin/FestivalRelay.exe)

2) run smartbody/core/sbm/bin/sbm-fltk.exe

3) File -> Load -> ../../../../data/sbm-common/scripts/bradrachel.py

4) View->Character->GPU Deformable Geometry
5) Window->Viseme Viewer
6) type 'hello my name is brad' (or whatever you want him to say) in the box next to the 'Speak' button, and then press 'Speak'
7) Change the Character dropdown to ChrRachel, and press Speak (a scripted alternative is sketched below)
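
If you prefer to trigger the speech from the Python script rather than the Viseme Viewer, something along these lines should also work (a minimal sketch, assuming the BML processor interface used in the standard SmartBody example scripts and the character names from bradrachel.py; I haven't verified it against this exact build):

bml = scene.getBmlProcessor()
# speak the same test sentence through the diphone lip sync pipeline
bml.execBML('ChrBrad', '<speech type="text/plain">hello my name is brad</speech>')
bml.execBML('ChrRachel', '<speech type="text/plain">hello my name is rachel</speech>')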

Ari

November 13, 2012, 2:21 am | Member

When doing lip sync from audio files, the manual assumes one has an XML file specifying when each phoneme starts. How does one generate such a file?

November 13, 2012, 5:14 pm | Admin

I'm working on an open-source method of generating a speech XML file from an audio file; I believe that PocketSphinx can do that, but I haven't had the time to test it yet. Internally, we are using a commercial solution (FaceFX) that takes audio as input and generates phonemes and timings as output.
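
If you want to experiment with that before it's ready, the rough idea is to run PocketSphinx's phoneme ('allphone') decoder over the audio and dump each phone with its start and end time, which you could then translate into the speech XML format. A minimal, untested sketch (the import path, the model locations, and the mapping from PocketSphinx phone labels to SmartBody's phoneme names are all assumptions that depend on your install):

import os
from pocketsphinx.pocketsphinx import Decoder

MODELDIR = '/usr/local/share/pocketsphinx/model'  # assumption: point this at your models

config = Decoder.default_config()
config.set_string('-hmm', os.path.join(MODELDIR, 'en-us/en-us'))
config.set_string('-allphone', os.path.join(MODELDIR, 'en-us/en-us-phone.lm.bin'))
config.set_float('-lw', 2.0)
decoder = Decoder(config)

# expects raw 16 kHz, 16-bit mono PCM; convert your .wav first
decoder.start_utt()
with open('utterance.raw', 'rb') as f:
    decoder.process_raw(f.read(), False, True)
decoder.end_utt()

# segments are phone labels with frame times (100 frames per second by default)
for seg in decoder.seg():
    print(seg.word, seg.start_frame / 100.0, seg.end_frame / 100.0)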

There is a program in smartbody/tools/word_breaker which can transform a text-to-speech utterance into a speech XML file (it's not part of the standard build, but it's only a single file). It needs to be fixed to include the phonemes as well as the word timings, but it won't extract phonemes from an audio file; it only generates them from TTS.

Ari

January 11, 2013, 10:23 pm | Member

Does the lip syncing algorithm automatically generate any head or eye movements to go with the visemes, or is it up to the user to specify additional behaviors?

January 11, 2013, 11:25 pm | Admin

There is another software component, called Cerebella, that works on top of SmartBody and that we use in-house to do just that: it automatically adds head movements, facial expressions, gestures, eye saccades, and so forth to an utterance. Here is an example of the output (the animation performance is generated automatically from the audio, in combination with a SmartBody character that is loaded with a number of gestures):

http://smartbody.ict.usc.edu/HTML/videos/cerebella/breakup-neutral-01-6.mp4

We haven't made this publicly available the way SmartBody is, although if you are interested in using it for your commercial product, we are open to talking about that.

Ari

January 11, 2013, 11:47 pm | Member

We're interested... did you have any particular licensing arrangement in mind? Would you prefer to take this off the forum?

January 11, 2013, 11:53 pm | Admin

I'll email you directly about this.

Ari