
BML Commands fail to get speech, process timeout | General SmartBody Discussion | Forum

BML Commands fail to get speech, process timeout
June 27, 2017
10:31 am
Member
Members
Forum Posts: 7
Member Since:
May 15, 2017

I am trying to synchronize gestures with speech, using BML commands to send the gesture and speech actions. I used the example provided in the SmartBody manual:

bml.execBML('*', '''<speech type="application/ssml+xml" id="myspeech">
<mark name="T0"/>hello
<mark name="T1"/>
<mark name="T2"/>my
<mark name="T3"/>
<mark name="T4"/>name
<mark name="T5"/>
<mark name="T6"/>is
<mark name="T7"/>
<mark name="T8"/>Utah
<mark name="T9"/>
</speech>
<head type="NOD" start="myspeech:T4"/>''')

However, when I change this slightly, for example replacing 'hello' with 'i' and 'my' with 'am', the TTS engine stops working and I get no output from the avatar. It gives this error:

remote_speech::rVoiceTimeOut ERR: RemoteSpeechReply Message NOT RECEIVED for utterance #25. Please check if remote speech process is on and is accessible by SBM.

ERROR: BML::Processor::speechReply() exception:BehaviorRequest "BML_ChrBrad_sbm_test_bml_27_#1_<speech>_"myspeech"" Scheduling Exception: Speech Interface error: Remote speech process timed out.

Can you help me with this error? I only made a minor change, and I cannot see any reason why one version works while the other does not.

 

 
 
 
July 21, 2017
9:19 am
Admin
Forum Posts: 980
Member Since:
December 1, 2011

It sounds like the ActiveMQ messaging system isn't installed and/or the TTS Relay isn't running. Make sure that the ActiveMQ server is running and that the TTS Relay has been started (Window -> Run TTS Relay).

 

Ari
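
One quick way to rule out the broker is to check whether anything is listening on its port. The following is a minimal sketch, assuming the broker runs on localhost and uses ActiveMQ's default OpenWire port, 61616 (adjust both for your setup). `is_port_open` is a hypothetical helper, not part of SmartBody. It only shows that the port is reachable; it does not show that the TTS Relay is connected to the broker.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    # Assumed values: a local broker on ActiveMQ's default OpenWire port.
    host, port = "localhost", 61616
    state = "reachable" if is_port_open(host, port) else "NOT reachable"
    print(f"ActiveMQ broker at {host}:{port} is {state}")
```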

August 8, 2017
2:10 pm
Member
Members
Forum Posts: 7
Member Since:
May 15, 2017

The Speech Relay is working normally, so this is not an ActiveMQ failure or a TTS problem.

August 15, 2017
12:32 pm
Admin
Forum Posts: 980
Member Since:
December 1, 2011

Perhaps you can post a screen grab of SmartBody and the TTSRelay side by side? 

August 18, 2017
2:56 pm
Member
Members
Forum Posts: 3
Member Since:
August 18, 2017

Hi Ari,

I'm working with Nilay. Here is the setup I'm using. Nilay's is very similar, albeit on a separate machine.

[Screenshot: smartbody and ttsrelaygui]

The fundamental issue is that we can output an arbitrary amount of text-only speech. I put in some lorem ipsum, and it spoke for 55s. However, as soon as we start adding mark elements, it becomes unreliable. Testing today, I thought there was a 10s barrier, but some utterances shorter than that don't work, and some longer than that do. I've sent speech messages that I created by hand (adding the mark elements manually), and I've sent ones generated by script (adding the mark elements procedurally).

Is there anything about processing speech with mark elements that changes the timeout expectations?

*Edit* To be clear, I understand that smartbody relies on ttsrelay completing its task of preparing speech -- however, it seems to be doing that. Furthermore, smartbody appears to "wait" much longer for regular text, and I haven't seen a single such timeout in many utterances. Can you tell us anything about smartbody's timeout behaviour, or anything related that may be causing what we're seeing?

We would like to sync behaviour with procedurally-generated speech, which of course requires mark elements.
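
For reference, here is a minimal sketch of how mark elements might be added procedurally, following the numbering of the manual example quoted earlier in this thread (word i is preceded by mark T(2i) and followed by mark T(2i+1)). `build_speech_bml` and `word_start_mark` are hypothetical helper names, not SmartBody functions, and the scripts actually used in this thread may differ.

```python
from xml.sax.saxutils import escape


def build_speech_bml(text: str, speech_id: str = "myspeech") -> str:
    """Surround each word of text with a pair of <mark/> elements (T0, T1, ...)."""
    lines = [f'<speech type="application/ssml+xml" id="{speech_id}">']
    for i, word in enumerate(text.split()):
        # Word i starts at mark T(2i) and ends at mark T(2i+1).
        lines.append(f'<mark name="T{2 * i}"/>{escape(word)}')
        lines.append(f'<mark name="T{2 * i + 1}"/>')
    lines.append("</speech>")
    return "\n".join(lines)


def word_start_mark(speech_id: str, word_index: int) -> str:
    """Sync-point reference for the start of a word, e.g. 'myspeech:T4'."""
    return f"{speech_id}:T{2 * word_index}"


if __name__ == "__main__":
    bml_string = build_speech_bml("hello my name is Utah")
    nod_start = word_start_mark("myspeech", 2)  # start of the third word, "name"
    bml_string += f'\n<head type="NOD" start="{nod_start}"/>'
    print(bml_string)
    # Inside SmartBody, where `bml` is available, the string could then be sent:
    # bml.execBML('*', bml_string)
```

Escaping each word keeps characters such as `&` from breaking the XML, which is worth ruling out when generated utterances fail unpredictably.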

Here's the ttsrelay output for one of the speech outputs that timed out.

Thanks for any advice you can give!

Processing message.

GenerateAudio() - 'i  think  i've  told  you  everything  about  myse
lf.  thanks  for  this  opportunity  to  speak  to  you  today!
 ' 'C:\data\cache\audio\utt_20170818_144057_ChrBrad_60.wav' 'C:\data\cache\audio\utt_2017081
8_144057_ChrBrad_60.wav' 'CereVoice|Adam|-|English|(East|Coast|America)'
Generating audio for message with voice: CereVoice|Adam|-|English|(East|Coast|America)
Selecting SAPI voice: CereVoice Adam - English (East Coast America)
Debug: Generating speech for SSML string: "i  think  i've  told  you  everything  about
 myself.  thanks  for  this  opportunity  to  speak  to  you  today! ."...

Total viseme duration: 0
Reached phoneme: ♦ at time: 0 for duration: 00:00:00.0040000

Total phoneme duration: 0.004
Total viseme duration: 0.004
Reached phoneme: a?i at time: 0.0049886 for duration: 00:00:00.0890000

Total phoneme duration: 0.093
Reached bookmark: T0 at time: 0.0949659

Total viseme duration: 0.093
Reached phoneme: ? at time: 0.0949659 for duration: 00:00:00.1700000

Total phoneme duration: 0.263
Total viseme duration: 0.263
Reached phoneme: ? at time: 0.2649886 for duration: 00:00:00.0390000

Total phoneme duration: 0.302
Total viseme duration: 0.302
Reached phoneme: ? at time: 0.3049886 for duration: 00:00:00.0800000

Total phoneme duration: 0.382
Total viseme duration: 0.382
Reached phoneme: k at time: 0.3849886 for duration: 00:00:00.0690000

Total phoneme duration: 0.451
Reached bookmark: T1 at time: 0.4548752

Total viseme duration: 0.451
Reached phoneme: a?i at time: 0.4548752 for duration: 00:00:00.1230000

Total phoneme duration: 0.574
Total viseme duration: 0.574
Reached phoneme: v at time: 0.5782766 for duration: 00:00:00.0400000

Total phoneme duration: 0.614
Reached bookmark: T2 at time: 0.6183219

Total viseme duration: 0.614
Reached phoneme: t at time: 0.6183219 for duration: 00:00:00.1200000

Total phoneme duration: 0.734
Total viseme duration: 0.734
Reached phoneme: o at time: 0.7384126 for duration: 00:00:00.0860000

Total phoneme duration: 0.82
Total viseme duration: 0.82
Reached phoneme: l at time: 0.8248072 for duration: 00:00:00.1390000

Total phoneme duration: 0.959
Total viseme duration: 0.959
Reached phoneme: d at time: 0.9643537 for duration: 00:00:00.0600000

Total phoneme duration: 1.019
Reached bookmark: T3 at time: 1.0245351

Total viseme duration: 1.019
Reached phoneme: j at time: 1.0245351 for duration: 00:00:00.0640000

Total phoneme duration: 1.083
Total viseme duration: 1.083
Reached phoneme: u at time: 1.0894784 for duration: 00:00:00.0790000

Total phoneme duration: 1.162
Reached bookmark: T4 at time: 1.1687074

Total viseme duration: 1.162
Reached phoneme: ? at time: 1.1687074 for duration: 00:00:00.0520000

Total phoneme duration: 1.214
Total viseme duration: 1.214
Reached phoneme: v at time: 1.2214512 for duration: 00:00:00.0600000

Total phoneme duration: 1.274
Total viseme duration: 1.274
Reached phoneme: ? at time: 1.2817687 for duration: 00:00:00.0390000

Total phoneme duration: 1.313
Total viseme duration: 1.313
Reached phoneme: i at time: 1.3217687 for duration: 00:00:00.0490000

Total phoneme duration: 1.362
Total viseme duration: 1.362
Reached phoneme: ? at time: 1.371746 for duration: 00:00:00.1590000

Total phoneme duration: 1.521
Total viseme duration: 1.521
Reached phoneme: ? at time: 1.531746 for duration: 00:00:00.0400000

Total phoneme duration: 1.561
Total viseme duration: 1.561
Reached phoneme: ? at time: 1.571746 for duration: 00:00:00.0990000

Total phoneme duration: 1.66
Reached bookmark: T5 at time: 1.671746

Total viseme duration: 1.66
Reached phoneme: ? at time: 1.671746 for duration: 00:00:00.0390000

Total phoneme duration: 1.699
Total viseme duration: 1.699
Reached phoneme: b at time: 1.7117006 for duration: 00:00:00.1070000

Total phoneme duration: 1.806
Total viseme duration: 1.806
Reached phoneme: a?? at time: 1.8192743 for duration: 00:00:00.1490000

Total phoneme duration: 1.955
Total viseme duration: 1.955
Reached phoneme: t at time: 1.9692517 for duration: 00:00:00.0310000

Total phoneme duration: 1.986
Reached bookmark: T6 at time: 2.0002721

Total viseme duration: 1.986
Reached phoneme: m at time: 2.0002721 for duration: 00:00:00.1030000

Total phoneme duration: 2.089
Total viseme duration: 2.089
Reached phoneme: a?i at time: 2.1037188 for duration: 00:00:00.0890000

Total phoneme duration: 2.178
Total viseme duration: 2.178
Reached phoneme: s at time: 2.1936054 for duration: 00:00:00.1900000

Total phoneme duration: 2.368
Total viseme duration: 2.368
Reached phoneme: ? at time: 2.3836281 for duration: 00:00:00.0590000

Total phoneme duration: 2.427
Total viseme duration: 2.427
Reached phoneme: l at time: 2.4436281 for duration: 00:00:00.1400000

Total phoneme duration: 2.567
Total viseme duration: 2.567
Reached phoneme: f at time: 2.5836281 for duration: 00:00:00.1900000

Total phoneme duration: 2.757
Total viseme duration: 2.757
Reached phoneme: ♦ at time: 2.7736507 for duration: 00:00:00.2000000

Total phoneme duration: 2.957
Total viseme duration: 2.957
Reached phoneme: ♦ at time: 2.9736507 for duration: 00:00:00.2000000

Total phoneme duration: 3.157
Reached bookmark: T7 at time: 3.1736507

Total viseme duration: 3.157
Reached phoneme: ? at time: 3.1736507 for duration: 00:00:00.0490000

Total phoneme duration: 3.206
Total viseme duration: 3.206
Reached phoneme: æ at time: 3.2236734 for duration: 00:00:00.1090000

Total phoneme duration: 3.315
Total viseme duration: 3.315
Reached phoneme: ? at time: 3.3336054 for duration: 00:00:00.0890000

Total phoneme duration: 3.404
Total viseme duration: 3.404
Reached phoneme: k at time: 3.4235827 for duration: 00:00:00.0450000

Total phoneme duration: 3.449
Total viseme duration: 3.449
Reached phoneme: s at time: 3.4686621 for duration: 00:00:00.0900000

Total phoneme duration: 3.539
Reached bookmark: T8 at time: 3.5586394

Total viseme duration: 3.539
Reached phoneme: f at time: 3.5586394 for duration: 00:00:00.1000000

Total phoneme duration: 3.639
Total viseme duration: 3.639
Reached phoneme: ? at time: 3.6586848 for duration: 00:00:00.0690000

Total phoneme duration: 3.708
Reached bookmark: T9 at time: 3.7283446

Total viseme duration: 3.708
Reached phoneme: ð at time: 3.7283446 for duration: 00:00:00.0980000

Total phoneme duration: 3.806
Total viseme duration: 3.806
Reached phoneme: ? at time: 3.8268934 for duration: 00:00:00.0500000

Total phoneme duration: 3.856
Total viseme duration: 3.856
Reached phoneme: s at time: 3.8771882 for duration: 00:00:00.1250000

Total phoneme duration: 3.981
Reached bookmark: T10 at time: 4.0022222

Total viseme duration: 3.981
Reached phoneme: ? at time: 4.0022222 for duration: 00:00:00.0920000

Total phoneme duration: 4.073
Total viseme duration: 4.073
Reached phoneme: p at time: 4.0950113 for duration: 00:00:00.0890000

Total phoneme duration: 4.162
Total viseme duration: 4.162
Reached phoneme: ? at time: 4.1849886 for duration: 00:00:00.0700000

Total phoneme duration: 4.232
Total viseme duration: 4.232
Reached phoneme: t at time: 4.2550113 for duration: 00:00:00.1400000

Total phoneme duration: 4.372
Total viseme duration: 4.372
Reached phoneme: u at time: 4.3950566 for duration: 00:00:00.0590000

Total phoneme duration: 4.431
Total viseme duration: 4.431
Reached phoneme: n at time: 4.4550113 for duration: 00:00:00.0600000

Total phoneme duration: 4.491
Total viseme duration: 4.491
Reached phoneme: ? at time: 4.5150566 for duration: 00:00:00.0790000

Total phoneme duration: 4.57
Total viseme duration: 4.57
Reached phoneme: t at time: 4.5950113 for duration: 00:00:00.0400000

Total phoneme duration: 4.61
Total viseme duration: 4.61
Reached phoneme: i at time: 4.6350566 for duration: 00:00:00.1290000

Total phoneme duration: 4.739
Reached bookmark: T11 at time: 4.765034

Total viseme duration: 4.739
Reached phoneme: t at time: 4.765034 for duration: 00:00:00.0600000

Total phoneme duration: 4.799
Total viseme duration: 4.799
Reached phoneme: ? at time: 4.8250793 for duration: 00:00:00.0390000

Total phoneme duration: 4.838
Reached bookmark: T12 at time: 4.865034

Total viseme duration: 4.838
Reached phoneme: s at time: 4.865034 for duration: 00:00:00.1100000

Total phoneme duration: 4.948
Total viseme duration: 4.948
Reached phoneme: p at time: 4.9750113 for duration: 00:00:00.0750000

Total phoneme duration: 5.023
Total viseme duration: 5.023
Reached phoneme: i at time: 5.0500226 for duration: 00:00:00.1300000

Total phoneme duration: 5.153
Total viseme duration: 5.153
Reached phoneme: k at time: 5.1800453 for duration: 00:00:00.0290000

Total phoneme duration: 5.182
Reached bookmark: T13 at time: 5.2099773

Total viseme duration: 5.182
Reached phoneme: t at time: 5.2099773 for duration: 00:00:00.0950000

Total phoneme duration: 5.277
Total viseme duration: 5.277
Reached phoneme: ? at time: 5.3049886 for duration: 00:00:00.0300000

Total phoneme duration: 5.307
Reached bookmark: T14 at time: 5.3350566

Total viseme duration: 5.307
Reached phoneme: j at time: 5.3350566 for duration: 00:00:00.1430000

Total phoneme duration: 5.45
Total viseme duration: 5.45
Reached phoneme: u at time: 5.4785487 for duration: 00:00:00.0490000

Total phoneme duration: 5.499
Reached bookmark: T15 at time: 5.5285714

Total viseme duration: 5.499
Reached phoneme: t at time: 5.5285714 for duration: 00:00:00.1190000

Total phoneme duration: 5.618
Total viseme duration: 5.618
Reached phoneme: ? at time: 5.648526 for duration: 00:00:00.0400000

Total phoneme duration: 5.658
Total viseme duration: 5.658
Reached phoneme: d at time: 5.6885714 for duration: 00:00:00.0390000

Total phoneme duration: 5.697
Total viseme duration: 5.697
Reached phoneme: e?i at time: 5.728526 for duration: 00:00:00.2690000

Total phoneme duration: 5.966
Total viseme duration: 5.966
Reached phoneme: ♦ at time: 5.9985034 for duration: 00:00:00.2000000

Total phoneme duration: 6.166
Total viseme duration: 6.166
Reached phoneme: ♦ at time: 6.1985487 for duration: 00:00:00.2000000

Total phoneme duration: 6.366
Reached bookmark: T16 at time: 6.3985487

Total viseme duration: 6.366
Reached phoneme: ♦ at time: 6.3985487 for duration: 00:00:00.4000000

Total phoneme duration: 6.766
August 23, 2017
11:31 am
Admin
Forum Posts: 980
Member Since:
December 1, 2011

In remote_speech.h there is a default timeout of 10 seconds, which SmartBody uses to determine when the TTS request should be cancelled. You can modify that value (in the constructor of remote_speech(), called from SpeechManager.cpp line 32).

 

Ari
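
To illustrate the kind of timeout described above, here is a minimal Python sketch of a requester that waits a fixed time for a reply and then gives up. It is not SmartBody's C++ implementation; `wait_for_reply` and the queue standing in for the messaging system are hypothetical. The point is that the requester times out whenever no reply arrives, regardless of how much work the other side has already done.

```python
import queue
import threading


def wait_for_reply(replies: "queue.Queue[str]", timeout_s: float) -> str:
    """Block until a reply arrives; raise TimeoutError after timeout_s seconds."""
    try:
        return replies.get(timeout=timeout_s)
    except queue.Empty:
        raise TimeoutError("Remote speech process timed out.") from None


if __name__ == "__main__":
    replies: "queue.Queue[str]" = queue.Queue()
    # Simulated relay that answers after 0.1 s: the wait succeeds.
    threading.Timer(0.1, replies.put, args=["reply for utterance"]).start()
    print(wait_for_reply(replies, timeout_s=1.0))

    # Simulated relay that never answers: the wait gives up.
    try:
        wait_for_reply(queue.Queue(), timeout_s=0.2)
    except TimeoutError as err:
        print("timed out:", err)
```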

August 30, 2017
5:54 pm
Member
Members
Forum Posts: 3
Member Since:
August 18, 2017

Thanks for the response. I think I misled you with the comment I put in the edit. Indeed, I can see smartbody timing out after 10s; I don't think we want to change that, as 10s would be a long time to wait for speech to be prepared. The issue we're having is that although utterances are prepared (the audio file is created in about 2s; we can see that happening), some don't get delivered back to smartbody. In other words, with a successful utterance, the ttsrelay window shows the debug message with the speak element and the prepared utterance ready for lip-syncing; with an unsuccessful utterance, the viseme timing shows up (as in the log in my previous message) and the audio file is created, but the message doesn't get sent back to smartbody.

Plain text utterances, without the mark element, can be arbitrarily long, and there are no problems. However, once I start adding any number of mark elements (to later synchronize gestures), the process becomes unreliable.

I still can't nail down exactly when it fails, because there is no obvious pattern (such as a certain number of mark elements that fails). I set up an utterance with one mark element and start adding text, and it will fail with a two-word piece of text, then work with a longer piece of text. So it's quite unpredictable. This recent testing is just through the smartbody BML commands window, so there shouldn't be anything else interfering.

September 1, 2017
9:15 am
Admin
Forum Posts: 980
Member Since:
December 1, 2011

Is it the case that the message from the TTSRelay isn't getting sent back through the messaging system, or that SmartBody isn't processing it? If the former, then the answer lies within the TTSRelay code (possibly an error in the TTS Engine - does it happen for every voice, or just a particular voice?)

 

Ari

September 8, 2017
3:49 pm
Member
Members
Forum Posts: 3
Member Since:
August 18, 2017

When the voice works, the TTSRelay process outputs "Debug: Sending reply ... " and then includes the generated speak element with the viseme timings. When it doesn't work, that never happens. So I suppose the TTSRelay is stumbling on the mark elements.

I'm surprised, though; isn't this how vhtoolkit and other applications that rely on smartbody generate synchronized speech (i.e., with nvbg, etc.)? So I'm unsure how this is novel, even if it is a TTSRelay issue.

Additionally, I was using a CereProc voice initially, but have switched back to a default Windows 8 voice and encountered the same problems, so I don't think the voice is the issue.
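
A saved copy of the TTSRelay console output can be checked for this symptom automatically. The sketch below assumes only the two line formats quoted in this thread ("Reached bookmark: T0 at time: 0.0949659" and "Debug: Sending reply"). `summarize_relay_log` is a hypothetical helper, and the sample text is a shortened excerpt of the log posted earlier.

```python
import re

# Matches lines such as "Reached bookmark: T0 at time: 0.0949659".
BOOKMARK_RE = re.compile(r"Reached bookmark: (\S+) at time: ([0-9.]+)")


def summarize_relay_log(log_text: str) -> dict:
    """Collect bookmark times and whether a 'Sending reply' line appears."""
    bookmarks = {name: float(t) for name, t in BOOKMARK_RE.findall(log_text)}
    return {
        "bookmarks": bookmarks,
        "reply_sent": "Debug: Sending reply" in log_text,
    }


if __name__ == "__main__":
    sample = """Processing message.
Reached bookmark: T0 at time: 0.0949659
Reached bookmark: T1 at time: 0.4548752
Total phoneme duration: 6.766
"""
    summary = summarize_relay_log(sample)
    print(summary["bookmarks"])
    if not summary["reply_sent"]:
        print("No 'Debug: Sending reply' line: the relay never sent its reply")
```

If the bookmarks are all reached but no reply line follows, the failure lies between audio generation and the reply being sent, which narrows the search to the TTSRelay side.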

