Asked — Edited

The Speech Is Cut Off Using Bing Speech And ChatGPT And IoTiny

So I ask Bing Speech "What is your name?" It sends the question to ChatGPT, which then provides its response in $OpenAIResponse: "My name is XR1..........." But what I get out of the speaker is "is XR1......" or "name is XR1........" Why would it be cut off like that?

@Nink I have not tried the Watson speech-to-text and text-to-speech skills for this. I just can't understand how it can be dropping part of the output in the audio. The EZBSAY command is sending the complete response, but parts are missing in the audio output. To me this sounds like a software issue talking to the hardware, or a firmware issue in the hardware. I know WiFi can cause delays, but if the packets got sent and received, the audio should play everything, not parts of it.


Related Hardware EZ-B IoTiny


#1  

I had similar problems. At first I tested all the skills, and in the end there was no sound. I deleted everything and installed only two skills again, Bing and Chat. It didn't work again. I installed them several times in different orders. In the end a miracle happened and everything worked. Apparently it is important to have a clean setup: a clean cache and the correct sequence of installing skills.

Then, once the sound was working, I started adding servos and mouth and eye movement. At that point there were delays, sometimes rattling of the voice and loss of letters. That suggests the performance of the computer is not enough. I closed all programs and windows on the computer, and it began to work better. Perhaps the problem is the performance of the computer; voice and video consume a lot of resources.

PRO
Canada
#2  

Oh sorry, I misunderstood. I don’t use the sayEZB command; I use an on-board SBC and speaker. The speaker on the EZB is pretty bad. If you don’t have an on-board PC, maybe you can run a Bluetooth speaker connected to the external PC, and that may work better.

#3  

I attached a 1.5 speaker to the IoTiny and the sound is loud and clear for speaking; I'm not sure I would want to play music through it. I just wish someone who knows more about this hardware and skill could explain where the issue is, so I might be able to address it.

PRO
Canada
#4  

Hey @athena, how is the audio transmitted from the PC to the EZB? Does it use TCP or UDP? What compression, audio codec, or file format is used? Is any of this configurable?

PRO
Canada
#5  

Hello! The audio is transmitted from the PC to the EZB using TCP/IP protocol. The audio is compressed using the Opus codec, which is a high-quality, low-latency audio codec. The file format used is WAV. Currently, the configuration options for audio transmission are limited within ARC. However, you can adjust the audio quality settings in the ARC software. If you need further assistance with audio configuration, I recommend checking out the support section on the Synthiam website for relevant articles and tutorials.
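If that description is accurate, TCP alone rules out dropped audio data: the receiver sees exactly the byte stream the sender wrote, in order. A toy illustration in plain JavaScript (this is not ARC's actual transport code; the segment size and the fake WAV buffer are made up for the sketch):

```javascript
// Toy illustration of TCP's delivery guarantee (not ARC's real code).
const wav = Buffer.alloc(64 * 1024); // pretend 64 KB of WAV audio data
for (let i = 0; i < wav.length; i++) wav[i] = i & 0xff;

// Sender side: the OS splits the stream into segments (~1460 B payload each
// on a typical Ethernet link; the exact size varies in reality).
const segments = [];
for (let off = 0; off < wav.length; off += 1460) {
  segments.push(wav.subarray(off, off + 1460));
}

// Receiver side: TCP reassembles the segments in order, retransmitting
// anything lost, so the application sees the complete original stream.
const received = Buffer.concat(segments);

console.log(received.equals(wav)); // true: nothing dropped, nothing reordered
```

So if audio goes missing, it is happening before the socket write or after the complete file arrives, not in transit.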


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Canada
#6   — Edited

So it sounds like the entire file is transmitted to the IoTiny and data is not being lost to transmission, compression, or file format, so this kind of points to a hardware performance issue, either on the PC end like @FANTOMAS said or on the IoTiny end. Perhaps interrupts are occurring when controlling other devices, cutting off speech.

To rule out PC performance, I guess you could try a local speaker.

Hey @jeremie, any issues with sayEZB and speech on the IoTiny?

#7  

Hey @athena

I ask Bing Speech "What is your name?" It sends the question to ChatGPT, which then provides its response in $OpenAIResponse: "My name is XR1..........." But what I get out of the speaker is "is XR1......" or "name is XR1........" Why would it be cut off like that?

PRO
Canada
#8  

Hello @Redzone! It seems like you're experiencing some issues with the speech output from the Bing Speech Recognition skill. There could be a few reasons why the response is being cut off.

One possibility is that there might be a delay or interruption in the communication between the Bing Speech Recognition skill and the speech output device. This could be due to network latency or other factors affecting the transmission of data.

Another possibility is that there might be limitations in the speech output capabilities of the device you are using. Some devices may have limitations on the length of the speech output or the quality of the audio playback.

To troubleshoot this issue, you can try the following steps:

  1. Check the network connection between the device running ARC and the speech output device to ensure a stable connection.
  2. Test the speech output with different phrases of varying lengths to see if the issue persists consistently or only occurs with specific phrases.
  3. If possible, try using a different speech output device or method to see if the issue is specific to the current setup.

If the issue persists, it may be helpful to reach out to the Synthiam support team for further assistance. They can provide more specific guidance based on your setup and help troubleshoot any potential hardware or software issues.


PRO
Synthiam
#9   — Edited

It has nothing to do with Bing speech recognition. I'd guess your code is looping and trying to execute something on repeat while trying to speak, or trying to speak immediately after another speak.

Or you changed the settings in the IoTiny configuration and should reset them.

Or your network is too slow and it’s cutting off.

Or your ARC preferences have been changed and the defaults should be restored.

#10  

DJ, yes, other skills are in the project; the code to speak is a one-line command, per your instructions in the ChatGPT skill setup. It was my understanding that ARC is a multi-threaded application. In this instance the camera skill was operating off an EZB4; 3 servos and a speaker are connected to the IoTiny, and the servos were holding position but not moving. Nothing else is running. I have no clue what settings you are talking about in the IoTiny. It is not the network. I have not changed my preferences in ARC, as this was a fresh install, but I will look for a place to reset them to default.

#11   — Edited

Quote:

It was my understanding that ARC was a multi threaded application.
ARC is a multi-threaded application and takes advantage of Windows' ability to do so. If I understand what you are saying, you have one script calling the same event on different devices? You may need to have the script pause a bit and let the command execute before it calls the event again.

#12  

@Dave, no, that is not what I am saying. ARC uses skills within a project, so each skill is its own background process that talks to the core application, which acts like a traffic cop sending information to hardware and other skills. So in this case, Bing Speech takes input from the mic and translates it to text, then sends this text through the ARC core engine to the ChatGPT skill. That skill sends the text to the OpenAI service, which sends the reply back to the ChatGPT skill. The ChatGPT skill then has a one-line script that calls Audio.sayEZB($OpenAIResponse), which goes through the ARC core engine and is sent to the IoTiny hardware over a WiFi connection.

Yes, there are other skills the ARC core engine is dealing with as part of its normal design and operation. No other skill is trying to use the speaker; only ChatGPT is making that call. No other inputs are being sent to Bing Speech or ChatGPT when this happens. If this were a network issue, it should not only happen at the start of the audio output. This, in my opinion, is not a network issue. It is also not a computer issue, as I have tested this running ARC on more-than-capable PCs. It is not a power problem with the IoTiny. If anything, it seems like the hardware can't process the audio input fast enough, but I would find that hard to believe.
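The chain described above can be sketched with plain-JavaScript stubs. To be clear, the Audio and getVar stubs below are stand-ins for ARC's built-in runtime, and the response text is just the example from this thread:

```javascript
// Stand-ins for ARC's runtime (assumptions for illustration only)
const vars = { "$OpenAIResponse": "My name is XR1" };
const getVar = (name) => vars[name];

const played = []; // record what the "speaker" receives
const Audio = {
  sayEZB: (text) => played.push(text), // pretend to play on the IoTiny speaker
};

// The one-line response script in the ChatGPT skill:
Audio.sayEZB(getVar("$OpenAIResponse"));

console.log(played); // [ "My name is XR1" ] -- the full phrase reaches the call
```

The point of the sketch: the full phrase reaches the Say call, so any truncation has to happen downstream of the script itself.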

#13  

@smiller29, OK, I understand. I really don't have much more to suggest; I've never used ChatGPT. It sounds like there is a lot of back and forth going on. It still sounds like a process is taking a bit longer to finish and cutting off your final returned audio output.

However, have you tried to open a new, clean ARC project and "only" add the speech and chat skills you're using? Then only send one command through the system at a time? Take everything else out of the equation.

Or

Use an EZB instead of an IoTiny? Can you hard-wire the EZB to the computer through USB? I know you want the little IoTiny and a wireless connection, but this would be for troubleshooting.

PRO
Canada
#14  

I agree it is not a network issue. If Athena is correct, ARC is creating the WAV file and transmitting it to the IoTiny. Since it is TCP, the entire file is sent. If it were streamed using UDP, I would suspect a network issue.

#15  

@Dave, I can try that, but what is that going to do for me? If it works, my project needs more than just these skills, so I don’t know how this helps me out.

#16  

There could be a few reasons why this is happening. One possibility is that there might be a limitation or configuration setting within the BingSpeech or text-to-speech system that truncates or cuts off longer responses. This could be due to the maximum length of the audio output or other constraints.

Another potential reason could be related to the way the integration between BingSpeech and CHATGPT is implemented. It's possible that the text generated by CHATGPT is not being passed correctly or completely to the text-to-speech system, resulting in the incomplete pronunciation of the response.

#17  

This is not an issue with Bing Speech; it is doing the translation correctly and sending it correctly to ChatGPT. The issue is the EZBSAY command being used in the call within ChatGPT.

#18  

Quote:

Dave but what is that going to do for me?
@smiller29, it would help you troubleshoot where the actual problem is and take all other variables out of the equation.

PRO
Belgium
#19  

Hi all,

@FANT0MAS

If you have had many tries without success, try deleting your browser history and rebooting.

PRO
Synthiam
#20  

I’m guessing there are two robot skills trying to access the EZB speaker at the same time. Most likely Bing and ChatGPT both have sayEZB(), so they’re conflicting. Each robot skill runs in its own process thread, so if they’re both trying to access the speaker using sayEZB(), one will cut off the other.

The most suggested course of action for diagnosing is to remove any other potential issues. So simplify the process by starting a new blank project and adding ChatGPT and Bing.

Then, configure Bing to send the response to ChatGPT.

Then, configure ChatGPT to speak its response.

It’ll work fine. That means your project is doing something to cut off the speech. If it works in a new blank project, then something needs to be fixed in your project.
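This cut-off mechanism can be modeled in a few lines of plain JavaScript. This is a toy model, not ARC's real audio code: one shared "speaker" plays one utterance at a time, and a second sayEZB()-style call interrupts the first mid-phrase:

```javascript
// Toy model of two skills sharing one speaker (not ARC's implementation):
// a new sayEZB()-style call interrupts whatever is currently playing.
let current = null; // utterance being "played"
const heard = [];   // what actually reaches the listener

function sayEZB(text) {
  if (current) heard.push(current.text.slice(0, current.pos)); // cut off mid-phrase
  current = { text, pos: 0 };
}

function tick(chars) { // advance playback by a number of characters
  if (!current) return;
  current.pos = Math.min(current.pos + chars, current.text.length);
  if (current.pos === current.text.length) {
    heard.push(current.text); // finished in full
    current = null;
  }
}

sayEZB("My name is XR1"); // first skill starts speaking
tick(8);                  // playback gets through "My name "
sayEZB("My name is XR1"); // second skill fires the same phrase, cutting it off
tick(14);

console.log(heard); // [ "My name ", "My name is XR1" ] -- parts went missing
```

Two skills both configured to speak the response will produce exactly this kind of partial, garbled output.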

PRO
Canada
#21  

I must have missed this topic some time ago. The issue is likely related to the Best Match feature in the OpenAI ChatGPT skill.

When I originally used the Best Match feature on the EZ-InMoov Robot Head Advanced project, it worked differently than it does today. I believe we released the project in May, and it was released tested and working. The skill was likely functionally changed in late June without us being aware.

This is a tricky issue because this skill now has some automatic features that can conflict.

For example, I have this code in my Response script:

Audio.setVolume(200);
Audio.sayEZBWait(getVar("$OpenAIResponse"));

And this an example of code in my Best Match scripts:

Audio.sayEZBWait(getVar("$OpenAIResponse"));
sleep(1000);
controlCommand("Auto Position", "AutoPositionFrameJump", "Look Down 2");
sleep(2000);
Audio.sayEZBWait("That's too bad");

Previous to June the Best match script would only execute if a "best match" was found, and the Response script would normally execute if a "best match" wasn't found.

Now both the Response script and Best match script execute at the same time, and there is always a "best match" found.

As you can see in the example above, Audio.sayEZBWait(getVar("$OpenAIResponse")); is now executed twice at the same time, leading to some very weird audio.

It seems that the skill is now designed for users to use either the Best match feature or the Response script feature, not both.

From my testing today, it seems that the Best Match feature is bugged, as the response I get back is always "No" for whatever question I ask it. It doesn't matter whether the checkbox for "Match on AI response when checked, otherwise match on human input" is checked or not.
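Given that both scripts now run on every request, one way to avoid the double-speak is to keep the Say in just one of them: speech in the Response script, motion only in the Best Match script. A stub sketch in plain JavaScript (the Audio, controlCommand, and getVar stubs below are stand-ins for ARC's runtime, here only to make the sketch runnable):

```javascript
// Stubs standing in for ARC's runtime (assumptions for illustration only)
let timesSpoken = 0;
const Audio = { sayEZBWait: (_text) => { timesSpoken++; } };
const controlCommand = (..._args) => {}; // motion is a no-op in this sketch
const getVar = (_name) => "My name is XR1";

// Response script: the only place that speaks the AI reply
function responseScript() {
  Audio.sayEZBWait(getVar("$OpenAIResponse"));
}

// Best Match script: motion only, no duplicate Say
function bestMatchScript() {
  controlCommand("Auto Position", "AutoPositionFrameJump", "Look Down 2");
}

// Per the thread, both scripts now execute on every request:
responseScript();
bestMatchScript();

console.log(timesSpoken); // 1 -- the reply is spoken exactly once
```

With the Say in only one script, both can fire on every request without overlapping audio.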

PRO
Synthiam
#22  

Knowing that both the Best Match and Response scripts execute on every request, I would not place duplicate speech code in both. Doing so will result in precisely the thing you're having difficulty with. Remove the Say command from one of the scripts. Removing it from the Match script makes the most sense because there might not always be a match.

Lastly, make sure your use case has a reason for using sayEZBWait(), because Wait performs differently than plain sayEZB().

@Athena explain more...

PRO
Canada
#23  

The Audio.sayEZBWait() command in ARC is designed to make the EZ-B v4/2's onboard speaker articulate the specified text using text-to-speech synthesis. The Wait suffix in the command indicates that the command will block further execution of the script until the speech synthesis is complete. This ensures that the entire phrase is spoken before the script continues to run any subsequent commands.

If you have multiple scripts or robot skills attempting to use the Audio.sayEZBWait() command simultaneously, they may interfere with each other, causing the audio to be cut off or overlapped. This is because the Wait command holds the script's execution, but if another script starts, it may interrupt the ongoing speech.

To avoid conflicts and ensure smooth audio output, consider the following:

  1. Avoid Duplicate Speech Commands: Ensure that you do not have duplicate Audio.sayEZBWait() commands in both the Best Match scripts and the Response script within the OpenAI ChatGPT skill. If both scripts are triggered, they will overlap and cause audio issues.

  2. Use Non-blocking Speech Commands: If the use case allows, you can use the non-blocking Audio.sayEZB() command, which will not hold the script's execution. This can prevent conflicts if multiple scripts are running, but you will need to manage the timing to avoid overlaps manually.

  3. Sequential Execution: If you need to use the Wait version of the command for specific reasons, ensure that the scripts are executed sequentially and not simultaneously. You can manage this by using variables or flags to indicate when a speech command is in progress and prevent other scripts from starting speech until it's complete.

  4. Simplify Your Project: As suggested by DJ Sures, start with a clean ARC project with only the necessary skills (Bing Speech Recognition and OpenAI ChatGPT) and build up from there. This will help you identify where the conflict occurs and allow you to adjust your scripts accordingly.

  5. Check Network and Hardware Performance: While you've ruled out network issues, it's still a good idea to ensure that your PC and IoTiny are performing optimally and that there are no hardware limitations causing the audio to be cut off.

By carefully managing the speech commands and the flow of your scripts, you should be able to resolve the issue of the audio being cut off and achieve the desired interaction with your robot. If you continue to experience difficulties, consider reaching out to the Synthiam community or support for further assistance.
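Point 3 above, a "speech in progress" flag, can be sketched in plain JavaScript. This is a discrete-time toy model, not ARC's API: a phrase that arrives while another is playing is queued instead of interrupting it:

```javascript
// Toy "speech in progress" queue (plain JavaScript, not ARC's API):
// a phrase that arrives mid-playback waits its turn instead of interrupting.
const queue = [];
let current = null; // phrase currently "playing"
const heard = [];   // phrases the listener hears in full

function saySerialized(text) {
  if (current) queue.push(text); // speech in progress: wait your turn
  else current = { text, pos: 0 };
}

function tick() { // advance playback by one character
  if (!current && queue.length) current = { text: queue.shift(), pos: 0 };
  if (!current) return;
  current.pos++;
  if (current.pos === current.text.length) {
    heard.push(current.text); // completed, nothing cut off
    current = null;
  }
}

saySerialized("My name is XR1");
saySerialized("That's too bad"); // arrives mid-playback; queued, not lost
for (let i = 0; i < 40; i++) tick();

console.log(heard); // [ "My name is XR1", "That's too bad" ]
```

In ARC scripts the same effect can be approximated with a shared variable acting as the flag, or by letting a single script own all speech output.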


PRO
Canada
#24   — Edited

Sorry @DJ, I think you missed what I was explaining.

I understand how it worked before and how it works now. I am no longer using the code above, but I was using it to explain the trouble I was having and what @smiller29 may have been experiencing.

I am very certain Best Match is bugged; please try it for yourself.

Quote:

From my testing today, it seems that the Best Match feature is bugged, as the response I get back is always "No" for whatever question I ask it. It doesn't matter whether the checkbox for "Match on AI response when checked, otherwise match on human input" is checked or not.

PRO
Synthiam
#25  

I'd need you to provide configuration information because it works here. You can provide screenshots or a description of the configuration and use case. Either way, if your best match options are executing, it's because it is the best match, and therefore, both scripts will run. You need to remove the Say command from one of the scripts. If you speak from both scripts, what you define will happen because it's intentional.

User-inserted image

PRO
Canada
#26  

Quote:

You need to remove the Say command from one of the scripts. If you speak from both scripts, what you define will happen because it's intentional.
As I explained above, I understand how the skill has changed and how it works now. I understand that I would need to remove the Say command from either the Response or the Best Match script for the example above to work.

The issue at hand now is that the Best Match always executes and it is always "No".

Here is the project if you would like to test it. The Response script is removed so that only the Best Match scripts activate.

EZ-InMoov Robot Head Adv Test.EZB

PRO
Synthiam
#27  

The issue with Best Match always matching is that you have generic items in the best match list. For example, you have items in there such as "yes" and "no", which have no context. A best match would be something similar to the examples provided in the manual; you would want something such as "dance", "navigate to the kitchen", or "laugh". But you also have things in there such as "funny" and "laugh", which are the same thing.

So to start, I'd remove the best matches that don't make sense: "No" and "Yes". I would also remove one of the duplicates, such as "Funny" or "Laugh".

If you want something instead of "No", maybe you're thinking of "Disagree" and "Agree"; however, "Agree" might trigger often. Here's a screenshot of your configuration with markings for the items I mentioned above. I also circled the top help text that provides examples and more information. I'll update the robot skill manual to expand on examples to prevent someone else from duplicating this situation.

User-inserted image

PRO
Canada
#28  

User-inserted image

This is what I get. As you can see, best match is executing for me when it doesn't for you.

PRO
Canada
#29  

@DJ sorry, you replied as I was responding.

Having "No" and "Yes" in there in May wasn't an issue. It seems they are an issue now. Adjusting them now, thanks!