PRO
afcorson
Australia
Asked
How can I record audio from the microphone for an extended period, such as up to 10 minutes? I want to capture a person speaking, save the audio as an MP3 file on my hard drive, and then send it to an AI platform for a response. I have attempted to use the Microphone skill in Synthiam ARC, but it failed to record anything and does not save the audio to my hard drive.
Could you provide guidance on how to achieve this functionality?
Related Hardware EZ-B v4
Related Control
Microphone
To record audio for an extended period using the Microphone skill in Synthiam ARC and save it as an MP3 file, you'll need to follow a few steps. Unfortunately, the Microphone skill in ARC doesn't directly support saving audio to your hard drive as an MP3 file. However, you can use a workaround by recording the audio and then using additional software to convert and save it. Here's a step-by-step guide:
Step 1: Configure the Microphone Skill
Add the Microphone Skill: Ensure the Microphone skill is added to your ARC project. You can do this by navigating to Project -> Add Skill -> Audio -> Microphone.
Configure the Microphone: Set the Sample Rate and Sample Size according to your needs. A higher sample rate will give you better quality but will also result in larger file sizes. Enable Auto Record if you want the recording to start automatically when sound is detected.
Test the Microphone: Make sure your microphone is working correctly by checking the audio waveform in the Microphone skill's main window.
Step 2: Record the Audio
Start Recording: Click the Record button in the Microphone skill to start recording. You can record for as long as needed, but keep in mind that the buffer might have limitations.
Stop Recording: Once you have recorded the desired audio, click the Stop button.
Step 3: Export and Convert the Audio
Export the Recording: Use the Export to Soundboard button to save the recording to the Sound Board (EZB) skill. This will allow you to play it back, but it won't save it as an MP3 file.
Save the Audio: To save the audio as an MP3 file, you will need to use external software. You can use a program like Audacity to record directly from your computer's audio output or convert the saved file from the Sound Board.
Step 4: Send to AI Platform
Convert to MP3: Use Audacity or another audio editing software to convert the recorded audio to an MP3 file.
Send to AI Platform: Once you have the MP3 file, you can use an API or platform-specific method to send the audio file to your AI platform for processing.
Additional Tips
If you encounter any issues or need further assistance, feel free to ask!
I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
That's a tricky question for @athena or me to answer because there is insufficient information. Let me explain...
1. An "AI platform" would use a specific communication protocol for its API. That could be an HTTP post, a WebSocket, a streaming HTTP post, etc. There are several ways.
2. An "AI platform" API would also have a specific document template or application protocol specification. This is their "form" for providing data to their server.
3. The response from the "AI platform" would be in the communication protocol outlined in #1, with a specific document template or application protocol specification outlined in #2.
So, without knowing anything, the equivalent would be standing yelling random sounds at someone, expecting them to do something. I can imagine making a pretty good Monty Python skit.
The first thing you can do is research the "AI Platform's" API. After that, you'll know how to send data to it. Once you know how to send data, you'll see the response and how to parse it. At that point, you can determine how to record and include the audio in the transmission.
Sending the audio in the transmission could become a complicated script or be straightforward, but right now, it's unknown.
The more detail you can share, the closer you'll get to accomplishing the goal.
Thank you for the additional context, DJ_Sures! You're absolutely right. To effectively send audio to an AI platform, we need to understand the specific requirements and protocols of that platform. Here's a more detailed approach to tackle this:
Step 1: Understand the AI Platform's API
Research the API Documentation:
Identify Audio Requirements:
Step 2: Record and Prepare Audio
Record Audio:
Convert Audio:
Step 3: Send Audio to the AI Platform
Develop a Script: Use the Net.hTTPPost() function in ARC to send the audio file if the platform uses HTTP POST. You may need to include additional headers or authentication tokens as specified in the API documentation.
Handle the Response:
Step 4: Test and Iterate
If you can provide more details about the AI platform you're using, I can offer more specific guidance on how to integrate it with ARC. Feel free to ask if you have any questions or need further assistance!
Ok. I have done some more thinking on this. I can use Windows Sound Recorder easy enough to capture a person speaking for an extended period. I then use something like Descript.com to convert the audio to text. I then send the text to https://deepai.org/chat/debate to get an insightful response. Then I want to post this text response somewhere on the internet so my robot can read and speak it. This is where I am stuck, as HTTPget("https://rur.free.nf/dialogue_script.txt") does not retrieve the file content. Bear in mind that only the file read/speaking is performed on the robot's PC. Everything else prior to that is performed on a separate laptop.
It looks like DJ has some insight, and you've now expanded on the details of the use case. I'm sure when DJ looks at this, he'll have a proposed solution or idea for you.
Because the Microphone robot skill does not save the audio to the drive as a WAV or MP3 file, it can't be used as Athena suggested.
Third-party audio recording software would be ideal. One way would be to use a command-line audio recorder, which can be launched from JavaScript in ARC. Ideally, pick one that accepts a recording-length parameter, so it stops after a specified time.
Then that file could be read as binary into JavaScript in ARC and sent to the server using the appropriate HTTP GET or HTTP POST.
@athena do you have any insight while we wait for dj to chime in?
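The command-line recorder idea above can be sketched as follows. This is only an illustration under assumptions: it picks ffmpeg as the recorder (the thread does not name one), assumes a Windows DirectShow microphone, and uses a made-up device name and output path. The helper names `buildRecordArgs` and `toCommandLine` are hypothetical. The code only builds the command line; launching it is left to whatever process-launch call your ARC script environment provides.

```javascript
// Build the argument list for a time-limited ffmpeg recording on Windows.
// Assumptions: ffmpeg is installed with MP3 support, and deviceName matches a
// DirectShow audio device (list them with:
//   ffmpeg -list_devices true -f dshow -i dummy).
function buildRecordArgs(deviceName, seconds, outputPath) {
  if (!(seconds > 0)) {
    throw new Error("seconds must be a positive number");
  }
  return [
    "-y",                         // overwrite the output file if it exists
    "-f", "dshow",                // capture through DirectShow
    "-i", "audio=" + deviceName,  // the microphone to record from
    "-t", String(seconds),        // stop automatically after this many seconds
    outputPath                    // encoder is chosen from the file extension
  ];
}

// Join the arguments into one command line, quoting any that contain spaces.
function toCommandLine(exe, args) {
  return [exe].concat(args).map(function (a) {
    return /\s/.test(a) ? '"' + a + '"' : a;
  }).join(" ");
}

// Hypothetical device name and path: a 10-minute (600 second) recording.
var args = buildRecordArgs("Microphone (USB Audio)", 600, "C:\\Recordings\\speech.mp3");
var commandLine = toCommandLine("ffmpeg", args);
```

Letting the recorder stop itself through the duration argument means the script does not have to track time or kill the process.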
Certainly! Here's a step-by-step approach to achieve your goal using ARC and external tools:
Step 1: Record Audio
Step 2: Convert Audio to Text
Step 3: Send Text to DeepAI
Step 4: Post Text Response Online
Step 5: Retrieve and Speak Text on Robot's PC
Use HTTPGet in ARC: Use the Net.hTTPGet() function in ARC to retrieve the content of the text file. Here's a sample JavaScript code snippet to do this:
Check File Accessibility: Ensure the file is accessible without any authentication or restrictions. You can test this by opening the URL in a web browser.
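A minimal sketch of the retrieve-and-speak step follows. The original snippet was not preserved, so this is a reconstruction under assumptions: `fetchAndSpeak` is a hypothetical helper, the URL is a placeholder, and the ARC wiring in the trailing comment assumes `Net.hTTPGet()` returns the response body as a string and that `Audio.say()` is available in your ARC version. Verify both against the ARC scripting manual.

```javascript
// Fetch a text file and speak it. The two callbacks keep the logic independent
// of ARC; inside ARC they would wrap the real calls (see the comment below).
function fetchAndSpeak(httpGet, say, url) {
  var text = String(httpGet(url) || "").trim();
  if (text.length === 0) {
    return false;  // nothing came back, so there is nothing to speak
  }
  say(text);
  return true;
}

// Assumed wiring inside an ARC JavaScript script (placeholder URL):
// fetchAndSpeak(
//   function (u) { return Net.hTTPGet(u); },
//   function (t) { Audio.say(t); },
//   "https://example.com/dialogue_script.txt");
```

Returning a flag rather than speaking unconditionally lets the calling script decide what to do when the file is empty or unreachable.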
Additional Tips
If Net.hTTPGet() is not retrieving the content, check for any network issues or URL errors. Ensure the file is hosted correctly and accessible.
If you encounter any issues or need further assistance, feel free to ask!
As background, my motivation for doing this came from an AI debate I attended which had four speakers and Pepper. Two spoke in favour of AI development and two against it. Pepper listened to each speaker, then provided a thoughtful debate-type response directed at each of them. Very impressive, as there was no scripting provided to Pepper.
I still can't get HTTPGet("https://www.rur.free.nf/dialogue_script.txt") to return anything, even though the Chromium window retrieves the content ok - see image. The HTTPGet command returns a message about Javascript which makes no sense (see Script 3 window). It would be useful if I could make use of the content in the Chromium Window, but the available control commands don't allow it.
The website that you're using has protection to prevent "bots" from crawling it and accessing the content, such as the script.txt. The javascript error you are receiving is this..
As you can see the error reads "This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support"
That server is filtering out bots. When you programmatically interact with a website, you're a bot. If that website has an API, that'll be meant for bots and programming. I guess what's happening is the browser requests the page, and the server feeds the page via a javascript program rather than responding with the content.
Let me see what I can do about the web browser robot skill. There might be a way to throw the content into a variable to use.
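One way a script could notice this situation is to check whether the returned body looks like a JavaScript challenge page instead of the expected plain text. The sketch below is a heuristic only: `looksLikeScriptChallenge` is a hypothetical helper, and the marker strings are taken from the error message quoted above rather than from any documented behaviour of the host.

```javascript
// Heuristic: decide whether an HTTP response body is a JavaScript challenge
// page rather than the plain text that was requested.
function looksLikeScriptChallenge(body) {
  var lower = String(body).toLowerCase();
  return lower.indexOf("<script") !== -1 ||
         lower.indexOf("requires javascript") !== -1 ||
         lower.indexOf("enable javascript") !== -1;
}

var blocked = looksLikeScriptChallenge(
  "This site requires Javascript to work, please enable Javascript in your browser");
var plain = looksLikeScriptChallenge("Thank you for your speech on AI.");
```

Here `blocked` is true and `plain` is false, so a script could skip speaking the response and report the problem instead of reading the challenge page aloud.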
Thanks for the update to the Chromium skill with the global content variable. This works well and allows me to retrieve and speak the text. The value of the Chromium content variable could get quite huge if the person speaks for 5 mins or so. I will have to try out the whole process by recording a live speaker and test out the whole process.
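If the retrieved text does become very long, one option is to break it into sentence-sized pieces and speak them one at a time. This is a sketch under assumptions, not something the Chromium skill provides: `splitIntoChunks` is a hypothetical helper, and the length limit is arbitrary. A single sentence longer than the limit is kept whole rather than cut mid-sentence.

```javascript
// Split long text into chunks of at most maxLen characters where possible,
// breaking only at sentence boundaries.
function splitIntoChunks(text, maxLen) {
  // A sentence is a run of text ending in . ! or ? plus trailing whitespace.
  var sentences = String(text).match(/[^.!?]+[.!?]*\s*/g) || [];
  var chunks = [];
  var current = "";
  sentences.forEach(function (s) {
    // Start a new chunk when adding this sentence would exceed the limit.
    if (current.length > 0 && current.length + s.length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    current += s;
  });
  if (current.trim().length > 0) {
    chunks.push(current.trim());
  }
  return chunks;
}

var chunks = splitIntoChunks("One. Two. Three.", 10);
```

Each chunk could then be passed to the speech call in turn, which keeps any single call short.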
Yeah that's true. Another option would be to have a robot skill that records audio for a long time and writes to a file to send to your ai platform.
What's the process you are currently taking to generate the file and send it to the AI platform? Maybe, if I have some time, I can think of a robot skill that could be useful for stuff like this, instead of using advanced Bing speech recognition, I guess. I think the biggest challenge you're up against is how to perform speech to text on a super long audio recording.
There seem to be numerous online audio-to-text converters out there. The one I tried was very good (https://transcribetotext.ai), but I need to record a 5 min speech and thoroughly test this site. I'll try to do this in the next day or so and report back. I'll send the text to https://deepai.org/chat/debate and see what it responds with.
I recorded a 6 min speech today, used transcribetotext.ai to convert it to text, and sent the text to deepai.org/chat/debate for a response. Within a few seconds a meaningful response came back which my robot could speak. So the process is feasible. I just need to streamline the process and test it in a live environment.
I'm not sure if that can be automated. They don't have an API for performing that; they only seem to have vision for their API. Here's a screenshot of what I saw on their website for the list of APIs.
You are right there. But at the rate AI is developing, I am sure someone will eventually provide an API for audio to text and debate responses.
Will ChatGPT not do a debate response? If you instruct the personality for it, it should.
Then we just need to figure out the speech-to-text analysis of a long recording from an audio file.
I haven't tried ChatGPT for a debate type response. I'll give it a go and compare the results.
I can get a reasonably good debate response from ChatGPT using a prompt such as this...
I want you to act as a debater. I will provide you with one person’s discussion on a topic and your task is to research both sides of the topic, present valid arguments for each side, refute opposing points of view, and draw persuasive conclusions based on evidence. Your goal is to help people come away from the discussion with increased knowledge and insight into the topic at hand. You may address the speaker by name. Here is the first speaker.
At this point, the speaker's talk, transcribed to text, needs to be added.
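Appending the transcript to the prompt can be scripted. The sketch below is one possible way to do it, not a tested integration: `buildDebatePrompt` is a hypothetical helper, the speaker name and transcript are made-up examples, and how the finished prompt is sent to ChatGPT is left open.

```javascript
// The debater instructions quoted above, kept in one place.
var DEBATER_INSTRUCTIONS =
  "I want you to act as a debater. I will provide you with one person's " +
  "discussion on a topic and your task is to research both sides of the topic, " +
  "present valid arguments for each side, refute opposing points of view, and " +
  "draw persuasive conclusions based on evidence. Your goal is to help people " +
  "come away from the discussion with increased knowledge and insight into the " +
  "topic at hand. You may address the speaker by name. Here is the first speaker.";

// Combine the instructions with one speaker's transcribed talk.
function buildDebatePrompt(speakerName, transcript) {
  // Collapse line breaks and repeated spaces left over from transcription.
  var cleaned = String(transcript).replace(/\s+/g, " ").trim();
  if (cleaned.length === 0) {
    throw new Error("transcript is empty");
  }
  return DEBATER_INSTRUCTIONS + "\n\n" + speakerName + ": " + cleaned;
}

// Hypothetical speaker and transcript.
var prompt = buildDebatePrompt("Alex", "AI will help\n medicine.");
```

Prefixing the transcript with the speaker's name gives the model something to address the speaker by, as the instructions allow.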