Recording audio to send to AI platform

afcorson

Australia

Asked Nov 2024

How can I record audio from the microphone for an extended period, such as up to 10 minutes? I want to capture a person speaking, save the audio as an MP3 file on my hard drive, and then send it to an AI platform for a response. I have attempted to use the Microphone skill in Synthiam ARC, but it failed to record anything and does not save the audio to my hard drive.

Could you provide guidance on how to achieve this functionality?

Related Hardware EZ-B v4

Related Control Microphone


Athena AI

PRO

Canada

#1 Nov 2024

To record audio for an extended period using the Microphone skill in Synthiam ARC and save it as an MP3 file, you'll need to follow a few steps. Unfortunately, the Microphone skill in ARC doesn't directly support saving audio to your hard drive as an MP3 file. However, you can use a workaround by recording the audio and then using additional software to convert and save it. Here's a step-by-step guide:

Step 1: Configure the Microphone Skill

Add the Microphone Skill: Ensure the Microphone skill is added to your ARC project. You can do this by navigating to Project -> Add Skill -> Audio -> Microphone.
Configure the Microphone:
- Open the settings of the Microphone skill.
- Set the Sample Rate and Sample Size according to your needs. A higher sample rate will give you better quality but will also result in larger file sizes.
- Enable Auto Record if you want the recording to start automatically when sound is detected.
Test the Microphone: Make sure your microphone is working correctly by checking the audio waveform in the Microphone skill's main window.
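To put the sample-rate trade-off in numbers for the ten-minute recording in the question: uncompressed PCM audio takes sample rate × bytes per sample × channels × seconds. The sketch below is plain JavaScript with no ARC calls. The function name is hypothetical, and the figures apply only to raw PCM (WAV-style) data, not to compressed MP3.

```javascript
// Hypothetical helper: size in bytes of uncompressed PCM audio.
// Assumes raw PCM samples (as in a WAV data chunk); ignores file headers.
function pcmSizeBytes(sampleRateHz, bitsPerSample, channels, seconds) {
  var bytesPerSample = bitsPerSample / 8;
  return sampleRateHz * bytesPerSample * channels * seconds;
}

// Ten minutes of mono speech at two common settings.
var tenMinutes = 10 * 60;
var cdQuality = pcmSizeBytes(44100, 16, 1, tenMinutes);     // 52,920,000 bytes (about 50.5 MiB)
var speechQuality = pcmSizeBytes(16000, 16, 1, tenMinutes); // 19,200,000 bytes (about 18.3 MiB)
```

For comparison, an MP3 encoded at 64 kbit/s for the same ten minutes is 64,000 / 8 × 600 = 4,800,000 bytes.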

Step 2: Record the Audio

Start Recording: Click the Record button in the Microphone skill to start recording. You can record for as long as needed, but keep in mind that the buffer might have limitations.
Stop Recording: Once you have recorded the desired audio, click the Stop button.

Step 3: Export and Convert the Audio

Export the Recording: Use the Export to Soundboard button to save the recording to the Sound Board (EZB) skill. This will allow you to play it back, but it won't save it as an MP3 file.
Save the Audio: To save the audio as an MP3 file, you will need to use external software. You can use a program like Audacity to record directly from your computer's microphone input and export the recording as an MP3.

Step 4: Send to AI Platform

Convert to MP3: Use Audacity or another audio editing software to convert the recorded audio to an MP3 file.
Send to AI Platform: Once you have the MP3 file, you can use an API or platform-specific method to send the audio file to your AI platform for processing.

Additional Tips

Use a High-Quality Microphone: For better audio quality, use a high-quality external microphone.
Check Audio Settings: Ensure your computer's audio input settings are correctly configured to capture sound from the desired microphone.
Consider Buffer Limitations: Be aware of any buffer limitations in the Microphone skill, as it might not support very long recordings.

If you encounter any issues or need further assistance, feel free to ask!

I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.

DJ Sures

PRO

Synthiam

#2 Nov 2024 — Edited Nov 2024

That's a tricky question for @athena or me to answer because there is insufficient information. Let me explain...

1. An "AI platform" would use a specific communication protocol for its API. That could be an HTTP POST, a WebSocket, a streaming HTTP POST, etc. There are several ways.
2. An "AI platform" API would also have a specific document template or application protocol specification. This is its "form" for providing data to its server.
3. The response from the "AI platform" would use the communication protocol outlined in item 1, with the specific document template or application protocol specification outlined in item 2.
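As an illustration of such a "form": many HTTP APIs accept a JSON body, and binary audio is often embedded in it as Base64 text. The sketch below is plain JavaScript with no ARC-specific or Node-specific calls. The field names `audio` and `format` are hypothetical placeholders; the real names, and whether JSON is used at all, come from the platform's API documentation.

```javascript
// Base64 alphabet from RFC 4648.
var B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of byte values (0-255) as Base64 text.
function toBase64(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i += 3) {
    var b0 = bytes[i];
    var b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    var b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    var triple = (b0 << 16) | (b1 << 8) | b2;
    out += B64.charAt((triple >> 18) & 63);
    out += B64.charAt((triple >> 12) & 63);
    // Pad with "=" when the final group has fewer than three bytes.
    out += i + 1 < bytes.length ? B64.charAt((triple >> 6) & 63) : "=";
    out += i + 2 < bytes.length ? B64.charAt(triple & 63) : "=";
  }
  return out;
}

// Hypothetical request template: field names are placeholders,
// not any particular platform's API.
function buildAudioRequest(bytes, format) {
  return JSON.stringify({
    audio: toBase64(bytes),
    format: format
  });
}
```

Base64 makes the payload roughly 4/3 the size of the raw audio, which matters for long recordings; how the body is actually transmitted (HTTP POST, WebSocket, etc.) still depends on the platform.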

So, without knowing anything, the equivalent would be standing there yelling random sounds at someone and expecting them to do something. I can imagine it making a pretty good Monty Python skit.

The first thing you can do is research the "AI Platform's" API. After that, you'll know how to send data to it. Once you know how to send data, you'll see the response and how to parse it. At that point, you can determine how to record and include the audio in the transmission.

Sending the audio in the transmission could require a complicated script or be straightforward, but right now it's unknown.

The more detail you can share, the closer you'll get to accomplishing the goal.

Athena AI

PRO

Canada

#3 Nov 2024

Thank you for the additional context, DJ_Sures! You're absolutely right. To effectively send audio to an AI platform, we need to understand the specific requirements and protocols of that platform. Here's a more detailed approach to tackle this:

Step 1: Understand the AI Platform's API

Research the API Documentation:
- Look for the API documentation of the AI platform you intend to use. This will provide details on how to send data, including audio, to the platform.
- Identify the communication protocol (e.g., HTTP POST, WebSocket) and the data format (e.g., JSON, XML) required by the API.
Identify Audio Requirements:
- Check if the platform requires audio in a specific format (e.g., MP3, WAV) and any specific encoding settings.
- Determine if the platform supports streaming audio or if it requires a complete file upload.
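If the platform accepts streamed audio, the recording is typically sent in fixed-size pieces rather than as one upload. Below is a minimal plain-JavaScript sketch of that splitting step only. The 4096-byte chunk size is an arbitrary assumption, and sending each chunk (for example over a WebSocket) is left out because it depends entirely on the platform.

```javascript
// Split an array of bytes into consecutive chunks of at most chunkSize bytes.
// Works on plain arrays and typed arrays (both support slice()).
function splitIntoChunks(bytes, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new Error("chunkSize must be positive");
  }
  var chunks = [];
  for (var start = 0; start < bytes.length; start += chunkSize) {
    chunks.push(bytes.slice(start, start + chunkSize));
  }
  return chunks;
}

// Example: 10,000 bytes in 4096-byte chunks gives pieces of 4096, 4096 and 1808 bytes.
var sample = new Array(10000).fill(0);
var pieces = splitIntoChunks(sample, 4096);
```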

Step 2: Record and Prepare Audio

Record Audio:
- Use the Microphone skill in ARC to capture audio. As mentioned earlier, you may need to use external software like Audacity to record directly from your computer's microphone input if the Microphone skill's buffer is insufficient for long recordings.
Convert Audio:
- Convert the recorded audio to the required format using audio editing software if necessary.

Step 3: Send Audio to the AI Platform

Develop a Script:
- Write a script in ARC using JavaScript or Python to handle the HTTP POST or WebSocket communication with the AI platform.
- Use the Net.hTTPPost() function in ARC to send the audio file if the platform uses HTTP POST. You may need to include additional headers or authentication tokens as specified in the API documentation.
Handle the Response:
- Parse the response from the AI platform to extract the information or action you need.
- Implement any additional logic in ARC to process the response and trigger actions in your robot.
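Parsing the response often means decoding a JSON document and pulling out one field. A minimal plain-JavaScript sketch, assuming a hypothetical response shape with a top-level `text` field (the actual shape comes from the platform's documentation):

```javascript
// Extract the reply text from a JSON response string.
// Assumes a hypothetical shape like {"text": "..."}. Returns null if the
// body is not valid JSON or the field is missing, so the caller can react.
function extractReplyText(responseBody) {
  var parsed;
  try {
    parsed = JSON.parse(responseBody);
  } catch (e) {
    return null; // Not JSON, e.g. an HTML error page
  }
  if (parsed && typeof parsed.text === "string") {
    return parsed.text;
  }
  return null;
}
```

In ARC, a non-null result could then be handed to `Audio.say()` so the robot speaks it.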

Step 4: Test and Iterate

Test the Integration:
- Run tests to ensure that the audio is being sent correctly and that the AI platform is responding as expected.
Iterate and Improve:
- Make any necessary adjustments to the script or audio processing based on the test results.

If you can provide more details about the AI platform you're using, I can offer more specific guidance on how to integrate it with ARC. Feel free to ask if you have any questions or need further assistance!


afcorson

PRO

Australia

#4 Nov 2024

OK. I have done some more thinking on this. I can easily use Windows Sound Recorder to capture a person speaking for an extended period. I then use something like Descript.com to convert the audio to text, and send the text to https://deepai.org/chat/debate to get an insightful response. Then I want to post this text response somewhere on the internet so my robot can read and speak it. This is where I am stuck, as HTTPGet("https://rur.free.nf/dialogue_script.txt") does not retrieve the file content. Bear in mind that only the file reading and speaking are performed on the robot's PC; everything before that is performed on a separate laptop.

Synthiam Support

Canada

#5 Nov 2024

It looks like DJ has some insight, and you've since expanded on the details of the use case. I'm sure when DJ looks at this, he'll have a proposed solution or idea for you.

Because the Microphone robot skill does not save the audio to the drive as a WAV or MP3 file, it can't be used as Athena suggested.

Third-party audio recording software would be ideal. One option is a command-line audio recorder, which can be launched from JavaScript in ARC. Ideally, use one that accepts a recording-length parameter so it stops after a specified time.

Then that file could be read as binary into JavaScript in ARC and sent to the server using the appropriate HTTP GET or HTTP POST.
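As one concrete possibility (not named in this thread), the free ffmpeg tool can record from a Windows microphone through its DirectShow (`dshow`) input and stop after a fixed duration with `-t`. The sketch below only assembles the command line in plain JavaScript. The device name and output path are hypothetical placeholders, it assumes an ffmpeg build that includes an MP3 encoder, and how the command is launched from ARC is left out.

```javascript
// Build an ffmpeg command line that records from a DirectShow audio device
// for a fixed number of seconds and writes an MP3 file.
// Device name and output path are hypothetical; list the real device names with:
//   ffmpeg -list_devices true -f dshow -i dummy
function buildFfmpegRecordCommand(deviceName, seconds, outputPath) {
  if (!(seconds > 0)) {
    throw new Error("seconds must be a positive number");
  }
  return [
    "ffmpeg",
    "-y",                                // overwrite the output file if it exists
    "-f", "dshow",                       // Windows DirectShow capture
    "-i", "audio=\"" + deviceName + "\"",
    "-t", String(seconds),               // stop recording after this many seconds
    "\"" + outputPath + "\""
  ].join(" ");
}

// Ten-minute recording (600 seconds), matching the original question.
var cmd = buildFfmpegRecordCommand("Microphone (USB Audio)", 600, "C:\\recordings\\speaker.mp3");
// cmd is: ffmpeg -y -f dshow -i audio="Microphone (USB Audio)" -t 600 "C:\recordings\speaker.mp3"
```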

@athena do you have any insight while we wait for dj to chime in?

Athena AI

PRO

Canada

#6 Nov 2024

Certainly! Here's a step-by-step approach to achieve your goal using ARC and external tools:

Step 1: Record Audio

Use Windows Sound Recorder: As you've mentioned, you can use Windows Sound Recorder to capture the audio. Ensure the audio is saved in a format that can be easily processed by Descript.com.

Step 2: Convert Audio to Text

Use Descript.com: Upload the recorded audio to Descript.com to convert it into text. Save the text file on your laptop.

Step 3: Send Text to DeepAI

Send Text to DeepAI: Use the DeepAI API to send the text and receive a response. Ensure you have the necessary API keys and follow the API documentation for sending requests.

Step 4: Post Text Response Online

Host the Text File: Use a web hosting service to upload the text response. Ensure the file is publicly accessible via a URL.

Step 5: Retrieve and Speak Text on Robot's PC

Use HTTPGet in ARC:
- Ensure the URL is correct and publicly accessible. The URL should point directly to the text file.
- Use the Net.hTTPGet() function in ARC to retrieve the content of the text file. Here's a sample JavaScript code snippet to do this:
```
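// URL of the publicly hosted text file
var url = "https://rur.free.nf/dialogue_script.txt";
// Net.hTTPGet() returns the body of the HTTP response as a string
var response = Net.hTTPGet(url);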

if (response) {
    // Use the Audio.say() function to speak the text
    Audio.say(response);
} else {
    print("Failed to retrieve the text file.");
}
```
Check File Accessibility: Ensure the file is accessible without any authentication or restrictions. You can test this by opening the URL in a web browser.

Additional Tips

Debugging: If Net.hTTPGet() is not retrieving the content, check for any network issues or URL errors. Ensure the file is hosted correctly and accessible.
Error Handling: Implement error handling in your script to manage any issues with file retrieval or text-to-speech conversion.

If you encounter any issues or need further assistance, feel free to ask!


afcorson

PRO

Australia

#7 Nov 2024 — Edited Nov 2024

As background, my motivation for doing this came from an AI debate I attended, which had four speakers and Pepper. Two spoke in favour of AI development and two against it. Pepper listened to each speaker, then gave a thoughtful, debate-style response directed at each of them. Very impressive, as Pepper was not given any script.

I still can't get HTTPGet("https://www.rur.free.nf/dialogue_script.txt") to return anything, even though the Chromium window retrieves the content fine (see image). The HTTPGet command returns a message about JavaScript that makes no sense (see the Script 3 window). It would be useful if I could use the content in the Chromium window, but the available control commands don't allow it.

DJ Sures

PRO

Synthiam

#8 Nov 2024 — Edited Nov 2024

The website you're using has protection that prevents "bots" from crawling it and accessing content such as the script.txt file. The JavaScript response you are receiving is this:

```
function toNumbers(d){var e=[];d.replace(/(..)/g,function(d){e.push(parseInt(d,16))});return e}function toHex(){for(var d=[],d=1==arguments.length&&arguments[0].constructor==Array?arguments[0]:arguments,e="",f=0;f<d.length;f++)e+=(16>d[f]?"0":"")+d[f].toString(16);return e.toLowerCase()}var a=toNumbers("f655ba9d09a112d4968c63579db590b4"),b=toNumbers("98344c2eee86c3994890592585b49f80"),c=toNumbers("378e4fc764461e537b807d8286002d4f");document.cookie="__test="+toHex(slowAES.decrypt(c,2,a,b))+"; expires=Thu, 31-Dec-37 23:55:55 GMT; path=/"; location.href="https://www.rur.free.nf/dialogue_script.txt?i=1";This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support
```

As you can see, the message reads "This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support".

That server is filtering out bots. When you interact with a website programmatically, you're a bot. If that website has an API, the API is what's meant for bots and programming. My guess is that when the page is requested, the server responds with a JavaScript program (which sets a cookie and then redirects, as in the code above) rather than with the content itself.
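A script can at least recognise this situation instead of speaking the challenge code aloud. The sketch below is plain JavaScript; its marker strings are copied from the response quoted above, so it only detects this particular host's challenge page and is not a general bot-protection detector. It does not get past the protection.

```javascript
// Returns true if an HTTP response body looks like the JavaScript
// "bot check" page quoted above rather than the real file content.
// Marker strings are copied from that page; other hosts will differ.
function isJavascriptChallenge(body) {
  if (typeof body !== "string") {
    return false;
  }
  var markers = [
    "This site requires Javascript to work",
    "slowAES.decrypt",
    "document.cookie=\"__test="
  ];
  for (var i = 0; i < markers.length; i++) {
    if (body.indexOf(markers[i]) !== -1) {
      return true;
    }
  }
  return false;
}
```

In an ARC script, the string returned by `Net.hTTPGet()` could be run through this check before `Audio.say()`, printing a warning instead of speaking when it returns true.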

Let me see what I can do about the web browser robot skill. There might be a way to put the content into a variable to use.

afcorson