Australia
Asked

Recording Audio To Send To AI Platform

How can I record audio from the microphone for an extended period, such as up to 10 minutes? I want to capture a person speaking, save the audio as an MP3 file on my hard drive, and then send it to an AI platform for a response. I have attempted to use the Microphone skill in Synthiam ARC, but it failed to record anything and does not save the audio to my hard drive.

Could you provide guidance on how to achieve this functionality?


Related Hardware EZ-B v4
Related Control Microphone


PRO
Canada
#1  

The Microphone skill in ARC doesn't directly support saving audio to your hard drive as an MP3 file, so recording for an extended period and ending up with an MP3 requires a workaround: record the audio, then use additional software to convert and save it. Here's a step-by-step guide:

Step 1: Configure the Microphone Skill

  1. Add the Microphone Skill: Ensure the Microphone skill is added to your ARC project. You can do this by navigating to Project -> Add Skill -> Audio -> Microphone.

  2. Configure the Microphone:

    • Open the settings of the Microphone skill.
    • Set the Sample Rate and Sample Size according to your needs. A higher sample rate will give you better quality but will also result in larger file sizes.
    • Enable Auto Record if you want the recording to start automatically when sound is detected.
  3. Test the Microphone: Make sure your microphone is working correctly by checking the audio waveform in the Microphone skill's main window.
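
To make the sample rate versus file size trade-off concrete, here is a rough size estimate for a 10 minute recording. The sample rates, bit depth, and MP3 bitrate below are illustrative assumptions, not values taken from the Microphone skill.

```javascript
// Uncompressed PCM size: samples/sec * bytes/sample * channels * seconds.
function pcmBytes(sampleRateHz, bitsPerSample, channels, seconds) {
  return sampleRateHz * (bitsPerSample / 8) * channels * seconds;
}

// Constant-bitrate MP3 size: kilobits/sec converted to bytes/sec, times seconds.
function mp3Bytes(bitrateKbps, seconds) {
  return (bitrateKbps * 1000 / 8) * seconds;
}

var tenMinutes = 10 * 60; // 600 seconds

var cdQualityMono = pcmBytes(44100, 16, 1, tenMinutes); // 52,920,000 bytes
var speechMono = pcmBytes(16000, 16, 1, tenMinutes);    // 19,200,000 bytes
var mp3At128 = mp3Bytes(128, tenMinutes);               //  9,600,000 bytes
```

So a 10 minute mono recording is roughly 53 MB at 44.1 kHz and 19 MB at 16 kHz before compression, which is why the format and sample rate matter before uploading anywhere.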

Step 2: Record the Audio

  1. Start Recording: Click the Record button in the Microphone skill to start recording. You can record for as long as needed, but keep in mind that the buffer might have limitations.

  2. Stop Recording: Once you have recorded the desired audio, click the Stop button.

Step 3: Export and Convert the Audio

  1. Export the Recording: Use the Export to Soundboard button to save the recording to the Sound Board (EZB) skill. This will allow you to play it back, but it won't save it as an MP3 file.

  2. Save the Audio: To save the audio as an MP3 file, you will need to use external software. You can use a program like Audacity to record directly from your computer's audio output or convert the saved file from the Sound Board.

Step 4: Send to AI Platform

  1. Convert to MP3: Use Audacity or another audio editing software to convert the recorded audio to an MP3 file.

  2. Send to AI Platform: Once you have the MP3 file, you can use an API or platform-specific method to send the audio file to your AI platform for processing.

Additional Tips

  • Use a High-Quality Microphone: For better audio quality, use a high-quality external microphone.
  • Check Audio Settings: Ensure your computer's audio input settings are correctly configured to capture sound from the desired microphone.
  • Consider Buffer Limitations: Be aware of any buffer limitations in the Microphone skill, as it might not support very long recordings.

If you encounter any issues or need further assistance, feel free to ask!


I am a customer support GPT-powered AI and still learning. If you have doubts about my response, please use the support documentation to verify my answer.
PRO
Synthiam
#2   — Edited

That's a tricky question for @athena or me to answer because there is insufficient information. Let me explain...

  1. An "AI platform" would use a specific communication protocol for its API. That could be an HTTP post, a WebSocket, a streaming HTTP post, etc. There are several ways.

  2. An "AI Platform" API would also have a specific document template or application protocol specification. This is their "form" for providing data to their server.

  3. The response from the "AI Platform" would arrive over the communication protocol outlined in #1, using a specific document template or application protocol specification as outlined in #2.

So, without knowing anything, the equivalent would be standing and yelling random sounds at someone, expecting them to do something. I can imagine that making a pretty good Monty Python skit. :D

The first thing you can do is research the "AI Platform's" API. After that, you'll know how to send data to it. Once you know how to send data, you'll see the response and how to parse it. At that point, you can determine how to record and include the audio in the transmission.
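
As a rough illustration of what "protocol plus document template" means in practice, here is a sketch of a request and a response parser. Everything specific in it is hypothetical: the endpoint URL, the header scheme, and the `input` / `output` field names are placeholders that would come from the real platform's API documentation.

```javascript
// Build a request "form". Every field name and the URL are assumptions.
function buildRequest(apiKey, transcript) {
  return {
    url: "https://api.example.com/v1/respond", // placeholder endpoint
    method: "POST",                             // protocol from point #1
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + apiKey       // assumed auth scheme
    },
    body: JSON.stringify({ input: transcript }) // template from point #2
  };
}

// Parse a reply that is assumed to look like {"output": "..."}.
// Returns the reply text, or null if the body is not in that shape.
function parseReply(rawBody) {
  var doc;
  try {
    doc = JSON.parse(rawBody);
  } catch (e) {
    return null; // not JSON at all, e.g. an HTML error page
  }
  return (doc && typeof doc.output === "string") ? doc.output : null;
}
```

Until the real API is known, only the shape of this code is meaningful: one function that fills in the platform's form, and one that reads the platform's answer.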

Sending the audio in the transmission could become a complicated script or be straightforward, but right now, it's unknown.

The more detail you can share, the closer you'll get to accomplishing the goal. :D

PRO
Canada
#3  

Thank you for the additional context, DJ_Sures! You're absolutely right. To effectively send audio to an AI platform, we need to understand the specific requirements and protocols of that platform. Here's a more detailed approach to tackle this:

Step 1: Understand the AI Platform's API

  1. Research the API Documentation:

    • Look for the API documentation of the AI platform you intend to use. This will provide details on how to send data, including audio, to the platform.
    • Identify the communication protocol (e.g., HTTP POST, WebSocket) and the data format (e.g., JSON, XML) required by the API.
  2. Identify Audio Requirements:

    • Check if the platform requires audio in a specific format (e.g., MP3, WAV) and any specific encoding settings.
    • Determine if the platform supports streaming audio or if it requires a complete file upload.

Step 2: Record and Prepare Audio

  1. Record Audio:

    • Use the Microphone skill in ARC to capture audio. As mentioned earlier, you may need to use external software like Audacity to record directly from your computer's audio output if the Microphone skill's buffer is insufficient for long recordings.
  2. Convert Audio:

    • Convert the recorded audio to the required format using audio editing software if necessary.

Step 3: Send Audio to the AI Platform

  1. Develop a Script:

    • Write a script in ARC using JavaScript or Python to handle the HTTP POST or WebSocket communication with the AI platform.
    • Use the Net.hTTPPost() function in ARC to send the audio file if the platform uses HTTP POST. You may need to include additional headers or authentication tokens as specified in the API documentation.
  2. Handle the Response:

    • Parse the response from the AI platform to extract the information or action you need.
    • Implement any additional logic in ARC to process the response and trigger actions in your robot.

Step 4: Test and Iterate

  • Test the Integration:
    • Run tests to ensure that the audio is being sent correctly and that the AI platform is responding as expected.
  • Iterate and Improve:
    • Make any necessary adjustments to the script or audio processing based on the test results.

If you can provide more details about the AI platform you're using, I can offer more specific guidance on how to integrate it with ARC. Feel free to ask if you have any questions or need further assistance!


PRO
Australia
#4  

Ok. I have done some more thinking on this. I can use Windows Sound Recorder easily enough to capture a person speaking for an extended period. I then use something like Descript.com to convert the audio to text. I then send the text to https://deepai.org/chat/debate to get an insightful response. Then I want to post this text response somewhere on the internet so my robot can read and speak it. This is where I am stuck, as HTTPget("https://rur.free.nf/dialogue_script.txt") does not retrieve the file content. Bear in mind that only the file read/speaking is performed on the robot's PC. Everything else prior to that is performed on a separate laptop.

#5  

It looks like DJ had some insight, and you've since expanded on the details of the use case. I'm sure when DJ looks at this, he'll have a proposed solution/idea for you.

Because the Microphone robot skill does not save the audio to the drive as a WAV or MP3, it can't be used as Athena suggested.

A third-party audio recording program would be ideal. One way would be to use a command-line audio recorder, which can be launched from JavaScript in ARC. Ideally, pick one that accepts a recording length as a parameter, so it stops after a specified time.

Then, that file could be read as binary into JavaScript in ARC and sent to the server using the appropriate HTTP GET or HTTP POST.
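
As one possible sketch of the command-line recorder idea, the helper below builds an ffmpeg command that records from a Windows audio device for a fixed number of seconds. ffmpeg is an assumption here (the thread does not name a recorder), and the device name and output path are hypothetical. How the command gets launched from ARC is deliberately left out, since that depends on what the scripting environment provides.

```javascript
// Build an ffmpeg command line that records for a fixed duration.
// -y        overwrite the output file if it already exists
// -f dshow  capture through DirectShow (Windows)
// -i        the capture device, given as audio="<device name>"
// -t        stop after this many seconds
function buildRecordCommand(deviceName, seconds, outputPath) {
  return [
    "ffmpeg",
    "-y",
    "-f", "dshow",
    "-i", 'audio="' + deviceName + '"',
    "-t", String(seconds),
    '"' + outputPath + '"'
  ].join(" ");
}

// Hypothetical device name and path; record 10 minutes to an MP3 file.
var command = buildRecordCommand(
  "Microphone (USB Audio)", 10 * 60, "C:\\temp\\speech.mp3");
```

Writing MP3 this way assumes the ffmpeg build includes an MP3 encoder. The available device names can be listed with `ffmpeg -list_devices true -f dshow -i dummy`.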

@athena do you have any insight while we wait for dj to chime in?

PRO
Canada
#6  

Certainly! Here's a step-by-step approach to achieve your goal using ARC and external tools:

Step 1: Record Audio

  1. Use Windows Sound Recorder: As you've mentioned, you can use Windows Sound Recorder to capture the audio. Ensure the audio is saved in a format that can be easily processed by Descript.com.

Step 2: Convert Audio to Text

  1. Use Descript.com: Upload the recorded audio to Descript.com to convert it into text. Save the text file on your laptop.

Step 3: Send Text to DeepAI

  1. Send Text to DeepAI: Use the DeepAI API to send the text and receive a response. Ensure you have the necessary API keys and follow the API documentation for sending requests.

Step 4: Post Text Response Online

  1. Host the Text File: Use a web hosting service to upload the text response. Ensure the file is publicly accessible via a URL.

Step 5: Retrieve and Speak Text on Robot's PC

  1. Use HTTPGet in ARC:

    • Ensure the URL is correct and publicly accessible. The URL should point directly to the text file.
    • Use the Net.hTTPGet() function in ARC to retrieve the content of the text file. Here's a sample JavaScript code snippet to do this:
    var url = "https://rur.free.nf/dialogue_script.txt";
    var response = Net.hTTPGet(url);
    
    if (response) {
        // Use the Audio.say() function to speak the text
        Audio.say(response);
    } else {
        print("Failed to retrieve the text file.");
    }
    
  2. Check File Accessibility: Ensure the file is accessible without any authentication or restrictions. You can test this by opening the URL in a web browser.

Additional Tips

  • Debugging: If Net.hTTPGet() is not retrieving the content, check for any network issues or URL errors. Ensure the file is hosted correctly and accessible.
  • Error Handling: Implement error handling in your script to manage any issues with file retrieval or text-to-speech conversion.
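
The error handling tip could be sketched as a small retry wrapper. The fetch function is passed in as a parameter so the helper itself does not depend on ARC. In an ARC script you would presumably pass something like `function (u) { return Net.hTTPGet(u); }`, but the retry count and the "empty string means failure" rule are assumptions.

```javascript
// Call fetchFn(url) up to `attempts` times.
// Returns the first non-empty string result, or null if every attempt
// either throws or returns something empty.
function fetchWithRetry(fetchFn, url, attempts) {
  for (var i = 0; i < attempts; i++) {
    try {
      var body = fetchFn(url);
      if (typeof body === "string" && body.length > 0) {
        return body;
      }
    } catch (e) {
      // Swallow the error and fall through to the next attempt.
    }
  }
  return null;
}
```

A `null` result then becomes the single place where the script decides what to do on failure, for example speaking an apology instead of the text.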

If you encounter any issues or need further assistance, feel free to ask!


PRO
Australia
#7   — Edited

As background, my motivation for doing this came from an AI debate I attended which had four speakers and Pepper. Two spoke in favour of AI development and two against it. Pepper listened to each speaker, then provided a thoughtful debate-type response directed at each of them. Very impressive, as there was no scripting provided to Pepper.

I still can't get HTTPGet("https://www.rur.free.nf/dialogue_script.txt") to return anything, even though the Chromium window retrieves the content ok - see image. The HTTPGet command returns a message about JavaScript which makes no sense (see Script 3 window). It would be useful if I could make use of the content in the Chromium window, but the available control commands don't allow it.

User-inserted image

PRO
Synthiam
#8   — Edited

The website that you're using has protection to prevent "bots" from crawling it and accessing the content, such as the script.txt. The JavaScript error you are receiving is this:

function toNumbers(d){var e=[];d.replace(/(..)/g,function(d){e.push(parseInt(d,16))});return e}function toHex(){for(var d=[],d=1==arguments.length&&arguments[0].constructor==Array?arguments[0]:arguments,e="",f=0;f<d.length;f++)e+=(16>d[f]?"0":"")+d[f].toString(16);return e.toLowerCase()}var a=toNumbers("f655ba9d09a112d4968c63579db590b4"),b=toNumbers("98344c2eee86c3994890592585b49f80"),c=toNumbers("378e4fc764461e537b807d8286002d4f");document.cookie="__test="+toHex(slowAES.decrypt(c,2,a,b))+"; expires=Thu, 31-Dec-37 23:55:55 GMT; path=/"; location.href="https://www.rur.free.nf/dialogue_script.txt?i=1";This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support


User-inserted image

As you can see the error reads "This site requires Javascript to work, please enable Javascript in your browser or use a browser with Javascript support"

That server is filtering out bots. When you programmatically interact with a website, you're a bot. If that website has an API, that'll be meant for bots and programming. I guess what's happening is the browser requests the page, and the server feeds the page via a javascript program rather than responding with the content.
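
A script can at least notice when it has been handed the challenge page instead of the real file. The check below is a heuristic sketch based only on the response quoted above: it looks for the "requires Javascript" message and the cookie-setting code. Other hosts would likely use different markers.

```javascript
// Return true if the body looks like the anti-bot challenge page
// rather than the expected text file.
function looksLikeBotChallenge(body) {
  if (typeof body !== "string") {
    return false;
  }
  return body.indexOf("This site requires Javascript to work") !== -1 ||
         body.indexOf("document.cookie") !== -1;
}
```

With this in place, a script can refuse to speak the response when the check returns true, instead of reading the challenge code aloud.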

Let me see what I can do about the web browser robot skill. There might be a way to throw the content into a variable to use.

PRO
Australia
#9  

Thanks for the update to the Chromium skill with the global content variable. This works well and allows me to retrieve and speak the text. The value of the Chromium content variable could get quite large if the person speaks for 5 minutes or so. I will have to try out the whole process by recording a live speaker.
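
If the size of the retrieved text does become a problem, one option is to split it into sentence-sized chunks before speaking. This is only a sketch: the 200 character limit in the usage line is an arbitrary assumption, and whether chunking is needed at all depends on how the speech call handles long strings.

```javascript
// Split text into chunks of whole sentences, each at most maxLen
// characters where possible. A single sentence longer than maxLen
// is kept intact as its own (oversized) chunk.
function splitIntoChunks(text, maxLen) {
  var sentences = text.match(/[^.!?]+[.!?]*\s*/g) || [];
  var chunks = [];
  var current = "";
  for (var i = 0; i < sentences.length; i++) {
    var sentence = sentences[i];
    if (current.length > 0 && current.length + sentence.length > maxLen) {
      chunks.push(current.trim());
      current = "";
    }
    current += sentence;
  }
  if (current.trim().length > 0) {
    chunks.push(current.trim());
  }
  return chunks;
}

var parts = splitIntoChunks("One. Two. Three.", 10);
// parts is ["One. Two.", "Three."]
```

For real speech text, something like `splitIntoChunks(content, 200)` would give pieces that could each be passed to Audio.say() in turn.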

PRO
Synthiam
#10  

Yeah that's true. Another option would be to have a robot skill that records audio for a long time and writes to a file to send to your ai platform.

What's the process you are currently using to generate the file and send it to the AI platform? If I have some time, maybe I can think of a robot skill that could be useful for stuff like this, instead of using Advanced Bing Speech Recognition, I guess. I think the biggest challenge you're up against is how to perform speech to text on a super long audio recording.

PRO
Australia
#11  

There seem to be numerous online audio-to-text converters out there. The one I tried was very good (https://transcribetotext.ai), but I need to record a 5 min speech and thoroughly test this site. I'll try to do this in the next day or so and report back. I'll send the text to https://deepai.org/chat/debate and see what it responds with.

PRO
Australia
#12  

I recorded a 6 min speech today, used transcribetotext.ai to convert it to text, and sent the text to deepai.org/chat/debate for a response. Within a few seconds a meaningful response came back which my robot could speak. So the process is feasible. I just need to streamline the process and test it in a live environment.

PRO
Synthiam
#13  

I'm not sure if that can be automated. They don't have an API for performing that; their API only seems to cover vision. Here's a screenshot of what I saw on their website for the list of APIs:

User-inserted image

PRO
Australia
#14  

You are right there. But at the rate AI is developing, I am sure someone will eventually provide an API for audio to text and debate responses.

PRO
Synthiam
#15  

Won't ChatGPT do a debate response? If you instruct the personality for it, it should.

Then we just need to figure out how to run speech to text on a long recording from an audio file.

PRO
Australia
#16  

I haven't tried ChatGPT for a debate type response. I'll give it a go and compare the results.

PRO
Australia
#17  

I can get a reasonably good debate response from ChatGPT using a prompt such as this...

I want you to act as a debater. I will provide you with one person’s discussion on a topic and your task is to research both sides of the topic, present valid arguments for each side, refute opposing points of view, and draw persuasive conclusions based on evidence. Your goal is to help people come away from the discussion with increased knowledge and insight into the topic at hand. You may address the speaker by name. Here is the first speaker.

At this point, the text transcript of the speaker's talk needs to be appended.
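
Appending the transcript could be sketched like this. The instruction text is the prompt quoted above; the function name, the speaker label format, and the blank-line separator are assumptions for illustration.

```javascript
// The debate instructions from the prompt above, kept as one string.
var DEBATE_INSTRUCTIONS =
  "I want you to act as a debater. I will provide you with one person’s " +
  "discussion on a topic and your task is to research both sides of the " +
  "topic, present valid arguments for each side, refute opposing points " +
  "of view, and draw persuasive conclusions based on evidence. Your goal " +
  "is to help people come away from the discussion with increased " +
  "knowledge and insight into the topic at hand. You may address the " +
  "speaker by name. Here is the first speaker.";

// Combine the instructions with one speaker's transcribed talk.
function buildDebatePrompt(speakerName, transcript) {
  return DEBATE_INSTRUCTIONS + "\n\n" + speakerName + ": " + transcript.trim();
}

// Hypothetical speaker and transcript text.
var prompt = buildDebatePrompt("Alice", "  AI development should be paused.  ");
```

The resulting string is what would be sent to the chat model, once for each speaker in the debate.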