
Capturing microphone and speaker audio in Python and transcribing both with Baidu real-time speech-to-text

    # [1] Import the required modules and configure the Baidu SDK
    import queue
    import sys
    import time

    import numpy as np
    import sounddevice as sd
    from aip import AipSpeech

    # Baidu Cloud credentials
    APP_ID = ''      # replace with your actual App ID
    API_KEY = ''     # replace with your actual API Key
    SECRET_KEY = ''  # replace with your actual Secret Key
    client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

    # Queues that hold the recorded audio blocks
    audio_queue = queue.Queue()
    speaker_queue = queue.Queue()


    # Callback that receives audio blocks from the microphone.
    # The third argument is named time_info so it does not shadow the time module.
    def audio_callback(indata, frames, time_info, status):
        if status:
            print(status, file=sys.stderr)
        audio_queue.put(indata.copy())


    # Callback that receives audio blocks from the speaker capture device
    def speaker_callback(indata, frames, time_info, status):
        if status:
            print(status, file=sys.stderr)
        speaker_queue.put(indata.copy())


    # [2] Real-time speech recognizer class
    class RealTimeSpeechRecognizer:
        def __init__(self, client, name):
            self.client = client
            self.name = name

        def send_audio(self, audio_data):
            # 16 kHz PCM audio; dev_pid 1537 selects the Mandarin model
            result = self.client.asr(audio_data, 'pcm', 16000, {
                'dev_pid': 1537,
            })
            if result.get('err_no') == 0:
                print(f"{self.name} result: {result['result']}")
            else:
                # Use .get() so a response without err_msg does not raise KeyError
                print(f"{self.name} error: {result.get('err_msg')}")


    # Call Baidu's speech-to-text API
    def recognize_speech(audio_data, recognizer):
        audio_data = np.concatenate(audio_data)
        recognizer.send_audio(audio_data.tobytes())


    # [3] Open the audio streams and process the audio data
    def start_audio_stream(mic_recognizer, speaker_recognizer, speaker_device_index):
        with sd.InputStream(callback=audio_callback, channels=1, samplerate=16000,
                            dtype='int16') as mic_stream, \
             sd.InputStream(callback=speaker_callback, channels=1, samplerate=16000,
                            dtype='int16', device=speaker_device_index) as spk_stream:
            print("Recording audio... Press Ctrl+C to stop.")
            mic_audio_buffer = []
            speaker_audio_buffer = []
            try:
                while True:
                    # Move every waiting block from the queues into the buffers
                    while not audio_queue.empty():
                        mic_audio_buffer.append(audio_queue.get())
                    while not speaker_queue.empty():
                        speaker_audio_buffer.append(speaker_queue.get())
                    # Send a buffer for recognition once it holds at least 10 blocks
                    if len(mic_audio_buffer) >= 10:
                        recognize_speech(mic_audio_buffer, mic_recognizer)
                        mic_audio_buffer = []  # clear the buffer after sending
                    if len(speaker_audio_buffer) >= 10:
                        recognize_speech(speaker_audio_buffer, speaker_recognizer)
                        speaker_audio_buffer = []  # clear the buffer after sending
                    time.sleep(0.1)
            except KeyboardInterrupt:
                print("Stopping audio recording.")


    # [4] Program entry point
    if __name__ == "__main__":
        # Use the pulse device (index 8 on the author's machine) to capture speaker
        # output; device indices differ from machine to machine.
        speaker_device_index = 8
        mic_recognizer = RealTimeSpeechRecognizer(client, "Microphone")
        speaker_recognizer = RealTimeSpeechRecognizer(client, "Speaker")
        start_audio_stream(mic_recognizer, speaker_recognizer, speaker_device_index)

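I. Scenarios for real-time short-speech recognition

Some applications need to capture audio from the microphone and the speaker at the same time. A few examples follow.

1. Real-time translation and meeting transcription

In a meeting or conversation, you may need to capture:

  • Microphone audio: the local speaker's voice, for real-time transcription.
  • Speaker audio: the other party's voice as played through the speaker, for real-time transcription.

Transcribing both sides at once gives a complete record of the conversation.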

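2. Language learning and teaching

In a language-learning application, the teacher may play audio material while the student answers through the microphone. Capturing both streams gives you:

  • Microphone audio: the student's answers and speech.
  • Speaker audio: the audio material the teacher plays.

With both streams transcribed, the student's speech can be analyzed and assessed against the material that was played.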

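3. Voice control systems

A voice control system may need to capture:

  • Microphone audio: the user's voice commands.
  • Speaker audio: the feedback the system plays, to confirm that the feedback was actually played correctly.

This helps verify that the system responded correctly to the user's command.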

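4. Conference call recording

In a conference call, you may need to capture:

  • Microphone audio: the local participant's voice.
  • Speaker audio: the remote participants' voices as played through the speaker.

Together, the two streams record the whole call.

For these scenarios, my code captures microphone and speaker audio at the same time and transcribes each stream separately.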

II. Implementation steps

1. Import the required modules and configure the Baidu SDK

    import queue
    import sys
    import time

    import numpy as np
    import sounddevice as sd
    from aip import AipSpeech

    # Baidu Cloud credentials
    APP_ID = 'your App ID'          # replace with your actual App ID
    API_KEY = 'your Api Key'        # replace with your actual API Key
    SECRET_KEY = 'your Secret Key'  # replace with your actual Secret Key
    client = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

    # Queues that hold the recorded audio blocks
    audio_queue = queue.Queue()
    speaker_queue = queue.Queue()


    # Callback that receives audio blocks from the microphone.
    # The third argument is named time_info so it does not shadow the time module.
    def audio_callback(indata, frames, time_info, status):
        if status:
            print(status, file=sys.stderr)
        audio_queue.put(indata.copy())


    # Callback that receives audio blocks from the speaker capture device
    def speaker_callback(indata, frames, time_info, status):
        if status:
            print(status, file=sys.stderr)
        speaker_queue.put(indata.copy())
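The callbacks above act as producers and the main loop acts as the consumer, with a `queue.Queue` in between. The sketch below shows that hand-off without any audio hardware. `make_callback`, `drain` and `simulate_stream` are hypothetical helpers written for this illustration; they are not part of sounddevice, and the fake stream only imitates how a callback gets called from another thread.

```python
import queue
import threading


def make_callback(target_queue):
    """Build a callback with the (indata, frames, time_info, status) signature."""
    def callback(indata, frames, time_info, status):
        # Copy the block before queueing it, because the caller may reuse its buffer.
        target_queue.put(bytes(indata))
    return callback


def drain(source_queue):
    """Collect every block currently waiting in the queue without blocking."""
    blocks = []
    while True:
        try:
            blocks.append(source_queue.get_nowait())
        except queue.Empty:
            return blocks


def simulate_stream(callback, n_blocks, frames_per_block=4):
    """Hypothetical stand-in for an audio stream: invoke the callback from a worker thread."""
    def run():
        for i in range(n_blocks):
            # 2 bytes per int16 sample, so each block is frames_per_block * 2 bytes long
            block = bytearray([i % 256]) * (frames_per_block * 2)
            callback(block, frames_per_block, None, None)
    worker = threading.Thread(target=run)
    worker.start()
    worker.join()


mic_queue = queue.Queue()
simulate_stream(make_callback(mic_queue), n_blocks=3)
blocks = drain(mic_queue)
print(len(blocks), len(blocks[0]))  # prints: 3 8
```

Using `get_nowait()` with `queue.Empty` is one way to drain a queue; the `empty()` and `get()` pair used in this article behaves the same as long as only one thread consumes from each queue.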

2. Implement the real-time speech recognizer class

    class RealTimeSpeechRecognizer:
        def __init__(self, client, name):
            self.client = client
            self.name = name

        def send_audio(self, audio_data):
            # 16 kHz PCM audio; dev_pid 1537 selects the Mandarin model
            result = self.client.asr(audio_data, 'pcm', 16000, {
                'dev_pid': 1537,
            })
            if result.get('err_no') == 0:
                print(f"{self.name} result: {result['result']}")
            else:
                # Use .get() so a response without err_msg does not raise KeyError
                print(f"{self.name} error: {result.get('err_msg')}")


    # Call Baidu's speech-to-text API
    def recognize_speech(audio_data, recognizer):
        # Join the queued int16 blocks into one array and send it as raw PCM bytes
        audio_data = np.concatenate(audio_data)
        recognizer.send_audio(audio_data.tobytes())
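The response handling in `send_audio` can be pulled out into a small function, which makes it testable without network access. This is only a sketch: `format_asr_result` is a hypothetical helper, it relies only on the `err_no`, `err_msg` and `result` keys that the code above already reads, and it assumes `result` is a list of candidate strings. The two sample dictionaries are made up for illustration.

```python
def format_asr_result(name, result):
    """Turn a response dict into one printable line without raising KeyError."""
    if result.get('err_no') == 0:
        candidates = result.get('result', [])
        # Assumption: 'result' is a list of candidate strings; accept a bare string too.
        if isinstance(candidates, str):
            candidates = [candidates]
        return f"{name} result: {' '.join(candidates)}"
    err_no = result.get('err_no', 'unknown')
    err_msg = result.get('err_msg', 'no error message')
    return f"{name} error {err_no}: {err_msg}"


# Illustrative responses, not captured from a real API call
ok = format_asr_result("Microphone", {'err_no': 0, 'result': ['hello']})
bad = format_asr_result("Speaker", {'err_no': 3301, 'err_msg': 'speech quality error.'})
print(ok)   # prints: Microphone result: hello
print(bad)  # prints: Speaker error 3301: speech quality error.
```

Returning a string instead of printing keeps the formatting separate from the network call, so the same function could feed a log file or a user interface.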

3. Open the audio streams and process the audio data

    def start_audio_stream(mic_recognizer, speaker_recognizer, speaker_device_index):
        with sd.InputStream(callback=audio_callback, channels=1, samplerate=16000,
                            dtype='int16') as mic_stream, \
             sd.InputStream(callback=speaker_callback, channels=1, samplerate=16000,
                            dtype='int16', device=speaker_device_index) as spk_stream:
            print("Recording audio... Press Ctrl+C to stop.")
            mic_audio_buffer = []
            speaker_audio_buffer = []
            try:
                while True:
                    # Move every waiting block from the queues into the buffers
                    while not audio_queue.empty():
                        mic_audio_buffer.append(audio_queue.get())
                    while not speaker_queue.empty():
                        speaker_audio_buffer.append(speaker_queue.get())
                    # Send a buffer for recognition once it holds at least 10 blocks
                    if len(mic_audio_buffer) >= 10:
                        recognize_speech(mic_audio_buffer, mic_recognizer)
                        mic_audio_buffer = []  # clear the buffer after sending
                    if len(speaker_audio_buffer) >= 10:
                        recognize_speech(speaker_audio_buffer, speaker_recognizer)
                        speaker_audio_buffer = []  # clear the buffer after sending
                    time.sleep(0.1)
            except KeyboardInterrupt:
                print("Stopping audio recording.")
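The loop above sends audio after 10 blocks, but the length of a block depends on the stream's block size, so the duration of each request is not fixed. One alternative, shown here as a sketch and not as the method used in this article, is to cut the audio by duration. `DurationBuffer` is a hypothetical class; it assumes 16 kHz, mono, 16-bit PCM, which is 32000 bytes per second.

```python
SAMPLE_RATE = 16000    # Hz, matching the streams in this article
BYTES_PER_SAMPLE = 2   # int16, mono


class DurationBuffer:
    """Accumulate raw PCM bytes and hand them back in fixed-duration chunks."""

    def __init__(self, seconds, sample_rate=SAMPLE_RATE, bytes_per_sample=BYTES_PER_SAMPLE):
        self.chunk_bytes = int(seconds * sample_rate * bytes_per_sample)
        self._data = bytearray()

    def add(self, block):
        # block must be bytes-like, for example indata.tobytes() for a NumPy array
        self._data.extend(block)

    def pop_chunks(self):
        """Return every complete chunk and keep the remainder for later."""
        chunks = []
        while len(self._data) >= self.chunk_bytes:
            chunks.append(bytes(self._data[:self.chunk_bytes]))
            del self._data[:self.chunk_bytes]
        return chunks


buf = DurationBuffer(seconds=1.0)
for _ in range(5):
    buf.add(b"\x00" * 16000)  # each block is 0.5 s of silence
chunks = buf.pop_chunks()
print(len(chunks), len(chunks[0]))  # prints: 2 32000
```

In the main loop, each queued block would be passed in as `block.tobytes()` and every chunk returned by `pop_chunks()` could go straight to `send_audio`. The half second left over in this example stays in the buffer until more audio arrives.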

4. Program entry point

    if __name__ == "__main__":
        # Use the pulse device (index 8 on the author's machine) to capture speaker
        # output; device indices differ from machine to machine.
        speaker_device_index = 8
        mic_recognizer = RealTimeSpeechRecognizer(client, "Microphone")
        speaker_recognizer = RealTimeSpeechRecognizer(client, "Speaker")
        start_audio_stream(mic_recognizer, speaker_recognizer, speaker_device_index)
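The index 8 is specific to one machine, so it may be safer to look the device up by name. The sketch below works on a plain list of dictionaries with the `name` and `max_input_channels` keys, which is the shape `sd.query_devices()` returns. `find_input_device` is a hypothetical helper and the sample device list is invented for illustration.

```python
def find_input_device(devices, keyword):
    """Return the index of the first input-capable device whose name contains keyword."""
    keyword = keyword.lower()
    for index, device in enumerate(devices):
        has_input = device.get('max_input_channels', 0) > 0
        if has_input and keyword in device.get('name', '').lower():
            return index
    return None


# Invented device list; on a real machine use: devices = sd.query_devices()
sample_devices = [
    {'name': 'HDA Intel PCH: ALC3246 Analog (hw:0,0)', 'max_input_channels': 2},
    {'name': 'HDMI 0', 'max_input_channels': 0},
    {'name': 'pulse', 'max_input_channels': 32},
]
print(find_input_device(sample_devices, 'pulse'))  # prints: 2
```

Whether the pulse device actually delivers speaker output depends on how the sound server is configured, so it is worth checking the transcribed text against what was played.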
