PyAudio recording/capturing and stop/terminate in Python on Raspberry Pi (Raspbian)

I am not an expert in Python and am trying to capture/record audio from a USB audio device.
It works fine from the command terminal.
But I want to write a program that records audio and stops when I want.
I have heard about the PyAudio library, which has APIs for this job (like pyaudio.PyAudio(), pyaudio.PyAudio.open(), pyaudio.Stream, pyaudio.Stream.close(), pyaudio.PyAudio.terminate(), ...).
Can somebody help me write a simple program for audio recording in Python?
Thank you.

I have added relevant comments in front of the commands; let me know if you want more clarification about it.
import wave

import pyaudio

CHUNK = 8192
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 10
WAVE_OUTPUT_FILENAME = 'Audio_.wav'

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                input_device_index=0,
                frames_per_buffer=CHUNK)

print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
print("* done recording")

stream.stop_stream()  # stop audio recording
stream.close()        # close audio recording
p.terminate()         # close the audio system

wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()
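Since the question asks to stop recording on demand rather than after a fixed RECORD_SECONDS, here is a minimal variation of the same code (my own sketch, not part of the answer above) that keeps reading chunks until you press Ctrl+C:

import wave

import pyaudio

CHUNK = 8192
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                input=True, input_device_index=0, frames_per_buffer=CHUNK)

print("* recording -- press Ctrl+C to stop")
frames = []
try:
    while True:  # keep reading chunks until the user interrupts
        frames.append(stream.read(CHUNK))
except KeyboardInterrupt:
    print("* done recording")

sampwidth = p.get_sample_size(FORMAT)
stream.stop_stream()
stream.close()
p.terminate()

wf = wave.open('Audio_.wav', 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(sampwidth)
wf.setframerate(RATE)
wf.writeframes(b''.join(frames))
wf.close()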

Related

Using webcam on trained Roboflow model

I'm trying to run a trained Roboflow model using my webcam in Visual Studio Code. The webcam does load up alongside the popup, but it's just a tiny rectangle in the corner and you can't see anything else. If I change "image", image to "image", 1 or something else in the cv2.imshow line, the webcam lights up for a second and returns the error:
cv2.error: OpenCV(4.5.4) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
Here is my code, as obtained from a GitHub repo Roboflow has:
# load config
import json

with open('roboflow_config.json') as f:
    config = json.load(f)

ROBOFLOW_API_KEY = "********"
ROBOFLOW_MODEL = "penguins-ojf2k"
ROBOFLOW_SIZE = "416"

FRAMERATE = config["FRAMERATE"]
BUFFER = config["BUFFER"]

import asyncio
import base64
import time

import cv2
import httpx
import numpy as np

# Construct the Roboflow Infer URL
# (if running locally replace https://detect.roboflow.com/ with eg http://127.0.0.1:9001/)
upload_url = "".join([
    "https://detect.roboflow.com/",
    ROBOFLOW_MODEL,
    "?api_key=",
    ROBOFLOW_API_KEY,
    "&format=image",  # Change to json if you want the prediction boxes, not the visualization
    "&stroke=5"
])

# Get webcam interface via opencv-python
video = cv2.VideoCapture(0, cv2.CAP_DSHOW)

# Infer via the Roboflow Infer API and return the result
# Takes an httpx.AsyncClient as a parameter
async def infer(requests):
    # Get the current image from the webcam
    ret, img = video.read()

    # Resize (while maintaining the aspect ratio) to improve speed and save bandwidth
    height, width, channels = img.shape
    scale = min(height, width)
    img = cv2.resize(img, (2000, 1500))

    # Encode image to base64 string
    retval, buffer = cv2.imencode('.jpg', img)
    img_str = base64.b64encode(buffer)

    # Get prediction from Roboflow Infer API
    resp = await requests.post(upload_url, data=img_str, headers={
        "Content-Type": "application/x-www-form-urlencoded"
    })

    # Parse result image
    image = np.asarray(bytearray(resp.content), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_COLOR)

    return image

# Main loop; infers at FRAMERATE frames per second until you press "q"
async def main():
    # Initialize
    last_frame = time.time()

    # Initialize a buffer of images
    futures = []

    async with httpx.AsyncClient() as requests:
        while True:
            # On "q" keypress, exit
            if cv2.waitKey(1) == ord('q'):
                break

            # Throttle to FRAMERATE fps and print actual frames per second achieved
            elapsed = time.time() - last_frame
            await asyncio.sleep(max(0, 1 / FRAMERATE - elapsed))
            print((1 / (time.time() - last_frame)), " fps")
            last_frame = time.time()

            # Enqueue the inference request and save it to our buffer
            task = asyncio.create_task(infer(requests))
            futures.append(task)

            # Wait until our buffer is big enough before we start displaying results
            if len(futures) < BUFFER * FRAMERATE:
                continue

            # Remove the first image from our buffer
            # wait for it to finish loading (if necessary)
            image = await futures.pop(0)

            # And display the inference results
            cv2.imshow('image', image)

# Run our main loop
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
asyncio.run(main())

# Release resources when finished
video.release()
cv2.destroyAllWindows()
It looks like you're missing your model's version number, so the API is probably returning a 404 error, which OpenCV is then trying to read as an image.
I found your project on Roboflow Universe based on the ROBOFLOW_MODEL in your code; it looks like you're looking for version 3.
So try changing the line
ROBOFLOW_MODEL = "penguins-ojf2k"
to
ROBOFLOW_MODEL = "penguins-ojf2k/3"
It also looks like your model was trained at 640x640 (not 416x416) so you should change ROBOFLOW_SIZE to 640 as well for best results.
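One more defensive tweak worth making (my own suggestion, not from the original script): check the HTTP status before handing the bytes to OpenCV, so a 404 from a wrong model path fails loudly instead of making cv2.imdecode() return None. Inside the existing infer() coroutine, right after the post:

    resp = await requests.post(upload_url, data=img_str, headers={
        "Content-Type": "application/x-www-form-urlencoded"
    })
    # On an API error (e.g. 404 for a missing model version) the body is
    # JSON/text rather than JPEG bytes, and cv2.imdecode() would return None.
    if resp.status_code != 200:
        raise RuntimeError("Inference request failed: %s %s" % (resp.status_code, resp.text))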

How to play a tensorflow `decode_wav()` waveform "without" IPython notebook?

In the TensorFlow simple_audio example, how can I play the waveform without an IPython notebook?
Separate loading and playing works:
import pydub
import simpleaudio

sndfile = '/home/roland/.keras/datasets/mini_speech_commands/left/b19f7f5f_nohash_0.wav'
#sndfile = filenames[0].numpy().decode()
sound = pydub.AudioSegment.from_wav(sndfile)
#c, w, r = 1, 2, 16000
c, w, r = sound.channels, sound.sample_width, sound.frame_rate
playback = simpleaudio.play_buffer(sound.raw_data, c, w, r)
But how do I play the waveform that comes from TensorFlow?
By trial and error I found this to work:
import tensorflow as tf
from IPython import display

audio_binary = tf.io.read_file(sndfile)
audio, _ = tf.audio.decode_wav(audio_binary)
waveform = tf.squeeze(audio, axis=-1)
da = display.Audio(waveform, rate=r)  # rate=16000
simpleaudio.play_buffer(da.data, c, w, r)    # YES
#simpleaudio.play_buffer(audio, c, w, r)     # NO
#simpleaudio.play_buffer(waveform, c, w, r)  # NO
#type(da.data)           # bytes
#audio.numpy().dtype     # float32
#waveform.numpy().dtype  # float32
Then I found this conversion from the simpleaudio tutorial to work, too:
import numpy as np

#audionp = audio.numpy()
audionp = waveform.numpy()
# Scale the float32 waveform in [-1, 1] to the full int16 range
audionp *= 32767 / np.max(np.abs(audionp))
audionp = audionp.astype(np.int16)
play_obj = simpleaudio.play_buffer(audionp, c, w, r)  # 1, 2, 16000
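As a further alternative (my addition, not from the original answer): the sounddevice library accepts float32 samples in [-1.0, 1.0] directly, so the int16 conversion can be skipped entirely.

import sounddevice as sd

# sounddevice plays float32 arrays in [-1.0, 1.0] as-is,
# so the decoded waveform needs no int16 conversion.
sd.play(waveform.numpy(), samplerate=16000)
sd.wait()  # block until playback finishes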

Increasing the volume of a recording in real time or after saving it?

I used Python to make a prototype that increases the volume of an audio signal in real time. It works by calling new_data = audioop.mul(data, 4, 4), where 'data' is a chunk from the PyAudio stream.
Now I have to do something similar in Objective-C, and even after searching I am unable to find how. How can it be done in Objective-C? Do we have such control over the data flow in Objective-C, and if we don't, is there any way that a recorded sample's volume can be increased after saving?
import sys
import wave

import audioop
import pyaudio

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 7
WAVE_OUTPUT_FILENAME1 = sys.argv[1]
WAVE_OUTPUT_FILENAME2 = sys.argv[2]
device_index = 2

print("----------------------record device list---------------------")
audio = pyaudio.PyAudio()
print(audio)
info = audio.get_host_api_info_by_index(0)
numdevices = info.get('deviceCount')
for i in range(0, numdevices):
    if audio.get_device_info_by_host_api_device_index(0, i).get('maxInputChannels') > 0:
        print("Input Device id ", i, " - ", audio.get_device_info_by_host_api_device_index(0, i).get('name'))
print("-------------------------------------------------------------")

index = int(input())
print("recording via index " + str(index))

stream = audio.open(format=FORMAT, channels=CHANNELS,
                    rate=RATE, input=True, input_device_index=index,
                    frames_per_buffer=CHUNK)
print("recording started")

Recordframes = []   # original chunks
Recordframes2 = []  # amplified chunks
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    # audioop.mul(fragment, width, factor) multiplies every sample by factor;
    # note width should match the sample width (2 bytes for paInt16).
    new_data = audioop.mul(data, 4, 4)
    Recordframes.append(data)
    Recordframes2.append(new_data)

stream.stop_stream()
stream.close()
audio.terminate()

waveFile = wave.open(WAVE_OUTPUT_FILENAME1, 'wb')
waveFile.setnchannels(CHANNELS)
waveFile.setsampwidth(audio.get_sample_size(FORMAT))
waveFile.setframerate(RATE)
waveFile.writeframes(b''.join(Recordframes))

waveFile2 = wave.open(WAVE_OUTPUT_FILENAME2, 'wb')
waveFile2.setnchannels(CHANNELS)
waveFile2.setsampwidth(audio.get_sample_size(FORMAT))
waveFile2.setframerate(RATE)
waveFile2.writeframes(b''.join(Recordframes2))

waveFile.close()
waveFile2.close()
You can use AVAudioEngine to tap into the raw audio data. Alternatively, still using AVAudioEngine, you could add an AVAudioUnitEQ node to your audio graph and use that to boost the gain.
Using either method, you can then write out to a file using AVAudioFile.
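And for the "after saving it" half of the question, back on the Python side (a sketch of my own, not from the answer above; the file names are hypothetical): pydub can boost the gain of an already-saved file, where the + operator applies gain in dB.

from pydub import AudioSegment

# Boost a saved recording by 6 dB and write it back out.
sound = AudioSegment.from_wav("recorded.wav")  # hypothetical file name
louder = sound + 6  # pydub's + operator applies gain in decibels
louder.export("recorded_louder.wav", format="wav")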

Any ideas about how to save the sound produced by runAndWait()?

I'm using pyttsx3 to convert text to speech.
import pyttsx3

def tts(text):
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
but the problem is that I can't save the sound to a file (which is so weird for a library like this).
I've tried some other alternatives, like espeak (the driver pyttsx3 uses), but the sound was bad even after tweaking some options.
If you have any suggestions on how I can save the sound, or names of other offline libraries offering good speech quality (even in other programming languages), that would be so helpful.
Thank you.
This may be a late answer, but I hope it will be useful :)
# pip install comtypes
import pyttsx3

engine = pyttsx3.init()
voices = engine.getProperty('voices')
voiceList = []
for voice in voices:
    voiceList.append(voice.name)
print("Voice List: ", voiceList)

def playItNow(textf, filename, useFile=True, rate=2, voice=voiceList[0]):
    from comtypes.client import CreateObject
    engine = CreateObject("SAPI.SpVoice")
    engine.rate = rate  # can be -10 to 10
    for v in engine.GetVoices():
        if v.GetDescription().find(voice) >= 0:
            engine.Voice = v
            break
    else:
        print("Voice not found")

    if useFile:
        datei = open(textf, 'r', encoding="utf8")
        text = datei.read()
        datei.close()
    else:
        text = textf

    stream = CreateObject("SAPI.SpFileStream")
    from comtypes.gen import SpeechLib
    stream.Open(filename, SpeechLib.SSFMCreateForWrite)
    engine.AudioOutputStream = stream
    engine.speak(text)
    stream.Close()

    import winsound
    winsound.PlaySound(filename, winsound.SND_FILENAME)

playItNow("TextFile.txt", "test_2.wav", useFile=False, rate=-1)
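Depending on the platform and driver, newer pyttsx3 releases also expose a built-in save_to_file() method, which avoids the COM plumbing entirely; a minimal sketch (check that your installed version supports it):

import pyttsx3

engine = pyttsx3.init()
# Queue the utterance to be written to a WAV file instead of the speakers.
engine.save_to_file("Hello, world!", "output.wav")
engine.runAndWait()  # process the queue; the file is written here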

Redis not returning result after upgrading Celery from 3.1 to 4.0

I recently upgraded my Celery installation to 4.0. After a few days of wrestling with the upgrade process, I finally got it to work... sort of. Some tasks will return, but the final task will not.
I have a class, SFF, that takes in and parses a file:
# Constructor with I/O file
def __init__(self, file):
    # File data that's gonna get used a lot
    sffDescriptor = file.fileno()
    fileName = abspath(file.name)

    # Get the pointer to the file
    filePtr = mmap.mmap(sffDescriptor, 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)

    # Get the header info
    hdr = filePtr.read(HEADER_SIZE)
    self.header = SFFHeader._make(unpack(HEADER_FMT, hdr))

    # Read in the palette maps
    print self.header.onDemandDataSize
    print self.header.onLoadDataSize
    palMapsResult = getPalettes.delay(fileName, self.header.palBankOff - HEADER_SIZE, self.header.onDemandDataSize, self.header.numPals)

    # Read the sprite list nodes
    nodesStart = self.header.sprListOff
    nodesEnd = self.header.palBankOff
    print nodesEnd - nodesStart
    sprNodesResult = getSprNodes.delay(fileName, nodesStart, nodesEnd, self.header.numSprites)

    # Get palette data
    self.palettes = palMapsResult.get()
    # Get sprite data
    spriteNodes = sprNodesResult.get()

    # TESTING
    spritesResultSet = ResultSet([])
    numSpriteNodes = len(spriteNodes)
    # Split the nodes into chunks of size 32 elements
    for x in xrange(0, numSpriteNodes, 32):
        spritesResult = getSprites.delay(spriteNodes, x, x + 32, fileName, self.palettes, self.header.palBankOff, self.header.onDemandDataSizeTotal)
        spritesResultSet.add(spritesResult)
        break  # REMEMBER TO REMOVE FOR ENTIRE SFF
    self.sprites = spritesResultSet.join_native()
It doesn't matter if it's a single task that returns the entire spritesResult, or if I split it using a ResultSet, the outcome is always the same: the Python console I'm using just hangs at either spritesResultSet.join_native() or spritesResult.get() (depending on how I format it).
Here is the task in question:
@task
def getSprites(nodes, start, end, fileName, palettes, palBankOff, onDemandDataSizeTotal):
    sprites = []
    with open(fileName, "rb") as file:
        sffDescriptor = file.fileno()
        sffData = mmap.mmap(sffDescriptor, 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
        for node in nodes[start:end]:
            sprListNode = dict(SprListNode._make(node)._asdict())  # Need to convert it to a dict since values may change.
            #print node
            #print sprListNode

            # If it's a linked sprite, the data length is 0, so get the linked index.
            if sprListNode['dataLen'] == 0:
                sprListNodeTemp = SprListNode._make(nodes[sprListNode['index']])
                sprListNode['dataLen'] = sprListNodeTemp.dataLen
                sprListNode['dataOffset'] = sprListNodeTemp.dataOffset
                sprListNode['compression'] = sprListNodeTemp.compression

            # What does the offset need to be?
            dataOffset = sprListNode['dataOffset']
            if sprListNode['loadMode'] == 0:
                dataOffset += palBankOff  #- HEADER_SIZE
            elif sprListNode['loadMode'] == 1:
                dataOffset += onDemandDataSizeTotal  #- HEADER_SIZE
            #print sprListNode

            # Seek to the data location and "read" it in. First 4 bytes are just the image length
            start = dataOffset + 4
            end = dataOffset + sprListNode['dataLen']
            #sffData.seek(start)
            compressedSprite = sffData[start:end]

            # Create the sprite
            sprite = Sprite(sprListNode, palettes[sprListNode['palNo']], np.fromstring(compressedSprite, dtype=np.uint8))
            sprites.append(sprite)
    return json.dumps(sprites, cls=SpriteJSONEncoder)
I know it reaches the return statement, because if I put a print right above it, it will print in the Celery window. I also know that the task is running to completion because I get the following message from the worker:
[2016-11-16 00:03:33,639: INFO/PoolWorker-4] Task framedatabase.tasks.getSprites[285ac9b1-09b4-4cf1-a251-da6212863832] succeeded in 0.137236133218s: '[{"width": 120, "palNo": 30, "group": 9000, "xAxis": 0, "yAxis": 0, "data":...'
Here are my celery settings in settings.py:
# Celery settings
BROKER_URL = 'redis://localhost:1717/1'
CELERY_RESULT_BACKEND = 'redis://localhost:1717/0'
CELERY_IGNORE_RESULT = False
CELERY_IMPORTS = ("framedatabase.tasks", )
... and my celery.py:
from __future__ import absolute_import
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'framedatabase.settings')

from django.conf import settings  # noqa

app = Celery('framedatabase', backend='redis://localhost:1717/1', broker="redis://localhost:1717/0",
             include=['framedatabase.tasks'])

# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings', namespace='CELERY')

app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
Found the problem. Apparently it was leading to a deadlock, as mentioned in the section "Avoid launching synchronous subtasks" of the Celery documentation here: http://docs.celeryproject.org/en/latest/userguide/tasks.html#tips-and-best-practices
So I got rid of the line:
sprNodesResult.get()
And changed the final result to a chain:
self.sprites = chain(getSprNodes.s(fileName, nodesStart, nodesEnd, self.header.numSprites),
                     getSprites.s(0, 32, fileName, self.palettes, self.header.palBankOff,
                                  self.header.onDemandDataSizeTotal))().get()
And it works! Now I just have to find a way to split this the way I want!
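For the splitting itself, one option (my own sketch, not from the answer; it assumes the constructor runs in client code such as the Python console, not inside another task, so waiting on results cannot deadlock) is to fetch the node list first and then fan getSprites out over 32-node chunks with a group:

from celery import group

# Get the node list first, then dispatch one getSprites per 32-node chunk.
spriteNodes = getSprNodes.delay(fileName, nodesStart, nodesEnd,
                                self.header.numSprites).get()
job = group(getSprites.s(spriteNodes, x, x + 32, fileName, self.palettes,
                         self.header.palBankOff,
                         self.header.onDemandDataSizeTotal)
            for x in xrange(0, len(spriteNodes), 32))
self.sprites = job.apply_async().get()  # safe from client code only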