Can I do recognition from numpy array in python SpeechRecognition?

I'm recording a numpy array dt and then writing it to a .wav file with code like this:
dt = np.int16(dt/np.max(np.abs(dt)) * 32767)
scipy.io.wavfile.write("tmp.wav", samplerate, dt)
After that I read it back and recognize it with:
import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile("tmp.wav") as source:
    audio_text = r.listen(source)
    return r.recognize_google(audio_text, language=lang)
Can I do the recognition from the numpy array without going through a .wav file? Writing the file takes extra time.

Assuming this is the module you are using, and according to its documentation, you can pass any file-like object to AudioFile(). File-like objects are objects that support read and write operations.
You should be able to stick the byte representation of the wav file into a io.BytesIO object, which supports these operations, and pass that into your speech recognition module. scipy.io.wavfile.write() supports writing to such file-like objects.
I don't have the package or any WAV files to test it, but let me know if something like this works:
import io

wav_bytes = io.BytesIO()
scipy.io.wavfile.write(wav_bytes, samplerate, dt)
wav_bytes.seek(0)  # rewind so AudioFile reads from the start of the buffer
with sr.AudioFile(wav_bytes) as source:
    ...
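I can't test SpeechRecognition here either, but the file-like round trip itself can be checked with only the standard library's wave module. The sample bytes below are a fake stand-in for the question's dt array; the key step is seeking back to the start before anything reads the buffer:

```python
import io
import wave

samplerate = 16000
frames = b"\x00\x01" * samplerate          # one second of fake int16 samples

wav_bytes = io.BytesIO()
with wave.open(wav_bytes, "wb") as w:      # wave writes happily to a BytesIO
    w.setnchannels(1)
    w.setsampwidth(2)                      # int16 -> 2 bytes per sample
    w.setframerate(samplerate)
    w.writeframes(frames)

wav_bytes.seek(0)                          # rewind, or the reader sees EOF
with wave.open(wav_bytes, "rb") as r:
    print(r.getnframes())                  # 16000
```

In the real code, scipy.io.wavfile.write() and sr.AudioFile() take the same BytesIO object in place of the wave calls.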

You can create an AudioData object directly; that is what the recognizer methods accept, so no intermediate file is needed:
import io
from scipy.io.wavfile import write
import speech_recognition
byte_io = io.BytesIO()
write(byte_io, sr, audio_array)
byte_io.seek(0)                  # rewind; reading right after writing would return b''
result_bytes = byte_io.read()
audio_data = speech_recognition.AudioData(result_bytes, sr, 2)  # 2 = sample width in bytes for int16
r = speech_recognition.Recognizer()
text = r.recognize_google(audio_data)
audio_array is a 1-D numpy.ndarray with int16 values and sr is the sampling rate.
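The seek(0) matters because BytesIO keeps a single cursor for reads and writes; without rewinding, read() right after write() returns empty bytes. A quick standard-library check (the payload is just a stand-in for the WAV bytes):

```python
import io

buf = io.BytesIO()
buf.write(b"RIFF....WAVE")      # stand-in for the bytes scipy writes
print(buf.read())               # b'' -- the cursor sits at the end
buf.seek(0)
print(buf.read())               # b'RIFF....WAVE'
```

Alternatively, byte_io.getvalue() returns the full contents regardless of cursor position.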

Related

How to change the data type from string to bytes, keeping the contents of the string unchanged?

How can you convert a string to bytes? And it's not about decode/encode: the string already contains the literal representation of the bytes, and I just need to convert that representation back to a bytes object.
The point is that I want to write a numpy array into the image metadata. To preserve both the shape of the array and its contents I use the pickle package, but only a string can be written to the image metadata, so I convert the pickled bytes to a string with a plain str(). The data written to and read back from the metadata looks like the bytes, but in string format:
b'\x80\x04\x95\xaa\x00\x00\x00...
Now, for pickle to be able to convert this data back to a numpy array, I need to return it to type bytes, but how can I do that? Everything I find online is about converting strings to bytes and vice versa via decode/encode, but that's not what I need.
It would be good if you showed some code.
pickle.dumps returns a bytes object; ideally you would write this straight to your image metadata without the str() step, so investigate whether you can write metadata in "binary" mode. If that is not an option, I suggest looking at base64 encoding.
If you insist on using the str() approach, you can use ast.literal_eval to somewhat safely convert the string back to a bytes object.
This sample demonstrates both the binary mode and the str()/ast.literal_eval round trip:
import ast
import pickle
import numpy as np

a = np.array([[1.23, np.pi], [3, 4]])

# binary mode
with open('data.bin', 'wb') as outfile:
    outfile.write(pickle.dumps(a))
with open('data.bin', 'rb') as infile:
    b = pickle.loads(infile.read())
print('binary')
print(b)
print(f'{(a==b).all() = }')

# ast.literal_eval hack
with open('data.txt', 'w') as outfile:
    outfile.write(str(pickle.dumps(a)))
with open('data.txt') as infile:
    c = pickle.loads(ast.literal_eval(infile.read()))
print()
print('str & ast.literal_eval')
print(c)
print(f'{(c==b).all() = }')
On my system (Ubuntu 20.04, Python 3.9.7 via Conda), this gives:
binary
[[1.23 3.14159265]
[3. 4. ]]
(a==b).all() = True
str & ast.literal_eval
[[1.23 3.14159265]
[3. 4. ]]
(c==b).all() = True
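The base64 route suggested above works the same way; the dictionary below is a hypothetical stand-in for the pickled numpy array, but the bytes-to-text round trip is identical:

```python
import base64
import pickle

payload = {"shape": (2, 2), "data": [[1.23, 3.14], [3.0, 4.0]]}  # stand-in object

# bytes -> ASCII text that survives text-only metadata fields
as_text = base64.b64encode(pickle.dumps(payload)).decode("ascii")

# text -> bytes -> original object
restored = pickle.loads(base64.b64decode(as_text))
print(restored == payload)      # True
```

Unlike the str() hack, base64 text contains no quotes or backslash escapes, so it is safe for almost any text-only metadata field.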

How to make a tf.transform (Tensorflow Transform) encoded dict?

I'm trying to get a "tf.transform encoded dict" with the tfx.components.Transform component.
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=os.path.abspath(_taxi_transform_module_file),
    instance_name="taxi")
context.run(transform)
I need a dict like this: " a dict of the data you load ({feature_name: feature_value})."
Transform, as mentioned above, gives me a TFRecord file. How can I decode it properly?
Any help would be appreciated.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    NUMERIC_FEATURE_KEYS = ['PetalLengthCm', 'PetalWidthCm',
                            'SepalLengthCm', 'SepalWidthCm']
    TARGET_FEATURES = "Species"
    outputs = inputs.copy()
    del outputs['Id']
    for key in NUMERIC_FEATURE_KEYS:
        outputs[key] = tft.scale_to_0_1(outputs[key])
    return outputs
Write a module like this. I have written one for the Iris dataset; it is simple to adapt to your own dataset, and the transformed output will be saved as a TFRecord dataset.
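In TensorFlow itself you would decode the output with tf.data.TFRecordDataset plus tf.io.parse_single_example, which yields exactly the {feature_name: feature_value} dict the question asks for. I can't run TensorFlow here, but the TFRecord container format is simple enough to illustrate with only the standard library (CRC checks omitted for brevity; the write helper is for testing only):

```python
import struct

def read_tfrecord_payloads(path):
    """Yield the raw serialized records (tf.train.Example protos) from a TFRecord file.

    Framing per record: uint64 little-endian length, uint32 masked CRC of the
    length, `length` payload bytes, uint32 masked CRC of the payload.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            (length,) = struct.unpack("<Q", header)
            f.read(4)                      # length CRC, not verified here
            yield f.read(length)           # the serialized proto
            f.read(4)                      # payload CRC, not verified here

def write_fake_tfrecord(path, payloads):
    """Write payloads with TFRecord framing (zeroed CRCs, for testing only)."""
    with open(path, "wb") as f:
        for p in payloads:
            f.write(struct.pack("<Q", len(p)) + b"\x00" * 4 + p + b"\x00" * 4)

write_fake_tfrecord("demo.tfrecord", [b"example-1", b"example-2"])
print(list(read_tfrecord_payloads("demo.tfrecord")))   # [b'example-1', b'example-2']
```

Each yielded payload would then go through tf.io.parse_single_example with a feature spec to become the feature dict.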

Getting error while converting base64 string into image using pyspark

I want to extract and process image data (a 3-D array) that is available in base64 format, using pyspark. I'm using pandas_udf with pyarrow for the processing function. When I pass the base64 string into the pandas_udf function, the first step is converting the base64 string into an image, and at that step I get the error "TypeError: file() argument 1 must be encoded string without null bytes, not str."
I am using base64.b64decode(imgString) to convert the base64 string to an image. I'm using Python 2.7.
...
avrodf = sqlContext.read.format("com.databricks.spark.avro").load("hdfs:///Raw_Images_201803182350.avro")
interested_cols = ["id", "name", "image_b64"]
indexed_avrodf = avrodf.select(interested_cols)
ctx_cols = ["id", "name"]
result_sdf = indexed_avrodf.groupby(ctx_cols).apply(img_proc)

schema = StructType([
    StructField("id", StringType()),
    StructField("name", StringType()),
    StructField("image", StringType()),
    StructField("Proc_output", StringType())
])

@pandas_udf(schema, PandasUDFType.GROUPED_MAP)
def img_proc(df):
    df['Proc_output'] = df['image_b64'].apply(is_processed)
    return df

def is_processed(imgString):
    import cv2
    from PIL import Image, ImageDraw, ImageChops
    import base64
    wisimg = base64.b64decode(imgString)
    image = Image.open(wisimg)
    .....
    return processed_status
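The error comes from Image.open(): base64.b64decode() returns raw bytes, and handing bytes to a call that expects a filename makes it try to open them as a path. Wrapping the bytes in io.BytesIO gives Image.open() the file-like object it also accepts. A minimal check of the decode step with the standard library (the payload is a fake stand-in, just a PNG signature plus padding):

```python
import base64
import io

original = b"\x89PNG\r\n\x1a\n" + bytes(16)   # hypothetical image bytes
img_b64 = base64.b64encode(original)           # what the image_b64 column holds

decoded = base64.b64decode(img_b64)            # raw bytes again, not a path
buf = io.BytesIO(decoded)                      # file-like wrapper for Image.open(buf)
print(buf.read(8) == b"\x89PNG\r\n\x1a\n")     # True -- PNG signature intact
```

So inside is_processed, the line would become Image.open(io.BytesIO(wisimg)).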

Getting data from odo.resource(source) to odo.resource(target)

I'm trying to extend the odo library with functionality to convert a GDAL dataset (a raster with spatial information) to a NetCDF file.
Reading in the GDAL dataset goes fine, but at the creation stage of the NetCDF I need some metadata from the GDAL dataset (metadata that is not yet known when calling odo.odo(source, target)). How could I achieve this?
a short version of my code so far:
import odo
from odo import resource, append
import gdal
import netCDF4 as nc4
import numpy as np
@resource.register('.+\.tif')
def resource_gdal(uri, **kwargs):
    ds = gdal.Open(uri)
    # metadata I need to transfer to the netcdf
    b = ds.GetGeoTransform()  # bbox, interval
    return ds

@resource.register('.+\.nc')
def resource_netcdf(uri, dshape=None, **kwargs):
    ds = nc4.Dataset(uri, 'w')
    # create lat lon dimensions and variables
    ds.createDimension('lat', dshape[0].val)
    ds.createDimension('lon', dshape[1].val)
    lat = ds.createVariable('lat', 'f4', ('lat',))
    lon = ds.createVariable('lon', 'f4', ('lon',))
    # create a range from the **gdal metadata**
    lat_array = np.arange(dshape[0].val)*b[1]+b[0]
    lon_array = np.arange(dshape[1].val)*b[5]+b[3]
    # assign the range to the netcdf variable
    lat[:] = lat_array
    lon[:] = lon_array
    # create the variable which will hold the gdal data
    data = ds.createVariable('data', 'f4', ('lat', 'lon',))
    return data

@append.register(nc4.Variable, gdal.Dataset)
def append_gdal_to_nc4(tgt, src, **kwargs):
    arr = src.ReadAsArray()
    tgt[:] = arr
    return tgt
Thanks!
I don't have much experience with odo, but from browsing the source code and docs it looks like resource_netcdf() should not be the place where gdal data is translated to netcdf. Translating should be the job of a gdal_to_netcdf() function decorated by convert.register. In that case, the gdal.Dataset object returned by resource_gdal would carry all the information (georeferencing, pixel size) needed to build the netcdf.
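Whichever function ends up doing the translation, the coordinate arrays built from GetGeoTransform() can be derived in isolation. Note GDAL's documented geotransform layout, (origin_x, pixel_width, rotation, origin_y, rotation, pixel_height): longitude uses b[0]/b[1] and latitude uses b[3]/b[5], so the question's lat_array/lon_array appear to use swapped coefficients. A standalone sketch with a hypothetical geotransform, assuming a north-up raster and a pixel-center convention:

```python
def coords_from_geotransform(gt, nrows, ncols):
    """Derive lon/lat axis values from a GDAL geotransform tuple.

    gt = (origin_x, pixel_width, 0, origin_y, 0, pixel_height) for a
    north-up raster; pixel centers are offset by half a pixel.
    """
    lons = [gt[0] + (i + 0.5) * gt[1] for i in range(ncols)]
    lats = [gt[3] + (j + 0.5) * gt[5] for j in range(nrows)]
    return lons, lats

gt = (100.0, 0.25, 0.0, 50.0, 0.0, -0.25)   # hypothetical 0.25-degree raster
lons, lats = coords_from_geotransform(gt, nrows=2, ncols=3)
print(lons)   # [100.125, 100.375, 100.625]
print(lats)   # [49.875, 49.625]
```

Note that pixel_height (gt[5]) is negative for north-up rasters, which is why the latitudes decrease.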

Empty outputs with python GDAL

Hello, I'm new to GDAL and I'm struggling with my code. Everything seems to go well, but the output band at the end is empty. The NoData value is set to 256 when I specify 255, so I don't really know what's wrong. Any help will be appreciated!
Here is my code
from osgeo import gdal
from osgeo import gdalconst
from osgeo import osr
from osgeo import ogr
import numpy
#graticule
src_ds = gdal.Open("E:\\NFI_photo_plot\\photoplotdownloadAllCanada\\provincial_merge\\Aggregate\\graticule1.tif")
band = src_ds.GetRasterBand(1)
band.SetNoDataValue(0)
graticule = band.ReadAsArray()
print('graticule done')
band="none"
#Biomass
dataset1 = gdal.Open("E:\\NFI_photo_plot\\photoplotdownloadAllCanada\provincial_merge\\Aggregate\\Biomass_NFI.tif")
band1 = dataset1.GetRasterBand(1)
band1.SetNoDataValue(-1)
Biomass = band1.ReadAsArray()
maskbiomass = numpy.greater(Biomass, -1).astype(int)
print("biomass done")
Biomass="none"
band1="none"
dataset1="none"
#Baseline
dataset2 = gdal.Open("E:\\NFI_photo_plot\\Baseline\\TOTBM_250.tif")
band2 = dataset2.GetRasterBand(1)
band2.SetNoDataValue(0)
baseline = band2.ReadAsArray()
maskbaseline = numpy.greater(baseline, 0).astype(int)
print('baseline done')
baseline="none"
band2="none"
dataset2="none"
#sommation
biosource=(graticule+maskbiomass+maskbaseline)
biosource1=numpy.uint8(biosource)
biosource="none"
#Écriture
dst_file="E:\\NFI_photo_plot\\photoplotdownloadAllCanada\\provincial_merge\\Aggregate\\Biosource.tif"
dst_driver = gdal.GetDriverByName('GTiff')
dst_ds = dst_driver.Create(dst_file, src_ds.RasterXSize,
                           src_ds.RasterYSize, 1, gdal.GDT_Byte)
#projection
dst_ds.SetProjection( src_ds.GetProjection() )
dst_ds.SetGeoTransform( src_ds.GetGeoTransform() )
outband=dst_ds.GetRasterBand(1)
outband.WriteArray(biosource1,0,0)
outband.SetNoDataValue(255)
biosource="none"
graticule="none"
A few pointers:
Where you have ="none", these need to be = None to close/clean up the objects; otherwise you are rebinding the names to the four-character string "none", which is not what you intend to do.
Why do you have band1.SetNoDataValue(-1), while other NoData values are 0? Is this data source signed or unsigned? If unsigned, then -1 doesn't exist.
When you open rasters with gdal.Open without the access option, it defaults to gdal.GA_ReadOnly, which means your subsequent SetNoDataValue calls do nothing. If you want to modify the dataset, you need to use gdal.GA_Update as your second parameter to gdal.Open.
Another strategy to create a new raster is to use driver.CreateCopy; see the tutorial for details.