How to get word-level details from Vosk ASR (TensorFlow)

I have been working with Vosk and I need to get the time of each word in my file.mp3. This is my code:
import json
from vosk import Model, KaldiRecognizer
from pydub import AudioSegment

FRAME_RATE = 16000  # assumption: typical Vosk sample rate; the original constant is not shown
CHANNELS = 1        # assumption: mono audio; the original constant is not shown

def voice_recognition(filename):
    model = Model(model_name="vosk-model-fa-0.5")
    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)

    mp3 = AudioSegment.from_mp3(filename)
    mp3 = mp3.set_channels(CHANNELS)
    mp3 = mp3.set_frame_rate(FRAME_RATE)

    step = 45000  # chunk length in milliseconds
    transcript = ""
    for i in range(0, len(mp3), step):
        segment = mp3[i:i + step]
        rec.AcceptWaveform(segment.raw_data)
        result = rec.Result()
        text = json.loads(result)["text"]
        transcript += text
    return transcript
I need something like this:

time            word
-----------------------
(0.01, 0.02)    hi
(0.03, 0.04)    how
(0.04, 0.05)    are
(0.05, 0.06)    you

Is there any way to get the data like this?

I just found that everything I need is already there: once you call rec.SetWords(True), all the word-level details are included in result = rec.Result().
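For reference, here is a minimal sketch of pulling the timings out (this assumes the usual Vosk result layout, where SetWords(True) adds a "result" list whose entries carry start, end, conf, and word; rec is the KaldiRecognizer from the code above):

import json

res = json.loads(rec.Result())
for w in res.get("result", []):  # the list is absent when nothing was recognized
    print((w["start"], w["end"]), w["word"])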


Efficient way to join 65,000 .csv files

I have, say, 65,000 .csv files that I need to work with in the Julia language.
The goal is to perform basic statistics on the data set.
I have considered two ways of joining all the data sets:
#1 - set a common index and leftjoin() - perform statistics row-wise
#2 - vcat() the dataframes on top of each other - vertically stacked, use groupby
Either way the final data frames are very large and become slow to process.
Is there an efficient way of doing this?
I thought of performing either #1 or #2 and splitting the joining operations into thirds: let's say after 20,000 joins, save to .csv and operate in chunks, then at the end join all three in one last operation.
I am not sure how to replicate making 65k .csv files, but basically below I loop through the files in the directory, load each csv, then vcat() it onto one DataFrame. The question relates more to whether there is a better way to manage the size of the operation, since vcat() makes the DataFrame grow on every iteration. Ahead of time, maybe I could cycle through the .csv files, obtain the file dimensions per .csv, initialize the full DataFrame to the final output size, then cycle through each .csv row by row and populate the initialized DataFrame.
using CSV
using DataFrames

# read all files in directory
csv_dir_tmax = cd(readdir, "C:/Users/andrew.bannerman/Desktop/Julia/scripts/GHCN data/ghcnd_all_csv/tmax")

# initialize outputs
tmax_all = DataFrame(Date = [], TMAX = [])

for c = 1:length(csv_dir_tmax)
    print("Starting csv file ", csv_dir_tmax[c], " - Iteration ", c, "\n")
    if c <= length(csv_dir_tmax)
        csv_tmax = CSV.read(join(["C:/Users/andrew.bannerman/Desktop/Julia/scripts/GHCN data/ghcnd_all_csv/tmax/", csv_dir_tmax[c]]), DataFrame, header=true)
        tmax_all = vcat(tmax_all, csv_tmax)
    end
end
The following approach should be relatively efficient (assuming the data fits into memory); reduce(vcat, ...) concatenates all of the tables in a single pass instead of growing the result on every iteration:
tmax_all = reduce(vcat, [CSV.read("YOUR_DIR$x", DataFrame) for x in csv_dir_tmax])
Initializing the final output to the total size of the final output (the size that vcat() would eventually build) and then populating it element-wise seems to work much better:
using Dates  # needed for Date(...)

# get the dimensions of each .csv file
tmax_all_total_output_size = fill(0, size(csv_dir_tmax, 1))
tmin_all_total_output_size = fill(0, size(csv_dir_tmin, 1))
tavg_all_total_output_size = fill(0, size(csv_dir_tavg, 1))
tmax_dim = Int64[]
tmin_dim = Int64[]
tavg_dim = Int64[]

for c = 1:length(csv_dir_tmin)  # 47484 - last point
    print("Starting csv file ", csv_dir_tmin[c], " - Iteration ", c, "\n")
    if c <= length(csv_dir_tmax)
        tmax_csv = CSV.read(join(["C:/Users/andrew.bannerman/Desktop/Julia/scripts/GHCN data/ghcnd_all_csv/tmax/", csv_dir_tmax[c]]), DataFrame, header=true)
        global tmax_dim = size(tmax_csv, 1)
        tmax_all_total_output_size[c] = tmax_dim
    end
    if c <= length(csv_dir_tmin)
        tmin_csv = CSV.read(join(["C:/Users/andrew.bannerman/Desktop/Julia/scripts/GHCN data/ghcnd_all_csv/tmin/", csv_dir_tmin[c]]), DataFrame, header=true)
        global tmin_dim = size(tmin_csv, 1)
        tmin_all_total_output_size[c] = tmin_dim
    end
    if c <= length(csv_dir_tavg)
        tavg_csv = CSV.read(join(["C:/Users/andrew.bannerman/Desktop/Julia/scripts/GHCN data/ghcnd_all_csv/tavg/", csv_dir_tavg[c]]), DataFrame, header=true)
        global tavg_dim = size(tavg_csv, 1)
        tavg_all_total_output_size[c] = tavg_dim
    end
end

# sum total dimension of all .csv files
tmax_sum = sum(tmax_all_total_output_size)
tmin_sum = sum(tmin_all_total_output_size)
tavg_sum = sum(tavg_all_total_output_size)

# initialize final output to total final dimension
tmax_date_array = fill(Date("13000101", "yyyymmdd"), tmax_sum)
tmax_array = zeros(tmax_sum)
tmin_date_array = fill(Date("13000101", "yyyymmdd"), tmin_sum)
tmin_array = zeros(tmin_sum)
tavg_date_array = fill(Date("13000101", "yyyymmdd"), tavg_sum)
tavg_array = zeros(tavg_sum)

# initialize outputs
tmax_all = DataFrame(Date = tmax_date_array, TMAX = tmax_array)
tmin_all = DataFrame(Date = tmin_date_array, TMIN = tmin_array)
tavg_all = DataFrame(Date = tavg_date_array, TAVG = tavg_array)

# running row counters used while filling the outputs
tmax_count = 0
tmin_count = 0
tavg_count = 0
Then begin filling the initialized DataFrames element by element, advancing the running counters (tmax_count, tmin_count, tavg_count) as rows are copied in.

Why won't the NASA pictures display?

The pictures are not displaying, even though the code executes just fine. I took out the API key.
def gimmePictures(num):
    for n in range(0, num):
        now = datetime.datetime.now()
        day4Pictures = now - datetime.timedelta(days=n)
        data = {'api_key': '',
                'date': day4Pictures.date()}
        print(data)
        # using the params argument in our request
        result = requests.get('https://api.nasa.gov/planetary/apod', params=data)
        # create a dictionary for yesterday's picture
        dict_day = result.json()
        print(dict_day['date'])
        Image(dict_day['url'])

gimmePictures(10)
Inside a function or loop, Image(...) only creates the object; Jupyter auto-renders only the last expression of a cell, so nothing is shown. Wrap each image in display() (see: How can I display an image from a file in Jupyter Notebook?):
import datetime
import requests
from IPython.display import Image, display

def gimmePictures(num):
    listofImageNames = []
    for n in range(0, num):
        now = datetime.datetime.now()
        day4Pictures = now - datetime.timedelta(days=n)
        data = {'api_key': 'dcS6cZ9DJ4zt9oXwjF6hgemj38bNJo0IGcvFGZZj',
                'date': day4Pictures.date()}
        # using the params argument in our request
        result = requests.get('https://api.nasa.gov/planetary/apod', params=data)
        # create a dictionary for yesterday's picture
        dict_day = result.json()
        listofImageNames.append(dict_day['url'])
    for imageName in listofImageNames:
        display(Image(imageName))

gimmePictures(10)

How can I use a loop to apply a function to a list of csv files?

I'm trying to loop through all the files in a directory and add "indicator" data to them. I had the code working when I selected one file, but now I am trying to make it work on all the files. The problem is that when I run the loop it says:
ValueError: Invalid file path or buffer object type: <class 'list'>
The goal is for each iteration to read another file from the list, make changes, and save the file back to the folder with the changes.
Here is the complete code without imports. I copied one of the file paths from the list and put it in a comment at the bottom.
### open dialog to select file
#file_path = filedialog.askopenfilename()

### create list from dir
listdrs = os.listdir('c:/Users/17409/AppData/Local/Programs/Python/Python38/Indicators/Sentdex Tutorial/stock_dfs/')
### append full path to list
string = 'c:/Users/17409/AppData/Local/Programs/Python/Python38/Indicators/Sentdex Tutorial/stock_dfs/'
listdrs_path = [string + x for x in listdrs]
print(listdrs_path)

### start loop, for each "file" in listdrs run the 2 functions below and overwrite saved csv.
for file in listdrs_path:
    file_path = listdrs_path
    data = pd.read_csv(file_path, index_col=0)

    ########################################
    #### function 1
    def get_price_hist(ticker):
        # Put stock price data in dataframe
        data = pd.read_csv(file_path)
        #listdr = os.listdir('Users\17409\AppData\Local\Programs\Python\Python38\Indicators\Sentdex Tutorial\stock_dfs')
        print(listdr)
        # Convert date to timestamp and make index
        data.index = data["Date"].apply(lambda x: pd.Timestamp(x))
        data.drop("Date", axis=1, inplace=True)
        return data

    df = data
    ##print(data)

    ###### Indicator data #####################
    def get_indicators(data):
        # Get MACD
        data["macd"], data["macd_signal"], data["macd_hist"] = talib.MACD(data['Close'])
        # Get MA10 and MA30
        data["ma10"] = talib.MA(data["Close"], timeperiod=10)
        data["ma30"] = talib.MA(data["Close"], timeperiod=30)
        # Get RSI
        data["rsi"] = talib.RSI(data["Close"])
        return data
    ##### end functions #######

    data2 = get_indicators(data)
    print(data2)
    data2.to_csv(file_path)

###################################################
# here is an example of what a path from the list looks like:
#'c:/Users/17409/AppData/Local/Programs/Python/Python38/Indicators/Sentdex Tutorial/stock_dfs/A.csv'
The problem is in the first two lines of your loop. Your filename is in the variable file, but you are passing file_path, which you assigned the whole file list; because of this you are getting the ValueError. Loop over file_path directly instead:
### open dialog to select file
#file_path = filedialog.askopenfilename()

### create list from dir
listdrs = os.listdir('c:/Users/17409/AppData/Local/Programs/Python/Python38/Indicators/Sentdex Tutorial/stock_dfs/')
### append full path to list
string = 'c:/Users/17409/AppData/Local/Programs/Python/Python38/Indicators/Sentdex Tutorial/stock_dfs/'
listdrs_path = [string + x for x in listdrs]
print(listdrs_path)

### start loop, for each file_path in listdrs_path run the 2 functions below and overwrite the saved csv.
for file_path in listdrs_path:
    data = pd.read_csv(file_path, index_col=0)

    ########################################
    #### function 1
    def get_price_hist(ticker):
        # Put stock price data in dataframe
        data = pd.read_csv(file_path)
        #listdr = os.listdir('Users\17409\AppData\Local\Programs\Python\Python38\Indicators\Sentdex Tutorial\stock_dfs')
        print(listdr)
        # Convert date to timestamp and make index
        data.index = data["Date"].apply(lambda x: pd.Timestamp(x))
        data.drop("Date", axis=1, inplace=True)
        return data

    df = data
    ##print(data)

    ###### Indicator data #####################
    def get_indicators(data):
        # Get MACD
        data["macd"], data["macd_signal"], data["macd_hist"] = talib.MACD(data['Close'])
        # Get MA10 and MA30
        data["ma10"] = talib.MA(data["Close"], timeperiod=10)
        data["ma30"] = talib.MA(data["Close"], timeperiod=30)
        # Get RSI
        data["rsi"] = talib.RSI(data["Close"])
        return data
    ##### end functions #######

    data2 = get_indicators(data)
    print(data2)
    data2.to_csv(file_path)
Let me know if it helps.
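As a side note, here is a sketch of the same loop using pathlib instead of manual string concatenation (it assumes the same directory as the question and the get_indicators() function defined above):

from pathlib import Path
import pandas as pd

stock_dir = Path('c:/Users/17409/AppData/Local/Programs/Python/Python38/Indicators/Sentdex Tutorial/stock_dfs/')

for csv_file in stock_dir.glob('*.csv'):  # every .csv file in the folder
    data = pd.read_csv(csv_file, index_col=0)
    data2 = get_indicators(data)          # add the indicator columns
    data2.to_csv(csv_file)                # overwrite the file in place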

Python 3: Can't write data obtained from a SQL table to a text file

I think I am stuck on an easy task as a beginner, but I have to ask this question.
My objective is to create another list from data obtained from a SQL table.
This list will be created in accordance with the input data from the user.
The SQL table has no problem, but I couldn't write to a txt file, despite not receiving an error message.
Where is the problem?
import sqlite3

db = sqlite3.connect("air2.sql")
cs = db.cursor()

epcs = dict()
for i in range(19):
    epcs[i] = 0

def addepc():
    epcNo = input("EPC No:")
    a = "SELECT * FROM 'epcval5' WHERE epc='EPC{}'".format(epcNo)
    cs.execute(a)
    data = cs.fetchone()
    print("You have selected this EPC:")
    for i in data:
        print(i)
    b = "SELECT value FROM 'epcval5' as float WHERE epc='EPC{}'".format(epcNo)
    cs.execute(b)
    epcv = cs.fetchone()
    res = str('.'.join(str(ele) for ele in epcv))
    print(type(res))
    with open('epcs.txt', 'w') as f:
        epcs[epcNo] = res
        f.write(epcs[epcNo])

addepc()
print("Done! Press ENTER to continue")
input()
You could use pandas to write it to a csv file:
import sqlite3
import pandas as pd

with sqlite3.connect(DB_FILENAME) as con:
    df = pd.read_sql_query('your sql query', con)
    df.to_csv('file_name')
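Applied to this question's database, a minimal sketch might look like this (the database and table names are taken from the post; the query itself is an assumption, so adjust it to whatever you actually need):

import sqlite3
import pandas as pd

with sqlite3.connect("air2.sql") as con:
    df = pd.read_sql_query("SELECT * FROM epcval5", con)
    df.to_csv("epcs.csv", index=False)  # index=False drops the row-number column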

Error with YouTube v3 API in Python: ResponseNotReady

I use a Python script to search for video information with the YouTube v3 API. On one computer the script works perfectly, but on another it receives the following error:
File "script.py", line 105, in youtube_search(options)
File "script.py", line 16, in youtube_search developerKey = DEVELOPER_KEY
.....
File "C:\Python27\lib\httplib.py", line 1013, in getresponse raise ResponseNotReady()
httplib.ResponseNotReady.
The youtube_search() function that I'm using is:
def youtube_search(options):
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                    developerKey=DEVELOPER_KEY)
    search_response = youtube.search().list(
        q=options.q,
        part="id,snippet",
        maxResults=options.maxResults
    ).execute()

    videos = []
    channels = []
    playlists = []
    videoInfo = []
    t = datetime.datetime.now()
    ff_name = '[' + options.q + '].txt'
    f = open(ff_name, 'w')

    no_results = search_response.get("pageInfo")["totalResults"]
    page = 1
    while page <= (no_results / 50):
        nextPage = search_response.get("nextPageToken")
        for search_result in search_response.get("items", []):
            if search_result["id"]["kind"] == "youtube#video":
                info = youtube.videos().list(
                    part="statistics,contentDetails",
                    id=search_result["id"]["videoId"]).execute()
                for info_result in info.get("items", []):
                    videos.append("%s<|>%s<|>%s" % (
                        time.strftime("%x"),
                        nextPage,
                        search_result["snippet"]["title"]
                    ))
                    f.write(str(videos))
                    f.write('\n')
                    videos = []
        page = page + 1
        search_response = youtube.search().list(
            q=options.q,
            part="id,snippet",
            maxResults=options.maxResults,
            pageToken=nextPage
        ).execute()
Do you have any hints on why I encounter this behavior?
Thanks.
That specific exception is explained at Python httplib ResponseNotReady.
One thing to point out, though, is that you don't need to perform a separate youtube.videos.list() call for each video id. You can pass up to 50 comma-separated video ids as the id= parameter to a single youtube.videos.list() call. Cutting down on the number of HTTP requests you're making will lead to better performance and may work around the exception.
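For illustration, a sketch of that batching inside the question's youtube_search() loop (variable names follow the question; the id= parameter of videos().list() accepts a comma-separated string of up to 50 ids):

# collect the video ids from the current page of search results
video_ids = [item["id"]["videoId"]
             for item in search_response.get("items", [])
             if item["id"]["kind"] == "youtube#video"]

# one videos().list() call for the whole page instead of one call per video
info = youtube.videos().list(
    part="statistics,contentDetails",
    id=",".join(video_ids)
).execute()

for info_result in info.get("items", []):
    print(info_result["id"], info_result["statistics"].get("viewCount"))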