Voice recognition with Julius: how do I make a .voca file?

I'm making a voice recognition system, and Julius gives fairly good results for this task.
Words from the sample .voca file are recognized perfectly, but how do I add my own words and their transcriptions to the file?
I've tried the latest VoxForge (http://www.voxforge.org/) release and the nightly builds of the acoustic models with their vocabulary, but I get a lot of errors when Julius starts, like these:
Error: voca_load_htkdict: line 19: triphone "r-d+v" not found
Error: voca_load_htkdict: line 19: triphone "d-v+aa" not found
Error: voca_load_htkdict: the line content was: 2 [AARDVARK] aa r d v aa r k
Error: voca_load_htkdict: begin missing phones
Error: voca_load_htkdict: r-d+v
Error: voca_load_htkdict: d-v+aa
Error: voca_load_htkdict: end missing phones
Error: init_voca: error in reading /usr/src/custom/julius/quickstart/grammar/sample.dict
ERROR: failed to read dictionary "/usr/src/custom/julius/quickstart/grammar/sample.dict"
ERROR: m_fusion: some error occured in reading grammars
ERROR: Error in loading model
Does anyone know the rules of word transcription for .voca files?

Reason for the error:
Julius outputs these messages when your word dictionary contains pronunciations whose triphones are not trained in the acoustic model (AM). The code in "voca_load_htkdict.c" tries to match every triphone generated from the dict file against the triphone list of the acoustic model; when it cannot find one (for example "r-d+v" in AARDVARK), it prints this error and stops the program.
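To see in advance which dictionary entries will trigger this error, you can compare the triphones of each pronunciation against the HMM names in the acoustic model's tiedlist. The Python sketch below only approximates what voca_load_htkdict.c does: it assumes dictionary lines in the `2 [AARDVARK] aa r d v aa r k` form shown in the log, treats the first column of every tiedlist line as a known HMM name, and checks only word-internal triphones, because the context of word-edge phones depends on the neighbouring words. The function names and the `check_dict.py` script name are hypothetical.

```python
import sys


def word_internal_triphones(phones):
    """Return 'left-center+right' names for phones that have both neighbours.

    Word-edge phones take their context from neighbouring words at
    recognition time, so this simplified check skips them.
    """
    return [f"{phones[i - 1]}-{phones[i]}+{phones[i + 1]}"
            for i in range(1, len(phones) - 1)]


def parse_dict_line(line):
    """Split '2 [AARDVARK] aa r d v aa r k' into ('AARDVARK', [phones])."""
    head, _, rest = line.partition("]")
    word = head.split("[", 1)[1]
    return word, rest.split()


def load_hmm_names(lines):
    """Collect the logical HMM names (first column) of a tiedlist file."""
    return {line.split()[0] for line in lines if line.strip()}


def missing_triphones(dict_lines, hmm_names):
    """Map each word to the word-internal triphones the model lacks."""
    report = {}
    for line in dict_lines:
        if "[" not in line or "]" not in line:
            continue  # skip blank or unsupported lines
        word, phones = parse_dict_line(line)
        missing = [t for t in word_internal_triphones(phones)
                   if t not in hmm_names]
        if missing:
            report[word] = missing
    return report


if __name__ == "__main__" and len(sys.argv) == 3:
    # hypothetical usage: python check_dict.py sample.dict tiedlist
    with open(sys.argv[1]) as d, open(sys.argv[2]) as t:
        names = load_hmm_names(t)
        for word, tri in missing_triphones(d, names).items():
            print(word, " ".join(tri))
```

Words reported by this check are the ones that option 1 below would skip and option 2 would have to cover.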
Possible solutions:
1. Enable the -forcedict option (or uncomment it in the jconf file) to skip the erroneous words in the dictionary and force Julius to run.
or
2. Map each "not found" triphone to the closest physical triphone in the HMM list file "tiedlist".
For example:
b-ey+t v-eh+t
The first column is the name of the triphone (generated from your dictionary), and the second column is the name of the HMM actually defined in your AM.
This is only practical when there are a few missing triphones, not many.
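For option 2, candidate tiedlist lines can be generated automatically, but "closest" has to be defined somehow. The sketch below uses a made-up heuristic, not something Julius or HTK provides: prefer a triphone with the same center phone and right context, then one with the same center phone and left context, then any triphone with the same center phone, and emit the physical HMM that candidate resolves to. The function names are hypothetical, and the suggestions should be reviewed before you append them to the tiedlist.

```python
import re

# Matches names of the form left-center+right, e.g. "r-d+v".
TRIPHONE = re.compile(r"([^-+\s]+)-([^-+\s]+)\+([^-+\s]+)")


def load_tiedlist(lines):
    """Map every logical name in a tiedlist to its physical HMM name.

    One-column lines are physical HMMs; two-column lines map a logical
    name to a physical one (e.g. 'b-ey+t v-eh+t').
    """
    table = {}
    for line in lines:
        cols = line.split()
        if cols:
            table[cols[0]] = cols[1] if len(cols) > 1 else cols[0]
    return table


def split_triphone(name):
    """Return (left, center, right) for a triphone name, else None."""
    m = TRIPHONE.fullmatch(name)
    return m.groups() if m else None


def suggest_mapping(missing, table):
    """Return a 'missing physical' tiedlist line, or None if nothing fits.

    Hypothetical heuristic: same center+right, then center+left, then center.
    """
    parts = split_triphone(missing)
    if parts is None:
        return None
    left, center, right = parts
    triphones = []
    for name in sorted(table):  # sorted for deterministic output
        p = split_triphone(name)
        if p is not None:
            triphones.append((name, p))
    rules = (
        lambda p: p[1] == center and p[2] == right,
        lambda p: p[1] == center and p[0] == left,
        lambda p: p[1] == center,
    )
    for rule in rules:
        for name, p in triphones:
            if rule(p):
                return f"{missing} {table[name]}"
    return None
```

Returning the physical name (second column) rather than the logical one keeps each new line in the same two-column form as the `b-ey+t v-eh+t` example.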
The best solution is not to include words in your dict file whose triphones are not in the AM.
Note that the first two solutions are only for testing Julius; for production or commercial projects you must train the acoustic model and the language model on the same corpus.

Related

How to fix "ERROR 1: Cannot get geotransform" in GDAL

I am trying to read values from a geoTiff and am using gdallocationinfo for that purpose.
However, when I try to do that, e.g. with gdallocationinfo out.tif -wgs84 8.5 47.3, the following error occurs:
root#bc21abca5e07:/usr/src/app# gdallocationinfo out.tif -wgs84 8.5 47.3
ERROR 1: Cannot get geotransform
Note: if I leave out the -wgs84 option, I am able to read the values from the .tif. The -geoloc option also produces the same output as -wgs84.
Is this a problem with the geoTiff? I have already tried this command on Windows and on Debian, resulting in the same output both times.
You can't "fix it" short of properly georeferencing your dataset.
The error means that your dataset lacks georeferencing information so GDAL is unable to convert the WGS84 coordinates to pixel coordinates.
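For reference, GDAL maps pixel/line positions to georeferenced coordinates with a six-coefficient affine geotransform, Xgeo = GT0 + col*GT1 + row*GT2 and Ygeo = GT3 + col*GT4 + row*GT5. Turning a coordinate into a pixel means inverting that transform, which is impossible when the dataset has none. The pure-Python sketch below illustrates this; it assumes the coordinate is already in the dataset's own CRS (with -wgs84, gdallocationinfo first reprojects from WGS84 into that CRS), and the function name is hypothetical.

```python
import math


def geo_to_pixel(geotransform, x, y):
    """Convert a georeferenced (x, y) to (column, row) by inverting
    GDAL's affine geotransform:

        x = gt[0] + col * gt[1] + row * gt[2]
        y = gt[3] + col * gt[4] + row * gt[5]
    """
    if geotransform is None:
        # no georeferencing: there is nothing to invert
        raise ValueError("Cannot get geotransform")
    gt = geotransform
    det = gt[1] * gt[5] - gt[2] * gt[4]
    dx, dy = x - gt[0], y - gt[3]
    # inverse of the 2x2 matrix [[gt1, gt2], [gt4, gt5]] applied to (dx, dy)
    col = (gt[5] * dx - gt[2] * dy) / det
    row = (-gt[4] * dx + gt[1] * dy) / det
    return math.floor(col), math.floor(row)
```

If you know where the image lies, one way to add georeferencing is gdal_translate with the -a_srs and -a_ullr options.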

How do I get Source Extractor to Analyze an Image?

I'm relatively inexperienced in coding, so right now I'm just familiarizing myself with the basics of how to use SE, which I'll need to use in the near future.
At the moment I'm trying to get it to analyze a FITS file on my computer (which is a Mac). I'm sure this is something obvious, but I haven't been able to get it to do that. Following the instructions in Chapters 6 and 7 of Source Extractor for Dummies (linked below), I entered the following:
sex MedSpiral_20deg_Serl2_.45_.fits.fits -c configuration_file.txt
And got the following error message:
WARNING: configuration_file.txt not found, using internal defaults
----- SExtractor 2.19.5 started on 2020-02-05 at 17:10:59 with 1 thread
Setting catalog parameters
ERROR: can't read default.param
I then tried entering parameters manually:
sex MedSpiral_20deg_Ser12_.45_.fits.fits -c configuration_file.txt -DETECT_TYPE CCD -MAG_ZEROPOINT 2.5 -PIXEL_SCALE 0 -SATUR_LEVEL 1 -SEEING_FWHM 1
And got the same error message. I tried referencing default.sex directly:
sex MedSpiral_20deg_Ser12_.45_.fits.fits -c default.sex
And got the same error message again, substituting "configuration_file.txt not found" with "default.sex not found" (I checked that default.sex was on my computer, it is). The same thing happened when I tried to use default.param.
Here's the link to SE for Dummies (Chapter 6 begins on page 19):
http://astroa.physics.metu.edu.tr/MANUALS/sextractor/Guide2source_extractor.pdf
If you run the command "sex MedSpiral_20deg_Ser12_.45_.fits.fits -c default.sex" from within the config folder (inside the sextractor folder), it works, because SExtractor then finds default.sex and the files it references, such as default.param, in the current directory.
However, how can I run the sextractor command from any folder on my computer?
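One way to run SExtractor from any directory is to pass absolute paths for the configuration file and for the files it references, since relative names such as default.param are looked up from the current working directory. Configuration keywords can be overridden on the command line as -KEYWORD value, so PARAMETERS_NAME and FILTER_NAME can point into the config folder. The Python sketch below only builds and runs that command; the helper names and the config folder path are assumptions, and on some installations the executable is named source-extractor rather than sex.

```python
import os
import subprocess


def build_sex_command(image, config_dir, executable="sex",
                      config="default.sex", params="default.param",
                      conv_filter="default.conv"):
    """Return an SExtractor argument list that uses absolute paths only,
    so it no longer depends on the current working directory."""
    def in_config(name):
        return os.path.abspath(os.path.join(config_dir, name))

    return [executable, os.path.abspath(image),
            "-c", in_config(config),
            "-PARAMETERS_NAME", in_config(params),
            "-FILTER_NAME", in_config(conv_filter)]


def run_sex(image, config_dir, **kwargs):
    """Run SExtractor; raises CalledProcessError if it exits with an error."""
    cmd = build_sex_command(image, config_dir, **kwargs)
    subprocess.run(cmd, check=True)
```

If your parameter file requests CLASS_STAR, the neural-network file (STARNNW_NAME) would need the same treatment.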

Uploading job fails on the same file that was uploaded successfully before

I'm running a regular upload job that loads a CSV file into BigQuery. The job runs every hour. The most recent failure log says:
Error: [REASON] invalid [MESSAGE] Invalid argument: service.geotab.com [LOCATION] File: 0 / Offset:268436098 / Line:218637 / Field:2
Error: [REASON] invalid [MESSAGE] Too many errors encountered. Limit is: 0. [LOCATION]
I went to line 218638 (the original CSV has a header line, so I assume 218638 is the line that actually failed; let me know if I'm wrong), but it looks fine. I checked the corresponding table in BigQuery and it has that line too, which means I actually uploaded this line successfully before.
So why does it cause a failure now?
project id: red-road-574
Job ID: Job_Upload-7EDCB180-2A2E-492B-9143-BEFFB36E5BB5
This indicates that there was a problem with the data in your file, where it didn't match the schema.
The error message says it occurred at File: 0 / Offset:268436098 / Line:218637 / Field:2. This means the first file (it looks like you just had one), and then the chunk of the file starting at 268436098 bytes from the beginning of the file, then the 218637th line from that file offset.
The reason for the offset portion is that BigQuery processes large files in parallel across multiple workers. Each worker starts at an offset from the beginning of the file, and the offset in the error message is the one that worker started from.
From the rest of the error message, it looks like the string service.geotab.com showed up in the second field, but the second field was a number, and service.geotab.com isn't a valid number. Perhaps there was a stray newline?
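To look for other records with the same problem before re-running the load, you could scan the file for rows whose second column is not numeric. The sketch below assumes that Field:2 refers to the second column (index 1) and that empty values are loaded as NULL; both are assumptions to check against your schema, and the function name is hypothetical. Because it uses Python's csv module, quoted newlines stay inside their record, while an unquoted stray newline splits a record in two and usually makes the continuation line show up here.

```python
import csv


def non_numeric_records(lines, field_index=1, skip_header=True):
    """Yield (record_number, row) for records whose field is not numeric.

    record_number counts data records from 1 (header excluded); it can
    differ from the physical line number if quoted fields contain newlines.
    """
    reader = csv.reader(lines)
    if skip_header:
        next(reader, None)
    for record_number, row in enumerate(reader, start=1):
        if len(row) <= field_index or row[field_index] == "":
            continue  # missing or empty values are treated as NULL here
        try:
            float(row[field_index])
        except ValueError:
            yield record_number, row
```

Usage, with a hypothetical file name: `with open("upload.csv", newline="") as f:` followed by `for n, row in non_numeric_records(f): print(n, row)`.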
You can see what the lines looked like around the error by doing:
cat <yourfile> | tail -c +268436098 | tail -n +218636 | head -3
This prints three lines: the one before the error (since I used -n +218636 instead of +218637), the one that had the error, and the line after it.
Note that if this is just one line in the file that has a problem, you may be able to work around the issue by specifying maxBadRecords.

Pig Filter Syntax error, unexpected symbol

inputData = LOAD '$input' AS (line:chararray);
statusLineFilter = FILTER smallData BY (line MATHCES '^.* AppWrite-Dispatcher: Status code: [0-9]+$');
This code, when I run it, yields this error: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Syntax error, unexpected symbol at or near 'line'
The log file says the exact same thing. I'm at a loss, because the exact same syntax is working in other scripts I've written.
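The keyword is misspelled: MATHCES should be MATCHES, and that typo is what triggers the syntax error. Also check the relation name: the snippet loads inputData but filters smallData, which only works if smallData is defined elsewhere in the script. To avoid misspelling keywords, I recommend using an IDE or a text editor such as Emacs with pig-mode.el, which adds syntax highlighting ;)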
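Pig's MATCHES uses Java regular expressions and succeeds only when the pattern matches the whole string, not just a substring. To test the pattern itself outside Pig, a rough Python equivalent uses re.fullmatch. The sample log lines are made up, and while Java and Python regex syntax agree for this simple pattern, they do not agree on every feature.

```python
import re

# The pattern from the Pig script; ^ and $ are redundant under fullmatch
# but harmless.
STATUS_LINE = re.compile(r"^.* AppWrite-Dispatcher: Status code: [0-9]+$")


def is_status_line(line):
    """Whole-string match, like Pig's MATCHES."""
    return STATUS_LINE.fullmatch(line) is not None


if __name__ == "__main__":
    samples = [  # hypothetical log lines
        "12:00:01 INFO AppWrite-Dispatcher: Status code: 200",
        "12:00:02 INFO AppWrite-Dispatcher: Status code: 200 (retry)",
    ]
    for s in samples:
        print(is_status_line(s), s)
```

The second sample contains a status code but is rejected, because the trailing text prevents a whole-string match.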

Error when trying to import a PS file with grImport in R

I need to create a PDF file with several charts created by ggplot2 arranged on an A4 page, and repeat this 20-30 times.
I exported the ggplot2 charts to PS files and tried to PostScriptTrace them as instructed in grImport, but it keeps giving me the error "Unrecoverable error, exit code 1".
I ignored the error and tried to import the generated XML file into an R object, which gave me another error:
attributes construct error
Couldn't find end of Start Tag text line 21
Premature end of data in tag picture line 3
Error: 1: attributes construct error
2: Couldn't find end of Start Tag text line 21
3: Premature end of data in tag picture line 3
What's wrong here?
If you have no time to deal with Sweave, you could also write a simple TeX document from R after generating the plots, which you could later compile to pdf.
E.g.:
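library(ggplot2)
fname <- paste0('filename', id, '.pdf')   # e.g. filename1.pdf (paste() would insert spaces)
ggsave(filename = fname, plot = p)
# append one \includegraphics line per plot to a .tex (not .pdf) file
cat(paste0('\\includegraphics{', fname, '}\n'),
    file = 'report.tex', append = TRUE)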
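The resulting file contains only the \includegraphics lines, so \input it from (or wrap it in) a document that loads the graphicx package, and then compile that to PDF with, for example, pdflatex.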
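Since the snippet above writes only \includegraphics lines, here is a Python sketch that produces a complete, compilable LaTeX file placing the saved PDFs in a grid on A4 pages. The function name, the two-per-row default and the 1.5 cm margins are assumptions; it relies only on the standard graphicx and geometry packages.

```python
def build_report(figure_files, per_row=2):
    """Return a complete LaTeX document placing PDFs in a grid on A4 pages."""
    width = 0.95 / per_row  # leave a little room for inter-figure spaces
    lines = [r"\documentclass[a4paper]{article}",
             r"\usepackage{graphicx}",
             r"\usepackage[margin=1.5cm]{geometry}",
             r"\begin{document}",
             r"\noindent"]
    for i, name in enumerate(figure_files, start=1):
        lines.append(rf"\includegraphics[width={width:.3f}\textwidth]{{{name}}}")
        if i % per_row == 0:
            lines.append(r"\par\noindent")  # end of a row of figures
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

Write the returned string to report.tex and compile it with pdflatex; LaTeX starts a new page automatically when a page is full.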