TTS reading error for Thai words - text-to-speech

When TTS reads these sentences, it mispronounces them:
1. "กลอนอกหักน้อยใจ ก็แค่คนหนึ่งคน" The last word is supposed to be read as "Khon", but it is read as "Ko-no".
2. "วันนี้ฝนตก." It is read as "wan nee phon tor kor". It should be "wan nee phon tok".

How does lex match tokens

I am learning lex. I made a simple lex file containing one rule:
%%
"Hello" puts("response\n");
%%
After running lex file.l, I'd like to inspect the outputted file file.yy.c. I presume that the lexer stores the tokens somehow and matches them (probably with a switch statement) against the input. Looking at the file, I am able to see the output (puts("response\n");), but I cannot find the tokens themselves. I can see many tables (matrices?) in the outputted file, but I cannot figure out how they are translated into the tokens.
Any help explaining how tokens are matched by the lexer is much appreciated!
lex builds a state machine (a DFA) that consumes one character at a time until it reaches a state from which no longer token can be matched, and then runs the code corresponding to the longest token it found.
In your example, it will build a very simple DFA with about 7 states: an initial state that matches 'H' and goes to a second state that matches 'e', and so on. If it gets to the 6th state (after matching the 'o') it will print the message, but any other character in any of the states leads to a state that does the default "single character echo" action.
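The matching loop described above can be sketched in Python. This is only an illustration of the idea, not what lex actually generates: the function names build_dfa and scan are made up for this sketch, and the transition table is a plain list of dictionaries, whereas the tables in the generated C file hold the same kind of transition information in a compressed form.

```python
def build_dfa(word):
    """Build a transition table: state i goes to state i + 1 on word[i]."""
    return [{ch: i + 1} for i, ch in enumerate(word)]


def scan(text, word="Hello", response="response\n"):
    """Run the single-rule lexer over text and return what it would print."""
    table = build_dfa(word)
    accepting = len(word)  # state reached after the last character of the word
    out = []
    pos = 0
    while pos < len(text):
        state = 0
        cur = pos
        # Consume one character at a time while a transition exists.
        while state != accepting and cur < len(text) and text[cur] in table[state]:
            state = table[state][text[cur]]
            cur += 1
        if state == accepting:
            out.append(response)   # rule action: puts("response\n")
            pos = cur
        else:
            out.append(text[pos])  # default action: echo a single character
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    print(scan("Hell no Hello!"), end="")
```

Note that the partial match "Hell" is abandoned when the next character is not 'o', and the scanner falls back to echoing one character and restarting from the next position.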

Tagged PDF NOT able to Read HyphenSpan

I have 2 PDFs in which a hyphen is used in the text at the end of the fourth line of the document (see below),
where the word is split as de - (cember), with the rest of the word on the next line.
In both PDFs the '-' has been tagged as a Hyphen Span. But when using JAWS to read the files, one file is read as "December" and the other as "De dash cember".
PDF that reads december as de -(dash) cember
PDF that reads december as december
I would like to confirm whether this is related to the content stream.
To have the text de - (cember) read as december, the hyphen needs to be turned into a soft hyphen by setting its actual text to the Unicode soft hyphen, \u00AD:
structureElement.setActualText("\u00AD");

Sparkfun SC16IS750 multiple read

I am struggling to get multiple reads of the RHR register working on my SC16IS750 breakout board. I am using the board at the standard I2C frequency, and normal single reads and writes are working.
The chip has a FIFO which can hold up to 64 characters. Fig. 24 in the chip manual (http://www.nxp.com/documents/data_sheet/SC16IS740_750_760.pdf) shows how it should work.
However, when initiating such a multiple read, only the first character is transferred correctly. All other characters are "0". If, e.g., the FIFO held 16 characters before the transfer, it will be empty afterwards.
Here is what I do when receiving a FIFO interrupt from the chip:
- Read interrupt status register IIR
- Read Line Status register
- Read RX Level register RXLVL, i.e. the number of characters in the FIFO
- Read multiple data:
I2C start, slave address+write, register address, repeated start, slave address+read, read character, ACK ... read last character, NACK
When I send a test pattern of 16 characters to the chip, only the first character is correct; the rest are "0". RXLVL shows 16 before the read.
Does anybody know what needs to be set up or considered for this operation with the NXP chip?

Barcode Scanner Decoding

I am experiencing some trouble decoding the output of a 1D Chinese barcode reader. The reader uses a USB interface and connects as a keyboard HID device (which I have no problem with). After interfacing the device with LabVIEW and generating the inf driver file, I tried reading the device interrupt data for a test barcode from the configuration manual, "000200". The output of the device is sent serially and is as follows: "39 39 39 31 39 39 40".
I am guessing that 40 is the escape character, 39 is 0 and 31 is 2.
After doing some research I could not find the relevant key code table for this encoding. I have tried disabling all other encoding formats using the configuration manual (Code 39, Full ASCII, Interleaved 2 of 5, ...).
The module was able to read upper-case letters, sending an additional character to indicate that the letter is upper case.
The device stopped reading the barcode after I disabled Code 128. I re-enabled this option and reading was successful. However, the Code 128 table has "G" assigned to the value 39 and not "0", which messes up the reading.
Has anyone worked with this format? If so, which key code table is it? Or should I map the character set manually?
The following is a link to the purchased Module:
Reader
Thank you it is much appreciated!
As per this answer, a USB HID device sends USB usage codes, not ASCII character codes. That answer links to the lengthy official documentation on usb.org, but this document from microsoft.com appears to be a concise summary. If those links break in future, a web search for usb hid key codes or similar should find an equivalent.
Looking at the HID Usage ID column on the Microsoft document, the code for '0' is 27 in hexadecimal, which is 39 in decimal. '2' is 1F which is 31, and 40 decimal is 28 hex which corresponds to Return. That would be consistent with the output you're seeing, assuming you're reporting it as a sequence of decimal values. As you've observed, a capital letter is sent as two codes, the first of which will probably correspond to the 'shift' key in the HID usage table.
You could try searching or asking around for a LabVIEW VI to translate these codes into ASCII characters but it's probably quicker to build your own based on the table linked above. To test it, you could use a barcode generator program or webpage to create barcodes for all the characters you want to be able to decode and check that scanning them with your device gives the correct output.
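As a rough sketch of such a translation, here it is in Python rather than LabVIEW. It covers only the letters, digits and Return from the HID keyboard usage table, and it assumes the device output is a sequence of decimal usage IDs as in the question. The function names are made up for this example, and shift handling for capital letters is left out because the question does not show how the device reports it.

```python
def build_usage_table():
    """Map a subset of USB HID keyboard usage IDs (decimal) to characters."""
    table = {}
    for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz"):
        table[4 + i] = ch    # usage IDs 0x04..0x1D are the letters a..z
    for i, ch in enumerate("1234567890"):
        table[30 + i] = ch   # usage IDs 0x1E..0x27 are the digits 1..9, then 0
    table[40] = "\n"         # usage ID 0x28 is Return (Enter)
    return table


def decode(codes):
    """Translate usage IDs to text; IDs missing from the table become '?'."""
    table = build_usage_table()
    return "".join(table.get(code, "?") for code in codes)


if __name__ == "__main__":
    raw = "39 39 39 31 39 39 40"  # device output quoted in the question
    print(repr(decode(int(c) for c in raw.split())))
```

With the values from the question this yields the digits 000200 followed by a newline, which matches the test barcode.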

Reading file line by line in Lua

According to the Lua documentation, file:read("*l") reads the next line, skipping the end of line.
Note: "*l": reads the next line skipping the end of line, returning nil on end of file. This is the default format.
Is this documentation right? file:read("*l") seems to read the current line instead of the next line, or is my understanding wrong? Pretty confusing...
Lua manages files using the same model as the underlying C implementation (this model is also used by other programming languages and is fairly common). If you are not familiar with this way of looking at files, the terminology can indeed be unclear.
In this model a file is represented as a stream of bytes with a so-called current position. The current position is a sort of conceptual pointer to the first byte in the file that will be read or written by the next I/O operation. When you open a file for reading, a new stream is set up so that its current position is the beginning of the file, i.e. the current position "points" to the first byte in the file.
In Lua you manage streams through so-called file handles, which are a sort of intermediaries for the underlying streams. Any operation you perform using the handle is carried over to the corresponding stream.
Lua io.open opens a file, associates a C stream with it and returns a file handle that represents that stream:
local file_handle = io.open( "myfile.txt" ) -- file opened for reading
Therefore, if you perform any operation that reads some bytes (usually interpreted as characters, if you work with text files) those are read from the stream and for each byte read the current position of the stream advances by one, pointing each time to the next byte to be read.
Lua documentation implies this model. Thus when it says next line, it means that the input operation will read all characters in the stream starting from the current position until an end-of-line character is found.
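To make the "current position" idea concrete, here is a small sketch in Python, whose file objects follow the same stream model. It is an analogy, not Lua: the helper name read_lines_with_positions is made up, an in-memory io.BytesIO stands in for a real file, and the trailing newline is stripped by hand to mimic what file:read("*l") returns.

```python
import io


def read_lines_with_positions(stream):
    """Return (position before, line without newline, position after) per line."""
    result = []
    while True:
        start = stream.tell()      # current position before the read
        line = stream.readline()   # reads from there up to and including "\n"
        if not line:               # empty result means end of file
            break
        result.append((start, line.rstrip(b"\n"), stream.tell()))
    return result


if __name__ == "__main__":
    stream = io.BytesIO(b"first\nsecond\nthird\n")
    for start, line, end in read_lines_with_positions(stream):
        print(start, line, end)
```

Each read starts where the previous one stopped, so "the next line" simply means "everything from the current position up to the next end-of-line character".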
Note that if you look at text files as a sequence of lines you could be misled, since you could think of a "current line" and a "next line". That would be a higher-level model than the C model. There is no "current line" in C. In C, text files are nothing more than a sequence of bytes where some special characters (end-of-line characters) undergo some special treatment (which is mostly implementation-dependent) and are used by some C standard functions as line terminators, i.e. as marks to detect when to stop reading characters.
Another source of confusion for newbies or people coming from higher-level languages is that in C, by historical accident, bytes are handled as characters (the basic data type to handle single bytes is char, which is the smallest numeric type in C!). Therefore for people with a C background it is natural to think of bytes as characters and vice versa.
Although Lua is a much higher level language than C, its close relationship with C (it was designed to be easily interfaced with C code) makes it inherit part of this C "bytes-as-characters" approach. In fact, for example, Lua strings can hold arbitrary bytes and can be used to process raw binary data.
Like Lorenso said above, read starts at the current file position and reads some portion of the file from there. How much of the file it reads depends on the format passed to read. For reference, in Lua 5.3:
"*all" : reads to the end of the file
"*line" : reads from the current position to the end of the line. The end of the line is marked by a newline character, "\n" (line feed); on Windows, the carriage return + line feed pair is translated to "\n" for files opened in text mode
"*number" : reads a number, that is, it will read up to the end of what it recognizes in the text as a number, stopping at, for example, a comma ","
num : reads a string with up to num characters
Here's an example that reads a file with a list of numbers into an array (a table), then returns the array. (Just change the "*number" to "*line" and it would read a file line by line):
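-- Read a file containing a list of numbers into an array (a table).
function read_array(file)
  local arr = {}
  local handle = assert(io.open(file, "r"))  -- raise an error if the file cannot be opened
  local value = handle:read("*number")
  while value do  -- read returns nil when no further number can be read
    table.insert(arr, value)
    value = handle:read("*number")
  end
  handle:close()
  return arr
end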