Too Few Observation Sequences while training a new voice for MaryTTS - text-to-speech

I've been trying to build a new voice for MaryTTS in German for a while now, but haven't succeeded so far. I followed a tutorial (https://github.com/marytts/marytts/wiki/HMMVoiceCreation) and tried to understand each step. No matter what I do, I get stuck at step 14 (HMMVoiceMakeVoice), the error being:
ERROR [+2121] HInit: Too Few Observation Sequences
which usually means that the tested phone ("en9" in this example) is not found within my data set.
After changing the locale, the same error happened on the phone "de27", as Nikolay Shmyrev pointed out.
I doubt that, though, since I use about 500 audio files, each at least 5 seconds long, for a total of well over an hour of material.
In fact, I skipped the "en9" phone, since I don't know exactly what it represents. The next one to fail was "oI", which I manually located about ten times in just the first few audio files.
I think the problem is that the automatic labeling (steps 2-4) is not working properly, but I don't know what I can do to get a better result.
Edit: I uploaded all the files generated up to this step, which can be inspected on this shared Google Drive. Note that, for copyright reasons, I could not upload the wav folder. In the logs directory you can find the logs for each step. I couldn't find any problems there, but maybe someone will.
I do not completely understand the structure of the generated data, but I thought changing MARYBASE/mary/trickyPhones.txt and running the make tools again would be enough to change the mapped name from "tS" to "Z", which sounds about the same in German. But HMMVoiceMakeVoice still fails with the same output.

Related

How to watermark a wave file with NAudio?

My application outputs wave files and I have a requirement to implement watermarking. By this I mean taking another wave file containing, say, 4 seconds of someone saying "This file is copyrighted material" and overlaying it onto the original file every 30 seconds.
I have tried many things but none quite works. I can pad the watermark to 30 seconds using OffsetSampleProvider and mix this with MixingSampleProvider, but that would only mix in one copy.
LoopStream is a WaveStream, not an ISampleProvider, and even if I could do the conversion, I think this would mix forever, because the LoopStream would never stop returning data.
This seems like a fairly basic use case, but I cannot figure out how to do it!
You're not too far off. You can turn a LoopStream into an ISampleProvider with the ToSampleProvider extension method, and then you can use the Take extension method to limit the duration of the looped stream to match the duration of the file you're mixing it with.
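
To make that concrete, here is a minimal sketch of the whole chain. It assumes the LoopStream class from the NAudio samples (it is not part of the library itself), that both files share the same sample rate and channel count, and that the watermark is shorter than 30 seconds; the file names are placeholders:

    using System;
    using NAudio.Wave;
    using NAudio.Wave.SampleProviders;

    // Pad the watermark out to a 30-second block and render it to a temporary
    // file, because LoopStream loops a WaveStream, not an ISampleProvider.
    using (var watermark = new AudioFileReader("watermark.wav"))
    {
        var padded = new OffsetSampleProvider(watermark)
        {
            LeadOut = TimeSpan.FromSeconds(30) - watermark.TotalTime // silence after the speech
        };
        WaveFileWriter.CreateWaveFile16("padded.wav", padded);
    }

    using (var original = new AudioFileReader("original.wav"))
    using (var loopSource = new AudioFileReader("padded.wav"))
    {
        // Loop the 30-second block forever, then trim it with Take() so the
        // mix ends exactly when the original file does.
        var looped = new LoopStream(loopSource)
            .ToSampleProvider()
            .Take(original.TotalTime);

        var mixed = new MixingSampleProvider(new ISampleProvider[] { original, looped });
        WaveFileWriter.CreateWaveFile16("watermarked.wav", mixed);
    }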

finding malware in 2 different files of the same program

This is for an intro class I am taking in reverse engineering.
I have two files that are supposed to be the same program, but one of them supposedly has a trojan in it.
I looked at both files and found some very odd things. However, I can't explain why they would happen.
The PE header location is different. According to the DOS header, the PE header of one file is at offset F0 and of the other at F8. Why? I don't really understand why someone would shift the PE header by 8 bytes.
I noticed the code entry points are different too. Does this mean that the start of the program jumps elsewhere, i.e. that the two programs start running from different locations?
I noticed that all of the RVAs, for example those of the export and import tables, have shifted up higher. I assume this is because the PE header shifted by 8 bytes, so everything else in the file shifts up too.
The SizeOfCode value is different; one file is a bit larger than the other. The timestamps are different too, meaning the file must have been edited.
One of the files has the import symbol execve, while the other does not. I don't know what this symbol does.
Lastly, I think one of the export symbols has jumps and such that the other does not have, meaning it is doing something it shouldn't be doing.
Anyway, these are some observations I have noticed. I just need help making sense of what these observations might mean.
Thanks.
A Noob reverse engineer.
Hopefully this will clear some things up.
I noticed the code entry points are different too. Does this mean that the start of the program jumps elsewhere, i.e. that the two programs start running from different locations?
OK, a change in the code entry point can clearly indicate that the code has been tampered with. It often means that the malicious code is called on entry and then hands control to the normal code thereafter. This is done so that the user does not notice the application has been tampered with.
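
To compare the two samples directly, a minimal C# sketch like the one below (the file names are placeholders) reads AddressOfEntryPoint out of each optional header:

    using System;
    using System.IO;

    class EntryPointDump
    {
        // e_lfanew (offset 0x3C in the DOS header) points at the "PE\0\0"
        // signature; the optional header starts 4 (signature) + 20 (COFF
        // header) bytes later, and AddressOfEntryPoint is at offset 16
        // within the optional header.
        static uint EntryPointRva(string path)
        {
            byte[] bytes = File.ReadAllBytes(path);
            int peOffset = BitConverter.ToInt32(bytes, 0x3C);
            return BitConverter.ToUInt32(bytes, peOffset + 4 + 20 + 16);
        }

        static void Main()
        {
            Console.WriteLine($"clean:   0x{EntryPointRva("clean.exe"):X}");
            Console.WriteLine($"suspect: 0x{EntryPointRva("suspect.exe"):X}");
        }
    }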
The SizeOfCode value is different; one file is a bit larger than the other. The timestamps are different too, meaning the file must have been edited.
The change in size can also indicate that there is malicious code in the executable, because executables are not supposed to grow (unless you're feeding yours, I suppose).
One of the files has the import symbol execve, while the other does not. I don't know what this symbol does.
As for execve, please see the _execv, _wexecv documentation on MSDN. In short, the exec family of functions loads and executes another program, so an executable that imports it has gained the ability to launch other code.

Any way to determine when a .net program was compiled/built

I need to prove that a VB.NET program that I wrote was written at a particular time.
(the reason is an academic integrity investigation where someone copied my code).
I have all the code on my disk including the debug and release folders, with my username in the build paths.
Are there additional things I could do, such as looking in the binaries?
If you use IL Disassembler to open the EXE/DLL and select the menu option View > Header, there is a field called "Time-date stamp" in the COFF/PE header. It's in binary format, and according to MSDN it is:
The low 32 bits of the time stamp of the image. This represents the date and time the image was created by the linker. The value is represented in the number of seconds elapsed since midnight (00:00:00), January 1, 1970, Universal Coordinated Time, according to the system clock.
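
If you would rather read that field programmatically than through ILDASM, a minimal C# sketch (the path is a placeholder) can pull the same value straight out of the file:

    using System;
    using System.IO;

    class PeTimestamp
    {
        static void Main()
        {
            byte[] bytes = File.ReadAllBytes("program.exe");

            // e_lfanew at offset 0x3C in the DOS header points at the "PE\0\0"
            // signature; the COFF TimeDateStamp field sits 8 bytes after that
            // (4-byte signature + Machine(2) + NumberOfSections(2)).
            int peOffset = BitConverter.ToInt32(bytes, 0x3C);
            uint stamp = BitConverter.ToUInt32(bytes, peOffset + 8);

            DateTime linkTime = DateTimeOffset.FromUnixTimeSeconds(stamp).UtcDateTime;
            Console.WriteLine($"Linker timestamp (UTC): {linkTime}");
        }
    }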
The first thing you should do is copy all of the data as it stands to another device, making sure you preserve all date/time stamps. Do not open or edit any of the files.
Each file has three timestamps: when it was created, when it was last accessed, and when it was last written. These can be displayed using DIR /T:
    /T:timefield   Controls which time field is displayed or used for sorting
                     C  Creation
                     A  Last Access
                     W  Last Written
Get a listing of the directory like this:
DIR myrootdir /s /a /tc > fileslist.txt
This will dump out all the files (including hidden and system files) with their creation times to a file called fileslist.txt
Also, as @EricJ says: offer your disk as evidence, but like I said, make a copy first. It would be best to make an image copy (Windows backup) to another drive first.
The investigators are going about this all the wrong way.
Any timestamp data can be faked, so the best way would be for them to sit down with both parties separately and ask detailed questions about how the code works.
Or to ask both parties to complete a small test project, again separately, under exam conditions.
The one who copied the work most likely won't understand what they copied, and won't be able to reproduce something based on similar concepts.
The one who did write it - well, unless they cheated too, they will understand it all in depth.

Problems with KML layers

My map has nine layers which can be toggled on and off by means of a tick box. Each layer is loaded in by JS. My problem is that only five of the nine load when their tick box is active. Is there a limit on how many KML layers can be loaded into the v3 API? In total there is less than 1 MB of data, though the layers do have quite a few custom marker icons.
Thanks
Darren
I am having an eerily familiar-sounding issue. I am working with KML files in a similar fashion (an array of map layers as KML files are picked/loaded by JS via tick boxes; not big, although with quite a few custom marker icons, long descriptions, etc.).
I have just now become aware of what seems like a NEW limit. I wish I could be more help, but I will share what I know.
In my case, the exact same code was working fine as far back as June 2011, and the map has been viewed/used by thousands of people. It's a custom v3 API map for an annual event. The 'legend' has about 34 layers containing lines or markers, classified by type, any of which could be 'on' at any given time. The default, and most commonly used, setup had about 12 "layers" on. Only one of the layers (it happens to be the last one in the array when it loads) changes significantly from year to year. It's also the most important one. If people had been getting significant errors in June 2011 in some browser/OS combination, I would have heard about it. It was this file that I sat down to "quickly" edit and re-upload in time for the 2012 event. I hadn't even touched it before I noticed that something was wrong. I haven't needed to look at the map since last year, so unfortunately I can't be more precise as to when this started happening.
Findings/Conclusions/Ideas?:
dumping the browser cache does nothing
Through trial and error, I found that it will actually load up to six "layers", but it seems to depend on the size/type/juju(?) of any given KML file in the comma-delimited list. But, like you, it's usually around 5 or 6.
Oddly, the KML files themselves seem to be there in the background. Invisible markers will affect the mouse pointer "on hover over" and their info boxes will show up when clicked. Strange.
This problem is definitely less than a year old for me.
If it's not somehow related to some new action of Google's (in which case I would expect we'd have heard about it or found something in the API forums), perhaps it is related to some new limit placed by one of our servers, and the similarity is simply a coincidence?
---> Hmm. I recently had a menu-item limit issue on a WordPress site and got my hosting provider to change some Suhosin request limits(?). I can try migrating the source file to another server and see if it works from there. If it does, that would be a significant clue, and I will report back.
Other than that, I have no idea. Just thought I'd share what I've learned and hope someone else has some idea of what's going on.

Error checking overkill?

What error checking do you do? What error checking is actually necessary? Do we really need to check if a file has saved successfully? Shouldn't it always work if it's tested and works OK from day one?
I find myself error checking for every little thing, and most of the time it feels like overkill. Things like checking to see if a file has been written to the file system successfully, or checking to see if a database statement failed... shouldn't these be things that either work or don't?
How much error checking do you do? Are there elements of error checking that you leave out because you trust that it'll just work?
I'm sure I remember reading somewhere something along the lines of "don't test for things that'll never really happen"... I can't remember the source, though.
So should everything that could possibly fail be checked for failure? Or should we just trust those simpler operations? For example, if we can open a file, should we check to see if reading each line failed or not? Perhaps it depends on the context within the application or the application itself.
It'd be interesting to hear what others do.
UPDATE: As a quick example: I save an object that represents an image in a gallery. I then save the image to disk. If the saving of the file fails, I'll have no image to display even though the object thinks there is an image. I could check for failure of the image saving to disk and then delete the object, or alternatively wrap the image save in a transaction (unit of work) - but that can get expensive when using a db engine that uses table locking.
Thanks,
James.
If you run out of free space, try to write a file, and don't check for errors, your application will fail silently or with unhelpful messages. I hate it when I see this in other apps.
I'm not addressing the entire question, just this part:
So should everything that could possibly fail be checked for failure? Or should we just trust those simpler operations?
It seems to me that error checking is most important when the NEXT step matters. If failure to open a file will allow error messages to get permanently lost, then that is a problem. If the application will simply die and give the user an error, then I would consider that a different kind of problem. But silently dying, or silently hanging, is a problem that you should really do your best to code against. So whether something is a "simple operation" or not is irrelevant to me; it depends on what happens next, or what would be the result if it failed.
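
As a hypothetical illustration of that point in C#: if the error log itself can't be opened, failing silently would permanently lose every later message, so that particular check deserves a loud fallback.

    using System;
    using System.IO;

    class Logger
    {
        private readonly TextWriter writer;

        public Logger(string path)
        {
            try
            {
                writer = new StreamWriter(path, append: true);
            }
            catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
            {
                // The NEXT step (recording errors) depends on this file, so a
                // silent failure here would lose messages forever. Fall back
                // somewhere visible instead of dying quietly.
                Console.Error.WriteLine($"Log file unavailable ({ex.Message}); logging to stderr.");
                writer = Console.Error;
            }
        }

        public void Log(string message) => writer.WriteLine($"{DateTime.UtcNow:o} {message}");
    }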
I generally follow these rules.
Excessively validate user input.
Validate public APIs.
Use Asserts that get compiled out of production code for everything else.
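
A hypothetical C# sketch of the last two rules: the public method validates its arguments unconditionally, while the private helper relies on Debug.Assert, which the compiler strips from builds that don't define the DEBUG symbol.

    using System;
    using System.Diagnostics;

    static class Stats
    {
        // Public API boundary: validate even in release builds.
        public static double Average(int[] values)
        {
            if (values == null || values.Length == 0)
                throw new ArgumentException("values must be non-empty", nameof(values));
            return Sum(values) / (double)values.Length;
        }

        // Internal helper: the caller has already validated, so an assert that
        // disappears from release builds is enough documentation and defense.
        private static long Sum(int[] values)
        {
            Debug.Assert(values != null && values.Length > 0, "caller must validate values");
            long total = 0;
            foreach (int v in values) total += v;
            return total;
        }
    }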
Regarding your example...
I save an object that represents an image in a gallery. I then save the image to disk. If the saving of the file fails, I'll have no image to display even though the object thinks there is an image. I could check for failure of the image saving to disk and then delete the object, or alternatively wrap the image save in a transaction (unit of work) - but that can get expensive when using a db engine that uses table locking.
In this case, I would recommend saving the image to disk first before saving the object. That way, if the image can't be saved, you don't have to try to roll back the gallery. In general, dependencies should get written to disk (or put in a database) first.
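
A sketch of that ordering with placeholder names (SaveRecord stands in for the real database insert): because the file is written first, a failed write leaves nothing to roll back.

    using System;
    using System.IO;

    class Gallery
    {
        public void AddImage(byte[] imageBytes, string imagePath)
        {
            try
            {
                // The dependency (the file) is written before the record that
                // points to it. May throw on disk full, permissions, etc.
                File.WriteAllBytes(imagePath, imageBytes);
            }
            catch (Exception ex) when (ex is IOException or UnauthorizedAccessException)
            {
                // No record exists yet, so there is nothing to clean up.
                Console.Error.WriteLine($"Image not saved, gallery unchanged: {ex.Message}");
                return;
            }

            SaveRecord(imagePath); // only runs once the file definitely exists
        }

        // Placeholder for the real database insert.
        private void SaveRecord(string imagePath) { /* ... */ }
    }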
As for error checking... check for errors that make sense. If fopen() gives you a file ID and you don't get an error, then you don't generally need to check for fclose() on that file ID returning "invalid file ID". If, however, file opening and closing are disjoint tasks, it might be a good idea to check for that error.
This may not be the answer you are looking for, but there is only ever a 'right' answer when looked at in the full context of what you're trying to do.
If you're writing a prototype for internal use, where the odd error doesn't matter, then you're wasting time and company money by adding in the extra checking.
On the other hand, if you're writing production software for air traffic control, then the extra time to handle every conceivable error may be well spent.
I see it as a trade-off: the extra time spent writing the error-handling code versus the benefit of having handled that error if and when it occurs. Religiously handling every error is not necessarily optimal, IMO.