Bulk text2speech Generation with R

Is it possible to loop through a list of words in R so that each one is turned into a separate speech file using the text2speech website?
https://www.text2speech.org/
To make one file manually, you type the text on one page and submit it. A second page then opens with the option to download the file. Since I want to make many of these, I would like to automate the process, but I have no idea how to approach it.
EDIT
So I am using "say" on Mac, based on the helpful comments. I am running it through R, looping over all the strings in a vector:
for(i in 1:nrow(test[1:5])){
system("say", intern =F,input = test$English[i])%>%saveRDS(paste0("/Users/Desktop/tts/", test$English[i],".aiff"))
}
This creates the files as expected in the expected location but the .aiff files won't play in any media player. Does anyone see what I am doing wrong?

Consider using a CLI-based solution for TTS, like espeak/espeak-ng (cross-platform), festival (Linux), say (macOS), or PowerShell on Windows (here's a script). It looks like the page you're referencing uses flite, which is a lightweight (and downloadable) version of festival.
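As for the EDIT in the question: the loop pipes the return value of system() into saveRDS(), so each .aiff file holds serialized R data rather than audio, which would explain why no player opens it. The say command can write the audio file itself through its -o option. Here is a minimal Python sketch of that idea; the function names and the output directory are assumptions for illustration, and the same command line could be built from R instead.

```python
import shutil
import subprocess
from pathlib import Path


def build_say_command(text, out_dir):
    """Return the argv for one macOS `say` call that writes an AIFF file."""
    # `-o` makes `say` write audio to a file instead of speaking aloud.
    out_file = Path(out_dir) / f"{text}.aiff"
    return ["say", "-o", str(out_file), text]


def generate_all(words, out_dir):
    """Run `say` once per word; do nothing where `say` is not installed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for word in words:
        cmd = build_say_command(word, out_dir)
        if shutil.which(cmd[0]) is None:
            continue  # not on macOS, so there is nothing to run
        subprocess.run(cmd, check=True)
        written.append(Path(cmd[2]))
    return written
```

Calling generate_all(["hello", "goodbye"], "tts") would then leave one .aiff per word in the tts directory. Words containing a slash would need sanitizing before being used as file names.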

Related

Passing multiple files to input node with Watir (using Ruby)

So I have hit a bit of a snag. I am trying to automate a test case where I need to pass multiple files to an input node and I cannot figure out how to do so. I can use either Mechanize or Watir, but have found very little information on a topic that seems relatively major in automation. In the snippets below, I'm using Watir with Ruby. The main issue I'm having is that it seems when multiple files are selected, the input node is no longer visible. The input node does accept multiple files, and passing in a single path does result in a successful upload, like so.
path1 = "/path/to/file.json"
file_field.set path1
I would think that passing in multiple files would be as simple as passing in a string with multiple paths separated by some sort of delimiter. I'm not particularly savvy with web dev however, and am struggling to grasp where I should even start. When I attempt to pass in multiple files like so:
multiple_paths = ("/path/to/file1.json"; "/path/to/file2.json")
file_field.set multiple_paths
it uploads the second file but not the first (making me think maybe it's uploading them in sequence, and the second is overwriting the first).
Do you think this is even possible using Watir? I know that Chrome has a workaround for uploading multiple files using \n as the delimiter, is there a similar workaround for Firefox?

Currently there doesn't seem to be a workaround for Firefox. If anybody knows of one, please post the answer as I couldn't find a solution anywhere. I figured I'd post the solution for Chrome here because resources are scarce on this.
If you need to test for multiple file uploads, have that particular instance load the Chrome driver with:
@browser = Watir::Browser.new :chrome, :prefs => profile
Then you're going to want to pass it a string that looks something like this:
paths = "path/to/first/file.json\npath/to/second/file.json"  # ...etc
file_field.send_keys paths
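For comparison, the newline-joined string described above can be built from a list of paths. This is only a sketch in Python; join_upload_paths is a hypothetical helper name, and it assumes Chrome's behaviour is as reported in this answer.

```python
from pathlib import Path


def join_upload_paths(paths):
    """Join file paths with newlines, the form Chrome is reported to accept."""
    # Absolute paths are used because a driver generally cannot resolve relative ones.
    return "\n".join(str(Path(p).resolve()) for p in paths)
```

With Selenium's Python bindings, the result would be passed to send_keys on the file input element, much like file_field.send_keys paths in the Watir snippet.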

Scrape a part of website and notify on change

The website of my university unfortunately does not provide feeds, but it keeps publishing information that is important to me (deadlines, exam dates, etc.) as links to PDFs in a certain section of the site.
How can I regularly scrape that section and be notified of changes (Growl, mail, or something similar)?
Normally I would use wget to mirror it, but how do I extract only parts of the website?
Is there a CLI tool that can extract the XHTML via XPath or similar?

Try this:
wget --spider --server-response http://example.com
This prints the response headers, which may include Content-Length (wget reports it as "Length"). If that value changes, you can notify yourself.
Edit: If it changes, you can download the whole HTML file and grep for a PDF link or whatever else you are looking for (for example, "<div id='news'>(.*?)</div>").
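A length check can miss edits that leave the size unchanged, so comparing a hash of just the relevant section may be more reliable. The following Python sketch assumes the section is the div with id 'news' from the example pattern above; the function names are made up for illustration, and the non-greedy regex stops at the first closing div, so nested divs would need a real HTML parser.

```python
import hashlib
import re

# Assumed marker: the section of interest is <div id='news'>...</div>.
SECTION_RE = re.compile(r"<div id=['\"]news['\"]>(.*?)</div>", re.DOTALL)


def section_fingerprint(html):
    """Return a SHA-256 hex digest of the news section, or None if it is absent."""
    match = SECTION_RE.search(html)
    if match is None:
        return None
    return hashlib.sha256(match.group(1).encode("utf-8")).hexdigest()


def has_changed(old_html, new_html):
    """True when the extracted section differs between two page downloads."""
    return section_fingerprint(old_html) != section_fingerprint(new_html)
```

Run from cron against the previous and the current download, has_changed would be the trigger for sending the notification.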

Mmm... You should take a look at QueryPath. QueryPath makes it easy to parse HTML. What if the HTML structure changes? What if you want specific elements of the page? QueryPath does the hard work for you. Do you like jQuery? QueryPath is like the jQuery of PHP.
See: http://www.ibm.com/developerworks/opensource/library/os-php-querypath/index.html?S_TACT=105AGX01&S_CMP=HP
See: http://querypath.org/

You might be interested in looking at Pjscrape (disclaimer: this is my project). It's a web-scraping tool built on PhantomJS, giving you full jQuery access to the page in a headless WebKit browser context. It makes it very easy to pull semi-structured data from webpages via the command line, particularly if the page you're scraping has a consistent structure for new elements.
For example, you can pull all the course titles from this course catalog with the following code:
pjs.addScraper(
    // the page you're scraping
    'http://www.ischool.berkeley.edu/courses/catalog',
    // selector for elements you want to pull text from
    '.views-row .views-field-title'
);
// suppress STDOUT logging
pjs.config('log', 'none');
Running this from the command line gives you JSON to STDOUT by default:
~> phantomjs /path/to/pjscrape.js my_script.js
["W10. Introduction to Information","24. Freshman Seminar", ...]
So it would be pretty simple to run this script on a regular basis, capture the output in a file, and then alert you when the new output doesn't match the previous scrape. You can also write your own scraper functions, so there's a lot of flexibility for more complex scraping if a simple selector won't do the trick.
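The comparison step could look something like the Python sketch below. It assumes both scrapes were saved as JSON arrays of strings, as in the output above; new_items is a hypothetical helper, not part of Pjscrape.

```python
import json


def new_items(previous_json, current_json):
    """Return entries in the current scrape that were absent from the previous one."""
    previous = set(json.loads(previous_json))
    # Keep the order of the current scrape so the result is stable.
    return [item for item in json.loads(current_json) if item not in previous]
```

An empty result means nothing new was published since the last run; anything else is what the alert should mention.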

Externally triggering Thunderbird into displaying a wanted message

I would like a way to trigger Thunderbird, from an external script, into displaying a particular message in a particular folder.
If it were Firefox, say, I would use firefox -new-tab http://some-URL, and an already running Firefox (or a new one if none) would nicely fetch and display the URL. But I found no way to do something equivalent with Thunderbird, neither on the Thunderbird site nor through existing extensions, even after some furious Googling, which I attempted more than once!
One problem, compared to a plain URL, is the need for some notation to select a message. Short of a better solution, I wrote a script which understands folder:SOME-FOLDER:ORDINAL and behaves like an extension of xdg-open. My tool inserts a proper prefix and a few .sbd as needed within the SOME-FOLDER part to turn it into an absolute Thunderbird file reference, and ORDINAL picks a message in that folder. My tool then grabs the message, heuristically converts it into an HTML file, and directs a Web browser to the resulting file (if :ORDINAL is not given, it processes the whole folder instead, yielding an HTML index and many linked messages).
My current tool helps a bit with saving message references in other documents and efficiently retrieving them later, but I handle a copy of the Thunderbird message, not the original. So if I want to delete it, refile it in another Thunderbird folder, or do other similar operations, I still have to go to Thunderbird and interactively find my way back to the wanted message before I can handle it, and this is not efficient. What I'm dreaming of is a way to get rid of all my HTML conversion and browser trickery, but still keep the pseudo-URL paradigm and pseudo xdg-open interface, to directly force Thunderbird into the correct folder, with the wanted message correctly displayed.
In previous email readers I used (Emacs RMAIL and then Gnus, and Mutt as well later), such things could be managed, and I heavily used such capabilities in scripts. I am astonished, surprised, even a bit dismayed, by the apparent weakness of Thunderbird as a scriptable mail reader. Am I missing something evident? Any avenue or suggestion?
François
P.S. Of course, I agree that using ORDINAL is not very clever. It might mean a different message if the folder gets some messages added or deleted. This is the lesser evil. A better but potentially heavier notation might use Message-ID values, but then an index would also be needed to find the Thunderbird folder containing each message.
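For readers unfamiliar with the layout, the folder:SOME-FOLDER:ORDINAL notation could be resolved roughly as in this Python sketch. It is not the author's script: the function name, the use of '/' to separate nested folders, and the mail root are all assumptions. It relies only on the convention that Thunderbird keeps the subfolders of a folder in a sibling directory with the .sbd suffix.

```python
from pathlib import PurePosixPath


def parse_folder_ref(ref, mail_root):
    """Split 'folder:SOME-FOLDER[:ORDINAL]' into (mbox path, ordinal or None)."""
    scheme, _, rest = ref.partition(":")
    if scheme != "folder" or not rest:
        raise ValueError(f"not a folder reference: {ref!r}")
    folder, _, ordinal = rest.partition(":")
    parts = folder.split("/")  # assumption: '/' separates nested folders
    # Each parent folder becomes a 'Name.sbd' directory; the leaf is the mbox file.
    parents = [name + ".sbd" for name in parts[:-1]]
    path = PurePosixPath(mail_root).joinpath(*parents, parts[-1])
    return path, (int(ordinal) if ordinal else None)
```

For example, "folder:Lists/python:12" with a mail root of /home/me/mail would resolve to /home/me/mail/Lists.sbd/python and ordinal 12.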

There seems to be some way to do it, since Google Desktop supported it according to this thread: http://forums.mozillazine.org/viewtopic.php?f=39&t=584542. Perhaps try installing Google Desktop and see what kind of hyperlink it's using?
I'll add that Outlook supports external hyperlinks through the outlook: naming scheme, for example outlook:Inbox or outlook:0000000007A2379547B0624691F4FB2E5468A0D7642E2000. See http://www.davidtan.org/create-hyperlinks-to-outlook-messages-folders-contacts-events/ for more info.

Refresh browser via cron (or not) to a different page on remote request?

I need to display pages in a tutorial fashion. I looked into netsupport, beamyourscreen and other possibilities, but I do not want the viewers to download anything. I cannot use gd or send screenshots because of audio/video instructions embedded in some of the pages.
Basically, I need the ability to "refresh" a user's browser window to a different page via an interface on my end, whether via a form submission, JavaScript or any other type of "controller" that allows me to change the page in the viewer's browser. Perl is preferred, but PHP, JavaScript or whatever works and is cross-browser will do. I set up a simple JavaScript page-forward timer that "works", but page load times and conversation interruptions are a huge factor.
The entire tutorial website will be developed around this ability.
I was looking into curl / cron / wget methods but found little information.
I have seen forum and chat scripts that basically perform a similar task, but there must be a simple(ish) solution in lieu of hacking up another script to suit my needs.
I do not want others to control the pages either. The site really only needs to be accessible during the tutorial; however, it "could" remain web accessible, with normal user interaction unless it is being controlled.
The initial site concept is based on instructing people how to properly introduce new pets into a home. It will be operated by a veterinarian who saved my pet's life. I wanted to give something back.
Possible? I really appreciate simple examples etc...

You have no other way but to keep polling the server for "instructions" using JavaScript. You can't push anything to the end user's browser, with either curl or wget.
Mainly, you'll have to set up a simple request/response protocol between the browser and the server.
If you want to go deeper, you can use something like cometd/meteord/etc. If not, a hidden iframe that reloads itself and receives pages with JavaScript code for the needed actions can do the trick.
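To make the request/response idea concrete, here is a small Python sketch of the server-side state only. The class, method and field names are all hypothetical. The browser side would be a JavaScript timer that sends the last version it has seen and follows any "goto" reply.

```python
class PageController:
    """Server-side state for a poll-based 'follow the presenter' protocol."""

    def __init__(self, start_page):
        self.version = 0  # bumped on every page change
        self.page = start_page

    def set_page(self, page):
        """Called from the presenter's interface to move every viewer on."""
        self.version += 1
        self.page = page

    def poll(self, seen_version):
        """Answer one viewer poll: a redirect if the viewer is behind, else a no-op."""
        if seen_version < self.version:
            return {"action": "goto", "page": self.page, "version": self.version}
        return {"action": "wait", "version": self.version}
```

Sending a version number with each poll keeps a viewer from being redirected to the page it is already on.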

Another alternative.
Do the polling with JavaScript and a single-character flat file. Keep a simple flat file that holds a single variable, one character long. Write the server side in Perl (it is faster and uses fewer resources than PHP). The parent page polls for the variable in the flat file and goes wherever that variable sends it. The controller writes to the flat file. Done.
I guess you could also rename an empty flat file and use its name as the controller. I am unsure which is faster: opening and reading a specific file, or hitting the directory and returning the file name. The same goes for the controller side: opening and writing to a file vs. renaming a file. Maybe they cancel each other out in resources and time?
This way the site can act as a normal site. When you want remote users to see a "presentation" (automatically being shown the site pages at the controller's pace), the controller activates polling and tells the viewers to push a start button. This allows a remote instructor to load pages for the viewers at his leisure.
It is a simple solution that works with nothing really sophisticated going on. No frames are needed either. Just need javascript enabled.
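A rough Python sketch of the flat-file part, assuming hypothetical function names and a flag file that the web server can read. The controller writes a temporary file and renames it over the flag file, so a poll never reads a half-written value.

```python
import os
import tempfile


def write_current_page(flag_path, page):
    """Controller side: replace the flag file's contents in one atomic step."""
    directory = os.path.dirname(os.path.abspath(flag_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(page)
    os.replace(tmp_path, flag_path)  # rename over the old file

def read_current_page(flag_path, default=""):
    """Polling side: return the page named in the flag file, or a default."""
    try:
        with open(flag_path, encoding="utf-8") as f:
            return f.read().strip()
    except FileNotFoundError:
        return default
```

The script answering the JavaScript poll would return read_current_page(...) and the page would navigate whenever the value differs from the one it is showing.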
Any better suggestions are welcome!

It occurred to me that what you might want to use is HTML push technology. Check out the wiki article; it has several links. I have never used it myself.

Automate adding entries to a wiki

Once I have my renamed files I need to add them to my project's wiki page. This is a fairly repetitive manual task, so I guess I could script it but I don't know where to start.
The process is:
Go to the appropriate page on the wiki
for each team member (DeveloperA, DeveloperB, DeveloperC)
{
    for each of two files ('*_current.jpg', '*_lastweek.jpg')
    {
        Select the 'Attach' link on the page
        Select the 'manage' link next to the file to be updated
        Click the 'Browse' button
        Browse to the relevant file (which has the same name as the previous version)
        Click the 'Upload file' button
    }
}
Not necessarily looking for the full solution as I'd like to give it a go myself.
Where to begin? What language could I use to do this and how difficult would it be?

Check if the wiki you mean to talk to supports XMLRPC, because if it does it should be a snap. I wrote a tool called WikiUp to solve a similar problem (updating a delineated section on a wiki page).

If you're writing in C#, the WebClient classes might be a good place to start. I bet people could give more specific advice if you mentioned which wiki platform you are using, and whether it requires authentication, though.
I'd probably start by downloading Fiddler and watching the HTTP requests made while doing it manually. Then you could use some simple scripts and regexes to build your HTTP requests for automating the process.
Of course, if you're wildly lucky, your wiki will have a backend simple enough that you can just plug the files into its database directly. :)

You might find CoScripter useful -- it's a Firefox extension that allows you to automate tasks you perform on websites. I'm not certain how you'd integrate this with the list of files you're changing on your local system, but it can certainly handle the file uploading through a web form.
A better bet is probably using cURL or a similar HTTP library with your programming language of choice. If you're on *nix, you can use the cURL command-line program inside your shell script to get this done fairly easily. (Like @jsight said, you will need to analyze the actual forms you're using on the webpage, using Fiddler or just looking at the form elements and re-creating the POST through cURL.)
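Whatever tool sends the requests, the nested loop from the question is a good place to start. The Python sketch below only enumerates the uploads to perform. The file naming ('<developer>_current.jpg' and so on) is an assumption, and the upload itself is left out because it depends on the wiki's form fields.

```python
from itertools import product
from pathlib import Path

DEVELOPERS = ["DeveloperA", "DeveloperB", "DeveloperC"]
SUFFIXES = ["_current.jpg", "_lastweek.jpg"]


def upload_tasks(image_dir):
    """Yield (developer, file path) pairs in the order of the manual process."""
    for developer, suffix in product(DEVELOPERS, SUFFIXES):
        # Assumed naming: files are called '<developer><suffix>'.
        yield developer, Path(image_dir) / f"{developer}{suffix}"
```

Each pair would then become one POST, once the form has been inspected as described above.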
