Automate adding entries to a wiki - scripting

Once I have my renamed files I need to add them to my project's wiki page. This is a fairly repetitive manual task, so I guess I could script it but I don't know where to start.
The process is:
Go to the appropriate page on the wiki
for each team member (DeveloperA, DeveloperB, DeveloperC)
{
for each of two files ('*_current.jpg', '*_lastweek.jpg')
{
Select 'Attach' link on page
Select the 'manage' link next to the file to be updated
Click 'Browse' button
Browse to the relevant file (which has the same name as the previous version)
Click 'Upload file' button
}
}
Not necessarily looking for the full solution as I'd like to give it a go myself.
Where to begin? What language could I use to do this and how difficult would it be?

Check if the wiki you mean to talk to supports XMLRPC, because if it does it should be a snap. I wrote a tool called WikiUp to solve a similar problem (updating a delineated section on a wiki page).

If you're writing in C#, the WebClient classes might be a good place to start. I bet people could give more specific advice if you mentioned which wiki platform you are using, and whether it requires authentication, though.
I'd probably start by downloading Fiddler and watching the HTTP requests as you do it manually. Then you could use some simple scripts and regexes to build your HTTP requests for automating the process.
Of course, if you're wildly lucky, your wiki would have a backend simple enough that you could just plug the files into its db directly. :)

You might find CoScripter useful -- it's a Firefox extension that allows you to automate tasks you perform on websites. I'm not certain how you'd integrate this with the list of files you're changing on your local system, but it can certainly handle the file uploading through a web form.
A better bet is probably using cURL or a similar HTTP library with your programming language of choice. If you're on *nix, you can use the cURL command-line program inside your shell script to get this done fairly easily. (As jsight said, you will need to analyze the actual forms you're using on the webpage, using Fiddler or just looking at the form elements, and re-create the POST through cURL.)
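For what it's worth, a minimal sketch of what that replayed upload request might look like in Python with the requests library is below. The attach URL, form field names, page name, and credentials are assumptions - take the real values from whatever Fiddler (or the page's form markup) shows for your wiki.
# Sketch of replaying the wiki's "Upload file" POST; all values are placeholders.
import requests

WIKI_ATTACH_URL = "http://wiki.example.com/attach"   # hypothetical endpoint
session = requests.Session()
session.auth = ("username", "password")              # only if the wiki needs auth

for dev in ["DeveloperA", "DeveloperB", "DeveloperC"]:
    for suffix in ["current", "lastweek"]:
        filename = f"{dev}_{suffix}.jpg"
        with open(filename, "rb") as fh:
            resp = session.post(
                WIKI_ATTACH_URL,
                data={"page": "TeamStatus"},          # hypothetical form field
                files={"upload": (filename, fh, "image/jpeg")},
            )
        resp.raise_for_status()
        print(filename, resp.status_code)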

Related

How to get the Webtop DRL of a document via a .NET application?

Is it possible to retrieve the DRL (e.g. https://host:port/ewebtop/drl/objectId/0900a58e80970f7b) of a document via a .NET application? The goal is that when users click this link they can edit the document, and when they close it the document is automatically saved back into Documentum.
First of all: a link is a link. What you decide to do with it is up to you. The default handler in the browser will just redirect you to the Webtop application. If you have SSO you can have the document opened for edit. There are some extra arguments that can be provided (view/edit).
The object id is the only varying part of the URL, so you can easily construct this in code.
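For illustration only (sketched in Python here - adapt it to your .NET code), building the DRL is just string construction around the object id; host and port are the placeholders from the question:
# Hypothetical sketch: building a Webtop DRL from an object id.
def build_drl(host, port, object_id):
    return f"https://{host}:{port}/ewebtop/drl/objectId/{object_id}"

print(build_drl("host", 443, "0900a58e80970f7b"))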
Secondly: what is your goal? There is no way to make the document upload itself into the Documentum repo. You can write a plugin for every application to handle that, but it seems like a big task - especially dealing with security.
The problem is that upon check-in, the user must provide some information - at least the new version number...
If you're building a thick client in .NET I would go with DFS - that's the only real option here.

Does google index script tags as content when using handlebars.js

If you use the standard handlebars.js implementation, does Google view the content within the custom script tags as content, script or unknown content?
If you're in doubt, do it in pure HTML. Unfortunately, Google will most likely ignore this content. From what I've read, handlebars.js was not made to be search-friendly.
Google does, in fact, understand and even follow links created via JavaScript, but handlebars.js templates are much more complex than that.
Possible solution
My strong suggestion is to first load a simplified version of the page with some content in plain HTML, and apply handlebars.js afterwards, so that at least Google isn't left completely blind. But this version should also be served to end users, because Google will know if you show different content just for Googlebot.
Possible solution 2
There is a way to make websites that rely heavily on AJAX still crawlable, described in Google's "Making AJAX Applications Crawlable" guide.

Meteor File Uploads

I see that this has been asked here before, but nothing since Meteor.http has been available. I'm still grasping the concepts of Meteor and file uploads are totally eluding me.
Here's my question:
So, in what I believe to be the right method,
Meteor.http.call("POST", url, [options], [asyncCallback]) - what do you put for the url? With the client/server JavaScript relationship in Meteor, it doesn't seem like it really uses URLs that much.
If anyone has a basic example of a file upload in meteor, that would just be extra awesome.
Well, I've been playing a bit with Meteor and made CollectionFS, a mix of Meteor and GridFS (they could be compatible).
Test it here: http://collectionfs.meteor.com/
It supports quite large files, multiple files, users, etc. I've tested a 50 MB file and it seems OK; if the connection is lost or the browser dies, the user can resume the upload.
It should even be possible to have multiple users upload to the exact same file - I haven't quite found a use case for it, but it's possible.
Accounts, publishing, etc. work as with collections - the test is in autopublish mode, though only metadata is available - chunks of data are served in the background via blobs.
I'll try getting it on GitHub.
Take a look at filepicker.io. They handle the upload, store it into your S3, and return to you the url that you can dump into your db.
Wget the filepicker script into your client folder.
wget https://api.filepicker.io/v0/filepicker.js
Insert a filepicker input tag
<input type="filepicker" id="attachment">
In the startup, initialize it:
Meteor.startup(function() {
  filepicker.setKey("YOUR FILEPICKER API KEY");
  filepicker.constructWidget(document.getElementById('attachment'));
});
Attach an event handler
Template.templateNameHere.events({
  'change #attachment': function(evt) {
    console.log(evt.files);
  }
});
(I had posted on How would one handle a file upload with Meteor? Sorry. I'm new here. Is it kosher to copy the same answer twice? Anyone who knows better can feel free to edit this.)
Check out how to accomplish this using a Meteor method on the server and the FileReader API on the client:
https://gist.github.com/dariocravero/3922137
After several searches, this looks to me like the easiest (and, for the moment, the most Meteor-style) way to handle a file upload with no extra dependencies.
Since Meteor includes jQuery by default, you can utilize a jQuery plugin for that, I presume; something like https://github.com/blueimp/jQuery-File-Upload/wiki/Options can do the trick for you, and it supports both GET and PUT.
Otherwise it would be a pain in the ass to get it to work, but not impossible, since you can access PUT in Meteor.
If you would prefer a more pure JS solution, maybe you can look at http://igstan.ro/posts/2009-01-11-ajax-file-upload-with-pure-javascript.html and adapt it.
There is no ready-made support for file uploads, so share what you come up with - I would be very interested!
Alternatively (if you wouldn't like to use a 3rd-party solution like filepicker) you could use the Meteor Router package.
This handles the HTTP requests on the server side.

Figure out if a website has restricted/password protected area

I have a big list of websites and I need to know if they have areas that are password protected.
I am thinking about doing this: downloading all of them with httrack and then writing a script that looks for keywords like "Log In" and "401 Forbidden". But the problem is that these websites are all different - some static and some dynamic (HTML, CGI, PHP, Java applets...) - and most of them won't use the same keywords...
Do you have any better ideas?
Thanks a lot!
Looking for password fields will get you so far, but won't help with sites that use HTTP authentication. Looking for 401s will help with HTTP authentication, but won't get you sites that don't use it, or ones that don't return 401. Looking for links like "log in" or "username" fields will get you some more.
I don't think that you'll be able to do this entirely automatically and be sure that you're actually detecting all the password-protected areas.
You'll probably want to take a library that is good at web automation, and write a little program yourself that reads the list of target sites from a file, checks each one, and writes to one file of "these are definitely passworded" and another of "these are not". Then you might want to go manually check the ones that are not, and make modifications to your program to accommodate. Using httrack is great for grabbing data, but it's not going to help with detection -- if you write your own "check for password protected area" program with a general-purpose HLL, you can do more checks, and you can avoid generating more requests per site than would be necessary to determine that a password-protected area exists.
You may need to ignore robots.txt
I recommend using the Python port of Perl's Mechanize, or whatever nice web automation library your preferred language has. Almost all modern languages will have a nice library for opening and searching through web pages, and looking at HTTP headers.
If you are not capable of writing this yourself, you're going to have a rather difficult time using httrack or wget or similar and then searching through responses.
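As a concrete starting point, here is a rough sketch of that kind of checker in Python using the requests library; the heuristics (password fields, login-ish text, 401 responses) are the ones discussed above, and the file names are placeholders.
# Rough sketch of the checker described above; heuristics and file names are illustrative.
import re
import requests

def looks_passworded(url):
    # Returns True/False, or None if the site was unreachable.
    try:
        resp = requests.get(url, timeout=10)
    except requests.RequestException:
        return None
    if resp.status_code == 401:                       # HTTP authentication
        return True
    html = resp.text.lower()
    if re.search(r'type\s*=\s*["\']?password', html): # password field in a form
        return True
    if re.search(r'log\s*in|sign\s*in|username', html):
        return True
    return False

with open("sites.txt") as sites, \
     open("passworded.txt", "w") as yes, \
     open("not_passworded.txt", "w") as no:
    for line in sites:
        url = line.strip()
        if not url:
            continue
        result = looks_passworded(url)
        if result is None:
            continue                                  # unreachable; check manually
        (yes if result else no).write(url + "\n")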
Look for forms with password fields.
You may need to scrape the site to find the login page. Look for links with phrases like "log in", "login", "sign in", "signin", or scrape the whole site (needless to say, be careful here).
I would use httrack with several limits and then search the downloaded files for password fields.
Typically, a login form could be found within two links of the home page. Almost all ecommerce sites, web apps, etc. have login forms that are accessed just by clicking on one link on the home page, but another layer or even two of depth would almost guarantee that you didn't miss any.
I would also limit the speed at which httrack downloads, tell it not to download any non-HTML files, and prevent it from downloading external links. I'd also limit the number of simultaneous connections to the site to 2 or even 1. This should work for just about all of the sites you are looking at, and it should keep you off the hosts.deny list.
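Once httrack has mirrored a site, the offline search itself is simple; a minimal sketch in Python (the directory name and the regex are placeholders you would tune):
# Sketch: scan files mirrored by httrack for password inputs.
import pathlib
import re

PASSWORD_FIELD = re.compile(r'type\s*=\s*["\']?password', re.IGNORECASE)

for path in pathlib.Path("mirror").rglob("*.htm*"):   # httrack output directory
    text = path.read_text(errors="ignore")
    if PASSWORD_FIELD.search(text):
        print("password field found in", path)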
You could just use wget and do something like:
wget -A html,php,jsp,htm -S -r http://www.yoursite.com 2> output_yoursite.txt
This will cause wget to download the entire site recursively, but only download files with the extensions listed in the -A option - in this case, to avoid heavy files. (Note that wget writes the -S header output to stderr, hence the 2> redirect.)
The headers will be written to output_yoursite.txt, which you can then parse for the status code 401, which means that part of the site requires authentication; you can also search the downloaded files according to Konrad's recommendation.
Looking for 401 codes won't reliably catch them as sites might not produce links to anything you don't have privileges for. That is, until you are logged in, it won't show you anything you need to log in for. OTOH some sites (ones with all static content for example) manage to pop a login dialog box for some pages so looking for password input tags would also miss stuff.
My advice: find a spider program that you can get source for, add in whatever tests (plural) you plan on using, and make it stop at the first positive result. Look for a spider that can be throttled way back, can ignore non-HTML files (maybe by making HEAD requests and looking at the MIME type), and can work with more than one site independently and simultaneously.
You might try using cURL and just attempting to connect to each site in turn (possibly put them in a text file and read each line, try to connect, repeat).
You can set up one of the callbacks to check the HTTP response code and do whatever you need from there.
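A minimal sketch of that loop in Python using pycurl (the input file name is a placeholder); reading the response code after the transfer stands in for the callback mentioned above:
# Sketch: record the HTTP response code for each site (401 => HTTP auth in use).
import pycurl

with open("sites.txt") as sites:
    for line in sites:
        url = line.strip()
        if not url:
            continue
        curl = pycurl.Curl()
        curl.setopt(pycurl.URL, url)
        curl.setopt(pycurl.NOBODY, True)              # HEAD-style request, skip the body
        curl.setopt(pycurl.FOLLOWLOCATION, True)
        try:
            curl.perform()
            print(url, curl.getinfo(pycurl.RESPONSE_CODE))
        except pycurl.error as exc:
            print(url, "error:", exc)
        finally:
            curl.close()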

Refresh browser via cron(or not) to a different page on remote request?

I need to display pages in a tutorial fashion. I looked into NetSupport, BeamYourScreen and other possibilities, but I do not want the viewers to download anything. I cannot use GD / send screenshots due to audio/video instructions embedded in some of the pages.
Basically, I need the ability to "refresh" a user's browser window to a different page via an interface on my end - whether via a form submission, JavaScript or any other type of "controller" that allows me to change the page in the viewer's browser. Perl preferred, but PHP / JavaScript - whatever works and is cross-browser. I set up a simple JavaScript page-forward timer that "works", but page load times and conversation interruptions are a huge factor.
The entire tutorial website will be developed around this ability.
I was looking into curl / cron / wget methods but found little information.
I have seen forum and chat scripts that basically perform a similar task, but there must be a simple(ish) solution in lieu of hacking up another script to suit my needs.
I do not want others to control the pages either. The site really only needs to be accessible during the tutorial; however, it could remain web-accessible as long as user interaction is normal when it isn't being controlled.
The initial site concept is based on instructing people how to properly introduce new pets into a home. It will be operated by a veterinarian that saved my pet's life. I wanted to give something back.
Possible? I really appreciate simple examples etc...
You have no other way but to keep polling the server for "instructions" using JavaScript. No, you can't push anything to the end user's browser with curl or wget.
Mainly, you'll have to set up a simple request/response protocol between the browser and the server.
If you want to go deeper, you can use something like cometd/meteord/etc. If not, a hidden iframe that reloads itself and receives pages with JavaScript code for the needed actions can do the trick.
Another alternative.
With JavaScript polling and a single-character flatfile. Have a simple one-character flatfile with a single var. Write it in Perl (it is faster and uses fewer resources than PHP). The parent script polls a JavaScript variable held in the flatfile: it hits the flatfile and goes wherever the var sends it. The flatfile is written to by the controller. Done.
I guess you could also rename an empty flatfile and use that as the controller. I am unsure which is faster: opening and reading a specific file, or hitting the directory and returning the file name. On the controller side, it's opening and writing to a file vs. renaming a file. Maybe they counter each other in resources and time?
This way the site can act as a normal site. When you want remote users to see a "presentation" (automatically being shown the site pages at the controller's pace), the controller activates polling and tells the viewers to push a start button. This allows a remote instructor to load pages for the viewers at his leisure.
It is a simple solution that works with nothing really sophisticated going on. No frames are needed either. Just need javascript enabled.
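A minimal sketch of that flatfile-plus-polling setup follows, written in Python rather than Perl just to show the shape of it; the file name, port, and endpoint names are all placeholders. The viewers' page would poll /current with JavaScript (e.g. a setInterval that fetches it and redirects when the value changes), while the instructor sets the value via /set.
# Sketch: single-variable flatfile "controller" served over HTTP.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

STATE_FILE = "current_page.txt"                       # the one-variable flatfile

class Controller(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path == "/current":                 # polled by the viewers' JavaScript
            try:
                with open(STATE_FILE) as fh:
                    body = fh.read().strip()
            except FileNotFoundError:
                body = ""
        elif parsed.path == "/set":                   # called by the instructor
            page = parse_qs(parsed.query).get("page", [""])[0]
            with open(STATE_FILE, "w") as fh:
                fh.write(page)
            body = "ok"
        else:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

HTTPServer(("", 8000), Controller).serve_forever()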
Any better suggestions are welcome!
It occurred to me that what you might want to use is HTML push technology. Check out the wiki; they have several links. I have never used it myself.