I have been given a task to configure thousands of servers with some simple data. Let's say I need to log in to each server (Linux or Windows) and set up the NTP server. I need to come up with some kind of automation framework using Perl. I have some ideas and want to get more.
Here is my thought process:
a) Since there are thousands of servers, the framework should definitely be able to read in a CSV file, so all inputs can be provided at once as opposed to one at a time.
b) Since there are so many servers, I have to find a way to do things in parallel. I can't go server by server sequentially.
c) I should have an output file that records the result for every server: which ones were configured successfully and which ones failed. That way I can compare the input file and the output file and generate a report.
Should I consider anything else in my framework?
How can I do parallel processing using Perl?
Even if you want to stick with Perl, it looks like there are already some alternatives available that would keep you from implementing another framework from scratch.
Check out the comments at http://my.opera.com/cstrep/blog/2010/05/14/puppet-fabric-and-a-perl-alternative for a couple of options.
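If you do end up rolling your own, CPAN already covers the hard parts: Text::CSV for the input file and Parallel::ForkManager for bounded parallelism. Below is a minimal sketch, not a finished framework; the servers.csv layout, the 20-process limit, and the configure_ntp stub are assumptions you would replace with your real connection logic (for example Net::OpenSSH for the Linux hosts).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;                # CPAN module for parsing the input file
use Parallel::ForkManager;    # CPAN module for fork-based parallelism

# Assumed input format: hostname,os,ntp_server (one row per server).
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $in,  '<', 'servers.csv' or die "servers.csv: $!";
open my $out, '>', 'results.csv' or die "results.csv: $!";

my $pm = Parallel::ForkManager->new(20);   # at most 20 servers at a time

# The parent reaps each child and records its exit status in the report.
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $host) = @_;
    print {$out} "$host," . ($exit_code == 0 ? 'OK' : 'FAILED') . "\n";
});

while (my $row = $csv->getline($in)) {
    my ($host, $os, $ntp) = @$row;
    $pm->start($host) and next;            # parent: go to the next host
    # Child process: replace this stub with the real work, e.g. SSH in with
    # Net::OpenSSH and rewrite /etc/ntp.conf, or use WinRM for Windows hosts.
    my $ok = configure_ntp($host, $os, $ntp);
    $pm->finish($ok ? 0 : 1);              # the exit code becomes the report entry
}
$pm->wait_all_children;
close $out;

sub configure_ntp { return 1 }             # placeholder only
```

Because the run_on_finish callback runs in the parent, the results file is written from a single process and doubles as the success/failure report from point (c).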
I have a task to do. I need to build a WCF service that allows a client to import a file into a database via the server backend. To do this, I need to communicate to the server the settings, the events needed to start and configure the import, and most importantly the file to import. The problem is that these files can be extremely large (much bigger than 2 GB), so it's not possible to send them via the browser as they are. The only thing that comes to mind is to split these files into pieces and send them to the server one by one.
I also have another requirement: I need to be 100% sure that these files are not corrupted, so I also need to implement some sort of policy for error detection and, possibly, recovery.
Do you know of an API or DLL that can help me achieve my goals, or is it better to write the code myself? And in that case, what would be the optimal size of the packets?
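A common pattern here is to split the file into fixed-size chunks and send a checksum with each chunk, so the server can verify every piece independently and ask for only the corrupted chunks to be resent. The sketch below illustrates the client-side idea only; it is written in Perl rather than C#/WCF purely as a compact illustration, and the 4 MB chunk size is an arbitrary assumption to tune against your message-size limits.

```perl
#!/usr/bin/perl
# Illustration only: split a file into fixed-size chunks and compute a
# SHA-256 digest per chunk so the receiver can verify (and re-request)
# any chunk independently of the others.
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

my $chunk_size = 4 * 1024 * 1024;                 # 4 MB per chunk (assumed)
my $file       = shift @ARGV or die "usage: $0 <file>\n";

open my $fh, '<:raw', $file or die "$file: $!";
my $index = 0;
while (my $read = read($fh, my $chunk, $chunk_size)) {
    my $digest = sha256_hex($chunk);
    # In the real service you would send $chunk together with ($index, $digest);
    # the server recomputes the digest and requests a resend on a mismatch.
    printf "chunk %d: %d bytes, sha256=%s\n", $index, $read, $digest;
    $index++;
}
close $fh;
```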
I have a sequence of steps that a user performs, e.g. logging on to a remote UNIX shell, creating files/directories, changing permissions, running remote shell scripts and commands, deleting and moving files,
running DB queries and, based on the query results, performing further tasks: exporting the results to a file, running more shell commands/scripts, or issuing DB insert statements, etc.
By performing these steps, the user carries out various processes, such as data processing and validation.
What is the best way to automate the above scenario? Should we go for a workflow tool like Activiti, or is there a better framework/way to achieve these requirements?
My requirement is to work with open-source tools, preferably Java-based.
I am completely new to this, so any help or pointers would be appreciated.
The scenario you describe is certainly possible with a workflow tool like Activiti. Apache Camel or Spring Integration would be other possibilities (as all the steps you mention are automatic system tasks).
A workflow framework would be a good option if you need one of these:
you want to store history data for audit purposes: who did what, when, and how long it took.
you want to visually model your steps, perhaps to discuss them with business people.
there is a need for human interaction between some of the steps
Your description reminds me of a software/account provisioning process.
There are a large number of provisioning tools on the market, both open source and otherwise (Dell Crowbar is one option).
However, a couple of the comments you made in your response to Joram indicate that a more general-purpose tool such as Activiti may be an option:
"Swivel Chair" tasks - User tasks that may one day be automated
Visual model of process state
Most provisioning tools don't allow for generic user tasks and don't provide a (good) visual model of the process state.
However, they generally include remote script execution, which would need to be cobbled together as a service task if you went with a BPM tool.
I would certainly expand my research to include provisioning tools, as they sound like a better fit; however, if you can't find anything that works for you, a BPM platform provides a generic framework for building what you need.
Can I trigger an OpenRefine script to run in the background without user interaction? Possibly use a Windows service to load an OpenRefine config file, or start the OpenRefine web server with parameters and save the output?
We parse various data sources from files and place the output into specific tables and fields in SQL Server. We have a very old application that creates these "match patterns" and would like to replace it with something more modern. Speed is important but not critical. We typically parse files of anywhere from 5 to 1,000,000 lines.
I could be going in the wrong direction with OpenRefine; if so, please let me know. Our support team, which creates these "match patterns", would be better served by a UI like OpenRefine than by writing Perl or Python scripts.
Thanks for your help.
OpenRefine has a set of client libraries that let you automate an existing job. The following are available:
* two in Python here and here
* one in Ruby
* one in Node.js
These libraries need two inputs:
a source file to be processed in OpenRefine
the OpenRefine operations in JSON format.
At RefinePro (disclaimer: I am the founder and CEO of RefinePro), we have written some extra wrappers that select an OpenRefine project, extract the JSON operations, start the library, and save the result. The newly created job can then be scheduled.
Please keep in mind that OpenRefine has very poor error handling, which limits its usefulness as an ETL platform.
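To make the two inputs concrete, here is a rough Perl sketch of what such a wrapper does, talking to a locally running OpenRefine instance over its HTTP command API instead of going through one of the clients listed above. The http://127.0.0.1:3333 address, the file names, and the exact endpoint paths are assumptions to check against your OpenRefine version (recent releases also require a CSRF token on these POST commands).

```perl
#!/usr/bin/perl
# Sketch: create a project from a source file, apply a saved JSON operation
# history, and export the cleaned rows. Verify endpoints against your version.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

my $refine = 'http://127.0.0.1:3333';            # assumed OpenRefine address
my $ua     = LWP::UserAgent->new;

# 1. Upload the source file; OpenRefine answers with a redirect whose URL
#    carries the new project id.
my $create = $ua->request(POST "$refine/command/core/create-project-from-upload",
    Content_Type => 'form-data',
    Content      => [
        'project-file' => ['input.csv'],
        'project-name' => 'nightly-import',
    ]);
my $location  = $create->header('Location') // '';
my ($project) = $location =~ /project=(\d+)/;
die "could not create project: " . $create->status_line . "\n" unless $project;

# 2. Apply the operation history exported from the interactive session.
my $ops = do { local $/; open my $fh, '<', 'operations.json' or die $!; <$fh> };
my $apply = $ua->post("$refine/command/core/apply-operations",
    { project => $project, operations => $ops });
die "apply-operations failed: " . $apply->status_line . "\n" unless $apply->is_success;

# 3. Export the result (CSV here) so it can be loaded into SQL Server.
my $export = $ua->post("$refine/command/core/export-rows/result.csv",
    { project => $project, format => 'csv' });
open my $out, '>', 'result.csv' or die $!;
print {$out} $export->decoded_content;
close $out;
```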
I am looking to control access to some large files (we're talking many GB here) by using signed URLs. The files are currently restricted by LDAP Basic authentication (mod_auth_ldap), but I need to change this to verify a signature (passed as a query parameter in the URL).
Basically, I just need to run a script to verify the signature and allow the request to proceed as if authentication had succeeded. My initial thought was just to use a simple CGI script, but as the files are so large I'm concerned about performance. So, really, this question is (probably) more like "are there any performance implications of streaming large files from a CGI script via Apache?"… and if so, "is there a better way of doing this (short of writing a dedicated authentication module)?"
If this makes any sense, help would be much appreciated :)
P.S. I wasn't sure exactly what to search for here (10 minutes of Googling were fruitless), so I may very well be duplicating someone else's post.
Have a look at the crypto cookies/sessions in Apache. One way to do this is to put a must-have-valid-session restriction on that directory, forward anyone who does not have a valid session to a CGI script, authenticate there, and then forward back to the actual download.
That way Apache can use its normal sendfile() and other optimizations.
However, keep in mind that a shell or Perl script ending with a simple 'execvp', 'exec cat', or something like that is not that expensive.
A more URL-based alternative is something like http://authmemcookie.sourceforge.net/.
Dw.
I ended up solving this with a CGI script as mentioned… cookies weren't an option because we need to be able to support clients that don't use cookies (apt).
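A minimal sketch of that kind of CGI is below. It assumes the URL carries file, expires, and sig query parameters, that the signature is an HMAC-SHA256 of "file:expires" under a shared secret, and that mod_xsendfile is enabled so Apache, not the script, streams the bytes; those details are assumptions about the setup, not a description of the poster's actual script.

```perl
#!/usr/bin/perl
# Verify an HMAC-signed URL, then hand the file to Apache via X-Sendfile
# so the CGI script never has to stream the bytes itself.
use strict;
use warnings;
use CGI;
use Digest::SHA qw(hmac_sha256_hex);

my $secret  = $ENV{SIGNING_SECRET} || die "no signing secret configured\n";
my $docroot = '/data/downloads';                 # assumed location of the files

my $q       = CGI->new;
my $file    = $q->param('file')    // '';
my $expires = $q->param('expires') // 0;
my $sig     = $q->param('sig')     // '';

# The signer is assumed to have computed HMAC-SHA256 over "file:expires".
my $expected = hmac_sha256_hex("$file:$expires", $secret);

if ($file =~ /\.\./ or time() > $expires or lc($sig) ne $expected) {
    print "Status: 403 Forbidden\r\nContent-Type: text/plain\r\n\r\nForbidden\n";
    exit;
}

# mod_xsendfile (assumed installed, with "XSendFile On") intercepts this
# header and serves the file through Apache's normal sendfile() path.
print "X-Sendfile: $docroot/$file\r\n";
print "Content-Type: application/octet-stream\r\n\r\n";
```

If mod_xsendfile isn't available, the session-plus-internal-redirect approach from the answer above achieves the same effect: the script only authenticates, and Apache does the byte pushing.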
I have a server with 10+ virtual domains (most running MediaWiki). I'd like to be able to watch their traffic remotely with something nicer than tail -f. I could cobble something together, but was wondering if something super-deluxe already exists that involves a minimum of hacking and support. This is mostly to understand what's going on, not so much for security (though it could serve that role too). It must:
Be able to deal with vhost log files
Be able to handle updates every 10 seconds or so
Be free/open source
The nice-to-haves are:
Browser based display (supported by a web app/daemon on the server)
Support filters (bots, etc)
Features like counters for pages, with click to view history
Show a nice graphical display of a geographic map, timeline, etc
Identify individual browsers
Show link relationships (coming from remote site, to page, to another page)
Be able to identify logfile patterns (editing or creating a page)
I run Debian on the server.
Thanks!
Take a look at Splunk.
I'm not sure whether it supports real-time (~10-second) updates, but there are a ton of features and it's pretty easy to get set up.
The free version has some limitations but there is also an enterprise version.
Logstash is the current answer. (=
Depending on the volume, Papertrail could be free for you. It is the closest thing to a tail -f and is searchable, archivable and also sends alerts based on custom criteria.