Is it possible to run an OpenRefine script in the background?

Can I trigger an OpenRefine script to run in the background without user interaction? Possibly use a Windows service to load an OpenRefine config file, or start the OpenRefine web server with parameters and save the output?
We parse various data sources from files and place the output into specific tables and fields in SQL Server. We have a very old application that creates these "match patterns" and would like to replace it with something more modern. Speed is important but not critical. We typically parse files of 5 to 1,000,000 lines.
I could be going in the wrong direction with OpenRefine; if so, please let me know. Our support team that creates these "match patterns" would be better served by a UI like OpenRefine than by writing Perl or Python scripts.
Thanks for your help.

OpenRefine has a set of libraries that let you automate an existing job. The following are available:
* two in Python here and here
* one in ruby
* one in nodejs
Those libraries need two inputs (see the sketch below):
* a source file to be processed in OpenRefine
* the OpenRefine operations in JSON format
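For example, with one of the Python libraries the whole job can be scripted roughly like this. This is a minimal sketch based on my recollection of the refine-client-py API; the server URL, file names, and exact method names are assumptions to check against the library's README:

# pip install refine-client  (assumes OpenRefine is already running locally)
from google.refine import refine

client = refine.Refine(refine.RefineServer("http://127.0.0.1:3333"))

# 1. Create a project from the source file to be processed.
project = client.new_project(project_file="input.csv")

# 2. Replay the operations extracted from an existing project (JSON format).
project.apply_operations("operations.json")

# 3. Save the output, then clean up the temporary project.
with open("output.csv", "wb") as out:
    out.write(project.export(export_format="csv").read())
project.delete()

A script like this can then be wired into a Windows service or scheduled task, which answers the original question.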
At RefinePro (disclaimer: I am the founder and CEO of RefinePro), we have written an extra wrapper that selects an OpenRefine project, extracts the JSON operations, runs the library, and saves the result. The newly created job can then be scheduled.
Please keep in mind that OpenRefine has very poor error handling, which limits its usage as an ETL platform.

Related

Is there a way to export a query or table from BigQuery in .txt format?

I have to deposit a report in .txt format once a day and upload it to an SFTP. I have generated the report in BigQuery but can't find a way to export it as .txt. Is it possible?
There are quite a number of ways to accomplish this, and almost all involve some extent of coding with clients of your choice or great GCP tools like Dataflow, etc. They all require skilled engineers at hand.
For sure, there will be a few answers covering those options.
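For illustration, here is what one of those coding routes can look like: a minimal sketch using the google-cloud-bigquery Python client, writing query results straight to a .txt file. The query, file name, and delimiter are placeholder assumptions:

# pip install google-cloud-bigquery  (assumes credentials are already configured)
from google.cloud import bigquery

client = bigquery.Client()
rows = client.query("SELECT * FROM `my_dataset.my_report`").result()  # placeholder query

with open("report.txt", "w") as out:
    for row in rows:
        # tab-separated .txt; switch the delimiter to whatever the SFTP consumer expects
        out.write("\t".join(str(value) for value in row.values()) + "\n")

Uploading the result to the SFTP is then a separate scripted step (for example with a library such as paramiko), which is exactly the kind of glue the option below removes.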
Meantime, I want to provide a different option.
There are some third-party tools that help achieve the same without extra coding (other than the BigQuery querying itself).
Below is an example of how simple it is to do with Magnus, which is part of the Potens.io suite of tools for BigQuery, designed so that even a non-engineer can easily explore and automate workflows and become self-sufficient in data needs like the one in your question.
Disclosure: I am a Google Developer Expert in Cloud and the author of the BigQuery Mate and Potens.io (Magnus and Goliath) productivity tools.
The workflow consists of just two Tasks: the first Task defines the payload of your report, and the second Task uploads it to the client's SFTP.
The flip side of the second Task holds the remaining settings, with zero coding!
In this particular example you do not even need to persist your report in a BQ table: the second Task will pick it up straight from the first (though in real life you will most likely want to preserve the report, which is still easy to set up in the first Task using the Destination entry).
I recommend you give it a try.

How to pass random parameters to SilkTest Workbench or Classic Record&Play Scenario

I am new to SilkTest and I don't have any scripting background. What I need to do is to record some test cases and then play them to check my system. After getting used to it, I plan to learn scripting and dive into it, but first things first.
What I need is to pass randomly generated parameters (or parameters read randomly from a text file, or pre-defined ones) into the recordings, so that every time I run the tests, different parameters are used. For example, there is a component in which I write some letters, and the component filters the results based on the text. Then I select one of the results. Now, instead of recording the same letters every time, how can I use randomly given parameters?
Thanks
What you are looking for is called Active Data in Silk Test.
It allows enhancing your visual tests with external data, for example from an Excel file.
ActiveData testing enables you to leverage existing data in external files as input for powerful, comprehensive application testing solutions. ActiveData testing enables you to perform multiple transactions against test applications using a different set of data for each transaction without writing complicated code or compromising existing data.
You can find an introduction to Active Data in the online documentation or in the tutorial video.
I have a question: what version of Silk Test are you using, and which client (Silk Test Workbench, Silk4Net, or Silk4J)? Each of these clients can receive parameters from an external source, whether from a command line or from an external data file.
You indicate that you want random data; do you really mean random data, or external data? If it is random data, you will probably need to use a random number/string generator for the client that you are working with (.NET code for Workbench and Silk4Net, Java code for Silk4J).

Need suggestions on what tool to use for manipulating a file

We need to create a daily process that will manipulate a file that is currently generated manually before being FTPed to a vendor. The issues with the current file are as follows:
1) It is currently comma-delimited and it needs to be pipe-delimited.
2) The vendor only wants specific columns to be sent; they have a limit of 26 columns.
We need to develop an automated process that can be scheduled to run once a day and pick up a file with a specific extension, do the file manipulation and FTP the file.
Ideally, we would like to have some error handling in the process. We would want an email to get sent out if there was no file to process or if there was an error during the manipulation or FTP process.
My first thought was to use SQL Server Import/Export. I've done this before, but only for packages that could be run manually. This process needs to be fully automated (after the existing file is manually generated). I don't see a way to pick up any file with a specific extension; it looks like I have to select a specific file.
Is there a way to use Import/Export or some similar tool?
Or, do I need to write a program to do this sort of task? It seems to me like it would be more work to write a program. So, I am trying to avoid that.
Thank you for your help!
You should write a program. Seriously.
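To give a sense of how small that program can be, here is a minimal sketch covering every requirement above using only Python's standard library. Every path, extension, hostname, address, and credential is a placeholder to adapt:

import csv
import glob
import os
import smtplib
from email.message import EmailMessage
from ftplib import FTP

INBOX = "/data/incoming"       # placeholder drop folder
PATTERN = "*.dat"              # placeholder extension to pick up
COLUMNS = list(range(26))      # indices of the 26 columns the vendor wants

def convert(src, dst):
    """Rewrite a comma-delimited file as pipe-delimited, keeping only COLUMNS."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter="|")
        for row in csv.reader(fin):
            writer.writerow([row[i] for i in COLUMNS])

def upload(path):
    """FTP the converted file to the vendor (placeholder host and credentials)."""
    with FTP("ftp.vendor.example", "user", "password") as ftp:
        with open(path, "rb") as f:
            ftp.storbinary("STOR " + os.path.basename(path), f)

def alert(subject, body):
    """Email the support team when something goes wrong (placeholder addresses)."""
    msg = EmailMessage()
    msg["Subject"], msg["From"], msg["To"] = subject, "etl@example.com", "support@example.com"
    msg.set_content(body)
    with smtplib.SMTP("mail.example.com") as smtp:
        smtp.send_message(msg)

def main():
    files = glob.glob(os.path.join(INBOX, PATTERN))
    if not files:
        alert("Daily feed: no file to process", "No %s file found in %s." % (PATTERN, INBOX))
        return
    for src in files:
        try:
            dst = src + ".pipe"
            convert(src, dst)
            upload(dst)
        except Exception as exc:
            alert("Daily feed failed", "%s: %s" % (src, exc))

if __name__ == "__main__":
    main()

Scheduled once a day with Task Scheduler or cron, this covers the pickup by extension, the manipulation, the FTP, and the error emails.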

configure multiple servers and scale

I have been given a task to configure 1000s of servers with some simple data. Let's say I need to log in to a server (Linux or Windows) and set up the NTP server. I need to come up with some kind of automation framework using Perl. I have some ideas and want to get more.
Here is my thought process:
a) Since there are 1000s of servers, the framework should definitely be able to read in a CSV file so that all inputs can be provided at once, as opposed to a single input.
b) Since there are so many servers, I have to find a way to do things in parallel. I can't go server by server in a sequential way.
c) I should have an output file that shows the results for all the servers: those that were successfully configured and those that failed. That way I can compare the input file and output file and generate a report.
Should I consider anything else in my framework?
How can I do parallel processing using Perl?
Even if you want to stick with Perl, it looks like there are already some alternatives available that would keep you from implementing another framework from scratch.
Check out the comments on http://my.opera.com/cstrep/blog/2010/05/14/puppet-fabric-and-a-perl-alternative for a couple of options.
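If you are open to something other than Perl, the shape of points a) to c) is small enough to sketch with nothing but Python's standard library (the Fabric tool mentioned in that post is Python-based too). The CSV layout, the ssh invocation, and the NTP command below are placeholder assumptions:

import csv
import subprocess
from concurrent.futures import ThreadPoolExecutor

def configure(host):
    """Run the NTP setup command on one host over ssh; return (host, status)."""
    # Placeholder command: substitute your real NTP configuration step.
    cmd = ["ssh", "-o", "BatchMode=yes", host, "sudo timedatectl set-ntp true"]
    try:
        subprocess.run(cmd, check=True, capture_output=True, timeout=60)
        return host, "ok"
    except Exception as exc:
        return host, "failed: %s" % exc

# a) read all inputs from a CSV file (assumed: one hostname per row)
with open("servers.csv", newline="") as f:
    hosts = [row[0] for row in csv.reader(f) if row]

# b) fan out in parallel instead of going server by server
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(configure, hosts))

# c) write a results file to compare against the input
with open("results.csv", "w", newline="") as f:
    csv.writer(f).writerows(results)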

Any way to automate the process of opening a .mpp file and saving it as a .csv?

I need to find a way to automate the process when a user uploads a Microsoft Project file to a web application I have already created. The process basically needs to use Save As from Project to save to a .csv file, so I can use this to import the data into a SQL database (this is needed for custom reporting we already have set up using SQL). I need to automate this process because I will be receiving tons of project files, and if the process is automated, users will instantly see results.
Basically, is there any way to create or run an automated process that will save these project files as .csv files? Even if the .csv files are not formatted correctly, I can find a way around that; I just need to get them into .csv files first.
Thank you.
Edit: the only way I could think of is to follow the instructions listed below, but I would then need to automate a process to open the file and hit Save so this works... any other suggestions?
http://social.technet.microsoft.com/Forums/en-US/projectprofessional2010general/thread/eea4ca15-0a0b-4c07-9989-87536b961385/
Edit 2: also looking into ways of using Microsoft.Office.Interop.MSProject, but not having any luck.
Edit 3: now using MPXJ. The only issue I am having is listed below; I am converting their example to VB.
Private Shared Function ToEnumerable(ByVal javaCollection As Collection) As EnumerableCollection
Return New EnumerableCollection(javaCollection)
End Function
The error is with EnumerableCollection: Visual Studio is not picking it up as a valid type. Is there anything I am doing wrong, or something I should substitute?
If you aren't wedded to using MS Project itself to extract data from the project files, you could consider using the MPXJ library. This would allow you to write a simple utility to open the MPP files you are given, extract the data items you are interested in, and write them directly to your database (or an intermediate CSV file, as required). MPXJ comes in Java and .Net flavours, so you can use your preferred language to do the work.
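For a sense of what that utility might look like, here is a minimal sketch using MPXJ's Python wrapper (the mpxj package on PyPI loads the Java library through JPype, so the method names mirror the Java API); the file names and the chosen fields are placeholders:

import csv
import jpype
import mpxj  # pip install mpxj -- bundles the MPXJ jars and the JPype glue

jpype.startJVM()
from net.sf.mpxj.reader import UniversalProjectReader

project = UniversalProjectReader().read("plan.mpp")   # placeholder input file

# Dump a few task fields to CSV, ready for the SQL import step.
with open("plan.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "start", "finish"])
    for task in project.getTasks():
        writer.writerow([task.getID(), task.getName(),
                         task.getStart(), task.getFinish()])

jpype.shutdownJVM()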
Jon
p.s. Disclaimer: I maintain MPXJ