Need suggestions on what tool to use for manipulating a file - sql

We need to create a daily process that will manipulate a file that is currently generated manually before we FTP it to a vendor. The issues with the current file are as follows:
1) It is currently comma delimited and it needs to be pipe delimited.
2) The vendor only wants specific columns to be sent. They have a limit of 26 columns.
We need to develop an automated process that can be scheduled to run once a day and pick up a file with a specific extension, do the file manipulation and FTP the file.
Ideally, we would like to have some error handling in the process. We would want an email to get sent out if there was no file to process or if there was an error during the manipulation or FTP process.
My first thought was to use SQL Server Import/Export. I've done this before, but only for packages that were run manually. This process needs to be fully automated (after the existing file is manually generated). I don't see a way to pick up any file with a specific extension; it looks like I have to select a specific file.
Is there a way to use Import/Export or some similar tool?
Or, do I need to write a program to do this sort of task? It seems to me like it would be more work to write a program. So, I am trying to avoid that.
Thank you for your help!

You should write a program. Seriously.
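To show that the program does not have to be large, here is a minimal Python sketch using only the standard library (csv, glob, ftplib, smtplib). It assumes the manually generated file has a header row and lands in a single folder with a known extension. The folder, extension, column names, FTP host, credentials and email addresses are all hypothetical placeholders.

```python
import csv
import glob
import os
import smtplib
from email.message import EmailMessage
from ftplib import FTP

INBOX = r"C:\exports"            # hypothetical folder the manual file lands in
PATTERN = "*.csv"                # hypothetical "specific extension"
WANTED = ["AccountId", "Name"]   # hypothetical vendor columns
MAX_COLUMNS = 26                 # vendor limit from the question


def convert(src_path, dst_path, wanted):
    """Copy only the wanted columns from a comma-delimited file to a pipe-delimited one."""
    if len(wanted) > MAX_COLUMNS:
        raise ValueError(f"vendor accepts at most {MAX_COLUMNS} columns")
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)  # assumes the first row is a header
        missing = [c for c in wanted if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        writer = csv.writer(dst, delimiter="|", lineterminator="\n")
        writer.writerow(wanted)
        for row in reader:
            writer.writerow([row[c] for c in wanted])


def upload(path, host, user, password):
    """Send the converted file to the vendor over plain FTP."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        with open(path, "rb") as fh:
            ftp.storbinary(f"STOR {os.path.basename(path)}", fh)


def notify(subject, body):
    """Email an alert; SMTP host and addresses are hypothetical."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "etl@example.com"
    msg["To"] = "ops@example.com"
    msg.set_content(body)
    with smtplib.SMTP("smtp.example.com") as smtp:
        smtp.send_message(msg)


def main():
    """Daily entry point: convert and upload every matching file, emailing on problems."""
    files = glob.glob(os.path.join(INBOX, PATTERN))
    if not files:
        notify("Vendor export: no file", f"No {PATTERN} file found in {INBOX}")
        return
    for src in files:
        dst = os.path.splitext(src)[0] + ".txt"
        try:
            convert(src, dst, WANTED)
            upload(dst, "ftp.example.com", "user", "secret")
        except Exception as exc:  # report any conversion or FTP failure
            notify("Vendor export failed", f"{src}: {exc!r}")
```

Saved as, say, vendor_export.py (a hypothetical name), it could be run once a day by Windows Task Scheduler or cron with `python -c "import vendor_export; vendor_export.main()"`. Parsing with the csv module rather than splitting on commas means quoted fields that contain commas survive the conversion.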


Is it possible to automate updating Tableau extract for Tableau Reader?

Situation now:
I have a data warehouse job profile that publishes a .txt file to a Data folder every morning. I open a Tableau workbook, which automatically updates the data visualisations because of a union I made. I save this workbook as an extract, and colleagues without Tableau Desktop can view it via Tableau Reader.
What I need:
This reporting format is heavily dependent on me and I need to automate this.
Is this even possible without Tableau Server?
Since Tableau Reader can only use packaged workbooks with extracted data, you cannot achieve this directly.
However, you may automate the packaging process using Tableau's command line parameters and the process will not be dependent on anyone anymore.
See the PDF at the link below. Using that help document, you can create a .BAT file and have Task Scheduler on your computer start it periodically. Users can then open the packaged file from the network location where you saved it. Alternatively, if all user computers have Tableau Desktop installed, you can put the line that opens the file at the end of the .BAT file, so users can run the .BAT whenever they want to see the report.
https://community.tableau.com/docs/DOC-5209
Bernardo was correct in saying the Extract API can be used to programmatically create extracts, and thus to "refresh" an extract by simply recreating it (the point about Tableau Server is only relevant if you want to publish the extract you create with the Extract API).
Where you might have trouble is that there is currently no supported way to programmatically replace an extract within a .twbx file. That said, it should be possible to do this by renaming the .twbx to .zip (it is, after all, just an archive) and then using something like Python's zipfile module to replace the extract inside the archive with your new one.
NB: The Extract API can only be used to create .hyper files. If you want to work with .tde files, you'll need to use the Tableau SDK instead.
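A minimal sketch of that archive manipulation with Python's standard zipfile module. zipfile cannot overwrite an entry in place, so the sketch copies every other entry into a new archive, adds the new extract under the old name, and swaps the rebuilt archive in. zipfile does not care about the file extension, so the rename to .zip is not strictly needed here. The member name is an assumption: list the archive with namelist() to find where your extract actually lives. Since this is not a supported workflow, try it on a copy of the .twbx first.

```python
import os
import zipfile


def replace_member(archive_path, member_name, new_file_path):
    """Rebuild a zip-based archive (e.g. a .twbx) with one member replaced.

    Every other entry is copied into a temporary archive, the new file is
    added under the old name, and the temporary archive replaces the original.
    """
    tmp_path = archive_path + ".tmp"
    with zipfile.ZipFile(archive_path) as src:
        if member_name not in src.namelist():
            raise KeyError(f"{member_name!r} not found in {archive_path}")
        with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename != member_name:
                    # Reusing the ZipInfo keeps the original name, timestamp and compression
                    dst.writestr(info, src.read(info.filename))
            dst.write(new_file_path, arcname=member_name)
    os.replace(tmp_path, archive_path)  # swap in the rebuilt archive
```

For example, `replace_member("Report.twbx", "Data/Extracts/sales.hyper", "new_sales.hyper")` would swap the extract, where both file names and the internal path are hypothetical.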

errors in transformation Kettle

I want to capture the errors generated by the system in Pentaho Kettle and expose them as results in a transformation or job. For example, I want to get the errors of the HL7 input from the log and expose them as results in the next step.
I want to get errors generated by system
You mean like Apache or MySQL errors? If that's the case, you can simply point a Pentaho transformation at those files. They usually live in a default place such as /var/log/apache2, and they are pretty easy to read.
The part that's not that easy is if you want to parse those errors into something easier to analyse. For that I would use "load file in memory" and some "regex evaluation" steps to get the data you want out of the raw text.
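To illustrate the parsing idea outside Kettle, here is a Python sketch of the kind of regular expression you might feed to a regex step, applied to Apache httpd error-log lines. The line format is an assumption based on Apache 2.4's default error log (2.2-style lines without a pid are also accepted); adjust the pattern to your own logs. Python's `(?P<name>...)` group syntax would be written `(?<name>...)` in a Java-based tool.

```python
import re

# Assumed Apache 2.4 error-log line, e.g.
# [Sun Mar 07 16:02:00.123456 2021] [core:error] [pid 1234:tid 5678] [client 10.0.0.1:5050] AH00128: File does not exist: /var/www/favicon.ico
ERROR_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] "
    r"\[(?:(?P<module>[^:\]]+):)?(?P<level>[^\]]+)\] "  # [module:level] or just [level]
    r"(?:\[pid (?P<pid>\d+)(?::tid \d+)?\] )?"          # optional [pid N:tid M]
    r"(?:\[client (?P<client>[^\]]+)\] )?"              # optional [client addr]
    r"(?P<message>.*)$"
)

SEVERE = {"error", "crit", "alert", "emerg"}


def parse_errors(lines):
    """Yield a dict of named fields for each line whose severity is error or worse."""
    for line in lines:
        match = ERROR_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in SEVERE:
            yield match.groupdict()
```

Lines that do not match the pattern are skipped rather than raising, which is usually what you want when logs contain stack traces or other multi-line noise.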
But there are better solutions for reading your logs and analyzing errors.
See Logstash or similar products for more info.
You could save those results in a temporary CSV file that the next step(s) can consume.
If you go with this solution I would recommend:
Adding a unique jobID or identifier in the file name to ensure that your next step is reading the right file.
Adding a step at the end that removes old temp files
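The two recommendations can be sketched in Python for illustration (in Kettle itself they would be steps in the job). The `errors_<job_id>.csv` naming scheme and the 24-hour retention are assumptions chosen for the example.

```python
import csv
import glob
import os
import time
import uuid


def write_results(rows, directory, job_id=None):
    """Write rows to errors_<job_id>.csv so the next step can find this run's file."""
    job_id = job_id or uuid.uuid4().hex  # unique per run unless one is supplied
    path = os.path.join(directory, f"errors_{job_id}.csv")
    with open(path, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return job_id, path


def remove_old_temp_files(directory, max_age_seconds=24 * 3600):
    """Delete errors_*.csv files last modified more than max_age_seconds ago."""
    cutoff = time.time() - max_age_seconds
    removed = []
    for path in glob.glob(os.path.join(directory, "errors_*.csv")):
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(path)
    return removed
```

Passing the returned job_id to the next step (rather than letting it pick "the newest file") is what guarantees it reads the right file when runs overlap.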

Executing Abaqus Model in Taverna

I'm pretty new to both Taverna and Abaqus, but I am trying to run an Abaqus model remotely on an HPC using a "Tool" in Taverna. This works fine if my model file and inputs are already on the HPC, but I need a way of uploading the files dynamically in Taverna (I'm trying to wrap Abaqus models generically).
I've tried adding an input port that takes a file list, but I don't know how to copy the files to the "location" I've set for the tool. Could a beanshell service be the answer, or can I iterate through the file list and copy the files up before executing the Abaqus model?
Thanks
When you say that you created an input port that takes a file list, I guess you mean an input to the tool service.
Assuming the input port is called my_file_list, when the tool service runs it will receive a list of data values on port my_file_list. As an example, say the list contains the three values "hello", "hi" and "hola".
On the location where the tool service is run, it executes in a temporary directory - a different directory for each execution of the service. It is normally something like /tmp/usecase-2029778474741087696
Three files will be created in the temporary directory; those files contain the (in this example) three values the tool service received on port my_file_list. The files could be called
/tmp/usecase-2029778474741087696/tempfile.0.tmp containing hello
/tmp/usecase-2029778474741087696/tempfile.1.tmp containing hi
/tmp/usecase-2029778474741087696/tempfile.2.tmp containing hola
There will also be a file called my_input_list. That file will contain
/tmp/usecase-2029778474741087696/tempfile.0.tmp
/tmp/usecase-2029778474741087696/tempfile.1.tmp
/tmp/usecase-2029778474741087696/tempfile.2.tmp
The script of your tool service would normally read the contents of my_input_list line by line and do something with the contents of the listed file(s).
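That line-by-line loop is sketched here in Python for illustration (a tool script would often be a shell `while read` loop instead). `list_path` is the list file described above, and `handle` is a hypothetical callback standing in for whatever your script does with each file.

```python
def process_file_list(list_path, handle):
    """Read a list file (one path per line) and call handle(path, contents) for each entry.

    Following the listed paths, instead of globbing tempfile*.tmp, keeps the
    files belonging to different list ports apart.
    """
    results = []
    with open(list_path) as listing:
        for line in listing:
            path = line.strip()
            if not path:
                continue  # ignore blank lines, e.g. after a trailing newline
            with open(path) as fh:
                results.append(handle(path, fh.read()))
    return results
```

With the example values above, `process_file_list("my_input_list", lambda path, text: text)` would return the three values in list order.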
I have also seen some scripts that "cheat" and iterate directly over tempfile*.tmp, but that is a bad idea. The problem with that trick is that if you add a second file-list port to the tool service, the file my_input_list could contain
/tmp/usecase7932018053449784034/tempfile.4.tmp
/tmp/usecase7932018053449784034/tempfile.5.tmp
/tmp/usecase7932018053449784034/tempfile.6.tmp
as other temporary files were used for the other file list port.
I hope that helps
The tool service allows you to upload files - but if you are using the HPC through a job submission node, then you would have to modify your command line tool to then use the job file staging command to further push the files as part of the job. The files would be available in the current (temporary) directory of the specified tool script.
I would try to do it through the Tool service and not involve the beanshell - then you can keep your workflow simpler.
A good thing to remember is that you can write multiple shell commands in the box.
Similarly you would probably want to retrieve back the results so that you can process them further in the workflow (unless they are massive - in which case you should just output their remote filenames and send them in again to the next HPC job)
The exact commands to use for staging files and retrieving them depends on the HPC job submission system. Which one are you using?
Thanks for the input, guys.
It was my misunderstanding of how Taverna uses the File list. All the files in the list are copied to the temp "sandbox" and are therefore available for use.
Another nice, easy way is to zip the directory, pass the zip file into an input port for the service, and then unzip it inside the command.
Thanks again
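A Python sketch of that zip-and-unzip approach using shutil from the standard library. The directory and file names are hypothetical, and inside the tool service the unzip step would more typically be a shell command such as `unzip`.

```python
import os
import shutil


def pack_directory(model_dir, archive_base):
    """Zip everything in model_dir into archive_base + '.zip' and return the archive path."""
    return shutil.make_archive(archive_base, "zip", root_dir=model_dir)


def unpack_here(archive_path, dest="."):
    """Inside the tool command: extract the zip received on the input port into dest."""
    shutil.unpack_archive(archive_path, dest, format="zip")
    return sorted(os.listdir(dest))
```

Keep the archive outside the directory being zipped; otherwise make_archive may try to include the partially written zip in itself.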

Any way to automate the process of opening a .mpp file and saving it as a .csv?

I need to find a way to automate a process that runs when a user uploads a Microsoft Project file to a web application I have already created. The process basically needs to use Project's Save As to save the file as a .csv, so I can import the data into a SQL database (this is needed for custom reporting we have already set up in SQL). I need to automate this because I will be receiving a large number of project files, and if the process is automated, users will be able to see results instantly.
Basically, is there any way to create or run an automated process that saves these project files as .csv files? Even if the CSV files are not formatted correctly, I can work around that; I just need to get them into .csv files first.
Thank you.
edit - the only way I can think of is to follow the instructions at the link below, but I would then need to automate opening the file and clicking Save for this to work. Any other suggestions?
http://social.technet.microsoft.com/Forums/en-US/projectprofessional2010general/thread/eea4ca15-0a0b-4c07-9989-87536b961385/
edit 2 - also looking into Microsoft.Office.Interop.MSProject, but not having any luck.
edit 3 - now using MPXJ. The only issue I am having is the one below, which comes from converting their example to VB.
Private Shared Function ToEnumerable(ByVal javaCollection As Collection) As EnumerableCollection
Return New EnumerableCollection(javaCollection)
End Function
The error is with EnumerableCollection: Visual Studio does not recognize it as a valid type. Is there anything I am doing wrong, or something I should substitute?
If you aren't wedded to using MS Project itself to extract data from the project files, you could consider using the MPXJ library. This would allow you to write a simple utility to open the MPP files you are given, extract the data items you are interested in, and write them directly to your database (or an intermediate CSV file, as required). MPXJ comes in Java and .Net flavours, so you can use your preferred language to do the work.
Jon
p.s. Disclaimer: I maintain MPXJ

ms-access: doing repetitive processes with vba/sql

I have an Access database back end that contains three tables, and I have distributed the front end to several users. This is a very simple database with minimal functionality. I need to import certain rows from a file into one of the tables every hour. What is the best way to automate this so that it runs hourly, sort of as a service in the background? Can you tell me how you would do this?
You could have for example:
an MS Access file with all the code necessary to run the import procedure
a BAT file containing the command line(s) that will run this MS Access file with all the required parameters. Check the MS Access command-line switches to see the available options.
a task scheduler service software to launch the BAT file: depending on the task scheduler and the command line to be sent, you could even avoid the BAT file step
If all you want to do is run some queries, I would not do this by automating all of Access, but instead by writing a VBScript that uses DAO to execute the SQL directly. That's a much more efficient way to do it, and will run without a console logon (which may or may not be required for full Access to be run by the task scheduler).
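The same idea (run the SQL directly instead of automating Access) can be sketched in Python instead of VBScript/DAO; this swaps in Python's DB-API, with pyodbc and the Access ODBC driver assumed as the connection layer. The table name `Readings`, the `Status` filter and the file paths are hypothetical.

```python
import csv


def import_rows(conn, csv_path, table, keep):
    """Insert the rows of csv_path that satisfy keep(row) into table; return the count.

    conn can be any DB-API connection that uses "?" placeholders (pyodbc and
    sqlite3 both do). Table and column names are assumed to be trusted input.
    """
    with open(csv_path, newline="") as fh:
        reader = csv.DictReader(fh)
        columns = reader.fieldnames
        rows = [tuple(row[c] for c in columns) for row in reader if keep(row)]
    if rows:
        column_list = ", ".join(f"[{c}]" for c in columns)
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO [{table}] ({column_list}) VALUES ({placeholders})"
        cur = conn.cursor()
        cur.executemany(sql, rows)
        conn.commit()
    return len(rows)


def run_hourly_import(db_path, csv_path):
    """Hypothetical entry point for the scheduled task (needs pyodbc and the Access ODBC driver)."""
    import pyodbc  # imported here so import_rows works without pyodbc installed

    conn = pyodbc.connect(
        r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=" + db_path
    )
    try:
        return import_rows(conn, csv_path, "Readings", lambda r: r.get("Status") == "OK")
    finally:
        conn.close()
```

Scheduled hourly with Task Scheduler, this never opens the Access UI, so, as with the DAO approach, it should not need a console logon.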