Pentaho Kettle: Need to create ETL jobs dynamically based on user input

In my application, the user can specify the format of their file. Based on that input, we dynamically create an SSIS package.
http://lakshmik.blogspot.com/2005/05...eate-ssis.html
The dynamically created SSIS package is then used to process the user's files.
We want to evaluate Pentaho Kettle for this requirement. Is it possible with Kettle to dynamically create ETL jobs based on the user's input?
If not Pentaho, is there any Java ETL tool which allows us to dynamically create ETL jobs?

I don't know about other tools, but this has traditionally been quite tricky in Kettle, although people have done it in various ways.
The best option for this is the (brand new) metadata injection step, which lets you do really clever stuff with metadata - but it only works for some basic steps. I think it will do what you want; read about it on Matt Casters' (PDI creator and god) blog here:
http://www.ibridge.be/?s=inject&submit=Go
If that doesn't work, then your other options are to go down the generic field name route (nasty) or to dynamically generate the transformation. That is easier than it sounds - but you will need to get much more heavily involved on the Java side than is usual for an ETL tool.

It is possible, and not very hard.
You can use the Kettle API to dynamically create transformations that can do anything Kettle does. The GUI designer uses the API to create transformations, so anything you can do with the GUI, you can do through the API.
If you look in the 'test' source tree you will find lots of examples of how to create transformations dynamically.
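For example, here is a minimal sketch of building and running a transformation through the API. It is written against the PDI 4.x/5.x classes; the file name, delimiter and target-table configuration are placeholders, and exact setter names can differ between Kettle versions.

// Minimal sketch: generate a CSV-file-to-table transformation from user input.
// PDI 4.x/5.x API assumed; file name, delimiter and table settings are placeholders.
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransHopMeta;
import org.pentaho.di.trans.TransMeta;
import org.pentaho.di.trans.step.StepMeta;
import org.pentaho.di.trans.steps.csvinput.CsvInputMeta;
import org.pentaho.di.trans.steps.tableoutput.TableOutputMeta;

public class DynamicTransformationBuilder {

    public static void buildAndRun(String fileName, String delimiter) throws Exception {
        KettleEnvironment.init();

        TransMeta transMeta = new TransMeta();
        transMeta.setName("user_defined_file_load");

        // Input step configured from the user's file-format description
        CsvInputMeta csvMeta = new CsvInputMeta();
        csvMeta.setDefault();
        csvMeta.setFilename(fileName);
        csvMeta.setDelimiter(delimiter);
        // ...define the input fields (names, types, lengths) from the user's layout here
        StepMeta input = new StepMeta("Read user file", csvMeta);
        transMeta.addStep(input);

        // Output step; configure the database connection and target table here
        TableOutputMeta tableMeta = new TableOutputMeta();
        tableMeta.setDefault();
        StepMeta output = new StepMeta("Write rows", tableMeta);
        transMeta.addStep(output);

        // Connect the steps with a hop, then run the generated transformation
        transMeta.addTransHop(new TransHopMeta(input, output));

        Trans trans = new Trans(transMeta);
        trans.execute(null);
        trans.waitUntilFinished();
    }
}

You can also call transMeta.getXML() to save the generated transformation as a .ktr file and run it later with Pan or Kitchen.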

Related

Convert an online JSON set of files to a relational DB (SQL Server, MySQL, SQLITE)

I'm using a tool called Teamwork to manage my team's projects.
They have an online API that consists of JSON endpoints accessible with authorisation:
https://developer.teamwork.com/projects/introduction/welcome-to-the-teamwork-projects-api
I would like to be able to convert this online data to a SQL DB so I can create custom reports for my management.
I can't seem to find anything ready-made to do that.
I need a strategy to do this.
If you know how to program, this should be pretty straightforward.
In Python, for example, you could:
1. Come up with a SQL schema that maps to the JSON data objects you want to store. Create it in a database of your choice.
2. Use the Requests library to download the JSON resources, if you don't already have them on your system.
3. Convert each JSON resource to a Python data structure using json.loads.
4. Connect to your database server using the appropriate Python library for your database, e.g. PyMySQL.
5. Iterate over the Python data, inserting rows into the database as appropriate. This is essentially the JSON-to-tables mapping from step 1 made procedural (a sketch of this flow follows below).
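The same steps work in any language. As a rough illustration in Java (to match the Java-centric tools discussed elsewhere on this page), here is the equivalent flow using java.net.http, Jackson and JDBC; the endpoint, JSON field names, table layout and credentials are all placeholders, so check the Teamwork API docs for the real resource and field names.

// Rough sketch: download a JSON resource, parse it, insert rows into MySQL.
// Endpoint, auth header, field names and table layout are illustrative only.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class TeamworkToSql {
    public static void main(String[] args) throws Exception {
        // 1. Download the JSON resource (placeholder endpoint and credentials)
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://yourcompany.teamwork.com/projects.json"))
                .header("Authorization", "Basic <base64-encoded-api-key>")
                .build();
        String body = client.send(request, HttpResponse.BodyHandlers.ofString()).body();

        // 2. Parse the JSON into a tree structure
        JsonNode root = new ObjectMapper().readTree(body);

        // 3. Insert one row per project into a table created beforehand
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost/reports", "user", "password");
             PreparedStatement insert = conn.prepareStatement(
                     "INSERT INTO projects (id, name, status) VALUES (?, ?, ?)")) {
            for (JsonNode project : root.path("projects")) {
                insert.setLong(1, project.path("id").asLong());
                insert.setString(2, project.path("name").asText());
                insert.setString(3, project.path("status").asText());
                insert.executeUpdate();
            }
        }
    }
}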
If you are not looking to do this in code, you should be able to use an open-source ETL tool to do this transformation. At LinkedIn, a coworker of mine used to use Talend Data Integration for solid ETL work of a very similar nature (JSON to SQL). He was very fond of it and I respected his opinion, so I figured I should mention it, although I have zero experience with it myself.

The Pentaho BI Platform Workflow Issue

I have been working with Pentaho for the last few days. I have been able to set up the Pentaho Report Designer and generate a sample report by following the documentation. Then I followed this article http://www.robertomarchetto.com/www/how_to_use_pentaho_report_designer_tutorial and managed to publish the report to the Pentaho BI Server.
What I don't understand is the Pentaho workflow. What process should I follow - in other words, what is the purpose of publishing the report to the Pentaho BI Server? Why is there a Data Integration tool? Why is there a BI Server when I can export the report from the Designer tool?
Requirement
All I want to do is retrieve the data from a MySQL DB, put it into a data mart, and then generate a report from the data mart. (According to what I have read, creating a data mart is the efficient way.)
How can I get it done?
Pentaho Data Integration can be used to automate this report generation.
In Report Designer you pass a parameter or a set of parameters to generate a single report output.
With Data Integration you can generate reports for different sets of parameters. For example, if reports are generated on a daily basis, you can automate them for the whole month so that there is no need to generate reports daily and manually.
And using the Pentaho Business Intelligence Server, all these operations can be scheduled.
To generate data/tables (fact tables/dimension tables) in the MySQL DB from different sources such as files or other databases - the Data Integration tool comes into the picture.
To create a schema on top of the fact tables - Mondrian.
To handle users/roles on top of the created cubes - the Metadata Editor.
To create simple reports on top of small tables - Report Designer.
For sequential execution (in one go) of DI jobs/transformations, reports, and JavaScript - Design Studio.
thanks to user surya.thanuri # forums.pentaho.com
The Data Integration tool is mostly for ETL; it's a separate tool and you can ignore it unless you are doing complex analysis of data from multiple dissimilar data sources. You don't need to 'export' reports to the Pentaho server; you can write them directly to a directory and then refresh the repository from inside the Pentaho web application. Exporting them is just one workflow technique.
You're going to find that there are about a dozen ways to do any one thing with Pentaho. For instance, I use CDA datasources with my reports rather than placing the SQL code inside my report. Alternatively, you can link up to a Data Integration server and execute Data Integration scripts to view a result set.
Just to answer your data mart question: in general, a data mart should probably be supported by either the Data Integration tool (depending on your situation, I don't exactly recommend this) or database functions/replication streams (recommended).
Just to hazard a guess, it sounds like someone tossed you a project saying: we need a BI system, here's the database where the data is stored, here are the reports we're already getting. X looked at Pentaho and liked it. You should use that.
The first thing you need to do is understand the shape of the data: volume, tables, interrelations. Figure out what the real questions they want to answer are. Determine whether they need real-time reporting, etc. Just getting the data mart together, if you even need one, can take quite a while. I think you may have jumped the gun on Pentaho itself.
thanks to user flamierd # forums.pentaho.com

Creating a simple user interface to access an Oracle database

Here is what I have:
1) a simple SQL file given to me that creates tables and fills them with data
2) a simple SQL file that contains PL/SQL procedures I've written for displaying/manipulating the tables
The goal is to create some sort of user interface that allows a student to login, view their transcript, withdraw from classes, etc.
I am using SQL*Plus. I have procedures that do all the required displaying/manipulating. I have succeeded in creating a simple command-line UI with SQL*Plus, but the problem arises when I need to get user input inside a loop (to allow the student to see course information for any number of courses until they wish to go back to the main menu). After doing research, I learned that this would be the job of something like PHP, C, etc. Unfortunately, I am not proficient in any of the required languages, and setting up the extensions and such has proven to be extremely complicated.
I am capable of learning the necessary techniques to complete this, but I do not know which direction to go in. What is the easiest way to implement a simple UI? Should I use PHP? C? C++? Is there some sort of program out there that automatically creates simple UIs from database data?
Oracle Apex is perfect for your situation. You can easily create web based forms and reports to perform all CRUD operations. Plus, it's free to use with any licensed Oracle database: Oracle Application Express
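If you do end up writing a small front end yourself instead, the "user input inside a loop" part is only a few lines in most languages with Oracle drivers. As a rough, hypothetical sketch in Java over JDBC (the connection string and the show_course_info procedure are made up; substitute your own):

// Hypothetical sketch: a command-line loop that reads input and calls a PL/SQL
// procedure via JDBC. Connection details and procedure signature are made up.
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;
import java.util.Scanner;

public class CourseInfoMenu {
    public static void main(String[] args) throws Exception {
        Scanner in = new Scanner(System.in);
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//localhost:1521/XEPDB1", "student", "password")) {
            while (true) {
                System.out.print("Enter a course id (or 'back' for the main menu): ");
                String input = in.nextLine().trim();
                if (input.equalsIgnoreCase("back")) {
                    break; // return to the main menu
                }
                // Call a stored procedure that returns course details via an OUT parameter
                try (CallableStatement call = conn.prepareCall("{call show_course_info(?, ?)}")) {
                    call.setString(1, input);
                    call.registerOutParameter(2, Types.VARCHAR);
                    call.execute();
                    System.out.println(call.getString(2));
                }
            }
        }
    }
}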

How can I provide users with the functionality of the DBUnit DatabaseOperation methods from a web interface?

I am currently updating a Java-based web application which allows database developers to create stored-procedure regression test suites for database testing.
Currently, for the test setup, execution and clean-up stages, the user is provided with text boxes where they can enter SQL code, which is executed by the isql command.
I would like to extend the application to use DbUnit's DatabaseOperation methods to provide more ways to set up the state of the database than just SQL statements. The main reason for using DbUnit rather than just SQL statements is to be able to create and store XML and XLS DataSets on a server, where they can be associated with their test cases and used for data setup.
My question is:
How can I provide users with the functionality of the DBUnit DatabaseOperation methods from a web interface?
I have considered:
Creating a simple programming language and a parser to read some simple syntax involving the DBUnit method names, which would accept as a parameter the file location of an XML or XLS DataSet. I was thinking of allowing the user to register the files they need with the web app, which would catalogue them and provide each file with an identifier that could be passed as a parameter to the methods in this simple programming language.
Creating an XML DTD which provides the user with the ability to specify operations and parameters. If I went with this approach, how can I execute the methods and their parameters that I parse from the XML document?
Creating a table in the database which stores the method and an FK relation to a catalogued DataSet file; however, I don't think this would be a good solution because data entry would be tedious.
Thanks for your help.
This actually seems like a rather simple problem when I think about it again.
DBUnit has plugins for Maven and Ant integration which run tests defined in XML (in the Maven POM file, for example).
I'm going to take a similar approach and go ahead with the XML option, using the Xerces-J parser and creating a collection of Operation, Export and Compare objects which are then run in order.
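For what it's worth, the execution side is only a few lines once the operation name and DataSet file path have been parsed out of the XML. A hedged sketch against the DBUnit 2.4+ API (the JDBC URL is a placeholder, and the operation lookup covers only a few of the available operations):

// Hedged sketch: run one catalogued FlatXmlDataSet with a DBUnit DatabaseOperation
// chosen at runtime. JDBC URL/credentials are placeholders; DBUnit 2.4+ assumed.
import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;

public class OperationRunner {

    // Map the operation name parsed from the XML test definition to a DBUnit operation
    static DatabaseOperation lookup(String name) {
        switch (name) {
            case "CLEAN_INSERT": return DatabaseOperation.CLEAN_INSERT;
            case "INSERT":       return DatabaseOperation.INSERT;
            case "DELETE_ALL":   return DatabaseOperation.DELETE_ALL;
            case "REFRESH":      return DatabaseOperation.REFRESH;
            default: throw new IllegalArgumentException("Unknown operation: " + name);
        }
    }

    public static void run(String operationName, String dataSetPath) throws Exception {
        Connection jdbc = DriverManager.getConnection(
                "jdbc:your-db-url", "user", "password"); // placeholder connection
        IDatabaseConnection connection = new DatabaseConnection(jdbc);
        try {
            IDataSet dataSet = new FlatXmlDataSetBuilder().build(new File(dataSetPath));
            lookup(operationName).execute(connection, dataSet);
        } finally {
            connection.close();
        }
    }
}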

Will I be able to create dynamic classes at runtime in Oslo?

For instance, will I be able to create an application that allows users to create and modify types at runtime? Will I be able to persist instances of those types in SQL without having to worry about the user who adds 100,000 records and expects a (really) fast query on them?
Think SharePoint Content Types... but on steroids. Oslo steroids - Possible or not?
That would be awesome!
In the demos, they create the new extent and then an (updatable) view with the old name.
I haven't heard about a feature that would automatically merge existing data into the new structure, though. For now, they suggest using SQL Server Integration Services for that part - but then it's a DB admin task.
Regarding performance, after MSchema is compiled to SQL statements, it's all plain SQL Server performance.