Generate a series of queries in PENTAHO - automation

How do I build a general query based on a list of tables and run/export the results?
Here is my conceptual structure
Basically, the FROM clause of the query would be populated from the table rows.
After that, each schema.table that returns a true result must be stored in a text file.
Could someone help me?

As Pentaho doesn't support loops directly, it's a bit complicated. Basically, you read the list of tables, copy the rows to the result, and then have a foreach job that runs once for every row in the result.
I use this setup for all jobs like this: How to process a Kettle transformation once per filename
In the linked example, it is used to process files, but it can be easily adapted to your use.
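Outside of Pentaho, the idea of generating one query per metadata row can be sketched like this (an illustrative Python sketch; the `table_list` metadata table and the schema/table names in it are made up for the example):

```python
import sqlite3

# Hypothetical metadata table listing the schemas/tables to check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_list (schema_name TEXT, table_name TEXT)")
conn.executemany(
    "INSERT INTO table_list VALUES (?, ?)",
    [("sales", "orders"), ("sales", "customers"), ("hr", "employees")],
)

# Build one query per metadata row, substituting schema.table into the FROM clause.
queries = [
    f"SELECT COUNT(*) FROM {schema}.{table}"
    for schema, table in conn.execute(
        "SELECT schema_name, table_name FROM table_list"
    )
]

# In the real job, each schema.table whose query returns true would be
# appended to a text file; here we just show the generated statements.
for q in queries:
    print(q)
```

In Kettle terms, the metadata read corresponds to the first transformation (Copy rows to result) and the loop body corresponds to the foreach job executing the templated query once per row.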

How to get table/column usage statistics in Redshift

I want to find which tables/columns in Redshift remain unused in the database in order to do a clean-up.
I have been trying to parse the queries from the stl_query table, but it turns out this is quite a complex task for which I haven't found any library I can use.
Does anyone know if this is somehow possible?
Thank you!
The column question is a tricky one. For table-use information I'd look at stl_scan, which records info about every table scan step performed by the system. Each of these is date-stamped, so you will know when the table was "used". Just remember that the system logging tables are pruned periodically and the data only goes back a few days, so you may need a process that captures table use daily to build an extended history.
Pondering the column question some more: query IDs are also provided in stl_scan, and these could help in identifying the columns used in the query text. For every query ID that scans table_A, search the query text for each column name of the table. It wouldn't be perfect, but it's a start.
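The column-search idea can be sketched in plain Python (not Redshift-specific; the column names and query texts below are made up, standing in for what you would get by joining the stl_scan query IDs to the query text log):

```python
import re
from collections import Counter

# Hypothetical columns of table_A and query texts that scanned it.
columns = ["order_id", "customer_id", "created_at", "discount"]
query_texts = [
    "SELECT order_id FROM table_A WHERE created_at > '2020-01-01'",
    "SELECT customer_id, order_id FROM table_A",
]

# Count, per column, how many scanning queries mention it.
usage = Counter({col: 0 for col in columns})
for text in query_texts:
    for col in columns:
        # word-boundary match avoids counting order_id inside reorder_id
        if re.search(rf"\b{re.escape(col)}\b", text, re.IGNORECASE):
            usage[col] += 1

unused = [col for col, n in usage.items() if n == 0]
print(unused)  # columns never mentioned are clean-up candidates
```

As the answer says, this is imperfect (aliases, `SELECT *`, and views will confuse it), but it gives a first cut at column usage.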

SSIS task for variables from another table

I'm very new to SSIS. Prior to this I have been writing ETL procedures manually, hence my lack of confidence and familiarity in this environment. I know the basics, like Execute SQL and importing using a static query.
However, I'm looking for the best way to do the following:
I have a lookup table with a list of records containing two fields and a flag.
For each record where the flag is 0, I want to import records using a query that dumps data into one common table, with the two fields from the above-mentioned lookup table in the WHERE clause of the query.
Could someone please help this noob?
Thanks in advance; it will be much appreciated.
Use a "Conditional Split". It allows splitting rows into many streams by a condition. In your case the condition would be "[Flag] == 0".
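The overall flow (filter on the flag, then use the two fields as parameters of the import query) can also be done entirely in SQL, which is sometimes simpler than wiring it in SSIS. A minimal sketch, with hypothetical `lookup`, `source_data`, and `common` tables:

```python
import sqlite3

# Hypothetical tables standing in for the lookup, the source, and the
# common destination table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup (field1 TEXT, field2 TEXT, flag INTEGER);
INSERT INTO lookup VALUES ('A', 'X', 0), ('B', 'Y', 1), ('C', 'Z', 0);
CREATE TABLE source_data (field1 TEXT, field2 TEXT, amount INTEGER);
INSERT INTO source_data VALUES ('A', 'X', 10), ('B', 'Y', 20), ('C', 'Z', 30);
CREATE TABLE common (field1 TEXT, field2 TEXT, amount INTEGER);
""")

# For each lookup row with flag = 0, run the import query with the two
# fields as parameters and dump the result into the common table.
pending = conn.execute(
    "SELECT field1, field2 FROM lookup WHERE flag = 0"
).fetchall()
for f1, f2 in pending:
    conn.execute(
        "INSERT INTO common SELECT field1, field2, amount "
        "FROM source_data WHERE field1 = ? AND field2 = ?",
        (f1, f2),
    )

imported = conn.execute("SELECT COUNT(*) FROM common").fetchone()[0]
print(imported)  # only the flag = 0 rows were imported
```

In SSIS the same shape is a Foreach Loop over the flagged lookup rows feeding two variables into a parameterised source query.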

SSIS Moving data from one place to another

I was asked by the company I work for to create an SSIS package that will take data from a few tables in one data source, change a few things in the data, then put it into a few tables in the destination.
The main entity is "Person". In the people table, each person has a PersonID.
I need to loop over these records and, for each person, take his orders from the orders table and other data from a few other tables.
I know how to take data from one table and just move it to a different table in the destination. What I don't know is how to manipulate the data before dumping it into the destination. Also, how can I get data from a few tables for each person ID?
I need to be done with this very fast, so if someone can tell me which items in SSIS I need to use and how, that would be great.
Thanks
Microsoft has a few tutorials.
Typically it is easiest to simply do your joins in SQL before extracting and use that query as the source for extraction. You can also do data modification in that query.
I would recommend using code in SSIS tasks only for things where SQL is problematic: custom scalar functions (which can be quicker in the scripting runtime) and handling disparate data sources.
I would start with the Data Flow Task.
Use the OLE DB Source to execute a stored proc that reads, manipulates, and returns the data you need.
Then you can pass that to either an OLE DB Destination or an OLE DB Command that will move it to the destination.
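The "join and manipulate in the source query" advice boils down to one query that the data flow only has to move. A minimal sketch (hypothetical `people` and `orders` tables matching the question's PersonID setup):

```python
import sqlite3

# Hypothetical source tables: one person row, many order rows per person.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (PersonID INTEGER, name TEXT);
CREATE TABLE orders (OrderID INTEGER, PersonID INTEGER, total REAL);
INSERT INTO people VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# One source query does the join and the manipulation (here an aggregate),
# so there is no need to loop over PersonIDs row by row.
rows = conn.execute("""
    SELECT p.PersonID, p.name,
           COUNT(o.OrderID) AS order_count,
           SUM(o.total)     AS spent
    FROM people p
    LEFT JOIN orders o ON o.PersonID = p.PersonID
    GROUP BY p.PersonID, p.name
    ORDER BY p.PersonID
""").fetchall()

for row in rows:
    print(row)
```

In SSIS, this query (or a stored proc wrapping it) becomes the OLE DB Source of the Data Flow Task, and the destination component just receives the already-shaped rows.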

Is TSQL the right tool to produce a "transposed" query?

I have a task which requires data manipulation to transpose rows to columns. The data is stored in SQL Server, in a typical relational database model. I know it can be done with TSQL, but it is complex: there are almost ten different row groups to be transposed into about 200 columns.
Just wondering whether there are better tools to do that?
You will have to generate dynamic SQL to do this. Someone asked something similar a couple of days ago:
sql-query-to-display-db-data
Are you using SQL Server 2005+? I have a stored procedure that makes it pretty easy to pivot data like that.
This post is an example of using it: SQL query result monthly arrangement
For 200 columns, you should probably break it up into logical groups and store the pivoted data in temp tables (the stored proc does that as an option), then join up the temp tables for your final result. I've had squirrelly results with more than 147 columns with PIVOT.
Some example setup source can be found here.
Without a detailed question, it's hard to say which way is best. However, you can turn rows into columns in TSQL using PIVOT. You may want to check it out and see if it can do what you need.
I would also try doing it from the application, as this might be faster than pivoting the data in the database. But this is one of those things you probably need to try both ways to see what is best in your particular case.
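The "generate dynamic SQL" approach means discovering the distinct values first and then building the pivot query from them. A minimal sketch using conditional aggregation (the portable equivalent of T-SQL's PIVOT; the `sales` table and its data are made up):

```python
import sqlite3

# Hypothetical narrow table to be transposed: one row per region/month.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('North', 'Jan', 100), ('North', 'Feb', 110),
  ('South', 'Jan', 200), ('South', 'Feb', 210);
""")

# Step 1: discover the distinct values that will become columns.
months = [m for (m,) in conn.execute(
    "SELECT DISTINCT month FROM sales ORDER BY month"
)]

# Step 2: build the pivot query dynamically, one CASE expression per
# column. (The values come from the data itself here; in production,
# quote/escape them properly before splicing into SQL.)
cases = ", ".join(
    f"SUM(CASE WHEN month = '{m}' THEN amount END) AS [{m}]" for m in months
)
sql = f"SELECT region, {cases} FROM sales GROUP BY region ORDER BY region"

result = list(conn.execute(sql))
for row in result:
    print(row)
```

With 200 target columns the same two-step shape applies; the column list just gets long, which is why the answer suggests splitting it into logical groups joined at the end.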

Spring Batch SQL Query with in clause

I am new to Spring batch.
I am currently developing a test project to learn Spring batch and I have run into an issue.
My requirement is that I need to query my Oracle database to find the IDs from one table and then pass those IDs to get the details from another table. Currently I have roughly 300 IDs.
I can get the IDs, but I am not sure how to pass them all at once in the IN clause of the SQL query to get the other fields, which are stored in a different table.
I am also open to other suggestions to solve this issue.
Thanks,
Nik
I can get the IDs but I am not sure how to pass those IDs in the IN clause of the SQL query all at once to get the other fields which are stored in a different table
You can create a:
first step (tasklet) that gets those IDs and puts them in the execution context
second step (chunk-oriented) that reads those IDs from the execution context and uses them in the IN clause of the reader's query
Passing data between steps is explained in detail in the Passing Data to Future Steps section of the reference documentation.
My requirement is that I need to query my Oracle database to find the IDs from one table and then pass those IDs and get the details from another table for those IDs
I am also open to other suggestions to solve this issue.
I suggest using a common pattern called the Driving Query Pattern because I think it is a good fit for your requirement. The idea is that the reader gets only the IDs and a processor fetches the details of each ID from the other tables.
Hope this helps.
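Independently of Spring Batch, the IN-clause part itself is often handled by chunking the ID list, since Oracle limits an IN list to 1000 expressions. A language-agnostic sketch in Python (the `details` table and the ID list are made up; the same batching applies from a Spring Batch reader):

```python
import sqlite3

# Hypothetical details table keyed by the IDs found in the first query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE details (id INTEGER, info TEXT)")
conn.executemany(
    "INSERT INTO details VALUES (?, ?)",
    [(i, f"detail-{i}") for i in range(300)],
)

ids = list(range(0, 300, 2))  # the ~150 IDs produced by the first step

def chunks(seq, size):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

rows = []
for batch in chunks(ids, 100):  # stay well under Oracle's 1000-item IN limit
    placeholders = ", ".join("?" for _ in batch)
    rows.extend(conn.execute(
        f"SELECT id, info FROM details WHERE id IN ({placeholders})", batch
    ))

print(len(rows))  # one detail row per requested ID
```

With the Driving Query Pattern you would instead look up each ID's details in the processor, trading the batched IN clause for one small query per item.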