I am new to PDI. My project will export data from multiple views in a PostgreSQL database and output multiple files. One requirement is that each generated file must include two fields showing the start and end time of the transformation (from querying a view to generating the file). Which components/scripts should I use?
Question regarding Pentaho Spoon (Data Integration):
How can I transfer the contents of multiple tables from one database to multiple tables in another database? Basically a 1:1 data migration, with the tables created automatically in the target database.
I basically want to replicate the following transformation for every table: [picture of a single table-to-table transformation]
Try the Copy Tables wizard, under the tools menu.
To use it, you will need to create a new transformation and define both database connections that you want to use.
I am developing a migration tool using the Talend ETL tool (free edition).
Challenges faced:
Is it possible to create a Talend job that uses a dynamic schema every time it runs, i.e. no hard-coded mappings in the tMap component?
I want the user to provide an input CSV/Excel file, and the job should create the mappings based on that input file. Is this possible in Talend?
Any other free or open-source ETL tool would also be helpful, as would a sample job.
Yes, this can be done in Talend, but if you do not wish to use a tMap then your table and file must match exactly. The way we have implemented it is for stage tables whose columns are all varchar. This works when you are loading raw data into a stage table, and your validation is done after the load, before loading the stage data into a data warehouse.
Here is a summary of our method:
The filenames contain the table name, so the process starts with a tFileList and parsing the table name out of the file name (see the sketch after this list).
Using tMSSQLColumnList, obtain each column name, type, and length for the table (one way is to store them as an inline table in a tFixedFlowInput).
Run this through a tSetDynamicSchema to produce your dynamic schema for that table.
Use a file input that references the dynamic schema.
Load that into an MSSQLOutput, again referencing the dynamic schema.
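As a rough illustration of the first step above (pulling the table name out of the file name), here is a minimal Java sketch of the kind of logic that could live in a Talend routine or a tJava/tJavaRow component. The filename pattern (table name followed by a date, e.g. Customer_20160218.csv) and the class/method names are assumptions; adjust the regex to your own naming convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TableNameFromFile {

    // Assumes filenames like <TableName>_<yyyyMMdd>.csv or <TableName><yyyyMMdd>.csv.
    private static final Pattern FILE_PATTERN =
            Pattern.compile("^([A-Za-z]+)_?\\d{8}\\.csv$", Pattern.CASE_INSENSITIVE);

    // Returns the table name embedded in the file name, e.g. "Customer".
    public static String tableName(String fileName) {
        Matcher m = FILE_PATTERN.matcher(fileName);
        if (m.matches()) {
            return m.group(1);
        }
        throw new IllegalArgumentException("Unexpected file name: " + fileName);
    }

    public static void main(String[] args) {
        System.out.println(tableName("Customer_20160218.csv")); // prints: Customer
    }
}
```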
One more note on data types: it may work with data types other than varchar, but our stage tables only have varchar and datetime. We had issues with datetime, so we filtered out those column types with a tMap.
Keep in mind, this is a summary to point you in the right direction, not a precise tutorial. But with this info in your hands, it can save you many hours of work while building your solution.
I have multiple CSV files in a folder.
Example:
Member.csv
Leader.csv
I need to load them into database tables.
I have worked on it using a Foreach Loop Container, Data Flow Task, Excel Source, and OLE DB Destination.
We can do it by using expressions and precedence constraints, but how can I do it using a Script Task if I have more than 10 files? I got stuck on this one.
We have a similar issue; our solution is a mixture of the suggestions above.
We have a number of files types sent from our client on a daily basis.
These have a specific filename pattern (e.g. SalesTransaction20160218.csv, Product20160218.csv)
Each of these file types has a staging "landing" table with the structure you expect.
We then have a .NET Script Task that takes the filename pattern and loads that data into the landing table.
Various checks are also done within the CSV parser (matching the number of columns, some basic data validation) before loading into the landing table.
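The .NET code itself isn't shown in this answer; purely to illustrate the kind of column-count check described above, here is a minimal sketch (written in Java for illustration only, since the real implementation here is a .NET Script Task). The class name is an assumption, the expected column count is presumed to come from the landing table definition, and the naive split does not handle quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvColumnCountCheck {

    // Returns a list of error messages; an empty list means the file passed this basic check.
    public static List<String> validate(Path csvFile, int expectedColumns) throws IOException {
        List<String> errors = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(csvFile)) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                // Naive split; a real parser must handle quoted fields containing commas.
                int columns = line.split(",", -1).length;
                if (columns != expectedColumns) {
                    errors.add("Line " + lineNo + ": expected " + expectedColumns
                            + " columns, found " + columns);
                }
            }
        }
        return errors;
    }
}
```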
We are not good enough .NET programmers to be able to dynamically parse an unknown file structure, create a SQL table, and then load the data in. I expect it is feasible; after all, that is what the SSIS Import/Export Wizard does (with some manual intervention).
As an alternative to this (the process is quite delicate), we are experimenting with an HDFS data landing area, which allows us to use analytic tools like R to parse the data within HDFS. After that, Pig is used to load the data into SQL.
Database platform: SQL Server 2012
I have a folder with a lot of CSVs, and I need to create a table for each CSV. Each CSV has the column names in the first row and data in the subsequent rows.
I have a handy SSIS package to iterate through a folder and import into existing tables in a database, but in this case it is our first load and we would also like to create the tables as part of the process.
I know how to do it one at a time through the Import Wizard or the SSIS OLE DB source's New Table button; I was wondering if there is a more automated way using SSIS.
After further review of the 313 CSVs, I determined that 75% of them are lookup tables and the other 25% are relevant data. I will simply go through each one, build out a staging table for each, and then properly build out the structure. It will only take about a day to build one SSIS package to churn through all the CSVs I want to use, and then I'm all set!
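Since clicking through 300+ CSVs one at a time is the painful part, one option is to script the staging-table DDL from each CSV's header row and then let the existing iterate-and-load package do the rest. Below is a minimal sketch of that idea; treating every column as VARCHAR(255) is an assumption (adjust lengths and types as needed), and the schema name and example file name are likewise assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StagingTableDdl {

    // Builds a CREATE TABLE statement from the CSV header row, one VARCHAR column per header.
    public static String createTableSql(Path csvFile) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(csvFile)) {
            String header = reader.readLine();
            if (header == null) {
                throw new IOException("Empty file: " + csvFile);
            }
            String tableName = csvFile.getFileName().toString()
                    .replaceFirst("(?i)\\.csv$", "");
            StringBuilder sql = new StringBuilder("CREATE TABLE [stg].[" + tableName + "] (\n");
            String[] columns = header.split(",", -1);
            for (int i = 0; i < columns.length; i++) {
                sql.append("    [").append(columns[i].trim()).append("] VARCHAR(255)");
                sql.append(i < columns.length - 1 ? ",\n" : "\n");
            }
            sql.append(");");
            return sql.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file name; point this at one of the real CSVs in the folder.
        System.out.println(createTableSql(Paths.get("example.csv")));
    }
}
```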
Here is my newbie-to-SQL-Server problem.
I created a table with a flat file (.txt) using the Import and Export Data Wizard.
In order to use this table in Arc SDE, I had to create another field named ObjectID.
I need to do the following:
Use the daily generated flat file: erase the data from the table and replace it with the new data.
The ObjectID field is derived and not in the flat file, but it needs to stay in the table and auto-populate.
Develop a script or SQL statement.
Set up a daily process.
Provide error or completion reports.
Generally speaking, I would use an Integration Services package executed as part of a SQL Server scheduled task. Those are the two primary technologies involved, and that should get you started.
You can use SSIS (Business Intelligence Development Studio, BIDS); it allows you to load your text file into your SQL Server table with the transformations you want (such as adding the ObjectID field).
A sample: http://www.databasejournal.com/features/mssql/article.php/3832386/Flat-File-Imports-with-SQL-Server-Integration-Services-2008.htm
Then create the scheduled task using SQL Server Agent.
A sample here: http://decipherinfosys.wordpress.com/2008/09/17/scheduling-ssis-packages-with-sql-server-agent/
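To make the daily refresh concrete, here is a minimal sketch of what the scheduled package boils down to, written as plain JDBC rather than as an SSIS package. The table name, column names, file path, delimiter, and connection string are all assumptions; the key points are that the table is truncated before each load and that ObjectID is defined as an IDENTITY column in SQL Server, so it auto-populates and never appears in the flat file. (Running this requires the Microsoft SQL Server JDBC driver on the classpath.)

```java
import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class DailyReload {

    public static void main(String[] args) throws Exception {
        Path flatFile = Paths.get("daily_extract.txt");                      // hypothetical path
        String url = "jdbc:sqlserver://localhost;databaseName=GisStaging";   // hypothetical connection
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            con.setAutoCommit(false);

            // 1. Erase yesterday's data.
            try (Statement st = con.createStatement()) {
                st.execute("TRUNCATE TABLE dbo.DailyImport");                // hypothetical table
            }

            // 2. Reload from the flat file. ObjectID is omitted: as an IDENTITY
            //    column it is assigned automatically by SQL Server.
            String insert = "INSERT INTO dbo.DailyImport (ColA, ColB) VALUES (?, ?)"; // columns assumed
            try (PreparedStatement ps = con.prepareStatement(insert);
                 BufferedReader reader = Files.newBufferedReader(flatFile)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] fields = line.split("\\t", -1);                 // assumes tab-delimited
                    ps.setString(1, fields[0]);
                    ps.setString(2, fields[1]);
                    ps.addBatch();
                }
                ps.executeBatch();
            }

            con.commit();
        }
    }
}
```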