I have an Excel file containing raw data without primary keys and with many fields that don't exist in my MySQL database tables. How do I perform an ETL (in Pentaho Kettle) that can:
1. Retrieve the necessary columns
2. Edit and attach an ID_KEY column
3. Generate and increment id_key
4. Inject all data into my MySQL database tables
Follow these steps:
Use a "Microsoft Excel Input" step to read the Excel file that is your source data.
Next, use a "Select Values" step to edit the input data into the format you need.
In the next hop, use an "Add Sequence" step to generate a unique key for every input row.
Finally, use a "Table Output" step with the MySQL database credentials to load your data.
Hope it helps :)
I am doing an insert/update step (text file to DB) in Spoon and I have a question.
Suppose that in my text file I have 10 columns and in my DB I have 18, because 8 columns will be completed from another text file later.
On the insert/update step, I chose a key to look up the value (client_id, for example), and in "Update fields" I mapped those 10 columns. When I checked the SQL query, I saw that those 8 columns would be dropped.
But I want to keep them. Any solution for it?
The Insert/Update step will NOT drop columns when run normally.
The SQL button inspects the table and suggests changes based on the fields you specified in the step. It's only a convenience for quick ETL development, for example when sending rows from text files to a staging table using a Table Output step. It only drops columns if you execute the script it generates. Don't do that, and your columns will be perfectly safe!
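For illustration, the script behind the SQL button might suggest statements like the following (the column names here are hypothetical); nothing happens to the table unless you actually execute it:

ALTER TABLE clients DROP COLUMN future_col_1;
ALTER TABLE clients DROP COLUMN future_col_2;

Review it, close the dialog, and move on.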
I have database a with schema foo, which contains 20 tables. I want to move all of the contents of schema foo into database b without overwriting the current content of database b.
Is there also a way to do it in pgadmin?
I found the link below, and it seems quite similar, but it only covers transferring a single table:
Copy a table from one database to another in Postgres
You can script the first database with all its data; once it is scripted, you can run the script within the other database. It should work as long as you don't have tables in the second database with the same names.
So in pgAdmin, follow these steps to script the database:
-Right-click on the database and click Backup.
-Select a file path and filename for where you want to save your script.
-Select Plain as the format in the format dropdown.
-Go to Options and check "schema and data" in tab #1.
-Then click on Backup.
-Then click Done.
-Then right-click on your 2nd database and create a new query.
-Find where you saved the script and copy the script into the query.
-Run the query, and all should be good.
If you are unsure about this, just create 2 practice databases and practice on those before you do it on the main one.
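For reference, the command-line equivalent of those pgAdmin steps is pg_dump with the plain format; a minimal sketch, assuming the databases are named database_a and database_b and you only want schema foo:

pg_dump -F p -n foo -f foo_backup.sql database_a
psql -d database_b -f foo_backup.sql

The -n foo flag limits the dump to that schema; psql then replays the plain-format script against the second database.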
I'm working on an SSIS project that pulls data from Excel and loads it to an Oracle database every month. I plan to pull data from the Excel file and load it to an Oracle stage table. I will be using a merge statement because the data that gets loaded each month is a rolling 12-month list and the data can change, so I need to be able to INSERT when records don't match or UPDATE when they do. My control flow looks like this: Truncate Stage Table (to clear out the table from the last package run) ---> Data Flow from Excel to Stage Table ---> Merge to Target Table in Oracle.
My problem is that the data in the source Excel file doesn't have any unique columns from which to select a primary key or a composite key, as it is possible (although very unlikely) that a new record could have exactly the same information. I am unable to use "generated always as identity" because my SSIS package needs to truncate the stage table at the beginning of each job. This would generate the same ID numbers in the new load and create problems in the target table.
Any suggestions as to how I can get around this problem?
Welcome to SO and ETL. Instead of using a staging table, use two sources in SSIS: the Excel file and the existing production table. Sort both inputs and then perform a merge join on the unique identifier. From there, use a Derived Column transformation to add a new column called 'Action', which will mark a row as INSERT, UPDATE, or DELETE based on whether the join key is NULL. So:
NULL from file means DELETE (not in file, in database)
NULL from database means INSERT (in file, not in database)
Not NULL for both means UPDATE (in file, in database)
From there, use a Conditional Split to route rows to either an OLE DB Destination (INSERT) or an OLE DB Command (UPDATE or DELETE). You can now remove the stage environment and the MERGE command from your process. This has the added benefit of removing the ETL load from the SQL Server, assuming SSIS is running on a separate server.
Note: The sort transformation has the option to remove duplicates.
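For the UPDATE and DELETE branches, the OLE DB Command transformation takes a parameterized statement; a minimal sketch, assuming a hypothetical target table and business key (map each ? to an input column in the component):

UPDATE target_table SET col_a = ?, col_b = ? WHERE business_key = ?
DELETE FROM target_table WHERE business_key = ?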
I want to get a backup of a single table with its data from a database in SQL Server using a script.
How can I do that?
SELECT * INTO mytable_backup FROM mytable
This makes a copy of table mytable, and every row in it, called mytable_backup. It will not copy any indices, constraints, etc., just the structure and data.
Note that this will not work if you have an existing table named mytable_backup, so if you want to use this code regularly (for example, to back up daily or monthly), you'll need to drop the existing mytable_backup table first.
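A minimal sketch of guarding against the existing copy, using the same table names as above:

IF OBJECT_ID('mytable_backup', 'U') IS NOT NULL
    DROP TABLE mytable_backup;

SELECT * INTO mytable_backup FROM mytable;

On SQL Server 2016 and later, DROP TABLE IF EXISTS mytable_backup; does the same check in one line.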
You can use the "Generate script for database objects" feature on SSMS.
Right click on the target database
Select Tasks > Generate Scripts
Choose desired table or specific object
Hit the Advanced button
Under General, choose a value for Types of data to script. You can select Data only, Schema only, or Schema and data. Schema and data includes both the table creation and the actual data in the generated script.
Click Next until the wizard is done.
There are many ways you can take a backup of a table.
BCP (BULK COPY PROGRAM) (see the command-line sketch after this list)
Generate Table Script with data
Make a copy of the table using SELECT INTO (see the example above)
Save table data directly to a flat file
Export data using SSIS to any destination
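As an illustration of the BCP option, a minimal command-line sketch, assuming a database named mydb, a table dbo.mytable, and Windows authentication on server myserver:

bcp mydb.dbo.mytable out mytable.dat -S myserver -T -n
bcp mydb.dbo.mytable in mytable.dat -S myserver -T -n

-T uses a trusted connection and -n keeps SQL Server's native format; note this copies data only, not the table schema.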
You can create a table script along with its data using the following steps:
Right click on the database.
Select Tasks > Generate scripts ...
Click next.
Click next.
In Table/View Options, set Script Data to True; then click next.
Select the Tables checkbox and click next.
Select your table name and click next.
Click next until the wizard is done.
For more information, see Eric Johnson's blog.
Put the table in its own filegroup. You can then use the regular SQL Server built-in backup to back up that filegroup, which in effect backs up the table.
To backup a filegroup see:
https://learn.microsoft.com/en-us/sql/relational-databases/backup-restore/back-up-files-and-filegroups-sql-server
To create a table on a non-default filegroup (it's easy) see:
Create a table on a filegroup other than the default
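A minimal T-SQL sketch of both halves, assuming a hypothetical filegroup named TableFG that already contains a data file:

-- Create the table on the non-default filegroup
CREATE TABLE dbo.mytable (id INT NOT NULL) ON TableFG;

-- Back up just that filegroup
BACKUP DATABASE mydb
    FILEGROUP = 'TableFG'
    TO DISK = 'C:\Backups\mydb_TableFG.bak';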
Another approach you can take if you need to back up a single table out of multiple tables in a database is:
Generate a script of the specific table(s) from the database (right-click the database, then click Tasks > Generate Scripts...).
Run the script in the query editor. You must change/add the first line (USE DatabaseName) in the script to point to a new database, to avoid the "Database already exists" error.
Right-click on the newly created database, and click Tasks > Back Up...
The backup will contain the selected table(s) from the original database.
To get a copy in a file on the local file-system, this rickety utility from the Windows start button menu worked:
"C:\Program Files (x86)\Microsoft SQL Server\110\DTS\Binn\DTSWizard.exe"
I have a database1 which has more than 500 tables, and I have a database2 which also has the same number of tables; in both databases the table names are the same. Some of the tables have different table definitions; for example, the reports table in database1 has 9 columns and the reports table in database2 has 10.
I want to copy all the data from database1 to database2 so that it overwrites the matching data and appends columns where the structures don't match. I have tried the Import/Export Wizard in SQL Server 2008, but it gives an error when it comes to the last step of copying rows. I don't have a screenshot of that error right now; it is on my office PC. It says something like "error inserting into the read-only column xyz", and sometimes it says "vs_isbroken". For the read-only column error, I enabled identity insert, but it did not help.
Please help me. This is an opportunity for me at my office.
SSIS and SQL Server 2008 Wizards can be finicky tools.
If you get a "can't insert into column ABC", then it could be one of the following:
Inserting into a PK column -> when setting up the mappings, you need to indicate to overwrite the value
Inserting into a column with a smaller range -> for example from nvarchar(256) into nvarchar(50)
Inserting into a calculated column (pointed out by #Nick.McDermaid)
You could also get issues with referential integrity if your database uses this (most do).
If you're going to do this more often, then I suggest you build an SSIS package instead of using the wizard tooling. This way you will see warnings on all sorts of issues like the ones I've described above. You can then run your package on demand.
Another suggestion I would make is that you insert DB1's data into "stage" tables in DB2. These tables should have no referential integrity constraints, which will allow you to break the process into several steps, as follows.
Stage the data from DB1 into DB2
Produce reports/queries on issues pertinent to your database/rules
Merge the data from stage tables into target tables using SQL
That last step is where you can use merge statements, or simple inserts/updates depending on a key match. Using SQL here, in the local database, lets you work with sets to manage the overlap of the two data sets and figure out what is new or needs to be updated.
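A minimal sketch of that merge, assuming hypothetical stage and target tables keyed on client_id:

MERGE dbo.target_reports AS t
USING dbo.stage_reports AS s
    ON t.client_id = s.client_id
WHEN MATCHED THEN
    UPDATE SET t.col_a = s.col_a, t.col_b = s.col_b
WHEN NOT MATCHED BY TARGET THEN
    INSERT (client_id, col_a, col_b)
    VALUES (s.client_id, s.col_a, s.col_b);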
SSIS "can" do this, but you will not be able to do a bulk update using SSIS, whereas with SQL you can. SSIS would do what is known as RBAR (row by agonizing row), something slow and to be avoided.
I suggest you inform your seniors that this will take a little longer to ensure it is reliable and the results are reportable. Then work step by step, reporting on each stage's completion.
Another two small suggestions:
Create an _Archive table for each of the stage tables and add a Tstamp column to each. Merge into these after the stage step; this will allow you to quickly see when particular rows were introduced into DB2 (see the sketch after this list).
After the stage step and before the SQL merge step, create indexes on your stage tables. This will improve the merge performance.
Drop those indexes after each merge; this will improve the bulk insert performance on the next load.
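A minimal sketch of those two suggestions, with hypothetical table and column names (the archive table is assumed to exist with a trailing Tstamp column):

-- Append this load's stage rows to the archive, stamped with the load time
INSERT INTO dbo.stage_reports_Archive
SELECT s.*, CURRENT_TIMESTAMP AS Tstamp
FROM dbo.stage_reports AS s;

-- Index the merge key before merging, drop it after
CREATE INDEX IX_stage_reports_key ON dbo.stage_reports (client_id);
-- ... run the merge here ...
DROP INDEX IX_stage_reports_key ON dbo.stage_reports;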
Basics on staging (in response to the question clarification):
Links:
http://www.codeproject.com/Articles/173918/How-to-Create-your-First-SQL-Server-Integration-Se
http://www.jasonstrate.com/tag/31daysssis/
http://blogs.msdn.com/b/andreasderuiter/archive/2012/12/05/designing-an-etl-process-with-ssis-two-approaches-to-extracting-and-transforming-data.aspx
Staging is the act of moving data from one place to another without any checks.
First you need to create the target tables; their schema should match the source tables.
Open up BIDS and create a new Project and in it a new SSIS package.
In the package, create a connection for the source server and another for the destination.
Then create a data flow step; in the step, create a data source for each table you want to copy from.
Connect each source to a new data destination and set the appropriate connection and table.
When done, save and do a test run.
Before the data flow step, you might like to add a SQL step that will truncate all the target tables.
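That truncate step can be a plain Execute SQL task with one statement per target table, for example (hypothetical table names):

TRUNCATE TABLE stage.customers;
TRUNCATE TABLE stage.orders;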
If you're open to using tools, then what about something like Red Gate SQL Compare and Red Gate SQL Data Compare?
First, I would use SQL Compare to manage the schema differences: add the new columns you want to your destination database (database2) from the source (database1). Then, with Data Compare, you match the contents of the tables; for any columns it can't match based on names, you specify how to handle them. Then you can pick and choose what data you want to copy to your destination. You'll see what data is new and what's different (you can delete data in the destination that's not in the source, or ignore it). You can either have the tool do the work or have it create a script for you to run when you want.
There's a 15-day trial if you want to experiment.
It seems like you may be looking for replication technology, as offered by SQL Server Replication.
Well, if I understood your requirement correctly, you need to make database2 a replica of database1. Why not take a full backup of database1 and restore it as database2? Your database2 will be exactly what database1 was at the time of the backup.
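A minimal T-SQL sketch of that approach; the logical file names and paths are assumptions you would check with RESTORE FILELISTONLY first:

BACKUP DATABASE database1 TO DISK = 'C:\Backups\database1.bak';

RESTORE DATABASE database2
    FROM DISK = 'C:\Backups\database1.bak'
    WITH MOVE 'database1_Data' TO 'C:\Data\database2.mdf',
         MOVE 'database1_Log'  TO 'C:\Data\database2.ldf';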