Improve ETL from COBOL file to SQL

I have a multi-server/multi-process/multi-threaded solution that can parse and extract over 7 million records from a 6 GB EBCDIC COBOL file into 27 SQL tables, all in under 20 minutes. The problem: the actual parsing and extraction only takes about 10 minutes, using bulk inserts into staging tables. It then takes almost another 10 minutes to copy the data from the staging tables to their final tables. Any ideas on how I can improve the second half of the process? I've tried using In-Memory tables, but that blows out the SQL Server.
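Two things that often shrink that second half: make the staging-to-final copy minimally logged, or avoid the copy altogether by switching partitions when the staging and final tables share the same partition scheme and indexes. A minimal sketch, assuming hypothetical dbo.StagingRecords and dbo.FinalRecords tables with identical schemas:

```sql
-- Option 1: minimally logged insert (simple or bulk-logged recovery model).
-- TABLOCK on the target allows a bulk-optimized, minimally logged load
-- when the target is a heap or starts out empty.
INSERT INTO dbo.FinalRecords WITH (TABLOCK)
SELECT *
FROM dbo.StagingRecords;

-- Option 2: if StagingRecords and FinalRecords are partitioned identically,
-- switching a partition is a metadata-only operation and takes roughly the
-- same time regardless of row count.
ALTER TABLE dbo.StagingRecords
    SWITCH PARTITION 1 TO dbo.FinalRecords PARTITION 1;
```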

Related

Merge rows from multiple tables SQL (incremental)

I'm consolidating the information from 7 SQL databases into one.
I've built an SSIS package using the Lookup transformation, and I get the expected result.
The problem: 30 million rows. I want to run a daily task that adds the new rows from the source tables to the destination table.
Right now it takes about 4 hours to execute the package...
Any suggestions?
Thanks!
I have only tried full cache mode...
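If the source tables carry a reliable key and a modified-date (or similar watermark) column, a common alternative to looking up all 30 million rows each day is to pull only the rows changed since the last load and merge them in T-SQL. A minimal sketch, assuming hypothetical dbo.Source and dbo.Destination tables keyed on Id, with a ModifiedDate column and a one-row dbo.EtlWatermark table:

```sql
-- Only consider rows changed since the last successful load,
-- then merge that slice into the destination on the business key.
DECLARE @LastLoad datetime2 =
    (SELECT MAX(LoadedThrough) FROM dbo.EtlWatermark);  -- hypothetical watermark table

MERGE dbo.Destination AS tgt
USING (
    SELECT Id, Col1, Col2, ModifiedDate
    FROM dbo.Source
    WHERE ModifiedDate > @LastLoad          -- incremental slice, not the full 30M rows
) AS src
    ON tgt.Id = src.Id
WHEN MATCHED THEN
    UPDATE SET tgt.Col1 = src.Col1,
               tgt.Col2 = src.Col2,
               tgt.ModifiedDate = src.ModifiedDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1, Col2, ModifiedDate)
    VALUES (src.Id, src.Col1, src.Col2, src.ModifiedDate);

UPDATE dbo.EtlWatermark SET LoadedThrough = SYSUTCDATETIME();
```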

Divide a SQL file into smaller files and save each of them as a CSV/Excel/TXT file dynamically

So, I'm working on a SAP HANA database that has 10 million records in one table and there are 'n' number of tables in the db. The constraints that I'm facing are:
I do not have write access to the db.
The maximum RAM in the system is 6 GB.
Now, I need to extract the data from this table and save it as a CSV, TXT, or Excel file. I tried a plain SELECT * query; the machine extracts roughly 700k records before throwing an out-of-memory exception.
I've tried using LIMIT and OFFSET in SAP HANA and it works, but it takes around 30 minutes for the machine to process ~500k records, so going this route will be very time consuming.
So, I wanted to know if there is any way to automate the process of selecting 500k records at a time using LIMIT and OFFSET, and to save each such chunk of 500k records automatically as its own CSV/TXT file on the system, so that I can start the job and leave the machine overnight to extract the data.
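One thing worth noting: OFFSET gets slower with every chunk because the database still has to skip all the preceding rows. A common alternative (keyset pagination) filters on the last key already exported instead of using OFFSET, so each chunk costs roughly the same. A minimal sketch of the per-chunk query, assuming a hypothetical MY_TABLE with an indexed ID column; a small script or batch file would run this repeatedly, substituting the last ID from the previous chunk and spooling each result set to its own CSV file:

```sql
-- Fetch the next chunk of 500k rows after the last ID already exported.
-- :last_id is a placeholder for whatever key the previous chunk ended on (0 for the first run).
SELECT *
FROM MY_TABLE
WHERE ID > :last_id        -- keyset filter replaces OFFSET
ORDER BY ID
LIMIT 500000;
```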

How to run an SQL query on a very large CSV file (4 million+ records) without needing to open it

I have a very large file of 4 million+ records that I want to run an SQL query on. However, when I open the file it only loads 1 million contacts and drops the rest. Is there a way for me to run my query without opening the file, so I do not lose any records? PS: I am using a MacBook, so some functions and add-ins are not available to me.
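One option, assuming you are free to install a local query engine such as DuckDB (which runs on macOS), is to query the CSV in place with SQL and never open it in a spreadsheet, so the 1-million-row display limit never comes into play. The file name and column names below are placeholders:

```sql
-- Query the CSV directly; nothing is imported into a spreadsheet.
-- read_csv_auto infers column names and types from the file header.
SELECT contact_id, email, region
FROM read_csv_auto('contacts.csv')
WHERE region = 'EMEA';

-- Optionally write the filtered result back out as a new CSV.
COPY (
    SELECT contact_id, email, region
    FROM read_csv_auto('contacts.csv')
    WHERE region = 'EMEA'
) TO 'contacts_emea.csv' (HEADER, DELIMITER ',');
```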

Pentaho | Tools-> Wizard-> Copy Tables

I want to copy tables from one database to another database.
I went through Google and found that this can be done with the Copy Tables wizard under the Tools menu in Spoon.
Currently I am trying to copy just one table from one database into another.
My table has just 130,000 records and it took 10 minutes to copy.
Can these load timings be improved? Copying 100k records should not take more than 10 seconds.
Try the MySQL bulk loader (note: that step is Linux only), or fix the batch size as described here:
http://julienhofstede.blogspot.co.uk/2014/02/increase-mysql-output-to-80k-rowssecond.html
You'll get massive improvements that way.
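For reference, the linked post amounts to adding a few options to the MySQL JDBC connection used by the table output step, so that Connector/J rewrites batched inserts into multi-row statements. A hedged sketch of those option names (entered in the connection's Options panel in Spoon); verify them against the post and your driver version:

```
useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true
```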

Insertion in Leaf table in SQL Server MDS is very slow

I am using SSIS to move data from an existing database to an MDS database.
My control flow has two steps:
1. Truncate TableName_Leaf
2. Load data to stg
The second step has the following data flow:
1. Load data from source database (This has around 90000 records)
2. Apply a data conversion task to convert string datatype to Unicode (as MDS only supports Unicode)
3. Specify TableName_Leaf as OLE DB destination.
Steps 1 and 2 complete quickly, but the insertion into the Leaf table is extremely slow (it took 40 seconds to move 100 rows end to end, and around 6 minutes to move 1,000 records).
I tried deleting extra constraints from the Leaf table, but that also did not improve the performance much.
Is there any other way to insert data to MDS which is quicker or better?
Using "Table or view - fast load" as the data access mode in the OLE DB destination helped resolve the issue. I used a batch size of 1000 in my case and it worked fine.
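As an aside, if fast load had not been enough, MDS also offers an entity-based staging path: bulk load into the entity's stg.<Name>_Leaf staging table and then call the staging stored procedure that MDS generates for the entity to sweep the batch in. A minimal sketch, assuming a hypothetical entity named TableName and the standard generated procedure; check the exact column list of your staging table before using it:

```sql
-- Load the rows into the MDS staging table (ImportType 0 = create/update members,
-- ImportStatus_ID 0 = ready to be processed).
INSERT INTO stg.TableName_Leaf (ImportType, ImportStatus_ID, BatchTag, Code, Name)
SELECT 0, 0, N'InitialLoad', src.Code, src.Name
FROM dbo.SourceTable AS src;

-- Kick off the entity-based staging process for that batch.
EXEC stg.udp_TableName_Leaf
     @VersionName = N'VERSION_1',
     @LogFlag     = 1,
     @BatchTag    = N'InitialLoad';
```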