Using WbImport in SQL Workbench/J with the Presto driver

I am using SQL Workbench/J to import a 160k-line text file into a table. The code is:
WbImport
-usepgcopy
-type=text
-endrow=164841
-file='book1.csv'
-table=it.table1
-delimiter=,
-multiline=true
I have tried this with a 3-line version of my 160k-line file and it completed in a few seconds, but it only seems to complete in auto-commit mode. When I try to run it on the full 160k-line file it takes over 200 hours to complete. Any idea why, or any alternatives?
I am using SQL Workbench/J build 125 and Presto JDBC 0.216.
Thanks

Most likely the reason is that the overall transaction gets too large, and this imposes too big a load on WbImport and the JDBC connection. It would probably work much faster if you break this up into separate imports of e.g. 1000 records each.
If you cut the file up into multiple files first and then import them one at a time, you also avoid repeatedly re-reading the large file to find the right record.
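If you want to script that split, a rough Python sketch could look like this (the chunk size of 1000 rows and the file names are just placeholders):

# Split a large CSV into fixed-size pieces so each piece can be imported
# (and committed) on its own. File names and chunk size are placeholders.
import csv

CHUNK_SIZE = 1000

with open("book1.csv", newline="") as src:
    reader = csv.reader(src)
    chunk, part = [], 0
    for row in reader:
        chunk.append(row)
        if len(chunk) == CHUNK_SIZE:
            with open(f"book1_part{part:04d}.csv", "w", newline="") as out:
                csv.writer(out).writerows(chunk)
            chunk, part = [], part + 1
    if chunk:  # write whatever is left over
        with open(f"book1_part{part:04d}.csv", "w", newline="") as out:
            csv.writer(out).writerows(chunk)

Each book1_partNNNN.csv can then be loaded with its own WbImport call, so every chunk gets its own transaction.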

Related

BigQuery: faster way to insert millions of rows

I'm using the bq command line tool and trying to insert a large number of JSON files, with one table per day.
My approach:
list all files to be pushed (date named YYYMMDDHHMM.meta1.meta2.json)
concatenate files from the same day into one file => YYYMMDD.ndjson
split the YYYMMDD.ndjson file (500 lines per file) => YYYMMDD.ndjson_splittedij
loop over YYYMMDD.ndjson_splittedij and run
bq insert --template_suffix=20160331 --dataset_id=MYDATASET TEMPLATE YYYMMDD.ndjson_splittedij
This approach works. I just wonder if it is possible to improve it.
Again, you are confusing streaming inserts and load jobs.
You don't need to split each file into 500 rows (that applies to streaming inserts).
You can have very large files for a load job; see the command line tab examples listed here: https://cloud.google.com/bigquery/loading-data#loading_csv_files
You have to run only:
bq load --source_format=NEWLINE_DELIMITED_JSON --schema=personsDataSchema.json mydataset.persons_data personsData.json
A compressed JSON file must be under 4 GB; uncompressed, it must be under 5 TB, so larger files are better. Always try with a 10-line sample file until you get the command working.
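If you would rather script the load than call bq directly, the same load job can be submitted with the BigQuery Python client library. This is only a sketch, reusing the table and file names from the bq example above and assuming schema auto-detection in place of the explicit schema file:

# Sketch only: submit a load job via google-cloud-bigquery. The dataset,
# table and file names come from the bq example above; autodetect is an
# assumption in place of personsDataSchema.json.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

with open("personsData.json", "rb") as source_file:
    load_job = client.load_table_from_file(
        source_file, "mydataset.persons_data", job_config=job_config
    )

load_job.result()  # wait for the load job to complete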

Importing a massive text file into a SQL Server database

I am currently trying to import a text file with 180+ million records and 300+ columns into my SQL Server database. Needless to say, the file is roughly 70 GB. I have been at it for days, and every time I get close something happens and it craps out on me. I need the quickest and most efficient way to do this import. I have tried the wizard, which should have been the easiest, then I tried just saving it as an SSIS package. I haven't been able to figure out how to do a bulk import with the settings I think would work great. The error I keep getting is 'not enough virtual memory'. I changed my virtual memory to 36 GB. My system has 24 GB of physical memory. Please help me.
If you are using BCP (and you should be for files this large), use a batch size. Otherwise, BCP will attempt to load all records in one transaction.
By command line: bcp -b 1000
By C#:
using (System.Data.SqlClient.SqlBulkCopy bulkCopy =
           new System.Data.SqlClient.SqlBulkCopy(sqlConnection))
{
    bulkCopy.DestinationTableName = destinationTableName;
    bulkCopy.BatchSize = 1000;             // 1000 rows per batch
    bulkCopy.WriteToServer(dataTable);     // may also pass in DataRow[]
}
Here are the highlights from this MSDN article:
Importing a large data file as a single batch can be problematic, so
bcp and BULK INSERT let you import data in a series of batches, each
of which is smaller than the data file. Each batch is imported and
logged in a separate transaction...
Try reducing the maximum server memory for SQL Server to as low as you can get away with. (Right click the SQL instance in Mgmt Studio -> properties -> memory).
This may free up enough memory for the OS & SSIS to process such a big text file.
I'm assuming the whole process is happening locally on the server.
I had a similar problem with SQL 2012 when trying to import (as a test) around 7 million records into a database. I too ran out of memory and had to cut the bulk import into smaller pieces. One thing to note is that the import process (no matter which method you use) eats up a ton of memory and won't release it until the server is rebooted. I'm not sure if this is intended behavior by SQL Server, but it's something to note for your project.
Because I was using the SEQUENCE command with this process, I had to put the T-SQL into saved SQL scripts and then run them with SQLCMD in small pieces to lessen the memory overhead.
You'll have to play around with what works for you; I highly recommend not running the script all at once.
It's going to be a pain in the ass to break it down into smaller pieces and import it, but in the long run you'll be happier.

Looking for efficient methods of loading large Excel (xlsx) files into SQL

I'm looking for alternate data import solutions. Currently my process is as follows:
Open a large xlsx file in excel
Replace all "|" (pipes) with a space or another unique character
Save the file as pipe-delimited CSV
Use the import wizard in SQL Server Management Studio 2008 R2 to import the CSV file
The process works; however, steps 1-3 take a long time since the files being loaded are extremely large (approx. 1 million records).
Based on some research, I've found a few potential solutions:
a) Bulk Import - Unfortunately this does not eliminate steps 1-3 mentioned above, since the files still need to be converted to a flat (or CSV) format
b) OpenRowSet/OpenDataSource - There are 2 issues with this approach. First, it takes a long time to load (about 2 hours for a million records). Second, when I try to load many files at once (about 20 files, each containing 1 million records), I receive an "out-of-memory" error
I haven't tried SSIS; I've heard it has issues with large xlsx files
So this leads to my question. Are there any solutions/alternate options out there that will make importing of large excel files faster?
Really appreciate the help.
I love Excel as a data visualization tool, but it's pants as a data transport layer. My preference is to either query it with the JET/ACE driver or use C# for non-tabular data.
I haven't cranked it up to the millions, but I'd have to believe the first approach would be faster than your current process, simply because you do not have to perform double reads and writes of your data.
Excel Source as Lookup Transformation Connection
script task in SSIS to import excel spreadsheet
Something I have done before (and I bring it up because I see your file type is XLSX, not XLS) is to open the file through WinZip, pull the XML data out, and then import that. Starting with the 2007 format, an XLSX file is really a zip archive with many folders/files in it. If the Excel file is simple (not a lot of macros, charts, formatting, etc.), you can just pull the data from the worksheet XML that sits in the background. I know you can see it through WinZip; I don't know about other compression apps.
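If you want to automate that unzip step, here is a small Python sketch of the idea; the workbook name and worksheet path are just examples:

# Sketch: an .xlsx file is a zip archive, so the worksheet XML can be pulled
# out directly. Workbook name and sheet path are examples only.
import zipfile

with zipfile.ZipFile("large_workbook.xlsx") as xlsx:
    print(xlsx.namelist())  # worksheets normally live under xl/worksheets/
    with xlsx.open("xl/worksheets/sheet1.xml") as sheet:
        xml_data = sheet.read()

# xml_data now holds the raw sheet XML. Note that text cells are usually
# stored as references into xl/sharedStrings.xml, so you may need that
# file as well to reconstruct the values.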

Importing a text file using pgAdmin

I have just downloaded pgAdmin 1.14.3 in an effort to import, query, and manage large text files. These text files are either quote-comma-quote delimited or tab delimited (they come as quote-comma-quote and I edited many for use with another piece of software). While version 1.16 has an import function, it has not been released yet, and I am wondering how to import data into a newly created table using pgAdmin.
The text files range from 12 MB to 2 GB, so I'm looking for a comprehensive solution that does not involve importing row by row. I tried this with phpPgAdmin, but ran into file size limitations embedded in the php.ini file (separate post) and am trying this as a possible workaround. I'm a little new to SQL, so I'm not really sure of all the commands at my fingertips. Any help is appreciated - thanks!
You can issue a COPY statement, like this:
COPY table_name (column_name)
FROM 'd:\test.sql';
Query returned successfully: 6 rows affected, 31 ms execution time.
See the documentation here:
http://www.postgresql.org/docs/9.1/static/sql-copy.html
Note that I did not test this in PgAdmin for large files, but using psql I have never seen a case where the file had been too big for COPY.
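If you end up scripting the load instead of typing the COPY into pgAdmin, the same statement can be driven from the client side with COPY ... FROM STDIN, which also avoids the server needing read access to the file. A rough psycopg2 sketch, with the connection string, table, column, and file names as placeholders:

# Sketch only: stream a client-side file into PostgreSQL with COPY FROM STDIN.
# Connection string, table, column and file names are placeholders; add a
# HEADER or DELIMITER option to the COPY statement if your file needs it.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
with conn, conn.cursor() as cur:
    with open("test.csv", "r") as f:
        cur.copy_expert(
            "COPY table_name (column_name) FROM STDIN WITH (FORMAT csv)",
            f,
        )
# the "with conn" block commits the transaction on success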

Importing an .RPT (6 gigs) file into SQL Server 2005

I'm trying to import two separate .RPT files into SQL Server; one is small, one is large. Both have issues with determining where the columns are separated.
My solution for this was to import the file into access, define the columns and then save it as a txt file.
This worked perfectly.
The problem, however, is that the larger file is 6 GB and MS Access won't allow me to open it. When I try simply changing the extension to .txt and importing it into SQL Server, everything comes in under one column (despite there being 10) and there is no way to accurately separate the data.
Please help!
As Tony stated, Access has a hard 2 GB limit on database size.
You don't say what kind of file the .RPT file is. If it is a text file, you could break it into smaller chunks by reading it line by line and appending the lines to temporary files, then import/export these smaller files one at a time.
Keep in mind the 2 GB limit is on the Access database, so your temporary text files will need to be somewhat smaller because the import will likely introduce some additional overhead. Also, you may need to compact/repair the database between import/export cycles to reclaim space in the database; simply deleting the records is not enough.
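A rough Python sketch of that line-by-line split (the file names and the roughly 1 GB piece size are placeholders, picked to stay comfortably under the 2 GB limit):

# Sketch: split a big text file into ~1 GB pieces without loading it all
# into memory. File names and the piece size are placeholders.
MAX_CHARS = 1_000_000_000  # roughly 1 GB per piece

part, written = 0, 0
out = open(f"bigfile_part{part:03d}.txt", "w")
with open("bigfile.rpt", "r") as src:
    for line in src:
        if written + len(line) > MAX_CHARS and written > 0:
            out.close()
            part, written = part + 1, 0
            out = open(f"bigfile_part{part:03d}.txt", "w")
        out.write(line)
        written += len(line)
out.close()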
If the file has column delimiters or fixed column widths you can try the following in SQL Management Studio:
Right click on a database, select "Tasks" and then "Import data...". This will take you through a wizard where you can define the source columns and map them to an existing or new table.