I have a CSV file that needs to be BULK INSERTed in my database.
The actual scheme is:
Client generates file01.csv
Client moves file01.csv into the shared folder \\SERVERNAME\Sharing, that points to C:\Data in the Server
Client tells database the file is called file01.csv
Server BULK INSERTs C:\Data\file01.csv into the final table
Server removes the file01.csv from its queue
(It'll be deleted later)
The Windows shared folders are a bit buggy and unstable, so I want to make it a bit different:
Client generates file01.csv
Client inserts file01.csv in VARBINARY(MAX) column
Server simulates the CSV from the VARBINARY and BULK INSERTs it into the final table
(without generating any file in the server side)
The only way I found to make the second option happen is:
Server generates temp.csv from the VARBINARY data
Server BULK INSERTs temp.csv into the final table
(It'll be deleted later)
Is there a way to use a VARBINARY variable instead of a file in the BULK INSERT?
Or if it isn't possible, is there a better way to do this?
(I searched Google for an answer and found only how to read a VARBINARY value from a CSV file, so my question may be a duplicate.)
One way to do what you describe is to create an SSIS package (or a console app, for that matter) with a script task that reads the VARBINARY column into a single-column DataTable, parses it into a "final-table-formatted" DataTable, and then does the bulk insert. The whole process would be in-memory.
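The in-memory idea above can be sketched in Python instead of an SSIS script task. This is only an illustration under stated assumptions: the table final_table, its two columns, and the sample bytes are hypothetical, and sqlite3 stands in for the real server so that the sketch runs anywhere.

```python
import csv
import io
import sqlite3


def rows_from_blob(blob: bytes, encoding: str = "utf-8") -> list[tuple[str, ...]]:
    """Parse CSV text held in a bytes value into row tuples, without a temp file."""
    text = blob.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""))
    return [tuple(row) for row in reader if row]  # drop blank lines


def insert_rows(conn, rows) -> int:
    """Insert the parsed rows into the hypothetical final_table in one batch."""
    cur = conn.cursor()
    cur.executemany("INSERT INTO final_table (id, name) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Stand-in database (assumption): sqlite3 replaces the real server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_table (id TEXT, name TEXT)")

# Stand-in for the value read from the VARBINARY(MAX) column.
blob = b"1,alpha\r\n2,beta\r\n"
inserted = insert_rows(conn, rows_from_blob(blob))
```

The point of the sketch is that the CSV bytes never touch the disk: they are decoded, parsed, and inserted straight from memory.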
You could insert the varbinary(max) data into a FileTable. SQL Server would actually write the data to a local file, which you could then process with BULK INSERT.
Related
I have a table in SQL Server into which I need to insert data on a regular basis. Each day I perform the same task of importing the data manually, which is tedious, so I need your help. Is it possible to send data from a CSV file to an existing SQL Server table without the manual procedure?
Or could I use Python to create a script that sends the data from the CSV file to SQL Server automatically at a fixed time?
First, create a Python script that reads the CSV file and inserts its data into SQL Server. Then create a cron job on your server that runs this script regularly. This might be a possible solution for your problem.
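A minimal sketch of such a script follows. The table daily_data, its columns, and the file contents are hypothetical, and sqlite3 is used only as a stand-in so the example runs anywhere. With SQL Server you would pass in a connection from a driver such as pyodbc instead (an assumption, not shown here).

```python
import csv
import os
import sqlite3
import tempfile
from itertools import islice


def load_csv(path, conn, batch_size=1000):
    """Append the data rows of a CSV file (header skipped) to daily_data."""
    sql = "INSERT INTO daily_data (day, amount) VALUES (?, ?)"
    total = 0
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        cur = conn.cursor()
        while True:
            # Read and insert a bounded batch so a large file is never held in memory at once.
            batch = [tuple(row) for row in islice(reader, batch_size)]
            if not batch:
                break
            cur.executemany(sql, batch)
            total += len(batch)
    conn.commit()
    return total


# Demo against a stand-in database and a temporary CSV file (both assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_data (day TEXT, amount TEXT)")
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "daily.csv")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write("day,amount\n2024-01-01,10\n2024-01-02,12\n")
    loaded = load_csv(path, conn)
```

The scheduler (cron, or Task Scheduler on Windows) then only has to run the script at the chosen time.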
I have an external application which creates CSV files. I would like to load these files into SQL Server automatically, but incrementally.
I was looking into BULK INSERT, but I do not think it is incremental. The CSV files can get huge, so an incremental load would be the way to go.
Thank you.
The usual way to handle this is to bulk insert the entire CSV into a staging table, and then do the incremental merge of the data in the staging table into the final destination table with a stored procedure.
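The staging-then-merge pattern can be sketched as follows. The table and column names are hypothetical, and sqlite3 stands in for SQL Server so the sketch is runnable. On SQL Server the merge step would normally live in a stored procedure, for example as a MERGE statement or as the same UPDATE plus INSERT pair.

```python
import sqlite3

# Stand-in database with a target table that already holds two rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staging (id INTEGER, name TEXT);
    INSERT INTO target VALUES (1, 'alpha'), (2, 'beta');
""")


def merge_staging(conn, staged_rows):
    """Load rows into staging, then apply only the differences to target."""
    cur = conn.cursor()
    cur.execute("DELETE FROM staging")
    cur.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", staged_rows)

    # Update rows that exist in both tables but whose values differ.
    cur.execute("""
        UPDATE target
        SET name = (SELECT s.name FROM staging AS s WHERE s.id = target.id)
        WHERE EXISTS (SELECT 1 FROM staging AS s
                      WHERE s.id = target.id AND s.name <> target.name)
    """)
    updated = cur.rowcount

    # Insert rows that are present in staging but missing from target.
    cur.execute("""
        INSERT INTO target (id, name)
        SELECT s.id, s.name FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.id = s.id)
    """)
    inserted = cur.rowcount
    conn.commit()
    return updated, inserted


# The "CSV" repeats row 1 unchanged, changes row 2, and adds row 3.
result = merge_staging(conn, [(1, "alpha"), (2, "BETA"), (3, "gamma")])
```

The full file is still bulk loaded each time, but only the changed and new rows reach the destination table.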
If you are still concerned that the CSV files are too big for this, the next step is to write a program that reads the CSV, and produces a truncated file with only the new/changed data that you want to import, and then bulk insert that smaller CSV instead of the original one.
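Such a pre-filtering program might look like the sketch below. It assumes that a copy of the previously imported CSV is kept and that each row has a key column, here hypothetically named id; neither assumption comes from the question.

```python
import csv
import io


def delta_csv(previous_text: str, current_text: str, key_column: str = "id") -> str:
    """Return CSV text holding only the rows that are new or changed."""
    # Index the previously imported rows by their key.
    previous = {
        row[key_column]: row
        for row in csv.DictReader(io.StringIO(previous_text))
    }
    reader = csv.DictReader(io.StringIO(current_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep the row if its key is unknown or any of its values changed.
        if previous.get(row[key_column]) != row:
            writer.writerow(row)
    return out.getvalue()


old = "id,name\n1,alpha\n2,beta\n"
new = "id,name\n1,alpha\n2,BETA\n3,gamma\n"
delta = delta_csv(old, new)
```

This version keeps every previous row in memory. For very large files, storing only a hash per key would be one way to reduce that footprint.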
Create a text or CSV file with the names of all the CSV files which you want to load into the table. You can include the file path if it is not the same for every file. You can do this using shell scripting.
Then, using a procedure, create a temporary table and load into it the names of all the CSV files which need to be inserted.
Using that temporary table, loop over its rows and load each file into the target table (without truncating inside the loop). If a truncate is required, do it before the loop. You can load the data into a staging table first if any transformation is required (use a procedure for the transformation).
We had the same problem and used this method. Recently, we switched to Python, which does the whole task and loads the data into a staging table. After the transformations, the data is finally loaded into the target table.
My problem is that I have a CSV blob and I need to import that blob into a SQL table. Is there a utility to do that?
I was thinking of one approach: first copy the blob to the on-premises SQL Server machine using the AzCopy utility, and then import that file into the SQL table using the bcp utility. Is this the right approach? I am also looking for a one-step solution to copy the blob to the SQL table.
Regarding your question about the availability of a utility which will import data from blob storage to a SQL Server, AFAIK there's none. You would need to write one.
Your approach seems OK to me, though you may want to write a batch file or something similar to automate the whole process. In this batch file, you would first download the file to your computer and then run the BCP utility to import the CSV into SQL Server. Other alternatives to writing a batch file are:
Do this thing completely in PowerShell.
Write some C# code which makes use of storage client library to download the blob and once the blob is downloaded, start the BCP process in your code.
To pull a blob file into an Azure SQL Server, you can use this example syntax (this actually works, I use it):
BULK INSERT MyTable
FROM 'container/folder/folder/file'
WITH ( DATA_SOURCE = 'ds_blob',BATCHSIZE=10000,FIRSTROW=2);
MyTable has to have identical columns (or it can be a view against a table that yields identical columns).
In this example, ds_blob is an external data source which needs to be created beforehand (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql).
The external data source needs to use a database scoped credential, which uses a SAS key that you need to generate beforehand from blob storage (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql).
The only downside to this method is that you have to know the filename beforehand - there's no way to enumerate the files from inside SQL Server.
I get around this by running PowerShell inside Azure Automation that enumerates the blobs and writes them into a queue table beforehand.
I have a BCP process to move data from one server to another server, but it takes two trips: one to a .dat file, and one to the destination server. Is there any way to send all of the data directly to the destination server?
I'm trying to improve the speed of this process.
Assuming that you're using SQL Server 2005+, use SSIS: BCP writes to a file, but SSIS can go from one connection to another. Here are a few articles on how to bulk load data in SSIS:
Optimizing Bulk Import Performance
http://msdn.microsoft.com/en-us/library/ms190421(v=sql.105).aspx
The Data Loading Performance Guide
http://technet.microsoft.com/en-us/library/dd425070(SQL.100).aspx
We Loaded 1TB in 30 Minutes with SSIS, and So Can You
http://msdn.microsoft.com/en-us/library/dd537533(v=sql.100).aspx
I am a C# developer and I am not really good with SQL. I have a simple question here. I need to move more than 50 million records from one database to another. I tried to use the import function in MS SQL Server, but it got stuck because the log was full (I got the error message "The transaction log for database 'mydatabase' is full due to 'LOG_BACKUP'"). The database recovery model was set to simple. My friend said that importing millions of records using Tasks -> Import Data will cause the log to become massive, and told me to use a loop to transfer the data instead. Does anyone know how and why? Thanks in advance.
If you are moving the entire database, use backup and restore, it will be the quickest and easiest.
http://technet.microsoft.com/en-us/library/ms187048.aspx
If you are just moving a single table, read about and use the BCP command line tools for this many records:
The bcp utility bulk copies data between an instance of Microsoft SQL Server and a data file in a user-specified format. The bcp utility can be used to import large numbers of new rows into SQL Server tables or to export data out of tables into data files. Except when used with the queryout option, the utility requires no knowledge of Transact-SQL. To import data into a table, you must either use a format file created for that table or understand the structure of the table and the types of data that are valid for its columns.
http://technet.microsoft.com/en-us/library/ms162802.aspx
The fastest and probably most reliable way is to bulk copy the data out via SQL Server's bcp.exe utility. If the schema on the destination database is exactly identical to that on the source database, including nullability of columns, export it in "native format":
http://technet.microsoft.com/en-us/library/ms191232.aspx
http://technet.microsoft.com/en-us/library/ms189941.aspx
If the schema differs between source and target, you will encounter...interesting (yes, interesting is a good word for it) problems.
If the schemas differ or you need to perform any transforms on the data, consider using text format. Or another format (BCP lets you create and use a format file to specify the format of the data for export/import).
You might consider exporting data in chunks: if you encounter problems it gives you an easier time of restarting without losing all the work done so far.
You might also consider zipping the exported data files up to minimize time on the wire.
Then FTP the files over to the destination server.
bcp them in. You can use the bcp utility on the destination server or the BULK INSERT statement in SQL Server to do the work. It makes no real difference.
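The chunked export suggested above could be driven by a small script that splits the key space into ranges and builds one bcp queryout command per range. This is only a sketch: the table name SourceDb.dbo.BigTable, the key column Id, the server name, and the output file names are hypothetical, and the commands are only constructed here, not executed.

```python
def key_ranges(min_id: int, max_id: int, chunk_size: int):
    """Yield inclusive (low, high) key ranges covering min_id..max_id."""
    low = min_id
    while low <= max_id:
        high = min(low + chunk_size - 1, max_id)
        yield low, high
        low = high + 1


def bcp_export_command(table, key, low, high, server, out_file):
    """Build a bcp queryout command that exports one key range in native format."""
    query = f"SELECT * FROM {table} WHERE {key} BETWEEN {low} AND {high}"
    # -n: native format, -S: server name, -T: trusted (Windows) connection
    return ["bcp", query, "queryout", out_file, "-n", "-S", server, "-T"]


ranges = list(key_ranges(1, 25, 10))
commands = [
    bcp_export_command(
        "SourceDb.dbo.BigTable", "Id", low, high, "SOURCESERVER", f"chunk_{n:04d}.dat"
    )
    for n, (low, high) in enumerate(ranges, start=1)
]
```

Each command list could be handed to subprocess.run one at a time, so a failed chunk can be retried without repeating the chunks that already succeeded.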
The nice thing about using BCP to load the data is that the load is what is described as a 'non-logged' transaction, though it's really more like a 'minimally logged' transaction.
If the tables on the destination server have IDENTITY columns, you'll need to use the SET IDENTITY_INSERT statement so that explicit identity values can be loaded into the table(s) involved for the nonce (don't forget to switch it back afterwards). After your data is imported, you'll need to run DBCC CHECKIDENT to get things back in sync.
And depending on what you're doing, it can sometimes be helpful to put the database in single-user mode or dbo-only mode for the duration of the surgery: http://msdn.microsoft.com/en-us/library/bb522682.aspx
Another approach I've used to great effect is to use Perl's DBI/DBD modules (which provide access to the bulk copy interface) and write a perl script to suck out the data from the source server, transform it and bulk load it directly into the destination server, without having to save it to disk and move it. Also means you can trap errors and design things for recovery and restart right at the point of failure.
Use BCP to migrate data.
Another approach I have used in the past is to take a backup of the transaction log and shrink the log prior to the migration. Split the migration script into parts and run the log backup - shrink - migrate iteration a few times.
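Splitting the migration into parts can also be done as a loop that copies a bounded batch per transaction, which is presumably what the question's "use a loop" advice refers to. The sketch below is an illustration under assumptions: the records table and its columns are hypothetical, and two in-memory sqlite3 databases stand in for the source and destination servers.

```python
import sqlite3


def copy_in_batches(src, dst, batch_size=2):
    """Copy rows in key order, committing after every batch."""
    last_id = 0
    copied = 0
    while True:
        # LIMIT is SQLite syntax; T-SQL would use TOP (n) instead.
        rows = src.execute(
            "SELECT id, payload FROM records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        dst.executemany("INSERT INTO records (id, payload) VALUES (?, ?)", rows)
        dst.commit()  # each batch is its own small transaction
        last_id = rows[-1][0]  # remember where to resume
        copied += len(rows)
    return copied


# Stand-in source and destination databases (assumption).
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
src.executemany(
    "INSERT INTO records (id, payload) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 6)],
)
src.commit()

copied = copy_in_batches(src, dst)
```

Because every batch commits separately, no single transaction has to hold all 50 million rows in the log, and a failure only loses the batch in flight.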