We have a small Spark cluster running version 3.x. We have to analyze a database dump file obtained by exporting from MS SQL Server. It is a .sql file that contains, for each table, first the schema and then the INSERT statements; the schema is entirely in MS SQL format.
I have searched a lot and found some connectors to MS SQL Server, but could not find anything related to analyzing an MS SQL dump file. What could be a possible way to do this? I am using the PySpark API.
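For concreteness, here is the kind of thing I am after. This is only a minimal sketch: the table name (Employees), the columns, and the file path are made up, and it assumes simple single-row INSERT statements.

# Minimal sketch: pull one table's rows out of a .sql dump with PySpark.
# Assumes simple single-row statements like:
#   INSERT INTO [dbo].[Employees] VALUES (1, N'Alice', 3500);
# Real dumps (multi-row VALUES, commas inside strings, NULLs) would need a
# proper SQL parser; table name, columns, and path here are made up.
import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("mssql-dump-analysis").getOrCreate()

lines = spark.read.text("/data/dump.sql")
inserts = lines.filter(lines.value.startswith("INSERT INTO [dbo].[Employees]"))

def parse_values(line):
    m = re.search(r"VALUES\s*\((.*)\)\s*;?\s*$", line)
    if m is None:
        return None
    raw = [p.strip() for p in m.group(1).split(",")]
    return (int(raw[0]), raw[1].strip("N'"), float(raw[2]))

rows = inserts.rdd.map(lambda r: parse_values(r.value)).filter(lambda t: t is not None)
df = rows.toDF(["id", "name", "salary"])
df.createOrReplaceTempView("employees")
spark.sql("SELECT name, salary FROM employees ORDER BY salary DESC").show()

From there the data can be queried with ordinary Spark SQL, but is hand-parsing the dump like this a reasonable direction, or is there a better-supported way?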
Some references that I found in this regard:
https://learn.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver15
https://learn.microsoft.com/en-us/sql/big-data-cluster/spark-mssql-connector?view=sql-server-ver15
Related
My problem statement is that I have a CSV blob and I need to import that blob into a SQL table. Is there a utility to do that?
I was thinking of one approach: first copy the blob to an on-premises SQL Server using the AzCopy utility, and then import that file into the SQL table using the bcp utility. Is this the right approach? I am also looking for a 1-step solution to copy a blob into a SQL table.
Regarding your question about the availability of a utility which will import data from blob storage to a SQL Server, AFAIK there's none. You would need to write one.
Your approach seems OK to me, though you may want to write a batch file or something similar to automate the whole process. In this batch file, you would first download the file to your computer and then run the BCP utility to import the CSV into SQL Server. Other alternatives to writing a batch file are:
Do the whole thing in PowerShell.
Write some C# code which uses the storage client library to download the blob and, once the blob is downloaded, start the BCP process from your code (a rough sketch of this download-then-bcp flow follows below).
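For illustration, here is a rough Python sketch of that download-then-BCP flow; the connection string, container, blob, table, and server names are all placeholders.

# Rough sketch: download a CSV blob, then shell out to bcp to load it.
# Connection string, container, blob, table, and server are placeholders.
import subprocess
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    conn_str="DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...",
    container_name="incoming",
    blob_name="data.csv",
)
with open("data.csv", "wb") as f:
    f.write(blob.download_blob().readall())

# bcp loads the local CSV into the target table:
# -T trusted connection, -c character mode, -t, comma field terminator
subprocess.run(
    ["bcp", "mydb.dbo.MyTable", "in", "data.csv",
     "-S", "myserver", "-T", "-c", "-t,"],
    check=True,
)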
To pull a blob file into an Azure SQL Server, you can use this example syntax (this actually works, I use it):
BULK INSERT MyTable
FROM 'container/folder/folder/file'
WITH (DATA_SOURCE = 'ds_blob', BATCHSIZE = 10000, FIRSTROW = 2);
MyTable has to have identical columns (or it can be a view against a table that yields identical columns)
In this example, ds_blob is an external data source which needs to be created beforehand (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-data-source-transact-sql)
The external data source needs to use a database scoped credential, which uses a SAS key that you need to generate beforehand from blob storage (https://learn.microsoft.com/en-us/sql/t-sql/statements/create-database-scoped-credential-transact-sql)
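For reference, here is a rough sketch of that one-time setup run from Python with pyodbc; the credential name, data source name, storage account, SAS token, and connection details are all placeholders, and the exact T-SQL options are in the linked docs.

# Rough sketch of the one-time setup for BULK INSERT from blob storage.
# Names (cred_blob, ds_blob, account, SAS token, connection) are placeholders.
# A database master key must already exist (CREATE MASTER KEY ...).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;"
    "UID=myuser;PWD=mypassword", autocommit=True)
cur = conn.cursor()

# Database scoped credential holding the SAS token (without the leading '?')
cur.execute("""
    CREATE DATABASE SCOPED CREDENTIAL cred_blob
    WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
         SECRET = '<SAS token without the leading ?>';
""")

# External data source pointing at the storage account, using that credential
cur.execute("""
    CREATE EXTERNAL DATA SOURCE ds_blob
    WITH (TYPE = BLOB_STORAGE,
          LOCATION = 'https://myaccount.blob.core.windows.net',
          CREDENTIAL = cred_blob);
""")

With those in place, the BULK INSERT above can reference DATA_SOURCE = 'ds_blob'.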
The only downside to this method is that you have to know the filename beforehand - there's no way to enumerate them from inside SQL Server.
I get around this by running PowerShell inside Azure Automation that enumerates the blobs and writes them into a queue table beforehand.
I have an SQL query that generates a result from the daily data in the database. I want a CSV-formatted file to be generated every day from this query and saved in a folder. Is there any way I can do this?
NOTE: I am using SQL Server Management Studio 2008 for the DB.
This is a question about bulk export, which is well documented in MSDN - Importing and Exporting Bulk Data
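For illustration only, here is a rough sketch of one way to do it with bcp queryout - server name, query, and output folder are placeholders - which could then be scheduled with Windows Task Scheduler or a SQL Server Agent job.

# Rough sketch: export a daily query result to a dated CSV with bcp queryout.
# Server name, query, table, and output folder are placeholders.
import datetime
import subprocess

today = datetime.date.today().isoformat()
outfile = rf"C:\exports\daily_{today}.csv"
query = "SELECT * FROM mydb.dbo.DailySales WHERE SaleDate = CAST(GETDATE() AS DATE)"

# -T trusted connection, -c character mode, -t, comma field terminator
subprocess.run(
    ["bcp", query, "queryout", outfile,
     "-S", "myserver", "-T", "-c", "-t,"],
    check=True,
)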
I am using SQL Server 2008 R2 on my local PC. In my database there is one table with approximately 98,000 rows. Now I want to transfer that data directly to an online server database. I have tried generating a script of that table, but when I run that script it fails with an insufficient-memory error. How can I do this? Thanks.
There are a variety of strategies you can employ in this instance. Here are a few off the top of my head (a rough sketch in the spirit of the first two follows the list)...
Got some .NET programming up your sleeve? Try the SqlBulkCopy class
Export the data to a transferable format, e.g. CSV file and then use BULK INSERT to insert the data.
Try using OPENROWSET to copy from the local server to the remote one. Stack Overflow example
If you've got SSIS at your disposal, there's an SSIS example here
A bit Heath Robinson, but why not export the data to CSV and, using some Excel skills, build the individual INSERT statements yourself? Example here using INSERT INTO and UNION
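For illustration, a rough Python sketch along the lines of the first two options - server names, database, table, columns, and credentials are all placeholders - copying the rows across in batches:

# Rough sketch: copy rows from a local table to a remote one in batches.
# Server names, database, table, columns, and credentials are placeholders.
import pyodbc

src = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes")
dst = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                     "SERVER=remote.example.com;DATABASE=mydb;UID=appuser;PWD=secret")

read = src.cursor()
write = dst.cursor()
write.fast_executemany = True          # send parameter batches, not row-by-row inserts

read.execute("SELECT Id, Name, Amount FROM dbo.MyTable")
while True:
    rows = read.fetchmany(5000)        # keep memory use bounded
    if not rows:
        break
    write.executemany(
        "INSERT INTO dbo.MyTable (Id, Name, Amount) VALUES (?, ?, ?)", rows)
    dst.commit()

Batching keeps memory bounded, which avoids the single giant script that ran out of memory.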
HTH
I have to import a CSV file into a SQL database table which has already been created (empty, with the same number of named columns). It would be great if you could suggest any tutorials or give some tips.
I assume you are using Microsoft SQL Server. Do you need to do this in a program or manually? There is a tutorial on using the bcp command for that, or alternatively a SQL command. If you need to parse the CSV file for your own code, see this previous SO question.
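As a rough sketch of the "SQL command" route - table name, file path, and connection details are placeholders, and the file must be readable by the SQL Server service account:

# Rough sketch of the "SQL command" route: BULK INSERT a local CSV file.
# Table name, file path, and connection details are placeholders.
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes",
                      autocommit=True)
conn.execute("""
    BULK INSERT dbo.MyTable
    FROM 'C:\\imports\\data.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n', FIRSTROW = 2);
""")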
I have a bunch of XML files totalling about 700 GB in size.
I'm going to load the data within those files into a SQL Server 2008 database table (tabular data).
In addition to the fields that will hold the data in a tabular format, the table will contain a field of SQL Server XML type that holds the xml data as a whole.
I want to use the FILESTREAM feature of SQL Server 2008 instead of loading the whole xml into the field.
I want to know what performance benefits queries against such a very large table would gain, and the pros and cons of this feature.
Thank you in advance.
I do not expect this will ever be marked as the answer, because the true answer will only be discovered after a thorough study of the available solutions.
BUT
The answer I have is really a question for you: how are you going to use this data? If you are going to shred the XML to retrieve the reporting values and keep the complete XML for reference, then I would go with FILESTREAM. If you are going to run reports directly from the XML, then you will have to load the data into the database, creating the needed indexes.
Loading all data into SQL Server as a combination of shredded XML and an xml data type
PRO
All data is available all the time from one source
A single backup contains all data
Additional data from the XML can be shredded to enhance reports on the server side
CON
Backup size
Backup time
Slow if data is in native XML
Loading values from XML into SQL Server and using FILESTREAM
PRO
Data source (FILESTREAM) is tied to the data values
Source data can be presented to client
CON
FILESTREAM content is not available directly from within a query
FILESTREAM and SQL backups need to be synchronized for disaster recovery
Be aware of your storage needs for backups and the maintenance window needed.
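As a rough illustration of the shredding idea - element names, target table, and connection details are all made up - each file's reporting values can be extracted and loaded as ordinary rows:

# Rough sketch of "shredding" XML files into tabular reporting values.
# Element names, target table, and connection details are made up.
# Very large individual files would need ET.iterparse instead of full parses.
import glob
import xml.etree.ElementTree as ET
import pyodbc

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=localhost;DATABASE=mydb;Trusted_Connection=yes")
cur = conn.cursor()
cur.fast_executemany = True

for path in glob.glob(r"D:\xml\*.xml"):
    root = ET.parse(path).getroot()
    rows = [(o.findtext("Id"), o.findtext("Date"), o.findtext("Total"))
            for o in root.findall(".//Order")]
    if rows:
        cur.executemany(
            "INSERT INTO dbo.OrderFacts (OrderId, OrderDate, Total) VALUES (?, ?, ?)",
            rows)
        conn.commit()

The full XML can then stay in FILESTREAM (or on disk) for reference while reports run against the shredded table.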