XML file generation using Windows Azure SQL Database - azure-sql-database

Could someone tell me the best way to generate an XML file with data from a Windows Azure SQL database?
Basically, I want to create an XML file with data from Windows Azure SQL Database by querying a certain table, and the data is huge (around 90 MB). As I need to run this job at least every couple of hours, it should perform very well.
Any suggestions?
Thanks,
Prav

This is a very general question, and not really specific to SQL Azure, so it's difficult to give you a good answer. I would suggest you research the different ways to query a SQL database, and also the different ways of writing XML. That might give you ideas for more specific questions to ask.
90 MB is not particularly large - it shouldn't be difficult to fit that into memory. But nonetheless you might want to consider approaches that keep only a small part of the data in memory at once, e.g. reading data from a SqlDataReader and immediately writing it to an XmlTextWriter, or something along those lines.
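A minimal sketch of that reader-to-writer idea, assuming the connection string, file path, table name, and element names are placeholders you would adjust (and that your column names are valid XML element names):
using System.Data.SqlClient;
using System.Xml;

string connectionString = "enter Azure connection string here";

using (SqlConnection connection = new SqlConnection(connectionString))
using (XmlWriter writer = XmlWriter.Create(@"C:\Export\data.xml", new XmlWriterSettings { Indent = true }))
{
    connection.Open();
    SqlCommand command = new SqlCommand("SELECT * FROM [SomeTableName]", connection);
    command.CommandTimeout = 0; // large result set, so disable the timeout

    writer.WriteStartElement("rows");
    using (SqlDataReader reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            // one <row> element per record, one child element per column
            writer.WriteStartElement("row");
            for (int i = 0; i < reader.FieldCount; i++)
            {
                writer.WriteElementString(reader.GetName(i),
                    reader.IsDBNull(i) ? "" : reader.GetValue(i).ToString());
            }
            writer.WriteEndElement();
        }
    }
    writer.WriteEndElement();
}
This keeps only the current row in memory instead of buffering the whole 90 MB result.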

One way to do what you are looking for is to query the database and have the result saved to an ADO.NET DataTable. Once you have the DataTable, give it a name of your choosing using the TableName property. Then use the WriteXml method of the DataTable to save it to a location of your choosing. Make sure to specify XmlWriteMode.WriteSchema so that both the schema and the data are saved.
Note that if the DataTable is going to be 2 GB or larger, you will hit the default object size limit of .NET. One solution is to break your query into smaller chunks and store multiple DataTables in XML format per original query. Another solution is to increase the maximum size of an object in .NET to greater than 2 GB. This comes with its own set of risks and performance issues. However, to exceed the 2 GB object size restriction, make sure that your application is 64-bit, that it is compiled against .NET 4.5 or later, and that the app.config file has <gcAllowVeryLargeObjects enabled="true" /> under the <runtime> element.
using System.Data;
using System.Data.SqlClient;
string someConnectionString = "enter Azure connection string here";
DataTable someDataTable = new DataTable();
SqlConnection someConnection = new SqlConnection(someConnectionString);
someConnection.Open();
SqlDataReader someReader = null;
// enter your query below
SqlCommand someCommand = new SqlCommand(("SELECT * FROM [SomeTableName]"), someConnection);
// Since you are downloading a large amount of data, the timeout is effectively turned off in the line below
someCommand.CommandTimeout = 0;
someReader = someCommand.ExecuteReader();
someDataTable.Load(someReader);
someConnection.Close();
// you need to name the datatable before saving it in XML
someDataTable.TableName = "anyTableName";
string someFileNameAndLocation = @"C:\Backup\backup1.xml";
// the XmlWriteMode is necessary to save the schema and data of the datatable
someDataTable.WriteXml(someFileNameAndLocation, XmlWriteMode.WriteSchema);

Related

Copying table contents from one database to another from a VB.NET app using OracleDataAdapter.InsertCommand

The rundown of what I'm trying to achieve is essentially an update app that will pull data from our most recent production databases and copy its contents to the Devl or QA databases. I plan to limit what is chosen to a certain number of rows, so we only get what we need and the update can happen more consistently; right now these databases rarely get updated due to the vast size of the copy job. The actual PL/SQL commands will be stored in a table that I plan to reference for each table, but I'm currently stuck on the best and easiest way to transfer the data between these two databases while still getting my CommandText to be used. I figured the best way would be the OracleDataAdapter.InsertCommand, but very few examples of what I'm doing can be found; any suggestions aside from InsertCommand are welcome, as I'm still getting my footing with Oracle altogether.
Dim da As OracleDataAdapter = New OracleDataAdapter
Dim cmd As New OracleCommand()
GenericOraLoginProvider.Connect()
' Create the SelectCommand.
cmd = New OracleCommand("SELECT * FROM TAT_TESTTABLE ", GenericOraLoginProvider.Connection())
da.SelectCommand = cmd
' Create the InsertCommand.
cmd = New OracleCommand("INSERT INTO TAT_TEMP_TESTTABLE", GenericOraLoginProvider.Connection())
da.InsertCommand = cmd
Question: This is an example of what I've been trying as a first step with the InsertCommand. TAT_TESTTABLE and TAT_TEMP_TESTTABLE are just junk tables that I loaded with data to see if I could move things the way I wanted.
As for why I'm asking this question: the data isn't transferring over. While these tables are on the same database for now, in the future they will not be, along with the change to the previously mentioned PL/SQL commands. Thank you for any help or words of wisdom you can provide, and sorry for the wall of text; I tried to keep it specific.
Look up SqlBulkCopy. I use this to transfer data between all kinds of vendor databases: https://msdn.microsoft.com/en-us/library/ex21zs8x(v=vs.110).aspx
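A minimal C# sketch of that bulk-copy idea, assuming a SQL Server destination (SqlBulkCopy only writes to SQL Server; for an Oracle destination, ODP.NET's OracleBulkCopy class has a very similar shape) and placeholder connection strings:
using System.Data.SqlClient;

using (SqlConnection source = new SqlConnection("source connection string"))
using (SqlConnection destination = new SqlConnection("destination connection string"))
{
    source.Open();
    destination.Open();

    SqlCommand select = new SqlCommand("SELECT * FROM TAT_TESTTABLE", source);
    using (SqlDataReader reader = select.ExecuteReader())
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(destination))
    {
        bulkCopy.DestinationTableName = "TAT_TEMP_TESTTABLE";
        bulkCopy.BatchSize = 5000;      // tune for your environment
        bulkCopy.BulkCopyTimeout = 0;   // no timeout for large copies
        bulkCopy.WriteToServer(reader); // streams rows from the reader into the destination table
    }
}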

How do I select just a portion of huge binary (file)?

My problem is this: I have the potential for huge files being stored in a binary (image) field on SQL Server 2008 (> 1GB).
If I return the entire binary using a regular select statement, the query takes more than a minute to return results to my .NET program and my client apps time out. What I'm looking for is T-SQL code that will limit the size of the data returned (maybe 300 MB), allowing me to iterate through the remaining chunks and prevent timeouts.
This has to happen in the SQL query, not in the processing after the data is returned.
I've tried SubString, which MS says works with binary data, but all I get back is 8000 bytes maximum. The last thing I tried looked like this:
select substring(Package,0,300000000) 'package', ID from rplPackage where ID=0
--where package is the huge binary stored in a image field
Data Streaming isn't really an option either, because of the client apps.
Any ideas?
OK, I figured it out. The way to do this is with the substring function, which MS accurately says works with binaries. What they don't say is that substring will return only 8,000 bytes, which is what threw me.
In other words, if the blob data type is image and you use this:
select substring(BlobField,0,100000000)
from TableWithHugeBlobField
where ID = SomeIDValue
--all you'll get is the first 8K bytes (use DataLength function to get the size)
However, if you declare a variable of varbinary(max) and the blob field data type is varbinary(max) - or some size that's useful to you - then use the substring function to bring back the partial binary into the variable you declared. This works just fine. Just like this:
Declare @PartialImage varbinary(max)
select @PartialImage = substring(BlobField, 0, 100000000) -- ~100 MB
from TableWithHugeBlobField
where ID = SomeIDValue
select DataLength(@PartialImage) -- should = ~100 MB
The question was posed earlier, why use SQL to store file data? It's a valid question. Imagine you're having to replicate data as files to hundreds of different client devices (like iPhones), each package unique from the others because different clients have different needs. In that case, storing the file packages as blobs in a database is a lot easier to program against than programmatically digging through folders to find the right package to stream out to the client.
Use this:
select substring(cast(Package as varbinary(max)), 0, 300000000) 'package', ID
from rplPackage
where ID=0
Consider using FileStream
FILESTREAM Overview
Managing FILESTREAM Data by Using Win32
sqlFileStream.Seek(0L, SeekOrigin.Begin);
numBytes = sqlFileStream.Read(buffer, 0, buffer.Length);
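A minimal sketch of reading a FILESTREAM column in chunks from C#, assuming the Package column was created as a FILESTREAM varbinary(max) column; the connection string, table, and column names are placeholders:
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;

using (SqlConnection connection = new SqlConnection("connection string here"))
{
    connection.Open();
    using (SqlTransaction transaction = connection.BeginTransaction())
    {
        // FILESTREAM access needs the file path and a transaction context
        SqlCommand command = new SqlCommand(
            "SELECT Package.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() FROM rplPackage WHERE ID = 0",
            connection, transaction);

        string path;
        byte[] transactionContext;
        using (SqlDataReader reader = command.ExecuteReader())
        {
            reader.Read();
            path = reader.GetString(0);
            transactionContext = (byte[])reader[1];
        }

        // read the blob in 1 MB chunks instead of pulling it all back at once
        using (SqlFileStream sqlFileStream = new SqlFileStream(path, transactionContext, FileAccess.Read))
        {
            byte[] buffer = new byte[1024 * 1024];
            int numBytes;
            while ((numBytes = sqlFileStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // process the current chunk here
            }
        }

        transaction.Commit();
    }
}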

Best way to store Sql Scripts in a database table

I need to store stored procedure execution scripts in a database table.
As an example:
exec proc_name 'somedata'
These are for execution at a later time after the data that will be changed has gone through a moderation process.
What is the best way to cleanse the script so that the statement cannot be used for SQL injection?
Is there a specific type of encoding that I can use? Or is it as simple as doing a replacement on the ' character?
Then it sounds like you would want to use a varchar(max) column and have a separate table for parameters. If you use parameters you should be safe from SQL injection. See the quick C# example below:
C# pseudo-code example
SqlCommand command = new SqlCommand("select * from myScripts where scriptid = @scriptid");
SqlParameter param = new SqlParameter("@scriptid", 12);
...new SqlCommand("select * from myParams where scriptid = @scriptid");
...new SqlParameter...
SqlDataReader dr = ... // execute the commands and read the stored script and its parameters
SqlCommand userCommand = new SqlCommand((string)dr["sql"]);
foreach (var parameter in parameters)
{
    userCommand.Parameters.AddWithValue(parameter["name"], parameter["value"]);
}
userCommand.Execute...
There is no way to "cleanse" scripts.
The only way to secure your code is to separate the code from data. And "cleanse" the data only.
That's why we have our code separated from data.
The code is solid and secured, and the data is variable and has to be "cleansed".
As you are breaking this fundamental rule by treating the code as data, there is no way to secure it.
Judging by how unusual this task is, I'd say there is surely a proper solution.
You just chose the wrong architecture.
So, you'd better ask another question, something like "I want to deal with a quite complex metadata structure" (with the structure and the purpose provided), and you will get a proper solution that will not require storing SQL code among the data.
You can store your scripts for later execution in either a stored procedure or a scheduled job. I don't see any reason for encoding a stored procedure, as you can set user privileges to prevent different users from reading or even seeing it.

INSERTing data from a text file into SQL server (speed? method?)

Got about a 400 MB .txt file here that is delimited by '|'. Using a Windows Form with C#, I'm inserting each row of the .txt file into a table in my SQL server database.
What I'm doing is simply this (shortened by "..." for brevity):
while ((line = file.ReadLine()) != null)
{
string[] split = line.Split(new Char[] { '|' });
SqlCommand cmd = new SqlCommand("INSERT INTO NEW_AnnualData VALUES (@YR1984, @YR1985, ..., @YR2012)", myconn);
cmd.Parameters.AddWithValue("@YR1984", split[0]);
cmd.Parameters.AddWithValue("@YR1985", split[1]);
...
cmd.Parameters.AddWithValue("@YR2012", split[28]);
cmd.ExecuteNonQuery();
}
Now, this is working, but it is taking a while. This is my first time doing anything with a huge amount of data, so I need to make sure that A) I'm doing this in an efficient manner, and that B) my expectations aren't too high.
Using a SELECT COUNT() while the loop is going, I can watch the number go up and up over time. So I used a clock and some basic math to figure out the speed at which things are working. In 60 seconds, there were 73,881 inserts. That's 1,231 inserts per second. The question is, is this an average speed, or am I getting poor performance? If the latter, what can I do to improve the performance?
I did read something about SSIS being efficient for exactly this purpose. However, I need this action to come from clicking a button in a Windows Form, not going through SSIS.
Oooh - that approach is going to give you appalling performance. Try using BULK INSERT, as follows:
BULK INSERT MyTable
FROM 'e:\orders\lineitem.tbl'
WITH
(
FIELDTERMINATOR ='|',
ROWTERMINATOR ='\n'
)
This is the best solution in terms of performance. There is a drawback, in that the file must be present on the database server. There are two workarounds for this that I've used in the past, if you don't have access to the server's file system from where you're running the process. One is to install an instance of SQL Express on the workstation, add the main server as a linked server to the workstation instance, and then run "BULK INSERT MyServer.MyDatabase.dbo.MyTable...". The other option is to reformat the CSV file as XML, which can be done very quickly, then pass the XML to the query and process it using OPENXML. Both BULK INSERT and OPENXML are well documented on MSDN, and you'd do well to read through the examples.
Have a look at SqlBulkCopy on MSDN, or the nice blog post here. For me that goes up to tens of thousands of inserts per second.
I'd have to agree with Andomar. I really quite like SqlBulkCopy. It is really fast (you need to play around with BatchSize to make sure you find one that suits your situation).
For a really in depth article discussing the various options, check out Microsoft's "Data Loading Performance Guide";
http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
Also, take a look at the C# example of using SqlBulkCopy with CSV Reader. It isn't free, but if you can write a fast and accurate parser in less time, then go for it. At least it'll give you some ideas.
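A minimal sketch of the SqlBulkCopy approach for the pipe-delimited file in the question, assuming the destination table NEW_AnnualData and the YR1984..YR2012 columns; the file path and connection string are placeholders, and real code would add per-column typing and error handling:
using System.Data;
using System.Data.SqlClient;
using System.IO;

// Build a DataTable in memory from the pipe-delimited file, then push it to the
// server in one bulk operation instead of issuing one INSERT per row.
DataTable table = new DataTable();
for (int year = 1984; year <= 2012; year++)
{
    table.Columns.Add("YR" + year, typeof(string));
}

foreach (string line in File.ReadLines(@"C:\data\annual.txt"))
{
    table.Rows.Add(line.Split('|'));
}

using (SqlConnection connection = new SqlConnection("connection string here"))
{
    connection.Open();
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
    {
        bulkCopy.DestinationTableName = "NEW_AnnualData";
        bulkCopy.BatchSize = 10000; // tune for your environment
        bulkCopy.WriteToServer(table);
    }
}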
I have found SSIS to be much faster than this type of method, but there are a bunch of variables that can affect performance.
If you want to experiment with SSIS, use the Import and Export Wizard in Management Studio to generate an SSIS package that will import a pipe-delimited file. You can save out the package and run it from a .NET application.
See this article: http://blogs.msdn.com/b/michen/archive/2007/03/22/running-ssis-package-programmatically.aspx for info on how to run an SSIS package programmatically. It includes options on how to run it from the client, from the server, or wherever.
Also, take a look at this article for additional ways you can improve bulk insert performance in general. http://msdn.microsoft.com/en-us/library/ms190421.aspx
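A minimal sketch of running a saved SSIS package from C#, assuming a reference to the Microsoft.SqlServer.ManagedDTS assembly and a package already saved to disk; the package path is a placeholder:
using System;
using Microsoft.SqlServer.Dts.Runtime;

// Load the package from the file system and execute it in-process.
Application app = new Application();
Package package = app.LoadPackage(@"C:\packages\ImportAnnualData.dtsx", null);
DTSExecResult result = package.Execute();

if (result != DTSExecResult.Success)
{
    // the package's error collection explains what went wrong
    foreach (DtsError error in package.Errors)
    {
        Console.WriteLine(error.Description);
    }
}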

insert blob via odp.net through dblink size limitations

I am using ODP.NET (version 2.111.7.0) and C#, with OracleCommand and OracleParameter objects and the OracleCommand.ExecuteNonQuery method.
I was wondering if there is a way to insert a big byte array into an Oracle table that resides in another database, through a DB link. I know that LOB handling through DB links is problematic in general, but I am a bit hesitant to modify code and add another connection.
Will creating a stored procedure that takes a BLOB as a parameter and talks internally via the DB link make any difference? I don't think so.
My current situation is that Oracle will give me "ORA-22992: cannot use LOB locators selected from remote tables" whenever the parameter I pass with the OracleCommand is a byte array with length 0, or with length > 32 KB (I suspect, because 20 KB worked and 35 KB didn't).
I am using OracleDbType.Blob for this parameter.
Thank you.
Any ideas?
I ended up using a second connection, synchronizing the two transactions so that commits and rollbacks are always performed jointly. I also came to believe that there is a general issue with handling BLOBs through a DB link, so a second connection was the better choice, although in my case the design of my application was slightly disrupted - I needed to introduce both a second connection and a second transaction. The other option was to insert the blob in chunks of 32 KB, using PL/SQL blocks and some form of DBMS_LOB.WRITEAPPEND, but this would have required deeper changes in my code (in my case), so I opted for the easier and more straightforward first solution.
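For reference, a rough C# sketch of the chunked-append alternative mentioned above. It assumes a hypothetical remote procedure append_blob_chunk that appends a raw chunk to the target row (for example by calling DBMS_LOB.WRITEAPPEND internally); the procedure name, parameters, and chunk size are illustrative, not an existing API:
using System;
using Oracle.DataAccess.Client; // ODP.NET

// Send the byte array in chunks small enough to stay under the ~32 KB limit
// observed when binding large raw values across a DB link.
void WriteBlobInChunks(OracleConnection connection, byte[] data, int id)
{
    const int chunkSize = 32000;
    for (int offset = 0; offset < data.Length; offset += chunkSize)
    {
        int length = Math.Min(chunkSize, data.Length - offset);
        byte[] chunk = new byte[length];
        Array.Copy(data, offset, chunk, 0, length);

        // append_blob_chunk is a hypothetical procedure on the remote side that
        // appends the chunk to the BLOB of the row identified by :id
        using (OracleCommand command = new OracleCommand("BEGIN append_blob_chunk(:id, :chunk); END;", connection))
        {
            command.BindByName = true;
            command.Parameters.Add("id", OracleDbType.Int32).Value = id;
            command.Parameters.Add("chunk", OracleDbType.Raw, length).Value = chunk;
            command.ExecuteNonQuery();
        }
    }
}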