SSIS Bulk XML Import - sql

I think I may be in over my head on this one given my SQL skills. I have an XML file that is almost 3 GB in size. I need a process to import this data and insert it into tables. When I try to use the SSIS XML Source, it complains about the XSD being complex with multiple data types. I then tried to convert the XML to CSV using XSLT.
I have a link to my SkyDrive below where I have the XSD, since I could not upload documents here. I am looking for guidance and advice on how to get this data into SQL. Any help is appreciated.
https://skydrive.live.com/?cid=d75b2e7f757393ef&sc=documents&id=D75B2E7F757393EF%21286

The built-in XML components in SSIS are relatively limited. For a larger, more complex XML source, you might consider pulling the file in with a Script Task and shredding it using C#.

You can bulk insert any size XML file using DataStreams and SqlBulkCopy.
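To make the streaming idea concrete, here is a minimal sketch in Python rather than C#. It is not the SqlBulkCopy API: `xml.etree.ElementTree.iterparse` stands in for a streaming reader and a batched `executemany` against an in-memory SQLite database stands in for the bulk copy. The `item`, `id` and `name` element names, the table layout and the helper names are all made up for illustration, and the sketch assumes the records are direct children of the root element.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET


def stream_rows(source, record_tag):
    """Yield one dict per <record_tag> element without keeping the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield {child.tag: (child.text or "") for child in elem}
            root.clear()  # detach processed records so memory use stays flat


def bulk_load(source, conn, batch_size=1000):
    """Insert records in batches and return the number of rows written."""
    # Hypothetical target table; a real import would use the real schema.
    conn.execute("CREATE TABLE IF NOT EXISTS item (id INTEGER, name TEXT)")
    batch, total = [], 0
    for row in stream_rows(source, "item"):
        batch.append((int(row["id"]), row["name"]))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO item VALUES (?, ?)", batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO item VALUES (?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total


# Tiny in-memory document standing in for the multi-GB file.
sample = io.BytesIO(
    b"<items><item><id>1</id><name>a</name></item>"
    b"<item><id>2</id><name>b</name></item></items>"
)
conn = sqlite3.connect(":memory:")
loaded = bulk_load(sample, conn, batch_size=1)
```

The point of the design is that only one record and one batch are in memory at a time, so the file size stops mattering much. For a real file you would pass a path or an open file object instead of the `BytesIO` sample.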

In your schema, do you have any complex types that have mixed="true" defined? SSIS does not support this. You would need to change your content model on complex types to not be mixed, or preprocess the documents with an XSLT.
For testing, try removing mixed="true" from the XSD first, before you invest time in pre-processing.
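If the schema is large, a quick script can tell you whether mixed content is in play at all. This is only a rough sketch in Python: `find_mixed_types` and the sample schema are hypothetical, and it only looks at the `mixed` attribute on `xs:complexType` itself, not on `xs:complexContent`.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def find_mixed_types(xsd_text):
    """Return the names of complexType declarations that set mixed="true"."""
    root = ET.fromstring(xsd_text)
    # Map each node to its parent so an anonymous type can be reported
    # under the name of the element that contains it.
    parents = {child: parent for parent in root.iter() for child in parent}
    found = []
    for ctype in root.iter(XS + "complexType"):
        if ctype.get("mixed") in ("true", "1"):  # both are valid xs:boolean values
            owner = parents.get(ctype)
            owner_name = owner.get("name") if owner is not None else None
            found.append(ctype.get("name") or owner_name or "(anonymous)")
    return found


# Made-up schema: one mixed named type, one ordinary anonymous type.
sample_xsd = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Note" mixed="true">
    <xs:sequence><xs:element name="b" type="xs:string"/></xs:sequence>
  </xs:complexType>
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence><xs:element name="id" type="xs:int"/></xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""
mixed = find_mixed_types(sample_xsd)
```

An empty result would suggest the SSIS complaint comes from something else in the schema.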

Related

Equivalent of SQLXML Bulk Load in PostgreSQL

Is there an equivalent of Microsoft .NET's SQLXML Bulk Load (http://msdn.microsoft.com/en-us/library/ms171878.aspx) for PostgreSQL/PostGIS that I can run on Linux? I have a huge and complicated XML file I'd like to import into PostGIS on a Linux server without having to write a ton of code to shred the XML. I already have the XSD for it (this one: http://www.dft.gov.uk/transxchange/schema/schemas.htm), so I was hoping I could just specify the relations in the XSD (e.g. sql:key-fields="ProductID") and set it going.
If there isn't, what's the next best thing for importing it if I don't want to spend weeks writing code to convert XML into tables?
I am not aware of any such utility, but I wonder if this is because most of the open-source, weakly typed languages have good XML parsers that you could use to pull the file into one giant data structure and process it however you like.
So assuming your files are not huge, my recommendation would be something like Perl, DBI, and XML::Simple.
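For anyone not using Perl, the same "pull it into one data structure" idea can be sketched in Python. This is not XML::Simple, just a rough analogue using the standard library; `to_dict` and the sample document are made up, and the whole document is held in memory, so it only suits files that are not huge.

```python
import xml.etree.ElementTree as ET


def to_dict(elem):
    """Convert an element into nested dicts, lists and strings."""
    node = dict(elem.attrib)  # attributes become plain keys
    for child in elem:
        value = to_dict(child)
        if child.tag in node:
            # A repeated tag is collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element: just its text
    if text:
        node["#text"] = text  # text alongside attributes or children
    return node


# Made-up document; a real one would be read from a file.
doc = ET.fromstring(
    '<stops><stop id="1"><name>A</name></stop>'
    '<stop id="2"><name>B</name></stop></stops>'
)
result = to_dict(doc)
```

From there, the nested structure can be walked and written to tables with whatever database driver you prefer.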

SQL - Querying with BLOBS

Our setup today takes XML data and splits it into multiple tables in SQL. The only benefit of this is that reporting is good. However, whenever we want to retrieve data, we have to re-join the data from hundreds of tables to re-export the XML. Each XML document can be several MB to several GB.
Ironically, we hardly ever run reports, but we retrieve and save the data very often. Because the data is split across, and compiled from, several tables, neither saving nor retrieval is very efficient.
Since the data comes in as XML, I'm considering updating our method and saving the XML as a large BLOB in the table. That would be so simple.
The issue now comes with reporting: without the ability to index BLOBs, I'm wondering what options I have for running reports as efficiently as possible.
The database is in the hundreds of GB.
I'm not a DBA (I'm a C# person) - I've just landed in this position at work so the only way I could think about this would be to do it using C# - build each BLOB as XML and then query the XML data in C#. This however, seems like it would be very inefficient. Maybe XQuery in SQL is better?! Despite not being a DBA, I'm more than happy for any programming (C#/VB) or SQL suggestions.
You can save the data in a single XML-type column in your database and then access the data via XQuery.
XQuery is for me, personally, a bit fiddly, but I found this list of tips to be of great help:
http://www.jackdonnell.com/?p=266
The advantage is that you are only persisting one version of the data so updates and reads are quick, apart from the XML parsing-bit (but that may depend on your data volume). Getting the data into the database from C# is straightforward, as you can map your XML to a corresponding SqlDbType.

What is the advantage of using XML with regards to SQL?

I've seen a few relational databases where the XML directly mirrors the SQL, and I was wondering if anyone could give me some insight as to why people use XML over other options. I was under the impression that it was more a personal preference, but I was told by a classmate that XML is considered "better" ie more efficient in certain cases. So I wanted to pose the question to you folks, because frankly I wanted a second opinion.
The question: When would you use XML instead of ColdFusion or PHP (or other alternatives)? What are some inherent advantages that would make it a more desirable option?
For example, this is what the XML might look like:
<data>
  <dataObject name="Test">
    <primaryKey>Num</primaryKey>
    <foreignKey dataObject="Test" key="Num"/>
    <datums>
      <datum type="integer" key="itemRecnum" label="Item Recnum" data="required"/>
      <datum type="string" key="status" label="Status" data="required"/>
      <datum type="integer" key="idnumber" label="ID Number" data="required"/>
    </datums>
    <constraints/>
  </dataObject>
</data>
So in SQL Server, each of these has a 1-1 correspondence, with each datum type being a column.
Can someone please explain what the advantages of using XML to pull from the database are? What exactly is happening here and why is it used over CF or PHP? And how is it pushing and pulling from the database?
What if you were to mix the two? Perhaps one would use ColdFusion for inserts, and XML just for views?
The intent of XML is to store data in a flat file, humanly readable (XML has a huge overhead in the textual naming of the entities. Also, it is not meant to be human readable; it is a transport medium), easily accessible form. Methods for accessing an XML data "store" are quite robust and evolving all the time, to include a proposal from Microsoft for "XQL" - an SQL equivalent designed to manipulate XML data stores.
XML is so simple that it can itself be used as a database – a very flexible one, indeed: your XML implementation can be infinitely customized through tags and a different array of libraries. As a plus, should your database get corrupted, you can open it in virtually any text editor – it's a text file, after all. However, XML has a major drawback: it is slower than SQL when processing data, and requires more resources to run.
About ColdFusion & XML you can read HERE
Where XML wins is if you've got data about a business object (let's say a hotel) scattered across 20 tables and you want to send that data to someone who organizes the data quite differently into 16 tables with a different structure. XML allows you to capture all the information about the object in one message, that's independent of the design of your database and possibly conforms to some industry standard like OTA, and load it into a different database with a quite different design.
If your XML, on the other hand, is intimately tied to the tables and columns of your SQL database design, then you aren't getting much value from it.
I A/B tested a very busy site using a cached XML product file vs. caching a very large query vs. caching smaller individual queries, and so far the XML has performed the worst every time. The time it took to read the file, find specific records, and then parse the data out was crippling the server. If you have a database at hand and are looking to build a website that is data intensive, I would strongly advise avoiding XML unless you are storing XML in your database for one purpose or another.
If you are really looking for a flat-file system for a website, I would look into NoSQL databases such as MongoDB or CouchDB; there are a few ColdFusion drivers and CFCs that have been written to work with these systems.
XML is a data storage mechanism. ColdFusion and PHP are data processing languages. XML does not pull data from a database. There are lots of reasons why people store data in XML. Some of the reasons are discussed here: Why would I ever choose to store and manipulate XML in a relational database?
PHP works with XML and so does ColdFusion. If you are looking to turn XML into something for an end user, you may want to consider XSLT.

sql server openrowset to read huge xml files in one step

It's my first post ever... and I really need help on this one, so anyone who has some knowledge on the subject - please help!
What I need to do is read an XML file into SQL Server data tables. I have been looking over and over for solutions to this one and have actually found a few. The problem is the size of the XML being loaded. It weighs 2 GB (and there will be 10 GB ones). I have managed to do this, but I saw one particular solution which seems to me to be a great one, and I cannot figure it out.
OK, let's get to the point. Currently I do it this way:
I read the entire XML into a variable using OPENROWSET (this takes up all the RAM...).
Next I use the .nodes() method to pull out the data and fill the tables with it.
That's a two-step process. I was wondering if I could do it in only one step. I saw that there are things like format files, and there are numerous examples of how to use them to pull data out of flat files or even Excel documents in a record-based manner (instead of sucking the whole thing into a variable), but I CANNOT find any example that shows how to read that huge XML into a table, parsing the data on the fly (based on the format file). Is it even possible? I would really appreciate some help, or guidance on where to find a good example.
Pardon my English - it's been a while since I had to write so much in that language :-)
Thanks in advance!
For very large files, you could use SSIS: Loading XML data into SQL Server 2008
It gives you the flexibility of transforming the XML data, as well as reducing your memory footprint for very large files. Of course, it might be slower compared to using OPENROWSET in BULK mode.

Downsides (If any) With SQL Server 2005 XML Datatype

I have a need to record XML fragments in a SQL Server 2005 database for logging purposes (SOAP messages, in this case). Is there any reason to use the XML datatype over a simple varchar or nvarchar column?
I can see instances already where being able to use XPath to get into the data in the XML would be rather nice, and I haven't been able to find any real downsides in some brief research.
Are there any pitfalls that need to be watched out for or is there a better way to approach this?
Here's a great forum post on the topic. In general, use the XML datatype if you foresee needing the XML manipulation and typing functionality.
Depending on the size of the XML and how often you need to extract particular information from within it, it can become a drawback. Did you consider storing the XML on the file system and just keeping the path in the database?
The only problem we have encountered is that writing queries against the data is more difficult.
In some cases we have stored data in XML and later dumped it into tables for reporting.
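The "store the XML, dump to tables later for reporting" approach can be sketched roughly as follows. This is Python with an in-memory SQLite database standing in for SQL Server, so it shows the shape of the idea rather than the XML datatype itself. The `soap_log` and `soap_report` tables, the `operation` and `status` elements, and the helper names are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# Raw messages are kept untouched; the report table holds extracted fields only.
conn.execute("CREATE TABLE soap_log (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE soap_report (log_id INTEGER, operation TEXT, status TEXT)")


def log_message(conn, xml_text):
    """Store the raw XML once and return the new row id."""
    cur = conn.execute("INSERT INTO soap_log (body) VALUES (?)", (xml_text,))
    return cur.lastrowid


def refresh_report(conn):
    """Rebuild the reporting table from the stored XML."""
    conn.execute("DELETE FROM soap_report")
    rows = conn.execute("SELECT id, body FROM soap_log").fetchall()
    for log_id, body in rows:
        root = ET.fromstring(body)
        conn.execute(
            "INSERT INTO soap_report VALUES (?, ?, ?)",
            (log_id, root.findtext("operation"), root.findtext("status")),
        )
    conn.commit()


# Made-up messages for illustration.
for status in ("ok", "ok", "error"):
    log_message(
        conn,
        f"<message><operation>GetQuote</operation><status>{status}</status></message>",
    )
refresh_report(conn)
counts = conn.execute(
    "SELECT status, COUNT(*) FROM soap_report GROUP BY status ORDER BY status"
).fetchall()
```

Writes stay cheap because each message is saved in one piece, and the reporting queries run against ordinary columns that can be indexed.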