Is there an equivalent of Microsoft .NET's SQLXML Bulk Load (http://msdn.microsoft.com/en-us/library/ms171878.aspx) for PostgreSQL/PostGIS that I can run on Linux? I have a huge and complicated XML file I'd like to import into PostGIS on a Linux server without having to write a ton of code to shred the XML. I already have the XSD for it (this one: http://www.dft.gov.uk/transxchange/schema/schemas.htm), so I was hoping I could just specify the relations in the XSD (e.g. sql:key-fields="ProductID") and set it going.
If there isn't, what's the next best way to import it without spending weeks writing code to convert the XML into tables?
I am not aware of any such utility, but I suspect that is because most open-source, weakly typed languages have good XML parsers that let you pull the file into one giant data structure and process it however you like.
So, assuming your files aren't too huge, my recommendation would be something like Perl, DBI, and XML::Simple.
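A rough Python sketch of the same "pull it into a data structure" approach, in case Perl isn't your preference: the function turns an element tree into nested dicts and lists in the spirit of XML::Simple (it is not a port of it, and it drops some detail, as noted in the docstring). The element names in the usage are hypothetical, and for a namespaced document such as TransXChange the keys will carry the namespace in ElementTree's `{uri}Tag` form.

```python
import xml.etree.ElementTree as ET

def element_to_data(elem):
    """Convert an element into plain Python data, loosely like Perl's XML::Simple:
    attributes and child elements become dict keys, a repeated child tag becomes
    a list, and a leaf with no attributes becomes its stripped text.
    Simplifications: text of elements that have attributes or children is dropped,
    and a child tag that matches an attribute name is merged with that attribute."""
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    data = dict(elem.attrib)
    for child in children:
        value = element_to_data(child)
        if child.tag not in data:
            data[child.tag] = value
        elif isinstance(data[child.tag], list):
            data[child.tag].append(value)
        else:
            # Second occurrence of this tag: promote the stored value to a list.
            data[child.tag] = [data[child.tag], value]
    return data

# Hypothetical element names, for illustration only.
doc = ET.fromstring(
    "<Stops><Stop code='A'><Name>High St</Name></Stop>"
    "<Stop code='B'><Name>Low Rd</Name></Stop></Stops>")
stops = element_to_data(doc)
# {'Stop': [{'code': 'A', 'Name': 'High St'}, {'code': 'B', 'Name': 'Low Rd'}]}
```

From there, each record can be inserted with a parameterized statement through your database driver (DBI in Perl, or a DB-API driver such as psycopg2 for PostgreSQL in Python).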
Related
I've seen a few relational databases where the XML directly mirrors the SQL, and I was wondering if anyone could give me some insight into why people use XML over other options. I was under the impression that it was more a matter of personal preference, but a classmate told me that XML is considered "better", i.e. more efficient, in certain cases. So I wanted to pose the question to you folks, because frankly I wanted a second opinion.
The question: When would you use XML instead of ColdFusion or PHP (or other alternatives)? What are some inherent advantages that would make it a more desirable option?
For example, this is what the XML might look like:
<data>
  <dataObject name="Test">
    <primaryKey>Num</primaryKey>
    <foreignKey dataObject="Test" key="Num"/>
    <datums>
      <datum type="integer" key="itemRecnum" label="Item Recnum" data="required"/>
      <datum type="string" key="status" label="Status" data="required"/>
      <datum type="integer" key="idnumber" label="ID Number" data="required"/>
    </datums>
    <constraints/>
  </dataObject>
</data>
So in SQL Server there is a one-to-one correspondence: each dataObject is a table, and each datum is a column.
Can someone please explain what the advantages of using XML to pull from the database are? What exactly is happening here and why is it used over CF or PHP? And how is it pushing and pulling from the database?
What if you were to mix the two? Perhaps one would use ColdFusion for inserts, and XML just for views?
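To make the one-to-one correspondence concrete: XML like the sample in the question is metadata that describes a table, not the data itself. A minimal Python sketch that turns one dataObject into a CREATE TABLE statement is below; the type mapping and the reading of data="required" as NOT NULL are assumptions, not something the sample defines. Note that the sample's primary key, Num, is not one of its datum elements, so the generated statement would only run against a real database if that column were added.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the XML's type names to SQL column types.
SQL_TYPES = {"integer": "INTEGER", "string": "VARCHAR(255)"}

def data_object_to_ddl(data_object):
    """Build a CREATE TABLE statement (as a string) from one <dataObject> element."""
    columns = []
    for datum in data_object.iter("datum"):
        column = f'{datum.get("key")} {SQL_TYPES[datum.get("type")]}'
        if datum.get("data") == "required":
            column += " NOT NULL"  # assumption: "required" means the column is mandatory
        columns.append(column)
    primary_key = data_object.findtext("primaryKey")
    if primary_key:
        columns.append(f"PRIMARY KEY ({primary_key})")
    body = ",\n  ".join(columns)
    return f'CREATE TABLE {data_object.get("name")} (\n  {body}\n)'

# A trimmed copy of the sample from the question.
sample = """<data><dataObject name="Test">
  <primaryKey>Num</primaryKey>
  <datums>
    <datum type="integer" key="itemRecnum" label="Item Recnum" data="required"/>
    <datum type="string" key="status" label="Status" data="required"/>
  </datums>
</dataObject></data>"""
print(data_object_to_ddl(ET.fromstring(sample).find("dataObject")))
```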
The intent of XML is to store data in a flat file in a human-readable, easily accessible form. (Caveat: XML carries a large overhead from the textual names of its elements, and it is not really meant to be read by humans; it is a transport medium.) Methods for accessing an XML data "store" are quite robust and evolving all the time, including a proposal from Microsoft for "XQL", an SQL equivalent designed to manipulate XML data stores.
XML is so simple that it can itself be used as a database, and a very flexible one: you can customize your XML format almost endlessly through your choice of tags, and a wide array of libraries can read it. As a plus, if your database gets corrupted, you can open it in virtually any text editor; it's a text file, after all. However, XML has a major drawback: it is slower than SQL at processing data and requires more resources to run.
Where XML wins is if you've got data about a business object (let's say a hotel) scattered across 20 tables and you want to send that data to someone who organizes the data quite differently into 16 tables with a different structure. XML allows you to capture all the information about the object in one message, that's independent of the design of your database and possibly conforms to some industry standard like OTA, and load it into a different database with a quite different design.
If your XML, on the other hand, is intimately tied to the tables and columns of your SQL database design, then you aren't getting much value from it.
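A minimal Python sketch of that idea, using sqlite3 as a stand-in database with two hypothetical tables (hotel and room) rather than twenty: the function gathers everything about one hotel into a single XML message whose shape does not depend on the source schema. The element names are made up; a real exchange would follow whatever format the receiver expects, such as an industry standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

def hotel_to_xml(conn, hotel_id):
    """Collect one hotel's rows from several tables into a single XML message
    whose shape is independent of the source table layout."""
    name, city = conn.execute(
        "SELECT name, city FROM hotel WHERE id = ?", (hotel_id,)).fetchone()
    hotel = ET.Element("Hotel", id=str(hotel_id))
    ET.SubElement(hotel, "Name").text = name
    ET.SubElement(hotel, "City").text = city
    rooms = ET.SubElement(hotel, "Rooms")
    for room_type, rate in conn.execute(
            "SELECT room_type, rate FROM room WHERE hotel_id = ? ORDER BY room_type",
            (hotel_id,)):
        ET.SubElement(rooms, "Room", type=room_type, rate=f"{rate:.2f}")
    return ET.tostring(hotel, encoding="unicode")

# Hypothetical source schema with one hotel and two room types.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE room (hotel_id INTEGER, room_type TEXT, rate REAL);
    INSERT INTO hotel VALUES (1, 'Grand', 'Leeds');
    INSERT INTO room VALUES (1, 'double', 120.0), (1, 'single', 80.5);
""")
message = hotel_to_xml(conn, 1)
```

On the receiving side, the message would be parsed and its values written into whatever tables that database uses.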
I A/B tested a very busy site using a cached XML product file vs. caching one very large query vs. caching smaller individual queries, and so far the XML has performed the worst every time. The time it took to read the file, find specific records, and then parse the data out was crippling the server. If you have a database at hand and are building a data-intensive website, I would strongly advise avoiding XML unless you are storing XML in your database for some specific purpose.
If you really are looking for a flat-file-style store for a website, I would look into NoSQL databases such as MongoDB or CouchDB; a few ColdFusion drivers and CFCs have been written to work with these systems.
XML is a data storage mechanism. ColdFusion and PHP are data processing languages. XML does not pull data from a database. There are lots of reasons why people store data in XML. Some of the reasons are discussed here: Why would I ever choose to store and manipulate XML in a relational database?
PHP works with XML, and so does ColdFusion. If you are looking to turn XML into something for an end user, you may want to consider XSLT.
There is an XML file at www.samplexxxxx.com/myfile.xml.
I want to read or query this file with SQL Server on my computer.
Can I do this?
Here is one example: http://pratchev.blogspot.com/2008/11/import-xml-file-to-sql-table.html
Also see dba.stackexchange.com for another option
It takes quite a bit of manual work, and it's a bit of a square peg in a round hole: DBMSs aren't great at working with XML data. I would recommend parsing the XML with Java, PHP, Python, or whatever you prefer, and then inserting the data into a DBMS if needed.
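A minimal Python sketch of that approach, under some assumptions: the file's repeating records are elements whose child elements are the fields, sqlite3 stands in for SQL Server (with a DB-API driver such as pyodbc the same executemany pattern applies), and the tag and column names are trusted because they are pasted into the SQL. The record tag and column names in the usage are hypothetical.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET

def fetch_xml(url):
    """Download a document; urlopen needs a full URL including http:// or https://."""
    with urllib.request.urlopen(url) as response:
        return response.read()

def load_records(xml_bytes, conn, record_tag, columns):
    """Insert one row per <record_tag> element, taking each column's value from
    the child element of the same name (a missing child becomes NULL).
    Tag and column names are pasted into the SQL, so they must be trusted."""
    root = ET.fromstring(xml_bytes)
    rows = [tuple(record.findtext(col) for col in columns)
            for record in root.iter(record_tag)]
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {record_tag} ({column_list})")
    conn.executemany(
        f"INSERT INTO {record_tag} ({column_list}) VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)

# Usage, where url is the full address of your file:
# conn = sqlite3.connect("local.db")
# load_records(fetch_xml(url), conn, "item", ["id", "name"])
```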
I think I may be in over my head on this one, given my SQL skills. I have an XML file that is almost 3 GB in size, and I need a process to import this data and insert it into tables. When I try to use the SSIS XML Source, it complains that the XSD is too complex, with multiple data types. I then tried to convert the XML to CSV using XSLT.
I have a link to my SkyDrive below with the XSD, since I could not upload documents here. I am looking for guidance and advice on how to get this data into SQL. Any help is appreciated.
https://skydrive.live.com/?cid=d75b2e7f757393ef&sc=documents&id=D75B2E7F757393EF%21286
The built-in XML components in SSIS are relatively limited. For a larger, more complex XML source, you might consider pulling the file in with a Script Task and shredding it using C#.
You can bulk insert any size XML file using DataStreams and SqlBulkCopy.
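SqlBulkCopy itself is a .NET class. As an illustration of the same streaming idea in Python (not a substitute for SqlBulkCopy), the sketch below reads the file incrementally with ElementTree's iterparse, detaches each record from its parent once it has been read so memory stays roughly flat, and inserts rows in batches. Assumptions: sqlite3 stands in for SQL Server, the target table already exists, the file uses no XML namespaces, and the tag and column names are trusted because they go into the SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

def stream_insert(source, conn, record_tag, columns, batch_size=10_000):
    """Insert one row per <record_tag> element of source (a path or binary file
    object), reading the document incrementally and committing in batches."""
    sql = (f"INSERT INTO {record_tag} ({', '.join(columns)}) "
           f"VALUES ({', '.join('?' for _ in columns)})")
    open_elements = []  # ancestors of the current element, outermost first
    batch, total = [], 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()
        if elem.tag != record_tag:
            continue
        batch.append(tuple(elem.findtext(col) for col in columns))
        if open_elements:
            open_elements[-1].remove(elem)  # detach the finished record so it can be freed
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            conn.commit()
            total += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany(sql, batch)
        conn.commit()
        total += len(batch)
    return total
```

The batch size is a tuning knob: larger batches mean fewer round trips and commits, at the cost of holding more rows in memory at once.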
In your schema, do you have any complex types that have mixed="true" defined? SSIS does not support this. You would need to change your content model on complex types to not be mixed, or preprocess the documents with an XSLT.
For testing, try simply removing mixed="true" from the schema before you invest time in pre-processing the documents with an XSLT.
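For that quick test, a plain text substitution is enough. The Python sketch below removes every mixed="true" attribute from a copy of the schema text. It deliberately avoids round-tripping the XSD through an XML library, which can lose namespace declarations that are referenced only inside attribute values such as type="...". It would also rewrite matching text inside comments or documentation, so treat the output as a throwaway test schema. The UTF-8 encoding is an assumption.

```python
import re
from pathlib import Path

# A mixed="true" (or mixed='true') attribute, including the whitespace before it.
MIXED_TRUE = re.compile(r"""\s+mixed\s*=\s*(["'])true\1""")

def strip_mixed(xsd_text):
    """Return (new_text, count): the schema text with every mixed="true"
    attribute removed, and how many were removed."""
    return MIXED_TRUE.subn("", xsd_text)

def write_test_schema(src, dst):
    """Write a mixed-free copy of the XSD at src to dst; the original is untouched."""
    text, count = strip_mixed(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text(text, encoding="utf-8")
    return count
```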
I have many (15-20) different XML files that I need to load into VB.NET. They're structured as they would be in a database: they were designed in Access and bulk-exported to XML files, and each file represents a different table in the database.
Now I need to load this information into VB.NET. Ideally, I'd use DAO and access the MDB directly via queries, but that won't be possible, because I want the project to be easy to port to XNA/C# down the road. (The Xbox 360 cannot use MDBs, so I'd rather deal with this problem now than later.)
So I'm stuck trying to figure out how to wrangle all of these XML files together. I've tried using factories to parse each one individually. For example, if three XML files contain data for a 'Character' class, I'd pass an instance of Character to each XML factory, and the factory classes would apply the necessary data.
I'm trying to get past this, though, as maintaining many different classes with redundant code is a pain, and it is hard to debug as well. So I'm looking for a new solution.
The only thing I can think of right now is using System.Reflection: iterating over each member of the class/structure I'm instantiating and using the member names to read the data from the matching element of the XML file.
However, this makes the assumption that each member of the structure/class has a matching element in the XML file, and vice-versa.
If you know the schema of the XML files, you could create .NET classes that can deserialize one of those XML files into an instance of a .NET object.
You can also use xsd.exe (it comes with the Windows SDK) to generate the .NET class definitions for you if you have an XSD file (or can write an XSD more easily than you can write a serializable .NET class).
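The reflection idea from the question and the deserialization suggested here come down to the same mapping: for each member, read the element with the same name. A Python sketch of that mapping follows, using dataclasses.fields as the reflection step (it is not the .NET XmlSerializer). The Character class and its fields are hypothetical. Missing elements fall back to the field's default, a required field with no matching element raises a TypeError, and extra elements are ignored, which relaxes the "every member must have a matching element" assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields

def from_element(cls, elem):
    """Create a cls instance by reflecting over its dataclass fields and reading
    the child element with the same name as each field.
    Assumes every field type is a simple converter such as str, int or float."""
    values = {}
    for f in fields(cls):
        text = elem.findtext(f.name)
        if text is not None:
            values[f.name] = f.type(text)
    return cls(**values)  # TypeError if a field without a default has no element

@dataclass
class Character:  # hypothetical example class
    name: str
    level: int = 1
    hit_points: int = 10

hero = from_element(Character, ET.fromstring(
    "<Character><name>Ayla</name><level>7</level><unused>x</unused></Character>"))
# Character(name='Ayla', level=7, hit_points=10)
```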
LINQ to XML is a good solution (and even better in VB.NET, with features like XML Literals and Global Namespaces). Treating multiple XML files as DB tables can be a rough road sometimes, but it is certainly not impossible. I guess I'd start with JOIN (even though it has "C#" in the title, the samples are also in VB).
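As a language-neutral illustration of treating two exported XML files as tables and joining them (not LINQ itself), here is a Python sketch of a hash join, similar in spirit to LINQ's Join. It assumes each row is an element whose children are the fields, as in a typical table export; the Character/Item tags and the char_id key are hypothetical.

```python
import xml.etree.ElementTree as ET

def read_rows(xml_text, row_tag):
    """Turn each <row_tag> element into a dict of field name -> text."""
    return [{field.tag: field.text for field in row}
            for row in ET.fromstring(xml_text).iter(row_tag)]

def inner_join(left, right, key):
    """Inner-join two lists of row dicts on key, using a hash index on right."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in left for b in index.get(a[key], [])]

characters = read_rows(
    "<dataroot><Character><char_id>1</char_id><name>Ayla</name></Character>"
    "<Character><char_id>2</char_id><name>Bo</name></Character></dataroot>", "Character")
items = read_rows(
    "<dataroot><Item><char_id>1</char_id><item>Sword</item></Item>"
    "<Item><char_id>1</char_id><item>Shield</item></Item></dataroot>", "Item")
for row in inner_join(characters, items, "char_id"):
    print(row["name"], row["item"])  # Ayla Sword, then Ayla Shield
```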
Is there any framework for querying XML with SQL-like syntax? I am seriously tired of iterating through node lists.
Or is this just wishful thinking (if not idiotic) and certainly not possible since XML isn't a relational database?
XQuery and XPath. XQuery is closer to what you are looking for if you want an SQL-like structure.
You could try LINQ to XML, but it's not language agnostic.
The .NET Framework provides LINQ for this, or you can use the .NET System.Data namespace to load data from XML files.
You can even create queries that have joins among the tables, etc.
For example, System.Data.DataTable provides a ReadXml() method.
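If the goal is literally to query XML with SQL, another option, similar in spirit to loading a DataTable with ReadXml, is to load the rows into an in-memory table and query that. A minimal Python sketch with sqlite3 follows; it assumes rows are elements whose children are the fields, stores every column as TEXT (so numeric work needs a CAST), and uses hypothetical Sale/customer/total names.

```python
import sqlite3
import xml.etree.ElementTree as ET

def quote(name):
    """Quote an SQL identifier, so tags like <order> don't clash with keywords."""
    return '"' + name.replace('"', '""') + '"'

def xml_to_table(conn, xml_text, row_tag):
    """Create a table named after row_tag with one TEXT column per distinct
    child tag, and insert one row per <row_tag> element."""
    records = [{field.tag: field.text for field in row}
               for row in ET.fromstring(xml_text).iter(row_tag)]
    columns = sorted({tag for record in records for tag in record})
    if not columns:
        raise ValueError(f"no <{row_tag}> fields found")
    conn.execute(f"CREATE TABLE {quote(row_tag)} "
                 f"({', '.join(quote(c) + ' TEXT' for c in columns)})")
    conn.executemany(
        f"INSERT INTO {quote(row_tag)} VALUES ({', '.join('?' for _ in columns)})",
        [tuple(record.get(c) for c in columns) for record in records])

conn = sqlite3.connect(":memory:")
xml_to_table(conn, "<sales><Sale><customer>a</customer><total>5</total></Sale>"
                   "<Sale><customer>a</customer><total>7</total></Sale>"
                   "<Sale><customer>b</customer><total>1</total></Sale></sales>", "Sale")
totals = conn.execute('SELECT customer, SUM(CAST(total AS INTEGER)) FROM "Sale" '
                      "GROUP BY customer ORDER BY customer").fetchall()
# [('a', 12), ('b', 1)]
```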
XQuery is a functional language that is closest to SQL. XPath is a notation for locating a node within the document that is used as part of XSLT and XQuery.
XML databases such as MarkLogic serve as XQuery engines for XML data, much as relational databases serve as SQL engines for relational data.
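For simple queries you don't necessarily need a full XQuery engine. Python's standard ElementTree module supports a limited subset of XPath, which already replaces many hand-written node loops; the book/title/author document below is a made-up example. For full XPath 1.0 or XQuery you would need something like lxml, or an engine such as Saxon or an XML database.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<library>
  <book year="2005"><title>A</title><author>Smith</author></book>
  <book year="2011"><title>B</title><author>Jones</author></book>
  <book year="2011"><title>C</title><author>Smith</author></book>
</library>""")

# Roughly: SELECT title FROM book WHERE year = '2011'
titles_2011 = [b.findtext("title") for b in doc.findall(".//book[@year='2011']")]
# Roughly: SELECT title FROM book WHERE author = 'Smith'
by_smith = [b.findtext("title") for b in doc.findall(".//book[author='Smith']")]
# titles_2011 == ['B', 'C'], by_smith == ['A', 'C']
```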
That depends on the problem you are solving. If the XML file is very large, it is sometimes necessary to use something like a SAX parser to traverse the file node by node; otherwise you will get an OutOfMemoryException or even run out of virtual memory on your computer.
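A minimal Python sketch of the SAX approach using the standard xml.sax module: the handler passes each field's text to a callback as soon as its closing tag is seen, so memory use depends on one value rather than on the whole file. The field name and callback are illustrative; for a file on disk you would call xml.sax.parse(path, handler) instead of parseString.

```python
import xml.sax

class FieldHandler(xml.sax.ContentHandler):
    """Calls on_value(text) for every <field_tag> element, without building a tree."""

    def __init__(self, field_tag, on_value):
        super().__init__()
        self.field_tag = field_tag
        self.on_value = on_value
        self.buffer = None  # list of text chunks while inside a field, else None

    def startElement(self, name, attrs):
        if name == self.field_tag:
            self.buffer = []

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == self.field_tag:
            self.on_value("".join(self.buffer))
            self.buffer = None

seen = []
xml.sax.parseString(
    b"<stops><stop><name>High St</name></stop><stop><name>Low Rd</name></stop></stops>",
    FieldHandler("name", seen.append))
# seen == ['High St', 'Low Rd']
```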
But if the XML file is expected to be relatively small, you can simply use something like LINQ. Also see my answer here, where I tried to explain how to make traversing nodes much easier with constructs like yield return.
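Python's counterpart of C#'s yield return is a generator. The sketch below walks an in-memory tree lazily, yielding each element's path and text, so a caller can stop as soon as it finds what it needs; it assumes a document small enough to load whole, which is the case this paragraph describes.

```python
import xml.etree.ElementTree as ET

def walk(elem, path=""):
    """Lazily yield (path, text) for every element, depth-first, using a generator."""
    path = f"{path}/{elem.tag}"
    yield path, (elem.text or "").strip()
    for child in elem:
        yield from walk(child, path)

root = ET.fromstring("<a><b>1</b><c><d>2</d></c></a>")
# Stops walking as soon as a match is found.
first_two = next(p for p, text in walk(root) if text == "2")  # '/a/c/d'
```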
SQL Server 2005 supports XML DML on its native xml data type.
XQuery is certainly the way forward. This is what is used by XML databases like eXist and MarkLogic.
In the Java world, there are several solutions for running XQuery on flat files, most notably Saxon.
For .NET, there is not so much available. Microsoft did have an XQuery library, although this was pulled from .NET 2 and has never resurfaced. XQSharp is a native .NET alternative, although currently only a command line version has been released.