XML-based databases as with SQL? - sql

Is it possible to store data as plain XML files and be able to search and sort information as in SQL? Various applications I have in mind are simple phonebooks, bookmarks lists, passwords lists for personal use.
For example:
<accounts>
<account>
<website>mail.google.com</website>
<username>example@gmail.com</username>
<password>mypassword</password>
</account>
<account>
...
</account>
</accounts>
In this case, I should be able to select only those websites where I use a particular password or username, for example.
If possible, I'd like to accomplish this using just a web browser or something similar. No web servers or other daemons should be running on my machine, as I don't want much overhead for such simple things.
Let me know if the question is not clear enough.
Thank you.

Have you seen XQuery?
XQuery is to XML what SQL is to database tables.
Various implementations exist, including this in-browser version.
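The selection described in the question can also be sketched without XQuery. The following is a minimal Python illustration, not the in-browser approach mentioned above; it relies on the limited XPath subset in the standard library's `xml.etree.ElementTree`. The second and third accounts and the helper name `websites_with` are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the question; the last two accounts are hypothetical.
ACCOUNTS_XML = """
<accounts>
  <account>
    <website>mail.google.com</website>
    <username>example@gmail.com</username>
    <password>mypassword</password>
  </account>
  <account>
    <website>example.org</website>
    <username>someone</username>
    <password>otherpassword</password>
  </account>
  <account>
    <website>forum.example.net</website>
    <username>example@gmail.com</username>
    <password>mypassword</password>
  </account>
</accounts>
"""


def websites_with(root, field, value):
    """Return the sorted websites of accounts whose <field> child equals value."""
    # ElementTree's XPath subset supports the [tag='text'] predicate.
    # The value is interpolated directly, so it must not contain a single quote.
    matches = root.findall(f"account[{field}='{value}']")
    return sorted(acc.findtext("website") for acc in matches)


root = ET.fromstring(ACCOUNTS_XML)
print(websites_with(root, "password", "mypassword"))
```

This loads the whole file into memory, which is reasonable for a personal phonebook or password list but not for large data sets.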

Take a look at Sedna, eXist and BaseX, these appear to be both open source and alive.

There are two main classes of XML databases: those that do "only XML", of which the best known are probably eXist as an open source solution, and MarkLogic as a commercial product; and those that do XML alongside relational data, which is possible with DB2, SQL server, and Oracle. All offer XQuery (sometimes with extensions or restrictions) as the main query language. If your data is naturally hierarchic and already represented in XML, then you should certainly consider these products rather than converting your data into the tabular form required for storage in a relational database - which is basically wasted effort.

Related

What is the advantage of using XML with regards to SQL?

I've seen a few relational databases where the XML directly mirrors the SQL, and I was wondering if anyone could give me some insight as to why people use XML over other options. I was under the impression that it was more a personal preference, but I was told by a classmate that XML is considered "better" ie more efficient in certain cases. So I wanted to pose the question to you folks, because frankly I wanted a second opinion.
The question: When would you use XML instead of ColdFusion or PHP (or other alternatives)? What are some inherent advantages that would make it a more desirable option?
For example, this is what the XML might look like:
<data>
<dataObject name="Test">
<primaryKey>Num</primaryKey>
<foreignKey dataObject="Test" key="Num"/>
<datums>
<datum type="integer" key="itemRecnum" label="Item Recnum" data="required"/>
<datum type="string" key="status" label="Status" data="required"/>
<datum type="integer" key="idnumber" label="ID Number" data="required"/>
</datums>
<constraints/>
</dataObject>
</data>
So in the SQL server, each of these have a 1-1 correspondence, with each datum type being a column.
Can someone please explain what the advantages of using XML to pull from the database are? What exactly is happening here and why is it used over CF or PHP? And how is it pushing and pulling from the database?
What if you were to mix the two? Perhaps one would use coldfusion for inserts, and xml just for views?
The intent of XML is to store data in a flat file, humanly readable (XML has a huge overhead in the textual naming of the entities. Also it is not meant to be human readable, it is a transport medium), easily accessible form. Methods for accessing an XML data "store" are quite robust and evolving all the time, to include a proposal from Microsoft for "XQL" - an SQL equivalent designed to manipulate XML data stores.
XML is so simple that it can itself be used as a database – a very flexible one, indeed: your XML implementation can be infinitely customized through tags and a different array of libraries. As a plus, should your database get corrupted, you can open it in virtually any text editor – it's a text file, after all. However, XML has a major drawback: it is slower than SQL when processing data, and requires more resources to run.
About ColdFusion & XML you can read HERE
Where XML wins is if you've got data about a business object (let's say a hotel) scattered across 20 tables and you want to send that data to someone who organizes the data quite differently into 16 tables with a different structure. XML allows you to capture all the information about the object in one message, that's independent of the design of your database and possibly conforms to some industry standard like OTA, and load it into a different database with a quite different design.
If your XML, on the other hand, is intimately tied to the tables and columns of your SQL database design, then you aren't getting much value from it.
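As a rough illustration of the hotel case, the sketch below gathers one object's rows from two tables into a single XML message that does not expose the table layout. The table names, columns, and sample values are invented for the example and do not follow OTA or any other industry schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-table layout; a real schema would have many more tables.
hotel_db = sqlite3.connect(":memory:")
hotel_db.executescript("""
    CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE room  (hotel_id INTEGER, kind TEXT, price REAL);
    INSERT INTO hotel VALUES (1, 'Sea View', 'Lisbon');
    INSERT INTO room  VALUES (1, 'single', 80.0);
    INSERT INTO room  VALUES (1, 'double', 120.0);
""")


def hotel_to_xml(conn, hotel_id):
    """Collect one hotel's rows from both tables into a single XML message."""
    name, city = conn.execute(
        "SELECT name, city FROM hotel WHERE id = ?", (hotel_id,)
    ).fetchone()
    hotel = ET.Element("hotel", name=name, city=city)
    rooms = conn.execute(
        "SELECT kind, price FROM room WHERE hotel_id = ? ORDER BY price",
        (hotel_id,),
    )
    for kind, price in rooms:
        ET.SubElement(hotel, "room", kind=kind, price=str(price))
    return ET.tostring(hotel, encoding="unicode")


message = hotel_to_xml(hotel_db, 1)
print(message)
```

The receiver can parse this message and load it into whatever table design it uses, which is the decoupling the answer describes.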
I A/B tested a very busy site using a cached XML product file vs caching a very large query vs caching smaller individual queries, and so far the XML has performed the worst every time. The time it took to read the file, find specific records, and then parse the data out was crippling the server. If you have a database at hand and are looking to build a website that is data intensive, I would strongly advise avoiding XML unless you are storing XML in your database for one purpose or another.
If you are really looking for a flat file system for a website, I would look into NoSQL databases such as MongoDB or CouchDB; there are a few ColdFusion drivers and CFCs that have been written to work with these systems.
XML is a data storage mechanism. ColdFusion and PHP are data processing languages. XML does not pull data from a database. There are lots of reasons why people store data in XML. Some of the reasons are discussed here: Why would I ever choose to store and manipulate XML in a relational database?
PHP works with XML and so does ColdFusion. If you are looking to turn XML into something for an end user, you may want to consider XSLT.

should xml or sqlite3 be used?

I just started iOS development and am currently developing an application that just reads data from a server and displays it on the screen. What I am not sure of is whether to use XML or sqlite3 to store the data. Which method is preferred and why? Thanks in advance.
It is important to remember they are two different things, suited to different tasks. Choose the one that fits the problem. (In this case I would likely use XML or "just plain text" because it sounds like just a simple download-cache. Either the raw response could be kept or, perhaps the data already transformed into objects and then automatically serialized into XML or whatnot. In any case, keep it simple.)
XML is (at the very core) a markup format. XML documents are a (hopefully well-defined) structure. There is a large set of tooling that supports manipulation and querying within a hierarchical "document" model. I use XML a good bit for a serialization format and also use it for local caching if appropriate (e.g. there are no non-hierarchical relationships). XML is often loaded entirely into memory (e.g. a DOM) for manipulation.
SQLite is a relational database that is designed around tables and relationships between sets of tables. Being able to run (complex) queries is where a relational database really shines. SQLite is also very fast and can process large data-sets which can't all fit in memory. Columns in SQLite can also contain text (read: XML) so the approaches are not orthogonal.
Happy coding.
It probably all depends on how the data is processed after it is stored. If the data must be sorted, filtered with specific selections, etc., then sqlite is the better solution.
A second, less important concern is how much data will be stored: if it's just one "table" with 10 rows, then sqlite is probably too much for it.
If you want to read data from server and want to display on screen and don't need to save it locally then use XML.
If you want to store it locally and don't want to fetch from server then use XML files or sqlite database in your project.
If you want to fetch from server and also to store it locally then first use XML to fetch data and then use sqlite to store it locally.
Also look at @pst's answer for the difference between them.
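The "fetch as XML, store in sqlite" option can be sketched roughly as follows. The example is in Python rather than Objective-C to keep it short, and the response format, element names, and table layout are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical server response; a real app would receive this over the network.
RESPONSE = "<items><item id='2' title='Pears'/><item id='1' title='Apples'/></items>"


def cache_items(conn, xml_text):
    """Parse the XML response and upsert its items into a local table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item (id INTEGER PRIMARY KEY, title TEXT)"
    )
    rows = [
        (int(elem.get("id")), elem.get("title"))
        for elem in ET.fromstring(xml_text).iter("item")
    ]
    conn.executemany("INSERT OR REPLACE INTO item VALUES (?, ?)", rows)
    conn.commit()


cache_db = sqlite3.connect(":memory:")  # a file path would persist the cache
cache_items(cache_db, RESPONSE)

# Once the data is in SQLite, sorting and selection are one query away.
titles = [t for (t,) in cache_db.execute("SELECT title FROM item ORDER BY title")]
print(titles)
```

Calling `cache_items` again with a fresh response replaces rows with the same `id`, so the local copy follows the server.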

Is SQL the ''assembler'' of the NoSQL database world?

I recently came across http://www.fossil-scm.org/index.html/doc/tip/www/theory1.wiki by D. Richard Hipp, the developer responsible for SQLite.
It got me thinking: is Fossil the only NoSQL database that uses SQL?
Do others use SQL as a 'High Level Scripting Language'?
From the article, it sounds like Fossil isn't a database any more than git is a database. Yes, it's a thing that contains data, and yes, it's backed by a database, but it seems pretty far from a database itself. So the first part of your question basically relies on a faulty assumption. There is a database called Friendly which uses MySQL to store schema-less models, but it seems like an awkward bandaid sort of solution at best.
I'm certainly not familiar with all of the NoSQL options out there, but, to my knowledge, none of the well-thought-of ones use SQL for anything. MongoDB and CouchDB, the two I'm most familiar with, both use Javascript as part of their query interface, though in very different ways. MongoDB has queries more like what you'd expect from a relational database: you can write an arbitrary query for all documents that match a certain set of attributes. However, unlike a relational database, there's no such thing as a join (you'll only ever get a list of distinct documents back, not compound documents) and you can write arbitrary Javascript code to select documents. CouchDB, on the other hand, does not allow arbitrary queries. Instead, you create views (which are essentially simpler key-value stores) using map/reduce functions written in Javascript and then query those views from a start key to an end key.
In both cases, the type of information being transmitted to the server to perform the query isn't well-suited for the type of problem that SQL is good at solving. The trade-off to SQL being so high-level (to use the logic of the author of the paper) is that it's only suitable for a very narrow set of problems.
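To make the CouchDB-style view idea concrete, here is a toy model in plain Python. It is not CouchDB's API (real views are defined in Javascript and served over HTTP) and it covers only the map step, not reduce; the documents and function names are invented for the sketch.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents in a schema-less store.
docs = [
    {"_id": "a", "type": "post", "year": 2009},
    {"_id": "b", "type": "post", "year": 2011},
    {"_id": "c", "type": "comment", "year": 2010},
    {"_id": "d", "type": "post", "year": 2010},
]


def map_posts_by_year(doc):
    """Map function: emit a (key, value) row for each document of interest."""
    if doc["type"] == "post":
        yield doc["year"], doc["_id"]


def build_view(documents, map_fn):
    """Run the map function over every document and keep the rows sorted by key."""
    return sorted(row for doc in documents for row in map_fn(doc))


def query_view(view, start_key, end_key):
    """Return the rows whose key lies between start_key and end_key, inclusive."""
    keys = [key for key, _ in view]
    return view[bisect_left(keys, start_key):bisect_right(keys, end_key)]


view = build_view(docs, map_posts_by_year)
print(query_view(view, 2010, 2011))
```

The only question the view can answer is "which rows fall in this key range", which is why the query has to be decided when the view is written rather than composed ad hoc as in SQL.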
The creator of Fossil / SQLite is working on and pushing UnQL as the NoSQL standard:
UnQL means Unstructured Query Language.
It's an open query language for JSON, semi-structured and document
databases.
It looks like a stripped down version of SQL.

Is there an XML language for defining/authoring SQL database schemas?

Is there a dialect of XML for defining the tables, indexes and relations of a relational database, and a "compiler" or stylesheet to transform that definition into SQL CREATE statements (DDL)?
EG, something that might look like:
<Table Name="orders">
<Column Name="order_id" Type="varchar" Size="20"/>
... etc ...
</Table>
I'd like to keep the configuration of a service and its dependencies all in one place, and XML is looking like the best choice because of its wide support and its ability to mix namespaces. With it, I could write an installation program that can install this service and create the database, its tables, indexes, relations, etc. without being tied to a specific SQL implementation.
Edit: This has nothing to do with ORM.
Something like xml2ddl?
Sounds like XML based migrations, never seen one though.
If you're into OR/M you could take a look at (N)Hibernate's hbm2ddl tool. It generates the appropriate create commands for the schema on various database dialects out of an XML definition.
I've written my own a couple of times for different projects. If you're good at XSLT, knowledgeable about DDL, and have a good development environment, it's surprisingly easy (like, 2 or 3 hours work) to hack together a schema for representing metadata and a transform that produces your database-creation script.
This has all the usual advantages and disadvantages of doing it yourself: on the one hand, you control the feature set, but on the other hand, you're responsible for the feature set. In my projects, the feature set was small enough that it was easier to build it myself than it would have been to learn how to work with someone else's application framework.
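A do-it-yourself transform of this kind can be very small. The answer above describes doing it with XSLT; the sketch below swaps in Python for brevity and targets the element and attribute names from the question. The second column is made up, and there is no identifier quoting, dialect handling, or support for indexes and relations.

```python
import xml.etree.ElementTree as ET

# Schema definition in the shape proposed in the question;
# the "quantity" column is a hypothetical addition.
SCHEMA_XML = """
<Table Name="orders">
  <Column Name="order_id" Type="varchar" Size="20"/>
  <Column Name="quantity" Type="integer"/>
</Table>
"""


def table_to_ddl(table):
    """Turn one <Table> element into a CREATE TABLE statement."""
    columns = []
    for col in table.findall("Column"):
        sql_type = col.get("Type")
        if col.get("Size"):
            sql_type += f"({col.get('Size')})"
        columns.append(f"  {col.get('Name')} {sql_type}")
    body = ",\n".join(columns)
    return f"CREATE TABLE {table.get('Name')} (\n{body}\n);"


ddl = table_to_ddl(ET.fromstring(SCHEMA_XML))
print(ddl)
```

Supporting several SQL implementations would mostly mean replacing the direct use of `Type` with a per-dialect type mapping.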

Good reasons NOT to use a relational database?

Can you please point to alternative data storage tools and give good reasons to use them instead of good-old relational databases? In my opinion, most applications rarely use the full power of SQL--it would be interesting to see how to build an SQL-free application.
Plain text files in a filesystem
Very simple to create and edit
Easy for users to manipulate with simple tools (i.e. text editors, grep etc)
Efficient storage of binary documents
XML or JSON files on disk
As above, but with a bit more ability to validate the structure.
Spreadsheet / CSV file
Very easy model for business users to understand
Subversion (or similar disk based version control system)
Very good support for versioning of data
Berkeley DB (Basically, a disk based hashtable)
Very simple conceptually (just un-typed key/value)
Quite fast
No administration overhead
Supports transactions I believe
Amazon's Simple DB
Much like Berkeley DB I believe, but hosted
Google's App Engine Datastore
Hosted and highly scalable
Per document key-value storage (i.e. flexible data model)
CouchDB
Document focus
Simple storage of semi-structured / document based data
Native language collections (stored in memory or serialised on disk)
Very tight language integration
Custom (hand-written) storage engine
Potentially very high performance in required use cases
I can't claim to know anything much about them, but you might also like to look into object database systems.
Matt Sheppard's answer is great (mod up), but I would take these factors into account when thinking about a spindle:
Structure : does it obviously break into pieces, or are you making tradeoffs?
Usage : how will the data be analyzed/retrieved/grokked?
Lifetime : how long is the data useful?
Size : how much data is there?
One particular advantage of CSV files over RDBMSes is that they can be easy to condense and move around to practically any other machine. We do large data transfers, and everything's simple enough that we just use one big CSV file, which is easy to script using tools like rsync. To reduce repetition on big CSV files, you could use something like YAML. I'm not sure I'd store anything like JSON or XML, unless you had significant relationship requirements.
As far as not-mentioned alternatives, don't discount Hadoop, which is an open source implementation of MapReduce. This should work well if you have a TON of loosely structured data that needs to be analyzed, and you want to be in a scenario where you can just add 10 more machines to handle data processing.
For example, I started trying to analyze performance that was essentially all timing numbers of different functions logged across around 20 machines. After trying to stick everything in a RDBMS, I realized that I really don't need to query the data again once I've aggregated it. And, it's only useful in its aggregated format to me. So, I keep the log files around, compressed, and then leave the aggregated data in a DB.
Note I'm more used to thinking with "big" sizes.
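The "aggregate once, keep only the summary in a DB" approach might look roughly like this. The log line format (host, function, milliseconds) and the table layout are assumptions for the sketch, not the format actually used in the answer above.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw timing lines: "<host> <function> <milliseconds>".
LOG_LINES = [
    "host01 parse 12.0",
    "host02 parse 18.0",
    "host01 render 40.0",
    "host03 render 60.0",
]


def aggregate(lines):
    """Reduce raw timing lines to {function: (call_count, mean_ms)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        _host, func, ms = line.split()
        totals[func][0] += 1
        totals[func][1] += float(ms)
    return {func: (n, total / n) for func, (n, total) in totals.items()}


# Only the aggregate goes into the database; the raw logs stay in
# (compressed) files on disk.
summary_db = sqlite3.connect(":memory:")
summary_db.execute(
    "CREATE TABLE timing (func TEXT PRIMARY KEY, calls INTEGER, mean_ms REAL)"
)
summary_db.executemany(
    "INSERT INTO timing VALUES (?, ?, ?)",
    [(func, n, mean) for func, (n, mean) in aggregate(LOG_LINES).items()],
)
rows = summary_db.execute(
    "SELECT func, calls, mean_ms FROM timing ORDER BY func"
).fetchall()
print(rows)
```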
The filesystem's pretty handy for storing binary data, which never works amazingly well in relational databases.
Try Prevayler:
http://www.prevayler.org/wiki/
Prevayler is an alternative to an RDBMS. The site has more info.
If you don't need ACID, you probably don't need the overhead of an RDBMS. So, determine whether you need that first. Most of the non-RDBMS answers provided here do not provide ACID.
Custom (hand-written) storage engine / Potentially very high performance in required use cases
http://www.hdfgroup.org/
If you have enormous data sets, instead of rolling your own, you might use HDF, the Hierarchical Data Format.
http://en.wikipedia.org/wiki/Hierarchical_Data_Format:
HDF supports several different data models, including multidimensional arrays, raster images, and tables.
It's also hierarchical like a file system, but the data is stored in one magic binary file.
HDF5 is a suite that makes possible the management of extremely large and complex data collections.
Think petabytes of NASA/JPL remote sensing data.
G'day,
One case that I can think of is when the data you are modelling cannot be easily represented in a relational database.
One such example is the database used by mobile phone operators to monitor and control base stations for mobile telephone networks.
In almost all of these cases, an OO DB is used, either a commercial product or a self-rolled system that allows hierarchies of objects.
I've worked on a 3G monitoring application for a large company who will remain nameless, but whose logo is a red wine stain (-: , and they used such an OO DB to keep track of all the various attributes for individual cells within the network.
Interrogation of such DBs is done using proprietary techniques that are, usually, completely free from SQL.
HTH.
cheers,
Rob
Object databases are not relational databases. They can be really handy if you just want to stuff some objects in a database. They also support versioning and modifying classes for objects that already exist in the database. db4o is the first one that comes to mind.
In some cases (financial market data and process control for example) you might need to use a real-time database rather than a RDBMS. See wiki link
There was a RAD tool called JADE written a few years ago that has a built-in OODBMS. Earlier incarnations of the DB engine also supported Digitalk Smalltalk. If you want to sample application building using a non-RDBMS paradigm this might be a start.
Other OODBMS products include Objectivity, GemStone (You will need to get VisualWorks Smalltalk to run the Smalltalk version but there is also a java version). There were also some open-source research projects in this space - EXODUS and its descendent SHORE come to mind.
Sadly, the concept seemed to die a death, probably due to the lack of a clearly visible standard and relatively poor ad-hoc query capability relative to SQL-based RDBMS systems.
An OODBMS is most suitable for applications with core data structures that are best represented as a graph of interconnected nodes. I used to say that the quintessential OODBMS application was a Multi-User Dungeon (MUD) where rooms would contain players' avatars and other objects.
You can go a long way just using files stored in the file system. RDBMSs are getting better at handling blobs, but this can be a natural way to handle image data and the like, particularly if the queries are simple (enumerating and selecting individual items.)
Other things that don't fit very well in a RDBMS are hierarchical data structures and I'm guessing geospatial data and 3D models aren't that easy to work with either.
Services like Amazon S3 provide simpler storage models (key->value) that don't support SQL. Scalability is the key there.
Excel files can be useful too, particularly if users need to be able to manipulate the data in a familiar environment and building a full application to do that isn't feasible.
There are a large number of ways to store data - even "relational database" covers a range of alternatives from a simple library of code that manipulates a local file (or files) as if it were a relational database on a single user basis, through file based systems that can handle multiple users, to a generous selection of serious "server" based systems.
We use XML files a lot - you get well structured data, nice tools for querying it, the ability to do edits if appropriate, something that's human readable, and you don't then have to worry about the db engine working (or the workings of the db engine). This works well for stuff that's essentially read only (in our case more often than not generated from a db elsewhere) and also for single user systems where you can just load the data in and save it out as required - but you're creating opportunities for problems if you want multi-user editing - at least of a single file.
For us that's about it - we're either going to use something that will do SQL (MS offer a set of tools that run from a .DLL to do single user stuff all the way through to enterprise server and they all speak the same SQL (with limitations at the lower end)) or we're going to use XML as a format because (for us) the verbosity is seldom an issue.
We don't currently have to manipulate binary data in our apps so that question doesn't arise.
Murph
One might want to consider the use of an LDAP server in the place of a traditional SQL database if the application data is heavily key/value oriented and hierarchical in nature.
BTree files are often much faster than relational databases. SQLite contains within it a BTree library which is in the public domain (as in genuinely 'public domain', not using the term loosely).
Frankly though, if I wanted a multi-user system I would need a lot of persuading not to use a decent server relational database.
Full-text databases, which can be queried with proximity operators such as "within 10 words of," etc.
Relational databases are an ideal business tool for many purposes - easy enough to understand and design, fast enough, adequate even when they aren't designed and optimized by a genius who could "use the full power," etc.
But some business purposes require full-text indexing, which relational engines either don't provide or tack on as an afterthought. In particular, the legal and medical fields have large swaths of unstructured text to store and wade through.
Also:
* Embedded scenarios - where it is usually required to use something smaller than a full-fledged RDBMS. Db4o is an ODB that can be easily used in such a case.
* Rapid or proof-of-concept development - where you wish to focus on the business and not worry about persistence layer
CAP theorem explains it succinctly. SQL mainly provides "Strong Consistency: all clients see the same view, even in presence of updates".
K.I.S.S: Keep It Small and Simple
I would offer RDBMS :)
If you do not want to have trouble with setup/administration, go for SQLite.
It is a built-in RDBMS with full SQL support. It even allows you to store any type of data in any column.
Main advantage over, for example, a log file: if you have a huge one, how are you going to search in it? With an SQL engine you just create an index and speed up the operation dramatically.
About full text search: SQLite has modules for full text search too.
Just enjoy nice standard interface to your data :)
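Both SQLite points above (any type in any column, and an index to speed up searches) can be tried with a few lines of Python's built-in `sqlite3` module. This is a minimal sketch; the table, column, and index names are made up, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

log_db = sqlite3.connect(":memory:")

# Columns declared without a type accept values of any type.
log_db.execute("CREATE TABLE log (func, ms)")
log_db.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("parse", 12.0), ("render", "slow"), ("parse", 18)],
)

# The index lets lookups by func avoid scanning the whole table.
log_db.execute("CREATE INDEX idx_log_func ON log (func)")

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
# the last column of each row is a human-readable description.
plan = log_db.execute(
    "EXPLAIN QUERY PLAN SELECT ms FROM log WHERE func = ?", ("parse",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)

parse_rows = log_db.execute(
    "SELECT ms FROM log WHERE func = ? ORDER BY rowid", ("parse",)
).fetchall()
print(parse_rows)
```

If the printed plan mentions `idx_log_func`, the lookup is going through the index rather than a full table scan.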
One good reason not to use a relational database would be when you have a massive data set and want to do massively parallel and distributed processing on the data. The Google web index would be a perfect example of such a case.
Hadoop also has an implementation of the Google File System called the Hadoop Distributed File System.
I would strongly recommend Lua as an alternative to SQLite-kind of data storage.
Because:
The language was designed as a data description language to begin with
The syntax is human readable (XML is not)
One can compile Lua chunks to binary, for added performance
This is the "native language collection" option of the accepted answer. If you're using C/C++ as the application level, it is perfectly reasonable to throw in the Lua engine (100kB of binary) just for the sake of reading configs/data or writing them out.