LINQ to XML or SQL Server 2005 XML DML?

I'm writing an app that retrieves RSS feeds on a scheduled daily basis and saves the contents of each feed as XML data in a SQL Server 2005 database. I want to display the aggregated items from the saved feed data (sorted by date, for example) in ASP.NET GridViews in my app.
My question is: should I use LINQ to XML to query the feed data, given that I already use LINQ to SQL in the app, or should I aggregate the data using a SQL query in a stored procedure, given that SQL Server 2005 has powerful XML-handling support?
I'm relatively new to both SQL Server 2005 and LINQ, so I can't really see what the relative advantages of either solution are, or whether they aren't really the same thing, effectively.

Well, I guess it really depends on where you prefer to write code. Personally I'm a lot more comfortable in C# than in T-SQL, so I'd do it in LINQ to XML (and indeed I've done exactly that for an RSS feed before now). It's likely to be pretty simple either way, but I don't see there's really much benefit in doing it in the database unless you're likely to have multiple clients which all want the same XML. Debugging etc tends (IMO) to be easier in C#, and unit tests are easier to write too.
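To give a feel for the "do it in application code" option, here is a minimal sketch. It is written in Python with the standard library's ElementTree as a stand-in for LINQ to XML, so it is an analogy and not the C# you would actually write. The two feed documents and the aggregate_items helper are made up for illustration; in the real app the XML strings would come from the database column.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed documents, standing in for the XML saved in the database.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>A1</title><pubDate>Mon, 06 Oct 2008 10:00:00 GMT</pubDate></item>
<item><title>A2</title><pubDate>Wed, 08 Oct 2008 09:30:00 GMT</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>B1</title><pubDate>Tue, 07 Oct 2008 12:00:00 GMT</pubDate></item>
</channel></rss>"""


def aggregate_items(feed_documents):
    """Collect (published, feed title, item title) from every feed, newest first."""
    rows = []
    for xml_text in feed_documents:
        channel = ET.fromstring(xml_text).find("channel")
        feed_title = channel.findtext("title")
        for item in channel.findall("item"):
            # RSS 2.0 dates use the RFC 822 format, which this helper parses.
            published = parsedate_to_datetime(item.findtext("pubDate"))
            rows.append((published, feed_title, item.findtext("title")))
    return sorted(rows, key=lambda row: row[0], reverse=True)


items = aggregate_items([FEED_A, FEED_B])
for published, feed_title, title in items:
    print(published.date(), feed_title, title)
```

The sorted list is the kind of flat result you would bind to a grid. It is also easy to unit test, which is the point made above.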

Related

SQL - Querying with BLOBS

Our setup today takes XML data and splits it into multiple tables in SQL. The only benefit of this is that reporting is good. However, whenever we want to retrieve data we have to reassemble it from hundreds of tables to re-export the XML. Each XML document can be anywhere from several MB to several GB.
Ironically, we hardly ever run reports, but we retrieve and save the data very often. Because the data is split across (and recompiled from) so many tables, neither saving nor retrieval is very efficient.
Since the data comes in as XML, I'm considering changing our method and saving the XML as a large BLOB in a table. That would be so simple.
The issue then becomes reporting: without the ability to index BLOBs, I'm wondering what options I have to run reports as efficiently as possible.
The database is in the hundreds of GB.
I'm not a DBA (I'm a C# person) and I've just landed in this position at work, so the only way I could think of doing this would be in C#: load each BLOB as XML and then query the XML data in C#. This, however, seems like it would be very inefficient. Maybe XQuery in SQL is better? Despite not being a DBA, I'm more than happy with any programming (C#/VB) or SQL suggestions.

You can save the data in a single XML-type column in your database and then access the data via XQuery.
XQuery is for me, personally, a bit fiddly, but I found this list of tips to be of great help:
http://www.jackdonnell.com/?p=266
The advantage is that you are only persisting one version of the data, so updates and reads are quick, apart from the XML parsing (and how much that costs may depend on your data volume). Getting the data into the database from C# is straightforward, as you can map your XML to a corresponding SqlDbType.

Confused about the role of a query language

So, I haven't had any luck finding any articles or forum posts that explain how exactly a query language works in conjunction with a general-purpose programming language like C++ or VB. So I guess it won't hurt to ask >.<
Basically, I've been having a hard time understanding what the role of the query language is (we'll use SQL as the example query language and VB6 as the normal language) if I'm creating a simple database query that fills a table with ordinary information (first name, last name, address etc). I somewhat know the steps in setting up a program like this using ADO objects for the connection and whatnot, but how do we decide which of the two languages gets used for which things? Does VB6 specifically handle the basics like loops, if/else, and variable declarations, while SQL specifically handles things like connecting to the database and doing the searching, filtering and sorting? Is it possible to do certain general-purpose VB6 actions (loops or conditionals) in SQL syntax instead? Any help would be GREATLY appreciated.

SQL is a language for querying a database. SQL is an ISO standard; relational database vendors implement the ISO standard and then add their own customizations. For example, in SQL Server the dialect is called T-SQL and in Oracle it is called PL/SQL. They both implement the ISO standard, so each will accept an identical query for a simple select like
select columnname from tablename where columnname=1
However, each has different syntax for string functions, date functions, etc.
By design, the ISO SQL standard is not a full procedural language with looping, subroutines, etc., the way a language like VB is.
However, each vendor has added capabilities to their version to add some of this functionality in.
For example, both T-SQL and PL/SQL can "loop" through records using various constructs in their language.
There is also a difference when working with data that many developers are not well attuned to: set-based operations vs. procedural ones.
Databases can work with procedural constructs but are often more performant with set-based ones. A developer who is not versed in this concept may end up creating a very inefficient query. Here's an example of this discussion.
In any situation you have to weigh the pros and cons of where it is best to do this work.
I tend to favor using procedural constructs such as loops in the language I am using over SQL. I find it easier to maintain and the language I am using offers more powerful syntax for me to get the job done.
However, I keep both options as a tool in the toolbox. For example, I have written data conversion scripts in SQL and in this case I have used the looping constructs in SQL.
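To make the set-based vs. procedural distinction concrete, here is a small sketch. It uses Python with an in-memory SQLite database purely as a runnable stand-in for VB6 plus SQL Server; the product table and both function names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (category, price) VALUES (?, ?)",
    [("book", 10.0), ("book", 20.0), ("toy", 5.0)],
)


def raise_prices_row_by_row(conn, category, factor):
    """Procedural style: fetch the rows, then issue one UPDATE per row."""
    rows = conn.execute(
        "SELECT id, price FROM product WHERE category = ?", (category,)
    ).fetchall()
    for product_id, price in rows:
        conn.execute(
            "UPDATE product SET price = ? WHERE id = ?", (price * factor, product_id)
        )


def raise_prices_set_based(conn, category, factor):
    """Set-based style: one statement describes the change for all matching rows."""
    conn.execute(
        "UPDATE product SET price = price * ? WHERE category = ?", (factor, category)
    )


raise_prices_row_by_row(conn, "book", 2)    # doubles both book prices, one row at a time
raise_prices_set_based(conn, "book", 0.5)   # halves them again in a single statement
prices = [row[0] for row in conn.execute("SELECT price FROM product ORDER BY id")]
print(prices)
```

Both functions end up with the same data. The difference is that the first issues one statement per row while the second lets the database handle the whole set at once, which is usually where the performance gap comes from on large tables.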

Usually programming languages are executed on the client side (or on an app server), and query languages are executed on the db server, so in the end it depends on where you want to put all the work. Sometimes you can put a lot of the work on the client side by doing all the calculations in the programming language, and other times you want to make more use of the db server and you end up using the query language, or even better tsql/psql or whatever.

Relational databases are designed to manage data. In particular, they provide an efficient mechanism for managing memory, disk, and processors for large quantities of data. In addition, relational databases can handle multiple clients, guarantee transactional integrity, security, backups, persistence, and numerous other functions.
In general, if you are using an RDBMS with another language, you want to design the data structure first and then think about the API (applications programming interface) between the two. This is particularly true when you have an app/server relationship.
For a "simple" type of application, which uses a lot of data but with minimal or batch changes to it, you want to move as much of the processing into the database as is reasonable. Here are things you do not want to do:
Use queries to load things into arrays, and then do array manipulations at the language level. SQL provides joins for this.
Load data into an array and do manipulations and summaries on the array. SQL provides aggregations for this.
Save data into a file to have a backup. Databases provide backup mechanisms.
If your data fits into an array or an Excel spreadsheet, it is often sufficient to get started with the data stored there. Only when your needs start to expand (multiple clients, security, integration with other data) do the advantages of a database become more apparent.
These are just for guidance and to give you some ideas.
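As a rough illustration of the first two points (joins and aggregations instead of array manipulation), here is a sketch using Python and an in-memory SQLite database as a stand-in for whatever RDBMS and client language you actually use. The customer and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customer (id, last_name) VALUES (1, 'Smith'), (2, 'Jones');
    INSERT INTO orders (customer_id, amount) VALUES (1, 30), (1, 70), (2, 25);
""")

# One query joins the two tables and summarizes per customer, so the
# client never has to load either table into arrays and match rows by hand.
totals = conn.execute("""
    SELECT c.last_name, COUNT(*) AS order_count, SUM(o.amount) AS total
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.last_name
    ORDER BY total DESC
""").fetchall()
print(totals)
```

The client only receives one row per customer, which is the finished summary and not the raw material for one.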

In terms of doing what where, do as much as is sensible in SQL (given it runs on a server) as you can.
So for instance, don't do stuff like this (pseudocode):
foreach (row in "Select * from Orders")
    if (row[CustomerID] == 876)
        Display(row)
Do this instead:
foreach (row in "Select * from Orders where CustomerID = 876")
    Display(row)
First, it's likely that Orders is indexed by CustomerID, so the database will find all of customer 876's orders much more quickly.
Second, to do the first version you have just sucked every record in that table into the client's memory space, probably across your network.
Which language is used is essentially irrelevant; you could invent your own DBMS with its own language.
It's where you do which processing that matters. It's a rule with exceptions, but the essential idea is to let your backend do as much as it can.
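Here is a runnable version of the same comparison, as a sketch only: Python with an in-memory SQLite database stands in for the real client and server, and the table contents, index name and function names are made up. It counts how many rows each approach pulls out of the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)")
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(cid, f"item-{n}") for n, cid in enumerate([876, 12, 876, 40, 12])],
)


def orders_filtered_in_client(conn, customer_id):
    """Fetch every row, then throw most of them away in application code."""
    all_rows = conn.execute("SELECT id, customer_id, item FROM orders").fetchall()
    return len(all_rows), [row for row in all_rows if row[1] == customer_id]


def orders_filtered_in_database(conn, customer_id):
    """Let the WHERE clause (and the index) do the filtering on the server."""
    rows = conn.execute(
        "SELECT id, customer_id, item FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchall()
    return len(rows), rows


fetched_client, client_rows = orders_filtered_in_client(conn, 876)
fetched_db, db_rows = orders_filtered_in_database(conn, 876)
print(fetched_client, fetched_db)
```

With five rows the difference is trivial, but the first number grows with the size of the table while the second grows only with the number of matching orders.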

SQLce DAL, Linq-to-Sql or EntityFramework

I'm learning databases, using SqlCe, and need business object to database mapping.
I'm currently trying to decide whether to use Linq to Sql or EntityFramework. (I understand L2S a bit, but haven't familiarized myself with EF yet.)
The program will only be developed and used by myself, so I have good control of the priorities:
I don't need to consider potential change of database type or data storage type, as I'm quite certain SQLce will stay sufficient.
I DO expect continued development and changes to the data schema while the program is in active use: changes to business object properties (hence database columns), and possibly to the overall table schema. So old data must be migrated to the new schema.
I also want to keep a decent degree of layer separation DAL/BLL, although this may not be necessary, it is good for me to learn these principles.
My question is: with these priorities, would I gain any benefit by choosing Linq2Sql over EntityFramework, or vice versa? (And please explain why.)
Btw, the project involves a very simple table schema and relations, with only ~4 tables in total.
Thanks!

You can use Linq to Sql for this; actually, Linq to Sql is a subset of the ADO.NET Entity Framework.
For your needs it's better to use Linq to Sql, because your database is not complicated and has only a few tables. Linq to Sql is easier to use than the ADO.NET Entity Framework.

Keep in mind that Linq2Sql only works with MS SQL Server out of the box, not with SqlCe.
Apparently there are some tricks to get it to work, but I've never tried them myself, so I have no idea whether it works as well as with the "real" SQL Server.
So I guess Entity Framework would be the safer choice.

Powerful tools for creating SQL queries

I'm looking for a tool that would help with creating complex SQL queries. Sometimes it's difficult even to verify whether the results of a query are correct. It's especially easy to get queries that join several tables to return too little or too much data.
The tool should at least allow creating test tables, provide some kind of visualization of how the queries gather their data, and hopefully give better diagnostics for error cases than, for example, Oracle does.
Are there tools like this, or do I have to stick with creating test tables manually, filling them with test data and running all kinds of queries with SQuirreL SQL?

When you have a very complex query it is usually easiest to validate by breaking it up into multiple queries that populate temp tables. These intermediary results can be individually verified and then you bring them together to produce the final result set. Depending on performance needs you can stick with the temp table approach or you can then rewrite to a single statement. Typically when I have a huge query it is for background processing so I stick with the temp table approach.
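A small sketch of that approach, using Python with an in-memory SQLite database as a convenient stand-in for the real RDBMS. The tables, the temp table names and the "north region" question are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'Ann', 'north'), (2, 'Bob', 'south'), (3, 'Cy', 'north');
    INSERT INTO orders (customer_id, amount) VALUES (1, 50), (1, 25), (2, 40), (3, 10);

    -- Step 1: the customers we care about.
    CREATE TEMP TABLE north_customer AS
        SELECT id, name FROM customer WHERE region = 'north';

    -- Step 2: order totals per customer, independent of step 1.
    CREATE TEMP TABLE order_total AS
        SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id;
""")

# Each intermediate result can be inspected and verified on its own...
step1 = conn.execute("SELECT * FROM north_customer ORDER BY id").fetchall()
step2 = conn.execute("SELECT * FROM order_total ORDER BY customer_id").fetchall()

# ...before the pieces are joined into the final result set.
final = conn.execute("""
    SELECT n.name, t.total
    FROM north_customer AS n
    JOIN order_total AS t ON t.customer_id = n.id
    ORDER BY n.id
""").fetchall()
print(step1, step2, final)
```

If the final result looks wrong, you can tell from step1 and step2 which half of the logic is at fault, which is much harder to do with one large statement.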

What RDBMS are you using? All of the major ones have some type of console available (e.g. SSMS for SQL Server, Toad for Oracle, MySQL Query Browser/Administrator for MySQL, etc.), and they all have query execution plans where you can see how the query will actually run. So, the answer to your question is that it's entirely dependent on what RDBMS you're using, but the safe bet answer is: Yes.

I recommend trying SQL Server 2008 Management Studio Express (SSMSE) if you are working with SQL Server. I have used it at work and I believe it does everything you are looking for.
You can get it and SQL Server (express editions) here.

Certainly not a free, open-source solution, but I believe Quest Software's TOAD will fit your requirements. Quest seems to offer a lot of tools in that space. They have tools for modeling and analysis, although I've never used the modeler or analyzer.
I personally have experience with the commercial version of TOAD for Oracle. Its GUI is overwhelming at first, but after you mentally filter out all of the extra buttons that you'll never use, it's manageable.

Is this a valid benefit of using embedded SQL over stored procedures?

Here's an argument for SPs that I haven't heard. Flamers, be gentle with the down tick.
Since there is overhead associated with each trip to the database server, I would suggest that a POSSIBLE reason for placing your SQL in SPs over embedded code is that you are more insulated to change without taking a performance hit.
For example. Let's say you need to perform Query A that returns a scalar integer.
Then, later, the requirements change and you decide that if the scalar result is > x then, and only then, you need to perform another query. If you performed the first query in an SP, you could easily check the result of the first query and conditionally execute the second SQL statement in the same SP.
How would you do this efficiently in embedded SQL w/o performing a separate query or an unnecessary query?
Here's an example:
-- This SP may run one or two queries.
DECLARE @CustCount INT
SELECT @CustCount = COUNT(*) FROM CUSTOMER
IF @CustCount > 10
    SELECT * FROM PRODUCT
Can this be done in embedded SQL, and if so, what is the best way to do it?

A very persuasive article
SQL and stored procedures will be there for the duration of your data.
Client languages come and go, and you'll have to re-implement your embedded SQL every time.

In the example you provide, the time saved is sending a single scalar value and a single follow-up query over the wire. This is insignificant in any reasonable scenario. That's not to say there might not be other valid performance reasons to use SPs; just that this isn't such a reason.

I would generally never put business logic in SPs; I like it to be in my native language of choice, outside the database. The only time I agree SPs are better is when there is a lot of data movement that doesn't need to come out of the db.
So to answer your question, I'd rather have two queries in my code than embed that in an SP. In my view I am trading a small performance hit for something a lot clearer.

"How would you do this efficiently in embedded SQL w/o perform a separate query or an unnecessary query?"
Depends on the database you are using. In SQL Server, this is a simple CASE statement.

Perhaps include the WHERE clause in that sproc:
WHERE (all your regular conditions)
AND myScalar > myThreshold

Lately I prefer not to use SPs (except when uber complexity arises where a proc would just be better...or CLR would be better). I have been using the Repository pattern with LINQ to SQL, where my query is written in my data layer as a strongly typed LINQ expression. The key here is that the query is strongly typed, which means that when I refactor I am refactoring properties of a class that is generated directly from the database table (which makes carrying changes from the DB all the way forward super easy and accurate).
While my SQL is generated for me and sent to the server, I still have the option of sticking to DRY principles, as the repository pattern allows me to break things down into their smallest components. I do have the issue that I might make a trip to the server and, based on the results of the query, find that I need to make another trip to the server. I don't worry about this up front. If I find later that it becomes an issue, then I may refactor that code into something more performant.
The overall key here is that there is no one magic bullet. I tend to work on greenfield applications, which allows this method of development to be most efficient for me.

Benefits of SPs:
Performance (are precompiled)
Easy to change (without compiling the application)
SQL's set-based features make really difficult data tasks very easy
Drawbacks:
Depend heavily on the database engine used
Makes deployment of upgrades a little harder (you have to deploy the App + the scripts)
My 2 cents...
About your example, it can be done like this:
select * from products where (select count(*) from customers) > 10
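As a sanity check of that single-statement idea, here is a sketch that runs it as embedded SQL from client code. Python and an in-memory SQLite database stand in for the real client and SQL Server; the table layout and the products_if_enough_customers helper are invented for the example.

```python
import sqlite3


def products_if_enough_customers(conn, threshold):
    """One round trip: products come back only when the customer count exceeds threshold."""
    return conn.execute(
        "SELECT name FROM product "
        "WHERE (SELECT COUNT(*) FROM customer) > ? "
        "ORDER BY name",
        (threshold,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY);
    CREATE TABLE product (name TEXT);
    INSERT INTO product VALUES ('gadget'), ('widget');
""")
conn.executemany("INSERT INTO customer (id) VALUES (?)", [(n,) for n in range(1, 11)])

at_threshold = products_if_enough_customers(conn, 10)     # exactly 10 customers
conn.execute("INSERT INTO customer (id) VALUES (11)")
above_threshold = products_if_enough_customers(conn, 10)  # now 11 customers
print(at_threshold, above_threshold)
```

The condition travels with the query, so the client makes a single call either way and simply gets an empty result when the threshold is not met.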
