LINQ Count: best method (vb.net)

My company has just started using LINQ, and I am still having a little trouble with the abstractness (if that's a word) of the LINQ command versus the SQL it produces. My question is:
Dim query = (From o In data.Addresses _
Select o.Name).Count
In the above, in my mind, the SQL returns all rows and then a count is done on the number of rows in the IQueryable result, so I would be better off with
Dim lstring = Aggregate o In data.Addresses _
Into Count()
Or am I overthinking the way LINQ works? I'm using VB Express at home, so I can't see the actual SQL being sent to the database (I think), as I don't have access to SQL Profiler.

As mentioned, these are functionally equivalent; one just uses query syntax.
As mentioned in my comment, if you evaluate the following as VB statements in LINQPad:
Dim lstring = Aggregate o In Test _
Into Count()
You get this in the generated SQL output window:
SELECT COUNT(*) AS [value]
FROM [Test] AS [t0]
Which is the same as the following VB LINQ expression as evaluated:
(From o In Test _
Select o.Symbol).Count
You get the exact same result.
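For completeness, the equivalent in pure method syntax, using the data context from the question, would be:
Dim total = data.Addresses.Count()
which LINQ to SQL likewise translates into a single SELECT COUNT(*) query.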

I'm not familiar with Visual Basic, but based on
http://msdn.microsoft.com/en-us/library/bb546138.aspx
Those two approaches are the same. One uses method syntax and the other uses query syntax.
You can find out for sure by using SQL Profiler as the queries run.
PS - The "point" of LINQ is that you can easily do query operations without leaving code/VB-land.

An important thing here is that the code you give will work with a wide variety of data sources, and it will hopefully do so in an efficient way, though that can't be fully guaranteed. It certainly is done efficiently with a SQL source (being converted into a SELECT COUNT(*) SQL query). It is done efficiently if the source is an in-memory collection (it gets converted into a call to the Count property). It isn't done very efficiently if the source is an enumerable that is not a collection (in that case it reads everything and counts as it goes), but then there really isn't a more efficient way of doing it.
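For the non-SQL cases, here is a rough sketch of that strategy in LINQ to Objects (heavily simplified; not the actual framework source):
Imports System.Collections.Generic

Module CountSketchDemo
    ' Simplified sketch of how Count decides between the two paths described above.
    Function CountSketch(Of T)(source As IEnumerable(Of T)) As Integer
        ' A real collection already knows its size: O(1), no enumeration.
        Dim col = TryCast(source, ICollection(Of T))
        If col IsNot Nothing Then Return col.Count
        ' Otherwise read everything and count as we go: O(n).
        Dim n As Integer = 0
        For Each item As T In source
            n += 1
        Next
        Return n
    End Function
End Module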
In each case it has done the same conceptual operation, in the most efficient manner possible, without you having to worry about the details. No big deal with counting, but a bigger deal in more complex cases.
To a certain extent, you are right when you say "in my mind, the SQL is returning all rows and then does a count on the number of rows". Conceptually that is what happens in that query, but the implementation may differ. Compare with how the actual execution of a SQL statement may not match a literal reading of the SQL text, so that the most efficient approach can be picked.

I think you are missing the point: LINQ to SQL uses deferred execution (late binding), so the query runs only when you need the result. When you say "I need the count", that is when a query is created.
Before that, LINQ to SQL creates expression trees that will be "translated" into SQL when you need the result:
http://weblogs.asp.net/scottgu/archive/2007/05/19/using-linq-to-sql-part-1.aspx
http://msdn.microsoft.com/en-us/netframework/aa904594.aspx
For how to debug, see Scott's LINQ to SQL Debug Visualizer:
http://weblogs.asp.net/scottgu/archive/2007/07/31/linq-to-sql-debug-visualizer.aspx
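A minimal illustration of the deferral, reusing the Addresses table from the original question (this assumes a LINQ to SQL DataContext named data):
Dim query = From o In data.Addresses Select o.Name ' no SQL sent yet; this only builds an expression tree
Dim total = query.Count() ' the SELECT COUNT(*) is generated and executed here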

Related

Database Function vs. Case Statement

Yesterday we had a scenario where we had to get the type of a DB field and, based on that, write out the field's description. Like:
Select ( Case DB_Type When 'I' Then 'Intermediate'
When 'P' Then 'Pending'
Else 'Basic'
End)
From DB_table
I suggested writing a DB function instead of this CASE statement because that would be more reusable. Like:
Select dbo.GetTypeName(DB_Type)
from DB_table
The interesting part is, one of our developers said using a database function would be inefficient, as database functions are slower than CASE statements. I searched the internet for an answer on which approach is better in terms of efficiency, but unfortunately I found nothing that could be considered a satisfying answer. Please enlighten me with your thoughts: which approach is better?
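For reference, the function I had in mind would be defined something like this (a sketch in SQL Server T-SQL, reusing the CASE mapping from above):
CREATE FUNCTION dbo.GetTypeName (@DB_Type CHAR(1))
RETURNS VARCHAR(20)
AS
BEGIN
    -- Same mapping as the inline CASE expression.
    RETURN (CASE @DB_Type WHEN 'I' THEN 'Intermediate'
                          WHEN 'P' THEN 'Pending'
                          ELSE 'Basic'
            END);
END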
A UDF is always slower than a CASE statement.
Please refer to the article:
http://blogs.msdn.com/b/sqlserverfaq/archive/2009/10/06/performance-benefits-of-using-expression-over-user-defined-functions.aspx
The following article suggests when to use UDFs:
http://www.sql-server-performance.com/2005/sql-server-udfs/
Summary:
There is a large performance penalty when user-defined functions are used. This penalty shows up as poor query execution time when a query applies a UDF to a large number of rows, typically 1000 or more. The penalty is incurred because the SQL Server database engine must create its own internal cursor-like processing: it must invoke the UDF on each row. If the UDF is used in the WHERE clause, this may happen as part of filtering the rows. If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing.
It's the row by row processing that slows SQL Server the most.
When using a scalar function (a function that returns one value), the contents of the function will be executed once per row, but the CASE statement will be executed across the entire set.
By operating against the entire set, you allow the server to optimise your query more efficiently.
So the theory goes that if the same query is run both ways against a large dataset, the function should be slower. However, the difference may be trivial when operating against your data, so you should try both methods and test them to determine whether any performance trade-off is worth the increased utility of a function.
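A quick way to run that comparison (SQL Server, using the table and function names from the question; timings appear on the Messages tab):
SET STATISTICS TIME ON;
SELECT dbo.GetTypeName(DB_Type) FROM DB_table;
SELECT (CASE DB_Type WHEN 'I' THEN 'Intermediate'
                     WHEN 'P' THEN 'Pending'
                     ELSE 'Basic'
        END)
FROM DB_table;
SET STATISTICS TIME OFF;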
Your developer is right. Functions will slow down your query.
https://sqlserverfast.com/?s=user+defined+ugly
Calling a function is like:
wrap the parts in paper
put them in a bag
carry it to the mechanic
let him unwrap it, do something, and wrap the result
carry it back
use it

Is Aggregate fatally flawed because each Into clause is executed separately?

Is VB.NET's Aggregate query fatally flawed when used as the first (outer) clause of a Linq expression with multiple Into clauses because each Into clause is executed separately?
The "obvious" answer to SELECT MIN(ZoneMin), MAX(ZoneMin) FROM Plant in LINQ to SQL is
Dim limits = Aggregate p In Plants Select p.ZoneMin Into Min(), Max()
However, this answer actually retrieves each of Min and Max (and if you include other aggregate functions like Count and Average) in separate SQL queries. This can be easily seen in LINQPad.
Is there a transaction (or something else making these queries atomic) not shown by LINQPad, or is this a race condition waiting to happen? (And so you have to do the tricks shown in the answer to the above question to force a single query that returns multiple aggregates.)
In summary, is there a LINQ-to-SQL query using Aggregate that returns multiple aggregate functions in a single (or at least "atomic") query?
(I also say "obvious" because the obvious answer to me, Aggregate p In Plants Into Min(p.ZoneMin), Max(p.ZoneMin), actually retrieves the whole table twice, even when optimised, and then uses the Linq-to-Entities Min and Max to obtain the result :-( )
I thought Aggregate wasn't VB-specific, but it looks like C# does not have this query expression, so I've changed the tag from .net to vb.net.
Although it doesn't use the Aggregate keyword, you can do multiple functions in a single query using the following syntax:
Dim query = From book In books _
Group By key = book.Subject Into Group _
Select id = key, _
BookCount = Group.Count, _
TotalPrice = Group.Sum(Function(_book) _book.Price), _
LowPrice = Group.Min(Function(_book) _book.Price), _
HighPrice = Group.Max(Function(_book) _book.Price), _
AveragePrice = Group.Average(Function(_book) _book.Price)
There does appear to be an issue with the Aggregate clause implementation though. Consider the following query from Northwind:
Aggregate o In Orders
Into Sum(o.Freight),
Average(o.Freight),
Max(o.Freight)
This issues 3 database requests. The first two perform separate aggregate clauses. The third pulls the entire table back to the client and performs the Max on the client through Linq to Objects.
To answer my broader question: is Aggregate broken for producing separate SQL queries without transactions?
All of LINQ can cause that if you don't carefully adjust your queries to result in a single SELECT, and that may not be possible without "giving up": retrieving a larger result in a single query and then using LINQ to Objects to aggregate or otherwise manipulate the data. This 'query', for example.
So it is, in general, up to the programmer to ensure transactions are added around LINQ queries that may cause multiple queries. We just need to know for sure which LINQ queries may transform into multiple SQL queries.
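If you need the separate aggregate queries to see a consistent snapshot, one option is to wrap them in your own transaction. A sketch (assuming a LINQ to SQL context named db and a reference to System.Transactions):
Imports System.Transactions

' TransactionScope defaults to Serializable isolation, so the separate
' SELECTs that Aggregate issues all run against a stable view of Plants.
Using scope As New TransactionScope()
    Dim limits = Aggregate p In db.Plants Select p.ZoneMin Into Min(), Max()
    scope.Complete()
End Using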

Dynamic Pivot Query without storing query as String

I am fully familiar with the method in the following link for performing a dynamic pivot query. Is there an alternative way to perform a dynamic pivot without storing the query as a string and inserting a column list into it?
http://www.simple-talk.com/community/blogs/andras/archive/2007/09/14/37265.aspx
Short answer: no.
Long answer:
Well, that's still no. But I will try to explain why. As of today, when you run a query, the DB engine demands to know the structure of the result set (number of columns, column names, data types, etc.) that the query will return. Therefore, you have to define the structure of the result set when you ask the DB for data. Think about it: have you ever run a query where you did not know the result set structure beforehand?
That also applies even when you do select *, which is just syntactic sugar. In the end, the returned structure is "all columns in such table(s)".
By assembling a string, you dynamically generate the structure that you desire, before asking for the result set. That's why it works.
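To make that concrete, the string-assembly technique under discussion looks roughly like this (a condensed sketch with a hypothetical Sales(Category, Amount) table, not the linked article's exact code):
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Discover the column list from the data itself.
SELECT @cols = STUFF((SELECT ',' + QUOTENAME(Category)
                      FROM Sales
                      GROUP BY Category
                      FOR XML PATH('')), 1, 1, '');

-- Only now is the result set structure known, so inject it and execute.
SET @sql = N'SELECT * FROM (SELECT Category, Amount FROM Sales) AS s
PIVOT (SUM(Amount) FOR Category IN (' + @cols + ')) AS p;';
EXEC sp_executesql @sql;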
Finally, you should be aware that assembling the string dynamically could, in theory, produce a result set with an unbounded number of columns. Of course, that's not actually possible and the query would fail, but I'm sure you understand the implications.
Update
I found this, which reinforces the reasons why it does not work.
Here:
SSIS relies on knowing the metadata of the dataflow in advance, and a dynamic pivot (which is what you are after) is not compatible with that.
I'll keep looking and adding here.

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is getting called on data that should have been filtered out by the WHERE clause. It appears that the substring call is being applied before the filtering happens, and the query is failing.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns, having 2 rows and 1 row respectively. The first column in each is just an ID. NAME is just a string, and NAME_LENGTH tells us how many characters are in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this would give me an error, because when it reaches "X" it will try to use a negative length in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular database engine, or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so, it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
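For concreteness, the guarded version I have in mind would look like this (continuing the NAMES example from above):
SELECT (CASE WHEN LEN(NAME) > 3
             THEN SUBSTRING(NAME, 1, LEN(NAME) - 3)
        END) -- implicit ELSE NULL covers the short names
FROM NAMES;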
Martin gave this link that pretty much explains what is going on: the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it, I will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
Related: TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if I can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsgroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "this is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of them.
1. Build a working table from all of the table constructors in the FROM clause.
2. Remove from the working table those rows that do not satisfy the WHERE clause.
3. Construct the expressions in the SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
You are thinking about something called the query execution plan. It's based on query optimization rules, indexes, temporary buffers, and execution-time statistics. If you are using SQL Server Management Studio, there is a toolbar above the query editor where you can look at the estimated execution plan, which shows how your query will be transformed to gain speed. So if your NAMES table is in the buffer, the engine might first try to subquery your data and then join it with the other table.

Expression Too Complex In Access 2007

When I try to run this query in Access through the ODBC interface into a MySQL database, I get an "Expression too complex in query expression" error. The essential thing I'm trying to do is translate abbreviated names of languages into their full English counterparts. I was curious whether there is some way to "trick" Access into thinking the expression is smaller with subqueries, or if someone else has a better idea of how to solve this problem. I thought about making a temporary table and doing a join on it, but that's not supported in Access SQL.
Just as an FYI, the query worked fine until I added the big long IIf chain. I tested the query with a smaller IIf chain for three languages, and that wasn't an issue, so the problem definitely stems from the huge IIf chain (it's 26 deep). Also, I might be able to drop some of the options (like combining the different forms of Chinese or Portuguese).
As a test, I was able to get the SQL query to work after paring it down to 14 IIf() statements, but that's a far cry from the 26 languages I'd like to represent.
SELECT TOP 5 Count( * ) AS [Number of visits by language], IIf(login.lang="ar","Arabic",IIf(login.lang="bg","Bulgarian",IIf(login.lang="zh_CN","Chinese (Simplified Han)",IIf(login.lang="zh_TW","Chinese (Traditional Han)",IIf(login.lang="cs","Czech",IIf(login.lang="da","Danish",IIf(login.lang="de","German",IIf(login.lang="en_US","United States English",IIf(login.lang="en_GB","British English",IIf(login.lang="es","Spanish",IIf(login.lang="fr","French",IIf(login.lang="el","Greek",IIf(login.lang="it","Italian",IIf(login.lang="ko","Korean",IIf(login.lang="hu","Hungarian",IIf(login.lang="nl","Dutch",IIf(login.lang="pl","Polish",IIf(login.lang="pt_PT","European Portuguese",IIf(login.lang="pt_BR","Brazilian Portuguese",IIf(login.lang="ru","Russian",IIf(login.lang="sk","Slovak",IIf(login.lang="sl","Slovenian",IIf(login.lang="fi","Finnish",IIf(login.lang="sv","Swedish",IIf(login.lang="tr","Turkish","Unknown"))))))))))))))))))))))))) AS [Language]
FROM login, reservations, reservation_users, schedules
WHERE (reservations.start_date Between DATEDIFF('s','1970-01-01 00:00:00',[Starting Date in the Following Format YYYY/MM/DD]) And DATEDIFF('s','1970-01-01 00:00:00',[Ending Date in the Following Format YYYY/MM/DD])) And reservations.is_blackout=0 And reservation_users.memberid=login.memberid And reservation_users.resid=reservations.resid And reservation_users.invited=0 And reservations.scheduleid=schedules.scheduleid And scheduletitle=[Schedule Title]
GROUP BY login.lang
ORDER BY Count( * ) DESC;
@Michael Todd
I completely agree. The list of languages should have been a table in the database and the login.lang should have been a FK into that table. Unfortunately this isn't how the database was written, and it's not really mine to modify. The languages are placed into the login.lang field by the PHP running on top of the database.
I thought about making a temporary table and doing a join on it, but that's not supported in Access SQL.
Did you try making a table of languages within Access, and joining it to the MySQL tables?
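Something like this, where Languages is a local Access table you would create yourself, with hypothetical Code and FullName columns (a condensed sketch of the original query):
SELECT TOP 5 Count( * ) AS [Number of visits by language], Languages.FullName AS [Language]
FROM login INNER JOIN Languages ON login.lang = Languages.Code
GROUP BY Languages.FullName
ORDER BY Count( * ) DESC;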
You may try the expression below. What I did is cut your expression into two parts; then a final IIf check does the trick. You will have two additional fields, which you can ignore. I had the same situation and this worked well for me. PS: You may need to double-check the closing brackets in the expression below; I did it quickly.
Thanks,
Shibin
IIf(login.lang="ar","Arabic",IIf(login.lang="bg","Bulgarian",IIf(login.lang="zh_CN","Chinese (Simplified Han)",IIf(login.lang="zh_TW","Chinese (Traditional Han)",IIf(login.lang="cs","Czech",IIf(login.lang="da","Danish",IIf(login.lang="de","German",IIf(login.lang="en_US","United States English",IIf(login.lang="en_GB","British English",IIf(login.lang="es","Spanish",IIf(login.lang="fr","French",IIf(login.lang="el","Greek",IIf(login.lang="it","Italian",""))))))))))))) as l1,
IIf(login.lang="ko","Korean",IIf(login.lang="hu","Hungarian",IIf(login.lang="nl","Dutch",IIf(login.lang="pl","Polish",IIf(login.lang="pt_PT","European Portuguese",IIf(login.lang="pt_BR","Brazilian Portuguese",IIf(login.lang="ru","Russian",IIf(login.lang="sk","Slovak",IIf(login.lang="sl","Slovenian",IIf(login.lang="fi","Finnish",IIf(login.lang="sv","Swedish",IIf(login.lang="tr","Turkish","Unknown")))))))))))) as l2,
IIf(l1="",l2,l1) AS [Language]
If you can't use a lookup table, create a custom VB function, so that instead of 26 IIf statements, you have one function call.
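A sketch of such a function, placed in a standard Access module (the name LangName is made up, and the mapping is abridged from the question's list):
' Call as LangName(login.lang) in the query, in place of the nested IIf chain.
Public Function LangName(ByVal code As String) As String
    Select Case code
        Case "ar": LangName = "Arabic"
        Case "bg": LangName = "Bulgarian"
        Case "zh_CN": LangName = "Chinese (Simplified Han)"
        Case "zh_TW": LangName = "Chinese (Traditional Han)"
        Case "de": LangName = "German"
        Case "fr": LangName = "French"
        ' ...remaining languages from the question go here...
        Case Else: LangName = "Unknown"
    End Select
End Function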