LINQ syntax vs SQL syntax - sql

Why did Andres Heilsberg designed LINQ syntax to be different than that of SQL (whereby made an overhead for the programmers to learn a whole new thing)?
Weren't it better if it used same syntax as of SQL?

LINQ isn't meant to be SQL. It's meant to be a query language which is as independent of the data source as reasonably possible. Now admittedly it has a strong SQL bias, but it's not meant to just be embedding SQL in source code (fortunately).
Personally, I vastly prefer LINQ's syntax to SQL's. In particular, the ordering is much more logical in LINQ. Just by looking at the order of the query clauses, you can see the logical order in which the query is processed. You start with a data source, possibly do some filtering, ordering etc, and usually end with a projection or grouping. Compare that with SQL, where you start off saying which columns you're interested in, not even knowing what table you're talking about yet.
Not only is LINQ more logical in that respect, but it allows tools to work with you better - if Visual Studio knows what data you're starting with, then when you start writing a select clause (for example) it can help you with IntelliSense. Additionally, it allows the translation from LINQ query expressions into "dot notation" to be relatively simple using extension methods, without the compiler having to be aware of any details of what the query will actually do.
So from my point of view: no, LINQ would be a lot worse if it had slavishly followed SQL's syntax.

First, choose your flavor of SQL - there are several! (T-, PL-, etc).
Ultimately, there are similarities and differences. A lot of the LINQ changes make more sense - i.e. choosing your source (FROM) before you try filtering (WHERE) / projection (SELECT), allowing better static analysis etc (including intellisense), and a more natural query comprehension syntax. This helps both the developer and the compiler, so I'm happy.

It is simpler to parse expression when initial data is provided in its beginning.
Because of this VS provides code completion even for partially written LINQ queries (great feature IMO).

The reason is the C# language designers use this approach because of when I first specify where the data is coming from, now Visual Studio and the C# compiler know what my data looks like. And I can have IntelliSense help in the rest of the query because Visual Studio will know that "city" (for example) is a string, and it has operations like startsWith and a property named length. And really, inside of a relational database like SQL Server, the select clause that you are writing in a SQL statement at the top is really one of the last pieces of the information the query engine has to figure out. Before that it has to figure out what table you are working against in the from clause even though the from clause comes later in SQL syntax

Related

Rails - find_by_sql vs active record querying

I would like to use the find_by_sql method in place of the active record query interface because of the flexibility I get in writing my own SQL.
For example,
It is considered a good practice in SQL to order the tables in your query from smallest to largest. When using the active record query interface, you will have to start with the result model which could be the largest table.
I can also easily avoid the N+1 problem by including the target table in the join itself and including the required columns from multiple tables.
I would like to know if there are reasons I should not be using the find_by_sql option, especially when I will be writing ANSI SQL that should be compatible with all, if not most databases.
Thanks!
Writing SQL code directly is normally not encouraged because it will prevent you from accessing features such as the lazy-execution of SQL queries and may potentially lead to invalid or unsafe queries.
In fact, ActiveRecord automatically converts Ruby objects into corresponding SQL statements (such as ranges into BETWEEN or arrays into IN), filters strings from SQL injections and ActiveRecord::Relations provide lazy query executions.
That said, if you know what you are doing or using ActiveRecord would be extremely complex to achieve a specific query, there is nothing wrong to write your SQL.
My suggestion is to avoid doing that for queries that can easily be reproduced in ActiveRecord and AREL.

What's the common practice in constituting the WHERE clause based on the user input

If we take a database table, we can query all the rows or we can choose to apply a filter on it. The filter can vary depending on the user input. In cases when there are few options we can specify different queries for those few specific conditions. But if there are lots and lots of options that user might or might not specify, aforementioned method does not come handy. I know, I can compose the filter based upon the user input and send it as a string to the corresponding stored procedure as a parameter, build the query with that filter and finally execute the query string with the help of EXECUTE IMMEDIATE(In Oracle's case). Don't know why but I really don't like this way of query building. I think this way I leave the doors open for SQL injectors. And besides, that I always have trouble with the query itself as everything is just a string and I need to handle dates and numbers carefully.What is the best and most used method of forming the WHERE clause of a query against a database table?
Using database parameters instead of attempting to quote your literals is the way forward.
This will guard you against SQL injection.
A common way of approaching this problem is building expression trees that represent your query criteria, converting them to parameterized SQL (to avoid SQL injection risks), binding parameter values to the generated SQL, and executing the resultant query against your target database.
The exact approach depends on your client programming framework: .NET has Entity Framework and LINQ2SQL that both support expression trees; Java has Hibernate and JPA, and so on. I have seen several different frameworks used to construct customizable queries, with great deal of success. In situations when these frameworks are not available, you can roll your own, although it requires a lot more work.

NHibernate problem choosing between CreateSql and CreateCriteria

I have a very silly doubt in NHibernate. There are two or three entities of which two are related and one is not related to other two entities. I have to fetch some selected columns from these three tables by joining them. Is it a good idea to use session.CreateSql() or we have to use session.CreateCriteria(). I am really confused here as I could not write the Criteria queries here and forced to use CreateSql. Please advise.
in general you should avoid writing SQL whenever possible;
one of the advantages of using an ORM is that it's implementation-agnostic.
that means that you don't know (and don't care) what the underlying database is, and you can actually switch DB providers or tweak with the DB structure very easily.
If you write your own SQL statements you run the risk of them not working on other providers, and also you have to maintain them yourself (for example- if you change the name of the underlying column for the Id property from 'Id' to 'Employee_Id', you'd have to change your SQL query, whereas with Criteria no change would be necessary).
Having said that- there's nothing stopping you from writing a Criteria / HQL that pulls data from more than one table. for example (with HQL):
select emp.Id, dep.Name, po.Id
from Employee emp, Department dep, Posts po
where emp.Name like 'snake' //etc...
There are multiple ways to make queries with NH.
HQL, the classic way, a powerful object oriented query language. Disadvantage: appears in strings in the code (actually: there is no editor support).
Criteria, a classic way to create dynamic queries without string manipulations. Disadvantages: not as powerful as HQL and not as typesafe as its successors.
QueryOver, a successor of Criteria, which has a nicer syntax and is more type safe.
LINQ, now based on HQL, is more integrated then HQL and typesafe and generally a matter of taste.
SQL as a fallback for cases where you need something you can't get the object oriented way.
I would recommend HQL or LINQ for regular queries, QueryOver (resp. Criteria) for dynamic queries and SQL only if there isn't any other way.
To answer your specific problem, which I don't know: If all information you need for the query is available in the object oriented model, you should be able to solve it by the use of HQL.

Is there any way I can compare two sql strings to check if they are semantically equivalent?

I am writing some Java unit tests and I need to compare two sql strings where the sql statements are semantically equivalent but can be syntactically different. I can't do string comparison since an order of the from clause and where clause can be different yet the two queries might be equivalent.
Is there anyway to do this in java without having to write my own Oracle SQL Parser ? :)
P.S. The query can be very complicated !
Thank you !
The general answer is NO, because you can always call some kind of stored procedure which hides a Turing machine.
The fact that you can do arithmetic in a SQL statement I think pushes you over the Turing cliff, too.
Of course, theorists always tell us everything is impossible, so we should all roll over and die.
Nah.
So what can you do? Well, a "simple" possibility is to normalize the SQL queries, much as you simplify algebraic equations. If you could somehow, for a SQL statement, "normalize" (convert) it into the absolutely shortest equivalent SQL that did the same thing, then you could normalize both SQL statements and compare the result statements; if the they are equal modulo identifier renaming, then they have the same "semantics". For every operator in SQL, there is some semantics behind it, and some set of equivalent operations, just as there is in algebra. So, if you can determine the set of algebraic equivalences for each SQL operator, you can replace each algebraic computation by the shortest algebriac equivalent which does the same thing.
To do this, you have to be able to parse SQL, and apply SQL rewrites to the parsed SQL, which means you need a program transformation engine. (You can see an analog of this at Parsing and Rewriting Algebra)
This doesn't work in all cases. First, there may be several same-length "shortest" SQL statements that are equivalent (2+X is the same as X+2 but it isn't obvious to a a tool). Now you've got a theorem proving problem (for our X+2 example, use the commutative law to prove they are equal), back to stuck in theory. Second, you may not know how to generate the shortest possible sequence using your rewrites; even math equations sometimes have to swell up before they can get small again. Technically you have to search all possible algebraic equivalences to find the shortest, and that's impossibly large.
So, hard to do in practice, too. So, NO.
Not a direct solution for your problem, but you may want to look into JSqlParser which may already cover part of what you need.

Plain SQL vs Dialects

DBMS Vendors use SQL dialect features to differentiate their product, at the same time claiming to support SQL standards. 'Nuff said on this.
Is there any example of SQL you have coded that can't be translated to SQL:2008 standard SQL ?
To be specific, I'm talking about DML (a query statement), NOT DDL, stored procedure syntax or anything that is not a pure SQL statement.
I'm also talking about queries you would use in Production, not for ad-hoc stuff.
Edit Jan 13
Thanks for all of your answers : they have conveyed to me an impression that a lot of the DBMS-specific SQL is created to allow work-arounds for poor relational design. Which leads me to the conclusion you probably wouldn't want to port most existing applications.
Typical differences include subtly differnt semantics (for example Oracle handles NULLs differently from other SQL dialects in some cases), different exception handling mechanisms, different types and proprietary methods for doing things like string operations, date operations or hierarchical queries. Query hints also tend to have syntax that varies across platforms, and different optimisers may get confused on different types of constructs.
One can use ANSI SQL for the most part across database systems and expect to get reasonable results on a database with no significant tuning issues like missing indexes. However, on any non-trivial application there is likely to be some requirement for code that cannot easily be done portably.
Typically, this requirement will be fairly localised within an application code base - a handful of queries where this causes an issue. Reporting is much more likely to throw up this type of issue and doing generic reporting queries that will work across database managers is very unlikely to work well. Some applications are more likely to cause grief than others.
Therefore, it is unlikely that relying on 'portable' SQL constructs for an application will work in the general case. A better strategy is to use generic statements where they will work and break out to a database specific layer where this does not work.
A generic query mechanism could be to use ANSI SQL where possible; another possible approach would be to use an O/R mapper, which can take drivers for various database platforms. This type of mechanism should suffice for the majority of database operations but will require you to do some platform-specifc work where it runs out of steam.
You may be able to use stored procedures as an abstraction layer for more complex operations and code a set of platform specific sprocs for each target platform. The sprocs could be accessed through something like ADO.net.
In practice, subtle differences in paramter passing and exception handling may cause problems with this approach. A better approach is to produce a module that wraps the
platform-specific database operations with a common interface. Different 'driver' modules can be swapped in and out depending on what DBMS platform you are using.
Oracle has some additions, such as model or hierarchical queries that are very difficult, if not impossible, to translate into pure SQL
Even when SQL:2008 can do something sometimes the syntax is not the same. Take the REGEXP matching syntax for example, SQL:2008 uses LIKE_REGEX vs MySQL's REGEXP.
And yes, I agree, it's very annoying.
Part of the problem with Oracle is that it's still based on the SQL 1992 ANSI standard. SQL Server is on SQL 1999 standard, so some of the things that look like "extensions" are in fact newer standards. (I believe that the "OVER" clause is one of these.)
Oracle is also far more restrictive about placing subqueries in SQL. SQL Server is far more flexible and permissive about allowing subqueries almost anywhere.
SQL Server has a rational way to select the "top" row of a result: "SELECT TOP 1 FROM CUSTOMERS ORDER BY SALES_TOTAL". In Oracle, this becomes "SELECT * FROM (SELECT CUSTOMERS ORDER BY SALES_TOTAL) WHERE ROW_NUMBER <= 1".
And of course there's always Oracle's infamous SELECT (expression) FROM DUAL.
Edit to add:
Now that I'm at work and can access some of my examples, here's a good one. This is generated by LINQ-to-SQL, but it's a clean query to select rows 41 through 50 from a table, after sorting. It uses the "OVER" clause:
SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ContactName]) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN 40 + 1 AND 40 + 10
ORDER BY [t1].[ROW_NUMBER]
Common here on SO
ISNULL (SQL Server)
NVL ('Orable)
IFNULL (MySQL, DB2?)
COALESCE (ANSI)
To answer exactly:
ISNULL can easily give different results as COALESCE on SQL Server because of data type precedence, as per my answer/comments here