Is there any way to get NHibernate to use a Window function?
Specifically, I'm looking to create a query like the following:
SELECT
date,
SUM(Price) OVER (ORDER BY date)
FROM purchases
GROUP BY date
I've Search the web and can't find anything about window functions and NHibernate
Specifically, I'm using the QueryOver interface for NHibernate.
If it's not possible, what other solutions are there to keep database independence in my code?
Good question but unfortunately I've never seen any code within NHibernate itself that would allow it to generate such a window function.
Your only option would be to use a named query. In your named query you could either type out the SQL by hand or call a stored procedure.
You stated that you wanted to keep your solution database independent. I've never personally cared much doing such as thing as I always work with SQL Server but I can understand where you're coming from. I've heard of several people who run their integration tests with an in memory SQLite solution and then run their actual code on SQL Server or Oracle.
In your scenario, I don't believe the SQL in your question is ANSI compliant anyway so you'd have to re-write it in a different way unfortunately if you used a named query to make it database independent.
As far as I know, you can't do that with the QueryOver API.
What I would do to keep as DB independant as possible would be to consider the windowed query as a join between your table and a group by query. Then to write a QueryOver which would lead this join.
Considering you query it would lead to this : (though the example does not seem to really show the need for a windowed query)
SELECT
purchases.date,
purchasesGroup.priceSum
FROM
purchases
INNER JOIN
(SELECT date, SUM(Price) as priceSum FROM purchases GROUP BY date)
AS purchasesGroup
on purchases.date = purchasesGroup.date
You could potentially use a view for this. That could at least abstract away a little bit of it, although you are really migrating the dependency to the database schema.
I've run into this kind of issue as well lately, and I'm starting to realize that I am often trying to use NHibernate in a way that isn't really about persisting entities. If I were Ayende Rahien I would probably know what to do, but since I am not I usually end up mixing in a little Dapper when I just need something like a read-only aggregate result such as statistics about data.
Related
I would like to use the find_by_sql method in place of the active record query interface because of the flexibility I get in writing my own SQL.
For example,
It is considered a good practice in SQL to order the tables in your query from smallest to largest. When using the active record query interface, you will have to start with the result model which could be the largest table.
I can also easily avoid the N+1 problem by including the target table in the join itself and including the required columns from multiple tables.
I would like to know if there are reasons I should not be using the find_by_sql option, especially when I will be writing ANSI SQL that should be compatible with all, if not most databases.
Thanks!
Writing SQL code directly is normally not encouraged because it will prevent you from accessing features such as the lazy-execution of SQL queries and may potentially lead to invalid or unsafe queries.
In fact, ActiveRecord automatically converts Ruby objects into corresponding SQL statements (such as ranges into BETWEEN or arrays into IN), filters strings from SQL injections and ActiveRecord::Relations provide lazy query executions.
That said, if you know what you are doing or using ActiveRecord would be extremely complex to achieve a specific query, there is nothing wrong to write your SQL.
My suggestion is to avoid doing that for queries that can easily be reproduced in ActiveRecord and AREL.
I have a very silly doubt in NHibernate. There are two or three entities of which two are related and one is not related to other two entities. I have to fetch some selected columns from these three tables by joining them. Is it a good idea to use session.CreateSql() or we have to use session.CreateCriteria(). I am really confused here as I could not write the Criteria queries here and forced to use CreateSql. Please advise.
in general you should avoid writing SQL whenever possible;
one of the advantages of using an ORM is that it's implementation-agnostic.
that means that you don't know (and don't care) what the underlying database is, and you can actually switch DB providers or tweak with the DB structure very easily.
If you write your own SQL statements you run the risk of them not working on other providers, and also you have to maintain them yourself (for example- if you change the name of the underlying column for the Id property from 'Id' to 'Employee_Id', you'd have to change your SQL query, whereas with Criteria no change would be necessary).
Having said that- there's nothing stopping you from writing a Criteria / HQL that pulls data from more than one table. for example (with HQL):
select emp.Id, dep.Name, po.Id
from Employee emp, Department dep, Posts po
where emp.Name like 'snake' //etc...
There are multiple ways to make queries with NH.
HQL, the classic way, a powerful object oriented query language. Disadvantage: appears in strings in the code (actually: there is no editor support).
Criteria, a classic way to create dynamic queries without string manipulations. Disadvantages: not as powerful as HQL and not as typesafe as its successors.
QueryOver, a successor of Criteria, which has a nicer syntax and is more type safe.
LINQ, now based on HQL, is more integrated then HQL and typesafe and generally a matter of taste.
SQL as a fallback for cases where you need something you can't get the object oriented way.
I would recommend HQL or LINQ for regular queries, QueryOver (resp. Criteria) for dynamic queries and SQL only if there isn't any other way.
To answer your specific problem, which I don't know: If all information you need for the query is available in the object oriented model, you should be able to solve it by the use of HQL.
Could you please list some of the bad practices in SQL, that novice people do?
I have found the use of "WHILE loop" in scenarios which could be resolved using set operations.
Another example is inserting data only if it does not exist. This can be achieved using LEFT OUTER JOIN. Some people go for "IF"
Any other thoughts?
Edit: What I am looking for is specific scenarios (as mentioned in the question) that could be achieved using SQL without using procedural constructs
Thanks
Lijo
Here are some I have seen:
Using cursors instead of equivalent (and faster) set operations (joins etc).
Dynamic SQL for everything.
Code that is open to SQL Injection attacks.
Full outer joins even when they are not needed.
Huge stored procedures (hundreds/thousands of lines).
No comments.
Placing ODBC or dynamic SQL calls all over the code.
Often it is better to define a data abstraction layer that provides access
to the databases. All the SQL code can hide in that layer.
This often avoids replication of similar queries, and makes changing
data models easier to do.
Personally for me: anything that is not a plain INSERT, UPDATE, DELETE or SELECT statement
I don't like logic in SQL.
My biggest beef here is definitely repetitive SQL. As an example, multiple stored procedures that perform the exact same joins but different filters.
Using Views in such cases can make your database MUCH easier to look at and work with
Creating vendor-specific SQL, when generic SQL would do.
Creating tables dynamically at runtime (other than TEMPORARY tables).
Letting your application code have table create or super user privs.
The question asking for a list of SQL smells, no answer can be
exhaustive. I will be expanding my answer as time permits and memory
serves:
Redundant grouping
Redundant grouping is the application of the GROUP BY statement—and
consequently of aggregate functions—to more columns than required. It
occurs when the author starts by collecting most of or all the data that
he needs, to group it at the very end. Redundant grouping is,
therefore, late grouping, for the correct approach is to group early,
and only the data that needs grouping.
If a main entity (main) have a journal (jrnl) and refer to another
enity appendage (apnd), then the following query:
SELECT
main . Id ,
main . Name ,
MAX(jrnl.Entry) AS Entry ,
MAX(jrnl.Date ) AS Date ,
apnd . Reference,
apnd . Status
FROM main
JOIN jrnl ON jrnl.Parent = main.Id
JOIN apnd ON apnd.Id = main.ApndId
GROUP BY main.Id, main.Name, apnd.Reference, apnd.Status
has redundant grouping, because the sole purpose of the GROUP BY
clause is to obtain the latest journal entry. It should be rewritten in
a non-redundant mannger as follows:
SELECT
main.Id ,
main.Name ,
skel.Entry ,
jrnl.Date ,
apnd.Reference,
apnd.Status
FROM
( SELECT
jrnl.Parent AS MainId,
MAX(jrnl.Entry) AS MaxEntry
FROM main
GROUP BY jrnl.Parent
) skel -- eton
JOIN main ON main.Id = skel.MainId
JOIN jrnl ON jrnl.Entry = skel.MaxEntry
JOIN apnd ON apnd.Id = main.ApndId
That is—we group on the narrowest dataset possible, and join the rest
afterwards, even if it means referencing the same tables!
DBMS Vendors use SQL dialect features to differentiate their product, at the same time claiming to support SQL standards. 'Nuff said on this.
Is there any example of SQL you have coded that can't be translated to SQL:2008 standard SQL ?
To be specific, I'm talking about DML (a query statement), NOT DDL, stored procedure syntax or anything that is not a pure SQL statement.
I'm also talking about queries you would use in Production, not for ad-hoc stuff.
Edit Jan 13
Thanks for all of your answers : they have conveyed to me an impression that a lot of the DBMS-specific SQL is created to allow work-arounds for poor relational design. Which leads me to the conclusion you probably wouldn't want to port most existing applications.
Typical differences include subtly differnt semantics (for example Oracle handles NULLs differently from other SQL dialects in some cases), different exception handling mechanisms, different types and proprietary methods for doing things like string operations, date operations or hierarchical queries. Query hints also tend to have syntax that varies across platforms, and different optimisers may get confused on different types of constructs.
One can use ANSI SQL for the most part across database systems and expect to get reasonable results on a database with no significant tuning issues like missing indexes. However, on any non-trivial application there is likely to be some requirement for code that cannot easily be done portably.
Typically, this requirement will be fairly localised within an application code base - a handful of queries where this causes an issue. Reporting is much more likely to throw up this type of issue and doing generic reporting queries that will work across database managers is very unlikely to work well. Some applications are more likely to cause grief than others.
Therefore, it is unlikely that relying on 'portable' SQL constructs for an application will work in the general case. A better strategy is to use generic statements where they will work and break out to a database specific layer where this does not work.
A generic query mechanism could be to use ANSI SQL where possible; another possible approach would be to use an O/R mapper, which can take drivers for various database platforms. This type of mechanism should suffice for the majority of database operations but will require you to do some platform-specifc work where it runs out of steam.
You may be able to use stored procedures as an abstraction layer for more complex operations and code a set of platform specific sprocs for each target platform. The sprocs could be accessed through something like ADO.net.
In practice, subtle differences in paramter passing and exception handling may cause problems with this approach. A better approach is to produce a module that wraps the
platform-specific database operations with a common interface. Different 'driver' modules can be swapped in and out depending on what DBMS platform you are using.
Oracle has some additions, such as model or hierarchical queries that are very difficult, if not impossible, to translate into pure SQL
Even when SQL:2008 can do something sometimes the syntax is not the same. Take the REGEXP matching syntax for example, SQL:2008 uses LIKE_REGEX vs MySQL's REGEXP.
And yes, I agree, it's very annoying.
Part of the problem with Oracle is that it's still based on the SQL 1992 ANSI standard. SQL Server is on SQL 1999 standard, so some of the things that look like "extensions" are in fact newer standards. (I believe that the "OVER" clause is one of these.)
Oracle is also far more restrictive about placing subqueries in SQL. SQL Server is far more flexible and permissive about allowing subqueries almost anywhere.
SQL Server has a rational way to select the "top" row of a result: "SELECT TOP 1 FROM CUSTOMERS ORDER BY SALES_TOTAL". In Oracle, this becomes "SELECT * FROM (SELECT CUSTOMERS ORDER BY SALES_TOTAL) WHERE ROW_NUMBER <= 1".
And of course there's always Oracle's infamous SELECT (expression) FROM DUAL.
Edit to add:
Now that I'm at work and can access some of my examples, here's a good one. This is generated by LINQ-to-SQL, but it's a clean query to select rows 41 through 50 from a table, after sorting. It uses the "OVER" clause:
SELECT [t1].[CustomerID], [t1].[CompanyName], [t1].[ContactName], [t1].[ContactTitle], [t1].[Address], [t1].[City], [t1].[Region], [t1].[PostalCode], [t1].[Country], [t1].[Phone], [t1].[Fax]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ContactName]) AS [ROW_NUMBER], [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName], [t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region], [t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
) AS [t1]
WHERE [t1].[ROW_NUMBER] BETWEEN 40 + 1 AND 40 + 10
ORDER BY [t1].[ROW_NUMBER]
Common here on SO
ISNULL (SQL Server)
NVL ('Orable)
IFNULL (MySQL, DB2?)
COALESCE (ANSI)
To answer exactly:
ISNULL can easily give different results as COALESCE on SQL Server because of data type precedence, as per my answer/comments here
Sql is the standard in query languages, however it is sometime a bit verbose. I am currently writing limited query language that will make my common queries quicker to write and with a bit less mental overhead.
If you write a query over a good database schema, essentially you will be always joining over the primary key, foreign key fields so I think it should be unnecessary to have to state them each time.
So a query could look like.
select s.name, region.description from shop s
where monthly_sales.amount > 4000 and s.staff < 10
The relations would be
shop -- many to one -- region,
shop -- one to many -- monthly_sales
The sql that would be eqivilent to would be
select distinct s.name, r.description
from shop s
join region r on shop.region_id = region.region_id
join monthly_sales ms on ms.shop_id = s.shop_id
where ms.sales.amount > 4000 and s.staff < 10
(the distinct is there as you are joining to a one to many table (monthly_sales) and you are not selecting off fields from that table)
I understand that original query above may be ambiguous for certain schemas i.e if there the two relationship routes between two of the tables. However there are ways around (most) of these especially if you limit the schema allowed. Most possible schema's are not worth considering anyway.
I was just wondering if there any attempts to do something like this?
(I have seen most orm solutions to making some queries easier)
EDIT: I actually really like sql. I have used orm solutions and looked at linq. The best I have seen so far is SQLalchemy (for python). However, as far as I have seen they do not offer what I am after.
Hibernate and LinqToSQL do exactly what you want
I think you'd be better off spending your time just writing more SQL and becoming more comfortable with it. Most developers I know have gone through just this progression, where their initial exposure to SQL inspires them to bypass it entirely by writing their own ORM or set of helper classes that auto-generates the SQL for them. Usually they continue adding to it and refining it until it's just as complex (if not more so) than SQL. The results are sometimes fairly comical - I inherited one application that had classes named "And.cs" and "Or.cs", whose main functions were to add the words " AND " and " OR ", respectively, to a string.
SQL is designed to handle a wide variety of complexity. If your application's data design is simple, then the SQL to manipulate that data will be simple as well. It doesn't make much sense to use a different sort of query language for simple things, and then use SQL for the complex things, when SQL can handle both kinds of thing well.
I believe that any (decent) ORM would be of help here..
Entity SQL is slightly higher level (in places) than Transact SQL. Other than that, HQL, etc. For object-model approaches, LINQ (IQueryable<T>) is much higher level, allowing simple navigation:
var qry = from cust in db.Customers
select cust.Orders.Sum(o => o.OrderValue);
etc
Martin Fowler plumbed a whole load of energy into this and produced the Active Record pattern. I think this is what you're looking for?
Not sure if this falls in what you are looking for but I've been generating SQL dynamically from the definition of the Data Access Objects; the idea is to reflect on the class and by default assume that its name is the table name and all properties are columns. I also have search criteria objects to build the where part. The DAOs may contain lists of other DAO classes and that directs the joins.
Since you asked for something to take care of most of the repetitive SQL, this approach does it. And when it doesn't, I just fall back on handwritten SQL or stored procedures.