Aggregate multiple columns without groupBy in Slick 2.0

I would like to perform an aggregation with Slick that executes SQL like the following:
SELECT MIN(a), MAX(a) FROM table_a;
where table_a has an INT column a.
In Slick, given the table definition:
class A(tag: Tag) extends Table[Int](tag, "table_a") {
  def a = column[Int]("a")
  def * = a
}
val A = TableQuery[A]
val as = A.map(_.a)
It seems like I have 2 options:
1. Write something like: Query(as.min, as.max)
2. Write something like:
as
  .groupBy(_ => 1)
  .map { case (_, as) => (as.map(identity).min, as.map(identity).max) }
However, the generated SQL is not good in either case. In 1, two separate sub-selects are generated, which is like writing two separate queries. In 2, the following is generated:
select min(x2."a"), max(x2."a") from "table_a" x2 group by 1
However, this is not valid in Postgres: an integer literal in GROUP BY is read as the position of an output column, so it groups by the first column, which is an aggregate here and therefore not allowed. Indeed, AFAIK it is not possible to group by a constant value in Postgres, except by omitting the GROUP BY clause.
Is there a way to cause Slick to emit a single query with both aggregates without the GROUP BY?

The syntax error is a bug. I created a ticket: https://github.com/slick/slick/issues/630
The subqueries are a limitation of Slick's SQL compiler currently producing non-optimal code in this case. We are working on improving the situation.
As a workaround, here is a pattern to swap out the generated SQL under the hood and leave everything else intact: https://gist.github.com/cvogt/8054159

I use the following trick in SQL Server, and it seems to work in Postgres:
select min(x2."a"), max(x2."a")
from "table_a" x2
group by (case when x2.a = x2.a then 1 else 1 end);
The use of the variable in the group by expression tricks the compiler into thinking that there could be more than one group.
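The trick can be tried out locally. The following is a minimal sketch using Python's built-in sqlite3 module rather than Postgres or SQL Server, so it only illustrates the shape of the two queries; the helper name min_max and the sample rows are made up for illustration. It also shows one difference worth knowing: on an empty table the plain aggregate returns one row of NULLs, while the GROUP BY variant returns no rows at all.

```python
import sqlite3

PLAIN = 'SELECT MIN(a), MAX(a) FROM table_a'
# Same aggregates, but grouped by a CASE expression that always evaluates to 1.
GROUPED = ('SELECT MIN(x2.a), MAX(x2.a) FROM table_a x2 '
           'GROUP BY (CASE WHEN x2.a = x2.a THEN 1 ELSE 1 END)')


def min_max(rows, sql):
    """Load rows into an in-memory table_a and return all result rows of sql."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE table_a (a INT)')
    conn.executemany('INSERT INTO table_a (a) VALUES (?)', [(r,) for r in rows])
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


print(min_max([3, 1, 4, 1, 5], PLAIN))    # one row holding both aggregates
print(min_max([3, 1, 4, 1, 5], GROUPED))  # the same row via the GROUP BY trick
print(min_max([], PLAIN))                 # one row of NULLs
print(min_max([], GROUPED))               # no rows
```

If the calling code expects exactly one result row, the empty-table case needs separate handling when the GROUP BY variant is used.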

Related

Sequelize raw include/join

I want to be able to do the following simple SQL query using Sequelize:
SELECT * FROM one
JOIN (SELECT COUNT(*) AS count, two_id FROM two GROUP BY two_id) AS table_two
ON one.two_id = table_two.two_id
I can't seem to find anything about a raw include or a raw model.
For performance reasons, I don't want a subselect in the main query (which I know Sequelize already handles well), i.e.:
SELECT * FROM one, (SELECT COUNT(*) AS count FROM two WHERE one.two_id = two.two_id) AS count
Regarding the following Sequelize code (models One and Two exist):
models.One.findAll({
  include: [
    {
      model: models.Two
      // what to add here in order to get the example SQL
    }
  ]
})

It seems I found a somewhat hacky workaround:
You can use fn inside attributes to emit any SQL keyword (like JOIN), resulting in something like this for my use case:
models.One.findAll({
  attributes: [
    fn('JOIN', literal('SELECT COUNT(*) AS count FROM two WHERE one.two_id = two.two_id')),
  ],
});
Note that this only works on the last attribute (otherwise the JOIN is misplaced).

SQLite alias (AS) not working in the same query

I'm stuck on an (apparently) extremely trivial task that I can't get to work, and I see no option other than to ask for advice.
I used to deal with PHP/MySQL more than 10 years ago and I might be quite rusty now that I'm dealing with an SQLite DB using Qt5.
Basically I'm selecting some records while wanting to make some math operations on the fetched columns. I recall (and re-read some documentation and examples) that the keyword "AS" is going to conveniently rename (alias) a value.
So for example I have this query, where "X" is an integer number that I render into this big Qt string before executing it with a QSqlQuery. This query lets me select all the electronic components used in a Project and calculate how many of them to order (rounding to the nearest multiple of 5) and the total price per component.
SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price, UsedItems.qty_used as used_qty,
UsedItems.qty_used*X AS To_Order,
ROUND((UsedItems.qty_used*X/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5,
Inventory.price*Nearest5 AS TotPrice
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'
ORDER BY RefDes, value ASC
So, for example, I aliased UsedItems.qty_used as used_qty. At first I tried to use it in the next field, multiplying it by X, writing "used_qty*X AS To_Order" ... The query failed. No worries: I just put the original table.field name back and it worked.
Going further, I have a complex calculation whose result I want to use in the next field, but the same issue popped up: if I alias "ROUND(...)" AS Nearest5 and then try to use this value by multiplying it in the next field, the query fails.
Please note: the query WORKS, but ONLY if I don't use aliases in the following fields, namely if I don't use the alias Nearest5 in the TotPrice field. I just want to avoid rewriting the whole ROUND(...) expression for the TotPrice field.
What am I missing or doing wrong? Either SQLite does not support aliases within the same query, or I am using the wrong syntax and am too stuck to see the mistake (which I'm sure is really stupid).

In standard SQL, column aliases defined in a SELECT cannot be used:
For other expressions in the same SELECT.
For filtering in the WHERE.
For conditions in the FROM clause.
Many databases also restrict their use in GROUP BY and HAVING.
All databases support them in ORDER BY.
This is how SQL works. There are two reasons:
The logical order in which the clauses of a query are processed (i.e. how they are compiled). This affects the scoping of identifiers.
The order in which expressions in the SELECT are processed. This is indeterminate; nothing requires them to be evaluated in the order written.
For a simple example, what should x refer to in this example?
select x as a, y as x
from t
where x = 2;
Because the alias cannot be used there, SQL engines do not have to make a choice: x in the WHERE clause is always t.x.
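The first restriction is easy to reproduce. Below is a minimal sketch using Python's sqlite3 module with a made-up table t(qty, price): reusing an alias in a later expression of the same SELECT list is rejected, while repeating the expression is accepted. The table, its columns and the helper try_query are illustrative assumptions, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (qty INT, price REAL)')
conn.execute('INSERT INTO t VALUES (3, 2.5)')


def try_query(sql):
    """Return the rows of sql, or the error message if SQLite rejects it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


# Alias reused inside the same SELECT list: rejected.
reused = try_query('SELECT qty * 2 AS to_order, to_order * price AS total FROM t')
# Expression repeated instead of the alias: accepted.
repeated = try_query('SELECT qty * 2 AS to_order, qty * 2 * price AS total FROM t')

print(reused)
print(repeated)
```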

You can try with nested queries.
A SELECT query can be nested inside the FROM clause of another SELECT query.
Several queries can be nested this way, using the following pattern:
SELECT *,[your last Expression] AS LastExp FROM (SELECT *,[your Middle Expression] AS MidExp FROM (SELECT *,[your first Expression] AS FirstExp FROM yourTables));
Mind the nesting order: an alias defined in an inner query can be used by every query that encloses it.
So the first (innermost) expressions are available to all other queries, while an intermediate expression is only available to the queries further out.
For your case, your query may be:
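SELECT *, PRC*Nearest5 AS TotPrice
FROM (SELECT *, ROUND((To_Order/5)+0.5)*5*CAST((X > 0) AS INT) AS Nearest5
FROM (SELECT Inventory.id, UsedItems.pid, UsedItems.RefDes, Inventory.name, Inventory.category,
Inventory.type, Inventory.package, Inventory.value, Inventory.manufacturer,
Inventory.price AS PRC, UsedItems.qty_used*X AS To_Order
FROM Inventory
LEFT JOIN UsedItems ON Inventory.id=UsedItems.cid
WHERE UsedItems.pid='1'))
ORDER BY RefDes, value ASC
Note that the middle query uses the alias To_Order rather than UsedItems.qty_used, because only the columns selected by the innermost query are visible to the queries around it. The ORDER BY belongs on the outermost query, since the ordering of a subquery is not guaranteed to carry through.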
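A reduced version of this pattern can be checked with Python's sqlite3 module. This is only a sketch under stated assumptions: the two tables keep just the columns the calculation needs, the sample rows are invented, and X is bound as the named parameter :x instead of being rendered into the query string.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE Inventory (id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE UsedItems (pid INT, cid INT, qty_used INT);
    INSERT INTO Inventory VALUES (1, 'resistor', 0.5), (2, 'capacitor', 2.0);
    INSERT INTO UsedItems VALUES (1, 1, 3), (1, 2, 4);
''')

# Each level only refers to aliases defined by the level nested inside it.
SQL = '''
SELECT name, To_Order, Nearest5, PRC * Nearest5 AS TotPrice
FROM (SELECT *, ROUND((To_Order / 5) + 0.5) * 5 * CAST((:x > 0) AS INT) AS Nearest5
      FROM (SELECT Inventory.name, Inventory.price AS PRC,
                   UsedItems.qty_used * :x AS To_Order
            FROM Inventory
            LEFT JOIN UsedItems ON Inventory.id = UsedItems.cid
            WHERE UsedItems.pid = 1))
ORDER BY name
'''

rows = conn.execute(SQL, {'x': 2}).fetchall()
for row in rows:
    print(row)
```

Binding X as a parameter also avoids building the SQL text by string formatting, which is worth considering in the Qt code as well.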

How do I calculate percentage of one column in Jooq / SQL in only one transaction?

Question:
Grades Table
---------------
Name Score
"Bob" "A"
"Sally" "A"
"Joe" "B"
"Ann" "C"
Suppose I have this table, and I want to calculate what percentage of students have a C. The correct answer would be 25%. How do I do that in one transaction in JOOQ (or raw SQL if I must)? Or is it not possible? Thank you.
Bad solution: Two Transactions:
float numberOfC = database.fetchCountOfStudentsWithGrade("C"); //Transaction
float numberOfStudents = database.fetchCountOfStudents(); //Transaction
float percentage = numberOfC / numberOfStudents;
Good solution attempt: One Transaction - JOOQ
context.select(val(context.selectCount().from(TABLE1))
.div(val(context.selectCount().from(TABLE1)))) // This line has error
.fetch(0, int.class); //One transaction
//Error: Cannot resolve method `div(org.jooq.Param<T>)`
Jooq Docs for Arithmetic Expressions:
https://www.jooq.org/doc/latest/manual/sql-building/column-expressions/arithmetic-expressions/

In raw sql, you can do:
select avg(case when score = 'C' then 1.0 else 0 end) as c_ratio
from t;
The above is standard syntax and should work in all databases. In some databases, you can write this as:
select avg( score = 'C' ) as c_ratio
from t;
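Both forms can be tried against the sample data from the question. The sketch below uses Python's sqlite3 module, which happens to accept the shorthand as well; the table name grades is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE grades (name TEXT, score TEXT)')
conn.executemany('INSERT INTO grades VALUES (?, ?)',
                 [('Bob', 'A'), ('Sally', 'A'), ('Joe', 'B'), ('Ann', 'C')])

# Portable form: each row contributes 1.0 or 0, and AVG yields the ratio.
portable = conn.execute(
    "SELECT AVG(CASE WHEN score = 'C' THEN 1.0 ELSE 0 END) FROM grades"
).fetchone()[0]

# Shorthand: SQLite evaluates the comparison to the integer 0 or 1.
shorthand = conn.execute("SELECT AVG(score = 'C') FROM grades").fetchone()[0]

print(portable, shorthand)  # 0.25 0.25
```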

Using SQL Standard FILTER (WHERE ..)
One option in jOOQ would be to use AggregateFunction.filterWhere() as such:
ctx.select(count().filterWhere(T.SCORE.eq("C"))
.cast(BigDecimal.class)
.div(count()))
.from(T)
.fetch();
The above is assuming the following static import:
import static org.jooq.impl.DSL.*;
HSQLDB and PostgreSQL have native support for the COUNT(*) FILTER (WHERE x) syntax. In all other databases, jOOQ will emulate this using COUNT(CASE WHEN x THEN 1 END).
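The CASE-based emulation works because COUNT skips NULLs and a CASE without ELSE yields NULL for non-matching rows. A minimal sketch of that SQL shape, run through Python's sqlite3 module rather than jOOQ; the function c_ratio and the table t are hypothetical names used only for this illustration.

```python
import sqlite3


def c_ratio(scores, grade='C'):
    """Share of rows with the given grade, computed in a single statement."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE t (score TEXT)')
    conn.executemany('INSERT INTO t VALUES (?)', [(s,) for s in scores])
    # Only matching rows produce a non-NULL value, so only they are counted.
    ratio = conn.execute(
        'SELECT CAST(COUNT(CASE WHEN score = ? THEN 1 END) AS REAL) / COUNT(*) FROM t',
        (grade,),
    ).fetchone()[0]
    conn.close()
    return ratio


print(c_ratio(['A', 'A', 'B', 'C']))  # 0.25
```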
A note on the approach with correlated subqueries
In your question, you suggested an approach using correlated subqueries that do the COUNT(*) calculations. It's almost never a good idea to run several such subqueries if there's a solution that computes several aggregations in one step.

Select records with highest values for each subset

I have a set of records of which some, but not all, have a 'path' field, and all have a 'value' field. I wish to select only those which either do not have a path, or have the largest value of all the records with a particular path.
That is, given these records:
Name  Path  Value
A     foo   5
B     foo   6
C     NULL  2
D     bar   2
E     NULL  4
I want to return B, C, D, and E, but not A (because A has a path and its path is the same as B's, but A has a lower value).
How can I accomplish this, using ActiveRecord, ARel and Postgres? Ideally, I would like a solution which functions as a scope.

You could do this with two subqueries; it still runs as a single SQL query. I have not tested it, but it should point you in the right direction. This is for Postgres.
scope :null_ids, -> { where(path: nil).select('id') }
scope :non_null_ids, -> { where('path IS NOT NULL').select('DISTINCT ON (path) id').order('path, value desc, id') }
scope :stuff, -> {
  subquery = [null_ids, non_null_ids].map { |q| "(#{q.to_sql})" }.join(' UNION ')
  where("#{table_name}.id IN (#{subquery})")
}
If you are using a different DB, you might need to use group/order instead of DISTINCT ON for the non_null_ids scope. If the query runs slowly, put an index on path and value.
You get a single query, and it's a chainable scope.

A straightforward transliteration of your description to SQL would look like this:
select name, path, value
from (
    select name, path, value,
           row_number() over (partition by path order by value desc) as r
    from your_table
    where path is not null
) as dt
where r = 1
union all
select name, path, value
from your_table
where path is null
You could wrap that in a find_by_sql and get your objects out the other side.
That query works like this:
The row_number window function allows us to group the rows by path, order each group by value, and then number the rows in each group. Play around with the SQL a bit inside psql and you'll see how this works; there are other window functions available that will allow you to do all sorts of wonderful things.
You're treating NULL path values separately from non-NULL paths, hence the path is not null in the inner query.
We can peel off the first row in each of the path groups by selecting those rows from the derived table that have a row number of one (i.e. where r = 1).
The path is null rows are handled by the second query.
The UNION ALL joins the result sets of the two queries together.
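To see the steps above in action without a Postgres instance, the query can be run against the sample records with Python's sqlite3 module. This is a sketch that assumes the bundled SQLite is version 3.25 or newer, since older versions lack window functions.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE your_table (name TEXT, path TEXT, value INT)')
conn.executemany('INSERT INTO your_table VALUES (?, ?, ?)', [
    ('A', 'foo', 5), ('B', 'foo', 6), ('C', None, 2), ('D', 'bar', 2), ('E', None, 4),
])

SQL = '''
select name, path, value
from (
    select name, path, value,
           row_number() over (partition by path order by value desc) as r
    from your_table
    where path is not null
) as dt
where r = 1
union all
select name, path, value
from your_table
where path is null
'''

# Sort the names only to make the printed result independent of row order.
names = sorted(row[0] for row in conn.execute(SQL))
print(names)  # ['B', 'C', 'D', 'E']
```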
I can't think of any way to construct such a query using ActiveRecord nor can I think of any way to integrate such a query with ActiveRecord's scope mechanism. If you could easily access just the WHERE component of an ActiveRecord::Relation then you could augment the where path is not null and where path is null components of that query with the WHERE components of a scope chain. I don't know how to do that though.
In truth, I tend to abandon ActiveRecord at the drop of a hat. I find ActiveRecord to be rather cumbersome for most of the complicated things I do and not nearly as expressive as SQL. This applies to every ORM I've ever used so the problem isn't specific to ActiveRecord.

I have no experience with ActiveRecord, but here's a sample with SQLAlchemy to silence the just-use-SQL crowd ;)
q1 = Session.query(Record).filter(Record.path != None)
q1 = q1.distinct(Record.path).order_by(Record.path, Record.value.desc())
q2 = Session.query(Record).filter(Record.path == None)
query = q1.from_self().union(q2)
# Further chaining, e.g. query = query.filter(Record.value > 3) to return B, E
for record in query:
    print(record.name)

Multiple aggregate functions in Hibernate Query

I want to have an HQL query which essentially does this :
select quarter, sum(if(a>1, 1, 0)) as res1, sum(if(b>1, 1, 0)) as res2 from foo group by quarter;
I want a List<Summary> as my output, where the Summary class is:
class Summary
{
    long res1;
    long res2;
    int quarter;
}
How can I achieve this aggregation in HQL? What will be the hibernate mappings for the target Object?
I don't want to use SQL kind of query that would return List<Object[]> and then transform it to List<Summary>

Since Summary is not an entity, you don't need a mapping for it, you can create an appropriate constructor and use an HQL constructor expression instead. Aggregate functions and ifs are also possible, though you need to use case syntax instead of if.
So, if Foo is an entity mapped to the table Foo it would look like this:
select new Summary(
f.quarter,
sum(case when f.a > 1 then 1 else 0 end),
sum(case when f.b > 1 then 1 else 0 end)
) from Foo f group by f.quarter
See also:
Chapter 16. HQL: The Hibernate Query Language
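The SQL shape behind that HQL (conditional sums per group, with each result row fed to a constructor) can be tried outside Hibernate. The following sketch uses Python's sqlite3 module and a dataclass standing in for Summary; the sample rows are invented, and nothing here is Hibernate API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Summary:
    quarter: int
    res1: int
    res2: int


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE foo (quarter INT, a INT, b INT)')
conn.executemany('INSERT INTO foo VALUES (?, ?, ?)',
                 [(1, 2, 0), (1, 3, 5), (2, 0, 0), (2, 1, 9)])

rows = conn.execute('''
    select quarter,
           sum(case when a > 1 then 1 else 0 end),
           sum(case when b > 1 then 1 else 0 end)
    from foo
    group by quarter
    order by quarter
''')

# Column order must match the constructor's parameter order, as in the HQL.
summaries = [Summary(*row) for row in rows]
print(summaries)
```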

It might be possible with a subselect in the mapping. Have a look at More complex association mappings in the hibernate documentation. I've never tried that possibility.
But even the Hibernate guys recommend "... but it is more practical to handle these kinds of cases using HQL or a criteria query." That's what I would do: use the group-by in the HQL statement and work with the List. The extra time for copying this list into a list of suitable objects is negligible compared with the time the group-by takes in the database. But it seems you don't like this option.
A third possibility is to define a view in the database containing your group-by and then create a normal mapping for this view.
