Hibernate secure order of colums in native query - native

I'm trying to switch our project to Hibernate. The problem is in some places we rely on column order of ResultSet. But a query like:
List<Map<String, Object>> rows = session.createSQLQuery("select * from names")
.setResultTransformer(Criteria.ALIAS_TO_ENTITY_MAP).list();
gives me a list of Maps, which order of pairs is undefined. That is, pair for "id" could appear as last one, while it is first in the table...
How can I get column names in order like it is in native SQL?

I've had the same problem.
What helped me was the following stackoverflow question:
How to get an order Map wih AliasToEntityMapResultTransformer ?
By extending AliasToEntityMapResultTransformer you can switch out the HashMap for a LinkedHashMap and preserve the order of your query result.

Related

Aggregation of an annotation in GROUP BY in Django

UPDATE
Thanks to the posted answer, I found a much simpler way to formulate the problem. The original question can be seen in the revision history.
The problem
I am trying to translate an SQL query into Django, but am getting an error that I don't understand.
Here is the Django model I have:
class Title(models.Model):
title_id = models.CharField(primary_key=True, max_length=12)
title = models.CharField(max_length=80)
publisher = models.CharField(max_length=100)
price = models.DecimalField(decimal_places=2, blank=True, null=True)
I have the following data:
publisher title_id price title
--------------------------- ---------- ------- -----------------------------------
New Age Books PS2106 7 Life Without Fear
New Age Books PS2091 10.95 Is Anger the Enemy?
New Age Books BU2075 2.99 You Can Combat Computer Stress!
New Age Books TC7777 14.99 Sushi, Anyone?
Binnet & Hardley MC3021 2.99 The Gourmet Microwave
Binnet & Hardley MC2222 19.99 Silicon Valley Gastronomic Treats
Algodata Infosystems PC1035 22.95 But Is It User Friendly?
Algodata Infosystems BU1032 19.99 The Busy Executive's Database Guide
Algodata Infosystems PC8888 20 Secrets of Silicon Valley
Here is what I want to do: introduce an annotated field dbl_price which is twice the price, then group the resulting queryset by publisher, and for each publisher, compute the total of all dbl_price values for all titles published by that publisher.
The SQL query that does this is as follows:
SELECT SUM(dbl_price) AS total_dbl_price, publisher
FROM (
SELECT price * 2 AS dbl_price, publisher
FROM title
) AS A
GROUP BY publisher
The desired output would be:
publisher tot_dbl_prices
--------------------------- --------------
Algodata Infosystems 125.88
Binnet & Hardley 45.96
New Age Books 71.86
Django query
The query would look like:
Title.objects
.annotate(dbl_price=2*F('price'))
.values('publisher')
.annotate(tot_dbl_prices=Sum('dbl_price'))
but gives an error:
KeyError: 'dbl_price'.
which indicates that it can't find the field dbl_price in the queryset.
The reason for the error
Here is why this error happens: the documentation says
You should also note that average_rating has been explicitly included
in the list of values to be returned. This is required because of the ordering of the values() and annotate() clause.
If the values() clause precedes the annotate() clause, any annotations
will be automatically added to the result set. However, if the
values() clause is applied after the annotate() clause, you need to explicitly include the aggregate column.
So, the dbl_price could not be found in aggregation, because it was created by a prior annotate, but wasn't included in values().
However, I can't include it in values either, because I want to use values (followed by another annotate) as a grouping device, since
If the values() clause precedes the annotate(), the annotation will be computed using the grouping described by the values() clause.
which is the basis of how Django implements SQL GROUP BY. This means that I can't include dbl_price inside values(), because then the grouping will be based on unique combinations of both fields publisher and dbl_price, whereas I need to group by publisher only.
So, the following query, which only differs from the above in that I aggregate over model's price field rather than annotated dbl_price field, actually works:
Title.objects
.annotate(dbl_price=2*F('price'))
.values('publisher')
.annotate(sum_of_prices=Count('price'))
because the price field is in the model rather than being an annotated field, and so we don't need to include it in values to keep it in the queryset.
The question
So, here we have it: I need to include annotated property into values to keep it in the queryset, but I can't do that because values is also used for grouping (which will be wrong with an extra field). The problem essentially is due to the two very different ways that values is used in Django, depending on the context (whether or not values is followed by annotate) - which is (1) value extraction (SQL plain SELECT list) and (2) grouping + aggregation over the groups (SQL GROUP BY) - and in this case these two ways seem to conflict.
My question is: is there any way to solve this problem (without things like falling back to raw sql)?
Please note: the specific example in question can be solved by moving all annotate statements after values, which was noted by several answers. However, I am more interested in solutions (or discussion) which would keep the annotate statement(s) before values(), for three reasons: 1. There are also more complex examples, where the suggested workaround would not work. 2. I can imagine situations, where the annotated queryset has been passed to another function, which actually does GROUP BY, so that the only thing we know is the set of names of annotated fields, and their types. 3. The situation seems to be pretty straightforward, and it would surprise me if this clash of two distinct uses of values() has not been noticed and discussed before.
Update: Since Django 2.1, everything works out of the box. No workarounds needed and the produced query is correct.
This is maybe a bit too late, but I have found the solution (tested with Django 1.11.1).
The problem is, call to .values('publisher'), which is required to provide grouping, removes all annotations, that are not included in .values() fields param.
And we can't include dbl_price to fields param, because it will add another GROUP BY statement.
The solution in to make all aggregation, which requires annotated fields firstly, then call .values() and include that aggregations to fields param(this won't add GROUP BY, because they are aggregations).
Then we should call .annotate() with ANY expression - this will make django add GROUP BY statement to SQL query using the only non-aggregation field in query - publisher.
Title.objects
.annotate(dbl_price=2*F('price'))
.annotate(sum_of_prices=Sum('dbl_price'))
.values('publisher', 'sum_of_prices')
.annotate(titles_count=Count('id'))
The only minus with this approach - if you don't need any other aggregations except that one with annotated field - you would have to include some anyway. Without last call to .annotate() (and it should include at least one expression!), Django will not add GROUP BY to SQL query. One approach to deal with this is just to create a copy of your field:
Title.objects
.annotate(dbl_price=2*F('price'))
.annotate(_sum_of_prices=Sum('dbl_price')) # note the underscore!
.values('publisher', '_sum_of_prices')
.annotate(sum_of_prices=F('_sum_of_prices')
Also, mention, that you should be careful with QuerySet ordering. You'd better call .order_by() either without parameters to clear ordering or with you GROUP BY field. If the resulting query will contain ordering by any other field, the grouping will be wrong.
https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#interaction-with-default-ordering-or-order-by
Also, you might want to remove that fake annotation from your output, so call .values() again.
So, final code looks like:
Title.objects
.annotate(dbl_price=2*F('price'))
.annotate(_sum_of_prices=Sum('dbl_price'))
.values('publisher', '_sum_of_prices')
.annotate(sum_of_prices=F('_sum_of_prices'))
.values('publisher', 'sum_of_prices')
.order_by('publisher')
This is expected from the way group_by works in Django. All annotated fields are added in GROUP BY clause. However, I am unable to comment on why it was written this way.
You can get your query to work like this:
Title.objects
.values('publisher')
.annotate(total_dbl_price=Sum(2*F('price'))
which produces following SQL:
SELECT publisher, SUM((2 * price)) AS total_dbl_price
FROM title
GROUP BY publisher
which just happens to work in your case.
I understand this might not be the complete solution you were looking for, but some even complex annotations can also be accommodated in this solution by using CombinedExpressions(I hope!).
Your problem comes from values() follow by annotate(). Order are important.
This is explain in documentation about [order of annotate and values clauses](
https://docs.djangoproject.com/en/1.10/topics/db/aggregation/#order-of-annotate-and-values-clauses)
.values('pub_id') limit the queryset field with pub_id. So you can't annotate on income
The values() method takes optional positional arguments, *fields,
which specify field names to which the SELECT should be limited.
This solution by #alexandr addresses it properly.
https://stackoverflow.com/a/44915227/6323666
What you require is this:
from django.db.models import Sum
Title.objects.values('publisher').annotate(tot_dbl_prices=2*Sum('price'))
Ideally I reversed the scenario here by summing them up first and then doubling it up. You were trying to double it up then sum up. Hope this is fine.

Path references in QueryDSL join to subquery

I'd like to know the best way to implement this query in SQL-style QueryDSL which joins to a subquery. I struggled a bit, but got it to generate the necessary SQL. I'm wondering if there are any simplifications/improvements, however, particularly related to the three "paths" I had to create? For example, would be great to define latestCaseId in terms of latestSubQuery.
In the simplified form of my actual query below, I am finding the set of records (fields spread across ucm and pcm) which have the latest timestamp per case group. The subquery identifies the latest timestamp per group so that we can filter the outer query by it.
final SimplePath<ListSubQuery> latestSubQueryPath = Expressions.path(ListSubQuery.class, "latest");
final SimplePath<Timestamp> caseLatestMentioned = Expressions.path(Timestamp.class, "caseLatestMentioned");
final SimplePath<Integer> latestCaseId = Expressions.path(Integer.class, "latest.caseId");
final ListSubQuery<Tuple> latest = new SQLSubQuery()
.from(ucm2)
.innerJoin(pcm2).on(ucm2.id.eq(pcm2.id))
.groupBy(pcm2.caseId)
.list(pcm2.caseId.as(latestCaseId), ucm2.lastExtracted.max().as(caseLatestMentioned));
q.from(ucm)
.join(pcm).on(ucm.id.eq(pcm.id))
.innerJoin(latest, latestSubQueryPath).on(pcm.caseId.eq(latestCaseId))
.where(ucm.lastExtracted.eq(caseLatestMentioned));
I believe you could use the .get(<various Path impls>) method of PathBuilder. The way I like to think of it is that creating final PathBuilder<Tuple> latestSubQueryPath = new PathBuilder<>(Tuple.class, "latest") and joining to it .innerJoin(latest, latestSubQueryPath) is creating an alias for the subquery. Then you can use .get(<various Path impls>) to access the fields as follows:
q.from(ucm)
.join(pcm).on(ucm.id.eq(pcm.id))
.innerJoin(latest, latestSubQueryPath).on(pcm.caseId.eq(latestSubQueryPath.get(pcm2.caseId)))
.where(ucm.lastExtracted.eq(latestSubQueryPath.get(maxLastExtractedDate)));
I've not run the code but hopefully this is in the right direction. If not, I'll have a look tomorrow when I have the relevant codebase to hand.
Update: As mentioned in the comments, ucm2.lastExtracted.max() requires an alias. I've called it maxLastExtractedDate and assume it's used to alias ucm2.lastExtracted.max() when creating the subquery.

ORMlite join and order by over three tables

I have a many to many relationship and am trying to order by the one side. So in SQL this would be:
select * from
patient join patientuserrelation on patient.id=patientuserrelation.p_id
join user on patientuserrelation.u_id=user.id
order by user.name
Which I have implemented in Ormlite as:
QueryBuilder<Visit, String> qbVisit = setupAccess(Visit.class)
.queryBuilder();
QueryBuilder<UserVisitRelation, String> qbUserVisitRelation = setupAccess(
UserVisitRelation.class).queryBuilder();
QueryBuilder<User, String> qbUser = setupAccess(User.class)
.queryBuilder();
qbUser.orderBy(sortByThisColumn, true);
qbUserVisitRelation.join(qbUser);
qbVisit.join(qbUserVisitRelation);
return qbVisit.distinct().query();
However, this does not work. The results are not ordered at all. I could try to use rawSQL and rawRowMapper but that bloat up my code.
There is a similar question here: ORMLITE order by a column from another table. Unfortunately with no answer. Is there a helpful expert around?
Ok, to answer my own question for posterity: It seems like join and order across multiple tables is not supported in ormlite 4.48. If you think about it for a while you figure out why this is probably the case. Anyway, the solution is to write a raw sql statement, only select the necessary columns WITHOUT foreign collections and cast it to your object using RawRowMapper and GenericRawResults. Not what you like to do when using an ORM, but OK.

SQL, how to pick portion of list of items via SELECT, WHERE clauses?

I'm passing three parameter to URL: &p1=eventID, &p2=firstItem, &p3=numberOfItems. first parameter is a column of table. second parameter is the first item that I'm looking for. Third parameter says pick how many items after firstItem.
for example in first query user send &p1=1, &p2=0, &p3=20. Therefore, I need first 20 items of list.
Next time if user sends &p1=1, &p2=21, &p3=20, then I should pass second 20 items (from item 21 to item 41).
PHP file is able to get parameters. currently, I'm using following string to query into database:
public function getPlaceList($eventId, $firstItem, $numberOfItems) {
$records = array();
$query = "select * from {$this->TB_ACT_PLACES} where eventid={$eventId}";
...
}
Result is a long list of items. If I add LIMIT 20 at the end of string, then since I'm not using Token then result always is same.
how to change the query in a way that meets my requirement?
any suggestion would be appreciated. Thanks
=> update:
I can get the whole list of items and then select from my first item to last item via for(;;) loop. But I want to know is it possible to do similar thing via sql? I guess this way is more efficient way.
I would construct the final query to be like this:
SELECT <column list> /* never use "select *" in production! */
FROM Table
WHERE p1ColumnName >= p2FirstItem
ORDER BY p1ColumnName
LIMIT p3NumberOfItems
The ORDER BY is important; according to my reading of the documentation, PostGreSql won't guarantee which rows you get without it. I know Sql Server works much the same way.
Beyond this, I'm not sure how to build this query safely. You'll need to be very careful that your method for constructing that sql statement does not leave you vulnerable to sql injection attacks. At first glance, it looks like I could very easily put a malicious value into your column name url parameter.
If you are using mysql, then use the LIMIT keyword.
SELECT *
FROM Table
LIMIT $firstItem, $numberOfItems
See the documentation here (search for LIMIT in the page).
I found my answer here: stackoverflow.com/a/8242764/513413
For other's reference, I changed my query based on what Joel and above link said:
$query = "select places.title, places.summary, places.description, places.year, places.date from places where places.eventid = $eventId order by places.date limit $numberOfItems offset $firstItem";

Counting occurence of each distinct element in a table

I am writing a log viewer app in ASP.NET / C#. There is a report window, where it will be possible to check some information about the whole database. One kind of information there I want to display on the screen is the number of times each generator (an entity in my domain, not Firebirds sequence) appears in the table. How do I do that using COUNT ?
Do I have to :
Gather the key for each different generator
Run one query for each generator key using count
Display it somehow
Is there any way that I can do it without having to do two queries to the database? The database size can be HUGE, and having to query it "X" times where "X" is the number of generators would just suck.
I am using a Firebird database, is there any way to fetch this information from any metadata schema or there is no such thing available?
Basically, what I want is to count each occurrence of each generator in the table. Result would be something like : GENERATOR A:10 times,GENERATOR B:7 Times,Generator C:0 Times and so on.
If I understand your question correctly, it is a simple matter of using the GROUP BY clause, e.g.:
select
key,
count(*)
from generators
group by key;
Something like the query below should be sufficient (depending on your exact structure and requirements)
SELECT KEY, COUNT(*)
FROM YOUR_TABLE
GROUP BY KEY
I solved my problem using this simple Query:
SELECT GENERATOR_,count(*)
FROM EVENTSGENERAL GROUP BY GENERATOR_;
Thanks for those who helped me.
It took me 8 hours to come back and post the answer,because of the StackOverflow limitation to answer my own questions based in my reputation.